GH-4232 performance issue in VALUES clause query #4330
...df/src/test/java/org/eclipse/rdf4j/sail/nativerdf/benchmark/SPARQLValuesClauseBenchmark.java
Results from my laptop:
The query with the VALUES clause takes ~300 ms per query, while the other query takes ~0.6 ms.
On my laptop:
Looks like you have a slightly better laptop :)
Query explanation simple query:
Query explanation values clause query:
I still can't see exactly what goes wrong, other than that it just seems to be a pathological case in terms of the data shape. I'm starting to wonder if we should have an optimizer that just duplicates the bindingsetassignment clause into the right arg of the left join. We can't pre-bind values in the right arg of the left join, but perhaps if we optimize to something like this:
(ugly manual editing by me to get the idea across) I'm not yet sure if that is something that is generally legal tbh. Just recording my thoughts for the next time I revisit.
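One rough way to probe the idea above on toy data is a tiny in-memory evaluator that compares `LeftJoin(Join(V, L), R)` with `LeftJoin(Join(V, L), Join(V, R))`, where `V` stands in for the binding set assignment. This is only a sketch: every class and method name below is hypothetical and none of it is RDF4J API. It models a left join without a filter condition, and it checks a single small data set, so it says nothing about general legality (for example, rows with unbound values or duplicate rows in `V` are not covered).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy evaluator over in-memory solutions. Hypothetical names, not the RDF4J API.
class LeftJoinValuesSketch {

    // Builds one solution (variable -> value) from alternating name/value arguments.
    static Map<String, String> row(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Two solutions are compatible when they agree on every variable they share.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Union of two compatible solutions.
    static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> m = new HashMap<>(a);
        m.putAll(b);
        return m;
    }

    // Inner join: keep every compatible pair, merged.
    static List<Map<String, String>> join(List<Map<String, String>> l, List<Map<String, String>> r) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> a : l) {
            for (Map<String, String> b : r) {
                if (compatible(a, b)) {
                    out.add(merge(a, b));
                }
            }
        }
        return out;
    }

    // Left join without a filter condition: unmatched left solutions are kept unchanged.
    static List<Map<String, String>> leftJoin(List<Map<String, String>> l, List<Map<String, String>> r) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> a : l) {
            boolean matched = false;
            for (Map<String, String> b : r) {
                if (compatible(a, b)) {
                    out.add(merge(a, b));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(a);
            }
        }
        return out;
    }

    // Compares the original plan with the rewritten plan on fixed, made-up data.
    static boolean plansAgreeOnToyData() {
        List<Map<String, String>> values = Arrays.asList(row("s", "s1"), row("s", "s4"));
        List<Map<String, String>> left = Arrays.asList(
                row("s", "s1", "o", "a"), row("s", "s2", "o", "b"), row("s", "s4", "o", "d"));
        List<Map<String, String>> right = Arrays.asList(
                row("s", "s1", "x", "1"), row("s", "s2", "x", "2"), row("s", "s3", "x", "3"));

        List<Map<String, String>> bound = join(values, left);
        List<Map<String, String>> original = leftJoin(bound, right);
        List<Map<String, String>> rewritten = leftJoin(bound, join(values, right));
        return original.equals(rewritten);
    }

    public static void main(String[] args) {
        System.out.println("plans agree on toy data: " + plansAgreeOnToyData());
    }
}
```

The point of the rewritten plan is that `join(values, right)` shrinks the right argument before the left join sees it; whether a real optimizer may do this in all cases is exactly the open question in the comment above.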
In this case I think it's an issue with the join optimizer. The join optimizer doesn't really need to be limited by scoping issues when it calculates cardinality, as long as any optimizations are legal. The join optimizer could assume that the binding set assignment will apply and use it to both calculate the size and to ignore the positive effect that binding the |
This is the generically correct way to parse VALUES clauses. An optimizer can potentially look at the ordering in the algebra to push the values clause down into the join tree (by inspecting which parts of the tree have variables bound in the VALUES clause).
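A minimal sketch of what such a push-down could look like, assuming a toy algebra of inner joins only. The `Node`, `Leaf`, `Join` and `pushDown` names are hypothetical and are not the RDF4J `TupleExpr` classes; left joins, unions and scoping rules are deliberately ignored here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy join tree. Hypothetical names, not the RDF4J algebra classes.
class ValuesPushDownSketch {

    interface Node {
        Set<String> vars();   // variables mentioned anywhere in this subtree
        String render();      // compact textual form of the plan
    }

    // A leaf such as a statement pattern or the VALUES clause itself.
    static final class Leaf implements Node {
        final String label;
        final Set<String> vars;

        Leaf(String label, String... vars) {
            this.label = label;
            this.vars = new HashSet<>(Arrays.asList(vars));
        }

        public Set<String> vars() {
            return vars;
        }

        public String render() {
            return label;
        }
    }

    // An inner join of two subtrees.
    static final class Join implements Node {
        final Node left;
        final Node right;

        Join(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        public Set<String> vars() {
            Set<String> all = new HashSet<>(left.vars());
            all.addAll(right.vars());
            return all;
        }

        public String render() {
            return "Join(" + left.render() + ", " + right.render() + ")";
        }
    }

    // Descends while exactly one side of a join mentions a VALUES variable,
    // then joins the VALUES leaf with the subtree it has reached.
    static Node pushDown(Leaf values, Node node) {
        if (node instanceof Join) {
            Join j = (Join) node;
            boolean inLeft = !Collections.disjoint(j.left.vars(), values.vars());
            boolean inRight = !Collections.disjoint(j.right.vars(), values.vars());
            if (inLeft && !inRight) {
                return new Join(pushDown(values, j.left), j.right);
            }
            if (inRight && !inLeft) {
                return new Join(j.left, pushDown(values, j.right));
            }
        }
        // Both sides, neither side, or a leaf: attach the VALUES clause here.
        return new Join(values, node);
    }

    public static void main(String[] args) {
        Leaf values = new Leaf("VALUES", "s");
        Node tree = new Join(new Join(new Leaf("A", "x", "y"), new Leaf("B", "s", "x")), new Leaf("C", "y", "z"));
        System.out.println(pushDown(values, tree).render());
    }
}
```

Because inner joins are associative and commutative, moving the VALUES leaf next to the only subtree that shares its variables does not change the result in this toy model; a real optimizer would also have to respect the scoping concerns raised earlier in this conversation.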
This benchmark uses generated data conforming to the query pattern and runs performance tests on both the variant with a VALUES clause and, as a baseline, the simple equivalent query. Unfortunately, so far I have been unable to reproduce any significant performance difference.
60e2402 to 33ce6d6
I don't quite understand what you're getting at. As far as I can tell the problem is that the join optimizer calculates an initial cost for the |
33ce6d6 to 7794d0f
GitHub issue resolved: #4232
Briefly describe the changes proposed in this PR: