PERF-3199 Add workloads based on cost-based access path selection issues #718
base: master
Conversation
Some nits, suggestions and questions.
This workload was created to reproduce various plan selection issues.
First, it inserts 1000 documents with 2 uniformly distributed fields, and creates indexes on
both fields. Then it runs several pipelines, which will be slow due to incorrect plan selection.
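The setup that description outlines can be sketched in plain Python. The field names match the diff, but the value range and the data-generation code here are illustrative assumptions; the real workload uses Genny's document generators, not this script:

```python
import random

random.seed(0)  # deterministic for illustration
NUM_DOCS = 1000

# 1000 documents with two uniformly distributed fields, per the description.
# The [0, 99] value range is an assumption, not taken from the workload.
docs = [{"a": random.randint(0, 99), "b": random.randint(0, 99)}
        for _ in range(NUM_DOCS)]

# An index on each field, written as MongoDB-style key specs.
indexes = [{"a": 1}, {"b": 1}]
```

With two similarly selective single-field indexes, the planner must choose between near-identical candidate plans, which is exactly where the ranking mistakes this workload targets can surface.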
Can you add some relevant keywords? For example see here.
Done
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
  This workload was created to reproduce various plan selection issues.
Are there important metrics to look at for this workload?
Then it runs several pipelines, which will be slow due to incorrect plan selection.
Is 1,000 docs a large enough dataset to show any issues?
Won't the SelectiveIndex phase be fast?
I believe we would be looking at throughput. The purpose of adding these workloads to Genny is to measure the overall impact of plan selection issues, and their resolution, on the user. All of these cases are known to choose the "worse" plan. But if we fix this plan selection issue, and our fix creates extra overhead that outweighs the time that would be saved by choosing the right plan, then it is not an improvement for the user. We would be alerted in this situation by a regression in one of these workloads, or even the lack of the expected perf improvement.
With regards to the number of documents - I was modeling this after the original jstests, but I've realized those tests were meant to verify the incorrect plan was chosen (by checking explain output) rather than recreate a performance issue, so more documents are likely needed. I will need more time to run these queries in a local environment and play around to repro the performance difference. I'm going to re-request review once I've had time to look at this. Thank you!
both fields. Then it runs several pipelines, which will be slow due to incorrect plan selection.

GlobalDefaults:
  Database: &Database planselection
nit: camel case for the database name, i.e. PlanSelection
Done
keys into consideration for breaking ties. First, it inserts 1000 documents with 4 uniformly
distributed fields, and creates several compound indexes. Then it runs a pipeline, which will be
slow due to incorrect plan selection.
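One simplified intuition for why index keys matter when breaking ties, written as a toy equality-prefix rule in Python. This is an illustration only, not MongoDB's actual index-bounds or plan-ranking logic: with equality predicates on `a` and `d`, an index on {a: 1, d: 1} binds both of its key fields, while {a: 1, b: 1, d: 1} binds only its `a` prefix contiguously and may have to apply the `d` predicate more loosely.

```python
def bound_equality_prefix(index_keys, equality_fields):
    """Toy rule: count leading index fields bound by equality predicates.
    A simplified stand-in for index-bounds construction, for intuition only."""
    n = 0
    for field in index_keys:
        if field not in equality_fields:
            break
        n += 1
    return n

query = {"a", "d"}  # equality predicates on a and d (illustrative)

# {a: 1, d: 1} binds both key fields for this query...
assert bound_equality_prefix(["a", "d"], query) == 2
# ...while {a: 1, b: 1, d: 1} binds only the `a` prefix.
assert bound_equality_prefix(["a", "b", "d"], query) == 1
```

A ranker that ignores how many index keys the query actually uses can score the two candidate plans as a tie and end up choosing the wider index.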
Can you add keywords and significant metrics?
Is the small dataset size an issue?
slow due to incorrect plan selection.

GlobalDefaults:
  Database: &Database residualpredcosting
nit: camel case
Done
- keys: {a: 1, b: 1, d: 1}
- keys: {a: 1, d: 1}
Document:
  a: {^Cycle: {ofLength: 10, fromGenerator: {^Inc: {start: 0}}}}
question: The fields for a given document will contain the same value. Is this deliberate, and/or would a random generation approach be better?
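For intuition on this point: ^Inc yields an incrementing sequence and ^Cycle repeats its first ofLength values, so two fields defined with identical specs produce identical per-document values. A minimal Python stand-in (a deliberate simplification, not Genny's actual generator implementation):

```python
import itertools

def inc(start=0):
    """Stand-in for Genny's ^Inc: an endlessly incrementing counter."""
    n = start
    while True:
        yield n
        n += 1

def cycle(of_length, from_generator):
    """Stand-in for ^Cycle: repeat the generator's first `of_length` values."""
    return itertools.cycle(itertools.islice(from_generator, of_length))

# Each field gets its own generator instance, but identical specs produce
# identical sequences, so a == b in every generated document.
gen_a = cycle(10, inc(0))
gen_b = cycle(10, inc(0))
docs = [{"a": next(gen_a), "b": next(gen_b)} for _ in range(25)]
```

Document i gets the value i % 10 in both fields, so any two fields defined this way are perfectly correlated.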
I was modeling this after the original jstests, but I've realized those tests were meant to verify the incorrect plan was chosen (by checking explain output) rather than recreate a performance issue, so more documents are likely needed. I will need more time to run these queries in a local environment and play around to repro the performance difference. I'm going to re-request review once I've had time to look at this. Thank you!
No problem, we can continue the review once you are happy with the dataset scale.
Evergreen patch
These workloads are based on the repro tests for:
SERVER-20616 Plan ranker sampling from the beginning of a query's execution can result in poor plan selection
SERVER-21697 Plan ranking should take query and index keys into consideration for breaking ties
SERVER-12923 Plan ranking is bad for plans with blocking stages
SERVER-13211 Optimal index not chosen for query plan when many indexes match same prefix