
PERF-3199 Add workloads based on cost-based access path selection issues #718

Open · wants to merge 2 commits into master
Conversation

@jenniferpeshansky commented Aug 3, 2022

Evergreen patch

These workloads are based on the repro tests for:
SERVER-20616 Plan ranker sampling from the beginning of a query's execution can result in poor plan selection
SERVER-21697 Plan ranking should take query and index keys into consideration for breaking ties
SERVER-12923 Plan ranking is bad for plans with blocking stages
SERVER-13211 Optimal index not chosen for query plan when many indexes match same prefix

@jimoleary (Contributor) left a comment:

Some nits, suggestions and questions.

This workload was created to reproduce various plan selection issues.
First, it inserts 1000 documents with 2 uniformly distributed fields, and creates indexes on
both fields. Then it runs several pipelines, which will be slow due to incorrect plan selection.
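For illustration, the dataset this description sketches could be generated as below. This is a hypothetical model, not the workload file itself: the field names `a`/`b` and the value range of 1000 are my assumptions.

```python
# Sketch of the data shape described above: 1000 documents with two
# uniformly distributed integer fields, mirroring what the workload's
# insert phase would produce. Names and ranges are illustrative.
import random

random.seed(42)  # deterministic for illustration only

def make_docs(n=1000, value_range=1000):
    # Each field is drawn independently and uniformly, so neither
    # single-field index is inherently more selective than the other.
    return [
        {"a": random.randrange(value_range), "b": random.randrange(value_range)}
        for _ in range(n)
    ]

docs = make_docs()
```

With data like this, the planner's choice between the two indexes rests entirely on its sampling during multiplanning, which is what the linked SERVER tickets exercise.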

Contributor:

Can you add some relevant keywords? For example see here.

Author:

Done

SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
This workload was created to reproduce various plan selection issues.
Contributor:

Are there important metrics to look at for this workload?

Then it runs several pipelines, which will be slow due to incorrect plan selection.

Is 1,000 docs a large enough dataset to show any issues?

Won't the SelectiveIndex phase be fast?

@jenniferpeshansky (Author) commented Aug 8, 2022:

I believe we would be looking at throughput. The purpose of adding these workloads to Genny is to measure the overall impact of plan selection issues, and their resolution, on the user. All of these cases are known to choose the "worse" plan. But if we fix this plan selection issue, and our fix creates extra overhead that outweighs the time that would be saved by choosing the right plan, then it is not an improvement for the user. We would be alerted in this situation by a regression in one of these workloads, or even the lack of the expected perf improvement.

With regards to the number of documents - I was modeling this after the original jstests, but I've realized those tests were meant to verify the incorrect plan was chosen (by checking explain output) rather than recreate a performance issue, so more documents are likely needed. I will need more time to run these queries in a local environment and play around to repro the performance difference. I'm going to re-request review once I've had time to look at this. Thank you!

both fields. Then it runs several pipelines, which will be slow due to incorrect plan selection.

GlobalDefaults:
Database: &Database planselection
Contributor:

nit: camel case for database name, i.e. PlanSelection

Author:

Done

keys into consideration for breaking ties. First, it inserts 1000 documents with 4 uniformly
distributed fields, and creates several compound indexes. Then it runs a pipeline, which will be
slow due to incorrect plan selection.

Contributor:

Can you add keywords and significant metrics?

Is the small dataset size an issue?

slow due to incorrect plan selection.

GlobalDefaults:
Database: &Database residualpredcosting
Contributor:

nit: camel case

Author:

Done

- keys: {a: 1, b: 1, d: 1}
- keys: {a: 1, d: 1}
Document:
a: {^Cycle: {ofLength: 10, fromGenerator: {^Inc: {start: 0}}}}
Contributor:

question: The fields for a given document will contain the same value. Is this deliberate, and/or would a random generation approach be better?
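The correlation the reviewer is pointing at can be modeled in a few lines. This sketch assumes (my reading of the snippet, not confirmed by the source) that each field gets its own `^Cycle`-over-`^Inc` generator instance, advanced once per document:

```python
# Minimal model of a cycle-over-increment generator: yields
# start, start+1, ... reduced modulo of_length, forever.
def cycle_inc(of_length, start=0):
    i = start
    while True:
        yield i % of_length
        i += 1

gen_a = cycle_inc(10)
gen_b = cycle_inc(10)

# Because both fields use identical, independently advanced generators,
# every document gets a == b: the two fields are perfectly correlated.
docs = [{"a": next(gen_a), "b": next(gen_b)} for _ in range(25)]
```

If that model is right, a uniform random generator per field would break the correlation, which may or may not matter depending on which plan-selection pathology the workload intends to reproduce.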

@jenniferpeshansky (Author) commented Aug 8, 2022:

I was modeling this after the original jstests, but I've realized those tests were meant to verify the incorrect plan was chosen (by checking explain output) rather than recreate a performance issue, so more documents are likely needed. I will need more time to run these queries in a local environment and play around to repro the performance difference. I'm going to re-request review once I've had time to look at this. Thank you!

@jimoleary (Contributor) replied:

No problem, we can continue the review once you are happy with the dataset scale.
