Initial draft for Stitcher workload (SQLServer only). #361

anjagruenheid · 2023-09-22T13:21:54Z

Implementation of the Stitcher workload as described in "Stitcher: Learned Workload Synthesis from Historical Performance Footprints" (https://openproceedings.org/2023/conf/edbt/paper-19.pdf) by Chengcheng Wan, Yiwen Zhu, Joyce Cahoon, Wenjing Wang, Katherine Lin, Sean Liu, Raymond Truong, Neetu Singh, Alexandra M. Ciortea, Konstantinos Karanasos, and Subru Krishnan, published in EDBT 2023.

bpkroth · 2023-10-02T21:44:58Z

config/sqlserver/stitcher/2023-02-24-15-11-28-957898.xml

Would be nicer I think to move these into a subdirectory.
Even better would be to provide a config mechanism that would allow something like the following:

    <params>
      <include_workload_info>data/stitcher/workload_a.xml</include_workload_info>
    </params>

Now have data/stitcher/workload_a.xml contain something like the following:

    <workload_info>
      <transaction_types>
        ...
      </transaction_types>
      <works>
        <work>
          <comment>timestamp: 2023-02-24-15-11-28-957898, some other data</comment>
          <time>...</time>
          <rate>...</rate>
          <weights>...</weights>
          <terminals>...</terminals>
        </work>
        ...
      </works>
    </workload_info>

anjagruenheid · 2023-10-12

Hmm, I can look into that. It would definitely make it more generic for any other benchmarks that are generated in this manner.

anjagruenheid · 2023-10-13

Okay, thinking about it some more, I think we could generalize this as a new type of benchmark (multi-benchmark?) where the user specifies a set of configurations and an execution order as part of its XML definition. Each entry in the execution order is either a pointer to a configuration or a 'sleep' command. The tricky part is rewriting the DBWorkload.java class because you'd need a way to instantiate BenchBase in subthreads. I can give it a go if you're okay with that plan? The stitcher workload would then be one instantiation of such a multi-benchmark.
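The execution-order idea can be sketched outside BenchBase as a small driver script. This is only an illustration under stated assumptions: the plan format (`run <benchmark> <config>` or `sleep <seconds>`), the function names `run_plan` and `do_cmd`, and the `DRY_RUN` switch are all hypothetical and not part of BenchBase. Only the shape of the `java -jar benchbase.jar -b ... -c ... --create/--load/--execute` invocation comes from the README in this PR. A real multi-benchmark would live inside `DBWorkload.java` rather than in a shell wrapper, and this sketch runs steps strictly one after another.

```sh
#!/bin/sh
# Hypothetical plan format (an assumption for this sketch), one step per line:
#   run <benchmark> <config.xml>   execute one BenchBase configuration
#   sleep <seconds>                idle between phases
# Blank lines and lines starting with '#' are ignored.

# Set DRY_RUN=1 to print the commands instead of executing them.
DRY_RUN="${DRY_RUN:-0}"

# Run (or, in dry-run mode, print) a single command.
do_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        # Detach stdin so the command cannot consume the plan file.
        "$@" < /dev/null
    fi
}

# Execute every step of the plan file in order; stop at the first failure.
run_plan() {
    plan_file="$1"
    # The '|| [ -n ... ]' part keeps a last line without a trailing newline.
    while read -r action arg1 arg2 || [ -n "$action" ]; do
        case "$action" in
            ''|'#'*)
                continue
                ;;
            run)
                # Assumes the data was preloaded beforehand.
                do_cmd java -jar benchbase.jar -b "$arg1" -c "$arg2" \
                    --create=false --load=false --execute=true || return 1
                ;;
            sleep)
                do_cmd sleep "$arg1" || return 1
                ;;
            *)
                echo "unknown plan step: $action" >&2
                return 1
                ;;
        esac
    done < "$plan_file"
}
```

With a plan file such as `stitcher.plan` (hypothetical name), `DRY_RUN=1` followed by `run_plan stitcher.plan` prints the command sequence without touching the database, which makes it easy to review a generated order before running it.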

bpkroth · 2023-10-02T21:51:39Z

scripts/stitcher/README.md

+Prior to executing the shell scripts, the data needs to be preloaded with the following commands:
+
+```sh
+java -jar benchbase.jar -b tpcc -c config/$dbms/stitcher/2023-02-24-15-11-28-957898.xml --create=true --load=true --execute=false


This seems a little strange to need to preload a dated config.
Why not preload a particular DB with an explicit scale factor?

Basically, it seems to me that for stitcher, which at a high level tries to construct workload phases out of existing benchmarks, we need to:

1. Preload a number of known benchmarks at a particular scale factor for each.

2. Run a sequence of workloads corresponding to each of those benchmarks.

If we ignore for a moment the case where some workloads overlap in time (i.e., they aren't strictly sequential, where one workload phase ends before another begins), then it'd be nice to be able to list the full sequence of 1 and 2 inside the same config, or else provide a standard reusable script that can take a directory laid out in the appropriate format and replay all of these steps directly.

What's here right now is kind of a one-off and not super reusable other than as a template.
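A minimal sketch of such a reusable replay script, under stated assumptions: the directory layout (`preload/<label>.<benchmark>.xml` and `run/<order>.<benchmark>.xml`), the function names, and the `DRY_RUN` switch are hypothetical and not something this PR or BenchBase defines. Only the `--create`, `--load`, and `--execute` flags are taken from the README command above. It covers the strictly sequential case and does not handle overlapping workload phases.

```sh
#!/bin/sh
# Hypothetical directory layout (an assumption for this sketch):
#   <dir>/preload/<label>.<benchmark>.xml   e.g. sf16.tpcc.xml
#   <dir>/run/<order>.<benchmark>.xml       e.g. 001.tpcc.xml
# Files in run/ are replayed in the shell's sorted glob order.

# Set DRY_RUN=1 to print the commands instead of executing them.
DRY_RUN="${DRY_RUN:-0}"

# Derive the benchmark name from a config path:
# strip the directory and ".xml", then keep the last dot-separated part.
benchmark_of() {
    name="${1##*/}"
    name="${name%.xml}"
    echo "${name##*.}"
}

# Invoke BenchBase for one config file with the given extra flags.
run_benchbase() {
    cfg="$1"
    shift
    set -- java -jar benchbase.jar -b "$(benchmark_of "$cfg")" -c "$cfg" "$@"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: preload every config in preload/.
# Step 2: execute every config in run/, in order.
replay_dir() {
    dir="$1"
    for cfg in "$dir"/preload/*.xml; do
        [ -e "$cfg" ] || continue   # skip the literal pattern if nothing matches
        run_benchbase "$cfg" --create=true --load=true --execute=false || return 1
    done
    for cfg in "$dir"/run/*.xml; do
        [ -e "$cfg" ] || continue
        run_benchbase "$cfg" --create=false --load=false --execute=true || return 1
    done
}
```

Encoding the benchmark name in the file name keeps the script free of any per-workload configuration, so a new trace would only need a new directory.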

Also, is it possible for us to have instructions for generating the config sequence given a resource utilization trace?

anjagruenheid · 2023-10-12

I'm not sure how these were originally generated, but I don't think the date matters that much. My guess is that these were the configurations of experiments that they collected telemetry for and that were then (partially) picked to mimic this one specific workload. If you look at the configurations, they have different SFs but also different query weight distributions. You would preload 4 datasets (TPC-H, TPC-C SF 16 and 160, and YCSB) and then execute these different configurations on top of those preloaded instances. This benchmark is a static snapshot of a real-world benchmark that we can publish (i.e., add to an open-source repo) because it was already used in published work. AFAIK, there is some discussion as to whether they want to open-source the Stitcher code, but no resolution yet.

scripts/stitcher/stitcher_sqlserver.sh

Co-authored-by: Brian Kroth <[email protected]>

anjagruenheid added 2 commits September 22, 2023 15:19

Initial draft for Stitcher workload (SQLServer only).

528fbb6

Compressing script a bit.

e25bc02

bpkroth reviewed Oct 2, 2023

scripts/stitcher/stitcher_sqlserver.sh (resolved)

anjagruenheid and others added 3 commits October 12, 2023 13:40

Update scripts/stitcher/stitcher_sqlserver.sh

d4a453a

Co-authored-by: Brian Kroth <[email protected]>

Merge branch 'main' into stitcher_workload

0e29b53

Merge branch 'main' into stitcher_workload

30ffdb8
