
[Feature] Add condensed plot comparing performance of different functions to summary report #4

Th3Whit3Wolf opened this issue May 19, 2019 · 7 comments

@Th3Whit3Wolf

Feature Enhancement

I was wondering if you would be interested in presenting results as geometric means and charting them similar to how the Benchmarks Game does their programming language benchmarks?

@bheisler
Owner

Hey, thanks for trying Criterion.rs, and thanks for the suggestion.

Could you maybe explain why you think this would be useful? I'm not very familiar with geometric means or the benchmarks game.

@Th3Whit3Wolf
Author

This will probably be more informative and credible than me.

@bheisler
Owner

Thanks for the link, but I'm afraid I still don't follow. The paper recommends using geometric mean as an alternative to the arithmetic mean when summarizing normalized performance ratios. This doesn't appear to be relevant to Criterion.rs, as Criterion.rs doesn't normalize the measurements. I can see why it makes sense for the benchmarks game, which does attempt to compare the performances of many implementations of the same problem by displaying normalized ratios.
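(To make the paper's point concrete, here is a small illustration, not Criterion.rs code: when the quantities being summarized are normalized ratios, the arithmetic mean can report a spurious slowdown where the geometric mean does not.)

```rust
// Sketch: why the geometric mean suits normalized performance ratios.
// Suppose implementation B is 2x faster on one benchmark (ratio 0.5) and
// 2x slower on another (ratio 2.0). The arithmetic mean of the ratios
// suggests B is ~25% slower overall; the geometric mean correctly reports
// no net difference.

fn geometric_mean(ratios: &[f64]) -> f64 {
    // Compute in log space to avoid overflow and for numerical stability.
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    (log_sum / ratios.len() as f64).exp()
}

fn main() {
    let ratios = [0.5, 2.0];
    let arith = ratios.iter().sum::<f64>() / ratios.len() as f64;
    println!("arithmetic mean: {arith}");                    // ~1.25
    println!("geometric mean: {}", geometric_mean(&ratios)); // ~1.0
}
```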

@Th3Whit3Wolf
Author

Would you be open to having a method of doing that through Criterion? Some people use Criterion as a benchmarking tool to compare multiple frameworks, or multiple approaches to multiple problems across multiple situations. The user could obviously write their own program to do this, but Criterion already creates such nice violin graphs and handles the data.

If Criterion could do this for [Rust Benchmarks](https://github.com/Th3Whit3Wolf/template-benchmarks-rs), I feel like it could be more popular, and perhaps we could make a Rust Benchmarks organization that could test multiple libraries like:

  • GUI Toolkits
  • Web Assembly Frameworks
  • Game Engines
  • Markup Frameworks
  • etc.

Then collect all the data so that users could choose which libraries to use based on their performance relative to the user's use case, grounded in real data.

@bheisler
Owner

Ah, I see what you're getting at now. I'll try to summarize for my future self, so I don't forget. When benchmarking just one input over multiple functions, or one function with multiple inputs, it's easy to compare the results in a violin plot. However, when benchmarking multiple functions with multiple inputs, it's no longer easy to see at a glance how the functions behave relative to each other, and the violin plots can be extremely large. It would be helpful to have a condensed plot showing the relative performance of the different functions, and a boxplot showing how many times slower each function is than the fastest one is a good way to do that. However, taking the arithmetic mean of those multipliers introduces statistical bias, so it's better to use the geometric mean.

That does sound like a good addition to the summary reports, yes. This probably won't be top priority for me though, so it might take a while to add it.
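For reference, the condensed summary described above could be computed roughly like this. This is only a sketch, not Criterion.rs internals; the function name and the data layout (a map from function name to per-input mean times) are made up for illustration:

```rust
// Hypothetical sketch of the proposed summary (not Criterion.rs code).
// For each input, normalize every function's mean time against the fastest
// function on that input, then condense each function's ratios across inputs
// with the geometric mean (an arithmetic mean would bias the summary).

use std::collections::BTreeMap;

/// `times[function]` holds mean times (e.g. in ns), one entry per input,
/// with all functions measured on the same inputs in the same order.
/// Returns, per function, the geometric mean of "times slower than the
/// fastest function" across inputs.
fn relative_summary<'a>(times: &BTreeMap<&'a str, Vec<f64>>) -> BTreeMap<&'a str, f64> {
    let n_inputs = times.values().next().map_or(0, |v| v.len());
    // Fastest time observed for each input, across all functions.
    let best: Vec<f64> = (0..n_inputs)
        .map(|i| times.values().map(|v| v[i]).fold(f64::INFINITY, f64::min))
        .collect();
    times
        .iter()
        .map(|(&name, v)| {
            // Geometric mean of the per-input slowdown ratios, in log space.
            let log_sum: f64 = v.iter().zip(&best).map(|(t, b)| (t / b).ln()).sum();
            (name, (log_sum / n_inputs as f64).exp())
        })
        .collect()
}
```

The fastest function gets a summary value of 1.0; every other function gets its condensed slowdown multiplier, which is exactly what the proposed boxplot would display.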

@bheisler bheisler changed the title [Feature] Geometric Mean [Feature] Add condensed plot comparing performance of different functions to summary report Jun 14, 2019
@Th3Whit3Wolf
Author

That's awesome and thank you! I can't wait!

@bheisler bheisler transferred this issue from bheisler/criterion.rs Jun 28, 2020
@jpgoldberg

I came here to make the same feature request, but a bit more specifically.

When comparing two functions, such as fibonacci_slow() and fibonacci_fast(), it would be nice to actually run a t-test between them. The violin plot is great, and in clear cases like the Fibonacci examples one certainly does not need to do statistics.

So far, I have been able to avoid digging into the data and running my own t-tests, as the things I have been testing don't end up with overlapping confidence intervals, but I anticipate cases where I would find the t-test useful.

I don't have any useful advice on how to make inferences when there are more than two things in a group to compare. There was a time in my life when I had a basic understanding of Tukey's Honestly Significant Difference test, but that was a long time ago.
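For what it's worth, the two-sample comparison described above can be sketched without any dependencies. This computes Welch's t statistic and degrees of freedom (Welch's variant is the usual choice here since the two samples need not share a variance); turning those into a p-value requires a Student-t CDF, e.g. from a crate such as `statrs`, so only the statistic is shown:

```rust
// Sketch of Welch's t-test on two benchmark timing samples (not part of
// Criterion.rs). Helper names are made up for illustration.

/// Sample mean and unbiased sample variance.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Returns (t statistic, Welch-Satterthwaite degrees of freedom).
fn welch_t(a: &[f64], b: &[f64]) -> (f64, f64) {
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let (na, nb) = (a.len() as f64, b.len() as f64);
    // Squared standard error of the difference of means.
    let se2 = va / na + vb / nb;
    let t = (ma - mb) / se2.sqrt();
    // Welch-Satterthwaite approximation for the degrees of freedom.
    let df = se2.powi(2)
        / ((va / na).powi(2) / (na - 1.0) + (vb / nb).powi(2) / (nb - 1.0));
    (t, df)
}
```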
