
Benchmark reliability of torchbenchmarks #2527

Open
jerryzh168 opened this issue Oct 28, 2024 · 4 comments

@jerryzh168
Contributor

Recently I found that for the same model, the native benchmark code in torchbenchmarks does not give the expected time: one run is consistently slower than another, sometimes by up to 20%. I'm relying on torchao.utils.benchmark_model for now. Please help take a look to see what the problem might be.

For details please see: #2519

@seemethere
Member

This seems like an issue with the model code. Our expectation is that repo owners own the model code while our team owns the infrastructure.

@kit1980
Member

kit1980 commented Oct 31, 2024

I think the time variability from run to run is expected when running on a devgpu.
TorchBench servers have some special settings to reduce the variability.
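The "special settings" in question are typically GPU clock locking to reduce thermal and power-management jitter. The exact TorchBench server configuration is not shown in this thread; the commands below are an illustrative sketch only, and the clock values are placeholders that must match your device:

```shell
# Illustrative only: pin GPU clocks to reduce run-to-run timing variance.
# Requires root; valid values come from `nvidia-smi -q -d SUPPORTED_CLOCKS`.
sudo nvidia-smi -pm 1                      # enable persistence mode
sudo nvidia-smi --lock-gpu-clocks=1410     # lock SM clock (MHz, device-specific)
sudo nvidia-smi --lock-memory-clocks=1215  # lock memory clock (MHz, device-specific)
```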

@seemethere
Member

> I think the time variability from run to run is expected when running on a devgpu. TorchBench servers have some special settings to reduce the variability.

Oh so is this more of an infrastructure thing?

@jerryzh168
Copy link
Contributor Author

jerryzh168 commented Oct 31, 2024

I suspect this is related to the benchmarking code, since with the exact same setup, machine, etc., torchao.utils.benchmark_model gives stable results.
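For context, the usual way a benchmark harness damps run-to-run variance is to warm up first and then report the median of many timed iterations. The sketch below is a generic, pure-Python illustration of that technique, not the torchao.utils.benchmark_model implementation; for CUDA workloads you would additionally need to synchronize the device (e.g. torch.cuda.synchronize()) around each timed call, since kernel launches are asynchronous:

```python
import time
import statistics

def benchmark(fn, warmup=5, iters=20):
    """Generic timing loop: warm up first (JIT, caches, allocator),
    then report the median of many runs to damp variability."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

# Example: time a small pure-Python workload.
med = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median: {med * 1e6:.1f} us")
```

The median is preferred over the mean here because a single preempted or thermally throttled iteration would otherwise skew the result.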
