Performance compared to ForwardDiff #121
Hi, thank you for the report :) It's a bit of a hectic time, so I just wanted to let you know that it may be a few weeks before I can deeply examine your case and implement the performance optimizations described below. Briefly: the slowdown is very likely the fault of the mutable state used in implementing the pruning backend; to see this, try running your example with […]. If your real, general problem does have discrete perturbations in all your triples, this wouldn't solve that case. However, I've been meaning to revisit the way these "mutable" states work anyway, and thus performance-optimize the general case too :) But in the current design, the 10x slowdown over ForwardDiff is indeed to be expected :/
Thank you for your quick and detailed answer. My current use case looks like:
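(The original snippet is not shown above; what follows is a hypothetical sketch of the two-step structure being described, with placeholder names, sizes, and distributions.)

```julia
using StochasticAD, Distributions

function model(p)
    # 1. discrete randomness depending on the parameter p
    n = rand(Binomial(100, p))
    # 2. expensive but fully deterministic part: elementwise work on a large
    #    vector; x still carries the discrete perturbation coming from n
    x = n .* ones(10_000) ./ 100
    return sum(exp.(x))
end

derivative_estimate(model, 0.3)  # unbiased derivative estimate with respect to p
```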
I guess that even though the expensive part does not have randomness, x will still have a discrete perturbation component that will need to be combined in step 2, so it would not work... I'll wait patiently for the update then :)
Ah, you could try registering your deterministic model as a single StochasticAD primitive via https://gaurav-arya.github.io/StochasticAD.jl/dev/devdocs.html#via-StochasticAD.propagate and see if that yields any speedup 🙂
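For illustration, a rough sketch of what that might look like (`expensive_part` is a hypothetical stand-in for the deterministic model; see the linked devdocs for the exact semantics and keyword arguments of `propagate`):

```julia
using StochasticAD

# hypothetical deterministic inner model acting on a vector
expensive_part(x) = cumsum(x) .^ 2

# Wrap the whole deterministic function as a single primitive, so StochasticAD
# propagates the input perturbations through it in one call instead of tracing
# every scalar operation inside it with stochastic triples.
wrapped(x) = StochasticAD.propagate(expensive_part, x)
```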
Thanks for pointing me to propagate, I did not know about it. It did actually cause a speed-up for the expensive part of the model, but I realized that the bottleneck is caused by other simple operations between large vectors of triples, like the one in the original post. On another topic, and perhaps I should open a new issue for this: how difficult would it be to implement GPU support for stochastic triples?
A new issue for that would definitely be appropriate! I don't know much about GPUs, but my guess is that it would be important to write rules for vector operations (e.g. using […]).
OK, I may have a go at this and open an issue once I've made a bit of progress.
It would be fast if the scalar function […]
In this small example:
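(The original snippet is not included above; the following is a minimal sketch of the kind of comparison being described, with a hypothetical scalar-parameter function `f` and vector size.)

```julia
using StochasticAD, ForwardDiff, BenchmarkTools

const xs = randn(10_000)

# purely deterministic: simple elementwise operations over a large vector
f(p) = sum(exp.(p .* xs))

@btime ForwardDiff.derivative(f, 0.5)   # dual numbers
@btime derivative_estimate(f, 0.5)      # stochastic triples; no discrete randomness here
```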
One can see a ~10x difference in performance between ForwardDiff and StochasticAD. I am currently using StochasticAD for big models and it is causing a bit of a bottleneck. I would expect both packages to have similar performance in this case, since there is no discrete stochasticity involved.
Is there a way to reduce the number of allocations?
Any help would be appreciated!