
Performance compared to ForwardDiff #121

Open
arnauqb opened this issue Apr 12, 2024 · 7 comments

@arnauqb

arnauqb commented Apr 12, 2024

In this small example:

using ForwardDiff, StochasticAD, BenchmarkTools

function test(x, y, alpha)
    x = x * alpha
    return sum(x .* y)
end

x = rand(10000);
y = rand(10000);
alpha = 2.0;
@btime ForwardDiff.derivative(alpha -> test(x, y, alpha), alpha); # 44.045 μs (7 allocations: 312.67 KiB)
@btime StochasticAD.derivative_estimate(alpha -> test(x, y, alpha), alpha); # 727.524 μs (40028 allocations: 1.83 MiB)

This shows a roughly 16x performance gap between ForwardDiff and StochasticAD. I am currently using StochasticAD for big models and it is causing a bit of a bottleneck. I would expect the two to have similar performance in this case, since there is no discrete stochasticity involved.

Is there a way to reduce the number of allocations?

Any help would be appreciated!

@gaurav-arya
Owner

Hi, thank you for the report :) It's a bit of a hectic time, so I just wanted to let you know that it may be a few weeks before I can deeply examine your case and implement the performance optimizations described below.

Briefly: the slowdown is very likely due to the mutable state used to implement the pruning backend. To see this, try running your example with backend = SmoothedFIsBackend(). For the particular case you've posted, I could very likely optimize it by avoiding the creation of a mutable state when there are no discrete perturbations, as a special case.
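
For reference, a minimal sketch of what switching backends looks like, continuing the snippet from the original post (this assumes the backend keyword of derivative_estimate as described in the StochasticAD docs):

# Same benchmark as in the original post, but with the smoothing backend
# instead of the default pruning backend (no mutable pruning state):
@btime StochasticAD.derivative_estimate(alpha -> test(x, y, alpha), alpha;
    backend = StochasticAD.SmoothedFIsBackend());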

If your real, general problem does have discrete perturbations in all your triples, that special case wouldn't help -- however, I've been meaning to revisit the way these "mutable" states work anyway, which would optimize the general case too :) But in the current design, a slowdown of this magnitude over ForwardDiff is indeed to be expected :/

@arnauqb
Author

arnauqb commented Apr 15, 2024

Thank you for your quick and detailed answer. My current use case looks like this:

  1. Sample x from a vector of Bernoullis
  2. Run x through an expensive but deterministic model.

I guess that even though the expensive part has no randomness of its own, x will still carry a discrete perturbation component that needs to be propagated through step 2, so the special case would not apply...
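
For concreteness, a minimal sketch of that pattern, where expensive_model is a hypothetical stand-in for the deterministic step:

using StochasticAD, Distributions

expensive_model(x) = sum(abs2, x)  # stand-in for the expensive deterministic model
probs = fill(0.3, 1000)            # per-component Bernoulli probabilities

function model(theta)
    x = [rand(Bernoulli(theta * p)) for p in probs]  # step 1: discrete sampling
    return expensive_model(x)                        # step 2: deterministic
end

StochasticAD.derivative_estimate(model, 1.0)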

I'll wait patiently for the update then :)

@gaurav-arya
Owner

gaurav-arya commented Apr 15, 2024

Ah, you could try registering your deterministic model as a single StochasticAD primitive via https://gaurav-arya.github.io/StochasticAD.jl/dev/devdocs.html#via-StochasticAD.propagate and see if that yields any speedup 🙂
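
A rough sketch of what that registration might look like (the dispatch signature here is an assumption -- in particular, whether propagate accepts array arguments directly should be checked against the linked devdocs; expensive_model is a hypothetical name):

import StochasticAD

# Dispatch for triple-valued inputs: treat expensive_model as one primitive,
# so StochasticAD reruns it on perturbed inputs instead of propagating
# triples element-by-element through its internals.
function expensive_model(x::AbstractVector{<:StochasticAD.StochasticTriple})
    return StochasticAD.propagate(expensive_model, x)
end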

@arnauqb
Author

arnauqb commented Apr 15, 2024

Thanks for pointing me to propagate, I did not know about it. It actually did cause a speed-up for the expensive part of the model, but I realized that the bottleneck comes from other simple operations between large vectors of triples, like the one in the original post.

On another topic, and perhaps I should open a new issue for this, how difficult would it be to implement GPU support for stochastic triples?

@gaurav-arya
Owner

A new issue for that would definitely be appropriate! I don't know much about GPUs, but my guess is that it would be important to write rules for vector operations (e.g. using StochasticAD.propagate) rather than relying only on scalar code, as StochasticAD currently does, since scalar code would be slow on a GPU? But perhaps I'm wrong about that... In particular, I wonder whether something like map on a GPU array with a scalar f would be slow or fast. I imagine ForwardDiff's Dual numbers present a similar problem -- I wonder whether they currently play well with GPUs?

@arnauqb
Author

arnauqb commented Apr 16, 2024

OK, I may have a go at this and open an issue once I've made a bit of progress.

@Moelf
Collaborator

Moelf commented Apr 16, 2024

In particular, I wonder whether or not something like map on a GPU array using a scalar f would be slow or fast.

It would be fast if the scalar function f is written with only "simple" operations.
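
For illustration, a minimal sketch of that case (assumes CUDA.jl and a compatible GPU; not specific to StochasticAD):

using CUDA

f(x) = sin(x)^2 + 1f0   # a "simple" scalar function: no allocation,
                        # no dynamic dispatch, so it compiles to a kernel
xs = CUDA.rand(10_000)
ys = map(f, xs)         # executes as a single GPU kernel over the array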
