This repository has been archived by the owner on May 21, 2022. It is now read-only.

Adam optimizer #13

Open
CorySimon opened this issue Mar 21, 2017 · 1 comment

Comments

@CorySimon

This package is really useful as a collection of learning-rate updaters; I'm using a variant of the Adam scheme from here for SGD.

I think it is unnecessary to store the ρᵢᵗ terms as vectors. Shouldn't these be Float64s?
Also, a pedantic point: I'm not sure why they are called ρ instead of β, as in the paper.
https://github.com/JuliaML/StochasticOptimization.jl/blob/master/src/paramupdaters.jl#L123-L124
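
To illustrate what I mean (just a sketch, not the package's actual code; AdamState and update! are made-up names), the bias-correction factors can be kept as plain Float64 fields while only m and v are per-parameter vectors:

# Sketch of an Adam updater with scalar bias-correction accumulators.
mutable struct AdamState
    α::Float64          # step size
    β₁::Float64         # first-moment decay
    β₂::Float64         # second-moment decay
    ϵ::Float64
    β₁ᵗ::Float64        # running product β₁^t -- a plain scalar
    β₂ᵗ::Float64        # running product β₂^t -- a plain scalar
    m::Vector{Float64}  # per-parameter first-moment estimates
    v::Vector{Float64}  # per-parameter second-moment estimates
end

AdamState(n::Int; α=0.001, β₁=0.9, β₂=0.999, ϵ=1e-8) =
    AdamState(α, β₁, β₂, ϵ, 1.0, 1.0, zeros(n), zeros(n))

function update!(s::AdamState, θ::Vector{Float64}, ∇::Vector{Float64})
    s.β₁ᵗ *= s.β₁
    s.β₂ᵗ *= s.β₂
    for i in eachindex(θ)
        s.m[i] = s.β₁ * s.m[i] + (1 - s.β₁) * ∇[i]     # first moment
        s.v[i] = s.β₂ * s.v[i] + (1 - s.β₂) * ∇[i]^2   # second moment
        m̂ = s.m[i] / (1 - s.β₁ᵗ)                       # bias correction
        v̂ = s.v[i] / (1 - s.β₂ᵗ)
        θ[i] -= s.α * m̂ / (sqrt(v̂) + s.ϵ)
    end
    return θ
end

Usage would be something like s = AdamState(length(θ)) followed by update!(s, θ, ∇) each iteration.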

@CorySimon
Author

Also, comparing with the paper (https://arxiv.org/pdf/1412.6980.pdf), the θ update in the code does not match the Adam update there.
Shouldn't it be:

θ[i] -= α * m[i] / (1.0 - β₁ᵗ) * sqrt(1.0 - β₂ᵗ) / (sqrt(v[i]) + ϵ * sqrt(1.0 - β₂ᵗ))
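
For what it's worth, here is a quick standalone check (illustrative numbers, not code from the package) that this rearranged form is algebraically the same as the paper's update θ ← θ − α·m̂/(√v̂ + ϵ), with m̂ = m/(1 − β₁ᵗ) and v̂ = v/(1 − β₂ᵗ):

# Compare the paper's bias-corrected step with the rearranged one-line form.
α, ϵ = 0.001, 1e-8
β₁ᵗ, β₂ᵗ = 0.9^3, 0.999^3    # e.g. after t = 3 steps
m, v = 0.05, 0.002           # example moment estimates for one coordinate

paper_form = α * (m / (1 - β₁ᵗ)) / (sqrt(v / (1 - β₂ᵗ)) + ϵ)
rearranged = α * m / (1.0 - β₁ᵗ) * sqrt(1.0 - β₂ᵗ) / (sqrt(v) + ϵ * sqrt(1.0 - β₂ᵗ))

@assert isapprox(paper_form, rearranged; rtol = 1e-12)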

Please confirm that I am correct, and I will make a pull request. Thanks.
