I'd be pleased to add this vector-vs-covector content if you see fit.
It's not just a point of theoretical pedantry: once students understand it, things like the coordinate dependence of gradient descent (put another way, the fact that the learning rate is an inverse Riemannian metric rather than a number) become manifest. From there, implicit regularization near a minimum becomes easy to analyze just from dimensional analysis! Thus we come to appreciate once again how an inner product (which says how "alike" its two inputs are) controls generalization behavior, just as with kernel SVMs, L2-regularized underdetermined linear regression, and more.
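To make the coordinate-dependence point concrete, here is a minimal numerical sketch (numpy; the quadratic loss, the matrix `A`, and all constants are my own toy choices, not from any existing text): plain gradient descent run in reparameterized coordinates x = A z and then mapped back to x is exactly gradient descent in x with the scalar step replaced by the matrix eta * A A^T, while plain gradient descent in x follows a different trajectory.

```python
import numpy as np

# Toy quadratic loss L(x) = 0.5 * x^T H x (illustrative choice, not from the text).
H = np.array([[3.0, 1.0], [1.0, 2.0]])     # positive-definite Hessian
A = np.array([[2.0, 0.0], [0.5, 1.0]])     # invertible reparameterization x = A z
eta, steps = 0.1, 10
x0 = np.array([1.0, -1.0])

def grad_x(x):
    return H @ x                           # gradient of L with respect to x

# (1) Plain gradient descent in x-coordinates.
x = x0.copy()
for _ in range(steps):
    x = x - eta * grad_x(x)

# (2) Plain gradient descent in z-coordinates (x = A z), mapped back to x.
#     Chain rule: grad_z L(A z) = A^T grad_x L, evaluated at x = A z.
z = np.linalg.solve(A, x0)
for _ in range(steps):
    z = z - eta * A.T @ grad_x(A @ z)
x_from_z = A @ z

# (3) Gradient descent in x with the scalar step replaced by the matrix
#     eta * A A^T, i.e. the "learning rate" acting as an inverse metric.
x_metric = x0.copy()
for _ in range(steps):
    x_metric = x_metric - eta * (A @ A.T) @ grad_x(x_metric)

print("plain GD in x:             ", x)
print("GD in z, mapped back to x: ", x_from_z)
print("GD in x with metric step:  ", x_metric)
# (2) and (3) agree to machine precision; (1) does not, so "the learning rate"
# is really eta times an inverse metric, not a coordinate-free number.
```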
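The same picture gives the implicit-regularization remark something checkable. Below is another hedged sketch (again with synthetic data and names of my own invention): on an underdetermined least-squares problem, gradient descent from zero with matrix step eta * P converges to the interpolating solution of minimum norm in the metric M = P^{-1}, so swapping the inner product swaps which zero-training-error solution you end up with.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 20                                    # fewer examples than parameters
X = rng.standard_normal((n, d))                 # synthetic design matrix (assumption)
y = rng.standard_normal(n)

def preconditioned_gd(P, steps=10000, eta=5e-3):
    """GD on 0.5*||Xw - y||^2 with the scalar step replaced by eta * P."""
    w = np.zeros(d)                             # the zero init is what gives the implicit bias
    for _ in range(steps):
        w = w - eta * P @ (X.T @ (X @ w - y))
    return w

def min_norm_solution(M):
    """argmin_w w^T M w subject to X w = y, via Lagrange multipliers."""
    Minv = np.linalg.inv(M)
    return Minv @ X.T @ np.linalg.solve(X @ Minv @ X.T, y)

M_euclid = np.eye(d)                            # ordinary Euclidean inner product
M_other = np.diag(rng.uniform(0.5, 2.0, d))     # a different (diagonal) metric

for M in (M_euclid, M_other):
    w_gd = preconditioned_gd(np.linalg.inv(M))
    w_cf = min_norm_solution(M)
    print("train residual:", np.linalg.norm(X @ w_gd - y),
          "| gap to min-M-norm solution:", np.linalg.norm(w_gd - w_cf))
# Both runs fit the data exactly, but the two limits differ: which interpolating
# solution gradient descent "implicitly regularizes" toward is set by the metric.
```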