Reimplement dynamic_propagation with meta tensors #204
Comments
The tricky part of using meta tensors will be:
@eellison any interest in this one?
There's some work on various parts of this (might be a FB-only link): https://docs.google.com/document/d/1W1eWV5F4UEEkeVOIRUNwt68Pb_1kg_mwodBMB4Pzjac/edit?usp=sharing
This is pretty annoying haha - we can resolve this with something called "fake tensors". Perhaps we could make a tensor subclass for this - I used to have a meta tensor tracing mode. I think the right solution should probably leverage decompositions significantly. cc: @ezyang
We are going to do fake tensors.
For generating random inputs, this will interact poorly with indexing ops and data-dependent ops (like nonzero), so we just need full coverage there. The PoR is also to make it possible to add meta coverage in Python, so it's easy to drop in a prop function as necessary.
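As a rough illustration of what "meta coverage in Python" could look like, here is a minimal sketch using the torch.library API to register a Python shape/dtype propagation function; the `mylib::double_rows` op, its schema, and its namespace are made up for illustration, and this is not necessarily the mechanism the PoR refers to.

```python
import torch

# Hypothetical op "mylib::double_rows", used only to show what a Python-side
# meta (shape/dtype) propagation function could look like.
lib = torch.library.Library("mylib", "DEF")
lib.define("double_rows(Tensor x) -> Tensor")

def double_rows_meta(x):
    # Only metadata is computed here; no real storage is touched.
    return x.new_empty((2 * x.shape[0], *x.shape[1:]))

lib.impl("double_rows", double_rows_meta, "Meta")

x = torch.empty(3, 4, device="meta")
out = torch.ops.mylib.double_rows(x)
print(out.shape, out.device)  # torch.Size([6, 4]) meta
```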
Awesome, fake tensors sounds perfect! |
Happy to take this on, but there's actually kind of an annoying number of operators that might not trivially preserve the input's device, e.g. any conversions, plus all of the annoying behavior w.r.t. 0-dim tensors. @ezyang, do we have any timeline on fake tensors in core? Depending on the timeline, it might make more sense to wait for their availability. Edit: actually, just special-casing the conversion operators, and re-materializing and running for real in the case where the inputs are on different devices, should be fine.
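For context (an illustrative example, not from the thread): 0-dim CPU tensors are allowed to mix with tensors on other devices, which is one reason "result device == input device" is not a safe rule once every input has been moved to the meta device.

```python
import torch

# A 0-dim CPU tensor may be combined with a CUDA tensor; the result follows
# the non-scalar operand's device. Once both operands are converted to meta
# tensors, that original device information is lost.
scalar = torch.tensor(2.0)            # 0-dim tensor on cpu
if torch.cuda.is_available():
    t = torch.randn(3, device="cuda")
    out = t * scalar                  # allowed; result lives on cuda
    print(out.device)                 # cuda:0
```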
eellison signed up to make fake tensors in core happen |
The basic functionality works; we just need to fix all the bugs.
One obvious source of memory overhead from TorchDynamo is config.dynamic_propagation=True. With this mode, TorchDynamo creates an example_value copy of every tensor and nn.Module in order to have accurate Python type/dtype/device/shape information. This could easily double memory usage in the worst case. This approach is nice in that it is highly accurate and trivial to implement; however, it is very wasteful in the memory department.
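A rough illustration of the difference (not TorchDynamo code): a real example_value copy duplicates the tensor's storage, while a meta copy keeps only the metadata.

```python
import torch

x = torch.randn(4096, 4096)                     # ~64 MB of real data
real_copy = x.clone()                           # another ~64 MB, kept only for metadata
meta_copy = torch.empty_like(x, device="meta")  # same shape/dtype, no storage allocated

print(real_copy.shape, real_copy.dtype)                    # torch.Size([4096, 4096]) torch.float32
print(meta_copy.shape, meta_copy.dtype, meta_copy.device)  # torch.Size([4096, 4096]) torch.float32 meta
```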
We should rewrite dynamic_propagation to use meta tensors (and fall back to real tensors for ops where meta tensors aren't implemented). It is very possible there are other sources of memory overhead as well; I think @anijain2305 is looking into one.
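A minimal sketch of that idea (assuming standard meta-device behavior, not TorchDynamo's actual implementation): run the op on meta copies of the inputs to recover output metadata, and fall back to the real tensors when no meta kernel exists.

```python
import torch

def propagate_example_value(op, *args):
    """Run `op` on meta copies of the inputs; fall back to real tensors
    if the op has no meta implementation. Illustrative sketch only."""
    meta_args = [a.to("meta") if isinstance(a, torch.Tensor) else a for a in args]
    try:
        # Only shape/dtype inference happens here; no real storage is allocated.
        return op(*meta_args)
    except NotImplementedError:
        # No meta kernel for this op: run it on the real tensors instead.
        return op(*args)

x = torch.randn(1024, 1024)
out = propagate_example_value(torch.mm, x, x)
print(out.shape, out.dtype, out.device)  # torch.Size([1024, 1024]) torch.float32 meta
```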
Most things should work if you disable dynamic_propagation. The exceptions are that it enables constant inlining of tensor properties (dtype/device/ndim/shape/contiguous/layout/etc.) and handling of ops that return lists/tuples/etc.
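An illustrative (hypothetical) example of the kind of user code those exceptions cover: branching on tensor properties, and ops that return multiple tensors.

```python
import torch

def f(x):
    # Branching on tensor properties: the condition can only be inlined as a
    # constant if accurate dtype/ndim metadata is available at trace time.
    if x.dtype == torch.float16 and x.ndim == 4:
        x = x.float()
    # An op returning a tuple: per-output metadata is needed to keep tracing.
    values, indices = x.max(dim=-1)
    return values + 1, indices
```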
Originally posted by @jansel in pytorch/pytorch#93751