Remove reference cycle #223

Merged
merged 1 commit into from
May 10, 2022


anijain2305
Contributor

For the following test, Dynamo was holding on to more memory than necessary. The memory was released only after calling gc.collect(), which indicated that there was a reference cycle somewhere.


import refcycle
import torch
import torchdynamo


def print_mem(name):
    print(name, torch.cuda.memory_allocated() / 10**9, "GB")


def fn1():
    x = torch.randn(1024, 1024, 1024, device="cuda")
    print_mem("Inside func")
    return x.sum()


fn1()
print_mem("Outside func")
print("------ Eager Done --------")
print("\n\n\n")

# torchdynamo.config.debug = True
# torchdynamo.config.trace = True

with torchdynamo.optimize("eager"):
    fn1()
    print_mem("Outside func")
print("------ TorchDynamo Done --------")
print("\n\n\n")


graph = refcycle.garbage()
sccs = graph.strongly_connected_components()

refcycles = graph.source_components()
for idx, r in enumerate(refcycles):
    r.export_image(f"cycle{idx}.svg")

print(refcycles)

I used the refcycle package to find the reference cycles; the culprit is shown below.

[image: reference cycle graph exported by refcycle]
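
The symptom described above (memory that is released only after gc.collect()) can be reproduced without a GPU. The following is a minimal sketch using only the standard library, assuming CPython's reference-counting behavior; the Node class and the helper functions are hypothetical and are not part of TorchDynamo.

```python
import gc
import weakref


class Node:
    """Hypothetical object standing in for a tracer or a graph."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_no_cycle():
    a = Node("a")
    b = Node("b")
    a.other = b              # a -> b only, no cycle
    return weakref.ref(a)    # weak reference does not keep `a` alive


def make_cycle():
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a              # a -> b -> a: a reference cycle
    return weakref.ref(a)


gc.collect()                 # start from a clean state
gc.disable()                 # keep the automatic collector out of the demo

acyclic_ref = make_no_cycle()
cyclic_ref = make_cycle()

# Reference counting alone frees the acyclic objects on function return.
freed_without_gc = acyclic_ref() is None
# The cyclic objects survive until the cycle collector runs.
alive_before_gc = cyclic_ref() is not None
unreachable = gc.collect()   # number of unreachable objects found
alive_after_gc = cyclic_ref() is not None

gc.enable()
print(freed_without_gc, alive_before_gc, alive_after_gc)
```

If an object is freed only after an explicit gc.collect(), as cyclic_ref is here, something reachable from it must point back to it.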

This PR does the following:

  • Breaks the cycle by setting output_graph.root_tx = None once tracing is done.
  • Cleans up graphargs, the data structure that was holding on to the tensor.
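
The two steps above can be sketched on a toy model. This is an illustration only: the Tracer, OutputGraph, and Payload classes and the cleanup() method are hypothetical stand-ins, and only the attribute names root_tx and graphargs come from this PR. The real TorchDynamo classes are more involved.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large CUDA tensor."""


class OutputGraph:
    """Hypothetical stand-in; not TorchDynamo's real class."""

    def __init__(self, root_tx):
        self.root_tx = root_tx   # back-reference to the tracer
        self.graphargs = []      # holds on to input tensors

    def cleanup(self):
        self.root_tx = None      # break the tracer <-> graph cycle
        self.graphargs.clear()   # drop references to the tensors


class Tracer:
    """Hypothetical tracer that owns an OutputGraph."""

    def __init__(self):
        self.output = OutputGraph(self)  # tracer -> graph -> tracer


def trace(do_cleanup):
    tx = Tracer()
    payload = Payload()
    tx.output.graphargs.append(payload)
    ref = weakref.ref(payload)   # lets us observe when payload is freed
    if do_cleanup:
        tx.output.cleanup()
    return ref


gc.collect()
gc.disable()                     # keep the automatic collector out of the demo

leaked_ref = trace(do_cleanup=False)
cleaned_ref = trace(do_cleanup=True)

leaked_before_gc = leaked_ref() is not None   # cycle keeps payload alive
cleaned_is_freed = cleaned_ref() is None      # freed by refcounting alone
gc.collect()
leaked_after_gc = leaked_ref() is not None    # cycle collector frees it

gc.enable()
print(leaked_before_gc, cleaned_is_freed, leaked_after_gc)
```

With cleanup, the payload is released as soon as tracing returns, without waiting for the cycle collector. For GPU tensors that matters because the collector runs at unpredictable times.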

@anijain2305
Contributor Author

Cross referencing #69 and pytorch/pytorch#93751

Contributor

@jansel jansel left a comment

Awesome, glad we found this one!

@anijain2305 anijain2305 merged commit 7aeb1d4 into pytorch:main May 10, 2022