rope_benchmark #3550
base: main
Conversation
> ```python
> @pytest.mark.parametrize(
> ```
This is the only part that's worth reviewing.
The code above was dumped directly from Kevin's rope example script. (Note that I had to update the script to pass nv_enable_matmul to thunder.jit; otherwise we see segmentation at the nvFuser definition level.)
I also want to add another toy example where we sweep over the batch size, but I'll do that in a separate PR (a rough sketch of the intended structure is below).
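For reference, a minimal sketch of what the parametrized benchmark with a batch-size sweep could look like. This is an illustrative assumption, not the code in this PR: `rope_forward`, the shapes, and the use of the pytest-benchmark `benchmark` fixture are all stand-ins.

```python
import pytest
import torch


def rope_forward(q, cos, sin):
    # Hypothetical rotate-half RoPE application, for illustration only.
    x1, x2 = q.chunk(2, dim=-1)
    return q * cos + torch.cat((-x2, x1), dim=-1) * sin


@pytest.mark.parametrize("batch_size", [1, 2, 4, 8])
def test_rope_fwd_benchmark(benchmark, batch_size):
    seq_len, n_heads, head_dim = 4096, 32, 128
    q = torch.randn(batch_size, n_heads, seq_len, head_dim,
                    device="cuda", dtype=torch.bfloat16)
    cos = torch.randn(seq_len, head_dim, device="cuda", dtype=torch.bfloat16)
    sin = torch.randn(seq_len, head_dim, device="cuda", dtype=torch.bfloat16)
    # `benchmark` is the pytest-benchmark fixture; it handles warmup and timing.
    benchmark(rope_forward, q, cos, sin)
```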
@Priya2698 is adding the Thunder backend in #3394. Does that mean we can keep just the forward functions?
We will also benchmark the backward pass with the Thunder backend.
Yes, so we don't need to have the backward implementations explicitly, right?
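If so, something like the following should suffice: a hedged sketch, assuming the Thunder backend from #3394, where only the forward is defined and the backward comes from autograd on the jitted function. The function body and shapes are illustrative.

```python
import thunder
import torch


def rope_forward(q, cos, sin):
    x1, x2 = q.chunk(2, dim=-1)
    return q * cos + torch.cat((-x2, x1), dim=-1) * sin


# Per the note above, nv_enable_matmul may need to be passed to thunder.jit
# when the traced function contains matmuls; this toy example has none.
jitted = thunder.jit(rope_forward)

q = torch.randn(2, 32, 4096, 128, device="cuda",
                dtype=torch.bfloat16, requires_grad=True)
cos = torch.randn(4096, 128, device="cuda", dtype=torch.bfloat16)
sin = torch.randn(4096, 128, device="cuda", dtype=torch.bfloat16)

out = jitted(q, cos, sin)            # forward runs through Thunder/nvFuser
out.backward(torch.ones_like(out))   # backward is generated by autograd; no
                                     # hand-written backward implementation
```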
Looking at the thunder-nvfuser timing: strangely, the benchmark numbers don't match the numbers from Kevin's example.
But if I run the manual rope_example, I'm getting these:
I'll double-check the measurement script as well as the compile options (i.e., the Thunder trace options). We need to do the same sanity check for torch.compile later.
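For that sanity check, one way to cross-check the harness is to time the jitted function directly with CUDA events and compare against both the pytest benchmark and the manual rope_example numbers. A minimal sketch; the warmup and iteration counts are arbitrary:

```python
import torch


def time_fn(fn, *args, warmup=10, iters=100):
    """Average milliseconds per call, measured with CUDA events."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters
```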
RoPE benchmark extracted from the Lightning trace.