[WIP] Add flex attention for gpt2 #34861

Draft
wants to merge 1 commit into main
Conversation

mayankagarwals
Contributor

What does this PR do?

Adds flex_attention for the GPT-2 model, following #34809.
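For context, here is a minimal sketch of what a causal attention call through PyTorch's `flex_attention` API looks like. This is illustrative only, not the code in this PR; the shapes, the `causal_mask` helper, and the eager-mode usage are assumptions, and it requires PyTorch >= 2.5.

```python
# Minimal sketch of a causal self-attention call via flex_attention.
# Not the PR's implementation; shapes and the causal_mask helper are assumptions.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def causal_mask(b, h, q_idx, kv_idx):
    # A query position may only attend to the same or earlier key positions.
    return q_idx >= kv_idx

batch, n_heads, seq_len, head_dim = 2, 12, 128, 64
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)

# The block mask depends only on the sequence lengths, so it can be built once
# and reused across layers; B=None / H=None broadcast over batch and heads.
block_mask = create_block_mask(
    causal_mask, B=None, H=None, Q_LEN=seq_len, KV_LEN=seq_len, device="cpu"
)

out = flex_attention(q, k, v, block_mask=block_mask)  # (batch, n_heads, seq_len, head_dim)
```

In practice `flex_attention` is intended to be wrapped in `torch.compile` for performance; the eager call above is only meant to show the call shape.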

Who can review?

@Rocketknight1
Member

cc @ArthurZucker for review. In the meantime, @mayankagarwals, you can make the tests pass by running `pip install transformers[quality]` and then `make fixup` in the repo directory.

@mayankagarwals
Contributor Author

Hi @Rocketknight1

Thanks a ton, got it.

Waiting for the discussion in #34896 to close. I think this PR should follow the same guidelines so there is one standard design for the generic attention block.

@vasqu Let me know once the major design decisions are settled! I'm following your GPT-NeoX PR and am aligned with everything there.

@vasqu
Contributor

vasqu commented Nov 24, 2024

@mayankagarwals Sure, I'll let you know when things clear up!

@ArthurZucker
Collaborator

Having a look at your PR, @vasqu. I think quite a few things have been fixed since, so I'll check!
