[Fix] Name generalization of transformer blocks #44

Merged: xrsrke merged 2 commits into xrsrke:main on Nov 27, 2023


abourramouss (Contributor)

As @danielgrittner pointed out, some models use different naming conventions for the same structure. This PR fixes the issue by using the model's `base_model_prefix` to find the transformer block name, instead of always assuming `transformer_h_X`.

The tests behave as before: this change does not affect how partitioning works, it only generalizes how transformer blocks are identified.
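A minimal sketch of the idea, assuming the traced graph's node names follow the `torch.fx` convention of joining module paths with underscores (so `transformer.h.0.ln_1` becomes `transformer_h_0_ln_1`). The helper `find_block_prefix` is hypothetical, not the code merged in this PR: it takes the model's `base_model_prefix` (the attribute Hugging Face models expose, e.g. `"transformer"` for GPT-2 or `"model"` for Mistral) and returns the block name without its index.

```python
import re
from typing import Iterable, Optional


def find_block_prefix(node_names: Iterable[str], base_model_prefix: str) -> Optional[str]:
    """Return the block name (e.g. "transformer_h" or "model_layers"), or None.

    Hypothetical helper: matches "<base_model_prefix>_<attr>_<index>", where
    <index> is the block number, optionally followed by "_<submodule>".
    """
    pattern = re.compile(
        rf"^({re.escape(base_model_prefix)}_[A-Za-z_]+?)_(\d+)(?:_|$)"
    )
    for name in node_names:
        match = pattern.match(name)
        if match:
            return match.group(1)  # prefix + attribute name, index stripped
    return None


# Illustrative node names (assumed, not taken from a real trace).
gpt2_nodes = ["input_ids", "transformer_wte", "transformer_h_0_ln_1", "transformer_h_0_attn_c_attn"]
mistral_nodes = ["input_ids", "model_embed_tokens", "model_layers_0_input_layernorm"]

print(find_block_prefix(gpt2_nodes, "transformer"))  # transformer_h
print(find_block_prefix(mistral_nodes, "model"))     # model_layers
```

Deriving the name from `base_model_prefix` instead of hard-coding `transformer_h` is what lets the same partitioning logic handle both GPT-2-style and Mistral-style models.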

abourramouss and others added 2 commits November 27, 2023 00:37
Since the name of a transformer block (its start and end nodes) can follow the pattern `model_layers_X` (for Mistral) or `transformer_h_X`, we must generalize it.

Co-Authored-By: Daniel Grittner <[email protected]>
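Once the block name is known, the start and end node of each block can be found by scanning node names in graph order and keeping the first and last node seen for each index. This is a hypothetical sketch (`block_boundaries` is not a pipegoose function) that works on plain strings so it runs without PyTorch. One caveat, assuming a `torch.fx` trace: nodes for functional ops such as residual `add`s do not carry the module prefix, so this only bounds each block by its module calls.

```python
import re
from typing import Dict, List, Tuple


def block_boundaries(node_names: List[str], block_prefix: str) -> Dict[int, Tuple[str, str]]:
    """Map each block index to (start_node, end_node), assuming names are in graph order."""
    pattern = re.compile(rf"^{re.escape(block_prefix)}_(\d+)(?:_|$)")
    bounds: Dict[int, Tuple[str, str]] = {}
    for name in node_names:
        match = pattern.match(name)
        if match is None:
            continue  # node lies outside any transformer block (embeddings, head, ...)
        idx = int(match.group(1))
        start = bounds[idx][0] if idx in bounds else name
        bounds[idx] = (start, name)  # first node stays the start; latest node becomes the end
    return bounds


# Illustrative Mistral-style node names (assumed, not from a real trace).
nodes = [
    "input_ids",
    "model_embed_tokens",
    "model_layers_0_input_layernorm",
    "model_layers_0_self_attn_q_proj",
    "model_layers_0_mlp_down_proj",
    "model_layers_1_input_layernorm",
    "model_layers_1_mlp_down_proj",
    "model_norm",
    "lm_head",
]
for idx, (start, end) in block_boundaries(nodes, "model_layers").items():
    print(idx, start, end)
```

The same call with `"transformer_h"` would handle GPT-2-style names unchanged.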
xrsrke merged commit 3e5ff02 into xrsrke:main on Nov 27, 2023
2 of 3 checks passed