
Adds support for stella_en_v5 embedding model -400M variant #2608

Merged (16 commits) on Nov 29, 2024

Conversation

@iskng (Contributor) commented Nov 9, 2024

Stella_en_400m_v5 is #6 on MTEB as of 9th Nov 2024.

Model Card

This PR adds support for the model along with some examples.

License: the model is licensed under MIT.

The authors' example from the model card has been added and its results reproduced.

@AnubhabB (Contributor)

@iskng let's try and figure out if we can have one single stella_en_v5 module instead of stella_en_v5 and stella_en_v5_400m. Allow me some time to go through this and discuss possible ways of merging this.

I guess that way, it'll be easier for end users and maintainers.

It would be great if you could mark this as a draft PR for the time being, until we sort this out.

Thanks

@LaurentMazare (Collaborator)

> @iskng let's try and figure out if we can have one single stella_en_v5 module instead of stella_en_v5 and stella_en_v5_400m. Allow me some time to go through this and discuss possible ways of merging this.

+1 to this. If it's easy to add support for the 400m model in the existing one, that would make it simpler to maintain over time (though there is already a lot of duplication among models, so if it's a significant effort to merge the two, I'm happy with the separate file).

@iskng marked this pull request as draft on November 10, 2024 at 18:40
@iskng (Contributor, Author) commented Nov 12, 2024

I should have mentioned that I only really tested this for inference on Metal and CPU, so I'm not sure whether the CUDA implementation is right. I had to disable use_efficient_memory because getting xformers working on a Mac was rough.

Also curious whether it's just my implementation, but it's about 3 times slower than sentence-transformers for the same model. I'd love to learn how to make this faster; if you know of any resources, I'm just starting to dig around candle. Thanks.
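
For context, a minimal sketch of how a device might be selected in candle so the same code path can be exercised on CPU, Metal, or CUDA; the helper below is illustrative and not part of this PR:

```rust
// Illustrative only: pick whichever accelerator candle was compiled with,
// falling back to the CPU. Not code from this PR.
use candle_core::{Device, Result};

fn pick_device(force_cpu: bool) -> Result<Device> {
    if force_cpu {
        Ok(Device::Cpu)
    } else if candle_core::utils::cuda_is_available() {
        Device::new_cuda(0)
    } else if candle_core::utils::metal_is_available() {
        Device::new_metal(0)
    } else {
        Ok(Device::Cpu)
    }
}

fn main() -> Result<()> {
    let device = pick_device(false)?;
    println!("running on {device:?}");
    Ok(())
}
```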

@AnubhabB (Contributor) commented Nov 18, 2024

@iskng Sorry about the delay, I've been on the move and didn't get a chance to work on this.

I'm not sure about the correct way of proceeding with this PR (and my changes to it), so I just created a PR to your feature branch here.

This incorporates Stella 1.5B and Stella 400M under a single entry point and flow. Basically, it's a single model now, with minor divergence in implementation where required. Take a look; I've been able to reproduce the author's (and your) results.

> Also curious whether it's just my implementation, but it's about 3 times slower than sentence-transformers for the same model. I'd love to learn how to make this faster; if you know of any resources, I'm just starting to dig around candle. Thanks.

It might not be strictly your implementation; Candle is still very young and a bit rough around the edges when it comes to head-on performance comparisons with PyTorch-based implementations. Having said that, I've attempted to simplify some of your current implementation (basically removing anything that isn't directly composed out of Config, and removing a bunch of redundant transpose, reshape, etc. ops). Let me know if you see any performance improvements; I'll be curious.
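
To illustrate the kind of cleanup meant here (a sketch only, not the actual diff from this PR), splitting attention heads can usually be done with one reshape and one transpose instead of several round trips:

```rust
// Illustrative sketch, not the PR diff: split a (batch, seq, hidden) tensor
// into attention heads with a single reshape + transpose, avoiding extra
// reshape/transpose round trips that force needless copies.
use candle_core::{DType, Device, Result, Tensor};

fn split_heads(x: &Tensor, heads: usize) -> Result<Tensor> {
    let (b, n, hidden) = x.dims3()?;
    // (b, n, hidden) -> (b, n, heads, head_dim) -> (b, heads, n, head_dim)
    x.reshape((b, n, heads, hidden / heads))?.transpose(1, 2)
}

fn main() -> Result<()> {
    let x = Tensor::zeros((2, 16, 64), DType::F32, &Device::Cpu)?;
    let h = split_heads(&x, 8)?;
    assert_eq!(h.dims(), &[2, 8, 16, 8]);
    Ok(())
}
```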

Also, I've updated the stella_en_v5 example to support both variants, --which 400m and --which 1.5b, and reported both sets of results. It should be safe to remove those files/dir as well.
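
For readers following along, the variant flag could be wired up roughly as below; this is a sketch only, and the type and flag names are illustrative rather than necessarily the exact ones in the candle example:

```rust
// Sketch of a `--which` selector for the two Stella variants. Names are
// illustrative; assumes clap with the `derive` feature.
use clap::{Parser, ValueEnum};

#[derive(Clone, Copy, Debug, ValueEnum)]
enum Which {
    /// stella_en_1.5B_v5
    #[value(name = "1.5b")]
    Large,
    /// stella_en_400M_v5
    #[value(name = "400m")]
    Small,
}

#[derive(Parser)]
struct Args {
    /// Which Stella variant to load.
    #[arg(long, default_value = "1.5b")]
    which: Which,
}

fn main() {
    let args = Args::parse();
    match args.which {
        // In the real example these arms would pick the matching config
        // and weight files for each variant.
        Which::Large => println!("stella_en_1.5B_v5 selected"),
        Which::Small => println!("stella_en_400M_v5 selected"),
    }
}
```

With something like this in place, the example can then be invoked along the lines of cargo run --example stella_en_v5 -- --which 400m (or --which 1.5b).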

AnubhabB and others added 3 commits November 19, 2024 02:18
Your implementation of `Stella 400M` and the previous `Stella 1.5B` now supported in a single file and entry point
@iskng (Contributor, Author) commented Nov 23, 2024

It's about 15% faster. Learned a lot from this, thanks!

Removed the now-redundant 400m files.

@AnubhabB (Contributor)

@iskng great, if you are satisfied, un-draft this PR and we should be ready to merge into Candle.

@LaurentMazare (Collaborator)

Looks pretty good, you may want to run rustfmt though. Anything else we should be waiting for before merging this one?

@AnubhabB (Contributor)

I believe this is generally complete.

@iskng, waiting on you to finalize this PR. As @LaurentMazare highlighted, please run cargo fmt --all and push when you un-mark this PR as WIP, and we should be good to go.

@iskng marked this pull request as ready for review on November 26, 2024 at 17:43
@AnubhabB (Contributor)

@LaurentMazare I think this is ready to merge. Whenever you have some time.

@LaurentMazare (Collaborator)

Please apply rustfmt and fix the clippy lints. Also, please remove the commented-out code.

@AnubhabB (Contributor)

There are too many lints; from what I can tell it's all over the place (not just related to this PR). I'm guessing it has to do with changes Rust 1.83 introduced (for example manual_div_ceil), but I can't find anything in the Clippy changelog to explain all the complaints about needless_lifetimes.
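
For reference, manual_div_ceil flags the hand-rolled rounding-up division pattern; a minimal illustration (not code from this PR):

```rust
// Minimal illustration of the `manual_div_ceil` lint mentioned above; not
// code from this PR.
fn blocks(len: usize, block: usize) -> usize {
    // Clippy in Rust 1.83 flags the manual form `(len + block - 1) / block`
    // and suggests the standard library method instead:
    len.div_ceil(block)
}

fn main() {
    assert_eq!(blocks(10, 4), 3);
}
```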

I think that should be a separate PR by itself.

Parking this PR for now, will first work on the lints in a separate PR and come back to this!

@LaurentMazare merged commit 4f59ed3 into huggingface:main on Nov 29, 2024
10 checks passed
@LaurentMazare (Collaborator)

Looks like all the clippy lints are gone post-merge, thanks!
