
[QUESTION] Protocol of adding a new model (Stella_en_<*>_v5 family) implementation with Candle #2525

Closed
AnubhabB opened this issue Oct 1, 2024 · 10 comments · Fixed by #2551


@AnubhabB
Contributor

AnubhabB commented Oct 1, 2024

Hi,

I have a working implementation of the Stella_en_<*>_v5 family of models, which is one of the top-ranking models on the MTEB leaderboard for reranking and retrieval.

It's basically built on top of the candle-transformers::qwen2 implementation with the language modeling head swapped for their pre-trained dense layer.
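
For reference, a minimal sketch of what that head swap can look like with candle_core / candle_nn (the struct name, weight prefix, pooling choice, and shapes are illustrative assumptions, not the exact code I'm proposing):

```rust
use candle_core::{DType, Result, Tensor, D};
use candle_nn::{linear, Linear, Module, VarBuilder};

// Illustrative embedding head: the qwen2 backbone's last hidden states go through
// masked mean pooling and a pre-trained dense projection instead of the LM head.
struct EmbeddingHead {
    dense: Linear,
}

impl EmbeddingHead {
    fn new(hidden_size: usize, embed_dim: usize, vb: VarBuilder) -> Result<Self> {
        // The weight prefix "dense" is a placeholder; the real checkpoint key may differ.
        let dense = linear(hidden_size, embed_dim, vb.pp("dense"))?;
        Ok(Self { dense })
    }

    /// `hidden`: (batch, seq_len, hidden_size) from the backbone's forward pass.
    /// `mask`: (batch, seq_len) attention mask, 1 for real tokens, 0 for padding.
    fn forward(&self, hidden: &Tensor, mask: &Tensor) -> Result<Tensor> {
        // Masked mean pooling over the sequence dimension.
        let mask = mask.to_dtype(DType::F32)?.unsqueeze(D::Minus1)?;
        let summed = hidden.broadcast_mul(&mask)?.sum(1)?;
        let counts = mask.sum(1)?;
        let pooled = summed.broadcast_div(&counts)?;
        // Pre-trained dense projection that replaces the language modeling head.
        self.dense.forward(&pooled)
    }
}
```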

I was hoping to open a pull request with candle along with an example.

Questions:

  1. Would it be OK?
  2. If it is, is there a protocol to follow for candle_transformers? Something akin to "How to add a model to transformers"?
  3. For embedding models, are there any standard Candle APIs that need to be implemented? My implementation just spits out the logits from the forward pass.

Looking for some guidance. Thanks in advance.

CC:
@EricLBuehler @LaurentMazare

@LaurentMazare
Collaborator

Adding popular models such as these is certainly fine (as long as the license permits it; e.g. putting yolo-v8 or llama 3.2 vision directly in candle-transformers would be problematic).
There isn't a very formal process for this; the best approach is probably to start by looking at similar models and how they are implemented, and to do it in the same style. If you go down this path and want to formalize it in a "How to add a model" document, the contribution would be welcome.
There is no standard API provided in candle-transformers; just spitting out the logits seems fine as long as there is an example showing how to use them.

On best practices:

  • Ideally one should follow the Python reference implementation if it exists and have links in the code to the different parts of this reference implementation.
  • The code should be validated on some examples against the reference implementation, and the logits should not be far apart (usually better to use f32 than bf16 for this); a rough sketch of such a check follows this list.
  • It's good to have an example with a README that links to the model card and gives some input/output samples.
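
To illustrate the second point, one way to run the check from the Rust side is to export the reference logits from the Python run (for instance into a safetensors file) and diff them against candle's forward pass; the file name, tensor key, and tolerance below are assumptions for the sketch:

```rust
use candle_core::{DType, Device, Result, Tensor};

// Hypothetical helper: compare candle's logits against logits exported from the
// Python reference implementation (assumed saved as `reference_logits.safetensors`
// under the key "logits").
fn check_against_reference(candle_logits: &Tensor, device: &Device) -> Result<()> {
    let reference = candle_core::safetensors::load("reference_logits.safetensors", device)?;
    let reference = &reference["logits"];
    // Compare in f32; bf16 rounding makes the gap look larger than it really is.
    let diff = (candle_logits.to_dtype(DType::F32)? - reference.to_dtype(DType::F32)?)?
        .abs()?
        .flatten_all()?
        .max(0)?
        .to_scalar::<f32>()?;
    println!("max abs diff vs reference: {diff}");
    assert!(diff < 1e-3, "logits diverge from the reference implementation");
    Ok(())
}
```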

@AnubhabB
Contributor Author

AnubhabB commented Oct 1, 2024

Thanks a ton for the elaborate response @LaurentMazare, I'll take a stab at it after verifying the license.

@iskng
Contributor

iskng commented Nov 8, 2024

Am I missing something? The smaller stella_400M docs say it's based off the same model, but the config lists different activation functions, and swapping the model address produces:
Error: Tokenizer doesn't contain expected `<|endoftext|>` token

@AnubhabB
Contributor Author

AnubhabB commented Nov 8, 2024

I don't think you are missing anything .. I was stumped there too. The 1.5B and 400M variants differ by more than just config changes. I've not had the chance to work on the 400M variant for candle-transformers yet; only the 1.5B variant is supported for now.

@iskng
Contributor

iskng commented Nov 8, 2024

Ah thanks for the heads up!

If you don't mind, what was your process for the 1.5B?
Did you start by converting modeling_qwen.py and then just implement the task-specific queries and the tokenizer?

The 400M has a modeling.py that I've got converted and outputting embeddings, but they don't match the sample and are way off.

@AnubhabB
Contributor Author

AnubhabB commented Nov 8, 2024

Yes, that's what I did .. picked up the base Qwen model Candle implementation, removed everything, and kept adding one module at a time while validating the output against the HF Transformers implementation.

Just noting down a few misses I had during the phase you are in that resulted in output mismatches:

  1. The tokenizer padding - it pads left, not right the way most embedding models do; I had missed this and ended up with similar but not identical outputs (see the padding sketch after this list).
  2. Check if you can reproduce the results reported by the authors - with AutoModel I missed trust_remote_code=True, which gave me a lot of headache before I figured it out.
  3. The file you should look at is modeling.py - with trust_remote_code=True, that is the file that actually gets executed.
  4. float32 seemed to be important for stability in the final embedding projection layer - with bf16 I had NaNs IIRC.
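
To make point 1 concrete on the candle side, here's a small sketch of forcing left padding with the tokenizers crate; the pad id/token are the usual Qwen2 `<|endoftext|>` values, but treat them as assumptions and read the real ones from tokenizer_config.json:

```rust
use anyhow::Error as E;
use tokenizers::{PaddingDirection, PaddingParams, Tokenizer};

// Left-padding setup for the 1.5B Stella tokenizer; most embedding models pad
// on the right, so this is easy to miss. The pad id/token below are assumptions
// taken from the Qwen2 vocab.
fn load_left_padded_tokenizer(path: &str) -> anyhow::Result<Tokenizer> {
    let mut tokenizer = Tokenizer::from_file(path).map_err(E::msg)?;
    tokenizer.with_padding(Some(PaddingParams {
        direction: PaddingDirection::Left,
        pad_id: 151643,
        pad_token: "<|endoftext|>".to_string(),
        ..Default::default()
    }));
    Ok(tokenizer)
}
```

And for point 4, the fix for me was to run that final projection in f32 (e.g. cast the pooled hidden states with `.to_dtype(DType::F32)?` before the layer).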

Some of the above may be obvious .. these are just mistakes I made.

By the way .. a PR with the 400M would be welcome. Would very much love to hear about the progress you make.

@iskng
Contributor

iskng commented Nov 8, 2024

The tokenizer padding_side comes from tokenization_qwen.py, right? But the 400M doesn't use that as far as I can tell.
AutoTokenizer has trust_remote_code=True, but there's no specific .py file for the 400M.

Seems like it's based more on gte-large-en-v1.5.

@AnubhabB
Contributor Author

AnubhabB commented Nov 8, 2024

The tokenizer config actually comes from the tokenizer_config, and yes, they are using padding: right here, unlike the 1.5B variant; it also looks like they are using BertTokenizer.

Most importantly, they define custom AutoModel and AutoConfig mappings here, which reference modeling.py and config.py.

To implement this in candle we'll need to walk through and re-write it, starting with modeling.py -> class NewModel and config.py -> class NewConfig.
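
If it helps, a hypothetical starting point for that rewrite is mirroring NewConfig as a serde struct so config.json deserializes directly; the field names below are guesses at the usual transformer fields, not the actual contents of config.py:

```rust
use serde::Deserialize;

// Hypothetical candle-side mirror of config.py's NewConfig. Field names are
// assumptions; take the real set (and any defaults) from config.py / config.json.
#[derive(Debug, Clone, Deserialize)]
pub struct Config {
    pub vocab_size: usize,
    pub hidden_size: usize,
    pub num_hidden_layers: usize,
    pub num_attention_heads: usize,
    pub intermediate_size: usize,
    pub max_position_embeddings: usize,
    pub layer_norm_eps: f64,
    pub hidden_act: candle_nn::Activation,
}

fn load_config(path: &str) -> anyhow::Result<Config> {
    // Any field missing from config.json will surface as a deserialization error;
    // unknown fields in the JSON are ignored by default.
    Ok(serde_json::from_slice(&std::fs::read(path)?)?)
}
```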

Check out the linked tokenizer_config; that should give you relevant pointers.

By the way, for stella_en_400M_v5 I don't see a tokenization_qwen.py. Use this as your reference implementation instead of qwen.

@iskng
Contributor

iskng commented Nov 9, 2024

Yeah, I'd got which repo; I just wasn't clear on whether there was an existing architecture reference in candle.

Ended up just starting from scratch; seems like it needs this custom rope implementation as well.

I can get this output, but I'm not clear on how to hook it in with the MLR layer.
For the moment I've just copied your mean pooling and stuck it on the end.
Similarity scores: [[0.8398, 0.2990],
[0.3282, 0.8095]]
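
In case it helps with matching the repo's sample, here is roughly how such a score matrix is usually computed (L2-normalize the embeddings, then take pairwise dot products); a generic sketch, not necessarily the exact code behind the numbers above:

```rust
use candle_core::{Result, Tensor};

// Cosine-similarity matrix between two batches of embeddings:
// queries (n, d) x docs (m, d) -> scores (n, m).
fn similarity_scores(queries: &Tensor, docs: &Tensor) -> Result<Tensor> {
    let normalize = |t: &Tensor| -> Result<Tensor> {
        let norm = t.sqr()?.sum_keepdim(1)?.sqrt()?;
        t.broadcast_div(&norm)
    };
    normalize(queries)?.matmul(&normalize(docs)?.t()?)
}
```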

@iskng
Contributor

iskng commented Nov 9, 2024

Below is what I've got so far; it can reproduce the repo's example and follows your 1.5B example.
Still a little unclear on what's needed to submit for the model. It doesn't include any of the training-related dropouts, and the config is hardcoded in the model. It's more of an inference-only port for these specific weights.
Here's the PR
