Add support for word2vec keyed vector #200

Open
dkuku opened this issue Apr 13, 2024 · 4 comments

Comments

@dkuku
dkuku commented Apr 13, 2024

I tried to use this library with pretrained models from https://github.com/sdadas/polish-nlp-resources?tab=readme-ov-file#word2vec and found that they are in gensim's keyed vector format, which is currently not supported.

@danieldk
Member

Unless I misremember, the keyed vector format relies heavily on Python pickling, so I don't think that's something we would want to support.

@dkuku
Author

dkuku commented Apr 14, 2024

Thanks for the answer.

@sebpuetz
Member

You might be able to build a finalfusion model in finalfusion-python by loading the KeyedVectors model via gensim and constructing the necessary objects from its vocabulary and vectors. Writing to the finalfusion format is supported on the Embeddings class through its write() method. You'd then be able to use the resulting file in the Rust crate.

Conversion might not be lossless, depending on what kind of "extra" information the gensim model has.
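
A rough sketch of what that conversion could look like, not a tested recipe. It assumes gensim 4.x attribute names (`index_to_key`, `vectors`) and that finalfusion-python exposes `Embeddings`, `NdArray` and `SimpleVocab` with the constructors used below; check both against the versions you have installed. The function names `words_and_matrix` and `convert` are made up for illustration.

```python
import numpy as np


def words_and_matrix(kv):
    """Extract the vocabulary and embedding matrix from a KeyedVectors-like object.

    Assumes the gensim 4.x attribute names `index_to_key` and `vectors`.
    """
    words = list(kv.index_to_key)
    # Cast to a contiguous float32 matrix; the finalfusion storage is
    # assumed here to expect 2-D float32 data.
    matrix = np.ascontiguousarray(kv.vectors, dtype=np.float32)
    if matrix.ndim != 2 or matrix.shape[0] != len(words):
        raise ValueError("expected one embedding row per vocabulary entry")
    return words, matrix


def convert(gensim_path, fifu_path):
    """Load a gensim KeyedVectors file and write it out as a finalfusion file."""
    # Imported lazily so the helper above works without either package.
    from gensim.models import KeyedVectors
    from finalfusion import Embeddings
    from finalfusion.storage import NdArray
    from finalfusion.vocab import SimpleVocab

    kv = KeyedVectors.load(gensim_path)
    words, matrix = words_and_matrix(kv)

    # Only words and vectors are carried over; anything else stored in the
    # gensim model (counts, training state) is dropped.
    embeddings = Embeddings(storage=NdArray(matrix), vocab=SimpleVocab(words))
    embeddings.write(fifu_path)
```

Calling `convert("model.bin", "model.fifu")` (placeholder file names) would then produce a file that the Rust crate can read. Because only the words and their vectors are copied, this is the lossy part mentioned above.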

@dkuku
Author

dkuku commented Apr 14, 2024

Thanks, it may come in handy at some point. I'm trying to polish my Rust and NLP skills by creating Elixir bindings for finalfusion.


3 participants