Move to LLAMA_CACHE 🤗 #1

Open
Vaibhavs10 opened this issue Jun 4, 2024 · 2 comments

Vaibhavs10 commented Jun 4, 2024

Hi @cocktailpeanut,

Big fan of your work, and I love all that you're doing to democratise ML. Congratulations on llamanet; it looks rad!

I saw that you are creating your own cache, llamanet, and persisting models there (correct me if I'm wrong).
We recently upstreamed changes to llama.cpp that allow you to download and cache models directly from the Hugging Face Hub (note: for this you'd need to compile the server with LLAMA_CURL=1).

With curl support, all you'd need to do is pass --hf-repo and --hf-file, and the model checkpoint would automatically be downloaded and cached in LLAMA_CACHE (ref).

This would make it easier for people to reuse already-cached model checkpoints, and you'd also benefit from any improvements we make to the overall caching system.

AFAICT, you should be able to benefit from this directly by changing this line:

let args = ["-m", req.file]
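
Roughly, the swap could look something like this. This is only a sketch, not llamanet's actual code: req.hf_repo, req.hf_file, and serverPath are hypothetical names, and the repo/file/cache values are placeholders.

```js
const { spawn } = require("child_process");

const serverPath = "./llama-server"; // assumption: a llama.cpp server binary built with LLAMA_CURL=1
const req = { hf_repo: "some-user/some-model-GGUF", hf_file: "model.Q4_K_M.gguf" }; // hypothetical request

// Prefer --hf-repo/--hf-file so llama.cpp downloads and caches the checkpoint itself;
// otherwise fall back to the original "-m <local file>" form.
const args = req.hf_repo
  ? ["--hf-repo", req.hf_repo, "--hf-file", req.hf_file]
  : ["-m", req.file];

// LLAMA_CACHE (optional) overrides where llama.cpp stores downloaded models.
spawn(serverPath, args, {
  env: { ...process.env, LLAMA_CACHE: "/path/to/llama_cache" },
  stdio: "inherit",
});
```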

Let me know what you think!
VB

cocktailpeanut commented Jun 4, 2024

@Vaibhavs10 Yes, I actually loved that --hf-repo feature, and it was the approach I was using early on, but I eventually had to comment it out (see: https://github.com/pinokiocomputer/llamanet/blob/main/llamacpp.js#L63-L65) and download the models manually instead.

The reason was actually exactly what you mentioned: it seems the prebuilt binaries on the releases page are not compiled with the LLAMA_CURL=1 option, so to get this to work with --hf-repo I would have to build the binaries myself on the fly and make that work on every platform.

I did try this approach with my past project Dalai (https://github.com/cocktailpeanut/dalai), where I did everything programmatically, from running quantization to running cmake. It worked in most cases, but the problem is always the edge cases, where the cmake commands would fail for some reason. It was too messy to try to do all of this through the library, so this time around I wanted to avoid it as much as possible, which is why I'm downloading the prebuilt binaries from the releases page instead.

So at the moment I don't really have a way to use --hf-repo, because I will probably keep avoiding running cmake within the library.

That said, I was made aware of huggingface.js yesterday and am looking into it. Using that would give us the same benefits, right?

EDIT: I just looked through the huggingface.js docs and source, and it looks like it doesn't do what I thought it did; it seems to be designed to work in the browser, with no access to the file system.

Vaibhavs10 commented Jun 4, 2024

Aha! That makes sense, and it's good feedback too! Let me see what can be done about this! 🤗

Perhaps we can upstream this change to llama.cpp.
