Move to LLAMA_CACHE 🤗 #1
Comments
@Vaibhavs10 Yes, I actually loved that! The reason is exactly what you mentioned: it seems the prebuilt binaries on the releases page are not compiled with the `LLAMA_CURL=1` flag.

I did try this approach with my past project Dalai (https://github.com/cocktailpeanut/dalai), where I did everything programmatically, from running quantization to running cmake. It worked for most cases, but the problem is always the edge cases, where the cmake commands would fail for some reason. It was too messy to try to do all of this through the library, which is why this time around I wanted to avoid it as much as possible, and why I'm downloading the prebuilt binaries from the releases page instead. So at the moment I don't really have an option to use the `--hf-repo` / `--hf-file` flags.

That said, I was made aware of huggingface.js yesterday and am looking into it. Using that would give us the same benefits, right?

EDIT: Just looked through the huggingface.js docs and source, and it looks like it doesn't do what I thought it does; it seems to be designed to work in the browser, with no access to the file system.
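(For context, a minimal sketch of the "download prebuilt binaries from the releases page" approach in Node; the asset-name filter and output handling are illustrative assumptions, not llamanet's actual code.)

```js
// Sketch only: fetch the latest llama.cpp release metadata and download a
// prebuilt server archive instead of building with cmake. The asset filter
// below is a placeholder; real code would match the current platform.
import fs from "node:fs";

const api = "https://api.github.com/repos/ggerganov/llama.cpp/releases/latest";
const release = await (
  await fetch(api, { headers: { "User-Agent": "llamanet-sketch" } })
).json();

// Pick a prebuilt archive (placeholder pattern for macOS arm64).
const asset = release.assets.find((a) => a.name.includes("macos-arm64"));
if (!asset) throw new Error("no matching prebuilt asset found");

const res = await fetch(asset.browser_download_url, {
  headers: { "User-Agent": "llamanet-sketch" },
});
fs.writeFileSync(asset.name, Buffer.from(await res.arrayBuffer()));
console.log(`downloaded ${asset.name}`);
```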
Aha! That makes sense, and it's good feedback too! Let me see what can be done about this! 🤗 Perhaps we can upstream this change to llama.cpp.
Hi @cocktailpeanut,

Big fan of your work, and love all that you're doing to democratise ML! Congratulations on llamanet, it looks rad!

I saw that you are creating your own cache, `llamanet`, and persisting models there (correct me if I'm wrong). We recently upstreamed changes to llama.cpp which allow one to directly download and cache models from the Hugging Face Hub (note: for this you'd need to compile the server with `LLAMA_CURL=1`).

With the curl support, all you'd need to do is pass `--hf-repo` & `--hf-file`, and the model checkpoint will automatically be downloaded and cached in `LLAMA_CACHE` (ref). This would make it easier for people to reuse already-cached model checkpoints, and it would also let you benefit from any improvements we make to the overall caching system.

AFAICT, you should be able to benefit from this directly by changing this line:
llamanet/llamacpp.js, line 20 (at 16fc952)
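(To make the suggestion concrete, here is a minimal sketch of what launching the server with the Hub flags could look like from Node. This is not the actual code at line 20 of llamacpp.js; the binary path, model repo/file, port, and cache directory are all placeholder values, and the server must have been built with `LLAMA_CURL=1`.)

```js
// Sketch: spawn a curl-enabled llama.cpp server and let it download and
// cache the model itself via --hf-repo / --hf-file, instead of managing a
// separate llamanet cache. All values below are example placeholders.
import { spawn } from "node:child_process";
import os from "node:os";
import path from "node:path";

const server = spawn(
  "./llama-server", // placeholder path to a curl-enabled server binary
  [
    "--hf-repo", "TheBloke/Mistral-7B-Instruct-v0.2-GGUF", // example repo
    "--hf-file", "mistral-7b-instruct-v0.2.Q4_K_M.gguf",   // example file
    "--port", "8080",
  ],
  {
    env: {
      ...process.env,
      // Point at a shared cache directory (example path, not the default).
      LLAMA_CACHE: path.join(os.homedir(), ".cache", "llama.cpp"),
    },
    stdio: "inherit",
  }
);

server.on("exit", (code) => console.log(`llama-server exited with ${code}`));
```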
Let me know what you think!
VB