Move to LLAMA_CACHE 🤗 #1

Open
Vaibhavs10 opened this issue Jun 4, 2024 · 2 comments

Vaibhavs10 commented Jun 4, 2024

Hi @cocktailpeanut,

Big fan of your work, and I love all that you're doing to democratise ML. Congratulations on llamanet; it looks rad!

I saw that you are creating your own cache, llamanet, and persisting models there (correct me if I'm wrong).
We recently upstreamed changes to llama.cpp that allow you to download and cache models directly from the Hugging Face Hub (note: for this you'd need to compile the server with LLAMA_CURL=1).

With curl support, all you'd need to do is pass --hf-repo and --hf-file, and the model checkpoint would automatically be downloaded and cached in LLAMA_CACHE (ref).

This would make it easier for people to reuse already-cached model checkpoints, and you'd also benefit from any improvements we make to the overall caching system.

AFAICT, you should be able to benefit from this directly by changing this line:

let args = ["-m", req.file]
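
Roughly, the swap could look something like this. This is only a sketch, not llamanet's actual code: req.hf_repo, req.hf_file, and serverPath are hypothetical names, and the repo/file/cache values are placeholders.

```js
const { spawn } = require("child_process");

const serverPath = "./llama-server"; // assumption: a llama.cpp server binary built with LLAMA_CURL=1
const req = { hf_repo: "some-user/some-model-GGUF", hf_file: "model.Q4_K_M.gguf" }; // hypothetical request

// Prefer --hf-repo/--hf-file so llama.cpp downloads and caches the checkpoint itself;
// otherwise fall back to the original "-m <local file>" form.
const args = req.hf_repo
  ? ["--hf-repo", req.hf_repo, "--hf-file", req.hf_file]
  : ["-m", req.file];

// LLAMA_CACHE (optional) overrides where llama.cpp stores downloaded models.
spawn(serverPath, args, {
  env: { ...process.env, LLAMA_CACHE: "/path/to/llama_cache" },
  stdio: "inherit",
});
```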

Let me know what you think!
VB

cocktailpeanut commented Jun 4, 2024

@Vaibhavs10 Yes, I actually loved that --hf-repo feature, and it was the approach I was using early on, but I eventually had to comment it out (see: https://github.com/pinokiocomputer/llamanet/blob/main/llamacpp.js#L63-L65) and download the models manually instead.

The reason was actually exactly what you mentioned: it seems the prebuilt binaries on the releases page are not compiled with the LLAMA_CURL=1 option, so to get this to work with --hf-repo I would have to build the binaries myself on the fly and make that work on every platform.

I did try this approach with my past project Dalai (https://github.com/cocktailpeanut/dalai), where I did everything programmatically, from running quantization to running cmake. It worked in most cases, but the problem is always the edge cases, where the cmake commands would fail for some reason. It was too messy to try to do all of this through the library, so this time around I wanted to avoid it as much as possible, which is why I'm downloading the prebuilt binaries from the releases page instead.

So at the moment I don't really have a way to use --hf-repo, because I will probably keep avoiding running cmake within the library.

That said, I was made aware of huggingface.js yesterday and am looking into it. Using that would give us the same benefits, right?

EDIT: I just looked through the huggingface.js docs and source, and it looks like it doesn't do what I thought it did; it seems to be designed to work in the browser, with no access to the file system.

Vaibhavs10 commented Jun 4, 2024

Aha! That makes sense, and it's good feedback too! Let me see what can be done about this! 🤗

Perhaps we can upstream this change to llama.cpp.
