Switch to https://github.com/abetlen/llama-cpp-python #9
Comments
I would prefer to go with the Python route.
I agree. The main problem we have right now is that the "--instruct" option in direct llama.cpp was very useful for creating daemonless, interactive, terminal-based chatbots.
They have actually since removed it. I briefly tried to do the same with llama-cpp-python, but I couldn't get something working well across a wide array of models.
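For reference, a rough equivalent of that interactive mode built on llama-cpp-python might look like the sketch below. This is only a minimal illustration, not anything in the repo; the model path and parameters are placeholders, and the per-model chat templates are exactly where behaviour tends to diverge.

```python
# Minimal sketch of a daemonless, terminal-based chat loop with llama-cpp-python.
# The model path is a placeholder; any local GGUF model would be substituted.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_ctx=2048,                 # context window size
    verbose=False,
)

messages = []
while True:
    try:
        user_input = input("> ")
    except EOFError:
        break
    messages.append({"role": "user", "content": user_input})
    # create_chat_completion applies the model's chat template, which is
    # where behaviour varies from one model to the next.
    response = llm.create_chat_completion(messages=messages)
    reply = response["choices"][0]["message"]["content"]
    print(reply)
    messages.append({"role": "assistant", "content": reply})
```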
Tagging @abetlen; we also sent an email with more details to [email protected]
Hi @ericcurtin 👋 I agree, we should go with only one backend and not try to support both. That said, I don't have a very strong opinion as to which. We currently use llama-cpp-python in the recipes as well as in the extensions' playground, so if we want to keep things consistent, it probably makes the most sense to stick with llama-cpp-python here too.

My only hesitation with llama-cpp-python is that it is another layer of abstraction between us and llama.cpp that we will need to rely on, and there have been a few instances in the past (getting the Granite models working, for example) where llama-cpp-python lagged a bit behind llama.cpp. So really, I'm open to either approach. Let's figure out what ramalama's requirements are and pick the tool that works best for us 😄
For what it's worth, running on macOS Sequoia (M3), llama-cpp-python consistently fails on my machine, but ramalama in its current form works. It might be worth testing whether that holds true across more Apple silicon machines before switching.
Yeah... To be honest, at this point, if we do add this, it will probably just be another --runtime, like --runtime llama-cpp-python.
No one is working on this; is this something we should still consider?
llama-cpp-python does appear to implement a more feature-complete OpenAI-compatible server than the direct llama.cpp one, but I don't know for sure: https://llama-cpp-python.readthedocs.io/en/latest/server/ It also implements multi-model server support. I'm unsure; maybe we should consider it. This is one of those Python things that we could probably run fine in a container, and we'd probably only want read-only model access for it.
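If we went that route, clients would just speak the standard OpenAI chat API to that server. A hedged sketch of what that looks like, assuming the server was started separately (e.g. with something like `python -m llama_cpp.server --model ./model.gguf`, path being a placeholder) and is listening on its default localhost:8000:

```python
# Sketch of querying llama-cpp-python's OpenAI-compatible server over HTTP.
# Endpoint and port are the server's documented defaults; the model name is a placeholder.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "model",  # placeholder model identifier
        "messages": [{"role": "user", "content": "Say hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```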
I don't mind either way, leaving this open or closing it.
Right now we call llama.cpp directly. Long-term we should go with either llama.cpp directly or llama-cpp-python, because maintaining two different llama.cpp backends isn't ideal: they will never be in sync from a version perspective, and it means more maintenance.
The APIs of llama-cpp-python seem to be more stable; if we can get it to behave the same as the current implementation, we should consider switching.
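To make the "pick one backend (or expose it as another --runtime)" idea concrete, here is a hypothetical sketch of how both could sit behind a single interface. The class and function names (Runtime, LlamaCppRuntime, LlamaCppPythonRuntime, serve) are invented for illustration and are not ramalama's actual code; the binary and module invocations are assumptions about how each backend would be launched.

```python
# Hypothetical sketch: one interface, two interchangeable backends.
import subprocess
from abc import ABC, abstractmethod


class Runtime(ABC):
    @abstractmethod
    def serve(self, model_path: str, port: int) -> None:
        ...


class LlamaCppRuntime(Runtime):
    def serve(self, model_path: str, port: int) -> None:
        # Exec a llama.cpp server binary directly (binary name assumed here).
        subprocess.run(
            ["llama-server", "-m", model_path, "--port", str(port)],
            check=True,
        )


class LlamaCppPythonRuntime(Runtime):
    def serve(self, model_path: str, port: int) -> None:
        # Reuse llama-cpp-python's bundled OpenAI-compatible server.
        subprocess.run(
            ["python", "-m", "llama_cpp.server", "--model", model_path, "--port", str(port)],
            check=True,
        )


# Mapping a --runtime value to an implementation.
RUNTIMES = {
    "llama.cpp": LlamaCppRuntime,
    "llama-cpp-python": LlamaCppPythonRuntime,
}
```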
Tagging @MichaelClifford as he suggested the idea and may be interested.