ggllm.cpp is a fork of llama.cpp that supports running Falcon-architecture models, and it is getting quite good.
I think it would be worthwhile to either maintain a branch for it or add native support in this tool.
I attempted to run it by symlinking the appropriate files and had some success, so it may not be a heavy lift for someone more familiar with the architecture of this API. The main problem seemed to be that ggllm.cpp emits its end-of-stream tokens and closes the stream differently than llama.cpp does, which caused the model to unload. Another issue is that some of the needed arguments are not supported, for example the prompt-ingestion batch size.
To get it to load, I simply built as normal, symlinked the ggllm.cpp folder to llama.cpp, symlinked falcon_main to main inside that folder, and then fired it up with chatbot-ui and the appropriate model and arguments.
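For reference, the workaround looked roughly like this; it's a sketch of my setup, and the repo URL, build step, and binary locations may differ on yours:

```sh
# Clone and build ggllm.cpp as usual (assumes a make-based build; adjust to your setup)
git clone https://github.com/cmp-nct/ggllm.cpp
cd ggllm.cpp && make && cd ..

# Make the tool pick up ggllm.cpp where it expects llama.cpp
ln -s ggllm.cpp llama.cpp

# Inside the (symlinked) folder, expose falcon_main under the name the tool expects
ln -s falcon_main llama.cpp/main
```

After that, I launched the tool as usual with chatbot-ui, pointing it at a Falcon GGML model and the usual arguments.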
Any thoughts here?