When using RAG or Multi-Agent features with Llama models, the script stops replying without meeting the termination condition or raising errors #4422
Comments
@gianx89 thanks for raising the issue. Will you be able to share more about how you are provisioning your local API? Anything that might help with reproducing?
Hi and thanks, @MohMaz. I’ve tried the following provisioning modes:
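(In general terms, the `config_list` I point at a local OpenAI-compatible endpoint looks roughly like the sketch below; the model name, URL, and key are placeholders, not my exact values.)

```python
# Illustrative config_list entry for a local OpenAI-compatible server
# (e.g. Ollama, LM Studio, or a llama.cpp server). All values are placeholders.
config_list = [
    {
        "model": "llama3.1",                      # model name exposed by the local server
        "base_url": "http://localhost:11434/v1",  # local OpenAI-compatible endpoint
        "api_key": "not-needed",                  # most local servers ignore the key
    }
]

llm_config = {"config_list": config_list, "cache_seed": None}  # disable caching while debugging
```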
I’ve been focusing on various Llama versions and tested them all in different "sizes". Here’s what I’ve tried:
I’ve also experimented with Mistral and other models in the RAG-only version of my project, but I consistently encountered problems. Now, I’m shifting my focus to the Multi-Agent aspect of the project. I’ve tried reducing the token size (using an approximate method) and limiting the size of the chat history, but the results remain the same. Here’s my
**Reproducing the Problem**
This is a link to a Python file to reproduce the problem. Currently, it uses … You must provide some documents in the RAG documents folder; otherwise, you might not be able to reproduce the issue. The problem sometimes worsens when I include context from RAG.
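The token-size reduction and chat-history limiting I mention above are along the lines of AutoGen's message transforms; here is a rough sketch with placeholder limits, not my exact code:

```python
# Rough sketch: limit chat-history length and token count with AutoGen's
# message transform capability (placeholder limits, not the actual values used).
from autogen.agentchat.contrib.capabilities import transform_messages, transforms

context_handling = transform_messages.TransformMessages(
    transforms=[
        transforms.MessageHistoryLimiter(max_messages=10),  # keep only the last N messages
        transforms.MessageTokenLimiter(max_tokens=4096),    # cap the total token budget
    ]
)

# Attach the capability to an agent (hypothetical agent name):
# context_handling.add_to_agent(writer)
```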
I think the termination condition was triggered. It could be the message transform that truncates the message and prevents the message from displaying fully. @thinkall @WaelKarkoub what do you think?
It happened even without truncation. I added truncation while trying to solve the problem. I'll post an output without truncation later, or you can try it yourself by disabling truncation in the provided script.
Here is an output without truncation; the same problem happens.
This is the link containing the code to reproduce the error.
Hi all! I'm using AutoGen to develop a RAG system, ideally a Multi-Agent one. I have to use open-source, preferably local, models such as Llama 3.1 or Llama 3.2. I'm using ChromaDB as my vector database.
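For the retrieval part, the setup is along these lines; this is a simplified sketch with placeholder paths and parameters, not my exact code:

```python
# Simplified sketch of a RAG agent backed by ChromaDB (placeholder paths/parameters).
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

rag_proxy = RetrieveUserProxyAgent(
    name="rag_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    retrieve_config={
        "task": "qa",
        "docs_path": "./rag_documents",  # folder containing the reference documents
        "vector_db": "chroma",           # ChromaDB as the vector store
        "chunk_token_size": 1000,
        "get_or_create": True,
    },
)
```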
I'm developing a system that can write a comic book story in a specific format. There is a writer (or team of writers) that writes the story, a critic (or team of critics) that gives advice on how to improve it, and a manager (or team of managers) that incorporates those suggestions. When the story is deemed satisfactory, an agent writes "TERMINATE".
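The overall orchestration is roughly like the sketch below (simplified: agent names, prompts, and parameters are illustrative, not my exact code):

```python
# Simplified sketch of the writer / critic / manager group chat with a
# TERMINATE-based stopping condition. Names, prompts, and parameters are illustrative.
import autogen

llm_config = {"config_list": config_list}  # e.g. the local-endpoint config shown earlier

writer = autogen.AssistantAgent(
    name="writer",
    system_message="You write the comic book story in the requested format.",
    llm_config=llm_config,
)
critic = autogen.AssistantAgent(
    name="critic",
    system_message="You give advice on how to improve the story.",
    llm_config=llm_config,
)
story_manager = autogen.AssistantAgent(
    name="story_manager",
    system_message="You incorporate the critic's suggestions. "
                   "Reply with TERMINATE when the story is satisfactory.",
    llm_config=llm_config,
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    # Substring match, since local models often add punctuation around the keyword.
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)

groupchat = autogen.GroupChat(
    agents=[user_proxy, writer, critic, story_manager],
    messages=[],
    max_round=100,  # the "pretty high" round limit mentioned below
)
chat_manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(chat_manager, message="Write a comic book story about ...")
```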
I don't have any issues using OpenAI APIs and models like GPT-3.5 Turbo or GPT-4. However, when working with open-source or local models, I encounter unpredictable behavior.
The agents start talking:
The chat ends abruptly, without error and without meeting the termination condition. Sometimes, very rarely, I get the right results and the chat ends with the "TERMINATE" string.
Any suggestions? I can't share all the code at the moment, but I can reply with snippets of it.
P.S.
The max_rounds value is set pretty high (100).