40 VARCHAR Limit on Model Name #6615

Closed
5 tasks done
mbbyn opened this issue Jul 24, 2024 · 5 comments · Fixed by #6623 or #6723
Labels
🐞 bug Something isn't working

Comments

@mbbyn

mbbyn commented Jul 24, 2024

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.14

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

First, add an OpenAI API compatible model with a long name.

Then, try to use the model in a Knowledge Base's settings. Observe the failure and the error logs.

✔️ Expected Behavior

It should work with long model names.

❌ Actual Behavior

Saving fails because the model name is longer than the 40 characters allowed by the VARCHAR(40) column.

sqlalchemy.exc.DataError: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(40)
all values are hidden
[SQL: INSERT INTO dataset_collection_bindings (provider_name, model_name, type, collection_name) VALUES (%(provider_name)s, %(model_name)s, %(type)s, %(collection_name)s) RETURNING dataset_collection_bindings.id, dataset_collection_bindings.created_at]
[parameters: {'provider_name': 'openai_api_compatible', 'model_name': 'sentence-transformers/distiluse-base-multilingual-cased-v1', 'type': 'dataset', 'collection_name': 'Vector_index_df1864b1_1392_456d_ad1f_0ec6030969a3_Node'}]
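
To make the failure concrete, here is a minimal sketch that checks the values from the log above against the 40-character limit reported in the error. The helper name `exceeds_limit` is hypothetical and is not part of Dify; it only mirrors the length check that PostgreSQL applies to a `character varying(40)` column.

```python
# Limit taken from the error message: character varying(40).
MAX_MODEL_NAME_LENGTH = 40


def exceeds_limit(value: str, limit: int = MAX_MODEL_NAME_LENGTH) -> bool:
    """Return True if the value has more characters than a VARCHAR(limit) column allows."""
    return len(value) > limit


# Values copied from the [parameters: ...] line in the log.
provider_name = "openai_api_compatible"
model_name = "sentence-transformers/distiluse-base-multilingual-cased-v1"

print(len(provider_name), exceeds_limit(provider_name))  # 21 False
print(len(model_name), exceeds_limit(model_name))        # 58 True
```

The model name is 58 characters long, so the INSERT is rejected, while the provider name fits comfortably.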

@dosubot dosubot bot added the 🐞 bug Something isn't working label Jul 24, 2024
@mbbyn
Author

mbbyn commented Jul 24, 2024

For those interested, we are hosting a HuggingFace Text Embeddings Inference server, which exposes an OpenAI-compatible API. To tell it which model to run inference on, we pass the model name to the server, and these names are usually very long.
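
As a rough illustration of where the long name comes from, the sketch below builds an OpenAI-style embeddings request body. It assumes the server accepts the usual `model` and `input` fields; the function name `build_embeddings_request` is hypothetical, and no request is actually sent.

```python
import json


def build_embeddings_request(model: str, texts: list[str]) -> str:
    """Serialize an OpenAI-style embeddings request body (assumed shape: model + input)."""
    return json.dumps({"model": model, "input": texts})


# The model name from this issue is passed through unchanged.
body = build_embeddings_request(
    "sentence-transformers/distiluse-base-multilingual-cased-v1",
    ["hello world"],
)
print(body)
```

Because the full repository-style name identifies the model, it cannot simply be shortened on the client side to fit the column.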

@mbbyn
Author

mbbyn commented Jul 24, 2024

Related #1857 @crazywoola

@crazywoola crazywoola self-assigned this Jul 24, 2024
@crazywoola
Member

Actually, this is pretty easy to fix. I will update the migrations later.

@crazywoola crazywoola linked a pull request Jul 24, 2024 that will close this issue
12 tasks
@mbbyn
Author

mbbyn commented Jul 25, 2024

I have tested this change, but unfortunately the issue persists. The PR updates provider_name in the embeddings table, but the column that needs to change is model_name in dataset_collection_bindings. It is also worth increasing the model_name limit in other tables, such as embeddings.
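
For reference, a minimal sketch of the PostgreSQL statements that would widen the two columns named above. This is not the project's actual migration: the helper `widen_varchar_sql` is hypothetical, and the new length of 255 is an assumption chosen only for illustration.

```python
def widen_varchar_sql(table: str, column: str, new_length: int) -> str:
    """Build a PostgreSQL ALTER TABLE statement that changes a column to VARCHAR(new_length)."""
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    if new_length <= 0:
        raise ValueError("new_length must be positive")
    return f"ALTER TABLE {table} ALTER COLUMN {column} TYPE VARCHAR({new_length});"


# (table, column) pairs mentioned in this thread.
TARGETS = [
    ("dataset_collection_bindings", "model_name"),
    ("embeddings", "model_name"),
]
NEW_LENGTH = 255  # assumption: any value comfortably above 58 would cover the reported name

statements = [widen_varchar_sql(table, column, NEW_LENGTH) for table, column in TARGETS]
for statement in statements:
    print(statement)
```

In a real deployment the same change would normally go through the project's migration tooling rather than hand-written SQL.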

@HiroshigeAoki
Contributor

@mbbyn
I ran into the same issue. So I fixed it.
