[Bug]: Milvus does not support reading scalar indexes created for special characters like % or other characters, which causes the querynode to crash. #37912

Open
1 task done
xiaojunxiang2023 opened this issue Nov 21, 2024 · 3 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

xiaojunxiang2023 commented Nov 21, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

Milvus Server: 2.4.11
java-sdk: 2.4.3

Current Behavior

When my client uses a scalar field as the query condition, it causes the querynode to crash. The querynode logs show an error indicating the presence of illegal characters.

The content of my scalar field is: ABC!DEFGHI-JK_@#$%^&*LMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

I think Milvus:

  1. Supports writing special characters.
  2. Also supports building indexes for special characters.
  3. However, it does not support reading such indexes, which causes the querynode to crash.

Expected Behavior

I think this behavior is unreasonable. Validation should happen during data insertion or when the index is built, and any problem should be reported at that point, rather than surfacing as an error only when the data is queried.

Steps To Reproduce

The reproduction steps are as follows:

  1. Create a scalar field and build an index for it.
  2. Insert the content ABC!DEFGHI-JK_@#$%^&*LMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 into the scalar field.
  3. Use a like query condition on the scalar field, for example "A%".
  4. Specify a vector field and execute the search.

This reproduces the issue and causes the querynode to crash.

Notes:
You can run this from multiple threads in a loop as a stress test.
When using like, make sure % is not at the beginning of the pattern. A leading % disables prefix matching, so the index is not used and the issue is not triggered.

Milvus Log

No response

Anything else?

No response

@xiaojunxiang2023 xiaojunxiang2023 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 21, 2024
@xiaofan-luan
Collaborator

@xiaojunxiang2023
You are using 2.4.x, but to my knowledge Milvus 2.4 doesn't have any Rust-related logic; that only arrives in 2.5, where we introduce tantivy.
Can you share your test code if possible?
Are you using prefix filtering, such as A like "AA%", or an arbitrary-position match such as "%AA%"?

@xiaofan-luan
Collaborator

/assign @sunby
Please help with this as well.

Contributor

sunby commented Nov 22, 2024

It looks like the query expression is "^%", which is translated to the regex "^(.|\n)*"; the regex parser rejects that as invalid. We should escape the original pattern before passing it to tantivy.
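
To illustrate the escaping idea, here is a minimal sketch in Python. It is not Milvus's or tantivy's actual code: `like_to_regex` is a hypothetical helper, it uses Python's `re` module as a stand-in for the regex parser, and it handles only the `%` wildcard. Every other character is escaped so that it matches literally.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern into a regex string (hypothetical helper).

    Assumption: only '%' is treated as a wildcard here; other wildcards
    and backslash escapes in the LIKE pattern are not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            # Same expansion as quoted in the comment above: any run of characters.
            parts.append(r"(.|\n)*")
        else:
            # Escape regex metacharacters such as ^, $, * so they match literally.
            parts.append(re.escape(ch))
    return "".join(parts)


# The literal '^' is escaped instead of being passed through as an anchor.
escaped = like_to_regex("^%")
print(escaped)
```

With this translation, a pattern such as "^%" only matches values that begin with a literal ^ character, and a prefix pattern like "A%" still matches the field content from the report.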

sre-ci-robot pushed a commit that referenced this issue Nov 22, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 22, 2024
@yanliang567 yanliang567 removed their assignment Nov 22, 2024
4 participants