Performance impact of large number of collections #37594
Unanswered
jubingc
asked this question in
Q&A and General discussion
Replies: 2 comments 12 replies
-
Take a look at this chart to understand how Milvus manages data in shards, partitions, collections, and segments:
5 replies
-
@yanliang567 is working on the effect of a large number of collections/partitions. The goal here is to support: This could be part of Milvus 2.5.X.
7 replies
-
According to the documentation:
I would like to better understand how the number of collections and partitions affects performance. Could you clarify what constitutes "too many" collections in this context?
As an example, the calculation below multiplies the number of collections, shards, and partitions. However, shards are primarily for data writing, while partitions and segments are used for data reading. Why are these elements multiplied together?
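To make the question concrete, here is a minimal sketch of that calculation as I read it, assuming the general capacity is counted as the sum over all collections of shards × partitions. The class name, function names, layout numbers, and the limit value are hypothetical and only for illustration; the real limit is whatever `rootCoord.maxGeneralCapacity` is set to in your deployment.

```python
from dataclasses import dataclass


@dataclass
class CollectionLayout:
    """Hypothetical description of one collection's layout."""
    name: str
    shards: int
    partitions: int


def general_capacity_used(layouts):
    """Sum shards * partitions over all collections (assumed formula)."""
    return sum(c.shards * c.partitions for c in layouts)


def fits_capacity(layouts, max_general_capacity):
    """True if the assumed capacity usage stays within the given limit."""
    return general_capacity_used(layouts) <= max_general_capacity


# Assumption: stand-in value for rootCoord.maxGeneralCapacity; check your own config.
ASSUMED_MAX_GENERAL_CAPACITY = 65536

# Illustrative layout: 60 collections with 2 shards and 4 partitions each,
# plus 40 collections with 1 shard and 12 partitions each.
layouts = (
    [CollectionLayout(f"a_{i}", shards=2, partitions=4) for i in range(60)]
    + [CollectionLayout(f"b_{i}", shards=1, partitions=12) for i in range(40)]
)

used = general_capacity_used(layouts)  # 60*2*4 + 40*1*12 = 960
print(used, fits_capacity(layouts, ASSUMED_MAX_GENERAL_CAPACITY))
```

Under this reading, a shard and a partition each contribute to the count even though one mainly affects writes and the other reads, which is the part I would like clarified.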
Additionally, per the documentation, the maximum number of partitions in a collection is 4,096 (with a default of 1,024, controlled by rootCoord.maxPartitionNum). Given a shared rootCoord.maxGeneralCapacity, which of the following configurations would likely yield better performance?
Beyond performance, I’d also appreciate insights into the pros and cons of each setup. Some drawbacks of the second setup I’m aware of include:
a. The recommended size for a partition is up to 1 billion items (reference).
b. There is currently no way to filter data within a partition quickly.
Are there additional pros or cons to consider for each of these configurations?