-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: speed up search iterator stage 1 #37947
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: PwzXxm The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
17da513
to
a6bd30c
Compare
@PwzXxm cpp-unit-test check failed, comment |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #37947 +/- ##
==========================================
+ Coverage 81.07% 82.94% +1.87%
==========================================
Files 1369 1080 -289
Lines 190968 165569 -25399
==========================================
- Hits 154819 137324 -17495
+ Misses 30659 22744 -7915
- Partials 5490 5501 +11
|
rerun cpp-unit-test |
@PwzXxm cpp-unit-test check failed, comment |
00a762f
to
97467bf
Compare
@PwzXxm E2e jenkins job failed, comment |
@PwzXxm go-sdk check failed, comment |
@PwzXxm cpp-unit-test check failed, comment |
97467bf
to
e74fbfd
Compare
@PwzXxm cpp-unit-test check failed, comment |
@PwzXxm E2e jenkins job failed, comment |
e74fbfd
to
de74b0c
Compare
@PwzXxm cpp-unit-test check failed, comment |
/hold |
/unhold |
@PwzXxm cpp-unit-test check failed, comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: just some questions
heap.pop(); | ||
|
||
// last_bound may change between NextBatch calls, discard any invalid results | ||
if (!IsValid(cur_rst, last_bound, radius, range_filter)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so v2 iterator will not return better results compared to former iterations page?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this stage, the next one will try to take care of this.
const float dist = result.first; | ||
const bool is_valid = | ||
!last_bound.has_value() || dist > last_bound.value(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: no need to consider the positive or negative metrics for dist and last_bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distances are converted when entering this class, no need to worry about it here
@@ -124,6 +125,19 @@ SearchOnGrowing(const segcore::SegmentGrowingImpl& segment, | |||
|
|||
// step 3: brute force search where small indexing is unavailable | |||
auto vec_ptr = record.get_data_base(vecfield_id); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cached itrator will be created every time, so what is 'cached'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will introduce a pool of results in the next stage, as commented in https://github.com/milvus-io/milvus/pull/37947/files/9f6b88743198a575eb84cb427bcd41a7631676b7#diff-7344957165f4632a9363de767323618b7db0bd2d0f7cf7165965d3fb2612f18b. This class tries to provide a framework for the further implementation. If you think this name is confusing, I will change the naming if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to bother, just follow your scheme
search_result.distances_.resize(nq_ * batch_size_); | ||
|
||
for (size_t query_idx = 0; query_idx < nq_; ++query_idx) { | ||
auto rst = GetBatchedNextResults(query_idx, search_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: seems that offsets and distances data retrieved are copied twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distance-id pairs need to be sorted before copy to the search_result
. Knowhere needs to provide the ability to give batched results via iterator to eliminate this copy.
@PwzXxm E2e jenkins job failed, comment |
/lgtm |
/hold |
/unhold
|
/run-cpu-e2e |
Signed-off-by: Patrick Weizhi Xu <[email protected]>
Signed-off-by: Patrick Weizhi Xu <[email protected]>
9f6b887
to
e7d1366
Compare
New changes are detected. LGTM label has been removed. |
@PwzXxm go-sdk check failed, comment |
@PwzXxm cpp-unit-test check failed, comment |
issue: #37548