msmarco-v2 index & BM25: question about concatenated fields #1899
-
The fields are concatenated programmatically during indexing (see https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/collection/MsMarcoV2PassageCollection.java#L131), so this statement is accurate:
And it does correspond to the prebuilt index listed here: https://github.com/castorini/pyserini/#two-click-reproductions
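To illustrate what "concatenated programmatically" means for the JSON record you see in the index, here is a minimal Python sketch. It is not the Anserini code: the helper name `concat_fields`, the field order, and the single-space separator are all assumptions for illustration, and the sample record is made up. The linked Java source is the authority on the exact behavior.

```python
import json

# Assumed field order and separator, for illustration only.
# Check MsMarcoV2PassageCollection.java for what Anserini actually does.
FIELDS = ("url", "title", "headings", "passage")


def concat_fields(raw_json, fields=FIELDS, sep=" "):
    """Join the selected JSON fields into one string to be indexed for BM25."""
    record = json.loads(raw_json)
    # Missing fields are treated as empty strings in this sketch.
    return sep.join(str(record.get(name, "")) for name in fields)


# A made-up record in the shape described in the question.
raw = json.dumps({
    "pid": "example_pid_0",
    "url": "https://example.com/page",
    "title": "Example title",
    "headings": "Example heading",
    "passage": "Example passage text.",
})

contents = concat_fields(raw)
print(contents)
```

Under these assumptions, the raw document stays as JSON while the text that BM25 scores is the joined string, which would explain why the stored record does not look concatenated.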
-
Hi,
After carefully reading the "Anserini: BM25 Baselines for the MS MARCO V2 Collections" doc, I found that
When using the prebuilt index msmarco-v2-passage-augmented, I found that the url, title, headings, and passage fields are not concatenated but stored in JSON format. This index can also be searched with BM25, so I am curious which part is involved in BM25 scoring: only the passage field, or all of the fields internally?
Or is the index built by following the "Anserini: BM25 Baselines for the MS MARCO V2 Collections" doc perhaps not the same as the prebuilt index msmarco-v2-passage-augmented?