Files that exceed roughly max_trigram_count trigrams (default 20k) are treated as binary, and thus excluded from content-based indexing.
If you search for lang:binary, there's a chance you'll run into large source files or other content that you would still want indexed.
The logic lives in https://github.com/google/zoekt/blob/master/indexbuilder.go#L298. Maybe an option could be plumbed through that exempts files matching certain patterns from the "too long, probably binary" treatment. Raising the trigram limit is not really an option: there is always one more file that exceeds any limit and would have been nice to index.
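To make the request concrete, here is a minimal sketch of what a pattern-based exemption could look like. It is not zoekt's actual code: the names `countDistinctTrigrams`, `isExempt`, and `treatAsBinary` are hypothetical, and it assumes the limit is compared against the number of distinct byte trigrams in the file and that exemptions are glob patterns matched against the file's base name.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// countDistinctTrigrams returns the number of distinct 3-byte sequences in
// content. Assumption: the limit applies to distinct byte trigrams.
func countDistinctTrigrams(content []byte) int {
	seen := make(map[[3]byte]struct{})
	for i := 0; i+3 <= len(content); i++ {
		seen[[3]byte{content[i], content[i+1], content[i+2]}] = struct{}{}
	}
	return len(seen)
}

// isExempt reports whether the base name of path matches any of the glob
// patterns. Malformed patterns are ignored rather than treated as matches.
func isExempt(path string, patterns []string) bool {
	base := filepath.Base(path)
	for _, p := range patterns {
		if ok, err := filepath.Match(p, base); err == nil && ok {
			return true
		}
	}
	return false
}

// treatAsBinary is a hypothetical version of the "too long, probably binary"
// check: exempt files are always indexed as text, all other files are
// treated as binary once they exceed maxTrigrams distinct trigrams.
func treatAsBinary(path string, content []byte, maxTrigrams int, exempt []string) bool {
	if isExempt(path, exempt) {
		return false
	}
	return countDistinctTrigrams(content) > maxTrigrams
}

func main() {
	content := []byte("abcdefgh") // 6 distinct trigrams
	exempt := []string{"*.sql"}

	fmt.Println(treatAsBinary("dump/schema.sql", content, 3, exempt)) // false: exempt by pattern
	fmt.Println(treatAsBinary("dump/blob.dat", content, 3, exempt))   // true: over the limit
}
```

Matching on the base name keeps the patterns short (`*.sql`, `*.min.js`); matching against the full repository path would be an equally plausible choice.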
Actually, a file-size-based exemption is already available, and the size limit is 2MB by default. What I stumbled into is indeed max_trigram_count. Maybe that could have a filename-based exemption similar to
robinp-tw changed the title from "Need to exempt certain file types from being treated as binary due to size" to "Need to exempt certain file types from being treated as binary due to trigram count" on Oct 7, 2021