-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP experiment: allow extended lz4 lookback window #26
base: fs/branch_9_7
Are you sure you want to change the base?
Conversation
@magibney this looks awesome. QQ
|
* provide {@link #DEFAULT_EXTENDED_MAX_DISTANCE} to allow tests to be run exercising lz4 with | ||
* {@link #EXTENDED_MAX_DISTANCE}. | ||
*/ | ||
public static final boolean DEFAULT_EXTENDED_MAX_DISTANCE = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add sys prop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could do ... I opted not to because the main use for this would be to run full test suite with EXTENDED_MAX_DISTANCE
config, and that would be a bit more involved than simply adding a sysprop. If we're going make this configurable via sysprop there would be some benefit, but IMO only if we go as far as supporting configuration via project property for running tests.
Practically this only affects the lookback window for existing "inner" lz4 -- it has no impact on the configuration of lz4 as employed by TeeDirectory/CompressingDirectory. For these, EXTENDED_MAX_DISTANCE
defaults to true
, and is configurable via solrconfig.xml
.
Just opened cowpaths/fullstory-solr#213, which addresses this: "extended lookback window" feature boolean is encoded in the file compression header, so both existing files and new files will be read properly (backward compatible) and all new files will be written with the extended lookback window. |
There are some use cases (e. g., 256k block-level compression applied over index files) where the period of pattern repetition is longer. Such cases benefit from a combination of LZ4.HighCompressionHashTable and a longer lookback window (256k instead of the default Lucene lz4 64k lookback window). The benefits are both in compression (real-world cases with ~3x improved compression!), but also in latency/ CPU-efficiency, in some cases with >2x faster execution.
This is quick/dirty patch to support this approach for demonstration/experimentation.