Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP experiment: allow extended lz4 lookback window #26

Open
wants to merge 3 commits into
base: fs/branch_9_7
Choose a base branch
from

Conversation

magibney
Copy link
Collaborator

There are some use cases (e. g., 256k block-level compression applied over index files) where the period of pattern repetition is longer. Such cases benefit from a combination of LZ4.HighCompressionHashTable and a longer lookback window (256k instead of the default Lucene lz4 64k lookback window). The benefits are both in compression (real-world cases with ~3x improved compression!), but also in latency/ CPU-efficiency, in some cases with >2x faster execution.

This is quick/dirty patch to support this approach for demonstration/experimentation.

@hiteshk25
Copy link
Collaborator

@magibney this looks awesome. QQ

  1. what will happen if this new code will see existing compressed data?
  2. Let's create another ticket to verify this on playpen (new collection, existing collection, etc)
  3. Then we can merge it

* provide {@link #DEFAULT_EXTENDED_MAX_DISTANCE} to allow tests to be run exercising lz4 with
* {@link #EXTENDED_MAX_DISTANCE}.
*/
public static final boolean DEFAULT_EXTENDED_MAX_DISTANCE = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add sys prop here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do ... I opted not to because the main use for this would be to run full test suite with EXTENDED_MAX_DISTANCE config, and that would be a bit more involved than simply adding a sysprop. If we're going make this configurable via sysprop there would be some benefit, but IMO only if we go as far as supporting configuration via project property for running tests.

Practically this only affects the lookback window for existing "inner" lz4 -- it has no impact on the configuration of lz4 as employed by TeeDirectory/CompressingDirectory. For these, EXTENDED_MAX_DISTANCE defaults to true, and is configurable via solrconfig.xml.

@magibney
Copy link
Collaborator Author

what will happen if this new code will see existing compressed data?

Just opened cowpaths/fullstory-solr#213, which addresses this: "extended lookback window" feature boolean is encoded in the file compression header, so both existing files and new files will be read properly (backward compatible) and all new files will be written with the extended lookback window.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants