Speed up compression by 2100% #18

Merged 5 commits on Jun 3, 2024

@cadmic cadmic commented May 31, 2024

I replaced the string search functions with the "hash chain" algorithm used by gzip (described in https://www.rfc-editor.org/rfc/rfc1951.html#section-4). Essentially, substrings of length 3 (the minimum match length) in the 0x1000-byte compression window are placed into linked lists by hash code, so we can quickly skip to the next candidate instead of searching the whole window.
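For readers unfamiliar with the technique, here is a minimal Rust sketch of gzip-style hash chains over 3-byte substrings. It is an illustration under stated assumptions, not the code from this PR: the `HashChains` type, the multiplicative hash, the 12-bit table size, and the `MAX_MATCH` cap of 0x111 are all hypothetical choices made for the example. It walks each chain from the newest candidate to the oldest, as gzip does.

```rust
const MIN_MATCH: usize = 3; // minimum match length, as described above
const MAX_MATCH: usize = 0x111; // assumed cap for this sketch; real formats differ
const WINDOW_SIZE: usize = 0x1000; // compression window, in bytes
const HASH_BITS: u32 = 12; // assumed table size (4096 buckets)
const NIL: usize = usize::MAX; // "no position" marker

/// Hypothetical gzip-style chains: one linked list of positions per hash bucket.
struct HashChains {
    head: Vec<usize>, // newest position seen for each bucket
    prev: Vec<usize>, // prev[pos % WINDOW_SIZE] = next-older position in the same bucket
}

impl HashChains {
    fn new() -> Self {
        Self { head: vec![NIL; 1 << HASH_BITS], prev: vec![NIL; WINDOW_SIZE] }
    }

    /// Hash the 3 bytes starting at `pos` (illustrative multiplicative hash).
    fn hash(data: &[u8], pos: usize) -> usize {
        let key = ((data[pos] as u32) << 16)
            | ((data[pos + 1] as u32) << 8)
            | (data[pos + 2] as u32);
        (key.wrapping_mul(0x9E37_79B1) >> (32 - HASH_BITS)) as usize
    }

    /// Record `pos` as the newest occurrence of its 3-byte prefix.
    fn insert(&mut self, data: &[u8], pos: usize) {
        if pos + MIN_MATCH > data.len() {
            return; // fewer than 3 bytes left, nothing to hash
        }
        let h = Self::hash(data, pos);
        self.prev[pos % WINDOW_SIZE] = self.head[h];
        self.head[h] = pos;
    }

    /// Longest match for `data[pos..]` among earlier positions in the window,
    /// returned as (match position, match length).
    fn find(&self, data: &[u8], pos: usize) -> Option<(usize, usize)> {
        if pos + MIN_MATCH > data.len() {
            return None;
        }
        let max_len = MAX_MATCH.min(data.len() - pos);
        let mut best: Option<(usize, usize)> = None;
        let mut cand = self.head[Self::hash(data, pos)];
        // Chains are strictly decreasing, so stop at the first candidate
        // that is too far back; its slot may already have been reused.
        while cand != NIL && pos - cand <= WINDOW_SIZE {
            let len = (0..max_len)
                .take_while(|&i| data[cand + i] == data[pos + i])
                .count();
            if len >= MIN_MATCH && best.map_or(true, |(_, l)| len > l) {
                best = Some((cand, len));
                if len == max_len {
                    break; // cannot do better than the maximum length
                }
            }
            cand = self.prev[cand % WINDOW_SIZE];
        }
        best
    }
}
```

In this sketch, `find` must be called for a position before that position is passed to `insert`; otherwise the position would be offered as a candidate for itself. Only candidates that share a hash bucket are compared, which is what avoids scanning the whole window.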

This did require some changes from gzip, though, since its implementation is pretty quirky. The biggest change is that the window is searched front-to-back here, whereas gzip searches back-to-front. When searching back-to-front, you can let the hash chains grow indefinitely and garbage-collect the hash nodes at any convenient time, but here we keep both head and tail pointers so we can garbage-collect from the head as soon as a byte falls out of the window. (Alternatively, we could still search back-to-front and get rid of the abort-early check for when we reach the maximum match length. I didn't think of this until just now, so I haven't tried it.)
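The head-and-tail variant described above can be sketched roughly as follows. This is a hedged illustration, not the PR's implementation: the `Window` type, its method names, the hash function, the table size, and the 0x111 length cap are assumptions. New positions are appended at a chain's tail, the position leaving the window is unlinked from its chain's head, and searches walk oldest-to-newest. The sketch also assumes that ties are resolved in favour of the oldest candidate.

```rust
const MIN_MATCH: usize = 3;
const MAX_MATCH: usize = 0x111; // assumed cap for this sketch
const WINDOW_SIZE: usize = 0x1000;
const HASH_BITS: u32 = 12; // assumed table size
const NIL: usize = usize::MAX;

/// Hypothetical sliding window whose chains are ordered oldest to newest.
struct Window<'a> {
    data: &'a [u8],
    head: Vec<usize>, // oldest in-window position per bucket
    tail: Vec<usize>, // newest in-window position per bucket
    next: Vec<usize>, // next[pos % WINDOW_SIZE] = next-newer position in the bucket
    pos: usize,       // current position; chains hold [pos - WINDOW_SIZE, pos)
}

impl<'a> Window<'a> {
    fn new(data: &'a [u8]) -> Self {
        let buckets = 1usize << HASH_BITS;
        Self {
            data,
            head: vec![NIL; buckets],
            tail: vec![NIL; buckets],
            next: vec![NIL; WINDOW_SIZE],
            pos: 0,
        }
    }

    fn hash(&self, pos: usize) -> usize {
        let d = self.data;
        let key = ((d[pos] as u32) << 16) | ((d[pos + 1] as u32) << 8) | (d[pos + 2] as u32);
        (key.wrapping_mul(0x9E37_79B1) >> (32 - HASH_BITS)) as usize
    }

    /// Move forward one byte: evict the position leaving the window, then
    /// append the current position to the tail of its chain.
    fn advance(&mut self) {
        if self.pos >= self.data.len() {
            return;
        }
        if self.pos >= WINDOW_SIZE {
            // The oldest position overall is always the head of its chain.
            let old = self.pos - WINDOW_SIZE;
            let h = self.hash(old);
            debug_assert_eq!(self.head[h], old);
            self.head[h] = self.next[old % WINDOW_SIZE];
            if self.head[h] == NIL {
                self.tail[h] = NIL;
            }
        }
        if self.pos + MIN_MATCH <= self.data.len() {
            let h = self.hash(self.pos);
            self.next[self.pos % WINDOW_SIZE] = NIL;
            let t = self.tail[h];
            if t == NIL {
                self.head[h] = self.pos;
            } else {
                self.next[t % WINDOW_SIZE] = self.pos;
            }
            self.tail[h] = self.pos;
        }
        self.pos += 1;
    }

    /// Longest match at the current position, searching front-to-back.
    fn find(&self) -> Option<(usize, usize)> {
        let pos = self.pos;
        if pos + MIN_MATCH > self.data.len() {
            return None;
        }
        let max_len = MAX_MATCH.min(self.data.len() - pos);
        let mut best = None;
        let mut best_len = MIN_MATCH - 1;
        let mut cand = self.head[self.hash(pos)];
        while cand != NIL {
            let len = (0..max_len)
                .take_while(|&i| self.data[cand + i] == self.data[pos + i])
                .count();
            if len > best_len {
                best_len = len;
                best = Some((cand, len));
                if len == max_len {
                    break; // abort early at the maximum match length
                }
            }
            cand = self.next[cand % WINDOW_SIZE];
        }
        best
    }
}
```

Because eviction happens eagerly, every node reachable from a chain's head is inside the window, so `find` needs no distance check while walking the chain.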

I used a script to test on OOT segments (https://gist.github.com/cadmic/ab7a8b2ce576f0c2ccd1f8b18f06dff0, run cargo build --release first). Everything matches and the user (CPU) time to run the script went from 95.75s to 4.55s.

@AngheloAlf AngheloAlf left a comment

Running some numbers:

cargo test --release

  • Current main:
test result: ok. 128 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 26.43s
  • This PR:
test result: ok. 128 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.25s

Compressing Majora's Mask:

  • Current main:
$ time make compress
.venv/bin/python3 tools/buildtools/compress.py --in build/n64-us/mm-n64-us.z64 --out build/n64-us/mm-n64-us-compressed.z64 --dma-start `tools/buildtools/dmadata_start.sh mips-linux-gnu-nm build/n64-us/mm-n64-us.elf` --compress `cat build/n64-us/compress_ranges.txt` --threads 4
Putting together the compressed rom...
.venv/bin/python3 -m ipl3checksum sum --cic 6105 --update build/n64-us/mm-n64-us-compressed.z64
Calculated checksum: 5354631C 03A2DEF0
Writing updated ROM to 'build/n64-us/mm-n64-us-compressed.z64'
2a0a8acb61538235bc1094d297fb6556  build/n64-us/mm-n64-us-compressed.z64
build/n64-us/mm-n64-us-compressed.z64: OK

real    0m34.510s
user    1m43.208s
sys     0m1.186s
  • This PR:
$ time make compress
.venv/bin/python3 tools/buildtools/compress.py --in build/n64-us/mm-n64-us.z64 --out build/n64-us/mm-n64-us-compressed.z64 --dma-start `tools/buildtools/dmadata_start.sh mips-linux-gnu-nm build/n64-us/mm-n64-us.elf` --compress `cat build/n64-us/compress_ranges.txt` --threads 4
Putting together the compressed rom...
.venv/bin/python3 -m ipl3checksum sum --cic 6105 --update build/n64-us/mm-n64-us-compressed.z64
Calculated checksum: 5354631C 03A2DEF0
Writing updated ROM to 'build/n64-us/mm-n64-us-compressed.z64'
2a0a8acb61538235bc1094d297fb6556  build/n64-us/mm-n64-us-compressed.z64
build/n64-us/mm-n64-us-compressed.z64: OK

real    0m2.502s
user    0m3.362s
sys     0m0.665s

It's pretty incredible.
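To put the timings quoted in this thread on a common scale, the speedup factors can be computed as before/after ratios. The snippet below is only a convenience sketch; the function names are made up for illustration, and the inputs are the numbers reported above (with `1m43.208s` converted to 103.208 seconds).

```rust
/// How many times faster the "after" run is than the "before" run.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

/// Timings quoted in this thread: (label, before, after), in seconds.
fn thread_timings() -> [(&'static str, f64, f64); 4] {
    [
        ("OOT segment script (user)", 95.75, 4.55),
        ("cargo test --release", 26.43, 1.25),
        ("make compress (real)", 34.510, 2.502),
        ("make compress (user)", 103.208, 3.362),
    ]
}

/// One formatted line per measurement, e.g. "label: N.Nx".
fn report() -> Vec<String> {
    thread_timings()
        .iter()
        .map(|(label, before, after)| format!("{label}: {:.1}x", speedup(*before, *after)))
        .collect()
}
```

The OOT script and the test suite both come out at roughly 21x, which is where the figure in the PR title comes from. The `make compress` run is roughly 14x faster in wall-clock time and roughly 31x faster in user CPU time.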

Would be nice to test this with a real project that uses MIO0 or Yay0 before merging tho

ethteck commented Jun 2, 2024

Paper Mario (Yay0):
time ./configure --clean us: 9.777s -> 9.608s
time ninja: 25.360s -> 24.576s

Compressing and decompressing Yay0 files makes up only a small part of the build time, so this isn't surprising to me. Still, nice to see some improvement! And it still matches, of course.

@AngheloAlf AngheloAlf merged commit 4f0d435 into decompals:main Jun 3, 2024
27 checks passed