Add more filters #154

Open: 35 commits to be merged into `master`.

asinghvi17 (Member) commented Aug 20, 2024

This PR aims to add the following filters, for compatibility with NetCDF4 and HDF5 files:

  • Fletcher32
  • FixedScaleOffset
  • Quantize
  • Shuffle
  • Delta

It also splits the filters into a new folder and documents the API for people who might want to add new filters.
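
To illustrate what two of the listed filters do, here is a minimal sketch in plain Python, assuming the conventional definitions: Delta stores the difference between consecutive values, and Shuffle regroups the bytes of fixed-size elements so that the k-th bytes of all elements sit together. The function names are hypothetical, and this is not the Julia code added in this PR.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each difference to its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Undo delta_encode with a running sum."""
    return list(accumulate(deltas))


def shuffle(buf: bytes, elementsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1 of every element, and so on."""
    if len(buf) % elementsize != 0:
        raise ValueError("buffer length must be a multiple of elementsize")
    n = len(buf) // elementsize
    return bytes(buf[i * elementsize + j] for j in range(elementsize) for i in range(n))


def unshuffle(buf: bytes, elementsize: int) -> bytes:
    """Invert shuffle, restoring the original element-by-element byte order."""
    if len(buf) % elementsize != 0:
        raise ValueError("buffer length must be a multiple of elementsize")
    n = len(buf) // elementsize
    return bytes(buf[j * n + i] for i in range(n) for j in range(elementsize))
```

Both transforms are lossless round trips; they are typically applied before a compressor because the transformed bytes tend to be more repetitive.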

coveralls commented Aug 20, 2024

Pull Request Test Coverage Report for Build 11975993358

Details

  • 126 of 160 (78.75%) changed or added relevant lines in 8 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-1.2%) to 86.355%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/Filters/vlenfilters.jl | 4 | 5 | 80.0% |
| src/Filters/shuffle.jl | 35 | 37 | 94.59% |
| src/metadata.jl | 1 | 4 | 25.0% |
| src/Filters/Filters.jl | 10 | 14 | 71.43% |
| src/Filters/delta.jl | 13 | 17 | 76.47% |
| src/Filters/fletcher32.jl | 37 | 42 | 88.1% |
| src/Filters/quantize.jl | 15 | 20 | 75.0% |
| src/Filters/fixedscaleoffset.jl | 11 | 21 | 52.38% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/Storage/dictstore.jl | 1 | 96.15% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 11952631782 | -1.2% |
| Covered Lines | 905 |
| Relevant Lines | 1048 |

💛 - Coveralls

asinghvi17 changed the title from "[WIP] Add more filters" to "Add more filters" on Aug 23, 2024
asinghvi17 marked this pull request as ready for review on August 23, 2024, 22:14
asinghvi17 (Member, Author) commented Aug 23, 2024

Everything here more or less works.

I also want to add this to the Python tests to test the I/O. What's the best way to do that?

- Kerchunk often encodes the compressor as the last filter, so we check that the compressor isn't hiding in the filters array if the compressor is null.
- Similarly, the dtype is often unknown in this case, or the transform is not encoded correctly, so we ensure that the datatypes of `data` and `a2` remain the same by reinterpreting.
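
The first check described above can be sketched as follows. This is a minimal Python illustration, assuming Zarr-v2-style array metadata held in a dict with `compressor` and `filters` keys; the helper name `split_trailing_compressor` and the `COMPRESSOR_IDS` set are hypothetical and are not taken from this PR.

```python
# Assumption: codec ids treated as compressors for this illustration only.
COMPRESSOR_IDS = {"zlib", "blosc", "zstd"}


def split_trailing_compressor(meta: dict) -> dict:
    """Return a copy of `meta` where a compressor found at the end of
    `filters` is moved into the `compressor` slot, if that slot is null."""
    fixed = dict(meta)
    filters = list(fixed.get("filters") or [])
    if (
        fixed.get("compressor") is None
        and filters
        and filters[-1].get("id") in COMPRESSOR_IDS
    ):
        # The last "filter" is really the compressor: pull it out.
        fixed["compressor"] = filters.pop()
    fixed["filters"] = filters or None
    return fixed
```

For example, metadata with `"compressor": None` and filters `[shuffle, zlib]` would come back with `zlib` as the compressor and only `shuffle` left in the filter list; the input dict is left unmodified.
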

asinghvi17 (Member, Author) commented

This PR is pretty much done from my end. I will continue developing and refactoring the compressor/filter stack under a single Codec type in a new branch, https://github.com/JuliaIO/Zarr.jl/tree/as/codecs, which already holds some preliminary API work.

asinghvi17 added a commit to JuliaIO/Kerchunk.jl that referenced this pull request Sep 13, 2024
This means we won't load the filters unless Zarr doesn't have them.

meggart (Collaborator) commented Nov 6, 2024

Thanks again, @asinghvi17. Last time we talked, you mentioned you wanted to add some last changes to this PR. Is it ready for review now?

asinghvi17 (Member, Author) commented

I still need to add some tests going from Python -> Julia and back again, should be able to get to that today though. Thanks for the ping :D

meggart (Collaborator) commented Nov 22, 2024

Hey @asinghvi17, another ping. I would really like to start working on the zarrv3 branch again, but would ideally merge this one first and then rebase to minimize conflicts. Also, if you could rebase this PR on current master, the unrelated test failures would go away.

not just vectors, as it was previously constrained to
- Never use `reinterpret`.
- Use array comprehensions to support 0-dimensional arrays correctly; the performance impact is negligible based on testing.
- Only round if the target type is an integer; if it is a float, leave the value unrounded.
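
The rounding rule in the last bullet can be sketched like this. It is a minimal Python illustration, assuming the usual scale-offset formulation (encode as `(x - offset) * scale`, decode as `e / scale + offset`); the function names and the `integer_target` flag are hypothetical, and this is not the Julia implementation from this PR.

```python
def fso_encode(values, offset, scale, integer_target=True):
    """Scale-offset encode; round only when the target type is an integer."""
    scaled = [(v - offset) * scale for v in values]
    if integer_target:
        # Integer storage needs whole numbers, so round before converting.
        return [int(round(s)) for s in scaled]
    # Float storage: keep the scaled values exactly as computed.
    return scaled


def fso_decode(encoded, offset, scale):
    """Invert the transform (exact only up to the rounding done on encode)."""
    return [e / scale + offset for e in encoded]
```

With `offset=1000` and `scale=10`, values near 1000 are stored as small integers holding one decimal digit of precision, and decoding recovers them to within that precision. The comprehension form also mirrors the "array comprehensions" point above, since it maps element by element without assuming a particular container shape.
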

asinghvi17 (Member, Author) commented Nov 22, 2024

Sorry, this slipped off my radar for a bit. Should be done now, and all CI seems green as well. Feel free to merge whenever convenient for you.

3 participants