Zarr


Zarr is a cloud-native, chunked, compressed, and hierarchical array data format.
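To make "chunked" and "compressed" concrete, here is a minimal standard-library sketch. It is not the Zarr library and not a spec-complete store: `to_chunk_key` and `write_chunks` are hypothetical helpers, a plain dict stands in for a key-value store, and the dot-separated chunk keys only loosely follow the Zarr V2 naming convention.

```python
import json
import zlib
from itertools import product

def to_chunk_key(index, chunks):
    """Map an element index to the key of the chunk that holds it."""
    return ".".join(str(i // c) for i, c in zip(index, chunks))

def write_chunks(store, path, data, chunks):
    """Split a 2-D list of small ints (0..255) into chunks, one compressed value per key."""
    rows, cols = len(data), len(data[0])
    # Array-level metadata is stored as JSON next to the chunks.
    store[f"{path}/.zarray"] = json.dumps({"shape": [rows, cols], "chunks": list(chunks)})
    n_r = -(-rows // chunks[0])  # ceiling division
    n_c = -(-cols // chunks[1])
    for ci, cj in product(range(n_r), range(n_c)):
        # Toy simplification: edge chunks are truncated here rather than padded.
        block = bytes(
            data[r][c]
            for r in range(ci * chunks[0], min((ci + 1) * chunks[0], rows))
            for c in range(cj * chunks[1], min((cj + 1) * chunks[1], cols))
        )
        store[f"{path}/{ci}.{cj}"] = zlib.compress(block)

store = {}
data = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
write_chunks(store, "group/temperature", data, chunks=(2, 2))

# Reading element (3, 1) only requires fetching and decompressing one chunk.
key = to_chunk_key((3, 1), (2, 2))
chunk_values = list(zlib.decompress(store[f"group/temperature/{key}"]))
print(key, chunk_values)
```

Because every chunk lives under its own key, a reader can fetch just the chunks it needs, which is what makes the layout a good fit for object storage.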

Contents

Resources

Topics

Resources

Existing resources

The Zarr website is already an excellent resource for learning about Zarr and its ecosystem. This list is intended to complement the website with a curated and opinionated list of resources.

This list focuses on Geo/Earth Sciences, but is not limited to that domain.

Existing lists

Lists

Introductory videos

Introductory talks YouTube playlist

Two excellent and up-to-date introductory talks:

Zarr V3

Zarr V3 is the upcoming version of Zarr. It is a major update that will bring many new features and improvements.

If you're getting into Zarr now, it might be a good idea to start with Zarr V3.

For an excellent in-depth overview, see the ESIP series of talks.

Libraries

This list contains libraries that directly relate to Zarr in some way.

For implementations of Zarr, see Zarr Implementations.

Storage & I/O

ETL

Developer-oriented

  • numcodecs: Compression and transformation codecs used by Zarr
  • pydantic-zarr: Pydantic models for Zarr objects
  • traverzarr: Traversing Zarr JSON as if it's a filesystem
  • zarr_checksum: Calculating checksum information from Zarr
  • zarrdump: Describe zarr stores from the command line

Visualization: for visualization tools and libraries, see the Visualization section.

Kerchunk

Kerchunk allows you to efficiently read chunked data formats such as GRIB, NetCDF, and COGs by exposing them as a Zarr store.
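The underlying idea can be sketched with the standard library alone: chunk keys point at byte ranges inside the original file, so nothing is copied or converted. This is an illustrative toy, not Kerchunk's API. `read_reference` is a hypothetical helper, the temporary file stands in for an existing archival file, and the mapping is loosely modeled on the `[url, offset, length]` entries used in Kerchunk reference files.

```python
import os
import tempfile

def read_reference(refs, key):
    """Resolve a chunk key to bytes by reading one byte range from the original file."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Stand-in for an existing file: a 6-byte header followed by two 10-byte "chunks".
payload = b"HEADER" + b"chunk-zero" + b"chunk-one!"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    legacy_path = tmp.name

# Zarr-style chunk keys mapped to [file, offset, length]; the data stays where it is.
refs = {
    "var/0": [legacy_path, 6, 10],
    "var/1": [legacy_path, 16, 10],
}

first = read_reference(refs, "var/0")
second = read_reference(refs, "var/1")
os.remove(legacy_path)
print(first, second)
```

In practice the references would point at remote objects and the reader would issue ranged requests, but the lookup step is the same.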

Talks and tutorials

Future of Kerchunk

In the future, Kerchunk will be split into upstream functionality in Zarr itself and a new VirtualiZarr package.

Platforms

  • Arraylake: a data lake platform based on Zarr. The company, Earthmover, was started by core Zarr developers.

Articles

Talks & Videos

Existing lists

Talks

Life sciences

Zarr has seen great adoption in the life sciences domain.

  • bdz: Zarr-based format for storing quantitative biosystems dynamics data
  • ome-zarr-py: Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
  • ez_zarr: Easy, high-level access to OME-Zarr filesets
  • hdmf-zarr: Zarr I/O backend for HDMF

Talks and resources

Visualization

Most visualization work for Zarr has come from the bioimaging community:

Topics

Zarr & other array data formats

For a general overview, see

Essentially all other common array data formats can be exposed as Zarr. See Kerchunk.

NetCDF & HDF5

Zarr, NetCDF, and HDF5 are three separate data formats that nonetheless relate to each other in multiple ways.

Resources

COG: Cloud-Optimized GeoTIFF

N5

Zarr and N5 are two similar array data formats that share common goals and development.

The Zarr V3 spec aims to provide a common implementation target for both formats (sources: 1, 2).

Links

GeoZarr

GeoZarr is a proposal for a Zarr-based geospatial data format that is being submitted as an OGC standard.

GeoZarr will define a metadata convention for Zarr stores that contain geospatial data.

It will also define the relationship of Zarr with CF and NetCDF.

Links

Zarr & STAC

STAC provides a common structure for describing and cataloging spatiotemporal assets.

With its hierarchical structure and key-value metadata support, Zarr's capabilities overlap significantly with STAC.

The communities have not yet converged on a canonical representation of Zarr datasets through STAC.

Today, a good example of exposing Zarr in STAC is Planetary Computer.

More discussion & Related links

In the future, the Zarr V3 Spec and GeoZarr convention will likely enable greater interoperability between STAC and Zarr.