HDF5Zarr reads HDF5 files with Zarr, building upon Cloud-Performant NetCDF4/HDF5 Reading with the Zarr Library. This allows efficient reading of HDF5 files stored remotely and integration with Zarr-based computation tools.
Requires the latest development installation of h5py, built against HDF5 >= 1.10.5.
Check the available HDF5 version:
$ h5cc -showconfig
Or install a compatible HDF5 with conda:
$ conda install "hdf5>=1.10.5"
Alternatively, download and install HDF5 manually, e.g.:
$ cd hdf5*/bin
$ ./h5redeploy
Follow the h5py instructions for a custom installation, for example:
$ HDF5_DIR=$CONDA_PREFIX pip install --no-binary=h5py h5py
$ HDF5_DIR=/path/to/hdf5 pip install --no-binary=h5py h5py
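To verify that the resulting h5py build is linked against a new enough HDF5 library, a quick check (a minimal sketch using h5py's version module):

import h5py
# HDF5Zarr requires HDF5 >= 1.10.5
print(h5py.version.hdf5_version)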
Install HDF5Zarr:
$ pip install git+https://github.com/catalystneuro/HDF5Zarr.git
HDF5Zarr can be used to read a local HDF5 file, with the datasets actually read through the Zarr library. Download the example dataset from https://girder.dandiarchive.org/api/v1/item/5eda859399f25d97bd27985d/download:
import requests
import os.path as op

file_name = 'sub-699733573_ses-715093703.nwb'
if not op.exists(file_name):
    response = requests.get("https://girder.dandiarchive.org/api/v1/item/5eda859399f25d97bd27985d/download")
    with open(file_name, mode='wb') as localfile:
        localfile.write(response.content)
import zarr
from hdf5zarr import HDF5Zarr

file_name = 'sub-699733573_ses-715093703.nwb'
# build the Zarr representation of the HDF5 file; max_chunksize is in bytes (2 MiB here)
hdf5_zarr = HDF5Zarr(filename=file_name, store_mode='w', max_chunksize=2*2**20)
zgroup = hdf5_zarr.consolidate_metadata(metadata_key='.zmetadata')
If no Zarr store is specified, the default zarr.MemoryStore is used.
Alternatively, pass a Zarr store such as:
store = zarr.DirectoryStore('storezarr')
hdf5_zarr = HDF5Zarr(file_name, store=store, store_mode='w')
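Because a DirectoryStore persists the translated metadata on disk, the same store can be reopened later without re-parsing the HDF5 file. A minimal sketch under that assumption, mirroring the read-mode usage in the remote example below:

import zarr
from hdf5zarr import HDF5Zarr

# reopen the previously written store in read mode
store = zarr.DirectoryStore('storezarr')
hdf5_zarr = HDF5Zarr('sub-699733573_ses-715093703.nwb', store=store, store_mode='r')
zgroup = hdf5_zarr.zgroup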
Examine the structure of the file using Zarr tools:
# print the hierarchy of groups and datasets
zgroup.tree()
# read a slice of a dataset
arr = zgroup['units/spike_times']
val = arr[0:1000]
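Standard Zarr array attributes can also be used to inspect how a dataset is exposed; a short sketch using the zgroup obtained above:

arr = zgroup['units/spike_times']
# shape, dtype and chunking of the Zarr view of this dataset
print(arr.shape, arr.dtype, arr.chunks)
# summary including compressor and store type
print(arr.info)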
Once you have a zgroup object, it can be read by PyNWB using NWBZARRHDF5IO:
from hdf5zarr import NWBZARRHDF5IO
io = NWBZARRHDF5IO(mode='r+', file=zgroup)
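From there, reading proceeds as with any PyNWB IO object; a brief sketch (nwbfile.units assumes the units table present in this example file):

nwbfile = io.read()
print(nwbfile.units)  # units table backing the 'units/spike_times' dataset read above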
Export metadata from the Zarr store to a single JSON file:
import json

metadata_file = 'metadata'
with open(metadata_file, 'w') as mfile:
    json.dump(zgroup.store.meta_store, mfile)
Reading a file remotely requires the local metadata_file constructed in the previous steps:
import json
import s3fs
from hdf5zarr import HDF5Zarr, NWBZARRHDF5IO

# import metadata from the JSON file
with open(metadata_file, 'r') as mfile:
    store = json.load(mfile)

# open the HDF5 file on S3 and read it through the exported metadata
fs = s3fs.S3FileSystem(anon=True)
f = fs.open('dandiarchive/girder-assetstore/4f/5a/4f5a24f7608041e495c85329dba318b7', 'rb')
hdf5_zarr = HDF5Zarr(f, store=store, store_mode='r')
zgroup = hdf5_zarr.zgroup
io = NWBZARRHDF5IO(mode='r', file=zgroup, load_namespaces=True)
Here is the entire workflow for opening a file remotely:
import zarr
import s3fs
from hdf5zarr import HDF5Zarr, NWBZARRHDF5IO

# build the Zarr metadata store from a local copy of the file
file_name = 'sub-699733573_ses-715093703.nwb'
store = zarr.DirectoryStore('storezarr')
hdf5_zarr = HDF5Zarr(filename=file_name, store=store, store_mode='w', max_chunksize=2*2**20)
zgroup = hdf5_zarr.consolidate_metadata(metadata_key='.zmetadata')

# reuse the store to read the same file directly from S3
fs = s3fs.S3FileSystem(anon=True)
f = fs.open('dandiarchive/girder-assetstore/4f/5a/4f5a24f7608041e495c85329dba318b7', 'rb')
hdf5_zarr = HDF5Zarr(f, store=store, store_mode='r')
zgroup = hdf5_zarr.zgroup
io = NWBZARRHDF5IO(mode='r', file=zgroup, load_namespaces=True)
nwb = io.read()
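As a brief usage sketch once the file is read (assuming the standard NWB units table contained in this example file; data values are fetched through Zarr on access):

units = nwb.units
# spike times of the first unit, read through the Zarr-backed datasets
spike_times = units['spike_times'][0]
print(len(spike_times))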