More flexible HDF5 support (includes h5py) #290
Closed
d-chambers
started this conversation in
Ideas
Replies: 1 comment
-
closed by #310
-
In my experience (although I haven't tested it lately), pytables is faster than h5py for common read/write operations, and our current indexing scheme makes heavy use of pytables' database-like features. However, pytables doesn't support all of the HDF5 datatypes we encounter in DAS files. For example, reading a terra15 file spews about 8 warnings like this:
This is because pytables doesn't support enumerations. These warnings are annoying, but as long as the user doesn't need the value of one of these attributes, they can be worked around. However, as I was trying to add support for the Brady Hot Springs data (see #278), I found those HDF5 files use linked coordinates, which pytables does not support. As a result, these files simply cannot be read, or at least some important coordinates cannot be accessed.
To remedy both of these issues I propose the following:
- We take h5py as an official dependency. It appears it only requires numpy, so it should only add a single dependency.
- We create an abstraction in `dascore/utils/hdf5` which unifies data and attribute access for both h5py and pytables (so that either one can be used as a "backend" but the class behaves the same regardless). We could create two classes called `HDF5Reader` and `HDF5Writer`, and these would behave the same as pytables or h5py open files.
- We modify the stream/buffer reuse stuff mentioned here to return instances of `HDF5Reader` or `HDF5Writer` rather than a `pytables.File`. We could then make it so the reader can be specified as a parameter.
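To make the proposal a bit more concrete, here is a rough sketch of what a backend-agnostic reader might look like. Everything in it is an assumption for illustration only: the `backend` parameter, the `BACKENDS` registry, and the `get_attrs` / `get_array` method names are hypothetical and not existing dascore API. The h5py and pytables imports are done lazily so that only the selected backend needs to be importable.

```python
class _H5pyBackend:
    """Hypothetical adapter around an h5py file opened read-only."""

    def __init__(self, path):
        import h5py  # imported lazily; only needed when this backend is used

        self._file = h5py.File(path, "r")

    def get_attrs(self, node_path):
        # h5py exposes attributes through a dict-like ``.attrs`` object.
        return dict(self._file[node_path].attrs)

    def get_array(self, node_path):
        # Indexing with an empty tuple reads the whole dataset into memory.
        return self._file[node_path][()]

    def close(self):
        self._file.close()


class _PyTablesBackend:
    """Hypothetical adapter around a pytables file opened read-only."""

    def __init__(self, path):
        import tables  # imported lazily; only needed when this backend is used

        self._file = tables.open_file(path, mode="r")

    def get_attrs(self, node_path):
        attrs = self._file.get_node(node_path)._v_attrs
        # Only user attributes, skipping pytables' own bookkeeping attributes.
        return {name: attrs[name] for name in attrs._v_attrnamesuser}

    def get_array(self, node_path):
        return self._file.get_node(node_path).read()

    def close(self):
        self._file.close()


# Registry mapping a backend name to its adapter class (an assumed design).
BACKENDS = {"h5py": _H5pyBackend, "pytables": _PyTablesBackend}


class HDF5Reader:
    """Sketch of a reader that behaves the same regardless of backend."""

    def __init__(self, path, backend="pytables"):
        if backend not in BACKENDS:
            raise ValueError(f"unknown HDF5 backend: {backend!r}")
        self.backend = backend
        self._impl = BACKENDS[backend](path)

    def get_attrs(self, node_path="/"):
        """Return the attributes of a node as a plain dict."""
        return self._impl.get_attrs(node_path)

    def get_array(self, node_path):
        """Return the full contents of an array/dataset node."""
        return self._impl.get_array(node_path)

    def close(self):
        self._impl.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

With something like this, a format that hits pytables limitations could ask for `HDF5Reader(path, backend="h5py")`, while the indexing code keeps the pytables default. An `HDF5Writer` could follow the same pattern.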