Incorrect granule sizes #785
Comments
Unfortunately, this cannot be computed unambiguously because the supplied size values are not necessarily computed consistently. For example, some providers might compute an MB value as Unfortunately, this was a poor design decision in the UMM; it should simply have been designed so that the size reported in the metadata is always in bytes (which also avoids rounding errors, even when we know for sure what to multiply by). This is why SizeInBytes was later added to the UMM, in UMM-G v1.6. See the description in the schema, which describes exactly this problem. With that said, I don't currently have a suggestion for a sensible solution to this. Further, even if the above were not the case (i.e., even without any ambiguity), I'm not sure that computing the "size" as the sum of individual sizes makes sense. I suppose it might make sense if you want to know the total volume that would be downloaded if all files in the granule were downloaded, but I'm not sure that's a common use case. Even if it is, I would think we should also include a mechanism for users to obtain individual file sizes (again, ignoring the ambiguity mentioned above). One path to explore for the size ambiguity might be to provide some sort of In addition, we might want to provide some sort of
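To make the ambiguity and the per-file idea concrete, here is a minimal sketch, not a proposal for the actual API. It assumes the ambiguity is between decimal (1000-based) and binary (1024-based) multipliers, which the comment above does not spell out, and it assumes each file entry is a dict with `Size`, `SizeUnit`, and optionally `SizeInBytes` keys as in UMM-G. The function name `file_size_bounds` is hypothetical.

```python
from typing import Optional, Tuple

# Assumption: the two common readings of each unit (decimal vs. binary).
DECIMAL = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
BINARY = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}


def file_size_bounds(entry: dict) -> Optional[Tuple[int, int]]:
    """Return (low, high) size in bytes for one file entry, or None if unknown.

    Hypothetical helper: prefers SizeInBytes when present, otherwise brackets
    the size between the decimal and binary interpretation of Size/SizeUnit.
    """
    if entry.get("SizeInBytes") is not None:
        exact = int(entry["SizeInBytes"])
        return exact, exact
    size, unit = entry.get("Size"), entry.get("SizeUnit")
    if size is None or unit not in DECIMAL:
        # Missing size, missing unit, or a unit such as "NA": nothing to report.
        return None
    return round(size * DECIMAL[unit]), round(size * BINARY[unit])
```

When `SizeInBytes` is present the two bounds coincide; otherwise the gap between them shows how much the provider's choice of multiplier could matter for that file.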
What about having
I think what we must first do is clearly define the use cases and requirements around the use of any type of size "computation" we want to support. Without gaining some clarity around what we want/need, there's little sense in discussing how to implement anything. What specifically do we want to support/provide through a
How does Earthdata Search currently handle granule size estimation? I know that they provide an estimated size upon ordering (see screenshot). Maybe we could leverage their work? https://github.com/nasa/earthdata-search
Great suggestion @asteiker! This is what they say:
And they seem to convert units into a common unit: https://github.com/nasa/earthdata-search/blob/619d533e53906550ed6428162c25b4878d858768/static/src/js/util/project.js#L8 (there is more code). So I think we should follow similar logic, to be consistent with what users see in the NASA portal. Maybe we can be even more accurate with the size when we have that data available. This also relates to a conversation we had, @chuckwondo, about lazy loading of the results; I don't remember very well whether we covered using a "resultset" class where we could paginate the results from CMR, etc. For now, I think we should implement the following: if a granule has complete metadata on size and units, we should sum them up and report them to the user in
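The rule proposed above (sum only when every file has both a size and a unit) could be sketched roughly as follows. This is an illustration under stated assumptions, not the earthaccess or Earthdata Search implementation: it assumes the files live under `DataGranule.ArchiveAndDistributionInformation` in the UMM-G record, it assumes binary multipliers (1 MB = 1024 KB), and the function name `granule_size_mb` is hypothetical.

```python
from typing import Optional

# Assumption: binary multipliers; the thread does not settle which convention to use.
TO_MB = {
    "KB": 1 / 1024,
    "MB": 1.0,
    "GB": 1024.0,
    "TB": 1024.0**2,
    "PB": 1024.0**3,
}


def granule_size_mb(umm: dict) -> Optional[float]:
    """Sum all file sizes of a granule in MB, or return None if metadata is incomplete."""
    files = umm.get("DataGranule", {}).get("ArchiveAndDistributionInformation", [])
    if not files:
        return None
    total = 0.0
    for entry in files:
        size, unit = entry.get("Size"), entry.get("SizeUnit")
        if size is None or unit not in TO_MB:
            # Incomplete metadata for one file: refuse to report a misleading total.
            return None
        total += size * TO_MB[unit]
    return total
```

Returning `None` rather than a partial sum keeps the "complete metadata" condition explicit, so callers can decide how to present granules whose size is unknown.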
earthaccess is reporting incorrect granule sizes because we are not correctly parsing the UMM path that should contain the size. Usually a granule maps to a single file, and if the granule metadata contains the size in MB, everything works as expected. If the granule contains multiple files and/or the units are not MB, the reported size will be incorrect.
This issue was reported by David Giles
Example:
This granule is a great example: it contains multiple files in different units. The correct size should be ~47MB + 8KB
The method that needs to be updated is
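As a rough illustration of how a unit-blind calculation goes wrong for a granule like this one, the snippet below compares a naive sum with a unit-aware sum. The file entries and names are made up to mirror the ~47MB + 8KB example, the 1 MB = 1024 KB convention is an assumption, and the naive version is only one possible failure mode, not the actual earthaccess code.

```python
# Hypothetical file entries mirroring the example (names and exact values are made up).
files = [
    {"Name": "data_file", "Size": 47, "SizeUnit": "MB"},
    {"Name": "metadata_file", "Size": 8, "SizeUnit": "KB"},
]

# Unit-blind sum: treats every number as if it were MB.
naive_mb = sum(f["Size"] for f in files)

# Unit-aware sum, assuming 1 MB = 1024 KB.
KB_PER_MB = 1024
aware_mb = sum(
    f["Size"] / KB_PER_MB if f["SizeUnit"] == "KB" else f["Size"]
    for f in files
)

print(f"naive: {naive_mb} MB, unit-aware: {aware_mb:.4f} MB")
```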
earthaccess/earthaccess/results.py
Line 253 in be1ec48