get db metadata to check for completeness #313
Comments
I understand that you need to check if, in an arkimet dataset, for each day you have data for a whole set of products and levels, on each day (or every 6 hours, or the actual model output interval). Is that understanding correct? |
yes, hourly analysis and forecast, different variables, some superficial, some on different levels (and we go back to 2018, still the reference year for aq). |
I cannot think of any existing functionality that does something that specific out of the box, but it should be reasonably doable with a bit of Python. This is an example script that queries a dataset at regular intervals and checks that there is some data for every interval. It could be a good base to build on when checking your required combinations of metadata:
#!/usr/bin/python3
import argparse
import datetime
from collections import defaultdict
import arkimet as arki
class Instant:
    def __init__(self):
        self.levels = set()
        self.products = set()

class Checker:
    def __init__(self):
        self.instants = defaultdict(Instant)

    def on_metadata(self, md):
        reftime = md.to_python("reftime")["time"]
        instant = self.instants[reftime]
        try:
            instant.levels.add(md["level"])
        except KeyError:
            pass
        try:
            instant.products.add(md["product"])
        except KeyError:
            pass

    def report(self):
        begin = min(self.instants)
        until = max(self.instants)
        cur = begin
        while cur <= until:
            try:
                instant = self.instants.get(cur)
                if instant is None:
                    print("data missing for reftime", cur)
                    continue
                if not instant.levels:
                    print("levels missing at reftime", cur)
                if not instant.products:
                    print("products missing at reftime", cur)
            finally:
                cur = cur + datetime.timedelta(hours=1)

def main():
    parser = argparse.ArgumentParser(description="check a dataset for completeness")
    parser.add_argument("dataset", action="store", help="Path to the dataset")
    args = parser.parse_args()
    checker = Checker()
    with arki.dataset.Session() as session:
        cfg = arki.dataset.read_config(args.dataset)
        with session.dataset_reader(cfg=cfg) as ds:
            ds.query_data("reftime:every 1 hour", on_metadata=checker.on_metadata)
    checker.report()

if __name__ == "__main__":
    main() |
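To go from "some data exists" to "every required combination exists", the report step could compare what was seen at each reference time against a fixed set of required (product, level) pairs. The following is only a sketch of that comparison in plain Python, with no arkimet calls: the names find_missing and expected_instants and the product and level labels are hypothetical, and it assumes the on_metadata callback fills a mapping from reftime to the pairs it has seen.

```python
import datetime
from collections import defaultdict


def expected_instants(begin, until, step):
    """Yield every reference time from begin to until, inclusive."""
    cur = begin
    while cur <= until:
        yield cur
        cur += step


def find_missing(seen, required, begin, until, step=datetime.timedelta(hours=1)):
    """Return {reftime: sorted list of required (product, level) pairs not seen}."""
    missing = {}
    for reftime in expected_instants(begin, until, step):
        # .get() avoids creating empty entries in a defaultdict
        absent = required - seen.get(reftime, set())
        if absent:
            missing[reftime] = sorted(absent)
    return missing


# Hypothetical labels: in practice, use whatever on_metadata extracts
required = {("t", "surface"), ("t", "ml10"), ("o3", "ml10")}

# Stand-in for the mapping that on_metadata would fill
seen = defaultdict(set)
t0 = datetime.datetime(2018, 1, 1, 0)
seen[t0].update(required)                                     # complete hour
seen[t0 + datetime.timedelta(hours=2)].add(("t", "surface"))  # partial hour

report = find_missing(seen, required, t0, t0 + datetime.timedelta(hours=2))
for reftime, absent in report.items():
    print(reftime, "missing:", absent)
```

An hour with no data at all and an hour with partial data are reported the same way, as a list of missing pairs, which keeps the output to one line per incomplete reference time.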
Thanks! I tried to adapt it to my case, but I struggled with the documentation about the metadata, and I have a couple of questions. From various prints (print(dir(md))), I discovered:
About the dataset: with arki-query I can also get information about a grib file. Can I use this script with dataset = grib: file.grib, as on the command line? I tried but could not make it work. Thanks |
As a general pointer, which doesn't answer your questions at the moment, the existing documentation for the Metadata class in Python can be found here: https://arpa-simc.github.io/arkimet/python/arkimet.html#arkimet.Metadata The dictionary keys for to_python are different for each metadata type (origin, level, ...) and for each style of metadata type (grib product, bufr product, ...). There is no detailed documentation of the representation as a dictionary, and
If you need values for levels you can extract them from the dictionary of
You should be able to open a grib file as if it were a dataset by passing its path to arkimet.dataset.read_config:
import arkimet
with arkimet.dataset.Session() as session:
    cfg = arkimet.dataset.read_config("file.grib")
    ... |
One more question: with the arki-query command line, I can query several datasets by listing them (arki-query '' dataset1 dataset2). Can I put a list of datasets in args.dataset?
with arki.dataset.Session() as session: |
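On the script side, accepting several dataset paths is a matter of argparse rather than arkimet. A minimal sketch, assuming each path is then opened and checked in turn exactly as the single dataset is in the script above; the thread does not establish whether arkimet can query several datasets in one call. The names build_parser and check_all are hypothetical, and the lambda is only a stand-in for the real per-dataset check.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="check datasets for completeness")
    # nargs="+" collects one or more positional arguments into a list
    parser.add_argument("datasets", nargs="+", help="Paths to one or more datasets")
    return parser


def check_all(paths, check_one):
    """Call check_one(path) for each path and collect the results by path."""
    return {path: check_one(path) for path in paths}


# Simulate: script.py dataset1 dataset2
args = build_parser().parse_args(["dataset1", "dataset2"])

# Stand-in for the real check, which would open the dataset at `path`
# and run the Checker from the script above on it
results = check_all(args.datasets, lambda path: f"checked {path}")
print(results)
```

Keeping one result per path makes it easy to see which dataset has the gaps, instead of merging everything into a single report.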
Hi,
I would like to check whether my db is complete, or how complete it is and which fields are missing. The db is grib, covering a couple of years, with selected model levels and variables.
As far as I understand, my options are:
arki-query --dump, and then process the text (multiple lines for each datum, very long text for a couple of years of db, ...);
arki-query --dump --summary --summary-restrict levels, and check everything (shorter text, but the same procedure repeated multiple times).
Is there a way to get an output with only the unique metadata parameters, one per row?
Do you have any advice?
Thanks
Lidia