
Implement aggregate QC flags to meet NDBC GTS Ingest requirements #277

Closed
kbailey-noaa opened this issue Aug 31, 2023 · 21 comments

@kbailey-noaa
Contributor

NDBC currently does not read any glider QC flags as part of the GDAC harvest of real-time T and S data that they deliver to the GTS.

In order for NDBC to read QC flags, the GDAC must implement the QARTOD “Aggregate/Rollup” flag variable, following the IOOS Metadata Profile requirements:
https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#quality-controlqartod
and
https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#requirements-for-the-qartod-aggregaterollup-flag

NDBC will only examine contents of the aggregate_quality_flag variable for filtering purposes for GTS harvest. NDBC will not (and should not) read detailed QC flags.

Rules governing the ‘Aggregate/Rollup’ flag variable:

  • The value of the variable follows the UNESCO/QARTOD v2 convention, listed here from lowest to highest precedence:
    9 = Missing
    2 = Not Eval
    1 = Pass
    3 = Suspect
    4 = Fail
  • The value of the rollup flag should be the worst result out of all of the individual tests. It’s called a “Summary Flag” in the QARTOD Data Flags manual (pg 3).
  • Here’s how it’s done in the ioos_qc library with the corresponding test.
  • The variable should have a standard_name attribute of aggregate_quality_flag. See the QARTOD section for more information.
  • NDBC should exclude any values flagged as fail (4) or missing (9) but include everything else (not evaluated, pass, suspect).
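As a minimal sketch of the NDBC filtering rule above, assuming the aggregate flag arrives as a NumPy integer (possibly masked) array aligned one-to-one with the data samples, a consumer could build a keep-mask like this. `ndbc_keep_mask` is a hypothetical name for illustration, not NDBC's actual code.

```python
import numpy as np

# UNESCO/QARTOD v2 flag values used by the aggregate (rollup) variable
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9


def ndbc_keep_mask(agg_flags):
    """Hypothetical helper: True for samples that may be released to the GTS.

    Keeps PASS, NOT_EVALUATED and SUSPECT; drops FAIL and MISSING.
    Masked flag entries (e.g. _FillValue in a NetCDF file) count as MISSING.
    """
    flags = np.ma.filled(np.ma.asarray(agg_flags), MISSING)
    return np.isin(flags, [PASS, NOT_EVALUATED, SUSPECT])


# Illustrative data, not taken from a real glider file
temperature = np.array([12.1, 12.3, 99.0, 12.4, np.nan])
flags = np.array([1, 3, 4, 2, 9], dtype=np.int8)
keep = ndbc_keep_mask(flags)
released = temperature[keep]
```

Treating masked flags as MISSING keeps the rule conservative: a sample with no flag at all is never released.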

Please implement these aggregate flags at least for real-time T and S, as these are the only variables that NDBC is harvesting and delivering to the GTS.

NDBC POCs: Dawn Petraitis and Bill Smith
IOOS DMAC POC: Micah Wengren

@benjwadams
Contributor

The variable already exists in the underlying NetCDF dataset, named qartod_%(name)s_primary_flag: https://github.com/ioos/glider-dac/blob/main/glider_qc/glider_qc.py#L338-L366

It needs to be added to the ERDDAP datasets.xml by modifying the scripts/build_erddap_catalog.py script.

@leilabbb
Contributor

leilabbb commented Sep 6, 2023

The variable qartod_%(name)s_primary_flag does exist in the ERDDAP datasets.xml, so there is no need to modify the build_erddap_catalog.py script.

Modifications need to be applied to glider_qc.py: the flag_meanings and standard_name attributes need to be changed to adhere to the IOOS metadata requirements.
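For illustration, the attribute set being discussed could be assembled as below. The values mirror the `qartod_temperature_primary_flag` example from ru39_20231203T182905Z_rt.nc shown later in this thread; `primary_flag_attrs` is a hypothetical helper, not the function in glider_qc.py.

```python
def primary_flag_attrs(name, data_standard_name):
    """Hypothetical helper returning (variable name, attributes) for a rollup flag.

    In the NetCDF file, flag_values and valid_min/valid_max would be stored as
    int8 to match the flag variable's type.
    """
    # Same %-style template as the variable name used in glider_qc.py
    var_name = "qartod_%(name)s_primary_flag" % {"name": name}
    attrs = {
        "standard_name": "aggregate_quality_flag",  # required by the IOOS Metadata Profile
        "long_name": "QARTOD Primary Flag for %s" % data_standard_name,
        "flag_values": [1, 2, 3, 4, 9],
        "flag_meanings": "PASS NOT_EVALUATED SUSPECT FAIL MISSING",
        "valid_min": 1,
        "valid_max": 9,
        "ioos_category": "Quality",
        "qartod_test": "qc_rollup",
    }
    # CF expects one blank-separated meaning per flag value, in the same order
    assert len(attrs["flag_meanings"].split()) == len(attrs["flag_values"])
    return var_name, attrs


var_name, attrs = primary_flag_attrs("temperature", "sea_water_temperature")
```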

@kerfoot
Contributor

kerfoot commented Sep 6, 2023

I still do not understand what is going on with this. Here is the link to all data sets that have been updated since September 1, 2023:

Data Sets with min_time set to 2023-09-01

This returns 49 data sets. Now, if I leave the date as September 1, 2023 AND add the condition that the data set must contain the qartod_temperature_primary_flag:

Data Sets with min_time = 2023-09-01 AND contains qartod_temperature_primary_flag

The number of data sets returned goes from 49 to 1.

Question: why do the majority of real-time data sets not include the qartod* flags?

@leilabbb
Contributor

leilabbb commented Oct 3, 2023

Update on progress made:

  • The glider_qc.py script uses the ioos-qc library to create QARTOD test results for geophysical / legacy variables [pressure, conductivity, temperature, salinity, density].

  • The QARTOD rollup QC variable, with the name "qartod_[legacy variable]_primary_flag" and standard_name "aggregate_quality_flag", is created for every legacy variable present in the NetCDF glider file.

  • The aggregate QC flag uses the ioos-qc 'aggregate' and 'qartod_compare' methods to build the resultant rollup QC array from all tests run. For each data point, the rollup array takes the worst flag, with 4 = FAIL being the worst, using the following order (lowest to highest precedence):
    9 = Missing
    2 = Not Eval
    1 = Pass
    3 = Suspect
    4 = Fail

  • The QC tests implemented and working are:
    'gross_range_test'
    'qc_rollup'

  • The QC tests implemented but needing more testing are:
    'spike_test'
    'rate_of_change_test'

  • The QC test implemented but not working is:
    'flat_line_test'
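The precedence order used for the rollup, listed above, can be sketched in NumPy. This is an illustrative re-implementation, not the ioos_qc code: it assumes every test returns a 1-D flag array of the same length containing only the values 1, 2, 3, 4 and 9, and `rollup_flags` is a hypothetical name. In glider_qc.py the actual work is done by ioos_qc's 'aggregate' and 'qartod_compare'.

```python
import numpy as np

# Flags ordered from lowest to highest precedence; the highest one present wins.
PRECEDENCE = np.array([9, 2, 1, 3, 4], dtype=np.int8)

# Lookup table: flag value -> precedence rank (assumes only flags 1, 2, 3, 4, 9 occur)
RANK = np.zeros(10, dtype=np.int8)
RANK[PRECEDENCE] = np.arange(len(PRECEDENCE), dtype=np.int8)


def rollup_flags(test_results):
    """Hypothetical helper: per sample, return the worst flag across all tests."""
    stacked = np.vstack([np.asarray(r, dtype=np.int8) for r in test_results])
    worst_rank = RANK[stacked].max(axis=0)  # highest precedence in each column
    return PRECEDENCE[worst_rank]


# Illustrative per-test results for five samples
gross_range = [1, 1, 3, 9, 2]
spike = [1, 4, 1, 9, 2]
rate_of_change = [2, 1, 1, 9, 1]
aggregate = rollup_flags([gross_range, spike, rate_of_change])
```

With this ordering a sample only rolls up to 9 when every test reports missing, and a single PASS from any test outranks NOT_EVALUATED.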

See pull request #289.

@leilabbb
Contributor

Update:
The QC tests were tested further and the expected results were verified. Additional conditional statements need to be added to the QC method to handle special cases in the data arrays.

@kerfoot
Contributor

kerfoot commented Oct 12, 2023

The details of this were discussed at the 2023-10-12 tech meeting. Many of the QC tests are based on the assumption that the time data array is monotonically increasing. This means:

  1. There are no invalid timestamps (e.g., t == 0, or t before or after the deployment)
  2. There are no duplicate timestamps
  3. The timestamps are monotonically ascending

If these assumptions are not met, then many of the tests (flat line, spike, etc.) cannot provide an accurate qc assessment. There are many examples from previously submitted data sets in which this assumption is not met.

We discussed implementing a new test, run prior to all other qc tests, that checks for a monotonically ascending time array. If any of the conditions listed above are not met, there are a couple of options:

  1. Adding a global attribute that specifies whether these conditions have been met, e.g.:
    NC_GLOBAL:valid_times = "True"
    or:
    NC_GLOBAL:valid_times = "False"
  2. Removing the offending file(s) from the aggregation

If the value of the attribute in option 1 is "True", we proceed with all subsequent QC tests. This option is much easier to implement from a processing perspective. It was decided to evaluate these options further and make a decision during the following tech call.
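A minimal sketch of the proposed pre-check for option 1, assuming times are seconds since the epoch and that the deployment start and end are known. `times_are_valid` and the deployment window below are hypothetical; the only name taken from this discussion is the `valid_times` global attribute.

```python
import numpy as np


def times_are_valid(times, deployment_start, deployment_end):
    """Hypothetical check of the three timestamp assumptions listed above.

    Returns True only if every timestamp is finite, non-zero, inside the
    deployment window, and the array is strictly increasing (which rules out
    duplicate as well as out-of-order timestamps).
    """
    t = np.asarray(times, dtype=float)
    if t.size == 0:
        return False
    if not np.isfinite(t).all():
        return False
    in_window = (t != 0) & (t >= deployment_start) & (t <= deployment_end)
    if not in_window.all():
        return False
    return bool(np.all(np.diff(t) > 0))


# Hypothetical deployment window (epoch seconds) and a time array with a duplicate
deployment = (1_693_526_400.0, 1_694_131_200.0)
times = [1_693_526_400.0, 1_693_526_460.0, 1_693_526_460.0]
global_attrs = {"valid_times": str(times_are_valid(times, *deployment))}
```

Requiring every difference to be strictly positive covers the "no duplicates" and "ascending" assumptions in one pass; `np.all` over an empty difference array is True, so a single valid timestamp passes.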

@leilabbb
Contributor

leilabbb commented Dec 6, 2023

To add to the above discussion, there is also the case of geophysical variables with incorrect standard names: QARTOD is not run on these variables, and this needs to be addressed. Example: https://gliders.ioos.us/erddap/info/bass-20150827T1909/index.html

@kbailey-noaa
Contributor Author

Can someone please provide a status update for this issue? We (IOOS) have a meeting with NDBC reps next week, and this will probably come up.

@leilabbb
Contributor

leilabbb commented Feb 6, 2024

#277 should be marked as complete.
Creating a new issue to address challenges encountered in implementing the QC test on the data arrays. See #318

@kbailey-noaa
Contributor Author

@leilabbb Closing this issue means that the aggregate QC flags are implemented for T and S, following the IOOS Metadata Profile requirements, and that we are ready for NDBC to test. Is this the case?

@leilabbb
Contributor

Yes.
The aggregate QC flag is created for T and S unless the variable does not exist in the file or its array is all NaN or fill values.
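The creation condition above can be expressed as a small guard. This is a sketch only: `should_create_primary_flag` is a hypothetical name, `variables` stands in for whatever mapping of variable names to arrays the QC script works with, and masked entries (how netCDF4 presents `_FillValue` by default) are treated the same as NaN.

```python
import numpy as np


def should_create_primary_flag(variables, name, fill_value=None):
    """Hypothetical guard: only build an aggregate flag for a variable that
    exists and has at least one value that is not NaN, masked, or fill."""
    if name not in variables:
        return False
    # Masked entries become NaN so they are excluded along with real NaNs
    data = np.ma.filled(np.ma.asarray(variables[name], dtype=float), np.nan)
    valid = ~np.isnan(data)
    if fill_value is not None:
        valid &= data != fill_value
    return bool(valid.any())


# Illustrative inputs, not from a real file
variables = {
    "temperature": np.array([np.nan, np.nan]),
    "salinity": np.array([-999.0, 34.1]),
}
```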

Example variable "qartod_temperature_primary_flag" in file "ru39_20231203T182905Z_rt.nc":

<class 'netCDF4._netCDF4.Variable'>
int8 qartod_temperature_primary_flag(time)
valid_min: 1
valid_max: 9
_FillValue: 9
units: 1
flag_meanings: PASS NOT_EVALUATED SUSPECT FAIL MISSING
flag_values: [1 2 3 4 9]
ioos_category: Quality
qartod_test: 'qc_rollup'
standard_name: aggregate_quality_flag
long_name: QARTOD Primary Flag for sea_water_temperature
qartod_package: https://github.com/ioos/ioos_qc/blob/main/ioos_qc/qartod.py
references: http://gliders.ioos.us/static/pdf/Manual-for-QC-of-Glider-Data_05_09_16.pdf
qartod_config: {'gross_range_test': {'suspect_span': [0, 35], 'fail_span': [-2, 40]}, 'spike_test': {'suspect_threshold': 0.02396099641919136, 'fail_threshold': 0.04792199283838272}, 'rate_of_change_test': {'threshold': 0.03594149462878704}, 'flat_line_test': {'tolerance': 1, 'suspect_threshold': 3600, 'fail_threshold': 9000}}

The values of the flags:
masked_array(data=[1, --, --, 1, --, --, 1, --, 1, --, --, --, 1, --, 1,
--, --, 1, --, --, 1, --, 1, --, 1, --, --, 1, --, --,
--, 1, --, --, 1, --, --, 1, --, 1, --, --, 1, --, --,
--, --, 1, --, 1, --, --, 1, --, --, 1, --, 1, --, --,
1, --, --, --, 1, --, 1],
mask=[False, True, True, False, True, True, False, True,
False, True, True, True, False, True, False, True,
True, False, True, True, False, True, False, True,
False, True, True, False, True, True, True, False,
True, True, False, True, True, False, True, False,
True, True, False, True, True, True, True, False,
True, False, True, True, False, True, True, False,
True, False, True, True, False, True, True, True,
False, True, False],
fill_value=9,
dtype=int8)
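To show how the `gross_range_test` entry of the `qartod_config` attribute above translates into flags, here is a hedged NumPy re-implementation of a QARTOD gross range test. It is not the ioos_qc implementation (ioos_qc ships its own `gross_range_test`); it treats values equal to a span limit as inside the span, and `gross_range_flags` is a hypothetical name. The config is parsed with `ast.literal_eval`, assuming the attribute is stored as a Python-style dict string as it appears above.

```python
import ast

import numpy as np

PASS, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flags(values, fail_span, suspect_span=None):
    """Hypothetical gross range test: FAIL outside fail_span, SUSPECT outside
    suspect_span, PASS otherwise; NaN or masked samples are MISSING."""
    data = np.ma.filled(np.ma.asarray(values, dtype=float), np.nan)
    flags = np.full(data.shape, PASS, dtype=np.int8)
    with np.errstate(invalid="ignore"):  # NaN comparisons are simply False
        if suspect_span is not None:
            flags[(data < suspect_span[0]) | (data > suspect_span[1])] = SUSPECT
        flags[(data < fail_span[0]) | (data > fail_span[1])] = FAIL
    flags[np.isnan(data)] = MISSING
    return flags


# Gross range part of the qartod_config attribute shown above
config = ast.literal_eval(
    "{'gross_range_test': {'suspect_span': [0, 35], 'fail_span': [-2, 40]}}"
)
temps = [12.0, -1.0, 38.0, 45.0, float("nan")]
flags = gross_range_flags(temps, **config["gross_range_test"])
```

FAIL is applied after SUSPECT so that a value outside both spans ends up with the worse flag.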

@kbailey-noaa
Contributor Author

@mwengren can you please review?

@kbailey-noaa
Contributor Author

@leilabbb can you provide a link to a dataset where this is implemented?

@leilabbb
Contributor


This data set has the primary_flag implemented.

@kbailey-noaa
Copy link
Contributor Author

Thanks.
@mwengren Can you pls review this? Does the variable name matter, or does NDBC just look at the standard name?

I thought we'd expect to see a variable name of, for example, sea_water_temperature_qc_agg, but instead I see qartod_temperature_primary_flag. But the standard name is aggregate_quality_flag...

@leilabbb
Contributor

I think this was decided early on and was already in the system before I started working on QC. There is this document that has the *_primary_flag variable, which I believe won't break the GTS workflow.

@mwengren
Member

The IOOS Metadata Profile rules are based on using CF ancillary variables (so the data variable includes the name of the aggregate QC variable in its ancillary_variables attribute), and the aggregate variable also has standard name aggregate_quality_flag.

At a quick glance, it looks like this is in line with those rules. That should make it easier for NDBC to read glider files in the same way they read IOOS Metadata Profile files, but they would probably have to confirm that.
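A reader-side sketch of the lookup described above, assuming attributes have already been read into plain dicts (with netCDF4 this could be `{name: {a: var.getncattr(a) for a in var.ncattrs()} for name, var in ds.variables.items()}`). `find_aggregate_flag` and the gross range variable name in the example are hypothetical. Because this kind of lookup goes through `ancillary_variables` and `standard_name`, the flag variable's own name does not matter to it.

```python
def find_aggregate_flag(var_attrs, data_var):
    """Hypothetical lookup following the IOOS Metadata Profile pattern.

    var_attrs maps each variable name to its attribute dict. The data variable
    lists its QC variables in ancillary_variables (blank-separated, per CF);
    the aggregate one is identified by standard_name aggregate_quality_flag.
    Returns the flag variable name, or None if none is linked.
    """
    ancillary = var_attrs.get(data_var, {}).get("ancillary_variables", "")
    for candidate in ancillary.split():
        if var_attrs.get(candidate, {}).get("standard_name") == "aggregate_quality_flag":
            return candidate
    return None


# Illustrative attributes; qartod_temperature_gross_range_flag is a made-up name
var_attrs = {
    "temperature": {
        "standard_name": "sea_water_temperature",
        "ancillary_variables": "qartod_temperature_gross_range_flag qartod_temperature_primary_flag",
    },
    "qartod_temperature_gross_range_flag": {"standard_name": "gross_range_test_quality_flag"},
    "qartod_temperature_primary_flag": {"standard_name": "aggregate_quality_flag"},
}
```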

@kbailey-noaa
Contributor Author

Thanks all!
@leilabbb It sounds like we're ready to proceed with an NDBC test. Can you pls email Bill Smith and Dawn Petraitis (cc: GDAC team) this dataset with a request to test?

@leilabbb
Contributor

leilabbb commented Aug 29, 2024

Email sent.

@sarinamann-noaa

@sarinamann-noaa will request feedback from Bill Smith and Dawn Petraitis and can schedule a meeting to get information on their needs.

@sarinamann-noaa

---------- Forwarded message ---------
From: Dawn Petraitis - NOAA Federal
Date: Wed, Nov 13, 2024 at 1:18 PM
Subject: Re: GDAC - Testing the use of QARTOD variables for GTS release
To: BaghdadBrahim, Leila
Cc: Sarina Mann - NOAA Affiliate, Bill Smith - NOAA Affiliate, Adams, Benjamin, Moretti, Donald, Kathleen Bailey - NOAA Federal

Hi Leila,
Thanks for the confirmation. I have submitted a ticket to have NDBC do the needed work to adjust our code. Since I just submitted the ticket, I don't have a date for when it will be completed. Once I have a date, I'll let everyone know.

The good news is that we haven't seen any issues so far on our end with the addition of the QC aggregate flags to the real time data files.

Thanks,
Dawn
