normalize ENVO terms #25
Comments
Is this a matter of normalizing ENVO terms to something else (more authoritative, better structured, or with better coverage)? Or is it a matter of normalizing from the NMDC/MIxS schema to ENVO? Or from user-submitted values (intended for NMDC/MIxS) to ENVO?
name->ID
let's look at the input table
^^ these are ok. This also conforms to our schema
but look at others
^^ The submitter gave strings, not IDs. We want to fix this:
- replace aquatic with ENVO ID for aquatic biome
- replace saline water with ENVO ID for aquatic biome
- I think "pacific ocean" is just the wrong string for env_local_scale
- for ones that can't be matched, just report and move on
- replace each string with MIxS syntax "LABEL [ENVO:nnnn]"
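The repair described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the lookup tables, the function names `to_mixs` and `normalize_column`, and the ENVO IDs are all placeholders (the IDs are deliberately fake), and a real version would load labels and IDs from ENVO itself.

```python
import re

# Hypothetical label -> ID table. The ID is a placeholder, NOT a real ENVO identifier.
LABEL_TO_ID = {"aquatic biome": "ENVO:9999991"}

# Hypothetical mapping from submitter strings to ENVO class labels.
SYNONYMS = {"aquatic": "aquatic biome"}

# Values that already look like 'LABEL [ENVO:nnnn]' are left untouched.
MIXS_PATTERN = re.compile(r"^.+ \[ENVO:\d+\]$")


def to_mixs(value):
    """Return 'LABEL [ENVO:nnnn]' for a submitted string, or None if unmatched."""
    text = value.strip()
    if MIXS_PATTERN.match(text):
        return text
    key = text.lower()
    label = SYNONYMS.get(key, key)
    envo_id = LABEL_TO_ID.get(label)
    if envo_id is None:
        return None
    return f"{label} [{envo_id}]"


def normalize_column(values):
    """Normalize a list of values; unmatched values are kept as-is and reported."""
    normalized, unmatched = [], []
    for value in values:
        result = to_mixs(value)
        if result is None:
            unmatched.append(value)
            normalized.append(value)
        else:
            normalized.append(result)
    return normalized, unmatched
```

Calling `normalize_column(["aquatic", "pacific ocean"])` would rewrite the first value and return the second in the unmatched list, which matches the "just report and move on" behaviour.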
@hrshdhgd have you done much with this yet? @wdduncan helped me find relevant input data and utilities and I have been reading about MIxS in general. I think I could do the following now: map unique values from
Also @cmungall and others, it seems that
Thanks @wdduncan. I'm curious, but this is probably not relevant to this task:
@turbomam I'm not sure about the meaning of the
<BioSample submission_date="2008-04-04T08:44:24.950" last_update="2019-06-20T16:11:22.271" publication_date="2008-04-04T00:00:00.000" access="public" id="2" accession="SAMN00000002">
  <Ids>
    <Id db="BioSample" is_primary="1">SAMN00000002</Id>
    <Id db="WUGSC" db_label="Sample name">19655</Id>
    <Id db="SRA">SRS000002</Id>
  </Ids>
  ....
</BioSample>
In this case the
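For reference, the identifiers in a record like the one above can be read with the standard library. This is a minimal sketch, assuming the record is available as a string; the elided part of the record is simply left out, and `extract_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record quoted above; the elided portion is omitted.
XML = """<BioSample submission_date="2008-04-04T08:44:24.950" last_update="2019-06-20T16:11:22.271" publication_date="2008-04-04T00:00:00.000" access="public" id="2" accession="SAMN00000002">
  <Ids>
    <Id db="BioSample" is_primary="1">SAMN00000002</Id>
    <Id db="WUGSC" db_label="Sample name">19655</Id>
    <Id db="SRA">SRS000002</Id>
  </Ids>
</BioSample>"""


def extract_ids(xml_text):
    """Return (accession, {db: id}) for a single BioSample element."""
    root = ET.fromstring(xml_text)
    ids = {}
    for id_el in root.findall("./Ids/Id"):
        # Key each identifier by the database that issued it.
        ids[id_el.get("db")] = id_el.text
    return root.get("accession"), ids


accession, ids = extract_ids(XML)
```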
@turbomam, by
I have not yet. I think that seems like a good plan.
I'm guessing a
Something we'll need to discuss further
There is a field named
I also just noticed that the
I have been working on runNER some more and I have added the following features:
Question: @cmungall, while adding the MIxS syntax in the format -
Notebook for documenting steps towards normalizing ENVO terms in the three columns: env_broad_scale, env_medium, and env_local_scale
These are mostly strings. Some do not correspond to a class label, e.g. 'tundra'
There should be a repair step that gets the IDs. I suggest a denormalized/flattened schema where we append _id onto the field name, e.g. env_local_scale_id=ENVO:nnnn. In the NMDC/MIxS schema this is a compound object.
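One way the flattened form might look, as a rough sketch only: it assumes each record is a plain dict and that repaired values already carry the ID in square brackets. The function name `flatten_env_ids` and the ID in the usage note are made up for illustration.

```python
import re

# The three environmental context fields discussed in this issue.
ENV_FIELDS = ("env_broad_scale", "env_local_scale", "env_medium")

# Captures the ID from a value such as 'LABEL [ENVO:nnnn]'.
ID_PATTERN = re.compile(r"\[(ENVO:\d+)\]")


def flatten_env_ids(record):
    """Return a copy of record with a '<field>_id' key for each env field that has an ID."""
    flat = dict(record)
    for field in ENV_FIELDS:
        match = ID_PATTERN.search(str(record.get(field, "")))
        if match:
            flat[f"{field}_id"] = match.group(1)
    return flat
```

For example, a record with `env_local_scale` set to `"aquatic biome [ENVO:9999991]"` (a placeholder ID) would gain `env_local_scale_id="ENVO:9999991"`, while a bare string such as `"tundra"` would get no `_id` field until it is repaired.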