broad/narrow xref qualifiers are not exported in OBO JSON format #28558
See also geneontology/obographs#102 |
Hi @balhoff RHEA folks also noted that in the go.owl and go.obo, only broad and exact RHEAs are exported, but narrow are missing. Could all xrefs be added to all files? Thanks, Pascale |
An example for GO-Rhea narrowMatch would be GO:0004126 -> RHEA:13433 and RHEA:16069 |
@pgaudet this is on purpose: #20770 (comment) Should we change the policy? In the past all trailing qualifiers were removed from the main release, so we filtered narrowMatch xrefs before that happened. But now that I look at go.owl and go-basic.obo, I see trailing qualifiers. To be honest I'm not sure how that changed. So perhaps we should just stop filtering the narrow xrefs. |
Removal of trailing qualifiers is supposed to happen due to this line: go-ontology/src/ontology/Makefile Line 265 in 617fba5
|
It seems we didn't do what was requested:
We don't have NARROWs, but we have BROADs. In the normal files we should have all mappings, making sure to include types, and we can discuss making simpler files if needed. Does that make sense? |
The mapping files are generated before the narrows are removed. My understanding is that @cmungall wanted the narrow xrefs to remain in the mappings files. But probably I should have revisited that workflow when we started using 'broad' as well. |
But NARROW are valid to make GO annotations. Why would we remove those? |
We have been leaving them in the mapping file, but removing them from the ontology, since the stripping of trailing qualifiers was confusing users (looking like there were several equivalent xrefs). But now it seems like the trailing qualifiers are not stripped anyway, which I don't understand after checking the makefile (this was not changed). |
I'll put this on next Monday's call. |
The other issue is that the mapping file doesn't contain the mapping type, so it is not possible to use that file to make annotations. |
Looks like the broad/narrow/exact/related xref qualifiers are also omitted from the go-basic.obo file, which is the one that FlyBase consumes. We want/need to be able to filter xrefs based on these qualifiers, so we'd also want them to appear in the go-basic.obo file. |
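For readers who want to do the kind of filtering described above, here is a minimal sketch of picking out xrefs by mapping type. It assumes the qualifier is written as an OBO trailing qualifier of the form `{source="skos:narrowMatch"}` on the xref line; that shape, and the helper names `parse_xref` and `filter_xrefs`, are assumptions for illustration and not something confirmed in this thread.

```python
import re

# Assumed xref line shape (not confirmed in this thread):
#   xref: RHEA:13433 {source="skos:narrowMatch"}
XREF_RE = re.compile(
    r'^xref:\s+(?P<id>\S+)'          # the xref identifier
    r'(?:\s+"[^"]*")?'               # optional quoted description
    r'(?:\s+\{(?P<quals>[^}]*)\})?'  # optional trailing qualifier block
)
MATCH_RE = re.compile(r'skos:(exact|narrow|broad|related)Match')


def parse_xref(line):
    """Return (xref_id, match_type) for an xref line, or None for other lines.

    match_type is None when no SKOS mapping qualifier is found.
    """
    m = XREF_RE.match(line.strip())
    if m is None:
        return None
    found = MATCH_RE.search(m.group("quals") or "")
    return m.group("id"), (found.group(1) if found else None)


def filter_xrefs(lines, keep=("exact",)):
    """Keep only the xref ids whose mapping type is listed in `keep`."""
    kept = []
    for line in lines:
        parsed = parse_xref(line)
        if parsed is not None and parsed[1] in keep:
            kept.append(parsed[0])
    return kept
```

With this approach an untyped xref is never treated as exact, so a consumer that needs 1:1 mappings drops it rather than guessing.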
Right - we need all the cross-references in all the files, otherwise it creates more confusion. |
Fixed Not in the go-basic.obo file Not in the go.obo file This just needs to be applied at the next GOA release (planned for Dec 2024), and will then trickle through to GOC, in early 2025. |
@pgaudet Can I check what's going to happen with the narrowMatch xrefs, specifically EC xrefs in the go-basic.obo file - which of the following is correct?:
Thanks. |
I fear that will be confusing for consumers of go-basic.obo, since some GO terms will get multiple EC/RHEA/MetaCyc xrefs with no indication of why or how they differ, or which is the accurate 1:1 mapping. Including them will certainly be a problem for FlyBase, since we run a little GO2EC and GO2RHEA pipeline where we take our GO-MF annotations and computationally add an EC and RHEA annotation to GO-annotated genes, based on the xrefs in the go-basic file. To do this accurately, we need to use a file that either has only exactMatches or has additional xref types that are tagged with their type so we can filter them out. If untyped narrowMatch xrefs are to be added to go-basic.obo, I wonder if @balhoff could generate a new mapping file (GO2EC/RHEA/MetaCyc) that includes just exactMatch xrefs that FlyBase (and any other interested group) could ingest to compute accurate GO2EC/RHEA/MetaCyc annotations based on their GO annotation set? |
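To make the concern concrete, here is a rough sketch of a GO-to-EC propagation step of the kind described above. It is not the FlyBase pipeline; the function name `propagate`, the input shapes, and the `"exact"` label are all hypothetical. It assumes each GO term's xrefs are already available as `(xref_id, match_type)` pairs.

```python
def propagate(annotations, xrefs_by_term, prefix="EC", keep=("exact",)):
    """Add xref-derived annotations to genes based on their GO annotations.

    annotations:   iterable of (gene_id, go_id) pairs
    xrefs_by_term: dict mapping go_id -> list of (xref_id, match_type)
    Only xrefs with the given prefix and a match type in `keep` are used.
    """
    derived = {}
    for gene_id, go_id in annotations:
        for xref_id, match_type in xrefs_by_term.get(go_id, []):
            if match_type in keep and xref_id.startswith(prefix + ":"):
                derived.setdefault(gene_id, set()).add(xref_id)
    return derived
```

If the xrefs arrive without a type (`match_type` is `None`), this step cannot tell exact from narrow mappings: it must either skip them all or accept them all, which is the problem raised here.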
Can you use the go-plus file? This will be included as a skos property. Also, can you not rely on UniProt to provide the RHEA and EC mappings? |
FB currently only loads the go-basic file, and I was told that switching to or additionally using another version of GO would be too high a dev cost (especially in the current climate of transitioning to Alliance....)
Most of the D. melanogaster proteome is still unreviewed/TrEMBL only, so many computed RHEA/EC mappings are wrong because of upstream issues. Also, I have no control over UniProt-assigned RHEA/EC annotations, whereas I do have control by doing the mapping via GO annotations - and this also ensures consistency and synchrony.
That seems like it would work, but the skos relationships would still appear on the xref line of the OBO file, right? That is, they would appear like this: I can understand the reason for adding all narrowMatch xrefs (untyped) to the EC2GO and rhea2GO mapping files, but including untyped narrowMatch xrefs in the OBO files just seems confusing to me. What is the reason/advantage of doing this? |
This allows RHEA > GO mapping, and for this, narrowMatches are safe. However I didn't realize anyone did GO > RHEA mapping from this. But they still seem OK in that direction as well? For example: GO:0047429 'nucleoside triphosphate diphosphatase activity'
These seem correct, and better than no mapping? Also, if we remove these mappings from the OBO files, many terms end up with no mappings at all; I am not sure that's better? We would also lose all the reactions that differ based on NAD/NADP. |
@balhoff Can you tell how many terms have narrow, broad and exact/nothing? |
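One way to get such a tally is sketched below. This is an illustration only, not the method used for the GO release. It assumes a plain OBO text file with blank-line-separated stanzas and mapping types written as `skos:...Match` in trailing qualifiers; the function name `count_terms_by_match_type` is made up.

```python
import re
from collections import Counter

MATCH_RE = re.compile(r'skos:(exact|narrow|broad|related)Match')


def count_terms_by_match_type(obo_text):
    """Count [Term] stanzas per mapping type found on their xref lines.

    A term is counted once for each distinct type it carries, under
    "untyped" if it has an xref with no SKOS qualifier, and under
    "no_xref" if it has no xref lines at all.
    """
    counts = Counter()
    for stanza in obo_text.split("\n\n"):
        lines = [ln.strip() for ln in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the header and non-term stanzas
        types = set()
        for line in lines:
            if line.startswith("xref:"):
                found = MATCH_RE.search(line)
                types.add(found.group(1) if found else "untyped")
        if not types:
            counts["no_xref"] += 1
        for match_type in types:
            counts[match_type] += 1
    return counts
```

Because a term with both narrow and exact xrefs is counted under both types, the per-type totals can add up to more than the number of terms.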
I thought the separate rhea2GO mapping file was used for that purpose, rather than the obo file? Or maybe you're saying that the rhea2GO mapping file is produced from the obo file, so the narrowMatch xrefs need to be in the obo file?
Hmm, yes that example (and the NAD/NADP examples you mention) would work OK. But I was thinking about a GO term like GO:0008239 dipeptidyl-peptidase activity. This has several narrowMatch EC xrefs, that are currently not in go-basic.obo, but would be added under the current proposal: You may well be right, and overall it's 'better' for most users to include rather than exclude narrowMatch xrefs in the obo file. |
Good point! These are weird ones that we keep to be exhaustive WRT EC. How about making them 'related'? Related synonyms are not exported (like Broad). |
Yes, that should work to solve my immediate issue, and without the FlyBase devs having to change anything about our pipeline. But can we clearly articulate when a synonym should be made 'related' rather than 'narrow'? That is, can we define 'weird'? I think the EC (and MetaCyc) xrefs on this recently merged term (based on #28380 (comment)) would also be made 'related' by the same argument: |
These are included in go-plus.owl, but not go-plus.json.