broad/narrow xref qualifiers are not exported in OBO JSON format #28558

Open
balhoff opened this issue Jul 22, 2024 · 25 comments

@balhoff
Member

balhoff commented Jul 22, 2024

These are included in go-plus.owl, but not go-plus.json.

@balhoff balhoff self-assigned this Jul 22, 2024
@cmungall
Member

See also geneontology/obographs#102

@pgaudet
Contributor

pgaudet commented Aug 20, 2024

Hi @balhoff

RHEA folks also noted that in the go.owl and go.obo, only broad and exact RHEAs are exported, but narrow are missing.

Could all xrefs be added to all files?

Thanks, Pascale

@parit

parit commented Aug 20, 2024

An example for GO-Rhea narrowMatch would be GO:0004126 -> RHEA:13433 and RHEA:16069

@balhoff
Member Author

balhoff commented Aug 20, 2024

@pgaudet this is on purpose: #20770 (comment)

Should we change the policy? In the past all trailing qualifiers were removed from the main release, so we filtered narrowMatch xrefs before that happened. But now that I look at go.owl and go-basic.obo, I see trailing qualifiers. To be honest I'm not sure how that changed. So perhaps we should just stop filtering the narrow xrefs.

@balhoff
Member Author

balhoff commented Aug 20, 2024

Removal of trailing qualifiers is supposed to happen due to this line:

ontology-release-runner --catalog-xml catalog-v001.xml --ignoreLock --skip-release-folder --skip-format owx --skip-format metadata --outdir $(BUILD_DIR) --allow-overwrite --asserted --simple --no-reasoner --remove-trailing-qualifiers $< &&\

@pgaudet
Contributor

pgaudet commented Aug 20, 2024

It seems we didn't do what was requested:

probably NARROW and EXACT synonyms are OK; but we may not want to include BROAD synonyms for mappings.

We don't have NARROWs, but we have BROADs.

In the normal files we should have all mappings, making sure to include types, and we can discuss making simpler files if needed.

Does that make sense?

@balhoff
Member Author

balhoff commented Aug 20, 2024

It seems we didn't do what was requested:

probably NARROW and EXACT synonyms are OK; but we may not want to include BROAD synonyms for mappings.

We don't have NARROWs, but we have BROADs.

The mapping files are generated before the narrows are removed. My understanding is that @cmungall wanted the narrow xrefs to remain in the mappings files. But probably I should have revisited that workflow when we started using 'broad' as well.

@pgaudet
Contributor

pgaudet commented Aug 20, 2024

But NARROW are valid to make GO annotations. Why would we remove those?

@balhoff
Member Author

balhoff commented Aug 20, 2024

We have been leaving them in the mapping file, but removing them from the ontology, since the stripping of trailing qualifiers was confusing users (looking like there were several equivalent xrefs). But now it seems like the trailing qualifiers are not stripped anyway, which I don't understand after checking the makefile (this was not changed).
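To illustrate why stripped qualifiers were confusing, here is a minimal Python sketch. It is not the actual `--remove-trailing-qualifiers` implementation in ontology-release-runner; the helper name `strip_trailing_qualifiers` and the regular expression are assumptions made for illustration. The xref lines are taken from examples later in this thread.

```python
import re

# Hypothetical helper: drops a trailing "{...}" qualifier block from an OBO tag line.
# This only illustrates the effect; it is not how ontology-release-runner does it.
TRAILING_QUALIFIER = re.compile(r'\s*\{[^{}]*\}\s*$')


def strip_trailing_qualifiers(line: str) -> str:
    """Return the line without its trailing qualifier block, if it has one."""
    return TRAILING_QUALIFIER.sub('', line)


lines = [
    'xref: EC:2.8.2.23 {source="skos:narrowMatch"}',
    'xref: RHEA:15461 {source="skos:exactMatch"}',
]
stripped = [strip_trailing_qualifiers(line) for line in lines]
for line in stripped:
    print(line)
```

Once the qualifier is gone, nothing on the line distinguishes the narrow xref from the exact one, which is the situation users found misleading.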

@pgaudet
Contributor

pgaudet commented Aug 20, 2024

I'll put this on next Monday's call.

@pgaudet
Contributor

pgaudet commented Aug 20, 2024

The other issue is that the mapping file doesn't contain the mapping type, so it is not possible to use that file to make annotations.

@sjm41
Contributor

sjm41 commented Aug 22, 2024

These are included in go-plus.owl, but not go-plus.json.

Looks like the broad/narrow/exact/related xref qualifiers are also omitted from the go-basic.obo file, which is the one that FlyBase consumes. We want/need to be able to filter xrefs based on these qualifiers, so we'd also want them to appear in the go-basic.obo file.

@pgaudet
Contributor

pgaudet commented Aug 22, 2024

Right - we need all the cross-references in all the files, otherwise it creates more confusion.

@pgaudet
Contributor

pgaudet commented Dec 5, 2024

Fixed
See, for example, GO:0061775 - RHEA:13065 [Broad]: not in the mapping file anymore
https://release.geneontology.org/2024-11-03/ontology/external2go/rhea2go

Not in the go-basic.obo file
[Term]
id: GO:0061775
name: cohesin loader activity
namespace: molecular_function
def: "Facilitating a conformational change to load a cohesin complex around sister chromatids." [GOC:vw, PMID:26687354]
synonym: "cohesin loading activity" EXACT []
is_a: GO:0140097 ! catalytic activity, acting on DNA

Not in the go.obo file
[Term]
id: GO:0061775
name: cohesin loader activity
namespace: molecular_function
def: "Facilitating a conformational change to load a cohesin complex around sister chromatids." [GOC:vw, PMID:26687354]
synonym: "cohesin loading activity" EXACT []
is_a: GO:0140097 ! catalytic activity, acting on DNA
property_value: term_tracker_item #14205 xsd:anyURI
property_value: term_tracker_item #21700 xsd:anyURI
property_value: term_tracker_item #23400 xsd:anyURI
property_value: term_tracker_item #28520 xsd:anyURI
created_by: dph
creation_date: 2016-07-13T12:54:43Z


This just needs to be applied at the next GOA release (planned for Dec 2024), and will then trickle through to GOC, in early 2025.

@sjm41
Contributor

sjm41 commented Dec 5, 2024

@pgaudet Can I check what's going to happen with the narrowMatch xrefs, specifically EC xrefs in the go-basic.obo file - which of the following is correct?:

  1. narrowMatch xrefs will continue to be excluded from go-basic.obo (as now)
  2. narrowMatch xrefs are going to be added to go-basic.obo, and will be tagged with a narrowMatch qualifier
  3. narrowMatch xrefs are going to be added to go-basic.obo, but won't be tagged with a narrowMatch qualifier so will be indistinguishable from exactMatch xrefs in this file.

Thanks.

@pgaudet
Contributor

pgaudet commented Dec 5, 2024

#3 narrowMatch xrefs are going to be added to go-basic.obo, but won't be tagged with a narrowMatch qualifier so will be indistinguishable from exactMatch xrefs in this file.

The type will only be present in go-plus, as a skos property

@balhoff

@sjm41
Contributor

sjm41 commented Dec 5, 2024

#3 narrowMatch xrefs are going to be added to go-basic.obo, but won't be tagged with a narrowMatch qualifier so will be indistinguishable from exactMatch xrefs in this file.

I fear that will be confusing for consumers of go-basic.obo, since some GO terms will get multiple EC/RHEA/MetaCyc xrefs with no indication of why or how they differ, or which is the accurate 1:1 mapping.
Are we sure we want/need to include untyped narrowMatch xrefs in this file?

Including them will certainly be a problem for FlyBase, since we run a little GO2EC and GO2RHEA pipeline where we take our GO-MF annotations and computationally add an EC and RHEA annotation to GO-annotated genes, based on the xrefs in the go-basic file. To do this accurately, we need to use a file that either has only exactMatches or has additional xref types that are tagged with their type so we can filter them out.

If untyped narrowMatch xrefs are to be added to go-basic.obo, I wonder if @balhoff could generate a new mapping file (GO2EC/RHEA/MetaCyc) that includes just exactMatch xrefs that FlyBase (and any other interested group) could ingest to compute accurate GO2EC/RHEA/MetaCyc annotations based on their GO annotation set?
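As a rough sketch of what an exactMatch-only mapping could look like, the Python below reads OBO text and keeps only xrefs tagged `skos:exactMatch`. It assumes the xref lines carry a `{source="skos:..."}` trailing qualifier, as in the examples elsewhere in this thread; the function name `exact_xrefs` and the sample text are made up for illustration, and this is not an existing GO pipeline step.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the examples in this thread:
#   xref: RHEA:15461 {source="skos:exactMatch"}
XREF = re.compile(r'^xref:\s+(?P<xref>\S+)(?:\s+\{source="(?P<source>[^"]+)"\})?')

SAMPLE = """\
[Term]
id: GO:0008467
name: [heparan sulfate]-glucosamine 3-sulfotransferase activity
xref: EC:2.8.2.23 {source="skos:narrowMatch"}
xref: MetaCyc:2.8.2.23-RXN {source="skos:narrowMatch"}
xref: RHEA:15461 {source="skos:exactMatch"}

[Term]
id: GO:0008239
name: dipeptidyl-peptidase activity
xref: EC:3.4.14.1 {source="skos:narrowMatch"}
xref: EC:3.4.14.5 {source="skos:narrowMatch"}
"""


def exact_xrefs(obo_text: str, prefix: str) -> dict:
    """Map each term id to its xrefs with the given prefix tagged skos:exactMatch."""
    mapping = defaultdict(list)
    term_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith('['):  # a new stanza begins; forget the previous id
            term_id = None
        elif line.startswith('id: '):
            term_id = line[4:]
        elif term_id and (m := XREF.match(line)):
            # Untagged xrefs have source None and are skipped as well.
            if m['xref'].startswith(prefix + ':') and m['source'] == 'skos:exactMatch':
                mapping[term_id].append(m['xref'])
    return dict(mapping)


print(exact_xrefs(SAMPLE, 'RHEA'))
print(exact_xrefs(SAMPLE, 'EC'))
```

Note that this only works if the match type is present in the file being read; with untyped xrefs there is nothing to filter on.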

@pgaudet
Contributor

pgaudet commented Dec 5, 2024

Can you use the go-plus file? This will be included as a skos property.

Also, can you not rely on UniProt to provide the RHEA and EC mappings?

@balhoff
Member Author

balhoff commented Dec 5, 2024

One thing I want to add to @pgaudet's comment above is that we can include the skos relationships in the OBO files. There are some technical dependencies preventing me from adding those quite yet. @sjm41 when those are available would it be hard for you to use those values instead of xref?

@sjm41
Contributor

sjm41 commented Dec 6, 2024

Can you use the go-plus file? This will be included as a skos property.

FB currently only loads the go-basic file, and I was told that switching to, or additionally using, another version of GO would be too high a dev cost (especially in the current climate of transitioning to the Alliance...)

Also, can you not rely on UniProt to provide the RHEA and EC mappings?

Most of the D. melanogaster proteome is still unreviewed/TrEMBL only, so many computed RHEA/EC mappings are wrong because of upstream issues. Also, I have no control over UniProt-assigned RHEA/EC annotations, whereas I do have control by doing the mapping via GO annotations - and this also ensures consistency and synchrony.

One thing I want to add to @pgaudet's comment above is that we can include the skos relationships in the OBO files. There are some technical dependencies preventing me from adding those quite yet. @sjm41 when those are available would it be hard for you to use those values instead of xref?

That seems like it would work, but the skos relationships would still appear on the xref line of the OBO file, right? That is, they would appear like this:
xref: EC:1.1.1.38 {source="skos:broadMatch"}
xref: EC:1.1.1.40 {source="skos:broadMatch"}
xref: EC:4.1.1.112 {source="skos:exactMatch"}

I can understand the reason for adding all narrowMatch xrefs (untyped) to the EC2GO and rhea2GO mapping files, but including untyped narrowMatch xrefs in the OBO files just seems confusing to me. What is the reason/advantage of doing this?

@pgaudet
Contributor

pgaudet commented Dec 6, 2024

What is the reason/advantage of doing this?

This allows RHEA > GO mapping, and for this, narrowMatches are safe.

However I didn't realize anyone did GO > RHEA mapping from this. But they still seem OK in that direction as well? For example: GO:0047429 'nucleoside triphosphate diphosphatase activity'
has narrow mappings to

  • RHEA:23996 a ribonucleoside 5'-triphosphate + H2O = a ribonucleoside 5'-phosphate + diphosphate + H+
    and
  • RHEA:44644 a 2'-deoxyribonucleoside 5'-triphosphate + H2O = a 2'-deoxyribonucleoside 5'-phosphate + diphosphate + H+

These seem correct, and better than no mapping?

Also, if we remove these mappings from the OBO files, many terms end up with no mappings at all; I am not sure that's better?

We would also lose all the reactions that differ based on NAD/NADP.

@pgaudet
Contributor

pgaudet commented Dec 6, 2024

@balhoff Can you tell how many terms have narrow, broad and exact/nothing?
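One way such a tally could be computed, sketched in Python under the assumption that the xrefs have already been parsed into (xref, match type) pairs per term. The input dictionary and the function name `count_terms_by_match_type` are hypothetical; the three terms are examples from this thread, not real counts.

```python
from collections import Counter

# Hypothetical pre-parsed input: term id -> list of (xref, SKOS match type).
# GO:0061775 is shown as it appears in go-basic.obo above, with no xrefs.
xrefs_by_term = {
    'GO:0008239': [('EC:3.4.14.1', 'narrowMatch'), ('EC:3.4.14.5', 'narrowMatch')],
    'GO:0008467': [('EC:2.8.2.23', 'narrowMatch'), ('RHEA:15461', 'exactMatch')],
    'GO:0061775': [],
}


def count_terms_by_match_type(xrefs_by_term):
    """Count how many terms carry at least one xref of each match type."""
    counts = Counter()
    for xrefs in xrefs_by_term.values():
        match_types = {match_type for _, match_type in xrefs}
        if not match_types:
            counts['none'] += 1
        for match_type in match_types:
            counts[match_type] += 1
    return counts


counts = count_terms_by_match_type(xrefs_by_term)
print(counts)
```

A term with both narrow and exact xrefs is counted once under each type, so the totals can exceed the number of terms.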

@sjm41
Contributor

sjm41 commented Dec 6, 2024

This allows RHEA > GO mapping, and for this, narrowMatches are safe.

I thought the separate rhea2GO mapping file was used for that purpose, rather than the obo file? Or maybe you're saying that the rhea2GO mapping file is produced from the obo file, so the narrowMatch xrefs need to be in the obo file?

However I didn't realize anyone did GO > RHEA mapping from this. But they still seem OK in that direction as well? For example: GO:0047429 'nucleoside triphosphate diphosphatase activity'
has narrow mappings to
RHEA:23996 a ribonucleoside 5'-triphosphate + H2O = a ribonucleoside 5'-phosphate + diphosphate + H+
and
RHEA:44644 a 2'-deoxyribonucleoside 5'-triphosphate + H2O = a 2'-deoxyribonucleoside 5'-phosphate + diphosphate + H+
These seem correct, and better than no mapping?

Hmm, yes that example (and the NAD/NADP examples you mention) would work OK.
For the GO:0047429 case, there's an exactMatch to EC:3.6.1.9 (that covers both ribonucleoside and deoxyribonucleoside substrates), and narrowMatches to the two RHEAs specific for each substrate.

But I was thinking about a GO term like GO:0008239 dipeptidyl-peptidase activity. This has several narrowMatch EC xrefs, that are currently not in go-basic.obo, but would be added under the current proposal:
xref: EC:3.4.14.1 {source="skos:narrowMatch"}
xref: EC:3.4.14.11 {source="skos:narrowMatch"}
xref: EC:3.4.14.2 {source="skos:narrowMatch"}
xref: EC:3.4.14.4 {source="skos:narrowMatch"}
xref: EC:3.4.14.5 {source="skos:narrowMatch"}
These narrowMatch work OK in the EC->GO direction, but not in a GO->EC direction. That is, it would not be accurate to give a gene/protein annotated with the general "dipeptidyl-peptidase activity" all these more specific ECs:
EC:3.4.14.1 dipeptidyl-peptidase I
EC:3.4.14.11 Xaa-Pro dipeptidyl-peptidase
EC:3.4.14.2 dipeptidyl-peptidase II
EC:3.4.14.4 dipeptidyl-peptidase III
EC:3.4.14.5 dipeptidyl-peptidase IV

You may well be right, and overall it's 'better' for most users to include rather than exclude narrowMatch xrefs in the obo file.
But I do think we should include the 'narrowMatch' tags in the obo file so that users can identify them as such and choose to filter them out if required/desired.
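To make the direction argument concrete, here is a small Python sketch that builds the two lookup tables with different filters. The policy shown (exact and narrow for xref->GO, exact only for GO->xref) is an assumption that follows the dipeptidyl-peptidase reasoning above, not an agreed GO rule; the names `MAPPINGS`, `XREF_TO_GO_TYPES`, `GO_TO_XREF_TYPES` and `build_lookups` are invented for this example.

```python
from collections import defaultdict

# (GO id, xref, SKOS match type) triples copied from the examples in this thread.
MAPPINGS = [
    ('GO:0008239', 'EC:3.4.14.1', 'narrowMatch'),
    ('GO:0008239', 'EC:3.4.14.5', 'narrowMatch'),
    ('GO:0047429', 'EC:3.6.1.9', 'exactMatch'),
    ('GO:0047429', 'RHEA:23996', 'narrowMatch'),
    ('GO:0047429', 'RHEA:44644', 'narrowMatch'),
]

# Assumed policy for this sketch only.
XREF_TO_GO_TYPES = {'exactMatch', 'narrowMatch'}  # specific xref implies the GO term
GO_TO_XREF_TYPES = {'exactMatch'}                 # GO term implies only exact xrefs


def build_lookups(mappings):
    """Return (xref_to_go, go_to_xref), each filtered by its own match types."""
    xref_to_go = defaultdict(set)
    go_to_xref = defaultdict(set)
    for go_id, xref, match_type in mappings:
        if match_type in XREF_TO_GO_TYPES:
            xref_to_go[xref].add(go_id)
        if match_type in GO_TO_XREF_TYPES:
            go_to_xref[go_id].add(xref)
    return dict(xref_to_go), dict(go_to_xref)


xref_to_go, go_to_xref = build_lookups(MAPPINGS)
print(xref_to_go['EC:3.4.14.1'])   # the narrow EC still maps up to the GO term
print(go_to_xref.get('GO:0008239'))  # no exact xref, so nothing is inferred
```

A consumer could loosen `GO_TO_XREF_TYPES` for cases like GO:0047429 where the narrow RHEAs are judged acceptable, but that choice is only possible when the match type is available in the file.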

@pgaudet
Contributor

pgaudet commented Dec 6, 2024

Good point!

These are weird ones that we keep to be exhaustive WRT EC.

How about making them 'related'? Related synonyms are not exported (like Broad).

@sjm41
Contributor

sjm41 commented Dec 9, 2024

These are weird ones that we keep to be exhaustive WRT EC.
How about making them 'related'? Related synonyms are not exported (like Broad).

Yes, that should work to solve my immediate issue, and without the FlyBase devs having to change anything about our pipeline.

But can we clearly articulate when a synonym should be made 'related' rather than 'narrow'? That is, can we define 'weird'?

I think the EC (and MetaCyc) xrefs on this recently merged term (based on #28380 (comment)) would also be made 'related' by the same argument:
id: GO:0008467
name: [heparan sulfate]-glucosamine 3-sulfotransferase activity
xref: EC:2.8.2.23 {source="skos:narrowMatch"} = [heparan sulfate]-glucosamine 3-sulfotransferase 1
xref: EC:2.8.2.29 {source="skos:narrowMatch"} = [heparan sulfate]-glucosamine 3-sulfotransferase 2
xref: EC:2.8.2.30 {source="skos:narrowMatch"} = [heparan sulfate]-glucosamine 3-sulfotransferase 3
xref: MetaCyc:2.8.2.23-RXN {source="skos:narrowMatch"}
xref: MetaCyc:2.8.2.29-RXN {source="skos:narrowMatch"}
xref: MetaCyc:2.8.2.30-RXN {source="skos:narrowMatch"}
xref: RHEA:15461 {source="skos:exactMatch"}
