Understanding the source (and quality) of Inferred from Electronic Annotation (IEA) #369
Replies: 3 comments
Generally, what we refer to as the source is the group/entity that made the annotation. For this one, the source (the group/database that sent it to GO and is responsible for maintaining it) is the entry in column 15 (Assigned By), which does happen to be InterPro, but the Assigned By database does NOT have to match the with/from identifier.

The with/from field contains an identifier (which in theory could be from any of the databases we recognise), and in this case the identifier happens to be one from InterPro. Databases tend to use their own internal identifiers (SGD will use an SGD identifier when possible, etc.), as they typically don't want to add an extra conversion step, or because the alternative isn't an exact match: some internal identifiers may be more specific (see MGI:5206529 and MGI:3837824 vs. an outside identifier like P11087 or CO1A1_MOUSE). Some groups may not have internal identifiers for some pieces of information already captured elsewhere, like EC numbers.

So, no, the database identifier used in the with/from field does not necessarily identify the group or method that made the annotation. The group making the annotation may use an identifier from another database to describe the information used to make the annotation, such as a hypothetical annotation where InterPro specifies SGD:S000001855 in column 8 to indicate that the sequence of S. cerevisiae ACT1 was compared to the annotated entity to draw a conclusion about the protein's function.

As for the method, you must look at the Reference field. GO_REF:0000004 indicates the annotation was based on UniProt keyword mapping. GO_REF:0000002 indicates the annotation was made by associating GO terms with InterPro records. GO_REF:0000104 involves UniRule, where GO terms are manually assigned to each rule in UniRule. Again, see https://github.com/geneontology/go-site/tree/master/metadata/gorefs#readme for the full descriptions.
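To make the column roles concrete, here is a minimal Python sketch that pulls the Reference (column 6), With/From (column 8) and Assigned By (column 15) fields out of one annotation line. It assumes a tab-separated GAF 2.x line; the function name `describe_iea` and the lookup table `GO_REF_METHODS` are illustrative only, and the table holds just the three GO_REFs mentioned above, not the full list.

```python
import re

# Method descriptions for the three GO_REFs named in this reply only;
# the gorefs README linked above has the full, authoritative list.
GO_REF_METHODS = {
    "GO_REF:0000002": "GO terms associated with InterPro records",
    "GO_REF:0000004": "UniProt keyword mapping",
    "GO_REF:0000104": "UniRule (GO terms manually assigned to each rule)",
}


def describe_iea(gaf_line):
    """Summarise source and method for one tab-separated GAF 2.x line."""
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError("expected at least 15 tab-separated columns")
    references = cols[5].split("|")  # column 6: DB:Reference
    # Column 8 (With/From) may hold several IDs joined by '|' or ','.
    with_from = [x for x in re.split(r"[|,]", cols[7]) if x]
    return {
        "evidence": cols[6],      # column 7: evidence code, e.g. IEA
        "assigned_by": cols[14],  # column 15: the group that made the annotation
        "references": references,
        "methods": [GO_REF_METHODS.get(r, "not in this sketch") for r in references],
        "with_from_prefixes": sorted({x.split(":", 1)[0] for x in with_from}),
    }


# The InterPro example from the question, rebuilt with tabs
# (columns 16 and 17 are left empty).
EXAMPLE = "\t".join([
    "UniProtKB", "A0A011N8T2", "mpl", "involved_in", "GO:0071555",
    "GO_REF:0000002", "IEA", "InterPro:IPR005757", "P",
    "UDP-N-acetylmuramate--L-alanyl-gamma-D-glutamyl-meso-2,6-diaminoheptandioate ligase",
    "mpl|AW10_02615", "protein", "taxon:1454003", "20210612", "InterPro", "", "",
])

info = describe_iea(EXAMPLE)
print(info["assigned_by"], info["references"], info["with_from_prefixes"])
```

The with/from prefix is reported separately from `assigned_by` on purpose: the two can differ, so the prefix alone should not be read as the source.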
@pgaudet or @thomaspd would you like to address the user's first question:
Hi @suzialeksander, intuitively that seems reasonable. However, AFAIK this was never benchmarked, so it's not really possible to make this statement unless it is qualified by "we assume that but have no evidence for"...
A User has written into our Helpdesk email:
Our goal is to understand the source (and quality) of the Inferred from Electronic Annotation (IEA) entries.
I was trying to find the field that will encode the source/method of the IEA entry.
based on http://geneontology.org/docs/guide-go-evidence-codes/
I see that the three main sources for the IEA entries are:
Can I assume they are numbered 1 to 3 following a decrease in accuracy/reliability?
The list I've attached [above] contains all the unique categories of "with/from fields" (without the details following the colon, as you've noticed).
What about all those other fields that we've seen? How are they generated electronically?
From what I understood the "with/from field" should potentially encode detail about the source of the IEA http://wiki.geneontology.org/index.php/Inferred_from_Electronic_Annotation_(IEA)
For example the following IEA entry:
UniProtKB A0A011N8T2 mpl involved_in GO:0071555 GO_REF:0000002 IEA InterPro:IPR005757 P UDP-N-acetylmuramate--L-alanyl-gamma-D-glutamyl-meso-2,6-diaminoheptandioate ligase mpl|AW10_02615 protein taxon:1454003 20210612 InterPro
with/from field: InterPro:IPR005757. Does "InterPro" here encode that the source is InterPro2GO?
Which field is the last entry, "InterPro"? Is this the field that encodes the source?
How can I find out whether an entry was created from "UniProt controlled vocabulary terms" (2nd IEA source type)?
I can see several "with/from field" categories that contain this title:
UniProtKB-SubCell
UniProtKB-KW
UniProtKB
For example this entry:
UniProtKB A0A009KJV3 rplA enables GO:0003723 GO_REF:0000043 IEA UniProtKB-KW:KW-0694 F 50S ribosomal protein L1 rplA|J518_2488 protein taxon:1310619 20210612 UniProt
Does the last field, "UniProt", specify that the source corresponds to a UniProt controlled vocabulary term?
Is "UniRule" related to UniProt entries?
Can I assume that the "ensembl" with/from field category specifies that the annotation is inferred from Ensembl gene trees (3rd IEA source type)?