-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Represent enzymatic complex enablers according to GO-CAM spec #302
Comments
Yes! We have treated case 2 as obviously true for a long time, but I'm not sure the code to enable it has ever been implemented in the GO-CAM conversion process. And when it is implemented, it should also generate a report of all the catalystActivity instances that it fixed in this way, to be fed back to Reactome to patch the Reactome annotations that are the single source of truth here. @dustine32 ? |
Also, asking @huaiyumi for any examples of active subunit annotation in Reactome that I can look for in the BioPAX. |
There are 1784 catalystActivity instances in our central database whose activeUnit slot is not null. Let me figure out who to ask here to get you a table of the subset of these instances that has actually been released. We should be able to generate a table of the dbID of each instance, its physicalEntity (the complex), its activeUnit (the individual EWAS gene product), and the dbID and name of the reaction in which it occurs. Are there other attributes you'd want in the table? |
@dustine32 but meanwhile, here is a short arbitrary list of catalystActivity instances whose physicalEntity is a heteromeric complex and whose activeUnit is a protein monomer, as a starting point to begin to explore the BioPAX to see what can be done on point 2, above, and what a useful format would be for bulk processing. https://reactome.org/content/schema/instance/browser/1806156 Each URL points to a page that lists the names and dbIDs of the heteromeric complex, the protein monomer activeUnit, and the reaction that the caqtalystActivity mediates. I can also make a list of samples of catalystActivity instances where the physicalEntity is a set of heteromeric complexes and the activeUnit is a set of monomers or a set of subcomplexes, also of cases where the heteromeric complex involves both protein and non-protein (RNA or DNA or small-molecule) subunits, if any of those are of interest. I hope, from this test material, we can figure out what you need in a comprehensive list. |
@dustine32 Does the catalyst activity here help? R-HSA-21271 |
@deustp01 @ukemi Thank you for these examples! I don't really need the full list of all activeUnits as these few helped me find where in the BioPAX I can expect to find them. An example for reaction R-HSA-1675883:
Here, the |
If I understand what you're saying correctly, yes, if you look at the instancebrowser view for the EWAS PI4KB [Golgi membrane] (Homo sapiens) its role as the activeUnit of a complex involved in catalysis is shown as a comment. But if you come at the annotation from the other direction - start with the catalystActivity instance 1-phosphatidylinositol 4-kinase activity of ARF1/3:GTP:PI4KB [Golgi membrane] then its role is shown as an attribute. Or am I misunderstanding the problem? Also, it makes sense to me to work starting from reactions that have associated catalystActivities, systematically looking at what the physicalEntity of each catalystActivity is, and if that physicalEntity is not at EWAS or set of EWASs, then proceed further to see if it fits case 2 above. Also also, if a by-product of this survey were a list of catalystActivites where the physicalEntity is a complex or set of complexes but the activeUnit slot is null, that list would be the starting point for re-curation to fill the empty slots. And if in each case the components of the complex could be checked in the central GO annotation file to see if any have been assigned the same GO molecular function as Reactome has assigned to the whole complex, that would make the re-curation process at Reactome much faster and more reliable. @ukemi I know we talked about something like this with Ben Good; I don;t know how close he got to implementing it. Or does this last part duplicate work you've already done to generate the tables described in #296 (which I haven't looked at yet)? |
That failed - in many cases the catalystActivity of the whole complex has been assigned to all of its protein components - but perhaps re-doing it with the cleaned-up fly set of complex component functions would yield good results. |
Summarizing the discussion so far as a to-do list.
|
@dustine32 @ukemi just to document the current state / need, here is an active unit in the first reaction of the "carnitine biosynthesis" pathway and in the derived GO-CAM. The physical entity is a complex involving one copy of one gene product and one copy each of a couple of chemical entities. Can the GO-CAM generation script be re-done to identify the gene product and make it the enabler? Or if that is hard or dangerous, can we plan to generate a list (partial is OK to start) of the number 2 cases, that we can use to figure out how to bulk-edit the Reactome annotations to add the missing activeUnit annotations, so that the existing GO-CAM genertion script can use them? (A practical issue here is whether David and I, as we (re)curate pathways should add this information manually as part of our work, or leave it out because a script will soon be available to do it automatically? |
@dustine32 I think this proposal is directly in line with what we had talked about on the call today. I think you have already done it for when there is a single GP as the enabler, but we should do it with the complexes too. For these cases, would it be possible to use the UniProt GCRP identifiers instead of a REACTO id? We need to start weeding away the Reacto identifiers. |
Hi @dustine32. Note that part of the requirement in the second point of this post is to ignore small molecules.
This is consistent with the discussion in #327. |
For Reactome complexes that are controllers of enzymatic reactions, these should be aligned with GO-CAM to specify gene product IDs and not PRO IDs. To do this, we will need to handle two different cases:
The text was updated successfully, but these errors were encountered: