Add GO-CAM stats & downloads to the pipeline #1180
Comments
For 2, we will use this query.
Some details discussed by email: The query to fetch all the GO-CAMs having at least 3 activities connected through causal relationships is this one:

Dustin, you could create a method to call that SPARQL query from there: go_stats.py#L132 This would modify the go-stats.json .

Then, for these stats to appear in the go-annotation-changes.json , you may need to alter part of this code: go-stats/go_annotation_changes.py#L8 as well as this one: go_annotation_changes.py#L180 , which is used to create the text/tab report that Pascale checks before the release.

If we want these stats to be used from the GO website, then they also have to be added to the go-stats-summary.json (loaded on the front page to show the stats on the top right). You can pick up the stats to add to that file here: go-stats/go_reports.py#L229
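A minimal sketch of one reading of the "at least 3 activities connected through causal relationships" criterion, in Python. This is not the SPARQL query referred to above. It assumes the causal edges have already been fetched into a hypothetical dict mapping each model ID to a list of (activity, activity) pairs, and it treats the edges as undirected. The function names are illustrative and do not exist in go_stats.py.

```python
from collections import defaultdict


def largest_connected_group(edges):
    """Return the size of the largest set of activities linked by the edges."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    best = 0
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk over one connected component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def count_causal_models(model_edges, min_activities=3):
    """Count models with at least `min_activities` causally connected activities.

    `model_edges` is an assumed input shape: {model_id: [(activity, activity), ...]}.
    """
    return sum(
        1
        for edges in model_edges.values()
        if largest_connected_group(edges) >= min_activities
    )
```

The resulting count could then be stored under a new key in go-stats.json; the key name and where it is written are left to the actual implementation.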
From the 2021-09-14 Alliance Pathways call: add a condition to the "Total count of GO-CAMs" query to select only modelstate=="production" models.
Shouldn't only "production" be available on the prod SPARQL endpoint?
Just be careful not to select the _inferred GO-CAMs in your query, but yes those will only be production models from that triple store
@kltm Ah yes, you are correct, thanks! Forgot that that is part of producing the endpoint triplestore. Shouldn't be an issue then. @lpalbou Thanks for the heads up about excluding the "_inferred" models! @kltm Would these models also already get excluded from the production triplestore? I couldn't find any containing "inferred" in the title.
So, if I'm following here (tagging in @balhoff), there may be two types of models in a store: "real" models (noctua-generated and imports) and GAF-derived. The latter are likely to be uninteresting (with imported models being a separate interesting case--they may need to be marked). I guess the idea would be to filter those GAF-derived ones out; it might be worthwhile to look at their creation to see how they can be easily filtered.
From the 2021-11-05 Slack #developers discussion: The Alliance site's gene page pathway viewer has a GO-CAMs tab with a number that's currently computed on the fly by a call to the GO-CAM API, which then queries the GO production RDF triplestore: To reduce the number of calls to the GO triplestore, we could just precompute this number (or better yet a Adding this brainstorming note here since it's likely the code area where we would be implementing. Tagging @kltm
Hi Dustin. If you want to precompute, I would suggest starting from all the genes in GO-CAMs (fewer than in the Alliance) and creating a dict { GP1 -> [model1, model2] , GP2 -> [model3] ... }. That file could indeed be updated at every release and used by the GO-CAM API, since the goal was to only show publicly released models. The out-of-sync issue is a good point though... I wonder if GO-CAMs couldn't be published every month as .json files on release.geneontology.org ? The S3 could then serve as the source of data and it would be in sync with the cached API. It could help users get access to GO-CAMs as well, especially if the JSON is already in a format structured around activities ? Have a good weekend :)
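A rough sketch of the precomputed dict suggested above, in Python. It assumes the gene product / model associations are already available as (gene_id, model_id) pairs, for example from a SPARQL result. The function names and the output layout are hypothetical, not an existing part of the pipeline or the GO-CAM API.

```python
import json
from collections import defaultdict


def build_gene_model_index(pairs):
    """Group model IDs by gene product: {GP1: [model1, model2], GP2: [model3]}."""
    index = defaultdict(set)
    for gene_id, model_id in pairs:
        index[gene_id].add(model_id)
    # Sort keys and values so the file is stable from one release to the next.
    return {gene: sorted(models) for gene, models in sorted(index.items())}


def dump_index(index):
    """Serialize the index as JSON text, ready to be written at release time."""
    return json.dumps(index, indent=2, sort_keys=True)
```

With such a file, the number shown for a gene would be `len(index.get(gene_id, []))`, without a call to the triplestore.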
Ah thanks so much! It definitely helps to get your confirmation here. An activity-centric JSON format standard, ready for external users to consume, would be a good way to handle the caching aspect here. As we develop this, we can invent format versions, similar to the GPAD/GAF specs, and then update tools (like gocam-viz) to handle the differences. Definitely "project-able".
Exactly, then the viewer could be just a viewer and external users would have a simple file to work on. Tagging @cmungall as he had some ideas on the structure of such a GO-CAM file, more oriented towards PPIs.
Discussing w/ @pgaudet: we'll revisit this fresh in a new issue in a new project.
This is a ticket to detail some under-the-hood processes and keep track of some proposals/requirements to be discussed and prioritized later.
There were discussions about having GO-CAM stats computed and shown on the GO website (e.g. Tighter integration and access of GO-CAMs in view of the article release geneontology.github.io#180) or for general QCs. Those should be stored with the other GO stats recently delivered.
In addition, the GO-CAM downloads are handled for the moment by my secondary pipeline which should also be merged to the main go pipeline - speaking of the red links: