Make a quick reference that maps typecode to schema class name #2285

Open
eecavanna opened this issue Dec 4, 2024 · 8 comments

@eecavanna
Collaborator

@SamuelPurvine reached out to me today to suggest that a table like the following be included in the schema documentation:

| ID structured pattern | class | type | located in |
| --- | --- | --- | --- |
| wfmag | MagsAnalysis | workflow | https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/workflow_execution_activity.yaml |
| wfmtex | MetatranscriptomeExpressionAnalysis | workflow | https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/workflow_execution_activity.yaml |
| wftan | MetatranscriptomeAnnotation | workflow | https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/workflow_execution_activity.yaml |
| ... | ... | ... | |
| pex | ProtocolExecution | PlannedProcess | |
| procsm | ProcessedSample | | |
| subspr | SubSamplingProcess | MaterialProcessing | |

Note: That is an excerpt of what he sent me.

He sent it to me as a TSV file that he created by hand in collaboration with @kheal. Here are the TSV lines corresponding to the above table rows:

ID structured pattern	class	type 	located in
wfmag	MagsAnalysis	workflow	https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/workflow_execution_activity.yaml
wfmtex	MetatranscriptomeExpressionAnalysis	workflow	https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/workflow_execution_activity.yaml
wftan	MetatranscriptomeAnnotation	workflow	https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/workflow_execution_activity.yaml
...	...	...	
pex	ProtocolExecution	PlannedProcess	
procsm	ProcessedSample		
subspr	SubSamplingProcess	MaterialProcessing

I think he and @kheal want there to be an easier way to gather this information than looking at the schema.

Here are my thoughts from a technical maintenance perspective:

  1. I noticed the referenced YAML files are schema source files; so I assume the target audience of this is schema maintainers.
  2. The data in the table seems to me to be coupled with the contents of the schema (since it maps typecodes to class names, for example), so I would want it to be derived from the schema programmatically, as opposed to being kept up-to-date by hand.
  3. The data in the table seems to me to be coupled with the contents of the Git repository (since it references the file path of a source file), so I would want it to be derived from the contents of the Git repository programmatically, as opposed to being kept up-to-date by hand.
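
As a rough illustration of point 2, a typecode-to-class mapping could be derived from the schema instead of being maintained by hand. The sketch below assumes each class declares its typecode inside the `structured_pattern.syntax` of its `id` slot usage (for example `{id_nmdc_prefix}:wfmag-...`). The function name `typecodes_by_class`, the regular expression, and the example schema excerpt are hypothetical and are not taken from the repository.

```python
import re

# Assumption: the typecode is the lowercase token between "{id_nmdc_prefix}:"
# and the first "-" in the `id` slot's structured pattern.
TYPECODE_RE = re.compile(r"\{id_nmdc_prefix\}:([a-z]+)-")


def typecodes_by_class(schema: dict) -> dict:
    """Return {typecode: class_name} for every class that declares an ID pattern."""
    mapping = {}
    for class_name, class_def in (schema.get("classes") or {}).items():
        id_usage = ((class_def or {}).get("slot_usage") or {}).get("id") or {}
        syntax = (id_usage.get("structured_pattern") or {}).get("syntax", "")
        match = TYPECODE_RE.search(syntax)
        if match:
            mapping[match.group(1)] = class_name
    return mapping


# Hypothetical excerpt shaped like a schema YAML file loaded into a dict.
example_schema = {
    "classes": {
        "MagsAnalysis": {
            "slot_usage": {
                "id": {
                    "structured_pattern": {
                        "syntax": "{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}"
                    }
                }
            }
        },
        "ProcessedSample": {
            "slot_usage": {
                "id": {
                    "structured_pattern": {
                        "syntax": "{id_nmdc_prefix}:procsm-{id_shoulder}-{id_blade}"
                    }
                }
            }
        },
        # A class without an ID pattern is skipped.
        "NamedThing": {},
    }
}

typecode_map = typecodes_by_class(example_schema)
```

Working on a plain dict keeps the sketch independent of any particular LinkML tooling; the same loop could be pointed at whatever object the documentation build already loads.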

I will defer to @SamuelPurvine and @kheal regarding the use cases they have in mind, and an explanation of the data in the table.

Note: GitHub's "Discussions" feature may be a better fit for this conversation than a GitHub Issue, at this point.

@eecavanna added the documentation (Improvements or additions to documentation) and backlog (Issue not assigned to a sprint or not completed during a sprint. Needs to be reprioritized.) labels Dec 4, 2024
@turbomam
Member

turbomam commented Dec 4, 2024

https://api.microbiomedata.org/docs#/metadata/get_nmdc_schema_typecodes_nmdcschema_typecodes_get retrieves the typecode to class relationships
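
For anyone who wants to try that endpoint from a script, here is a minimal sketch. The path `/nmdcschema/typecodes` is inferred from the operation name in the docs URL above, and the response shape (a list of objects with `name` holding the typecode and `schema_class` holding a prefixed class name) is an assumption that should be checked against a real response. The helper names `fetch_typecodes` and `index_typecodes` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed path, inferred from the operation name in the API docs URL.
TYPECODES_URL = "https://api.microbiomedata.org/nmdcschema/typecodes"


def fetch_typecodes(url: str = TYPECODES_URL) -> list:
    """Download the typecode records (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def index_typecodes(records: list) -> dict:
    """Build {typecode: class_name}, dropping an assumed "nmdc:" prefix."""
    return {
        record["name"]: record["schema_class"].removeprefix("nmdc:")
        for record in records
    }


# Hypothetical records in the assumed response shape.
sample_records = [
    {"name": "wfmag", "schema_class": "nmdc:MagsAnalysis"},
    {"name": "procsm", "schema_class": "nmdc:ProcessedSample"},
]
typecode_index = index_typecodes(sample_records)
```

In real use, `index_typecodes(fetch_typecodes())` would replace the sample records. Note that `str.removeprefix` requires Python 3.9 or later.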

I don't understand the values in the type column. It looks like some of them are the is_a parent of the class on each row, but others may be ad-hoc names?

@turbomam
Member

turbomam commented Dec 4, 2024

The source file can be obtained from the from_schema annotation on each element, if the root nmdc.yaml file is used as input into the gen-linkml CLI or the Python equivalent.
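
A small sketch of that idea, assuming the merged schema has already been produced and loaded into a plain dict, and that each class entry carries a `from_schema` value. In LinkML that value is typically the id of the defining schema, not a file path, so mapping it to a path in the repository would be a further step. The function name `classes_by_source` and the example values are hypothetical.

```python
from collections import defaultdict


def classes_by_source(schema: dict) -> dict:
    """Group class names by the value of their `from_schema` field."""
    grouped = defaultdict(list)
    for class_name, class_def in (schema.get("classes") or {}).items():
        source = (class_def or {}).get("from_schema", "(unknown)")
        grouped[source].append(class_name)
    return {source: sorted(names) for source, names in grouped.items()}


# Hypothetical merged-schema excerpt; the `from_schema` values are made up.
merged_schema = {
    "classes": {
        "MagsAnalysis": {"from_schema": "https://example.org/workflow_execution_activity"},
        "MetatranscriptomeAnnotation": {"from_schema": "https://example.org/workflow_execution_activity"},
        "ProcessedSample": {"from_schema": "https://example.org/basic_classes"},
    }
}

sources = classes_by_source(merged_schema)
```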

@eecavanna
Collaborator Author

eecavanna commented Dec 4, 2024

Thanks for sharing your initial thoughts about this, @turbomam.

Speaking of Runtime endpoints: depending upon the use cases @SamuelPurvine and @kheal have in mind, an alternative to including a table like this in the schema documentation could be to implement a Runtime API endpoint that returns the JSON "equivalent" of the table, or one that performs a lookup (using the "table" under the hood) instead of returning the full "table". That said, the fact that they have built this example table leads me to suspect that such one-by-one lookups would not satisfy the use case(s) they have in mind.
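
To make the "lookup" variant concrete, here is a minimal sketch of resolving a single ID to a class name with a typecode table held in memory. It assumes IDs have the form `nmdc:<typecode>-<shoulder>-<blade>`. The function name `class_for_id`, the example table, and the example ID are hypothetical.

```python
import re
from typing import Optional

# Assumption: the typecode is the lowercase token right after the "nmdc:" prefix.
ID_RE = re.compile(r"^nmdc:([a-z]+)-")


def class_for_id(identifier: str, typecode_to_class: dict) -> Optional[str]:
    """Return the class name for an ID, or None if the ID or typecode is unknown."""
    match = ID_RE.match(identifier)
    if match is None:
        return None
    return typecode_to_class.get(match.group(1))


# Hypothetical table; a real one would be derived from the schema.
table = {"wfmag": "MagsAnalysis", "procsm": "ProcessedSample"}

found = class_for_id("nmdc:wfmag-11-abc123", table)
missing = class_for_id("nmdc:zzz-11-abc123", table)
```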

@SamuelPurvine
Contributor

Hi folx! This was generated by Camilo wondering (in a meeting) where in the heck he needed to go to get ahold of what the ID typecodes mean, without trying to dig through all of the yaml files since these are distributed. I started in on trying to cull this info and quickly realized this was a silly thing to do by hand (which is why it's ad hoc and incomplete above), especially as things change and such. Moreover, I can see the utility of having a one stop place in the documentation (i.e. not having to dig it out of the api with various arcane commands) that would reference these ID structures, not just for schema maintainers and newbies, but also the general public who comes to the site and is confronted with ID structures, again who may not be facile with dredging data from an api. Would love to see it as part of https://microbiomedata.github.io/nmdc-schema/identifiers/, maybe as an extension of the "IDs minted for use within NMDC" section, as a quick guide to what each of the type codes relates to.

And yes, it would be great if it ran every so often to keep up to date.

Of course, if there's no desire to have such info available (easily) in the documentation, then by all means ignore the request!

@kheal
Contributor

kheal commented Dec 4, 2024

> https://api.microbiomedata.org/docs#/metadata/get_nmdc_schema_typecodes_nmdcschema_typecodes_get retrieves the typecode to class relationships

@turbomam - thanks for pointing out this endpoint, I had missed it!

Between this endpoint and the https://api.microbiomedata.org/docs#/metadata/get_by_id_nmdcschema_ids__doc_id__get:~:text=Get%20By-,Id,-If%20the%20identifier endpoint, my needs are satisfied, but I see value in transparent documentation of the typecode mapping in the schema documentation.

I think a simple table of typecode | class (with link to documentation of that class) | collection is what users might benefit from.

@eecavanna
Collaborator Author

eecavanna commented Dec 4, 2024

Thanks, @kheal and @SamuelPurvine.

@SamuelPurvine, @turbomam, @aclum, and I discussed this during today's metadata meeting.

Action plan

I will prototype a "page" (or something page-like) in the schema documentation that shows a mapping from typecode to schema class name and, if practical, a link to documentation for that schema class. The "page" will be auto-generated from the schema. It will not exist as a file in the repository, but will be generated whenever the schema documentation gets built (that's also how the inter-collection diagram gets generated). I will prototype it in a PR and then invite people to review it.
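
As a starting point for that prototype, the rendering step could look something like the sketch below. It is not the final implementation: it assumes a typecode-to-class mapping is already available and that each class has a documentation page at the docs base URL followed by the class name. The names `DOCS_BASE_URL` and `render_typecode_table` are hypothetical.

```python
# Assumption: class pages live at <base URL>/<ClassName>/ in the built docs.
DOCS_BASE_URL = "https://microbiomedata.github.io/nmdc-schema/"


def render_typecode_table(typecode_to_class: dict) -> str:
    """Render a Markdown table of typecodes and linked schema class names."""
    lines = ["| Typecode | Schema class |", "| --- | --- |"]
    for typecode, class_name in sorted(typecode_to_class.items()):
        link = f"[{class_name}]({DOCS_BASE_URL}{class_name}/)"
        lines.append(f"| `{typecode}` | {link} |")
    return "\n".join(lines)


# Hypothetical input; a real build would derive this from the schema.
markdown = render_typecode_table(
    {"wfmag": "MagsAnalysis", "pex": "ProtocolExecution"}
)
print(markdown)
```

Sorting by typecode keeps the generated page stable between builds, which makes diffs of the rendered documentation easier to review.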

@eecavanna eecavanna self-assigned this Dec 4, 2024
@eecavanna added the X SMALL (Less than 8 hours, less than 1 day) label and removed the backlog (Issue not assigned to a sprint or not completed during a sprint. Needs to be reprioritized.) label Dec 4, 2024
@eecavanna
Collaborator Author

...next sprint.

@eecavanna changed the title from "Document mapping between typecode, class, a custom category, and YAML source file" to "Make a quick reference that maps typecode to schema class name" Dec 4, 2024
@eecavanna
Collaborator Author

Here's a link to a Python notebook in a "private" GitHub Gist (accessible to anyone that has this link, so it is effectively public), in which I implemented an algorithm for generating a Markdown table of typecodes, schema class names, and schema class documentation URLs.

https://gist.github.com/eecavanna/dce7c1672ad7d65f265972649d248342

We can use this notebook as a conversation piece at the next metadata meeting.
