-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cell type annotation: scGPT workflow #832
base: main
Are you sure you want to change the base?
Conversation
* update description concat component * Update src/dataflow/concatenate_h5mu/config.vsh.yaml Co-authored-by: Dries Schaumont <[email protected]> --------- Co-authored-by: Dries Schaumont <[email protected]>
Co-authored-by: Vladimir Shitov <[email protected]>
Co-authored-by: DriesSchaumont <[email protected]>
* Remove uses of auto: [publish: true] * Undo removal of publish component * Fix integration test
* add component to subset obsp * update changelog * update descriptions * fix tests * add comment
* update knn component * update changelog * update changelog * address pr comments * fix tests * fix tests
* update scanvi * typo * update changelog
#894) * Fix ingestion components not working when optional arguments are unset * update changelog * fix test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me source-wise!
I think we might need to merge main into this branch again, right?
After the merge is done, I'll run the component manually once to verify the behaviour. Let me know when this PR is ready for me to do the manual run!
model_dict = {} | ||
model_dict["model_state_dict"] = f_model_dict | ||
model_dict["id_to_class"] = {k: str(k) for k in range(15)} | ||
torch.save(model_dict, ft_model_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the test resources need to be updated for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately yes, to be able to run the integration tests.
The resources only contain the foundation model, but a finetuned model has a different file architecture and scGPT annotation strictly requires a finetuned model.
"obsm_gene_tokens": "gene_id_tokens", | ||
"obsm_tokenized_values": "values_tokenized" | ||
], | ||
toState: {id, output, state -> ["output": output.output]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the final output of this workflow will be an h5mu file with only the hvg files and the predicted celltypes & probabilities, correct?
Would it make sense to revert the h5mu back to the original input, but then copy the new outputs structures (predicted celltypes and probabilities) to the original input data.
Interested to hear your thoughts on this -- I can be convinced to not include this step in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it should't output the file with hvg features only. In the bigger annotation workflow (where combining multiple methods is possible), this output file could be the input of another annotation workflow, where no hvg subsetting is desired.
I'm tempted to include the HVG subsetting logic inside the annotation component (it's also the case for e.g. scANVI), rather than coing the subsetting in a separate component. Then we can also copy the annotations back to the original input, wdyt?
Changelog
Issue ticket number and link
Closes #xxxx (Replace xxxx with the GitHub issue number)
Checklist before requesting a review
I have performed a self-review of my code
Conforms to the Contributor's guide
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI tests succeed!