Cell type annotation: scGPT workflow #832

dorien-er · 2024-07-09T12:43:24Z

Changelog

Workflow for scGPT transformer-based cell type annotation
Minor fix to scGPT annotation module to allow for multi-processing

Issue ticket number and link

Closes #xxxx (Replace xxxx with the GitHub issue number)

Checklist before requesting a review

rcannood

Looks good to me source-wise!

I think we might need to merge main into this branch again, right?

After the merge is done, I'll run the component manually once to verify the behaviour. Let me know when this PR is ready for me to do the manual run!

CHANGELOG.md

rcannood · 2024-11-22T11:43:56Z

resources_test_scripts/scgpt.sh

+model_dict = {}
+model_dict["model_state_dict"] = f_model_dict
+model_dict["id_to_class"] = {k: str(k) for k in range(15)}
+torch.save(model_dict, ft_model_path)


Do the test resources need to be updated for this PR?

Unfortunately yes, to be able to run the integration tests.
The test resources contain only the foundation model, but a fine-tuned model has a different file structure, and scGPT annotation strictly requires a fine-tuned model.
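
For readers following the thread, here is a minimal sketch of the layout difference being discussed, based only on the diff hunk above. The helper name `wrap_as_finetuned` and the example weight key are hypothetical, and a plain dictionary stands in for the real state dict; the actual script writes the result with `torch.save`, which is left out here so the snippet runs without PyTorch.

```python
from typing import Any, Dict


def wrap_as_finetuned(state_dict: Dict[str, Any], n_classes: int) -> Dict[str, Any]:
    """Wrap a bare state dict in the two-key layout shown in the diff above."""
    return {
        "model_state_dict": state_dict,
        # Placeholder labels: class id k maps to the string "k".
        "id_to_class": {k: str(k) for k in range(n_classes)},
    }


# Stand-in for the foundation model weights (hypothetical key and values).
foundation_weights = {"encoder.embedding.weight": [0.1, 0.2, 0.3]}
checkpoint = wrap_as_finetuned(foundation_weights, n_classes=15)

# In the test resource script the resulting dict is written with
# torch.save(checkpoint, ft_model_path); that step is omitted here.
print(sorted(checkpoint))
```

The point of the wrapper is only that a fine-tuned checkpoint carries a class mapping next to the weights, which the foundation model file does not have.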

rcannood · 2024-11-22T11:51:33Z

src/workflows/annotation/scgpt_annotation/main.nf

+          "obsm_gene_tokens": "gene_id_tokens",
+          "obsm_tokenized_values": "values_tokenized"
+        ],
+        toState: {id, output, state -> ["output": output.output]}


The final output of this workflow will be an h5mu file with only the HVG features and the predicted cell types and probabilities, correct?

Would it make sense to revert the h5mu back to the original input, and then copy the new output structures (predicted cell types and probabilities) onto the original input data?

Interested to hear your thoughts on this -- I can be convinced to not include this step in this PR.

You're right, it shouldn't output the file with HVG features only. In the bigger annotation workflow (where combining multiple methods is possible), this output file could be the input of another annotation workflow, where no HVG subsetting is desired.

I'm tempted to include the HVG subsetting logic inside the annotation component (as is already the case for e.g. scANVI), rather than doing the subsetting in a separate component. Then we can also copy the annotations back to the original input, wdyt?
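
A rough sketch of the idea proposed above, assuming the subsetting happens inside the annotation step: predict on the HVG columns only, then attach the predictions to the full, unsubsetted data. Everything here is illustrative. Plain dictionaries and lists stand in for the h5mu/AnnData structures, and `annotate_on_hvg`, `dummy_predict`, and the column names `scgpt_pred` and `scgpt_probability` are hypothetical names, not the component's actual interface.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[List[List[float]]], Tuple[List[str], List[float]]]


def annotate_on_hvg(
    data: Dict[str, object],
    hvg_mask: Sequence[bool],
    predict: Predictor,
) -> Dict[str, object]:
    """Predict on HVG columns only, then store results on the full dataset."""
    keep = [j for j, is_hvg in enumerate(hvg_mask) if is_hvg]
    # Temporary HVG-only matrix; the input matrix itself is left untouched.
    x_hvg = [[row[j] for j in keep] for row in data["X"]]
    labels, probabilities = predict(x_hvg)
    # Copy the per-cell table so the caller's input is not modified.
    obs = dict(data["obs"])
    obs["scgpt_pred"] = labels
    obs["scgpt_probability"] = probabilities
    return {**data, "obs": obs}


def dummy_predict(x: List[List[float]]) -> Tuple[List[str], List[float]]:
    """Toy stand-in for the model: label each cell by its total expression."""
    labels = ["high" if sum(row) > 1.0 else "low" for row in x]
    return labels, [1.0 for _ in x]


data = {
    "var_names": ["g1", "g2", "g3"],
    "X": [[1.0, 0.0, 2.0], [0.0, 5.0, 0.5]],
    "obs": {"cell_id": ["c1", "c2"]},
}
result = annotate_on_hvg(data, hvg_mask=[True, False, True], predict=dummy_predict)
print(result["obs"])
```

With this shape, the output keeps all features, so it could be passed on to another annotation workflow that does not want HVG subsetting.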

dorien-er added 17 commits July 9, 2024 14:43

scgpt cell type annotation workflow

bf671cc

update changelog

c7f2553

update annotation component

45ca460

update annotation subworkflow

d4ca247

elevate integration test permissions

70961ce

Merge remote-tracking branch 'origin/main' into scgpt-annotation-workflow

77da8af

remove temp scgpt annotation component

04987c4

scgpt cell type annotation workflow

949937e

remove temp scgpt annotation component

864364e

update scgpt test resources

f8ba860

update scgpt test resources

b83be7e

update scgpt test resources

6c34c59

update to viash 9

538bff1

add test workflow

4b59667

fix multi-processing

5d6f526

expand test workflow

bea7c9e

refactorings

6f09dc2

dorien-er marked this pull request as ready for review September 6, 2024 12:56

dorien-er and others added 2 commits September 6, 2024 14:57

Merge branch 'main' into scgpt-annotation-workflow

ff7fc82

update test workflow

1b2a160

dorien-er changed the title ~~scgpt cell type annotation workflow~~ Cell type annotation: scGPT transformer annotation Sep 10, 2024

dorien-er changed the title ~~Cell type annotation: scGPT transformer annotation~~ Cell type annotation: scGPT workflow Sep 10, 2024

cleanup

7ff7b38

dorien-er requested a review from DriesSchaumont September 10, 2024 07:34

dorien-er and others added 6 commits September 10, 2024 10:23

update changelog

a9fcfa7

Add checkout back to integration test CI

6aae494

CI: add missing checkout when syncing s3

cf8c0ea

Update description for concatenate_h5mu component (#880)

aa28cd2

* update description concat component
* Update src/dataflow/concatenate_h5mu/config.vsh.yaml

Co-authored-by: Dries Schaumont

Leftover updates for viash 0.9 (#882)

aa865cb

Add LSI (#552)

dc38e15

Co-authored-by: Vladimir Shitov

dorien-er and others added 24 commits November 18, 2024 15:18

Update random_forrest_annotation component (#878)

7c9a8cb

Add copy var component (#877)

1b5e8f2

Add TF-IDF normalization (#870)

458e3fc

Co-authored-by: DriesSchaumont

SVM annotation component (#845)

9ab9893

CI: avoid components being tested twice

aee759d

CI: Only build containers once

e52f528

Onclass annotation component (#844)

5adab08

add metrics to uns slot after conversion to h5mu

c4af8e1

update changelog

6516f32

update changelog

16e906a

Remove uses of auto: [publish: true] (#886)

5ff83fb

* Remove uses of auto: [publish: true]
* Undo removal of publish component
* Fix integration test

Add component to subset obsp (#888)

90475c1

* add component to subset obsp
* update changelog
* update descriptions
* fix tests
* add comment

Accept pre-calculated distances in knn component (#890)

84ddecb

* update knn component
* update changelog
* update changelog
* address pr comments
* fix tests
* fix tests

scANVI updates (#892)

374c10c

* update scanvi
* typo
* update changelog

Update CHANGELOG.md (#895)

c32a877

Fix ingestion components not working when optional arguments are unset (#894)

90ffa51

* Fix ingestion components not working when optional arguments are unset
* update changelog
* fix test

Fix location of CHANGELOG entry from PR #823 (#896)

73a8401

Fix missing dependency in from_cellranger_multi_to_h5mu (#897)

5ceefbe

scgpt cell type annotation workflow

349a82e

Merge branch 'main' into scgpt-annotation-workflow

22e3787

Merge branch 'main' into scgpt-annotation-workflow

e511ed3

cleanup

7ffcf50

cleanup

a1701f5

cleanup

6e5b306

rcannood reviewed Nov 22, 2024

cleanup

bd8253b

dorien-er requested a review from rcannood November 22, 2024 13:48
