diff --git a/.gitignore b/.gitignore index 813394c..0b837f0 100644 --- a/.gitignore +++ b/.gitignore @@ -14,3 +14,4 @@ tests/testthat/run_pipeline_vignette_config_patches/*/*.yaml* !tests/testthat/run_pipeline_vignette_config_patches/*/*.default.yaml /doc/ /Meta/ +README.html diff --git a/DESCRIPTION b/DESCRIPTION index bb944f9..f466794 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Package: scdrake Type: Package Title: A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language -Version: 1.5.2 +Version: 1.6.0 Authors@R: c( person( @@ -17,6 +17,12 @@ Authors@R: role = c("aut"), email = "jan.kubovciak@img.cas.cz" ), + person( + given = "Lucie", + family = "Pfeiferova", + role = c("aut"), + email = "lucie.pfeiferova@img.cas.cz" + ), person( given = "Michal", family = "Kolar", diff --git a/NEWS.md b/NEWS.md index d24b211..aaba922 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,9 @@ +# scdrake 1.6.0 +- `scdrake` now allows processing of spatial transcriptomics data from spot-based technologies (Visium). + - See `vignette("scdrake_spatial")`. +- Added annotation using user-defined marker genes. +- Updated `stage_input_qc` and `stage_norm_clustering` vignettes. 
+ # scdrake 1.5.0 - Major refactoring: diff --git a/README.Rmd b/README.Rmd index aa166ea..d85f4d4 100644 --- a/README.Rmd +++ b/README.Rmd @@ -22,10 +22,10 @@ knitr::opts_chunk$set( [![Overview and outputs](https://img.shields.io/badge/Overview%20&%20outputs-vignette("pipeline_overview")-informational)](https://bioinfocz.github.io/scdrake/articles/pipeline_overview.html) [![Pipeline diagram](https://img.shields.io/badge/Pipeline%20diagram-Show-informational)](https://github.com/bioinfocz/scdrake/blob/main/diagrams/README.md) ![License](https://img.shields.io/github/license/bioinfocz/scdrake) -[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) +[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![Docker Image CI](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml/badge.svg?branch=main)](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml) -`{scdrake}` is a scalable and reproducible pipeline for secondary analysis of droplet-based single-cell RNA-seq data. +`{scdrake}` is a scalable and reproducible pipeline for secondary analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and spot-based spatial transcriptomics data (SRT). `{scdrake}` is an R package built on top of the `{drake}` package, a [Make](https://www.gnu.org/software/make)-like pipeline toolkit for [R language](https://www.r-project.org). @@ -34,9 +34,13 @@ The main features of the `{scdrake}` pipeline are: - Import of scRNA-seq data: [10x Genomics Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) output, delimited table, or `SingleCellExperiment` object. -- Quality control and filtering of cells and genes, removal of empty droplets. 
+- Import of SRT data: + [10x Genomics Space Ranger](https://www.10xgenomics.com/support/software/space-ranger/latest/getting-started/what-is-space-ranger) + output, delimited table, or `SingleCellExperiment` object, and tissue positions file as in Space ranger. +- Quality control and filtering of cells/spots and genes, removal of empty droplets. - Higly variable genes detection, cell cycle scoring, normalization, clustering, and dimensionality reduction. -- Cell type annotation. +- Spatially variable genes detection (for SRT data) +- Cell type annotation using reference sets, cell type annotation using user-provided marker genes. - Integration of multiple datasets. - Computation of cluster markers and differentially expressed genes between clusters (denoted as "contrasts"). - Rich graphical and HTML outputs based on customizable RMarkdown documents. @@ -378,7 +382,7 @@ By contributing to this project, you agree to abide by its terms. ### Funding This work was supported by [ELIXIR CZ](https://www.elixir-czech.cz) research infrastructure project -(MEYS Grant No: LM2018131) including access to computing and storage facilities. +(MEYS Grant No: LM2018131 and LM2023055) including access to computing and storage facilities. ### Software and methods used by `{scdrake}` @@ -402,4 +406,4 @@ Many things are used by `{scdrake}`, but these are really worth mentioning: - The code is styled automatically thanks to `{styler}`. - The documentation is formatted thanks to `{devtools}` and `{roxygen2}`. -This package was developed using `{biocthis}`. +This package was developed using `{biocthis}`. 
\ No newline at end of file diff --git a/README.md b/README.md index f080793..765a958 100644 --- a/README.md +++ b/README.md @@ -13,12 +13,13 @@ outputs](https://img.shields.io/badge/Overview%20&%20outputs-vignette(%22pipelin diagram](https://img.shields.io/badge/Pipeline%20diagram-Show-informational)](https://github.com/bioinfocz/scdrake/blob/main/diagrams/README.md) ![License](https://img.shields.io/github/license/bioinfocz/scdrake) [![Lifecycle: -experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) +experimental](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![Docker Image CI](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml/badge.svg?branch=main)](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml) `{scdrake}` is a scalable and reproducible pipeline for secondary -analysis of droplet-based single-cell RNA-seq data. `{scdrake}` is an R +analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and +spot-based spatial transcriptomics data (SRT). `{scdrake}` is an R package built on top of the `{drake}` package, a [Make](https://www.gnu.org/software/make)-like pipeline toolkit for [R language](https://www.r-project.org). @@ -28,11 +29,17 @@ The main features of the `{scdrake}` pipeline are: - Import of scRNA-seq data: [10x Genomics Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) output, delimited table, or `SingleCellExperiment` object. -- Quality control and filtering of cells and genes, removal of empty - droplets. +- Import of SRT data: [10x Genomics Space + Ranger](https://www.10xgenomics.com/support/software/space-ranger/latest/getting-started/what-is-space-ranger) + output, delimited table, or `SingleCellExperiment` object, and + tissue positions file as in Space ranger. 
+- Quality control and filtering of cells/spots and genes, removal of + empty droplets. - Higly variable genes detection, cell cycle scoring, normalization, clustering, and dimensionality reduction. -- Cell type annotation. +- Spatially variable genes detection (for SRT data) +- Cell type annotation using reference sets, cell type annotation + using user-provided marker genes. - Integration of multiple datasets. - Computation of cluster markers and differentially expressed genes between clusters (denoted as “contrasts”). @@ -108,8 +115,8 @@ You can pull the Docker image with the latest stable `{scdrake}` version using ``` bash -docker pull jirinovo/scdrake:1.5.2 -singularity pull docker:jirinovo/scdrake:1.5.2 +docker pull jirinovo/scdrake:1.6.0 +singularity pull docker:jirinovo/scdrake:1.6.0 ``` or list available versions in [our Docker Hub @@ -151,7 +158,7 @@ docker run -d \ -e USERID=$(id -u) \ -e GROUPID=$(id -g) \ -e PASSWORD=1234 \ - jirinovo/scdrake:1.5.2 + jirinovo/scdrake:1.6.0 ``` For Singularity, also make shared directories and execute the container @@ -234,7 +241,7 @@ for `{scdrake}` and you can use it to install all dependencies by ``` r ## -- This is a lockfile for the latest stable version of scdrake. -download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/1.5.2/renv.lock") +download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/1.6.0/renv.lock") ## -- You can increase the number of CPU cores to speed up the installation. options(Ncpus = 2) renv::restore(lockfile = "renv.lock", repos = BiocManager::repositories()) @@ -254,7 +261,7 @@ installed from the lockfile). 
``` r remotes::install_github( - "bioinfocz/scdrake@1.5.2", + "bioinfocz/scdrake@1.6.0", dependencies = FALSE, upgrade = FALSE, keep_source = TRUE, build_vignettes = TRUE, repos = BiocManager::repositories() @@ -321,7 +328,7 @@ vignette](https://bioinfocz.github.io/scdrake/articles/scdrake.html) ## Vignettes and other readings See for a documentation website of -the latest stable version (1.5.2) where links to vignettes below become +the latest stable version (1.6.0) where links to vignettes below become real :-) See for a documentation @@ -341,6 +348,7 @@ website of the current development version. - General information: - Pipeline overview: `vignette("pipeline_overview")` - FAQ & Howtos: `vignette("scdrake_faq")` + - Spatial extension: `vignette("scdrake_spatial")` - Command line interface (CLI): `vignette("scdrake_cli")` - Config files (internals): `vignette("scdrake_config")` - Environment variables: `vignette("scdrake_envvars")` @@ -352,8 +360,9 @@ website of the current development version. - Stage `01_input_qc`: reading in data, filtering, quality control -\> `vignette("stage_input_qc")` - Stage `02_norm_clustering`: normalization, HVG selection, - dimensionality reduction, clustering, cell type annotation - -\> `vignette("stage_norm_clustering")` + SVG selection, dimensionality reduction, clustering, + (marker-based) cell type annotation -\> + `vignette("stage_norm_clustering")` - Integration pipeline: - Stage `01_integration`: reading in data and integration -\> `vignette("stage_integration")` @@ -436,8 +445,8 @@ contributing to this project, you agree to abide by its terms. ### Funding This work was supported by [ELIXIR CZ](https://www.elixir-czech.cz) -research infrastructure project (MEYS Grant No: LM2018131) including -access to computing and storage facilities. +research infrastructure project (MEYS Grant No: LM2018131 and LM2023055) +including access to computing and storage facilities. 
### Software and methods used by `{scdrake}` diff --git a/_pkgdown.yml b/_pkgdown.yml index 019ce22..cda5626 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -29,7 +29,7 @@ navbar: text: Integration pipeline guide href: articles/scdrake_integration.html spatial: - text: Spatial extention + text: Spatial extension href: articles/scdrake_spatial.html faq: text: FAQ & Howtos diff --git a/inst/Rmd/single_sample/02_norm_clustering.Rmd b/inst/Rmd/single_sample/02_norm_clustering.Rmd index c1d7f50..4c42244 100644 --- a/inst/Rmd/single_sample/02_norm_clustering.Rmd +++ b/inst/Rmd/single_sample/02_norm_clustering.Rmd @@ -153,7 +153,9 @@ downstream methods. We want to select genes that contain useful information abou removing genes that contain random noise. This aims to preserve interesting biological structure without the variance that obscures that structure, and to reduce the size of the data to improve computational efficiency of later steps. -More information in [OSCA](https://bioconductor.org/books/3.15/OSCA.basic/feature-selection.html) +In SRT data, we can also identify spatially variable genes (SVGs), defined as genes with spatially correlated patterns of expression across the tissue area. Following Li et al. (2021), we generate a combined set of HVGs and SVGs.
+ +More information in [OSCA](https://bioconductor.org/books/3.15/OSCA.basic/feature-selection.html and [BestPracticesST](https://lmweber.org/BestPracticesST/)) ```{r, results = "asis"} scdrake::catg0('**HVG metric: "{hvg_metric}"**\n\n') @@ -363,11 +365,11 @@ if (!is.null(cfg$CELL_ANNOTATION_SOURCES)) { ```{r, results = "asis"} if (isTRUE(cfg$MANUAL_ANNOTATION)) { - scdrake::md_header("Manual cell annotation", 1, extra = "{.tabset}") + scdrake::md_header("Marker-based cell annotation", 1, extra = "{.tabset}") scdrake::catn( glue::glue("**Annotation was done for {cfg$ANNOTATION_CLUSTERING}**")) cat("\n\n") - cat("For manual annotation we modified an implemented function from the Giotto package. The enrichment Z score is calculated by using method (PAGE) from Kim SY et al., BMC bioinformatics, 2005 as $$ Z = \frac{((Sm – mu)*m^\frac{1}{2})}{delta} $$. \n + cat("For marker-based annotation we modified an implemented function from the Giotto package. The enrichment Z score is calculated by using method (PAGE) from Kim SY et al., BMC bioinformatics, 2005 as $$ Z = \frac{((Sm – mu)*m^\frac{1}{2})}{delta} $$. \n For each gene in each spot/cell, mu is the fold change values versus the mean expression and delta is the standard deviation. 
Sm is the mean fold change value of a specific marker gene set and m is the size of a given marker gene set.") diff --git a/vignettes/_vignette_signpost.Rmd b/vignettes/_vignette_signpost.Rmd index 49b7a64..bf9ea5f 100644 --- a/vignettes/_vignette_signpost.Rmd +++ b/vignettes/_vignette_signpost.Rmd @@ -10,6 +10,7 @@ - General information: - Pipeline overview: `vignette("pipeline_overview")` - FAQ & Howtos: `vignette("scdrake_faq")` + - Spatial extension: `vignette("scdrake_spatial")` - Command line interface (CLI): `vignette("scdrake_cli")` - Config files (internals): `vignette("scdrake_config")` - Environment variables: `vignette("scdrake_envvars")` @@ -19,7 +20,7 @@ - Pipelines and stages: - Single-sample pipeline: - Stage `01_input_qc`: reading in data, filtering, quality control -> `vignette("stage_input_qc")` - - Stage `02_norm_clustering`: normalization, HVG selection, dimensionality reduction, clustering, cell type annotation + - Stage `02_norm_clustering`: normalization, HVG selection, SVG selection, dimensionality reduction, clustering, (marker-based) cell type annotation -> `vignette("stage_norm_clustering")` - Integration pipeline: - Stage `01_integration`: reading in data and integration -> `vignette("stage_integration")` diff --git a/vignettes/scdrake_spatial.Rmd b/vignettes/scdrake_spatial.Rmd index a44308d..871870d 100644 --- a/vignettes/scdrake_spatial.Rmd +++ b/vignettes/scdrake_spatial.Rmd @@ -1,5 +1,5 @@ --- -title: "Spatial extention" +title: "Spatial extension" date: "`r glue::glue('Document generated: {format(Sys.time(), \"%Y-%m-%d %H:%M:%S %Z%z\")}')`" package: scdrake output: @@ -7,27 +7,27 @@ output: toc: true toc_float: true vignette: > - %\VignetteIndexEntry{Spatial extention} + %\VignetteIndexEntry{Spatial extension} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- *** -`{scdrake}` now offer spatial extension for the first stage (`01_input_qc`) and the second stage (`02_norm_clustering`) of the single-sample pipeline. 
The spatial possibility is aimed on Visium technology, respectively on spot-based technologies. Scdrake provides comparable results with Seurat, Giotto (R), as well as scanpy (Python). However, we strongly discourage usage of scdrake for other technologies than Visium. For futher analyses of spatial dataset we recommend [CARD](https://github.com/YMa-lab/CARD) for deconvolution and [CellChat2](https://github.com/SiYangming/CellChat2) or [IGAN](https://github.com/Zhu-JC/IGAN) for cell-cell interaction. +`{scdrake}` now offers a spatial extension for the first stage (`01_input_qc`) and the second stage (`02_norm_clustering`) of the single-sample pipeline. The spatial extension is aimed at Visium and, more generally, at spot-based technologies. Scdrake provides results comparable with Seurat, Giotto (R), and scanpy (Python), and follows the [Best Practices for Spatial Transcriptomics](https://lmweber.org/BestPracticesST/). For now, we discourage using scdrake for technologies other than Visium. For further analyses of spatial datasets we recommend [CARD](https://github.com/YMa-lab/CARD) for deconvolution and [CellChat2](https://github.com/SiYangming/CellChat2) or [IGAN](https://github.com/Zhu-JC/IGAN) for cell-cell interaction. This vignette should serve as a supplement to other vignettes, as `vignette("stage_input_qc")` and `vignette("stage_norm_clustering")`). *** -## Spatial extention functions +## Spatial extension functions *** ### Spatial visualization -For (`01_input_qc`) and (`02_norm_clustering`) of the single-sample pipeline we now offer visualization of tissue, as pseudo tissue spot visualization. Spatial extention will add spot coordinates (array_col and array_row) from SpaceRanger tissue_possitions.csv file, and will filter away all spots, that are by SpaceRanger labeled as not in tissue. Visualization function are implemented from the [Giotto package](https://drieslab.github.io/Giotto_website/).
Visualization is automatically used for quality control and dimension reduction results. +For the first (`01_input_qc`) and second (`02_norm_clustering`) stages of the single-sample pipeline, we now offer pseudo-tissue spot visualization. The spatial extension adds spot coordinates (array_col and array_row) from the SpaceRanger tissue_positions.csv file and filters out all spots that SpaceRanger labels as not in tissue. The visualization functions are adapted from the [Giotto package](https://drieslab.github.io/Giotto_website/). Visualization is automatically used for quality control and dimension reduction results. *** @@ -37,8 +37,8 @@ For spatial analyses in stage 02_norm_clustering `vignette("stage_norm_clusterin *** -### Manual annotation +### Marker-based annotation -Manual annotation was implemented for both single-cell and spatial datasets. In summary, expression profiles and statistical metrics are computed for each cell/spot, the result is visualized using a heatmap and dimension reduction plot. For spatial datasets is enabled to visualized results in tissue coordinates, both enrichment plots for each annotation label (individual enrichment plots) and for overall results for each spot. Manual annotation is implemented from the [Giotto package](https://drieslab.github.io/Giotto_website/), the function is based on [Kim SY et al](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-144). +Marker-based annotation is implemented for both single-cell and spatial datasets. In summary, expression profiles and statistical metrics are computed for each cell/spot, and the result is visualized using a heatmap and a dimension reduction plot. For spatial datasets, results can also be visualized in tissue coordinates, both as enrichment plots for each annotation label (individual enrichment plots) and as overall results for each spot.
Marker-based annotation is adapted from the [Giotto package](https://drieslab.github.io/Giotto_website/); the function is based on [Kim SY et al](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-144). *** diff --git a/vignettes/stage_input_qc.Rmd b/vignettes/stage_input_qc.Rmd index b45a3d3..0ccc243 100644 --- a/vignettes/stage_input_qc.Rmd +++ b/vignettes/stage_input_qc.Rmd @@ -120,7 +120,7 @@ INPUT_QC_REPORT_RMD_FILE: "Rmd/single_sample/01_input_qc.Rmd" **Type:** character scalar -A path to RMarkdown file used for HTML report of this pipeline stage. For spatial extention, the default RMarkdown file is `01_input_qc_spatial.Rmd` +A path to RMarkdown file used for HTML report of this pipeline stage. For spatial extension, the default RMarkdown file is `01_input_qc_spatial.Rmd` *** @@ -143,7 +143,7 @@ You can also negate the selection by specifying `negate: true`. *** -#### Spatial extention +#### Spatial extension ```yaml diff --git a/vignettes/stage_norm_clustering.Rmd b/vignettes/stage_norm_clustering.Rmd index 2194b31..a3bcac6 100644 --- a/vignettes/stage_norm_clustering.Rmd +++ b/vignettes/stage_norm_clustering.Rmd @@ -117,7 +117,7 @@ several `{scdrake}` functions. *** -#### Spatial extention +#### Spatial extension ```yaml @@ -128,7 +128,7 @@ SPATIAL: False If `True`, pipeline enables spatial extension. -This option enables spatial extension as selection of spatially variable genes (SVGS), pseudo tissue visualization and other visualization options for manual annotation. +This option enables spatial features such as selection of spatially variable genes (SVGs), pseudo-tissue visualization, and other visualization options for marker-based annotation.
*** @@ -675,7 +675,7 @@ will override corresponding parameters in `TRAIN_PARAMS` in `CELL_ANNOTATION_SOU *** -#### Manual cell annotation signatures +#### Marker-based cell annotation signatures ```yaml MANUAL_ANNOTATION: False @@ -683,7 +683,7 @@ MANUAL_ANNOTATION: False **Type:** logical scalar -Whether manual annotation is enabled. Manual annotation is based on expression profiles, markers are taken from user-defined file. Resulted annotation is stored in `annotation_metadata` target. Manual cell annotation implemented from Giotto package enrichment function, which is based on PAGE method from [Kim SY et al](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-144). +Whether marker-based annotation is enabled. Marker-based annotation is based on expression profiles; markers are taken from a user-defined file. The resulting annotation is stored in the `annotation_metadata` target. Marker-based cell annotation is adapted from the Giotto package enrichment function, which is based on the PAGE method from [Kim SY et al](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-144). *** @@ -760,7 +760,7 @@ HEATMAP_DIMRED: "umap" **Type:** character scalar -Which dimension reduction use to show result of manual annotation. +Which dimension reduction to use to show the result of marker-based annotation. ***