update README, fix misspelling (#12)

* update README, fix misspelling
* update description
* update description
* update README.md
* readme and vignettes typos
* readme version
* news update
* rm README.html, rerender README.md

---------

Co-authored-by: gorgitko <[email protected]>
bioinfocz · Sep 14, 2024 · 7eb50d5
1 parent ae3452f
commit 7eb50d5

Showing 11 changed files with 70 additions and 41 deletions.
diff --git a/.gitignore b/.gitignore
@@ -14,3 +14,4 @@ tests/testthat/run_pipeline_vignette_config_patches/*/*.yaml*
 !tests/testthat/run_pipeline_vignette_config_patches/*/*.default.yaml
 /doc/
 /Meta/
+README.html
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: scdrake
 Type: Package
 Title: A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language
-Version: 1.5.2
+Version: 1.6.0
 Authors@R:
     c(
       person(
@@ -17,6 +17,12 @@ Authors@R:
         role = c("aut"),
         email = "[email protected]"
       ),
+      person(
+        given = "Lucie",
+        family = "Pfeiferova",
+        role = c("aut"),
+        email = "[email protected]"
+      ),
       person(
         given = "Michal",
         family = "Kolar",

diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,9 @@
+# scdrake 1.6.0
+- `scdrake` now allows processing of spatial transcriptomics data from spot-based technologies (Visium).
+  - See `vignette("scdrake_spatial")`.
+- Added annotation using user-defined marker genes.
+- Updated `stage_input_qc` and `stage_norm_clustering` vignettes.
+
 # scdrake 1.5.0
 
 - Major refactoring:

diff --git a/README.Rmd b/README.Rmd
@@ -22,10 +22,10 @@ knitr::opts_chunk$set(
 [![Overview and outputs](https://img.shields.io/badge/Overview%20&%20outputs-vignette("pipeline_overview")-informational)](https://bioinfocz.github.io/scdrake/articles/pipeline_overview.html)
 [![Pipeline diagram](https://img.shields.io/badge/Pipeline%20diagram-Show-informational)](https://github.com/bioinfocz/scdrake/blob/main/diagrams/README.md)
 ![License](https://img.shields.io/github/license/bioinfocz/scdrake)
-[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
+[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
 [![Docker Image CI](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml/badge.svg?branch=main)](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml)
 
-`{scdrake}` is a scalable and reproducible pipeline for secondary analysis of droplet-based single-cell RNA-seq data.
+`{scdrake}` is a scalable and reproducible pipeline for secondary analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and spot-based spatial transcriptomics data (SRT).
 `{scdrake}` is an R package built on top of the `{drake}` package, a [Make](https://www.gnu.org/software/make)-like pipeline
 toolkit for [R language](https://www.r-project.org).
 
@@ -34,9 +34,13 @@ The main features of the `{scdrake}` pipeline are:
 - Import of scRNA-seq data:
   [10x Genomics Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger)
   output, delimited table, or `SingleCellExperiment` object.
-- Quality control and filtering of cells and genes, removal of empty droplets.
+- Import of SRT data:
+  [10x Genomics Space Ranger](https://www.10xgenomics.com/support/software/space-ranger/latest/getting-started/what-is-space-ranger)
+  output, delimited table, or `SingleCellExperiment` object, together with a tissue positions file in the Space Ranger format.
+- Quality control and filtering of cells/spots and genes, removal of empty droplets.
 - Highly variable genes detection, cell cycle scoring, normalization, clustering, and dimensionality reduction.
-- Cell type annotation.
+- Spatially variable genes detection (for SRT data).
+- Cell type annotation using reference sets or user-provided marker genes.
 - Integration of multiple datasets.
 - Computation of cluster markers and differentially expressed genes between clusters (denoted as "contrasts").
 - Rich graphical and HTML outputs based on customizable RMarkdown documents.
@@ -378,7 +382,7 @@ By contributing to this project, you agree to abide by its terms.
 ### Funding
 
 This work was supported by [ELIXIR CZ](https://www.elixir-czech.cz) research infrastructure project
-(MEYS Grant No: LM2018131) including access to computing and storage facilities.
+(MEYS Grant No: LM2018131 and LM2023055) including access to computing and storage facilities.
 
 ### Software and methods used by `{scdrake}`
 
@@ -402,4 +406,4 @@ Many things are used by `{scdrake}`, but these are really worth mentioning:
 - The code is styled automatically thanks to `{styler}`.
 - The documentation is formatted thanks to `{devtools}` and `{roxygen2}`.
 
-This package was developed using `{biocthis}`.
+This package was developed using `{biocthis}`.
diff --git a/README.md b/README.md
@@ -13,12 +13,13 @@ outputs](https://img.shields.io/badge/Overview%20&%20outputs-vignette(%22pipelin
 diagram](https://img.shields.io/badge/Pipeline%20diagram-Show-informational)](https://github.com/bioinfocz/scdrake/blob/main/diagrams/README.md)
 ![License](https://img.shields.io/github/license/bioinfocz/scdrake)
 [![Lifecycle:
-experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
+stable](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
 [![Docker Image
 CI](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml/badge.svg?branch=main)](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml)
 
 `{scdrake}` is a scalable and reproducible pipeline for secondary
-analysis of droplet-based single-cell RNA-seq data. `{scdrake}` is an R
+analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and
+spot-based spatial transcriptomics data (SRT). `{scdrake}` is an R
 package built on top of the `{drake}` package, a
 [Make](https://www.gnu.org/software/make)-like pipeline toolkit for [R
 language](https://www.r-project.org).
@@ -28,11 +29,17 @@ The main features of the `{scdrake}` pipeline are:
 -   Import of scRNA-seq data: [10x Genomics Cell
     Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger)
     output, delimited table, or `SingleCellExperiment` object.
--   Quality control and filtering of cells and genes, removal of empty
-    droplets.
+-   Import of SRT data: [10x Genomics Space
+    Ranger](https://www.10xgenomics.com/support/software/space-ranger/latest/getting-started/what-is-space-ranger)
+    output, delimited table, or `SingleCellExperiment` object, together
+    with a tissue positions file in the Space Ranger format.
+-   Quality control and filtering of cells/spots and genes, removal of
+    empty droplets.
 -   Highly variable genes detection, cell cycle scoring, normalization,
     clustering, and dimensionality reduction.
--   Cell type annotation.
+-   Spatially variable genes detection (for SRT data).
+-   Cell type annotation using reference sets or user-provided marker
+    genes.
 -   Integration of multiple datasets.
 -   Computation of cluster markers and differentially expressed genes
     between clusters (denoted as “contrasts”).
@@ -108,8 +115,8 @@ You can pull the Docker image with the latest stable `{scdrake}` version
 using
 
 ``` bash
-docker pull jirinovo/scdrake:1.5.2
-singularity pull docker:jirinovo/scdrake:1.5.2
+docker pull jirinovo/scdrake:1.6.0
+singularity pull docker:jirinovo/scdrake:1.6.0
 ```
 
 or list available versions in [our Docker Hub
@@ -151,7 +158,7 @@ docker run -d \
   -e USERID=$(id -u) \
   -e GROUPID=$(id -g) \
   -e PASSWORD=1234 \
-  jirinovo/scdrake:1.5.2
+  jirinovo/scdrake:1.6.0
 ```
 
 For Singularity, also make shared directories and execute the container
@@ -234,7 +241,7 @@ for `{scdrake}` and you can use it to install all dependencies by
 
 ``` r
 ## -- This is a lockfile for the latest stable version of scdrake.
-download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/1.5.2/renv.lock")
+download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/1.6.0/renv.lock")
 ## -- You can increase the number of CPU cores to speed up the installation.
 options(Ncpus = 2)
 renv::restore(lockfile = "renv.lock", repos = BiocManager::repositories())
@@ -254,7 +261,7 @@ installed from the lockfile).
 
 ``` r
 remotes::install_github(
-  "bioinfocz/scdrake@1.5.2",
+  "bioinfocz/scdrake@1.6.0",
   dependencies = FALSE, upgrade = FALSE,
   keep_source = TRUE, build_vignettes = TRUE,
   repos = BiocManager::repositories()
@@ -321,7 +328,7 @@ vignette](https://bioinfocz.github.io/scdrake/articles/scdrake.html)
 ## Vignettes and other readings
 
 See <https://bioinfocz.github.io/scdrake> for a documentation website of
-the latest stable version (1.5.2) where links to vignettes below become
+the latest stable version (1.6.0) where links to vignettes below become
 real :-)
 
 See <https://bioinfocz.github.io/scdrake/dev> for a documentation
@@ -341,6 +348,7 @@ website of the current development version.
 -   General information:
     -   Pipeline overview: `vignette("pipeline_overview")`
     -   FAQ & Howtos: `vignette("scdrake_faq")`
+    -   Spatial extension: `vignette("scdrake_spatial")`
     -   Command line interface (CLI): `vignette("scdrake_cli")`
     -   Config files (internals): `vignette("scdrake_config")`
     -   Environment variables: `vignette("scdrake_envvars")`
@@ -352,8 +360,9 @@ website of the current development version.
         -   Stage `01_input_qc`: reading in data, filtering, quality
             control -\> `vignette("stage_input_qc")`
         -   Stage `02_norm_clustering`: normalization, HVG selection,
-            dimensionality reduction, clustering, cell type annotation
-            -\> `vignette("stage_norm_clustering")`
+            SVG selection, dimensionality reduction, clustering,
+            (marker-based) cell type annotation -\>
+            `vignette("stage_norm_clustering")`
     -   Integration pipeline:
         -   Stage `01_integration`: reading in data and integration -\>
             `vignette("stage_integration")`
@@ -436,8 +445,8 @@ contributing to this project, you agree to abide by its terms.
 ### Funding
 
 This work was supported by [ELIXIR CZ](https://www.elixir-czech.cz)
-research infrastructure project (MEYS Grant No: LM2018131) including
-access to computing and storage facilities.
+research infrastructure project (MEYS Grant No: LM2018131 and LM2023055)
+including access to computing and storage facilities.
 
 ### Software and methods used by `{scdrake}`
 

diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -29,7 +29,7 @@ navbar:
       text: Integration pipeline guide
       href: articles/scdrake_integration.html
     spatial:
-      text: Spatial extention
+      text: Spatial extension
       href: articles/scdrake_spatial.html
     faq:
       text: FAQ & Howtos

diff --git a/inst/Rmd/single_sample/02_norm_clustering.Rmd b/inst/Rmd/single_sample/02_norm_clustering.Rmd
@@ -153,7 +153,9 @@ downstream methods. We want to select genes that contain useful information abou
 removing genes that contain random noise. This aims to preserve interesting biological structure without the variance
 that obscures that structure, and to reduce the size of the data to improve computational efficiency of later steps.
 
-More information in [OSCA](https://bioconductor.org/books/3.15/OSCA.basic/feature-selection.html)
+In SRT data, we can also identify spatially variable genes (SVGs), defined as genes with spatially correlated patterns of expression across the tissue area. Based on the paper by Li et al. (2021), we generate a combined set of HVGs and SVGs.
+
+More information in [OSCA](https://bioconductor.org/books/3.15/OSCA.basic/feature-selection.html) and [BestPracticesST](https://lmweber.org/BestPracticesST/).
 
 ```{r, results = "asis"}
 scdrake::catg0('**HVG metric: "{hvg_metric}"**\n\n')
@@ -363,11 +365,11 @@ if (!is.null(cfg$CELL_ANNOTATION_SOURCES)) {
 
 ```{r, results = "asis"}
 if (isTRUE(cfg$MANUAL_ANNOTATION)) {
- scdrake::md_header("Manual cell annotation", 1, extra = "{.tabset}")
+  scdrake::md_header("Marker-based cell annotation", 1, extra = "{.tabset}")
    scdrake::catn(
     glue::glue("**Annotation was done for {cfg$ANNOTATION_CLUSTERING}**"))
    cat("\n\n")
-  cat("For manual annotation we modified an implemented function from the Giotto package. The enrichment Z score is calculated by using method (PAGE) from Kim SY et al., BMC bioinformatics, 2005 as $$ Z = \frac{((Sm – mu)*m^\frac{1}{2})}{delta} $$. \n
+  cat("For marker-based annotation, we modified a function implemented in the Giotto package. The enrichment Z score is calculated using the PAGE method from Kim SY et al., BMC Bioinformatics, 2005 as $$ Z = \\frac{(S_m - \\mu) \\cdot m^{1/2}}{\\delta} $$ \n
 For each spot/cell, the fold change of every gene is computed relative to the gene's mean expression across all spots/cells;
 $\\mu$ and $\\delta$ are the mean and standard deviation of these fold changes. $S_m$ is the mean fold change value of a specific marker gene set
 and $m$ is the size of that marker gene set.")

diff --git a/vignettes/_vignette_signpost.Rmd b/vignettes/_vignette_signpost.Rmd
@@ -10,6 +10,7 @@
 - General information:
   - Pipeline overview: `vignette("pipeline_overview")`
   - FAQ & Howtos: `vignette("scdrake_faq")`
+  - Spatial extension: `vignette("scdrake_spatial")`
   - Command line interface (CLI): `vignette("scdrake_cli")`
   - Config files (internals): `vignette("scdrake_config")`
   - Environment variables: `vignette("scdrake_envvars")`
@@ -19,7 +20,7 @@
 - Pipelines and stages:
   - Single-sample pipeline:
     - Stage `01_input_qc`: reading in data, filtering, quality control -> `vignette("stage_input_qc")`
-    - Stage `02_norm_clustering`: normalization, HVG selection, dimensionality reduction, clustering, cell type annotation
+    - Stage `02_norm_clustering`: normalization, HVG selection, SVG selection, dimensionality reduction, clustering, (marker-based) cell type annotation
       -> `vignette("stage_norm_clustering")`
   - Integration pipeline:
     - Stage `01_integration`: reading in data and integration -> `vignette("stage_integration")`

diff --git a/vignettes/scdrake_spatial.Rmd b/vignettes/scdrake_spatial.Rmd
@@ -1,33 +1,33 @@
 ---
-title: "Spatial extention"
+title: "Spatial extension"
 date: "`r glue::glue('<sup>Document generated: {format(Sys.time(), \"%Y-%m-%d %H:%M:%S %Z%z</sup>\")}')`"
 package: scdrake
 output:
   BiocStyle::html_document:
     toc: true
     toc_float: true
 vignette: >
-  %\VignetteIndexEntry{Spatial extention}
+  %\VignetteIndexEntry{Spatial extension}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}  
 ---
 
 ***
 
-`{scdrake}` now offer spatial extension for the first stage (`01_input_qc`) and the second stage (`02_norm_clustering`) of the single-sample pipeline. The spatial possibility is aimed on Visium technology, respectively on spot-based technologies. Scdrake provides comparable results with Seurat, Giotto (R), as well as scanpy (Python). However, we strongly discourage usage of scdrake for other technologies than Visium. For futher analyses of spatial dataset we recommend [CARD](https://github.com/YMa-lab/CARD) for deconvolution and [CellChat2](https://github.com/SiYangming/CellChat2) or [IGAN](https://github.com/Zhu-JC/IGAN) for cell-cell interaction.
+`{scdrake}` now offers a spatial extension for the first stage (`01_input_qc`) and the second stage (`02_norm_clustering`) of the single-sample pipeline. The spatial extension is aimed at Visium and, more generally, at spot-based technologies. Scdrake provides results comparable to those of Seurat, Giotto (R), and scanpy (Python), and is consistent with the [Best Practices for Spatial Transcriptomics](https://lmweber.org/BestPracticesST/). For now, we discourage using scdrake for technologies other than Visium. For further analyses of the spatial dataset, we recommend [CARD](https://github.com/YMa-lab/CARD) for deconvolution and [CellChat2](https://github.com/SiYangming/CellChat2) or [IGAN](https://github.com/Zhu-JC/IGAN) for cell-cell interaction.
 
 This vignette is meant as a supplement to other vignettes, such as `vignette("stage_input_qc")` and `vignette("stage_norm_clustering")`.
 
 
 ***
 
-## Spatial extention functions
+## Spatial extension functions
 
 ***
 
 ### Spatial visualization
 
-For (`01_input_qc`) and (`02_norm_clustering`) of the single-sample pipeline we now offer visualization of tissue, as pseudo tissue spot visualization. Spatial extention will add spot coordinates (array_col and array_row) from SpaceRanger tissue_possitions.csv file, and will filter away all spots, that are by SpaceRanger labeled as not in tissue. Visualization function are implemented from the [Giotto package](https://drieslab.github.io/Giotto_website/). Visualization is automatically used for quality control and dimension reduction results.    
+For stages `01_input_qc` and `02_norm_clustering` of the single-sample pipeline, we now offer tissue visualization in the form of a pseudo-tissue spot plot. The spatial extension adds spot coordinates (`array_col` and `array_row`) from the Space Ranger `tissue_positions.csv` file and filters out all spots that Space Ranger labels as not in tissue. The visualization functions are adapted from the [Giotto package](https://drieslab.github.io/Giotto_website/). Visualization is applied automatically to the quality control and dimensionality reduction results.
 
 ***
 
@@ -37,8 +37,8 @@ For spatial analyses in stage 02_norm_clustering `vignette("stage_norm_clusterin
 
 ***
 
-### Manual annotation
+### Marker-based annotation
 
-Manual annotation was implemented for both single-cell and spatial datasets. In summary, expression profiles and statistical metrics are computed for each cell/spot, the result is visualized using a heatmap and dimension reduction plot. For spatial datasets is enabled to visualized results in tissue coordinates, both enrichment plots for each annotation label (individual enrichment plots) and for overall results for each spot. Manual annotation is implemented from the [Giotto package](https://drieslab.github.io/Giotto_website/), the function is based on [Kim SY et al](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-144).  
+Marker-based annotation is implemented for both single-cell and spatial datasets. In summary, expression profiles and statistical metrics are computed for each cell/spot, and the results are visualized using a heatmap and a dimensionality reduction plot. For spatial datasets, the results can also be visualized in tissue coordinates, both as enrichment plots for each annotation label (individual enrichment plots) and as overall results for each spot. Marker-based annotation is adapted from the [Giotto package](https://drieslab.github.io/Giotto_website/), and the underlying function is based on the PAGE method of [Kim SY et al.](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-144)
 
 ***
diff --git a/vignettes/stage_input_qc.Rmd b/vignettes/stage_input_qc.Rmd
@@ -120,7 +120,7 @@ INPUT_QC_REPORT_RMD_FILE: "Rmd/single_sample/01_input_qc.Rmd"
 
 **Type:** character scalar
 
-A path to RMarkdown file used for HTML report of this pipeline stage. For spatial extention, the default RMarkdown file is `01_input_qc_spatial.Rmd`
+A path to the RMarkdown file used for the HTML report of this pipeline stage. For the spatial extension, the default RMarkdown file is `01_input_qc_spatial.Rmd`.
 
 ***
 
@@ -143,7 +143,7 @@ You can also negate the selection by specifying `negate: true`.
 
 ***
 
-#### Spatial extention
+#### Spatial extension
 
 
 ```yaml