Erasmus MC University Medical Center Rotterdam, Department of Genetic Identification, Rotterdam, The Netherlands.
de-goulash is a bioinformatics pipeline built in Snakemake which allows clustering of mixed individuals using 10x single-cell RNA-seq data.
The pipeline is divided into two main steps.
The following inputs are needed:
- possorted_genome_bam.bam
- barcodes.tsv as output by 10x. This file contains the cells to use in the subsequent steps.
- genome.fasta (Human reference genome e.g. hg19 or hg38 in fasta format)
- *MT.fasta (mitochondrial DNA sequence in fasta format, same build as genome)
- *region.txt
- *MT_regions.txt
Input files marked with an asterisk * [4, 5, 6] can be generated with the Python script:
python process_reference.py [path/genome.fasta]
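To illustrate what a freebayes-style regions file contains, here is a minimal sketch that splits each sequence listed in a FASTA index (.fai) into fixed-size chunks written as chrom:start-end. It is not the code of process_reference.py; the function name `fai_to_regions`, the chunk size and the 0-based start coordinate are assumptions made for this example.

```python
def fai_to_regions(fai_lines, region_size=100000):
    """Yield 'chrom:start-end' strings covering every sequence in a .fai index.

    Hypothetical helper: the chunk size and coordinate convention are assumptions.
    """
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, length = fields[0], int(fields[1])
        start = 0
        while start < length:
            # The last chunk of a sequence is truncated at the sequence length.
            end = min(start + region_size, length)
            yield "{}:{}-{}".format(chrom, start, end)
            start = end


# Example .fai line: name, length, offset, line bases, line width.
example_fai = ["chrM\t16569\t6\t60\t61\n"]
regions = list(fai_to_regions(example_fai, region_size=10000))
print("\n".join(regions))
```

Smaller chunks give parallel variant calling more units to distribute, at the cost of more individual jobs.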
2) Individual genetic identification and biogeographical ancestry assignment. This step requires the variant calling output for each cluster assigned in step 1. It calculates the likelihood of forensic parameters, performs population assignment, runs HaploGrep and finally Yleaf v.2.2.
The following inputs are needed:
- Exome reference: exome_96_remmapedto38.vcf.gz
- Reference populations based on the 1000G project: 1000G_populations.txt
- Path to the per-chromosome 1000G variant calling files: /single-cell/input/1000G/ [https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/]
- sample_bam: /single-cell/input/1/possorted_genome_trimmed.bam
- barcodes: /single-cell/input/1/barcodes_reduced.txt
- reference: /single-cell/input/reference/genome.fasta
- regions: /single-cell/input/reference/regions.txt #for parallel freebayes, region file can be generated with https://github.com/nh13/freebayes/blob/master/scripts/fasta_generate_regions.py
- reference_MT: /single-cell/input/reference/MT.fasta
- regions_MT: /single-cell/input/reference/MT_regions.txt
- cores: 4
- dp: 50 # SNP filtering depth
- qual: 60 # SNP filtering quality
- thr_cell_1: 10 #Minimal number of SNPs per cell
- thr_cell_2: 20 #Minimal number of SNPs per cell
- threshold_coverage: 10 #threshold for total coverage of selected SNPs per cell
- threshold_coverage_pos: 5 #threshold for coverage per selected SNP per cell
- threshold_base_calling: 90
- n_neighbors: 5 #setting for UMAP clustering
- n_components: 300
- clusters: 0 # if clusters > 1 then NbClust is executed to predict the number of clusters to use
- ref_exome: /single-cell/input/exome_96_remmapedto38.vcf.gz
- ref_population: /single-cell/input/1000G/1000G_populations.txt
- dirpath_1000G: /single-cell/input/1000G/
- dirpath_analysis: output
- dp_2: 50 #SNP filtering depth
- qual_2: 60 #SNP filtering quality
- read_depth: 1
- quality: 20
- base_calling: 90
- positions_file: /single-cell/software/Yleaf/Position_files/WGS_hg38.txt
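Many of the settings above are file or directory paths, so a quick check before launching Snakemake can catch a wrong mount or a typo early. The sketch below assumes the configuration has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`); the helper name `missing_paths` and the choice of which keys to check are assumptions for illustration and are not part of the pipeline.

```python
import os

# Keys from the configuration above whose values are input paths.
PATH_KEYS = [
    "sample_bam", "barcodes", "reference", "regions",
    "reference_MT", "regions_MT", "ref_exome", "ref_population",
    "dirpath_1000G", "positions_file",
]


def missing_paths(config, exists=os.path.exists):
    """Return the path-valued keys that are absent or do not exist on disk.

    Hypothetical helper; `exists` can be swapped out for testing.
    """
    problems = []
    for key in PATH_KEYS:
        value = config.get(key)
        if value is None or not exists(value):
            problems.append(key)
    return problems


# Demonstration with a deliberately incomplete configuration.
demo_config = {"sample_bam": "/no/such/file.bam", "cores": 4}
for key in missing_paths(demo_config, exists=lambda path: False):
    print("missing or unreadable:", key)
```

The paths in the configuration are written as seen from inside the container (under /single-cell), so a check like this is only meaningful when run in the same environment as the pipeline.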
We provide a Docker image in which you can run the pipeline without installing any dependency other than Docker, although you need root permissions to proceed.
Download the Docker image (2.03 GB):
docker pull geniderasmusmc/de-goulash:1
Tested with Docker version 19.03.2, build 6a30dfc
docker --version
You can execute the de-goulash Snakemake pipeline through the Docker container. You have to manually mount the directory where the input files are located.
- Current directory where input files are located -> /current/directory/de-goulash/
- Default root location inside the container (do not change) -> :/single-cell
- Container name -> geniderasmusmc/de-goulash:1
- Target file [only change output name e.g. output_test/iter2/cells_merge_clusters.vcf] -> output/iter2/cells_merge_clusters.vcf
docker run -it -v /current/directory/de-goulash/:/single-cell geniderasmusmc/de-goulash:1 output/iter2/cells_merge_clusters.vcf --snakefile Snakefile --configfile config.yaml --cores 1
docker run -it -v /current/directory/de-goulash/:/single-cell geniderasmusmc/de-goulash:1 --snakefile Snakefile_analysis --configfile config.yaml --cores 1
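The two commands differ only in the Snakefile and the optional target file, so they can be assembled programmatically. The following is a minimal sketch, assuming the image name and mount point listed above; the helper `docker_command` is hypothetical and only builds and prints the command, it does not run Docker.

```python
import os
import shlex

IMAGE = "geniderasmusmc/de-goulash:1"


def docker_command(input_dir, snakefile, target=None, cores=1):
    """Build the docker run argument list with input_dir mounted at /single-cell.

    Hypothetical helper for illustration only.
    """
    # Docker bind mounts need an absolute host path.
    mount = "{}:/single-cell".format(os.path.abspath(input_dir))
    cmd = ["docker", "run", "-it", "-v", mount, IMAGE]
    if target is not None:
        cmd.append(target)  # step 1 names its target file explicitly
    cmd += ["--snakefile", snakefile,
            "--configfile", "config.yaml",
            "--cores", str(cores)]
    return cmd


step1 = docker_command("/current/directory/de-goulash", "Snakefile",
                       target="output/iter2/cells_merge_clusters.vcf")
step2 = docker_command("/current/directory/de-goulash", "Snakefile_analysis")

for cmd in (step1, step2):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually launch a step, the list could be passed to `subprocess.run(cmd, check=True)`.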
Instead of using the Docker container, you can install everything independently and run Snakemake directly.
- R 3.6.1 -- "Action of the toes"
- Python 3.7.3
- Linux Ubuntu 18.04
- Java Runtime Environment 8
Using conda or a Python 3 venv is recommended.
pip3 install -r requirements.txt
Rscript requirements.R
git clone https://github.com/genid/de-goulash.git
Step 1
snakemake output/iter2/cells_merge_clusters.vcf --snakefile Snakefile --configfile config.yaml --cores 1
Step 2
snakemake --snakefile Snakefile_analysis --configfile config.yaml --cores 1