A Nextflow pipeline for West Nile Virus (WNV)
The pipeline analyzes WNV genome data in FASTQ format. It reports the taxonomic species assignment, the percentage of reads mapped to the reference genome, the coverage and depth of the reference genome, the consensus sequence, VADR flags, SNPs, etc.
Nextflow is required. Installation instructions can be found at https://github.com/nextflow-io/nextflow.
Python v3.7 or higher is required.
Singularity/Apptainer is also required. Installation instructions can be found at https://singularity-tutorial.github.io/01-installation/.
conda create -n WNV -c conda-forge python=3.10
conda activate WNV
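After activating the environment, it can be worth confirming that the active Python meets the 3.7 minimum. The helper below is only a sketch: the function name version_at_least is hypothetical and not part of the pipeline, and it compares just the major and minor version numbers.

```shell
# Hypothetical helper (not part of the pipeline).
# Returns success (0) if version $1 is at least version $2.
# Only the major and minor components are compared.
version_at_least() {
    found_major=${1%%.*}
    found_rest=${1#*.}
    found_minor=${found_rest%%.*}
    need_major=${2%%.*}
    need_rest=${2#*.}
    need_minor=${need_rest%%.*}
    if [ "$found_major" -ne "$need_major" ]; then
        # Majors differ, so the major number alone decides.
        [ "$found_major" -gt "$need_major" ]
        return $?
    fi
    [ "$found_minor" -ge "$need_minor" ]
}

# Example (assumes the WNV conda environment is active):
#   found=$(python --version 2>&1 | awk '{print $2}')
#   version_at_least "$found" 3.7 || echo "Python $found is too old" >&2
```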
- Put your data files into the directory /fastqs. The data file names should look like "JBS22002292_1.fastq.gz" and "JBS22002292_2.fastq.gz". Test data can be found in the directory /fastqs/testdata. If you want to use the test data, copy the files to the directory /fastqs.
- Open the file "parames.yaml" and set the parameters.
- Change to the top directory of the pipeline and run:
sbatch ./daytona_wnv.sh
All results can be found in the directory /output.
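Before submitting the job, the input naming convention described above can be checked with a small script. This is a minimal sketch, assuming paired-end files named <sample>_1.fastq.gz and <sample>_2.fastq.gz; the function name find_unpaired_samples is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline).
# Prints the sample IDs in directory $1 that are missing one of the
# two mates (<sample>_1.fastq.gz or <sample>_2.fastq.gz).
find_unpaired_samples() {
    dir=$1
    for f in "$dir"/*_1.fastq.gz "$dir"/*_2.fastq.gz; do
        # Skip the literal pattern when no file matches it.
        [ -e "$f" ] || continue
        base=${f##*/}
        sample=${base%_[12].fastq.gz}
        if [ ! -e "$dir/${sample}_1.fastq.gz" ] || [ ! -e "$dir/${sample}_2.fastq.gz" ]; then
            echo "$sample"
        fi
    done | sort -u
}

# Example (assumes the data directory is ./fastqs relative to the pipeline top directory):
#   find_unpaired_samples ./fastqs
```

An empty output means every sample has both mates. Files that do not follow the _1/_2 suffix convention are not examined by this check.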
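- When running the pipeline for the first time, if the folder "reference" contains no index files (*.fai, *.sa, *.pac, *.bwt, *.ann, *.amb), you need to build the index. Note that "bwa index" generates the 5 BWA index files (*.sa, *.pac, *.bwt, *.ann, *.amb); the *.fai file is a FASTA index, which is typically created separately (for example with "samtools faidx"). To build the BWA index, run the following command in the folder "reference":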
bwa index reference.fasta
- The default Python on HPG is v3.6, while the pipeline requires at least Python 3.7. Do not use "module load python" directly in the HPG terminal, because the loaded Python is missing some modules, such as "site". The recommended approach is to install a newer Python version with conda.
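For the indexing note above, one way to decide whether "bwa index" still needs to be run is to look for the five files it writes next to the reference. The sketch below assumes the reference is named reference.fasta inside the folder "reference"; the function name missing_bwa_index is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline).
# Prints the BWA index files that are missing for the FASTA file $1.
# "bwa index" writes <fasta>.amb, .ann, .bwt, .pac and .sa.
missing_bwa_index() {
    ref=$1
    for ext in amb ann bwt pac sa; do
        [ -e "$ref.$ext" ] || echo "$ref.$ext"
    done
}

# Example (assumes the reference lives at reference/reference.fasta):
#   if [ -n "$(missing_bwa_index reference/reference.fasta)" ]; then
#       (cd reference && bwa index reference.fasta)
#   fi
```

The check only tests that the files exist; it does not verify that they were built from the current reference.fasta.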