GEMSTONE_Plate_Swipes_Illumina_PE
v1.0.0
This workflow processes paired-end Illumina reads from plate swipe (plate bacterial) metagenomes, and it can also be used for broader bacterial metagenomic analysis. It removes human reads, performs read QA and QC, and assembles the metagenome. It also estimates taxon abundances with Kraken2 and Bracken, performs strain-level identification with StrainGE, bins MAGs, performs AMR genotyping, and identifies plasmid contigs.
This workflow was based on the PHB v1.0.0 TheiaMeta workflow from Theiagen.
Boolean
Optional
Default = false
If true, aligns reads to the assembly, computes coverage, and retrieves aligned and unaligned reads.
File
Optional
Reference file for consensus calling, in FASTA format. If provided, performs consensus assembly and calling.
Boolean
Optional
Default = false
If true, performs strain-level detection with StrainGE.
Boolean
Optional
Default = false
If true, identifies taxa and estimates their abundances with Kraken2/Bracken.
Boolean
Optional
Default = false
If true, performs genome binning and bin refinement with metaWRAP.
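The switches above decide which optional stages run. As a minimal sketch (not the workflow's own code), the Python below mirrors them with a hypothetical `WorkflowOptions` class. The field names follow the names referenced elsewhere on this page (`output_additional_files`, `reference`, `call_strainge`, `call_kraken`, `call_metawrap`), and the stage labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowOptions:
    """Hypothetical mirror of the optional workflow switches (defaults as documented)."""

    output_additional_files: bool = False
    reference: Optional[str] = None  # path to a reference FASTA, or None
    call_strainge: bool = False
    call_kraken: bool = False
    call_metawrap: bool = False


def optional_stages(opts: WorkflowOptions) -> list[str]:
    """List the optional stages the given switches would enable (labels are illustrative)."""
    stages = []
    if opts.output_additional_files:
        stages.append("read mapping and coverage")
    if opts.reference is not None:
        stages.append("consensus assembly and calling")
    if opts.call_strainge:
        stages.append("StrainGE strain-level detection")
    if opts.call_kraken:
        stages.append("Kraken2/Bracken abundance estimation")
    if opts.call_metawrap:
        stages.append("metaWRAP binning and bin refinement")
    return stages
```

With every switch left at its default, no optional stage is enabled and only the core steps (human read removal, QC, assembly, annotation, plasmid identification, and AMR genotyping) remain.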
File
Required
FASTQ file with forward raw reads. Must be Illumina paired-end.
File
Required
FASTQ file with reverse raw reads. Must be Illumina paired-end.
String
Required
Name or ID of the sample.
String
Required
Genus or species name, as determined in the lab or by GAMBIT. It must be written in full, with spaces (e.g., Escherichia coli, not E. coli or Escherichia_coli). It is used to select a StrainGE reference database by matching the genus given here against the genus of each database. Multiple genera are supported through a "/" separator, such as "Escherichia/Klebsiella"; this calls StrainGE twice, once with an Escherichia database and once with a Klebsiella database.
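To illustrate how a value of this parameter could be turned into a list of genera, here is a minimal Python sketch. It assumes that each "/"-separated entry begins with the genus name, optionally followed by a space and the species epithet. The function name `genera_from_lab_determined_genus` is hypothetical, and the workflow's actual matching logic may differ.

```python
def genera_from_lab_determined_genus(value: str) -> list[str]:
    """Split a value such as "Escherichia coli/Klebsiella" into genus names.

    Assumption: each "/"-separated entry starts with the genus, optionally
    followed by the species epithet after a space.
    """
    genera = []
    for entry in value.split("/"):
        words = entry.strip().split()
        if words:  # skip empty entries, e.g. from a trailing "/"
            genera.append(words[0])
    return genera
```

Under this sketch, "Escherichia coli" yields one genus (one StrainGE call), while "Escherichia/Klebsiella" yields two.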
File
Required
TSV configuration file for StrainGE databases. It should be a two-column table: the first column holds the database genus name (e.g., Escherichia or Proteus), and the second holds the path to the tar archive containing the StrainGE database for that genus. For example:
Escherichia | gs://fc-secure-uuid/databases/strainge/escherichia_shigella.tar.gz
Shigella | gs://fc-secure-uuid/databases/strainge/escherichia_shigella.tar.gz
Pseudomonas | gs://fc-secure-uuid/databases/strainge/pseudomonas.tar.gz
Staphylococcus | gs://fc-secure-uuid/databases/strainge/staphylococcus.tar.gz
Proteus | gs://fc-secure-uuid/databases/strainge/proteus.tar.gz
Klebsiella | gs://fc-secure-uuid/databases/strainge/klebsiella.tar.gz
Acinetobacter | gs://fc-secure-uuid/databases/strainge/acinetobacter.tar.gz
Enterobacter | gs://fc-secure-uuid/databases/strainge/enterobacter.tar.gz
Enterococcus | gs://fc-secure-uuid/databases/strainge/enterococcus.tar.gz
Int
Optional
Default = 5
Maximum number of strains searched by StrainGST.
Int
Optional
Default = 100
Disk size in GB for the StrainGE task.
Int
Optional
Default = 4
Number of CPUs used in the StrainGE task.
Int
Optional
Default = 128
RAM in GB for the StrainGE task.
Int
Optional
Default = 23
K-mer size used when creating the StrainGST databases.
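The configuration table and the list of genera can be combined to pick the databases for each StrainGE call. The Python sketch below assumes a headerless, tab-separated file whose first column is the genus and whose second column is the archive path, as in the example table. `load_strainge_config` and `select_databases` are hypothetical helpers, and treating at least one match as "database found" is an assumption about how `straingst_found_db` is set.

```python
import csv
import io


def load_strainge_config(tsv_text: str) -> dict[str, str]:
    """Map genus name to database tar archive path (headerless two-column TSV assumed)."""
    config = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2 and row[0].strip():
            config[row[0].strip()] = row[1].strip()
    return config


def select_databases(genera: list[str], config: dict[str, str]) -> tuple[list[str], bool]:
    """Return the archive paths matching the genera and whether any match was found."""
    matches = [config[genus] for genus in genera if genus in config]
    return matches, bool(matches)
```

Matching here is exact and case-sensitive, which is one reason the genus must be spelled in full.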
File
Optional
Compressed Kraken2/Bracken database as a tar archive. Required if call_kraken is true. Make sure that the archive contains all the files needed to run Bracken (refer to the Bracken documentation).
Int
Optional
Default = 150
Input read length.
String
Optional
Default = "G"
Taxonomic level for Bracken abundance estimation. Defaults to genus level (G). Other possible options are K (kingdom), P (phylum), C (class), O (order), F (family), and S (species).
Int
Optional
Default = 256
Disk size in GB for the Kraken2/Bracken task.
Int
Optional
Default = 4
Number of CPUs used in the Kraken2/Bracken task.
Int
Optional
Default = 32
RAM in GB for the Kraken2/Bracken task.
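Because a missing Bracken file only surfaces when the task runs, it can help to check the archive beforehand. The Python sketch below assumes the standard Kraken2 database files (`hash.k2d`, `opts.k2d`, `taxo.k2d`) plus the `database{read_length}mers.kmer_distrib` file that `bracken-build` writes for a given read length. The helper names are hypothetical, and any other files in the archive are ignored.

```python
import tarfile
from pathlib import PurePosixPath

# Files every Kraken2 database contains.
KRAKEN2_FILES = {"hash.k2d", "opts.k2d", "taxo.k2d"}
# Taxonomic levels listed for the Bracken level parameter on this page.
BRACKEN_LEVELS = {"K", "P", "C", "O", "F", "G", "S"}


def missing_database_files(archive_path: str, read_length: int = 150) -> set[str]:
    """Return expected Kraken2/Bracken file names that are absent from a tar archive.

    Members are compared by base name, so the files may sit in a subdirectory.
    """
    expected = KRAKEN2_FILES | {f"database{read_length}mers.kmer_distrib"}
    # The default mode "r" transparently handles gzip, bzip2 and xz compression.
    with tarfile.open(archive_path) as tar:
        present = {PurePosixPath(m.name).name for m in tar.getmembers() if m.isfile()}
    return expected - present


def check_bracken_level(level: str = "G") -> str:
    """Raise ValueError for a level outside those documented here."""
    if level not in BRACKEN_LEVELS:
        raise ValueError(f"unsupported Bracken level: {level!r}")
    return level
```

The read length passed here should match the input read length parameter, since the Bracken k-mer distribution file is specific to one read length.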
File
Optional
Compressed CheckM database as a tar archive, used by metaWRAP. Required if call_metawrap is true.
Int
Optional
Default = 80
Minimum completion of MAG bins, as determined by CheckM, in percent (e.g., a value of 80 means only bins with completion greater than or equal to 80% are returned).
Int
Optional
Default = 10
Maximum contamination of MAG bins, as determined by CheckM, in percent (e.g., a value of 10 means only bins with contamination less than or equal to 10% are returned).
Int
Optional
Default = 1000
Minimum length in bp of contigs included in the MAG bins returned by metaWRAP.
String
Optional
Default = "--metabat2 --maxbin2 --concoct"
Contig binning tools used by metaWRAP, given as flags that are passed to the metaWRAP binning command.
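The two CheckM thresholds act as a simple filter on each bin. The Python sketch below restates that rule; it is not metaWRAP's implementation, and the `Bin` class and `passing_bins` function are hypothetical. Its defaults match the documented defaults of 80 (completion) and 10 (contamination).

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical record of CheckM statistics for one MAG bin."""

    name: str
    completeness: float   # percent, as reported by CheckM
    contamination: float  # percent, as reported by CheckM


def passing_bins(bins: list[Bin], min_completion: int = 80, max_contamination: int = 10) -> list[str]:
    """Keep bins with completeness >= min_completion and contamination <= max_contamination."""
    return [
        b.name
        for b in bins
        if b.completeness >= min_completion and b.contamination <= max_contamination
    ]
```

Both comparisons are inclusive, matching the "greater than or equal to" and "less than or equal to" wording of the parameter descriptions.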
String
Version of the TheiaMeta workflow used.
String
Analysis date.
String
Optional
Kraken2 version used.
String
Optional
Name of the Kraken2 and Bracken Docker image used.
File
Optional
Kraken2 report with taxa classifications.
File
Optional
Percentage of human-classified reads in the sample, as determined by Kraken2.
File
Optional
Bracken report with estimated abundances per taxon.
String
Optional
Bracken version used.
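If the Bracken file is Bracken's tabular abundance output (rather than the Kraken-style report that Bracken can also write), its columns include `name` and `fraction_total_reads`, and the most abundant taxa can be listed with a short Python sketch like the one below. Which of Bracken's two files the workflow returns is an assumption here, and `top_taxa` is a hypothetical helper.

```python
import csv
import io


def top_taxa(bracken_tsv: str, n: int = 5) -> list[tuple[str, float]]:
    """Return the n taxa with the highest fraction_total_reads in a Bracken output table."""
    rows = csv.DictReader(io.StringIO(bracken_tsv), delimiter="\t")
    ranked = sorted(
        ((row["name"], float(row["fraction_total_reads"])) for row in rows),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:n]
```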
File
Optional
FASTQ file of forward reads with human reads removed.
File
Optional
FASTQ file of reverse reads with human reads removed.
String
Optional
Name of the NCBI Human Scrubber Docker image used.
Int
Number of reads in read1.
Int
Number of reads in read2.
Int
Number of read pairs in read1 and read2 (raw reads).
String
Version of fastq_scan used.
String
Optional
Name of the fastq_scan Docker image used.
Int
Number of reads after QC in read1_clean.
Int
Number of reads after QC in read2_clean.
Int
Number of read pairs after QC in read1_clean and read2_clean (clean reads).
String
Optional
Version of trimmomatic used.
String
Optional
Name of the trimmomatic Docker image used.
File
FASTQ file with forward cleaned reads, after QC and de-hosting.
File
FASTQ file with reverse cleaned reads, after QC and de-hosting.
String
Name of the BBDuk Docker image used.
Float
Average length in bp of clean reads (read1_clean and read2_clean).
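The read counts and average read length above can be approximated directly from the FASTQ files. This Python sketch assumes standard four-line FASTQ records (no line-wrapped sequences) and plain or gzip-compressed input. It is not how fastq_scan computes its values, and `read_stats` is a hypothetical name.

```python
import gzip


def _open_text(path: str):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_stats(*paths: str) -> tuple[int, float]:
    """Return (number of reads, mean read length in bp) across one or more FASTQ files."""
    n_reads = total_bases = 0
    for path in paths:
        with _open_text(path) as handle:
            for i, line in enumerate(handle):
                if i % 4 == 1:  # the second line of each 4-line record is the sequence
                    n_reads += 1
                    total_bases += len(line.rstrip("\r\n"))
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length
```

Passing both clean files, for example `read_stats("r1_clean.fastq.gz", "r2_clean.fastq.gz")` (hypothetical file names), gives a single average over forward and reverse reads.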
File
FASTA file of the final assembly (MAG). If no reference is used, this is the metaSPAdes assembly polished with Pilon.
String
Version of metaSPAdes used.
String
Name of the metaSPAdes Docker image used.
String
Version of minimap2 used.
String
Name of the minimap2 Docker image used.
String
Version of samtools used.
String
Name of the samtools Docker image used.
String
Version of Pilon used.
String
Name of the Pilon Docker image used.
Int
Assembly (MAG) length in bp.
Int
Number of MAG contigs.
Int
Length of the largest MAG contig in bp.
String
Version of Quast used for assembly QC.
String
Name of the Quast Docker image used for assembly QC.
Float
Optional
Percentage coverage of the reference genome (if provided).
Float
Optional
Mean depth of coverage of the final assembly. Returned only if output_additional_files is true.
String
Optional
Version of bedtools used for assembly QC. Returned only if output_additional_files is true.
String
Optional
Name of the bedtools Docker image used for assembly QC. Returned only if output_additional_files is true.
File
Optional
Forward reads not mapped to the assembly. Returned only if output_additional_files is true.
File
Optional
Reverse reads not mapped to the assembly. Returned only if output_additional_files is true.
File
Optional
Forward reads mapped to the assembly. Returned only if output_additional_files is true.
File
Optional
Reverse reads mapped to the assembly. Returned only if output_additional_files is true.
Float
Optional
Percentage of reads mapped to the assembly. Returned only if output_additional_files is true.
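The assembly length, contig count, and largest contig length can be recomputed from the final assembly FASTA as a sanity check. The Python sketch below is a minimal illustration with a hypothetical `assembly_metrics` function. QUAST may apply a minimum contig length to some of its metrics, so its numbers can differ from these raw totals.

```python
def assembly_metrics(fasta_text: str) -> dict[str, int]:
    """Compute total length, contig count and largest contig length from FASTA text."""
    lengths: list[int] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new contig
        elif lengths:
            lengths[-1] += len(line)  # sequence may be wrapped over several lines
    return {
        "total_length": sum(lengths),
        "num_contigs": len(lengths),
        "largest_contig": max(lengths, default=0),
    }
```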
Array[File]
Optional
Files with k-merized input reads. The size of the array depends on how many genera are listed in lab_determined_genus, but the contents of each file should be the same. Returned only if call_strainge and straingst_found_db are true.
Array[File]
Optional
StrainGST databases used in each call to StrainGE. The size of the array depends on how many genera are listed in lab_determined_genus. Returned only if call_strainge and straingst_found_db are true.
Boolean
Optional
Whether a StrainGST database matching the genera in lab_determined_genus was found. Returned only if call_strainge is true.
Array[File]
Optional
Text files of strains found by StrainGST when using each database. Returned only if call_strainge and straingst_found_db are true.
Array[File]
Optional
Reports with StrainGST statistics, including strain relative abundances, for each database used. Returned only if call_strainge and straingst_found_db are true.
Array[File]
Optional
Concatenated references, as FASTA files, needed for downstream StrainGR analysis. The size of the array depends on how many genera are listed in lab_determined_genus. Returned only if call_strainge and straingst_found_db are true.
Array[File]
Optional
Indexed BAM files of clean reads mapped to the concatenated references (in straingr_concat_fasta). The size of the array depends on how many genera are listed in lab_determined_genus. Returned only if call_strainge and straingst_found_db are true.
Array[File]
Optional
HDF5 files with variant calling results from StrainGR. The size of the array depends on how many genera are listed in lab_determined_genus. Returned only if call_strainge and straingst_found_db are true.
Array[File]
Optional
StrainGR reports with variant calling statistics. The size of the array depends on how many genera are listed in lab_determined_genus. Returned only if call_strainge and straingst_found_db are true.
Array[String]
Optional
Name of the StrainGE Docker image used for strain-level detection. The size of the array depends on how many genera are listed in lab_determined_genus. Returned only if call_strainge is true.
Array[String]
Optional
Version of StrainGE used for strain-level detection. The size of the array depends on how many genera are listed in lab_determined_genus. Returned only if call_strainge is true.
File
MAG gene annotations from Bakta in GenBank format.
File
MAG gene annotations from Bakta in GFF3 format.
File
MAG gene annotations from Bakta in TSV format.
File
Summary report of MAG gene annotation from Bakta.
String
Version of Bakta used for MAG gene annotation.
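The Bakta GFF3 annotation can be summarized by counting features per type (column 3 of each GFF3 feature line). The Python sketch below follows the GFF3 specification, which allows sequences to follow a `##FASTA` directive, so parsing stops there. `count_feature_types` is a hypothetical helper.

```python
from collections import Counter


def count_feature_types(gff3_text: str) -> Counter:
    """Count features per type (column 3) in GFF3 text, stopping at an embedded ##FASTA section."""
    counts: Counter = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) == 9:
            counts[columns[2]] += 1
    return counts
```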
File
TSV file with plasmid/chromosome classification of contigs from MOB-recon.
File
TSV file with plasmid typing results from MOB-typer.
File
FASTA file of chromosomal contigs in the MAG.
File
FASTA file of plasmid contigs in the MAG.
String
Name of the MOB-recon/MOB-suite Docker image used for plasmid identification.
String
Version of MOB-recon/MOB-suite used for plasmid identification.
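As an illustration of how the chromosomal and plasmid FASTA files might be used together, the Python sketch below computes the fraction of classified bases that MOB-recon placed on plasmid contigs. Contigs absent from both files are ignored, and both helper names are hypothetical.

```python
def total_length(fasta_text: str) -> int:
    """Sum sequence lengths in FASTA text, skipping header lines."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    )


def plasmid_fraction(chromosome_fasta: str, plasmid_fasta: str) -> float:
    """Fraction of classified bases that are on plasmid contigs (0.0 if both are empty)."""
    chromosome_bp, plasmid_bp = total_length(chromosome_fasta), total_length(plasmid_fasta)
    total_bp = chromosome_bp + plasmid_bp
    return plasmid_bp / total_bp if total_bp else 0.0
```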
File
Report of all genes (virulence, stress, and AMR) found by AMRFinderPlus, as a TSV file.
File
Report of AMR genes found by AMRFinderPlus, as a TSV file.
File
Report of stress genes found by AMRFinderPlus, as a TSV file.
File
Report of virulence genes found by AMRFinderPlus, as a TSV file.
String
Comma-separated list of core AMR genes found by AMRFinderPlus.
String
Comma-separated list of plus AMR genes found by AMRFinderPlus.
String
Comma-separated list of stress genes found by AMRFinderPlus.
String
Comma-separated list of virulence genes found by AMRFinderPlus.
String
Comma-separated list of antimicrobial drug classes for which AMR genes were found by AMRFinderPlus.
String
Comma-separated list of antimicrobial drug subclasses for which AMR genes were found by AMRFinderPlus.
String
Version of AMRFinderPlus used for AMR genotyping.
String
Version of AMRFinderPlus database used for AMR genotyping.
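The comma-separated gene and drug-class lists can be derived from the AMRFinderPlus TSV report. This Python sketch assumes AMRFinderPlus 3.x column names (`Gene symbol`, `Scope`, `Element type`, `Class`, `Subclass`), which newer releases have partly renamed, and assumes that "core" and "plus" refer to the `Scope` column of AMR rows. `amrfinder_lists` is hypothetical and is not the workflow's code.

```python
import csv
import io


def amrfinder_lists(tsv_text: str) -> dict[str, str]:
    """Build comma-separated gene and drug-class lists from an AMRFinderPlus TSV.

    Assumes AMRFinderPlus 3.x column names; check the header of your version.
    """
    keys = ("amr_core", "amr_plus", "stress", "virulence", "classes", "subclasses")
    groups: dict[str, list[str]] = {key: [] for key in keys}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        element_type, gene = row["Element type"], row["Gene symbol"]
        if element_type == "AMR":
            groups["amr_core" if row["Scope"] == "core" else "amr_plus"].append(gene)
            groups["classes"].append(row["Class"])
            groups["subclasses"].append(row["Subclass"])
        elif element_type == "STRESS":
            groups["stress"].append(gene)
        elif element_type == "VIRULENCE":
            groups["virulence"].append(gene)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {key: ",".join(dict.fromkeys(values)) for key, values in groups.items()}
```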
String
Optional
Name of the metaWRAP Docker image used for MAG binning and bin refinement. Returned only if call_metawrap is true.
String
Optional
Version of metaWRAP used for MAG binning and bin refinement. Returned only if call_metawrap is true.
File
Optional
TSV file with MAG bin statistics (including completeness and contamination). Returned only if call_metawrap is true.
Int
Optional
Number of MAG bins found by metaWRAP. Returned only if call_metawrap is true.
String
Optional
Contig binning tools used by metaWRAP, as flags. These flags were provided as an input parameter and passed to the metaWRAP binning command; they are returned as outputs for record keeping. Returned only if call_metawrap is true.
Array[File]
Optional
MAG bins found by metaWRAP, as FASTA files. Returned only if call_metawrap is true.
File
Optional
Bin assignment of each contig in the MAG, as a TSV file. Returned only if call_metawrap is true.
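The contig-to-bin table can be regrouped by bin, for example to cross-check the reported number of bins. The Python sketch below assumes a headerless two-column TSV with the contig name first and the bin name second; that layout and the `contigs_per_bin` helper are assumptions, not documented behavior.

```python
import csv
import io
from collections import defaultdict


def contigs_per_bin(tsv_text: str) -> dict[str, list[str]]:
    """Group contig names by bin from a headerless (contig, bin) TSV."""
    bins: defaultdict[str, list[str]] = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2:
            bins[row[1]].append(row[0])
    return dict(bins)
```

Under these assumptions, `len(contigs_per_bin(...))` should equal the number of MAG bins reported by metaWRAP.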
Marco Teixeira
Colin Worby