WDL wrapper around RefNAAP for execution on Terra.bio.
RefNAAP is a reference-based Oxford Nanopore Technologies (ONT) assembly and analysis pipeline for rabies virus (RABV) genomes. In summary, it performs the following steps:
- It QCs the files using FastQC and MultiQC to generate a quality report.
- It trims the left and right ends of the reads by 25 base pairs and filters out reads shorter than 50 bp. These values can be customized.
- It assembles the reads using reference-based assembly with minimap2, gap fixing, and medaka.
It uses a reference file composed of 14 different RABV sequences for the reference-based assembly.
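The trimming and filtering step can be illustrated with a minimal Python sketch. It is not RefNAAP's own code: the function name `trim_and_filter` is hypothetical, and applying the length filter after trimming is an assumption made for this illustration. Only the default values (25, 25, and 50) come from the description above.

```python
from typing import Iterable, Iterator, Tuple

# A read is modelled as (name, sequence, quality string).
Read = Tuple[str, str, str]


def trim_and_filter(
    reads: Iterable[Read],
    trim_left: int = 25,
    trim_right: int = 25,
    min_length: int = 50,
) -> Iterator[Read]:
    """Trim both ends of each read, then drop reads that are too short.

    Assumption: the length filter is applied to the trimmed read.
    """
    for name, seq, qual in reads:
        # Clamp the end index so very short reads trim to an empty string
        # instead of producing a wrong slice from a negative index.
        end = max(len(seq) - trim_right, trim_left)
        trimmed_seq = seq[trim_left:end]
        trimmed_qual = qual[trim_left:end]
        if len(trimmed_seq) >= min_length:
            yield name, trimmed_seq, trimmed_qual


# Two toy reads: 120 bases (kept, 70 bases remain) and 90 bases (dropped, 40 remain).
example_reads = [
    ("read_a", "A" * 120, "I" * 120),
    ("read_b", "C" * 90, "I" * 90),
]
kept = list(trim_and_filter(example_reads))
```

With the defaults, only `read_a` survives in this toy example, because `read_b` is 40 bases long after trimming.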
RefNAAP-wdl is available on Terra.bio, a cloud-native platform for researchers to access data, run analysis tools, and collaborate. With Terra.bio, you can easily process your data without prior knowledge of the command line.
The following steps assume you have already set up an account on Terra.bio and created a workspace to work with RefNAAP-wdl.
To begin using RefNAAP-wdl on Terra.bio, you will need to import the workflow from Dockstore, which is available at: RefNAAP-wdl Dockstore Import.
Figure 1: RefNAAP-wdl on Dockstore.
Once you are on the Dockstore page for RefNAAP-wdl, locate the Launch with section on the right side of the page and click on Terra.
Figure 2: Launching a workflow with Terra.bio on Dockstore.
After clicking the Terra button, you will be taken to Terra.bio, where you choose the Destination Workspace. Select the workspace you would like to import this workflow into, then click the Import button.
Figure 3: Importing workflow interface on Terra.bio.
RefNAAP-wdl should now be available in Terra.bio on the WORKFLOWS tab. Clicking on RefNAAP-wdl loads the workflow interface. In the workflow configuration section, select Run workflow(s) with inputs defined by data table. RefNAAP-wdl is a sample-level workflow.
Figure 4: RefNAAP-wdl on Terra.bio.
Several inputs are available for workflow customization: required inputs that are necessary for execution, and optional inputs that have default values but can be overwritten by the user.
Note: To provide inputs from the data table, Terra uses the `this.{column_name}` notation. For example, to pass the ONT reads that are in the `ont_reads` column of the data table to the `read1` input, the value should be passed as `this.ont_reads`.
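As a rough illustration of how a data table column maps to a workflow input, the Python sketch below writes a sample table in the tab-separated layout Terra accepts for upload (first column header `entity:sample_id`) and mimics how a `this.{column_name}` expression picks a value from one row. The helper names `to_terra_tsv` and `resolve`, the sample IDs, and the `gs://` paths are all hypothetical; the lookup is a simplified stand-in for what Terra does internally, not its actual implementation.

```python
import csv
import io
from typing import Dict, List


def to_terra_tsv(samples: Dict[str, Dict[str, str]], columns: List[str]) -> str:
    """Render samples as TSV text with an 'entity:sample_id' first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity:sample_id", *columns])
    for sample_id, attributes in samples.items():
        writer.writerow([sample_id, *(attributes[c] for c in columns)])
    return buffer.getvalue()


def resolve(expression: str, row: Dict[str, str]) -> str:
    """Look up a 'this.<column>' expression in a single data table row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    return row[expression[len(prefix):]]


# Hypothetical samples; replace the paths with your own bucket locations.
samples = {
    "sample01": {"ont_reads": "gs://example-bucket/reads/sample01.fastq.gz"},
    "sample02": {"ont_reads": "gs://example-bucket/reads/sample02.fastq.gz"},
}

tsv_text = to_terra_tsv(samples, ["ont_reads"])
read1_value = resolve("this.ont_reads", samples["sample01"])
```

Here `read1_value` is the path stored in the `ont_reads` column for `sample01`, which is the value the `read1` input would receive for that row.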
Figure 5: RefNAAP-wdl inputs.
Table 1: Input description for RefNAAP-wdl
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
refnaap_wf | read1 | File | Base-called ONT read file in FASTQ file format (compressed) | | Required |
refnaap_wf | samplename | String | Name of sample to be analyzed | | Required |
refnaap | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
refnaap | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
refnaap | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/internal/refnaap:b3ad097" | Optional |
refnaap | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
refnaap | min_coverage | Int | Amplicon regions need a minimum of this average coverage number | 5 | Optional |
refnaap | model | String | Basecall model | "r10_min_high_g303" | Optional |
refnaap | size | Int | Minimum read length (in bp); shorter reads are filtered out | 50 | Optional |
refnaap | trim_left | Int | Bases to trim from left side of read | 25 | Optional |
refnaap | trim_right | Int | Bases to trim from right side of read | 25 | Optional |
Note: Available basecall models:
r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r10_min_high_g303, r10_min_high_g340, r941_min_fast_g303, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_prom_fast_g303, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360
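When overriding optional inputs, it can help to check the values before launching a run. The sketch below is one possible way to do that in Python: it copies the defaults from Table 1 and the model names from the note above, rejects unknown input names or models, and returns only the values that differ from a bare run. The function `build_inputs` is hypothetical, and the key layout (`refnaap_wf.read1`, `refnaap_wf.refnaap.<variable>`) is an assumption based on the task names in Table 1 and the usual WDL inputs naming, not something confirmed by the workflow itself.

```python
from typing import Any, Dict

# Defaults copied from Table 1.
DEFAULTS: Dict[str, Any] = {
    "cpu": 8,
    "disk_size": 100,
    "docker": "us-docker.pkg.dev/general-theiagen/internal/refnaap:b3ad097",
    "memory": 16,
    "min_coverage": 5,
    "model": "r10_min_high_g303",
    "size": 50,
    "trim_left": 25,
    "trim_right": 25,
}

# Basecall models copied from the note above.
MODELS = set(
    "r103_min_high_g345 r103_min_high_g360 r103_prom_high_g360 "
    "r103_prom_snp_g3210 r103_prom_variant_g3210 r10_min_high_g303 "
    "r10_min_high_g340 r941_min_fast_g303 r941_min_high_g303 "
    "r941_min_high_g330 r941_min_high_g340_rle r941_min_high_g344 "
    "r941_min_high_g351 r941_min_high_g360 r941_prom_fast_g303 "
    "r941_prom_high_g303 r941_prom_high_g330 r941_prom_high_g344 "
    "r941_prom_high_g360 r941_prom_high_g4011 r941_prom_snp_g303 "
    "r941_prom_snp_g322 r941_prom_snp_g360 r941_prom_variant_g303 "
    "r941_prom_variant_g322 r941_prom_variant_g360".split()
)


def build_inputs(read1: str, samplename: str, **overrides: Any) -> Dict[str, Any]:
    """Validate optional overrides and return an inputs dictionary.

    Assumption: inputs are keyed as 'refnaap_wf.<name>' for workflow-level
    inputs and 'refnaap_wf.refnaap.<name>' for task-level inputs.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown optional inputs: {sorted(unknown)}")
    model = overrides.get("model", DEFAULTS["model"])
    if model not in MODELS:
        raise ValueError(f"unsupported basecall model: {model}")
    inputs: Dict[str, Any] = {
        "refnaap_wf.read1": read1,
        "refnaap_wf.samplename": samplename,
    }
    # Only explicit overrides are emitted; everything else keeps its default.
    for name, value in overrides.items():
        inputs[f"refnaap_wf.refnaap.{name}"] = value
    return inputs


example_inputs = build_inputs(
    "this.ont_reads",
    "this.sample_id",  # assumption: the sample name is taken from the table's ID column
    model="r941_min_high_g360",
    trim_left=30,
)
```

Emitting only the overridden values keeps the configuration short and leaves the remaining inputs at the defaults listed in Table 1.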
RefNAAP-wdl produces four outputs that are populated back to the data table.
Figure 6: RefNAAP-wdl outputs.
Table 2: Output description for RefNAAP-wdl
Variable | Type | Description |
---|---|---|
refnaap_analysis_date | String | Date of analysis with RefNAAP. |
refnaap_assembly_fasta | File | Consensus assembly generated by RefNAAP in FASTA format. |
refnaap_docker | String | Docker image used for the analysis. |
refnaap_multiqc_report | File | MultiQC report generated by RefNAAP in HTML format. |
If you have any questions or concerns, please raise a GitHub issue or email Theiagen's general support at [email protected].