[New Workflow] Flye_denovo to replace DragonFlye #692

Open
wants to merge 59 commits into
base: main
Commits
38b1caa
placeholder
sage-wright Oct 4, 2024
cdf0913
make flye task
sage-wright Oct 9, 2024
46629d8
rename fasta
sage-wright Oct 10, 2024
b45b41d
make workflow a workflow
sage-wright Oct 11, 2024
62b48e2
update output files for flye
fraser-combe Nov 19, 2024
787121c
v1 bandage plot flye assembly visual
fraser-combe Nov 19, 2024
e9f29b4
medaka initial commit
fraser-combe Nov 20, 2024
df7f79f
initial commit racon framework
fraser-combe Nov 20, 2024
5a3fcd6
framework for tasks dnaapler porechop and racon
fraser-combe Nov 21, 2024
d6fe1bc
update docker images
fraser-combe Nov 22, 2024
89f7d3b
update outdir medaka
fraser-combe Nov 22, 2024
35982db
update medaka and dnaapler
fraser-combe Nov 25, 2024
d738ab1
add polypolish and separate bwa mem -a tasks
sage-wright Nov 25, 2024
85a68dc
remove comment cruft
sage-wright Nov 25, 2024
c7db697
initial commit bash contig filtering
fraser-combe Nov 25, 2024
0371972
initial commit bash contig filtering
fraser-combe Nov 25, 2024
facd02f
update medaka docker image
fraser-combe Nov 27, 2024
d26fc95
refactor assembly tasks and workflows for clarity and consistency
fraser-combe Nov 27, 2024
d45481e
add dnaapler to wf
fraser-combe Nov 27, 2024
eef27a9
update racon
fraser-combe Nov 27, 2024
0fee522
add polisher options to flye_consensus wf
fraser-combe Nov 27, 2024
17289d1
update workflow and tasks and altered racon with minimap in docker to…
fraser-combe Nov 29, 2024
646b116
update dnaapler and tidy flye consensus wf
fraser-combe Dec 2, 2024
8e56dc0
update filter contigs task, initial attempt working
fraser-combe Dec 3, 2024
f37bf3b
updated flye consensus wf and filter contigs
fraser-combe Dec 3, 2024
2865a2d
update docker images porechop and dnaapler
fraser-combe Dec 9, 2024
f9cbde4
optional trim and polish tasks, update porechop and dnaapler mode
fraser-combe Dec 9, 2024
4b97b30
incorporate hybrid assemblies with polypolish
fraser-combe Dec 9, 2024
1993791
update meta wf description
fraser-combe Dec 9, 2024
fa10d0c
start updating docs, remove run polypolish logic and update polypoli…
fraser-combe Dec 9, 2024
63c7e2e
update racon with polishing round logic with updated minimap2 in dock…
fraser-combe Dec 9, 2024
c92358b
additional comments for filtering contigs logic
fraser-combe Dec 10, 2024
420ed91
add assembly stats output
fraser-combe Dec 10, 2024
74d62a9
update filter task metrics output
fraser-combe Dec 10, 2024
ac3667f
per dev meeting update wf name, remove metrics output, re arrange fol…
fraser-combe Dec 10, 2024
c6064cb
update wf to pass miniwdl checks
fraser-combe Dec 10, 2024
6878957
update medaka to use auto model selection or user-provided override
fraser-combe Dec 10, 2024
1d2de80
all local tests successful for each path now to add to theiaprok
fraser-combe Dec 11, 2024
afeceb1
update flye call
fraser-combe Dec 11, 2024
cc5f966
add all tasks inputs to subworkflow for terra, increase CPU allocatio…
fraser-combe Dec 12, 2024
475e543
rename some input specific variables for terra users to know which ta…
fraser-combe Dec 12, 2024
e86e193
debugging racon terra
fraser-combe Dec 12, 2024
97df2ec
debugging potential memory usage increase for racon
fraser-combe Dec 12, 2024
6309913
more debugging for racon failing terra only
fraser-combe Dec 12, 2024
fac2a23
trying new dockerfile with updated cmake command for cpu compatibility
fraser-combe Dec 12, 2024
bc8fafa
trying test dockerfile with updated cpu installs
fraser-combe Dec 13, 2024
a379f6b
trying new racon build with flags for cpu optimization on terra
fraser-combe Dec 13, 2024
3938c18
Increase maxRetries for dnaapler and contig_filter tasks; add bandage…
fraser-combe Dec 13, 2024
01ebb7b
docs update theiaprok
fraser-combe Dec 13, 2024
84df24f
Refactor workflows to standardize assembly output variable names; inc…
fraser-combe Dec 13, 2024
2ec4e8f
add versions output to theiaprok
fraser-combe Dec 16, 2024
9ffc554
update theiaprok wf
fraser-combe Dec 16, 2024
b088c74
versions output
fraser-combe Dec 16, 2024
abad81b
medaka model output
fraser-combe Dec 16, 2024
beaec7a
update medaka model docs information for users
fraser-combe Dec 16, 2024
fcd39b9
update medaka model selection order
fraser-combe Dec 16, 2024
47155c0
Merge branch 'main' into smw-flye-dev
fraser-combe Dec 17, 2024
78d3a5f
update md sums for theiaprok after merge main
fraser-combe Dec 17, 2024
123f7ee
remove versioning task from flye sub wf
fraser-combe Dec 20, 2024
194 changes: 167 additions & 27 deletions docs/workflows/genomic_characterization/theiaprok.md

Large diffs are not rendered by default.

45 changes: 44 additions & 1 deletion tasks/alignment/task_bwa.wdl
@@ -153,4 +153,47 @@ task bwa {
    preemptible: 0
    maxRetries: 3
  }
}

task bwa_all {
  input {
    File draft_assembly_fasta
    File read1
    File read2
    String samplename

    Int cpu = 6
    Int disk_size = 100
    String docker = "us-docker.pkg.dev/general-theiagen/staphb/bwa:0.7.18"
    Int memory = 16
  }
  command <<<
    # capture the version first: bwa exits non-zero when run without arguments,
    # so strict mode is only enabled after this step
    bwa &> BWA_HELP
    grep "Version" BWA_HELP | cut -d" " -f2 > BWA_VERSION

    # fail the task if indexing or alignment fails instead of emitting a truncated SAM
    set -euo pipefail

    # index the draft assembly unless an index already sits next to it
    if [[ ! -f "~{draft_assembly_fasta}.bwt" ]]; then
      echo "Indexing reference genome: ~{draft_assembly_fasta}"
      bwa index ~{draft_assembly_fasta}
    else
      echo "Reference genome is already indexed: ~{draft_assembly_fasta}"
    fi

    # align each read file separately and report all alignments (-a)
    bwa mem -t ~{cpu} -a ~{draft_assembly_fasta} ~{read1} > ~{samplename}_R1.sam
    bwa mem -t ~{cpu} -a ~{draft_assembly_fasta} ~{read2} > ~{samplename}_R2.sam
  >>>
  output {
    File read1_sam = "~{samplename}_R1.sam"
    File read2_sam = "~{samplename}_R2.sam"
    String bwa_version = read_string("BWA_VERSION")
  }
  runtime {
    docker: "~{docker}"
    memory: "~{memory} GB"
    cpu: cpu
    disks: "local-disk " + disk_size + " SSD"
    disk: disk_size + " GB"
    maxRetries: 3
    preemptible: 0
  }
}
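Review note on the version capture in `bwa_all`: the sketch below isolates the `grep | cut` pipeline so it can be checked without a bwa install. The helper name `parse_bwa_version` and the sample banner text are assumptions made for illustration; only the pipeline itself comes from the task.

```shell
# Hypothetical helper, not part of the PR: the same grep | cut pipeline as the task,
# reading the banner from stdin instead of the BWA_HELP file.
parse_bwa_version() {
  grep "Version" | cut -d" " -f2
}

# The sample banner is an assumption modelled on the "Version: <x>" line bwa prints.
printf 'Program: bwa (alignment via Burrows-Wheeler transformation)\nVersion: 0.7.18-r1243-dirty\n' \
  | parse_bwa_version
```

Because `cut` splits on single spaces, the pipeline relies on the banner keeping exactly one space after `Version:`.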
30 changes: 30 additions & 0 deletions tasks/assembly/task_bandageplot.wdl
@@ -0,0 +1,30 @@
version 1.0

task bandage_plot {
  input {
    File assembly_graph_gfa
    String samplename
    Int cpu = 2
    Int memory = 4
    Int disk_size = 10
    String docker = "us-docker.pkg.dev/general-theiagen/staphb/bandage:0.8.1"
  }
  command <<<
    set -euo pipefail
    Bandage --version | tee VERSION
    Bandage image ~{assembly_graph_gfa} ~{samplename}_bandage_plot.png
  >>>
  output {
    File plot = "~{samplename}_bandage_plot.png"
    String bandage_version = read_string("VERSION")
  }
  runtime {
    docker: "~{docker}"
    cpu: cpu
    memory: "~{memory} GB"
    disks: "local-disk " + disk_size + " HDD"
    disk: disk_size + " GB"
    maxRetries: 1
    preemptible: 0
  }
}
72 changes: 72 additions & 0 deletions tasks/assembly/task_dnaapler.wdl
@@ -0,0 +1,72 @@
version 1.0

task dnaapler_all {
  input {
    File input_fasta
    String samplename
    String dnaapler_mode = "all" # The mode of reorientation to execute (default: 'all')
    Int cpu = 4
    Int disk_size = 100
    Int memory = 16
    String docker = "us-docker.pkg.dev/general-theiagen/staphb/dnaapler:1.0.1"
  }
  command <<<
    set -euo pipefail

    # Check that the input contains at least one FASTA header line
    echo "Validating input FASTA file..."
    if ! grep -q "^>" ~{input_fasta}; then
      echo "ERROR: Input file ~{input_fasta} is not in FASTA format." >&2
      exit 1
    else
      echo "Input FASTA file is valid."
    fi

    # dnaapler version
    dnaapler --version | tee VERSION

    # Create a subdirectory for dnaapler outputs
    output_dir="dnaapler_output"
    mkdir -p "$output_dir"
    echo "Output directory created: $output_dir"

    # Run dnaapler with the selected subcommand (default: 'all')
    echo "Running dnaapler..."
    dnaapler ~{dnaapler_mode} \
      -i ~{input_fasta} \
      -o "$output_dir" \
      -p ~{samplename} \
      -t ~{cpu} \
      -f || {
        echo "ERROR: dnaapler command failed. Check logs for details." >&2
        exit 1
      }

    echo "dnaapler command completed successfully."

    # Check if output FASTA file exists
    if [[ ! -f "$output_dir"/~{samplename}_reoriented.fasta ]]; then
      echo "ERROR: Expected output file not found: $output_dir/~{samplename}_reoriented.fasta" >&2
      exit 1
    fi

    # Move the final reoriented FASTA file to the task's working directory
    echo "Moving output FASTA file to working directory..."
    mv "$output_dir"/~{samplename}_reoriented.fasta .

    echo "dnaapler task completed successfully for sample: ~{samplename}"
  >>>
  output {
    File reoriented_fasta = "~{samplename}_reoriented.fasta"
    String dnaapler_version = read_string("VERSION")
  }
  runtime {
    docker: "~{docker}"
    cpu: cpu
    memory: "~{memory} GB"
    disks: "local-disk " + disk_size + " SSD"
    disk: disk_size + " GB"
    maxRetries: 3
    preemptible: 0
  }
}
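Review note on the input validation in `dnaapler_all`: the check only asks whether any line starts with `>`, which the sketch below exercises on two throwaway files. The helper name `looks_like_fasta` and the file contents are hypothetical; only the `grep -q "^>"` test is taken from the task.

```shell
# Hypothetical helper, not part of the PR: succeeds if at least one line starts with ">".
looks_like_fasta() {
  grep -q "^>" "$1"
}

# Throwaway inputs for illustration only.
tmpdir="$(mktemp -d)"
printf '>contig_1\nACGT\n' > "$tmpdir/good.fasta"
printf 'ACGT\nTTGA\n' > "$tmpdir/bad.txt"

for f in "$tmpdir/good.fasta" "$tmpdir/bad.txt"; do
  if looks_like_fasta "$f"; then
    echo "$(basename "$f"): header found"
  else
    echo "$(basename "$f"): no FASTA header"
  fi
done
```

This is a shallow check: it does not validate sequence characters, and a gzip-compressed FASTA would likely be rejected because the header is not readable as plain text.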
96 changes: 0 additions & 96 deletions tasks/assembly/task_dragonflye.wdl

This file was deleted.

90 changes: 90 additions & 0 deletions tasks/assembly/task_flye.wdl
@@ -0,0 +1,90 @@
version 1.0

task flye {
  input {
    File read1
    String samplename

    # data type options; by default, uses --nano-raw
    Boolean ont_corrected = false
    Boolean ont_high_quality = false
    Boolean pacbio_raw = false
    Boolean pacbio_corrected = false
    Boolean pacbio_hifi = false

    Int? genome_length # genome size estimate; only passed to Flye when `asm_coverage` is set
    Int? asm_coverage # reduced coverage for initial disjointig assembly; requires `genome_length`

    Int flye_polishing_iterations = 1
    Int? minimum_overlap

    Float? read_error_rate
    Boolean uneven_coverage_mode = false
    Boolean keep_haplotypes = false
    Boolean no_alt_contigs = false
    Boolean scaffold = false

    String? additional_parameters # Any extra Flye-specific parameters

    Int cpu = 4
    Int disk_size = 100
    String docker = "us-docker.pkg.dev/general-theiagen/staphb/flye:2.9.4"
    Int memory = 32
  }
  command <<<
    set -euo pipefail
    flye --version | tee VERSION

    # determine read type; if several flags are true, the first one in this chain wins
    if ~{ont_corrected}; then
      READ_TYPE="--nano-corr"
    elif ~{ont_high_quality}; then
      READ_TYPE="--nano-hq"
    elif ~{pacbio_raw}; then
      READ_TYPE="--pacbio-raw"
    elif ~{pacbio_corrected}; then
      READ_TYPE="--pacbio-corr"
    elif ~{pacbio_hifi}; then
      READ_TYPE="--pacbio-hifi"
    else
      READ_TYPE="--nano-raw"
    fi

    # --asm-coverage requires a genome size estimate, so --genome-size is only passed alongside it
    flye \
      ${READ_TYPE} ~{read1} \
      --iterations ~{flye_polishing_iterations} \
      ~{"--min-overlap " + minimum_overlap} \
      ~{if defined(asm_coverage) then "--genome-size " + genome_length else ""} \
      ~{"--asm-coverage " + asm_coverage} \
      ~{"--read-error " + read_error_rate} \
      ~{true="--meta" false="" uneven_coverage_mode} \
      ~{true="--keep-haplotypes" false="" keep_haplotypes} \
      ~{true="--no-alt-contigs" false="" no_alt_contigs} \
      ~{true="--scaffold" false="" scaffold} \
      ~{"--extra-params " + additional_parameters} \
      --threads ~{cpu} \
      --out-dir .

    # prefix Flye's fixed output names with the sample name
    mv assembly.fasta ~{samplename}.assembly.fasta
    mv assembly_info.txt ~{samplename}.assembly_info.txt
    mv assembly_graph.gfa ~{samplename}.assembly_graph.gfa
  >>>
  output {
    File assembly_fasta = "~{samplename}.assembly.fasta"
    File assembly_graph_gfa = "~{samplename}.assembly_graph.gfa"
    File assembly_info = "~{samplename}.assembly_info.txt"
    String flye_version = read_string("VERSION")
    String flye_docker = "~{docker}"
  }
  runtime {
    docker: "~{docker}"
    cpu: cpu
    memory: "~{memory} GB"
    disks: "local-disk " + disk_size + " SSD"
    disk: disk_size + " GB"
    maxRetries: 3
    preemptible: 0
  }
}
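Review note on read-type selection in the `flye` task: the five Boolean inputs are resolved by an if/elif chain, so when more than one is true the earliest flag in the chain is used. The sketch below reproduces that precedence as a standalone function. The name `select_read_type` and its positional-argument interface are hypothetical; the flag strings and their order are copied from the task.

```shell
# Hypothetical helper, not part of the PR: mirrors the if/elif chain in the flye task.
# Arguments, in order: ont_corrected ont_high_quality pacbio_raw pacbio_corrected pacbio_hifi
select_read_type() {
  if [ "$1" = "true" ]; then
    echo "--nano-corr"
  elif [ "$2" = "true" ]; then
    echo "--nano-hq"
  elif [ "$3" = "true" ]; then
    echo "--pacbio-raw"
  elif [ "$4" = "true" ]; then
    echo "--pacbio-corr"
  elif [ "$5" = "true" ]; then
    echo "--pacbio-hifi"
  else
    echo "--nano-raw"
  fi
}

# No flag set: falls through to the default.
select_read_type false false false false false

# Two flags set at once: the earlier one in the chain (--nano-hq) wins.
select_read_type false true false false true
```

If silently preferring one flag is not the intended behavior, one option would be to fail when more than one of the Booleans is true, but that is a design choice for the authors.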