TBProfiler_tNGS_PHB: Introduction of tNGS workflow for TB #272
Merged
* updated VCF output file renaming in kSNP3 task (#207)
* updated VCF output file renaming in kSNP3 task; also added 1 new File output and change the output names to be more descriptive
* ksnp3 task:changed VCF file names to be predictable; split 2 ksnp3 options to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling
* added new string output to ksnp3 workflow "ksnp3_vcf_ref_samplename"
* reduce unnecessary logging in MIDAS task (#210)
* made untar/decompression of midas database quiet since it produces 41k lines of output. also made the 2 mv commands verbose (but it's only 2 lines!)
* update CI
* expose tbprofiler parameters as inputs in merlin
* input spelling
---------
Co-authored-by: Curtis Kapsak <[email protected]>
sage-wright changed the title from "[TBProfiler_tNGS_PHB] Introduction of tNGS workflow for TB" to "TBProfiler_tNGS_PHB: Introduction of tNGS workflow for TB" on Dec 15, 2023
…/public_health_bioinformatics into smw-tngs-tbprofiler-dev
cimendes approved these changes on Apr 15, 2024
This PR closes #276 by introducing the TBProfiler_tNGS_PHB workflow, designed for Illumina PE tNGS data.
🗑️ This dev branch should NOT be deleted after merging to main.
🧠 Aim, Context and Functionality
tNGS (targeted next-generation sequencing) is being used to analyze Mycobacterium tuberculosis data for clinical use. Targeted sequencing requires a different analysis approach from WGS, so the TheiaProk workflows cannot be used: they are designed to produce an assembled genome. Because tNGS data is fragmented and amplicon-based, assembling it is not appropriate.
TBProfiler_tNGS_PHB is our solution: a workflow that performs minimal QC and runs TBProfiler and tbp-parser by default.
The minimal QC performed is as follows:
trimmomatic is run using the workflow parameter bases_to_crop (default=30), which removes 30 bp from the start of each read and all bases that fall after an (average_read_length - 30 bp) limit, in the hope of removing primers and other sequencing artifacts.
clockwork is currently not implemented due to difficult-to-resolve issues encountered while implementing the tool.
🛠️ Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : No
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
📋 Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed:
tbp-parser has been updated to v1.3.0 which includes tNGS compatibility via the inclusion of the tNGS primer region bed file.
Databases or database versions changed:
No database changes.
Data processing/commands changed:
A new input parameter, trimmomatic_base_crop, is added to the trimmomatic_pe task. This Integer variable, if provided, triggers calculation of the average read length and creation of two new parameters for the trimmomatic task: HEADCROP and CROP.
HEADCROP:<int> indicates the number of bases to remove from the START of the read.
CROP:<int> indicates the FINAL LENGTH of the read that will be kept from the start of the read; any bases after this length will be removed.
The average read length is used to determine the CROP value dynamically: CROP is the average read length minus the trimmomatic_base_crop value. HEADCROP is set equal to trimmomatic_base_crop.
No other analysis changes have been made to TBProfiler and tbp-parser (other than the updated tbp-parser version; a description is available in the tbp-parser repository).
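The HEADCROP and CROP calculation described above can be sketched in Python as follows. This is a minimal illustration, not the code in the trimmomatic_pe task: the function names, the rounding down of the average read length, and the order of the two steps are assumptions. Trimmomatic applies steps in the order they are given, so the sketch places CROP before HEADCROP so that both ends of the read are trimmed.

```python
from typing import Iterable, List


def average_read_length(fastq_lines: Iterable[str]) -> int:
    """Mean length (rounded down) of the sequence lines in four-line FASTQ records."""
    lengths = [len(line.strip()) for i, line in enumerate(fastq_lines) if i % 4 == 1]
    if not lengths:
        raise ValueError("no reads found in FASTQ input")
    return sum(lengths) // len(lengths)


def crop_steps(avg_read_length: int, base_crop: int) -> List[str]:
    """Build the CROP and HEADCROP step strings from the average read length."""
    crop = avg_read_length - base_crop  # final length kept from the read start
    if crop <= base_crop:
        raise ValueError("base_crop is too large for the average read length")
    # Assumed order: CROP first, then HEADCROP, so both read ends are trimmed.
    return [f"CROP:{crop}", f"HEADCROP:{base_crop}"]


# Hypothetical two-read FASTQ with 150 bp reads.
fastq = ["@r1", "A" * 150, "+", "I" * 150, "@r2", "C" * 150, "+", "I" * 150]
steps = crop_steps(average_read_length(fastq), base_crop=30)
print(" ".join(steps))  # prints "CROP:120 HEADCROP:30"
```

With 150 bp reads and the default value of 30, a read is first cut to 120 bases and then loses its first 30, leaving the 90 bases in the middle.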
File processing changed:
No file processing changes.
Compute resources changed:
No compute resources changes.
➡️ Inputs
All inputs are new because this is a new workflow.
New required inputs:
File read1
File read2
String samplename
New optional inputs for tbp_parser task:
Int coverage_threshold
Int cpu
Int disk_size
String docker
Int memory
Int min_depth
String operator
String sequencing_method
Boolean tbp_parser_debug
New optional inputs for tbprofiler task:
Int cov_frac_threshold
Int cpu
Int disk_size
String mapper
Float min_af
Float min_af_pred
Int min_depth
Boolean ont_data
File tbprofiler_custom_db
String tbprofiler_docker_image
Boolean tbprofiler_run_custom_db
String variant_caller
String variant_calling_params
New optional inputs for tbprofiler_tngs workflow:
Int bases_to_crop
New optional inputs for trimmomatic_pe task:
Int disk_size
String docker
Int threads
String trimmomatic_args
Int trimmomatic_minlen
Int trimmomatic_quality_trim_score
Int trimmomatic_window_size
New optional inputs for version_capture task:
String docker
String timezone
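As an illustration of how the required inputs and the workflow-level bases_to_crop option might be supplied, the sketch below builds an inputs JSON in the `workflow_name.input_name` form that WDL runners such as Cromwell and miniwdl accept. The file names and sample name are hypothetical, and the workflow identifier tbprofiler_tngs is assumed from the input list above rather than confirmed against the WDL file.

```python
import json

# Assumed workflow identifier, taken from the input list above.
workflow = "tbprofiler_tngs"

inputs = {
    f"{workflow}.read1": "sample01_R1.fastq.gz",  # hypothetical file name
    f"{workflow}.read2": "sample01_R2.fastq.gz",  # hypothetical file name
    f"{workflow}.samplename": "sample01",         # hypothetical sample name
    f"{workflow}.bases_to_crop": 30,              # optional; 30 is the stated default
}

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```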
⬅️ Outputs
All outputs are new because this is a new workflow.
New outputs (in alphabetical order):
tbp_parser_average_genome_depth
tbp_parser_coverage_report
tbp_parser_docker
tbp_parser_genome_percent_coverage
tbp_parser_laboratorian_report_csv
tbp_parser_lims_report_csv
tbp_parser_looker_report_csv
tbp_parser_version
tbprofiler_dr_type
tbprofiler_main_lineage
tbprofiler_median_coverage
tbprofiler_num_dr_variants
tbprofiler_num_other_variants
tbprofiler_output_alignment_bai
tbprofiler_output_alignment_bam
tbprofiler_pct_reads_mapped
tbprofiler_report_csv
tbprofiler_report_json
tbprofiler_report_tsv
tbprofiler_resistance_genes
tbprofiler_sub_lineage
tbprofiler_tngs_wf_analysis_date
tbprofiler_tngs_wf_version
tbprofiler_version
trimmomatic_docker
trimmomatic_read1_trimmed
trimmomatic_read2_trimmed
trimmomatic_stats
trimmomatic_version
🧪 Testing
Test Dataset
Command-line Testing with MiniWDL or Cromwell (optional)
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
🔬 Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)