diff --git a/docs/assets/figures/TheiaEuk_Illumina_PHB_20241106.png b/docs/assets/figures/TheiaEuk_Illumina_PHB_20241106.png new file mode 100644 index 000000000..241b7bb8b Binary files /dev/null and b/docs/assets/figures/TheiaEuk_Illumina_PHB_20241106.png differ diff --git a/docs/assets/new_workflow_template.md b/docs/assets/new_workflow_template.md index 9e7ef6799..41c2b1895 100644 --- a/docs/assets/new_workflow_template.md +++ b/docs/assets/new_workflow_template.md @@ -4,7 +4,7 @@ | **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** | |---|---|---|---|---| -| [Workflow Type](../../workflows_overview/workflows_type.md/#link-to-workflow-type) | [Applicable Kingdom](../../workflows_overview/workflows_kingdom.md/#link-to-applicable-kingdom) | PHB | | | +| [Link to Workflow Type](../../workflows_overview/workflows_type.md/#link-to-workflow-type) | [Link to Applicable Kingdom](../../workflows_overview/workflows_kingdom.md/#link-to-applicable-kingdom) | PHB | | | ## Workflow_Name_On_Terra @@ -12,6 +12,8 @@ Description of the workflow. ### Inputs +Inputs should be ordered as they appear on Terra + | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| | task_name | **variable_name** | Type | Description | Default Value | Required/Optional | @@ -24,12 +26,12 @@ Description of the workflow tasks Description of the task !!! 
techdetails "Tool Name Technical Details" - | | Links | - | --- | --- | + | | Links | + | --- | --- | | Task | [link to task on GitHub] | | Software Source Code | [link to tool's source code] | | Software Documentation | [link to tool's documentation] | - | Original Publication | [link to tool's publication] | + | Original Publication(s) | [link to tool's publication] | ### Outputs diff --git a/docs/contributing/doc_contribution.md b/docs/contributing/doc_contribution.md index 7f20e5491..940468961 100644 --- a/docs/contributing/doc_contribution.md +++ b/docs/contributing/doc_contribution.md @@ -14,7 +14,7 @@ To test your documentation changes, you will need to have the following packages pip install mkdocs-material mkdocs-material-extensions mkdocs-git-revision-date-localized-plugin mike mkdocs-glightbox ``` -The live preview server can be activated by running the following command: +Once installed, navigate to the top directory in PHB. The live preview server can be activated by running the following command: ```bash mkdocs serve @@ -34,49 +34,7 @@ Here are some VSCode Extensions can help you write and edit your markdown files - [Excel to Markdown Table](https://tableconvert.com/excel-to-markdown) - This website will convert an Excel table into markdown format, which can be copied and pasted into your markdown file. - [Material for MkDocs Reference](https://squidfunk.github.io/mkdocs-material/reference/) - This is the official reference for the Material for MkDocs theme, which will help you understand how to use the theme's features. -- [Broken Link Check](https://www.brokenlinkcheck.com/) - This website will scan your website to ensure that all links are working correctly. This will only work on the deployed version of the documentation, not the local version. - -## Documentation Structure - -A brief description of the documentation structure is as follows: - -- `docs/` - Contains the Markdown files for the documentation. 
- - `assets/` - Contains images and other files used in the documentation. - - `figures/` - Contains images, figures, and workflow diagrams used in the documentation. For workflows that contain many images (such as BaseSpace_Fetch), it is recommended to create a subdirectory for the workflow. - - `files/` - Contains files that are used in the documentation. This may include example outputs or templates. For workflows that contain many files (such as TheiaValidate), it is recommended to create a subdirectory for the workflow. - - `logos/` - Contains Theiagen logos and symbols used int he documentation. - - `metadata_formatters/` - Contains the most up-to-date metadata formatters for our submission workflows. - - `new_workflow_template.md` - A template for adding a new workflow page to the documentation. - - `contributing/` - Contains the Markdown files for our contribution guides, such as this file - - `javascripts/` - Contains JavaScript files used in the documentation. - - `tablesort.js` - A JavaScript file used to enable table sorting in the documentation. - - `overrides/` - Contains HTMLs used to override theme defaults - - `main.html` - Contains the HTML used to display a warning when the latest version is not selected - - `stylesheets/` - Contains CSS files used in the documentation. - - `extra.css` - A custom CSS file used to style the documentation; contains all custom theme elements (scrollable tables, resizable columns, Theiagen colors), and custom admonitions. - - `workflows/` - Contains the Markdown files for each workflow, organized into subdirectories by workflow category - - `workflows_overview/` - Contains the Markdown files for the overview tables for each display type: alphabetically, by applicable kingdom, and by workflow type. - - `index.md` - The home/landing page for our documentation. 
- -### Adding a Page for a New Workflow {#new-page} - -If you are adding a new workflow, there are a number of things to do in order to include the page in the documentation: - -1. Add a page with the title of the workflow to appropriate subdirectory in `docs/workflows/`. Feel free to use the template found in the `assets/` folder. -2. Collect the following information for your new workflow: - - Workflow Name - Link the name with a relative path to the workflow page in appropriate `docs/workflows/` subdirectory - - Workflow Description - Brief description of the workflow - - Applicable Kingdom - Options: "Any taxa", "Bacteria", "Mycotics", "Viral" - - Workflow Level (_on Terra_) - Options: "Sample-level", "Set-level", or neither - - Command-line compatibility - Options: "Yes", "No", and/or "Some optional features incompatible" - - The version where the last known changes occurred (likely the upcoming version if it is a new workflow) - - Link to the workflow on Dockstore (if applicable) - Workflow name linked to the information tab on Dockstore. -3. Format this information in a table. -4. Copy the previously gathered information to ==**ALL THREE**== overview tables in `docs/workflows_overview/`: - - `workflows_alphabetically.md` - Add the workflow in the appropriate spot based on the workflow name. - - `workflows_kingdom.md` - Add the workflow in the appropriate spot(s) based on the kingdom(s) the workflow is applicable to. Make sure it is added alphabetically within the appropriate subsection(s). - - `workflows_type.md` - Add the workflow in the appropriate spot based on the workflow type. Make sure it is added alphabetically within the appropriate subsection. -5. Copy the path to the workflow to ==**ALL**== of the appropriate locations in the `mkdocs.yml` file (under the `nav:` section) in the main directory of this repository. These should be the exact same spots as in the overview tables but without additional information. 
This ensures the workflow can be accessed from the navigation sidebar. +- [Dead Link Check](https://www.deadlinkchecker.com/) - This website will scan your website to ensure that all links are working correctly. This will only work on the deployed version of the documentation, not the local version. ## Standard Language & Formatting Conventions @@ -98,10 +56,11 @@ The following language conventions should be followed when writing documentation - **Bold Text** - Use `**bold text**` to indicate text that should be bolded. - _Italicized Text_ - Use `_italicized text_` to indicate text that should be italicized. - ==Highlighted Text== - Use `==highlighted text==` to indicate text that should be highlighted. -- `Code` - Use \`code\` to indicate text that should be formatted as code. +- `Code` - Use ````code` ``` (backticks) to indicate text that should be formatted as code. - ^^Underlined Text^^ - Use `^^underlined text^^` to indicate text that should be underlined (works with our theme; not all Markdown renderers support this). - > Citations - Use a `>` to activate quote formatting for a citation. Make sure to separate multiple citations with a comment line (``) to prevent the citations from running together. + - Use a reputable citation style (e.g., Vancouver, Nature, etc.) for all citations. - Callouts/Admonitions - These features are called "call-outs" in Notion, but are "Admonitions" in MkDocs. [I highly recommend referring to the Material for MkDocs documentation page on Admonitions to learn how best to use this feature](https://squidfunk.github.io/mkdocs-material/reference/admonitions/). Use the following syntax to create a callout: ```markdown @@ -116,18 +75,37 @@ The following language conventions should be followed when writing documentation !!! dna This is a DNA admonition. Admire the cute green DNA emoji. You can create this with the `!!! dna` syntax. + Use this admonition when wanting to convey general information or highlight specific facts. 
+ ???+ toggle This is a toggle-able section. The emoji is an arrow pointing to the right downward. You can create this with the `??? toggle` syntax. I have added a `+` at the end of the question marks to make it open by default. + Use this admonition when wanting to provide additional _optional_ information or details that are not strictly necessary, or take up a lot of space. + ???+ task This is a toggle-able section **for a workflow task**. The emoji is a gear. Use the `??? task` syntax to create this admonition. Use `!!! task` if you want to have it be permanently expanded. I have add a `+` at the end of the question marks to make this admonition open by default and still enable its collapse. + Use this admonition when providing details on a workflow, task, or tool. + !!! caption - This is a caption. The emoji is a painting. You can create this with the `!!! caption` syntax. This is used to enclose an image in a box and looks nice. A caption can be added beneath the picture and will also look nice. + This is a caption. The emoji is a painting. You can create this with the `!!! caption` syntax. A caption can be added beneath the picture and will also look nice. + + Use this admonition when including images or diagrams in the documentation. !!! techdetails This is where you will put technical details for a workflow task. You can create this by `!!! techdetails` syntax. + Use this admonition when providing technical details for a workflow task or tool. These admonitions should include the following table: + + | | Links | + | --- | --- | + | Task | [link to the task file in the PHB repository on GitHub] | + | Software Source Code | [link to tool's source code] | + | Software Documentation | [link to tool's documentation] | + | Original Publication(s) | [link to tool's publication] | + + If any of these items are unfillable, delete the row. 
+ - Images - Use the following syntax to insert an image: ```markdown @@ -135,7 +113,7 @@ The following language conventions should be followed when writing documentation ![Alt Text](/path/to/image.png) ``` -- Indentation - **_FOUR_** spaces are required instead of the typical two. This is a side effect of using this theme. If you use two spaces, the list and/or indentations will not render correctly. This will make your linter sad :( +- Indentation - **_FOUR_** spaces are required instead of the typical two. This is a side effect of using this theme. If you use two spaces, the list and/or indentations will not render correctly. This will make your linter sad :( ```markdown - first item @@ -160,3 +138,45 @@ The following language conventions should be followed when writing documentation ``` - End all pages with an empty line + +## Documentation Structure + +A brief description of the documentation structure is as follows: + +- `docs/` - Contains the Markdown files for the documentation. + - `assets/` - Contains images and other files used in the documentation. + - `figures/` - Contains images, figures, and workflow diagrams used in the documentation. For workflows that contain many images (such as BaseSpace_Fetch), it is recommended to create a subdirectory for the workflow. + - `files/` - Contains files that are used in the documentation. This may include example outputs or templates. For workflows that contain many files (such as TheiaValidate), it is recommended to create a subdirectory for the workflow. + - `logos/` - Contains Theiagen logos and symbols used in the documentation. + - `metadata_formatters/` - Contains the most up-to-date metadata formatters for our submission workflows. + - `new_workflow_template.md` - A template for adding a new workflow page to the documentation.
You can see this template [here](../assets/new_workflow_template.md) + - `contributing/` - Contains the Markdown files for our contribution guides, such as this file + - `javascripts/` - Contains JavaScript files used in the documentation. + - `tablesort.js` - A JavaScript file used to enable table sorting in the documentation. + - `overrides/` - Contains HTMLs used to override theme defaults + - `main.html` - Contains the HTML used to display a warning when the latest version is not selected + - `stylesheets/` - Contains CSS files used in the documentation. + - `extra.css` - A custom CSS file used to style the documentation; contains all custom theme elements (scrollable tables, resizable columns, Theiagen colors), and custom admonitions. + - `workflows/` - Contains the Markdown files for each workflow, organized into subdirectories by workflow category + - `workflows_overview/` - Contains the Markdown files for the overview tables for each display type: alphabetically, by applicable kingdom, and by workflow type. + - `index.md` - The home/landing page for our documentation. + +### Adding a Page for a New Workflow {#new-page} + +If you are adding a new workflow, there are a number of things to do in order to include the page in the documentation: + +1. Add a page with the title of the workflow to appropriate subdirectory in `docs/workflows/`. Feel free to use the template found in the `assets/` folder. +2. 
Collect the following information for your new workflow: + - Workflow Name - Link the name with a relative path to the workflow page in appropriate `docs/workflows/` subdirectory + - Workflow Description - Brief description of the workflow + - Applicable Kingdom - Options: "Any taxa", "Bacteria", "Mycotics", "Viral" + - Workflow Level (_on Terra_) - Options: "Sample-level", "Set-level", or neither + - Command-line compatibility - Options: "Yes", "No", and/or "Some optional features incompatible" + - The version where the last known changes occurred (likely the upcoming version if it is a new workflow) + - Link to the workflow on Dockstore (if applicable) - Workflow name linked to the information tab on Dockstore. +3. Format this information in a table. +4. Copy the previously gathered information to ==**ALL THREE**== overview tables in `docs/workflows_overview/`: + - `workflows_alphabetically.md` - Add the workflow in the appropriate spot based on the workflow name. + - `workflows_kingdom.md` - Add the workflow in the appropriate spot(s) based on the kingdom(s) the workflow is applicable to. Make sure it is added alphabetically within the appropriate subsection(s). + - `workflows_type.md` - Add the workflow in the appropriate spot based on the workflow type. Make sure it is added alphabetically within the appropriate subsection. +5. Copy the path to the workflow to ==**ALL**== of the appropriate locations in the `mkdocs.yml` file (under the `nav:` section) in the main directory of this repository. These should be the exact same spots as in the overview tables but without additional information. This ensures the workflow can be accessed from the navigation sidebar. diff --git a/docs/overrides/main.html b/docs/overrides/main.html index 54a833dfd..0df0d3be2 100644 --- a/docs/overrides/main.html +++ b/docs/overrides/main.html @@ -6,8 +6,3 @@ Click here to go to the latest version release. {% endblock %} - - -{% block announce %} -
🏗️ I'm under construction! Pardon the dust while we remodel! 👷
-{% endblock %} diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css index e510ecedc..72b16bc01 100644 --- a/docs/stylesheets/extra.css +++ b/docs/stylesheets/extra.css @@ -200,7 +200,6 @@ div.searchable-table input.table-search-input { color: #000; border: 1px solid #E0E1E1; } - [data-md-color-scheme="light"] div.searchable-table input.table-search-input::placeholder { color: #888; font-style: italic; @@ -212,7 +211,6 @@ div.searchable-table input.table-search-input { color: #fff; border: 1px solid #373B40; } - [data-md-color-scheme="slate"] div.searchable-table input.table-search-input::placeholder { color: #bbb; font-style: italic; diff --git a/docs/workflows/genomic_characterization/theiacov.md b/docs/workflows/genomic_characterization/theiacov.md index ffe0993f6..8e897e0d8 100644 --- a/docs/workflows/genomic_characterization/theiacov.md +++ b/docs/workflows/genomic_characterization/theiacov.md @@ -630,8 +630,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | Variable | Rationale | | --- | --- | - | `skip_screen` | Prevent the read screen from running | - | `skip_screen` | Saving waste of compute resources on insufficient data | + | `skip_screen` | Set to true to prevent the read screen from running | | `min_reads` | Minimum number of base pairs for 10x coverage of the Hepatitis delta (of the *Deltavirus* genus) virus divided by 300 (longest Illumina read length) | | `min_basepairs` | Greater than 10x coverage of the Hepatitis delta (of the *Deltavirus* genus) virus | | `min_genome_size` | Based on the Hepatitis delta (of the *Deltavirus* genus) genome- the smallest viral genome as of 2024-04-11 (1,700 bp) | @@ -714,7 +713,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilties/wf_read_QC_trim.wdl) | + | Sub-workflow | 
[wf_read_QC_trim_pe.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_pe.wdl)
[wf_read_QC_trim_se.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_se.wdl) | | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl) | | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | @@ -734,7 +733,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Task | [task_ncbi_scrub.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_ncbi_scrub.wdl) | + | Task | [task_ncbi_scrub.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_ncbi_scrub.wdl) | | Software Source Code | [NCBI Scrub on GitHub](https://github.com/ncbi/sra-human-scrubber) | | Software Documentation | | @@ -755,7 +754,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl) | | Software Source Code | [Kraken2 on GitHub](https://github.com/DerrickWood/kraken2/) | | Software Documentation | | | Original Publication(s) | [Improved metagenomic analysis with Kraken 
2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | diff --git a/docs/workflows/genomic_characterization/theiaeuk.md b/docs/workflows/genomic_characterization/theiaeuk.md index bedeac0cf..cc9cba9c1 100644 --- a/docs/workflows/genomic_characterization/theiaeuk.md +++ b/docs/workflows/genomic_characterization/theiaeuk.md @@ -8,22 +8,22 @@ ## TheiaEuk Workflows -**The TheiaEuk_PE workflow is for the assembly, quality assessment, and characterization of fungal genomes.** It is designed to accept Illumina paired-end sequencing data as the primary input. **It is currently intended only for haploid fungal genomes like _Candida auris_.** Analyzing diploid genomes using TheiaEuk should be attempted only with expert attention to the resulting genome quality. +**The TheiaEuk_Illumina_PE workflow is for the assembly, quality assessment, and characterization of fungal genomes.** It is designed to accept Illumina paired-end sequencing data as the primary input. **It is currently intended only for ==haploid== fungal genomes like _Candida auris_.** Analyzing diploid genomes using TheiaEuk should be attempted only with expert attention to the resulting genome quality. -All input reads are processed through "core tasks" in each workflow. The core tasks include raw-read quality assessment, read cleaning (quality trimming and adapter removal), de novo assembly, assembly quality assessment, and species taxon identification. For some taxa identified, "taxa-specific sub-workflows" will be automatically activated, undertaking additional taxa-specific characterization steps, including clade-typing and/or antifungal resistance detection. +All input reads are processed through "core tasks" in each workflow. The core tasks include raw read quality assessment, read cleaning (quality trimming and adapter removal), de novo assembly, assembly quality assessment, and species taxon identification. 
For some taxa identified, taxa-specific sub-workflows will be automatically activated, undertaking additional taxa-specific characterization steps, including clade-typing and/or antifungal resistance detection. !!! caption "TheiaEuk Workflow Diagram" - ![TheiaEuk Workflow Diagram](../../assets/figures/TheiaEuk_Illumina_PE.png){width=75%} + ![TheiaEuk Workflow Diagram](../../assets/figures/TheiaEuk_Illumina_PHB_20241106.png){width=75%} ### Inputs !!! info "Input read data" - The TheiaEuk_PE workflow takes in Illumina paired-end read data. Read file names should end with `.fastq` or `.fq`, with the optional addition of `.gz`. When possible, Theiagen recommends zipping files with [gzip](https://www.gnu.org/software/gzip/) prior to Terra upload to minimize data upload time. + The TheiaEuk_Illumina_PE workflow takes in Illumina paired-end read data. Read file names should end with `.fastq` or `.fq`, with the optional addition of `.gz`. When possible, Theiagen recommends zipping files with [gzip](https://www.gnu.org/software/gzip/) prior to Terra upload to minimize data upload time. By default, the workflow anticipates 2 x 150bp reads (i.e. the input reads were generated using a 300-cycle sequencing kit). Modifications to the optional parameter for `trim_minlen` may be required to accommodate shorter read data, such as the 2 x 75bp reads generated using a 150-cycle sequencing kit. -
+
| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| @@ -148,7 +148,7 @@ All input reads are processed through "core tasks" in each workflow. The core ta | read_QC_trim | **workflow_series** | String | Internal component, do not modify | | Do Not Modify, Optional | | shovill_pe | **assembler** | String | Assembler to use (spades, skesa, velvet or megahit), see | "skesa" | Optional | | shovill_pe | **assembler_options** | String | Assembler-specific options that you might choose, see | | Optional | -| shovill_pe | **depth** | Int | User specified depth of coverage for downsampling (see ) | 150 | Optional | +| shovill_pe | **depth** | Int | User specified depth of coverage for downsampling (see and ) | 150 | Optional | | shovill_pe | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | | shovill_pe | **docker** | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/shovill:1.1.0 | Optional | | shovill_pe | **genome_length** | String | Internal component, do not modify | | Do Not Modify, Optional | @@ -177,7 +177,14 @@ All input reads are processed through "core tasks" in each workflow. The core ta
-### Workflow tasks (performed for all taxa) +### Workflow Tasks + +All input reads are processed through "core tasks" in the TheiaEuk workflows. These undertake read trimming and assembly appropriate to the input data type, currently only Illumina paired-end data. The TheiaEuk workflows subsequently launch default genome characterization modules for quality assessment and additional taxa-specific characterization steps. When setting up the workflow, users may choose to use "optional tasks" or alternatives to tasks run in the workflow by default. + +#### Core tasks + +!!! tip "" + These tasks are performed regardless of organism. They perform read trimming and various quality control steps. ??? task "`versioning`: Version capture for TheiaEuk" @@ -189,7 +196,7 @@ All input reads are processed through "core tasks" in each workflow. The core ta | --- | --- | | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/task_versioning.wdl) | -??? task "`screen`: Total Raw Read Quantification and Genome Size Estimation" +??? task "`screen`: Total Raw Read Quantification and Genome Size Estimation (optional, on by default)" The [`screen`](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl) task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses [`fastq-scan`](https://github.com/rpetit3/fastq-scan) and bash commands for quantification of reads and base pairs, and [mash](https://mash.readthedocs.io/en/latest/index.html) sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples that do not meet these criteria will not be processed further by the workflow: @@ -219,19 +226,22 @@ All input reads are processed through "core tasks" in each workflow. 
The core ta | | Links | | --- | --- | - | Task | [task_screen.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_screen.wdl) | + | Task | [task_screen.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl) | -??? task "`rasusa`: Read subsampling" +??? task "`Rasusa`: Read subsampling (optional, on by default)" - The RASUSA task performs subsampling of the raw reads. By default, this task will subsample reads to a depth of 150X using the estimated genome length produced during the preceding raw read screen. The user can prevent the task from being launched by setting the `call_rasusa`variable to false. + The Rasusa task performs subsampling of the raw reads. By default, this task will subsample reads to a depth of 150X using the estimated genome length produced during the preceding raw read screen. The user can prevent the task from being launched by setting the `call_rasusa` variable to false. The user can also provide an estimated genome length for the task to use for subsampling using the `genome_size` variable. In addition, the read depth can be modified using the `subsample_coverage` variable. - !!! techdetails "RASUSA Technical Details" + !!! techdetails "Rasusa Technical Details" - | | TheiaEuk_Illumina_PE_PHB | + | | Links | | --- | --- | | Task | [task_rasusa.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/task_rasusa.wdl) | + | Software Source Code | [Rasusa on GitHub](https://github.com/mbhall88/rasusa) | + | Software Documentation | [Rasusa on GitHub](https://github.com/mbhall88/rasusa) | + | Original Publication(s) | [Rasusa: Randomly subsample sequencing reads to a specified coverage](https://doi.org/10.21105/joss.03941) | ??? 
task "`read_QC_trim`: Read Quality Trimming, Adapter Removal, Quantification, and Identification" @@ -297,12 +307,17 @@ All input reads are processed through "core tasks" in each workflow. The core ta | | Links | | --- | --- | - | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim.wdl) | + | Sub-workflow | [wf_read_QC_trim_pe.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_pe.wdl) | | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl)| | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | | Original Publication(s) | [Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | +#### Assembly tasks + +!!! tip "" + These tasks assemble the reads into a _de novo_ assembly and assess the quality of the assembly. + ??? task "`shovill`: _De novo_ Assembly" De Novo assembly will be undertaken only for samples that have sufficient read quantity and quality, as determined by the `screen` task assessment of clean reads. @@ -316,7 +331,8 @@ All input reads are processed through "core tasks" in each workflow. The core ta | | Links | | --- | --- | | TheiaEuk WDL Task | [task_shovill.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/assembly/task_shovill.wdl#L3) | - | Software code repository and documentation | [Shovill on GitHub](https://github.com/tseemann/shovill) | + | Software Source Code | [Shovill on GitHub](https://github.com/tseemann/shovill) | + | Software Documentation | [Shovill on GitHub](https://github.com/tseemann/shovill) | ??? task "`QUAST`: Assembly Quality Assessment" @@ -326,7 +342,7 @@ All input reads are processed through "core tasks" in each workflow. The core ta | | Links | | --- | --- | - | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_quast.wdl) | + | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_quast.wdl) | | Software Source Code | [QUAST on GitHub](https://github.com/ablab/quast) | | Software Documentation | https://quast.sourceforge.net/docs/manual.html | | Orginal publication | [QUAST: quality assessment tool for genome assemblies](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3624806/) | @@ -340,11 +356,16 @@ All input reads are processed through "core tasks" in each workflow. 
The core ta | | Links | | --- | --- | - | Task | [task_cg_pipeline.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_cg_pipeline.wdl) | + | Task | [task_cg_pipeline.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_cg_pipeline.wdl) | | Software Source Code | [CG-Pipeline on GitHub](https://github.com/lskatz/CG-Pipeline/) | | Software Documentation | [CG-Pipeline on GitHub](https://github.com/lskatz/CG-Pipeline/) | | Original Publication(s) | [A computational genomics pipeline for prokaryotic sequencing projects](https://academic.oup.com/bioinformatics/article/26/15/1819/188418) | +#### Organism-agnostic characterization + +!!! tip "" + These tasks are performed regardless of the organism and provide quality control and taxonomic assignment. + ??? task "`GAMBIT`: **Taxon Assignment**" [`GAMBIT`](https://github.com/jlumpe/gambit) determines the taxon of the genome assembly using a k-mer based approach to match the assembly sequence to the closest complete genome in a database, thereby predicting its identity. Sometimes, GAMBIT can confidently designate the organism to the species level. Other times, it is more conservative and assigns it to a higher taxonomic rank. @@ -360,7 +381,33 @@ All input reads are processed through "core tasks" in each workflow. The core ta | Software Documentation | [GAMBIT ReadTheDocs](https://gambit-genomics.readthedocs.io/en/latest/) | | Original Publication(s) | [GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277575) | -??? task "**`QC_check`: Check QC Metrics Against User-Defined Thresholds (optional)**" +??? 
task "`BUSCO`: Assembly Quality Assessment" + + BUSCO (**B**enchmarking **U**niversal **S**ingle-**C**opy **O**rthologue) attempts to quantify the completeness and contamination of an assembly to generate quality assessment metrics. It uses taxa-specific databases containing genes that are all expected to occur in the given taxa, each in a single copy. BUSCO examines the presence or absence of these genes, whether they are fragmented, and whether they are duplicated (suggestive that additional copies came from contaminants). + + **BUSCO notation** + + Here is an example of BUSCO notation: `C:99.1%[S:98.9%,D:0.2%],F:0.0%,M:0.9%,n:440`. There are several abbreviations used in this output: + + - Complete (C) - genes are considered "complete" when their lengths are within two standard deviations of the BUSCO group mean length. + - Single-copy (S) - genes that are complete and have only one copy. + - Duplicated (D) - genes that are complete and have more than one copy. + - Fragmented (F) - genes that are only partially recovered. + - Missing (M) - genes that were not recovered at all. + - Number of genes examined (n) - the number of genes examined. + + A high quality assembly will use the appropriate database for the taxa, have high complete (C) and single-copy (S) percentages, and low duplicated (D), fragmented (F) and missing (M) percentages. + + !!! techdetails "BUSCO Technical Details" + + | | Links | + | --- | --- | + | Task | [task_busco.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/advanced_metrics/task_busco.wdl) | + | Software Source Code | [BUSCO on GitLab](https://gitlab.com/ezlab/busco) | + | Software Documentation | https://busco.ezlab.org/ | + | Original Publication(s) | [BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs](https://academic.oup.com/bioinformatics/article/31/19/3210/211866) | + +??? 
task "`QC_check`: Check QC Metrics Against User-Defined Thresholds (optional)" The `qc_check` task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a `qc_check_table` .tsv file. If all QC metrics meet the threshold, the `qc_check` output variable will read `QC_PASS`. Otherwise, the output will read `QC_NA` if the task could not proceed or `QC_ALERT` followed by a string indicating what metric failed. @@ -383,96 +430,167 @@ All input reads are processed through "core tasks" in each workflow. The core ta | | Links | | --- | --- | - | Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check.wdl) | + | Task | [task_qc_check_phb.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check_phb.wdl) | -### Organism-specific Characterization +#### Organism-specific characterization -The TheiaEuk workflow automatically activates taxa-specific tasks after identification of relevant taxa using `GAMBIT`. Many of these taxa-specific tasks do not require any additional workflow tasks from the user. +!!! tip "" + The TheiaEuk workflow automatically activates taxa-specific tasks after identification of the relevant taxa using `GAMBIT`. Many of these taxa-specific tasks do not require any additional inputs from the user. ??? toggle "_Candida auris_" + Two tools are deployed when _Candida auris_ is identified. + + ??? task "Cladetyping: clade determination" + GAMBIT is used to determine the clade of the specimen by comparing the sequence to five clade-specific reference files. The output of the clade typing task will be used to specify the reference genome for the antifungal resistance detection tool. + + ??? toggle "Default reference genomes used for clade typing and antimicrobial resistance gene detection of _C. 
auris_" + | Clade | Genome Accession | Assembly Name | Strain | NCBI Submitter | Included mutations in AMR genes (not comprehensive) | + | --- | --- | --- | --- | --- | --- | + | _Candida auris_ Clade I | GCA_002759435.2 | Cand_auris_B8441_V2 | B8441 | Centers for Disease Control and Prevention | | + | _Candida auris_ Clade II | GCA_003013715.2 | ASM301371v2 | B11220 | Centers for Disease Control and Prevention | | + | _Candida auris_ Clade III | GCA_002775015.1 | Cand_auris_B11221_V1 | B11221 | Centers for Disease Control and Prevention | _ERG11_ V125A/F126L | + | _Candida auris_ Clade IV | GCA_003014415.1 | Cand_auris_B11243 | B11243 | Centers for Disease Control and Prevention | _ERG11_ Y132F | + | _Candida auris_ Clade V | GCA_016809505.1 | ASM1680950v1 | IFRC2087 | Centers for Disease Control and Prevention | | + + !!! techdetails "Cladetyping Technical Details" + | | Links | + | --- | --- | + | Task | [task_cauris_cladetyper.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/candida/task_cauris_cladetyper.wdl) | + | Software Source Code | [GAMBIT on GitHub](https://github.com/jlumpe/gambit) | + | Software Documentation | [GAMBIT Overview](https://theiagen.notion.site/GAMBIT-7c1376b861d0486abfbc316480046bdc?pvs=4) | + | Original Publication(s) | [GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification](https://doi.org/10.1371/journal.pone.0277575)
[TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization](https://doi.org/10.3389/fpubh.2023.1198213) | + + ??? task "Snippy Variants: antifungal resistance detection" + To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, then these variants are queried for product names associated with resistance. + + The genes in which there are known resistance-conferring mutations for this pathogen are: + + - FKS1 + - ERG11 (lanosterol 14-alpha demethylase) + - FUR1 (uracil phosphoribosyltransferase) + + We query `Snippy` results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overridden by the user). Mutations are reported next to the search term in the `theiaeuk_snippy_variants_hits` column; the corresponding gene names are listed below: + + | **TheiaEuk Search Term** | **Corresponding Gene Name** | + |---|---| + | B9J08_005340 | ERG6 | + | B9J08_000401 | FLO8 | + | B9J08_005343 | Hypothetical protein (PSK74852) | + | B9J08_003102 | MEC3 | + | B9J08_003737 | ERG3 | + | lanosterol.14-alpha.demethylase | ERG11 | + | uracil.phosphoribosyltransferase | FUR1 | + | FKS1 | FKS1 | + + For example, one sample may have the following output for the `theiaeuk_snippy_variants_hits` column: + + ```plaintext + lanosterol.14-alpha.demethylase: lanosterol 14-alpha demethylase (missense_variant c.428A>G p.Lys143Arg; C:266 T:0),B9J08_000401: hypothetical protein (stop_gained c.424C>T p.Gln142*; A:70 G:0) + ``` + + Based on this, we can tell that ERG11 has a missense variant at position 143 (Lysine to Arginine) and B9J08_000401 (which is FLO8) has a stop-gained variant at position 142 (Glutamine to Stop). + + ??? 
toggle "Known resistance-conferring mutations for _Candida auris_" + Mutations in these genes that are known to confer resistance are shown below + + | **Organism** | **Found in** | **Gene name** | **Gene locus** | **AA mutation** | **Drug** | **Reference** | + | --- | --- | --- | --- | --- | --- | --- | + | **Candida auris** | **Human** | **ERG11** | | **Y132F** | **Fluconazole** | [Simultaneous Emergence of Multidrug-Resistant _Candida auris_ on 3 Continents Confirmed by Whole-Genome Sequencing and Epidemiological Analyses](https://academic.oup.com/cid/article/64/2/134/2706620/Simultaneous-Emergence-of-Multidrug-Resistant) | + | **Candida auris** | **Human** | **ERG11** | | **K143R** | **Fluconazole** | [Simultaneous Emergence of Multidrug-Resistant _Candida auris_ on 3 Continents Confirmed by Whole-Genome Sequencing and Epidemiological Analyses](https://academic.oup.com/cid/article/64/2/134/2706620/Simultaneous-Emergence-of-Multidrug-Resistant) | + | **Candida auris** | **Human** | **ERG11** | | **F126T** | **Fluconazole** | [Simultaneous Emergence of Multidrug-Resistant _Candida auris_ on 3 Continents Confirmed by Whole-Genome Sequencing and Epidemiological Analyses](https://academic.oup.com/cid/article/64/2/134/2706620/Simultaneous-Emergence-of-Multidrug-Resistant) | + | **Candida auris** | **Human** | **FKS1** | | **S639P** | **Micafungin** | [Activity of CD101, a long-acting echinocandin, against clinical isolates of Candida auris](https://www.sciencedirect.com/science/article/pii/S0732889317303498) | + | **Candida auris** | **Human** | **FKS1** | | **S639P** | **Caspofungin** | [Activity of CD101, a long-acting echinocandin, against clinical isolates of Candida auris](https://www.sciencedirect.com/science/article/pii/S0732889317303498) | + | **Candida auris** | **Human** | **FKS1** | | **S639P** | **Anidulafungin** | [Activity of CD101, a long-acting echinocandin, against clinical isolates of Candida 
auris](https://www.sciencedirect.com/science/article/pii/S0732889317303498) | + | **Candida auris** | **Human** | **FKS1** | | **S639F** | **Micafungin** | [A multicentre study of antifungal susceptibility patterns among 350 _Candida auris_ isolates (2009–17) in India: role of the ERG11 and FKS1 genes in azole and echinocandin resistance](https://academic.oup.com/jac/advance-article/doi/10.1093/jac/dkx480/4794718) | + | **Candida auris** | **Human** | **FKS1** | | **S639F** | **Caspofungin** | [A multicentre study of antifungal susceptibility patterns among 350 _Candida auris_ isolates (2009–17) in India: role of the ERG11 and FKS1 genes in azole and echinocandin resistance](https://academic.oup.com/jac/advance-article/doi/10.1093/jac/dkx480/4794718) | + | **Candida auris** | **Human** | **FKS1** | | **S639F** | **Anidulafungin** | [A multicentre study of antifungal susceptibility patterns among 350 _Candida auris_ isolates (2009–17) in India: role of the ERG11 and FKS1 genes in azole and echinocandin resistance](https://academic.oup.com/jac/advance-article/doi/10.1093/jac/dkx480/4794718) | + | **Candida auris** | **Human** | **FUR1** | **CAMJ_004922** | **F211I** | **5-flucytosine** | [Genomic epidemiology of the UK outbreak of the emerging human fungal pathogen Candida auris](https://doi.org/10.1038/s41426-018-0045-x) | + + !!! techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl)
[task_snippy_gene_query.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_gene_query.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | + +??? toggle "_Candida albicans_" + When this species is detected by the taxon ID tool, an antifungal resistance detection task is deployed. - Two tools are deployed when _Candida auris is_ identified. First, the Cladetyping tool is launched to determine the clade of the specimen by comparing the sequence to five clade-specific reference files. The output of the clade typing task will be used to specify the reference genome for the antifungal resistance detection tool. To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, then these variants are queried for product names associated with resistance according to the MARDy database (). - - The genes in which there are known resistance-conferring mutations for this pathogen are: - - - FKS1 - - ERG11 (lanosterol 14-alpha demethylase) - - FUR1 (uracil phosphoribosyltransferase) - - We query `Snippy` results to see if any mutations were identified in those genes. In addition, _C. auris_ automatically checks for the following loci. 
You will find the mutations next to the locus tag in the `theiaeuk_snippy_variants_hits` column corresponding gene name followings: - - | **TheiaEuk Search Term** | **Corresponding Gene Name** | - |---|---| - | B9J08_005340 | ERG6 | - | B9J08_000401 | FLO8 | - | B9J08_005343 | Hypothetical protein (PSK74852) | - | B9J08_003102 | MEC3 | - | B9J08_003737 | ERG3 | - | lanosterol.14-alpha.demethylase | ERG11 | - | uracil.phosphoribosyltransferase | FUR1 | - | FKS1 | FKS1 | - - For example, one sample may have the following output for the `theiaeuk_snippy_variants_hits` column: - - ```plaintext - lanosterol.14-alpha.demethylase: lanosterol 14-alpha demethylase (missense_variant c.428A>G p.Lys143Arg; C:266 T:0),B9J08_000401: hypothetical protein (stop_gained c.424C>T p.Gln142*; A:70 G:0) - ``` - - Based on this, we can tell that ERG11 has a missense variant at position 143 (Lysine to Arginine) and B9J08_000401 (which is FLO8) has a stop-gained variant at position 142 (Glutamine to Stop). - - ??? toggle "Default reference genomes used for clade typing and antimicrobial resistance gene detection of _C. auris_" - | Clade | Genome Accession | Assembly Name | Strain | NCBI Submitter | Included mutations in AMR genes (not comprehensive) | - | --- | --- | --- | --- | --- | --- | - | Candida auris Clade I | GCA_002759435.2 | Cand_auris_B8441_V2 | B8441 | Centers for Disease Control and Prevention | | - | Candida auris Clade II | GCA_003013715.2 | ASM301371v2 | B11220 | Centers for Disease Control and Prevention | | - | Candida auris Clade III | GCA_002775015.1 | Cand_auris_B11221_V1 | B11221 | Centers for Disease Control and Prevention | _ERG11_ V125A/F126L | - | Candida auris Clade IV | GCA_003014415.1 | Cand_auris_B11243 | B11243 | Centers for Disease Control and Prevention | _ERG11_ Y132F | - | Candida auris Clade V | GCA_016809505.1 | ASM1680950v1 | IFRC2087 | Centers for Disease Control and Prevention | | - - ??? 
toggle "Known resistance-conferring mutations for _Candida auris_" - Mutations in these genes that are known to confer resistance are shown below (source: MARDy database http://mardy.dide.ic.ac.uk/index.php) - - | **Organism** | **Found in** | **Gene name** | **Gene locus** | **AA mutation** | **Drug** | **Tandem repeat name** | **Tandem repeat sequence** | **Reference** | - | --- | --- | --- | --- | --- | --- | --- | --- | --- | - | **Candida auris** | **Human** | **ERG11** | | **Y132F** | **Fluconazole** | | | [**10.1093/cid/ciw691**](https://academic.oup.com/cid/article/64/2/134/2706620/Simultaneous-Emergence-of-Multidrug-Resistant) | - | **Candida auris** | **Human** | **ERG11** | | **K143R** | **Fluconazole** | | | [**10.1093/cid/ciw691**](https://academic.oup.com/cid/article/64/2/134/2706620/Simultaneous-Emergence-of-Multidrug-Resistant) | - | **Candida auris** | **Human** | **ERG11** | | **F126T** | **Fluconazole** | | | [**10.1093/cid/ciw691**](https://academic.oup.com/cid/article/64/2/134/2706620/Simultaneous-Emergence-of-Multidrug-Resistant) | - | **Candida auris** | **Human** | **FKS1** | | **S639P** | **Micafungin** | | | [**10.1016/j.diagmicrobio.2017.10.021**](https://www.sciencedirect.com/science/article/pii/S0732889317303498) | - | **Candida auris** | **Human** | **FKS1** | | **S639P** | **Caspofungin** | | | [**10.1016/j.diagmicrobio.2017.10.021**](https://www.sciencedirect.com/science/article/pii/S0732889317303498) | - | **Candida auris** | **Human** | **FKS1** | | **S639P** | **Anidulafungin** | | | [**10.1016/j.diagmicrobio.2017.10.021**](https://www.sciencedirect.com/science/article/pii/S0732889317303498) | - | **Candida auris** | **Human** | **FKS1** | | **S639F** | **Micafungin** | | | [**10.1093/jac/dkx480**](https://academic.oup.com/jac/advance-article/doi/10.1093/jac/dkx480/4794718) | - | **Candida auris** | **Human** | **FKS1** | | **S639F** | **Caspofungin** | | | 
[**10.1093/jac/dkx480**](https://academic.oup.com/jac/advance-article/doi/10.1093/jac/dkx480/4794718) | - | **Candida auris** | **Human** | **FKS1** | | **S639F** | **Anidulafungin** | | | [**10.1093/jac/dkx480**](https://academic.oup.com/jac/advance-article/doi/10.1093/jac/dkx480/4794718) | - | **Candida auris** | **Human** | **FUR1** | **CAMJ_004922** | **F211I** | **5-flucytosine** | | | [**https://doi.org/10.1038/s41426-018-0045-x**](https://www.nature.com/articles/s41426-018-0045-x) | + ??? task "Snippy Variants: antifungal resistance detection" + To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance. -??? toggle "_Candida albicans_" + The genes in which there are known resistance-conferring mutations for this pathogen are: - When this species is detected by the taxon ID tool, an antifungal resistance detection task is deployed. To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance according to the MARDy database (). - The genes in which there are known resistance-conferring mutations for this pathogen are: + - ERG11 + - GCS1 (FKS1) + - FUR1 + - RTA2 - The genes in which there are known resistance-conferring mutations for this pathogen are: + We query `Snippy` results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overridden by the user). Mutations are reported next to the search term in the `theiaeuk_snippy_variants_hits` column; the corresponding gene names are listed below: - - ERG11 - - GCS1 (FKS1) - - FUR1 - - RTA2 + | **TheiaEuk Search Term** | **Corresponding Gene Name** | + |---|---| + | ERG11 | ERG11 | + | GCS1 | FKS1 | + | FUR1 | FUR1 | + | RTA2 | RTA2 | + + !!! 
techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl)
[task_snippy_gene_query.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_gene_query.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | ??? toggle "_Aspergillus fumigatus_" + When this species is detected by the taxon ID tool, an antifungal resistance detection task is deployed. + + ??? task "Snippy Variants: antifungal resistance detection" + To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance. - When this species is detected by the taxon ID tool an antifungal resistance detection task is deployed. To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance according to the MARDy database (). + The genes in which there are known resistance-conferring mutations for this pathogen are: - The genes in which there are known resistance-conferring mutations for this pathogen are: + - Cyp51A + - HapE + - COX10 (AFUA_4G08340) + + We query `Snippy` results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overridden by the user). Mutations are reported next to the search term in the `theiaeuk_snippy_variants_hits` column; the corresponding gene names are listed below: - - Cyp51A - - HapE - - COX10 (AFUA_4G08340) + | **TheiaEuk Search Term** | **Corresponding Gene Name** | + |---|---| + | Cyp51A | Cyp51A | + | HapE | HapE | + | AFUA_4G08340 | COX10 | + + !!! 
techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl)
[task_snippy_gene_query.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_gene_query.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | ??? toggle "_Cryptococcus neoformans_" + When this species is detected by the taxon ID tool, an antifungal resistance detection task is deployed. - When this species is detected by the taxon ID tool an antifungal resistance detection task is deployed. To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance according to the MARDy database (). + ??? task "Snippy Variants: antifungal resistance detection" + To detect mutations that may confer antifungal resistance, `Snippy` is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance. - The gene in which there are known resistance-conferring mutations for this pathogen is: + The genes in which there are known resistance-conferring mutations for this pathogen are: - - ERG11 (CNA00300) + - ERG11 (CNA00300) + + We query `Snippy` results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overridden by the user). Mutations are reported next to the search term in the `theiaeuk_snippy_variants_hits` column; the corresponding gene names are listed below: + + | **TheiaEuk Search Term** | **Corresponding Gene Name** | + |---|---| + | CNA00300 | ERG11 | + + !!! 
techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl)
[task_snippy_gene_query.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_gene_query.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | ### Outputs @@ -540,4 +658,4 @@ The TheiaEuk workflow automatically activates taxa-specific tasks after identifi | theiaeuk_illumina_pe_analysis_date | String | Date of TheiaProk workflow execution | | theiaeuk_illumina_pe_version | String | TheiaProk workflow version used | -
\ No newline at end of file + diff --git a/docs/workflows/genomic_characterization/theiameta.md b/docs/workflows/genomic_characterization/theiameta.md index 6e9147399..e166088aa 100644 --- a/docs/workflows/genomic_characterization/theiameta.md +++ b/docs/workflows/genomic_characterization/theiameta.md @@ -149,7 +149,7 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge | | Links | | --- | --- | - | Task | [task_ncbi_scrub.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_ncbi_scrub.wdl) | + | Task | [task_ncbi_scrub.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_ncbi_scrub.wdl) | | Software Source Code | [NCBI Scrub on GitHub](https://github.com/ncbi/sra-human-scrubber) | | Software Documentation | | @@ -214,7 +214,7 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge | | Links | | --- | --- | - | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim.wdl) | + | Sub-workflow | [wf_read_QC_trim_pe.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_pe.wdl)
[wf_read_QC_trim_se.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_se.wdl) | | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl)| | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | @@ -233,7 +233,7 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge | | Links | | --- | --- | - | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl) | | Software Source Code | [Kraken2 on GitHub](https://github.com/DerrickWood/kraken2/) | | Software Documentation | | | Original Publication(s) | [Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | @@ -267,7 +267,7 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge | | Links | | --- | --- | - | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_quast.wdl) | + | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_quast.wdl) | | Software Source Code | [QUAST on GitHub](https://github.com/ablab/quast) | | Software Documentation | | | Original 
Publication(s) | [QUAST: quality assessment tool for genome assemblies](https://academic.oup.com/bioinformatics/article/29/8/1072/228832) | diff --git a/docs/workflows/genomic_characterization/theiaprok.md b/docs/workflows/genomic_characterization/theiaprok.md index 6664df6df..8808caab2 100644 --- a/docs/workflows/genomic_characterization/theiaprok.md +++ b/docs/workflows/genomic_characterization/theiaprok.md @@ -722,7 +722,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim.wdl) | + | Sub-workflow | [wf_read_QC_trim_pe.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_pe.wdl)
[wf_read_QC_trim_se.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_se.wdl) | | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl)| | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | @@ -737,7 +737,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_cg_pipeline.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_cg_pipeline.wdl) | + | Task | [task_cg_pipeline.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_cg_pipeline.wdl) | | Software Source Code | [CG-Pipeline on GitHub](https://github.com/lskatz/CG-Pipeline/) | | Software Documentation | [CG-Pipeline on GitHub](https://github.com/lskatz/CG-Pipeline/) | | Original Publication(s) | [A computational genomics pipeline for prokaryotic sequencing projects](https://academic.oup.com/bioinformatics/article/26/15/1819/188418) | @@ -746,7 +746,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al De Novo assembly will be undertaken only for samples that have sufficient read quantity and quality, as determined by the `screen` task assessment of clean reads. - In TheiaEuk, assembly is performed using the [Shovill](https://github.com/tseemann/shovill) pipeline. 
This undertakes the assembly with one of four assemblers ([SKESA](https://github.com/ncbi/SKESA) (default), [SPAdes](https://github.com/ablab/spades), [Velvet](https://github.com/dzerbino/velvet/), [Megahit](https://github.com/voutcn/megahit)), but also performs [a number of pre- and post-processing steps](https://github.com/tseemann/shovill#main-steps) to improve the resulting genome assembly. Shovill uses an estimated genome size (see [here](https://github.com/tseemann/shovill#--gsize)). If this is not provided by the user as an optional input, Shovill will estimate the genome size using [mash](https://mash.readthedocs.io/en/latest/index.html). Adaptor trimming can be undertaken with Shovill by setting the `trim` option to "true", but this is set to "false" by default as [alternative adapter trimming](https://www.notion.so/TheiaProk-Workflow-Series-89b9c08406094ec78d08a578fe861626?pvs=21) is undertaken in the TheiaEuk workflow. + In TheiaProk, assembly is performed using the [Shovill](https://github.com/tseemann/shovill) pipeline. This undertakes the assembly with one of four assemblers ([SKESA](https://github.com/ncbi/SKESA) (default), [SPAdes](https://github.com/ablab/spades), [Velvet](https://github.com/dzerbino/velvet/), [Megahit](https://github.com/voutcn/megahit)), but also performs [a number of pre- and post-processing steps](https://github.com/tseemann/shovill#main-steps) to improve the resulting genome assembly. Shovill uses an estimated genome size (see [here](https://github.com/tseemann/shovill#--gsize)). If this is not provided by the user as an optional input, Shovill will estimate the genome size using [mash](https://mash.readthedocs.io/en/latest/index.html). Adaptor trimming can be undertaken with Shovill by setting the `trim` option to "true", but this is set to "false" by default as [alternative adapter trimming](https://www.notion.so/TheiaProk-Workflow-Series-89b9c08406094ec78d08a578fe861626?pvs=21) is undertaken in the TheiaProk workflow. ???
toggle "What is _de novo_ assembly?" _De novo_ assembly is the process or product of attempting to reconstruct a genome from scratch (without prior knowledge of the genome) using sequence reads. Assembly of fungal genomes from short-reads will produce multiple contigs per chromosome rather than a single contiguous sequence for each chromosome. @@ -754,8 +754,9 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al !!! techdetails "Shovill Technical Details" | | Links | | --- | --- | - | TheiaProk WDL Task | [task_shovill.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/assembly/task_shovill.wdl#L3) | - | Software code repository and documentation | [Shovill on GitHub](https://github.com/tseemann/shovill) | + | TheiaEuk WDL Task | [task_shovill.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/assembly/task_shovill.wdl#L3) | + | Software Source Code | [Shovill on GitHub](https://github.com/tseemann/shovill) | + | Software Documentation | [Shovill on GitHub](https://github.com/tseemann/shovill) | #### ONT Data Core Tasks @@ -765,7 +766,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al **Estimated genome length**: - By default, an estimated genome length is set to 5 Mb, which is around 0.7 Mb higher than the average bacterial genome length, according to the information collated [here](https://github.com/CDCgov/phoenix/blob/717d19c19338373fc0f89eba30757fe5cfb3e18a/assets/databases/NCBI_Assembly_stats_20240124.txt). This estimate can be overwritten by the user, and is used by `RASUSA` and `dragonflye`. + By default, an estimated genome length is set to 5 Mb, which is around 0.7 Mb higher than the average bacterial genome length, according to the information collated [here](https://github.com/CDCgov/phoenix/blob/717d19c19338373fc0f89eba30757fe5cfb3e18a/assets/databases/NCBI_Assembly_stats_20240124.txt). 
This estimate can be overwritten by the user, and is used by `Rasusa` and `dragonflye`. **Plotting and quantifying long-read sequencing data:** `nanoplot` @@ -784,7 +785,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | Workflow | **TheiaProk_ONT** | | --- | --- | | Sub-workflow | [wf_read_QC_trim_ont.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_ont.wdl) | - | Tasks | [task_nanoplot.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_nanoplot.wdl) [task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/b481ce48f3d527ab8f31e4ad8171769212cc091a/tasks/quality_control/basic_statistics/task_fastq_scan.wdl) [task_rasusa.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/task_rasusa.wdl) [task_nanoq.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_nanoq.wdl) [task_tiptoft.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/plasmid_detection/task_tiptoft.wdl) | + | Tasks | [task_nanoplot.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_nanoplot.wdl) [task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl) [task_rasusa.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/task_rasusa.wdl) [task_nanoq.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_nanoq.wdl) [task_tiptoft.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/plasmid_detection/task_tiptoft.wdl) | | Software Source Code | [fastq-scan](https://github.com/rpetit3/fastq-scan), 
[NanoPlot](https://github.com/wdecoster/NanoPlot), [RASUSA](https://github.com/mbhall88/rasusa), [tiptoft](https://github.com/andrewjpage/tiptoft), [nanoq](https://github.com/esteinig/nanoq) | | Original Publication(s) | [NanoPlot paper](https://academic.oup.com/bioinformatics/article/39/5/btad311/7160911)
[RASUSA paper](https://doi.org/10.21105/joss.03941)
[Nanoq Paper](https://doi.org/10.21105/joss.02991)
[Tiptoft paper](https://doi.org/10.21105/joss.01021) | @@ -808,7 +809,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | --- | --- | | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_quast.wdl) | | Software Source Code | [QUAST on GitHub](https://github.com/ablab/quast) | - | Software Documentation | | + | Software Documentation | | | Original Publication(s) | [QUAST: quality assessment tool for genome assemblies](https://academic.oup.com/bioinformatics/article/29/8/1072/228832) | ??? task "`BUSCO`: Assembly Quality Assessment" @@ -892,7 +893,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al ??? task "`AMRFinderPlus`: AMR Genotyping (default)" - NCBI's [AMRFinderPlus](https://github.com/ncbi/amr/wiki) is the default antimicrobial resistance (AMR) detection tool used in TheiaProk. [ResFinder](https://www.notion.so/TheiaProk-Workflow-Series-68c34aca2a0240ef94fef0acd33651b9?pvs=21) may be used alternatively and if so, AMRFinderPlus is not run. + NCBI's [AMRFinderPlus](https://github.com/ncbi/amr/wiki) is the default antimicrobial resistance (AMR) detection tool used in TheiaProk. ResFinder may be used alternatively and if so, AMRFinderPlus is not run. AMRFinderPlus identifies acquired antimicrobial resistance (AMR) genes, virulence genes, and stress genes. Such AMR genes confer resistance to antibiotics, metals, biocides, heat, or acid. For some taxa (see [here](https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus#--organism-option)), AMRFinderPlus will provide taxa-specific results including filtering out genes that are almost ubiquitous in the taxa (intrinsic genes) and identifying resistance-associated point mutations. In TheiaProk, the taxon used by AMRFinderPlus is specified based on the `gambit_predicted_taxon` or a user-provided `expected_taxon`. 
@@ -1047,7 +1048,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_plasmidfinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/plasmid_typing/task_plasmidfinder.wdl) | + | Task | [task_plasmidfinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/plasmid_detection/task_plasmidfinder.wdl) | | Software Source Code | https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/ | | Software Documentation | https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/ | | Original Publication(s) | [In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4068535/) | @@ -1076,7 +1077,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check.wdl) | + | Task | [task_qc_check_phb.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check_phb.wdl) | ???
task "`Taxon Tables`: Copy outputs to new data tables based on taxonomic assignment (optional)" @@ -1323,7 +1324,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_kleborate.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/haemophilus/task_kleborate.wdl) | + | Task | [task_kleborate.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/klebsiella/task_kleborate.wdl) | | Software Source Code | [kleborate on GitHub](https://github.com/katholt/Kleborate) | | Software Documentation | https://github.com/katholt/Kleborate/wiki | | Original publication | [A genomic surveillance framework and genotyping tool for Klebsiella pneumoniae and its related species complex](https://www.nature.com/articles/s41467-021-24448-3)
[Identification of Klebsiella capsule synthesis loci from whole genome data](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000102) | @@ -1534,7 +1535,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after ??? task "`PopPUNK`: Global Pneumococcal Sequence Cluster typing" - Global Pneumococcal Sequence Clusters (GPSC) define and name pneumococcal strains. GPSC designation is undertaken using the PopPUNK software and GPSC database as described in the file below, obtained from [here](https://www.pneumogen.net/gps/training_command_line.html). + Global Pneumococcal Sequence Clusters (GPSC) define and name pneumococcal strains. GPSC designation is undertaken using the PopPUNK software and GPSC database as described in the file below, obtained from [here](https://www.pneumogen.net/gps/#/training#command-line). :file: [GPSC_README_PopPUNK2.txt](../../assets/files/GPSC_README_PopPUNK2.txt) @@ -1547,9 +1548,9 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | | Task | [task_poppunk_streppneumo.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/streptococcus/task_poppunk_streppneumo.wdl) | - | GPSC database | https://www.pneumogen.net/gps/training_command_line.html | + | GPSC database | | | Software Source Code | [PopPunk](https://github.com/bacpop/PopPUNK) | - | Software Documentation | https://poppunk.readthedocs.io/en/latest/ | + | Software Documentation | | | Original Publication(s) | [Fast and flexible bacterial genomic epidemiology with PopPUNK](https://genome.cshlp.org/content/29/2/304) | ??? 
task "`SeroBA`: Serotyping ==_for Illumina_PE only_==" diff --git a/docs/workflows/phylogenetic_construction/augur.md b/docs/workflows/phylogenetic_construction/augur.md index d8eb10f9f..c9d144997 100644 --- a/docs/workflows/phylogenetic_construction/augur.md +++ b/docs/workflows/phylogenetic_construction/augur.md @@ -14,10 +14,10 @@ Two workflows are offered: **Augur_Prep_PHB** and **Augur_PHB**. These must be r !!! dna "**Helpful resources for epidemiological interpretation**" - - [introduction to Nextstrain](https://www.cdc.gov/amd/training/covid-toolkit/module3-1.html) (which includes Auspice) - - guide to Nextstrain [interactive trees](https://www.cdc.gov/amd/training/covid-toolkit/module3-4.html) - - an [introduction to UShER](https://www.cdc.gov/amd/training/covid-toolkit/module3-3.html) - - a video about [how to read trees](https://www.cdc.gov/amd/training/covid-toolkit/module1-3.html) if this is new to you + - [introduction to Nextstrain](https://www.cdc.gov/advanced-molecular-detection/php/training/module-3-1.html) (which includes Auspice) + - guide to Nextstrain [interactive trees](https://www.cdc.gov/advanced-molecular-detection/php/training/module-3-4.html) + - an [introduction to UShER](https://www.cdc.gov/advanced-molecular-detection/php/training/module-3-3.html) + - a video about [how to read trees](https://www.cdc.gov/advanced-molecular-detection/php/training/module-1-3.html) if this is new to you - documentation on [how to identify SARS-CoV-2 recombinants](https://github.com/pha4ge/pipeline-resources/blob/main/docs/sc2-recombinants.md) ### Augur_Prep_PHB @@ -174,7 +174,7 @@ The Augur_PHB workflow takes in a ***set*** of SARS-CoV-2 (or any other viral This workflow runs on the set level. Please note that for every task, runtime parameters are modifiable (cpu, disk_size, docker, and memory); most of these values have been excluded from the table below for convenience. -
+
| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| @@ -198,7 +198,7 @@ This workflow runs on the set level. Please note that for every task, runtime pa | augur_ancestral | **inference** | String | Calculate joint or marginal maximum likelihood ancestral sequence states; options: "joint", "marginal" | joint | Optional | | augur_ancestral | **keep_ambiguous** | Boolean | If true, do not infer nucleotides at ambiguous (N) sites | FALSE | Optional | | augur_ancestral | **keep_overhangs** | Boolean | If true, do not infer nucleotides for gaps on either side of the alignment | FALSE | Optional | -| augur_export | **colors_tsv** | File | Custom color definitions, one per line in the format TRAIT_TYPE \| TRAIT_VALUE\tHEX_CODE | | Optional | +| augur_export | **colors_tsv** | File | Custom color definitions, one per line in TSV format with the following fields: TRAIT_TYPE TRAIT_VALUE HEX_CODE | | Optional | | augur_export | **description_md** | File | Markdown file with description of build and/or acknowledgements | | Optional | | augur_export | **include_root_sequence** | Boolean | Export an additional JSON containing the root sequence used to identify mutations | FALSE | Optional | | augur_export | **title** | String | Title to be displayed by Auspice | | Optional | diff --git a/docs/workflows/phylogenetic_construction/snippy_streamline.md b/docs/workflows/phylogenetic_construction/snippy_streamline.md index c794be4c8..aa04198b3 100644 --- a/docs/workflows/phylogenetic_construction/snippy_streamline.md +++ b/docs/workflows/phylogenetic_construction/snippy_streamline.md @@ -173,11 +173,7 @@ For all cases: `Snippy_Variants` aligns reads for each sample against the reference genome. As part of `Snippy_Streamline`, the only output from this workflow is the `snippy_variants_outdir_tarball` which is provided in the set-level data table.
Please see the full documentation for [Snippy_Variants](./snippy_variants.md) for more information. -??? task "snippy_variants (qc_metrics output)" - - ##### snippy_variants {#snippy_variants} - - This task runs Snippy to perform SNP analysis on individual samples. It extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics include the following columns: + This task also extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics include the following columns: - **samplename**: The name of the sample. - **reads_aligned_to_reference**: The number of reads that aligned to the reference genome. @@ -195,9 +191,17 @@ For all cases: - **meanbaseq**: Mean base quality over the reference sequence. - **meanmapq**: Mean mapping quality over the reference sequence. - These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`) in the downstream `snippy_tree_wf` workflow. The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. + These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`). The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. + + !!! tip "QC Metrics for Phylogenetic Analysis" + These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. 
Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses. - **Note:** The per-sample QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses. + !!! techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | ??? task "Snippy_Tree workflow" diff --git a/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md b/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md index 352d5a55c..890674b3f 100644 --- a/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md +++ b/docs/workflows/phylogenetic_construction/snippy_streamline_fasta.md @@ -39,11 +39,11 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a ### Workflow Tasks -??? task "snippy_variants (qc_metrics output)" +??? task "Snippy_Variants QC Metrics Concatenation (optional)" - ##### snippy_variants {#snippy_variants} + ##### Snippy_Variants QC Metric Concatenation (optional) {#snippy_variants} - This task runs Snippy to perform SNP analysis on individual samples. It extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`).
These per-sample QC metrics include the following columns: + Optionally, the user can provide the `snippy_variants_qc_metrics` file produced by the Snippy_Variants workflow as input to the workflow to concatenate the reports for each sample in the tree. These per-sample QC metrics include the following columns: - **samplename**: The name of the sample. - **reads_aligned_to_reference**: The number of reads that aligned to the reference genome. @@ -61,9 +61,17 @@ The `Snippy_Streamline_FASTA` workflow is an all-in-one approach to generating a - **meanbaseq**: Mean base quality over the reference sequence. - **meanmapq**: Mean mapping quality over the reference sequence. - These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`) in the downstream `snippy_tree_wf` workflow. The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. + The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. - **Note:** The per-sample QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses. + !!! tip "QC Metrics for Phylogenetic Analysis" + These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses, and we recommend examining them before proceeding with phylogenetic analysis if performing Snippy_Variants and Snippy_Tree separately. + + !!! 
techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | ### Inputs diff --git a/docs/workflows/phylogenetic_construction/snippy_tree.md b/docs/workflows/phylogenetic_construction/snippy_tree.md index d6c0a272b..d28160bbb 100644 --- a/docs/workflows/phylogenetic_construction/snippy_tree.md +++ b/docs/workflows/phylogenetic_construction/snippy_tree.md @@ -4,7 +4,7 @@ | **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** | |---|---|---|---|---| -| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.1.0 | Yes; some optional features incompatible | Set-level | +| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria) | PHB v2.3.0 | Yes; some optional features incompatible | Set-level | ## Snippy_Tree_PHB @@ -266,7 +266,7 @@ Sequencing data used in the Snippy_Tree workflow must: | | Links | | --- | --- | - | Task | [task_summarize_data.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/task_summarize_data.wdl) | + | Task | [task_summarize_data.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/data_handling/task_summarize_data.wdl) | ??? 
task "Concatenate Variants (optional)" @@ -310,11 +310,11 @@ Sequencing data used in the Snippy_Tree workflow must: | Task | task_shared_variants.wdl | | Software Source Code | [task_shared_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/utilities/task_shared_variants.wdl) | -??? task "snippy_variants (qc_metrics output)" +??? task "Snippy_Variants QC Metrics Concatenation (optional)" - ##### snippy_variants {#snippy_variants} + ##### Snippy_Variants QC Metric Concatenation (optional) {#snippy_variants} - This task runs Snippy to perform SNP analysis on individual samples. It extracts QC metrics from the Snippy output for each sample and saves them in per-sample TSV files (`snippy_variants_qc_metrics`). These per-sample QC metrics include the following columns: + Optionally, the user can provide the `snippy_variants_qc_metrics` file produced by the Snippy_Variants workflow as input to the workflow to concatenate the reports for each sample in the tree. These per-sample QC metrics include the following columns: - **samplename**: The name of the sample. - **reads_aligned_to_reference**: The number of reads that aligned to the reference genome. @@ -332,9 +332,17 @@ Sequencing data used in the Snippy_Tree workflow must: - **meanbaseq**: Mean base quality over the reference sequence. - **meanmapq**: Mean mapping quality over the reference sequence. - These per-sample QC metrics are then combined into a single file (`snippy_combined_qc_metrics`) in the downstream `snippy_tree_wf` workflow. The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. + The combined QC metrics file includes the same columns as above for all samples. Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. 
- **Note:** The per-sample QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses. + !!! tip "QC Metrics for Phylogenetic Analysis" + These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses, and we recommend examining them before proceeding with phylogenetic analysis if performing Snippy_Variants and Snippy_Tree separately. + + !!! techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | ### Outputs diff --git a/docs/workflows/phylogenetic_construction/snippy_variants.md b/docs/workflows/phylogenetic_construction/snippy_variants.md index 4ec73569a..f4fc65a37 100644 --- a/docs/workflows/phylogenetic_construction/snippy_variants.md +++ b/docs/workflows/phylogenetic_construction/snippy_variants.md @@ -4,7 +4,7 @@ | **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** | |---|---|---|---|---| -| [Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria), [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics), [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.2.0 | Yes | Sample-level | +| 
[Phylogenetic Construction](../../workflows_overview/workflows_type.md/#phylogenetic-construction) | [Bacteria](../../workflows_overview/workflows_kingdom.md/#bacteria), [Mycotics](../../workflows_overview/workflows_kingdom.md#mycotics), [Viral](../../workflows_overview/workflows_kingdom.md/#viral) | PHB v2.3.0 | Yes | Sample-level | ## Snippy_Variants_PHB @@ -60,14 +60,40 @@ The `Snippy_Variants` workflow aligns single-end or paired-end reads (in FASTQ f ### Workflow Tasks -`Snippy_Variants` uses the snippy tool to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the `grep` bash command to identify any mutations in specified genes or annotations of interest. The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the `snippy_results` column. - -Additionally, `Snippy_Variants` extracts quality control (QC) metrics from the Snippy output for each sample. These per-sample QC metrics are saved in TSV files (`snippy_variants_qc_metrics`). The QC metrics include: - -- **Percentage of reads aligned to the reference genome** (`snippy_variants_percent_reads_aligned`). -- **Percentage of the reference genome covered at or above the specified depth threshold** (`snippy_variants_percent_ref_coverage`). - -These per-sample QC metrics can be combined into a single file (`snippy_combined_qc_metrics`) in downstream workflows, such as `snippy_tree_wf`, providing an overview of QC metrics across all samples. +`Snippy_Variants` uses Snippy to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the `grep` bash command to identify any mutations in specified genes or annotations of interest. 
The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the `snippy_results` column. + +!!! info "Quality Control Metrics" + Additionally, `Snippy_Variants` extracts quality control (QC) metrics from the Snippy output for each sample. These per-sample QC metrics are saved in TSV files (`snippy_variants_qc_metrics`). The QC metrics include: + + - **samplename**: The name of the sample. + - **reads_aligned_to_reference**: The number of reads that aligned to the reference genome. + - **total_reads**: The total number of reads in the sample. + - **percent_reads_aligned**: The percentage of reads that aligned to the reference genome; also available in the `snippy_variants_percent_reads_aligned` output column. + - **variants_total**: The total number of variants detected between the sample and the reference genome. + - **percent_ref_coverage**: The percentage of the reference genome covered by reads with a depth greater than or equal to the `min_coverage` threshold (default is 10); also available in the `snippy_variants_percent_ref_coverage` output column. + - **#rname**: Reference sequence name (e.g., chromosome or contig name). + - **startpos**: Starting position of the reference sequence. + - **endpos**: Ending position of the reference sequence. + - **numreads**: Number of reads covering the reference sequence. + - **covbases**: Number of bases with coverage. + - **coverage**: Percentage of the reference sequence covered (depth ≥ 1). + - **meandepth**: Mean depth of coverage over the reference sequence. + - **meanbaseq**: Mean base quality over the reference sequence. + - **meanmapq**: Mean mapping quality over the reference sequence. + + Note that the last set of columns (`#rname` to `meanmapq`) may repeat for each chromosome or contig in the reference genome. + +!!! 
tip "QC Metrics for Phylogenetic Analysis" + These QC metrics provide valuable insights into the quality and coverage of your sequencing data relative to the reference genome. Monitoring these metrics can help identify samples with low coverage, poor alignment, or potential issues that may affect downstream analyses, and we recommend examining them before proceeding with phylogenetic analysis if performing Snippy_Variants and Snippy_Tree separately. + + These per-sample QC metrics can also be combined into a single file (`snippy_combined_qc_metrics`) in downstream workflows, such as `snippy_tree`, providing an overview of QC metrics across all samples. + +!!! techdetails "Snippy Variants Technical Details" + | | Links | + | --- | --- | + | Task | [task_snippy_variants.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_variants.wdl)
[task_snippy_gene_query.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/variant_detection/task_snippy_gene_query.wdl) | + | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | + | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | ### Outputs @@ -92,6 +118,7 @@ These per-sample QC metrics can be combined into a single file (`snippy_combined | snippy_variants_outdir_tarball | File | A compressed file containing the whole directory of snippy output files. This is used when running Snippy_Tree | | snippy_variants_percent_reads_aligned | Float | Percentage of reads aligned to the reference genome | | snippy_variants_percent_ref_coverage| Float | Proportion of the reference genome covered by reads with a depth greater than or equal to the `min_coverage` threshold (default is 10). | +| snippy_variants_qc_metrics | File | TSV file containing quality control metrics for the sample | | snippy_variants_query | String | Query strings specified by the user when running the workflow | | snippy_variants_query_check | String | Verification that query strings are found in the reference genome | | snippy_variants_results | File | CSV file detailing results for all mutations identified in the query sequence relative to the reference | @@ -99,4 +126,4 @@ These per-sample QC metrics can be combined into a single file (`snippy_combined | snippy_variants_version | String | Version of Snippy used | | snippy_variants_wf_version | String | Version of Snippy_Variants used | -
\ No newline at end of file +
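Reviewer note on the `snippy_variants_qc_metrics` file documented in the hunk above: the recommendation to examine these metrics before running Snippy_Tree could be supported by a small screening script. The following is a minimal sketch only. It assumes the TSV has a header row that uses the metric names listed in the docs (`samplename`, `percent_reads_aligned`, `percent_ref_coverage`), and the thresholds, the function name `flag_low_quality`, and the example values are hypothetical, not PHB recommendations or real results.

```python
import csv
import io

# Hypothetical thresholds for illustration only; not PHB recommendations.
MIN_PERCENT_READS_ALIGNED = 90.0
MIN_PERCENT_REF_COVERAGE = 80.0

# Made-up example content; the real file may carry additional per-contig
# columns (#rname ... meanmapq) that repeat for each contig.
EXAMPLE_TSV = (
    "samplename\treads_aligned_to_reference\ttotal_reads\t"
    "percent_reads_aligned\tvariants_total\tpercent_ref_coverage\n"
    "sample_a\t950\t1000\t95.0\t12\t98.5\n"
    "sample_b\t400\t1000\t40.0\t3\t55.2\n"
)


def flag_low_quality(tsv_text,
                     min_aligned=MIN_PERCENT_READS_ALIGNED,
                     min_coverage=MIN_PERCENT_REF_COVERAGE):
    """Return sample names whose alignment or coverage falls below a threshold."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # list.index() returns the first match, so repeated per-contig
    # column names later in the header do not interfere.
    name_i = header.index("samplename")
    aligned_i = header.index("percent_reads_aligned")
    coverage_i = header.index("percent_ref_coverage")

    flagged = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        aligned = float(row[aligned_i])
        coverage = float(row[coverage_i])
        if aligned < min_aligned or coverage < min_coverage:
            flagged.append(row[name_i])
    return flagged


if __name__ == "__main__":
    print(flag_low_quality(EXAMPLE_TSV))
```

Columns are looked up by position with `csv.reader` rather than `csv.DictReader` because the docs note that the `#rname` to `meanmapq` columns may repeat per contig, and duplicate header names would collide as dictionary keys.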
diff --git a/docs/workflows/standalone/ncbi_scrub.md b/docs/workflows/standalone/ncbi_scrub.md index 0ae60c49b..e82b3feea 100644 --- a/docs/workflows/standalone/ncbi_scrub.md +++ b/docs/workflows/standalone/ncbi_scrub.md @@ -66,7 +66,7 @@ This workflow is composed of two tasks, one to dehost the input reads and anothe | | Links | | --- | --- | - | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl) | | Software Source Code | [Kraken2 on GitHub](https://github.com/DerrickWood/kraken2/) | | Software Documentation | | | Original Publication(s) | [Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) |