[Dorado] New Dorado Basecalling Workflow Terra #659
Outputs working and documentation updated; see https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/CDPH_Bioinformatics_Development/job_history/79417a5e-da8c-4fdc-aa61-f28de8490bba
…e used at runtime; improved logging of dorado STDERR to a file; parsed explicit model name from STDERR file or accept user input string; added dorado_log task output file
I will test 3 different workflows and report back:
EDIT: all of these wfs were run AFTER making the below commit.
TheiaProk_ONT ran successfully on the FASTQs produced by my test above with SUP dorado model 👍 https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/CDPH_Bioinformatics_Development/job_history/238d0f1f-fe13-4823-8846-b0774fb75e0c More confirmation that the FASTQs produced by this wf are valid for downstream processing.
…ections. Added in-line code and re-ordered sections within Model Type Selection. Also added helpful info for determining Terra workspace bucket GSURI
Added a new task that lets the user upload pod5 files with the Data Uploader in Terra and provide the link to the Google bucket as an input to the workflow. The new task combines the file paths into an array and passes it to the basecalling task. Documentation has been updated with visuals showing how to use the Data Uploader and copy the bucket link into the workflow input.
Tested with a small number of pod5 files.
Tested with a large number of pod5 files.
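For illustration, the path-gathering step described above could look roughly like the following. This is a minimal Python sketch, not the task added in this PR (which is a WDL task and is not shown here); the function name `collect_pod5_paths` and the example URIs are hypothetical, and the bucket listing is assumed to have been obtained separately.

```python
from typing import Iterable, List


def collect_pod5_paths(bucket_uri: str, object_uris: Iterable[str]) -> List[str]:
    """Return the sorted object URIs under bucket_uri that end in .pod5.

    bucket_uri is the folder the user uploaded to (hypothetical example:
    gs://my-bucket/uploads). object_uris is a listing of objects that was
    produced elsewhere; this function only filters it.
    """
    # Normalise the prefix so "gs://b/up" does not also match "gs://b/upload2/".
    prefix = bucket_uri.rstrip("/") + "/"
    return sorted(
        uri
        for uri in object_uris
        if uri.startswith(prefix) and uri.lower().endswith(".pod5")
    )


if __name__ == "__main__":
    listing = [
        "gs://b/up/a.pod5",
        "gs://b/up/notes.txt",
        "gs://b/other/c.pod5",
    ]
    print(collect_pod5_paths("gs://b/up", listing))
```

The resulting list plays the role of the array handed to the basecalling task; sorting keeps the order deterministic between runs.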
I appreciate your fortitude in making even further changes to this workflow. These changes will seriously simplify the setup process for the end user and save everyone lots of time. The doc updates look great, the screenshots and section on uploading POD5 files and getting the input GSURI for where the files were uploaded look great. Straightforward and easy to understand & follow (in my opinion). I'm launching a test here in Terra, but assuming it won't finish before I go on PTO for the holidays. Given your recent tests & my previous tests, I'm pretty confident it will run successfully. Please check the logs & outputs in my absence https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/CDPH_Bioinformatics_Development/job_history/1c0618d9-560c-4741-a816-2c8bd7b89f15
Don't wait for me if you or the dev team want to merge this PR before I'm back in office.
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
A new Dorado Basecalling Workflow, a GPU-accelerated pipeline for basecalling Oxford Nanopore POD5 files. The workflow includes optional automatic model selection, SAM-to-BAM conversion, and demultiplexing into unique barcode fastq files, with outputs uploaded to a new user-defined Terra table for further downstream analysis.
⚡ Impacted Workflows/Tasks
This is a new workflow that does not impact any other workflows
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
This PR introduces the following changes:
- … use_auto_model flag for automatic model selection.
- … (sup, hac, fast).
⚙️ Algorithm
- … POD5 files to SAM using GPU acceleration. Uses a new Dorado StaPH-B Docker image, v0.8.0: https://github.com/StaPH-B/docker-builds/tree/master/dorado/0.8.0
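The three stages named in the summary (basecalling to SAM, SAM-to-BAM conversion, demultiplexing to per-barcode FASTQ) can be sketched as command lines. This is a hedged Python illustration that only builds argument lists; it is not the task code from this PR, and the function name, default file names, and the example kit are assumptions. The flags shown (`--emit-sam`, `--kit-name`, `--output-dir`, `--emit-fastq`, and `samtools view -b -o`) are standard Dorado and samtools options, but the exact commands used by the workflow may differ.

```python
from typing import List


def build_dorado_commands(
    model: str,
    pod5_dir: str,
    kit_name: str,
    sam_path: str = "basecalled.sam",   # hypothetical file name
    bam_path: str = "basecalled.bam",   # hypothetical file name
    demux_dir: str = "demux",           # hypothetical output folder
) -> List[List[str]]:
    """Build argv lists for basecalling, SAM-to-BAM conversion, and demux.

    model may be a speed keyword (sup, hac, fast) or a path to a model
    directory. Nothing is executed here.
    """
    # Step 1: basecall POD5 to SAM. Dorado writes to stdout, so the caller
    # must redirect stdout to sam_path when running this command.
    basecall = ["dorado", "basecaller", model, pod5_dir, "--emit-sam"]
    # Step 2: convert the SAM file to BAM.
    to_bam = ["samtools", "view", "-b", "-o", bam_path, sam_path]
    # Step 3: split reads by barcode and write one FASTQ per barcode.
    demux = [
        "dorado", "demux",
        "--kit-name", kit_name,
        "--output-dir", demux_dir,
        "--emit-fastq",
        bam_path,
    ]
    return [basecall, to_bam, demux]


if __name__ == "__main__":
    for argv in build_dorado_commands("sup", "pod5s", "SQK-NBD114-24"):
        print(" ".join(argv))
```

Running the commands requires dorado, samtools, and a GPU-enabled environment such as the Docker image linked above; the sketch only shows the order of the steps and how the outputs of one feed the next.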
➡️ Inputs
- … (sup, hac, fast).
⬅️ Outputs
🧪 Testing
- … POD5 inputs and GPU resources.
Test 1. 9 rabies pod5 files from 2 barcodes (manual model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/889322c2-19f0-4092-ac7f-4863e676b28a
Test 2. 24 pod5 files from 2 barcodes (manual model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/9bef28ea-82ba-4406-8545-f32de7e07e02
Test 3. 24 pod5 files from 2 barcodes (auto mode)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/cead789e-c737-4541-a6ed-d9b907493ee1
Output Terra table example
Suggested Scenarios for Reviewer to Test
- … use_auto_model flag enabled.
- … dorado_model path and confirm outputs.
- … (kit_name) to confirm error handling.
🔬 Final Developer Checklist
🎯 Reviewer Checklist