Releases: theiagen/terra_utilities
v1.4.1
This patch release implements improvements to the Terra_2_NCBI workflow.
Terra_2_NCBI has been modified in the following ways:
- single-end reads are now supported
- different read1/read2 file columns can now be indicated using the new optional variables `read1_column_name` and `read2_column_name` (default columns used: `read1` and `read2`)
- samples that are excluded due to missing required metadata now report which fields they are missing (illustrated in the sketch after this list)
- a bug fix was implemented that prevents failure in the rare case where the original sample ID consists only of numbers
- several variables have been renamed to be clearer and more intuitive:
  - `biosample_type` is now `biosample_package`
  - `gcp_bucket_uri` is now `sra_transfer_gcp_bucket`
- MAJOR CHANGE: `path_on_ftp_server` is now a Boolean variable called `submit_to_production`, which defaults to `false`, meaning a Test submission will be performed. If set to `true`, a Production submission will occur.
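Because the missing-metadata report is the change most users will notice first, here is a minimal sketch of the behavior. It is illustrative only, not the workflow's actual code: the required-field list and helper name are hypothetical, and the example sample deliberately uses an all-digit ID like the edge case noted above.

```python
# Illustrative sketch only -- the required fields and helper name below are
# hypothetical and are not taken from the Terra_2_NCBI source.
from typing import Dict, List

# Hypothetical set of required BioSample metadata columns.
REQUIRED_FIELDS = ["organism", "collection_date", "geo_loc_name", "isolation_source"]

def find_missing_metadata(sample: Dict[str, str], required: List[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the names of required metadata fields that are empty or absent."""
    return [field for field in required if not str(sample.get(field, "")).strip()]

# A sample missing required fields is excluded, and the report names
# exactly which fields were missing.
sample = {"sample_id": "12345", "organism": "Salmonella enterica", "collection_date": ""}
missing = find_missing_metadata(sample)
if missing:
    print(f"Sample {sample['sample_id']} excluded; missing required metadata: {', '.join(missing)}")
```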
Other modifications
- In rare cases, SRA_Fetch would unintentionally overwrite the forward read file (the `*_R1` file) with singleton reads. This will no longer happen (see the sketch below).
- For the Concatenate_Column_Content and Zip_Column_Content workflows, call caching is now always off, even if the box is checked.
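To illustrate the idea behind the SRA_Fetch fix (the helper below is an assumption, not the workflow's actual code), the forward-read file is selected by its explicit `_1`/`_R1` naming rather than by taking whichever FASTQ appears first, so an unpaired/singleton file can never be captured as R1:

```python
# Illustrative helper (not from the workflow): files that do not match the
# forward-read naming pattern, such as singleton/unpaired reads, are never chosen.
import re
from typing import List, Optional

def pick_forward_read(fastq_files: List[str]) -> Optional[str]:
    """Pick the forward-read FASTQ by explicit _1/_R1 naming."""
    forward_pattern = re.compile(r"(_R?1)\.(fastq|fq)(\.gz)?$")
    for f in sorted(fastq_files):
        if forward_pattern.search(f):
            return f
    return None  # no forward-read file present

files = ["SRR000001.fastq.gz", "SRR000001_1.fastq.gz", "SRR000001_2.fastq.gz"]
print(pick_forward_read(files))  # -> SRR000001_1.fastq.gz, never the singleton file
```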
What's Changed
- Improvements to Terra_2_NCBI by @sage-wright in #33
- Check to avoid capture of singletons as forward reads by @kevinlibuit in #31
- call caching always off for concatenate_column_content and zip_column_content wfs & update terra_2_bq workflow by @kapsakcj in #29
- update version by @sage-wright in #34
Full Changelog: v1.4.0...v1.4.1
v1.4.0
This release introduces two new workflows: `SRA_Fetch` and `Terra_2_NCBI`.
New Workflows
SRA_Fetch
The SRA_Fetch workflow provides a seamless way to transfer SRA read files into your Terra workspace. Simply provide one or more SRA accession numbers as input, and the workflow will return the associated read files for each accession.
This workflow makes use of the fastq-dl tool by Robert Petit, which can download FASTQ files from either ENA (the European Nucleotide Archive) or SRA (the Sequence Read Archive hosted by NCBI).
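For users who want to try the underlying tool outside Terra, a minimal Python wrapper around fastq-dl might look like the sketch below. The flags shown follow fastq-dl's documented interface at the time of writing; verify them against `fastq-dl --help` for your installed version, and note that this is not the workflow's own task code.

```python
# Illustrative wrapper around fastq-dl, which SRA_Fetch uses under the hood.
# Flags (--accession, --provider, --outdir) are assumed from fastq-dl's docs;
# check `fastq-dl --help` for the version you have installed.
import subprocess

def fetch_reads(accession: str, provider: str = "ena", outdir: str = "fastqs") -> None:
    """Download FASTQ files for one SRA/ENA run accession with fastq-dl."""
    cmd = [
        "fastq-dl",
        "--accession", accession,   # e.g. an SRR/ERR/DRR run accession
        "--provider", provider,     # "ena" or "sra"
        "--outdir", outdir,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    fetch_reads("SRR1234567")  # replace with your run accession
```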
Terra_2_NCBI
The Terra_2_NCBI workflow is a programmatic submission method that shares sample metadata with NCBI BioSample and paired-end Illumina reads with NCBI SRA directly from Terra, without having to use the NCBI submission portal.
The Terra_2_NCBI workflow has several prerequisites, which are listed in detail on the documentation page. It is highly recommended to set up a meeting with Theiagen to understand in greater detail what is required.
Once those prerequisites have been met, this workflow enables swift and easy submission to NCBI’s BioSample and SRA databases.
Check out our new and detailed documentation for all workflows offered in the Terra Utilities repository. Please note some are still under construction but will be updated soon!
What's Changed
- Create Transfer Column Content workflow by @kevinlibuit in #17 (internal use)
- Add SRA_Fetch workflow by @kevinlibuit in #25
- the Terra_2_NCBI workflow by @sage-wright in #26
- Update version by @sage-wright in #30
New Contributors
- @sage-wright made their first contribution in #26
Full Changelog: v1.3.4...v1.4.0
v1.3.4
Patch to allow capture of Illumina read data stored in BaseSpace Projects
This patch addresses issues with transferring BaseSpace read data stored in BaseSpace Projects (as opposed to BaseSpace Runs) into a Terra-accessible GCP bucket with the `BaseSpace_Fetch` workflow. It also ensures that BaseSpace-hosted read data associated with discordant Sample_ID and Sample_Name values (as per the Run/Project sample sheet) can be properly verified and transferred into a user's Terra workspace.
With this release, `BaseSpace_Fetch` input variables were updated for clarity:
- `basespace_run_name` was updated to `basespace_collection_id`, as it now accepts either a BaseSpace Run or BaseSpace Project identifier
- `dataset_name` was updated to `basespace_sample_name` to more closely align with the BaseSpace sample sheet column header
- an optional `basespace_sample_id` input variable was added for cases where the sample sheet Sample_ID and Sample_Name are discordant
Other modifications:
- Updated tasks & workflows to use Docker images hosted on Quay
v1.3.3
Patch to ensure only read files associated with the input BaseSpace run name are fetched
This patch integrates a safeguard against the incidental concatenation of read data from multiple samples hosted on BaseSpace: the BaseSpace_Fetch workflow is now written to fail if the user-defined `basespace_run_name` is not found using the `bs` CLI tool.
v1.3.2
Patch to address inaccurate outputs generated by the Zip_Column_Content workflow
Analysis date and versioning outputs for the Zip_Column_Content workflow were previously tagged as `bam_to_fastq` outputs; these outputs are now tagged with a `zip_column_content` prefix.
v1.3.1
This release addresses the major problems reported regarding `BaseSpace_Fetch` timeout errors
This patch also enables fetching of BaseSpace read data that share the same `dataset_name` (e.g. re-sequenced samples) and removes the need for the `BaseSpace_Fetch_Multilane` workflow, as BaseSpace_Fetch will now fetch and concatenate read data that have been split across multiple lanes. Implementing this solution requires a `basespace_run_name` input for the BaseSpace_Fetch workflow.
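For context, the multilane handling boils down to concatenating each sample's per-lane FASTQs in lane order. The sketch below is a minimal illustration under the assumption of standard Illumina `_L00x_` file naming; the helper name and glob pattern are not taken from the workflow itself.

```python
# Illustrative sketch of per-sample lane concatenation (not the workflow's code).
# Assumes Illumina's standard naming, e.g. sample1_S1_L001_R1_001.fastq.gz.
import glob
import shutil
from typing import Optional

def concatenate_lanes(sample: str, read: str = "R1", outfile: Optional[str] = None) -> str:
    """Concatenate all lane files for one sample/read into a single FASTQ.gz.

    gzip members can simply be concatenated, so no decompression is needed.
    """
    lane_files = sorted(glob.glob(f"{sample}_*_L0*_{read}_001.fastq.gz"))
    outfile = outfile or f"{sample}_{read}.fastq.gz"
    with open(outfile, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as fh:
                shutil.copyfileobj(fh, out)
    return outfile

# e.g. concatenate_lanes("sample1", "R1") joins L001-L004 into sample1_R1.fastq.gz
```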
Other modifications made:
- Capture of Terra Utilities version number and analysis date with every workflow
- Addressed white space inconsistencies throughout the repository
v1.3.0
Minor Update:
- BaseSpace_Fetch & BaseSpace_Fetch_MultiLane release
v1.2.0
Minor Update:
- Read import workflows (SE/PE) release
- BAM to FASTQ workflows (SE/PE) release
v1.0.1
Bug Fix:
- Fix issue with Dockstore linking
v1.0
First stable release with Concatenate Column Content & Zip Column Content workflows