189 metaassembly output update v1.0.7 #298

Draft
wants to merge 3 commits into
base: main
10 changes: 10 additions & 0 deletions configs/import.yaml
@@ -85,6 +85,7 @@ Workflows:
- Assembly Coverage Stats
- Assembly AGP
- Assembly Coverage BAM
- Error Corrected Reads

- Name: Metagenome Annotation
Import: false
@@ -479,6 +480,15 @@ Data Objects:
output_of: nmdc:MetagenomeAssembly
mulitple: false
action: rename
- data_object_type: Error Corrected Reads
description: Error corrected reads for {id}
name: bbcms error corrected reads
import_suffix: input.corr.fastq.gz
nmdc_suffix: _input.corr.fastq.gz
input_to: []
output_of: nmdc:MetagenomeAssembly
mulitple: false
action: rename
- data_object_type: GOTTCHA2 Report Full
description: GOTTCHA2 Full Report for {id}
name: GOTTCHA2 report file
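
For review purposes, here is a minimal sketch of how a `rename` entry such as the new `Error Corrected Reads` one could be applied to a file name. The `DataObjectRule` class and the `nmdc_file_name` function are hypothetical and are not taken from the repository. Building the new name as a caller-supplied prefix plus `nmdc_suffix` is an assumption about what `action: rename` means, not a description of the importer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataObjectRule:
    """Hypothetical container for a few fields of a Data Objects entry."""
    data_object_type: str
    import_suffix: str
    nmdc_suffix: str
    action: str


def nmdc_file_name(source_name: str, prefix: str, rule: DataObjectRule) -> Optional[str]:
    """Return the renamed file name, or None if the rule does not apply.

    Assumption: 'rename' replaces everything before the suffix with a
    caller-supplied prefix and appends nmdc_suffix.
    """
    if rule.action != "rename":
        return None
    if not source_name.endswith(rule.import_suffix):
        return None
    return f"{prefix}{rule.nmdc_suffix}"


# Field values copied from the entry added in this diff.
rule = DataObjectRule(
    data_object_type="Error Corrected Reads",
    import_suffix="input.corr.fastq.gz",
    nmdc_suffix="_input.corr.fastq.gz",
    action="rename",
)

# "sample1" and "example_id" are made-up values for illustration.
renamed = nmdc_file_name("sample1.input.corr.fastq.gz", "example_id", rule)
print(renamed)
```

A file whose name does not end in `import_suffix` yields `None` in this sketch, which keeps the matching step separate from the renaming step.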
20 changes: 12 additions & 8 deletions nmdc_automation/config/workflows/workflows.yaml
@@ -103,9 +103,9 @@ Workflows:
- Reads QC Interleave
Input_prefix: jgi_metaASM
Inputs:
input_file: do:Filtered Sequencing Reads
rename_contig_prefix: "{workflow_execution_id}"
input_files: do:Filtered Sequencing Reads
proj: "{workflow_execution_id}"
shortRead: false
Workflow Execution:
name: "Metagenome Assembly for {id}"
type: nmdc:MetagenomeAssembly
@@ -135,30 +135,34 @@ Workflows:
scaf_powsum: "{outputs.stats.scaf_powsum}"
scaffolds: "{outputs.stats.scaffolds}"
Outputs:
- output: contig
- output: sr_contig
name: Final assembly contigs fasta
data_object_type: Assembly Contigs
description: "Assembly contigs for {id}"
- output: scaffold
- output: sr_scaffold
name: Final assembly scaffolds fasta
data_object_type: Assembly Scaffolds
description: "Assembly scaffolds for {id}"
- output: covstats
- output: sr_covstats
name: Assembled contigs coverage information
data_object_type: Assembly Coverage Stats
description: "Coverage Stats for {id}"
- output: agp
- output: sr_agp
name: An AGP format file that describes the assembly
data_object_type: Assembly AGP
description: "AGP for {id}"
- output: bam
- output: sr_bam
name: Sorted bam file of reads mapping back to the final assembly
data_object_type: Assembly Coverage BAM
description: "Sorted Bam for {id}"
- output: asminfo
- output: sr_asminfo
name: File containing assembly info
data_object_type: Assembly Info File
description: "Assembly info for {id}"
- output: sr_bbcms_fq
name: bbcms error corrected reads
data_object_type: Error Corrected Reads
description: "Error corrected reads for {id}"

- Name: Metagenome Annotation
Type: nmdc:MetagenomeAnnotation
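
The `Inputs` and `Outputs` entries above mix `{placeholder}` templates with plain values such as `shortRead: false`. The sketch below shows one way such entries could be resolved, using `str.format`. The `render` and `describe_outputs` helpers, the `job_outputs` mapping, and the example paths and ids are all hypothetical; the repository may resolve templates differently.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def render(value: Any, **context: Any) -> Any:
    """Fill {placeholders} in strings; return non-strings (e.g. booleans) unchanged."""
    if isinstance(value, str):
        return value.format(**context)
    return value


def describe_outputs(output_specs: List[Dict[str, str]],
                     job_outputs: Dict[str, str],
                     wf_id: str) -> List[Dict[str, str]]:
    """Pair each Outputs entry with the file the job produced for it."""
    records = []
    for spec in output_specs:
        path = job_outputs.get(spec["output"])
        if path is None:
            continue  # Assumption: a missing output is skipped, not fatal.
        records.append({
            "name": spec["name"],
            "data_object_type": spec["data_object_type"],
            "description": render(spec["description"], id=wf_id),
            "path": path,
        })
    return records


# Inputs as they appear after this change.
inputs = {
    "input_files": "do:Filtered Sequencing Reads",
    "proj": "{workflow_execution_id}",
    "shortRead": False,
}
resolved = {k: render(v, workflow_execution_id="example_id") for k, v in inputs.items()}

# Dotted placeholders like {outputs.stats.scaffolds} work with attribute access.
stats = SimpleNamespace(stats=SimpleNamespace(scaffolds=42))
scaffolds = render("{outputs.stats.scaffolds}", outputs=stats)

# The new output entry from this diff, with a made-up job output path.
specs = [{
    "output": "sr_bbcms_fq",
    "name": "bbcms error corrected reads",
    "data_object_type": "Error Corrected Reads",
    "description": "Error corrected reads for {id}",
}]
records = describe_outputs(specs, {"sr_bbcms_fq": "/tmp/input.corr.fastq.gz"}, "example_id")
```

Note that `render` returns the boolean untouched, so `shortRead` stays a real `False` rather than becoming the string `"False"`.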
9 changes: 6 additions & 3 deletions nmdc_automation/models/workflow.py
@@ -101,13 +101,16 @@ class WorkflowConfig:
# populated after initialization
children: Set["WorkflowConfig"] = field(default_factory=set)
parents: Set["WorkflowConfig"] = field(default_factory=set)
data_object_types: List[str] = field(default_factory=list)
input_data_object_types: List[str] = field(default_factory=list)

def __post_init__(self):
""" Initialize the object """
""" Parse input data object types from the inputs """
for _, inp_param in self.inputs.items():
# Some input params are boolean values, skip these
if isinstance(inp_param, bool):
continue
Contributor Author

@mbthornton-lbl this boolean needs to be pushed to the inp_param, not skipped. See https://github.com/microbiomedata/metaAssembly/blob/master/input.json

if inp_param.startswith("do:"):
self.data_object_types.append(inp_param[3:])
self.input_data_object_types.append(inp_param[3:])
if not self.type:
# Infer the type from the name
if self.collection == 'data_generation_set' and 'Sequencing' in self.name:
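
A stripped-down sketch of the parsing step in this hunk may help when weighing the review comment above. `WorkflowConfigSketch` is a hypothetical stand-in with only the fields needed here, not the real `WorkflowConfig`. As a small variation on the diff, it checks for strings instead of excluding booleans, which also tolerates other non-string values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

DO_PREFIX = "do:"


@dataclass
class WorkflowConfigSketch:
    """Hypothetical, reduced stand-in for WorkflowConfig."""
    name: str
    inputs: Dict[str, Any] = field(default_factory=dict)
    # Populated in __post_init__ from the "do:" input values.
    input_data_object_types: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        for value in self.inputs.values():
            # Only strings can carry the "do:" prefix; booleans such as
            # shortRead are ignored here but remain in self.inputs.
            if isinstance(value, str) and value.startswith(DO_PREFIX):
                self.input_data_object_types.append(value[len(DO_PREFIX):])


wf = WorkflowConfigSketch(
    name="Metagenome Assembly",
    inputs={
        "input_files": "do:Filtered Sequencing Reads",
        "proj": "{workflow_execution_id}",
        "shortRead": False,
    },
)
print(wf.input_data_object_types)
```

In this sketch, skipping the boolean only affects type collection. The value is still present in `wf.inputs`, so a later step can forward it to the workflow.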
6 changes: 5 additions & 1 deletion nmdc_automation/workflow_automation/sched.py
@@ -129,7 +129,11 @@ def create_job_rec(self, job: SchedulerJob):
inp = dict()
optional_inputs = wf.optional_inputs
for k, v in job.workflow.inputs.items():
if v.startswith("do:"):
# some inputs are booleans and should not be modified
if isinstance(v, bool):
inp[k] = v
continue
elif v.startswith("do:"):
do_type = v[3:]
dobj = do_by_type.get(do_type)
if not dobj:
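
The loop in this hunk can be sketched as a standalone function to show the boolean pass-through in isolation. `build_job_inputs` is a hypothetical name. The part after `if not dobj:` is not visible in the diff, so the handling of missing data objects (skip if optional, otherwise raise) and the use of a `"url"` key are assumptions made only for illustration.

```python
from typing import Any, Dict, Iterable


def build_job_inputs(inputs: Dict[str, Any],
                     do_by_type: Dict[str, Dict[str, str]],
                     optional_inputs: Iterable[str] = ()) -> Dict[str, Any]:
    """Resolve "do:" references and pass every other value through."""
    optional = set(optional_inputs)
    resolved: Dict[str, Any] = {}
    for key, value in inputs.items():
        # Booleans (and any other non-string) are forwarded unmodified.
        if not isinstance(value, str):
            resolved[key] = value
            continue
        if value.startswith("do:"):
            do_type = value[3:]
            dobj = do_by_type.get(do_type)
            if not dobj:
                # Assumption: optional inputs may be absent, others may not.
                if key in optional:
                    continue
                raise ValueError(f"No data object of type {do_type!r} for input {key!r}")
            resolved[key] = dobj["url"]  # "url" is a hypothetical field name
        else:
            resolved[key] = value
    return resolved


# Made-up data object record for illustration.
do_by_type = {"Filtered Sequencing Reads": {"url": "https://example.org/reads.fastq.gz"}}
job_inputs = build_job_inputs(
    {"input_files": "do:Filtered Sequencing Reads", "proj": "example_id", "shortRead": False},
    do_by_type,
)
print(job_inputs)
```

With this shape, `shortRead` reaches the job record as a boolean, which matches the intent stated in the review comment on `workflow.py`.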
2 changes: 1 addition & 1 deletion nmdc_automation/workflow_automation/workflow_process.py
@@ -23,7 +23,7 @@ def get_required_data_objects_map(db, workflows: List[WorkflowConfig]) -> Dict[s
# Build up a filter of what types are used
required_types = set()
for wf in workflows:
required_types.update(set(wf.data_object_types))
required_types.update(set(wf.input_data_object_types))

required_data_objs_by_id = dict()
for rec in db.data_object_set.find({"data_object_type": {"$ne": None}}):
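
To illustrate the effect of the rename to `input_data_object_types`, here is a small sketch that collects the required types and filters records with them. It uses plain dictionaries in place of the MongoDB collection, and `required_input_types` and `select_required` are hypothetical helpers. What the real loop does with each record is not shown in the diff, so the filtering rule below is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Set


def required_input_types(workflows: Iterable[Any]) -> Set[str]:
    """Union of the data object types that any workflow takes as input."""
    required: Set[str] = set()
    for wf in workflows:
        required.update(wf.input_data_object_types)
    return required


def select_required(records: List[Dict[str, Any]],
                    required_types: Set[str]) -> Dict[str, Dict[str, Any]]:
    """Keep records whose type is an input to at least one workflow, keyed by id."""
    return {
        rec["id"]: rec
        for rec in records
        if rec.get("data_object_type") in required_types
    }


# Stand-ins for WorkflowConfig objects; only the attribute used here is set.
workflows = [
    SimpleNamespace(input_data_object_types=["Filtered Sequencing Reads"]),
    SimpleNamespace(input_data_object_types=["Assembly Contigs", "Filtered Sequencing Reads"]),
]

# Made-up records for illustration.
records = [
    {"id": "do-1", "data_object_type": "Filtered Sequencing Reads"},
    {"id": "do-2", "data_object_type": "Error Corrected Reads"},
    {"id": "do-3", "data_object_type": None},
]

required = required_input_types(workflows)
selected = select_required(records, required)
print(sorted(selected))
```

Under these assumptions, an output-only type such as `Error Corrected Reads` (declared with `input_to: []` in `configs/import.yaml`) is never selected, because no workflow lists it as an input.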