Ingest 1000 soils data generated at JGI #632

Open
4 of 5 tasks
Tracked by #613
aclum opened this issue Mar 14, 2024 · 16 comments
@aclum
Contributor

aclum commented Mar 14, 2024

Deliverable this task is associated with

GSP 2024 deliverable

RACI

Tag people in their roles

Describe the task

  • Stage data for Gs0156736 (JGI proposal ID 508306). There are 16 projects. We need to stage the raw data, filtered results, and assembly. Annotation and binning will be redone to be consistent with running annotation 5.2 on the genewiz data.

Criteria for completion

JGI File Staging steps:

  • jgi_file_metadata.py
  • file_restoration.py
  • globus_file_transfer.py

Estimate people time

  • [Hours or days of people time. 1 person, 4 hours]

Completion Date (Goal)

  • March 27

Target Sprint Start & End Dates

  • Start: March 11
  • End: April 5

Tag Blocker/Contingent upon issues

  • [Tag issues]
@aclum
Contributor Author

aclum commented Mar 22, 2024

Actively in progress based on Slack messages; moving to the next sprint.

@mflynn-lanl
Contributor

Data files have been downloaded from JGI to Perlmutter and can be found at: /global/homes/n/nmdcda/m3408/aim2/dev/1000_soils/1000_soils_analysis_projects/

@mbthornton-lbl

@mflynn-lanl Thank you for doing this! Does this mean we can close it as done?

@mflynn-lanl
Contributor

mflynn-lanl commented Apr 4, 2024 via email

@mbthornton-lbl

Data staging work completed.

Moving to In Review pending successful completion of #613

@aclum
Contributor Author

aclum commented Apr 8, 2024

@mbthornton-lbl there is code in the nmdc_automation repo which should make the workflow execution activities and data objects for the JGI files. We need this for readsqc and assembly; annotation and binning will be run by NMDC so that they use the same version as the genewiz data.
I think the code is nmdc_automation/run_process/run_import.py
I believe this is the corresponding required config file:
https://github.com/microbiomedata/nmdc_automation/blob/main/configs/import.yaml

@aclum
Contributor Author

aclum commented Apr 9, 2024

1000 soils jgi gold mappings.xlsx
We have one missing omics processing set record (see #665), but otherwise I believe the first sheet of this Excel file is in the format that run_import.py needs.

@ssarrafan
Contributor

@aclum can I remove this from the GSP/ECR board?

@aclum
Contributor Author

aclum commented Apr 9, 2024

@ssarrafan yes.

@ssarrafan
Contributor

@mbthornton-lbl will be continuing to work on this in the next sprint per Slack message. Moving over.

@aclum
Contributor Author

aclum commented Apr 29, 2024

Moving to the next sprint since this sprint is focused on re-iding.

@aclum
Contributor Author

aclum commented May 15, 2024

Moving to the backlog until re-iding is done.

aclum self-assigned this Nov 2, 2024
@aclum
Contributor Author

aclum commented Nov 5, 2024

Was previously blocked on microbiomedata/nmdc_automation#274.
Current blocker: microbiomedata/nmdc_automation#280

@aclum
Contributor Author

aclum commented Dec 10, 2024

Blockers are resolved. I added a manifest TSV file to /global/homes/n/nmdcda/m3408/aim2/dev/1000_soils/1000_soils_analysis_projects/; this is ready for work.

@aclum
Contributor Author

aclum commented Dec 10, 2024

@aclum
Contributor Author

aclum commented Dec 13, 2024

This is in a mixed state: half of the projects were imported, and the other half had their data_generation_set records updated, but their workflow_execution_set and corresponding data_object_set records were not generated.
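
One way to pin down which projects fall into which half is a small audit over exported records. This is a rough sketch only, not code from nmdc_automation. It assumes the records are available as plain dicts, that each workflow execution carries a single `was_informed_by` string pointing at its data generation ID and a `has_output` list of data object IDs, and that every record has an `id`. The function name `find_incomplete_imports` and the sample IDs are hypothetical.

```python
from typing import Iterable


def find_incomplete_imports(
    data_generation_ids: Iterable[str],
    workflow_executions: Iterable[dict],
    data_objects: Iterable[dict],
) -> dict[str, list[str]]:
    """Group data generation IDs by what is missing for them.

    Assumed record shape (hypothetical, adjust to the real export):
      workflow execution: {"id", "was_informed_by": str, "has_output": [str]}
      data object:        {"id"}
    """
    known_object_ids = {obj["id"] for obj in data_objects}

    # Map each data generation ID to the workflow executions that reference it.
    executions_by_source: dict[str, list[dict]] = {}
    for execution in workflow_executions:
        source_id = execution["was_informed_by"]
        executions_by_source.setdefault(source_id, []).append(execution)

    report: dict[str, list[str]] = {
        "complete": [],
        "missing_workflow_executions": [],
        "missing_data_objects": [],
    }
    for dg_id in data_generation_ids:
        executions = executions_by_source.get(dg_id, [])
        if not executions:
            # Record exists in data_generation_set but nothing was generated for it.
            report["missing_workflow_executions"].append(dg_id)
            continue
        output_ids = [oid for ex in executions for oid in ex.get("has_output", [])]
        if all(oid in known_object_ids for oid in output_ids):
            report["complete"].append(dg_id)
        else:
            # Workflow executions exist but point at data objects that are absent.
            report["missing_data_objects"].append(dg_id)
    return report


if __name__ == "__main__":
    # Made-up IDs purely for illustration.
    executions = [
        {"id": "wf-1", "was_informed_by": "dg-1", "has_output": ["do-1"]},
        {"id": "wf-2", "was_informed_by": "dg-2", "has_output": ["do-2"]},
    ]
    objects = [{"id": "do-1"}]
    print(find_incomplete_imports(["dg-1", "dg-2", "dg-3"], executions, objects))
```

The IDs listed under `missing_workflow_executions` would be the candidates for re-running the import; whether the import can safely be re-run for only those projects is not something this sketch addresses.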
