04 Call Digest #4

Open
jules32 opened this issue May 16, 2024 · 0 comments
jules32 commented May 16, 2024

Hi All,

Thanks to NASA Openscapes Mentors Bri Lind and Mahsa Jami from LP DAAC, and Cassie Nickles from PO.DAAC, for teaching this week! It was great to hear status updates on the Cloud from Justin Rice, Deputy Project Manager/Data Systems for the ESDIS Project Office at NASA Goddard Space Flight Center. Together we covered open communities and coding strategies to leverage the power of the cloud through parallelization. Below is a light digest of Call 4.

Have a great week!
Julie, Erin, Andy, Liz, and the very awesome NASA Openscapes Mentors 🚀

Digest: Cohort Call 04 [ 2024-nasa-champions ]
Openscapes_CohortCalls [ 2024-nasa-champions ] Google folder - contains agendas, recordings, pathways
https://openscapes.github.io/2024-nasa-champions - cohort webpage

Goals: We’ll discuss open communities and coding strategies for future us in the Cloud. Additionally, NASA Earthdata Cloud Update - Special Guest Justin Rice, NASA Goddard Space Flight Center, ESDIS Project Office, Deputy Project Manager/Data Systems

Task: Have a Seaside Chat and prepare your Pathways presentation (more details in our agenda)

  • Prepare to present your Pathway work-in-progress on our final call. Each group has 5 minutes to share their pathway (3 min presentation + 2 min questions).
  • Coworking (optional): May 23, 10-11:30 PT
    • Come work on your Pathways, run code in the JupyterHub together, or ask questions.
    • A chance to work on your own things socially and to ask questions or screenshare. We share what we’re going to work on, work quietly, and then check in at the end. We also make breakout rooms for Q&A if folks want to screenshare and talk things out.

Slide Decks:

  • Open communities (slides)
  • NASA Earthdata Cloud Cookbook (cookbook)
  • Earth Science Data & Information System (ESDIS) Update (slides)
  • Coding strategies for Future Us (slides)

More open communities!!! 🥰

A few lines from shared notes in the Agenda doc:

  • Folks aren’t connecting on Twitter anymore, where are folks these days?
  • Every community I’m a part of is struggling with this! One article making the rounds is https://joanwestenberg.com/blog/breaking-up-with-slack-and-discord-why-its-time-to-bring-back-forums
  • Hard to have time for open communities when so much is going on, but really fantastic to have a space to troubleshoot and get different perspectives
  • "pleasingly parallel" - tasks that are completely independent from each other. For example, to validate whether each value in a dataset is within a threshold.
  • A lot of parallel computing presupposes access to cloud computing or HPC resources, which cost money. It would be nice to have some talk about using the multiple cores within a laptop
    • +1
  • This has all been very helpful. Not only are the tools and training great, but NMFS is embarking on large programs (e.g., CEFI) that could really use these approaches. And, given that we’re about done, how about we make these meetings sort of permanent events?! Everybody in?!
  • How do you get started knowing what kind of computing resources you actually need? Many of my datasets are not that large, so I typically operate with the mantra that it does not matter how I do stuff because it just works. But this will break down at some point, and I would like to know how to determine the resources that I am using.
    • In my experience, running the workflow for a few sample files and tracking the memory allocation and memory time series from the dashboard could give you an idea
  • Having Justin emphasize subsetting / OPeNDAP in the cloud right before Mahsa explained chunking and parallelization … I think it’s all coming together for me now. Subsetting isn’t just important when you only need a small slice of the data; subsetting when reading directly from S3 is critical for chunks to independently load data, even when you’re going to process the whole array. Is that right?
    • From Luis Lopez: With this small edit, the sentence is totally correct:
    • Having Justin emphasize subsetting / OPeNDAP in the cloud right before Mahsa explained chunking and parallelization … I think it’s all coming together for me now. Chunking isn’t just important when you only need a small slice of the data; chunking when reading directly from S3 is critical for chunks to independently load data, even when you’re going to process the whole array.
  • How do you rethink your code so it can be parallel?
    • Inspect your for-loops or nested for-loops: what is the goal, and how could you identify the parts that are
  • Resources shared from NOAA Enterprise Data Management Workshop
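
A few of the notes above (the "pleasingly parallel" threshold check, using the multiple cores within a laptop, and rethinking for-loops) can be tied together in a small sketch. This is only an illustration under assumptions, not code from the call: the function names (`split_into_chunks`, `count_out_of_range`, `validate`), the made-up readings, and the thresholds are all hypothetical. It uses only Python's standard library, so it should run on a laptop without cloud or HPC access.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, non-overlapping pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def count_out_of_range(chunk, low, high):
    """Count values outside [low, high]. Each chunk is checked on its own,
    with no information needed from any other chunk."""
    return sum(1 for v in chunk if not (low <= v <= high))


def validate(values, low, high, executor: Executor, n_chunks=4):
    """Replace one long for-loop with independent per-chunk tasks.

    The executor is passed in, so the same function works with a process
    pool (separate cores) or a thread pool.
    """
    chunks = split_into_chunks(values, n_chunks)
    futures = [executor.submit(count_out_of_range, c, low, high) for c in chunks]
    # Combining the results is the only step that touches every chunk.
    return sum(f.result() for f in futures)


if __name__ == "__main__":
    # Hypothetical data: 200,000 made-up readings between 0 and 49.
    readings = [float(i % 50) for i in range(200_000)]
    # ProcessPoolExecutor defaults to one worker per available core.
    with ProcessPoolExecutor() as pool:
        bad = validate(readings, 0.0, 40.0, pool, n_chunks=8)
    print(f"{bad} values fall outside the threshold")
```

The per-chunk work never depends on another chunk, which is what makes the task pleasingly parallel. For a dataset this small, the cost of starting worker processes may outweigh the gain, so it is worth timing a plain loop first.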