Extract-replicate our AWS cluster creation workflow into the mybinder.org-deploy repo #1824

Closed
Tracked by #919
damianavila opened this issue Oct 27, 2022 · 14 comments

@damianavila
Contributor

damianavila commented Oct 27, 2022

Ref: #919

Upstream issue where the discussion is going to happen: jupyterhub/team-compass#501.

@sgibson91, feel free to add any further details on this one.

@damianavila damianavila moved this to Todo 👍 in Sprint Board Oct 27, 2022
@sgibson91 sgibson91 changed the title Extract-replicate our binder workflow into the my binder-org repo Extract-replicate our AWS cluster creation workflow into the my binder-org repo Oct 27, 2022
@sgibson91
Member

The repo already has a workflow for deploying Binder - we need to extract our workflow for creating a cluster in AWS, i.e., eksctl and terraform.

@sgibson91 sgibson91 changed the title Extract-replicate our AWS cluster creation workflow into the my binder-org repo Extract-replicate our AWS cluster creation workflow into the mybinder.org-deploy repo Oct 27, 2022
@damianavila
Contributor Author

> we need to extract our workflow for creating a cluster in AWS, i.e., eksctl and terraform.

@sgibson91, it is not clear to me if you intend to work on this one in this sprint or if you were just talking about referencing the upstream issue and having the discussion over there. Let me know what you think and I will update the sprint board accordingly. Thanks!

@sgibson91
Member

I can't really work on this one until #1823 is done, so maybe in the next sprint.

@damianavila
Contributor Author

@yuvipanda will reply with more context on this one.

@yuvipanda
Member

The primary reason we use eksctl is:

  1. EKS managed nodegroups (similar to GCP node pools) cannot scale to 0; see aws/containers-roadmap#724 ("[EKS] [request]: Managed Nodes scale to 0").
  2. Managed nodegroups are available via terraform (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group), but unmanaged nodegroups are only available via a flaky terraform module that manipulates autoscaling groups, etc. directly.
  3. Since we provision many different notebook nodegroups and many different dask ones, it would cost a lot of money if each of them had to keep at least 1 node up all the time!
  4. However, for mybinder.org we'll only have one nodegroup, and it's OK if it scales down to 1 rather than 0. I think it's fine to just use the available eks_node_group resource for mybinder.
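
To make the cost argument in points 3 and 4 concrete, here is a rough back-of-the-envelope sketch. The hourly prices, nodegroup names, and counts are made-up assumptions for illustration only, not real AWS prices or our actual configuration; the point is just how the idle cost grows with the number of nodegroups that cannot scale below 1.

```python
from dataclasses import dataclass

# Approximate number of hours in a month (assumption for the estimate).
HOURS_PER_MONTH = 730


@dataclass
class NodeGroup:
    name: str                # hypothetical nodegroup name
    hourly_price_usd: float  # hypothetical on-demand price per node
    min_size: int            # smallest size the group can scale down to


def idle_monthly_cost(groups):
    """Monthly cost of the nodes that stay up even with no users."""
    return sum(g.min_size * g.hourly_price_usd * HOURS_PER_MONTH for g in groups)


def with_min_size(groups, min_size):
    """Return copies of the groups with a different minimum size."""
    return [NodeGroup(g.name, g.hourly_price_usd, min_size) for g in groups]


# A hub-style cluster: several notebook and dask nodegroups (made-up numbers).
hub_groups = (
    [NodeGroup(f"notebook-{i}", 0.25, 1) for i in range(5)]
    + [NodeGroup(f"dask-{i}", 0.50, 1) for i in range(5)]
)

# A mybinder.org-style cluster: a single nodegroup that may idle at 1 node.
binder_groups = [NodeGroup("binder", 0.25, 1)]

hub_idle_min_1 = idle_monthly_cost(hub_groups)
hub_idle_min_0 = idle_monthly_cost(with_min_size(hub_groups, 0))
binder_idle_min_1 = idle_monthly_cost(binder_groups)

print(f"hub, min size 1:    ${hub_idle_min_1:.2f}/month idle")
print(f"hub, min size 0:    ${hub_idle_min_0:.2f}/month idle")
print(f"binder, min size 1: ${binder_idle_min_1:.2f}/month idle")
```

With ten nodegroups the minimum-of-1 floor is paid ten times over, while a single-nodegroup cluster pays it once, which is why the managed nodegroup limitation matters much less for mybinder.org.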

@damianavila damianavila moved this to Todo 👍 in Sprint Board Nov 9, 2022
@sgibson91
Member

Thanks Yuvi! I will ping you for review on the appropriate terraform config to extract :)

@yuvipanda
Member

@sgibson91 yay great! From a super quick look, I think there is not much to extract (perhaps the CI/CD stuff), as most of the rest (scratch buckets, EFS, etc.) isn't going to be needed on mybinder.org. I'd suggest considering starting from scratch as a possibility too. Happy to help in whatever form!

@damianavila damianavila moved this from Blocked to In Progress in Deploy Pangeo Binder Nov 15, 2022
@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Nov 15, 2022
@damianavila damianavila moved this from Todo 👍 to In Progress ⚡ in Sprint Board Nov 15, 2022
@sgibson91 sgibson91 moved this from In Progress ⚡ to Todo 👍 in Sprint Board Nov 16, 2022
@sgibson91
Member

Just moved this to TODO because I am not actively working on it (yet). Hopefully next week is calmer.

@damianavila damianavila moved this from In progress to Waiting in DEPRECATED Engineering and Product Backlog Nov 23, 2022
@sgibson91
Member

Will be tracking a lot of this work here:

@sgibson91 sgibson91 moved this from Todo 👍 to In Progress ⚡ in Sprint Board Nov 28, 2022
@yuvipanda
Member

I provided more info in jupyterhub/mybinder.org-deploy#2449 (comment).

@damianavila damianavila moved this from Waiting to In progress in DEPRECATED Engineering and Product Backlog Nov 30, 2022
@damianavila
Contributor Author

damianavila commented Jan 18, 2023

TODO for Damián: Set up a meeting with @consideRatio and @sgibson91 to discuss this one.

@damianavila damianavila self-assigned this Jan 18, 2023
@damianavila damianavila moved this to In Progress ⚡ in Sprint Board Jan 18, 2023
@damianavila
Contributor Author

Do not use AWS ECR.
The AWS cluster autoscaler is behaving weirdly.

@damianavila damianavila moved this from In Progress ⚡ to Waiting 🕛 in Sprint Board Feb 15, 2023
@damianavila
Contributor Author

Erik and Sarah made progress. We need to define the next steps.

@sgibson91
Member

Yuvi helpfully reviewed the upstream PR for the AWS terraform config and merged it. I think others in the mybinder community will be pushing forward other pieces of technical work (such as actually deploying the BinderHub) as we increasingly see a need for an AWS-based federation member, and there appear to be a couple of other folks who wish to contribute AWS credits.

I'm closing this one.

@github-project-automation github-project-automation bot moved this from Waiting 🕛 to Done 🎉 in Sprint Board Apr 19, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Deploy Pangeo Binder Apr 19, 2023