Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS ECR support (round two) #1055

Open
wants to merge 55 commits into
base: main
Choose a base branch
from

Conversation

ivan-gomes
Copy link
Contributor

@ivan-gomes ivan-gomes commented Feb 9, 2020

Closes #705 and follow-on/alternative to #920.

Includes updated user documentation on how to configure AWS IAM and BinderHub to use AWS ECR as the Docker registry. Modifications are additive, in that existing configurations should not be affected, and have been tested with ECR.

@ivan-gomes ivan-gomes requested a review from betatim February 9, 2020 04:52
@jhamman
Copy link

jhamman commented Feb 10, 2020

Cc @scottyhq

@jhamman jhamman mentioned this pull request Feb 10, 2020
@betatim
Copy link
Member

betatim commented Feb 13, 2020

Thanks for this PR.

Has someone been running this code on AWS?

Having different registry classes is a good idea. Could also be used for GitLab like registries (xref #986 #965)

@ivan-gomes
Copy link
Contributor Author

Yes we are running this code on AWS with ECR and no further modifications.

binderhub/app.py Outdated Show resolved Hide resolved
binderhub/app.py Outdated Show resolved Hide resolved

@default("ecr_client")
def _get_ecr_client(self):
return boto3.client("ecr", region_name=self.aws_region)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is boto3 async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not officially. There are third-party wrappers like aioboto3 - https://github.com/terrycain/aioboto3 - but we may not want to go there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinderHub is written using tornado so we can't make calls over the network with a library which blocks. If we do then all of the BinderHub process will block while that network request is happening. We either need to use a threadpool to execute the boto3 calls or use a async library that you can await.

Depending on the complexity of the requests you are making another option would be to implement the HTTP call yourself using the AsyncHTTPClient class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. There is really only one request that needs to be made currently and it's relatively simple to do ourselves. Unless you see a value in integrating something like aioboto3 we can implement the request ourselves using AsyncHTTPClient. Drawing inspiration from elsewhere, it seems this is the route that FargateSpawner for JupyterHub went as well when making AWS calls - ref: https://github.com/uktrade/fargatespawner/blob/c614a54ffd80d0fb8886d1ef9e8de2c938de7759/fargatespawner/fargatespawner.py#L322-L346.

If you approve of this path, I will start work on implementing and testing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After biting into AWS's request signing process - ref: https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html - I realized heading down the request reimplementation may not be the best approach. Additionally, the kubernetes call is also not async, so I switched to using a threadpool as you suggested. It has now been implemented and tested.

binderhub/registry.py Outdated Show resolved Hide resolved
@betatim betatim changed the title Add AWS ECR support #2 Add AWS ECR support (round two0 Feb 13, 2020
@betatim betatim changed the title Add AWS ECR support (round two0 Add AWS ECR support (round two) Feb 13, 2020
@thomas-bc
Copy link

This PR sparked a lot of interest in the past, and it would be really neat to allow Binder to run with AWS ECR.
I updated the PR by rebasing and making the changes that @manics and others requested. Those include:

  • moving the import of boto3 within the registry class;
  • adding some explanation on the patching of secrets;
  • making the RBAC increase conditional (with associated docs in setting up Binder);
  • referencing the secret name instead of hardcoding it.

As for the mock-testing, I fully agree that this would certainly be nice to have. However the Moto library has not implemented all the calls that would be necessary to test here. I looked for other solutions but could not come up with much.

Re-iterating here, but my organization has been using this fork for a while and it has been working perfectly. Maybe @teticio or @TomasBeuzen can confirm. I will be sure to test again with this updated version and report back.

Would love to hear some feedback!

@manics
Copy link
Member

manics commented Aug 19, 2022

Coming back to this after a while to take a fresh look, and after seeing #1506 (comment), I was wondering if we could either move the ECR token request to the Builder and pass in the credentials as an environment variable of volume instead of overwriting the Kubernetes secret?

It's a bigger change, but I've just done some work to refactor the build classes #1518 and I think further changes to KubernetesBuildExecutor to support custom registries or passing tokens is reasonable, especially if the outcome is cleaner code. A few ideas:

  • Split the submit() method to make it easier to override parts of the method, or add a hook that can be called
    def submit(self):
    """
    Submit a build pod to create the image for the repository.
    Progress of the build can be monitored by listening for items in
    the Queue passed to the constructor as `q`.
    """
    volume_mounts = [
    client.V1VolumeMount(
    mount_path="/var/run/docker.sock", name="docker-socket"
    )
    ]
    docker_socket_path = urlparse(self.docker_host).path
    volumes = [
    client.V1Volume(
    name="docker-socket",
    host_path=client.V1HostPathVolumeSource(
    path=docker_socket_path, type="Socket"
    ),
    )
    ]
    if self.push_secret:
    volume_mounts.append(
    client.V1VolumeMount(mount_path="/root/.docker", name="docker-config")
    )
    volumes.append(
    client.V1Volume(
    name="docker-config",
    secret=client.V1SecretVolumeSource(secret_name=self.push_secret),
    )
    )
    and fetch the ECR token, then either:
    a. Convert the ECR token into login credentials, pass them in as an environment variable to the build pod instead of passing a secret. The build pod would need to accept a script/hook to run docker login or equivalent. We already use GIT_CREDENTIAL_ENV for private git credentials
    git_credentials = Unicode(
    "",
    allow_none=True,
    help=(
    "Git credentials to use when cloning the repository, passed via the GIT_CREDENTIAL_ENV environment variable."
    "Can be anything that will be accepted by git as a valid output from a git-credential helper. "
    "See https://git-scm.com/docs/gitcredentials for more information."
    ),
    config=True,
    )

    so registry login credentials sounds reasonable? With this method the registry and most of the methods added in this PR can be kept.
    b. Use an service account (IRSA for AWS) as suggested in Allow setting service account on build pods #1506 and handle the entire ECR login process in the build pod. This has the advantage of being cleaner, but means the repo2docker build pod needs to includes the AWS dependencies.

@yuvipanda what do you think? I'm leaning towards option (a), and modifying repo2docker to optionally login to the registry before pushing, but I'm very keen to hear other ideas.

@manics
Copy link
Member

manics commented Aug 19, 2022

This is the sort of repo2docker change I had in mind if we go down the route of passing the credentials into the build pod:
https://github.com/jupyterhub/repo2docker/compare/main...manics:repo2docker:registry_login_kwargs?expand=1

@omri-shilton
Copy link

any updates on this PR? we would really love to have that support!

@manics
Copy link
Member

manics commented Jun 6, 2023

@omri-shilton I'm working on an alternative approach:
#1623

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code:helm-chart Helm template changes. code:python Python changes. new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS ECR registry for BinderHub deployment