Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS ECR support (round two) #1055

Open
wants to merge 55 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
7c74f8a
add registry_class config value
chicocvenancio Aug 8, 2019
81f92ca
add boto3 requirement
chicocvenancio Aug 10, 2019
7d6f775
add AWSElasticContainerRegistry class
chicocvenancio Aug 27, 2019
b0b438c
rbac.yaml: allow manipulation of secrets
chicocvenancio Aug 27, 2019
6cb5b6a
doc-requirements: add boto3
chicocvenancio Aug 27, 2019
884af25
Merge branch 'master' into aws_ecr
chicocvenancio Nov 8, 2019
e25e597
Fix syntax
chicocvenancio Nov 8, 2019
56dd9cf
Merge commit 'e25e59763ef2e33fb829e2feb36329721b28e015' into aws-ecr-…
ivan-gomes Jan 20, 2020
bd961a8
Revert RBAC modification and remove handling in AWSElasticContainerRe…
ivan-gomes Jan 21, 2020
af56e93
Add password and url fetching for ECR
ivan-gomes Jan 21, 2020
296772b
Add AWS ECR documentation
ivan-gomes Jan 21, 2020
ade3a8b
Fix wording in AWS ECR documentation
ivan-gomes Jan 21, 2020
c1c37dd
Add ECR host to `image_prefix` in documentation
ivan-gomes Jan 21, 2020
b9ecea4
Re-add binder-push-secret patching. Add check for correct auth token.
ivan-gomes Jan 21, 2020
f9fd86c
Remove duplicate GetAuthorizationToken in AWS ECR doc
ivan-gomes Jan 30, 2020
a5edcd3
Lazily initialize kube_client in AWSElasticContainerRegistry to preve…
ivan-gomes Feb 9, 2020
a9aa576
Update description of `registry_class`
ivan-gomes Feb 13, 2020
468bffc
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes May 13, 2020
1047f9e
Rename `registry_class` to `docker_registry_class` for clarity
ivan-gomes May 13, 2020
88e4914
Move boto3 and kubernetes calls to own threadpool
ivan-gomes Jun 6, 2020
8309e97
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes Jun 6, 2020
c6939fe
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes Jul 31, 2020
019a9fe
Restore newline lost during merge
ivan-gomes Jul 31, 2020
00048e3
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes Sep 17, 2020
a74d814
general word-smithing for clarity
TomasBeuzen Sep 18, 2020
dddecf8
remove stray comma that AWS doesn't like in the policy
TomasBeuzen Sep 18, 2020
a729986
one more small clarification
TomasBeuzen Sep 18, 2020
95a2f96
Merge pull request #2 from TomasBeuzen/aws-ecr-support-TB
ivan-gomes Sep 19, 2020
9a4ad32
add boto3 requirement
chicocvenancio Aug 10, 2019
dfcb921
add AWSElasticContainerRegistry class
chicocvenancio Aug 27, 2019
a2a670a
rbac.yaml: allow manipulation of secrets
chicocvenancio Aug 27, 2019
25faed2
Revert RBAC modification and remove handling in AWSElasticContainerRe…
ivan-gomes Jan 21, 2020
60c8ed4
Add password and url fetching for ECR
ivan-gomes Jan 21, 2020
6e6a61c
Add AWS ECR documentation
ivan-gomes Jan 21, 2020
b9877a7
Fix wording in AWS ECR documentation
ivan-gomes Jan 21, 2020
a5f21e8
Add ECR host to `image_prefix` in documentation
ivan-gomes Jan 21, 2020
54c1b64
Re-add binder-push-secret patching. Add check for correct auth token.
ivan-gomes Jan 21, 2020
6266cdd
Remove duplicate GetAuthorizationToken in AWS ECR doc
ivan-gomes Jan 30, 2020
dc676c4
Lazily initialize kube_client in AWSElasticContainerRegistry to preve…
ivan-gomes Feb 9, 2020
5a4894a
Rename `registry_class` to `docker_registry_class` for clarity
ivan-gomes May 13, 2020
eeb7661
Move boto3 and kubernetes calls to own threadpool
ivan-gomes Jun 6, 2020
ad97b5a
general word-smithing for clarity
TomasBeuzen Sep 18, 2020
e7624d9
remove stray comma that AWS doesn't like in the policy
TomasBeuzen Sep 18, 2020
378f539
one more small clarification
TomasBeuzen Sep 18, 2020
762443e
Fix rebase mishap
thomas-bc Aug 11, 2022
40ddeeb
update registry_class documentation for ECR
thomas-bc Aug 11, 2022
42a28bf
moving optional boto3 import inside class
thomas-bc Aug 11, 2022
7cf26bf
_patch_docker_config_secret docs and robustness
thomas-bc Aug 12, 2022
d9a983f
make RBAC increase conditional
thomas-bc Aug 12, 2022
482cdb9
Merge branch 'jupyterhub:master' into aws-ecr-support
thomas-bc Aug 16, 2022
591be68
rename RBAC increase to rbac.patchSecrets
thomas-bc Aug 16, 2022
c2b3f6e
Merge branch 'aws-ecr-support' of https://github.com/thomas-bc/binder…
thomas-bc Aug 16, 2022
56f9e81
Merge branch 'aws-ecr-support' of https://github.com/ivan-gomes/binde…
thomas-bc Aug 16, 2022
6fbc789
Merge pull request #4 from thomas-bc/aws-ecr-support
ivan-gomes Aug 16, 2022
8a25e45
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 16, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions binderhub/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
import tornado.log
from tornado.log import app_log
import tornado.web
from traitlets import Unicode, Integer, Bool, Dict, validate, TraitError, Union, default
from traitlets import Unicode, Integer, Bool, Dict, validate, TraitError, Union, default, Type
from traitlets.config import Application
from jupyterhub.services.auth import HubOAuthCallbackHandler
from jupyterhub.traitlets import Callable
Expand Down Expand Up @@ -207,6 +207,14 @@ def _valid_badge_base_url(self, proposal):
config=True,
)

registry_class = Type(
betatim marked this conversation as resolved.
Show resolved Hide resolved
DockerRegistry,
help="""
Registry class implementation, change to define your own
ivan-gomes marked this conversation as resolved.
Show resolved Hide resolved
""",
config=True
)

sticky_builds = Bool(
False,
help="""
Expand Down Expand Up @@ -554,7 +562,7 @@ def initialize(self, *args, **kwargs):
])
jinja_env = Environment(loader=loader, **jinja_options)
if self.use_registry and self.builder_required:
registry = DockerRegistry(parent=self)
registry = self.registry_class(parent=self)
else:
registry = None

Expand Down
63 changes: 62 additions & 1 deletion binderhub/registry.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,13 @@
import os
from urllib.parse import urlparse

import boto3
import kubernetes.client
import kubernetes.config
from tornado import gen, httpclient
from tornado.httputil import url_concat
from traitlets import default, Dict, Unicode, Any
from traitlets.config import LoggingConfigurable
from traitlets import Dict, Unicode, default

DEFAULT_DOCKER_REGISTRY_URL = "https://registry.hub.docker.com"
DEFAULT_DOCKER_AUTH_URL = "https://index.docker.io/v1"
Expand Down Expand Up @@ -224,3 +227,61 @@ def get_image_manifest(self, image, tag):
raise
else:
return json.loads(resp.body.decode("utf-8"))


class AWSElasticContainerRegistry(DockerRegistry):
aws_region = Unicode(
config=True,
help="""
AWS region for ECR service
""",
)

ecr_client = Any()

@default("ecr_client")
def _get_ecr_client(self):
return boto3.client("ecr", region_name=self.aws_region)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is boto3 async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not officially. There are third-party wrappers like aioboto3 - https://github.com/terrycain/aioboto3 - but we may not want to go there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinderHub is written using tornado so we can't make calls over the network with a library which blocks. If we do then all of the BinderHub process will block while that network request is happening. We either need to use a threadpool to execute the boto3 calls or use a async library that you can await.

Depending on the complexity of the requests you are making another option would be to implement the HTTP call yourself using the AsyncHTTPClient class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. There is really only one request that needs to be made currently and it's relatively simple to do ourselves. Unless you see a value in integrating something like aioboto3 we can implement the request ourselves using AsyncHTTPClient. Drawing inspiration from elsewhere, it seems this is the route that FargateSpawner for JupyterHub went as well when making AWS calls - ref: https://github.com/uktrade/fargatespawner/blob/c614a54ffd80d0fb8886d1ef9e8de2c938de7759/fargatespawner/fargatespawner.py#L322-L346.

If you approve of this path, I will start work on implementing and testing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After biting into AWS's request signing process - ref: https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html - I realized heading down the request reimplementation may not be the best approach. Additionally, the kubernetes call is also not async, so I switched to using a threadpool as you suggested. It has now been implemented and tested.


# TODO: cache auth if not expired - authorizationData[i]["expiresAt"]
def _get_ecr_auth(self):
auths = self.ecr_client.get_authorization_token()["authorizationData"]
auth = next(x for x in auths if x["proxyEndpoint"] == self.url)
self._patch_docker_config_secret(auth)
betatim marked this conversation as resolved.
Show resolved Hide resolved
return auth

username = "AWS"

@property
def password(self):
# Fetch password every time as ECR password expires
auth = self._get_ecr_auth()
return base64.b64decode(auth['authorizationToken']).decode("utf-8").split(':')[1]

async def get_image_manifest(self, image, tag):
try:
repo_name = image.split("/", 1)[1]
self.ecr_client.create_repository(repositoryName=repo_name)
self.log.info("ECR repo {} created".format(repo_name))
except self.ecr_client.exceptions.RepositoryAlreadyExistsException:
self.log.info("ECR repo {} already exists".format(repo_name))
return await super().get_image_manifest(repo_name, tag)

kube_client = Any()

@default("kube_client")
def _get_kube_client(self):
kubernetes.config.load_incluster_config()
return kubernetes.client.CoreV1Api()

def _patch_docker_config_secret(self, auth):
"""Patch binder-push-secret"""
secret_data = {"auths": {self.url: {"auth": auth["authorizationToken"]}}}
secret_data = base64.b64encode(json.dumps(secret_data).encode("utf8")).decode(
"utf8"
)
with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f:
namespace = f.read()
self.kube_client.patch_namespaced_secret(
"binder-push-secret", namespace, {"data": {"config.json": secret_data}}
)
1 change: 1 addition & 0 deletions doc/doc-requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,4 @@ python-json-logger
jupyterhub
jsonschema
#pycurl Do not install for docs as it breaks the RTD build. Its primary use is for mocks in testing .
boto3
48 changes: 48 additions & 0 deletions doc/setup-binderhub.rst
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,22 @@ where:
* `<SERVICE_PRINCIPAL_ID>` is the AppID of the Service Principal with AcrPush role assignment,
* `<SERVICE_PRINCIPAL_PASSWORD>` is the password for the Service Principal.

If you are using Amazon Elastic Container Registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Update `secret.yaml` to include the following::

registry:
url: https://<ACCOUNT_NUMBER>.dkr.ecr.<REGION>.amazonaws.com

where:

* ``<ACCOUNT_NUMBER>`` is the identifier of your AWS account
* ``<REGION>`` is the AWS region of the ECR registry, e.g. ``us-east-1``

As ECR uses AWS IAM for authorization, specifying ``username`` and ``password``
is not necessary.

Create ``config.yaml``
----------------------

Expand Down Expand Up @@ -183,6 +199,38 @@ where:
If this is not provided, you may find BinderHub rebuilds images every launch instead of pulling them from the ACR.
Suggestions for `<project-name>` could be `ACR_NAME` or the name you give your BinderHub.

If you are using Amazon Elastic Container Registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want your BinderHub to push and pull images from an Amazon Elastic
Container Registry (ECR), then your `config.yaml` file will look as follows::

config:
BinderHub:
use_registry: true
registry_class: binderhub.registry.AWSElasticContainerRegistry
image_prefix: "<ACCOUNT_NUMBER>.dkr.ecr.<REGION>.amazonaws.com/<prefix>-"
AWSElasticContainerRegistry:
aws_region: <REGION>

where:

* ``<ACCOUNT_NUMBER>`` is the identifier of your AWS account
* ``<REGION>`` is the AWS region of the ECR registry, e.g. ``us-east-1``.
* ``<prefix>`` can be any string, and will be prepended to image names. We
recommend something descriptive such as ``binder-dev-`` or ``binder-prod-``
(ending with a `-` is useful).

If you opted to use an IAM User with programmatic access instead of assuming
the role in the previous step you will additionally need to add the following
to your `config.yaml`::

extraEnv:
- name: AWS_ACCESS_KEY_ID
value: "xxx"
- name: AWS_SECRET_ACCESS_KEY
value: "yyy"

If you are using a custom registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
90 changes: 90 additions & 0 deletions doc/setup-registry.rst
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,96 @@ where:
SERVICE_PRINCIPAL_PASSWORD=$(az ad sp create-for-rbac --name <SP_NAME> --role AcrPush --scopes <ACR_ID> --query password --output tsv)
SERVICE_PRINCIPAL_ID=$(az ad sp show --id http://<SP_NAME> --query appId --output tsv)

.. _use-ecr:

Set up Amazon Elastic Container Registry
----------------------------------------

To use Amazon Elastic Container Registry (ECR), you'll need to use AWS IAM to
authorize the machine or pod running BinderHub so it can push images. There
are a number of options on how to do this with IAM and Kubernetes, but we
will highlight two: assume IAM role and IAM user with programmatic access.

Start by creating an IAM policy that grants access to create repositories and
read/write images from them. An example IAM permissions policy is provided
below. For more information and examples see `Identity and Access Management for Amazon Elastic Container Registry <https://docs.aws.amazon.com/AmazonECR/latest/userguide/security-iam.html>`_.

.. code-block:: json

{
"Statement": [
{
"Action": [
"ecr:ListImages"
],
"Effect": "Allow",
"Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_NUMBER>:<prefix>-*",
"Sid": "ListImagesInRepository"
},
{
"Action": [
"ecr:GetAuthorizationToken"
],
"Effect": "Allow",
"Resource": "*",
"Sid": "GetAuthorizationToken"
},
{
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:BatchGetImage",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload",
"ecr:PutImage"
],
"Effect": "Allow",
"Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_NUMBER>:<prefix>-*",
"Sid": "ManageRepositoryContents"
},
{
"Action": [
"ecr:CreateRepository"
],
"Effect": "Allow",
"Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_NUMBER>:<prefix>-*",
"Sid": "CreateRepository"
},
],
"Version": "2012-10-17"
}

If you used AWS services like EC2 or EKS to set up your Kubernetes cluster you
can add this policy to the IAM Role assumed by the nodes of the cluster, e.g.
``nodes.<somename>.k8s.local`` if you followed `Zero to JupyterHub with Kubernetes <https://zero-to-jupyterhub.readthedocs.io/>`_
and used kops. The IAM permissions policy will need to accompanied with an IAM
trust policy to allow it to be assumed. An example for EC2 is provided below.
For more information see `Granting a User Permissions to Pass a Role to an AWS Service <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_passrole.html>`_.
This is the recommended method if your Kubernetes cluster is provisioned on
AWS.

.. code-block:: json

{
"Version": "2012-10-17",
"Statement": {
"Sid": "TrustPolicyStatementThatAllowsEC2ServiceToAssumeTheAttachedRole",
"Effect": "Allow",
"Principal": { "Service": "ec2.amazonaws.com" },
"Action": "sts:AssumeRole"
}
}

Alternatively, you can create an IAM user with programmatic access (see
`Creating an IAM User in Your AWS Account <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html>`_)
and specify the ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` environment
variables in the following step.

Next step
---------

Expand Down
3 changes: 3 additions & 0 deletions helm-chart/binderhub/templates/rbac.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,9 @@ rules:
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get"]
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
Expand Down
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,4 @@ python-json-logger
jupyterhub
jsonschema
pycurl
boto3