This repository covers the following exercise:
- Develop a simple app in Python 3.x
- Package the application in a Docker container.
- Spin up an Azure Kubernetes Service (AKS) cluster and an Azure Container Registry (ACR) in Azure.
- Tag and push the app to the ACR.
- Deploy the container as a Deployment in the AKS Cluster.
- Expose the deployment through a public endpoint.
- Develop a CI/CD pipeline for continuous deployment on new commits to the master branch
Contents:
- Application overview
  - FastAPI as a python web framework
  - APIs overview
- Local development
  - Prerequisites
  - Use the Makefile!
  - Virtual environment, python dependencies and githooks
  - Unit-tests and coverage
  - Run the webserver locally
- Build and publish docker image to ACR
  - ACR setup
  - Build docker image
  - Publish docker image to ACR
- AKS deployment
  - Create the AKS cluster
  - Connect to AKS with Lens
  - Create an Azure Public IP
  - Deploy the Nginx ingress controller
  - Deploy the application to AKS with Helm
  - Cluster auto-scaling
- CI/CD
  - Containerised build steps
  - CircleCI setup
- Future improvements
*Swagger docs generated automatically by FastAPI*
FastAPI is chosen as the Python backend web framework for the following reasons:
- Modern web framework with a simple setup
- Automated Swagger docs generation from docstrings, type hints, etc.
- Data validation for incoming and outgoing requests using pydantic, to detect invalid data at runtime. Exception handling is made really simple. See `aks_demo/models/input.py`, and the illustrative sketch below.
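As a rough illustration of the kind of pydantic-based validation this enables (the model below is a hypothetical example, not the actual schema in `aks_demo/models/input.py`):

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class PredictionInput(BaseModel):
    """Hypothetical request body for the POST /prediction endpoint."""

    user_id: int = Field(..., gt=0, description="Positive user identifier")
    features: List[float] = Field(..., min_items=1)


# FastAPI validates request bodies against such models automatically and
# returns an error response for invalid payloads; standalone usage looks like this:
try:
    PredictionInput(user_id=-1, features=[])
except ValidationError as exc:
    print(exc)
```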
The simple web application exposes two endpoints.
The `GET` endpoint is defined in `aks_demo/api/endpoints/v1/hello.py` and returns "Hello world".
POST endpoint at `/prediction`
The `POST` endpoint is defined in `aks_demo/api/endpoints/v1/prediction.py` and returns mock prediction data based on a given input.
Note the secret key header `x-api-key` used to secure this endpoint. If an invalid key is supplied, the endpoint returns a 418. This is handled using FastAPI dependencies, see `aks_demo/api/dependencies/common.py`.
The optional query parameter `generate_report` (boolean) is used to start a FastAPI background task after the response has been returned to the client.
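A minimal sketch of how such endpoints can be wired together with an API-key dependency and a background task (illustrative only; the routes, key handling and schemas in `aks_demo` may differ):

```python
from fastapi import BackgroundTasks, Depends, FastAPI, Header, HTTPException

app = FastAPI()

API_KEY = "change-me"  # in the real app this would come from configuration/secrets


def verify_api_key(x_api_key: str = Header(...)) -> None:
    """Reject requests that do not carry a valid x-api-key header."""
    if x_api_key != API_KEY:
        raise HTTPException(status_code=418, detail="Invalid secret key header")


def generate_report_task(data: dict) -> None:
    """Placeholder for report generation, run after the response has been sent."""
    print(f"Generating report for {data}")


@app.get("/hello")
def hello() -> str:
    return "Hello world"


@app.post("/prediction", dependencies=[Depends(verify_api_key)])
def prediction(
    payload: dict, background_tasks: BackgroundTasks, generate_report: bool = False
) -> dict:
    result = {"prediction": 0.42, "input": payload}  # mock prediction
    if generate_report:
        background_tasks.add_task(generate_report_task, result)
    return result
```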
- Clone the gbourniq/aks_demo repository and cd into it
- Install Miniconda
- Install Poetry
A `Makefile` is available to conveniently run common setup, development and deployment commands.
Set up the conda environment from `environment.yml` and install the Python dependencies defined in `pyproject.toml`:

```
make env
make env-update
```

The manual equivalent is `conda env create -f environment.yml` followed by `poetry install` to resolve and install the dependencies. A `poetry.lock` file will be generated with the resolved package versions.
The git hooks are managed by the pre-commit package and defined in `.pre-commit-config.yaml`. They run automatically on each commit and consist of the following tools:
- `autoflake`: remove unused variables and imports
- `isort`: sort imports
- `black`: format python code
- `yamllint`: lint yaml files
- `pylint`: code analysis rating based on PEP8 conventions
To install the pre-commit hooks:

```
make pre-commit
```

To manually run the git hooks:

```
make lint
```
Unit tests are defined in `aks_demo/api/endpoints/v1/tests`.
Run the unit tests and view coverage with:

```
make test
make cov
```
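For illustration, a test in that directory might look roughly like the following, using FastAPI's `TestClient` (the import path of the app object and the `/hello` route are assumptions):

```python
from fastapi.testclient import TestClient

from aks_demo.main import app  # hypothetical location of the FastAPI app object

client = TestClient(app)


def test_get_endpoint_returns_hello_world():
    response = client.get("/hello")  # assumed route for the GET endpoint
    assert response.status_code == 200
    assert "Hello world" in response.text


def test_prediction_rejects_invalid_api_key():
    response = client.post("/prediction", json={}, headers={"x-api-key": "wrong"})
    assert response.status_code == 418
```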
Run the webserver locally with:

```
make run
```
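Under the hood this typically starts a uvicorn server; a rough Python equivalent (assuming the FastAPI app object lives at a hypothetical `aks_demo.main:app`) would be:

```python
import uvicorn

if __name__ == "__main__":
    # Serve the FastAPI app on all interfaces, with auto-reload for local development
    uvicorn.run("aks_demo.main:app", host="0.0.0.0", port=8000, reload=True)
```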
- Create an ACR service from the Azure portal
- Enable password authentication (required for the CircleCI CI/CD pipeline)
- Log in to ACR via `az login` or with password authentication:

```
echo ${ACR_PASSWORD} | docker login gbournique.azurecr.io --username gbournique --password-stdin 2>&1
```
The FastAPI docker image can be built with `make build`, which uses the Dockerfile in `deployment/Dockerfile`. The image name and tag can be set directly in the `Makefile` under `# Environment variables`.
To push the image to ACR, run `docker push <docker-registry-url>/<registry-name>/<image-name>:<image-tag>`, e.g.:

```
docker push gbournique.azurecr.io/gbournique/aks_demo:latest
```
This section covers the steps to create the AKS cluster and deploy the application to it.
Install `azure-cli`, `kubectl`, and `helm`. On macOS:

```
brew install azure-cli
brew install kubectl
brew install helm
```
Create the cluster from the Azure portal.
The cluster should have auto-scaling enabled so that new worker nodes are created when Kubernetes deployments are scaled up.
The service principal authentication method is chosen here instead of the System-assigned managed identity, so that our third-party CI/CD tool can connect to the cluster via service principal credentials.
Useful links:
- https://docs.microsoft.com/en-us/azure/aks/kubernetes-service-principal?tabs=azure-cli
- https://docs.microsoft.com/en-us/cli/azure/create-an-azure-service-principal-azure-cli
Connect to the cluster from the command line with:

```
az login
az aks get-credentials --resource-group aks-demo-rg --name aks-demo-cluster
```
Install Lens, an IDE for managing Kubernetes clusters.
Some of the useful operations include:
- View, edit and remove any kubernetes resources across namespaces
- Easily scale deployments
- Tail logs from pods
- SSH into pods and cluster nodes
*Lens IDE for managing the kubernetes cluster*
An Azure public IP is required to provide a static endpoint for reaching the Nginx ingress controller. Run the following command to create one:

```
az network public-ip create --resource-group MC_aks_demo_aks-demo-cluster_uksouth --name myAKSPublicIPForIngress --sku Standard --allocation-method static --query publicIp.ipAddress -o tsv
```
An Ingress controller is required to provide a bridge between Kubernetes services and the AKS Cluster load balancer.
Replace the `PUBLIC_IP` value in `./deployment/kubernetes/ingress-nginx/installer.sh` with the IP created in the previous step, and then run the script to install the Nginx ingress controller to the cluster.
Helm is used as the package manager for the Kubernetes deployment, and the helm charts can be found in `deployment/kubernetes/aks-demo`.
They consist of the following helm templates:
- `ingress.yaml`
- `service.yaml`
- `deployment.yaml`
- `configmap.yaml`
- `secrets.yaml`
- `rbac.yaml`
Runtime configuration values can be edited in `deployment/kubernetes/aks-demo/values.yaml`.
Useful make commands for helm:
```
make helm-lint      <-- Validates helm charts
make helm-template  <-- Generate k8s templates from helm charts + values.yaml (mostly for debugging)
make helm-install   <-- Install helm chart (our application) to the cluster
make helm-tests     <-- Ensures the application is available at the specified ingress host endpoint
```
Scaling up the deployment in Lens will automatically create new cluster worker nodes if the existing nodes have insufficient CPU/memory capacity.
A scheduled Azure Function App could be used to automatically scale the cluster up and down based on known traffic patterns, leading to cost savings.
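As a hedged sketch, the scaling logic behind such a scheduled job could use the `azure-mgmt-containerservice` SDK along the following lines (the subscription ID and node pool name are placeholders; the resource group and cluster names match the ones used above):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerservice import ContainerServiceClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
RESOURCE_GROUP = "aks-demo-rg"
CLUSTER_NAME = "aks-demo-cluster"
NODE_POOL_NAME = "<node-pool-name>"  # placeholder


def scale_node_pool(node_count: int) -> None:
    """Scale the AKS node pool to the given size, e.g. down outside business hours."""
    client = ContainerServiceClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    pool = client.agent_pools.get(RESOURCE_GROUP, CLUSTER_NAME, NODE_POOL_NAME)
    pool.count = node_count
    client.agent_pools.begin_create_or_update(
        RESOURCE_GROUP, CLUSTER_NAME, NODE_POOL_NAME, pool
    ).result()


if __name__ == "__main__":
    scale_node_pool(1)  # e.g. scale down to a single node overnight
```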
The `.circleci/config.yml` file configures the CI/CD pipeline, which is triggered on each commit to master.
All build steps are run within containers, which means the CI/CD pipeline can be run on any platform, including your local environment.
The docker image used by the CI/CD pipeline is defined in `build_steps/cicd.Dockerfile` and includes all required packages such as `conda`, `poetry`, `azure-cli`, `kubectl`, and `helm`.
The CI/CD scripts can be found in the `build_steps` directory.
Run the scripts with no argument to view their usage.
Note that the following secret environment variables must be configured in the CircleCI project settings:

```
DOCKER_PASSWORD      <-- used by docker login to push the image to ACR
SERVICE_ACCOUNT_PWD  <-- used by CircleCI to deploy to the cluster
```
This section lists future improvements to this repository:
Security
- Use Azure DevOps for CI/CD - remove the need for the ACR password and the service principal for the cluster
- Create a Let's Encrypt SSL certificate for HTTPS - install the cert-manager kubernetes service for automatic renewal
- Cluster encryption at rest using a customer-managed key
- Make the cluster private (no DNS endpoint) and whitelist client CIDR ranges / IPs
- DR: failover to a scaled-down cluster in another region

Maintainability
- Use Terraform to create and configure the cluster
- Use a helm chart repository (e.g. Nexus) to publish the packaged helm chart
- Store logs in Azure Monitor Logs and create alerts if too many 418 status codes are returned (`InvalidSecretKeyHeader` errors on the POST /prediction endpoint)

Costs
- Improve auto-scaling: schedule it based on known traffic patterns, and scale the deployment from metrics such as incoming requests on the cluster load balancer

Other
- Build an actual ML model (e.g. image classification) or call Azure Cognitive Services
- Store state in Azure CosmosDB