A set of common deep learning models that can be deployed on a Kubernetes cluster with GPU support.
- Create the `env-values.yaml` file:

```bash
cp env-values.example.yaml env-values.yaml
```

Now, open the `env-values.yaml` file and fill in the values.
- Connect to your k8s cluster. The exact commands depend on your cloud provider. E.g. for Azure:

```bash
# Set the cluster subscription
az account set --subscription <subscription-id>

# Download cluster credentials
az aks get-credentials --resource-group <resource-group> --name <vm-name>

# Set the namespace
kubectl config set-context --current --namespace=gpu
```
- Deploy the helm chart:

```bash
helm install gooey-gpu-1 chart -f chart/model-values.yaml -f env-values.yaml
```
- Check the status of the deployment:

```bash
kubectl get pods -n gpu
```
gooey-gpu includes a standard Dockerfile with common deep learning dependencies like CUDA, diffusers & transformers pre-installed. It also provides a small Python helper library that makes it easy to write celery tasks.
- If these dependencies are not enough, you can add more to the `Dockerfile` & `requirements.txt` file.
- Create a new file in the `common/` folder that imports the model and defines the load function.

```python
## common/my_model.py
from celeryconfig import setup_queues
from functools import lru_cache
import os


@lru_cache  # this will cache the model in memory and use it across calls
def load_model(model_id: str):
    ...


setup_queues(
    model_ids=os.environ["MY_MODEL_IDS"].split(),  # get the model ids from the env
    load_fn=load_model,  # this tells the celery worker to load the model when starting
)
```
To load custom models, you can use the provided cache directory. The helm chart includes an NFS provisioner that mounts a shared directory across all the pods. You can use this directory to store the models.
```python
## common/my_model.py
import torch
import os
import gooey_gpu
from functools import lru_cache


@lru_cache
def load_model(model_id: str):
    model_path = os.path.join(gooey_gpu.CHECKPOINTS_DIR, model_id)
    if not os.path.exists(model_path):
        ...  # download the model from huggingface or any other source
    return torch.load(model_path).to(gooey_gpu.DEVICE_ID)
```
You can also `kubectl exec` into a running pod and manually copy the model files to the shared directory:

```bash
kubectl exec -it <pod-name> -n gpu -- bash
cd /root/.cache/gooey-gpu/checkpoints
... # copy the model files here
```
- Define the model inference params and code:

```python
## common/my_model.py
import gooey_gpu
from celeryconfig import app, setup_queues
from pydantic import BaseModel


class MyModelPipeline(BaseModel):
    model_id: str
    ...


class MyModelInputs(BaseModel):
    text: str
    ...


class MyModelOutput(BaseModel):
    image: str
    ...


@app.task(name="my_model_task")
@gooey_gpu.endpoint
def my_model_task(
    pipeline: MyModelPipeline,
    inputs: MyModelInputs,
) -> MyModelOutput:
    model = load_model(pipeline.model_id)
    ...  # write the inference code here
    return MyModelOutput(...)
```
- Install rabbitmq, redis & docker on the machine with GPUs.
- Set the Hugging Face Hub Token:

```bash
echo "export HUGGING_FACE_HUB_TOKEN=hf_XXXX" >> ~/.bashrc
```
- Add your new model ids as env vars in `scripts/run-dev.sh`:

```bash
## scripts/run-dev.sh
docker run \
  -e MY_MODEL_IDS="
    model_id_1
    model_id_2
  " \
  ...
```
- Run the development script:

```bash
./scripts/run-dev.sh common common.my_model
```
- Test the model by sending a request to the celery worker.

If you are using a VM, port-forward the redis & rabbitmq ports from the VM to your local machine. E.g. for Azure:

```bash
az ssh config --name <vm-name> --resource-group <resource-group> --file ~/.ssh/config.d/<vm-name> --overwrite
ssh <vm-name> -vN -L 6374:localhost:6379 -L 5674:localhost:5672
```

Note how we use ports `6374` and `5674`. This avoids potential conflicts with a local redis and rabbitmq.

```python
from celery import Celery

app = Celery()
app.conf.broker_url = "amqp://localhost:<port>"
app.conf.result_backend = "redis://localhost:<port>"
app.conf.result_extended = True

result = app.send_task(
    "my_model_task",
    kwargs=dict(
        pipeline=dict(model_id="model_id_1"),
        inputs=dict(text="Hello, World!"),
    ),
    queue="gooey-gpu/model_id_1",  # by default the queue name is gooey-gpu/<model_id>
)
print(result.get())  # { "image": "..." }
```
- During this, try to record the GPU memory usage using `nvitop` (or `nvidia-smi`). This will come in handy when defining the resource limits in the helm chart. Once you have this number, you need to convert it to the equivalent CPU memory limit.

To do this, you can use the following formula:

`cpu_memory_limit = (cpu_memory_capacity / gpu_memory_capacity) * gpu_memory_limit`

E.g. for an Azure Standard_NC24ads_A100_v4 with 216 GiB CPU memory and 80 GiB GPU memory, and a diffusion model with a max GPU memory usage of 7 GiB, the CPU memory limit would be `(216 / 80) * 7 ≈ 20Gi`.

This helps us put multiple models on the same GPU and avoid CUDA OOM errors.
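The same arithmetic as a quick Python helper (a throwaway sketch; the function name and numbers below are just the worked example above, not part of the repo):

```python
# rough conversion from a measured GPU memory limit to an equivalent CPU memory limit
def cpu_memory_limit_gib(cpu_capacity_gib: float, gpu_capacity_gib: float, gpu_limit_gib: float) -> float:
    return (cpu_capacity_gib / gpu_capacity_gib) * gpu_limit_gib

# Standard_NC24ads_A100_v4: 216 GiB RAM, 80 GiB VRAM; model peaks at ~7 GiB VRAM
print(cpu_memory_limit_gib(216, 80, 7))  # ~18.9 -> round up and set memory: "20Gi"
```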
- Once you are confident that the model is working as expected, upload the docker image to a container registry:

```bash
docker tag gooey-gpu-common <registry>/<image>:<tag>
docker push <registry>/<image>:<tag>
```

You might need to login to your container registry before pushing the image. E.g. for Azure:

```bash
az acr login --name <registry>
```
- Update the `model-values.yaml` file with the new image (or create a new file).
- Under `rabbitmqAutoscaling`, add the new env var name:

```yaml
rabbitmqAutoscaling:
  queueNameVars:
    - MY_MODEL_IDS
```
- Under `deployments`, add your new model:

```yaml
deployments:
  - name: "common-my-model"
    image: "<registry>/<image>:<tag>"
    limits:
      memory: "20Gi"
      cpu: 1
    env:
      IMPORTS: |-
        my_model
      MY_MODEL_IDS: |-
        model_id_1
        model_id_2
```
- Deploy the helm chart. First, check the diff to make sure the new model is being added:

```bash
helm diff upgrade gooey-gpu-1 chart -f chart/model-values.yaml -f env-values.yaml
```

Then deploy the chart:

```bash
helm upgrade gooey-gpu-1 chart -f chart/model-values.yaml -f env-values.yaml
```
We have a `retro/` folder that contains dependencies for older projects using CUDA 11.6 & PyTorch v1.

The recommended way to turn a set of research scripts like Wav2Lip into a deployable container is:
- Add the project as a submodule in the `retro/` folder:

```bash
cd retro/
git submodule add <project-url>
```
- Create a python module in the `retro/` folder that imports the project's modules.

If you're lucky and the project has an installable package, you can just add the package to the `requirements.txt` file.

Otherwise, you can create a python module that imports the project's modules:

```python
## retro/my_model.py
import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), "<project-dir>"))
```
- Copy the code from the project's inference script, and make a celery task out of it.

If you're lucky, the project has separate load() and predict() functions. In that case you can just call the load() fn from `setup_queues()`, and the predict() fn from the celery task (see the sketch below).

Otherwise, you will have to pick apart the inference script and write separate functions for loading the model and running inference.
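Here's a minimal sketch of that pattern. It assumes a hypothetical vendored project exposing `load()` and `predict()` functions; the module path, env var name and pydantic models are placeholders that mirror the common-model example above, not actual files in this repo:

```python
## retro/my_model.py  (hypothetical sketch)
import os
from functools import lru_cache

import gooey_gpu
from celeryconfig import app, setup_queues
from pydantic import BaseModel

# placeholder: the vendored project's own code, assumed to expose load() & predict()
from my_project import inference


class MyRetroPipeline(BaseModel):
    model_id: str


class MyRetroInputs(BaseModel):
    text: str


class MyRetroOutput(BaseModel):
    image: str


@lru_cache  # keep the loaded model in memory across calls
def load_model(model_id: str):
    return inference.load(model_id)


@app.task(name="my_retro_task")
@gooey_gpu.endpoint
def my_retro_task(pipeline: MyRetroPipeline, inputs: MyRetroInputs) -> MyRetroOutput:
    model = load_model(pipeline.model_id)
    return MyRetroOutput(image=inference.predict(model, inputs.text))


setup_queues(
    model_ids=os.environ["MY_RETRO_MODEL_IDS"].split(),
    load_fn=load_model,
)
```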
- Once you have these written, you can follow the same steps as the common models to deploy the model.
Gitleaks runs automatically as a pre-commit hook (see `pre-commit-config.yaml` for details) to prevent commits with secrets in the first place. To test this without committing, run `pre-commit` from the terminal. To skip this check, use `SKIP=gitleaks git commit -m "message"` to commit changes. Preferably, label false positives with the `#gitleaks:allow` comment instead of skipping the check.
Gitleaks will also run in the CI pipeline as a GitHub action on push and pull request (it can also be manually triggered from the Actions tab on GitHub). To update the baseline of ignored secrets, run `python ./scripts/create_gitleaks_baseline.py` from the venv and commit the changes to `.gitleaksignore`.