CUDA BLAS GPU support for docker image #1405
Hi, can you share your entire Dockerfile, please?
Sure. Note: I changed a couple of things on top of just the compilation environment variables.
@jannikmi I also managed to get PrivateGPT running on the GPU in Docker, while changing the 'original' Dockerfile as little as possible. Starting from the current base Dockerfile, I made changes according to this pull request (which will probably be merged in the future). For me, this solved the issue of PrivateGPT not working in Docker at all: after the changes, everything was running as expected on the CPU. The command I used for building is simply

To get it to work on the GPU, I created a new Dockerfile and docker compose YAML file. The new docker compose file adds the following lines to share the GPU with the container:
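The exact snippet isn't reproduced above; for reference, a typical Docker Compose addition for sharing an NVIDIA GPU with a container looks roughly like this (the service name `private-gpt` is an assumption, not taken from the thread):

```yaml
services:
  private-gpt:
    deploy:
      resources:
        reservations:
          devices:
            # Requires the NVIDIA Container Toolkit on the host
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```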
For the new Dockerfile, I used the nvidia/cuda image, because it's way easier to work with the drivers and toolkits already set up. For everyone reading, please note that I used version 12.2.2 of the CUDA toolkit, because CUDA version 12.2 uses NVIDIA driver version 535, which is what is installed on my host machine. CUDA version 12.3 (which, at the time of writing, is the latest version) uses driver version 545, and I did not want to run into possible driver mismatch issues. Apart from the driver, on the host machine, I have the NVIDIA container toolkit and CUDA toolkit installed. Apart from installing Python 3.11, gcc and rebuilding llama-cpp-python, everything is pretty much the same as with the changes from the aforementioned pull request. The command I used for building is
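A sketch of what such a Dockerfile might contain, based on the description above (the exact base image tag, package list, and build flags are assumptions, not the author's file):

```dockerfile
# Sketch only: image tag and package names are assumptions.
# CUDA 12.2.2 matches NVIDIA driver 535, as discussed above.
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04

# Install Python 3.11 and the compiler needed to rebuild llama-cpp-python
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-venv python3-pip gcc build-essential \
    && rm -rf /var/lib/apt/lists/*

# Recompile llama-cpp-python with cuBLAS so the LLM itself runs on the GPU
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"
ENV FORCE_CMAKE=1
RUN pip install --no-cache-dir --force-reinstall llama-cpp-python
```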
hi @lukaboljevic, thanks for this. I have been struggling with the Docker setup for some time! It fails on the entrypoint file though; could you provide us with that?
Glad to hear it helped. The entrypoint.sh file is given in the pull request I linked above (#1428)
I am pulling my hair out. I came across this thread after I had made my own Dockerfile. PrivateGPT will start, but after many, many hours I cannot, for the life of me, get the GPU recognized in Docker. I have this installed on a Razer notebook with a GTX 1060. Running PrivateGPT on bare metal works fine with GPU acceleration. Repeating the same steps in my Dockerfile gives me a working PrivateGPT, but no GPU acceleration, even though nvidia-smi does work inside the container. I have tried this on my own computer and on RunPod with the same results. I was also not able to build the other Dockerfiles from this thread and the repo. Here is mine. Any additional help would be greatly appreciated. Thanks!
Ok, I got it working. I changed the run command to just a wait timer, then went into the terminal in the container and manually executed `PGPT_PROFILES=local make run`, and it recognized the GPU. PGPT_PROFILES is already one of my environment variables though, so I'm not sure why that helped.
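For anyone wanting to reproduce this workaround, it amounts to something like the following (the container name `private-gpt` is an assumption):

```shell
# In docker-compose.yml, replace the normal startup command with a wait,
# e.g.:
#   command: ["sleep", "infinity"]
# Then start the stack and launch PrivateGPT manually inside the container:
docker compose up -d
docker exec -it private-gpt bash -c "PGPT_PROFILES=local make run"
```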
hey @lukaboljevic, concerning the new docker compose file: is your snippet all that's in there, or did you add the content of the previous file too?
I added the content of the previous file too, i.e. what I wrote in my comment is what I added to the original docker compose file for it to work. Sorry for the late reply.
When I run the docker container I see that the GPU is only being used for the embedding model (encoder), not the LLM.
I noticed that llama-cpp-python is not compiled properly (Notice: BLAS=0), as described in this issue: abetlen/llama-cpp-python#509
I got it to work by setting additional environment variables in the llama-cpp-python install command, as mentioned in this comment: abetlen/llama-cpp-python#509 (comment)
Note: it is important to link to the correct CUDA compilers (matching version!)
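The reinstall command described above is along these lines (the CUDA install path and version here are assumptions; adjust them to match the toolkit actually present in the image):

```shell
# Point the build at the right CUDA compiler; the toolkit version must
# match the one installed in the image (12.2 here is an example).
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc

# Force a recompile of llama-cpp-python with cuBLAS enabled (BLAS=1)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```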