CUDA BLAS GPU support for docker image #1405
Hi, can you share your entire Dockerfile, please?
Sure. Note: I changed a couple of things on top of just the compilation environment variables.
@jannikmi I also managed to get PrivateGPT running on the GPU in Docker, while changing the 'original' Dockerfile as little as possible. Starting from the current base Dockerfile, I made changes according to this pull request (which will probably be merged in the future). For me, this solved the issue of PrivateGPT not working in Docker at all: after the changes, everything was running as expected on the CPU. The command I used for building is simply

To get it to work on the GPU, I created a new Dockerfile and docker compose YAML file. The new docker compose file adds the following lines to share the GPU with the container:
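The exact snippet isn't reproduced above; for reference, a typical Docker Compose addition for sharing an NVIDIA GPU with a container looks roughly like this (the service name `private-gpt` is an assumption, not taken from the thread):

```yaml
services:
  private-gpt:
    deploy:
      resources:
        reservations:
          devices:
            # Requires the NVIDIA Container Toolkit on the host
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```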
For the new Dockerfile, I used the nvidia/cuda image, because it's way easier to work with the drivers and toolkits already set up. For everyone reading, please note that I used version 12.2.2 of the CUDA toolkit, because CUDA version 12.2 uses NVIDIA driver version 535, which is what is installed on my host machine. CUDA version 12.3 (which, at the time of writing, is the latest version) uses driver version 545, and I did not want to run into possible driver mismatch issues. Apart from the driver, on the host machine, I have the NVIDIA container toolkit and CUDA toolkit installed. Apart from installing Python 3.11, gcc and rebuilding llama-cpp-python, everything is pretty much the same as with the changes from the aforementioned pull request. The command I used for building is
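A sketch of what such a Dockerfile might contain, based on the description above (the exact base image tag, package list, and build flags are assumptions, not the author's file):

```dockerfile
# Sketch only: image tag and package names are assumptions.
# CUDA 12.2.2 matches NVIDIA driver 535, as discussed above.
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04

# Install Python 3.11 and the compiler needed to rebuild llama-cpp-python
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-venv python3-pip gcc build-essential \
    && rm -rf /var/lib/apt/lists/*

# Recompile llama-cpp-python with cuBLAS so the LLM itself runs on the GPU
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"
ENV FORCE_CMAKE=1
RUN pip install --no-cache-dir --force-reinstall llama-cpp-python
```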
hi @lukaboljevic, thanks for this. I have been struggling with the Docker setup for some time! It fails on the entrypoint file though; could you provide us with that?
Glad to hear it helped. The entrypoint.sh file is given in the pull request I linked above (#1428)
I am pulling my hair out. I came across this thread after I had made my own Dockerfile. PrivateGPT will start, but after many, many hours I cannot, for the life of me, get the GPU recognized in Docker. I have this installed on a Razer notebook with a GTX 1060. Running PrivateGPT on bare metal works fine with GPU acceleration. Repeating the same steps in my Dockerfile gives me a working PrivateGPT, but no GPU acceleration, even though nvidia-smi does work inside the container. I have tried this on my own computer and on RunPod with the same results. I was also not able to build the other Dockerfiles from this thread and the repo. Here is mine. Any additional help would be greatly appreciated. Thanks!
Ok, I got it working. I changed the run command to just a wait timer, then went into the terminal in the container and manually executed `PGPT_PROFILES=local make run`, and it recognized the GPU. PGPT_PROFILES is already one of my environment variables though, so I'm not sure why that helped.
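For anyone wanting to reproduce this workaround, it amounts to something like the following (the container name `private-gpt` is an assumption):

```shell
# In docker-compose.yml, replace the normal startup command with a wait,
# e.g.:
#   command: ["sleep", "infinity"]
# Then start the stack and launch PrivateGPT manually inside the container:
docker compose up -d
docker exec -it private-gpt bash -c "PGPT_PROFILES=local make run"
```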
hey @lukaboljevic, concerning the new docker compose file: is your snippet all that's in there, or did you add the content of the previous file too?
I added the content of the previous file too, i.e. what I wrote in my comment is what I added to the original docker compose file for it to work. Sorry for the late reply.
When I run the docker container I see that the GPU is only being used for the embedding model (encoder), not the LLM.
I noticed that llama-cpp-python is not compiled properly (Notice: BLAS=0), as described in this issue: abetlen/llama-cpp-python#509
I got it to work by setting additional environment variables in the llama-cpp-python install command, as mentioned in this comment: abetlen/llama-cpp-python#509 (comment)
Note: it is important to link to the correct CUDA compilers (matching version!)
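The reinstall command described above is along these lines (the CUDA install path and version here are assumptions; adjust them to match the toolkit actually present in the image):

```shell
# Point the build at the right CUDA compiler; the toolkit version must
# match the one installed in the image (12.2 here is an example).
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc

# Force a recompile of llama-cpp-python with cuBLAS enabled (BLAS=1)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```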