On-Premises Deployment Using NVIDIA NIM microservices with GPUs

You can adapt any example to use on-premises machines and NVIDIA NIM microservices. By completing the additional prerequisites required to get access to the containers and use GPUs with Docker, you can run the examples with local machines that have GPUs and locally deployed microservices instead of NVIDIA API Catalog endpoints.

Prerequisites

  • You have an active subscription to an NVIDIA AI Enterprise product or you are an NVIDIA Developer Program member.

  • Complete the common prerequisites.

    Ensure that you configure the host with the NVIDIA Container Toolkit. A quick verification sketch follows this list.

  • A host with at least two NVIDIA A100, H100, or L40S GPUs.

    You need at least one GPU for the inference container and one GPU for the embedding container. By default, Milvus requires one GPU as well.

  • You have an NGC API key. Refer to Generating NGC API Keys in the NVIDIA NGC User Guide for more information. A sketch for logging Docker in to the NGC registry with the key also follows this list.
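To confirm that the NVIDIA Container Toolkit is configured, a minimal sketch is to run nvidia-smi inside a GPU-enabled container. The CUDA image tag below is an assumption; substitute any tag available to you:

    # Should print the same GPU table as running nvidia-smi directly on the host.
    docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi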
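Before Docker can pull the NIM containers from nvcr.io, log in to the NGC container registry with your API key. This sketch uses the NGC_API_KEY environment variable that you export in the next section; the username is the literal string $oauthtoken:

    # Authenticate Docker against the NGC container registry.
    echo "${NGC_API_KEY}" | docker login nvcr.io --username '$oauthtoken' --password-stdin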

Start the Containers

  1. Export NGC-related environment variables:

    export NGC_API_KEY="M2..."
    
  2. Create a directory to cache the models and export the path to the cache as an environment variable:

    mkdir -p ~/.cache/model-cache
    export MODEL_DIRECTORY=~/.cache/model-cache
  3. Export the connection information for the inference and embedding services:

    export APP_LLM_SERVERURL="nemollm-inference:8000"
    export APP_EMBEDDINGS_SERVERURL="nemollm-embedding:8000"
  4. Start the example-specific containers.

    Replace the path in the following cd command with the path to the example that you want to run. A sketch for monitoring container startup follows this procedure.

    cd RAG/examples/basic_rag/langchain
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d --build

    Example Output

    ✔ Container milvus-minio                           Running
    ✔ Container chain-server                           Running
    ✔ Container nemo-retriever-embedding-microservice  Started
    ✔ Container milvus-etcd                            Running
    ✔ Container nemollm-inference-microservice         Started
    ✔ Container rag-playground                         Started
    ✔ Container milvus-standalone                      Started
    
  5. Optional: Deploy the reranking service if your example needs it. Currently, only the Multi-Turn RAG example requires it.

    export APP_RANKING_SERVERURL="ranking-ms:8000"
    cd RAG/examples/local_deploy
    USERID=$(id -u) docker compose -f docker-compose-nim-ms.yaml up -d ranking-ms
  6. Open a web browser and access http://localhost:8090 to use the RAG Playground.

    Refer to Using the Sample Web Application for information about uploading documents and using the web interface.
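On the first run, the inference and embedding containers can take several minutes to download and load model weights into the model cache. One way to monitor startup, using the container names from the example output in step 4:

    # Poll container status until all containers report Running or healthy.
    watch -n 5 'docker ps --format "table {{.Names}}\t{{.Status}}"'

    # Follow the inference container logs to watch model download and load progress.
    docker logs -f nemollm-inference-microservice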

Tips for GPU Use

When you start the microservices in the local_deploy directory, you can specify which GPUs to use by setting the following environment variables before you run docker compose up. An example assignment follows the list.

  • INFERENCE_GPU_COUNT: Specify the number of GPUs to use with the NVIDIA NIM for LLMs container.

  • EMBEDDING_MS_GPU_ID: Specify the GPU IDs to use with the NVIDIA NeMo Retriever Text Embedding NIM container.

  • RANKING_MS_GPU_ID: Specify the GPU IDs to use with the NVIDIA NeMo Retriever Text Reranking NIM container.

  • VECTORSTORE_GPU_DEVICE_ID: Specify the GPU IDs to use with Milvus.
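For example, on a host with four GPUs, you might dedicate one device to each service before starting the containers. The counts and device IDs below are illustrative assumptions, not required values:

    # One GPU for inference; pin the remaining services to distinct devices.
    export INFERENCE_GPU_COUNT=1
    export EMBEDDING_MS_GPU_ID=1
    export RANKING_MS_GPU_ID=2
    export VECTORSTORE_GPU_DEVICE_ID=3
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d --build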

Related Information

The preceding sections frequently demonstrate using the curl command to interact with the microservices. You can determine the IP address for each container by running:

    docker network inspect nvidia-rag | jq '.[].Containers[] | {Name, IPv4Address}'
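With a container IP address, you can query a service directly. As a sketch, the following checks the readiness endpoint of the inference NIM; the IP address is a placeholder that you replace with the IPv4Address reported by the previous command, and the port matches the APP_LLM_SERVERURL value exported earlier:

    # Replace 172.18.0.5 with the IPv4Address of the nemollm-inference container.
    curl http://172.18.0.5:8000/v1/health/ready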

Next Steps