This is a simple demo to show how to use Jina AI to generate embeddings for text data. Then store the embeddings in TiDB Vector Storage and search for similar embeddings.
- A running TiDB Serverless cluster with vector search enabled
- Python 3.8 or later
- Jina AI API key
git clone https://github.com/pingcap/tidb-vector-python.git
cd tidb-vector-python/examples/jina-ai-embeddings-demo
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Get the Jina AI API key from the Jina AI Embedding API page
Get the HOST
, PORT
, USERNAME
, PASSWORD
, DATABASE
, and CA
parameters from the TiDB Cloud console (see Prerequisites), and then replace the following placeholders to get the TIDB_DATABASE_URL
.
export JINA_API_KEY="****"
export TIDB_DATABASE_URL="mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:4000/<DATABASE>?ssl_ca=<CA>&ssl_verify_cert=true&ssl_verify_identity=true"
or create a .env
file with the above environment variables.
$ python jina-ai-embeddings-demo.py
- Inserting Data to TiDB...
- Inserting: Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.
- Inserting: TiDB is an open-source MySQL-compatible database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.
- List All Documents and Their Distances to the Query:
- distance: 0.3585317326132522
content: Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.
- distance: 0.10858102967720984
content: TiDB is an open-source MySQL-compatible database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.
- The Most Relevant Document and Its Distance to the Query:
- distance: 0.10858102967720984
content: TiDB is an open-source MySQL-compatible database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.