Imagine a two-dimensional space (see diagram below), with fruitiness on the y-axis, and techiness on the x-axis. ‘Banana’ and ‘grape’ score high on fruitiness and low on techiness. Vice versa for ‘Microsoft’ and ‘Google’. ‘Apple,’ on the other hand, is more complex; it’s both a tech giant and a fruit. We need to add dimensions that capture more meaning and context to properly represent it. In fact, each term in an LLM typically occupies a unique position across thousands of dimensions, with each dimension representing a unique aspect of meaning. (Beyond two or three dimensions, a vector space becomes too complex for humans to comprehend.) Each word or word-part’s position in this “latent space” encodes its meaning.

![Simplified representation of vector embeddings (on 2 dimensions) and (inset) knowledge graph](../assets/use_cases/kg_ontologies/fruit-techiness.jpeg)

*Simplified representation of vector embeddings (on 2 dimensions) and (inset) knowledge graph*
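
To make the picture concrete, here is a minimal sketch using made-up two-dimensional coordinates (fruitiness, techiness), not real model outputs; real embeddings have thousands of dimensions, but the similarity arithmetic is the same:

```python
import numpy as np

# Made-up 2-D coordinates: (fruitiness, techiness).
terms = {
    "banana":    np.array([0.9, 0.1]),
    "grape":     np.array([0.8, 0.1]),
    "microsoft": np.array([0.1, 0.9]),
    "google":    np.array([0.1, 0.8]),
    "apple":     np.array([0.7, 0.8]),   # both a fruit and a tech giant
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(terms["apple"], terms["banana"]))     # fairly high
print(cosine_similarity(terms["apple"], terms["microsoft"]))  # also high
print(cosine_similarity(terms["banana"], terms["google"]))    # low
```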

An LLM is basically a compression of the web. The LLM reads documents on the web and tries to predict the next word. Using the transformer architecture, it looks back at how each word relates to the words around it, and creates embedding vectors.
Here's how you do it:
**Extract relevant nodes**

```python
# Begin by pulling all the nodes that you wish to index
# from your Knowledge Graph, including their descriptions:

# (assumption: the dc: prefix is already bound on rdflib_graph,
#  e.g. via rdflib_graph.bind("dc", DC))
rows = rdflib_graph.query('SELECT * WHERE {?uri dc:description ?desc}')
```
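
The result can then be materialised and inspected, for example like this (a small usage sketch; it assumes rdflib exposes the `?uri` and `?desc` variables as row attributes, which it does for named SPARQL variables):

```python
# Materialise the SPARQL result so it can be iterated more than once
# and indexed by position later.
rows = list(rows)

for row in rows[:3]:
    print(row.uri, "-", row.desc[:80])
```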

**Generate embedding vectors**

```python
# Employ your large language model to create an embedding
# vector for the description of each node:

node_embedding = openai.Embedding.create(input=row.desc, model=model)['data'][0]['embedding']
```
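
In practice you would loop over all the extracted rows and keep each node's vector alongside its URI. Here is one possible sketch; the `node_embeddings` and `node_uris` names are illustrative, and `model` is assumed to be an embedding model such as `"text-embedding-ada-002"`:

```python
node_embeddings = []
node_uris = []

for row in rows:
    emb = openai.Embedding.create(input=row.desc, model=model)['data'][0]['embedding']
    node_embeddings.append(emb)
    node_uris.append(str(row.uri))
```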

**Store embeddings in a vector store**
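
A minimal sketch of this step, assuming FAISS as the vector store and the `node_embeddings` list from the previous sketch; any approximate-nearest-neighbour index works the same way:

```python
import faiss
import numpy as np

dimension = len(node_embeddings[0])   # e.g. 1536 for text-embedding-ada-002
index = faiss.IndexFlatL2(dimension)  # exact L2 index over node embeddings

for embedding in node_embeddings:
    index.add(np.array([embedding], dtype="float32"))
```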
**Query with natural language**

```python
# When a user poses a question in natural language, convert the query into an
# embedding vector using the same language model. Then, leverage the vector store
# to find the nodes whose embeddings are closest to the query vector (i.e.,
# highest cosine similarity):

question_embedding = openai.Embedding.create(input=question, model=model)['data'][0]['embedding']

# Retrieve the 100 nearest node embeddings.
# (If the index is FAISS, the query must first be wrapped as a 2-D float32
#  array, e.g. np.array([question_embedding], dtype="float32").)
d, i = index.search(question_embedding, 100)
```

**Semantic post-processing**

```python
# To further enhance the user experience, apply post-processing techniques to the
# retrieved related nodes. This step refines the results and presents information
# in a way that best provides users with actionable insights.
```
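
One possible sketch of such post-processing, assuming the FAISS-style results above, the materialised `rows` list from the earlier sketches, and an OpenAI chat model for summarisation; the model name and prompt wording are illustrative:

```python
# Map the nearest-neighbour indices back to KG nodes, then let the LLM
# compose an answer grounded in those nodes' descriptions.
top_matches = [rows[idx] for idx in i[0][:5]]
context = "\n".join(f"- {row.uri}: {row.desc}" for row in top_matches)

answer = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Answer the question using only the supplied knowledge-graph nodes."},
        {"role": "user",
         "content": f"{question}\n\nRelevant nodes:\n{context}"},
    ],
)["choices"][0]["message"]["content"]
```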

For **example**, I pass the description text of my KG’s “Jennifer Aniston” node to my LLM and store the resulting embedding, so that my discrete KG node (representing “Jennifer Aniston”) is now linked to the Jennifer Aniston textual description’s position in embedding vector space (in the LLM). After this, when a user queries for “Jennifer Aniston”, I can turn the query into an embedding vector, locate the closest embedding vectors in the continuous vector space, map them back to the related node within the discrete KG, and return a relevant result.