Can we fix the problem of acronyms? #9178
Replies: 3 comments 1 reply
-
🤖 Hi @amitguptadumka, nice to see you again! Hope you're doing well.

To improve LlamaIndex's understanding and retrieval of results for queries containing acronyms like "GBQ" for "Google Big Query", you can modify the `_extract_keywords` and `_async_extract_keywords` methods of the `KeywordTableIndex` class. Here's how you can modify the class:

    # Import paths assume the llama_index 0.9.x package layout.
    from typing import Set

    from llama_index.indices.keyword_table.base import BaseKeywordTableIndex
    from llama_index.indices.keyword_table.utils import extract_keywords_given_response

    # Mapping of acronyms to their full forms
    ACRONYM_MAPPING = {
        "GBQ": "Google Big Query",
        # Add more acronyms here
    }


    class KeywordTableIndex(BaseKeywordTableIndex):
        """Keyword Table Index.

        This index uses a GPT model to extract keywords from the text.
        """

        def _extract_keywords(self, text: str) -> Set[str]:
            """Extract keywords from text."""
            response = self._service_context.llm_predictor.predict(
                self.keyword_extract_template,
                text=text,
            )
            keywords = extract_keywords_given_response(response, start_token="KEYWORDS:")
            # Replace acronyms with their full forms. Keywords may come back
            # lowercased, so normalize case before the lookup.
            return {ACRONYM_MAPPING.get(keyword.upper(), keyword) for keyword in keywords}

        async def _async_extract_keywords(self, text: str) -> Set[str]:
            """Extract keywords from text."""
            response = await self._service_context.llm_predictor.apredict(
                self.keyword_extract_template,
                text=text,
            )
            keywords = extract_keywords_given_response(response, start_token="KEYWORDS:")
            # Same acronym replacement as in the synchronous method.
            return {ACRONYM_MAPPING.get(keyword.upper(), keyword) for keyword in keywords}

This way, an acronym extracted as a keyword is replaced with its full form before it is used to index into the keyword table. This should improve the ability of LlamaIndex to understand and retrieve results for queries containing acronyms.

You can find the current implementation of the class in the LlamaIndex source. Please let me know if you need further assistance.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. If you want to continue the conversation, start your reply with @dosu-bot.
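The acronym-mapping idea above can also be applied to the raw query string before it reaches any retriever, which may be easier to try when the index is not a keyword table. The following is only a minimal sketch in plain Python: `ACRONYMS` and `expand_acronyms` are hypothetical names for illustration, not part of the LlamaIndex API, and the mapping has to be maintained by hand.

```python
import re
from typing import Dict

# Hypothetical mapping; fill in your own domain acronyms.
ACRONYMS: Dict[str, str] = {
    "GBQ": "Google Big Query",
}


def expand_acronyms(query: str, mapping: Dict[str, str] = ACRONYMS) -> str:
    """Append the full form in parentheses after each known acronym."""
    if not mapping:
        return query
    # Match whole words only, ignoring case, so "gbq" is found as well.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in mapping) + r")\b",
        re.IGNORECASE,
    )
    upper_map = {key.upper(): value for key, value in mapping.items()}

    def _replace(match: "re.Match[str]") -> str:
        acronym = match.group(0)
        return f"{acronym} ({upper_map[acronym.upper()]})"

    return pattern.sub(_replace, query)


print(expand_acronyms("What is GBQ?"))  # What is GBQ (Google Big Query)?
```

The acronym is kept next to its expansion, so documents that use either form still have a chance to match. The expanded string would then be passed to the query engine in place of the original query.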
-
@dosu-bot
-
@Disiok Any help here please?
-
Hi Community,
One problem I am facing during retrieval is that when the query contains the acronym of a term, it does not work as expected.
For example, if my query is "What is GBQ?" and my document does not contain "GBQ" but does contain "Google Big Query", retrieval fails to make the connection.
The expectation is that it should understand abbreviations and fetch similar results for both "What is GBQ?" and "What is Google Big Query?".
Is there any way to achieve this? I have many such scenarios and this is causing a lot of noise in retrieval.
PS: I am using Elasticsearch as the vector DB.
Thanks
Amit