Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add retireval_top_n to config in env #11132

Merged
merged 3 commits into from
Nov 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion api/.env.example
Original file line number Diff line number Diff line change
Expand Up @@ -410,4 +410,6 @@ POSITION_PROVIDER_EXCLUDES=
# Reset password token expiry minutes
RESET_PASSWORD_TOKEN_EXPIRY_MINUTES=5

CREATE_TIDB_SERVICE_JOB_ENABLED=false
CREATE_TIDB_SERVICE_JOB_ENABLED=false

RETRIEVAL_TOP_N=0
crazywoola marked this conversation as resolved.
Show resolved Hide resolved
2 changes: 2 additions & 0 deletions api/configs/feature/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -621,6 +621,8 @@ class DataSetConfig(BaseSettings):
default=30,
)

RETRIEVAL_TOP_N: int = Field(description="number of retrieval top_n", default=0)


class WorkspaceConfig(BaseSettings):
"""
Expand Down
17 changes: 14 additions & 3 deletions api/core/rag/datasource/retrieval_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@

from flask import Flask, current_app

from configs import DifyConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong way to use the config object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I made a stupid mistake and caused a bad impact on your, but the revised submission

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ProseGuys We reverted this change in this version and reopened the related issue.

from core.rag.data_post_processor.data_post_processor import DataPostProcessor
from core.rag.datasource.keyword.keyword_factory import Keyword
from core.rag.datasource.vdb.vector_factory import Vector
Expand Down Expand Up @@ -110,8 +111,12 @@ def retrieve(
str(dataset.tenant_id), reranking_mode, reranking_model, weights, False
)
all_documents = data_post_processor.invoke(
query=query, documents=all_documents, score_threshold=score_threshold, top_n=top_k
query=query,
documents=all_documents,
score_threshold=score_threshold,
top_n=DifyConfig.RETRIEVAL_TOP_N or top_k,
)

return all_documents

@classmethod
Expand Down Expand Up @@ -178,7 +183,10 @@ def embedding_search(
)
all_documents.extend(
data_post_processor.invoke(
query=query, documents=documents, score_threshold=score_threshold, top_n=len(documents)
query=query,
documents=documents,
score_threshold=score_threshold,
top_n=DifyConfig.RETRIEVAL_TOP_N or len(documents),
)
)
else:
Expand Down Expand Up @@ -220,7 +228,10 @@ def full_text_index_search(
)
all_documents.extend(
data_post_processor.invoke(
query=query, documents=documents, score_threshold=score_threshold, top_n=len(documents)
query=query,
documents=documents,
score_threshold=score_threshold,
top_n=DifyConfig.RETRIEVAL_TOP_N or len(documents),
)
)
else:
Expand Down
1 change: 1 addition & 0 deletions docker/docker-compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -287,6 +287,7 @@ x-shared-env: &shared-api-worker-env
OCEANBASE_CLUSTER_NAME: ${OCEANBASE_CLUSTER_NAME:-difyai}
OCEANBASE_MEMORY_LIMIT: ${OCEANBASE_MEMORY_LIMIT:-6G}
CREATE_TIDB_SERVICE_JOB_ENABLED: ${CREATE_TIDB_SERVICE_JOB_ENABLED:-false}
RETRIEVAL_TOP_N: ${RETRIEVAL_TOP_N:-0}

services:
# API service
Expand Down