feat: support tencent vector db #3568

Merged
merged 34 commits into from
Jun 14, 2024

Contributor

@quicksandznzn quicksandznzn commented Apr 17, 2024

Description

Support Tencent Vector DB

dependencies

tcvectordb==1.3.2

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update, included: Dify Document
  • Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
  • Dependency upgrade

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • TODO

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods
  • optional I have made corresponding changes to the documentation
  • optional I have added tests that prove my fix is effective or that my feature works
  • optional New and existing unit tests pass locally with my changes

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. dependencies Pull requests that update a dependency file 📚 documentation Improvements or additions to documentation labels Apr 17, 2024
api/.env (outdated review thread, resolved)
docker/docker-compose.yaml (outdated review thread, resolved)
api/config.py (outdated review thread, resolved)
@bowenliang123
Contributor

bowenliang123 commented Apr 17, 2024

-1 for supporting Tencent VDB

  1. Tencent VDB, i.e. Tencent Cloud VectorDB (https://cloud.tencent.com/product/vdb), is not an open-source vector database, so it is hard to test against a real instance. The code could easily become unmaintained.
  2. The required tcvectordb Python package is missing from requirements.txt.
  3. The tcvectordb package provides no usage or requirements information on the public PyPI repository; see https://pypi.org/project/tcvectordb/
  4. Never include a .env file in a PR.

Member

@crazywoola crazywoola left a comment

The rest looks good to me.

@wade30822

how is it going?

@quicksandznzn
Contributor Author

how is it going?

Waiting for review ~

@bowenliang123
Contributor

  1. Please resolve the style violations in the Python code by running dev/reformat.
  2. Move the tests to api/tests/integration_tests/vdb/tcvectordb

@quicksandznzn
Contributor Author

  1. Please resolve the style violations in the Python code by running dev/reformat.
  2. Move the tests to api/tests/integration_tests/vdb/tcvectordb

done~

@bowenliang123
Contributor

ok, thx.

    limit=kwargs.get('top_k', 4),
    timeout=self._client_config.timeout,
)
return self._get_search_res(res)
Contributor

The score is not returned, but we have a score threshold check.
Please refer to the code below:

        # Resolve the threshold once; a missing or None value counts as 0.0.
        score_threshold = kwargs.get("score_threshold") or 0.0
        for doc, score in docs_and_scores:
            # check score threshold
            if score > score_threshold:
                doc.metadata['score'] = score
                docs.append(doc)

@zeroameli
Contributor

@quicksandznzn The search_by_vector() method returns results without score_threshold filtering, and tcvectordb doesn't return the score, so score_threshold won't work as expected. This matters when, for example, you want the agent to ask for more information instead of giving an irrelevant reply. It looks like a problem in the SDK.
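
The filtering described here can be sketched in isolation. This is a minimal illustration, not the code from this PR: `Document` is a stand-in dataclass, `filter_by_score` is a hypothetical helper name, and the sketch assumes the search step already yields `(doc, score)` pairs where a higher score means a closer match.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for the project's Document class (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(docs_and_scores, **kwargs):
    """Keep only documents whose score is strictly above the threshold.

    A missing or None `score_threshold` is treated as 0.0.
    """
    score_threshold = kwargs.get("score_threshold") or 0.0
    docs = []
    for doc, score in docs_and_scores:
        if score > score_threshold:
            # Attach the score so callers can inspect or re-rank later.
            doc.metadata["score"] = score
            docs.append(doc)
    return docs


hits = [(Document("relevant"), 0.91), (Document("off-topic"), 0.12)]
kept = filter_by_score(hits, score_threshold=0.5)
print([d.page_content for d in kept])
```

Without a score from the search step there is nothing to compare against the threshold, which is why the low-scoring hit would otherwise reach the agent.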

@JohnJyong
Contributor

@quicksandznzn The search_by_vector() method returns results without score_threshold filtering, and tcvectordb doesn't return the score, so score_threshold won't work as expected. This matters when, for example, you want the agent to ask for more information instead of giving an irrelevant reply. It looks like a problem in the SDK.

The SDK is fine; it does return the score.

@JohnJyong
Contributor

        # Tail of search_by_vector(): a missing or None threshold counts as 0.0.
        score_threshold = kwargs.get("score_threshold") or 0.0
        return self._get_search_res(res, score_threshold)

    def _get_search_res(self, res, score_threshold):
        docs = []
        if res is None or len(res) == 0:
            return docs

        for result in res[0]:
            meta = result.get(self.field_metadata)
            # Fall back to an empty dict so the score assignment below cannot fail on None.
            meta = json.loads(meta) if meta is not None else {}
            # Invert the raw value returned by the SDK before comparing it with the threshold.
            score = 1 - result.get("score")
            if score > score_threshold:
                meta['score'] = score
                doc = Document(page_content=result.get(self.field_text), metadata=meta)
                docs.append(doc)
        return docs
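
To try this logic outside the vector store class, here is a self-contained sketch. The result shape (a list of result batches, each hit a dict with `metadata`, `text` and `score` keys) is assumed from the snippet above rather than taken from the SDK documentation, and `Document` and `parse_hits` are stand-in names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for the project's Document class (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def parse_hits(res, score_threshold=0.0, field_metadata="metadata", field_text="text"):
    """Turn raw hits into Documents, dropping those at or below the threshold."""
    docs = []
    if not res:
        return docs
    for hit in res[0]:
        raw_meta = hit.get(field_metadata)
        # Metadata is assumed to be stored as a JSON string; tolerate its absence.
        meta = json.loads(raw_meta) if raw_meta is not None else {}
        # A hit without a score is treated as the worst case (1.0 before inversion).
        score = 1 - hit.get("score", 1.0)
        if score > score_threshold:
            meta["score"] = score
            docs.append(Document(page_content=hit.get(field_text), metadata=meta))
    return docs


res = [[
    {"text": "close match", "metadata": '{"doc_id": "a"}', "score": 0.1},
    {"text": "no metadata", "score": 0.2},
    {"text": "far away", "metadata": '{"doc_id": "c"}', "score": 0.9},
]]
for doc in parse_hits(res, score_threshold=0.5):
    print(doc.page_content, doc.metadata)
```

The second hit has no metadata field at all; with the empty-dict fallback it still yields a Document carrying only the score.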

@quicksandznzn
Contributor Author

Thanks, optimized.

with redis_client.lock(lock_name, timeout=20):
    collection_exist_cache_key = 'vector_indexing_{}'.format(self._collection_name)
    if redis_client.get(collection_exist_cache_key):
        return
Contributor

self.collection is not initialized here @quicksandznzn

Contributor Author

Fixed.

@zeroameli
Contributor

@quicksandznzn I found some problems:

def delete_by_metadata_field(self, key: str, value: str) -> None:
    docs = self._db.collection(self._collection_name).query(filter=Filter(Filter.In(key, [value])))
    if docs and len(docs) > 0:
        self.collection.delete(document_ids=[doc['id'] for doc in docs])

  • The limit parameter is required when querying with a filter. Why not call delete with the filter directly?
  • Metadata fields must be indexed if we need to filter on them.
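
The first point can be illustrated with an in-memory stand-in. `FakeCollection` below is hypothetical and does not mirror the tcvectordb API; its `query` applies a default `limit` of 2 purely for illustration. The sketch shows why querying IDs and then deleting them can leave matching documents behind, while a delete that takes the filter directly does not.

```python
class FakeCollection:
    """Hypothetical in-memory collection; NOT the tcvectordb client."""

    def __init__(self, docs):
        self._docs = {d["id"]: d for d in docs}

    def query(self, key, value, limit=2):
        # Returns at most `limit` matches, like a paginated query would.
        matches = [d for d in self._docs.values() if d.get(key) == value]
        return matches[:limit]

    def delete_ids(self, document_ids):
        for doc_id in document_ids:
            self._docs.pop(doc_id, None)

    def delete_where(self, key, value):
        # Removes every document matching the filter in a single pass.
        self._docs = {i: d for i, d in self._docs.items() if d.get(key) != value}

    def count(self):
        return len(self._docs)


def make_docs():
    return [{"id": str(i), "dataset_id": "ds1"} for i in range(5)]


# Query-then-delete: only the first `limit` matches are removed.
two_step = FakeCollection(make_docs())
two_step.delete_ids([d["id"] for d in two_step.query("dataset_id", "ds1")])
print(two_step.count())  # 3 documents remain

# Delete-by-filter: every match is removed in one call.
one_step = FakeCollection(make_docs())
one_step.delete_where("dataset_id", "ds1")
print(one_step.count())  # 0 documents remain
```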

self.collection won't be initialized in multi-threaded scenarios (for example, when creating a dataset) because of the Redis lock. Why not use self._db.collection(self._collection_name) instead?
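
A minimal sketch of the situation described here, using an in-process set and `threading.Lock` in place of Redis. All names (`FakeDB`, `EagerStore`, `LazyStore`, `_created_cache`) are hypothetical. `EagerStore` only sets `self.collection` on the creation path, so a second instance that hits the cache and returns early is left without it; `LazyStore` looks the collection up by name on each use instead.

```python
import threading

_lock = threading.Lock()
_created_cache = set()  # stands in for the Redis "collection exists" key


class FakeDB:
    """Hypothetical database handle that hands out collections by name."""

    def __init__(self):
        self._collections = {}

    def create_collection(self, name):
        return self._collections.setdefault(name, [])

    def collection(self, name):
        return self._collections[name]


class EagerStore:
    def __init__(self, db, name):
        self._db, self._name = db, name
        self.collection = None

    def create(self):
        with _lock:
            if self._name in _created_cache:
                return  # early return: self.collection stays None
            self.collection = self._db.create_collection(self._name)
            _created_cache.add(self._name)


class LazyStore(EagerStore):
    def add(self, item):
        # Look the collection up by name instead of relying on self.collection.
        self._db.collection(self._name).append(item)


db = FakeDB()
first, second = EagerStore(db, "docs"), LazyStore(db, "docs")
first.create()
second.create()  # hits the cache and returns early
print(second.collection is None)
second.add("chunk-1")
print(db.collection("docs"))
```

Resolving the handle from the database each time trades a small lookup cost for not depending on which instance happened to run the creation path.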

@wade30822

it takes so long~~ 😭

crazywoola previously approved these changes Jun 13, 2024
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 13, 2024
@crazywoola crazywoola merged commit 4080f7b into langgenius:main Jun 14, 2024
7 checks passed
@takatost takatost mentioned this pull request Jun 14, 2024
dengpeng pushed a commit to dengpeng/dify that referenced this pull request Jun 16, 2024
Scorpion1221 added a commit to yybht155/dify that referenced this pull request Jun 26, 2024
HuberyHuV1 pushed a commit to HuberyHuV1/dify that referenced this pull request Jul 22, 2024
Labels
dependencies Pull requests that update a dependency file 📚 documentation Improvements or additions to documentation lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
6 participants