feat: support tencent vector db #3568

quicksandznzn · 2024-04-17T11:20:19Z

Description

Support Tencent Vector DB

dependencies

tcvectordb==1.3.2

Type of Change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update, included: Dify Document
Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
Dependency upgrade

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

TODO

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods
optional I have made corresponding changes to the documentation
optional I have added tests that prove my fix is effective or that my feature works
optional New and existing unit tests pass locally with my changes

api/.env

api/tests/unit_tests/core/rag/datasource/vdb/tencent/test_tencent.py

docker/docker-compose.yaml

api/config.py

bowenliang123 · 2024-04-17T13:21:24Z

-1 for supporting tencent vdb

Tencent VDB, or Tencent Cloud VectorDB (https://cloud.tencent.com/product/vdb), is not an open-source vector DB, which makes it hard to test against a real instance. The code could easily become stale.
The required tcvectordb Python package is missing from requirements.txt.
The tcvectordb package provides no usage or requirements information on PyPI (https://pypi.org/project/tcvectordb/).
Never put the .env file in the PR.

api/core/rag/datasource/vdb/tencent/tencent_vector.py

crazywoola

The rest looks good to me.

wade30822 · 2024-04-25T07:24:53Z

how is it going?

quicksandznzn · 2024-04-25T07:30:37Z

Waiting for review ~

bowenliang123 · 2024-04-25T08:11:05Z

Please resolve the style violations in the Python code by running dev/reformat.
Move the tests to api/tests/integration_tests/vdb/tcvectordb.
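Because Tencent Cloud VectorDB is not open source, there is no local instance for CI to test against. One possible pattern for the relocated integration tests is to skip them unless connection settings are present. This is a minimal sketch using the standard-library unittest module; the environment variable names are hypothetical (not the PR's config keys), and Dify's own suite may use a different mechanism such as pytest fixtures.

```python
import os
import unittest

# Hypothetical variable names; the PR's actual configuration keys are not shown in this thread.
REQUIRED_ENV = ("TCVECTORDB_TEST_URL", "TCVECTORDB_TEST_API_KEY")


def tencent_vdb_configured(env=os.environ) -> bool:
    """Return True only if every connection setting the integration test needs is non-empty."""
    return all(env.get(name) for name in REQUIRED_ENV)


@unittest.skipUnless(tencent_vdb_configured(), "Tencent Cloud VectorDB credentials not configured")
class TencentVectorIntegrationTest(unittest.TestCase):
    def test_settings_present(self):
        # A real test would create a collection, add texts, and search them here.
        self.assertTrue(tencent_vdb_configured())
```

The skip keeps the suite green on machines without credentials while still running the test wherever a real instance is configured.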

quicksandznzn · 2024-04-25T08:37:28Z

done~

bowenliang123 · 2024-04-25T08:48:22Z

ok, thx.

JohnJyong · 2024-04-25T10:11:37Z

api/core/rag/datasource/vdb/tencent/tencent_vector.py

+                                                                limit=kwargs.get('top_k', 4),
+                                                                timeout=self._client_config.timeout,
+                                                                )
+        return self._get_search_res(res)


The score is not returned, and we have a score threshold check.
Please refer to the code below:

        for doc, score in docs_and_scores:
            score_threshold = kwargs.get("score_threshold", .0) if kwargs.get('score_threshold', .0) else 0.0
            # check score threshold
            if score > score_threshold:
                doc.metadata['score'] = score
                docs.append(doc)
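A self-contained sketch of the threshold check the reviewer describes. The Document dataclass is a minimal stand-in for Dify's model (an assumption), and the threshold is normalized once outside the loop; a missing or None value falls back to 0.0, which is what the kwargs expression in the snippet does.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Minimal stand-in for Dify's Document model (assumption: only these fields matter here).
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(docs_and_scores, score_threshold=None):
    """Keep documents scoring strictly above the threshold and record the score in metadata."""
    # Normalize once, outside the loop: a missing, None, or 0 threshold becomes 0.0.
    threshold = score_threshold or 0.0
    docs = []
    for doc, score in docs_and_scores:
        if score > threshold:
            doc.metadata["score"] = score
            docs.append(doc)
    return docs
```

Writing the score into metadata is what lets downstream code (for example, an agent deciding whether to ask for more information) see how relevant each retrieved chunk was.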

@quicksandznzn

zeroameli · 2024-04-25T10:19:56Z

@quicksandznzn The method search_by_vector() returns results without score_threshold filtering, and tcvectordb won't return the score, so score_threshold won't work as expected. For example, you may want the agent to ask for more information instead of giving an irrelevant reply. It seems to be a problem with the SDK.

JohnJyong · 2024-04-25T10:45:35Z

The SDK is fine; it does return the score.

JohnJyong · 2024-04-25T11:28:31Z

        score_threshold = kwargs.get("score_threshold", .0) if kwargs.get('score_threshold', .0) else 0.0
        return self._get_search_res(res, score_threshold)

    def _get_search_res(self, res, score_threshold):
        docs = []
        if res is None or len(res) == 0:
            return docs

        for result in res[0]:
            meta = result.get(self.field_metadata)
            # fall back to an empty dict so meta['score'] below cannot fail on None
            meta = json.loads(meta) if meta is not None else {}
            score = 1 - result.get("score")
            if score > score_threshold:
                meta['score'] = score
                doc = Document(page_content=result.get(self.field_text), metadata=meta)
                docs.append(doc)
        return docs
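The parsing step above can be exercised outside Dify with a self-contained sketch. Document is a minimal dataclass stand-in, and the field names "text"/"metadata" and the sample hits are hypothetical. Like the reviewed code, it treats 1 - score as the similarity, which assumes the raw score behaves like a distance; it also falls back to an empty dict when a hit has no metadata.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    # Minimal stand-in for Dify's Document model (assumption).
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical field names; the reviewed code reads them from self.field_text / self.field_metadata.
FIELD_TEXT = "text"
FIELD_METADATA = "metadata"


def get_search_res(res, score_threshold=0.0):
    """Turn raw search hits into Documents whose converted score beats the threshold."""
    docs = []
    if not res:
        return docs
    # res holds one list of hits per query vector; only the first query is read.
    for result in res[0]:
        raw_meta = result.get(FIELD_METADATA)
        # Metadata is stored as a JSON string; fall back to {} so meta["score"] cannot fail on None.
        meta = json.loads(raw_meta) if raw_meta is not None else {}
        # Same conversion as the review: 1 - raw score (assumes the raw score acts like a distance).
        score = 1 - result["score"]
        if score > score_threshold:
            meta["score"] = score
            docs.append(Document(page_content=result.get(FIELD_TEXT), metadata=meta))
    return docs
```

Whether 1 - score is the right conversion depends on the metric the collection was created with, so it is worth confirming against the SDK's documentation for that metric.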

quicksandznzn · 2024-04-26T01:43:00Z

Thanks, optimized.

api/core/rag/datasource/vdb/tencent/tencent_vector.py

JohnJyong · 2024-05-07T10:25:21Z

api/core/rag/datasource/vdb/tencent/tencent_vector.py

+        with redis_client.lock(lock_name, timeout=20):
+            collection_exist_cache_key = 'vector_indexing_{}'.format(self._collection_name)
+            if redis_client.get(collection_exist_cache_key):
+                return


self.collection is not initialized here. @quicksandznzn
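The problem can be reproduced with in-memory stand-ins: a threading.Lock plays the role of redis_client.lock and a set plays the role of the 'vector_indexing_{}' cache keys (both assumptions; no Redis is needed). When a second instance finds the cache key already set, it returns before assigning self.collection. Resolving the collection from the database on demand, the direction later taken in the "self.collection change to self._db.collection" commit, removes the dependency on that state. FakeDB and both vector classes are hypothetical.

```python
import threading


class FakeDB:
    """Hypothetical in-memory stand-in for the tcvectordb database handle."""

    def __init__(self):
        self.collections = {}

    def create_collection(self, name):
        self.collections[name] = {"name": name}
        return self.collections[name]

    def collection(self, name):
        return self.collections[name]


_lock = threading.Lock()  # stands in for redis_client.lock(lock_name, timeout=20)
_exists_cache = set()     # stands in for the 'vector_indexing_{}' cache keys


class EarlyReturnVector:
    """Caches the handle on self, so instances that hit the cache never get one."""

    def __init__(self, db, name):
        self._db, self._collection_name = db, name

    def create_collection(self):
        with _lock:
            if self._collection_name in _exists_cache:
                return  # self.collection is never assigned on this path
            self.collection = self._db.create_collection(self._collection_name)
            _exists_cache.add(self._collection_name)


class LookupVector:
    """Resolves the handle from the database on every use instead of caching it."""

    def __init__(self, db, name):
        self._db, self._collection_name = db, name

    def create_collection(self):
        with _lock:
            if self._collection_name in _exists_cache:
                return
            self._db.create_collection(self._collection_name)
            _exists_cache.add(self._collection_name)

    def _collection(self):
        return self._db.collection(self._collection_name)
```

With two EarlyReturnVector instances for the same name, only the first ends up with a collection attribute; with LookupVector, both can reach the collection.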

zeroameli · 2024-05-26T12:38:44Z

@quicksandznzn I found some problems:

dify/api/core/rag/datasource/vdb/tencent/tencent_vector.py

Lines 158 to 161 in a591366

    
    def delete_by_metadata_field(self, key: str, value: str) -> None:
        docs = self._db.collection(self._collection_name).query(filter=Filter(Filter.In(key, [value])))
        if docs and len(docs) > 0:
            self.collection.delete(document_ids=[doc['id'] for doc in docs])

The param limit is needed when querying with a filter; why not use delete with a filter?
Fields in metadata should be indexed if we need to filter by them.

self.collection won't be initialized when multiple threads run this code (for example, when creating a dataset), because of the Redis lock. Why not use self._db.collection(self._collection_name)?
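Both review points can be illustrated with a hypothetical in-memory collection (not the tcvectordb API). In this stand-in, query results are capped at an assumed default limit, so query-then-delete can leave matching documents behind, while a single delete that carries the filter removes them all; filtering on a field that was not declared as indexed is rejected. The class, method names, and default limit of 10 are assumptions for illustration only.

```python
class FakeCollection:
    """Hypothetical in-memory collection; NOT the tcvectordb API.

    Only fields listed in `indexed_fields` may be filtered on, mirroring the review
    point that metadata fields must be indexed before they can be used in a filter.
    """

    def __init__(self, indexed_fields, default_limit=10):
        self.indexed_fields = set(indexed_fields)
        self.default_limit = default_limit  # assumed cap on rows returned by query()
        self.docs = {}

    def upsert(self, pk, **fields):
        self.docs[pk] = {"id": pk, **fields}

    def _matches(self, key, values):
        if key not in self.indexed_fields:
            raise ValueError(f"field {key!r} is not indexed and cannot be filtered on")
        return [d for d in self.docs.values() if d.get(key) in values]

    def query(self, key, values, limit=None):
        # A query returns at most `limit` rows (or the default cap when limit is omitted).
        return self._matches(key, values)[: limit or self.default_limit]

    def delete(self, document_ids=None, filter_key=None, filter_values=None):
        # Delete either by explicit ids or by a filter evaluated inside the store.
        if filter_key is not None:
            ids = [d["id"] for d in self._matches(filter_key, filter_values)]
        else:
            ids = list(document_ids or [])
        for pk in ids:
            self.docs.pop(pk, None)


def delete_via_query(coll, key, value):
    # Pattern from the quoted code: query first, then delete the returned ids.
    docs = coll.query(key, [value])
    if docs:
        coll.delete(document_ids=[d["id"] for d in docs])


def delete_via_filter(coll, key, value):
    # Suggested pattern: a single delete call that carries the filter itself.
    coll.delete(filter_key=key, filter_values=[value])
```

With 25 documents sharing one document_id, delete_via_query removes only the first 10 in this stand-in, while delete_via_filter removes all 25 in one call.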

wade30822 · 2024-06-07T09:27:26Z

it takes so long~~ 😭

quicksandznzn added 2 commits April 17, 2024 19:14

feat: support tencent vdb

324a0ba

optimize:add requirements

75bbfb5

dosubot bot added the size:L (100-499 lines changed, ignoring generated files), dependencies, and 📚 documentation labels Apr 17, 2024

crazywoola requested changes Apr 17, 2024

api/.env (outdated, resolved)

api/tests/unit_tests/core/rag/datasource/vdb/tencent/test_tencent.py (outdated, resolved)

docker/docker-compose.yaml (outdated, resolved)

api/config.py (outdated, resolved)

quicksandznzn added 3 commits April 18, 2024 09:15

remove .env

96aeb34

optimize: test tencent vdb

fe905ea

optimize: config prefix

90dca38

quicksandznzn requested a review from crazywoola April 18, 2024 04:59

crazywoola reviewed Apr 18, 2024

api/core/rag/datasource/vdb/tencent/tencent_vector.py (outdated, resolved)

crazywoola reviewed Apr 18, 2024

crazywoola requested review from takatost and JohnJyong April 18, 2024 10:18

quicksandznzn added 2 commits April 19, 2024 09:08

remove comments

e3d5d2f

Merge branch 'langgenius:main' into main

617fec0

optimize: reformat and move test to api/tests/integration_tests/vdb/tcvectordb

01e27de

Merge branch 'main' into main

a401a73

JohnJyong reviewed Apr 25, 2024

optimize: score_threshold

1ed4926

optimize

a289312

JohnJyong reviewed Apr 29, 2024

api/core/rag/datasource/vdb/tencent/tencent_vector.py (resolved)

quicksandznzn added 2 commits April 29, 2024 16:59

optimize

f3fcc16

Merge branch 'main' into main

dbf22ef

JohnJyong reviewed May 7, 2024

quicksandznzn added 6 commits May 8, 2024 14:13

Merge branch 'langgenius:main' into main

ddee4ed

optimize: remove cache

94c01ce

Merge branch 'main' into main

802d8ec

Merge branch 'langgenius:main' into main

742ea0a

optimize: collection

1a60f37

Merge branch 'main' into main

a591366

quicksandznzn added 3 commits May 27, 2024 13:53

Merge branch 'langgenius:main' into main

4b368a5

optimize: use delete by filter , self.collection change to self._db.collection

78a6cc4

Merge branch 'main' into main

b495e44

quicksandznzn added 2 commits June 13, 2024 18:24

fix

b0c4949

fix

7137ebe

crazywoola previously approved these changes Jun 13, 2024

dosubot bot added the lgtm label (approved by a maintainer) Jun 13, 2024

quicksandznzn added 2 commits June 14, 2024 09:12

Merge branch 'main' of https://github.com/langgenius/dify

d79b025

poetry lock --no-update

a076d2e

quicksandznzn dismissed crazywoola’s stale review via a076d2e June 14, 2024 01:51

fix Conflicts

4f6dd68

crazywoola approved these changes Jun 14, 2024

crazywoola merged commit 4080f7b into langgenius:main Jun 14, 2024
7 checks passed

takatost mentioned this pull request Jun 14, 2024

version to 0.6.11 #5224

Merged

dengpeng pushed a commit to dengpeng/dify that referenced this pull request Jun 16, 2024

feat: support tencent vector db (langgenius#3568)

eed830d

HuberyHuV1 pushed a commit to HuberyHuV1/dify that referenced this pull request Jul 22, 2024

feat: support tencent vector db (langgenius#3568)

445abec
