Replies: 2 comments
-
@azurewtl the current tree index has not been touched/maintained in quite some time 😓 It is in need of quite a lot of refactoring tbh. If you are ambitious enough to tackle it, it would be appreciated 🙏🏻 tbh I haven't even gone through all the code in there lol
-
Generalized from the original topic
The goal here is to build a hierarchical knowledge base structure that helps the retriever find the MOST relevant chunks when given a bunch of documents in folders, where the folder structure itself carries a lot of useful hierarchical meta info. My current approach would be:
Existing Approaches I Have Researched
While evaluating my ambition (I was surprised by how comprehensive the existing features are), I found 4 existing modules which may construct such a hierarchical knowledge base.
-
After skimming through the code for `TreeIndex`, I think the idea is brilliant. However, I am a bit confused by the current implementation.

To keep track of the tree structure, it uses a dict in `TreeIndex`, which seems to serve the same function as `node relationship`, and the latter is more intuitive in my opinion.

Additionally, the current implementation of GPTTreeIndexBuilder merely merges the input `nodes/documents` until the number hits the `num_children` parameter, regardless of their metadata.

I think the nodes should be merged primarily based on `raw document file path`, so that nodes that actually come from the same file end up under the same parent.

I am thinking about reimplementing the tree index behavior using raw_documents. The splitter should take each document as a whole and split it based on its path hierarchy and paragraphs. A tree structure should be generated while the documents are being split, to preserve the natural knowledge structure of the folders.
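
To make the idea concrete, here is a rough sketch in plain Python of what I mean by grouping on file path. It does not use any llama_index classes: `TreeNode` and `build_path_tree` are hypothetical names made up for illustration, `raw_documents` is assumed to be a simple mapping of relative path to text, and paragraphs are assumed to be separated by blank lines.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class TreeNode:
    """Hypothetical node type for illustration; not a llama_index class."""
    name: str
    text: Optional[str] = None  # only leaf chunks carry text
    # repr=False avoids infinite recursion through parent <-> children links
    parent: Optional["TreeNode"] = field(default=None, repr=False)
    children: List["TreeNode"] = field(default_factory=list)

    def add_child(self, child: "TreeNode") -> "TreeNode":
        child.parent = self
        self.children.append(child)
        return child


def build_path_tree(raw_documents: Dict[str, str]) -> TreeNode:
    """Build folder -> file -> paragraph nodes from {relative_path: text}."""
    root = TreeNode(name="")
    # Build-time lookup only; the tree itself lives in the parent/children links.
    by_path: Dict[Tuple[str, ...], TreeNode] = {(): root}
    for path in sorted(raw_documents):
        parts = PurePosixPath(path).parts
        # Create any missing folder/file nodes along the path.
        for depth in range(1, len(parts) + 1):
            key = parts[:depth]
            if key not in by_path:
                parent = by_path[parts[:depth - 1]]
                by_path[key] = parent.add_child(TreeNode(name=parts[depth - 1]))
        file_node = by_path[parts]
        # Assumption: paragraphs are separated by blank lines.
        paragraphs = [p.strip() for p in raw_documents[path].split("\n\n") if p.strip()]
        for i, para in enumerate(paragraphs):
            file_node.add_child(TreeNode(name=f"{parts[-1]}#{i}", text=para))
    return root


docs = {
    "guides/install/linux.md": "Use apt.\n\nThen reboot.",
    "guides/install/mac.md": "Use brew.",
    "faq.md": "Q1\n\nQ2",
}
tree = build_path_tree(docs)
```

With this layout, every chunk of `linux.md` shares one parent, and `linux.md` and `mac.md` are siblings under `install`, so the folder hierarchy is preserved instead of being decided by a fixed group size.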