enhancement (json): enhance JSON Process Tool extraction to return structured messages and improve error handling #10575

Open
BenjaminX wants to merge 10 commits into base: main

Conversation

@BenjaminX
Contributor

BenjaminX commented Nov 12, 2024

Checklist:

Important

Please review the checklist below before submitting your pull request.

  • Please open an issue before creating a PR or link to an existing issue
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

Description

Link issue #10559

Demo data:

{ 
    "Tables": [
        { 
            "Name": "123Name", 
            "ID": "1", 
            "DDL": "rewrew", 
            "QAs": "rewrewfew", 
            "SQLs": "fwrfre", 
            "Memo": "freferfre" 
        }, 
        { 
            "Name": "321Name", 
            "ID": "2", 
            "DDL": "fdsfdsfdswer", 
            "QAs": "32423r3", 
            "SQLs": "654654g54", 
            "Memo": "54332423423" 
        } 
    ] 
}

JSONPath filters:
$.Tables[*]

Expected

{
  "text": "[{\"Name\": \"123Name\", \"ID\": \"1\", \"DDL\": \"rewrew\", \"QAs\": \"rewrewfew\", \"SQLs\": \"fwrfre\", \"Memo\": \"freferfre\"}, {\"Name\": \"321Name\", \"ID\": \"2\", \"DDL\": \"fdsfdsfdswer\", \"QAs\": \"32423r3\", \"SQLs\": \"654654g54\", \"Memo\": \"54332423423\"}]\n",
  "files": [],
  "json": [
    {
      "0": {
        "Name": "123Name",
        "ID": "1",
        "DDL": "rewrew",
        "QAs": "rewrewfew",
        "SQLs": "fwrfre",
        "Memo": "freferfre"
      },
      "1": {
        "Name": "321Name",
        "ID": "2",
        "DDL": "fdsfdsfdswer",
        "QAs": "32423r3",
        "SQLs": "654654g54",
        "Memo": "54332423423"
      }
    }
  ]
}

Actual

{
  "text": "[{\"Name\": \"123Name\", \"ID\": \"1\", \"DDL\": \"rewrew\", \"QAs\": \"rewrewfew\", \"SQLs\": \"fwrfre\", \"Memo\": \"freferfre\"}, {\"Name\": \"321Name\", \"ID\": \"2\", \"DDL\": \"fdsfdsfdswer\", \"QAs\": \"32423r3\", \"SQLs\": \"654654g54\", \"Memo\": \"54332423423\"}]",
  "files": [],
  "json": []
}

Previously the tool did not parse JSON array objects, so "json" came back empty. This PR fixes that so the objects matched by the filter are returned.
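
The gap between Expected and Actual can be illustrated with a small sketch. It assumes the JSONPath `$.Tables[*]` has already been resolved to the list of matching objects, and that a list result is re-keyed by index to form the single "json" entry, as the Expected output above suggests. The name `build_result` is hypothetical and is not part of the tool.

```python
import json


def build_result(matches: list, ensure_ascii: bool = True) -> dict:
    """Hypothetical helper: shape matched objects like the Expected output."""
    # Text form: the matches serialized as a JSON array plus a trailing newline.
    text = json.dumps(matches, ensure_ascii=ensure_ascii) + "\n"
    # Structured form: the list re-keyed by its string index ("0", "1", ...).
    indexed = {str(i): item for i, item in enumerate(matches)}
    return {"text": text, "files": [], "json": [indexed]}


# Shortened version of the demo data from the description.
demo = {
    "Tables": [
        {"Name": "123Name", "ID": "1"},
        {"Name": "321Name", "ID": "2"},
    ]
}
result = build_result(demo["Tables"])
```

In this sketch the "json" field is never empty when the filter matches something, which is the behavior the Expected block asks for.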

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update, included: Dify Document
  • Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
  • Dependency upgrade

Testing Instructions

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Test A
  • Test B

@dosubot (bot) added the size:S (this PR changes 10-29 lines, ignoring generated files) and 💪 enhancement (new feature or request) labels on Nov 12, 2024
@BenjaminX
Contributor Author

BenjaminX commented Nov 14, 2024

@crazywoola
Could you please review the code and check it?
Please help me merge it into the main branch.

If you have any questions about the code, please let me know.
Thanks a lot.

@BenjaminX
Contributor Author

@crazywoola
Do you have any concerns about this PR?

@PedroGomes02
Contributor

Hi there,
It seems you're looking for a tool that outputs the entire JSON representation of an object, allowing it to be used in subsequent workflow nodes. Is that correct? Essentially, the tool would return the full JSON object in its entirety.
In my opinion, two changes would help:

  • Rename JSON Parse to JSON Parse & Extractor: this updated tool will always return the full JSON object (parse) and, if a JSONPath is provided, extract and return the corresponding values (extractor), which is the current behavior. This ensures the tool serves both purposes without disrupting existing workflows, maintaining backward compatibility while adding clarity and functionality (a real JSON parse of the string).
  • Introduce a JSON Filter Tool: this new tool would perform the opposite of JSON Delete by returning a JSON object (or its text representation) that contains only the fields matching the specified JSONPath. This complements JSON Parse & Extractor by offering more fine-grained filtering options.

With these updates, the toolset would address a wider range of needs while remaining easy to use. Let me know your thoughts on this proposal!
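
As a rough illustration of the proposed JSON Filter Tool, here is a minimal sketch that keeps only the field at a given path and drops everything else. It assumes a simple dotted path through nested objects rather than full JSONPath, and the name `keep_path` is hypothetical; nothing like it exists in the tool today.

```python
from typing import Any


def keep_path(doc: dict[str, Any], path: str) -> dict[str, Any]:
    """Hypothetical filter: return a copy of `doc` holding only the field at `path`."""
    keys = path.split(".")
    result: dict[str, Any] = {}
    src: Any = doc
    dst = result
    for i, key in enumerate(keys):
        # A missing key (or a non-object along the way) means nothing matches.
        if not isinstance(src, dict) or key not in src:
            return {}
        if i == len(keys) - 1:
            dst[key] = src[key]  # copy the matched value
        else:
            dst[key] = {}  # rebuild the enclosing object
            src, dst = src[key], dst[key]
    return result


filtered = keep_path({"a": {"b": 1, "c": 2}, "d": 3}, "a.b")
```

Here `filtered` keeps the enclosing structure of the matched field, whereas a delete operation would remove that field and keep the rest.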

@BenjaminX
Contributor Author

BenjaminX commented Nov 19, 2024


Hi Pedro,
'It seems you're looking for a tool that outputs the entire JSON representation of an object, allowing it to be used in subsequent workflow nodes. Is that correct? Essentially, the tool would return the full JSON object in its entirety.
In my opinion:
Rename JSON Parse to JSON Parse & Extractor:
This updated tool will always return the full JSON object (parse) and, if a JSONPath is provided, extract and return the corresponding values (extractor), current behavior.'

Yes, absolutely correct.

As you said, backward compatibility matters. I also agree that making a new JSON Parse & Extractor tool might be a better choice than modifying the current one.

I will modify this part of the code, adding a JSON Extractor to the JSON Process Tool while retaining the existing four functions: Parse, Insert, Delete, Replace.

@crazywoola do you have any comments or suggestions about this?

@dosubot (bot) added the size:L label (this PR changes 100-499 lines, ignoring generated files) and removed the size:S label (this PR changes 10-29 lines, ignoring generated files) on Nov 22, 2024
@BenjaminX
Contributor Author

@crazywoola
I revised the implementation entirely according to @PedroGomes02's suggestion. Please review the code.

Thanks, PedroGomes02.

@PedroGomes02
Contributor

Hi, my original idea was to adapt the parse tool to:

  • Return the filtered JSONPath text message, as it currently does (backward compatibility), but make this optional (with the "json_filter" parameter not required in parse.yaml).
  • Additionally, I propose adding a new boolean parameter in parse.yaml to optionally return the fully parsed JSON.

This way, we can use the parse tool to filter/extract specific information, to return the fully parsed JSON, or both.
I think the introduction of a new JSON Filter Tool (returning a JSON object, or its text representation, that contains only the fields matching the specified JSONPath) should be done separately from this one.

parse.py

    def _invoke(
        self,
        user_id: str,
        tool_parameters: dict[str, Any],
    ) -> Union[ToolInvokeMessage, list[ToolInvokeMessage]]:
        """
        invoke tools
        """
        # get tool parameters
        content = tool_parameters.get("content", "")
        json_filter = tool_parameters.get("json_filter", "")
        ensure_ascii = tool_parameters.get("ensure_ascii", True)
        output_full_parsed_json = tool_parameters.get("output_full_parsed_json", True)
        
        if not content:
            return self.create_text_message("Invalid parameter content")

        try:
            final_result = []
            if output_full_parsed_json:
                # parse full json
                json_content = json.loads(content)

                # append json_messages to final_result
                if isinstance(json_content, list):
                    for item in json_content:
                        final_result.append(self.create_json_message(item))
                else:
                    final_result.append(self.create_json_message(json_content))

            if json_filter:
                filtered_result = self._extract(content, json_filter, ensure_ascii)
                final_result.append(self.create_text_message(str(filtered_result)))

            return final_result

        except Exception:
            return self.create_text_message("Failed to extract JSON content")
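
The control flow of the proposal above can be tried outside the tool framework with a self-contained sketch. It makes several assumptions: plain dicts stand in for `ToolInvokeMessage`, the function names `invoke` and `extract` are hypothetical, and `extract` follows dotted keys only instead of JSONPath, so it is not the tool's real `_extract`.

```python
import json
from typing import Any


def extract(content: str, json_filter: str, ensure_ascii: bool = True) -> str:
    """Hypothetical stand-in for `_extract`: dotted keys only, no JSONPath."""
    node = json.loads(content)
    for key in json_filter.split("."):
        node = node[key]
    if isinstance(node, (dict, list)):
        return json.dumps(node, ensure_ascii=ensure_ascii)
    return str(node)


def invoke(tool_parameters: dict[str, Any]) -> list[dict[str, Any]]:
    """Mirror the proposed flow: optional full parse, optional filtered text."""
    content = tool_parameters.get("content", "")
    json_filter = tool_parameters.get("json_filter", "")
    ensure_ascii = tool_parameters.get("ensure_ascii", True)
    output_full = tool_parameters.get("output_full_parsed_json", True)

    if not content:
        return [{"type": "text", "text": "Invalid parameter content"}]

    try:
        messages: list[dict[str, Any]] = []
        if output_full:
            parsed = json.loads(content)
            # A top-level list yields one JSON message per element.
            items = parsed if isinstance(parsed, list) else [parsed]
            messages.extend({"type": "json", "json": item} for item in items)
        if json_filter:
            filtered = extract(content, json_filter, ensure_ascii)
            messages.append({"type": "text", "text": filtered})
        return messages
    except (json.JSONDecodeError, KeyError, TypeError):
        return [{"type": "text", "text": "Failed to extract JSON content"}]


both = invoke({"content": '{"a": {"b": 1}}', "json_filter": "a.b"})
```

One thing the sketch makes visible: with no filter and `output_full_parsed_json` set to false, the result is an empty list, so a caller may want to handle that case explicitly.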

parse.yaml

identity:
  name: parse
  author: Mingwei Zhang
  label:
    en_US: JSON Parse
    zh_Hans: JSON 解析
    pt_BR: JSON Parse
description:
  human:
    en_US: A tool for extracting JSON objects
    zh_Hans: 一个解析JSON对象的工具
    pt_BR: A tool for extracting JSON objects
  llm: A tool for extracting JSON objects
parameters:
  - name: content
    type: string
    required: true
    label:
      en_US: JSON data
      zh_Hans: JSON数据
      pt_BR: JSON data
    human_description:
      en_US: JSON data
      zh_Hans: JSON数据
      pt_BR: JSON data
    llm_description: JSON data to be processed
    form: llm
  - name: json_filter
    type: string
    required: false
    label:
      en_US: JSON filter
      zh_Hans: JSON解析对象
      pt_BR: JSON filter
    human_description:
      en_US: JSON fields to be parsed
      zh_Hans: 需要解析的 JSON 字段
      pt_BR: JSON fields to be parsed
    llm_description: JSON fields to be parsed
    form: llm
  - name: ensure_ascii
    type: boolean
    default: true
    label:
      en_US: Ensure ASCII
      zh_Hans: 确保 ASCII
      pt_BR: Ensure ASCII
    human_description:
      en_US: Ensure the JSON output is ASCII encoded
      zh_Hans: 确保输出的 JSON 是 ASCII 编码
      pt_BR: Ensure the JSON output is ASCII encoded
    form: form
  - name: output_full_parsed_json
    type: boolean
    default: true
    label:
      en_US: Output Full Parsed JSON
      zh_Hans: 输出完整解析的 JSON
      pt_BR: Output Full Parsed JSON
    human_description:
      en_US: The full parsed JSON is also outputted
      zh_Hans: 完整解析的 JSON 也已输出
      pt_BR: The full parsed JSON is also outputted
    form: form

@BenjaminX
Contributor Author


This implementation really is better; re-committing.

@PedroGomes02
Contributor

This is meant to replace the parse.py tool, not to create a new extractor tool.

@dosubot (bot) added the size:M label (this PR changes 30-99 lines, ignoring generated files) and removed the size:L label (this PR changes 100-499 lines, ignoring generated files) on Nov 22, 2024
@BenjaminX
Contributor Author

Sorry, I'll follow your code.
