After I upgraded dify to the latest version, the LLM is unable to read video data. #11590

Closed
5 tasks done
hw872715125 opened this issue Dec 12, 2024 · 1 comment
Labels
🐞 bug Something isn't working

Comments

@hw872715125

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

13.2

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

After I upgraded Dify to the latest version, the LLM can no longer read video data. It worked before the upgrade, when I was running version 0.11.
I have recorded the reproduction steps in a GIF:
[GIF: step]
Model: Gemini 1.5 Flash 002
LLM Prompt:
You are a video slicing tool. Your task is to automatically slice the input video based on its plot, visuals, and audio. If the input video is empty, do not process it. Please complete the task according to the following steps:

  1. Analyze the input video file to ensure that every frame can be read.
  2. Automatically slice the video based on its plot, visual layout, style, and other elements. Slices can be based on scene changes, dialogue segments, action sequences, etc.
  3. If the input video is already an independent segment, no slicing is required.
  4. If adjacent segments have some degree of relevance, they can be merged into a single segment.
  5. Determine the start and end time points for each segment.
  6. Output the start and end time points for each segment, ensuring that the content of each segment is coherent.
  7. The time points should be accurate to the millisecond.
Only return the final result in a concise, formatted JSON format.
The following are some examples to help you better understand the task:
Output result example:
[
  {
    "name": "Segment 1",
    "start_time": "1089",
    "end_time": "5121",
    "content": "Problem description"
  },
  {
    "name": "Segment 2",
    "start_time": "5122",
    "end_time": "189562",
    "content": "Guest answers"
  }
]

[image]

✔️ Expected Behavior

After I upload the video, the model should read the video content and return the automatic slicing result information.
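
To make the expected result concrete, here is a minimal sketch of how the returned slicing output could be checked, assuming the model replies with a JSON array shaped like the example in the prompt (times as millisecond strings). The function name `parse_segments` and the `start_ms`/`end_ms` keys are hypothetical and are not part of Dify or the Gemini API.

```python
import json

# Example payload copied from the prompt's "Output result example".
EXAMPLE = """
[
  {"name": "Segment 1", "start_time": "1089", "end_time": "5121", "content": "Problem description"},
  {"name": "Segment 2", "start_time": "5122", "end_time": "189562", "content": "Guest answers"}
]
"""


def parse_segments(raw: str) -> list:
    """Parse a slicing result and check that segments are ordered and non-overlapping.

    Hypothetical helper: assumes start_time/end_time are millisecond strings.
    """
    segments = json.loads(raw)
    if not isinstance(segments, list):
        raise ValueError("expected a JSON array of segments")

    parsed = []
    previous_end = -1
    for seg in segments:
        start = int(seg["start_time"])
        end = int(seg["end_time"])
        if start < 0 or end < start:
            raise ValueError(f"invalid time range in {seg['name']!r}")
        if start <= previous_end:
            raise ValueError(f"{seg['name']!r} overlaps the previous segment")
        parsed.append(
            {
                "name": seg["name"],
                "start_ms": start,
                "end_ms": end,
                "content": seg["content"],
            }
        )
        previous_end = end
    return parsed


if __name__ == "__main__":
    for segment in parse_segments(EXAMPLE):
        print(segment["name"], segment["start_ms"], segment["end_ms"])
```

An empty array or a reply that is not valid JSON would suggest the model never received the video content, which matches the behavior reported below.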

❌ Actual Behavior

Although the file was uploaded successfully, the model did not actually read the video data. It was working normally before the upgrade.

dosubot bot added the 🐞 bug label on Dec 12, 2024
@crazywoola
Member

Duplicate of #11497
