I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.
Dify version
13.2
Cloud or Self Hosted
Cloud
Steps to reproduce
I’m experiencing an issue where audio files cannot be properly uploaded and processed by the multimodal models (GPT audio preview and Gemini 1.5 Pro) through the Dify platform. When I attempt to upload an audio file, it successfully generates a URL, but in subsequent follow-up messages, no output or processed content is returned. The input field appears empty, and the audio is never processed.
✔️ Expected Behavior
After uploading the audio file and providing the generated URL in a follow-up request, the multimodal models should transcribe or otherwise process the audio, returning the corresponding output (e.g., transcription text).
❌ Actual Behavior
The response is empty, and no audio processing occurs, despite the upload being successful.
Additional Information:
The issue persists across different audio file formats.
Re-attempting with different files does not resolve the problem.
Other non-audio inputs work as expected in the follow-up messages.
Request for Assistance:
Could you please investigate this issue and advise on whether it is a known bug, a configuration error, or if additional permissions are required to enable audio processing on the Dify platform?
The issue you're experiencing, where audio files are uploaded successfully but not processed, could be due to several factors. Here are some potential causes and solutions:
File Format and Size: Ensure that the audio files are in a supported format. Dify supports formats like flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Also, check that the file size does not exceed the upload limit [1][2][3].
S3 Upload and URL Generation: The process involves generating a URL and uploading the file to an S3 bucket. If the URL is generated but the file is not processed, there might be an issue with the S3 upload or the subsequent steps. Ensure that the S3 bucket permissions and configurations are correct [4].
Transcription Job: After uploading, a transcription job is initiated. If this job fails, it could be due to incorrect parameters or configuration issues. Check the logs for any specific error messages related to the transcription job [4].
Error Messages: Review any error messages or logs generated during the process. Common errors include NoAudioUploadedError, AudioTooLargeError, and UnsupportedAudioTypeError, among others [5][6].
AWS SageMaker Endpoint: If you're using AWS for processing, ensure that the SageMaker endpoint is correctly set up and accessible. The endpoint processes the audio file and returns the transcribed text [4][7].
Configuration and Permissions: Verify that all necessary configurations and permissions are in place for audio processing. This includes ensuring that any required AWS credentials and permissions are correctly configured.
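The first check in the list above (format and size) can be run locally before uploading. The following is a minimal sketch, not Dify's own validation: the extension set is copied from the formats named in this reply, and `check_audio_file` and its `max_bytes` parameter are hypothetical names introduced for illustration. The actual upload limit depends on the deployment, so it is passed in rather than assumed.

```python
import os

# Formats named in the reply above. This is a convenience copy for a local
# pre-check, not an authoritative list taken from Dify itself.
SUPPORTED_AUDIO_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def check_audio_file(path: str, max_bytes: int) -> list[str]:
    """Return a list of problems found; an empty list means the file passed.

    `max_bytes` is a caller-supplied limit (hypothetical parameter); look up
    the real upload limit for your deployment before relying on it.
    """
    problems = []

    # Compare the lowercased extension, without the leading dot.
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    if extension not in SUPPORTED_AUDIO_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")

    # An empty file or one above the limit would both fail before processing.
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")

    return problems
```

If this returns an empty list for a file that still produces an empty response, the format and size are unlikely to be the cause, which matches the report that the problem persists across different formats and files.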
If these steps do not resolve the issue, it might be helpful to check for any updates or patches for Dify version 13.2 that address known bugs related to audio processing.