[FR]: Enable Opik to display additional media formats, including audio, PDF, and video. #567

Open
pleomax0730 opened this issue Nov 6, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@pleomax0730

pleomax0730 commented Nov 6, 2024

Proposal summary

Feature Request

Enable Opik to display additional media formats, including audio, PDF, and video.

Background

Opik currently supports only image display, which limits its flexibility for monitoring, testing, and evaluating multimodal LLM applications that may involve other data formats. Expanding support for audio, PDF, and video would allow users to fully leverage Opik’s capabilities across a broader range of use cases.

Proposed Use Cases:

  • Audio Analysis: Track and evaluate audio-based LLM applications, such as voice transcription, voicebots, and sentiment analysis.
  • PDF Document Evaluation: Facilitate assessment of document parsing models, PDF summarization, and question-answering systems on document data.
  • Video Content Monitoring: Enhance capabilities for video-based LLM applications, such as video content analysis, summarization, and media captioning.

Benefits:

  • Enhanced Multimodal Support: Broadens Opik’s applicability to multimodal applications, aligning with the needs of teams working across diverse LLM applications.
  • Improved Traceability: Extends Opik’s tracing and feedback logging for all media types, ensuring comprehensive monitoring for complex, cross-modal projects.
  • Unified Platform: Allows users to manage all media formats under a single platform, streamlining workflows.

Summary

Adding support for audio, PDF, and video display will make Opik a more versatile platform, suitable for a wide range of LLM applications beyond text and images. This enhancement will empower users to develop, evaluate, and monitor their applications seamlessly across all media types.

Motivation

Many existing solutions for LLM evaluation and monitoring are limited to text and image formats, with little to no support for other media types like audio, PDFs, or video. This lack of multimodal support forces teams to use multiple tools or rely on custom workarounds, creating friction in their workflows and hindering a comprehensive evaluation process.

By introducing audio, PDF, and video support, Opik could become the first open-source platform to offer complete multimodal monitoring and evaluation capabilities. This would make Opik highly attractive to teams working on complex applications that require seamless integration of various data formats, such as multimedia retrieval, interactive voice systems, and document processing pipelines.

Competitive Advantage:

Leading the way with these features would position Opik as a go-to solution for multimodal LLM applications, setting it apart from other evaluation and monitoring tools. This could significantly increase Opik’s user base by attracting organizations and researchers who need comprehensive, media-agnostic monitoring for their LLM projects.

pleomax0730 added the enhancement (New feature or request) label Nov 6, 2024
@jverre
Collaborator

jverre commented Nov 6, 2024

Hi @pleomax0730
Thanks for the detailed request! I really like this idea. We could introduce the concept of an attachment for a trace or span that allows you to log any additional data. Depending on the data type, we could then add different ways to view that data in the UI.

Is there a specific type of data you would like us to support first? We have received requests for better PDF support, so that could be a good candidate to start with.
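One way to picture the attachment idea: a minimal Python sketch, assuming a hypothetical `Attachment` record, a hypothetical `Span` container, and made-up viewer names (none of these are part of the Opik SDK or UI). Each attachment infers its MIME type from the file name with the standard-library `mimetypes` module, and that type decides how the UI would render it, falling back to a plain download link for unknown formats.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical viewer names, keyed by exact MIME type or by a "type/" prefix.
# Illustrative only; these are not Opik identifiers.
VIEWERS = {
    "application/pdf": "pdf_viewer",
    "audio/": "audio_player",
    "video/": "video_player",
    "image/": "image_viewer",
}


@dataclass
class Attachment:
    """Hypothetical attachment logged alongside a trace or span."""
    file_name: str
    content_type: Optional[str] = None

    def __post_init__(self) -> None:
        # Infer the MIME type from the file extension when none is supplied.
        if self.content_type is None:
            guessed, _ = mimetypes.guess_type(self.file_name)
            self.content_type = guessed or "application/octet-stream"

    def viewer(self) -> str:
        """Pick a UI viewer for this attachment, or fall back to a download link."""
        ct = self.content_type
        if ct in VIEWERS:
            return VIEWERS[ct]
        for prefix, name in VIEWERS.items():
            if prefix.endswith("/") and ct.startswith(prefix):
                return name
        return "download_link"


@dataclass
class Span:
    """Hypothetical span that carries a list of attachments."""
    name: str
    attachments: List[Attachment] = field(default_factory=list)


span = Span("transcribe_call", [Attachment("call.wav"), Attachment("report.pdf")])
for attachment in span.attachments:
    print(attachment.file_name, attachment.content_type, attachment.viewer())
```

Dispatching on the MIME type keeps the logging side format-agnostic: adding video or another format later would only mean registering a new viewer, which fits the PDF, then audio, then video rollout discussed below.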

@pleomax0730
Author

@jverre Thanks for your reply to my feature request! In terms of the current community, I think the best order might be PDF, then audio, and finally video.

@jverre
Collaborator

jverre commented Nov 7, 2024

Makes sense, I'll keep this ticket open. It's quite a big feature, so we might not get to it straightaway.

@AHB102

AHB102 commented Dec 1, 2024

@jverre @pleomax0730 I'd like to contribute to the Documents part of this issue. Could you please provide some guidance on the specific tasks or features that need to be implemented? Would it involve tasks like document summarization or extraction of key information?

@pleomax0730
Author

pleomax0730 commented Dec 1, 2024


Hi @AHB102, Langfuse's audio tracking might be a good reference for how to implement this feature. No summarization or extraction is needed; the goal is only to display the media or data as a reference attached to the trace.

[Screenshot: the audio preview in the GENERATION section]

@jverre
Collaborator

jverre commented Dec 4, 2024

Hi @pleomax0730 @AHB102
We are starting to think about this feature and I went ahead and created a short document with how it would work in the SDK and in the FE: https://cometml.notion.site/Add-support-for-attachments-1527124010a38025a600cb7ea20ecacf

It's a pretty big initiative, so feel free to add comments here if we have missed anything relevant.

@AHB102

AHB102 commented Dec 6, 2024

@jverre Hello 🖐️, I read the doc, and starting with the SDK changes for documents sounds like a good plan. However, I'm still relatively new to software development, so I'm a bit lost in the codebase. Are there any specific docs or resources you'd recommend to help me get started?

@jverre
Collaborator

jverre commented Dec 9, 2024

@AHB102 This is a pretty big feature that touches the Python SDK, the backend, and the UI, so it might be a bit tricky as a first issue.

I'll create a couple of smaller-scoped issues later today and tag them as "good first issue". I recommend tackling one of those.

@AHB102

AHB102 commented Dec 9, 2024

@jverre Yeah agreed, I'll definitely look into good first issues.
