
17 implement cli based interaction #34

Open · daavoo wants to merge 18 commits into main from 17-implement-cli-based-interaction

Conversation

@daavoo (Contributor) commented Dec 3, 2024

What's changing

  • Create cli.py and expose it as an entrypoint.
  • Create config and refactor how speaker configuration is handled (a rough sketch follows this list).
  • Update demo to only expose speaker customization instead of the full prompt.
    (screenshot of the updated demo)
  • Update unit tests to use mocks.
  • Add e2e tests that can be run through workflow_dispatch (they take too long to be enabled by default).
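
A rough Python sketch of the kind of Pydantic config this adds. The field names are copied from the diff snippets quoted later in this thread; the Speaker model, the validator body, and the Literal value are assumptions for illustration only.

from typing import Annotated, Literal

from pydantic import AfterValidator, BaseModel


def validate_text_to_text_model(value: str) -> str:
    # Assumption: the real validator checks the model id / GGUF file reference.
    if "/" not in value:
        raise ValueError(f"Expected an 'org/repo[/file]' style model id, got: {value}")
    return value


class Speaker(BaseModel):
    # Assumed shape: a name plus a voice description (e.g. a Parler-style prompt).
    name: str
    description: str


class Config(BaseModel):
    input_file: str
    output_folder: str
    text_to_text_model: Annotated[str, AfterValidator(validate_text_to_text_model)]
    text_to_text_prompt: str
    text_to_speech_model: Literal["parler-tts/parler-tts-mini-v1"]  # assumed value
    speakers: list[Speaker]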

How to test it

  • CLI:
    document-to-podcast --from_config example_data/config.yaml
  • Demo change:
    python -m streamlit run demo/app.py
  • Docs updates:
    mkdocs serve

@daavoo daavoo self-assigned this Dec 3, 2024
@daavoo daavoo linked an issue Dec 3, 2024 that may be closed by this pull request
@daavoo daavoo marked this pull request as draft December 4, 2024 15:08
@daavoo daavoo force-pushed the 17-implement-cli-based-interaction branch from 4e95724 to 1bded51 on December 5, 2024 10:22
@stefanfrench (Contributor)

@daavoo - Did an early test of this and it's working well - tested on Codespaces and it worked end-to-end with the output saving. Thanks also for updating the docs.

@daavoo daavoo force-pushed the 17-implement-cli-based-interaction branch from bd56198 to caae572 on December 10, 2024 12:27
@daavoo daavoo mentioned this pull request Dec 11, 2024
@daavoo daavoo force-pushed the 17-implement-cli-based-interaction branch from fe55a78 to bf5698b on December 11, 2024 11:10
@daavoo daavoo marked this pull request as ready for review December 11, 2024 11:10
@daavoo daavoo requested review from Kostis-S-Z, stefanfrench and a team and removed request for Kostis-S-Z and stefanfrench December 11, 2024 11:10
@stefanfrench (Contributor)

@daavoo - Thanks for this, a lot of work here!

  • Testing the CLI was smooth, no issues found, and the YAML is clear.
  • I was testing the new streamlit app locally. I really like the speaker configuration; however, after generating the podcast, I got this error:
    (screenshot of the error)

@daavoo (Contributor, Author) commented Dec 11, 2024

  • however after generating the podcast, I got this error:

Taking a look 👀

@daavoo (Contributor, Author) commented Dec 11, 2024

  • however after generating the podcast, I got this error:

@stefanfrench I think this is fixed now

@stefanfrench (Contributor) left a comment

  • Tested CLI works
  • Tested updated streamlit app works
  • Tested mkdocs serve

All working as expected, approved for merge. Thanks for the big effort on this. I think we will need to update the customization guide docs based on the changes, but we can do that in a separate issue.

@Kostis-S-Z (Contributor) left a comment

A few notes from just looking at it on GitHub. I'll test it locally in a bit.

"Speaker 1": "Sure! Imagine it like this...",
"Speaker 2": "Oh, that's cool! But how does..."
}
sampling_rate: 44100
Contributor:

This actually depends on the TTS model we use: Parler uses 44,100 Hz, but Oute models use 24,000 Hz. So the sampling rate should be set internally by us rather than being a config field.
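
For illustration, a minimal sketch of that kind of internal mapping (the helper name and the substring matching on model ids are assumptions):

# Hypothetical sketch: derive the sampling rate from the chosen TTS model
# instead of exposing it as a user-facing config field.
OUTPUT_SAMPLE_RATES = {
    "parler": 44_100,  # Parler models generate 44.1 kHz audio
    "oute": 24_000,    # Oute models generate 24 kHz audio
}


def sampling_rate_for(text_to_speech_model: str) -> int:
    # Assumed convention: the model id contains the family name, e.g. "OuteAI/...".
    family = "oute" if "oute" in text_to_speech_model.lower() else "parler"
    return OUTPUT_SAMPLE_RATES[family]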

Contributor (Author):

Got it! I don't know why I thought it was a user-facing param

output_folder: str
text_to_text_model: Annotated[str, AfterValidator(validate_text_to_text_model)]
text_to_text_prompt: str
text_to_speech_model: Literal[
Contributor:

This does look cleaner than before. However, it no longer allows each speaker to use a different TTS model (e.g. speaker 1 from Parler, speaker 2 from Oute). I had seen this feature in another open-source NotebookLM implementation and thought it was cool. Do we think this is something we want to enable, or is it a bit of an overreach (who can load an LLM plus two or more TTS models locally anyway)? I am okay with removing this feature, but maybe we can update the PR description to make it clear that this won't be possible anymore.

@daavoo (Contributor, Author) commented Dec 12, 2024

Do we think this is something we want to enable, or is it a bit of an overreach (who can load an LLM plus two or more TTS models locally anyway)?

I think it is an interesting scenario. It's probably not worth making the default CLI more complex, but it feels like it should be doable using the "low level" API directly. Maybe it is a good example to add to some "advanced customization/usage" page in the docs, along the lines of the sketch below.
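
For illustration only, a possible shape for such an example. Every name here is hypothetical rather than the project's real API; the caller supplies the script and one TTS callable per speaker:

# Hypothetical "advanced usage" sketch: let each speaker use its own TTS backend.
from typing import Callable

import numpy as np


def generate_multi_backend_podcast(
    script: list[tuple[str, str]],                     # e.g. [("Speaker 1", "Hi!"), ...]
    backends: dict[str, Callable[[str], np.ndarray]],  # speaker name -> TTS function
) -> np.ndarray:
    # Each line of the script is synthesized by the backend mapped to its speaker.
    chunks = [backends[speaker](text) for speaker, text in script]
    # Note: Parler outputs 44.1 kHz and Oute 24 kHz, so the chunks would need to be
    # resampled to a common rate before concatenation in a real implementation.
    return np.concatenate(chunks)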

Contributor:

I am not sure about this change, since with #49 I will need to reintroduce this function together with _speech_generation_oute.

And the speaker profile could either be an Oute id (female_1) or a natural-language description for Parler. It does feel a bit odd, though, that we would be using one variable in two slightly different ways. Do you have any other suggestion?

Contributor:

🆒

Comment on lines +15 to +18
document-to-podcast \
--input_file "example_data/Mozilla-Trustworthy_AI.pdf" \
--output_folder "example_data" \
--text_to_text_model "Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
Contributor:

If you don't provide an argument, does it take the value from the config.yaml?
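
For illustration, one common way to get that behaviour (not necessarily what this PR implements) is to read config.yaml first and let explicitly passed flags override it; the flag names below mirror the ones in this thread:

# Hypothetical sketch: values from --from_config act as defaults, and any flag
# passed explicitly on the command line takes precedence over them.
import argparse

import yaml


def parse_args() -> dict:
    parser = argparse.ArgumentParser(prog="document-to-podcast")
    parser.add_argument("--from_config")
    parser.add_argument("--input_file")
    parser.add_argument("--output_folder")
    parser.add_argument("--text_to_text_model")
    args = parser.parse_args()

    config = {}
    if args.from_config:
        with open(args.from_config) as f:
            config = yaml.safe_load(f)

    # Only flags the user actually passed override the YAML values.
    overrides = {k: v for k, v in vars(args).items() if k != "from_config" and v is not None}
    return {**config, **overrides}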

Development

Successfully merging this pull request may close these issues:

  • Implement CLI based interaction

3 participants