17 implement cli based interaction #34
base: main
Conversation
Force-pushed from 4e95724 to 1bded51.
@daavoo - Did an early test of this and it's working well - tested on Codespaces and it worked end-to-end with the output saving. Thanks for also updating the docs.
Force-pushed from bd56198 to caae572.
- Refactor speakers part of the config.
Force-pushed from fe55a78 to bf5698b.
@daavoo - Thanks for this, a lot of work here!
Taking a look 👀
@stefanfrench I think this is fixed now |
- Tested CLI works
- Tested updated streamlit app works
- Tested mkdocs serve
All working as expected, approved for merge. Thanks for the big effort on this. I think we will need to update the customization guide docs based on the changes, but we can do that in a separate issue.
A few notes from just looking at it on GH. I'll test it locally in a bit.
"Speaker 1": "Sure! Imagine it like this...", | ||
"Speaker 2": "Oh, that's cool! But how does..." | ||
} | ||
sampling_rate: 44100 |
This actually depends on the TTS model we use. Parler uses 44,100 Hz, but Oute models use 24,000 Hz. So the sampling rate should be set internally by us, something like this.
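A minimal sketch of that idea, assuming a hypothetical helper (the mapping keys and function name are made up, not the repo's actual code):

```python
# Hypothetical sketch: derive the sampling rate from the chosen TTS model
# instead of exposing it as a user-facing config field.
TTS_SAMPLING_RATES = {
    "parler": 44_100,  # Parler models generate 44.1 kHz audio
    "oute": 24_000,    # Oute models generate 24 kHz audio
}

def get_sampling_rate(text_to_speech_model: str) -> int:
    """Resolve the sampling rate internally from the model identifier."""
    for family, rate in TTS_SAMPLING_RATES.items():
        if family in text_to_speech_model.lower():
            return rate
    raise ValueError(f"Unsupported TTS model: {text_to_speech_model}")
```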
Got it! I don't know why I thought it was a user-facing param.
output_folder: str
text_to_text_model: Annotated[str, AfterValidator(validate_text_to_text_model)]
text_to_text_prompt: str
text_to_speech_model: Literal[
This does look cleaner than before; however, it no longer makes it possible for each speaker to use a different TTS model (e.g. speaker 1 is from Parler, speaker 2 is from Oute). I had seen this feature in another open-source NotebookLM implementation and thought it was cool. Do we think this is something we want to enable, or is it a bit of an overreach (who can locally load an LLM + >=2 TTS models anyway)? I am okay with basically removing this feature, but maybe we can update the PR description to make it clear that this won't be possible anymore.
> Do we think this is something we want to enable, or is it a bit of an overreach (who can locally load an LLM + >=2 TTS models anyway)?

I think it is an interesting scenario. Probably not worth making the default CLI more complex, but it feels like it should be doable using the "low level" API directly. Maybe it is a good example to add to some "advanced customization/usage" page in the docs or something.
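For illustration, the two-model scenario on top of a "low level" API might read roughly like this; the import, function names, and model ids below are assumptions, not the actual project API:

```python
# Assumed API: load_tts_model / generate_speech are placeholders for whatever
# the low-level interface actually exposes; the model ids are illustrative too.
from document_to_podcast.inference import load_tts_model, generate_speech

script = [
    ("Speaker 1", "Sure! Imagine it like this..."),
    ("Speaker 2", "Oh, that's cool! But how does..."),
]
# One TTS model per speaker: speaker 1 from Parler, speaker 2 from Oute.
models = {
    "Speaker 1": load_tts_model("parler-tts/parler-tts-mini-v1"),
    "Speaker 2": load_tts_model("OuteAI/OuteTTS-0.2-500M"),
}
for speaker, line in script:
    audio = generate_speech(models[speaker], line)
```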
I am not sure about this change, since with #49 I will need to reintroduce this function together with `_speech_generation_oute`. And the speaker profile could either be an Oute id (`female_1`) or a natural-language description for Parler. I do feel a bit weird, though, that we would be using one variable in two slightly different ways. Do you have any other suggestion?
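Purely as a sketch of one alternative (the class names below are made up, not code from this PR), the two meanings could be tagged explicitly instead of overloading a single string:

```python
from dataclasses import dataclass

# Hypothetical types separating the two uses of "speaker profile".
@dataclass
class OuteSpeakerId:
    value: str  # a predefined Oute speaker id, e.g. "female_1"

@dataclass
class ParlerVoiceDescription:
    value: str  # a free-text voice description consumed by Parler

SpeakerProfile = OuteSpeakerId | ParlerVoiceDescription
```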
🆒
document-to-podcast \
  --input_file "example_data/Mozilla-Trustworthy_AI.pdf" \
  --output_folder "example_data" \
  --text_to_text_model "Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
If you don't provide an argument, does it take the value from the `config.yaml`?
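For context, the common pattern being asked about looks roughly like this (sketched with argparse and PyYAML; whether `cli.py` actually does this is exactly the question):

```python
import argparse
import yaml  # requires PyYAML

# Sketch only: values from config.yaml act as defaults,
# and explicitly passed CLI flags override them.
parser = argparse.ArgumentParser()
parser.add_argument("--input_file")
parser.add_argument("--output_folder")
parser.add_argument("--text_to_text_model")
args = parser.parse_args()

with open("config.yaml") as f:
    config = yaml.safe_load(f)

for key, value in vars(args).items():
    if value is not None:  # only flags the user actually passed win
        config[key] = value
```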
What's changing

- Add `cli.py` and expose it as entrypoint.
- Update `config` and refactor how speaker configuration is handled.
- Add `mock` tests.
- Add `e2e` tests that can be run through `workflow_dispatch` (takes too much time to be enabled by default).

How to test it