API examples in step-by-step guide #59

Draft. Wants to merge 1 commit into base: main
37 changes: 37 additions & 0 deletions docs/step-by-step-guide.md
@@ -48,6 +48,33 @@ Cleaner input data ensures that the model works with reliable and consistent inf

- Ensures the document is clean and ready for the next step.

### 🔍 **API Example**
Review comment (contributor, PR author): @stefanfrench not sure if this is the best format, but I put in a quick example to test how it would look.


```py
from document_to_podcast.preprocessing import DATA_CLEANERS, DATA_LOADERS

# Both registries are keyed by file extension.
input_file = "example_data/introducing-mozilla-ai-investing-in-trustworthy-ai.html"
data_loader = DATA_LOADERS[".html"]
data_cleaner = DATA_CLEANERS[".html"]

# Load the raw HTML and preview its first 200 characters.
raw_data = data_loader(input_file)
print(raw_data[:200])
"""
<!doctype html>
<html class="no-js" lang="en-US">

<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="profile" href="https://gmpg.org/x
"""
# Clean the raw HTML down to plain text and preview the result.
clean_data = data_cleaner(raw_data)
print(clean_data[:200])
"""
Skip to content Mozilla Internet Culture Deep Dives Mozilla Explains Interviews Videos Privacy Security Products Firefox Pocket Mozilla VPN Mozilla News Internet Policy Leadership Mitchell Baker, CEO
"""
```
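
The example above picks a loader and a cleaner by file extension. A minimal, self-contained sketch of that lookup pattern is shown below, assuming the registries behave like plain dictionaries keyed by extension. `EXAMPLE_CLEANERS`, `clean_html`, and `clean_for` are hypothetical names used for illustration only; they are not part of `document_to_podcast`, and the HTML cleaning here uses the standard library's `html.parser` rather than whatever the package does internally.

```py
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def clean_html(raw):
    """Hypothetical cleaner: reduce an HTML string to space-joined text."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical stand-in for an extension-keyed registry (an assumption,
# not the package's real DATA_CLEANERS).
EXAMPLE_CLEANERS = {".html": clean_html, ".txt": str.strip}


def clean_for(path, raw):
    """Pick a cleaner from the file suffix and apply it to the raw text."""
    suffix = Path(path).suffix.lower()
    try:
        cleaner = EXAMPLE_CLEANERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return cleaner(raw)
```

Deriving the key from `Path(path).suffix` instead of hard-coding `".html"` lets the same call handle any registered format, and an unsupported extension fails with an explicit error rather than a bare `KeyError`.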

## **Step 2: Podcast Script Generation**

In this step, the pre-processed text is transformed into a conversational podcast transcript. Using a Language Model, the system generates a dialogue that’s both informative and engaging.
@@ -73,6 +100,16 @@ In this step, the pre-processed text is transformed into a conversational podcas
- Supports both single-pass outputs (`text_to_text`) and real-time streamed responses (`text_to_text_stream`), offering flexibility for different use cases.


### 🔍 **API Example**

```py
from document_to_podcast.inference.model_loaders import load_llama_cpp_model
from document_to_podcast.inference.text_to_text import text_to_text

...
```
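
To illustrate the difference between single-pass and streamed output mentioned above, here is a rough sketch that does not load any model. `fake_generate_stream`, `fake_generate`, and `stream_to_console` are hypothetical stand-ins written for this example; they do not reproduce the signatures or behavior of `text_to_text` or `text_to_text_stream`, and the canned reply is invented.

```py
from typing import Iterator


def fake_generate_stream(prompt: str) -> Iterator[str]:
    """Hypothetical streaming call: yields a canned reply word by word."""
    reply = f"Speaker 1: Today we discuss {prompt}. Speaker 2: Sounds great!"
    for word in reply.split(" "):
        yield word + " "


def fake_generate(prompt: str) -> str:
    """Hypothetical single-pass call: returns the whole reply at once."""
    return "".join(fake_generate_stream(prompt)).rstrip()


def stream_to_console(prompt: str) -> str:
    """Print chunks as they arrive, then return the assembled text."""
    chunks = []
    for chunk in fake_generate_stream(prompt):
        print(chunk, end="", flush=True)  # show partial output immediately
        chunks.append(chunk)
    print()
    return "".join(chunks).rstrip()
```

Both paths end with the same text. The streamed variant is useful when partial output should be shown while generation is still running, and the single-pass variant is simpler when only the finished script matters.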


## **Step 3: Audio Podcast Generation**

In this final step, the generated podcast transcript is brought to life as an audio file. Using a Text-to-Speech (TTS) model, each speaker in the script is assigned a unique voice, creating an engaging and professional-sounding podcast.
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -27,6 +27,8 @@ theme:
name: Switch to light mode
extra_css:
- assets/custom.css
features:
- content.code.copy

markdown_extensions:
- pymdownx.highlight: