Karpnv/cc #35

Open
wants to merge 137 commits into base: main
Conversation

karpnv (Collaborator) commented Nov 10, 2023

Common Crawl dataset preprocessing

karpnv and others added 30 commits September 12, 2023 04:28
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
karpnv and others added 30 commits March 19, 2024 09:32
* YouTube German config and new processors

Signed-off-by: Sasha Meister <[email protected]>

* Added Merge Manifests processor

Signed-off-by: Sasha Meister <[email protected]>

* Clean de.yaml pipeline config

Signed-off-by: Sasha Meister <[email protected]>

* Fix Lang2Iso

Signed-off-by: Sasha Meister <[email protected]>

* fix typo

* fix empty list error - IndexError: list index out of range

* Added requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

* Fixed paths for audio TN

Signed-off-by: Sasha Meister <[email protected]>

* Updated requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

---------

Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
* YouTube German config and new processors

Signed-off-by: Sasha Meister <[email protected]>

* Added Merge Manifests processor

Signed-off-by: Sasha Meister <[email protected]>

* Clean de.yaml pipeline config

Signed-off-by: Sasha Meister <[email protected]>

* Fix Lang2Iso

Signed-off-by: Sasha Meister <[email protected]>

* fix typo

* fix empty list error - IndexError: list index out of range

* Added requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

* Fixed paths for audio TN

Signed-off-by: Sasha Meister <[email protected]>

* Updated requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

* New processors for calculating metrics: WER, CER, edge CER, len diff ratio

Signed-off-by: Sasha Meister <[email protected]>

* Update utils.py

* Update aggregate_segments.py

* Update aggregate_segments.py

* Update aggregate_segments.py

---------

Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
Co-authored-by: Sasha Meister <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
3 participants