Characters with diacritics break words #30

Answered by jianfch
crecos asked this question in Q&A

Since not all languages use spaces to separate words, this behavior was not made the default (except for English). You can enable it by passing combine_compound=True to results_to_word_srt or group_word_timestamps.

from stable_whisper import results_to_word_srt
# `results` is the transcription result you already have from the model.
# Also pass strip=True to remove the space before the first word.
results_to_word_srt(results, 'audio.srt', combine_compound=True)
1
00:00:00,360 --> 00:00:00,750
 Tým

2
00:00:00,750 --> 00:00:02,470
 závodem

3
00:00:02,470 --> 00:00:05,100
 je nejlepší.

4
00:00:06,360 --> 00:00:06,770
 Je to

...
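
For anyone wondering what "combining" means here, below is a minimal sketch of the general idea. It is not the stable_whisper implementation. It assumes each timestamped piece is a dict with `text`, `start`, and `end` keys, and that a piece starting a new word begins with a space, as in the output above. The helper name `merge_compound` and the split points inside each word are made up for illustration; only the outer start and end times match the output above.

```python
def merge_compound(tokens):
    """Merge pieces that do not start with a space into the previous piece.

    Hypothetical helper for illustration only, not part of stable_whisper.
    """
    merged = []
    for tok in tokens:
        if merged and not tok["text"].startswith(" "):
            # Continuation of the previous word: append the text
            # and extend the end time of the previous entry.
            merged[-1]["text"] += tok["text"]
            merged[-1]["end"] = tok["end"]
        else:
            # A leading space (or the very first piece) starts a new word.
            merged.append(dict(tok))
    return merged


# Illustrative input: two words split in the middle.
# The inner timestamps (0.50, 1.90) are invented for this example.
tokens = [
    {"text": " T", "start": 0.36, "end": 0.50},
    {"text": "ým", "start": 0.50, "end": 0.75},
    {"text": " závod", "start": 0.75, "end": 1.90},
    {"text": "em", "start": 1.90, "end": 2.47},
]

words = merge_compound(tokens)
for w in words:
    print(f'{w["start"]:.2f} --> {w["end"]:.2f} {w["text"]}')
```

This rule only works for languages that separate words with spaces, which is the reason given above for not enabling it by default.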

Answer selected by jianfch