Replies: 4 comments 3 replies
-
OK, I've cut an 8-minute clip out of a particular episode and compared three models with several settings. Still testing, but it seems like the default settings actually are the best. The biggest factor when it comes to repetition is fp32 vs fp16. Theory: use fp32 for TV series and …
-
If anyone's interested, here's how this went
Result (for now): small, fp32, beam_size=5 is the only viable option. All of this has been tested with …
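For anyone wanting to reproduce this, here's a minimal sketch of how those settings map onto openai-whisper's Python API. The file name is a placeholder, and `fp16=False` is how you get fp32 decoding (whisper defaults to fp16 on GPU):

```python
# Settings from the result above: "small" model, fp32, beam search width 5.
# fp16=False forces fp32 decoding, which reduced repetition in these tests.
DECODE_OPTIONS = {"fp16": False, "beam_size": 5}

def transcribe_episode(audio_path: str) -> dict:
    """Transcribe one audio file with the settings above (sketch)."""
    import whisper  # pip install openai-whisper

    model = whisper.load_model("small")
    # Extra keyword arguments are forwarded to whisper's decoder.
    return model.transcribe(audio_path, **DECODE_OPTIONS)
```

The equivalent CLI invocation should be `whisper episode.mp3 --model small --fp16 False --beam_size 5`.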
-
I've been trying to do the same, but I'm not sure how to pass these parameters in the script. Could you paste an example of the script you're using?
-
Well, to be honest, I've abandoned this for now. Whisper itself needs optimizations for TV/movie subtitles to be properly generated and timed. My tests yielded somewhat usable subs, but not nearly good enough.
-
Hey,
I've been experimenting with Whisper AI and stable-ts (and modifying auto_subtitle, which uses Whisper AI, to use stable-ts).
My current findings are for English (R***ard H**mond's Workshop S01E03):

- `results_to_sentence_srt` produces better results than whisper's default without stable-ts
- model "base.en" > model "medium.en" > model "tiny.en"
- the above is a little weird to me, but base just produced more correct words and sentences
- using `temperature=0, beam_size=3, best_of=3` for `transcribe` and `start_at_first_word=True` for `results_to_sentence_srt` seems to be the sweet spot between ultra-long sentences, ultra-tiny sentences (the Whisper AI non-CLI default) and somewhat acceptable timing/pacing

Please post your results!
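Here's a sketch of how those calls fit together, assuming the stable-ts 1.x API that `results_to_sentence_srt` comes from (newer stable-ts versions changed the API, so treat this as illustrative; file names are placeholders):

```python
# Transcribe settings from the findings above.
TRANSCRIBE_OPTIONS = {"temperature": 0, "beam_size": 3, "best_of": 3}

def make_subtitles(audio_path: str, srt_path: str) -> None:
    """Generate a sentence-level SRT with the settings above (sketch)."""
    import stable_whisper  # pip install stable-ts

    # "base.en" beat both medium.en and tiny.en in these tests.
    model = stable_whisper.load_model("base.en")
    result = model.transcribe(audio_path, **TRANSCRIBE_OPTIONS)
    # start_at_first_word=True aligns each subtitle's start to its first word,
    # which is what gave the acceptable timing/pacing mentioned above.
    stable_whisper.results_to_sentence_srt(
        result, srt_path, start_at_first_word=True
    )
```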