Any options to prevent "flickering" #355

Open
nns2009 opened this issue May 3, 2024 · 3 comments

Comments

@nns2009

nns2009 commented May 3, 2024

Let's say one segment ends at 4.7s and the next one starts at 4.8s. The first segment disappears a bit before the next one appears, so no subtitle is shown for a brief moment: "flickering". When consecutive segments are "close enough", I would prefer to hold the previous segment a bit longer and/or start the following segment a bit earlier, so that one subtitle is swapped directly into the next. Is that possible?

Related question: is it possible to begin all segments (which do have some space before/after them) a bit earlier and to end them a bit later, thus making the timings less exact, but allowing the viewer more reading time?

P.S. Thanks for the library! It works great

@jianfch
Owner

jianfch commented May 4, 2024

There are no options that directly extend timestamps, but you can prevent "flickering" with regrouping methods such as merge_by_gap() (for CLI: --regroup da_mg).

However, if you want to preserve the segments as they are and only change the timestamps, you'll need something like this:

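import stable_whisper

model = stable_whisper.load_model('base')
result = model.transcribe('audio.mp3')

# Close small gaps: stretch each segment's end to the next segment's start.
for i, segment in enumerate(result):
    if i+1 == len(result):
        break  # the last segment has no following segment
    next_start = result[i+1].start
    # Only bridge gaps of at most 100 ms; longer pauses are left as they are.
    if next_start - segment.end <= 0.100:
        segment.end = next_start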

Related question: is it possible to begin all segments (which do have some space before/after them) a bit earlier and to end them a bit later, thus making the timings less exact, but allowing the viewer more reading time?

Likewise, there are no options for this because all the options in stable-ts are centered around squeezing the timestamps tightly around each word. So something like the script above should do.
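A minimal sketch of what such a script could look like for the padding case, written against a plain dataclass rather than stable-ts objects. The `Segment` class, the `pad_segments` helper, and the default padding values are all hypothetical names and numbers chosen for illustration; none of them are part of stable-ts. The idea is to extend each segment into the silence around it while never overlapping a neighbour.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a subtitle segment (not a stable-ts class)."""
    start: float
    end: float
    text: str = ""


def pad_segments(segments, pad_start=0.2, pad_end=0.5):
    """Return copies of `segments` that start earlier and end later.

    Padding only uses the silence between neighbours, so padded segments
    never overlap. End padding is applied first, so it takes priority
    when a gap is too small for both paddings.
    """
    padded = []
    prev_end = 0.0  # nothing may start before the beginning of the audio
    for i, seg in enumerate(segments):
        has_next = i + 1 < len(segments)
        # The last segment has no neighbour limiting its end padding.
        next_start = segments[i + 1].start if has_next else float("inf")
        # Never move a start later or an end earlier than the original.
        new_start = min(seg.start, max(seg.start - pad_start, prev_end))
        new_end = max(seg.end, min(seg.end + pad_end, next_start))
        padded.append(Segment(new_start, new_end, seg.text))
        prev_end = new_end
    return padded
```

Because a segment's end is capped at the next segment's original start, and the next start is then clamped to that padded end, any gap shorter than `pad_end` is closed completely, which also removes the flicker. With real results, the same arithmetic would be applied to each segment's `start` and `end` in the way the script above assigns `segment.end`.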

@nns2009
Author

nns2009 commented May 8, 2024

@jianfch Thanks a lot for the answer and an example!
Based on it and the documentation, I wrote a script, which serves the purpose: good_subs.txt
(.txt extension because GitHub won't allow an upload otherwise)
It does three things:

  • Extend segment start and end times
  • Merge short gaps
  • Place line-breaks into long lines (because stupid YouTube won't)

That said, I think stable-ts itself would really benefit from having this functionality built in, since anyone producing subtitles for viewers needs it.
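For readers without access to the attached script, the line-break step from the list above can be sketched with Python's standard `textwrap` module. This is not the attached good_subs.txt script; `wrap_subtitle` is a hypothetical helper, and the 42-character limit is an assumed value taken from common subtitling practice, not from stable-ts or YouTube.

```python
import textwrap


def wrap_subtitle(text, max_chars=42):
    """Insert line breaks so that no subtitle line exceeds `max_chars`.

    Breaks happen only at whitespace, so words are never split. A single
    word longer than `max_chars` is kept whole on its own line.
    """
    lines = textwrap.wrap(
        text,
        width=max_chars,
        break_long_words=False,   # do not cut a word in the middle
        break_on_hyphens=False,   # do not split hyphenated words
    )
    return "\n".join(lines)
```

The wrapped string would then be written back as the segment text before the subtitle file is exported.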

This library and my script helped me a lot to get a good subtitle draft (much better than YouTube's auto-subs). Unfortunately, I still had to manually adjust many timings. Can you suggest the best settings (I guess, refine settings) in terms of quality when one doesn't care about the execution time? I don't mind running it even for 10 hours for a 10-minute video, just so I can save my own time.

@jianfch
Owner

jianfch commented May 14, 2024

Can you suggest the best settings (I guess, refine settings) in terms of quality when one doesn't care about the execution time

refine is still rather experimental, so there are no specific settings that will produce higher-quality timings than others; the result can vary from case to case. Generally, the key is a balance between the settings that control how much to deviate from the initial confidence scores (e.g. rel_prob_decrease) and the number of steps: when the former is high, use fewer steps, and vice versa. The default settings play it safe by using low values and a low number of steps to avoid any drastic changes in the timestamps.

Note that refine() was not working properly, which might be why you did not see all changes after using refine(). It was fixed in 864b76c.
