Always a bit delay and a bit early stops #343
Comments
I tried vad=True as well, but I can't find a way to set min_silence_duration_ms=2000, which I found works best with faster-whisper's vad_filter.
… On April 12, 2024, at 02:10, jian ***@***.***> wrote:
If faster-whisper was yielding satisfactory results with vad_filter=True, you might get better results with vad=True instead of k_size and q_levels, which could be causing the "slight delay and cease prematurely" behavior, especially with audio preprocessed with demucs. Since vad_filter=True already filters the result, completely disabling silence suppression with suppress_silence=False is an option to consider if the issue persists even with vad=True.
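For reference, a minimal sketch of how the options discussed here might be combined. faster-whisper accepts Silero VAD settings through vad_parameters (for example min_silence_duration_ms), and suppress_silence is the stable-ts option mentioned above. That transcribe_stable forwards vad_parameters to faster-whisper is an assumption here (the thread only shows vad_filter being passed that way), and build_transcribe_options and run are hypothetical helper names, not part of either library.

```python
def build_transcribe_options(min_silence_ms: int = 2000) -> dict:
    """Collect the keyword arguments discussed in this thread (hypothetical helper)."""
    return {
        "regroup": False,
        # faster-whisper: run Silero VAD before transcription
        "vad_filter": True,
        # faster-whisper: VAD settings are passed as a dict
        "vad_parameters": {"min_silence_duration_ms": min_silence_ms},
        # stable-ts: skip the post-hoc timestamp trimming
        "suppress_silence": False,
    }


def run(filename: str, model_name: str = "base"):
    """Transcribe with the options above.

    Assumption: transcribe_stable forwards vad_parameters to faster-whisper,
    the same way it forwards vad_filter.
    """
    import stable_whisper  # imported lazily so the helper above works without it

    model = stable_whisper.load_faster_whisper(model_name)
    return model.transcribe_stable(filename, **build_transcribe_options())
```

If the result still looks shifted, comparing against a run with suppress_silence left at its default should show whether the trimming step is responsible.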
I've noticed that setting |
Likely due to the different approaches. Faster-Whisper uses the VAD predictions to trim the audio into chunks that meet the threshold and only transcribe those chunks. Stable-ts uses the VAD predictions to trim the timings after the transcription is completed (see https://github.com/jianfch/stable-ts?#silence-suppression).
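To make the difference concrete, here is a toy sketch of the two strategies on plain (start, end) intervals in seconds. It is not the code of either library: trim_to_speech and restore_time are hypothetical names, and the speech list stands in for VAD output, assumed sorted and non-empty.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def trim_to_speech(segment: Interval, speech: List[Interval]) -> Interval:
    """Post-hoc style: shrink an already transcribed segment to the speech it overlaps."""
    start, end = segment
    overlaps = [
        (max(s, start), min(e, end))
        for s, e in speech
        if s < end and e > start
    ]
    if not overlaps:
        return segment  # no detected speech inside the segment, leave it unchanged
    return overlaps[0][0], overlaps[-1][1]


def restore_time(t: float, speech: List[Interval]) -> float:
    """Pre-filter style: map a time in the speech-only audio back to the original timeline."""
    elapsed = 0.0
    for s, e in speech:
        duration = e - s
        if t <= elapsed + duration:
            return s + (t - elapsed)
        elapsed += duration
    return speech[-1][1]  # past the end of the last chunk


speech = [(2.0, 5.0), (8.0, 10.0)]
trimmed = trim_to_speech((0.5, 9.0), speech)  # segment started in silence (e.g. a cough)
restored = restore_time(4.0, speech)          # 4.0 s into the concatenated speech chunks
```

In the first case the model has already heard the noise and only the timestamps are adjusted afterwards; in the second the noise never reaches the model, which may explain why the two pipelines place boundaries differently.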
I'm utilizing stable-ts alongside faster-whisper's integrated VAD parameters, and I've noticed that when executing the following code snippet:
result = model.transcribe_stable(filename, regroup=False, k_size=9, vad_filter=True)
the outcomes generally exhibit a slight delay and cease prematurely compared to the original faster-whisper performance. Despite tweaking several parameters within stable-ts, I haven't found a successful adjustment yet.
In my previous workflow, all my audio files are pre-processed with demucs before being fed into faster-whisper, which typically yields satisfactory results.
However, in scenarios where the audio contains considerable noise, particularly coughs and other disruptions, the timestamps are excessively extended, spanning from the cough to the actual content.
This issue led me to experiment with stable-ts, though it hasn't met my expectations so far.
Could you offer any advice on this matter? I've experimented with the k_size and q_levels settings without finding a viable solution.
Thanks in advance.