Randomly skips a part of the audio (usually around 30 seconds) during transcription #382

Open
bylate opened this issue Jul 28, 2024 · 4 comments

Comments


bylate commented Jul 28, 2024

import stable_whisper
import webvtt

model = stable_whisper.load_model('small')
result = model.transcribe(file)
# positional arguments: segment_level=False, word_level=True
result.to_srt_vtt('audio.vtt', False, True)
for caption in webvtt.read('audio.vtt'):
    print(caption.start + " " + caption.text + " " + caption.end)

With the code above, the transcription skips parts of the audio, and the skipped part differs from file to file. For example, it jumps from 00:00:30.920 yourself 00:00:32.440 to 00:01:00.000 too 00:01:00.200. Is there any way to fix it?
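To pin down where the skips happen, the caption timestamps can be scanned for large gaps. This is only a minimal sketch: `to_seconds` and `find_gaps` are hypothetical helpers, not part of stable-ts or webvtt-py, and it assumes timestamps in the `HH:MM:SS.mmm` form shown in this report.

```python
def to_seconds(timestamp: str) -> float:
    """Convert an 'HH:MM:SS.mmm' timestamp string to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def find_gaps(captions, min_gap=10.0):
    """Return (previous_end, next_start, gap_seconds) for each gap >= min_gap.

    `captions` is an iterable of (start, end, text) tuples whose start and
    end values are timestamp strings, in playback order.
    """
    gaps = []
    previous_end = None
    for start, end, _text in captions:
        if previous_end is not None:
            gap = to_seconds(start) - to_seconds(previous_end)
            if gap >= min_gap:
                gaps.append((previous_end, start, round(gap, 3)))
        previous_end = end
    return gaps


# The two captions quoted in this report.
captions = [
    ("00:00:30.920", "00:00:32.440", "yourself"),
    ("00:01:00.000", "00:01:00.200", "too"),
]
gaps = find_gaps(captions)
print(gaps)
```

With captions read through webvtt-py, the tuples could be built as `(c.start, c.end, c.text)` for each caption `c`.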

Owner

jianfch commented Jul 28, 2024

Try using a higher value for no_speech_threshold (default: 0.6), or set it to None to disable all skipping triggered by this threshold. Only do the latter when the audio has no non-speech gaps longer than 30 seconds; otherwise it will hallucinate text for that gap.

result = model.transcribe(file, no_speech_threshold=0.9)
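For context on why raising the threshold can help, here is a rough sketch of how the upstream Whisper transcription loop decides to skip a window. It is an illustration, not stable-ts code: `should_skip_window` is a hypothetical name, the defaults mirror upstream Whisper (`no_speech_threshold=0.6`, `logprob_threshold=-1.0`), and stable-ts may apply additional or different checks.

```python
def should_skip_window(no_speech_prob, avg_logprob,
                       no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Return True if a window would be treated as silence and skipped."""
    if no_speech_threshold is None:
        # Skipping based on the no-speech probability is disabled.
        return False
    skip = no_speech_prob > no_speech_threshold
    # A confident decoding (high average log-probability) overrides the skip.
    if logprob_threshold is not None and avg_logprob > logprob_threshold:
        skip = False
    return skip


# A window with no_speech_prob=0.7 and a low-confidence decoding:
print(should_skip_window(0.7, -1.5))                            # default threshold
print(should_skip_window(0.7, -1.5, no_speech_threshold=0.9))   # raised threshold
print(should_skip_window(0.7, -1.5, no_speech_threshold=None))  # skipping disabled
```

Under these assumptions, a window is only dropped when the model both reports a high no-speech probability and decodes with low confidence, which music with faint vocals can easily trigger.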

Author

bylate commented Jul 28, 2024

Hi, I really appreciate your feedback; however, it still does not work even when I set no_speech_threshold to None. For another song that I'm working on, it skips from 00:00:01.740 people 00:00:02.160 to 00:00:31.000 Sometimes 00:00:31.500, where there are around 10 seconds of pure music and 20 seconds of music + vocals. Is there a way to work around that?

Owner

jianfch commented Jul 28, 2024

It generally does not perform well with music. Try using denoiser="demucs" to transcribe only the isolated vocals.

Author

bylate commented Jul 29, 2024

That works! It also got better after I switched my model to small.en
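For anyone landing here later, the two changes that helped in this thread (denoiser="demucs" and the small.en model) might be combined roughly as follows. This is a sketch, not an official recipe: `song_transcribe_options` and `transcribe_song` are hypothetical helper names, and it assumes stable-ts and demucs are both installed.

```python
def song_transcribe_options(**overrides):
    """Build keyword arguments for model.transcribe() on music (hypothetical helper)."""
    options = {"denoiser": "demucs"}  # isolate the vocals before transcribing
    options.update(overrides)
    return options


def transcribe_song(path, model_name="small.en", **overrides):
    """Load a model and transcribe `path` with the options above."""
    # Imported lazily so the options helper works without stable-ts installed.
    import stable_whisper

    model = stable_whisper.load_model(model_name)
    return model.transcribe(path, **song_transcribe_options(**overrides))


print(song_transcribe_options())
print(song_transcribe_options(no_speech_threshold=0.9))
```

Calling `transcribe_song('song.mp3')` (the file name is only a placeholder) would then return a result that can be written out with `to_srt_vtt` as in the original post.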
