Randomly skips part of the audio (usually around 30 seconds) during transcription #382

bylate · 2024-07-28T03:00:39Z

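import stable_whisper
import webvtt

model = stable_whisper.load_model('small')
result = model.transcribe(file)  # file: path to the uploaded audio
# Write word-level (not segment-level) timestamps to a VTT file
result.to_srt_vtt('audio.vtt', False, True)
for caption in webvtt.read('audio.vtt'):
    print(caption.start + " " + caption.text + " " + caption.end)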

With the code above, transcription skips a different part of the audio for each uploaded file. For example, the output jumps from "00:00:30.920 yourself 00:00:32.440" straight to "00:01:00.000 too 00:01:00.200". Is there any way to fix this?
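
To make the skipped stretches easier to spot than by reading the printed captions, the gaps between consecutive captions can be measured directly. This is only a minimal sketch using the standard library: `parse_vtt_time`, `find_gaps`, and the 10-second `min_gap` cutoff are hypothetical names and values chosen for illustration, and the sample captions are the two timestamps quoted above.

```python
def parse_vtt_time(stamp: str) -> float:
    """Convert an 'HH:MM:SS.mmm' WebVTT timestamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def find_gaps(captions, min_gap=10.0):
    """Return (prev_end, next_start, gap_seconds) for every gap longer than min_gap.

    `captions` is a list of (start, end) timestamp strings in playback order.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(captions, captions[1:]):
        gap = parse_vtt_time(next_start) - parse_vtt_time(prev_end)
        if gap > min_gap:
            gaps.append((prev_end, next_start, round(gap, 3)))
    return gaps


# The two captions from the example above, as (start, end) pairs.
captions = [("00:00:30.920", "00:00:32.440"), ("00:01:00.000", "00:01:00.200")]
print(find_gaps(captions))
```

With webvtt, the list could be built as [(c.start, c.end) for c in webvtt.read('audio.vtt')]; for the example above the reported gap is a little under 28 seconds.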

jianfch · 2024-07-28T18:13:04Z

Try a higher value for no_speech_threshold (default: 0.6), or set it to None to disable all skipping triggered by this threshold. Only use None when the audio has no non-speech gaps longer than 30 seconds; otherwise the model will hallucinate text for those gaps.

result = model.transcribe(file, no_speech_threshold=0.9)

bylate · 2024-07-28T22:33:39Z

Hi, I really appreciate your feedback; however, it still does not work even when I set no_speech_threshold to None. For another song that I'm working on, it skips from "00:00:01.740 people 00:00:02.160" to "00:00:31.000 Sometimes 00:00:31.500", even though that stretch contains around 10 seconds of pure music and 20 seconds of music plus vocals. Is there a way to handle that?

jianfch · 2024-07-28T23:05:49Z

It generally does not perform well with music. Try denoiser="demucs" so that only the isolated vocals are transcribed.
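
The two suggestions in this thread can be combined into one set of keyword arguments. The following is a rough sketch, not part of stable-ts: `build_transcribe_options` and its two flags are hypothetical, and only the `denoiser` and `no_speech_threshold` values come from the comments above.

```python
def build_transcribe_options(has_music: bool, long_silences: bool) -> dict:
    """Collect the transcribe() keyword arguments suggested in this thread.

    Hypothetical helper for illustration; it only builds a dict.
    """
    options = {}
    if has_music:
        # Isolate the vocals before transcription.
        options["denoiser"] = "demucs"
    if long_silences:
        # Gaps over 30 seconds exist: keep skipping enabled, but less eager.
        options["no_speech_threshold"] = 0.9
    else:
        # No long non-speech gaps: disable threshold-triggered skipping.
        options["no_speech_threshold"] = None
    return options


# Usage, assuming stable_whisper is installed and `file` is an audio path:
#   model = stable_whisper.load_model('small')
#   result = model.transcribe(file, **build_transcribe_options(True, False))
print(build_transcribe_options(has_music=True, long_silences=False))
```

Whether None or a raised threshold is the better choice depends on the audio, so the flags are left to the caller rather than guessed from the file.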

bylate · 2024-07-29T22:30:54Z

That works! It also got better after I switched my model to small.en.
