Huggingface's Fine Tuned model that can be used? #378
Comments
Models with preconfigured alignment heads, or ones compatible with the original heads, will work. Technically, even models without alignment heads, such as …
Hi @Patrick10731, did you get any of the kotoba-whisper models to work with stable-ts? I am trying their kotoba-tech/kotoba-whisper-v2.1 model, but I keep getting an out-of-memory error. @jianfch, I'm not sure if you have already come across the kotoba-tech models on Hugging Face. Their latest model uses stable-ts for accurate timestamps and regrouping; I thought you might be interested.
@jianfch @dgoryeo
I also found that many models still won't work, but they will work if you convert them into faster-whisper's format. For example, this model won't work,
but the following code will.
The converted model is from here (https://huggingface.co/XA9/faster-whisper-large-v2-cantonese-2),
so I recommend trying to convert a model if it won't work.
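For reference, the usual way to convert a Hugging Face Whisper fine-tune into faster-whisper's format is CTranslate2's `ct2-transformers-converter` tool. A sketch of that conversion (the model name, output directory, and quantization choice below are just examples, not what was used in this thread):

```shell
# Install the converter and its dependency on transformers.
pip install ctranslate2 transformers

# Convert the Hugging Face model into CTranslate2 format,
# which is what faster-whisper loads.
ct2-transformers-converter \
  --model kotoba-tech/kotoba-whisper-v1.0 \
  --output_dir kotoba-whisper-v1.0-ct2 \
  --copy_files tokenizer_config.json preprocessor_config.json \
  --quantization float16
```

The resulting directory can then be passed to stable-ts's faster-whisper loader (e.g. `stable_whisper.load_faster_whisper('kotoba-whisper-v1.0-ct2')`), assuming a stable-ts version that provides that loader.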
Thank you @Patrick10731. By any chance, have you tried Kotoba's v2.1 (which is a distilled Whisper)? I will try to follow your recommendation. At the moment I am running out of memory with v2.1, but I haven't tried CPU-only; I've tried device='cuda' so far.
@dgoryeo …
Thanks @Patrick10731, I will test it on CPU. I have 12 GB of GPU VRAM, so I didn't expect to run out of memory. I'll test and report back.
@dgoryeo 12GB might be too low for the default …
@jianfch, that must be it. I'll change the batch_size accordingly. When I use the model directly with transformers, I use batch_size=16 with no problem.
Thanks
@dgoryeo You can pass this …
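Putting the batch-size discussion above together: a smaller batch size can be passed through to transcription to fit in less VRAM. A minimal sketch, assuming a stable-ts version whose Hugging Face backend accepts a `batch_size` keyword (the model name, audio file, and the value 8 here are illustrative, not from this thread):

```python
import stable_whisper

# Load a Hugging Face Whisper model with stable-ts's HF backend.
model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cuda')

# Lower batch_size until the model fits in available VRAM
# (the default may be too high for a 12 GB GPU).
result = model.transcribe('audio.mp3', batch_size=8)
result.to_srt_vtt('audio.srt', word_level=False)
```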
Reporting back that it worked. I tested both options, and both worked, though I was happier with the results of (a).
I tried to use distil-whisper-v3 in stable-ts, and it works.
However, it can't be used when I try "distil-large-v2".
Other models can't be used either (e.g. kotoba-whisper, "kotoba-tech/kotoba-whisper-v1.0").
What kinds of models can be used in stable-ts, besides OpenAI's models?
```python
import stable_whisper

model = stable_whisper.load_hf_whisper('distil-whisper/distil-large-v3', device='cpu')
result = model.transcribe('audio.mp3')
result.to_srt_vtt('audio.srt', word_level=False)
```