
Huggingface's Fine Tuned model that can be used? #378

Open
Patrick10731 opened this issue Jul 6, 2024 · 10 comments

@Patrick10731

Patrick10731 commented Jul 6, 2024

I tried to use distil-whisper-v3 in stable-ts and it works.
However, it doesn't work when I try "distil-large-v2".
Other models can't be used either (e.g. kotoba-whisper, "kotoba-tech/kotoba-whisper-v1.0").
What kinds of models can be used in stable-ts besides OpenAI's models?

import stable_whisper

model = stable_whisper.load_hf_whisper('distil-whisper/distil-large-v3', device='cpu')
result = model.transcribe('audio.mp3')

result.to_srt_vtt('audio.srt', word_level=False)

@jianfch
Owner

jianfch commented Jul 6, 2024

Models with preconfigured alignment heads, or ones compatible with the original heads, will work.
For the ones compatible with the original heads, you can configure them manually by assigning the head indices to model._pipe.model.generation_config.alignment_heads.

Technically, even models without alignment heads, such as distil-large-v2, will work if you disable word timestamps with model.transcribe('audio.mp3', word_timestamps=False). However, many features, such as regrouping and word-level timestamp adjustment, will be unavailable.
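
For example, a minimal sketch of both approaches (the head indices in the commented-out line are placeholders for illustration, not real values for distil-large-v2):

import stable_whisper

model = stable_whisper.load_hf_whisper('distil-whisper/distil-large-v2', device='cpu')

# Option 1: assign alignment heads manually so word-level timestamps keep working.
# Each entry is a [decoder_layer, attention_head] pair; the correct indices depend on the model.
# model._pipe.model.generation_config.alignment_heads = [[1, 0], [1, 2]]

# Option 2: disable word timestamps and settle for segment-level timestamps only.
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)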

@dgoryeo

dgoryeo commented Sep 27, 2024

Hi @Patrick10731, did you get any of the kotoba-whisper models to work with stable-ts? I am trying their kotoba-tech/kotoba-whisper-v2.1 model, but I keep getting an out-of-memory error.

@jianfch, I'm not sure if you have already come across the kotoba-tech models on Huggingface. Their latest model uses stable-ts for accurate timestamps and regrouping. I thought you might be interested.

@Patrick10731
Author

@jianfch
Thanks, it worked.

@dgoryeo
I confirmed that this code works; give it a try.


import stable_whisper

model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v1.1', device='cpu')
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)

I also found that many models that won't work directly will still work if you convert them into faster-whisper format.

For example, this model won't work:

import stable_whisper

model = stable_whisper.load_hf_whisper('Scrya/whisper-large-v2-cantonese', device='cpu')
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)

But the following code will work:

import stable_whisper

model = stable_whisper.load_faster_whisper('XA9/faster-whisper-large-v2-cantonese-2', device='cpu', compute_type='default')
result = model.transcribe_stable('audio.mp3')
result.to_srt_vtt('audio.srt', word_level=False)

The converted model is from here (https://huggingface.co/XA9/faster-whisper-large-v2-cantonese-2),
and it was converted with the following command:

ct2-transformers-converter --model Scrya/whisper-large-v2-cantonese --output_dir faster-whisper-large-v2-cantonese-2 --copy_files preprocessor_config.json --quantization float16

So I recommend trying to convert a model to faster-whisper format if it doesn't work directly.
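
For reference, a sketch of loading the locally converted directory produced by the command above (assuming it sits in the working directory):

import stable_whisper

# 'faster-whisper-large-v2-cantonese-2' is the local --output_dir from the conversion step
model = stable_whisper.load_faster_whisper('faster-whisper-large-v2-cantonese-2', device='cpu', compute_type='default')
result = model.transcribe_stable('audio.mp3')
result.to_srt_vtt('audio.srt', word_level=False)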

@dgoryeo

dgoryeo commented Sep 28, 2024

Thank you @Patrick10731. By any chance, have you tried Kotoba's v2.1 (which is a distilled Whisper)?

I will try to follow your recommendation. At the moment I am running out of memory with v2.1, but I haven't tried CPU only; I've only tried device='cuda' so far.

@Patrick10731
Author

Patrick10731 commented Sep 29, 2024

@dgoryeo
I tried with this code and it worked.
How about trying device='cpu'?
The out-of-memory error is probably because your video card doesn't have enough memory.

import stable_whisper

model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cpu')
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)

@dgoryeo

dgoryeo commented Sep 29, 2024

Thanks @Patrick10731, I will test it on CPU. I have 12GB of GPU VRAM, so I didn't expect to run out of memory. I'll test and report back.

@jianfch
Owner

jianfch commented Sep 29, 2024

@dgoryeo 12GB might be too low for the default batch_size=24. Try a smaller batch_size.
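
A rough sketch of that, assuming batch_size can be passed to transcribe() (which is what the default of 24 refers to):

import stable_whisper

model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cuda')
# lower batch_size from the default of 24 to reduce VRAM usage on a 12GB card
result = model.transcribe('audio.mp3', word_timestamps=False, batch_size=8)
result.to_srt_vtt('audio.srt', word_level=False)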

@dgoryeo

dgoryeo commented Sep 29, 2024

@jianfch, that must be it. I'll change the batch_size accordingly.

When I use the model directly with transformers, I use batch_size 16 with no problem:

from transformers import pipeline

# model_id, torch_dtype, device and model_kwargs are defined earlier in my script
pipe = pipeline(
    model=model_id,
    torch_dtype=torch_dtype,
    device=device,
    model_kwargs=model_kwargs,
    chunk_length_s=15,
    batch_size=16,
    trust_remote_code=True,
    stable_ts=True,
    punctuator=True
)

Thanks

@jianfch
Owner

jianfch commented Sep 30, 2024

@dgoryeo You can pass this pipe directly to the pipeline parameter of stable_whisper.load_hf_whisper().
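
A rough sketch of that, reusing the pipe built in the previous comment (whether the model name still needs to be passed alongside pipeline is an assumption here):

import stable_whisper

# reuse the existing transformers pipeline instead of letting stable-ts build its own
model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', pipeline=pipe)
result = model.transcribe('audio.mp3', word_timestamps=False)
result.to_srt_vtt('audio.srt', word_level=False)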

@dgoryeo

dgoryeo commented Oct 7, 2024

Reporting back that it worked.

I tested both options:
(a) directly calling model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cuda'), and
(b) passing the pipe to the pipeline parameter of stable_whisper.load_hf_whisper(), with device='cuda'.

Both worked, though I was happier with the results of (a).
