
Memory leak #410

Open
maxlund opened this issue Oct 22, 2024 · 2 comments

maxlund commented Oct 22, 2024

Hi,

First off, thank you for this great implementation, really good stuff!

When using the newest stable-ts version on Windows to run the large-v3-turbo model, I think there might be a memory leak of some sort when transcribing longer (1h+) audio: the RAM (not VRAM) usage goes way up.
[screenshot: RAM usage climbing during transcription]

RAM usage seems to be steadily increasing until we eventually get an OOM error:

Exception occurred: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1921792 bytes.
Traceback (most recent call last):
  File "stable_whisper\whisper_word_level\original_whisper.py", line 1437, in transcribe_stable
  File "stable_whisper\audio\__init__.py", line 373, in next_chunk
  File "stable_whisper\audio\__init__.py", line 341, in _read_append_to_buffer
RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1921792 bytes

I uploaded the audio file (runtime 02:27:47) which caused the error above here

We also have a very long audio file uploaded here (10h+ long, mostly silence), which you could perhaps use if the file above does not reproduce the issue.
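
In case it helps with reproducing, here is a rough sketch of one way to watch process RAM while transcribing (this assumes psutil is installed; the log_rss helper is just for illustration and not part of stable-ts):

import threading

import psutil
import stable_whisper
import torch

def log_rss(stop_event, interval=30):
    # Print this process's resident memory every `interval` seconds.
    proc = psutil.Process()
    while not stop_event.is_set():
        print(f"RSS: {proc.memory_info().rss / 1024 ** 2:.0f} MiB")
        stop_event.wait(interval)

model = stable_whisper.load_model("/path/to/large-v3-turbo.pt", device=torch.device('cuda'))
stop = threading.Event()
threading.Thread(target=log_rss, args=(stop,), daemon=True).start()
try:
    # On the long files above, the reported RSS keeps climbing until the OOM error.
    model.transcribe(audio="/path/to/long-audio.mp3", vad=True, language="english", verbose=False)
finally:
    stop.set()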

We have been using your library for a while, and didn't observe any of these issues prior to switching over to the large-v3-turbo model and using the latest version of the library. Any ideas?

Thanks again for all your fantastic work here!


maxlund commented Oct 22, 2024

Here is a minimal example to reproduce the issue:

import stable_whisper
import torch

model_path = "/path/to/large-v3-turbo.pt"
audio_paths = [
    "/path/to/mozart-of-gen-z-interview.mp3",
    "/path/to/long-audio.mp3"
]
# Load the large-v3-turbo checkpoint onto the GPU.
model = stable_whisper.load_model(model_path, device=torch.device('cuda'))
segments_and_start_times = list()
for audio_path in audio_paths:
    # Transcribe with VAD enabled; RAM usage climbs steadily on the long files.
    whisper_result = model.transcribe(audio=audio_path, vad=True, language="english", verbose=False)
    for res in whisper_result:
        segments_and_start_times.append([res.start, res.text, res.end])

print(segments_and_start_times)

jianfch added a commit that referenced this issue Oct 23, 2024
-disabled gradients for VAD to prevent memory leak (#410)

jianfch commented Oct 23, 2024

The leak seems to be caused by the VAD gradients.
Try disabling the gradients before transcribing.

import torch
torch.set_grad_enabled(False)

Or update to 4711a01.
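
Applied to the reproduction script above, that looks roughly like this (a sketch; wrapping only the transcribe call in a torch.no_grad() block should also work if you prefer not to disable gradients globally):

import stable_whisper
import torch

# Disable autograd globally so the VAD forward passes don't retain gradient state.
torch.set_grad_enabled(False)

model = stable_whisper.load_model("/path/to/large-v3-turbo.pt", device=torch.device('cuda'))
result = model.transcribe(audio="/path/to/long-audio.mp3", vad=True, language="english", verbose=False)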
