
Memory leak #410

Open
maxlund opened this issue Oct 22, 2024 · 2 comments

maxlund commented Oct 22, 2024

Hi,

First off, thank you for this great implementation, really good stuff!

When using the newest stable-ts version on Windows to run the large-v3-turbo model, I think there might be a memory leak of some sort when transcribing longer (1h+) audio: the RAM (not VRAM) usage goes way up.
[screenshot: RAM usage climbing during transcription]

RAM usage seems to be steadily increasing until we eventually get an OOM error:

Exception occurred: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1921792 bytes.
Traceback (most recent call last):
  File "stable_whisper\whisper_word_level\original_whisper.py", line 1437, in transcribe_stable
  File "stable_whisper\audio\__init__.py", line 373, in next_chunk
  File "stable_whisper\audio\__init__.py", line 341, in _read_append_to_buffer
RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1921792 bytes

I uploaded the audio file (runtime 02:27:47) which caused the error above here

We also have a very long audio file uploaded here (10h+ long, mostly silence), which you could perhaps use if the file above does not reproduce the issue.
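
In case it helps with reproducing, here is a rough sketch of one way to watch process RAM while transcribing (this assumes psutil is installed; the log_rss helper is just for illustration and not part of stable-ts):

import threading

import psutil
import stable_whisper
import torch

def log_rss(stop_event, interval=30):
    # Print this process's resident memory every `interval` seconds.
    proc = psutil.Process()
    while not stop_event.is_set():
        print(f"RSS: {proc.memory_info().rss / 1024 ** 2:.0f} MiB")
        stop_event.wait(interval)

model = stable_whisper.load_model("/path/to/large-v3-turbo.pt", device=torch.device('cuda'))
stop = threading.Event()
threading.Thread(target=log_rss, args=(stop,), daemon=True).start()
try:
    # On the long files above, the reported RSS keeps climbing until the OOM error.
    model.transcribe(audio="/path/to/long-audio.mp3", vad=True, language="english", verbose=False)
finally:
    stop.set()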

We have been using your library for a while, and didn't observe any of these issues prior to switching over to the large-v3-turbo model and using the latest version of the library. Any ideas?

Thanks again for all your fantastic work here!


maxlund commented Oct 22, 2024

Here is a minimal example to reproduce the issue:

import stable_whisper
import torch

model_path = "/path/to/large-v3-turbo.pt"
audio_paths = [
    "/path/to/mozart-of-gen-z-interview.mp3",
    "/path/to/long-audio.mp3"
]
# Load the large-v3-turbo checkpoint onto the GPU.
model = stable_whisper.load_model(model_path, device=torch.device('cuda'))
segments_and_start_times = list()
for audio_path in audio_paths:
    # Transcribe with VAD enabled; RAM usage climbs steadily on the long files.
    whisper_result = model.transcribe(audio=audio_path, vad=True, language="english", verbose=False)
    for res in whisper_result:
        segments_and_start_times.append([res.start, res.text, res.end])

print(segments_and_start_times)

jianfch added a commit that referenced this issue Oct 23, 2024
-disabled gradients for VAD to prevent memory leak (#410)

jianfch commented Oct 23, 2024

The leak seems to be caused by the VAD gradients.
Try disabling the gradients before transcribing.

import torch
torch.set_grad_enabled(False)

Or update to 4711a01.
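
Applied to the reproduction script above, that looks roughly like this (a sketch; wrapping only the transcribe call in a torch.no_grad() block should also work if you prefer not to disable gradients globally):

import stable_whisper
import torch

# Disable autograd globally so the VAD forward passes don't retain gradient state.
torch.set_grad_enabled(False)

model = stable_whisper.load_model("/path/to/large-v3-turbo.pt", device=torch.device('cuda'))
result = model.transcribe(audio="/path/to/long-audio.mp3", vad=True, language="english", verbose=False)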
