Metal Tensor failed assertion when shared between threads #2637

Open
ealmloff opened this issue Nov 25, 2024 · 1 comment

@ealmloff
Contributor

When I share a tensor between multiple threads on a Metal accelerator, I get the following error, and then the program aborts without a panic message.

-[AGXG14XFamilyCommandBuffer tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1015: failed assertion `A command encoder is already encoding to this command buffer'
2024-11-25 07:40:01.246 candle-reproducation[88968:35288312] failed assertion _status < MTLCommandBufferStatusCommitted at line 316 in -[IOGPUMetalCommandBuffer setCurrentCommandEncoder:]
2024-11-25 07:40:01.247 candle-reproducation[88968:35288372] failed assertion _status < MTLCommandBufferStatusCommitted at line 316 in -[IOGPUMetalCommandBuffer setCurrentCommandEncoder:]
2024-11-25 07:40:01.246 candle-reproducation[88968:35288313] failed assertion _status < MTLCommandBufferStatusCommitted at line 316 in -[IOGPUMetalCommandBuffer setCurrentCommandEncoder:]
...
zsh: abort      cargo run

The following code reproduces the issue with candle = 0.8.0 on an M2 Mac:

use candle_core::{Device, Tensor};

fn main() {
    let tensor = Tensor::new(vec![1.0f32, 2.0, 3.0], &Device::new_metal(0).unwrap()).unwrap();

    loop {
        let tensor = tensor.clone();
        std::thread::spawn(move || {
            let new = tensor.add(&tensor).unwrap();
            let vec: Vec<f32> = new.to_vec1().unwrap();
            assert!(vec[0] > 1.9 && vec[0] < 2.1);
            assert!(vec[1] > 3.9 && vec[1] < 4.1);
            assert!(vec[2] > 5.9 && vec[2] < 6.1);
        });
    }
}

A full reproduction repo with the exact lock file I used is available at https://github.com/ealmloff/candle-reproducation-failed-assertion
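
Until the root cause is identified, one possible workaround is to serialize all device work behind a single lock. This is an untested assumption, not something the maintainers have confirmed: it presumes the assertion is triggered by several threads encoding into the same Metal command buffer at once. The sketch below uses only the standard library. `GPU_LOCK`, `with_gpu_lock`, and `run_serialized` are hypothetical names and are not part of candle; in the real program the closure body would contain the `tensor.add(&tensor)` and `to_vec1()` calls from the reproduction above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical process-wide lock guarding all device work (not a candle API).
static GPU_LOCK: Mutex<()> = Mutex::new(());

/// Runs `work` while holding the global lock, so at most one thread
/// performs device work at any time.
fn with_gpu_lock<T>(work: impl FnOnce() -> T) -> T {
    // Recover the guard even if another thread panicked while holding it.
    let _guard = GPU_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    work()
}

/// Spawns `threads` workers that each enter the locked section once and
/// returns the highest number of workers seen inside it at the same time.
fn run_serialized(threads: usize) -> usize {
    let active = Arc::new(AtomicUsize::new(0));
    let max_seen = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let active = Arc::clone(&active);
            let max_seen = Arc::clone(&max_seen);
            thread::spawn(move || {
                with_gpu_lock(|| {
                    let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                    max_seen.fetch_max(now, Ordering::SeqCst);
                    // Placeholder: the tensor operations would run here.
                    active.fetch_sub(1, Ordering::SeqCst);
                })
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    max_seen.load(Ordering::SeqCst)
}
```

Calling `run_serialized(8)` from `main` should never report more than one worker inside the locked section. The trade-off is that device work no longer overlaps across threads, so this would only be a stopgap if the assumption about the cause holds.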

@EricLBuehler
Member

Thanks for the issue and the example. I will take a look!
