
Issue with built-in multi-threading #158

Open
arunbpf opened this issue Sep 23, 2024 · 6 comments
Labels: question (Further information is requested)

Comments

arunbpf commented Sep 23, 2024

Hi,

The README mentions that Pipeless uses multi-threading by default. When I use variables/lists in the post-process hook that are updated across frames (for example, a frame count or a moving average of predictions), I run into race conditions, and the calculations that accumulate these values over time get corrupted.

What is the best way to handle this situation and ensure thread safety?
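
For illustration, a minimal sketch of the pattern described above; the hook signature follows the Pipeless Python examples, and the helper and variable names are hypothetical:

```python
# post-process.py (illustrative): module-level state shared by every worker thread
frame_count = 0
recent_scores = []  # rolling window used for a moving average

def get_frame_score(frame_data):
    # Hypothetical helper: extract whatever per-frame value you accumulate
    # (e.g. a detection confidence).
    return float(frame_data.get("confidence", 0.0))

def hook(frame_data, context):
    global frame_count
    # Pipeless runs this hook from multiple worker threads by default, so the
    # unsynchronized read-modify-write below can interleave and lose updates.
    frame_count += 1
    recent_scores.append(get_frame_score(frame_data))
    if len(recent_scores) > 30:
        recent_scores.pop(0)
    moving_average = sum(recent_scores) / len(recent_scores)  # may be computed from a torn list
    return frame_data
```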

miguelaeh (Collaborator)

You can use the built-in key-value store: https://www.pipeless.ai/docs/docs/v1/kvstore
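
For illustration, a minimal sketch of keeping a running count in the built-in key-value store instead of in Python globals. The set/get helpers below (pipeless_kvs_set / pipeless_kvs_get) are the ones described in the linked docs; confirm the exact names and value types there (this sketch assumes values are handled as strings):

```python
# post-process.py (illustrative): accumulate state in the built-in KV store
# rather than in module-level Python variables.

def hook(frame_data, context):
    # Read the previous count; the sketch assumes the store returns a string
    # (empty when the key has never been written).
    stored = pipeless_kvs_get("frame_count")
    count = int(stored) if stored else 0
    pipeless_kvs_set("frame_count", str(count + 1))
    return frame_data
```

Note that even with thread-safe set and get, a read-then-write sequence like the one above is still not atomic across threads, so ordering-sensitive accumulations may additionally need the stateful-hook approach below.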

miguelaeh (Collaborator)

You can also make the hooks where you process the data stateful, so that Pipeless sorts the frames before running the hook: https://www.pipeless.ai/docs/docs/v1/getting-started#stateless-vs-stateful-hooks
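
For illustration, a minimal sketch of a stateful post-process hook. Per the linked docs (and the marker shown later in this thread), the "# make stateful" comment on the first line tells Pipeless to treat the hook as stateful, so frames are processed in order rather than concurrently; the accumulator names are hypothetical:

```python
# make stateful
# post-process.py (illustrative): because the hook is stateful, frames pass
# through it one at a time and in order, so plain module-level state is safe.
frame_count = 0
recent_scores = []

def hook(frame_data, context):
    global frame_count
    frame_count += 1
    recent_scores.append(1.0)  # hypothetical per-frame value
    if len(recent_scores) > 30:
        recent_scores.pop(0)
    return frame_data
```

The trade-off is that a stateful hook processes frames sequentially, so that stage gives up per-frame parallelism.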

miguelaeh added the question label on Sep 24, 2024
arunbpf (Author) commented Oct 4, 2024

Are reads and writes to the built-in key-value store thread-safe?

miguelaeh (Collaborator)

Yes, it is

arunbpf (Author) commented Oct 14, 2024

When I tried to make post-process.py stateful by adding the comment below as the first line of the script,

# make stateful

I get the error below from ONNX Runtime. It runs fine in stateless mode. (I am using a YOLOv8 object detection model (ONNX) in process.py to detect the object, and a Transformer-based PyTorch model (.pth) in post-process.py to run OCR on the object region detected by YOLO.)

*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 900: operation not permitted when stream is capturing ; GPU=0 ; hostname=ip-172-31-14-86 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=283 ; expr=cudaDeviceSynchronize(); 

 when using ['CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
****************************************
*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 900: operation not permitted when stream is capturing ; GPU=0 ; hostname=ip-172-31-14-86 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=283 ; expr=cudaDeviceSynchronize(); 

 when using ['CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
****************************************
[2024-10-04T11:09:47Z ERROR pipeless_ai::stages::languages::python] Error executing hook: RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 900: operation not permitted when stream is capturing ; GPU=0 ; hostname=ip-172-31-14-86 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=283 ; expr=cudaDeviceSynchronize(); 
    
    
[2024-10-04T11:09:47Z WARN  pipeless_ai::pipeline] No frame returned from path execution, skipping frame forwarding to the output (if any).

0: 640x640 (no detections), 858.1ms

0: 640x640 (no detections), 10.0ms

Overall Post Processing Time:  2.193450927734375e-05
0: 640x640 (no detections), 10.3ms

0: 640x640 (no detections), 9.6ms

2024-10-04 11:09:49.118122639 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/model.0/conv/Conv' Status Message: CUDA error cudaErrorStreamCaptureUnsupported:operation not permitted when stream is capturing
[2024-10-04T11:09:49Z ERROR pipeless_ai::stages::languages::python] Error executing hook: Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'/model.0/conv/Conv' Status Message: CUDA error cudaErrorStreamCaptureUnsupported:operation not permitted when stream is capturing
[2024-10-04T11:09:49Z WARN  pipeless_ai::pipeline] No frame returned from path execution, skipping frame forwarding to the output (if any).

0: 640x640 (no detections), 9.6ms

0: 640x640 (no detections), 10.1ms

2024-10-04 11:09:49.152105642 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/model.0/conv/Conv' Status Message: CUDA error cudaErrorStreamCaptureUnsupported:operation not permitted when stream is capturing
[2024-10-04T11:09:49Z ERROR pipeless_ai::stages::languages::python] Error executing hook: Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'/model.0/conv/Conv' Status Message: CUDA error cudaErrorStreamCaptureUnsupported:operation not permitted when stream is capturing
[2024-10-04T11:09:49Z WARN  pipeless_ai::pipeline] No frame returned from path execution, skipping frame forwarding to the output (if any).

0: 640x640 (no detections), 8.6ms

0: 640x640 (no detections), 10.0ms

0: 640x640 (no detections), 9.9ms

0: 640x640 (no detections), 10.0ms
0: 640x640 (no detections), 7.1ms
0: 640x640 (no detections), 9.9ms
0: 640x640 (no detections), 10.0ms
0: 640x640 (no detections), 7.0ms
0: 640x640 (no detections), 8.1ms
0: 640x640 (no detections), 7.0ms
0: 640x640 (no detections), 12.7ms
[2024-10-04T11:11:35Z INFO  pipeless_ai::input::pipeline] Received received EOS from source pipeline0.
                    Pipeline id: 8958540b-24be-4d75-98da-5f670f93caf5 ended

Any idea why this would happen?

miguelaeh (Collaborator)

If you are using an ONNX model for the process hook, I would recommend using a process.json file instead of a Python file, since performance will be better.
Also, if you are using two models, you should create two stages instead of using both in the same stage.

Could you give that setup a try?
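
For reference, a minimal sketch of what such a process.json might contain. The field names below are assumptions based on the Pipeless inference-runtime docs and examples; confirm the exact schema there before using it:

```json
{
    "runtime": "onnx",
    "model_uri": "file:///path/to/yolov8.onnx",
    "inference_params": {
        "execution_provider": "cuda"
    }
}
```

The OCR model would then go in a second stage with its own hooks, and the stream's frame path would list both stages in order.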
