[BUG]: CUDA Thrust object shared between .cu and .cpp files #2737

RichardSmithLab · 2024-11-07T16:24:48Z

Is this a duplicate?

I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

It looks like recent changes to Thrust were made to tag the host_vector class name with the compile time CUDA architecture. This creates an issue (undefined reference) when linking shared objects compiled with nvcc or not (.cu vs .cpp files) that use Thrust objects, even host-only ones like host_vector.

Compiling everything with nvcc is probably in our case is not an option, some of the files need moc (ie Qt).

How to Reproduce

To reproduce on Ubuntu 20.04.6 LTS with Cuda 12.6:
Main.cpp:

include <cuda_runtime.h>
include <thrust/host_vector.h>

bool cudaFunc(thrust::host_vector &vec);
int main()
{
thrust::host_vector vec;
cudaFunc(vec);
}

cudaFunc.cu:

include <cuda_runtime.h>
include <thrust/host_vector.h>
bool cudaFunc(thrust::host_vector &vec)
{
return true;
}

Compile with:

/usr/local/cuda/bin/nvcc -c cudaFunc.cu
g++ -I /usr/local/cuda/include -L /usr/local/cuda/targets/x86_64-linux/lib main.cpp cudaFunc.o -lcudart -ldl

will produce the error:

/usr/bin/ld: /tmp/cc5fFAJC.o: in function main:
main.cpp:(.text+0x30): undefined reference to cudaFunc(thrust::THRUST_200500___CUDA_ARCH_LIST___NS::host_vector<int, std::allocator >&)

This will link fine on Ubuntu 20 with Cuda 12.3.

Expected behavior

No link error.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

jrhemstad · 2024-11-07T16:36:08Z

Unfortunately, this is a known limitation that resulted from the solution we used to solve an even more insidious problem. See: https://github.com/NVIDIA/cccl?tab=readme-ov-file#application-binary-interface-abi

The good news is that for most Thrust types, we have a replacement from cuda:: that does work between .cu and .cpp files. thrust::host_vector is one that we don't have an immediate replacement for, but that work is actively ongoing.

RichardSmithLab · 2024-11-07T17:26:32Z

Thank you for your reply. In my .o file compiled with nvcc, host_vector looks like:
thrust::THRUST_200500_520_NS::host_vector

Can you see an easy way to get thrust headers to tag the name correctly when compiling with g++? (ie swap " CUDA_ARCH_LIST" for "520"? I would be OK with making small changes to the Thrust headers to get this to compile until a cuda::host_vector or similar comes along, which I guess will be quite some time in the future. I was thinking something like redefining THRUST_DETAIL_ABI_NS_BEGIN/END in the case when THRUST_DEVICE_SYSTEM != THRUST_DEVICE_SYSTEM_CUDA.

half-potato · 2024-11-26T19:33:57Z

Are there any workarounds for this issue?

tiehexue · 2024-11-29T07:03:23Z

I got the same problem, and worked after change to cuda/std/array.

RichardSmithLab · 2024-12-02T15:45:30Z

Looking through the logic for the namespace names in the thrust header files, I went with this workaround.

I defined a new namespace for thrust objects, in my case mgxthrust, and replaced all the thrust::host_vector and other thrust names referenced in the application with mgxthrust::host_vector.

Then in a header file:

namespace mgx_thrust { namespace thrust {} }
#define THRUST_WRAPPED_NAMESPACE mgx_thrust
#define THRUST_DISABLE_ABI_NAMESPACE
// Sadly this needs to be defined before thrust/version.h is called
#include <thrust/version.h>
namespace mgxthrust = mgx_thrust::thrust;

Note this needs to be called before other thrust headers.

I am not sure if this is the best/correct way to do it, but it works for our application (MorphoGraphX).

jrhemstad · 2024-12-02T16:31:20Z

Looking through the logic for the namespace names in the thrust header files, I went with this workaround.

Apologies for needing to look through that macro hell 😅 .

You stumbled upon the escape hatch we intentionally built-in. Before we embedded the arch list in the symbol name to fix the symbol collision issue, we used to have to tell people to "wrap" the Thrust/CUB namespace. This isn't a full solution as it would still fall victim to issues if you linked together multiple TUs with your code compiled for different architectures, but if you know that isn't an issue for your use case, then it should be fine.

Your WAR is slightly more complex than it needs to be. You should be able to just define the wrapped namespace:

#define THRUST_WRAPPED_NAMESPACE mgx

#include <thrust/device_vector.h>

int main(){
   auto v = ::mgx::thrust::device_vector<int> (10);
}

https://godbolt.org/z/j6xs4Wr4v

RichardSmithLab added the bug Something isn't working right. label Nov 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG]: CUDA Thrust object shared between .cu and .cpp files #2737

[BUG]: CUDA Thrust object shared between .cu and .cpp files #2737

RichardSmithLab commented Nov 7, 2024

jrhemstad commented Nov 7, 2024

RichardSmithLab commented Nov 7, 2024

half-potato commented Nov 26, 2024

tiehexue commented Nov 29, 2024

RichardSmithLab commented Dec 2, 2024

jrhemstad commented Dec 2, 2024

[BUG]: CUDA Thrust object shared between .cu and .cpp files #2737

[BUG]: CUDA Thrust object shared between .cu and .cpp files #2737

Comments

RichardSmithLab commented Nov 7, 2024

Is this a duplicate?

Type of Bug

Component

Describe the bug

How to Reproduce

Expected behavior

Reproduction link

Operating System

nvidia-smi output

NVCC version

jrhemstad commented Nov 7, 2024

RichardSmithLab commented Nov 7, 2024

half-potato commented Nov 26, 2024

tiehexue commented Nov 29, 2024

RichardSmithLab commented Dec 2, 2024

jrhemstad commented Dec 2, 2024