Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: CUDA Thrust object shared between .cu and .cpp files #2737

Open
1 task done
RichardSmithLab opened this issue Nov 7, 2024 · 6 comments
Open
1 task done

[BUG]: CUDA Thrust object shared between .cu and .cpp files #2737

RichardSmithLab opened this issue Nov 7, 2024 · 6 comments
Labels
bug Something isn't working right.

Comments

@RichardSmithLab
Copy link

Is this a duplicate?

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

It looks like recent changes to Thrust were made to tag the host_vector class name with the compile time CUDA architecture. This creates an issue (undefined reference) when linking shared objects compiled with nvcc or not (.cu vs .cpp files) that use Thrust objects, even host-only ones like host_vector.

Compiling everything with nvcc is probably in our case is not an option, some of the files need moc (ie Qt).

How to Reproduce

To reproduce on Ubuntu 20.04.6 LTS with Cuda 12.6:
Main.cpp:

include <cuda_runtime.h>
include <thrust/host_vector.h>

bool cudaFunc(thrust::host_vector &vec);
int main()
{
thrust::host_vector vec;
cudaFunc(vec);
}

cudaFunc.cu:

include <cuda_runtime.h>
include <thrust/host_vector.h>
bool cudaFunc(thrust::host_vector &vec)
{
return true;
}

Compile with:

/usr/local/cuda/bin/nvcc -c cudaFunc.cu
g++ -I /usr/local/cuda/include -L /usr/local/cuda/targets/x86_64-linux/lib main.cpp cudaFunc.o -lcudart -ldl

will produce the error:

/usr/bin/ld: /tmp/cc5fFAJC.o: in function main:
main.cpp:(.text+0x30): undefined reference to cudaFunc(thrust::THRUST_200500___CUDA_ARCH_LIST___NS::host_vector<int, std::allocator >&)

This will link fine on Ubuntu 20 with Cuda 12.3.

Expected behavior

No link error.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

@RichardSmithLab RichardSmithLab added the bug Something isn't working right. label Nov 7, 2024
@jrhemstad
Copy link
Collaborator

Unfortunately, this is a known limitation that resulted from the solution we used to solve an even more insidious problem. See: https://github.com/NVIDIA/cccl?tab=readme-ov-file#application-binary-interface-abi

The good news is that for most Thrust types, we have a replacement from cuda:: that does work between .cu and .cpp files. thrust::host_vector is one that we don't have an immediate replacement for, but that work is actively ongoing.

@RichardSmithLab
Copy link
Author

Thank you for your reply. In my .o file compiled with nvcc, host_vector looks like:
thrust::THRUST_200500_520_NS::host_vector

Can you see an easy way to get thrust headers to tag the name correctly when compiling with g++? (ie swap " CUDA_ARCH_LIST" for "520"? I would be OK with making small changes to the Thrust headers to get this to compile until a cuda::host_vector or similar comes along, which I guess will be quite some time in the future. I was thinking something like redefining THRUST_DETAIL_ABI_NS_BEGIN/END in the case when THRUST_DEVICE_SYSTEM != THRUST_DEVICE_SYSTEM_CUDA.

@half-potato
Copy link

Are there any workarounds for this issue?

@tiehexue
Copy link

I got the same problem, and worked after change to cuda/std/array.

@RichardSmithLab
Copy link
Author

Looking through the logic for the namespace names in the thrust header files, I went with this workaround.

I defined a new namespace for thrust objects, in my case mgxthrust, and replaced all the thrust::host_vector and other thrust names referenced in the application with mgxthrust::host_vector.

Then in a header file:

namespace mgx_thrust { namespace thrust {} }
#define THRUST_WRAPPED_NAMESPACE mgx_thrust
#define THRUST_DISABLE_ABI_NAMESPACE
// Sadly this needs to be defined before thrust/version.h is called
#include <thrust/version.h>
namespace mgxthrust = mgx_thrust::thrust;

Note this needs to be called before other thrust headers.

I am not sure if this is the best/correct way to do it, but it works for our application (MorphoGraphX).

@jrhemstad
Copy link
Collaborator

Looking through the logic for the namespace names in the thrust header files, I went with this workaround.

Apologies for needing to look through that macro hell 😅 .

You stumbled upon the escape hatch we intentionally built-in. Before we embedded the arch list in the symbol name to fix the symbol collision issue, we used to have to tell people to "wrap" the Thrust/CUB namespace. This isn't a full solution as it would still fall victim to issues if you linked together multiple TUs with your code compiled for different architectures, but if you know that isn't an issue for your use case, then it should be fine.

Your WAR is slightly more complex than it needs to be. You should be able to just define the wrapped namespace:

#define THRUST_WRAPPED_NAMESPACE mgx

#include <thrust/device_vector.h>

int main(){
   auto v = ::mgx::thrust::device_vector<int> (10);
}

https://godbolt.org/z/j6xs4Wr4v

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
None yet
Development

No branches or pull requests

4 participants