adaptivecpp: init at 24.06.0 #360893

Open · wants to merge 10 commits into master

Conversation

@blenderfreaky commented Dec 1, 2024

OpenSYCL has been renamed to AdaptiveCpp, "due to external legal pressure" (see their repo)

The package is a 1:1 copy of pkgs/development/compilers/opensycl/default.nix, with the repository and name updated and the version bumped to the latest release.

Open questions:

  • Remove opensycl package and add an alias or warning?
  • Update dependencies? ROCm 6 should work, and does seem to, but I haven't done sufficient testing yet

Tagging maintainers: @yboettcher

Things done

Built successfully on x86_64 with AMD (ROCm) GPU.

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@NixOSInfra added the "12. first-time contribution: This PR is the author's first one; please be gentle!" label Dec 1, 2024
@yboettcher
Contributor

Interesting that you apparently had no issues with the ROCm build, while I had quite some trouble just trying to make it compile. I do see that the clhpp package in nixpkgs was updated a few weeks ago, and in general some time has passed, so that might have helped.
As for rocm, did you try to use the "generic" workflow of acpp, or did you instruct it to build for a specific gpu target? For me, only the latter worked when I tried it. The generic workflow always errored out at some point with rocm.

As for the open questions you posed, I'd say updating to ROCm 6 is preferable if possible, as well as creating an alias in pkgs/top-level/aliases.nix that throws with a notice that the package has been renamed.
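A minimal sketch of what such an alias could look like (the exact wording and removal date are illustrative, not necessarily what this PR ends up with):

    # pkgs/top-level/aliases.nix
    opensycl = throw "'opensycl' has been renamed to 'adaptivecpp'"; # Added 2024-12-01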

@blenderfreaky
Author

blenderfreaky commented Dec 4, 2024

False alarm on ROCm 6 building; I forgot nix-build doesn't pull in my config's config.rocmSupport = true; 😅
With basically anything I do, I get error: unsupported option '-fzero-call-used-regs=used-gpr' for target 'amdgcn-amd-amdhsa'

However, it SHOULD be possible. Building the Arch Linux package in distrobox works just fine, and their build script doesn't do anything special. I think we might just need a slightly newer ROCm for some fixes? They use ROCm 6.2.4 and LLVM 18.1; we're on ROCm 6.0.2.

@blenderfreaky
Author

As it turns out, the issue was caused by Nix's default hardening options (specifically zerocallusedregs).
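For reference, a minimal sketch of disabling just that flag on the derivation (assuming zerocallusedregs is the only offending hardening option; NIX_HARDENING_ENABLE = "" would disable all of them instead):

    stdenv.mkDerivation {
      pname = "adaptivecpp";
      # ...
      # Drop only the hardening flag that the amdgcn device compiler rejects:
      hardeningDisable = [ "zerocallusedregs" ];
    }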

  • Updated to use LLVM 17 (latest version that should work with our ROCm (see here), though I can't verify if this plays nice with CUDA and others as I am on AMD)
  • Updated to use ROCm 6
  • Fixed formatting

Adds adaptivecpp, based on opensycl package.
Updated to newest version and to use LLVM 17 and ROCm 6.
@blenderfreaky
Author

I've added the test suite, but I'm not sure it actually runs on the GPU. It compiles and runs without complaints, but it only seems to use the CPU (nothing shows up in nvtop while btop spikes).

Looking at their CI, they seem to specify a specific target architecture. It's not the most elegant, but I've passed it through like this, though it doesn't actually seem to make any difference in what runs:

nix-build -A adaptivecpp.tests --arg config '{ rocmSupport = true; }' --arg targetsBuild '"omp;hip:gfx1030"' --arg targetsRun '"omp;hip"'
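Roughly, the idea is that the tests expression accepts those arguments and forwards the build targets to CMake; a minimal, illustrative sketch (only -DACPP_TARGETS is taken from the actual test setup, the rest is an assumption about how the arguments could be threaded through):

    { stdenv, cmake, adaptivecpp, targetsBuild ? "omp", targetsRun ? "omp" }:
    stdenv.mkDerivation {
      pname = "adaptivecpp-tests";
      # ...
      nativeBuildInputs = [ cmake ];
      buildInputs = [ adaptivecpp ];
      # Compile the test suite for the requested targets:
      cmakeFlags = [ "-DACPP_TARGETS=${targetsBuild}" ];
      # targetsRun would then be consumed by whatever invokes the test binaries,
      # e.g. to restrict which backends are exercised at runtime.
    }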

Could you kindly take a look at this @yboettcher?

@yboettcher
Contributor

I might be wrong, but given that nix builds are usually rather "sealed off", I would not expect a nix-build call to be able to use (or even detect) a GPU.

I cloned your branch, built the adaptivecppWithRocm package (with nix-build) and then used that to try and build the adaptivecpp tests manually (in a clone of the adaptivecpp repo).

  • When building the tests with -DACPP_TARGETS=omp, all is well and it runs on the host CPU.

  • Trying to build with -DACPP_TARGETS=hip:gfx1101, I get error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library. I believe this might be because only one of the resulting binaries is wrapped with --rocm-device-lib-path. In the postFixup script, only syclcc-clang is wrapped with the rocm libs, while (I assume) the build uses the acpp binary, which did not exist back when it was still OpenSYCL. So I assume this could be fixed by adding a wrapProgram for the acpp binary similar to the one already in place for syclcc-clang.
    I could fix this issue and actually run it on my GPU by adding

+ ''
      wrapProgram $out/bin/acpp \
        --prefix PATH : ${
          lib.makeBinPath [
            python3
            lld
          ]
        } \
        --add-flags "-L${llvmPackages.openmp}/lib" \
        --add-flags "-I${llvmPackages.openmp.dev}/include" \
    ''
    + lib.optionalString rocmSupport ''
      --add-flags "--rocm-device-lib-path=${rocmPackages.rocm-device-libs}/amdgcn/bitcode"
    ''

to the postFixup in addition to what's already there.
I also had to set NIX_HARDENING_ENABLE = "", otherwise the '-fzero-call-used-regs=used-gpr' would come back. I also had to pull in boost, llvmPackages_17.openmp and llvmPackages_17.bintools, but that might just be due to me running outside of any special dev environment (I have a feeling, though, that we might want to add the bintools to the path similar to how it's done with python, because I also had to use that when using acpp "standalone" to compile a single file for hip).

  • When building without any ACPP_TARGETS or with -DACPP_TARGETS=generic it compiles, but fails to run with plenty of
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
[AdaptiveCpp Error] accessor [host]: Aborting synchronization, runtime error list is non-empty

messages. Setting the mentioned variables, however, only added some initial output about a few AMD_COMGR_ACTION_... calls that all ended with AMD_COMGR_STATUS_SUCCESS, so I guess that's not helpful.

And I think this is also how far I've gotten in trying to make this work: explicit compilation for a specific target works (when adding that extra wrapProgram section), but the generic target does not (and that's where I just gave up back then :/ #295845 (comment)). Although I think I gave up mainly because I also struggled with OpenCL a lot, which does not seem to be an issue here. If I remember correctly, it got enabled automatically back then because rocm also provided opencl, which is why I tried to make it work.

That said, I think I'd rather have a dysfunctional generic target with an all-around updated compiler where you can at least specify a specific target, than no updated compiler (which does not even have a generic target). I would prefer if we could somehow make this work, but I honestly don't know why it fails. It might even be that the way rocm is installed on nixos is the problem. Or not.
So, I guess what's left here would be adding that second wrapProgram and considering adding the bintools to the path. (I just saw: llvm-objcopy is also provided by llvmPackages_17.libllvm. Maybe that's the more appropriate package to use for this?)

@tdavidcl

tdavidcl commented Dec 7, 2024

That said, I think I'd rather have a dysfunctional generic target with an all around updated compiler where you can at least specify a specific target, than no updated compiler (which also does not even have a generic target).

I second this, having an up-to-date version would be much appreciated even if the generic backend does not yet work.
Otherwise, since other backends are apparently working, could this be merged without SSCP for the time being, until a fix is found?

@illuhad

illuhad commented Dec 8, 2024

Hi - this discussion has made it into the AdaptiveCpp discord ;)

Without having seen the JIT logs it's hard to say why generic fails - does it work with other JIT backends?

If ROCm thinks that it has generated the device binary successfully, then it is possible that there is an internal ROCm issue. IIRC there were also some ROCm versions where the retrieval of the ROCm log was prevented by a bug (in case of an error, ROCm would return before filling the error log). This might play a role here.
Otherwise it might also be worthwhile to look at the ACPP_DEBUG_LEVEL=3 output - in particular, to check whether it finds and links the ROCm bitcode libraries.
Recent AdaptiveCpp versions (current develop branch) support the ACPP_S2_DUMP_IR_FINAL=1 environment variable, which causes the final IR to be dumped to a file before handing it off to ROCm. This could be helpful for validation.
(See https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/env_variables.md#environment-variables-to-control-dumping-ir-during-jit-compilation)

I don't want to get involved in your internal prioritization, but perhaps it is helpful for your decision process to outline how we see these compilation flows from the perspective of the AdaptiveCpp project.

  • generic is where we've strongly focused all optimization and compiler functionality work for the last years.
  • generic supports many features that hip and cuda do not (and cannot) support. The performance is also better.
  • I see hip, cuda and omp as niche compilation flows that are primarily helpful for specific use cases (in particular certain types of interop), not for general purpose.
  • hip/cuda/omp will be maintained, but it is very unlikely that they will see new features being added.
  • generic and hip/cuda/omp are very, very different. From the compiler design, to the involved LLVM transformations and even the frontend. So generic running on e.g. sm_86 is not comparable or related in any way to targeting cuda:sm_86. They are entirely independent compilers.

@blenderfreaky
Author

blenderfreaky commented Dec 8, 2024

Looking at the output more closely, I'm not convinced the executables should be wrapped. I've built it without any wrapping, and /nix/store/HASH-adaptivecpp-24.06.0/etc/AdaptiveCpp/acpp-rocm.json contains all the paths, which should be used as the default arguments. (Also note: acpp, syclcc and syclcc-clang all seem to be the same file; the code just branches depending on the filename.)

I got different results for the tests though (both with and without wrapping):
cmake .. -DAdaptiveCpp_DIR=/nix/store/HASH-adaptivecpp-24.06.0/lib/cmake/AdaptiveCpp -DBOOST_ROOT=/nix/store/HASH-boost-1.81.0-dev/lib/cmake/ -DACPP_TARGETS=generic compiles and sycl_tests runs fine, just like without ACPP_TARGETS.
However, none of them run on my GPU (RX 6800); I get:
Default-selected queue runs on device: hipSYCL OpenMP host device

@tdavidcl

tdavidcl commented Dec 8, 2024

I just tested the generic backend on Nvidia hardware; it seems to be working correctly.

@yboettcher
Contributor

yboettcher commented Dec 9, 2024

@illuhad, thank you very much for your insights!
I guess it works fine with other backends. I compiled a simple test file with generic and at runtime told AdaptiveCpp to use the omp backend using the ACPP_VISIBILITY_MASK variable, which ran fine (just to double-check, I also used the variable to enforce the hip backend, which led to the error I encountered earlier).

Given that we get the jit::compile: Encountered errors: line, which looks like it originates from here, I would assume that both rocm and AdaptiveCpp know that compilation failed.

ACPP_DEBUG_LEVEL=3 does give me quite a lot of logging, but (at least to me) it did not contain anything suspicious.

Given that generic appears to be the most important target for AdaptiveCpp, I would like to make this work, if it is possible from within this build script. However, given that the call to hiprtc appears to fail, it might also be "something" with nixos rocm packaging, which I guess is out of scope here.

I added some more details (including the dumped IR) here.

@blenderfreaky
Interesting find with the wrapping. I also just removed all wrapping and it did "just work". Well, it still didn't run for generic on hip, but it apparently also did not break anything else. Aside from compiling with --acpp_targets="hip" which now complains about being unable to find the rocm device library. But that can be dealt with by the user by supplying a suitable path.

I noticed that when I run a binary that was compiled with generic with the ACPP_DEBUG_LEVEL=3 variable set, it tells me about the various backends AdaptiveCpp discovers. Does rocm show up for you there?

Log excerpt when executing
[AdaptiveCpp Info] backend_loader: Searching path for backend libs: '"/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/../lib/hipSYCL"'
[AdaptiveCpp Info] backend_loader: Successfully opened plugin: "/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/../lib/hipSYCL/librt-backend-hip.so" for backend 'hip'
[AdaptiveCpp Info] backend_loader: Successfully opened plugin: "/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/../lib/hipSYCL/librt-backend-omp.so" for backend 'omp'
[AdaptiveCpp Info] Registering backend: 'hip'...
[AdaptiveCpp Info] Registering backend: 'omp'...
[AdaptiveCpp Info] Discovered devices from backend 'HIP': 
[AdaptiveCpp Info]   device 0: 
[AdaptiveCpp Info]     vendor: AMD
[AdaptiveCpp Info]     name: AMD Radeon RX 7800 XT
[AdaptiveCpp Info] Discovered devices from backend 'OpenMP': 
[AdaptiveCpp Info]   device 0: 
[AdaptiveCpp Info]     vendor: the hipSYCL project
[AdaptiveCpp Info]     name: hipSYCL OpenMP host device

@illuhad

illuhad commented Dec 9, 2024

@yboettcher thanks for the details. I assumed that since

Setting the mentioned variables however, only added some initial output about a few AMD_COMGR_ACTION_... calls that all ended with AMD_COMGR_STATUS_SUCCESS, so I guess that's not helpful.

there was an issue retrieving the logs. Anyway, from your ACPP_DEBUG_LEVEL=3 output I noticed that there is no sign of it linking any bitcode libraries into the kernel. You would expect to see something like

LLVMToAmdgpu: Linking with bitcode file: ....

for each of the relevant ROCm bitcode libraries like ockl.bc, ocml.bc, oclc_*.bc etc.
Probably - for some reason - those bitcode libraries don't get linked, which then results in any function calls that AdaptiveCpp makes into these bitcode libraries remaining unresolved. The result is then a JIT failure.

From your IR output, we can see that it tries to call __ockl_get_group_id() and __ockl_get_local_id() from ROCm ockl.bc, so it clearly needs these libraries for this kernel. Apart from that, I'm not seeing anything interesting or surprising in the IR - it looks fine.

The way the AdaptiveCpp amdgpu backend discovers these bitcode libraries is at the moment a bit hacky (for various reasons): it tries to invoke hipcc to figure out which bitcode libraries hipcc links, and then pulls in the same ones. My guess is that something goes wrong there.
If you want to debug this, this is where I'd look and put in a couple of printf statements: https://github.com/AdaptiveCpp/AdaptiveCpp/blob/2d47fb0d533db54dbba3bf803f3997ac95929241/src/compiler/llvm-to-backend/amdgpu/LLVMToAmdgpu.cpp#L126

I noticed that when I run a binary that was compiled with generic with the ACPP_DEBUG_LEVEL=3 variable set, it tells me about the various backends AdaptiveCpp discovers. Does rocm show up for you there?

You can also try acpp-info -l, which prints the same information but saves you from having to sift through pages of debug output ;)
But yes, I agree that when it runs unexpectedly on the host it's much more likely to be an issue with the runtime not seeing the backend/device rather than a compiler issue.

@yboettcher
Contributor

I am guessing here, but given that it works on omp and nvidia, I would assume that the problem is not with this adaptivecpp derivation, but maybe somewhere in how rocm is working on nixos. But that's just a guess.

In any case, unless we can find some "simple fix" by just supplying adaptivecpp with some parameter or including another buildInput or something like that, I would say that fixing this issue is out of scope for this PR, and that we can merge. CPU and NVIDIA users gain the generic backend, and AMD users would have to use --acpp_targets=hip:gfx<version> and supply a --rocm-device-lib-path=..., which should result in the same compilation as before.
I would be curious, though, whether the wrapProgram can be entirely removed. @tdavidcl, could you try removing the entire postFixup section and running again on the generic/nvidia backend? If that works, I guess the wrapProgram is not needed anymore.

@blenderfreaky
Author

It turns out I linked to the wrong path in the nix store. I now also get the AMD_COMGR errors, and *_hip and *_omp both show up.

Looking into the code @illuhad linked, acpp should try to run hipcc -x ir --cuda-device-only -O3 -nogpuinc /dev/null --hip-link -### --cuda-gpu-arch=<YOUR GFX ARCH>.
Running the command in a shell with clr and rocm-device-libs fails with

clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library

Passing --rocm-device-lib-path makes it run, however. But acpp does not pass this as long as it finds hipcc; it only does so as a fallback if it doesn't find hipcc (finding clang instead, for example). Forcing this fallback to always run via a patch did not solve the problem, however, as it tried looking in ${rocmPackages.clr}/amdgcn/bitcode rather than ${rocmPackages.rocm-device-libs}/amdgcn/bitcode.

I tried re-adding the wrapper; however, that didn't help at all. @yboettcher, how exactly did you pass the arguments for it to see ROCm?

Revisiting the acpp-rocm.json file mentioned in my last comment, it seems to pass --rocm-device-lib-path; however, it points to the (incorrect) link into rocmPackages.clr. Simply forcefully replacing this file in the fixup phase did not fix it, however.

One thing I noticed is that --rocm-device-lib-path and --rocm-device-libs-path (note the extra s) both appear in the AdaptiveCpp source. However, the auto-generated JSON uses ..-lib-.., whereas it reads in ..-libs-.., which seems like it might be a bug?

Nonetheless, peppering in some debug statements, it doesn't seem like it ever actually receives those arguments from the JSON files, so maybe this is a dead-end path anyway? Even though acpp --help does show the correct arguments as defaults, e.g.:

--acpp-rocm-cxx-flags=<value>
  [can also be set with environment variable: ACPP_ROCM_CXX_FLAGS=<value>]
  [default value provided by field 'default-rocm-cxx-flags' in JSON files from directories: ['/nix/store/yli7gzgjg18q83mahfinmlcj3nj1kqr9-adaptivecpp-24.06.0/etc/AdaptiveCpp'].]
  [current value: -isystem /nix/store/yli7gzgjg18q83mahfinmlcj3nj1kqr9-adaptivecpp-24.06.0/include/AdaptiveCpp/hipSYCL/std/hiplike -isystem /nix/store/nha35i03qnyfvdpb14glfxw8mrv2fww4-clang-17.0.6-dev/include -U__FLOAT128__ -U__SIZEOF_FLOAT128__ -I/nix/store/24zx3a3b1vcyb6a76px1f9vbbcjvq40z-clr-6.0.2/include --rocm-device-libs-path=/nix/store/8miza8sk0ky5n3s98rzfwsq2vssf94bh-rocm-device-libs-6.0.2/amdgcn/bitcode --rocm-path=/nix/store/24zx3a3b1vcyb6a76px1f9vbbcjvq40z-clr-6.0.2 -fhip-new-launch-api -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -D__HIP_ROCclr__]
 The arguments passed to the compiler to compile for the ROCm backend

If you want to try the patch stuff I did, I've put it in a separate branch for now to keep this one somewhat mergeable. See blenderfreaky:adaptivecpp-amd

@illuhad

illuhad commented Dec 9, 2024

One thing I noticed is that --rocm-device-lib-path and --rocm-device-libs-path (note the extra s) both appear in AdaptiveCpp source. However, the auto-generated json uses ..-lib-.., where as it reads in ..-libs-.., which seems like it might be a bug?

Yep, I also think that one of them is likely incorrect.

Nonetheless, peppering in some debug statements, it doesn't seem like it actually ever receives those arguments from the json files, so maybe this is a dead-end path anyways? Even though acpp --help does show the correct arguments as default, e.g.:

rocm-cxx-flags only affects the ahead-of-time compilation flow (--acpp-targets=hip). The kinds of flags that need to be taken into account there are entirely different for generic, so it neither needs nor uses them. So I don't think this will play a role for the JIT of the generic target.

The fallback paths are not well tested. I think the main assumption is that we require a working ROCm installation, and a working ROCm installation will have a functioning hipcc that users can just invoke without having to configure paths first.

Is it possible to fix hipcc? IIRC it might be possible to point AdaptiveCpp to a hipcc wrapper of your choice using -DHIPCC_COMPILER. This might be the easiest solution here.
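In the Nix derivation that could hypothetically look something like the following (hipccWrapped is a made-up name for a hipcc that already knows about the device libs, and -DHIPCC_COMPILER is only assumed to be the right variable, per the comment above):

    cmakeFlags = lib.optionals rocmSupport [
      "-DHIPCC_COMPILER=${hipccWrapped}/bin/hipcc"
    ];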

@blenderfreaky
Author

I don't think hipcc is broken per se, it's just that NixOS separates the device libs and hipcc into two separate packages.

Afaict, all that's needed is to somehow pass --rocm-device-lib-path through to the JIT. Looking through the code it's not obvious to me how to do that.

@illuhad

illuhad commented Dec 9, 2024

I don't think hipcc is broken per se, it's just that NixOS separates the device libs and hipcc into two separate packages.

What is the workflow for users? Can they just invoke hipcc if both packages are installed without having to pass extra flags to configure it?
If yes, then it should work with AdaptiveCpp. If no, then probably every Makefile using hipcc will be broken too.

As I said, the fallback path is not well-tested. It would be more predictable to get it to work with hipcc, where we deliberately do not pass in such flags:
https://github.com/AdaptiveCpp/AdaptiveCpp/blob/2d47fb0d533db54dbba3bf803f3997ac95929241/src/compiler/llvm-to-backend/amdgpu/LLVMToAmdgpu.cpp#L162

The fallback path seems to assume a standard ROCm directory layout where the bitcode libraries live in $ROCM_PATH/amdgcn/bitcode:
https://github.com/AdaptiveCpp/AdaptiveCpp/blob/2d47fb0d533db54dbba3bf803f3997ac95929241/src/compiler/llvm-to-backend/amdgpu/LLVMToAmdgpu.cpp#L214C54-L214C62
So there is no way to configure this in the JIT compiler. As I said, the assumption is that hipcc is already correctly configured and works out of the box.

@blenderfreaky
Author

Scanning through some other nix builds, it looks like they pass CXX_FLAGS="--rocm..." explicitly. Not sure why it's designed like that.

I've now created a merged derivation with a wrapped hipcc (a sketch of the idea is at the end of this comment). It looks like it now finds the bitcode; however, it still fails with the same error overall:

[AdaptiveCpp Info] LLVMToAmdgpu: Invoking hipRTC...
[AdaptiveCpp Info] LLVMToAmdgpu: Invoking /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/bin/hipcc to determine ROCm device library list
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/hip.bc
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/ocml.bc
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/ockl.bc
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/oclc_daz_opt_off.bc
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/oclc_unsafe_math_off.bc
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/oclc_finite_only_off.bc
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/oclc_correctly_rounded_sqrt_on.bc
[AdaptiveCpp Info] LLVMToAmdgpu: Linking with bitcode file: /nix/store/r7s8dzwam4da8aajdq2n6njw2rz1pvzp-rocm-merged/amdgcn/bitcode/oclc_wavefrontsize64_off.bc
[AdaptiveCpp Info] kernel_cache: Cache MISS for id 15056483203583061310.15627191518677280057
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed

Otherwise we could just brute-force it by patching the path into the code, though that just seems dirty.
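For reference, a minimal sketch of that merged-ROCm idea using symlinkJoin (not necessarily what the branch actually does; the package names are only those already mentioned in this thread):

    let
      rocm-merged = symlinkJoin {
        name = "rocm-merged";
        # Put the HIP runtime/compiler and the device bitcode libraries into one
        # prefix, so hipcc can find amdgcn/bitcode relative to its own install.
        paths = with rocmPackages; [ clr rocm-device-libs ];
      };
    in
    rocm-merged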

@illuhad

illuhad commented Dec 9, 2024

Okay, bitcode linking looks better now! But I do find it curious that no oclc_isa_*.bc and oclc_abi_version_*.bc are linked. Without them, it's unlikely that the bitcode libraries are configured correctly, although that might not be the cause of the JIT error that you are seeing.
Does it report what it is still unhappy with if you set the environment variables that it mentions?

@blenderfreaky
Author

Oddly enough, this doesn't seem to be particularly consistent. Running the simple example from the gist above, for example, Linking: oclc_abi_version_500_lib.bc shows up in the COMGR debug but not in the acpp debug, and *_isa_* doesn't show up at all. In the test suite, both show up in both; however, the COMGR debug only shows linking once (I assume that's intended behaviour). It talks about linking oclc_abi_version_500_lib.bc, but the acpp debug talks about version 400.

Afaict, though, there's no obvious error message.

Here are the simple example's logs for reference.

Ran with AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 ACPP_DEBUG_LEVEL=3

[AdaptiveCpp Info] kernel_cache: Registering kernel ZZ11double_testvENK3$_0clERN7hipsycl4sycl7handlerEEUlNS1_2idILi1EEEE_
[AdaptiveCpp Info] hcf_cache: Registering HCF object 15335113193185923413...
[AdaptiveCpp Info] hcf_cache: Registering kernel info for kernel _Z18__acpp_sscp_kernelIN7hipsycl4glue15__sscp_dispatch18basic_parallel_forIZZ11double_testvENK3$_0clERNS0_4sycl7handlerEEUlNS5_2idILi1EEEE_Li1EEEEvRKT_ from HCF object 15335113193185923413
[AdaptiveCpp Info]   kernel_info: hcf object id = 15335113193185923413
[AdaptiveCpp Info]   kernel_info: parameter 0: offset = 0 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 1: offset = 8 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 2: offset = 16 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 3: offset = 24 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 4: offset = 32 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 5: offset = 40 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 6: offset = 48 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 7: offset = 56 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 8: offset = 64 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 9: offset = 72 size = 8 original index = 0
[AdaptiveCpp Info] hcf_cache: Registering kernel info for kernel _Z18__acpp_sscp_kernelIN7hipsycl4glue15__sscp_dispatch25basic_parallel_for_offsetIZZ11double_testvENK3$_0clERNS0_4sycl7handlerEEUlNS5_2idILi1EEEE_Li1EEEEvRKT_ from HCF object 15335113193185923413
[AdaptiveCpp Info]   kernel_info: hcf object id = 15335113193185923413
[AdaptiveCpp Info]   kernel_info: parameter 0: offset = 0 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 1: offset = 8 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 2: offset = 16 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 3: offset = 24 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 4: offset = 32 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 5: offset = 40 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 6: offset = 48 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 7: offset = 56 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 8: offset = 64 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 9: offset = 72 size = 8 original index = 0
[AdaptiveCpp Info]   kernel_info: parameter 10: offset = 80 size = 8 original index = 0
[AdaptiveCpp Info] hcf_cache: Registering image info for image llvm-ir.global from HCF object 15335113193185923413
[AdaptiveCpp Info] backend_loader: Searching path for backend libs: '"/nix/store/h7pqp5h2wyc9giaxa43wvj3x8q99xzh4-adaptivecpp-24.06.0/bin/../lib/hipSYCL"'
[AdaptiveCpp Info] backend_loader: Successfully opened plugin: "/nix/store/h7pqp5h2wyc9giaxa43wvj3x8q99xzh4-adaptivecpp-24.06.0/bin/../lib/hipSYCL/librt-backend-hip.so" for backend 'hip'
[AdaptiveCpp Info] backend_loader: Successfully opened plugin: "/nix/store/h7pqp5h2wyc9giaxa43wvj3x8q99xzh4-adaptivecpp-24.06.0/bin/../lib/hipSYCL/librt-backend-omp.so" for backend 'omp'
[AdaptiveCpp Info] Registering backend: 'hip'...
[AdaptiveCpp Info] Registering backend: 'omp'...
[AdaptiveCpp Info] Discovered devices from backend 'HIP': 
[AdaptiveCpp Info]   device 0: 
[AdaptiveCpp Info]     vendor: AMD
[AdaptiveCpp Info]     name: AMD Radeon RX 6800
[AdaptiveCpp Info] Discovered devices from backend 'OpenMP': 
[AdaptiveCpp Info]   device 0: 
[AdaptiveCpp Info]     vendor: the hipSYCL project
[AdaptiveCpp Info]     name: hipSYCL OpenMP host device
[AdaptiveCpp Info] dag_manager: DAG manager is alive!
[AdaptiveCpp Info] runtime: ******* rt launch initiated ********
[AdaptiveCpp Info] queue: Constructed queue with node group id 1
Running on AMD Radeon RX 6800
[AdaptiveCpp Info] data_region: constructed with page table dimensions 1 1 1
[AdaptiveCpp Info] data_region: constructed with page table dimensions 1 1 1
[AdaptiveCpp Info] data_region: constructed with page table dimensions 1 1 1
[AdaptiveCpp Info] dag_manager: Checking DAG flush opportunity...
[AdaptiveCpp Info] buffer_impl::~buffer_impl: Preparing submission of writeback...
[AdaptiveCpp Info] dag_manager: Checking DAG flush opportunity...
[AdaptiveCpp Info] buffer_impl::~buffer_impl: Waiting for operations to complete...
[AdaptiveCpp Info] buffer_impl::~buffer_impl: dag node is registered as user but not marked as submitted, performing emergency DAG flush.
[AdaptiveCpp Info] dag_manager: Submitting asynchronous flush...
[AdaptiveCpp Info] dag_builder: DAG contains operations: 
[AdaptiveCpp Info] 0. kernel: ZZ11double_testvENK3$_0clERN7hipsycl4sycl7handlerEEUlNS1_2idILi1EEEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: RW device {0, 0, 0}+{1, 1, 1024} #8 @node 0x5641bf5c8600
[AdaptiveCpp Info]     --> requires node @0x5641bf4d6240 MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
[AdaptiveCpp Info]     --> requires node @0x5641bf5c8160 MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
[AdaptiveCpp Info]     --> requires node @0x5641bf5c83a0 MEM_REQ: RW device {0, 0, 0}+{1, 1, 1024} #8
[AdaptiveCpp Info] 1. MEM_REQ: R host_buffer {0, 0, 0}+{1, 1, 1024} #8 @node 0x5641bf5c88e0
[AdaptiveCpp Info]     --> requires node @0x5641bf5c8600 kernel: ZZ11double_testvENK3$_0clERN7hipsycl4sycl7handlerEEUlNS1_2idILi1EEEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: RW device {0, 0, 0}+{1, 1, 1024} #8
[AdaptiveCpp Info] 2. MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8 @node 0x5641bf4d6240
[AdaptiveCpp Info] 3. MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8 @node 0x5641bf5c8160
[AdaptiveCpp Info] 4. MEM_REQ: RW device {0, 0, 0}+{1, 1, 1024} #8 @node 0x5641bf5c83a0
[AdaptiveCpp Info] dag_manager: waiting for async worker...
[AdaptiveCpp Info] dag_manager [async]: Flushing!
[AdaptiveCpp Info] dag_manager [async]: Releasing dead users of data region 0x5641bf5c7a60
[AdaptiveCpp Info] dag_manager [async]: Releasing dead users of data region 0x5641bf5c7c70
[AdaptiveCpp Info] dag_manager [async]: Releasing dead users of data region 0x5641bf5c7e80
[AdaptiveCpp Info] dag_manager [async]: Submitting node to scheduler!
[AdaptiveCpp Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8 to 0x7fc21f200000
amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_ADD_PRECOMPILED_HEADERS
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
    Compilation Args:  "-target" "amdgcn-amd-amdhsa" "-mcpu=gfx1030" "-I" "/tmp/comgr-83fc8e/include" "-include-pch" "/tmp/comgr-83fc8e/include/opencl1.2-c.pch" "-Xclang" "-fno-validate-pch" "-x" "cl" "-std=cl1.2" "-cl-no-stdinc" "-c" "-emit-llvm" "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5" "-nogpulib" "/tmp/comgr-83fc8e/input/CompileSource" "-o" "/tmp/comgr-83fc8e/output/CompileSource.bc"
     Driver Job Args: clang "-cc1" "-mcode-object-version=5" "-mllvm" "--amdhsa-code-object-version=5" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm-bc" "-emit-llvm-uselists" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "CompileSource" "-mrelocation-model" "pic" "-pic-level" "2" "-fhalf-no-semantic-interposition" "-mframe-pointer=none" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-fvisibility=hidden" "-fapply-global-visibility-to-externs" "-target-cpu" "gfx1030" "-debugger-tuning=gdb" "-resource-dir" "lib/clang/17.0.0" "-c-isystem" "include/gpu-none-llvm" "-include-pch" "/tmp/comgr-83fc8e/include/opencl1.2-c.pch" "-I" "/tmp/comgr-83fc8e/include" "-D" "__OPENCL_VERSION__=200" "-D" "__IMAGE_SUPPORT__=1" "-O3" "-std=cl1.2" "-fdebug-compilation-dir=/home/blenderfreaky/src/stuff/nixpkgs" "-ferror-limit" "19" "-cl-kernel-arg-info" "-nogpulib" "-fno-threadsafe-statics" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-fno-validate-pch" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-faddrsig" "-o" "/tmp/comgr-83fc8e/output/CompileSource.bc" "-x" "cl" "/tmp/comgr-83fc8e/input/CompileSource"
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_LINK_BC_TO_BC
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
             Linking: LLVM Binary
             Linking: opencl_lib.bc
             Linking: ocml_lib.bc
             Linking: ockl_lib.bc
             Linking: oclc_isa_version_1030.bc
             Linking: oclc_correctly_rounded_sqrt_off_lib.bc
             Linking: oclc_daz_opt_off_lib.bc
             Linking: oclc_finite_only_off_lib.bc
             Linking: oclc_unsafe_math_off_lib.bc
             Linking: oclc_wavefrontsize64_off_lib.bc
             Linking: oclc_abi_version_500_lib.bc
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_CODEGEN_BC_TO_RELOCATABLE
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "-O3" "-cl-kernel-arg-info" "-mllvm" "-amdgpu-internalize-symbols" "-mcode-object-version=5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_NONE
    Compilation Args:  "-target" "amdgcn-amd-amdhsa" "-mcpu=gfx1030" "-c" "-mllvm" "-amdgpu-internalize-symbols" "-O3" "-cl-kernel-arg-info" "-mllvm" "-amdgpu-internalize-symbols" "-mcode-object-version=5" "-nogpulib" "/tmp/comgr-906c83/input/linked.bc" "-o" "/tmp/comgr-906c83/output/linked.bc.o"
     Driver Job Args: clang "-cc1" "-mcode-object-version=5" "-mllvm" "--amdhsa-code-object-version=5" "-triple" "amdgcn-amd-amdhsa" "-emit-obj" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "linked.bc" "-mrelocation-model" "pic" "-pic-level" "2" "-fhalf-no-semantic-interposition" "-mframe-pointer=none" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-fvisibility=hidden" "-fapply-global-visibility-to-externs" "-target-cpu" "gfx1030" "-debugger-tuning=gdb" "-resource-dir" "lib/clang/17.0.0" "-O3" "-fdebug-compilation-dir=/home/blenderfreaky/src/stuff/nixpkgs" "-ferror-limit" "19" "-cl-kernel-arg-info" "-nogpulib" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-mllvm" "-amdgpu-internalize-symbols" "-mllvm" "-amdgpu-internalize-symbols" "-faddrsig" "-o" "/tmp/comgr-906c83/output/linked.bc.o" "-x" "ir" "/tmp/comgr-906c83/input/linked.bc"
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_LINK_RELOCATABLE_TO_EXECUTABLE
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options:
                Path: 
            Language: AMD_COMGR_LANGUAGE_NONE
    Compilation Args:  "-target" "amdgcn-amd-amdhsa" "-mcpu=gfx1030" "/tmp/comgr-39e89c/input/linked.bc.o" "-o" "/tmp/comgr-39e89c/output/a.so"
     Driver Job Args: lld "/tmp/comgr-39e89c/input/linked.bc.o" "--no-undefined" "-shared" "-o" "/tmp/comgr-39e89c/output/a.so"
[AdaptiveCpp Info] multi_queue_executor: Spawned for backend HIP with configuration: 
[AdaptiveCpp Info]   device 0: 
[AdaptiveCpp Info]     memcpy lane: 0
[AdaptiveCpp Info]     memcpy lane: 1
[AdaptiveCpp Info]     kernel lane: 2
[AdaptiveCpp Info]     kernel lane: 3
[AdaptiveCpp Info] multi_queue_executor: Processing node 0x5641bf4d6240 with 0 non-virtual requirement(s) and 0 direct requirement(s).
[AdaptiveCpp Info] multi_queue_executor: Dispatching to lane 0: Memcpy: CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] inorder_executor: Processing node 0x5641bf4d6240 with 0 non-virtual requirement(s) and 0 direct requirement(s).
[AdaptiveCpp Info] inorder_executor: Dispatching to lane 0x7fc2180020f0: Memcpy: CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8 to 0x7fc21f202000
[AdaptiveCpp Info] multi_queue_executor: Processing node 0x5641bf5c8160 with 0 non-virtual requirement(s) and 0 direct requirement(s).
[AdaptiveCpp Info] multi_queue_executor: Dispatching to lane 1: Memcpy: CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] inorder_executor: Processing node 0x5641bf5c8160 with 0 non-virtual requirement(s) and 0 direct requirement(s).
[AdaptiveCpp Info] inorder_executor: Dispatching to lane 0x7fc218425f70: Memcpy: CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: RW device {0, 0, 0}+{1, 1, 1024} #8 to 0x7fc21f204000
[AdaptiveCpp Info] multi_queue_executor: Processing node 0x5641bf5c83a0 with 0 non-virtual requirement(s) and 0 direct requirement(s).
[AdaptiveCpp Info] multi_queue_executor: Dispatching to lane 0: Memcpy: CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] inorder_executor: Processing node 0x5641bf5c83a0 with 0 non-virtual requirement(s) and 0 direct requirement(s).
[AdaptiveCpp Info] inorder_executor: Dispatching to lane 0x7fc2180020f0: Memcpy: CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] multi_queue_executor: Processing node 0x5641bf5c8600 with 3 non-virtual requirement(s) and 3 direct requirement(s).
[AdaptiveCpp Info] multi_queue_executor: Dispatching to lane 2: kernel: ZZ11double_testvENK3$_0clERN7hipsycl4sycl7handlerEEUlNS1_2idILi1EEEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: RW device {0, 0, 0}+{1, 1, 1024} #8
[AdaptiveCpp Info] inorder_executor: Processing node 0x5641bf5c8600 with 3 non-virtual requirement(s) and 3 direct requirement(s).
[AdaptiveCpp Info]  --> Synchronizes with other queue for node: 0x5641bf4d6240
[AdaptiveCpp Info]   --> (Skipping unnecessary synchronization; another requirement follows in the same inorder queue)
[AdaptiveCpp Info]  --> Synchronizes with other queue for node: 0x5641bf5c8160
[AdaptiveCpp Info]  --> Synchronizes with other queue for node: 0x5641bf5c83a0
[AdaptiveCpp Info] inorder_executor: Dispatching to lane 0x7fc21869eb40: kernel: ZZ11double_testvENK3$_0clERN7hipsycl4sycl7handlerEEUlNS1_2idILi1EEEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: R device {0, 0, 0}+{1, 1, 1024} #8
   MEM_REQ: RW device {0, 0, 0}+{1, 1, 1024} #8
[AdaptiveCpp Info] buffer_memory_requirement: Attempting to initialize embedded pointers for requirement 0x5641bf5c8010
[AdaptiveCpp Info] Identified embedded pointer with uid 13635687135095732756-78982385135476617 in kernel blob, setting to 0x7fc21f200000
[AdaptiveCpp Info] buffer_memory_requirement: Attempting to initialize embedded pointers for requirement 0x5641bf5c80b0
[AdaptiveCpp Info] Identified embedded pointer with uid 15797494121560342401-439162604930412937 in kernel blob, setting to 0x7fc21f202000
[AdaptiveCpp Info] buffer_memory_requirement: Attempting to initialize embedded pointers for requirement 0x5641bf5c82f0
[AdaptiveCpp Info] Identified embedded pointer with uid 8879812261484428978-11824364715927372937 in kernel blob, setting to 0x7fc21f204000
[AdaptiveCpp Info] LLVMToBackend: Using build option: amdgpu-target-device=gfx1030
[AdaptiveCpp Info] LLVMToBackend: Using build option: known-group-size-x=128
[AdaptiveCpp Info] LLVMToBackend: Using build option: known-group-size-y=1
[AdaptiveCpp Info] LLVMToBackend: Using build option: known-group-size-z=1
[AdaptiveCpp Info] LLVMToBackend: Using build option: known-local-mem-size=0
[AdaptiveCpp Info] LLVMToBackend: Using build flag: global-sizes-fit-in-int
[AdaptiveCpp Info] LLVMToBackend: Preparing backend flavoring...
[AdaptiveCpp Info] LLVMToBackend: Attempting to link against 0 external bitcode modules to resolve 0 symbols
[AdaptiveCpp Info] LLVMToBackend: Applying specializations and S2 IR constants...
[AdaptiveCpp Info] LLVMToBackend: Processing specialization 26__acpp_sscp_s2_ir_constantIL_ZN7hipsycl4sycl3jitL15current_backendEEiE
[AdaptiveCpp Info] LLVMToBackend: Optimizing branches post S2 IR constant application...
[AdaptiveCpp Info] LLVMToBackend: Reoutlining kernels...
[AdaptiveCpp Info] LLVMToBackend: Adding backend-specific flavor to IR...
[AdaptiveCpp Info] LLVMToAmdgpu: Setting up kernel _Z18__acpp_sscp_kernelIN7hipsycl4glue15__sscp_dispatch18basic_parallel_forIZZ11double_testvENK3$_0clERNS0_4sycl7handlerEEUlNS5_2idILi1EEEE_Li1EEEEvRKT_
[AdaptiveCpp Info] LLVMToBackend: Linking with bitcode file: /nix/store/h7pqp5h2wyc9giaxa43wvj3x8q99xzh4-adaptivecpp-24.06.0/bin/../lib/hipSYCL/bitcode/libkernel-sscp-amdgpu-amdhsa-full.bc
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Info] AddressSpaceInferencePass: Found alloca in address space 0 when it should be in AS 5, fixing.
[AdaptiveCpp Inf[AdaptiveCpp Info] kernel_cache: Cache MISS for id 8862471291175581522.12052157848132275340
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
[AdaptiveCpp Info] dag_manager [async]: Submitting node to scheduler!
[AdaptiveCpp Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R host_buffer {0, 0, 0}+{1, 1, 1024} #8 to 0x5641bf3fe4d0
[AdaptiveCpp Info] multi_queue_executor: Processing node 0x5641bf5c88e0 with 0 non-virtual requirement(s) and 1 direct requirement(s).
[AdaptiveCpp Info] multi_queue_executor: Dispatching to lane 1: Memcpy: ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] inorder_executor: Processing node 0x5641bf5c88e0 with 0 non-virtual requirement(s) and 1 direct requirement(s).
[AdaptiveCpp Info] inorder_executor: Dispatching to lane 0x7fc218425f70: Memcpy: ROCm-Device0 #8 {0, 0, 0}+{1, 1, 1024}-->CPU-Device0 #8 {0, 0, 0}+{1, 1, 1024}{1, 1, 1024}
[AdaptiveCpp Info] dag_manager [async]: DAG flush complete.
[AdaptiveCpp Info] buffer_impl::~buffer_impl: Preparing submission of writeback...
[AdaptiveCpp Info] dag_manager: Checking DAG flush opportunity...
[AdaptiveCpp Info] buffer_impl::~buffer_impl: Waiting for operations to complete...
[AdaptiveCpp Info] buffer_impl::~buffer_impl: dag node is registered as user but not marked as submitted, performing emergency DAG flush.
[AdaptiveCpp Info] dag_manager: Submitting asynchronous flush...
[AdaptiveCpp Info] dag_builder: DAG contains operations: 
[AdaptiveCpp Info] 0. MEM_REQ: R host_buffer {0, 0, 0}+{1, 1, 1024} #8 @node 0x5641bf4116e0
[AdaptiveCpp Info] dag_manager: waiting for async worker...
[AdaptiveCpp Info] dag_manager [async]: Flushing!
[AdaptiveCpp Info] dag_manager [async]: Submitting node to scheduler!
[AdaptiveCpp Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R host_buffer {0, 0, 0}+{1, 1, 1024} #8 to 0x5641bf412cf0
[AdaptiveCpp Info] dag_manager [async]: DAG flush complete.
[AdaptiveCpp Info] buffer_impl::~buffer_impl: Preparing submission of writeback...
[AdaptiveCpp Info] data_region::~data_region: Freeing allocation 0x7fc21f202000
[AdaptiveCpp Info] dag_manager: Checking DAG flush opportunity...
[AdaptiveCpp Info] buffer_impl::~buffer_impl: Waiting for operations to complete...
[AdaptiveCpp Info] buffer_impl::~buffer_impl: dag node is registered as user but not marked as submitted, performing emergency DAG flush.
[AdaptiveCpp Info] dag_manager: Submitting asynchronous flush...
[AdaptiveCpp Info] dag_builder: DAG contains operations: 
[AdaptiveCpp Info] 0. MEM_REQ: R host_buffer {0, 0, 0}+{1, 1, 1024} #8 @node 0x5641bf411920
[AdaptiveCpp Info] dag_manager: waiting for async worker...
[AdaptiveCpp Info] data_region::~data_region: Freeing allocation 0x7fc21f204000
[AdaptiveCpp Info] dag_manager [async]: Flushing!
[AdaptiveCpp Info] dag_manager [async]: Submitting node to scheduler!
[AdaptiveCpp Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R host_buffer {0, 0, 0}+{1, 1, 1024} #8 to 0x5641bf401500
[AdaptiveCpp Info] dag_manager [async]: DAG flush complete.
0
============== hipSYCL error report ============== 
hipSYCL has caught the following unhandled asynchronous errors: 

   0. from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

   1. from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
The application will now be terminated.

@illuhad

illuhad commented Dec 9, 2024

Running the simple example from the gist above, for example, Linking: oclc_abi_version_500_lib.bc shows up in the COMGR debug but not in the acpp debug, and *_isa_* doesn't show up at all.

Strange. But I see *_isa_* in your comgr logs (and I'm not seeing any linking output from AdaptiveCpp?).

It talks about linking oclc_abi_version_500_lib.bc, but the acpp debug talks about version 400.

A mismatch of oclc ABI bitcode libraries should not happen and can lead to incorrect execution of kernels, but IMO should not lead to a JIT failure. I'm not sure what's going on. AMD_COMGR_SAVE_TEMPS=1 should cause the comgr temporary files to be kept (perhaps somewhere in /tmp). If we can find the final generated bitcode there, that could help us find out more.

It's also possible that hipRTC has a problem at the linking stage, e.g. if there's an issue with lld. But that would be an issue within ROCm. You could check with a basic example whether hipRTC works, e.g. https://github.com/ROCm/rocm-examples/tree/develop/HIP-Basic/runtime_compilation/

@tdavidcl

@tdavidcl, could you try to remove the entire postFixup section and run again on the generic/nvidia backend? If that works, I guess the wrapProgram is not needed anymore.

Yeah, it works with or without the postFixup for the generic backend on Nvidia hardware, since no flags affect the behavior on nvidia.

@blenderfreaky
Author

Removed the wrapper. I say we merge this for now and move over to a new issue/PR for the AMD stuff.

There seemed to be some issues with the builds on ofBorg; unsure whether those are relevant, though.

@blenderfreaky
Author

@illuhad somehow, the example builds and runs fine if I use the provided Makefile, but I haven't been able to even build it using CMake. I think that's a problem with my usage of CMake though.

I don't have much time this week for debugging but I've uploaded the comgr logs & bitcode in case you want to take a look.

@yboettcher
Copy link
Contributor

somehow, the example builds and runs fine if I use the provided Makefile, but I haven't been able to even build it using CMake

Same for me. I also tried to make a wrapped hipcc with the device libs and tried to compile the rocm example @illuhad provided, but I just couldn't get any of it to work with CMake (with or without the wrapped hipcc). I initially only tried the CMake version, where it looks like CMake failed on enable_language(HIP). With the Makefile, however, there was no problem.

There seemed to be some issues with the builds on ofBorg, unsure if those are relevant though.

I don't know how this is set up, but if nixpkgs is set up in a way that ofborg failures prevent a merge, then I guess they're relevant. I once had issues where adaptivecpp tried to compile the opencl backend (because rocm also provides opencl), but that backend requires network access during the build, which failed, so I had to manually disable the opencl backend (roughly as sketched below). I haven't encountered this issue in your branch, though.
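For reference, this is roughly what I mean by disabling it, written as an overlay sketch. I'm assuming the upstream CMake switch is called WITH_OPENCL_BACKEND; the exact flag and how this PR wires it up may differ.

# Overlay sketch: force-disable the OpenCL backend so configure does not
# try to reach the network inside the sandbox.
# Assumes AdaptiveCpp exposes a WITH_OPENCL_BACKEND CMake option.
final: prev: {
  adaptivecpp = prev.adaptivecpp.overrideAttrs (old: {
    cmakeFlags = (old.cmakeFlags or [ ]) ++ [ "-DWITH_OPENCL_BACKEND=OFF" ];
  });
}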

@blenderfreaky
Copy link
Author

Gonna have to wait for ofBorg to run through, I guess.

Do you by chance have the /opt/rocm workaround/hack from the wiki configured? I do, and after removing it, the Makefile fails too. But unlike with CMake, just adding ROCM_INSTALL_DIR := /nix/store/...-clr-6.0.2/ instantly fixes it. No merged rocm, wrapper, etc. required.
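In case anyone wants to reproduce this, the shell I used looks roughly like the sketch below; the exact rocmPackages attributes (and whether the example also needs a compiler wrapper like hipcc in the shell) may need adjusting.

# shell.nix sketch: gives the example Makefile a ROCm root via
# ROCM_INSTALL_DIR instead of relying on the /opt/rocm hack.
{ pkgs ? import <nixpkgs> { } }:
pkgs.mkShell {
  packages = [ pkgs.gnumake pkgs.rocmPackages.clr ];
  # Exported into the shell; pass it to make, e.g.
  #   make ROCM_INSTALL_DIR="$ROCM_INSTALL_DIR"
  # or paste the store path into the Makefile directly as above.
  ROCM_INSTALL_DIR = "${pkgs.rocmPackages.clr}";
}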

@yboettcher
Copy link
Contributor

I actually had a /opt/rocm thing set up, but apparently I was running an out-of-date version of that workaround from the old(?) wiki (https://nixos.wiki/wiki/Amd_Gpu), which used /opt/rocm/hip. I did have some issues with that when I played around with CMake (it complained about things not existing in /opt/rocm/lib), so I manually added an /opt/rocm symlink to clr (outside of Nix, just a plain old ln -s). That did not help with CMake, but it's probably why the Makefile worked.
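For completeness, the declarative version of that symlink would look something like the sketch below, along the lines of the current wiki workaround; the set of packages in the symlinkJoin is just whatever your tools expect to find under /opt/rocm.

# NixOS module sketch: create /opt/rocm declaratively instead of a manual ln -s.
{ pkgs, ... }:
{
  systemd.tmpfiles.rules =
    let
      rocmEnv = pkgs.symlinkJoin {
        name = "rocm-combined";
        paths = with pkgs.rocmPackages; [ clr rocblas hipblas ];
      };
    in
    [ "L+    /opt/rocm    -    -    -    -    ${rocmEnv}" ];
}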

@illuhad
Copy link

illuhad commented Dec 11, 2024

I don't have much time this week for debugging but I've uploaded the comgr logs & bitcode in case you want to take a look.

@blenderfreaky Thanks, I don't know why we are seeing a JIT failure, but the bitcode does not actually seem to contain the kernel from your source file (only some AMD builtin kernels). So something is definitely off, although I cannot say what exactly.

EDIT: It seems that __ockl_image_load_2Da is not resolved, which likely causes the JIT failure. Apparently a lot of image-support bitcode gets linked into the binary through the ROCm bitcode libraries, even though we're not using any of it in our example. This is what everything linked together looks like: https://godbolt.org/z/rqs78neqd

@ofborg ofborg bot added the ofborg-internal-error (Ofborg encountered an error) label Dec 12, 2024
@blenderfreaky
Copy link
Author

@illuhad Interesting that it says amdhsa.target: amdgcn-amd-amdhsa--gfx700, considering I'm on gfx1030. Is that normal?

Running the program with strace, two things stood out to me:

access("/dev/dri/card128", F_OK)        = -1 ENOENT (No such file or directory)
access("/dev/dri/renderD128", F_OK)     = 0

Note that /dev/dri/card1 does exist. Not sure if this is relevant, but it seemed odd.

openat(AT_FDCWD, "/nix/store/nj7c3fsf4a65c3ndfxn8253q42f4jzk7-libdrm-2.4.123/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/vxgysj55gc9yq7g1wlb0vjjgd74hk84j-rocm-llvm-libunwind-6.0.2/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/y1563grxzk23mapa57a6qzsjaqyvcw76-elfutils-0.191/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/1hyxvd3zv5gyl9836y6895i567x04bqj-numactl-2.0.18/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v3/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v3/", 0x7ffc483a2450, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v2/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v2/", 0x7ffc483a2450, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/", {st_mode=S_IFDIR|0555, st_size=48, ...}, 0) = 0

It seems to be looking for libhsa-amd-aqlprofile64.so, which doesn't appear to be provided by any package. This seems to be a common issue (see ROCm/ROCm#1781), but I'm not sure it's really the cause of our problem.

@illuhad
Copy link

illuhad commented Dec 12, 2024

@illuhad Interesting that it says amdhsa.target: amdgcn-amd-amdhsa--gfx700, considering I'm on gfx1030. Is that normal?

Are you referring to the godbolt link? I don't think this is cause for concern. When a target architecture is not provided, clang/LLVM falls back to the oldest AMD GPU it knows, which presumably is gfx700.
I haven't added any flags in that godbolt link to request compilation for a specific GPU architecture, so that's probably why we get the default behavior.
I think the exact instructions in the assembly are not particularly interesting; the point is rather that the LLVM IR compiles, and that some function calls remain unresolved in the assembly.

@blenderfreaky
Copy link
Author

Ah ok, makes sense.

I compiled and ran the example on Arch via distrobox and compared the comgr commands; there are some interesting differences.
Here are just the differences:

On Arch:

ActionKind: AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC

Compilation Args:
--rocm-path=/tmp/comgr-5bec98/rocm
-Xclang
-mlink-builtin-bitcode-postopt

Driver Job Args:
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/opencl.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/ocml.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/ockl.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_daz_opt_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_unsafe_math_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_finite_only_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_wavefrontsize64_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_isa_version_1030.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_abi_version_500.bc

-mlink-builtin-bitcode-postopt

Other Actions:

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_LINK_BC_TO_BC
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
 Comgr Branch-Commit: makepkg-1e2c94795ee0
         LLVM Commit: 1e2c94795ee0d6ab8e2ff3035965a6b74e11b475
             Linking Bitcode: /tmp/comgr-d874ea/input/LLVM Binary
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

On Nix:

ActionKind: AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC

Compilation Args:
-nogpulib

Driver Job Args:
-nogpulib
-fcolor-diagnostics

-disable-llvm-verifier
-discard-value-names

Other Actions:

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_LINK_BC_TO_BC
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
             Linking: LLVM Binary
             Linking: opencl_lib.bc
             Linking: ocml_lib.bc
             Linking: ockl_lib.bc
             Linking: oclc_isa_version_1030.bc
             Linking: oclc_correctly_rounded_sqrt_off_lib.bc
             Linking: oclc_daz_opt_off_lib.bc
             Linking: oclc_finite_only_off_lib.bc
             Linking: oclc_unsafe_math_off_lib.bc
             Linking: oclc_wavefrontsize64_off_lib.bc
             Linking: oclc_abi_version_500_lib.bc
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

It seems the two are using different methods for linking, though I have no idea why.

@illuhad
Copy link

illuhad commented Dec 13, 2024

@blenderfreaky Are we looking at the same ROCm versions? IIRC there was a change in how builtin bitcode libraries are included in the compilation, but that was back around ROCm 5.7.

I'm also a bit surprised to see -nogpulib in the nix compilation. This tells the compiler that bitcode libraries do not need to be included. But I'm not sure if this is related to our problem, or just a side effect of the change in bitcode handling that I mentioned.

EDIT: Notice the two different comgr actions:

  • AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC
  • AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC

This indicates that the difference might be related to the mentioned ROCm change.

@blenderfreaky
Copy link
Author

@illuhad We're using ROCm 6.0.2 (Nix seems a little out of date here), while the Arch one is on 6.2.1 & 6.2.4 for some packages. Both are definitely >= ROCm 6, though.

I interpreted the difference in Action as the Arch one doing the linking and everything else in one step, whereas the Nix one splits it into several actions and links in a later step.
What's also odd is that --rocm-path is passed on Arch but not on Nix; maybe that is a consequence of this difference in ordering as well?

@yboettcher Can you make any sense of the ofBorg failures? It looks like it's just the tests failing, however:

  • They're failing to build, not to run.
  • The error seems to be that python cannot be resolved, even though python is an explicit build input for the tests. Maybe it needs to be a nativeBuildInput?
  • I cannot reproduce the failure locally, despite sandboxing being on. I don't know what other differences could even be at play.

We could just comment them out to get this merged sooner, I think, seeing as they don't run as they should anyway and it's already working for Nvidia.

@yboettcher
Copy link
Contributor

yboettcher commented Dec 13, 2024

I actually have the same failures ofborg has on my end. It appears that for the normal adaptivecpp build, the interpreter directives #!/usr/bin/env python3 do not get automatically changed to a Nix store path. When building adaptivecppWithRocm, I get these messages at the end of the build:

patching script interpreter paths in /nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/lib/cmake/OpenSYCL/syclcc-launcher: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/lib/cmake/hipSYCL/syclcc-launcher: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/lib/cmake/AdaptiveCpp/syclcc-launcher: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/acpp: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/syclcc: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/syclcc-clang: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"

but when building the normal adaptivecpp, these messages are absent.
I assume that the rocm packages pull in python somewhere, which triggers this auto-replacement. I added python3 to the buildInputs of adaptivecpp, which caused the interpreter paths to get auto-changed, and now my tests build and run fine.
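Concretely, the change is just something like the sketch below (shown as an override for illustration; in the PR it would simply go into the package's own inputs):

# Sketch: make a python3 interpreter visible during the build so the fixup
# phase's patchShebangs rewrites #!/usr/bin/env python3 in acpp/syclcc.
final: prev: {
  adaptivecpp = prev.adaptivecpp.overrideAttrs (old: {
    buildInputs = (old.buildInputs or [ ]) ++ [ final.python3 ];
  });
}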

Edit:
Forgot to mention: without the patched interpreter path, the resulting binaries of the adaptivecpp derivation cannot run on their own and require the user to provide a python themselves. At least for me:

$ result/bin/acpp
/usr/bin/env: ‘python3’: No such file or directory

$ nix-shell -p python3 --run "result/bin/acpp"
acpp [AdaptiveCpp compilation driver], Copyright (C) 2018-2024 Aksel Alpay and the AdaptiveCpp project
  AdaptiveCpp version: 24.06.0
  Installation root: /nix/store/3lygkianckvd07brsfjdqd5ijcrgzyz5-adaptivecpp-24.06.0
<more output>
