Non-deterministic GPU results #113
That certainly is not supposed to happen. A minimal working example running in the REPL would be helpful. Furthermore, you could check whether the input vector changes between calls.
Ok, here's a reproducer:

julia> using CUDA, NFFT, CuNFFT
julia> Ny, Nx = 1024, 2048
(1024, 2048)
julia> x = CUDA.randn(Ny, Nx);
julia> knots = CUDA.rand(2, Ny*Nx) .- 0.5f0;
julia> plan = NFFT.plan_nfft(CuArray{Float32}, knots, size(x));
julia> CUDA.@allowscalar [(adjoint(plan) * complex(x[:]))[1] for i=1:10]
10-element Vector{ComplexF32}:
-1568.3418f0 - 972.4891f0im
-1568.3413f0 - 972.489f0im
-1568.3414f0 - 972.48883f0im
-1568.3412f0 - 972.4891f0im
-1568.3416f0 - 972.4893f0im
-1568.3418f0 - 972.4893f0im
-1568.3416f0 - 972.4891f0im
-1568.3413f0 - 972.48944f0im
-1568.3413f0 - 972.4892f0im
-1568.3412f0 - 972.4895f0im

Not quite as numerically drastic as my real case, but it's there. I don't see anything like this for FFTs.
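The pattern above, results agreeing in the leading digits but differing in the last few, is characteristic of a floating-point reduction whose summation order varies between runs (e.g. GPU atomic adds). As a minimal illustration (values chosen for demonstration, not taken from CUDA):

```python
import numpy as np

# Floating-point addition is not associative, so any reduction whose
# order varies between runs (e.g. GPU atomic adds) can change the
# low-order digits of the result.
a, b, c = np.float32(1.0), np.float32(1e8), np.float32(-1e8)

left = (a + b) + c   # 1.0 is absorbed into 1e8 first -> 0.0
right = a + (b + c)  # 1e8 cancels first              -> 1.0

print(left, right)   # 0.0 1.0
```

With thousands of contributions per output element, reordering typically perturbs only the last few significant digits, exactly as in the REPL output above.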
Input vector does not appear to change.
Following on that MWE, it looks like it's the sparse matrix product:

v = complex(vec(x))
tmpVec1, tmpVec2 = similar(plan.tmpVec), similar(plan.tmpVec)
NFFT.convolve_transpose!(plan, v, tmpVec1)
NFFT.convolve_transpose!(plan, v, tmpVec2)
tmpVec1 == tmpVec2 # false
Well, that's just:

function AbstractNFFTs.convolve_transpose!(p::CuNFFTPlan{T,D}, fHat::CuArray{U}, g::CuArray{Complex{T},D}) where {D,T,U}
  mul!(vec(g), p.B, fHat)
  return
end

see https://github.com/JuliaMath/NFFT.jl/blob/master/CuNFFT/src/implementation.jl#L72. So probably that issue also arises without NFFT, i.e. with a plain CUSPARSE matrix-vector product.
Yeah, I pinged on Slack. One possibility that it's still an NFFT issue is if that matrix somehow accidentally has repeated entries, which I think is technically representable by that data structure. I'm not sure what the expected behavior is in that case, or how to even check.
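The repeated-entries concern is real for coordinate-style sparse construction. As an illustration (using SciPy purely as a stand-in, not NFFT's actual construction code), duplicate entries at the same position are representable and are only merged when the matrix is converted to a compressed format:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Two entries at the same (row, col) position are representable in
# COO form; converting to CSC implicitly sums them.
rows = np.array([0, 0])
cols = np.array([1, 1])
vals = np.array([2.0, 3.0])
A = coo_matrix((vals, (rows, cols)), shape=(2, 2))

print(A.nnz)      # 2 -- duplicate entries stored separately
csc = A.tocsc()
print(csc.nnz)    # 1 -- duplicates summed during conversion
print(csc[0, 1])  # 5.0
```

Whether CUSPARSE's CSC type tolerates or merges duplicates would have to be checked against CUDA.jl, but this shows why the question is worth asking about the construction path.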
Ah yes, you are right. Indeed we might need to have a look at the sparse matrix construction.
No, I think it basically "is" a CUDA issue:

B = CUDA.CUSPARSE.CuSparseMatrixCSC(sprand(Float32, 8388608, 2097152, 1e-5))
v = CUDA.randn(2097152);
w = CUDA.randn(8388608);
B * v == B * v # false
B' * w == B' * w # true

Although if you make it CSR, the true/false above switch. Which I guess opens up an option, if NFFT wanted to "fix" this problem and also get a speed boost: probably as an option, store the matrix in both CSR and CSC, and use the appropriate one for the forward vs. adjoint NFFT. I don't know if this is something you wanted to do, but I figured I'd mention it. They're certainly not huge numerical errors (although unfortunately in my real case they're propagating to some significant downstream errors, which maybe is a separate thing I should figure out).
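The dual-storage idea can be sketched as a small wrapper. This is a hypothetical design (the `DualSparse` name and methods are made up, and SciPy stands in for CUDA.jl's `CuSparseMatrixCSR`/`CuSparseMatrixCSC`); it only demonstrates that the two layouts give the same mathematical result, so each product can be routed to whichever layout is deterministic on the GPU:

```python
import numpy as np
from scipy.sparse import random as sprandom

class DualSparse:
    """Hypothetical wrapper storing one matrix in both CSR and CSC.

    The idea: route the forward product through the CSR copy and the
    adjoint product through the CSC copy, so each can use a row-parallel
    kernel (which avoids the atomic adds that cause nondeterminism).
    """
    def __init__(self, B):
        self.csr = B.tocsr()  # used for the forward product
        self.csc = B.tocsc()  # used for the adjoint product

    def forward(self, v):
        return self.csr @ v

    def adjoint(self, w):
        # B' @ w: the CSC storage of B is exactly the CSR storage
        # of B', so this is again a row-parallel product.
        return self.csc.T @ w

rng = np.random.default_rng(0)
B = sprandom(100, 50, density=0.1, random_state=0)
D = DualSparse(B)
v, w = rng.standard_normal(50), rng.standard_normal(100)

# Both layouts agree with the reference products.
assert np.allclose(D.forward(v), B @ v)
assert np.allclose(D.adjoint(w), B.T @ w)
```

The cost is storing the matrix twice, which is why gating it behind a flag, as discussed below, seems sensible.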
Ok, this is getting interesting. Storing both CSR and CSC seems very clever; I had not thought about that before. We should certainly provide a flag that allows controlling this, but I would be open to integrating it. Please go ahead if you want to give that a go; I will not get to it in the near future.
I've got a case where subsequent (adjoint) NFFTs on GPU are giving different results. I don't have an MWE yet, but just to prove it:

Is this something one could expect to happen in any scenario? Perhaps I'm doing something wrong? Any hints possible from the little info I provided might help me track down the MWE. In any case, trying to get something...