Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FieldVector broadcast inference failure #1465

Closed
charleskawczynski opened this issue Sep 19, 2023 · 9 comments · Fixed by #1615
Closed

FieldVector broadcast inference failure #1465

charleskawczynski opened this issue Sep 19, 2023 · 9 comments · Fixed by #1615
Labels
bug Something isn't working

Comments

@charleskawczynski
Copy link
Member

charleskawczynski commented Sep 19, 2023

Reproducer:

import ClimaCore
import ClimaCore.Spaces as Spaces
import ClimaCore.Fields as Fields
using StaticArrays
include(
    joinpath(pkgdir(ClimaCore), "test", "TestUtilities", "TestUtilities.jl"),
)
import .TestUtilities as TU

function foo!(args)
    (; residual, temp, dt,a_imp, Ui) = args
    i=1
    @. residual = temp + dt * a_imp[i, i] * residual - Ui
    return nothing
end

FT = Float32
cspace = TU.CenterExtrudedFiniteDifferenceSpace(FT);
Yc = fill((;a=FT(0)),cspace);
args = (;
    residual=Fields.FieldVector(;c=Yc),
    temp=Fields.FieldVector(;c=Yc),
    a_imp=SMatrix{1,1}(rand(1,1)),
    Ui=Fields.FieldVector(;c=Yc),
    dt = 1.0,
);
foo!(args)

using JET
JET.@test_opt foo!(args)
@charleskawczynski charleskawczynski added the bug Something isn't working label Sep 19, 2023
@charleskawczynski
Copy link
Member Author

I think this is blocking us from using the latest ClimaTimeSteppers (which is OOMing)

@charleskawczynski
Copy link
Member Author

Strangely

using JET; v = Int[1,7,3,4,1]; JET.@test_opt sort!(v)
using Revise; include("../cc_bc_inf.jl")

(where cc_bc_inf.jl is the MWE above) passes. Something very strange is going on with sort!(::Vector{Int}).. cc @simonbyrne

@charleskawczynski
Copy link
Member Author

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

@charleskawczynski
Copy link
Member Author

charleskawczynski commented Sep 19, 2023

The order dependence seems to disappear in 1.10:

Charlies-iMac-2:ClimaCore.jl charliekawczynski$ julia +beta --project
julia> using Revise; include("../cc_bc_inf.jl")
Precompiling ClimaCore
  6 dependencies successfully precompiled in 11 seconds. 104 already precompiled.
[ Info: Precompiling JET [c3a54625-cd67-489e-a8e7-0a5a0ff4e31b]
JET-test failed at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30
  Expression: #= /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30 =# JET.@test_opt foo!(args)
  ═════ 5 possible errors found ═════
  ...
  
ERROR: LoadError: There was an error during testing
in expression starting at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30

julia> using JET; v = Int[1,7,3,4,1]; JET.@test_opt sort!(v)
Test Passed

julia> using Revise; include("../cc_bc_inf.jl")
WARNING: replacing module TestUtilities.
JET-test failed at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30
  Expression: #= /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30 =# JET.@test_opt foo!(args)
  ═════ 5 possible errors found ═════
  ...
  
ERROR: LoadError: There was an error during testing
in expression starting at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30

julia> 
Charlies-iMac-2:ClimaCore.jl charliekawczynski$ julia +beta --project
julia> using JET; v = Int[1,7,3,4,1]; JET.@test_opt sort!(v)
Test Passed

julia> using Revise; include("../cc_bc_inf.jl")
JET-test failed at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30
  Expression: #= /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30 =# JET.@test_opt foo!(args)
  ═════ 5 possible errors found ═════
  ...
  
ERROR: LoadError: There was an error during testing
in expression starting at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30

julia> 
Charlies-iMac-2:ClimaCore.jl charliekawczynski$ julia --project
julia> using Revise; include("../cc_bc_inf.jl")
JET-test failed at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30
  Expression: #= /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30 =# JET.@test_opt foo!(args)
  ═════ 2 possible errors found ═════
  ...
  
ERROR: LoadError: There was an error during testing
in expression starting at /Users/charliekawczynski/Dropbox/Caltech/work/dev/CliMA/cc_bc_inf.jl:30

julia> using JET; v = Int[1,7,3,4,1]; JET.@test_opt sort!(v)
JET-test failed at REPL[2]:1
  Expression: #= REPL[2]:1 =# JET.@test_opt sort!(v)
  ═════ 2 possible errors found ═════
  ...
  
ERROR: There was an error during testing

julia> 
Charlies-iMac-2:ClimaCore.jl charliekawczynski$ julia --project
julia> using JET; v = Int[1,7,3,4,1]; JET.@test_opt sort!(v)
Test Passed

julia> using Revise; include("../cc_bc_inf.jl")
Test Passed

julia> 

@charleskawczynski
Copy link
Member Author

Boiled down to:

using BlockArrays
shape1 = (BlockArrays._BlockedUnitRange((2,)),);
shape2 = (BlockArrays._BlockedUnitRange((2,)),);
Base.Broadcast._bcs(shape1, shape2)
@code_warntype Base.Broadcast._bcs(shape1, shape2)

@charleskawczynski
Copy link
Member Author

Opened JuliaArrays/BlockArrays.jl#310

@simonbyrne
Copy link
Member

How is this different than what you were doing before? Why did it only appear now?

@charleskawczynski
Copy link
Member Author

The only difference is that we do it more times because the stage loop is unrolled.

That said, I'm beginning to think this is not the issue. One odd thing I found is that this inference failure disappears if we change the broadcast expression from

@. residual = temp + dt * a_imp[i, i] * residual - Ui

to

@. residual = temp + dt * a_imp[i, i] * residual

(removing any other fieldvector from the RHS has the same impact). So, it seems like the code path is changing based on on the number of terms we have. I tried running a single step with this change, and the compiler still SO'd when compiling step_u!. So, I don't think this is the issue. Still digging into what is wrong. I'm changing gears by commenting / uncommenting code to try and identify the issue.

@charleskawczynski
Copy link
Member Author

charleskawczynski commented Sep 21, 2023

It turns out that this issue is not responsible for the OOMing (OOMing seems to have been coming from the NVTX range calls, which are being temporarily removed), but it is just due to JuliaArrays/BlockArrays.jl#310. I'm hoping that JuliaArrays/BlockArrays.jl#313 is merged, which fixes this issue. Basically, anytime we have more than one FieldVector on the RHS of a broadcast expression, Base.Broadcast.check_broadcast_axes is called, which leads to JuliaArrays/BlockArrays.jl#310.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants