pointwise scheduler picks invalid reference TensorView #3512

Open
jjsjann123 opened this issue Dec 2, 2024 · 0 comments · May be fixed by #3513


🐛 Bug

RuntimeError:  INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/transform_iter.cpp":546, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Error during replay, a transformation was called that conflicts with an rfactor call.
Exception raised from BestEffortReplay at /opt/pytorch/nvfuser/csrc/transform_iter.cpp:546 (most recent call first):
frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x103 (0x7f67e51a55cf in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #1: nvfuser::nvfErrorFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x62 (0x7f67e5592302 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x9bd40c (0x7f67e5a3640c in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #3: nvfuser::BestEffortReplay::replayCasP(nvfuser::TensorView const*, nvfuser::TensorView const*, long, nvfuser::LogicalDomainMap const&, bool, bool, bool) + 0x683 (0x7f67e5a39eb3 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #4: nvfuser::TransformReplay::replayCasP(nvfuser::TensorView const*, nvfuser::TensorView const*, long, nvfuser::LogicalDomainMap const&, nvfuser::TransformReplayOptions) + 0x21a (0x7f67e5a424ea in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #5: nvfuser::TransformReplay::replayCasP(nvfuser::TensorView const*, nvfuser::TensorView const*, long, nvfuser::TransformReplayOptions) + 0x53 (0x7f67e5a44413 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #6: nvfuser::TransformPropagator::propagateP2C(nvfuser::TensorView*, nvfuser::TensorView*) + 0xe3 (0x7f67e5a44523 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #7: nvfuser::MaxInfoSpanningTree::traverse(nvfuser::MaxInfoSpanningTree::Propagator*) + 0xe2 (0x7f67e59c8ac2 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x91c630 (0x7f67e5995630 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #9: <unknown function> + 0x91d9af (0x7f67e59969af in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x882ef7 (0x7f67e58fbef7 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #11: nvfuser::FusionKernelRuntime::compileFusionParallel(nvfuser::KernelArgumentHolder) + 0x3e8 (0x7f67e58fd7b8 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #12: nvfuser::FusionExecutorCache::runFusionWithInputs(c10::ArrayRef<c10::IValue> const&, std::optional<nvfuser::PrimDataType>, std::optional<signed char>) + 0x1db (0x7f67e58f6dfb in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #13: nvfuser::python_frontend::FusionDefinition::execute(c10::ArrayRef<c10::IValue> const&, std::optional<signed char>, bool, bool, bool) const + 0x15a (0x7f67e5ab45ca in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)

To Reproduce

import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id38(fd : FusionDefinition) -> None :
    T3 = fd.define_tensor(shape=[1, 2048, 512], contiguity=[None, True, True], dtype=DataType.BFloat16, is_cpu=False, stride_order=[2, 1, 0])
    T33 = fd.ops.reshape(T3, new_shape=[1, 2048, 8, 64])    # split 512 into 8 * 64
    T34 = fd.ops.permute(T33, dims=[0, 2, 1, 3])             # [1, 8, 2048, 64]
    T185 = fd.ops.broadcast_in_dim(T34, shape=[1, 8, 1, 2048, 64], broadcast_dims=[0, 1, 3, 4])      # insert new broadcast dim at index 2
    T192 = fd.ops.broadcast_in_dim(T185, shape=[1, 8, 4, 2048, 64], broadcast_dims=[0, 1, 2, 3, 4])  # expand the broadcast dim to 4
    T198 = fd.ops.reshape(T192, new_shape=[1, 32, 2048, 64]) # merge 8 and 4 into 32
    fd.add_output(T34)
    fd.add_output(T198)

with FusionDefinition() as fd:
    nvfuser_fusion_id38(fd)

inputs = [
    torch.testing.make_tensor((1, 2048, 512), dtype=torch.bfloat16, device='cuda:0'),
]
fd.execute(inputs)

The issue here is that the pointwise scheduler picks T34 as the reference TensorView. When it tries to propagate the transformations along T34 -> T185 -> T192 -> T198, there is a new IterDomain that is broadcast, then expanded, then merged into a neighboring IterDomain, and that merge has no counterpart on the reference, so the replay cannot be propagated.
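For intuition, the same shape chain written in eager PyTorch looks roughly like the sketch below (illustrative only, not part of the repro; the tensor names just mirror the fusion). The dimension that the final reshape merges (8 * 4 -> 32) only exists because of the broadcast-and-expand steps, so it has no counterpart on t34.

import torch

x = torch.randn(1, 2048, 512, dtype=torch.bfloat16)
t33 = x.reshape(1, 2048, 8, 64)          # split 512 into 8 * 64
t34 = t33.permute(0, 2, 1, 3)            # [1, 8, 2048, 64]
t185 = t34.unsqueeze(2)                  # [1, 8, 1, 2048, 64], new broadcast dim
t192 = t185.expand(1, 8, 4, 2048, 64)    # broadcast dim expanded to 4
t198 = t192.reshape(1, 32, 2048, 64)     # 8 and 4 merged into 32; no such merge exists on t34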

jjsjann123 self-assigned this on Dec 2, 2024
jjsjann123 changed the title from "pointwise scheduler picks invalid reference TensorView #1507" to "pointwise scheduler picks invalid reference TensorView" on Dec 12, 2024