[PyTorch] fused attention and cu_seqlens #1259
Hi team,

we are currently adapting our training environment to use the fused attention functions. In one of our training setups, we work with batch size one and concatenate multiple documents along the sequence dimension (`sbhd` format). We set `cu_seqlens_q` and `cu_seqlens_kv` so that these documents cannot attend to each other. This is not actually a `padding` use case, because we always fill up the whole sequence, and no packing and unpacking with `pack_tensors()` and `unpack_tensors()` is required. With the flash attention backend this worked perfectly fine and produced the results we intended. With the fused attention backend we get device-side assertions for this input. Here is a small sample code:
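The original snippet did not survive in this copy of the thread. A minimal sketch of the setup described above (hypothetical, not the author's actual code), assuming TransformerEngine's `DotProductAttention` module with batch size one, `sbhd` layout, and `cu_seqlens` marking the document boundaries; the sequence length and head sizes are illustrative:

```python
# Hypothetical reconstruction of the reported setup, not the original snippet:
# batch size 1, several documents packed along the sequence dimension,
# cu_seqlens marking document boundaries, and no "padding" mask type.
import torch
from transformer_engine.pytorch import DotProductAttention

seq_len, batch, heads, head_dim = 4096, 1, 16, 64

attn = DotProductAttention(
    num_attention_heads=heads,
    kv_channels=head_dim,
    qkv_format="sbhd",
    attn_mask_type="causal",  # note: not a "padding" variant
)

q = torch.randn(seq_len, batch, heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Three documents (1024 + 1024 + 2048 tokens) fill the whole sequence,
# so there is nothing to pack/unpack -- only cross-document attention
# should be masked out.
cu_seqlens = torch.tensor([0, 1024, 2048, 4096], dtype=torch.int32, device="cuda")

# Works with the flash attention backend; reportedly triggers device-side
# assertions with the fused attention backend.
out = attn(q, k, v, cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens)
```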
Was the use case we have been working with ever intended? Or is there just an assertion missing that forbids using `cu_seqlens` without setting a `padding` mode?

Comments

Hey Markus! I think what you wanted to do is in line with the [link]. I have a little blurb over here to explain the use cases of `cu_seqlens`.

Hey, oh, okay, then my use case was simply not correct 🙈
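For reference, if the linked explanation is indeed about the `padding` mask family (an assumption, since the links did not survive in this copy), the sketch above would change only in the mask type:

```python
# Assumed follow-up, not confirmed by the thread: treat each packed document
# as its own sequence via a padding-aware mask type; the cu_seqlens
# boundaries stay exactly the same as in the sketch above.
attn_padded = DotProductAttention(
    num_attention_heads=heads,
    kv_channels=head_dim,
    qkv_format="sbhd",
    attn_mask_type="padding_causal",
)
out = attn_padded(q, k, v, cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens)
```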