
Support more than 1 shape/attention_params for DotProductAttention decision cache #1349

Open
parthmannan opened this issue Nov 29, 2024 · 0 comments

@parthmannan

Currently, DotProductAttention caches decisions such as the result of get_attention_backend for a single set of attention_params, which reduces CPU overhead in each DotProductAttention call. However, model architectures can use more than one attention shape (for example, self-attention and cross-attention). For these, the cache is reset every time the params change, so caching effectively stops working when calls alternate between shapes.
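To make the failure mode concrete, here is a minimal Python sketch of a single-entry decision cache, assuming decisions are keyed on a hashable params object. `AttentionParams`, `select_backend`, and `SingleEntryCache` are hypothetical, simplified stand-ins, not the library's actual classes or functions. When self-attention and cross-attention layers alternate, every lookup is a miss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionParams:
    # Hypothetical, simplified stand-in for the real attention_params.
    batch_size: int
    num_heads: int
    seq_len_q: int
    seq_len_kv: int


def select_backend(params: AttentionParams) -> str:
    # Hypothetical stand-in for an expensive decision such as backend selection.
    return "fused" if params.seq_len_q == params.seq_len_kv else "flash"


class SingleEntryCache:
    """Remembers only the most recent params and the decision made for them."""

    def __init__(self) -> None:
        self.last_params = None
        self.last_decision = None
        self.misses = 0

    def get(self, params: AttentionParams) -> str:
        if params != self.last_params:
            # Any change in params overwrites the only slot.
            self.misses += 1
            self.last_decision = select_backend(params)
            self.last_params = params
        return self.last_decision


self_attn = AttentionParams(batch_size=2, num_heads=16, seq_len_q=512, seq_len_kv=512)
cross_attn = AttentionParams(batch_size=2, num_heads=16, seq_len_q=512, seq_len_kv=77)

cache = SingleEntryCache()
for _ in range(4):  # four layers, each running self-attention then cross-attention
    cache.get(self_attn)
    cache.get(cross_attn)
print(cache.misses)  # 8: every lookup recomputes the decision
```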
This is a feature request to support more than one set of attention_params in the cache. Ideally the cache size would be configurable, since some models use more than two attention shapes; if it is not made configurable, 4 entries is probably a safe starting point.
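One possible shape for the requested feature is a bounded least-recently-used (LRU) cache keyed on the params, with a configurable capacity defaulting to 4. The sketch below assumes hashable params. `AttentionParams`, `select_backend`, `BackendDecisionCache`, and `max_entries` are hypothetical names chosen for illustration and do not reflect the library's actual API. With the same alternating self-/cross-attention pattern, only the first lookup for each shape misses.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionParams:
    # Hypothetical, simplified stand-in; frozen so instances are hashable dict keys.
    batch_size: int
    num_heads: int
    seq_len_q: int
    seq_len_kv: int


def select_backend(params: AttentionParams) -> str:
    # Hypothetical stand-in for an expensive decision such as backend selection.
    return "fused" if params.seq_len_q == params.seq_len_kv else "flash"


class BackendDecisionCache:
    """LRU cache holding up to max_entries (params -> decision) pairs."""

    def __init__(self, max_entries: int = 4) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries: "OrderedDict[AttentionParams, str]" = OrderedDict()
        self.misses = 0

    def get(self, params: AttentionParams) -> str:
        if params in self._entries:
            # Hit: mark this entry as the most recently used.
            self._entries.move_to_end(params)
            return self._entries[params]
        self.misses += 1
        decision = select_backend(params)
        self._entries[params] = decision
        if len(self._entries) > self.max_entries:
            # Over capacity: evict the least recently used entry.
            self._entries.popitem(last=False)
        return decision


self_attn = AttentionParams(batch_size=2, num_heads=16, seq_len_q=512, seq_len_kv=512)
cross_attn = AttentionParams(batch_size=2, num_heads=16, seq_len_q=512, seq_len_kv=77)

cache = BackendDecisionCache(max_entries=4)
for _ in range(4):  # four layers, each running self-attention then cross-attention
    cache.get(self_attn)
    cache.get(cross_attn)
print(cache.misses)  # 2: only the first lookup for each shape is a miss
```

OrderedDict's move_to_end and popitem(last=False) provide the LRU ordering and eviction. For a pure function whose arguments are all hashable, functools.lru_cache(maxsize=4) gives the same policy with less code. How the capacity would be exposed (constructor argument, environment variable, or a fixed default) is left open by the request.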

cyanguwa self-assigned this on Nov 29, 2024