[Release/2.4] Use deterministic backends of scaled dot product attention for batched testing #1749
base: release/2.4
Conversation
Use the deterministic version of scaled_dot_product_attention for a better comparison of outputs. As the scaled_dot_product_attention documentation notes: "Due to the nature of fusing floating point operations, the output of this function may be different depending on what backend kernel is chosen. The C++ implementation supports torch.float64 and can be used when higher precision is required." So, we disable the other two backends, FlashAttention and Memory-Efficient Attention.
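A minimal sketch of what pinning scaled_dot_product_attention to the math (C++) backend can look like in a test. It assumes PyTorch 2.3+ (where torch.nn.attention.sdpa_kernel and SDPBackend are available); the tensor shapes, dtypes, and tolerances below are illustrative and not taken from this PR.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Hypothetical batched inputs; shapes and dtypes are illustrative only.
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(2, 8, 128, 64, device=device, dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device=device, dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device=device, dtype=torch.float16)

# Force the math (C++) backend: FlashAttention and Memory-Efficient
# Attention are disabled inside this context, so the output no longer
# depends on which fused kernel the dispatcher would otherwise pick.
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)

# Higher-precision float64 reference, as suggested by the documentation
# quote above (only the C++ implementation supports torch.float64).
ref = F.scaled_dot_product_attention(q.double(), k.double(), v.double())

# Tolerances are illustrative for a float16 vs. float64 comparison.
torch.testing.assert_close(out, ref.to(out.dtype), atol=1e-2, rtol=1e-2)
```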
Jenkins build for commit d3e12548e858245c64f30ad0972127f8e78b88d9 finished as FAILURE; an error was detected during PyTorch building.