internal/opcodesextra: fix actions for VNNI and VBMI2 shifts #372
In cases without a mask register, these concatenate and variable shift instructions were being defined with the wrong destination register action; the correct action is always inst.RW. This differs from the immediate (non-variable) versions of these instructions, which do not read their destination registers (absent merge masking).
Are we sure this doesn't apply to VNNI too? The pseudocode in https://www.felixcloutier.com/x86/vpdpbusd at least appears to suggest that merge masking is an option and does read
This could equally well have been my fault. I did a fair amount of reshuffling.
Good catch, you are correct. In which case the fix is even simpler. I'll push the change momentarily.
…ation operand action.
Codecov Report
@@ Coverage Diff @@
## master #372 +/- ##
=======================================
Coverage 76.05% 76.05%
=======================================
Files 65 65
Lines 21055 21055
=======================================
Hits 16013 16013
Misses 4959 4959
Partials 83 83
Okay, PR now reflects your correct observation that VNNI instructions also always read the destination. I do believe that continuing to call the form helper function Leaving it parameterized might be a better signal that one needs to actually pay attention to this detail!
Sorry, I've been trying to figure out how to rebase this PR on the latest changes (mostly to resolve #381), but failing to do so 😓 It seems like it's possible, but I think I'd have to mess with your fork. This looks good. Please could you update it to the latest master and we should be good to go. Thanks again!
Thanks so much for fixing the PR CI failures! \o/ This PR is now caught up with master. While you are here, can we get this one merged too? #233 It's been buried by newer issues, but I've been using it forever without problems and it would be great to get it merged. It's also now all caught up with master.
The VBMI2 concatenate and variable shift instructions VPSH{L,R}DV{W,D,Q} always both read and write the destination register in the "Op 1" position. See:
https://www.felixcloutier.com/x86/vpshldv
https://www.felixcloutier.com/x86/vpshrdv
In cases without mask merging, Avo has defined these instructions with the incorrect destination register action (inst.W), which should instead always be inst.RW. Note: this differs from the immediate (non-variable shift) versions of these instructions, which do not (absent merge masking) read their Op 1 registers.
This bug manifests as incorrect vector register scheduling when Avo doesn't recognize that these instructions have a data dependency on the destination register in Op 1.
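To illustrate why the destination action matters for register scheduling, here is a minimal, self-contained sketch of a liveness check. It is a toy model, not Avo's code: the Action, Operand, and LiveBefore names are hypothetical, and the operand order and register names are illustrative only.

```go
package main

import "fmt"

// Action is a toy stand-in for an operand action (hypothetical; not Avo's inst.Action).
type Action uint8

const (
	R  Action = 1 << iota // operand is read
	W                     // operand is written
	RW = R | W            // operand is read and written
)

// Operand pairs a register name with how the instruction uses it.
type Operand struct {
	Reg    string
	Action Action
}

// LiveBefore reports whether reg must hold a meaningful value immediately
// before the instruction, given whether it is live immediately after.
// A write kills the register; a read makes it live again.
func LiveBefore(reg string, ops []Operand, liveAfter bool) bool {
	live := liveAfter
	for _, op := range ops {
		if op.Reg == reg && op.Action&W != 0 {
			live = false // value before the instruction is overwritten
		}
	}
	for _, op := range ops {
		if op.Reg == reg && op.Action&R != 0 {
			live = true // value before the instruction is consumed
		}
	}
	return live
}

func main() {
	// Same instruction shape, differing only in the destination action.
	asW := []Operand{{"Z2", R}, {"Z1", R}, {"Z0", W}}
	asRW := []Operand{{"Z2", R}, {"Z1", R}, {"Z0", RW}}

	fmt.Println("dest marked W,  Z0 live before:", LiveBefore("Z0", asW, true))
	fmt.Println("dest marked RW, Z0 live before:", LiveBefore("Z0", asRW, true))
}
```

With the destination marked write-only, the model treats the prior contents of Z0 as dead, so an allocator would be free to reuse that register for something else just before the instruction. Marking it read-write keeps the prior value live, which is the dependency described above.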
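The fix is to parameterize the _yvblendmpd function defined in opcodesextra to require specifying the destination register action. This flexibility is needed because this function is shared between the VBMI2 instructions and the VNNI instructions, which do not read their Op 1 registers (absent merge masking).
I believe the confusion here was originally my fault in the original PR including VBMI2 support, for not recognizing that the Golang AVX optab definitions are actually silent on whether destination register values are read in addition to being written (because the assembler doesn't need to care).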
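The general shape of such a parameterized form helper can be sketched as follows. This is a hedged illustration only: threeOperandForms, Form, and Action are hypothetical names invented here and do not reproduce the actual helper in opcodesextra. Placing the destination last in the operand list is likewise an assumption of the sketch.

```go
package main

import "fmt"

// Action is a toy operand action (hypothetical; not Avo's inst.Action).
type Action string

const (
	R  Action = "r"
	W  Action = "w"
	RW Action = "rw"
)

// Form is a simplified instruction form: operand types plus per-operand actions.
type Form struct {
	Operands []string
	Actions  []Action
}

// threeOperandForms is a hypothetical helper that builds register-only forms
// for each vector width. The caller must state the destination action
// explicitly, so the read/write behaviour cannot be left to a default.
func threeOperandForms(dst Action) []Form {
	var forms []Form
	for _, reg := range []string{"xmm", "ymm", "zmm"} {
		forms = append(forms, Form{
			Operands: []string{reg, reg, reg},
			Actions:  []Action{R, R, dst}, // destination last (assumption)
		})
	}
	return forms
}

func main() {
	// Variable shifts read and write the destination, so pass RW.
	for _, f := range threeOperandForms(RW) {
		fmt.Println(f.Operands, f.Actions)
	}
}
```

Requiring the action as an argument means every call site has to make a deliberate choice, which is the point made in the conversation above about keeping the helper parameterized.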
to require specifying the destination register action. This flexibility is needed because this function is shared between the VBMI2 instructions and the VNNI instructions, which do not read their Op1 registers (absent merge masking).I believe the confusion here was originally my fault in the original PR including VBMI2 support, for not recognizing that the Golang AVX optab definitions are actually silent on whether destination register values are read in addition to being written (because the assembler doesn't need to care).