FusedMultiplyAdd not using all available instructions #110109

PavelCibulka · 2024-11-23T18:30:44Z

I've been trying to write code that compiles to the vfnmadd213ss instruction, but I haven't managed to get it in .NET 9. I'm using a Zen 4 CPU (AMD Ryzen 7 7800X3D 8-Core Processor).

This code should do cos2 = 1 - sin * sin
1st variant:

    public static float M1(float a) {
        float x = MathF.FusedMultiplyAdd(a, -a, 1f);
        return x;
    }

    G_M000_IG01:                ;; offset=0x0000

    G_M000_IG02:                ;; offset=0x0000
           vmovaps  xmm1, xmm0
           vxorps   xmm0, xmm0, xmmword ptr [reloc @RWD00]
           vfmadd213ss xmm1, xmm0, dword ptr [reloc @RWD16]
           vmovaps  xmm0, xmm1

    G_M000_IG03:                ;; offset=0x0019
           ret

    RWD00   dq   8000000080000000h, 8000000080000000h
    RWD16   dq   3F8000003F800000h, 3F8000003F800000h

2nd variant:

    public static float M2(float a) {
        float x = MathF.FusedMultiplyAdd(-a, a, 1f);
        return x;
    }

    G_M000_IG01:                ;; offset=0x0000

    G_M000_IG02:                ;; offset=0x0000
           vxorps   xmm1, xmm0, xmmword ptr [reloc @RWD00]
           vfmadd213ss xmm1, xmm0, dword ptr [reloc @RWD16]
           vmovaps  xmm0, xmm1

    G_M000_IG03:                ;; offset=0x0015
           ret

    RWD00   dq   8000000080000000h, 8000000080000000h
    RWD16   dq   3F8000003F800000h, 3F8000003F800000h

3rd variant:

    public static float M3(float a) {
        float x = -MathF.FusedMultiplyAdd(a, a, -1f);
        return x;
    }

    G_M000_IG01:                ;; offset=0x0000

    G_M000_IG02:                ;; offset=0x0000
           vfmadd213ss xmm0, xmm0, dword ptr [reloc @RWD00]
           vxorps   xmm0, xmm0, xmmword ptr [reloc @RWD16]

    G_M000_IG03:                ;; offset=0x0011
           ret

    RWD00   dq   BF800000BF800000h, BF800000BF800000h
    RWD16   dq   8000000080000000h, 8000000080000000h

All three functions are mathematically equivalent, yet they generate different assembly code. None of them uses any variant of the VFNMADD instruction. I expected just this instruction:

       vfnmadd213ss xmm0, xmm0, dword ptr [reloc @RWD00]
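For reference, here is why the three variants and the expected instruction compute the same value. This assumes IEEE 754 round-to-nearest-even (written RN below), which is symmetric under negation, and that vfnmadd213ss with operands (a, a, 1) computes RN(-(a·a) + 1) with a single rounding:

$$
\begin{aligned}
\operatorname{fma}(a, -a, 1) &= \operatorname{fma}(-a, a, 1) = \operatorname{RN}(1 - a^2) \\
-\operatorname{fma}(a, a, -1) &= -\operatorname{RN}(a^2 - 1) = \operatorname{RN}(1 - a^2) \\
\texttt{vfnmadd213ss}(a, a, 1) &= \operatorname{RN}(-(a \cdot a) + 1) = \operatorname{RN}(1 - a^2)
\end{aligned}
$$

One edge case: for a = ±1 the exact result is zero, which rounds to +0, so the first two variants (and VFNMADD) return +0 while the third variant, which negates afterwards, returns -0. The values compare equal, but the bit patterns differ.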


dotnet-policy-service · 2024-11-23T18:31:17Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

huoyaoyuan · 2024-11-23T18:45:49Z

You can access the instruction directly through System.Runtime.Intrinsics.X86.Fma, and use Vector128.CreateScalarUnsafe and ToScalar to operate on the float value as an xmm register. It would, of course, be less portable.
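A minimal sketch of that suggestion, assuming x64 with FMA3 (Zen 4 qualifies). It uses Fma.MultiplyAddNegatedScalar, which computes -(a * b) + c on the lowest element; the class name FmaHelpers and the method name OneMinusSquare are hypothetical. Which VFNMADD encoding (132/213/231) the JIT picks, and whether it folds the 1.0f constant into a memory operand, is left to the JIT.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class FmaHelpers // hypothetical helper class
{
    // Computes 1 - a * a with a single rounding.
    public static float OneMinusSquare(float a)
    {
        if (Fma.IsSupported)
        {
            // CreateScalarUnsafe leaves the upper lanes undefined;
            // only the lowest lane is read back, so that is fine here.
            Vector128<float> va = Vector128.CreateScalarUnsafe(a);
            Vector128<float> one = Vector128.CreateScalarUnsafe(1f);

            // -(va * va) + one on the lowest element (a VFNMADD*SS form).
            return Fma.MultiplyAddNegatedScalar(va, va, one).ToScalar();
        }

        // Portable fallback (Arm64, or x64 without FMA3).
        return MathF.FusedMultiplyAdd(a, -a, 1f);
    }
}
```

Because both paths round only once, the intrinsic path should return the same bits as `MathF.FusedMultiplyAdd(a, -a, 1f)`; the intrinsic only changes which instruction is requested.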

> This code should do cos2 = 1 - sin * sin

If what you actually need is sin and cos at a lower cost, you can also check Math{F}.SinCos.
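A sketch of that alternative, assuming the goal is cos² of a known angle. MathF.SinCos (present in .NET 9) returns both values from one call, so the cosine can be squared directly. The FMA form from the issue is computed alongside for comparison. SinCosDemo and CosSquared are hypothetical names.

```csharp
using System;

public static class SinCosDemo // hypothetical helper class
{
    // Returns cos^2(angle) computed two ways, for comparison.
    public static (float viaCos, float viaFma) CosSquared(float angle)
    {
        (float sin, float cos) = MathF.SinCos(angle);

        float viaCos = cos * cos;                             // square the cosine directly
        float viaFma = MathF.FusedMultiplyAdd(sin, -sin, 1f); // 1 - sin^2, single rounding

        return (viaCos, viaFma);
    }
}
```

The two results can differ by a few ulps because they round at different steps. Squaring the cosine also avoids the cancellation in `1 - sin * sin` when sin is close to ±1.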

PavelCibulka added the tenet-performance Performance related issue label Nov 23, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 23, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 23, 2024

EgorBo added this to the Future milestone Nov 23, 2024

EgorBo removed the untriaged New issue has not been triaged by the area owner label Nov 23, 2024
