[JIT] Enable EGPRs in JIT by adding REX2 encoding to the backend. #106557

Open
wants to merge 36 commits into
base: main

Ruihan-Yin
Contributor

Overview

This PR follows up on #104637, which added the initial CPUID and XSAVE updates for APX.

This PR adds REX2 encoding for legacy instructions, which enables the use of EGPRs for add, sub, etc. Note that this PR focuses on REX2 encoding only; a follow-up PR will enable EGPR support via the register allocator.

Specification

REX2 is a 2-byte prefix with a leading byte of 0xD5. The detailed format is shown below:
[Figure: rex2 (REX2 prefix format)]

Similar to the REX prefix, REX2 provides extension bits for the MODRM.REG field (REX2.R4/R3), the MODRM.R/M field (REX2.B4/B3), and the index register in the SIB byte (REX2.X4/X3). These bits act as the 5th and 4th bits and combine with the 3-bit field in the MODRM or SIB byte to form a 5-bit register number, which gives access to up to 32 registers.
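
The bit combination described above can be sketched in a few lines of Python. This is an illustration only, not the JIT emitter code. The payload bit order (M0 R4 X4 B4 W R3 X3 B3, high to low) is taken from my reading of the Intel APX specification, since the format figure is not reproduced in the text, and all function names are hypothetical.

```python
REX2_LEAD = 0xD5  # leading byte of the 2-byte REX2 prefix (stated in the PR text)


def split_reg(num):
    """Split a 5-bit register number into (bit 4, bit 3, low 3 bits)."""
    assert 0 <= num < 32
    return (num >> 4) & 1, (num >> 3) & 1, num & 0b111


def rex2_prefix(reg=0, rm=0, index=0, w=False, map1=False):
    """Build the 2-byte REX2 prefix.

    Assumed payload layout, high bit to low bit: M0 R4 X4 B4 W R3 X3 B3.
    M0 selects legacy map 1 (the 0x0F escape); W is the 64-bit operand bit.
    """
    r4, r3, _ = split_reg(reg)    # extends MODRM.REG
    x4, x3, _ = split_reg(index)  # extends SIB.INDEX
    b4, b3, _ = split_reg(rm)     # extends MODRM.R/M
    payload = (
        (int(map1) << 7) | (r4 << 6) | (x4 << 5) | (b4 << 4)
        | (int(w) << 3) | (r3 << 2) | (x3 << 1) | b3
    )
    return bytes([REX2_LEAD, payload])


def modrm_reg_direct(reg, rm):
    """ModRM byte for register-direct addressing (mod = 0b11), low 3 bits only."""
    return 0b11_000_000 | ((reg & 0b111) << 3) | (rm & 0b111)


def encode_add_r64_r64(dst, src):
    """Encode `add dst, src` using opcode 0x01 (ADD r/m64, r64) with REX2."""
    prefix = rex2_prefix(reg=src, rm=dst, w=True)
    return prefix + bytes([0x01, modrm_reg_direct(src, dst)])


if __name__ == "__main__":
    # add r17, r16: both registers need bit 4, so R4 and B4 are set.
    print(encode_add_r64_r64(dst=17, src=16).hex())  # d55801c1
```

Because REX2 carries its own W bit, the sketch never emits a separate REX byte, which matches the rule that REX2 cannot coexist with REX.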

The REX2 prefix is generally available for legacy-map-0 and legacy-map-1 instructions, that is, instructions with a 1-byte opcode or a 2-byte opcode with escape byte 0x0F, with some exceptions.

Like VEX/EVEX, REX2 must be the last prefix before the main opcode, so it cannot coexist with REX/VEX/EVEX.

Design

The bulk of the changes occur in the backend emitter.

As there is no existing hardware with APX support yet, we added some hacks to bypass the CPUID checks. In this PR, DOTNET_JitStressRex2Encoding forces all eligible instructions to be encoded with REX2, regardless of whether any operand is an EGPR. Another switch, DOTNET_JitBypassAPXCheck, only bypasses the APX CPUID check; with it, the JIT emits REX2 only when needed. This switch will be more useful once the LSRA changes land.
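
The interaction of the two switches can be modelled as a small decision function. This is a rough sketch of the behaviour described above, not the JIT's actual logic: the function names, the `cpu_has_apx` parameter, and the way switch values are parsed are all assumptions made for illustration.

```python
import os

STRESS_SWITCH = "DOTNET_JitStressRex2Encoding"
BYPASS_SWITCH = "DOTNET_JitBypassAPXCheck"


def switch_on(name, env):
    """Simplification: a switch counts as on when set to anything but "" or "0"."""
    return env.get(name, "0").strip() not in ("", "0")


def should_emit_rex2(eligible, uses_egpr, cpu_has_apx=False, env=None):
    """Decide whether an instruction gets a REX2 prefix (hypothetical model).

    eligible    -- the instruction can legally take REX2 at all
    uses_egpr   -- at least one operand is an extended GPR
    cpu_has_apx -- result of the real CPUID check (False on today's hardware)
    """
    env = os.environ if env is None else env
    if not eligible:
        return False
    if switch_on(STRESS_SWITCH, env):
        # Stress mode: every eligible instruction is encoded with REX2.
        return True
    # Bypass mode only skips the CPUID check; REX2 is still emitted on demand.
    apx_usable = cpu_has_apx or switch_on(BYPASS_SWITCH, env)
    return apx_usable and uses_egpr
```

Under this model, the stress switch is what makes the encoder testable before the register allocator ever hands out an EGPR, while the bypass switch only matters once operands can actually be EGPRs.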

Note: REX2 can be used to address the lower 16 vector registers (XMM0 to XMM15). For simplicity, this PR does not add support for that, since EGPR functionality for SIMD instructions can be achieved with EVEX. We are open to discussing this and adjusting the design in follow-up PRs.

Testing

We followed a multi-step testing plan to verify the encoding correctness and the semantic correctness.

Testing results will be presented below.

1. Emitter unit tests

In codegenxarch.cpp, similar to genAmd64EmitterUnitTestsSse2, we used the JitLateDisasm feature to emit a set of instructions as emitter unit tests. LateDisasm invokes LLVM to disassemble the code stream, which lets us cross-validate the disassembly from the JIT against the one from LLVM. The goal of this step is to verify that the emit paths generate "correct" code that does not trigger #UD or have wrong semantics.

Note that we are using a custom coredistools.dll which uses a recent LLVM that supports APX decoding.

2. SuperPMI

In this step, we run the SuperPMI pipeline over all the MCH files to get the asmdiffs with REX2 on and off. This lets us check for assertion failures or internal errors within the JIT. Since the pipeline also invokes coredistools.dll, it verifies the encoding correctness at a larger scale.

To ensure the new changes do not hurt the throughput of the existing code paths, we ran tpdiff with the base JIT built from the main branch that the changes are based on, and the diff JIT built with all the REX2 changes.

3. JIT unit tests

The two steps above mainly verify the encoding correctness of the generated binary code. The last step examines the semantic correctness of the generated code: since we simply force all compatible instructions to be encoded with REX2, the original semantics should not change, so we expect exactly the same output with REX2 on and off.

We used the existing CoreCLR JIT unit test set and ran it in the Intel SDE emulator.
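
The on/off comparison can be sketched as a tiny harness that runs the same command twice and compares exit code and output. This is a minimal illustration, not the test infrastructure used for this PR: the helper names are hypothetical, and the command passed in stands for whatever launches a test under the emulator.

```python
import os
import subprocess
import sys

STRESS_SWITCH = "DOTNET_JitStressRex2Encoding"  # switch named in the PR text


def run_with_rex2(cmd, stress):
    """Run cmd with the stress switch set to 1 or removed; return (exit code, stdout)."""
    env = dict(os.environ)
    if stress:
        env[STRESS_SWITCH] = "1"
    else:
        env.pop(STRESS_SWITCH, None)
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def same_behavior(cmd):
    """True when the command behaves identically with REX2 stress off and on."""
    return run_with_rex2(cmd, stress=False) == run_with_rex2(cmd, stress=True)


if __name__ == "__main__":
    # Placeholder command; a real run would launch a JIT test under the emulator.
    print(same_behavior([sys.executable, "-c", "print(42)"]))
```

Comparing both the exit code and the captured output keeps the check strict: any semantic change caused by a wrong encoding should surface as a differing result or a crash.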

Follow-up plans

This PR is only intended to provide REX2 encoding functionality in the JIT backend. To make proper use of it, we are preparing another PR with LSRA updates so that the JIT can allocate EGPRs only when needed and generate optimal code.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 16, 2024
@Ruihan-Yin
Contributor Author

Testing results

1. Emitter unit tests

[Screenshots: 1-1, 1-2, 1-3, 1-4]

2. SuperPMI

2.1 AsmDiffs - REX2 off (No diffs expected)

Diffs are based on 2,830,588 contexts (1,185,269 MinOpts, 1,645,319 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 11 (0.00%)

Overall (-100 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.windows.x64.checked.mch | 409,086,766 | -82 | -1.35% |
| libraries.pmi.windows.x64.checked.mch | 63,022,393 | +3 | 0.00% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,023,568 | -21 | 0.00% |

MinOpts (-40 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.windows.x64.checked.mch | 287,081,075 | -40 | -2.31% |

FullOpts (-60 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.windows.x64.checked.mch | 122,005,691 | -42 | 0.00% |
| libraries.pmi.windows.x64.checked.mch | 62,909,432 | +3 | 0.00% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,022,597 | -21 | 0.00% |

Example diffs
coreclr_tests.run.windows.x64.checked.mch
-3 (-50.00%) : 509186.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-3 (-50.00%) : 579101.dasm - Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M19947_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M19947_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M19947_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-32 (-35.16%) : 510272.dasm - IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
@@ -12,8 +12,8 @@
 ;* V01 loc1         [V01    ] (  0,  0   )     ref  ->  zero-ref    class-hnd <<unknown class>>
 ;  V02 OutArgs      [V02    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V03 tmp1         [V03    ] (  0,  0   )     int  ->  zero-ref   
-;  V04 tmp2         [V04,T01] (  2,  0   )     ref  ->  rdx         class-hnd single-def "impSpillSpecialSideEff" <<unknown class>>
-;  V05 tmp3         [V05,T02] (  2,  0   )     int  ->  [rbp-0x04]  do-not-enreg[M] EH-live
+;* V04 tmp2         [V04    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "impSpillSpecialSideEff" <<unknown class>>
+;* V05 tmp3         [V05    ] (  0,  0   )     int  ->  zero-ref   
 ;  V06 PSPSym       [V06,T00] (  1,  1   )    long  ->  [rbp-0x10]  do-not-enreg[V] "PSPSym"
 ;
 ; Lcl frame size = 48
@@ -30,41 +30,30 @@ G_M30609_IG02:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 G_M30609_IG03:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, dword ptr [rbp-0x04]
-						;; size=3 bbWeight=0 PerfScore 0.00
+       xor      eax, eax
+						;; size=2 bbWeight=0 PerfScore 0.00
 G_M30609_IG04:        ; bbWeight=0, epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
 						;; size=6 bbWeight=0 PerfScore 0.00
-G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref, funclet prolog, nogc
-       ; gcrRegs +[rdx]
+G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, funclet prolog, nogc
        push     rbp
        sub      rsp, 48
        mov      rbp, qword ptr [rcx+0x20]
        mov      qword ptr [rsp+0x20], rbp
        lea      rbp, [rbp+0x30]
 						;; size=18 bbWeight=0 PerfScore 0.00
-G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref
-       mov      rcx, 0xD1FFAB1E      ; <unknown class>
-       call     CORINFO_HELP_ISINSTANCEOFCLASS
-       ; gcrRegs -[rdx] +[rax]
-       ; gcr arg pop 0
-       xor      ecx, ecx
-       mov      edx, 100
-       test     rax, rax
-       cmovne   ecx, edx
-       mov      dword ptr [rbp-0x04], ecx
+G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        lea      rax, G_M30609_IG03
-       ; gcrRegs -[rax]
-						;; size=38 bbWeight=0 PerfScore 0.00
+						;; size=7 bbWeight=0 PerfScore 0.00
 G_M30609_IG07:        ; bbWeight=0, funclet epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 91, prolog size 14, PerfScore 0.00, instruction count 26, allocated bytes for code 91 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
+; Total bytes of code 59, prolog size 14, PerfScore 0.00, instruction count 19, allocated bytes for code 59 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
 ; ============================================================
 
 Unwind Info:
-1 (-0.04%) : 510371.dasm - IntelHardwareIntrinsicTest.General.Program:IsSupported() (FullOpts)
@@ -915,13 +915,13 @@ G_M58490_IG59:        ; bbWeight=0.50, gcrefRegs=0041 {rax rsi}, byrefRegs=0000
        ; gcrRegs +[rcx]
        call     [System.Convert:ToBoolean(System.Object):ubyte]
        ; gcrRegs -[rax rcx]
-       cmp      eax, 1
+       test     eax, eax
        jne      G_M58490_IG66
        xor      rcx, rcx
        ; gcrRegs +[rcx]
        mov      gword ptr [rsp+0x20], rcx
        mov      dword ptr [rsp+0x28], 3
-						;; size=59 bbWeight=0.50 PerfScore 7.88
+						;; size=58 bbWeight=0.50 PerfScore 7.88
 G_M58490_IG60:        ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rcx]
        mov      gword ptr [rsp+0x30], rcx
@@ -1041,7 +1041,7 @@ G_M58490_IG70:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 2296, prolog size 27, PerfScore 264.21, instruction count 525, allocated bytes for code 2296 (MethodHash=a9531b85) for method IntelHardwareIntrinsicTest.General.Program:IsSupported() (FullOpts)
+; Total bytes of code 2295, prolog size 27, PerfScore 264.21, instruction count 525, allocated bytes for code 2295 (MethodHash=a9531b85) for method IntelHardwareIntrinsicTest.General.Program:IsSupported() (FullOpts)
 ; ============================================================
 
 Unwind Info:
-3 (-0.03%) : 579023.dasm - Runtime_34587:TestEntryPoint():int (FullOpts)
@@ -2680,7 +2680,7 @@ G_M52152_IG121:        ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rs
 G_M52152_IG122:        ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
        ; byrRegs -[rsi]
        lea      rcx, [rsp+0x50]
-       mov      edx, 1
+       xor      edx, edx
        call     [<unknown method>]
        ; gcr arg pop 0
        lea      rcx, [rsp+0x50]
@@ -2703,7 +2703,7 @@ G_M52152_IG122:        ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {},
        mov      gword ptr [rsp+0x58], rax
        test     rax, rax
        je       G_M52152_IG205
-						;; size=71 bbWeight=1.00 PerfScore 17.50
+						;; size=68 bbWeight=1.00 PerfScore 17.50
 G_M52152_IG123:        ; bbWeight=0.50, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byref
        lea      rsi, bword ptr [rax+0x10]
        ; byrRegs +[rsi]
@@ -4142,7 +4142,7 @@ G_M52152_IG207:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=7 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 9099, prolog size 37, PerfScore 1583.63, instruction count 1959, allocated bytes for code 9099 (MethodHash=30563447) for method Runtime_34587:TestEntryPoint():int (FullOpts)
+; Total bytes of code 9096, prolog size 37, PerfScore 1583.63, instruction count 1959, allocated bytes for code 9096 (MethodHash=30563447) for method Runtime_34587:TestEntryPoint():int (FullOpts)
 ; ============================================================
 
 Unwind Info:
+4 (+1.99%) : 205245.dasm - IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (MinOpts)
@@ -39,14 +39,13 @@ G_M30609_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
 						;; size=5 bbWeight=0.50 PerfScore 0.50
 G_M30609_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        nop      
-       xor      eax, eax
-       mov      dword ptr [rbp-0x18], eax
+       mov      dword ptr [rbp-0x18], 1
        cmp      dword ptr [rbp-0x18], 0
        jne      SHORT G_M30609_IG05
        xor      eax, eax
        mov      dword ptr [rbp-0x1C], eax
        jmp      SHORT G_M30609_IG06
-						;; size=19 bbWeight=1 PerfScore 7.75
+						;; size=21 bbWeight=1 PerfScore 7.50
 G_M30609_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      dword ptr [rbp-0x1C], 100
 						;; size=7 bbWeight=1 PerfScore 1.00
@@ -87,14 +86,12 @@ G_M30609_IG11:        ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0004 {
        ; gcrRegs +[rax]
        mov      gword ptr [rbp-0x10], rax
        nop      
-       xor      eax, eax
-       ; gcrRegs -[rax]
-       mov      dword ptr [rbp-0x2C], eax
+       mov      dword ptr [rbp-0x2C], 1
        cmp      dword ptr [rbp-0x2C], 0
        jne      SHORT G_M30609_IG13
-						;; size=24 bbWeight=1 PerfScore 7.50
+						;; size=26 bbWeight=1 PerfScore 7.25
 G_M30609_IG12:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rdx]
+       ; gcrRegs -[rax rdx]
        mov      rdx, gword ptr [rbp-0x10]
        ; gcrRegs +[rdx]
        mov      rcx, 0xD1FFAB1E      ; <unknown class>
@@ -124,7 +121,7 @@ G_M30609_IG15:        ; bbWeight=0, funclet epilog, nogc, extend
        ret      
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 201, prolog size 28, PerfScore 43.58, instruction count 63, allocated bytes for code 201 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (MinOpts)
+; Total bytes of code 205, prolog size 28, PerfScore 43.08, instruction count 61, allocated bytes for code 205 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (MinOpts)
 ; ============================================================
 
 Unwind Info:
libraries.pmi.windows.x64.checked.mch
-3 (-50.00%) : 29280.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+3 (+100.00%) : 29884.dasm - System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M34763_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M34763_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M34763_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+3 (+100.00%) : 29207.dasm - System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M31227_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M31227_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M31227_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
smoke_tests.nativeaot.windows.x64.checked.mch
-3 (-50.00%) : 14296.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-3 (-50.00%) : 21532.dasm - Program:AesX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M55817_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M55817_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M55817_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-3 (-50.00%) : 19229.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=2 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+3 (+100.00%) : 19199.dasm - Program:AvxVnniX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M60430_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M60430_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M60430_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+3 (+100.00%) : 21515.dasm - Program:FmaX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M2260_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M2260_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M2260_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+3 (+100.00%) : 21526.dasm - Program:Avx2X64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M13187_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13187_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=5 bbWeight=1 PerfScore 0.25
 G_M13187_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
+; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| coreclr_tests.run.windows.x64.checked.mch | 12 | 11 | 1 | 0 | -86 | +4 |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x64.checked.mch | 3 | 1 | 2 | 0 | -3 | +6 |
| libraries_tests.run.windows.x64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| realworld.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 17 | 12 | 5 | 0 | -36 | +15 |
| Total | 32 | 24 | 8 | 0 | -125 | +25 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run_pgo.windows.x64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run_tiered.windows.x64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| coreclr_tests.run.windows.x64.checked.mch | 12 | 2 | 1 | 9 | -7.87% | +0.03% | 0.0000% |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| libraries.pmi.windows.x64.checked.mch | 3 | 0 | 0 | 3 | 0.00% | 0.00% | 0.0000% |
| libraries_tests.run.windows.x64.Release.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| realworld.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 17 | 0 | 0 | 17 | 0.00% | 0.00% | 0.0000% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 141,224 | 77,324 | 63,900 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run.windows.x64.checked.mch | 38,352 | 6 | 38,346 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 120,280 | 68,103 | 52,177 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 76,876 | 56,358 | 20,518 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x64.checked.mch | 642,813 | 393,776 | 249,037 | 0 (0.00%) | 5 (0.00%) |
| libraries.crossgen2.windows.x64.checked.mch | 276,889 | 15 | 276,874 | 0 (0.00%) | 2 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 316,010 | 6 | 316,004 | 0 (0.00%) | 1 (0.00%) |
| libraries_tests.run.windows.x64.Release.mch | 814,679 | 567,674 | 247,005 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 343,895 | 21,994 | 321,901 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x64.checked.mch | 28,368 | 3 | 28,365 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 31,202 | 10 | 31,192 | 0 (0.00%) | 3 (0.01%) |
| Total | 2,830,588 | 1,185,269 | 1,645,319 | 0 (0.00%) | 11 (0.00%) |

jit-analyze output

Comment:
SuperPMI pipeline with REX2 off:
In theory, the output should be clean compared with the base corerun. The diffs found here come from the changes in the ISA definition: as can be seen, every diff either changes xor eax, eax to mov eax, 1 or the reverse. This indicates that the runtime is reporting inconsistent ISA availability, which is expected to be resolved when the public CPUID PR is merged.

2.2 AsmDiffs - REX2 on

SuperPMI pipeline:

Diffs are based on 2,830,588 contexts (1,185,269 MinOpts, 1,645,319 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 11 (0.00%)

Diff JIT options: JitStressRex2Encoding=1

Overall (+243,564,575 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 49,406,065 +10,392,179 0.00%
benchmarks.run.windows.x64.checked.mch 12,230,572 +3,013,399 0.00%
benchmarks.run_pgo.windows.x64.checked.mch 40,192,955 +8,962,474 0.00%
benchmarks.run_tiered.windows.x64.checked.mch 17,606,620 +4,199,746 0.00%
coreclr_tests.run.windows.x64.checked.mch 409,086,766 +84,314,011 -0.00%
libraries.crossgen2.windows.x64.checked.mch 45,250,222 +11,739,139 0.00%
libraries.pmi.windows.x64.checked.mch 63,022,393 +15,125,725 0.00%
libraries_tests.run.windows.x64.Release.mch 336,307,360 +69,356,394 0.00%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 147,986,092 +32,521,941 0.00%
realworld.run.windows.x64.checked.mch 11,552,911 +2,545,126 0.00%
smoke_tests.nativeaot.windows.x64.checked.mch 5,023,568 +1,394,441 0.00%
MinOpts (+113,554,608 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 23,379,337 +4,569,510 0.00%
benchmarks.run.windows.x64.checked.mch 588 +163 0.00%
benchmarks.run_pgo.windows.x64.checked.mch 18,796,230 +4,022,239 0.00%
benchmarks.run_tiered.windows.x64.checked.mch 13,707,415 +3,160,967 0.00%
coreclr_tests.run.windows.x64.checked.mch 287,081,075 +59,383,662 -0.00%
libraries.crossgen2.windows.x64.checked.mch 1,705 +442 0.00%
libraries.pmi.windows.x64.checked.mch 112,961 +15,358 0.00%
libraries_tests.run.windows.x64.Release.mch 203,705,533 +39,847,511 0.00%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 10,696,900 +2,483,962 0.00%
realworld.run.windows.x64.checked.mch 412,968 +70,590 0.00%
smoke_tests.nativeaot.windows.x64.checked.mch 971 +204 0.00%
FullOpts (+130,009,967 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 26,026,728 | +5,822,669 | 0.00% |
| benchmarks.run.windows.x64.checked.mch | 12,229,984 | +3,013,236 | 0.00% |
| benchmarks.run_pgo.windows.x64.checked.mch | 21,396,725 | +4,940,235 | 0.00% |
| benchmarks.run_tiered.windows.x64.checked.mch | 3,899,205 | +1,038,779 | 0.00% |
| coreclr_tests.run.windows.x64.checked.mch | 122,005,691 | +24,930,349 | 0.00% |
| libraries.crossgen2.windows.x64.checked.mch | 45,248,517 | +11,738,697 | 0.00% |
| libraries.pmi.windows.x64.checked.mch | 62,909,432 | +15,110,367 | 0.00% |
| libraries_tests.run.windows.x64.Release.mch | 132,601,827 | +29,508,883 | 0.00% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 137,289,192 | +30,037,979 | 0.00% |
| realworld.run.windows.x64.checked.mch | 11,139,943 | +2,474,536 | 0.00% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,022,597 | +1,394,237 | 0.00% |
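As a sanity check on the MinOpts and FullOpts tables above, the per-collection diff sizes can be summed and compared against the totals in the two headings. The snippet below is only a sketch: the numbers are copied from the tables, and the helper names (`total`, `growth_percent`) are made up for illustration and are not part of SuperPMI.

```python
# Diff sizes in bytes, copied from the MinOpts and FullOpts tables.
MINOPTS_DIFFS = [
    4_569_510, 163, 4_022_239, 3_160_967, 59_383_662, 442,
    15_358, 39_847_511, 2_483_962, 70_590, 204,
]
FULLOPTS_DIFFS = [
    5_822_669, 3_013_236, 4_940_235, 1_038_779, 24_930_349, 11_738_697,
    15_110_367, 29_508_883, 30_037_979, 2_474_536, 1_394_237,
]


def total(diffs):
    """Sum of the per-collection size diffs, in bytes."""
    return sum(diffs)


def growth_percent(base_size, diff_size):
    """Relative code-size growth of one collection, in percent."""
    return 100.0 * diff_size / base_size


print(total(MINOPTS_DIFFS))   # 113554608, matches "MinOpts (+113,554,608 bytes)"
print(total(FULLOPTS_DIFFS))  # 130009967, matches "FullOpts (+130,009,967 bytes)"

# Example: aspnet MinOpts grows by roughly a fifth of its base size.
print(round(growth_percent(23_379_337, 4_569_510), 1))
```

The growth is large relative to the base sizes while PerfScore stays at 0.00%, which is what one would expect if the instructions are unchanged and only their encodings grow.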
Example diffs
aspnet.run.windows.x64.checked.mch
+2 (+0.97%) : 32079.dasm - Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
@@ -29,7 +29,7 @@ G_M40993_IG01:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        vmovaps  xmmword ptr [rsp+0x20], xmm12
        vmovaps  xmm6, xmm0
        vmovaps  xmm7, xmm1
-						;; size=60 bbWeight=0.25 PerfScore 3.69
+						;; size=61 bbWeight=0.25 PerfScore 3.69
 G_M40993_IG02:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vxorps   xmm8, xmm8, xmm8
        vmovsd   xmm9, qword ptr [reloc @RWD00]
@@ -72,13 +72,13 @@ G_M40993_IG08:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm12, xmmword ptr [rsp+0x20]
        add      rsp, 152
        ret      
-						;; size=53 bbWeight=1 PerfScore 29.25
+						;; size=54 bbWeight=1 PerfScore 29.25
 RWD00  	dq	408F400000000000h	;         1000
 RWD08  	dq	3FE0000000000000h	;          0.5
 RWD16  	dq	3E112E0BE826D695h	;        1e-09
 
 
-; Total bytes of code 207, prolog size 52, PerfScore 120.27, instruction count 38, allocated bytes for code 207 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
+; Total bytes of code 209, prolog size 53, PerfScore 120.27, instruction count 38, allocated bytes for code 209 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -86,24 +86,24 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x34
+  SizeOfProlog      : 0x35
   CountOfUnwindCodes: 16
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x34 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
+    CodeOffset: 0x35 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x2E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
+    CodeOffset: 0x2F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x28 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x29 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x23 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 8 * 16 = 128 = 0x00080
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 19 * 8 = 152 = 0x00098
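The unwind-info change in this diff can be checked mechanically: every `CodeOffset` moves by exactly one byte, and the spacing between consecutive XMM saves is unchanged. That pattern is what a single one-byte growth at the start of the prolog would produce. Attributing the growth to the stack-allocation instruction is an assumption here, since the hunk does not show that instruction. The dictionaries below are copied from the `UnwindCodes` lines above; the variable names are illustrative only.

```python
# CodeOffset values from the unwind info, before and after the change.
OLD_OFFSETS = {
    "ALLOC_LARGE": 0x07, "XMM6": 0x10, "XMM7": 0x16, "XMM8": 0x1C,
    "XMM9": 0x22, "XMM10": 0x28, "XMM11": 0x2E, "XMM12": 0x34,
}
NEW_OFFSETS = {
    "ALLOC_LARGE": 0x08, "XMM6": 0x11, "XMM7": 0x17, "XMM8": 0x1D,
    "XMM9": 0x23, "XMM10": 0x29, "XMM11": 0x2F, "XMM12": 0x35,
}

# Shift of each unwind code offset.
shifts = {op: NEW_OFFSETS[op] - OLD_OFFSETS[op] for op in OLD_OFFSETS}


def gaps(offsets):
    """Distances between consecutive unwind code offsets."""
    values = list(offsets.values())
    return [b - a for a, b in zip(values, values[1:])]


print(set(shifts.values()))                     # {1}
print(gaps(OLD_OFFSETS) == gaps(NEW_OFFSETS))   # True
# The last offset equals the reported prolog size (52 before, 53 after).
print(OLD_OFFSETS["XMM12"], NEW_OFFSETS["XMM12"])  # 52 53
```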
+2 (+0.97%) : 66029.dasm - Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
@@ -30,7 +30,7 @@ G_M40993_IG01:        ; bbWeight=0.98, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        vmovaps  xmmword ptr [rsp+0x20], xmm12
        vmovaps  xmm6, xmm0
        vmovaps  xmm7, xmm1
-						;; size=60 bbWeight=0.98 PerfScore 14.39
+						;; size=61 bbWeight=0.98 PerfScore 14.39
 G_M40993_IG02:        ; bbWeight=0.98, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vxorps   xmm8, xmm8, xmm8
        vmovsd   xmm9, qword ptr [reloc @RWD00]
@@ -69,7 +69,7 @@ G_M40993_IG07:        ; bbWeight=1.00, epilog, nogc, extend
        vmovaps  xmm12, xmmword ptr [rsp+0x20]
        add      rsp, 152
        ret      
-						;; size=53 bbWeight=1.00 PerfScore 29.25
+						;; size=54 bbWeight=1.00 PerfScore 29.25
 G_M40993_IG08:        ; bbWeight=15.73, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
        vmovaps  xmm8, xmm12
        jmp      SHORT G_M40993_IG05
@@ -79,7 +79,7 @@ RWD08  	dq	3FE0000000000000h	;          0.5
 RWD16  	dq	3E112E0BE826D695h	;        1e-09
 
 
-; Total bytes of code 207, prolog size 52, PerfScore 840.20, instruction count 38, allocated bytes for code 207 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
+; Total bytes of code 209, prolog size 53, PerfScore 840.20, instruction count 38, allocated bytes for code 209 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -87,24 +87,24 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x34
+  SizeOfProlog      : 0x35
   CountOfUnwindCodes: 16
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x34 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
+    CodeOffset: 0x35 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x2E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
+    CodeOffset: 0x2F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x28 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x29 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x23 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 8 * 16 = 128 = 0x00080
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 19 * 8 = 152 = 0x00098
+2 (+0.97%) : 60922.dasm - Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (FullOpts)
@@ -29,7 +29,7 @@ G_M40993_IG01:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        vmovaps  xmmword ptr [rsp+0x20], xmm12
        vmovaps  xmm6, xmm0
        vmovaps  xmm7, xmm1
-						;; size=60 bbWeight=0.25 PerfScore 3.69
+						;; size=61 bbWeight=0.25 PerfScore 3.69
 G_M40993_IG02:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vxorps   xmm8, xmm8, xmm8
        vmovsd   xmm9, qword ptr [reloc @RWD00]
@@ -72,13 +72,13 @@ G_M40993_IG08:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm12, xmmword ptr [rsp+0x20]
        add      rsp, 152
        ret      
-						;; size=53 bbWeight=1 PerfScore 29.25
+						;; size=54 bbWeight=1 PerfScore 29.25
 RWD00  	dq	408F400000000000h	;         1000
 RWD08  	dq	3FE0000000000000h	;          0.5
 RWD16  	dq	3E112E0BE826D695h	;        1e-09
 
 
-; Total bytes of code 207, prolog size 52, PerfScore 120.27, instruction count 38, allocated bytes for code 207 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (FullOpts)
+; Total bytes of code 209, prolog size 53, PerfScore 120.27, instruction count 38, allocated bytes for code 209 (MethodHash=c7795fde) for method Perfolizer.Mathematics.Distributions.StudentDistribution:InverseTwoTailStudent(double,double):double (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -86,24 +86,24 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x34
+  SizeOfProlog      : 0x35
   CountOfUnwindCodes: 16
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x34 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
+    CodeOffset: 0x35 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM12 (12)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x2E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
+    CodeOffset: 0x2F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM11 (11)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x28 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x29 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x23 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 8 * 16 = 128 = 0x00080
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 19 * 8 = 152 = 0x00098
+7 (+87.50%) : 23755.dasm - System.Runtime.Intrinsics.X86.X86Serialize:get_IsSupported():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M10906_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M10906_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M10906_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=da05d565) for method System.Runtime.Intrinsics.X86.X86Serialize:get_IsSupported():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=da05d565) for method System.Runtime.Intrinsics.X86.X86Serialize:get_IsSupported():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 9608.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[System.Text.Json.JsonDocument+DbRow]():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M31768_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M31768_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M31768_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=2e9583e7) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[System.Text.Json.JsonDocument+DbRow]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=2e9583e7) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[System.Text.Json.JsonDocument+DbRow]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 861.dasm - System.Buffers.IndexOfAnyAsciiSearcher+ContainsAnyResultMapper`1[short]:get_NotFound():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M16088_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M16088_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M16088_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=da12c127) for method System.Buffers.IndexOfAnyAsciiSearcher+ContainsAnyResultMapper`1[short]:get_NotFound():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=da12c127) for method System.Buffers.IndexOfAnyAsciiSearcher+ContainsAnyResultMapper`1[short]:get_NotFound():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
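The +7 byte growth of the Tier0 stubs above (8 to 15 bytes) is consistent with a simple size model, sketched below under stated assumptions: the diffs were presumably collected with REX2 forced on every eligible instruction, an instruction that already carries a 1-byte REX prefix swaps it for the 2-byte REX2 prefix (+1 byte), an instruction without REX gains the full prefix (+2 bytes), and `ret` is left untouched. The `Insn` type, the `eligible` flag, and the per-instruction base sizes are illustrative and are not taken from the emitter.

```python
from dataclasses import dataclass

REX_PREFIX_BYTES = 1   # legacy REX prefix
REX2_PREFIX_BYTES = 2  # 0xD5 plus one payload byte


@dataclass(frozen=True)
class Insn:
    text: str
    size: int       # encoded size before the change, in bytes
    has_rex: bool   # already carries a 1-byte REX prefix
    eligible: bool  # assumed to be re-encoded with REX2


def rex2_size(insn: Insn) -> int:
    """Size after forcing REX2: REX2 replaces REX, it never stacks with it."""
    if not insn.eligible:
        return insn.size
    dropped = REX_PREFIX_BYTES if insn.has_rex else 0
    return insn.size - dropped + REX2_PREFIX_BYTES


# The five instructions of the Tier0 stub shown in the diffs.
TIER0_STUB = [
    Insn("push rbp", 1, has_rex=False, eligible=True),
    Insn("mov rbp, rsp", 3, has_rex=True, eligible=True),   # REX.W form
    Insn("xor eax, eax", 2, has_rex=False, eligible=True),
    Insn("pop rbp", 1, has_rex=False, eligible=True),
    Insn("ret", 1, has_rex=False, eligible=False),
]

before = sum(i.size for i in TIER0_STUB)
after = sum(rex2_size(i) for i in TIER0_STUB)
prolog_after = sum(rex2_size(i) for i in TIER0_STUB[:2])

print(before, after)   # 8 15
print(prolog_after)    # 7, matching "prolog size 7"
```

The same model also accounts for `SizeOfProlog` moving from 0x01 to 0x03: `push rbp` alone grows from 1 to 3 bytes.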
benchmarks.run.windows.x64.checked.mch
+1 (+0.14%) : 30038.dasm - System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
@@ -199,14 +199,14 @@ G_M8955_IG03:        ; bbWeight=1, extend
        vmovups  xmmword ptr [rcx+0x30], xmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=354 bbWeight=1 PerfScore 161.25
+						;; size=355 bbWeight=1 PerfScore 161.25
 G_M8955_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 RWD00  	dd	3F800000h		;         1
 
 
-; Total bytes of code 693, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 693 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
+; Total bytes of code 694, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 694 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
 ; ============================================================
 
 Unwind Info:
+1 (+0.20%) : 21351.dasm - System.Numerics.Quaternion:CreateFromRotationMatrix(System.Numerics.Matrix4x4):System.Numerics.Quaternion (FullOpts)
@@ -140,7 +140,7 @@ G_M15800_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vmovups  xmmword ptr [rcx], xmm4
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=7 bbWeight=1 PerfScore 2.25
+						;; size=8 bbWeight=1 PerfScore 2.25
 G_M15800_IG08:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -148,7 +148,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	3F000000h		;       0.5
 
 
-; Total bytes of code 494, prolog size 0, PerfScore 173.42, instruction count 93, allocated bytes for code 494 (MethodHash=096ec247) for method System.Numerics.Quaternion:CreateFromRotationMatrix(System.Numerics.Matrix4x4):System.Numerics.Quaternion (FullOpts)
+; Total bytes of code 495, prolog size 0, PerfScore 173.42, instruction count 93, allocated bytes for code 495 (MethodHash=096ec247) for method System.Numerics.Quaternion:CreateFromRotationMatrix(System.Numerics.Matrix4x4):System.Numerics.Quaternion (FullOpts)
 ; ============================================================
 
 Unwind Info:
+1 (+0.23%) : 33197.dasm - System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[uint],System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
@@ -140,7 +140,7 @@ G_M41960_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vmovups  zmmword ptr [rcx], zmm1
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=9 bbWeight=1 PerfScore 2.25
+						;; size=10 bbWeight=1 PerfScore 2.25
 G_M41960_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -163,7 +163,7 @@ RWD588 	dd	00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
 RWD640 	dq	7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h
 
 
-; Total bytes of code 438, prolog size 0, PerfScore 166.17, instruction count 66, allocated bytes for code 441 (MethodHash=326b5c17) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[uint],System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 439, prolog size 0, PerfScore 166.17, instruction count 66, allocated bytes for code 442 (MethodHash=326b5c17) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[uint],System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
 ; ============================================================
 
 Unwind Info:
+4 (+80.00%) : 32809.dasm - System.Linq.Tests.Perf_Enumerable+<>c:<OrderByThenBy>b__25_1(int):int:this (FullOpts)
@@ -18,12 +18,12 @@ G_M1177_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M1177_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        neg      eax
-						;; size=4 bbWeight=1 PerfScore 0.50
+						;; size=8 bbWeight=1 PerfScore 0.50
 G_M1177_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 5 (MethodHash=bea0fb66) for method System.Linq.Tests.Perf_Enumerable+<>c:<OrderByThenBy>b__25_1(int):int:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 9 (MethodHash=bea0fb66) for method System.Linq.Tests.Perf_Enumerable+<>c:<OrderByThenBy>b__25_1(int):int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
+4 (+80.00%) : 38257.dasm - Microsoft.Extensions.Primitives.StringSegmentBenchmark:Equals_Object_Invalid():ubyte:this (FullOpts)
@@ -98,12 +98,12 @@ G_M24192_IG02:        ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byr
        ; gcrRegs +[rcx]
        cmp      byte  ptr [rcx], cl
        xor      eax, eax
-						;; size=4 bbWeight=1 PerfScore 3.25
+						;; size=8 bbWeight=1 PerfScore 3.25
 G_M24192_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 4.25, instruction count 3, allocated bytes for code 5 (MethodHash=7f17a17f) for method Microsoft.Extensions.Primitives.StringSegmentBenchmark:Equals_Object_Invalid():ubyte:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 4.25, instruction count 3, allocated bytes for code 9 (MethodHash=7f17a17f) for method Microsoft.Extensions.Primitives.StringSegmentBenchmark:Equals_Object_Invalid():ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
+4 (+80.00%) : 22418.dasm - System.Xml.XmlDictionaryReader:TryGetArrayLength(byref):ubyte:this (FullOpts)
@@ -19,13 +19,13 @@ G_M47878_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        ; byrRegs +[rdx]
        xor      eax, eax
        mov      dword ptr [rdx], eax
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=8 bbWeight=1 PerfScore 1.25
 G_M47878_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
        ; byrRegs -[rdx]
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 5 (MethodHash=068744f9) for method System.Xml.XmlDictionaryReader:TryGetArrayLength(byref):ubyte:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 9 (MethodHash=068744f9) for method System.Xml.XmlDictionaryReader:TryGetArrayLength(byref):ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
benchmarks.run_pgo.windows.x64.checked.mch
+19 (+0.74%) : 95347.dasm - System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
@@ -18,7 +18,7 @@ G_M9309_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        mov      bword ptr [rbp+0x10], rcx
        mov      bword ptr [rbp+0x18], rdx
        mov      dword ptr [rbp+0x20], r8d
-						;; size=16 bbWeight=1 PerfScore 4.25
+						;; size=22 bbWeight=1 PerfScore 4.25
 G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      rax, bword ptr [rbp+0x18]
        ; byrRegs +[rax]
@@ -31,7 +31,7 @@ G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        lea      rcx, G_M9309_IG02
        add      rdx, rcx
        jmp      rdx
-						;; size=36 bbWeight=1 PerfScore 12.00
+						;; size=45 bbWeight=1 PerfScore 12.00
 G_M9309_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vpsrldq  xmm0, xmm0, 0
        jmp      G_M9309_IG259
@@ -1061,11 +1061,11 @@ G_M9309_IG259:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        ; byrRegs +[rax]
        vmovups  xmmword ptr [rax], xmm0
        mov      rax, bword ptr [rbp+0x10]
-						;; size=12 bbWeight=1 PerfScore 4.00
+						;; size=14 bbWeight=1 PerfScore 4.00
 G_M9309_IG260:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG04 - G_M9309_IG02
        	dd	G_M9309_IG05 - G_M9309_IG02
@@ -1324,7 +1324,7 @@ RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG258 - G_M9309_IG02
 
 
-; Total bytes of code 2572, prolog size 4, PerfScore 789.75, instruction count 531, allocated bytes for code 2572 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
+; Total bytes of code 2591, prolog size 7, PerfScore 789.75, instruction count 531, allocated bytes for code 2591 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -1332,9 +1332,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
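For the larger `ShiftRightLogical128BitLane` diff above, the per-group size changes shown in the hunks can be reconciled with the method total. The sketch below assumes that the instruction groups not shown in the hunks (IG03 through IG258) are unchanged, which the totals support. The tuple layout and names are illustrative only.

```python
# (instruction group, size before, size after), copied from the ";; size=" lines.
CHANGED_GROUPS = [
    ("IG01", 16, 22),
    ("IG02", 36, 45),
    ("IG259", 12, 14),
    ("IG260", 2, 4),
]
TOTAL_BEFORE, TOTAL_AFTER = 2572, 2591  # from "Total bytes of code"

# Sum of the per-group growth versus the growth of the whole method.
group_delta = sum(after - before for _, before, after in CHANGED_GROUPS)
method_delta = TOTAL_AFTER - TOTAL_BEFORE

print(group_delta, method_delta)  # 19 19
```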
+1 (+0.85%) : 74356.dasm - Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
@@ -75,7 +75,7 @@ G_M56301_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        vextractps dword ptr [rdx+0x08], xmm0, 2
        mov      rax, rdx
        ; byrRegs +[rax]
-						;; size=31 bbWeight=1 PerfScore 18.25
+						;; size=32 bbWeight=1 PerfScore 18.25
 G_M56301_IG05:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -88,7 +88,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	7F800000h		;       inf
 
 
-; Total bytes of code 118, prolog size 0, PerfScore 86.58, instruction count 26, allocated bytes for code 118 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
+; Total bytes of code 119, prolog size 0, PerfScore 86.58, instruction count 26, allocated bytes for code 119 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
 ; ============================================================
 
 Unwind Info:
+1 (+0.99%) : 74345.dasm - Benchmarks.SIMD.RayTracer.Vector:Norm(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector (Tier1)
@@ -60,7 +60,7 @@ G_M16924_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vextractps dword ptr [rcx+0x08], xmm0, 2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=31 bbWeight=1 PerfScore 18.25
+						;; size=32 bbWeight=1 PerfScore 18.25
 G_M16924_IG05:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -73,7 +73,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	7F800000h		;       inf
 
 
-; Total bytes of code 101, prolog size 0, PerfScore 76.58, instruction count 23, allocated bytes for code 101 (MethodHash=5eedbde3) for method Benchmarks.SIMD.RayTracer.Vector:Norm(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector (Tier1)
+; Total bytes of code 102, prolog size 0, PerfScore 76.58, instruction count 23, allocated bytes for code 102 (MethodHash=5eedbde3) for method Benchmarks.SIMD.RayTracer.Vector:Norm(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector (Tier1)
 ; ============================================================
 
 Unwind Info:
+7 (+87.50%) : 1151.dasm - System.OperatingSystem:IsBrowser():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M61665_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M61665_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M61665_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=f0b70f1e) for method System.OperatingSystem:IsBrowser():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=f0b70f1e) for method System.OperatingSystem:IsBrowser():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 76175.dasm - Microsoft.CodeAnalysis.Collections.Internal.RoslynUnsafe:NullRef[int]():byref (Tier0)
@@ -12,16 +12,16 @@
 G_M47256_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M47256_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M47256_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=8e824767) for method Microsoft.CodeAnalysis.Collections.Internal.RoslynUnsafe:NullRef[int]():byref (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=8e824767) for method Microsoft.CodeAnalysis.Collections.Internal.RoslynUnsafe:NullRef[int]():byref (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
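The +7 byte growth in these Tier0 stubs can be reproduced with simple arithmetic. The sketch below assumes the usual baseline encodings (`push rbp` and `pop rbp` as one byte, `mov rbp, rsp` as three bytes including a REX.W prefix, `xor eax, eax` as two bytes, `ret` as one byte). It also assumes that under `DOTNET_JitStressRex2Encoding` each eligible instruction gains the 2-byte REX2 prefix, which replaces the 1-byte REX prefix where one was present. The `STUB` table and the `rex2_size` helper are hypothetical names used only for this illustration; which instructions are eligible is inferred from the sizes in the listing, not read from the JIT sources.

```python
# Illustrative model only: sizes and eligibility are assumptions inferred
# from the listing above, not values read from the JIT.
# (mnemonic, group, baseline size in bytes, has REX prefix, REX2-eligible)
STUB = [
    ("push rbp",     "prolog", 1, False, True),
    ("mov rbp, rsp", "prolog", 3, True,  True),
    ("xor eax, eax", "body",   2, False, True),
    ("pop rbp",      "epilog", 1, False, True),
    ("ret",          "epilog", 1, False, False),
]


def rex2_size(size, has_rex, eligible):
    """Size after forcing REX2: +2 bytes, or +1 when REX2 replaces REX."""
    if not eligible:
        return size
    return size + (1 if has_rex else 2)


def group_sizes(stub, stressed):
    """Sum instruction sizes per instruction group."""
    totals = {}
    for _, group, size, has_rex, eligible in stub:
        if stressed:
            size = rex2_size(size, has_rex, eligible)
        totals[group] = totals.get(group, 0) + size
    return totals


base = group_sizes(STUB, stressed=False)
diff = group_sizes(STUB, stressed=True)
print(base, sum(base.values()))  # {'prolog': 4, 'body': 2, 'epilog': 2} 8
print(diff, sum(diff.values()))  # {'prolog': 7, 'body': 4, 'epilog': 4} 15
```

Under these assumptions the per-group sizes match the `size=` annotations in the diff (4 to 7, 2 to 4, 2 to 4) and the total matches 8 to 15, which is the reported +87.50%.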
+7 (+87.50%) : 27756.dasm - System.SByte:System.Numerics.INumberBase<System.SByte>.get_Zero():byte (Tier0)
@@ -12,16 +12,16 @@
 G_M14356_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M14356_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M14356_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=1d31c7eb) for method System.SByte:System.Numerics.INumberBase<System.SByte>.get_Zero():byte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=1d31c7eb) for method System.SByte:System.Numerics.INumberBase<System.SByte>.get_Zero():byte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
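The `SizeOfProlog` change from 0x01 to 0x03 is consistent with `push rbp` now carrying a 2-byte REX2 prefix. Here is a minimal sketch of how such a prefix could be assembled, assuming the payload layout given in the APX specification (from bit 7 down to bit 0: M0, R4, X4, B4, W, R3, X3, B3) and assuming the opcode bytes `55` for `push rbp` and `8B EC` for `mov rbp, rsp`. `rex2_prefix` is a hypothetical helper for illustration, not a function from the JIT emitter.

```python
def rex2_prefix(w=0, r=0, x=0, b=0, map1=False):
    """Build the 2-byte REX2 prefix (0xD5 followed by a payload byte).

    r, x and b hold the two extension bits for ModRM.reg, SIB.index and
    ModRM.rm/SIB.base: bit 0 is R3/X3/B3 and bit 1 is R4/X4/B4.
    map1 selects legacy map 1 (the 0x0F escape) through the M0 bit.
    """
    payload = (
        (int(map1) << 7)
        | (((r >> 1) & 1) << 6)
        | (((x >> 1) & 1) << 5)
        | (((b >> 1) & 1) << 4)
        | ((w & 1) << 3)
        | ((r & 1) << 2)
        | ((x & 1) << 1)
        | (b & 1)
    )
    return bytes([0xD5, payload])


# No extended registers are involved, so only W is ever set here.
push_rbp = rex2_prefix() + bytes([0x55])
mov_rbp_rsp = rex2_prefix(w=1) + bytes([0x8B, 0xEC])

print(push_rbp.hex(" "))     # d5 00 55
print(mov_rbp_rsp.hex(" "))  # d5 08 8b ec
print(len(push_rbp), len(push_rbp) + len(mov_rbp_rsp))  # 3 7
```

The 3-byte `push rbp` gives the new unwind offset 0x03, and the two instructions together give the 7-byte prolog. Once an EGPR such as `r16` appears in ModRM.rm, the same helper would be called with `b=0b10`, which sets B4 and yields the payload byte 0x10.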
benchmarks.run_tiered.windows.x64.checked.mch
+1 (+0.26%) : 61218.dasm - System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[ulong]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
@@ -139,7 +139,7 @@ G_M61896_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        vmovups  ymmword ptr [rcx], ymm1
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=7 bbWeight=1 PerfScore 2.25
+						;; size=8 bbWeight=1 PerfScore 2.25
 G_M61896_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -161,7 +161,7 @@ RWD352 	dq	42B1721842B17218h, 42B1721842B17218h, 42B1721842B17218h, 42B1721842B1
 RWD384 	dq	7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h, 7F8000007F800000h
 
 
-; Total bytes of code 378, prolog size 0, PerfScore 188.17, instruction count 65, allocated bytes for code 380 (MethodHash=a8e80e37) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[ulong]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
+; Total bytes of code 379, prolog size 0, PerfScore 188.17, instruction count 65, allocated bytes for code 381 (MethodHash=a8e80e37) for method System.Runtime.Intrinsics.VectorMath:ExpSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[ulong]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
 ; ============================================================
 
 Unwind Info:
+19 (+0.74%) : 59614.dasm - System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
@@ -18,7 +18,7 @@ G_M9309_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        mov      bword ptr [rbp+0x10], rcx
        mov      bword ptr [rbp+0x18], rdx
        mov      dword ptr [rbp+0x20], r8d
-						;; size=16 bbWeight=1 PerfScore 4.25
+						;; size=22 bbWeight=1 PerfScore 4.25
 G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      rax, bword ptr [rbp+0x18]
        ; byrRegs +[rax]
@@ -31,7 +31,7 @@ G_M9309_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        lea      rcx, G_M9309_IG02
        add      rdx, rcx
        jmp      rdx
-						;; size=36 bbWeight=1 PerfScore 12.00
+						;; size=45 bbWeight=1 PerfScore 12.00
 G_M9309_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vpsrldq  xmm0, xmm0, 0
        jmp      G_M9309_IG259
@@ -1061,11 +1061,11 @@ G_M9309_IG259:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        ; byrRegs +[rax]
        vmovups  xmmword ptr [rax], xmm0
        mov      rax, bword ptr [rbp+0x10]
-						;; size=12 bbWeight=1 PerfScore 4.00
+						;; size=14 bbWeight=1 PerfScore 4.00
 G_M9309_IG260:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG04 - G_M9309_IG02
        	dd	G_M9309_IG05 - G_M9309_IG02
@@ -1324,7 +1324,7 @@ RWD00  	dd	G_M9309_IG03 - G_M9309_IG02
        	dd	G_M9309_IG258 - G_M9309_IG02
 
 
-; Total bytes of code 2572, prolog size 4, PerfScore 789.75, instruction count 531, allocated bytes for code 2572 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
+; Total bytes of code 2591, prolog size 7, PerfScore 789.75, instruction count 531, allocated bytes for code 2591 (MethodHash=c6eadba2) for method System.Runtime.Intrinsics.X86.Sse2:ShiftRightLogical128BitLane(System.Runtime.Intrinsics.Vector128`1[ulong],ubyte):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -1332,9 +1332,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+1 (+0.85%) : 56215.dasm - Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
@@ -77,7 +77,7 @@ G_M56301_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        vextractps dword ptr [rdx+0x08], xmm0, 2
        mov      rax, rdx
        ; byrRegs +[rax]
-						;; size=31 bbWeight=1 PerfScore 18.25
+						;; size=32 bbWeight=1 PerfScore 18.25
 G_M56301_IG06:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
@@ -85,7 +85,7 @@ RWD00  	dd	3F800000h		;         1
 RWD04  	dd	7F800000h		;       inf
 
 
-; Total bytes of code 118, prolog size 0, PerfScore 82.58, instruction count 26, allocated bytes for code 118 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
+; Total bytes of code 119, prolog size 0, PerfScore 82.58, instruction count 26, allocated bytes for code 119 (MethodHash=3e632412) for method Benchmarks.SIMD.RayTracer.Sphere:Normal(Benchmarks.SIMD.RayTracer.Vector):Benchmarks.SIMD.RayTracer.Vector:this (Tier1)
 ; ============================================================
 
 Unwind Info:
+7 (+87.50%) : 12704.dasm - System.UInt16:System.Numerics.INumberBase<System.UInt16>.get_Zero():ushort (Tier0)
@@ -12,16 +12,16 @@
 G_M3961_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M3961_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M3961_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=985cf086) for method System.UInt16:System.Numerics.INumberBase<System.UInt16>.get_Zero():ushort (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=985cf086) for method System.UInt16:System.Numerics.INumberBase<System.UInt16>.get_Zero():ushort (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 73294.dasm - System.Byte:System.Numerics.INumberBase<System.Byte>.get_Zero():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M54785_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M54785_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M54785_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=e45a29fe) for method System.Byte:System.Numerics.INumberBase<System.Byte>.get_Zero():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=e45a29fe) for method System.Byte:System.Numerics.INumberBase<System.Byte>.get_Zero():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 15012.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Reflection.Emit.OpCode]():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M969_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M969_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M969_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=f362fc36) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Reflection.Emit.OpCode]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=f362fc36) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Reflection.Emit.OpCode]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
coreclr_tests.run.windows.x64.checked.mch
-1 (-16.67%) : 509186.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-1 (-16.67%) : 579101.dasm - Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M19947_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M19947_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M19947_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=801bb214) for method Runtime_34587:get_PopcntX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-12 (-13.19%) : 510272.dasm - IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
@@ -12,8 +12,8 @@
 ;* V01 loc1         [V01    ] (  0,  0   )     ref  ->  zero-ref    class-hnd <<unknown class>>
 ;  V02 OutArgs      [V02    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V03 tmp1         [V03    ] (  0,  0   )     int  ->  zero-ref   
-;  V04 tmp2         [V04,T01] (  2,  0   )     ref  ->  rdx         class-hnd single-def "impSpillSpecialSideEff" <<unknown class>>
-;  V05 tmp3         [V05,T02] (  2,  0   )     int  ->  [rbp-0x04]  do-not-enreg[M] EH-live
+;* V04 tmp2         [V04    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "impSpillSpecialSideEff" <<unknown class>>
+;* V05 tmp3         [V05    ] (  0,  0   )     int  ->  zero-ref   
 ;  V06 PSPSym       [V06,T00] (  1,  1   )    long  ->  [rbp-0x10]  do-not-enreg[V] "PSPSym"
 ;
 ; Lcl frame size = 48
@@ -23,48 +23,37 @@ G_M30609_IG01:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        sub      rsp, 48
        lea      rbp, [rsp+0x30]
        mov      qword ptr [rbp-0x10], rsp
-						;; size=14 bbWeight=0 PerfScore 0.00
+						;; size=19 bbWeight=0 PerfScore 0.00
 G_M30609_IG02:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        call     CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED
        ; gcr arg pop 0
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 G_M30609_IG03:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, dword ptr [rbp-0x04]
-						;; size=3 bbWeight=0 PerfScore 0.00
+       xor      eax, eax
+						;; size=4 bbWeight=0 PerfScore 0.00
 G_M30609_IG04:        ; bbWeight=0, epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
-						;; size=6 bbWeight=0 PerfScore 0.00
-G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref, funclet prolog, nogc
-       ; gcrRegs +[rdx]
+						;; size=9 bbWeight=0 PerfScore 0.00
+G_M30609_IG05:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, funclet prolog, nogc
        push     rbp
        sub      rsp, 48
        mov      rbp, qword ptr [rcx+0x20]
        mov      qword ptr [rsp+0x20], rbp
        lea      rbp, [rbp+0x30]
-						;; size=18 bbWeight=0 PerfScore 0.00
-G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref
-       mov      rcx, 0xD1FFAB1E      ; <unknown class>
-       call     CORINFO_HELP_ISINSTANCEOFCLASS
-       ; gcrRegs -[rdx] +[rax]
-       ; gcr arg pop 0
-       xor      ecx, ecx
-       mov      edx, 100
-       test     rax, rax
-       cmovne   ecx, edx
-       mov      dword ptr [rbp-0x04], ecx
+						;; size=24 bbWeight=0 PerfScore 0.00
+G_M30609_IG06:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        lea      rax, G_M30609_IG03
-       ; gcrRegs -[rax]
-						;; size=38 bbWeight=0 PerfScore 0.00
+						;; size=8 bbWeight=0 PerfScore 0.00
 G_M30609_IG07:        ; bbWeight=0, funclet epilog, nogc, extend
        add      rsp, 48
        pop      rbp
        ret      
-						;; size=6 bbWeight=0 PerfScore 0.00
+						;; size=9 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 91, prolog size 14, PerfScore 0.00, instruction count 26, allocated bytes for code 91 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
+; Total bytes of code 79, prolog size 19, PerfScore 0.00, instruction count 19, allocated bytes for code 79 (MethodHash=3926886e) for method IntelHardwareIntrinsicTest.Program:TestEntryPoint():int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -72,25 +61,25 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x05
+  SizeOfProlog      : 0x08
   CountOfUnwindCodes: 2
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
 Unwind Info:
   >> Start offset   : 0xd1ffab1e (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x05
+  SizeOfProlog      : 0x08
   CountOfUnwindCodes: 2
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
 *************** EH table for IntelHardwareIntrinsicTest.Program:TestEntryPoint():int
 1 EH table entries, 0 duplicate clauses, 0 cloned finallys, 1 total EH entries reported to VM
 EH#0: try [G_M30609_IG02..G_M30609_IG03) handled by [G_M30609_IG05..END) (class: 100000B)
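Each unwind `CodeOffset` is the offset of the end of the prolog instruction it describes, so the shift from 0x01/0x05 to 0x03/0x08 and the prolog growth from 14 to 19 bytes can be checked by accumulating instruction sizes. In the sketch below the baseline sizes are assumptions based on the usual encodings of the four main-prolog instructions (the last three each carry a REX.W prefix), and the REX2 rule (+2 bytes without REX, +1 byte when REX2 replaces REX) is inferred from the listing. `PROLOG` and `end_offsets` are hypothetical names.

```python
from itertools import accumulate

# (mnemonic, baseline size in bytes, has REX prefix); assumed encodings
PROLOG = [
    ("push rbp",                      1, False),
    ("sub rsp, 48",                   4, True),
    ("lea rbp, [rsp+0x30]",           5, True),
    ("mov qword ptr [rbp-0x10], rsp", 4, True),
]


def end_offsets(prolog, stressed):
    """Offset of the end of each instruction, i.e. the running byte count."""
    sizes = [
        size + ((1 if has_rex else 2) if stressed else 0)
        for _, size, has_rex in prolog
    ]
    return list(accumulate(sizes))


base = end_offsets(PROLOG, stressed=False)
diff = end_offsets(PROLOG, stressed=True)
print([hex(o) for o in base])  # ['0x1', '0x5', '0xa', '0xe']
print([hex(o) for o in diff])  # ['0x3', '0x8', '0xe', '0x13']
```

The first two entries of each list correspond to the `UWOP_PUSH_NONVOL` and `UWOP_ALLOC_SMALL` offsets, and the last entry is the prolog size (14 before, 19 after).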
+7 (+87.50%) : 149326.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[double]():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M50835_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M50835_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M50835_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=94b8396c) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[double]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=94b8396c) for method System.Runtime.CompilerServices.RuntimeHelpers:IsReferenceOrContainsReferences[double]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 270935.dasm - Benchstone.BenchF.LLoops:Clock():int (Instrumented Tier0)
@@ -12,16 +12,16 @@
 G_M63398_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M63398_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M63398_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=e4d00859) for method Benchstone.BenchF.LLoops:Clock():int (Instrumented Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=e4d00859) for method Benchstone.BenchF.LLoops:Clock():int (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+10 (+90.91%) : 526916.dasm - BringUpTest_NotAndNeg:NotAndNeg(int,int):int (FullOpts)
@@ -21,12 +21,12 @@ G_M23640_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      ecx, edx
        not      ecx
        xor      eax, ecx
-						;; size=10 bbWeight=1 PerfScore 1.25
+						;; size=20 bbWeight=1 PerfScore 1.25
 G_M23640_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 11, prolog size 0, PerfScore 2.25, instruction count 6, allocated bytes for code 11 (MethodHash=81d2a3a7) for method BringUpTest_NotAndNeg:NotAndNeg(int,int):int (FullOpts)
+; Total bytes of code 21, prolog size 0, PerfScore 2.25, instruction count 6, allocated bytes for code 21 (MethodHash=81d2a3a7) for method BringUpTest_NotAndNeg:NotAndNeg(int,int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
libraries.crossgen2.windows.x64.checked.mch
+2 (+0.08%) : 30851.dasm - System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (FullOpts)
@@ -174,7 +174,7 @@ G_M54838_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        movaps   xmmword ptr [rsp+0x30], xmm9
        movaps   xmmword ptr [rsp+0x20], xmm10
        movaps   xmm6, xmm0
-						;; size=35 bbWeight=1 PerfScore 10.50
+						;; size=36 bbWeight=1 PerfScore 10.50
 G_M54838_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        movaps   xmm0, xmm6
        mulsd    xmm0, qword ptr [reloc @RWD00]
@@ -643,7 +643,7 @@ G_M54838_IG08:        ; bbWeight=1, epilog, nogc, extend
        movaps   xmm10, xmmword ptr [rsp+0x20]
        add      rsp, 120
        ret      
-						;; size=33 bbWeight=1 PerfScore 21.25
+						;; size=34 bbWeight=1 PerfScore 21.25
 RWD00  	dq	3FEDB8A420DC189Ah	;    0.9287892
 RWD08  	dq	4070E8C71B478423h	;    270.54861
 RWD16  	dq	400921FB54442D18h	;   3.14159265
@@ -788,7 +788,7 @@ RWD1120	dq	40F5FD9C72B020C5h	;    90073.778
 RWD1128	dq	4062433333333333h	;        146.1
 
 
-; Total bytes of code 2395, prolog size 32, PerfScore 1804.33, instruction count 413, allocated bytes for code 2395 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (FullOpts)
+; Total bytes of code 2397, prolog size 33, PerfScore 1804.33, instruction count 413, allocated bytes for code 2397 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -796,19 +796,19 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x20
+  SizeOfProlog      : 0x21
   CountOfUnwindCodes: 11
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x20 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x21 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x1A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x1B UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x14 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x15 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x0E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x0F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x09 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x0A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 14 * 8 + 8 = 120 = 0x78
+    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 14 * 8 + 8 = 120 = 0x78
+1 (+0.11%) : 15946.dasm - System.Runtime.Intrinsics.Vector512:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector512`1[byte] (FullOpts)
@@ -245,12 +245,12 @@ G_M31854_IG03:        ; bbWeight=1, extend
        movups   xmmword ptr [rcx+0x30], xmm3
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=436 bbWeight=1 PerfScore 175.25
+						;; size=437 bbWeight=1 PerfScore 175.25
 G_M31854_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 923, prolog size 0, PerfScore 381.00, instruction count 134, allocated bytes for code 923 (MethodHash=6feb8391) for method System.Runtime.Intrinsics.Vector512:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector512`1[byte] (FullOpts)
+; Total bytes of code 924, prolog size 0, PerfScore 381.00, instruction count 134, allocated bytes for code 924 (MethodHash=6feb8391) for method System.Runtime.Intrinsics.Vector512:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector512`1[byte] (FullOpts)
 ; ============================================================
 
 Unwind Info:
+1 (+0.23%) : 15798.dasm - System.Runtime.Intrinsics.Vector256:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector256`1[byte] (FullOpts)
@@ -126,12 +126,12 @@ G_M51694_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byr
        movups   xmmword ptr [rcx+0x10], xmm1
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=438 bbWeight=1 PerfScore 186.00
+						;; size=439 bbWeight=1 PerfScore 186.00
 G_M51694_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 439, prolog size 0, PerfScore 187.00, instruction count 68, allocated bytes for code 439 (MethodHash=4b683611) for method System.Runtime.Intrinsics.Vector256:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector256`1[byte] (FullOpts)
+; Total bytes of code 440, prolog size 0, PerfScore 187.00, instruction count 68, allocated bytes for code 440 (MethodHash=4b683611) for method System.Runtime.Intrinsics.Vector256:Create(byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte,byte):System.Runtime.Intrinsics.Vector256`1[byte] (FullOpts)
 ; ============================================================
 
 Unwind Info:
+6 (+85.71%) : 11664.dasm - System.UInt32:System.Numerics.IShiftOperators<System.UInt32,System.Int32,System.UInt32>.op_LeftShift(uint,int):uint (FullOpts)
@@ -16,16 +16,16 @@
 
 G_M41089_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        mov      eax, ecx
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M41089_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      ecx, edx
        shl      eax, cl
-						;; size=4 bbWeight=1 PerfScore 2.25
+						;; size=8 bbWeight=1 PerfScore 2.25
 G_M41089_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 7, prolog size 0, PerfScore 3.50, instruction count 4, allocated bytes for code 7 (MethodHash=9e7e5f7e) for method System.UInt32:System.Numerics.IShiftOperators<System.UInt32,System.Int32,System.UInt32>.op_LeftShift(uint,int):uint (FullOpts)
+; Total bytes of code 13, prolog size 0, PerfScore 3.50, instruction count 4, allocated bytes for code 13 (MethodHash=9e7e5f7e) for method System.UInt32:System.Numerics.IShiftOperators<System.UInt32,System.Int32,System.UInt32>.op_LeftShift(uint,int):uint (FullOpts)
 ; ============================================================
 
 Unwind Info:
+34 (+87.18%) : 170393.dasm - Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (FullOpts)
@@ -34,12 +34,12 @@ G_M65205_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        sar      eax, 16
        or       eax, edx
        inc      eax
-						;; size=38 bbWeight=1 PerfScore 5.50
+						;; size=72 bbWeight=1 PerfScore 5.50
 G_M65205_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 39, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 39 (MethodHash=8bf1014a) for method Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (FullOpts)
+; Total bytes of code 73, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 73 (MethodHash=8bf1014a) for method Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
+8 (+88.89%) : 126160.dasm - Microsoft.Diagnostics.Tracing.Ctf.IntHelpers:AlignDown(int,int):int (FullOpts)
@@ -21,12 +21,12 @@ G_M35496_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        not      eax
        and      eax, ecx
-						;; size=8 bbWeight=1 PerfScore 1.00
+						;; size=16 bbWeight=1 PerfScore 1.00
 G_M35496_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 9, prolog size 0, PerfScore 2.00, instruction count 5, allocated bytes for code 9 (MethodHash=596f7557) for method Microsoft.Diagnostics.Tracing.Ctf.IntHelpers:AlignDown(int,int):int (FullOpts)
+; Total bytes of code 17, prolog size 0, PerfScore 2.00, instruction count 5, allocated bytes for code 17 (MethodHash=596f7557) for method Microsoft.Diagnostics.Tracing.Ctf.IntHelpers:AlignDown(int,int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
libraries.pmi.windows.x64.checked.mch
-1 (-16.67%) : 29280.dasm - System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M37565_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M37565_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M37565_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=4c306d42) for method System.Runtime.Intrinsics.X86.Popcnt+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+1 (+0.14%) : 22972.dasm - System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
@@ -199,14 +199,14 @@ G_M8955_IG03:        ; bbWeight=1, extend
        vmovups  xmmword ptr [rcx+0x30], xmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=354 bbWeight=1 PerfScore 161.25
+						;; size=355 bbWeight=1 PerfScore 161.25
 G_M8955_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 RWD00  	dd	3F800000h		;         1
 
 
-; Total bytes of code 693, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 693 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
+; Total bytes of code 694, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 694 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
 ; ============================================================
 
 Unwind Info:
+5 (+0.16%) : 28844.dasm - System.Runtime.Intrinsics.X86.Avx512F:Shuffle(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float],ubyte):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
@@ -29,7 +29,7 @@ G_M43564_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r
        ; byrRegs -[rdx]
        add      r8, rdx
        jmp      r8
-						;; size=40 bbWeight=1 PerfScore 14.00
+						;; size=44 bbWeight=1 PerfScore 14.00
 G_M43564_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byref
        vshufps  zmm0, zmm0, zmm1, 0
        jmp      G_M43564_IG259
@@ -1058,7 +1058,7 @@ G_M43564_IG259:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, by
        vmovups  zmmword ptr [rcx], zmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=9 bbWeight=1 PerfScore 2.25
+						;; size=10 bbWeight=1 PerfScore 2.25
 G_M43564_IG260:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -1321,7 +1321,7 @@ RWD00  	dd	G_M43564_IG03 - G_M43564_IG02
        	dd	G_M43564_IG258 - G_M43564_IG02
 
 
-; Total bytes of code 3083, prolog size 0, PerfScore 786.25, instruction count 524, allocated bytes for code 3083 (MethodHash=20a155d3) for method System.Runtime.Intrinsics.X86.Avx512F:Shuffle(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float],ubyte):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 3088, prolog size 0, PerfScore 786.25, instruction count 524, allocated bytes for code 3088 (MethodHash=20a155d3) for method System.Runtime.Intrinsics.X86.Avx512F:Shuffle(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float],ubyte):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
 ; ============================================================
 
 Unwind Info:
+34 (+87.18%) : 137491.dasm - Microsoft.CodeAnalysis.CachingBase`1[ubyte]:AlignSize(int):int (FullOpts)
@@ -32,12 +32,12 @@ G_M17100_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        sar      eax, 16
        or       eax, ecx
        inc      eax
-						;; size=38 bbWeight=1 PerfScore 5.50
+						;; size=72 bbWeight=1 PerfScore 5.50
 G_M17100_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 39, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 39 (MethodHash=31c8bd33) for method Microsoft.CodeAnalysis.CachingBase`1[ubyte]:AlignSize(int):int (FullOpts)
+; Total bytes of code 73, prolog size 0, PerfScore 6.50, instruction count 18, allocated bytes for code 73 (MethodHash=31c8bd33) for method Microsoft.CodeAnalysis.CachingBase`1[ubyte]:AlignSize(int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
+5 (+166.67%) : 29884.dasm - System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M34763_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M34763_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M34763_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=bcd77834) for method System.Runtime.Intrinsics.X86.X86Serialize+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+5 (+166.67%) : 29207.dasm - System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M31227_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M31227_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M31227_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=e6278604) for method System.Runtime.Intrinsics.X86.AvxVnni+X64:get_IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
libraries_tests.run.windows.x64.Release.mch
+2 (+0.08%) : 600803.dasm - System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (Instrumented Tier1)
@@ -176,7 +176,7 @@ G_M54838_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        vmovaps  xmmword ptr [rsp+0x40], xmm9
        vmovaps  xmmword ptr [rsp+0x30], xmm10
        vmovaps  xmm6, xmm0
-						;; size=41 bbWeight=1 PerfScore 10.50
+						;; size=42 bbWeight=1 PerfScore 10.50
 G_M54838_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmulsd   xmm0, xmm6, qword ptr [reloc @RWD00]
        vaddsd   xmm0, xmm0, qword ptr [reloc @RWD08]
@@ -622,7 +622,7 @@ G_M54838_IG09:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm10, xmmword ptr [rsp+0x30]
        add      rsp, 136
        ret      
-						;; size=38 bbWeight=1 PerfScore 21.25
+						;; size=39 bbWeight=1 PerfScore 21.25
 RWD00  	dq	3FEDB8A420DC189Ah	;    0.9287892
 RWD08  	dq	4070E8C71B478423h	;    270.54861
 RWD16  	dq	400921FB54442D18h	;   3.14159265
@@ -767,7 +767,7 @@ RWD1120	dq	40F5FD9C72B020C5h	;    90073.778
 RWD1128	dq	4062433333333333h	;        146.1
 
 
-; Total bytes of code 2372, prolog size 37, PerfScore 1766.08, instruction count 388, allocated bytes for code 2372 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (Instrumented Tier1)
+; Total bytes of code 2374, prolog size 38, PerfScore 1766.08, instruction count 388, allocated bytes for code 2374 (MethodHash=72e629c9) for method System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double (Instrumented Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -775,20 +775,20 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x25
+  SizeOfProlog      : 0x26
   CountOfUnwindCodes: 12
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x25 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x26 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x1F UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x20 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x19 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x1A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 5 * 16 = 80 = 0x00050
-    CodeOffset: 0x13 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x14 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 6 * 16 = 96 = 0x00060
-    CodeOffset: 0x0D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x0E UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 7 * 16 = 112 = 0x00070
-    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
+    CodeOffset: 0x08 UnwindOp: UWOP_ALLOC_LARGE (1)     OpInfo: 0 - Scaled small  
       Size: 17 * 8 = 136 = 0x00088
+1 (+0.23%) : 497356.dasm - System.Runtime.Intrinsics.VectorMath:LogSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[uint]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
@@ -138,7 +138,7 @@ G_M36528_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  ymmword ptr [rcx], ymm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=243 bbWeight=1 PerfScore 117.75
+						;; size=244 bbWeight=1 PerfScore 117.75
 G_M36528_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -167,7 +167,7 @@ RWD232 	dd	BF000002h		;      -0.5
 RWD236 	dd	3F317218h		;  0.693147
 
 
-; Total bytes of code 432, prolog size 0, PerfScore 173.03, instruction count 68, allocated bytes for code 432 (MethodHash=421e714f) for method System.Runtime.Intrinsics.VectorMath:LogSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[uint]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
+; Total bytes of code 433, prolog size 0, PerfScore 173.03, instruction count 68, allocated bytes for code 433 (MethodHash=421e714f) for method System.Runtime.Intrinsics.VectorMath:LogSingle[System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[uint]](System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
 ; ============================================================
 
 Unwind Info:
+1 (+0.23%) : 484509.dasm - System.Numerics.Tensors.TensorPrimitives+LogOperatorSingle:Invoke(System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
@@ -108,7 +108,7 @@ G_M39372_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  ymmword ptr [rcx], ymm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=243 bbWeight=1 PerfScore 117.75
+						;; size=244 bbWeight=1 PerfScore 117.75
 G_M39372_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -137,7 +137,7 @@ RWD232 	dd	BF000002h		;      -0.5
 RWD236 	dd	3F317218h		;  0.693147
 
 
-; Total bytes of code 432, prolog size 0, PerfScore 161.72, instruction count 68, allocated bytes for code 432 (MethodHash=a2e86633) for method System.Numerics.Tensors.TensorPrimitives+LogOperatorSingle:Invoke(System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
+; Total bytes of code 433, prolog size 0, PerfScore 161.72, instruction count 68, allocated bytes for code 433 (MethodHash=a2e86633) for method System.Numerics.Tensors.TensorPrimitives+LogOperatorSingle:Invoke(System.Runtime.Intrinsics.Vector256`1[float]):System.Runtime.Intrinsics.Vector256`1[float] (Tier1)
 ; ============================================================
 
 Unwind Info:
+7 (+87.50%) : 344948.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.declarations.backwardscompatible.dynamictypedeclared017.dynamictypedeclared017.A:MainMethod():int (Tier0)
@@ -12,16 +12,16 @@
 G_M39926_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M39926_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M39926_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=d45d6409) for method ManagedTests.DynamicCSharp.Conformance.dynamic.declarations.backwardscompatible.dynamictypedeclared017.dynamictypedeclared017.A:MainMethod():int (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=d45d6409) for method ManagedTests.DynamicCSharp.Conformance.dynamic.declarations.backwardscompatible.dynamictypedeclared017.dynamictypedeclared017.A:MainMethod():int (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 561668.dasm - System.Runtime.Intrinsics.Vector128`1[System.Int128]:get_IsSupported():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M6228_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M6228_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M6228_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=9c00e7ab) for method System.Runtime.Intrinsics.Vector128`1[System.Int128]:get_IsSupported():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=9c00e7ab) for method System.Runtime.Intrinsics.Vector128`1[System.Int128]:get_IsSupported():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+7 (+87.50%) : 259812.dasm - System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Collections.Frozen.Tests.SimpleNonComparableStruct]():ubyte (Tier0)
@@ -12,16 +12,16 @@
 G_M54291_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        mov      rbp, rsp
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=7 bbWeight=1 PerfScore 1.25
 G_M54291_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M54291_IG03:        ; bbWeight=1, epilog, nogc, extend
        pop      rbp
        ret      
-						;; size=2 bbWeight=1 PerfScore 1.50
+						;; size=4 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 8, prolog size 4, PerfScore 3.00, instruction count 5, allocated bytes for code 8 (MethodHash=c4122bec) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Collections.Frozen.Tests.SimpleNonComparableStruct]():ubyte (Tier0)
+; Total bytes of code 15, prolog size 7, PerfScore 3.00, instruction count 5, allocated bytes for code 15 (MethodHash=c4122bec) for method System.Runtime.CompilerServices.RuntimeHelpers:IsBitwiseEquatable[System.Collections.Frozen.Tests.SimpleNonComparableStruct]():ubyte (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -29,9 +29,9 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x01
+  SizeOfProlog      : 0x03
   CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch
+1 (+0.14%) : 179042.dasm - System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
@@ -199,14 +199,14 @@ G_M8955_IG03:        ; bbWeight=1, extend
        vmovups  xmmword ptr [rcx+0x30], xmm0
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=354 bbWeight=1 PerfScore 161.25
+						;; size=355 bbWeight=1 PerfScore 161.25
 G_M8955_IG04:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 RWD00  	dd	3F800000h		;         1
 
 
-; Total bytes of code 693, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 693 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
+; Total bytes of code 694, prolog size 0, PerfScore 345.25, instruction count 118, allocated bytes for code 694 (MethodHash=5906dd04) for method System.Numerics.Matrix4x4+Impl:Transform(byref,byref):System.Numerics.Matrix4x4+Impl (FullOpts)
 ; ============================================================
 
 Unwind Info:
+1 (+0.15%) : 212365.dasm - System.Runtime.Intrinsics.VectorMath:LogDouble[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
@@ -161,7 +161,7 @@ G_M22781_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  zmmword ptr [rcx], zmm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=437 bbWeight=1 PerfScore 201.42
+						;; size=438 bbWeight=1 PerfScore 201.42
 G_M22781_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -203,7 +203,7 @@ RWD488 	dq	BF2BD0105C610CA8h	; -0.00021219444
 RWD496 	dq	3FE6300000000000h	;  0.693359375
 
 
-; Total bytes of code 665, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 667 (MethodHash=3c50a702) for method System.Runtime.Intrinsics.VectorMath:LogDouble[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
+; Total bytes of code 666, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 668 (MethodHash=3c50a702) for method System.Runtime.Intrinsics.VectorMath:LogDouble[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
 ; ============================================================
 
 Unwind Info:
+1 (+0.15%) : 211948.dasm - System.Runtime.Intrinsics.VectorMath:Log2Double[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
@@ -161,7 +161,7 @@ G_M63855_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        vmovups  zmmword ptr [rcx], zmm2
        mov      rax, rcx
        ; byrRegs +[rax]
-						;; size=437 bbWeight=1 PerfScore 201.42
+						;; size=438 bbWeight=1 PerfScore 201.42
 G_M63855_IG05:        ; bbWeight=1, epilog, nogc, extend
        vzeroupper 
        ret      
@@ -203,7 +203,7 @@ RWD488 	dq	3ECB295C17F0BBBEh	; 3.23791045e-06
 RWD496 	dq	3FF7154400000000h	;    1.4426918
 
 
-; Total bytes of code 665, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 667 (MethodHash=4d7b0690) for method System.Runtime.Intrinsics.VectorMath:Log2Double[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
+; Total bytes of code 666, prolog size 0, PerfScore 237.75, instruction count 91, allocated bytes for code 668 (MethodHash=4d7b0690) for method System.Runtime.Intrinsics.VectorMath:Log2Double[System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[ulong]](System.Runtime.Intrinsics.Vector512`1[double]):System.Runtime.Intrinsics.Vector512`1[double] (FullOpts)
 ; ============================================================
 
 Unwind Info:
+4 (+80.00%) : 18935.dasm - LibraryImportGenerator.IntegrationTests.FunctionPointerTests:<CalledWithArgumentsInOrder>g__Callback|2_0(int,int):int (FullOpts)
@@ -18,12 +18,12 @@ G_M41111_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M41111_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, ecx
        sub      eax, edx
-						;; size=4 bbWeight=1 PerfScore 0.50
+						;; size=8 bbWeight=1 PerfScore 0.50
 G_M41111_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 5 (MethodHash=f1415f68) for method LibraryImportGenerator.IntegrationTests.FunctionPointerTests:<CalledWithArgumentsInOrder>g__Callback|2_0(int,int):int (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 1.50, instruction count 3, allocated bytes for code 9 (MethodHash=f1415f68) for method LibraryImportGenerator.IntegrationTests.FunctionPointerTests:<CalledWithArgumentsInOrder>g__Callback|2_0(int,int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
+14 (+82.35%) : 146650.dasm - System.Linq.Parallel.Tests.JoinTests+LeftOrderingCollisionTest+<>c:<ReorderLeft>b__0_0(int):int:this (FullOpts)
@@ -23,12 +23,12 @@ G_M5520_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      ecx, edx
        sub      ecx, eax
        mov      eax, ecx
-						;; size=16 bbWeight=1 PerfScore 2.00
+						;; size=30 bbWeight=1 PerfScore 2.00
 G_M5520_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 17, prolog size 0, PerfScore 3.00, instruction count 8, allocated bytes for code 17 (MethodHash=3061ea6f) for method System.Linq.Parallel.Tests.JoinTests+LeftOrderingCollisionTest+<>c:<ReorderLeft>b__0_0(int):int:this (FullOpts)
+; Total bytes of code 31, prolog size 0, PerfScore 3.00, instruction count 8, allocated bytes for code 31 (MethodHash=3061ea6f) for method System.Linq.Parallel.Tests.JoinTests+LeftOrderingCollisionTest+<>c:<ReorderLeft>b__0_0(int):int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
+6 (+85.71%) : 176318.dasm - System.Numerics.Tensors.TensorPrimitives+InvertedBinaryOperator`2[System.Numerics.Tensors.TensorPrimitives+DivideOperator`1[uint],uint]:Invoke(uint,uint):uint (FullOpts)
@@ -20,12 +20,12 @@ G_M23137_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        xor      edx, edx
        div      edx:eax, ecx
-						;; size=6 bbWeight=1 PerfScore 25.50
+						;; size=12 bbWeight=1 PerfScore 25.50
 G_M23137_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 7, prolog size 0, PerfScore 26.50, instruction count 4, allocated bytes for code 7 (MethodHash=86f0a59e) for method System.Numerics.Tensors.TensorPrimitives+InvertedBinaryOperator`2[System.Numerics.Tensors.TensorPrimitives+DivideOperator`1[uint],uint]:Invoke(uint,uint):uint (FullOpts)
+; Total bytes of code 13, prolog size 0, PerfScore 26.50, instruction count 4, allocated bytes for code 13 (MethodHash=86f0a59e) for method System.Numerics.Tensors.TensorPrimitives+InvertedBinaryOperator`2[System.Numerics.Tensors.TensorPrimitives+DivideOperator`1[uint],uint]:Invoke(uint,uint):uint (FullOpts)
 ; ============================================================
 
 Unwind Info:
realworld.run.windows.x64.checked.mch
+1 (+0.11%) : 1372.dasm - BepuPhysics.Collidables.MeshInertiaHelper:ComputeTetrahedronContribution(byref,byref,byref,float,byref) (FullOpts)
@@ -56,7 +56,7 @@
 G_M56806_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        mov      rax, bword ptr [rsp+0x28]
        ; byrRegs +[rax]
-						;; size=5 bbWeight=1 PerfScore 1.00
+						;; size=6 bbWeight=1 PerfScore 1.00
 G_M56806_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0107 {rax rcx rdx r8}, byref
        ; byrRegs +[rcx rdx r8]
        vmulss   xmm0, xmm3, dword ptr [reloc @RWD00]
@@ -236,7 +236,7 @@ RWD08  	dd	C0000000h		;        -2
 RWD12  	dd	40000000h		;         2
 
 
-; Total bytes of code 870, prolog size 0, PerfScore 531.00, instruction count 165, allocated bytes for code 870 (MethodHash=1b132219) for method BepuPhysics.Collidables.MeshInertiaHelper:ComputeTetrahedronContribution(byref,byref,byref,float,byref) (FullOpts)
+; Total bytes of code 871, prolog size 0, PerfScore 531.00, instruction count 165, allocated bytes for code 871 (MethodHash=1b132219) for method BepuPhysics.Collidables.MeshInertiaHelper:ComputeTetrahedronContribution(byref,byref,byref,float,byref) (FullOpts)
 ; ============================================================
 
 Unwind Info:
+1 (+0.21%) : 1203.dasm - BepuPhysics.Collidables.Compound:GetRotatedChildPose(byref,byref,byref,byref) (FullOpts)
@@ -100,7 +100,7 @@ G_M10677_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vmulps   ymm17, ymm3, ymm0
        vmulps   ymm0, ymm5, ymm0
        vmulps   ymm18, ymm3, ymm2
-						;; size=275 bbWeight=1 PerfScore 256.50
+						;; size=276 bbWeight=1 PerfScore 256.50
 G_M10677_IG03:        ; bbWeight=1, extend
        vmulps   ymm2, ymm5, ymm2
        vmulps   ymm4, ymm5, ymm4
@@ -146,7 +146,7 @@ G_M10677_IG04:        ; bbWeight=1, epilog, nogc, extend
 RWD00  	dq	3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h
 
 
-; Total bytes of code 476, prolog size 0, PerfScore 393.50, instruction count 97, allocated bytes for code 476 (MethodHash=d5c8d64a) for method BepuPhysics.Collidables.Compound:GetRotatedChildPose(byref,byref,byref,byref) (FullOpts)
+; Total bytes of code 477, prolog size 0, PerfScore 393.50, instruction count 97, allocated bytes for code 477 (MethodHash=d5c8d64a) for method BepuPhysics.Collidables.Compound:GetRotatedChildPose(byref,byref,byref,byref) (FullOpts)
 ; ============================================================
 
 Unwind Info:
+5 (+0.36%) : 1326.dasm - BepuUtilities.Symmetric6x6Wide:LDLTSolve(byref,byref,byref,byref,byref,byref,byref) (FullOpts)
@@ -59,7 +59,7 @@ G_M52182_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        ; byrRegs +[rax]
        mov      r10, bword ptr [rsp+0x90]
        ; byrRegs +[r10]
-						;; size=57 bbWeight=1 PerfScore 13.25
+						;; size=61 bbWeight=1 PerfScore 13.25
 G_M52182_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0F07 {rax rcx rdx r8 r9 r10 r11}, byref
        ; byrRegs +[rcx rdx r8-r9]
        vmovups  ymm0, ymmword ptr [r8]
@@ -305,11 +305,11 @@ G_M52182_IG06:        ; bbWeight=1, epilog, nogc, extend
        vmovaps  xmm10, xmmword ptr [rsp]
        add      rsp, 88
        ret      
-						;; size=37 bbWeight=1 PerfScore 22.25
+						;; size=38 bbWeight=1 PerfScore 22.25
 RWD00  	dq	3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h, 3F8000003F800000h
 
 
-; Total bytes of code 1406, prolog size 33, PerfScore 914.50, instruction count 244, allocated bytes for code 1406 (MethodHash=9c303429) for method BepuUtilities.Symmetric6x6Wide:LDLTSolve(byref,byref,byref,byref,byref,byref,byref) (FullOpts)
+; Total bytes of code 1411, prolog size 34, PerfScore 914.50, instruction count 244, allocated bytes for code 1411 (MethodHash=9c303429) for method BepuUtilities.Symmetric6x6Wide:LDLTSolve(byref,byref,byref,byref,byref,byref,byref) (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -317,19 +317,19 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x21
+  SizeOfProlog      : 0x22
   CountOfUnwindCodes: 11
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x21 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
+    CodeOffset: 0x22 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM10 (10)
       Scaled Small Offset: 0 * 16 = 0 = 0x00000
-    CodeOffset: 0x1C UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
+    CodeOffset: 0x1D UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM9 (9)
       Scaled Small Offset: 1 * 16 = 16 = 0x00010
-    CodeOffset: 0x16 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
+    CodeOffset: 0x17 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM8 (8)
       Scaled Small Offset: 2 * 16 = 32 = 0x00020
-    CodeOffset: 0x10 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
+    CodeOffset: 0x11 UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM7 (7)
       Scaled Small Offset: 3 * 16 = 48 = 0x00030
-    CodeOffset: 0x0A UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
+    CodeOffset: 0x0B UnwindOp: UWOP_SAVE_XMM128 (8)     OpInfo: XMM6 (6)
       Scaled Small Offset: 4 * 16 = 64 = 0x00040
-    CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 10 * 8 + 8 = 88 = 0x58
+    CodeOffset: 0x05 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 10 * 8 + 8 = 88 = 0x58
+6 (+75.00%) : 8466.dasm - Microsoft.ML.Data.VectorDataViewType+<>c:<.ctor>b__4_0(int):ubyte:this (FullOpts)
@@ -19,12 +19,12 @@ G_M20425_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov      eax, edx
        not      eax
        shr      eax, 31
-						;; size=7 bbWeight=1 PerfScore 1.00
+						;; size=13 bbWeight=1 PerfScore 1.00
 G_M20425_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 8, prolog size 0, PerfScore 2.00, instruction count 4, allocated bytes for code 8 (MethodHash=067ab036) for method Microsoft.ML.Data.VectorDataViewType+<>c:<.ctor>b__4_0(int):ubyte:this (FullOpts)
+; Total bytes of code 14, prolog size 0, PerfScore 2.00, instruction count 4, allocated bytes for code 14 (MethodHash=067ab036) for method Microsoft.ML.Data.VectorDataViewType+<>c:<.ctor>b__4_0(int):ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
+10 (+76.92%) : 20363.dasm - Microsoft.CodeAnalysis.CSharp.NullableWalker+LocalState:get_Capacity():int:this (FullOpts)
@@ -25,12 +25,12 @@ G_M6705_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byre
        shr      ecx, 31
        add      eax, ecx
        sar      eax, 1
-						;; size=12 bbWeight=1 PerfScore 3.50
+						;; size=22 bbWeight=1 PerfScore 3.50
 G_M6705_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 13, prolog size 0, PerfScore 4.50, instruction count 6, allocated bytes for code 13 (MethodHash=ab86e5ce) for method Microsoft.CodeAnalysis.CSharp.NullableWalker+LocalState:get_Capacity():int:this (FullOpts)
+; Total bytes of code 23, prolog size 0, PerfScore 4.50, instruction count 6, allocated bytes for code 23 (MethodHash=ab86e5ce) for method Microsoft.CodeAnalysis.CSharp.NullableWalker+LocalState:get_Capacity():int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
+4 (+80.00%) : 17010.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.ErrorTypeSymbol:HasInlineArrayAttribute(byref):ubyte:this (FullOpts)
@@ -19,13 +19,13 @@ G_M52199_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byr
        ; byrRegs +[rdx]
        xor      eax, eax
        mov      dword ptr [rdx], eax
-						;; size=4 bbWeight=1 PerfScore 1.25
+						;; size=8 bbWeight=1 PerfScore 1.25
 G_M52199_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
        ; byrRegs -[rdx]
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 5, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 5 (MethodHash=79e23418) for method Microsoft.CodeAnalysis.CSharp.Symbols.ErrorTypeSymbol:HasInlineArrayAttribute(byref):ubyte:this (FullOpts)
+; Total bytes of code 9, prolog size 0, PerfScore 2.25, instruction count 3, allocated bytes for code 9 (MethodHash=79e23418) for method Microsoft.CodeAnalysis.CSharp.Symbols.ErrorTypeSymbol:HasInlineArrayAttribute(byref):ubyte:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
smoke_tests.nativeaot.windows.x64.checked.mch
-1 (-16.67%) : 14296.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-1 (-16.67%) : 21532.dasm - Program:AesX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M55817_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M55817_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M55817_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=c5da25f6) for method Program:AesX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
-1 (-16.67%) : 19229.dasm - Program:X86SerializeX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M13406_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13406_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       mov      eax, 1
-						;; size=5 bbWeight=1 PerfScore 0.25
+       xor      eax, eax
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M13406_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 6, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 6 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 5, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 5 (MethodHash=00a5cba1) for method Program:X86SerializeX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+5 (+166.67%) : 19199.dasm - Program:AvxVnniX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M60430_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M60430_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M60430_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=e20b13f1) for method Program:AvxVnniX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+5 (+166.67%) : 21515.dasm - Program:FmaX64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M2260_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M2260_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M2260_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=36a7f72b) for method Program:FmaX64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
+5 (+166.67%) : 21526.dasm - Program:Avx2X64IsSupported():ubyte (FullOpts)
@@ -14,13 +14,13 @@
 G_M13187_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M13187_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      eax, 1
+						;; size=7 bbWeight=1 PerfScore 0.25
 G_M13187_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 3 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
+; Total bytes of code 8, prolog size 0, PerfScore 1.25, instruction count 2, allocated bytes for code 8 (MethodHash=f683cc7c) for method Program:Avx2X64IsSupported():ubyte (FullOpts)
 ; ============================================================
 
 Unwind Info:
Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 140,527 | 0 | 140,527 | 0 | -0 | +10,392,179 |
| benchmarks.run.windows.x64.checked.mch | 37,922 | 0 | 37,922 | 0 | -0 | +3,013,399 |
| benchmarks.run_pgo.windows.x64.checked.mch | 120,020 | 0 | 120,020 | 0 | -0 | +8,962,474 |
| benchmarks.run_tiered.windows.x64.checked.mch | 76,575 | 0 | 76,575 | 0 | -0 | +4,199,746 |
| coreclr_tests.run.windows.x64.checked.mch | 639,890 | 3 | 639,887 | 0 | -14 | +84,314,025 |
| libraries.crossgen2.windows.x64.checked.mch | 274,848 | 0 | 274,848 | 0 | -0 | +11,739,139 |
| libraries.pmi.windows.x64.checked.mch | 309,149 | 1 | 309,148 | 0 | -1 | +15,125,726 |
| libraries_tests.run.windows.x64.Release.mch | 811,914 | 0 | 811,914 | 0 | -0 | +69,356,394 |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 339,284 | 0 | 339,284 | 0 | -0 | +32,521,941 |
| realworld.run.windows.x64.checked.mch | 28,087 | 0 | 28,087 | 0 | -0 | +2,545,126 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 30,547 | 8 | 30,539 | 0 | -8 | +1,394,449 |
| Total | 2,808,763 | 12 | 2,808,751 | 0 | -23 | +243,564,598 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 140,527 | 0 | 0 | 140,527 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run.windows.x64.checked.mch | 37,922 | 0 | 0 | 37,922 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run_pgo.windows.x64.checked.mch | 120,020 | 0 | 0 | 120,020 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run_tiered.windows.x64.checked.mch | 76,575 | 0 | 0 | 76,575 | 0.00% | 0.00% | 0.0000% |
| coreclr_tests.run.windows.x64.checked.mch | 639,890 | 2 | 1 | 639,887 | -7.87% | +0.03% | 0.0000% |
| libraries.crossgen2.windows.x64.checked.mch | 274,848 | 0 | 0 | 274,848 | 0.00% | 0.00% | 0.0000% |
| libraries.pmi.windows.x64.checked.mch | 309,149 | 0 | 0 | 309,149 | 0.00% | 0.00% | 0.0000% |
| libraries_tests.run.windows.x64.Release.mch | 811,914 | 0 | 0 | 811,914 | 0.00% | 0.00% | 0.0000% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 339,284 | 0 | 0 | 339,284 | 0.00% | 0.00% | 0.0000% |
| realworld.run.windows.x64.checked.mch | 28,087 | 0 | 0 | 28,087 | 0.00% | 0.00% | 0.0000% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 30,547 | 0 | 0 | 30,547 | 0.00% | 0.00% | 0.0000% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 141,224 | 77,324 | 63,900 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run.windows.x64.checked.mch | 38,352 | 6 | 38,346 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 120,280 | 68,103 | 52,177 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 76,876 | 56,358 | 20,518 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x64.checked.mch | 642,813 | 393,776 | 249,037 | 0 (0.00%) | 5 (0.00%) |
| libraries.crossgen2.windows.x64.checked.mch | 276,889 | 15 | 276,874 | 0 (0.00%) | 2 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 316,010 | 6 | 316,004 | 0 (0.00%) | 1 (0.00%) |
| libraries_tests.run.windows.x64.Release.mch | 814,679 | 567,674 | 247,005 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 343,895 | 21,994 | 321,901 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x64.checked.mch | 28,368 | 3 | 28,365 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 31,202 | 10 | 31,192 | 0 (0.00%) | 3 (0.01%) |
| Total | 2,830,588 | 1,185,269 | 1,645,319 | 0 (0.00%) | 11 (0.00%) |

jit-analyze output

Comments: No decode failures or assertion failures are reported in the logs, apart from some asserts about unsupported ISAs, which should also be attributable to the APX CPUID changes. The large code-size increase is expected, as we are forcing all compatible legacy instructions to be encoded with REX2 regardless of whether it is needed.
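The per-method deltas in the diffs above look consistent with a simple rule: an instruction with no REX prefix grows by 2 bytes (the REX2 prefix is 2 bytes), an instruction that already carries a 1-byte REX prefix grows by 1 byte (REX2 replaces REX), and instructions such as `ret` are left unchanged. The following is a minimal sketch of that arithmetic, not JIT code; `InsInfo` and both function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one emitted instruction (not a JIT type).
struct InsInfo
{
    std::size_t baseSize;     // encoded size in bytes without REX2
    bool        hasRex;       // base encoding already carries a 1-byte REX prefix
    bool        rex2Eligible; // false for instructions left untouched, e.g. `ret`
};

// Size of one instruction when REX2 is forced on every eligible instruction.
std::size_t sizeWithForcedRex2(const InsInfo& ins)
{
    if (!ins.rex2Eligible)
    {
        return ins.baseSize;
    }
    // The 2-byte REX2 prefix either replaces a 1-byte REX (+1) or is added (+2).
    return ins.baseSize + (ins.hasRex ? 1 : 2);
}

// Sum of the forced-REX2 sizes over a whole method.
std::size_t totalSizeWithForcedRex2(const std::vector<InsInfo>& method)
{
    std::size_t total = 0;
    for (const InsInfo& ins : method)
    {
        total += sizeWithForcedRex2(ins);
    }
    return total;
}
```

For example, the 8466.dasm method above (`mov eax, edx`, `not eax`, `shr eax, 31`, `ret`) has base sizes 2, 2, 3 and 1, which this rule takes from 8 to 14 bytes, matching the reported totals.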

2.3 TpDiff - REX2 off (no or little tp impact expected)

TP impact with REX2 off compared with base main:

Overall (+0.08% to +0.19%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.13%
benchmarks.run.windows.x64.checked.mch +0.08%
benchmarks.run_pgo.windows.x64.checked.mch +0.12%
benchmarks.run_tiered.windows.x64.checked.mch +0.19%
coreclr_tests.run.windows.x64.checked.mch +0.18%
libraries.crossgen2.windows.x64.checked.mch +0.11%
libraries.pmi.windows.x64.checked.mch +0.09%
libraries_tests.run.windows.x64.Release.mch +0.15%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.10%
realworld.run.windows.x64.checked.mch +0.09%
smoke_tests.nativeaot.windows.x64.checked.mch +0.08%
MinOpts (+0.24% to +0.43%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.37%
benchmarks.run.windows.x64.checked.mch +0.36%
benchmarks.run_pgo.windows.x64.checked.mch +0.35%
benchmarks.run_tiered.windows.x64.checked.mch +0.34%
coreclr_tests.run.windows.x64.checked.mch +0.27%
libraries.crossgen2.windows.x64.checked.mch +0.36%
libraries.pmi.windows.x64.checked.mch +0.24%
libraries_tests.run.windows.x64.Release.mch +0.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.30%
realworld.run.windows.x64.checked.mch +0.43%
smoke_tests.nativeaot.windows.x64.checked.mch +0.29%
FullOpts (+0.07% to +0.11%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.08%
benchmarks.run.windows.x64.checked.mch +0.08%
benchmarks.run_pgo.windows.x64.checked.mch +0.07%
benchmarks.run_tiered.windows.x64.checked.mch +0.08%
coreclr_tests.run.windows.x64.checked.mch +0.10%
libraries.crossgen2.windows.x64.checked.mch +0.11%
libraries.pmi.windows.x64.checked.mch +0.09%
libraries_tests.run.windows.x64.Release.mch +0.08%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.10%
realworld.run.windows.x64.checked.mch +0.08%
smoke_tests.nativeaot.windows.x64.checked.mch +0.08%
Details

All contexts:

Collection Base # instructions Diff # instructions PDIFF
aspnet.run.windows.x64.checked.mch 142,394,809,547 142,582,075,252 +0.13%
benchmarks.run.windows.x64.checked.mch 55,370,986,624 55,417,510,475 +0.08%
benchmarks.run_pgo.windows.x64.checked.mch 121,883,543,862 122,027,057,184 +0.12%
benchmarks.run_tiered.windows.x64.checked.mch 34,231,112,724 34,297,405,288 +0.19%
coreclr_tests.run.windows.x64.checked.mch 809,468,778,745 810,902,734,493 +0.18%
libraries.crossgen2.windows.x64.checked.mch 154,853,677,569 155,028,932,749 +0.11%
libraries.pmi.windows.x64.checked.mch 269,020,941,364 269,270,900,769 +0.09%
libraries_tests.run.windows.x64.Release.mch 815,708,776,365 816,960,864,737 +0.15%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 577,085,986,658 577,668,396,449 +0.10%
realworld.run.windows.x64.checked.mch 49,400,011,363 49,442,097,903 +0.09%
smoke_tests.nativeaot.windows.x64.checked.mch 22,690,369,631 22,708,625,728 +0.08%

MinOpts contexts:

Collection Base # instructions Diff # instructions PDIFF
aspnet.run.windows.x64.checked.mch 24,056,213,491 24,145,743,102 +0.37%
benchmarks.run.windows.x64.checked.mch 705,633 708,145 +0.36%
benchmarks.run_pgo.windows.x64.checked.mch 19,880,799,806 19,950,333,595 +0.35%
benchmarks.run_tiered.windows.x64.checked.mch 15,022,302,541 15,073,432,822 +0.34%
coreclr_tests.run.windows.x64.checked.mch 347,233,426,241 348,186,424,612 +0.27%
libraries.crossgen2.windows.x64.checked.mch 2,084,909 2,092,477 +0.36%
libraries.pmi.windows.x64.checked.mch 132,525,396 132,849,510 +0.24%
libraries_tests.run.windows.x64.Release.mch 206,423,819,906 207,191,499,654 +0.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 12,115,105,172 12,152,028,039 +0.30%
realworld.run.windows.x64.checked.mch 348,063,722 349,547,131 +0.43%
smoke_tests.nativeaot.windows.x64.checked.mch 1,254,167 1,257,840 +0.29%

FullOpts contexts:

Collection Base # instructions Diff # instructions PDIFF
aspnet.run.windows.x64.checked.mch 118,338,596,056 118,436,332,150 +0.08%
benchmarks.run.windows.x64.checked.mch 55,370,280,991 55,416,802,330 +0.08%
benchmarks.run_pgo.windows.x64.checked.mch 102,002,744,056 102,076,723,589 +0.07%
benchmarks.run_tiered.windows.x64.checked.mch 19,208,810,183 19,223,972,466 +0.08%
coreclr_tests.run.windows.x64.checked.mch 462,235,352,504 462,716,309,881 +0.10%
libraries.crossgen2.windows.x64.checked.mch 154,851,592,660 155,026,840,272 +0.11%
libraries.pmi.windows.x64.checked.mch 268,888,415,968 269,138,051,259 +0.09%
libraries_tests.run.windows.x64.Release.mch 609,284,956,459 609,769,365,083 +0.08%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 564,970,881,486 565,516,368,410 +0.10%
realworld.run.windows.x64.checked.mch 49,051,947,641 49,092,550,772 +0.08%
smoke_tests.nativeaot.windows.x64.checked.mch 22,689,115,464 22,707,367,888 +0.08%
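
The PDIFF column in the tables above appears to be the relative change in instruction count between base and diff, expressed as a percentage. A minimal sketch of that calculation follows; the function name `pdiffPercent` is hypothetical and not part of any tool mentioned here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative change in instruction count, in percent.
// Assumes PDIFF = (diff - base) / base * 100, which matches the table rows.
double pdiffPercent(std::uint64_t baseInstructions, std::uint64_t diffInstructions)
{
    assert(baseInstructions != 0);
    const double base  = static_cast<double>(baseInstructions);
    const double delta = static_cast<double>(diffInstructions) - base;
    return 100.0 * delta / base;
}
```

For the aspnet row under "All contexts", `pdiffPercent(142394809547, 142582075252)` comes out at roughly 0.13, in line with the +0.13% shown.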

3. JIT unit tests

3-1

Comments: We are not using the full JIT test suite because the emulator has its own limitations: when the test set is too big, the emulator itself shows some non-deterministic behavior. To avoid this, we worked out the subset that gives the best coverage while still producing stable test results.

Comments: Within the subset shown in the screenshot, all tests pass both without REX2 (DOTNET_JitStressRex2Encoding=0) and with REX2 (DOTNET_JitStressRex2Encoding=1), with some known exceptions caused by the emulator itself. Namely, CodegenBringUpTests and IL_Conformance break because they contain try-catch structures whose exceptions are supposed to be caught by the runtime but are caught by the emulator first.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 16, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@Ruihan-Yin
Contributor Author

The base of the two APX-related PRs (CPUID: #104637 and REX2: #106557) is outdated. I will work offline to resolve the conflicts and rebase the branch.

We are happy to discuss the design and tests here; please feel free to leave a comment if you have any questions or suggestions.

@BruceForstall BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Sep 5, 2024
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@anthonycanino
Contributor

@dotnet/avx512-contrib can we reopen this as a PR ready to review?

@BruceForstall BruceForstall reopened this Oct 21, 2024
@BruceForstall
Member

@anthonycanino I re-opened it (it wasn't clear to me if your question implied you did not have permission to do so). Either you or @Ruihan-Yin need to update to latest main and resolve the conflicts, then mark it ready-for-review.

@Ruihan-Yin
Contributor Author

Hi @tannergooding, thanks for the reviews in #104637. It seems the CPUID changes are just pending merge and no major changes are expected, so while waiting, I wonder if we can start the conversation on this PR?

@tannergooding
Member

@Ruihan-Yin, just got #104637 merged. If we could get this PR updated so it contains just the new changes, that should make it a lot simpler to review and get in.

@Ruihan-Yin
Contributor Author

Thanks! I will work on it soon.

@@ -647,6 +647,7 @@ class CodeGen final : public CodeGenInterface

#if defined(TARGET_AMD64)
void genAmd64EmitterUnitTestsSse2();
void genAmd64EmitterUnitTestsApx();
Member

Not related to this PR really, but it'd be nice if we had similar tests for other ISAs/encodings (VEX, EVEX, etc). Sse2 itself is, afair, really just SimdLegacyEncoding.

genDefineTempLabel(genCreateTempLabel());

// This test suite needs REX2 enabled.
assert(theEmitter->emitComp->DoJitStressRex2Encoding());
Member

Shouldn't this rather be ApxIsSupported || StressRex2Encoding?

Comment on lines +9069 to +9072
theEmitter->emitIns_R_R(INS_add, EA_1BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_2BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_8BYTE, REG_EAX, REG_ECX);
Member

Should there be variants of these that explicitly test the new extended registers?

Contributor Author

Tests on EGPRs are planned for the PR with the LSRA updates. That PR will introduce the EGPR definitions, which are not yet available in this PR.
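
To make concrete what an EGPR variant of these tests would exercise, here is a minimal sketch of how the two REX2 prefix bytes could be assembled from 5-bit register numbers, following the PR overview (leading byte 0xD5; R4/R3, X4/X3 and B4/B3 extend the ModRM.reg, SIB.index and ModRM.r/m fields). The payload bit order used here (M0, R4, X4, B4, W, R3, X3, B3 from bit 7 down to bit 0) is my reading of the APX specification and should be checked against it. `makeRex2Prefix` is a hypothetical helper, not an emitter API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative only: builds the two REX2 prefix bytes for register numbers 0..31.
// reg   -> ModRM.reg operand, index -> SIB.index operand, base -> ModRM.r/m or SIB.base operand.
// w     -> 64-bit operand size, map1 -> opcode is in legacy map 1 (0x0F escape).
std::array<std::uint8_t, 2> makeRex2Prefix(unsigned reg, unsigned index, unsigned base, bool w, bool map1)
{
    assert(reg < 32 && index < 32 && base < 32);

    // Bits 3 and 4 of each register number go into the prefix;
    // bits 0..2 stay in the ModRM/SIB bytes.
    const unsigned r3 = (reg >> 3) & 1, r4 = (reg >> 4) & 1;
    const unsigned x3 = (index >> 3) & 1, x4 = (index >> 4) & 1;
    const unsigned b3 = (base >> 3) & 1, b4 = (base >> 4) & 1;

    // Assumed payload layout: M0 R4 X4 B4 W R3 X3 B3 (bit 7 .. bit 0).
    const unsigned payload = (map1 ? 0x80u : 0u) | (r4 << 6) | (x4 << 5) | (b4 << 4) |
                             (w ? 0x08u : 0u) | (r3 << 2) | (x3 << 1) | b3;

    return {0xD5, static_cast<std::uint8_t>(payload)};
}
```

Under these assumptions, a 64-bit instruction whose reg operand is register 16 and whose other operands are low registers gets the prefix bytes `D5 48`, while an instruction using only low registers under the stress mode gets `D5 00`.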

Comment on lines +9134 to +9136
// TODO-xarch-apx: these 2 are not enabled for now.
// theEmitter->emitIns_R_I(INS_rcl_N, EA_4BYTE, REG_ECX, 0x05);
// theEmitter->emitIns_R_I(INS_rcr_N, EA_4BYTE, REG_ECX, 0x05);
Member

What's the reason for these ones being skipped? Can we open tracking issues and list the issue number as part of the comment?

Contributor Author

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/emitxarch.cpp#L18695

It seems that the latency/throughput information is missing for rcl_N/rcr_N, so I assumed these two instructions are not used in the JIT. I can add that information if needed.


theEmitter->emitIns_S(INS_pop, EA_PTRSIZE, 1, 2);
theEmitter->emitIns_I(INS_push, EA_PTRSIZE, 50);
// TODO-XArch-apx: figure out a way to test emitIns_A, which will require a GenTreeIndir* input.
Member

You can just create a GenTreeIndir node on the stack. We do this for hwintrinsiccodegenxarch in a few places to simplify the emitter. For example: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiccodegenxarch.cpp#L407-L408

theEmitter->emitIns_R(INS_not, EA_2BYTE, REG_EAX);
theEmitter->emitIns_S(INS_not, EA_2BYTE, 1, 2);

// TODO-XArch-apx: xadd does not have an RM opcode, so it cannot be encoded with emitIns_R_R.
Member

Is this just because we only emit xadd as part of an interlocked or similar API?

Contributor Author

I mostly follow the rule that if an instruction has no actual use case, or is blocked by the current emit paths, then I don't enable it.

For example, xadd is labeled with Encoding_REX2, so it will have access to EGPRs; but if the current path only uses it with a memory operand, that will remain the case after my changes.

Comment on lines 9227 to 9230
// TODO-XArch-apx: S_R_I path only accepts SSE or VEX instructions,
// so I assume shld/shrd will not be taking the first argument from stack.
// theEmitter->emitIns_S_R_I(INS_shld, EA_2BYTE, 1, 2, REG_EAX, 5);
// theEmitter->emitIns_S_R_I(INS_shrd, EA_2BYTE, 1, 2, REG_EAX, 5);
Member

Do we not ever emit this encoding today? Possibly a more general missing optimization?

Contributor Author

Similar to the comment above: there are instructions that "theoretically" accept certain combinations of operands, but if the JIT does not use those combinations, this PR does not intend to change that.

Is this what we want in this PR, or do we want to extend coverage?

Contributor Author

It seems that shld/shrd come from GT_LSH_HI/GT_RSH_LO. From what I can tell, these instructions are only used when TARGET_64bit is not defined. Since APX is only available on 64-bit systems, should we consider not enabling them at all?

The same may apply to adc/sbb.

image

@@ -2297,6 +2297,13 @@ void Compiler::compSetProcessor()
codeGen->GetEmitter()->SetUseEvexEncoding(true);
// TODO-XArch-AVX512 : Revisit other flags to be set once avx512 instructions are added.
}
if (canUseRex2Encoding() || DoJitStressRex2Encoding())
Member

@tannergooding tannergooding Nov 19, 2024

Is the DoJitStressRex2Encoding() check needed here? We notably don't have the equivalent for the canUseEvexEncoding() path even though a DoJitStressEvexEncoding() API exists.

Contributor Author

We will eventually turn this part into the following in the APX-EVEX PR:

if (canUseApxEncoding)
{
   codeGen->GetEmitter()->SetUseRex2Encoding(true);
   codeGen->GetEmitter()->SetUsePromotedEVEXEncoding(true);
}

The code in this branch is a bit outdated; I will mirror those changes in this PR.

Comment on lines 2302 to 2304
// TODO-Xarch-apx:
// At this stage, since no machine will pass the CPUID check for APX, we need a special stress mode that
// enables REX2 on incompatible platform, `DoJitStressRex2Encoding` is expected to be removed eventually.
Member

Is it expected to be removed? There's a general benefit to the stress mode even when APX CPUs exist, in that it forces all instructions to emit using the APX/EVEX encoding

Contributor Author

That was a misstatement; we will keep this REX2 stress mode. The difference between REX2 stress and the existing EVEX stress is that REX2 stress currently does not do the CPUID check; we will need to add it once compatible hardware is available. Alternatively, we could use the JitBypassAPXCheck knob to skip the CPUID check.

Member

I think we can just rely on the AltJit and/or NAOT scenario for "bypassing" the CPUID check, seeing as we can't run the tests anyways.

These are the existing mechanisms for generating code for a CPU that doesn't match the host CPU.

Contributor Author

> I think we can just rely on the AltJit and/or NAOT scenario for "bypassing" the CPUID check, seeing as we can't run the tests anyways.

Thanks for the explanation! Could you elaborate on how this technique lets me "get" APX on a non-APX machine?

If I follow the existing EVEX stress mode, DoJitStressEvexEncoding returns true only if Avx512/Avx10 is available, so DoJitStressRex2Encoding would require APX to be available. Could you give me some instructions on how to get the ISA check to pass on a non-APX machine? Thanks!
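
For reference, a minimal sketch of the two gating patterns discussed in this thread: a stress knob that only takes effect when the encoding is available (the EVEX-style pattern), and a separate knob that bypasses the CPUID check. All names here (Rex2Config, CanUseRex2Encoding, DoStressRex2Encoding) are hypothetical stand-ins for illustration, not the JIT's actual API.

```cpp
#include <cassert>

// Hypothetical stand-ins for the CPUID result and the two config knobs.
struct Rex2Config
{
    bool cpuSupportsApx;     // outcome of the real ISA/CPUID check
    bool stressRex2Knob;     // models a JitStressRex2Encoding-style knob
    bool bypassApxCheckKnob; // models a JitBypassAPXCheck-style knob
};

// REX2 may be emitted if the CPU reports APX, or if the check is bypassed.
bool CanUseRex2Encoding(const Rex2Config& cfg)
{
    return cfg.cpuSupportsApx || cfg.bypassApxCheckKnob;
}

// EVEX-style stress pattern: the stress knob alone is not enough;
// the encoding must also be considered available.
bool DoStressRex2Encoding(const Rex2Config& cfg)
{
    return cfg.stressRex2Knob && CanUseRex2Encoding(cfg);
}
```

Under this sketch, setting only the stress knob on a non-APX machine has no effect, which is exactly the situation the question above is about.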

Comment on lines 9955 to 9960
if (JitConfig.JitBypassAPXCheck())
{
return true;
}
Member

This shouldn't be needed given the JitStressApxCheck right?

It can rather be handled via the general AltJit support for the "normal" path, like we do for other ISAs?

bool DoJitStressRex2Encoding() const
{
#ifdef DEBUG
if (JitConfig.JitStressRex2Encoding())
Member

Does this need to interplay with JitStressEvexEncoding or similar at all (or does that need to be renamed, to clarify it's only SIMD EVEX?)

In particular, I'm considering what the behavior would be on a machine without EVEX but where stress REX2 is enabled (which can't exist for real hardware), so there may be some consideration of ensuring that EVEX is enabled when REX2 is being stressed so everything "lines up".

Contributor Author

> Does this need to interplay with JitStressEvexEncoding or similar at all (or does that need to be renamed, to clarify it's only SIMD EVEX?)

We will use JitStressPromotedEVEXEncoding to distinguish between APX-EVEX and pre-APX EVEX.

> In particular, I'm considering what the behavior would be on a machine without EVEX but where stress REX2 is enabled (which can't exist for real hardware), so there may be some consideration of ensuring that EVEX is enabled when REX2 is being stressed so everything "lines up".

If we are simply stressing the "encodings", that is, forcing all compatible instructions to be emitted with the new encodings, then since REX2 and APX-EVEX compatibility are tracked separately, I think letting REX2 and APX-EVEX stand alone should be fine.

But if we want to stress the register allocator into using the new registers (which seems to be a different scenario), we might need to ensure both encodings are available.

Is this the case we are considering, or have I misunderstood?
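
To make the "encoding stress" scenario concrete, here is a minimal sketch of deciding whether an instruction needs REX2 and of building the prefix payload. It assumes the payload bit layout M0 R4 X4 B4 W R3 X3 B3 (bit 7 down to bit 0) after the leading 0xD5 byte, as I understand it from the APX specification; the type and function names (Rex2Operands, NeedsRex2, Rex2Payload) are hypothetical and not the emitter's real API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand description. Register numbers 0-15 are the legacy GPRs,
// 16-31 are the EGPRs.
struct Rex2Operands
{
    unsigned reg;   // register encoded in ModRM.reg (0-31)
    unsigned index; // register encoded in SIB.index (0-31), 0 if unused
    unsigned base;  // register encoded in ModRM.r/m or SIB.base (0-31)
    bool     w;     // 64-bit operand size
    bool     map1;  // true for opcodes behind the 0x0F escape (legacy map 1)
};

// REX2 is required when any operand is an EGPR. An encoding stress mode
// forces it for every eligible instruction, even without EGPR operands.
bool NeedsRex2(const Rex2Operands& ops, bool stressRex2)
{
    return stressRex2 || ops.reg >= 16 || ops.index >= 16 || ops.base >= 16;
}

// Second byte of the prefix (the first byte is always 0xD5).
// Assumed layout, bit 7 to bit 0: M0 R4 X4 B4 W R3 X3 B3.
uint8_t Rex2Payload(const Rex2Operands& ops)
{
    unsigned payload = 0;
    payload |= (ops.map1 ? 1u : 0u) << 7;     // M0
    payload |= ((ops.reg >> 4) & 1u) << 6;    // R4
    payload |= ((ops.index >> 4) & 1u) << 5;  // X4
    payload |= ((ops.base >> 4) & 1u) << 4;   // B4
    payload |= (ops.w ? 1u : 0u) << 3;        // W
    payload |= ((ops.reg >> 3) & 1u) << 2;    // R3
    payload |= ((ops.index >> 3) & 1u) << 1;  // X3
    payload |= ((ops.base >> 3) & 1u);        // B3
    return static_cast<uint8_t>(payload);
}
```

In this sketch, stressing the encoding only changes the result of NeedsRex2; stressing the register allocator would instead change which register numbers reach these helpers.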

Comment on lines +277 to +278
// TODO-Xarch-apx: we have special stress mode for REX2 on non-compatible machine, that will
// force UseRex2Encoding return true regardless of the CPUID results.
Member

Do we need this comment? We don't have a corresponding one for IsEvexEncodableInstruction

(same general question/concept applies throughout; I think we can generally mirror the comments/semantics that EVEX and stress EVEX has)

Contributor Author

> (same general question/concept applies throughout; I think we can generally mirror the comments/semantics that EVEX and stress EVEX has)

Yes, the overall concept of the REX2 stress mode is mirrored from the EVEX stress mode. Thanks for the input above; I will try to bring the REX2 stress mode more in line with the existing stress mode design, as the EVEX stress mode is.

@tannergooding
Member

Most of the code looks to be generally correct and in the right/expected shape. But there are a number of remaining TODO comments that don't have corresponding issues or that appear unnecessary when viewed in contrast to the existing EVEX and stress EVEX support.

I think we can clean up and simplify much of it based on that and get it a little bit more streamlined.

@BruceForstall
Member

Thanks for describing your testing plan. I'm glad to hear that JitLateDisasm was used (and hopefully helped) with encoding testing. Thanks for adding unit tests.

It might be useful to PR any changes you had to make to build a custom coredistools.dll, to https://github.com/dotnet/jitutils/. It was mentioned in a tracking issue dotnet/jitutils#414 that current LLVM builds support APX disassembly, so perhaps the only change you needed was to bump the LLVM version number and build? (My latest attempt to update it was a little more ambitious: dotnet/jitutils#412, but it also bumped the LLVM version number.)

@anthonycanino
Contributor

> Thanks for describing your testing plan. I'm glad to hear that JitLateDisasm was used (and hopefully helped) with encoding testing. Thanks for adding unit tests.
>
> It might be useful to PR any changes you had to make to build a custom coredistools.dll, to https://github.com/dotnet/jitutils/. It was mentioned in a tracking issue dotnet/jitutils#414 that current LLVM builds support APX disassembly, so perhaps the only change you needed was to bump the LLVM version number and build? (My latest attempt to update it was a little more ambitious: dotnet/jitutils#412, but it also bumped the LLVM version number.)

Hi Bruce.

It looks like I bumped the LLVMSourceVersion field in coredistools.ymp to llvmorg-19.1.0 to get a coredistools.dll that worked.

Do you have plans to bump the LLVM version for .NET 10? It looks like LLVM 19 will cover APX, but after further internal discussions, LLVM 20 (which should be released in early 2025, I believe) will be required for AVX10.2.

@MichalPetryka
Contributor

MichalPetryka commented Nov 20, 2024

> Do you have plans to bump the LLVM version for .NET 10? It looks like that LLVM 19 will cover APX, but LLVM 20 will be required for AVX10.2 after further discussions internally (which should be released early 2025 I believe).

#109939 (do note that only covers official builds, distro builds will probably still use older Clang/GCC)

@BruceForstall
Member

> Do you have plans to bump the LLVM version for .NET 10?

Yes. Also, I expect to re-tool the coredistools build process to make it easier than it is currently to update the dependent LLVM version.

@MichalPetryka thanks for that link. That might make it easier to build coredistools with our AzureLinux containers (currently, I think the libc in Ubuntu 16.04 is limiting our ability to build new LLVM). However, note that the coredistools version of LLVM and the .NET 10 version of LLVM are not currently related, and hopefully can still be chosen independently in the future.

@Ruihan-Yin
Contributor Author

Hi @tannergooding, I tried to refactor the stress mode for REX2. Could you please check whether it is as expected?

Also, I have a few questions in my replies to the comments and would appreciate it if you could take a look.

Thanks!

Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member