[TensorAPI] Add initial support for TensorQ8 and TensorQ4 to support quantization and dequantization of Floats #591

Draft
wants to merge 15 commits into base: develop

mikepapadim
Member

@mikepapadim mikepapadim commented Nov 16, 2024

Description

Note: no code generation support yet.

This PR adds support for Q8 and Q4 tensor quantization in TornadoVM. The implementation includes:

  • A new TensorQ8 class that implements 8-bit quantization with float16 scales
  • A new TensorQ4 class that implements 4-bit quantization with float16 scales
  • Block-based quantization where each block has its own scale factor
  • Comprehensive test suite validating quantization accuracy and behavior
  • Memory-efficient storage using native memory segments
  • Support for both positive and negative values with proper scaling
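
The block-based scheme in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `Q8BlockSketch`, the `maxAbs / 127` scale rule, the clamping to [-127, 127], and the use of a heap `ByteBuffer` instead of a native memory segment are all assumptions. It relies on `Float.floatToFloat16` and `Float.float16ToFloat`, which need JDK 20 or newer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of per-block 8-bit quantization with a float16 scale.
class Q8BlockSketch {
    // Assumed layout: 32 values per block; each block stores a 2-byte
    // float16 scale followed by 32 signed 8-bit values (34 bytes in total).
    static final int BLOCK_SIZE = 32;
    static final int BYTES_PER_BLOCK = 2 + BLOCK_SIZE;

    static ByteBuffer quantize(float[] values) {
        int blocks = (values.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        ByteBuffer buf = ByteBuffer.allocate(blocks * BYTES_PER_BLOCK)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int b = 0; b < blocks; b++) {
            int start = b * BLOCK_SIZE;
            int end = Math.min(start + BLOCK_SIZE, values.length);

            // Each block gets its own scale, derived from its largest magnitude.
            float maxAbs = 0f;
            for (int i = start; i < end; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(values[i]));
            }
            short scaleBits = Float.floatToFloat16(maxAbs / 127f);
            // Quantize against the rounded float16 scale, since that is
            // the value dequantization will read back.
            float scale = Float.float16ToFloat(scaleBits);

            int base = b * BYTES_PER_BLOCK;
            buf.putShort(base, scaleBits);
            for (int i = start; i < end; i++) {
                int q = (scale == 0f) ? 0 : Math.round(values[i] / scale);
                q = Math.max(-127, Math.min(127, q)); // keep within int8 range
                buf.put(base + 2 + (i - start), (byte) q);
            }
        }
        return buf;
    }

    static float dequantize(ByteBuffer buf, int index) {
        int base = (index / BLOCK_SIZE) * BYTES_PER_BLOCK;
        float scale = Float.float16ToFloat(buf.getShort(base));
        return buf.get(base + 2 + (index % BLOCK_SIZE)) * scale;
    }

    public static void main(String[] args) {
        float[] pattern = {0.5f, -1.0f, 25.0f, -30.5f, 0.0f};
        float[] values = new float[BLOCK_SIZE];
        for (int i = 0; i < values.length; i++) {
            values[i] = pattern[i % pattern.length];
        }
        ByteBuffer buf = quantize(values);
        for (int i = 0; i < pattern.length; i++) {
            System.out.printf("Index %d: Set=%.2f Retrieved=%.2f%n",
                    i, values[i], dequantize(buf, i));
        }
    }
}
```

Because every block carries its own scale, a block of small values is not forced onto the coarse step size needed by a block of large values. The trade-off is two extra bytes per 32 values.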

In the tests below, relative errors are typically below 1% for medium and large values, and small values are preserved well.

Problem description

N/A - This is a feature addition to support 8-bit and 4-bit quantized tensors, which are essential for efficient deep learning model deployment.

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

tornado-test -V uk.ac.manchester.tornado.unittests.tensors.TestTensorQ8

Expected output:

Segment size for storing single value 34
Total elements: 32
Block size: 32
Total allocated bytes: 34
Index 0: Set=0.50 Retrieved=0.50
Index 1: Set=-1.00 Retrieved=-1.00
Index 2: Set=25.00 Retrieved=25.01
Index 3: Set=-30.50 Retrieved=-30.49
Index 4: Set=0.00 Retrieved=0.00
Index 5: Set=0.50 Retrieved=0.48
Index 6: Set=-1.00 Retrieved=-0.96
Index 7: Set=25.00 Retrieved=24.97
Index 8: Set=-30.50 Retrieved=-30.49
Index 9: Set=0.00 Retrieved=0.00
Index 10: Set=0.50 Retrieved=0.48
Index 11: Set=-1.00 Retrieved=-0.96
Index 12: Set=25.00 Retrieved=24.97
Index 13: Set=-30.50 Retrieved=-30.49
Index 14: Set=0.00 Retrieved=0.00
Index 15: Set=0.50 Retrieved=0.48
Index 16: Set=-1.00 Retrieved=-0.96
Index 17: Set=25.00 Retrieved=24.97
Index 18: Set=-30.50 Retrieved=-30.49
Index 19: Set=0.00 Retrieved=0.00
Index 20: Set=0.50 Retrieved=0.48
Index 21: Set=-1.00 Retrieved=-0.96
Index 22: Set=25.00 Retrieved=24.97
Index 23: Set=-30.50 Retrieved=-30.49
Index 24: Set=0.00 Retrieved=0.00
Index 25: Set=0.50 Retrieved=0.48
Index 26: Set=-1.00 Retrieved=-0.96
Index 27: Set=25.00 Retrieved=24.97
Index 28: Set=-30.50 Retrieved=-30.49
Index 29: Set=0.00 Retrieved=0.00
Index 30: Set=0.50 Retrieved=0.48
Index 31: Set=-1.00 Retrieved=-0.96
INT8 boundary test: Setting -128.0, got -128.0
INT8 boundary test: Setting -127.0, got -127.0
INT8 boundary test: Setting -64.0, got -64.5
INT8 boundary test: Setting 0.0, got 0.0
INT8 boundary test: Setting 63.0, got 63.5
INT8 boundary test: Setting 126.0, got 126.0
INT8 boundary test: Setting 127.0, got 127.0

Testing independent blocks with different scales:

Block 1 - Small values:
Index 0: Set=0.100000 Got=0.099982 Diff=0.000018
Index 1: Set=0.128125 Got=0.128141 Diff=0.000016
Index 2: Set=0.156250 Got=0.156240 Diff=0.000010
Index 3: Set=0.184375 Got=0.184340 Diff=0.000035
Index 4: Set=0.212500 Got=0.212560 Diff=0.000060
Index 5: Set=0.240625 Got=0.240659 Diff=0.000034
Index 6: Set=0.268750 Got=0.268637 Diff=0.000113
Index 7: Set=0.296875 Got=0.296978 Diff=0.000103
Index 8: Set=0.325000 Got=0.325077 Diff=0.000077
Index 9: Set=0.353125 Got=0.353176 Diff=0.000051
Index 10: Set=0.381250 Got=0.381275 Diff=0.000025
Index 11: Set=0.409375 Got=0.409374 Diff=0.000001
Index 12: Set=0.437500 Got=0.437473 Diff=0.000027
Index 13: Set=0.465625 Got=0.465572 Diff=0.000053
Index 14: Set=0.493750 Got=0.493671 Diff=0.000079
Index 15: Set=0.521875 Got=0.521770 Diff=0.000105
Index 16: Set=0.550000 Got=0.549870 Diff=0.000130
Index 17: Set=0.578125 Got=0.577969 Diff=0.000156
Index 18: Set=0.606250 Got=0.606068 Diff=0.000182
Index 19: Set=0.634375 Got=0.634167 Diff=0.000208
Index 20: Set=0.662500 Got=0.662266 Diff=0.000234
Index 21: Set=0.690625 Got=0.690849 Diff=0.000224
Index 22: Set=0.718750 Got=0.718948 Diff=0.000198
Index 23: Set=0.746875 Got=0.747047 Diff=0.000172
Index 24: Set=0.775000 Got=0.775146 Diff=0.000147
Index 25: Set=0.803125 Got=0.803246 Diff=0.000121
Index 26: Set=0.831250 Got=0.831345 Diff=0.000095
Index 27: Set=0.859375 Got=0.859444 Diff=0.000069
Index 28: Set=0.887500 Got=0.887543 Diff=0.000043
Index 29: Set=0.915625 Got=0.915642 Diff=0.000017
Index 30: Set=0.943750 Got=0.943741 Diff=0.000009
Index 31: Set=0.971875 Got=0.971840 Diff=0.000035

Block 2 - Medium values:
Index 0: Set=10.000000 Got=9.999390 Diff=0.000610
Index 1: Set=10.312500 Got=10.309448 Diff=0.003052
Index 2: Set=10.625000 Got=10.627258 Diff=0.002258
Index 3: Set=10.937500 Got=10.937317 Diff=0.000183
Index 4: Set=11.250000 Got=11.247375 Diff=0.002625
Index 5: Set=11.562500 Got=11.565186 Diff=0.002686
Index 6: Set=11.875000 Got=11.875244 Diff=0.000244
Index 7: Set=12.187500 Got=12.185303 Diff=0.002197
Index 8: Set=12.500000 Got=12.503113 Diff=0.003113
Index 9: Set=12.812500 Got=12.813171 Diff=0.000671
Index 10: Set=13.125000 Got=13.123230 Diff=0.001770
Index 11: Set=13.437500 Got=13.441040 Diff=0.003540
Index 12: Set=13.750000 Got=13.751099 Diff=0.001099
Index 13: Set=14.062500 Got=14.061157 Diff=0.001343
Index 14: Set=14.375000 Got=14.371216 Diff=0.003784
Index 15: Set=14.687500 Got=14.689026 Diff=0.001526
Index 16: Set=15.000000 Got=14.999084 Diff=0.000916
Index 17: Set=15.312500 Got=15.309143 Diff=0.003357
Index 18: Set=15.625000 Got=15.626953 Diff=0.001953
Index 19: Set=15.937500 Got=15.937012 Diff=0.000488
Index 20: Set=16.250000 Got=16.247070 Diff=0.002930
Index 21: Set=16.562500 Got=16.557129 Diff=0.005371
Index 22: Set=16.875000 Got=16.882690 Diff=0.007690
Index 23: Set=17.187500 Got=17.192749 Diff=0.005249
Index 24: Set=17.500000 Got=17.502808 Diff=0.002808
Index 25: Set=17.812500 Got=17.812866 Diff=0.000366
Index 26: Set=18.125000 Got=18.122925 Diff=0.002075
Index 27: Set=18.437500 Got=18.432983 Diff=0.004517
Index 28: Set=18.750000 Got=18.743042 Diff=0.006958
Index 29: Set=19.062500 Got=19.068604 Diff=0.006104
Index 30: Set=19.375000 Got=19.378662 Diff=0.003662
Index 31: Set=19.687500 Got=19.688721 Diff=0.001221

Block 3 - Large values:
Index 0: Set=100.000000 Got=100.024902 Diff=0.024902
Index 1: Set=103.125000 Got=103.125488 Diff=0.000488
Index 2: Set=106.250000 Got=106.226074 Diff=0.023926
Index 3: Set=109.375000 Got=109.388672 Diff=0.013672
Index 4: Set=112.500000 Got=112.489258 Diff=0.010742
Index 5: Set=115.625000 Got=115.651855 Diff=0.026855
Index 6: Set=118.750000 Got=118.752441 Diff=0.002441
Index 7: Set=121.875000 Got=121.853027 Diff=0.021973
Index 8: Set=125.000000 Got=125.015625 Diff=0.015625
Index 9: Set=128.125000 Got=128.116211 Diff=0.008789
Index 10: Set=131.250000 Got=131.216797 Diff=0.033203
Index 11: Set=134.375000 Got=134.317383 Diff=0.057617
Index 12: Set=137.500000 Got=137.541992 Diff=0.041992
Index 13: Set=140.625000 Got=140.642578 Diff=0.017578
Index 14: Set=143.750000 Got=143.743164 Diff=0.006836
Index 15: Set=146.875000 Got=146.843750 Diff=0.031250
Index 16: Set=150.000000 Got=149.944336 Diff=0.055664
Index 17: Set=153.125000 Got=153.168945 Diff=0.043945
Index 18: Set=156.250000 Got=156.269531 Diff=0.019531
Index 19: Set=159.375000 Got=159.370117 Diff=0.004883
Index 20: Set=162.500000 Got=162.470703 Diff=0.029297
Index 21: Set=165.625000 Got=165.571289 Diff=0.053711
Index 22: Set=168.750000 Got=168.795898 Diff=0.045898
Index 23: Set=171.875000 Got=171.896484 Diff=0.021484
Index 24: Set=175.000000 Got=174.997070 Diff=0.002930
Index 25: Set=178.125000 Got=178.097656 Diff=0.027344
Index 26: Set=181.250000 Got=181.198242 Diff=0.051758
Index 27: Set=184.375000 Got=184.422852 Diff=0.047852
Index 28: Set=187.500000 Got=187.523438 Diff=0.023438
Index 29: Set=190.625000 Got=190.624023 Diff=0.000977
Index 30: Set=193.750000 Got=193.724609 Diff=0.025391
Index 31: Set=196.875000 Got=196.825195 Diff=0.049805

Verifying accuracy for each block:
Block 0 stats:
  Value range: 0.107132 to 0.971840
  Max absolute difference: 0.011900
  Max relative error: 7.131957%
Block 1 stats:
  Value range: 10.231934 to 19.688721
  Max absolute difference: 0.377197
  Max relative error: 3.352865%
Block 2 stats:
  Value range: 102.287109 to 196.825195
  Max absolute difference: 2.743164
  Max relative error: 2.287109%

Testing constant value block:
Index 0: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 1: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 2: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 3: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 4: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 5: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 6: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 7: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 8: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 9: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 10: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 11: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 12: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 13: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 14: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 15: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 16: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 17: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 18: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 19: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 20: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 21: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 22: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 23: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 24: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 25: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 26: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 27: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 28: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 29: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 30: Expected=10.000000 Got=9.999390 Diff=0.000610
Index 31: Expected=10.000000 Got=9.999390 Diff=0.000610
Maximum relative error: 0.006104%

Testing single block precision:
Index 0: Set=0.312500 Got=0.312481 RelError=0.000061
Index 1: Set=0.625000 Got=0.624962 RelError=0.000061
Index 2: Set=0.937500 Got=0.937443 RelError=0.000061
Index 3: Set=1.250000 Got=1.249924 RelError=0.000061
Index 4: Set=1.562500 Got=1.562889 RelError=0.000249
Index 5: Set=1.875000 Got=1.874886 RelError=0.000061
Index 6: Set=2.187500 Got=2.187851 RelError=0.000160
Index 7: Set=2.500000 Got=2.499847 RelError=0.000061
Index 8: Set=2.812500 Got=2.811844 RelError=0.000233
Index 9: Set=3.125000 Got=3.125778 RelError=0.000249
Index 10: Set=3.437500 Got=3.437775 RelError=0.000080
Index 11: Set=3.750000 Got=3.749771 RelError=0.000061
Index 12: Set=4.062500 Got=4.061768 RelError=0.000180
Index 13: Set=4.375000 Got=4.375702 RelError=0.000160
Index 14: Set=4.687500 Got=4.685760 RelError=0.000371
Index 15: Set=5.000000 Got=4.999695 RelError=0.000061
Index 16: Set=5.312500 Got=5.313629 RelError=0.000213
Index 17: Set=5.625000 Got=5.623688 RelError=0.000233
Index 18: Set=5.937500 Got=5.937622 RelError=0.000021
Index 19: Set=6.250000 Got=6.251556 RelError=0.000249
Index 20: Set=6.562500 Got=6.561615 RelError=0.000135
Index 21: Set=6.875000 Got=6.875549 RelError=0.000080
Index 22: Set=7.187500 Got=7.185608 RelError=0.000263
Index 23: Set=7.500000 Got=7.499542 RelError=0.000061
Index 24: Set=7.812500 Got=7.813477 RelError=0.000125
Index 25: Set=8.125000 Got=8.123535 RelError=0.000180
Index 26: Set=8.437500 Got=8.441345 RelError=0.000456
Index 27: Set=8.750000 Got=8.751404 RelError=0.000160
Index 28: Set=9.062500 Got=9.061462 RelError=0.000114
Index 29: Set=9.375000 Got=9.371521 RelError=0.000371
Index 30: Set=9.687500 Got=9.689331 RelError=0.000189
Index 31: Set=10.000000 Got=9.999390 RelError=0.000061
Testing zero crossing behavior:

Range 0:
Value:  -0.001000 -> Retrieved:  -0.000999
Value:  -0.000100 -> Retrieved:  -0.000102
Value:   0.000000 -> Retrieved:   0.000000
Value:   0.000100 -> Retrieved:   0.000102
Value:   0.001000 -> Retrieved:   0.000999

Range 1:
Value:  -0.100000 -> Retrieved:  -0.099982
Value:  -0.050000 -> Retrieved:  -0.050385
Value:   0.000000 -> Retrieved:   0.000000
Value:   0.050000 -> Retrieved:   0.050385
Value:   0.100000 -> Retrieved:   0.099982

Range 2:
Value:  -1.000000 -> Retrieved:  -0.999939
Value:  -0.500000 -> Retrieved:  -0.503906
Value:   0.000000 -> Retrieved:   0.000000
Value:   0.500000 -> Retrieved:   0.503906
Value:   1.000000 -> Retrieved:   0.999939
Debug info:
Number of elements: 32
Block size: 32
Bytes per block: 34
Number of blocks: 1
Data size: 34
Header size: 24
Total size with header: 58
Debug info:
Number of elements: 32
Block size: 32
Bytes per block: 34
Number of blocks: 1
Data size: 34
Header size: 24
Total size with header: 58
Test: class uk.ac.manchester.tornado.unittests.tensors.TestTensorQ8
	Running test: testBasicQuantization      ................  [PASS] 
	Running test: testTensorQ8SetAndGetFloat ................  [PASS] 
	Running test: testTensorQ8SetAndGetFloatVerify ................  [PASS] 
	Running test: testMixedScaleValues       ................  [PASS] 
	Running test: testQuantizationRange      ................  [PASS] 
	Running test: testInt8Range              ................  [PASS] 
	Running test: testIndependentBlocks      ................  [PASS] 
	Running test: testConstantBlock          ................  [PASS] 
	Running test: testSingleBlockPrecision   ................  [PASS] 
	Running test: testNonAlignedBlockSize    ................  [PASS] 
	Running test: testZeroCrossing           ................  [PASS] 
	Running test: testRepeatedUpdates        ................  [PASS] 
	Running test: testAlternatingPatterns    ................  [PASS] 
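
The sizes in the debug info above (34 bytes per block, 58 bytes with the header) are consistent with a 2-byte float16 scale plus one byte per value in each 32-element block. The arithmetic can be sketched as below. The class and method names are hypothetical, and the Q4 figure assumes two 4-bit values packed per byte, which this log does not confirm.

```java
// Hypothetical sketch of the storage-size arithmetic for block-quantized tensors.
class QuantizedSizeSketch {
    static final int HEADER_BYTES = 24; // "Header size: 24" in the debug info
    static final int SCALE_BYTES = 2;   // one float16 scale per block

    // Number of blocks needed, rounding up for a partially filled last block.
    static long numBlocks(long elements, int blockSize) {
        return (elements + blockSize - 1) / blockSize;
    }

    // Q8: one byte per value plus the scale.
    static int q8BytesPerBlock(int blockSize) {
        return SCALE_BYTES + blockSize;
    }

    // Q4 (assumption): two 4-bit values per byte plus the scale.
    static int q4BytesPerBlock(int blockSize) {
        return SCALE_BYTES + blockSize / 2;
    }

    static long totalBytes(long elements, int blockSize, int bytesPerBlock) {
        return HEADER_BYTES + numBlocks(elements, blockSize) * bytesPerBlock;
    }

    public static void main(String[] args) {
        int blockSize = 32;
        long elements = 32;
        int perBlock = q8BytesPerBlock(blockSize);
        System.out.println("Bytes per block: " + perBlock);
        System.out.println("Number of blocks: " + numBlocks(elements, blockSize));
        System.out.println("Total size with header: "
                + totalBytes(elements, blockSize, perBlock));
    }
}
```

For 32 elements this gives 34 bytes per block, 1 block, and 58 bytes in total, matching the log. Under the same assumptions, 32 float32 values would take 128 bytes of payload, so Q8 uses roughly a quarter of the space before the header is counted.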


@mikepapadim changed the title from "[TensorAPI] Add initial support for TensorQ8 to support quantization and dequantization of Floats" to "[TensorAPI] Add initial support for TensorQ8 and TensorQ4 to support quantization and dequantization of Floats" on Nov 16, 2024
@jjfumero
Member

Let's also add the JIT mode.

@jjfumero
Member

I suggest converting this PR to a draft until the full JIT mode is working.
