auto quantization of big arrays on computation #600

Open
eix128 opened this issue Dec 8, 2024 · 1 comment
Labels: API, question (Further information is requested)

Comments

eix128 commented Dec 8, 2024

Hi, is it possible to automatically quantize big arrays during long computations and then convert them back to their original values?

Maybe we could use autoencoders, Gaussians, or some other kind of network to quantize big arrays before the computation and then restore the values afterwards.
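As a reference point for the "quantize, compute, restore" idea, here is a minimal sketch of plain symmetric INT8 quantization of a float array. It assumes a single scale per array (maxAbs / 127) and Java 16+ for records. It swaps in a simple linear quantizer rather than the learned autoencoder/Gaussian approach described above. The names `Int8Quantizer`, `quantize` and `dequantize` are hypothetical, and no TornadoVM API is used.

```java
import java.util.Arrays;

// Hypothetical sketch: symmetric per-array INT8 quantization of a float[] and its inverse.
// This is a plain linear quantizer, NOT a learned (autoencoder/Gaussian) scheme,
// and it does not use any TornadoVM API.
final class Int8Quantizer {

    // Quantized payload plus the single scale needed to restore approximate values.
    record Quantized(byte[] values, float scale) {}

    static Quantized quantize(float[] x) {
        float maxAbs = 0f;
        for (float v : x) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        // Map [-maxAbs, maxAbs] onto [-127, 127]; an all-zero array gets scale 1 to avoid dividing by 0.
        float scale = maxAbs == 0f ? 1f : maxAbs / 127f;
        byte[] q = new byte[x.length];
        for (int i = 0; i < x.length; i++) {
            int r = Math.round(x[i] / scale);
            q[i] = (byte) Math.max(-127, Math.min(127, r));
        }
        return new Quantized(q, scale);
    }

    // Restores approximate float values: value = q * scale.
    static float[] dequantize(Quantized q) {
        float[] out = new float[q.values().length];
        for (int i = 0; i < out.length; i++) {
            out[i] = q.values()[i] * q.scale();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] data = {0.5f, -1.25f, 3.0f, 0.001f};
        Quantized q = quantize(data);
        System.out.println(Arrays.toString(q.values()) + " scale=" + q.scale());
        System.out.println(Arrays.toString(dequantize(q)));
    }
}
```

Restoring the values is only approximate: each element can be off by up to half a quantization step (`scale / 2`). That is the precision loss a learned scheme would try to reduce.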

This may require training, but it would be a nice feature. We would provide the list of possible values as a TXT or CSV file. Java unit tests could be used for training: we could pass all the data to TornadoVM during the unit-test stage, train on it, and then pass the quantized list back to TornadoVM, so that TornadoVM can use INT4 and INT8.
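If the set of possible values is known in advance (for example, read from the TXT or CSV file mentioned above), one option is codebook quantization: each element is replaced by the index of its nearest allowed value. The following is a hypothetical sketch, not a TornadoVM feature. It assumes at most 16 allowed values so that each index fits in 4 bits, packs two indices per byte, and leaves out file parsing. The class name `Int4Codebook` is made up for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of codebook quantization with INT4 storage (not a TornadoVM API).
// The codebook is the user-supplied list of possible values (file parsing omitted).
// Each float becomes the 4-bit index of its nearest codebook entry; two indices per byte.
final class Int4Codebook {

    private final float[] codebook;

    Int4Codebook(float[] codebook) {
        if (codebook.length == 0 || codebook.length > 16) {
            throw new IllegalArgumentException("INT4 indices need 1..16 codebook entries");
        }
        this.codebook = codebook.clone();
    }

    // Index of the codebook entry closest to v (linear scan; ties keep the earlier entry).
    private int nearest(float v) {
        int best = 0;
        for (int k = 1; k < codebook.length; k++) {
            if (Math.abs(codebook[k] - v) < Math.abs(codebook[best] - v)) {
                best = k;
            }
        }
        return best;
    }

    // Element 2i goes into the low nibble and element 2i+1 into the high nibble of byte i.
    byte[] encode(float[] x) {
        byte[] packed = new byte[(x.length + 1) / 2];
        for (int i = 0; i < x.length; i++) {
            int shift = (i % 2 == 0) ? 0 : 4;
            packed[i / 2] |= (byte) (nearest(x[i]) << shift);
        }
        return packed;
    }

    // Restores n values by looking each 4-bit index up in the codebook.
    float[] decode(byte[] packed, int n) {
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            int shift = (i % 2 == 0) ? 0 : 4;
            int idx = (packed[i / 2] >> shift) & 0x0F; // mask removes sign extension
            out[i] = codebook[idx];
        }
        return out;
    }

    public static void main(String[] args) {
        Int4Codebook cb = new Int4Codebook(new float[] {-1f, -0.5f, 0f, 0.5f, 1f});
        float[] data = {0.9f, -0.4f, 0.1f};
        byte[] packed = cb.encode(data);
        System.out.println(packed.length + " bytes: " + Arrays.toString(cb.decode(packed, data.length)));
    }
}
```

Decoding is a table lookup, so restored values are always exactly one of the allowed values. Whether that is acceptable depends on how well the list covers the real data.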

Through training, we could also learn which values can be ignored in later calculations.

There is also a neural-network optimizer: https://github.com/microsoft/Olive
I don't know whether TornadoVM could also make use of this kind of tool.

Also, each quantization training could be a "context". So, when I multiply two INT4 arrays, could I also pass which trained "context" to use?
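One way to read the "context" idea is as the quantization parameters fixed for each array, here assumed to be a scale and a bit width. The hypothetical sketch below uses plain Java rather than any TornadoVM API. It computes a dot product by multiplying the raw integer values of two INT4-range arrays and accumulating in a `long`. It then uses both operands' contexts to rescale the result to a float. For brevity it stores one value per byte instead of packing nibbles. `QuantContext` and `QuantContextDemo` are illustrative names only.

```java
// Hypothetical sketch of a per-array quantization "context" (not a TornadoVM API).
// The context holds the parameters fixed for an array, e.g. after a calibration/training step.
// Integer kernels work on raw quantized values; both operands' contexts are needed to
// turn the integer result back into a float. Values are stored one per byte (not packed).
final class QuantContextDemo {

    record QuantContext(float scale, int bits) {
        int qmax() {
            return (1 << (bits - 1)) - 1; // 7 for INT4, 127 for INT8
        }
    }

    static byte[] quantize(float[] x, QuantContext ctx) {
        byte[] q = new byte[x.length];
        for (int i = 0; i < x.length; i++) {
            int r = Math.round(x[i] / ctx.scale());
            q[i] = (byte) Math.max(-ctx.qmax(), Math.min(ctx.qmax(), r)); // clamp to the signed range
        }
        return q;
    }

    // Integer dot product accumulated in a long, then rescaled with both contexts.
    static float dot(byte[] a, QuantContext ca, byte[] b, QuantContext cb) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        long acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += a[i] * b[i];
        }
        return acc * ca.scale() * cb.scale();
    }

    public static void main(String[] args) {
        // INT4 context: represents multiples of 0.25 in [-1.75, 1.75].
        QuantContext int4 = new QuantContext(0.25f, 4);
        float[] a = {0.5f, -1.0f, 1.5f};
        float[] b = {1.0f, 0.25f, -0.5f};
        float approx = dot(quantize(a, int4), int4, quantize(b, int4), int4);
        System.out.println("quantized dot = " + approx);
    }
}
```

With these inputs every value is an exact multiple of the scale, so the quantized dot product equals the float result (-0.5). In general it is only an approximation, and values outside the context's range are clamped.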

jjfumero added the question and API labels on Dec 9, 2024
jjfumero (Member) commented Dec 9, 2024

Hi @eix128,

We are working on enabling quantization in TornadoVM. It is enabled in PR #591, but we still need to complete it with the correct code for the OpenCL, PTX, and SPIR-V backends. However, the API is already enabled.

Regarding auto-quantization, this requires more thinking on our side. It can be tricky because it loses precision in ways the user may not expect. For instance, if a program uses a FloatArray, we could automatically quantize it to INT8, similar to int8_t in CUDA. However, the results would no longer comply with IEEE 754 FP32. Thus, either TornadoVM would need to emit a warning, or we would make it explicit when the auto-conversion should happen (for example, through the ExecutionContext object). Something like:

executionContext
     .withAutoQuantization()
     .execute();
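The precision concern raised above can be made concrete for one common scheme, symmetric per-array INT8 quantization. It is used here only as an illustration, not as what TornadoVM would implement. With a single scale s per array, each value is rounded to the nearest multiple of s, so the round-trip error is at most half a step:

```latex
s = \frac{\max_i |x_i|}{127}, \qquad
q_i = \operatorname{round}\!\left(\frac{x_i}{s}\right) \in \{-127, \dots, 127\}, \qquad
\hat{x}_i = s\, q_i
\\[4pt]
|x_i - \hat{x}_i| = s \left| \frac{x_i}{s} - q_i \right| \le \frac{s}{2}
```

For example, if max|x_i| = 1, then s = 1/127 ≈ 0.0079 and a restored value can be off by up to about 0.0039. By contrast, adjacent FP32 values just above 1.0 are 2^-23 ≈ 1.2 × 10^-7 apart.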

I'm not sure whether this answers your questions. In a nutshell:

  • Quantized types are under development
  • Auto-quantization is not on our radar yet, but we are open to suggestions and use cases.

I'm adding @kotselidis and @mikepapadim to the discussion.
