-
Hi AshD, Float16 should be supported, as many people are using fp16 models now. I tried to have a look at the TextEncoder's model.config in the repo, but it's not there. I suspect it has a different tensor shape for the input, but I have no idea what it is looking for, as all the model.configs have been removed from the repo. If I have time tonight I will try to load up this model and see if I can figure out what it needs.
-
I am using LCM Dreamshaper v7 fp16, ChilloutMix fp16, and even a quantized model with no problem, so I do believe it is because of the Olive optimization. You should use non-Olive fp16 models until Adam adds support for Olive models, I guess. There are some models (including the fp16 ones I just talked about) here:
-
Found the issue: Olive models use a struct type. I will support this, as I think it's important. Since we are targeting .NET, Olive support seems natural. This is one of those things I'm glad we found now, as it would only get harder and harder to implement as new features are added to OnnxStack. I will open a PR later so you can track how this is going, and link it to this issue so people are aware.
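A minimal sketch of that kind of check, assuming a hypothetical model path (not one from this thread): the session's InputMetadata exposes each input's element type, which shows up as the Float16 struct for these Olive fp16 models.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class InspectInputs
{
    static void Main()
    {
        // Hypothetical path to an Olive-optimized UNet; substitute your own model.
        using var session = new InferenceSession(@"C:\models\olive-fp16\unet\model.onnx");

        foreach (var input in session.InputMetadata)
        {
            var meta = input.Value;
            // For Olive fp16 models the element type is the Float16 struct, not float.
            var isFloat16 = meta.ElementType == typeof(Float16);
            Console.WriteLine($"{input.Key}: {meta.ElementType.Name} " +
                              $"[{string.Join(",", meta.Dimensions)}] fp16={isFloat16}");
        }
    }
}
```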
-
Actually, I can do this in two stages. First I can just convert the input/output tensor types; I don't expect the performance hit from casting to float32 to be that bad. Then eventually I'll fully support multiple value types for diffusers/schedulers. Stage 1 I can most likely get done for Friday's release :)
-
Thanks Adam. I can post some Float16 code later that I had to add to make the translations. I wish the Half datatype in C# could be used directly. Feel free to use it, if it is helpful. Thanks Amin. Let me try out the LCM Dreamshaper v7 fp16 model and benchmark it. The Photon Olive-optimized Float16 model creates an image in 2.7 seconds using Euler A with 15 steps, using DirectML on Windows 11 on an RTX 3090/Intel Core 11th-gen PC.
-
Feel free to use the code below, if it is helpful. It was useful for me to translate between float and Float16. Here is the Float16Extensions class:
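(A minimal sketch of conversions along these lines, assuming .NET 6+ where System.Half and the BitConverter half-bit helpers are available; the fp16 value is carried as its raw ushort bit pattern.)

```csharp
using System;

// Sketch of float <-> Float16 scalar conversions, assuming .NET 6+
// (System.Half, BitConverter.HalfToUInt16Bits / UInt16BitsToHalf).
public static class Float16Extensions
{
    // Convert a float32 value to its IEEE 754 half-precision bit pattern.
    public static ushort ToFloat16(this float value)
        => BitConverter.HalfToUInt16Bits((Half)value);

    // Convert a half-precision bit pattern back to float32.
    public static float ToSingle(this ushort float16Bits)
        => (float)BitConverter.UInt16BitsToHalf(float16Bits);
}
```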
And here is the TensorHelper for Float16 operations:
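(A matching sketch at the tensor level, built on the extensions above; the method names here are illustrative, not necessarily the ones actually used.)

```csharp
using Microsoft.ML.OnnxRuntime.Tensors;

// Sketch of tensor-level helpers for moving between float32 and fp16 buffers,
// built on the Float16Extensions sketch above.
public static class TensorHelper
{
    // Convert a float32 tensor to an array of fp16 bit patterns.
    public static ushort[] ToFloat16Array(this DenseTensor<float> tensor)
    {
        var result = new ushort[tensor.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = tensor.GetValue(i).ToFloat16();
        return result;
    }

    // Convert an array of fp16 bit patterns back into a float32 tensor.
    public static DenseTensor<float> ToFloat32Tensor(this ushort[] float16Bits, int[] dimensions)
    {
        var tensor = new DenseTensor<float>(dimensions);
        for (int i = 0; i < float16Bits.Length; i++)
            tensor.SetValue(i, float16Bits[i].ToSingle());
        return tensor;
    }
}
```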
-
Pushed a PR if you would like to test. It's not the final implementation, but it seems to be working with the Photon model.
-
You're welcome, AshD. I just posted some of the fp16 Olive-optimized models from your Hugging Face page in that topic, so people can use them when Olive is supported here.
-
Thanks Adam and Amin. I tested this with the other Olive FP16 models I created earlier and they all work :-) The Olive models took about 4 to 5 GB of VRAM, and using Euler with 30 steps they took around 1.5 to 1.7 seconds to generate on my Intel Core 13th-gen/RTX 4090 using DirectML. LCM-Dreamshaper-V7 with 5 steps took around 2.0 to 2.3 seconds and around 7 GB of VRAM. I am wondering if a Float16 Olive-optimized version of LCM-Dreamshaper-V7 would be faster. Thanks,
-
You're welcome. That is great! Thanks for the tests.
-
I was blown away by how fast the Olive models were, so I'm very happy this is all working now. I can still optimize it a bit more too, I think, but I will leave that for a bigger optimization update. We should be able to support
-
So I ran the Olive toolchain on LCM-Dreamshaper-V7, created the optimized model, and used cliptokenizer.onnx as the Tokenizer for it. I am getting an error at line 133 in OnnxStack-OnnxNativeTypes\OnnxStack-OnnxNativeTypes\OnnxStack.StableDiffusion\Diffusers\LatentConsistency\LatentConsistencyDiffuser.cs. Thanks,
-
LCM was a bit different to convert, I think? @TheyCallMeHex managed to get an ONNX conversion, but I'm not sure if he has anything with Olive working yet. I'll clone and test this one now and see if I can see anything in the debugger.
-
Yeah, looks like the converted model is missing an input parameter. It should be:
-
I really need to start adding some better exceptions now that more people are using OnnxStack.
-
Are Olive models GPU only? It seems to crash on load if using the CPU provider.
-
I have used them on a laptop that had only a CPU and no discrete GPU, but I had used it like this:
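(The exact settings aren't reproduced here; what follows is only a rough sketch of a CPU-only setup using the raw Microsoft.ML.OnnxRuntime SessionOptions API. OnnxStack normally wires the provider up from its own configuration, and the model path is a placeholder.)

```csharp
using Microsoft.ML.OnnxRuntime;

// Illustrative only: select the CPU execution provider so no GPU is required.
var options = new SessionOptions();
options.AppendExecutionProvider_CPU(1);   // 1 = use the memory arena; works without a discrete GPU
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;

// Hypothetical model path, not one from this thread.
using var session = new InferenceSession(@"C:\models\olive-fp16\unet\model.onnx", options);
```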
-
I opened a feature request for this on the Olive GitHub. Can someone point me to the Python code used to create this model from the original PyTorch model? Thanks,
-
The original model I converted this from was: https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7
To convert the model to ONNX I used Optimum: https://github.com/huggingface/optimum https://pypi.org/project/optimum/
To do the conversion, the command I ran was:
optimum-cli export onnx --model /the/path/of/the/model/directory --task stable-diffusion /the/path/of/the/output/directory
-
Thanks @TheyCallMeHex. I made some progress adding the missing parameter based on the Optimum code, but got stuck with another error: RuntimeError: Dynamic shape axis should be no more than the shape dimension for batch_size. More details here.
-
Maybe @dakenf can help.
-
Moved this over to a discussion since the original issue is resolved :)
-
Jambay Kinley helped with the issue and I generated the new optimized model. The optimized version took about 1.8 seconds (8 steps) and 4 GB of VRAM for an image on a Core i9 13th-gen/RTX 4090 Windows 11 machine. One last question: are there plans for SDXL? If so, I can work on creating some Olive-optimized models for it. Thanks,
-
Closing this discussion as the original goal has been met. @saddam213 - I have uploaded a bunch of Olive-optimized SDXL models here to test when the project has SDXL support. Thanks,
-
I was trying to use this with the Microsoft Olive-optimized Float16 Stable Diffusion models and it was throwing an exception.
Microsoft.ML.OnnxRuntime.OnnxRuntimeException: '[ErrorCode:Fail] D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(935)\onnxruntime.DLL!00007FFC53310120: (caller: 00007FFC5330BE42) Exception(1) tid(1e34) 80070057 The parameter is incorrect.
The model I used is here
https://huggingface.co/softwareweaver/photon
It was converted to ONNX using the Microsoft Olive toolchain
https://github.com/microsoft/Olive
Thanks,
Ash