Replies: 2 comments 1 reply
- I think this would likely need refit support. Here is an example in native TensorRT: https://github.com/NVIDIA/TensorRT/tree/release/9.0/demo/Diffusion#generate-an-image-guided-by-a-text-prompt-and-using-specified-lora-model-weight-updates. Some APIs may need to be exposed in torch-trt for this to work out of the box (OOB).
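For illustration, here is a hedged sketch of what refitting an engine with LoRA-merged weights could look like using TensorRT's `trt.Refitter` API. It assumes the engine was built with the REFIT flag and that `merged_state_dict` (a hypothetical name) already maps the engine's refittable weight names to LoRA-merged arrays; mapping PyTorch parameter names onto engine weight names is exactly the part that would need torch-trt API support to work OOB.

```python
# Hedged sketch: refit an existing TensorRT engine with LoRA-merged weights.
# Assumes the engine was built with trt.BuilderFlag.REFIT and that
# merged_state_dict maps refittable engine weight names to numpy arrays
# (base weight + scaled LoRA delta already merged). Names are illustrative.
import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def refit_with_lora(engine: trt.ICudaEngine, merged_state_dict: dict) -> bool:
    refitter = trt.Refitter(engine, TRT_LOGGER)
    # Only weights the engine exposes as refittable can be replaced.
    refittable_names = set(refitter.get_all_weights())
    for name, array in merged_state_dict.items():
        if name in refittable_names:
            refitter.set_named_weights(name, trt.Weights(np.ascontiguousarray(array)))
    # Returns True only if every weight the refitter still needs was supplied.
    return refitter.refit_cuda_engine()
```

Switching adapters between requests would then amount to merging a different adapter into the base weights and refitting again, without rebuilding the engine.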
1 reply
- For future notice, there is now an API that makes LoRAs easier to use with Torch-TRT models: https://github.com/pytorch/TensorRT/blob/main/examples/dynamo/mutable_torchtrt_module_example.py
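Following that linked example, a rough sketch of the workflow with a diffusers pipeline is below; the model ID, LoRA path, and compile settings are assumptions. The key point is that `MutableTorchTensorRTModule` compiles lazily and refits its TensorRT engine when the wrapped module's weights change, so standard LoRA fusing calls just work on the next forward pass.

```python
# Hedged sketch modelled on mutable_torchtrt_module_example.py: wrap the UNet in
# a MutableTorchTensorRTModule, run once, then fuse a LoRA into the same weights;
# the next call detects the weight change and refits the engine.
# Model ID, LoRA path, and compile settings below are assumptions.
import torch
import torch_tensorrt
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Compiles lazily on first forward; later weight changes trigger a refit.
pipe.unet = torch_tensorrt.MutableTorchTensorRTModule(
    pipe.unet, enabled_precisions={torch.float16}
)

image = pipe("a majestic castle in the clouds", num_inference_steps=30).images[0]

# Apply a LoRA adapter in place (standard diffusers LoRA calls); the wrapped
# UNet's weights change, so the next pipeline call refits the TensorRT engine.
pipe.load_lora_weights("path/to/lora.safetensors", adapter_name="style")  # hypothetical path
pipe.fuse_lora()
pipe.unload_lora_weights()

image = pipe("a majestic castle in the clouds", num_inference_steps=30).images[0]
```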
- Hi, I have a base model and several LoRA adapters trained on top of it. The base model is always loaded, and for each inference request I modify it by applying an adapter. I want to optimize my model using TensorRT. Is there a way to apply LoRA adapters to the optimized TensorRT model? I would appreciate any ideas on where to start working on this problem. Thank you.
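For reference, here is a minimal sketch of the eager-mode setup being described, using Hugging Face PEFT (the model name, adapter paths, and `run_request` helper are assumptions): one base model stays resident and the active adapter is switched per request. The question is how to keep this pattern once the model has been compiled with TensorRT.

```python
# Hedged sketch of the eager-mode baseline: one base model kept on the GPU,
# with a different pre-loaded LoRA adapter activated per inference request.
# Model name, adapter paths, and run_request() are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load every adapter once; requests only switch which one is active.
model = PeftModel.from_pretrained(base, "adapters/task_a", adapter_name="task_a")
model.load_adapter("adapters/task_b", adapter_name="task_b")
model.eval()

def run_request(adapter_name: str, prompt: str) -> torch.Tensor:
    model.set_adapter(adapter_name)  # pick the adapter for this request
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        return model(**inputs).logits
```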