
Add Inferentia2 and Optimum Neuron Support #120

Merged
merged 9 commits into main on May 8, 2024
Conversation

philschmid
Collaborator

What does this PR do?

This PR adds support for Inferentia2. To deploy a model on Inferentia2, you have 3 options:

  • Provide an already compiled model with a model.neuron file as HF_MODEL_ID, e.g. optimum/tiny_random_bert_neuron
  • Provide the HF_OPTIMUM_BATCH_SIZE and HF_OPTIMUM_SEQUENCE_LENGTH environment variables to compile the model on the fly, e.g. HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128
  • Include a "neuron" dictionary in the config.json file in the model archive, e.g. "neuron": {"static_batch_size": 1, "static_sequence_length": 128}
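For the third option, a minimal sketch of what the config.json entry could look like (the model directory and "model_type" value are illustrative, and the batch size / sequence length values are taken from the example above):

```python
import json
import os
import tempfile

# Illustrative compilation settings, matching the example in the PR description
neuron_config = {"static_batch_size": 1, "static_sequence_length": 128}

# Write a minimal config.json carrying the neuron settings into a model archive directory
model_dir = tempfile.mkdtemp()
config_path = os.path.join(model_dir, "config.json")
with open(config_path, "w") as f:
    json.dump({"model_type": "bert", "neuron": neuron_config}, f, indent=2)

# At load time, the settings can be read back from the same file
with open(config_path) as f:
    loaded = json.load(f)
print(loaded["neuron"])
```

With such a config.json in the archive, no extra environment variables are needed; the compilation parameters travel with the model.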

The currently supported tasks can be found here. If you plan to deploy an LLM, we recommend taking a look at Neuronx TGI, which is purpose-built for LLMs.

Contributor

@JingyaHuang JingyaHuang left a comment

LGTM, thanks @philschmid. Just left some small nits.

README.md (review comments resolved)
Co-authored-by: Jingya HUANG <[email protected]>
@philschmid philschmid merged commit 4164089 into main May 8, 2024
2 checks passed
@philschmid philschmid deleted the inf2 branch May 8, 2024 12:35