LLM Inference Optimizations

  • Date: October 30, 2024
    • 11:30 - 12:30: LLM Inference Optimizations
  • Location: TCS 1416
  • Slack Channel: track-2. Use it to post questions, exact error messages, etc.

Setup on Polaris

  • Get an interactive node
    $ qsub -I -l select=1:ngpus=4 -l filesystems=home:eagle:grand -l walltime=1:00:00 -q HandsOnHPC -A alcf_training
  • Clone the repo and activate the conda environment
    $ git clone https://github.com/argonne-lcf/ALCF_Hands_on_HPC_Workshop.git
    $ cd ALCF_Hands_on_HPC_Workshop/InferenceOptimizations
    
    $ module use /soft/modulefiles
    $ module load conda/2024-10-30-workshop
    $ conda activate

Hands-On Examples

We will use the LLAMA3-8B model for the hands-on inference examples.

  1. Inference with Hugging Face

    $ bash run_HF.sh

    This script runs run_HF.py with the correct command-line flags.
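
    For orientation, the core of plain Hugging Face inference looks roughly like the minimal sketch below. The checkpoint name and prompt are assumptions for illustration; the exact flags the workshop uses are set in run_HF.py.

    # Minimal Hugging Face inference sketch (illustrative, not the exact run_HF.py)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    prompt = "The key to fast LLM inference is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))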

  2. Inference with vLLM

    $ bash run_vllm.sh

    This script runs run_vllm.py with the correct command-line flags.
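
    The offline vLLM API that the script wraps looks roughly like this sketch. The model ID and sampling settings are assumptions; the real flags live in run_vllm.py.

    # Minimal vLLM inference sketch (illustrative)
    from vllm import LLM, SamplingParams

    # tensor_parallel_size=4 shards the model across the 4 GPUs of a Polaris node
    llm = LLM(model="meta-llama/Meta-Llama-3-8B", tensor_parallel_size=4)
    params = SamplingParams(temperature=0.8, max_tokens=64)

    outputs = llm.generate(["The key to fast LLM inference is"], params)
    print(outputs[0].outputs[0].text)

    vLLM's continuous batching and PagedAttention KV-cache management are what typically make this faster than the plain Hugging Face loop above.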

  3. vLLM Quantization Example

    $ bash run_vllm_quant.sh

    This script runs run_vllm.py with the command-line flags that enable quantization.
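
    Quantization in vLLM can be requested at model load time, roughly as in the sketch below. The "fp8" scheme here is an assumption for illustration; the scheme the workshop actually uses is set in run_vllm_quant.sh.

    # Illustrative quantized-inference sketch; "fp8" is one of several
    # schemes vLLM supports (others include "awq" and "gptq")
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3-8B", quantization="fp8")
    outputs = llm.generate(["The key to fast LLM inference is"],
                           SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)

    Quantizing weights reduces memory traffic, which is usually the bottleneck for small-batch decoding.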

  4. vLLM Speculative Decoding (SD) Example

    $ bash run_vllm_SD.sh

    This script runs run_vllm.py with the command-line flags that enable speculative decoding.
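
    In speculative decoding, a small draft model proposes several tokens that the full target model then verifies in one forward pass. A minimal sketch, matching the vLLM API of this workshop's era, is below; the draft model and token count are assumptions, and the real configuration is in run_vllm_SD.sh.

    # Illustrative speculative-decoding sketch
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B",           # target model
        speculative_model="meta-llama/Llama-3.2-1B",  # hypothetical draft model
        num_speculative_tokens=5,                     # draft tokens proposed per step
    )
    outputs = llm.generate(["The key to fast LLM inference is"],
                           SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)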


Acknowledgements

Contributors: Krishna Teja Chitty-Venkata and Siddhisanket (Sid) Raskar.

This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.