diff --git a/docs/aurora/data-science/libraries/openvino.md b/docs/aurora/data-science/libraries/openvino.md
index 68b206141..c1a20bced 100644
--- a/docs/aurora/data-science/libraries/openvino.md
+++ b/docs/aurora/data-science/libraries/openvino.md
@@ -5,11 +5,11 @@ This page contains build and run instructions for Python and C/C++ examples, but
 
-## Instlling OpenVINO
+## Installing the OpenVINO Python Runtime and CLI Tools
 OpenVINO does not come with the default frameworks module on Aurora, but it can be installed manually within a virtual environment as shown below
 ```
 module use /soft/modulefiles
-module load frameworks/2023.10.15.001
+module load frameworks/2023.12.15.001
 python -m venv --clear /path/to/_ov_env --system-site-packages
 source /path/to/_ov_env/bin/activate
 pip install openvino==2023.2
@@ -22,7 +22,7 @@ Note that `/path/to/` can either be a user's home or project directory.
 
 To use OpenVINO in the future, simply load the frameworks module and source the virtual environment.
 ```
 module use /soft/modulefiles
-module load frameworks/2023.10.15.001
+module load frameworks/2023.12.15.001
 source /path/to/_ov_env/bin/activate
 ```
@@ -89,28 +89,169 @@ Note that `benchmark_app` takes a number of additional configuration options as
 
 ## Inference with Python OpenVINO API
 
-Inference can be performed invoking the compiled model directly or using the OpenVINO Runtime API explicitly.
+Inference can be performed by invoking the compiled model directly or by using the OpenVINO Runtime API explicitly to create inference requests.
 
 An example of performing direct inference with the compiled model is shown below. This leads to compact code, but it performs a single synchronous inference request. Future calls to the model will reuse the same inference request created, thus will experience less overhead.
-Note that the output of the model is a numpy array.
 ```
 import openvino as ov
+import openvino.properties.hint as hints
 import torch
 
 core = ov.Core()
-compiled_model = core.compile_model("resnet50.xml",device_name='GPU.0')
+config = {hints.inference_precision: 'f32'}
+compiled_model = core.compile_model("resnet50.xml", device_name='GPU.0', config=config)
 
 input_data = torch.rand((1, 3, 224, 224))
 results = compiled_model(input_data)[0]
 ```
 
-The Runtime API can be called explicitly to have more control over the requests.
+Note:
+
+* The output of the direct call to the compiled model is a NumPy array.
+* By default, OpenVINO performs inference with FP16 precision on the GPU; therefore, the desired precision must be specified as a hint during model compilation if FP32 or another precision is required.
+
+Other than the direct call to the model, the Runtime API can be used to create inference requests and control their execution, as sketched below. For this approach we refer the user to the OpenVINO [documentation page](https://docs.openvino.ai/2023.2/openvino_docs_OV_UG_Integrate_OV_with_your_application.html), which outlines the steps involved.
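+
+As a minimal sketch of this approach (assuming the same `resnet50.xml` IR file used above), an explicit inference request can be created and executed as follows; the asynchronous `start_async()`/`wait()` pair can be replaced with a single synchronous `infer()` call.
+```
+import numpy as np
+import openvino as ov
+import openvino.properties.hint as hints
+
+core = ov.Core()
+config = {hints.inference_precision: 'f32'}
+compiled_model = core.compile_model("resnet50.xml", device_name='GPU.0', config=config)
+
+# Create an explicit inference request and bind the input tensor to it
+infer_request = compiled_model.create_infer_request()
+input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
+infer_request.set_input_tensor(ov.Tensor(input_data))
+
+# Launch the request asynchronously and wait for it to complete
+infer_request.start_async()
+infer_request.wait()
+results = infer_request.get_output_tensor().data
+```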
 
 ## Inference with C++ OpenVINO API
-This feature is still under testing on Aurora.
+
+Currently, the C++ OpenVINO API on Aurora is enabled through a pre-built set of libraries.
+The environment is set up as follows, with `/path/to/openvino` being a placeholder for a directory of the user's choice
+```
+module use /soft/modulefiles
+module load spack-pe-gcc
+module load cmake
+
+export OV_PATH=/path/to/openvino
+cp /home/balin/OpenVINO/SLES15.3/openvino-suse.tar.gz $OV_PATH
+tar -xzvf $OV_PATH/openvino-suse.tar.gz -C $OV_PATH
+source $OV_PATH/openvino/setupvars.sh
+
+# Need to add a path to the libtbb.so.2 library needed by OpenVINO
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/soft/datascience/llm_ds/basekit_2023_0_25537/vtune/2023.0.0/lib64
+export ONEAPI_DEVICE_SELECTOR=opencl:gpu
+export ZE_AFFINITY_MASK=0.0
+```
+
+An example of performing inference with the C++ OpenVINO API is shown below.
+This simple program loads the ResNet50 model in OpenVINO IR format to the GPU (see instructions above on how to download and convert the model), creates an input vector and offloads it to the GPU with SYCL, and finally executes a single synchronous inference request on the GPU.
+
+```
+#include <iostream>
+#include <vector>
+#include <cstring>
+#include <cstdlib>
+#include "sycl/sycl.hpp"
+#include "openvino/openvino.hpp"
+#include "openvino/runtime/intel_gpu/ocl/ocl.hpp"
+
+const int N_BATCH = 1;
+const int N_CHANNELS = 3;
+const int N_PIXELS = 224;
+const int INPUTS_SIZE = N_BATCH*N_CHANNELS*N_PIXELS*N_PIXELS;
+
+int main(int argc, const char* argv[])
+{
+  // Print some information about OpenVINO and start the runtime
+  std::cout << "Running with " << ov::get_openvino_version() << "\n\n";
+  ov::Core core;
+  std::vector<std::string> availableDevices = core.get_available_devices();
+  char device_str[] = "GPU";
+  bool found_device = false;
+  for (auto&& device : availableDevices) {
+    if (strcmp(device.c_str(),device_str)==0) {
+      std::cout << "Found device " << device << " \n\n";
+      found_device = true;
+    }
+  }
+  if (not found_device) {
+    std::cout << "Input device not found \n";
+    std::cout << "Available devices are: \n";
+    for (auto&& device : availableDevices) {
+      std::cout << device << std::endl;
+    }
+    return -1;
+  }
+
+  // Load the model
+  std::shared_ptr<ov::Model> model = core.read_model("./resnet50.xml");
+  std::cout << "Loaded model \n\n";
+
+  // Create the input data on the host
+  std::vector<float> inputs(INPUTS_SIZE);
+  srand(12345);
+  for (int i=0; i<INPUTS_SIZE; i++) {
+    inputs[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
+  }
+  std::cout << "Generated input data on the host \n\n";
+
+  // Move input data to the device with SYCL
+  sycl::queue Q(sycl::gpu_selector_v, sycl::property::queue::in_order{}); // oneDNN needs in order queues
+  std::cout << "SYCL running on "
+            << Q.get_device().get_info<sycl::info::device::name>()
+            << "\n\n";
+  float *d_inputs = sycl::malloc_device<float>(INPUTS_SIZE, Q);
+  Q.memcpy((void *) d_inputs, (void *) inputs.data(), INPUTS_SIZE*sizeof(float));
+  Q.wait();
+
+  // Share the SYCL queue and context with the GPU plugin and compile the model
+  auto queue = sycl::get_native<sycl::backend::opencl>(Q);
+  auto remote_context = ov::intel_gpu::ocl::ClContext(core, queue);
+  auto compiled_model = core.compile_model(model, remote_context,
+                                           ov::hint::inference_precision("f32"));
+
+  // Convert input array to OpenVINO Tensor
+  ov::element::Type input_type = ov::element::f32;
+  ov::Shape input_shape = {N_BATCH, N_CHANNELS, N_PIXELS, N_PIXELS};
+  //ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, d_inputs);
+  auto input_tensor = remote_context.create_tensor(input_type, input_shape, (void *) d_inputs);
+
+  // Run inference
+  ov::InferRequest infer_request = compiled_model.create_infer_request();
+  infer_request.set_input_tensor(input_tensor);
+  infer_request.infer();
+  std::cout << "Performed inference \n\n";
+
+  // Output the first few values of the predicted tensor
+  ov::Tensor output_tensor = infer_request.get_output_tensor();
+  std::cout << "Size of output tensor " << output_tensor.get_shape() << std::endl;
+  std::cout << "Predicted tensor is : \n";
+  for (int i=0; i<10; i++) {
+    std::cout << output_tensor.data<float>()[i] << "\n";
+  }
+  std::cout << "\n";
+
+  return 0;
+}
+```
+
+To build the example program, use the `CMakeLists.txt` file below
+```
+cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
+project(inference_openvino_sycl_example)
+
+find_package(OpenVINO REQUIRED COMPONENTS Runtime)
+set(ov_link_libraries openvino::runtime)
+
+add_executable(inference_openvino_sycl inference_openvino_sycl.cpp)
+target_link_libraries(inference_openvino_sycl ${ov_link_libraries} -lOpenCL)
+
+set_property(TARGET inference_openvino_sycl PROPERTY CXX_STANDARD 17)
+```
+
+and execute
+```
+cmake -DCMAKE_CXX_FLAGS="-std=c++17 -fsycl" ./
+make
+./inference_openvino_sycl
+```
+
+Note:
+
+* OpenVINO does not currently support the Level Zero backend. OpenCL must be used instead, which can be set on Aurora with `export ONEAPI_DEVICE_SELECTOR=opencl:gpu`.
+* The [Remote Tensor API](https://docs.openvino.ai/2023.3/openvino_docs_OV_UG_supported_plugins_GPU_RemoteTensor_API.html) must be used to share the SYCL OpenCL context with OpenVINO.