diff --git a/README.md b/README.md index 66f2f2374..b40aaa1ee 100644 --- a/README.md +++ b/README.md @@ -13,9 +13,10 @@ Merlin Systems uses the Merlin Operator DAG API, the same API used in [NVTabular ```python import tensorflow as tf -from nvtabular.workflow import Workflow + from merlin.systems.dag import Ensemble from merlin.systems.dag.ops import PredictTensorflow, TransformWorkflow +from nvtabular.workflow import Workflow # Load saved NVTabular workflow and TensorFlow model workflow = Workflow.load(nvtabular_workflow_path) @@ -42,8 +43,10 @@ After you export your ensemble, you reference the directory to run an instance o tritonserver --model-repository=/export_path/ ``` -Refer to the [Merlin Systems Example Notebooks](./examples/) for a notebook that serves a ranking models ensemble. -The notebook shows how to deploy the ensemble and demonstrates sending requests to Triton Inference Server. +Refer to the [Merlin Example Notebooks](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples/ranking) for notebooks that demonstrate +how to train and evaluate a ranking model with Merlin Models and then serve it as an ensemble on [Triton Inference Server](https://github.com/triton-inference-server/server). + +For examples that train models with XGBoost and Implicit and then serve them with Merlin Systems, see these [examples](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples/traditional-ml). ## Building a Four-Stage Recommender Pipeline @@ -99,6 +102,9 @@ ensemble = Ensemble(ordering, request_schema) ensemble.export("./ensemble") ``` +Refer to the [Example Notebooks](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples/Building-and-deploying-multi-stage-RecSys) for the +`building-and-deploying-multi-stage-RecSys` notebooks that use Merlin Models and Merlin Systems. + ## Installation Merlin Systems requires Triton Inference Server and Tensorflow. The simplest setup is to use the [Merlin Tensorflow Inference Docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow-inference), which has both pre-installed. diff --git a/examples/README.md b/examples/README.md deleted file mode 100644 index 97774248b..000000000 --- a/examples/README.md +++ /dev/null @@ -1,55 +0,0 @@ -# Merlin Systems Example Notebook - -These Jupyter notebooks demonstrate how to use Merlin Systems to deploy models to Triton Inference Server. - -## Running the Example Notebooks - -Docker containers are available from the NVIDIA GPU Cloud. -We use the latest stable version of the [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container to run the example notebooks. To run the example notebooks using Docker containers, perform the following steps: -1. If you haven't already created a Docker volume to share models and datasets - between containers, create the volume by running the following command: - - ```shell - docker volume create merlin-examples - ``` - - For the ranking models example, note that the saved `dlrm` model that was created with Merlin Models, the NVTabular workflow, and processed synthetic parquet files should be stored in the `merlin-examples` folder so that they can be mounted to the container for performing inference with Merlin Systems. - -1. 
Pull and start the container by running the following command: - - ```shell - docker run --gpus all --rm -it \ - -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host \ - -v merlin-examples:/workspace/data \ - nvcr.io/nvidia/merlin/merlin-tensorflow:nightly /bin/bash - ``` - - > In production, instead of using the `nightly` tag, specify a release tag. - > You can find the release tags and more information on the [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow) container page. - - The container opens a shell after the run command completes. - Your shell prompt should look similar to the following example: - - ```shell - root@2efa5b50b909: - ``` - -1. Start the JupyterLab server by running the following command: - - ```shell - jupyter-lab --allow-root --ip='0.0.0.0' - ``` - - View the messages in your terminal to identify the URL for JupyterLab. - The messages in your terminal show lines similar to the following example: - - ```shell - Or copy and paste one of these URLs: - http://2efa5b50b909:8888/lab?token=9b537d1fda9e4e9cadc673ba2a472e247deee69a6229ff8d - or http://127.0.0.1:8888/lab?token=9b537d1fda9e4e9cadc673ba2a472e247deee69a6229ff8d - ``` - -1. Open a browser and use the `127.0.0.1` URL provided in the messages by JupyterLab. - -1. After you log in to JupyterLab, navigate to the `/systems/examples` directory to try out the example notebooks. diff --git a/examples/Serving-An-Implicit-Model-With-Merlin-Systems.ipynb b/examples/Serving-An-Implicit-Model-With-Merlin-Systems.ipynb deleted file mode 100644 index d645a172f..000000000 --- a/examples/Serving-An-Implicit-Model-With-Merlin-Systems.ipynb +++ /dev/null @@ -1,488 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "id": "5cdba80f", - "metadata": {}, - "outputs": [], - "source": [ - "# Copyright 2022 NVIDIA Corporation. All Rights Reserved.\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# http://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License.\n", - "# ==============================================================================\n", - "\n", - "# Each user is responsible for checking the content of datasets and the\n", - "# applicable licenses and determining if suitable for the intended use." - ] - }, - { - "cell_type": "markdown", - "id": "77acbcad", - "metadata": {}, - "source": [ - "\n", - "\n", - "# Serving an Implicit Model with Merlin Systems\n", - "\n", - "This notebook is created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow) container. This Jupyter notebook example demonstrates how to deploy an `Implicit` model to Triton Inference Server (TIS) and generate prediction results for a given query.\n", - "\n", - "## Overview\n", - "\n", - "NVIDIA Merlin is an open source framework that accelerates and scales end-to-end recommender system pipelines. 
The Merlin framework is broken up into several subcomponents, including Merlin-Core, Merlin-Models, NVTabular, and Merlin-Systems. Merlin Systems is the focus of this example.\n", - "\n", - "The purpose of the Merlin Systems library is to make it easy for Merlin users to quickly deploy their recommender systems from development to [Triton Inference Server](https://github.com/triton-inference-server/server). We extended the same user-friendly API users are accustomed to in NVTabular and leveraged it to accommodate deploying recommender system components to TIS. \n", - "\n", - "### Learning objectives\n", - "\n", - "In this notebook, we learn how to deploy an NVTabular Workflow and a trained `Implicit` model from Merlin Models to Triton.\n", - "- Create Ensemble Graph\n", - "- Export Ensemble Graph\n", - "- Run Triton server\n", - "- Send request to Triton and verify results\n", - "\n", - "### Dataset\n", - "\n", - "We use the [MovieLens 100k Dataset](https://grouplens.org/datasets/movielens/100k/). It consists of ratings a user has given a movie along with some metadata for the user and the movie. We train an Implicit model to predict the rating based on user and item features and proceed to deploy it to the Triton Inference Server.\n", - "\n", - "It is important to note that the steps taken in this notebook are generalized and can be applied to any set of workflows and models. \n", - "\n", - "### Tools\n", - "\n", - "- NVTabular\n", - "- Merlin Models\n", - "- Merlin Systems\n", - "- Triton Inference Server" - ] - }, - { - "cell_type": "markdown", - "id": "6efad6b8", - "metadata": {}, - "source": [ - "## Prerequisite: Preparing the data and Training Implicit" - ] - }, - { - "cell_type": "markdown", - "id": "356ef8c9", - "metadata": {}, - "source": [ - "In this tutorial, our objective is to demonstrate how to serve an `Implicit` model. In order for us to be able to do so, we begin by downloading data and training a model. We breeze through these activities below.\n", - "\n", - "If you would like to learn more about training an `Implicit` model using the Merlin Models library, please consult this [tutorial](https://github.com/NVIDIA-Merlin/models/blob/stable/examples/07-Train-traditional-ML-models-using-the-Merlin-Models-API.ipynb)."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "edea28d0", - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import nvtabular as nvt\n", - "import numpy as np\n", - "from merlin.schema.tags import Tags\n", - "from merlin.models.implicit import BayesianPersonalizedRanking\n", - "\n", - "from merlin.datasets.entertainment import get_movielens\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b756a12f", - "metadata": {}, - "outputs": [], - "source": [ - "ensemble_export_path = os.environ.get(\"OUTPUT_DATA_DIR\", \"ensemble\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5c10a993", - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "train, _ = get_movielens(variant='ml-100k')\n", - "\n", - "# the implicit model expects a `user_id` column hence the need to rename it\n", - "train = nvt.Dataset(train.compute().rename(columns = {'userId': 'user_id'}))\n", - "\n", - "user_id = ['user_id'] >> nvt.ops.Categorify() >> nvt.ops.TagAsUserID()\n", - "movieId = ['movieId'] >> nvt.ops.Categorify() >> nvt.ops.TagAsItemID()\n", - "\n", - "train_workflow = nvt.Workflow(user_id + movieId)\n", - "train_transformed = train_workflow.fit_transform(train)" - ] - }, - { - "cell_type": "markdown", - "id": "ff168b4a", - "metadata": {}, - "source": [ - "Having preprocessed our data, let's train our model." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "d0b55be5", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n", - "2022-09-05 09:32:07.681291: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", - "2022-09-05 09:32:07.681740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", - "2022-09-05 09:32:07.681877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", - "/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.\n", - " warnings.warn(\n", - "100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 777.52it/s, train_auc=85.42%, skipped=29.68%]\n" - ] - } - ], - "source": [ - "model = BayesianPersonalizedRanking()\n", - "model.fit(train_transformed)" - ] - }, - { - "cell_type": "markdown", - "id": "f4a3cf39", - "metadata": {}, - "source": [ - "## Create the Ensemble Graph" - ] - }, - { - "cell_type": "markdown", - "id": "dc40083e", - "metadata": {}, - "source": [ - "Let us now define an `Ensemble` that will be used for serving predictions on the Triton Inference Server.\n", - "\n", - "An `Ensemble` defines 
operations to be performed on incoming requests. It begins with specifying what fields the inference request will contain.\n", - "\n", - "Our model was trained on data that included the `movieId` column. However, in production, this information will not be available to us; it is what we will be trying to predict.\n", - "\n", - "In general, you want to define a preprocessing workflow once and apply it throughout the lifecycle of your model, from training all the way to serving in production. Redefining the workflows on the go, or using custom-written code for these operations, can be a source of subtle bugs.\n", - "\n", - "In order to ensure we process our data in the same way in production as we do in training, let us now modify the training preprocessing pipeline and use it to construct our inference workflow." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "fa8dc34a", - "metadata": {}, - "outputs": [], - "source": [ - "inf_workflow = train_workflow.remove_inputs(['movieId'])" - ] - }, - { - "cell_type": "markdown", - "id": "d71c5636", - "metadata": {}, - "source": [ - "Equipped with the modified data preprocessing workflow, let us define the full set of inference operations we will want to run on the Triton Inference Server.\n", - "\n", - "We begin by stating what data the server can expect (`inf_workflow.input_schema.column_names`). We proceed to wrap our `inf_workflow` in `TransformWorkflow` -- an operator we can leverage for executing our NVTabular workflow during serving.\n", - "\n", - "Last but not least, having received and preprocessed the data, we instruct the Triton Inference Server to perform inference using the model that we trained. " - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "de9e2237", - "metadata": {}, - "outputs": [], - "source": [ - "from merlin.systems.dag.ops.implicit import PredictImplicit\n", - "from merlin.systems.dag.ensemble import Ensemble\n", - "from merlin.systems.dag.ops.workflow import TransformWorkflow\n", - "\n", - "inf_ops = inf_workflow.input_schema.column_names >> TransformWorkflow(inf_workflow) \\\n", - "    >> PredictImplicit(model.implicit_model)" - ] - }, - { - "cell_type": "markdown", - "id": "76dad9c3", - "metadata": {}, - "source": [ - "With inference operations defined, all that remains now is outputting the ensemble to disk so that it can be loaded up when Triton starts." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "e23a7fc3", - "metadata": {}, - "outputs": [], - "source": [ - "ensemble = Ensemble(inf_ops, inf_workflow.input_schema)\n", - "ensemble.export(ensemble_export_path);" - ] - }, - { - "cell_type": "markdown", - "id": "c9165dfd", - "metadata": {}, - "source": [ - "## Starting the Triton Inference Server" - ] - }, - { - "cell_type": "markdown", - "id": "353e8602", - "metadata": {}, - "source": [ - "After we export the ensemble, we are ready to start the Triton Inference Server. The server is installed in Merlin TensorFlow and Merlin PyTorch containers. If you are not using one of our containers, then ensure it is installed in your environment. 
For more information, see the Triton Inference Server [documentation](https://github.com/triton-inference-server/server/blob/r22.03/README.md#documentation).\n", - "\n", - "You can start the server by running the following command:\n", - "\n", - "```shell\n", - "tritonserver --model-repository=ensemble\n", - "```\n", - "\n", - "For the `--model-repository` argument, specify the same value as the `export_path` that you specified previously in the `ensemble.export` method.\n", - "\n", - "After you run the `tritonserver` command, wait until your terminal shows messages like the following example:\n", - "\n", - "```shell\n", - "I0414 18:29:50.741833 4067 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001\n", - "I0414 18:29:50.742197 4067 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000\n", - "I0414 18:29:50.783470 4067 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "65b7e4e8", - "metadata": {}, - "source": [ - "## Retrieving Recommendations from Triton Inference Server\n", - "\n", - "Now that our server is running, we can send requests to it. Each request is composed of values that correspond to the request schema that was created when we exported the ensemble graph.\n", - "\n", - "In the code below, we create a request and send it to Triton. We then analyze the response to show the full experience.\n", - "\n", - "We begin by obtaining 10 examples from our train data to include in the request." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "2d61751b", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [
- "(HTML table rendering of the ten sampled user_id values: 6, 15, 70, 86, 96, 109, 143, 183, 609, 858; the same data appears in the text/plain output below)
" - ], - "text/plain": [ - " user_id\n", - "0 6\n", - "1 15\n", - "2 70\n", - "3 86\n", - "4 96\n", - "5 109\n", - "6 143\n", - "7 183\n", - "8 609\n", - "9 858" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ten_examples = train.compute()['user_id'].unique().sample(10).sort_values().to_frame().reset_index(drop=True)\n", - "ten_examples" - ] - }, - { - "cell_type": "markdown", - "id": "7808bc12", - "metadata": {}, - "source": [ - "Let's now package the information up as inputs and send it to Triton for inference." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "2fefd5b8", - "metadata": {}, - "outputs": [], - "source": [ - "from merlin.systems.triton import convert_df_to_triton_input\n", - "import tritonclient.grpc as grpcclient\n", - "\n", - "inputs = convert_df_to_triton_input(inf_workflow.input_schema, ten_examples, grpcclient.InferInput)\n", - "\n", - "outputs = [\n", - " grpcclient.InferRequestedOutput(col)\n", - " for col in inf_ops.output_schema.column_names\n", - "]\n", - "# send request to tritonserver\n", - "with grpcclient.InferenceServerClient(\"localhost:8001\") as client:\n", - " response = client.infer(\"executor_model\", inputs, outputs=outputs)" - ] - }, - { - "cell_type": "markdown", - "id": "3dc7909f", - "metadata": {}, - "source": [ - "We can now compare the predictions from the server to those from our local model." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "6ddd35cc", - "metadata": {}, - "outputs": [], - "source": [ - "predictions_from_triton = response.as_numpy(outputs[0].name())" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "6f28fdfe", - "metadata": {}, - "outputs": [], - "source": [ - "local_predictions = model.predict(inf_workflow.transform(nvt.Dataset(ten_examples)))[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "e946de27", - "metadata": {}, - "outputs": [], - "source": [ - "np.testing.assert_allclose(predictions_from_triton, local_predictions)" - ] - }, - { - "cell_type": "markdown", - "id": "d8aa4456", - "metadata": {}, - "source": [ - "We managed to preprocess the data in the same way in serving as we did during training and obtain the same predictions!" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.10" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/examples/Serving-An-XGboost-Model-With-Merlin-Systems.ipynb b/examples/Serving-An-XGboost-Model-With-Merlin-Systems.ipynb deleted file mode 100644 index 88cc2be8c..000000000 --- a/examples/Serving-An-XGboost-Model-With-Merlin-Systems.ipynb +++ /dev/null @@ -1,545 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "id": "5cdba80f", - "metadata": {}, - "outputs": [], - "source": [ - "# Copyright 2022 NVIDIA Corporation. 
All Rights Reserved.\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# http://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License.\n", - "# ==============================================================================\n", - "\n", - "# Each user is responsible for checking the content of datasets and the\n", - "# applicable licenses and determining if suitable for the intended use." - ] - }, - { - "cell_type": "markdown", - "id": "77acbcad", - "metadata": {}, - "source": [ - "\n", - "\n", - "# Serving an XGBoost Model with Merlin Systems\n", - "\n", - "This notebook is created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow) container. This Jupyter notebook example demonstrates how to deploy an `XGBoost` model to Triton Inference Server (TIS) and generate prediction results for a given query.\n", - "\n", - "## Overview\n", - "\n", - "NVIDIA Merlin is an open source framework that accelerates and scales end-to-end recommender system pipelines. The Merlin framework is broken up into several subcomponents, including Merlin-Core, Merlin-Models, NVTabular, and Merlin-Systems. Merlin Systems is the focus of this example.\n", - "\n", - "The purpose of the Merlin Systems library is to make it easy for Merlin users to quickly deploy their recommender systems from development to [Triton Inference Server](https://github.com/triton-inference-server/server). We extended the same user-friendly API users are accustomed to in NVTabular and leveraged it to accommodate deploying recommender system components to TIS. \n", - "\n", - "### Learning objectives\n", - "\n", - "In this notebook, we learn how to deploy an NVTabular Workflow and a trained XGBoost model from Merlin Models to Triton.\n", - "- Create Ensemble Graph\n", - "- Export Ensemble Graph\n", - "- Run Triton server\n", - "- Send request to Triton and verify results\n", - "\n", - "### Dataset\n", - "\n", - "We use the [MovieLens 100k Dataset](https://grouplens.org/datasets/movielens/100k/). It consists of ratings a user has given a movie along with some metadata for the user and the movie. We train an XGBoost model to predict the rating based on user and item features and proceed to deploy it to the Triton Inference Server.\n", - "\n", - "It is important to note that the steps taken in this notebook are generalized and can be applied to any set of workflows and models. \n", - "\n", - "### Tools\n", - "\n", - "- NVTabular\n", - "- Merlin Models\n", - "- Merlin Systems\n", - "- Triton Inference Server" - ] - }, - { - "cell_type": "markdown", - "id": "6efad6b8", - "metadata": {}, - "source": [ - "## Prerequisite: Preparing the data and Training XGBoost" - ] - }, - { - "cell_type": "markdown", - "id": "356ef8c9", - "metadata": {}, - "source": [ - "In this tutorial, our objective is to demonstrate how to serve an `XGBoost` model. In order for us to be able to do so, we begin by downloading data and training a model. 
We breeze through these activities below.\n", - "\n", - "If you would like to learn more about training an `XGBoost` model using the Merlin Models library, please consult this [tutorial](https://github.com/NVIDIA-Merlin/models/blob/stable/examples/07-Train-an-xgboost-model-using-the-Merlin-Models-API.ipynb)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d0385d38", - "metadata": {}, - "outputs": [], - "source": [ - "import os  # needed for os.environ below\n", - "\n", - "from merlin.core.utils import Distributed\n", - "from merlin.models.xgb import XGBoost\n", - "import nvtabular as nvt\n", - "import numpy as np\n", - "from merlin.schema.tags import Tags\n", - "\n", - "from merlin.datasets.entertainment import get_movielens" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f79d5736", - "metadata": {}, - "outputs": [], - "source": [ - "ensemble_export_path = os.environ.get(\"OUTPUT_DATA_DIR\", \"ensemble\")" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "0a2d3208", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2022-08-05 22:27:29.446602: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", - "2022-08-05 22:27:29.447091: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", - "2022-08-05 22:27:29.447227: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", - "downloading ml-100k.zip: 4.94MB [00:03, 1.45MB/s] \n", - "unzipping files: 100%|████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 262.32files/s]\n", - "INFO:merlin.datasets.entertainment.movielens.dataset:starting ETL..\n", - "/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.\n", - "  warnings.warn(\n", - "2022-08-05 22:27:39,947 - distributed.diskutils - INFO - Found stale lock file and directory '/workspace/dask-worker-space/worker-oqemvhkv', purging\n", - "2022-08-05 22:27:39,947 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n", - "[22:27:41] task [xgboost.dask]:tcp://127.0.0.1:41809 got new rank 0\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[0]\ttrain-rmse:2.36952\n", - "[20]\ttrain-rmse:0.95316\n", - "[40]\ttrain-rmse:0.92447\n", - "[60]\ttrain-rmse:0.90741\n", - "[80]\ttrain-rmse:0.89437\n", - "[84]\ttrain-rmse:0.89138\n" - ] - } - ], - "source": [ - "\n", - "train, _ = get_movielens(variant='ml-100k')\n", - "\n", - "preprocess_categories = ['movieId', 'userId', 'genres'] >> nvt.ops.Categorify(freq_threshold=2, dtype=np.int32)\n", - "preprocess_rating = ['rating'] >> nvt.ops.AddTags(tags=[Tags.TARGET, Tags.REGRESSION])\n", - "\n", - "train_workflow = nvt.Workflow(preprocess_categories + preprocess_rating + train.schema.without(['rating_binary', 'title']).column_names)\n", - "train_transformed 
= train_workflow.fit_transform(train)\n", - "\n", - "with Distributed():\n", - "    model = XGBoost(schema=train_transformed.schema)\n", - "    model.fit(\n", - "        train_transformed,\n", - "        num_boost_round=85,\n", - "        verbose_eval=20\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "f4a3cf39", - "metadata": {}, - "source": [ - "## Create the Ensemble Graph" - ] - }, - { - "cell_type": "markdown", - "id": "dc40083e", - "metadata": {}, - "source": [ - "Let us now define an `Ensemble` that will be used for serving predictions on the Triton Inference Server.\n", - "\n", - "An `Ensemble` defines operations to be performed on incoming requests. It begins with specifying what fields the inference request will contain.\n", - "\n", - "Our model was trained on data that included the `rating` target column. However, in production, this information will not be available to us.\n", - "\n", - "In general, you want to define a preprocessing workflow once and apply it throughout the lifecycle of your model, from training all the way to serving in production. Redefining the workflows on the go, or using custom-written code for these operations, can be a source of subtle bugs.\n", - "\n", - "In order to ensure we process our data in the same way in production as we do in training, let us now modify the training preprocessing pipeline and use it to construct our inference workflow." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "fa8dc34a", - "metadata": {}, - "outputs": [], - "source": [ - "inf_workflow = train_workflow.remove_inputs(['rating'])" - ] - }, - { - "cell_type": "markdown", - "id": "d71c5636", - "metadata": {}, - "source": [ - "Equipped with the modified data preprocessing workflow, let us define the full set of inference operations we will want to run on the Triton Inference Server.\n", - "\n", - "We begin by stating what data the server can expect (`inf_workflow.input_schema.column_names`). We proceed to wrap our `inf_workflow` in `TransformWorkflow` -- an operator we can leverage for executing our NVTabular workflow during serving.\n", - "\n", - "Last but not least, having received and preprocessed the data, we instruct the Triton Inference Server to perform inference using the model that we trained. " - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "de9e2237", - "metadata": {}, - "outputs": [], - "source": [ - "from merlin.systems.dag.ops.fil import PredictForest\n", - "from merlin.systems.dag.ensemble import Ensemble\n", - "from merlin.systems.dag.ops.workflow import TransformWorkflow\n", - "\n", - "inf_ops = inf_workflow.input_schema.column_names >> TransformWorkflow(inf_workflow) \\\n", - "    >> PredictForest(model, inf_workflow.output_schema)" - ] - }, - { - "cell_type": "markdown", - "id": "76dad9c3", - "metadata": {}, - "source": [ - "With inference operations defined, all that remains now is outputting the ensemble to disk so that it can be loaded up when Triton starts." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "e23a7fc3", - "metadata": {}, - "outputs": [], - "source": [ - "ensemble = Ensemble(inf_ops, inf_workflow.input_schema)\n", - "ensemble.export(ensemble_export_path);" - ] - }, - { - "cell_type": "markdown", - "id": "c9165dfd", - "metadata": {}, - "source": [ - "## Starting the Triton Inference Server" - ] - }, - { - "cell_type": "markdown", - "id": "353e8602", - "metadata": {}, - "source": [ - "After we export the ensemble, we are ready to start the Triton Inference Server. 
The server is installed in Merlin TensorFlow and Merlin PyTorch containers. If you are not using one of our containers, then ensure it is installed in your environment. For more information, see the Triton Inference Server [documentation](https://github.com/triton-inference-server/server/blob/r22.03/README.md#documentation).\n", - "\n", - "You can start the server by running the following command:\n", - "\n", - "```shell\n", - "tritonserver --model-repository=ensemble\n", - "```\n", - "\n", - "For the `--model-repository` argument, specify the same value as the `export_path` that you specified previously in the `ensemble.export` method.\n", - "\n", - "After you run the `tritonserver` command, wait until your terminal shows messages like the following example:\n", - "\n", - "```shell\n", - "I0414 18:29:50.741833 4067 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001\n", - "I0414 18:29:50.742197 4067 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000\n", - "I0414 18:29:50.783470 4067 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "65b7e4e8", - "metadata": {}, - "source": [ - "## Retrieving Recommendations from Triton Inference Server\n", - "\n", - "Now that our server is running, we can send requests to it. Each request is composed of values that correspond to the request schema that was created when we exported the ensemble graph.\n", - "\n", - "In the code below, we create a request and send it to Triton. We then analyze the response to show the full experience.\n", - "\n", - "We begin by obtaining 10 examples from our train data to include in the request." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "2d61751b", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [
- "(HTML table rendering of the first five training rows; columns: movieId, userId, genres, TE_movieId_rating, userId_count, gender, zip_code, rating, rating_binary, age, title. The same data appears in the text/plain output below)
" - ], - "text/plain": [ - " movieId userId genres TE_movieId_rating userId_count gender zip_code \\\n", - "0 7 77 43 0.779876 5.572154 1 77 \n", - "1 231 77 13 -0.896619 5.572154 1 77 \n", - "2 366 77 17 -0.954632 5.572154 1 77 \n", - "3 96 77 89 -0.093809 5.572154 1 77 \n", - "4 383 77 25 -0.539376 5.572154 1 77 \n", - "\n", - " rating rating_binary age title \n", - "0 5 1 1 Toy Story (1995) \n", - "1 3 0 1 GoldenEye (1995) \n", - "2 4 1 1 Four Rooms (1995) \n", - "3 3 0 1 Get Shorty (1995) \n", - "4 3 0 1 Copycat (1995) " - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ten_examples = train.compute()\n", - "ten_examples.head()" - ] - }, - { - "cell_type": "markdown", - "id": "7808bc12", - "metadata": {}, - "source": [ - "Let's now package the information up as inputs and send it to Triton for inference." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "2fefd5b8", - "metadata": {}, - "outputs": [], - "source": [ - "from merlin.systems.triton import convert_df_to_triton_input\n", - "import tritonclient.grpc as grpcclient\n", - "\n", - "ten_examples = train.compute().drop(columns=['rating', 'title', 'rating_binary'])[:10]\n", - "inputs = convert_df_to_triton_input(inf_workflow.input_schema, ten_examples, grpcclient.InferInput)\n", - "\n", - "outputs = [\n", - " grpcclient.InferRequestedOutput(col)\n", - " for col in inf_ops.output_schema.column_names\n", - "]\n", - "# send request to tritonserver\n", - "with grpcclient.InferenceServerClient(\"localhost:8001\") as client:\n", - " response = client.infer(\"executor_model\", inputs, outputs=outputs)" - ] - }, - { - "cell_type": "markdown", - "id": "3dc7909f", - "metadata": {}, - "source": [ - "We can now compare the predictions from the server to those from our local model." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "6ddd35cc", - "metadata": {}, - "outputs": [], - "source": [ - "predictions_from_triton = response.as_numpy(outputs[0].name())" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "6f28fdfe", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.\n", - "Perhaps you already have a cluster running?\n", - "Hosting the HTTP server on port 35647 instead\n", - " warnings.warn(\n", - "2022-08-05 22:28:22,197 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n" - ] - } - ], - "source": [ - "with Distributed():\n", - " local_predictions = model.predict(train_transformed)[:10]" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "e946de27", - "metadata": {}, - "outputs": [], - "source": [ - "assert np.allclose(predictions_from_triton, local_predictions)" - ] - }, - { - "cell_type": "markdown", - "id": "d8aa4456", - "metadata": {}, - "source": [ - "We managed to preprocess the data in the same way in serving as we did during training and obtain the same predictions!" 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.10" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/examples/Serving-Ranking-Models-With-Merlin-Systems.ipynb b/examples/Serving-Ranking-Models-With-Merlin-Systems.ipynb deleted file mode 100644 index aaed78632..000000000 --- a/examples/Serving-Ranking-Models-With-Merlin-Systems.ipynb +++ /dev/null @@ -1,1168 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "id": "620dd990", - "metadata": {}, - "outputs": [], - "source": [ - "# Copyright 2022 NVIDIA Corporation. All Rights Reserved.\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# http://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License.\n", - "# ==============================================================================\n", - "\n", - "# Each user is responsible for checking the content of datasets and the\n", - "# applicable licenses and determining if suitable for the intended use." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "f550a0e5", - "metadata": {}, - "source": [ - "\n", - "\n", - "# Serving Ranking Models With Merlin Systems\n", - "\n", - "This notebook is created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container. This Jupyter notebook example demonstrates how to deploy a ranking model to Triton Inference Server (TIS) and generate prediction results for a given query. As a prerequisite, the ranking model must be trained and saved with Merlin Models. Please read the [README](https://github.com/NVIDIA-Merlin/systems/blob/stable/examples/README.md) for the instructions.\n", - "\n", - "## Overview\n", - "\n", - "NVIDIA Merlin is an open source framework that accelerates and scales end-to-end recommender system pipelines. The Merlin framework is broken up into several subcomponents, including Merlin-Core, Merlin-Models, NVTabular, and Merlin-Systems. Merlin Systems is the focus of this example.\n", - "\n", - "The purpose of the Merlin Systems library is to make it easy for Merlin users to quickly deploy their recommender systems from development to [Triton Inference Server](https://github.com/triton-inference-server/server). We extended the same user-friendly API users are accustomed to in NVTabular and leveraged it to accommodate deploying recommender system components to TIS. \n", - "\n", - "There are some points we need to ensure before we continue with this notebook. Please ensure you have a working NVTabular workflow and model stored in an accessible location. 
Merlin Systems takes the data preprocessing workflow defined in NVTabular and loads that into Triton Inference Server as a model. Subsequently, it does the same for the trained model. Let's take a closer look at how Merlin Systems makes deploying to TIS simple and effortless in the rest of this notebook.\n", - "\n", - "### Learning objectives\n", - "\n", - "In this notebook, we learn how to deploy an NVTabular Workflow and a trained TensorFlow model from Merlin Models to Triton.\n", - "- Load NVTabular Workflow\n", - "- Load Pre-trained Merlin Models model\n", - "- Create Ensemble Graph\n", - "- Export Ensemble Graph\n", - "- Run Tritonserver\n", - "- Send Request to Tritonserver\n", - "\n", - "### Dataset\n", - "\n", - "We use the synthetic train and test datasets generated by mimicking the real [Ali-CCP: Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/dataset/dataDetail?dataId=408#1) dataset to build our recommender system ranking models. To see how the data is transformed with NVTabular and how a DLRM model is trained with Merlin Models, check out the [04-Exporting-ranking-models.ipynb](https://github.com/NVIDIA-Merlin/models/blob/stable/examples/04-Exporting-ranking-models.ipynb) example notebook, which is a prerequisite for this notebook.\n", - "\n", - "It is important to note that the steps taken in this notebook are generalized and can be applied to any set of workflows and models. \n", - "\n", - "### Tools\n", - "\n", - "- NVTabular\n", - "- Merlin Models\n", - "- Merlin Systems\n", - "- Triton Inference Server" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "1346eec7-48d9-41b3-a3f7-c88230a6f24c", - "metadata": {}, - "source": [ - "## Install Required Libraries\n", - "\n", - "Install TensorFlow so we can read the saved model from disk." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "3a0b7e5d-197b-42ec-9213-5582a85d140e", - "metadata": {}, - "outputs": [], - "source": [ - "#!pip install tensorflow-gpu" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "222fde5f", - "metadata": {}, - "source": [ - "## Load an NVTabular Workflow\n", - "\n", - "First, we load the `nvtabular.Workflow` that we created with this [example](https://github.com/NVIDIA-Merlin/models/blob/stable/examples/04-Exporting-ranking-models.ipynb). " - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "b1dc4a71", - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "os.environ[\"TF_GPU_ALLOCATOR\"]=\"cuda_malloc_async\"\n", - "from nvtabular.workflow import Workflow\n", - "\n", - "input_path = os.environ.get(\"INPUT_FOLDER\", \"/workspace/data/\")\n", - "\n", - "workflow_stored_path = os.path.join(input_path, \"workflow\")\n", - "\n", - "workflow = Workflow.load(workflow_stored_path)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "746fba56-36af-44a3-8beb-34ca026648be", - "metadata": {}, - "source": [ - "After we load the workflow, we remove the label columns from its inputs. This removes all columns with the `TARGET` tag from the workflow. We do this because we need to set the workflow to only require the features needed to predict, not train, when creating an inference pipeline."
- ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "bcef202e-fa6f-4c23-aca6-0b965faf637e", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from merlin.schema.tags import Tags\n", - "\n", - "label_columns = workflow.output_schema.select_by_tag(Tags.TARGET).column_names\n", - "workflow.remove_inputs(label_columns)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "143795f2", - "metadata": {}, - "source": [ - "## Load the TensorFlow Model\n", - "\n", - "After loading the workflow, we load the model. This model was trained with the output of the workflow from the [Exporting Ranking Models](https://github.com/NVIDIA-Merlin/models/blob/stable/examples/04-Exporting-ranking-models.ipynb) example from Merlin Models." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "bd5b046f", - "metadata": {}, - "source": [ - "First, we need to import the Merlin Models library. Loading a TensorFlow model, which is based on custom subclasses, requires the subclass definition. Otherwise, TensorFlow cannot correctly load the model." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "ffca456b", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2022-07-12 17:18:23.722737: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX\n", - "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", - "2022-07-12 17:18:25.872447: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:214] Using CUDA malloc Async allocator for GPU: 0\n", - "2022-07-12 17:18:25.872791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16255 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB-LS, pci bus id: 0000:89:00.0, compute capability: 7.0\n" - ] - } - ], - "source": [ - "import merlin.models.tf as mm" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "8da5e606", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2022-07-12 17:18:27.313679: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.\n" - ] - } - ], - "source": [ - "import tensorflow as tf\n", - "tf_model_path = os.path.join(input_path, \"dlrm\")\n", - "\n", - "model = tf.keras.models.load_model(tf_model_path)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "53908458", - "metadata": { - "tags": [] - }, - "source": [ - "## Create the Ensemble Graph\n", - "\n", - "After we have both the model and the workflow loaded, we can create the ensemble graph. The goal of the graph is to illustrate the path of data through your full system. In this example we only serve a workflow with a model, but you can add other components that help you meet your business logic requirements.\n", - "\n", - "Because this example has two components—a model and a workflow—we require two operators. 
These operators, also known as inference operators, are meant to abstract away all the \"hard parts\" of loading a specific component, such as a workflow or model, into Triton Inference Server. \n", - "\n", - "The following code block shows how to use two inference operators:\n", - "\n", - "
\n", - "- `TransformWorkflow`: This operator ensures that the workflow is correctly saved and packaged with the required config so the server will know how to load it.\n", - "- `PredictTensorflow`: This operator does the same for the TensorFlow model that we loaded earlier.
\n", - "\n", - "Let's give it a try." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "f80e5cc8", - "metadata": {}, - "outputs": [], - "source": [ - "from merlin.systems.dag.ops.workflow import TransformWorkflow\n", - "from merlin.systems.dag.ops.tensorflow import PredictTensorflow\n", - "\n", - "serving_operators = workflow.input_schema.column_names >> TransformWorkflow(workflow) >> PredictTensorflow(model)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "8b916afa", - "metadata": {}, - "source": [ - "## Export Graph as Ensemble\n", - "\n", - "The last step is to create the ensemble artifacts that Triton Inference Server can consume.\n", - "To make these artifacts, we import the `Ensemble` class.\n", - "The class is responsible for interpreting the graph and exporting the correct files for the server.\n", - "\n", - "After you run the following cell, you'll see the workflow's output `Schema`, which is made up of `ColumnSchema` objects that describe each column.\n", - "\n", - "When you are creating an `Ensemble` object you supply the graph and a schema representing the starting input of the graph. The inputs to the ensemble graph are the inputs to the first operator of your graph. \n", - "\n", - "After you have created the `Ensemble` you export the graph, supplying an export path for the `Ensemble.export` function.\n", - "\n", - "This returns an ensemble config, which represents the entire inference pipeline, and a list of node-specific configs.\n", - "\n", - "Let's take a look below." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "2c9a7cd2-e14e-4b37-b0af-3dc3931c9ff9", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
nametagsdtypeis_listis_raggedproperties.num_bucketsproperties.freq_thresholdproperties.max_sizeproperties.start_indexproperties.cat_pathproperties.domain.minproperties.domain.maxproperties.domain.nameproperties.embedding_sizes.cardinalityproperties.embedding_sizes.dimension
0user_id(Tags.USER, Tags.USER_ID, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_id.parquet0755user_id75565
1item_id(Tags.ITEM_ID, Tags.ITEM, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.item_id.parquet0772item_id77266
2item_category(Tags.ITEM, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.item_category.parquet0772item_category77266
3item_shop(Tags.ITEM, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.item_shop.parquet0772item_shop77266
4item_brand(Tags.ITEM, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.item_brand.parquet0772item_brand77266
5user_shops(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_shops.parquet0755user_shops75565
6user_profile(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_profile.parquet067user_profile6717
7user_group(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_group.parquet013user_group1316
8user_gender(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_gender.parquet03user_gender316
9user_age(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_age.parquet08user_age816
10user_consumption_2(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_consumption_2.parquet04user_consumption_2416
11user_is_occupied(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_is_occupied.parquet03user_is_occupied316
12user_geography(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_geography.parquet05user_geography516
13user_intentions(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_intentions.parquet0755user_intentions75565
14user_brands(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_brands.parquet0755user_brands75565
15user_categories(Tags.USER, Tags.CATEGORICAL)int64FalseFalseNone000.//categories/unique.user_categories.parquet0755user_categories75565
\n", - "
" - ], - "text/plain": [ - "[{'name': 'user_id', 'tags': {, , }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_id.parquet', 'domain': {'min': 0, 'max': 755, 'name': 'user_id'}, 'embedding_sizes': {'cardinality': 755, 'dimension': 65}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'item_id', 'tags': {, , }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.item_id.parquet', 'domain': {'min': 0, 'max': 772, 'name': 'item_id'}, 'embedding_sizes': {'cardinality': 772, 'dimension': 66}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'item_category', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.item_category.parquet', 'domain': {'min': 0, 'max': 772, 'name': 'item_category'}, 'embedding_sizes': {'cardinality': 772, 'dimension': 66}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'item_shop', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.item_shop.parquet', 'domain': {'min': 0, 'max': 772, 'name': 'item_shop'}, 'embedding_sizes': {'cardinality': 772, 'dimension': 66}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'item_brand', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.item_brand.parquet', 'domain': {'min': 0, 'max': 772, 'name': 'item_brand'}, 'embedding_sizes': {'cardinality': 772, 'dimension': 66}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_shops', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_shops.parquet', 'domain': {'min': 0, 'max': 755, 'name': 'user_shops'}, 'embedding_sizes': {'cardinality': 755, 'dimension': 65}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_profile', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_profile.parquet', 'domain': {'min': 0, 'max': 67, 'name': 'user_profile'}, 'embedding_sizes': {'cardinality': 67, 'dimension': 17}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_group', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_group.parquet', 'domain': {'min': 0, 'max': 13, 'name': 'user_group'}, 'embedding_sizes': {'cardinality': 13, 'dimension': 16}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_gender', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_gender.parquet', 'domain': {'min': 0, 'max': 3, 'name': 'user_gender'}, 'embedding_sizes': {'cardinality': 3, 'dimension': 16}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_age', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_age.parquet', 'domain': {'min': 0, 'max': 8, 'name': 'user_age'}, 'embedding_sizes': {'cardinality': 8, 'dimension': 
16}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_consumption_2', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_consumption_2.parquet', 'domain': {'min': 0, 'max': 4, 'name': 'user_consumption_2'}, 'embedding_sizes': {'cardinality': 4, 'dimension': 16}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_is_occupied', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_is_occupied.parquet', 'domain': {'min': 0, 'max': 3, 'name': 'user_is_occupied'}, 'embedding_sizes': {'cardinality': 3, 'dimension': 16}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_geography', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_geography.parquet', 'domain': {'min': 0, 'max': 5, 'name': 'user_geography'}, 'embedding_sizes': {'cardinality': 5, 'dimension': 16}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_intentions', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_intentions.parquet', 'domain': {'min': 0, 'max': 755, 'name': 'user_intentions'}, 'embedding_sizes': {'cardinality': 755, 'dimension': 65}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_brands', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_brands.parquet', 'domain': {'min': 0, 'max': 755, 'name': 'user_brands'}, 'embedding_sizes': {'cardinality': 755, 'dimension': 65}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'user_categories', 'tags': {, }, 'properties': {'num_buckets': None, 'freq_threshold': 0, 'max_size': 0, 'start_index': 0, 'cat_path': './/categories/unique.user_categories.parquet', 'domain': {'min': 0, 'max': 755, 'name': 'user_categories'}, 'embedding_sizes': {'cardinality': 755, 'dimension': 65}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}]" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "workflow.output_schema" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c102204b", - "metadata": {}, - "outputs": [], - "source": [ - "from merlin.systems.dag.ensemble import Ensemble\n", - "import numpy as np\n", - "\n", - "ensemble = Ensemble(serving_operators, workflow.input_schema)\n", - "\n", - "export_path = os.path.join(input_path, \"ensemble\")\n", - "\n", - "ens_conf, node_confs = ensemble.export(export_path)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "4edcbd33-f845-4921-b295-46ad366da722", - "metadata": {}, - "source": [ - "Display the path to the directory with the ensemble." 
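- "\n", - "The `export` call above also returned the Triton configs that it wrote: `ens_conf` for the ensemble as a whole and `node_confs` for the individual nodes. As a quick sanity check (a sketch; the expected directory names are shown in the next section), you can confirm that a `config.pbtxt` landed on disk for each exported model:\n", - "\n", - "```python\n", - "import os\n", - "\n", - "# Each exported model directory should contain a config.pbtxt\n", - "for name in sorted(os.listdir(export_path)):\n", - "    print(name, os.path.exists(os.path.join(export_path, name, \"config.pbtxt\")))\n", - "```"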
- ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "64cc2ad6-78c7-47cf-b6bb-2de84a09f97e", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'/models/examples/ensemble'" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "export_path" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "96aa55fb", - "metadata": {}, - "source": [ - "## Verification of Ensemble Artifacts\n", - "\n", - "After we export the ensemble, we can check the export path for the graph's artifacts. Each directory name is an ordering number followed by an operator identifier, such as `0_transformworkflow` and `1_predicttensorflow`.\n", - "\n", - "Inside each of those directories, the `export` method writes a `config.pbtxt` file and a numbered directory. The number indicates the model version and begins at 1. The artifacts for each operator are stored inside its version directory and vary depending on the operator in use.\n", - "\n", - "Install the `seedir` Python package so we can view the directory contents." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "2b48b3e0-ca84-4e63-90d3-f507f4f99ff8", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: seedir in /usr/local/lib/python3.8/dist-packages (0.3.1)\r\n", - "Requirement already satisfied: natsort in /usr/local/lib/python3.8/dist-packages (from seedir) (8.1.0)\r\n", - "Requirement already satisfied: emoji in /usr/local/lib/python3.8/dist-packages (from seedir) (1.7.0)\r\n" - ] - } - ], - "source": [ - "# install seedir\n", - "!pip install seedir" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "ce10c04f-e3a4-4886-a5de-644dfe41c6e1", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ensemble/\n", - "├─0_transformworkflow/\n", - "│ ├─1/\n", - "│ │ ├─__pycache__/\n", - "│ │ ├─model.py\n", - "│ │ └─workflow/\n", - "│ └─config.pbtxt\n", - "├─1_predicttensorflow/\n", - "│ ├─1/\n", - "│ │ └─model.savedmodel/\n", - "│ └─config.pbtxt\n", - "└─ensemble_model/\n", - " ├─1/\n", - " └─config.pbtxt\n" - ] - } - ], - "source": [ - "import seedir as sd\n", - "\n", - "sd.seedir(export_path, style='lines', itemlimit=10, depthlimit=3, exclude_folders='.ipynb_checkpoints', sort=True)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "0b99146c", - "metadata": {}, - "source": [ - "## Starting Triton Inference Server\n", - "\n", - "After we export the ensemble, we are ready to start the Triton Inference Server. The server is installed in all the Merlin inference containers. If you are not using one of our containers, ensure it is installed in your environment. For more information, see the Triton Inference Server [documentation](https://github.com/triton-inference-server/server/blob/r22.03/README.md#documentation).\n",
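- "\n", - "If you are not sure whether the server binary is available in your environment, a quick check (the same check the integration tests perform with `shutil.which(\"tritonserver\")`) is:\n", - "\n", - "```shell\n", - "which tritonserver\n", - "```\n", - "\n",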
\n", - "\n", - "You can start the server by running the following command:\n", - "\n", - "```shell\n", - "tritonserver --model-repository=/workspace/data/ensemble --backend-config=tensorflow,version=2\n", - "```\n", - "\n", - "For the `--model-repository` argument, specify the same value as the `export_path` that you specified previously in the `ensemble.export` method.\n", - "\n", - "After you run the `tritonserver` command, wait until your terminal shows messages like the following example:\n", - "\n", - "```shell\n", - "I0414 18:29:50.741833 4067 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001\n", - "I0414 18:29:50.742197 4067 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000\n", - "I0414 18:29:50.783470 4067 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002\n", - "```" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "c75e2eb9", - "metadata": {}, - "source": [ - "## Retrieving Recommendations from Triton Inference Server\n", - "\n", - "Now that our server is running, we can send requests to it. This request is composed of values that correspond to the request schema that was created when we exported the ensemble graph.\n", - "\n", - "In the code below we create a request to send to triton and send it. We will then analyze the response, to show the full experience.\n", - "\n", - "First we need to ensure that we have a client connected to the server that we started. To do this, we use the Triton HTTP client library." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "21a693ab-0ee3-443e-8ad4-fad030ec1985", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "client created.\n" - ] - } - ], - "source": [ - "import tritonclient.http as client\n", - "\n", - "# Create a triton client\n", - "try:\n", - " triton_client = client.InferenceServerClient(url=\"localhost:8000\", verbose=True)\n", - " print(\"client created.\")\n", - "except Exception as e:\n", - " print(\"channel creation failed: \" + str(e))" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "358bb59b-796a-4e1e-8e02-b7d3b86c27f9", - "metadata": {}, - "source": [ - "After we create the client and verified it is connected to the server instance, we can communicate with the server and ensure all the models are loaded correctly." 
- ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "ae48a6ee-d585-421b-bb5a-1a5c2edca058", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "GET /v2/health/live, headers None\n", - "\n", - "POST /v2/repository/index, headers None\n", - "\n", - "\n", - "bytearray(b'[{\"name\":\"0_transformworkflow\",\"version\":\"1\",\"state\":\"READY\"},{\"name\":\"1_predicttensorflow\",\"version\":\"1\",\"state\":\"READY\"},{\"name\":\"ensemble_model\",\"version\":\"1\",\"state\":\"READY\"}]')\n" - ] - }, - { - "data": { - "text/plain": [ - "[{'name': '0_transformworkflow', 'version': '1', 'state': 'READY'},\n", - " {'name': '1_predicttensorflow', 'version': '1', 'state': 'READY'},\n", - " {'name': 'ensemble_model', 'version': '1', 'state': 'READY'}]" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Ensure Triton is in a good state\n", - "triton_client.is_server_live()\n", - "triton_client.get_model_repository_index()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "dfbc6f01-5ccf-43a2-ae4f-8451a420fc70", - "metadata": {}, - "source": [ - "After verifying that the server loaded all the models correctly, we take a few rows of the original validation data and send them to the server as an inference request. The `valid` folder used here was generated by running the `04-Exporting-ranking-models.ipynb` notebook.\n", - "\n", - "> The `df_lib` object is `cudf` if a GPU is available and `pandas` otherwise." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "3235eb79-494a-4d92-b3d7-b74091c4e28b", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
user_iditem_iditem_categoryitem_shopitem_branduser_shopsuser_profileuser_groupuser_genderuser_ageuser_consumption_2user_is_occupieduser_geographyuser_intentionsuser_brandsuser_categories
__null_dask_index__
700000555191305450298131111118631482156
700001538332283787287131111118311427150
700002106241631562497111111114424726
\n", - "
" - ], - "text/plain": [ - " user_id item_id item_category item_shop item_brand \\\n", - "__null_dask_index__ \n", - "700000 55 5 19 1305 450 \n", - "700001 53 8 33 2283 787 \n", - "700002 10 6 24 1631 562 \n", - "\n", - " user_shops user_profile user_group user_gender \\\n", - "__null_dask_index__ \n", - "700000 2981 3 1 1 \n", - "700001 2871 3 1 1 \n", - "700002 497 1 1 1 \n", - "\n", - " user_age user_consumption_2 user_is_occupied \\\n", - "__null_dask_index__ \n", - "700000 1 1 1 \n", - "700001 1 1 1 \n", - "700002 1 1 1 \n", - "\n", - " user_geography user_intentions user_brands \\\n", - "__null_dask_index__ \n", - "700000 1 863 1482 \n", - "700001 1 831 1427 \n", - "700002 1 144 247 \n", - "\n", - " user_categories \n", - "__null_dask_index__ \n", - "700000 156 \n", - "700001 150 \n", - "700002 26 " - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from merlin.core.dispatch import get_lib\n", - "\n", - "df_lib = get_lib()\n", - "\n", - "original_data_path = os.environ.get(\"INPUT_FOLDER\", \"/workspace/data/\")\n", - "\n", - "# read in data for request\n", - "batch = df_lib.read_parquet(\n", - " os.path.join(original_data_path,\"valid\", \"part.0.parquet\"), columns=workflow.input_schema.column_names\n", - ").head(3)\n", - "batch" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "9691d28f-80d8-4520-b6e6-b7e260c50380", - "metadata": {}, - "source": [ - "After we isolate our `batch`, we convert the dataframe representation into inputs for Triton. We also declare the outputs that we expect to receive from the model." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "fe53865c-9099-478c-8a30-17140ec1c120", - "metadata": {}, - "outputs": [], - "source": [ - "from merlin.systems.triton import convert_df_to_triton_input\n", - "import tritonclient.grpc as grpcclient\n", - "# create inputs and outputs\n", - "\n", - "inputs = convert_df_to_triton_input(workflow.input_schema, batch, grpcclient.InferInput)\n", - "\n", - "output_cols = ensemble.graph.output_schema.column_names\n", - "\n", - "outputs = [\n", - " grpcclient.InferRequestedOutput(col)\n", - " for col in output_cols\n", - "]" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "50fb442d-d918-4b1a-a825-dfce4f75bc02", - "metadata": {}, - "source": [ - "Now that our `inputs` and `outputs` are created, we can use the `triton_client` that we created earlier to send the inference request." - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "1c9031d7-a242-4278-808f-26d307e6a0db", - "metadata": {}, - "outputs": [], - "source": [ - "# send request to tritonserver\n", - "with grpcclient.InferenceServerClient(\"localhost:8001\") as client:\n", - " response = client.infer(\"executor_model\", inputs, request_id=\"1\", outputs=outputs)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "10c00a98-a30e-45ca-9ab6-667621a20c75", - "metadata": {}, - "source": [ - "When the server completes the inference request, it returns a response. This response is parsed to get the desired predictions." 
- ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "4700c280-73e9-400a-ae9a-ee723789c96d", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "click/binary_classification_task [[0.49801633]\n", - " [0.49624982]\n", - " [0.49931964]] (3, 1)\n" - ] - } - ], - "source": [ - "# access individual response columns to get values back.\n", - "for col in ensemble.graph.output_schema.column_names:\n", - " print(col, response.as_numpy(col), response.as_numpy(col).shape)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "f37053aa-9a01-40ce-ac81-e6edf266f2af", - "metadata": {}, - "source": [ - "## Summary\n", - "\n", - "This sample notebook started with an exported DLRM model and workflow. We saw how to create an ensemble graph, verify the ensemble artifacts in the file system, and then put the ensemble into production with Triton Inference Server. Finally, we sent a simple inference request to the server and printed the response." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.10" - }, - "merlin": { - "containers": [ - "nvcr.io/nvidia/merlin/merlin-tensorflow:latest" - ] - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tests/integration/examples/test_serving_an_implicit_model_with_merlin_systems.py b/tests/integration/examples/test_serving_an_implicit_model_with_merlin_systems.py deleted file mode 100644 index 95a0d4e63..000000000 --- a/tests/integration/examples/test_serving_an_implicit_model_with_merlin_systems.py +++ /dev/null @@ -1,59 +0,0 @@ -import shutil - -import pytest -from testbook import testbook - -from merlin.systems.triton.utils import run_triton_server -from merlin.core.compat import cudf -from tests.conftest import REPO_ROOT - -pytest.importorskip("implicit") -pytest.importorskip("merlin.models") - - -if cudf: - - _TRAIN_ON_GPU = [True, False] -else: - _TRAIN_ON_GPU = [False] - -TRITON_SERVER_PATH = shutil.which("tritonserver") - - -@pytest.mark.notebook -@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found") -@pytest.mark.parametrize("gpu", _TRAIN_ON_GPU) -def test_example_serving_implicit(gpu, tmpdir): - with testbook( - REPO_ROOT / "examples/Serving-An-Implicit-Model-With-Merlin-Systems.ipynb", - execute=False, - timeout=180, - ) as tb: - tb.inject( - f""" - import os - os.environ["OUTPUT_DATA_DIR"] = "{tmpdir}/ensemble" - os.environ["USE_GPU"] = "{int(gpu)}" - from unittest.mock import patch - from merlin.datasets.synthetic import generate_data - mock_train, mock_valid = generate_data( - input="movielens-100k", - num_rows=1000, - set_sizes=(0.8, 0.2) - ) - p1 = patch( - "merlin.datasets.entertainment.get_movielens", - return_value=[mock_train, mock_valid] - ) - p1.start() - """, - pop=True, - ) - - tb.execute_cell(list(range(0, 18))) - - with run_triton_server(f"{tmpdir}/ensemble", grpc_port=8001): - tb.execute_cell(list(range(18, len(tb.cells) - 2))) - pft = tb.ref("predictions_from_triton") - lp = tb.ref("local_predictions") - assert pft.shape == lp.shape diff --git a/tests/integration/examples/test_serving_an_xgboost_model_with_merlin_systems.py b/tests/integration/examples/test_serving_an_xgboost_model_with_merlin_systems.py 
deleted file mode 100644 index dbbeae0c4..000000000 --- a/tests/integration/examples/test_serving_an_xgboost_model_with_merlin_systems.py +++ /dev/null @@ -1,50 +0,0 @@ -import shutil - -import pytest -from testbook import testbook - -from merlin.systems.triton.utils import run_triton_server -from tests.conftest import REPO_ROOT - -pytest.importorskip("tensorflow") -pytest.importorskip("merlin.models") -pytest.importorskip("xgboost") - -TRITON_SERVER_PATH = shutil.which("tritonserver") - - -@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found") -@pytest.mark.notebook -def test_example_serving_xgboost(tmpdir): - with testbook( - REPO_ROOT / "examples/Serving-An-XGboost-Model-With-Merlin-Systems.ipynb", - execute=False, - timeout=180, - ) as tb: - tb.inject( - f""" - import os - os.environ["OUTPUT_DATA_DIR"] = "{tmpdir}/ensemble" - from unittest.mock import patch - from merlin.datasets.synthetic import generate_data - mock_train, mock_valid = generate_data( - input="movielens-100k", - num_rows=1000, - set_sizes=(0.8, 0.2) - ) - p1 = patch( - "merlin.datasets.entertainment.get_movielens", - return_value=[mock_train, mock_valid] - ) - p1.start() - """ - ) - NUM_OF_CELLS = len(tb.cells) - - tb.execute_cell(list(range(0, 14))) - - with run_triton_server(f"{tmpdir}/ensemble", grpc_port=8001): - tb.execute_cell(list(range(14, NUM_OF_CELLS - 1))) - pft = tb.ref("predictions_from_triton") - lp = tb.ref("local_predictions") - assert pft.shape == lp.shape diff --git a/tests/integration/examples/test_serving_ranking_models_with_merlin_systems.py b/tests/integration/examples/test_serving_ranking_models_with_merlin_systems.py deleted file mode 100644 index 2cdbde419..000000000 --- a/tests/integration/examples/test_serving_ranking_models_with_merlin_systems.py +++ /dev/null @@ -1,116 +0,0 @@ -import os - -import pytest -from testbook import testbook - -from tests.conftest import REPO_ROOT - -pytest.importorskip("cudf") -pytest.importorskip("tensorflow") -pytest.importorskip("merlin.models") - - -@pytest.mark.notebook -@testbook(REPO_ROOT / "examples/Serving-Ranking-Models-With-Merlin-Systems.ipynb", execute=False) -def test_example_04_exporting_ranking_models(tb): - import numpy as np - import tensorflow as tf - - import merlin.models.tf as mm - import nvtabular as nvt - from merlin.datasets.synthetic import generate_data - from merlin.io.dataset import Dataset - from merlin.schema import Schema, Tags - - DATA_FOLDER = "/tmp/data/" - NUM_ROWS = 1000000 - BATCH_SIZE = 512 - train, valid = generate_data("aliccp-raw", int(NUM_ROWS), set_sizes=(0.7, 0.3)) - train.to_ddf().to_parquet(os.path.join(DATA_FOLDER, "train")) - valid.to_ddf().to_parquet(os.path.join(DATA_FOLDER, "valid")) - train_path = os.path.join(DATA_FOLDER, "train", "*.parquet") - valid_path = os.path.join(DATA_FOLDER, "valid", "*.parquet") - output_path = os.path.join(DATA_FOLDER, "processed") - user_id = ["user_id"] >> nvt.ops.Categorify() >> nvt.ops.TagAsUserID() - item_id = ["item_id"] >> nvt.ops.Categorify() >> nvt.ops.TagAsItemID() - targets = ["click"] >> nvt.ops.AddMetadata(tags=[Tags.BINARY_CLASSIFICATION, "target"]) - item_features = ( - ["item_category", "item_shop", "item_brand"] - >> nvt.ops.Categorify() - >> nvt.ops.TagAsItemFeatures() - ) - user_features = ( - [ - "user_shops", - "user_profile", - "user_group", - "user_gender", - "user_age", - "user_consumption_2", - "user_is_occupied", - "user_geography", - "user_intentions", - "user_brands", - "user_categories", - ] - >> nvt.ops.Categorify() - >> 
nvt.ops.TagAsUserFeatures() - ) - outputs = user_id + item_id + item_features + user_features + targets - workflow = nvt.Workflow(outputs) - train_dataset = nvt.Dataset(train_path) - valid_dataset = nvt.Dataset(valid_path) - workflow.fit(train_dataset) - workflow.transform(train_dataset).to_parquet(output_path=output_path + "/train/") - workflow.transform(valid_dataset).to_parquet(output_path=output_path + "/valid/") - workflow.save("/tmp/data/workflow") - train = Dataset(os.path.join(output_path, "train", "*.parquet")) - valid = Dataset(os.path.join(output_path, "valid", "*.parquet")) - schema = train.schema - target_column = schema.select_by_tag(Tags.TARGET).column_names[0] - model = mm.DLRMModel( - schema, - embedding_dim=64, - bottom_block=mm.MLPBlock([128, 64]), - top_block=mm.MLPBlock([128, 64, 32]), - prediction_tasks=mm.BinaryClassificationTask(target_column), - ) - model.compile("adam", run_eagerly=False, metrics=[tf.keras.metrics.AUC()]) - model.fit(train, validation_data=valid, batch_size=BATCH_SIZE) - model.save("/tmp/data/dlrm") - tb.inject( - """ - import os - os.environ["INPUT_FOLDER"] = "/tmp/data/" - """ - ) - NUM_OF_CELLS = len(tb.cells) - tb.execute_cell(list(range(0, NUM_OF_CELLS - 12))) - tb.execute_cell(list(range(NUM_OF_CELLS - 9, NUM_OF_CELLS - 6))) - from merlin.core.dispatch import get_lib - - df_lib = get_lib() - - # original_data_path = os.environ.get("INPUT_FOLDER", "/workspace/data/") - - # read in data for request - batch = df_lib.read_parquet( - os.path.join("/tmp/data/", "valid", "part.0.parquet"), - columns=workflow.input_schema.column_names, - ).head(3) - batch = batch.drop(columns="click") - outputs = tb.ref("output_cols") - from merlin.dataloader.tf_utils import configure_tensorflow - - configure_tensorflow() - from merlin.systems.triton.utils import run_ensemble_on_tritonserver - - # The schema contains int64s, while the actual data contains int32s. Not sure why. - schema = Schema( - [col_schema.with_dtype(np.int32) for col_schema in schema.column_schemas.values()] - ) - - response = run_ensemble_on_tritonserver( - "/tmp/data/ensemble/", schema.without(["click"]), batch, outputs, "executor_model" - ) - assert len(response["click/binary_classification_task"]) == 3