[QST] How to serve merlin-tensorflow model in Triton Inference Server and convert it to ONNX?

❓ Questions & Help

Details
Hi, I have been experimenting with an existing TF2 model using the merlin-tensorflow image. This has allowed me to leverage the SOK toolkit for the SparseEmbedding layer. After training the new TF2 model with SOK, I need to export the sok_model and the TF2 model separately. The resulting outputs are as follows:
sok_model: This results in a collection of files named EmbeddingVariable_*_keys.file and EmbeddingVariable_*_values.file.
tf2 model: This exports saved_model.pb and the variables files.
When I want to run a local test prediction, I have to load both models independently and then call inference_step as follows:
import tensorflow as tf

# Load the models: the SOK-based sparse model restores its embedding tables from
# the exported EmbeddingVariable_*_keys.file / *_values.file dumps, and the dense
# TF2 model is restored from the SavedModel directory.
sok_model.load_pretrained_embedding_table()
tf_model = tf.saved_model.load(save_dir)

# Inference step: look up the sparse embeddings with the SOK model, then feed
# them to the dense TF2 model (reduce_retracing replaces the deprecated
# experimental_relax_shapes argument).
@tf.function(reduce_retracing=True)
def inference_step(inputs):
    return tf_model(sok_model(inputs, training=False), training=False)

# Call inference
res = inference_step(inputs)
Questions
Serving the model: I'm interested in how to serve this model in AWS EKS using the Triton Inference Server. What would be the required structure? Should I treat it as an ensemble model that includes both the SOK and TensorFlow 2 parts? Which would be the most suitable backend: HugeCTR, TensorFlow 2, or something else? Do you have any guides or resources that can help me with this?
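For context, the rough shape I could imagine is a single Python-backend model that wraps both parts. Below is only a minimal sketch of what such a model.py might look like, assuming the SOK model class and its load_pretrained_embedding_table() helper from my training script are importable inside the container, and that the tensor names and paths ("INPUT_IDS", "OUTPUT", the SavedModel path) are placeholders that would have to match config.pbtxt; it is not a verified deployment.

import numpy as np
import tensorflow as tf
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Rebuild the SOK-based sparse model and restore its embedding tables
        # from the exported EmbeddingVariable_*_keys.file / *_values.file dumps.
        from my_model import SOKSparseModel  # hypothetical import from the training code
        self.sok_model = SOKSparseModel()
        self.sok_model.load_pretrained_embedding_table()

        # Restore the dense TF2 SavedModel (placeholder path inside the model repository).
        self.tf_model = tf.saved_model.load("/models/dense_model/1/model.savedmodel")

    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT_IDS" is a placeholder input name; it must match config.pbtxt.
            ids = pb_utils.get_input_tensor_by_name(request, "INPUT_IDS").as_numpy()
            inputs = tf.convert_to_tensor(ids)

            # Same two-stage call as the local inference_step above.
            emb = self.sok_model(inputs, training=False)
            out = self.tf_model(emb, training=False)

            out_tensor = pb_utils.Tensor("OUTPUT", out.numpy().astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

The alternative I can think of is an ensemble that chains a Python-backend model (for the SOK lookup) with a regular TensorFlow SavedModel backend for the dense part, which is why I'm asking about the recommended structure.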
Converting the model to ONNX: According to the Hierarchical Parameter Server demo, HugeCTR can load both the sparse and dense models and convert them into a single ONNX model. I'm wondering how I can perform a similar conversion for this merlin-tensorflow model that uses the SOK toolkit and exports the sparse and dense models separately.
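To make the second question concrete: the only piece I can see how to convert today is the dense part. The sketch below is just my reasoning so far (tf2onnx for the SavedModel, plus reading the dumped embedding files back with NumPy); the dtypes and file layout are my assumptions about the dump format, not documented behaviour.

import glob
import numpy as np

# Dense part: the SavedModel itself can be converted with the tf2onnx CLI, e.g.
#   python -m tf2onnx.convert --saved-model <save_dir> --output dense_model.onnx --opset 17

# Sparse part: read the dumped embedding tables back as flat arrays.
# Assumption: keys are int64 and values are float32, written contiguously.
tables = {}
for key_file in sorted(glob.glob("EmbeddingVariable_*_keys.file")):
    value_file = key_file.replace("_keys.file", "_values.file")
    keys = np.fromfile(key_file, dtype=np.int64)
    values = np.fromfile(value_file, dtype=np.float32)
    embedding_dim = values.size // keys.size          # infer the embedding width
    tables[key_file] = (keys, values.reshape(keys.size, embedding_dim))

# In principle each (keys, table) pair could then be attached to the ONNX graph
# as a Gather over an initializer, similar to what the HugeCTR/HPS converter
# does for its own sparse models, but I have not found an equivalent tool for
# SOK exports, hence the question.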
Environment details
nvcr.io/nvidia/merlin/merlin-tensorflow:23.02