339 support seq2seq models (#477)
* chore: Ignore VS Code debug config files

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* inference: Upgrade openai client

Upgrade the openai client due to an incompatibility with `httpx==0.28.0`.

Refs #466

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* backend: Rename job config args variable

Rename the `eval_config_args` variable to `job_config_args` to make it
more generic.

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* inference: Support Hugging Face models

Support models pulled from Hugging Face Hub, through the
`HuggingFaceModelClient`.

The client expects a model name (i.e., the model repo ID on HF Hub) and
a task (e.g., "summarization"), creates the corresponding pipeline, and
uses it for inference.
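
A minimal sketch of what such a client could look like (the class name and behavior come
from this commit; the exact constructor and method signatures here are assumptions for
illustration):

```python
from transformers import pipeline


class HuggingFaceModelClient:
    # Sketch: wrap a Hugging Face pipeline behind a simple predict() interface.

    def __init__(self, model_name: str, task: str):
        # model_name is the repo ID on the HF Hub, e.g. "facebook/bart-large-cnn"
        self._pipeline = pipeline(task=task, model=model_name)

    def predict(self, prompt: str) -> str:
        # A summarization pipeline returns a list like [{"summary_text": "..."}]
        return self._pipeline(prompt)[0]["summary_text"]
```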

Refs #339

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* inference: Extend inference parameter set

Extend the set of parameters that one can pass to create a new inference
job (an illustrative example follows the list):

* revision: choose the model version (i.e., branch, tag, or commit ID)
* use_fast: whether or not to use a fast tokenizer, if possible
* torch_dtype: model precision (e.g., float16, float32, "auto")
* accelerator: device to use during inference (i.e., "cpu", "cuda", or
  "mps")

Refs #339

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* inference: Validate Hugging Face inference params

Validate the values of the parameters used in HF pipelines (a rough sketch follows the list):

* Check if the model name is a valid HF repo ID
* Check if the task is a supported task
* Check if the data type is a valid `torch.dtype` value
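
A rough sketch of these checks (the helper name and exact rules here are illustrative,
not the actual validators):

```python
import re

import torch

SUPPORTED_TASKS = {"summarization", "text-generation", "translation"}


def validate_hf_params(model_name: str, task: str, torch_dtype: str) -> None:
    # HF repo IDs are "name" or "org/name" with a restricted character set
    if not re.fullmatch(r"[\w.\-]+(/[\w.\-]+)?", model_name):
        raise ValueError(f"Invalid HF repo ID: {model_name}")
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported task: {task}")
    # accept "auto", or any torch attribute that is a dtype (e.g. "float16")
    if torch_dtype != "auto" and not isinstance(getattr(torch, torch_dtype, None), torch.dtype):
        raise ValueError(f"Invalid torch dtype: {torch_dtype}")
```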

Refs #339

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* inference: Amend and extend unit tests

Fix the failing unit tests and extend them to cover inference with
Hugging Face models.

Refs #339

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* sdk: Create inference jobs

Support creating distinct inference jobs using the SDK.
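
In SDK terms this boils down to calls like the following (`lm_client` is a
`LumigatorClient` instance; the full flow is shown in the inference user guide added by
this commit):

```python
from lumigator_schemas import jobs

job = lm_client.jobs.create_job(type=jobs.JobType.INFERENCE, request=job_args)
lm_client.jobs.wait_for_job(job.id, poll_wait=10)
```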

Refs #339

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* doc: Fix the quickstart guide

Use `lumigator_schemas` and `JobEvalCreate` in the quickstart guide
instead of just `schemas` and `JobCreate`.

Signed-off-by: Dimitris Poulopoulos <[email protected]>

* doc: Add inference user guide

Add a new user guide to demonstrate how to create and run an inference
job, using the Lumigator SDK and a model from Hugging Face Hub.

Closes #339

Signed-off-by: Dimitris Poulopoulos <[email protected]>

---------

Signed-off-by: Dimitris Poulopoulos <[email protected]>
dpoulopoulos authored Dec 11, 2024
1 parent 99ce0bf commit 30c5ba6
Showing 20 changed files with 416 additions and 39 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -168,6 +168,9 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

+# VS Code
+.vscode/launch.json

# Ruff
.ruff_cache

12 changes: 5 additions & 7 deletions docs/source/get-started/quickstart.md
@@ -46,7 +46,7 @@ user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/datasets/ \
:sync: tab2
```python
from lumigator_sdk.lumigator import LumigatorClient
-from schemas.datasets import DatasetFormat
+from lumigator_schemas.datasets import DatasetFormat

dataset_path = 'path/to/dataset.csv'
lm_client = LumigatorClient('localhost:8000')
@@ -84,7 +84,7 @@ dataset.csv
:sync: tab2
```python
datasets = lm_client.datasets.get_datasets()
-print(datasets.items[0].filename)
+print(datasets.items[-1].filename)
```
:::

@@ -151,9 +151,9 @@ user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/jobs/evaluate/ \
:::{tab-item} Python SDK
:sync: tab2
```python
-from schemas.jobs import JobType, JobCreate
+from lumigator_schemas.jobs import JobType, JobEvalCreate

-dataset_id = datasets.items[0].id
+dataset_id = datasets.items[-1].id

models = ['hf://facebook/bart-large-cnn',]

@@ -164,7 +164,7 @@ team_name = "lumigator_enthusiasts"

responses = []
for model in models:
-    job_args = JobCreate(
+    job_args = JobEvalCreate(
name=team_name,
description="Test",
model=model,
@@ -219,8 +219,6 @@ job_id = responses[0].id

job = lm_client.jobs.wait_for_job(job_id) # Create the coroutine object
result = await job # Await the coroutine to get the result

-print(result)
```
:::

11 changes: 5 additions & 6 deletions docs/source/index.rst
@@ -31,7 +31,7 @@ Hugging Face and local stores or accessed through APIs. It consists of:
- A database to track platform-level lifecycle, job, and dataset metadata.

.. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
:caption: Get Started

get-started/installation
@@ -46,12 +46,11 @@ Hugging Face and local stores or accessed through APIs. It consists of:
operations-guide/alembic
operations-guide/dev

-.. TODO: Add user-guides and examples here.
-.. .. toctree::
-..    :maxdepth: 2
-..    :caption: User Guides
+.. toctree::
+   :maxdepth: 2
+   :caption: User Guides

-..    user-guides/evaluation
+   user-guides/inference

.. toctree::
:maxdepth: 2
1 change: 0 additions & 1 deletion docs/source/user-guides/evaluation.md

This file was deleted.

110 changes: 110 additions & 0 deletions docs/source/user-guides/inference.md
@@ -0,0 +1,110 @@
# Running an Inference Job

This guide will walk you through the process of running an inference job using the Lumigator SDK and
a model downloaded from the Hugging Face Hub. The model will generate summaries for a given set of
text data.

```{note}
You can also use the OpenAI GPT family of models or the Mistral API to run an inference job. To do
so, you need to set the appropriate environment variables: `OPENAI_API_KEY` or `MISTRAL_API_KEY`.
Refer to the `.env.example` file in the repository for more details.
```
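
For example, assuming you want to use an OpenAI model:

```console
user@host:~/lumigator$ export OPENAI_API_KEY=<your-api-key>
```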

## What You'll Need

- A running instance of [Lumigator](../get-started/installation.md).

## Procedure

1. Install the Lumigator SDK:

```console
user@host:~/lumigator$ uv pip install -e lumigator/python/mzai/sdk
```

1. Create a new Python file:

```console
user@host:~/lumigator$ touch inference.py
```

1. Add the following code to `inference.py`:

```python
import json
import requests

from lumigator_sdk.lumigator import LumigatorClient
from lumigator_schemas import jobs, datasets


BUCKET = "lumigator-storage"
HOST = "localhost"
LUMIGATOR_PORT = 8000
RAY_PORT = 4566


# Instantiate the Lumigator client
lm_client = LumigatorClient(f"{HOST}:{LUMIGATOR_PORT}")

# Upload a dataset
dataset_path = "lumigator/python/mzai/sample_data/dialogsum_exc.csv"
dataset = lm_client.datasets.create_dataset(
    dataset=open(dataset_path, 'rb'),
    format=datasets.DatasetFormat.JOB
)

# Create and submit an inference job
name = "bart-summarization-run"
model = "hf://facebook/bart-large-cnn"
task = "summarization"

job_args = jobs.JobInferenceCreate(
    name=name,
    model=model,
    dataset=dataset.id,
    task=task,
)

job = lm_client.jobs.create_job(
    type=jobs.JobType.INFERENCE, request=job_args)

# Wait for the job to complete
lm_client.jobs.wait_for_job(job.id, poll_wait=10)

# Retrieve the job results
url = f"http://{HOST}:{RAY_PORT}/{BUCKET}/jobs/results/{name}/{job.id}/inference_results.json"
response = requests.get(url=url)

if response.status_code != 200:
    raise Exception(f"Failed to retrieve job results: {response.text}")

results = response.json()

# Write the JSON results to a file
with open("inference_results.json", "w") as f:
    json.dump(results, f, indent=4)
```

1. Run the script:

```console
user@host:~/lumigator$ uv run python inference.py
```

## Verify

Review the contents of the `inference_results.json` file to ensure that the inference job ran
successfully:

```console
user@host:~/lumigator$ cat inference_results.json | jq
{
"prediction": [
"A man has trouble breathing. He is sent to see a pulmonary specialist. The doctor tests him for asthma. He does not have any allergies that he knows of. He also has a heavy feeling in his chest when he tries to breathe. This happens a lot when he works out, he says.",
...
```
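
You can also count the generated predictions directly (the `prediction` field matches the
output above):

```console
user@host:~/lumigator$ jq '.prediction | length' inference_results.json
```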

## Next Steps

Congratulations! You have successfully run an inference job using the Lumigator SDK. You can now
use the results to evaluate your model's performance.
20 changes: 19 additions & 1 deletion lumigator/python/mzai/backend/backend/config_templates.py
@@ -66,6 +66,24 @@

# Inference templates

+default_infer_template = """{{
+    "name": "{job_name}/{job_id}",
+    "dataset": {{ "path": "{dataset_path}" }},
+    "hf_pipeline": {{
+        "model_path": "{model_path}",
+        "task": "{task}",
+        "accelerator": "{accelerator}",
+        "revision": "{revision}",
+        "use_fast": "{use_fast}",
+        "trust_remote_code": "{trust_remote_code}",
+        "torch_dtype": "{torch_dtype}"
+    }},
+    "job": {{
+        "max_samples": {max_samples},
+        "storage_path": "{storage_path}"
+    }}
+}}"""

seq2seq_infer_template = """{{
"name": "{job_name}/{job_id}",
"model": {{ "path": "{model_path}" }},
@@ -111,7 +129,7 @@

templates = {
JobType.INFERENCE: {
"default": causal_infer_template,
"default": default_infer_template,
"oai://gpt-4o-mini": oai_infer_template,
"oai://gpt-4-turbo": oai_infer_template,
"oai://gpt-3.5-turbo-0125": oai_infer_template,
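Because `default_infer_template` is rendered with `str.format` and must produce valid
JSON, a quick sanity check might look like this (all values below are illustrative):

```python
import json

config = default_infer_template.format(
    job_name="bart-summarization-run",
    job_id="1234",
    dataset_path="s3://lumigator-storage/datasets/1234/dialogsum_exc.csv",
    model_path="facebook/bart-large-cnn",
    task="summarization",
    accelerator="cuda",
    revision="main",
    use_fast="True",
    trust_remote_code="False",
    torch_dtype="auto",
    max_samples=10,
    storage_path="s3://lumigator-storage/jobs/results",
)
assert json.loads(config)  # the rendered template parses cleanly into a dict
```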
10 changes: 8 additions & 2 deletions lumigator/python/mzai/backend/backend/services/jobs.py
@@ -143,6 +143,12 @@ def _get_job_params(self, job_type: str, record, request: BaseModel) -> dict:
"job_name": request.name,
"model_path": request.model,
"dataset_path": dataset_s3_path,
"task": request.task,
"accelerator": request.accelerator,
"revision": request.revision,
"use_fast": request.use_fast,
"trust_remote_code": request.trust_remote_code,
"torch_dtype": request.torch_dtype,
"max_samples": request.max_samples,
"storage_path": self.storage_path,
"model_url": model_url,
@@ -182,7 +188,7 @@ def create_job(self, request: JobEvalCreate | JobInferenceCreate) -> JobResponse
# command parameters provided via command line to the ray job.
# To do this, we use a dict where keys are parameter names as they'd
# appear on the command line and the values are the respective params.
-eval_config_args = {
+job_config_args = {
"--config": config_template.format(**config_params),
}

@@ -195,7 +201,7 @@ def create_job(self, request: JobEvalCreate | JobInferenceCreate) -> JobResponse
job_id=record.id,
job_type=job_type,
command=job_settings["command"],
-args=eval_config_args,
+args=job_config_args,
)

# build runtime ENV for workers
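To make the dict-to-command-line convention concrete, here is a hypothetical sketch of
how such a dict could be flattened into a job entrypoint (not the actual submission
code):

```python
import shlex

job_config_args = {"--config": '{"name": "demo/1234"}'}

# Flatten {"--flag": value} pairs into a single command string
entrypoint = "python inference.py " + " ".join(
    f"{flag} {shlex.quote(value)}" for flag, value in job_config_args.items()
)
print(entrypoint)  # python inference.py --config '{"name": "demo/1234"}'
```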
11 changes: 11 additions & 0 deletions lumigator/python/mzai/jobs/inference/inference.py
@@ -11,9 +11,11 @@
from loguru import logger
from model_clients import (
BaseModelClient,
+HuggingFaceModelClient,
MistralModelClient,
OpenAIModelClient,
)
+from paths import PathPrefix
from tqdm import tqdm


@@ -101,6 +103,15 @@ def run_inference(config: InferenceJobConfig) -> Path:
# run the openai client
logger.info(f"Using OAI client. Endpoint: {base_url}")
model_client = OpenAIModelClient(base_url, config)
+    elif config.hf_pipeline:
+        if config.hf_pipeline.model_path.startswith(PathPrefix.HUGGINGFACE):
+            logger.info("Using HuggingFace client.")
+            model_client = HuggingFaceModelClient(config)
+            output_model_name = config.hf_pipeline.model_path
+        else:
+            raise ValueError("Unsupported model type.")
+    else:
+        raise NotImplementedError("Inference pipeline not supported.")

# run inference
output[config.job.output_field] = predict(dataset_iterable, model_client)
