backend: Extend the models config YAML file
Extend the models config YAML file to include model-specific
information.

Add model-specific details, such as the number of parameters, memory
footprint, and tensor type (e.g., FP32 or BF16), along with a set of
default values used by Lumigator when calling the models.

Not all parameters are applicable to each model. For example, Hugging
Face models have default values for parameters like `max_length`, while
API models include parameters like `temperature`. In general, parameters
are model-specific, so there is no universal set of parameter names.

The defaults are inferred either by checking the documentation for each
model or by examining the configuration file on the Hugging Face Hub.
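
To illustrate how these entries might be consumed, here is a minimal
sketch, assuming a PyYAML-based loader and a hypothetical
`build_call_params` helper (neither is Lumigator's actual code), that
reads the file and overlays call-time overrides on top of a model's
`default_parameters`:

import yaml  # PyYAML

def load_models_config(path="models_config.yaml"):
    # Parse the YAML list and index the entries by model name.
    with open(path) as f:
        models = yaml.safe_load(f)
    return {model["name"]: model for model in models}

def build_call_params(model, overrides=None):
    # Start from the model's default_parameters (absent for some
    # models) and let caller-supplied values win.
    params = dict(model.get("default_parameters", {}))
    params.update(overrides or {})
    return params

models = load_models_config()
bart = models["facebook/bart-large-cnn"]
# Keep BART's defaults but ask for longer summaries.
params = build_call_params(bart, {"max_length": 256})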

Closes #381

Signed-off-by: Dimitris Poulopoulos <[email protected]>
dpoulopoulos committed Nov 20, 2024
1 parent d954a27 commit 8dba904
Showing 1 changed file with 61 additions and 0 deletions.
lumigator/python/mzai/backend/backend/models_config.yaml
@@ -1,35 +1,96 @@
 - name: facebook/bart-large-cnn
   uri: hf://facebook/bart-large-cnn
   description: BART is a large-sized model fine-tuned on the CNN Daily Mail dataset.
+  info:
+    parameters_count: 406M
+    tensor_type: F32
+    model_size: 1.63GB
+  default_parameters:
+    max_length: 142
+    min_length: 56
+    length_penalty: 2.0
+    early_stopping: true
+    no_repeat_ngram_size: 3
+    num_beams: 4

 - name: mikeadimech/longformer-qmsum-meeting-summarization
   uri: hf://mikeadimech/longformer-qmsum-meeting-summarization
   description: Longformer is a transformer model that is capable of processing long sequences.
+  info:
+    parameters_count: 162M
+    tensor_type: F32
+    model_size: 648MB

 - name: mrm8488/t5-base-finetuned-summarize-news
   uri: hf://mrm8488/t5-base-finetuned-summarize-news
   description: Google's T5 base fine-tuned on News Summary dataset for summarization downstream task.
+  info:
+    parameters_count: 223M
+    tensor_type: F32
+    model_size: 892MB
+  default_parameters:
+    max_length: 200
+    min_length: 30
+    length_penalty: 2.0
+    early_stopping: true
+    no_repeat_ngram_size: 3
+    num_beams: 4

 - name: Falconsai/text_summarization
   uri: hf://Falconsai/text_summarization
   description: A fine-tuned variant of the T5 transformer model, designed for the task of text summarization.
+  info:
+    parameters_count: 60.5M
+    tensor_type: F32
+    model_size: 242MB
+  default_parameters:
+    max_length: 200
+    min_length: 30
+    length_penalty: 2.0
+    early_stopping: true
+    no_repeat_ngram_size: 3
+    num_beams: 4

 - name: mistralai/Mistral-7B-Instruct-v0.3
   uri: hf://mistralai/Mistral-7B-Instruct-v0.3
   description: Mistral-7B-Instruct-v0.3 is an instruct fine-tuned version of the Mistral-7B-v0.3.
+  info:
+    parameters_count: 7.25B
+    tensor_type: BF16
+    model_size: 14.5GB

 - name: gpt-4o-mini
   uri: oai://gpt-4o-mini
   description: OpenAI's GPT-4o-mini model.
+  default_parameters:
+    temperature: 1.0
+    top_p: 1.0
+    max_completion_tokens: 200

 - name: gpt-4-turbo
   uri: oai://gpt-4-turbo
   description: OpenAI's GPT-4 Turbo model.
+  default_parameters:
+    temperature: 1.0
+    top_p: 1.0
+    max_completion_tokens: 200

 - name: open-mistral-7b
   uri: mistral://open-mistral-7b
   description: Mistral's 7B model.
+  default_parameters:
+    temperature: 0.7
+    top_p: 1.0
+    max_completion_tokens: 200

 - name: mistralai/Mistral-7B-Instruct-v0.2
   uri: llamafile://mistralai/Mistral-7B-Instruct-v0.2
   description: A llamafile package of Mistral's 7B Instruct model.
+  info:
+    parameters_count: 7.24B
+    tensor_type: BF16
+    model_size: 14.5GB
+  default_parameters:
+    temperature: 0.7
+    top_p: 1.0
+    max_completion_tokens: 200
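
Note that the `uri` scheme (`hf://`, `oai://`, `mistral://`,
`llamafile://`) encodes where each model comes from. Below is a small
sketch of how such URIs could be routed; the scheme-to-backend mapping
is illustrative, not Lumigator's actual dispatch logic:

from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to a backend label.
SCHEME_TO_BACKEND = {
    "hf": "huggingface",
    "oai": "openai",
    "mistral": "mistral_api",
    "llamafile": "llamafile",
}

def resolve_backend(uri: str) -> tuple[str, str]:
    # Split e.g. 'hf://facebook/bart-large-cnn' into a backend label
    # and the model path the backend should load or call.
    parsed = urlparse(uri)
    return SCHEME_TO_BACKEND[parsed.scheme], parsed.netloc + parsed.path

print(resolve_backend("hf://facebook/bart-large-cnn"))  # ('huggingface', 'facebook/bart-large-cnn')
print(resolve_backend("oai://gpt-4o-mini"))             # ('openai', 'gpt-4o-mini')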
