LLM Text Encoder Docs (#330)
* add llm text classification user guide

* add text classification user guide to toc

* add llm encoder reference
jeffkinnison authored Dec 20, 2023
1 parent 1fa4714 commit 820ac21
Showing 3 changed files with 108 additions and 2 deletions.
75 changes: 73 additions & 2 deletions docs/configuration/features/text_features.md
@@ -37,7 +37,7 @@ Example text feature entry in the input features list:
name: text_column_name
type: text
tied: null
encoder:
encoder:
type: bert
trainable: true
```
@@ -281,6 +281,77 @@ Parameters:
{{ render_fields(schema_class_to_fields(hf_encoder, exclude=["type"])) }}
{% endfor %}

## LLM Encoders

``` mermaid
graph LR
A["12\n7\n43\n65\n23\n4\n1"] --> B["Pretrained\n LLM"];
B --> C["Last\n Hidden\n State"];
C --> ...;
```
{ data-search-exclude }

The LLM encoder processes text with a pretrained LLM (e.g. `llama-2-7b`) and passes the last hidden state of the LLM forward to the combiner. Like the [LLM model type](../large_language_model.md), adapter-based fine-tuning and quantization can be configured, and any combiner or decoder parameters will be bundled with the adapter weights.

Example config:

```yaml
encoder:
type: llm
base_model: meta-llama/Llama-2-7b-hf
adapter:
type: lora
quantization:
bits: 4
```
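
As an illustration of how the encoder slots into a full model, the sketch below embeds the same encoder config in an ECD definition and trains it through Ludwig's Python API. This is a minimal sketch: the `review` and `sentiment` column names and the `reviews.csv` file are hypothetical placeholders, and access to the gated Llama 2 weights on Hugging Face is assumed.

```python
# Illustrative sketch only: an ECD model whose text input uses the LLM encoder.
# The column names and CSV path are hypothetical; adapt them to your data.
import pandas as pd
from ludwig.api import LudwigModel

config = {
    "model_type": "ecd",
    "input_features": [
        {
            "name": "review",
            "type": "text",
            "encoder": {
                "type": "llm",
                "base_model": "meta-llama/Llama-2-7b-hf",
                "adapter": {"type": "lora"},
                "quantization": {"bits": 4},
            },
        }
    ],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

df = pd.read_csv("reviews.csv")
model = LudwigModel(config)
train_stats, _, output_dir = model.train(dataset=df)
```

Because the encoder is quantized here, the LoRA adapter and local backend restrictions described under [Quantization](#quantization) apply.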

Parameters:

### Base Model

The `base_model` parameter specifies the pretrained large language model to serve
as the foundation of your custom LLM.

More information about the `base_model` parameter can be found [here](../large_language_model.md#base-model).
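
For example, pointing the encoder at a different pretrained checkpoint is a one-line change. The alternative model identifier below is only illustrative, and any causal language model supported by Ludwig's LLM machinery should work the same way.

```python
# Hypothetical sketch: the same LLM encoder with a different base model.
# "mistralai/Mistral-7B-v0.1" is just an example Hugging Face identifier.
encoder_config = {
    "type": "llm",
    "base_model": "mistralai/Mistral-7B-v0.1",
}
```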

### Adapter

{% set adapter_classes = get_adapter_schemas() %}
{% for adapter in adapter_classes %}

#### {{ adapter.name() }}

{{ adapter.description() }}

{{ render_yaml(adapter, parent="adapter") }}

{{ render_fields(schema_class_to_fields(adapter, exclude=["type"])) }}
{% endfor %}

More information about the adapter config can be found [here](../large_language_model.md#adapter).
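
As a concrete example, a LoRA adapter on the encoder is configured the same way as for the LLM model type. The hyperparameter names below (`r`, `alpha`, `dropout`) follow common PEFT-style LoRA settings and are shown as assumptions; the rendered fields above are authoritative for your Ludwig version.

```python
# Sketch of an LLM encoder with an explicitly tuned LoRA adapter.
# The r/alpha/dropout values are arbitrary examples, not recommendations.
encoder_config = {
    "type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "adapter": {
        "type": "lora",
        "r": 16,          # rank of the low-rank update matrices (assumed field name)
        "alpha": 32,      # scaling factor applied to the update (assumed field name)
        "dropout": 0.05,  # dropout on the adapter layers (assumed field name)
    },
}
```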

### Quantization

!!! attention

Quantized fine-tuning currently requires using `adapter: lora`. In-context
learning does not have this restriction.

!!! attention

Quantization is currently only supported with `backend: local`.

{% set quantization = get_quantization_schema() %}
{{ render_yaml(quantization, parent="quantization") }}

{{ render_fields(schema_class_to_fields(quantization)) }}

More information about quantization parameters can be found [here](../large_language_model.md#quantization).
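
For instance, the quantization block nests under the encoder just as it does for the LLM model type, and switching between 4-bit and 8-bit loading is a one-line change. The sketch below pairs 4-bit quantization with the LoRA adapter that quantized fine-tuning requires.

```python
# Sketch: 4-bit quantized LLM encoder; quantized fine-tuning requires a LoRA
# adapter and the local backend. Set "bits" to 8 for 8-bit loading.
encoder_config = {
    "type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "adapter": {"type": "lora"},
    "quantization": {"bits": 4},
}
```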

### Model Parameters

More information about the model initialization parameters can be found [here](../large_language_model.md#model-parameters).

# Output Features

Text output features are a special case of [Sequence Features](#sequence-output-features-and-decoders), so all options
@@ -304,7 +375,7 @@ loss:
robust_lambda: 0
class_weights: 1
class_similarities_temperature: 0
decoder:
decoder:
type: generator
```

34 changes: 34 additions & 0 deletions docs/user_guide/llms/text_classification.md
@@ -0,0 +1,34 @@
Pretrained LLMs are available as text encoders for general text features, and can be included in ECD models for binary or multi-class text classification tasks.

The LLM encoder shares most of its features with the LLM model type, including base model selection, adapters, quantization, and initialization parameters like RoPE scaling. Unlike the LLM model type, the LLM encoder is part of an ECD architecture and does not generate text. Instead, the input text is processed by the LLM and the final hidden state is passed forward to the combiner and decoder(s), allowing the model to be used directly for predictive tasks.

## Example LLM encoder config

The `agnews` dataset contains examples of news article titles and descriptions, and the task is to classify each example into one of four section categories. A config that uses an LLM encoder to classify article titles might look like the following:

```yaml
model_type: ecd
input_features:
- name: title
type: text
encoder:
type: llm
adapter:
type: lora
base_model: meta-llama/Llama-2-7b-hf
quantization:
bits: 4
column: title
output_features:
- name: class
type: category
column: class
trainer:
epochs: 3
optimizer:
type: paged_adam
```

This will fine-tune a 4-bit quantized LoRA adapter for the `llama-2-7b` base model and simultaneously train a classification head. The adapter weights, combiner parameters, and decoder parameters will be saved in the results after fine-tuning/training.
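
To run the experiment, one option is to launch training through Ludwig's Python API. A minimal sketch, assuming the config above is saved as `config.yaml` and the AG News data is available locally as `agnews.csv` with `title` and `class` columns:

```python
# Minimal sketch: train the classification config above with Ludwig's Python API.
# "config.yaml" and "agnews.csv" are assumed local files containing the config
# shown above and the title/class columns it references.
from ludwig.api import LudwigModel

model = LudwigModel(config="config.yaml")
train_stats, _, output_dir = model.train(dataset="agnews.csv")

# The trained model (adapter weights plus the ECD classification head) can be
# used for prediction directly.
predictions, _ = model.predict(dataset="agnews.csv")
```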

To learn more about configuring LLMs for text classification, see the [LLM Encoder Reference](../../configuration/features/text_features.md#llm-encoders).
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -43,6 +43,7 @@ nav:
- Large Language Models: user_guide/llms/index.md
- Fine-Tuning: user_guide/llms/finetuning.md
- In-Context Learning: user_guide/llms/in_context_learning.md
- Text Classification: user_guide/llms/text_classification.md
- GPUs: user_guide/gpus.md
- Distributed Training:
- Distributed Training: user_guide/distributed_training/index.md
