LLM Text Encoder Docs (#330)
* add llm text classification user guide

* add text classification user guide to toc

* add llm encoder reference
jeffkinnison authored Dec 20, 2023
1 parent 1fa4714 commit 820ac21
Showing 3 changed files with 108 additions and 2 deletions.
75 changes: 73 additions & 2 deletions docs/configuration/features/text_features.md
@@ -37,7 +37,7 @@ Example text feature entry in the input features list:
name: text_column_name
type: text
tied: null
encoder:
encoder:
type: bert
trainable: true
```
@@ -281,6 +281,77 @@ Parameters:
{{ render_fields(schema_class_to_fields(hf_encoder, exclude=["type"])) }}
{% endfor %}

## LLM Encoders

``` mermaid
graph LR
A["12\n7\n43\n65\n23\n4\n1"] --> B["Pretrained\n LLM"];
B --> C["Last\n Hidden\n State"];
C --> ...;
```
{ data-search-exclude }

The LLM encoder processes text with a pretrained LLM (e.g. `llama-2-7b`) and passes the last hidden state of the LLM forward to the combiner. Like the [LLM model type](../large_language_model.md), adapter-based fine-tuning and quantization can be configured, and any combiner or decoder parameters will be bundled with the adapter weights.

Example config:

```yaml
encoder:
type: llm
base_model: meta-llama/Llama-2-7b-hf
adapter:
type: lora
quantization:
bits: 4
```
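
As an illustration of how the encoder slots into a full model, the sketch below embeds the same encoder config in an ECD definition and trains it through Ludwig's Python API. This is a minimal sketch: the `review` and `sentiment` column names and the `reviews.csv` file are hypothetical placeholders, and access to the gated Llama 2 weights on Hugging Face is assumed.

```python
# Illustrative sketch only: an ECD model whose text input uses the LLM encoder.
# The column names and CSV path are hypothetical; adapt them to your data.
import pandas as pd
from ludwig.api import LudwigModel

config = {
    "model_type": "ecd",
    "input_features": [
        {
            "name": "review",
            "type": "text",
            "encoder": {
                "type": "llm",
                "base_model": "meta-llama/Llama-2-7b-hf",
                "adapter": {"type": "lora"},
                "quantization": {"bits": 4},
            },
        }
    ],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

df = pd.read_csv("reviews.csv")
model = LudwigModel(config)
train_stats, _, output_dir = model.train(dataset=df)
```

Because the encoder is quantized here, the LoRA adapter and local backend restrictions described under [Quantization](#quantization) apply.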

Parameters:

### Base Model

The `base_model` parameter specifies the pretrained large language model to serve
as the foundation of your custom LLM.

More information about the `base_model` parameter can be found [here](../large_language_model.md#base-model).
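
For example, pointing the encoder at a different pretrained checkpoint is a one-line change. The alternative model identifier below is only illustrative, and any causal language model supported by Ludwig's LLM machinery should work the same way.

```python
# Hypothetical sketch: the same LLM encoder with a different base model.
# "mistralai/Mistral-7B-v0.1" is just an example Hugging Face identifier.
encoder_config = {
    "type": "llm",
    "base_model": "mistralai/Mistral-7B-v0.1",
}
```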

### Adapter

{% set adapter_classes = get_adapter_schemas() %}
{% for adapter in adapter_classes %}

#### {{ adapter.name() }}

{{ adapter.description() }}

{{ render_yaml(adapter, parent="adapter") }}

{{ render_fields(schema_class_to_fields(adapter, exclude=["type"])) }}
{% endfor %}

More information about the adapter config can be found [here](../large_language_model.md#adapter).
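
As a concrete example, a LoRA adapter on the encoder is configured the same way as for the LLM model type. The hyperparameter names below (`r`, `alpha`, `dropout`) follow common PEFT-style LoRA settings and are shown as assumptions; the rendered fields above are authoritative for your Ludwig version.

```python
# Sketch of an LLM encoder with an explicitly tuned LoRA adapter.
# The r/alpha/dropout values are arbitrary examples, not recommendations.
encoder_config = {
    "type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "adapter": {
        "type": "lora",
        "r": 16,          # rank of the low-rank update matrices (assumed field name)
        "alpha": 32,      # scaling factor applied to the update (assumed field name)
        "dropout": 0.05,  # dropout on the adapter layers (assumed field name)
    },
}
```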

### Quantization

!!! attention

Quantized fine-tuning currently requires using `adapter: lora`. In-context
learning does not have this restriction.

!!! attention

Quantization is currently only supported with `backend: local`.

{% set quantization = get_quantization_schema() %}
{{ render_yaml(quantization, parent="quantization") }}

{{ render_fields(schema_class_to_fields(quantization)) }}

More information about quantization parameters can be found [here](../large_language_model.md#quantization).
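
For instance, the quantization block nests under the encoder just as it does for the LLM model type, and switching between 4-bit and 8-bit loading is a one-line change. The sketch below pairs 4-bit quantization with the LoRA adapter that quantized fine-tuning requires.

```python
# Sketch: 4-bit quantized LLM encoder; quantized fine-tuning requires a LoRA
# adapter and the local backend. Set "bits" to 8 for 8-bit loading.
encoder_config = {
    "type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "adapter": {"type": "lora"},
    "quantization": {"bits": 4},
}
```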

### Model Parameters

More information about the model initialization parameters can be found [here](../large_language_model.md#model-parameters).

# Output Features

Text output features are a special case of [Sequence Features](#sequence-output-features-and-decoders), so all options
@@ -304,7 +375,7 @@ loss:
robust_lambda: 0
class_weights: 1
class_similarities_temperature: 0
decoder:
decoder:
type: generator
```

34 changes: 34 additions & 0 deletions docs/user_guide/llms/text_classification.md
@@ -0,0 +1,34 @@
Pretrained LLMs are available as text encoders for general text features, and can be included in ECD models for binary or multi-class text classification tasks.

The LLM encoder shares most of its features with the LLM model type, including base model selection, adapters, quantization, and initialization parameters like RoPE scaling. Unlike the LLM model type, the LLM encoder is part of an ECD architecture and does not generate text. Instead, the input text is processed by the LLM and the final hidden state is passed forward to the combiner and decoder(s), allowing the model to be used directly for predictive tasks.

## Example LLM encoder config

The `agnews` dataset contains examples of news article titles and descriptions, and the task is to classify each example into one of four section categories. A config that uses an LLM encoder to classify article titles might look like the following:

```yaml
model_type: ecd
input_features:
- name: title
type: text
encoder:
type: llm
adapter:
type: lora
base_model: meta-llama/Llama-2-7b-hf
quantization:
bits: 4
column: title
output_features:
- name: class
type: category
column: class
trainer:
epochs: 3
optimizer:
type: paged_adam
```

This will fine-tune a 4-bit quantized LoRA adapter for the `llama-2-7b` base model and simultaneously train a classification head. The adapter weights, combiner parameters, and decoder parameters will be saved in the results after fine-tuning/training.
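
To run the experiment, one option is to launch training through Ludwig's Python API. A minimal sketch, assuming the config above is saved as `config.yaml` and the AG News data is available locally as `agnews.csv` with `title` and `class` columns:

```python
# Minimal sketch: train the classification config above with Ludwig's Python API.
# "config.yaml" and "agnews.csv" are assumed local files containing the config
# shown above and the title/class columns it references.
from ludwig.api import LudwigModel

model = LudwigModel(config="config.yaml")
train_stats, _, output_dir = model.train(dataset="agnews.csv")

# The trained model (adapter weights plus the ECD classification head) can be
# used for prediction directly.
predictions, _ = model.predict(dataset="agnews.csv")
```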

To learn more about configuring LLMs for text classification, see the [LLM Encoder Reference](../../configuration/features/text_features.md#llm-encoders).
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -43,6 +43,7 @@ nav:
- Large Language Models: user_guide/llms/index.md
- Fine-Tuning: user_guide/llms/finetuning.md
- In-Context Learning: user_guide/llms/in_context_learning.md
- Text Classification: user_guide/llms/text_classification.md
- GPUs: user_guide/gpus.md
- Distributed Training:
- Distributed Training: user_guide/distributed_training/index.md
