Skip to content

Commit

Permalink
Merge branch 'main' into fix-json-serializable
Browse files Browse the repository at this point in the history
  • Loading branch information
deep-diver authored Aug 19, 2024
2 parents e911214 + 73dce0c commit 91680dd
Show file tree
Hide file tree
Showing 3 changed files with 73 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ However, we know from the [InstructGPT](https://huggingface.co/papers/2203.02155
The Alignment Handbook aims to fill that gap by providing the community with a series of robust training recipes that span the whole pipeline.

## News 🗞️
* **August 18, 2024**: We release SmolLM-Instruct v0.2, along with the [recipe](recipes/smollm/README.md) to fine-tuning small LLMs 💻
* **April 12, 2024**: We release Zephyr 141B (A35B), in collaboration with Argilla and Kaist AI, along with the recipe to fine-tune Mixtral 8x22B with ORPO 🪁
* **March 12, 2024:** We release StarChat2 15B, along with the recipe to train capable coding assistants 🌟
* **March 1, 2024:** We release Zephyr 7B Gemma, which is a new recipe to align Gemma 7B with RLAIF 🔥
Expand Down
19 changes: 19 additions & 0 deletions recipes/smollm/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@

# Instructions to train SmolLM-Instruct

We build the [SmolLM-Instruct](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966) (v0.2) models (135M, 360M and 1.7B) by doing SFT on a mix of these datasets:
- a dataset of 2k simple everyday conversations we generated by llama3.1-70B [everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k/)
- [Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered)
- [StarCoder2-Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
- A small subset of [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)

## Setup

Follow the installation instructions in https://github.com/huggingface/alignment-handbook/tree/main?tab=readme-ov-file#installation-instructions

## Training
We train the models on 8 GPUs using the following command:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm/sft/config.yaml
```
53 changes: 53 additions & 0 deletions recipes/smollm/sft/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Model arguments
model_name_or_path: HuggingFaceTB/SmolLM-360M
model_revision: main
tokenizer_name_or_path: HuggingFaceTB/SmolLM-360M-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
torch_dtype: bfloat16
use_flash_attention_2: true

# Data training arguments
dataset_mixer:
HuggingFaceTB/Magpie-Pro-300K-Filtered-H4: 1.0
HuggingFaceTB/self-oss-instruct-sc2-H4: 1.0
HuggingFaceTB/OpenHermes-2.5-H4: 0.001
HuggingFaceTB/everyday-conversations-llama3.1-2k: 1.0
HuggingFaceTB/instruct-data-basics-smollm-H4: 1.0

dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 36

# SFT trainer config
bf16: true
dataset_kwargs:
add_special_tokens: false # We already wrap <bos> and <eos> in the chat template
append_concat_token: false # No need to add <eos> across samples
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: smollm-360M-instruct-new
hub_strategy: every_save
learning_rate: 1.0e-03 # 3e-4
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 2048
max_steps: -1
num_train_epochs: 1
output_dir: data/smollm-360M-instruct-new
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1

0 comments on commit 91680dd

Please sign in to comment.