
Finetune the tasks


UER-py supports many downstream tasks, including text classification, pair classification, document-based question answering, sequence labeling, machine reading comprehension, etc. The options used for a downstream task (specified in the configuration file or on the command line) should be consistent with those of the pre-trained model. The pre-trained models used in this section can be found in Modelzoo. The datasets used in this section can be found in Downstream datasets.

Classification

run_classifier.py adds two feedforward layers on top of the encoder. The example of using run_classifier.py:

python3 finetune/run_classifier.py --pretrained_model_path models/google_zh_model.bin --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/base_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --epochs_num 3 --batch_size 64 \
                                   --embedding word pos seg --encoder transformer --mask fully_visible

The CLS embedding is used for prediction by default (--pooling first).
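Conceptually, the classification head can be pictured as in the minimal PyTorch sketch below (an illustration of the idea, not UER-py's actual code; the layer names are placeholders): the encoder output is pooled, passed through two feedforward layers, and projected to labels_num classes.

import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Minimal sketch of a two-layer classification head over the encoder output."""
    def __init__(self, hidden_size, labels_num):
        super().__init__()
        self.output_layer_1 = nn.Linear(hidden_size, hidden_size)
        self.output_layer_2 = nn.Linear(hidden_size, labels_num)

    def forward(self, encoder_output):
        # encoder_output: [batch_size, seq_length, hidden_size]
        # --pooling first: take the representation of the first ([CLS]) token.
        pooled = encoder_output[:, 0, :]
        hidden = torch.tanh(self.output_layer_1(pooled))
        return self.output_layer_2(hidden)   # logits: [batch_size, labels_num]

logits = ClassifierHead(hidden_size=768, labels_num=2)(torch.randn(4, 128, 768))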

The example of using run_classifier.py for pair classification:

python3 finetune/run_classifier.py --pretrained_model_path models/google_zh_model.bin --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/base_config.json \
                                   --train_path datasets/lcqmc/train.tsv \
                                   --dev_path datasets/lcqmc/dev.tsv \
                                   --test_path datasets/lcqmc/test.tsv \
                                   --epochs_num 3 --batch_size 64 \
                                   --embedding word pos seg --encoder transformer --mask fully_visible

One can download the LCQMC dataset from the Downstream datasets section and put it in the datasets folder.

The example of using inference/run_classifier_infer.py to do inference:

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/base_config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --seq_length 128 --labels_num 2 --output_logits --output_prob \
                                          --embedding word pos seg --encoder transformer --mask fully_visible

For classification, texts in the text_a column are predicted. For pair classification, texts in the text_a and text_b columns are predicted.
--labels_num specifies the number of labels.
--output_logits indicates that the predicted logits are written to the output, in a column named logits.
--output_prob indicates that the predicted probabilities are written to the output, in a column named prob.
--seq_length specifies the sequence length, which should be the same as the setting used in the training stage.
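The prediction file can then be post-processed with standard tools. The snippet below is a small illustration that assumes a tab-separated file with a header row naming the columns described above; check the actual output, since the exact layout may differ.

import csv

# Print the predicted label and probability for each example in the inference output.
with open("datasets/book_review/prediction.tsv", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        print(row["label"], row.get("prob", ""))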

Notice that BERT and RoBERTa share the same embedding and encoder modules, so there is no difference between loading BERT and RoBERTa. Since the configuration file specifies which modules are used, we do not have to specify them on the command line:

python3 finetune/run_classifier.py --pretrained_model_path models/google_zh_model.bin --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/base_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --epochs_num 3 --batch_size 64

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/base_config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --seq_length 128 --labels_num 2 --output_logits --output_prob

In the rest of this section, we do not explicitly specify the modules on the command line, for simplicity.

The example of using ALBERT for classification:

python3 finetune/run_classifier.py --pretrained_model_path models/google_zh_albert_base_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/albert/base_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 4e-5 --epochs_num 5 --batch_size 32

The performance of ALBERT is sensitive to hyper-parameter settings. Dropout is turned off in the pre-training stage (see models/albert/base_config.json). It is recommended to set dropout to 0.1 in the configuration file when fine-tuning ALBERT on downstream tasks.
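For example, dropout can be switched back on by editing the configuration file before fine-tuning. The snippet below assumes the configuration stores a top-level dropout field; verify the exact key against the shipped models/albert/base_config.json.

import json

config_path = "models/albert/base_config.json"
with open(config_path, encoding="utf-8") as f:
    config = json.load(f)

config["dropout"] = 0.1   # assumed key name; check the shipped configuration file
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)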
The example of doing inference for ALBERT:

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/albert/base_config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --labels_num 2

The example of using GPT-2 for classification:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_gpt2_seq1024_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/gpt2/config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --epochs_num 3 --batch_size 32 \
                                   --pooling mean

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/gpt2/config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --labels_num 2 --pooling mean

We use --pooling mean to obtain the text representation. --pooling max and --pooling last can also be used in the above case. --pooling first is not suitable here since a causal language model is used (--mask causal): under the causal mask the first token cannot attend to the rest of the sequence, so its representation does not summarize the text.
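The pooling strategies can be sketched as follows (an illustrative PyTorch snippet, not UER-py's exact implementation). Note that first only uses the first position, which under a causal mask never sees the rest of the sequence.

import torch

def pool(hidden, mask, pooling):
    # hidden: [batch, seq_length, hidden_size]; mask: [batch, seq_length], 1 for real tokens.
    if pooling == "first":
        return hidden[:, 0, :]
    if pooling == "last":
        last_positions = mask.sum(dim=1) - 1                 # index of the last real token
        return hidden[torch.arange(hidden.size(0)), last_positions]
    if pooling == "max":
        return hidden.masked_fill(mask.unsqueeze(-1) == 0, float("-inf")).max(dim=1).values
    # "mean": average over real tokens only
    summed = (hidden * mask.unsqueeze(-1)).sum(dim=1)
    return summed / mask.sum(dim=1, keepdim=True)

vector = pool(torch.randn(2, 5, 8), torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]), "mean")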

The example of using LSTM for classification:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_lstm_lm_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/rnn/lstm_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 1e-3 --epochs_num 5 --batch_size 64 \
                                   --pooling mean

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/rnn/lstm_config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --labels_num 2 --pooling mean

The example of using ELMo for classification:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_elmo_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/rnn/bilstm_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 5e-4 --epochs_num 5 --batch_size 64 --seq_length 192 \
                                   --pooling max

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/rnn/bilstm_config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --seq_length 192 --labels_num 2 \
                                          --pooling max

The example of using GatedCNN for classification:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_gatedcnn_lm_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/cnn/gatedcnn_9_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 5e-5 --epochs_num 5  --batch_size 64 \
                                   --pooling mean

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/cnn/gatedcnn_9_config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --labels_num 2 --pooling mean

UER-py supports multi-task learning. The embedding and encoder layers are shared across tasks. The example of training on two sentiment analysis datasets:

python3 finetune/run_classifier_mt.py --pretrained_model_path models/google_zh_model.bin --vocab_path models/google_zh_vocab.txt \
                                      --config_path models/bert/base_config.json \
                                      --dataset_path_list datasets/book_review/ datasets/chnsenticorp/ \
                                      --epochs_num 1 --batch_size 64

--dataset_path_list specifies the list of dataset folders, one per task. Each folder should contain a train set train.tsv and a development set dev.tsv.
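Structurally, this can be pictured as one shared encoder with a separate output layer per task. The sketch below is a simplified illustration of that idea, not UER-py's actual module layout.

import torch
import torch.nn as nn

class MultitaskClassifier(nn.Module):
    """Shared embedding/encoder, one classification head per task (simplified sketch)."""
    def __init__(self, encoder, hidden_size, labels_num_list):
        super().__init__()
        self.encoder = encoder   # shared by all tasks
        self.heads = nn.ModuleList([nn.Linear(hidden_size, n) for n in labels_num_list])

    def forward(self, inputs, task_id):
        hidden = self.encoder(inputs)                 # [batch, seq_length, hidden_size]
        return self.heads[task_id](hidden[:, 0, :])   # logits for the selected task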

The example of doing inference with the above multi-task classifier:

python3 inference/run_classifier_mt_infer.py --load_model_path models/multitask_classifier_model.bin --vocab_path models/google_zh_vocab.txt \
                                             --config_path models/bert/base_config.json \
                                             --test_path datasets/book_review/test_nolabel.tsv \
                                             --prediction_path datasets/book_review/prediction.tsv \
                                             --labels_num_list 2 2 \
                                             --batch_size 64 --output_logits --output_prob

--test_path specifies the path of the test file.
--prediction_path specifies the path of the prediction output file. It contains three columns: label, logits, and prob; whether the last two exist depends on --output_logits and --output_prob. The results of the different tasks are separated by | in each column. An example of the prediction result (for a model with two tasks) is as follows:

label    logits                           prob
1|0      1.5531 2.6371|3.2732 -3.3976     0.2527 0.7473|0.9987 0.0013
0|0      4.6869 -2.2779|1.9069 -2.2426    0.9990 0.0009|0.9845 0.0155

UER-py supports grid search for classification tasks:

python3 finetune/run_classifier_grid.py --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                                        --vocab_path models/google_zh_vocab.txt \
                                        --config_path models/bert/tiny_config.json \
                                        --train_path datasets/book_review/train.tsv \
                                        --dev_path datasets/book_review/dev.tsv \
                                        --learning_rate_list 3e-5 1e-4 3e-4 --epochs_num_list 3 5 8 --batch_size_list 32 64

We use grid search to find the optimal batch size, learning rate, and number of epochs.
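Grid search is simply an exhaustive sweep over the Cartesian product of the candidate hyper-parameters, keeping the setting with the best development result. A minimal sketch of the idea (train_and_eval is a placeholder, not a UER-py API):

import itertools

def train_and_eval(learning_rate, epochs_num, batch_size):
    # Placeholder: fine-tune with these hyper-parameters and return the dev accuracy.
    return 0.0

best = None
for lr, epochs, batch in itertools.product([3e-5, 1e-4, 3e-4], [3, 5, 8], [32, 64]):
    accuracy = train_and_eval(lr, epochs, batch)
    if best is None or accuracy > best[0]:
        best = (accuracy, lr, epochs, batch)
print(best)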

UER-py supports distillation for classification tasks.
First, we train a teacher model by fine-tuning the Chinese RoBERTa-WWM-large model (provided in the model zoo):

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/large_config.json \
                                   --output_model_path models/teacher_classifier_model.bin \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --epochs_num 3 --batch_size 32

Then we use the teacher model to do inference, generating pseudo labels and logits:

python3 inference/run_classifier_infer.py --load_model_path models/teacher_classifier_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/large_config.json \
                                          --test_path text.tsv \
                                          --prediction_path label_logits.tsv \
                                          --labels_num 2 --output_logits

The input file text.tsv contains the text to be predicted (see datasets/book_review/test_nolabel.tsv). text.tsv can be a downstream dataset, e.g. datasets/book_review/train.tsv used as input (--test_path), or related external data. A larger transfer set often leads to better performance.
The output file label_logits.tsv contains a label column and a logits column. We then obtain text_label_logits.tsv by combining text.tsv and label_logits.tsv (see the sketch below). text_label_logits.tsv contains a text_a column (text_a and text_b columns for pair classification), a label column (hard labels), and a logits column (soft labels).
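A simple way to produce text_label_logits.tsv is to paste the columns of the two files together row by row. The sketch below assumes both files are tab-separated, carry header rows, and are aligned line by line; the resulting column order may need adjusting to match what run_classifier.py expects.

import csv

# Combine the teacher's predictions with the original text, row by row.
with open("text.tsv", encoding="utf-8") as f_text, \
     open("label_logits.tsv", encoding="utf-8") as f_pred, \
     open("text_label_logits.tsv", "w", encoding="utf-8", newline="") as f_out:
    text_reader = csv.reader(f_text, delimiter="\t")
    pred_reader = csv.reader(f_pred, delimiter="\t")
    writer = csv.writer(f_out, delimiter="\t")
    for pred_row, text_row in zip(pred_reader, text_reader):
        writer.writerow(pred_row + text_row)   # label, logits, then text_a (and text_b)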
The student model is BERT-tiny; the pre-trained model is provided in the model zoo. The student model then learns the outputs (hard and soft labels) of the teacher model:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/tiny_config.json \
                                   --train_path text_label_logits.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --epochs_num 3 --batch_size 64 --soft_targets --soft_alpha 0.5

--soft_targets indicates that the model uses logits (soft labels) for training, with mean squared error (MSE) as the corresponding loss function.
--soft_alpha specifies the weight of the soft-label loss. The overall loss is a weighted average of the cross-entropy loss (for hard labels) and the mean-squared-error loss (for soft labels).
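Conceptually, the combined objective looks roughly like the sketch below (an illustration of the idea, not UER-py's exact code; the precise weighting convention may differ):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, hard_labels, teacher_logits, soft_alpha=0.5):
    hard_loss = F.cross_entropy(student_logits, hard_labels)    # hard labels
    soft_loss = F.mse_loss(student_logits, teacher_logits)      # soft labels (teacher logits)
    return (1.0 - soft_alpha) * hard_loss + soft_alpha * soft_loss

loss = distillation_loss(torch.randn(4, 2), torch.tensor([0, 1, 1, 0]), torch.randn(4, 2))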

UER-py supports adversarial training methods:

python3 finetune/run_classifier.py --pretrained_model_path models/google_zh_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/base_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --epochs_num 3 --batch_size 64 \
                                   --use_adv --adv_type fgm
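FGM (Fast Gradient Method) adds a small perturbation to the embedding parameters along the gradient direction, runs a second forward/backward pass on the perturbed embeddings, and then restores them. The sketch below illustrates the generic recipe; it is not UER-py's implementation, and model / loss_fn are placeholders.

import torch

def fgm_step(model, loss_fn, inputs, labels, epsilon=1.0, emb_name="embedding"):
    """One adversarial training step with FGM (generic sketch)."""
    loss = loss_fn(model(inputs), labels)
    loss.backward()                                    # gradients on the clean inputs

    backup = {}
    for name, param in model.named_parameters():       # perturb embedding weights
        if emb_name in name and param.grad is not None:
            backup[name] = param.data.clone()
            norm = torch.norm(param.grad)
            if norm > 0:
                param.data.add_(epsilon * param.grad / norm)

    adv_loss = loss_fn(model(inputs), labels)          # second pass on perturbed embeddings
    adv_loss.backward()                                # gradients accumulate with the clean ones

    for name, param in model.named_parameters():       # restore the original embeddings
        if name in backup:
            param.data = backup[name]
    # An optimizer step and zero_grad would follow in a real training loop.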

UER-py supports fine-tuning a siamese network on classification datasets. The two towers encode the two pieces of text separately, and their outputs are combined to predict the label of the text pair (see the sketch after the commands below). The example of using a siamese network with a Transformer encoder on the OCNLI dataset:

python3 finetune/run_classifier_siamese.py --pretrained_model_path models/google_zh_model.bin \
                                           --vocab_path models/google_zh_vocab.txt \
                                           --config_path models/sbert/base_config.json \
                                           --train_path datasets/ocnli/train_50k.tsv \
                                           --dev_path datasets/ocnli/dev.tsv \
                                           --epochs_num 3 --batch_size 32

python3 inference/run_classifier_siamese_infer.py --load_model_path models/finetuned_model.bin \
                                                  --vocab_path models/google_zh_vocab.txt \
                                                  --config_path models/sbert/base_config.json \
                                                  --test_path datasets/ocnli/test_nolabel.tsv \
                                                  --prediction_path datasets/ocnli/prediction.tsv \
                                                  --labels_num 3

The example of using a siamese network with an LSTM encoder on the OCNLI dataset:

python3 finetune/run_classifier_siamese.py --vocab_path models/google_zh_vocab.txt \
                                           --config_path models/rnn/siamese_lstm_config.json \
                                           --train_path datasets/ocnli/train_50k.tsv \
                                           --dev_path datasets/ocnli/dev.tsv \
                                           --learning_rate 1e-4 --epochs_num 3 --batch_size 32

python3 inference/run_classifier_siamese_infer.py --load_model_path models/finetuned_model.bin \
                                                  --vocab_path models/google_zh_vocab.txt \
                                                  --config_path models/rnn/siamese_lstm_config.json \
                                                  --test_path datasets/ocnli/test_nolabel.tsv \
                                                  --prediction_path datasets/ocnli/prediction.tsv \
                                                  --labels_num 3
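A rough picture of the two-tower setup described above is given below. It is a simplified sketch: how run_classifier_siamese.py actually combines the two sentence vectors (concatenation, difference, similarity score, etc.) may differ.

import torch
import torch.nn as nn

class SiameseClassifier(nn.Module):
    """Two texts, one shared encoder, combined representations feed a classifier (sketch)."""
    def __init__(self, encoder, hidden_size, labels_num):
        super().__init__()
        self.encoder = encoder                            # shared between the two towers
        self.classifier = nn.Linear(hidden_size * 3, labels_num)

    def forward(self, input_a, input_b):
        u = self.encoder(input_a)[:, 0, :]                # sentence vector of text_a
        v = self.encoder(input_b)[:, 0, :]                # sentence vector of text_b
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(features)                  # logits over labels_num classes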

UER-py supports prompt-based learning.

The example of fine-tuning and doing inference with character-based models:

python3 finetune/run_classifier_prompt.py --pretrained_model_path models/google_zh_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/base_config.json \
                                          --train_path datasets/chnsenticorp/train.tsv \
                                          --dev_path datasets/chnsenticorp/dev.tsv \
                                          --test_path datasets/chnsenticorp/test.tsv \
                                          --prompt_path models/prompts.json --prompt_id chnsenticorp_char \
                                          --learning_rate 3e-5 --epochs_num 3 --batch_size 64

python3 inference/run_classifier_prompt_infer.py --load_model_path models/finetuned_model.bin \
                                                 --vocab_path models/google_zh_vocab.txt \
                                                 --config_path models/bert/base_config.json \
                                                 --test_path datasets/chnsenticorp/test_nolabel.tsv \
                                                 --prediction_path datasets/chnsenticorp/prediction.tsv \
                                                 --prompt_path models/prompts.json --prompt_id chnsenticorp_char

The example of fine-tuning and doing inference with word-based models:

python3 finetune/run_classifier_prompt.py --pretrained_model_path models/cluecorpussmall_word_roberta_base_seq512_model.bin \
                                          --spm_model_path models/cluecorpussmall_spm.model \
                                          --config_path models/bert/base_config.json \
                                          --train_path datasets/chnsenticorp/train.tsv \
                                          --dev_path datasets/chnsenticorp/dev.tsv \
                                          --test_path datasets/chnsenticorp/test.tsv \
                                          --prompt_path models/prompts.json --prompt_id chnsenticorp_word \
                                          --learning_rate 3e-5 --epochs_num 3 --batch_size 64

python3 inference/run_classifier_prompt_infer.py --load_model_path models/finetuned_model.bin \
                                                 --spm_model_path models/cluecorpussmall_spm.model \
                                                 --config_path models/bert/base_config.json \
                                                 --test_path datasets/chnsenticorp/test_nolabel.tsv \
                                                 --prediction_path datasets/chnsenticorp/prediction.tsv \
                                                 --prompt_path models/prompts.json --prompt_id chnsenticorp_word

The example of zero-shot learning with character-based and word-based models:

python3 finetune/run_classifier_prompt.py --pretrained_model_path models/google_zh_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/base_config.json \
                                          --train_path datasets/chnsenticorp/train.tsv \
                                          --dev_path datasets/chnsenticorp/dev.tsv \
                                          --test_path datasets/chnsenticorp/test.tsv \
                                          --prompt_path models/prompts.json --prompt_id chnsenticorp_char \
                                          --epochs_num 0

python3 finetune/run_classifier_prompt.py --pretrained_model_path models/cluecorpussmall_word_roberta_base_seq512_model.bin \
                                          --spm_model_path models/cluecorpussmall_spm.model \
                                          --config_path models/bert/base_config.json \
                                          --train_path datasets/chnsenticorp/train.tsv \
                                          --dev_path datasets/chnsenticorp/dev.tsv \
                                          --test_path datasets/chnsenticorp/test.tsv \
                                          --prompt_path models/prompts.json --prompt_id chnsenticorp_word \
                                          --epochs_num 0

--epochs_num 0 means that the training samples are not used, i.e. the pre-trained model is evaluated directly in a zero-shot manner.
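In prompt-based classification the input is wrapped in a template containing a masked position, and the label is decided by which verbalizer (label word) the masked language model prefers at that position; this is why no gradient updates are required for zero-shot prediction. The scoring step can be sketched as follows (illustrative only; the actual template and verbalizer definitions live in models/prompts.json):

import torch

def prompt_predict(mlm_logits, mask_position, verbalizer_ids):
    # mlm_logits: [seq_length, vocab_size] output of a masked language model.
    # verbalizer_ids: vocabulary ids of the label words, one per label.
    scores = mlm_logits[mask_position, verbalizer_ids]    # one score per label
    return int(torch.argmax(scores))

# Shapes and ids below are arbitrary placeholders for illustration.
label = prompt_predict(torch.randn(16, 21128), mask_position=3, verbalizer_ids=[1266, 2345])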

UER-py supports multi-label classification, where multiple labels may be assigned to each instance. We use the Toxic Comment Classification Challenge as an example. The example of fine-tuning and doing inference with BERT on the Toxic Comment Classification Challenge (in UER format):

python3 finetune/run_classifier_multi_label.py --pretrained_model_path models/bert_base_en_uncased_model.bin \
                                               --vocab_path models/google_uncased_en_vocab.txt \
                                               --config_path models/bert/base_config.json \
                                               --train_path datasets/toxic_comment/train.tsv \
                                               --dev_path datasets/toxic_comment/dev.tsv \
                                               --epochs_num 3 --batch_size 64 --seq_length 128

python3 inference/run_classifier_multi_label_infer.py --load_model_path models/finetuned_model.bin \
                                                      --vocab_path models/google_uncased_en_vocab.txt \
                                                      --config_path models/bert/base_config.json \
                                                      --test_path datasets/toxic_comment/dev.tsv \
                                                      --prediction_path datasets/toxic_comment/prediction.tsv \
                                                      --seq_length 128 --labels_num 7
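Unlike single-label classification, multi-label classification scores each label independently: a sigmoid is applied per label, training uses binary cross-entropy, and prediction thresholds the probabilities. A minimal sketch of that idea (not the script's exact code):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 7)                      # [batch, labels_num], e.g. 7 toxicity labels
targets = torch.randint(0, 2, (4, 7)).float()   # several labels may be 1 for one instance

loss = F.binary_cross_entropy_with_logits(logits, targets)   # training objective
predictions = (torch.sigmoid(logits) > 0.5).int()            # independent decision per label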

Document-based question answering

run_dbqa.py uses the same network architecture as run_classifier.py. Document-based question answering (DBQA) can be converted to a classification task: the text_a column contains the question and the text_b column contains a candidate sentence which may contain the answer. The example of using run_dbqa.py:

python3 finetune/run_dbqa.py --pretrained_model_path models/google_zh_model.bin \
                             --vocab_path models/google_zh_vocab.txt \
                             --config_path models/bert/base_config.json \
                             --train_path datasets/nlpcc-dbqa/train.tsv \
                             --dev_path datasets/nlpcc-dbqa/dev.tsv \
                             --test_path datasets/nlpcc-dbqa/test.tsv \
                             --epochs_num 3 --batch_size 64

The example of using inference/run_classifier_infer.py to do inference for DBQA:

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/base_config.json \
                                          --test_path datasets/nlpcc-dbqa/test.tsv \
                                          --prediction_path datasets/nlpcc-dbqa/prediction.tsv \
                                          --labels_num 2 --output_logits --output_prob

The example of using ALBERT for DBQA:

python3 finetune/run_dbqa.py --pretrained_model_path models/google_zh_albert_base_model.bin \
                             --vocab_path models/google_zh_vocab.txt \
                             --config_path models/albert/base_config.json \
                             --train_path datasets/nlpcc-dbqa/train.tsv \
                             --dev_path datasets/nlpcc-dbqa/dev.tsv \
                             --test_path datasets/nlpcc-dbqa/test.tsv \
                             --epochs_num 3 --batch_size 64

The example of doing inference for ALBERT:

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/albert/base_config.json \
                                          --test_path datasets/nlpcc-dbqa/test.tsv \
                                          --prediction_path datasets/nlpcc-dbqa/prediction.tsv \
                                          --labels_num 2

Regression

The example of using run_regression.py on the English regression task STS-B (GLUE benchmark):

python3 finetune/run_regression.py --pretrained_model_path models/bert_base_en_uncased_model.bin \
                                   --vocab_path models/google_uncased_en_vocab.txt \
                                   --config_path models/bert/base_config.json \
                                   --train_path datasets/STS-B/train.tsv \
                                   --dev_path datasets/STS-B/dev.tsv \
                                   --epochs_num 3 --batch_size 64

The example of using inference/run_regression_infer.py:

python3 inference/run_regression_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_uncased_en_vocab.txt \
                                          --config_path models/bert/base_config.json \
                                          --test_path datasets/STS-B/test_nolabel.tsv \
                                          --prediction_path datasets/STS-B/prediction.tsv

Sequence labeling

run_ner.py adds one feedforward layer on top of the encoder. The example of using run_ner.py:

python3 finetune/run_ner.py --pretrained_model_path models/google_zh_model.bin --vocab_path models/google_zh_vocab.txt \
                            --config_path models/bert/base_config.json \
                            --train_path datasets/msra_ner/train.tsv \
                            --dev_path datasets/msra_ner/dev.tsv \
                            --test_path datasets/msra_ner/test.tsv \
                            --label2id_path datasets/msra_ner/label2id.json \
                            --epochs_num 5 --batch_size 16

The example of doing inference:

python3 inference/run_ner_infer.py --load_model_path models/finetuned_model.bin --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/bert/base_config.json \
                                   --test_path datasets/msra_ner/test_nolabel.tsv \
                                   --prediction_path datasets/msra_ner/prediction.tsv \
                                   --label2id_path datasets/msra_ner/label2id.json
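For sequence labeling, the feedforward layer is applied at every token position so that each token receives its own label; label2id.json maps label strings (e.g. BIO tags) to the integer ids used by that layer. A minimal sketch of the per-token head (illustrative, not run_ner.py's exact code):

import torch
import torch.nn as nn

labels_num = 7                                   # illustrative; the real value comes from label2id.json
hidden = torch.randn(2, 16, 768)                 # [batch, seq_length, hidden_size] from the encoder
output_layer = nn.Linear(768, labels_num)

logits = output_layer(hidden)                    # [batch, seq_length, labels_num]
predicted_label_ids = logits.argmax(dim=-1)      # one label id per token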

The example of using ALBERT for NER:

python3 finetune/run_ner.py --pretrained_model_path models/google_zh_albert_base_model.bin \
                            --vocab_path models/google_zh_vocab.txt \
                            --config_path models/albert/base_config.json \
                            --train_path datasets/msra_ner/train.tsv \
                            --dev_path datasets/msra_ner/dev.tsv \
                            --test_path datasets/msra_ner/test.tsv \
                            --label2id_path datasets/msra_ner/label2id.json \
                            --learning_rate 1e-4 --epochs_num 5 --batch_size 16

python3 inference/run_ner_infer.py --load_model_path models/finetuned_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/albert/base_config.json \
                                   --test_path datasets/msra_ner/test_nolabel.tsv \
                                   --prediction_path datasets/msra_ner/prediction.tsv \
                                   --label2id_path datasets/msra_ner/label2id.json

The example of using ELMo for NER:

python3 finetune/run_ner.py --pretrained_model_path models/cluecorpussmall_elmo_model.bin \
                            --vocab_path models/google_zh_vocab.txt \
                            --config_path models/rnn/bilstm_config.json \
                            --train_path datasets/msra_ner/train.tsv \
                            --dev_path datasets/msra_ner/dev.tsv \
                            --test_path datasets/msra_ner/test.tsv \
                            --label2id_path datasets/msra_ner/label2id.json \
                            --learning_rate 5e-4 --epochs_num 5  --batch_size 16

python3 inference/run_ner_infer.py --load_model_path models/finetuned_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/rnn/bilstm_config.json \
                                   --test_path datasets/msra_ner/test_nolabel.tsv \
                                   --prediction_path datasets/msra_ner/prediction.tsv \
                                   --label2id_path datasets/msra_ner/label2id.json

We can use --crf_target to use a CRF layer in the sequence labeling downstream task. Notice that --crf_target doesn't support multi-GPU mode:

CUDA_VISIBLE_DEVICES=0 python3 finetune/run_ner.py --pretrained_model_path models/cluecorpussmall_elmo_model.bin \
                                                   --vocab_path models/google_zh_vocab.txt \
                                                   --config_path models/rnn/bilstm_config.json \
                                                   --train_path datasets/msra_ner/train.tsv \
                                                   --dev_path datasets/msra_ner/dev.tsv \
                                                   --test_path datasets/msra_ner/test.tsv \
                                                   --label2id_path datasets/msra_ner/label2id.json \
                                                   --learning_rate 5e-4 --epochs_num 5 --batch_size 16 \
                                                   --crf_target

CUDA_VISIBLE_DEVICES=0 python3 inference/run_ner_infer.py --load_model_path models/finetuned_model.bin \
                                                          --vocab_path models/google_zh_vocab.txt \
                                                          --config_path models/rnn/bilstm_config.json \
                                                          --test_path datasets/msra_ner/test_nolabel.tsv \
                                                          --prediction_path datasets/msra_ner/prediction.tsv \
                                                          --label2id_path datasets/msra_ner/label2id.json \
                                                          --crf_target

Machine reading comprehension

run_cmrc.py adds one feedforward layer on top of the encoder. The example of using run_cmrc.py for Chinese Machine Reading Comprehension (CMRC):

python3 finetune/run_cmrc.py --pretrained_model_path models/google_zh_model.bin \
                             --vocab_path models/google_zh_vocab.txt \
                             --config_path models/bert/base_config.json \
                             --train_path datasets/cmrc2018/train.json \
                             --dev_path datasets/cmrc2018/dev.json \
                             --epochs_num 2 --batch_size 8 --seq_length 512

train.json and dev.json are in SQuAD style. The train set and development set are available here. The --test_path option is not specified since the test set is not publicly available.
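For extractive reading comprehension, the feedforward layer predicts a start score and an end score for every token of the question-plus-context input; the answer is the span whose start and end scores are highest. A minimal sketch of such a head (illustrative, not run_cmrc.py's exact code):

import torch
import torch.nn as nn

hidden = torch.randn(1, 512, 768)                # [batch, seq_length, hidden_size]
span_layer = nn.Linear(768, 2)                   # two scores per token: start and end

start_logits, end_logits = span_layer(hidden).split(1, dim=-1)
start = int(start_logits.squeeze(-1).argmax(dim=-1))   # predicted start position
end = int(end_logits.squeeze(-1).argmax(dim=-1))       # predicted end position (should be >= start)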

The example of doing inference:

python3  inference/run_cmrc_infer.py --load_model_path models/finetuned_model.bin \
                                     --vocab_path models/google_zh_vocab.txt \
                                     --config_path models/bert/base_config.json \
                                     --test_path datasets/cmrc2018/test.json \
                                     --prediction_path datasets/cmrc2018/prediction.json \
                                     --seq_length 512

The example of using ALBERT-xxlarge for CMRC:

python3 finetune/run_cmrc.py --pretrained_model_path models/google_zh_albert_xxlarge_model.bin \
                             --vocab_path models/google_zh_vocab.txt \
                             --config_path models/albert/xxlarge_config.json \
                             --train_path datasets/cmrc2018/train.json \
                             --dev_path datasets/cmrc2018/dev.json \
                             --learning_rate 1e-5 --epochs_num 2 --batch_size 8 --seq_length 512

The example of doing inference for ALBERT:

python3 inference/run_cmrc_infer.py --load_model_path models/finetuned_model.bin \
                                    --vocab_path models/google_zh_vocab.txt \
                                    --config_path models/albert/xxlarge_config.json \
                                    --test_path datasets/cmrc2018/test.json \
                                    --prediction_path datasets/cmrc2018/prediction.json \
                                    --seq_length 512

Multiple choice

C3 is a multiple-choice dataset. Given a context and a question, one needs to select one answer from four candidates. run_c3.py adds one feedforward layer on top of the encoder. The example of using run_c3.py for the multiple-choice task:

python3 finetune/run_c3.py --pretrained_model_path models/google_zh_model.bin \
                           --vocab_path models/google_zh_vocab.txt \
                           --config_path models/bert/base_config.json \
                           --train_path datasets/c3/train.json \
                           --dev_path datasets/c3/dev.json \
                           --epochs_num 8 --batch_size 8 --seq_length 512 --max_choices_num 4

The --test_path option is not specified since the test set of the C3 dataset is not publicly available.
The actual batch size is --batch_size times --max_choices_num, e.g. 8 × 4 = 32 sequences per step here.
Each question in the C3 dataset has at most 4 candidate answers, so --max_choices_num is set to 4.
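In the usual multiple-choice setup, each (context, question, candidate answer) combination is encoded separately and scored with a single output unit, and the scores are compared across candidates; this is why each example expands into --max_choices_num sequences. A minimal sketch of that idea (illustrative, not run_c3.py's exact code):

import torch
import torch.nn as nn

batch_size, max_choices_num, seq_length, hidden_size = 8, 4, 512, 768
# One encoded sequence per candidate answer: [batch * choices, seq_length, hidden_size]
hidden = torch.randn(batch_size * max_choices_num, seq_length, hidden_size)

score_layer = nn.Linear(hidden_size, 1)                       # one score per candidate
scores = score_layer(hidden[:, 0, :]).view(batch_size, max_choices_num)
predicted_choice = scores.argmax(dim=-1)                      # index of the selected answer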

The example of doing inference:

python3 inference/run_c3_infer.py --load_model_path models/finetuned_model.bin \
                                  --vocab_path models/google_zh_vocab.txt \
                                  --config_path models/bert/base_config.json \
                                  --test_path datasets/c3/test.json \
                                  --prediction_path datasets/c3/prediction.json \
                                  --max_choices_num 4 --seq_length 512

The example of using ALBERT-xlarge for C3:

python3 finetune/run_c3.py --pretrained_model_path models/google_zh_albert_xlarge_model.bin \
                           --vocab_path models/google_zh_vocab.txt \
                           --config_path models/albert/xlarge_config.json \
                           --train_path datasets/c3/train.json \
                           --dev_path datasets/c3/dev.json \
                           --epochs_num 8 --batch_size 8 --seq_length 512 --max_choices_num 4

The example of doing inference for ALBERT-xlarge:

python3  inference/run_c3_infer.py --load_model_path models/finetuned_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/albert/xlarge_config.json \
                                   --test_path datasets/c3/test.json \
                                   --prediction_path datasets/c3/prediction.json \
                                   --max_choices_num 4 --seq_length 512

Idiom cloze

ChID is a Chinese idiom cloze dataset. Given a context and candidate idioms, one needs to fill the appropriate idiom into the text. The example of using run_chid.py for the idiom cloze task:

python3 finetune/run_chid.py --pretrained_model_path models/google_zh_model.bin \
                             --vocab_path models/google_zh_vocab.txt \
                             --config_path models/bert/base_config.json \
                             --train_path datasets/chid/train.json \
                             --train_answer_path datasets/chid/train_answer.json \
                             --dev_path datasets/chid/dev.json \
                             --dev_answer_path datasets/chid/dev_answer.json \
                             --batch_size 24 --seq_length 64 --max_choices_num 10

The example of doing inference:

python3 inference/run_chid_infer.py --load_model_path models/finetuned_model.bin \
                                    --vocab_path models/google_zh_vocab.txt \
                                    --config_path models/bert/base_config.json \
                                    --test_path datasets/chid/test.json \
                                    --prediction_path datasets/chid/prediction.json \
                                    --seq_length 64 --max_choices_num 10

Fine-tuning in a text2text manner

T5 proposes to use a seq2seq model to unify NLU and NLG tasks. run_text2text.py loads a seq2seq model and fine-tunes it in a text2text manner, i.e. the input and output are always text strings. The example of using run_text2text.py for the book review task:

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_t5_small_seq512_model.bin \
                                  --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                  --config_path models/t5/small_config.json \
                                  --train_path datasets/book_review_text2text/train.tsv \
                                  --dev_path datasets/book_review_text2text/dev.tsv \
                                  --learning_rate 3e-4 --epochs_num 3 --batch_size 32 --seq_length 128 --tgt_seq_length 8

We download the pre-trained T5 model and fine-tune on it. Users can download the book review dataset in text2text format from here. The example of doing inference:

python3 inference/run_text2text_infer.py --load_model_path models/finetuned_model.bin \
                                         --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                         --config_path models/t5/small_config.json \
                                         --test_path datasets/book_review_text2text/test_nolabel.tsv \
                                         --prediction_path datasets/book_review_text2text/prediction.tsv \
                                         --seq_length 128 --tgt_seq_length 8

The example of fine-tuning and doing inference with BART-large:

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_bart_large_seq512_model.bin \
                                  --vocab_path models/google_zh_vocab.txt \
                                  --config_path models/bart/large_config.json \
                                  --train_path datasets/book_review_text2text/train.tsv \
                                  --dev_path datasets/book_review_text2text/dev.tsv \
                                  --learning_rate 1e-5 --epochs_num 3 --batch_size 32 --seq_length 128 --tgt_seq_length 8

python3 inference/run_text2text_infer.py --load_model_path models/finetuned_model.bin \
                                         --vocab_path models/google_zh_vocab.txt \
                                         --config_path models/bart/large_config.json \
                                         --test_path datasets/book_review_text2text/test_nolabel.tsv \
                                         --prediction_path datasets/book_review_text2text/prediction.tsv \
                                         --seq_length 128 --tgt_seq_length 8

The example of fine-tuning and doing inference with PEGASUS-base:

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_pegasus_base_seq512_model.bin \
                                  --vocab_path models/google_zh_vocab.txt \
                                  --config_path models/pegasus/base_config.json \
                                  --train_path datasets/book_review_text2text/train.tsv \
                                  --dev_path datasets/book_review_text2text/dev.tsv \
                                  --learning_rate 1e-5 --epochs_num 3 --batch_size 32 --seq_length 128 --tgt_seq_length 8

python3 inference/run_text2text_infer.py --load_model_path models/finetuned_model.bin \
                                         --vocab_path models/google_zh_vocab.txt \
                                         --config_path models/pegasus/base_config.json \
                                         --test_path datasets/book_review_text2text/test_nolabel.tsv \
                                         --prediction_path datasets/book_review_text2text/prediction.tsv \
                                         --seq_length 128 --tgt_seq_length 8

SimCSE

SimCSE proposes to use dropout to construct positive samples for contrastive learning: the same sentence is encoded twice with different dropout masks, and the two resulting embeddings form a positive pair. The example of using run_simcse.py for unsupervised representation learning:

python3 finetune/run_simcse.py --pretrained_model_path models/bert_base_en_uncased_model.bin \
                               --vocab_path models/google_uncased_en_vocab.txt \
                               --config_path models/bert/base_config.json \
                               --train_path datasets/STS-B/train_unsup.tsv \
                               --dev_path datasets/STS-B/dev.tsv \
                               --learning_rate 1e-5 --epochs_num 1 --batch_size 64 --seq_length 64 \
                               --pooling first --temperature 0.05 --eval_steps 200
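The unsupervised SimCSE objective treats the two dropout-perturbed encodings of the same sentence as a positive pair and the other sentences in the batch as negatives, using a temperature-scaled contrastive (InfoNCE) loss. A minimal sketch of the loss (illustrative, not run_simcse.py's exact code):

import torch
import torch.nn.functional as F

def simcse_loss(z1, z2, temperature=0.05):
    # z1, z2: [batch, hidden] embeddings of the same sentences under two dropout masks.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    similarities = z1 @ z2.t() / temperature          # [batch, batch] cosine similarities
    targets = torch.arange(z1.size(0))                # the i-th row should match the i-th column
    return F.cross_entropy(similarities, targets)

loss = simcse_loss(torch.randn(8, 768), torch.randn(8, 768))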

We download the pre-trained English BERT model and fine-tune on it. Users can download the STS-B dataset from the Downstream datasets section. The train set consists of unlabeled sentences (--train_path datasets/STS-B/train_unsup.tsv). The development set contains sentence pairs and similarity scores (--dev_path datasets/STS-B/dev.tsv). The example of extracting features with the model fine-tuned by SimCSE:

python3 scripts/extract_features.py --load_model_path models/finetuned_model.bin \
                                    --vocab_path models/google_uncased_en_vocab.txt \
                                    --config_path models/bert/base_config.json \
                                    --test_path datasets/tencent_profile.txt \
                                    --prediction_path features.pt \
                                    --pooling first