Pretrained Language Model

This repository provides the latest pretrained language models and their related optimization techniques developed by Huawei Noah's Ark Lab.

Directory structure

NEZHA-TensorFlow is a pretrained Chinese language model, implemented in TensorFlow, that achieves state-of-the-art performance on several Chinese NLP tasks.
NEZHA-PyTorch is the PyTorch version of NEZHA.
NEZHA-Gen-TensorFlow provides two GPT models: Yuefu (乐府), a Chinese classical poetry generation model, and a general-purpose Chinese GPT model.
TinyBERT is a compressed BERT model that is 7.5x smaller and 9.4x faster at inference.
TinyBERT-MindSpore is a MindSpore version of TinyBERT.
DynaBERT is a dynamic BERT model with adaptive width and depth.
BBPE provides a byte-level vocabulary building tool and its corresponding tokenizer.
PMLM is an improved pretraining method for language models. Because it is trained without the complex two-stream self-attention, PMLM can be treated as a simple approximation of XLNet.
