Merge pull request #32 from unitaryai/updated_multilingual_model
Updated multilingual model & consistent class names
laurahanu authored Oct 27, 2021
2 parents acf8697 + 440d6de commit 0cccd59
Showing 22 changed files with 371 additions and 174 deletions.
49 changes: 31 additions & 18 deletions README.md
@@ -14,15 +14,19 @@

## News & Updates

### 22-10-2021: New improved multilingual model & standardised class names
- Updated the `multilingual` model weights used by Detoxify with a model trained on the translated data from the 2nd Jigsaw challenge (as well as the 1st). This model has also been trained to minimise bias and now returns the same categories as the `unbiased` model. New best AUC score on the test set: 92.11 (89.71 before).
- All detoxify models now return consistent class names (e.g. "identity_attack" replaces "identity_hate" in the `original` model to match the `unbiased` classes).
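
As a quick illustration of the standardised output, a minimal usage sketch (the printed key list is what the renamed class list implies; scores themselves will vary):

```python
from detoxify import Detoxify

# the `original` model now reports the same label names as `unbiased`
results = Detoxify("original").predict("example text")
print(sorted(results))
# expected: ['identity_attack', 'insult', 'obscene', 'severe_toxicity', 'threat', 'toxicity']

# the updated multilingual model accepts the same call on non-English text
multilingual_results = Detoxify("multilingual").predict("texte d'exemple")
```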

### 03-09-2021: New improved unbiased model
- Updated the `unbiased` model weights used by Detoxify with a model trained on both datasets from the first 2 Jigsaw challenges. New best score on the test set: 0.93744 (0.93639 before).
- Updated the `unbiased` model weights used by Detoxify with a model trained on both datasets from the first 2 Jigsaw challenges. New best score on the test set: 93.74 (93.64 before).

### 15-02-2021: Detoxify featured in Scientific American!
- Our opinion piece ["Can AI identify toxic online content?"](https://www.scientificamerican.com/article/can-ai-identify-toxic-online-content/) is now live on Scientific American.

### 14-01-2021: Lightweight models

- Added smaller models trained with ALBERT for the `original` and `unbiased` models! These can be accessed in the same way with Detoxify, using `original-small` and `unbiased-small` as inputs. The `original-small` model achieved a mean AUC score of 0.98281 (0.98636 before) and the `unbiased-small` model achieved a final score of 0.93362 (0.93639 before).
- Added smaller models trained with ALBERT for the `original` and `unbiased` models! These can be accessed in the same way with Detoxify, using `original-small` and `unbiased-small` as inputs. The `original-small` model achieved a mean AUC score of 98.28 (98.64 before) and the `unbiased-small` model achieved a final score of 93.36 (93.64 before).
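
A minimal sketch of loading a lightweight variant alongside the full model (same API, selected purely by name; the comparison is only illustrative):

```python
from detoxify import Detoxify

text = "example text"
small_score = Detoxify("original-small").predict(text)["toxicity"]
full_score = Detoxify("original").predict(text)["toxicity"]
print(f"original-small: {small_score:.4f}  original: {full_score:.4f}")
```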

## Description

@@ -38,16 +38,26 @@ Dependencies:
- Kaggle API (to download data)


| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score | Detoxify Score
| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score % | Detoxify Score %
|-|-|-|-|-|-|-|
| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | `original` | 0.98856 | 0.98636
| [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | `unbiased` | 0.94734 | 0.93744
| [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 0.9536 | 0.91655*
| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | `original` | 98.86 | 98.64
| [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | `unbiased` | 94.73 | 93.74
| [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 95.36 | 92.11

*Score not directly comparable since it is obtained on the provided validation set rather than the test set. To be updated when the test labels are made available.

It is also worth noting that the top leaderboard scores were achieved using model ensembles, whereas the purpose of this library is to provide something user-friendly and straightforward to use.

### Multilingual model language breakdown

| Language Subgroup | Subgroup size | Subgroup AUC Score % |
|:-----------|----------------:|---------------:|
| 🇮🇹 it | 8494 | 89.18 |
| 🇫🇷 fr | 10920 | 89.61 |
| 🇷🇺 ru | 10948 | 89.81 |
| 🇵🇹 pt | 11012 | 91.00 |
| 🇪🇸 es | 8438 | 92.74 |
| 🇹🇷 tr | 14000 | 97.19 |
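
A rough sketch of how per-language figures like those above can be reproduced, assuming the Kaggle multilingual `validation.csv` with its `comment_text`, `lang` and `toxic` columns (the batching and column names here are assumptions, not the repo's evaluation script):

```python
import pandas as pd
from detoxify import Detoxify
from sklearn.metrics import roc_auc_score

val = pd.read_csv("jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv")
model = Detoxify("multilingual")

# score in small batches to keep memory bounded
scores = []
for i in range(0, len(val), 64):
    scores.extend(model.predict(val["comment_text"].iloc[i:i + 64].tolist())["toxicity"])
val["score"] = scores

# per-language subgroup AUC, reported as a percentage
for lang, group in val.groupby("lang"):
    print(lang, len(group), round(100 * roc_auc_score(group["toxic"], group["score"]), 2))
```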

## Limitations and ethical considerations

If words associated with swearing, insults or profanity are present in a comment, it is likely to be classified as toxic regardless of the tone or intent of the author, e.g. humorous or self-deprecating. This could introduce biases against already vulnerable minority groups.
@@ -261,34 +275,33 @@ kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification

```bash

python create_val_set.py
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-toxic-comment-classification-challenge/test.csv --update_test

python train.py --config configs/Toxic_comment_classification_BERT.json
```
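
For context, the `--update_test` step above merges Kaggle's `test.csv` with `test_labels.csv` into a single labelled file; a rough sketch of the assumed behaviour (output filename and exact filtering are assumptions, not the actual `preprocessing_utils.py`):

```python
import pandas as pd

folder = "jigsaw_data/jigsaw-toxic-comment-classification-challenge"
test = pd.read_csv(f"{folder}/test.csv")
labels = pd.read_csv(f"{folder}/test_labels.csv")

# Kaggle marks rows it never scored with -1; those cannot be used for evaluation
merged = test.merge(labels, on="id")
merged = merged[merged["toxic"] != -1]
merged.to_csv(f"{folder}/val.csv", index=False)  # output name assumed to match the configs' test_csv_file
```
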
### Unintended Bias in Toxicity Challenge

```bash

python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa_combined.json

```
### Multilingual Toxic Comment Classification

This is trained in 2 stages. First, train on all available data, and second, train only on the translated versions of the first challenge.

The [translated data](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) can be downloaded from Kaggle in french, spanish, italian, portuguese, turkish, and russian (the languages available in the test set).
The translated data ([source 1](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) [source 2](https://www.kaggle.com/ludovick/jigsawtanslatedgoogle)) can be downloaded from Kaggle in french, spanish, italian, portuguese, turkish, and russian (the languages available in the test set).

```bash

# stage 1

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json

# stage 2
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-multilingual-toxic-comment-classification/test.csv --update_test

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json --resume path_to_saved_checkpoint_stage1

```

### Monitor progress with tensorboard

```bash
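# the rest of this block is elided in the diff view; it presumably points
# TensorBoard at the training log directory, e.g. (path is an assumption):
tensorboard --logdir=./saved
```
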
55 changes: 42 additions & 13 deletions configs/Multilingual_toxic_comment_classification_XLMR.json
@@ -1,38 +1,67 @@
{
"name": "Jigsaw_XLMRoBERTa_multilingual",
"name": "Jigsaw_XLM_multilingual",
"n_gpu": 1,
"batch_size": 8,
"accumulate_grad_batches": 3,
"batch_size": 30,
"accumulate_grad_batches": 4,
"loss": "binary_cross_entropy",
"arch": {
"type": "XLMRoBERTa",
"type": "XLMRoberta",
"args": {
"num_classes": 1,
"num_classes": 16,
"model_type": "xlm-roberta-base",
"model_name": "XLMRobertaForSequenceClassification",
"tokenizer_name": "XLMRobertaTokenizer"
}
},
"dataset": {
"type": "JigsawDataMultilingual",
"type": "JigsawDataBias",
"args": {
"train_csv_file": [
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-es-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-fr-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-it-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-pt-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-ru-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-tr-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train.csv"
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_es_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_fr_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_it_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_pt_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_ru_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_tr_clean.csv",
"jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/train.csv"
],
"test_csv_file": "jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv",
"val_fraction": null,
"create_val_set": false,
"test_csv_file": ["jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv",
"jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv"],
"loss_weight": 0.75,
"classes": [
"toxic"
"toxicity",
"severe_toxicity",
"obscene",
"identity_attack",
"insult",
"threat",
"sexual_explicit"
],
"identity_classes": [
"male",
"female",
"homosexual_gay_or_lesbian",
"christian",
"jewish",
"muslim",
"black",
"white",
"psychiatric_or_mental_illness"
]
}
},
"optimizer": {
"type": "Adam",
"args": {
"lr": 3e-5,
"weight_decay": 3e-6,
"lr": 3e-6,
"weight_decay": 3e-7,
"amsgrad": true
}
}
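
The `optimizer` block above maps directly onto `torch.optim.Adam`; a minimal sketch of how a config like this can be consumed (illustrative plumbing with a stand-in model, not the repo's actual `train.py`):

```python
import json

import torch

with open("configs/Multilingual_toxic_comment_classification_XLMR.json") as f:
    config = json.load(f)

# stand-in module; the real run would build the XLM-R classifier from config["arch"]
model = torch.nn.Linear(768, config["arch"]["args"]["num_classes"])

optimizer_cls = getattr(torch.optim, config["optimizer"]["type"])  # Adam
optimizer = optimizer_cls(model.parameters(), **config["optimizer"]["args"])
print(optimizer.defaults)  # includes lr=3e-06, weight_decay=3e-07, amsgrad=True
```
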
Original file line number Diff line number Diff line change
@@ -22,13 +22,18 @@
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-it-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-pt-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-ru-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-tr-cleaned.csv"
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-tr-cleaned.csv",
"jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv"
],
"test_csv_file": "jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv",
"val_fraction": null,
"create_val_set": false,
"classes": [
"toxic"
"toxicity",
"severe_toxicity",
"obscene",
"identity_attack",
"insult",
"threat",
"sexual_explicit"
]
}
},
8 changes: 3 additions & 5 deletions configs/Toxic_comment_classification_ALBERT.json
@@ -18,16 +18,14 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/val.csv",
"val_fraction": null,
"create_val_set": false,
"add_test_labels": false,
"classes": [
"toxic",
"severe_toxic",
"toxicity",
"severe_toxicity",
"obscene",
"threat",
"insult",
"identity_hate"
"identity_attack"
]
}
},
8 changes: 3 additions & 5 deletions configs/Toxic_comment_classification_BERT.json
@@ -18,16 +18,14 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/val.csv",
"val_fraction": null,
"create_val_set": false,
"add_test_labels": false,
"classes": [
"toxic",
"severe_toxic",
"toxicity",
"severe_toxicity",
"obscene",
"threat",
"insult",
"identity_hate"
"identity_attack"
]
}
},
Original file line number Diff line number Diff line change
@@ -19,8 +19,6 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv",
"val_fraction": null,
"create_val_set": false,
"loss_weight": 0.75,
"classes": [
"toxicity",
Original file line number Diff line number Diff line change
@@ -19,8 +19,6 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv",
"val_fraction": null,
"create_val_set": false,
"loss_weight": 0.75,
"classes": [
"toxicity",
Original file line number Diff line number Diff line change
@@ -22,8 +22,6 @@
"jigsaw_data/jigsaw-toxic-comment-classification-challenge/train.csv"
],
"test_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv",
"val_fraction": null,
"create_val_set": false,
"loss_weight": 0.75,
"classes": [
"toxicity",
2 changes: 1 addition & 1 deletion convert_weights.py
@@ -32,7 +32,7 @@ def main():
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckeckpoint",
"--checkpoint",
type=str,
help="path to model checkpoint",
)
16 changes: 13 additions & 3 deletions detoxify/detoxify.py
@@ -4,7 +4,7 @@
MODEL_URLS = {
"original": "https://github.com/unitaryai/detoxify/releases/download/v0.1-alpha/toxic_original-c1212f89.ckpt",
"unbiased": "https://github.com/unitaryai/detoxify/releases/download/v0.3-alpha/toxic_debiased-c7548aa0.ckpt",
"multilingual": "https://github.com/unitaryai/detoxify/releases/download/v0.1-alpha/toxic_multilingual-bbddc277.ckpt",
"multilingual": "https://github.com/unitaryai/detoxify/releases/download/v0.4-alpha/multilingual_debiased-0b549669.ckpt",
"original-small": "https://github.com/unitaryai/detoxify/releases/download/v0.1.2/original-albert-0e1d6498.ckpt",
"unbiased-small": "https://github.com/unitaryai/detoxify/releases/download/v0.1.2/unbiased-albert-c8519128.ckpt"
}
@@ -39,7 +39,13 @@ def load_checkpoint(model_type="original", checkpoint=None, device='cpu'):
with as well as the state dict"
)
class_names = loaded["config"]["dataset"]["args"]["classes"]

# standardise class names between models
change_names = {
"toxic": "toxicity",
"identity_hate": "identity_attack",
"severe_toxic": "severe_toxicity",
}
class_names = [change_names.get(cl, cl) for cl in class_names]
model, tokenizer = get_model_and_tokenizer(
**loaded["config"]["arch"]["args"], state_dict=loaded["state_dict"]
)
@@ -58,7 +64,7 @@ def load_model(model_type, checkpoint=None):
class Detoxify:
"""Detoxify
Easily predict if a comment or list of comments is toxic.
Can initialize 3 different model types from model type or checkpoint path:
Can initialize 5 different model types from model type or checkpoint path:
- original:
model trained on data from the Jigsaw Toxic Comment
Classification Challenge
@@ -68,6 +74,10 @@ class Detoxify:
- multilingual:
model trained on data from the Jigsaw Multilingual
Toxic Comment Classification Challenge
- original-small:
lightweight version of the original model
- unbiased-small:
lightweight version of the unbiased model
Args:
model_type(str): model type to be loaded, can be either original,
unbiased or multilingual
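
A small sketch of the two initialisation paths the docstring describes, by released model name or from a locally trained checkpoint (the `checkpoint` and `device` keyword names mirror `load_checkpoint` above and are assumed to be forwarded by `Detoxify`; the checkpoint path is a placeholder):

```python
import torch
from detoxify import Detoxify

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) by name: original, unbiased, multilingual, original-small or unbiased-small
model = Detoxify("unbiased-small", device=device)
print(model.predict("example text"))

# 2) from a local checkpoint produced by train.py (placeholder path)
local_model = Detoxify(checkpoint="saved/Jigsaw_XLM_multilingual/checkpoints/example.ckpt", device=device)
```
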
Binary file modified examples.png
17 changes: 3 additions & 14 deletions model_eval/compute_bias_metric.py
@@ -3,18 +3,7 @@
import numpy as np
from sklearn.metrics import roc_auc_score
import argparse


def compute_auc(y_true, y_pred):
try:
return roc_auc_score(y_true, y_pred)
except ValueError:
return np.nan


def compute_subgroup_auc(df, subgroup, label, model_name):
subgroup_examples = df[df[subgroup]]
return compute_auc(subgroup_examples[label], subgroup_examples[model_name])
from utils import compute_auc, compute_subgroup_auc


def compute_bpsn_auc(df, subgroup, label, model_name):
@@ -34,7 +23,7 @@ def compute_bnsp_auc(df, subgroup, label, model_name):


def compute_bias_metrics_for_model(
dataset, subgroups, model, label_col, include_asegs=False
dataset, subgroups, model, label_col
):
"""Computes per-subgroup metrics for all subgroups and one model."""
records = []
@@ -89,7 +78,7 @@ def main():
with open(TEST, "r") as f:
results = json.load(f)

test_private_path = "../jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_private_expanded.csv"
test_private_path = "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_private_expanded.csv"
test_private = pd.read_csv(test_private_path)
test_private = convert_dataframe_to_bool(test_private)

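
For reference, `compute_bpsn_auc` and `compute_bnsp_auc` (bodies elided above) follow the standard Jigsaw unintended-bias benchmark definitions; a self-contained sketch of those two metrics, assuming boolean subgroup and label columns as produced by `convert_dataframe_to_bool` (an independent reimplementation for illustration, not the repo's code):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def compute_bpsn_auc(df: pd.DataFrame, subgroup: str, label: str, score: str) -> float:
    """Background-positive, subgroup-negative AUC: out-of-subgroup toxic vs in-subgroup non-toxic."""
    subset = pd.concat([df[df[subgroup] & ~df[label]], df[~df[subgroup] & df[label]]])
    return roc_auc_score(subset[label], subset[score])

def compute_bnsp_auc(df: pd.DataFrame, subgroup: str, label: str, score: str) -> float:
    """Background-negative, subgroup-positive AUC: in-subgroup toxic vs out-of-subgroup non-toxic."""
    subset = pd.concat([df[df[subgroup] & df[label]], df[~df[subgroup] & ~df[label]]])
    return roc_auc_score(subset[label], subset[score])
```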
