Merge pull request #32 from unitaryai/updated_multilingual_model
Updated multilingual model & consistent class names
laurahanu authored Oct 27, 2021
2 parents acf8697 + 440d6de commit 0cccd59
Showing 22 changed files with 371 additions and 174 deletions.
49 changes: 31 additions & 18 deletions README.md
@@ -14,15 +14,19 @@

## News & Updates

### 22-10-2021: New improved multilingual model & standardised class names
- Updated the `multilingual` model weights used by Detoxify with a model trained on the translated data from the 2nd Jigsaw challenge (as well as the 1st). This model has also been trained to minimise bias and now returns the same categories as the `unbiased` model. New best AUC score on the test set: 92.11 (89.71 before).
- All detoxify models now return consistent class names (e.g. "identity_attack" replaces "identity_hate" in the `original` model to match the `unbiased` classes).
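
As a quick illustration of the standardised output, a minimal usage sketch (the printed key list is what the renamed class list implies; scores themselves will vary):

```python
from detoxify import Detoxify

# the `original` model now reports the same label names as `unbiased`
results = Detoxify("original").predict("example text")
print(sorted(results))
# expected: ['identity_attack', 'insult', 'obscene', 'severe_toxicity', 'threat', 'toxicity']

# the updated multilingual model accepts the same call on non-English text
multilingual_results = Detoxify("multilingual").predict("texte d'exemple")
```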

### 03-09-2021: New improved unbiased model
- Updated the `unbiased` model weights used by Detoxify with a model trained on both datasets from the first 2 Jigsaw challenges. New best score on the test set: 0.93744 (0.93639 before).
- Updated the `unbiased` model weights used by Detoxify with a model trained on both datasets from the first 2 Jigsaw challenges. New best score on the test set: 93.74 (93.64 before).

### 15-02-2021: Detoxify featured in Scientific American!
- Our opinion piece ["Can AI identify toxic online content?"](https://www.scientificamerican.com/article/can-ai-identify-toxic-online-content/) is now live on Scientific American.

### 14-01-2021: Lightweight models

- Added smaller models trained with ALBERT for the `original` and `unbiased` models! These can be accessed in the same way with Detoxify, using `original-small` and `unbiased-small` as inputs. The `original-small` model achieved a mean AUC score of 0.98281 (0.98636 before) and the `unbiased-small` model achieved a final score of 0.93362 (0.93639 before).
- Added smaller models trained with ALBERT for the `original` and `unbiased` models! These can be accessed in the same way with Detoxify, using `original-small` and `unbiased-small` as inputs. The `original-small` model achieved a mean AUC score of 98.28 (98.64 before) and the `unbiased-small` model achieved a final score of 93.36 (93.64 before).
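
A minimal sketch of loading a lightweight variant alongside the full model (same API, selected purely by name; the comparison is only illustrative):

```python
from detoxify import Detoxify

text = "example text"
small_score = Detoxify("original-small").predict(text)["toxicity"]
full_score = Detoxify("original").predict(text)["toxicity"]
print(f"original-small: {small_score:.4f}  original: {full_score:.4f}")
```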

## Description

@@ -38,16 +38,26 @@ Dependencies:
- Kaggle API (to download data)


| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score | Detoxify Score
| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score % | Detoxify Score %
|-|-|-|-|-|-|-|
| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | `original` | 0.98856 | 0.98636
| [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | `unbiased` | 0.94734 | 0.93744
| [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 0.9536 | 0.91655*
| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | `original` | 98.86 | 98.64
| [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | `unbiased` | 94.73 | 93.74
| [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 95.36 | 92.11

*Score not directly comparable since it is obtained on the provided validation set rather than the test set. To be updated when the test labels are made available.

It is also worth noting that the top leaderboard scores were achieved using model ensembles, whereas the purpose of this library is to provide something user-friendly and straightforward to use.

### Multilingual model language breakdown

| Language Subgroup | Subgroup size | Subgroup AUC Score % |
|:-----------|----------------:|---------------:|
| 🇮🇹 it | 8494 | 89.18 |
| 🇫🇷 fr | 10920 | 89.61 |
| 🇷🇺 ru | 10948 | 89.81 |
| 🇵🇹 pt | 11012 | 91.00 |
| 🇪🇸 es | 8438 | 92.74 |
| 🇹🇷 tr | 14000 | 97.19 |
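
A rough sketch of how per-language figures like those above can be reproduced, assuming the Kaggle multilingual `validation.csv` with its `comment_text`, `lang` and `toxic` columns (the batching and column names here are assumptions, not the repo's evaluation script):

```python
import pandas as pd
from detoxify import Detoxify
from sklearn.metrics import roc_auc_score

val = pd.read_csv("jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv")
model = Detoxify("multilingual")

# score in small batches to keep memory bounded
scores = []
for i in range(0, len(val), 64):
    scores.extend(model.predict(val["comment_text"].iloc[i:i + 64].tolist())["toxicity"])
val["score"] = scores

# per-language subgroup AUC, reported as a percentage
for lang, group in val.groupby("lang"):
    print(lang, len(group), round(100 * roc_auc_score(group["toxic"], group["score"]), 2))
```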

## Limitations and ethical considerations

If words associated with swearing, insults or profanity are present in a comment, it is likely to be classified as toxic regardless of the tone or intent of the author, e.g. humorous or self-deprecating. This could introduce biases against already vulnerable minority groups.
@@ -261,34 +275,33 @@ kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification

```bash

python create_val_set.py
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-toxic-comment-classification-challenge/test.csv --update_test

python train.py --config configs/Toxic_comment_classification_BERT.json
```
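
For context, the `--update_test` step above merges Kaggle's `test.csv` with `test_labels.csv` into a single labelled file; a rough sketch of the assumed behaviour (output filename and exact filtering are assumptions, not the actual `preprocessing_utils.py`):

```python
import pandas as pd

folder = "jigsaw_data/jigsaw-toxic-comment-classification-challenge"
test = pd.read_csv(f"{folder}/test.csv")
labels = pd.read_csv(f"{folder}/test_labels.csv")

# Kaggle marks rows it never scored with -1; those cannot be used for evaluation
merged = test.merge(labels, on="id")
merged = merged[merged["toxic"] != -1]
merged.to_csv(f"{folder}/val.csv", index=False)  # output name assumed to match the configs' test_csv_file
```
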
### Unintended Bias in Toxicity Challenge

```bash

python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa_combined.json

```
### Multilingual Toxic Comment Classification

This is trained in 2 stages. First, train on all available data, and second, train only on the translated versions of the first challenge.

The [translated data](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) can be downloaded from Kaggle in french, spanish, italian, portuguese, turkish, and russian (the languages available in the test set).
The translated data ([source 1](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) [source 2](https://www.kaggle.com/ludovick/jigsawtanslatedgoogle)) can be downloaded from Kaggle in french, spanish, italian, portuguese, turkish, and russian (the languages available in the test set).

```bash

# stage 1

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json

# stage 2
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-multilingual-toxic-comment-classification/test.csv --update_test

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json --resume path_to_saved_checkpoint_stage1

```

### Monitor progress with tensorboard

```bash
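# the rest of this block is elided in the diff view; it presumably points
# TensorBoard at the training log directory, e.g. (path is an assumption):
tensorboard --logdir=./saved
```
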
55 changes: 42 additions & 13 deletions configs/Multilingual_toxic_comment_classification_XLMR.json
@@ -1,38 +1,67 @@
{
"name": "Jigsaw_XLMRoBERTa_multilingual",
"name": "Jigsaw_XLM_multilingual",
"n_gpu": 1,
"batch_size": 8,
"accumulate_grad_batches": 3,
"batch_size": 30,
"accumulate_grad_batches": 4,
"loss": "binary_cross_entropy",
"arch": {
"type": "XLMRoBERTa",
"type": "XLMRoberta",
"args": {
"num_classes": 1,
"num_classes": 16,
"model_type": "xlm-roberta-base",
"model_name": "XLMRobertaForSequenceClassification",
"tokenizer_name": "XLMRobertaTokenizer"
}
},
"dataset": {
"type": "JigsawDataMultilingual",
"type": "JigsawDataBias",
"args": {
"train_csv_file": [
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-es-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-fr-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-it-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-pt-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-ru-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-tr-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train.csv"
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_es_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_fr_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_it_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_pt_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_ru_clean.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-unintended-bias-train_only_tr_clean.csv",
"jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/train.csv"
],
"test_csv_file": "jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv",
"val_fraction": null,
"create_val_set": false,
"test_csv_file": ["jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv",
"jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv"],
"loss_weight": 0.75,
"classes": [
"toxic"
"toxicity",
"severe_toxicity",
"obscene",
"identity_attack",
"insult",
"threat",
"sexual_explicit"
],
"identity_classes": [
"male",
"female",
"homosexual_gay_or_lesbian",
"christian",
"jewish",
"muslim",
"black",
"white",
"psychiatric_or_mental_illness"
]
}
},
"optimizer": {
"type": "Adam",
"args": {
"lr": 3e-5,
"weight_decay": 3e-6,
"lr": 3e-6,
"weight_decay": 3e-7,
"amsgrad": true
}
}
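
The `optimizer` block above maps directly onto `torch.optim.Adam`; a minimal sketch of how a config like this can be consumed (illustrative plumbing with a stand-in model, not the repo's actual `train.py`):

```python
import json

import torch

with open("configs/Multilingual_toxic_comment_classification_XLMR.json") as f:
    config = json.load(f)

# stand-in module; the real run would build the XLM-R classifier from config["arch"]
model = torch.nn.Linear(768, config["arch"]["args"]["num_classes"])

optimizer_cls = getattr(torch.optim, config["optimizer"]["type"])  # Adam
optimizer = optimizer_cls(model.parameters(), **config["optimizer"]["args"])
print(optimizer.defaults)  # includes lr=3e-06, weight_decay=3e-07, amsgrad=True
```
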
Original file line number Diff line number Diff line change
@@ -22,13 +22,18 @@
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-it-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-pt-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-ru-cleaned.csv",
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-tr-cleaned.csv"
"jigsaw_data/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train-google-tr-cleaned.csv",
"jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv"
],
"test_csv_file": "jigsaw_data/jigsaw-multilingual-toxic-comment-classification/validation.csv",
"val_fraction": null,
"create_val_set": false,
"classes": [
"toxic"
"toxicity",
"severe_toxicity",
"obscene",
"identity_attack",
"insult",
"threat",
"sexual_explicit"
]
}
},
8 changes: 3 additions & 5 deletions configs/Toxic_comment_classification_ALBERT.json
@@ -18,16 +18,14 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/val.csv",
"val_fraction": null,
"create_val_set": false,
"add_test_labels": false,
"classes": [
"toxic",
"severe_toxic",
"toxicity",
"severe_toxicity",
"obscene",
"threat",
"insult",
"identity_hate"
"identity_attack"
]
}
},
8 changes: 3 additions & 5 deletions configs/Toxic_comment_classification_BERT.json
@@ -18,16 +18,14 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-toxic-comment-classification-challenge/val.csv",
"val_fraction": null,
"create_val_set": false,
"add_test_labels": false,
"classes": [
"toxic",
"severe_toxic",
"toxicity",
"severe_toxicity",
"obscene",
"threat",
"insult",
"identity_hate"
"identity_attack"
]
}
},
Original file line number Diff line number Diff line change
@@ -19,8 +19,6 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv",
"val_fraction": null,
"create_val_set": false,
"loss_weight": 0.75,
"classes": [
"toxicity",
Original file line number Diff line number Diff line change
@@ -19,8 +19,6 @@
"args": {
"train_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/train.csv",
"test_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv",
"val_fraction": null,
"create_val_set": false,
"loss_weight": 0.75,
"classes": [
"toxicity",
Original file line number Diff line number Diff line change
@@ -22,8 +22,6 @@
"jigsaw_data/jigsaw-toxic-comment-classification-challenge/train.csv"
],
"test_csv_file": "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv",
"val_fraction": null,
"create_val_set": false,
"loss_weight": 0.75,
"classes": [
"toxicity",
2 changes: 1 addition & 1 deletion convert_weights.py
@@ -32,7 +32,7 @@ def main():
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckeckpoint",
"--checkpoint",
type=str,
help="path to model checkpoint",
)
16 changes: 13 additions & 3 deletions detoxify/detoxify.py
@@ -4,7 +4,7 @@
MODEL_URLS = {
"original": "https://github.com/unitaryai/detoxify/releases/download/v0.1-alpha/toxic_original-c1212f89.ckpt",
"unbiased": "https://github.com/unitaryai/detoxify/releases/download/v0.3-alpha/toxic_debiased-c7548aa0.ckpt",
"multilingual": "https://github.com/unitaryai/detoxify/releases/download/v0.1-alpha/toxic_multilingual-bbddc277.ckpt",
"multilingual": "https://github.com/unitaryai/detoxify/releases/download/v0.4-alpha/multilingual_debiased-0b549669.ckpt",
"original-small": "https://github.com/unitaryai/detoxify/releases/download/v0.1.2/original-albert-0e1d6498.ckpt",
"unbiased-small": "https://github.com/unitaryai/detoxify/releases/download/v0.1.2/unbiased-albert-c8519128.ckpt"
}
@@ -39,7 +39,13 @@ def load_checkpoint(model_type="original", checkpoint=None, device='cpu'):
with as well as the state dict"
)
class_names = loaded["config"]["dataset"]["args"]["classes"]

# standardise class names between models
change_names = {
"toxic": "toxicity",
"identity_hate": "identity_attack",
"severe_toxic": "severe_toxicity",
}
class_names = [change_names.get(cl, cl) for cl in class_names]
model, tokenizer = get_model_and_tokenizer(
**loaded["config"]["arch"]["args"], state_dict=loaded["state_dict"]
)
@@ -58,7 +64,7 @@ def load_model(model_type, checkpoint=None):
class Detoxify:
"""Detoxify
Easily predict if a comment or list of comments is toxic.
Can initialize 3 different model types from model type or checkpoint path:
Can initialize 5 different model types from model type or checkpoint path:
- original:
model trained on data from the Jigsaw Toxic Comment
Classification Challenge
@@ -68,6 +74,10 @@ class Detoxify:
- multilingual:
model trained on data from the Jigsaw Multilingual
Toxic Comment Classification Challenge
- original-small:
lightweight version of the original model
- unbiased-small:
lightweight version of the unbiased model
Args:
model_type(str): model type to be loaded, can be either original,
unbiased or multilingual
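
A small sketch of the two initialisation paths the docstring describes, by released model name or from a locally trained checkpoint (the `checkpoint` and `device` keyword names mirror `load_checkpoint` above and are assumed to be forwarded by `Detoxify`; the checkpoint path is a placeholder):

```python
import torch
from detoxify import Detoxify

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) by name: original, unbiased, multilingual, original-small or unbiased-small
model = Detoxify("unbiased-small", device=device)
print(model.predict("example text"))

# 2) from a local checkpoint produced by train.py (placeholder path)
local_model = Detoxify(checkpoint="saved/Jigsaw_XLM_multilingual/checkpoints/example.ckpt", device=device)
```
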
Binary file modified examples.png
17 changes: 3 additions & 14 deletions model_eval/compute_bias_metric.py
@@ -3,18 +3,7 @@
import numpy as np
from sklearn.metrics import roc_auc_score
import argparse


def compute_auc(y_true, y_pred):
try:
return roc_auc_score(y_true, y_pred)
except ValueError:
return np.nan


def compute_subgroup_auc(df, subgroup, label, model_name):
subgroup_examples = df[df[subgroup]]
return compute_auc(subgroup_examples[label], subgroup_examples[model_name])
from utils import compute_auc, compute_subgroup_auc


def compute_bpsn_auc(df, subgroup, label, model_name):
@@ -34,7 +23,7 @@ def compute_bnsp_auc(df, subgroup, label, model_name):


def compute_bias_metrics_for_model(
dataset, subgroups, model, label_col, include_asegs=False
dataset, subgroups, model, label_col
):
"""Computes per-subgroup metrics for all subgroups and one model."""
records = []
@@ -89,7 +78,7 @@ def main():
with open(TEST, "r") as f:
results = json.load(f)

test_private_path = "../jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_private_expanded.csv"
test_private_path = "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_private_expanded.csv"
test_private = pd.read_csv(test_private_path)
test_private = convert_dataframe_to_bool(test_private)

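
For reference, `compute_bpsn_auc` and `compute_bnsp_auc` (bodies elided above) follow the standard Jigsaw unintended-bias benchmark definitions; a self-contained sketch of those two metrics, assuming boolean subgroup and label columns as produced by `convert_dataframe_to_bool` (an independent reimplementation for illustration, not the repo's code):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def compute_bpsn_auc(df: pd.DataFrame, subgroup: str, label: str, score: str) -> float:
    """Background-positive, subgroup-negative AUC: out-of-subgroup toxic vs in-subgroup non-toxic."""
    subset = pd.concat([df[df[subgroup] & ~df[label]], df[~df[subgroup] & df[label]]])
    return roc_auc_score(subset[label], subset[score])

def compute_bnsp_auc(df: pd.DataFrame, subgroup: str, label: str, score: str) -> float:
    """Background-negative, subgroup-positive AUC: in-subgroup toxic vs out-of-subgroup non-toxic."""
    subset = pd.concat([df[df[subgroup] & df[label]], df[~df[subgroup] & ~df[label]]])
    return roc_auc_score(subset[label], subset[score])
```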
