
The project contains code and data leveraged in the research paper "Evaluating the Effectiveness of the Double-Hard Debias Technique Against Racial Bias in Word Embeddings"


YolandaMDavis/DoubleHardMulticlass


Double-Hard Debias Study

This repository contains the code that was used in support of the paper "Evaluating the Effectiveness of the Double-Hard Debias Technique Against Racial Bias in Word Embeddings".

It includes three notebooks for reproducing the study:

  1. Double-Hard Debias generation and embedding bias analysis - double-hard-debias.ipynb. This notebook applies both the Hard Debias and Double-Hard Debias methods and runs a variety of analyses to review and compare their results. It also generates the files required by the Representational Similarity Analysis (RSA) notebook (see 2).
  2. Representational Similarity Analysis - rsa.ipynb. This notebook requires that double-hard-debias.ipynb be executed before it is run for the first time, because that notebook generates the required w2v files.
  3. Utility Evaluation - semantic_eval.ipynb. This notebook runs two downstream evaluation tasks: Concept Categorization and Analogy Analysis.
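Because rsa.ipynb depends on files written by double-hard-debias.ipynb, the notebooks can also be executed headlessly in that order. The following is a minimal sketch that is not part of this repository. It assumes the notebooks sit at the repository root and that nbformat and nbconvert (which are not listed in requirements.txt) are installed. NOTEBOOK_ORDER and run_notebooks are hypothetical names.

```python
from pathlib import Path

# Execution order implied by the notebook descriptions above:
# rsa.ipynb needs the w2v files written by double-hard-debias.ipynb.
NOTEBOOK_ORDER = [
    "double-hard-debias.ipynb",
    "rsa.ipynb",
    "semantic_eval.ipynb",
]


def run_notebooks(repo_dir: str = ".", timeout: int = 3600) -> None:
    """Execute each notebook in NOTEBOOK_ORDER and save its outputs in place.

    Assumes the notebooks are at the repository root and that nbformat and
    nbconvert (not in requirements.txt) are installed.
    """
    # Imported lazily so this module loads even without nbconvert present.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    root = Path(repo_dir)
    for name in NOTEBOOK_ORDER:
        path = root / name
        with open(path, encoding="utf-8") as f:
            nb = nbformat.read(f, as_version=4)
        ep = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
        # Use the repository root as the working directory so relative
        # paths such as data/ resolve as they would inside Jupyter.
        ep.preprocess(nb, {"metadata": {"path": str(root)}})
        with open(path, "w", encoding="utf-8") as f:
            nbformat.write(nb, f)
        print(f"executed {name}")
```

For example, calling run_notebooks(".") from the repository root runs all three notebooks in sequence. Like saving in Jupyter, this writes each notebook's outputs back into its file.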

Requirements

  1. This project was created and tested with Python 3.9. The following libraries are required (and are listed in requirements.txt):
gensim==4.1.2
matplotlib==3.4.3
numpy==1.21.3
pandas==1.3.4
scikit_learn==1.0.1
scipy==1.7.1
six==1.16.0
statsmodels==0.13.1
openpyxl==3.0.9
seaborn==0.11.2

A Jupyter server is also required to execute the notebooks.
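Before launching Jupyter, it can help to confirm that the installed packages match the pinned versions above. The sketch below uses only the standard library's importlib.metadata (Python 3.8+). parse_pins, check_versions, and PINNED are hypothetical helpers, and PINNED simply repeats the list above on the assumption that requirements.txt contains the same lines.

```python
from importlib.metadata import PackageNotFoundError, version

# Copied from the pinned list above; assumed to match requirements.txt.
PINNED = """\
gensim==4.1.2
matplotlib==3.4.3
numpy==1.21.3
pandas==1.3.4
scikit_learn==1.0.1
scipy==1.7.1
six==1.16.0
statsmodels==0.13.1
openpyxl==3.0.9
seaborn==0.11.2
"""


def parse_pins(text: str) -> dict:
    """Parse 'name==version' lines into {name: version}, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, ver = line.partition("==")
        pins[name.strip()] = ver.strip()
    return pins


def check_versions(pins: dict) -> dict:
    """Return {name: (pinned, installed or None)} for every package that does not match."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

An empty result from check_versions(parse_pins(PINNED)) means every pin is satisfied. To check the file directly instead, pass the contents of requirements.txt to parse_pins.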

  2. This project also requires two files to be downloaded and saved to the data folder, under the names given below, before any notebook is executed:

    1. Pre-trained Word2Vec pt. 0 (w2v_0) embeddings from the T. Manzini et al. study. Save the file to the data folder as data_vocab_race_pre_trained.w2v. (download file)
    2. Pre-processed Hard Debiased embeddings from the T. Manzini et al. study. Save the file to the data folder as data_vocab_race_hard_debias.w2v. (download file)
  3. Run the research notebooks, starting with double-hard-debias.ipynb.
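A quick pre-flight check that both downloaded files are in place, under the exact names listed above, can prevent a failed notebook run. This is a minimal sketch that assumes the data folder is named data and sits at the repository root. missing_data_files and load_embeddings are hypothetical helpers. The loader also assumes the .w2v files use the plain-text word2vec format read by gensim's KeyedVectors.load_word2vec_format. If they turn out to be binary, pass binary=True.

```python
from pathlib import Path
from typing import List

# File names required above; the data/ location is an assumption.
REQUIRED_FILES = [
    "data_vocab_race_pre_trained.w2v",
    "data_vocab_race_hard_debias.w2v",
]


def missing_data_files(data_dir: str = "data") -> List[str]:
    """Return the required file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def load_embeddings(data_dir: str = "data", binary: bool = False) -> dict:
    """Load both embedding files with gensim, keyed by file name.

    Assumes the word2vec text format unless binary=True is passed.
    """
    missing = missing_data_files(data_dir)
    if missing:
        raise FileNotFoundError(f"missing from {data_dir}/: {', '.join(missing)}")
    # Imported only after the file check so the check works without gensim.
    from gensim.models import KeyedVectors

    return {
        name: KeyedVectors.load_word2vec_format(str(Path(data_dir) / name), binary=binary)
        for name in REQUIRED_FILES
    }
```

Running missing_data_files() from the repository root should return an empty list once both downloads are saved correctly.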

Reference Research & Code Repositories

This code is an adaptation of published code from the following research papers:

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings (project code)

Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation (project code)

Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis (project code)

We appreciate the efforts of each of these projects, which helped support our research.
