Double-Hard Debias Study

This repository contains the code that was used in support of the paper "Evaluating the Effectiveness of the Double-Hard Debias Technique Against Racial Bias in Word Embeddings".

It includes three notebooks for reproducing the study:

1. Double-Hard Debias generation and embedding bias analysis - double-hard-debias.ipynb. This notebook runs both the hard and double-hard debias methods and performs a variety of analyses for review and comparison. It also generates the files required for RSA (see 2).
2. Representational Similarity Analysis (RSA) - rsa.ipynb. Before this notebook is run for the first time, double-hard-debias.ipynb must be executed to generate the required w2v files.
3. Utility Evaluation - semantic_eval.ipynb. This notebook runs the downstream evaluation methods Concept Categorization and Analogy Analysis.

Requirements

This project was created and tested with Python 3.9. The following libraries are required (and listed in requirements.txt):

gensim==4.1.2
matplotlib==3.4.3
numpy==1.21.3
pandas==1.3.4
scikit_learn==1.0.1
scipy==1.7.1
six==1.16.0
statsmodels==0.13.1
openpyxl==3.0.9
seaborn==0.11.2

A Jupyter server is also required to execute the notebooks.
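To confirm that an environment matches the pinned versions above before opening a notebook, a small check along the following lines may help. This is a sketch and not part of the repository: the helper name check_pins is hypothetical, and it assumes the installed distributions are discoverable under the names used in requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the requirements list above.
PINNED = {
    "gensim": "4.1.2",
    "matplotlib": "3.4.3",
    "numpy": "1.21.3",
    "pandas": "1.3.4",
    "scikit_learn": "1.0.1",
    "scipy": "1.7.1",
    "six": "1.16.0",
    "statsmodels": "0.13.1",
    "openpyxl": "3.0.9",
    "seaborn": "0.11.2",
}


def check_pins(pinned, get_version=version):
    """Return {package: (expected, installed or None)} for every mismatch.

    check_pins is a hypothetical helper; get_version can be swapped out
    for testing and defaults to importlib.metadata.version.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is installed at the expected version; any other version is reported, whether or not it would actually break the notebooks.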

This project also requires that two files be downloaded and saved to the data folder, under the names given below, before executing any notebook:
1. Pre-trained Word2Vec pt. 0 (w2v_0) embeddings from the T. Manzini et al. study. Save the file to the data folder as data_vocab_race_pre_trained.w2v. (download file)
2. Pre-processed Hard Debiased embeddings from the T. Manzini et al. study. Save the file to the data folder as data_vocab_race_hard_debias.w2v. (download file)
Once both files are in place, proceed with running the research notebooks.
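A quick pre-flight check can catch a missing or misnamed file before a notebook fails partway through. The sketch below is not part of the repository: the helper name missing_data_files is hypothetical, and it assumes the check is run from the repository root so that the relative path data resolves to the data folder.

```python
from pathlib import Path

# File names taken from the download instructions above.
REQUIRED_FILES = (
    "data_vocab_race_pre_trained.w2v",
    "data_vocab_race_hard_debias.w2v",
)


def missing_data_files(data_dir="data", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir.

    missing_data_files is a hypothetical helper; it only checks that the
    files exist and does not validate their contents.
    """
    data_dir = Path(data_dir)
    return [name for name in required if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from the data folder:", ", ".join(missing))
    else:
        print("Both embedding files are in place.")
```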

Reference Research & Code Repositories

This code is an adaptation of published code from the following research papers:

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings (project code)

Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation (project code)

Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis (project code)

We appreciate the efforts of each of these projects that helped to support our research.

Repository: YolandaMDavis/DoubleHardMulticlass

Folders and files

Latest commit

History

Repository files navigation

Double-Hard Debias Study

Requirements

Reference Research & Code Repositories

About

Topics

Resources

Stars

Watchers

Forks

Contributors 2

Languages