This repository has been archived by the owner on Oct 4, 2023. It is now read-only.

Hiring Data Scraping

This repo contains code for scraping the data behind the hiring profiles used in the paper "Investigations of Performance and Bias in Human-AI Teamwork in Hiring".

Usage

  • The directory hybridhiring can be downloaded from here and should be placed in the input_data directory.
  • The conda environment I used for the project can be recreated from environment.yml (for example, with conda env create -f environment.yml).
  • Currently, the code manually searches for each URL in each Common Crawl mentioned by De-Arteaga et al., which is very slow. Because searching the indexes this way is the bottleneck, I think we'll likely want to search the CC indexes using something like this
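
For illustration only, a per-URL lookup of the kind described in the last bullet might look roughly like the sketch below. It is not the code in this repo. It assumes the public Common Crawl index server at index.commoncrawl.org and crawl IDs of the form CC-MAIN-2018-34; the function names and the example crawl ID and URL are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Common Crawl CDX index server.
INDEX_SERVER = "https://index.commoncrawl.org"


def build_index_query(crawl_id, page_url):
    """Build the index query URL for one page in one crawl."""
    params = urlencode({"url": page_url, "output": "json"})
    return f"{INDEX_SERVER}/{crawl_id}-index?{params}"


def parse_index_response(body):
    """Parse the response body: one JSON record per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def lookup(crawl_id, page_url, timeout=30):
    """Return the index records for one URL in one crawl.

    Returns an empty list when the server answers 404, which is how
    the index server reports that it has no captures for the URL.
    """
    try:
        with urlopen(build_index_query(crawl_id, page_url), timeout=timeout) as resp:
            return parse_index_response(resp.read().decode("utf-8"))
    except HTTPError as err:
        if err.code == 404:
            return []
        raise


def lookup_all(crawl_ids, page_urls):
    """Query every (crawl, URL) pair in turn: one HTTP request per pair."""
    return {
        (crawl_id, page_url): lookup(crawl_id, page_url)
        for crawl_id in crawl_ids
        for page_url in page_urls
    }
```

Because lookup_all issues one request per (crawl, URL) pair, the number of requests grows with the product of the two lists, which is consistent with the slowness noted above.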