This repo contains code for scraping the hiring data behind the hiring profiles in the paper "Investigations of Performance and Bias in Human-AI Teamwork in Hiring".
- The `hybridhiring` directory can be downloaded from here and should be placed in the `input_data` directory.
- The conda environment I used for the project can be recreated using `environment.yml`.
- Currently, the code searches for each URL in each Common Crawl snapshot mentioned by De-Arteaga et al. one at a time, which is very slow. Since these index searches are the bottleneck, I think we'll likely want to search the CC indexes using something like this.
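
As a rough illustration of what a faster index lookup might look like, the sketch below queries the public Common Crawl index server once per crawl with a wildcard pattern, rather than once per URL. It is not the approach currently used in this repo. The function names, the crawl ID `CC-MAIN-2018-34`, and the pattern `example.com/*` are placeholders chosen for illustration, and the handling of a 404 as "no captures" is an assumption about the server's behaviour.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Common Crawl CDX index server.
INDEX_SERVER = "https://index.commoncrawl.org"


def build_index_query(crawl_id, url_pattern):
    """Return the index-server URL that looks up url_pattern in one crawl."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_SERVER}/{crawl_id}-index?{params}"


def parse_index_response(body):
    """Parse the newline-delimited JSON records returned by the index server."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def search_crawl(crawl_id, url_pattern, timeout=30):
    """Query one crawl's index and return its capture records as dicts."""
    query = build_index_query(crawl_id, url_pattern)
    try:
        with urlopen(query, timeout=timeout) as resp:
            return parse_index_response(resp.read().decode("utf-8"))
    except HTTPError as err:
        # Assumption: the server answers 404 when the pattern has no captures.
        if err.code == 404:
            return []
        raise


if __name__ == "__main__":
    # Hypothetical crawl ID and pattern; substitute the crawls and domains
    # actually needed for the dataset.
    for record in search_crawl("CC-MAIN-2018-34", "example.com/*"):
        print(record.get("url"), record.get("filename"))
```

Each returned record should carry fields such as `filename`, `offset`, and `length`, which locate the capture inside a WARC file, so the page itself can be fetched in a separate step. Grouping the target URLs by domain and issuing one wildcard query per domain per crawl is one way to cut down the number of requests, though large domains may return paginated results that this sketch does not handle.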