Data documentation
jacob-stein1 committed Dec 10, 2024
1 parent 3d0ca2b commit e499527
Showing 1 changed file with 39 additions and 0 deletions.
39 changes: 39 additions & 0 deletions dataset-documentation/DATASETDOC-fa24.md
@@ -0,0 +1,39 @@
### What is the project name?

MassMutual: Racist Deeds

### What is this project about? What is the goal of this project?

Build a data pipeline that classifies scanned housing deeds according to whether they contain racist clauses.

### What data sets did you use in your project? Please provide the link.

https://drive.google.com/drive/folders/1V9x-24SeIQlAyOeVQRXbRElQaw_ig6il

### Please provide a description of the data set, how it was collected, and how it was cleaned and processed.

It is a collection of housing deeds scanned into TIFF files.
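
As a rough illustration of how the scanned files might be enumerated before any processing, here is a minimal sketch. It assumes the Drive folder has been downloaded to a local directory; the function name `find_tiff_files` and the directory layout are hypothetical and not part of the project's code.

```python
from pathlib import Path

# File extensions treated as TIFF scans (compared case-insensitively).
TIFF_SUFFIXES = {".tif", ".tiff"}


def find_tiff_files(root):
    """Return every TIFF file under `root` (searched recursively), sorted by path."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES
    )
```

Calling `find_tiff_files("deeds")` on a hypothetical local `deeds/` folder would return a list of `Path` objects, one per scanned deed, which later pipeline steps could then read.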

### Did you use or create any data dictionaries for the data set in this project?

No

### Did the client put restrictions on this data?

No

### What is the data being used for? Please briefly explain the goal of the project.

To train a classification model that detects racist clauses in the scanned deeds.

### Is there missing data from the client or additional data that needs to be collected by another team?

Yes, there is very little ground truth data. We need more labelled data, or to generate data synthetically.

### Who is the client for the project?

MassMutual and Longmeadow Historical Society

### Are there any limitations to the data you used? Are there specific use cases where the data should not be used?

There is very little ground truth data.
