diff --git a/dataset-documentation/DATASETDOC-fa24.md b/dataset-documentation/DATASETDOC-fa24.md
new file mode 100644
index 0000000..27c4085
--- /dev/null
+++ b/dataset-documentation/DATASETDOC-fa24.md
@@ -0,0 +1,39 @@
+### What is the project name?
+
+MassMutual: Racist Deeds
+
+### What is this project about? What is the goal of this project?
+
+Making a data pipeline to classify scanned deeds as racist or not.
+
+### What data sets did you use in your project? Please provide the link.
+
+https://drive.google.com/drive/folders/1V9x-24SeIQlAyOeVQRXbRElQaw_ig6il
+
+### Please provide a description of the data set, how it was collected, and how it was cleaned and processed.
+
+It is a collection of housing deeds scanned into TIFF files.
+
+### Did you use or create any data dictionaries for the data set in this project?
+
+No
+
+### Did the client put restrictions on this data?
+
+No
+
+### What is the data being used for? Please briefly explain the goal of the project.
+
+To train a classification model to detect racist clauses in the data.
+
+### Is there missing data from the client or additional data that needs to be collected by another team?
+
+Yes, there is very little ground truth data. We need more labelled data, or to generate data synthetically.
+
+### Who is the client for the project?
+
+MassMutual and Longmeadow Historical Society
+
+### Are there any limitations to the data you used? Are there specific use cases where the data should not be used?
+
+There is very little ground truth data.