This repository contains the solution to the technical test for the Data Scientist position at Shimoku. It includes the following files:
- A Jupyter notebook, EDA.ipynb, containing the exploratory data analysis and the step-by-step construction of the dataset used to train a classification model.
- The output files: variables.csv, with the cleaned data, and variables_numeric.csv, with the same data in numeric format for direct use in a classification model.
- The input files, leads.csv and offers.csv, which contain the raw data.
- A pandas extension module with functions for analyzing missing data. The module was taken from the resources of the 'Handling Missing Data: Detection and Exploration Course' on Platzi.
- The PDF DS Technical Test Q4-2023.pdf, containing the test instructions.
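
For readers unfamiliar with pandas extensions, the following is a minimal sketch of how a missing-data helper can be attached to every DataFrame through pandas' accessor registration API. It is not the module shipped in this repository: the accessor name `missing_sketch`, the class name, the method names, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


# Hypothetical accessor name; the module in this repository may use a different one.
@pd.api.extensions.register_dataframe_accessor("missing_sketch")
class MissingSketchAccessor:
    """Illustrative helpers for summarizing missing values in a DataFrame."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._obj = pandas_obj

    def total(self) -> int:
        """Return the total number of missing cells in the DataFrame."""
        return int(self._obj.isna().sum().sum())

    def by_column(self) -> pd.DataFrame:
        """Return the count and percentage of missing values per column."""
        n_missing = self._obj.isna().sum()
        return pd.DataFrame(
            {
                "n_missing": n_missing,
                "pct_missing": 100 * n_missing / len(self._obj),
            }
        )


# Toy data for illustration only; it is not taken from leads.csv or offers.csv.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan], "b": ["x", "y", None, "z"]})
total_missing = toy.missing_sketch.total()
summary = toy.missing_sketch.by_column()
```

Once a module like this is imported, the helpers are available on any DataFrame, so the same calls could in principle be applied to data loaded from the raw input files.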