A deep dive into the Kaggle competition "TalkingData-AdTracking-Fraud-Detection-Challenge" (https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data). It includes 2 main parts:
- Data Exploration
- Classifiers Comparison
cd main/feature_insights/visualization/
python data_insights.py ../../data/train_sample.csv
# Output PNG files are written to the output folder
python main/feature_insights/importance/feautre_importances.py main/data/train_sample.csv
xgboost feature importance:
('app', 0.33282444)
('channel', 0.20305343)
('click_time', 0.079389311)
('device', 0.12977099)
('ip', 0.19083969)
('os', 0.06412214)
ExtraTrees feature importance:
('app', 0.21306116235738021)
('channel', 0.15531389700691334)
('click_time', 0.18661141480186574)
('device', 0.099059296834343447)
('ip', 0.20542837318401089)
('os', 0.14052585581548629)
RandomForest feature importance:
('app', 0.18273912869613443)
('channel', 0.15020469480418228)
('click_time', 0.12178531327828397)
('device', 0.15908994155421691)
('ip', 0.24489949043695178)
('os', 0.1412814312302306)
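
The three lists above are easier to compare once each is sorted by importance. The snippet below is a minimal sketch, not part of the repository: it copies the printed values (rounded to four decimals) into a dictionary, ranks the features per model, and averages each feature across the models. The helper names `rank_features` and `mean_importance` are hypothetical. The average is only a rough summary, since xgboost and the scikit-learn tree ensembles do not necessarily compute importance the same way.

```python
# Importances copied from the output above, rounded to four decimals.
importances = {
    "xgboost": {
        "app": 0.3328, "channel": 0.2031, "click_time": 0.0794,
        "device": 0.1298, "ip": 0.1908, "os": 0.0641,
    },
    "ExtraTrees": {
        "app": 0.2131, "channel": 0.1553, "click_time": 0.1866,
        "device": 0.0991, "ip": 0.2054, "os": 0.1405,
    },
    "RandomForest": {
        "app": 0.1827, "channel": 0.1502, "click_time": 0.1218,
        "device": 0.1591, "ip": 0.2449, "os": 0.1413,
    },
}


def rank_features(scores):
    """Return feature names sorted from most to least important."""
    return sorted(scores, key=scores.get, reverse=True)


def mean_importance(all_scores):
    """Average each feature's importance across all models."""
    features = next(iter(all_scores.values()))
    n_models = len(all_scores)
    return {
        name: sum(model[name] for model in all_scores.values()) / n_models
        for name in features
    }


# Per-model ranking and a cross-model average (hypothetical summary).
rankings = {model: rank_features(s) for model, s in importances.items()}
averaged = mean_importance(importances)

for model, order in rankings.items():
    print(f"{model}: {' > '.join(order)}")
print("mean: " + " > ".join(rank_features(averaged)))
```

Sorted this way, app ranks first for xgboost and ExtraTrees while ip ranks first for RandomForest, and both stay in the top three for every model.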