whmou/TalkingData-AdTracking-Fraud-Detection-Challenge

TalkingData-AdTracking-Fraud-Detection-Challenge

A deep dive into the Kaggle competition "TalkingData AdTracking Fraud Detection Challenge" (https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data). It includes two main parts:

  • Data Exploration
  • Classifiers Comparison

Data Exploration

Usage

cd main/feature_insights/visualization/
python data_insights.py ../../data/train_sample.csv
# output png files are in the output folder

  • Feature Heat Map: (figure: heat map)

  • Feature Pair Map: (figure: pair map)

  • Feature Histogram: (figures: five histograms)

  • Feature Boxplots: (figures: six boxplots)

Feature importance

python main/feature_insights/importance/feautre_importances.py main/data/train_sample.csv
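
The script itself is not reproduced in this README. As a rough sketch of how per-feature importances of this kind can be obtained with scikit-learn, the snippet below fits ExtraTrees and RandomForest classifiers and prints `(feature, importance)` pairs. Everything in it is an assumption made for illustration: the helper name `feature_importances`, the synthetic data (used so the snippet runs without `train_sample.csv`), the hyperparameters, and the rule that generates the label. It is not the author's implementation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Feature names as they appear in the importance listings of this README.
FEATURES = ["app", "channel", "click_time", "device", "ip", "os"]


def feature_importances(model, X, y, names):
    """Fit a tree ensemble and return (name, importance) pairs sorted by name.

    Hypothetical helper for illustration only.
    """
    model.fit(X, y)
    return sorted((n, float(v)) for n, v in zip(names, model.feature_importances_))


# Synthetic stand-in for the real data: integer-encoded columns, and a label
# that depends only on the first column ("app"). This is an assumption, not
# a property of the competition data.
rng = np.random.default_rng(0)
X = rng.integers(0, 500, size=(1000, len(FEATURES)))
y = (X[:, 0] < 50).astype(int)

importances = {}
for model in (
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
):
    name = type(model).__name__
    importances[name] = feature_importances(model, X, y, FEATURES)
    print(f"{name} feature importance:")
    for pair in importances[name]:
        print(pair)
```

The numbers printed by this sketch come from synthetic data and will not match the script output listed in this section. With real data, `click_time` would first need to be converted to a numeric value (for example, seconds since the epoch) before fitting.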

xgboost feature importance:
('app', 0.33282444)
('channel', 0.20305343)
('click_time', 0.079389311)
('device', 0.12977099)
('ip', 0.19083969)
('os', 0.06412214)

ExtraTrees feature importance:
('app', 0.21306116235738021)
('channel', 0.15531389700691334)
('click_time', 0.18661141480186574)
('device', 0.099059296834343447)
('ip', 0.20542837318401089)
('os', 0.14052585581548629)

RandomForest feature importance:
('app', 0.18273912869613443)
('channel', 0.15020469480418228)
('click_time', 0.12178531327828397)
('device', 0.15908994155421691)
('ip', 0.24489949043695178)
('os', 0.1412814312302306)
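
The three models do not rank the features identically, so it can help to look at them side by side. The sketch below averages the importances reported above across the three models and sorts the features by that average. Simple averaging is only one possible way to combine them and is an assumption of this example, not something the repository does; the names `REPORTED` and `mean_importance` are hypothetical, and the values are copied from the listings above, rounded to eight decimal places.

```python
from statistics import mean

# Importances copied from the output above (rounded to 8 decimal places).
REPORTED = {
    "xgboost": {
        "app": 0.33282444, "channel": 0.20305343, "click_time": 0.07938931,
        "device": 0.12977099, "ip": 0.19083969, "os": 0.06412214,
    },
    "ExtraTrees": {
        "app": 0.21306116, "channel": 0.15531390, "click_time": 0.18661141,
        "device": 0.09905930, "ip": 0.20542837, "os": 0.14052586,
    },
    "RandomForest": {
        "app": 0.18273913, "channel": 0.15020469, "click_time": 0.12178531,
        "device": 0.15908994, "ip": 0.24489949, "os": 0.14128143,
    },
}


def mean_importance(reported):
    """Average each feature's importance over all models (hypothetical helper)."""
    features = next(iter(reported.values())).keys()
    return {f: mean(model[f] for model in reported.values()) for f in features}


averaged = mean_importance(REPORTED)
ranking = sorted(averaged, key=averaged.get, reverse=True)

for feature in ranking:
    print(f"{feature:<11}{averaged[feature]:.4f}")
```

Under this averaging, `app` and `ip` have the highest mean importance. The gap between `device` and `click_time` is very small, so their relative order should not be over-interpreted.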