Competition Page: DiDi-ETA
Final Official Ranking
Ranking | Award (Cash Prize) | Name | MAPE |
---|---|---|---|
1 | Champion ($10,000) | 单模CBT | 0.11974 |
2 | Runner-ups ($5,000 each team) | Pims | 0.12099 |
3 | Runner-ups ($5,000 each team) | 华南工农联盟 | 0.12116 |
4 | Second Runner-ups ($2,500 each team) | 机器算命 | 0.12177 |
5 | Second Runner-ups ($2,500 each team) | pumbaa | 0.12198 |
6 | Recognition Award ($1,000 each team) | MobiLab | 0.12478 |
7 | Recognition Award ($1,000 each team) | 悦智AI实验室 | 0.12511 |
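For readers unfamiliar with the metric, here is a minimal sketch of MAPE (mean absolute percentage error), assuming the standard definition; lower is better. The function name and the sample travel times are illustrative only and are not taken from the competition code.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (0.12 means 12%)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical travel times in seconds, for illustration only.
actual = [600.0, 1200.0, 300.0]
predicted = [660.0, 1080.0, 300.0]
score = mape(actual, predicted)
```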
Team Name: Pims
Team Members: Yunchong Gan, Mingjie Wang, Haoyu Zhang
Download the dataset from here and change `data_dir` in `dataset.py`.
python dataset.py
This preprocesses the original `.txt` files and converts them into `.json` and `.pickle` files to speed up data loading.
It then splits the full training set twice: once into 5 folds and once into 10 folds.
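The random fold split could look roughly like the sketch below. It uses only the standard library; `random_kfold`, the seed, and the dataset size are assumptions for illustration, and the actual logic in `dataset.py` may differ.

```python
import random


def random_kfold(n_samples, n_folds, seed=0):
    """Shuffle indices once, then deal them into n_folds validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold k takes every n_folds-th shuffled index, starting at position k.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, val))
    return splits


# Hypothetical dataset size; one model is trained per (train, val) pair.
splits_5 = random_kfold(n_samples=100, n_folds=5)
splits_10 = random_kfold(n_samples=100, n_folds=10)
total_models = len(splits_5) + len(splits_10)
```

Training one model per split gives 5 + 10 = 15 models, matching the ensemble size quoted for the final submission.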
python train.py
python test.py
Use a simple average of the model predictions to generate the final submission.
The final leaderboard result is the average of the 5-fold and 10-fold models (15 models in total).
python merge_submission.py
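A minimal sketch of equal-weight averaging across fold models is shown below. The dict-based layout, the order IDs, and the function name are assumptions for illustration; the actual `merge_submission.py` may read and write files in a different format.

```python
def average_predictions(model_outputs):
    """Average per-order predictions across models with equal weights.

    model_outputs: list of dicts mapping order id -> predicted ETA.
    """
    if not model_outputs:
        raise ValueError("need at least one model output")
    order_ids = model_outputs[0].keys()
    for out in model_outputs:
        if out.keys() != order_ids:
            raise ValueError("all models must predict the same orders")
    n = len(model_outputs)
    return {oid: sum(out[oid] for out in model_outputs) / n for oid in order_ids}


# Hypothetical outputs from three fold models (the real ensemble uses 15).
fold_outputs = [
    {"order_a": 600.0, "order_b": 900.0},
    {"order_a": 630.0, "order_b": 870.0},
    {"order_a": 570.0, "order_b": 930.0},
]
merged = average_predictions(fold_outputs)
```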
The model is based on WDR (Wide-Deep-Recurrent), from the DiDi ETA paper at KDD 2018.
    Wide  \
           \
    Deep --- concat - MLP - Prediction
           /
    RNN  -/
     |
     |----Predict Current Link Status
Wide
Name | Type | Number of Embedding | Embedding Dim | Description |
---|---|---|---|---|
Simple ETA | Numeric | 1 | | |
Distance | Numeric | 1 | | |
Link Number | Numeric | 1 | | |
Cross Number | Numeric | 1 | | |
Approximate Speed | Numeric | 1 | | |
Weekday | Categorical | 7 | 1 | |
Slice ID | Categorical | 48 | 1 | |
Distance (Categorical) | Categorical | 5 | 1 | |
Deep
Name | Type | Number of Embedding | Embedding Dim | Description |
---|---|---|---|---|
Simple ETA | Numeric | 1 | | |
Distance | Numeric | 1 | | |
Link Number | Numeric | 1 | | |
Cross Number | Numeric | 1 | | |
Approximate Speed | Numeric | 1 | | |
Weekday | Categorical | 7 | 20 | |
Slice ID | Categorical | 48 | 20 | |
Driver ID | Categorical | depends on dataset | 64 | |
Distance (Categorical) | Categorical | 5 | 20 | Split at 3/7/12/20 km |
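The four thresholds give five distance categories. A minimal sketch of that bucketing follows, assuming distances are in kilometres and that a distance exactly on a threshold falls into the upper bucket; the boundary convention and the names used here are assumptions, not taken from the repository.

```python
from bisect import bisect_right

# Thresholds in km, taken from the table's "3/7/12/20km" description.
DISTANCE_EDGES_KM = [3, 7, 12, 20]


def distance_bucket(distance_km):
    """Map a distance to one of 5 category indices (0..4)."""
    return bisect_right(DISTANCE_EDGES_KM, distance_km)


# Hypothetical distances covering every bucket.
buckets = [distance_bucket(d) for d in (1.5, 3.0, 10.0, 19.9, 42.0)]
```

The resulting index can be fed straight into an embedding table with 5 rows.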
RNN - Link
Name | Type | Number of Embedding | Embedding Dim | Description |
---|---|---|---|---|
Link Time | Numeric | 1 | | |
Link Ratio | Numeric | 1 | | |
Link Status (Onehot) | Numeric | 5 | | |
Weekday | Categorical | 7 | 20 | |
Slice ID | Categorical | 288 | 20 | Computed from the slice ID and link/cross time |
Link ID | Categorical | depends on dataset | 20 | |
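One plausible way to derive a per-link slice ID is to advance the order's departure slice by the time elapsed before each link is entered. The sketch below assumes 5-minute slices (288 per day), a departure slice in the range 0..287, and segment times in seconds; none of these details are confirmed by the table, and the repository may compute the feature differently.

```python
SLICES_PER_DAY = 288          # 24 h in 5-minute slices
SECONDS_PER_SLICE = 300       # assumption: one slice is 5 minutes


def link_slice_ids(departure_slice, segment_times):
    """Return the slice ID at which each segment is entered.

    departure_slice: slice ID of the order's departure (0..287).
    segment_times: travel time in seconds of each link or cross, in route order.
    """
    slice_ids = []
    elapsed = 0.0
    for t in segment_times:
        offset = int(elapsed // SECONDS_PER_SLICE)
        # Wrap around midnight so the ID stays within 0..287.
        slice_ids.append((departure_slice + offset) % SLICES_PER_DAY)
        elapsed += t
    return slice_ids


# Hypothetical route departing in the last slice of the day.
ids = link_slice_ids(287, [120.0, 200.0, 400.0])
```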
RNN - Cross
Name | Type | Number of Embedding | Embedding Dim | Description |
---|---|---|---|---|
Cross Time | Numeric | 1 | | |
Start Link ID | Categorical | depends on dataset | 20 | |
End Link ID | Categorical | depends on dataset | 20 | |
- Auxiliary Loss for Link Status Classification
- Concatenate the outputs of the different branches
- Random Split KFold
- Model Ensemble
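The auxiliary loss idea can be sketched in plain Python as a main ETA loss plus a weighted link-status classification loss. This is an illustration under stated assumptions: the main loss is taken to be MAPE, the status head is assumed to output 5 logits per link (matching the 5-way one-hot status feature), and `aux_weight=0.1` is a hypothetical value, not one taken from the repository.

```python
import math


def mape_loss(eta_true, eta_pred):
    """Absolute percentage error for a single order."""
    return abs(eta_pred - eta_true) / eta_true


def cross_entropy(logits, target):
    """Softmax cross-entropy for one link; target is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def total_loss(eta_true, eta_pred, link_logits, link_status, aux_weight=0.1):
    """Main ETA loss plus the weighted mean link-status classification loss."""
    aux = sum(cross_entropy(l, s) for l, s in zip(link_logits, link_status))
    aux /= len(link_status)
    return mape_loss(eta_true, eta_pred) + aux_weight * aux


# Hypothetical order with two links and uninformative (uniform) status logits.
loss = total_loss(
    eta_true=600.0,
    eta_pred=660.0,
    link_logits=[[0.0] * 5, [0.0] * 5],
    link_status=[2, 4],
)
```

With uniform logits the classification term equals ln 5 per link, so the auxiliary head only lowers the total once it starts predicting link status better than chance.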