This project is based on Pedestrian Stop and Go Forecasting with Hybrid Feature Fusion [1].
The goal of pedestrian intention prediction is to determine, for each prediction timestep, whether a given pedestrian will be crossing or not crossing the road, based on a sequence of past observations.
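In our notation (an illustrative formalization, not taken verbatim from [1]): given the $m$ most recent observations, the model outputs a crossing probability for the prediction timestep,

$$p_{t+k} = P\left(y_{t+k} = 1 \mid o_{t-m+1}, \dots, o_t\right),$$

where $y_{t+k} = 1$ denotes crossing, $y_{t+k} = 0$ denotes not crossing, and $k$ is the prediction horizon in frames.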
Drawing inspiration from Pedestrian Stop and Go Forecasting with Hybrid Feature Fusion [1], which focuses on state-transition prediction, we adapt their model and approach to our crossing/non-crossing task. The model comprises four modules: visual information (encoded with a CNN), position and relative velocity (bounding boxes), pedestrian behavior, and scene description.
Pipeline of hybrid model
The model employs an individual LSTM unit for three of the modalities and applies a hybrid fusion technique, combining linear projections and concatenations, to integrate the multi-modal embeddings and obtain the final prediction.
To maintain consistency and improve the system's performance, we made two important decisions. First, we removed scene descriptions from the model to mitigate the risk of overfitting on scene attributes, which can adversely affect evaluation results; this lets the model concentrate on learning the other pertinent information more efficiently. Second, we excluded behavior data, since such annotations are difficult to obtain in real-life scenarios; this keeps the system applicable and practical in real-world settings.
Pipeline of our hybrid model
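The following is a minimal PyTorch sketch of this two-modality fusion; the embedding dimensions, class name, and head layout are our assumptions for illustration, not the exact architecture in the repository.

```python
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    """Sketch: fuse a CNN visual embedding with an LSTM motion embedding."""

    def __init__(self, img_dim=512, motion_dim=64, fused_dim=128, pred_len=5):
        super().__init__()
        # Linear projections bring both modalities to a common size.
        self.proj_img = nn.Linear(img_dim, fused_dim)
        self.proj_motion = nn.Linear(motion_dim, fused_dim)
        # Classification head on the concatenated projections:
        # one crossing logit per prediction timestep.
        self.head = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim),
            nn.ReLU(),
            nn.Linear(fused_dim, pred_len),
        )

    def forward(self, img_emb, motion_emb):
        # Concatenate the projected embeddings, then classify.
        z = torch.cat([self.proj_img(img_emb),
                       self.proj_motion(motion_emb)], dim=-1)
        return torch.sigmoid(self.head(z))  # crossing probabilities

# Example: a batch of 4 pedestrians.
probs = HybridFusion()(torch.randn(4, 512), torch.randn(4, 64))
print(probs.shape)  # torch.Size([4, 5])
```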
Prior to training the hybrid model, we separately trained the CNN encoder with a ResNet-18 backbone (image module) and the LSTM encoder (pedestrian motion). These models were then used as pretrained checkpoints when training the hybrid model.
Pipeline of training the CNN encoder
Pipeline of training the RNN encoder for pedestrian motion
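A hedged sketch of this pretraining handoff: the two encoders are approximated here with standard torchvision/PyTorch modules, and the checkpoint paths are the ones used in the training command further below; the repository's actual classes and checkpoint formats may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Image module: ResNet-18 backbone with the classification head removed
# (an approximation of the CNN encoder described above).
cnn_encoder = resnet18(pretrained=False)
cnn_encoder.fc = nn.Identity()  # expose the 512-d feature vector

# Motion module: LSTM over the 8-d box + relative-velocity features.
rnn_encoder = nn.LSTM(input_size=8, hidden_size=64, batch_first=True)

# Load the separately trained weights as pretrained checkpoints
# (assuming the .pt files store state dicts).
cnn_encoder.load_state_dict(
    torch.load("checkpoints/upbeat-wood-247/CNN_Encoder.pt"))
rnn_encoder.load_state_dict(
    torch.load("checkpoints/vague-darkness-248/LSTM_pos_vel.pt"))
```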
JAAD [2] was selected as the dataset. JAAD investigates pedestrian road-crossing behaviors using 346 videos that cover a range of weather and lighting conditions. Each pedestrian in the dataset is annotated with bounding boxes, behavioral data, and demographic information.
- crossing/non-crossing: {0: 'not-crossing', 1: 'crossing'}
The label assigned to a given sequence of past observations is the crossing/non-crossing label at the prediction timestep.
- visual context: (channel x image height x image width)
a sequence of RGB images cropped around the corresponding pedestrian bounding box, including background context.
- bounding boxes and relative velocities: $(x_t, y_t, H_t, W_t, \Delta x_t, \Delta y_t, \Delta H_t, \Delta W_t)$ (see the sketch after this list)
- $p_t$: the probability of crossing for each of the prediction timesteps
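As an illustration of the bounding-box modality, here is a small NumPy sketch assembling the 8-d feature above from consecutive boxes (our own helper, not necessarily how the repository's loader computes it):

```python
import numpy as np

def motion_features(boxes):
    """boxes: (T, 4) array of (x, y, H, W) for T observed frames.
    Returns (T, 8) features (x, y, H, W, dx, dy, dH, dW), where the
    deltas are frame-to-frame differences (zero at the first frame)."""
    boxes = np.asarray(boxes, dtype=np.float32)
    deltas = np.zeros_like(boxes)
    deltas[1:] = boxes[1:] - boxes[:-1]  # relative velocities
    return np.concatenate([boxes, deltas], axis=1)

# Example: a pedestrian drifting right while the box grows slightly.
seq = [(100, 200, 80, 40), (104, 200, 82, 41), (109, 201, 84, 42)]
print(motion_features(seq).shape)  # (3, 8)
```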
Please follow the instructions to prepare the JAAD data.
Furthermore, the prepared data can also be found at "/work/scitas-share/datasets/Vita/civil-459".
# Replace the following paths in src/dataset/loader.py -> def define_path() with your own paths
all_anns_paths = {'JAAD': {'anns': '/work/scitas-share/datasets/Vita/civil-459/JAAD/data_cache/jaad_database.pkl',
                           'split': 'DATA/annotations/JAAD/splits/'},
                  }
all_image_dir = {'JAAD': '/work/scitas-share/datasets/Vita/civil-459/JAAD/images',}
Clone this repository in order to use it.
# To clone the repository using HTTPS
git clone https://github.com/vita-student-projects/Group9_PedestrianIntentionDetection.git
cd Group9_PedestrianIntentionDetection
All dependencies can be found in the requirements.txt file.
# To install dependencies
pip install -r requirements.txt
# To install torch with cuda
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
This project has been tested with Python 3.7.7, PyTorch 1.10.1, CUDA 11.1.
Hyperparameters:
max-frames: observation frame length (if not specified, the default fps is 5, so max-frames=5 corresponds to 1 second)
pred: prediction frame length
Training hybrid model:
python train_crnn.py --cnn-encoder-path checkpoints/upbeat-wood-247/CNN_Encoder.pt --rnn-decoder-path checkpoints/vague-darkness-248/LSTM_pos_vel.pt --pred 5 --max-frame 5 -lr 5e-6 -wd 1e-2 --early-stopping-patience 5
Training cnn encoder:
python train_cnn.py --epochs 50 --early-stopping-patience 5 -wd 1e-3 --pred 5 -lr 1e-5
Training rnn encoder:
python train_rnn.py --epochs 50 --early-stopping-patience 5 -lr 1e-4 -wd 1e-4 --pred 5 --max-frames 5
The models are assessed using the F1 score, and to facilitate further analysis, we additionally provide the confusion matrices.
Evaluate hybrid model:
python eval_hybrid.py -cp checkpoints/silvery-music-263/crnn_lr1e-05_wd0.01_JAAD_pred5_bs4_202305311752.pt --max-frames 5 --pred 5 --mode hybrid
Evaluate cnn model:
python eval_hybrid.py -cp checkpoints/upbeat-wood-247/CNN_Encoder_lr1e-05_wd0.001_JAAD_pred5_bs4_202305311510.pt --pred 5 --mode cnn_only
Evaluate rnn model:
python eval_hybrid.py -cp checkpoints/vague-darkness-248/rnn_only_lr0.0001_wd0.0001_JAAD_pred5_bs4_202305311554.pt --max-frames 5 --pred 5 --mode rnn_only
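For reference, a minimal scikit-learn sketch of the reported metrics (not necessarily how eval_hybrid.py computes them):

```python
from sklearn.metrics import confusion_matrix, f1_score

# Toy labels: {0: 'not-crossing', 1: 'crossing'}.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print(f1_score(y_true, y_pred))          # F1 on the 'crossing' class
print(confusion_matrix(y_true, y_pred))  # rows: true, cols: predicted
```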
| Model | Test F1 |
|---|---|
| Hybrid model | 0.8035 |
| CNN encoder | 0.7808 |
| RNN encoder | 0.812 |
The detailed plots (loss, F1, prediction distribution, etc.) can be found at the following link:
Green: crossing; red: non-crossing. The target label is shown as a word and the prediction as the bounding-box color.
To sum up, this work predicts pedestrian road-crossing behavior using the hybrid model, as well as the standalone CNN encoder and RNN encoder.
[1] Dongxu Guo, Taylor Mordan, and Alexandre Alahi. "Pedestrian Stop and Go Forecasting with Hybrid Feature Fusion". 2022. arXiv:2203.02489 [cs.CV].
[2] Amir Rasouli, Iuliia Kotseruba, and John K. Tsotsos. “Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior”. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). 2017, pp. 206–213. doi: 10.1109/ICCVW.2017.33.