This project aims to detect and recognize digit sequences in the SVHN dataset, which consists of real-world images of house numbers from Google Street View. The project is organized into 2 parts:
- A rough implementation of the research paper Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Networks to recognize the digit sequence.
- Using SmallerVGGNet to detect the digit sequences in the images.
As mentioned, the network used to recognize the digit sequence is an implementation of the research paper Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Networks.
Most digit sequences contain 2 or 3 digits; only one example in the training set has 6 digits. The network is designed to recognize up to 5 digits. It outputs 6 values: the first 5 correspond to each digit position in the sequence (class 10 means an empty position), and the last value is the number of digits in the sequence.
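Under this scheme, a prediction can be turned back into a digit string as sketched below. The function name and argument order are illustrative assumptions, not the notebooks' actual API:

```python
def decode_prediction(length, digits):
    """Decode the network's 6 outputs into a digit string.

    length -- predicted number of digits in the sequence (0 to 5)
    digits -- 5 class indices, one per position; class 10 means
              "no digit at this position"
    """
    # Keep only the first `length` positions, skipping any "empty" class.
    return "".join(str(d) for d in digits[:length] if d != 10)

# e.g. a house number "123" predicted with length 3:
decode_prediction(3, [1, 2, 3, 10, 10])  # -> "123"
```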
For preprocessing in this experiment, the digit sequence is cropped from each image using the given ground-truth bounding boxes and resized to 64x64. A random 54x54 patch is then extracted from the 64x64 image and fed into the model during training.
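The crop-resize-patch step can be sketched as below. This is an assumption-laden stand-in: the notebooks presumably use cv2.resize, while here a simple nearest-neighbour resize keeps the example dependent on NumPy only, and the (x, y, w, h) box format is assumed:

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize to size x size (stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def preprocess_train(img, box, rng=None):
    """Crop the digit-sequence box, resize to 64x64, take a random 54x54 patch."""
    rng = rng or np.random.default_rng()
    x, y, w, h = box                      # assumed (x, y, width, height) format
    crop = img[y:y + h, x:x + w]          # cut out the ground-truth region
    crop64 = resize_nearest(crop, 64)
    # random top-left corner for the 54x54 training patch
    top = rng.integers(0, 64 - 54 + 1)
    left = rng.integers(0, 64 - 54 + 1)
    return crop64[top:top + 54, left:left + 54]
```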
Here are some examples of the predictions on the testing set:
SmallerVGGNet is used to regress the bounding box containing the digit sequence. The network outputs 4 values: the top-left x and y coordinates, the width, and the height.
The images are first resized so that the shorter side is 96 pixels, with the other side scaled to maintain the aspect ratio. A 96x96 patch is then cropped from the middle of the resized image. The original ground-truth bounding-box labels can no longer be used directly, so they are recomputed as shown in the notebook.
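One way to recompute a label, assuming the (x, y, w, h) box format (the actual computation lives in the notebook; this is only a sketch of the geometry):

```python
def remap_box(box, orig_w, orig_h, size=96):
    """Map a ground-truth box (x, y, w, h) from the original image
    into the 96x96 centre crop of the shorter-side-96 resize."""
    scale = size / min(orig_w, orig_h)     # shorter side becomes `size`
    new_w, new_h = orig_w * scale, orig_h * scale
    # centre-crop offsets in the resized image
    off_x = (new_w - size) / 2
    off_y = (new_h - size) / 2
    x, y, w, h = box
    return (x * scale - off_x, y * scale - off_y, w * scale, h * scale)
```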
Here are some examples of the predictions on the testing set (RED = Ground Truth, GREEN = Predicted):
After training both models, they can be used together to detect and recognize digits in street-view images. The whole pipeline works as follows:
- The input image is first scaled down so the shorter side is 96 pixels, and a 96x96 patch is cropped from the middle.
- The detection model accepts the cropped patch as input and outputs the bounding box.
- A smaller patch is cropped from the 96x96 image according to the predicted bounding box.
- The smaller patch is resized to 54x54 and fed into the recognizing model.
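The steps above can be sketched as glue code. All four callables here are placeholders: in the notebooks, `detect` and `recognize` would wrap the trained Keras models, and `crop96` / `resize54` would be the OpenCV-based preprocessing helpers:

```python
import numpy as np  # used by the callers; the pipeline itself only slices arrays

def run_pipeline(img, detect, recognize, crop96, resize54):
    """Chain the detection and recognition models over one image."""
    patch = crop96(img)                        # shorter side -> 96, centre crop
    x, y, w, h = (int(v) for v in detect(patch))
    x, y = max(x, 0), max(y, 0)                # clamp the predicted box
    digit_region = patch[y:y + h, x:x + w]     # crop the predicted box
    return recognize(resize54(digit_region))   # 54x54 input to the recognizer
```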
Below are some cherry-picked results:
01_svhn_labels.ipynb: extracts the metadata and labels from the given .mat file and saves them in pickle format
02_classify_preprocess.ipynb: preprocesses the images for the recognizing model and writes them to an HDF5 file
03_detect_preprocess.ipynb: preprocesses the images for the detection model and writes them to an HDF5 file
04_classify_training.ipynb: trains the recognizing model
05_detect_training.ipynb: trains the detection model
06_classify_testing.ipynb: tests the recognizing model
07_detect_testing.ipynb: tests the detection model
08_end_to_end.ipynb: tests both models working together
model.py: implementation of the model described in Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Networks
- Python 3.6
- Keras 2.1.6
- TensorFlow 1.8.0 (GPU)
- OpenCV 3.4.1