Finger Count Detection Using Deep Learning

The main sections are explained briefly below:

Core Functions

Some of the file management functions from Python's os and shutil modules have been adapted to the project's needs.
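The README does not show these helpers. The following is only a minimal sketch of the kind of os/shutil wrappers such a project might use. The names ensure_dir, reset_dir and list_files are hypothetical.

```python
import os
import shutil


def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


def reset_dir(path):
    """Delete `path` and everything inside it, then recreate it empty."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path


def list_files(path, extension=".jpg"):
    """Return the sorted names of regular files in `path` ending with `extension`."""
    return sorted(
        name
        for name in os.listdir(path)
        if name.lower().endswith(extension)
        and os.path.isfile(os.path.join(path, name))
    )
```

Sorting the listing makes the file order deterministic, which matters later when labels are derived from file names and must line up with the images.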

Download Data

The video data is downloaded from the author's Google Drive, either by mounting the drive or through a public share link. The video data is stored at this Google Drive link.


More Data

A fingers dataset published on Kaggle may later be concatenated with the custom dataset.


Extract Frames From Videos

Many videos were recorded of different people holding their hands in front of a camera and signing the numbers 1 to 5. These videos are split into frames, and each frame is saved as an image whose file name has the following format:

"{finger-no}_{frame-no}.jpg"

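A minimal frame-extraction sketch, assuming OpenCV (cv2), which the README does not name. The helpers frame_filename and extract_frames, and the step and start_index parameters, are hypothetical. They only illustrate the "{finger-no}_{frame-no}.jpg" naming scheme.

```python
import os


def frame_filename(finger_no, frame_no):
    """Build a file name in the "{finger-no}_{frame-no}.jpg" format."""
    return f"{finger_no}_{frame_no}.jpg"


def extract_frames(video_path, finger_no, out_dir, start_index=0, step=1):
    """Save every `step`-th frame of `video_path` as a JPEG in `out_dir`.

    Returns the next free frame number, so it can be passed as `start_index`
    for another video of the same finger count without overwriting files.
    """
    import cv2  # imported here so frame_filename works without OpenCV installed

    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    frame_no = start_index
    read_index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of the video, or the file could not be read
                break
            if read_index % step == 0:
                out_path = os.path.join(out_dir, frame_filename(finger_no, frame_no))
                cv2.imwrite(out_path, frame)
                frame_no += 1
            read_index += 1
    finally:
        capture.release()
    return frame_no
```

Chaining the returned value into the next call as start_index keeps frame numbers unique when several videos show the same finger count.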
Data Analysis

The extracted frames are analysed in this section: their dimensions and aspect ratios are computed and processed, and the datasets are merged if there is more than one.
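The analysis and merging step could look roughly like this, assuming the frames are already loaded as NumPy arrays. dimension_summary and merge_datasets are hypothetical names, and merging assumes every dataset has first been resized to the same image shape.

```python
from collections import Counter

import numpy as np


def dimension_summary(images):
    """Count each distinct (height, width) and report its width/height ratio."""
    shapes = Counter(img.shape[:2] for img in images)
    return {
        (h, w): {"count": count, "ratio": round(w / h, 3)}
        for (h, w), count in shapes.items()
    }


def merge_datasets(*datasets):
    """Concatenate several (images, labels) pairs into one pair.

    All image arrays must share the same per-image shape.
    """
    images = np.concatenate([x for x, _ in datasets], axis=0)
    labels = np.concatenate([y for _, y in datasets], axis=0)
    return images, labels
```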


Dataset Preparation

In this section, the noisy raw frames are preprocessed: each image is converted to grayscale, smoothed with a Gaussian blur to reduce noise, and binarized with an adaptive threshold so that the hand gesture and its contours stand out from the background. The processed images are then converted into NumPy arrays and split into training and test sets with a 20% test ratio, using Scikit-learn's train_test_split method, before being fed into the CNN model.

The labels are extracted from each image's file name and saved as a NumPy array by the prepare_dataset(folder) method.
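The body of prepare_dataset(folder) is not shown in the README, so the following is only a hypothetical sketch of the label-extraction part. It relies on the "{finger-no}_{frame-no}.jpg" naming scheme; label_from_filename and labels_for_folder are illustrative names. Whatever the real implementation does, the labels must be produced in the same file order as the images they describe.

```python
import os

import numpy as np


def label_from_filename(filename):
    """Return the finger count encoded in a "{finger-no}_{frame-no}.jpg" name."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    finger_no, _frame_no = stem.split("_", 1)
    return int(finger_no)


def labels_for_folder(folder):
    """Labels for every .jpg in `folder`, in sorted file-name order."""
    names = sorted(n for n in os.listdir(folder) if n.lower().endswith(".jpg"))
    return np.array([label_from_filename(n) for n in names], dtype=np.int64)
```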

To normalize the dataset, all images, which store integer pixel values of type uint8 (0-255), are rescaled to the 0-1 range by dividing by 255.0.
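A one-function sketch of this normalization; normalize is a hypothetical name. The dtype check is a defensive choice: it prevents data that is already scaled to floats from being divided by 255.0 a second time.

```python
import numpy as np


def normalize(images):
    """Scale uint8 pixel values from 0-255 to float32 values in 0.0-1.0."""
    images = np.asarray(images)
    if images.dtype != np.uint8:
        raise TypeError(f"expected uint8 images, got {images.dtype}")
    return images.astype(np.float32) / 255.0
```

For a grayscale CNN input, a channel axis is typically added afterwards, for example x[..., np.newaxis], giving the shape (N, height, width, 1).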

Since the labels are categorical (1, 2, 3, 4 or 5), they are one-hot encoded with Keras' to_categorical method, turning each label into a vector such as [1, 0, 0, 0, 0].
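Because the labels run from 1 to 5 rather than 0 to 4, they have to be shifted before one-hot encoding. With Keras this would be to_categorical(labels - 1, num_classes=5); calling to_categorical on the raw 1-5 labels would instead produce six columns, the first of which is never used. The NumPy sketch below (one_hot and decode are hypothetical names) shows the same mapping and its inverse for reading predictions back as finger counts.

```python
import numpy as np

NUM_CLASSES = 5  # finger counts 1..5


def one_hot(labels, num_classes=NUM_CLASSES):
    """One-hot encode labels in 1..num_classes; label 1 -> [1, 0, 0, 0, 0]."""
    labels = np.asarray(labels, dtype=np.int64)
    if labels.min() < 1 or labels.max() > num_classes:
        raise ValueError(f"labels must be in 1..{num_classes}")
    # Shift to 0-based indices: label 1 selects row 0 of the identity matrix.
    return np.eye(num_classes, dtype=np.float32)[labels - 1]


def decode(probabilities):
    """Map model outputs (one row per sample) back to finger counts 1..5."""
    return np.argmax(probabilities, axis=1) + 1
```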

Data Augmentation

Since the amount of data is limited relative to what a Convolutional Neural Network needs to recognize the signs accurately, we used data augmentation with Keras' ImageDataGenerator to add diversity in color, shape, rotation, zoom, etc.
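An illustrative ImageDataGenerator configuration; every range below is an assumed value, since the README does not list the real settings. This sketch covers rotation, shifting, zoom, shear and horizontal flips. Recent TensorFlow releases mark ImageDataGenerator as deprecated in favour of Keras preprocessing layers, although it is still available.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical augmentation settings; not taken from the project.
augmenter = ImageDataGenerator(
    rotation_range=15,        # random rotation of up to +/-15 degrees
    width_shift_range=0.1,    # horizontal shift of up to 10% of the width
    height_shift_range=0.1,   # vertical shift of up to 10% of the height
    zoom_range=0.1,           # random zoom between 90% and 110%
    shear_range=10,           # shear angle of up to 10 degrees
    horizontal_flip=True,     # a mirrored hand still shows the same finger count
    fill_mode="nearest",      # fill pixels exposed by a transform with the nearest value
)
```

The generator can then feed training batches, for example model.fit(augmenter.flow(x_train, y_train, batch_size=BATCH_SIZE), validation_data=(x_test, y_test), epochs=EPOCHS), where these names stand in for the project's own variables.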

Model

In the first cell after the library imports, the configuration is defined: the batch size, the number of epochs to train the model for, the learning rate and the number of classes the CNN model is expected to predict.

Then, we define the create_model() function, which builds the CNN architecture and returns the model. After inspecting the architecture and the parameters of every layer with the model.summary() method, we compile the model with an optimizer and a loss function.
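The project's actual architecture is not described in the README, so the following is a small stand-in CNN written as a sketch under stated assumptions: a 64x64 grayscale input, the configuration values at the top, the Adam optimizer, and categorical cross-entropy loss (which pairs with one-hot labels). None of these values come from the project.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical configuration values; the README does not give the real ones.
BATCH_SIZE = 32
EPOCHS = 30
LEARNING_RATE = 1e-3
NUM_CLASSES = 5
INPUT_SHAPE = (64, 64, 1)  # grayscale frames resized to 64x64 (assumed)


def create_model(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES):
    """A small illustrative CNN; not the architecture used in the project."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),  # 64x64 -> 32x32
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),  # 32x32 -> 16x16
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),  # 16x16 -> 8x8
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # one probability per finger count
    ])


model = create_model()
model.summary()  # prints each layer's output shape and parameter count
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss="categorical_crossentropy",  # expects one-hot encoded labels
    metrics=["accuracy"],
)
```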

Right before training starts, several Keras callbacks are set up to assist the training process:

  • ModelCheckpoint: a checkpoint is saved after every epoch, so training can be resumed from where it left off if something goes wrong.
  • ReduceLROnPlateau: when the loss stops improving for a set number of epochs (the patience), the learning rate is reduced by a factor so the optimizer can keep making progress with smaller steps.
  • EarlyStopping: when the validation loss stops improving for a set number of epochs, training is stopped to avoid wasting resources and to limit overfitting.
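A possible setup for the three callbacks above. The file name pattern, patience values, factor and min_lr are assumptions, and ReduceLROnPlateau is assumed to watch the validation loss. The .keras suffix is used because recent Keras versions expect it for full-model checkpoints.

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    # Save the full model at the end of every epoch (one file per epoch),
    # so training can be resumed later from the latest checkpoint.
    ModelCheckpoint("fingers_epoch_{epoch:02d}.keras", save_best_only=False),
    # Halve the learning rate when val_loss has not improved for 2 epochs.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6, verbose=1),
    # Stop after 5 epochs without val_loss improvement and restore the best weights.
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True, verbose=1),
]
```

The list is then passed to training as model.fit(..., callbacks=callbacks).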

Evaluation

After training is completed, the model's metrics are plotted with Matplotlib for evaluation, and the model is tested on new data.
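A minimal plotting sketch, assuming the model was compiled with metrics=["accuracy"] and trained with validation data, so that the History dictionary contains loss, val_loss, accuracy and val_accuracy. plot_history is a hypothetical name.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot training vs. validation loss and accuracy from a Keras History dict."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    epochs = range(1, len(history["loss"]) + 1)

    ax_loss.plot(epochs, history["loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set_title("Loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.legend()

    ax_acc.plot(epochs, history["accuracy"], label="train")
    ax_acc.plot(epochs, history["val_accuracy"], label="validation")
    ax_acc.set_title("Accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.legend()

    fig.tight_layout()
    return fig
```

It can be called as plot_history(history.history), where history is the object returned by model.fit. A widening gap between the train and validation curves is the usual sign of overfitting.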

The results are expected to look like the following evaluation:

[Figure: fingers]