Speech Emotion Recognition (SER)

Speech emotion recognition (SER) is the ability to identify the type of emotion from human speech, intending to create a natural interaction between humans and computers or commonly known as human-computer interaction (HCI). To increase human satisfaction while using digital devices such as computers and smartphones, the development of HCI is very much needed. Therefore, in this project, I will design a Speech Emotion Recognition (SER) system using Linear Predictive Coding (LPC) with K-Nearest Neighbor (KNN) classification.

Prepare Dataset

The dataset used in this project is a primary (self-collected) dataset. The collection process was carried out in one of the Telkom University laboratories. The dataset was collected with the help of 20 actors (10 men and 10 women). Of all the datasets collected, the data that has the best quality will be taken and selected.
The procedure for collecting datasets:
1. Dataset collection was carried out in the Telkom University laboratory room.
2. Determine the sentence that will be used in the dataset retrieval process. In this case the sentence used is "Dia pacar saya".
3. Each actor will say the sentence in 4 types of emotions (angry, happy, sad, disappointed), where each emotion will be repeated 10 times.
4. Each sentence spoken by the actor will be recorded and saved in .wav format.

Data Augmentation

Augmented data has a significant role in improving the performance of the DNN model and reducing overfitting. Data augmentation is a way to increase the amount of data and is very useful for processing small data. In augmentation of speech recognition data on DNN can be used to improve accuracy.

Linear Predictive Coding (LPC)

Linear predictive coding is one method that is often used for feature extraction in speech recognition. The most crucial aspect of LPC is the linear predictive filter which allows the value of the following sample to be determined by a linear combination of the previous models. The linear predictive method reduces the mean square error between the input signal and the signal to be estimated to obtain a filter coefficient equivalent to that of the vocal channel.

K-Nearest Neighbor (KNN)

The KNN classification belongs to the machine learning supervised learning group. The intuition underlying the KNN classification algorithm is straightforward, namely by classifying based on the class of its nearest neighbor (nearest neighbor). The basic principle of the nearest neighbor method is to measure the distance between the unknown signal pattern and the reference signal pattern in the database. The Euclidean distance equation is used to calculate the distance, which is one of the implementations in the nearest neighbor algorithm.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
Dataset		Dataset
Data_path.csv		Data_path.csv
LICENSE		LICENSE
README.md		README.md
SER.ipynb		SER.ipynb
features16.csv		features16.csv
features3.csv		features3.csv
features8.csv		features8.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speech Emotion Recognition (SER)

Prepare Dataset

Data Augmentation

Linear Predictive Coding (LPC)

K-Nearest Neighbor (KNN)

About

Releases

Packages

Languages

License

darwinOne/Speech-Emotion-Recognition-SER-

Folders and files

Latest commit

History

Repository files navigation

Speech Emotion Recognition (SER)

Prepare Dataset

Data Augmentation

Linear Predictive Coding (LPC)

K-Nearest Neighbor (KNN)

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages