End-to-End Speech Enhancement With Perceptual Feature Losses

An end-to-end deep neural network for speech denoising that uses perceptual feature differences as its loss function, implemented in PyTorch. The detailed report is available here - Report

DATA DOWNLOADING FILES:

1.) preprocess_lossdata.sh - This script downloads the Acoustic Scene Classification and Domestic Audio Tagging datasets, resamples the audio to 16 kHz, and saves it inside dataset/asc and dataset/dat. This data is used to train the loss network.

				USAGE : ./preprocess_lossdata.sh

2.) preprocess_denoisingdata.sh - This script downloads the Voice Bank Corpus Dataset and resamples all files to 16 kHz. It creates four new folders inside the dataset folder:

trainset_clean (clean training set)
trainset_noisy (noisy training set)
valset_clean (clean validation set)
valset_noisy (noisy validation set)

				USAGE : ./preprocess_denoisingdata.sh
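
After preprocessing, it can be useful to confirm that the four folders exist and that clean and noisy files line up. The following is a minimal sketch, not part of this repository: the helper name `check_denoising_dataset` is hypothetical, and it assumes that the files are `.wav` and that a clean file and its noisy counterpart share the same file name.

```python
from pathlib import Path

# Folder names taken from the list above; the helper itself is hypothetical.
EXPECTED_FOLDERS = ("trainset_clean", "trainset_noisy", "valset_clean", "valset_noisy")


def check_denoising_dataset(root):
    """Return a list of human-readable problems found under `root` (empty if none)."""
    root = Path(root)
    problems = []
    for name in EXPECTED_FOLDERS:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}")
    # Assumption: a clean file and its noisy counterpart share the same file name.
    for split in ("trainset", "valset"):
        clean_dir = root / f"{split}_clean"
        noisy_dir = root / f"{split}_noisy"
        if clean_dir.is_dir() and noisy_dir.is_dir():
            clean = {p.name for p in clean_dir.glob("*.wav")}
            noisy = {p.name for p in noisy_dir.glob("*.wav")}
            # Symmetric difference: files present on one side only.
            for name in sorted(clean ^ noisy):
                problems.append(f"unpaired file in {split}: {name}")
    return problems
```

Calling `check_denoising_dataset("dataset")` before training returns an empty list when the layout matches these assumptions.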

TRAINING FEATURE LOSS NETWORK

The feature loss network is trained on 2 datasets - Acoustic Scene Classification and Domestic Audio Tagging. The network architecture is shown below -

Feature Loss Network Files

train_featurelossnet.py - This trains the feature loss network (or decoder network) on both tasks and also calculates the validation scores for each of them.
```
  			USAGE : python train_featurelossnet.py -o models
```
The model is saved inside the "models" folder with the name "loss_model.pth"
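
For orientation, a perceptual feature loss of this kind is often written as a weighted sum of distances between internal activations of the trained loss network. The formula below is a sketch under stated assumptions rather than the exact loss used in this repository: the symbols, the choice of the L1 norm, and the per-layer weights are all illustrative.

```latex
\mathcal{L}(x, s) = \sum_{m=1}^{M} \lambda_m \left\lVert \Phi_m(s) - \Phi_m\big(g(x)\big) \right\rVert_1
```

Here x is the noisy input, s is the clean target, g is the denoising network, Φ_m denotes the activations of the m-th layer of the loss network (kept fixed while g is trained), and λ_m are assumed per-layer weights.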

TRAINING SPEECH DENOISING NETWORK

The speech-denoising network is trained on the Voice Bank Corpus Dataset.
The network architecture is shown below -

Speech Denoising Network Files

train_denoisingnet.py - This trains the denoising network (or encoder network) on the Voice Bank Corpus training dataset and also calculates the validation scores on the validation dataset. It takes the loss network trained earlier as an argument.
```
  	USAGE : python train_denoisingnet.py -d dataset -l models/loss_model.pth -s models
```
The model is saved inside the "models" folder with the name "denoising_model.pth". Specify the loss model path with the -l option.

test_denoisingnet.py - This tests the denoising network on any noisy audio. It takes as input a folder containing all the audio files to be denoised.
```
  	USAGE : python test_denoisingnet.py -d data_folder -m denoising_model_path
```
data_folder - folder containing all the noisy audio
denoising_model_path - path to the trained denoising network model (encoder model).

The denoised audio is saved next to the input data folder, in a newly created $(data_folder)_denoised folder.
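
The output naming described above can be sketched as follows. This is an illustration of the `$(data_folder)_denoised` convention only; the helper name `denoised_output_dir` is hypothetical, and how test_denoisingnet.py actually builds the path is not shown in this README.

```python
from pathlib import Path


def denoised_output_dir(data_folder):
    """Return the sibling '<data_folder>_denoised' path for an input folder."""
    folder = Path(data_folder)
    # Path drops a trailing slash, so "data/noisy/" and "data/noisy" behave alike.
    # with_name replaces only the last path component, keeping the parent folder.
    return folder.with_name(folder.name + "_denoised")
```

For example, an input folder `data/noisy` maps to `data/noisy_denoised`. A bare `.` has no folder name to extend and is not handled by this sketch.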

models.py - Contains the architecture of both the encoder and the decoder.
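
Putting the USAGE lines together, the full workflow runs in the order sketched below. The commands are copied from this README; `data_folder` is a placeholder, the denoising model path assumes `-s models` was used during training, and the helper `pipeline_commands` is hypothetical. The script only prints the commands (a dry run) rather than executing them.

```python
import shlex

# Commands copied from the USAGE lines in this README, in section order.
# "data_folder" is a placeholder for a folder of noisy audio.
PIPELINE = [
    "./preprocess_lossdata.sh",
    "./preprocess_denoisingdata.sh",
    "python train_featurelossnet.py -o models",
    "python train_denoisingnet.py -d dataset -l models/loss_model.pth -s models",
    "python test_denoisingnet.py -d data_folder -m models/denoising_model.pth",
]


def pipeline_commands():
    """Split each command line into an argument list suitable for subprocess.run."""
    return [shlex.split(line) for line in PIPELINE]


if __name__ == "__main__":
    for argv in pipeline_commands():
        print(" ".join(argv))  # dry run: print instead of executing
```

Each argument list could be passed to `subprocess.run(argv, check=True)` to execute the steps in sequence.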

Examples

Noisy Audio 1

p257_431_noisy.mp4

Clean Audio 1

p257_431.mp4

Noisy Audio 2

p257_432_noisy.mp4

Clean Audio 2

p257_432.mp4

Contact Info : [email protected]
