End-to-End Speech Denoising With Perceptual Feature Losses

neeleshverma/Speech-Enhancement

End-to-End Speech Enhancement With Perceptual Feature Losses

An end-to-end deep neural network for speech denoising, implemented in PyTorch, that uses perceptual feature differences as its loss function. A detailed report is available here - Report

DATA DOWNLOADING FILES:

1.) preprocess_lossdata.sh - This script downloads the Acoustic Scene Classification and Domestic Audio Tagging datasets, resamples the audio to 16 kHz, and saves it inside dataset/asc and dataset/dat. This data is used to train the loss network.

				USAGE : ./preprocess_lossdata.sh

2.) preprocess_denoisingdata.sh - It downloads the Voice Bank Corpus Dataset and resamples all the files to 16 kHz. It will create 4 new folders inside the dataset folder (clean training set, noisy training set, clean validation set, and noisy validation set) -

  1. trainset_clean

  2. trainset_noisy

  3. valset_clean

  4. valset_noisy

     			USAGE : ./preprocess_denoisingdata.sh
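
Once the four folders exist, training code needs each noisy file matched with its clean reference. The following is a minimal sketch of that pairing, not the repository's actual loader. It assumes the files are .wav and that a noisy file and its clean counterpart share the same file name; the function name `pair_clean_noisy` is hypothetical.

```python
from pathlib import Path


def pair_clean_noisy(dataset_dir, split="trainset"):
    """Return (noisy_path, clean_path) pairs that share a file name.

    Assumption: both folders hold .wav files with identical names.
    """
    root = Path(dataset_dir)
    clean_dir = root / f"{split}_clean"
    noisy_dir = root / f"{split}_noisy"
    pairs = []
    for noisy_path in sorted(noisy_dir.glob("*.wav")):
        clean_path = clean_dir / noisy_path.name
        if clean_path.exists():  # skip noisy files with no clean reference
            pairs.append((noisy_path, clean_path))
    return pairs
```

Calling `pair_clean_noisy("dataset")` would list the training pairs, and `pair_clean_noisy("dataset", split="valset")` the validation pairs.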
    

TRAINING FEATURE LOSS NETWORK

The feature loss network is trained on 2 datasets - Acoustic Scene Classification and Domestic Audio Tagging. The network architecture is shown below -

Feature Loss Network Files

  • train_featurelossnet.py - This trains the feature loss network (or decoder network) on both tasks and reports the validation scores for each.

      			USAGE : python train_featurelossnet.py -o models
    

    The model is saved inside the "models" folder with the name "loss_model.pth"

TRAINING SPEECH DENOISING NETWORK

The speech-denoising network is trained on the Voice Bank Corpus Dataset.
The network architecture is shown below -

Speech Denoising Network Files

  • train_denoisingnet.py - This trains the denoising network (or encoder network) on the Voice Bank Corpus training dataset and also calculates the validation scores on the validation dataset. It takes the loss network trained earlier as an argument.

      	USAGE : python train_denoisingnet.py -d dataset -l models/loss_model.pth -s models
    

    The model is saved inside the "models" folder with the name "denoising_model.pth". Specify the loss model path in the -l option.

  • test_denoisingnet.py - This tests the denoising network on any noisy audio. It takes as input a folder containing all the audio files to denoise.

      	USAGE : python test_denoisingnet.py -d data_folder -m denoising_model_path
    

    data_folder - folder containing all the noisy audio
    denoising_model_path - path to the trained denoising network model (encoder model).

    The denoised audio is saved next to the input data folder, in a newly created $(data_folder)_denoised folder.
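
To make the output naming concrete, here is a small sketch of how an input folder could map to its `_denoised` counterpart. It assumes the output folder is a sibling of the input folder, which is one reading of the description above; the helper names `denoised_dir_for` and `output_path_for` are hypothetical and not taken from the test script.

```python
from pathlib import Path


def denoised_dir_for(data_folder):
    """Map an input folder to a sibling '<name>_denoised' folder."""
    src = Path(data_folder)  # Path drops any trailing slash
    return src.with_name(src.name + "_denoised")


def output_path_for(noisy_file, data_folder):
    """Where the denoised version of one input file would be written."""
    return denoised_dir_for(data_folder) / Path(noisy_file).name
```

Under this assumption, running the test script with `-d data/noisy` would write its results to `data/noisy_denoised`.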

models.py - Contains the architecture of both the encoder and the decoder.
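
The way the two networks interact during training can be illustrated with a short sketch: the denoised output and the clean target are both passed through the frozen loss network, and the loss is the difference between their internal activations. This is only an illustration under stated assumptions, not the code in models.py. `TinyLossNet` is a hypothetical stand-in for the trained loss network, and the unweighted sum of per-layer mean absolute differences is an assumed form of the loss; the repository may weight or select layers differently.

```python
import torch
import torch.nn as nn


class TinyLossNet(nn.Module):
    """Hypothetical stand-in for the trained feature loss network."""

    def __init__(self, channels=8, num_layers=3):
        super().__init__()
        layers = []
        in_ch = 1
        for _ in range(num_layers):
            layers.append(nn.Sequential(
                nn.Conv1d(in_ch, channels, kernel_size=3, stride=2, padding=1),
                nn.LeakyReLU(0.2),
            ))
            in_ch = channels
        self.layers = nn.ModuleList(layers)

    def features(self, wav):
        """Return every layer's activation for a (batch, 1, time) input."""
        feats = []
        x = wav
        for layer in self.layers:
            x = layer(x)
            feats.append(x)
        return feats


def feature_loss(loss_net, denoised, clean):
    """Sum of per-layer mean absolute activation differences (assumed form)."""
    feats_denoised = loss_net.features(denoised)
    with torch.no_grad():  # the clean target needs no gradient
        feats_clean = loss_net.features(clean)
    return sum(torch.mean(torch.abs(fd - fc))
               for fd, fc in zip(feats_denoised, feats_clean))


torch.manual_seed(0)
loss_net = TinyLossNet().eval()
for p in loss_net.parameters():
    p.requires_grad_(False)  # the loss network stays frozen

# Random tensors stand in for a clean batch and a denoiser's output.
clean = torch.randn(2, 1, 1024)
denoised = (clean + 0.1 * torch.randn_like(clean)).requires_grad_(True)

loss = feature_loss(loss_net, denoised, clean)
loss.backward()  # gradients reach the denoised signal, not the loss network
```

In real training, `denoised` would be the output of the denoising network, so the gradient would flow back into that network's weights while the loss network remains unchanged.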

Examples

Noisy Audio 1

p257_431_noisy.mp4

Clean Audio 1

p257_431.mp4

Noisy Audio 2

p257_432_noisy.mp4

Clean Audio 2

p257_432.mp4

Contact Info : [email protected]
