Using Deep Convolution Generative Adversarial Networks to generate snare drum sounds
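The notebook's actual architecture isn't shown here, but a DCGAN generator for short raw-audio clips typically stacks transposed convolutions that upsample a latent vector into a waveform. The sketch below is a minimal, hypothetical example of that idea (layer counts, kernel sizes, and the 1024-sample output length are all assumptions, not the notebook's real configuration):

```python
import torch
import torch.nn as nn

class SnareGenerator(nn.Module):
    """Illustrative 1-D DCGAN generator: latent vector -> short waveform."""

    def __init__(self, latent_dim: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            # (B, latent_dim, 1) -> (B, 256, 16)
            nn.ConvTranspose1d(latent_dim, 256, kernel_size=16, stride=1),
            nn.BatchNorm1d(256), nn.ReLU(),
            # (B, 256, 16) -> (B, 128, 64)
            nn.ConvTranspose1d(256, 128, kernel_size=8, stride=4, padding=2),
            nn.BatchNorm1d(128), nn.ReLU(),
            # (B, 128, 64) -> (B, 64, 256)
            nn.ConvTranspose1d(128, 64, kernel_size=8, stride=4, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            # (B, 64, 256) -> (B, 1, 1024); tanh keeps samples in [-1, 1]
            nn.ConvTranspose1d(64, 1, kernel_size=8, stride=4, padding=2),
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, latent_dim) -> waveform: (batch, 1, 1024)
        return self.net(z.unsqueeze(-1))

audio = SnareGenerator()(torch.randn(2, 100))
print(audio.shape)  # torch.Size([2, 1, 1024])
```

A real snare at, say, 16 kHz would need a longer output (a few thousand samples), which just means more upsampling stages; the structure stays the same.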
After 30 epochs (around 20 minutes of training on Google Colab with GPU acceleration), the model's output clearly shows similarities with an actual snare sound:
Coloured images show the short-time Fourier transform (STFT) output
Clearly, the model has learned that a snare sound typically begins with a transient (the initial peak); however, in this example it has placed two transients. This could be due to examples in the dataset that contain two transients.
Additionally, the STFT analysis shows strong similarities, with the initial peak followed by the noisy tail.
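For reference, an STFT magnitude plot like the ones above can be computed with `scipy.signal.stft`. The snippet below runs it on a crude synthetic "snare" (an exponentially decaying noise burst; the 16 kHz sample rate and 512-sample window are assumptions, not the notebook's settings):

```python
import numpy as np
from scipy.signal import stft

sr = 16000                       # assumed sample rate
t = np.arange(sr // 4) / sr      # 250 ms clip
rng = np.random.default_rng(0)

# Crude snare-like stand-in: noise burst with a sharp attack and decaying tail
snare = rng.standard_normal(t.size) * np.exp(-t * 30.0)

# STFT: rows are frequency bins, columns are time frames
freqs, times, Z = stft(snare, fs=sr, nperseg=512)
magnitude_db = 20 * np.log10(np.abs(Z) + 1e-8)  # log-magnitude for plotting

print(magnitude_db.shape)  # (freq_bins, time_frames); 512 // 2 + 1 = 257 bins
```

In the resulting image, a transient shows up as a bright vertical stripe (energy across all frequencies at one instant), and the tail as energy that fades out over time.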
However, the tail is not smooth; it ends fairly abruptly. On listening to the output, it also sounds 'squeaky', with a noticeable wet-sounding noise. The first transient, though, sounds very good and could easily be processed within a DAW to sound much closer to an actual snare.
The losses were tracked during training:
As you can see, the discriminator's loss is converging towards zero, though not quickly or reliably; this may suggest that a different discriminator design is needed. Due to a lack of resources, it is unknown whether this model would produce better results given more training, although there is a good chance that as the discriminator's loss decreased, it would eventually start to increase again as the generator improves.
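The kind of loss bookkeeping described above usually looks like the toy loop below: per step, the discriminator is trained to score real samples as 1 and fakes as 0, the generator is trained to fool it, and both losses are appended to lists for plotting. This is an illustrative sketch with tiny linear networks on random data, not the notebook's actual training loop:

```python
import torch
import torch.nn as nn

# Tiny stand-in networks so the loop runs in seconds (hypothetical sizes)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
D = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

d_losses, g_losses = [], []
for step in range(50):
    real = torch.randn(32, 4) + 2.0   # stand-in for real snare features
    fake = G(torch.randn(32, 8))

    # Discriminator step: real -> 1, fake -> 0 (detach so G isn't updated)
    opt_d.zero_grad()
    loss_d = (bce(D(real), torch.ones(32, 1))
              + bce(D(fake.detach()), torch.zeros(32, 1)))
    loss_d.backward()
    opt_d.step()

    # Generator step: make D output 1 on fakes
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(32, 1))
    loss_g.backward()
    opt_g.step()

    # Track both losses for plotting after training
    d_losses.append(loss_d.item())
    g_losses.append(loss_g.item())
```

Plotting `d_losses` and `g_losses` against `step` gives a curve like the one described: a discriminator loss that falls while it wins, and climbs back up once the generator catches up.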
Click on 'SnareGAN.ipynb' to view the notebook.