Generator useful to handle Raman spectra data augmentation for deep learning models

Lily-learn/RamanDataGenerator

 
 

Raman Data Generator

License: MIT

This project aims to offer a fast ⚡ and reliable data augmentation generator for Raman spectra.

Arguments

Basic

| Param | Type | Description |
|---|---|---|
| df | pandas.DataFrame | A pandas DataFrame with the Raman shift values as columns, plus a column called "labels" for the categories |
| batch_size | int | Number of samples per batch |
| max_classes | int | Number of categories in the labels |
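
As a rough illustration of the expected input layout, the sketch below builds a toy DataFrame with one column per shift value and a "labels" column. The shift axis, the random intensities, and the integer label encoding are assumptions made for this example; the README does not state which label type the generator expects.

```python
import numpy as np
import pandas as pd

# Hypothetical Raman shift axis: 400 to 1790 in steps of 10 (illustrative values).
shifts = np.arange(400, 1800, 10)

rng = np.random.default_rng(0)
n_samples = 6

# Random intensities stand in for measured spectra.
intensities = rng.random((n_samples, shifts.size))

# One column per shift value, plus the "labels" column for the categories.
df = pd.DataFrame(intensities, columns=shifts)
df["labels"] = [0, 0, 1, 1, 2, 2]  # integer labels are an assumption

max_classes = df["labels"].nunique()  # number of categories in this toy frame
batch_size = 2                        # illustrative value
```

Each class has at least two samples here, since the augmentation pairs every sample with another one of the same class.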

Advanced

The default parameters were validated on a Raman task; if you need more customization, you can still tweak them!

The augmentation process works as follows. For each $sample_i$ of the current batch, it randomly picks another sample of the same class, $sample_j$, and performs:

  1. a roll: $sample_j$ is shifted horizontally by roll_factor columns (Raman shift values). The term "roll" is used because a horizontal shift is easily confused with the Raman shift.

  2. a weighted sum of $sample_i$ and the rolled $sample_j$, weighted by a probability variable $a$:

$$ sample_k = a \cdot sample_i + (1-a) \cdot sample_j $$

This augmentation step assumes that two samples of the same class are semantically equal, differing only by natural class variability and some sensor noise.

  3. a slope is applied to $sample_k$, controlled by slope_factor: a linear baseline error that emulates the fluorescence issue of some sensors.
  4. additive white Gaussian noise is applied to $sample_k$.
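
The four steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `augment_pair` is a hypothetical helper, the default values are placeholders, the roll is applied by exactly `roll_factor` columns, the slope is interpreted as an intensity increase per column, and the noise factor is used as the standard deviation of the Gaussian noise.

```python
import numpy as np


def augment_pair(sample_i, sample_j, rng, roll_factor=5,
                 slope_factor=0.001, noise_range=(0.0, 0.05)):
    """Hypothetical sketch of the four augmentation steps for one pair."""
    # 1. Roll sample_j along the shift axis by roll_factor columns.
    #    np.roll wraps around: values pushed past the end reappear at the start.
    rolled_j = np.roll(sample_j, roll_factor)

    # 2. Weighted sum: sample_k = a * sample_i + (1 - a) * sample_j,
    #    with a drawn uniformly from [0, 1) (assumption).
    a = rng.random()
    sample_k = a * sample_i + (1.0 - a) * rolled_j

    # 3. Linear baseline: slope_factor is read here as the intensity
    #    increase per column (assumption).
    sample_k = sample_k + slope_factor * np.arange(sample_k.size)

    # 4. Additive white Gaussian noise; the noise factor sampled from
    #    noise_range is used as the standard deviation (assumption).
    sigma = rng.uniform(noise_range[0], noise_range[1])
    sample_k = sample_k + rng.normal(0.0, sigma, size=sample_k.size)
    return sample_k


rng = np.random.default_rng(0)
spectrum_a = rng.random(140)  # toy spectra standing in for two
spectrum_b = rng.random(140)  # samples of the same class
augmented = augment_pair(spectrum_a, spectrum_b, rng)
```

Setting `slope_factor=0.0` and `noise_range=(0.0, 0.0)` reduces the sketch to the roll and weighted-sum steps only, which mirrors disabling the slope and noise steps.
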

| Param | Type | Description |
|---|---|---|
| roll | bool | Enable/disable the roll step during the augmentation |
| roll_factor | int | The signal is rolled (shifted horizontally) by this number of columns along the DataFrame columns. For example, if the signal has a resolution of 10 Raman shift units per column, a roll_factor of 5 actually moves it by 10*5 = 50 shift units |
| slope | bool | Enable/disable the slope step during the augmentation |
| slope_factor | float | The slope angle of the linear baseline error |
| noise | bool | Enable/disable the noise step during the augmentation |
| noise_range | tuple | The noise factor is sampled from this range, e.g. (min, max) |
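
To make the roll_factor arithmetic concrete, the sketch below rolls a single-peak toy spectrum on a hypothetical axis with 10 shift units per column. The axis values and the peak position are illustrative assumptions; only `np.roll` itself is a real NumPy call.

```python
import numpy as np

# Hypothetical shift axis with a resolution of 10 shift units per column.
shifts = np.arange(400, 1800, 10)
step = shifts[1] - shifts[0]

roll_factor = 5
displacement = roll_factor * step  # 5 columns * 10 units per column

# Toy spectrum with a single peak at the column for shift 600.
spectrum = np.zeros(shifts.size)
spectrum[20] = 1.0

# Rolling by roll_factor columns moves the peak 5 columns to the right.
rolled = np.roll(spectrum, roll_factor)

peak_before = shifts[np.argmax(spectrum)]
peak_after = shifts[np.argmax(rolled)]
print(peak_before, peak_after, displacement)
```

Note that `np.roll` wraps around, so intensities near the end of the axis reappear at the beginning; whether the generator handles the edges differently is not stated in this README.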

Requirements

The Python libraries needed are:

random
dataclasses
pandas
numpy
tensorflow

The code is documented if you want more insight 😉!

Contributors are welcome 👍

Languages

  • Python 100.0%