This project aims to offer a fast ⚡ and reliable data augmentation generator for Raman spectra.
Param | Type | Description |
---|---|---|
df | pandas.DataFrame | A pandas DataFrame with the Raman shift values as columns, plus a column called "labels" holding the categories |
batch_size | int | Number of samples per batch |
max_classes | int | Number of categories in the labels |
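To make the expected input concrete, the sketch below builds a toy DataFrame in that format (Raman shift values as column names plus a `labels` column) and draws one batch whose labels are one-hot encoded over `max_classes`. The shift grid, the random data, the helpers `make_toy_df` and `sample_batch`, and the one-hot encoding are hypothetical illustrations, not the project's API.

```python
import numpy as np
import pandas as pd


def make_toy_df(n_samples: int = 20, n_classes: int = 3, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: random spectra on a grid with 10 Raman shift units per column."""
    rng = np.random.default_rng(seed)
    shifts = np.arange(400, 1800, 10)  # column names are the Raman shift values
    spectra = rng.random((n_samples, shifts.size))
    df = pd.DataFrame(spectra, columns=shifts)
    df["labels"] = rng.integers(0, n_classes, size=n_samples)  # integer class ids
    return df


def sample_batch(df: pd.DataFrame, batch_size: int, max_classes: int, rng: np.random.Generator):
    """Hypothetical helper: draw a random batch and one-hot encode its labels."""
    idx = rng.choice(len(df), size=batch_size, replace=False)
    batch = df.iloc[idx]
    x = batch.drop(columns="labels").to_numpy(dtype=np.float32)  # spectra only
    y = np.eye(max_classes, dtype=np.float32)[batch["labels"].to_numpy()]  # one-hot labels
    return x, y


df = make_toy_df()
x, y = sample_batch(df, batch_size=8, max_classes=3, rng=np.random.default_rng(1))
print(x.shape, y.shape)  # (8, 140) (8, 3)
```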
The default parameters were validated on a Raman task; however, if you need greater customization, you can still tweak them!
The augmentation process works as follows. For each sample:

- roll (i.e. shift horizontally; I use the term "roll" because "horizontal shift" is easily confused with the Raman shift) $sample_j$ by some roll_factor (in Raman shift values);
- compute a weighted sum with respect to some probability variable $a$. This augmentation step is based on the assumption that two samples of the same class are semantically equal (natural class variability) plus some sensor noise;
- on $sample_k$, apply a slope of some slope_factor, i.e. a linear baseline error that emulates the fluorescence issue of some sensors;
- on $sample_k$, apply additive white Gaussian noise to the signal.
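The steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the partner spectrum is rolled by exactly `roll_factor` columns with `np.roll` (values wrap around the edges), the weighted sum is assumed to be $a \cdot anchor + (1 - a) \cdot rolled$ with $a$ drawn uniformly from $[0, 1)$, the slope is assumed to add `slope_factor` per column, and the noise factor is assumed to be the standard deviation of the Gaussian noise. The function `augment` and its argument names are hypothetical.

```python
import numpy as np


def augment(
    anchor: np.ndarray,
    partner: np.ndarray,
    rng: np.random.Generator,
    roll_factor: int = 5,
    slope_factor: float = 0.001,
    noise_range: tuple = (0.0, 0.01),
) -> np.ndarray:
    """Hypothetical sketch: roll -> weighted sum -> slope -> noise.

    `anchor` and `partner` are 1-D spectra of the same class on the same shift grid.
    """
    # 1) Roll the partner spectrum horizontally by roll_factor columns (wraps around).
    rolled = np.roll(partner, roll_factor)

    # 2) Weighted sum of the two same-class spectra, weight a ~ U[0, 1) (assumption).
    a = rng.random()
    mixed = a * anchor + (1.0 - a) * rolled

    # 3) Linear baseline emulating fluorescence: slope_factor added per column (assumption).
    baseline = slope_factor * np.arange(mixed.size)

    # 4) Additive white Gaussian noise; its std is sampled from noise_range (assumption).
    noise_std = rng.uniform(*noise_range)
    noise = rng.normal(0.0, noise_std, size=mixed.size)

    return mixed + baseline + noise
```

Each call draws fresh values of $a$ and of the noise level, so repeated calls on the same pair of spectra yield different augmented samples.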
Param | Type | Description |
---|---|---|
roll | bool | Enable/disable the roll step during the augmentation |
roll_factor | int | Number of columns the signal is rolled (shifted horizontally) by; the roll acts along the DataFrame columns. For example, if the signal has a resolution of 10 Raman shift units (i.e. consecutive columns are 10 units apart), a roll factor of 5 actually shifts it by 10*5 = 50 units |
slope | bool | Enable/disable the slope step during the augmentation |
slope_factor | float | Slope angle of the linear baseline error |
noise | bool | Enable/disable the noise step during the augmentation |
noise_range | tuple | Range (min, max) from which the noise factor is sampled |
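Since `dataclasses` is among the dependencies, one way to group these parameters is a small dataclass. The sketch below is hypothetical: the class name `AugmentationConfig`, its default values (not the project's validated defaults) and the `effective_shift` helper are illustrations only. The helper reproduces the roll_factor arithmetic from the table, where 5 columns at 10 units per column give 50 Raman shift units.

```python
from dataclasses import dataclass


@dataclass
class AugmentationConfig:
    """Hypothetical container for the augmentation parameters listed in the table."""

    roll: bool = True
    roll_factor: int = 5
    slope: bool = True
    slope_factor: float = 0.001
    noise: bool = True
    noise_range: tuple = (0.0, 0.01)

    def effective_shift(self, resolution: float) -> float:
        """Raman shift units covered by one roll: columns rolled times spacing between columns."""
        return self.roll_factor * resolution if self.roll else 0.0


cfg = AugmentationConfig(roll_factor=5)
print(cfg.effective_shift(resolution=10))  # 50
```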
The Python libraries needed are:

- random
- dataclasses
- pandas
- numpy
- tensorflow
The code is documented, so check it out for more detailed information 😉!
Contributions are welcome 👍