implementing a rolling mean function #214
Replies: 9 comments
-
Hey @ahmadtourei, having rolling window functions in DASCore is a great idea! There are a few problems with using pandas, though; I think an approach based on bottleneck's moving functions is probably more optimal. If I recall correctly, pandas and xarray use bottleneck under the hood anyway. Bottleneck handles numpy arrays directly and allows arbitrary dimensions (a short sketch is at the end of this comment). I don't know yet what the interface should be, though. Two possibilities come to mind:

```python
import dascore as dc
from dascore.units import s

patch = dc.get_example_patch()

# a rolling intermediate object
rolling_mean = patch.rolling(time=10*s).mean()

# or just separate methods
rolling_mean = patch.rolling_mean(time=10*s)
```

Which do you like better? Is there another option we should consider? Another reference is xarray's rolling method.
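For illustration, a minimal sketch of calling bottleneck directly on a multi-dimensional numpy array; the array shape and window length here are arbitrary assumptions:

```python
import bottleneck
import numpy as np

data = np.random.rand(1_000, 50)  # e.g. (time, distance)

# moving mean over 100 samples along axis 0; the first window - 1 outputs
# along that axis are NaN because the window is not yet full
smoothed = bottleneck.move_mean(data, window=100, axis=0)
```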
-
@d-chambers I agree that it'd be nice to use a function that works directly with numpy arrays; bottleneck is a great one. Another possibility is using the convolve function:

```python
import dascore as dc
import numpy as np


def rolling_mean(arr, window_size):
    # Define the kernel for the running mean average
    kernel = np.ones(window_size) / window_size
    # Apply the running mean average using numpy.convolve()
    result = np.convolve(arr, kernel, mode='valid')
    return result


patch = dc.get_example_patch()

# the values below assume a real DAS patch (1 kHz sampling, >3000 channels),
# not the small example patch
d_t = 10.0
sampling_rate = 1000
window_size = int(d_t * sampling_rate)  # kernel length must be an integer
channel = 3000
result_rm = rolling_mean(patch.data[:, channel], window_size)
```

However, this is not optimal for multichannel datasets like DAS because numpy's convolve only operates on 1-D arrays, so it has to be applied channel by channel. I like the …
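To make the multichannel point concrete, a sketch of mapping the rolling_mean function above over channels with np.apply_along_axis (synthetic array sizes assumed); the Python-level loop over channels is the slow part:

```python
import numpy as np

data = np.random.rand(60_000, 100)  # synthetic (time, channel) array
window = 1_000

# np.apply_along_axis calls rolling_mean once per channel in a Python loop
out = np.apply_along_axis(rolling_mean, 0, data, window)
print(out.shape)  # (59001, 100) with mode='valid'
```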
-
How about implementing this?

```python
import numpy as np


def calculate_mean_over_samples(data, window_size=1000, step_size=1000, axis=0):
    """Compute windowed means along the first axis with the given step."""
    total_samples = data.shape[0]
    mean_values = np.empty((int(total_samples / step_size), data.shape[1]))
    for j, k in enumerate(range(0, total_samples, step_size)):
        mean_values[j, :] = np.mean(data[k:k + window_size, :], axis=axis)
    return mean_values


# sub_sp, dt, sampling_rate, and file_name are defined elsewhere in the script
for i, patch in enumerate(sub_sp):
    window_size = int(dt * sampling_rate)
    step_size = window_size
    rolling_mean = calculate_mean_over_samples(
        patch.data, window_size=window_size, step_size=step_size
    )
    new_attrs = dict(patch.attrs)
    samples = np.array(patch.coords["time"])[::step_size]
    new_coords = {x: patch.coords[x] for x in patch.dims}
    new_coords["time"] = samples
    new_attrs["d_time"] = dt
    new_attrs["min_time"] = np.min(samples)
    new_attrs["max_time"] = np.max(samples)
    new_patch = patch.new(
        data=rolling_mean, attrs=new_attrs, dims=patch.dims, coords=new_coords
    )
    # save the result in the output folder
    new_patch.io.write(file_name, "dasdae")
```

(A loop-free variant of this block averaging is sketched at the end of this comment.) If you agree, please provide some instructions so I can start implementing this in the master branch. Is the interpolate function a good example to follow (but calling my function instead)?
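Since the loop above uses a step_size equal to window_size, the same block averages could also be computed without a Python loop over windows; a minimal sketch, equivalent only up to how a partial final window is handled:

```python
import numpy as np


def block_mean(data, window_size):
    """Mean over non-overlapping windows along the first axis."""
    n_blocks = data.shape[0] // window_size
    trimmed = data[: n_blocks * window_size]
    return trimmed.reshape(n_blocks, window_size, -1).mean(axis=1)


data = np.random.rand(10_000, 100)
print(block_mean(data, 1_000).shape)  # (10, 100)
```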
-
Here are some timing results (in sec.) comparing the approaches discussed above. Please note that all timing results are for processing 1 hour of data (3,600,000 time samples) and 1,000 channels. If it helps, I can also provide timing results comparing the bottleneck-based approach.
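For reference, a rough sketch of how such a comparison can be timed; the array below is assumed much smaller than the 1-hour, 1,000-channel case so it runs quickly:

```python
import time

import bottleneck
import numpy as np
import pandas as pd

data = np.random.rand(100_000, 100)  # far smaller than 1 hour x 1000 channels
window = 10_000                      # 10 s at a 1 kHz sampling rate

t0 = time.perf_counter()
bn_out = bottleneck.move_mean(data, window=window, axis=0)
print(f"bottleneck.move_mean: {time.perf_counter() - t0:.3f} s")

t0 = time.perf_counter()
pd_out = pd.DataFrame(data).rolling(window).mean().to_numpy()
print(f"pandas df.rolling:    {time.perf_counter() - t0:.3f} s")
```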
-
Looks promising. Yes, I definitely agree we need rolling functions. Unfortunately, we can't really use df.rolling because we want to support more than just 2-D patches, but it is a good reference.

That would be great. I am curious about manually striding the data after move_mean to get the right step. For example:

```python
import bottleneck


def move_mean(array, window, step=1, axis=-1):
    """Moving mean function with a step, using bottleneck."""
    out = bottleneck.move_mean(array, window, axis=axis)
    # keep every step-th sample along the same axis the mean was applied to
    slicer = [slice(None)] * out.ndim
    slicer[axis] = slice(None, None, step)
    return out[tuple(slicer)]
```

It might seem a bit wasteful, but it might be faster (maybe?).

Ok, so let's come up with a plan forward. Here is what I am thinking: I am leaning towards copying a subset of pandas' rolling interface. I also love what pandas has done by allowing (optional) numba acceleration. That way users can efficiently apply their own jit'ed rolling window functions (a short sketch of that pandas pattern is at the end of this comment).

To start, let's create a new branch called "rolling". In it, we can create a new module in dascore/proc called rolling.py. It will also have a test file (tests/test_proc/test_rolling.py). The first thing to make is the rolling class. Something like this:

```python
from dascore.utils.patch import get_dim_value_from_kwargs


class PatchRoller:
    """A class to apply rolling operations to patches."""

    def __init__(self, patch, *, step=None, center=False, **kwargs):
        ...

    def mean(self):
        # your function here
        ...


def rolling(patch, *, step=None, center=False, **kwargs):
    """
    nice docs
    """
    dim, axis, value = get_dim_value_from_kwargs(patch, kwargs)
    # probably other things I am missing
    return PatchRoller(patch, step=step, center=center, **kwargs)
```

We can also work on other useful rolling functions such as median, max, min, etc. Some of the more complex bits will be supporting units for the kwargs (which specify the dimension name along which to apply the rolling operation) and step, as well as implementing …

Are you going to be back on campus next week? If we can set aside a few hours to work on this together (in person), it might be more effective, but if you want to get a start on it, feel free!
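For reference, a minimal sketch of the pandas pattern mentioned above, where a user-supplied function is jit-compiled by numba and applied over rolling windows (assumes numba is installed; this is only an illustration, not the proposed DASCore API):

```python
import numpy as np
import pandas as pd


def peak_to_peak(window_values):
    """A user-defined reducer applied to each rolling window."""
    return window_values.max() - window_values.min()


df = pd.DataFrame(np.random.rand(100_000, 4))
# raw=True passes plain numpy arrays to the function; engine="numba" jit-compiles it
result = df.rolling(window=100).apply(peak_to_peak, raw=True, engine="numba")
```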
-
Thanks for your thoughts!
The problem would be that it calculates the windowed mean at every sample (not fast) and only then resamples based on the step.

I agree! Yes, I will be back. My calendar is updated, so feel free to schedule a meeting whenever works for you.
-
That's true, but bottleneck is not performing the same numpy operations as …
-
I'm providing some timing results (in sec.) comparing bottleneck with the calculate_mean_over_samples function and pandas' df.rolling function.

It doesn't look very efficient, even for small windows and steps.
-
Closed by #238.
-
Based on my preliminary investigations, pandas' rolling function provides decent results compared to true low-pass filtering. Therefore, I'm thinking of implementing a rolling mean function in DASCore. Basically, it would call a pandas rolling function to process a patch of data:
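A minimal sketch of that idea, assuming the patch's time dimension is the first axis and an arbitrary window length:

```python
import dascore as dc
import pandas as pd

patch = dc.get_example_patch()
window_size = 10  # window length in samples, arbitrary for illustration

# rolling is applied down the rows, assumed here to be the time axis
df = pd.DataFrame(patch.data)
rolling_mean = df.rolling(window=window_size).mean().to_numpy()
```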
@d-chambers Would you mind adding instructions on how we can implement this? That way I'd also be able to document how to add a function like this to DASCore (like what we already have for IO).