Support for Modin and PyMars #461
Hey! It looks like modin DataFrames are not a subclass of the pandas DataFrame, so siuba verbs don't dispatch to their pandas implementations for them. Explicitly registering a verb like mutate for the modin class does allow it to dispatch correctly:

```python
import modin.pandas as pd
import pandas as pd2
from siuba import _, mutate

df = pd.DataFrame({'x': [1, 2, 3]})

# Reuse siuba's pandas implementation of mutate for the modin DataFrame class
mutate.register(df.__class__, mutate.dispatch(pd2.DataFrame))

mutate(df, res = _.x + 1)
```

It seems like there are two challenges with implementing this:
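The registration trick above can be reproduced without modin installed. This is a minimal sketch assuming siuba verbs behave like `functools.singledispatch` functions (the `register`/`dispatch` calls above suggest they do); `PandasLikeFrame`, `ModinLikeFrame`, and this toy `mutate` are hypothetical stand-ins, not siuba's or modin's actual classes. It shows why a class that is not a subclass falls through to the default implementation, and how registering the pandas implementation for that class fixes dispatch.

```python
from functools import singledispatch


class PandasLikeFrame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, data):
        self.data = data


class ModinLikeFrame:
    """Hypothetical stand-in for a modin DataFrame: same interface, NOT a subclass."""

    def __init__(self, data):
        self.data = data


@singledispatch
def mutate(df, **kwargs):
    # Fallback used when no implementation is registered for type(df).
    raise TypeError(f"mutate is not implemented for {type(df).__name__}")


@mutate.register(PandasLikeFrame)
def _mutate_pandas(df, **kwargs):
    # Each keyword maps a new column name to a function of the existing columns.
    new = dict(df.data)
    for name, fn in kwargs.items():
        new[name] = fn(df.data)
    return type(df)(new)


frame = ModinLikeFrame({"x": [1, 2, 3]})

# Dispatch walks the class hierarchy, so the non-subclass hits the fallback.
try:
    mutate(frame, res=lambda d: [v + 1 for v in d["x"]])
except TypeError as err:
    print(err)  # mutate is not implemented for ModinLikeFrame

# Explicitly reuse the pandas implementation for the other class.
mutate.register(ModinLikeFrame, mutate.dispatch(PandasLikeFrame))

out = mutate(frame, res=lambda d: [v + 1 for v in d["x"]])
print(out.data["res"])  # [2, 3, 4]
```

The catch is that this has to be repeated for every verb and every new frame class.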
(Maybe a last, future piece: siuba has a system to speed up its pandas grouped operations, and it also relies on pandas types :/. It would be quick to adjust, but it would again likely require more abstract base classes, unless there's a way to connect a modin DataFrame back to pandas that I'm missing 😓)
Thanks for looking into it in depth. I can't find it right now, but I think there is a way to convert from modin to pandas. That said, such a conversion may work if we do it after the aggregations have produced a dataframe that fits into memory; if we do it very early in the pipeline, it may error out when the original modin dataframe is very big. If you have time, what about PyMars? I think that may fall more in the
Just curious about this idea. I ask because I started writing a library with Modin as a backend and felt I was merely duplicating a lot of ideas that you have executed so beautifully. Siuba is one of the finest library designs I have come across.
Hi,
I was wondering if siuba could support Modin (https://modin.readthedocs.io/en/stable/index.html#modin-is-a-dataframe-for-datasets-from-1mb-to-1tb) and PyMars (https://docs.pymars.org/en/latest/). Both are touted as API replacements for pandas.
Caveats
`execute` needs to be run at the end (https://docs.pymars.org/en/latest/#mars-dataframe).

I'm interested in the Ray ML platform (both Modin and PyMars are dataframe APIs over the distributed Ray platform), so if you are interested, it would be great to make this work for
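The `execute` caveat means a verb pipeline over PyMars would build up a lazy computation that only runs when `execute` is called at the end. The sketch below illustrates that deferred-execution pattern with a hypothetical `LazyFrame` stand-in and a hypothetical `run_pipeline` helper; it does not use the Mars API, and the real library's semantics may differ.

```python
class LazyFrame:
    """Hypothetical stand-in for a lazily evaluated frame.

    Operations are recorded; nothing is computed until execute() is called.
    """

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def assign(self, **kwargs):
        # Record the step instead of computing it; return a new LazyFrame.
        return LazyFrame(self._data, self._steps + (kwargs,))

    def execute(self):
        # Materialize: run every recorded step, in order, on a copy of the data.
        result = {k: list(v) for k, v in self._data.items()}
        for step in self._steps:
            for name, fn in step.items():
                result[name] = fn(result)
        return result


def run_pipeline(frame, *verbs):
    # Apply verbs lazily, then trigger computation once, at the very end.
    for verb in verbs:
        frame = verb(frame)
    return frame.execute() if hasattr(frame, "execute") else frame


lazy = LazyFrame({"x": [1, 2, 3]})
result = run_pipeline(
    lazy,
    lambda f: f.assign(y=lambda d: [v * 10 for v in d["x"]]),
    lambda f: f.assign(z=lambda d: [a + b for a, b in zip(d["x"], d["y"])]),
)
print(result)  # {'x': [1, 2, 3], 'y': [10, 20, 30], 'z': [11, 22, 33]}
```

Keeping the final `execute` in one place means the verbs themselves never need to know whether the backend is eager or lazy.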