Fix code to work with Fugue 0.8.7 #245

Merged
merged 5 commits into from
Nov 14, 2023

Conversation

goodwanghan
Contributor

No description provided.

@@ -577,8 +578,28 @@ def _deserialize(
    arr = [pickle.loads(r["data"]) for r in df if r["left"] == left]
    if len(arr) > 0:
        return pd.concat(arr).sort_values(schema.names).reset_index(drop=True)
    return pd.DataFrame(
        {k: pd.Series(dtype=v) for k, v in schema.pandas_dtype.items()}
    # The following is how to construct an empty pandas dataframe with
Contributor Author

This change keeps the code compatible with Fugue 0.8.6.
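For readers following the hunk above, here is a minimal sketch of the dict-of-empty-Series pattern it shows for building an empty pandas DataFrame with typed columns. The helper name `empty_frame` and the sample columns are hypothetical, and the sketch takes a plain dict of dtypes rather than the schema object used in datacompy.

```python
import pandas as pd


def empty_frame(dtypes: dict) -> pd.DataFrame:
    """Build a zero-row DataFrame whose columns carry the given dtypes.

    Hypothetical helper: `dtypes` maps column name -> pandas/numpy dtype.
    """
    # An empty Series with an explicit dtype keeps that dtype in the frame,
    # so pandas has nothing to infer from (there is no data to infer from).
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in dtypes.items()}
    )


df = empty_frame({"id": "int64", "score": "float64"})
print(df.dtypes)
```

This mirrors only the lines visible in the hunk; the thread does not show the rest of the change, so it should not be read as the code the PR ends up with.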

@@ -541,6 +541,7 @@ def _distributed_compare(

def _serialize(dfs: Iterable[pd.DataFrame], left: bool) -> Iterable[Dict[str, Any]]:
    for df in dfs:
        df = df.convert_dtypes()
Contributor Author

This change makes the code compatible with Fugue 0.8.7+.
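As a rough illustration of what the `convert_dtypes()` call in the hunk above does in general, the sketch below converts a small frame to pandas' nullable extension dtypes. The sample data is made up, and the thread does not state why the conversion is needed for Fugue 0.8.7+, so no motivation is implied here.

```python
import numpy as np
import pandas as pd

# A float column that only holds whole numbers plus a missing value,
# which is how pandas stores "integers with nulls" by default.
raw = pd.DataFrame({"count": pd.Series([10, np.nan, 20], dtype="float64")})

# convert_dtypes() picks the best dtype that supports pd.NA, so the
# whole-number float column becomes the nullable integer dtype Int64.
converted = raw.convert_dtypes()

print(raw.dtypes["count"], "->", converted.dtypes["count"])
```

The missing value survives the conversion as `pd.NA` instead of `NaN`.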

@goodwanghan
Contributor Author

@fdosani this change will make datacompy work with both the latest Fugue and earlier versions.

@fdosani
Member

fdosani commented Nov 13, 2023

Thank you! Taking a look through the PR now.

@fdosani
Member

fdosani commented Nov 13, 2023

@goodwanghan I ran into some issues when testing with "fugue==0.8.6" explicitly.
When I run pytest tests/test_fugue.py:

============================================================ short test summary info =============================================================
FAILED tests/test_fugue.py::test_is_match_native - AttributeError: 'PandasUtils' object has no attribute 'enforce_type'
FAILED tests/test_fugue.py::test_is_match_spark - pyspark.errors.exceptions.captured.PythonException:
FAILED tests/test_fugue.py::test_is_match_polars - AttributeError: 'PandasUtils' object has no attribute 'enforce_type'
FAILED tests/test_fugue.py::test_is_match_duckdb - AttributeError: 'PandasUtils' object has no attribute 'enforce_type'
FAILED tests/test_fugue.py::test_doc_case - AttributeError: 'PandasUtils' object has no attribute 'enforce_type'
FAILED tests/test_fugue.py::test_report_pandas - AttributeError: 'PandasUtils' object has no attribute 'enforce_type'
FAILED tests/test_fugue.py::test_report_spark - pyspark.errors.exceptions.captured.PythonException:
================================================== 7 failed, 12 passed, 127 warnings in 32.60s ===================================================

Specifically, it looks like an incompatibility in the pandas DataFrame conversion:

   def as_pandas(self) -> pd.DataFrame:
        """Convert to pandas DataFrame"""
        pdf = pd.DataFrame(self.as_array(), columns=self.columns)
        if len(pdf) == 0:  # TODO: move to triad
            return pd.DataFrame(
                {
                    k: pd.Series(dtype=v.type.to_pandas_dtype())
                    for k, v in self.schema.items()
                }
            )
>       return PD_UTILS.enforce_type(pdf, self.schema.pa_schema, null_safe=True)
E       AttributeError: 'PandasUtils' object has no attribute 'enforce_type'

../../../miniconda3/envs/datacompy/lib/python3.10/site-packages/fugue/dataframe/dataframe.py:123: AttributeError

Fugue 0.8.7 works perfectly fine, since the GitHub Actions tests all pass.
Could this be a mismatch with Triad, since PandasUtils comes from there?

EDIT: Confirmed. If I downgrade to triad==0.9.1, the tests pass. Maybe we can just pin fugue at 0.8.7:

dependencies = [
    "pandas<=2.0.2,>=0.25.0",
    "numpy<=1.26.0,>=1.22.0",
    "ordered-set<=4.1.0,>=4.0.2",
    "fugue<=0.8.7,>=0.8.7",
]

@goodwanghan
Contributor Author

goodwanghan commented Nov 14, 2023

Yes, because Fugue 0.8.6 didn't cap the Triad version, and triad 0.9.3 no longer works with Fugue 0.8.6 and earlier.

So if people pinned their versions, they should have Fugue 0.8.6 with triad 0.9.1. If they leave the versions flexible, both will update to the latest releases, so there shouldn't be a problem. Compatibility issues arise only when you manually set triad to 0.9.3 and Fugue to 0.8.6 or earlier.
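The combination described above can be written down as a small check. This is only a sketch: the function names are hypothetical, version strings are assumed to be plain dotted integers, and treating every triad release from 0.9.3 onward as affected is an assumption (the thread names only 0.9.3).

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.8.6' into (0, 8, 6)."""
    return tuple(int(part) for part in text.split("."))


def is_known_bad_pair(fugue_version: str, triad_version: str) -> bool:
    """Flag the pairing reported in this thread.

    Assumption: Fugue <= 0.8.6 combined with triad >= 0.9.3 is affected.
    """
    old_fugue = parse_version(fugue_version) <= (0, 8, 6)
    new_triad = parse_version(triad_version) >= (0, 9, 3)
    return old_fugue and new_triad


print(is_known_bad_pair("0.8.6", "0.9.3"))  # the failing setup from the test run
print(is_known_bad_pair("0.8.6", "0.9.1"))  # the downgrade that made tests pass
```

In a real environment the two version strings could be read with `importlib.metadata.version("fugue")` and `importlib.metadata.version("triad")`.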

@goodwanghan
Contributor Author

And yes, we can just update the fugue requirement to 0.8.7.

Member

@fdosani fdosani left a comment

Just one change (made a suggestion). For context, I set it to "fugue<=0.8.7,>=0.8.7" because we use edgetest, which runs once a week and bumps the pinned versions as newer releases become available and pass. This is done automatically via GitHub Actions.
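To make the effect of that constraint concrete, here is a simplified sketch of how a comma-separated range such as "<=0.8.7,>=0.8.7" admits exactly one version. The names `release_key` and `satisfies` are hypothetical, the package name is left out of the spec string, and only plain dotted numeric versions are handled; real installers follow the full version-specifier rules rather than this shortcut.

```python
import operator
import re
from typing import Tuple

_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}
_CLAUSE = re.compile(r"(<=|>=|==|<|>)\s*(\d+(?:\.\d+)*)")


def release_key(version: str) -> Tuple[int, ...]:
    """Dotted version -> tuple of ints, ignoring trailing zeros (1.0 == 1.0.0)."""
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str, spec: str) -> bool:
    """True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        match = _CLAUSE.fullmatch(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](release_key(version), release_key(bound)):
            return False
    return True


spec = "<=0.8.7,>=0.8.7"
for candidate in ("0.8.6", "0.8.7", "0.8.8"):
    print(candidate, satisfies(candidate, spec))
```

With both bounds equal, the range behaves like an exact pin, and a tool that later raises only the upper bound widens it without touching the lower one.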

Co-authored-by: Faisal <[email protected]>
@goodwanghan
Contributor Author

@fdosani I think this is good to merge.

@fdosani fdosani self-requested a review November 14, 2023 18:14
@fdosani fdosani merged commit d2cbb41 into capitalone:develop Nov 14, 2023
18 checks passed
@fdosani
Member

fdosani commented Nov 14, 2023

@goodwanghan thank you again for your help!

@goodwanghan
Contributor Author

Thank you so much, @fdosani, and I apologize for the inconvenience.

@fdosani fdosani mentioned this pull request Nov 15, 2023
rhaffar pushed a commit to rhaffar/datacompy that referenced this pull request Sep 12, 2024
* Fix code to work with Fugue 0.8.7

* update

* update

* update

* Update pyproject.toml

Co-authored-by: Faisal <[email protected]>

---------

Co-authored-by: Faisal <[email protected]>