ENH: Improve error message when specifying dtype="float32[pyarrow]" while PyArrow is not installed #57928

Closed
1 of 3 tasks
wanglc02 opened this issue Mar 20, 2024 · 2 comments · Fixed by #60413
Labels
Arrow (pyarrow functionality), Dependencies (Required and optional dependencies), Enhancement, Error Reporting (Incorrect or improved errors from pandas)

Comments

@wanglc02
Contributor

wanglc02 commented Mar 20, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

If PyArrow is not installed properly, running this snippet from the User Guide:
ser = pd.Series([-1.5, 0.2, None], dtype="float32[pyarrow]")
results in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.12/site-packages/pandas/core/series.py", line 493, in __init__
    dtype = self._validate_dtype(dtype)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/generic.py", line 515, in _validate_dtype
    dtype = pandas_dtype(dtype)
            ^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/dtypes/common.py", line 1624, in pandas_dtype
    result = registry.find(dtype)
             ^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/dtypes/base.py", line 576, in find
    return dtype_type.construct_from_string(dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/dtypes/dtypes.py", line 2251, in construct_from_string
    pa_dtype = pa.type_for_alias(base_type)
               ^^
NameError: name 'pa' is not defined

which is not very informative.
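
The failure mode can be reproduced without pandas. This is a minimal sketch, assuming the module guards its optional import so that the name `pa` is bound only when the import succeeds. The module name `pyarrow_that_is_missing` and the function `construct_from_string` are stand-ins for illustration, not the actual pandas source:

```python
# Stand-in for a guarded optional import: `pa` is bound only on success.
try:
    import pyarrow_that_is_missing as pa  # hypothetical, deliberately absent module
except ImportError:
    pass  # the name `pa` stays undefined


def construct_from_string(alias: str):
    # Hypothetical stand-in for code that assumes `pa` was imported.
    return pa.type_for_alias(alias)


try:
    construct_from_string("float32")
except NameError as err:
    message = str(err)
    print(message)  # name 'pa' is not defined
```

The user sees a NameError about an internal alias instead of a hint that the optional dependency is missing.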

Feature Description

We can improve the error message by telling the user that something is wrong with the PyArrow installation. This matters most when the user believes they have installed PyArrow but actually installed it in the wrong location or installed an outdated version.

Alternative Solutions

Catch the NameError and raise an ImportError from it that describes what happened.
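
A minimal sketch of that alternative, assuming `pa` is undefined because the optional import failed. The wrapper name `type_for_alias_or_import_error` and the message wording are hypothetical, not what pandas ships:

```python
def type_for_alias_or_import_error(alias: str):
    """Hypothetical wrapper: translate the NameError into an ImportError."""
    try:
        # `pa` is intentionally undefined here, mimicking a failed optional import.
        return pa.type_for_alias(alias)  # noqa: F821
    except NameError as err:
        # `from err` keeps the original NameError available as __cause__.
        raise ImportError(
            f"pyarrow is required for the dtype '{alias}[pyarrow]' but could not "
            "be imported. Check that it is installed in the active environment."
        ) from err


try:
    type_for_alias_or_import_error("float32")
except ImportError as err:
    caught = err
    print(type(caught.__cause__).__name__)  # NameError
```

One caveat with this design: catching NameError around a larger block could also hide an unrelated typo, so the guarded region should stay as small as possible.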

Additional Context

This matches the behavior described in the Installation Guide: "If the optional dependency is not installed, pandas will raise an ImportError when the method requiring that dependency is called."
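
The behavior the Installation Guide describes could also be achieved by checking for the dependency at call time rather than catching a NameError afterwards. A minimal standard-library sketch, in which the helper `require_optional` and its message are assumptions made for illustration:

```python
import importlib


def require_optional(name: str, purpose: str):
    """Hypothetical helper: import `name` or raise an ImportError naming `purpose`."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"Missing optional dependency '{name}', required for {purpose}. "
            "Install it in the active environment and try again."
        ) from err


# Usage: resolve the dependency right where it is needed.
try:
    require_optional("surely_not_installed_module_xyz", "pyarrow-backed dtypes")
except ImportError as err:
    hint = str(err)
    print(hint)
```

Because the check runs when the feature is used, importing the library itself still works without the optional dependency.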

wanglc02 added the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Mar 20, 2024
lithomas1 added the Error Reporting (Incorrect or improved errors from pandas), Dependencies (Required and optional dependencies), and Arrow (pyarrow functionality) labels and removed the Needs Triage label on Mar 22, 2024
@lithomas1
Member

PRs for this would be greatly appreciated!

@dataxerik
Contributor

take

3 participants