Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expected type of search_data() "bounding_box" seems too narrow #279

Open
arthur-e opened this issue Aug 15, 2023 · 11 comments
Open

Expected type of search_data() "bounding_box" seems too narrow #279

arthur-e opened this issue Aug 15, 2023 · 11 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request hackathon An issue we'd like to visit during a hackathon event help wanted Extra attention is needed

Comments

@arthur-e
Copy link

The following generates a TypeError (Traceback at bottom of post):

results = earthaccess.search_data(
    short_name = 'M2SDNXSLV',
    temporal = ("2023-01", "2023-07"), 
    bounding_box = [2, 35, 4, 37]) # NOTE: A list

While this example works as expected (no error):

results = earthaccess.search_data(
    short_name = 'M2SDNXSLV',
    temporal = ("2023-01", "2023-07"), 
    bounding_box = (2, 35, 4, 37)) # NOTE: A tuple

It's not immediately obvious from the documentation that bounding_box must be a tuple.

Maybe there is a good reason that a list type is not acceptable? It seems to me like list and tuple should both be accepted.

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[68], line 1
----> 1 results = earthaccess.search_data(
      2     short_name = 'M2SDNXSLV',
      3     temporal = ("2023-01", "2023-07"), 
      4     bounding_box = [2, 35, 4, 37])
      6 time_series = []
      7 file_list = earthaccess.open(results)

File /usr/local/python-env/OpenClimate/lib/python3.8/site-packages/earthaccess/api.py:103, in search_data(count, **kwargs)
     65 def search_data(
     66     count: int = -1, **kwargs: Any
     67 ) -> List[earthaccess.results.DataGranule]:
     68     """Search dataset granules using NASA's CMR.
     69 
     70     [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)
   (...)
    101         ```
    102     """
--> 103     query = DataGranules().parameters(**kwargs)
    104     granules_found = query.hits()
    105     print(f"Granules found: {granules_found}")

File /usr/local/python-env/OpenClimate/lib/python3.8/site-packages/earthaccess/search.py:347, in DataGranules.parameters(self, **kwargs)
    345         methods[key](*val)
    346     else:
--> 347         methods[key](val)
    349 return self

TypeError: bounding_box() missing 3 required positional arguments: 'lower_left_lat', 'upper_right_lon', and 'upper_right_lat'
@MattF-NSIDC MattF-NSIDC added documentation Improvements or additions to documentation enhancement New feature or request labels Aug 15, 2023
@MattF-NSIDC
Copy link

MattF-NSIDC commented Aug 15, 2023

Good catch! Looks like we're specifically checking for tuples and splatting them. If we were also splatting lists in the same way, that would solve this issue, but would it possibly introduce another issue? @betolink

https://github.com/nsidc/earthaccess/blob/95253c01f37048f4eea780e08be9d9dca2dd9c0d/earthaccess/search.py#L344C13-L345C35

❓ Shouldn't GitHub be expanding that link to embed? What am I doing wrong? 😆

@betolink
Copy link
Member

It would be useful to accept a list where we currently have tuples, the reason to use tuples is that python_cmr uses them, but lists are more intuitive for sure.

@asteiker
Copy link
Member

asteiker commented Feb 6, 2024

We'll move forward with updating https://earthaccess.readthedocs.io/en/latest/user-reference/api/api/#earthaccess.api.search_data for both temporal and bounding box to clarify that tuple is required.

@mfisher87
Copy link
Collaborator

We also discussed coming back to handling list-like objects with 4 elements at a future time.

@mfisher87
Copy link
Collaborator

Circling back with some research from another ticket: We can achieve this with Pydantic's runtime validation decorator: https://docs.pydantic.dev/latest/concepts/validation_decorator/#argument-types

@JessicaS11
Copy link
Collaborator

Not sure if this is helpful, but a couple years ago we separated out the temporal and spatial handling in icepyx to their own modules. Each module (here the examples are spatial) parses the user input (e.g. bbox lists, polygons, and shapefiles are accepted for the spatial ones), verifies that it's a valid geometry, and stores it as a list of coordinates. Additional functions make it easy to then return that verified spatial info in whatever format is needed (a geodataframe; a string formatted for cmr (or EGI); a string formatted for another API). I'm not sure if this is a good programmatic approach from a dev perspective, but I'd be happy to brainstorm if this sort of thing would be a good fit for earthaccess to allow users a broader set of input types (and if it makes sense to push these modules upstream).

@andypbarrett
Copy link
Collaborator

This is documented in the API but not in User Guide.

@danielfromearth danielfromearth added the help wanted Extra attention is needed label Oct 29, 2024
@asteiker
Copy link
Member

We may still need a decision here on whether to support lists in the future. This may be a good example for a decision committee (#761 )

@arthur-e
Copy link
Author

Would love to see some progress on this item, as I'm teaching earthaccess and it's hard to explain the narrow requirements of this particular aspect of the API. Happy to help in any way.

@mfisher87
Copy link
Collaborator

Agreed we should make a decision on this, and I think we should try to make that decision as a more general rule that can save time on future decisions. I think we should adopt the API design principle: Specify inputs loosely, i.e. take any sequence with 4 elements for a bounding box ((BoundingBox = Annotated[Sequence, Len(4, 4)]); and specify outputs tightly, i.e. the user knows the exact type coming out of a function.

@mfisher87 mfisher87 added the hackathon An issue we'd like to visit during a hackathon event label Nov 3, 2024
@JessicaS11
Copy link
Collaborator

It ties in with some ongoing conversation over in #804 too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request hackathon An issue we'd like to visit during a hackathon event help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

8 participants