Draft classes of object/opinionated layer #267
base: main
I think I'll attempt a fix on a separate PR.
Small request: as we discussed today, let's change the name from objects.py to erddap.py (I know, zero creativity, but I just want to avoid clobbering a Python built-in). Regarding the draft: this looks great. I'd like to wait for @abkfenris to be back from his vacation to get his input. IMO the best strategy to get this merged is to issue new PRs with one class at a time instead of everything we have in this draft.
One thing that would probably help is to break up the classes into separate files. As is, maybe read my notes starting with connection, then server, then datasets.
* What would be the best name for the subpackage and main module? I went with `objects/` and `objects.py` so there wouldn't be any confusion with the Python `object` built-in, but I also considered `orm` as the name.
What about something like `array_like`?
Even if we aren't implementing any/all of the Python array manipulation dunder-methods at this point, it gives us an interface to aspire to.
* On [GSoC 2022 ideas #228](https://github.com/ioos/erddapy/issues/228), @ocefpaf suggests that on the `ERDDAPConnection` class:

  `@property(server) -> ERDDAPConnection`
  Return a new `ERDDAPConnection` if trying to set a new server, or change other attributes rather than changing it in place.

  I wasn't sure of the best way to do this. I wrote a `.to_string` method which takes either a string (with the new server URL) or an instance of `ERDDAPConnection` and returns just a string, or raises a `TypeError`:

  erddapy/erddapy/objects/objects.py, lines 22 to 33 in 257de3d

  ```python
  @classmethod
  def to_string(cls, value):
      """Convert an instance of ERDDAPConnection to a string."""
      if isinstance(value, str):
          return value
      elif isinstance(value, cls):
          return value.server
      else:
          raise TypeError(
              f"Server must be either a string or an instance of ERDDAPConnection. '{value}' was "
              f"passed.",
          )
  ```

  That function is then used in the `.server` setter and in the `ERDDAPServer` `__init__` and `connection`. How should the `ERDDAPConnection.server` setter return a new instance? This doesn't look like it would work:

  ```python
  @connection.setter
  def connection(self, value: str):
      return ERDDAPConnection(value)
  ```
See my inline comments, but I think we can simplify the connection and servers so that this isn't an issue.
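On the setter question: Python discards whatever a property setter returns, so `obj.connection = value` can never hand back a new instance. One possible alternative, sketched below under assumptions, is to make the connection immutable and expose an explicit method that returns a modified copy. The names `FrozenConnection` and `with_server` are hypothetical and not part of erddapy.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrozenConnection:
    """Hypothetical immutable stand-in for ERDDAPConnection."""

    server: str

    def with_server(self, server: str) -> "FrozenConnection":
        """Return a new connection pointing at `server`; `self` is left untouched."""
        return replace(self, server=server)


old = FrozenConnection("https://a.example/erddap")
new = old.with_server("https://b.example/erddap")  # `old` still points at a.example
```

Because the dataclass is frozen, `old.server = "..."` raises `dataclasses.FrozenInstanceError`, which makes accidental in-place changes visible instead of silent.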
erddapy/objects/objects.py (outdated)

```python
FilePath = Union[str, Path]


class ERDDAPConnection:
```
I think we would probably want to split each class out into its own file. Maybe use the init docstring to give a big overview of the relationship between classes.
`ERDDAPConnection` largely acts as a wrapper around requests/httpx/urllib that can then be used by the other classes when data access needs to happen.
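To make the wrapper idea concrete, here is a minimal sketch that uses only the standard library's `urllib`. The class name `ConnectionWrapperSketch` and its `get` method are hypothetical, not erddapy API. The only point is that the connection object owns the HTTP details and is handed full URLs by the other classes.

```python
from urllib.request import urlopen


class ConnectionWrapperSketch:
    """Hypothetical thin wrapper that centralizes how URLs are fetched."""

    def __init__(self, timeout: float = 30.0):
        # Settings shared by every request live here, not in the dataset classes.
        self.timeout = timeout

    def get(self, url: str) -> bytes:
        """Fetch `url` and return the raw response body."""
        with urlopen(url, timeout=self.timeout) as response:
            return response.read()
```

Swapping `urllib` for requests or httpx would then only touch this one class.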
erddapy/objects/objects.py (outdated)

```python
def __init__(self, server: str):
    """Initialize instance of ERDDAPConnection."""
    self._server = self.to_string(server)
```
I'm unsure if we actually need to store the server here; dropping it may simplify other things.
erddapy/objects/objects.py (outdated)

```python
def open(self, url_part: str) -> FilePath:
    """Yield file-like object for access for file types that don't enjoy getting passed a string."""
    pass
```
Maybe `.open()` should be a context manager that gives a `pathlib.Path` to a temporary file.

We could also have an explicit `.save()` method that uses `.open()` under the hood.
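A rough sketch of how that could look, assuming a hypothetical `TempFileConnection` class whose `fetch` argument stands in for whatever HTTP client ends up being used (neither name exists in erddapy):

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, Union


class TempFileConnection:
    """Hypothetical stand-in for ERDDAPConnection; `fetch` is an assumed hook."""

    def __init__(self, fetch: Callable[[str], bytes]):
        # `fetch` maps a URL part to the raw bytes of the response.
        self._fetch = fetch

    @contextmanager
    def open(self, url_part: str) -> Iterator[Path]:
        """Yield a Path to a temporary file that is deleted when the block exits."""
        with tempfile.TemporaryDirectory() as tmpdir:
            path = Path(tmpdir) / "download"
            path.write_bytes(self._fetch(url_part))
            yield path

    def save(self, url_part: str, file_path: Union[str, Path]) -> Path:
        """Keep the download by copying the temporary file to `file_path`."""
        with self.open(url_part) as tmp:
            return Path(shutil.copy(tmp, file_path))
```

With this shape, libraries that insist on a real file path can be used inside `with conn.open(...) as path:`, and nothing is left on disk unless `.save()` is called.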
erddapy/objects/objects.py (outdated)

```python
class ERDDAPDataset:
    """Base class for more focused table or grid datasets."""

    def __init__(
        self,
        dataset_id: str,
        connection: str | ERDDAPConnection,
        variables,
        constraints,
    ):
        """Initialize instance of ERDDAPDataset."""
        self.dataset_id = dataset_id
        self._connection = ERDDAPConnection(ERDDAPConnection.to_string(connection))
        self._variables = variables
        self._constraints = constraints
        self._meta = None
```
For the datasets, I envision supporting some subset of the xarray/pandas filtering/subsetting API.

For instance, if I had a `GridDataset` for http://www.neracoos.org/erddap/griddap/WW3_72_GulfOfMaine_latest.html as `eds`, I could do something like this to filter it down:

```python
eds = GridDataset.from_url("http://www.neracoos.org/erddap/griddap/WW3_72_GulfOfMaine_latest.html")
eds = eds[['hs', 'dir']]  # select just the wave height and direction variables
eds = eds.sel(time=slice("2022-09-23T00:00:00Z", "2022-09-26T00:00:00Z"))  # only for a subset of the time range
```

Then there are different ways that we could consume the dataset.

```python
ds = eds.to_xarray()  # creates an xarray dataset via OPeNDAP and filters/slices based on the earlier operations
netcdf_url = eds.url('nc')  # return the NetCDF URL
eds.save('nc', NETCDF_PATH)  # save a NetCDF file locally
```
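One way the chained filtering could be backed, as a sketch only: each operation returns a new object that records the selection, and nothing is requested from the server until the dataset is consumed. `GridDatasetSketch` and its fields are hypothetical, and URL building is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class GridDatasetSketch:
    """Hypothetical lazy dataset that only records what was selected."""

    dataset_id: str
    variables: tuple[str, ...] = ()
    constraints: dict = field(default_factory=dict)

    def __getitem__(self, names: list[str]) -> GridDatasetSketch:
        """Return a copy restricted to the given variable names."""
        return replace(self, variables=tuple(names))

    def sel(self, **indexers) -> GridDatasetSketch:
        """Return a copy with the extra dimension constraints merged in."""
        return replace(self, constraints={**self.constraints, **indexers})


eds = GridDatasetSketch("WW3_72_GulfOfMaine_latest")
subset = eds[["hs", "dir"]].sel(time=slice("2022-09-23T00:00:00Z", "2022-09-26T00:00:00Z"))
```

Methods such as `.to_xarray()`, `.url()`, or `.save()` would then translate the recorded `variables` and `constraints` into an actual request.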
We probably want to validate that filtering/slicing is working on legit values by default, but have a way to enable/disable it both globally/per dataset. To validate it, we would need to retrieve all the valid variables and dimensions from the server.
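A minimal sketch of that on/off behaviour, assuming a hypothetical module-level default `VALIDATE_BY_DEFAULT` and a helper `check_variables` (neither exists in erddapy). The `available` names would come from the server's metadata for the dataset.

```python
from __future__ import annotations

# Hypothetical global switch; a per-dataset or per-call value overrides it.
VALIDATE_BY_DEFAULT = True


def check_variables(requested, available, validate: bool | None = None) -> list:
    """Return the requested names, raising KeyError for unknown ones when validating."""
    if validate is None:
        validate = VALIDATE_BY_DEFAULT
    if validate:
        unknown = sorted(set(requested) - set(available))
        if unknown:
            raise KeyError(f"Unknown variables: {unknown}")
    return list(requested)
```

Passing `validate=False` skips the check entirely, which would also avoid the extra metadata request in a real implementation.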
Hi @ocefpaf and @abkfenris, thank you very much for the review. I just arrived in Australia after a long trip and will take a couple of days to settle, so I will address the comments next week, and for this week I will attempt a fix for #261. Please let me know if there are any problems. Best,
- Add interfaces.py to process responses into third-party library objects
- Use existing dataset
- Pass requests_kwargs to 'to_pandas' method
This commit introduces a rough draft of the classes to be introduced in the object/opinionated layer. For now, the subpackage containing them is named 'objects' but that may change in the future. For discussion, see ioos#228.
- Add 'objects' subpackage with init module
- Add 'objects.py' module with 5 new classes
- Methods will be implemented in the following commits.
The '|' operator for typing was introduced in Python 3.10. This import allows previous Python versions to work with this operator.
- code review
Co-authored-by: Alex Kerney
- Rename 'objects' to 'array_like' to avoid clobbering Python built-ins
- Refactor imports
- Create connection.py, datasets.py, server.py
Replaced by connections, datasets, and server modules.
Add __future__.annotations import to modules 'datasets' and 'server'
Hi,
this PR is a draft of what the new object/opinionated layer should look like. It does not have any "proper" code yet, just a scaffold of what the classes would look like. I am submitting it so we can discuss the best way of implementing this layer based on the discussion on #228.
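I have a couple of questions:

* What would be the best name for the subpackage and main module? I went with `objects/` and `objects.py` so there wouldn't be any confusion with the Python `object` built-in, but I also considered `orm` as the name.
* On [GSoC 2022 ideas #228](https://github.com/ioos/erddapy/issues/228), @ocefpaf suggests that on the `ERDDAPConnection` class:

  `@property(server) -> ERDDAPConnection`
  Return a new `ERDDAPConnection` if trying to set a new server, or change other attributes rather than changing it in place.

  I wasn't sure of the best way to do this. I wrote a `.to_string` method which takes either a string (with the new server URL) or an instance of `ERDDAPConnection` and returns just a string, or raises a `TypeError` (see erddapy/erddapy/objects/objects.py, lines 22 to 33 in 257de3d).

  That function is then used in the `.server` setter and in the `ERDDAPServer` `__init__` and `connection`. How should the `ERDDAPConnection.server` setter return a new instance? A `@connection.setter` whose body is just `return ERDDAPConnection(value)` doesn't look like it would work.

Once you approve this, I will start implementing the core functions in this layer. Looking forward to hearing your feedback.

Thanks,
Vini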