-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PostgreSQL Timestamp with time zone range type not supported #1550
Comments
Arrow doesn't have a native range type. Though it should've fallen back to giving you the raw bytes. What would be a useful way to encode the data? |
I imagine you could model it as a structure of two values and two flags. More generally, PostgreSQL seems to have a long tail of exotic data types and it's probably helpful to have generic guidance for those. |
Yes, well, the question is if it might be worth trying to define canonical extension types for each of these, or if we're going to just do something ad-hoc in this driver specifically |
Probably a struct with lower and upper bound timestamps and boolean flags indicating inclusivity like Curt mentioned. It could also be dumped in text representation that would be castable back to range type if loaded back to postgres. I don't know if the latter could be made automatically if type is not recognized. |
The extension type would make it easier for us to round trip data, though, if that's expected. (Also, it's the kind of thing that's harder to change down the line.) We can't figure out the string representation automatically. We get whatever bytes Postgres gives us and have to interpret it. |
This has been discussed before if someone would like to advance the discussion. apache/arrow#28389 |
That said, I'm sympathetic to Curt's point that there's going to be quite a few of these, and indeed I'm not sure every such type would want/need a dedicated extension type. CC @paleolimbot @zeroshade for opinions too. I could see an argument for having a non-canonical extension type (that we just define as part of the Postgres driver), but working with extension types may be a little awkward downstream? |
The only other time I've experienced this is when I created a type in the Postgres driver using the same underlying "database" object. We cache the type table in the database object and we don't currently provide a way to invalidate that. It could also be that something about the logic that queries the type table misses the range type, or there is an error when inserting it into the "type resolver". The place where that gets built is here: arrow-adbc/c/driver/postgresql/database.cc Line 133 in 37026a9
I agree that a struct would be the most intuitive. I would have to check how this comes through in COPY output, but I imagine you would need four columns.
You could probably do it with one extension type (maybe What we could do about this in the short-term is:
|
Hmm, I like the idea of a single generic extension type that just flags "hey, we don't know how to interpret this, but here are the raw bytes". I also like that over just field metadata since the field metadata is a bit of an ad-hoc convention vs. extension types which are actually meant for this sort of thing (I think).
I wouldn't be opposed to just always doing this (this came up elsewhere too)
I'm OK with just encoding as struct, then. Possibly also still wrapped under said generic extension type? |
What happened?
On arrow table fetch I got error saying that type 3910 ( tstzrange ) is not implemented
How can we reproduce the bug?
Create a table with timestamp range and try to read it using cursor.fetch_arrow_table()
Environment/Setup
PostgreSQL driver version 0.9.0
Python 3.9
pyarrow 15.0.0
Windows
pip
The text was updated successfully, but these errors were encountered: