
PostgreSQL Timestamp with time zone range type not supported #1550

Open
ttomasz opened this issue Feb 15, 2024 · 9 comments
Labels
Type: bug Something isn't working

Comments

@ttomasz

ttomasz commented Feb 15, 2024

What happened?

When fetching an Arrow table, I got an error saying that type 3910 (tstzrange) is not implemented.

How can we reproduce the bug?

Create a table with a tstzrange (timestamp with time zone range) column and try to read it using cursor.fetch_arrow_table().
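A minimal reproduction sketch using the `adbc_driver_postgresql` Python bindings (the connection URI and table name are placeholders; assumes the driver and a reachable Postgres instance):

```python
def reproduce(uri: str):
    """Create a tstzrange column and fetch it as Arrow.

    Against driver 0.9.0 this raises an error that type 3910
    (tstzrange) is not implemented.
    """
    import adbc_driver_postgresql.dbapi  # requires the ADBC Postgres driver

    with adbc_driver_postgresql.dbapi.connect(uri) as conn:
        with conn.cursor() as cur:
            cur.execute("CREATE TEMPORARY TABLE ranges (r tstzrange)")
            cur.execute(
                "INSERT INTO ranges VALUES "
                "(tstzrange(now(), now() + interval '1 day'))"
            )
            cur.execute("SELECT r FROM ranges")
            return cur.fetch_arrow_table()
```

Calling e.g. `reproduce("postgresql://localhost:5432/postgres")` triggers the error on fetch.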

Environment/Setup

PostgreSQL driver version 0.9.0
Python 3.9
pyarrow 15.0.0
Windows
pip

@ttomasz ttomasz added the Type: bug Something isn't working label Feb 15, 2024
@lidavidm
Member

Arrow doesn't have a native range type. Though it should've fallen back to giving you the raw bytes.

What would be a useful way to encode the data?

@lidavidm lidavidm added this to the ADBC Libraries 0.11.0 milestone Feb 15, 2024
@CurtHagenlocher
Contributor

I imagine you could model it as a structure of two values and two flags. More generally, PostgreSQL seems to have a long tail of exotic data types and it's probably helpful to have generic guidance for those.

@lidavidm
Member

Yes, well, the question is whether it might be worth trying to define canonical extension types for each of these, or whether we're just going to do something ad hoc in this driver specifically.

@ttomasz
Author

ttomasz commented Feb 16, 2024

What would be a useful way to encode the data?

Probably a struct with lower- and upper-bound timestamps and boolean flags indicating inclusivity, as Curt mentioned. It could also be dumped to its text representation, which would be castable back to a range type if loaded back into Postgres.

I don't know if the latter could be done automatically if the type is not recognized.
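As an illustration of that struct shape (not driver code): a tstzrange's text form, e.g. `["2024-02-15 00:00:00+00","2024-02-16 00:00:00+00")`, carries exactly those four fields, and a small stdlib parser can pull them out. The field names below are hypothetical:

```python
# Illustrative only: maps a tstzrange text literal onto the proposed
# struct-of-four-fields shape. Field names are hypothetical, not driver API.
import re

RANGE_RE = re.compile(r'^([\[\(])"?([^,"]*)"?,"?([^"\)\]]*)"?([\]\)])$')

def parse_tstzrange(text: str) -> dict:
    m = RANGE_RE.match(text)
    if not m:
        raise ValueError(f"not a range literal: {text!r}")
    open_b, lower, upper, close_b = m.groups()
    return {
        "lower": lower or None,   # empty string means unbounded
        "upper": upper or None,
        "lower_inclusive": open_b == "[",
        "upper_inclusive": close_b == "]",
    }
```

(This parses the text form only; the driver actually receives the binary COPY representation, which would need its own decoding.)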

@lidavidm
Member

The extension type would make it easier for us to round trip data, though, if that's expected. (Also, it's the kind of thing that's harder to change down the line.)

We can't figure out the string representation automatically; we get whatever bytes Postgres gives us and have to interpret them.

@lidavidm
Member

This has been discussed before, if someone would like to advance the discussion:

apache/arrow#28389
https://lists.apache.org/thread/ybrm5m27xc163nmjzn9zwsvbq6cfc0x0

@lidavidm
Member

That said, I'm sympathetic to Curt's point that there are going to be quite a few of these, and indeed I'm not sure every such type would want/need a dedicated extension type. CC @paleolimbot @zeroshade for opinions too.

I could see an argument for having a non-canonical extension type (that we just define as part of the Postgres driver), but working with extension types may be a little awkward downstream?

@paleolimbot
Member

Though it should've fallen back to giving you the raw bytes.

The only other time I've experienced this is when I created a type in the Postgres driver using the same underlying "database" object. We cache the type table in the database object and we don't currently provide a way to invalidate that. It could also be that something about the logic that queries the type table misses the range type, or there is an error when inserting it into the "type resolver". The place where that gets built is here:

AdbcStatusCode PostgresDatabase::RebuildTypeResolver(struct AdbcError* error) {

I imagine you could model it as a structure of two values and two flags.

I agree that a struct would be the most intuitive. I would have to check how this comes through in COPY output, but I imagine you would need four columns.

the question is if it might be worth trying to define canonical extension types for each of these, or if we're going to just do something ad-hoc in this driver specifically

You could probably do it with one extension type (maybe arrow.adbc.postgres), whose serialization held the type OID (which is basically all we have at the point we start constructing a result). Field metadata would do about the same thing except with an extension type you can customize how those arrays are converted to Python objects.

What we could do about this in the short-term is:

  • Make sure there's an option to return the type OID and/or name as field metadata for all fields (so that consumers can implement workarounds if they need more granularity on Postgres type)
  • Return range types as structs (and make sure they make it into the type resolver)

@lidavidm
Member

Hmm, I like the idea of a single generic extension type that just flags "hey, we don't know how to interpret this, but here are the raw bytes". I also like that over just field metadata since the field metadata is a bit of an ad-hoc convention vs. extension types which are actually meant for this sort of thing (I think).

Make sure there's an option to return the type OID and/or name as field metadata for all fields (so that consumers can implement workarounds if they need more granularity on Postgres type)

I wouldn't be opposed to just always doing this (this came up elsewhere too)

Return range types as structs (and make sure they make it into the type resolver)

I'm OK with just encoding as struct, then. Possibly also still wrapped under said generic extension type?
