feat: Add the ability to request a schema from a statement #1514

Open
paleolimbot opened this issue Feb 5, 2024 · 3 comments
@paleolimbot
Member

There are some situations (e.g., #1513) where the mapping of a database type to an Arrow type is not canonical. SQLite is an example of an end-member where all mappings of a database result are approximate (and not necessarily stable between queries).

When I rewrote the typing part of the PostgreSQL driver, I intentionally separated the "guess Arrow type from Postgres type" and "convert Postgres data to Arrow data" components. Given an Arrow type, it's reasonably straightforward to write the conversion from a Postgres type. The hard (and imprecise) part is the guessing.
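As a rough illustration of that separation, here is a minimal Python sketch. It is not the driver's actual code: the type names are plain strings, and `GUESSES`, `CONVERTERS`, `guess_arrow_type`, and `convert` are hypothetical names invented for this example. The point is only that the guessing step is a lookup that can be overridden, while the conversion step is well defined once a target type is known.

```python
from decimal import Decimal

# Hypothetical, heavily simplified "guess" table: database type -> Arrow type.
# The "numeric" entry is an illustrative guess, not a canonical mapping.
GUESSES = {
    "int4": "int32",
    "float8": "float64",
    "text": "string",
    "numeric": "string",
}

# Hypothetical conversion table keyed by (database type, target Arrow type).
# Once the target type is fixed, each conversion is straightforward.
CONVERTERS = {
    ("int4", "int32"): int,
    ("float8", "float64"): float,
    ("text", "string"): str,
    ("numeric", "string"): str,
    ("numeric", "float64"): float,
    ("numeric", "decimal128"): Decimal,
}


def guess_arrow_type(db_type):
    """The imprecise part: pick a default Arrow type for a database type."""
    return GUESSES.get(db_type, "binary")


def convert(db_type, arrow_type, raw):
    """The easy part: convert a raw text value given an explicit target type."""
    try:
        converter = CONVERTERS[(db_type, arrow_type)]
    except KeyError:
        raise NotImplementedError(
            f"no conversion from {db_type} to {arrow_type}"
        ) from None
    return converter(raw)
```

With this split, a caller who knows better than the guess can call `convert("numeric", "float64", "1.5")` directly instead of accepting the default returned by `guess_arrow_type("numeric")`.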

Instead of providing a possibly ever-accumulating pile of options along the lines of "adbc.postgresql.statement.numeric_as_double" = "true", I wonder if we could add AdbcStatementRequestSchema(struct AdbcStatement*, struct ArrowSchema*). Often the query author knows this information (or is using a SQL generation tool that already knows what column types to expect). In more dynamic wrappers, one could inspect AdbcStatementExecuteSchema() and look for specific types. This model fits nicely with how the Python __arrow_c_stream__(requested_schema=xxxx) protocol is parameterized as well.

I'm not sure whether the request should be best-effort or error-if-cannot-be-satisfied (or whether the caller should be able to choose). Without the ability to pass an ArrowSchema* directly, this is awkward to work around: the closest option would be to provide an IPC-serialized schema to AdbcStatementSetOptionBytes().
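To make the best-effort versus error-if-cannot-be-satisfied question concrete, here is a minimal Python sketch of how a driver might reconcile a requested schema with its own guesses. Everything here is an assumption made for illustration: `SUPPORTED`, `resolve_schema`, and the `strict` flag are hypothetical and not part of ADBC, and schemas are modeled as plain dicts of column name to type name.

```python
# Hypothetical set of (database type, Arrow type) conversions a driver supports.
SUPPORTED = {
    ("numeric", "string"),
    ("numeric", "float64"),
    ("text", "string"),
    ("int4", "int32"),
    ("int4", "int64"),
}


def resolve_schema(columns, guessed, requested, strict=False):
    """Pick an output type per column.

    columns:   column name -> database type
    guessed:   column name -> Arrow type the driver would choose by default
    requested: column name -> Arrow type the caller asked for (may be partial)
    strict:    if True, an unsatisfiable request raises instead of falling back
    """
    resolved = {}
    for name, db_type in columns.items():
        wanted = requested.get(name)
        if wanted is None or wanted == guessed[name]:
            # Nothing requested (or the request matches the guess).
            resolved[name] = guessed[name]
        elif (db_type, wanted) in SUPPORTED:
            # The request can be honored.
            resolved[name] = wanted
        elif strict:
            raise ValueError(
                f"cannot return column {name!r} ({db_type}) as {wanted}"
            )
        else:
            # Best-effort: silently keep the driver's guess.
            resolved[name] = guessed[name]
    return resolved
```

In best-effort mode the caller has to inspect the resulting schema to learn which requests were honored; in strict mode the failure surfaces before any data is fetched.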

@lidavidm
Member

lidavidm commented Feb 5, 2024

IPC-serialized schema is an option, but that feels rather gross...the other way would be to have a fake "option" that expects you to Bind() an (empty) schema after setting it (which is also gross because of how stateful/procedural it is, but at least doesn't bounce through IPC)

@lidavidm
Member

lidavidm commented Feb 5, 2024

Adding an explicit function would be best if we're going to expand things

@CurtHagenlocher
Contributor

Other database client specifications typically have some provision for letting the caller say "give me the value in column 1 as a " -- in ODBC, this is via the binding mechanism and both JDBC and ADO.NET let a caller say "getString" (and it's up to the driver to perform a conversion or fail).
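A minimal Python sketch of that "caller names the type, driver converts or fails" pattern follows. The `Row` class and its `get_string`/`get_int` methods are hypothetical stand-ins loosely modeled on the JDBC/ADO.NET accessor style, not an actual ADBC or JDBC API, and the column index is 0-based here for simplicity.

```python
class Row:
    """Hypothetical result row with typed accessors."""

    def __init__(self, values):
        self._values = list(values)

    def get_string(self, index):
        # Nearly any value can be rendered as a string.
        return str(self._values[index])

    def get_int(self, index):
        # The conversion is attempted on demand and fails loudly if impossible.
        value = self._values[index]
        try:
            return int(value)
        except (TypeError, ValueError) as exc:
            raise TypeError(
                f"column {index} value {value!r} cannot be read as an integer"
            ) from exc


row = Row([42, "abc"])
as_text = row.get_string(0)  # the integer column read as a string
```

The design choice mirrors the discussion above: the stored type and the requested type are decoupled, and the driver decides per request whether the conversion is possible.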
