Unable to connect to Redshift using adbc_driver_postgresql #1563
Comments
Hmm, it looks like Redshift uses the wire protocol but isn't quite compatible, so we'd have to handle it separately. Even then, maintaining CI would be a challenge, plus it appears it's based on PostgreSQL 8? We've had issues in the past with older PostgreSQL versions, too.
Specifically for the error here, we are querying the database metadata at startup to figure out which column types map to which Arrow types, but Redshift appears to not provide one of the required columns.
Any updates on this issue?
We can probably solve the specific issue of not having the column.
We're blocked on a transition to ADBC by this; TextQL is happy to provide access to one of our demo data warehouses with their default sample database for a few weeks – it's serverless, so may require a cold start sometimes, but works very well. I assume you won't need more than a few thousand queries used for this. Who should I talk to about this? Feel free to reach out, [email protected]. I may be able to contribute myself in a month or so if the issue turns out to be much bigger than expected. This guide about the differences is pretty decent.
I'll send you a note...I happen to be working on the Postgres driver right now and I'm happy to kick the tires a little to see if I can get it to work. To claim Redshift support long term we would need a set of more permanent credentials that we can use in CI, but there are a few companies that are doing this kind of testing anyway that might be willing to share with us if we can demonstrate a POC 🙂 .
My CTO is currently on honeymoon; I might be able to get long-term credentials for you too, but would definitely need his approval first. Will keep you updated! I responded to your email with credentials for you, and might be able to get more for more people if needed.
Supporting Redshift would significantly boost the exposure and usage of this driver. In my experience working with databases, particularly Redshift, the primary bottleneck is extracting data quickly, and this support would greatly alleviate that issue. ConnectorX has gained popularity as the default engine for Polars, but it only supports read operations. For writing, one must use SQLAlchemy, which is less performant. Alternatively, you can use the Redshift COPY command, which involves dumping data to S3 via Arrow and then using the Redshift Data API to submit a COPY command. The ability to both read and write with adbc_driver would be a game-changer.
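The COPY-based write path described above (stage data on S3 as Parquet, then submit a COPY through the Redshift Data API) can be sketched roughly as follows. This is a hedged illustration only: the bucket, table, and IAM role names are hypothetical placeholders, and the actual upload and submission (e.g. via `pyarrow` and `boto3.client("redshift-data")`) are shown only in comments.

```python
# Sketch of the COPY-based write path: stage an Arrow table on S3 as
# Parquet, then issue a Redshift COPY statement via the Data API.
# All identifiers (bucket, table, IAM role) are hypothetical.

def build_copy_sql(table: str, s3_uri: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for a Parquet file staged on S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS PARQUET"
    )

sql = build_copy_sql(
    "public.events",
    "s3://my-bucket/staging/events.parquet",
    "arn:aws:iam::123456789012:role/redshift-copy",
)

# In practice you would first write the Arrow table to S3, e.g.:
#   import pyarrow.parquet as pq
#   pq.write_table(tbl, "events.parquet")  # then upload with boto3
# and then submit the statement asynchronously:
#   boto3.client("redshift-data").execute_statement(
#       Database="dev", Sql=sql, WorkgroupName="default")
print(sql)
```

The COPY statement itself runs inside Redshift, so the client only pays the cost of the S3 upload rather than row-by-row inserts.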
I believe you can also just use the AWS SDK for pandas (previously known as awswrangler), which does this automatically for you. That may be the best of your intermediate options, although yes, having ADBC support in the near term is probably best.
Yes, you are correct, awswrangler handles exactly this internally. However, I prefer using Polars over Pandas, and I know conversion between the two is possible. I just believe that supporting a premier data warehouse (Redshift) would solidify this as a strong alternative to JDBC/ODBC drivers.
Does the Apache project get any promotional credits from cloud providers? While having someone donate access is nice, it might be safest if we had a dedicated Redshift cluster to connect to over OIDC (ditto for Google Cloud and the BigQuery driver). This article is pretty old, but I wonder if it still applies: https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/
I do not believe we get any credits, unfortunately.
…dshift (#2219)

Just following up on #1563 to see if the missing `typarray` column is the only issue. Getting the details right might be a large project, but we might be able to support a basic connection without too much effort. Parameter binding and non-COPY result fetching seem to work...the default query fetch method (COPY) is not supported, `connection_get_info()` fails, and at a glance, `connection_get_objects()` might be returning incorrect results (and fails at the column depth).

``` r
library(adbcdrivermanager)

db <- adbc_database_init(
  adbcpostgresql::adbcpostgresql(),
  uri = Sys.getenv("ADBC_REDSHIFT_TEST_URI"),
  adbc.postgresql.load_array_types = FALSE
)

con <- db |> adbc_connection_init()
stmt <- con |> adbc_statement_init(adbc.postgresql.use_copy = FALSE)

stream <- nanoarrow::nanoarrow_allocate_array_stream()
stmt |>
  adbc_statement_bind(data.frame(45)) |>
  adbc_statement_set_sql_query("SELECT 1 + $1 as foofy, 'string' as foofy_string") |>
  adbc_statement_execute_query(stream)
#> [1] -1

tibble::as_tibble(stream)
#> # A tibble: 1 × 2
#>   foofy foofy_string
#>   <dbl> <chr>
#> 1    46 string
```

<sup>Created on 2024-10-04 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

---------

Co-authored-by: William Ayd <[email protected]>
What happened?
I was attempting to create a connection to Redshift using adbc_driver_postgresql, but I am getting this error: `IO: [libpq] Failed to build type mapping table: ERROR: column "typarray" does not exist in pg_type.`
I am aware that Redshift is based on an earlier version of Postgres. Is it currently possible to connect to Redshift? If not, are there plans to do so?
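For reference, the R reprex in this thread works around the failing startup query by disabling array-type loading (`adbc.postgresql.load_array_types = FALSE`). A hedged sketch of what the equivalent might look like from Python is below; whether the Python driver accepts this option via `db_kwargs` is an assumption here, not something confirmed in this thread, and the connection URI is a placeholder.

```python
# Assumed option name, taken from the R reprex in this thread; whether the
# Python driver honors it via db_kwargs is NOT confirmed here.
db_kwargs = {"adbc.postgresql.load_array_types": "false"}

# The connection itself would then look roughly like (commented out because
# it needs the driver package and live Redshift credentials):
#
# import adbc_driver_postgresql.dbapi
# conn = adbc_driver_postgresql.dbapi.connect(
#     "postgresql://user:pass@my-cluster.redshift.amazonaws.com:5439/dev",
#     db_kwargs=db_kwargs,
# )
print(db_kwargs)
```

If the option is not exposed in the Python package, the error above will still occur, since the type-mapping query runs at connection time.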
Environment/Setup
Kernel: 4.14.336-253.554.amzn2.x86_64 (Amazon Linux 2)
Python 3.9.18
adbc-driver-manager 0.10.0
adbc-driver-postgresql 0.10.0
pyarrow 14.0.2