Implementing drivers in python #2292

Open
tokoko opened this issue Oct 30, 2024 · 4 comments
Labels
Type: question Usage question

Comments

@tokoko
Contributor

tokoko commented Oct 30, 2024

What would you like help with?

I suppose this is already possible by duck typing classes to look like the ones in adbc_driver_manager, but I'm curious what's the general attitude towards implementing new drivers in python. A couple of valid use cases that come to mind are:

  • cases when there are already available python packages for the backend that wrap other languages and can output arrow. For example, I eventually opted to go with rust in case of datafusion, but I could have instead implemented it in python using datafusion-python. I get that it's not an optimal solution as the end result would only be accessible with python, but it could have accelerated prototype development.
  • wrappers around existing drivers that augment underlying drivers with some additional functionality. For example, a python driver that wraps a sqlite driver and adds substrait capabilities by translating substrait to sql before invoking actual commands or a driver that's powered by something like sqlglot and does dialect translations in the python layer.

I'm wondering if it might be a good idea to add a dummy python driver implementation to encourage such use cases.
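As a rough illustration of the second use case above (a wrapper that rewrites queries before handing them to an existing driver), here is a minimal sketch. It makes several assumptions: the standard-library `sqlite3` module stands in for a real ADBC SQLite driver, `toy_translate` is a hypothetical placeholder for a real dialect translator such as sqlglot, and the class names `TranslatingConnection` and `TranslatingCursor` are invented for this example and are not part of `adbc_driver_manager`.

```python
import sqlite3
from typing import Callable, Sequence


class TranslatingCursor:
    """Duck-typed DB-API cursor that rewrites each query before delegating."""

    def __init__(self, inner, translate: Callable[[str], str]):
        self._inner = inner
        self._translate = translate

    def execute(self, operation: str, parameters: Sequence = ()):
        # Translate first, then let the wrapped cursor do the real work.
        self._inner.execute(self._translate(operation), parameters)
        return self

    def __getattr__(self, name):
        # Everything else (fetchone, description, close, ...) is forwarded as-is.
        return getattr(self._inner, name)


class TranslatingConnection:
    """Duck-typed DB-API connection that hands out translating cursors."""

    def __init__(self, inner, translate: Callable[[str], str]):
        self._inner = inner
        self._translate = translate

    def cursor(self) -> TranslatingCursor:
        return TranslatingCursor(self._inner.cursor(), self._translate)

    def __getattr__(self, name):
        return getattr(self._inner, name)


def toy_translate(sql: str) -> str:
    # Hypothetical stand-in for a dialect translator: maps LEN(...) to
    # SQLite's LENGTH(...). A real wrapper would call a proper SQL parser.
    return sql.replace("LEN(", "LENGTH(")


conn = TranslatingConnection(sqlite3.connect(":memory:"), toy_translate)
cur = conn.cursor()
cur.execute("SELECT LEN(?)", ("arrow",))
row = cur.fetchone()
print(row)
```

Because the wrapper forwards unknown attributes to the wrapped object, any extension methods the underlying driver exposes would remain reachable without the wrapper having to know about them.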

@tokoko tokoko added the Type: question Usage question label Oct 30, 2024
@lidavidm
Member

Python wrappers are fine; SQLite already adds a couple of extra methods IIRC.

I'm not sure implementing drivers in Python makes any sense. At that point what you're actually doing is just implementing DB-API, no?

@paleolimbot
Member

paleolimbot commented Oct 31, 2024

For what it's worth, I think it's something that is perfectly valid to enable (although there is a long list of things ahead of it for me personally). Kirill and I chatted briefly about this in R since it would enable existing DBI drivers to more easily implement an ADBC-native interface (allowing us to migrate end-user usage to ADBC). In R we are perhaps more actively trying to move on from DBI than Python users are trying to move on from DB-API.

The ability to instantly prototype a driver and test it shouldn't be undersold, either (although we could make a project with the boilerplate in Go, C++, and Rust with a few Python tests that might accomplish something similar).

@tokoko
Contributor Author

tokoko commented Oct 31, 2024

> I'm not sure implementing drivers in Python makes any sense. At that point what you're actually doing is just implementing DB-API, no?

Sure, I guess that is what I mean, but to be fair it's not just DB-API, right? It's a heavily ADBC-flavored DB-API at best. Most of the features that would draw people this way are ADBC/Arrow-specific: fetch_arrow, get_objects, partitions, substrait.

> The ability to instantly prototype a driver and test it shouldn't be undersold, either (although we could make a project with the boilerplate in Go, C++, and Rust with a few Python tests that might accomplish something similar).

I know this might not be the best comparison, but I'm sort of thinking of python drivers as analogous to the newly added Python DataSource API in pyspark. You could argue that prototyping in java/scala can be just as easy, but it's all about familiarity at the end of the day, right? For pyspark users, a Python API probably means fewer hurdles for a prototype. To extend the example to this discussion, if some python system/library is directly using adbc (meaning DB-API with adbc extensions) as a pluggable source, it might be easier to implement some unusual cases directly in python, most likely in the same codebase w/o any additional build steps.

@lidavidm
Member

lidavidm commented Nov 5, 2024

I suppose anyone is free to duck-type themselves as an ADBC driver; I'm mostly just reluctant to expand the scope to include a formal Python API specification. But maybe we should try to intentionally compete with DB-API and/or formalize some of the extensions that we (and others) make to the API.
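To make the duck-typing idea discussed in this thread concrete, here is a small sketch of how a consumer might check for an "ADBC-flavored" cursor structurally rather than by inheritance. Everything in it is an assumption for illustration: `ArrowCursorLike` is a hypothetical protocol, not a published ADBC specification; the method name `fetch_arrow` is borrowed from the comment above; and plain Python dicts stand in for Arrow data so the example needs only the standard library.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ArrowCursorLike(Protocol):
    """Hypothetical minimal surface a consumer might require of a cursor."""

    def execute(self, operation: str, parameters: Any = None) -> Any: ...

    def fetch_arrow(self) -> Any: ...


class InMemoryCursor:
    """Toy pure-Python 'driver' cursor; it does not subclass anything."""

    def __init__(self, tables: dict):
        self._tables = tables
        self._result = None

    def execute(self, operation: str, parameters: Any = None):
        # A real driver would parse or forward the query. This toy simply
        # treats the operation string as a table name.
        self._result = self._tables[operation]
        return self

    def fetch_arrow(self):
        # Stand-in for Arrow data: a dict mapping column names to values.
        return self._result


cursor = InMemoryCursor({"t": {"x": [1, 2, 3]}})

# runtime_checkable protocols only verify that the methods exist,
# not that their signatures or return types match.
is_cursor_like = isinstance(cursor, ArrowCursorLike)
columns = cursor.execute("t").fetch_arrow()
print(is_cursor_like, columns)
```

A structural check like this is only a loose contract, which is part of why formalizing the extensions in a specification comes up as a separate question.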


3 participants