Directly read ADBC to Spark Dataframe #1801

HaoXuAI · 2024-04-30T23:51:13Z

What feature or improvement would you like to see?

Similar to JDBC, something like:

jdbcDF = spark.read \
    .format("adbc") \
    .option("url", "adbc:postgresql") \
    .option("dbtable", "schema.tablename") \
    .option("user", "username") \
    .option("password", "password") \
    .load()

That would make it easier to leverage ADBC in a Spark compute environment.
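Until a native `adbc` format exists, a rough workaround is to fetch the result through an ADBC DBAPI connection on the driver and hand it to Spark. The sketch below is only an illustration under stated assumptions: the helper name `read_adbc_to_spark` and its `connect` parameter are hypothetical, the PostgreSQL driver package (`adbc_driver_postgresql`) is assumed to be installed when no `connect` callable is passed, and the whole result is assumed to fit in driver memory, so it does not give the partitioned, distributed read the request asks for.

```python
from typing import Any, Callable, Optional


def read_adbc_to_spark(
    spark: Any,
    uri: str,
    query: str,
    connect: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Run `query` over ADBC and return the result as a Spark DataFrame.

    Hypothetical helper: everything is fetched on the Spark driver first,
    so this only suits results that fit in driver memory.
    """
    if connect is None:
        # Imported lazily so this module loads even without the driver package.
        from adbc_driver_postgresql import dbapi

        connect = dbapi.connect

    conn = connect(uri)
    try:
        cur = conn.cursor()
        try:
            cur.execute(query)
            # ADBC cursors can return the result set as an Arrow table.
            table = cur.fetch_arrow_table()
        finally:
            cur.close()
    finally:
        conn.close()

    # Convert Arrow -> pandas -> Spark; Spark accepts a pandas DataFrame here.
    return spark.createDataFrame(table.to_pandas())
```

Usage would look something like `read_adbc_to_spark(spark, "postgresql://user:password@host:5432/db", "SELECT * FROM schema.tablename")`, where the URI and table name are placeholders.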

lidavidm · 2024-05-01T00:00:49Z

I think this should be a Spark feature request?

What I would like to do here is provide a JNI driver that can leverage the better-optimized postgresql/snowflake drivers from Java, though.

HaoXuAI · 2024-05-01T00:06:40Z

> I think this should be a Spark feature request?
>
> What I would like to do here is provide a JNI driver that can leverage the better-optimized postgresql/snowflake drivers from Java, though.

Right, it should be a Spark feature. I'm posting here to check whether it is a meaningful feature, and whether someone from the Arrow team is already working on it. :)
What do you mean by a JNI driver?

lidavidm · 2024-05-01T00:08:13Z

I don't believe anyone is working on this. Best to take it to the Spark community.

The Java ADBC drivers for PostgreSQL and Snowflake just wrap JDBC, so they don't provide any benefit over using JDBC directly. If we had JNI bindings to the C++/Go drivers, we might see some performance benefits.

HaoXuAI · 2024-05-01T00:10:50Z

Makes sense. Let me post it in the Spark repo.

tokoko · 2024-05-07T12:53:56Z

@HaoXuAI hey, fancy seeing you here 😄 I started this a while ago and then abandoned it (I changed jobs and was no longer using Dremio). I can help you bring it back from the dead if you have a use case.

HaoXuAI · 2024-05-07T15:50:51Z

Hey @tokoko! Great to see you here as well. I don't have a direct use case at work, but I'm thinking about using ADBC in a project to read data in Spark. Do you want to contribute it directly to Spark or keep it as a plugin?

tokoko · 2024-05-07T17:43:20Z

@HaoXuAI My goal at the time was to get it into mostly working condition as a plugin and then contribute it, but we can do it either way.

@lidavidm Even with JNI drivers, the ADBC Java interface itself will still look the same, right? The Spark data source implementation will be independent of how the drivers are implemented.

lidavidm · 2024-05-07T22:13:40Z

Yes, the idea of JNI would be to implement the same Java-side interface.
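The point that the data source only depends on the interface, not on how a driver is built, can be illustrated with a small sketch. It is written in Python for consistency with the snippet in the issue, not in Java, and every name in it (`AdbcLikeDriver`, `JdbcBackedDriver`, `JniBackedDriver`, `DataSource`) is hypothetical and stands in for the real ADBC Java interfaces rather than reproducing them.

```python
from typing import Iterable, List, Protocol, Tuple

Row = Tuple[object, ...]


class AdbcLikeDriver(Protocol):
    """Hypothetical stand-in for the shared driver interface."""

    def execute(self, query: str) -> Iterable[Row]:
        ...


class JdbcBackedDriver:
    """Stand-in for a driver that wraps JDBC under the hood."""

    def execute(self, query: str) -> Iterable[Row]:
        return [(1, "via-jdbc")]


class JniBackedDriver:
    """Stand-in for a driver that calls a native C++/Go driver through JNI."""

    def execute(self, query: str) -> Iterable[Row]:
        return [(1, "via-jni")]


class DataSource:
    """Stand-in for the Spark data source: it sees only the interface."""

    def __init__(self, driver: AdbcLikeDriver) -> None:
        self._driver = driver

    def load(self, query: str) -> List[Row]:
        # No branching on the driver type: swapping drivers needs no change here.
        return list(self._driver.execute(query))
```

Under this arrangement, replacing `JdbcBackedDriver` with `JniBackedDriver` changes only the object passed to `DataSource`, which is why work on the data source and work on JNI drivers could proceed independently.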

HaoXuAI added the "Type: enhancement" (New feature or request) label on Apr 30, 2024

HaoXuAI changed the title ~~Directly load ADBC to Spark Dataframe~~ Directly read ADBC to Spark Dataframe Apr 30, 2024
