Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/adbc/driver/flightsql: long delay between createPreparedStatement and getFlightInfoPreparedStatement #2040

Open
aiguofer opened this issue Jul 25, 2024 · 14 comments
Labels
Type: bug Something isn't working

Comments

@aiguofer
Copy link
Contributor

What happened?

We have a Python API that uses the ADBC driver to execute queries against our Java Arrow Flight SQL server. The Python server uses FastAPI + Strawberry, and when we receive a request to execute a query, we spin up a background thread to handle the execution against the AFS server. Multiple threads on the same pod could be executing queries against the AFS server at any given moment.

We recently noticed some issues with hanging queries, and when looking at our DataDog traces, we notice that there is almost a 30 minute difference between the createPreparedStatement request and the getFlightInfoPreparedStatement request.

My initial guess is that this could be related to having multiple requests at the same time through the ADBC driver, but I don't have enough context about how the bindings between Go and Python work.

Is there anything that jumps out at you? Is there anything we could do to help debug this? Here's pictures of the traces:

Screenshot 2024-07-25 at 2 53 31 PM
Screenshot 2024-07-25 at 2 53 09 PM

Stack Trace

No response

How can we reproduce the bug?

No response

Environment/Setup

Python 3.12
ADBC FlightSQL driver 1.0.0
ADBC driver manager 1.0.0

@aiguofer aiguofer added the Type: bug Something isn't working label Jul 25, 2024
@lidavidm
Copy link
Member

....hmm. 30 minutes is rather concerning. Would it be possible to get a thread dump (e.g. with gdb) if you notice that happening?

There is a log level env var that can be set (ADBC_DRIVER_FLIGHTSQL_LOG_LEVEL=debug), but I guess the first question is if this is in any way reproducible, or something that was noticed after-the-fact with telemetry

We should be totally fine with concurrency. There are some locks around internal state but they are meant to be extremely short.

@aiguofer
Copy link
Contributor Author

Unfortunately it's not reproducible, we noticed it after-the-fact when users reported queries hanging. I only saw this happen with 4 queries, all around the same time, 3 on the same pod and 1 on a separate pod.

We might try setting ADBC_DRIVER_FLIGHTSQL_LOG_LEVEL=debug for some time in case it happens again

@lidavidm
Copy link
Member

@zeroshade or @joellubi any ideas here what could've happened?

@lidavidm
Copy link
Member

Or if there's any way to get Go to dump stack without killing the process? (Maybe that's another debug feature we should consider - optionally binding to SIGUSR or something)

@zeroshade
Copy link
Member

There's a few ways we could do so, though if this is being implemented via Python threads it'll be a bit difficult to track down which stack is the problematic one. Though we could also have a signal just start the pprof profiling to a known filename, and then receiving the signal again stops the profiling. Then it would be easy to pull up and read and check it out

@aiguofer
Copy link
Contributor Author

Well, this hasn't happened again and we've since noticed some memory issues. It's possible that this pod was having memory strain and that caused this slowness. I'm going to close it and report back if we see this again.

@aiguofer aiguofer reopened this Aug 21, 2024
@aiguofer
Copy link
Contributor Author

Well, sadly we've seen continued issues with this now that we fixed the memory issues so I'm reopening. Any thoughts on how we might try to debug this?

@lidavidm
Copy link
Member

We're about to do a release. We could add a bit more logging and hope that can help narrow things down?

@lidavidm
Copy link
Member

That said, those are 2 separate calls in Python. So maybe the Python side could also be logged. Also if you don't strictly need a prepared statement that could be disabled as well.

@lidavidm
Copy link
Member

That said, those are 2 separate calls in Python. So maybe the Python side could also be logged.

So specifically, you can explicitly call stmt.adbc_prepare(query) before calling stmt.execute. That'll let you log/time those calls separately, and we can try to narrow down where the extra 30 minutes is coming from.

I think to avoid the prepared statement, you'd have to drop down to the raw API. We could add a config option or something to control this instead...

@aiguofer
Copy link
Contributor Author

aiguofer commented Aug 22, 2024

ohhh interesting, I noticed the adbc_prepare in the docs recently and was wondering what it was for. We'll try that!

Unfortunately using regular Statements is not an option for us right now. Since neither ADBC nor JDBC use those requests, we have not implemented it.

Side note/question: is there any plan on having (A|J)DBC use regular Statements instead of PreparedStatements? we did notice some recent attempted requests from a Go ADBC driver to the regular Statement flow on our server.

@zeroshade
Copy link
Member

zeroshade commented Aug 22, 2024

The Python ADBC bindings tend to use the Prepared statement workflow rather than regular statements as a performance / safety scenario to ensure compatibility with DBAPI and parameter binding, while also allowing an easier workflow for users and reducing code duplication in the bindings.

The native implementations (the actual ADBC spec, the Go implementation, C# impl, etc.) expose the lower level constructs which allow users to make the decision themselves whether or not they call Prepare. So those implementations won't use the Prepare workflow unless a user explicitly calls the StatementPrepare / Statement.Prepare functions

@lidavidm
Copy link
Member

ohhh interesting, I noticed the adbc_prepare in the docs recently and was wondering what it was for. We'll try that!

I'll try to add an example.

So those implementations won't use the Prepare workflow unless a user explicitly calls the StatementPrepare / Statement.Prepare functions

I'll try to add examples here too. And/or think of a way to let you opt out of this in the DBAPI bindings

lidavidm added a commit to lidavidm/arrow-adbc that referenced this issue Aug 23, 2024
lidavidm added a commit to lidavidm/arrow-adbc that referenced this issue Aug 23, 2024
@lidavidm
Copy link
Member

See #2097

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants