go/adbc/driver/snowflake+flightsql: Memory access violation when both are used in Python #1841
Comments
Wow, that's really odd. My first thought is to ask whether this reproduces on a non-Mac platform or if it is specific to macOS?
I don't get this (macOS aarch64) when building from main.
It also doesn't reproduce with the 0.11.0 wheels, so this appears to be a macOS x64 issue, which is going to be difficult to debug as I only have an aarch64 machine. Are you able to reproduce it on any other platforms?
OK, looks like it does happen under Rosetta.
However, debugging under Rosetta is not working great, as lldb appears to get stuck and execution appears to be extremely slow.
Actually, under lldb the process just hangs forever, even if we only import/use a single driver. So that won't help.
@zeroshade is there a problem perhaps with having two copies of the Go runtime in one process?
I do also see that there are public symbols in the driver from cgo et al.
I think we're between a rock and a hard place here.
I think the interim solution would be to build a "combined" driver that has both drivers in one library. I'm hesitant to hold up the release though since this has been a latent issue for a while. CC @zeroshade @kou @paleolimbot for opinions on the last part too.
@cocoa-xu this will also affect BigQuery... it seems both C++ and Go have packaging issues here...
FWIW, the issue above also seems to indicate it can only be reproduced on amd64 macOS.
Since this has been a latent issue for quite some time, and at least up to this point it appears that our consumers don't yet tend to have multiple drivers in one process (or if they do, it's not on macOS and isn't exhibiting the issue), I'm okay with this not being a release blocker. We're going to have to figure something out, but I think it's fine to release without a fix for this. Maybe we just add something to the documentation to point out the problem for now and note that we're working on it?
I agree that it is not a release blocker... the improvements to the individual drivers are (in my opinion) important to release whilst we improve the situation for using multiple drivers in one process. I do have an x86 Mac locally that I might be able to use to iterate on some of this... in R we're using the c-static build mode instead of the c-shared one (but I assume we would have the same issue there).
Also see: apache#1841
@paleolimbot depending on how R loads the module and its dependencies, it might not be an issue with the static libs. If you load a bunch of static libraries, depending on how they are built, you might be able to only pull in one copy of the Go runtime. Potentially.
I agree with moving forward with the release. Would it make sense to delay the 1.0.0 release until this is resolved, and instead release as 0.12.0?
@joellubi I don't think it's necessary to delay the 1.0.0 release for the couple of drivers we're doing so for. The issue is currently effectively limited to just amd64 macOS, and in the most common use cases users are more likely to be switching between drivers rather than using multiple drivers at the same time. With the doc update saying it's a known issue, I'm fine with moving forward as is.
I can reproduce this in R as well (but only on my very old x86 laptop):

```r
library(adbcdrivermanager)

flightsql <- adbc_database_init(
  adbcflightsql::adbcflightsql(),
  uri = "grpc+tcp://localhost:8080"
)

snowflake <- adbc_database_init(adbcsnowflake::adbcsnowflake())
```

I tried adding linker flags to suppress CGO symbols: ... and checked ...
I wonder, are you able to get a backtrace from lldb? Under Rosetta it appears the Go runtime fails to initialize entirely (it appears to get stuck holding a lock), so I wasn't able to do so from an M1 machine.
So I tried the following code on my x86_64 macOS and it had the same crash:

Minimal Reproduction:

```python
import adbc_driver_snowflake.dbapi
import adbc_driver_bigquery.dbapi

adbc_driver_snowflake.dbapi.connect()
adbc_driver_bigquery.dbapi.connect()
```

Result:

Full logs
No, but see dotnet/runtime#102048. EDIT: if you follow the links from that reply, you get a pretty sobering picture about the viability of loading multiple mid-to-heavyweight runtimes into the same process. I guess it's back to Rust, then ;).
Not quite sure if this is related, but it seems that we don't have an official port of gRPC for Rust yet: grpc/grpc#9316. There are some open-source gRPC wrappers used in production, but I'm not sure if they can be loaded from different shared libraries without issues like this, or the similar issue with C++.
Wow, I definitely didn't expect that. It looks like Tonic is fairly widely used, though. Apparently, there's no shortage of reasons for why it's so hard to move away from C as an implementation language.

As a thought experiment, let's say that we wanted to implement a bunch of drivers in some (uniform) non-C language and then consume them via the C API from Python.

For Go, the implication is that all the Go drivers would need to be bundled into a single cgo-based dynamic library in order to ensure no conflicts between them. It also implies that there could be problems trying to load any non-ADBC extension which also happened to be implemented in Go.

For C# with AOT compilation, there wouldn't be direct conflicts between the individual drivers, but there would be an increasingly heavy burden in terms of thread and address space consumption. The larger problem would be that a lot of existing code doesn't work with AOT compilation. This seems to cover a good chunk of what's needed to connect to BigQuery. The resource consumption issue could also be resolved by bundling all the drivers together into a single dynamic library.

For Java, and for C# without AOT compilation, it would require the user to have the corresponding runtime installed on their machine. All the drivers implemented in those languages could then be dynamically loaded into that runtime and would share a single set of runtime features like the garbage collector. They could then also share dependencies for an overall smaller footprint -- but of course this has the corresponding drawback that isolation between dependencies is more difficult. There would also need to be a way to dynamically register with a shared ADBC driver manager in-memory, because in this scenario there wouldn't be native entry points available for the drivers.

Feel free to add your favorite language to the list...
One other nuclear option is to spawn a sacrificial process for each driver that loads the driver and communicates with the main process over some sort of IPC/shared memory, which at least for the data parts should be relatively OK as Arrow's shared memory support shapes up.
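Purely to illustrate the shape of that "out of process" idea, here is a rough Go sketch under stated assumptions: a hypothetical helper binary named `adbc-driver-worker` loads exactly one driver, reads a query from stdin, and writes the result set to stdout as an Arrow IPC stream. Pipes stand in for whatever IPC or shared-memory channel a real design would use; the binary name, its flags, and the pinned Arrow Go module version are all made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os/exec"

	"github.com/apache/arrow/go/v16/arrow/ipc" // adjust to the Arrow Go version in use
)

func main() {
	// Hypothetical helper: loads exactly one Go driver, so each driver gets
	// its own process and therefore its own isolated Go runtime.
	cmd := exec.Command("adbc-driver-worker", "--driver", "snowflake")

	stdin, err := cmd.StdinPipe()
	if err != nil {
		log.Fatal(err)
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// Send the query and close stdin so the worker knows the request is complete.
	io.WriteString(stdin, "SELECT 1")
	stdin.Close()

	// Read record batches back from the worker's Arrow IPC stream; only
	// serialized Arrow data crosses the process boundary.
	rdr, err := ipc.NewReader(stdout)
	if err != nil {
		log.Fatal(err)
	}
	defer rdr.Release()
	for rdr.Next() {
		rec := rdr.Record()
		fmt.Println("received batch with rows:", rec.NumRows())
	}
	if err := rdr.Err(); err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
}
```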
FWIW, C# without AOT compilation could use something like DNNE to create the native entrypoints. DNNE allows building native libraries with proper exports. Behind the scenes, the native library uses the .NET runtime hosting APIs to activate the runtime and call into managed code.
As a side note, these problems are not unique to mid-to-heavyweight runtimes. Any time one loads a dynamic library with a different memory allocator, these problems can show up. They only don't show up if everyone agrees on using malloc/free from the same C runtime library. As soon as libraries start being fancy and each brings its own tcmalloc/jemalloc/whatever (and they often are), you now have multiple heaps, each with its own heuristics for expanding and giving memory back to the OS. (E.g. I don't know how Rust manages memory, but it's very likely each Rust DLL would have its own heap as well.)
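To make the "agree on one allocator" rule concrete, here is a tiny cgo sketch (an illustrative pattern, not code from this repo): buffers meant to cross a shared-library boundary come from the C runtime's malloc rather than from the Go allocator, so whichever library consumes them can release them with the matching free. This is loosely the idea behind allocating shared Arrow buffers in C from Go, as mentioned in the next comment.

```go
// Package cbuf shows buffers that are safe to hand across a shared-library
// boundary: they come from the C runtime's malloc, so any consumer linked
// against the same C library can release them with the matching free.
package cbuf

/*
#include <stdlib.h>
#include <string.h>
*/
import "C"

import "unsafe"

// Alloc returns n zeroed bytes obtained from the C heap.
func Alloc(n int) unsafe.Pointer {
	p := C.malloc(C.size_t(n))
	C.memset(p, 0, C.size_t(n))
	return p
}

// Free releases a buffer previously returned by Alloc (or by any other code
// that allocated it with the same C runtime's malloc).
func Free(p unsafe.Pointer) {
	C.free(p)
}
```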
I've been doing some research into this and came across the following library: gopy. It's meant to generate and compile CGO and CPython extensions from a Go package. What's specifically interesting about it is the unique-handle approach it uses to avoid passing Go pointers directly to C. There's a brief description in the README and the actual handler implementation is in handler.go. I'm still wrapping my head around what a specific implementation would look like, but I wonder if we could adopt a similar approach to isolate pointers within their respective Go runtimes. The actual memory we do want to share (arrays) should be allocated in C anyway via the CGO Mallocator, and with an approach like this all Go pointers are only accessed within the Go runtime via stub methods that delegate the actions to be done. It's not super obvious how it does all this just by looking at the source code, but if you look at the code generated by running the "command line example" in the README, the mechanism becomes clearer.
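To make that handle idea concrete, here is a minimal Go sketch using the standard library's runtime/cgo.Handle (Go 1.17+), which implements essentially the same pattern gopy describes: foreign code only ever holds an opaque integer, and the underlying Go object is resolved and manipulated inside the runtime that owns it. The type and function names below are hypothetical and are not how the ADBC Go drivers are currently implemented.

```go
// Package handles sketches the "opaque handle" pattern: Go objects never
// leave the Go heap; foreign code only holds integer handles to them.
package handles

import "runtime/cgo"

// database stands in for a driver object that C code must not dereference.
type database struct {
	uri string
}

// NewDatabase creates the Go object and returns an opaque handle that is
// safe to pass through C or another runtime.
func NewDatabase(uri string) uintptr {
	return uintptr(cgo.NewHandle(&database{uri: uri}))
}

// WithDatabase resolves the handle back to the Go object inside this runtime
// and runs fn against it; the Go pointer itself never crosses the boundary.
func WithDatabase(h uintptr, fn func(*database)) {
	fn(cgo.Handle(h).Value().(*database))
}

// ReleaseDatabase drops the handle so the object becomes garbage-collectable.
func ReleaseDatabase(h uintptr) {
	cgo.Handle(h).Delete()
}
```

If each embedded copy of the Go runtime kept its own handle table like this, a pointer owned by one driver's runtime could never be dereferenced by the other; whether that actually avoids the crash seen here is still an open question.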
I'll definitely try to take a look at that sometime this week. It's an interesting idea; I'm just not sure whether or not it'll solve our problem here. If I look closely at our existing implementation, we do utilize ...
It's worth checking ...
When you're building a DLL, the allocator in Rust defaults to using malloc on Unix platforms and HeapAlloc on Windows. |
FYI: I'd like to use the DuckDB and Postgres ADBC drivers in Go. Not sure it's connected, but on testing I also get a SIGSEGV when reading from a query result. So +1 from me if this can be supported.
If you're trying to use the drivers from Go, then it's unlikely to be related to this issue. Can you file a new issue for that SIGSEGV?
Just to keep everyone apprised here, I'm gonna try posting in the #cgo and #darkarts channels on the Gopher Slack to see if anyone might know of a workaround or other way to avoid this crash. Even though it is still not optimal to end up with multiple copies of the Go runtime in a single process, it would still be good to at least get it to not crash.
@zeroshade was there any feedback from those channels?
Nothing useful unfortunately, just one reply that reiterated the fact that it's technically undefined behavior etc.
What happened?
When using both the snowflake driver and the flightsql driver in the same python program, the process crashes with a segmentation violation. Using one of these drivers at the same time as the postgres driver does not cause a crash, which makes it seem like there might be an issue in the common Go memory management of the snowflake and flightsql drivers.
How can we reproduce the bug?
Minimal Reproduction:
Result:
Environment/Setup
Reproduces with:
and:
Does NOT reproduce with:
However, as I've tried pushing against various edge cases I have been able to get some segmentation violations with 0.10.0 too, but haven't yet isolated the specific behavior that causes it in that version.