-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implicit GC.KeepAlive in classes with finalizers #103522
Comments
I would say that this is a job for interop generators to worry about. This problem only exists in manually written low-level unsafe interop. Manually written 100% correct low-level unsafe interop is hard. I do not think that this makes it significantly simpler.
It is not the only cost. It also increases register pressure a bit, prevents tail call optimizations, ... . |
Note that this is not 100% reliable solution. You can still get into trouble because resurrection. It is why we have switched BCL from |
If we were to build this feature, we would not be able to switch BCL interop from SafeHandles to the implicit KeepAlives. It would not pass security review. It begs the question why to build this interop feature when it is not good enough for BCL? |
Objects being finalized while methods are still executing on them is completely baffling behavior when you discover it, and it just feels like a bug (in .NET/C# itself). It's not a bug in the implementation, so I'd argue it's a bug in the design (IMO of course). Sure it (mostly) only affects low-level interop code, but a lot of code bases need that sort of thing and it's not the exclusive domain of interop experts. Even for "interop experts" it's very easy to forget that you need to use In Paint.NET I solved this structurally by using an "acquire self handle" pattern in my custom COM wrappers. I'm also using a custom source generator so I don't have to write each one of these "self handles" myself. This does introduce an implicit If [ComObjectRefImpl<ID2D1Mesh>]
internal unsafe partial class D2D1Mesh
: D2D1Resource,
IDeviceMesh
{
public D2D1Mesh(ID2D1Mesh* pObject)
: base((ID2D1Resource*)pObject)
{
// The pointer is stashed by the base-most `ComObjectRef` class and is only accessible via the "self handle"
// This makes it difficult to accidentally not use the "self handle"
}
public ITessellationSink Open()
{
using var self = AcquireSelfHandle(); // <-- acquire
using ComPtr<ID2D1TessellationSink> spSink = default;
HRESULT hr = self.pD2D1Mesh->Open(spSink.GetAddressOf()); // <-- use
hr.ThrowOnError();
return ComObjectRef.CreateRef<ITessellationSink>(spSink).NotNull();
}
// vvvvvvv Generated code (some of it anyway) vvvvvvv
[MethodImpl(MethodImplOptions.AggressiveInlining)]
protected new SelfHandle AcquireSelfHandle()
{
return new SelfHandle(this);
}
protected new readonly ref struct SelfHandle
//: IDisposable
{
private readonly D2D1Mesh self;
// vvvvvvv This is where you get access to the native pointer
public TerraFX.Interop.DirectX.ID2D1Mesh* pD2D1Mesh
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => (TerraFX.Interop.DirectX.ID2D1Mesh*)this.self.DangerousGetNativeObjectNoAddRef();
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public SelfHandle(D2D1Mesh self)
{
self.VerifyNotDisposed();
this.self = self;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Dispose()
{
GC.KeepAlive(this.self); // I don't actually know if this accomplishes anything
}
}
} |
This is something every interop developer eventually hits. I'm not sure how we can fix this and instead I think the correct mitigation is to document in our https://learn.microsoft.com/dotnet/standard/native-interop/best-practices, but not attempt to hide from the user. |
It appears that the best practice doc says nothing about finalizers, which is good because using finalizer is not really a good practice, it complicates things, a lot. Non-deterministic releases of resources , unnecessarily elongating the life time of objects, non determinism due to threading, etc. It just makes thing so much harder than doing the right thing upfront. I wonder where does the idea of using finalizer to clean up comes from, we should probably stop teaching that. Maybe we should talk about preferring deterministic resource release through the |
Correct. Whenever the interop team is involved, finalizers are specifically discouraged for non-BCL related implementations. Instead, the This can be observed in the latest large interop API, Adding this perspective to the aformentioned best practices doc makes a lot of sense here. |
The most important thing is that: you won't be able to reproduce it in Debug configuration as the lifetime will be extended to the end of method implicitly. |
I think this is why people start using finalizer as the resource clean up mechanism in the first place.
If I knew nothing about what to do, I would skim through the document, read the first thing, you told me that is the most common way of doing thing, I will do just that.
Change the wordings and I think it might do some good:
Should I think of Fundamentally, managing life time of native object is a pain for managed developer. It is really just type 1 pain (i.e. figure out the logic so that we can deterministically release it) or type 2 pain (non-determinism and race condition). Type 2 pain feels way more painful than type 1, IMO. |
Yes, that is the expectation. Letting the GC handle object lifetime should always be the preference. It costs more to implement, but is more performant and less prone to use-after-free scenarios and corruption. In effect, the entire point of garbage collectors. Specifically for COM, there are a few niche cases where not relying on the GC is needed. The most common involve tooling for tracking memory usage of native COM components and when COM components hold onto resources, think of file locks. Both rely on predictable lifetimes so options are needed. All other cases should let the GC manage the lifetime of the interop projection, hence the default case is what we expect users to use. Manual management via Encouraging users, by default, to try and manually handle lifetimes with the GC is not appropriate for the .NET ecosystem. That is why we've spent multiple developer years constructing abstractions to avoid that. When users need bespoke solutions they need to reference best-practices guides and realize there is going to be pain in developing those solutions. See the effort being made for Swift interop, again we are creating abstractions to avoid the lifetime complexity in the common cases, but are providing APIs when manual management is needed.
Sort of. It is slightly different though because RAII semantics are difficult to avoid in C++, but not calling
That is the GC and COM |
As a reserve way, can we write a Roslyn analyzer to track such use cases? The generator will not provide a 100% safety, but something is better than nothing. public class MyClass
{
public IntPtr Handle;
public void DoWork()
{
// E.g. some pinvoke
Native.DoSomeWork(Handle);
}
~MyClass()
{
Native.DestroyHandle(Handle); // can be invoked before DoSomeWork finishes!
}
} |
IMO the way should be to align the behavior of the lifetime between debug and release so that developers can find the issue earlier in the debug build during development, instead of all going well during development but the issue suddenly appears after deploying the app to the production environment. |
No. In debug mode, lifetime of local variables should be extended to the whole method, so that when a breakpoint at the end of a method is encountered, |
Yup. You definitely could write an analyzer for this. |
It seems the JIT already knows how to do this due to generics or synchronized methods requiring it, so it probably wouldn't be too hard to check the asmdiffs impact of the change. Look for |
The fix is indeed trivial on jit side, thanks to @jakobbotsch's suggestion Jit-diffs: MihuBot/runtime-utils#463, dominated by System.Data (also keep in mind that this is PMI mode where jit-utils instanciate all generic methods with a set of predefined types)
I don't think it's possible, even in Release this specific case may reproduce only once a week since GC has to happen in an exact moment to reproduce it. |
My concerns with this feature would be:
|
But it's already pretty random, isn't? It depends on a codegen Tier, e.g. in Tier0 it will always be extended so I doubt anyone depends on this behavior |
From the interop perspective, I think this is best suited to be fixed with documentation. I would push back on this work if an interop scenario is solely being used as justification for it. |
How about introducing a runtime knob for this? So that such behavior can be enabled manually in order to diagnostic issues caused by missing |
Premortem finalization is a pretty fragile feature and always has been. Java has chosen an interesting way to deal with this - by deprecating finalizers in v9 |
Well, we didn't do the same🙂 The main motivation why I created this issue is that nobody among the people (mostly, seniors) I demonstrated the snippet knew the answer. Of course, it's possible to hit similar issues with e.g. multi-threads (although, there at least people normally do some synchronization) or mess up in the finalizer itself. Since the fix, presumably, is not too difficult (we just need to insert |
IIUC I think that to diagnose these issues you would need more aggressive garbage collection like GC stress, not less aggressive. |
I tried to at least recommend moving away from finalizer above, that doesn't go very well. |
It has to be some official statement with a "Deprecated" warning/error issued by Roslyn on use, otherwise it's a normal/popular feature developers use all the time, not only for interop, but also for various things like closing connections/files etc. In Java they also introduced Cleners API as a replacement and an addition to their "using" alternative, if I read it correctly |
I think we should really come to terms with when this manifests. It is very uncommon that users writing entirely in managed code are impacted by this behavior. Vlad mentioned one instance involving a weakref, but I don't know how common that truely is. The real case where this is common, indicated by this very issue, is interop. Jan mentioned it here, and I will say this again - manual interop is hard. Interop is by definition trying to reconcile two disparate, often incredibly complicated, systems. It isn't easy and causes no end to grief without deep knowledge of the two systems, even then it is very hard. Introducing minor "fixes" into the runtime to partially mitigate this friction is going to pessimize the rest of the system for a narrow case that will benefit few users. The vast majority of users rely on .NET built/owned interop solution. There are industrious users that go through the pain to create their own solutions and that is reasonable, but the pain is not unexpected nor something I think we should attempt to fix outside of improved documentation on best practices and recommended patterns. |
Java has phantom references that are a better replacement of finalizers. It allowed them to deprecate finalizers. In order to deprecate finalizers, we would need to have a replacement story like phantom references. |
As I said, I am fine with having a doc, I filed this issue to hear thoughs from both runtime folks and the community
To be honest, I don't think it's a good strategy to rely on all Pinvoke users to be seniors with 10 years of experience and knowing all the little details about the GC and if you're not qualified you can't pinvoke into this PS: I wouldn't file this issue if finalizers were deprecated, but they're not |
If you were to make this a dependable feature, you would also need to fix things throughout the tools. For example:
Our interop documentation tells you to use safe handles instead of finalizers: https://learn.microsoft.com/en-us/dotnet/fundamentals/runtime-libraries/system-runtime-interopservices-safehandle . If you do not have the experience required to write manual interop, use safe handles. |
I don't think this is a fair characterization of the recommendation. Calling a P/Invoke doesn't requiring knowing all this, but if solutions are architected with state living beyond the call, then yes, one can get themselves into a bad spot. Finalizers are notoriously hard, see Eric Lippert's series from 9 years ago. I can't even count the number of times where we've intentionally suggested that people not write their own finalizer because it is so difficult and prone to errors. The suggestion is always use In this case, if |
I am confused, this sounds contradictory to me. I thought the stance was pro finalizer, let's GC deal with the lifetime:
But then, we tell people to not do that many times?
What are we really recommending here, am I missing something? |
We're conflating two different ideas here. For any .NET provided solution, the most appropriate implementation will be taken. For COM, Objective-C, safe handles, that is finalizers. We provide these solutions and they use finalizers. Those solutions are designed to be transparent to the user and non-deterministic - treating all interop related objects like any other .NET objects and letting the GC clean them up. There are times though when we/they need more control so using For official recommendations on interop not in-box, both approaches are valid and should be discussed. We will always recommend built-in solutions first, but if users have a desire to write bespoke solutions we will recommend using |
Thanks for the clarity, I think I've got it now. That aligns with what I thought it should be. As long as:
then I think I am good with that. We should recommend inbox solution first, and that's sounds great too. The fact that our inbox solution uses finalizer is really just an implementation details that is irrelevant for them, we don't need to mention too much about that if at all. That is what triggered my confusion in the first place (i.e. My reasoning goes as follow: MS inbox solution use finalizer, then that must be a great way to do things, that's just not accurate) |
@MichalPetryka found a related case in a popular library libgit2sharp It uses a base class with just |
Looking at the base type, |
From the interop perspective, I think this can be closed. The documentation for interop best-practices has been updated with notes on |
Normally, the fact that GC is precise has no visible impact on user's code in terms of correctness, however, there are certain scenarios where developer might be expected to keep an object alive by rooting it or using
GC.KeepAlive
and may hit rarely reproducible bugs otherwise. It typically happens with finalizers, for example:Here it's not obvious that
DestroyHandle(Handle)
can be legally invoked whileDoSomeWork(Handle)
is still being executed (or even not started yet!), it happens because of the precise GC (FullOpts/Tier1) that may mark "this" object as dead afterHandle
field is accessed -DoWork
method doesn't even have to be inlined. The only way to prevent this behavior is to addGC.KeepAlive(this);
afterDoSomeWork(Handle)
;Numerous people were not aware about this, IMO, not obvious behavior and suspected their code is subject to potential bugs (typically, "use after free"-like). Afair, @rickbrew had to insert a lot of such
GC.KeepAlive(this);
in Paint.NET application, also, @kekekeks suspects they might have potential bugs in Avalonia due to this behavior.Here is the minimal repro: https://gist.github.com/EgorBo/f30e159c634f40b82b4edc78cf373466
Proposed solution
We can artificially extend life of
this
pointer in all instance methods of classes with finalizers by, effectively, emittingGC.KeepAlive(this)
before all returns in instance methods to significantly reduce number of potential issues in users code at a small cost of making some objects live longer than needed. It's not supposed to be a silver bullet as there are still cases whereGC.KeepAlive(obj)
(or rooting an object) may still be required. It might be implemented on Roslyn side (so other runtimes with precise GC won't step on this too) or in JIT's liveness analysis.cc @jkotas
The text was updated successfully, but these errors were encountered: