Luca's meaningless thoughts

Guaranteed finalization support

by Leandro Lucarella on 2009- 04- 19 17:03 (updated on 2009- 04- 19 17:03)
tagged d, dgc, en, finalization, specs, understanding the current gc - with 0 comment(s)

There was some discussion going on about what I found in my previous post. Unfortunately the discussion diverged a lot, and lots of people seems to defend not guaranteed finalization for no reason, or arguing that finalization is supposed to be used with RAII.

I find all the arguments very weak, at least for convincing me that the current specs are not broken (if finalizers shouldn't be used with objects with its lifetime determined by the GC, then don't let that happen).

The current specs allow a D implementation with a GC that don't call finalizers for collected objects at all! So any D program relying on that is actually broken.

Anyways, from all the possible solutions to this problem, I think the better is just to provide guaranteed finalization, at least at program exit. That is doable (and easily doable by the way).

I filed a bug report about this, but unfortunately, seeing how the discussion at the news group went, I'm very skeptic about this being fixed at all.

Object finalization

by Leandro Lucarella on 2009- 04- 18 15:18 (updated on 2009- 04- 18 15:18)
tagged d, dgc, en, finalization, specs, understanding the current gc - with 0 comment(s)

I'm writing a trivial naive (but fully working) GC implementation. The idea is:

Improve my understanding about how a GC is written from the ground up
Ease the learning curve for other people wanting to learn how to write a D GC
Serve as documentation (will be fully documented)
Serve as a benchmarking base (to see how better is an implementation compared to the dumbest and simplest implementation ever =)

There is a lot of literature on GC algorithms, but there is almost no literature of the particularities on implementing a GC in D (how to handle the stack, how finalize an object, etc.). The idea of this GC implementation is to tackle this. The collection and allocation algorithms are really simple so you can pay attention to the other stuff.

The exercise is already paying off. Implementing this GC I was able to see some details I missed when I've done the analysis of the current implementation.

For example, I completely missed finalization. The GC stores for each cell a flag that indicates when an object should be finalized, and when the memory is swept it calls rt_finalize() to take care of the business. That was easy to add to my toy GC implementation.

Then I was trying to decide if all memory should be released when the GC is terminated or if I could let the OS do that. Then I remembered finalization, so I realized I should at least call the finalizers for the live objects. So I went see how the current implementation does that.

It turns out it just calls a full collection (you have an option to not collect at all, or to collect excluding roots from the stack, using the undocumented gc_setTermCleanupLevel() and gc_gsetTermCleanupLevel() functions). So if there are still pointers in the static data or in the stack to objects with finalizers, those finalizers are never called.

I've searched the specs and it's a documented feature that D doesn't guarantee that all objects finalizers get called:

The garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreference objects is not specified. This means that when the garbage collector calls a destructor for an object of a class that has members that are references to garbage collected objects, those references may no longer be valid. This means that destructors cannot reference sub objects.

I knew that ordering was not guaranteed so you can't call other finalizer in a finalizer (and that make a lot of sense), but I didn't knew about the other stuff. This is great for GC implementors but not so nice for GC users ;)

I know that the GC, being conservative, has a lot of limitations, but I think this one is not completely necessary. When the program ends, it should be fairly safe to call all the finalizers for the live objects, referenced or not.

In this scheme, finalization is as reliable as UDP =)