Luca's meaningless thoughts   SponsorGitHub SponsorsLiberapayPaypalBuy Me A CoffeePatreonFlattr

GC optimization for contiguous pointers to the same page

by Leandro Lucarella on 2009- 04- 01 23:41 (updated on 2009- 04- 01 23:41)
tagged d, dgc, en, gc, optimization, phobos - with 0 comment(s)

This optimization had a patch, written by Vladimir Panteleev, sitting on Bugzilla (issue #1923) for a little more than an year now. It was already included in both Tango (issue #982) and DMD 2.x but DMD 1.x was missing it.

Fortunately is now included in DMD 1.042, released yesterday.

This optimization is best seen when you do word splitting of a big text (as shown in the post that triggered the patch):

import std.file, std.string;
void main() {
    auto txt = cast(string) read("text.txt"); // 6.3 MiB of text
    auto words = txt.split();
}

Now in words we have an array of slices (a contiguous area in memory filled with pointers) about the same size of the original text, as explained by Vladimir.

The GC heap is divided in (4KiB) pages, each page contains cells of a fixed type called bins. There are bin sizes of 16 (B_16) to 4096 (B_PAGE), incrementing in steps of power of 2 (32, 64, etc.). See Understanding the current GC for more details.

For large contiguous objects (like txt in this case) multiple pages are needed, and that pages contains only one bin of size B_PAGEPLUS, indicating that this object is distributed among several pages.

Now, back with the words array, we have a range of about 3 millions interior pointers into the txt contiguous memory (stored in about 1600 pages of bins with size B_PAGEPLUS). So each time the GC needs to mark the heap, it has to follow this 3 millions pointers and find out where is the beginning of that block to see its mark-state (if it's marked or not). Finding the beginning of the block is not that slow, but when you multiply it by 3 millions, it could get a little noticeable. Specially when this is done several times as the dynamic array of words grow and the GC collection is triggered several times, so this is kind of exponential.

The optimization consist in remembering the last page visited if the bin size was B_PAGE or B_PAGEPLUS, so if the current pointer being followed points to the last visited (cached) page, we can skip this lookup (and all the marking indeed, as we know we already visited that page).