Luca's meaningless thoughts  

DMD beta

by Leandro Lucarella on 2010- 01- 27 21:01 (updated on 2010- 01- 27 21:01)
tagged beta, compiler, d, development model, dmd, druntime, en, phobos, software - with 2 comment(s)

After some discussion [*] in the D newsgroup about the value of having release candidates for DMD (due to the high number of regressions introduced in new versions mostly), Walter agreed to make public what he called beta versions of the compiler, which he sent privately to people who asked for them (like some Tango developers).

The new DMD betas are announced in a special mailing list (available through Gmane too). It seems like Walter want to keep the beta releases with some kind of secrecy, or only for people really interested on them (the zip files are even password protected! But the password is announced in a public mailing list, that doesn't make much sense =/). I think he should encourage people to try them as much as possible instead, but one step at the time, at least now people have a way to test the compiler before it's released.

I can say without fear that the experience has been very successful already, even when there is no DMD release yet that came from a beta pre-release, you can see in the beta mailing list that multiple regressions have been discovered and fixed because this new beta releases. I think the reliability of the compiler has been increased already. Is really interesting to see how the quality of a product increases proportionally to the level of openness and the numbers of eyes doing peer review.

The new DMD release should be published very soon, as all the regressions seems to be fixed now and big projects like Tango, GtkD and QTD compiles (a lot of focus on fixing bugs that prevented the later to compile has been put into this release, specially from Rainer Schuetze, who submitted a lot of patches).

So kudos for a new era in D, I think this is another big milestone for having a reliable compiler.

[*]I'm sure there was previos requests for having release candidates, I know I asked for it, but I can't find the threads in the archives =)

D and open development model

by Leandro Lucarella on 2009- 10- 15 17:09 (updated on 2009- 10- 15 17:09)
tagged compiler, d, development model, dmd, druntime, en, phobos, software - with 6 comment(s)

Warning

Long post ahead =)

I'm very glad that yesterday DMD had the first releases (DMD 1.050 and DMD 2.035) with a decent revision history. It took some time to Walter Bright to understand how the open source development model works, and I think he still has a lot more to learn, but I have some hope now about the future of D.

Not much time ago, neither Phobos, DMD nor Druntime had revision control. Druntime didn't even exist, making D 1 split in two because of the Phobos vs Tango dichotomy. DMD back-end sources were not available either, and Walter Bright was the only person writing stuff (sometimes not because people didn't want to, but because he was too anal retentive to let them ;). It was almost impossible to make patches back then (your only chance was hacking GDC, which is pretty hard).

Now I can say that DMD, Phobos and Druntime have full source availability (DMD back-end is not free/libre though), almost all the parts of DMD have the sources published under a source control system. The core team has been expanded and even when Walter Bright is still in charge, at least 3 developers are now very committed to D: Andrei Alexandrescu (in charge of Phobos), Sean Kelly (in charge of Druntime) and Don Clugston (squashing DMD bugs at full speed, specially in the back-end). Other people are contributing patches in a regular basis. There were about 72 patches submitted to bugzilla before DMD was distributed with full source (72 patches in ~10 years) , since then, 206 patches were submitted (that is, 206 patches in less than 8 months).

But even with this great improvement, there is much left to do yet (and I'm talking only about the development model). This is a small list of what I think it's necessary to keep moving to a more open development model:

Releases

The release process should be improved. Me and other people are suggesting release candidates. This will allow people to test the new releases to find any regressions. As things are now, releases are not much different from a nightly build, except that you don't have one available every night :). People get very frustrated when downloading a new version of the compiler and things stop working, and this holds back front-end updates in other compilers, like LDC (which is frozen at 1.045 because of the regressions found in the next 5 versions).

I think Walter Bright is suffering from premature releasing too. Releases comes from nowhere, when nobody expects them. Nobody knows when a new compiler version will be released. I think that hurts the language reliability.

I think the releases should be more predictable. A release schedule (even when not very accurate, like in many other open source projects) gives you some peace of mind.

Peer review

Even when commits are fairly small now in DMD, I think they are far from ideal. Is very common to see unrelated changes in a commit (the classic example is the compiler version number being bumped in an bug fix). See revision 214 for example: the compiler version is bumped and there are some changes to the new JSON output, totally unrelated to bug 3401, which is supposed to fix; or revision 213, which announces the release of DMD 1.050 and DMD 2.035, introducing a bunch of changes that who knows what are supposed to do (well, they look like the introduction of the new type T[new], but that's not even documented in the release changelog :S). This is bad for several reasons:

  • Reviewing a patch with unrelated changes is hard.
  • If you want to fold in a individual patch (let's say, LDC guys want to fold a bug fix), you have a lot of junk to take care of.
  • If you want to do some sort of bisection to find a regression, you still have to figure out which is the group of related changes that introduced the regression.

I'm sure there are more...

Commit messages lacks a good description of the problem and the solution. Most commit messages in DMD are "bugzilla N". You have to go to the bugzilla bug to know what's all about. For example, Don's patches usually comes with very good and juicy information about the bug causes and why the patch fixes it (see an example). That is a good commit message. You can learn a lot about the code by reading well commented patches, which can lead to more contributions in the future.

Commits in Phobos can be even worse. The commits with a message "bugzilla N" are usually the good ones. There are 56 commits that have "minor" as the commit message. Yes, just "minor". That's pretty useless, it's very hard to review a patch when you don't know what is supposed to do. Commit messages are the base of peer reviewing, and peer reviewing is the base for high quality code.

So I think that D developers should focus a lot more in commit message. I know it can sound silly at first, but I think I would be a huge gain with too little effort.

Besides this, commits should be mailed to a newsgroup or mailing list to easy peer review. Now it's a little hard to make comments about a commit, you have to post the comment in the D newsgroup or make the comment by personal e-mail to the author. The former is not that bad but it's not easy to include context and people reading the comment will probably have to open a browser and search for the commented commit. This clearly make peer reviewing more difficult when the ideal would be to encourage it. The private mail is simply wrong because other people can't see the comments.

Source control and versioning

This one is tightly related to the previous two topics. Using a good DVCS can make help a lot too. Subversion has a lot of problems with branching, which makes releases harder too (as having a branch for each release is very painful). Is bad for commit messages too, because there is no real difference in branches and directories, so know every commit is duplicated (both changes for DMD 1 and 2 are included). It's not easy to cherry-pick single commits either, and you can't fix you commits if you messed up, which leads to a lot of commits of the style "Woops! Fix the typo in the previous commit.".

I'm sure both the release process and peer reviewing can be greatly improved by using a better DVCS.

Easy branching can also lead to a more fast evolving and reliable language. Yes, both are possible with branches. Now there are 2 branches: stable (D1) and experimental (D2). D1 is almost frozen and people is seeing less and less interest on it as it goes old, and D2 is too unstable for real use. Having some intermediate can be really helpful. For example, it has been announced that the concurrency model proposed by Bartosz Milewski will be not part of D2 because there is not enough time to implement it, since D2 should be release fairly soon as Andrei Alexandrescu is writing a book that has a deadline and the language has to be finalized by the time the book is published.

So concurrency (as AST macros) are delayed to D3. D2 is more than 2 years old, so one should expect that D3 will be not available in less than 5 years from now (assuming D2 would take 2.5 years and D3 would take the same). This might be too much time.

I think the language should adopt a model closer to Python, where a minor language version (with backward compatible improvements) is release every 1 ~ 1.5 years. Last mayor version took about 8 years, but considering how many new features Python included in minor versions that's not a big issue. The last mayor version was mostly a clean up of old stuff/nasty stuff, not huge changes to the language.

Licensing

I think the DMD back-end should have a better license. Personal use is simply not enough for a reference implementation of a language that wants to hit mainstream. If you plan to do business with it, not being able to patch the compiler if you need to and distribute it is not an option.

This is for the sake of DMD only, because other compilers (like LDC and GDC) are fully free/libre.

Conclusion

Some of the things I mention are really hard to change, as they modify how people work and imply learning new tools. But other are fairly easy, and can be done progressively (like providing release candidates and improving commits and commit messages).

I hope Walter Bright & Co. keep walking the openness road =)

GC optimization for contiguous pointers to the same page

by Leandro Lucarella on 2009- 04- 01 20:41 (updated on 2009- 04- 01 20:41)
tagged d, dgc, en, gc, optimization, phobos - with 0 comment(s)

This optimization had a patch, written by Vladimir Panteleev, sitting on Bugzilla (issue #1923) for a little more than an year now. It was already included in both Tango (issue #982) and DMD 2.x but DMD 1.x was missing it.

Fortunately is now included in DMD 1.042, released yesterday.

This optimization is best seen when you do word splitting of a big text (as shown in the post that triggered the patch):

import std.file, std.string;
void main() {
    auto txt = cast(string) read("text.txt"); // 6.3 MiB of text
    auto words = txt.split();
}

Now in words we have an array of slices (a contiguous area in memory filled with pointers) about the same size of the original text, as explained by Vladimir.

The GC heap is divided in (4KiB) pages, each page contains cells of a fixed type called bins. There are bin sizes of 16 (B_16) to 4096 (B_PAGE), incrementing in steps of power of 2 (32, 64, etc.). See Understanding the current GC for more details.

For large contiguous objects (like txt in this case) multiple pages are needed, and that pages contains only one bin of size B_PAGEPLUS, indicating that this object is distributed among several pages.

Now, back with the words array, we have a range of about 3 millions interior pointers into the txt contiguous memory (stored in about 1600 pages of bins with size B_PAGEPLUS). So each time the GC needs to mark the heap, it has to follow this 3 millions pointers and find out where is the beginning of that block to see its mark-state (if it's marked or not). Finding the beginning of the block is not that slow, but when you multiply it by 3 millions, it could get a little noticeable. Specially when this is done several times as the dynamic array of words grow and the GC collection is triggered several times, so this is kind of exponential.

The optimization consist in remembering the last page visited if the bin size was B_PAGE or B_PAGEPLUS, so if the current pointer being followed points to the last visited (cached) page, we can skip this lookup (and all the marking indeed, as we know we already visited that page).

Testing druntime modifications

by Leandro Lucarella on 2008- 11- 30 01:51 (updated on 2008- 11- 30 01:51)
tagged d, dgc, druntime, en, howto, phobos - with 0 comment(s)

Now that we can compile druntime, we should be able to compile some programs that use our fresh, modified, druntime library.

Since DMD 2.021, druntime is built into phobos, so if we want to test some code we need to rebuild phobos too, to include our new druntime.

Since I'm particularly interested in the GC, let's say we want to use the GC stub implementation (instead of the basic default).

We can add a simple "init" message to see that something is actually happening. For example, open src/gc/stub/gc.d and add this import:

private import core.sys.posix.unistd: write;

Then, in the gc_init() function add this line:

write(1, "init\n".ptr, 5);

Now, we must tell druntime we want to use the stub GC implementation. Edit dmd-posix.mak, search for DIR_GC variable and change it from basic to stub:

DIR_GC=gc/stub

Great, now recompile druntime.

Finally, go to your DMD installation, and edit src/phobos/linux.mak. Search for the DRUNTIME variable and set it to the path to your newly generated libdruntime.a (look in the druntime lib directory). For me it's something like:

DRUNTIME=/home/luca/tesis/druntime/lib/libdruntime.a

Now recompile phobos. I have to do this, because my DMD compiler is named dmd2:

make -f linux.mak DMD=dmd2

Now you can compile some trivial D program (compile it in the src druntime directory so its dmd.conf is used to search for the libraries and imports) and see how "init" get printed when the program starts. For example:

druntime/src$ cat hello.d

import core.sys.posix.unistd: write;

void main()
{
    write(1, "hello!\n".ptr, 7);
}

druntime/src$ dmd2 -L-L/home/luca/tesis/dmd2/lib hello.d
druntime/src$ ./hello
init
hello!

Note that I passed my DMD compiler's lib path so it can properly find the newly created libphobos2.a.