D and open development model

by Leandro Lucarella on 2009-10-15 20:09 (updated on 2009-10-15 20:09)
tagged compiler, d, development model, dmd, druntime, en, phobos, software - with 6 comment(s)

Warning

Long post ahead =)

I'm very glad that yesterday DMD had its first releases (DMD 1.050 and DMD 2.035) with a decent revision history. It took Walter Bright some time to understand how the open source development model works, and I think he still has a lot more to learn, but I now have some hope about the future of D.

Not long ago, neither Phobos, DMD nor Druntime had revision control. Druntime didn't even exist, which split D1 in two because of the Phobos vs Tango dichotomy. The DMD back-end sources were not available either, and Walter Bright was the only person writing stuff (sometimes not because people didn't want to, but because he was too anal retentive to let them ;). It was almost impossible to make patches back then (your only chance was hacking GDC, which is pretty hard).

Now I can say that DMD, Phobos and Druntime have full source availability (the DMD back-end is not free/libre though), and almost all parts of DMD have their sources published in a source control system. The core team has been expanded and, even though Walter Bright is still in charge, at least 3 developers are now very committed to D: Andrei Alexandrescu (in charge of Phobos), Sean Kelly (in charge of Druntime) and Don Clugston (squashing DMD bugs at full speed, especially in the back-end). Other people are contributing patches on a regular basis. About 72 patches were submitted to bugzilla before DMD was distributed with full source (72 patches in ~10 years); since then, 206 patches have been submitted (that is, 206 patches in less than 8 months).

But even with this great improvement, there is still much left to do (and I'm talking only about the development model). This is a short list of what I think is necessary to keep moving towards a more open development model:

Releases

The release process should be improved. Other people and I have been suggesting release candidates. They would allow people to test new releases and find any regressions. As things are now, releases are not much different from a nightly build, except that you don't get one every night :). People get very frustrated when they download a new version of the compiler and things stop working, and this holds back front-end updates in other compilers, like LDC (which is frozen at 1.045 because of the regressions found in the next 5 versions).

I think Walter Bright is suffering from premature releasing too. Releases come out of nowhere, when nobody expects them. Nobody knows when a new compiler version will be released. I think that hurts the language's reliability.

I think releases should be more predictable. A release schedule (even if it is not very accurate, as in many other open source projects) gives you some peace of mind.

Peer review

Even though commits in DMD are fairly small now, I think they are far from ideal. It is very common to see unrelated changes in a commit (the classic example is the compiler version number being bumped in a bug fix). See revision 214 for example: the compiler version is bumped and there are some changes to the new JSON output, totally unrelated to bug 3401, which it is supposed to fix; or revision 213, which announces the release of DMD 1.050 and DMD 2.035 and introduces a bunch of changes whose purpose is anyone's guess (well, they look like the introduction of the new type T[new], but that's not even documented in the release changelog :S). This is bad for several reasons:

Reviewing a patch with unrelated changes is hard.
If you want to fold in an individual patch (let's say, the LDC guys want to fold in a bug fix), you have a lot of junk to take care of.
If you want to do some sort of bisection to find a regression, you still have to figure out which group of related changes introduced the regression.

I'm sure there are more...

Commit messages lack a good description of the problem and the solution. Most commit messages in DMD are just "bugzilla N", so you have to go to the bugzilla entry to find out what it's all about. Don's patches, in contrast, usually come with very good and juicy information about the causes of the bug and why the patch fixes it (see an example). That is a good commit message. You can learn a lot about the code by reading well commented patches, which can lead to more contributions in the future.

Commits in Phobos can be even worse. The commits with a message "bugzilla N" are usually the good ones. There are 56 commits that have "minor" as the commit message. Yes, just "minor". That's pretty useless; it's very hard to review a patch when you don't know what it is supposed to do. Commit messages are the basis of peer review, and peer review is the basis of high quality code.

So I think that D developers should focus a lot more on commit messages. I know it can sound silly at first, but I think it would be a huge gain for very little effort.

Besides this, commits should be mailed to a newsgroup or mailing list to ease peer review. Right now it's a little hard to comment on a commit: you have to post the comment in the D newsgroup or send it by private e-mail to the author. The former is not that bad, but it's not easy to include context, and people reading the comment will probably have to open a browser and search for the commit in question. This clearly makes peer review more difficult, when the ideal would be to encourage it. Private mail is simply wrong because other people can't see the comments.

Source control and versioning

This one is tightly related to the previous two topics. Using a good DVCS can help a lot too. Subversion has a lot of problems with branching, which makes releases harder too (having a branch for each release is very painful). It is bad for commit messages too: since there is no real difference between branches and directories, every commit is duplicated (changes for both DMD 1 and 2 are included). It's not easy to cherry-pick single commits either, and you can't fix your commits if you messed up, which leads to a lot of commits in the style of "Woops! Fix the typo in the previous commit.".

I'm sure both the release process and peer review can be greatly improved by switching to a good DVCS.

Easy branching can also lead to a faster evolving and more reliable language. Yes, both are possible with branches. Right now there are 2 branches: stable (D1) and experimental (D2). D1 is almost frozen and people are showing less and less interest in it as it gets older, while D2 is too unstable for real use. Having something in between could be really helpful. For example, it has been announced that the concurrency model proposed by Bartosz Milewski will not be part of D2 because there is not enough time to implement it: D2 should be released fairly soon, since Andrei Alexandrescu is writing a book with a deadline and the language has to be finalized by the time the book is published.

So concurrency (like AST macros) is delayed to D3. D2 is more than 2 years old, so one should expect that D3 will not be available in less than 5 years from now (assuming D2 would take 2.5 years and D3 would take the same). This might be too much time.

I think the language should adopt a model closer to Python's, where a minor language version (with backward compatible improvements) is released every 1 to 1.5 years. The last major version took about 8 years, but considering how many new features Python included in minor versions, that's not a big issue. The last major version was mostly a clean-up of old and nasty stuff, not huge changes to the language.

Licensing

I think the DMD back-end should have a better license. A personal-use-only license is simply not enough for the reference implementation of a language that wants to hit the mainstream. If you plan to do business with it, not being able to patch the compiler when you need to and distribute it is not acceptable.

This is for the sake of DMD only, because other compilers (like LDC and GDC) are fully free/libre.

Conclusion

Some of the things I mention are really hard to change, as they modify how people work and imply learning new tools. But others are fairly easy, and can be done progressively (like providing release candidates and improving commits and commit messages).

I hope Walter Bright & Co. keep walking the openness road =)

Comment #0

by Alexander Pánek on 2009-10-16 06:48

Very nice heads-up! :)

Comment #1

by digited on 2009-10-16 08:42

Nice review!

Comment #2

by tweak on 2009-10-16 14:30

Hello, Leandro, nice review. There's still room for improvement in the development process, though, like the toolchain issues with the old OPTLINK linker, for example. Some people won't even consider another compiler implementation mature enough until it can successfully compile most usable libraries, like Tango. I think that list should be extended to even more projects, like QtD, GUI libraries, graphics engines and so on. QtD still has some linking issues with the OPTLINK/DMD toolchain on Windows; some other projects stumble upon different toolchain bugs as well. So we need some feedback process to pinpoint those toolchain bugs and to schedule fixes for them too, to get all major D libraries compiled successfully with every available toolchain. LDC could have some issues with SEH exception handling on Windows, too. The import/module subsystem has some issues as well (which you may or may not hit in random cases, like in QtD, which has a circular reference issue with modules ported from C++). Maybe the benchmarks of GC implementations you provided on your webpage could be considered issues of that kind, too :)

It would be wise to get more feedback from library writers, compiler writers, or random porting/benchmark guys like you, so that compilers, toolchains and libraries become freely interchangeable :) Team0xf has succeeded in getting their patches for DMD/xfLinker folded back in, so I suppose that's only the beginning :)

I'm pretty impressed with the current progress, though. Thanks for your post.

Comment #3

by Leandro Lucarella on 2009-10-16 15:22

tweak on 2009-10-16 11:30:

Hello, Leandro, nice review. There's still a room for improvements for development process, though: like toolchain issues with old OPTLINK linker [...]

I don't really think the toolchain is a development process issue per se, nor are the circular import problems, but I agree that they are big obstacles to having a good, reliable language, as are improving GDB support and making other tools support D mangling, for example. But I think a truly open development model is more important than that, because it's what attracts people to contribute, encouraging them to fix all those problems. If they had to be fixed by a single person, they would stay broken for a long time. For me, openness is the inflection point that can make D a really mainstream language.

I'm sorry for not mentioning Windows-specific problems, but since I don't use Windows I can't talk about them very accurately.

Comment #4

by kl on 2009-11-01 02:07

I've tried D a few times, and every time it ends with frustration and fighting with the compiler and libraries, rather than getting work done.

I wholeheartedly agree, D needs reliability. It's just unacceptable that I download a release from the official site and spend hours chasing weird bugs in hello-world code, just to be informed afterwards that it's a known bug.

Within the course of a single day I've found bugs in both dmd1 and dmd2. I'm still trying to get LDC to compile and GDC to see include files. Arrrrgghh!

Comment #5

by Leandro Lucarella on 2009-11-01 20:46

Same experience here: every time I want to do something useful with D, I end up reporting a couple of bugs, no matter how simple the thing I want to do is.
