What If?
Is C++ hurting Mozilla development? Is it practical to move the Mozilla codebase to another language? Automated rewriting of code can help somewhat, but basic limitations of the language restrict how far rewriting within C++ can go. What if Mozilla invented a mostly-C++-compatible language that solved problems better than C++?
Why!?
Am I crazy? Well, maybe a little. But what are the disadvantages of C++?
- Poor memory safety;
- Lack of good support for UTF strings/iterators;
- Difficulty integrating correctly with garbage-collection runtimes (MMgc), especially if we want to move towards an exact/moving garbage collector with strong type annotations;
- Difficulty integrating exception handling with other runtimes (Tamarin);
- C++ lacks features Mozilla hackers want: see some of roc’s ideas;
- Static analysis: it would be good to statically prevent certain code patterns (such as stack-allocated nsCOMPtr), or to make code behave differently than C++ normally would (for example, changing the behavior of a stack-allocated nsCOMArray versus a heap-allocated one);
- We are hoping to use trace-based optimization to speed JavaScript. If the C++ frontend shares a common runtime with the JS frontend, these optimizations could occur across JS/C++ language boundaries. The Moz2 team brainstormed using a C++ frontend that would produce Tamarin bytecode, but Tamarin bytecode doesn’t really have the primitives for operating on binary objects.
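As a rough illustration of the static-analysis item above: plain C++ can forbid stack allocation, but only class by class, for example by making the destructor private. The sketch below uses a hypothetical `HeapOnly` class (not a Mozilla type) and assumes a C++11 compiler. It shows the kind of per-class contortion that a language-level or analysis-level rule could make unnecessary.

```cpp
#include <cassert>

// Hypothetical class for illustration only; not part of the Mozilla codebase.
class HeapOnly {
public:
    explicit HeapOnly(int value) : value_(value) {}

    // Callers release the object explicitly because the destructor is private.
    void Destroy() { delete this; }

    int value() const { return value_; }

private:
    // Private destructor: a local declaration such as `HeapOnly h(1);` in
    // outside code fails to compile, because the destructor that would run
    // at scope exit is inaccessible there. `new HeapOnly(1)` still works.
    ~HeapOnly() = default;

    int value_;
};

// Small usage helper: creates, reads, and destroys one heap instance.
int RoundTrip(int value) {
    HeapOnly* object = new HeapOnly(value);
    int result = object->value();
    object->Destroy();
    assert(result == value);
    return result;
}
```

The restriction has to be repeated in every class that wants it, and it cannot express the opposite rule (stack only) or "behave differently on the stack" without further tricks.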
On the other hand, C++ or something like it has advantages that we should preserve:
- C++ can be very low level and performant, if coded carefully;
- The vast majority of our codebase is written in C++;
Is it practical?
I don’t know. Some wild speculation is below. Please don’t take it as anything more than a brainstorm informed by a little bit of IRC conversation.
LLVM is a project implementing a low-level bytecode format with strong type annotations, along with a code generator, optimizer, and compiler (static and JIT) that operate on this bytecode. There is a tool which uses the GCC frontend to compile C/C++ code into LLVM bytecode. It is already possible to use the llvm-gcc4 frontend to compile and run Mozilla; it compiles the Mozilla codebase to an intermediate LLVM form and from there into standard binary objects. We would not invent an entirely new language: rather, we would take the GCC frontend and gradually integrate features as needed by Mozilla.
In addition, we could translate SpiderMonkey bytecode into LLVM bytecode for optimization and JITting.
Finally, optimization passes for trace-based optimizations could be added which operate on the level of LLVM bytecode.
Problems…
The most pressing problem from my perspective is that using the G++ frontend requires compiling Mozilla with a mingw-llvm toolchain on Windows. The gcc toolchain on Windows does not use vtables which are MS-COM compatible, which means that Mozilla code which uses or implements MS-COM interfaces will fail. In addition, we would be tied to the mingw win32api libraries, which are not the supported Microsoft SDKs and may not always be up to date, because they are clean-room reverse-engineered headers and libraries. This mainly affects the accessibility code, which makes extensive use of MS-COM and ATL.
- Is it feasible to teach gcc/mingw to use the MS SDK headers and import libraries?
- Is it feasible to teach the mingw compiler about the MSVC ABI so that our code can continue to call and implement MS-COM interfaces?
- Is there another solution so that the accessibility code continues to work?
Questions
- Am I absolutely crazy?
- As a Mozilla developer, what features do you think are most important for ease of Mozilla development?
- Are there solutions other than the g++ frontend that would allow our C++ codebase to evolve to a new language?
- Is this a silly exercise? Would we be spending way too much time on the language and GCC and LLVM and whatnot, and not enough on our codebase? Are there other ways to modernize our codebase gradually but effectively? Is LLVM or a custom language something we should consider for mozilla2, or keep in mind for later releases?
November 5th, 2007 at 9:20 pm
My answers: yep, see my list, nope, yep, yep, maybe, forget it.
Some other problems you didn’t mention — teaching developers our own pet language, adding the burden of downloading/learning new tools to the contributor bar, dealing with the fact that gcc produces much worse code than MSVC.
I admit it’s awfully tempting to go down the new-language path but we cannot go down it alone. I would rather have our requirements inform the design of some new language that stands a chance of broad support and adoption.
Let’s keep pushing the C++ envelope and think about what we would do if and when that opportunity comes.
November 5th, 2007 at 9:23 pm
BTW what I would do is not make the new language compatible with C++’s horrible syntax. Instead I’d want a tool to automatically translate from C++ to the new language, with support for customization so a project’s idiomatic code gets a natural translation.
November 5th, 2007 at 9:41 pm
Have you considered looking at the D programming language? A quick look over your list of problems with C++ yields the following answers for D:
not really sure on the definition of this one so I won’t comment,
all strings are UTF,
GC built-in but compacting might be tricky with the current version,
I’ve seen D’s exceptions integrated seamlessly with Python exceptions and vice-versa,
a few of Robert’s suggestions are there,
no static analysis from within the compiler unless you’re doing static checks on template arguments or want to wait for AST macros,
no idea :P
On the C++ advantages side, D has been shown in benchmarks to perform close to C and C++ (you can write code that’s nearly identical between all three if you’re careful.) Sadly, D does not have large portions of Mozilla written in it; a bug that is sadly not very high on Walter’s TODO list I’m afraid.
Other comments on your post: there is a D compiler that outputs LLVM bytecode in the works, and DMD has native support for Win32 COM. The current experimental 2.x branch also has support for linking to some C++ constructs (global functions and virtual member functions of classes with single inheritance) as well as anything with C linkage.
If nothing else, we would *really* appreciate your input as someone who is seeing the limits of the C++ language, and what things you think are important in a successor.
November 5th, 2007 at 9:42 pm
Ack! My links got eaten. A little note on what syntax is valid for comments would be good. :P
DigitalMars D: http://www.digitalmars.com/d/1.0/ (experimental branch at http://www.digitalmars.com/d/)
D for GCC: http://dgcc.sourceforge.net/
November 5th, 2007 at 10:10 pm
There’s a subproject at LLVM called clang, led by Apple, to develop a standalone C/C++/ObjC front-end to LLVM, bypassing any need for gcc. It is still in its early stages (incomplete C++), but development is in full swing and can certainly benefit from additional support. See http://clang.llvm.org/
IMO, I think mozilla definitely needs a language upgrade for the future. The XPCOM C++ macro hackery really drains you. A language with syntax like Java/C#/D that understands XPCOM would go a long way in alleviating developer stress :p
November 6th, 2007 at 12:48 am
If the usual rules apply, only 20% of the code needs to be super efficient. The rest needs to be easily maintained (no more pointer foo) and more easily accessible to outside developers. ES4 fits the bill.
Any tools, libraries etc. developed would also have a direct benefit for web development. One million loc * 20% = a hell of a lot smaller nightmare.
I just bet that experience with the Tamarin engine will boost efficiency – possibly even to the point where C++ could be replaced completely. If the byte code needs additions then add them.
Switching to a language like D would be a poor move in the long run. If you really wanted to be semi-future proof I would opt for Scala. 80-core processors are already in the Intel fabs. The C++ threading model is totally unsuited to that sort of machine. It is a good bet that ES4 suffers from the same GIL problems as Python. Any browser that could properly utilize such a machine would smoke its competitors.
November 6th, 2007 at 12:50 am
I’m told the problem with most C++ code isn’t the language itself, but rather just how you use it.
I think ROC hit it on the head… having a custom pet language decreases maintainability and access to good tools, rather than increasing it. I think that is the opposite from where you want to go.
Could you just ultra-modernize the style of C++ used, moving further away from C and taking more advantage of templates, more references instead of pointers, and take advantage of the Boost library? I like the idea of leveraging a well regarded and understood library like Boost, because that way you are working with a wide spectrum of others to fill in the missing language features with a library that is broadly understood by many developers. I think it would be nice to remove as many custom type aliases as you can, leverage stuff like C99 and templates more, and generally work to make the code look as much like other generic C/C++ code as you can, thus increasing the approachability.
I do like the idea of leveraging the good work in LLVM though! How hard would it be to adapt to writing modern C++ that could be compiled similarly to Managed C++, targeting LLVM or even something like the Java runtime, Parrot, Mono, CLR, etc? Maybe try to write C++ that is closer to Java/C#/Python/whatever, than it is to C?
Once upon a time when Netscape first open sourced the code I looked into helping a project port it to Java – that project quickly died once we saw the state of the code and digested what exactly would be involved with such an undertaking. Though it sounds like with this automated rewriting effort, it may not have to be that way?
November 6th, 2007 at 2:54 am
I’ve looked at D before, and I’ve been very impressed. I wonder why Peter Wilson is down on it; I don’t see anything specific to D that he points to as a poor move… D could really benefit from having major backers like Mozilla, and Mozilla could benefit from D as well (NS_PRECONDITION -> actual preconditions).
As for how crazy you are… well, welcome to the club.
Right now is the perfect time to raise the discussion. (Actually, six months ago would have been better, but…) There’s no reason a separate NEW_LANGUAGE_BRANCH of the Mozilla 2 repository couldn’t be created from the base and experimented with for a while.
November 6th, 2007 at 3:24 am
Alex: there’s a very good reason: wasted effort.
Peter: you cannot identify 20% of the code as the performance-critical stuff and convert the rest to JS. For one thing, which code is important for performance varies by Web page. For another thing, separating performance-critical from not-performance-critical code into separate modules is impossible. For a third thing, the cost of crossing between languages would kill performance and bloat the code.
November 6th, 2007 at 3:51 am
FWIW, I was also thinking of D when I first read this post. You could probably try to enlist Walter Bright as this would be a mutually beneficial move, and move a language that’s already got some mindshare and is set to gain much more in the direction you want it.
November 6th, 2007 at 4:08 am
Whatever direction and whichever language you choose (and I think moving away from C++ is a step in the right direction), it should compile to some kind of intermediate byte language, like for example LLVM. That enables garbage collection and language independence and can in the long run make replacing individual modules with new ones written in a completely different language not only possible, but probably even easy.
As to separating the codebase into 80% non-efficient-easy-to-maintain and 20% highly-efficient; I don’t know the Mozilla code base well enough to assess how feasible or possible this is. But I think there must be some pretty obvious modules that could easily be written in another language without hurting performance and without affecting rendering time of a web page at all. Everything in “about:config”, all chrome menus, options, bookmarking code, etc. I’m not saying that you could ever reach anywhere near 80% non-C++ code, but moving to an intermediate byte-compiling framework enables this transition; a transition I think should be made softly, gently and on a module-to-module basis.
It’ll be very interesting to see the result of this fascinating discussion.
November 6th, 2007 at 4:39 am
The Mozilla codebase is already split into two (primary) languages. Let us not introduce Yet Another Language.
The Javascript part will be addressed by Tamarin, and the strong advantage is the easy cross-over between mozilla front-end, extensions and web-applications (as shown by Prism).
The C++ part can be gradually improved (the code-rewriting effort to introduce garbage collection, exceptions, etc), but also minimized (moving more code to Javascript) focussing the C++ part on the really performance critical aspects (parsing, scanners, image handling, network/cache, etc).
November 6th, 2007 at 4:39 am
Peter: “It is a good bet that ES4 suffers from the same GIL problems as Python” — are you willing to bet? I’ll save you from losing money: read Jason Orendorff’s great write-up of my JS_THREADSAFE code in SpiderMonkey:
http://developer.mozilla.org/en/docs/index.php?title=SpiderMonkey_Internals:_Thread_Safety
No GIL on this monkey!
/be
November 6th, 2007 at 6:20 am
Inventing, maintaining and promoting a new language sounds like a tremendous time sink. I agree with Peter: if we’re creating a new language (ES4) anyway, why not use it? Migrating code incrementally to ES4 would enable measurement of the performance impact of each change in various test scenarios. It would avoid the risk inherent in a “boil the ocean” approach of the type you suggest. And it would allow us to feast on Brendan’s tasty dogfood, which will have a huge impact in jumpstarting the ES4 ecosystem. Some portion of the codebase will remain in C++ for the foreseeable future, but in the really long term continued improvement in CPU performance and JIT compilation techniques make it plausible that only a small body of rarely touched code (if any) could not be migrated.
If I’m not mistaken, Benjamin, you’ve proposed something along these lines yourself in the past.
November 6th, 2007 at 12:59 pm
Another approach to be considered is something like Vala (http://live.gnome.org/Vala), but perhaps JavaScript (ES4) based, not C#. Also, using XPCOM instead of GObject (at least at first).
IMO a compilable, runtimeless subset of es4 would be nice. Mostly it should handle refcounting/interfaces and strings.
Integration with existing C/C++ code is a must, because a wholesale rewrite will not work (I’m slightly worried about current attempts to introduce garbage collection and exceptions).
November 6th, 2007 at 5:19 pm
Writing a new compiler has too many drawbacks to be feasible. I think many of the problems in the Mozilla codebase stem from poor/obsolete design decisions and code style. While some of these can be fixed in the source, some fixes can reduce code readability, so it would be interesting to see whether build-time code rewriting could feasibly solve some of these problems.
Poor memory safety is a problem, but proper abstractions and interface design should alleviate most of the typical security problems as well as the usual array index out of bounds problems. There are also compiler flags (at least for vc8) that can check some of these accesses.
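To make the "proper abstractions" point concrete, here is a minimal sketch of an array wrapper that never exposes raw storage and checks every index. The class name `CheckedArray` and its methods are assumptions made up for this illustration, not existing Mozilla classes, and the design (a fallback value instead of an exception) is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper for illustration; the name and interface are assumptions.
template <typename T>
class CheckedArray {
public:
    void Append(const T& item) { items_.push_back(item); }

    std::size_t Length() const { return items_.size(); }

    // Copies the element into *out and returns true when index is in range;
    // returns false and leaves *out untouched otherwise.
    bool Get(std::size_t index, T* out) const {
        assert(out != nullptr);  // programming error, not a runtime condition
        if (index >= items_.size()) {
            return false;
        }
        *out = items_[index];
        return true;
    }

    // Returns the element at index, or fallback when index is out of range.
    T ElementOr(std::size_t index, const T& fallback) const {
        return index < items_.size() ? items_[index] : fallback;
    }

private:
    std::vector<T> items_;  // raw storage is never handed to callers
};
```

Callers cannot index past the end by accident, at the cost of one comparison per access. Whether that cost is acceptable in hot paths is a separate question.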
Some UTF string support can be added through a better string class and some compilers have language extensions to help. Strings need an overhaul for Mozilla2 anyways…
C++ is one of the few mainstream languages left that gives nearly absolute freedom to memory allocation and management. Integrating two different memory management systems is going to be painful no matter what language is being used. Designing a new language for the purpose of integrating with MMgc is overkill.
Could cross-language exception handling be tackled with automated code rewriting to explicitly marshal exceptions? I think boost::python does something similar to this.
Many of the features that roc has proposed do require a new language to work. This means a new compiler, but that compiler does not necessarily have to target LLVM or Tamarin. Rather, it could generate C++ code, but code that is provably safe (or at least containing bounds checks and compiler hints). This has the advantage of maintaining compatibility with MS-COM, ATL and the Platform SDK.
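A sketch of what explicit marshalling at a language boundary could look like, assuming C++11. Everything here is hypothetical: the `Status` codes, `CallAcrossBoundary`, and the `HalveEven` example function are invented for illustration, and this is not a description of how boost::python or any Mozilla code actually works.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical status codes standing in for whatever the other runtime expects.
enum class Status { Ok, OutOfMemory, InvalidArgument, Failure };

// Runs fn and converts any escaping C++ exception into a Status value,
// so that nothing propagates past the language boundary.
template <typename Fn>
Status CallAcrossBoundary(Fn fn) noexcept {
    try {
        fn();
        return Status::Ok;
    } catch (const std::bad_alloc&) {
        return Status::OutOfMemory;
    } catch (const std::invalid_argument&) {
        return Status::InvalidArgument;
    } catch (...) {
        return Status::Failure;
    }
}

// Ordinary throwing C++ code (hypothetical example function).
int HalveEven(int n) {
    if (n % 2 != 0) {
        throw std::invalid_argument("n must be even");
    }
    return n / 2;
}

// What a rewriting tool might generate: the same logic, but errors come back
// as a Status and the result travels through an out-parameter.
Status HalveEvenMarshalled(int n, int* result) noexcept {
    assert(result != nullptr);
    return CallAcrossBoundary([&] { *result = HalveEven(n); });
}
```

The wrapper is mechanical, which is what makes it a plausible target for automated rewriting. The hard part a real tool would face is mapping each runtime's exception types onto the other's, which this sketch sidesteps with a fixed enum.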
As long as code could be replaced on a per-module, per-file or (even better) per-class basis, this seems like a gentle transition.
Some of the static analysis (divergent behavior for stack/heap allocated objects of the same class) could be handled by the aforementioned build-time rewriting.
So, in short:
Some problems can be fixed by build-time code rewriting, others will require a new language, but a new backend is not desirable.
November 6th, 2007 at 10:44 pm
On the idea of using a “clean” subset of C++ for the codebase, you might be interested to read “A Rationale for Semantically Enhanced Library Languages”, by Bjarne Stroustrup ( http://www.research.att.com/~bs/SELLrationale.pdf ). He talks about some of the practicalities mentioned here and suggests a solution that doesn’t require all new code/tools/etc.
November 7th, 2007 at 1:32 pm
Better to put effort into the ES4 compiler/runtime and slowly migrate more of the code base from C++ to javascript.
November 8th, 2007 at 9:44 pm
That SELL concept is quite interesting. I can see it being applied to the Gecko codebase quite easily, especially with the ongoing work with Elsa/Oink. It would require no change to existing compilers. Just modify the syntax analyzer to output errors when it sees forbidden syntax. If there are no errors, then feed the source into a standard C++ compiler.
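A toy sketch of that two-stage idea, assuming C++11: first check the source against a list of forbidden constructs, and only hand it to the normal compiler if nothing is flagged. The names `Violation`, `FindForbidden`, and `MayCompile` are made up here, and plain substring matching is only a stand-in; a real checker would work on the parsed syntax tree (as Elsa/Oink do), not on raw text.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One flagged occurrence: the 1-based line number and the rule that matched.
struct Violation {
    std::size_t line;
    std::string token;
};

// Scans source line by line and reports every forbidden token it contains.
// Substring matching is a deliberate simplification of real syntax analysis.
std::vector<Violation> FindForbidden(const std::string& source,
                                     const std::vector<std::string>& forbidden) {
    std::vector<Violation> found;
    std::istringstream in(source);
    std::string text;
    std::size_t lineNo = 0;
    while (std::getline(in, text)) {
        ++lineNo;
        for (const std::string& token : forbidden) {
            assert(!token.empty());  // an empty rule would match every line
            if (text.find(token) != std::string::npos) {
                found.push_back({lineNo, token});
            }
        }
    }
    return found;
}

// Gate: only sources with no violations would be passed to the real compiler.
bool MayCompile(const std::string& source,
                const std::vector<std::string>& forbidden) {
    return FindForbidden(source, forbidden).empty();
}
```

Because the check runs before compilation and changes nothing about the accepted code, existing compilers and debuggers keep working unmodified.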
November 10th, 2007 at 10:36 pm
Mozilla has already reinvented the wheel on so many things. Can’t you at least *look* at other languages that already exist before you start considering making a new one?
Switching to a different language isn’t crazy. If all the currently existing languages suck too much, thinking about making a new one isn’t crazy. But to just immediately start thinking about making a whole new language just for Mozilla, without first looking at some of the tons of languages that are already working and usable, that is crazy.