What If?
Monday, November 5th, 2007
Is C++ hurting Mozilla development? Is it practical to move the Mozilla codebase to another language? Automated rewriting of code can help somewhat, but basic language limitations restrict how far rewriting can go within C++. What if Mozilla invented a mostly-C++-compatible language that solved problems better than C++?
Why!?
Am I crazy? Well, maybe a little. But what are the disadvantages of C++?
- Poor memory safety;
- Lack of good support for UTF strings/iterators;
- Difficulty integrating correctly with garbage-collection runtimes (MMgc), especially if we want to move towards an exact/moving garbage collector with strong type annotations;
- Difficulty integrating exception handling with other runtimes (Tamarin);
- C++ lacks features Mozilla hackers want: see some of roc’s ideas;
- Static analysis: it would be good to statically prevent certain code patterns (such as stack-allocated nsCOMPtr), or to make code behave differently than it normally would in C++ (for example, changing the behavior of a stack-allocated nsCOMArray versus a heap-allocated one);
- We are hoping to use trace-based optimization to speed up JavaScript. If the C++ frontend shares a common runtime with the JS frontend, these optimizations could occur across JS/C++ language boundaries. The Moz2 team brainstormed using a C++ frontend that would produce Tamarin bytecode, but Tamarin bytecode doesn’t really have the primitives for operating on binary objects.
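To make the static-analysis item concrete, here is a minimal sketch of how far plain C++ can go in forbidding stack allocation of a type. The class `HeapOnly` and the function `heap_only_demo` are hypothetical names invented for illustration; they are not Mozilla code, and this is not how nsCOMPtr is implemented.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class (not Mozilla code): instances may only live on the heap.
class HeapOnly {
public:
    // The only way to obtain an instance is through this factory.
    static HeapOnly* Create(int value) { return new HeapOnly(value); }

    // Members may call the private destructor, so destruction goes through here.
    void Destroy() { delete this; }

    int value() const { return value_; }

private:
    explicit HeapOnly(int value) : value_(value) {}
    ~HeapOnly() {}  // private: outside code cannot destroy an automatic variable
    int value_;
};

// A declaration such as `HeapOnly h(1);` outside the class is rejected by the
// compiler, because the constructor and destructor are both inaccessible.
static_assert(!std::is_destructible<HeapOnly>::value,
              "HeapOnly must not be destructible from outside the class");

// Small demonstration: allocate on the heap, read the value, clean up.
int heap_only_demo() {
    HeapOnly* obj = HeapOnly::Create(42);
    int v = obj->value();
    obj->Destroy();
    return v;
}
```

Calling `heap_only_demo()` returns 42. The restriction has to be baked into each class by hand, and there is no way to say "behave one way on the stack and another way on the heap", which is the kind of rule the bullet above would like the language or a checker to express directly.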
On the other hand, C++ or something like it has advantages that we should preserve:
- C++ can be very low level and performant, if coded carefully;
- The vast majority of our codebase is written in C++.
Is it practical?
I don’t know. Some wild speculation is below. Please don’t take it as anything more than a brainstorm informed by a little bit of IRC conversation.
LLVM is a project implementing a low-level bytecode format with strong type annotations, along with a code generator, optimizer, and compiler (static and JIT) that operate on this bytecode. There is a tool which uses the GCC frontend to compile C/C++ code into LLVM bytecode. It is already possible to use the llvm-gcc4 frontend to compile and run Mozilla; it compiles the Mozilla codebase to an intermediate LLVM form and from there into standard binary objects. We would not invent an entirely new language: rather, we would take the GCC frontend and gradually integrate features as Mozilla needs them.
In addition, we could translate SpiderMonkey bytecode into LLVM bytecode for optimization and JITting.
Finally, optimization passes for trace-based optimizations could be added which operate on the level of LLVM bytecode.
Problems…
The most pressing problem from my perspective is that using the G++ frontend requires compiling Mozilla with a mingw-llvm toolchain on Windows. The gcc toolchain on Windows does not use vtables which are MS COM compatible, which means that Mozilla code which uses or implements MS COM interfaces will fail. In addition, we would be tied to the mingw win32api libraries, which are not the supported Microsoft SDKs and may not always be up to date, because they are clean-room reverse-engineered headers and libraries. This mainly affects the accessibility code, which makes extensive use of MS COM and ATL.
- Is it feasible to teach gcc/mingw to use the MS SDK headers and import libraries?
- Is it feasible to teach the mingw compiler about the MSVC ABI so that our code can continue to call and implement MS COM interfaces?
- Is there another solution so that the accessibility code continues to work?
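To illustrate why vtable layout matters here, the sketch below spells out an interface as an explicit table of function pointers, which is roughly how COM headers describe interfaces to C code. Written this way, the layout is fixed by the struct definitions rather than by a particular C++ compiler's vtable scheme. Everything in it (`ICounter`, `ICounterVtbl`, `Counter`, `CreateCounter`, `counter_demo`) is a hypothetical name made up for this example; real MS COM interfaces also pin down the calling convention and return HRESULTs, both of which are left out so the code stays portable. It is an illustration of the problem, not a proposal that has been tried on Mozilla.

```cpp
#include <cassert>

// Hypothetical interface, loosely modeled on a C-style COM declaration.
struct ICounter;

// The "vtable" is an ordinary struct, so its layout does not depend on
// how a given C++ compiler arranges virtual function tables.
struct ICounterVtbl {
    int (*AddRef)(ICounter* self);
    int (*Release)(ICounter* self);
    int (*Increment)(ICounter* self, int amount);
};

struct ICounter {
    const ICounterVtbl* vtbl;  // first member: pointer to the function table
};

// Concrete object. The interface is the first member, so a pointer to the
// interface and a pointer to the whole object share the same address.
struct Counter {
    ICounter iface;
    int refs;
    int total;
};

static Counter* FromIface(ICounter* self) {
    return reinterpret_cast<Counter*>(self);
}

static int Counter_AddRef(ICounter* self) { return ++FromIface(self)->refs; }

static int Counter_Release(ICounter* self) {
    Counter* c = FromIface(self);
    int remaining = --c->refs;
    if (remaining == 0) delete c;  // last reference gone: free the object
    return remaining;
}

static int Counter_Increment(ICounter* self, int amount) {
    Counter* c = FromIface(self);
    c->total += amount;
    return c->total;
}

static const ICounterVtbl kCounterVtbl = {
    Counter_AddRef, Counter_Release, Counter_Increment
};

// Factory: returns the interface pointer with a reference count of one.
ICounter* CreateCounter() {
    Counter* c = new Counter;
    c->iface.vtbl = &kCounterVtbl;
    c->refs = 1;
    c->total = 0;
    return &c->iface;
}

// Callers only ever go through the table, never through C++ virtual dispatch.
int counter_demo() {
    ICounter* counter = CreateCounter();
    counter->vtbl->Increment(counter, 2);
    int total = counter->vtbl->Increment(counter, 3);
    counter->vtbl->Release(counter);
    return total;
}
```

Calling `counter_demo()` returns 5. The cost of this style is obvious: every call site becomes more verbose, and ATL-based code would not fit it without substantial rewriting, so it does not answer the questions above on its own.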
Questions
- Am I absolutely crazy?
- As a Mozilla developer, what features do you think are most important for ease of Mozilla development?
- Are there solutions other than the g++ frontend that would allow our C++ codebase to evolve to a new language?
- Is this a silly exercise? Would we be spending way too much time on the language and GCC and LLVM and whatnot, and not enough on our codebase? Are there other ways to modernize our codebase gradually but effectively? Is LLVM or a custom language something we should consider for Mozilla 2, or keep in mind for later releases?