Call for Help: Boehm+jemalloc

Wednesday, September 24th, 2008

At the Firefox summit we decided to change tack on XPCOMGC and try to use Boehm instead of MMgc. Overall I think this was a really good decision. With help from Graydon, I even have some Linux builds that use Boehm under the hood (most memory is not considered collectable, only string buffers are collected objects at this point).

Unfortunately, while Boehm is a pretty good collector, it doesn’t do so well at memory allocation and fragmentation. Heap usage is between 1.5x and 2x that of standard Firefox using jemalloc. What I really want is a combination of jemalloc and Boehm, taking the best features from each:

Boehm Features:

  • Fast and threadsafe conservative collector
  • Smart rooting of all thread stacks and static data
  • Incremental marking with hardware write barriers1
  • Option for parallel collection2
  • Ability to intermingle collected and non-collected memory

jemalloc features:

  • Better overall memory usage, primarily due to lower fragmentation
  • Very tight and well-performing allocator

Help Wanted

I’m looking for somebody who’s willing to painstakingly combine the best of these two allocators: either port the jemalloc low-fragmentation design to Boehm, or port the Boehm collection mechanism to the jemalloc allocator. If you’re interested, please contact me. Getting a solution to this problem really blocks any serious plans for further work on XPCOMGC.

Notes

  1. The key word is hardware. The MMgc solution failed because altering our codebase to have correct programmatic write barriers was going to involve boiling the ocean. And even with smart pointers, a standard MMgc write barrier involves a lot of overhead.
  2. In Boehm, parallel collection doesn’t work with most incremental collection, and so we may not actually decide to use it; avoiding large pauses with incremental collection is more important.

Patching the Windows CRT

Thursday, January 10th, 2008

Stuart has been working on using an allocator for Mozilla which has much better performance characteristics, especially with memory fragmentation and heap growth over time. The allocator he chose is jemalloc, the default allocator for the FreeBSD libc. On Linux, intercepting and replacing malloc is fairly easy, because of the way dynamic symbol loading works. On Windows, however, it is difficult or impossible to intercept and redirect calls to malloc to a custom allocator. So instead of trying to hook to a prebuilt CRT, I spent most of today hacking the Windows C runtime (CRT), replacing the default allocator with jemalloc.

The Windows CRT sources come with VC8 professional edition so any licensed user may hack on them and redistribute the result as part of a larger program. It comes with a mostly-working nmakefile1: I had to disable the code which builds the managed-code CRT because apparently I don’t have the right .NET headers installed. Getting the new jemalloc.c file to build wasn’t that hard, either: it required a few #defines, typedefs, and disabled warnings, but nothing serious. The hardest part about replacing It was figuring out what parts of the original CRT were heap-related and removing them correctly. It was a wild ride, but I think that I have a build of the Windows CRT that works… at least small programs like xpidl and shlibsign work.

Unfortunately, according to the EULA2 I am not allowed to redistribute this modified CRT by itself. So the only way you can get it is by distributing it with a Firefox build. Also, I’m not allowed to post the patch queue which I used to develop the custom CRT, because those patches may contain copyrighted code in context. Do any of my readers know of a format that will alter a set of files according to a set of instructions without the instructions revealing the contents of the original files? I would really love it if Microsoft would release their C runtime code under a liberal open-source license… can someone suggest a good person to contact at Microsoft?

Stuart will have some builds posted soon, once a few kinks get ironed out.

For Mozilla2 we’re probably going to push the solution to an intermediary library: we will have a single allocator library which is used for both garbage-collected (managed) and explicitly allocated/freed (unmanaged) memory. We will switch back to the standard CRT, but we will try to avoid using the standard CRT allocator at all. See the “space management” thread on the tamarin-devel mailing list (December and January) for some background discussion.

  1. nmake is one of the suckiest build systems on the planet. I could get better results with a .vbs.
  2. From the EULA, section 3.1: …you agree: (i) except as otherwise noted in Section 2.1 (Sample Code), to distribute the Redistributables only in object code form and in conjunction with and as a part of a software application product developed by you that adds significant and primary functionality to the Redistributables…