Patching the Windows CRT

Stuart has been working on using an allocator for Mozilla which has much better performance characteristics, especially with memory fragmentation and heap growth over time. The allocator he chose is jemalloc, the default allocator for the FreeBSD libc. On Linux, intercepting and replacing malloc is fairly easy, because of the way dynamic symbol loading works. On Windows, however, it is difficult or impossible to intercept and redirect calls to malloc to a custom allocator. So instead of trying to hook to a prebuilt CRT, I spent most of today hacking the Windows C runtime (CRT), replacing the default allocator with jemalloc.

The Windows CRT sources come with VC8 Professional Edition, so any licensed user may hack on them and redistribute the result as part of a larger program. They come with a mostly-working nmakefile [1]: I had to disable the code which builds the managed-code CRT because apparently I don’t have the right .NET headers installed. Getting the new jemalloc.c file to build wasn’t that hard, either: it required a few #defines, typedefs, and disabled warnings, but nothing serious. The hardest part about replacing the allocator was figuring out what parts of the original CRT were heap-related and removing them correctly. It was a wild ride, but I think that I have a build of the Windows CRT that works… at least small programs like xpidl and shlibsign work.

Unfortunately, according to the EULA [2], I am not allowed to redistribute this modified CRT by itself. So the only way you can get it is bundled with a Firefox build. Also, I’m not allowed to post the patch queue which I used to develop the custom CRT, because those patches may contain copyrighted code in context. Do any of my readers know of a format that will alter a set of files according to a set of instructions without the instructions revealing the contents of the original files? I would really love it if Microsoft would release their C runtime code under a liberal open-source license… can someone suggest a good person to contact at Microsoft?

Stuart will have some builds posted soon, once a few kinks get ironed out.

For Mozilla2 we’re probably going to push the solution to an intermediary library: we will have a single allocator library which is used for both garbage-collected (managed) and explicitly allocated/freed (unmanaged) memory. We will switch back to the standard CRT, but we will try to avoid using the standard CRT allocator at all. See the “space management” thread on the tamarin-devel mailing list (December and January) for some background discussion.
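
As a rough sketch of the single-allocator idea (not the actual Mozilla 2 design), every allocation, including helpers such as a strdup replacement, can be routed through one facade so that an allocation and its matching free never land in different heaps. The names `app_malloc`, `app_free`, `app_strdup`, and `app_live_blocks` are hypothetical, and the backend below simply wraps the standard `malloc` as a stand-in for a real allocator such as jemalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator facade: application code calls only the app_*
   functions below and never the CRT's malloc/free directly. */
typedef struct {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
} app_allocator;

/* Number of blocks handed out and not yet released. */
static size_t live_blocks = 0;

/* Stand-in backend: a real build would call into jemalloc here. */
static void *backend_alloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        live_blocks++;
    }
    return p;
}

static void backend_release(void *ptr) {
    if (ptr == NULL) {
        return;
    }
    live_blocks--;
    free(ptr);
}

static const app_allocator backend = { backend_alloc, backend_release };

void *app_malloc(size_t size) { return backend.alloc(size); }

void app_free(void *ptr) { backend.release(ptr); }

/* strdup replacement that allocates through the facade, so the result
   must be released with app_free rather than the CRT's free. */
char *app_strdup(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = (char *)app_malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

size_t app_live_blocks(void) { return live_blocks; }
```

A caller would pair `app_strdup("foo")` with `app_free`, never with the CRT's `free`, so both ends of the allocation go through the same backend.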

  1. nmake is one of the suckiest build systems on the planet. I could get better results with a .vbs.
  2. From the EULA, section 3.1: …you agree: (i) except as otherwise noted in Section 2.1 (Sample Code), to distribute the Redistributables only in object code form and in conjunction with and as a part of a software application product developed by you that adds significant and primary functionality to the Redistributables…

28 Responses to “Patching the Windows CRT”

  1. Matthew Gregan Says:

    You might be able to use xdelta to generate a set of patches that can be distributed without infringing the EULA. From memory, xdelta patches only contain copy and insert instructions. They might contain very small amounts of context for changes under the minimum match length, but the patch file is binary and the context is quite small, so it might still be safe.

  2. pd Says:

    Ben what is the conclusion to your story? Will Firefox on Windows need to be shipped with your modified CRT build? Or will Windows now miss out on all the improvements to Firefox performance thanks to a Microsoft legal hiccup?

  3. jmdesp Says:

    @pd : The conclusion is that during the Moz2 timeframe the code will be rewritten to never use the default allocator, and to be able to ship with the standard CRT, but use the optimized allocator.

    But meanwhile it would be good to have people test the result with this new allocator, and really evil to distribute only the pre-compiled CRT.

    Also it would be good to find a real solution to that for another reason. There are a few fatal error cases where the CRT will directly call Dr Watson, without letting the program override to use the Breakpad error reporting instead.

  4. Ian M Says:

    So what if anything of this will make Firefox 3?

  5. pdr Says:

    “Do any of my readers know of a format that will alter a set of files according to a set of instructions without the instructions revealing the contents of the original files?”

    diff can still generate ed scripts. They just list the line numbers of the text that is to be replaced, so they’re extremely fragile but legally safe.

  6. movl Says:

    Just skip CRT and call ntdll.dll directly. malloc in CRT uses (iirc) NtVirtualAlloc and friends. 9x/ME isn’t supported anymore anyway.

  7. Paul Betts Says:

    Hi, I’m a developer in Windows and I came across your article – I emailed the maintainer of CRT about your issue – while he thinks your claims of “superior malloc” are dubious ;), he also mentions that if you link your malloc as a lib *before* the CRT (i.e. make sure to turn on ‘ignore default libs’ and explicitly include the CRT), you’ll get what you want, and can redistribute this lib without problems.

    I also vaguely remember some game developers wanting to do this as well, you may want to search through Gamasutra to see if you can find an article on it.

  8. Benjamin Smedberg Says:

    movl: the point is to replace the default Windows allocator (heap manager) with something better.

    Paul: you can direct calls to malloc/realloc/free to use a different allocator by inserting that allocator library before the standard one in the link line. Unfortunately, this is not sufficient. The CRT internally still calls its own version of malloc so that if you have a pattern like the following you end up crashing due to mismatched allocators:

    char *p = strdup("foo"); // uses CRT allocator
    free(p); // uses intercepted free... mismatched

  9. Neil Says:

    Presumably the 1.9 codebase has too many mismatched allocators for you to simply switch the obvious ones e.g. NS_Alloc/NS_Free?

  10. Mark S Says:

    There was no mention of an allocator on OS X. Is that being replaced as well?
    Perhaps it’s a silly question and I don’t know it.

  11. Benjamin Smedberg Says:

    Neil, we use malloc/free for many of our “non-object” allocations… having multiple allocators running would probably cause more fragmentation.

    Mark S, Stuart did his allocator benchmarking on OSX, I think, so I’m pretty sure there will be OSX support… but I’m not especially involved with it.

    Ian M, the goal is to do this for Firefox 3. The Mozilla 2 work is another step, because it involves adding garbage-collection smarts.

  12. movl Says:

    “the point is to replace the default Windows allocator (heap manager) with something better”

    What do you mean with that? There is no “default” allocator under Windows. HeapAlloc calls RtlAllocateHeap in ntdll, VirtualAlloc calls NtAllocateVirtualMemory. (GlobalAlloc and LocalAlloc both use RtlAllocateHeap too.) MSVCRT uses HeapAlloc. You can make your own malloc that uses RtlAllocateHeap, but you shouldn’t try to replace or patch the Run-Time Library or IFS where the allocation is actually done (see ZwAllocateVirtualMemory).

    I don’t know much about visual studio, but maybe you can change the .def files so the entry points for malloc/free are looked up in your own dll rather than msvcrt*.dll? Removing malloc/free under EXPORTS and linking in your own library might be all you need to do.

  13. ivank Says:

    I think there should be a way to defer the CRT code’s dependency on malloc to the link stage, so that the CRT can then be linked against jemalloc.

  14. Benjamin Smedberg Says:

    movl, RtlAllocateHeap is the Windows default allocator, and we are bypassing it. jemalloc manages the heap itself, allocating memory pages using VirtualAlloc.

  15. archer Says:

    Maybe it’s a weird idea, but could you just use another, open-source implementation of CRT?

  16. Benjamin Smedberg Says:

    archer, I don’t know of a free Windows CRT implementation that compiles with MSVC… do you?

  17. kad77 Says:

    It’s still a little sad that the most popular open source app on windows (firefox) won’t build with the only free, open source, native complete compiler system for windows ( http://mingw.org/ ). At least MAME puts gcc-win32 through its paces.

    I wish the mozilla foundation would swing a few bucks / hands over to the MSYS/mingw team, and get the mozilla codebase off the proprietary microsoft development environment. Is there a good summary of why this endeavor is not undertaken?

    Anyways, happy hacking. Go mozilla! Go mingw!

  18. archer Says:

    I’ve found several implementations, but all of them look immature.
    http://wcrt.sourceforge.net/
    http://synesis.com.au/software/cruntiny/
    http://mingwacr.sourceforge.net/
    It seems that they couldn’t help.

  19. ivank Says:

    Next crazy idea.
    Statically link xul.dll (or it could be separate module) with MS CRT.
    Other mozilla modules should import CRT symbols from xul.dll.

  20. Benjamin Smedberg Says:

    kad77, the mozilla codebase does compile with mingw, at least mostly and most of the time. But gcc produces code that is far inferior to MSVC, and it doesn’t have a compatible vtable layout, so we’ll continue to use MSVC for releases for the foreseeable future.

  21. Ted Mielczarek Says:

    For anyone who’s still interested, you can follow the work here: https://bugzilla.mozilla.org/show_bug.cgi?id=407459

  22. jemalloc now on the trunk « pavlov.net Says:

    […] changes to jemalloc that we wanted and Ted for his days and days of crazy build stuff.  Thanks to Benjamin for his work getting the CRT building initially — we wouldn’t be here without it.  […]

  23. Todd Whiteman Says:

    I am curious as to how this will affect (if at all) Mozilla extensions that use binary components (and also plugins) that require additional libraries. Will these additional libraries need to be compiled and linked with the custom mozilla crt, or will the extension/plugin need to package its own additional msvcxx80 DLLs in order to satisfy the library requirements?

  24. Shivanand Sharma Says:

    building with jemalloc enabled requires Visual Studio 2005 with SP1.

    i had the build fail endlessly until I realised this.

  25. jemalloc « /usr/bin/blog Says:

    […] then ended up here […]

  26. La billeterie » Blog Archive » jemalloc et CRT custom dans Mozilla Says:

    […] the original post on the problem: http://benjamin.smedbergs.us/blog/2008-01-10/patching-the-windows-crt/ […]

  27. baczek Says:

    could you please post a short guide how to port jemalloc to a different app without breaking any licenses (interested only in windows)? I think there are several open source projects that could benefit greatly from jemalloc (e.g. games which don’t use their own allocators yet), but there are no instructions how to do this anywhere.

  28. Fighting against CRT heap and winning | What your mother never told you about graphics development Says:

    […] there does not seem to be a clean way to do it. In fact, it seems that the only known solution is to modify the CRT code. I work with statically linked CRT, so this would mean less distribution problems, but more […]
