Archive for the 'Mozilla' Category

Patching the Windows CRT

Thursday, January 10th, 2008

Stuart has been working on using an allocator for Mozilla which has much better performance characteristics, especially with memory fragmentation and heap growth over time. The allocator he chose is jemalloc, the default allocator for the FreeBSD libc. On Linux, intercepting and replacing malloc is fairly easy, because of the way dynamic symbol loading works. On Windows, however, it is difficult or impossible to intercept and redirect calls to malloc to a custom allocator. So instead of trying to hook to a prebuilt CRT, I spent most of today hacking the Windows C runtime (CRT), replacing the default allocator with jemalloc.

The Windows CRT sources come with VC8 professional edition so any licensed user may hack on them and redistribute the result as part of a larger program. It comes with a mostly-working nmakefile1: I had to disable the code which builds the managed-code CRT because apparently I don’t have the right .NET headers installed. Getting the new jemalloc.c file to build wasn’t that hard, either: it required a few #defines, typedefs, and disabled warnings, but nothing serious. The hardest part about replacing It was figuring out what parts of the original CRT were heap-related and removing them correctly. It was a wild ride, but I think that I have a build of the Windows CRT that works… at least small programs like xpidl and shlibsign work.

Unfortunately, according to the EULA2 I am not allowed to redistribute this modified CRT by itself. So the only way you can get it is by distributing it with a Firefox build. Also, I’m not allowed to post the patch queue which I used to develop the custom CRT, because those patches may contain copyrighted code in context. Do any of my readers know of a format that will alter a set of files according to a set of instructions without the instructions revealing the contents of the original files? I would really love it if Microsoft would release their C runtime code under a liberal open-source license… can someone suggest a good person to contact at Microsoft?

Stuart will have some builds posted soon, once a few kinks get ironed out.

For Mozilla2 we’re probably going to push the solution to an intermediary library: we will have a single allocator library which is used for both garbage-collected (managed) and explicitly allocated/freed (unmanaged) memory. We will switch back to the standard CRT, but we will try to avoid using the standard CRT allocator at all. See the “space management” thread on the tamarin-devel mailing list (December and January) for some background discussion.

  1. nmake is one of the suckiest build systems on the planet. I could get better results with a .vbs.
  2. From the EULA, section 3.1: …you agree: (i) except as otherwise noted in Section 2.1 (Sample Code), to distribute the Redistributables only in object code form and in conjunction with and as a part of a software application product developed by you that adds significant and primary functionality to the Redistributables…

Firefox 1.0.7!?

Saturday, December 8th, 2007

I used my father-in-law’s laptop recently. I had installed Firefox on it for him when he bought it, and I was happy to see it was still the default browser. I thought it was a little odd that it opened new windows by default instead of new tabs, and then I had a terrible suspicion and checked “Help -> About Firefox” and discovered to some dismay that he was still running Firefox 1.0.7.

Needless to say I installed an up-to-date version immediately. I wonder how many other people might have no clue that their applications are incredibly out of date. I also wonder why his security software (virus scanner/firewall) wouldn’t warn him about such an important aspect of system security.

Merging with Mercurial Queues

Friday, November 30th, 2007

I’ve spent most of the last 2.5 days merging code that lives in mercurial patch queues. It’s been an interesting (and frustrating) experience, but I’ve learned a lot, so I put together this tutorial on maintaining/merging/updating a mercurial patch queue.

The XPCOMGC patch queue

The XPCOMGC patch queue was actually 60+ patches long, but for example purposes let’s simplify to 7. What makes it tricky is that patches “after” the automated rewrites often won’t apply cleanly to the tree before the automatic rewrite. This means that moving patches from “after an automatic rewrite” to “before an automatic rewrite” requires a merge algorithm. Imagine a patch series like this:

manual-fixbugA
manual-fixbugB
automatic-rewrite-comptrs
automatic-rewrite-classhierarchy
automatic-rewrite-addrefs
manual-fixbugA-patch2
manual-fixbugC

You may be wondering about “manual-fixbugA-patch2″… why didn’t I just go back to manual-fixbugA and change it? This is because, once you’ve applied an automatic rewrite, popping/editing/pushing requires not only a complete tree rebuild (20 minutes), but also rebuilding the automated patches (hours). Instead, I create a “patch2” (and sometimes a “patch3” and “patch4”) and later fold the patches together.

Reordering a patch queue

I want to clean up this patch queue and update the “base tree” on which I’m applying patches. To do this, I’m first going to move all the “manual” patches before the automatic rewrites. Some of the patches may not apply cleanly in their new positions, so we prepare for merging. 64

$ hg qpush -a;          # Push all patches
$ hg qsave -e -c;       # Save the patch queue state... this allows for merging later
$ hg update -C qparent; # Move the working directory back to a completely unpatched state

Now we edit the .hg/patches/series file, removing the automatic rewrites. Instead of attempting to merge the automatic rewrites, we will simply regenerate them later.

manual-fixbugA
manual-fixbugB
manual-fixbugA-patch2
manual-fixbugC

Now we push the patches back on: if a patch doesn’t apply cleanly, use a three-way merge algorithm based on the saved patch queue state:

hg qpush -m; # applies manual-fixbugA
hg qpush -m; # applies manual-fixbugB
hg qpush -m; # applies manual-fixbugA-patch2 with interactive merging
hg qpush -m; # applies manual-fixbugC with interactive merging

Now, we want to merge the changes from “manual-fixbugA-patch2” into “manual-fixbugA”

hg qgoto manual-fixbugA;
hg qfold manual-fixbugA-patch2;

Now we have a clean patch queue which is ready for updating to a new “base”:

hg qpush -a;    # Push all patches in preparation for our second merge
hg qsave -e -c; # Save the patch queue state (again)
hg pull http://hg.mozilla.org/actionmonkey; # Pull new changesets from actionmonkey
hg up tip;      # Update the working directory to the unpatched actionmonkey tip
hg qpush -m;    # merge the patches one by one, with merging if necessary

In the case of XPCOMGC, “new-base” merges are difficult primarily because of the cycle collector. Cycle-collection is still be actively developed on trunk, with major changes to the xpconnect mark/trace algorithms as well as changes to timing. One of the earliest patches in the XPCOMGC queue removes cycle-collector entirely (when we have a garbage collector for XPCOM objects a cycle collector is no longer needed). Merge conflicts are common not only in the patch which removes the cycle collector, but in subsequent patches which touch xpconnect.

I’ve learned that merging goes much better if every patch in the patch queue produces a building tree. To test whether a merge was performed correctly (or discover which patch was merged incorrectly), simply build the tree at each patch state.

Conclusion

Mercurial patch queues make it possible to merge-update a series of patches against a moving base, which is very cool. But they don’t remove the actual hard work of merging changes that conflict.

Perils in Rewriting

Thursday, November 8th, 2007

Automatically rewriting code is perilous business. Can you spot the bug in the following rewrite to remove stack nsCOMPtrs?

@@ -432,8 +432,8 @@ LoadAppDirIntoArray(nsIFile* aXULAppDir, if (!aXULAppDir) return; 
- nsCOMPtr<nsIFile> subdir; 
- aXULAppDir->Clone(getter_AddRefs(subdir)); 
+ nsIFile* subdir; 
+ aXULAppDir->Clone(&subdir);
  if (!subdir) return;

The patch produced by garburator matches the spec I originally wrote for Taras, bug and all… the bug is that nsCOMPtr automatically initializes itself to null, while a raw pointer doesn’t.

What makes automatic rewriting hard is that the specification of the rewrite is usually incomplete. I could say to Taras “get rid of nsCOMPtrs on the stack” or even provide sample patches, but reality is much more complicated. For the automatic rewriting of stack nsCOMPtrs, this was an iterative process. Taras would run the tool (named garburator), apply the patch, and test-build. When something didn’t work right, he’d analyze the problem (often with help from the IRC channel) and then refine the rewriting tool. Sometimes refining the tool wasn’t worth it, and he’d simply fix the code by hand or hide it from the rewriter.

Quick quiz: how do you deal with functions that take a nsCOMPtr<nsIFoo>&?

Answer: It depends… is this a hashtable enumerator callback function? If so, don’t rewrite the signature at all. If not, the correct rewriting is probably nsIFoo**… but then if the code passes a heap variable to the function, you have to wrap it in getter_AddRefs to get the correct write-barrier semantics.

Automatic rewriting is not yet a complete solution: in particular, it doesn’t attempt to rewriting templated functions, and it will often give up on items which are passed to templated functions. And of course, we can really only run the rewriting tools on Linux, which means that platform-specific mac and windows code will need to be rewritten manually. If you want horror stories, ask Taras about nsMaybeWeakPtr, the helper-class from hell! In any case, I now have an XPCOMGC tree which runs through component registration before crashing… it feels like quite an accomplishment, even though it hasn’t gotten past the first GC yet.

What If?

Monday, November 5th, 2007

Is C++ hurting Mozilla development? Is it practical to move the Mozilla codebase to another language? Automated rewriting of code can help some, but there are some basic language limitations that limit the scope of rewriting within C++. What if Mozilla invented a mostly-C++-compatible language that solved problems better than C++?

Why!?

Am I crazy? Well, maybe a little. But what are the disadvantages of C++?

  • Poor memory safety;
  • Lack of good support for UTF strings/iterators;
  • difficulty integrating correctly with garbage-collection runtimes (MMgc), especially if we want to move towards an exact/moving garbage collector with strong type annotations;
  • difficulty integrating exception handling with other runtimes (Tamarin);
  • C++ lacks features Mozilla hackers want: see some of roc’s ideas;
  • static analysis: It would be good to statically prevent certain code patterns (such as stack-allocated nsCOMPtr), or behave differently than C++ normally would (change the behavior of a stack-allocated nsCOMArray versus a heap-allocated one);
  • We are hoping to use trace-based optimization to speed JavaScript. If the C++ frontend shares a common runtime with the JS frontend, these optimizations could occur across JS/C++ language boundaries. The Moz2 team brainstormed using a C++ frontend that would produce Tamarin bytecode, but Tamarin bytecode doesn’t really have the primitives for operating on binary objects.

On the other hand, C++ or something like it have advantages that we should preserve:

  • C++ can be very low level and performant, if coded carefully;
  • The vast majority of our codebase is written in C++;

Is it practical?

I don’t know. Some wild speculation is below. Please don’t take it as anything more than a brainstorm informed by a little bit of IRC conversation.

LLVM is a project implementing a low-level bytecode format with strong type annotations, and code generator, optimizer, and compiler (static and JIT) to operate on this bytecode. There is a tool which uses the GCC frontend to compile C/C++ code into LLVM bytecode. It is already possible to use the llvm-gcc4 frontend to compile and run mozilla; it compiles the Mozilla codebase to an intermediate LLVM form and from there into standard binary objects. We would not invent an entirely new language: rather we would take the GCC frontend and gradually integrate features as needed by Mozilla.

In addition, we could translate spidermonkey bytecode into LLVM bytecode for optimization and JITting.

Finally, optimization passes for trace-based optimizations could be added which operate on the level of LLVM bytecode.

Problems…

The most pressing problem from my perspective is that using the G++ frontend requires compiling Mozilla with a mingw-llvm toolchain on Windows. The gcc toolchain on Windows does not use vtables which are MS-COM compatible, which means that Mozilla code which uses or implements MS-COM interfaces will fail. In addition, we would be tied to the mingw win32api libraries, which are not the supported Microsoft SDKs and may not be always up to date, because they are clean-room reverse-engineered headers and libraries. This mainly affects the accessibility code which makes extensive use of MS COM and ATL.

  • Is it feasible to teach gcc/mingw to use the MS SDK headers and import libraries?
  • Is it feasible to teach the minw compiler about the MSVC ABI so that our code can continue to call and implement MS COM interfaces?
  • Is there another solution so that the accessibility code continues to work?

Questions

  • Am I absolutely crazy?
  • As a Mozilla developer, what features do you think are most important for ease of Mozilla development?
  • Are there solutions other than the g++ frontend that would allow our C++ codebase to evolve to a new language?
  • Is this a silly exercise? Would we be spending way too much time on the language and GCC and LLVM and whatnot, and not enough on our codebase? Are there other ways to modernize our codebase gradually but effectively? Is LLVM or a custom language something we should consider for mozilla2, or keep in mind for later releases?

Don’t dig deeper than you have to…

Monday, October 22nd, 2007

I replaced Windows XP with FC7 on my fast machine last week. Compiling and linking is much faster than on Windows, and I’ve managed to get 32-bit builds out of a 64-bit OS. But every once in a while I have problems with processes randomly hanging on startup at 100% CPU. I’m guessing about 1 out of every 1000 processes. It was all kinds of processes: /bin/sh, /usr/bin/perl, nsinstall for mozilla. There was nothing going on in strace, so I attached a hanging process in gdb:

#0  0x0000003ee327d323 in init_cacheinfo () from /lib64/libc.so.6
#1  0x0000003ee321d8e6 in __libc_global_ctors () from /lib64/libc.so.6
#2  0x0000003ee2e0d11b in call_init () from /lib64/ld-linux-x86-64.so.2
#3  0x0000003ee2e0d225 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#4  0x0000003ee2e00a9a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2

Stepping through the assembly, it’s repeating this loop infinitely:

0x0000003ee327d31d <init_cacheinfo+141>:        mov    %esi,%eax
0x0000003ee327d31f <init_cacheinfo+143>:        mov    %edi,%ecx
0x0000003ee327d321 <init_cacheinfo+145>:        cpuid  
0x0000003ee327d323 <init_cacheinfo+147>:        mov    %eax,%r8d
0x0000003ee327d326 <init_cacheinfo+150>:        shr    $0x5,%eax
0x0000003ee327d329 <init_cacheinfo+153>:        add    $0x1,%edi
0x0000003ee327d32c <init_cacheinfo+156>:        and    $0x7,%eax
0x0000003ee327d32f <init_cacheinfo+159>:        cmp    %r10d,%eax
0x0000003ee327d332 <init_cacheinfo+162>:        jne    0x3ee327d31d <init_cacheinfo+141>

I eventually tracked down an interesting comment/patch here. Now all I need to do is figure out how to get this fix onto my machine with the least amount of pain and suffering.

If you are a glass-half-full person, this is why open-source is cool: you have the ability to see all the relevant code, find the problem, and fix it (or at least find the person who already fixed it). If you are a glass-half-empty, you’d be complaining that FC7 just sucks and why haven’t they deployed this fix already. And you’d also complain that it’s really hard to figure out how to apply a patch to a system package and rebuild it. I’m just happy it was happening to me and not my wife. And I’m still a little confused why it only happens occasionally.

Compiling 32-bit Firefox on an x86-64 OS

Wednesday, October 17th, 2007

How hard is it to compile a 32-bit Firefox on an x86-64 Linux OS (preferably FC7, because that’s what I just installed). A quick google search failed me… I’d love directions, hints, or whatever you can provide. Is the correct way to do this to install all the necessary build dependencies in a chroot jail? And if so, how would I go about doing that?

Using Dehydra to Detect Problematic Code

Tuesday, October 16th, 2007

In XPCOMGC, the behavior of nsCOMPtr is very different than currently:

  • nsCOMPtr should only be used as a class member, never on the stack. Taras is working on a rewriting script that will replace nsCOMPtr<nsIFoo> on the stack with a raw nsIFoo* (more on that later).
  • the purpose of nsCOMPtr is not to ensure correct reference counting (there is no reference-counting!); instead it serves to enforce write barriers, so that MMgc can properly perform incremental GC.

I was able to rewrite nsCOMPtr so that existing code code mostly use the existing API: there is however one major difference: getter_AddRefs cannot return a pointer directly to the nsCOMPtr instance. Instead, it must save the value in a local variable and call nsCOMPtr.set() to preserve the write-barrier semantics. It does this using a temporary class:

/**
 * nsGetterAddRefs is used for XPCOM out parameters that need to be assigned
 * to nsCOMPtr members. We can't pass the address of nsCOMPtr.mRawPtr directly
 * because of the need to set the write barrier.
 */
template <class T>
class nsGetterAddRefs
{
public:
  explicit
  nsGetterAddRefs(nsCOMPtr<T> &aSmartPtr) :
    mTempPtr(aSmartPtr),
    mTargetSmartPtr(aSmartPtr)
  {
    // nothing else to do
  }

  ~nsGetterAddRefs()
  {
    mTargetSmartPtr = mTempPtr;
  }

  operator T**()
  {
    return &mTempPtr;
  }

private:
  T* mTempPtr;
  nsCOMPtr<T> &mTargetSmartPtr;
};

template <class T>
inline
nsGetterAddRefs<T>
getter_AddRefs(nsCOMPtr<T> &aSmartPtr)
{
  return nsGetterAddRefs<T>(aSmartPtr);
}

For the vast majority of cases where code makes a simple getter call, this works fine:

nsresult rv = something->GetAFoo(getter_AddRefs(mFoo));

However, if you test or use the returned value after you get it in the same statement, the value won’t be assigned yet:

Bad:

if (NS_SUCCEEDED(something->GetAFoo(getter_AddRefs(mFoo))) && mFoo)

Also bad:

NS_SUCCEEDED(something->GetAFoo(getter_AddRefs(mFoo))) && mFoo->DoSomething();

In the XPCOMGC world, both of these cases will fail because the ~nsGetterAddRefs destructor runs after the dereference of mFoo. Once we remove stack comptrs, this is not a common occurrence, but it does happen occasionally.

Checking for this kind of pattern is the perfect job for dehydra: check it out.

Sharing Mercurial Queues to Develop XPCOMGC

Tuesday, October 16th, 2007

One of the development techniques we’re experimenting with for Mozilla2 is the use of Mercurial Queues (MQ). Queues are a way of maintaining a set of patches on top of a base repository.

There are a few features of MQ that have made it very useful:

  • You can merge a stack of patches with changes in the underlying repository, instead of ending up with patches that don’t apply and .rej files. The magic you are looking for is hg qsave -e -c; hg qpush -m.
  • Patches can be versioned: you can create a mercurial repository of patches, commit and push that repository to share patches between developers.
  • Patches can be reordered, folded together, and with some special tricks even split apart into multiple patches.

MQ is ideal for doing large coding experiments such as XPCOMGC: it can keep track of manual rewriting and automatic patches separately; patches can be saved for review and merging upstream without delaying coding work; and the patch repository can be shared between the team doing the work.

The XPCOMGC patch repository is available publicly at http://hg.mozilla.org/users/bsmedberg_mozilla.com/xpcomgc-patches/. The patches are against the ActionMonkey tree… please don’t expect much more than “it builds” yet… the browser won’t start yet, due to various unrooted objects, bad assumptions, and other undiagnosed problems.

Caution: To use MQ effectively, you must set the following settings in your ~/.hgrc:

[extensions]
hgext.mq =
[diff]
git = 1

You must turn on git-style patches, or bad things will happen if you try to add/remove/move files!

Do I Want a Mac Pro?

Thursday, October 11th, 2007

I need to reconfigure the computers on my desk somehow, and I don’t really know that much about hardware. I think I want to ask Mozilla for a beefy Mac Pro, but I wanted to get opinions from my knowledgable readers first.

  • Currently I have three computers on my desk (see the chart below). I don’t think I need three any more, now that I know and love VMWare. I’d really prefer to have a single desktop and a laptop. I might keep and reimage the Linux machine for a stable performance-testing machine.
  • I’d like to be able to do builds on all three major platforms (Windows, Linux, MacOS) on both machines… this pretty much means that I need a Mac host machine, right?
  • Some of the mozilla2 automated-rewrite work is very processor-intensive, and it would be really nice to have more than two cores available for it.
  • I’m considering returning the Dell machine to Mozilla and asking for a Mac Pro with 4-8 cores and a fair amount of memory. But I’m concerned that I might be limited by the standard 7200RPM disks. Is that a valid concern?
  • For some reason, compiling on the Windows box is really slow. I’ve done all the ordinary stuff like turn off virus scanners, but it still seems unreasonably slow to compile for such a fast machine. I suspect there’s some I/O problem, but I can’t figure it out. It’s faster to compile in a VM on that machine than on the host OS. Also, is Windows capable of using the full 4G for standard applications? While I’m figuring out what to do, should I at least install Linux as the host OS on that machine?
  • What’s the best way to add a super-fast disk to the Macbook Pro? Preferably something portable enough that I can take it with me while traveling.

Random related question: are VMWare shared folders a lot slower on a Windows guest than a Linux guest? I’ve been using shared folders in VMWare Fusion to share a single mozilla source directory between multiple VMs, and this works really well with Linux guests and very poorly with Windows guests.

Type

Processor(s)

RAM

Disk

Current OS

Desktop – custom-built

Athlon64 3200+

1G

WD Caviar – 200GB 7200RPM

Linux – FC4

Desktop – Dell Dimension 9100

Pentium D – 3Ghz (dual core)

4G

WD Caviar – 250GB – 7200RPM

Windows XP

Laptop – Macbook Pro 15″

Core 2 Duo

2G

100G 7200RPM

MacOS10.4