Merging with Mercurial Queues

Friday, November 30th, 2007

I’ve spent most of the last 2.5 days merging code that lives in mercurial patch queues. It’s been an interesting (and frustrating) experience, but I’ve learned a lot, so I put together this tutorial on maintaining/merging/updating a mercurial patch queue.

The XPCOMGC patch queue

The XPCOMGC patch queue was actually 60+ patches long, but for example purposes let’s simplify to 7. What makes it tricky is that patches “after” the automated rewrites often won’t apply cleanly to the tree before the automatic rewrite. This means that moving patches from “after an automatic rewrite” to “before an automatic rewrite” requires a merge algorithm. Imagine a patch series like this:

manual-fixbugA
manual-fixbugB
automatic-rewrite-comptrs
automatic-rewrite-classhierarchy
automatic-rewrite-addrefs
manual-fixbugA-patch2
manual-fixbugC

You may be wondering about “manual-fixbugA-patch2″… why didn’t I just go back to manual-fixbugA and change it? This is because, once you’ve applied an automatic rewrite, popping/editing/pushing requires not only a complete tree rebuild (20 minutes), but also rebuilding the automated patches (hours). Instead, I create a “patch2” (and sometimes a “patch3” and “patch4”) and later fold the patches together.

Reordering a patch queue

I want to clean up this patch queue and update the “base tree” on which I’m applying patches. To do this, I’m first going to move all the “manual” patches before the automatic rewrites. Some of the patches may not apply cleanly in their new positions, so we prepare for merging. 64

$ hg qpush -a;          # Push all patches
$ hg qsave -e -c;       # Save the patch queue state... this allows for merging later
$ hg update -C qparent; # Move the working directory back to a completely unpatched state

Now we edit the .hg/patches/series file, removing the automatic rewrites. Instead of attempting to merge the automatic rewrites, we will simply regenerate them later.

manual-fixbugA
manual-fixbugB
manual-fixbugA-patch2
manual-fixbugC

Now we push the patches back on: if a patch doesn’t apply cleanly, use a three-way merge algorithm based on the saved patch queue state:

hg qpush -m; # applies manual-fixbugA
hg qpush -m; # applies manual-fixbugB
hg qpush -m; # applies manual-fixbugA-patch2 with interactive merging
hg qpush -m; # applies manual-fixbugC with interactive merging

Now, we want to merge the changes from “manual-fixbugA-patch2” into “manual-fixbugA”

hg qgoto manual-fixbugA;
hg qfold manual-fixbugA-patch2;

Now we have a clean patch queue which is ready for updating to a new “base”:

hg qpush -a;    # Push all patches in preparation for our second merge
hg qsave -e -c; # Save the patch queue state (again)
hg pull http://hg.mozilla.org/actionmonkey; # Pull new changesets from actionmonkey
hg up tip;      # Update the working directory to the unpatched actionmonkey tip
hg qpush -m;    # merge the patches one by one, with merging if necessary

In the case of XPCOMGC, “new-base” merges are difficult primarily because of the cycle collector. Cycle-collection is still be actively developed on trunk, with major changes to the xpconnect mark/trace algorithms as well as changes to timing. One of the earliest patches in the XPCOMGC queue removes cycle-collector entirely (when we have a garbage collector for XPCOM objects a cycle collector is no longer needed). Merge conflicts are common not only in the patch which removes the cycle collector, but in subsequent patches which touch xpconnect.

I’ve learned that merging goes much better if every patch in the patch queue produces a building tree. To test whether a merge was performed correctly (or discover which patch was merged incorrectly), simply build the tree at each patch state.

Conclusion

Mercurial patch queues make it possible to merge-update a series of patches against a moving base, which is very cool. But they don’t remove the actual hard work of merging changes that conflict.

Perils in Rewriting

Thursday, November 8th, 2007

Automatically rewriting code is perilous business. Can you spot the bug in the following rewrite to remove stack nsCOMPtrs?

@@ -432,8 +432,8 @@ LoadAppDirIntoArray(nsIFile* aXULAppDir, if (!aXULAppDir) return; 
- nsCOMPtr<nsIFile> subdir; 
- aXULAppDir->Clone(getter_AddRefs(subdir)); 
+ nsIFile* subdir; 
+ aXULAppDir->Clone(&subdir);
  if (!subdir) return;

The patch produced by garburator matches the spec I originally wrote for Taras, bug and all… the bug is that nsCOMPtr automatically initializes itself to null, while a raw pointer doesn’t.

What makes automatic rewriting hard is that the specification of the rewrite is usually incomplete. I could say to Taras “get rid of nsCOMPtrs on the stack” or even provide sample patches, but reality is much more complicated. For the automatic rewriting of stack nsCOMPtrs, this was an iterative process. Taras would run the tool (named garburator), apply the patch, and test-build. When something didn’t work right, he’d analyze the problem (often with help from the IRC channel) and then refine the rewriting tool. Sometimes refining the tool wasn’t worth it, and he’d simply fix the code by hand or hide it from the rewriter.

Quick quiz: how do you deal with functions that take a nsCOMPtr<nsIFoo>&?

Answer: It depends… is this a hashtable enumerator callback function? If so, don’t rewrite the signature at all. If not, the correct rewriting is probably nsIFoo**… but then if the code passes a heap variable to the function, you have to wrap it in getter_AddRefs to get the correct write-barrier semantics.

Automatic rewriting is not yet a complete solution: in particular, it doesn’t attempt to rewriting templated functions, and it will often give up on items which are passed to templated functions. And of course, we can really only run the rewriting tools on Linux, which means that platform-specific mac and windows code will need to be rewritten manually. If you want horror stories, ask Taras about nsMaybeWeakPtr, the helper-class from hell! In any case, I now have an XPCOMGC tree which runs through component registration before crashing… it feels like quite an accomplishment, even though it hasn’t gotten past the first GC yet.

What If?

Monday, November 5th, 2007

Is C++ hurting Mozilla development? Is it practical to move the Mozilla codebase to another language? Automated rewriting of code can help some, but there are some basic language limitations that limit the scope of rewriting within C++. What if Mozilla invented a mostly-C++-compatible language that solved problems better than C++?

Why!?

Am I crazy? Well, maybe a little. But what are the disadvantages of C++?

  • Poor memory safety;
  • Lack of good support for UTF strings/iterators;
  • difficulty integrating correctly with garbage-collection runtimes (MMgc), especially if we want to move towards an exact/moving garbage collector with strong type annotations;
  • difficulty integrating exception handling with other runtimes (Tamarin);
  • C++ lacks features Mozilla hackers want: see some of roc’s ideas;
  • static analysis: It would be good to statically prevent certain code patterns (such as stack-allocated nsCOMPtr), or behave differently than C++ normally would (change the behavior of a stack-allocated nsCOMArray versus a heap-allocated one);
  • We are hoping to use trace-based optimization to speed JavaScript. If the C++ frontend shares a common runtime with the JS frontend, these optimizations could occur across JS/C++ language boundaries. The Moz2 team brainstormed using a C++ frontend that would produce Tamarin bytecode, but Tamarin bytecode doesn’t really have the primitives for operating on binary objects.

On the other hand, C++ or something like it have advantages that we should preserve:

  • C++ can be very low level and performant, if coded carefully;
  • The vast majority of our codebase is written in C++;

Is it practical?

I don’t know. Some wild speculation is below. Please don’t take it as anything more than a brainstorm informed by a little bit of IRC conversation.

LLVM is a project implementing a low-level bytecode format with strong type annotations, and code generator, optimizer, and compiler (static and JIT) to operate on this bytecode. There is a tool which uses the GCC frontend to compile C/C++ code into LLVM bytecode. It is already possible to use the llvm-gcc4 frontend to compile and run mozilla; it compiles the Mozilla codebase to an intermediate LLVM form and from there into standard binary objects. We would not invent an entirely new language: rather we would take the GCC frontend and gradually integrate features as needed by Mozilla.

In addition, we could translate spidermonkey bytecode into LLVM bytecode for optimization and JITting.

Finally, optimization passes for trace-based optimizations could be added which operate on the level of LLVM bytecode.

Problems…

The most pressing problem from my perspective is that using the G++ frontend requires compiling Mozilla with a mingw-llvm toolchain on Windows. The gcc toolchain on Windows does not use vtables which are MS-COM compatible, which means that Mozilla code which uses or implements MS-COM interfaces will fail. In addition, we would be tied to the mingw win32api libraries, which are not the supported Microsoft SDKs and may not be always up to date, because they are clean-room reverse-engineered headers and libraries. This mainly affects the accessibility code which makes extensive use of MS COM and ATL.

  • Is it feasible to teach gcc/mingw to use the MS SDK headers and import libraries?
  • Is it feasible to teach the minw compiler about the MSVC ABI so that our code can continue to call and implement MS COM interfaces?
  • Is there another solution so that the accessibility code continues to work?

Questions

  • Am I absolutely crazy?
  • As a Mozilla developer, what features do you think are most important for ease of Mozilla development?
  • Are there solutions other than the g++ frontend that would allow our C++ codebase to evolve to a new language?
  • Is this a silly exercise? Would we be spending way too much time on the language and GCC and LLVM and whatnot, and not enough on our codebase? Are there other ways to modernize our codebase gradually but effectively? Is LLVM or a custom language something we should consider for mozilla2, or keep in mind for later releases?

Using Dehydra to Detect Problematic Code

Tuesday, October 16th, 2007

In XPCOMGC, the behavior of nsCOMPtr is very different than currently:

  • nsCOMPtr should only be used as a class member, never on the stack. Taras is working on a rewriting script that will replace nsCOMPtr<nsIFoo> on the stack with a raw nsIFoo* (more on that later).
  • the purpose of nsCOMPtr is not to ensure correct reference counting (there is no reference-counting!); instead it serves to enforce write barriers, so that MMgc can properly perform incremental GC.

I was able to rewrite nsCOMPtr so that existing code code mostly use the existing API: there is however one major difference: getter_AddRefs cannot return a pointer directly to the nsCOMPtr instance. Instead, it must save the value in a local variable and call nsCOMPtr.set() to preserve the write-barrier semantics. It does this using a temporary class:

/**
 * nsGetterAddRefs is used for XPCOM out parameters that need to be assigned
 * to nsCOMPtr members. We can't pass the address of nsCOMPtr.mRawPtr directly
 * because of the need to set the write barrier.
 */
template <class T>
class nsGetterAddRefs
{
public:
  explicit
  nsGetterAddRefs(nsCOMPtr<T> &aSmartPtr) :
    mTempPtr(aSmartPtr),
    mTargetSmartPtr(aSmartPtr)
  {
    // nothing else to do
  }

  ~nsGetterAddRefs()
  {
    mTargetSmartPtr = mTempPtr;
  }

  operator T**()
  {
    return &mTempPtr;
  }

private:
  T* mTempPtr;
  nsCOMPtr<T> &mTargetSmartPtr;
};

template <class T>
inline
nsGetterAddRefs<T>
getter_AddRefs(nsCOMPtr<T> &aSmartPtr)
{
  return nsGetterAddRefs<T>(aSmartPtr);
}

For the vast majority of cases where code makes a simple getter call, this works fine:

nsresult rv = something->GetAFoo(getter_AddRefs(mFoo));

However, if you test or use the returned value after you get it in the same statement, the value won’t be assigned yet:

Bad:

if (NS_SUCCEEDED(something->GetAFoo(getter_AddRefs(mFoo))) && mFoo)

Also bad:

NS_SUCCEEDED(something->GetAFoo(getter_AddRefs(mFoo))) && mFoo->DoSomething();

In the XPCOMGC world, both of these cases will fail because the ~nsGetterAddRefs destructor runs after the dereference of mFoo. Once we remove stack comptrs, this is not a common occurrence, but it does happen occasionally.

Checking for this kind of pattern is the perfect job for dehydra: check it out.

Sharing Mercurial Queues to Develop XPCOMGC

Tuesday, October 16th, 2007

One of the development techniques we’re experimenting with for Mozilla2 is the use of Mercurial Queues (MQ). Queues are a way of maintaining a set of patches on top of a base repository.

There are a few features of MQ that have made it very useful:

  • You can merge a stack of patches with changes in the underlying repository, instead of ending up with patches that don’t apply and .rej files. The magic you are looking for is hg qsave -e -c; hg qpush -m.
  • Patches can be versioned: you can create a mercurial repository of patches, commit and push that repository to share patches between developers.
  • Patches can be reordered, folded together, and with some special tricks even split apart into multiple patches.

MQ is ideal for doing large coding experiments such as XPCOMGC: it can keep track of manual rewriting and automatic patches separately; patches can be saved for review and merging upstream without delaying coding work; and the patch repository can be shared between the team doing the work.

The XPCOMGC patch repository is available publicly at http://hg.mozilla.org/users/bsmedberg_mozilla.com/xpcomgc-patches/. The patches are against the ActionMonkey tree… please don’t expect much more than “it builds” yet… the browser won’t start yet, due to various unrooted objects, bad assumptions, and other undiagnosed problems.

Caution: To use MQ effectively, you must set the following settings in your ~/.hgrc:

[extensions]
hgext.mq =
[diff]
git = 1

You must turn on git-style patches, or bad things will happen if you try to add/remove/move files!

Dehydra is cool

Thursday, October 11th, 2007

In a lonely corner of Mozilla-land, Taras Glek has been working on tools to support automated analysis and rewriting of the Mozilla codebase. I hadn’t gotten a chance to play with these tools until recently, when I started working on mozilla2 code refactoring pretty much full-time. And Taras’ tools are amazing!

For instance, for XPCOMGC we needed to identify Mozilla XPCOM classes that need to inherit from GCObject. Taras has an uber-cool tool called Dehydra that reflects the C++ AST into JavaScript: with a little handholding, I was able to write the logic for this task in in 127-line JS file!

Processing (122/2004): ./security/manager/ssl/src/nsSmartCardMonitor.ii...  running dehydra... done.

It takes about 4 hours to run this script on every file in the tree (using multiple cores could make this much faster). Then I use a python script to post-process the output and create gcobject.patch (truncated example patch).

The possible uses of dehydra are almost limitless: it’s very likely that in due course we will do nightly dehydra runs with scripts that detect bad code patterns; it’s also possible to set up a tryserver to run dehydra over a build tree.

See the Mozilla wiki for more information about dehydra, including links to a PDF specification and more information from Taras’ blog.