Using Mercurial Revsets to Search for Changes Between Firefox Releases

Monday, July 9th, 2012

I was recently asked to provide a list of all the plugin-related changes from Firefox 11 to 12 to 13. This is actually not the easiest question to answer: both bugzilla and source code searches may produce false negatives and false positives.

Using bugzilla to answer this kind of query fails whenever the bugzilla metadata is incorrect or misleading. Usually the Target Milestone for a fixed bug indicates the release version that it first ships in, but it is not uncommon to forget to set the TM for bugs. And when a bug is backported to an earlier train, the target milestone doesn’t always get moved backwards to match. In addition, filtering by bugzilla component often leaves out important bugs, since changes to graphics, DOM, layout, or JS can make noticeable changes in plugin code.

Instead, I decided to use Mercurial to answer this question. Almost all changes which affect plugin function are going to be located in the following directories and files:

  • dom/plugins
  • content/base/src/nsObjectLoadingContent.{h,cpp}

So what I really wanted was a list of changes between two Firefox releases which touch that set of files. But how can I use Mercurial to search only the revisions between two specific releases? Here is the branch diagram for the Firefox trains:
[Figure: diagram of the train model and branches from the FIREFOX_12_0_RELEASE tag to the FIREFOX_13_0_RELEASE tag]

By default, hg log -rfoo:bar will list the changes between two particular revisions in whatever linear order these revisions happened to be pulled into your repository. This means that you may get different results depending on the order you pulled. But there is a feature in Mercurial called “revsets” that lets you pick changes in much more specific ways. For more information about revsets, issue hg help revsets or see the online documentation.

In this case, I wanted to log all the revisions between the mozilla-central branch point of Firefox 12 and the release of Firefox 13. Revsets can figure out the branch point with the ancestor function, and can then list all the changesets between two revisions (including all branch/merge revisions) with the :: operator. So the final command for finding all changes which touch plugin files between Firefox 12 and 13 was:

$ hg log -r "ancestor(FIREFOX_12_0_RELEASE,FIREFOX_13_0_RELEASE)::FIREFOX_13_0_RELEASE" dom/plugins/ content/base/src/nsObjectLoadingContent.{h,cpp}
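
To make the two revset pieces concrete, here is a toy model in Python. It is only a sketch: the graph, the revision names, and the helper functions are invented for illustration, and the way it picks a common ancestor is a simplification of what Mercurial actually does.

```python
# Toy changeset graph: each revision maps to its list of parents.
# All revision names are hypothetical, not real Firefox changesets.
PARENTS = {
    "c0": [],
    "c1": ["c0"],      # point where the release-12 train branched off
    "r12a": ["c1"],    # work on the release-12 train
    "r12": ["r12a"],   # stands in for the Firefox 12 release tag
    "c2": ["c1"],      # mozilla-central continues
    "c3": ["c2"],      # point where the release-13 train branched off
    "c4": ["c3"],      # later central work that is not in release 13
    "r13a": ["c3"],
    "r13": ["r13a"],   # stands in for the Firefox 13 release tag
}


def ancestors(rev):
    """Return rev plus everything reachable by following parent links."""
    seen = set()
    stack = [rev]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def common_ancestor(a, b):
    """Pick the shared ancestor with the most history behind it (simplified)."""
    common = ancestors(a) & ancestors(b)
    return max(common, key=lambda rev: len(ancestors(rev)))


def dag_range(x, y):
    """Model of the revset x::y: descendants of x that are also ancestors of y."""
    return {rev for rev in ancestors(y) if x in ancestors(rev)}


branch_point = common_ancestor("r12", "r13")
changes = dag_range(branch_point, "r13")
print(branch_point, sorted(changes))
```

The range follows the graph rather than the local revision numbering, which is why the result does not depend on the order in which changesets were pulled.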

Special thanks to Dirkjan Ochtman (djc) for introducing me to revsets!

Laying Blame

Tuesday, December 9th, 2008

As I mentioned last week, I’ve been resurrecting a project to report on compiler warnings. A basic form of this buildbot is now operational on the Firefox tinderbox tree (look to the far right for the static-analysis-bsmedberg column). It prints a summary of the total number of warnings on the summary page: in the full tinderbox log, it lists each warning and who can be “blamed” for that warning:

/builds/static-analysis-buildbot/slave/full/build/memory/jemalloc/jemalloc.c:177:1: warning: C++ style comments are not allowed in ISO C90 blamed on Taras Glek  in revision hg:36156fbf817d8a0e2d54a271cf0bff94a1c41c13:memory/jemalloc/jemalloc.c
/builds/static-analysis-buildbot/slave/full/build/js/src/jsdbgapi.cpp:712: warning: ISO C++ forbids casting between pointer-to-function and pointer-to-object blamed on brendan@mozilla.org in revision cvs:3.36:js/src/jsdbgapi.c

Assigning blame can be a tricky process. In order to figure out the blame for a warning, the code uses the following steps:

  • Resolve relative paths against the current working directory, using GNU make “Entering/Leaving directory” markers as a guide.
  • Dereference symlinks to find the source tree location of an error. For instance, Mozilla headers which produce warnings often do so via paths in dist/include. We have to resolve these to their original source tree location in order to find blame.
  • Using mercurial APIs (through python), find the mercurial changeset which introduced the line in question.
  • If the code dates back to Mercurial revision 9b2a99adc05e, which is the original import of CVS code to Mercurial, use a database of CVS blame to find the original CVS checkin which was responsible for introducing that line of code.
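
The first two steps can be sketched in a few lines of Python. This is not the actual build log parser: the regular expressions, the function name, and the `resolve` parameter are assumptions made for illustration, and the Mercurial and CVS blame lookups are left out entirely.

```python
import os.path
import re

# GNU make prints these markers when it changes directory.
ENTER = re.compile(r"Entering directory [`'](.+)'")
LEAVE = re.compile(r"Leaving directory [`'](.+)'")
# GCC-style diagnostics: file:line[:column]: warning: message
WARNING = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+)(?::\d+)?: warning: (?P<msg>.*)$"
)


def parse_warnings(log_lines, start_dir="/", resolve=os.path.realpath):
    """Return (path, line, message) tuples with paths made absolute.

    resolve() is applied last; by default it dereferences symlinks, which
    is what maps a header under dist/include back to the source tree.
    """
    dirs = [start_dir]
    found = []
    for raw in log_lines:
        line = raw.rstrip("\n")
        entered = ENTER.search(line)
        if entered:
            dirs.append(entered.group(1))
            continue
        if LEAVE.search(line):
            if len(dirs) > 1:
                dirs.pop()
            continue
        warning = WARNING.match(line)
        if warning:
            # Relative paths are relative to the directory make is in.
            path = os.path.join(dirs[-1], warning.group("path"))
            path = resolve(os.path.normpath(path))
            found.append((path, int(warning.group("line")), warning.group("msg")))
    return found
```

Passing a different `resolve` function makes it possible to exercise the parser without a real source tree on disk.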

If you’re interested, take a look at the build log parsing code, or see the scripts which save CVS blame to a database (thanks Ted!).

The current reporting system for warnings is very primitive. I’m currently working on a second version of this code which will provide additional features:

  • Compare warnings with the previous build and highlight “new” warnings. I do this by recording the error text and the blamed location of the warning. As lines are added and removed from the code, the reported location of the warning changes, but the location of Hg/CVS blame doesn’t. This means it is a stable location which can be used for comparisons across runs. It even works across file renames/moves!
  • Web frontend to the warning database to allow users to query warnings by user or directory.
  • Classify warnings by “type”. This is not a simple process, because GCC mixes distinctive error text, such as “may be used uninitialized in this function” with variable names; and the granularity of -fdiagnostics-show-option is low enough that it’s not very useful by itself. Oh, I wish GCC had error codes like MSVC does: C1234 is easy to recognize!
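
Two of these features are easy to sketch. The Python below is an illustration only: the record layout (dicts with `message`, `blame`, and `reported_line` keys) and both function names are assumptions, and the type classifier is a crude heuristic that simply masks quoted identifiers.

```python
import re

# Matches an identifier quoted the way GCC quotes names in its messages.
QUOTED = re.compile("[‘'`][^’'`]*[’']")


def warning_type(message):
    """Replace quoted names so warnings that differ only by name compare equal."""
    return QUOTED.sub("<name>", message)


def new_warnings(previous, current):
    """Return the warnings in current that were not present in previous.

    Warnings are matched on (message, blame location) rather than on the
    reported line, because the blame location stays put when unrelated
    lines are added or removed above it.
    """
    known = {(w["message"], w["blame"]) for w in previous}
    return [w for w in current if (w["message"], w["blame"]) not in known]
```

A message that contains an apostrophe for other reasons would confuse the classifier, so a real implementation would need something more careful.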

At one point, I thought I could implement all of the warning mechanism on the buildbot server by parsing the BuildStep logs. It quickly became clear that I couldn’t do that, because I couldn’t resolve symlinks, and getting Mercurial blame was difficult or impossible. My new version actually uses a hybrid mechanism where the build log is parsed on the buildbot slave: this parses out warnings, resolves symlinks, and looks up blame. It then sends the results back via stdout to the master. A custom build step on the master parses this log, saves the information to a database, and does the checking for new warnings and prints various results to custom build logs.
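
As a rough illustration of that hand-off, the slave side could write one record per line to stdout and the master side could pick those lines back out of the log. The line prefix and the choice of JSON here are assumptions made for the sketch, not the format the real build step uses.

```python
import json

# Hypothetical marker that lets the master tell records apart from
# ordinary build output in the same log.
PREFIX = "WARNING-RECORD: "


def emit_records(records):
    """Slave side: turn warning records into log lines."""
    return [PREFIX + json.dumps(record, sort_keys=True) for record in records]


def parse_records(log_lines):
    """Master side: recover the records, ignoring every other log line."""
    parsed = []
    for line in log_lines:
        if line.startswith(PREFIX):
            parsed.append(json.loads(line[len(PREFIX):]))
    return parsed
```

Keeping each record on a single prefixed line means the master never has to understand the compiler output itself.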

Release branches in mozilla-central

Thursday, October 9th, 2008

On the Firefox development branches, the version number is always “pre”: for example, 3.1b1pre. This makes it easy to distinguish between nightly builds and release builds. To produce a release, the release team creates a “minibranch”. This minibranch exists for the following reasons:

  • To allow bumping the version numbers to the release version, for example: 3.1b1.
  • To isolate the release process and allow the main development tree to re-open as quickly as possible.

This is a long-standing tradition in CVS, but we haven’t really done it before 3.1b1 in Mercurial. This week, mozilla-central grew a new branch: the GECKO191b1_20081007_RELBRANCH. Pushing this branch to mozilla-central caused some unexpected side effects for developers:

  1. Developers who issued a normal hg pull -u got the following message:
    adding 171 changesets with 234 changes to 110 files (+1 heads)
    not updating, since new heads added
    (run 'hg heads' to see heads, 'hg merge' to merge)

    Yes, a new head was added; but this head is on a named branch and shouldn’t affect developers who aren’t on that branch. This is a bug in Mercurial that will be fixed in future versions. To work around the problem, just run hg up, which will update you to the latest revision of the default branch.

  2. hg heads shows branch heads. Normally, developers working on the default branch don’t care about heads on other branches, and don’t want release branch heads showing up when they issue the hg heads command. The Mercurial developers are aware of this issue and will fix it in a future version. In the meantime, use the following command to see only the heads of the default branch: hg heads default.

Note: even with the above bugs fixed, hg pull -u isn’t the exact equivalent of hg pull; hg up: in the case where no new changes are available on the remote server, no update will be performed. This only affects trees where the working directory is not at the tip revision. This slightly unintuitive behavior is considered a feature by the Mercurial developers, not a bug.

Getting mozilla-central with limited bandwidth

Thursday, June 5th, 2008

Recently we opened up mozilla-central for checkins for Mozilla 1.9.1/Firefox.Next. As people on IRC have started using the repository, one of the major complaints has been that cloning the entire repository for the first time can take a very long time over a slow network… and for flaky networks, it may be impossible to clone at all.

There is a solution: instead of cloning directly from hg.mozilla.org (hg clone http://hg.mozilla.org/mozilla-central/), download a changeset bundle and unbundle it to create a local repository.

  1. Download a mozilla-central bundle. For the moment, I’m hosting one here. I’m going to ask the mozilla release team to produce one nightly and host it on the mozilla FTP server. The bundle file is approximately 65MB.
  2. Create a new, empty repository:
    $ hg init mozilla-central
  3. Un-bundle the real mozilla-central changes to that repository:
    $ cd mozilla-central;
    $ hg unbundle /path/to/mozilla-central.bundle
  4. Tell mercurial where you normally want to pull from by copying the following content into your mozilla-central/.hg/hgrc file:
    [paths]
    default = http://hg.mozilla.org/mozilla-central/
  5. Pull any additional changes that happened since the bundle was created:
    $ hg pull
  6. Update your working directory to the latest change:
    $ hg up

    Happy hacking!
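
For anyone who wants to script the steps above, here is a minimal Python sketch. The function names and the `run` parameter are made up for illustration; the hg commands are the same ones listed in the steps, with -R used to name the repository instead of changing directory.

```python
import os
import subprocess

HGRC = "[paths]\ndefault = http://hg.mozilla.org/mozilla-central/\n"


def bootstrap_commands(bundle_path, repo_dir="mozilla-central"):
    """Return the hg invocations for steps 2, 3, 5 and 6, in order."""
    return [
        ["hg", "init", repo_dir],
        ["hg", "-R", repo_dir, "unbundle", bundle_path],
        ["hg", "-R", repo_dir, "pull"],
        ["hg", "-R", repo_dir, "update"],
    ]


def bootstrap(bundle_path, repo_dir="mozilla-central", run=subprocess.check_call):
    """Create the repository from a bundle, then pull and update."""
    init, unbundle, pull, update = bootstrap_commands(bundle_path, repo_dir)
    run(init)
    run(unbundle)
    # Step 4: record the default pull location before pulling.
    with open(os.path.join(repo_dir, ".hg", "hgrc"), "a") as hgrc:
        hgrc.write(HGRC)
    run(pull)
    run(update)
```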

Reviewing Merges in Mercurial

Thursday, April 17th, 2008

In the brave new world of distributed version control, history is no longer linear. It can branch freely, under no central control. But what is perhaps more interesting is that you can merge disparate branches back together. This makes it much easier to do long-term feature work, because it is possible to track a moving target. For the ActionMonkey project, we are following a procedure where all changes are reviewed before they are pushed to the actionmonkey branch. This means that we won’t have a huge blob of changes that need review before they are integrated into mozilla-central.

This creates a new class of problem: how do you perform merges? Is it necessary to review merge changesets with the same rigor as regular changes? How would one actually go about reviewing a merge changeset?

Today, I wanted to update ActionMonkey with the latest changes from mozilla-central. There is enough divergence between ActionMonkey and Mozilla 1.9 that this is not always a simple task: fortunately for me, the only conflicts that required any real merging were fairly simple. I went ahead and performed the merge and did some basic testing. But I really want Jason Orendorff to review my changes for sanity. There is no simple way to see what I actually did:

[Figure: merge graph of actionmonkey]

It isn’t hard to diff this changeset against its mozilla-central parent, or against its actionmonkey parent. But neither of these diffs gives you a sense of the real work involved in the merge. What I really want to give to Jason is a static view of a three-way merge as it already happened, highlighting the source locations where I made conflict resolutions or manual changes. Does anyone know if such a tool exists for Mercurial, or for any other distributed version control system?

Because I don’t know of a tool like this, I did the next-best thing: in my checkin comment, I carefully listed every file and function where I made a conflict resolution or manual change. This will at least make it clear in the future where I may have goofed.

hgweb viewer, canvas version

Monday, April 14th, 2008

If you want to go drawing complex graphics on the web, you have two basic options: SVG and the HTML canvas element.

My first attempt at the hgweb graphical browser used SVG. Actually, it used an unholy hybrid of SVG and HTML: the revisions themselves were drawn using absolutely positioned HTML. The arrows between the boxes were drawn using SVG. I did this for several reasons:

  • only Firefox supports drawing text into canvas elements
  • I could use DOM events to do hit-testing and navigation
  • you can select/copy/paste text in HTML

Unfortunately, the performance of the first version was not very good, and the code was very complex. I was maintaining several parallel data structures: a JS object graph, and the DOM objects for revisions and arrows. The bookkeeping code to keep data in sync was dwarfing the actual layout logic.

Instead I decided to try using canvas. Ripping out the bookkeeping code and replacing it with custom drawing code turned out to be much easier than I expected. Now all of the data is kept in a single tree, and layout, rendering, and hit-testing are all blazingly fast.

After getting it basically working, I was able to add additional features in a matter of minutes:

  • Collision detection
  • Animation when navigating between revisions
  • Switched to a vertical layout which provides more information
  • Made arrows into curves
  • Highlight the “center” node

The disadvantages of this approach are unfortunate:

  • Only works in Firefox 3+ (needs the experimental mozDrawText API)
  • Impossible to select or copy text

Check it out (my development machine, so it may go down at any time) or get the source.

Now I really promise I don’t have any more time to spend on this project. Contributions welcome!

Wandering around Mercurial Repositories

Wednesday, March 26th, 2008

One of the things that makes mercurial useful and frustrating at the same time is that history is not linear. History can separate and join at multiple places. Tools such as hg log can’t show the outline of history clearly. There have been several attempts to remedy this situation: hg glog prints a revision graph using ASCII art, and hg view allows browsing a repository history in a GUI app written in Tk.

I think we can do better, so I started a new project: it is an extension to hgweb which allows browsing repository history graphically in a web browser. You can navigate graphically through revisions, and it will load new revisions on demand. Note that I’ve only tested it with Firefox, and it won’t work in IE (no SVG support):

mozilla-central on office.smedbergs.us.

There is certainly lots more that can and should be done:

  • The layout algorithm is really dumb
  • Revisions should link back to hgweb for viewing diffs, etc
  • Would be nice to zoom out to get a larger picture with less detail

Unfortunately I don’t have any more time to spend on this project. So I’m looking for volunteers to come forward and finish it.

Installation Instructions

The code is here. It is packaged as a single mercurial extension which must be installed in a global hgrc file, and a www/ directory with static files. I believe this extension will work correctly against Mercurial 1.0: I’ve been testing it against a mercurial-crew pull from a couple days ago.

  • In www/navigate.js configure BASEURL for your site
  • In www/index.xhtml populate select-repo with the repositories on your site.

The hgweb extension exists primarily to provide repository information in JSON format. I hope that it will be useful for other kinds of projects as well.

Things I Learned From Writing a Mercurial Extension

Thursday, March 6th, 2008

I wrote my first mercurial extension today. It will print a log of all the commits you need to get from a changeset A which used to be the only head of a repository to a changeset B which is now the only head of the repository. It’s part of a project to get our buildbots hooked up to mercurial in a sane way. I need to make the same logic available via HTTP. It wasn’t too hard, but there are some tricky/undocumented things in the mercurial API that made life difficult.

  • The wire protocol is a pretty elegant way to ask a server for changesets when you don’t know what the graph looks like yet with relatively few queries.
  • The HTTP version of the protocol returns all the results in a plain-text format, except for the actual changesets which come across as a relatively opaque “bundle”.
  • The between command in the wire protocol doesn’t do what you think it does, or what the documentation seems to say it does. Instead of returning all the changesets between point A and point B, it walks backward through the list returning a selection of changesets along the line. The farther you go back the less often it will pick a changeset to return.
  • There is currently no way to query hgweb for metadata about a particular commit, such as the author, date, or commit message. This is trivial to add, and I will be submitting a patch shortly.
  • With a few extra functions, it should be relatively trivial to write a web repository browser which allows you to navigate the revision graph graphically and dynamically load information about history as you need it.
  • cmdutil.show_changeset can be used to show changesets the same way hg log does. I haven’t yet figured out how to expose all of the formatting options it accepts as options accepted by my extension.
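
The sampling behavior of the between command is easier to see with a toy model. The Python sketch below walks back along first parents and picks a changeset each time the distance from the starting point doubles. The doubling rule, the function names, and the integer revisions are assumptions chosen to illustrate "the farther you go back the less often it will pick"; check the Mercurial source for the exact rule.

```python
def between(top, bottom, first_parent):
    """Walk back from top toward bottom, sampling at doubling distances."""
    picked = []
    node, distance, next_pick = top, 0, 1
    while node != bottom and node is not None:
        if distance == next_pick:
            picked.append(node)
            next_pick *= 2
        node = first_parent(node)
        distance += 1
    return picked


def linear_parent(rev):
    """First parent in a toy linear history numbered 0, 1, 2, ..."""
    return rev - 1 if rev > 0 else None


# Samples land 1, 2, 4, 8 and 16 steps back from revision 20.
print(between(20, 0, linear_parent))  # [19, 18, 16, 12, 4]
```

A client can use a sparse sample like this to narrow down where its history diverges from the server's without asking about every changeset.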

Splitting a Changeset/Patch

Wednesday, February 13th, 2008

I have a common problem with revision control:

  1. Clone a tree
  2. Start working on a problem X (say, adding valgrind annotations to MMgc) with edits to GC.cpp
  3. Get distracted and start working on problem Y (say, MMgc crashing) with edits to GC.cpp
  4. I now have unrelated changes in GC.cpp that I’d like to separate.

Dear lazyweb, is there a tool I can use to interactively separate out these changes into two changesets? The solution should work with my mercurial workflow, no “use git-rebase” comments please. I happen to be using patch queues, but a solution that worked on real mercurial changesets or on raw patches would also be acceptable. It happens that usually the patch hunks for problem X and problem Y are completely separate, so a simple tool that let me throw “this hunk for X, this hunk for Y” would probably be ok. Better would be something like a three-way merge tool where I can edit an intermediate state between a fixed start-point and end-point.

Merging with Mercurial Queues

Friday, November 30th, 2007

I’ve spent most of the last 2.5 days merging code that lives in mercurial patch queues. It’s been an interesting (and frustrating) experience, but I’ve learned a lot, so I put together this tutorial on maintaining/merging/updating a mercurial patch queue.

The XPCOMGC patch queue

The XPCOMGC patch queue was actually 60+ patches long, but for example purposes let’s simplify to 7. What makes it tricky is that patches “after” the automated rewrites often won’t apply cleanly to the tree before the automatic rewrite. This means that moving patches from “after an automatic rewrite” to “before an automatic rewrite” requires a merge algorithm. Imagine a patch series like this:

manual-fixbugA
manual-fixbugB
automatic-rewrite-comptrs
automatic-rewrite-classhierarchy
automatic-rewrite-addrefs
manual-fixbugA-patch2
manual-fixbugC

You may be wondering about “manual-fixbugA-patch2″… why didn’t I just go back to manual-fixbugA and change it? This is because, once you’ve applied an automatic rewrite, popping/editing/pushing requires not only a complete tree rebuild (20 minutes), but also rebuilding the automated patches (hours). Instead, I create a “patch2” (and sometimes a “patch3” and “patch4”) and later fold the patches together.

Reordering a patch queue

I want to clean up this patch queue and update the “base tree” on which I’m applying patches. To do this, I’m first going to move all the “manual” patches before the automatic rewrites. Some of the patches may not apply cleanly in their new positions, so we prepare for merging.

$ hg qpush -a;          # Push all patches
$ hg qsave -e -c;       # Save the patch queue state... this allows for merging later
$ hg update -C qparent; # Move the working directory back to a completely unpatched state

Now we edit the .hg/patches/series file, removing the automatic rewrites. Instead of attempting to merge the automatic rewrites, we will simply regenerate them later.

manual-fixbugA
manual-fixbugB
manual-fixbugA-patch2
manual-fixbugC
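
Editing the series file by hand is fine for 7 patches; for 60+ a small script helps. The following Python sketch filters the text of a series file. The function name and the predicate are hypothetical, and it assumes the usual layout of one patch name per line, optionally followed by a # comment or guard.

```python
def drop_patches(series_text, should_drop):
    """Return series_text without the patches that should_drop() selects.

    Blank lines and comment-only lines are kept unchanged.
    """
    kept = []
    for line in series_text.splitlines():
        # The patch name is everything before an optional "#" annotation.
        name = line.split("#", 1)[0].strip()
        if name and should_drop(name):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


series = """manual-fixbugA
manual-fixbugB
automatic-rewrite-comptrs
automatic-rewrite-classhierarchy
automatic-rewrite-addrefs
manual-fixbugA-patch2
manual-fixbugC
"""

print(drop_patches(series, lambda name: name.startswith("automatic-")), end="")
```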

Now we push the patches back on: if a patch doesn’t apply cleanly, use a three-way merge algorithm based on the saved patch queue state:

hg qpush -m; # applies manual-fixbugA
hg qpush -m; # applies manual-fixbugB
hg qpush -m; # applies manual-fixbugA-patch2 with interactive merging
hg qpush -m; # applies manual-fixbugC with interactive merging

Now, we want to merge the changes from “manual-fixbugA-patch2” into “manual-fixbugA”

hg qgoto manual-fixbugA;
hg qfold manual-fixbugA-patch2;

Now we have a clean patch queue which is ready for updating to a new “base”:

hg qpush -a;    # Push all patches in preparation for our second merge
hg qsave -e -c; # Save the patch queue state (again)
hg pull http://hg.mozilla.org/actionmonkey; # Pull new changesets from actionmonkey
hg up tip;      # Update the working directory to the unpatched actionmonkey tip
hg qpush -m;    # Push the next patch, merging if necessary; repeat for each patch

In the case of XPCOMGC, “new-base” merges are difficult primarily because of the cycle collector. Cycle collection is still being actively developed on trunk, with major changes to the xpconnect mark/trace algorithms as well as changes to timing. One of the earliest patches in the XPCOMGC queue removes the cycle collector entirely (when we have a garbage collector for XPCOM objects, a cycle collector is no longer needed). Merge conflicts are common not only in the patch which removes the cycle collector, but in subsequent patches which touch xpconnect.

I’ve learned that merging goes much better if every patch in the patch queue produces a building tree. To test whether a merge was performed correctly (or discover which patch was merged incorrectly), simply build the tree at each patch state.
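
Building at each patch state is tedious enough to be worth automating. Here is a minimal Python sketch of the loop, assuming no patches are applied when it starts. The function names are invented, and the build command in `make_build` is a placeholder for whatever builds your tree.

```python
import subprocess


def first_broken_patch(patches, push, build):
    """Apply patches in order and return the first one whose tree fails to build.

    push(name) applies the next patch; build() returns True on success.
    Returns None if every patch state builds.
    """
    for name in patches:
        push(name)
        if not build():
            return name
    return None


def hg_qpush(name):
    # "hg qpush" with no arguments applies the next patch in the series;
    # the name is only used by the caller for reporting.
    subprocess.check_call(["hg", "qpush"])


def make_build():
    # Placeholder build command; substitute the real one for your tree.
    return subprocess.call(["make", "-f", "client.mk", "build"]) == 0
```

Calling `first_broken_patch(names, hg_qpush, make_build)` with the contents of the series file stops at the first patch that needs attention, leaving the queue at that state for inspection.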

Conclusion

Mercurial patch queues make it possible to merge-update a series of patches against a moving base, which is very cool. But they don’t remove the actual hard work of merging changes that conflict.