tinderclient.py

Thursday, December 21st, 2006

As promised, I have created a python module which can be used to implement a tinderbox client to report arbitrary builds. I’ve also created a driver which can pull and build a Mozilla-like application. Sources here. I’ve tested it on Windows and Linux, and I fully expect it to work on Mac as well. It requires Python 2.4 and the killableprocess module.

I know it’s not especially obvious how to actually run a build. I use the following command line:

python mozbuild.py --config=/builds/tinderclient/firefox-config.py,/builds/tinderclient/sample-config.py --private-config=/builds/bs-passwords.py

Take a look at the MozillaTest tinderbox logs from Thursday to see the results of my test builds. Please note that anyone is welcome to run a tinderbox that reports to MozillaTest; it’s a place to test tinderbox scripts! If you want to start reporting to an official tree like MozillaExperimental or SeaMonkey-Ports, please ask permission from build@mozilla.org.

For those keeping track of killableprocess, I’ve committed some changes:

  • on *nix, create and kill process groups properly;
  • on Windows, allow redirecting the standard handles to files (instead of pipes);
  • on Windows, allow passing a dictionary for the environment of the new process to create.
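A minimal sketch of the *nix half of this, written against a current Python 3 rather than the Python 2.4 that killableprocess targets. The helper names spawn_group and kill_group are hypothetical and are not the killableprocess API; the point is only to show a child being placed in its own process group and the whole group being signalled at once.

```python
import os
import signal
import subprocess


def spawn_group(args):
    """Start args as the leader of a new session and process group (POSIX only)."""
    # start_new_session=True makes the child call setsid() before exec, so
    # every process it later forks inherits the same process group.
    return subprocess.Popen(args, start_new_session=True)


def kill_group(proc, sig=signal.SIGKILL):
    """Signal every process in proc's group, then reap the direct child."""
    os.killpg(os.getpgid(proc.pid), sig)
    # wait() returns a negative number when the child died from a signal.
    return proc.wait()
```

Killing by group rather than by pid is what catches the grandchildren: a shell script that backgrounds a long-running job leaves that job in the same group, so it goes away together with the shell.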

Right now this is a technology experiment. We’ll probably use it to drive a Tamarin tinderbox. It’s vaguely possible that Mozilla will switch away from the old-style perl tinderbox client altogether going forward, but that requires replicating a lot of logic, and might not be worth it.

Sometime after the new year, I will be adding a driver script which can perform builds in a loop, perhaps with config updating from CVS the way the current tinderbox scripts do.

killableprocess.py

Monday, December 11th, 2006

I’ve managed, at long last, to solve the problem of launching subprocesses from python. I have created a python module which can launch a subprocess, wait for the process with a timeout, and kill that process and all of its sub-subprocesses correctly, on Windows, Mac, and Linux. Source code is here. It requires python 2.4+ because it subclasses the subprocess module. On Windows, it only works on Win2k+, and it requires the ctypes module, which comes with Python 2.5+, or can be installed into earlier versions of Python.

You will be seeing a python-based tinderbox client appear on the MozillaTest tree shortly. Small projects or projects that don’t want or need the byzantine logic of the existing tinderbox client scripts can use a Python module to do tinderbox reporting using a simple object-oriented API. I’m hoping to use this to get Tamarin builds reporting to a tinderbox tree, as well as do some of the FF+XR build automation (which is significantly different from the existing build process).

Unified Windows Build Prerequisites

Thursday, December 7th, 2006

I have put together an initial draft of a unified Mozilla-build package. It is available (temporarily) on my website: mozilla-build.zip (55MB). It contains everything needed to build Mozilla on Windows except the Microsoft Visual C++ compiler and the JDK (only required for XULRunner).

Using MSYS with gmake 3.81 requires one code change, in bug 345482: I’m hoping to get this fix backported to all the active branches soon.

I would like to get some testing of this package, to see if it has missing pieces or causes builders any problems. Please report success or failure in blog comments.

Usage Instructions:

  1. Install MS Visual C++. Do not add paths to the environment when the installer gives you the option.
  2. If you already have MSVC installed, check your environment and remove any references to it from PATH, INCLUDE, and LIB.
  3. Unzip the package to a path with no spaces in it: I recommend C:\ (it will unpack to C:\mozilla-build).
  4. Run start-msvc6.bat, start-msvc71.bat, or start-msvc8.bat. It should detect the installed location of MSVC from the registry, set up paths correctly, and launch an MSYS shell.

Plans for the Mozilla Build System

Thursday, November 30th, 2006

Does the Mozilla build system need to change? The build system is impressively flexible and relatively accurate, but it is not especially easy to use or hack, and it can be slow. There are some common issues that cause major pain for developers. We need to reduce the pain of using our build system, without any major rewrites and without causing major disruption. I have spent some time investigating various options, and I have identified a set of changes that can make a major impact by improving ease of use, speed, and maintenance of the build system.

Pain #1: Setting up a Windows Build Environment

As the Windows Build Prerequisites page suggests, setting up a Windows build environment is very complex and easy to screw up. Developers have to obtain tools from five or six different sources and hook them up with carefully crafted scripts to set environment variables in a magic order. This is one of the major barriers to entry facing developers who want to start hacking Mozilla. This problem is not hard to fix:

  • Mozilla will provide an installer for all of the prerequisites for building Mozilla. The only thing that developers will need to install separately is Microsoft Visual C++. The installer will provide scripts and shortcuts that will provide a ready-made build environment. If build requirements change, a new version of the installer will be made available.
  • Configure checks will be improved to detect the common setup issues.

As part of this process, I am planning to make the MSYS build environment the official build environment and soon drop support for the cygwin build environment. Cygwin is a known source of performance problems and forces a lot of extra complexity in our build scripts to translate between Windows and unix-style paths.

Pain #2: Lack of Documentation

The Mozilla build system has many features that were added without any documentation whatsoever. Brian Ryner wrote a short summary of the build system last year that I have been slowly expanding. I have moved and expanded the old build glossary of makefile variables and other build-system terms, and provided example makefiles to create certain kinds of output (static library, shared library, component library). I hope to have this reference basically completed before Christmas. Any help others can give to complete and edit this documentation is much appreciated.

Pain #3: Depend Build Speed (Recursive Make Harmful)

The current build system uses a multi-pass recursive make system that, while mostly accurate, can be slow. It is tempting to architect a replacement build system (using either an existing framework such as SCons or WAF, or a homegrown solution), but there are over 2000 build scripts (makefiles, perl scripts, and various build manifests) in the Mozilla tree: none of the new build systems have a facility for porting this makefile logic, and any complete rewrite of the build system that is not incremental would be suicidal.

Instead, I have devised techniques to build large parts of the mozilla tree using fewer invocations of make. This will allow the build system to compute dependencies more accurately, as well as significantly reduce the number of intermediate static libraries that are created during a build. It will also significantly help build times for people using parallel builds with -j and distributed builds with distcc. These new techniques will require GNU make 3.81, but will otherwise be fairly straightforward and can be implemented gradually, a few directories at a time.

Pain #4: Monolithic configure

As the Mozilla project transitioned from the unified suite to the standalone Firefox and Thunderbird, and now to a multitude of projects/products, the root configure script has become increasingly difficult to manage. It is difficult/impossible to tell which configure options work with which products. All the configure options are thrown into a single autoconf.mk file, which can cause hidden dependencies between modules.

While working on a cross-platform Tamarin build system, I discovered that replacing autoconf configuration scripts with python configuration scripts is relatively easy; it can be done without altering the Makefile-based build system. My first-cut scripts are a bare imitation of autoconf functionality, but fleshing out scriptable compiler and feature tests should be relatively straightforward.
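To give a feel for what a scriptable compiler test might look like, here is a rough sketch. It is not the Tamarin configuration code: the function names try_compile and format_makefile_vars, the conftest file name, and the HAVE_STDINT_H variable are all made up for illustration, and the -c/-o flags assume a gcc-style compiler driver (MSVC would need different flags).

```python
import os
import subprocess
import tempfile


def try_compile(compiler, source, suffix=".c"):
    """Return True if `compiler` can compile `source` to an object file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest" + suffix)
        obj = os.path.join(tmp, "conftest.o")
        with open(src, "w") as f:
            f.write(source)
        try:
            # Assumes gcc-style flags; output is discarded like autoconf does.
            result = subprocess.run(
                [compiler, "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # The compiler could not be launched at all.
            return False
        return result.returncode == 0


def format_makefile_vars(values):
    """Render a dict of results as makefile assignments, sorted by name."""
    return "".join("%s = %s\n" % (name, values[name]) for name in sorted(values))


# Hypothetical usage: record one feature test for the makefiles to include.
have_stdint = try_compile("cc", "#include <stdint.h>\nint x;\n")
fragment = format_makefile_vars({"HAVE_STDINT_H": "1" if have_stdint else ""})
```

Because the output is still a plain makefile fragment, the Makefile-based build system does not need to know that python rather than autoconf produced it.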

I am hoping to port the main Mozilla configuration scripts to python over the next year. I’ll learn a lot from the Tamarin experience which can be applied to the main tree. I expect that the new scripts will not be ready for Mozilla 1.9, but will be used for Mozilla 2.

Flights of Fancy

If I had unlimited time, I would think about the following things:

  • Writing a makefile parser that could be used to read or convert the existing makefiles into a python-based build system with better scriptability and flexibility.
    • Which could detect when rules changed and rebuild automatically
    • Which could detect when JAR members changed and rebuild only those members
  • Improving the XUL preprocessor to support “real” #if conditions

Conclusion

Mozilla can and should make incremental improvements to the build system. This will be done gradually and carefully to solve specific pain-points and refactor code without disrupting existing work. The short term projects will reduce the pain for new developers and hackers, while the longer-term projects will reduce the pain of maintaining the configuration/build system.

mddepend.pl stats

Monday, November 27th, 2006

My last post mentioned that mddepend.pl causes our build system to do many extra calls to stat(). I’ve done some instrumentation and come up with the following numbers (Linux, Firefox trunk):

                           Calls to mddepend.pl    mddepend calls to stat()
New objdir                 336                     65832
Nothing-changed rebuild    1148                    224536

When building from scratch, there isn’t any need to call mddepend.pl: all the invocations and stat()s performed are unnecessary overhead. When doing a rebuild, some portion of the stats performed are necessary/expected, but nowhere near as many as are actually performed. I expect a full two-thirds of the calls to mddepend.pl are unnecessary, and probably 90% of the calls to stat(). The 224k stats in the depend build checked 16599 unique files, which means that a good stat cache reduces the size of the problem significantly.
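The arithmetic behind that last sentence, plus a minimal sketch of what a stat cache could look like. The StatCache class is hypothetical and written in Python rather than perl; it also assumes that all the lookups happen inside one long-lived process, which is not how mddepend.pl is invoked today.

```python
import os


class StatCache:
    """Remember the mtime of each path so it is stat()ed at most once."""

    def __init__(self):
        self._mtimes = {}
        self.real_stats = 0  # how many times we actually hit the filesystem

    def mtime(self, path):
        """Return the mtime of path, or None if it does not exist."""
        if path not in self._mtimes:
            self.real_stats += 1
            try:
                self._mtimes[path] = os.stat(path).st_mtime
            except OSError:
                self._mtimes[path] = None
        return self._mtimes[path]


# Numbers from the nothing-changed rebuild above.
total_stats = 224536
unique_files = 16599
stats_per_file = total_stats / unique_files      # roughly 13.5
saved_fraction = 1 - unique_files / total_stats  # share a perfect cache avoids
```

Each unique file is being checked roughly 13.5 times on average, so a cache that survived the whole build would avoid more than nine out of ten of those stats.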

Depths of the Mozilla Build System

Wednesday, November 22nd, 2006

I should really be posting about some important plans for the Mozilla build system that I nailed down during the Firefox summit. But I don’t have time to give that a proper post, so instead I’m going to discuss one of the amazing things I’ve learned about the Mozilla build system over the past few days. Look at this snippet from rules.mk. I’ve been the nominal owner of this code for almost two years, but didn’t really understand it until this week. This little piece of code is one of the things that makes our build system really great and horrible at the same time:

  1. Whenever people remove or alter the location of header files, this code keeps all the depend builds from going red.
  2. It causes us to call stat() an extra 10,000+ times per depend build. Probably a lot more than that, actually, but I didn’t instrument it.

We do an end-run around the normal dependency checks done by GNU make: the mddepend.pl script stats and calculates the compiler-generated dependencies in advance. If the dependency is missing or new, it adds a FORCE dependency on the object file. Unfortunately, we do this calculation on each build pass: once for export, once for libs, once for tools, and perhaps another time for check. This causes us to check dependencies many many more times than we actually need to.
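The check described above can be sketched roughly as follows. This is an illustration in Python, not a port of mddepend.pl: the function name needs_force is made up, and the real script works on compiler-generated dependency files rather than a list passed in by hand.

```python
import os
import tempfile


def needs_force(obj, deps):
    """Return True if obj should get a FORCE dependency.

    That is the case when any dependency is missing, or is newer than obj.
    """
    try:
        obj_mtime = os.stat(obj).st_mtime
    except OSError:
        return True  # no object file yet, so it has to be built anyway
    for dep in deps:
        try:
            if os.stat(dep).st_mtime > obj_mtime:
                return True  # header changed after the object was built
        except OSError:
            return True  # header was removed or relocated
    return False


# Small demonstration in a scratch directory, with mtimes set explicitly.
with tempfile.TemporaryDirectory() as tmp:
    obj = os.path.join(tmp, "nsFoo.o")
    hdr = os.path.join(tmp, "nsFoo.h")
    for path in (obj, hdr):
        open(path, "w").close()
    os.utime(hdr, (100, 100))
    os.utime(obj, (200, 200))
    up_to_date = needs_force(obj, [hdr])   # header older than object
    os.utime(hdr, (300, 300))
    header_newer = needs_force(obj, [hdr])  # header touched after the build
    os.remove(hdr)
    header_gone = needs_force(obj, [hdr])   # header deleted from the tree
```

Every call costs one stat() per dependency, which is why repeating the same check in each of the export, libs, and tools passes adds up so quickly.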

What we really want to implement is an “optional dependency”: a directive that if a header has been updated, we should rebuild the object; but if the header doesn’t exist any more, we shouldn’t try to build it (because we don’t have any rules to generate such headers which were removed or relocated intentionally). This is probably not something I’m going to fix any time soon. But I may find time to write it up in detail to propose it as a feature for gmake 3.82.

Adventures in Python: Launching Subprocesses

Thursday, November 9th, 2006

I’ve been looking at python for various build automation. I had what I thought would be a simple problem:

How do I launch a process, collecting stdout/stderr, with a timeout to kill the process if it runs too long?

The python subprocess module gets about 80% there. You can launch a process, and hook up stdout/stderr/stdin. You can poll the process for completion. But subprocess doesn’t have a simple parameter for process timeout. Total time spent: 45 minutes.

So, you use a loop or a thread to wait for the process and kill it if it takes too long, right? Subprocess doesn’t have an instance method to kill the process. Answer according to #python on freenode? os.kill(theprocess.pid, signal.SIGTERM). Except that this apparently doesn’t work on Windows: you have to emulate it. Total time spent: 1.5 hours.
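A minimal sketch of that polling loop, written for a current Python 3 on a unixy system. The function name run_with_timeout is hypothetical. Output goes to a temporary file rather than a pipe, which is an assumption made here so that a chatty child cannot fill the pipe and block while the parent is busy polling.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time


def run_with_timeout(args, timeout, poll_interval=0.1):
    """Run args; return (returncode, timed_out, combined stdout/stderr bytes)."""
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(args, stdout=out, stderr=subprocess.STDOUT)
        deadline = time.monotonic() + timeout
        timed_out = False
        while proc.poll() is None:
            if time.monotonic() >= deadline:
                timed_out = True
                # Only the direct child is signalled; anything it launched
                # is left running.
                os.kill(proc.pid, signal.SIGTERM)
                proc.wait()
                break
            time.sleep(poll_interval)
        out.seek(0)
        return proc.returncode, timed_out, out.read()


# Example: a child that finishes well inside its time limit.
status, timed_out, output = run_with_timeout(
    [sys.executable, "-c", "print('hello')"], timeout=30
)
```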

This works, on unixy systems. But it fails miserably on Windows. It turns out that on Windows when you kill a process, any subprocesses that were launched don’t get killed. So I went searching for code that I thought must have already solved this problem: BuildBot launches processes and has to kill them, right? Well, it turns out that BuildBot uses Twisted to do the dirty work. Twisted completely ignores the problem, as far as I can tell. It doesn’t use subprocess, but instead has a file called _dumbwin32proc.py which provides the event-driven access to the process pipes and status. This file is uglier than the devil’s rear end. Total time spent: 2.5 hours.

After much pain, I found Windows documentation that might help: Windows 2000+ can put processes into jobs. Instead of killing the parent process, you can kill the entire job. As far as I can tell this should be implementable in Python, but I haven’t found anyone who’s done it yet (even better, abstracted it behind a cross-platform API). If you know of code which has this working properly, please let me know. Otherwise I will be spending another 4 hours tomorrow to get this working (I know only halting python, though I’m getting better quickly). Total time spent: 3.5 hours.

Learning new languages isn’t that hard. Learning new programming worlds, with their bugs and quirks, is really hard.

Update: Solution in my post on killableprocess.py.

The Mozilla build system and Tamarin

Wednesday, November 8th, 2006

As the owner of the Mozilla build system, I get to hear endless moaning about how byzantine the build system is and how it would be nice to use something more modern. I sympathize with the problem: our build system is difficult to set up, has a very steep learning curve, and uses lots of unusual languages. On the other hand, our build system works amazingly well: we have grown a set of rules that build an amazing variety of file types, and work on a wide variety of platforms. Our build system is complex for the very good reason that our needs are complex.

A few of the more aware complainers have suggested alternatives such as SCons. I have avoided making huge changes to our build system because it is overwhelming to contemplate migrating our existing build logic to a new build system. Any new build system would have to compare favorably to our existing system, which has years of bugfixes and features to its credit.

I was excited when I was asked to help implement a cross-platform build system for the Tamarin project. Tamarin was released with platform-specific Visual Studio and XCode project files. I thought that perhaps this was a perfect opportunity to try one of the newer build systems, without paying the cost of porting our entire existing build system over.

I created a requirements list. Unfortunately, I don’t think that any of the existing tools are going to meet our needs without extensive customization. I’m going to continue to examine SCons and WAF more over the next few days, but it looks like the path of easiest success is still going to be an old-school autoconf + make build system. Hopefully at least I’ll be able to fix some issues with recursive make while implementing the new system.

New release repackager and about:config fixer

Tuesday, October 10th, 2006

I’ve prepared a couple new software releases this week:

Firefox Release Repackager (1.2)

Adds support for repacking the NSIS installers used by Firefox 2.

about:config fixer

Fix about:config to display the chrome URI of localized prefs, instead of the localized value.

Recursive Make Isn’t All That Bad

Friday, September 29th, 2006

In 1998, Peter Miller wrote an essay Recursive Make Considered Harmful (PDF). He outlines several convincing arguments for why the traditional practice of recursing through directories and calling make in each directory may not be a good technique.

Mozilla uses recursive make extensively. In fact, during a typical build we traverse each directory in the build tree at least twice, and in some cases three or even four times. So, way back in 2002, Chris Seawood filed a bug to reduce the number of subdirectories we traverse during a typical build. This had the promise of significantly reducing build time on Windows because forking processes on Windows (and especially under cygwin) is really expensive.

Unfortunately, it turned out that a non-recursive make was taking longer than the standard recursive make. I thought, at the time, that this was due to the use of extra $(shell cygpath) subprocesses needed to handle sources from multiple subdirectories correctly. About a month ago, at schrep’s prompting, I went back to see whether there was a way to get it working. The results are below:

make -C netwerk libs (Nothing to rebuild)

        Standard (recursive) make    Non-recursive make
real    5.64s                        7.36s
user    6.50s                        1.17s
sys     3.31s                        1.96s

It turns out that the problem with non-recursive make, at least as I currently have it coded, is not the cost of forking processes, but the cost of statting files that don’t exist. Every time a makefile adds a directory to VPATH, make ends up searching through additional directories looking for files. As you can see from the chart, the actual processor time spent by the non-recursive process is a lot less; it takes longer in wall-clock time only because it’s waiting on disk I/O.
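A toy model of why each extra VPATH entry hurts. This is a deliberate simplification and not GNU make’s actual search algorithm: the function vpath_lookup and the dir0..dir9 layout are invented for illustration, and the filesystem is faked with a set so that the failed lookups can simply be counted.

```python
import os


def vpath_lookup(filename, vpath, exists=os.path.exists):
    """Search vpath directories in order; return (found path, failed checks)."""
    misses = 0
    for directory in vpath:
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            return candidate, misses
        misses += 1  # one check spent on a file that is not there
    return None, misses


# Ten source directories, each holding exactly one source file.
dirs = ["dir%d" % i for i in range(10)]
tree = {os.path.join(d, "src%d.cpp" % i) for i, d in enumerate(dirs)}

total_misses = sum(
    vpath_lookup("src%d.cpp" % i, dirs, exists=tree.__contains__)[1]
    for i in range(len(dirs))
)
```

In this model, N directories with one file each cost N(N-1)/2 failed checks (45 for ten directories), so the wasted lookups grow much faster than the number of directories folded into one make invocation.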

I’m hoping to try another technique for non-recursive make that doesn’t set the VPATH for each source directory. Unfortunately, this is going to involve some changes to the dependency system (which I don’t understand very well), so I don’t know when I’m going to find the time.

My technique involved the use of the $(eval) makefile function, which is only available in gmake 3.80 or higher. The tests were performed under MSYS, using GNU make 3.81, available from MSYS snapshots. Special thanks to Earnie Boyd for patiently dealing with me and pointing me at the right files.