Archive for the 'Mozilla' Category

Improving XPCOM for Mozilla 2

Friday, December 22nd, 2006

XPCOM technology, based on Microsoft COM, is fundamentally structured around the concept of binary object layouts and stylized calling conventions. XPCOM was a good technique for introducing modularity and extensibility to the Mozilla codebase, but it is showing its age. One of the interesting things about Mozilla 2 is that we can break API and binary compatibility.

There are several ways we should improve XPCOM:

  1. Improve reference-counting. This could include universal support for cycle collection, or even allocating all XPCOM objects using MMgc. Graydon and I talked about this some at the Summit, and I’m sure he’ll take the lead in determining what this means in practice.
  2. Allow throwing complicated (object-type) exceptions from any XPCOM method, and reduce the verbosity and inefficiencies of nsresult return codes. C++ exceptions, as much as I dislike them, provide the shortest path to this goal. Taras has been working with oink to provide an automated way to convert method calls.
  3. Reduce the complexity and verbosity of using XPCOM. I’ve been spending a fair amount of time working in Python recently, and I’m very impressed with its use of module objects. Using XPCOM could be a lot easier from script with some very simple changes. I’ll blog about this soon!

In order to achieve these objectives, I’m convinced that Mozilla must break XPCOM binary compatibility, and should stop using XPCOM as the binary embedding solution:

  • We may want the flexibility of making GCRoot or another abstract non-interface class the root type (nsISupports) for all XPCOM objects. We at least ought to add interfacerequestor and classinfo functionality to the root object type, and perhaps weak-reference support as well.
  • C++ exceptions are very compiler dependent (and compiler-version dependent) and are not good candidates for binary freezing.

The implications of a change like this are considerable:

  • It will no longer be possible (or desirable) to write binary XPCOM components in C++ that don’t live in the monolithic platform binary (libxul). At first this seemed like a significant challenge: Firefox and Thunderbird use binary components for profile migration and OS integration. Various extensions also use binary components to integrate with external libraries. But most of these use-cases can be solved with a good foreign-function-interface library available from script. I’ll blog about this separately; I’ve been very impressed with the expressiveness and flexibility of the python ctypes library and I think it could be ported to SpiderMonkey rather easily.
  • Binary embedders (e.g. gtkmozembed clients) will no longer be able to access DOM objects via their XPCOM interfaces.
    The simplest way to solve this problem is to extend the scriptable NPAPI object model to be accessible by binary embedders. This will give embedders access to the DOM that is straightforward and relatively complete.
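
To give a feel for what a foreign-function interface available from script buys you, here is a minimal sketch using the Python ctypes library mentioned above. It calls the C library's strlen() without any compiled glue code. How the C library is located (CDLL(None) on POSIX, msvcrt on Windows) and the helper names load_c_library and c_strlen are assumptions of this sketch, not part of any Mozilla API.

```python
import ctypes
import sys


def load_c_library():
    """Load the platform C runtime (the lookup strategy is an assumption)."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")
    # On POSIX, CDLL(None) exposes the symbols already loaded into the
    # running process, which includes the C library.
    return ctypes.CDLL(None)


libc = load_c_library()

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Call the C strlen() directly from script."""
    return libc.strlen(data)


print(c_strlen(b"mozilla"))  # prints 7
```

An extension that only needs a handful of calls into an external library could, in principle, declare them this way instead of shipping a binary component.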

Brainstorming Example

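class nsISupports : virtual public RCObject
{
public:
  // Reference counting is delegated to RCObject. The return values are
  // placeholders so that callers of the old signatures still compile.
  inline nsrefcnt AddRef() { IncrementRef(); return 2; }
  inline nsrefcnt Release() { DecrementRef(); return 1; }

  // Returns the requested interface, or null if it is not supported.
  virtual nsISupports* QueryInterface(REFNSIID aIID, PRBool aAddRef) = 0;

  /**
   * For ease of conversion, provide an old-style QI wrapper.
   */
  inline nsresult QueryInterface(REFNSIID aIID, void **aResult) {
    *aResult = QueryInterface(aIID, PR_TRUE);
    return (*aResult) ? NS_OK : NS_NOINTERFACE;
  }

  // interfacerequestor functionality, folded into the root type.
  virtual nsISupports* GetInterface(REFNSIID aIID, PRBool aAddRef) = 0;

  // classinfo functionality, folded into the root type.
  virtual nsIClassInfo* GetClassInfo() = 0;
};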

The virtual inheritance of RCObject could be a problem for xptcall. There are ways around that. I’m also a little concerned that objects won’t be storing pointers to the “root” GCObject, but rather vtables within that object. I hope that doesn’t mess up MMgc.

tinderclient.py

Thursday, December 21st, 2006

As promised, I have created a python module which can be used to implement a tinderbox client to report arbitrary build results. I’ve also created a driver which can pull and build a Mozilla-like application. Sources here. I’ve tested it on Windows and Linux, but I fully expect it would work on Mac as well. It requires Python 2.4 and the killableprocess module.

I know it’s not especially obvious how to actually run a build. I use the following command line:

python mozbuild.py --config=/builds/tinderclient/firefox-config.py,/builds/tinderclient/sample-config.py --private-config=/builds/bs-passwords.py

Take a look at the MozillaTest tinderbox logs from Thursday to see the results of my test builds. Please note that anyone is welcome to run a tinderbox that reports to MozillaTest; it’s a place to test tinderbox scripts! If you want to start reporting to an official tree like MozillaExperimental or SeaMonkey-Ports, please ask permission from build@mozilla.org.

For those keeping track of killableprocess, I’ve committed some changes:

  • on *nix, create and kill process groups properly;
  • on Windows, allow redirecting the standard handles to files (instead of pipes);
  • on Windows, allow passing a dictionary for the environment of the new process to create.
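
The process-group idea in the first item can be sketched roughly as follows. This is not the killableprocess source: it is a small illustration written for current Python 3 on POSIX, and the function name run_in_group is hypothetical. The child is started in its own session, so its pid doubles as the process group id, and the whole group is signalled on timeout.

```python
import os
import signal
import subprocess
import time


def run_in_group(args, timeout):
    """Run args in its own process group; kill the whole group on timeout.

    Returns the exit status (a negative signal number if killed). POSIX only.
    """
    # start_new_session=True makes the child call setsid(), so it becomes
    # the leader of a new process group whose id equals its pid.
    proc = subprocess.Popen(args, start_new_session=True)
    deadline = time.monotonic() + timeout
    while proc.poll() is None:
        if time.monotonic() >= deadline:
            # Signal every process in the group, grandchildren included.
            os.killpg(proc.pid, signal.SIGKILL)
            break
        time.sleep(0.05)
    return proc.wait()


if __name__ == "__main__":
    # The shell starts a background grandchild, then blocks on a second sleep.
    status = run_in_group(["sh", "-c", "sleep 30 & sleep 30"], timeout=0.5)
    print(status)
```

Killing only the shell's pid would leave the background sleep running; signalling the group takes both down.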

Right now this is a technology experiment. We’ll probably use it to drive a Tamarin tinderbox. It’s vaguely possible that Mozilla will switch away from the old-style perl tinderbox client altogether going forward, but that requires replicating a lot of logic, and might not be worth it.

Sometime after the new year, I will be adding a driver script which can perform builds in a loop, perhaps with config updating from CVS the way the current tinderbox scripts do.

killableprocess.py

Monday, December 11th, 2006

I’ve managed, at long last, to solve the problem of launching subprocesses from python. I have created a python module which can launch a subprocess, wait for the process with a timeout, and kill that process and all of its sub-subprocesses correctly, on Windows, Mac, and Linux. Source code is here. It requires python 2.4+ because it subclasses the subprocess module. On Windows, it only works on Win2k+, and it requires the ctypes module, which comes with Python 2.5+, or can be installed into earlier versions of Python.

You will be seeing a python-based tinderbox client appear on the MozillaTest tree shortly. Small projects or projects that don’t want or need the byzantine logic of the existing tinderbox client scripts can use a Python module to do tinderbox reporting using a simple object-oriented API. I’m hoping to use this to get Tamarin builds reporting to a tinderbox tree, as well as do some of the FF+XR build automation (which is significantly different from the existing build process).

Unified Windows Build Prerequisites

Thursday, December 7th, 2006

I have put together an initial draft of a unified Mozilla-build package. It is available (temporarily) on my website: mozilla-build.zip (55MB). It contains everything needed to build Mozilla on Windows except the Microsoft Visual C++ compiler and the JDK (only required for XULRunner).

Using MSYS with gmake 3.81 requires one code change, in bug 345482: I’m hoping to get this fix backported to all the active branches soon.

I would like to get some testing of this package, to see if it has missing pieces or causes builders any problems. Please report success or failure in blog comments.

Usage Instructions:

  1. Install MS Visual C++. Do not add paths to the environment when the installer gives you the option.
  2. If you already have MSVC installed, check your environment and remove any references to it from PATH, INCLUDE, and LIB.
  3. Unzip the package to a path with no spaces in it: I recommend C:\ (it will unpack to C:\mozilla-build).
  4. Run start-msvc6.bat, start-msvc71.bat, or start-msvc8.bat. It should detect the installed location of MSVC from the registry, set up paths correctly, and launch an MSYS shell.

Plans for the Mozilla Build System

Thursday, November 30th, 2006

Does the Mozilla build system need to change? The build system is impressively flexible and relatively accurate, but it is not especially easy to use or hack, and it can be slow. There are some common issues that cause major pain for developers. We need to reduce the pain of using our build system, without any major rewrites and without causing major disruption. I have spent some time investigating various options, and I have identified a set of changes that can make a major impact by improving ease of use, speed, and maintenance of the build system.

Pain #1: Setting up a Windows Build Environment

As the Windows Build Prerequisites page suggests, setting up a Windows build environment is very complex and easy to screw up. Developers have to obtain tools from five or six different sources and hook them up with carefully crafted scripts to set environment variables in a magic order. This is one of the major barriers to entry facing developers who want to start hacking Mozilla. This problem is not hard to fix:

  • Mozilla will provide an installer for all of the prerequisites for building Mozilla. The only thing that developers will need to install separately is Microsoft Visual C++. The installer will provide scripts and shortcuts that will provide a ready-made build environment. If build requirements change, a new version of the installer will be made available.
  • Configure checks will be improved to detect the common setup issues.

As part of this process, I am planning to make the MSYS build environment the official build environment and soon drop support for the cygwin build environment. Cygwin is a known source of performance problems and forces a lot of extra complexity in our build scripts to translate between Windows and unix-style paths.

Pain #2: Lack of Documentation

The Mozilla build system has many features that were added without any documentation whatsoever. Brian Ryner wrote a short summary of the build system last year that I have been slowly expanding. I have moved and expanded the old build glossary of makefile variables and other build-system terms, and provided example makefiles to create certain kinds of output (static library, shared library, component library). I hope to have this reference basically completed before Christmas. Any help others can give to complete and edit this documentation is much appreciated.

Pain #3: Depend Build Speed (Recursive Make Harmful)

The current build system uses a multi-pass recursive make system that, while mostly accurate, can be slow. It is tempting to architect a replacement build system (using either an existing framework such as SCons or WAF, or a homegrown solution), but there are over 2000 build scripts (makefiles, perl scripts, and various build manifests) in the Mozilla tree: none of the new build systems have a facility for porting this makefile logic, and any complete rewrite of the build system that is not incremental would be suicidal.

Instead, I have devised techniques to build large parts of the mozilla tree using fewer invocations of make. This will allow the build system to compute dependencies more accurately, as well as significantly reduce the number of intermediate static libraries that are created during a build. It will also significantly help build times for people using parallel builds with -j and distributed builds with distcc. These new techniques will require GNU make 3.81, but will otherwise be fairly straightforward and can be implemented gradually, a few directories at a time.

Pain #4: Monolithic configure

As the Mozilla project transitioned from the unified suite to the standalone Firefox and Thunderbird, and now to a multitude of projects/products, the root configure script has become increasingly difficult to manage. It is difficult/impossible to tell which configure options work with which products. All the configure options are thrown into a single autoconf.mk file, which can cause hidden dependencies between modules.

While working on a cross-platform Tamarin build system, I discovered that replacing autoconf configuration scripts with python configuration scripts is relatively easy; it can be done without altering the Makefile-based build system. My first-cut scripts are a bare imitation of autoconf functionality, but fleshing out scriptable compiler and feature tests should be relatively straightforward.
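
To illustrate what a scriptable compiler test might look like, here is a minimal sketch. It is not taken from the Tamarin scripts: the function names try_compile and format_config, the HAVE_STDINT_H variable, and the Unix-style -c/-o compiler flags are all assumptions made for the example. The check compiles a tiny C file and records the result as a makefile variable, so the Makefile-based build itself would not need to change.

```python
import os
import shutil
import subprocess
import tempfile


def try_compile(source, compiler="cc"):
    """Return True if `compiler` can compile the C `source` text."""
    if shutil.which(compiler) is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        with open(src, "w") as f:
            f.write(source)
        # Assumes a gcc/clang-style command line (-c and -o).
        result = subprocess.run(
            [compiler, "-c", src, "-o", os.path.join(tmp, "conftest.o")],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


def format_config(results):
    """Render test results as makefile variable assignments."""
    lines = []
    for name in sorted(results):
        lines.append("%s = %s\n" % (name, "1" if results[name] else ""))
    return "".join(lines)


if __name__ == "__main__":
    results = {
        # Hypothetical feature test: is <stdint.h> usable?
        "HAVE_STDINT_H": try_compile(
            "#include <stdint.h>\nint main(void) { return 0; }\n"
        ),
    }
    print(format_config(results), end="")
```

Each product could keep its own small table of tests like this, which would also make it clearer which options belong to which product.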

I am hoping to port the main Mozilla configuration scripts to python over the next year. I’ll learn a lot from the Tamarin experience which can be applied to the main tree. I expect that the new scripts will not be ready for Mozilla 1.9, but will be used for Mozilla 2.

Flights of Fancy

If I had unlimited time, I would think about the following things:

  • Writing a makefile parser that could be used to read or convert the existing makefiles into a python-based build system with better scriptability and flexibility.
    • Which could detect when rules changed and rebuild automatically
    • Which could detect when JAR members changed and rebuild only those members
  • Improving the XUL preprocessor to support “real” #if conditions

Conclusion

Mozilla can and should make incremental improvements to the build system. This will be done gradually and carefully to solve specific pain-points and refactor code without disrupting existing work. The short term projects will reduce the pain for new developers and hackers, while the longer-term projects will reduce the pain of maintaining the configuration/build system.

Unit-Testing Update

Tuesday, November 28th, 2006

Unit testing of the Mozilla toolkit has been making some great progress behind the scenes, as developers look at how their current bugs are going to be tested. I spent the morning looking through toolkit checkins since my original announcement, and only a couple of bugs landed without proper testcases. I think we are ready to start the triage process for older bugs, and so I have created a status page with bugzilla queries. If you are interested in volunteering, I would love help with triaging bugs that do and don’t need testcases (this is also a great way to learn about our codebase). The current stats are:

Needs triage                      524
Needs testcase (in-testsuite?)      5
in-testsuite+                       2
in-testsuite-                       2

The stats aren’t really accurate, since most of the feed parser bugs are filed in the Firefox product, even though they are fixed (and tested!) in toolkit code.

mddepend.pl stats

Monday, November 27th, 2006

My last post mentioned that mddepend.pl causes our build system to do many extra calls to stat(). I’ve done some instrumentation and come up with the following numbers (Linux, Firefox trunk):

                           Calls to mddepend.pl    mddepend calls to stat()
New objdir                                  336                       65832
Nothing-changed rebuild                    1148                      224536

When building from scratch, there isn’t any need to call mddepend.pl: all the invocations and stat()s performed are unnecessary overhead. When doing a rebuild, some portion of the stats performed are necessary/expected, but nowhere near as many as are actually performed. I expect a full two-thirds of the calls to mddepend.pl are unnecessary, and probably 90% of the calls to stat(). The 224k stats in the depend build checked 16599 unique files, which means that a good stat cache reduces the size of the problem significantly.
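
As a rough illustration of the stat-cache idea, here is a minimal sketch in Python (mddepend.pl itself is Perl, and the class name StatCache is hypothetical). It remembers the result of each stat() by path, including the fact that a file is missing. The arithmetic at the end uses the rebuild figures quoted above and assumes an ideal cache shared across every build pass, which a real implementation would have to arrange.

```python
import os


class StatCache:
    """Memoize os.stat() results by path for the duration of one build."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def stat(self, path):
        """Return os.stat(path), or None if the file does not exist."""
        key = os.path.abspath(path)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        try:
            result = os.stat(key)
        except OSError:
            result = None  # remember missing files too
        self._cache[key] = result
        return result


# Figures from the nothing-changed rebuild quoted above.
total_stats = 224536
unique_files = 16599
print("stats per unique file: %.1f" % (total_stats / unique_files))
print("avoidable with an ideal cache: %.1f%%"
      % (100.0 * (1 - unique_files / total_stats)))
```

With those figures, each file is checked about 13.5 times on average, and an ideal cache would avoid roughly 92.6% of the calls.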

Depths of the Mozilla Build System

Wednesday, November 22nd, 2006

I should really be posting about some important plans for the Mozilla build system that I nailed down during the Firefox summit. But I don’t have time to give that a proper post, so instead I’m going to discuss one of the amazing things I’ve learned about the Mozilla build system over the past few days. Look at this snippet from rules.mk. I’ve been the nominal owner of this code for almost two years, but didn’t really understand it until this week. This little piece of code is one of the things that makes our build system really great and horrible at the same time:

  1. Whenever people remove or alter the location of header files, this code keeps all the depend builds from going red.
  2. It causes us to call stat() an extra 10,000+ times per depend build. Probably a lot more than that, actually, but I didn’t instrument it.

We do an end-run around the normal dependency checks done by GNU make: the mddepend.pl script stats and calculates the compiler-generated dependencies in advance. If the dependency is missing or new, it adds a FORCE dependency on the object file. Unfortunately, we do this calculation on each build pass: once for export, once for libs, once for tools, and perhaps another time for check. This causes us to check dependencies many many more times than we actually need to.
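
The check described above can be sketched in a few lines of Python. This is not a translation of mddepend.pl: the function names are hypothetical, and treating a missing object file as needing a rebuild is an assumption of the sketch. It shows only the rule that a missing or newer dependency forces the object to be rebuilt.

```python
import os


def mtime_or_none(path):
    """Return the modification time of path, or None if it is missing."""
    try:
        return os.stat(path).st_mtime
    except OSError:
        return None


def must_force_rebuild(obj, deps):
    """Return True if obj should get a FORCE dependency.

    deps is the list of files the compiler recorded as dependencies of obj.
    """
    obj_mtime = mtime_or_none(obj)
    if obj_mtime is None:
        return True  # assumption: no object yet, so it must be built
    for dep in deps:
        dep_mtime = mtime_or_none(dep)
        if dep_mtime is None:
            # The header was removed or moved: rebuild the object rather
            # than letting make fail for lack of a rule to create the header.
            return True
        if dep_mtime > obj_mtime:
            return True  # the header is newer than the object
    return False
```

Each call costs one stat() per dependency plus one for the object, which is why repeating it on every build pass adds up so quickly.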

What we really want to implement is an “optional dependency”: a directive that if a header has been updated, we should rebuild the object; but if the header doesn’t exist any more, we shouldn’t try to build it (because we don’t have any rules to generate such headers which were removed or relocated intentionally). This is probably not something I’m going to fix any time soon. But I may find time to write it up in detail to propose it as a feature for gmake 3.82.

Learning About Tamarin

Monday, November 20th, 2006

When a project like Tamarin is released as open-source, there is naturally a lot of interest in the project. I had a meeting with Eric Shepherd, Jeff Dyer, and Steven Johnson to coordinate a documentation plan for the Tamarin project. This is going to be an incremental process, starting with docs that already exist or are being created within Adobe. We have also identified various documents that we would like to create. I have put up notes from the meeting, for future reference.

If you have questions about Tamarin, feel free to ask them in the mozilla.dev.tech.js-engine newsgroup, or the #tamarin channel on irc.mozilla.org. Once you get your answer, please write an article for the Tamarin documentation on the Mozilla Developer Center.

Adventures in Python: Launching Subprocesses

Thursday, November 9th, 2006

I’ve been looking at python for various build automation tasks. I had what I thought would be a simple problem:

How do I launch a process, collecting stdout/stderr, with a timeout to kill the process if it runs too long?

The python subprocess module gets about 80% there. You can launch a process, and hook up stdout/stderr/stdin. You can poll the process for completion. But subprocess doesn’t have a simple parameter for process timeout. Total time spent: 45 minutes.

So, you use a loop or a thread to wait for the process and kill it if it takes too long, right? Subprocess doesn’t have an instance method to kill the process. Answer according to #python on freenode? os.kill(theprocess.pid, signal.SIGTERM). Except that this apparently doesn’t work on Windows: you have to emulate it. Total time spent: 1.5 hours.
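
A minimal sketch of that loop-and-kill approach, written for current Python 3 on POSIX (the function name run_with_timeout is hypothetical). Output goes to a temporary file rather than a pipe, so a chatty child cannot stall on a full pipe while the parent is polling instead of reading.

```python
import os
import signal
import subprocess
import tempfile
import time


def run_with_timeout(args, timeout, poll_interval=0.05):
    """Run args, collecting stdout and stderr; SIGTERM it on timeout.

    Returns (returncode, output_bytes). Only the process itself is
    signalled; anything it launched is left alone.
    """
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(args, stdout=log, stderr=subprocess.STDOUT)
        deadline = time.monotonic() + timeout
        while proc.poll() is None:
            if time.monotonic() >= deadline:
                os.kill(proc.pid, signal.SIGTERM)
                break
            time.sleep(poll_interval)
        returncode = proc.wait()
        log.seek(0)
        return returncode, log.read()


if __name__ == "__main__":
    print(run_with_timeout(["sh", "-c", "echo hi"], timeout=5))
    print(run_with_timeout(["sleep", "30"], timeout=0.5))
```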

This works, on unixy systems. But it fails miserably on Windows. It turns out that on Windows when you kill a process, any subprocesses that were launched don’t get killed. So I went searching for code that I thought must have already solved this problem: BuildBot launches processes and has to kill them, right? Well, it turns out that BuildBot uses Twisted to do the dirty work. Twisted completely ignores the problem, as far as I can tell. It doesn’t use subprocess, but instead has a file called _dumbwin32proc.py which provides the event-driven access to the process pipes and status. This file is uglier than the devil’s rear end. Total time spent: 2.5 hours.

After much pain, I found Windows documentation that might help: Windows 2000+ can put processes into jobs. Instead of killing the parent process, you can kill the entire job. As far as I can tell this should be implementable in Python, but I haven’t found anyone who’s done it yet (even better, abstracted it behind a cross-platform API). If you know of code which has this working properly, please let me know. Otherwise I will be spending another 4 hours tomorrow to get this working (I know only halting python, though I’m getting better quickly). Total time spent: 3.5 hours.

Learning new languages isn’t that hard. Learning new programming worlds, with their bugs and quirks, is really hard.

Update: Solution in my post on killableprocess.py.