Firefox Modularity and WebExtensions

Saturday, September 3rd, 2016

Last week, Andy McKay posted skeptical thoughts about whether Firefox system addons (also known as go-faster addons) should be required to use the WebExtensions API surface. I believe that all Firefox gofaster addons, as well as our Test Pilot experiments and SHIELD studies, should use WebExtensions. The explanation comes in three main areas:

  1. The core limiting factor in our ability to go faster while still shipping high-quality software isn’t the packaging and shipping mechanisms for our code. It’s the modularity and module boundaries of our codebase.
  2. The module structure and API surfaces of the Firefox/gecko codebase is hurting our product quality, hurting our teams and our ability to attract volunteer contributors, and hurting our ability to go faster. But there are some shining counter-examples!
  3. WebExtensions should be the future for module boundaries in the Firefox frontend. It’s worth planning carefully and going slow enough to get good module boundaries first, so that we can go blazingly fast later.
  4. Coda: by sticking to a single model for Firefox addons and builtin system addons, we can focus supporting work more effectively and make life better for both addon authors and Firefox engineers.

A diagram showing the "clustered cost" of Mozilla's module interdependencies from 1998-2000.

There is a long history of academic research about software modularity. I remember the OSCON talk in 2006 when I first heard about Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code: this talk and paper has changed the way I think about software structure and open source and how to organize engineering teams including volunteers. (Incidentally, this seems like another way of saying “everything that’s important to me at work”.)

Quoting one of the followup papers that studies the impact of software modularity on change over time:

Specifically, we show that i) tightly-coupled components are “harder to kill,” in that they have a greater likelihood of survival in subsequent versions of a design; ii) tightly-coupled components are “harder to maintain,” in that they experience more surprise changes to their dependency relationships that are not associated with new functionality; and iii) tightly-coupled components are “harder to augment,” in that the mix of new components added in each version is significantly more modular than the legacy design.

—Alan MacCormack, John Rusnak, and Carliss Y. Baldwin. The Impact of Component Modularity on Design Evolution: Evidence from the Software Industry.

The structure and connectedness of modules in a codebase determines what you can change easily (and therefore quickly) and what things have to change slowly. There is not a single design for a software project: you have to design the structure of your modules and interfaces based on what may need to change in the future. Even more fundamentally, the shape and structure of APIs and modules determines where teams can be innovative most successfully. Relatedly, module size and structure determines where contributors can be most effective and feel most welcome.

There are some shining success stories at Mozilla where good modular design has allowed teams to work independently and quickly. The devtools debugging API is a great example of carefully designing an API surface which then allowed the team to move quickly and innovate. The team spent a lot of time refactoring guts within the JS engine to expose a debugging API which wasn’t tightly tied to the underlying platform. And since devtools are isolated from the rest of the browser in an iframe, there is a natural separation which allows them to be developed as a unit and independently. The advantages of this model became quickly apparent when the team started reusing the same basic debugging structure for a desktop Firefox to debug content in Firefox for Android, or debug the Firefox chrome itself, or even debug content in Chrome!

A lot of the pain and slowness in Firefox development relates to our code not having well-defined, documented, or enforced module boundaries or interfaces. Within the Firefox frontend itself, we have many different kinds of functionality that runs in the single technical context of the main browser window. We sometimes pretend that there are relatively clean boundaries between the Firefox frontend and the underlying platform, but this is a myth. There are pervasive implementation dependencies between things like the tab browser in the frontend and the network/docshell/PSM validation in the platform which often have to be modified together.

Lack of clean modules and interfaces manifest themselves in many ways. It is notoriously difficult to write tests for Firefox. This is pretty natural, since good tests often focus on the interface boundaries between modules. It also makes it extremely difficult to “mock” unrelated modules cleanly for a test. The pervasiveness of “random oranges” is not because tests are written poorly: it is most related to tests being very difficult to write well.

Historically problems maintaining and building Firefox addons and the addon ecosystem is directly related to not having an API surface for addons. Because writing an addon usually involved hacking into the structure of the browser.xul DOM and JS, it was natural that addon authors were frequently affected by browser changes. And it was very difficult for anyone, Firefox hacker or addon hacker alike, to know whether or how any particular change would affect addons. The most common complaint from addon authors was that we didn’t have good API documentation; but this is not a problem with the documentation. It’s a problem of not having good APIs. Similarly, a major user complaint was that addons break too often. This is not a problem of addon authors being bad coders; it is more fundamentally a lack of stable API surfaces that would allow anyone to write code that continues working over time.

In order to make addon development fun and productive again, we’ve made big investments in WebExtensions. WebExtensions is a clearly specified, tested, documented, and maintained API surface targeted at addons. The WebExtensions system will allow addon authors to innovate and be creative within a structure that lets Firefox support them over time. Over the next year, we expect basically all Firefox addons to move to WebExtensions. We’ve already seen this energize the addon community and especially promote addon development by authors who gave up developing old-style Firefox addons.

Within Firefox development itself, we need this same combination of structure and innovation. Starting with the new system addons, we need to spend the time up front to build API surfaces that allow teams within Mozilla to experiment and build new systems safely, with the confidence that they aren’t going to break Firefox by accident. We need to have enough foresight to say where we want to experiment, so that we can make sure that we have the right API and module boundaries in place so that experiments can be successful.

If our system addons are still coded in a way that depends on the internal structure of browser.xul, or XPCOM, or the network stack, then they have inherently high testing cost. We need to re-validate them against each release, and be very cautious about pushing changes because we don’t know whether any change could break the browser.

If, on the other hand, we use WebExtensions and have technical guarantees that addons are isolated from other code, we have the ability to push new changes aggressively and actually independently. QA and release teams don’t need to test every change extensively and in all combinations: instead we can use the technical isolation guarantees of the WebExtensions API to allow teams to be more completely independent, the same way we want addon authors to be essentially independent.

As we start developing system addons using WebExtensions, there are going to be APIs which we don’t want to commit to, or that we’re not sure about. I think we will have to come up with APIs that are not exposed to all addons, but only to particular system addons where we understand the risks and experimental nature. We are going to need a way for many teams to be responsible for their own API surfaces, and not route every change through the addons engineering team. We should embrace these challenges as the cost of success.

Having more code use WebExtensions is a way to focus engineering efforts. We know that both addons and system code needs better support for profiling and performance monitoring, telemetry, unit testing, localization, and other engineering support functions. By having our system addons use the same basic API layer as external addons, we can build tools that improve life for everyone. The cost of building each of these things well is large, but through shared effort we can make our engineering investment return much greater dividends.

At the end of the day, I believe we should end up in a world where most of the Firefox frontend is made available via system addons and glued together using well-defined API surfaces. What would it be like to have our tabbed browser, location bar, search interface, bookmarks system, session restore, etc be independently hackable? It would be awesome! It would mean a higher quality product, with more opportunities for volunteer contributors to improve specific areas. It would likely mean reduced edit/build/run/test cycles. It might even give addon authors the ability to completely replace certain components of the browser if they so choose.

I shared a draft of this post with Andy and other tech leads and got some great feedback. In order for discussion to go to one place, please post any replies or thoughts to the firefox-dev mailing list.

Chart of the Day: Firefox Nightly Update Adoption Curves

Monday, April 15th, 2013

In general, people who are running the Firefox Nightly and Aurora channel are offered a new build every day. But users don’t update immediately, because Firefox does not interrupt you with an update prompt upon receiving an update. Instead it waits and applies the update at the next Firefox restart, or prompts the user to update only after significant idle time.

This means that there is a noticeable “delay” between a nightly build and when people start reporting bugs or crashes against the build. It also means that the number of users using any particular nightly build can vary widely. The following charts demonstrate this variability and the update adoption curves:

Per-build usage and adoption curves, Firefox nightly builds on Windows, 1-March to 14-April 2013
Overlapped adoption curves, 1-March to 14-April 2013

Because of this variability, engineers and QA should use care when using data from nightly builds. Note the following conclusions and recommendations:

  • Holidays, weekends, and other unexplained factors may mean that some nightly builds get below-average user totals.
  • Users often skip nightlies: reported regression ranges should be verified.
  • Reliable crash metrics will not be available for several days after a nightly build is released.
  • It may be necessary to correlate crash rates on particular builds against the user counts for that build in order to accurately measure crashes-per-user.
  • When multiple nightlies are built on the same day (for example, a respin for a bad regression), the user count for each build will be lower than an average nightly build.

This data was collected from ADU data provided by metrics and mirrored in the crash-stats database. The script used to collect this data is available in socorro-toolbox.

Make the Title Bar Clickable When Tabs are On Top

Thursday, March 24th, 2011

In Firefox 4, tab are “on top” by default. When you are in maximized mode, this also means that the tabs move up into the titlebar. I understand that the UI design team decided to do this to make the tabs easier to click (because of Fitts’ law), but I personally find this behavior annoying, because it means that you cannot use the title bar for its normal tasks. I almost always double-click the titlebar to switch between maximized and regular mode, or drag the window to the side to get half-screen mode.

Fortunately, there is a hidden preference which controls this behavior. You can restore normal title bar behavior in two ways:

  • Set the hidden preference “browser.tabs.drawInTitlebar” to false, using about:config
  • Install my extension Aero Window Title, which not only changes the default preference for you, but also puts the window title back onto the screen so that you can see the title of the current tab easily.

While I’m talking about it, I’d also like to plug my other new extension, Iconic Firefox Menu, which will turn the orange Firefox menu (on Windows) into a Firefox icon. Together, the two extensions look like this:

The Firefox menu is now an icon, instead of an ugly orange button. The title of the current tab is displayed in the title bar.

The Firefox Plugin Hang Detector

Wednesday, June 9th, 2010

If all goes well, Firefox 3.6.4 will be released with support for out-of-process plugins on Windows and Linux next week. As part of this project, Firefox now features a “hang detector” for plugins. The hang detector helps protect users from plugins and plugin scripts which stop responding for 10 seconds.

When Firefox makes an NPAPI call to a plugin which is being run in a separate plugin process, that call is translated into a message which is posted to the process. Firefox then waits for the response. If the plugin takes longer than 10 seconds to respond to a call, Firefox performs the following actions:

  • Stop the plugin process and collect a plugin-process “crash” minidump.
  • Collect a “crash” minidump from the browser process.
  • Terminate the plugin process.
  • Display the plugin-crashed UI, giving the user the opportunity to refresh the page and try again.

There are several reasons plugins might trigger the hang detector:

  • A plugin script (such as ActionScript run inside the Flash plugin) may be in an infinite loop or performing a very long computation.
  • The plugin itself may have a bug (such as a threading deadlock) which causes it to stop responding.
  • The plugin is not deadlocked, but is not processing events quickly enough, causing the event to linger waiting to be processed.
  • The implementation of Firefox out-of-process plugins may be causing a deadlock.

It is this last possibility that is most concerning, and we have pored over our Firefox crash stats studying the hang reports that we receive, trying to categorize the reports into one the categories above. During the long process of Firefox 3.6.4 release candidates, we have identified and fixed several “tophangs”: see bug 561817 for an example.

When hang reports were first submitted to crash-stats, it was very difficult to distinguish full crashes from hangs. It was also impossible to correlate the browser and plugin parts of a hang. Since then, the Socorro team has done an outstanding job of improving the crash-stats UI so that we can analyze hang reports. It is now possible, using the advanced query page, to search for only crash reports, only hang reports, and to limits searches to either the browser process or the plugin process. Individual hang reports are now cross-linked: see the following reports for an example:

In this particular hang-pair, it isn’t immediately clear what is causing the hang. The browser is painting, and during the process of painting a windowless Flash plugin sends the NPP_SetWindow NPAPI call. This is the hanging call. At the time the hang report was collected, the NPAPI thread of the Flash plugin is calling NPN_InvalidateRect, and inside that implementation locking a mutex. But we don’t know, looking at the stacks, whether this is a deadlock on that mutex, or whether the plugin just happened to be making that call at the time the hang detector collected its stack.

In some cases, developers may want to disable the hang detector. If you are using the Flash debugger, or if you are debugging Firefox, you can change the hang detector timeout by changing the preference dom.ipc.plugins.timeoutSecs. Setting a value of -1 disables the hang detector entirely. For more information on setting preferences, see about:config on MozillaZine.

Multi-Process Plugins on By Default

Wednesday, January 27th, 2010

Out-of-process plugins (OOPP) are now on by default in mozilla-central! Starting tomorrow morning, the mozilla-central nightly builds will load Flash and all other plugins in a separate process by default (on Windows and Linux). The Electrolysis team would love for people to test any plugins on their system, especially less-popular plugins.

Since we are moving relatively quickly with multi-process plugins, there are a few known issues to be aware of:

  • The plugin-crash UI is not finished. The current UI is just a non-localized dialog so that we can get crash reports from nightly testers. This will be changed soon!
  • On Windows, tearing/repainting issues when scrolling, bug 535295
  • On Linux, compiz effects and Flash don’t work together on some systems, bug 535612
  • On Windows, selecting “Print” option in Flash may lock up Firefox, bug 538918
  • On Windows, hulu won’t switch to full-screen mode, bug 539658
  • On Linux with GTK+-2.18 or later, GDK assertions and a fatal XError, bug 540197
  • Firefox-process crashes at NPObjWrapper_NewResolve with silverlight and sometimes Flash, bug 542263

If you discover crashes while running nightlies, please make sure you submit them, and check about:crashes for the crash ID and signature. We could use help making sure plugin-related crashes and instability are filed and tracked by searching for signatures here and filing bugs in the Core:Plug-Ins component.

If your browser hangs, you can probably recover by killing the mozilla-runtime process in the Windows task manager or via `kill` on Linux. If you are a developer with a debugger, please use the Mozilla symbol server and get stacks for both the Firefox process and the mozilla-runtime process and file a bug.

In some cases, it may be useful to the Electrolysis developers if you obtain a plugin log, which is a log of calls made between the plugin and the browser. Instructions for obtaining the log are available here.

I am very excited that we’ve made it this far, and I look forward to our next milestone release, which will backport these changes to the 1.9.2 release in preparation for Firefox Lorentz.

Flash in a separate process

If for some reason you need to disable multi-process plugins, set the pref dom.ipc.plugins.enabled to false.

Firefox 3.6

Thursday, January 21st, 2010

We released Firefox 3.6 today. If you are currently running Firefox, choose “Check for Updates” from the Help menu. If you aren’t, go get Firefox 3.6 now! One of our most popular new features is Personas, which you can use to style Firefox the way you want. We’ve also made Firefox faster, more responsive, and more secure than ever.

Mousewheel Zoom Eureka!

Wednesday, October 28th, 2009

In Firefox, you can make the page text larger and smaller by holding down the Control key and rolling your mouse scroll wheel up and down. Before continuing to read, I want you to think about which direction makes more sense: should scrolling down make the page get bigger or smaller?


How to disable the Comodo reseller root certificate in Firefox

Wednesday, December 24th, 2008

Slashdot is a-buzz (and rightly so!) with news that people have been able to obtain an SSL certificate for a domain they don’t own, by applying with one of Comodo’s certificate resellers. It is clear that there has been a major breach of trust, but we’re not sure of the best general solution. There has been a discussion in the newsgroup about what steps Mozilla should take for this breach.

In the meantime, I recommend disabling the root certificate used by this certificate authority, to avoid the possibility that other fraudulent certificates are floating around in the wild. Here’s how to disable the relevant CA root in Firefox:

  1. Open the preferences window
  2. Select the “Advanced” tab
  3. Select the “Encryption” sub-tab
  4. Choose “View Certificates”
    Firefox Preferences Window: Advanced -> Encryption -> Certificates

  5. Find and select the “AddTrust AB / AddTrust External CA Root” item
  6. Choose the “Edit’ button
    Root Certificates Dialog

  7. Remove all trust setting check-boxes.
    Edit Certificate Dialog

Note: disabling this root certificate will SSL websites validated by this Comodo reseller to stop working. That’s why you’re doing it, but if it’s your favorite website that stops working, please don’t blame me! If you’re really paranoid, you could also disable all Comodo roots: these include all the certificates with names like “AddTrust”, “Comodo CA Limited”, and “The UserTrust Network”.

Thanks to Eddy Nigg for first providing these instructions.

Firefox 1.0.7!?

Saturday, December 8th, 2007

I used my father-in-law’s laptop recently. I had installed Firefox on it for him when he bought it, and I was happy to see it was still the default browser. I thought it was a little odd that it opened new windows by default instead of new tabs, and then I had a terrible suspicion and checked “Help -> About Firefox” and discovered to some dismay that he was still running Firefox 1.0.7.

Needless to say I installed an up-to-date version immediately. I wonder how many other people might have no clue that their applications are incredibly out of date. I also wonder why his security software (virus scanner/firewall) wouldn’t warn him about such an important aspect of system security.