Debugging Official Builds (or, how cool is the Mozilla symbol server?)

Monday, June 11th, 2007

Fairly often, bugs are filed in Mozilla by Smart People who are experiencing an odd behavior or a bug and want to help track it down. But they really don’t want to spend the time to build Mozilla themselves (and I really don’t blame them).

Now, at least on Windows, interested hackers have the ability to debug release builds of Firefox! Mozilla finally has its own symbol server which will provide debugging PDBs for nightly and release builds. See the Mozilla Developer Center for more information about using this new and exciting service. Note that this will only work for trunk builds from 1.9a5 forward, so it won’t be much help with our current Firefox 2.0.0.x release series. If you want to disable breakpad crash reporting and have crashes in nightly builds go straight to the Windows JIT debugging system, export MOZ_CRASHREPORTER_DISABLE=1 in your environment.
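As a minimal sketch of the last point, a small launcher script can set the variable for a single Firefox process instead of changing the global environment. Only the variable name MOZ_CRASHREPORTER_DISABLE comes from this post; the helper names and the idea of passing the Firefox path in as an argument are assumptions made for illustration.

```python
import os
import subprocess


def crash_debug_env(base_env=None):
    """Return a copy of the environment with Breakpad crash reporting disabled.

    The input mapping is copied, so the caller's environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_CRASHREPORTER_DISABLE"] = "1"
    return env


def launch_firefox(firefox_path, base_env=None):
    """Start Firefox with the modified environment.

    firefox_path is supplied by the caller; no install location is assumed here.
    """
    return subprocess.Popen([firefox_path], env=crash_debug_env(base_env))
```

Calling launch_firefox with the path to a nightly firefox.exe should then let a crash fall through to the JIT debugger rather than the crash reporter.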

Visual C++ Express Edition symbol path dialog.

Kudos to Ted and Aravind for getting this set up.

Crash! Bang! Boom!

Wednesday, May 30th, 2007

If you were one of the lucky users who tried to do a nightly update on Windows between 5am and 11am PDT Tuesday morning, you were probably treated to this dialog when you launched your new Firefox:

Crash! Bang! Boom!

Because the new crash-reporting UI had just landed, people on the forums assumed that it was malfunctioning, but it was actually doing its job perfectly: there really was a startup crash on many Windows systems. What’s even more exciting is that this kind of crash is not caught by the Talkback crash reporting system, because of the order in which we load XPCOM components.

The Breakpad/Socorro crash reporting project has really come together in the past few weeks. After a lot of pain and frustration, Morgamic and I concocted a database schema that is scalable and efficient. We have been building the basic pieces of a reporting app that will allow QA and developers to analyze crash data. Sayrer has spent untold hours getting Socorro ready for initial deployment, and then dealing with a set of problems [1] that are still being diagnosed and fixed. Aravind has been patiently dealing with a new deployment of a complicated three-part application which is rough around the edges. Luser and dcamp rushed to get the client UI in usable shape from beltzner’s mockup, which we got landed 5 minutes before this morning’s nightly builds.

This is a major milestone, and I am really proud of the team that has come together to make this all happen.

Status Update

  • For Firefox 3.0a5, Breakpad is enabled by default:
    • on Mac: 100% of Mac installations will have Breakpad and Talkback will not be available.
    • on Windows: Breakpad is enabled on all installations, but 50% of installations will still have the Talkback client. This will allow us to compare some statistics between the old and new systems. When both systems are enabled, Talkback “wins”, because it registers last.
    • not on Linux: the Linux client is not ready yet; it will be completed within the next few weeks. There are some unsolved issues in the breakpad library itself, in its integration with Mozilla, and in how to allow the client to submit reports via HTTPS.
  • The crash reporting server currently has some issues (it is only processing one report per hour, due to some design flaws). The fixes have landed in SVN and should be on the staging server today.
  • The server currently has very basic reporting/searching capabilities only. These capabilities will be expanded fairly quickly, with weekly updates to the underlying software.
  • Currently we plan on keeping crash reporting data “forever”. The database has partitions that will allow most common queries to operate on a subset of the data in an efficient manner.
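To illustrate why partitions help with the last point, here is a rough sketch of how a query over a date range only needs to touch a few partitions. The weekly, date-based partitioning and the reports_YYYYMMDD naming are assumptions made for this example; the post does not describe the actual Socorro schema.

```python
from datetime import date, timedelta


def partition_name(day):
    """Name of the hypothetical weekly partition that holds reports for `day`.

    Partitions are assumed to start on Monday and are named after that date.
    """
    monday = day - timedelta(days=day.weekday())
    return "reports_" + monday.strftime("%Y%m%d")


def partitions_for_range(start, end):
    """List the weekly partitions a query over [start, end] has to read."""
    names = []
    current = start - timedelta(days=start.weekday())
    while current <= end:
        names.append(partition_name(current))
        current += timedelta(days=7)
    return names
```

Under these assumptions, a query for crashes between May 30 and June 11, 2007 reads three weekly partitions and can ignore everything older, however long the data is kept.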

What’s Next?

There’s still a lot to be done. There are lots of reports we need on the server, and many more features that would be nice. Sancus, ispiked, and jay are on board to help develop the server, but we could use more help!

  • We need design help! If you do active QA using the existing Talkback reporting tools, please take a moment to think of what kinds of crash reporting features you would find most useful in the new system. Please post your ideas to the mozilla.dev.quality newsgroup, being as specific as possible.
  • We need a statistician. I am especially looking for someone who is skilled at identifying statistical anomalies over time in a fairly large set of data, for reports such as “Help me reproduce this crash” and “Find new crash regressions”.
  • We need implementation help on the server. To get people started, I have created a CentOS 5 image which can be run in VMware Player with a pre-installed version of the Socorro server (available on request). There are also documents on getting started hacking Socorro and on building Firefox with breakpad symbols.
  • For more information about the project schedule and planning, see the Mozilla wiki.

If you are interested in helping, or just have questions, feel free to stop by the #breakpad channel on irc.mozilla.org, or post to mozilla.dev.quality.

Socorro server pieces and interactions.

Notes

  1. Deploying a web app is really hard. Production environments are hard to replicate on local testing servers: NFS mounts, tightly controlled versions, heavy loads, secured databases, and real-world data are hard to come by.

Using Breakpad with Gran Paradiso (1.9a3)

Thursday, March 29th, 2007

As I’ve mentioned before, for Firefox 3 we are planning on replacing the old and crusty talkback crash reporting system with a shiny new system based on Google Breakpad (formerly called Airbag). We have a set of milestones and a schedule, and a great team working on client and server pieces.

The Gran Paradiso 1.9a3 release has support for breakpad crash reporting on Windows. It is off by default because we don’t have a production server configured yet, but we would be happy for people to test using Ted’s development server. To submit crashes to the server, follow this procedure:

  • Download Gran Paradiso Setup Alpha 3.exe
  • Disable talkback. You can do this either by choosing not to install talkback in the installer, or by disabling it in the Add-ons manager.
  • Set MOZ_AIRBAG=1 in your environment. To make this a permanent setting, go to My Computer -> Properties -> Advanced -> Environment Variables.
  • Run Firefox. Crash. You can crash 1.9a3 reliably by following this link.
  • Unfortunately, the client doesn’t keep track of a crash report ID yet. You’ll have to find the report you submitted by date/time.
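Since reports can only be found by date and time for now, it helps to note the time just before you crash. The sketch below builds an environment with MOZ_AIRBAG=1 for a test run and formats a timestamp to write down; the helper names are hypothetical, and the use of UTC is an assumption, since the post does not say which time zone the server displays.

```python
import os
from datetime import datetime, timezone


def airbag_test_env(base_env=None):
    """Return a copy of the environment with MOZ_AIRBAG=1 set for a test run."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_AIRBAG"] = "1"
    return env


def crash_log_entry(now=None):
    """Format the time to note just before crashing, to the minute, in UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%d %H:%M UTC")
```

Pass the result of airbag_test_env() as the env argument when starting Firefox from a script, and keep the string from crash_log_entry() next to your notes on what you did to crash.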

WARNINGS: Because this is Ted’s development server it may disappear at any time, or your crash data might disappear whenever he feels like it. Secondly, it is possible for passwords and other data to end up in minidump files. If you are paranoid about HTTP snooping (or you don’t trust Ted), don’t send crash reports.

NOTE: You could follow this procedure with any nightly build, and it would submit the report successfully. But since Ted doesn’t have symbol data for every nightly build, you wouldn’t get much useful information.

There is at least one thing this system already does better than talkback: it provides symbolic information for Windows system DLLs. Ted has uploaded the symbol information for Windows XP SP2 and the VC8 CRT. In the future we can upload symbol information for other versions of Windows. For an example of a stack that walks through the system and the CRT, see this report from a memory-corruption bug.