Laying Blame
Tuesday, December 9th, 2008As I mentioned last week, I’ve been resurrecting a project to report on compiler warnings. A basic form of this buildbot is now operational on the Firefox tinderbox tree (look to the far right for the static-analysis-bsmedberg column). It prints a summary of the total number of warnings on the summary page: in the full tinderbox log, it lists each warning and who can be “blamed” for that warning:
/builds/static-analysis-buildbot/slave/full/build/memory/jemalloc/jemalloc.c:177:1: warning: C++ style comments are not allowed in ISO C90 blamed on Taras Glekin revision hg:36156fbf817d8a0e2d54a271cf0bff94a1c41c13:memory/jemalloc/jemalloc.c /builds/static-analysis-buildbot/slave/full/build/js/src/jsdbgapi.cpp:712: warning: ISO C++ forbids casting between pointer-to-function and pointer-to-object blamed on brendan@mozilla.org in revision cvs:3.36:js/src/jsdbgapi.c
Assigning blame can be a tricky process. In order to figure out the blame for a warning, the code uses the following steps:
- Resolve relative paths against the current working directory, using GNU make “Entering/Leaving directory” markers as a guide.
- Dereference symlinks to find the source tree location of an error. For instance, Mozilla headers which produce warnings often do so via paths in dist/include. We have to resolve these to their original source tree location in order to find blame.
- Using mercurial APIs (through python), find the mercurial changeset which introduced the line in question.
- If the code dates back to Mercurial revision 9b2a99adc05e, which is the original import of CVS code to Mercurial, use a database of CVS blame to find the original CVS checking which was responsible for introducing that line of code.
If you’re interested, take a look at the build log parsing code, or see the scripts which save CVS blame to a database (thanks Ted!).
The current reporting system for warnings is very primitive. I’m currently working on a second version of this code which will provide additional features:
- Compare warnings with the previous build and highlight “new” warnings. I do this by recording the error text and the blamed location of the warning. As lines are added and removed from the code, the reported location of the warning changes, but the location of Hg/CVS blame doesn’t. This means it is a stable location which can be used for comparisons across runs. It even works across file renames/moves!
- Web frontend to the warning database to allow users to query warnings by user or directory.
- Classify warnings by “type”. This is not a simple process, because GCC mixes distinctive error text, such as “may be used uninitialized in this function” with variable names; and the granularity of -fdiagnostic-show-option is low enough that it’s not very useful by itself. Oh, I wish GCC had error codes like MSVC does: C1234 is easy to recognize!
At one point, I thought I could implement all of the warning mechanism on the buildbot server by parsing the BuildStep logs. It quickly became clear that I couldn’t do that, because I couldn’t resolve symlinks, and getting Mercurial blame was difficult or impossible. My new version actually uses a hybrid mechanism where the build log is parsed on the buildbot slave: this parses out warnings, resolves symlinks, and looks up blame. It then sends the results back via stdout to the master. A custom build step on the master parses this log, saves the information to a database, and does the checking for new warnings and prints various results to custom build logs.