Archive for the 'Mozilla' Category

Wordmaps without Java

Friday, December 12th, 2008

Word maps generated by wordle.net have been making the rounds. They are very cool representations of the frequency with which various words appear in a hunk of text (such as a blog feed). Unfortunately, the code to generate these word maps is not open source, and it requires Java.

So I decided to take on Johnath’s challenge and produce something similar using HTML canvas and JavaScript:

Wordmap of BSBlog

You can take it for a spin too, but only if you have Firefox 3.1: try it out! I’m currently using some features that are specific to Firefox 3.1, such as JavaScript 1.8 and Canvas.measureText. I think I can backport this code to support Firefox 3 by checking for .mozMeasureText and .mozTextStyle. I don’t know whether Safari currently supports text drawing or measurement in its canvas implementation; if it does, this can probably be made to work there as well.

If you’re interested in the code, a Mercurial repository is available on hg.mozilla.org. There are a couple of improvement possibilities noted in the README file. Some other possibilities that I’m just thinking of now:

  • Produce an image map to make all the terms link to the relevant post(s).
  • Produce SVG output to make the output scalable.

Laying Blame

Tuesday, December 9th, 2008

As I mentioned last week, I’ve been resurrecting a project to report on compiler warnings. A basic form of this buildbot is now operational on the Firefox tinderbox tree (look to the far right for the static-analysis-bsmedberg column). It prints a summary of the total number of warnings on the summary page; in the full tinderbox log, it lists each warning and who can be “blamed” for it:

/builds/static-analysis-buildbot/slave/full/build/memory/jemalloc/jemalloc.c:177:1: warning: C++ style comments are not allowed in ISO C90 blamed on Taras Glek  in revision hg:36156fbf817d8a0e2d54a271cf0bff94a1c41c13:memory/jemalloc/jemalloc.c
/builds/static-analysis-buildbot/slave/full/build/js/src/jsdbgapi.cpp:712: warning: ISO C++ forbids casting between pointer-to-function and pointer-to-object blamed on brendan@mozilla.org in revision cvs:3.36:js/src/jsdbgapi.c

Assigning blame can be a tricky process. To figure out the blame for a warning, the code takes the following steps:

  • Resolve relative paths against the current working directory, using GNU make “Entering/Leaving directory” markers as a guide.
  • Dereference symlinks to find the source tree location of an error. For instance, Mozilla headers which produce warnings often do so via paths in dist/include. We have to resolve these to their original source tree location in order to find blame.
  • Using the Mercurial API (through Python), find the Mercurial changeset which introduced the line in question (a rough sketch of this step follows the list).
  • If the code dates back to Mercurial revision 9b2a99adc05e, which is the original import of CVS code to Mercurial, use a database of CVS blame to find the original CVS check-in responsible for introducing that line of code.
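
As an illustration of the Mercurial lookup step, here is a rough sketch. It is not the actual build-log code: it shells out to hg annotate rather than using the in-process Mercurial API, and the helper name and arguments are made up for the example.

import subprocess

def blame_line(repo_path, rel_file, lineno):
    """Return (user, changeset) for 1-based line `lineno` of rel_file."""
    # `hg annotate -u -c FILE` prefixes every output line with "user changeset: ".
    proc = subprocess.Popen(
        ["hg", "-R", repo_path, "annotate", "-u", "-c", rel_file],
        stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    line = out.splitlines()[lineno - 1]
    prefix = line.split(":", 1)[0]        # "user changeset"
    user, changeset = prefix.rsplit(" ", 1)
    return user.strip(), changeset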

If you’re interested, take a look at the build log parsing code, or see the scripts which save CVS blame to a database (thanks Ted!).

The current reporting system for warnings is very primitive. I’m currently working on a second version of this code which will provide additional features:

  • Compare warnings with the previous build and highlight “new” warnings (see the sketch after this list). I do this by recording the error text and the blamed location of each warning. As lines are added and removed from the code, the reported location of a warning changes, but the location of its Hg/CVS blame doesn’t, so the blamed location is a stable key that can be used for comparisons across runs. It even works across file renames and moves!
  • Web frontend to the warning database to allow users to query warnings by user or directory.
  • Classify warnings by “type”. This is not a simple process, because GCC mixes distinctive error text, such as “may be used uninitialized in this function”, with variable names, and the granularity of -fdiagnostics-show-option is too coarse to be very useful on its own. Oh, I wish GCC had error codes like MSVC does: C1234 is easy to recognize!
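
For the first item, here is a minimal sketch of the comparison, assuming each warning has already been reduced to its text plus the blamed file and revision (the field names are illustrative, not the actual database schema):

def new_warnings(previous, current):
    """Return the warnings in `current` whose blame key wasn't seen before."""
    def key(w):
        # Key on the *blamed* location, not the reported file/line, so the
        # key stays stable as unrelated lines are added or removed above it.
        return (w['blame_file'], w['blame_rev'], w['text'])

    seen = set(key(w) for w in previous)
    return [w for w in current if key(w) not in seen]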

At one point, I thought I could implement the entire warning mechanism on the buildbot master by parsing the BuildStep logs. It quickly became clear that I couldn’t, because I couldn’t resolve symlinks there and getting Mercurial blame was difficult or impossible. My new version uses a hybrid mechanism: the build log is parsed on the buildbot slave, which extracts the warnings, resolves symlinks, and looks up blame, then sends the results back to the master via stdout. A custom build step on the master parses this log, saves the information to a database, checks for new warnings, and prints various results to custom build logs.

Parsing Compiler Errors

Wednesday, December 3rd, 2008

Long ago, Mozilla had a tinderbox which would collate every warning produced by the Mozilla build and generate statistics and reports about them. I’m trying to re-create this tool.

When building Mozilla or most other large software projects that use GNU make, compiler warnings get sent to stdout (and sometimes stderr). The messages usually look something like this:

../../../dist/include/dom/nsIDOMXULSelectCntrlEl.h:33: warning: ‘virtual nsresult nsIDOMXULSelectControlElement::GetSelectedItem(nsIDOMXULSelectControlItemElement**)’ was hidden
../../../dist/include/dom/nsIDOMXULMultSelectCntrlEl.h:73: warning:   by ‘virtual nsresult nsIDOMXULMultiSelectControlElement::GetSelectedItem(PRInt32, nsIDOMXULSelectControlItemElement**)’

All of the file paths in the warning (or set of warnings in this case) are relative to the current working directory. Because the working directory changes during the build as make recurses through subdirectories, automatic parsers need some way to know what the working directory is at any point in the build log. Make provides the -w option which will print the following output every time it recurses into or leaves a directory:

make[1]: Entering directory ‘/builds/static-analysis-buildbot/slave/full/build’
make[1]: Leaving directory ‘/builds/static-analysis-buildbot/slave/full/build’
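
For a serial build log, a parser can keep track of the current directory with a small stack driven by those markers. Here is a rough sketch (not the actual tinderbox code; the helper name is made up, and the quoting characters in make’s output vary between versions):

import re

_DIR_RE = re.compile(r"make\[\d+\]: (Entering|Leaving) directory (.*)")

def track_cwd(log_lines):
    """Yield (cwd, line) for each non-marker line of a serial (-j1) build log."""
    dirstack = ["."]
    for line in log_lines:
        m = _DIR_RE.search(line)
        if m is None:
            yield dirstack[-1], line
        elif m.group(1) == "Entering":
            # Strip the quoting characters, which differ between make versions.
            dirstack.append(m.group(2).strip().strip("`'\""))
        else:
            dirstack.pop()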

This is fine if you are only building in one directory at a time. But with the -j option, it is likely that make will be building in multiple directories at once. This will interleave output from multiple jobs in the same log, making it difficult for an automated parser to make any sense of them.

What I’d like is a tool or technique which will save the build log for each makefile command separately and combine them all at the end of a build.

Pre-emptive snarky comment: “switch Mozilla to use {scons,waf,cmake,ant,…}”

Generated Documentation, part 2

Wednesday, November 26th, 2008

As I noted previously, I’ve been using our static analysis tools to generate documentation for the Mozilla string classes.

All of the code to generate this documentation is now checked in to mozilla-central. To regenerate documentation or hack the scripts, you will first need to build with static-checking enabled. Then, simply run the following command:

make -C xpcom/analysis classapi

To automatically upload the documentation to the Mozilla Developer Center, run the following command:

MDC_USER="Your Username" MDC_PASSWORD="YourPassword" make -C xpcom/analysis upload_classapi

One of the really exciting things about the Dehydra static-analysis project is that the analysis is not baked into any compiler. You can version your analysis scripts as part of your source code, run them from within your build system, and change them as your analysis needs change.

For example, I decided that a class inheritance diagram would help people understand the Mozilla string classes. So I modified the documentation script to produce graphviz output in addition to the standard XML markup. I then process the graphviz output to PNG with an imagemap and upload it to MDC along with the other output as an attachment [1].
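
For reference, turning the graphviz output into a PNG plus an HTML image map is a single dot invocation. Here is a rough sketch (the file names are illustrative, not what the build actually produces):

import subprocess

def render_diagram(dot_file="string-classes.dot", basename="string-classes"):
    # -Tpng writes the bitmap; -Tcmapx writes an HTML <map> element that can
    # be pasted alongside the <img> tag on the wiki page.
    subprocess.check_call(["dot", dot_file,
                           "-Tpng", "-o", basename + ".png",
                           "-Tcmapx", "-o", basename + ".map"])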

The output is available now. I’m still looking for volunteers to improve the output as well as the source comments to make it all clearer!

[1] There is a MediaWiki extension so you can put graphviz markup directly in a wiki page and it will be transformed automatically. However, this extension currently doesn’t work on the Mozilla Developer Center. It’s being tracked in bug 463464 if you’re interested.

ABC Meme

Monday, November 24th, 2008

Instructions: type the letter ‘a’ in your browser location bar and choose the first match from the dropdown. Repeat for each letter of the alphabet.

Browser: Firefox 3.1 beta

a: Air Mozilla
b: /buildbot/steps/source.py – Buildbot – Trac
c: mozilla mozilla/configure.in
d: Digg / News
e: Enter Bug: Core
f: First National Bank of PA Personal Banking Services
g: Google Quicksearch: g
h: hghooks: Summary
i: intranet
j: mozilla mozilla/js/src/jsapi.h
k: Build Log (Brief) – Win2k3 comm-central dep unit test on 2008/10/30 15:57:27
l: Lilypond program-reference
m: tinderbox: MozillaTry
n: Google News
o: os.path – Common pathname manipulations
p: mozilla-central: pushlog
q: The irc.mozilla.org QDB: Welcome
r: mozilla-central mozilla/config/rules.mk
s: Slashdot – News for nerds, stuff that matters
t: BSBlog > Blog Archive > The Testing Matrix
u: United Airlines – Airline Tickets, Airline Reservations, Flight Airfare
v: Lilypond program-reference
w: washingtonpost.com – nation, world, technology and Washington area news and headlines
x: XPCOM Glue – MDC
y: The New York Times – Breaking News, World News & Multimedia
z: Mozilla Dehydra Cross Reference

DTrace Bugs on Mac

Thursday, November 20th, 2008

Ted and I have been looking rather closely at the performance of the Mozilla build system. In order to get a better sense of where we’re spending time, I wanted to use dtrace to get statistics on an entire build.

Basic Process Information From DTrace

In theory, the dtrace proc provider lets a system administrator watch process and thread creation for a tree of processes. Using normal dtrace globals, you can track the process parent, arguments, working directory, and other information:

/* progenyof($1) lets us trace any subprocess of a specific process, in this case the shell from
   which we launch the build */

proc:::create
/progenyof($1)/
{
  printf("FORKED\t%i\t%i\t%i\n", timestamp, pid, args[0]->pr_pid);
}

proc:::exec
/progenyof($1)/
{
  printf("EXEC\t%i\t%i\t%s\t%s\n", timestamp, pid, curpsinfo->ps_args, cwd);
}

proc:::exit
/progenyof($1)/
{
  printf("EXIT\t%i\t%i\n", timestamp, pid);
}

Unfortunately, the MacOS implementation of dtrace doesn’t report this information correctly:

  • curpsinfo->ps_args doesn’t contain the entire command line of the process; it only contains the first word.
  • cwd doesn’t contain the entire working directory (e.g. /builds/mddepend/ff-debug), only the last component (ff-debug). Since many of our directories within the tree share names such as src and public, this information is pretty much useless.

Process CPU Time in DTrace

DTrace doesn’t give scripts a simple way to track the CPU time used by a process: the kernel psinfo_t struct does have a pr_time member, but its type, struct timestruc_t, is not reflected into D.

There is another way to calculate this: dtrace exposes a vtimestamp variable, a virtualized timestamp which only advances while the current thread is actually executing. By subtracting the vtimestamp at proc:::lwp-start from the vtimestamp at proc:::lwp-exit, you can calculate the CPU time spent in each thread, and use sums to calculate the per-process total.

proc:::lwp-start
/progenyof($1)/
{
  self->start = vtimestamp;
}

proc:::lwp-exit
/self->start/
{
  @[pid] = sum(vtimestamp - self->start);
  self->start = 0;
}

END
{
  printf("%-12s %-20s\n", "PID", "TIME");
  printa("%-12i %@i\n", @);
}

Unfortunately, the MacOS implementation of DTrace has a serious bug in the implementation of proc:::lwp-start: it isn’t fired in the context of the thread that’s being started, but in the context of the thread (and process!) that created the thread. This means that the pid and vtimestamp reported in the probe are useless. I have filed this with Apple as radar 6386219.

Summary

Overall, the bugs in the Apple implementation of DTrace make it pretty much useless for doing the build system profiling I intended. I am now trying to get an OpenSolaris virtual machine up for building, since I know that DTrace is not broken on Solaris; but never having used Solaris before, I’ll save that story for another day.

Paginate the Web?

Tuesday, October 28th, 2008

Web pages scroll, usually vertically. Is this a good thing?

I was reading an article that Deb pointed out from The Atlantic, “Is Google Making Us Stupid?”, and I noticed something: I could easily keep my attention on the page while I wasn’t scrolling, but as soon as I got to the bottom of the page it was much harder to stay focused.

What if web browsers paginated articles instead of laying them out in a vertical scrolling view by default? Would that improve reader attention span, or just cause users to stop reading after the first page?

Is it possible to write a Firefox extension to render websites as paginated entities instead of scrolling entities? I suspect not; it would probably require assistance from the core Gecko layout engine. But I think it would be a very interesting UI experiment!

Unusual Town Names in Pennsylvania

Tuesday, October 21st, 2008

Having moved to Pennsylvania a few years ago, I discovered that, unlike in Virginia, not every place here is named after English nobility or a geographic feature. I have collected some of the more amusing and unusual place names I discovered in Pennsylvania into a Google map.



I love how the triangle of Paradise, Intercourse, and Fertility is so far removed from Climax.

Many coal towns in Western PA were named by the mining companies. Uninventive names like “Mine 71” are common. But I think the best is “Revloc”. This is the next town over from Colver, and whoever named it just decided to spell Colver backwards.

Cambria County doesn’t allow outdoor advertising of pornography, so immediately outside the county on U.S. Route 22 (near Climax, PA) there is a group of video stores and strip clubs. One sign in particular has a rather amusing misuse of quotation marks:

“Live” girls!!!

PUTting and DELETEing in python urllib2

Tuesday, October 21st, 2008

The urllib2 Python module makes it pretty simple to GET and POST data using HTTP (and other protocols). But there isn’t a good built-in way to issue HTTP PUT or DELETE requests. I ran into this limitation while working on a project to upload automatically generated documentation to the Mozilla Developer Center: the DekiWiki API for uploading a file attachment uses the HTTP PUT method.

It turns out there is an easy workaround: subclass urllib2.Request and explicitly override the HTTP method it reports:

import urllib2

class RequestWithMethod(urllib2.Request):
  """A Request whose HTTP method can be set explicitly (e.g. PUT or DELETE)."""
  def __init__(self, method, *args, **kwargs):
    self._method = method
    # The base-class __init__ must be passed self explicitly.
    urllib2.Request.__init__(self, *args, **kwargs)

  def get_method(self):
    # urlopen() consults get_method() when building the request line.
    return self._method
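
Here is a minimal usage sketch; the URL and payload are made up for illustration and are not the real MDC upload endpoint:

# Issue a PUT with the class above (illustrative URL and file name).
req = RequestWithMethod("PUT", "http://example.com/files/attachment.png",
                        data=open("attachment.png", "rb").read())
response = urllib2.urlopen(req)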

Preview for Thursday’s post: the generated documentation is already online.

What Do People Do All Day?

Thursday, October 9th, 2008

There are very few picture books which talk about money, and even fewer do it well. Richard Scarry’s What Do People Do All Day? is a notable and wonderful exception.

Scan from "What Do People Do All Day?": Farmer Alfalfa selling produce

Throughout the book, characters are creating value by farming, tailoring, or baking. They sell their goods for money, use it to pay for raw materials, buy gifts for their wives, and put the extra in the bank. When the tailor decides to build a new house, he hands a large sack of money to the builders. When the mayors of two towns decide to pave a road between them, they have several huge sacks of money for the road builders.

I recommend pretty much anything by Richard Scarry, but this is my personal favorite. If you have children under the age of ten, or just love picture books, look for it in your local library or bookstore.