Graph of the Day: Virtual and Physical Memory Starvation

Wednesday, April 24th, 2013

Today’s graph is a scatter plot of out-of-memory crashes. It categorizes crashes according to the smallest block of available VM and the amount of available pagefile space.

There were roughly 1000 crashes due to bug 829954 between 10-April and 15-April 2013. Click on individual crash plots to see memory details and a link to the crash report.

Direct link to SVG file. Link to raw data.

Conclusions

After graphing these crashes, it seems clear that there are two distinct issues:

  • Crashes which are above the blue line and to the left have free space in their page file, but we have run out of contiguous virtual memory space. This is likely caused by the virtual memory leak from last week.
  • Crashes which are below the blue line and to the right have available virtual memory, but don’t have any real memory for allocation. It is likely that the computer is already thrashing pretty heavily and appears very slow to the user.

I was surprised to learn that not all of these crashes were caused by the VM leak.

The short-term solution for this issue remains the same: the Mozilla graphics engine should stop using the infallible/aborting allocator for graphics buffers. All large allocations (network and graphics buffers) should use the fallible allocator and take extra effort to be OOM-safe.

Long-term, we need Firefox to be aware of the OS memory situation and continue to work on memory-shrinking behavior when the system starts paging or running out of memory. This includes obvious behaviors like throwing away the in-memory network and graphics caches, but it may also require drastic measures such as throwing away the contents of inactive tabs and reloading them later.

Charting Technique

With this post, I am starting to use a different charting technique. Previously, I was using the Flot JS library to generate graphs. Flot makes pretty graphs (although it doesn’t support labeling Axes without a plugin!). It also features a wide range of plugins which add missing features. But often, it doesn’t do exactly what I want and I’ve had to dig deep into its guts to get things to look good. It is also cumbersome to include dynamically generated JS graphs in a blog post, and the prior graphs have been screenshots.

This time around, I generated the graph as an SVG image using the svgwrite python library. This allows me to put the full SVG graph directly into the blog, and it also allows me to dynamic features such as rollovers directly in these blog posts. Currently I’m setting up the axes and labels manually in python, but I expect that this will turn into a library pretty quickly. I experimented with svgplotlib but the installation requirements were too complex for my needs.

I’m not sure whether or not the embedded SVG will make it through feed aggregators./readers or not. Leave comments if you see weird results.

Graph of the Day: Crash Report Metadata

Friday, April 12th, 2013

When Firefox crashes, it submits a minidump of the crash; it also submits some key/value metadata. The metadata includes basic information about the product/version/buildid, but it also includes a bunch of other information that we use to group, correlate, and diagnose crashes. Using data collected with jydoop, I created a graph of how frequently various metadata keys were submitted on various platforms:

Frequency of Crash Report Metadata on 2-April-2013

The jydoop script to collect this data is simple:

import crashstatsutils
import json
import jydoop

setupjob = crashstatsutils.dosetupjob([('meta_data', 'json'), ('processed_data', 'json')])

def map(k, meta_data, processed_data, context):
    if processed_data is None:
        return

    try:
        meta = json.loads(meta_data)
        processed = json.loads(processed_data)
    except:
        context.write('jsonerror', 1)
        return

    product = meta['ProductName']

    os = processed.get('os_name', None)
    if os is None:
        return

    context.write((product, os, '$total'), 1)
    for key in meta:
        context.write((product, os, key), 1)

combine = jydoop.sumreducer
reduce = jydoop.sumreducer

View the raw data (CSV).

If you are interested in learning what each piece of metadata means and is used for, I’ve started to document them on the Mozilla docs site.

This graph was generated using the flot JS library and the code is available on github.