pymake: A Mostly GNU-compatible `make`, Written in Python

Mozilla has a lot of Makefiles, and although we’re not completely happy with our current recursive-make build system, converting to something else is a very difficult task. In order to consider other solutions, we have to have a seamless migration path from our existing infrastructure. Unfortunately for any potential migration, Mozilla uses pretty much every make feature and quirk imaginable. It isn’t sufficient to get by with some simplistic importer: any migration needs to understand the arcane details of makefile syntax and parsing rules.

So finally, after two years of considering whether it’s feasible and a good idea, I bit the bullet and wrote a make replacement in python. It was almost as mind-numbingly difficult as I thought it would be. But after two solid weeks, I have a tool which will mostly build Mozilla, and seems to be doing it correctly.

Why would you do such a thing?

Mark Pilgrim was right: only a crazy person says “I’m going to build my own build system”. So rather than creating a whole new build system with a new syntax, I wanted to start out replicating an existing build system exactly. Then we can make incremental changes if appropriate.

What kind of incremental changes?

First up are speed increases. Our Windows builds are currently very slow, in part due to the large number of processes that are spawned. There are several ways to attack this:

Where can I see this monstrosity?

The pymake website. You can also just pull or browser the pymake sources directly from hg.mozilla.org using Mercurial.

Why don’t you just hack GNU make itself?

There are some improvements we’re interested in doing which just aren’t possible within a C codebase:

. Python is also about a zillion times easier to hack quickly.

pymake is mostly GNU-make compatible. What are the differences?

The known differences are documented in the README file.

Does it perform well?

Not yet. pymake currently takes more time than GNU make. Almost all of this time is spent reading and parsing makefiles. Parsing autoconf.mk/config.mk/rules.mk from the Mozilla tree takes 0.3 seconds on my Linux machine, where GNU make takes 0.03 seconds. Dependency resolution and execution is as fast or faster than GNU make. I’d love to have some help profiling and fixing the parsing. Also, pymake does not yet support parallel execution.

Next week sometime, I promise to write about some of the difficulties I encountered writing pymake, and how test-driven development saved me from terrible pain and suffering.

Atom Feed for Comments 19 Responses to “pymake: A Mostly GNU-compatible `make`, Written in Python”

  1. James Says:

    How pointless, why add another dependency for mozilla just to satisfy an imagined need to have something in python? Aren’t the whitespace issues with [g]make enough to deal with that you have to add another layer of whitespace-sensitivity? If you must have rules written in a dynamic language, Perl is installed by default on a much larger set of computers than python, and for a fairly complex build system written in make+perl, see the OpenBSD ports system. Perl is even easier to hack on quickly than python.

    Perhaps this is a case of, if you only have a hammer, everything you see is a nail. The hammer in this case is python. Mark was right, but you seem to have missed his point, Darkroom was another text editor, and version 8 got undo, like your make system lacks many of the things a full make system needs, when gnu make exists.

  2. Arpad Borsos Says:

    Very nice.
    Can’t wait to see more progress on this. And I hope I can learn something from your followup on test-driven development.

  3. Ted Mielczarek Says:

    James: Python is already a dependency of building Mozilla, so it doesn’t add anything new.

    “Perl is even easier to hack on quickly than python.”

    That sounds like a personal preference, as does most of your comment.

  4. Benjamin Smedberg Says:

    James: the Mozilla build already depends heavily on Python. We prefer python over perl for all new build-system code because it improves code readability and the standard library is much better suited to our needs.

    My make system only lacks two things: support for the `vpath` directive, which can be added in a couple hours, and support for parallel execution, which will take a bit more time but is eminently doable. I’m not creating a new build system, I’m just re-implementing a part of our existing build system so that we can extend it.

  5. Colin Barrett Says:

    Have you considered using a parser generator for the syntax parsing? Something along the lines of ANTLR or anything else that generates DFAs from some kind of grammar file (e.g. EBNF).

    I don’t have a recommendation off-hand for a module to do this, but a quick Google search found these pages which might help:

    http://wiki.python.org/moin/LanguageParsing
    http://nedbatchelder.com/text/python-parsers.html

    Good luck!

  6. Ian McKellar Says:

    Wow – that’s pretty awesome. This could be a great way to build a path forward for projects stuck in autoconf/automake/libtool hell too. Extending pymake to understand Makefile.am syntax and executing it without needing to resort to the libtool disaster could be incredibly valuable.

    Ian

  7. Benjamin Smedberg Says:

    Colin, yes, I considered it. I’ve used PLY successfully before to parse IDL (see http://benjamin.smedbergs.us/blog/2008-10-07/parsing-idl-in-python/). But you’ll quickly find that tokenizing make syntax using a standard lex-like seup is almost impossible: the rules for whitespace, comments, and line continuations are context-sensitive and can change in the course of a single line.

  8. beza1e1 Says:

    > It isn’t sufficient to get by with some simplistic importer: any migration needs to understand the arcane details of makefile syntax and parsing rules.

    Since you now have a parser for all the arcane details anyways, why not get pymake to export a different format instead of executing it?

  9. Benjamin Smedberg Says:

    beza1e1: that’s one possibility, yes. Partisans of SCons, cmake, waf, etc would love to see us migrate, and that eventually may be the right decision. In the meantime, I think we can make some improvements, especially in Windows build times, without completely migrating to another toolchain.

  10. Krzysztof Kowalczyk Says:

    Benjamin, this is a wonderful idea – thanks for doing the hard work of implementing it and sharing it with us. It might just become my next build tool.

  11. James Napolitano Says:

    >But you’ll quickly find that tokenizing make syntax using a standard lex-like seup is almost impossible: the rules for whitespace, comments, and line continuations are context-sensitive and can change in the course of a single line.

    Even if you use stacked start conditions?
    http://www.dabeaz.com/ply/ply.html#ply_nn21

  12. Benjamin Smedberg Says:

    James, I’ll try to post in more detail later, but you’d basically have to construct a huge number of start conditions, and tracking start states would be more trouble than it’s worth.

  13. Eric Melski Says:

    As someone with no small amount of experience building a ground-up gmake implementation, let me add my support to the idea that trying to use a parser generator is probably an exercise in futility. But I do wonder about the applicability of this work to other projects. It’s one thing to make a gmake replacement that “mostly” builds one project (where most of the makefiles were probably based off a handful of patterns and idioms), and quite another to make a replacement that works on a wide variety of projects from dozens of different authors and backgrounds. If your aim is the latter, I think you’re in for a surprise at the difficulty.

  14. Benjamin Smedberg Says:

    Eric, good question. I’m primarily interested in writing a tool to build Mozilla. But if this system ends up providing significant benefits over GNU make (better parallelism, logging etc), I hope that the community will contribute to fix other issues. I’d happily accept patches for the remaining few known issues: lack of the standard builtin rules and variables, and missing order-only prerequisites. Then of course it’s just a matter of discovering any lurking bugs.

    I have confidence that it can be made to build other projects with little effort. The Mozilla build is in fact 4 different build systems with very different idioms:

    * mozilla
    * NSPR
    * NSS/coreconf
    * Tamarin

    Many of the latest checkins are specifically bugs that I discovered while building NSS, which uses a very different makefile style than mozilla proper: http://hg.mozilla.org/users/bsmedberg_mozilla.com/pymake/pushloghtml?fromchange=1995f94b1c2f&tochange=450813b7b94a

  15. Rob Campbell Says:

    Wow. You’ve out-done yourself this time, Smedberg. ;)

    One thing that sprang to mind when I read this was, “hey, maybe now it’s possible to add some smarts to the build system so we can do subdirectory builds that will automatically go up (or sideways) however many levels to complete the build without having to rebuild the entire tree”. We were just discussing this yesterday in IRC after reading a newsgroup post.

    I for one welcome our new Python Overlords.

  16. James Napolitano Says:

    >This could be a great way to build a path forward for projects stuck in autoconf/automake/libtool hell too.
    >Since you now have a parser for all the arcane details anyways, why not get pymake to export a different format instead of executing it?
    >…I hope that the community will contribute to fix other issues.

    It’s probably worth contacting some of the other build projects (SCons, waf, etc.), particularly the python-based ones, as they might want to use pymake as the basis for an importer/migrator. This could take a lot of the load off you.

    >James, I’ll try to post in more detail later, but you’d basically have to construct a huge number of start conditions, and tracking start states would be more trouble than it’s worth.

    Thanks for the explanation. I suppose the fundamental issue is that tools geared for “context-free grammars” don’t deal well with context-dependent languages.

    Great work, BTW.

  17. Trent Mick Says:

    Here is a quick decorator I use for profiling Python code:

    http://code.activestate.com/recipes/576656/

    Here is the profile output from running “mkparse.py” over all the Makefile’s in my mozbuild’s objdir:

    2089198 function calls (2016559 primitive calls) in 11.447 CPU seconds

    Ordered by: internal time, call count
    List reduced from 88 to 20 due to restriction

    ncalls tottime percall cumtime percall filename:lineno(function)
    57716 1.908 0.000 5.593 0.000 parser.py:682(parsemakesyntax)
    893 1.558 0.002 11.447 0.013 parser.py:467(parsestream)
    227122/155505 1.472 0.000 1.679 0.000 parser.py:192(itermakefilechars)
    71661 1.406 0.000 1.406 0.000 parser.py:369(iterlines)
    76507 0.687 0.000 0.981 0.000 parserdata.py:34(__add__)
    76507 0.451 0.000 1.653 0.000 parser.py:73(getloc)
    71661 0.430 0.000 2.153 0.000 parser.py:127(readline)
    64578 0.409 0.000 0.409 0.000 parser.py:97(findtoken)
    64098 0.355 0.000 0.355 0.000 parser.py:672(__init__)
    71411 0.281 0.000 0.312 0.000 data.py:87(append)
    298705 0.266 0.000 0.266 0.000 parserdata.py:9(_charlocation)
    70768 0.234 0.000 0.234 0.000 parser.py:69(append)
    76507 0.184 0.000 0.221 0.000 parser.py:32(findlast)
    70179 0.179 0.000 0.179 0.000 parser.py:86(skipwhitespace)
    56295 0.167 0.000 0.167 0.000 parserdata.py:428(append)
    64409 0.163 0.000 0.213 0.000 parser.py:155(get)
    58978 0.148 0.000 0.231 0.000 parser.py:122(__init__)
    10035 0.121 0.000 0.629 0.000 parser.py:357(_iterflatten)
    15874 0.117 0.000 0.123 0.000 parser.py:246(itercommandchars)
    93986 0.112 0.000 0.112 0.000 parserdata.py:29(__init__)

  18. Baz Says:

    Couple of micro-optimisations. I notice you’re doing a lot of work to collect debug info up front in Location (eg in the profile above, there are nearly 300k calls to _charlocation that do no useful work unless a message is printed). I fixed that in a hacky way, storing ‘data’ in the Location object too, and recalculate the offset on demand in __str__(). there’s got to be a better way, but this knocks parserdata.py well down the profile.

    The other improvement is a bit smaller – the loop in findlast is quite high on the profile and the function call overhead for that lambda is showing up, but I don’t think it adds much to the readability. Also, there are two common cases, where getloc is asked for the first/last char, which must select the first/last line. The code below fixes all of this.

    I’m testing with a 100kloc makefile made by repeating rules.mk – these changes knocked the parse time down from ~14.4s to ~12.9s. Not the order-of-magnitude change you’re looking for, and not a real-world test, but maybe useful. Its still passing the test suite for me, but I’m not sure that tests the output of the Location hack.

    def getloc(self, offset):
    “””
    Get the location of an offset within data.
    “””
    if offset is None or offset >= len(self.data):
    offset = len(self.data) – 1
    begin, loc = self._locs[-1]
    return loc + self.data[begin:offset]

    if offset < 1:
    begin, loc = self._locs[0]
    return loc + self.data[begin:offset]

    for i in self._locs:
    if i[0] <= offset:
    f = i
    else:
    break

    begin, loc = f
    return loc + self.data[begin:offset]

  19. BSBlog » Blog Archive » Performance and Pymake Says:

    […] I know getloc is inefficient, and a commenter on one of my previous posts suggests a possible solution. But that’s not going to create any […]

Leave a Reply