pymake: A Mostly GNU-compatible `make`, Written in Python
Mozilla has a lot of Makefiles, and although we’re not completely happy with our current recursive-make build system, converting to something else is a very difficult task. In order to consider other solutions, we have to have a seamless migration path from our existing infrastructure. Unfortunately for any potential migration, Mozilla uses pretty much every make feature and quirk imaginable. It isn’t sufficient to get by with some simplistic importer: any migration needs to understand the arcane details of makefile syntax and parsing rules.
So finally, after two years of considering whether it’s feasible and a good idea, I bit the bullet and wrote a make replacement in python. It was almost as mind-numbingly difficult as I thought it would be. But after two solid weeks, I have a tool which will mostly build Mozilla, and seems to be doing it correctly.
Why would you do such a thing?
Mark Pilgrim was right: only a crazy person says “I’m going to build my own build system”. So rather than creating a whole new build system with a new syntax, I wanted to start out replicating an existing build system exactly. Then we can make incremental changes if appropriate.
What kind of incremental changes?
First up are speed increases. Our Windows builds are currently very slow, in part due to the large number of processes that are spawned. There are several ways to attack this:
- Each makefile command currently spawns an MSYS shell and then (usually) the actual binary. The MSYS shell is expensive and in most cases unnecessary. pymake should be able to skip the shell for simple command lines.
- Mozilla runs `nsinstall` a lot. Axel has already implemented nsinstall as a python module: our makefile rules should be able to run commands in python and skip the external process.
- We can use recursive make without launching additional make processes by recursing within the same pymake process.
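The first idea can be sketched in a few lines. The helper below decides whether a command line actually needs a shell by scanning for shell metacharacters; both the `needs_shell` name and its character set are illustrative, not pymake's actual logic:

```python
import subprocess

# Characters that force us to hand the command line to a shell.
# This set is illustrative; the real rules have more edge cases.
SHELL_METACHARS = set("&|<>;*?$`'\"()~{}[]#\\")

def needs_shell(cmdline):
    """Return True if cmdline uses shell features we cannot skip."""
    return any(ch in SHELL_METACHARS for ch in cmdline)

def run(cmdline):
    """Run a makefile command, avoiding the (expensive) shell when possible."""
    if needs_shell(cmdline):
        return subprocess.call(["sh", "-c", cmdline])
    # Simple command: split on whitespace and exec the binary directly.
    return subprocess.call(cmdline.split())
```

On Windows, skipping the MSYS `sh` process for the common simple case is where the win comes from.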
Where can I see this monstrosity?
The pymake website. You can also just pull or browse the pymake sources directly from hg.mozilla.org using Mercurial.
Why don’t you just hack GNU make itself?
There are some improvements we’re interested in doing which just aren’t possible within a C codebase:
- implementing rules directly in Python
- condensing multiple invocations of make into a single process

Python is also about a zillion times easier to hack quickly.
pymake is mostly GNU-make compatible. What are the differences?
The known differences are documented in the README file.
Does it perform well?
Not yet. pymake currently takes more time than GNU make. Almost all of this time is spent reading and parsing makefiles. Parsing autoconf.mk/config.mk/rules.mk from the Mozilla tree takes 0.3 seconds on my Linux machine, where GNU make takes 0.03 seconds. Dependency resolution and execution is as fast or faster than GNU make. I’d love to have some help profiling and fixing the parsing. Also, pymake does not yet support parallel execution.
Next week sometime, I promise to write about some of the difficulties I encountered writing pymake, and how test-driven development saved me from terrible pain and suffering.
February 13th, 2009 at 10:52 pm
How pointless, why add another dependency for mozilla just to satisfy an imagined need to have something in python? Aren’t the whitespace issues with [g]make enough to deal with that you have to add another layer of whitespace-sensitivity? If you must have rules written in a dynamic language, Perl is installed by default on a much larger set of computers than python, and for a fairly complex build system written in make+perl, see the OpenBSD ports system. Perl is even easier to hack on quickly than python.
Perhaps this is a case of: if you only have a hammer, everything you see is a nail. The hammer in this case is python. Mark was right, but you seem to have missed his point: Darkroom was just another text editor, and version 8 finally got undo. Likewise, your make system lacks many of the things a full make system needs, when gnu make already exists.
February 14th, 2009 at 10:18 am
Very nice.
Can’t wait to see more progress on this. And I hope I can learn something from your followup on test-driven development.
February 14th, 2009 at 5:11 pm
James: Python is already a dependency of building Mozilla, so it doesn’t add anything new.
“Perl is even easier to hack on quickly than python.”
That sounds like a personal preference, as does most of your comment.
February 14th, 2009 at 7:51 pm
James: the Mozilla build already depends heavily on Python. We prefer python over perl for all new build-system code because it improves code readability and the standard library is much better suited to our needs.
My make system only lacks two things: support for the `vpath` directive, which can be added in a couple hours, and support for parallel execution, which will take a bit more time but is eminently doable. I’m not creating a new build system, I’m just re-implementing a part of our existing build system so that we can extend it.
February 14th, 2009 at 10:54 pm
Have you considered using a parser generator for the syntax parsing? Something along the lines of ANTLR or anything else that generates DFAs from some kind of grammar file (e.g. EBNF).
I don’t have a recommendation off-hand for a module to do this, but a quick Google search found these pages which might help:
http://wiki.python.org/moin/LanguageParsing
http://nedbatchelder.com/text/python-parsers.html
Good luck!
February 15th, 2009 at 4:03 pm
Wow – that’s pretty awesome. This could be a great way to build a path forward for projects stuck in autoconf/automake/libtool hell too. Extending pymake to understand Makefile.am syntax and executing it without needing to resort to the libtool disaster could be incredibly valuable.
Ian
February 15th, 2009 at 4:15 pm
Colin, yes, I considered it. I’ve used PLY successfully before to parse IDL (see http://benjamin.smedbergs.us/blog/2008-10-07/parsing-idl-in-python/). But you’ll quickly find that tokenizing make syntax using a standard lex-like setup is almost impossible: the rules for whitespace, comments, and line continuations are context-sensitive and can change in the course of a single line.
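To illustrate with one of the simpler cases: the same `#` character starts a comment at makefile scope but is passed through verbatim inside a recipe, so the tokenizer has to carry that context along. A toy version of just that rule (simplified; the real handling also involves escapes and line continuations):

```python
def strip_comment(line, in_recipe):
    """Strip a make comment, greatly simplified: at makefile scope '#'
    begins a comment, but inside a recipe it goes to the shell untouched."""
    if in_recipe:
        return line
    idx = line.find("#")
    return line if idx == -1 else line[:idx]
```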
February 16th, 2009 at 10:32 am
> It isn’t sufficient to get by with some simplistic importer: any migration needs to understand the arcane details of makefile syntax and parsing rules.
Since you now have a parser for all the arcane details anyways, why not get pymake to export a different format instead of executing it?
February 16th, 2009 at 1:37 pm
beza1e1: that’s one possibility, yes. Partisans of SCons, cmake, waf, etc would love to see us migrate, and that eventually may be the right decision. In the meantime, I think we can make some improvements, especially in Windows build times, without completely migrating to another toolchain.
February 16th, 2009 at 8:26 pm
Benjamin, this is a wonderful idea – thanks for doing the hard work of implementing it and sharing it with us. It might just become my next build tool.
February 16th, 2009 at 10:04 pm
>But you’ll quickly find that tokenizing make syntax using a standard lex-like setup is almost impossible: the rules for whitespace, comments, and line continuations are context-sensitive and can change in the course of a single line.
Even if you use stacked start conditions?
http://www.dabeaz.com/ply/ply.html#ply_nn21
February 17th, 2009 at 10:35 am
James, I’ll try to post in more detail later, but you’d basically have to construct a huge number of start conditions, and tracking start states would be more trouble than it’s worth.
February 17th, 2009 at 11:43 pm
As someone with no small amount of experience building a ground-up gmake implementation, let me add my support to the idea that trying to use a parser generator is probably an exercise in futility. But I do wonder about the applicability of this work to other projects. It’s one thing to make a gmake replacement that “mostly” builds one project (where most of the makefiles were probably based off a handful of patterns and idioms), and quite another to make a replacement that works on a wide variety of projects from dozens of different authors and backgrounds. If your aim is the latter, I think you’re in for a surprise at the difficulty.
February 18th, 2009 at 11:29 am
Eric, good question. I’m primarily interested in writing a tool to build Mozilla. But if this system ends up providing significant benefits over GNU make (better parallelism, logging etc), I hope that the community will contribute to fix other issues. I’d happily accept patches for the remaining few known issues: lack of the standard builtin rules and variables, and missing order-only prerequisites. Then of course it’s just a matter of discovering any lurking bugs.
I have confidence that it can be made to build other projects with little effort. The Mozilla build is in fact 4 different build systems with very different idioms:
* mozilla
* NSPR
* NSS/coreconf
* Tamarin
Many of the latest checkins are specifically bugs that I discovered while building NSS, which uses a very different makefile style than mozilla proper: http://hg.mozilla.org/users/bsmedberg_mozilla.com/pymake/pushloghtml?fromchange=1995f94b1c2f&tochange=450813b7b94a
February 18th, 2009 at 12:45 pm
Wow. You’ve out-done yourself this time, Smedberg. ;)
One thing that sprang to mind when I read this was, “hey, maybe now it’s possible to add some smarts to the build system so we can do subdirectory builds that will automatically go up (or sideways) however many levels to complete the build without having to rebuild the entire tree”. We were just discussing this yesterday in IRC after reading a newsgroup post.
I for one welcome our new Python Overlords.
February 18th, 2009 at 11:41 pm
>This could be a great way to build a path forward for projects stuck in autoconf/automake/libtool hell too.
>Since you now have a parser for all the arcane details anyways, why not get pymake to export a different format instead of executing it?
>…I hope that the community will contribute to fix other issues.
It’s probably worth contacting some of the other build projects (SCons, waf, etc.), particularly the python-based ones, as they might want to use pymake as the basis for an importer/migrator. This could take a lot of the load off you.
>James, I’ll try to post in more detail later, but you’d basically have to construct a huge number of start conditions, and tracking start states would be more trouble than it’s worth.
Thanks for the explanation. I suppose the fundamental issue is that tools geared for “context-free grammars” don’t deal well with context-dependent languages.
Great work, BTW.
February 21st, 2009 at 2:46 am
Here is a quick decorator I use for profiling Python code:
http://code.activestate.com/recipes/576656/
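A minimal decorator in the same spirit (a from-memory sketch built on the stdlib's cProfile, not the linked recipe verbatim):

```python
import cProfile
import functools
import pstats

def profiled(func):
    """Run func under cProfile and print the 20 costliest entries,
    sorted by internal time then call count."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        prof = cProfile.Profile()
        try:
            return prof.runcall(func, *args, **kwargs)
        finally:
            pstats.Stats(prof).sort_stats("time", "calls").print_stats(20)
    return wrapper
```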
Here is the profile output from running “mkparse.py” over all the Makefiles in my mozbuild objdir:
2089198 function calls (2016559 primitive calls) in 11.447 CPU seconds
Ordered by: internal time, call count
List reduced from 88 to 20 due to restriction
ncalls tottime percall cumtime percall filename:lineno(function)
57716 1.908 0.000 5.593 0.000 parser.py:682(parsemakesyntax)
893 1.558 0.002 11.447 0.013 parser.py:467(parsestream)
227122/155505 1.472 0.000 1.679 0.000 parser.py:192(itermakefilechars)
71661 1.406 0.000 1.406 0.000 parser.py:369(iterlines)
76507 0.687 0.000 0.981 0.000 parserdata.py:34(__add__)
76507 0.451 0.000 1.653 0.000 parser.py:73(getloc)
71661 0.430 0.000 2.153 0.000 parser.py:127(readline)
64578 0.409 0.000 0.409 0.000 parser.py:97(findtoken)
64098 0.355 0.000 0.355 0.000 parser.py:672(__init__)
71411 0.281 0.000 0.312 0.000 data.py:87(append)
298705 0.266 0.000 0.266 0.000 parserdata.py:9(_charlocation)
70768 0.234 0.000 0.234 0.000 parser.py:69(append)
76507 0.184 0.000 0.221 0.000 parser.py:32(findlast)
70179 0.179 0.000 0.179 0.000 parser.py:86(skipwhitespace)
56295 0.167 0.000 0.167 0.000 parserdata.py:428(append)
64409 0.163 0.000 0.213 0.000 parser.py:155(get)
58978 0.148 0.000 0.231 0.000 parser.py:122(__init__)
10035 0.121 0.000 0.629 0.000 parser.py:357(_iterflatten)
15874 0.117 0.000 0.123 0.000 parser.py:246(itercommandchars)
93986 0.112 0.000 0.112 0.000 parserdata.py:29(__init__)
February 25th, 2009 at 5:28 pm
A couple of micro-optimisations. I notice you’re doing a lot of work to collect debug info up front in Location (e.g. in the profile above, there are nearly 300k calls to _charlocation that do no useful work unless a message is printed). I fixed that in a hacky way, storing ‘data’ in the Location object too and recalculating the offset on demand in __str__(). There’s got to be a better way, but this knocks parserdata.py well down the profile.
The other improvement is a bit smaller. The loop in findlast is quite high on the profile, and the function-call overhead of that lambda is showing up, though I don’t think the lambda adds much readability. Also, there are two common cases where getloc is asked for the first or last char, which must select the first or last line. The code below handles all of this.
I’m testing with a 100kloc makefile made by repeating rules.mk; these changes knocked the parse time down from ~14.4s to ~12.9s. Not the order-of-magnitude change you’re looking for, and not a real-world test, but maybe useful. It’s still passing the test suite for me, but I’m not sure the suite covers the output of the Location hack.
    def getloc(self, offset):
        """
        Get the location of an offset within data.
        """
        if offset is None or offset >= len(self.data):
            offset = len(self.data) - 1
            begin, loc = self._locs[-1]
            return loc + self.data[begin:offset]
        if offset < 1:
            begin, loc = self._locs[0]
            return loc + self.data[begin:offset]
        for i in self._locs:
            if i[0] <= offset:
                f = i
            else:
                break
        begin, loc = f
        return loc + self.data[begin:offset]
February 26th, 2009 at 11:18 pm
[…] I know getloc is inefficient, and a commenter on one of my previous posts suggests a possible solution. But that’s not going to create any […]