Parsing IDL in Python
Tuesday, October 7th, 2008
One of the current pain points in our build system is the xpidl compiler. This is a binary tool which is used to generate C++ headers and XPT files from XPIDL input files. The code depends on libIDL, which in turn depends on glib. Because this tool is a build-time requirement, we have to build a host version, and in most cases we also build a target version to put in the SDK package.
Getting glib and libIDL on Linux systems is not very difficult: all the major distros have developer packages for them. But getting libIDL and glib on Windows and Mac can be quite painful. On Windows, we have had to create our own custom static library versions of this code which are compatible with all the different versions of Microsoft Visual C++. On Mac you can get them from MacPorts, but as far as I know they are not available as universal binaries, which means that you can’t cross-compile a target xpidl.
Parsing IDL, while not trivial, is not so complicated that it requires huge binary code libraries. So a while back I reimplemented the XPIDL parser using Python and the PLY (Python lex-yacc) parsing library. The core parsing grammar and object model together are only 1200 lines of code.
Because we don’t have any unit tests for xpidl, I chose to use A-B testing against the output of the binary xpidl: the header output of the Python xpidl should match byte-for-byte the header output of the binary xpidl. I wrote a myrules.mk file which would automatically build and compare both versions during a build. This turned out to be a royal pain, because the libIDL parser is not very consistent and has bugs:
- Some, but not all, attributes are reordered: instead of keeping their original order, they end up ordered according to an internal glib hash function.
- The code which associates doc-comments with IDL items is buggy: in many cases the comment is associated with a later item.
I had to add some nasty temporary hacks in order to work around these issues. And finally, reproducing the wacky whitespace of the binary tool wasn’t worthwhile, so I started comparing the results using diff -w -B. With these hacks and changes, both xpidl compilers produce identical C++ headers.
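For anyone who wants the same kind of comparison without shelling out to diff, here is a minimal sketch in Python. It is not the myrules.mk logic I actually used; it only approximates what diff -w -B does by stripping all whitespace from every line and dropping lines that end up empty. The function names and the two sample header strings are made up for illustration.

```python
import difflib


def normalize(text):
    """Remove all whitespace inside each line and drop lines left empty.

    This roughly mimics `diff -w -B`: whitespace differences and blank
    lines no longer affect the comparison.
    """
    squeezed = ("".join(line.split()) for line in text.splitlines())
    return [line for line in squeezed if line]


def headers_match(expected, actual):
    """Return True if the two header texts agree after normalization."""
    return normalize(expected) == normalize(actual)


def explain_mismatch(expected, actual):
    """Return unified-diff lines over the normalized text (empty if equal)."""
    return list(difflib.unified_diff(
        normalize(expected), normalize(actual),
        "binary-xpidl", "python-xpidl", lineterm=""))


# Hypothetical sample outputs, differing only in spacing and a blank line.
binary_output = (
    "class nsIFoo : public nsISupports {\n"
    "\n"
    "  NS_IMETHOD  Bar(void) = 0;\n"
    "};\n"
)
python_output = (
    "class nsIFoo : public nsISupports {\n"
    "  NS_IMETHOD Bar(void) = 0;\n"
    "};\n"
)

if __name__ == "__main__":
    if headers_match(binary_output, python_output):
        print("headers match")
    else:
        print("\n".join(explain_mismatch(binary_output, python_output)))
```

Note that removing whitespace entirely is more forgiving than a real compiler would be (it would treat "int x" and "intx" as equal), which is acceptable for an A-B check of generated headers but not for general use.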
I completed the code to produce C++ headers during a couple of not-quite-vacation days, but I didn’t write any code to produce XPT files. I shelved the project as an attractive waste of time, until jorendorff needed an IDL parser to produce quick stub C++ code. Jason managed to take my existing code and hook up a quick-stub generator to it. The python xpidl parser and quick-stub generator are both used in the codebase.
Currently, we’re still using the old binary xpidl to produce C++ headers and XPT files. If somebody is interested, I’d really like help adding code to produce XPT files from the new parser, so that we can ditch the old binary code completely.
If you ever need to use python to parse some interesting grammar, I highly recommend PLY. If you turn on optimization it performs very well, and it has very good support for detailed error reporting.