Wisent: a Python parser generator
By Jochen Voss.
Introduction
When writing a computer program, implementing methods to read data from input files with a complex structure can be surprisingly difficult. For example, if the input data comes from an untrusted source, errors in the input file need to be handled very carefully. If your program is written in Python and the input data is sufficiently structured (i.e., if the format can be described by a context-free grammar), Wisent can help you implement parts of the input processing of your program.
A cave painting from the cave of Altamira, showing a wisent. The photo was taken from the Wikimedia Commons and is in the public domain.
Features
The parser generator has the following features:
- Wisent can deal with general LR(1) grammars.
- Provides helpful error messages: if there is a problem with the input grammar, Wisent generates an example input string to illustrate the problem.
- The language to specify grammars allows use of the ? (optional elements), * (zero or more copies) and + (one or more copies) operators.
- Wisent is distributed under the terms of the GNU General Public License (GPL) version 2.
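As an illustration of what the ?, * and + operators mean, the following sketch rewrites them into plain context-free rules. It is not taken from Wisent: the dictionary-based grammar representation, the function name desugar and the generated helper names such as _item_1 are assumptions made for this example, and Wisent's own internal translation may differ.

```python
from itertools import count


def desugar(rules):
    """Rewrite ?, * and + suffixes into plain context-free rules.

    `rules` maps a nonterminal to a list of alternatives. Each alternative
    is a list of symbol names; a name may end in '?', '*' or '+'.
    (Hypothetical representation, chosen only for this illustration.)
    """
    fresh = count(1)
    out = {}

    def helper(base, op):
        # Create a new nonterminal that stands for `base` with `op` applied.
        name = f"_{base}_{next(fresh)}"
        if op == "?":
            out[name] = [[], [base]]            # empty, or one copy
        elif op == "*":
            out[name] = [[], [name, base]]      # empty, or shorter list + one copy
        else:  # op == "+"
            out[name] = [[base], [name, base]]  # one copy, or shorter list + one copy
        return name

    for lhs, alternatives in rules.items():
        out[lhs] = [
            [helper(s[:-1], s[-1]) if len(s) > 1 and s[-1] in "?*+" else s
             for s in alt]
            for alt in alternatives
        ]
    return out


grammar = {
    "list": [["[", "item*", "]"]],
    "item": [["NUMBER", "sign?"]],
    "sign": [["+"], ["-"]],
}

for lhs, alternatives in desugar(grammar).items():
    print(lhs, "->", " | ".join(" ".join(alt) or "(empty)" for alt in alternatives))
```

The repetition rules are written left-recursively, which is the form LR parsers usually handle most economically.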
The generated parsers have the following features:
- The generated parser is stand-alone, i.e. you can add the generated parser to your project without adding Wisent to the project dependencies.
- The generated parser is implemented as a Python class.
- Automatic error repair and good error reporting: on invalid input, the generated parser tries to fix the problem to allow continuing the parsing process. At the end of parsing, all detected errors are reported together.
- A call to the parser returns a parse tree. Wisent can create parsers which omit uninteresting nodes from the generated tree.
- The generated parsers can be distributed under the 3-clause BSD license. Since this license is compatible with the GPL, you can of course use the generated parsers in GPL projects.
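To make the idea of omitting uninteresting nodes concrete, here is a small sketch that splices selected nodes out of a parse tree. The tree layout (nested tuples whose first element is the symbol name), the function name prune and the symbol names are assumptions for this example only. The Users' Manual describes the tree format and the omission mechanism Wisent actually uses.

```python
def prune(tree, hidden):
    """Return a copy of `tree` without the nodes whose symbol is in `hidden`.

    A node is a tuple (symbol, child, child, ...). Children are either
    nodes (tuples) or token payloads (any non-tuple value). The children
    of a hidden node take the place of that node in its parent.
    (Hypothetical layout, chosen only for this illustration.)
    """
    symbol, *children = tree
    result = [symbol]
    for child in children:
        if isinstance(child, tuple):
            sub = prune(child, hidden)
            if sub[0] in hidden:
                result.extend(sub[1:])  # splice the grandchildren in
            else:
                result.append(sub)
        else:
            result.append(child)        # token payload, kept as-is
    return tuple(result)


# A list of two items, parsed with a left-recursive helper rule "_items".
tree = ("list",
        ("[",),
        ("_items", ("_items", ("item", "1")), ("item", "2")),
        ("]",))

print(prune(tree, hidden={"_items"}))
```

In this sketch the root node is always kept, even if its symbol is listed in hidden.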
More information can be found in the Wisent Users' Manual.
About the name
I called the program Wisent because the first parser generator I encountered was Bison and the Wisent is the European variant of the Bison. Unfortunately, I learned later that there are at least two other parser generators which use the name Wisent:
- Wisent by Thomas B. Preußer: a Parser Generator for C++ and Java implemented in C++.
- Wisent by David Ponce: one component of the Semantic package for Emacs.
Download
- wisent version 0.6.2, 2012-04-10
  Allow '-' in symbol names, minor fixes.
  - archive: wisent-0.6.2.tar.gz (1MB)
  - signature: wisent-0.6.2.tar.gz.asc
  - sha1: 88560d57326d8796f468173c9d4c9f1da304ed36
  - md5: f346ae35d789ee33d9c0a829ff318448
- wisent version 0.6.1, 2010-09-16 (bug fix release)
  Comments and multi-line strings in grammar files had been broken since version 0.6. Release 0.6.1 fixes this problem.
  - archive: wisent-0.6.1.tar.gz (1MB)
  - signature: wisent-0.6.1.tar.gz.asc
  - sha1: 288b7e7efe7508c44d0593c7fb07583307e20d99
  - md5: 209d9866ddaf634986dbc6d603559a8d
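A downloaded archive can be checked against the sha1 values listed above. The following is a minimal sketch using only the Python standard library. The function names and the local file name are assumptions for this example. A matching checksum only guards against corrupted downloads; verifying the .asc signature is the stronger check.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so that large archives need not fit into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """Check the file at `path` against an expected hexadecimal digest."""
    return sha1_of_file(path) == expected.strip().lower()


# sha1 value listed above for wisent-0.6.2.tar.gz
EXPECTED_0_6_2 = "88560d57326d8796f468173c9d4c9f1da304ed36"

if __name__ == "__main__":
    # Assumes the archive has been saved in the current directory.
    print(matches("wisent-0.6.2.tar.gz", EXPECTED_0_6_2))
```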
The source code for more recent, experimental versions of wisent may (or may not) be available on github.com.
Generic installation instructions are in the file INSTALL. On most systems, the following commands should be sufficient:
    ./configure
    make
    make install
Alternatively, you can omit the make install step and run Wisent directly in the build directory.
Please send any suggestions and bug reports to Jochen Voss. Your message should include the Wisent version number, as obtained by the command wisent -V.
References
- The Wisent Users' Manual.
- The algorithm used in Wisent to generate the parsers is based on the following article: David Pager, A practical general method for constructing LR(k) parsers. Acta Informatica, volume 7 (1977), number 3, pages 249–268.
- Wikipedia has entries about context-free grammars, LR parsers and Wisents.
- The first edition of the book Parsing Techniques — A Practical Guide by Dick Grune and Ceriel J.H. Jacobs is available online.
- The Bison parser generator is an excellent parser generator for C and C++ projects. Bison comes with an excellent manual.
- The LanguageParsing entry on wiki.python.org lists other Python parser generators.
- Xkcd knows regular expressions.