This chapter gives a complete example of a project implemented using the Wisent parser generator. Here we will implement a simple calculator.
We create a file calculator.wi which contains the following grammar.
expr: _additive ;
_additive: sum | difference | _multiplicative ;
sum: _additive '+' _multiplicative ;
difference: _additive '-' _multiplicative ;
_multiplicative: product | quotient | _primary ;
product: _multiplicative '*' _primary ;
quotient: _multiplicative '/' _primary ;
_primary: NUMBER
| brackets
| function ;
brackets: '(' _additive ')' ;
function: SYMBOL '(' _additive ')' ;
This file describes the structure of the input for our calculator. The format of grammar files is describes in the section Describing the Format of your Input. The reason for the fact that some symbol names are prefixed by _ is explained in the section about Transparent Symbols.
We use Wisent to generate a parser. Execute the following command to convert the grammar file calculator.wi into python code:
$ wisent -o parser.py -e calc.py calculator.wi
Detail about using the wisent program are describe in the section Running Wisent. The program call generates a (long) file parser.py which contains the generated parser. You can use pythons introspection abilities, e.g. by calling pydoc parser on the command line. The generated parser is described in more detail in the section The Parser class.
The -e option instructs Wisent to generate example code in the file calc.py. This code will not be useful in itself, but it serves as an illustration of how to use the generated parser. It looks as follows:
from sys import stderr
from parser import Parser
def print_tree(tree, terminals, indent=0):
"""Print a parse tree to stdout."""
prefix = " "*indent
if tree[0] in terminals:
print prefix + repr(tree)
else:
print prefix + unicode(tree[0])
for x in tree[1:]:
print_tree(x, terminals, indent+1)
input = [ ('NUMBER',), ('/',), ('SYMBOL',), ('(',), ('NUMBER',), ('*',),
('NUMBER',), ('+',), ('NUMBER',), ('/',), ('NUMBER',), (')',),
('*',), ('SYMBOL',), ('(',), ('NUMBER',), ('-',), ('NUMBER',),
('-',), ('NUMBER',), (')',) ]
p = Parser()
try:
tree = p.parse(input)
except p.ParseErrors, e:
for token,expected in e.errors:
if token[0] == p.EOF:
print >>stderr, "unexpected end of file"
continue
found = repr(token[0])
if len(expected) == 1:
msg = "missing %s (found %s)"%(repr(expected[0]), found)
else:
msg1 = "parse error before %s, "%found
l = sorted([ repr(s) for s in expected ])
msg2 = "expected one of "+", ".join(l)
msg = msg1+msg2
print >>stderr, msg
raise SystemExit(1)
print_tree(tree, p.terminals)
The input of the generated parser (input in the example code) must be a sequence (or any iterable) of Python tuples where the first element gives the type of input token. It must be one of the terminal symbols +, -, *, /, (, ), NUMBER, and SYMBOL from the grammar. The remaining elements of the tuples (not present in the example code) don’t affect the parsing process, they are directly copied into the resulting output tree. We will use these elements to store the value of input numbers and function names.
The program just outputs the parse tree returned by p.parse:
expr
difference
('NUMBER',)
('-',)
quotient
function
('SYMBOL',)
('(',)
('NUMBER',)
(')',)
('/',)
('NUMBER',)
The next step is to edit the file calc.py to add a tokenizer which can convert a string typed by the user into a list of input tokens for p.parse:
def tokenize(str):
from re import match
res = []
while str:
if str[0].isspace():
str = str[1:]
continue
m = match('[0-9.]+', str)
if m:
res.append(('NUMBER', float(m.group(0))))
str = str[m.end(0):]
continue
m = match('[a-z]+', str)
if m:
res.append(('SYMBOL', m.group(0)))
str = str[m.end(0):]
continue
res.append((str[0],))
str = str[1:]
return res
As required, this function breaks a string into a list of tokens:
>>> tokenize("fn(1+2)")
[('SYMBOL', fn'), ('(',), ('NUMBER', 1.0), ('+',), ('NUMBER', 2.0), (')',)]
Write a function which recursively examines the parse tree and computes the numerical result:
def eval_tree(tree):
if tree[0] == 'expr':
return eval_tree(tree[1])
elif tree[0] == 'sum':
return eval_tree(tree[1]) + eval_tree(tree[3])
elif tree[0] == 'difference':
return eval_tree(tree[1]) - eval_tree(tree[3])
elif tree[0] == 'product':
return eval_tree(tree[1]) * eval_tree(tree[3])
elif tree[0] == 'quotient':
return eval_tree(tree[1]) / eval_tree(tree[3])
elif tree[0] == 'NUMBER':
return tree[1]
elif tree[0] == 'brackets':
return eval_tree(tree[2])
elif tree[0] == 'function':
fn = getattr(math, tree[1][1])
return fn(eval_tree(tree[3]))
Details about the format of the parse tree are contained in the section Parse Trees.
The final result of our tutorial is the following python file calculator.wi:
#! /usr/bin/env python
from sys import stderr
import math
from parser import Parser
def tokenize(str):
from re import match
res = []
while str:
if str[0].isspace():
str = str[1:]
continue
m = match('[0-9.]+', str)
if m:
res.append(('NUMBER', float(m.group(0))))
str = str[m.end(0):]
continue
m = match('[a-z]+', str)
if m:
res.append(('SYMBOL', m.group(0)))
str = str[m.end(0):]
continue
res.append((str[0],))
str = str[1:]
return res
def eval_tree(tree):
if tree[0] == 'expr':
return eval_tree(tree[1])
elif tree[0] == 'sum':
return eval_tree(tree[1]) + eval_tree(tree[3])
elif tree[0] == 'difference':
return eval_tree(tree[1]) - eval_tree(tree[3])
elif tree[0] == 'product':
return eval_tree(tree[1]) * eval_tree(tree[3])
elif tree[0] == 'quotient':
return eval_tree(tree[1]) / eval_tree(tree[3])
elif tree[0] == 'NUMBER':
return tree[1]
elif tree[0] == 'brackets':
return eval_tree(tree[2])
elif tree[0] == 'function':
fn = getattr(math, tree[1][1])
return fn(eval_tree(tree[3]))
p = Parser()
while True:
try:
s = raw_input("calc: ")
except EOFError:
print
break
input = tokenize(s)
try:
tree = p.parse(input)
except p.ParseErrors, e:
for token,expected in e.errors:
if token[0] == p.EOF:
print >>stderr, "unexpected end of file"
continue
found = repr(token[0])
if len(expected) == 1:
msg = "missing %s (found %s)"%(repr(expected[0]), found)
else:
msg1 = "parse error before %s, "%found
l = sorted([ repr(s) for s in expected ])
msg2 = "expected one of "+", ".join(l)
msg = msg1+msg2
print >>stderr, msg
continue
print eval_tree(tree)
This file, together with the parser generated by Wisent, implements a rudimentary calculator including some error handling:
$ ./calc.py
calc: 1
1.0
calc: 1*(2+3)
5.0
calc: 1+*2
parse error before '*', expected one of '(', 'NUMBER', 'SYMBOL'
calc: 4*sin(6)
-1.1176619928
Exercise: This calculator can not yet deal with negative numbers:
calc: -3
parse error before '-', expected one of '(', 'NUMBER', 'SYMBOL'
Make the required changes to the code to fix this.