Download lex.py (and other files) from http://systems.cs.uchicago.edu/ply for your code.
This lab walks through the steps of creating and running a Python lexer file. No familiarity with Python is assumed. Familiarity with programming in general and using computers IS assumed.
See - Seven easy steps!
First, we have a two-line header whose first line tells the system which program to execute when this file is run as a script. The path may be /usr/bin/python2.2 or something else on your machine.
#!/usr/bin/python
#
We then use the Python import statement to bring in the lexing module, along with sys for reading input.
import lex, sys
Next, we have to create a tuple of all of the token names and assign it to the variable tokens. In Python, a tuple is created by surrounding a comma-separated list of items with parentheses.
tokens=('PRINT', 'LPAR', 'RPAR', 'SEMICOLON',
'NUM', 'ADD', 'SUB', 'MULT', 'DIV')
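If tuples are new to you, the short sketch below may help. It reuses the token tuple from this lab; the names single and not_a_tuple are made up for illustration only.

```python
# Parentheses group the items; the commas are what make it a tuple.
tokens = ('PRINT', 'LPAR', 'RPAR', 'SEMICOLON',
          'NUM', 'ADD', 'SUB', 'MULT', 'DIV')

print(type(tokens).__name__)   # tuple
print(len(tokens))             # 9

# A one-item tuple needs a trailing comma; without it you get a plain string.
single = ('NUM',)
not_a_tuple = ('NUM')
print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # str
```

The trailing-comma case matters if you ever write a lexer with only one token name.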
Following this are the lexing rules. The name of each rule starts with the characters "t_" followed by the token's name, except for rules whose characters are simply thrown away. Note that the first rule, t_ignore, does not return a token.
Each rule is either a simple variable to which the regular expression string is assigned, or a function whose docstring is the regular expression, placed on the first line after the def t_TKNNAME(t): line. Python provides docstrings as a convenient way to attach short documentation to a function, and they remain accessible within a running Python program.
When using the variable form (e.g. t_ADD = r'\+'), the token returned will have a type equal to the name of the variable less the t_ (e.g. 'ADD') and a value of whatever string was matched (e.g. '+'). The default when using a function is the same, but the function may change any of these values. See the PLY documentation for more detail.
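As a small illustration of how a docstring can carry a regular expression, the sketch below defines a rule function and then reads its docstring and name back at run time. It only shows the Python mechanism; the variables pattern and token_type are invented here, and nothing is claimed about how lex.py itself is written.

```python
def t_NUM(t):
    r'\d+'
    t.value = int(t.value)
    return t

# The docstring is an ordinary attribute of the function object.
pattern = t_NUM.__doc__
print(pattern)        # \d+

# Dropping the "t_" prefix from the rule name gives the token type.
token_type = t_NUM.__name__[2:]
print(token_type)     # NUM
```

Note that the function is never called here; its docstring and name are available just by looking at the function object.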
# Ignore whitespace.
t_ignore = '\n\t '
t_ADD = r'\+'
t_SUB = r'-'
t_MULT = r'\*'
t_DIV = r'\/'
t_LPAR = r'\('
t_RPAR = r'\)'
t_SEMICOLON = r';'
t_PRINT = r'print'

# A number: convert the matched text to an integer.
def t_NUM(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Report an illegal character and skip over it.
def t_error(t):
    print "Illegal character %s" % repr(t.value[0])
    t.skip(1)
The last function, t_error, is called when no other rule matches the input. Here it reports the offending character and skips past it.
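To see what these rules are meant to produce, here is a rough stand-in built only on Python's standard re module. It is not how lex.py is implemented: the helper name sketch_tokens, the RULES list, the order in which rules are tried, and the sample input are all assumptions made for this sketch.

```python
import re

# (token type, pattern) pairs mirroring the rules above.
# The order they are tried in is this sketch's own choice.
RULES = [
    ('NUM', r'\d+'),
    ('PRINT', r'print'),
    ('ADD', r'\+'),
    ('SUB', r'-'),
    ('MULT', r'\*'),
    ('DIV', r'/'),
    ('LPAR', r'\('),
    ('RPAR', r'\)'),
    ('SEMICOLON', r';'),
]
IGNORE = '\n\t '   # same characters as t_ignore

def sketch_tokens(data):
    """Return a list of (type, value) pairs for the input string."""
    result = []
    pos = 0
    while pos < len(data):
        if data[pos] in IGNORE:
            pos += 1            # thrown away, no token produced
            continue
        for name, pattern in RULES:
            m = re.compile(pattern).match(data, pos)
            if m:
                value = m.group()
                if name == 'NUM':
                    value = int(value)   # what t_NUM does to t.value
                result.append((name, value))
                pos = m.end()
                break
        else:
            # Plays the role of t_error: report the character, then skip it.
            print("Illegal character %s" % repr(data[pos]))
            pos += 1
    return result

print(sketch_tokens('print (3 + 42);'))
```

For the sample input this yields PRINT, LPAR, NUM with value 3, ADD, NUM with value 42, RPAR and SEMICOLON, in that order. A character such as $ produces the error message and no token.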
We then initialize the lexing system with a call to lex.lex().
lex.lex()
In this example we now use a standard Python variable __name__ to see if we are being run as the main program. The rest of the code that is indented below it will be run in that case.
if __name__ == "__main__":
We then read all of standard input and put its contents in the variable data.
    data = sys.stdin.read()
We then pass the file contents to the lexing routine.
    lex.input(data)
This is followed by a loop that continues until there are no more tokens. It prints each token's type and, for NUM tokens, also prints the value.
    while 1:
        tok = lex.token()
        if not tok: break      # No more input
        print tok.type,
        if tok.type == "NUM": print "(%s)" % tok.value,
        print
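The trailing commas in the print statements above are Python 2 syntax that suppresses the newline, so each token ends up on a line of its own. The sketch below expresses the same formatting as a plain function; format_token is a hypothetical helper, not part of lex.py, and the sample pairs stand in for real tokens.

```python
def format_token(tok_type, tok_value):
    """Build the line the loop prints for one token."""
    line = tok_type
    if tok_type == "NUM":
        # Only numbers show their value, in parentheses.
        line += " (%s)" % tok_value
    return line

if __name__ == "__main__":
    # Made-up (type, value) pairs standing in for tokens from the lexer.
    samples = [("PRINT", "print"), ("NUM", 3), ("SEMICOLON", ";")]
    for tok_type, tok_value in samples:
        print(format_token(tok_type, tok_value))
```

Run as a script, this prints PRINT, then NUM (3), then SEMICOLON, one per line. When the file is imported instead, the guarded block is skipped and only format_token is defined.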
Last modified by Brett Giles