Download lex.py (and other files) from http://systems.cs.uchicago.edu/ply for your code.
This lab walks through the steps of creating and running a Python lexer file. No familiarity with Python is assumed, but familiarity with programming in general and with using computers is assumed.
See - Seven easy steps!
First, we have a two-line header that tells the system which program to execute when this file is run as a script. This may be /usr/bin/python2.2 or something else on your chosen machine.
#!/usr/bin/python
#
We then use the Python import statement to bring in the lexing module.
import lex, sys
Next, we have to create a tuple of all of the token names and assign it to the variable tokens. In Python, a tuple is created by writing a comma-separated list of items inside parentheses.
tokens=('PRINT', 'LPAR', 'RPAR', 'SEMICOLON', 'NUM', 'ADD', 'SUB', 'MULT', 'DIV')
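As a quick illustration of tuple syntax, here is a small sketch. It is written for Python 3, so print is a function, unlike the Python 2 print statements used in the rest of this lab. The names short_tokens, single and not_a_tuple are made up for this example. The point to notice is that the commas make the tuple, so a one-item tuple needs a trailing comma.

```python
# A tuple is written as comma-separated items, usually inside parentheses.
short_tokens = ('PRINT', 'NUM', 'ADD')   # hypothetical, shortened token list

# A one-item tuple needs a trailing comma; without it you just get a string.
single = ('NUM',)
not_a_tuple = ('NUM')

print(type(short_tokens).__name__)   # tuple
print(len(short_tokens))             # 3
print(type(single).__name__)         # tuple
print(type(not_a_tuple).__name__)    # str
```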
Following this are the lexing rules. The name of each rule starts with the characters "t_" and is followed by the token's name, except when the matched characters are simply thrown away. Note that the first rule, t_ignore, does not return a token.
Each rule is either a simple variable to which the regular expression string is assigned, or a function whose regular expression is its docstring, occurring on the first line after the def t_TKNNAME(t): line. Python provides docstrings as a convenient way to attach short documentation to a function, and that documentation is accessible from within a running Python program.
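To see what "accessible within a running program" means, here is a minimal sketch, written for Python 3 and using only the standard re module rather than lex.py. The function below merely imitates the shape of a rule; the variable name pattern is made up for this example.

```python
import re

def t_NUM(t):
    r'\d+'
    return t

# The docstring is available at run time through the __doc__ attribute,
# so a tool can read the regular expression straight off the function.
pattern = t_NUM.__doc__
print(pattern)

# The recovered string is an ordinary regular expression.
print(re.match(pattern, "42;").group())
```

The string must be the first statement in the function body for Python to treat it as the docstring.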
When using the variable method (e.g. t_ADD = r'\+'), the token returned will have a type equal to the name of the variable less the t_ (e.g. 'ADD') and a value of whatever string was matched (e.g. '+'). The default when using a method is the same, but the method/function may change any of these values. See the pylex documentation for more detail.
# Ignore whitespace.
t_ignore = '\n\t '

t_ADD = r'\+'
t_SUB = r'-'
t_MULT = r'\*'
t_DIV = r'\/'
t_LPAR = r'\('
t_RPAR = r'\)'
t_SEMICOLON = r' ; '
t_PRINT = r' print '

def t_NUM(t):
    r' \d+ '
    t.value=int(t.value)
    return t

def t_error(t):
    print "Illegal character %s" % repr(t.value[0])
    t.skip(1)
The last function, t_error, is called when no other rule matches; here it reports the offending character and skips past it.
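To give a feel for what these rules accomplish, here is a rough standalone sketch that uses only Python's standard re module. It is written for Python 3 and is not how lex.py is implemented; the names RULES, IGNORE, MASTER and tokenize are all made up for this illustration. It shows the same three ideas: ignored characters produce no token, a NUM value is converted to an integer, and an unmatched character is reported and skipped.

```python
import re

# Hypothetical rule table: (token type, regular expression), tried in order.
RULES = [
    ('PRINT', r'print'),
    ('NUM', r'\d+'),
    ('ADD', r'\+'),
    ('SUB', r'-'),
    ('MULT', r'\*'),
    ('DIV', r'/'),
    ('LPAR', r'\('),
    ('RPAR', r'\)'),
    ('SEMICOLON', r';'),
]
IGNORE = '\n\t '   # characters skipped without producing a token

# One master pattern with a named group per rule.
MASTER = re.compile('|'.join('(?P<%s>%s)' % rule for rule in RULES))


def tokenize(text):
    """Return (tokens, errors); tokens are (type, value) pairs."""
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        if text[pos] in IGNORE:
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            # Same idea as t_error: record the bad character and skip it.
            errors.append(text[pos])
            pos += 1
            continue
        kind, value = m.lastgroup, m.group()
        if kind == 'NUM':
            value = int(value)   # like t_NUM, convert the text to a number
        tokens.append((kind, value))
        pos = m.end()
    return tokens, errors


tokens, errors = tokenize("print (3 + 4) * 2 ; @")
print(tokens)
print(errors)
```

Here the token type comes from the name of the group that matched, which plays the same role as the part of a rule name after t_.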
We then initialize the lexing system with a call to lex.lex().
lex.lex()
In this example we now use a standard Python variable __name__ to see if we are being run as the main program. The rest of the code that is indented below it will be run in that case.
if __name__ == "__main__":
We then read all of standard input and put its contents in the variable data.
    data = sys.stdin.read()
We then pass the file contents to the lexing routine.
    lex.input(data)
This is followed by a loop that continues until there are no more tokens. It prints each token's type and, for NUM tokens, its value as well.
    while 1:
        tok = lex.token()
        if not tok: break      # No more input
        print tok.type,
        if tok.type == "NUM":
            print "(%s)"%tok.value,
        print
Last modified by Brett Giles