PLY (Python Lex-Yacc) Version 2.3 (February 18, 2007) David M. Beazley (dave@dabeaz.com) Copyright (C) 2001-2007 David M. Beazley This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA See the file COPYING for a complete copy of the LGPL. Introduction ============ PLY is a 100% Python implementation of the common parsing tools lex and yacc. Although several other parsing tools are available for Python, there are several reasons why you might want to consider PLY: - The tools are very closely modeled after traditional lex/yacc. If you know how to use these tools in C, you will find PLY to be similar. - PLY provides *very* extensive error reporting and diagnostic information to assist in parser construction. The original implementation was developed for instructional purposes. As a result, the system tries to identify the most common types of errors made by novice users. - PLY provides full support for empty productions, error recovery, precedence specifiers, and moderately ambiguous grammars. - Parsing is based on LR-parsing which is fast, memory efficient, better suited to large grammars, and which has a number of nice properties when dealing with syntax errors and other parsing problems. Currently, PLY builds its parsing tables using the SLR algorithm which is slightly weaker than LALR(1) used in traditional yacc. - PLY uses Python introspection features to build lexers and parsers. This greatly simplifies the task of parser construction since it reduces the number of files and eliminates the need to run a separate lex/yacc tool before running your program. - PLY can be used to build parsers for "real" programming languages. Although it is not ultra-fast due to its Python implementation, PLY can be used to parse grammars consisting of several hundred rules (as might be found for a language like C). The lexer and LR parser are also reasonably efficient when parsing typically sized programs. The original version of PLY was developed for an Introduction to Compilers course where students used it to build a compiler for a simple Pascal-like language. Their compiler had to include lexical analysis, parsing, type checking, type inference, and generation of assembly code for the SPARC processor. Because of this, the current implementation has been extensively tested and debugged. In addition, most of the API and error checking steps have been adapted to address common usability problems. How to Use ========== PLY consists of two files : lex.py and yacc.py. These are contained within the 'ply' directory which may also be used as a Python package. To use PLY, simply copy the 'ply' directory to your project and import lex and yacc from the associated 'ply' package. For example: import ply.lex as lex import ply.yacc as yacc Alternatively, you can copy just the files lex.py and yacc.py individually and use them as modules. For example: import lex import yacc The file setup.py can be used to install ply using distutils. The file doc/ply.html contains complete documentation on how to use the system. The example directory contains several different examples including a PLY specification for ANSI C as given in K&R 2nd Ed. A simple example is found at the end of this document Requirements ============ PLY requires the use of Python 2.1 or greater. However, you should use the latest Python release if possible. It should work on just about any platform. PLY has been tested with both CPython and Jython. However, it does not seem to work with IronPython. Resources ========= More information about PLY can be obtained on the PLY webpage at: http://www.dabeaz.com/ply For a detailed overview of parsing theory, consult the excellent book "Compilers : Principles, Techniques, and Tools" by Aho, Sethi, and Ullman. The topics found in "Lex & Yacc" by Levine, Mason, and Brown may also be useful. A Google group for PLY can be found at http://groups.google.com/group/ply-hack Acknowledgments =============== A special thanks is in order for all of the students in CS326 who suffered through about 25 different versions of these tools :-). The CHANGES file acknowledges those who have contributed patches. Elias Ioup did the first implementation of LALR(1) parsing in PLY-1.x. Andrew Waters and Markus Schoepflin were instrumental in reporting bugs and testing a revised LALR(1) implementation for PLY-2.0. Special Note for PLY-2.x ======================== PLY-2.0 is the first in a series of PLY releases that will be adding a variety of significant new features. The first release in this series (Ply-2.0) should be 100% compatible with all previous Ply-1.x releases except for the fact that Ply-2.0 features a correct implementation of LALR(1) table generation. If you have suggestions for improving PLY in future 2.x releases, please contact me. - Dave Example ======= Here is a simple example showing a PLY implementation of a calculator with variables. # ----------------------------------------------------------------------------- # calc.py # # A simple calculator with variables. # ----------------------------------------------------------------------------- tokens = ( 'NAME','NUMBER', 'PLUS','MINUS','TIMES','DIVIDE','EQUALS', 'LPAREN','RPAREN', ) # Tokens t_PLUS = r'\+' t_MINUS = r'-' t_TIMES = r'\*' t_DIVIDE = r'/' t_EQUALS = r'=' t_LPAREN = r'\(' t_RPAREN = r'\)' t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*' def t_NUMBER(t): r'\d+' try: t.value = int(t.value) except ValueError: print "Integer value too large", t.value t.value = 0 return t # Ignored characters t_ignore = " \t" def t_newline(t): r'\n+' t.lexer.lineno += t.value.count("\n") def t_error(t): print "Illegal character '%s'" % t.value[0] t.lexer.skip(1) # Build the lexer import ply.lex as lex lex.lex() # Precedence rules for the arithmetic operators precedence = ( ('left','PLUS','MINUS'), ('left','TIMES','DIVIDE'), ('right','UMINUS'), ) # dictionary of names (for storing variables) names = { } def p_statement_assign(p): 'statement : NAME EQUALS expression' names[p[1]] = p[3] def p_statement_expr(p): 'statement : expression' print p[1] def p_expression_binop(p): '''expression : expression PLUS expression | expression MINUS expression | expression TIMES expression | expression DIVIDE expression''' if p[2] == '+' : p[0] = p[1] + p[3] elif p[2] == '-': p[0] = p[1] - p[3] elif p[2] == '*': p[0] = p[1] * p[3] elif p[2] == '/': p[0] = p[1] / p[3] def p_expression_uminus(p): 'expression : MINUS expression %prec UMINUS' p[0] = -p[2] def p_expression_group(p): 'expression : LPAREN expression RPAREN' p[0] = p[2] def p_expression_number(p): 'expression : NUMBER' p[0] = p[1] def p_expression_name(p): 'expression : NAME' try: p[0] = names[p[1]] except LookupError: print "Undefined name '%s'" % p[1] p[0] = 0 def p_error(p): print "Syntax error at '%s'" % p.value import ply.yacc as yacc yacc.yacc() while 1: try: s = raw_input('calc > ') except EOFError: break yacc.parse(s) Bug Reports and Patches ======================= Because of the extremely specialized and advanced nature of PLY, I rarely spend much time working on it unless I receive very specific bug-reports and/or patches to fix problems. I also try to incorporate submitted feature requests and enhancements into each new version. To contact me about bugs and/or new features, please send email to dave@dabeaz.com. In addition there is a Google group for discussing PLY related issues at http://groups.google.com/group/ply-hack -- Dave