Version 3.2
-----------------------------
03/24/09: beazley
          Added an extra check to not print duplicated warning messages
          about reduce/reduce conflicts.

03/24/09: beazley
          Switched PLY over to a BSD license.

03/23/09: beazley
          Performance optimization.  Discovered a few places to make
          speedups in LR table generation.

03/23/09: beazley
          New warning message.  PLY now warns about rules never
          reduced due to reduce/reduce conflicts.  Suggested by
          Bruce Frederiksen.

03/23/09: beazley
          Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
          Added a new picklefile option to yacc() to write the parsing
          tables to a filename using the pickle module.  Here is how
          it works:

              yacc(picklefile="parsetab.p")

          This option can be used if the normal parsetab.py file is
          extremely large.  For example, on jython, it is impossible
          to read the parsing tables if parsetab.py exceeds a certain
          threshold.

          The filename supplied to the picklefile option is opened
          relative to the current working directory of the Python
          interpreter.  If you need to refer to the file elsewhere,
          you will need to supply an absolute or relative path.

          For maximum portability, the pickle file is written
          using protocol 0.

03/13/09: beazley
          Fixed a bug in parser.out generation where the rule numbers
          were off by one.

03/13/09: beazley
          Fixed a string formatting bug with one of the error messages.
          Reported by Richard Reitmeyer.

Version 3.1
-----------------------------
02/28/09: beazley
          Fixed broken start argument to yacc().  PLY-3.0 broke this
          feature by accident.

02/28/09: beazley
          Fixed debugging output.  yacc() no longer reports shift/reduce
          or reduce/reduce conflicts if debugging is turned off.  This
          restores behavior similar to PLY-2.5.  Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
          Fixed missing lexer attribute on certain tokens when
          invoking the parser p_error() function.  Reported by
          Bart Whiteley.

02/02/09: beazley
          The lex() command now does all error-reporting and diagnostics
          using the logging module interface.  Pass in a Logger object
          using the errorlog parameter to specify a different logger.

02/02/09: beazley
          Refactored ply.lex to use a more object-oriented and organized
          approach to collecting lexer information.

02/01/09: beazley
          Removed the nowarn option from lex().  All output is controlled
          by passing in a logger object.  Just pass in a logger with a high
          level setting to suppress output.  This argument was never
          documented to begin with, so hopefully no one was relying upon it.
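          For example, a minimal sketch (added for illustration; the logger
          name is arbitrary):

              import logging
              quiet = logging.getLogger("ply.quiet")
              quiet.setLevel(logging.CRITICAL)   # high level suppresses warnings
              lexer = lex.lex(errorlog=quiet)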

02/01/09: beazley
          Discovered and removed a dead if-statement in the lexer.  This
          resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
          Minor change to the procedure for signalling a syntax error in a
          production rule.  A normal SyntaxError exception should be raised
          instead of yacc.SyntaxError.
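          For example, a hedged sketch (the rule and the check are hypothetical):

              def p_statement(p):
                  'statement : expression SEMI'
                  if p[1] is None:            # some application-specific check
                      raise SyntaxError       # formerly: raise yacc.SyntaxError
                  p[0] = p[1]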

01/13/09: beazley
          Added a new method p.set_lineno(n,lineno) that can be used to set the
          line number of symbol n in grammar rules.  This simplifies manual
          tracking of line numbers.
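          For example, an illustrative sketch (the rule is hypothetical):

              def p_expression(p):
                  'expression : expression PLUS expression'
                  p[0] = p[1] + p[3]
                  # Give the result the line number of the left operand.
                  p.set_lineno(0, p.lineno(1))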

01/11/09: beazley
          Vastly improved debugging support for yacc.parse().  Instead of passing
          debug as an integer, you can supply a Logger object (see the logging
          module).  Messages will be generated at the ERROR, INFO, and DEBUG
          logging levels, each level providing progressively more information.
          The debugging trace also shows states, grammar rules, values passed
          into grammar rules, and the result of each reduction.
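          For example, a minimal sketch (logger configuration is illustrative):

              import logging
              logging.basicConfig(level=logging.DEBUG, filename="parse.log")
              yacc.parse(data, debug=logging.getLogger())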

01/09/09: beazley
          The yacc() command now does all error-reporting and diagnostics using
          the interface of the logging module.  Use the errorlog parameter to
          specify a logging object for error messages.  Use the debuglog parameter
          to specify a logging object for the 'parser.out' output.
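          For example, a hedged sketch (logger names are arbitrary):

              import logging
              parser = yacc.yacc(errorlog=logging.getLogger("ply.errors"),
                                 debuglog=logging.getLogger("ply.debug"))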

01/09/09: beazley
          *HUGE* refactoring of the ply.yacc() implementation.  The high-level
          user interface is backwards compatible, but the internals are completely
          reorganized into classes.  No more global variables.  The internals
          are also more extensible.  For example, you can use the classes to
          construct a LALR(1) parser in an entirely different manner than
          what is currently the case.  Documentation is forthcoming.

01/07/09: beazley
          Various cleanup and refactoring of yacc internals.

01/06/09: beazley
          Fixed a bug with precedence assignment.  yacc was assigning the precedence
          of each rule based on the left-most token, when in fact it should have been
          using the right-most token.  Reported by Bruce Frederiksen.

11/27/08: beazley
          Numerous changes to support Python 3.0 including removal of deprecated
          statements (e.g., has_key) and the addition of compatibility code
          to emulate features from Python 2 that have been removed, but which
          are needed.  Fixed the unit testing suite to work with Python 3.0.
          The code should be backwards compatible with Python 2.

11/26/08: beazley
          Loosened the rules on what kind of objects can be passed in as the
          "module" parameter to lex() and yacc().  Previously, you could only use
          a module or an instance.  Now, PLY just uses dir() to get a list of
          symbols on whatever the object is without regard for its type.
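          For example, a hedged sketch (the class is illustrative; any object
          whose dir() exposes the usual t_* names should work):

              class LexerSpec(object):
                  tokens = ('NUMBER',)
                  t_NUMBER = r'\d+'
                  t_ignore = ' \t'
                  def t_error(self, t):
                      t.lexer.skip(1)

              lexer = lex.lex(module=LexerSpec())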

11/26/08: beazley
          Changed all except: statements to be compatible with Python 2.x/3.x syntax.

11/26/08: beazley
          Changed all raise Exception, value statements to raise Exception(value) for
          forward compatibility.

11/26/08: beazley
          Removed all print statements from lex and yacc, using sys.stdout and sys.stderr
          directly.  Preparation for Python 3.0 support.

11/04/08: beazley
          Fixed a bug with referring to symbols on the parsing stack using negative
          indices.

05/29/08: beazley
          Completely revamped the testing system to use the unittest module for everything.
          Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
          Fixed a bug with writing lex-tables in optimized mode and start states.
          Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
          A version number is now embedded in the table file signature so that
          yacc can more gracefully accommodate changes to the output format
          in the future.

05/04/08: beazley
          Removed undocumented .pushback() method on grammar productions.  I'm
          not sure this ever worked and can't recall ever using it.  Might have
          been an abandoned idea that never really got fleshed out.  This
          feature was never described or tested, so removing it is hopefully
          harmless.

05/04/08: beazley
          Added extra error checking to yacc() to detect precedence rules defined
          for undefined terminal symbols.  This allows yacc() to detect a potential
          problem that can be really tricky to debug if no warning message or error
          message is generated about it.
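          For example, a hedged sketch of the kind of mistake now caught
          (token names are illustrative):

              precedence = (
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVID'),   # typo: 'DIVID' is not a declared token
              )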

05/04/08: beazley
          lex() now has an outputdir option that can specify the output directory for
          tables when running in optimize mode.  For example:

              lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

          The behavior of specifying a table module and output directory is now
          more aligned with the behavior of yacc().

05/04/08: beazley
          [Issue 9]
          Fixed a filename bug when specifying the modulename in lex() and yacc().
          If you specified options such as the following:

              parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")

          yacc would create a file "foo.bar.parsetab.py" in the given directory.
          Now, it simply generates a file "parsetab.py" in that directory.
          Bug reported by cptbinho.

05/04/08: beazley
          Slight modification to lex() and yacc() to allow their table files
          to be loaded from a previously loaded module.  This might make
          it easier to load the parsing tables from a complicated package
          structure.  For example:

              import foo.bar.spam.parsetab as parsetab
              parser = yacc.yacc(tabmodule=parsetab)

          Note: lex and yacc will never regenerate the table file if used
          in this form---you will get a warning message instead.
          This idea suggested by Brian Clapper.


04/28/08: beazley
          Fixed a bug with p_error() functions not being picked up correctly
          when running in yacc(optimize=1) mode.  Patch contributed by
          Bart Whiteley.

02/28/08: beazley
          Fixed a bug with 'nonassoc' precedence rules.  Basically the
          'nonassoc' specifier was being ignored and not producing the correct
          run-time behavior in the parser.
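          For example, an illustrative sketch of the intended behavior:

              precedence = (
                  ('nonassoc', 'LESSTHAN'),
              )
              # With the fix, input such as 'a < b < c' should now produce a
              # syntax error instead of silently chaining the comparison.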

02/16/08: beazley
          Slight relaxation of what the input() method to a lexer will
          accept as a string.  Instead of testing the input to see
          if the input is a string or unicode string, it checks to see
          if the input object looks like it contains string data.
          This change makes it possible to pass string-like objects
          in as input.  For example, the object returned by mmap.

              import mmap, os
              data = mmap.mmap(os.open(filename, os.O_RDONLY),
                               os.path.getsize(filename),
                               access=mmap.ACCESS_READ)
              lexer.input(data)


11/29/07: beazley
          Modification of ply.lex to allow token functions to be aliased.
          This is subtle, but it makes it easier to create libraries and
          to reuse token specifications.  For example, suppose you defined
          a function like this:

              def number(t):
                  r'\d+'
                  t.value = int(t.value)
                  return t

          This change would allow you to define a token rule as follows:

              t_NUMBER = number

          In this case, the token type will be set to 'NUMBER' and use
          the associated number() function to process tokens.

11/28/07: beazley
          Slight modification to lex and yacc to grab symbols from both
          the local and global dictionaries of the caller.  This
          modification allows lexers and parsers to be defined using
          inner functions and closures.
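          For example, a minimal sketch (the wrapper function is illustrative):

              def make_lexer():
                  tokens = ('NUMBER',)
                  t_NUMBER = r'\d+'
                  t_ignore = ' \t'
                  def t_error(t):
                      t.lexer.skip(1)
                  return lex.lex()   # picks up the rules defined above

              lexer = make_lexer()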

11/28/07: beazley
          Performance optimization:  The lexer.lexmatch and t.lexer
          attributes are no longer set for lexer tokens that are not
          defined by functions.  The only normal use of these attributes
          would be in lexer rules that need to perform some kind of
          special processing.  Thus, it doesn't make any sense to set
          them on every token.

          *** POTENTIAL INCOMPATIBILITY ***  This might break code
          that is mucking around with internal lexer state in some
          sort of magical way.

11/27/07: beazley
          Added the ability to put the parser into error-handling mode
          from within a normal production.  To do this, simply raise
          a yacc.SyntaxError exception like this:

              def p_some_production(p):
                  'some_production : prod1 prod2'
                  ...
                  raise yacc.SyntaxError      # Signal an error

          A number of things happen after this occurs:

          - The last symbol shifted onto the symbol stack is discarded
            and parser state backed up to what it was before the
            rule reduction.

          - The current lookahead symbol is saved and replaced by
            the 'error' symbol.

          - The parser enters error recovery mode where it tries
            to either reduce the 'error' rule or it starts
            discarding items off of the stack until the parser
            resets.

          When an error is manually set, the parser does *not* call
          the p_error() function (if any is defined).
          *** NEW FEATURE *** Suggested on the mailing list

11/27/07: beazley
          Fixed structure bug in examples/ansic.  Reported by Dion Blazakis.

11/27/07: beazley
          Fixed a bug in the lexer related to start conditions and ignored
          token rules.  If a rule was defined that changed state, but
          returned no token, the lexer could be left in an inconsistent
          state.  Reported by

11/27/07: beazley
          Modified setup.py to support Python Eggs.  Patch contributed by
          Simon Cross.

11/09/07: beazley
          Fixed a bug in error handling in yacc.  If a syntax error occurred and the
          parser rolled the entire parse stack back, the parser would be left in an
          inconsistent state that would cause it to trigger incorrect actions on
          subsequent input.  Reported by Ton Biegstraaten, Justin King, and others.

11/09/07: beazley
          Fixed a bug when passing empty input strings to yacc.parse().  This
          would result in an error message about "No input given".  Reported
          by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
          Fixed a bug with character literals if the literal '.' appeared as the
          last symbol of a grammar rule.  Reported by Ales Smrcka.

02/19/07: beazley
          Warning messages are now redirected to stderr instead of being printed
          to standard output.

02/19/07: beazley
          Added a warning message to lex.py if it detects a literal backslash
          character inside the t_ignore declaration.  This is to help catch
          problems that might occur if someone accidentally defines t_ignore
          as a Python raw string.  For example:

              t_ignore = r' \t'

          The idea for this is from an email I received from David Cimimi who
          reported bizarre behavior in lexing as a result of defining t_ignore
          as a raw string by accident.

02/18/07: beazley
          Performance improvements.  Made some changes to the internal
          table organization and LR parser to improve parsing performance.

02/18/07: beazley
          Automatic tracking of line number and position information must now be
          enabled by a special flag to parse().  For example:

              yacc.parse(data, tracking=True)

          In many applications, it's just not that important to have the
          parser automatically track all line numbers.  By making this an
          optional feature, it allows the parser to run significantly faster
          (more than a 20% speed increase in many cases).  Note: positional
          information is always available for raw tokens---this change only
          applies to positional information associated with nonterminal
          grammar symbols.
          *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
          Yacc no longer supports extended slices of grammar productions.
          However, it does support regular slices.  For example:

              def p_foo(p):
                  '''foo : a b c d e'''
                  p[0] = p[1:3]

          This change is a performance improvement to the parser---it streamlines
          normal access to the grammar values since slices are now handled in
          a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
          Fixed a bug in the handling of token names when combined with
          start conditions.  Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
          Added lexpos() and lexspan() methods to grammar symbols.  These
          mirror the same functionality of lineno() and linespan().  For
          example:

              def p_expr(p):
                  'expr : expr PLUS expr'
                  p.lexpos(1)              # Lexing position of left-hand-expression
                  p.lexpos(2)              # Lexing position of PLUS
                  start,end = p.lexspan(3) # Lexing range of right hand expression

11/01/06: beazley
          Minor change to error handling.  The recommended way to skip characters
          in the input is to use t.lexer.skip() as shown here:

              def t_error(t):
                  print "Illegal character '%s'" % t.value[0]
                  t.lexer.skip(1)

          The old approach of just using t.skip(1) will still work, but won't
          be documented.

10/31/06: beazley
          Discarded tokens can now be specified as simple strings instead of
          functions.  To do this, simply include the text "ignore_" in the
          token declaration.  For example:

              t_ignore_cppcomment = r'//.*'

          Previously, this had to be done with a function.  For example:

              def t_ignore_cppcomment(t):
                  r'//.*'
                  pass

          If start conditions/states are being used, state names should appear
          before the "ignore_" text.

10/19/06: beazley
          The Lex module now provides support for flex-style start conditions
          as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
          Please refer to this document to understand this change note.  Refer to
          the PLY documentation for a PLY-specific explanation of how this works.

          To use start conditions, you first need to declare a set of states in
          your lexer file:

              states = (
                  ('foo','exclusive'),
                  ('bar','inclusive')
              )

          This serves the same role as the %s and %x specifiers in flex.

          Once a state has been declared, tokens for that state can be
          declared by defining rules of the form t_state_TOK.  For example:

              t_PLUS = '\+'          # Rule defined in INITIAL state
              t_foo_NUM = '\d+'      # Rule defined in foo state
              t_bar_NUM = '\d+'      # Rule defined in bar state

              t_foo_bar_NUM = '\d+'  # Rule defined in both foo and bar
              t_ANY_NUM = '\d+'      # Rule defined in all states

          In addition to defining tokens for each state, the t_ignore and t_error
          specifications can be customized for specific states.  For example:

              t_foo_ignore = " "     # Ignored characters for foo state

              def t_bar_error(t):
                  # Handle errors in bar state
                  t.lexer.skip(1)

          With token rules, the following methods can be used to change states:

              def t_TOKNAME(t):
                  t.lexer.begin('foo')        # Begin state 'foo'
                  t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                              # onto a stack
                  t.lexer.pop_state()         # Restore previous state
                  t.lexer.current_state()     # Returns name of current state

          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
          yy_top_state() functions in flex.

          Start states can be used as one way to write sub-lexers.
          For example, the parser might instruct the lexer to start
          generating a different set of tokens depending on the context.

          example/yply/ylex.py shows the use of start states to grab C/C++
          code fragments out of traditional yacc specification files.

          *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
          discussed various aspects of the design.

10/19/06: beazley
          Minor change to the way in which yacc.py was reporting shift/reduce
          conflicts.  Although the underlying LALR(1) algorithm was correct,
          PLY was under-reporting the number of conflicts compared to yacc/bison
          when precedence rules were in effect.  This change should make PLY
          report the same number of conflicts as yacc.

10/19/06: beazley
          Modified yacc so that grammar rules could also include the '-'
          character.  For example:

              def p_expr_list(p):
                  'expression-list : expression-list expression'

          Suggested by Oldrich Jedlicka.

10/18/06: beazley
          Attribute lexer.lexmatch added so that token rules can access the re
          match object that was generated.  For example:

              def t_FOO(t):
                  r'some regex'
                  m = t.lexer.lexmatch
                  # Do something with m

          This may be useful if you want to access named groups specified within
          the regex for a specific token.  Suggested by Oldrich Jedlicka.
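          For instance, a hedged sketch using a hypothetical named group:

              def t_ASSIGN(t):
                  r'(?P<name>[A-Za-z_]\w*)\s*='
                  t.value = t.lexer.lexmatch.group('name')   # just the identifier
                  return t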

10/16/06: beazley
          Changed the error message that results if an illegal character
          is encountered and no default error function is defined in lex.
          The exception is now more informative about the actual cause of
          the error.

Version 2.1
------------------------------
10/02/06: beazley
          The last Lexer object built by lex() can be found in lex.lexer.
          The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
          New example added:  examples/yply

          This example uses PLY to convert Unix-yacc specification files to
          PLY programs with the same grammar.  This may be useful if you
          want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
          Added support for a start symbol to be specified in the yacc
          input file itself.  Just do this:

              start = 'name'

          where 'name' matches some grammar rule.  For example:

              def p_name(p):
                  'name : A B C'
                  ...

          This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
          Some new examples added:

          examples/GardenSnake : A simple indentation based language similar
                                 to Python.  Shows how you might handle
                                 whitespace.  Contributed by Andrew Dalke.

          examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                                 Contributed by Dave against his better
                                 judgement.

09/28/06: beazley
          Minor patch to allow named groups to be used in lex regular
          expression rules.  For example:

              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

          Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method.  To use SLR, use
          yacc.yacc(method="SLR").  Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR.  However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking.  To modify line numbers, modify
          the line number of the lexer itself.  For example:

              def t_NEWLINE(t):
                  r'\n'
                  t.lexer.lineno += 1

          This modification is both a cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number.  This extra processing is unnecessary for a vast
          majority of tokens.  Thus, this new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number.  For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos.  This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).
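          For example, a minimal sketch of a column computation (the helper
          name is illustrative):

              def find_column(text, token):
                  # Scan backwards from lexpos to the most recent newline.
                  line_start = text.rfind('\n', 0, token.lexpos) + 1
                  return (token.lexpos - line_start) + 1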

09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone().  This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).
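          For example, an illustrative usage:

              lexer = lex.lex()
              sublexer = lexer.clone()           # independent copy of the lexer
              sublexer.input("some other text")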

09/21/06: beazley
          Limitations related to the use of the re module have been eliminated.
          Several users reported problems with regular expressions exceeding
          more than 100 named groups.  To solve this, lex.py is now capable
          of automatically splitting its master regular expression into
          smaller expressions as needed.  This should, in theory, make it
          possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
          Improved error checking in lex.py.  Rules that match the empty string
          are now rejected (otherwise they cause the lexer to enter an infinite
          loop).  An extra check for rules containing '#' has also been added.
          Since lex compiles regular expressions in verbose mode, '#' is interpreted
          as a regex comment; it is critical to use '\#' instead.
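          For example, a one-line sketch (the token name is illustrative):

              t_HASH = r'\#'    # a bare '#' would start a verbose-mode comment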

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

              digit      = r'([0-9])'
              nondigit   = r'([_A-Za-z])'
              identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

              from ply.lex import TOKEN

              @TOKEN(identifier)
              def t_ID(t):
                  # Do whatever
                  return t

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

              def t_ID(t):
                  # Do whatever
                  return t

              t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later.  If compatibility
          with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes.  For example:

              def p_expr(p):
                  "expr : expr '+' expr"
                  ...

              def p_expr(p):
                  'expr : expr "-" expr'
                  ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters.  This is done by defining the variable 'literals'
          as a list of characters.  This should be defined in the module that
          invokes the lex.lex() function.  For example:

              literals = ['+','-','*','/','(',')','=']

          or simply

              literals = '+=*/()='

          It is important to note that literals can only be a single character.
          When the lexer fails to match a token using its normal regular expression
          rules, it will check the current character against the literal list.
          If found, it will be returned with a token type set to match the literal
          character.  Otherwise, an illegal character will be signalled.

09/14/06: beazley
          Modified PLY to install itself as a proper Python package called 'ply'.
          This will make it a little more friendly to other modules.  This
          changes the usage of PLY only slightly.  Just do this to import the
          modules:

              import ply.lex as lex
              import ply.yacc as yacc

          Alternatively, you can do this:

              from ply import *

          which imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production rules.
          A negative production index now accesses already parsed symbols on the
          parsing stack.  For example,

              def p_foo(p):
                  "foo : A B C D"
                  print p[1]       # Value of 'A' symbol
                  print p[2]       # Value of 'B' symbol
                  print p[-1]      # Value of whatever symbol appears before A
                                   # on the parsing stack.

                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within the
          parsing rules.  For example, in C-yacc, it is possible to write code like
          this:

              bar:   A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has been
          parsed.  Within the embedded action code, $1 refers to the A symbol on
          the stack.

          To perform this equivalent action in PLY, you need to write a pair
          of rules like this:

              def p_bar(p):
                  "bar : A seen_A B"
                  do_stuff

              def p_seen_A(p):
                  "seen_A :"
                  print "seen an A =", p[-1]

          The second rule "seen_A" is merely an empty production which should be
          reduced as soon as A is parsed in the "bar" rule above.  The use
          of the negative index p[-1] is used to access whatever symbol appeared
          before the seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

              def p_decl(p):
                  "decl : scope name"

              def p_scope(p):
                  """scope : GLOBAL
                           | LOCAL"""
                  p[0] = p[1]

              def p_name(p):
                  "name : ID"
                  if p[-1] == "GLOBAL":
                      # ...
                  elif p[-1] == "LOCAL":
                      # ...

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar rules,
          your code will break.  This should be extremely rare, if non-existent, in
          most cases.  The argument to various grammar rules is not usually
          processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code.  Both SLR
          and LALR(1) table generation is now performed by the same code base with
          only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use the
          DeRemer and Pennello algorithm for calculating lookahead sets.  This
          significantly improves the performance of generating LALR(1) tables
          and has the added feature of actually working correctly!  If you
          experienced weird behavior with LALR(1) in prior releases, this should
          hopefully resolve all of those problems.  Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in LALR(1)
          parsing.  If you experienced subtle and/or bizarre behavior when trying
          to use the LALR(1) engine, this may correct those problems.  Patch
          contributed by Russ Cox.  Note: This patch has been superseded by
          revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed an infinite recursion problem in the ReduceToTerminals() function
          that would sometimes come up in LALR(1) table generation.  Reported by
          Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex().  For example:

              lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile() function
          used inside the lexer.  This may be necessary for special situations such
          as processing Unicode (e.g., if you want escapes like \w and \b to consult
          the Unicode character property database).  The need for this suggested by
          Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated instantiations of parser
          objects when the write_tables=0 argument was used.  Reported by Michael Brown.

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular expressions for
          tokens and as input.  Hopefully this is the only change needed for Unicode support.
          Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or old-style classes.
          Patch contributed by Michael Brown (although I tweaked it slightly so it would work
          with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an extremely
          devious bug in LALR(1) parser generation.  This patch should fix problems
          numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor.  Reported by Dave Aitel, Aaron Lav,
          and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control output directory.  Contributed
          by Christopher Stawarz.

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest module.
          Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement.  LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley.  To use LALR(1) parsing do
          the following:

              yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method.  However, LALR(1) allows you to handle
          more complex grammars.  For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists.  Can
          be used in parser rules like this:

              def p_somerule(p):
                  """a : B C D
                       | E F"""
                  if (len(p) == 4):
                      pass   # Must have been first rule
                  elif (len(p) == 3):
                      pass   # Must be second rule

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

          0. Cleans up some comments so they don't wrap on an 80-column display.
          1. Directs compiler errors to stderr where they belong.
          2. Implements and documents automatic line counting when \n is ignored.
          3. Changes the way progress messages are dumped when debugging is on.
             The new format is both less verbose and conveys more information than
             the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation.  Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          -  Some output is now only shown when debugging is enabled.  This
             means that PLY will be completely silent when not in debugging mode.

          -  An optional parameter "write_tables" can be passed to yacc() to
             control whether or not parsing tables are written.  By default,
             it is true, but it can be turned off if you don't want the yacc
             table file.  Note: disabling this will cause yacc() to regenerate
             the parsing table each time, as in the sketch below.
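             For example (illustrative only):

                 parser = yacc.yacc(write_tables=0)   # tables rebuilt on every run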

04/23/04: beazley
          Added patches contributed by David McNab.  This patch adds two
          features:

          -  The parser can be supplied as a class instead of a module.
             For an example of this, see the example/classcalc directory.

          -  Debugging output can be directed to a filename of the user's
             choice.  Use

                 yacc(debugfile="somefile.out")


Version 1.3
------------------------------
12/10/02: jmdyck
          Various minor adjustments to the code that Dave checked in today.
          Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
          Incorporated a variety of minor bug fixes to empty production
          handling and infinite recursion checking.  Contributed by
          Michael Dyck.

12/10/02: beazley
          Removed bogus recover() method call in yacc.restart()

Version 1.2
------------------------------
11/27/02: beazley
          Lexer and parser objects are now available as an attribute
          of tokens and slices respectively.  For example:

              def t_NUMBER(t):
                  r'\d+'
                  print t.lexer

              def p_expr_plus(t):
                  'expr : expr PLUS expr'
                  print t.lexer
                  print t.parser

          This can be used for state management (if needed).

10/31/02: beazley
          Modified yacc.py to work with Python optimize mode.  To make
          this work, you need to use

              yacc.yacc(optimize=1)

          Furthermore, you need to first run Python in normal mode
          to generate the necessary parsetab.py files.  After that,
          you can use python -O or python -OO.

          Note: optimized mode turns off a lot of error checking.
          Only use when you are sure that your grammar is working.
          Make sure parsetab.py is up to date!

10/30/02: beazley
          Added cloning of Lexer objects.  For example:

              import copy
              l = lex.lex()
              lc = copy.copy(l)

              l.input("Some text")
              lc.input("Some other text")
              ...

          This might be useful if the same "lexer" is meant to
          be used in different contexts---or if multiple lexers
          are running concurrently.

10/30/02: beazley
          Fixed subtle bug with first set computation and empty productions.
          Patch submitted by Michael Dyck.

10/30/02: beazley
          Fixed error messages to use "filename:line: message" instead
          of "filename:line. message".  This makes error reporting more
          friendly to emacs.  Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file.  Terminals and nonterminals
          are sorted instead of being printed in random order.
          Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file output.  Rules are now printed
          in a way that's easier to understand.  Contributed by Russ Cox.

10/30/02: beazley
          Added 'nonassoc' associativity support.  This can be used
          to disable the chaining of operators like a < b < c.
          To use, simply specify 'nonassoc' in the precedence table:

              precedence = (
                  ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVIDE'),
                  ('right', 'UMINUS'),                      # Unary minus operator
              )

          Patch contributed by Russ Cox.

10/30/02: beazley
          Modified the lexer to provide optional support for Python -O and -OO
          modes.  To make this work, Python *first* needs to be run in
          unoptimized mode.  This reads the lexing information and creates a
          file "lextab.py".  Then, run lex like this:

              # module foo.py
              ...
              ...
              lex.lex(optimize=1)

          Once the lextab file has been created, subsequent calls to
          lex.lex() will read data from the lextab file instead of using
          introspection.  In optimized mode (-O, -OO) everything should
          work normally despite the loss of doc strings.

          To change the name of the file 'lextab.py' use the following:

              lex.lex(lextab="footab")

          (this creates a file footab.py)


Version 1.1   October 25, 2001
------------------------------

10/25/01: beazley
          Modified the table generator to produce much more compact data.
          This should greatly reduce the size of the parsetab.py[c] file.
          Caveat: the tables still need to be constructed so a little more
          work is done in parsetab on import.

10/25/01: beazley
          There may be a possible bug in the cycle detector that reports errors
          about infinite recursion.  I'm having a little trouble tracking it
          down, but if you get this problem, you can disable the cycle
          detector as follows:

              yacc.yacc(check_recursion = 0)

10/25/01: beazley
          Fixed a bug in lex.py that sometimes caused illegal characters to be
          reported incorrectly.  Reported by Sverre Jørgensen.

7/8/01  : beazley
          Added a reference to the underlying lexer object when tokens are handled by
          functions.  The lexer is available as the 'lexer' attribute.  This
          was added to provide better lexing support for languages such as Fortran
          where certain types of tokens can't be conveniently expressed as regular
          expressions (and where the tokenizing function may want to perform a
          little backtracking).  Suggested by Pearu Peterson.

6/20/01 : beazley
          Modified yacc() function so that an optional starting symbol can be specified.
          For example:

              yacc.yacc(start="statement")

          Normally yacc always treats the first production rule as the starting symbol.
          However, if you are debugging your grammar it may be useful to specify
          an alternative starting symbol.  Idea suggested by Rich Salz.

Version 1.0  June 18, 2001
--------------------------
Initial public offering
