Version 3.2
-----------------------------
03/24/09: beazley
          Added an extra check to not print duplicated warning messages
          about reduce/reduce conflicts.

03/24/09: beazley
          Switched PLY over to a BSD license.

03/23/09: beazley
          Performance optimization.  Discovered a few places to make
          speedups in LR table generation.

03/23/09: beazley
          New warning message.  PLY now warns about rules never
          reduced due to reduce/reduce conflicts.  Suggested by
          Bruce Frederiksen.

03/23/09: beazley
          Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
          Added a new picklefile option to yacc() to write the parsing
          tables to a filename using the pickle module.  Here is how
          it works:

              yacc(picklefile="parsetab.p")

          This option can be used if the normal parsetab.py file is
          extremely large.  For example, on jython, it is impossible
          to read the parsing tables if parsetab.py exceeds a certain
          threshold.

          The filename supplied to the picklefile option is opened
          relative to the current working directory of the Python
          interpreter.  If you need to refer to the file elsewhere,
          you will need to supply an absolute or relative path.

          For maximum portability, the pickle file is written
          using protocol 0.

03/13/09: beazley
          Fixed a bug in parser.out generation where the rule numbers
          were off by one.

03/13/09: beazley
          Fixed a string formatting bug with one of the error messages.
          Reported by Richard Reitmeyer.

Version 3.1
-----------------------------
02/28/09: beazley
          Fixed broken start argument to yacc().  PLY-3.0 broke this
          feature by accident.

02/28/09: beazley
          Fixed debugging output.  yacc() no longer reports shift/reduce
          or reduce/reduce conflicts if debugging is turned off.  This
          restores behavior similar to PLY-2.5.  Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
          Fixed missing lexer attribute on certain tokens when
          invoking the parser p_error() function.  Reported by
          Bart Whiteley.

02/02/09: beazley
          The lex() command now does all error-reporting and diagnostics
          using the logging module interface.  Pass in a Logger object
          using the errorlog parameter to specify a different logger.

02/02/09: beazley
          Refactored ply.lex to use a more object-oriented and organized
          approach to collecting lexer information.

02/01/09: beazley
          Removed the nowarn option from lex().  All output is controlled
          by passing in a logger object.  Just pass in a logger with a high
          level setting to suppress output.  This argument was never
          documented to begin with, so hopefully no one was relying upon it.
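          For example, a minimal sketch (added for illustration; the logger
          name is arbitrary):

              import logging
              quiet = logging.getLogger("ply.quiet")
              quiet.setLevel(logging.CRITICAL)   # high level suppresses warnings
              lexer = lex.lex(errorlog=quiet)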

02/01/09: beazley
          Discovered and removed a dead if-statement in the lexer.  This
          resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
          Minor change to the procedure for signalling a syntax error in a
          production rule.  A normal SyntaxError exception should be raised
          instead of yacc.SyntaxError.
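          For example, a hedged sketch (the rule and the check are hypothetical):

              def p_statement(p):
                  'statement : expression SEMI'
                  if p[1] is None:            # some application-specific check
                      raise SyntaxError       # formerly: raise yacc.SyntaxError
                  p[0] = p[1]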

01/13/09: beazley
          Added a new method p.set_lineno(n,lineno) that can be used to set the
          line number of symbol n in grammar rules.  This simplifies manual
          tracking of line numbers.
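          For example, an illustrative sketch (the rule is hypothetical):

              def p_expression(p):
                  'expression : expression PLUS expression'
                  p[0] = p[1] + p[3]
                  # Give the result the line number of the left operand.
                  p.set_lineno(0, p.lineno(1))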

01/11/09: beazley
          Vastly improved debugging support for yacc.parse().  Instead of passing
          debug as an integer, you can supply a Logger object (see the logging
          module).  Messages will be generated at the ERROR, INFO, and DEBUG
          logging levels, each level providing progressively more information.
          The debugging trace also shows states, grammar rules, values passed
          into grammar rules, and the result of each reduction.
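          For example, a minimal sketch (logger configuration is illustrative):

              import logging
              logging.basicConfig(level=logging.DEBUG, filename="parse.log")
              yacc.parse(data, debug=logging.getLogger())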

01/09/09: beazley
          The yacc() command now does all error-reporting and diagnostics using
          the interface of the logging module.  Use the errorlog parameter to
          specify a logging object for error messages.  Use the debuglog parameter
          to specify a logging object for the 'parser.out' output.
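          For example, a hedged sketch (logger names are arbitrary):

              import logging
              parser = yacc.yacc(errorlog=logging.getLogger("ply.errors"),
                                 debuglog=logging.getLogger("ply.debug"))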

01/09/09: beazley
          *HUGE* refactoring of the ply.yacc() implementation.  The high-level
          user interface is backwards compatible, but the internals are completely
          reorganized into classes.  No more global variables.  The internals
          are also more extensible.  For example, you can use the classes to
          construct a LALR(1) parser in an entirely different manner than
          what is currently the case.  Documentation is forthcoming.

01/07/09: beazley
          Various cleanup and refactoring of yacc internals.

01/06/09: beazley
          Fixed a bug with precedence assignment.  yacc was assigning the precedence
          of each rule based on the left-most token, when in fact it should have been
          using the right-most token.  Reported by Bruce Frederiksen.

11/27/08: beazley
          Numerous changes to support Python 3.0 including removal of deprecated
          statements (e.g., has_key) and the addition of compatibility code
          to emulate features from Python 2 that have been removed, but which
          are needed.  Fixed the unit testing suite to work with Python 3.0.
          The code should be backwards compatible with Python 2.

11/26/08: beazley
          Loosened the rules on what kind of objects can be passed in as the
          "module" parameter to lex() and yacc().  Previously, you could only use
          a module or an instance.  Now, PLY just uses dir() to get a list of
          symbols on whatever the object is without regard for its type.
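          For example, a hedged sketch (the class is illustrative; any object
          whose dir() exposes the usual t_* names should work):

              class LexerSpec(object):
                  tokens = ('NUMBER',)
                  t_NUMBER = r'\d+'
                  t_ignore = ' \t'
                  def t_error(self, t):
                      t.lexer.skip(1)

              lexer = lex.lex(module=LexerSpec())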

11/26/08: beazley
          Changed all except: statements to be compatible with Python 2.x/3.x syntax.

11/26/08: beazley
          Changed all raise Exception, value statements to raise Exception(value) for
          forward compatibility.

11/26/08: beazley
          Removed all print statements from lex and yacc, using sys.stdout and sys.stderr
          directly.  Preparation for Python 3.0 support.

11/04/08: beazley
          Fixed a bug with referring to symbols on the parsing stack using negative
          indices.

05/29/08: beazley
          Completely revamped the testing system to use the unittest module for everything.
          Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
          Fixed a bug with writing lex-tables in optimized mode and start states.
          Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
          A version number is now embedded in the table file signature so that
          yacc can more gracefully accommodate changes to the output format
          in the future.

05/04/08: beazley
          Removed undocumented .pushback() method on grammar productions.  I'm
          not sure this ever worked and can't recall ever using it.  Might have
          been an abandoned idea that never really got fleshed out.  This
          feature was never described or tested, so removing it is hopefully
          harmless.

05/04/08: beazley
          Added extra error checking to yacc() to detect precedence rules defined
          for undefined terminal symbols.  This allows yacc() to detect a potential
          problem that can be really tricky to debug if no warning message or error
          message is generated about it.
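          For example, a hedged sketch of the kind of mistake now caught
          (token names are illustrative):

              precedence = (
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVID'),   # typo: 'DIVID' is not a declared token
              )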

05/04/08: beazley
          lex() now has an outputdir option that can specify the output directory for
          tables when running in optimize mode.  For example:

              lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

          The behavior of specifying a table module and output directory is now
          more aligned with the behavior of yacc().

05/04/08: beazley
          [Issue 9]
          Fixed a filename bug when specifying the modulename in lex() and yacc().
          If you specified options such as the following:

              parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")

          yacc would create a file "foo.bar.parsetab.py" in the given directory.
          Now, it simply generates a file "parsetab.py" in that directory.
          Bug reported by cptbinho.

05/04/08: beazley
          Slight modification to lex() and yacc() to allow their table files
          to be loaded from a previously loaded module.  This might make
          it easier to load the parsing tables from a complicated package
          structure.  For example:

              import foo.bar.spam.parsetab as parsetab
              parser = yacc.yacc(tabmodule=parsetab)

          Note: lex and yacc will never regenerate the table file if used
          in this form---you will get a warning message instead.
          This idea suggested by Brian Clapper.


04/28/08: beazley
          Fixed a bug with p_error() functions not being picked up correctly
          when running in yacc(optimize=1) mode.  Patch contributed by
          Bart Whiteley.

02/28/08: beazley
          Fixed a bug with 'nonassoc' precedence rules.  Basically the
          'nonassoc' specifier was being ignored and not producing the correct
          run-time behavior in the parser.
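          For example, an illustrative sketch of the intended behavior:

              precedence = (
                  ('nonassoc', 'LESSTHAN'),
              )
              # With the fix, input such as 'a < b < c' should now produce a
              # syntax error instead of silently chaining the comparison.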

02/16/08: beazley
          Slight relaxation of what the input() method to a lexer will
          accept as a string.  Instead of testing the input to see
          if the input is a string or unicode string, it checks to see
          if the input object looks like it contains string data.
          This change makes it possible to pass string-like objects
          in as input.  For example, the object returned by mmap.

              import mmap, os
              data = mmap.mmap(os.open(filename, os.O_RDONLY),
                               os.path.getsize(filename),
                               access=mmap.ACCESS_READ)
              lexer.input(data)


11/29/07: beazley
          Modification of ply.lex to allow token functions to be aliased.
          This is subtle, but it makes it easier to create libraries and
          to reuse token specifications.  For example, suppose you defined
          a function like this:

              def number(t):
                  r'\d+'
                  t.value = int(t.value)
                  return t

          This change would allow you to define a token rule as follows:

              t_NUMBER = number

          In this case, the token type will be set to 'NUMBER' and use
          the associated number() function to process tokens.

11/28/07: beazley
          Slight modification to lex and yacc to grab symbols from both
          the local and global dictionaries of the caller.  This
          modification allows lexers and parsers to be defined using
          inner functions and closures.
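          For example, a minimal sketch (the wrapper function is illustrative):

              def make_lexer():
                  tokens = ('NUMBER',)
                  t_NUMBER = r'\d+'
                  t_ignore = ' \t'
                  def t_error(t):
                      t.lexer.skip(1)
                  return lex.lex()   # picks up the rules defined above

              lexer = make_lexer()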

11/28/07: beazley
          Performance optimization:  The lexer.lexmatch and t.lexer
          attributes are no longer set for lexer tokens that are not
          defined by functions.  The only normal use of these attributes
          would be in lexer rules that need to perform some kind of
          special processing.  Thus, it doesn't make any sense to set
          them on every token.

          *** POTENTIAL INCOMPATIBILITY ***  This might break code
          that is mucking around with internal lexer state in some
          sort of magical way.

11/27/07: beazley
          Added the ability to put the parser into error-handling mode
          from within a normal production.  To do this, simply raise
          a yacc.SyntaxError exception like this:

              def p_some_production(p):
                  'some_production : prod1 prod2'
                  ...
                  raise yacc.SyntaxError      # Signal an error

          A number of things happen after this occurs:

          - The last symbol shifted onto the symbol stack is discarded
            and parser state backed up to what it was before the
            rule reduction.

          - The current lookahead symbol is saved and replaced by
            the 'error' symbol.

          - The parser enters error recovery mode where it tries
            to either reduce the 'error' rule or it starts
            discarding items off of the stack until the parser
            resets.

          When an error is manually set, the parser does *not* call
          the p_error() function (if any is defined).
          *** NEW FEATURE *** Suggested on the mailing list

11/27/07: beazley
          Fixed structure bug in examples/ansic.  Reported by Dion Blazakis.

11/27/07: beazley
          Fixed a bug in the lexer related to start conditions and ignored
          token rules.  If a rule was defined that changed state, but
          returned no token, the lexer could be left in an inconsistent
          state.  Reported by

11/27/07: beazley
          Modified setup.py to support Python Eggs.  Patch contributed by
          Simon Cross.

11/09/07: beazley
          Fixed a bug in error handling in yacc.  If a syntax error occurred and the
          parser rolled the entire parse stack back, the parser would be left in an
          inconsistent state that would cause it to trigger incorrect actions on
          subsequent input.  Reported by Ton Biegstraaten, Justin King, and others.

11/09/07: beazley
          Fixed a bug when passing empty input strings to yacc.parse().  This
          would result in an error message about "No input given".  Reported
          by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
          Fixed a bug with character literals if the literal '.' appeared as the
          last symbol of a grammar rule.  Reported by Ales Smrcka.

02/19/07: beazley
          Warning messages are now redirected to stderr instead of being printed
          to standard output.

02/19/07: beazley
          Added a warning message to lex.py if it detects a literal backslash
          character inside the t_ignore declaration.  This is to help catch
          problems that might occur if someone accidentally defines t_ignore
          as a Python raw string.  For example:

              t_ignore = r' \t'

          The idea for this is from an email I received from David Cimimi who
          reported bizarre behavior in lexing as a result of defining t_ignore
          as a raw string by accident.

02/18/07: beazley
          Performance improvements.  Made some changes to the internal
          table organization and LR parser to improve parsing performance.

02/18/07: beazley
          Automatic tracking of line number and position information must now be
          enabled by a special flag to parse().  For example:

              yacc.parse(data, tracking=True)

          In many applications, it's just not that important to have the
          parser automatically track all line numbers.  By making this an
          optional feature, it allows the parser to run significantly faster
          (more than a 20% speed increase in many cases).  Note: positional
          information is always available for raw tokens---this change only
          applies to positional information associated with nonterminal
          grammar symbols.
          *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
          Yacc no longer supports extended slices of grammar productions.
          However, it does support regular slices.  For example:

              def p_foo(p):
                  '''foo : a b c d e'''
                  p[0] = p[1:3]

          This change is a performance improvement to the parser---it streamlines
          normal access to the grammar values since slices are now handled in
          a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
          Fixed a bug in the handling of token names when combined with
          start conditions.  Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
          Added lexpos() and lexspan() methods to grammar symbols.  These
          mirror the same functionality of lineno() and linespan().  For
          example:

              def p_expr(p):
                  'expr : expr PLUS expr'
                  p.lexpos(1)              # Lexing position of left-hand-expression
                  p.lexpos(2)              # Lexing position of PLUS
                  start,end = p.lexspan(3) # Lexing range of right hand expression

11/01/06: beazley
          Minor change to error handling.  The recommended way to skip characters
          in the input is to use t.lexer.skip() as shown here:

              def t_error(t):
                  print "Illegal character '%s'" % t.value[0]
                  t.lexer.skip(1)

          The old approach of just using t.skip(1) will still work, but won't
          be documented.

10/31/06: beazley
          Discarded tokens can now be specified as simple strings instead of
          functions.  To do this, simply include the text "ignore_" in the
          token declaration.  For example:

              t_ignore_cppcomment = r'//.*'

          Previously, this had to be done with a function.  For example:

              def t_ignore_cppcomment(t):
                  r'//.*'
                  pass

          If start conditions/states are being used, state names should appear
          before the "ignore_" text.

10/19/06: beazley
          The Lex module now provides support for flex-style start conditions
          as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
          Please refer to this document to understand this change note.  Refer to
          the PLY documentation for a PLY-specific explanation of how this works.

          To use start conditions, you first need to declare a set of states in
          your lexer file:

              states = (
                  ('foo','exclusive'),
                  ('bar','inclusive')
              )

          This serves the same role as the %s and %x specifiers in flex.

          Once a state has been declared, tokens for that state can be
          declared by defining rules of the form t_state_TOK.  For example:

              t_PLUS = '\+'          # Rule defined in INITIAL state
              t_foo_NUM = '\d+'      # Rule defined in foo state
              t_bar_NUM = '\d+'      # Rule defined in bar state

              t_foo_bar_NUM = '\d+'  # Rule defined in both foo and bar
              t_ANY_NUM = '\d+'      # Rule defined in all states

          In addition to defining tokens for each state, the t_ignore and t_error
          specifications can be customized for specific states.  For example:

              t_foo_ignore = " "     # Ignored characters for foo state

              def t_bar_error(t):
                  # Handle errors in bar state
                  t.lexer.skip(1)

          With token rules, the following methods can be used to change states:

              def t_TOKNAME(t):
                  t.lexer.begin('foo')        # Begin state 'foo'
                  t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                              # onto a stack
                  t.lexer.pop_state()         # Restore previous state
                  t.lexer.current_state()     # Returns name of current state

          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
          yy_top_state() functions in flex.

          Start states can be used as one way to write sub-lexers.
          For example, the parser might instruct the lexer to start
          generating a different set of tokens depending on the context.

          example/yply/ylex.py shows the use of start states to grab C/C++
          code fragments out of traditional yacc specification files.

          *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
          discussed various aspects of the design.

10/19/06: beazley
          Minor change to the way in which yacc.py was reporting shift/reduce
          conflicts.  Although the underlying LALR(1) algorithm was correct,
          PLY was under-reporting the number of conflicts compared to yacc/bison
          when precedence rules were in effect.  This change should make PLY
          report the same number of conflicts as yacc.

10/19/06: beazley
          Modified yacc so that grammar rules could also include the '-'
          character.  For example:

              def p_expr_list(p):
                  'expression-list : expression-list expression'

          Suggested by Oldrich Jedlicka.

10/18/06: beazley
          Attribute lexer.lexmatch added so that token rules can access the re
          match object that was generated.  For example:

              def t_FOO(t):
                  r'some regex'
                  m = t.lexer.lexmatch
                  # Do something with m

          This may be useful if you want to access named groups specified within
          the regex for a specific token.  Suggested by Oldrich Jedlicka.
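          For instance, a hedged sketch using a hypothetical named group:

              def t_ASSIGN(t):
                  r'(?P<name>[A-Za-z_]\w*)\s*='
                  t.value = t.lexer.lexmatch.group('name')   # just the identifier
                  return t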

10/16/06: beazley
          Changed the error message that results if an illegal character
          is encountered and no default error function is defined in lex.
          The exception is now more informative about the actual cause of
          the error.

Version 2.1
------------------------------
10/02/06: beazley
          The last Lexer object built by lex() can be found in lex.lexer.
          The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
          New example added:  examples/yply

          This example uses PLY to convert Unix-yacc specification files to
          PLY programs with the same grammar.  This may be useful if you
          want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
          Added support for a start symbol to be specified in the yacc
          input file itself.  Just do this:

              start = 'name'

          where 'name' matches some grammar rule.  For example:

              def p_name(p):
                  'name : A B C'
                  ...

          This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
          Some new examples added:

          examples/GardenSnake : A simple indentation based language similar
                                 to Python.  Shows how you might handle
                                 whitespace.  Contributed by Andrew Dalke.

          examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                                 Contributed by Dave against his better
                                 judgement.

09/28/06: beazley
          Minor patch to allow named groups to be used in lex regular
          expression rules.  For example:

              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

          Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method.  To use SLR, use
          yacc.yacc(method="SLR").  Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR.  However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking.  To modify line numbers, modify
          the line number of the lexer itself.  For example:

              def t_NEWLINE(t):
                  r'\n'
                  t.lexer.lineno += 1

          This modification is both a cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number.  This extra processing is unnecessary for a vast
          majority of tokens.  Thus, this new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number.  For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos.  This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).
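          For example, a minimal sketch of a column computation (the helper
          name is illustrative):

              def find_column(text, token):
                  # Scan backwards from lexpos to the most recent newline.
                  line_start = text.rfind('\n', 0, token.lexpos) + 1
                  return (token.lexpos - line_start) + 1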

09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone().  This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).
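          For example, an illustrative usage:

              lexer = lex.lex()
              sublexer = lexer.clone()           # independent copy of the lexer
              sublexer.input("some other text")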

09/21/06: beazley
          Limitations related to the use of the re module have been eliminated.
          Several users reported problems with regular expressions exceeding
          more than 100 named groups.  To solve this, lex.py is now capable
          of automatically splitting its master regular expression into
          smaller expressions as needed.  This should, in theory, make it
          possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
          Improved error checking in lex.py.  Rules that match the empty string
          are now rejected (otherwise they cause the lexer to enter an infinite
          loop).  An extra check for rules containing '#' has also been added.
          Since lex compiles regular expressions in verbose mode, '#' is interpreted
          as a regex comment; it is critical to use '\#' instead.
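          For example, a one-line sketch (the token name is illustrative):

              t_HASH = r'\#'    # a bare '#' would start a verbose-mode comment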

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

              digit      = r'([0-9])'
              nondigit   = r'([_A-Za-z])'
              identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

              from ply.lex import TOKEN

              @TOKEN(identifier)
              def t_ID(t):
                  # Do whatever
                  return t

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

              def t_ID(t):
                  # Do whatever
                  return t

              t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later.  If compatibility
          with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes.  For example:

              def p_expr(p):
                  "expr : expr '+' expr"
                  ...

              def p_expr(p):
                  'expr : expr "-" expr'
                  ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters.  This is done by defining the variable 'literals'
          as a list of characters.  This should be defined in the module that
          invokes the lex.lex() function.  For example:

              literals = ['+','-','*','/','(',')','=']

          or simply

              literals = '+=*/()='

          It is important to note that literals can only be a single character.
          When the lexer fails to match a token using its normal regular expression
          rules, it will check the current character against the literal list.
          If found, it will be returned with a token type set to match the literal
          character.  Otherwise, an illegal character will be signalled.

09/14/06: beazley
          Modified PLY to install itself as a proper Python package called 'ply'.
          This will make it a little more friendly to other modules.  This
          changes the usage of PLY only slightly.  Just do this to import the
          modules:

              import ply.lex as lex
              import ply.yacc as yacc

          Alternatively, you can do this:

              from ply import *

          which imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production rules.
          A negative production index now accesses already parsed symbols on the
          parsing stack.  For example,

              def p_foo(p):
                  "foo : A B C D"
                  print p[1]       # Value of 'A' symbol
                  print p[2]       # Value of 'B' symbol
                  print p[-1]      # Value of whatever symbol appears before A
                                   # on the parsing stack.

                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within the
          parsing rules.  For example, in C-yacc, it is possible to write code like
          this:

              bar:   A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has been
          parsed.  Within the embedded action code, $1 refers to the A symbol on
          the stack.

          To perform this equivalent action in PLY, you need to write a pair
          of rules like this:

              def p_bar(p):
                  "bar : A seen_A B"
                  do_stuff

              def p_seen_A(p):
                  "seen_A :"
                  print "seen an A =", p[-1]

          The second rule "seen_A" is merely an empty production which should be
          reduced as soon as A is parsed in the "bar" rule above.  The use
          of the negative index p[-1] is used to access whatever symbol appeared
          before the seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

              def p_decl(p):
                  "decl : scope name"

              def p_scope(p):
                  """scope : GLOBAL
                           | LOCAL"""
                  p[0] = p[1]

              def p_name(p):
                  "name : ID"
                  if p[-1] == "GLOBAL":
                      # ...
                  elif p[-1] == "LOCAL":
                      # ...

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar rules,
          your code will break.  This should be extremely rare, if non-existent, in
          most cases.  The argument to various grammar rules is not usually
          processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code.  Both SLR
          and LALR(1) table generation is now performed by the same code base with
          only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use the
          DeRemer and Pennello algorithm for calculating lookahead sets.  This
          significantly improves the performance of generating LALR(1) tables
          and has the added feature of actually working correctly!  If you
          experienced weird behavior with LALR(1) in prior releases, this should
          hopefully resolve all of those problems.  Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in LALR(1)
          parsing.  If you experienced subtle and/or bizarre behavior when trying
          to use the LALR(1) engine, this may correct those problems.  Patch
          contributed by Russ Cox.  Note: This patch has been superseded by
          revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed an infinite recursion problem in the ReduceToTerminals() function
          that would sometimes come up in LALR(1) table generation.  Reported by
          Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex().  For example:

              lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile() function
          used inside the lexer.  This may be necessary for special situations such
          as processing Unicode (e.g., if you want escapes like \w and \b to consult
          the Unicode character property database).  The need for this suggested by
          Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated instantiations of parser
          objects when the write_tables=0 argument was used.  Reported by Michael Brown.

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular expressions for
          tokens and as input.  Hopefully this is the only change needed for Unicode support.
          Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or old-style classes.
          Patch contributed by Michael Brown (although I tweaked it slightly so it would work
          with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an extremely
          devious bug in LALR(1) parser generation.  This patch should fix problems
          numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor.  Reported by Dave Aitel, Aaron Lav,
          and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control output directory.  Contributed
          by Christopher Stawarz.

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest module.
          Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement.  LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley.  To use LALR(1) parsing do
          the following:

              yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method.  However, LALR(1) allows you to handle
          more complex grammars.  For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists.  Can
          be used in parser rules like this:

              def p_somerule(p):
                  """a : B C D
                       | E F"""
                  if (len(p) == 4):
                      pass   # Must have been first rule
                  elif (len(p) == 3):
                      pass   # Must be second rule

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

          0. Cleans up some comments so they don't wrap on an 80-column display.
          1. Directs compiler errors to stderr where they belong.
          2. Implements and documents automatic line counting when \n is ignored.
          3. Changes the way progress messages are dumped when debugging is on.
             The new format is both less verbose and conveys more information than
             the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation.  Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          -  Some output is now only shown when debugging is enabled.  This
             means that PLY will be completely silent when not in debugging mode.

          -  An optional parameter "write_tables" can be passed to yacc() to
             control whether or not parsing tables are written.  By default,
             it is true, but it can be turned off if you don't want the yacc
             table file.  Note: disabling this will cause yacc() to regenerate
             the parsing table each time, as in the sketch below.
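             For example (illustrative only):

                 parser = yacc.yacc(write_tables=0)   # tables rebuilt on every run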

04/23/04: beazley
          Added patches contributed by David McNab.  This patch adds two
          features:

          -  The parser can be supplied as a class instead of a module.
             For an example of this, see the example/classcalc directory.

          -  Debugging output can be directed to a filename of the user's
             choice.  Use

                 yacc(debugfile="somefile.out")


Version 1.3
------------------------------
12/10/02: jmdyck
          Various minor adjustments to the code that Dave checked in today.
          Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
          Incorporated a variety of minor bug fixes to empty production
          handling and infinite recursion checking.  Contributed by
          Michael Dyck.

12/10/02: beazley
          Removed bogus recover() method call in yacc.restart()

Version 1.2
------------------------------
11/27/02: beazley
          Lexer and parser objects are now available as an attribute
          of tokens and slices respectively.  For example:

              def t_NUMBER(t):
                  r'\d+'
                  print t.lexer

              def p_expr_plus(t):
                  'expr : expr PLUS expr'
                  print t.lexer
                  print t.parser

          This can be used for state management (if needed).

10/31/02: beazley
          Modified yacc.py to work with Python optimize mode.  To make
          this work, you need to use

              yacc.yacc(optimize=1)

          Furthermore, you need to first run Python in normal mode
          to generate the necessary parsetab.py files.  After that,
          you can use python -O or python -OO.

          Note: optimized mode turns off a lot of error checking.
          Only use when you are sure that your grammar is working.
          Make sure parsetab.py is up to date!

10/30/02: beazley
          Added cloning of Lexer objects.  For example:

              import copy
              l = lex.lex()
              lc = copy.copy(l)

              l.input("Some text")
              lc.input("Some other text")
              ...

          This might be useful if the same "lexer" is meant to
          be used in different contexts---or if multiple lexers
          are running concurrently.

10/30/02: beazley
          Fixed subtle bug with first set computation and empty productions.
          Patch submitted by Michael Dyck.

10/30/02: beazley
          Fixed error messages to use "filename:line: message" instead
          of "filename:line. message".  This makes error reporting more
          friendly to emacs.  Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file.  Terminals and nonterminals
          are sorted instead of being printed in random order.
          Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file output.  Rules are now printed
          in a way that's easier to understand.  Contributed by Russ Cox.

10/30/02: beazley
          Added 'nonassoc' associativity support.  This can be used
          to disable the chaining of operators like a < b < c.
          To use, simply specify 'nonassoc' in the precedence table:

              precedence = (
                  ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVIDE'),
                  ('right', 'UMINUS'),                      # Unary minus operator
              )

          Patch contributed by Russ Cox.

10/30/02: beazley
          Modified the lexer to provide optional support for Python -O and -OO
          modes.  To make this work, Python *first* needs to be run in
          unoptimized mode.  This reads the lexing information and creates a
          file "lextab.py".  Then, run lex like this:

              # module foo.py
              ...
              ...
              lex.lex(optimize=1)

          Once the lextab file has been created, subsequent calls to
          lex.lex() will read data from the lextab file instead of using
          introspection.  In optimized mode (-O, -OO) everything should
          work normally despite the loss of doc strings.

          To change the name of the file 'lextab.py' use the following:

              lex.lex(lextab="footab")

          (this creates a file footab.py)


Version 1.1   October 25, 2001
------------------------------

10/25/01: beazley
          Modified the table generator to produce much more compact data.
          This should greatly reduce the size of the parsetab.py[c] file.
          Caveat: the tables still need to be constructed so a little more
          work is done in parsetab on import.

10/25/01: beazley
          There may be a possible bug in the cycle detector that reports errors
          about infinite recursion.  I'm having a little trouble tracking it
          down, but if you get this problem, you can disable the cycle
          detector as follows:

              yacc.yacc(check_recursion = 0)

10/25/01: beazley
          Fixed a bug in lex.py that sometimes caused illegal characters to be
          reported incorrectly.  Reported by Sverre Jørgensen.

7/8/01  : beazley
          Added a reference to the underlying lexer object when tokens are handled by
          functions.  The lexer is available as the 'lexer' attribute.  This
          was added to provide better lexing support for languages such as Fortran
          where certain types of tokens can't be conveniently expressed as regular
          expressions (and where the tokenizing function may want to perform a
          little backtracking).  Suggested by Pearu Peterson.

6/20/01 : beazley
          Modified yacc() function so that an optional starting symbol can be specified.
          For example:

              yacc.yacc(start="statement")

          Normally yacc always treats the first production rule as the starting symbol.
          However, if you are debugging your grammar it may be useful to specify
          an alternative starting symbol.  Idea suggested by Rich Salz.

Version 1.0  June 18, 2001
--------------------------
Initial public offering
