Version 2.3
-----------------------------
02/20/07: beazley
Fixed a bug with character literals if the literal '.' appeared as the
last symbol of a grammar rule. Reported by Ales Smrcka.

02/19/07: beazley
Warning messages are now redirected to stderr instead of being printed
to standard output.

02/19/07: beazley
Added a warning message to lex.py if it detects a literal backslash
character inside the t_ignore declaration. This is to help catch
problems that might occur if someone accidentally defines t_ignore
as a Python raw string. For example:

    t_ignore = r' \t'

The idea for this is from an email I received from David Cimimi who
reported bizarre behavior in lexing as a result of defining t_ignore
as a raw string by accident.

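For reference, the intended declaration is just a plain (non-raw) string of
the characters to ignore; a minimal sketch of the usual form:

    t_ignore = ' \t'
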
02/18/07: beazley
Performance improvements. Made some changes to the internal
table organization and LR parser to improve parsing performance.

02/18/07: beazley
Automatic tracking of line number and position information must now be
enabled by a special flag to parse(). For example:

    yacc.parse(data,tracking=True)

In many applications, it's just not that important to have the
parser automatically track all line numbers. Making this an
optional feature allows the parser to run significantly faster
(more than a 20% speed increase in many cases). Note: positional
information is always available for raw tokens---this change only
applies to positional information associated with nonterminal
grammar symbols.
*** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
Yacc no longer supports extended slices of grammar productions.
However, it does support regular slices. For example:

    def p_foo(p):
        '''foo: a b c d e'''
        p[0] = p[1:3]

This change is a performance improvement to the parser--it streamlines
normal access to the grammar values since slices are now handled in
a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
Fixed a bug in the handling of token names when combined with
start conditions. Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
Added lexpos() and lexspan() methods to grammar symbols. These
mirror the functionality of lineno() and linespan(). For
example:

    def p_expr(p):
        'expr : expr PLUS expr'
        p.lexpos(1)              # Lexing position of left-hand expression
        p.lexpos(2)              # Lexing position of PLUS
        start,end = p.lexspan(3) # Lexing range of right-hand expression

11/01/06: beazley
Minor change to error handling. The recommended way to skip characters
in the input is to use t.lexer.skip() as shown here:

    def t_error(t):
        print "Illegal character '%s'" % t.value[0]
        t.lexer.skip(1)

The old approach of just using t.skip(1) will still work, but won't
be documented.

10/31/06: beazley
Discarded tokens can now be specified as simple strings instead of
functions. To do this, simply include the text "ignore_" in the
token declaration. For example:

    t_ignore_cppcomment = r'//.*'

Previously, this had to be done with a function. For example:

    def t_ignore_cppcomment(t):
        r'//.*'
        pass

If start conditions/states are being used, state names should appear
before the "ignore_" text, as in the sketch below.

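For example, a sketch of a state-specific discard rule (assuming a state named
'foo' has been declared as in the start-condition entry below; the rule name is
hypothetical):

    t_foo_ignore_cppcomment = r'//.*'    # discarded only in the 'foo' state
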
10/19/06: beazley
The Lex module now provides support for flex-style start conditions
as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
Please refer to this document to understand this change note. Refer to
the PLY documentation for a PLY-specific explanation of how this works.

To use start conditions, you first need to declare a set of states in
your lexer file:

    states = (
        ('foo','exclusive'),
        ('bar','inclusive')
    )

This serves the same role as the %s and %x specifiers in flex.

Once a state has been declared, tokens for that state can be
declared by defining rules of the form t_state_TOK. For example:

    t_PLUS = '\+'          # Rule defined in INITIAL state
    t_foo_NUM = '\d+'      # Rule defined in foo state
    t_bar_NUM = '\d+'      # Rule defined in bar state

    t_foo_bar_NUM = '\d+'  # Rule defined in both foo and bar
    t_ANY_NUM = '\d+'      # Rule defined in all states

In addition to defining tokens for each state, the t_ignore and t_error
specifications can be customized for specific states. For example:

    t_foo_ignore = " "     # Ignored characters for foo state

    def t_bar_error(t):
        # Handle errors in bar state

Within token rules, the following methods can be used to change states:

    def t_TOKNAME(t):
        t.lexer.begin('foo')       # Begin state 'foo'
        t.lexer.push_state('foo')  # Begin state 'foo', push old state
                                   # onto a stack
        t.lexer.pop_state()        # Restore previous state
        t.lexer.current_state()    # Returns name of current state

These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
yy_top_state() functions in flex.

Start states can be used as one way to write sub-lexers.
For example, the lexer or parser might instruct the lexer to start
generating a different set of tokens depending on the context.

example/yply/ylex.py shows the use of start states to grab C/C++
code fragments out of traditional yacc specification files.

*** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
discussed various aspects of the design.

10/19/06: beazley
Minor change to the way in which yacc.py was reporting shift/reduce
conflicts. Although the underlying LALR(1) algorithm was correct,
PLY was under-reporting the number of conflicts compared to yacc/bison
when precedence rules were in effect. This change should make PLY
report the same number of conflicts as yacc.

10/19/06: beazley
Modified yacc so that grammar rules could also include the '-'
character. For example:

    def p_expr_list(p):
        'expression-list : expression-list expression'

Suggested by Oldrich Jedlicka.

10/18/06: beazley
Attribute lexer.lexmatch added so that token rules can access the re
match object that was generated. For example:

    def t_FOO(t):
        r'some regex'
        m = t.lexer.lexmatch
        # Do something with m

This may be useful if you want to access named groups specified within
the regex for a specific token. Suggested by Oldrich Jedlicka.

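A minimal sketch of that use (the token name and group names are hypothetical):

    def t_NUMPAIR(t):
        r'(?P<first>\d+),(?P<second>\d+)'
        m = t.lexer.lexmatch
        t.value = (m.group('first'), m.group('second'))
        return t
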
10/16/06: beazley
Changed the error message that results if an illegal character
is encountered and no default error function is defined in lex.
The exception is now more informative about the actual cause of
the error.

Version 2.1
------------------------------
10/02/06: beazley
The last Lexer object built by lex() can be found in lex.lexer.
The last Parser object built by yacc() can be found in yacc.parser.

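A minimal sketch of how these references might be used (assuming the enclosing
module defines the usual tokens and grammar rules, and that data holds the
input text):

    import ply.lex as lex
    import ply.yacc as yacc

    lex.lex()                         # build the lexer
    yacc.yacc()                       # build the parser

    lex.lexer.input(data)             # same object that lex.lex() returned
    result = yacc.parser.parse(data)  # same object that yacc.yacc() returned
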
10/02/06: beazley
New example added: examples/yply

This example uses PLY to convert Unix-yacc specification files to
PLY programs with the same grammar. This may be useful if you
want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
Added support for a start symbol to be specified in the yacc
input file itself. Just do this:

    start = 'name'

where 'name' matches some grammar rule. For example:

    def p_name(p):
        'name : A B C'
        ...

This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
Some new examples added:

    examples/GardenSnake : A simple indentation-based language similar
                           to Python. Shows how you might handle
                           whitespace. Contributed by Andrew Dalke.

    examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                           Contributed by Dave against his better
                           judgement.

09/28/06: beazley
Minor patch to allow named groups to be used in lex regular
expression rules. For example:

    t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

Patch submitted by Adam Ring.

09/28/06: beazley
LALR(1) is now the default parsing method. To use SLR, use
yacc.yacc(method="SLR"). Note: there is no performance impact
on parsing when using LALR(1) instead of SLR. However, constructing
the parsing tables will take a little longer.

09/26/06: beazley
Change to line number tracking. To modify line numbers, modify
the line number of the lexer itself. For example:

    def t_NEWLINE(t):
        r'\n'
        t.lexer.lineno += 1

This modification is both a cleanup and a performance optimization.
In past versions, lex was monitoring every token for changes in
the line number. This extra processing is unnecessary for the vast
majority of tokens. Thus, this new approach cleans it up a bit.

*** POTENTIAL INCOMPATIBILITY ***
You will need to change code in your lexer that updates the line
number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
Added the lexing position to tokens as an attribute lexpos. This
is the raw index into the input text at which a token appears.
This information can be used to compute column numbers and other
details (e.g., scan backwards from lexpos to the first newline
to get a column position).

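A minimal sketch of such a column computation (find_column is not part of PLY;
the raw input string is assumed to be available):

    def find_column(input, token):
        # Find the start of the line containing the token, then offset from it.
        line_start = input.rfind('\n', 0, token.lexpos) + 1
        return (token.lexpos - line_start) + 1
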
09/25/06: beazley
Changed the name of the __copy__() method on the Lexer class
to clone(). This is used to clone a Lexer object (e.g., if
you're running different lexers at the same time).

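A minimal usage sketch (the input strings are placeholders):

    lexer = lex.lex()
    sublexer = lexer.clone()      # independent copy sharing the same rules
    lexer.input("some text")
    sublexer.input("some other text")
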
09/21/06: beazley
Limitations related to the use of the re module have been eliminated.
Several users reported problems with regular expressions exceeding
100 named groups. To solve this, lex.py is now capable
of automatically splitting its master regular expression into
smaller expressions as needed. This should, in theory, make it
possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
Improved error checking in lex.py. Rules that match the empty string
are now rejected (otherwise they cause the lexer to enter an infinite
loop). An extra check for rules containing '#' has also been added.
Since lex compiles regular expressions in verbose mode, '#' is interpreted
as a regex comment; it is critical to use '\#' instead.

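For example (the token name is hypothetical):

    t_HASH = r'\#'    # matches a literal '#'; unescaped, the '#' would start
                      # a comment in verbose mode
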
09/18/06: beazley
Added a @TOKEN decorator function to lex.py that can be used to
define token rules where the documentation string might be computed
in some way.

    digit = r'([0-9])'
    nondigit = r'([_A-Za-z])'
    identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

    from ply.lex import TOKEN

    @TOKEN(identifier)
    def t_ID(t):
        # Do whatever

The @TOKEN decorator merely sets the documentation string of the
associated token function as needed for lex to work.

Note: An alternative solution is the following:

    def t_ID(t):
        # Do whatever

    t_ID.__doc__ = identifier

Note: Decorators require the use of Python 2.4 or later. If compatibility
with older versions is needed, use the latter solution.

The need for this feature was suggested by Cem Karan.

09/14/06: beazley
Support for single-character literal tokens has been added to yacc.
These literals must be enclosed in quotes. For example:

    def p_expr(p):
        "expr : expr '+' expr"
        ...

    def p_expr(p):
        'expr : expr "-" expr'
        ...

In addition to this, it is necessary to tell the lexer module about
literal characters. This is done by defining the variable 'literals'
as a list of characters. This should be defined in the module that
invokes the lex.lex() function. For example:

    literals = ['+','-','*','/','(',')','=']

or simply

    literals = '+=*/()='

It is important to note that literals can only be single characters.
When the lexer fails to match a token using its normal regular expression
rules, it will check the current character against the literal list.
If found, it will be returned with a token type set to match the literal
character. Otherwise, an illegal character will be signalled.

09/14/06: beazley
Modified PLY to install itself as a proper Python package called 'ply'.
This will make it a little more friendly to other modules. This
changes the usage of PLY only slightly. Just do this to import the
modules:

    import ply.lex as lex
    import ply.yacc as yacc

Alternatively, you can do this:

    from ply import *

which imports both the lex and yacc modules.
Change suggested by Lee June.

09/13/06: beazley
Changed the handling of negative indices when used in production rules.
A negative production index now accesses already parsed symbols on the
parsing stack. For example,

    def p_foo(p):
        "foo: A B C D"
        print p[1]       # Value of 'A' symbol
        print p[2]       # Value of 'B' symbol
        print p[-1]      # Value of whatever symbol appears before A
                         # on the parsing stack.

        p[0] = some_val  # Sets the value of the 'foo' grammar symbol

This behavior makes it easier to work with embedded actions within the
parsing rules. For example, in C-yacc, it is possible to write code like
this:

    bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }

In this example, the printf() code executes immediately after A has been
parsed. Within the embedded action code, $1 refers to the A symbol on
the stack.

To perform the equivalent action in PLY, you need to write a pair
of rules like this:

    def p_bar(p):
        "bar : A seen_A B"
        do_stuff

    def p_seen_A(p):
        "seen_A :"
        print "seen an A =", p[-1]

The second rule "seen_A" is merely an empty production which should be
reduced as soon as A is parsed in the "bar" rule above. The
negative index p[-1] is used to access whatever symbol appeared
before the seen_A symbol.

This feature also makes it possible to support inherited attributes.
For example:

    def p_decl(p):
        "decl : scope name"

    def p_scope(p):
        """scope : GLOBAL
                 | LOCAL"""
        p[0] = p[1]

    def p_name(p):
        "name : ID"
        if p[-1] == "GLOBAL":
            # ...
        elif p[-1] == "LOCAL":
            # ...

In this case, the name rule is inheriting an attribute from the
scope declaration that precedes it.

*** POTENTIAL INCOMPATIBILITY ***
If you are currently using negative indices within existing grammar rules,
your code will break. This should be extremely rare, if not non-existent,
in most cases. The argument to various grammar rules is not usually
processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
Major cleanup and refactoring of the LR table generation code. Both SLR
and LALR(1) table generation is now performed by the same code base with
only minor extensions for extra LALR(1) processing.

09/07/06: beazley
Completely reimplemented the entire LALR(1) parsing engine to use the
DeRemer and Pennello algorithm for calculating lookahead sets. This
significantly improves the performance of generating LALR(1) tables
and has the added feature of actually working correctly! If you
experienced weird behavior with LALR(1) in prior releases, this should
hopefully resolve all of those problems. Many thanks to
Andrew Waters and Markus Schoepflin for submitting bug reports
and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
Fixed a problem related to the handling of default actions in LALR(1)
parsing. If you experienced subtle and/or bizarre behavior when trying
to use the LALR(1) engine, this may correct those problems. Patch
contributed by Russ Cox. Note: This patch has been superseded by
revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
Added support for slicing of productions in yacc.
Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
Fixed an infinite recursion problem in the ReduceToTerminals() function that
would sometimes come up in LALR(1) table generation. Reported by
Markus Schoepflin.

03/01/06: beazley
Added "reflags" argument to lex(). For example:

    lex.lex(reflags=re.UNICODE)

This can be used to specify optional flags to the re.compile() function
used inside the lexer. This may be necessary for special situations such
as processing Unicode (e.g., if you want escapes like \w and \b to consult
the Unicode character property database). The need for this was suggested by
Andreas Jung.

03/01/06: beazley
Fixed a bug with an uninitialized variable on repeated instantiations of parser
objects when the write_tables=0 argument was used. Reported by Michael Brown.

03/01/06: beazley
Modified lex.py to accept Unicode strings both as the regular expressions for
tokens and as input. Hopefully this is the only change needed for Unicode support.
Patch contributed by Johan Dahl.

03/01/06: beazley
Modified the class-based interface to work with new-style or old-style classes.
Patch contributed by Michael Brown (although I tweaked it slightly so it would work
with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
Incorporated a patch contributed by Christopher Stawarz to fix an extremely
devious bug in LALR(1) parser generation. This patch should fix problems
numerous people reported with LALR parsing.

05/27/05: beazley
Fixed problem with lex.py copy constructor. Reported by Dave Aitel, Aaron Lav,
and Thad Austin.

05/27/05: beazley
Added outputdir option to yacc() to control output directory. Contributed
by Christopher Stawarz.

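For example (the directory name here is just a placeholder):

    yacc.yacc(outputdir="output")
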
05/27/05: beazley
Added rununit.py test script to run tests using the Python unittest module.
Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
Major enhancement. LALR(1) parsing support is now working.
This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
and optimized by David Beazley. To use LALR(1) parsing do
the following:

    yacc.yacc(method="LALR")

Computing LALR(1) parsing tables takes about twice as long as
the default SLR method. However, LALR(1) allows you to handle
more complex grammars. For example, the ANSI C grammar
(in example/ansic) has 13 shift-reduce conflicts with SLR, but
only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
Added a __len__ method to parser production lists. Can
be used in parser rules like this:

    def p_somerule(p):
        """a : B C D
             | E F"""
        if (len(p) == 3):
            # Must have been first rule
        elif (len(p) == 2):
            # Must be second rule

Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
Incorporated a variety of patches contributed by Eric Raymond.
These include:

    0. Cleans up some comments so they don't wrap on an 80-column display.
    1. Directs compiler errors to stderr where they belong.
    2. Implements and documents automatic line counting when \n is ignored.
    3. Changes the way progress messages are dumped when debugging is on.
       The new format is both less verbose and conveys more information than
       the old, including shift and reduce actions.

04/23/04: beazley
Added a Python setup.py file to simplify installation. Contributed
by Adam Kerrison.

04/23/04: beazley
Added patches contributed by Adam Kerrison.

    - Some output is now only shown when debugging is enabled. This
      means that PLY will be completely silent when not in debugging mode.

    - An optional parameter "write_tables" can be passed to yacc() to
      control whether or not parsing tables are written. By default,
      it is true, but it can be turned off if you don't want the yacc
      table file. Note: disabling this will cause yacc() to regenerate
      the parsing table each time (see the example below).

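For example, to suppress writing the table file (the write_tables=0 form also
appears in the 03/01/06 note above):

    yacc.yacc(write_tables=0)
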
04/23/04: beazley
Added patches contributed by David McNab. These patches add two
features:

    - The parser can be supplied as a class instead of a module.
      For an example of this, see the example/classcalc directory.

    - Debugging output can be directed to a filename of the user's
      choice. Use:

          yacc(debugfile="somefile.out")

Version 1.3
------------------------------
12/10/02: jmdyck
Various minor adjustments to the code that Dave checked in today.
Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
Incorporated a variety of minor bug fixes to empty production
handling and infinite recursion checking. Contributed by
Michael Dyck.

12/10/02: beazley
Removed bogus recover() method call in yacc.restart()

Version 1.2
------------------------------
11/27/02: beazley
Lexer and parser objects are now available as an attribute
of tokens and slices respectively. For example:

    def t_NUMBER(t):
        r'\d+'
        print t.lexer

    def p_expr_plus(t):
        'expr: expr PLUS expr'
        print t.lexer
        print t.parser

This can be used for state management (if needed).

10/31/02: beazley
Modified yacc.py to work with Python optimize mode. To make
this work, you need to use

    yacc.yacc(optimize=1)

Furthermore, you need to first run Python in normal mode
to generate the necessary parsetab.py files. After that,
you can use python -O or python -OO.

Note: optimized mode turns off a lot of error checking.
Only use when you are sure that your grammar is working.
Make sure parsetab.py is up to date!

10/30/02: beazley
Added cloning of Lexer objects. For example:

    import copy
    l = lex.lex()
    lc = copy.copy(l)

    l.input("Some text")
    lc.input("Some other text")
    ...

This might be useful if the same "lexer" is meant to
be used in different contexts---or if multiple lexers
are running concurrently.

10/30/02: beazley
Fixed subtle bug with first set computation and empty productions.
Patch submitted by Michael Dyck.

10/30/02: beazley
Fixed error messages to use "filename:line: message" instead
of "filename:line. message". This makes error reporting more
friendly to emacs. Patch submitted by François Pinard.

10/30/02: beazley
Improvements to parser.out file. Terminals and nonterminals
are sorted instead of being printed in random order.
Patch submitted by François Pinard.

10/30/02: beazley
Improvements to parser.out file output. Rules are now printed
in a way that's easier to understand. Contributed by Russ Cox.

10/30/02: beazley
Added 'nonassoc' associativity support. This can be used
to disable the chaining of operators like a < b < c.
To use, simply specify 'nonassoc' in the precedence table:

    precedence = (
        ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
        ('left', 'PLUS', 'MINUS'),
        ('left', 'TIMES', 'DIVIDE'),
        ('right', 'UMINUS'),                      # Unary minus operator
    )

Patch contributed by Russ Cox.

10/30/02: beazley
Modified the lexer to provide optional support for Python -O and -OO
modes. To make this work, Python *first* needs to be run in
unoptimized mode. This reads the lexing information and creates a
file "lextab.py". Then, run lex like this:

    # module foo.py
    ...
    ...
    lex.lex(optimize=1)

Once the lextab file has been created, subsequent calls to
lex.lex() will read data from the lextab file instead of using
introspection. In optimized mode (-O, -OO) everything should
work normally despite the loss of doc strings.

To change the name of the file 'lextab.py' use the following:

    lex.lex(lextab="footab")

(this creates a file footab.py)

Version 1.1 October 25, 2001
------------------------------

10/25/01: beazley
Modified the table generator to produce much more compact data.
This should greatly reduce the size of the parsetab.py[c] file.
Caveat: the tables still need to be constructed, so a little more
work is done in parsetab on import.

10/25/01: beazley
There may be a bug in the cycle detector that reports errors
about infinite recursion. I'm having a little trouble tracking it
down, but if you get this problem, you can disable the cycle
detector as follows:

    yacc.yacc(check_recursion = 0)

10/25/01: beazley
Fixed a bug in lex.py that sometimes caused illegal characters to be
reported incorrectly. Reported by Sverre Jørgensen.

7/8/01 : beazley
Added a reference to the underlying lexer object when tokens are handled by
functions. The lexer is available as the 'lexer' attribute. This
was added to provide better lexing support for languages such as Fortran
where certain types of tokens can't be conveniently expressed as regular
expressions (and where the tokenizing function may want to perform a
little backtracking). Suggested by Pearu Peterson.

6/20/01 : beazley
Modified yacc() function so that an optional starting symbol can be specified.
For example:

    yacc.yacc(start="statement")

Normally yacc always treats the first production rule as the starting symbol.
However, if you are debugging your grammar it may be useful to specify
an alternative starting symbol. Idea suggested by Rich Salz.

Version 1.0 June 18, 2001
--------------------------
Initial public offering