PARSLEY(1)                          Parsley                         PARSLEY(1)

NAME

       parsley - Parsley Documentation

       Contents:

PARSLEY TUTORIAL PART I: BASICS AND SYNTAX

   From Regular Expressions To Grammars
       Parsley is a pattern matching and parsing tool for Python programmers.

       Most Python programmers are familiar with regular expressions, as
       provided by Python’s re module. To use it, you provide a string that
       describes the pattern you want to match, and your input.

       For example:

          >>> import re
          >>> x = re.compile("a(b|c)d+e")
          >>> x.match("abddde")
          <_sre.SRE_Match object at 0x7f587af54af8>

       You can do exactly the same sort of thing in Parsley:

          >>> import parsley
          >>> x = parsley.makeGrammar("foo = 'a' ('b' | 'c') 'd'+ 'e'", {})
          >>> x("abdde").foo()
          'e'

       From this small example, a couple of differences between regular
       expressions and Parsley grammars can be seen:

   Parsley Grammars Have Named Rules
       A Parsley grammar can have many rules, and each has a name. The
       example above has a single rule named foo. Rules can call each other;
       calling rules in Parsley works like calling functions in Python. Here
       is another way to write the grammar above:

          foo = 'a' baz 'd'+ 'e'
          baz = 'b' | 'c'

   Parsley Grammars Are Expressions
       Calling match for a regular expression returns a match object if the
       match succeeds or None if it fails. Parsley parsers return the value of
       the last expression in the rule. Behind the scenes, Parsley turns each
       rule in your grammar into a Python method. In pseudo-Python code, it
       looks something like this:

          def foo(self):
              match('a')
              self.baz()
              match_one_or_more('d')
              return match('e')

          def baz(self):
              return match('b') or match('c')

       The value of the last expression in the rule is what the rule returns.
       This is why our example returns ‘e’.
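
       To make the pseudo-Python above concrete, here is a minimal
       hand-written recursive-descent sketch in plain Python. It does not use
       Parsley, and the names TinyParser, ParseError, match, and
       match_one_or_more are hypothetical; Parsley's real generated code
       differs (it tracks errors and backtracking more carefully). It shows
       the same shape: one method per rule, a sequence is a series of calls,
       and the rule returns the value of its last expression.

```python
class ParseError(Exception):
    pass


class TinyParser:
    """Hypothetical hand-written equivalent of:
        foo = 'a' baz 'd'+ 'e'
        baz = 'b' | 'c'
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match(self, ch):
        # Consume one expected character and return it, like a literal 'x'.
        if self.pos < len(self.text) and self.text[self.pos] == ch:
            self.pos += 1
            return ch
        raise ParseError("expected %r at %d" % (ch, self.pos))

    def match_one_or_more(self, ch):
        # Like 'd'+ : one required match, then as many more as possible.
        results = [self.match(ch)]
        while self.pos < len(self.text) and self.text[self.pos] == ch:
            results.append(self.match(ch))
        return results

    def foo(self):
        self.match('a')
        self.baz()
        self.match_one_or_more('d')
        return self.match('e')   # the value of the last expression

    def baz(self):
        # Ordered choice: try 'b', and only if that fails try 'c'.
        start = self.pos
        try:
            return self.match('b')
        except ParseError:
            self.pos = start     # backtrack before the next alternative
            return self.match('c')


print(TinyParser("abdde").foo())  # prints: e
```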

       The similarities to regular expressions pretty much end here, though.
       Having multiple named rules composed of expressions makes for a much
       more powerful tool, and now we’re going to look at some more features
       that go even further.

   Rules Can Embed Python Expressions
       Since these rules just turn into Python code eventually, we can stick
       some Python code into them ourselves. This is particularly useful for
       changing the return value of a rule. The Parsley expression for this is
       ->. We can also bind the results of expressions to variable names and
       use them in Python code. So things like this are possible:

          x = parsley.makeGrammar("""
          foo = 'a':one baz:two 'd'+ 'e' -> (one, two)
          baz = 'b' | 'c'
          """, {})
          print(x("abdde").foo())

          ('a', 'b')

       Literal match expressions like ‘a’ return the character they match.
       Using a colon and a variable name after an expression is like
       assignment in Python. As a result, we can use those names in a Python
       expression; in this case, to create a tuple.

       Another way to use Python code in a rule is to write custom tests for
       matching. Sometimes it’s more convenient to write some Python that
       determines if a rule matches than to stick to Parsley expressions
       alone. For those cases, we can use ?(). Here, we use the builtin rule
       anything to match a single character, then a Python predicate to
       decide if it’s the one we want:

          digit = anything:x ?(x in '0123456789') -> x

       The digit rule matches any decimal digit. We need the -> x on the end
       to return the character rather than the value of the predicate
       expression, which is just True.

   Repeated Matches Make Lists
       Like regular expressions, Parsley supports repeating matches. You can
       match an expression zero or more times with ‘*’, one or more times
       with ‘+’, and a specific number of times with ‘{n, m}’ or just ‘{n}’.
       Since all expressions in Parsley return a value, these repetition
       operators return a list containing each match they made.

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789') -> x
          number = digit+
          """, {})
          print(x("314159").number())

          ['3', '1', '4', '1', '5', '9']

       The number rule repeatedly matches digit and collects the matches into
       a list. This gets us partway to turning a string like 314159 into an
       integer. All we need now is to turn the list back into a string and
       call int():

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789') -> x
          number = digit+:ds -> int(''.join(ds))
          """, {})
          print(x("8675309").number())

          8675309

   Collecting Chunks Of Input
       If it seemed strange to break our input string up into a list and then
       reassemble it into a string using join, you’re not alone. Parsley has a
       shortcut for this since it’s a common case: you can use <> around a
       rule to make it return the slice of input it consumes, ignoring the
       actual return value of the rule. For example:

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789')
          number = <digit+>:ds -> int(ds)
          """, {})
          print(x("11235").number())

          11235

       Here, <digit+> returns the string “11235”, since that’s the portion of
       the input that digit+ matched. (In this case it’s the entire input, but
       we’ll see some more complex cases soon.) Since it ignores the list
       returned by digit+, leaving the -> x out of digit doesn’t change the
       result.
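
       The difference between collecting a list and taking the consumed slice
       can be sketched in plain Python by recording the input position before
       and after a match. The helpers match_digits and consumed are
       hypothetical and only illustrate what <...> means; they are not
       Parsley's implementation.

```python
def match_digits(text, pos):
    """Match one or more decimal digits starting at pos, like digit+.

    Returns (list_of_matched_characters, new_pos).
    """
    matches = []
    while pos < len(text) and text[pos] in '0123456789':
        matches.append(text[pos])
        pos += 1
    if not matches:
        raise ValueError("expected a digit at %d" % pos)
    return matches, pos


def consumed(rule, text, pos=0):
    """Like <rule>: run the rule, discard its value, return the input slice."""
    _value, end = rule(text, pos)
    return text[pos:end], end


digits_list, _ = match_digits("11235+8", 0)
digits_text, end = consumed(match_digits, "11235+8")
print(digits_list)        # ['1', '1', '2', '3', '5']
print(int(digits_text))   # 11235
print(end)                # 5: parsing would continue at '+'
```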

   Building A Calculator
       Now let’s look at using these rules in a more complicated parser. We
       have support for parsing numbers; let’s do addition, as well.

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789')
          number = <digit+>:ds -> int(ds)
          expr = number:left ( '+' number:right -> left + right
                             | -> left)
          """, {})
          print(x("17+34").expr())
          print(x("18").expr())

          51
          18

       Parentheses group expressions just like in Python. The ‘|’ operator is
       like or in Python: it short-circuits, trying each alternative in order
       until it finds one that matches. For “17+34”, the number rule matches
       “17”, then Parsley tries to match + followed by another number. Since
       “+” and “34” are the next things in the input, those match, and it then
       runs the Python expression left + right and returns its value. For the
       input “18” it does the same, but + does not match, so Parsley tries the
       next alternative after |. Since this is just a Python expression, the
       match succeeds and the number 18 is returned.

       Now let’s add subtraction:

          digit = anything:x ?(x in '0123456789')
          number = <digit+>:ds -> int(ds)
          expr = number:left ( '+' number:right -> left + right
                             | '-' number:right -> left - right
                             | -> left)

       This will accept things like ‘5-4’ now.

       Since parsing numbers is so common and useful, Parsley actually has
       ‘digit’ as a builtin rule, so we don’t even need to define it
       ourselves. We’ll leave it out in further examples and rely on the
       version Parsley provides.

       Normally we like to allow whitespace in our expressions, so let’s add
       some support for spaces:

          number = <digit+>:ds -> int(ds)
          ws = ' '*
          expr = number:left ws ('+' ws number:right -> left + right
                                |'-' ws number:right -> left - right
                                | -> left)

       Now we can handle “17 +34”, “2 - 1”, etc.

       We could go ahead and add multiplication and division here (and
       hopefully it’s obvious how that would work), but let’s complicate
       things further and allow multiple operations in our expressions, such
       as “1 - 2 + 3”.

       There are a couple of different ways to do this. Possibly the easiest
       is to build a list of numbers and operations, then do the math:

          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          ws = ' '*
          add = '+' ws number:n -> ('+', n)
          sub = '-' ws number:n -> ('-', n)
          addsub = ws (add | sub)
          expr = number:left (addsub+:right -> right
                             | -> left)
          """, {})
          print(x("1 + 2 - 3").expr())

          [('+', 2), ('-', 3)]

       Oops, this is only half the job done. We’re collecting the operators
       and values, but now we need to do the actual calculation. The easiest
       way to do it is probably to write a Python function and call it from
       inside the grammar.

       So far we have been passing an empty dict as the second argument to
       makeGrammar. This is a dict of variable bindings that can be used in
       Python expressions in the grammar. So we can pass Python objects, such
       as functions, this way:

          def calculate(start, pairs):
              result = start
              for op, value in pairs:
                  if op == '+':
                      result += value
                  elif op == '-':
                      result -= value
              return result
          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          ws = ' '*
          add = '+' ws number:n -> ('+', n)
          sub = '-' ws number:n -> ('-', n)
          addsub = ws (add | sub)
          expr = number:left (addsub+:right -> calculate(left, right)
                             | -> left)
          """, {"calculate": calculate})
          print(x("4 + 5 - 6").expr())

          3

       Introducing this function lets us simplify even further: instead of
       using addsub+, we can use addsub*, since calculate(left, []) will
       return left. So now expr becomes:

          expr = number:left addsub*:right -> calculate(left, right)
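
       Because calculate is ordinary Python, it can be checked on its own,
       outside any grammar. The sketch below repeats the calculate function
       from the text so it runs by itself; the pair lists are hand-written
       stand-ins for what addsub* would produce.

```python
def calculate(start, pairs):
    # Same left-to-right fold over (operator, value) pairs as above.
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result


# What expr computes for "7" (addsub* matched nothing) ...
print(calculate(7, []))                      # 7
# ... and for "4 + 5 - 6" and "1 - 2 + 3".
print(calculate(4, [('+', 5), ('-', 6)]))    # 3
print(calculate(1, [('-', 2), ('+', 3)]))    # 2
```

       The empty-list case is why addsub* can replace the
       (addsub+ ... | -> left) alternation: the loop body never runs, so
       start is returned unchanged.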

       So now let’s look at adding multiplication and division. Here, we run
       into precedence rules: should “4 * 5 + 6” give us 26, or 44? The
       traditional choice is for multiplication and division to take
       precedence over addition and subtraction, so the answer should be 26.
       We’ll resolve this by making sure multiplication and division happen
       before addition and subtraction are considered:

          def calculate(start, pairs):
              result = start
              for op, value in pairs:
                  if op == '+':
                      result += value
                  elif op == '-':
                      result -= value
                  elif op == '*':
                      result *= value
                  elif op == '/':
                      result /= value  # true division: a float on Python 3
              return result
          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          ws = ' '*
          add = '+' ws expr2:n -> ('+', n)
          sub = '-' ws expr2:n -> ('-', n)
          mul = '*' ws number:n -> ('*', n)
          div = '/' ws number:n -> ('/', n)

          addsub = ws (add | sub)
          muldiv = ws (mul | div)

          expr = expr2:left addsub*:right -> calculate(left, right)
          expr2 = number:left muldiv*:right -> calculate(left, right)
          """, {"calculate": calculate})
          print(x("4 * 5 + 6").expr())

          26

       Notice particularly that add, sub, and expr all call the expr2 rule now
       where they called number before. This means that a multiplication or
       division expression can now appear anywhere a number was previously
       expected.

       Finally, let’s add parentheses, so you can override the precedence and
       write “4 * (5 + 6)” when you do want 44. We’ll do this by adding a
       value rule that accepts either a number or an expression in
       parentheses, and replace existing calls to number with calls to value.

          def calculate(start, pairs):
              result = start
              for op, value in pairs:
                  if op == '+':
                      result += value
                  elif op == '-':
                      result -= value
                  elif op == '*':
                      result *= value
                  elif op == '/':
                      result /= value  # true division: a float on Python 3
              return result
          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          parens = '(' ws expr:e ws ')' -> e
          value = number | parens
          ws = ' '*
          add = '+' ws expr2:n -> ('+', n)
          sub = '-' ws expr2:n -> ('-', n)
          mul = '*' ws value:n -> ('*', n)
          div = '/' ws value:n -> ('/', n)

          addsub = ws (add | sub)
          muldiv = ws (mul | div)

          expr = expr2:left addsub*:right -> calculate(left, right)
          expr2 = value:left muldiv*:right -> calculate(left, right)
          """, {"calculate": calculate})

          print(x("4 * (5 + 6) + 1").expr())

          45

       And there you have it: a four-function calculator with precedence and
       parentheses.
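
       For comparison, here is a plain-Python sketch of the same layered
       structure, written by hand as a recursive-descent parser. It does not
       use Parsley; the Calc class and its method names are hypothetical and
       chosen to mirror the grammar rules (ws, number, value, expr2, expr).
       Each lower layer binds more tightly because the layer above calls it
       for every operand, and '/' uses operator.truediv to match what
       result /= value does on Python 3.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}


class ParseError(Exception):
    pass


class Calc:
    """Hypothetical hand-written mirror of the calculator grammar."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ''

    def ws(self):
        # ws = ' '*
        while self.peek() == ' ':
            self.pos += 1

    def number(self):
        # number = <digit+>:ds -> int(ds)
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] in '0123456789':
            self.pos += 1
        if start == self.pos:
            raise ParseError("expected a number at %d" % start)
        return int(self.text[start:self.pos])

    def value(self):
        # value = number | parens;  parens = '(' ws expr:e ws ')' -> e
        if self.peek() == '(':
            self.pos += 1
            self.ws()
            e = self.expr()
            self.ws()
            if self.peek() != ')':
                raise ParseError("expected ')' at %d" % self.pos)
            self.pos += 1
            return e
        return self.number()

    def _binary(self, operand, ops):
        # operand (ws op ws operand)*, folded left to right like calculate().
        result = operand()
        while True:
            save = self.pos
            self.ws()
            op = self.peek()
            if op not in ops:
                self.pos = save        # give back the spaces we consumed
                return result
            self.pos += 1
            self.ws()
            result = OPS[op](result, operand())

    def expr2(self):
        return self._binary(self.value, ('*', '/'))

    def expr(self):
        return self._binary(self.expr2, ('+', '-'))


print(Calc("4 * 5 + 6").expr())        # 26
print(Calc("4 * (5 + 6) + 1").expr())  # 45
```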

PARSLEY TUTORIAL PART II: PARSING STRUCTURED DATA

       Now that you are familiar with the basics of Parsley syntax, let’s look
       at a more realistic example: a JSON parser.

       The JSON spec on http://json.org/ describes the format, and we can
       adapt its description to a parser. We’ll write the Parsley rules in the
       same order as the grammar rules in the right sidebar on the JSON site,
       starting with the top-level rule, ‘object’.

          object = ws '{' members:m ws '}' -> dict(m)

       Parsley defines a builtin rule ws which consumes any spaces, tabs, or
       newlines it can.

       Since JSON objects are represented in Python as dicts, and dict takes a
       list of pairs, we need a rule to collect name/value pairs inside an
       object expression.

          members = (pair:first (ws ',' pair)*:rest -> [first] + rest)
                    | -> []

       This handles the three cases for object contents: one, multiple, or
       zero pairs. A name/value pair is separated by a colon. Whitespace
       before the name and before the colon is consumed by ws, and any
       whitespace after the colon is consumed by the ws at the start of the
       value rule:

          pair = ws string:k ws ':' value:v -> (k, v)

       Arrays, similarly, are sequences of array elements, and are represented
       as Python lists.

          array = '[' elements:xs ws ']' -> xs
          elements = (value:first (ws ',' value)*:rest -> [first] + rest) | -> []

       Values can be any JSON expression.

          value = ws (string | number | object | array
                     | 'true'  -> True
                     | 'false' -> False
                     | 'null'  -> None)

       Strings are sequences of zero or more characters between double quotes.
       Of course, we need to deal with escaped characters as well. This rule
       introduces the operator ~, which does negative lookahead: if the
       expression following it succeeds, the ~ expression fails; if that
       expression fails, the rest of the parse continues. Either way, no input
       is consumed.

          string = '"' (escapedChar | ~'"' anything)*:c '"' -> ''.join(c)

       This is a common pattern, so let’s examine it step by step. This rule
       matches a double quote character (any whitespace before it is consumed
       by the rule that calls string, such as pair or value). It then matches
       zero or more characters. If it’s not an escapedChar (which will start
       with a backslash), we check to see if it’s a double quote, in which
       case we want to end the loop. If it’s not a double quote, we match it
       using the rule anything, which accepts a single character of any kind,
       and continue. Finally, we match the ending double quote and return the
       characters in the string. We cannot use the <> syntax in this case
       because we don’t want a literal slice of the input; we want escape
       sequences to be replaced with the character they represent.

       It’s very common to use ~ for “match until” situations where you want
       to keep parsing only until an end marker is found. Similarly, ~~ is
       positive lookahead: it succeeds if its expression succeeds, but does
       not consume any input.

       The escapedChar rule should not be too surprising: we match a backslash
       then whatever escape code is given.

          escapedChar = '\\' (('"' -> '"')    |('\\' -> '\\')
                             |('/' -> '/')    |('b' -> '\b')
                             |('f' -> '\f')   |('n' -> '\n')
                             |('r' -> '\r')   |('t' -> '\t')
                             |('\'' -> '\'')  | escapedUnicode)

       Unicode escapes (of the form \u2603) require matching four hex digits,
       so we use the repetition operator {}, which works like + or * except
       taking either a {min, max} pair or simply a {number} indicating the
       exact number of repetitions. On Python 3 the code point is converted
       with chr (Python 2 used unichr):

          hexdigit = :x ?(x in '0123456789abcdefABCDEF') -> x
          escapedUnicode = 'u' <hexdigit{4}>:hs -> chr(int(hs, 16))
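
       The string, escapedChar, and escapedUnicode rules can be approximated
       by a plain-Python loop, which makes the “match until” role of ~'"'
       explicit. This is a hypothetical sketch (parse_string and ESCAPES are
       invented names), not Parsley's implementation. It accepts the escapes
       listed in escapedChar above, and the usage line compares its result
       with the standard json module on input where the two agree.

```python
import json

# Single-character escapes accepted by the escapedChar rule above.
ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b', 'f': '\f',
           'n': '\n', 'r': '\r', 't': '\t', "'": "'"}
HEX = '0123456789abcdefABCDEF'


def parse_string(text, pos=0):
    """Parse a double-quoted string at pos; return (decoded, end_pos)."""
    if text[pos] != '"':
        raise ValueError("expected '\"' at %d" % pos)
    pos += 1
    chars = []
    while True:
        if pos >= len(text):
            raise ValueError("unterminated string")
        c = text[pos]
        if c == '\\':                      # escapedChar
            code = text[pos + 1]
            if code == 'u':                # escapedUnicode: exactly 4 hex digits
                hs = text[pos + 2:pos + 6]
                if len(hs) != 4 or any(h not in HEX for h in hs):
                    raise ValueError("bad \\u escape at %d" % pos)
                chars.append(chr(int(hs, 16)))
                pos += 6
            else:
                chars.append(ESCAPES[code])   # KeyError for unknown escapes
                pos += 2
        elif c == '"':                     # ~'"' fails here: the loop ends
            return ''.join(chars), pos + 1
        else:                              # ~'"' anything
            chars.append(c)
            pos += 1


sample = r'"snow\u2603man\n\"quoted\""'
value, end = parse_string(sample)
print(value == json.loads(sample))   # True
print(end == len(sample))            # True
```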

       With strings out of the way, we advance to numbers, both integer and
       floating-point.

          number = spaces ('-' | -> ''):sign (intPart:ds (floatPart(sign ds)
                                                         | -> int(sign + ds)))

       Here we vary from the json.org description a little and move sign
       handling up into the number rule. We match either an intPart followed
       by a floatPart or just an intPart by itself.

          digit = :x ?(x in '0123456789') -> x
          digits = <digit*>
          digit1_9 = :x ?(x in '123456789') -> x

          intPart = (digit1_9:first digits:rest -> first + rest) | digit
          floatPart :sign :ds = <('.' digits exponent?) | exponent>:tail
                               -> float(sign + ds + tail)
          exponent = ('e' | 'E') ('+' | '-')? digits

       In JSON, multi-digit numbers cannot start with 0 (since that is
       JavaScript’s syntax for octal numbers), so intPart uses digit1_9 to
       exclude it in the first position.

       The floatPart rule takes two parameters, sign and ds. Our number rule
       passes values for these when it invokes floatPart, letting us avoid
       duplication of work within the rule. Note that pattern matching on
       arguments to rules works the same as on the string input to the parser.
       In this case, we provide no pattern, just a name: :ds is the same as
       anything:ds.

       (Also note that our floatPart rule cheats a little: it does not really
       parse floating-point numbers, it merely recognizes them and passes them
       to Python’s float builtin to actually produce the value.)
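
       The same structure (an optional sign, an intPart with no leading zero,
       and an optional fraction or exponent tail handed to float) can be
       sketched with Python's re module. This is a hypothetical regex-based
       analogy, not how Parsley processes the grammar. Two assumptions differ
       from the rules above: it does not skip leading whitespace, and, unlike
       digits = <digit*>, it requires at least one digit after ‘.’ and after
       the exponent marker, as the JSON spec does.

```python
import re

# Hypothetical regex mirroring:  sign intPart (floatPart | nothing)
NUMBER = re.compile(r'''
    (?P<sign>-?)
    (?P<int>[1-9][0-9]*|[0-9])          # intPart: no leading zero unless "0"
    (?P<tail>
        \.[0-9]+(?:[eE][+-]?[0-9]+)?    # '.' digits exponent?
      | [eE][+-]?[0-9]+                 # an exponent on its own
    )?
''', re.VERBOSE)


def parse_number(text, pos=0):
    """Return (value, end_pos), producing an int or a float like number."""
    m = NUMBER.match(text, pos)
    if m is None:
        raise ValueError("expected a number at %d" % pos)
    sign, ds, tail = m.group('sign'), m.group('int'), m.group('tail')
    if tail:
        return float(sign + ds + tail), m.end()   # the floatPart branch
    return int(sign + ds), m.end()                # the -> int(sign + ds) branch


print(parse_number("-42"))      # (-42, 3)
print(parse_number("3.25e2"))   # (325.0, 6)
print(parse_number("0123"))     # (0, 1): a leading 0 stands alone
```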

       The full version of this parser and its test cases can be found in the
       examples directory in the Parsley distribution.

PARSLEY TUTORIAL PART III: PARSING NETWORK DATA

       This tutorial assumes basic knowledge of writing Twisted TCP clients or
       servers.

   Basic parsing
       Parsing data that comes in over the network can be difficult because
       there is no guarantee of receiving whole messages. Buffering is often
       complicated by protocols switching between using fixed-width messages
       and delimiters for framing. Fortunately, Parsley can remove all of this
       tedium.

       With parsley.makeProtocol(), Parsley can generate a Twisted
       IProtocol-implementing class which will match incoming network data
       using Parsley grammar rules. Before getting started with
       makeProtocol(), let’s build a grammar for netstrings. The netstrings
       protocol is very simple:

          4:spam,4:eggs,

       This stream contains two netstrings: spam and eggs. Each piece of data
       is prefixed with its length, written as one or more ASCII digits and
       followed by a :, and is suffixed with a ,. So, a Parsley grammar to
       match a netstring would look like:

          nonzeroDigit = digit:x ?(x != '0')
          digits = <'0' | nonzeroDigit digit*>:i -> int(i)

          netstring = digits:length ':' <anything{length}>:string ',' -> string

       makeProtocol() takes, in addition to a grammar, a factory for a
       “sender” and a factory for a “receiver”. In the system of objects
       managed by the ParserProtocol, the sender is in charge of writing data
       to the wire, and the receiver has methods called on it by the Parsley
       rules. To demonstrate it, here is the final piece needed in the Parsley
       grammar for netstrings:

          receiveNetstring = netstring:string -> receiver.netstringReceived(string)

       The receiver is always available in Parsley rules with the name
       receiver, allowing Parsley rules to call methods on it.

       When data is received over the wire, the ParserProtocol tries to match
       the received data against the current rule. If the current rule
       requires more data to finish matching, the ParserProtocol stops and
       waits until more data comes in, then tries to continue matching. This
       repeats until the current rule is completely matched, and then the
       ParserProtocol starts matching any leftover data against the current
       rule again.
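
       To illustrate the buffering behaviour described above without Twisted,
       here is a hypothetical plain-Python sketch of an incremental netstring
       decoder: data arrives in arbitrary chunks, incomplete input is kept
       until more arrives, and leftover data after a complete match is parsed
       again. The names NetstringBuffer, dataReceived, and received are made
       up for this sketch; it is not how ParserProtocol is implemented.

```python
class NetstringBuffer:
    """Hypothetical incremental decoder for 'length:data,' framing."""

    def __init__(self):
        self.buffer = ''
        self.received = []

    def dataReceived(self, data):
        self.buffer += data
        # Keep matching complete netstrings; stop when more data is needed.
        while True:
            colon = self.buffer.find(':')
            if colon == -1:
                return                      # length prefix not complete yet
            digits = self.buffer[:colon]
            if not digits.isdigit() or (len(digits) > 1 and digits[0] == '0'):
                raise ValueError("bad length prefix %r" % digits)
            length = int(digits)
            end = colon + 1 + length        # index where ',' must appear
            if len(self.buffer) <= end:
                return                      # data or ',' still missing
            if self.buffer[end] != ',':
                raise ValueError("missing ',' terminator")
            self.received.append(self.buffer[colon + 1:end])
            self.buffer = self.buffer[end + 1:]   # leftover: match again


decoder = NetstringBuffer()
for chunk in ["4:sp", "am,4:e", "ggs,"]:
    decoder.dataReceived(chunk)
print(decoder.received)   # ['spam', 'eggs']
```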

       One specifies the current rule by setting a currentRule attribute on
       the receiver, which the ParserProtocol looks at before doing any
       parsing. Changing the current rule is addressed in the Switching rules
       section.

       Since the ParserProtocol will never modify the currentRule attribute
       itself, the default behavior is to keep using the same rule. Parsing
       netstrings doesn’t require any rule changing, so the default behavior
       of continuing to use the same rule is fine.

       Both the sender factory and the receiver factory are called when the
       ParserProtocol’s connection is established. The sender factory is a
       one-argument callable which will be passed the ParserProtocol’s
       transport. This allows the sender to send data over the transport. For
       example:

          class NetstringSender(object):
              def __init__(self, transport):
                  self.transport = transport

              def sendNetstring(self, string):
                  self.transport.write('%d:%s,' % (len(string), string))

       The receiver factory is another one-argument callable which is passed
       the constructed sender. The returned object must at least have
       prepareParsing() and finishParsing() methods. prepareParsing() is
       called with the ParserProtocol instance when a connection is
       established (i.e. in the connectionMade of the ParserProtocol) and
       finishParsing() is called when a connection is closed (i.e. in the
       connectionLost of the ParserProtocol).

       NOTE:
          Both the receiver factory and its returned object’s prepareParsing()
          are called in the ParserProtocol’s connectionMade method; this
          separation is for ease of testing receivers.

       To demonstrate a receiver, here is a simple receiver that receives
       netstrings and echoes the same netstrings back:

          class NetstringReceiver(object):
              currentRule = 'receiveNetstring'

              def __init__(self, sender):
                  self.sender = sender

              def prepareParsing(self, parser):
                  pass

              def finishParsing(self, reason):
                  pass

              def netstringReceived(self, string):
                  self.sender.sendNetstring(string)
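
       Because senders and receivers are plain objects, they can be exercised
       without a network, which is the kind of testing the note above alludes
       to. The sketch below repeats the two classes from the text and wires
       them together by hand with a hypothetical FakeTransport that only
       records written data. It calls netstringReceived directly instead of
       going through a grammar, so it checks the echo logic, not the parsing.

```python
class FakeTransport(object):
    """Hypothetical stand-in for a Twisted transport: records writes."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class NetstringSender(object):
    def __init__(self, transport):
        self.transport = transport

    def sendNetstring(self, string):
        self.transport.write('%d:%s,' % (len(string), string))


class NetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, string):
        self.sender.sendNetstring(string)


# Do by hand what happens in the ParserProtocol's connectionMade:
transport = FakeTransport()
receiver = NetstringReceiver(NetstringSender(transport))
receiver.prepareParsing(None)    # None stands in for the ParserProtocol
# Pretend the receiveNetstring rule just matched "4:spam,":
receiver.netstringReceived('spam')
print(transport.written)   # ['4:spam,']
```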

       Putting it all together, the Protocol is constructed using the grammar,
       sender factory, and receiver factory:

          from parsley import makeProtocol

          # grammar is the netstring grammar source shown above
          NetstringProtocol = makeProtocol(
              grammar, NetstringSender, NetstringReceiver)

       The complete script is also available for download.

   Intermezzo: error reporting
       If an exception is raised from within Parsley during parsing, whether
       it’s due to input not matching the current rule or an exception being
       raised from code the grammar calls, the connection will be immediately
       closed. The traceback will be captured as a Failure and passed to the
       finishParsing() method of the receiver.

       At present, there is no way to recover from failure.

   Composing senders and receivers
       Senders and receivers are deliberately designed to make composition
       easy: no subclassing is required. While the composition is easy enough
       to do on your own, Parsley provides a function: stack(). It takes zero
       or more wrappers followed by a base factory.

       Its use is extremely simple: stack(x, y, z) will return a callable
       suitable either as a sender or receiver factory which will, when called
       with an argument, return x(y(z(argument))).
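
       The composition rule can be sketched in a few lines of plain Python.
       The function below is a hypothetical reimplementation, named
       stack_sketch to keep it distinct from Parsley's own stack(), and the
       tag helper is invented to make the x(y(z(argument))) nesting visible as
       text; use parsley's stack() in real code.

```python
def stack_sketch(*factories):
    """Return f such that f(arg) == x(y(z(arg))) for factories (x, y, z)."""
    def factory(argument):
        result = argument
        # Apply the innermost (last) factory first, then wrap outward.
        for wrap in reversed(factories):
            result = wrap(result)
        return result
    return factory


def tag(name):
    """Hypothetical factory maker: wraps its argument's text as name(...)."""
    def factory(inner):
        return '%s(%s)' % (name, inner)
    return factory


print(stack_sketch(tag('x'), tag('y'), tag('z'))('argument'))
# x(y(z(argument)))
print(stack_sketch(tag('x'))('argument'))   # x(argument)
```

       This ordering is why the wrapper comes first and the base factory last
       in stack(NetstringReversalWrapper, NetstringSender).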

       An example of wrapping a sender factory:

          class NetstringReversalWrapper(object):
              def __init__(self, wrapped):
                  self.wrapped = wrapped

              def sendNetstring(self, string):
                  self.wrapped.sendNetstring(string[::-1])

       And then, constructing the Protocol:

          NetstringProtocol = makeProtocol(
              grammar,
              stack(NetstringReversalWrapper, NetstringSender),
              NetstringReceiver)

       A wrapper doesn’t need to call the same methods on the thing it’s
       wrapping. Also note that in most cases, it’s important to forward
       unknown methods on to the wrapped object. An example of wrapping a
       receiver:

          class NetstringSplittingWrapper(object):
              def __init__(self, wrapped):
                  self.wrapped = wrapped

              def netstringReceived(self, string):
                  splitpoint = len(string) // 2
                  self.wrapped.netstringFirstHalfReceived(string[:splitpoint])
                  self.wrapped.netstringSecondHalfReceived(string[splitpoint:])

              def __getattr__(self, attr):
                  return getattr(self.wrapped, attr)

       The corresponding receiver and, again, the Protocol construction:

          class SplitNetstringReceiver(object):
              currentRule = 'receiveNetstring'

              def __init__(self, sender):
                  self.sender = sender

              def prepareParsing(self, parser):
                  pass

              def finishParsing(self, reason):
                  pass

              def netstringFirstHalfReceived(self, string):
                  self.sender.sendNetstring(string)

              def netstringSecondHalfReceived(self, string):
                  pass

          NetstringProtocol = makeProtocol(
              grammar,
              stack(NetstringReversalWrapper, NetstringSender),
              stack(NetstringSplittingWrapper, SplitNetstringReceiver))

       The complete script is also available for download.

   Switching rules
       As mentioned before, it’s possible to change the current rule. Imagine
       a “netstrings2” protocol that looks like this:

          3:foo,3;bar,4:spam,4;eggs,

       That is, the protocol alternates between using : and using ; as the
       delimiter between the data length and the data. The amended grammar
       would look something like this:

          nonzeroDigit = digit:x ?(x != '0')
          digits = <'0' | nonzeroDigit digit*>:i -> int(i)
          netstring :delimiter = digits:length delimiter <anything{length}>:string ',' -> string

          colon = digits:length ':' <anything{length}>:string ',' -> receiver.netstringReceived(':', string)
          semicolon = digits:length ';' <anything{length}>:string ',' -> receiver.netstringReceived(';', string)

       Changing the current rule is as simple as changing the currentRule
       attribute on the receiver. So, the netstringReceived method could look
       like this:

              def netstringReceived(self, delimiter, string):
                  self.sender.sendNetstring(string)
                  if delimiter == ':':
                      self.currentRule = 'semicolon'
                  else:
                      self.currentRule = 'colon'

       While changing the currentRule attribute can be done at any time, the
       ParserProtocol only examines the currentRule at the beginning of
       parsing and after a rule has finished matching. As a result, if the
       currentRule changes, the ParserProtocol will wait until the current
       rule is completely matched before switching rules.
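
       The rule-switching behaviour can be imitated in plain Python: in the
       hypothetical sketch below, a dispatcher reads the receiver's
       currentRule only when it starts a new match, so a change made inside
       netstringReceived takes effect for the next netstring. Everything here
       (parse_netstring, RULES, EchoReceiver, run, and the list used as a
       sender) is an illustrative assumption rather than ParserProtocol's
       code, and for simplicity it parses one complete string instead of
       buffering partial data.

```python
def parse_netstring(text, pos, delimiter):
    """Match digits, the expected delimiter, the data, and ','.

    Returns (data, new_pos) or raises ValueError.
    """
    sep = pos
    while sep < len(text) and text[sep].isdigit():
        sep += 1
    if sep == pos or text[sep:sep + 1] != delimiter:
        raise ValueError("expected digits and %r at %d" % (delimiter, pos))
    start = sep + 1
    end = start + int(text[pos:sep])
    if text[end:end + 1] != ',':
        raise ValueError("expected ',' at %d" % end)
    return text[start:end], end + 1


RULES = {'colon': ':', 'semicolon': ';'}


class EchoReceiver(object):
    currentRule = 'colon'

    def __init__(self, sent):
        self.sent = sent            # a list standing in for a sender

    def netstringReceived(self, delimiter, string):
        self.sent.append(string)
        self.currentRule = 'semicolon' if delimiter == ':' else 'colon'


def run(receiver, text):
    pos = 0
    while pos < len(text):
        # currentRule is read only here, before each new match starts.
        delimiter = RULES[receiver.currentRule]
        string, pos = parse_netstring(text, pos, delimiter)
        receiver.netstringReceived(delimiter, string)


sent = []
run(EchoReceiver(sent), "3:foo,3;bar,4:spam,4;eggs,")
print(sent)   # ['foo', 'bar', 'spam', 'eggs']
```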

       The complete script is also available for download.

EXTENDING GRAMMARS AND INHERITANCE

701       warning
702              Unfinished
703
704       Another feature taken from OMeta is grammar inheritance. We can write a
705       grammar with rules that override ones in a parent. If we load the gram‐
706       mar from our calculator tutorial as Calc, we can extend  it  with  some
707       constants:
708
709          from parsley import makeGrammar
710          import math
711          import calc  # assumed: the calculator tutorial's module, defining Calc
712          calcGrammarEx = """
713          value = super | constant
714          constant = 'pi' -> math.pi
715                   | 'e' -> math.e
716          """
717          CalcEx = makeGrammar(calcGrammarEx, {"math": math}, extends=calc.Calc)
718
719       Invoking  the  rule  super calls the rule value in Calc. If it fails to
720       match, our new value rule attempts to match a constant name.
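       Because each rule becomes a Python method, super behaves much like
       calling an overridden method from a subclass. The sketch below writes
       that idea out by hand in plain Python; it illustrates the semantics
       and is not the code Parsley generates. The digit-only value rule in
       the hypothetical Calc stand-in is an assumption, since the real
       calculator grammar is richer.

```python
import math


class ParseError(Exception):
    """Stands in for a failed rule match."""


class Calc:
    """Hypothetical stand-in for the parent grammar: one method per rule."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def value(self):
        # Assumed parent rule: value = <digit+>:ds -> int(ds)
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isdigit():
            self.pos += 1
        if self.pos == start:
            raise ParseError('expected a number at %d' % start)
        return int(self.text[start:self.pos])


class CalcEx(Calc):
    def value(self):
        # value = super | constant
        saved = self.pos
        try:
            return super().value()
        except ParseError:
            self.pos = saved  # backtrack before trying the next alternative
            return self.constant()

    def constant(self):
        # constant = 'pi' -> math.pi | 'e' -> math.e
        for name, val in (('pi', math.pi), ('e', math.e)):
            if self.text.startswith(name, self.pos):
                self.pos += len(name)
                return val
        raise ParseError('expected a constant at %d' % self.pos)
```

       CalcEx("42").value() still returns 42 through the parent rule, while
       CalcEx("pi").value() falls through to constant and returns math.pi.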
721

TERML

723       TermL (“term-ell”) is the Term Language, a small expression-based
724       language for representing arbitrary data in a simple structured
725       format. It is ideal for expressing abstract syntax trees (ASTs) and
726       other kinds of primitive data trees.
727
728   Creating Terms
729          >>> from terml.nodes import termMaker as t
730          >>> t.Term()
731          term('Term')
732
733       That’s it! We’ve created an empty term, Term, with nothing inside.
734
735          >>> t.Num(1)
736          term('Num(1)')
737          >>> t.Outer(t.Inner())
738          term('Outer(Inner)')
739
740       We  can  see  that  terms are not just namedtuple lookalikes. They have
741       their own internals and store data in a  slightly  different  and  more
742       structured way than a normal tuple.
743
744   Parsing Terms
745       Parsley can parse terms from streams. Terms can contain any kind of
746       parseable data, including other terms. Returning to the ubiquitous
747       calculator example:
748
749          add = Add(:x, :y) -> x + y
750
751       This rule matches a term called Add that has two components, binds
752       those components to two names (x and y), and returns their sum. If
753       this rule were applied to a term like Add(3, 5), it would return 8.
754
755       Terms  can  be  nested, too. Here’s an example that performs a slightly
756       contrived match on a negated term inside an addition:
757
758          add_negate = Add(:x, Negate(:y)) -> x - y
759
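       To make the matching behaviour of the two rules above concrete, here
       is a plain-Python sketch of what they do. Term is a hypothetical,
       deliberately simplified stand-in holding only a tag and its children
       (terml's real terms are richer), and NoMatch stands in for a parse
       failure.

```python
class Term:
    """Hypothetical stand-in for a TermL term: a tag plus child values."""

    def __init__(self, tag, *args):
        self.tag = tag
        self.args = args


class NoMatch(Exception):
    """Raised when a term does not have the expected shape."""


def add(term):
    # add = Add(:x, :y) -> x + y
    if term.tag == 'Add' and len(term.args) == 2:
        x, y = term.args
        return x + y
    raise NoMatch(term.tag)


def add_negate(term):
    # add_negate = Add(:x, Negate(:y)) -> x - y
    if term.tag == 'Add' and len(term.args) == 2:
        x, inner = term.args
        if (isinstance(inner, Term) and inner.tag == 'Negate'
                and len(inner.args) == 1):
            (y,) = inner.args
            return x - y
    raise NoMatch(term.tag)
```

       add(Term('Add', 3, 5)) returns 8, and
       add_negate(Term('Add', 10, Term('Negate', 4))) returns 6; add_negate
       rejects Add(3, 5) because its second child is not a Negate term.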

PARSLEY REFERENCE

761   Basic syntax
762       foo = ....:
763              Define a rule named foo.
764
765       expr1 expr2:
766              Match expr1, and then match expr2 if it succeeds, returning  the
767              value of expr2. Like Python's and.
768
769       expr1 | expr2:
770              Try to match expr1; if it fails, match expr2 instead. Like
771              Python's or.
772
773       expr*: Match expr zero or more times, returning a list of matches.
774
775       expr+: Match expr one or more times, returning a list of matches.
776
777       expr?: Try to match expr. Returns None if it fails to match.
778
779       expr{n, m}:
780              Match expr at least n times, and no more than m times.
781
782       expr{n}:
783              Match expr n times exactly.
784
785       ~expr: Negative lookahead. Fails if the next item in the input  matches
786              expr. Consumes no input.
787
788       ~~expr:
789              Positive lookahead. Fails if the next item in the input does not
790              match expr. Consumes no input.
791
792       ruleName or ruleName(arg1 arg2 etc):
793              Call the rule ruleName, possibly with args.
794
795       'x':   Match the literal character 'x'.
796
797       <expr>:
798              Returns the string consumed by matching expr. Good for  tokeniz‐
799              ing rules.
800
801       expr:name:
802              Bind the result of expr to the local variable name.
803
804       -> pythonExpression:
805              Evaluate  the given Python expression and return its result. Can
806              be used inside parentheses too!
807
808       !(pythonExpression):
809              Invoke a Python expression as an action.
810
811       ?(pythonExpression):
812              Fail if the Python expression is false; return True otherwise.
813
814       expr ^(CustomLabel):
815              If expr fails, the exception raised will contain CustomLabel.
816              Good for providing more context when a rule fails to match.
817              CustomLabel can contain any character other than "(" and ")".
818
819       Comments like Python comments are supported as well,  starting  with  #
820       and extending to the end of the line.
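       Several of these operators can be modelled as small parser
       combinators. The following pure-Python sketch is an illustration
       only, not how Parsley is implemented: each hypothetical parser is a
       function taking (text, pos) and returning (value, new_pos), or
       raising ParseError. Note that alt models | as ordered choice: the
       second alternative is tried only if the first fails.

```python
class ParseError(Exception):
    """Raised by a parser that fails to match at a position."""


def lit(ch):
    # 'x': match the literal character ch
    def parse(text, pos):
        if pos < len(text) and text[pos] == ch:
            return ch, pos + 1
        raise ParseError(pos)
    return parse


def seq(p1, p2):
    # expr1 expr2: match both in order, returning the value of expr2
    def parse(text, pos):
        _, pos = p1(text, pos)
        return p2(text, pos)
    return parse


def alt(p1, p2):
    # expr1 | expr2: try p1; on failure, try p2 at the same position
    def parse(text, pos):
        try:
            return p1(text, pos)
        except ParseError:
            return p2(text, pos)
    return parse


def many(p):
    # expr*: zero or more matches as a list (assumes p consumes on success)
    def parse(text, pos):
        results = []
        while True:
            try:
                value, pos = p(text, pos)
            except ParseError:
                return results, pos
            results.append(value)
    return parse


def repeat(p, n, m):
    # expr{n, m}: at least n and at most m matches, returned as a list
    def parse(text, pos):
        results = []
        while len(results) < m:
            try:
                value, pos = p(text, pos)
            except ParseError:
                break
            results.append(value)
        if len(results) < n:
            raise ParseError(pos)
        return results, pos
    return parse


def not_(p):
    # ~expr: succeed only if p fails here; consume no input either way
    def parse(text, pos):
        try:
            p(text, pos)
        except ParseError:
            return None, pos
        raise ParseError(pos)
    return parse


def consumed(p):
    # <expr>: return the slice of input that p consumed
    def parse(text, pos):
        _, end = p(text, pos)
        return text[pos:end], end
    return parse
```

       For example, consumed(seq(lit('a'), many(alt(lit('b'), lit('c')))))
       applied to 'abcbd' at position 0 returns ('abcb', 4), roughly what
       the rule <'a' ('b' | 'c')*> would produce.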
821
822   Python API
823   Protocol parsing API
824       class ometa.protocol.ParserProtocol
825              The  Twisted Protocol subclass used for parsing stream protocols
826              using Parsley. It has two public attributes:
827
828              sender After the connection is established, this attribute  will
829                     refer  to the sender created by the sender factory of the
830                     ParserProtocol.
831
832              receiver
833                     After the connection is established, this attribute  will
834                     refer  to the receiver created by the receiver factory of
835                     the ParserProtocol.
836
837              It's  common  to  also  add   a   factory   attribute   to   the
838              ParserProtocol from its factory's buildProtocol method, but this
839              isn't strictly required or guaranteed to be present.
840
841              Subclassing or instantiating ParserProtocol  is  not  necessary;
842              makeProtocol() is sufficient and requires less boilerplate.
843
844       class ometa.protocol.Receiver
845              Receiver  is not a real class but is used here for demonstration
846              purposes to indicate the required API.
847
848              currentRule
849                     ParserProtocol examines the currentRule attribute at  the
850                     beginning  of  parsing as well as after every time a rule
851                     has completely matched. At these times, the rule with the
852                     same name as the value of currentRule will be selected to
853                     start parsing the incoming stream of data.
854
855              prepareParsing(parserProtocol)
856                     prepareParsing() is called after the ParserProtocol has
857                     established a connection, and is passed the
858                     ParserProtocol instance itself.
859
860                     Parameters
861                            parserProtocol -- An instance of ParserProtocol.
862
863              finishParsing(reason)
864                     finishParsing() is called if an exception was raised
865                     during parsing, or when the ParserProtocol has lost its
866                     connection, whichever comes first. It will only be called
867                     once.
868
869                     An exception raised during parsing can be due to incoming
870                     data that doesn't match the current rule or to an
871                     exception raised while calling Python code during matching.
872
873                     Parameters
874                            reason -- A Failure encapsulating the reason
875                            parsing has ended.
876
877       Senders do not have any required API as ParserProtocol will never  call
878       methods on a sender.
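       A receiver therefore needs only currentRule, prepareParsing() and
       finishParsing(), plus whatever methods your grammar's actions call.
       The skeleton below is a hypothetical minimal example: the rule name
       'initial' is a placeholder, and it assumes the receiver factory is
       called with the sender, which is how the netstringReceived example
       earlier can use self.sender.

```python
class SkeletonReceiver:
    """Hypothetical receiver implementing only the API ParserProtocol uses."""

    # Name of the grammar rule to start with; 'initial' is a placeholder.
    currentRule = 'initial'

    def __init__(self, sender):
        # Assumption: the receiver factory passes in the sender.
        self.sender = sender
        self.parserProtocol = None
        self.finishReason = None

    def prepareParsing(self, parserProtocol):
        # Called once the ParserProtocol has established a connection.
        self.parserProtocol = parserProtocol

    def finishParsing(self, reason):
        # Called once: on a parsing exception or on connection loss.
        # In practice reason is a Twisted Failure.
        self.finishReason = reason
```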
879
880   Built-in Parsley Rules
881       anything:
882              Matches a single character from the input.
883
884       letter:
885              Matches a single ASCII letter.
886
887       digit: Matches a decimal digit.
888
889       letterOrDigit:
890              Matches either a letter or a digit.
891
892       end:   Matches the end of input.
893
894       ws:    Matches zero or more spaces, tabs, or newlines.
895
896       exactly(char):
897              Matches the character char.
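       For reference, the character classes described above can be written
       as plain Python predicates. This sketch follows the descriptions in
       this list only and is not Parsley's source; the function names are
       hypothetical, and skip_ws returns the new position rather than a
       match value.

```python
import string


def is_letter(ch):
    # letter: a single ASCII letter
    return len(ch) == 1 and ch in string.ascii_letters


def is_digit(ch):
    # digit: a single decimal digit
    return len(ch) == 1 and ch in string.digits


def is_letter_or_digit(ch):
    # letterOrDigit: either of the above
    return is_letter(ch) or is_digit(ch)


def skip_ws(text, pos):
    # ws: consume zero or more spaces, tabs, or newlines; never fails
    while pos < len(text) and text[pos] in ' \t\n':
        pos += 1
    return pos
```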
898

AUTHOR

900       Allen Short
901

COPYRIGHT

903       2013, Allen Short
904
905
906
907
9081.3                              Mar 12, 2019                       PARSLEY(1)