PARSLEY(1)                          Parsley                         PARSLEY(1)

NAME
       parsley - Parsley Documentation

       Contents:

PARSLEY TUTORIAL PART I: BASICS AND SYNTAX

   From Regular Expressions To Grammars
       Parsley is a pattern matching and parsing tool for Python programmers.

       Most Python programmers are familiar with regular expressions, as
       provided by Python’s re module. To use it, you provide a string that
       describes the pattern you want to match, and your input.

       For example:

          >>> import re
          >>> x = re.compile("a(b|c)d+e")
          >>> x.match("abddde")
          <_sre.SRE_Match object at 0x7f587af54af8>

       You can do exactly the same sort of thing in Parsley:

          >>> import parsley
          >>> x = parsley.makeGrammar("foo = 'a' ('b' | 'c') 'd'+ 'e'", {})
          >>> x("abdde").foo()
          'e'

       This small example already shows a couple of differences between
       regular expressions and Parsley grammars:

   Parsley Grammars Have Named Rules
       A Parsley grammar can have many rules, and each one has a name. The
       example above has a single rule named foo. Rules can call each other,
       and calling rules in Parsley works like calling functions in Python.
       Here is another way to write the grammar above:

          foo = 'a' baz 'd'+ 'e'
          baz = 'b' | 'c'

   Parsley Grammars Are Expressions
       Calling match for a regular expression returns a match object if the
       match succeeds or None if it fails. Parsley parsers return the value of
       the last expression in the rule. Behind the scenes, Parsley turns each
       rule in your grammar into a Python method. In pseudo-Python code, it
       looks something like this:

          def foo(self):
              match('a')
              self.baz()
              match_one_or_more('d')
              return match('e')

          def baz(self):
              return match('b') or match('c')

       The value of the last expression in the rule is what the rule returns.
       This is why our example returns ‘e’.
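
       The pseudo-Python above can be expanded into a runnable sketch. It is a
       hand-written recursive-descent parser for the same grammar, not the
       code Parsley actually generates. The names FooParser, ParseError, match
       and one_or_more are hypothetical. Two further choices are assumptions
       made to keep the example short: failure is signalled with an
       exception, and the position is reset before baz tries its second
       alternative.

```python
class ParseError(Exception):
    """Raised when the input does not match (illustrative only)."""


class FooParser(object):
    """Hand-written analogue of:  foo = 'a' baz 'd'+ 'e' ;  baz = 'b' | 'c'"""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match(self, expected):
        # Consume one character if it equals `expected`, otherwise fail.
        if self.pos < len(self.text) and self.text[self.pos] == expected:
            self.pos += 1
            return expected
        raise ParseError("expected %r at position %d" % (expected, self.pos))

    def one_or_more(self, expected):
        # Like 'd'+ : one required match, then as many more as possible.
        results = [self.match(expected)]
        while self.pos < len(self.text) and self.text[self.pos] == expected:
            results.append(self.match(expected))
        return results

    def foo(self):
        self.match('a')
        self.baz()
        self.one_or_more('d')
        return self.match('e')  # the value of the last expression

    def baz(self):
        # Ordered choice: try 'b' first, and only if it fails try 'c'.
        start = self.pos
        try:
            return self.match('b')
        except ParseError:
            self.pos = start  # backtrack before the next alternative
            return self.match('c')


print(FooParser("abdde").foo())  # prints: e
```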

       The similarities to regular expressions pretty much end here, though.
       Having multiple named rules composed of expressions makes for a much
       more powerful tool, and now we’re going to look at some more features
       that go even further.

   Rules Can Embed Python Expressions
       Since these rules just turn into Python code eventually, we can stick
       some Python code into them ourselves. This is particularly useful for
       changing the return value of a rule. The Parsley expression for this
       is ->. We can also bind the results of expressions to variable names
       and use them in Python code. So things like this are possible:

          x = parsley.makeGrammar("""
          foo = 'a':one baz:two 'd'+ 'e' -> (one, two)
          baz = 'b' | 'c'
          """, {})
          print x("abdde").foo()

          ('a', 'b')

       Literal match expressions like ‘a’ return the character they match.
       Using a colon and a variable name after an expression is like
       assignment in Python. As a result, we can use those names in a Python
       expression, in this case to create a tuple.

       Another way to use Python code in a rule is to write custom tests for
       matching. Sometimes it’s more convenient to write some Python that
       determines if a rule matches than to stick to Parsley expressions
       alone. For those cases, we can use ?(). Here, we use the builtin rule
       anything to match a single character, then a Python predicate to decide
       if it’s the one we want:

          digit = anything:x ?(x in '0123456789') -> x

       This rule digit will match any decimal digit. We need the -> x on the
       end to return the character rather than the value of the predicate
       expression, which is just True.
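
       In plain Python, anything:x ?(x in '0123456789') -> x amounts to this:
       take the next character, test it, and succeed only if the test passes.
       The sketch below models rules as hypothetical functions. Each takes
       the input and a position and returns a (value, new_position) pair, or
       None on failure. That representation is an assumption for
       illustration, not how Parsley stores rules.

```python
def anything(text, pos):
    """Like the builtin rule: consume any single character."""
    if pos < len(text):
        return text[pos], pos + 1
    return None  # end of input, nothing left to consume


def digit(text, pos):
    """digit = anything:x ?(x in '0123456789') -> x"""
    result = anything(text, pos)
    if result is None:
        return None
    x, new_pos = result
    if x not in '0123456789':  # the ?() predicate rejects the character
        return None
    return x, new_pos          # "-> x": return the character, not True


print(digit("7up", 0))  # ('7', 1)
print(digit("up7", 0))  # None
```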

   Repeated Matches Make Lists
       Like regular expressions, Parsley supports repeating matches. You can
       match an expression zero or more times with ‘*’, one or more times
       with ‘+’, and a specific number of times with ‘{n, m}’ or just ‘{n}’.
       Since all expressions in Parsley return a value, these repetition
       operators return a list containing each match they made.

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789') -> x
          number = digit+
          """, {})
          print x("314159").number()

          ['3', '1', '4', '1', '5', '9']

       The number rule repeatedly matches digit and collects the matches into
       a list. This gets us part way to turning a string like 314159 into an
       integer. All we need now is to turn the list back into a string and
       call int():

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789') -> x
          number = digit+:ds -> int(''.join(ds))
          """, {})
          print x("8675309").number()

          8675309

   Collecting Chunks Of Input
       If it seemed kind of strange to break our input string up into a list
       and then reassemble it into a string using join, you’re not alone.
       This is a common case, so Parsley has a shortcut for it: you can put
       <> around a rule to make it return the slice of input it consumes,
       ignoring the actual return value of the rule. For example:

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789')
          number = <digit+>:ds -> int(ds)
          """, {})
          print x("11235").number()

          11235

       Here, <digit+> returns the string “11235”, since that’s the portion of
       the input that digit+ matched. (In this case it’s the entire input, but
       we’ll see some more complex cases soon.) Since it ignores the list
       returned by digit+, leaving the -> x out of digit doesn’t change the
       result.
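
       The difference between the two approaches can be sketched in plain
       Python. number_list mimics digit+:ds -> int(''.join(ds)): it collects
       every match and rejoins them. number_slice mimics <digit+>:ds ->
       int(ds): it remembers where matching started and returns the consumed
       slice. Both function names are hypothetical. Both also stop at the
       first non-digit instead of requiring the whole input to match.

```python
DIGITS = '0123456789'


def number_list(text):
    """digit+:ds -> int(''.join(ds)): collect each match, then rejoin."""
    ds = []
    pos = 0
    while pos < len(text) and text[pos] in DIGITS:
        ds.append(text[pos])
        pos += 1
    if not ds:
        raise ValueError("expected at least one digit")
    return int(''.join(ds))


def number_slice(text):
    """<digit+>:ds -> int(ds): return the slice of input that was consumed."""
    start = pos = 0
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1  # the value of each individual digit match is ignored
    if pos == start:
        raise ValueError("expected at least one digit")
    return int(text[start:pos])


print(number_list("8675309"))  # 8675309
print(number_slice("11235"))   # 11235
```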

   Building A Calculator
       Now let’s look at using these rules in a more complicated parser. We
       have support for parsing numbers; let’s do addition, as well.

          x = parsley.makeGrammar("""
          digit = anything:x ?(x in '0123456789')
          number = <digit+>:ds -> int(ds)
          expr = number:left ( '+' number:right -> left + right
                             | -> left)
          """, {})
          print x("17+34").expr()
          print x("18").expr()

          51
          18

       Parentheses group expressions just like in Python. The ‘|’ operator is
       like or in Python: it short-circuits, trying each expression until it
       finds one that matches. For “17+34”, the number rule matches “17”, then
       Parsley tries to match + followed by another number. Since “+” and “34”
       are the next things in the input, those match, and it then runs the
       Python expression left + right and returns its value. For the input
       “18” it does the same, but + does not match, so Parsley tries the next
       thing after |. Since this is just a Python expression, the match
       succeeds and the number 18 is returned.

       Now let’s add subtraction:

          digit = anything:x ?(x in '0123456789')
          number = <digit+>:ds -> int(ds)
          expr = number:left ( '+' number:right -> left + right
                             | '-' number:right -> left - right
                             | -> left)

       This will accept things like ‘5-4’ now.

       Since parsing numbers is so common and useful, Parsley actually has
       ‘digit’ as a builtin rule, so we don’t even need to define it
       ourselves. We’ll leave it out in further examples and rely on the
       version Parsley provides.

       Normally we like to allow whitespace in our expressions, so let’s add
       some support for spaces:

          number = <digit+>:ds -> int(ds)
          ws = ' '*
          expr = number:left ws ('+' ws number:right -> left + right
                                |'-' ws number:right -> left - right
                                | -> left)

       Now we can handle “17 +34”, “2  - 1”, etc.

       We could go ahead and add multiplication and division here (and
       hopefully it’s obvious how that would work), but let’s complicate
       things further and allow multiple operations in our expressions, such
       as “1 - 2 + 3”.

       There are a couple of different ways to do this. Possibly the easiest
       is to build a list of numbers and operations, then do the math:

          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          ws = ' '*
          add = '+' ws number:n -> ('+', n)
          sub = '-' ws number:n -> ('-', n)
          addsub = ws (add | sub)
          expr = number:left (addsub+:right -> right
                             | -> left)
          """, {})
          print x("1 + 2 - 3").expr()

          [('+', 2), ('-', 3)]

       Oops, this only does half the job. We’re collecting the operators and
       the values that follow them, but the first number is dropped and we
       still need to do the actual calculation. The easiest way to do it is
       probably to write a Python function and call it from inside the
       grammar.

       So far we have been passing an empty dict as the second argument to
       makeGrammar. This is a dict of variable bindings that can be used in
       Python expressions in the grammar. So we can pass Python objects, such
       as functions, this way:

          def calculate(start, pairs):
              result = start
              for op, value in pairs:
                  if op == '+':
                      result += value
                  elif op == '-':
                      result -= value
              return result
          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          ws = ' '*
          add = '+' ws number:n -> ('+', n)
          sub = '-' ws number:n -> ('-', n)
          addsub = ws (add | sub)
          expr = number:left (addsub+:right -> calculate(left, right)
                             | -> left)
          """, {"calculate": calculate})
          print x("4 + 5 - 6").expr()

          3

       Introducing this function lets us simplify even further. Instead of
       using addsub+, we can use addsub*, since calculate(left, []) will
       return left. So now expr becomes:

          expr = number:left addsub*:right -> calculate(left, right)

       So now let’s look at adding multiplication and division. Here, we run
       into precedence rules: should “4 * 5 + 6” give us 26, or 44? The
       traditional choice is for multiplication and division to take
       precedence over addition and subtraction, so the answer should be 26.
       We’ll resolve this by making sure multiplication and division happen
       before addition and subtraction are considered:

          def calculate(start, pairs):
              result = start
              for op, value in pairs:
                  if op == '+':
                      result += value
                  elif op == '-':
                      result -= value
                  elif op == '*':
                      result *= value
                  elif op == '/':
                      result /= value
              return result
          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          ws = ' '*
          add = '+' ws expr2:n -> ('+', n)
          sub = '-' ws expr2:n -> ('-', n)
          mul = '*' ws number:n -> ('*', n)
          div = '/' ws number:n -> ('/', n)

          addsub = ws (add | sub)
          muldiv = ws (mul | div)

          expr = expr2:left addsub*:right -> calculate(left, right)
          expr2 = number:left muldiv*:right -> calculate(left, right)
          """, {"calculate": calculate})
          print x("4 * 5 + 6").expr()

          26

       Notice particularly that add, sub, and expr all call the expr2 rule now
       where they called number before. This means that anywhere a number was
       expected previously, a multiplication or division expression can now
       appear instead.

       Finally let’s add parentheses, so you can override the precedence and
       write “4 * (5 + 6)” when you do want 44. We’ll do this by adding a
       value rule that accepts either a number or an expression in
       parentheses, and replace existing calls to number with calls to value.

          def calculate(start, pairs):
              result = start
              for op, value in pairs:
                  if op == '+':
                      result += value
                  elif op == '-':
                      result -= value
                  elif op == '*':
                      result *= value
                  elif op == '/':
                      result /= value
              return result
          x = parsley.makeGrammar("""
          number = <digit+>:ds -> int(ds)
          parens = '(' ws expr:e ws ')' -> e
          value = number | parens
          ws = ' '*
          add = '+' ws expr2:n -> ('+', n)
          sub = '-' ws expr2:n -> ('-', n)
          mul = '*' ws value:n -> ('*', n)
          div = '/' ws value:n -> ('/', n)

          addsub = ws (add | sub)
          muldiv = ws (mul | div)

          expr = expr2:left addsub*:right -> calculate(left, right)
          expr2 = value:left muldiv*:right -> calculate(left, right)
          """, {"calculate": calculate})

          print x("4 * (5 + 6) + 1").expr()

          45

       And there you have it: a four-function calculator with precedence and
       parentheses.
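
       For comparison, here is a hand-written recursive-descent evaluator with
       the same structure as the final grammar. expr handles + and -, expr2
       handles * and /, and value accepts a number or a parenthesized expr. It
       is a sketch for illustration, not Parsley output, and the Calculator
       class, _fold and evaluate are hypothetical names. It also uses Python
       3’s true division, so / returns a float, whereas calculate above relies
       on Python 2’s integer division.

```python
import operator

DIGITS = '0123456789'


class Calculator(object):
    """Recursive-descent analogue of the expr / expr2 / value grammar."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Next character, or '' at end of input.
        return self.text[self.pos:self.pos + 1]

    def ws(self):
        # ws = ' '*
        while self.peek() == ' ':
            self.pos += 1

    def number(self):
        # number = <digit+>:ds -> int(ds)
        start = self.pos
        while self.peek() and self.peek() in DIGITS:
            self.pos += 1
        if self.pos == start:
            raise ValueError("expected a number at position %d" % start)
        return int(self.text[start:self.pos])

    def value(self):
        # value = number | parens
        # parens = '(' ws expr:e ws ')' -> e
        if self.peek() == '(':
            self.pos += 1
            self.ws()
            e = self.expr()
            self.ws()
            if self.peek() != ')':
                raise ValueError("expected ')' at position %d" % self.pos)
            self.pos += 1
            return e
        return self.number()

    def _fold(self, operand_rule, ops):
        # operand (ws op ws operand)*, folded left to right like calculate().
        result = operand_rule()
        while True:
            start = self.pos
            self.ws()
            op = self.peek()
            if op not in ops:
                self.pos = start  # no operator here: un-consume the spaces
                return result
            self.pos += 1
            self.ws()
            result = ops[op](result, operand_rule())

    def expr2(self):
        # expr2 = value:left muldiv*:right -> calculate(left, right)
        return self._fold(self.value,
                          {'*': operator.mul, '/': operator.truediv})

    def expr(self):
        # expr = expr2:left addsub*:right -> calculate(left, right)
        return self._fold(self.expr2,
                          {'+': operator.add, '-': operator.sub})


def evaluate(text):
    calc = Calculator(text)
    result = calc.expr()
    if calc.pos != len(text):
        raise ValueError("unexpected input at position %d" % calc.pos)
    return result


print(evaluate("4 * 5 + 6"))        # 26
print(evaluate("4 * (5 + 6) + 1"))  # 45
```

       Precedence comes from the call structure alone. expr only ever sees
       complete expr2 results, so multiplication and division are always
       finished before addition or subtraction is applied. This is the same
       effect the grammar gets from having add and sub call expr2.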

PARSLEY TUTORIAL PART II: PARSING STRUCTURED DATA

       Now that you are familiar with the basics of Parsley syntax, let’s look
       at a more realistic example: a JSON parser.

       The JSON spec on http://json.org/ describes the format, and we can
       adapt its description to a parser. We’ll write the Parsley rules in the
       same order as the grammar rules in the right sidebar on the JSON site,
       starting with the top-level rule, ‘object’.

          object = ws '{' members:m ws '}' -> dict(m)

       Parsley defines a builtin rule ws which consumes any spaces, tabs, or
       newlines it can.

       Since JSON objects are represented in Python as dicts, and dict takes a
       list of pairs, we need a rule to collect name/value pairs inside an
       object expression.

          members = (pair:first (ws ',' pair)*:rest -> [first] + rest)
                    | -> []

       This handles the three cases for object contents: one, multiple, or
       zero pairs. A name/value pair is separated by a colon. Any whitespace
       after the colon is consumed by the value rule, which begins with ws
       (see its definition below):

          pair = ws string:k ws ':' value:v -> (k, v)

       Arrays, similarly, are sequences of array elements, and are represented
       as Python lists.

          array = '[' elements:xs ws ']' -> xs
          elements = (value:first (ws ',' value)*:rest -> [first] + rest) | -> []
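
       members and elements share an idiom: first (ws ',' item)*:rest ->
       [first] + rest | -> []. It can be sketched with Python’s re module.
       parse_elements is a hypothetical helper. It reads comma-separated
       integers, standing in for arbitrary JSON values, and returns the values
       together with the position where it stopped. Raising an error when a
       comma is not followed by a value is a simplification. The grammar would
       instead end the repetition and let the enclosing array rule fail.

```python
import re

WS = re.compile(r'[ \t\r\n]*')  # roughly what the ws rule skips
INT = re.compile(r'-?[0-9]+')   # stand-in for the value rule


def skip_ws(text, pos):
    return WS.match(text, pos).end()


def parse_elements(text, pos=0):
    """(value:first (ws ',' value)*:rest -> [first] + rest) | -> []

    Returns (list_of_values, position_after_last_value).
    """
    m = INT.match(text, skip_ws(text, pos))
    if m is None:
        return [], pos              # second alternative: "-> []"
    first, pos = int(m.group()), m.end()
    rest = []
    while True:
        comma = skip_ws(text, pos)
        if not text.startswith(',', comma):
            break                   # the (ws ',' value)* loop ends here
        m = INT.match(text, skip_ws(text, comma + 1))
        if m is None:
            raise ValueError("expected a value after ',' at %d" % comma)
        rest.append(int(m.group()))
        pos = m.end()
    return [first] + rest, pos


print(parse_elements("1, 2 ,3"))  # ([1, 2, 3], 7)
```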

       Values can be any JSON expression.

          value = ws (string | number | object | array
                     | 'true'  -> True
                     | 'false' -> False
                     | 'null'  -> None)

       Strings are sequences of zero or more characters between double quotes.
       Of course, we need to deal with escaped characters as well. This rule
       introduces the operator ~, which does negative lookahead. If the
       expression following it succeeds, its parse will fail. If the
       expression fails, the rest of the parse continues. Either way, no input
       will be consumed.

          string = '"' (escapedChar | ~'"' anything)*:c '"' -> ''.join(c)

       This is a common pattern, so let’s examine it step by step. This rule
       first matches a double quote character. Any whitespace before it has
       already been consumed by callers such as value and pair. It then
       matches zero or more characters. If a character does not begin an
       escapedChar (which starts with a backslash), we check whether it is a
       double quote, in which case we want to end the loop. If it’s not a
       double quote, we match it using the rule anything, which accepts a
       single character of any kind, and continue. Finally, we match the
       ending double quote and return the characters in the string. We cannot
       use the <> syntax in this case because we don’t want a literal slice of
       the input: we want escape sequences to be replaced with the character
       they represent.
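
       The same “match until” loop can be written out by hand. parse_string
       is a hypothetical stdlib-only sketch, not Parsley code. It first tries
       an escape, as escapedChar does, including the four-hex-digit \uXXXX
       form. It stops at an unescaped double quote, which is where ~'"' would
       fail, and otherwise keeps any character, as anything does. An unknown
       escape falls through to that last branch, just as the ordered choice
       in the grammar would. The sketch uses Python 3’s chr where the grammar,
       written for Python 2, uses unichr.

```python
SIMPLE_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b', 'f': '\f',
                  'n': '\n', 'r': '\r', 't': '\t', "'": "'"}
HEX = '0123456789abcdefABCDEF'


def parse_string(text, pos=0):
    """string = '"' (escapedChar | ~'"' anything)*:c '"' -> ''.join(c)

    Returns (decoded_string, position_after_closing_quote).
    """
    if not text.startswith('"', pos):
        raise ValueError("expected '\"' at position %d" % pos)
    pos += 1
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == '\\':
            # escapedChar: a one-letter escape, or u plus four hex digits.
            code = text[pos + 1:pos + 2]
            hs = text[pos + 2:pos + 6]
            if code in SIMPLE_ESCAPES:
                chars.append(SIMPLE_ESCAPES[code])
                pos += 2
                continue
            if code == 'u' and len(hs) == 4 and all(h in HEX for h in hs):
                chars.append(chr(int(hs, 16)))
                pos += 6
                continue
            # escapedChar failed: fall through to ~'"' anything, which
            # keeps the backslash as an ordinary character.
        if ch == '"':
            return ''.join(chars), pos + 1  # ~'"' fails: closing quote
        chars.append(ch)                     # ~'"' succeeds: anything
        pos += 1
    raise ValueError("unterminated string")


print(parse_string('"say \\"hi\\""'))  # ('say "hi"', 12)
```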

       It’s very common to use ~ for “match until” situations where you want
       to keep parsing only until an end marker is found. Similarly, ~~ is
       positive lookahead: it succeeds if its expression succeeds, but it
       does not consume any input.

       The escapedChar rule should not be too surprising: we match a backslash
       then whatever escape code is given.

          escapedChar = '\\' (('"' -> '"')    |('\\' -> '\\')
                             |('/' -> '/')    |('b' -> '\b')
                             |('f' -> '\f')   |('n' -> '\n')
                             |('r' -> '\r')   |('t' -> '\t')
                             |('\'' -> '\'')  | escapedUnicode)

       Unicode escapes (of the form \u2603) require matching four hex digits,
       so we use the repetition operator {}. It works like + or *, except that
       it takes either a {min, max} pair or simply a {number} indicating the
       exact number of repetitions.

          hexdigit = :x ?(x in '0123456789abcdefABCDEF') -> x
          escapedUnicode = 'u' <hexdigit{4}>:hs -> unichr(int(hs, 16))

       With strings out of the way, we advance to numbers, both integer and
       floating-point.

          number = spaces ('-' | -> ''):sign (intPart:ds (floatPart(sign ds)
                                                         | -> int(sign + ds)))

       Here we vary from the json.org description a little and move sign
       handling up into the number rule. We match either an intPart followed
       by a floatPart or just an intPart by itself.

          digit = :x ?(x in '0123456789') -> x
          digits = <digit*>
          digit1_9 = :x ?(x in '123456789') -> x

          intPart = (digit1_9:first digits:rest -> first + rest) | digit
          floatPart :sign :ds = <('.' digits exponent?) | exponent>:tail
                               -> float(sign + ds + tail)
          exponent = ('e' | 'E') ('+' | '-')? digits

       In JSON, multi-digit numbers cannot start with 0 (since that is
       JavaScript’s syntax for octal numbers), so intPart uses digit1_9 to
       exclude it in the first position.

       The floatPart rule takes two parameters, sign and ds. Our number rule
       passes values for these when it invokes floatPart, letting us avoid
       duplication of work within the rule. Note that pattern matching on
       arguments to rules works the same as on the string input to the
       parser. In this case, we provide no pattern, just a name: :ds is the
       same as anything:ds.

       (Also note that our floatPart rule cheats a little: it does not really
       parse floating-point numbers, it merely recognizes them and passes them
       to Python’s float builtin to actually produce the value.)
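
       The recognize-then-convert approach can be sketched with a regular
       expression from Python’s re module. JSON_NUMBER and parse_number are
       hypothetical names. The pattern is an assumption written from
       json.org’s description: unlike the digits rule above, it requires at
       least one digit after the dot and in the exponent. As with intPart, a
       leading 0 ends the integer part, so for “012” only the 0 is consumed.

```python
import re

JSON_NUMBER = re.compile(
    r'(-?)'                                  # sign
    r'(0|[1-9][0-9]*)'                       # intPart: no leading zeros
    r'((?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)'   # optional fraction/exponent
)


def parse_number(text, pos=0):
    """Recognize a JSON number at pos and return (value, end_position)."""
    m = JSON_NUMBER.match(text, pos)
    if m is None:
        raise ValueError("expected a number at position %d" % pos)
    sign, ds, tail = m.groups()
    if tail:
        # Like floatPart: hand the recognized text to Python's float().
        return float(sign + ds + tail), m.end()
    return int(sign + ds), m.end()


print(parse_number("-12"))    # (-12, 3)
print(parse_number("0.5e1"))  # (5.0, 5)
```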

       The full version of this parser and its test cases can be found in the
       examples directory in the Parsley distribution.

PARSLEY TUTORIAL PART III: PARSING NETWORK DATA

       This tutorial assumes basic knowledge of writing Twisted TCP clients or
       servers.

   Basic parsing
       Parsing data that comes in over the network can be difficult because
       there is no guarantee of receiving whole messages. Buffering is often
       complicated by protocols switching between using fixed-width messages
       and delimiters for framing. Fortunately, Parsley can remove all of this
       tedium.

       With parsley.makeProtocol(), Parsley can generate a Twisted
       IProtocol-implementing class which will match incoming network data
       using Parsley grammar rules. Before getting started with
       makeProtocol(), let’s build a grammar for netstrings. The netstrings
       protocol is very simple:

          4:spam,4:eggs,

       This stream contains two netstrings: spam, and eggs. Each piece of data
       is prefixed with its length, written as one or more ASCII digits and
       followed by a ‘:’, and suffixed with a ‘,’. So, a Parsley grammar to
       match a netstring would look like:

          nonzeroDigit = digit:x ?(x != '0')
          digits = <'0' | nonzeroDigit digit*>:i -> int(i)

          netstring = digits:length ':' <anything{length}>:string ',' -> string

       makeProtocol() takes, in addition to a grammar, a factory for a
       “sender” and a factory for a “receiver”. In the system of objects
       managed by the ParserProtocol, the sender is in charge of writing data
       to the wire, and the receiver has methods called on it by the Parsley
       rules. To demonstrate it, here is the final piece needed in the Parsley
       grammar for netstrings:

          receiveNetstring = netstring:string -> receiver.netstringReceived(string)

       The receiver is always available in Parsley rules with the name
       receiver, allowing Parsley rules to call methods on it.

       When data is received over the wire, the ParserProtocol tries to match
       the received data against the current rule. If the current rule
       requires more data to finish matching, the ParserProtocol stops and
       waits until more data comes in, then tries to continue matching. This
       repeats until the current rule is completely matched, and then the
       ParserProtocol starts matching any leftover data against the current
       rule again.
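
       The buffering behaviour described above can be illustrated without
       Twisted or Parsley. NetstringBuffer is a hypothetical stdlib-only
       class. It is not part of Parsley’s API and not how ParserProtocol is
       implemented. feed() appends each incoming chunk and tries to match one
       complete netstring. If only a prefix has arrived, it stops and waits.
       After every complete match it starts over on the leftover data.

```python
class NetstringBuffer(object):
    """Accumulates chunks of bytes and extracts complete netstrings."""

    def __init__(self, on_netstring):
        self.buffer = b''
        self.on_netstring = on_netstring  # called once per complete netstring

    def feed(self, data):
        self.buffer += data
        while True:
            colon = self.buffer.find(b':')
            if colon == -1:
                if self.buffer and not self.buffer.isdigit():
                    raise ValueError("bad length prefix %r" % self.buffer)
                return  # the length prefix is not complete yet
            prefix = self.buffer[:colon]
            # Same rule as digits: '0', or digits without a leading zero.
            if not prefix.isdigit() or (len(prefix) > 1 and prefix[:1] == b'0'):
                raise ValueError("bad length prefix %r" % prefix)
            end = colon + 1 + int(prefix.decode('ascii'))
            if len(self.buffer) < end + 1:
                return  # payload or trailing ',' still missing: wait
            if self.buffer[end:end + 1] != b',':
                raise ValueError("missing ',' after netstring payload")
            payload = self.buffer[colon + 1:end]
            self.buffer = self.buffer[end + 1:]  # keep any leftover data
            self.on_netstring(payload)


received = []
buf = NetstringBuffer(received.append)
buf.feed(b'4:sp')        # incomplete: nothing is delivered yet
buf.feed(b'am,4:eggs,')  # completes 'spam', then 'eggs' from the leftovers
print(received)          # [b'spam', b'eggs']
```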

       You specify the current rule by setting a currentRule attribute on the
       receiver, which the ParserProtocol looks at before doing any parsing.
       Changing the current rule is addressed in the Switching rules section.

       Since the ParserProtocol will never modify the currentRule attribute
       itself, the default behavior is to keep using the same rule. Parsing
       netstrings doesn’t require any rule changing, so the default behavior
       of continuing to use the same rule is fine.

       Both the sender factory and receiver factory are constructed when the
       ParserProtocol’s connection is established. The sender factory is a
       one-argument callable which will be passed the ParserProtocol’s
       transport. This allows the sender to send data over the transport. For
       example:

          class NetstringSender(object):
              def __init__(self, transport):
                  self.transport = transport

              def sendNetstring(self, string):
                  self.transport.write('%d:%s,' % (len(string), string))

       The receiver factory is another one-argument callable which is passed
       the constructed sender. The returned object must at least have
       prepareParsing() and finishParsing() methods. prepareParsing() is
       called with the ParserProtocol instance when a connection is
       established (i.e. in the connectionMade of the ParserProtocol), and
       finishParsing() is called when a connection is closed (i.e. in the
       connectionLost of the ParserProtocol).

       NOTE:
          Both the receiver factory and its returned object’s prepareParsing()
          are called in the ParserProtocol’s connectionMade method; this
          separation is for ease of testing receivers.

       To demonstrate a receiver, here is a simple receiver that receives
       netstrings and echoes the same netstrings back:

          class NetstringReceiver(object):
              currentRule = 'receiveNetstring'

              def __init__(self, sender):
                  self.sender = sender

              def prepareParsing(self, parser):
                  pass

              def finishParsing(self, reason):
                  pass

              def netstringReceived(self, string):
                  self.sender.sendNetstring(string)

       Putting it all together, the Protocol is constructed using the grammar,
       sender factory, and receiver factory:

          from parsley import makeProtocol

          NetstringProtocol = makeProtocol(
              grammar, NetstringSender, NetstringReceiver)

       The complete script is also available for download.

   Intermezzo: error reporting
       If an exception is raised from within Parsley during parsing, whether
       it’s due to input not matching the current rule or an exception being
       raised from code the grammar calls, the connection will be immediately
       closed. The traceback will be captured as a Failure and passed to the
       finishParsing() method of the receiver.

       At present, there is no way to recover from failure.

   Composing senders and receivers
       Senders and receivers are deliberately designed to make composition
       easy: no subclassing is required. While the composition is easy enough
       to do on your own, Parsley provides a function: stack(). It takes any
       wrapper factories first, followed by the base factory last.

       Its use is extremely simple: stack(x, y, z) will return a callable
       suitable either as a sender or receiver factory which will, when called
       with an argument, return x(y(z(argument))).
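
       To make the nesting order concrete, here is a hypothetical
       re-implementation of the behaviour just described. It is called
       stack_sketch to make clear that it is not Parsley’s own stack().
       Requiring at least one factory is an assumption of this sketch, since
       something has to act as the base.

```python
from functools import reduce


def stack_sketch(*factories):
    """stack_sketch(x, y, z)(argument) returns x(y(z(argument)))."""
    if not factories:
        raise TypeError("at least one factory (the base) is required")

    def factory(argument):
        # The last factory is the base: call it first, then wrap outwards.
        return reduce(lambda inner, wrap: wrap(inner), reversed(factories),
                      argument)
    return factory


make = stack_sketch(lambda s: 'x(' + s + ')',
                    lambda s: 'y(' + s + ')',
                    lambda s: 'z(' + s + ')')
print(make('argument'))  # x(y(z(argument)))
```

       With the classes from this section,
       stack_sketch(NetstringReversalWrapper, NetstringSender)(transport)
       would build NetstringReversalWrapper(NetstringSender(transport)).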

       An example of wrapping a sender factory:

          class NetstringReversalWrapper(object):
              def __init__(self, wrapped):
                  self.wrapped = wrapped

              def sendNetstring(self, string):
                  self.wrapped.sendNetstring(string[::-1])

       And then, constructing the Protocol:

          NetstringProtocol = makeProtocol(
              grammar,
              stack(NetstringReversalWrapper, NetstringSender),
              NetstringReceiver)

       A wrapper doesn’t need to call the same methods on the thing it’s
       wrapping. Also note that in most cases, it’s important to forward
       unknown methods on to the wrapped object. An example of wrapping a
       receiver:

          class NetstringSplittingWrapper(object):
              def __init__(self, wrapped):
                  self.wrapped = wrapped

              def netstringReceived(self, string):
                  splitpoint = len(string) // 2
                  self.wrapped.netstringFirstHalfReceived(string[:splitpoint])
                  self.wrapped.netstringSecondHalfReceived(string[splitpoint:])

              def __getattr__(self, attr):
                  return getattr(self.wrapped, attr)

       Here is the corresponding receiver, followed once again by the
       construction of the Protocol:

          class SplitNetstringReceiver(object):
              currentRule = 'receiveNetstring'

              def __init__(self, sender):
                  self.sender = sender

              def prepareParsing(self, parser):
                  pass

              def finishParsing(self, reason):
                  pass

              def netstringFirstHalfReceived(self, string):
                  self.sender.sendNetstring(string)

              def netstringSecondHalfReceived(self, string):
                  pass

          NetstringProtocol = makeProtocol(
              grammar,
              stack(NetstringReversalWrapper, NetstringSender),
              stack(NetstringSplittingWrapper, SplitNetstringReceiver))

       The complete script is also available for download.

   Switching rules
       As mentioned before, it’s possible to change the current rule. Imagine
       a “netstrings2” protocol that looks like this:

          3:foo,3;bar,4:spam,4;eggs,

       That is, the protocol alternates between using : and using ; to
       separate the data length from the data. The amended grammar would look
       something like this:

          nonzeroDigit = digit:x ?(x != '0')
          digits = <'0' | nonzeroDigit digit*>:i -> int(i)
          netstring :delimiter = digits:length delimiter <anything{length}>:string ',' -> string

          colon = digits:length ':' <anything{length}>:string ',' -> receiver.netstringReceived(':', string)
          semicolon = digits:length ';' <anything{length}>:string ',' -> receiver.netstringReceived(';', string)

       Changing the current rule is as simple as changing the currentRule
       attribute on the receiver. Here, currentRule starts out as 'colon', and
       netstringReceived can echo each netstring and then switch currentRule
       to 'semicolon' after a colon-delimited netstring and back to 'colon'
       after a semicolon-delimited one.
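
       A receiver along these lines could look like the sketch below. It
       assumes the sendNetstring method of the NetstringSender shown earlier.
       RecordingSender is a hypothetical test double, used only so that the
       example runs without a network connection or Parsley.

```python
class NetstringReceiver(object):
    currentRule = 'colon'  # the first netstring uses ':'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, delimiter, string):
        self.sender.sendNetstring(string)  # echo the payload back
        # Expect the other delimiter next; the ParserProtocol only looks
        # at currentRule again once the current rule has finished matching.
        if delimiter == ':':
            self.currentRule = 'semicolon'
        else:
            self.currentRule = 'colon'


class RecordingSender(object):
    """Hypothetical test double: records netstrings instead of writing them."""

    def __init__(self):
        self.sent = []

    def sendNetstring(self, string):
        self.sent.append(string)


receiver = NetstringReceiver(RecordingSender())
receiver.netstringReceived(':', 'foo')
print(receiver.currentRule)  # semicolon
```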

       While changing the currentRule attribute can be done at any time, the
       ParserProtocol only examines the currentRule at the beginning of
       parsing and after a rule has finished matching. As a result, if the
       currentRule changes, the ParserProtocol will wait until the current
       rule is completely matched before switching rules.

       The complete script is also available for download.

EXTENDING GRAMMARS AND INHERITANCE

       WARNING:
          Unfinished.

       Another feature taken from OMeta is grammar inheritance. We can write a
       grammar with rules that override ones in a parent. If we load the
       grammar from our calculator tutorial as Calc, we can extend it with
       some constants:

          from parsley import makeGrammar
          import math
          import calc
          calcGrammarEx = """
          value = super | constant
          constant = 'pi' -> math.pi
                   | 'e' -> math.e
          """
          CalcEx = makeGrammar(calcGrammarEx, {"math": math}, extends=calc.Calc)

       Invoking the rule super calls the rule value in Calc. If it fails to
       match, our new value rule attempts to match a constant name.
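
       The override-then-fall-back behaviour of value = super | constant
       resembles method overriding in Python. The sketch below is only an
       analogy, not how Parsley implements grammar inheritance. Calc and
       CalcEx here are hypothetical miniature classes. Their value method
       takes a single token and either returns a number or raises ValueError
       to signal a failed match.

```python
import math


class Calc(object):
    def value(self, token):
        # Stand-in for the original value rule: only integers match.
        if token.isdigit():
            return int(token)
        raise ValueError("no match for %r" % token)


class CalcEx(Calc):
    CONSTANTS = {'pi': math.pi, 'e': math.e}

    def value(self, token):
        # value = super | constant
        try:
            return super(CalcEx, self).value(token)
        except ValueError:
            return self.constant(token)

    def constant(self, token):
        # constant = 'pi' -> math.pi | 'e' -> math.e
        if token in self.CONSTANTS:
            return self.CONSTANTS[token]
        raise ValueError("no match for %r" % token)


print(CalcEx().value('42'))  # 42
print(CalcEx().value('pi'))  # 3.141592653589793
```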
713

TERML

       TermL (“term-ell”) is the Term Language, a small expression-based
       language for representing arbitrary data in a simple structured
       format. It is ideal for expressing abstract syntax trees (ASTs) and
       other kinds of primitive data trees.
719
720   Creating Terms
721          >>> from terml.nodes import termMaker as t
722          >>> t.Term()
723          term('Term')
724
725       That’s it! We’ve created an empty term, Term, with nothing inside.
726
727          >>> t.Num(1)
728          term('Num(1)')
729          >>> t.Outer(t.Inner())
730          term('Outer(Inner)')
731
732       We  can  see  that  terms are not just namedtuple lookalikes. They have
733       their own internals and store data in a  slightly  different  and  more
734       structured way than a normal tuple.
735
736   Parsing Terms
       Parsley can parse terms from streams. Terms can contain any kind of
       parseable data, including other terms. Returning to the ubiquitous
       calculator example:
740
741          add = Add(:x, :y) -> x + y
742
       This rule matches a term called Add that has two components, binds
       those components to the names x and y, and returns their sum. Applied
       to a term like Add(3, 5), it would return 8.
746
747       Terms  can  be  nested, too. Here’s an example that performs a slightly
748       contrived match on a negated term inside an addition:
749
750          add_negate = Add(:x, Negate(:y)) -> x - y
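       The sketch below hand-codes what add and add_negate do, which makes
       the matching concrete without relying on TermL's own classes. The
       Term dataclass is a hypothetical stand-in (a tag plus a tuple of
       arguments), not terml.nodes.Term. The functions only mimic the
       grammar rules: they check the tag and the number of components, bind
       the components, and then compute the result.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a TermL term: a tag plus positional args."""
    tag: str
    args: Tuple[Any, ...] = ()


def match_add(term):
    """Hand-coded equivalent of: add = Add(:x, :y) -> x + y"""
    if term.tag == "Add" and len(term.args) == 2:
        x, y = term.args  # bind the two components
        return x + y
    raise ValueError("expected Add(x, y), got %r" % (term,))


def match_add_negate(term):
    """Hand-coded equivalent of: add_negate = Add(:x, Negate(:y)) -> x - y"""
    if term.tag == "Add" and len(term.args) == 2:
        x, inner = term.args
        # The second component must itself be a one-argument Negate term.
        if isinstance(inner, Term) and inner.tag == "Negate" and len(inner.args) == 1:
            (y,) = inner.args
            return x - y
    raise ValueError("expected Add(x, Negate(y)), got %r" % (term,))


print(match_add(Term("Add", (3, 5))))                             # 8
print(match_add_negate(Term("Add", (10, Term("Negate", (4,))))))  # 6
```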
751

PARSLEY REFERENCE

753   Basic syntax
754       foo = ....:
755              Define a rule named foo.
756
757       expr1 expr2:
758              Match expr1, and then match expr2 if it succeeds, returning  the
759              value of expr2. Like Python's and.
760
761       expr1 | expr2:
762              Try  to  match  expr1 --- if it fails, match expr2 instead. Like
763              Python's or.
764
765       expr*: Match expr zero or more times, returning a list of matches.
766
767       expr+: Match expr one or more times, returning a list of matches.
768
769       expr?: Try to match expr. Returns None if it fails to match.
770
771       expr{n, m}:
772              Match expr at least n times, and no more than m times.
773
774       expr{n}:
775              Match expr n times exactly.
776
777       ~expr: Negative lookahead. Fails if the next item in the input  matches
778              expr. Consumes no input.
779
780       ~~expr:
781              Positive lookahead. Fails if the next item in the input does not
782              match expr. Consumes no input.
783
784       ruleName or ruleName(arg1 arg2 etc):
785              Call the rule ruleName, possibly with args.
786
787       'x':   Match the literal character 'x'.
788
       <expr>:
              Returns the string consumed by matching expr. Good for
              tokenizing rules.
792
793       expr:name:
794              Bind the result of expr to the local variable name.
795
796       -> pythonExpression:
797              Evaluate  the given Python expression and return its result. Can
798              be used inside parentheses too!
799
800       !(pythonExpression):
801              Invoke a Python expression as an action.
802
       ?(pythonExpression):
              Fail if the Python expression is false; return True otherwise.
805
       expr ^(CustomLabel):
              If expr fails, the exception raised will contain CustomLabel.
              Good for providing more context when a rule fails to match.
              CustomLabel can contain any character other than "(" and ")".
810
811       Comments like Python comments are supported as well,  starting  with  #
812       and extending to the end of the line.
813
814   Python API
815   Protocol parsing API
816       class ometa.protocol.ParserProtocol
817              The  Twisted Protocol subclass used for parsing stream protocols
818              using Parsley. It has two public attributes:
819
820              sender After the connection is established, this attribute  will
821                     refer  to the sender created by the sender factory of the
822                     ParserProtocol.
823
824              receiver
825                     After the connection is established, this attribute  will
826                     refer  to the receiver created by the receiver factory of
827                     the ParserProtocol.
828
829              It's  common  to  also  add   a   factory   attribute   to   the
830              ParserProtocol from its factory's buildProtocol method, but this
831              isn't strictly required or guaranteed to be present.
832
833              Subclassing or instantiating ParserProtocol  is  not  necessary;
834              makeProtocol() is sufficient and requires less boilerplate.
835
836       class ometa.protocol.Receiver
837              Receiver  is not a real class but is used here for demonstration
838              purposes to indicate the required API.
839
840              currentRule
841                     ParserProtocol examines the currentRule attribute at  the
842                     beginning  of  parsing as well as after every time a rule
843                     has completely matched. At these times, the rule with the
844                     same name as the value of currentRule will be selected to
845                     start parsing the incoming stream of data.
846
847              prepareParsing(parserProtocol)
848                     prepareParsing() is called after the  ParserProtocol  has
849                     established    a    connection,   and   is   passed   the
850                     ParserProtocol instance itself.
851
                     Parameters
                            parserProtocol -- An instance of ParserProtocol.
854
              finishParsing(reason)
                     finishParsing() is called if an exception was raised
                     during parsing, or when the ParserProtocol has lost its
                     connection, whichever comes first. It will only be
                     called once.

                     An exception raised during parsing can be due to
                     incoming data that doesn't match the current rule, or
                     to an exception raised by Python code called during
                     matching.

                     Parameters
                            reason -- A Failure encapsulating the reason
                            parsing has ended.
868
869       Senders do not have any required API as ParserProtocol will never  call
870       methods on a sender.
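       A minimal receiver that satisfies the API above might look like the
       following sketch. The class name LoggingReceiver, the rule name
       'line' and the events list are hypothetical. The constructor assumes
       that the receiver factory is called with the sender. The callbacks
       are driven by hand here, so neither Twisted nor a real ParserProtocol
       is needed.

```python
class LoggingReceiver(object):
    """Hypothetical receiver implementing the Receiver API described above."""

    # Name of the grammar rule ParserProtocol should start with
    # ('line' is a made-up rule name for this sketch).
    currentRule = 'line'

    def __init__(self, sender):
        # Assumption: the receiver factory is called with the sender.
        self.sender = sender
        self.parserProtocol = None
        self.events = []

    def prepareParsing(self, parserProtocol):
        # Called after the ParserProtocol has established a connection.
        self.parserProtocol = parserProtocol
        self.events.append('prepare')

    def finishParsing(self, reason):
        # Called only once: on a parse error or on connection loss,
        # whichever comes first. In real use `reason` is a Failure.
        self.events.append(('finish', reason))


# Drive the callbacks by hand, in the order ParserProtocol would use them.
receiver = LoggingReceiver(sender=None)
receiver.prepareParsing(parserProtocol=object())
receiver.finishParsing(reason='connection lost')
print(receiver.events)  # ['prepare', ('finish', 'connection lost')]
```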
871
872   Built-in Parsley Rules
873       anything:
874              Matches a single character from the input.
875
876       letter:
877              Matches a single ASCII letter.
878
879       digit: Matches a decimal digit.
880
881       letterOrDigit:
              Matches a single letter or decimal digit, combining letter and
              digit.
883
884       end:   Matches the end of input.
885
886       ws:    Matches zero or more spaces, tabs, or newlines.
887
888       exactly(char):
889              Matches the character char.
890

AUTHOR

       Allen Short

       2023, Allen Short

1.3                              Aug 15, 2023                       PARSLEY(1)