1PARSLEY(1) Parsley PARSLEY(1)
2
3
4
6 parsley - Parsley Documentation
7
8 Contents:
9
11 From Regular Expressions To Grammars
12 Parsley is a pattern matching and parsing tool for Python programmers.
13
14 Most Python programmers are familiar with regular expressions, as pro‐
15 vided by Python’s re module. To use it, you provide a string that de‐
16 scribes the pattern you want to match, and your input.
17
18 For example:
19
20 >>> import re
21 >>> x = re.compile("a(b|c)d+e")
22 >>> x.match("abddde")
23 <_sre.SRE_Match object at 0x7f587af54af8>
24
25 You can do exactly the same sort of thing in Parsley:
26
27 >>> import parsley
28 >>> x = parsley.makeGrammar("foo = 'a' ('b' | 'c') 'd'+ 'e'", {})
29 >>> x("abdde").foo()
30 'e'
31
From this small example, a couple of differences between regular
expressions and Parsley grammars can be seen:
34
35 Parsley Grammars Have Named Rules
36 A Parsley grammar can have many rules, and each has a name. The example
37 above has a single rule named foo. Rules can call each other; calling
38 rules in Parsley works like calling functions in Python. Here is an‐
39 other way to write the grammar above:
40
41 foo = 'a' baz 'd'+ 'e'
42 baz = 'b' | 'c'
43
44 Parsley Grammars Are Expressions
Calling match for a regular expression returns a match object if the
match succeeds or None if it fails. Parsley parsers return the value of
the last expression in the rule. Behind the scenes, Parsley turns each
rule in your grammar into a Python method. In pseudo-Python code, it
looks something like this:
50
def foo(self):
    match('a')
    self.baz()
    match_one_or_more('d')
    return match('e')

def baz(self):
    return match('b') or match('c')
59
60 The value of the last expression in the rule is what the rule returns.
61 This is why our example returns ‘e’.
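
The pseudo-Python above leaves match and match_one_or_more undefined. As
a rough illustration only, the following hand-written sketch fills them
in so the example runs. TinyParser, ParseError, and the helper methods
are names invented here; this is not the code Parsley actually generates.

```python
class ParseError(Exception):
    """Raised when the input does not match (hypothetical name)."""


class TinyParser(object):
    """Hand-written stand-in for a generated parser; illustration only."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def match(self, char):
        # Succeed only if the next character is exactly `char`.
        if self.data[self.pos:self.pos + 1] != char:
            raise ParseError("expected %r at position %d" % (char, self.pos))
        self.pos += 1
        return char

    def match_one_or_more(self, char):
        # The first match is mandatory; later ones are optional.
        results = [self.match(char)]
        while self.data[self.pos:self.pos + 1] == char:
            results.append(self.match(char))
        return results

    def baz(self):
        # 'b' | 'c': try the first alternative, fall back to the second.
        try:
            return self.match('b')
        except ParseError:
            return self.match('c')

    def foo(self):
        self.match('a')
        self.baz()
        self.match_one_or_more('d')
        # The value of the last expression is the value of the rule.
        return self.match('e')


print(TinyParser("abdde").foo())
```

Running it prints e, the value of the last expression in foo.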
62
63 The similarities to regular expressions pretty much end here, though.
64 Having multiple named rules composed of expressions makes for a much
65 more powerful tool, and now we’re going to look at some more features
66 that go even further.
67
68 Rules Can Embed Python Expressions
69 Since these rules just turn into Python code eventually, we can stick
70 some Python code into them ourselves. This is particularly useful for
71 changing the return value of a rule. The Parsley expression for this is
72 ->. We can also bind the results of expressions to variable names and
73 use them in Python code. So things like this are possible:
74
75 x = parsley.makeGrammar("""
76 foo = 'a':one baz:two 'd'+ 'e' -> (one, two)
77 baz = 'b' | 'c'
78 """, {})
79 print x("abdde").foo()
80
81 ('a', 'b')
82
83 Literal match expressions like ‘a’ return the character they match. Us‐
84 ing a colon and a variable name after an expression is like assignment
85 in Python. As a result, we can use those names in a Python expression -
86 in this case, creating a tuple.
87
88 Another way to use Python code in a rule is to write custom tests for
89 matching. Sometimes it’s more convenient to write some Python that de‐
90 termines if a rule matches than to stick to Parsley expressions alone.
91 For those cases, we can use ?(). Here, we use the builtin rule anything
92 to match a single character, then a Python predicate to decide if it’s
93 the one we want:
94
95 digit = anything:x ?(x in '0123456789') -> x
96
97 This rule digit will match any decimal digit. We need the -> x on the
98 end to return the character rather than the value of the predicate ex‐
99 pression, which is just True.
100
101 Repeated Matches Make Lists
Like regular expressions, Parsley supports repeating matches. You can
match an expression zero or more times with ‘*’, one or more times
with ‘+’, and a specific number of times with ‘{n, m}’ or just ‘{n}’.
Since all expressions in Parsley return a value, these repetition
operators return a list containing each match they made.
107
108 x = parsley.makeGrammar("""
109 digit = anything:x ?(x in '0123456789') -> x
110 number = digit+
111 """, {})
112 print x("314159").number()
113
114 ['3', '1', '4', '1', '5', '9']
115
116 The number rule repeatedly matches digit and collects the matches into
117 a list. This gets us part way to turning a string like 314159 into an
118 integer. All we need now is to turn the list back into a string and
119 call int():
120
121 x = parsley.makeGrammar("""
122 digit = anything:x ?(x in '0123456789') -> x
123 number = digit+:ds -> int(''.join(ds))
124 """, {})
125 print x("8675309").number()
126
127 8675309
128
129 Collecting Chunks Of Input
130 If it seemed kind of strange to break our input string up into a list
131 and then reassemble it into a string using join, you’re not alone.
132 Parsley has a shortcut for this since it’s a common case: you can use
133 <> around a rule to make it return the slice of input it consumes, ig‐
134 noring the actual return value of the rule. For example:
135
136 x = parsley.makeGrammar("""
137 digit = anything:x ?(x in '0123456789')
138 number = <digit+>:ds -> int(ds)
139 """, {})
140 print x("11235").number()
141
142 11235
143
144 Here, <digit+> returns the string “11235”, since that’s the portion of
145 the input that digit+ matched. (In this case it’s the entire input, but
146 we’ll see some more complex cases soon.) Since it ignores the list re‐
147 turned by digit+, leaving the -> x out of digit doesn’t change the re‐
148 sult.
149
150 Building A Calculator
151 Now let’s look at using these rules in a more complicated parser. We
152 have support for parsing numbers; let’s do addition, as well.
153
154 x = parsley.makeGrammar("""
155 digit = anything:x ?(x in '0123456789')
156 number = <digit+>:ds -> int(ds)
157 expr = number:left ( '+' number:right -> left + right
158 | -> left)
159 """, {})
160 print x("17+34").expr()
161 print x("18").expr()
162
163 51
164 18
165
Parentheses group expressions just like in Python. The ‘|’ operator is
167 like or in Python - it short-circuits. It tries each expression until
168 it finds one that matches. For “17+34”, the number rule matches “17”,
169 then Parsley tries to match + followed by another number. Since “+” and
170 “34” are the next things in the input, those match, and it then runs
171 the Python expression left + right and returns its value. For the input
172 “18” it does the same, but + does not match, so Parsley tries the next
173 thing after |. Since this is just a Python expression, the match suc‐
174 ceeds and the number 18 is returned.
175
176 Now let’s add subtraction:
177
178 digit = anything:x ?(x in '0123456789')
179 number = <digit+>:ds -> int(ds)
180 expr = number:left ( '+' number:right -> left + right
181 | '-' number:right -> left - right
182 | -> left)
183
184 This will accept things like ‘5-4’ now.
185
186 Since parsing numbers is so common and useful, Parsley actually has
187 ‘digit’ as a builtin rule, so we don’t even need to define it our‐
188 selves. We’ll leave it out in further examples and rely on the version
189 Parsley provides.
190
191 Normally we like to allow whitespace in our expressions, so let’s add
192 some support for spaces:
193
194 number = <digit+>:ds -> int(ds)
195 ws = ' '*
196 expr = number:left ws ('+' ws number:right -> left + right
197 |'-' ws number:right -> left - right
198 | -> left)
199
200 Now we can handle “17 +34”, “2 - 1”, etc.
201
202 We could go ahead and add multiplication and division here (and hope‐
203 fully it’s obvious how that would work), but let’s complicate things
204 further and allow multiple operations in our expressions – things like
205 “1 - 2 + 3”.
206
There are a couple of different ways to do this. Possibly the easiest is
to build a list of numbers and operations, then do the math:
209
210 x = parsley.makeGrammar("""
211 number = <digit+>:ds -> int(ds)
212 ws = ' '*
213 add = '+' ws number:n -> ('+', n)
214 sub = '-' ws number:n -> ('-', n)
215 addsub = ws (add | sub)
216 expr = number:left (addsub+:right -> right
217 | -> left)
218 """, {})
219 print x("1 + 2 - 3").expr()
220
[('+', 2), ('-', 3)]
222
223 Oops, this is only half the job done. We’re collecting the operators
224 and values, but now we need to do the actual calculation. The easiest
225 way to do it is probably to write a Python function and call it from
226 inside the grammar.
227
228 So far we have been passing an empty dict as the second argument to
229 makeGrammar. This is a dict of variable bindings that can be used in
230 Python expressions in the grammar. So we can pass Python objects, such
231 as functions, this way:
232
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result
241 x = parsley.makeGrammar("""
242 number = <digit+>:ds -> int(ds)
243 ws = ' '*
244 add = '+' ws number:n -> ('+', n)
245 sub = '-' ws number:n -> ('-', n)
246 addsub = ws (add | sub)
247 expr = number:left (addsub+:right -> calculate(left, right)
248 | -> left)
249 """, {"calculate": calculate})
250 print x("4 + 5 - 6").expr()
251
252 3
253
254 Introducing this function lets us simplify even further: instead of us‐
255 ing addsub+, we can use addsub*, since calculate(left, []) will return
256 left – so now expr becomes:
257
258 expr = number:left addsub*:right -> calculate(left, right)
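
The claim that calculate(left, []) returns left is easy to check outside
the grammar. The sketch below repeats calculate from above and calls it
directly with the kind of pair lists that addsub* would produce; the
sample inputs are chosen here for illustration.

```python
def calculate(start, pairs):
    # Fold a list of (operator, value) pairs into a running result.
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result


# The pairs that "4 + 5 - 6" would yield after the leading number 4.
print(calculate(4, [('+', 5), ('-', 6)]))
# No operators matched: addsub* gives an empty list, so we get 18 back.
print(calculate(18, []))
```

This prints 3 and then 18, which is why the separate "| -> left"
alternative is no longer needed.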
259
260 So now let’s look at adding multiplication and division. Here, we run
261 into precedence rules: should “4 * 5 + 6” give us 26, or 44? The tradi‐
262 tional choice is for multiplication and division to take precedence
263 over addition and subtraction, so the answer should be 26. We’ll re‐
264 solve this by making sure multiplication and division happen before ad‐
265 dition and subtraction are considered:
266
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result
279 x = parsley.makeGrammar("""
280 number = <digit+>:ds -> int(ds)
281 ws = ' '*
282 add = '+' ws expr2:n -> ('+', n)
283 sub = '-' ws expr2:n -> ('-', n)
284 mul = '*' ws number:n -> ('*', n)
285 div = '/' ws number:n -> ('/', n)
286
287 addsub = ws (add | sub)
288 muldiv = ws (mul | div)
289
290 expr = expr2:left addsub*:right -> calculate(left, right)
291 expr2 = number:left muldiv*:right -> calculate(left, right)
292 """, {"calculate": calculate})
293 print x("4 * 5 + 6").expr()
294
295 26
296
297 Notice particularly that add, sub, and expr all call the expr2 rule now
298 where they called number before. This means that all the places where a
299 number was expected previously, a multiplication or division expression
300 can appear instead.
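
One way to see why this grammar yields 26 is to write out by hand the
calculate calls that expr and expr2 would make for “4 * 5 + 6”. This is
a sketch of the evaluation order under that reading of the grammar, not
output captured from Parsley; calculate is repeated so the block stands
alone.

```python
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result


# expr2 handles "4 * 5": number gives 4, muldiv* gives [('*', 5)].
left = calculate(4, [('*', 5)])
# add calls expr2 for its operand: number gives 6, muldiv* matches nothing.
right = calculate(6, [])
# expr combines the expr2 result with the pairs collected by addsub*.
result = calculate(left, [('+', right)])
print(result)
```

The multiplication is folded into left before expr ever sees the '+',
so the result is 26 rather than 44.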
301
302 Finally let’s add parentheses, so you can override the precedence and
303 write “4 * (5 + 6)” when you do want 44. We’ll do this by adding a
304 value rule that accepts either a number or an expression in parenthe‐
305 ses, and replace existing calls to number with calls to value.
306
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result
319 x = parsley.makeGrammar("""
320 number = <digit+>:ds -> int(ds)
321 parens = '(' ws expr:e ws ')' -> e
322 value = number | parens
323 ws = ' '*
324 add = '+' ws expr2:n -> ('+', n)
325 sub = '-' ws expr2:n -> ('-', n)
326 mul = '*' ws value:n -> ('*', n)
327 div = '/' ws value:n -> ('/', n)
328
329 addsub = ws (add | sub)
330 muldiv = ws (mul | div)
331
332 expr = expr2:left addsub*:right -> calculate(left, right)
333 expr2 = value:left muldiv*:right -> calculate(left, right)
334 """, {"calculate": calculate})
335
336 print x("4 * (5 + 6) + 1").expr()
337
338 45
339
340 And there you have it: a four-function calculator with precedence and
341 parentheses.
342
344 Now that you are familiar with the basics of Parsley syntax, let’s look
345 at a more realistic example: a JSON parser.
346
347 The JSON spec on http://json.org/ describes the format, and we can
348 adapt its description to a parser. We’ll write the Parsley rules in the
349 same order as the grammar rules in the right sidebar on the JSON site,
350 starting with the top-level rule, ‘object’.
351
352 object = ws '{' members:m ws '}' -> dict(m)
353
354 Parsley defines a builtin rule ws which consumes any spaces, tabs, or
355 newlines it can.
356
357 Since JSON objects are represented in Python as dicts, and dict takes a
358 list of pairs, we need a rule to collect name/value pairs inside an ob‐
359 ject expression.
360
361 members = (pair:first (ws ',' pair)*:rest -> [first] + rest)
362 | -> []
363
This handles the three cases for object contents: one, multiple, or
zero pairs. A name/value pair is separated by a colon. Whitespace after
the colon is consumed by the value rule, which begins with ws:
367
368 pair = ws string:k ws ':' value:v -> (k, v)
369
370 Arrays, similarly, are sequences of array elements, and are represented
371 as Python lists.
372
373 array = '[' elements:xs ws ']' -> xs
374 elements = (value:first (ws ',' value)*:rest -> [first] + rest) | -> []
375
376 Values can be any JSON expression.
377
378 value = ws (string | number | object | array
379 | 'true' -> True
380 | 'false' -> False
381 | 'null' -> None)
382
383 Strings are sequences of zero or more characters between double quotes.
384 Of course, we need to deal with escaped characters as well. This rule
385 introduces the operator ~, which does negative lookahead; if the ex‐
386 pression following it succeeds, its parse will fail. If the expression
387 fails, the rest of the parse continues. Either way, no input will be
388 consumed.
389
390 string = '"' (escapedChar | ~'"' anything)*:c '"' -> ''.join(c)
391
This is a common pattern, so let’s examine it step by step. The rule
first matches a double quote character (any leading whitespace has
already been consumed by the calling rule). It then
394 matches zero or more characters. If it’s not an escapedChar (which will
395 start with a backslash), we check to see if it’s a double quote, in
396 which case we want to end the loop. If it’s not a double quote, we
397 match it using the rule anything, which accepts a single character of
398 any kind, and continue. Finally, we match the ending double quote and
399 return the characters in the string. We cannot use the <> syntax in
400 this case because we don’t want a literal slice of the input – we want
401 escape sequences to be replaced with the character they represent.
402
It’s very common to use ~ for “match until” situations where you want
to keep parsing only until an end marker is found. Similarly, ~~ is
positive lookahead: it succeeds if its expression succeeds but does not
consume any input.
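
Readers coming from the re module may find it helpful to compare these
operators with regular-expression lookahead, which behaves similarly:
(?!...) is negative lookahead and (?=...) is positive lookahead. The
sketch below uses only re, not Parsley; the pattern names are invented
for illustration, and the first pattern deliberately ignores escape
sequences.

```python
import re

# "Match until": keep consuming characters while the next one is not a
# double quote, mirroring (~'"' anything)* in the string rule.
quoted = re.compile(r'"((?:(?!").)*)"')
m = quoted.match('"hello" world')
print(m.group(1))

# Positive lookahead: require "ab" ahead, but consume only the "a".
peek = re.compile(r'(?=ab)a')
m2 = peek.match('abc')
print(m2.end())
```

The first print shows hello; the second shows 1, because the lookahead
itself consumed nothing.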
407
408 The escapedChar rule should not be too surprising: we match a backslash
409 then whatever escape code is given.
410
411 escapedChar = '\\' (('"' -> '"') |('\\' -> '\\')
412 |('/' -> '/') |('b' -> '\b')
413 |('f' -> '\f') |('n' -> '\n')
414 |('r' -> '\r') |('t' -> '\t')
415 |('\'' -> '\'') | escapedUnicode)
416
417 Unicode escapes (of the form \u2603) require matching four hex digits,
418 so we use the repetition operator {}, which works like + or * except
419 taking either a {min, max} pair or simply a {number} indicating the ex‐
420 act number of repetitions.
421
422 hexdigit = :x ?(x in '0123456789abcdefABCDEF') -> x
423 escapedUnicode = 'u' <hexdigit{4}>:hs -> unichr(int(hs, 16))
424
425 With strings out of the way, we advance to numbers, both integer and
426 floating-point.
427
428 number = spaces ('-' | -> ''):sign (intPart:ds (floatPart(sign ds)
429 | -> int(sign + ds)))
430
431 Here we vary from the json.org description a little and move sign han‐
432 dling up into the number rule. We match either an intPart followed by a
433 floatPart or just an intPart by itself.
434
435 digit = :x ?(x in '0123456789') -> x
436 digits = <digit*>
437 digit1_9 = :x ?(x in '123456789') -> x
438
439 intPart = (digit1_9:first digits:rest -> first + rest) | digit
440 floatPart :sign :ds = <('.' digits exponent?) | exponent>:tail
441 -> float(sign + ds + tail)
442 exponent = ('e' | 'E') ('+' | '-')? digits
443
In JSON, multi-digit numbers cannot start with 0 (since that is
JavaScript’s syntax for octal numbers), so intPart uses digit1_9 to
exclude it in the first position.
447
448 The floatPart rule takes two parameters, sign and ds. Our number rule
449 passes values for these when it invokes floatPart, letting us avoid du‐
450 plication of work within the rule. Note that pattern matching on argu‐
451 ments to rules works the same as on the string input to the parser. In
452 this case, we provide no pattern, just a name: :ds is the same as any‐
453 thing:ds.
454
455 (Also note that our float rule cheats a little: it does not really
456 parse floating-point numbers, it merely recognizes them and passes them
457 to Python’s float builtin to actually produce the value.)
458
459 The full version of this parser and its test cases can be found in the
460 examples directory in the Parsley distribution.
461
463 This tutorial assumes basic knowledge of writing Twisted TCP clients or
464 servers.
465
466 Basic parsing
Parsing data that comes in over the network can be difficult because
there is no guarantee of receiving whole messages. Buffering is
469 often complicated by protocols switching between using fixed-width mes‐
470 sages and delimiters for framing. Fortunately, Parsley can remove all
471 of this tedium.
472
473 With parsley.makeProtocol(), Parsley can generate a Twisted
474 IProtocol-implementing class which will match incoming network data us‐
475 ing Parsley grammar rules. Before getting started with makeProtocol(),
476 let’s build a grammar for netstrings. The netstrings protocol is very
477 simple:
478
479 4:spam,4:eggs,
480
481 This stream contains two netstrings: spam, and eggs. The data is pre‐
482 fixed with one or more ASCII digits followed by a :, and suffixed with
483 a ,. So, a Parsley grammar to match a netstring would look like:
484
485 nonzeroDigit = digit:x ?(x != '0')
486 digits = <'0' | nonzeroDigit digit*>:i -> int(i)
487
488 netstring = digits:length ':' <anything{length}>:string ',' -> string
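
To make the framing concrete, here is a hand-written sketch of the same
parse in plain Python, including the "wait for more data" case that
makeProtocol() handles for you. The function name parse_netstrings and
its return shape are assumptions made for this illustration; it is not
part of Parsley.

```python
def parse_netstrings(data):
    """Split data into complete netstring payloads plus any unparsed tail."""
    payloads = []
    pos = 0
    while True:
        colon = data.find(':', pos)
        if colon == -1:
            break  # the length prefix is not complete yet
        length_text = data[pos:colon]
        valid = length_text != '' and all(
            c in '0123456789' for c in length_text)
        # Like the digits rule: '0' alone is fine, leading zeros are not.
        if not valid or (len(length_text) > 1 and length_text[0] == '0'):
            raise ValueError("bad length prefix: %r" % (length_text,))
        length = int(length_text)
        end = colon + 1 + length
        if len(data) <= end:
            break  # payload or trailing comma has not arrived yet
        if data[end] != ',':
            raise ValueError("expected ',' at position %d" % (end,))
        payloads.append(data[colon + 1:end])
        pos = end + 1
    return payloads, data[pos:]


print(parse_netstrings("4:spam,4:eggs,"))
print(parse_netstrings("4:spam,4:eg"))
```

The second call returns the one complete netstring and leaves "4:eg"
as a remainder to be retried once more data arrives.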
489
490
491 makeProtocol() takes, in addition to a grammar, a factory for a
492 “sender” and a factory for a “receiver”. In the system of objects man‐
493 aged by the ParserProtocol, the sender is in charge of writing data to
494 the wire, and the receiver has methods called on it by the Parsley
495 rules. To demonstrate it, here is the final piece needed in the Parsley
496 grammar for netstrings:
497
498 receiveNetstring = netstring:string -> receiver.netstringReceived(string)
499
500
501 The receiver is always available in Parsley rules with the name re‐
502 ceiver, allowing Parsley rules to call methods on it.
503
504 When data is received over the wire, the ParserProtocol tries to match
505 the received data against the current rule. If the current rule re‐
506 quires more data to finish matching, the ParserProtocol stops and waits
507 until more data comes in, then tries to continue matching. This repeats
508 until the current rule is completely matched, and then the
509 ParserProtocol starts matching any leftover data against the current
510 rule again.
511
512 One specifies the current rule by setting a currentRule attribute on
513 the receiver, which the ParserProtocol looks at before doing any pars‐
514 ing. Changing the current rule is addressed in the Switching rules sec‐
515 tion.
516
517 Since the ParserProtocol will never modify the currentRule attribute
518 itself, the default behavior is to keep using the same rule. Parsing
519 netstrings doesn’t require any rule changing, so, the default behavior
520 of continuing to use the same rule is fine.
521
522 Both the sender factory and receiver factory are constructed when the
523 ParserProtocol’s connection is established. The sender factory is a
524 one-argument callable which will be passed the ParserProtocol’s
525 Transport. This allows the sender to send data over the transport. For
526 example:
527
class NetstringSender(object):
    def __init__(self, transport):
        self.transport = transport

    def sendNetstring(self, string):
        self.transport.write('%d:%s,' % (len(string), string))
534
535
536 The receiver factory is another one-argument callable which is passed
537 the constructed sender. The returned object must at least have
538 prepareParsing() and finishParsing() methods. prepareParsing() is
539 called with the ParserProtocol instance when a connection is estab‐
540 lished (i.e. in the connectionMade of the ParserProtocol) and
541 finishParsing() is called when a connection is closed (i.e. in the con‐
542 nectionLost of the ParserProtocol).
543
544 NOTE:
Both the receiver factory and its returned object’s prepareParsing()
are called in the ParserProtocol’s connectionMade method; this
separation is for ease of testing receivers.
548
To demonstrate a receiver, here is a simple receiver that receives
netstrings and echoes the same netstrings back:
551
class NetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, string):
        self.sender.sendNetstring(string)
566
567
568 Putting it all together, the Protocol is constructed using the grammar,
569 sender factory, and receiver factory:
570
571
572
573 NetstringProtocol = makeProtocol(
574 grammar, NetstringSender, NetstringReceiver)
575
576
577
578
579 The complete script is also available for download.
580
581 Intermezzo: error reporting
582 If an exception is raised from within Parsley during parsing, whether
583 it’s due to input not matching the current rule or an exception being
584 raised from code the grammar calls, the connection will be immediately
585 closed. The traceback will be captured as a Failure and passed to the
586 finishParsing() method of the receiver.
587
588 At present, there is no way to recover from failure.
589
590 Composing senders and receivers
Senders and receivers are deliberately designed to make composition
easy: no subclassing is required. While the composition is easy enough
to do on your own, Parsley provides a function for it: stack(). It
takes zero or more wrappers followed by the base factory, which is
applied first.
595
596 Its use is extremely simple: stack(x, y, z) will return a callable
597 suitable either as a sender or receiver factory which will, when called
598 with an argument, return x(y(z(argument))).
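
The composition described above can be sketched in a few lines. This is
a stand-in written only from that description, under the hypothetical
name stack_sketch; Parsley's own stack() may differ in details such as
error handling.

```python
def stack_sketch(*factories):
    """Compose factories so that stack_sketch(x, y, z)(arg) == x(y(z(arg)))."""
    def factory(argument):
        result = argument
        # Apply the last factory first, then wrap outward.
        for f in reversed(factories):
            result = f(result)
        return result
    return factory


# Toy "factories" that just record the order in which they were applied.
def x(value):
    return 'x(%s)' % value

def y(value):
    return 'y(%s)' % value

def z(value):
    return 'z(%s)' % value

print(stack_sketch(x, y, z)('argument'))
```

The printed string is x(y(z(argument))), showing that the last factory
passed in is the one that receives the original argument.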
599
600 An example of wrapping a sender factory:
601
class NetstringReversalWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def sendNetstring(self, string):
        self.wrapped.sendNetstring(string[::-1])
608
609
610 And then, constructing the Protocol:
611
612 NetstringProtocol = makeProtocol(
613 grammar,
614 stack(NetstringReversalWrapper, NetstringSender),
615 NetstringReceiver)
616
617 A wrapper doesn’t need to call the same methods on the thing it’s wrap‐
618 ping. Also note that in most cases, it’s important to forward unknown
619 methods on to the wrapped object. An example of wrapping a receiver:
620
class NetstringSplittingWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def netstringReceived(self, string):
        splitpoint = len(string) // 2
        self.wrapped.netstringFirstHalfReceived(string[:splitpoint])
        self.wrapped.netstringSecondHalfReceived(string[splitpoint:])

    def __getattr__(self, attr):
        return getattr(self.wrapped, attr)
632
633
The corresponding receiver, followed once again by constructing the Protocol:
635
class SplitNetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringFirstHalfReceived(self, string):
        self.sender.sendNetstring(string)

    def netstringSecondHalfReceived(self, string):
        pass
653
654
NetstringProtocol = makeProtocol(
    grammar,
    stack(NetstringReversalWrapper, NetstringSender),
    stack(NetstringSplittingWrapper, SplitNetstringReceiver))
658
659
660 The complete script is also available for download.
661
662 Switching rules
663 As mentioned before, it’s possible to change the current rule. Imagine
664 a “netstrings2” protocol that looks like this:
665
666 3:foo,3;bar,4:spam,4;eggs,
667
That is, the protocol alternates between using : and using ; to
separate the data length from the data. The amended grammar would look
something like this:
671
672 nonzeroDigit = digit:x ?(x != '0')
673 digits = <'0' | nonzeroDigit digit*>:i -> int(i)
674 netstring :delimiter = digits:length delimiter <anything{length}>:string ',' -> string
675
676 colon = digits:length ':' <anything{length}>:string ',' -> receiver.netstringReceived(':', string)
677 semicolon = digits:length ';' <anything{length}>:string ',' -> receiver.netstringReceived(';', string)
678
679
680 Changing the current rule is as simple as changing the currentRule at‐
681 tribute on the receiver. So, the netstringReceived method could look
682 like this:
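
A minimal sketch of such a receiver, assuming it starts on the colon
rule and echoes each netstring back through its sender. The class names
SwitchingNetstringReceiver and RecordingSender are invented for this
illustration, and RecordingSender stands in for a real sender so the
block runs without a network connection.

```python
class RecordingSender(object):
    """Stand-in sender that records what would be written to the wire."""

    def __init__(self):
        self.sent = []

    def sendNetstring(self, string):
        self.sent.append(string)


class SwitchingNetstringReceiver(object):
    # Assumption: the stream starts with a ':'-delimited netstring.
    currentRule = 'colon'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, delimiter, string):
        self.sender.sendNetstring(string)
        # Flip to the other rule for the next netstring.
        if delimiter == ':':
            self.currentRule = 'semicolon'
        else:
            self.currentRule = 'colon'


receiver = SwitchingNetstringReceiver(RecordingSender())
receiver.netstringReceived(':', 'foo')
print(receiver.currentRule)
receiver.netstringReceived(';', 'bar')
print(receiver.currentRule)
```

This prints semicolon and then colon. Assigning to self.currentRule
sets an instance attribute, so the class-level default is untouched for
other connections.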
683
684 While changing the currentRule attribute can be done at any time, the
685 ParserProtocol only examines the currentRule at the beginning of pars‐
686 ing and after a rule has finished matching. As a result, if the
687 currentRule changes, the ParserProtocol will wait until the current
688 rule is completely matched before switching rules.
689
690 The complete script is also available for download.
691
693 warning
694 Unfinished
695
696 Another feature taken from OMeta is grammar inheritance. We can write a
697 grammar with rules that override ones in a parent. If we load the gram‐
698 mar from our calculator tutorial as Calc, we can extend it with some
699 constants:
700
701 from parsley import makeGrammar
702 import math
703 import calc
704 calcGrammarEx = """
705 value = super | constant
706 constant = 'pi' -> math.pi
707 | 'e' -> math.e
708 """
CalcEx = makeGrammar(calcGrammarEx, {"math": math}, extends=calc.Calc)
710
711 Invoking the rule super calls the rule value in Calc. If it fails to
712 match, our new value rule attempts to match a constant name.
713
715 TermL (“term-ell”) is the Term Language, a small expression-based lan‐
716 guage for representing arbitrary data in a simple structured format. It
717 is ideal for expressing abstract syntax trees (ASTs) and other kinds of
718 primitive data trees.
719
720 Creating Terms
721 >>> from terml.nodes import termMaker as t
722 >>> t.Term()
723 term('Term')
724
725 That’s it! We’ve created an empty term, Term, with nothing inside.
726
727 >>> t.Num(1)
728 term('Num(1)')
729 >>> t.Outer(t.Inner())
730 term('Outer(Inner)')
731
732 We can see that terms are not just namedtuple lookalikes. They have
733 their own internals and store data in a slightly different and more
734 structured way than a normal tuple.
735
736 Parsing Terms
737 Parsley can parse terms from streams. Terms can contain any kind of
738 parseable data, including other terms. Returning to the ubiquitous cal‐
739 culator example:
740
741 add = Add(:x, :y) -> x + y
742
Here this rule matches a term called Add which has two components, binds
those components to a couple of names (x and y), and returns their sum.
745 If this rule were applied to a term like Add(3, 5), it would return 8.
746
747 Terms can be nested, too. Here’s an example that performs a slightly
748 contrived match on a negated term inside an addition:
749
750 add_negate = Add(:x, Negate(:y)) -> x - y
751
753 Basic syntax
754 foo = ....:
755 Define a rule named foo.
756
757 expr1 expr2:
758 Match expr1, and then match expr2 if it succeeds, returning the
759 value of expr2. Like Python's and.
760
761 expr1 | expr2:
762 Try to match expr1 --- if it fails, match expr2 instead. Like
763 Python's or.
764
765 expr*: Match expr zero or more times, returning a list of matches.
766
767 expr+: Match expr one or more times, returning a list of matches.
768
769 expr?: Try to match expr. Returns None if it fails to match.
770
771 expr{n, m}:
772 Match expr at least n times, and no more than m times.
773
774 expr{n}:
775 Match expr n times exactly.
776
777 ~expr: Negative lookahead. Fails if the next item in the input matches
778 expr. Consumes no input.
779
780 ~~expr:
781 Positive lookahead. Fails if the next item in the input does not
782 match expr. Consumes no input.
783
784 ruleName or ruleName(arg1 arg2 etc):
785 Call the rule ruleName, possibly with args.
786
787 'x': Match the literal character 'x'.
788
789 <expr>:
790 Returns the string consumed by matching expr. Good for tokeniz‐
791 ing rules.
792
793 expr:name:
794 Bind the result of expr to the local variable name.
795
796 -> pythonExpression:
797 Evaluate the given Python expression and return its result. Can
798 be used inside parentheses too!
799
800 !(pythonExpression):
801 Invoke a Python expression as an action.
802
803 ?(pythonExpression):
Fail if the Python expression is false. Returns True otherwise.
805
806 expr ^(CustomLabel):
807 If the expr fails, the exception raised will contain CustomLa‐
808 bel. Good for providing more context when a rule is broken.
809 CustomLabel can contain any character other than "(" and ")".
810
811 Comments like Python comments are supported as well, starting with #
812 and extending to the end of the line.
813
814 Python API
815 Protocol parsing API
816 class ometa.protocol.ParserProtocol
817 The Twisted Protocol subclass used for parsing stream protocols
818 using Parsley. It has two public attributes:
819
820 sender After the connection is established, this attribute will
821 refer to the sender created by the sender factory of the
822 ParserProtocol.
823
824 receiver
825 After the connection is established, this attribute will
826 refer to the receiver created by the receiver factory of
827 the ParserProtocol.
828
829 It's common to also add a factory attribute to the
830 ParserProtocol from its factory's buildProtocol method, but this
831 isn't strictly required or guaranteed to be present.
832
833 Subclassing or instantiating ParserProtocol is not necessary;
834 makeProtocol() is sufficient and requires less boilerplate.
835
836 class ometa.protocol.Receiver
837 Receiver is not a real class but is used here for demonstration
838 purposes to indicate the required API.
839
840 currentRule
841 ParserProtocol examines the currentRule attribute at the
842 beginning of parsing as well as after every time a rule
843 has completely matched. At these times, the rule with the
844 same name as the value of currentRule will be selected to
845 start parsing the incoming stream of data.
846
847 prepareParsing(parserProtocol)
848 prepareParsing() is called after the ParserProtocol has
849 established a connection, and is passed the
850 ParserProtocol instance itself.
851
852 Parameters
parserProtocol -- An instance of ParserProtocol.
854
855 finishParsing(reason)
856 finishParsing() is called if an exception was raised dur‐
857 ing parsing, or when the ParserProtocol has lost its con‐
858 nection, whichever comes first. It will only be called
859 once.
860
An exception raised during parsing can be due to incoming
data that doesn't match the current rule or an exception
raised by Python code called during matching.
864
865 Parameters
866 reason -- A Failure encapsulating the reason pars‐
867 ing has ended.
868
869 Senders do not have any required API as ParserProtocol will never call
870 methods on a sender.
871
872 Built-in Parsley Rules
873 anything:
874 Matches a single character from the input.
875
876 letter:
877 Matches a single ASCII letter.
878
879 digit: Matches a decimal digit.
880
letterOrDigit:
Matches a single ASCII letter or decimal digit.
883
884 end: Matches the end of input.
885
886 ws: Matches zero or more spaces, tabs, or newlines.
887
888 exactly(char):
889 Matches the character char.
890
892 Allen Short
893
895 2023, Allen Short
896
897
898
899
9001.3 Jan 20, 2023 PARSLEY(1)