1PARSLEY(1) Parsley PARSLEY(1)
2
3
4
6 parsley - Parsley Documentation
7
8 Contents:
9
11 From Regular Expressions To Grammars
12 Parsley is a pattern matching and parsing tool for Python programmers.
13
14 Most Python programmers are familiar with regular expressions, as pro‐
15 vided by Python’s re module. To use it, you provide a string that
16 describes the pattern you want to match, and your input.
17
18 For example:
19
20 >>> import re
21 >>> x = re.compile("a(b|c)d+e")
22 >>> x.match("abddde")
23 <_sre.SRE_Match object at 0x7f587af54af8>
24
25 You can do exactly the same sort of thing in Parsley:
26
27 >>> import parsley
28 >>> x = parsley.makeGrammar("foo = 'a' ('b' | 'c') 'd'+ 'e'", {})
29 >>> x("abdde").foo()
30 'e'
31
From this small example, a couple of differences between regular
expressions and Parsley grammars can be seen:
34
35 Parsley Grammars Have Named Rules
36 A Parsley grammar can have many rules, and each has a name. The example
37 above has a single rule named foo. Rules can call each other; calling
38 rules in Parsley works like calling functions in Python. Here is
39 another way to write the grammar above:
40
41 foo = 'a' baz 'd'+ 'e'
42 baz = 'b' | 'c'
43
44 Parsley Grammars Are Expressions
Calling match for a regular expression returns a match object if the
match succeeds or None if it fails. Parsley parsers return the value of
the last expression in the rule. Behind the scenes, Parsley turns each
rule in your grammar into a Python method. In pseudo-Python code, it
looks something like this:
50
def foo(self):
    match('a')
    self.baz()
    match_one_or_more('d')
    return match('e')

def baz(self):
    return match('b') or match('c')
59
60 The value of the last expression in the rule is what the rule returns.
61 This is why our example returns ‘e’.
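
The pseudo-code above can be made runnable with a little scaffolding.
The following is a minimal hand-written sketch in Python 3, not the code
Parsley actually generates; the FooParser class, the ParseError
exception, and the helper methods are all hypothetical names introduced
for illustration.

```python
class ParseError(Exception):
    """Raised when the input does not match (hypothetical helper)."""


class FooParser:
    """Hand-written stand-in for the methods a grammar could turn into."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match(self, char):
        # Consume one character if it equals `char`; otherwise fail
        # without moving the position.
        if self.text[self.pos:self.pos + 1] != char:
            raise ParseError("expected %r at position %d" % (char, self.pos))
        self.pos += 1
        return char

    def match_one_or_more(self, char):
        # Require one match, then keep matching as long as possible.
        results = [self.match(char)]
        while self.text[self.pos:self.pos + 1] == char:
            results.append(self.match(char))
        return results

    def baz(self):
        # 'b' | 'c': try the first alternative, fall back to the second.
        try:
            return self.match('b')
        except ParseError:
            return self.match('c')

    def foo(self):
        self.match('a')
        self.baz()
        self.match_one_or_more('d')
        return self.match('e')  # the value of the last expression


result = FooParser("abdde").foo()
print(result)
```

Because foo ends with the match for 'e', that is the value the whole
rule hands back, regardless of what the earlier expressions returned.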
62
63 The similarities to regular expressions pretty much end here, though.
64 Having multiple named rules composed of expressions makes for a much
65 more powerful tool, and now we’re going to look at some more features
66 that go even further.
67
68 Rules Can Embed Python Expressions
69 Since these rules just turn into Python code eventually, we can stick
70 some Python code into them ourselves. This is particularly useful for
71 changing the return value of a rule. The Parsley expression for this is
72 ->. We can also bind the results of expressions to variable names and
73 use them in Python code. So things like this are possible:
74
75 x = parsley.makeGrammar("""
76 foo = 'a':one baz:two 'd'+ 'e' -> (one, two)
77 baz = 'b' | 'c'
78 """, {})
print(x("abdde").foo())
80
81 ('a', 'b')
82
83 Literal match expressions like ‘a’ return the character they match.
84 Using a colon and a variable name after an expression is like assign‐
85 ment in Python. As a result, we can use those names in a Python expres‐
86 sion - in this case, creating a tuple.
87
88 Another way to use Python code in a rule is to write custom tests for
89 matching. Sometimes it’s more convenient to write some Python that
90 determines if a rule matches than to stick to Parsley expressions
91 alone. For those cases, we can use ?(). Here, we use the builtin rule
92 anything to match a single character, then a Python predicate to decide
93 if it’s the one we want:
94
95 digit = anything:x ?(x in '0123456789') -> x
96
97 This rule digit will match any decimal digit. We need the -> x on the
98 end to return the character rather than the value of the predicate
99 expression, which is just True.
100
101 Repeated Matches Make Lists
Like regular expressions, Parsley supports repeating matches. You can
match an expression zero or more times with ‘*’, one or more times
with ‘+’, and a specific number of times with ‘{n, m}’ or just ‘{n}’.
Since all expressions in Parsley return a value, these repetition
operators return a list containing each match they made.
107
108 x = parsley.makeGrammar("""
109 digit = anything:x ?(x in '0123456789') -> x
110 number = digit+
111 """, {})
print(x("314159").number())
113
114 ['3', '1', '4', '1', '5', '9']
115
116 The number rule repeatedly matches digit and collects the matches into
117 a list. This gets us part way to turning a string like 314159 into an
118 integer. All we need now is to turn the list back into a string and
119 call int():
120
121 x = parsley.makeGrammar("""
122 digit = anything:x ?(x in '0123456789') -> x
123 number = digit+:ds -> int(''.join(ds))
124 """, {})
print(x("8675309").number())
126
127 8675309
128
129 Collecting Chunks Of Input
130 If it seemed kind of strange to break our input string up into a list
131 and then reassemble it into a string using join, you’re not alone.
132 Parsley has a shortcut for this since it’s a common case: you can use
133 <> around a rule to make it return the slice of input it consumes,
134 ignoring the actual return value of the rule. For example:
135
136 x = parsley.makeGrammar("""
137 digit = anything:x ?(x in '0123456789')
138 number = <digit+>:ds -> int(ds)
139 """, {})
print(x("11235").number())
141
142 11235
143
144 Here, <digit+> returns the string “11235”, since that’s the portion of
145 the input that digit+ matched. (In this case it’s the entire input, but
146 we’ll see some more complex cases soon.) Since it ignores the list
147 returned by digit+, leaving the -> x out of digit doesn’t change the
148 result.
149
150 Building A Calculator
151 Now let’s look at using these rules in a more complicated parser. We
152 have support for parsing numbers; let’s do addition, as well.
153
154 x = parsley.makeGrammar("""
155 digit = anything:x ?(x in '0123456789')
156 number = <digit+>:ds -> int(ds)
157 expr = number:left ( '+' number:right -> left + right
158 | -> left)
159 """, {})
print(x("17+34").expr())
print(x("18").expr())
162
163 51
164 18
165
Parentheses group expressions just like in Python. The ‘|’ operator is
167 like or in Python - it short-circuits. It tries each expression until
168 it finds one that matches. For “17+34”, the number rule matches “17”,
169 then Parsley tries to match + followed by another number. Since “+” and
170 “34” are the next things in the input, those match, and it then runs
171 the Python expression left + right and returns its value. For the input
172 “18” it does the same, but + does not match, so Parsley tries the next
173 thing after |. Since this is just a Python expression, the match suc‐
174 ceeds and the number 18 is returned.
175
176 Now let’s add subtraction:
177
178 digit = anything:x ?(x in '0123456789')
179 number = <digit+>:ds -> int(ds)
180 expr = number:left ( '+' number:right -> left + right
181 | '-' number:right -> left - right
182 | -> left)
183
184 This will accept things like ‘5-4’ now.
185
186 Since parsing numbers is so common and useful, Parsley actually has
187 ‘digit’ as a builtin rule, so we don’t even need to define it our‐
188 selves. We’ll leave it out in further examples and rely on the version
189 Parsley provides.
190
191 Normally we like to allow whitespace in our expressions, so let’s add
192 some support for spaces:
193
194 number = <digit+>:ds -> int(ds)
195 ws = ' '*
196 expr = number:left ws ('+' ws number:right -> left + right
197 |'-' ws number:right -> left - right
198 | -> left)
199
200 Now we can handle “17 +34”, “2 - 1”, etc.
201
202 We could go ahead and add multiplication and division here (and hope‐
203 fully it’s obvious how that would work), but let’s complicate things
204 further and allow multiple operations in our expressions – things like
205 “1 - 2 + 3”.
206
There are a couple of different ways to do this. Possibly the easiest
is to build a list of numbers and operations, then do the math:
209
210 x = parsley.makeGrammar("""
211 number = <digit+>:ds -> int(ds)
212 ws = ' '*
213 add = '+' ws number:n -> ('+', n)
214 sub = '-' ws number:n -> ('-', n)
215 addsub = ws (add | sub)
216 expr = number:left (addsub+:right -> right
217 | -> left)
218 """, {})
print(x("1 + 2 - 3").expr())
220
[('+', 2), ('-', 3)]
222
223 Oops, this is only half the job done. We’re collecting the operators
224 and values, but now we need to do the actual calculation. The easiest
225 way to do it is probably to write a Python function and call it from
226 inside the grammar.
227
228 So far we have been passing an empty dict as the second argument to
229 makeGrammar. This is a dict of variable bindings that can be used in
230 Python expressions in the grammar. So we can pass Python objects, such
231 as functions, this way:
232
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result

x = parsley.makeGrammar("""
number = <digit+>:ds -> int(ds)
ws = ' '*
add = '+' ws number:n -> ('+', n)
sub = '-' ws number:n -> ('-', n)
addsub = ws (add | sub)
expr = number:left (addsub+:right -> calculate(left, right)
                   | -> left)
""", {"calculate": calculate})
print(x("4 + 5 - 6").expr())
251
252 3
253
254 Introducing this function lets us simplify even further: instead of
255 using addsub+, we can use addsub*, since calculate(left, []) will
256 return left – so now expr becomes:
257
258 expr = number:left addsub*:right -> calculate(left, right)
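
The claim that calculate(left, []) simply returns left can be checked
without a grammar at all. The snippet below repeats the calculate
function from above and calls it by hand with the kind of pair lists
that addsub* would produce; the variable names no_ops and chained are
only for illustration.

```python
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
    return result


# addsub* matched nothing: the loop body never runs.
no_ops = calculate(18, [])

# "1 - 2 + 3": the pairs are applied left to right.
chained = calculate(1, [('-', 2), ('+', 3)])

print(no_ops, chained)
```

Since the empty list is handled by the same code path, the separate
"| -> left" alternative is no longer needed.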
259
260 So now let’s look at adding multiplication and division. Here, we run
261 into precedence rules: should “4 * 5 + 6” give us 26, or 44? The tradi‐
262 tional choice is for multiplication and division to take precedence
263 over addition and subtraction, so the answer should be 26. We’ll
264 resolve this by making sure multiplication and division happen before
265 addition and subtraction are considered:
266
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result

x = parsley.makeGrammar("""
number = <digit+>:ds -> int(ds)
ws = ' '*
add = '+' ws expr2:n -> ('+', n)
sub = '-' ws expr2:n -> ('-', n)
mul = '*' ws number:n -> ('*', n)
div = '/' ws number:n -> ('/', n)

addsub = ws (add | sub)
muldiv = ws (mul | div)

expr = expr2:left addsub*:right -> calculate(left, right)
expr2 = number:left muldiv*:right -> calculate(left, right)
""", {"calculate": calculate})
print(x("4 * 5 + 6").expr())
294
295 26
296
297 Notice particularly that add, sub, and expr all call the expr2 rule now
298 where they called number before. This means that all the places where a
299 number was expected previously, a multiplication or division expression
300 can appear instead.
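
To see why this layering yields the traditional precedence, it can help
to trace by hand the calculate calls that the two rules would make. The
following sketch does that in plain Python, without Parsley; the
intermediate variable names are hypothetical and the pair lists are
written out manually rather than produced by a parser.

```python
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result


# "4 * 5 + 6": expr2 folds the multiplication before expr sees it.
left = calculate(4, [('*', 5)])
total = calculate(left, [('+', 6)])

# "6 + 4 * 5": the operand of add is itself an expr2, so the
# multiplication is again evaluated first.
right = calculate(4, [('*', 5)])
total_reversed = calculate(6, [('+', right)])

print(total, total_reversed)
```

In both orderings the product is reduced to a single number before any
addition takes place, which is the effect of routing add and sub
through expr2.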
301
302 Finally let’s add parentheses, so you can override the precedence and
303 write “4 * (5 + 6)” when you do want 44. We’ll do this by adding a
304 value rule that accepts either a number or an expression in parenthe‐
305 ses, and replace existing calls to number with calls to value.
306
def calculate(start, pairs):
    result = start
    for op, value in pairs:
        if op == '+':
            result += value
        elif op == '-':
            result -= value
        elif op == '*':
            result *= value
        elif op == '/':
            result /= value
    return result

x = parsley.makeGrammar("""
number = <digit+>:ds -> int(ds)
parens = '(' ws expr:e ws ')' -> e
value = number | parens
ws = ' '*
add = '+' ws expr2:n -> ('+', n)
sub = '-' ws expr2:n -> ('-', n)
mul = '*' ws value:n -> ('*', n)
div = '/' ws value:n -> ('/', n)

addsub = ws (add | sub)
muldiv = ws (mul | div)

expr = expr2:left addsub*:right -> calculate(left, right)
expr2 = value:left muldiv*:right -> calculate(left, right)
""", {"calculate": calculate})

print(x("4 * (5 + 6) + 1").expr())
337
338 45
339
340 And there you have it: a four-function calculator with precedence and
341 parentheses.
342
344 Now that you are familiar with the basics of Parsley syntax, let’s look
345 at a more realistic example: a JSON parser.
346
347 The JSON spec on http://json.org/ describes the format, and we can
348 adapt its description to a parser. We’ll write the Parsley rules in the
349 same order as the grammar rules in the right sidebar on the JSON site,
350 starting with the top-level rule, ‘object’.
351
352 object = ws '{' members:m ws '}' -> dict(m)
353
354 Parsley defines a builtin rule ws which consumes any spaces, tabs, or
355 newlines it can.
356
357 Since JSON objects are represented in Python as dicts, and dict takes a
358 list of pairs, we need a rule to collect name/value pairs inside an
359 object expression.
360
361 members = (pair:first (ws ',' pair)*:rest -> [first] + rest)
362 | -> []
363
This handles the three cases for object contents: one, multiple, or
zero pairs. A name/value pair is separated by a colon. Any whitespace
after the colon is consumed by the value rule, which begins with the
builtin rule ws:
367
368 pair = ws string:k ws ':' value:v -> (k, v)
369
370 Arrays, similarly, are sequences of array elements, and are represented
371 as Python lists.
372
373 array = '[' elements:xs ws ']' -> xs
374 elements = (value:first (ws ',' value)*:rest -> [first] + rest) | -> []
375
376 Values can be any JSON expression.
377
378 value = ws (string | number | object | array
379 | 'true' -> True
380 | 'false' -> False
381 | 'null' -> None)
382
383 Strings are sequences of zero or more characters between double quotes.
384 Of course, we need to deal with escaped characters as well. This rule
385 introduces the operator ~, which does negative lookahead; if the
386 expression following it succeeds, its parse will fail. If the expres‐
387 sion fails, the rest of the parse continues. Either way, no input will
388 be consumed.
389
390 string = '"' (escapedChar | ~'"' anything)*:c '"' -> ''.join(c)
391
This is a common pattern, so let’s examine it step by step. The rule
first matches a double quote character (any leading whitespace has
already been consumed by the calling rule). It then
394 matches zero or more characters. If it’s not an escapedChar (which will
395 start with a backslash), we check to see if it’s a double quote, in
396 which case we want to end the loop. If it’s not a double quote, we
397 match it using the rule anything, which accepts a single character of
398 any kind, and continue. Finally, we match the ending double quote and
399 return the characters in the string. We cannot use the <> syntax in
400 this case because we don’t want a literal slice of the input – we want
401 escape sequences to be replaced with the character they represent.
402
It’s very common to use ~ for “match until” situations where you want
to keep parsing only until an end marker is found. Similarly, ~~ is
positive lookahead: it succeeds if its expression succeeds but does not
consume any input.
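
The “match until” loop can also be spelled out by hand. The function
below is a rough Python 3 sketch of what the string rule does, written
without Parsley; parse_string and SIMPLE_ESCAPES are hypothetical names,
and only a few escape codes are handled, so it is a simplification of
the grammar rather than an equivalent.

```python
# A small subset of the escape codes, standing in for escapedChar.
SIMPLE_ESCAPES = {'"': '"', '\\': '\\', 'n': '\n', 't': '\t'}


def parse_string(text, pos=0):
    """Parse a double-quoted string at text[pos]; return (value, new_pos)."""
    if text[pos:pos + 1] != '"':
        raise ValueError("expected opening quote at position %d" % pos)
    pos += 1
    chars = []
    while True:
        if pos >= len(text):
            raise ValueError("unterminated string")
        ch = text[pos]
        code = text[pos + 1:pos + 2]
        if ch == '\\' and code in SIMPLE_ESCAPES:
            # First alternative: an escape sequence, replaced by the
            # character it represents.
            chars.append(SIMPLE_ESCAPES[code])
            pos += 2
        elif ch == '"':
            # Negative lookahead: a quote is next, so the loop stops
            # here without consuming it.
            break
        else:
            # Second alternative: any other single character.
            chars.append(ch)
            pos += 1
    pos += 1  # consume the closing quote
    return ''.join(chars), pos


value, end = parse_string('"a\\"b" rest')
print(value, end)
```

The check for the closing quote only peeks at the input; the quote
itself is consumed afterwards, just as the final '"' in the rule
consumes it after the repetition ends.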
407
408 The escapedChar rule should not be too surprising: we match a backslash
409 then whatever escape code is given.
410
411 escapedChar = '\\' (('"' -> '"') |('\\' -> '\\')
412 |('/' -> '/') |('b' -> '\b')
413 |('f' -> '\f') |('n' -> '\n')
414 |('r' -> '\r') |('t' -> '\t')
415 |('\'' -> '\'') | escapedUnicode)
416
417 Unicode escapes (of the form \u2603) require matching four hex digits,
418 so we use the repetition operator {}, which works like + or * except
419 taking either a {min, max} pair or simply a {number} indicating the
420 exact number of repetitions.
421
422 hexdigit = :x ?(x in '0123456789abcdefABCDEF') -> x
423 escapedUnicode = 'u' <hexdigit{4}>:hs -> unichr(int(hs, 16))
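
Note that unichr is a Python 2 builtin. On Python 3 the same conversion
is done with chr, as in this short sketch; the variable names hs and
snowman are illustrative only.

```python
# The four hex digits that <hexdigit{4}> would capture from \u2603.
hs = "2603"

# int(hs, 16) turns the hex digits into a code point, and chr turns
# the code point into a one-character string (Python 3).
snowman = chr(int(hs, 16))

print(repr(snowman))
```

One possible way to keep the rule text unchanged on Python 3 would be
to supply chr under the name unichr in the bindings dict passed to
makeGrammar; that is an assumption based on how bindings are described
earlier, not something this document demonstrates.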
424
425 With strings out of the way, we advance to numbers, both integer and
426 floating-point.
427
428 number = spaces ('-' | -> ''):sign (intPart:ds (floatPart(sign ds)
429 | -> int(sign + ds)))
430
431 Here we vary from the json.org description a little and move sign han‐
432 dling up into the number rule. We match either an intPart followed by a
433 floatPart or just an intPart by itself.
434
435 digit = :x ?(x in '0123456789') -> x
436 digits = <digit*>
437 digit1_9 = :x ?(x in '123456789') -> x
438
439 intPart = (digit1_9:first digits:rest -> first + rest) | digit
440 floatPart :sign :ds = <('.' digits exponent?) | exponent>:tail
441 -> float(sign + ds + tail)
442 exponent = ('e' | 'E') ('+' | '-')? digits
443
444 In JSON, multi-digit numbers cannot start with 0 (since that is
JavaScript’s syntax for octal numbers), so intPart uses digit1_9 to
446 exclude it in the first position.
447
448 The floatPart rule takes two parameters, sign and ds. Our number rule
449 passes values for these when it invokes floatPart, letting us avoid
450 duplication of work within the rule. Note that pattern matching on
451 arguments to rules works the same as on the string input to the parser.
452 In this case, we provide no pattern, just a name: :ds is the same as
453 anything:ds.
454
(Also note that our floatPart rule cheats a little: it does not really
456 parse floating-point numbers, it merely recognizes them and passes them
457 to Python’s float builtin to actually produce the value.)
458
459 The full version of this parser and its test cases can be found in the
460 examples directory in the Parsley distribution.
461
463 This tutorial assumes basic knowledge of writing Twisted TCP clients or
464 servers.
465
466 Basic parsing
Parsing data that comes in over the network can be difficult because
there is no guarantee of receiving whole messages. Buffering is often
complicated by protocols switching between using fixed-width messages
and delimiters for framing. Fortunately, Parsley can remove all of this
tedium.
472
473 With parsley.makeProtocol(), Parsley can generate a Twisted
474 IProtocol-implementing class which will match incoming network data
475 using Parsley grammar rules. Before getting started with makeProto‐
476 col(), let’s build a grammar for netstrings. The netstrings protocol is
477 very simple:
478
479 4:spam,4:eggs,
480
481 This stream contains two netstrings: spam, and eggs. The data is pre‐
482 fixed with one or more ASCII digits followed by a :, and suffixed with
483 a ,. So, a Parsley grammar to match a netstring would look like:
484
485 nonzeroDigit = digit:x ?(x != '0')
486 digits = <'0' | nonzeroDigit digit*>:i -> int(i)
487
488 netstring = digits:length ':' <anything{length}>:string ',' -> string
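
For comparison, here is a rough plain-Python sketch of the same framing
when the whole stream is already in memory. The function names
encode_netstring and decode_netstrings are hypothetical, and the sketch
is slightly more lenient than the grammar because int() accepts leading
zeros in the length prefix.

```python
def encode_netstring(payload):
    """Prefix the payload with its length and ':', then append ','."""
    return "%d:%s," % (len(payload), payload)


def decode_netstrings(stream):
    """Split a complete stream into the payloads it contains."""
    results = []
    pos = 0
    while pos < len(stream):
        colon = stream.index(':', pos)  # ValueError if no ':' is left
        length = int(stream[pos:colon])
        start = colon + 1
        end = start + length
        if stream[end:end + 1] != ',':
            raise ValueError("missing ',' after payload")
        results.append(stream[start:end])
        pos = end + 1
    return results


decoded = decode_netstrings("4:spam,4:eggs,")
encoded = encode_netstring("spam") + encode_netstring("eggs")
print(decoded, encoded)
```

The length prefix is what allows a payload to contain ':' or ','
itself: the decoder counts characters instead of searching for a
terminator.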
489
490
491 makeProtocol() takes, in addition to a grammar, a factory for a
492 “sender” and a factory for a “receiver”. In the system of objects man‐
493 aged by the ParserProtocol, the sender is in charge of writing data to
494 the wire, and the receiver has methods called on it by the Parsley
495 rules. To demonstrate it, here is the final piece needed in the Parsley
496 grammar for netstrings:
497
498 receiveNetstring = netstring:string -> receiver.netstringReceived(string)
499
500
501 The receiver is always available in Parsley rules with the name
502 receiver, allowing Parsley rules to call methods on it.
503
504 When data is received over the wire, the ParserProtocol tries to match
505 the received data against the current rule. If the current rule
506 requires more data to finish matching, the ParserProtocol stops and
507 waits until more data comes in, then tries to continue matching. This
508 repeats until the current rule is completely matched, and then the
509 ParserProtocol starts matching any leftover data against the current
510 rule again.
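
The buffering behavior described above can be illustrated without
Twisted or Parsley. The class below is a hypothetical stand-in written
in Python 3, not the ParserProtocol implementation: it keeps unmatched
data around, waits when a netstring is incomplete, and keeps matching
leftover data once one is complete.

```python
class NetstringBuffer:
    """Hypothetical sketch of incremental netstring matching."""

    def __init__(self):
        self._buffer = ""

    def feed(self, data):
        """Add received data; return the netstrings completed by it."""
        self._buffer += data
        complete = []
        while True:
            colon = self._buffer.find(':')
            if colon == -1:
                break  # the length prefix has not fully arrived yet
            length = int(self._buffer[:colon])
            end = colon + 1 + length
            if len(self._buffer) <= end:
                break  # payload or trailing ',' still missing: wait
            if self._buffer[end] != ',':
                raise ValueError("malformed netstring")
            complete.append(self._buffer[colon + 1:end])
            # Keep any leftover data and try to match it again.
            self._buffer = self._buffer[end + 1:]
        return complete


buf = NetstringBuffer()
first = buf.feed("4:sp")       # incomplete: nothing to report yet
second = buf.feed("am,4:eg")   # completes "spam", starts "eggs"
third = buf.feed("gs,")        # completes "eggs"
print(first, second, third)
```

With makeProtocol() this bookkeeping does not have to be written by
hand; the grammar only describes what one complete message looks like.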
511
512 One specifies the current rule by setting a currentRule attribute on
513 the receiver, which the ParserProtocol looks at before doing any pars‐
514 ing. Changing the current rule is addressed in the Switching rules sec‐
515 tion.
516
517 Since the ParserProtocol will never modify the currentRule attribute
518 itself, the default behavior is to keep using the same rule. Parsing
519 netstrings doesn’t require any rule changing, so, the default behavior
520 of continuing to use the same rule is fine.
521
522 Both the sender factory and receiver factory are constructed when the
523 ParserProtocol’s connection is established. The sender factory is a
524 one-argument callable which will be passed the ParserProtocol’s
525 Transport. This allows the sender to send data over the transport. For
526 example:
527
class NetstringSender(object):
    def __init__(self, transport):
        self.transport = transport

    def sendNetstring(self, string):
        self.transport.write('%d:%s,' % (len(string), string))
534
535
536 The receiver factory is another one-argument callable which is passed
537 the constructed sender. The returned object must at least have pre‐
538 pareParsing() and finishParsing() methods. prepareParsing() is called
539 with the ParserProtocol instance when a connection is established (i.e.
540 in the connectionMade of the ParserProtocol) and finishParsing() is
541 called when a connection is closed (i.e. in the connectionLost of the
542 ParserProtocol).
543
544 NOTE:
Both the receiver factory and its returned object’s prepareParsing()
are called in the ParserProtocol’s connectionMade method; this
separation is for ease of testing receivers.
548
To demonstrate a receiver, here is a simple receiver that receives
netstrings and echoes the same netstrings back:
551
class NetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, string):
        self.sender.sendNetstring(string)
566
567
568 Putting it all together, the Protocol is constructed using the grammar,
569 sender factory, and receiver factory:
570
571
572
573 NetstringProtocol = makeProtocol(
574 grammar, NetstringSender, NetstringReceiver)
575
576
577
578
579 The complete script is also available for download.
580
581 Intermezzo: error reporting
582 If an exception is raised from within Parsley during parsing, whether
583 it’s due to input not matching the current rule or an exception being
584 raised from code the grammar calls, the connection will be immediately
585 closed. The traceback will be captured as a Failure and passed to the
586 finishParsing() method of the receiver.
587
588 At present, there is no way to recover from failure.
589
590 Composing senders and receivers
Senders and receivers are intentionally designed to make composition
easy: no subclassing is required. While the composition is easy enough
to do on your own, Parsley provides a function for it: stack(). It
takes zero or more wrappers followed by a base factory.
595
596 Its use is extremely simple: stack(x, y, z) will return a callable
597 suitable either as a sender or receiver factory which will, when called
598 with an argument, return x(y(z(argument))).
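
The composition order is easiest to see in code. The following is a
minimal Python 3 sketch of a function with the behavior described
above; it is not Parsley's implementation, and the names stack_sketch,
x, y, and z are hypothetical.

```python
def stack_sketch(*factories):
    """Compose factories so stack_sketch(x, y, z)(arg) is x(y(z(arg)))."""
    if not factories:
        raise TypeError("at least one factory is required")

    def factory(argument):
        # The last factory is the base: it receives the raw argument.
        result = factories[-1](argument)
        # The remaining factories wrap the result, innermost first.
        for wrapper in reversed(factories[:-1]):
            result = wrapper(result)
        return result

    return factory


def x(value):
    return "x(%s)" % value


def y(value):
    return "y(%s)" % value


def z(value):
    return "z(%s)" % value


composed = stack_sketch(x, y, z)("argument")
print(composed)
```

Read from right to left, the arguments go from the base factory to the
outermost wrapper, which is why the base factory is written last.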
599
600 An example of wrapping a sender factory:
601
class NetstringReversalWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def sendNetstring(self, string):
        self.wrapped.sendNetstring(string[::-1])
608
609
610 And then, constructing the Protocol:
611
612 NetstringProtocol = makeProtocol(
613 grammar,
614 stack(NetstringReversalWrapper, NetstringSender),
615 NetstringReceiver)
616
617 A wrapper doesn’t need to call the same methods on the thing it’s wrap‐
618 ping. Also note that in most cases, it’s important to forward unknown
619 methods on to the wrapped object. An example of wrapping a receiver:
620
class NetstringSplittingWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def netstringReceived(self, string):
        splitpoint = len(string) // 2
        self.wrapped.netstringFirstHalfReceived(string[:splitpoint])
        self.wrapped.netstringSecondHalfReceived(string[splitpoint:])

    def __getattr__(self, attr):
        return getattr(self.wrapped, attr)
632
633
The corresponding receiver and, again, the construction of the Protocol:
635
class SplitNetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringFirstHalfReceived(self, string):
        self.sender.sendNetstring(string)

    def netstringSecondHalfReceived(self, string):
        pass


NetstringProtocol = makeProtocol(
    grammar,
    stack(NetstringReversalWrapper, NetstringSender),
    stack(NetstringSplittingWrapper, SplitNetstringReceiver))
659
660 The complete script is also available for download.
661
662 Switching rules
663 As mentioned before, it’s possible to change the current rule. Imagine
664 a “netstrings2” protocol that looks like this:
665
666 3:foo,3;bar,4:spam,4;eggs,
667
That is, the protocol alternates between using : and using ; to delimit
the data length from the data. The amended grammar would look something
like this:
671
672 nonzeroDigit = digit:x ?(x != '0')
673 digits = <'0' | nonzeroDigit digit*>:i -> int(i)
674 netstring :delimiter = digits:length delimiter <anything{length}>:string ',' -> string
675
676 colon = digits:length ':' <anything{length}>:string ',' -> receiver.netstringReceived(':', string)
677 semicolon = digits:length ';' <anything{length}>:string ',' -> receiver.netstringReceived(';', string)
678
679
680 Changing the current rule is as simple as changing the currentRule
681 attribute on the receiver. So, the netstringReceived method could look
682 like this:
683
def netstringReceived(self, delimiter, string):
    self.sender.sendNetstring(string)
    if delimiter == ':':
        self.currentRule = 'semicolon'
    else:
        self.currentRule = 'colon'
690
691
692 While changing the currentRule attribute can be done at any time, the
693 ParserProtocol only examines the currentRule at the beginning of pars‐
694 ing and after a rule has finished matching. As a result, if the curren‐
695 tRule changes, the ParserProtocol will wait until the current rule is
696 completely matched before switching rules.
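
The alternation can be exercised without a network connection by
calling the receiver by hand. In the Python 3 sketch below,
RecordingSender and AlternatingReceiver are hypothetical classes, and
the loop plays the part of the ParserProtocol: it reads currentRule
before each message and then delivers the message.

```python
class RecordingSender:
    """Hypothetical sender that records instead of writing to a transport."""

    def __init__(self):
        self.sent = []

    def sendNetstring(self, string):
        self.sent.append(string)


class AlternatingReceiver:
    currentRule = 'colon'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, delimiter, string):
        self.sender.sendNetstring(string)
        # Expect the other delimiter for the next netstring.
        if delimiter == ':':
            self.currentRule = 'semicolon'
        else:
            self.currentRule = 'colon'


receiver = AlternatingReceiver(RecordingSender())
rules_seen = []
messages = [(':', 'foo'), (';', 'bar'), (':', 'spam'), (';', 'eggs')]
for delimiter, payload in messages:
    # Stand-in for the protocol examining currentRule before parsing.
    rules_seen.append(receiver.currentRule)
    receiver.netstringReceived(delimiter, payload)

print(rules_seen)
```

Setting currentRule on the instance shadows the class-level default, so
each connection's receiver tracks its own position in the alternation.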
697
698 The complete script is also available for download.
699
701 warning
702 Unfinished
703
704 Another feature taken from OMeta is grammar inheritance. We can write a
705 grammar with rules that override ones in a parent. If we load the gram‐
706 mar from our calculator tutorial as Calc, we can extend it with some
707 constants:
708
709 from parsley import makeGrammar
710 import math
711 import calc
712 calcGrammarEx = """
713 value = super | constant
714 constant = 'pi' -> math.pi
715 | 'e' -> math.e
716 """
CalcEx = makeGrammar(calcGrammarEx, {"math": math}, extends=calc.Calc)
718
719 Invoking the rule super calls the rule value in Calc. If it fails to
720 match, our new value rule attempts to match a constant name.
721
723 TermL (“term-ell”) is the Term Language, a small expression-based lan‐
724 guage for representing arbitrary data in a simple structured format. It
725 is ideal for expressing abstract syntax trees (ASTs) and other kinds of
726 primitive data trees.
727
728 Creating Terms
729 >>> from terml.nodes import termMaker as t
730 >>> t.Term()
731 term('Term')
732
733 That’s it! We’ve created an empty term, Term, with nothing inside.
734
735 >>> t.Num(1)
736 term('Num(1)')
737 >>> t.Outer(t.Inner())
738 term('Outer(Inner)')
739
740 We can see that terms are not just namedtuple lookalikes. They have
741 their own internals and store data in a slightly different and more
742 structured way than a normal tuple.
743
744 Parsing Terms
745 Parsley can parse terms from streams. Terms can contain any kind of
746 parseable data, including other terms. Returning to the ubiquitous cal‐
747 culator example:
748
749 add = Add(:x, :y) -> x + y
750
Here this rule matches a term called Add which has two components,
binds those components to a couple of names (x and y), and returns
their sum. If this rule were applied to a term like Add(3, 5), it would
return 8.
754
755 Terms can be nested, too. Here’s an example that performs a slightly
756 contrived match on a negated term inside an addition:
757
758 add_negate = Add(:x, Negate(:y)) -> x - y
759
761 Basic syntax
762 foo = ....:
763 Define a rule named foo.
764
765 expr1 expr2:
766 Match expr1, and then match expr2 if it succeeds, returning the
767 value of expr2. Like Python's and.
768
769 expr1 | expr2:
770 Try to match expr1 --- if it fails, match expr2 instead. Like
771 Python's or.
772
773 expr*: Match expr zero or more times, returning a list of matches.
774
775 expr+: Match expr one or more times, returning a list of matches.
776
777 expr?: Try to match expr. Returns None if it fails to match.
778
779 expr{n, m}:
780 Match expr at least n times, and no more than m times.
781
782 expr{n}:
783 Match expr n times exactly.
784
785 ~expr: Negative lookahead. Fails if the next item in the input matches
786 expr. Consumes no input.
787
788 ~~expr:
789 Positive lookahead. Fails if the next item in the input does not
790 match expr. Consumes no input.
791
792 ruleName or ruleName(arg1 arg2 etc):
793 Call the rule ruleName, possibly with args.
794
795 'x': Match the literal character 'x'.
796
797 <expr>:
798 Returns the string consumed by matching expr. Good for tokeniz‐
799 ing rules.
800
801 expr:name:
802 Bind the result of expr to the local variable name.
803
804 -> pythonExpression:
805 Evaluate the given Python expression and return its result. Can
806 be used inside parentheses too!
807
808 !(pythonExpression):
809 Invoke a Python expression as an action.
810
811 ?(pythonExpression):
Fail if the Python expression is false. Returns True otherwise.
813
814 expr ^(CustomLabel):
815 If the expr fails, the exception raised will contain CustomLa‐
816 bel. Good for providing more context when a rule is broken.
817 CustomLabel can contain any character other than "(" and ")".
818
819 Comments like Python comments are supported as well, starting with #
820 and extending to the end of the line.
821
822 Python API
823 Protocol parsing API
824 class ometa.protocol.ParserProtocol
825 The Twisted Protocol subclass used for parsing stream protocols
826 using Parsley. It has two public attributes:
827
828 sender After the connection is established, this attribute will
829 refer to the sender created by the sender factory of the
830 ParserProtocol.
831
832 receiver
833 After the connection is established, this attribute will
834 refer to the receiver created by the receiver factory of
835 the ParserProtocol.
836
837 It's common to also add a factory attribute to the
838 ParserProtocol from its factory's buildProtocol method, but this
839 isn't strictly required or guaranteed to be present.
840
841 Subclassing or instantiating ParserProtocol is not necessary;
842 makeProtocol() is sufficient and requires less boilerplate.
843
844 class ometa.protocol.Receiver
845 Receiver is not a real class but is used here for demonstration
846 purposes to indicate the required API.
847
848 currentRule
849 ParserProtocol examines the currentRule attribute at the
850 beginning of parsing as well as after every time a rule
851 has completely matched. At these times, the rule with the
852 same name as the value of currentRule will be selected to
853 start parsing the incoming stream of data.
854
855 prepareParsing(parserProtocol)
856 prepareParsing() is called after the ParserProtocol has
857 established a connection, and is passed the
858 ParserProtocol instance itself.
859
860 Parameters
parserProtocol -- An instance of ParserProtocol.
862
863 finishParsing(reason)
864 finishParsing() is called if an exception was raised dur‐
865 ing parsing, or when the ParserProtocol has lost its con‐
866 nection, whichever comes first. It will only be called
867 once.
868
869 An exception raised during parsing can be due to incoming
870 data that doesn't match the current rule or an exception
raised calling Python code during matching.
872
873 Parameters
874 reason -- A Failure encapsulating the reason pars‐
875 ing has ended.
876
877 Senders do not have any required API as ParserProtocol will never call
878 methods on a sender.
879
880 Built-in Parsley Rules
881 anything:
882 Matches a single character from the input.
883
884 letter:
885 Matches a single ASCII letter.
886
887 digit: Matches a decimal digit.
888
889 letterOrDigit:
Matches a single ASCII letter or decimal digit.
891
892 end: Matches the end of input.
893
894 ws: Matches zero or more spaces, tabs, or newlines.
895
896 exactly(char):
897 Matches the character char.
898
900 Allen Short
901
903 2013, Allen Short
904
905
906
907
9081.3 Mar 12, 2019 PARSLEY(1)