ms_transform(3)            Erlang Module Definition            ms_transform(3)

ms_transform - A parse transformation that translates fun syntax into
match specifications.

This module provides the parse transformation that makes calls to
ets:fun2ms/1 and dbg:fun2ms/1 translate into literal match
specifications. It also provides the back end for the same functions
when called from the Erlang shell.

The translation from funs to match specifications is accessed through
the two "pseudo functions" ets:fun2ms/1 and dbg:fun2ms/1.

As everyone trying to use ets:select/2 or dbg seems to end up reading
this manual page, this description is an introduction to the concept of
match specifications.

Read the whole manual page if it is the first time you are using the
transformations.

Match specifications are used more or less as filters. They resemble
usual Erlang matching in a list comprehension or in a fun used with
lists:foldl/3, and so on. However, the syntax of pure match
specifications is awkward, as they are made up purely of Erlang terms,
and the language has no syntax to make the match specifications more
readable.

As the execution and structure of the match specifications are like
that of a fun, it is more straightforward to write it using the
familiar fun syntax and to have that translated into a match
specification automatically. A real fun is clearly more powerful than
the match specifications allow, but bearing the match specifications in
mind, and what they can do, it is still more convenient to write it all
as a fun. This module contains the code that translates the fun syntax
into match specification terms.

Using ets:select/2 and a match specification, one can select rows of a
table and construct a list of tuples containing relevant parts of the
data in these rows. One can use ets:foldl/3 instead, but the
ets:select/2 call is far more efficient. Without the translation
provided by ms_transform, one must struggle with writing match
specification terms by hand to achieve this.

Consider a simple table of employees:

    -record(emp, {empno,     %Employee number as a string, the key
                  surname,   %Surname of the employee
                  givenname, %Given name of employee
                  dept,      %Department, one of {dev,sales,prod,adm}
                  empyear}). %Year the employee was employed

We create the table using:

    ets:new(emp_tab, [{keypos,#emp.empno},named_table,ordered_set]).

We fill the table with randomly chosen data:

    [{emp,"011103","Black","Alfred",sales,2000},
     {emp,"041231","Doe","John",prod,2001},
     {emp,"052341","Smith","John",dev,1997},
     {emp,"076324","Smith","Ella",sales,1995},
     {emp,"122334","Weston","Anna",prod,2002},
     {emp,"535216","Chalker","Samuel",adm,1998},
     {emp,"789789","Harrysson","Joe",adm,1996},
     {emp,"963721","Scott","Juliana",dev,2003},
     {emp,"989891","Brown","Gabriel",prod,1999}]

Assuming that we want the employee numbers of everyone in the sales
department, there are several ways to get them.

ets:match/2 can be used:

    1> ets:match(emp_tab, {'_', '$1', '_', '_', sales, '_'}).
    [["011103"],["076324"]]

ets:match/2 uses a simpler type of match specification, but it is still
unreadable, and one has little control over the returned result. It is
always a list of lists.

ets:foldl/3 or ets:foldr/3 can be used to avoid the nested lists:

    ets:foldr(fun(#emp{empno = E, dept = sales},Acc) -> [E | Acc];
                 (_,Acc) -> Acc
              end,
              [],
              emp_tab).

The result is ["011103","076324"]. The fun is straightforward, so the
only problem is that all the data must be transferred from the table to
the calling process for filtering. That is inefficient compared to the
ets:match/2 call, where the filtering can be done "inside" the emulator
and only the result is transferred to the process.

Consider a "pure" ets:select/2 call that does what ets:foldr/3 does:

    ets:select(emp_tab, [{#emp{empno = '$1', dept = sales, _='_'},[],['$1']}]).

Although the record syntax is used, it is still hard to read and even
harder to write. The first element of the tuple, #emp{empno = '$1',
dept = sales, _='_'}, tells what to match. Objects not matching this
are not returned, as in the ets:match/2 example. The second element,
the empty list, is a list of guard expressions, which we do not need.
The third element is the list of expressions constructing the return
value (in ETS this is almost always a list containing one single term).
In our case '$1' is bound to the employee number in the head (first
element of the tuple), and hence the employee number is returned. The
result is ["011103","076324"], as in the ets:foldr/3 example, but the
result is retrieved much more efficiently in terms of execution speed
and memory consumption.

Using ets:fun2ms/1, we can combine the ease of use of the ets:foldr/3
and the efficiency of the pure ets:select/2 example:

    -include_lib("stdlib/include/ms_transform.hrl").

    ets:select(emp_tab, ets:fun2ms(
                          fun(#emp{empno = E, dept = sales}) ->
                                  E
                          end)).

This example requires no special knowledge of match specifications to
understand. The head of the fun matches what you want to select and the
body returns what you want returned. As long as the fun can be kept
within the limits of the match specifications, there is no need to
transfer all table data to the process for filtering as in the
ets:foldr/3 example. It is easier to read than the ets:foldr/3 example,
as the select call in itself discards anything that does not match,
while the fun of the ets:foldr/3 call needs to handle both the elements
matching and the ones not matching.

In the ets:fun2ms/1 example above, ms_transform.hrl must be included in
the source code, as this is what triggers the parse transformation of
the ets:fun2ms/1 call to a valid match specification. This also implies
that the transformation is done at compile time (except when called
from the shell) and therefore takes no resources at runtime. That is,
although you use the more intuitive fun syntax, it is as efficient at
runtime as writing match specifications by hand.
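The handwritten and the translated queries above can be put side by
side in a small module. This is only a sketch: the module name emp_demo,
its function names, and the ets:insert/2 call used to fill the table are
assumptions made for illustration (the text does not show how the table
was filled), and only four of the rows are inserted.

```erlang
-module(emp_demo).
-export([setup/0, sales_by_hand/0, sales_by_fun/0]).

%% Including this header is what triggers the parse transformation.
-include_lib("stdlib/include/ms_transform.hrl").

-record(emp, {empno, surname, givenname, dept, empyear}).

%% Create the table as in the text and insert a few of the rows
%% (the use of ets:insert/2 here is an assumption).
setup() ->
    emp_tab = ets:new(emp_tab,
                      [{keypos, #emp.empno}, named_table, ordered_set]),
    true = ets:insert(emp_tab,
                      [{emp,"011103","Black","Alfred",sales,2000},
                       {emp,"041231","Doe","John",prod,2001},
                       {emp,"076324","Smith","Ella",sales,1995},
                       {emp,"963721","Scott","Juliana",dev,2003}]),
    ok.

%% Handwritten match specification.
sales_by_hand() ->
    ets:select(emp_tab,
               [{#emp{empno = '$1', dept = sales, _ = '_'}, [], ['$1']}]).

%% Same query written as a fun and translated at compile time.
sales_by_fun() ->
    ets:select(emp_tab,
               ets:fun2ms(fun(#emp{empno = E, dept = sales}) -> E end)).
```

After emp_demo:setup(), both query functions should return the same
list of employee numbers for the sales department.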

Assume that we want to get all the employee numbers of employees hired
before year 2000. Using ets:match/2 is not an alternative here, as
relational operators cannot be expressed there. Once again, ets:foldr/3
can do it (slowly, but correctly):

    ets:foldr(fun(#emp{empno = E, empyear = Y},Acc) when Y < 2000 -> [E | Acc];
                 (_,Acc) -> Acc
              end,
              [],
              emp_tab).

The result is ["052341","076324","535216","789789","989891"], as
expected. The equivalent expression using a handwritten match
specification would look like this:

    ets:select(emp_tab, [{#emp{empno = '$1', empyear = '$2', _='_'},
                         [{'<', '$2', 2000}],
                         ['$1']}]).

This gives the same result. [{'<', '$2', 2000}] is in the guard part
and therefore discards anything that does not have an empyear (bound to
'$2' in the head) less than 2000, just like the guard in the
ets:foldr/3 example.

We write it using ets:fun2ms/1:

    -include_lib("stdlib/include/ms_transform.hrl").

    ets:select(emp_tab, ets:fun2ms(
                          fun(#emp{empno = E, empyear = Y}) when Y < 2000 ->
                                  E
                          end)).
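A translated match specification with a guard can also be tried
without a table, by running it over a plain list of records with
ets:match_spec_compile/1 and ets:match_spec_run/2. The following is a
minimal sketch; the module name emp_guard_demo and the function name
hired_before_2000/1 are made up for the illustration.

```erlang
-module(emp_guard_demo).
-export([hired_before_2000/1]).

-include_lib("stdlib/include/ms_transform.hrl").

-record(emp, {empno, surname, givenname, dept, empyear}).

%% Returns the employee numbers of the records in Emps whose
%% empyear is less than 2000, in the order they appear in Emps.
hired_before_2000(Emps) ->
    MS = ets:fun2ms(fun(#emp{empno = E, empyear = Y}) when Y < 2000 ->
                            E
                    end),
    ets:match_spec_run(Emps, ets:match_spec_compile(MS)).
```

Records that do not satisfy the guard are simply left out of the
result, exactly as with ets:select/2.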

Assume that we want the whole object matching instead of only one
element. One alternative is to assign a variable to every part of the
record and build it up once again in the body of the fun, but the
following is easier:

    ets:select(emp_tab, ets:fun2ms(
                          fun(Obj = #emp{empno = E, empyear = Y})
                             when Y < 2000 ->
                                  Obj
                          end)).

As in ordinary Erlang matching, you can bind a variable to the whole
matched object using a "match inside the match", that is, a =.
Unfortunately, in funs translated to match specifications, it is
allowed only at the "top-level", that is, matching the whole object
arriving to be matched into a separate variable. If you are used to
writing match specifications by hand, note that variable Obj is simply
translated into '$_'. Alternatively, pseudo function object/0 also
returns the whole matched object, see section Warnings and
Restrictions.
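The two ways of getting at the whole object can be compared directly.
The sketch below uses plain two-element tuples instead of the emp
record to keep it short; the module name ms_object_demo and the
function name specs/0 are assumptions for the illustration.

```erlang
-module(ms_object_demo).
-export([specs/0]).

-include_lib("stdlib/include/ms_transform.hrl").

%% Returns the match specification produced by a top-level match
%% in the head and the one produced by the pseudo function object().
specs() ->
    ByMatch  = ets:fun2ms(fun(Obj = {_, Y}) when Y < 2000 -> Obj end),
    ByObject = ets:fun2ms(fun({_, Y}) when Y < 2000 -> object() end),
    {ByMatch, ByObject}.
```

Both elements of the returned tuple are expected to be the same term,
with '$_' in the body part of the match specification.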

This example concerns the body of the fun. Assume that all employee
numbers beginning with zero (0) must be changed to begin with one (1)
instead, and that we want to create the list [{<Old empno>,<New
empno>}]:

    ets:select(emp_tab, ets:fun2ms(
                          fun(#emp{empno = [$0 | Rest] }) ->
                                  {[$0|Rest],[$1|Rest]}
                          end)).

This query benefits from the partially bound key feature of table type
ordered_set: the whole table does not need to be searched, only the
part containing keys beginning with 0.

The fun can have many clauses. Assume that we want to do the following:

  * If an employee started before 1997, return the tuple {inventory,
    <employee number>}.

  * If an employee started 1997 or later, but no later than 2001,
    return {rookie, <employee number>}.

  * For all other employees, return {newbie, <employee number>}.

  * The exception is anyone named Smith, whatever the year, as they
    would be affronted by anything other than the tag guru, and that is
    also what is returned for their numbers: {guru, <employee number>}.

This is accomplished as follows:

    ets:select(emp_tab, ets:fun2ms(
                          fun(#emp{empno = E, surname = "Smith" }) ->
                                  {guru,E};
                             (#emp{empno = E, empyear = Y}) when Y < 1997  ->
                                  {inventory, E};
                             (#emp{empno = E, empyear = Y}) when Y > 2001  ->
                                  {newbie, E};
                             (#emp{empno = E, empyear = Y}) -> % 1997 -- 2001
                                  {rookie, E}
                          end)).

The result is as follows:

    [{rookie,"011103"},
     {rookie,"041231"},
     {guru,"052341"},
     {guru,"076324"},
     {newbie,"122334"},
     {rookie,"535216"},
     {inventory,"789789"},
     {newbie,"963721"},
     {rookie,"989891"}]

What more can you do? A simple answer is: see the documentation of
match specifications in ERTS User's Guide. However, the following is a
brief overview of the most useful "built-in functions" that you can use
when the fun is to be translated into a match specification by
ets:fun2ms/1. It is not possible to call other functions than those
allowed in match specifications. No "usual" Erlang code can be executed
by the fun that is translated by ets:fun2ms/1. The fun is limited
exactly to the power of the match specifications, which is unfortunate,
but it is the price one must pay for the execution speed of
ets:select/2 compared to ets:foldl/3 and ets:foldr/3.

The head of the fun is a head matching (or mismatching) one parameter,
one object of the table we select from. The object is always a single
variable (can be _) or a tuple, as that is what ETS, Dets, and Mnesia
tables contain. The match specification returned by ets:fun2ms/1 can be
used with dets:select/2 and mnesia:select/2, as well as with
ets:select/2. The use of = in the head is allowed (and encouraged) at
the top-level.

The guard section can contain any guard expression of Erlang. The
following is a list of BIFs and expressions:

  * Type tests: is_atom, is_float, is_integer, is_list, is_number,
    is_pid, is_port, is_reference, is_tuple, is_binary, is_function,
    is_record

  * Boolean operators: not, and, or, andalso, orelse

  * Relational operators: >, >=, <, =<, =:=, ==, =/=, /=

  * Arithmetic operators: +, -, *, div, rem

  * Bitwise operators: band, bor, bxor, bnot, bsl, bsr

  * The guard BIFs: abs, element, hd, length, node, round, size, tl,
    trunc, self

Unlike in "handwritten" match specifications, the is_record guard works
as in ordinary Erlang code.

Semicolons (;) in guards are allowed. The result is (as expected) one
"match specification clause" for each semicolon-separated part of the
guard. The semantics is identical to the Erlang semantics.
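The effect of a semicolon in a guard can be checked with a small
sketch like the following. The module name ms_semicolon_demo and its
functions are assumptions made for the illustration, and the match
specification is run over a list with ets:match_spec_run/2 rather than
over a table.

```erlang
-module(ms_semicolon_demo).
-export([spec/0, run/1]).

-include_lib("stdlib/include/ms_transform.hrl").

%% One fun clause with two semicolon-separated guard parts.
spec() ->
    ets:fun2ms(fun({A, B}) when is_atom(A); is_integer(B) -> A end).

%% Runs the match specification over a list of two-element tuples.
run(Tuples) ->
    ets:match_spec_run(Tuples, ets:match_spec_compile(spec())).
```

The list returned by spec/0 should contain two match specification
clauses with the same head, one per guard part. A tuple is selected if
either guard part holds.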

The body of the fun is used to construct the resulting value. When
selecting from tables, one usually constructs a suitable term here,
using ordinary Erlang term construction, like tuple parentheses, list
brackets, and variables matched out in the head, possibly with the
occasional constant. Whatever expressions are allowed in guards are
also allowed here, but no special functions exist except object and
bindings (see further down), which return the whole matched object and
all known variable bindings, respectively.

The dbg variants of match specifications have an imperative approach to
the match specification body; the ETS dialect does not. The fun body
for ets:fun2ms/1 returns the result without side effects. As matching
(=) in the body of the match specifications is not allowed (for
performance reasons), the only thing left, more or less, is term
construction.

This section describes the slightly different match specifications
translated by dbg:fun2ms/1.

The same reasons for using the parse transformation apply to dbg,
perhaps even more so, as filtering using Erlang code is not a good idea
when tracing (except afterwards, if you trace to file). The concept is
similar to that of ets:fun2ms/1 except that you usually use it directly
from the shell (which can also be done with ets:fun2ms/1).

The following is an example module to trace on:

    -module(toy).

    -export([start/1, store/2, retrieve/1]).

    start(Args) ->
        toy_table = ets:new(toy_table, Args).

    store(Key, Value) ->
        ets:insert(toy_table, {Key,Value}).

    retrieve(Key) ->
        [{Key, Value}] = ets:lookup(toy_table, Key),
        Value.

During model testing, the first test results in {badmatch,16} in
{toy,start,1}. Why?

We suspect the ets:new/2 call, as we match hard on the return value,
but want only the particular new/2 call with toy_table as first
parameter. So we start a default tracer on the node:

    1> dbg:tracer().
    {ok,<0.88.0>}

We turn on call tracing for all processes. As we are going to make a
pretty restrictive trace pattern, there is no need to call trace only a
few processes (usually there is not):

    2> dbg:p(all,call).
    {ok,[{matched,nonode@nohost,25}]}

We specify the filter. We want to view calls that resemble
ets:new(toy_table, <something>):

    3> dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> true end)).
    {ok,[{matched,nonode@nohost,1},{saved,1}]}

As can be seen, the fun used with dbg:fun2ms/1 takes a single list as
parameter instead of a single tuple. The list matches a list of the
parameters to the traced function. A single variable can also be used.
The body of the fun expresses, in a more imperative way, actions to be
taken if the fun head (and the guards) matches. true is returned here,
only because the body of a fun cannot be empty. The return value is
discarded.

The following trace output is received during test:

    (<0.86.0>) call ets:new(toy_table, [ordered_set])

Assume that we have not found the problem yet, and want to see what
ets:new/2 returns. We use a slightly different trace pattern:

    4> dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> return_trace() end)).

The following trace output is received during test:

    (<0.86.0>) call ets:new(toy_table,[ordered_set])
    (<0.86.0>) returned from ets:new/2 -> 24

The call to return_trace results in a trace message when the function
returns. It applies only to the specific function call triggering the
match specification (and matching the head/guards of the match
specification). This is by far the most common call in the body of a
dbg match specification.

The test now fails with {badmatch,24} because the atom toy_table does
not match the number returned for an unnamed table. So, the problem is
found: the table is to be named, and the arguments supplied by the test
program do not include named_table. We rewrite the start function:

    start(Args) ->
        toy_table = ets:new(toy_table, [named_table|Args]).

With the same tracing turned on, the following trace output is
received:

    (<0.86.0>) call ets:new(toy_table,[named_table,ordered_set])
    (<0.86.0>) returned from ets:new/2 -> toy_table

Assume that the module now passes all testing and goes into the system.
After a while, it is found that table toy_table grows while the system
is running and that there are many elements with atoms as keys. We
expected only integer keys, and so does most of the system, but clearly
not all of it. We turn on call tracing and try to see calls to the
module with an atom as the key:

    1> dbg:tracer().
    {ok,<0.88.0>}
    2> dbg:p(all,call).
    {ok,[{matched,nonode@nohost,25}]}
    3> dbg:tpl(toy,store,dbg:fun2ms(fun([A,_]) when is_atom(A) -> true end)).
    {ok,[{matched,nonode@nohost,1},{saved,1}]}

We use dbg:tpl/3 to make sure that local calls are caught (assume that
the module has grown since the smaller version, and we are unsure
whether the atoms are inserted by a local call). When in doubt, always
use local call tracing.

Assume that nothing happens when tracing in this way. The function is
never called with these parameters. We conclude that someone else (some
other module) is doing it and realize that we must trace on
ets:insert/2 and want to see the calling function. The calling function
can be retrieved using the match specification function caller. To get
it into the trace message, the match specification function message
must be used. The filter call looks like this (looking for calls to
ets:insert/2):

    4> dbg:tpl(ets,insert,dbg:fun2ms(fun([toy_table,{A,_}]) when is_atom(A) ->
                                        message(caller())
                                      end)).
    {ok,[{matched,nonode@nohost,1},{saved,2}]}

The caller is now displayed in the "additional message" part of the
trace output, and the following is displayed after a while:

    (<0.86.0>) call ets:insert(toy_table,{garbage,can}) ({evil_mod,evil_fun,2})

You have realized that function evil_fun of the evil_mod module, with
arity 2, is causing all this trouble.
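To see what the three dbg funs used in this walkthrough turn into, they
can be collected in a compiled module and inspected. This is only a
sketch; the module name dbg_ms_demo and the function name specs/0 are
assumptions for the illustration, and no tracing is started.

```erlang
-module(dbg_ms_demo).
-export([specs/0]).

-include_lib("stdlib/include/ms_transform.hrl").

%% Returns the match specifications for the three trace patterns
%% used in the walkthrough, in the order they were introduced.
specs() ->
    [dbg:fun2ms(fun([toy_table, _]) -> true end),
     dbg:fun2ms(fun([toy_table, _]) -> return_trace() end),
     dbg:fun2ms(fun([toy_table, {A, _}]) when is_atom(A) ->
                        message(caller())
                end)].
```

The calls in the fun bodies are expected to show up as the match
specification actions {return_trace} and {message,{caller}}, while the
head is a list pattern over the arguments of the traced function.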

This example illustrates the most used calls in match specifications
for dbg. The other, more esoteric, calls are listed and explained in
Match specifications in Erlang in ERTS User's Guide, as they are beyond
the scope of this description.

The following warnings and restrictions apply to the funs used with
ets:fun2ms/1 and dbg:fun2ms/1.

Warning:
    To use the pseudo functions triggering the translation, make sure
    to include the header file ms_transform.hrl in the source code.
    Failure to do so possibly results in runtime errors rather than
    compile-time errors, as the expression can be valid as a plain
    Erlang program without translation.

Warning:
    The fun must be literally constructed inside the parameter list to
    the pseudo functions. The fun cannot be bound to a variable first
    and then passed to ets:fun2ms/1 or dbg:fun2ms/1. For example,
    ets:fun2ms(fun(A) -> A end) works, but not F = fun(A) -> A end,
    ets:fun2ms(F). The latter results in a compile-time error if the
    header is included, otherwise a runtime error.

Many restrictions apply to the fun that is translated into a match
specification. Put simply: you cannot use anything in the fun that you
cannot use in a match specification. This means that, among others, the
following restrictions apply to the fun itself:

  * Functions written in Erlang cannot be called, whether they are
    local functions, global functions, or real funs.

  * Everything that is written as a function call is translated into a
    match specification call to a built-in function, so that the call
    is_list(X) is translated to {'is_list', '$1'} ('$1' is only an
    example, the numbering can vary). If one tries to call a function
    that is not a match specification built-in, it causes an error.

  * Variables occurring in the head of the fun are replaced by match
    specification variables in the order of occurrence, so that
    fragment fun({A,B,C}) is replaced by {'$1', '$2', '$3'}, and so on.
    Every occurrence of such a variable in the match specification is
    replaced by a match specification variable in the same way, so that
    the fun fun({A,B}) when is_atom(A) -> B end is translated into
    [{{'$1','$2'},[{is_atom,'$1'}],['$2']}].

  * Variables that are not included in the head are imported from the
    environment and made into match specification const expressions.
    Example from the shell:

        1> X = 25.
        25
        2> ets:fun2ms(fun({A,B}) when A > X -> B end).
        [{{'$1','$2'},[{'>','$1',{const,25}}],['$2']}]

  * Matching with = cannot be used in the body. It can only be used on
    the top-level in the head of the fun. Example from the shell again:

        1> ets:fun2ms(fun({A,[B|C]} = D) when A > B -> D end).
        [{{'$1',['$2'|'$3']},[{'>','$1','$2'}],['$_']}]
        2> ets:fun2ms(fun({A,[B|C]=D}) when A > B -> D end).
        Error: fun with head matching ('=' in head) cannot be translated into
        match_spec
        {error,transform_error}
        3> ets:fun2ms(fun({A,[B|C]}) when A > B -> D = [B|C], D end).
        Error: fun with body matching ('=' in body) is illegal as match_spec
        {error,transform_error}

    All variables are bound in the head of a match specification, so
    the translator cannot allow multiple bindings. The special case
    when matching is done on the top-level makes the variable bind to
    '$_' in the resulting match specification. It is to allow a more
    natural access to the whole matched object. Pseudo function
    object() can be used instead, see below.

    The following expressions are translated equally:

        ets:fun2ms(fun({a,_} = A) -> A end).
        ets:fun2ms(fun({a,_}) -> object() end).

  * The special match specification variables '$_' and '$*' can be
    accessed through the pseudo functions object() (for '$_') and
    bindings() (for '$*'). As an example, one can translate the
    following ets:match_object/2 call to an ets:select/2 call:

        ets:match_object(Table, {'$1',test,'$2'}).

    This is the same as:

        ets:select(Table, ets:fun2ms(fun({A,test,B}) -> object() end)).

    In this simple case, the former expression is probably preferable
    in terms of readability.

    The ets:select/2 call conceptually looks like this in the resulting
    code:

        ets:select(Table, [{{'$1',test,'$2'},[],['$_']}]).

    Matching on the top-level of the fun head can be a more natural way
    to access '$_', see above.

  * Term constructions/literals are translated as much as is needed to
    get them into a valid match specification. This way tuples are made
    into match specification tuple constructions (a one element tuple
    containing the tuple) and constant expressions are used when
    importing variables from the environment. Records are also
    translated into plain tuple constructions, calls to element, and so
    on. The guard test is_record/2 is translated into match
    specification code using the three parameter version that is built
    into match specification, so that is_record(A,t) is translated into
    {is_record,'$1',t,5} if the record size of record type t is 5.

  * Language constructions such as case, if, and catch that are not
    present in match specifications are not allowed.

  * If header file ms_transform.hrl is not included, the fun is not
    translated, which can result in a runtime error (depending on
    whether the fun is valid in a pure Erlang context).

    Ensure that the header is included when using ets:fun2ms/1 and
    dbg:fun2ms/1 in compiled code.

  * If the pseudo function triggering the translation is ets:fun2ms/1,
    the head of the fun must contain a single variable or a single
    tuple. If the pseudo function is dbg:fun2ms/1, the head of the fun
    must contain a single variable or a single list.
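Two of the translation rules in the list above, imported variables
becoming const expressions and tuples in the body becoming tuple
constructions, can be seen together in one small sketch. The module
name ms_construct_demo, the function names, and the variable name Limit
are assumptions made for the illustration.

```erlang
-module(ms_construct_demo).
-export([spec/1, run/2]).

-include_lib("stdlib/include/ms_transform.hrl").

%% Limit is not bound in the fun head, so it is imported from the
%% environment. The body builds a new tuple with the elements swapped.
spec(Limit) ->
    ets:fun2ms(fun({A, B}) when A > Limit -> {B, A} end).

%% Runs the match specification over a list of two-element tuples.
run(Limit, Tuples) ->
    ets:match_spec_run(Tuples, ets:match_spec_compile(spec(Limit))).
```

With Limit bound to 25, the guard is expected to contain {const,25} and
the body the doubly wrapped tuple {{'$2','$1'}}, which constructs an
ordinary two-element tuple when the match specification runs.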

The translation from funs to match specifications is done at compile
time, so runtime performance is not affected by using these pseudo
functions.

For more information about match specifications, see the Match
specifications in Erlang in ERTS User's Guide.

format_error(Error) -> Chars

    Types:

        Error = {error, module(), term()}
        Chars = io_lib:chars()

    Takes an error code returned by one of the other functions in the
    module and creates a textual description of the error.

parse_transform(Forms, Options) -> Forms2

    Types:

        Forms = Forms2 = [erl_parse:abstract_form() |
                          erl_parse:form_info()]
        Options = term()
            Option list, required but not used.

    Implements the transformation at compile time. This function is
    called by the compiler to do the source code transformation if and
    when header file ms_transform.hrl is included in the source code.

    For information about how to use this parse transformation, see
    ets:fun2ms/1 and dbg:fun2ms/1.

    For a description of match specifications, see section Match
    Specification in Erlang in ERTS User's Guide.

transform_from_shell(Dialect, Clauses, BoundEnvironment) -> term()

    Types:

        Dialect = ets | dbg
        Clauses = [erl_parse:abstract_clause()]
        BoundEnvironment = erl_eval:binding_struct()
            List of variable bindings in the shell environment.

    Implements the transformation when the fun2ms/1 functions are
    called from the shell. In this case, the abstract form is for one
    single fun (parsed by the Erlang shell). All imported variables are
    to be in the key-value list passed as BoundEnvironment. The result
    is a term, normalized, that is, not in abstract format.

Ericsson AB                     stdlib 3.4.5.1                 ms_transform(3)