antlr3-interop(3)

interop(3)                          ANTLR3C                         interop(3)

NAME

       interop - Interoperation Within Rule Actions

Introduction

       The main way to interact with the generated code is via action code
       placed within { and } characters in your rules. In general, you are
       advised to keep both the code you embed within these actions and the
       grammar itself to an absolute minimum. Rather than embed code directly
       in your grammar, you should construct an API that is called from the
       actions within your grammar. This way you will keep the grammar clean
       and maintainable and separate the code generators or other code from
       the definition of the grammar itself.

       However, when you wish to call your API functions, or insert small
       pieces of code that do not warrant external functions, you will need to
       access elements of tokens, return elements from parser rules and
       perhaps the internals of the recognizer itself. The C runtime provides
       a number of MACROs that you can use within your action code. It also
       provides a number of performant structures that you may find useful for
       building symbol tables, lists, tries, stacks, arrays and so on (all of
       which are managed so that your memory allocation problems are
       minimized).

Parameters and Returns from Parser Rules

       The C target does not differ from the Java target in any major way
       here, and you should consult the standard documentation for the use of
       parameters on rules and the returns clause. You should be aware,
       though, that the rules generate C function calls and therefore the
       input and returns clauses are subject to the constraints of C scoping.

       You should note that if your parser rule returns more than a single
       entity, then the return type of the generated rule function is a
       struct, which is returned by value. This is also the case if your rule
       is part of a tree building grammar (one that uses the output=AST;
       option).

       Other than the notes above, you can use any pre-declared type as an
       input or output parameter for your rule.
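
       The by-value struct return described above can be pictured with a
       small standalone sketch. The names sum_return_sketch and sum_sketch are
       hypothetical and are not what ANTLR generates; a real rule function
       also takes the recognizer context as its first parameter, which is
       omitted here so that the sketch compiles without the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct a rule such as
 *     sum returns [int value, int terms]
 * might produce. The real generated type name and members differ. */
typedef struct sum_return_sketch {
    int value;   /* first declared return element  */
    int terms;   /* second declared return element */
} sum_return_sketch;

/* Hypothetical stand-in for the generated rule function. */
static sum_return_sketch sum_sketch(const int *operands, int count)
{
    sum_return_sketch retval = { 0, 0 };
    int i;

    assert(operands != NULL || count == 0);
    for (i = 0; i < count; i++) {
        retval.value += operands[i];
        retval.terms++;
    }
    return retval;   /* the whole struct is copied back to the caller */
}
```

       Because the struct is copied on return, the caller owns its copy and
       nothing needs to be freed for the return value itself.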

Memory Management

       You are responsible for allocating and freeing any memory used by your
       own constructs; ANTLR will track and release any memory allocated
       internally for tokens, trees, stacks, scopes and so on. This memory is
       returned to the malloc pool when you call the free method of any ANTLR3
       produced structure.

       For performance reasons, and to avoid thrashing the malloc allocation
       system, memory for many elements of your generated parser is allocated
       in chunks and parcelled out by factories. For instance, memory for
       tokens is created as an array of tokens, and a token factory hands out
       the next available slot to the lexer. When you free the lexer, the
       allocated memory is returned to the pool. The same applies to 'strings'
       that contain the token text and various other text elements accessed
       within the lexer.

       The only side effect of this is that after your parse and analysis is
       complete, if you wish to retain anything generated automatically, you
       must copy it before freeing the recognizer structures. In practice it
       is usually simplest to retain the recognizer context objects until
       your processing is complete, or to use your own allocation scheme for
       generating output etc.

       The advantage of using object factories is of course that memory leaks
       and accessing de-allocated memory are bugs that rarely occur within the
       ANTLR3 C runtime. Further, allocating memory for tokens, trees and so
       on is very fast.
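
       The chunk-and-factory idea can be sketched in a few lines of plain C.
       This is an illustration of the technique only, not the runtime's token
       factory: tok_sketch, tok_factory_sketch and the function names are
       hypothetical, and the chunk size is deliberately tiny.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_SLOTS 4   /* deliberately tiny so the sketch grows quickly */

typedef struct tok_sketch { int type; int start; int stop; } tok_sketch;

typedef struct tok_factory_sketch {
    tok_sketch **chunks;   /* array of chunk pointers          */
    size_t       nchunks;  /* chunks allocated so far          */
    size_t       next;     /* next free slot in the last chunk */
} tok_factory_sketch;

static void factory_init(tok_factory_sketch *f)
{
    f->chunks  = NULL;
    f->nchunks = 0;
    f->next    = CHUNK_SLOTS;   /* forces a chunk on first request */
}

/* Hand out the next free slot, allocating only once per chunk. */
static tok_sketch *factory_new_token(tok_factory_sketch *f)
{
    if (f->next == CHUNK_SLOTS) {
        tok_sketch **grown = realloc(f->chunks,
                                     (f->nchunks + 1) * sizeof *grown);
        if (grown == NULL) return NULL;
        f->chunks = grown;
        f->chunks[f->nchunks] = calloc(CHUNK_SLOTS, sizeof(tok_sketch));
        if (f->chunks[f->nchunks] == NULL) return NULL;
        f->nchunks++;
        f->next = 0;
    }
    return &f->chunks[f->nchunks - 1][f->next++];
}

/* Release every chunk at once; all tokens handed out become invalid. */
static void factory_free(tok_factory_sketch *f)
{
    size_t i;
    for (i = 0; i < f->nchunks; i++) free(f->chunks[i]);
    free(f->chunks);
    factory_init(f);
}
```

       The last function shows why anything you want to keep must be copied
       first: freeing the factory releases every slot it ever handed out.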

The CTX Macro

       The CTX macro is a fundamental parameter that is passed as the first
       parameter to any generated function concerned with your lexer, parser,
       or tree parser. This is the context pointer for your generated
       recognizer and is how you invoke the generated functions, and access
       the data embedded within your generated recognizer. While you can use
       it to directly access stacks, scopes and so on, this is not really
       recommended as you should use the $xxx references that are available
       generically within ANTLR grammars.

       The context pointer is used because this removes the need for any
       global/static variables at all, either within the generated code, or
       the C runtime. This is of course fundamental to creating free threading
       recognizers. Wherever a function call or rule call requires the ctx
       parameter, you either reference it via the CTX macro, or the ctx
       parameter is in fact the return type from calling the 'constructor'
       function for your parser/lexer/tree parser (see code example in 'How to
       build Generated Code').
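
       The reason a context pointer removes the need for globals can be shown
       with a toy recognizer. Everything here is hypothetical (demo_ctx_sketch
       and its functions are not part of the runtime, and the real
       'constructor' returns a pointer to an allocated context rather than a
       value); the point is only that each instance carries its own state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recognizer context: all state lives here, none is global. */
typedef struct demo_ctx_sketch {
    const char *input;   /* text being recognized  */
    size_t      pos;     /* current input position */
    int         digits;  /* per-instance result    */
} demo_ctx_sketch;

/* Hypothetical 'constructor' for the context. */
static demo_ctx_sketch demo_ctx_new(const char *input)
{
    demo_ctx_sketch ctx = { input, 0, 0 };
    return ctx;
}

/* Every 'rule' receives the context as its first parameter. */
static void demo_rule_count_digits(demo_ctx_sketch *ctx)
{
    assert(ctx != NULL && ctx->input != NULL);
    for (; ctx->input[ctx->pos] != '\0'; ctx->pos++) {
        if (ctx->input[ctx->pos] >= '0' && ctx->input[ctx->pos] <= '9') {
            ctx->digits++;
        }
    }
}
```

       Two contexts created this way never share state, so they could be
       driven from separate threads without locking.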

Macro Changes

       While the author is not fond of using C MACROs to hide code or
       structure access, in the case of generated code, they serve two useful
       purposes. The first is to simplify the references to internal
       constructs, the second is to facilitate the change of any internal
       interface without requiring you to port grammars from earlier versions
       (just regenerate and recompile). As of release 3.1, these macros are
       stable and will only change their usage interface in the event of bugs
       being discovered. You are encouraged to use these macros in your code,
       rather than access the raw interface.

       NB: Macros that act like statements must be terminated with a ';'.
       The macro body does not supply this, nor should it. Macros that call
       functions are declared with () even if they have no parameters; macros
       that reference fields do not have a () declaration.
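
       The three conventions in the note above (no () for field macros, ()
       for function macros, and a ';' supplied by the caller) look like this
       in a standalone sketch. The _SKETCH macros and rec_sketch type are
       hypothetical and only mimic the style; the real macros are defined by
       the generated code.

```c
#include <assert.h>

typedef struct rec_sketch {
    int line;
    int emitted;
} rec_sketch;

static int  rec_get_line(rec_sketch *r) { return r->line; }
static void rec_emit(rec_sketch *r)     { r->emitted++; }

/* Hypothetical macros in the documented style. They assume a variable
 * named ctx is in scope, as it would be inside a generated function. */
#define LINEFIELD_SKETCH   ctx->line           /* field access: no ()      */
#define GETLINE_SKETCH()   rec_get_line(ctx)   /* function call: has ()    */
#define EMIT_SKETCH()      rec_emit(ctx)       /* statement: caller adds ; */

static int action_sketch(rec_sketch *ctx)
{
    LINEFIELD_SKETCH = 12;   /* a field macro can be assigned to */
    EMIT_SKETCH();           /* the ';' comes from the action    */
    return GETLINE_SKETCH();
}
```

       If the internal layout changed, only the macro definitions would need
       to change; action_sketch itself would compile unmodified.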

Lexer Macros

       There are a number of macros that are useful exclusively within lexer
       rules. There are additional macros, common to all recognizers, and
       these are documented in the section Macros Common to All Recognizers.

   LEXER
       The LEXER macro returns a pointer to the base lexer object, which is of
       type pANTLR3_LEXER. This is not the pointer to your generated lexer,
       which is supplied by the CTX macro, but to the common implementation of
       a lexer interface, which is supplied to all generated lexers.

   LEXSTATE
       Provides a pointer to the lexer shared state structure, which is where
       the tokens for a rule are constructed and the status elements of the
       lexer are kept. This pointer is of type
       pANTLR3_RECOGNIZER_SHARED_STATE. In general you should only access
       elements of this structure if there is not already another MACRO or
       standard $xxxx ANTLR reference that refers to it.

   LA(n)
       The LA macro returns the character at offset n from the current input
       stream position. The return type is ANTLR3_UINT32. Hence LA(1) returns
       the character at the current input position (the character that will be
       consumed next), LA(-1) returns the character that has just been
       consumed and so on. The LA(n) macro is useful for constructing semantic
       predicates in lexer rules. The reference LA(0) is undefined and will
       cause an error in your lexer.

   GETCHARINDEX()
       The GETCHARINDEX macro returns the index of the current character
       position as a 0 based offset from the start of the input stream. It
       returns a value type of ANTLR3_UINT32.

   GETLINE()
       The GETLINE macro returns the line number of the current character
       (LA(1)) in the input stream. It returns a value type of ANTLR3_UINT32.
       Note that the line number is incremented automatically by an input
       stream when it sees the input character '\n'.

   GETTEXT()
       The GETTEXT macro returns the text currently matched by the lexer rule.
       In general you should use the generic $text reference in ANTLR to
       retrieve this. The return type is a reference type of pANTLR3_STRING
       which allows you to manipulate the text you have retrieved (NB this
       does not change the input stream, only the text you copy from the input
       stream when you use this MACRO or $text).

       The reference $text->chars or GETTEXT()->chars will reference a pointer
       to the '\0' terminated character string that the ANTLR3 pANTLR3_STRING
       represents. String space is allocated automatically, as is the
       structure that holds the string. The pANTLR3_STRING_FACTORY associated
       with the lexer handles this and when you close the lexer, it will
       automatically free any space allocated for strings and their
       structures.

   GETCHARPOSITIONINLINE()
       The GETCHARPOSITIONINLINE macro returns the zero based offset of
       character LA(1) from the start of the current input line. See the macro
       GETLINE for details on what the line number means.

   EMIT()
       The macro EMIT causes the text range currently matched by the lexer
       rule to be emitted immediately as the token for the rule. Subsequent
       text is matched but ignored. The type used for the token is the name of
       the lexer rule or, if you have changed this by using $type = XXX;, the
       type XXX is used.

   EMITNEW(t)
       The macro EMITNEW causes the supplied token reference t to be used as
       the token emitted by the rule. The parameter t must be of type
       pANTLR3_COMMON_TOKEN.

   INDEX()
       The INDEX macro returns the current input position according to the
       input stream. It is not guaranteed to be the character offset in the
       input stream but is instead used as a value for marking and rewinding
       to specific points in the input stream. Use the macro GETCHARINDEX() to
       find out the position of LA(1) in the input stream.

   PUSHSTREAM(str)
       The PUSHSTREAM macro, in conjunction with the POPSTREAM macro (usually
       called internally in the runtime), can be used to stack many input
       streams to the lexer, and implement constructs such as the C
       pre-processor #include directive.

       An input stream that is pushed on to the stack becomes the current
       input stream for the lexer and the state of the previous stream is
       automatically saved. The input stream will be automatically popped from
       the stack when it is exhausted by the lexer. You may use the macro
       POPSTREAM to return to the previous input stream prior to exhausting
       the currently stacked input stream.

       Here is an example of using the macro in a lexer to implement the C
       #include pre-processor directive:

       fragment
       STRING_GUTS :   (~('\\'|'"') )* ;

       LINE_COMMAND
       : '#' (' ' | '\t')*
           (
               'include' (' ' | '\t')+ '"' file = STRING_GUTS '"' (' ' | '\t')* '\r'? '\n'
               {
                   pANTLR3_STRING      fName;
                   pANTLR3_INPUT_STREAM    in;

                   // Create an initial string, then take a substring
                   // We can do this by messing with the start and end
                   // pointers of tokens and so on. This shows a reasonable way to
                   // manipulate strings.
                   //
                   fName = $file.text;
                   printf("Including file '%s'\n", fName->chars);

                   // Create a new input stream and take advantage of built in stream stacking
                   // in C target runtime.
                   //
                   in = antlr38BitFileStreamNew(fName->chars);
                   PUSHSTREAM(in);

                   // Note that the input stream is not closed when it EOFs, I don't bother
                   // to do it here, but it is up to you to track streams created like this
                   // and destroy them when the whole parse session is complete. Remember that you
                   // don't want to do this until all tokens have been manipulated all the way through
                   // your tree parsers etc as the token does not store the text it just refers
                   // back to the input stream and trying to get the text for it will abort if you
                   // close the input stream too early.
                   //
               }
               | (('0'..'9')=>('0'..'9'))+ ~('\n'|'\r')* '\r'? '\n'
           )
        {$channel=HIDDEN;}
       ;

   POPSTREAM()
       Assuming that you have stacked an input stream using the PUSHSTREAM
       macro, you can remove it from the stream stack and revert to the
       previous input stream. You should be careful to pop the stream at an
       appropriate point in your lexer action, so you do not match characters
       from one stream with those from another in the same rule (unless this
       is what you want to do).

   SETTEXT(str)
       A token manufactured by the lexer does not actually physically store
       the text from the input stream to which it matches. The token string is
       instead created only if you ask for the text. However, if you wish to
       change the text that the token represents you can use this macro to set
       it explicitly. Note that this does not change the input stream text but
       associates the supplied pANTLR3_STRING with the token. This string is
       then returned when the parser and tree parser reference the tokens via
       the $xxx.text reference.

   USER1 USER2 USER3 and CUSTOM
       While you can create your own custom token class and have the lexer
       deal with this, this is a lot of work compared to the trivial
       inheritance that can be achieved in the Java target. In many cases
       though, all that is needed is the addition of a few data items such as
       an integer or a pointer. Rather than require C programmers to create
       complicated structures just to add a few data items, the C target
       provides a few custom fields in the standard token, which will fulfil
       the needs of most lexers and parsers.

       The token fields user1, user2, and user3 are all value types of
       ANTLR3_UINT32. In the parser you can reference these fields directly
       from the token: x=TOKNAME { $x->user1 ... but when you are building the
       token in the lexer, you must assign to the fields using the macros
       USER1, USER2, or USER3. As in:

       LEXTOK: 'AAAAA' { USER1 = 99; } ;
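
       The stream stacking behaviour described under PUSHSTREAM and POPSTREAM
       (push makes a stream current, exhaustion pops it automatically, and
       the previous stream resumes where it left off) can be modelled in
       plain C. This is a sketch of the idea only; lexer_sketch and its
       functions are hypothetical, and strings stand in for real input
       streams.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH_SKETCH 8

typedef struct stream_sketch { const char *text; size_t pos; } stream_sketch;

typedef struct lexer_sketch {
    stream_sketch stack[MAX_DEPTH_SKETCH];  /* top of stack is current */
    int           depth;                    /* streams on the stack    */
} lexer_sketch;

/* Make text the current input; the previous stream keeps its position. */
static int push_stream_sketch(lexer_sketch *lx, const char *text)
{
    if (lx->depth == MAX_DEPTH_SKETCH) return 0;
    lx->stack[lx->depth].text = text;
    lx->stack[lx->depth].pos  = 0;
    lx->depth++;
    return 1;
}

/* Drop the current stream and resume the one beneath it. */
static void pop_stream_sketch(lexer_sketch *lx)
{
    if (lx->depth > 1) lx->depth--;
}

/* Return the next character, popping exhausted streams automatically.
 * Returns -1 once the outermost stream is exhausted. */
static int next_char_sketch(lexer_sketch *lx)
{
    for (;;) {
        stream_sketch *cur;
        if (lx->depth == 0) return -1;
        cur = &lx->stack[lx->depth - 1];
        if (cur->text[cur->pos] != '\0') {
            return (unsigned char)cur->text[cur->pos++];
        }
        if (lx->depth == 1) return -1;
        pop_stream_sketch(lx);
    }
}
```

       Pushing "XY" after reading 'a' from "ab" yields X, Y and then b, which
       is the same ordering an #include produces.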

Parser and Tree Parser Macros

   PARSER
       The PARSER macro returns a pointer to the base parser or tree parser
       object, which is of type pANTLR3_PARSER or pANTLR3_TREE_PARSER. This
       is not the pointer to your generated parser, which is supplied by the
       CTX macro, but to the common implementation of a parser or tree parser
       interface, which is supplied to all generated parsers.

   INDEX()
       When used in the parser, the INDEX macro returns the position of the
       current token ( LT(1) ) in the input token stream. It can be used for
       MARK and REWIND operations.

   LT(n) and LA(n)
       In the parser, the macro LT(n) returns the pANTLR3_COMMON_TOKEN at
       offset n from the current token stream input position. The macro LA(n)
       returns the token type of the token at position n. The value n cannot
       be zero, and such a reference will return NULL and possibly cause an
       error. LA(1) refers to the token that is about to be recognized and
       LA(-1) to the token that has just been recognized. Values of n that
       exceed the limits of the token stream boundaries will return NULL.

   PSRSTATE
       Returns the shared state pointer of type
       pANTLR3_RECOGNIZER_SHARED_STATE. This is not generally useful to the
       grammar programmer as the useful elements have generic $xxx references
       built in to ANTLR.

   ADAPTOR
       When building an AST via a parser, the work of constructing and
       manipulating trees is done by a supplied adaptor class. The default
       class is usually fine for most tree operations but if you wish to build
       your own specialized linked/tree structure, then you may need to
       reference the adaptor you supply directly. The ADAPTOR macro returns
       the reference to the tree adaptor which is always of type
       pANTLR3_BASE_TREE_ADAPTOR, even if it is your custom adaptor.
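
       The offset rules for LT(n) and LA(n) (1 is the next token, -1 the one
       just recognized, 0 is invalid, out-of-range gives NULL) can be written
       down as a small model. The types and functions below are hypothetical
       and ignore details a real token stream may handle, such as tokens on
       hidden channels; the value -1 for 'no token type' is an assumption of
       this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct token_sketch { int type; const char *text; } token_sketch;

typedef struct token_stream_sketch {
    const token_sketch *tokens;  /* all tokens produced by the lexer     */
    int                 count;   /* number of tokens                     */
    int                 p;       /* index of the next token to recognize */
} token_stream_sketch;

/* LT(n): n > 0 looks ahead (1 is the next token), n < 0 looks back
 * (-1 is the token just recognized), n == 0 is invalid. */
static const token_sketch *lt_sketch(const token_stream_sketch *ts, int n)
{
    int i;
    if (n == 0) return NULL;
    i = (n > 0) ? ts->p + n - 1 : ts->p + n;
    if (i < 0 || i >= ts->count) return NULL;
    return &ts->tokens[i];
}

/* LA(n): the type of LT(n), or -1 when there is no such token. */
static int la_sketch(const token_stream_sketch *ts, int n)
{
    const token_sketch *t = lt_sketch(ts, n);
    return (t != NULL) ? t->type : -1;
}
```

       Checking the result of an LT(n) style lookup for NULL before
       dereferencing it is the safe pattern whenever n may run past either end
       of the stream.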

Macros Common to All Recognizers

   RECOGNIZER
       Returns a reference type of pANTLR3_BASE_RECOGNIZER, which is the base
       functionality supplied to all recognizers, whether lexers, parsers or
       tree parsers. You can override methods in this interface by installing
       your own function pointers (once you know what you are doing).

   INPUT
       Returns a reference to the input stream of the appropriate type for the
       recognizer. In a lexer this macro returns a reference type of
       pANTLR3_INPUT_STREAM, in a parser this is type pANTLR3_TOKEN_STREAM and
       in a tree parser this is type pANTLR3_COMMON_TREE_NODE_STREAM. You can
       of course provide your own implementations of any of these interfaces.

   MARK()
       This macro will cause the input stream for the current recognizer to be
       marked with a checkpoint. It will return a value type of ANTLR3_MARKER
       which you can use as the parameter to a REWIND macro to return to the
       marked point in the input.

       If you know you will only ever rewind to the last MARK, then you can
       ignore the return value of this macro and just use the REWINDLAST macro
       to return to the last MARK that was set in the input stream.

   REWIND(m)
       Rewinds the appropriate input stream back to the marked checkpoint
       returned from a prior MARK macro call and supplied as the parameter m
       to the REWIND(m) macro.

   REWINDLAST()
       Rewinds the current input stream (character, tokens, tree nodes) back
       to the last checkpoint marker created by a MARK macro call. Fails
       silently if there was no prior MARK call.

   SEEK(n)
       Causes the input stream to position itself directly at offset n in the
       stream. Works for all input stream types, whether lexer, parser or tree
       parser.
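
       The relationship between MARK, REWIND, REWINDLAST and SEEK can be
       sketched with a toy character stream. The names below are
       hypothetical, and the marker here is just a saved offset; a real input
       stream may record more state per checkpoint than this model does.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t marker_sketch;

typedef struct input_sketch {
    const char   *text;
    size_t        pos;        /* current position               */
    marker_sketch last_mark;  /* position saved by latest mark  */
    int           has_mark;   /* non-zero once mark was called  */
} input_sketch;

/* Record a checkpoint and return it so the caller can rewind to it. */
static marker_sketch mark_sketch(input_sketch *in)
{
    in->last_mark = in->pos;
    in->has_mark  = 1;
    return in->pos;
}

/* Return to an explicit checkpoint. */
static void rewind_sketch(input_sketch *in, marker_sketch m) { in->pos = m; }

/* Return to the most recent checkpoint; does nothing if none was set. */
static void rewind_last_sketch(input_sketch *in)
{
    if (in->has_mark) in->pos = in->last_mark;
}

/* Jump straight to offset n. */
static void seek_sketch(input_sketch *in, size_t n) { in->pos = n; }

/* Consume one character, or return -1 at end of input. */
static int consume_sketch(input_sketch *in)
{
    return (in->text[in->pos] != '\0')
         ? (unsigned char)in->text[in->pos++] : -1;
}
```

       Keeping the marker returned by the mark call is only necessary when
       several checkpoints are live at once; otherwise the rewind-last form is
       enough.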

Version 3.3.1                   Wed Jul 20 2022                     interop(3)