interop(3) ANTLR3C interop(3)

interop - Interoperation Within Rule Actions

The main way to interact with the generated code is via action code
placed within { and } characters in your rules. In general, you are
advised to keep both the code you embed within these actions and the
grammar itself to an absolute minimum. Rather than embed code directly
in your grammar, you should construct an API that is called from the
actions within your grammar. This way you will keep the grammar clean
and maintainable, and separate the code generators or other code from
the definition of the grammar itself.

However, when you wish to call your API functions, or insert small
pieces of code that do not warrant external functions, you will need to
access elements of tokens, return elements from parser rules and
perhaps the internals of the recognizer itself. The C runtime provides
a number of macros that you can use within your action code. It also
provides a number of performant structures that you may find useful for
building symbol tables, lists, tries, stacks, arrays and so on (all of
which are managed so that your memory allocation problems are
minimized).

The C target does not differ from the Java target in any major ways
here, and you should consult the standard documentation for the use of
parameters on rules and the returns clause. Be aware, though, that the
rules generate C function calls and therefore the input and returns
clauses are subject to the constraints of C scoping.

You should note that if your parser rule returns more than a single
entity, then the return type of the generated rule function is a
struct, which is returned by value. This is also the case if your rule
is part of a tree building grammar (uses the output=AST; option).

Other than the notes above, you can use any pre-declared type as an
input or output parameter for your rule.

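The by-value struct return described above can be pictured with a small
stand-alone C sketch. It does not use the ANTLR3 runtime, and every name
in it (demo_expr_return, demo_expr, the value and isProduct fields) is
hypothetical, chosen only to show the shape of a rule function that
returns several values at once.

```c
#include <assert.h>

/* Hypothetical return struct: one field per entity that the rule's
 * returns clause would name. */
typedef struct demo_expr_return {
    int value;      /* computed result */
    int isProduct;  /* 1 if the operator was '*', otherwise 0 */
} demo_expr_return;

/* Stand-in for a generated rule function. The struct is a local
 * variable and is copied back to the caller, so nothing is allocated. */
demo_expr_return demo_expr(int lhs, char op, int rhs)
{
    demo_expr_return retval;

    assert(op == '+' || op == '*');   /* this sketch knows two operators */
    retval.value     = (op == '+') ? lhs + rhs : lhs * rhs;
    retval.isProduct = (op == '*');
    return retval;
}
```

A caller simply declares a demo_expr_return variable, assigns the result
of demo_expr() to it and reads the fields; there is nothing to free
afterwards.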
You are responsible for allocating and freeing any memory used by your
own constructs; ANTLR will track and release any memory allocated
internally for tokens, trees, stacks, scopes and so on. This memory is
returned to the malloc pool when you call the free method of any
ANTLR3-produced structure.

For performance reasons, and to avoid thrashing the malloc allocation
system, memory for many elements of your generated parser is allocated
in chunks and parcelled out by factories. For instance, memory for
tokens is created as an array of tokens, and a token factory hands out
the next available slot to the lexer. When you free the lexer, the
allocated memory is returned to the pool. The same applies to 'strings'
that contain the token text and various other text elements accessed
within the lexer.

The only side effect of this is that after your parse and analysis is
complete, if you wish to retain anything generated automatically, you
must copy it before freeing the recognizer structures. In practice it
is usually easiest to retain the recognizer context objects until your
processing is complete, or to use your own allocation scheme for
generating output etc.

The advantage of using object factories is of course that memory leaks
and accessing de-allocated memory are bugs that rarely occur within the
ANTLR3 C runtime. Further, allocating memory for tokens, trees and so
on is very fast.

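The chunk-and-factory scheme described above can be illustrated with a
minimal stand-alone sketch. This is not the runtime's token factory;
the names demo_token, demo_token_factory and POOL_CHUNK are assumptions
made for the example, and the chunk size is deliberately tiny so the
behaviour is easy to follow.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_CHUNK 4   /* tokens per chunk; hypothetical and tiny */

typedef struct demo_token { int type; int start; int stop; } demo_token;

typedef struct demo_token_factory {
    demo_token **chunks;   /* every chunk allocated so far */
    size_t       nchunks;  /* number of chunks in use */
    size_t       next;     /* next free slot in the newest chunk */
} demo_token_factory;

demo_token_factory *factory_new(void)
{
    demo_token_factory *f = calloc(1, sizeof *f);
    if (f != NULL) f->next = POOL_CHUNK;  /* forces a chunk on first use */
    return f;
}

/* Hand out the next slot, allocating a whole new chunk only when the
 * current one is full. Earlier tokens never move. */
demo_token *factory_new_token(demo_token_factory *f)
{
    assert(f != NULL);
    if (f->next == POOL_CHUNK) {
        demo_token **grown = realloc(f->chunks,
                                     (f->nchunks + 1) * sizeof *grown);
        if (grown == NULL) return NULL;
        f->chunks = grown;
        f->chunks[f->nchunks] = calloc(POOL_CHUNK, sizeof(demo_token));
        if (f->chunks[f->nchunks] == NULL) return NULL;
        f->nchunks++;
        f->next = 0;
    }
    return &f->chunks[f->nchunks - 1][f->next++];
}

/* One call releases every token the factory ever handed out. */
void factory_free(demo_token_factory *f)
{
    size_t i;
    if (f == NULL) return;
    for (i = 0; i < f->nchunks; i++) free(f->chunks[i]);
    free(f->chunks);
    free(f);
}
```

Individual tokens are never freed in this sketch, which mirrors the
point made above: anything you want to keep must be copied out before
the factory that owns it is released.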
The CTX macro is a fundamental parameter that is passed as the first
parameter to any generated function concerned with your lexer, parser,
or tree parser. This is the context pointer for your generated
recognizer and is how you invoke the generated functions, and access
the data embedded within your generated recognizer. While you can use
it to directly access stacks, scopes and so on, this is not really
recommended as you should use the $xxx references that are available
generically within ANTLR grammars.

The context pointer is used because this removes the need for any
global/static variables at all, either within the generated code, or
the C runtime. This is of course fundamental to creating free-threading
recognizers. Wherever a function call or rule call requires the ctx
parameter, you either reference it via the CTX macro, or the ctx
parameter is in fact the return type from calling the 'constructor'
function for your parser/lexer/tree parser (see the code example in
'How to build Generated Code').

While the author is not fond of using C macros to hide code or
structure access, in the case of generated code, they serve two useful
purposes. The first is to simplify the references to internal
constructs, the second is to facilitate the change of any internal
interface without requiring you to port grammars from earlier versions
(just regenerate and recompile). As of release 3.1, these macros are
stable and will only change their usage interface in the event of bugs
being discovered. You are encouraged to use these macros in your code,
rather than access the raw interface.

NB: Macros that act like statements must be terminated with a ';'.
The macro body does not supply this, nor should it. Macros that call
functions are declared with () even if they have no parameters; macros
that reference fields do not have a () declaration.

There are a number of macros that are useful exclusively within lexer
rules. There are additional macros, common to all recognizers, and
these are documented in the section Common Macros.

LEXER
The LEXER macro returns a pointer to the base lexer object, which is of
type pANTLR3_LEXER. This is not the pointer to your generated lexer,
which is supplied by the CTX macro, but to the common implementation of
a lexer interface, which is supplied to all generated lexers.

LEXSTATE
Provides a pointer to the lexer shared state structure, which is where
the tokens for a rule are constructed and the status elements of the
lexer are kept. This pointer is of type
pANTLR3_RECOGNIZER_SHARED_STATE. In general you should only access
elements of this structure if there is not already another macro or
standard $xxxx ANTLR reference that refers to it.

LA(n)
The LA macro returns the character at offset n from the current input
stream position. The return type is ANTLR3_UINT32. Hence LA(1) returns
the character at the current input position (the character that will be
consumed next), LA(-1) returns the character that has just been
consumed and so on. The LA(n) macro is useful for constructing semantic
predicates in lexer rules. The reference LA(0) is undefined and will
cause an error in your lexer.

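The offset convention of LA(n) can be modelled in a few lines of
stand-alone C. This is a sketch of the indexing rule only, not the
runtime's implementation: demo_input, demo_la and DEMO_EOF are
hypothetical names, and returning DEMO_EOF for out-of-range offsets is
a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_EOF 0xFFFFFFFFu   /* hypothetical end-of-input marker */

typedef struct demo_input {
    const char *data;   /* the input characters */
    size_t      size;   /* number of characters in data */
    size_t      pos;    /* index of the next character to consume */
} demo_input;

/* n = 1 is the character about to be consumed, n = -1 the one just
 * consumed. n = 0 has no meaning and is rejected outright here. */
unsigned int demo_la(const demo_input *in, int n)
{
    long idx;

    assert(in != NULL);
    assert(n != 0);
    idx = (long)in->pos + (n > 0 ? n - 1 : n);
    if (idx < 0 || (size_t)idx >= in->size) return DEMO_EOF;
    return (unsigned char)in->data[idx];
}
```

With the input "abc" and one character already consumed, demo_la(in, 1)
yields 'b' and demo_la(in, -1) yields 'a'.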
GETCHARINDEX()
The GETCHARINDEX macro returns the index of the current character
position as a 0 based offset from the start of the input stream. It
returns a value type of ANTLR3_UINT32.

GETLINE()
The GETLINE macro returns the line number of the current character
(LA(1)) in the input stream. It returns a value type of ANTLR3_UINT32.
Note that the line number is incremented automatically by an input
stream when it sees the input character '\n'.

GETTEXT()
The GETTEXT macro returns the text currently matched by the lexer rule.
In general you should use the generic $text reference in ANTLR to
retrieve this. The return type is a reference type of pANTLR3_STRING
which allows you to manipulate the text you have retrieved (NB this
does not change the input stream, only the text you copy from the input
stream when you use this macro or $text).

The reference $text->chars or GETTEXT()->chars will reference a pointer
to the '\0' terminated character string that the ANTLR3 pANTLR3_STRING
represents. String space is allocated automatically as well as the
structure that holds the string. The pANTLR3_STRING_FACTORY associated
with the lexer handles this and when you close the lexer, it will
automatically free any space allocated for strings and their
structures.

GETCHARPOSITIONINLINE()
The GETCHARPOSITIONINLINE macro returns the zero-based offset of
character LA(1) from the start of the current input line. See the macro
GETLINE for details on what the line number means.

EMIT()
The macro EMIT causes the text range currently matched by the lexer
rule to be emitted immediately as the token for the rule. Subsequent
text is matched but ignored. The type used for the token is the name of
the lexer rule or, if you have changed this by using $type = XXX;, the
type XXX is used.

EMITNEW(t)
The macro EMITNEW causes the supplied token reference t to be used as
the token emitted by the rule. The parameter t must be of type
pANTLR3_COMMON_TOKEN.

INDEX()
The INDEX macro returns the current input position according to the
input stream. It is not guaranteed to be the character offset in the
input stream but is instead used as a value for marking and rewinding
to specific points in the input stream. Use the macro GETCHARINDEX() to
find out the position of LA(1) in the input stream.

PUSHSTREAM(str)
The PUSHSTREAM macro, in conjunction with the POPSTREAM macro (usually
called internally in the runtime), can be used to stack many input
streams to the lexer, and implement constructs such as the C
preprocessor #include directive.

An input stream that is pushed onto the stack becomes the current
input stream for the lexer and the state of the previous stream is
automatically saved. The input stream will be automatically popped from
the stack when it is exhausted by the lexer. You may use the macro
POPSTREAM to return to the previous input stream prior to exhausting
the currently stacked input stream.

Here is an example of using the macro in a lexer to implement the C
#include pre-processor directive:

fragment
STRING_GUTS : (~('\\'|'"') )* ;

LINE_COMMAND
    : '#' (' ' | '\t')*
        (
            'include' (' ' | '\t')+ '"' file = STRING_GUTS '"' (' ' | '\t')* '\r'? '\n'
            {
                pANTLR3_STRING          fName;
                pANTLR3_INPUT_STREAM    in;

                // Create an initial string, then take a substring
                // We can do this by messing with the start and end
                // pointers of tokens and so on. This shows a reasonable way to
                // manipulate strings.
                //
                fName = $file.text;
                printf("Including file '%s'\n", fName->chars);

                // Create a new input stream and take advantage of built in stream stacking
                // in C target runtime.
                //
                in = antlr38BitFileStreamNew(fName->chars);
                PUSHSTREAM(in);

                // Note that the input stream is not closed when it EOFs, I don't bother
                // to do it here, but it is up to you to track streams created like this
                // and destroy them when the whole parse session is complete. Remember that you
                // don't want to do this until all tokens have been manipulated all the way through
                // your tree parsers etc as the token does not store the text it just refers
                // back to the input stream and trying to get the text for it will abort if you
                // close the input stream too early.
                //

            }
            | (('0'..'9')=>('0'..'9'))+ ~('\n'|'\r')* '\r'? '\n'
        )
        {$channel=HIDDEN;}
    ;

POPSTREAM()
Assuming that you have stacked an input stream using the PUSHSTREAM
macro, you can remove it from the stream stack and revert to the
previous input stream. You should be careful to pop the stream at an
appropriate point in your lexer action, so you do not match characters
from one stream with those from another in the same rule (unless this
is what you want to do).

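The push and pop behaviour described for PUSHSTREAM and POPSTREAM can
be modelled with a small stand-alone stack of in-memory strings. This
sketch does not use the runtime's input streams; demo_stream,
demo_lexer_input and the demo_ functions are hypothetical names, and
the fixed stack depth is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_STREAMS 8     /* hypothetical fixed stack depth */
#define DEMO_EOF         (-1)  /* returned when every stream is spent */

typedef struct demo_stream { const char *text; size_t pos; } demo_stream;

typedef struct demo_lexer_input {
    demo_stream stack[DEMO_MAX_STREAMS];
    int         top;   /* index of the current stream */
} demo_lexer_input;

void demo_init(demo_lexer_input *li, const char *text)
{
    li->top = 0;
    li->stack[0].text = text;
    li->stack[0].pos  = 0;
}

/* The new stream becomes current; the old position stays saved in the
 * slot beneath it. Returns 0 if the stack is full. */
int demo_push_stream(demo_lexer_input *li, const char *text)
{
    if (li->top + 1 >= DEMO_MAX_STREAMS) return 0;
    li->top++;
    li->stack[li->top].text = text;
    li->stack[li->top].pos  = 0;
    return 1;
}

/* Explicit pop: abandon the current stream before it is exhausted. */
void demo_pop_stream(demo_lexer_input *li)
{
    assert(li->top > 0);
    li->top--;
}

/* Read one character, popping automatically when a stacked stream
 * runs out and resuming the saved one where it left off. */
int demo_next_char(demo_lexer_input *li)
{
    for (;;) {
        demo_stream *s = &li->stack[li->top];
        if (s->text[s->pos] != '\0') return (unsigned char)s->text[s->pos++];
        if (li->top == 0) return DEMO_EOF;
        li->top--;
    }
}
```

Reading one character of "ab", pushing "XY" and then reading to the end
yields a, X, Y, b, which is the same splice that an #include produces.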
SETTEXT(str)
A token manufactured by the lexer does not actually physically store
the text from the input stream to which it matches. The token string is
instead created only if you ask for the text. However, if you wish to
change the text that the token represents you can use this macro to set
it explicitly. Note that this does not change the input stream text but
associates the supplied pANTLR3_STRING with the token. This string is
then returned when the parser and tree parser reference the tokens via
the $xxx.text reference.

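The idea that a token only points back into the input, and that its
text is built on demand or overridden explicitly, can be sketched in
stand-alone C. The demo_token type and its functions are hypothetical
and use plain C strings where the runtime uses pANTLR3_STRING.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct demo_token {
    const char *input;   /* borrowed pointer to the input buffer */
    size_t      start;   /* offset of the first matched character */
    size_t      stop;    /* offset of the last matched character */
    char       *text;    /* NULL until requested or overridden */
} demo_token;

/* Build the text from the input only when somebody asks for it. */
const char *demo_get_text(demo_token *t)
{
    assert(t != NULL && t->stop >= t->start);
    if (t->text == NULL) {
        size_t len = t->stop - t->start + 1;
        t->text = malloc(len + 1);
        if (t->text == NULL) return NULL;
        memcpy(t->text, t->input + t->start, len);
        t->text[len] = '\0';
    }
    return t->text;
}

/* Override the token text. The input buffer itself is never touched.
 * Returns 0 if memory could not be allocated. */
int demo_set_text(demo_token *t, const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL) return 0;
    strcpy(copy, s);
    free(t->text);
    t->text = copy;
    return 1;
}

void demo_token_release(demo_token *t)
{
    free(t->text);
    t->text = NULL;
}
```

After demo_set_text() every later demo_get_text() call returns the
replacement, while the characters in the input buffer stay as they
were.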
USER1 USER2 USER3 and CUSTOM
While you can create your own custom token class and have the lexer
deal with this, this is a lot of work compared to the trivial
inheritance that can be achieved in the Java target. In many cases
though, all that is needed is the addition of a few data items such as
an integer or a pointer. Rather than require C programmers to create
complicated structures just to add a few data items, the C target
provides a few custom fields in the standard token, which will fulfil
the needs of most lexers and parsers.

The token fields user1, user2, and user3 are all value types of
ANTLR3_UINT32. In the parser you can reference these fields directly
from the token: x=TOKNAME { $x->user1 ... but when you are building the
token in the lexer, you must assign to the fields using the macros
USER1, USER2, or USER3. As in:

LEXTOK: 'AAAAA' { USER1 = 99; } ;

PARSER
The PARSER macro returns a pointer to the base parser or tree parser
object, which is of type pANTLR3_PARSER or pANTLR3_TREE_PARSER. This
is not the pointer to your generated parser, which is supplied by the
CTX macro, but to the common implementation of a parser or tree parser
interface, which is supplied to all generated parsers.

INDEX()
When used in the parser, the INDEX macro returns the position of the
current token ( LT(1) ) in the input token stream. It can be used for
MARK and REWIND operations.

LT(n) and LA(n)
In the parser, the macro LT(n) returns the pANTLR3_COMMON_TOKEN at
offset n from the current token stream input position. The macro LA(n)
returns the token type of the token at position n. The value n cannot
be zero, and such a reference will return NULL and possibly cause an
error. LA(1) is the token that is about to be recognized and LA(-1) is
the token that has just been recognized. Values of n that exceed the
limits of the token stream boundaries will return NULL.

PSRSTATE
Returns the shared state pointer of type
pANTLR3_RECOGNIZER_SHARED_STATE. This is not generally useful to the
grammar programmer as the useful elements have generic $xxx references
built in to ANTLR.

ADAPTOR
When building an AST via a parser, the work of constructing and
manipulating trees is done by a supplied adaptor class. The default
class is usually fine for most tree operations but if you wish to build
your own specialized linked/tree structure, then you may need to
reference the adaptor you supply directly. The ADAPTOR macro returns
the reference to the tree adaptor which is always of type
pANTLR3_BASE_TREE_ADAPTOR, even if it is your custom adaptor.

RECOGNIZER
Returns a reference type of pANTLR3_BASE_RECOGNIZER, which is the base
functionality supplied to all recognizers, whether lexers, parsers or
tree parsers. You can override methods in this interface by installing
your own function pointers (once you know what you are doing).

INPUT
Returns a reference to the input stream of the appropriate type for the
recognizer. In a lexer this macro returns a reference type of
pANTLR3_INPUT_STREAM, in a parser this is type pANTLR3_TOKEN_STREAM and
in a tree parser this is type pANTLR3_COMMON_TREE_NODE_STREAM. You can
of course provide your own implementations of any of these interfaces.

MARK()
This macro will cause the input stream for the current recognizer to be
marked with a checkpoint. It will return a value type of ANTLR3_MARKER
which you can use as the parameter to a REWIND macro to return to the
marked point in the input.

If you know you will only ever rewind to the last MARK, then you can
ignore the return value of this macro and just use the REWINDLAST macro
to return to the last MARK that was set in the input stream.

REWIND(m)
Rewinds the appropriate input stream back to the marked checkpoint
returned from a prior MARK macro call and supplied as the parameter m
to the REWIND(m) macro.

REWINDLAST()
Rewinds the current input stream (character, tokens, tree nodes) back
to the last checkpoint marker created by a MARK macro call. Fails
silently if there was no prior MARK call.

SEEK(n)
Causes the input stream to position itself directly at offset n in the
stream. Works for all input stream types: lexer, parser and tree
parser.

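How MARK, REWIND, REWINDLAST and SEEK relate to one another can be
sketched with a stand-alone cursor. This is only a model of the
behaviour described above; demo_cursor and the demo_ functions are
hypothetical, and a real marker may carry more state than a bare
position.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_cursor {
    size_t pos;       /* current position in the stream */
    size_t lastMark;  /* position recorded by the latest mark */
    int    hasMark;   /* 0 until the first mark is taken */
} demo_cursor;

/* Record a checkpoint and return it so the caller can rewind to it. */
size_t demo_mark(demo_cursor *c)
{
    assert(c != NULL);
    c->lastMark = c->pos;
    c->hasMark  = 1;
    return c->lastMark;
}

/* Return to a checkpoint obtained from an earlier demo_mark() call. */
void demo_rewind(demo_cursor *c, size_t m) { c->pos = m; }

/* Return to the latest checkpoint; does nothing if none was taken. */
void demo_rewind_last(demo_cursor *c)
{
    if (c->hasMark) c->pos = c->lastMark;
}

/* Jump straight to an absolute offset. */
void demo_seek(demo_cursor *c, size_t n) { c->pos = n; }
```

Keeping the value returned by demo_mark() is only necessary when you
may take further marks before rewinding; otherwise demo_rewind_last()
is enough.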
Version 3.3.1 Wed Jan 18 2023 interop(3)