interop(3) ANTLR3C interop(3)

Introduction

The main way to interact with the generated code is via action code
placed within { and } characters in your rules. In general, you are
advised to keep the code embedded in these actions, and in the grammar
itself, to an absolute minimum. Rather than embedding code directly in
your grammar, construct an API that is called from the actions within
your grammar. This keeps the grammar clean and maintainable, and
separates your code generators or other code from the definition of
the grammar itself.
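
To illustrate that separation, the following plain C module is the kind
of small API an action might call. It is a minimal sketch: SymbolTable,
symtab_declare() and the other names are hypothetical, and nothing here
is part of the ANTLR3 C runtime.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table kept outside the grammar; actions call into it. */
typedef struct Symbol {
    char          *name;   /* private copy, independent of parser memory */
    unsigned int   line;   /* line of the first declaration */
    struct Symbol *next;
} Symbol;

typedef struct SymbolTable {
    Symbol *head;
} SymbolTable;

void symtab_init(SymbolTable *t)
{
    t->head = NULL;
}

const Symbol *symtab_lookup(const SymbolTable *t, const char *name)
{
    const Symbol *s;
    for (s = t->head; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Declares name, or returns the existing entry if it is already declared.
 * Returns NULL only if memory runs out. */
const Symbol *symtab_declare(SymbolTable *t, const char *name, unsigned int line)
{
    const Symbol *found = symtab_lookup(t, name);
    Symbol *s;
    if (found != NULL)
        return found;
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);
    s->line = line;
    s->next = t->head;
    t->head = s;
    return s;
}

/* Releases every entry, including the copied names. */
void symtab_free(SymbolTable *t)
{
    while (t->head != NULL) {
        Symbol *next = t->head->next;
        free(t->head->name);
        free(t->head);
        t->head = next;
    }
}
```

An action then shrinks to a single call that passes in the matched
text, for example (const char *)$ID.text->chars (the cast is needed
because the characters of a pANTLR3_STRING are unsigned). Because the
table copies what it keeps, its entries stay valid after the
recognizer is freed.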

However, when you wish to call your API functions, or insert small
pieces of code that do not warrant external functions, you will need to
access elements of tokens, return elements from parser rules and
perhaps the internals of the recognizer itself. The C runtime provides
a number of macros that you can use within your action code. It also
provides a number of performant structures that you may find useful for
building symbol tables, lists, tries, stacks, arrays and so on (all of
which are managed so that your memory allocation problems are
minimized).

Rule Parameters and Return Values

The C target does not differ from the Java target in any major way
here, and you should consult the standard documentation for the use of
parameters on rules and the returns clause. You should be aware,
though, that rules generate C functions, and therefore their parameter
and returns clauses are subject to the constraints of C scoping.

Note that if your parser rule returns more than a single entity, then
the return type of the generated rule function is a struct, which is
returned by value. This is also the case if your rule is part of a
tree building grammar (one that uses the output=AST; option).

Other than the notes above, you can use any pre-declared type as an
input or output parameter for your rule.
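
A rule declared as decl returns [int value, const char *name]
therefore produces a function that returns a struct. The sketch below
imitates that shape in plain C; decl_return and its members are
hypothetical stand-ins, and the real generated struct has its own name
and may carry extra members (for example start and stop tokens or, with
output=AST, the tree).

```c
/* Hypothetical stand-in for the struct generated for a rule declared as
 *   decl returns [int value, const char *name] : ... ;
 * Real generated names and extra members will differ. */
typedef struct decl_return_struct {
    int         value;
    const char *name;
} decl_return;

/* Stand-in for the generated rule function. The struct is returned by
 * value, so the caller gets its own copy of every field in one go. */
decl_return decl(int seed)
{
    decl_return retval;
    retval.value = seed * 2;
    retval.name  = (seed > 0) ? "positive" : "non-positive";
    return retval;
}
```

In the grammar you would reach these fields through the usual ANTLR
label syntax, as in r=decl { ... $r.value ... }.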

Memory Management

You are responsible for allocating and freeing any memory used by your
own constructs; ANTLR tracks and releases any memory allocated
internally for tokens, trees, stacks, scopes and so on. This memory is
returned to the malloc pool when you call the free method of any
ANTLR3-produced structure.

For performance reasons, and to avoid thrashing the malloc allocation
system, memory for many elements of your generated parser is allocated
in chunks and parcelled out by factories. For instance, memory for
tokens is allocated as an array of tokens, and a token factory hands
out the next available slot to the lexer. When you free the lexer, the
allocated memory is returned to the pool. The same applies to 'strings'
that contain the token text and various other text elements accessed
within the lexer.
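
The chunking scheme can be sketched in a few lines of plain C. This is
a minimal illustration, not the runtime's token factory: Token,
TokenFactory, the chunk size and the function names are all
hypothetical.

```c
#include <stdlib.h>

/* Hypothetical, cut-down token. */
typedef struct Token {
    int type;
    int start, stop;           /* offsets into the input */
} Token;

#define TOKENS_PER_CHUNK 1024  /* arbitrary size chosen for this sketch */

/* One malloc'd block holding many token slots, linked to the previous block. */
typedef struct Chunk {
    struct Chunk *prev;
    Token         slots[TOKENS_PER_CHUNK];
} Chunk;

typedef struct TokenFactory {
    Chunk  *current;
    size_t  used;              /* slots handed out from the current chunk */
} TokenFactory;

void factory_init(TokenFactory *f)
{
    f->current = NULL;
    f->used = TOKENS_PER_CHUNK;    /* forces a chunk on the first request */
}

/* Hands out the next free slot; malloc is called only once per chunk. */
Token *factory_new_token(TokenFactory *f)
{
    if (f->used == TOKENS_PER_CHUNK) {
        Chunk *c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;
        c->prev = f->current;
        f->current = c;
        f->used = 0;
    }
    return &f->current->slots[f->used++];
}

/* Releases every token ever handed out, one free() per chunk. */
void factory_free(TokenFactory *f)
{
    while (f->current != NULL) {
        Chunk *prev = f->current->prev;
        free(f->current);
        f->current = prev;
    }
    f->used = TOKENS_PER_CHUNK;
}
```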

The only side effect of this is that after your parse and analysis is
complete, if you wish to retain anything generated automatically, you
must copy it before freeing the recognizer structures. In practice, it
is usually simplest to retain the recognizer context objects until
your processing is complete, or to use your own allocation scheme for
generating output.
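
A sketch of the copy-before-free rule, using a hypothetical
bump-pointer arena (StringArena and its functions are not runtime
APIs) to stand in for the string factory: anything you want to keep is
copied into memory you own before the arena is released.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena standing in for the runtime's string factory:
 * every string it hands out is released together by arena_free(). */
typedef struct StringArena {
    char   *buf;
    size_t  size;
    size_t  used;
} StringArena;

int arena_init(StringArena *a, size_t size)
{
    a->buf  = malloc(size);
    a->size = (a->buf != NULL) ? size : 0;
    a->used = 0;
    return (a->buf != NULL) ? 0 : -1;
}

/* Copies n characters of src into the arena as a '\0' terminated string. */
char *arena_strndup(StringArena *a, const char *src, size_t n)
{
    char *s;
    if (n + 1 > a->size - a->used)
        return NULL;
    s = a->buf + a->used;
    memcpy(s, src, n);
    s[n] = '\0';
    a->used += n + 1;
    return s;
}

void arena_free(StringArena *a)
{
    free(a->buf);
    a->buf  = NULL;
    a->size = a->used = 0;
}

/* Copies text into ordinary malloc'd memory that you own (and must free),
 * so it survives arena_free(). */
char *keep_copy(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}
```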

The advantage of using object factories is, of course, that memory
leaks and accesses to de-allocated memory are bugs that rarely occur
within the ANTLR3 C runtime. Further, allocating memory for tokens,
trees and so on is very fast.

The CTX Macro

The CTX macro refers to the ctx parameter that is passed as the first
parameter to any generated function concerned with your lexer, parser,
or tree parser. This is the context pointer for your generated
recognizer: it is how you invoke the generated functions and access
the data embedded within your generated recognizer. While you can use
it to access stacks, scopes and so on directly, this is not
recommended; you should use the $xxx references that are available
generically within ANTLR grammars.

The context pointer is used because it removes the need for any global
or static variables at all, either within the generated code or within
the C runtime. This is of course fundamental to creating free-threading
recognizers. Wherever a function call or rule call requires the ctx
parameter, you either reference it via the CTX macro, or use the
pointer returned by the 'constructor' function for your
parser/lexer/tree parser (see the code example in 'How to build
Generated Code').
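
The pattern is easy to see in miniature. The plain C sketch below is
hypothetical (MyLexer, MyLexerNew() and nextChar are invented names,
and this CTX definition only mirrors the idea): all state lives behind
the context pointer, so two instances run independently with no
shared globals.

```c
#include <stdlib.h>

/* Hypothetical miniature recognizer: every piece of state is in the struct. */
typedef struct MyLexer_struct *pMyLexer;

typedef struct MyLexer_struct {
    const char *input;
    size_t      pos;
    int       (*nextChar)(pMyLexer ctx);   /* "method": ctx comes first */
} MyLexer;

/* Inside the recognizer's code, CTX simply names the context parameter. */
#define CTX ctx

static int nextCharImpl(pMyLexer ctx)
{
    unsigned char c = (unsigned char)CTX->input[CTX->pos];
    if (c == '\0')
        return -1;                         /* end of input */
    CTX->pos++;
    return c;
}

/* The 'constructor' returns the ctx pointer that the caller passes around. */
pMyLexer MyLexerNew(const char *input)
{
    pMyLexer ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->input    = input;
    ctx->pos      = 0;
    ctx->nextChar = nextCharImpl;
    return ctx;
}
```

Calling a method always looks like lx->nextChar(lx): the instance is
passed explicitly, which is what makes several recognizers (or
threads) safe to run side by side.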

Using the Macros

While the author is not fond of using C macros to hide code or
structure access, in the case of generated code they serve two useful
purposes. The first is to simplify references to internal constructs;
the second is to allow any internal interface to change without
requiring you to port grammars from earlier versions (just regenerate
and recompile). As of release 3.1, these macros are stable and will
only change their usage interface in the event of bugs being
discovered. You are encouraged to use these macros in your code,
rather than access the raw interface.

Note: Macros that act like statements must be terminated with a ';'.
The macro body does not supply this, nor should it. Macros that call
functions are declared with () even if they have no parameters; macros
that reference fields do not have a () declaration.
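
The three conventions can be shown with toy macros. These definitions
are hypothetical and only illustrate the rules above; they are not the
runtime's own macros. The do { ... } while (0) wrapper is a common C
idiom, assumed here rather than taken from the runtime, for making a
statement macro behave as a single statement that the caller
terminates with ';'.

```c
/* Hypothetical token used by the toy macros below. */
typedef struct Tok {
    unsigned int user1;
    int          emitted;
} Tok;

static unsigned int get_user1(const Tok *t)
{
    return t->user1;
}

/* Field reference: no (), and usable on either side of '='. */
#define USER1       (tok->user1)

/* Function call: declared with () even though it takes no parameters. */
#define GETUSER1()  get_user1(tok)

/* Statement: the caller supplies the ';', the macro body does not. */
#define EMIT()      do { tok->emitted = 1; } while (0)

/* A toy "action"; the macros assume a variable named tok is in scope. */
void action(Tok *tok, int condition)
{
    USER1 = 99;
    if (condition)
        EMIT();                 /* safe even in an unbraced if/else */
    else
        USER1 = GETUSER1() + 1;
}
```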

Lexer Macros

There are a number of macros that are useful exclusively within lexer
rules. Additional macros, common to all recognizers, are documented in
the section Common Macros.

LEXER
The LEXER macro returns a pointer to the base lexer object, which is of
type pANTLR3_LEXER. This is not the pointer to your generated lexer,
which is supplied by the CTX macro, but to the common implementation of
a lexer interface, which is supplied to all generated lexers.

LEXSTATE
Provides a pointer to the lexer shared state structure, which is where
the tokens for a rule are constructed and the status elements of the
lexer are kept. This pointer is of type
pANTLR3_RECOGNIZER_SHARED_STATE. In general you should only access
elements of this structure if there is not already another macro or
standard $xxx ANTLR reference that refers to it.

LA(n)
The LA macro returns the character at offset n from the current input
stream position. The return type is ANTLR3_UINT32. Hence LA(1) returns
the character at the current input position (the character that will
be consumed next), LA(-1) returns the character that has just been
consumed, and so on. The LA(n) macro is useful for constructing
semantic predicates in lexer rules. The reference LA(0) is undefined
and will cause an error in your lexer.
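
The numbering of LA(n) can be modelled with a hypothetical character
stream in plain C. CharStream, la() and consume() are invented for this
sketch, and SKETCH_EOF is an assumed end-of-input value rather than the
runtime's own constant; returning it for LA(0) is this sketch's choice,
since the runtime leaves LA(0) undefined.

```c
#include <stddef.h>

/* Assumed end-of-input marker for this sketch only. */
#define SKETCH_EOF 0xFFFFFFFFu

typedef struct CharStream {
    const unsigned char *data;
    size_t               size;
    size_t               index;   /* offset of LA(1), the next character */
} CharStream;

/* n >= 1 looks ahead (LA(1) is the next character to be consumed);
 * n <= -1 looks back (LA(-1) is the character just consumed). */
unsigned int la(const CharStream *s, int n)
{
    ptrdiff_t pos;
    if (n == 0)
        return SKETCH_EOF;        /* undefined in the runtime */
    pos = (ptrdiff_t)s->index + (n > 0 ? n - 1 : n);
    if (pos < 0 || (size_t)pos >= s->size)
        return SKETCH_EOF;
    return s->data[pos];
}

void consume(CharStream *s)
{
    if (s->index < s->size)
        s->index++;
}
```

The same numbering is what lets a lexer predicate such as
{LA(2) != '='}? look one character past the one about to be consumed.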

GETCHARINDEX()
The GETCHARINDEX macro returns the index of the current character
position as a 0 based offset from the start of the input stream. It
returns a value of type ANTLR3_UINT32.

GETLINE()
The GETLINE macro returns the line number of the current character
(LA(1)) in the input stream. It returns a value of type ANTLR3_UINT32.
Note that the line number is incremented automatically by an input
stream when it sees the input character '\n'.

GETTEXT()
The GETTEXT macro returns the text currently matched by the lexer rule.
In general you should use the generic $text reference in ANTLR to
retrieve this. The return type is a reference type of pANTLR3_STRING,
which allows you to manipulate the text you have retrieved (NB: this
does not change the input stream, only the copy of the text that you
obtain when you use this macro or $text).

The reference $text->chars or GETTEXT()->chars yields a pointer to the
'\0' terminated character string that the pANTLR3_STRING represents.
Space for the string, and for the structure that holds it, is
allocated automatically. The pANTLR3_STRING_FACTORY associated with the
lexer handles this, and when you close the lexer it automatically
frees any space allocated for strings and their structures.
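
The relationship between $text and the input can be sketched as
follows. Str and text_of() are hypothetical stand-ins for a
pANTLR3_STRING and its factory, and the sketch treats the stop offset
as inclusive: the text is built as a fresh '\0' terminated copy, so
editing it leaves the input untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pANTLR3_STRING: a length plus a
 * '\0' terminated character buffer. */
typedef struct Str {
    unsigned char *chars;
    size_t         len;
} Str;

/* Builds the text of the input range [start, stop] as a new string.
 * The input itself is never modified. */
Str *text_of(const unsigned char *input, size_t start, size_t stop)
{
    Str *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->len = stop - start + 1;
    s->chars = malloc(s->len + 1);
    if (s->chars == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->chars, input + start, s->len);
    s->chars[s->len] = '\0';
    return s;
}

/* In the runtime the string factory does this for you when the lexer
 * is closed; the sketch frees explicitly. */
void str_free(Str *s)
{
    if (s != NULL) {
        free(s->chars);
        free(s);
    }
}
```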

GETCHARPOSITIONINLINE()
The GETCHARPOSITIONINLINE macro returns the zero based offset of
character LA(1) from the start of the current input line. See the
macro GETLINE for details on what the line number means.
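
How the values behind GETCHARINDEX, GETLINE and GETCHARPOSITIONINLINE
evolve as characters are consumed can be sketched with a hypothetical
Pos structure. The sketch assumes lines are numbered from 1, as is
usual in ANTLR, while the position in line is zero based as stated
above; all names are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical position tracker describing the character LA(1). */
typedef struct Pos {
    const char  *data;
    size_t       index;           /* like GETCHARINDEX(): 0 based offset */
    unsigned int line;            /* like GETLINE(): assumed to start at 1 */
    unsigned int charPosInLine;   /* like GETCHARPOSITIONINLINE(): 0 based */
} Pos;

void pos_init(Pos *p, const char *data)
{
    p->data = data;
    p->index = 0;
    p->line = 1;
    p->charPosInLine = 0;
}

/* Consumes LA(1); consuming '\n' moves to the start of the next line. */
void pos_consume(Pos *p)
{
    char c = p->data[p->index];
    if (c == '\0')
        return;
    p->index++;
    if (c == '\n') {
        p->line++;
        p->charPosInLine = 0;
    } else {
        p->charPosInLine++;
    }
}
```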

EMIT()
The macro EMIT causes the text range currently matched by the lexer
rule to be emitted immediately as the token for the rule. Subsequent
text is matched but ignored. The type used for the token is the name
of the lexer rule or, if you have changed this by using $type = XXX;,
the type XXX.

EMITNEW(t)
The macro EMITNEW causes the supplied token reference t to be used as
the token emitted by the rule. The parameter t must be of type
pANTLR3_COMMON_TOKEN.

INDEX()
The INDEX macro returns the current input position according to the
input stream. It is not guaranteed to be the character offset in the
input stream but is instead used as a value for marking and rewinding
to specific points in the input stream. Use the macro GETCHARINDEX() to
find out the position of LA(1) in the input stream.

PUSHSTREAM(str)
The PUSHSTREAM macro, in conjunction with the POPSTREAM macro (which is
usually called internally by the runtime), can be used to stack many
input streams to the lexer, and to implement constructs such as the C
pre-processor #include directive.

An input stream that is pushed on to the stack becomes the current
input stream for the lexer, and the state of the previous stream is
automatically saved. The input stream will be automatically popped from
the stack when it is exhausted by the lexer. You may use the macro
POPSTREAM to return to the previous input stream prior to exhausting
the currently stacked input stream.

Here is an example of using the macro in a lexer to implement the C
#include pre-processor directive:

fragment
STRING_GUTS : (~('\\'|'"') )* ;

LINE_COMMAND
    : '#' (' ' | '\t')*
      (
          'include' (' ' | '\t')+ '"' file = STRING_GUTS '"' (' ' | '\t')* '\r'? '\n'
          {
              pANTLR3_STRING       fName;
              pANTLR3_INPUT_STREAM in;

              // $file.text gives the file name matched by STRING_GUTS as a
              // pANTLR3_STRING; its chars member is a '\0' terminated string.
              //
              fName = $file.text;
              printf("Including file '%s'\n", fName->chars);

              // Create a new input stream and take advantage of the built-in
              // stream stacking in the C target runtime. Do not push a
              // stream that could not be created (for example because the
              // file could not be opened).
              //
              in = antlr3AsciiFileStreamNew(fName->chars);
              if (in != NULL)
              {
                  PUSHSTREAM(in);
              }

              // Note that the input stream is not closed when it reaches EOF.
              // It is up to you to track streams created like this and destroy
              // them when the whole parse session is complete. Do not do this
              // until all tokens have been processed all the way through your
              // tree parsers etc.: a token does not store its text, it refers
              // back to the input stream, so trying to get the text of a token
              // will abort if you close its input stream too early.
              //
          }
        | (('0'..'9')=>('0'..'9'))+ ~('\n'|'\r')* '\r'? '\n'
      )
      {$channel=HIDDEN;}
    ;

POPSTREAM()
Assuming that you have stacked an input stream using the PUSHSTREAM
macro, you can use POPSTREAM to remove it from the stream stack and
revert to the previous input stream. You should be careful to pop the
stream at an appropriate point in your lexer action, so you do not
match characters from one stream with those from another in the same
rule (unless this is what you want to do).
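
The push/pop behaviour can be modelled in plain C. The sketch is
hypothetical (Lexer, Stream, push_stream() and the fixed MAX_DEPTH are
invented) and ignores how the runtime actually saves lexer state:
pushing makes the new stream current, an exhausted stream is popped
automatically, and pop_stream() returns to the previous stream early.

```c
#include <stddef.h>

#define MAX_DEPTH 16               /* arbitrary limit for this sketch */

typedef struct Stream {
    const char *data;
    size_t      pos;               /* saved position within this stream */
} Stream;

typedef struct Lexer {
    Stream stack[MAX_DEPTH];
    int    depth;                  /* stack[depth - 1] is the current stream */
} Lexer;

void lexer_init(Lexer *lx, const char *data)
{
    lx->stack[0].data = data;
    lx->stack[0].pos  = 0;
    lx->depth = 1;
}

/* Like PUSHSTREAM: the new stream becomes current; the old one keeps its
 * position. Returns -1 if the stack is full. */
int push_stream(Lexer *lx, const char *data)
{
    if (lx->depth == MAX_DEPTH)
        return -1;
    lx->stack[lx->depth].data = data;
    lx->stack[lx->depth].pos  = 0;
    lx->depth++;
    return 0;
}

/* Like POPSTREAM: leave the current stream early (never the outermost). */
void pop_stream(Lexer *lx)
{
    if (lx->depth > 1)
        lx->depth--;
}

/* Returns the next character, popping exhausted streams; -1 at final EOF. */
int next_char(Lexer *lx)
{
    for (;;) {
        Stream *s = &lx->stack[lx->depth - 1];
        if (s->data[s->pos] != '\0')
            return (unsigned char)s->data[s->pos++];
        if (lx->depth == 1)
            return -1;
        lx->depth--;               /* automatic pop on exhaustion */
    }
}
```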

SETTEXT(str)
A token manufactured by the lexer does not physically store the text
from the input stream that it matches; the token string is created
only if you ask for the text. However, if you wish to change the text
that the token represents, you can use this macro to set it explicitly.
Note that this does not change the input stream text but associates
the supplied pANTLR3_STRING with the token. This string is then
returned when the parser and tree parser reference the token via the
$xxx.text reference.
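
A sketch of a token that stores only offsets and builds its text on
demand, with an explicit replacement taking precedence once set. Token,
token_get_text() and token_set_text() are hypothetical names, and the
stop offset is treated as inclusive in this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: offsets into the input plus an optional override. */
typedef struct Token {
    const char *input;
    size_t      start, stop;     /* inclusive range in input */
    char       *override;        /* set by token_set_text(), else NULL */
} Token;

/* Returns a malloc'd copy of the token's text; the caller frees it.
 * The override wins if one has been set. */
char *token_get_text(const Token *t)
{
    const char *src = t->override ? t->override : t->input + t->start;
    size_t len = t->override ? strlen(t->override) : t->stop - t->start + 1;
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, len);
    out[len] = '\0';
    return out;
}

/* Replaces the text the token reports; the input itself is untouched. */
int token_set_text(Token *t, const char *text)
{
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, text);
    free(t->override);
    t->override = copy;
    return 0;
}
```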

USER1 USER2 USER3 and CUSTOM
While you can create your own custom token class and have the lexer
deal with it, this is a lot of work compared to the trivial
inheritance that can be achieved in the Java target. In many cases,
though, all that is needed is the addition of a few data items such as
an integer or a pointer. Rather than require C programmers to create
complicated structures just to add a few data items, the C target
provides a few custom fields in the standard token, which will fulfil
the needs of most lexers and parsers.

The token fields user1, user2, and user3 are all value types of
ANTLR3_UINT32. In the parser you can reference these fields directly
from the token, as in x=TOKNAME { ... $x->user1 ... }, but when you are
building the token in the lexer, you must assign to the fields using
the macros USER1, USER2, or USER3. As in:

LEXTOK: 'AAAAA' { USER1 = 99; } ;
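
A plain C sketch of the idea, with hypothetical names throughout: the
token carries spare user fields (uint32_t standing in for
ANTLR3_UINT32), a lexer-side macro assigns to the token under
construction, and parser-side code reads the fields straight from the
token. How the real USER1 macro reaches the token is not modelled here.

```c
#include <stdint.h>

/* Hypothetical, cut-down token with spare fields for application data. */
typedef struct Token {
    int      type;
    uint32_t user1, user2, user3;
} Token;

/* Hypothetical lexer state holding the token being built. */
typedef struct LexState {
    Token token;
} LexState;

/* Lexer-side field macros in the style of USER1/USER2 (sketch only). */
#define USER1 (state->token.user1)
#define USER2 (state->token.user2)

/* Stand-in for the action of:  LEXTOK: 'AAAAA' { USER1 = 99; } ; */
void lextok_action(LexState *state)
{
    USER1 = 99;
    USER2 = 5;
}

/* Stand-in for a parser action reading $x->user1 and $x->user2. */
uint32_t parser_action(const Token *x)
{
    return x->user1 + x->user2;
}
```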

Parser and Tree Parser Macros

PARSER
The PARSER macro returns a pointer to the base parser or tree parser
object, which is of type pANTLR3_PARSER or pANTLR3_TREE_PARSER. This
is not the pointer to your generated parser, which is supplied by the
CTX macro, but to the common implementation of a parser or tree parser
interface, which is supplied to all generated parsers.

INDEX()
When used in the parser, the INDEX macro returns the position of the
current token (LT(1)) in the input token stream. It can be used for
MARK and REWIND operations.

LT(n) and LA(n)
In the parser, the macro LT(n) returns the pANTLR3_COMMON_TOKEN at
offset n from the current token stream input position. The macro LA(n)
returns the token type of the token at position n. The value n cannot
be zero; such a reference will return NULL and possibly cause an
error. LT(1) is the token that is about to be recognized (LA(1) is its
type) and LT(-1) is the token that has just been recognized. Values of
n that exceed the limits of the token stream boundaries will return
NULL.
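
A hypothetical token stream in plain C shows the same offsets at the
token level. TokenStream, lt() and la() are invented names, and
SKETCH_INVALID is an assumed type value returned where LT(n) would be
NULL; it is not a runtime constant.

```c
#include <stddef.h>

typedef struct Token {
    int         type;
    const char *text;
} Token;

typedef struct TokenStream {
    Token  *tokens;
    size_t  count;
    size_t  p;                  /* index of LT(1) */
} TokenStream;

/* LT(1) is the next token, LT(-1) the previous one; n == 0 or an offset
 * outside the stream yields NULL. */
Token *lt(TokenStream *ts, int n)
{
    ptrdiff_t i;
    if (n == 0)
        return NULL;
    i = (ptrdiff_t)ts->p + (n > 0 ? n - 1 : n);
    if (i < 0 || (size_t)i >= ts->count)
        return NULL;
    return &ts->tokens[i];
}

/* LA(n) is just the type of LT(n). */
#define SKETCH_INVALID 0
int la(TokenStream *ts, int n)
{
    Token *t = lt(ts, n);
    return t != NULL ? t->type : SKETCH_INVALID;
}
```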

PSRSTATE
Returns the shared state pointer of type
pANTLR3_RECOGNIZER_SHARED_STATE. This is not generally useful to the
grammar programmer, as the useful elements have generic $xxx
references built into ANTLR.

ADAPTOR
When building an AST via a parser, the work of constructing and
manipulating trees is done by a supplied adaptor class. The default
class is usually fine for most tree operations, but if you wish to
build your own specialized linked or tree structure, then you may need
to reference the adaptor you supply directly. The ADAPTOR macro returns
the reference to the tree adaptor, which is always of type
pANTLR3_BASE_TREE_ADAPTOR, even if it is your custom adaptor.

Common Macros

RECOGNIZER
Returns a reference type of pANTLR3_BASE_RECOGNIZER, which is the base
functionality supplied to all recognizers, whether lexers, parsers or
tree parsers. You can override methods in this interface by installing
your own function pointers (once you know what you are doing).
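
Overriding a method means storing a different function pointer in the
interface struct. The sketch below is hypothetical: Recognizer, report
and the counters are invented and are not the members of
pANTLR3_BASE_RECOGNIZER, so consult the runtime headers for the real
names and signatures.

```c
#include <string.h>

/* Hypothetical "interface": a struct of function pointers plus state. */
typedef struct Recognizer Recognizer;
struct Recognizer {
    void (*report)(Recognizer *rec, const char *msg);
    int    reported;
    int    suppressed;
};

/* Default implementation installed when the recognizer is created. */
void defaultReport(Recognizer *rec, const char *msg)
{
    (void)msg;
    rec->reported++;
}

void recognizer_init(Recognizer *rec)
{
    rec->report     = defaultReport;
    rec->reported   = 0;
    rec->suppressed = 0;
}

/* Replacement: drops messages starting with "warning" and delegates the
 * rest. A real override would usually save and call whatever pointer was
 * installed before it; calling defaultReport directly keeps this short. */
void quietReport(Recognizer *rec, const char *msg)
{
    if (strncmp(msg, "warning", 7) == 0) {
        rec->suppressed++;
        return;
    }
    defaultReport(rec, msg);
}
```

Installing the override is a single assignment (r.report = quietReport;),
after which every caller that goes through the pointer gets the new
behaviour.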

INPUT
Returns a reference to the input stream of the appropriate type for the
recognizer. In a lexer this macro returns a reference type of
pANTLR3_INPUT_STREAM, in a parser this is type pANTLR3_TOKEN_STREAM and
in a tree parser this is type pANTLR3_COMMON_TREE_NODE_STREAM. You can
of course provide your own implementations of any of these interfaces.

MARK()
This macro will cause the input stream for the current recognizer to be
marked with a checkpoint. It will return a value of type ANTLR3_MARKER,
which you can use as the parameter to a REWIND macro to return to the
marked point in the input.

If you know you will only ever rewind to the last MARK, then you can
ignore the return value of this macro and just use the REWINDLAST macro
to return to the last MARK that was set in the input stream.

REWIND(m)
Rewinds the appropriate input stream back to the marked checkpoint
returned from a prior MARK macro call and supplied as the parameter m
to the REWIND(m) macro.

REWINDLAST()
Rewinds the current input stream (characters, tokens, tree nodes) back
to the last checkpoint marker created by a MARK macro call. Fails
silently if there was no prior MARK call.

SEEK(n)
Causes the input stream to position itself directly at offset n in the
stream. Works for all input stream types: lexer, parser and tree
parser.
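
MARK, REWIND, REWINDLAST and SEEK can be modelled together on a
hypothetical character input. In this sketch a marker is simply the
saved index (the runtime's ANTLR3_MARKER may encode something else),
SEEK clamps to the end of the input by the sketch's own choice, and all
names are invented.

```c
#include <stddef.h>

/* Hypothetical marker type: just a saved index in this sketch. */
typedef size_t Marker;

typedef struct Input {
    const char *data;
    size_t      size;
    size_t      index;       /* current position */
    Marker      lastMarker;  /* position saved by the most recent mark() */
    int         hasMark;     /* 0 until mark() has been called */
} Input;

void input_init(Input *in, const char *data, size_t size)
{
    in->data = data;
    in->size = size;
    in->index = 0;
    in->lastMarker = 0;
    in->hasMark = 0;
}

void consume(Input *in)
{
    if (in->index < in->size)
        in->index++;
}

/* Like MARK(): checkpoint the current position. */
Marker mark(Input *in)
{
    in->lastMarker = in->index;
    in->hasMark = 1;
    return in->lastMarker;
}

/* Like REWIND(m): return to a checkpoint returned by mark(). */
void rewind_to(Input *in, Marker m)
{
    in->index = m;
}

/* Like REWINDLAST(): latest checkpoint; silently does nothing if none. */
void rewind_last(Input *in)
{
    if (in->hasMark)
        in->index = in->lastMarker;
}

/* Like SEEK(n): jump straight to offset n (clamped in this sketch). */
void seek(Input *in, size_t n)
{
    in->index = (n <= in->size) ? n : in->size;
}
```

A typical use is speculative matching: mark, attempt an alternative,
and rewind to the marker if the attempt fails.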

Version 3.1.2 Wed Oct 13 2010 interop(3)