antlr3-interop(3)

interop(3)                          ANTLR3C                         interop(3)

NAME

       interop

Introduction

       The main way to interact with the generated code is via action code
       placed within { and } characters in your rules. In general, you are
       advised to keep both the code you embed within these actions and the
       grammar itself to an absolute minimum. Rather than embed code directly
       in your grammar, you should construct an API that is called from the
       actions within your grammar. This keeps the grammar clean and
       maintainable, and separates the code generators or other code from
       the definition of the grammar itself.
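
       As a rough illustration of this separation, the sketch below keeps the
       real work in a small C API and leaves only a one-line call for the
       grammar action. The names SymbolTable, symtab_init, symtab_lookup and
       symtab_declare are hypothetical and are not part of the ANTLR3 C
       runtime; the grammar fragment in the comment is only indicative.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical application API, kept outside the grammar file.
 * A grammar action could then shrink to a single call, for example:
 *     decl : 'var' ID ';' { symtab_declare(table, (const char *)$ID.text->chars); } ;
 * where `table` is assumed to be reachable from the action.
 */
#define SYMTAB_MAX      64
#define SYMTAB_NAME_LEN 32

typedef struct {
    char   names[SYMTAB_MAX][SYMTAB_NAME_LEN];
    size_t count;
} SymbolTable;

void symtab_init(SymbolTable *t) { t->count = 0; }

/* Returns the index of name, or -1 if it has not been declared. */
int symtab_lookup(const SymbolTable *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) return (int)i;
    }
    return -1;
}

/* Declares name once; returns its index, or -1 if the table is full
 * or the name is too long to store. */
int symtab_declare(SymbolTable *t, const char *name)
{
    int existing = symtab_lookup(t, name);
    if (existing >= 0) return existing;
    if (t->count == SYMTAB_MAX || strlen(name) >= SYMTAB_NAME_LEN) return -1;
    strcpy(t->names[t->count], name);
    return (int)t->count++;
}
```

       Because the table logic lives in ordinary C, it can be compiled and
       tested without regenerating the recognizer.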

       However, when you wish to call your API functions, or insert small
       pieces of code that do not warrant external functions, you will need to
       access elements of tokens, return elements from parser rules, and
       perhaps the internals of the recognizer itself. The C runtime provides
       a number of macros that you can use within your action code. It also
       provides a number of efficient structures that you may find useful for
       building symbol tables, lists, tries, stacks, arrays and so on, all of
       which are managed so that your memory allocation problems are
       minimized.

Parameters and Returns from Parser Rules

       The C target does not differ from the Java target in any major way
       here, and you should consult the standard documentation for the use of
       parameters on rules and the returns clause. Be aware, though, that the
       rules generate C function calls, and therefore the input and returns
       clauses are subject to the constraints of C scoping.

       If your parser rule returns more than a single entity, then the return
       type of the generated rule function is a struct, which is returned by
       value. This is also the case if your rule is part of a tree building
       grammar (one that uses the output=AST; option).
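
       In plain C terms, a rule with several return values behaves roughly
       like a function that fills in a struct and returns it by value. The
       sketch below is not generated code; the names expr_return and
       parse_expr are made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct a multi-value rule returns.
 * A rule such as
 *     expr returns [int value, int isConstant] : ... ;
 * is assumed here to yield one field per declared return element.
 */
typedef struct {
    int value;
    int isConstant;
} expr_return;

/* Stand-in for a generated rule function: the struct is built locally
 * and handed back by value, so the caller owns its own copy. */
expr_return parse_expr(int literal)
{
    expr_return retval;
    retval.value      = literal * 2;
    retval.isConstant = 1;
    return retval;
}
```

       The caller reads the fields from its own copy, for example
       parse_expr(21).value, and no allocation or freeing is involved.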

       Other than the notes above, you can use any pre-declared type as an
       input or output parameter for your rule.

Memory Management

       You are responsible for allocating and freeing any memory used by your
       own constructs. ANTLR will track and release any memory allocated
       internally for tokens, trees, stacks, scopes and so on. This memory is
       returned to the malloc pool when you call the free method of any ANTLR3
       produced structure.

       For performance reasons, and to avoid thrashing the malloc allocation
       system, memory for many elements of your generated parser is allocated
       in chunks and parcelled out by factories. For instance, memory for
       tokens is created as an array of tokens, and a token factory hands out
       the next available slot to the lexer. When you free the lexer, the
       allocated memory is returned to the pool. The same applies to 'strings'
       that contain the token text and various other text elements accessed
       within the lexer.

       The only side effect of this is that after your parse and analysis is
       complete, if you wish to retain anything generated automatically, you
       must copy it before freeing the recognizer structures. In practice it
       is usually simplest to retain the recognizer context objects until
       your processing is complete, or to use your own allocation scheme for
       generating output and so on.
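
       The chunked allocation described above can be pictured with a minimal
       factory written in plain C. This is a sketch of the general technique
       only, not the runtime's implementation; Token, TokenFactory,
       POOL_CHUNK and the factory_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_CHUNK 4   /* tokens allocated per chunk (tiny, for illustration) */

typedef struct { int type; int start; int stop; } Token;

/* Hypothetical factory: hands out slots from chunks that it owns. */
typedef struct {
    Token **chunks;     /* array of chunk pointers */
    size_t  nchunks;    /* chunks allocated so far */
    size_t  next;       /* next free slot in the newest chunk */
} TokenFactory;

void factory_init(TokenFactory *f)
{
    f->chunks = NULL; f->nchunks = 0; f->next = POOL_CHUNK;
}

/* Returns the next free token slot, or NULL if allocation fails. */
Token *factory_new_token(TokenFactory *f)
{
    if (f->next == POOL_CHUNK) {          /* current chunk is full */
        Token **grown = realloc(f->chunks, (f->nchunks + 1) * sizeof *grown);
        if (grown == NULL) return NULL;
        f->chunks = grown;
        Token *chunk = malloc(POOL_CHUNK * sizeof *chunk);
        if (chunk == NULL) return NULL;
        f->chunks[f->nchunks++] = chunk;
        f->next = 0;
    }
    return &f->chunks[f->nchunks - 1][f->next++];
}

/* Releases every chunk at once; all tokens handed out become invalid. */
void factory_free(TokenFactory *f)
{
    for (size_t i = 0; i < f->nchunks; i++) free(f->chunks[i]);
    free(f->chunks);
    factory_init(f);
}
```

       Anything that must outlive the factory has to be copied first, for
       example with Token kept = *tok; before calling factory_free().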

       The advantage of using object factories is of course that memory leaks
       and accesses to de-allocated memory are bugs that rarely occur within
       the ANTLR3 C runtime. Further, allocating memory for tokens, trees and
       so on is very fast.

The CTX Macro

       The CTX macro is a fundamental parameter that is passed as the first
       parameter to any generated function concerned with your lexer, parser,
       or tree parser. This is the context pointer for your generated
       recognizer; it is how you invoke the generated functions and access
       the data embedded within your generated recognizer. While you can use
       it to directly access stacks, scopes and so on, this is not
       recommended; you should use the $xxx references that are available
       generically within ANTLR grammars.

       The context pointer is used because it removes the need for any
       global or static variables, either within the generated code or within
       the C runtime. This is of course fundamental to creating free threading
       recognizers. Wherever a function call or rule call requires the ctx
       parameter, you either reference it via the CTX macro, or the ctx
       parameter is in fact the value returned from calling the 'constructor'
       function for your parser/lexer/tree parser (see the code example in
       'How to build Generated Code').
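
       The context-pointer idea can be sketched in a few lines of ordinary C.
       Nothing here is generated code or runtime API: Recognizer,
       recognizer_new and rule_impl are hypothetical, and the CTX definition
       shown is only an assumption about how such a macro could be written.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical recognizer context: all state lives here, none is global. */
typedef struct Recognizer {
    int depth;                                  /* example per-instance state */
    int (*rule)(struct Recognizer *ctx, int n); /* a "generated" rule function */
} Recognizer;

/* In this sketch every generated function names its first parameter ctx,
 * so a macro can refer to it from inside an action. */
#define CTX ctx

static int rule_impl(Recognizer *ctx, int n)
{
    CTX->depth += n;        /* state is reached through the context pointer */
    return CTX->depth;
}

/* Stand-in for a generated 'constructor': returns a new context or NULL. */
Recognizer *recognizer_new(void)
{
    Recognizer *r = malloc(sizeof *r);
    if (r != NULL) { r->depth = 0; r->rule = rule_impl; }
    return r;
}
```

       Two contexts created with recognizer_new() keep entirely separate
       state, which is what allows independent recognizers to run on
       different threads.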

Macro Changes

       While the author is not fond of using C macros to hide code or
       structure access, in the case of generated code they serve two useful
       purposes. The first is to simplify the references to internal
       constructs; the second is to allow any internal interface to change
       without requiring you to port grammars from earlier versions
       (just regenerate and recompile). As of release 3.1, these macros are
       stable and will only change their usage interface in the event of bugs
       being discovered. You are encouraged to use these macros in your code,
       rather than access the raw interface.

       Note: Macros that act like statements must be terminated with a ';'.
       The macro body does not supply this, nor should it. Macros that call
       functions are declared with () even if they have no parameters; macros
       that reference fields do not have a () declaration.
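
       These conventions can be shown with three illustrative macros. The
       macros below (LINE, NEXTLINE, MARK_EMITTED) and the State type are
       hypothetical and are not among the runtime's macros; the do/while(0)
       wrapper is a common C idiom assumed here, not something this page
       says the runtime uses.

```c
#include <assert.h>

typedef struct { int line; int emitted; } State;

static int next_line(State *s) { return s->line + 1; }

/* Field reference: no parentheses in the macro name. */
#define LINE        (state->line)

/* Function call: declared with () even though it takes no arguments. */
#define NEXTLINE()  next_line(state)

/* Statement-like macro: the body has no trailing ';', so the caller
 * supplies it and the macro composes safely inside if/else. */
#define MARK_EMITTED()  do { state->emitted = 1; } while (0)

int demo(State *state)
{
    if (LINE > 0)
        MARK_EMITTED();     /* caller writes the ';' */
    else
        LINE = 1;           /* field macros can be assigned to */
    return NEXTLINE();
}
```

       Had MARK_EMITTED supplied its own ';', the else branch in demo() would
       no longer attach to the if, which is why the terminator is left to
       the caller.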

Lexer Macros

       There are a number of macros that are useful exclusively within lexer
       rules. There are additional macros, common to all recognizers, and
       these are documented in the section Macros Common to All Recognizers.

   LEXER
       The LEXER macro returns a pointer to the base lexer object, which is of
       type pANTLR3_LEXER. This is not the pointer to your generated lexer,
       which is supplied by the CTX macro, but to the common implementation of
       a lexer interface, which is supplied to all generated lexers.

   LEXSTATE
       Provides a pointer to the lexer shared state structure, which is where
       the tokens for a rule are constructed and the status elements of the
       lexer are kept. This pointer is of type
       pANTLR3_RECOGNIZER_SHARED_STATE. In general you should only access
       elements of this structure if there is not already another macro or
       standard $xxxx ANTLR reference that refers to it.

   LA(n)
       The LA macro returns the character at offset n from the current input
       stream index. The return type is ANTLR3_UINT32. Hence LA(1) returns the
       character at the current input position (the character that will be
       consumed next), LA(-1) returns the character that has just been
       consumed, and so on. The LA(n) macro is useful for constructing
       semantic predicates in lexer rules. The reference LA(0) is undefined
       and will cause an error in your lexer.
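
       The offset convention can be modelled over a plain character buffer.
       The sketch below is an assumption-laden illustration: CharStream,
       stream_la, stream_consume and STREAM_EOF are hypothetical, and it
       returns an int with -1 at the boundaries rather than the
       ANTLR3_UINT32 the real macro yields.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *data;
    size_t      size;
    size_t      pos;    /* index of the next character to consume */
} CharStream;

#define STREAM_EOF (-1)

/* LA(1) is the next character, LA(-1) the one just consumed.
 * n == 0 is rejected; out-of-range offsets yield STREAM_EOF. */
int stream_la(const CharStream *s, int n)
{
    assert(n != 0);
    long idx = (long)s->pos + (n > 0 ? n - 1 : n);
    if (idx < 0 || (size_t)idx >= s->size) return STREAM_EOF;
    return (unsigned char)s->data[idx];
}

/* Advances past the current character, if there is one. */
void stream_consume(CharStream *s) { if (s->pos < s->size) s->pos++; }
```

       With the input "ab" and one character consumed, stream_la(&s, 1)
       yields 'b' and stream_la(&s, -1) yields 'a'.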

   GETCHARINDEX()
       The GETCHARINDEX macro returns the index of the current character
       position as a 0 based offset from the start of the input stream. It
       returns a value type of ANTLR3_UINT32.

   GETLINE()
       The GETLINE macro returns the line number of the current character
       (LA(1)) in the input stream. It returns a value type of ANTLR3_UINT32.
       Note that the line number is incremented automatically by an input
       stream when it sees the input character '\n'.

   GETTEXT()
       The GETTEXT macro returns the text currently matched by the lexer rule.
       In general you should use the generic $text reference in ANTLR to
       retrieve this. The return type is a reference type of pANTLR3_STRING,
       which allows you to manipulate the text you have retrieved (NB this
       does not change the input stream, only the text you copy from the input
       stream when you use this macro or $text).

       The reference $text->chars or GETTEXT()->chars will reference a pointer
       to the '\0' terminated character string that the ANTLR3 pANTLR3_STRING
       represents. String space is allocated automatically, as is the
       structure that holds the string. The pANTLR3_STRING_FACTORY associated
       with the lexer handles this, and when you close the lexer it will
       automatically free any space allocated for strings and their
       structures.

   GETCHARPOSITIONINLINE()
       The GETCHARPOSITIONINLINE macro returns the zero based offset of
       character LA(1) from the start of the current input line. See the macro
       GETLINE for details on what the line number means.

   EMIT()
       The macro EMIT causes the text range currently matched by the lexer
       rule to be emitted immediately as the token for the rule. Subsequent
       text is matched but ignored. The type used for the token is the
       name of the lexer rule or, if you have changed this by using $type =
       XXX;, the type XXX is used.

   EMITNEW(t)
       The macro EMITNEW causes the supplied token reference t to be used as
       the token emitted by the rule. The parameter t must be of type
       pANTLR3_COMMON_TOKEN.

   INDEX()
       The INDEX macro returns the current input position according to the
       input stream. It is not guaranteed to be the character offset in the
       input stream but is instead used as a value for marking and rewinding
       to specific points in the input stream. Use the macro GETCHARINDEX() to
       find out the position of LA(1) in the input stream.

   PUSHSTREAM(str)
       The PUSHSTREAM macro, in conjunction with the POPSTREAM macro (usually
       called internally in the runtime), can be used to stack many input
       streams to the lexer and implement constructs such as the C
       preprocessor #include directive.

       An input stream that is pushed on to the stack becomes the current
       input stream for the lexer and the state of the previous stream is
       automatically saved. The input stream will be automatically popped from
       the stack when it is exhausted by the lexer. You may use the macro
       POPSTREAM to return to the previous input stream prior to exhausting
       the currently stacked input stream.

       Here is an example of using the macro in a lexer to implement the C
       #include preprocessor directive:

        fragment
        STRING_GUTS :  (~('\\'|'"') )* ;

        LINE_COMMAND
        : '#' (' ' | '\t')*
           (
               'include' (' ' | '\t')+ '"' file = STRING_GUTS '"' (' ' | '\t')* '\r'? '\n'
               {
                   pANTLR3_STRING      fName;
                   pANTLR3_INPUT_STREAM    in;

                   // Create an initial string, then take a substring
                   // We can do this by messing with the start and end
                   // pointers of tokens and so on. This shows a reasonable way to
                   // manipulate strings.
                   //
                   fName = $file.text;
                   printf("Including file '%s'\n", fName->chars);

                   // Create a new input stream and take advantage of built in stream stacking
                   // in C target runtime.
                   //
                   in = antlr3AsciiFileStreamNew(fName->chars);
                   PUSHSTREAM(in);

                   // Note that the input stream is not closed when it EOFs, I don't bother
                   // to do it here, but it is up to you to track streams created like this
                   // and destroy them when the whole parse session is complete. Remember that you
                   // don't want to do this until all tokens have been manipulated all the way through
                   // your tree parsers etc as the token does not store the text it just refers
                   // back to the input stream and trying to get the text for it will abort if you
                   // close the input stream too early.
                   //
               }
               | (('0'..'9')=>('0'..'9'))+ ~('\n'|'\r')* '\r'? '\n'
           )
            {$channel=HIDDEN;}
            ;
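
       The stacking behaviour itself (push, automatic pop on exhaustion, and
       explicit pop) can be modelled without the runtime. The following is a
       simplified sketch over in-memory strings; Stream, Lexer, push_stream,
       pop_stream and next_char are hypothetical names and do not correspond
       to runtime functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STREAMS 8
#define STACK_EOF  (-1)

typedef struct { const char *text; size_t pos; } Stream;

typedef struct {
    Stream *stack[MAX_STREAMS];
    int     top;             /* index of the current stream */
} Lexer;

void lexer_init(Lexer *lx, Stream *base) { lx->stack[0] = base; lx->top = 0; }

/* Makes s the current stream; the previous one keeps its own position.
 * Returns 1 on success, 0 if the stack is full. */
int push_stream(Lexer *lx, Stream *s)
{
    if (lx->top + 1 == MAX_STREAMS) return 0;
    lx->stack[++lx->top] = s;
    return 1;
}

/* Returns to the previous stream; the base stream is never popped. */
void pop_stream(Lexer *lx) { if (lx->top > 0) lx->top--; }

/* Reads one character, popping exhausted streams automatically. */
int next_char(Lexer *lx)
{
    for (;;) {
        Stream *s = lx->stack[lx->top];
        if (s->text[s->pos] != '\0') return (unsigned char)s->text[s->pos++];
        if (lx->top == 0) return STACK_EOF;
        pop_stream(lx);
    }
}
```

       Reading 'a' from a base stream "ab", pushing a stream "XY", and then
       reading to the end produces X, Y, b in that order: the base stream
       resumes exactly where it was interrupted.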

   POPSTREAM()
       Assuming that you have stacked an input stream using the PUSHSTREAM
       macro, you can remove it from the stream stack and revert to the
       previous input stream. You should be careful to pop the stream at an
       appropriate point in your lexer action, so you do not match characters
       from one stream with those from another in the same rule (unless this
       is what you want to do).

   SETTEXT(str)
       A token manufactured by the lexer does not physically store the text
       from the input stream that it matches. The token string is instead
       created only if you ask for the text. However, if you wish to change
       the text that the token represents, you can use this macro to set it
       explicitly. Note that this does not change the input stream text but
       associates the supplied pANTLR3_STRING with the token. This string is
       then returned when parsers and tree parsers reference the token via the
       $xxx.text reference.
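
       A rough model of this lazy text behaviour is sketched below. It is
       not the runtime's token structure: Token, token_text and
       token_set_text are hypothetical, and plain char strings stand in for
       pANTLR3_STRING.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *input;   /* the stream the token was matched from */
    size_t      start;   /* offset of the first matched character */
    size_t      length;  /* number of matched characters */
    char       *text;    /* NULL until requested or overridden */
} Token;

/* Builds the text on first request by copying the matched range. */
const char *token_text(Token *t)
{
    if (t->text == NULL) {
        t->text = malloc(t->length + 1);
        if (t->text == NULL) return NULL;
        memcpy(t->text, t->input + t->start, t->length);
        t->text[t->length] = '\0';
    }
    return t->text;
}

/* Overrides the token text; the input stream itself is untouched.
 * Returns 1 on success, 0 if memory could not be allocated. */
int token_set_text(Token *t, const char *replacement)
{
    char *copy = malloc(strlen(replacement) + 1);
    if (copy == NULL) return 0;
    strcpy(copy, replacement);
    free(t->text);
    t->text = copy;
    return 1;
}
```

       After token_set_text(), later calls to token_text() return the
       replacement, while the characters in the input buffer stay as they
       were.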

   USER1 USER2 USER3 and CUSTOM
       While you can create your own custom token class and have the lexer
       deal with this, this is a lot of work compared to the trivial
       inheritance that can be achieved in the Java target. In many cases
       though, all that is needed is the addition of a few data items such as
       an integer or a pointer. Rather than require C programmers to create
       complicated structures just to add a few data items, the C target
       provides a few custom fields in the standard token, which will fulfil
       the needs of most lexers and parsers.

       The token fields user1, user2, and user3 are all value types of
       ANTLR3_UINT32. In the parser you can reference these fields directly
       from the token: x=TOKNAME { $x->user1 ... but when you are building the
       token in the lexer, you must assign to the fields using the macros
       USER1, USER2, or USER3. As in:

        LEXTOK: 'AAAAA' { USER1 = 99; } ;

Parser and Tree Parser Macros

   PARSER
       The PARSER macro returns a pointer to the base parser or tree parser
       object, which is of type pANTLR3_PARSER or pANTLR3_TREE_PARSER. This
       is not the pointer to your generated parser, which is supplied by the
       CTX macro, but to the common implementation of a parser or tree parser
       interface, which is supplied to all generated parsers.

   INDEX()
       When used in the parser, the INDEX macro returns the position of the
       current token (LT(1)) in the input token stream. It can be used for
       MARK and REWIND operations.

   LT(n) and LA(n)
       In the parser, the macro LT(n) returns the pANTLR3_COMMON_TOKEN at
       offset n from the current token stream input position. The macro LA(n)
       returns the token type of the token at position n. The value n cannot
       be zero, and such a reference will return NULL and possibly cause an
       error. LA(1) refers to the token that is about to be recognized and
       LA(-1) to the token that has just been recognized. Values of n that
       exceed the limits of the token stream boundaries will return NULL.

   PSRSTATE
       Returns the shared state pointer of type
       pANTLR3_RECOGNIZER_SHARED_STATE. This is not generally useful to the
       grammar programmer as the useful elements have generic $xxx references
       built in to ANTLR.

   ADAPTOR
       When building an AST via a parser, the work of constructing and
       manipulating trees is done by a supplied adaptor class. The default
       class is usually fine for most tree operations, but if you wish to
       build your own specialized linked/tree structure, then you may need to
       reference the adaptor you supply directly. The ADAPTOR macro returns
       the reference to the tree adaptor, which is always of type
       pANTLR3_BASE_TREE_ADAPTOR, even if it is your custom adaptor.

Macros Common to All Recognizers

   RECOGNIZER
       Returns a reference type of pANTLR3_BASE_RECOGNIZER, which is the base
       functionality supplied to all recognizers, whether lexers, parsers or
       tree parsers. You can override methods in this interface by installing
       your own function pointers (once you know what you are doing).

   INPUT
       Returns a reference to the input stream of the appropriate type for the
       recognizer. In a lexer this macro returns a reference type of
       pANTLR3_INPUT_STREAM, in a parser this is type pANTLR3_TOKEN_STREAM and
       in a tree parser this is type pANTLR3_COMMON_TREE_NODE_STREAM. You can
       of course provide your own implementations of any of these interfaces.

   MARK()
       This macro will cause the input stream for the current recognizer to be
       marked with a checkpoint. It will return a value type of ANTLR3_MARKER
       which you can use as the parameter to a REWIND macro to return to the
       marked point in the input.

       If you know you will only ever rewind to the last MARK, then you can
       ignore the return value of this macro and just use the REWINDLAST macro
       to return to the last MARK that was set in the input stream.

   REWIND(m)
       Rewinds the appropriate input stream back to the marked checkpoint
       returned from a prior MARK macro call and supplied as the parameter m
       to the REWIND(m) macro.

   REWINDLAST()
       Rewinds the current input stream (character, tokens, tree nodes) back
       to the last checkpoint marker created by a MARK macro call. Fails
       silently if there was no prior MARK call.
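
       The interplay of MARK, REWIND and REWINDLAST can be pictured as a
       small stack of saved positions. The sketch below is illustrative
       only: Input and the input_* functions are hypothetical, and the
       choice to discard a marker once it has been rewound to is an
       assumption of this sketch rather than documented runtime behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MARKS 16

typedef struct {
    size_t pos;                 /* current position in the input */
    size_t marks[MAX_MARKS];    /* saved positions, most recent last */
    int    nmarks;
} Input;

/* Records the current position; returns a 1-based marker, 0 on overflow. */
int input_mark(Input *in)
{
    if (in->nmarks == MAX_MARKS) return 0;
    in->marks[in->nmarks++] = in->pos;
    return in->nmarks;
}

/* Returns to the position saved under marker m and discards that
 * marker together with any later ones. Invalid markers are ignored. */
void input_rewind(Input *in, int m)
{
    if (m < 1 || m > in->nmarks) return;
    in->pos    = in->marks[m - 1];
    in->nmarks = m - 1;
}

/* Returns to the most recent mark; does nothing if there is none. */
void input_rewind_last(Input *in) { input_rewind(in, in->nmarks); }
```

       A typical use is speculative matching: mark, try an alternative, and
       rewind if it fails.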

   SEEK(n)
       Causes the input stream to position itself directly at offset n in the
       stream. Works for all input stream types: lexer, parser and tree
       parser.


Version 3.1.2                   Wed Oct 13 2010                     interop(3)