CTAGS-OPTLIB(7)                Universal Ctags                CTAGS-OPTLIB(7)

NAME
ctags-optlib - Universal Ctags parser definition language

SYNOPSIS
ctags [options] [file(s)]
etags [options] [file(s)]

DESCRIPTION
Exuberant Ctags, the ancestor of Universal Ctags, provided a way to
define a new parser from the command line. Universal Ctags extends and
refines this feature. In Universal Ctags, such a parser is called an
optlib parser. "opt" indicates that the parser is defined with a
combination of command line options. "lib" indicates that an optlib
parser can be more than an ad-hoc personal configuration.

This man page is for people who want to define an optlib parser.
Readers should read ctags(1) of Universal Ctags first.

The following options are for defining (or customizing) a parser:

• --langdef=<name>

• --map-<LANG>=[+|-]<extension>|<pattern>

• --kinddef-<LANG>=<letter>,<name>,<description>

• --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

• --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

The following options control the loading of parser definitions:

• --options=<pathname>

• --options-maybe=<pathname>

• --optlib-dir=[+]<directory>

The design of the options and notations for defining a parser in
Exuberant Ctags appears to focus on reducing the amount of typing
required. Less typing matters to users who want to define (or
customize) a parser quickly.

The design in Universal Ctags, on the other hand, focuses on
maintainability. The notation of Universal Ctags is more verbose than
that of Exuberant Ctags: a newly introduced kind must be declared
explicitly, (long) names are preferred to one-letter flags for
specifying kinds, and naming rules are stricter.

This man page explains only stable options and flags. Universal Ctags
also introduces experimental options and flags, whose names start with
_. For documentation on these options and flags, visit the Universal
Ctags web site at https://ctags.io/.

Storing a parser definition to a file
Although it is possible to define a parser from the command line, you
do not want to type the same command line every time you need the
parser. You can store the options that define a parser in a file.

ctags loads the files (preload files) listed in the "FILES" section of
ctags(1) at startup. You can put parser definitions that you use
regularly in those files.

--options=<pathname>, --options-maybe=<pathname>, and
--optlib-dir=[+]<directory> are for loading optlib files that you need
only occasionally. See the "Option File Options" section of ctags(1)
for these options.

As explained in the "FILES" section of ctags(1), the options for
defining a parser are listed line by line in an optlib file. Leading
white space is ignored. A line starting with '#' is treated as a
comment. Shell metacharacters do not need to be escaped.
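As a minimal sketch of the rules above, an optlib file might look like the following. The file name mylang.ctags, the parser name mylang, and the "function NAME" syntax it matches are hypothetical and chosen only for illustration.

```ctags
# mylang.ctags (hypothetical): lines starting with '#' are comments.
--langdef=mylang
    --map-mylang=+.mylang
--kinddef-mylang=f,function,functions
# The indentation before --map-mylang above is ignored.
# The pattern below is written as-is, without shell quoting or escaping.
--regex-mylang=/^function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/
```

On a shell command line, the same --regex-mylang option would normally have to be quoted, because the shell gives its own meaning to characters such as (, [, and \.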

Use .ctags as the file extension for an optlib file. You can define
multiple parsers in one optlib file, but it is better to make a separate
file for each parser definition.

The --_echo=<msg> and --_force-quit=<num> options are for debugging an
optlib parser.

Overview for defining a parser
1. Design the parser

   You need to know both the target language and the concepts of ctags
   (definition, reference, kind, role, field, extra). For the concepts,
   ctags(1) of Universal Ctags may help you.

2. Give a name to the parser

   Use the --langdef=<name> option. <name> is referred to as <LANG> in
   the later steps.

3. Give a file pattern or file extension for activating the parser

   Use --map-<LANG>=[+|-]<extension>|<pattern>.

4. Define kinds

   Use the --kinddef-<LANG>=<letter>,<name>,<description> option.
   Universal Ctags introduces this option; Exuberant Ctags does not
   have it. In Exuberant Ctags, a kind is defined as a side effect of
   specifying the --regex-<LANG>= option, so users have no chance to
   recognize how important the definition of a kind is.

5. Define patterns

   Use the
   --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
   option for a single-line regular expression. You can also use the
   --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
   option for a multi-line regular expression.

   As <kind-spec>, you can use the one-letter flag defined with the
   --kinddef-<LANG>=<letter>,<name>,<description> option.
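The steps above can be sketched as one small optlib file. The note format assumed here (a heading is a line starting with "== ") and the names mynote and .note are hypothetical; they only show where each step ends up in the file.

```ctags
# Step 1 (design): tag every heading line of the form "== Title".

# Step 2: name the parser.
--langdef=mynote

# Step 3: activate the parser for files ending in .note.
--map-mynote=+.note

# Step 4: declare the kind before any pattern uses it.
--kinddef-mynote=h,heading,headings

# Step 5: capture the heading text as the tag name, with kind h.
--regex-mynote=/^==[ \t]+(.+)$/\1/h/
```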

OPTIONS
--langdef=<name>
    Defines a new user-defined language, <name>, to be parsed with
    regular expressions. Once defined, <name> may be used in other
    options taking language names.

    <name> must consist of alphanumeric characters, '#', or '+'
    ('[a-zA-Z0-9#+]+'). Graph characters other than '#' and '+' are
    disallowed (or reserved). Some of them ([-=:{.]) are disallowed
    because they can confuse the command line parser of ctags. The
    rest are reserved for future extensions of ctags.

    all is an exception: it is a reserved word and is not acceptable
    as <name>. See the description of the
    --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) for how
    the reserved word is used.

    The names of built-in parsers are capitalized. When ctags evaluates
    an option on a command line and chooses a parser, it compares
    parser names case-insensitively. Therefore, giving a name that
    starts with a lowercase character does not help you avoid a parser
    name conflict. In a tags file, however, ctags prints parser names
    case-sensitively; it prints a parser name as specified in the
    --langdef=<name> option. We therefore recommend giving your private
    optlib parser a name that starts with a lowercase character. With
    this convention, people can tell whether a tag entry in a tags file
    comes from a built-in parser or from a private optlib parser.
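The naming rules can be illustrated with a few candidate names. The names themselves are hypothetical; the rejected ones are left as comments so that the fragment still loads.

```ctags
# Accepted: alphanumeric characters, '#', and '+' only. Both names start
# with a lowercase character, as recommended for private parsers.
--langdef=mylang
--langdef=mylang2

# Not acceptable according to the rules above:
#   --langdef=my-lang    '-' is disallowed
#   --langdef=my.lang    '.' is disallowed
#   --langdef=all        reserved word
```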

--kinddef-<LANG>=<letter>,<name>,<description>
    Defines a kind for <LANG>. Do not confuse this option with
    --kinds-<LANG>.

    <letter> must be an alphabetic character other than "F"
    ('[a-zA-EG-Z]'). "F" has been reserved for representing a file
    since Exuberant Ctags.

    <name> must start with an alphabetic character, and the rest must
    be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use "file" as
    <name>. It has been reserved for representing a file since
    Exuberant Ctags.

    Note that using a digit in <name> violates version 2 of the tags
    file format, though ctags accepts it. For more detail, see tags(5).

    <description> may consist of any printable ASCII characters, with
    the exception of { and \. { is reserved for adding flags to this
    option in the future, so put \ before { to include { in a
    description. To include \ itself in a description, put \ before \.

    <letter>, <name>, and their combination must each be unique within
    a <LANG>.

    This option is newly introduced in Universal Ctags. It reduces the
    typing needed to define a regex pattern with --regex-<LANG>=, and
    keeps the kind definitions in a language consistent.

    The <letter> can be used as an argument for the --kinds-<LANG>
    option to enable or disable the kind. Unless the K field is enabled,
    the <letter> is used as the value of the "kind" extension field in
    tags output.

    The <name> surrounded by braces can be used as an argument for the
    --kinds-<LANG> option. If the K field is enabled, the <name> is used
    as the value of the "kind" extension field in tags output.

    The <description> and <letter> are listed in --list-kinds output.
    All three elements of the kind-spec are listed in --list-kinds-full
    output. Don't use braces in the <description>; they are reserved
    for use as metacharacters in the future.
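A short sketch of kind definitions follows. The parser name mylang and the two kinds are hypothetical; the rejected definitions are shown as comments.

```ctags
--langdef=mylang
--kinddef-mylang=f,function,functions
--kinddef-mylang=v,variable,global variables

# Not acceptable according to the rules above:
#   --kinddef-mylang=F,source,source files   'F' is reserved
#   --kinddef-mylang=x,file,files            "file" is reserved
#   --kinddef-mylang=f,func,functions        letter 'f' is already used

# Disable the variable kind by its letter ...
--kinds-mylang=-v
# ... or, equivalently, by its name in braces:
#   --kinds-mylang=-{variable}
```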

--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
    Defines a single-line regular expression.

    The /<line_pattern>/<name_pattern>/ pair defines a regular
    expression replacement pattern, similar in style to sed
    substitution commands (s/regexp/replacement/), with which to
    generate tags from source files mapped to the named language,
    <LANG> (case-insensitive; either a built-in or user-defined
    language).

    The regular expression, <line_pattern>, defines an extended regular
    expression (roughly that used by egrep(1)), which is used to locate
    a single source line containing a tag. It may specify tab
    characters using \t.

    When a matching line is found, a tag is generated for the name
    defined by <name_pattern>, which generally contains the special
    back-references \1 through \9 to refer to matching sub-expression
    groups within <line_pattern>.

    The '/' separator characters shown in the parameter to the option
    can actually be replaced by any character. Note that whichever
    separator character is used must be escaped with a backslash ('\')
    wherever it appears in the parameter as something other than a
    separator. The regular expression defined by this option is added
    to the current list of regular expressions for the specified
    language unless the parameter is omitted, in which case the current
    list is cleared.
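The choice of separator matters mostly when the pattern itself contains '/'. The following sketch assumes a hypothetical language mylang in which files are pulled in with include "lib/..." or import "lib/..." lines; the first pattern keeps '/' as the separator and escapes it, the second uses '%' instead.

```ctags
--langdef=mylang
--kinddef-mylang=i,include,included files

# '/' is the separator, so the '/' inside the pattern is written as \/.
--regex-mylang=/^include[ \t]+"lib\/([^"]+)"/\1/i/

# '%' is the separator, so the '/' inside the pattern needs no escaping.
--regex-mylang=%^import[ \t]+"lib/([^"]+)"%\1%i%
```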

    Unless modified by <flags>, <line_pattern> is interpreted as a
    POSIX extended regular expression. The <name_pattern> should expand
    to a non-empty string for every matching line; otherwise a warning
    message is reported unless the {placeholder} regex flag is
    specified.

    A kind specifier (<kind-spec>) for tags matching the regexp may
    follow <name_pattern>. It determines what kind of tag is reported
    in the "kind" extension field (see tags(5)).

    <kind-spec> has two forms: the one-letter form and the full form.

    The one-letter form is just <letter>. It refers to a kind <letter>
    defined with --kinddef-<LANG>. This form is recommended in
    Universal Ctags.

    The full form of <kind-spec> is <letter>,<name>,<description>.
    The kind <name>, the <description>, or both can be omitted. See the
    description of the --kinddef-<LANG>=<letter>,<name>,<description>
    option for the elements.

    The full form is supported only for compatibility with Exuberant
    Ctags, which does not have the --kinddef-<LANG> option. Support for
    this form will be removed from Universal Ctags in the future.
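The two forms of <kind-spec> can be compared side by side. This is only a sketch: the parser name mylang and the "function NAME" and "var NAME" syntax are hypothetical.

```ctags
--langdef=mylang

# One-letter form (recommended): declare the kind once with --kinddef,
# then refer to it by its letter in the pattern.
--kinddef-mylang=f,function,functions
--regex-mylang=/^function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/

# Full form (Exuberant Ctags compatibility): the kind is described
# inside the pattern itself, without a separate --kinddef line.
--regex-mylang=/^var[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/v,variable,variables/
```

Because support for the full form is planned to be removed, new optlib files are better written with the one-letter form only.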

    About <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

    For more information on the regular expressions used by ctags, see
    either the regex(5,7) man page or the GNU info documentation for
    regex (e.g. "info regex").

--list-regex-flags
    Lists the flags that can be used in the --regex-<LANG> option.

--list-mline-regex-flags
    Lists the flags that can be used in the --mline-regex-<LANG>
    option.

--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
    Defines a multi-line regular expression.

    This option is similar to the --regex-<LANG> option except that the
    pattern is applied to the whole contents of the file, not line by
    line.

--_echo=<message>
    Prints <message> to the standard error stream. This is helpful for
    understanding (and debugging) the optlib loading feature of
    Universal Ctags.

--_force-quit[=<num>]
    Exits immediately when this option is processed. If <num> is
    given, it is used as the exit status. The default is 0. This is
    helpful for debugging the optlib loading feature of Universal
    Ctags.
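One way to use the two debugging options is to mark progress while an optlib file is being loaded and to stop before any input is parsed. The following is a sketch only; the parser name mylang and the messages are hypothetical.

```ctags
--_echo=loading mylang.ctags
--langdef=mylang
--kinddef-mylang=f,function,functions
--_echo=mylang: kinds are defined

# Exit here with status 0. Options after this line are never processed.
--_force-quit=0
--regex-mylang=/^function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/
```

If the second message does not appear on the standard error stream, the problem lies in one of the options between the two --_echo lines.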

FLAGS FOR --regex-<LANG> OPTION
You can specify more than one flag, <letter>|{<name>}, at the end of
--regex-<LANG> to control how Universal Ctags uses the pattern.

Exuberant Ctags uses a <letter> to represent a flag. In Universal
Ctags, a <name> surrounded by braces (name form) can be used in
addition to <letter>. The name form makes an optlib file easier to
read.

Most of the flags newly added in Universal Ctags have no one-letter
representation; they have only the name form. --list-regex-flags lists
all the flags.

basic (one-letter form b)
    The pattern is interpreted as a POSIX basic regular expression.

exclusive (one-letter form x)
    Skip testing the other patterns if a line matches this pattern.
    This is useful to avoid spending CPU time on parsing line comments.

extend (one-letter form e)
    The pattern is interpreted as a POSIX extended regular expression
    (default).

icase (one-letter form i)
    The regular expression is applied in a case-insensitive manner.

placeholder
    Don't emit a tag captured with the regex pattern. The replacement
    can be an empty string. See the following description of the
    scope=... flag for how this is useful.

scope=(ref|push|pop|clear|set)
    Specify what to do with the internal scope stack.

    A parser programmed with --regex-<LANG> has an internal stack (the
    scope stack). You can use it for tracking scope information. The
    scope=... flag is for manipulating and utilizing the scope stack.

    If {scope=push} is specified, a tag captured with --regex-<LANG> is
    pushed onto the stack. {scope=push} implies {scope=ref}.

    You can fill the scope field of a captured tag with {scope=ref}. If
    the {scope=ref} flag is given, ctags attaches the tag at the top of
    the stack to the tag captured with --regex-<LANG> as the value of
    the scope: field.

    ctags pops the tag at the top of the stack when a --regex-<LANG>
    pattern with {scope=pop} matches the input line.

    Specifying {scope=clear} removes all the tags from the stack.
    Specifying {scope=set} removes all the tags from the stack, and
    then pushes the captured tag as {scope=push} does.

    In some cases, you may want to use --regex-<LANG> only for its side
    effects: to manipulate the stack without capturing a tag. In such a
    case, leave the <name_pattern> component of the --regex-<LANG>
    option empty and specify {placeholder} as a regex flag. For
    example, an unnamed tag can be put on the stack by giving the regex
    flags "{scope=push}{placeholder}".

    You may wonder what happens if a regex pattern with the {scope=ref}
    flag matches an input line but the stack is empty, or an unnamed
    tag is at the top. If the stack is empty, the {scope=ref} flag is
    ignored and nothing is attached to the scope: field.

    If the top of the stack contains an unnamed tag, ctags searches
    deeper into the stack to find the top-most named tag. If it reaches
    the bottom of the stack without finding a named tag, the
    {scope=ref} flag is ignored and nothing is attached to the scope:
    field.

    When a named tag on the stack is popped or cleared as a side effect
    of a pattern match, ctags attaches the line number of the match to
    the end: field of the named tag.

    ctags clears all of the tags on the stack when it reaches the end
    of the input source file. The line number of the end is attached to
    the end: field of the cleared tags.

warning=<message>
    Print the given <message> at WARNING level.

fatal=<message>
    Print the given <message> and exit.
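As a sketch of how the exclusive, placeholder, and icase flags might be combined, consider a hypothetical language mylang whose comment lines start with ';' and whose keywords are case-insensitive. Placing the comment pattern first assumes that patterns are tried in the order in which they are defined.

```ctags
--langdef=mylang
--map-mylang=+.mylang
--kinddef-mylang=f,function,functions

# A comment line: emit no tag, and do not try the remaining patterns
# on this line.
--regex-mylang=/^[ \t]*;.*$///{exclusive}{placeholder}

# Matches "function", "FUNCTION", "Function", and so on.
# {icase} could also be written in its one-letter form, i.
--regex-mylang=/^function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/{icase}
```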

EXAMPLES
Perl Pod
This is the definition (pod.ctags) used in ctags for parsing Pod
(https://perldoc.perl.org/perlpod.html) files.

    --langdef=pod
    --map-pod=+.pod

    --kinddef-pod=c,chapter,chapters
    --kinddef-pod=s,section,sections
    --kinddef-pod=S,subsection,subsections
    --kinddef-pod=t,subsubsection,subsubsections

    --regex-pod=/^=head1[ \t]+(.+)/\1/c/
    --regex-pod=/^=head2[ \t]+(.+)/\1/s/
    --regex-pod=/^=head3[ \t]+(.+)/\1/S/
    --regex-pod=/^=head4[ \t]+(.+)/\1/t/

Using scope regex flags
Let's think about writing a parser for a very small subset of the Ruby
language.

input source file (input.srb):

    class Example
      def methodA
        puts "in class_method"
      end
      def methodB
        puts "in class_method"
      end
    end

The parser for the input should capture Example with the class kind,
and methodA and methodB with the method kind. methodA and methodB
should have Example as their scope. The end: field of each tag should
have a proper value.

optlib file (sub-ruby.ctags):

    --langdef=subRuby
    --map-subRuby=.srb
    --kinddef-subRuby=c,class,classes
    --kinddef-subRuby=m,method,methods
    --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
    --regex-subRuby=/^end///{scope=pop}{placeholder}
    --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
    --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

command line and output:

    $ ctags --quiet --fields=+eK \
        --options=./sub-ruby.ctags -o - input.srb
    Example  input.srb  /^class Example$/;"  class  end:8
    methodA  input.srb  /^  def methodA$/;"  method  class:Example  end:4
    methodB  input.srb  /^  def methodB$/;"  method  class:Example  end:7
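The example above uses {scope=push} and {scope=pop} for nested blocks. For a flat format without closing lines, {scope=set} and {scope=ref} may be enough. The following sketch assumes a hypothetical configuration format, myconf, in which a "section NAME" line starts a section and the indented "key = value" lines after it belong to that section.

```ctags
--langdef=myconf
--map-myconf=+.myconf
--kinddef-myconf=s,section,sections
--kinddef-myconf=k,key,keys

# Each section line empties the scope stack and pushes the new section,
# so the previous section stops being the current scope.
--regex-myconf=/^section[ \t]+([a-zA-Z0-9_]+)/\1/s/{scope=set}

# A key refers to the section at the top of the stack but is not pushed.
--regex-myconf=/^[ \t]+([a-zA-Z0-9_]+)[ \t]*=/\1/k/{scope=ref}
```

With these definitions, each key tag is expected to carry the enclosing section in its scope: field, and each section tag should receive its end: field when the next section line clears the stack or when the end of the file is reached.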

SEE ALSO
The official Universal Ctags web site at:

https://ctags.io/

ctags(1), tags(5), regex(3), regex(7), egrep(1)

AUTHOR
Universal Ctags project https://ctags.io/ (This man page is partially
derived from ctags(1) of Exuberant Ctags)

Darren Hiebert <dhiebert@users.sourceforge.net>
http://DarrenHiebert.com/

5.9.0                                                         CTAGS-OPTLIB(7)