CTAGS-OPTLIB(7)              Universal Ctags              CTAGS-OPTLIB(7)

NAME
       ctags-optlib - Universal Ctags parser definition language

SYNOPSIS
       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION
       Exuberant Ctags, the ancestor of Universal Ctags, provided a way to
       define a new parser from the command line. Universal Ctags extends
       and refines this feature. In Universal Ctags, such a parser is
       called an optlib parser: "opt" indicates that the parser is defined
       by a combination of command line options, and "lib" indicates that
       an optlib parser can be more than an ad-hoc personal configuration.

       This man page is for people who want to define an optlib parser.
       Readers should read ctags(1) of Universal Ctags first.

       The following options are for defining (or customizing) a parser:

       • --langdef=<name>

       • --map-<LANG>=[+|-]<extension>|<pattern>

       • --kinddef-<LANG>=<letter>,<name>,<description>

       • --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       • --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]

       The following options are for controlling the loading of parser
       definitions:

       • --options=<pathname>

       • --options-maybe=<pathname>

       • --optlib-dir=[+]<directory>

       The options and notation for defining a parser in Exuberant Ctags
       seem to focus on reducing the amount of typing. Less typing matters
       to users who want to define (or customize) a parser quickly.

       The design in Universal Ctags, on the other hand, focuses on
       maintainability. Its notation is more verbose than that of Exuberant
       Ctags: a new kind must be declared explicitly, (long) names are
       preferred over one-letter flags for specifying kinds, and naming
       rules are stricter.

       This man page explains only stable options and flags. Universal
       Ctags also introduces experimental options and flags whose names
       start with _. For documentation on these, visit the Universal Ctags
       web site at https://ctags.io/.

   Storing a parser definition to a file
       Though it is possible to define a parser on the command line, you
       will not want to type the same options every time you need the
       parser. Instead, you can store the options that define a parser in
       a file.

       At startup, ctags loads the files (preload files) listed in the
       "FILES" section of ctags(1). You can put the parser definitions you
       usually need in those files.

       --options=<pathname>, --options-maybe=<pathname>, and
       --optlib-dir=[+]<directory> are for loading optlib files you need
       only occasionally. See the "Option File Options" section of ctags(1)
       for these options.

       As explained in the "FILES" section of ctags(1), the options for
       defining a parser are listed one per line in an optlib file. Leading
       white space is ignored, and a line starting with '#' is treated as a
       comment. Shell metacharacters do not need to be escaped.

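       The reading rules above can be modelled with a short Python
       sketch. It is not ctags's own reader. It assumes that blank lines
       are skipped and that the '#' test is made after leading white
       space is removed. read_optlib is a hypothetical name.

```python
def read_optlib(text: str) -> list[str]:
    """Return the option lines of an optlib file, in order (a model)."""
    options = []
    for raw in text.splitlines():
        line = raw.lstrip()          # leading white space is ignored
        if not line:                 # assumption: blank lines are skipped
            continue
        if line.startswith("#"):     # comment line
            continue
        options.append(line)         # no shell unescaping is applied
    return options


sample = """\
# pod.ctags
--langdef=pod
    --map-pod=+.pod

--regex-pod=/^=head1[ \\t]+(.+)/\\1/c/
"""
for option in read_optlib(sample):
    print(option)
```

       Each printed line is exactly what would follow "ctags" on a
       command line. No quoting is needed, because no shell is involved.
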
       Use .ctags as the file extension for an optlib file. You can define
       multiple parsers in one optlib file, but it is better to make a
       separate file for each parser definition.

       The --_echo=<msg> and --_force-quit=<num> options are for debugging
       optlib parsers.

   Overview for defining a parser
       1. Design the parser

          You need to know both the target language and the concepts of
          ctags (definition, reference, kind, role, field, extra). For
          these concepts, ctags(1) of Universal Ctags may help you.

       2. Give a name to the parser

          Use the --langdef=<name> option. <name> is referred to as <LANG>
          in the later steps.

       3. Give a file pattern or file extension for activating the parser

          Use --map-<LANG>=[+|-]<extension>|<pattern>.

       4. Define kinds

          Use the --kinddef-<LANG>=<letter>,<name>,<description> option.
          This option was introduced in Universal Ctags; Exuberant Ctags
          does not have it. In Exuberant Ctags, a kind is defined as a
          side effect of specifying a --regex-<LANG>= option, so users
          have no chance to recognize how important kind definitions are.

       5. Define patterns

          Use the
          --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
          option for a single-line regular expression. You can also use
          the
          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]
          option for a multi-line regular expression.

          As <kind-spec>, you can use the <letter> defined with the
          --kinddef-<LANG>=<letter>,<name>,<description> option.

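       Steps 2 to 5 produce option lines in a fixed order. The following
       Python sketch assembles them from plain data and refuses a pattern
       whose kind letter was not declared in step 4. build_optlib is a
       hypothetical helper, not part of ctags. It does not escape
       separator characters inside patterns.

```python
def build_optlib(lang, maps, kinds, regexes):
    """Return optlib text for steps 2-5 (hypothetical helper).

    lang    -- parser name for --langdef (step 2)
    maps    -- values for --map-<LANG> (step 3)
    kinds   -- (letter, name, description) triples (step 4)
    regexes -- (line_pattern, name_pattern, letter) triples (step 5)
    """
    defined = {letter for letter, _, _ in kinds}
    lines = [f"--langdef={lang}"]
    lines += [f"--map-{lang}={m}" for m in maps]
    lines += [f"--kinddef-{lang}={l},{n},{d}" for l, n, d in kinds]
    for pattern, name, letter in regexes:
        # Step 5 should only refer to kinds declared in step 4.
        if letter not in defined:
            raise ValueError(f"kind {letter!r} was not defined with --kinddef")
        lines.append(f"--regex-{lang}=/{pattern}/{name}/{letter}/")
    return "\n".join(lines) + "\n"


print(build_optlib(
    "pod", ["+.pod"],
    [("c", "chapter", "chapters"), ("s", "section", "sections")],
    [(r"^=head1[ \t]+(.+)", r"\1", "c"), (r"^=head2[ \t]+(.+)", r"\1", "s")],
))
```
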
OPTIONS
       --langdef=<name>
           Defines a new user-defined language, <name>, to be parsed with
           regular expressions. Once defined, <name> may be used in other
           options taking language names.

           <name> must consist of alphanumeric characters, '#', or '+'
           ('[a-zA-Z0-9#+]+'). Graph characters other than '#' and '+' are
           disallowed (or reserved). Some of them ([-=:{.]) are disallowed
           because they can confuse the command line parser of ctags. The
           rest are just reserved for future extensions of ctags.

           all is an exception: all is not acceptable as <name>. It is a
           reserved word. See the description of the
           --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) for how
           the reserved word is used.

           NONE is another exception: NONE is not acceptable as <name>.

           The names of built-in parsers are capitalized. When ctags
           evaluates an option on the command line and chooses a parser, it
           compares parser names case-insensitively. Therefore, giving a
           name that starts with a lowercase character does not help you
           avoid a parser name conflict. However, in a tags file, ctags
           prints parser names case-sensitively; it prints a parser name
           exactly as specified in the --langdef=<name> option. Therefore,
           we recommend giving your private optlib parser a name that starts
           with a lowercase character. With this convention, people can
           tell whether a tag entry in a tags file comes from a built-in
           parser or from a private optlib parser.

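           The naming rules above can be checked before a definition is
           written. This Python sketch is an illustration only.
           check_langdef_name is hypothetical. Comparing the reserved words
           all and NONE exactly (case-sensitively) is an assumption.

```python
import re

# '[a-zA-Z0-9#+]+' from the --langdef description.
LANGDEF_NAME = re.compile(r"[a-zA-Z0-9#+]+")
# Reserved words; the exact, case-sensitive comparison is an assumption.
RESERVED_NAMES = {"all", "NONE"}


def check_langdef_name(name: str, existing: list) -> None:
    """Raise ValueError if `name` breaks the --langdef rules (a sketch)."""
    if not LANGDEF_NAME.fullmatch(name):
        raise ValueError(f"disallowed character in {name!r}")
    if name in RESERVED_NAMES:
        raise ValueError(f"{name!r} is a reserved word")
    # Parsers are chosen case-insensitively, so "perl" collides with "Perl".
    for other in existing:
        if other.lower() == name.lower():
            raise ValueError(f"{name!r} conflicts with parser {other!r}")


check_langdef_name("subRuby", ["Ruby", "Perl"])   # accepted, no exception
```

           The lowercase initial of subRuby does not avoid conflicts, which
           is why the sketch compares names with lower(). The convention
           only helps readers of a tags file.
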
       --kinddef-<LANG>=<letter>,<name>,<description>
           Defines a kind for <LANG>. Do not confuse this with
           --kinds-<LANG>.

           <letter> must be an alphabetical character ('[a-zA-EG-Z]') other
           than "F". "F" has been reserved for representing a file since
           Exuberant Ctags.

           <name> must start with an alphabetic character, and the rest
           must be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use "file"
           as <name>. It has been reserved for representing a file since
           Exuberant Ctags.

           Note that using a digit in <name> violates version 2 of the tags
           file format, though ctags accepts it. For more details, see
           tags(5).

           <description> can consist of any printable ASCII characters,
           with the exception of { and \. { is reserved for adding flags to
           this option in the future, so put \ before { to include { in a
           description. To include \ itself in a description, put \ before
           \.

           <letter>, <name>, and their combination must all be unique
           within a <LANG>.

           This option is newly introduced in Universal Ctags. It reduces
           typing when defining a regex pattern with --regex-<LANG>=, and
           keeps kind definitions consistent within a language.

           The <letter> can be used as an argument for the --kinds-<LANG>
           option to enable or disable the kind. Unless the K field is
           enabled, the <letter> is used as the value of the "kind"
           extension field in tags output.

           The <name> surrounded by braces can be used as an argument for
           the --kinds-<LANG> option. If the K field is enabled, the <name>
           is used as the value of the "kind" extension field in tags
           output.

           The <description> and <letter> are listed in --list-kinds
           output. All three elements of the kind-spec are listed in
           --list-kinds-full output. Don't use braces in the <description>;
           they will be used as metacharacters in the future.

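           The following Python sketch applies the --kinddef-<LANG> rules
           above. It checks the letter and name patterns, rejects the
           reserved "F" and "file", and enforces uniqueness within a
           language. It also escapes { and \ in <description>.
           kinddef_option and escape_description are hypothetical helpers.
           ctags performs its own checks.

```python
import re

KIND_LETTER = re.compile(r"[a-zA-EG-Z]")        # any ASCII letter but 'F'
KIND_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def escape_description(text: str) -> str:
    """Put a backslash before each '\\' and '{' (backslashes first)."""
    return text.replace("\\", "\\\\").replace("{", "\\{")


def kinddef_option(lang, letter, name, description, defined):
    """Return a --kinddef-<LANG> option line after checking the rules.

    `defined` is a list of (letter, name) pairs already used in <LANG>;
    it is bookkeeping invented for this sketch.
    """
    if not KIND_LETTER.fullmatch(letter):
        raise ValueError(f"bad kind letter {letter!r}")
    if not KIND_NAME.fullmatch(name) or name == "file":
        raise ValueError(f"bad kind name {name!r}")
    if any(letter == l or name == n for l, n in defined):
        raise ValueError(f"kind {letter},{name} is not unique in {lang}")
    if not all(" " <= c <= "~" for c in description):
        raise ValueError("description must be printable ASCII")
    defined.append((letter, name))
    return f"--kinddef-{lang}={letter},{name},{escape_description(description)}"


kinds = []
print(kinddef_option("pod", "c", "chapter", "chapters", kinds))
print(kinddef_option("pod", "s", "section", "sections {draft}", kinds))
```

           The second call prints "sections \{draft}" as the description.
           Backslashes are escaped first so that the backslash added before
           { is not doubled.
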
       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
           Defines a single-line regular expression.

           The /<line_pattern>/<name_pattern>/ pair defines a regular
           expression replacement pattern, similar in style to sed
           substitution commands (s/regexp/replacement/), with which to
           generate tags from source files mapped to the named language
           <LANG> (case-insensitive; either a built-in or a user-defined
           language).

           The regular expression <line_pattern> defines an extended
           regular expression (roughly that used by egrep(1)), which is used
           to locate a single source line containing a tag. It may specify
           tab characters using \t.

           When a matching line is found, a tag is generated for the name
           defined by <name_pattern>, which generally contains the special
           back-references \1 through \9 to refer to matching
           sub-expression groups within <line_pattern>.

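           Name expansion works much like the replacement part of a sed
           substitution. In this Python sketch, the re module stands in for
           POSIX extended regular expressions, although the two differ in
           corner cases. Match.expand() performs the \1 through \9
           substitution. tag_name_for_line is a hypothetical name.

```python
import re


def tag_name_for_line(line_pattern: str, name_pattern: str, line: str):
    """Return the tag name produced for `line`, or None if it doesn't match.

    Match.expand() replaces \\1 ... \\9 with the text of the matching
    groups, like the replacement part of sed's s/regexp/replacement/.
    """
    m = re.search(line_pattern, line)
    return None if m is None else m.expand(name_pattern)


print(tag_name_for_line(r"^=head1[ \t]+(.+)", r"\1", "=head1 NAME"))  # NAME
```
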
           The '/' separator characters shown in the parameter to the
           option can actually be replaced by any character. Note that
           whichever separator character is used must be escaped with a
           backslash ('\') wherever it is used in the parameter as something
           other than a separator. The regular expression defined by this
           option is added to the current list of regular expressions for
           the specified language unless the parameter is omitted, in which
           case the current list is cleared.

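           Splitting the parameter can be sketched in Python as follows.
           The sketch assumes that a backslash before the separator yields
           a literal separator. It also assumes that any other backslash
           sequence (such as \t or \1) passes through unchanged.
           split_regex_parameter is hypothetical. ctags's own parser may
           differ in details such as translating \t.

```python
def split_regex_parameter(param: str) -> list[str]:
    """Split a --regex-<LANG> parameter at its separator character.

    The first character is the separator. A backslash before the
    separator yields a literal separator; any other backslash sequence
    is kept as-is for the regex engine (an assumption of this sketch).
    The last element holds whatever follows the final separator, that
    is, the flags.
    """
    sep = param[0]
    fields, current = [], []
    i = 1
    while i < len(param):
        c = param[i]
        if c == "\\" and i + 1 < len(param):
            nxt = param[i + 1]
            current.append(nxt if nxt == sep else c + nxt)
            i += 2
        elif c == sep:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(c)
            i += 1
    fields.append("".join(current))
    return fields


# '|' as the separator lets '/' appear unescaped inside the pattern.
print(split_regex_parameter(r"|^dir/(.+)|\1|d|"))
```
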
           Unless modified by <flags>, <line_pattern> is interpreted as a
           POSIX extended regular expression. For every matching line,
           <name_pattern> should expand to a non-empty string of characters;
           otherwise a warning message is reported, unless the {placeholder}
           regex flag is specified.

           A kind specifier (<kind-spec>) for tags matching the regexp may
           follow <name_pattern>. It determines what kind of tag is reported
           in the kind extension field (see tags(5)).

           <kind-spec> has two forms: the one-letter form and the full form.

           The one-letter form is simply <letter>. It refers to a kind
           <letter> defined with --kinddef-<LANG>. This form is recommended
           in Universal Ctags.

           The full form of <kind-spec> is <letter>,<name>,<description>.
           The kind <name>, the <description>, or both can be omitted. See
           the description of the
           --kinddef-<LANG>=<letter>,<name>,<description> option for these
           elements.

           The full form is supported only for compatibility with Exuberant
           Ctags, which does not have the --kinddef-<LANG> option. Support
           for this form will be removed from Universal Ctags in the future.

           For <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

           For more information on the regular expressions used by ctags,
           see the regex(3) and regex(7) man pages, or the GNU info
           documentation for regex (e.g. "info regex").

       --list-regex-flags
           Lists the flags that can be used in the --regex-<LANG> option.

       --list-mline-regex-flags
           Lists the flags that can be used in the --mline-regex-<LANG>
           option.

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]
           Defines a multi-line regular expression.

           This option is similar to the --regex-<LANG> option, except that
           the pattern is applied to the contents of the whole file, not
           line by line.

           See "FLAGS FOR --mline-regex-<LANG> OPTION" about the
           {mgroup=<N>} flag. The {mgroup=<N>} flag is mandatory.

       --_echo=<message>
           Prints <message> to the standard error stream. This is helpful
           for understanding (and debugging) the optlib loading feature of
           Universal Ctags.

       --_force-quit[=<num>]
           Exits immediately when this option is processed. If <num> is
           given, it is used as the exit status; the default is 0. This is
           helpful for debugging the optlib loading feature of Universal
           Ctags.

FLAGS FOR --regex-<LANG> OPTION
       You can specify more than one flag, <letter>|{<name>}, at the end of
       --regex-<LANG> to control how Universal Ctags uses the pattern.

       Exuberant Ctags uses a <letter> to represent a flag. In Universal
       Ctags, a <name> surrounded by braces (name form) can be used in
       addition to <letter>. The name form makes an optlib file easier to
       read.

       Most of the flags newly added in Universal Ctags have no one-letter
       representation; they have only the name representation.
       --list-regex-flags lists all the flags.

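       A flag may be written either as a letter or as a {name}, so both
       spellings can be normalized to long names. In this hypothetical
       Python sketch, the letter table comes from the list that follows.
       A {name} containing '}' (for example, in a warning=<message>) is not
       handled.

```python
import re

# One-letter forms from this section; newer flags have only names.
LETTER_FLAGS = {"b": "basic", "x": "exclusive", "e": "extend",
                "p": "pcre2", "i": "icase"}

# A flag is either a {name} / {name=value} group or a single letter.
FLAG_TOKEN = re.compile(r"\{([^}]*)\}|([A-Za-z])")


def parse_flags(flags: str) -> list[str]:
    """Normalize a <flags> string to a list of long flag names."""
    result = []
    pos = 0
    while pos < len(flags):
        m = FLAG_TOKEN.match(flags, pos)
        if m is None:
            raise ValueError(f"cannot parse flags at {flags[pos:]!r}")
        if m.group(1) is not None:
            result.append(m.group(1))                # name form
        else:
            result.append(LETTER_FLAGS[m.group(2)])  # KeyError if unknown
        pos = m.end()
    return result


print(parse_flags("x{scope=push}{placeholder}"))
```

       For example, "x" and "{exclusive}" normalize to the same entry.
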
       basic (one-letter form b)
           The pattern is interpreted as a POSIX basic regular expression.

       exclusive (one-letter form x)
           Skip testing the other patterns if a line matches this pattern.
           This is useful for avoiding wasted CPU time on line comments.

       extend (one-letter form e)
           The pattern is interpreted as a POSIX extended regular
           expression (default).

       pcre2 (one-letter form p, experimental)
           The pattern is interpreted as a PCRE2 regular expression, as
           explained in pcre2syntax(3). This flag is available only if
           ctags is built with the pcre2 library. See the output of the
           --list-features option to know whether your ctags is built with
           pcre2 or not.

       icase (one-letter form i)
           The regular expression is applied in a case-insensitive manner.

       placeholder
           Don't emit a tag captured with a regex pattern. The replacement
           can be an empty string. See the following description of the
           scope=... flag for how this is useful.

       scope=(ref|push|pop|clear|set|replace)
           Specifies what to do with the internal scope stack.

           A parser programmed with --regex-<LANG> has a stack (scope
           stack) internally. You can use it for tracking scope
           information. The scope=... flag is for manipulating and
           utilizing the scope stack.

           If {scope=push} is specified, a tag captured with --regex-<LANG>
           is pushed onto the stack. {scope=push} implies {scope=ref}.

           You can fill the scope field (scope:) of a captured tag with
           {scope=ref}. If the {scope=ref} flag is given, ctags attaches the
           tag at the top of the stack to the tag captured with
           --regex-<LANG> as the value of its scope: field.

           ctags pops the tag at the top of the stack when a --regex-<LANG>
           pattern with {scope=pop} matches the input line.

           Specifying {scope=clear} removes all the tags in the scope
           stack. Specifying {scope=set} removes all the tags in the scope
           stack, and then pushes the captured tag as {scope=push} does.

           {scope=replace} does three things in sequence. First it does the
           same as {scope=pop}, then it fills the scope: field of the tag
           captured with --regex-<LANG>, and finally it pushes the tag onto
           the scope stack as if {scope=push} were given. You cannot specify
           another scope action together with {scope=replace}.

           Do not use {scope=pop}{scope=push} as an alternative to
           {scope=replace}. {scope=pop}{scope=push} first fills the scope:
           field of the tag captured with --regex-<LANG>, then pops the tag
           at the top of the stack, and finally pushes the captured tag onto
           the scope stack. The two therefore differ in when the scope:
           field is filled: before the pop with {scope=pop}{scope=push},
           after it with {scope=replace}.

           In some cases, you may want to use --regex-<LANG> only for its
           side effects, that is, only to manipulate the stack and not to
           capture a tag. In such a case, leave the <name_pattern> component
           of the --regex-<LANG> option empty and specify {placeholder} as
           a regex flag. For example, a non-named tag can be put on the
           stack by giving the regex flags "{scope=push}{placeholder}".

           You may wonder what happens if a regex pattern with the
           {scope=ref} flag matches an input line but the stack is empty,
           or a non-named tag is at the top. If the regex pattern contains a
           {scope=ref} flag and the stack is empty, the {scope=ref} flag is
           ignored and nothing is attached to the scope: field.

           If the top of the stack contains an unnamed tag, ctags searches
           deeper into the stack to find the top-most named tag. If it
           reaches the bottom of the stack without finding a named tag, the
           {scope=ref} flag is ignored and nothing is attached to the scope:
           field.

           When a named tag on the stack is popped or cleared as the side
           effect of a pattern match, ctags attaches the line number of the
           match to the end: field of the named tag.

           ctags clears all of the tags on the stack when it reaches the end
           of the input source file. The line number of the end is attached
           to the end: field of the cleared tags.

       warning=<message>
           Prints the given <message> at WARNING level.

       fatal=<message>
           Prints the given <message> and exits.

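       The following Python sketch is a toy model of the scope rules
       above. It replays the sub-ruby.ctags patterns from the EXAMPLES
       section and does not reflect how ctags is implemented. ScopeStack
       and run are hypothetical, and Python's re stands in for POSIX ERE.
       For {scope=pop}{scope=push}, the sketch fills scope: first, then
       pops, then pushes, following the description above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    name: str                    # "" for a non-named (placeholder) entry
    kind: Optional[str]
    line: int
    scope: Optional[str] = None  # rendered like "class:Example"
    end: Optional[int] = None


class ScopeStack:
    """Toy model of the {scope=...} rules; not ctags's implementation."""

    def __init__(self) -> None:
        self.stack: List[Tag] = []
        self.tags: List[Tag] = []        # tags that would be emitted

    def _ref(self, tag: Tag) -> None:
        # Use the top-most *named* entry; ignore the flag if there is none.
        for entry in reversed(self.stack):
            if entry.name:
                tag.scope = f"{entry.kind}:{entry.name}"
                return

    def _pop(self, line: int) -> None:
        if self.stack:
            entry = self.stack.pop()
            if entry.name:               # only named tags get end:
                entry.end = line

    def _clear(self, line: int) -> None:
        while self.stack:
            self._pop(line)

    def match(self, name, kind, line, actions=(), placeholder=False) -> Tag:
        tag = Tag(name, kind, line)
        acts = set(actions)
        if "replace" in acts:            # pop, fill scope:, then push
            self._pop(line)
            self._ref(tag)
            self.stack.append(tag)
        else:
            if acts & {"clear", "set"}:
                self._clear(line)
            if acts & {"ref", "push", "set"}:   # push and set imply ref
                self._ref(tag)
            if "pop" in acts:
                self._pop(line)
            if acts & {"push", "set"}:
                self.stack.append(tag)
        if not placeholder:
            self.tags.append(tag)
        return tag

    def finish(self, last_line: int) -> None:
        self._clear(last_line)           # end of input clears the stack


# The four --regex-subRuby patterns from sub-ruby.ctags in EXAMPLES.
RULES = [
    (r"^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)", r"\1", "class", ("push",), False),
    (r"^end", "", None, ("pop",), True),
    (r"^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)", r"\1", "method", ("push",), False),
    (r"^[ \t]+end", "", None, ("pop",), True),
]


def run(source: str) -> List[Tag]:
    scopes = ScopeStack()
    lines = source.splitlines()
    for lineno, text in enumerate(lines, start=1):
        for pattern, template, kind, actions, placeholder in RULES:
            m = re.search(pattern, text)
            if m:
                scopes.match(m.expand(template), kind, lineno, actions, placeholder)
    scopes.finish(len(lines))
    return scopes.tags
```

       With the input.srb source from EXAMPLES, run() returns Example
       with end 8, and methodA and methodB with scope class:Example and
       ends 4 and 7. This agrees with the ctags output shown there.
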
FLAGS FOR --mline-regex-<LANG> OPTION
       mgroup=<N>
           Decides the location of the tag extracted with the
           --mline-regex-<LANG> option.

           <N> is the number of a capture group in the pattern; that group
           is used to record the line number of the tag. The mgroup=<N>
           flag is not optional. You must add an mgroup=<N> flag, even if
           <N> is 0 (meaning the start position of the whole regex
           pattern).

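       This Python sketch shows how mgroup=<N> can turn a match over the
       whole file into a line number. It assumes the line used is the one
       on which group <N> starts, found by counting the newlines before
       that offset. mline_tag is hypothetical, and re stands in for the
       regex engine that ctags uses.

```python
import re


def mline_tag(pattern: str, name_pattern: str, mgroup: int, contents: str):
    """Yield (name, line) for each match over the whole file contents.

    The line number is taken from where capture group `mgroup` starts
    (0 = the whole match), converted to a 1-based line number.
    """
    for m in re.finditer(pattern, contents):
        offset = m.start(mgroup)
        line = contents.count("\n", 0, offset) + 1
        yield m.expand(name_pattern), line


source = "x = 1\nstruct\n  point {\n  int x;\n};\n"
pattern = r"struct[ \t\n]+([a-z]+)[ \t\n]*\{"
print(list(mline_tag(pattern, r"\1", 1, source)))   # [('point', 3)]
print(list(mline_tag(pattern, r"\1", 0, source)))   # [('point', 2)]
```

       The same match yields line 3 with mgroup=1, where "point" appears,
       and line 2 with mgroup=0, where the match begins. This is why the
       flag must always be stated.
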
EXAMPLES
   Perl Pod
       This is the definition (pod.ctags) used in ctags for parsing Pod
       (https://perldoc.perl.org/perlpod.html) files.

           --langdef=pod
           --map-pod=+.pod

           --kinddef-pod=c,chapter,chapters
           --kinddef-pod=s,section,sections
           --kinddef-pod=S,subsection,subsections
           --kinddef-pod=t,subsubsection,subsubsections

           --regex-pod=/^=head1[ \t]+(.+)/\1/c/
           --regex-pod=/^=head2[ \t]+(.+)/\1/s/
           --regex-pod=/^=head3[ \t]+(.+)/\1/S/
           --regex-pod=/^=head4[ \t]+(.+)/\1/t/

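       To see which tags this definition produces, the four patterns can
       be replayed line by line in Python. This is a model only: pod_tags
       is hypothetical, and re stands in for POSIX ERE. Every pattern is
       tested against every line, because none of them uses the exclusive
       flag.

```python
import re

# The four --regex-pod patterns above, paired with their kind names.
POD_RULES = [
    (r"^=head1[ \t]+(.+)", r"\1", "chapter"),
    (r"^=head2[ \t]+(.+)", r"\1", "section"),
    (r"^=head3[ \t]+(.+)", r"\1", "subsection"),
    (r"^=head4[ \t]+(.+)", r"\1", "subsubsection"),
]


def pod_tags(text: str) -> list[tuple[str, int, str]]:
    """Return (name, line, kind) for every match, in input order."""
    tags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, name, kind in POD_RULES:
            m = re.search(pattern, line)
            if m:
                tags.append((m.expand(name), lineno, kind))
    return tags


doc = "=head1 NAME\n\nFoo - bar\n\n=head2 Methods\n\n=head3 new\n"
for tag in pod_tags(doc):
    print(tag)
```
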
   Using scope regex flags
       Let's think about writing a parser for a very small subset of the
       Ruby language.

       input source file (input.srb):

           class Example
             def methodA
               puts "in class_method"
             end
             def methodB
               puts "in class_method"
             end
           end

       The parser for the input should capture Example with the class
       kind, and methodA and methodB with the method kind. methodA and
       methodB should have Example as their scope. The end: field of each
       tag should have the proper value.

       optlib file (sub-ruby.ctags):

           --langdef=subRuby
           --map-subRuby=.srb
           --kinddef-subRuby=c,class,classes
           --kinddef-subRuby=m,method,methods
           --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
           --regex-subRuby=/^end///{scope=pop}{placeholder}
           --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
           --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

       command line and output:

           $ ctags --quiet --fields=+eK \
               --options=./sub-ruby.ctags -o - input.srb
           Example input.srb /^class Example$/;" class end:8
           methodA input.srb /^  def methodA$/;" method class:Example end:4
           methodB input.srb /^  def methodB$/;" method class:Example end:7

SEE ALSO
       The official Universal Ctags web site at:

       https://ctags.io/

       ctags(1), tags(5), regex(3), regex(7), egrep(1), pcre2syntax(3)

AUTHOR
       Universal Ctags project https://ctags.io/ (This man page is
       partially derived from ctags(1) of Exuberant Ctags.)

       Darren Hiebert <dhiebert@users.sourceforge.net>
       http://DarrenHiebert.com/

6.0.0                                                     CTAGS-OPTLIB(7)