CTAGS-OPTLIB(7)               Universal Ctags               CTAGS-OPTLIB(7)

NAME
ctags-optlib - Universal Ctags parser definition language

SYNOPSIS
ctags [options] [file(s)]
etags [options] [file(s)]

DESCRIPTION
Exuberant Ctags, the ancestor of Universal Ctags, provided a way to
define a new parser from the command line. Universal Ctags extends and
refines this feature. In Universal Ctags, such a parser is called an
optlib parser. "opt" indicates that the parser is defined with a
combination of command line options. "lib" indicates that an optlib
parser can be more than an ad-hoc personal configuration.

This man page is for people who want to define an optlib parser.
Readers should read ctags(1) of Universal Ctags first.

The following options are for defining (or customizing) a parser:

• --langdef=<name>

• --map-<LANG>=[+|-]<extension>|<pattern>

• --kinddef-<LANG>=<letter>,<name>,<description>

• --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

• --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

The following options control the loading of parser definitions:

• --options=<pathname>

• --options-maybe=<pathname>

• --optlib-dir=[+]<directory>

The design of the options and notation for defining a parser in
Exuberant Ctags appears to focus on reducing the amount of typing.
Less typing is important for users who want to define (or customize)
a parser quickly.

The design in Universal Ctags, on the other hand, focuses on
maintainability. The notation of Universal Ctags is more verbose than
that of Exuberant Ctags: a newly introduced kind must be declared
explicitly, (long) names are preferred over one-letter flags for
specifying kinds, and the naming rules are stricter.

This man page explains only stable options and flags. Universal Ctags
also introduces experimental options and flags whose names start with
_. For documentation on these options and flags, visit the Universal
Ctags web site at https://ctags.io/.

Storing a parser definition in a file
Though it is possible to define a parser from the command line, you
don't want to type the same command line each time you need the
parser. You can store the options that define a parser in a file.

ctags loads the files (preload files) listed in the "FILES" section of
ctags(1) at program startup. You can put the parser definitions that
you use regularly in those files.

--options=<pathname>, --options-maybe=<pathname>, and
--optlib-dir=[+]<directory> are for loading optlib files that you need
only occasionally. See the "Option File Options" section of ctags(1)
for these options.

As explained in the "FILES" section of ctags(1), the options that
define a parser are listed line by line in an optlib file. Leading
white space is ignored. A line starting with '#' is treated as a
comment. Escaping shell metacharacters is not needed.

Use .ctags as the file extension for an optlib file. You can define
multiple parsers in one optlib file, but it is better to make one file
for each parser definition.

The --_echo=<msg> and --_force-quit=<num> options are for debugging an
optlib parser.

Overview for defining a parser
1. Design the parser

   You need to know both the target language and the concepts of ctags
   (definition, reference, kind, role, field, extra). For the
   concepts, ctags(1) of Universal Ctags may help you.

2. Give a name to the parser

   Use the --langdef=<name> option. <name> is referred to as <LANG> in
   the later steps.

3. Give a file pattern or file extension for activating the parser

   Use --map-<LANG>=[+|-]<extension>|<pattern>.

4. Define kinds

   Use the --kinddef-<LANG>=<letter>,<name>,<description> option.
   Universal Ctags introduces this option; Exuberant Ctags does not
   have it. In Exuberant Ctags, a kind is defined as a side effect of
   specifying the --regex-<LANG>= option, so the user has no chance to
   recognize how important the definition of a kind is.

5. Define patterns

   Use the
   --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
   option for a single-line regular expression. You can also use the
   --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
   option for a multi-line regular expression.

   As <kind-spec>, you can use the one-letter flag defined with the
   --kinddef-<LANG>=<letter>,<name>,<description> option.

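Steps 2 to 5 map onto an optlib file roughly as follows. This is only
a sketch for a hypothetical language named mytodo, whose files list
tasks as lines starting with "TODO:". The language name, the .todo
extension, and the kind are assumptions made for illustration.

```ctags
# Step 2: name the parser; "mytodo" plays the role of <LANG> below.
--langdef=mytodo
# Step 3: activate the parser for files with the .todo extension.
--map-mytodo=+.todo
# Step 4: declare a kind before any pattern refers to it.
--kinddef-mytodo=t,task,tasks
# Step 5: tag the text after "TODO:" with the kind letter t.
--regex-mytodo=/^TODO:[ \t]*(.+)$/\1/t/
```
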
OPTIONS
--langdef=<name>
    Defines a new user-defined language, <name>, to be parsed with
    regular expressions. Once defined, <name> may be used in other
    options taking language names.

    <name> must consist of alphanumeric characters, '#', or '+'
    ('[a-zA-Z0-9#+]+'). Graph characters other than '#' and '+' are
    disallowed (or reserved). Some of them ([-=:{.]) are disallowed
    because they can confuse the command line parser of ctags. The
    rest are reserved for future extensions of ctags.

    all is an exception: all is not acceptable as <name> because it
    is a reserved word. See the description of the
    --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) for how
    the reserved word is used.

    The names of built-in parsers are capitalized. When ctags
    evaluates an option on a command line and chooses a parser, it
    compares parser names case-insensitively. Therefore, giving a
    name that starts with a lowercase character does not help you
    avoid a parser name conflict. In a tags file, however, ctags
    prints parser names case-sensitively: it prints a parser name as
    specified in the --langdef=<name> option. Therefore, we recommend
    giving your private optlib parser a name that starts with a
    lowercase character. With this convention, people can tell
    whether a tag entry in a tags file comes from a built-in parser
    or from a private optlib parser.

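The naming rules can be illustrated as follows. The names are
hypothetical; the rejected names are kept as comments and are listed
only because the rules above disallow them.

```ctags
# Accepted: alphanumeric characters, '#', and '+' only. The names
# start with a lowercase character, following the recommendation for
# private optlib parsers.
--langdef=myQuery
--langdef=myq+

# Rejected by the rules above (commented out so the file still loads):
#   --langdef=my-query    '-' is a disallowed character
#   --langdef=my.query    '.' is a disallowed character
#   --langdef=all         "all" is a reserved word
```
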
--kinddef-<LANG>=<letter>,<name>,<description>
    Defines a kind for <LANG>. Do not confuse this option with
    --kinds-<LANG>.

    <letter> must be an alphabetic character other than "F"
    ('[a-zA-EG-Z]'). "F" has been reserved for representing a file
    since Exuberant Ctags.

    <name> must start with an alphabetic character, and the rest must
    be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use "file" as
    <name>. It has been reserved for representing a file since
    Exuberant Ctags.

    Note that using a digit in a <name> violates version 2 of the
    tags file format, though ctags accepts it. For more detail, see
    tags(5).

    <description> may contain any printable ASCII characters except
    { and \. { is reserved for adding flags to this option in the
    future, so put \ before { to include { in a description. To
    include \ itself in a description, put \ before \.

    <letter>, <name>, and their combination must each be unique in a
    <LANG>.

    This option is newly introduced in Universal Ctags. It reduces
    the typing needed to define a regex pattern with --regex-<LANG>=,
    and keeps the kind definitions in a language consistent.

    The <letter> can be used as an argument for the --kinds-<LANG>
    option to enable or disable the kind. Unless the K field is
    enabled, the <letter> is used as the value of the "kind"
    extension field in tags output.

    The <name> surrounded by braces can be used as an argument for
    the --kinds-<LANG> option. If the K field is enabled, the <name>
    is used as the value of the "kind" extension field in tags
    output.

    The <description> and <letter> are listed in --list-kinds output.
    All three elements of the kind-spec are listed in
    --list-kinds-full output. Don't use braces in the <description>.
    They will be used as metacharacters in the future.

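The following sketch, for a hypothetical language mylang, shows kind
definitions that follow the rules above. The kinds themselves are
illustrative only.

```ctags
--langdef=mylang
# <letter>,<name>,<description>: each letter and each name is used
# only once within mylang.
--kinddef-mylang=f,function,functions
--kinddef-mylang=v,variable,global variables

# Rejected by the rules above (commented out so the file still loads):
#   --kinddef-mylang=F,func,functions    "F" is reserved for files
#   --kinddef-mylang=x,file,files        "file" is a reserved name
#   --kinddef-mylang=f,field,fields      the letter f is already taken
```

With these definitions, either --kinds-mylang=-v or
--kinds-mylang=-{variable} should disable the variable kind.
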
--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
    Defines a single-line regular expression.

    The /<line_pattern>/<name_pattern>/ pair defines a regular
    expression replacement pattern, similar in style to sed
    substitution commands (s/regexp/replacement/), with which to
    generate tags from source files mapped to the named language,
    <LANG> (case-insensitive; either a built-in or user-defined
    language).

    <line_pattern> is an extended regular expression (roughly that
    used by egrep(1)) that is used to locate a single source line
    containing a tag. It may specify tab characters using \t.

    When a matching line is found, a tag is generated for the name
    defined by <name_pattern>, which generally contains the special
    back-references \1 through \9 to refer to matching sub-expression
    groups within <line_pattern>.

    The '/' separator characters shown in the parameter to the option
    can actually be replaced by any character. Note that whichever
    separator character is used must be escaped with a backslash
    ('\') wherever it appears in the parameter as something other
    than a separator. The regular expression defined by this option
    is added to the current list of regular expressions for the
    specified language unless the parameter is omitted, in which case
    the current list is cleared.

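The separator rule and the clearing of the pattern list can be
sketched as follows. The language mylang, its kind, and the include
syntax being matched are hypothetical, and '%' is just one possible
choice of alternative separator.

```ctags
--langdef=mylang
--kinddef-mylang=i,include,included files

# With '/' as the separator, a '/' inside the pattern needs a '\'.
--regex-mylang=/^include[ \t]+"([^"]+\/[^"]+)"/\1/i/

# Omitting the parameter clears the patterns defined so far for mylang.
--regex-mylang=

# With '%' as the separator, '/' can be written as is.
--regex-mylang=%^include[ \t]+"([^"]+/[^"]+)"%\1%i%
```
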
    Unless modified by <flags>, <line_pattern> is interpreted as a
    POSIX extended regular expression. The <name_pattern> should
    expand to a non-empty string for every matching line; otherwise a
    warning message is reported unless the {placeholder} regex flag
    is specified.

    A kind specifier (<kind-spec>) for tags matching the regular
    expression may follow <name_pattern>. It determines what kind of
    tag is reported in the "kind" extension field (see tags(5)).

    <kind-spec> has two forms: the one-letter form and the full form.

    The one-letter form is <letter>. It simply refers to a kind
    <letter> defined with --kinddef-<LANG>. This form is recommended
    in Universal Ctags.

    The full form of <kind-spec> is <letter>,<name>,<description>.
    The kind <name>, the <description>, or both can be omitted. See
    the description of the
    --kinddef-<LANG>=<letter>,<name>,<description> option for the
    elements.

    The full form is supported only for compatibility with Exuberant
    Ctags, which does not have the --kinddef-<LANG> option. Support
    for the full form will be removed from Universal Ctags in the
    future.

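The two forms of <kind-spec> can be compared in a short sketch. The
language mylang, its kinds, and the def and var keywords are
hypothetical.

```ctags
--langdef=mylang

# Recommended: declare the kind once, then refer to it by its letter.
--kinddef-mylang=f,function,functions
--regex-mylang=/^def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/

# Compatibility only: the full form declares the kind inside the
# pattern option, as Exuberant Ctags did.
--regex-mylang=/^var[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/v,variable,variables/
```
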
    For <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

    For more information on the regular expressions used by ctags,
    see either the regex(5,7) man page or the GNU info documentation
    for regex (e.g. "info regex").

--list-regex-flags
    Lists the flags that can be used in the --regex-<LANG> option.

--list-mline-regex-flags
    Lists the flags that can be used in the --mline-regex-<LANG>
    option.

--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
    Defines a multi-line regular expression.

    This option is similar to the --regex-<LANG> option except that
    the pattern is applied to the whole file's contents, not line by
    line.

--_echo=<message>
    Prints <message> to the standard error stream. This is helpful
    for understanding (and debugging) the optlib loading feature of
    Universal Ctags.

--_force-quit[=<num>]
    Exits immediately when this option is processed. If <num> is
    given, it is used as the exit status. The default is 0. This is
    helpful for debugging the optlib loading feature of Universal
    Ctags.

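One way to use the two debugging options is to bracket the part of an
optlib file that is under investigation, as in the following sketch.
The language mylang, the messages, and the placement of the options
are illustrative assumptions.

```ctags
--_echo=loading mylang.ctags
--langdef=mylang
--kinddef-mylang=f,function,functions
--_echo=kinds of mylang are defined
# Stop here with exit status 0 while the pattern below is unfinished.
--_force-quit=0
--regex-mylang=/^def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/
```
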
FLAGS FOR --regex-<LANG> OPTION
You can specify more than one flag, <letter>|{<name>}, at the end of
--regex-<LANG> to control how Universal Ctags uses the pattern.

Exuberant Ctags uses a <letter> to represent a flag. In Universal
Ctags, a <name> surrounded by braces (name form) can be used in
addition to <letter>. The name form makes an optlib file easier to
read.

Most of the flags newly added in Universal Ctags have no one-letter
representation; they have only the name representation.
--list-regex-flags lists all the flags.

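The two notations can be compared in a short sketch that uses the
icase flag (one-letter form i). The language mylang, its kind, and the
keywords being matched are hypothetical.

```ctags
--langdef=mylang
--kinddef-mylang=f,function,functions

# One-letter form: i stands for icase.
--regex-mylang=/^function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/i

# Name form: the same flag written as {icase}.
--regex-mylang=/^procedure[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/{icase}
```
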
basic (one-letter form b)
    The pattern is interpreted as a POSIX basic regular expression.

exclusive (one-letter form x)
    Skip testing the other patterns if a line matches this pattern.
    This is useful to avoid spending CPU time on parsing line
    comments.

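A sketch of the line-comment use case follows. It assumes a
hypothetical language mylang in which a comment line starts with '#',
and it assumes that patterns are tested in the order in which they are
defined, so the exclusive pattern is placed first.

```ctags
--langdef=mylang
--kinddef-mylang=f,function,functions

# A comment line needs no further testing. The empty <name_pattern>
# combined with {placeholder} means that no tag is emitted for it.
--regex-mylang=/^[ \t]*#///{exclusive}{placeholder}

# Without the pattern above, a comment such as "# def old" would be
# tagged by this unanchored pattern.
--regex-mylang=/def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/
```
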
extend (one-letter form e)
    The pattern is interpreted as a POSIX extended regular expression
    (default).

pcre2 (one-letter form p, experimental)
    The pattern is interpreted as a PCRE2 regular expression as
    explained in pcre2syntax(3). This flag is available only if ctags
    is built with the pcre2 library. See the output of the
    --list-features option to find out whether your ctags is built
    with pcre2.

icase (one-letter form i)
    The regular expression is applied in a case-insensitive manner.

placeholder
    Don't emit a tag captured with the regex pattern. The replacement
    can be an empty string. See the following description of the
    scope=... flag for how this is useful.

scope=(ref|push|pop|clear|set|replace)
    Specify what to do with the internal scope stack.

    A parser programmed with --regex-<LANG> has a stack (the scope
    stack) internally. You can use it for tracking scope information.
    The scope=... flag is for manipulating and utilizing the scope
    stack.

    If {scope=push} is specified, a tag captured with --regex-<LANG>
    is pushed onto the stack. {scope=push} implies {scope=ref}.

    You can fill the scope field (scope:) of a captured tag with
    {scope=ref}. If the {scope=ref} flag is given, ctags attaches the
    tag at the top of the stack to the tag captured with
    --regex-<LANG> as the value of the scope: field.

    ctags pops the tag at the top of the stack when a --regex-<LANG>
    pattern with {scope=pop} matches the input line.

    Specifying {scope=clear} removes all the tags from the scope
    stack. Specifying {scope=set} removes all the tags from the scope
    stack, and then pushes the captured tag as {scope=push} does.

    {scope=replace} does three things in sequence. First it does the
    same as {scope=pop}, then it fills the scope: field of the tag
    captured with --regex-<LANG>, and finally it pushes the tag onto
    the scope stack as if {scope=push} were given. You cannot specify
    another scope action together with {scope=replace}.

    Do not specify {scope=pop}{scope=push} as an alternative to
    {scope=replace}: {scope=pop}{scope=push} first fills the scope:
    field of the tag captured with --regex-<LANG>, then pops the tag
    at the top of the stack, and finally pushes the captured tag onto
    the scope stack. The timing of filling the end: field differs
    between {scope=replace} and {scope=pop}{scope=push}.

    In some cases, you may want to use --regex-<LANG> only for its
    side effects: to manipulate the stack without capturing a tag. In
    such a case, make the <name_pattern> component of the
    --regex-<LANG> option empty and specify {placeholder} as a regex
    flag. For example, an unnamed tag can be put on the stack by
    giving the regex flags "{scope=push}{placeholder}".

    You may wonder what happens if a regex pattern with the
    {scope=ref} flag matches an input line but the stack is empty, or
    an unnamed tag is at the top. If the stack is empty, the
    {scope=ref} flag is ignored and nothing is attached to the scope:
    field.

    If the top of the stack contains an unnamed tag, ctags searches
    deeper into the stack to find the top-most named tag. If it
    reaches the bottom of the stack without finding a named tag, the
    {scope=ref} flag is ignored and nothing is attached to the scope:
    field.

    When a named tag on the stack is popped or cleared as a side
    effect of a pattern match, ctags attaches the line number of the
    match to the end: field of the named tag.

    ctags clears all of the tags on the stack when it reaches the end
    of the input source file. The line number of the end is attached
    to the end: field of the cleared tags.

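The {scope=set} and {scope=ref} actions can be sketched with a parser
for a hypothetical INI-like format, in which a "[name]" line starts a
section and "key = value" lines belong to the most recent section. The
language name myini, the .myini extension, and the kinds are
assumptions made for illustration.

```ctags
--langdef=myini
--map-myini=+.myini
--kinddef-myini=s,section,sections
--kinddef-myini=k,key,keys

# "[name]" starts a section: remove the previous section from the
# stack, then push the new one.
--regex-myini=/^\[([^]]+)\]/\1/s/{scope=set}

# "key = value": use the section at the top of the stack as the
# scope. The key itself is not pushed.
--regex-myini=/^([a-zA-Z0-9_.]+)[ \t]*=/\1/k/{scope=ref}
```

Under the rules above, a key that appears before the first section
finds an empty stack, so nothing is attached to its scope: field.
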
warning=<message>
    Print the given <message> at WARNING level.

fatal=<message>
    Print the given <message> and exit.

EXAMPLES
Perl Pod
This is the definition (pod.ctags) used in ctags for parsing Pod
(https://perldoc.perl.org/perlpod.html) files.

    --langdef=pod
    --map-pod=+.pod

    --kinddef-pod=c,chapter,chapters
    --kinddef-pod=s,section,sections
    --kinddef-pod=S,subsection,subsections
    --kinddef-pod=t,subsubsection,subsubsections

    --regex-pod=/^=head1[ \t]+(.+)/\1/c/
    --regex-pod=/^=head2[ \t]+(.+)/\1/s/
    --regex-pod=/^=head3[ \t]+(.+)/\1/S/
    --regex-pod=/^=head4[ \t]+(.+)/\1/t/

Using scope regex flags
Let's think about writing a parser for a very small subset of the Ruby
language.

input source file (input.srb):

    class Example
      def methodA
        puts "in class_method"
      end
      def methodB
        puts "in class_method"
      end
    end

The parser for this input should capture Example with the class kind,
and methodA and methodB with the method kind. methodA and methodB
should have Example as their scope. The end: field of each tag should
have a proper value.

optlib file (sub-ruby.ctags):

    --langdef=subRuby
    --map-subRuby=.srb
    --kinddef-subRuby=c,class,classes
    --kinddef-subRuby=m,method,methods
    --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
    --regex-subRuby=/^end///{scope=pop}{placeholder}
    --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
    --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

command line and output:

    $ ctags --quiet --fields=+eK \
        --options=./sub-ruby.ctags -o - input.srb
    Example input.srb /^class Example$/;" class end:8
    methodA input.srb /^  def methodA$/;" method class:Example end:4
    methodB input.srb /^  def methodB$/;" method class:Example end:7

SEE ALSO
The official Universal Ctags web site at:

https://ctags.io/

ctags(1), tags(5), regex(3), regex(7), egrep(1), pcre2syntax(3)

AUTHOR
Universal Ctags project https://ctags.io/ (This man page is partially
derived from ctags(1) of Exuberant Ctags)

Darren Hiebert <dhiebert@users.sourceforge.net>
http://DarrenHiebert.com/

5.9.0                                                    CTAGS-OPTLIB(7)