CTAGS-OPTLIB(7)                 Universal Ctags                CTAGS-OPTLIB(7)

NAME

       ctags-optlib - Universal Ctags parser definition language

SYNOPSIS

       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION

       Exuberant Ctags, the ancestor of Universal Ctags, provided a way to
       define a new parser from the command line. Universal Ctags extends and
       refines this feature. In Universal Ctags, such a parser is called an
       optlib parser. "opt" indicates that the parser is defined with a
       combination of command line options. "lib" indicates that an optlib
       parser can be more than an ad-hoc personal configuration.

       This man page is for people who want to define an optlib parser.
       Readers should read ctags(1) of Universal Ctags first.

       The following options are for defining (or customizing) a parser:

          --langdef=<name>

          --map-<LANG>=[+|-]<extension>|<pattern>

          --kinddef-<LANG>=<letter>,<name>,<description>

          --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]

       The following options control how parser definitions are loaded:

          --options=<pathname>

          --options-maybe=<pathname>

          --optlib-dir=[+]<directory>

       The design of the options and notation for defining a parser in
       Exuberant Ctags appears to focus on reducing the amount of typing
       required of the user. That matters to users who want to define (or
       customize) a parser quickly.

       The design in Universal Ctags, on the other hand, focuses on
       maintainability. The notation of Universal Ctags is more verbose than
       that of Exuberant Ctags: a newly introduced kind must be declared
       explicitly, (long) names are preferred to one-letter flags for
       specifying kinds, and naming rules are stricter.

       This man page explains only stable options and flags. Universal Ctags
       also introduces experimental options and flags, whose names start
       with _. For documentation on these options and flags, visit the
       Universal Ctags web site at https://ctags.io/.

   Storing a parser definition to a file
       Though it is possible to define a parser on the command line, you
       don't want to type the same command line every time you need the
       parser. You can store the options that define a parser in a file.

       ctags loads the files (preload files) listed in the "FILES" section of
       ctags(1) at startup. You can put the parser definitions you use
       regularly in these files.

       --options=<pathname>, --options-maybe=<pathname>, and
       --optlib-dir=[+]<directory> are for loading optlib files that you need
       only occasionally. See the "Option File Options" section of ctags(1)
       for these options.

       As explained in the "FILES" section of ctags(1), the options that
       define a parser are listed one per line in an optlib file. Leading
       white space is ignored. A line starting with '#' is treated as a
       comment. Shell meta characters do not need to be escaped.

       Use .ctags as the file extension of an optlib file. You can define
       multiple parsers in one optlib file, but it is better to make a
       separate file for each parser definition.

       The --_echo=<msg> and --_force-quit=<num> options are for debugging
       an optlib parser.

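       As a minimal sketch of these rules, an optlib file might look like the
       following. The file name mylang.ctags, the language name mylang, the
       .ml0 extension, and the kind are hypothetical and chosen only for
       illustration:

          # mylang.ctags: a line starting with '#' is a comment
          --langdef=mylang
          # leading white space before the next option is ignored
            --map-mylang=+.ml0
          --kinddef-mylang=f,function,functions
          # no shell quoting is needed around the pattern
          --regex-mylang=/^def[ \t]+([a-zA-Z][a-zA-Z0-9]*)/\1/f/

       Such a file could then be loaded with --options=./mylang.ctags.
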
   Overview for defining a parser
       1. Design the parser

          You need to know both the target language and the concepts of ctags
          (definition, reference, kind, role, field, extra). For the
          concepts, ctags(1) of Universal Ctags may help you.

       2. Give a name to the parser

          Use the --langdef=<name> option. <name> is referred to as <LANG> in
          the later steps.

       3. Give a file pattern or file extension for activating the parser

          Use --map-<LANG>=[+|-]<extension>|<pattern>.

       4. Define kinds

          Use the --kinddef-<LANG>=<letter>,<name>,<description> option.
          Universal Ctags introduces this option; Exuberant Ctags doesn't
          have it. In Exuberant Ctags, a kind is defined as a side effect of
          specifying the --regex-<LANG>= option, so users have no chance to
          recognize how important the definition of a kind is.

       5. Define patterns

          Use the
          --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
          option for a single-line regular expression. You can also use the
          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]
          option for a multi-line regular expression.

          As <kind-spec>, you can use the one-letter flag defined with the
          --kinddef-<LANG>=<letter>,<name>,<description> option.

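       Steps 2 to 5 might come together as in the sketch below, which tags
       section headers such as [core] in an INI-like format. The language
       name inisub, the .inis extension, and the kind are hypothetical names
       invented for this example:

          # step 2: name the parser
          --langdef=inisub
          # step 3: activate it for files with the .inis extension
          --map-inisub=+.inis
          # step 4: declare the kind before using it
          --kinddef-inisub=s,section,sections
          # step 5: capture the section name with the one-letter kind
          --regex-inisub=/^\[([a-zA-Z0-9_]+)\]/\1/s/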

OPTIONS

       --langdef=<name>
              Defines a new user-defined language, <name>, to be parsed with
              regular expressions. Once defined, <name> may be used in other
              options taking language names.

              <name> must consist of alphanumeric characters, '#', or '+'
              ('[a-zA-Z0-9#+]+'). Graph characters other than '#' and '+' are
              disallowed (or reserved). Some of them ([-=:{.]) are disallowed
              because they can confuse the command line parser of ctags. The
              rest are reserved for future extensions of ctags.

              all is an exception: all is not acceptable as <name> because it
              is a reserved word. See the description of the
              --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) for how
              the reserved word is used.

              NONE is another exception: NONE is not acceptable as <name>.

              The names of built-in parsers are capitalized. When ctags
              evaluates an option on a command line and chooses a parser, it
              treats parser names case-insensitively. Therefore, giving a name
              that starts with a lowercase character doesn't help you avoid a
              parser name conflict. In a tags file, however, ctags prints
              parser names case-sensitively; it prints a parser name as
              specified in the --langdef=<name> option. Therefore, we
              recommend that you give your private optlib parser a name that
              starts with a lowercase character. With this convention, people
              can tell whether a tag entry in a tags file comes from a
              built-in parser or a private optlib parser.

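              For illustration, the naming rules above could be applied as
              follows; all of the names are hypothetical:

                 # acceptable: alphanumeric characters, '#', and '+' only
                 --langdef=mylang
                 --langdef=mylang2
                 # not acceptable: '-' is a disallowed character
                 #   --langdef=my-lang
                 # not acceptable: reserved words
                 #   --langdef=all
                 #   --langdef=NONE
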
       --kinddef-<LANG>=<letter>,<name>,<description>
              Define a kind for <LANG>. Do not confuse this option with
              --kinds-<LANG>.

              <letter> must be an alphabetical character ('[a-zA-EG-Z]')
              other than "F". "F" has been reserved for representing a file
              since Exuberant Ctags.

              <name> must start with an alphabetic character, and the rest
              must be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use
              "file" as <name>. It has been reserved for representing a file
              since Exuberant Ctags.

              Note that using a digit in a <name> violates version 2 of the
              tags file format, though ctags accepts it. For more detail, see
              tags(5).

              <description> may consist of any printable ASCII characters.
              The exceptions are { and \. { is reserved for adding flags to
              this option in the future, so put \ before { to include { in a
              description. To include \ itself in a description, put \ before
              \.

              <letter>, <name>, and their combination must each be unique
              within a <LANG>.

              This option is newly introduced in Universal Ctags. It reduces
              the typing needed to define a regex pattern with
              --regex-<LANG>=, and keeps the kind definitions of a language
              consistent.

              The <letter> can be used as an argument for the --kinds-<LANG>
              option to enable or disable the kind. Unless the K field is
              enabled, the <letter> is used as the value of the "kind"
              extension field in tags output.

              The <name> surrounded by braces can be used as an argument for
              the --kinds-<LANG> option. If the K field is enabled, the
              <name> is used as the value of the "kind" extension field in
              tags output.

              The <description> and <letter> are listed in --list-kinds
              output. All three elements of the kind-spec are listed in
              --list-kinds-full output. Don't use braces in the
              <description>; they will be used as meta characters in the
              future.

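              As an illustration, kinds for a hypothetical language mylang
              might be declared and then toggled as follows. The language
              name, the kinds, and the descriptions are assumptions made for
              this sketch:

                 --kinddef-mylang=f,function,functions
                 --kinddef-mylang=v,variable,variables
                 # disable the "variable" kind by its letter
                 --kinds-mylang=-v
                 # enable it again by its name surrounded by braces
                 --kinds-mylang=+{variable}
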
       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a single-line regular expression.

              The /<line_pattern>/<name_pattern>/ pair defines a regular
              expression replacement pattern, similar in style to sed
              substitution commands, s/regexp/replacement/, with which to
              generate tags from source files mapped to the named language,
              <LANG> (case-insensitive; either a built-in or user-defined
              language).

              <line_pattern> is an extended regular expression (roughly that
              used by egrep(1)) used to locate a single source line
              containing a tag. It may specify tab characters using \t.

              When a matching line is found, a tag is generated for the name
              defined by <name_pattern>, which generally contains the special
              back-references \1 through \9 to refer to matching
              sub-expression groups within <line_pattern>.

              The '/' separator characters shown in the parameter to the
              option can actually be replaced by any character. Note that
              whichever separator character is used must be escaped with a
              backslash ('\') character wherever it appears in the parameter
              as something other than a separator. The regular expression
              defined by this option is added to the current list of regular
              expressions for the specified language unless the parameter is
              omitted, in which case the current list is cleared.

              Unless modified by <flags>, <line_pattern> is interpreted as a
              POSIX extended regular expression. The <name_pattern> should
              expand for all matching lines to a non-empty string of
              characters; otherwise a warning message is reported unless the
              {placeholder} regex flag is specified.

              A kind specifier (<kind-spec>) for tags matching the regexp may
              follow <name_pattern>. It determines what kind of tag is
              reported in the "kind" extension field (see tags(5)).

              <kind-spec> has two forms: the one-letter form and the full
              form.

              The one-letter form is just <letter>. It refers to the kind
              <letter> defined with --kinddef-<LANG>. This form is
              recommended in Universal Ctags.

              The full form of <kind-spec> is <letter>,<name>,<description>.
              The kind <name>, the <description>, or both can be omitted. See
              the description of the
              --kinddef-<LANG>=<letter>,<name>,<description> option for the
              elements.

              The full form is supported only for compatibility with
              Exuberant Ctags, which does not have the --kinddef-<LANG>
              option. Support for the full form will be removed from
              Universal Ctags in the future.

              For <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

              For more information on the regular expressions used by ctags,
              see either the regex(5,7) man page, or the GNU info
              documentation for regex (e.g. "info regex").

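              The following sketch shows a back-reference and an alternative
              separator. The language mylang and its kinds are hypothetical
              and are defined here only for the sake of the example:

                 --kinddef-mylang=f,function,functions
                 --kinddef-mylang=i,include,included files
                 # \1 expands to the text matched by the first group
                 --regex-mylang=/^def[ \t]+([a-zA-Z][a-zA-Z0-9]*)/\1/f/
                 # '%' as the separator, so '/' needs no escaping
                 --regex-mylang=%^include[ \t]+"([^"]+/[^"]+)"%\1%i%
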
       --list-regex-flags
              Lists the flags that can be used in the --regex-<LANG> option.

       --list-mline-regex-flags
              Lists the flags that can be used in the --mline-regex-<LANG>
              option.

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]
              Define a multi-line regular expression.

              This option is similar to the --regex-<LANG> option except that
              the pattern is applied to the whole file's contents, not line
              by line.

              See "FLAGS FOR --mline-regex-<LANG> OPTION" about
              {mgroup=<N>}. The {mgroup=<N>} flag is mandatory.

       --_echo=<message>
              Print <message> to the standard error stream. This is helpful
              for understanding (and debugging) the optlib loading feature of
              Universal Ctags.

       --_force-quit[=<num>]
              Exit immediately when this option is processed. If <num> is
              given, it is used as the exit status. The default is 0. This is
              helpful for debugging the optlib loading feature of Universal
              Ctags.

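              As a sketch, these two debugging options could be placed in a
              hypothetical optlib file to check how far loading gets. The
              message text, the language name mylang, and the .ml0 extension
              are illustrative assumptions:

                 --langdef=mylang
                 --_echo=mylang: langdef processed
                 --map-mylang=+.ml0
                 # exit here with status 1; later options are not processed
                 --_force-quit=1
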
   FLAGS FOR --regex-<LANG> OPTION
       You can specify more than one flag, <letter>|{<name>}, at the end of
       --regex-<LANG> to control how Universal Ctags uses the pattern.

       Exuberant Ctags uses a <letter> to represent a flag. In Universal
       Ctags, a <name> surrounded by braces (name form) can be used in
       addition to <letter>. The name form makes an optlib file easier to
       read.

       Most of the flags newly added in Universal Ctags don't have a
       one-letter representation; they have only the name representation.
       --list-regex-flags lists all the flags.

       basic (one-letter form b)
              The pattern is interpreted as a POSIX basic regular expression.

       exclusive (one-letter form x)
              Skip testing the other patterns if a line matches this pattern.
              This is useful to avoid spending CPU time on parsing line
              comments.

       extend (one-letter form e)
              The pattern is interpreted as a POSIX extended regular
              expression (default).

       pcre2 (one-letter form p, experimental)
              The pattern is interpreted as a PCRE2 regular expression as
              explained in pcre2syntax(3). This flag is available only if
              ctags is built with the pcre2 library. See the output of the
              --list-features option to find out whether your ctags is built
              with pcre2.

       icase (one-letter form i)
              The regular expression is to be applied in a case-insensitive
              manner.

       placeholder
              Don't emit a tag captured with the regex pattern. The
              replacement can be an empty string. See the following
              description of the scope=... flag for how this is useful.

       scope=(ref|push|pop|clear|set|replace)
          Specify what to do with the internal scope stack.

          A parser programmed with --regex-<LANG> has a stack (scope stack)
          internally. You can use it for tracking scope information. The
          scope=... flag is for manipulating and utilizing the scope stack.

          If {scope=push} is specified, a tag captured with --regex-<LANG> is
          pushed onto the stack. {scope=push} implies {scope=ref}.

          You can fill the scope field (scope:) of a captured tag with
          {scope=ref}. If the {scope=ref} flag is given, ctags attaches the
          tag at the top of the stack to the tag captured with
          --regex-<LANG> as the value of the scope: field.

          ctags pops the tag at the top of the stack when a --regex-<LANG>
          pattern with {scope=pop} matches the input line.

          Specifying {scope=clear} removes all the tags from the scope stack.
          Specifying {scope=set} removes all the tags from the scope stack,
          and then pushes the captured tag as {scope=push} does.

          {scope=replace} does three things in sequence. First it does the
          same as {scope=pop}, then it fills the scope: field of the tag
          captured with --regex-<LANG>, and finally it pushes the tag onto
          the scope stack as if {scope=push} were given. You cannot specify
          another scope action together with {scope=replace}.

          Do not specify {scope=pop}{scope=push} as an alternative to
          {scope=replace}; {scope=pop}{scope=push} first fills the scope:
          field of the tag captured with --regex-<LANG>, then pops the tag at
          the top of the stack, and finally pushes the captured tag onto the
          scope stack. The timing of filling the end field differs between
          {scope=replace} and {scope=pop}{scope=push}.

          In some cases, you may want to use --regex-<LANG> only for its side
          effects: to manipulate the stack without capturing a tag. In such a
          case, leave the <name_pattern> component of the --regex-<LANG>
          option empty and specify {placeholder} as a regex flag. For
          example, an unnamed tag can be put on the stack by giving the regex
          flags "{scope=push}{placeholder}".

          You may wonder what happens if a regex pattern with the {scope=ref}
          flag matches an input line but the stack is empty, or an unnamed
          tag is at the top. If the stack is empty, the {scope=ref} flag is
          ignored and nothing is attached to the scope: field.

          If the top of the stack contains an unnamed tag, ctags searches
          deeper into the stack to find the top-most named tag. If it reaches
          the bottom of the stack without finding a named tag, the
          {scope=ref} flag is ignored and nothing is attached to the scope:
          field.

          When a named tag on the stack is popped or cleared as a side effect
          of a pattern match, ctags attaches the line number of the match to
          the end: field of the named tag.

          ctags clears all of the tags on the stack when it reaches the end
          of the input source file. The line number of the end of the file is
          attached to the end: field of the cleared tags.

       warning=<message>
              Print the given <message> at WARNING level.

       fatal=<message>
              Print the given <message> and exit.

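       As a sketch of how flags are written, the following lines use the
       hypothetical language mylang, whose function kind f is assumed to be
       defined with --kinddef-mylang. The first two lines show the two
       spellings of the icase flag, and the third combines two name-form
       flags:

          # one-letter form of icase
          --regex-mylang=/^sub[ \t]+([a-z0-9]+)/\1/f/i
          # name form of icase
          --regex-mylang=/^function[ \t]+([a-z0-9]+)/\1/f/{icase}
          # stop testing other patterns on comment lines, emitting no tag
          --regex-mylang=/^[ \t]*#///{exclusive}{placeholder}
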
   FLAGS FOR --mline-regex-<LANG> OPTION
       mgroup=<N>
          Decides the location of the tag extracted with the
          --mline-regex-<LANG> option.

          <N> is the number of a capture group in the pattern, which is used
          to record the line number location of the tag. The mgroup=<N> flag
          is not optional. You must add an mgroup=<N> flag even if <N> is 0
          (meaning the start position of the whole regex pattern).

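          For example, assuming a hypothetical language mylang in which a
          keyword and the name it introduces may sit on different lines, a
          multi-line pattern might be written as below. The kind and the
          pattern are illustrative only, and the use of \n inside a bracket
          expression is an assumption about the regex syntax ctags accepts:

             --kinddef-mylang=t,template,templates
             # the tag is recorded at the line where group 1 (the name) starts
             --mline-regex-mylang=/template[ \n]+([a-zA-Z0-9]+)/\1/t/{mgroup=1}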

EXAMPLES

   Perl Pod
       This is the definition (pod.ctags) used in ctags for parsing Pod
       (https://perldoc.perl.org/perlpod.html) files.

          --langdef=pod
          --map-pod=+.pod

          --kinddef-pod=c,chapter,chapters
          --kinddef-pod=s,section,sections
          --kinddef-pod=S,subsection,subsections
          --kinddef-pod=t,subsubsection,subsubsections

          --regex-pod=/^=head1[ \t]+(.+)/\1/c/
          --regex-pod=/^=head2[ \t]+(.+)/\1/s/
          --regex-pod=/^=head3[ \t]+(.+)/\1/S/
          --regex-pod=/^=head4[ \t]+(.+)/\1/t/

   Using scope regex flags
       Let's think about writing a parser for a very small subset of the Ruby
       language.

       input source file (input.srb):

          class Example
            def methodA
                  puts "in class_method"
            end
            def methodB
                  puts "in class_method"
            end
          end

       The parser for the input should capture Example with the class kind,
       and methodA and methodB with the method kind. methodA and methodB
       should have Example as their scope. The end: field of each tag should
       have a proper value.

       optlib file (sub-ruby.ctags):

          --langdef=subRuby
          --map-subRuby=.srb
          --kinddef-subRuby=c,class,classes
          --kinddef-subRuby=m,method,methods
          --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
          --regex-subRuby=/^end///{scope=pop}{placeholder}
          --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
          --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

       command line and output:

          $ ctags --quiet --fields=+eK \
          --options=./sub-ruby.ctags -o - input.srb
          Example input.srb       /^class Example$/;"     class   end:8
          methodA input.srb       /^  def methodA$/;"     method  class:Example   end:4
          methodB input.srb       /^  def methodB$/;"     method  class:Example   end:7

SEE ALSO

       The official Universal Ctags web site at:

       https://ctags.io/

       ctags(1), tags(5), regex(3), regex(7), egrep(1), pcre2syntax(3)

AUTHOR

       Universal Ctags project https://ctags.io/ (This man page is partially
       derived from ctags(1) of Exuberant Ctags)

       Darren Hiebert <dhiebert@users.sourceforge.net>
       http://DarrenHiebert.com/

6.0.0                                                          CTAGS-OPTLIB(7)