CTAGS-OPTLIB(7)                 Universal Ctags                CTAGS-OPTLIB(7)

NAME

       ctags-optlib - Universal Ctags parser definition language

SYNOPSIS

       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION

       Exuberant Ctags, the ancestor of Universal Ctags, provided a way to
       define a new parser from the command line. Universal Ctags extends and
       refines this feature. In Universal Ctags, such a parser is called an
       optlib parser. "opt" indicates that the parser is defined with a
       combination of command line options. "lib" indicates that an optlib
       parser can be more than an ad-hoc personal configuration.

       This man page is for people who want to define an optlib parser.
       Readers should read ctags(1) of Universal Ctags first.

       The following options define (or customize) a parser:

          --langdef=<name>

          --map-<LANG>=[+|-]<extension>|<pattern>

          --kinddef-<LANG>=<letter>,<name>,<description>

          --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       The following options control the loading of parser definitions:

          --options=<pathname>

          --options-maybe=<pathname>

          --optlib-dir=[+]<directory>

       The design of the options and notation for defining a parser in
       Exuberant Ctags may have focused on reducing the amount of typing.
       Less typing matters to users who want to define (or customize) a
       parser quickly.

       The design in Universal Ctags, on the other hand, focuses on
       maintainability. The notation of Universal Ctags is more verbose than
       that of Exuberant Ctags: a newly introduced kind must be declared
       explicitly, (long) names are preferred to one-letter flags for
       specifying kinds, and naming rules are stricter.

       This man page explains only stable options and flags. Universal Ctags
       also introduces experimental options and flags, whose names start
       with _. For documentation on these options and flags, visit the
       Universal Ctags web site at https://ctags.io/.

   Storing a parser definition to a file
       Although a parser can be defined on the command line, you do not want
       to type the same command line every time you need the parser. You can
       store the options that define a parser in a file.

       ctags loads the files (preload files) listed in the "FILES" section of
       ctags(1) at program startup. You can put the parser definitions you
       use regularly in these files.

       --options=<pathname>, --options-maybe=<pathname>, and
       --optlib-dir=[+]<directory> are for loading optlib files you need only
       occasionally. See the "Option File Options" section of ctags(1) for
       these options.

       As explained in the "FILES" section of ctags(1), the options that
       define a parser are listed one per line in an optlib file. Leading
       white space is ignored. A line starting with '#' is treated as a
       comment. Shell metacharacters do not need to be escaped.

       Use .ctags as the file extension for an optlib file. You can define
       multiple parsers in one optlib file, but it is better to make a
       separate file for each parser definition.
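
       As a minimal sketch of the file format described above, a
       hypothetical optlib file foo.ctags might look like the following. The
       language name foo, the extension .foo, and the pattern are made up
       for illustration; they are not part of Universal Ctags.

       ```ctags
       # foo.ctags: a hypothetical optlib file (all names are illustrative).
       # A line starting with '#' is a comment.
       --langdef=foo
           --map-foo=+.foo
       --kinddef-foo=d,definition,definitions
       # No shell quoting is needed around '(', ')', '[' or '\' here.
       --regex-foo=/^def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/d/
       ```

       The indented --map-foo line relies on leading white space being
       ignored. Such a file could be loaded occasionally with
       --options=./foo.ctags.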

       The --_echo=<msg> and --_force-quit=<num> options are for debugging an
       optlib parser.
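
       For instance, the two options might be placed in an optlib file as
       sketched below while tracking down a loading problem. The language
       name foo and the messages are illustrative assumptions.

       ```ctags
       --_echo=foo.ctags: start
       --langdef=foo
       --kinddef-foo=d,definition,definitions
       --_echo=foo.ctags: kinds defined
       # Exit here with status 0; the options below are not processed.
       --_force-quit=0
       --regex-foo=/^def[ \t]+([a-zA-Z_]+)/\1/d/
       ```

       Moving the --_force-quit line up or down narrows the point at which
       loading goes wrong.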

   Overview for defining a parser
       1. Design the parser

          You need to know both the target language and the concepts of
          ctags (definition, reference, kind, role, field, extra). For the
          concepts, ctags(1) of Universal Ctags may help you.

       2. Give a name to the parser

          Use the --langdef=<name> option. <name> is referred to as <LANG>
          in the later steps.

       3. Give a file pattern or file extension for activating the parser

          Use --map-<LANG>=[+|-]<extension>|<pattern>.

       4. Define kinds

          Use the --kinddef-<LANG>=<letter>,<name>,<description> option.
          Universal Ctags introduces this option; Exuberant Ctags does not
          have it. In Exuberant Ctags, a kind is defined as a side effect of
          specifying the --regex-<LANG>= option, so the user has no chance
          to recognize how important the definition of a kind is.

       5. Define patterns

          Use the
          --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
          option for a single-line regular expression. You can also use the
          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
          option for a multi-line regular expression.

          As <kind-spec>, you can use the one-letter flag defined with the
          --kinddef-<LANG>=<letter>,<name>,<description> option.
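
       Steps 2 to 5 can be sketched as one small optlib file. The language
       mytodo, the extension .todo, and the PROJECT/TODO line syntax are
       hypothetical and exist only for this illustration.

       ```ctags
       # Step 2: name the parser.
       --langdef=mytodo
       # Step 3: activate it for files with the .todo extension.
       --map-mytodo=+.todo
       # Step 4: declare the kinds before any pattern uses them.
       --kinddef-mytodo=p,project,projects
       --kinddef-mytodo=t,task,tasks
       # Step 5: one pattern per kind; the last letter names a kind above.
       --regex-mytodo=/^PROJECT[ \t]+([a-zA-Z0-9_]+)/\1/p/
       --regex-mytodo=/^TODO[ \t]+([a-zA-Z0-9_]+)/\1/t/
       ```

       Declaring the kinds in step 4 keeps each pattern in step 5 down to a
       single kind letter.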

OPTIONS

       --langdef=<name>
              Defines a new user-defined language, <name>, to be parsed with
              regular expressions. Once defined, <name> may be used in other
              options taking language names.

              <name> must consist of alphanumeric characters, '#', or '+'
              ('[a-zA-Z0-9#+]+'). Graphic characters other than '#' and '+'
              are disallowed (or reserved). Some of them ([-=:{.]) are
              disallowed because they can confuse the command line parser of
              ctags. The rest are reserved for future extensions of ctags.

              all is an exception: it is not acceptable as <name> because it
              is a reserved word. See the description of the
              --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) for
              how the reserved word is used.

              The names of built-in parsers are capitalized. When ctags
              evaluates an option on the command line and chooses a parser,
              it compares parser names case-insensitively. Therefore, giving
              a name that starts with a lowercase character does not help
              you avoid a parser name conflict. In a tags file, however,
              ctags prints parser names case-sensitively: it prints a parser
              name as specified in the --langdef=<name> option. Therefore,
              we recommend giving your private optlib parser a name that
              starts with a lowercase character. With this convention,
              people can tell whether a tag entry in a tags file comes from
              a built-in parser or from a private optlib parser.

       --kinddef-<LANG>=<letter>,<name>,<description>
              Define a kind for <LANG>. Do not confuse this with
              --kinds-<LANG>.

              <letter> must be an alphabetical character ('[a-zA-EG-Z]')
              other than "F". "F" has been reserved for representing a file
              since Exuberant Ctags.

              <name> must start with an alphabetic character, and the rest
              must be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use
              "file" as <name>. It has been reserved for representing a file
              since Exuberant Ctags.

              Note that using a digit in <name> violates version 2 of the
              tags file format, although ctags accepts it. For more detail,
              see tags(5).

              <description> may consist of any printable ASCII characters
              except { and \. { is reserved for adding flags to this option
              in the future, so put \ before { to include { in a
              description. To include \ itself in a description, put \
              before \.

              <letter>, <name>, and their combination must each be unique
              within a <LANG>.

              This option is newly introduced in Universal Ctags. It reduces
              the typing needed to define a regex pattern with
              --regex-<LANG>= and keeps kind definitions consistent within a
              language.

              The <letter> can be used as an argument to the --kinds-<LANG>
              option to enable or disable the kind. Unless the K field is
              enabled, the <letter> is used as the value of the "kind"
              extension field in tags output.

              The <name> surrounded by braces can be used as an argument to
              the --kinds-<LANG> option. If the K field is enabled, the
              <name> is used as the value of the "kind" extension field in
              tags output.

              The <description> and <letter> are listed in --list-kinds
              output. All three elements of the kind-spec are listed in
              --list-kinds-full output. Do not use braces in the
              <description>; they will be used as metacharacters in the
              future.
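
              A minimal sketch of how kind definitions and the
              --kinds-<LANG> option of ctags(1) might be combined. The
              language foo and its kinds are hypothetical.

              ```ctags
              --langdef=foo
              --kinddef-foo=v,variable,variables
              --kinddef-foo=f,function,functions
              # Disable the "variable" kind by its letter ...
              --kinds-foo=-v
              # ... or enable it again by its name in braces.
              --kinds-foo=+{variable}
              ```

              Both forms refer to the same kind; the name form is longer
              but easier to read in an optlib file.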

       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a single-line regular expression.

              The /<line_pattern>/<name_pattern>/ pair defines a regular
              expression replacement pattern, similar in style to sed
              substitution commands (s/regexp/replacement/), with which to
              generate tags from source files mapped to the named language,
              <LANG> (case-insensitive; either a built-in or a user-defined
              language).

              <line_pattern> is an extended regular expression (roughly that
              used by egrep(1)) used to locate a single source line
              containing a tag. It may specify tab characters using \t.

              When a matching line is found, a tag is generated for the name
              defined by <name_pattern>, which generally contains the
              special back-references \1 through \9 to refer to matching
              sub-expression groups within <line_pattern>.

              The '/' separator characters shown in the parameter to the
              option can be replaced by any character. Whichever separator
              character is used must be escaped with a backslash ('\')
              wherever it appears in the parameter as something other than a
              separator. The regular expression defined by this option is
              added to the current list of regular expressions for the
              specified language unless the parameter is omitted, in which
              case the current list is cleared.

              Unless modified by <flags>, <line_pattern> is interpreted as a
              POSIX extended regular expression. The <name_pattern> should
              expand to a non-empty string for every matching line;
              otherwise a warning is reported unless the {placeholder} regex
              flag is specified.

              A kind specifier (<kind-spec>) for tags matching the regexp
              may follow <name_pattern>. It determines what kind of tag is
              reported in the kind extension field (see tags(5)).

              <kind-spec> has two forms: the one-letter form and the full
              form.

              The one-letter form is <letter>. It simply refers to a kind
              <letter> defined with --kinddef-<LANG>. This form is
              recommended in Universal Ctags.

              The full form of <kind-spec> is <letter>,<name>,<description>.
              The kind <name>, the <description>, or both can be omitted.
              See the description of the
              --kinddef-<LANG>=<letter>,<name>,<description> option for the
              elements.

              The full form is supported only for compatibility with
              Exuberant Ctags, which does not have the --kinddef-<LANG>
              option. Support for this form will be removed from Universal
              Ctags in the future.

              For <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

              For more information on the regular expressions used by ctags,
              see either the regex(5,7) man page or the GNU info
              documentation for regex (e.g. "info regex").
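
              The separator rule can be illustrated with a pattern that has
              to match a literal '/'. This is only a sketch: the language
              foo, the kind, and the include syntax are assumptions. The two
              --regex-foo lines are alternatives; use one or the other.

              ```ctags
              --langdef=foo
              --kinddef-foo=i,include,included files
              # With '/' as separator, a '/' in the pattern needs a backslash.
              --regex-foo=/^include[ \t]+"([a-z]+\/[a-z.]+)"/\1/i/
              # With '%' as separator, the same '/' needs no escaping.
              --regex-foo=%^include[ \t]+"([a-z]+/[a-z.]+)"%\1%i%
              ```

              Choosing a separator that does not occur in the pattern keeps
              the expression free of extra backslashes.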

       --list-regex-flags
              Lists the flags that can be used in the --regex-<LANG> option.

       --list-mline-regex-flags
              Lists the flags that can be used in the --mline-regex-<LANG>
              option.

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a multi-line regular expression.

              This option is similar to the --regex-<LANG> option, except
              that the pattern is applied to the whole file's contents, not
              line by line.

       --_echo=<message>
              Print <message> to the standard error stream. This is helpful
              for understanding (and debugging) the optlib loading feature
              of Universal Ctags.

       --_force-quit[=<num>]
              Exit immediately when this option is processed. If <num> is
              given, it is used as the exit status. The default is 0. This
              is helpful for debugging the optlib loading feature of
              Universal Ctags.

   FLAGS FOR --regex-<LANG> OPTION
       You can specify one or more flags, <letter>|{<name>}, at the end of
       --regex-<LANG> to control how Universal Ctags uses the pattern.

       Exuberant Ctags uses a <letter> to represent a flag. In Universal
       Ctags, a <name> surrounded by braces (name form) can be used in
       addition to <letter>. The name form makes an optlib file easier to
       read.

       Most of the flags newly added in Universal Ctags have no one-letter
       representation; they have only the name form. --list-regex-flags
       lists all the flags.

       basic (one-letter form b)
              The pattern is interpreted as a POSIX basic regular expression.

       exclusive (one-letter form x)
              Skip testing the other patterns if a line matches this
              pattern. This is useful to avoid spending CPU time on parsing
              line comments.

       extend (one-letter form e)
              The pattern is interpreted as a POSIX extended regular
              expression (default).

       pcre2 (one-letter form p, experimental)
              The pattern is interpreted as a PCRE2 regular expression as
              explained in pcre2syntax(3). This flag is available only if
              ctags is built with the pcre2 library. See the output of the
              --list-features option to know whether your ctags is built
              with pcre2 or not.

       icase (one-letter form i)
              The regular expression is to be applied in a case-insensitive
              manner.

       placeholder
              Don't emit a tag captured with a regex pattern. The
              replacement can be an empty string. See the following
              description of the scope=... flag for how this is useful.

       scope=(ref|push|pop|clear|set|replace)
          Specify what to do with the internal scope stack.

          A parser programmed with --regex-<LANG> has an internal stack (the
          scope stack). You can use it for tracking scope information. The
          scope=... flag is for manipulating and utilizing the scope stack.

          If {scope=push} is specified, a tag captured with --regex-<LANG> is
          pushed onto the stack. {scope=push} implies {scope=ref}.

          You can fill the scope field (scope:) of a captured tag with
          {scope=ref}. If the {scope=ref} flag is given, ctags attaches the
          tag at the top of the stack to the tag captured with
          --regex-<LANG> as the value of its scope: field.

          ctags pops the tag at the top of the stack when a --regex-<LANG>
          pattern with {scope=pop} matches the input line.

          Specifying {scope=clear} removes all the tags from the scope
          stack. Specifying {scope=set} removes all the tags from the scope
          stack, and then pushes the captured tag as {scope=push} does.

          {scope=replace} does three things in sequence. First it does the
          same as {scope=pop}, then it fills the scope: field of the tag
          captured with --regex-<LANG>, and finally it pushes the tag onto
          the scope stack as if {scope=push} were given. You cannot specify
          another scope action together with {scope=replace}.

          Do not specify {scope=pop}{scope=push} as an alternative to
          {scope=replace}: {scope=pop}{scope=push} first fills the scope:
          field of the tag captured with --regex-<LANG>, then pops the tag
          at the top of the stack, and finally pushes the captured tag onto
          the scope stack. The time at which the end: field is filled
          differs between {scope=replace} and {scope=pop}{scope=push}.

          In some cases, you may want to use --regex-<LANG> only for its side
          effects: to manipulate the stack without capturing a tag. In such
          a case, leave the <name_pattern> component of the --regex-<LANG>
          option empty and specify {placeholder} as a regex flag. For
          example, an unnamed tag can be put on the stack by giving the
          regex flags "{scope=push}{placeholder}".

          You may wonder what happens if a regex pattern with the
          {scope=ref} flag matches an input line but the stack is empty, or
          an unnamed tag is at the top. If the stack is empty, the
          {scope=ref} flag is ignored and nothing is attached to the scope:
          field.

          If the top of the stack contains an unnamed tag, ctags searches
          deeper into the stack to find the top-most named tag. If it
          reaches the bottom of the stack without finding a named tag, the
          {scope=ref} flag is ignored and nothing is attached to the scope:
          field.

          When a named tag on the stack is popped or cleared as a side
          effect of a pattern match, ctags attaches the line number of the
          match to the end: field of the named tag.

          ctags clears all of the tags on the stack when it reaches the end
          of the input source file. The line number of the end is attached
          to the end: field of the cleared tags.

       warning=<message>
              Print the given <message> at WARNING level.

       fatal=<message>
              Print the given <message> and exit.
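
       Several of the flags above can be combined in one parser. The
       following is a sketch for a hypothetical INI-like format (the
       language myini, the extension .myini, and the input syntax are
       assumptions), written on the assumption that patterns are tested in
       the order they are defined.

       ```ctags
       --langdef=myini
       --map-myini=+.myini
       --kinddef-myini=s,section,sections
       --kinddef-myini=k,key,keys
       # Comment lines: skip the remaining patterns and emit no tag.
       --regex-myini=/^[ \t]*;.*///{exclusive}{placeholder}
       # "[name]" starts a section: empty the stack, then push the section.
       --regex-myini=/^\[([^]]+)\]/\1/s/{scope=set}
       # "key = value": tag the key with the section on top as its scope.
       --regex-myini=/^([a-z][a-z0-9_]*)[ \t]*=/\1/k/{icase}{scope=ref}
       ```

       {scope=set} is used here rather than {scope=push} because sections in
       this assumed format do not nest; each new section replaces the
       previous one as the current scope.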

EXAMPLES

   Perl Pod
       This is the definition (pod.ctags) used in ctags for parsing Pod
       (https://perldoc.perl.org/perlpod.html) files.

          --langdef=pod
          --map-pod=+.pod

          --kinddef-pod=c,chapter,chapters
          --kinddef-pod=s,section,sections
          --kinddef-pod=S,subsection,subsections
          --kinddef-pod=t,subsubsection,subsubsections

          --regex-pod=/^=head1[ \t]+(.+)/\1/c/
          --regex-pod=/^=head2[ \t]+(.+)/\1/s/
          --regex-pod=/^=head3[ \t]+(.+)/\1/S/
          --regex-pod=/^=head4[ \t]+(.+)/\1/t/

   Using scope regex flags
       Consider writing a parser for a very small subset of the Ruby
       language.

       input source file (input.srb):

          class Example
            def methodA
                  puts "in class_method"
            end
            def methodB
                  puts "in class_method"
            end
          end

       The parser for this input should capture Example with the class kind,
       and methodA and methodB with the method kind. methodA and methodB
       should have Example as their scope. The end: field of each tag should
       have a proper value.

       optlib file (sub-ruby.ctags):

          --langdef=subRuby
          --map-subRuby=.srb
          --kinddef-subRuby=c,class,classes
          --kinddef-subRuby=m,method,methods
          --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
          --regex-subRuby=/^end///{scope=pop}{placeholder}
          --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
          --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

       command line and output:

          $ ctags --quiet --fields=+eK \
          --options=./sub-ruby.ctags -o - input.srb
          Example input.srb       /^class Example$/;"     class   end:8
          methodA input.srb       /^  def methodA$/;"     method  class:Example   end:4
          methodB input.srb       /^  def methodB$/;"     method  class:Example   end:7

SEE ALSO

       The official Universal Ctags web site at:

       https://ctags.io/

       ctags(1), tags(5), regex(3), regex(7), egrep(1), pcre2syntax(3)

AUTHOR

       Universal Ctags project https://ctags.io/ (This man page is partially
       derived from ctags(1) of Exuberant Ctags.)

       Darren Hiebert <dhiebert@users.sourceforge.net>
       http://DarrenHiebert.com/

5.9.0                                                          CTAGS-OPTLIB(7)