CTAGS-OPTLIB(7)                 Universal Ctags                CTAGS-OPTLIB(7)

NAME

       ctags-optlib - Universal Ctags parser definition language

SYNOPSIS

       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION

       Exuberant Ctags, the ancestor of Universal Ctags, provided a way to
       define a new parser from the command line. Universal Ctags extends and
       refines this feature. A parser defined this way is called an optlib
       parser in Universal Ctags. "opt" indicates that the parser is defined
       with a combination of command line options. "lib" indicates that an
       optlib parser can be more than an ad-hoc personal configuration.

       This man page is for people who want to define an optlib parser.
       Readers should read ctags(1) of Universal Ctags first.

       The following options are for defining (or customizing) a parser:

       --langdef=<name>

       --map-<LANG>=[+|-]<extension>|<pattern>

       --kinddef-<LANG>=<letter>,<name>,<description>

       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       The following options control the loading of parser definitions:

       --options=<pathname>

       --options-maybe=<pathname>

       --optlib-dir=[+]<directory>

       The design of options and notations for defining a parser in Exuberant
       Ctags may focus on reducing the amount of typing by the user. Reducing
       the amount of typing is important for users who want to define (or
       customize) a parser quickly.

       On the other hand, the design in Universal Ctags focuses on
       maintainability. The notation of Universal Ctags is more redundant than
       that of Exuberant Ctags: a newly introduced kind must be declared
       explicitly, (long) names are preferred over one-letter flags for
       specifying kinds, and naming rules are stricter.

       This man page explains only stable options and flags. Universal Ctags
       also introduces experimental options and flags whose names start with
       _. For documentation on these options and flags, visit the Universal
       Ctags web site at https://ctags.io/.

   Storing a parser definition to a file
       Though it is possible to define a parser on the command line, you do
       not want to type the same command line every time you need the parser.
       You can store the options that define a parser in a file.

       ctags loads the files (preload files) listed in the "FILES" section of
       ctags(1) at program startup. You can put parser definitions that you
       use regularly in these files.

       --options=<pathname>, --options-maybe=<pathname>, and
       --optlib-dir=[+]<directory> are for loading optlib files that you need
       only occasionally. See the "Option File Options" section of ctags(1)
       for these options.

       As explained in the "FILES" section of ctags(1), the options defining a
       parser are listed line by line in an optlib file. Leading white space
       is ignored. A line starting with '#' is treated as a comment. Escaping
       shell meta characters is not needed.
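
       As a minimal sketch of this syntax, the following optlib file defines
       a parser for a made-up input format. The language name greet, the file
       name greet.ctags, and the .greet extension are hypothetical, chosen
       only for illustration.

          # greet.ctags (hypothetical): this line is a comment.
          --langdef=greet
          --map-greet=+.greet
          --kinddef-greet=g,greeting,greetings
          # The white space before the next option is ignored.
              --regex-greet=/^hello[ \t]+([a-z]+)/\1/g/

       The brackets, parentheses, and backslashes in the pattern are written
       as they are; on a shell command line the same option would have to be
       quoted. Such a file could be loaded with, for example,
       ctags --options=./greet.ctags -o - input.greet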

       Use .ctags as the file extension for an optlib file. You can define
       multiple parsers in one optlib file, but it is better to make one file
       for each parser definition.

       The --_echo=<msg> and --_force-quit=<num> options are for debugging an
       optlib parser.

   Overview for defining a parser
       1. Design the parser

          You need to know both the target language and the concepts of ctags
          (definition, reference, kind, role, field, extra). For the concepts,
          ctags(1) of Universal Ctags may help you.

       2. Give a name to the parser

          Use the --langdef=<name> option. <name> is referred to as <LANG> in
          the later steps.

       3. Give a file pattern or file extension for activating the parser

          Use --map-<LANG>=[+|-]<extension>|<pattern>.

       4. Define kinds

          Use the --kinddef-<LANG>=<letter>,<name>,<description> option.
          Universal Ctags introduces this option; Exuberant Ctags does not
          have it. In Exuberant Ctags, a kind is defined as a side effect of
          specifying the --regex-<LANG>= option, so the user has no chance to
          recognize how important the definition of a kind is.

       5. Define patterns

          Use the
          --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
          option for a single-line regular expression. You can also use the
          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
          option for a multi-line regular expression.

          As <kind-spec>, you can use the one-letter flag defined with the
          --kinddef-<LANG>=<letter>,<name>,<description> option.
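
       Steps 2 to 5 can be sketched as the optlib file below. The language
       name notes, the .notes extension, and the input format (lines of the
       form "topic: <text>" and "- <text>") are assumptions made up for this
       illustration, not part of ctags.

          # Step 2: name the parser.
          --langdef=notes
          # Step 3: activate it for files with the .notes extension.
          --map-notes=+.notes
          # Step 4: declare every kind before a pattern refers to it.
          --kinddef-notes=t,topic,topics
          --kinddef-notes=i,item,items
          # Step 5: one pattern per kind; \1 is the first captured group.
          --regex-notes=/^topic:[ \t]*(.+)/\1/t/
          --regex-notes=/^-[ \t]+(.+)/\1/i/

       Each pattern refers to its kind only by the one-letter flag, so the
       kind name and description are written once, in the --kinddef-notes
       lines.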

OPTIONS

       --langdef=<name>
              Defines a new user-defined language, <name>, to be parsed with
              regular expressions. Once defined, <name> may be used in other
              options taking language names.

              <name> must consist of alphanumeric characters, '#', or '+'
              ('[a-zA-Z0-9#+]+'). Graph characters other than '#' and '+' are
              disallowed (or reserved). Some of them ([-=:{.]) are disallowed
              because they can confuse the command line parser of ctags. The
              rest are reserved for future extensions of ctags.

              all is an exception: it is a reserved word and is not acceptable
              as <name>. See the description of the
              --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) for how
              the reserved word is used.

              The names of built-in parsers are capitalized. When ctags
              evaluates an option on a command line and chooses a parser, it
              compares parser names case-insensitively. Therefore, giving a
              name that starts with a lowercase character does not help you
              avoid a parser name conflict. However, in a tags file, ctags
              prints parser names case-sensitively; it prints a parser name as
              specified in the --langdef=<name> option. Therefore, we
              recommend giving your private optlib parser a name that starts
              with a lowercase character. With this convention, people can
              tell whether a tag entry in a tags file comes from a built-in
              parser or a private optlib parser.

       --kinddef-<LANG>=<letter>,<name>,<description>
              Define a kind for <LANG>. Do not confuse this option with
              --kinds-<LANG>.

              <letter> must be an alphabetical character ('[a-zA-EG-Z]') other
              than "F". "F" has been reserved for representing a file since
              Exuberant Ctags.

              <name> must start with an alphabetic character, and the rest
              must be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use
              "file" as <name>. It has been reserved for representing a file
              since Exuberant Ctags.

              Note that using a digit in a <name> violates version 2 of the
              tags file format, though ctags accepts it. For more detail, see
              tags(5).

              <description> may consist of any printable ASCII characters
              except { and \. { is reserved for adding flags to this option in
              the future, so put \ before { to include { in a description. To
              include \ itself in a description, put \ before \.

              The <letter>, the <name>, and their combination must each be
              unique within a <LANG>.

              This option is newly introduced in Universal Ctags. It reduces
              the typing needed to define a regex pattern with
              --regex-<LANG>=, and keeps the kind definitions of a language
              consistent.

              The <letter> can be used as an argument for the --kinds-<LANG>
              option to enable or disable the kind. Unless the K field is
              enabled, the <letter> is used as the value of the "kind"
              extension field in tags output.

              The <name> surrounded by braces can be used as an argument for
              the --kinds-<LANG> option. If the K field is enabled, the <name>
              is used as the value of the "kind" extension field in tags
              output.

              The <description> and <letter> are listed in --list-kinds
              output. All three elements of the kind-spec are listed in
              --list-kinds-full output. Do not use braces in the
              <description>. They will be used as meta characters in the
              future.
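
              As a hedged sketch, the fragment below declares two kinds for a
              hypothetical language named notes and then disables one of
              them, first by its letter and then by its name in braces. The
              language and kind names are made up for illustration, and the
              two --kinds-notes lines are meant as alternatives with the same
              effect.

                 --langdef=notes
                 --kinddef-notes=t,topic,topics
                 --kinddef-notes=i,item,items
                 # Disable the "item" kind by its letter ...
                 --kinds-notes=-i
                 # ... or by its name surrounded by braces.
                 --kinds-notes=-{item}

              Assuming the fragment is saved as ./notes.ctags, the declared
              kinds could be inspected with
              ctags --options=./notes.ctags --list-kinds-full=notes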

       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a single-line regular expression.

              The /<line_pattern>/<name_pattern>/ pair defines a regular
              expression replacement pattern, similar in style to sed
              substitution commands, s/regexp/replacement/, with which to
              generate tags from source files mapped to the named language,
              <LANG> (case-insensitive; either a built-in or user-defined
              language).

              <line_pattern> is an extended regular expression (roughly that
              used by egrep(1)), which is used to locate a single source line
              containing a tag. It may specify tab characters using \t.

              When a matching line is found, a tag is generated for the name
              defined by <name_pattern>, which generally contains the special
              back-references \1 through \9 to refer to matching
              sub-expression groups within <line_pattern>.

              The '/' separator characters shown in the parameter to the
              option can actually be replaced by any character. Whichever
              separator character is used must be escaped with a backslash
              ('\') wherever it appears in the parameter as something other
              than a separator. The regular expression defined by this option
              is added to the current list of regular expressions for the
              specified language unless the parameter is omitted, in which
              case the current list is cleared.

              Unless modified by <flags>, <line_pattern> is interpreted as a
              POSIX extended regular expression. The <name_pattern> should
              expand to a non-empty string for every matching line; otherwise
              a warning message is reported unless the {placeholder} regex
              flag is specified.

              A kind specifier (<kind-spec>) for tags matching the regexp may
              follow <name_pattern>. It determines what kind of tag is
              reported in the kind extension field (see tags(5)).

              <kind-spec> has two forms: the one-letter form and the full
              form.

              The one-letter form is just <letter>. It refers to a kind
              <letter> defined with --kinddef-<LANG>. This form is recommended
              in Universal Ctags.

              The full form of <kind-spec> is <letter>,<name>,<description>.
              The kind <name>, the <description>, or both can be omitted. See
              the description of the
              --kinddef-<LANG>=<letter>,<name>,<description> option for the
              elements.

              The full form is supported only for compatibility with
              Exuberant Ctags, which does not have the --kinddef-<LANG>
              option. Support for the full form will be removed from
              Universal Ctags in the future.

              For <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

              For more information on the regular expressions used by ctags,
              see either the regex(5,7) man page, or the GNU info
              documentation for regex (e.g. "info regex").
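
              The separator rule can be illustrated with the sketch below.
              The language name paths, the .paths extension, and the input
              format (lines such as "dir /srv/www") are hypothetical. The
              first pattern uses '/' as the separator and therefore escapes
              every literal '/'; the commented-out line shows what the same
              pattern would look like with '%' as the separator.

                 --langdef=paths
                 --map-paths=+.paths
                 --kinddef-paths=d,dir,directories
                 # '/' is the separator, so a literal '/' is written as '\/'.
                 --regex-paths=/^dir[ \t]+\/srv\/([a-z]+)/\1/d/
                 # The same pattern with '%' as the separator:
                 #   --regex-paths=%^dir[ \t]+/srv/([a-z]+)%\1%d%

              In both forms \1 expands to the text matched by the first
              parenthesized group, which becomes the tag name.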

       --list-regex-flags
              Lists the flags that can be used in the --regex-<LANG> option.

       --list-mline-regex-flags
              Lists the flags that can be used in the --mline-regex-<LANG>
              option.

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a multi-line regular expression.

              This option is similar to the --regex-<LANG> option except that
              the pattern is applied to the whole file's contents, not line by
              line.

       --_echo=<message>
              Print <message> to the standard error stream. This is helpful
              for understanding (and debugging) the optlib loading feature of
              Universal Ctags.

       --_force-quit[=<num>]
              Exits immediately when this option is processed. If <num> is
              given, it is used as the exit status. The default is 0. This is
              helpful for debugging the optlib loading feature of Universal
              Ctags.
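
              A minimal sketch of how the two debugging options might be
              combined in an optlib file follows. The language name notes and
              the message texts are hypothetical. The sketch assumes that
              options are processed in the order in which they are listed.

                 --_echo=notes.ctags: start loading
                 --langdef=notes
                 --map-notes=+.notes
                 --_echo=notes.ctags: language and map defined, quitting
                 # Stop here with exit status 0; nothing below is processed.
                 --_force-quit=0
                 --kinddef-notes=t,topic,topics

              If the second message does not appear on the standard error
              stream, the problem lies in one of the options before it.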

   FLAGS FOR --regex-<LANG> OPTION
       You can specify more than one flag, <letter>|{<name>}, at the end of
       --regex-<LANG> to control how Universal Ctags uses the pattern.

       Exuberant Ctags uses a <letter> to represent a flag. In Universal
       Ctags, a <name> surrounded by braces (name form) can be used in
       addition to <letter>. The name form makes an optlib file easier to
       read.

       Most of the flags newly added in Universal Ctags have no one-letter
       representation; they have only the name representation.
       --list-regex-flags lists all the flags.

       basic (one-letter form b)
              The pattern is interpreted as a POSIX basic regular expression.

       exclusive (one-letter form x)
              Skip testing the other patterns if a line matches this pattern.
              This is useful to avoid spending CPU time on parsing line
              comments.

       extend (one-letter form e)
              The pattern is interpreted as a POSIX extended regular
              expression (default).

       icase (one-letter form i)
              The regular expression is applied in a case-insensitive manner.

       placeholder
              Don't emit a tag captured with the regex pattern. The
              replacement can be an empty string. See the following
              description of the scope=... flag for how this is useful.

       scope=(ref|push|pop|clear|set)
          Specify what to do with the internal scope stack.

          A parser programmed with --regex-<LANG> has an internal stack (the
          scope stack). You can use it for tracking scope information. The
          scope=... flag is for manipulating and utilizing the scope stack.

          If {scope=push} is specified, a tag captured with --regex-<LANG> is
          pushed onto the stack. {scope=push} implies {scope=ref}.

          You can fill the scope field of a captured tag with {scope=ref}. If
          the {scope=ref} flag is given, ctags attaches the tag at the top of
          the stack to the tag captured with --regex-<LANG> as the value of
          the scope: field.

          ctags pops the tag at the top of the stack when a --regex-<LANG>
          pattern with {scope=pop} matches the input line.

          Specifying {scope=clear} removes all the tags from the scope stack.
          Specifying {scope=set} removes all the tags from the scope stack,
          and then pushes the captured tag as {scope=push} does.

          In some cases, you may want to use --regex-<LANG> only for its side
          effects: to manipulate the stack without capturing a tag. In such a
          case, leave the <name_pattern> component of the --regex-<LANG>
          option empty and specify {placeholder} as a regex flag. For example,
          an unnamed tag can be put on the stack by giving the regex flags
          "{scope=push}{placeholder}".

          You may wonder what happens if a regex pattern with the {scope=ref}
          flag matches an input line but the stack is empty, or an unnamed tag
          is at the top. If the stack is empty, the {scope=ref} flag is
          ignored and nothing is attached to the scope: field.

          If the top of the stack contains an unnamed tag, ctags searches
          deeper into the stack to find the top-most named tag. If it reaches
          the bottom of the stack without finding a named tag, the {scope=ref}
          flag is ignored and nothing is attached to the scope: field.

          When a named tag on the stack is popped or cleared as a side effect
          of a pattern match, ctags attaches the line number of the match to
          the end: field of the named tag.

          ctags clears all of the tags on the stack when it reaches the end of
          the input source file. The line number of the end is attached to the
          end: field of the cleared tags.

       warning=<message>
              Print the given <message> at WARNING level.

       fatal=<message>
              Print the given <message> and exit.
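
       As a hedged illustration of the exclusive, placeholder, and icase
       flags, consider the sketch below. The language name notes, the .notes
       extension, and the input format (comment lines starting with ';' and
       lines of the form "topic <text>") are hypothetical. The sketch also
       assumes that patterns are tested in the order in which they are
       defined, which is why the comment pattern is listed first.

          --langdef=notes
          --map-notes=+.notes
          --kinddef-notes=t,topic,topics
          # Comment lines: emit no tag and skip all remaining patterns.
          --regex-notes=/^[ \t]*;.*///{exclusive}{placeholder}
          # Match "topic", "Topic", "TOPIC", and so on.
          --regex-notes=/^topic[ \t]+(.+)/\1/t/{icase}

       The name pattern and the kind-spec of the first pattern are empty;
       {placeholder} is what allows the empty name pattern without a warning.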

EXAMPLES

   Perl Pod
       This is the definition (pod.ctags) used in ctags for parsing Pod
       (https://perldoc.perl.org/perlpod.html) files.

          --langdef=pod
          --map-pod=+.pod

          --kinddef-pod=c,chapter,chapters
          --kinddef-pod=s,section,sections
          --kinddef-pod=S,subsection,subsections
          --kinddef-pod=t,subsubsection,subsubsections

          --regex-pod=/^=head1[ \t]+(.+)/\1/c/
          --regex-pod=/^=head2[ \t]+(.+)/\1/s/
          --regex-pod=/^=head3[ \t]+(.+)/\1/S/
          --regex-pod=/^=head4[ \t]+(.+)/\1/t/

   Using scope regex flags
       Let's think about writing a parser for a very small subset of the Ruby
       language.

       input source file (input.srb):

          class Example
            def methodA
                  puts "in class_method"
            end
            def methodB
                  puts "in class_method"
            end
          end

       The parser for the input should capture Example with the class kind,
       and methodA and methodB with the method kind. methodA and methodB
       should have Example as their scope. The end: field of each tag should
       have the proper value.

       optlib file (sub-ruby.ctags):

          --langdef=subRuby
          --map-subRuby=.srb
          --kinddef-subRuby=c,class,classes
          --kinddef-subRuby=m,method,methods
          --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
          --regex-subRuby=/^end///{scope=pop}{placeholder}
          --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
          --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

       command line and output:

          $ ctags --quiet --fields=+eK \
          --options=./sub-ruby.ctags -o - input.srb
          Example input.srb       /^class Example$/;"     class   end:8
          methodA input.srb       /^  def methodA$/;"     method  class:Example   end:4
          methodB input.srb       /^  def methodB$/;"     method  class:Example   end:7

SEE ALSO

       The official Universal Ctags web site at:

       https://ctags.io/

       ctags(1), tags(5), regex(3), regex(7), egrep(1)

AUTHOR

       Universal Ctags project https://ctags.io/ (This man page is partially
       derived from ctags(1) of Exuberant Ctags)

       Darren Hiebert <dhiebert@users.sourceforge.net>
       http://DarrenHiebert.com/

5.9.0                                                          CTAGS-OPTLIB(7)