TAGS(5)                         Universal Ctags                        TAGS(5)

NAME

       tags - Vi tags file format extended in ctags projects

DESCRIPTION

       The next section is a copy of the FORMAT file from the Exuberant Ctags
       source code, as kept in its Subversion repository at sourceforge.net.

       Exceptions introduced in Universal Ctags are explained inline and
       marked with "EXCEPTION".

16                                        ----
17
18
19

PROPOSAL FOR EXTENDED VI TAGS FILE FORMAT

21       Version: 0.06 DRAFT
22       Date: 1998 Feb 8
23       Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>
24
25
26   Introduction
27       The  file format for the "tags" file, as used by Vi and many of its de‐
28       scendants, has limited capabilities.
29
30       This additional functionality is desired:
31
32       1. Static or local tags.  The scope of these tags  is  the  file  where
33          they are defined.  The same tag can appear in several files, without
34          really being a duplicate.
35
       2. Duplicate tags.  Allow the same tag to occur more than once.  They
          can be located in different files and/or have different commands.
38
39       3. Support  for C++.  A tag is not only specified by its name, but also
40          by the context (the class name).
41
42       4. Future extension.  When even more additional  functionality  is  de‐
43          sired,  it must be possible to add this later, without breaking pro‐
44          grams that don't support it.
45
46   From proposal to standard
       To make this proposal into a standard for tags files, it needs to be
       supported by most people working on versions of Vi, ctags, etc.
       Currently this standard is supported by:
50
51       Darren Hiebert <dhiebert at users.sourceforge.net>
52              Exuberant Ctags
53
54       Bram Moolenaar <Bram at vim.org>
55              Vim (Vi IMproved)
56
57       These have been or will be asked to support this standard:
58
59       Nvi    Keith Bostic <bostic at bsdi.com>
60
61       Vile   Tom E. Dickey <dickey at clark.net>
62
63       NEdit  Mark Edel <edel at ltx.com>
64
65       CRiSP  Paul Fox <fox at crisp.demon.co.uk>
66
67       Lemmy  James Iuliano <jai at accessone.com>
68
69       Zeus   Jussi Jumppanen <jussij at ca.com.au>
70
71       Elvis  Steve Kirkendall <kirkenda at cs.pdx.edu>
72
73       FTE    Marko Macek <Marko.Macek at snet.fri.uni-lj.si>
74
75   Backwards compatibility
76       A tags file that is generated in the new format should still be  usable
77       by Vi.  This makes it possible to distribute tags files that are usable
78       by all versions and descendants of Vi.
79
80       This restricts the format to what Vi can handle.  The format is:
81
82       1. The tags file is a list of lines, each line in the format:
83
84             {tagname}<Tab>{tagfile}<Tab>{tagaddress}
85
86          {tagname}
                 Any identifier, not containing white space.
88
89                 EXCEPTION: Universal Ctags violates this  item  of  the  pro‐
90                 posal;  tagname may contain spaces. However, tabs are not al‐
91                 lowed.
92
93          <Tab>  Exactly one TAB character (although many versions of  Vi  can
94                 handle any amount of white space).
95
96          {tagfile}
97                 The  name of the file where {tagname} is defined, relative to
98                 the current directory (or location of the tags file?).
99
100          {tagaddress}
101                 Any Ex command.  When executed, it behaves like  'magic'  was
102                 not set.
103
104       2. The  tags  file  is  sorted  on {tagname}.  This allows for a binary
105          search in the file.
106
107       3. Duplicate tags are allowed, but which one is actually used is unpre‐
108          dictable (because of the binary search).
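
       The lookup that the sort order makes possible can be sketched as
       follows.  This is only an illustration in Python, not part of the
       proposal: the helper names tag_name and find_tag and the sample lines
       are hypothetical, and the lines are assumed to be sorted on {tagname}
       by code point.  Returning every match, rather than whichever one the
       search happens to land on, is one way around the unpredictability of
       duplicate tags.

```python
import bisect


def tag_name(line):
    """Return the {tagname} field: everything before the first <Tab>."""
    return line.split("\t", 1)[0]


def find_tag(sorted_lines, name):
    """Return every line whose {tagname} equals name, using binary search."""
    # A real reader would build the key list once, or search the file in
    # place; it is rebuilt on each call here only to keep the sketch short.
    names = [tag_name(line) for line in sorted_lines]
    lo = bisect.bisect_left(names, name)
    hi = bisect.bisect_right(names, name)
    return sorted_lines[lo:hi]


# Hypothetical tag lines, already sorted on {tagname}.
lines = [
    "DEBUG\tdefines.c\t89",
    "main\tmain.c\t/^main(argc, argv)$/",
    "main\tother.c\t/^main()$/",
]
matches = find_tag(lines, "main")  # both "main" entries
```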
109
110       The  best  way to add extra text to the line for the new functionality,
111       without breaking it for Vi, is to put a comment  in  the  {tagaddress}.
112       This  gives  the freedom to use any text, and should work in any tradi‐
113       tional Vi implementation.
114
115       For example, when the old tags file contains:
116
117          main    main.c  /^main(argc, argv)$/
118          DEBUG   defines.c       89
119
120       The new lines can be:
121
122          main    main.c  /^main(argc, argv)$/;"any additional text
123          DEBUG   defines.c       89;"any additional text
124
125       Note that the ';' is required to put the cursor in the right line,  and
126       then the '"' is recognized as the start of a comment.
127
128       For  Posix  compliant Vi versions this will NOT work, since only a line
129       number or a search command is recognized.  I  hope  Posix  can  be  ad‐
130       justed.  Nvi suffers from this.
131
132   Security
133       Vi  allows  the use of any Ex command in a tags file.  This has the po‐
134       tential of a trojan horse security leak.
135
136       The proposal is to allow only Ex commands that position the cursor in a
137       single  file.   Other commands, like editing another file, quitting the
138       editor, changing a file or writing a file,  are  not  allowed.   It  is
139       therefore logical to call the command a tagaddress.
140
141       Specifically, these two Ex commands are allowed:
142
143       • A decimal line number:
144
145            89
146
147       • A search command.  It is a regular expression pattern, as used by Vi,
148         enclosed in // or ??:
149
150            /^int c;$/
151            ?main()?
152
153       There are two combinations possible:
154
155       • Concatenation of the above, with ';' in between.  The meaning is that
156         the  first line number or search command is used, the cursor is posi‐
157         tioned in that line, and then the second search command  is  used  (a
158         line  number  would not be useful).  This can be done multiple times.
159         This is useful when the information in a single line is  not  unique,
160         and the search needs to start in a specified line.
161
162            /struct xyz {/;/int count;/
163            389;/struct foo/;/char *s;/
164
165       • A  trailing comment can be added, starting with ';"' (two characters:
166         semi-colon and double-quote).  This is used below.
167
168            89;" foo bar
169
170       This might be extended in the future.  What is currently missing  is  a
171       way to position the cursor in a certain column.
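
       A reader that wants to enforce this restriction could check each
       tagaddress before executing it.  The following Python sketch is an
       illustration only: the function name is_safe_tagaddress is
       hypothetical, and the regular expression assumes that a backslash
       escapes the next character inside a search pattern.  It accepts line
       numbers and search commands joined with ';', optionally followed by a
       ';"' comment, and rejects everything else.

```python
import re

_LINE = r"\d+"                       # a decimal line number
_FWD = r"/(?:\\.|[^/\\])*/"          # /pattern/, backslash escapes allowed
_BWD = r"\?(?:\\.|[^?\\])*\?"        # ?pattern?
_ADDR = rf"(?:{_LINE}|{_FWD}|{_BWD})"

# One address, any number of ';'-joined addresses, then an optional
# trailing ';"' comment that runs to the end of the field.
_TAGADDRESS = re.compile(rf'{_ADDR}(?:;{_ADDR})*(?:;".*)?')


def is_safe_tagaddress(text):
    """Return True if text only positions the cursor, as proposed above."""
    return _TAGADDRESS.fullmatch(text) is not None


# The examples from this section are accepted ...
accepted = [
    "89",
    "/^int c;$/",
    "?main()?",
    "/struct xyz {/;/int count;/",
    "389;/struct foo/;/char *s;/",
    '89;" foo bar',
]
# ... while other Ex commands are not.
rejected = ["e other.c", "/foo/;w", "89|q"]
```

       The sketch also accepts a line number after a ';', which the text
       above describes as not useful rather than forbidden.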
172
173   Goals
174       Now  the usage of the comment text has to be defined.  The following is
175       aimed at:
176
177       1. Keep the text short, because:
178
179          • The line length that Vi can handle is limited to 512 characters.
180
181          • Tags files can contain thousands of tags.  I have seen tags  files
182            of several Mbytes.
183
184          • More text makes searching slower.
185
186       2. Keep the text readable, because:
187
188          • It is often necessary to check the output of a new ctags program.
189
190          • Be able to edit the file by hand.
191
192          • Make it easier to write a program to produce or parse the file.
193
194       3. Don't use special characters, because:
195
196          • It  should  be  possible to treat a tags file like any normal text
197            file.
198
199   Proposal
200       Use a comment after the {tagaddress} field.  The format would be:
201
202          {tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]
203
204       {tagname}
              Any identifier, not containing white space.

              EXCEPTION: Universal Ctags violates this item of the proposal;
              the name may contain spaces.  However, tabs are not allowed.
              The conversion of some characters, including <Tab>, that is
              described for the "value" at the end of this section is also
              applied to the name.
211
212       <Tab>  Exactly one TAB character (although many versions of Vi can han‐
213              dle any amount of white space).
214
215       {tagfile}
216              The name of the file where {tagname} is defined, relative to the
217              current directory (or location of the tags file?).
218
219       {tagaddress}
220              Any  Ex command.  When executed, it behaves like 'magic' was not
221              set.  It may be restricted to a line number or a search  pattern
222              (Posix).
223
224       Optionally:
225
       ;"     semicolon + doublequote: Ends the tagaddress in a way that looks
              like the start of a comment to Vi.
228
229       {tagfield}
230              See below.
231
232       A tagfield has a name, a colon, and a value: "name:value".
233
       • The name consists only of alphabetical characters.  Upper and lower
         case are allowed.  Lower case is recommended.  Case matters ("kind:"
         and "Kind:" are different tagfields).

         EXCEPTION: Universal Ctags also allows numeric characters in the
         name, except as its first character.
240
241       • The value may be empty.  It cannot contain a <Tab>.
242
243         • When a value contains a \t, this stands for a <Tab>.
244
245         • When a value contains a \r, this stands for a <CR>.
246
247         • When a value contains a \n, this stands for a <NL>.
248
249         • When a value contains a \\, this stands for a single \ character.
250
251         Other  use  of  the backslash character is reserved for future expan‐
252         sion.  Warning: When a tagfield value holds an MS-DOS file name,  the
253         backslashes must be doubled!
254
255         EXCEPTION: Universal Ctags introduces more conversion rules.
256
257         • When a value contains a \a, this stands for a <BEL> (0x07).
258
259         • When a value contains a \b, this stands for a <BS> (0x08).
260
261         • When a value contains a \v, this stands for a <VT> (0x0b).
262
263         • When a value contains a \f, this stands for a <FF> (0x0c).
264
         • Characters in the range 0x01 to 0x1F (inclusive) and 0x7F are
           converted to a \x-prefixed hexadecimal number if they are not
           covered by the "value" rules above.

         • A leading space (0x20) or ! (0x21) in {tagname} is converted to a
           \x-prefixed hexadecimal number (\x20 or \x21) if the tag is not a
           pseudo-tag.  As described later, a pseudo-tag starts with !.  These
           rules distinguish pseudo-tags from regular tags when the tag lines
           in a tags file are sorted.
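
       The conversion rules above can be reversed by a reader roughly as
       follows.  This Python sketch is an illustration, not the
       implementation used by any ctags program: the name unescape_value is
       hypothetical, and it assumes that a \x escape is always followed by
       exactly two hexadecimal digits, as in the \x20 and \x21 examples.
       Backslash sequences that the rules do not define are left untouched,
       since they are reserved.

```python
import re

# Single-letter escapes: the first four come from the proposal, the
# remaining four are the Universal Ctags additions.
_SIMPLE = {
    "t": "\t", "r": "\r", "n": "\n", "\\": "\\",
    "a": "\a", "b": "\b", "v": "\v", "f": "\f",
}

# A backslash followed by xHH (assumed: exactly two hex digits) or by any
# single character.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)")


def unescape_value(value):
    """Decode the backslash escapes in a tagfield value."""
    def repl(match):
        body = match.group(1)
        if len(body) == 3:               # the xHH form
            return chr(int(body[1:], 16))
        # Unknown escapes are reserved, so they are returned unchanged.
        return _SIMPLE.get(body, match.group(0))
    return _ESCAPE.sub(repl, value)


# A doubled backslash in an MS-DOS file name decodes to a single one.
path = unescape_value(r"c:\\vim\\sap")
```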
274
275       Proposed tagfield names:
276
                      ┌───────────┬────────────────────────────┐
                      │FIELD-NAME │ DESCRIPTION                │
                      ├───────────┼────────────────────────────┤
                      │arity      │ Number  of arguments for a │
                      │           │ function tag.              │
                      ├───────────┼────────────────────────────┤
                      │class      │ Name  of  the  class   for │
                      │           │ which this tag is a member │
                      │           │ or method.                 │
                      ├───────────┼────────────────────────────┤
                      │enum       │ Name of the enumeration in │
                      │           │ which  this tag is an enu‐ │
                      │           │ merator.                   │
                      ├───────────┼────────────────────────────┤
                      │file       │ Static (local) tag, with a │
                      │           │ scope   of  the  specified │
                      │           │ file.  When the  value  is │
                      │           │ empty, {tagfile} is used.  │
                      ├───────────┼────────────────────────────┤
                      │function   │ Function in which this tag │
                      │           │ is  defined.   Useful  for │
                      │           │ local variables (and func‐ │
                      │           │ tions).   When   functions │
                      │           │ nest  (e.g.,  in  Pascal), │
                      │           │ the  function  names   are │
                      │           │ concatenated,    separated │
                      │           │ with '/', so it looks like │
                      │           │ a path.                    │
                      ├───────────┼────────────────────────────┤
                      │kind       │ Kind  of  tag.   The value │
                      │           │ depends on  the  language. │
                      │           │ For  C and C++ these kinds │
                      │           │ are recommended:           │
                      │           │                            │
                      │           │        c      class name   │
                      │           │                            │
                      │           │        d      define (from │
                      │           │               #define XXX) │
                      │           │                            │
                      │           │        e      enumerator   │
                      │           │                            │
                      │           │        f      function  or │
                      │           │               method name  │
                      │           │                            │
                      │           │        F      file name    │
                      │           │                            │
                      │           │        g      enumeration  │
                      │           │               name         │
                      │           │                            │
                      │           │        m      member   (of │
                      │           │               structure or │
                      │           │               class data)  │
                      │           │                            │
                      │           │        p      function     │
                      │           │               prototype    │
                      │           │                            │
                      │           │        s      structure    │
                      │           │               name         │
                      │           │                            │
                      │           │        t      typedef      │
                      │           │                            │
                      │           │        u      union name   │
                      │           │                            │
                      │           │        v      variable     │
                      │           │                            │
                      │           │        When  this field is │
                      │           │        omitted,  the  kind │
                      │           │        of   tag  is  unde‐ │
                      │           │        fined.              │
                      ├───────────┼────────────────────────────┤
                      │struct     │ Name  of  the  struct   in │
                      │           │ which  this  tag is a mem‐ │
                      │           │ ber.                       │
                      ├───────────┼────────────────────────────┤
                      │union      │ Name of the union in which │
                      │           │ this tag is a member.      │
                      └───────────┴────────────────────────────┘
387
388       Note that these are mostly for C and C++.  When tags programs are writ‐
389       ten for other languages, this list should be extended  to  include  the
390       used  field  names.  This will help users to be independent of the tags
391       program used.
392
393       Examples:
394
395          asdf    sub.cc  /^asdf()$/;"    new_field:some\svalue   file:
396          foo_t   sub.h   /^typedef foo_t$/;"     kind:t
397          func3   sub.p   /^func3()$/;"   function:/func1/func2   file:
398          getflag sub.c   /^getflag(arg)$/;"      kind:f  file:
399          inc     sub.cc  /^inc()$/;"     file: class:PipeBuf
400
401       The name of the "kind:" field can be omitted.  This is  to  reduce  the
402       size  of  the  tags file by about 15%.  A program reading the tags file
403       can recognize the "kind:" field by the missing ':'.  Examples:
404
405          foo_t   sub.h   /^typedef foo_t$/;"     t
406          getflag sub.c   /^getflag(arg)$/;"      f       file:
407
408       Additional remarks:
409
410       • When a tagfield appears twice in a tag line, only  the  last  one  is
411         used.
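
       Taken together, the rules of this section suggest a parser along the
       following lines.  It is a Python sketch under stated assumptions, not
       a reference implementation: the name parse_tag_line and the returned
       dictionary layout are hypothetical, and the tagaddress is assumed not
       to contain the sequence ';"<Tab>' itself (a robust reader would scan
       the search pattern instead of splitting on that sequence).  A field
       without a ':' is taken as the "kind:" field, and a repeated field
       overwrites the earlier one.

```python
def parse_tag_line(line):
    """Split one tag line into its three fixed fields and its tagfields."""
    tagname, tagfile, rest = line.split("\t", 2)

    # Everything after ';"<Tab>' is the list of tagfields.
    address, sep, extension = rest.partition(';"\t')

    fields = {}
    if sep:
        for item in extension.split("\t"):
            if not item:
                continue
            name, colon, value = item.partition(":")
            if not colon:
                # No ':' at all: this is the abbreviated "kind:" field.
                name, value = "kind", item
            fields[name] = value      # the last occurrence wins
    return {"name": tagname, "file": tagfile,
            "address": address, "fields": fields}


# The fields are separated by real <Tab> characters in a tags file.
entry = parse_tag_line('getflag\tsub.c\t/^getflag(arg)$/;"\tf\tfile:')
```

       An old-format line such as "DEBUG<Tab>defines.c<Tab>89" passes through
       the same function and simply yields no tagfields.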
412
413       Note about line separators:
414
415       Vi  traditionally  runs  on Unix systems, where the line separator is a
416       single linefeed character  <NL>.   On  MS-DOS  and  compatible  systems
417       <CR><NL> is the standard line separator.  To increase portability, this
418       line separator is also supported.
419
       On the Macintosh a single <CR> is used as the line separator.
       Supporting this on Unix systems causes problems, because most fgets()
       implementations don't see the <CR> as a line separator.  Therefore the
       support for a <CR> as line separator is limited to the Macintosh.
424
425       Summary:
426
427                ┌───────────────┬──────────────┬─────────────────────┐
428                │line separator │ generated on │ accepted on         │
429                ├───────────────┼──────────────┼─────────────────────┤
430                │<LF>           │ Unix         │ Unix,  MS-DOS, Mac‐ │
431                │               │              │ intosh              │
432                ├───────────────┼──────────────┼─────────────────────┤
433                │<CR>           │ Macintosh    │ Macintosh           │
434                ├───────────────┼──────────────┼─────────────────────┤
435                │<CR><LF>       │ MS-DOS       │ Unix, MS-DOS,  Mac‐ │
436                │               │              │ intosh              │
437                └───────────────┴──────────────┴─────────────────────┘
438
439       The characters <CR> and <LF> cannot be used inside a tag line.  This is
440       not mentioned elsewhere (because it's obvious).
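
       On the reading side, the Unix and MS-DOS rows of the summary amount to
       splitting on <LF> and dropping one trailing <CR>.  A minimal Python
       sketch, with the hypothetical name split_tag_lines, could look like
       this.  It deliberately does not treat a lone <CR> as a separator,
       which matches the table for systems other than the Macintosh.

```python
def split_tag_lines(data):
    """Split the raw bytes of a tags file into tag lines.

    Accepts <LF> and <CR><LF> as line separators; a lone <CR> is not
    treated as a separator.
    """
    lines = []
    for raw in data.split(b"\n"):
        # Drop the <CR> that precedes the <LF> in MS-DOS files.
        line = raw[:-1] if raw.endswith(b"\r") else raw
        if line:
            lines.append(line)
    return lines


unix = split_tag_lines(b"a\tf.c\t1\nb\tf.c\t2\n")
msdos = split_tag_lines(b"a\tf.c\t1\r\nb\tf.c\t2\r\n")
```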
441
442       Note about white space:
443
444       Vi allowed any white space to separate the tagname  from  the  tagfile,
445       and  the  filename  from the tagaddress.  This would need to be allowed
446       for backwards compatibility.  However, all known programs that generate
447       tags use a single <Tab> to separate fields.
448
449       There  is  a  problem for using file names with embedded white space in
450       the tagfile field.  To work around this, the  same  special  characters
451       could  be  used  as  in  the new fields, for example \s.  But, unfortu‐
452       nately, in MS-DOS the backslash character  is  used  to  separate  file
453       names.   The  file  name  c:\vim\sap  contains  \s,  but  this is not a
454       <Space>.  The number of backslashes could be doubled, but that will add
455       a lot of characters, and make parsing the tags file slower and clumsy.
456
       To avoid these problems, we will only allow a <Tab> to separate fields,
       and not support a file name or tagname that contains a <Tab> character.
       This means that we are not 100% Vi compatible.  However, there is no
       known tags program that uses anything other than a <Tab> to separate
       the fields.  Problems can arise only when a user typed the tags file by
       hand or wrote a custom program to generate it.  To solve this, the tags
       file should be filtered to replace the arbitrary white space with a
       single <Tab>.  This Vi command can be used:
465
466          :%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/
467
468       (replace ^I with a real <Tab>).
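
       The same filter can be written outside the editor.  The Python sketch
       below mirrors the substitution above; the function name
       normalize_separators is hypothetical.  Like the Vi command, it is only
       meant for traditional tags files: it would split a tagname or file
       name that contains spaces, which Universal Ctags permits in tagnames.

```python
import re

# Same shape as the Vi pattern: the first two runs of non-blank
# characters, each followed by any amount of spaces and tabs.
_FIRST_TWO_FIELDS = re.compile(r"^([^ \t]*)[ \t]*([^ \t]*)[ \t]*")


def normalize_separators(line):
    """Separate tagname, tagfile and tagaddress with exactly one <Tab>."""
    return _FIRST_TWO_FIELDS.sub(
        lambda m: m.group(1) + "\t" + m.group(2) + "\t",
        line,
        count=1,
    )


fixed = normalize_separators("main   main.c \t /^main(argc, argv)$/")
```

       Only the first two separators are rewritten, so white space inside the
       tagaddress is preserved.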
469
470       TAG FILE INFORMATION:
471
472       Pseudo-tag lines can be used to encode information into  the  tag  file
473       regarding  details  about its content (e.g. have the tags been sorted?,
474       are the optional tagfields present?), and regarding the program used to
475       generate  the  tag file.  This information can be used both to optimize
476       use of the tag file (e.g.  enable/disable binary searching) and provide
477       general information (what version of the generator was used).
478
479       The names of the tags used in these lines may be suitably chosen to en‐
480       sure that when sorted, they will always be located near the first lines
481       of the tag file.  The use of "!_TAG_" is recommended.  Note that a rare
482       tag like "!"  can sort to before these lines.  The program reading  the
483       tags file should be smart enough to skip over these tags.
484
485       The  lines  described  below have been chosen to convey a select set of
486       information.
487
488       Tag lines providing information about the content of the tag file:
489
490          !_TAG_FILE_FORMAT   {version-number}        /optional comment/
491          !_TAG_FILE_SORTED   {0|1}                   /0=unsorted, 1=sorted/
492
493       The {version-number} used in the tag  file  format  line  reserves  the
494       value  of  "1"  for tag files complying with the original UNIX vi/ctags
495       format, and reserves the value "2" for tag files  complying  with  this
496       proposal.  This value may be used to determine if the extended features
497       described in this proposal are present.
498
499       Tag lines providing information about the program used to generate  the
500       tag file, and provided solely for documentation purposes:
501
502          !_TAG_PROGRAM_AUTHOR        {author-name}   /{email-address}/
503          !_TAG_PROGRAM_NAME  {program-name}  /optional comment/
504          !_TAG_PROGRAM_URL   {URL}   /optional comment/
505          !_TAG_PROGRAM_VERSION       {version-id}    /optional comment/
506
507       EXCEPTION:  Universal  Ctags introduces more kinds of pseudo-tags.  See
508       ctags-client-tools(7) about them.
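
       A reader might collect the pseudo-tags before deciding how to search
       the file.  The Python sketch below is illustrative only: the name
       read_pseudo_tags and the sample header are hypothetical, and it
       assumes the pseudo-tags sit at the top of the file.  It skips a
       regular tag such as "!" that sorts among them and stops at the first
       line that does not start with '!'.

```python
PREFIX = "!_TAG_"


def read_pseudo_tags(lines):
    """Return {name: value} for the leading !_TAG_ lines of a tags file."""
    info = {}
    for line in lines:
        if not line.startswith("!"):
            break                     # past the block of pseudo-tags
        if not line.startswith(PREFIX):
            continue                  # a regular tag such as "!": skip it
        parts = line.split("\t")
        if len(parts) >= 2:
            info[parts[0][len(PREFIX):]] = parts[1]
    return info


# Hypothetical start of a tags file.
header = [
    "!\todd.c\t12",
    "!_TAG_FILE_FORMAT\t2\t/extended format/",
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/",
    "main\tmain.c\t/^main(argc, argv)$/",
]
info = read_pseudo_tags(header)
has_tagfields = info.get("FILE_FORMAT") == "2"   # extended format in use
can_bisect = info.get("FILE_SORTED") == "1"      # binary search is safe
```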
509
510
511                                        ----
512
513
514

EXCEPTIONS IN UNIVERSAL CTAGS

       Universal Ctags supports this proposal with some exceptions.

   Exceptions
       1. A {tagname} in a tags file generated by Universal Ctags may contain
          spaces and several escape sequences.  Parsers for documents such as
          TeX and reStructuredText, and for liberal languages such as
          JavaScript, need this exception.  See the {tagname} entry of the
          Proposal section for details of the conversion.

       2. The "name" part of a {tagfield} in a tag generated by Universal
          Ctags may contain numeric characters, but the first character of the
          "name" must be alphabetic.
528
529   Compatible output and weakness
       The default behavior (the --output-format=u-ctags option) includes
       these exceptions.  With the --output-format=e-ctags option, on the
       other hand, ctags applies none of them, so Universal Ctags can produce
       the same file format as Exuberant Ctags.  However,
       --output-format=e-ctags discards any tag entry whose name includes a
       space or a tab character.  The TAG_OUTPUT_MODE pseudo-tag records which
       format was used when ctags generated the tags file.
536

SEE ALSO

       ctags(1), ctags-client-tools(7), ctags-incompatibilities(7), readtags(1)