TAGS(5)                         Universal Ctags                        TAGS(5)


NAME

       tags - Vi tags file format extended in ctags projects

DESCRIPTION

       The contents of the next section are a copy of the FORMAT file in the
       Exuberant Ctags source code, as stored in its Subversion repository at
       sourceforge.net.

       Exceptions introduced in Universal Ctags are explained inline with the
       "EXCEPTION" marker.  Statements that Universal Ctags clarifies further
       are explained inline with the "COMMENT" marker.

                                        ----


PROPOSAL FOR EXTENDED VI TAGS FILE FORMAT

       Version: 0.06 DRAFT
       Date: 1998 Feb 8
       Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>

   Introduction
       The file format for the "tags" file, as used by Vi and many of its
       descendants, has limited capabilities.

       This additional functionality is desired:

       1. Static or local tags.  The scope of these tags is the file where
          they are defined.  The same tag can appear in several files without
          really being a duplicate.

       2. Duplicate tags.  Allow the same tag to occur more than once.  They
          can be located in a different file and/or have a different command.

       3. Support for C++.  A tag is not only specified by its name, but also
          by its context (the class name).

       4. Future extension.  When even more functionality is desired, it must
          be possible to add it later without breaking programs that don't
          support it.

   From proposal to standard
       To make this proposal into a standard for tags files, it needs to be
       supported by most people working on versions of Vi, ctags, etc.
       Currently this standard is supported by:

       Darren Hiebert <dhiebert at users.sourceforge.net>
              Exuberant Ctags

       Bram Moolenaar <Bram at vim.org>
              Vim (Vi IMproved)

       These have been or will be asked to support this standard:

       Nvi    Keith Bostic <bostic at bsdi.com>

       Vile   Tom E. Dickey <dickey at clark.net>

       NEdit  Mark Edel <edel at ltx.com>

       CRiSP  Paul Fox <fox at crisp.demon.co.uk>

       Lemmy  James Iuliano <jai at accessone.com>

       Zeus   Jussi Jumppanen <jussij at ca.com.au>

       Elvis  Steve Kirkendall <kirkenda at cs.pdx.edu>

       FTE    Marko Macek <Marko.Macek at snet.fri.uni-lj.si>

   Backwards compatibility
       A tags file that is generated in the new format should still be usable
       by Vi.  This makes it possible to distribute tags files that are usable
       by all versions and descendants of Vi.

       This restricts the format to what Vi can handle.  The format is:

       1. The tags file is a list of lines, each line in the format:

             {tagname}<Tab>{tagfile}<Tab>{tagaddress}

          {tagname}
                 Any identifier, not containing white space.

                 EXCEPTION: Universal Ctags violates this item of the
                 proposal; tagname may contain spaces.  However, tabs are not
                 allowed.

          <Tab>  Exactly one TAB character (although many versions of Vi can
                 handle any amount of white space).

          {tagfile}
                 The name of the file where {tagname} is defined, relative to
                 the current directory (or location of the tags file?).

          {tagaddress}
                 Any Ex command.  When executed, it behaves as if 'magic' was
                 not set.

       2. The tags file is sorted on {tagname}.  This allows for a binary
          search in the file.

       3. Duplicate tags are allowed, but which one is actually used is
          unpredictable (because of the binary search).
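
       Because the file is sorted on {tagname}, a reader can find a tag with
       a binary search.  The Python sketch below is only an illustration
       under stated assumptions: the lines are already in memory and sorted
       case-sensitively by plain code-point (byte) order.  find_tags() and
       tag_name() are hypothetical helpers, not part of any ctags API.
       Unlike Vi, the sketch returns every duplicate, since duplicates sit
       next to each other in a sorted file.

```python
def tag_name(line: str) -> str:
    """Return the {tagname} field: everything before the first <Tab>."""
    return line.split("\t", 1)[0]


def find_tags(lines: list, name: str) -> list:
    """Return all lines whose {tagname} equals name.

    Assumption: lines are sorted on {tagname} by code-point order.
    """
    # Lower-bound binary search: first index whose tagname is >= name.
    lo, hi = 0, len(lines)
    while lo < hi:
        mid = (lo + hi) // 2
        if tag_name(lines[mid]) < name:
            lo = mid + 1
        else:
            hi = mid
    # Duplicates are adjacent after sorting, so collect the whole run.
    matches = []
    while lo < len(lines) and tag_name(lines[lo]) == name:
        matches.append(lines[lo])
        lo += 1
    return matches


tags = [
    "DEBUG\tdefines.c\t89",
    "main\tmain.c\t/^main(argc, argv)$/",
    "main\tother.c\t12",
]
print(find_tags(tags, "main"))
```

       Sorting whole lines gives the same order as sorting on {tagname},
       because the <Tab> that ends the name sorts before any character that
       may appear in a name.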

       The best way to add extra text to the line for the new functionality,
       without breaking it for Vi, is to put a comment in the {tagaddress}.
       This gives the freedom to use any text, and should work in any
       traditional Vi implementation.

       For example, when the old tags file contains:

          main    main.c  /^main(argc, argv)$/
          DEBUG   defines.c       89

       The new lines can be:

          main    main.c  /^main(argc, argv)$/;"any additional text
          DEBUG   defines.c       89;"any additional text

       Note that the ';' is required to put the cursor in the right line, and
       then the '"' is recognized as the start of a comment.

       For POSIX-compliant Vi versions this will NOT work, since only a line
       number or a search command is recognized.  I hope POSIX can be
       adjusted.  Nvi suffers from this.

   Security
       Vi allows the use of any Ex command in a tags file.  This has the
       potential of a trojan horse security leak.

       The proposal is to allow only Ex commands that position the cursor in
       a single file.  Other commands, like editing another file, quitting
       the editor, changing a file or writing a file, are not allowed.  It is
       therefore logical to call the command a tagaddress.

       Specifically, these two Ex commands are allowed:

       • A decimal line number:

            89

       • A search command.  It is a regular expression pattern, as used by
         Vi, enclosed in // or ??:

            /^int c;$/
            ?main()?

       There are two combinations possible:

       • Concatenation of the above, with ';' in between.  The meaning is that
         the first line number or search command is used, the cursor is
         positioned in that line, and then the second search command is used
         (a line number would not be useful).  This can be done multiple
         times.  This is useful when the information in a single line is not
         unique, and the search needs to start in a specified line.

            /struct xyz {/;/int count;/
            389;/struct foo/;/char *s;/

       • A trailing comment can be added, starting with ';"' (two characters:
         semicolon and double quote).  This is used below.

            89;" foo bar

       This might be extended in the future.  What is currently missing is a
       way to position the cursor in a certain column.
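
       The allowed forms above can be checked mechanically before a
       tagaddress is executed.  The sketch below is a hypothetical Python
       validator, not taken from Vim or any ctags implementation.  It
       accepts a line number or a search command, further search commands
       joined with ';', and an optional trailing ';"' comment.  It assumes
       that a backslash inside a pattern escapes the next character, and it
       rejects a line number after the first position, which is stricter
       than the text above requires.

```python
def is_safe_tagaddress(addr: str) -> bool:
    """Return True if addr uses only the allowed Ex address forms."""
    i, n = 0, len(addr)
    first = True
    while True:
        if first and i < n and "0" <= addr[i] <= "9":
            # A decimal line number (accepted only as the first component).
            while i < n and "0" <= addr[i] <= "9":
                i += 1
        elif i < n and addr[i] in "/?":
            # A search command enclosed in // or ??.
            delim = addr[i]
            i += 1
            while i < n and addr[i] != delim:
                # Assumption: a backslash escapes the next character (e.g. \/).
                i += 2 if addr[i] == "\\" else 1
            if i >= n:
                return False  # unterminated pattern
            i += 1  # skip the closing delimiter
        else:
            return False  # any other Ex command is refused
        first = False
        if i == n:
            return True
        if addr.startswith(';"', i):
            return True  # trailing comment; the rest is ignored
        if addr[i] != ";":
            return False
        i += 1  # ';' joins the next search command


for example in ["89", "/struct xyz {/;/int count;/", '89;" foo bar', ":!rm -rf ~"]:
    print(example, is_safe_tagaddress(example))
```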

   Goals
       Now the usage of the comment text has to be defined.  The following is
       aimed at:

       1. Keep the text short, because:

          • The line length that Vi can handle is limited to 512 characters.

          • Tags files can contain thousands of tags.  I have seen tags files
            of several Mbytes.

          • More text makes searching slower.

       2. Keep the text readable, because:

          • It is often necessary to check the output of a new ctags program.

          • It should be possible to edit the file by hand.

          • It makes it easier to write a program to produce or parse the file.

       3. Don't use special characters, because:

          • It should be possible to treat a tags file like any normal text
            file.

   Proposal
       Use a comment after the {tagaddress} field.  The format would be:

          {tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]

       {tagname}
              Any identifier, not containing white space.

              EXCEPTION: Universal Ctags violates this item of the proposal;
              the name may contain spaces.  However, tabs are not allowed.
              The conversion rules for some characters, including <Tab>,
              which are explained for the "value" at the end of this section,
              are applied.

       <Tab>  Exactly one TAB character (although many versions of Vi can
              handle any amount of white space).

       {tagfile}
              The name of the file where {tagname} is defined, relative to the
              current directory (or location of the tags file?).

       {tagaddress}
              Any Ex command.  When executed, it behaves as if 'magic' was not
              set.  It may be restricted to a line number or a search pattern
              (POSIX).

              COMMENT: {tagaddress} could contain tab characters.  See
              ctags-client-tools(7) to learn how to programmatically extract
              {tagaddress} (called "pattern field" there) and parse it.

       Optionally:

       ;"     semicolon + doublequote: Ends the tagaddress in a way that looks
              like the start of a comment to Vi.

       {tagfield}
              See below.

       A tagfield has a name, a colon, and a value: "name:value".

       • The name consists only of alphabetic characters.  Upper and lower
         case are allowed.  Lower case is recommended.  Case matters ("kind:"
         and "Kind:" are different tagfields).

         EXCEPTION: Universal Ctags allows the name to contain numeric
         characters, except as its first character.

       • The value may be empty.  It cannot contain a <Tab>.

         • When a value contains a \t, this stands for a <Tab>.

         • When a value contains a \r, this stands for a <CR>.

         • When a value contains a \n, this stands for a <NL>.

         • When a value contains a \\, this stands for a single \ character.

         Other use of the backslash character is reserved for future
         expansion.  Warning: When a tagfield value holds an MS-DOS file name,
         the backslashes must be doubled!

         EXCEPTION: Universal Ctags introduces more conversion rules.

         • When a value contains a \a, this stands for a <BEL> (0x07).

         • When a value contains a \b, this stands for a <BS> (0x08).

         • When a value contains a \v, this stands for a <VT> (0x0b).

         • When a value contains a \f, this stands for a <FF> (0x0c).

         • Characters in the range 0x01 to 0x1F inclusive, and 0x7F, are
           converted to a \x-prefixed hexadecimal number if they are not
           handled by the "value" rules above.

         EXCEPTION: Universal Ctags also allows all these escape sequences in
         {tagname}.

         • A leading space (0x20) or ! (0x21) in {tagname} is converted to a
           \x-prefixed hexadecimal number (\x20 or \x21) if the tag is not a
           pseudo-tag.  As described later, a pseudo-tag starts with !.  These
           rules distinguish pseudo-tags from non-pseudo-tags (regular tags)
           when the lines in a tags file are sorted.
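
       The conversion rules above can be sketched as a pair of Python
       functions.  This is a hypothetical illustration, not the Universal
       Ctags implementation: escape_tagname() applies the rules to a
       {tagname}, including the leading space and ! rule for regular tags,
       and unescape_value() reverses them.  Writing \x escapes with two
       uppercase hexadecimal digits is an assumption, and unknown backslash
       sequences are kept verbatim because their use is reserved.

```python
# Named escapes: the original proposal (\t \r \n \\) plus the
# Universal Ctags additions (\a \b \v \f).
_NAMED = {"\t": "t", "\r": "r", "\n": "n", "\\": "\\",
          "\a": "a", "\b": "b", "\v": "v", "\f": "f"}
_UNNAMED = {v: k for k, v in _NAMED.items()}


def escape_tagname(name: str, pseudo: bool = False) -> str:
    """Escape a {tagname}; pseudo=True skips the leading space/! rule."""
    out = []
    for ch in name:
        if ch in _NAMED:
            out.append("\\" + _NAMED[ch])
        elif 0x01 <= ord(ch) <= 0x1F or ord(ch) == 0x7F:
            out.append("\\x%02X" % ord(ch))  # assumption: uppercase hex
        else:
            out.append(ch)
    text = "".join(out)
    # A leading space or ! would sort among the pseudo-tags, so escape it.
    if not pseudo and text[:1] in (" ", "!"):
        text = "\\x%02X" % ord(text[0]) + text[1:]
    return text


def _is_hex(digits: str) -> bool:
    return len(digits) == 2 and all(c in "0123456789abcdefABCDEF" for c in digits)


def unescape_value(text: str) -> str:
    """Reverse the conversions; unknown sequences are kept verbatim."""
    out = []
    i = 0
    while i < len(text):
        nxt = text[i + 1:i + 2]
        if text[i] == "\\" and nxt in _UNNAMED:
            out.append(_UNNAMED[nxt])
            i += 2
        elif text[i] == "\\" and nxt == "x" and _is_hex(text[i + 2:i + 4]):
            out.append(chr(int(text[i + 2:i + 4], 16)))
            i += 4
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(escape_tagname("!a\tb"))  # prints \x21a\tb
```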

       Proposed tagfield names:

               ┌───────────┬────────────────────────────┐
               │FIELD-NAME │ DESCRIPTION                │
               ├───────────┼────────────────────────────┤
               │arity      │ Number of arguments for a  │
               │           │ function tag.              │
               ├───────────┼────────────────────────────┤
               │class      │ Name of the class for      │
               │           │ which this tag is a member │
               │           │ or method.                 │
               ├───────────┼────────────────────────────┤
               │enum       │ Name of the enumeration in │
               │           │ which this tag is an       │
               │           │ enumerator.                │
               ├───────────┼────────────────────────────┤
               │file       │ Static (local) tag, with a │
               │           │ scope of the specified     │
               │           │ file. When the value is    │
               │           │ empty, {tagfile} is used.  │
               ├───────────┼────────────────────────────┤
               │function   │ Function in which this tag │
               │           │ is defined. Useful for     │
               │           │ local variables (and       │
               │           │ functions). When functions │
               │           │ nest (e.g., in Pascal),    │
               │           │ the function names are     │
               │           │ concatenated, separated    │
               │           │ with '/', so it looks like │
               │           │ a path.                    │
               ├───────────┼────────────────────────────┤
               │kind       │ Kind of tag. The value     │
               │           │ depends on the language.   │
               │           │ For C and C++ these kinds  │
               │           │ are recommended:           │
               │           │                            │
               │           │   c   class name           │
               │           │   d   define (from         │
               │           │       #define XXX)         │
               │           │   e   enumerator           │
               │           │   f   function or method   │
               │           │       name                 │
               │           │   F   file name            │
               │           │   g   enumeration name     │
               │           │   m   member (of structure │
               │           │       or class data)       │
               │           │   p   function prototype   │
               │           │   s   structure name       │
               │           │   t   typedef              │
               │           │   u   union name           │
               │           │   v   variable             │
               │           │                            │
               │           │ When this field is         │
               │           │ omitted, the kind of tag   │
               │           │ is undefined.              │
               ├───────────┼────────────────────────────┤
               │struct     │ Name of the struct in      │
               │           │ which this tag is a        │
               │           │ member.                    │
               ├───────────┼────────────────────────────┤
               │union      │ Name of the union in which │
               │           │ this tag is a member.      │
               └───────────┴────────────────────────────┘

       Note that these are mostly for C and C++.  When tags programs are
       written for other languages, this list should be extended to include
       the field names used.  This will help users to be independent of the
       tags program used.

       Examples:

          asdf    sub.cc  /^asdf()$/;"    new_field:some\svalue   file:
          foo_t   sub.h   /^typedef foo_t$/;"     kind:t
          func3   sub.p   /^func3()$/;"   function:/func1/func2   file:
          getflag sub.c   /^getflag(arg)$/;"      kind:f  file:
          inc     sub.cc  /^inc()$/;"     file: class:PipeBuf

       The name of the "kind:" field can be omitted.  This is to reduce the
       size of the tags file by about 15%.  A program reading the tags file
       can recognize the "kind:" field by the missing ':'.  Examples:

          foo_t   sub.h   /^typedef foo_t$/;"     t
          getflag sub.c   /^getflag(arg)$/;"      f       file:

       Additional remarks:

       • When a tagfield appears twice in a tag line, only the last one is
         used.
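
       A reader for the extended format can be sketched in Python as below.
       parse_tag_line() is a hypothetical helper, not the readtags API.  It
       treats the last ';"<Tab>' as the end of the {tagaddress}, which is a
       simplification (ctags-client-tools(7) describes how to extract the
       {tagaddress} robustly), maps a field without ':' to "kind", lets a
       repeated field overwrite earlier ones, and leaves values escaped.

```python
def parse_tag_line(line: str):
    """Split one tags line into (tagname, tagfile, tagaddress, fields).

    Raises ValueError if the line has fewer than three fields.
    """
    line = line.rstrip("\r\n")
    tagname, tagfile, rest = line.split("\t", 2)
    # Simplifying assumption: extension fields follow the last ';"<Tab>'.
    sep = rest.rfind(';"\t')
    if sep == -1:
        return tagname, tagfile, rest, {}
    tagaddress, field_text = rest[:sep], rest[sep + 3:]
    fields = {}
    for item in field_text.split("\t"):
        if not item:
            continue
        name, colon, value = item.partition(":")
        if not colon:
            # The "kind:" name may be omitted: a field without ':' is the kind.
            name, value = "kind", item
        fields[name] = value  # a later duplicate overwrites an earlier one
    return tagname, tagfile, tagaddress, fields


print(parse_tag_line('inc\tsub.cc\t/^inc()$/;"\tfile:\tclass:PipeBuf'))
```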

       Note about line separators:

       Vi traditionally runs on Unix systems, where the line separator is a
       single linefeed character <NL>.  On MS-DOS and compatible systems
       <CR><NL> is the standard line separator.  To increase portability,
       this line separator is also supported.

       On the Macintosh a single <CR> is used as the line separator.
       Supporting this on Unix systems causes problems, because most fgets()
       implementations don't see the <CR> as a line separator.  Therefore the
       support for a <CR> as line separator is limited to the Macintosh.

       Summary:

                ┌───────────────┬──────────────┬─────────────────────┐
                │line separator │ generated on │ accepted on         │
                ├───────────────┼──────────────┼─────────────────────┤
                │<LF>           │ Unix         │ Unix, MS-DOS,       │
                │               │              │ Macintosh           │
                ├───────────────┼──────────────┼─────────────────────┤
                │<CR>           │ Macintosh    │ Macintosh           │
                ├───────────────┼──────────────┼─────────────────────┤
                │<CR><LF>       │ MS-DOS       │ Unix, MS-DOS,       │
                │               │              │ Macintosh           │
                └───────────────┴──────────────┴─────────────────────┘

       The characters <CR> and <LF> cannot be used inside a tag line.  This is
       not mentioned elsewhere (because it's obvious).
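
       The summary table can be expressed as a small Python sketch.
       split_tag_lines() is hypothetical; its macintosh flag stands in for
       "running on the Macintosh", the only place where a lone <CR> is
       accepted as a line separator.

```python
def split_tag_lines(data: bytes, macintosh: bool = False) -> list:
    """Split raw tags-file bytes into lines without their separators."""
    if macintosh:
        # On the Macintosh <LF>, <CR> and <CR><LF> are all accepted.
        data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final separator
    # Elsewhere only <LF> and <CR><LF> end a line: strip a trailing <CR>.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]
```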

       Note about white space:

       Vi allowed any white space to separate the tagname from the tagfile,
       and the filename from the tagaddress.  This would need to be allowed
       for backwards compatibility.  However, all known programs that
       generate tags use a single <Tab> to separate fields.

       There is a problem with using file names with embedded white space in
       the tagfile field.  To work around this, the same special characters
       could be used as in the new fields, for example \s.  But,
       unfortunately, in MS-DOS the backslash character is used to separate
       path components.  The file name c:\vim\sap contains \s, but this is
       not a <Space>.  The number of backslashes could be doubled, but that
       would add a lot of characters, and make parsing the tags file slower
       and clumsier.

       To avoid these problems, we will only allow a <Tab> to separate fields,
       and not support a file name or tagname that contains a <Tab>
       character.  This means that we are not 100% Vi compatible.  However,
       there is no known tags program that uses anything other than a <Tab>
       to separate the fields.  We could run into problems only when users
       type the tags file themselves, or write their own program to generate
       a tags file.  To solve this, the tags file should be filtered to
       replace the arbitrary white space with a single <Tab>.  This Vi
       command can be used:

          :%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

       (replace ^I with a real <Tab>).
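
       Outside Vi, the same filter can be applied with a regular expression.
       The Python sketch below mirrors the :%s command above for one line,
       with \t in place of ^I; normalize_separators() is a hypothetical name.
       Like the Vi command, it rewrites only the first two separators, so
       white space inside the {tagaddress} is kept.

```python
import re

# Same pattern as the Vi command above, with ^I written as \t.
_FIRST_TWO_FIELDS = re.compile(r"^([^ \t]*)[ \t]*([^ \t]*)[ \t]*")


def normalize_separators(line: str) -> str:
    """Replace the white space after the first two fields with single <Tab>s."""
    return _FIRST_TWO_FIELDS.sub(r"\1\t\2\t", line, count=1)


print(normalize_separators("main   main.c   /^main(argc, argv)$/"))
```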

       TAG FILE INFORMATION:

       Pseudo-tag lines can be used to encode information into the tag file
       regarding details about its content (e.g. have the tags been sorted?
       are the optional tagfields present?), and regarding the program used
       to generate the tag file.  This information can be used both to
       optimize use of the tag file (e.g. enable/disable binary searching)
       and to provide general information (what version of the generator was
       used).

       The names of the tags used in these lines may be suitably chosen to
       ensure that when sorted, they will always be located near the first
       lines of the tag file.  The use of "!_TAG_" is recommended.  Note that
       a rare tag like "!" can sort before these lines.  The program reading
       the tags file should be smart enough to skip over these tags.

       The lines described below have been chosen to convey a select set of
       information.

       Tag lines providing information about the content of the tag file:

          !_TAG_FILE_FORMAT   {version-number}        /optional comment/
          !_TAG_FILE_SORTED   {0|1}                   /0=unsorted, 1=sorted/

       The {version-number} used in the tag file format line reserves the
       value "1" for tag files complying with the original UNIX vi/ctags
       format, and reserves the value "2" for tag files complying with this
       proposal.  This value may be used to determine if the extended
       features described in this proposal are present.

       Tag lines providing information about the program used to generate
       the tag file, and provided solely for documentation purposes:

          !_TAG_PROGRAM_AUTHOR        {author-name}   /{email-address}/
          !_TAG_PROGRAM_NAME  {program-name}  /optional comment/
          !_TAG_PROGRAM_URL   {URL}   /optional comment/
          !_TAG_PROGRAM_VERSION       {version-id}    /optional comment/

       EXCEPTION: Universal Ctags introduces more kinds of pseudo-tags.  See
       ctags-client-tools(7) about them.

       COMMENT: Though pseudo-tags are semantically different from regular
       tags, they use the same format, which is:

          {tagname}<Tab>{tagfile}<Tab>{tagaddress}

       The escape sequences and illegal characters explained in the
       "Proposal" section also apply to pseudo-tags.
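
       A client can read the pseudo-tags to decide, for example, whether a
       binary search is safe.  The Python sketch below is hypothetical: it
       collects the lines whose {tagname} starts with "!_TAG_", skips other
       tags such as a rare "!", and assumes that pseudo-tags appear before
       the first line that does not start with "!" (so it stops there).
       Values are returned as they appear in the {tagfile} position, without
       unescaping and without their /comment/.

```python
def read_pseudo_tags(lines) -> dict:
    """Map each !_TAG_ name to the value in its {tagfile} position."""
    info = {}
    for line in lines:
        if not line.startswith("!"):
            break  # assumption: pseudo-tags come before regular tags
        parts = line.rstrip("\r\n").split("\t", 2)
        # Skip rare regular tags such as "!" that merely sort nearby.
        if parts[0].startswith("!_TAG_") and len(parts) > 1:
            info[parts[0]] = parts[1]
    return info


header = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/",
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/",
    'main\tmain.c\t/^main(argc, argv)$/;"\tf',
]
info = read_pseudo_tags(header)
use_binary_search = info.get("!_TAG_FILE_SORTED") == "1"
print(info, use_binary_search)
```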

                                        ----


EXCEPTIONS IN UNIVERSAL CTAGS

       Universal Ctags supports this proposal with some exceptions.

   Exceptions
       1. {tagname} in a tags file generated by Universal Ctags may contain
          spaces and several escape sequences.  Parsers for documents like
          TeX and reStructuredText, or for liberal languages such as
          JavaScript, need these exceptions.  See {tagname} in the Proposal
          section for more details about the conversion.

       2. The "name" part of a {tagfield} in a tag generated by Universal
          Ctags may contain numeric characters, but the first character of
          the "name" must be alphabetic.

   Compatible output and weakness
       The default behavior (--output-format=u-ctags option) has the
       exceptions.  On the other hand, with the --output-format=e-ctags
       option ctags has no exceptions; the Universal Ctags command may use
       the same file format as Exuberant Ctags.  However,
       --output-format=e-ctags throws away any tag entry whose name includes
       a space or a tab character.  The TAG_OUTPUT_MODE pseudo-tag tells
       which format was used when ctags generated the tags file.

SEE ALSO

       ctags(1), ctags-client-tools(7), ctags-incompatibilities(7),
       readtags(1)

2+                                                                     TAGS(5)