TAGS(5) Universal Ctags TAGS(5)

NAME
tags - Vi tags file format extended in ctags projects

DESCRIPTION
The contents of the next section are a copy of the FORMAT file in the
Exuberant Ctags source code, from its Subversion repository at
sourceforge.net.

Exceptions introduced in Universal Ctags are explained inline with an
"EXCEPTION" marker.

----

Version: 0.06 DRAFT
Date: 1998 Feb 8
Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>

Introduction
The file format for the "tags" file, as used by Vi and many of its
descendants, has limited capabilities.

This additional functionality is desired:

1. Static or local tags. The scope of these tags is the file where
they are defined. The same tag can appear in several files, without
really being a duplicate.

2. Duplicate tags. Allow the same tag to occur more than once. They
can be located in a different file and/or have a different command.

3. Support for C++. A tag is not only specified by its name, but also
by the context (the class name).

4. Future extension. When even more additional functionality is
desired, it must be possible to add this later, without breaking
programs that don't support it.

From proposal to standard
To make this proposal into a standard for tags files, it needs to be
supported by most people working on versions of Vi, ctags, etc.
Currently this standard is supported by:

Darren Hiebert <dhiebert at users.sourceforge.net>
Exuberant Ctags

Bram Moolenaar <Bram at vim.org>
Vim (Vi IMproved)

These have been or will be asked to support this standard:

Nvi Keith Bostic <bostic at bsdi.com>

Vile Tom E. Dickey <dickey at clark.net>

NEdit Mark Edel <edel at ltx.com>

CRiSP Paul Fox <fox at crisp.demon.co.uk>

Lemmy James Iuliano <jai at accessone.com>

Zeus Jussi Jumppanen <jussij at ca.com.au>

Elvis Steve Kirkendall <kirkenda at cs.pdx.edu>

FTE Marko Macek <Marko.Macek at snet.fri.uni-lj.si>

Backwards compatibility
A tags file that is generated in the new format should still be usable
by Vi. This makes it possible to distribute tags files that are usable
by all versions and descendants of Vi.

This restricts the format to what Vi can handle. The format is:

1. The tags file is a list of lines, each line in the format:

{tagname}<Tab>{tagfile}<Tab>{tagaddress}

{tagname}
Any identifier, not containing white space.

EXCEPTION: Universal Ctags violates this item of the proposal;
tagname may contain spaces. However, tabs are not allowed.

<Tab> Exactly one TAB character (although many versions of Vi can
handle any amount of white space).

{tagfile}
The name of the file where {tagname} is defined, relative to
the current directory (or location of the tags file?).

{tagaddress}
Any Ex command. When executed, it behaves as if 'magic' were
not set.

2. The tags file is sorted on {tagname}. This allows for a binary
search in the file.

3. Duplicate tags are allowed, but which one is actually used is
unpredictable (because of the binary search).
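
Because the file is sorted on {tagname}, a reader can locate a tag with a binary search instead of scanning every line. The Python sketch below assumes the file has already been split into a list of lines without terminators, sorted in plain byte (code point) order. `find_tags` and `tagname` are hypothetical helper names and are not part of any ctags API. Duplicates are allowed, so the sketch returns the whole run of matching lines rather than an arbitrary one. A real reader would more likely seek within the file than load all of it into memory.

```python
def tagname(line):
    """Return the {tagname} field: everything before the first <Tab>."""
    return line.split("\t", 1)[0]


def find_tags(lines, name):
    """Return every line whose tagname equals `name`.

    `lines` must be sorted on tagname. Duplicates then sit next to each
    other, so all matches form one contiguous run.
    """
    lo, hi = 0, len(lines)
    # Lower-bound binary search: first index whose tagname is >= name.
    while lo < hi:
        mid = (lo + hi) // 2
        if tagname(lines[mid]) < name:
            lo = mid + 1
        else:
            hi = mid
    # Collect the run of duplicates that starts at that index.
    matches = []
    while lo < len(lines) and tagname(lines[lo]) == name:
        matches.append(lines[lo])
        lo += 1
    return matches


lines = [
    "DEBUG\tdefines.c\t89",
    "main\tmain.c\t/^main(argc, argv)$/",
    "main\tother.c\t12",  # hypothetical duplicate tag
]
print(find_tags(lines, "main"))  # both "main" lines
```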

The best way to add extra text to the line for the new functionality,
without breaking it for Vi, is to put a comment in the {tagaddress}.
This gives the freedom to use any text, and should work in any
traditional Vi implementation.

For example, when the old tags file contains:

main main.c /^main(argc, argv)$/
DEBUG defines.c 89

The new lines can be:

main main.c /^main(argc, argv)$/;"any additional text
DEBUG defines.c 89;"any additional text

Note that the ';' is required to put the cursor in the right line, and
then the '"' is recognized as the start of a comment.

For POSIX-compliant Vi versions this will NOT work, since only a line
number or a search command is recognized. I hope POSIX can be
adjusted. Nvi suffers from this.

Security
Vi allows the use of any Ex command in a tags file. This opens the
door to a Trojan horse attack.

The proposal is to allow only Ex commands that position the cursor in a
single file. Other commands, like editing another file, quitting the
editor, changing a file or writing a file, are not allowed. It is
therefore logical to call the command a tagaddress.

Specifically, these two Ex commands are allowed:

• A decimal line number:

89

• A search command. It is a regular expression pattern, as used by Vi,
enclosed in // or ??:

/^int c;$/
?main()?
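
Only these two forms are allowed, so a reader can reject any other command before passing it to the editor. The minimal Python sketch below assumes that a '/' or '?' inside a pattern is escaped with a backslash. `is_allowed_command` is a hypothetical name. It checks a single command only, not the ';'-combined forms.

```python
import re

# A decimal line number, e.g. 89.
_LINE_NUMBER = re.compile(r"[0-9]+")
# /pattern/ or ?pattern?. Inside the pattern a backslash escapes the next
# character, so "\/" does not end a /.../ search (assumption).
_FORWARD_SEARCH = re.compile(r"/(?:[^\\/]|\\.)*/")
_BACKWARD_SEARCH = re.compile(r"\?(?:[^\\?]|\\.)*\?")


def is_allowed_command(cmd):
    """True if `cmd` is exactly one allowed Ex command (no ';' combinations)."""
    return any(p.fullmatch(cmd) for p in (_LINE_NUMBER, _FORWARD_SEARCH, _BACKWARD_SEARCH))


print(is_allowed_command("/^int c;$/"))  # True
print(is_allowed_command("w /tmp/x"))    # False: writing a file is not allowed
```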

Two combinations are possible:

• Concatenation of the above, with ';' in between. The meaning is that
the first line number or search command is used, the cursor is
positioned in that line, and then the second search command is used (a
line number would not be useful). This can be done multiple times.
This is useful when the information in a single line is not unique,
and the search needs to start in a specified line.

/struct xyz {/;/int count;/
389;/struct foo/;/char *s;/

• A trailing comment can be added, starting with ';"' (two characters:
semicolon and double-quote). This is used below.

89;" foo bar
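
A ';' can also occur inside a search pattern (as in /int count;/ above), so a tagaddress cannot simply be split on every ';'. The Python sketch below tracks whether it is inside a /.../ or ?...? pattern and stops at the first ';"' found outside one. `split_address` is a hypothetical name. Treating a backslash inside a pattern as escaping the next character is an assumption of this sketch.

```python
def split_address(address):
    """Split a tagaddress into its ';'-separated commands and trailing comment.

    Returns (commands, comment); comment is None when no ';"' is present.
    """
    commands, current = [], []
    delim = None  # '/' or '?' while inside a search pattern
    i = 0
    while i < len(address):
        ch = address[i]
        if delim is not None:                    # inside a pattern
            current.append(ch)
            if ch == "\\" and i + 1 < len(address):
                current.append(address[i + 1])   # keep the escaped character
                i += 1
            elif ch == delim:
                delim = None                     # pattern closed
        elif ch in "/?":
            delim = ch                           # pattern opened
            current.append(ch)
        elif ch == ";":
            commands.append("".join(current))
            current = []
            if address.startswith('"', i + 1):   # ';"' starts the comment
                return commands, address[i + 2:]
        else:
            current.append(ch)
        i += 1
    commands.append("".join(current))
    return commands, None


print(split_address("389;/struct foo/;/char *s;/"))
# (['389', '/struct foo/', '/char *s;/'], None)
```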

This might be extended in the future. What is currently missing is a
way to position the cursor in a certain column.

Goals
Now the use of the comment text has to be defined, with the following
aims:

1. Keep the text short, because:

• The line length that Vi can handle is limited to 512 characters.

• Tags files can contain thousands of tags. I have seen tags files
of several megabytes.

• More text makes searching slower.

2. Keep the text readable, because:

• It is often necessary to check the output of a new ctags program.

• It should be possible to edit the file by hand.

• It should be easy to write a program to produce or parse the file.

3. Don't use special characters, because:

• It should be possible to treat a tags file like any normal text
file.

Proposal
Use a comment after the {tagaddress} field. The format would be:

{tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]

{tagname}
Any identifier, not containing white space.

EXCEPTION: Universal Ctags violates this item of the proposal;
{tagname} may contain spaces. However, tabs are not allowed. The
conversion rules for "value" described at the end of this section
(which cover some characters, including <Tab>) are also applied to
{tagname}.

<Tab> Exactly one TAB character (although many versions of Vi can
handle any amount of white space).

{tagfile}
The name of the file where {tagname} is defined, relative to the
current directory (or location of the tags file?).

{tagaddress}
Any Ex command. When executed, it behaves as if 'magic' were not
set. It may be restricted to a line number or a search pattern
(POSIX).

Optionally:

;" semicolon + doublequote: Ends the tagaddress in a way that looks
like the start of a comment to Vi.

{tagfield}
See below.

A tagfield has a name, a colon, and a value: "name:value".

• The name consists only of alphabetic characters. Upper and
lower case are allowed. Lower case is recommended. Case matters
("kind:" and "Kind:" are different tagfields).

EXCEPTION: Universal Ctags allows numeric characters in the name,
except as its first character.

• The value may be empty. It cannot contain a <Tab>.

• When a value contains a \t, this stands for a <Tab>.

• When a value contains a \r, this stands for a <CR>.

• When a value contains a \n, this stands for a <NL>.

• When a value contains a \\, this stands for a single \ character.

Other uses of the backslash character are reserved for future
expansion. Warning: When a tagfield value holds an MS-DOS file name,
the backslashes must be doubled!
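
Decoding must be done left to right in a single pass, so that \\t yields a backslash followed by t rather than a <Tab>. The Python sketch below covers only the four escapes listed above. `unescape_value` is a hypothetical name. The proposal does not say how a reader should handle reserved backslash sequences, so leaving them untouched is this sketch's own choice.

```python
# The four escapes defined by the original proposal.
_ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\"}


def unescape_value(value):
    """Decode a tagfield value; reserved backslash sequences are kept as-is."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in _ESCAPES:
            out.append(_ESCAPES[value[i + 1]])  # consume the two-character escape
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(unescape_value(r"c:\\vim\\sap"))  # c:\vim\sap
```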

EXCEPTION: Universal Ctags introduces more conversion rules.

• When a value contains a \a, this stands for a <BEL> (0x07).

• When a value contains a \b, this stands for a <BS> (0x08).

• When a value contains a \v, this stands for a <VT> (0x0b).

• When a value contains a \f, this stands for a <FF> (0x0c).

• Characters in the range 0x01 to 0x1F inclusive, and 0x7F, that are
not handled by the "value" rules above are converted to a \x-prefixed
hexadecimal number.

• A leading space (0x20) or ! (0x21) in {tagname} is converted to a
\x-prefixed hexadecimal number (\x20 or \x21) if the tag is not a
pseudo-tag. As described later, a pseudo-tag starts with !. These
rules keep pseudo-tags and non-pseudo-tags (regular tags) distinct
when the lines of a tags file are sorted.
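
A writer applies these rules in the opposite direction. The Python sketch below encodes a value using the short escapes from both lists, and the \x form for the remaining control characters. It also handles a leading space or ! in a regular tagname. `escape_value` and `escape_tagname` are hypothetical names, not Universal Ctags functions. Writing the hexadecimal digits in uppercase is an assumption, since the text above only shows \x20 and \x21.

```python
# Short escapes: the original proposal plus the Universal Ctags additions.
_SHORT = {"\t": "t", "\r": "r", "\n": "n", "\\": "\\",
          "\a": "a", "\b": "b", "\v": "v", "\f": "f"}


def escape_value(text):
    """Encode a tagfield value (or tagname) with the escape rules above."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in _SHORT:
            out.append("\\" + _SHORT[ch])
        elif 0x01 <= code <= 0x1F or code == 0x7F:
            # Remaining control characters: \x plus two hex digits
            # (uppercase digits are an assumption of this sketch).
            out.append("\\x%02X" % code)
        else:
            out.append(ch)
    return "".join(out)


def escape_tagname(name, pseudo=False):
    """Escape a tagname; a leading space or '!' of a regular tag becomes \\x20 / \\x21."""
    escaped = escape_value(name)
    if not pseudo and escaped[:1] in (" ", "!"):
        escaped = "\\x%02X" % ord(escaped[0]) + escaped[1:]
    return escaped


print(escape_tagname("!not a pseudo-tag"))  # \x21not a pseudo-tag
```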

Proposed tagfield names:

┌───────────┬────────────────────────────┐
│FIELD-NAME │ DESCRIPTION │
├───────────┼────────────────────────────┤
│arity │ Number of arguments for a │
│ │ function tag. │
├───────────┼────────────────────────────┤
│class │ Name of the class for │
│ │ which this tag is a member │
│ │ or method. │
├───────────┼────────────────────────────┤
│enum │ Name of the enumeration in │
│ │ which this tag is an enu‐ │
│ │ merator. │
├───────────┼────────────────────────────┤
│file │ Static (local) tag, with a │
│ │ scope of the specified │
│ │ file. When the value is │
│ │ empty, {tagfile} is used. │
├───────────┼────────────────────────────┤
│function │ Function in which this tag │
│ │ is defined. Useful for │
│ │ local variables (and func‐ │
│ │ tions). When functions │
│ │ nest (e.g., in Pascal), │
│ │ the function names are │
│ │ concatenated, separated │
│ │ with '/', so it looks like │
│ │ a path. │
├───────────┼────────────────────────────┤
│kind │ Kind of tag. The value │
│ │ depends on the language. │
│ │ For C and C++ these kinds │
│ │ are recommended: │
│ │ │
│ │ c class name │
│ │ │
│ │ d define (from │
│ │ #define XXX) │
│ │ │
│ │ e enumerator │
│ │ │
│ │ f function or │
│ │ method name │
│ │ │
│ │ F file name │
│ │ │
│ │ g enumeration │
│ │ name │
│ │ │
│ │ m member (of │
│ │ structure or │
│ │ class data) │
│ │ │
│ │ p function │
│ │ prototype │
│ │ │
│ │ s structure │
│ │ name │
│ │ │
│ │ t typedef │
│ │ │
│ │ u union name │
│ │ │
│ │ v variable │
│ │ │
│ │ When this field is │
│ │ omitted, the kind │
│ │ of tag is unde‐ │
│ │ fined. │
├───────────┼────────────────────────────┤
│struct │ Name of the struct in │
│ │ which this tag is a mem‐ │
│ │ ber. │
├───────────┼────────────────────────────┤
│union │ Name of the union in which │
│ │ this tag is a member. │
└───────────┴────────────────────────────┘

Note that these are mostly for C and C++. When tags programs are
written for other languages, this list should be extended to include
the field names they use. This will help users to be independent of
the tags program used.

Examples:

asdf sub.cc /^asdf()$/;" new_field:some\svalue file:
foo_t sub.h /^typedef foo_t$/;" kind:t
func3 sub.p /^func3()$/;" function:/func1/func2 file:
getflag sub.c /^getflag(arg)$/;" kind:f file:
inc sub.cc /^inc()$/;" file: class:PipeBuf

The name of the "kind:" field can be omitted. This is to reduce the
size of the tags file by about 15%. A program reading the tags file
can recognize the "kind:" field by the missing ':'. Examples:

foo_t sub.h /^typedef foo_t$/;" t
getflag sub.c /^getflag(arg)$/;" f file:

Additional remarks:

• When a tagfield appears twice in a tag line, only the last one is
used.
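
Putting these rules together, a reader splits off {tagname} and {tagfile} at the first two <Tab>s, looks for ;"<Tab>, and then reads one tagfield per <Tab>-separated item. An item without ':' is taken as the kind, and a repeated field overwrites the earlier one. In the minimal Python sketch below, `parse_tag_line` is a hypothetical name and values are returned still escaped. The sketch assumes the sequence ;"<Tab> never occurs inside the tagaddress itself.

```python
def parse_tag_line(line):
    """Parse {tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..].

    Returns (tagname, tagfile, tagaddress, fields). Raises ValueError if
    the line has fewer than three <Tab>-separated parts.
    """
    name, tagfile, rest = line.split("\t", 2)
    fields = {}
    # Assumption: the first ;"<Tab> ends the tagaddress.
    sep = rest.find(';"\t')
    if sep < 0:
        return name, tagfile, rest, fields
    address = rest[:sep]
    for item in rest[sep + 3:].split("\t"):
        if not item:
            continue
        if ":" in item:
            key, value = item.split(":", 1)
        else:
            key, value = "kind", item  # the "kind:" name was omitted
        fields[key] = value            # a later duplicate replaces an earlier one
    return name, tagfile, address, fields


print(parse_tag_line('getflag\tsub.c\t/^getflag(arg)$/;"\tf\tfile:'))
# ('getflag', 'sub.c', '/^getflag(arg)$/', {'kind': 'f', 'file': ''})
```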

Note about line separators:

Vi traditionally runs on Unix systems, where the line separator is a
single linefeed character <NL>. On MS-DOS and compatible systems
<CR><NL> is the standard line separator. To increase portability, this
line separator is also supported.

On the Macintosh a single <CR> is used as the line separator.
Supporting this on Unix systems causes problems, because most fgets()
implementations do not treat <CR> as a line separator. Therefore the
support for a <CR> as line separator is limited to the Macintosh.

Summary:

┌───────────────┬──────────────┬─────────────────────┐
│line separator │ generated on │ accepted on │
├───────────────┼──────────────┼─────────────────────┤
│<LF> │ Unix │ Unix, MS-DOS, Mac‐ │
│ │ │ intosh │
├───────────────┼──────────────┼─────────────────────┤
│<CR> │ Macintosh │ Macintosh │
├───────────────┼──────────────┼─────────────────────┤
│<CR><LF> │ MS-DOS │ Unix, MS-DOS, Mac‐ │
│ │ │ intosh │
└───────────────┴──────────────┴─────────────────────┘

The characters <CR> and <LF> cannot be used inside a tag line. This is
not mentioned elsewhere (because it's obvious).
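
A reader on Unix can accept both <LF> and <CR><LF> files by splitting on <LF> and dropping one trailing <CR> from each line. Because <CR> cannot occur inside a tag line, this never removes tag data. The Python sketch below works on raw bytes, so no newline translation happens behind the reader's back. `split_tag_lines` is a hypothetical name, and the sketch deliberately does not treat a bare <CR> as a separator.

```python
def split_tag_lines(data):
    """Split the raw bytes of a tags file into lines.

    Accepts <LF> and <CR><LF> terminators; a bare <CR> is not a separator.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the file normally ends with a terminator
    # Drop the <CR> of a <CR><LF> pair.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]


print(split_tag_lines(b"a\tb.c\t1\r\nc\td.c\t2\r\n"))
# [b'a\tb.c\t1', b'c\td.c\t2']
```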

Note about white space:

Vi allowed any white space to separate the tagname from the tagfile,
and the filename from the tagaddress. This would need to be allowed
for backwards compatibility. However, all known programs that generate
tags use a single <Tab> to separate fields.

There is a problem with file names that contain white space in the
tagfile field. To work around this, the same special characters could
be used as in the new fields, for example \s. But, unfortunately, in
MS-DOS the backslash character is used to separate the components of a
path. The file name c:\vim\sap contains \s, but this is not a <Space>.
The number of backslashes could be doubled, but that would add a lot of
characters, and make parsing the tags file slower and clumsier.

To avoid these problems, we will only allow a <Tab> to separate fields,
and not support a file name or tagname that contains a <Tab> character.
This means that we are not 100% Vi compatible. However, there is no
known tags program that uses anything other than a <Tab> to separate
the fields. Problems could arise only when a user typed the tags file
by hand or wrote their own program to generate it. To solve this, the
tags file should be filtered to replace the arbitrary white space with
a single <Tab>. This Vi command can be used:

:%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

(replace ^I with a real <Tab>).
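
The same filter can be run outside Vi. The Python sketch below applies the substitution above to one line. `normalize_separators` is a hypothetical name. Like the Vi command, it assumes that neither the tagname nor the file name contains a space.

```python
import re

# Same pattern as the Vi command: field 1, a run of blanks, field 2, a run
# of blanks. Both runs are replaced by a single <Tab>.
_LEADING_FIELDS = re.compile(r"^([^ \t]*)[ \t]*([^ \t]*)[ \t]*")


def normalize_separators(line):
    """Rewrite the first two field separators of a tag line as single Tabs."""
    return _LEADING_FIELDS.sub(r"\1\t\2\t", line, count=1)


print(repr(normalize_separators("main   main.c  /^main(argc, argv)$/")))
# 'main\tmain.c\t/^main(argc, argv)$/'
```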

TAG FILE INFORMATION:

Pseudo-tag lines can be used to encode information into the tag file
about its content (e.g., whether the tags have been sorted, or whether
the optional tagfields are present), and about the program used to
generate the tag file. This information can be used both to optimize
use of the tag file (e.g., enable/disable binary searching) and to
provide general information (what version of the generator was used).

The names of the tags used in these lines may be suitably chosen to
ensure that, when sorted, they will always be located near the first
lines of the tag file. The use of "!_TAG_" is recommended. Note that a
rare tag like "!" can sort before these lines. The program reading the
tags file should be smart enough to skip over these tags.

The lines described below have been chosen to convey a select set of
information.

Tag lines providing information about the content of the tag file:

!_TAG_FILE_FORMAT {version-number} /optional comment/
!_TAG_FILE_SORTED {0|1} /0=unsorted, 1=sorted/

The {version-number} used in the tag file format line reserves the
value of "1" for tag files complying with the original UNIX vi/ctags
format, and reserves the value "2" for tag files complying with this
proposal. This value may be used to determine if the extended features
described in this proposal are present.
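
A reader can collect these lines before choosing a search strategy, for example falling back to a linear scan unless !_TAG_FILE_SORTED is 1. In the Python sketch below, `read_pseudo_tags` and `can_binary_search` are hypothetical names. The sketch makes two assumptions. First, each pseudo-tag line uses single <Tab>s between name, value and /comment/, like a regular tag line. Second, all pseudo-tags appear before the first line that does not start with '!'.

```python
def read_pseudo_tags(lines):
    """Collect !_TAG_ pseudo-tags into a dict: name -> (value, comment)."""
    info = {}
    for line in lines:
        if not line.startswith("!"):
            break                      # assumption: pseudo-tags come first
        if not line.startswith("!_TAG_"):
            continue                   # a rare regular tag such as "!"
        name, _, rest = line.partition("\t")
        value, _, comment = rest.partition("\t")
        if len(comment) >= 2 and comment.startswith("/") and comment.endswith("/"):
            comment = comment[1:-1]    # strip the /.../ around the comment
        info[name] = (value, comment)
    return info


def can_binary_search(info):
    """Binary search is only safe when the file declares itself sorted."""
    return info.get("!_TAG_FILE_SORTED", ("0", ""))[0] == "1"


header = ["!_TAG_FILE_FORMAT\t2\t/extended format/",
          "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/"]
print(can_binary_search(read_pseudo_tags(header)))  # True
```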

Tag lines providing information about the program used to generate the
tag file, and provided solely for documentation purposes:

!_TAG_PROGRAM_AUTHOR {author-name} /{email-address}/
!_TAG_PROGRAM_NAME {program-name} /optional comment/
!_TAG_PROGRAM_URL {URL} /optional comment/
!_TAG_PROGRAM_VERSION {version-id} /optional comment/

EXCEPTION: Universal Ctags introduces more kinds of pseudo-tags. See
ctags-client-tools(7) for details.

----

Universal Ctags supports this proposal with some exceptions.

Exceptions
1. {tagname} in a tags file generated by Universal Ctags may contain
spaces and several escape sequences. Parsers for documents like TeX
and reStructuredText, or for liberal languages such as JavaScript, need
these exceptions. See {tagname} in the Proposal section for more
details about the conversion.

2. The "name" part of a {tagfield} in a tag generated by Universal
Ctags may contain numeric characters, but the first character of the
"name" must be alphabetic.

Compatible output and weakness
The default behavior (--output-format=u-ctags option) includes these
exceptions. With the --output-format=e-ctags option, on the other hand,
ctags has no exceptions: the Universal Ctags command uses the same file
format as Exuberant Ctags. However, --output-format=e-ctags discards
any tag entry whose name includes a space or a tab character. The
TAG_OUTPUT_MODE pseudo-tag records which format was used when ctags
generated the tags file.

SEE ALSO
ctags(1), ctags-client-tools(7), ctags-incompatibilities(7), readtags(1)

2+ TAGS(5)