TAGS(5) Universal Ctags TAGS(5)
2
3
4
6 tags - Vi tags file format extended in ctags projects
7
The content of the next section is a copy of the FORMAT file in the
Exuberant Ctags source code, from its Subversion repository at
sourceforge.net.

Exceptions introduced in Universal Ctags are explained inline with an
"EXCEPTION" marker. Statements that Universal Ctags clarifies further
are explained inline with a "COMMENT" marker.
15
16
17 ----
18
19
20
22 Version: 0.06 DRAFT
23 Date: 1998 Feb 8
24 Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>
25
26
27 Introduction
28 The file format for the "tags" file, as used by Vi and many of its de‐
29 scendants, has limited capabilities.
30
31 This additional functionality is desired:
32
33 1. Static or local tags. The scope of these tags is the file where
34 they are defined. The same tag can appear in several files, without
35 really being a duplicate.
36
2. Duplicate tags. Allow the same tag to occur more than once. They
can be located in a different file and/or have a different command.
39
40 3. Support for C++. A tag is not only specified by its name, but also
41 by the context (the class name).
42
43 4. Future extension. When even more additional functionality is de‐
44 sired, it must be possible to add this later, without breaking pro‐
45 grams that don't support it.
46
47 From proposal to standard
To make this proposal into a standard for tags files, it needs to be
supported by most people working on versions of Vi, ctags, etc.
Currently this standard is supported by:
51
52 Darren Hiebert <dhiebert at users.sourceforge.net>
53 Exuberant Ctags
54
55 Bram Moolenaar <Bram at vim.org>
56 Vim (Vi IMproved)
57
58 These have been or will be asked to support this standard:
59
60 Nvi Keith Bostic <bostic at bsdi.com>
61
62 Vile Tom E. Dickey <dickey at clark.net>
63
64 NEdit Mark Edel <edel at ltx.com>
65
66 CRiSP Paul Fox <fox at crisp.demon.co.uk>
67
68 Lemmy James Iuliano <jai at accessone.com>
69
70 Zeus Jussi Jumppanen <jussij at ca.com.au>
71
72 Elvis Steve Kirkendall <kirkenda at cs.pdx.edu>
73
74 FTE Marko Macek <Marko.Macek at snet.fri.uni-lj.si>
75
76 Backwards compatibility
77 A tags file that is generated in the new format should still be usable
78 by Vi. This makes it possible to distribute tags files that are usable
79 by all versions and descendants of Vi.
80
81 This restricts the format to what Vi can handle. The format is:
82
83 1. The tags file is a list of lines, each line in the format:
84
85 {tagname}<Tab>{tagfile}<Tab>{tagaddress}
86
87 {tagname}
Any identifier, not containing white space.
89
90 EXCEPTION: Universal Ctags violates this item of the pro‐
91 posal; tagname may contain spaces. However, tabs are not al‐
92 lowed.
93
94 <Tab> Exactly one TAB character (although many versions of Vi can
95 handle any amount of white space).
96
97 {tagfile}
98 The name of the file where {tagname} is defined, relative to
99 the current directory (or location of the tags file?).
100
101 {tagaddress}
102 Any Ex command. When executed, it behaves like 'magic' was
103 not set.
104
105 2. The tags file is sorted on {tagname}. This allows for a binary
106 search in the file.
107
108 3. Duplicate tags are allowed, but which one is actually used is unpre‐
109 dictable (because of the binary search).
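
As an illustration of items 2 and 3, the following is a minimal Python
sketch of a binary search over tag lines that are already loaded into a
list. The function names find_first and lookup are hypothetical, and the
sketch assumes the lines are sorted by {tagname} in plain code-point
order. Unlike a reader that stops at the first hit, it walks forward
from the leftmost match so that all duplicates are returned.

```python
def tag_name(line):
    # The tagname is everything before the first <Tab>.
    return line.split("\t", 1)[0]


def find_first(lines, name):
    """Return the index of the first line whose tagname is >= name."""
    lo, hi = 0, len(lines)
    while lo < hi:
        mid = (lo + hi) // 2
        if tag_name(lines[mid]) < name:
            lo = mid + 1
        else:
            hi = mid
    return lo


def lookup(lines, name):
    """Return every tag line for name, in file order."""
    i = find_first(lines, name)
    matches = []
    while i < len(lines) and tag_name(lines[i]) == name:
        matches.append(lines[i])
        i += 1
    return matches
```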
110
111 The best way to add extra text to the line for the new functionality,
112 without breaking it for Vi, is to put a comment in the {tagaddress}.
113 This gives the freedom to use any text, and should work in any tradi‐
114 tional Vi implementation.
115
116 For example, when the old tags file contains:
117
118 main main.c /^main(argc, argv)$/
119 DEBUG defines.c 89
120
121 The new lines can be:
122
123 main main.c /^main(argc, argv)$/;"any additional text
124 DEBUG defines.c 89;"any additional text
125
126 Note that the ';' is required to put the cursor in the right line, and
127 then the '"' is recognized as the start of a comment.
128
129 For Posix compliant Vi versions this will NOT work, since only a line
130 number or a search command is recognized. I hope Posix can be ad‐
131 justed. Nvi suffers from this.
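
The layout above can be taken apart with a few string operations. This
is only a sketch: split_tag_line is a hypothetical helper, and it
assumes that the two characters ;" do not occur inside the {tagaddress}
itself, which a robust reader would have to check by parsing the search
pattern.

```python
def split_tag_line(line):
    """Split a tag line into (tagname, tagfile, tagaddress, comment).

    comment is None when the line carries no ;" comment.
    """
    line = line.rstrip("\r\n")
    tagname, tagfile, rest = line.split("\t", 2)
    # Assumption: the first ;" marks the start of the comment.
    address, marker, comment = rest.partition(';"')
    if not marker:
        comment = None
    return tagname, tagfile, address, comment
```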
132
133 Security
134 Vi allows the use of any Ex command in a tags file. This has the po‐
135 tential of a trojan horse security leak.
136
137 The proposal is to allow only Ex commands that position the cursor in a
138 single file. Other commands, like editing another file, quitting the
139 editor, changing a file or writing a file, are not allowed. It is
140 therefore logical to call the command a tagaddress.
141
142 Specifically, these two Ex commands are allowed:
143
144 • A decimal line number:
145
146 89
147
148 • A search command. It is a regular expression pattern, as used by Vi,
149 enclosed in // or ??:
150
151 /^int c;$/
152 ?main()?
153
154 There are two combinations possible:
155
156 • Concatenation of the above, with ';' in between. The meaning is that
157 the first line number or search command is used, the cursor is posi‐
158 tioned in that line, and then the second search command is used (a
159 line number would not be useful). This can be done multiple times.
160 This is useful when the information in a single line is not unique,
161 and the search needs to start in a specified line.
162
163 /struct xyz {/;/int count;/
164 389;/struct foo/;/char *s;/
165
166 • A trailing comment can be added, starting with ';"' (two characters:
167 semi-colon and double-quote). This is used below.
168
169 89;" foo bar
170
171 This might be extended in the future. What is currently missing is a
172 way to position the cursor in a certain column.
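
A reader that wants to enforce this restriction could check each
{tagaddress} before executing it. The following Python sketch accepts
only line numbers and search commands, joined with ';' and optionally
followed by a ';"' comment. The name is_safe_tagaddress is hypothetical,
and the sketch assumes that a backslash inside a pattern escapes the
next character, including the delimiter.

```python
DIGITS = "0123456789"


def is_safe_tagaddress(addr):
    """Return True if addr only positions the cursor."""
    i, n = 0, len(addr)
    while True:
        if i < n and addr[i] in DIGITS:
            # A decimal line number.
            while i < n and addr[i] in DIGITS:
                i += 1
        elif i < n and addr[i] in "/?":
            # A search command enclosed in // or ??.
            delim = addr[i]
            i += 1
            while i < n and addr[i] != delim:
                i += 2 if addr[i] == "\\" else 1
            if i >= n:
                return False  # closing delimiter is missing
            i += 1
        else:
            return False  # any other Ex command is rejected
        if i == n:
            return True
        if addr[i] != ";":
            return False
        i += 1
        if i < n and addr[i] == '"':
            return True  # the rest of the line is a comment
```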
173
174 Goals
175 Now the usage of the comment text has to be defined. The following is
176 aimed at:
177
178 1. Keep the text short, because:
179
180 • The line length that Vi can handle is limited to 512 characters.
181
182 • Tags files can contain thousands of tags. I have seen tags files
183 of several Mbytes.
184
185 • More text makes searching slower.
186
187 2. Keep the text readable, because:
188
189 • It is often necessary to check the output of a new ctags program.
190
191 • Be able to edit the file by hand.
192
193 • Make it easier to write a program to produce or parse the file.
194
195 3. Don't use special characters, because:
196
197 • It should be possible to treat a tags file like any normal text
198 file.
199
200 Proposal
201 Use a comment after the {tagaddress} field. The format would be:
202
203 {tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]
204
205 {tagname}
Any identifier, not containing white space.

EXCEPTION: Universal Ctags violates this item of the proposal;
name may contain spaces. However, tabs are not allowed. The
conversion rules for some characters, including <Tab>, that are
explained for the "value" at the end of this section are applied.
212
213 <Tab> Exactly one TAB character (although many versions of Vi can han‐
214 dle any amount of white space).
215
216 {tagfile}
217 The name of the file where {tagname} is defined, relative to the
218 current directory (or location of the tags file?).
219
220 {tagaddress}
221 Any Ex command. When executed, it behaves like 'magic' was not
222 set. It may be restricted to a line number or a search pattern
223 (Posix).
224
COMMENT: {tagaddress} can contain tab characters. See
ctags-client-tools(7) to learn how to programmatically extract
{tagaddress} (called the "pattern field" there) and parse it.
228
229 Optionally:
230
;" semicolon + doublequote: Ends the tagaddress in a way that looks
like the start of a comment to Vi.
233
234 {tagfield}
235 See below.
236
237 A tagfield has a name, a colon, and a value: "name:value".
238
• The name consists only of alphabetical characters. Upper and
lower case are allowed. Lower case is recommended. Case matters
("kind:" and "Kind:" are different tagfields).

EXCEPTION: Universal Ctags allows digits in the name, except as its
first character.
245
246 • The value may be empty. It cannot contain a <Tab>.
247
248 • When a value contains a \t, this stands for a <Tab>.
249
250 • When a value contains a \r, this stands for a <CR>.
251
252 • When a value contains a \n, this stands for a <NL>.
253
254 • When a value contains a \\, this stands for a single \ character.
255
256 Other use of the backslash character is reserved for future expan‐
257 sion. Warning: When a tagfield value holds an MS-DOS file name, the
258 backslashes must be doubled!
259
260 EXCEPTION: Universal Ctags introduces more conversion rules.
261
262 • When a value contains a \a, this stands for a <BEL> (0x07).
263
264 • When a value contains a \b, this stands for a <BS> (0x08).
265
266 • When a value contains a \v, this stands for a <VT> (0x0b).
267
268 • When a value contains a \f, this stands for a <FF> (0x0c).
269
• Characters in the range 0x01 to 0x1F (inclusive) and 0x7F are
converted to \x-prefixed hexadecimal numbers, unless one of the
"value" rules above already handles them.
273
274 EXCEPTION: Universal Ctags allows all these escape sequences in {tag‐
275 name} also.
276
• A leading space (0x20) or ! (0x21) in {tagname} is converted to a
\x-prefixed hexadecimal number (\x20 or \x21) if the tag is not a
pseudo-tag. As described later, a pseudo-tag starts with !. These
rules make it possible to distinguish pseudo-tags from regular tags
when the tag lines in a tag file are sorted.
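
The conversion rules above can be reversed when reading a value. The
following Python sketch decodes both the original escapes and the ones
added by Universal Ctags. The name unescape_value is hypothetical, the
sketch assumes that \x is always followed by exactly two hexadecimal
digits, and any other backslash sequence is left untouched because its
meaning is reserved.

```python
import re

# Single-character escapes from the proposal and from Universal Ctags.
SIMPLE_ESCAPES = {
    "t": "\t", "r": "\r", "n": "\n", "\\": "\\",
    "a": "\a", "b": "\b", "v": "\v", "f": "\f",
}

# A backslash followed by xHH, or by any single character.
ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def unescape_value(value):
    """Decode the escape sequences in a tagfield value or tagname."""
    def replace(match):
        body = match.group(1)
        if len(body) == 3:  # xHH form
            return chr(int(body[1:], 16))
        # Unknown sequences are reserved, so keep them unchanged.
        return SIMPLE_ESCAPES.get(body, match.group(0))

    return ESCAPE_RE.sub(replace, value)
```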
282
283 Proposed tagfield names:
284
┌───────────┬────────────────────────────┐
│FIELD-NAME │ DESCRIPTION                │
├───────────┼────────────────────────────┤
│arity      │ Number of arguments for a  │
│           │ function tag.              │
├───────────┼────────────────────────────┤
│class      │ Name of the class for      │
│           │ which this tag is a member │
│           │ or method.                 │
├───────────┼────────────────────────────┤
│enum       │ Name of the enumeration in │
│           │ which this tag is an       │
│           │ enumerator.                │
├───────────┼────────────────────────────┤
│file       │ Static (local) tag, with a │
│           │ scope of the specified     │
│           │ file. When the value is    │
│           │ empty, {tagfile} is used.  │
├───────────┼────────────────────────────┤
│function   │ Function in which this tag │
│           │ is defined. Useful for     │
│           │ local variables (and       │
│           │ functions). When functions │
│           │ nest (e.g., in Pascal),    │
│           │ the function names are     │
│           │ concatenated, separated    │
│           │ with '/', so it looks like │
│           │ a path.                    │
├───────────┼────────────────────────────┤
│kind       │ Kind of tag. The value     │
│           │ depends on the language.   │
│           │ For C and C++ these kinds  │
│           │ are recommended:           │
│           │                            │
│           │ c   class name             │
│           │ d   define (from           │
│           │     #define XXX)           │
│           │ e   enumerator             │
│           │ f   function or method     │
│           │     name                   │
│           │ F   file name              │
│           │ g   enumeration name       │
│           │ m   member (of structure   │
│           │     or class data)         │
│           │ p   function prototype     │
│           │ s   structure name         │
│           │ t   typedef                │
│           │ u   union name             │
│           │ v   variable               │
│           │                            │
│           │ When this field is         │
│           │ omitted, the kind of tag   │
│           │ is undefined.              │
├───────────┼────────────────────────────┤
│struct     │ Name of the struct in      │
│           │ which this tag is a        │
│           │ member.                    │
├───────────┼────────────────────────────┤
│union      │ Name of the union in which │
│           │ this tag is a member.      │
└───────────┴────────────────────────────┘
388
389 Note that these are mostly for C and C++. When tags programs are writ‐
390 ten for other languages, this list should be extended to include the
391 used field names. This will help users to be independent of the tags
392 program used.
393
394 Examples:
395
396 asdf sub.cc /^asdf()$/;" new_field:some\svalue file:
397 foo_t sub.h /^typedef foo_t$/;" kind:t
398 func3 sub.p /^func3()$/;" function:/func1/func2 file:
399 getflag sub.c /^getflag(arg)$/;" kind:f file:
400 inc sub.cc /^inc()$/;" file: class:PipeBuf
401
402 The name of the "kind:" field can be omitted. This is to reduce the
403 size of the tags file by about 15%. A program reading the tags file
404 can recognize the "kind:" field by the missing ':'. Examples:
405
406 foo_t sub.h /^typedef foo_t$/;" t
407 getflag sub.c /^getflag(arg)$/;" f file:
408
409 Additional remarks:
410
411 • When a tagfield appears twice in a tag line, only the last one is
412 used.
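
The rules for tagfields can be sketched in Python as follows. The
function takes the text that follows ;" on a tag line, treats a field
without a ':' as the "kind:" field, and lets a later field overwrite an
earlier one with the same name. The name parse_tagfields is
hypothetical, and the values are returned as written in the file,
without decoding escape sequences.

```python
def parse_tagfields(comment):
    """Parse the <Tab>-separated tagfields that follow ;" into a dict."""
    fields = {}
    for item in comment.split("\t"):
        if not item:
            continue  # skip the empty piece before the first <Tab>
        name, colon, value = item.partition(":")
        if not colon:
            # The name of the "kind:" field was omitted.
            name, value = "kind", item
        fields[name] = value  # a repeated field keeps the last value
    return fields
```

For example, the comment part of the getflag line above yields a "kind"
of "f" and an empty "file" value, which means the tag is static to its
own {tagfile}.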
413
414 Note about line separators:
415
416 Vi traditionally runs on Unix systems, where the line separator is a
417 single linefeed character <NL>. On MS-DOS and compatible systems
418 <CR><NL> is the standard line separator. To increase portability, this
419 line separator is also supported.
420
On the Macintosh a single <CR> is used as the line separator.
Supporting this on Unix systems causes problems, because most fgets()
implementations don't see the <CR> as a line separator. Therefore the
support for a <CR> as line separator is limited to the Macintosh.
425
426 Summary:
427
428 ┌───────────────┬──────────────┬─────────────────────┐
429 │line separator │ generated on │ accepted on │
430 ├───────────────┼──────────────┼─────────────────────┤
431 │<LF> │ Unix │ Unix, MS-DOS, Mac‐ │
432 │ │ │ intosh │
433 ├───────────────┼──────────────┼─────────────────────┤
434 │<CR> │ Macintosh │ Macintosh │
435 ├───────────────┼──────────────┼─────────────────────┤
436 │<CR><LF> │ MS-DOS │ Unix, MS-DOS, Mac‐ │
437 │ │ │ intosh │
438 └───────────────┴──────────────┴─────────────────────┘
439
440 The characters <CR> and <LF> cannot be used inside a tag line. This is
441 not mentioned elsewhere (because it's obvious).
442
443 Note about white space:
444
445 Vi allowed any white space to separate the tagname from the tagfile,
446 and the filename from the tagaddress. This would need to be allowed
447 for backwards compatibility. However, all known programs that generate
448 tags use a single <Tab> to separate fields.
449
450 There is a problem for using file names with embedded white space in
451 the tagfile field. To work around this, the same special characters
452 could be used as in the new fields, for example \s. But, unfortu‐
453 nately, in MS-DOS the backslash character is used to separate file
454 names. The file name c:\vim\sap contains \s, but this is not a
455 <Space>. The number of backslashes could be doubled, but that will add
456 a lot of characters, and make parsing the tags file slower and clumsy.
457
458 To avoid these problems, we will only allow a <Tab> to separate fields,
459 and not support a file name or tagname that contains a <Tab> character.
460 This means that we are not 100% Vi compatible. However, there is no
461 known tags program that uses something else than a <Tab> to separate
462 the fields. Only when a user typed the tags file himself, or made his
463 own program to generate a tags file, we could run into problems. To
464 solve this, the tags file should be filtered, to replace the arbitrary
465 white space with a single <Tab>. This Vi command can be used:
466
467 :%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/
468
469 (replace ^I with a real <Tab>).
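
Outside of Vi, the same filter can be written as a short Python sketch.
It mirrors the substitute command above: the first two runs of
non-blank characters on a line are kept, and the white space after each
of them is replaced with a single <Tab>. The name normalize_separators
is hypothetical.

```python
import re

# Same pattern as the Vi command: two fields, each followed by blanks.
LEADING_FIELDS = re.compile(r"^([^ \t]*)[ \t]*([^ \t]*)[ \t]*")


def normalize_separators(line):
    """Separate tagname, tagfile and tagaddress with single <Tab>s."""
    return LEADING_FIELDS.sub(r"\1\t\2\t", line, count=1)
```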
470
471 TAG FILE INFORMATION:
472
473 Pseudo-tag lines can be used to encode information into the tag file
474 regarding details about its content (e.g. have the tags been sorted?,
475 are the optional tagfields present?), and regarding the program used to
476 generate the tag file. This information can be used both to optimize
477 use of the tag file (e.g. enable/disable binary searching) and provide
478 general information (what version of the generator was used).
479
480 The names of the tags used in these lines may be suitably chosen to en‐
481 sure that when sorted, they will always be located near the first lines
482 of the tag file. The use of "!_TAG_" is recommended. Note that a rare
483 tag like "!" can sort to before these lines. The program reading the
484 tags file should be smart enough to skip over these tags.
485
486 The lines described below have been chosen to convey a select set of
487 information.
488
489 Tag lines providing information about the content of the tag file:
490
491 !_TAG_FILE_FORMAT {version-number} /optional comment/
492 !_TAG_FILE_SORTED {0|1} /0=unsorted, 1=sorted/
493
494 The {version-number} used in the tag file format line reserves the
495 value of "1" for tag files complying with the original UNIX vi/ctags
496 format, and reserves the value "2" for tag files complying with this
497 proposal. This value may be used to determine if the extended features
498 described in this proposal are present.
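
A program reading a tag file could collect these lines before deciding
how to search. The following Python sketch gathers every line whose
name starts with "!_TAG_" and skips all others, including a rare tag
such as "!". The function names are hypothetical, and treating a
missing !_TAG_FILE_SORTED line as "unsorted" is an assumption of this
sketch, not part of the proposal.

```python
def read_pseudo_tags(lines):
    """Return a dict that maps pseudo-tag names to their values."""
    info = {}
    for line in lines:
        if not line.startswith("!_TAG_"):
            continue  # a regular tag, or a rare tag such as "!"
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2:
            info[parts[0]] = parts[1]
    return info


def can_binary_search(info):
    # Assumption: without the pseudo-tag, fall back to a linear search.
    return info.get("!_TAG_FILE_SORTED") == "1"


def has_extended_fields(info):
    # "2" is reserved for tag files that comply with this proposal.
    return info.get("!_TAG_FILE_FORMAT") == "2"
```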
499
500 Tag lines providing information about the program used to generate the
501 tag file, and provided solely for documentation purposes:
502
503 !_TAG_PROGRAM_AUTHOR {author-name} /{email-address}/
504 !_TAG_PROGRAM_NAME {program-name} /optional comment/
505 !_TAG_PROGRAM_URL {URL} /optional comment/
506 !_TAG_PROGRAM_VERSION {version-id} /optional comment/
507
508 EXCEPTION: Universal Ctags introduces more kinds of pseudo-tags. See
509 ctags-client-tools(7) about them.
510
COMMENT: Though pseudo-tags are semantically different from regular
tags, they use the same format, which is:

{tagname}<Tab>{tagfile}<Tab>{tagaddress}

The escape sequences and illegal characters explained in the "Proposal"
section also apply to pseudo-tags.
518
519
520 ----
521
522
523
525 Universal Ctags supports this proposal with some exceptions.
526
527 Exceptions
1. {tagname} in a tags file generated by Universal Ctags may contain
spaces and several escape sequences. Parsers for documents like TeX
and reStructuredText, or for liberal languages such as JavaScript,
need these exceptions. See {tagname} in the Proposal section for more
detail about the conversion.

2. The "name" part of {tagfield} in a tag generated by Universal Ctags
may contain numeric characters, but the first character of the "name"
must be alphabetic.
537
538 Compatible output and weakness
The default behavior (the --output-format=u-ctags option) includes these
exceptions. With the --output-format=e-ctags option, on the other hand,
ctags applies no exceptions, so Universal Ctags can emit the same file
format as Exuberant Ctags. However, --output-format=e-ctags discards
any tag entry whose name includes a space or a tab character. The
TAG_OUTPUT_MODE pseudo-tag records which format ctags used when
generating the tags file.
545
ctags(1), ctags-client-tools(7), ctags-incompatibilities(7),
readtags(1)
549
550
551
552
2+ TAGS(5)