TR(1P)                    POSIX Programmer's Manual                   TR(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
tr — translate characters

SYNOPSIS
tr [-c|-C] [-s] string1 string2

tr -s [-c|-C] string1

tr -d [-c|-C] string1

tr -ds [-c|-C] string1 string2

DESCRIPTION
The tr utility shall copy the standard input to the standard output
with substitution or deletion of selected characters. The options
specified and the string1 and string2 operands shall control
translations that occur while copying characters and single-character
collating elements.

OPTIONS
The tr utility shall conform to the Base Definitions volume of
POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines.

The following options shall be supported:

-c        Complement the set of values specified by string1. See the
          EXTENDED DESCRIPTION section.

-C        Complement the set of characters specified by string1. See
          the EXTENDED DESCRIPTION section.

-d        Delete all occurrences of input characters that are specified
          by string1.

-s        Replace instances of repeated characters with a single
          character, as described in the EXTENDED DESCRIPTION section.

OPERANDS
The following operands shall be supported:

string1, string2
          Translation control strings. Each string shall represent a
          set of characters to be converted into an array of characters
          used for the translation. For a detailed description of how
          the strings are interpreted, see the EXTENDED DESCRIPTION
          section.

STDIN
The standard input can be any type of file.

INPUT FILES
None.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of tr:

LANG      Provide a default value for the internationalization
          variables that are unset or null. (See the Base Definitions
          volume of POSIX.1‐2017, Section 8.2, Internationalization
          Variables for the precedence of internationalization variables
          used to determine the values of locale categories.)

LC_ALL    If set to a non-empty string value, override the values of
          all the other internationalization variables.

LC_COLLATE
          Determine the locale for the behavior of range expressions
          and equivalence classes.

LC_CTYPE  Determine the locale for the interpretation of sequences of
          bytes of text data as characters (for example, single-byte as
          opposed to multi-byte characters in arguments) and the
          behavior of character classes.

LC_MESSAGES
          Determine the locale that should be used to affect the format
          and contents of diagnostic messages written to standard
          error.

NLSPATH   Determine the location of message catalogs for the processing
          of LC_MESSAGES.

ASYNCHRONOUS EVENTS
Default.

STDOUT
The tr output shall be identical to the input, with the exception of
the specified transformations.

STDERR
The standard error shall be used only for diagnostic messages.

OUTPUT FILES
None.

EXTENDED DESCRIPTION
The operands string1 and string2 (if specified) define two arrays of
characters. The constructs in the following list can be used to specify
characters or single-character collating elements. If any of the
constructs result in multi-character collating elements, tr shall
exclude, without a diagnostic, those multi-character elements from the
resulting array.

character Any character not described by one of the conventions below
          shall represent itself.

\octal    Octal sequences can be used to represent characters with
          specific coded values. An octal sequence shall consist of a
          <backslash> followed by the longest sequence of one, two, or
          three-octal-digit characters (01234567). The sequence shall
          cause the value whose encoding is represented by the one,
          two, or three-digit octal integer to be placed into the
          array. Multi-byte characters require multiple, concatenated
          escape sequences of this type, including the leading
          <backslash> for each byte.

\character
          The <backslash>-escape sequences in the Base Definitions
          volume of POSIX.1‐2017, Table 5-1, Escape Sequences and
          Associated Actions ('\\', '\a', '\b', '\f', '\n', '\r', '\t',
          '\v') shall be supported. The results of using any other
          character, other than an octal digit, following the
          <backslash> are unspecified. Also, if there is no character
          following the <backslash>, the results are unspecified.

c-c       In the POSIX locale, this construct shall represent the range
          of collating elements between the range endpoints (as long as
          neither endpoint is an octal sequence of the form \octal),
          inclusive, as defined by the collation sequence. The
          characters or collating elements in the range shall be placed
          in the array in ascending collation sequence. If the second
          endpoint precedes the starting endpoint in the collation
          sequence, it is unspecified whether the range of collating
          elements is empty, or this construct is treated as invalid.
          In locales other than the POSIX locale, this construct has
          unspecified behavior.

          If either or both of the range endpoints are octal sequences
          of the form \octal, this shall represent the range of
          specific coded values between the two range endpoints,
          inclusive.

[:class:] Represents all characters belonging to the defined character
          class, as defined by the current setting of the LC_CTYPE
          locale category. The following character class names shall be
          accepted when specified in string1:

              alnum   blank   digit   lower   punct   upper
              alpha   cntrl   graph   print   space   xdigit

          In addition, character class expressions of the form [:name:]
          shall be recognized in those locales where the name keyword
          has been given a charclass definition in the LC_CTYPE
          category.

          When both the -d and -s options are specified, any of the
          character class names shall be accepted in string2.
          Otherwise, only character class names lower or upper are
          valid in string2 and then only if the corresponding character
          class (upper and lower, respectively) is specified in the same
          relative position in string1. Such a specification shall be
          interpreted as a request for case conversion. When [:lower:]
          appears in string1 and [:upper:] appears in string2, the
          arrays shall contain the characters from the toupper mapping
          in the LC_CTYPE category of the current locale. When
          [:upper:] appears in string1 and [:lower:] appears in
          string2, the arrays shall contain the characters from the
          tolower mapping in the LC_CTYPE category of the current
          locale. The first character from each mapping pair shall be
          in the array for string1 and the second character from each
          mapping pair shall be in the array for string2 in the same
          relative position.

          Except for case conversion, the characters specified by a
          character class expression shall be placed in the array in an
          unspecified order.

          If the name specified for class does not define a valid
          character class in the current locale, the behavior is
          undefined.

[=equiv=] Represents all characters or collating elements belonging to
          the same equivalence class as equiv, as defined by the
          current setting of the LC_COLLATE locale category. An
          equivalence class expression shall be allowed only in string1,
          or in string2 when it is being used by the combined -d and -s
          options. The characters belonging to the equivalence class
          shall be placed in the array in an unspecified order.

[x*n]     Represents n repeated occurrences of the character x.
          Because this expression is used to map multiple characters to
          one, it is only valid when it occurs in string2. If n is
          omitted or is zero, it shall be interpreted as large enough
          to extend the string2-based sequence to the length of the
          string1-based sequence. If n has a leading zero, it shall be
          interpreted as an octal value. Otherwise, it shall be
          interpreted as a decimal value.
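
The constructs listed above can be tried out as follows. This is a minimal
sketch, assuming an ASCII-based system and the POSIX locale (forced here
with LC_ALL=C); the sample inputs and variable names are illustrative and
are not part of the standard.

```shell
# Assumption: ASCII coded values and the POSIX locale.
export LC_ALL=C

# \octal: \101 is the coded value of 'A' in ASCII.
octal_out=$(printf 'banana\n' | tr 'a' '\101')
printf '%s\n' "$octal_out"      # bAnAnA

# \character: translate each <tab> into a <newline>.
escape_out=$(printf 'x\ty\n' | tr '\t' '\n')
printf '%s\n' "$escape_out"     # x and y on separate lines

# c-c: map the range a-f onto A-F; other characters are untouched.
range_out=$(printf 'deadbeef goes\n' | tr 'a-f' 'A-F')
printf '%s\n' "$range_out"      # DEADBEEF goEs

# [:class:] in string1 with [x*] in string2: many characters to one.
class_out=$(printf 'pin 4921\n' | tr '[:digit:]' '[#*]')
printf '%s\n' "$class_out"      # pin ####
```

The operands are single-quoted so that the shell passes the backslashes
and brackets to tr unchanged.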

When the -d option is not specified:

*  If string2 is present, each input character found in the array
   specified by string1 shall be replaced by the character in the same
   relative position in the array specified by string2. If the array
   specified by string2 is shorter than the one specified by string1,
   or if a character occurs more than once in string1, the results are
   unspecified.

*  If the -C option is specified, the complements of the characters
   specified by string1 (the set of all characters in the current
   character set, as defined by the current setting of LC_CTYPE,
   except for those actually specified in the string1 operand) shall
   be placed in the array in ascending collation sequence, as defined
   by the current setting of LC_COLLATE.

*  If the -c option is specified, the complement of the values
   specified by string1 shall be placed in the array in ascending order
   by binary value.

*  Because the order of the characters specified by character class
   expressions or equivalence class expressions is undefined, such
   expressions should only be used if the intent is to map several
   characters into one. An exception is case conversion, as described
   previously.
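
The translation rules above can be illustrated with a short sketch. It
assumes the POSIX locale (LC_ALL=C) and ASCII input; the sample strings
and variable names are made up for illustration. Both operands have equal
effective length, so the unspecified short-string2 case is avoided.

```shell
# Assumption: POSIX locale, ASCII input.
export LC_ALL=C

# Position-by-position mapping: a->x, b->y, c->z.
mapped=$(printf 'aabbcc cab\n' | tr 'abc' 'xyz')
printf '%s\n' "$mapped"         # xxyyzz zxy

# -c: translate every value NOT in string1. '[_*]' extends string2 to
# the length of the complemented array, so all of them map to '_'.
masked=$(printf 'user@example.com' | tr -c 'a-z' '[_*]')
printf '%s\n' "$masked"         # user_example_com
```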

When the -d option is specified:

*  Input characters found in the array specified by string1 shall be
   deleted.

*  When the -C option is specified with -d, all characters except
   those specified by string1 shall be deleted. The contents of
   string2 are ignored, unless the -s option is also specified.

*  When the -c option is specified with -d, all values except those
   specified by string1 shall be deleted. The contents of string2
   shall be ignored, unless the -s option is also specified.

*  The same string cannot be used for both the -d and the -s option;
   when both options are specified, both string1 (used for deletion)
   and string2 (used for squeezing) shall be required.
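
A brief sketch of the deletion rules, assuming the POSIX locale
(LC_ALL=C); the inputs and variable names are illustrative only.

```shell
# Assumption: POSIX locale, ASCII input.
export LC_ALL=C

# -d: delete every character found in string1.
deleted=$(printf 'a1b2c3\n' | tr -d '[:digit:]')
printf '%s\n' "$deleted"        # abc

# -c with -d: delete everything EXCEPT digits and <newline>.
kept=$(printf 'tel: 555-0100\n' | tr -cd '[:digit:]\n')
printf '%s\n' "$kept"           # 5550100

# -d with -s: string1 is deleted first, then runs of string2
# characters (here, spaces) are squeezed.
both=$(printf 'a1  b22   c\n' | tr -ds '[:digit:]' ' ')
printf '%s\n' "$both"           # a b c
```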

When the -s option is specified, after any deletions or translations
have taken place, repeated sequences of the same character shall be
replaced by one occurrence of the same character, if the character is
found in the array specified by the last operand. If the last operand
contains a character class, such as the following example:

    tr -s '[:space:]'

the last operand's array shall contain all of the characters in that
character class. However, in a case conversion, as described
previously, such as:

    tr -s '[:upper:]' '[:lower:]'

the last operand's array shall contain only those characters defined as
the second characters in each of the toupper or tolower character
pairs, as appropriate.
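
The two squeezing cases above can be sketched as follows, assuming the
POSIX locale (LC_ALL=C). The inputs are illustrative. Note that only runs
of the same character collapse; a space followed by a tab is left alone.

```shell
# Assumption: POSIX locale, ASCII input.
export LC_ALL=C

# Single operand: each run of one whitespace character becomes a
# single occurrence of that character.
squeezed=$(printf 'a   b\t\tc\n' | tr -s '[:space:]')
printf '%s\n' "$squeezed"       # a b<tab>c

# Case conversion plus squeeze: after translation, runs of lowercase
# letters (the second characters of the tolower pairs) are squeezed.
folded=$(printf 'AAbbCC\n' | tr -s '[:upper:]' '[:lower:]')
printf '%s\n' "$folded"         # abc
```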

An empty string used for string1 or string2 produces undefined results.

EXIT STATUS
The following exit values shall be returned:

 0    All input was processed successfully.

>0    An error occurred.

CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
If necessary, string1 and string2 can be quoted to avoid pattern
matching by the shell.

If an ordinary digit (representing itself) is to follow an octal
sequence, the octal sequence must use the full three digits to avoid
ambiguity.
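
A small sketch of the octal ambiguity, assuming ASCII coded values and
the POSIX locale; the inputs are illustrative. Writing the <tab> as the
full three digits \011 keeps the following 1 a separate character.

```shell
# Assumption: ASCII coded values (\011 is <tab>, \111 is 'I').
export LC_ALL=C

# '\0111' is \011 (<tab>) followed by the ordinary digit 1,
# so <tab> maps to X and 1 maps to Y.
three_digit=$(printf 'a\tb1\n' | tr '\0111' 'XY')
printf '%s\n' "$three_digit"    # aXbY

# '\111' is a single character, the letter I.
single=$(printf 'I1\n' | tr '\111' 'X')
printf '%s\n' "$single"         # X1
```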

When string2 is shorter than string1, a difference results between
historical System V and BSD systems. A BSD system pads string2 with the
last character found in string2. Thus, it is possible to do the
following:

    tr 0123456789 d

which would translate all digits to the letter 'd'. Since this area is
specifically unspecified in this volume of POSIX.1‐2017, both the BSD
and System V behaviors are allowed, but a conforming application cannot
rely on the BSD behavior. It would have to code the example in the
following way:

    tr 0123456789 '[d*]'
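
The portable form can be checked with a one-line sketch; the input string
and variable name are illustrative, and the POSIX locale is assumed.

```shell
# Assumption: POSIX locale, ASCII input.
export LC_ALL=C

# '[d*]' repeats 'd' often enough to cover all ten digits in string1.
portable=$(printf 'order 66, bay 7\n' | tr '0123456789' '[d*]')
printf '%s\n' "$portable"       # order dd, bay d
```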

It should be noted that, despite similarities in appearance, the string
operands used by tr are not regular expressions.

Unlike some historical implementations, this definition of the tr
utility correctly processes NUL characters in its input stream. NUL
characters can be stripped by using:

    tr -d '\000'
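
As a rough check of NUL stripping, the sketch below feeds five bytes
(three letters and two NULs) through tr and counts what remains. It
assumes a printf that accepts \000 in its format string, as POSIX
specifies; the variable names are illustrative.

```shell
# Assumption: printf emits a NUL byte for \000 in the format string.
export LC_ALL=C

# Five bytes in: a NUL b NUL c. Three bytes should survive.
count=$(printf 'a\000b\000c' | tr -d '\000' | wc -c)
text=$(printf 'a\000b\000c' | tr -d '\000')
printf '%s bytes: %s\n' "$count" "$text"
```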

EXAMPLES
1. The following example creates a list of all words in file1 one per
   line in file2, where a word is taken to be a maximal string of
   letters.

       tr -cs "[:alpha:]" "[\n*]" <file1 >file2

2. The next example translates all lowercase characters in file1 to
   uppercase and writes the results to standard output.

       tr "[:lower:]" "[:upper:]" <file1

3. This example uses an equivalence class to identify accented
   variants of the base character 'e' in file1, which are stripped of
   diacritical marks and written to file2.

       tr "[=e=]" "[e*]" <file1 >file2
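
The first two examples can be tried without creating files by piping
inline text instead. This is a sketch under the assumption of the POSIX
locale (LC_ALL=C); the sample sentences are illustrative. The equivalence
class example is omitted because its result depends on the locale's
collation definitions.

```shell
# Assumption: POSIX locale, ASCII input in place of file1.
export LC_ALL=C

# Example 1: every run of non-letters becomes a single <newline>,
# leaving one word per line.
words=$(printf 'The quick, brown fox.\n' | tr -cs '[:alpha:]' '[\n*]')
printf '%s\n' "$words"

# Example 2: lowercase to uppercase.
upper=$(printf 'hello, world\n' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"          # HELLO, WORLD
```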

RATIONALE
In some early proposals, an explicit option -n was added to disable the
historical behavior of stripping NUL characters from the input. It was
considered that automatically stripping NUL characters from the input
was not correct functionality. However, the removal of -n in a later
proposal does not remove the requirement that tr correctly process NUL
characters in its input stream. NUL characters can be stripped by using
tr -d '\000'.

Historical implementations of tr differ widely in syntax and behavior.
For example, the BSD version has not needed the bracket characters for
the repetition sequence. The tr utility syntax is based more closely on
the System V and XPG3 model while attempting to accommodate historical
BSD implementations. In the case of the short string2 padding, the
decision was to unspecify the behavior and preserve System V and XPG3
scripts, which might find difficulty with the BSD method. The
assumption was made that BSD users of tr have to make accommodations to
meet the syntax defined here. Since it is possible to use the repetition
sequence to duplicate the desired behavior, whereas there is no simple
way to achieve the System V method, this was the correct, if not
desirable, approach.

The use of octal values to specify control characters, while having
historical precedents, is not portable. The introduction of escape
sequences for control characters should provide the necessary
portability. It is recognized that this may cause some historical
scripts to break.

An early proposal included support for multi-character collating
elements. It was pointed out that, while tr does employ some syntactical
elements from REs, the aim of tr is quite different; ranges, for
example, do not have a similar meaning ("any of the chars in the range
matches", versus "translate each character in the range to the output
counterpart"). As a result, the previously included support for
multi-character collating elements has been removed. What remains are
ranges in current collation order (to support, for example, accented
characters), character classes, and equivalence classes.

In XPG3 the [:class:] and [=equiv=] conventions are shown with double
brackets, as in RE syntax. However, tr does not implement RE
principles; it just borrows part of the syntax. Consequently, [:class:]
and [=equiv=] should be regarded as syntactical elements on a par with
[x*n], which is not an RE bracket expression.

The standard developers will consider changes to tr that allow it to
translate characters between different character encodings, or they
will consider providing a new utility to accomplish this.

On historical System V systems, a range expression requires enclosing
square brackets, such as:

    tr '[a-z]' '[A-Z]'

However, BSD-based systems did not require the brackets, and this
convention is used here to avoid breaking large numbers of BSD scripts:

    tr a-z A-Z

The preceding System V script will continue to work because the
brackets, treated as regular characters, are translated to themselves.
However, any System V script that relied on "a-z" representing the three
characters 'a', '-', and 'z' has to be rewritten as "az-".
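
The compatibility argument above can be checked with a short sketch,
assuming the POSIX locale (LC_ALL=C); the inputs and variable names are
illustrative.

```shell
# Assumption: POSIX locale, ASCII input.
export LC_ALL=C

# System V style: '[' and ']' are ordinary characters that map to
# themselves, so the result matches the bracket-free form.
sysv=$(printf 'a[b]c\n' | tr '[a-z]' '[A-Z]')
bsd=$(printf 'a[b]c\n' | tr 'a-z' 'A-Z')
printf '%s %s\n' "$sysv" "$bsd" # A[B]C A[B]C

# A literal hyphen goes last so that it cannot form a range.
literal=$(printf 'a-z m\n' | tr 'az-' 'AZ_')
printf '%s\n' "$literal"        # A_Z m
```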

The ISO POSIX‐2:1993 standard had a -c option that behaved similarly to
the -C option, but did not supply functionality equivalent to the -c
option specified in POSIX.1‐2008.

The earlier version also said that octal sequences referred to
collating elements and could be placed adjacent to each other to specify
multi-byte characters. However, it was noted that this caused
ambiguities because tr would not be able to tell whether adjacent octal
sequences were intending to specify multi-byte characters or multiple
single byte characters. POSIX.1‐2008 specifies that octal sequences
always refer to single byte binary values when used to specify an
endpoint of a range of collating elements.

Earlier versions of this standard allowed for implementations with
bytes other than eight bits, but this has been modified in this
version.

FUTURE DIRECTIONS
None.

SEE ALSO
sed

The Base Definitions volume of POSIX.1‐2017, Table 5-1, Escape
Sequences and Associated Actions, Chapter 8, Environment Variables,
Section 12.2, Utility Syntax Guidelines

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1-2017, Standard for Information Technology --
Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, 2018 Edition, Copyright (C) 2018 by the
Institute of Electrical and Electronics Engineers, Inc and The Open
Group. In the event of any discrepancy between this version and the
original IEEE and The Open Group Standard, the original IEEE and The
Open Group Standard is the referee document. The original Standard can
be obtained online at http://www.opengroup.org/unix/online.html .

Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the source
files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2017                            TR(1P)