TR(1P)                  POSIX Programmer's Manual                 TR(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
tr - translate characters

SYNOPSIS
tr [-c|-C] [-s] string1 string2

tr -s [-c|-C] string1

tr -d [-c|-C] string1

tr -ds [-c|-C] string1 string2

DESCRIPTION
The tr utility shall copy the standard input to the standard output
with substitution or deletion of selected characters. The options
specified and the string1 and string2 operands shall control
translations that occur while copying characters and single-character
collating elements.

OPTIONS
The tr utility shall conform to the Base Definitions volume of
POSIX.1-2008, Section 12.2, Utility Syntax Guidelines.

The following options shall be supported:

-c      Complement the set of values specified by string1. See the
        EXTENDED DESCRIPTION section.

-C      Complement the set of characters specified by string1. See the
        EXTENDED DESCRIPTION section.

-d      Delete all occurrences of input characters that are specified
        by string1.

-s      Replace instances of repeated characters with a single
        character, as described in the EXTENDED DESCRIPTION section.

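The options can be tried without an input file by piping a fixed string from printf. Below is a minimal POSIX sh sketch. The function names (drop_vowels, squeeze_ab, keep_digits) are hypothetical wrappers added for illustration and are not part of tr.

```sh
#!/bin/sh
# Minimal sketch: one hypothetical wrapper per option, fed from printf.

# -d: delete every character listed in string1 (here, the vowels).
drop_vowels() {
    tr -d 'aeiou'
}

# -s: collapse each run of a repeated 'a' or 'b' into one character.
squeeze_ab() {
    tr -s 'ab'
}

# -c with -d: delete every value NOT listed in string1 (digits, newline).
keep_digits() {
    tr -cd '0123456789\n'
}

printf 'translate\n'     | drop_vowels   # prints: trnslt
printf 'aaabbbccc\n'     | squeeze_ab    # prints: abccc
printf 'tel: 555-0100\n' | keep_digits   # prints: 5550100
```
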
OPERANDS
The following operands shall be supported:

string1, string2
        Translation control strings. Each string shall represent a set
        of characters to be converted into an array of characters used
        for the translation. For a detailed description of how the
        strings are interpreted, see the EXTENDED DESCRIPTION section.

STDIN
The standard input can be any type of file.

INPUT FILES
None.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of tr:

LANG    Provide a default value for the internationalization variables
        that are unset or null. (See the Base Definitions volume of
        POSIX.1-2008, Section 8.2, Internationalization Variables, for
        the precedence of internationalization variables used to
        determine the values of locale categories.)

LC_ALL  If set to a non-empty string value, override the values of all
        the other internationalization variables.

LC_COLLATE
        Determine the locale for the behavior of range expressions and
        equivalence classes.

LC_CTYPE
        Determine the locale for the interpretation of sequences of
        bytes of text data as characters (for example, single-byte as
        opposed to multi-byte characters in arguments) and the behavior
        of character classes.

LC_MESSAGES
        Determine the locale that should be used to affect the format
        and contents of diagnostic messages written to standard error.

NLSPATH Determine the location of message catalogs for the processing
        of LC_MESSAGES.

ASYNCHRONOUS EVENTS
Default.

STDOUT
The tr output shall be identical to the input, with the exception of
the specified transformations.

STDERR
The standard error shall be used only for diagnostic messages.

OUTPUT FILES
None.

EXTENDED DESCRIPTION
The operands string1 and string2 (if specified) define two arrays of
characters. The constructs in the following list can be used to
specify characters or single-character collating elements. If any of
the constructs result in multi-character collating elements, tr shall
exclude, without a diagnostic, those multi-character elements from the
resulting array.

character
        Any character not described by one of the conventions below
        shall represent itself.

\octal  Octal sequences can be used to represent characters with
        specific coded values. An octal sequence shall consist of a
        <backslash> followed by the longest sequence of one, two, or
        three octal-digit characters (01234567). The sequence shall
        cause the value whose encoding is represented by the one-,
        two-, or three-digit octal integer to be placed into the array.
        Multi-byte characters require multiple, concatenated escape
        sequences of this type, including the leading <backslash> for
        each byte.

\character
        The <backslash>-escape sequences in the Base Definitions volume
        of POSIX.1-2008, Table 5-1, Escape Sequences and Associated
        Actions ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v') shall
        be supported. The results of using any other character, other
        than an octal digit, following the <backslash> are unspecified.
        Also, if there is no character following the <backslash>, the
        results are unspecified.

c-c     In the POSIX locale, this construct shall represent the range of
        collating elements between the range endpoints (as long as
        neither endpoint is an octal sequence of the form \octal),
        inclusive, as defined by the collation sequence. The characters
        or collating elements in the range shall be placed in the array
        in ascending collation sequence. If the second endpoint precedes
        the starting endpoint in the collation sequence, it is
        unspecified whether the range of collating elements is empty, or
        this construct is treated as invalid. In locales other than the
        POSIX locale, this construct has unspecified behavior.

        If either or both of the range endpoints are octal sequences of
        the form \octal, this shall represent the range of specific
        coded values between the two range endpoints, inclusive.

[:class:]
        Represents all characters belonging to the defined character
        class, as defined by the current setting of the LC_CTYPE locale
        category. The following character class names shall be accepted
        when specified in string1:

            alnum   blank   digit   lower   punct   upper
            alpha   cntrl   graph   print   space   xdigit

        In addition, character class expressions of the form [:name:]
        shall be recognized in those locales where the name keyword has
        been given a charclass definition in the LC_CTYPE category.

        When both the -d and -s options are specified, any of the
        character class names shall be accepted in string2. Otherwise,
        only the character class names lower and upper are valid in
        string2, and then only if the corresponding character class
        (upper and lower, respectively) is specified in the same
        relative position in string1. Such a specification shall be
        interpreted as a request for case conversion. When [:lower:]
        appears in string1 and [:upper:] appears in string2, the arrays
        shall contain the characters from the toupper mapping in the
        LC_CTYPE category of the current locale. When [:upper:] appears
        in string1 and [:lower:] appears in string2, the arrays shall
        contain the characters from the tolower mapping in the LC_CTYPE
        category of the current locale. The first character from each
        mapping pair shall be in the array for string1, and the second
        character from each mapping pair shall be in the array for
        string2, in the same relative position.

        Except for case conversion, the characters specified by a
        character class expression shall be placed in the array in an
        unspecified order.

        If the name specified for class does not define a valid
        character class in the current locale, the behavior is
        undefined.

[=equiv=]
        Represents all characters or collating elements belonging to
        the same equivalence class as equiv, as defined by the current
        setting of the LC_COLLATE locale category. An equivalence class
        expression shall be allowed only in string1, or in string2 when
        it is being used by the combined -d and -s options. The
        characters belonging to the equivalence class shall be placed
        in the array in an unspecified order.

[x*n]   Represents n repeated occurrences of the character x. Because
        this expression is used to map multiple characters to one, it
        is only valid when it occurs in string2. If n is omitted or is
        zero, it shall be interpreted as large enough to extend the
        string2-based sequence to the length of the string1-based
        sequence. If n has a leading zero, it shall be interpreted as an
        octal value. Otherwise, it shall be interpreted as a decimal
        value.

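The constructs above can be exercised one at a time. This POSIX sh sketch assumes an ASCII-based encoding (so that octal 011 is <tab>) and sets LC_ALL=C wherever a range or class is used, because their meaning is only fully specified in the POSIX locale. All function names are hypothetical.

```sh
#!/bin/sh
# Sketch of the string1/string2 constructs. Function names are hypothetical.

# \octal: octal 011 is <tab> in ASCII.
tabs_to_spaces() {
    tr '\011' ' '
}

# \character: '\n' is one of the required escape sequences.
newlines_to_commas() {
    tr '\n' ','
}

# c-c: each range expands in ascending order; both arrays hold 52 letters.
rot13() {
    LC_ALL=C tr 'a-zA-Z' 'n-za-mN-ZA-M'
}

# [:class:] in string1 has an unspecified order, which is harmless here
# because every member maps to '#', supplied by [x*] with n omitted.
mask_digits() {
    LC_ALL=C tr '[:digit:]' '[#*]'
}

# [x*n]: exactly three 'x' characters, followed by an ordinary 'y'.
abc_to_x_d_to_y() {
    tr 'abcd' '[x*3]y'
}

printf 'a\tb\n'          | tabs_to_spaces      # prints: a b
printf 'a\nb\n'          | newlines_to_commas  # prints: a,b, (no final newline)
printf 'Hello\n'         | rot13               # prints: Uryyb
printf 'call 555-0100\n' | mask_digits         # prints: call ###-####
printf 'abcd\n'          | abc_to_x_d_to_y     # prints: xxxy
```
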
When the -d option is not specified:

* If string2 is present, each input character found in the array
  specified by string1 shall be replaced by the character in the same
  relative position in the array specified by string2. If the array
  specified by string2 is shorter than the one specified by string1,
  or if a character occurs more than once in string1, the results are
  unspecified.

* If the -C option is specified, the complements of the characters
  specified by string1 (the set of all characters in the current
  character set, as defined by the current setting of LC_CTYPE, except
  for those actually specified in the string1 operand) shall be placed
  in the array in ascending collation sequence, as defined by the
  current setting of LC_COLLATE.

* If the -c option is specified, the complement of the values
  specified by string1 shall be placed in the array in ascending order
  by binary value.

* Because the order in which characters specified by character class
  expressions or equivalence class expressions are placed in the array
  is unspecified, such expressions should only be used if the intent is
  to map several characters into one. An exception is case conversion,
  as described previously.

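The translation rules can be sketched in POSIX sh as follows; the function names are hypothetical. The first wrapper keeps string1 and string2 the same length, so nothing depends on unspecified padding. The second combines -c with [x*], so string2 is stretched automatically to the length of the complemented array, which is far longer than anything practical to write out.

```sh
#!/bin/sh
# Sketch of translation without -d. Function names are hypothetical.

# Equal-length arrays: '\\' (one backslash) maps to '/'.
backslashes_to_slashes() {
    tr '\\' '/'
}

# -c: the complement of {a-z, newline} is placed in ascending byte order,
# and [_*] pads string2 to the same length, so each of those bytes
# becomes '_'.
underscore_non_lower() {
    LC_ALL=C tr -c 'a-z\n' '[_*]'
}

printf 'C:\\dir\\file\n' | backslashes_to_slashes   # prints: C:/dir/file
printf 'Hi there!\n'     | underscore_non_lower     # prints: _i_there_
```
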
When the -d option is specified:

* Input characters found in the array specified by string1 shall be
  deleted.

* When the -C option is specified with -d, all characters except those
  specified by string1 shall be deleted. The contents of string2 are
  ignored, unless the -s option is also specified.

* When the -c option is specified with -d, all values except those
  specified by string1 shall be deleted. The contents of string2 shall
  be ignored, unless the -s option is also specified.

* The same string cannot be used for both the -d and the -s option;
  when both options are specified, both string1 (used for deletion)
  and string2 (used for squeezing) shall be required.

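A short POSIX sh sketch of -d, alone and combined with -s; the function names are hypothetical. In the second wrapper the digits are deleted first, and only then are the runs of spaces they leave behind squeezed.

```sh
#!/bin/sh
# Sketch of -d, alone and with -s. Function names are hypothetical.

# -d alone: remove every carriage return, turning CRLF line endings into LF.
strip_cr() {
    tr -d '\r'
}

# -ds: string1 lists what to delete and string2 what to squeeze; both
# operands are required.
drop_digits_squeeze_spaces() {
    tr -ds '0123456789' ' '
}

printf 'one\r\ntwo\r\n' | strip_cr                    # prints "one" and "two" on two lines
printf 'a1 2 3b\n'      | drop_digits_squeeze_spaces  # prints: a b
```
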
When the -s option is specified, after any deletions or translations
have taken place, repeated sequences of the same character shall be
replaced by one occurrence of the same character, if the character is
found in the array specified by the last operand. If the last operand
contains a character class, as in the following example:

    tr -s '[:space:]'

the last operand's array shall contain all of the characters in that
character class. However, in a case conversion, as described
previously, such as:

    tr -s '[:upper:]' '[:lower:]'

the last operand's array shall contain only those characters defined
as the second characters in each of the toupper or tolower character
pairs, as appropriate.

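The squeeze rules can be checked with the following POSIX sh sketch, which uses hypothetical function names and the POSIX locale. Note that squeezing applies only to runs of the same character and uses the array of the last operand, after any translation.

```sh
#!/bin/sh
# Sketch of -s in the POSIX locale. Function names are hypothetical.

# One operand: squeeze runs of any single whitespace character.
squeeze_space() {
    LC_ALL=C tr -s '[:space:]'
}

# Two operands: translate first, then squeeze characters found in the
# last operand's array, so each run of spaces/tabs becomes one newline.
blanks_to_lines() {
    tr -s ' \t' '\n\n'
}

# Case conversion: the squeeze array holds only the lowercase letters
# (the second characters of the tolower pairs).
lower_and_squeeze() {
    LC_ALL=C tr -s '[:upper:]' '[:lower:]'
}

printf 'a   b\t\tc\n' | squeeze_space      # prints: a b<tab>c
printf 'a  b\t\tc\n'  | blanks_to_lines    # prints a, b, c on separate lines
printf 'AAbb\n'       | lower_and_squeeze  # prints: ab
```
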
An empty string used for string1 or string2 produces undefined results.

EXIT STATUS
The following exit values shall be returned:

 0      All input was processed successfully.

>0      An error occurred.

CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
If necessary, string1 and string2 can be quoted to avoid pattern
matching by the shell.

If an ordinary digit (representing itself) is to follow an octal
sequence, the octal sequence must use the full three digits to avoid
ambiguity.

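For example, assuming an ASCII-based encoding, '\111' is read as one three-digit octal sequence (octal 111, the letter 'I'), whereas '\0111' is '\011' (<tab>) followed by the ordinary digit '1'. A POSIX sh sketch with hypothetical function names:

```sh
#!/bin/sh
# Sketch of octal-sequence parsing, assuming ASCII.
# Function names are hypothetical.

# '\111' is a single octal sequence: octal 111 is 'I'.
capital_i_to_z() {
    tr '\111' 'z'
}

# '\0111' is '\011' (<tab>) followed by the digit '1', giving a
# two-element array that pairs with 'xy'.
tab_to_x_one_to_y() {
    tr '\0111' 'xy'
}

printf 'I\t1\n' | capital_i_to_z      # prints: z<tab>1
printf 'I\t1\n' | tab_to_x_one_to_y   # prints: Ixy
```
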
When string2 is shorter than string1, historical System V and BSD
systems behave differently. A BSD system pads string2 with the last
character found in string2. Thus, it is possible to do the following:

    tr 0123456789 d

which would translate all digits to the letter 'd'. Since this area is
specifically unspecified in this volume of POSIX.1-2008, both the BSD
and System V behaviors are allowed, but a conforming application cannot
rely on the BSD behavior. It would have to code the example in the
following way:

    tr 0123456789 '[d*]'

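The portable form can be checked directly. This POSIX sh sketch uses hypothetical function names; the second wrapper shows an explicit repeat count followed by an [x*] that fills the remainder, which is one way to split a range between two replacement characters.

```sh
#!/bin/sh
# Sketch of portable string2 padding. Function names are hypothetical.

# [d*] repeats 'd' as often as needed to match the ten digits of string1.
digits_to_d() {
    tr 0123456789 '[d*]'
}

# Five explicit 'l' characters, then [h*] fills the rest:
# digits 0-4 become 'l', digits 5-9 become 'h'.
low_high_digits() {
    tr 0123456789 '[l*5][h*]'
}

printf 'room 101\n' | digits_to_d       # prints: room ddd
printf '1906\n'     | low_high_digits   # prints: lhlh
```
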
It should be noted that, despite similarities in appearance, the
string operands used by tr are not regular expressions.

Unlike some historical implementations, this definition of the tr
utility correctly processes NUL characters in its input stream. NUL
characters can be stripped by using:

    tr -d '\000'

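A minimal POSIX sh sketch of stripping NUL bytes; the function name is hypothetical. Counting the output bytes confirms that the NULs were removed rather than merely hidden by the shell.

```sh
#!/bin/sh
# Sketch: remove NUL bytes from a stream. The function name is hypothetical.
strip_nul() {
    tr -d '\000'
}

printf 'a\000b\000c\n' | strip_nul          # prints: abc
printf 'a\000b\000c\n' | strip_nul | wc -c  # 4 bytes: a, b, c, newline
```
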
EXAMPLES
1. The following example creates a list of all words in file1, one per
   line, in file2, where a word is taken to be a maximal string of
   letters.

       tr -cs "[:alpha:]" "[\n*]" <file1 >file2

2. The next example translates all lowercase characters in file1 to
   uppercase and writes the results to standard output.

       tr "[:lower:]" "[:upper:]" <file1

3. This example uses an equivalence class to identify accented
   variants of the base character 'e' in file1, which are stripped of
   diacritical marks and written to file2.

       tr "[=e=]" "[e*]" <file1 >file2

RATIONALE
In some early proposals, an explicit option -n was added to disable
the historical behavior of stripping NUL characters from the input. It
was considered that automatically stripping NUL characters from the
input was not correct functionality. However, the removal of -n in a
later proposal does not remove the requirement that tr correctly
process NUL characters in its input stream. NUL characters can be
stripped by using tr -d '\000'.

Historical implementations of tr differ widely in syntax and behavior.
For example, the BSD version has not needed the bracket characters for
the repetition sequence. The tr utility syntax is based more closely on
the System V and XPG3 model while attempting to accommodate historical
BSD implementations. In the case of the short string2 padding, the
decision was to leave the behavior unspecified and preserve System V
and XPG3 scripts, which might have difficulty with the BSD method. The
assumption was made that BSD users of tr have to make accommodations
to meet the syntax defined here. Since it is possible to use the
repetition sequence to duplicate the desired behavior, whereas there is
no simple way to achieve the System V method, this was the correct, if
not desirable, approach.

The use of octal values to specify control characters, while having
historical precedents, is not portable. The introduction of escape
sequences for control characters should provide the necessary
portability. It is recognized that this may cause some historical
scripts to break.

An early proposal included support for multi-character collating
elements. It was pointed out that, while tr does employ some
syntactical elements from REs, the aim of tr is quite different;
ranges, for example, do not have a similar meaning ("any of the chars
in the range matches", versus "translate each character in the range
to the output counterpart"). As a result, the previously included
support for multi-character collating elements has been removed. What
remains are ranges in current collation order (to support, for
example, accented characters), character classes, and equivalence
classes.

In XPG3 the [:class:] and [=equiv=] conventions are shown with double
brackets, as in RE syntax. However, tr does not implement RE
principles; it just borrows part of the syntax. Consequently,
[:class:] and [=equiv=] should be regarded as syntactical elements on
a par with [x*n], which is not an RE bracket expression.

The standard developers will consider changes to tr that allow it to
translate characters between different character encodings, or they
will consider providing a new utility to accomplish this.

On historical System V systems, a range expression requires enclosing
square brackets, such as:

    tr '[a-z]' '[A-Z]'

However, BSD-based systems did not require the brackets, and this
convention is used here to avoid breaking large numbers of BSD
scripts:

    tr a-z A-Z

The preceding System V script will continue to work because the
brackets, treated as regular characters, are translated to themselves.
However, any System V script that relied on "a-z" representing the
three characters 'a', '-', and 'z' has to be rewritten as "az-".

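Both spellings, and the rewritten three-character form, can be compared with this POSIX sh sketch. It runs in the POSIX locale, where ranges are defined, and uses hypothetical function names. It relies on the point made above: a trailing '-' is an ordinary character, so 'az-' names exactly 'a', 'z', and '-'.

```sh
#!/bin/sh
# Sketch comparing the System V and BSD spellings, in the POSIX locale.
# Function names are hypothetical.

upper_sysv() {       # brackets are ordinary characters mapped to themselves
    LC_ALL=C tr '[a-z]' '[A-Z]'
}

upper_bsd() {        # no brackets needed
    LC_ALL=C tr a-z A-Z
}

map_a_z_dash() {     # 'a' -> 'A', 'z' -> 'Z', '-' -> '_'
    tr 'az-' 'AZ_'
}

printf 'a[b]-z\n' | upper_sysv     # prints: A[B]-Z
printf 'a[b]-z\n' | upper_bsd      # prints: A[B]-Z
printf 'a-m-z\n'  | map_a_z_dash   # prints: A_m_Z
```
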
The ISO POSIX-2:1993 standard had a -c option that behaved similarly
to the -C option, but did not supply functionality equivalent to the
-c option specified in POSIX.1-2008. This meant that the historical
practice of being able to specify tr -cd '\000-\177' (which would
delete all bytes with the top bit set) would have no effect because,
in the C locale, bytes with the values octal 200 to octal 377 are not
characters.

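With -c defined in terms of values, as in this volume, the historical idiom works again. The sketch below assumes a tr that implements -c as specified here, and it uses a hypothetical function name. The input contains 'é' encoded as the two UTF-8 bytes octal 303 and 251, and both bytes have the top bit set.

```sh
#!/bin/sh
# Sketch: keep only bytes with values octal 000-177 (top bit clear).
# The function name is hypothetical.
ascii_bytes_only() {
    LC_ALL=C tr -cd '\000-\177'
}

# 'caf' followed by the UTF-8 bytes for e-acute (octal 303 251).
printf 'caf\303\251\n' | ascii_bytes_only   # prints: caf
```
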
The earlier version also said that octal sequences referred to
collating elements and could be placed adjacent to each other to
specify multi-byte characters. However, it was noted that this caused
ambiguities because tr would not be able to tell whether adjacent
octal sequences were intending to specify multi-byte characters or
multiple single-byte characters. POSIX.1-2008 specifies that octal
sequences always refer to single-byte binary values when used to
specify an endpoint of a range of collating elements.

Earlier versions of this standard allowed for implementations with
bytes other than eight bits, but this has been modified in this
version.

FUTURE DIRECTIONS
None.

SEE ALSO
sed

The Base Definitions volume of POSIX.1-2008, Table 5-1, Escape
Sequences and Associated Actions, Chapter 8, Environment Variables,
Section 12.2, Utility Syntax Guidelines

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2013 Edition, Standard for Information
Technology -- Portable Operating System Interface (POSIX), The Open
Group Base Specifications Issue 7, Copyright (C) 2013 by the Institute
of Electrical and Electronics Engineers, Inc and The Open Group. (This
is POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.) In the
event of any discrepancy between this version and the original IEEE
and The Open Group Standard, the original IEEE and The Open Group
Standard is the referee document. The original Standard can be
obtained online at http://www.unix.org/online.html .

Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the
source files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                 2013                         TR(1P)