
TR(1P)                     POSIX Programmer's Manual                    TR(1P)

PROLOG

       This  manual  page is part of the POSIX Programmer's Manual.  The Linux
       implementation of this interface may differ (consult the  corresponding
       Linux  manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME

       tr - translate characters

SYNOPSIS

       tr [-c|-C] [-s] string1 string2

       tr -s [-c|-C] string1

       tr -d [-c|-C] string1

       tr -ds [-c|-C] string1 string2

DESCRIPTION

       The tr utility shall copy the standard input  to  the  standard  output
       with substitution or deletion of selected characters. The options spec‐
       ified and the string1 and string2 operands shall  control  translations
       that occur while copying characters and single-character collating ele‐
       ments.

OPTIONS

       The tr  utility  shall  conform  to  the  Base  Definitions  volume  of
       POSIX.1‐2008, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -c        Complement  the  set of values specified by string1.  See the
                 EXTENDED DESCRIPTION section.

       -C        Complement the set of characters specified by  string1.   See
                 the EXTENDED DESCRIPTION section.

       -d        Delete all occurrences of input characters that are specified
                 by string1.

       -s        Replace instances of repeated characters with a single  char‐
                 acter, as described in the EXTENDED DESCRIPTION section.

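       As an informal illustration of the -d and -s options  (not  part  of
       the standard text), the following sketch pipes fixed strings through
       tr. The variable names are illustrative only.

```shell
# -d: delete every input character that appears in string1.
deleted=$(printf 'a1b2c3\n' | tr -d '0123456789')

# -s: replace each run of a repeated character from string1 with one copy.
# Only 'a' and 'b' are listed, so the run of 'c' is left alone.
squeezed=$(printf 'aaabbbccc\n' | tr -s 'ab')

printf '%s\n' "$deleted" "$squeezed"
```

       The first command leaves only the letters; the second collapses  the
       runs of 'a' and 'b' but not the run of 'c'.
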

OPERANDS

       The following operands shall be supported:

       string1, string2
                 Translation  control  strings.  Each string shall represent a
                 set of characters to be converted into an array of characters
                 used  for  the translation. For a detailed description of how
                 the strings are interpreted,  see  the  EXTENDED  DESCRIPTION
                 section.

STDIN

       The standard input can be any type of file.

INPUT FILES

       None.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of tr:

       LANG      Provide  a  default  value for the internationalization vari‐
                 ables that are unset or null. (See the Base Definitions  vol‐
                 ume  of POSIX.1‐2008, Section 8.2, Internationalization Vari‐
                 ables for the precedence  of  internationalization  variables
                 used to determine the values of locale categories.)

       LC_ALL    If  set  to  a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine the locale for the behavior  of  range  expressions
                 and equivalence classes.

       LC_CTYPE  Determine  the  locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte as
                 opposed to multi-byte characters in arguments) and the behav‐
                 ior of character classes.

       LC_MESSAGES
                 Determine the locale that should be used to affect the format
                 and  contents  of  diagnostic  messages  written  to standard
                 error.

       NLSPATH   Determine the location of message catalogs for the processing
                 of LC_MESSAGES.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The  tr  output  shall be identical to the input, with the exception of
       the specified transformations.

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       None.

EXTENDED DESCRIPTION

       The operands string1 and string2 (if specified) define  two  arrays  of
       characters. The constructs in the following list can be used to specify
       characters or single-character collating elements. If any of  the  con‐
       structs result in multi-character collating elements, tr shall exclude,
       without a diagnostic, those multi-character elements from the resulting
       array.

       character Any  character  not described by one of the conventions below
                 shall represent itself.

       \octal    Octal sequences can be used to represent characters with spe‐
                 cific  coded  values.  An  octal  sequence shall consist of a
                 <backslash> followed by the longest sequence of one, two,  or
                 three-octal-digit  characters  (01234567). The sequence shall
                 cause the value whose encoding is  represented  by  the  one,
                 two,  or  three-digit  octal  integer  to  be placed into the
                 array. Multi-byte characters require  multiple,  concatenated
                 escape  sequences  of this type, including the leading <back‐
                 slash> for each byte.

       \character
                 The <backslash>-escape sequences in the Base Definitions vol‐
                 ume  of POSIX.1‐2008, Table 5-1, Escape Sequences and Associ‐
                 ated Actions ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v')
                 shall be supported. The results of using any other character,
                 other than an octal  digit,  following  the  <backslash>  are
                 unspecified.  Also,  if  there  is no character following the
                 <backslash>, the results are unspecified.

       c-c       In the POSIX locale, this construct shall represent the range
                 of collating elements between the range endpoints (as long as
                 neither endpoint is an octal sequence of  the  form  \octal),
                 inclusive,  as defined by the collation sequence. The charac‐
                 ters or collating elements in the range shall  be  placed  in
                 the array in ascending collation sequence. If the second end‐
                 point  precedes  the  starting  endpoint  in  the   collation
                 sequence,  it  is  unspecified whether the range of collating
                 elements is empty, or this construct is treated  as  invalid.
                 In  locales  other  than the POSIX locale, this construct has
                 unspecified behavior.

                 If either or both of the range endpoints are octal  sequences
                 of  the  form  \octal, this shall represent the range of spe‐
                 cific coded values between the two  range  endpoints,  inclu‐
                 sive.

       [:class:] Represents  all characters belonging to the defined character
                 class, as defined by the  current  setting  of  the  LC_CTYPE
                 locale category. The following character class names shall be
                 accepted when specified in string1:

                 alnum   blank   digit   lower   punct   upper
                 alpha   cntrl   graph   print   space   xdigit

                 In addition, character class expressions of the form [:name:]
                 shall  be  recognized in those locales where the name keyword
                 has been given a charclass definition in the  LC_CTYPE  cate‐
                 gory.

                 When  both  the  -d  and -s options are specified, any of the
                 character class names shall be accepted in  string2.   Other‐
                 wise,  only character class names lower or upper are valid in
                 string2 and then only if the  corresponding  character  class
                 (upper and lower, respectively) is specified in the same rel‐
                 ative position in string1.  Such  a  specification  shall  be
                 interpreted  as a request for case conversion. When [:lower:]
                 appears in string1 and  [:upper:]  appears  in  string2,  the
                 arrays  shall contain the characters from the toupper mapping
                 in  the  LC_CTYPE  category  of  the  current  locale.   When
                 [:upper:]   appears  in  string1  and  [:lower:]  appears  in
                 string2, the arrays shall contain  the  characters  from  the
                 tolower  mapping  in  the  LC_CTYPE  category  of the current
                 locale. The first character from each mapping pair  shall  be
                 in  the  array for string1 and the second character from each
                 mapping pair shall be in the array for string2  in  the  same
                 relative position.

                 Except  for  case  conversion,  the characters specified by a
                 character class expression shall be placed in the array in an
                 unspecified order.

                 If the name specified for class does not define a valid char‐
                 acter class in the current locale, the behavior is undefined.

       [=equiv=] Represents all characters or collating elements belonging  to
                 the  same  equivalence class as equiv, as defined by the cur‐
                 rent setting of the LC_COLLATE locale  category.  An  equiva‐
                 lence  class  expression shall be allowed only in string1, or
                 in string2 when it is being used by the combined  -d  and  -s
                 options.  The  characters  belonging to the equivalence class
                 shall be placed in the array in an unspecified order.

       [x*n]     Represents  n  repeated  occurrences  of  the  character   x.
                 Because this expression is used to map multiple characters to
                 one, it is only valid when it occurs in  string2.   If  n  is
                 omitted  or  is zero, it shall be interpreted as large enough
                 to extend the string2-based sequence to  the  length  of  the
                 string1-based  sequence. If n has a leading zero, it shall be
                 interpreted as an octal value.  Otherwise, it shall be inter‐
                 preted as a decimal value.

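       The constructs listed above can be tried out with a sketch such as the
       following. It is an illustration rather than normative text: it assumes
       an ASCII-based encoding (so that \011 is <tab>), sets LC_ALL=C so  that
       the  range construct has specified behavior, and uses illustrative vari‐
       able names.

```shell
# \octal and \character: '\011' and '\t' both name <tab> under ASCII.
via_octal=$(printf 'a\tb\n' | LC_ALL=C tr '\011' '_')
via_escape=$(printf 'a\tb\n' | LC_ALL=C tr '\t' '_')

# c-c: a range of collating elements, endpoints inclusive.
hex_upper=$(printf 'deadbeef\n' | LC_ALL=C tr 'a-f' 'A-F')

# [:class:]: lower and upper in the same relative position request
# case conversion.
shout=$(printf 'Hello, World\n' | LC_ALL=C tr '[:lower:]' '[:upper:]')

# [x*n] in string2: three x characters followed by three y characters.
split=$(printf 'abcdef\n' | LC_ALL=C tr 'abcdef' '[x*3][y*3]')

# [x*] with n omitted: extend string2 to the length of string1.
masked=$(printf 'card 1234 5678\n' | LC_ALL=C tr '0123456789' '[#*]')

printf '%s\n' "$via_octal" "$via_escape" "$hex_upper" \
    "$shout" "$split" "$masked"
```

       Each operand is quoted so that the shell passes  the  backslashes  and
       brackets to tr unchanged.
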
       When the -d option is not specified:

        *  If  string2  is  present,  each  input character found in the array
           specified by string1 shall be replaced by the character in the same
           relative  position in the array specified by string2.  If the array
           specified by string2 is shorter than the one specified by  string1,
           or if a character occurs more than once in string1, the results are
           unspecified.

        *  If the -C option is specified, the complements  of  the  characters
           specified  by  string1  (the  set  of all characters in the current
           character set, as defined  by  the  current  setting  of  LC_CTYPE,
           except  for  those actually specified in the string1 operand) shall
           be placed in the array in ascending collation sequence, as  defined
           by the current setting of LC_COLLATE.

        *  If  the -c option is specified, the complement of the values speci‐
           fied by string1 shall be placed in the array in ascending order  by
           binary value.

        *  Because the order in which characters specified by character  class
           expressions or equivalence class expressions are placed in the array
           is undefined, such expressions should only be used if the intent is
           to  map  several  characters  into one. An exception is case conver‐
           sion, as described previously.

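       A small sketch of translation with a complemented set follows. It  is
       illustrative  only and assumes the POSIX locale (LC_ALL=C); the vari‐
       able names are hypothetical. Because the order of a  character  class
       is  unspecified,  string2 maps the whole complement to one character,
       using the [x*] construct.

```shell
# -c: every value NOT in string1 (alphanumerics and newline) becomes '_'.
slug=$(printf 'Hello, World!\n' | LC_ALL=C tr -c '[:alnum:]\n' '[_*]')

# Adding -s squeezes the resulting runs of '_' (the last operand's array).
slug_squeezed=$(printf 'Hello, World!\n' |
    LC_ALL=C tr -cs '[:alnum:]\n' '[_*]')

printf '%s\n' "$slug" "$slug_squeezed"
```
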
       When the -d option is specified:

        *  Input  characters  found in the array specified by string1 shall be
           deleted.

        *  When the -C option is specified  with  -d,  all  characters  except
           those  specified  by  string1  shall  be  deleted.  The contents of
           string2 are ignored, unless the -s option is also specified.

        *  When the -c option is specified with -d, all  values  except  those
           specified  by  string1  shall  be  deleted. The contents of string2
           shall be ignored, unless the -s option is also specified.

        *  The same string cannot be used for both the -d and the  -s  option;
           when  both  options are specified, both string1 (used for deletion)
           and string2 (used for squeezing) shall be required.

       When the -s option is specified, after any  deletions  or  translations
       have  taken  place,  repeated  sequences of the same character shall be
       replaced by one occurrence of the same character, if the  character  is
       found  in  the array specified by the last operand. If the last operand
       contains a character class, such as the following example:

           tr -s '[:space:]'

       the last operand's array shall contain all of the  characters  in  that
       character  class.  However,  in  a case conversion, as described previ‐
       ously, such as:

           tr -s '[:upper:]' '[:lower:]'

       the last operand's array shall contain only those characters defined as
       the  second  characters  in  each  of  the toupper or tolower character
       pairs, as appropriate.

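       The two squeezing cases described above can be observed with a  sketch
       like  this  one. It assumes the POSIX locale (LC_ALL=C); the sample in‐
       put and variable names are illustrative.

```shell
# Squeeze only: each run of the same white-space character becomes one.
# The three spaces and the three tabs are separate runs.
one_each=$(printf 'a   b\t\t\tc\n' | LC_ALL=C tr -s '[:space:]')

# Case conversion plus squeeze: letters are lowercased first, then runs
# of the resulting lowercase letters are squeezed.
folded=$(printf 'AABBcc\n' | LC_ALL=C tr -s '[:upper:]' '[:lower:]')

printf '%s\n' "$one_each" "$folded"
```
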
       An empty string used for string1 or string2 produces undefined results.

EXIT STATUS

       The following exit values shall be returned:

        0    All input was processed successfully.

       >0    An error occurred.

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       If necessary, string1 and string2 can be quoted to avoid pattern match‐
       ing by the shell.

       If  an  ordinary  digit  (representing  itself)  is  to follow an octal
       sequence, the octal sequence must use the full three  digits  to  avoid
       ambiguity.

       When string2 is shorter than string1, a difference results between his‐
       torical System V and BSD systems. A BSD system pads  string2  with  the
       last  character  found in string2.  Thus, it is possible to do the fol‐
       lowing:

           tr 0123456789 d

       which would translate all digits to the letter 'd'.  Since this area is
       specifically  unspecified  in this volume of POSIX.1‐2008, both the BSD
       and System V behaviors are allowed, but a conforming application cannot
       rely on the BSD behavior. It would have to code the example in the fol‐
       lowing way:

           tr 0123456789 '[d*]'

       It should be noted that, despite similarities in appearance, the string
       operands used by tr are not regular expressions.

       Unlike some historical implementations, this definition of the tr util‐
       ity correctly processes NUL characters in its input stream. NUL charac‐
       ters can be stripped by using:

           tr -d '\000'

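       A quick way to check the NUL handling described above is sketched  be‐
       low;  the  sample input and the variable name are illustrative. The NUL
       bytes are produced by the \000 escapes in the printf format string.

```shell
# Two NUL bytes are embedded in the input; tr -d '\000' removes them and
# passes every other byte through unchanged.
stripped=$(printf 'a\000b\000c\n' | tr -d '\000')

printf '%s\n' "$stripped"
```
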

EXAMPLES

        1. The  following example creates a list of all words in file1 one per
           line in file2, where a word is taken to be a maximal string of let‐
           ters.

               tr -cs "[:alpha:]" "[\n*]" <file1 >file2

        2. The  next  example  translates all lowercase characters in file1 to
           uppercase and writes the results to standard output.

               tr "[:lower:]" "[:upper:]" <file1

        3. This example uses an equivalence class to identify  accented  vari‐
           ants of the base character 'e' in file1, which are stripped of dia‐
           critical marks and written to file2.

               tr "[=e=]" "[e*]" <file1 >file2

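       The first example can be tried without creating files by feeding  tr  a
       literal string, as in this sketch. It assumes the POSIX locale
       (LC_ALL=C); the sample sentence and variable name are illustrative.

```shell
# Every non-letter (including the comma, the spaces and the trailing
# newline) is translated to <newline>, and runs of <newline> are squeezed,
# which leaves one word per line.
words=$(printf 'the quick, brown fox\n' |
    LC_ALL=C tr -cs '[:alpha:]' '[\n*]')

printf '%s\n' "$words"
```
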

RATIONALE

       In some early proposals, an explicit option -n was added to disable the
       historical  behavior of stripping NUL characters from the input. It was
       considered that automatically stripping NUL characters from  the  input
       was  not  correct functionality.  However, the removal of -n in a later
       proposal does not remove the requirement that tr correctly process  NUL
       characters in its input stream. NUL characters can be stripped by using
       tr -d '\000'.

       Historical implementations of tr differ widely in syntax and  behavior.
       For  example, the BSD version has not needed the bracket characters for
       the repetition sequence. The tr utility syntax is based more closely on
       the  System V and XPG3 model while attempting to accommodate historical
       BSD implementations. In the case of  the  short  string2  padding,  the
       decision  was  to unspecify the behavior and preserve System V and XPG3
       scripts, which might find difficulty with the BSD method.  The  assump‐
       tion  was made that BSD users of tr have to make accommodations to meet
       the syntax defined here. Since it is possible  to  use  the  repetition
       sequence  to duplicate the desired behavior, whereas there is no simple
       way to achieve the System V method, this was the correct, if not desir‐
       able, approach.

       The  use  of  octal  values to specify control characters, while having
       historical precedents, is not  portable.  The  introduction  of  escape
       sequences for control characters should provide the necessary portabil‐
       ity. It is recognized that this may cause some  historical  scripts  to
       break.

       An  early  proposal included support for multi-character collating ele‐
       ments.  It was pointed out that, while tr does employ some  syntactical
       elements  from REs, the aim of tr is quite different; ranges, for exam‐
       ple, do not have a similar meaning (``any of the  chars  in  the  range
       matches'', versus ``translate each character in the range to the output
       counterpart''). As a result, the previously included support for multi-
       character  collating elements has been removed. What remains are ranges
       in current collation order (to support, for example,  accented  charac‐
       ters), character classes, and equivalence classes.

       In  XPG3  the [:class:] and [=equiv=] conventions are shown with double
       brackets, as in RE syntax. However, tr does not  implement  RE  princi‐
       ples;  it just borrows part of the syntax.  Consequently, [:class:] and
       [=equiv=] should be regarded as syntactical  elements  on  a  par  with
       [x*n], which is not an RE bracket expression.

       The  standard  developers  will consider changes to tr that allow it to
       translate characters between different  character  encodings,  or  they
       will consider providing a new utility to accomplish this.

       On  historical  System V systems, a range expression requires enclosing
       square-brackets, such as:

           tr '[a-z]' '[A-Z]'

       However, BSD-based systems did not require the brackets, and this  con‐
       vention is used here to avoid breaking large numbers of BSD scripts:

           tr a-z A-Z

       The  preceding System V script will continue to work because the brack‐
       ets, treated as regular characters, are translated to themselves.  How‐
       ever,  any  System V script that relied on "a-z" representing the three
       characters 'a', '-', and 'z' has to be rewritten as "az-".

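       The bracket behavior described above can be checked  with  the  follow‐
       ing  sketch,  which  assumes  the  POSIX locale (LC_ALL=C). The variable
       names and sample input are illustrative.

```shell
# System V style: '[' and ']' are ordinary characters at the same position
# in both strings, so they are translated to themselves.
sysv=$(printf '[abc]\n' | LC_ALL=C tr '[a-z]' '[A-Z]')

# BSD style: the bare range gives the same result for this input.
bsd=$(printf '[abc]\n' | LC_ALL=C tr 'a-z' 'A-Z')

# A literal hyphen is written last ("az-") so it cannot form a range.
literal=$(printf 'a-z b\n' | LC_ALL=C tr 'az-' 'AZ_')

printf '%s\n' "$sysv" "$bsd" "$literal"
```
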
       The ISO POSIX‐2:1993 standard had a -c option that behaved similarly to
       the  -C  option,  but did not supply functionality equivalent to the -c
       option specified in POSIX.1‐2008. This meant that  historical  practice
       of  being able to specify tr -cd\000-\177 (which would delete all bytes
       with the top bit set) would have no effect because, in  the  C  locale,
       bytes with the values octal 200 to octal 377 are not characters.

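       With the -c option as specified here, the historical usage can be  ex‐
       ercised as in the sketch below. The operand is quoted so that the shell
       leaves the backslashes for tr; the quoting, the sample input  (a  UTF-8
       encoded  accented  letter  written  as octal escapes), and the variable
       name are assumptions made for illustration.

```shell
# \303\251 are two bytes with the top bit set. -c complements the values
# \000-\177, so -d deletes exactly the bytes in the range \200-\377.
ascii_only=$(printf 'caf\303\251!\n' | LC_ALL=C tr -cd '\000-\177')

printf '%s\n' "$ascii_only"
```
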
       The  earlier version also said that octal sequences referred to collat‐
       ing elements and could be placed adjacent  to  each  other  to  specify
       multi-byte  characters. However, it was noted that this caused ambigui‐
       ties because tr would not  be  able  to  tell  whether  adjacent  octal
       sequences  were  intending to specify multi-byte characters or multiple
       single byte characters. POSIX.1‐2008  specifies  that  octal  sequences
       always  refer to single byte binary values when used to specify an end‐
       point of a range of collating elements.

       Earlier versions of this  standard  allowed  for  implementations  with
       bytes  other  than  eight bits, but this has been modified in this ver‐
       sion.

FUTURE DIRECTIONS

       None.

COPYRIGHT

       Portions of this text are reprinted and reproduced in  electronic  form
       from IEEE Std 1003.1, 2013 Edition, Standard for Information Technology
       -- Portable Operating System Interface (POSIX),  The  Open  Group  Base
       Specifications Issue 7, Copyright (C) 2013 by the Institute of Electri‐
       cal and Electronics Engineers,  Inc  and  The  Open  Group.   (This  is
       POSIX.1-2008  with  the  2013  Technical Corrigendum 1 applied.) In the
       event of any discrepancy between this version and the original IEEE and
       The  Open Group Standard, the original IEEE and The Open Group Standard
       is the referee document. The original Standard can be  obtained  online
       at http://www.unix.org/online.html .

       Any  typographical  or  formatting  errors that appear in this page are
       most likely to have been introduced during the conversion of the source
       files to man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .


IEEE/The Open Group                  2013                               TR(1P)