tr(1p) - f34

TR(1P)                     POSIX Programmer's Manual                    TR(1P)

PROLOG

       This  manual  page is part of the POSIX Programmer's Manual.  The Linux
       implementation of this interface may differ (consult the  corresponding
       Linux  manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME

       tr - translate characters

SYNOPSIS

       tr [-c|-C] [-s] string1 string2

       tr -s [-c|-C] string1

       tr -d [-c|-C] string1

       tr -ds [-c|-C] string1 string2

DESCRIPTION

       The tr utility shall copy the standard input  to  the  standard  output
       with substitution or deletion of selected characters. The options spec‐
       ified and the string1 and string2 operands shall  control  translations
       that occur while copying characters and single-character collating ele‐
       ments.

OPTIONS

       The tr  utility  shall  conform  to  the  Base  Definitions  volume  of
       POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -c        Complement  the  set of values specified by string1.  See the
                 EXTENDED DESCRIPTION section.

       -C        Complement the set of characters specified by  string1.   See
                 the EXTENDED DESCRIPTION section.

       -d        Delete all occurrences of input characters that are specified
                 by string1.

       -s        Replace instances of repeated characters with a single  char‐
                 acter, as described in the EXTENDED DESCRIPTION section.

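       The options above can be illustrated with a  short  sketch.  The  input
       text  and  the  variable names are hypothetical, and LC_ALL=C is set on
       the assumption that the POSIX locale gives the most predictable result.

```shell
# Hypothetical sample input for the option sketches below.
input='hello   world'

# No options: each character of string1 maps to its counterpart in string2.
translated=$(printf '%s\n' "$input" | LC_ALL=C tr 'lo' 'LO')

# -d: delete every occurrence of the characters named in string1.
deleted=$(printf '%s\n' "$input" | LC_ALL=C tr -d 'l')

# -s: replace each run of a repeated string1 character with one instance.
squeezed=$(printf '%s\n' "$input" | LC_ALL=C tr -s ' ')

# -c combined with -d: delete everything except the values in string1
# (this also removes the trailing <newline>).
kept=$(printf '%s\n' "$input" | LC_ALL=C tr -cd 'lo')

printf '%s\n' "$translated" "$deleted" "$squeezed" "$kept"
```
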

OPERANDS

       The following operands shall be supported:

       string1, string2
                 Translation  control  strings.  Each string shall represent a
                 set of characters to be converted into an array of characters
                 used  for  the translation. For a detailed description of how
                 the strings are interpreted,  see  the  EXTENDED  DESCRIPTION
                 section.

STDIN

       The standard input can be any type of file.

INPUT FILES

       None.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of tr:

       LANG      Provide  a  default  value for the internationalization vari‐
                 ables that are unset or null. (See the Base Definitions  vol‐
                 ume  of POSIX.1‐2017, Section 8.2, Internationalization Vari‐
                 ables for the precedence  of  internationalization  variables
                 used to determine the values of locale categories.)

       LC_ALL    If  set  to  a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine the locale for the behavior  of  range  expressions
                 and equivalence classes.

       LC_CTYPE  Determine  the  locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte as
                 opposed to multi-byte characters in arguments) and the behav‐
                 ior of character classes.

       LC_MESSAGES
                 Determine the locale that should be used to affect the format
                 and  contents  of  diagnostic  messages  written  to standard
                 error.

       NLSPATH   Determine the location of message catalogs for the processing
                 of LC_MESSAGES.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The  tr  output  shall be identical to the input, with the exception of
       the specified transformations.

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       None.

EXTENDED DESCRIPTION

       The operands string1 and string2 (if specified) define  two  arrays  of
       characters. The constructs in the following list can be used to specify
       characters or single-character collating elements. If any of  the  con‐
       structs result in multi-character collating elements, tr shall exclude,
       without a diagnostic, those multi-character elements from the resulting
       array.

       character Any  character  not described by one of the conventions below
                 shall represent itself.

       \octal    Octal sequences can be used to represent characters with spe‐
                 cific  coded  values.  An  octal  sequence shall consist of a
                 <backslash> followed by the longest sequence of one, two,  or
                 three-octal-digit  characters  (01234567). The sequence shall
                 cause the value whose encoding is  represented  by  the  one,
                 two,  or  three-digit  octal  integer  to  be placed into the
                 array. Multi-byte characters require  multiple,  concatenated
                 escape  sequences  of this type, including the leading <back‐
                 slash> for each byte.

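       A minimal sketch of an octal sequence in  use.  It  assumes  an  ASCII-
       based  encoding,  in  which  \011 is the coded value of <tab>; the name
       tabs_to_colons is illustrative only.

```shell
# \011 is <tab> in ASCII-based encodings (an assumption of this sketch).
# The operand is single-quoted so the shell passes the backslash to tr.
tabs_to_colons=$(printf 'a\tb\tc\n' | LC_ALL=C tr '\011' ':')

printf '%s\n' "$tabs_to_colons"
```
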
       \character
                 The <backslash>-escape sequences in the Base Definitions vol‐
                 ume  of POSIX.1‐2017, Table 5-1, Escape Sequences and Associ‐
                 ated Actions ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v')
                 shall be supported. The results of using any other character,
                 other than an octal  digit,  following  the  <backslash>  are
                 unspecified.  Also,  if  there  is no character following the
                 <backslash>, the results are unspecified.

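       The escape sequences can be sketched as follows. The  sample  text  and
       variable  names  are hypothetical; the operands are single-quoted so tr
       itself interprets the <backslash>.

```shell
# '\n' names <newline>: join three lines with commas.
joined=$(printf 'one\ntwo\nthree\n' | tr '\n' ',')

# '\\' names a literal <backslash>: turn it into a slash.
slashes=$(printf 'a\\b\n' | tr '\\' '/')

printf '%s\n' "$joined" "$slashes"
```

       Note that the final <newline> of the  first  input  is  translated  as
       well, which is why the joined text ends with a comma.
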
       c-c       In the POSIX locale, this construct shall represent the range
                 of collating elements between the range endpoints (as long as
                 neither endpoint is an octal sequence of  the  form  \octal),
                 inclusive,  as defined by the collation sequence. The charac‐
                 ters or collating elements in the range shall  be  placed  in
                 the array in ascending collation sequence. If the second end‐
                 point  precedes  the  starting  endpoint  in  the   collation
                 sequence,  it  is  unspecified whether the range of collating
                 elements is empty, or this construct is treated  as  invalid.
                 In  locales  other  than the POSIX locale, this construct has
                 unspecified behavior.

                 If either or both of the range endpoints are octal  sequences
                 of  the  form  \octal, this shall represent the range of spe‐
                 cific coded values between the two  range  endpoints,  inclu‐
                 sive.

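       Because ranges are only specified for the  POSIX  locale,  a  cautious
       script  can  pin  the locale for the tr call. The sketch below does so
       with LC_ALL=C; the octal range assumes an ASCII-based  encoding  where
       \060 through \071 are the digits 0 through 9.

```shell
# Character range, evaluated in the POSIX locale.
upper=$(printf 'posix\n' | LC_ALL=C tr 'a-z' 'A-Z')

# Range of coded values with octal endpoints (ASCII digits assumed);
# '[#*]' repeats '#' to match the length of string1.
masked=$(printf 'room 101\n' | LC_ALL=C tr '\060-\071' '[#*]')

printf '%s\n' "$upper" "$masked"
```
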
       [:class:] Represents  all characters belonging to the defined character
                 class, as defined by the  current  setting  of  the  LC_CTYPE
                 locale category. The following character class names shall be
                 accepted when specified in string1:

                 alnum   blank   digit   lower   punct   upper
                 alpha   cntrl   graph   print   space   xdigit

                 In addition, character class expressions of the form [:name:]
                 shall  be  recognized in those locales where the name keyword
                 has been given a charclass definition in the  LC_CTYPE  cate‐
                 gory.

                 When  both  the  -d  and -s options are specified, any of the
                 character class names shall be accepted in  string2.   Other‐
                 wise,  only character class names lower or upper are valid in
                 string2 and then only if the  corresponding  character  class
                 (upper and lower, respectively) is specified in the same rel‐
                 ative position in string1.  Such  a  specification  shall  be
                 interpreted  as a request for case conversion. When [:lower:]
                 appears in string1 and  [:upper:]  appears  in  string2,  the
                 arrays  shall contain the characters from the toupper mapping
                 in  the  LC_CTYPE  category  of  the  current  locale.   When
                 [:upper:]   appears  in  string1  and  [:lower:]  appears  in
                 string2, the arrays shall contain  the  characters  from  the
                 tolower  mapping  in  the  LC_CTYPE  category  of the current
                 locale. The first character from each mapping pair  shall  be
                 in  the  array for string1 and the second character from each
                 mapping pair shall be in the array for string2  in  the  same
                 relative position.

                 Except  for  case  conversion,  the characters specified by a
                 character class expression shall be placed in the array in an
                 unspecified order.

                 If the name specified for class does not define a valid char‐
                 acter class in the current locale, the behavior is undefined.

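       A short sketch of character classes, using hypothetical sample text in
       the  POSIX  locale.  The first call is the case-conversion form, where
       the order of the array  matters;  the  others  use  a  class  only  to
       select characters for deletion, where the order does not matter.

```shell
# Case conversion: [:lower:] in string1 paired with [:upper:] in string2.
shout=$(printf 'Hello, World\n' | LC_ALL=C tr '[:lower:]' '[:upper:]')

# Keep only digits: -c complements the class, -d deletes the complement.
digits_only=$(printf 'tel: 555-0100\n' | LC_ALL=C tr -cd '[:digit:]')

# Delete punctuation.
no_punct=$(printf 'Hello, World\n' | LC_ALL=C tr -d '[:punct:]')

printf '%s\n' "$shout" "$digits_only" "$no_punct"
```
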
       [=equiv=] Represents all characters or collating elements belonging  to
                 the  same  equivalence class as equiv, as defined by the cur‐
                 rent setting of the LC_COLLATE locale  category.  An  equiva‐
                 lence  class  expression shall be allowed only in string1, or
                 in string2 when it is being used by the combined  -d  and  -s
                 options.  The  characters  belonging to the equivalence class
                 shall be placed in the array in an unspecified order.

       [x*n]     Represents  n  repeated  occurrences  of  the  character   x.
                 Because this expression is used to map multiple characters to
                 one, it is only valid when it occurs in  string2.   If  n  is
                 omitted  or  is zero, it shall be interpreted as large enough
                 to extend the string2-based sequence to  the  length  of  the
                 string1-based  sequence. If n has a leading zero, it shall be
                 interpreted as an octal value.  Otherwise, it shall be inter‐
                 preted as a decimal value.

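       The repetition construct can be sketched as  follows,  with  made-up
       input. In the first call string2 expands to the three characters x, x,
       y; in the second, n is omitted so d is repeated for the  whole  length
       of string1.

```shell
# '[x*2]y' is the array x x y, so a->x, b->x, c->y.
fixed=$(printf 'abc\n' | LC_ALL=C tr 'abc' '[x*2]y')

# '[d*]' extends string2 to the length of string1: every digit becomes d.
padded=$(printf 'pin 4921\n' | LC_ALL=C tr '0123456789' '[d*]')

printf '%s\n' "$fixed" "$padded"
```
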
       When the -d option is not specified:

        *  If  string2  is  present,  each  input character found in the array
           specified by string1 shall be replaced by the character in the same
           relative  position in the array specified by string2.  If the array
           specified by string2 is shorter than the one specified by  string1,
           or if a character occurs more than once in string1, the results are
           unspecified.

        *  If the -C option is specified, the complements  of  the  characters
           specified  by  string1  (the  set  of all characters in the current
           character set, as defined  by  the  current  setting  of  LC_CTYPE,
           except  for  those actually specified in the string1 operand) shall
           be placed in the array in ascending collation sequence, as  defined
           by the current setting of LC_COLLATE.

        *  If  the -c option is specified, the complement of the values speci‐
           fied by string1 shall be placed in the array in ascending order  by
           binary value.

        *  Because  the order in which characters specified by character class
           expressions or equivalence class  expressions  are  placed  in  the
           array  is  unspecified,  such expressions should only be used if the
           intent is to map several characters into one. An exception is  case
           conversion, as described previously.

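       The many-to-one case described in the  list  above  can  be  sketched
       with  -c. The file name is hypothetical, and the set of characters to
       keep is an arbitrary choice made for this illustration.

```shell
# Hypothetical file name to sanitize.
name='My File (v2).txt'

# -c: every value NOT in string1 is mapped to '_'. The order of the
# complemented array is irrelevant because string2 is one repeated character.
slug=$(printf '%s\n' "$name" | LC_ALL=C tr -c 'A-Za-z0-9.\n' '[_*]')

# Adding -s squeezes the resulting runs of '_' into a single one.
slug_squeezed=$(printf '%s\n' "$name" | LC_ALL=C tr -cs 'A-Za-z0-9.\n' '[_*]')

printf '%s\n' "$slug" "$slug_squeezed"
```
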
       When the -d option is specified:

        *  Input  characters  found in the array specified by string1 shall be
           deleted.

        *  When the -C option is specified  with  -d,  all  characters  except
           those  specified  by  string1  shall  be  deleted.  The contents of
           string2 are ignored, unless the -s option is also specified.

        *  When the -c option is specified with -d, all  values  except  those
           specified  by  string1  shall  be  deleted. The contents of string2
           shall be ignored, unless the -s option is also specified.

        *  The same string cannot be used for both the -d and the  -s  option;
           when  both  options are specified, both string1 (used for deletion)
           and string2 (used for squeezing) shall be required.

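       A sketch of the deletion rules above, with  hypothetical  input.  With
       -ds, string1 names what is deleted and string2 names what is squeezed
       afterwards.

```shell
# Delete ',' and ';', then squeeze the runs of spaces that remain.
cleaned=$(printf 'a,,b;;  c\n' | LC_ALL=C tr -ds ',;' ' ')

# -c with -d: delete every value except digits and <newline>.
digits=$(printf 'abc123\n' | LC_ALL=C tr -cd '0-9\n')

printf '%s\n' "$cleaned" "$digits"
```
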
       When the -s option is specified, after any  deletions  or  translations
       have  taken  place,  repeated  sequences of the same character shall be
       replaced by one occurrence of the same character, if the  character  is
       found  in  the array specified by the last operand. If the last operand
       contains a character class, such as the following example:

           tr -s '[:space:]'

       the last operand's array shall contain all of the  characters  in  that
       character  class.  However,  in  a case conversion, as described previ‐
       ously, such as:

           tr -s '[:upper:]' '[:lower:]'

       the last operand's array shall contain only those characters defined as
       the  second  characters  in  each  of  the toupper or tolower character
       pairs, as appropriate.

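       Both squeezing cases can be tried on small, made-up inputs. The second
       call  translates  first  and  then squeezes only the characters in the
       last operand's array, here the lowercase letters.

```shell
# Squeeze runs of any white-space character, including <newline>.
single_spaced=$(printf 'a   b\n\n\nc\n' | LC_ALL=C tr -s '[:space:]')

# Case conversion plus squeeze: AAbbCC -> aabbcc -> abc.
folded=$(printf 'AAbbCC\n' | LC_ALL=C tr -s '[:upper:]' '[:lower:]')

printf '%s\n' "$single_spaced" "$folded"
```
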
       An empty string used for string1 or string2 produces undefined results.

EXIT STATUS

       The following exit values shall be returned:

        0    All input was processed successfully.

       >0    An error occurred.

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       If necessary, string1 and string2 can be quoted to avoid pattern match‐
       ing by the shell.

       If  an  ordinary  digit  (representing  itself)  is  to follow an octal
       sequence, the octal sequence must use the full three  digits  to  avoid
       ambiguity.

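       The three-digit rule can be seen in a small sketch, assuming an ASCII-
       based  encoding.  Here  '\0111'  is read as \011 (<tab>) followed by an
       ordinary digit 1, whereas '\111' would be a single coded value.

```shell
# string1 is the two-element array <tab>, '1'; string2 is X, Y.
# The operand is quoted so the shell does not alter the backslash.
marked=$(printf 'a\tb1\n' | LC_ALL=C tr '\0111' 'XY')

printf '%s\n' "$marked"
```
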
       When string2 is shorter than string1, a difference results between his‐
       torical System V and BSD systems. A BSD system pads  string2  with  the
       last  character  found in string2.  Thus, it is possible to do the fol‐
       lowing:

           tr 0123456789 d

       which would translate all digits to the letter 'd'.  Since this area is
       specifically  unspecified  in this volume of POSIX.1‐2017, both the BSD
       and System V behaviors are allowed, but a conforming application cannot
       rely on the BSD behavior. It would have to code the example in the fol‐
       lowing way:

           tr 0123456789 '[d*]'

       It should be noted that, despite similarities in appearance, the string
       operands used by tr are not regular expressions.

       Unlike some historical implementations, this definition of the tr util‐
       ity correctly processes NUL characters in its input stream. NUL charac‐
       ters can be stripped by using:

           tr -d '\000'

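       A sketch of NUL handling on a generated stream. The input is  made  up
       for  the  illustration;  the  second  pipeline  counts the NUL bytes by
       deleting everything else.

```shell
# Strip NUL characters from the stream.
stripped=$(printf 'a\0b\0c\n' | tr -d '\000')

# Count NULs: keep only \000, then count bytes. Some wc implementations
# pad the number with spaces, so those are removed with tr as well.
nul_count=$(printf 'a\0b\0c\n' | tr -cd '\000' | wc -c | tr -d ' ')

printf '%s\n' "$stripped" "$nul_count"
```
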

EXAMPLES

        1. The  following example creates a list of all words in file1 one per
           line in file2, where a word is taken to be a maximal string of let‐
           ters.

               tr -cs "[:alpha:]" "[\n*]" <file1 >file2

        2. The  next  example  translates all lowercase characters in file1 to
           uppercase and writes the results to standard output.

               tr "[:lower:]" "[:upper:]" <file1

        3. This example uses an equivalence class to identify  accented  vari‐
           ants of the base character 'e' in file1, which are stripped of dia‐
           critical marks and written to file2.

               tr "[=e=]" "[e*]" <file1 >file2

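       The first example can be tried without creating  files  by  feeding  a
       hypothetical  sentence  on standard input. This is a sketch only; the
       variable names are not part of the standard.

```shell
# One word per line: non-letters become <newline>, runs are squeezed.
words=$(printf 'The quick, brown fox.\n' | LC_ALL=C tr -cs '[:alpha:]' '[\n*]')

# Count the resulting lines; wc padding, if any, is removed with tr.
word_count=$(printf '%s\n' "$words" | wc -l | tr -d ' ')

printf '%s\n' "$words"
printf 'words: %s\n' "$word_count"
```

       If the input began with a non-letter, the output would start  with  an
       empty line, so a real script might need to filter that out.
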

RATIONALE

       In some early proposals, an explicit option -n was added to disable the
       historical  behavior of stripping NUL characters from the input. It was
       considered that automatically stripping NUL characters from  the  input
       was  not  correct functionality.  However, the removal of -n in a later
       proposal does not remove the requirement that tr correctly process  NUL
       characters in its input stream. NUL characters can be stripped by using
       tr -d '\000'.

       Historical implementations of tr differ widely in syntax and  behavior.
       For  example, the BSD version has not needed the bracket characters for
       the repetition sequence. The tr utility syntax is based more closely on
       the  System V and XPG3 model while attempting to accommodate historical
       BSD implementations. In the case of  the  short  string2  padding,  the
       decision  was  to unspecify the behavior and preserve System V and XPG3
       scripts, which might find difficulty with the BSD method.  The  assump‐
       tion  was made that BSD users of tr have to make accommodations to meet
       the syntax defined here. Since it is possible  to  use  the  repetition
       sequence  to duplicate the desired behavior, whereas there is no simple
       way to achieve the System V method, this was the correct, if not desir‐
       able, approach.

       The  use  of  octal  values to specify control characters, while having
       historical precedents, is not  portable.  The  introduction  of  escape
       sequences for control characters should provide the necessary portabil‐
       ity. It is recognized that this may cause some  historical  scripts  to
       break.

       An  early  proposal included support for multi-character collating ele‐
       ments.  It was pointed out that, while tr does employ some  syntactical
       elements  from REs, the aim of tr is quite different; ranges, for exam‐
       ple, do not have a similar meaning (``any of the  chars  in  the  range
       matches'', versus ``translate each character in the range to the output
       counterpart''). As a result, the previously included support for multi-
       character  collating elements has been removed. What remains are ranges
       in current collation order (to support, for example,  accented  charac‐
       ters), character classes, and equivalence classes.

       In  XPG3  the [:class:] and [=equiv=] conventions are shown with double
       brackets, as in RE syntax. However, tr does not  implement  RE  princi‐
       ples;  it just borrows part of the syntax.  Consequently, [:class:] and
       [=equiv=] should be regarded as syntactical  elements  on  a  par  with
       [x*n], which is not an RE bracket expression.

       The  standard  developers  will consider changes to tr that allow it to
       translate characters between different  character  encodings,  or  they
       will consider providing a new utility to accomplish this.

       On  historical  System V systems, a range expression requires enclosing
       square-brackets, such as:

           tr '[a-z]' '[A-Z]'

       However, BSD-based systems did not require the brackets, and this  con‐
       vention is used here to avoid breaking large numbers of BSD scripts:

           tr a-z A-Z

       The  preceding System V script will continue to work because the brack‐
       ets, treated as regular characters, are translated to themselves.  How‐
       ever,  any  System V script that relied on "a-z" representing the three
       characters 'a', '-', and 'z' has to be rewritten as "az-".

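       The bracket behavior described above can be checked with a short
       sketch  in  the  POSIX  locale. The input strings are made up for the
       illustration.

```shell
# System V style: the brackets are ordinary characters mapped to themselves.
sysv=$(printf '[abc]\n' | LC_ALL=C tr '[a-z]' '[A-Z]')

# BSD style, as specified here: no brackets needed for the range.
bsd=$(printf '[abc]\n' | LC_ALL=C tr 'a-z' 'A-Z')

# A trailing '-' is a literal character: a->X, z->Y, '-'->Z.
literal=$(printf 'a-z\n' | LC_ALL=C tr 'az-' 'XYZ')

printf '%s\n' "$sysv" "$bsd" "$literal"
```
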
       The ISO POSIX‐2:1993 standard had a -c option that behaved similarly to
       the  -C  option,  but did not supply functionality equivalent to the -c
       option specified in POSIX.1‐2008.

       The earlier version also said that octal sequences referred to  collat‐
       ing  elements  and  could  be  placed adjacent to each other to specify
       multi-byte characters. However, it was noted that this caused  ambigui‐
       ties  because  tr  would  not  be  able  to tell whether adjacent octal
       sequences were intending to specify multi-byte characters  or  multiple
       single  byte  characters.  POSIX.1‐2008  specifies that octal sequences
       always refer to single byte binary values when used to specify an  end‐
       point of a range of collating elements.

       Earlier  versions  of  this  standard  allowed for implementations with
       bytes other than eight bits, but this has been modified  in  this  ver‐
       sion.

FUTURE DIRECTIONS

       None.

COPYRIGHT

       Portions  of  this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1-2017, Standard for Information Technology --  Por‐
       table  Operating System Interface (POSIX), The Open Group Base Specifi‐
       cations Issue 7, 2018 Edition, Copyright (C) 2018 by the  Institute  of
       Electrical  and  Electronics Engineers, Inc and The Open Group.  In the
       event of any discrepancy between this version and the original IEEE and
       The  Open Group Standard, the original IEEE and The Open Group Standard
       is the referee document. The original Standard can be  obtained  online
       at http://www.opengroup.org/unix/online.html .

       Any  typographical  or  formatting  errors that appear in this page are
       most likely to have been introduced during the conversion of the source
       files  to  man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2017                               TR(1P)