SED(P)                    POSIX Programmer's Manual                   SED(P)

NAME
sed - stream editor

SYNOPSIS
sed [-n] script [file...]

sed [-n] [-e script]... [-f script_file]... [file...]

DESCRIPTION
The sed utility is a stream editor that shall read one or more text
files, make editing changes according to a script of editing commands,
and write the results to standard output. The script shall be obtained
from either the script operand string or a combination of the
option-arguments from the -e script and -f script_file options.

OPTIONS
The sed utility shall conform to the Base Definitions volume of
IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines, except
that the order of presentation of the -e and -f options is significant.

The following options shall be supported:

-e script
    Add the editing commands specified by the script option-argument
    to the end of the script of editing commands. The script
    option-argument shall have the same properties as the script
    operand, described in the OPERANDS section.

-f script_file
    Add the editing commands in the file script_file to the end of
    the script.

-n
    Suppress the default output (in which each line, after it is
    examined for editing, is written to standard output). Only lines
    explicitly selected for output are written.

Multiple -e and -f options may be specified. All commands shall be
added to the script in the order specified, regardless of their origin.

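The effect of option order can be seen in a minimal sketch such as the following. The variable names out_forward and out_reverse are illustrative only and are not part of sed.

```shell
# Fragments are applied in the order given: a->b first, then b->c.
out_forward=$(printf 'a\n' | sed -e 's/a/b/' -e 's/b/c/')

# Reversed order: b->c finds nothing to change, then a->b applies.
out_reverse=$(printf 'a\n' | sed -e 's/b/c/' -e 's/a/b/')

printf '%s\n%s\n' "$out_forward" "$out_reverse"
```

The same ordering rule applies when -e and -f options are mixed.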
OPERANDS
The following operands shall be supported:

file
    A pathname of a file whose contents are read and edited. If
    multiple file operands are specified, the named files shall be read
    in the order specified and the concatenation shall be edited.
    If no file operands are specified, the standard input shall be
    used.

script
    A string to be used as the script of editing commands. The
    application shall not present a script that violates the
    restrictions of a text file except that the final character need
    not be a <newline>.

STDIN
The standard input shall be used only if no file operands are
specified. See the INPUT FILES section.

INPUT FILES
The input files shall be text files. The script_files named by the -f
option shall consist of editing commands.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of sed:

LANG
    Provide a default value for the internationalization variables
    that are unset or null. (See the Base Definitions volume of
    IEEE Std 1003.1-2001, Section 8.2, Internationalization
    Variables for the precedence of internationalization variables
    used to determine the values of locale categories.)

LC_ALL
    If set to a non-empty string value, override the values of all
    the other internationalization variables.

LC_COLLATE
    Determine the locale for the behavior of ranges, equivalence
    classes, and multi-character collating elements within regular
    expressions.

LC_CTYPE
    Determine the locale for the interpretation of sequences of
    bytes of text data as characters (for example, single-byte as
    opposed to multi-byte characters in arguments and input files),
    and the behavior of character classes within regular
    expressions.

LC_MESSAGES
    Determine the locale that should be used to affect the format
    and contents of diagnostic messages written to standard error.

NLSPATH
    Determine the location of message catalogs for the processing of
    LC_MESSAGES.

ASYNCHRONOUS EVENTS
Default.

STDOUT
The input files shall be written to standard output, with the editing
commands specified in the script applied. If the -n option is
specified, only those input lines selected by the script shall be
written to standard output.

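As a small sketch of the difference, the same p command can be run with and without -n. The variable names default_out and quiet_out are illustrative only.

```shell
# Without -n, line 2 is written twice: once by p, once by the default output.
default_out=$(printf 'one\ntwo\nthree\n' | sed '2p')

# With -n, only the line explicitly selected by p is written.
quiet_out=$(printf 'one\ntwo\nthree\n' | sed -n '2p')

printf '%s\n---\n%s\n' "$default_out" "$quiet_out"
```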
STDERR
The standard error shall be used only for diagnostic messages.

OUTPUT FILES
The output files shall be text files whose formats are dependent on the
editing commands given.

EXTENDED DESCRIPTION
The script shall consist of editing commands of the following form:

    [address[,address]]function

where function represents a single-character command verb from the list
in Editing Commands in sed, followed by any applicable arguments.

The command can be preceded by <blank>s and/or semicolons. The function
can be preceded by <blank>s. These optional characters shall have no
effect.

In default operation, sed cyclically shall append a line of input, less
its terminating <newline>, into the pattern space. Normally the pattern
space will be empty, unless a D command terminated the last cycle. The
sed utility shall then apply in sequence all commands whose addresses
select that pattern space, and at the end of the script copy the
pattern space to standard output (except when -n is specified) and
delete the pattern space. Whenever the pattern space is written to
standard output or a named file, sed shall immediately follow it with a
<newline>.

Some of the editing commands use a hold space to save all or part of
the pattern space for subsequent retrieval. The pattern and hold spaces
shall each be able to hold at least 8192 bytes.

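The interplay of the cycle, the pattern space, and the hold space can be illustrated with a well-known idiom that writes its input in reverse line order. This is a sketch only; the variable name reversed is illustrative, and the input is assumed to be small enough to fit in the hold space.

```shell
# On every line but the first, G appends the hold space (the lines seen
# so far, newest first) to the pattern space; h then saves the result.
# On the last line, p writes the accumulated text.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```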
Addresses in sed
An address is either a decimal number that counts input lines
cumulatively across files, a '$' character that addresses the last line
of input, or a context address (which consists of a BRE, as described
in Regular Expressions in sed, preceded and followed by a delimiter,
usually a slash).

An editing command with no addresses shall select every pattern space.

An editing command with one address shall select each pattern space
that matches the address.

An editing command with two addresses shall select the inclusive range
from the first pattern space that matches the first address through the
next pattern space that matches the second. (If the second address is a
number less than or equal to the line number first selected, only one
line shall be selected.) Starting at the first line following the
selected range, sed shall look again for the first address. Thereafter,
the process shall be repeated. Omitting either or both of the address
components in the following form produces undefined results:

    [address[,address]]

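The address forms described above can be sketched as follows. The helper function lines and the variable names are illustrative only.

```shell
# Hypothetical helper: emit five numbered lines as test input.
lines() { printf '1\n2\n3\n4\n5\n'; }

by_number=$(lines | sed -n '2,4p')      # numeric range, inclusive
last_line=$(lines | sed -n '$p')        # '$' addresses the last line
by_context=$(lines | sed -n '/2/,/4/p') # range between two context addresses
one_line=$(lines | sed -n '3,1p')       # second number <= first: one line only

printf '%s\n' "$by_number" "$last_line" "$by_context" "$one_line"
```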
Regular Expressions in sed
The sed utility shall support the BREs described in the Base
Definitions volume of IEEE Std 1003.1-2001, Section 9.3, Basic Regular
Expressions, with the following additions:

* In a context address, the construction "\cBREc", where c is any
  character other than backslash or <newline>, shall be identical to
  "/BRE/". If the character designated by c appears following a
  backslash, then it shall be considered to be that literal character,
  which shall not terminate the BRE. For example, in the context
  address "\xabc\xdefx", the second x stands for itself, so that the
  BRE is "abcxdef".

* The escape sequence '\n' shall match a <newline> embedded in the
  pattern space. A literal <newline> shall not be used in the BRE of a
  context address or in the substitute function.

* If an RE is empty (that is, no pattern is specified) sed shall
  behave as if the last RE used in the last command applied (either as
  an address or as part of a substitute command) was specified.

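Two of these additions can be sketched briefly: an alternative delimiter for a context address, and an empty RE that reuses the last RE applied. The variable names and sample input are illustrative only.

```shell
# "\," starts a context address delimited by commas, so the slashes in
# the BRE need no escaping.
custom_delim=$(printf '/usr/bin\n/etc\n' | sed -n '\,^/usr,p')

# The empty RE in s//baz/ stands for the last RE used, here /foo/.
empty_re=$(printf 'foo bar\n' | sed -n '/foo/s//baz/p')

printf '%s\n%s\n' "$custom_delim" "$empty_re"
```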
Editing Commands in sed
In the following list of editing commands, the maximum number of
permissible addresses for each function is indicated by [0addr],
[1addr], or [2addr], representing zero, one, or two addresses.

The argument text shall consist of one or more lines. Each embedded
<newline> in the text shall be preceded by a backslash. Other
backslashes in text shall be removed, and the following character shall
be treated literally.

The r and w command verbs, and the w flag to the s command, take an
optional rfile (or wfile) parameter, separated from the command verb
letter or flag by one or more <blank>s; implementations may allow zero
separation as an extension.

The argument rfile or the argument wfile shall terminate the editing
command. Each wfile shall be created before processing begins.
Implementations shall support at least ten wfile arguments in the
script; the actual number (greater than or equal to 10) that is
supported by the implementation is unspecified. The use of the wfile
parameter shall cause that file to be initially created, if it does not
exist, or shall replace the contents of an existing file.

The b, r, s, t, w, y, and : command verbs shall accept additional
arguments. The following synopses indicate which arguments shall be
separated from the command verbs by a single <space>.

The a and r commands schedule text for later output. The text specified
for the a command, and the contents of the file specified for the r
command, shall be written to standard output just before the next
attempt to fetch a line of input when executing the N or n commands, or
when reaching the end of the script. If written when reaching the end
of the script, and the -n option was not specified, the text shall be
written after copying the pattern space to standard output. The
contents of the file specified for the r command shall be as of the
time the output is written, not the time the r command is applied. The
text shall be output in the order in which the a and r commands were
applied to the input.

Command verbs other than {, a, b, c, i, r, t, w, :, and # can be
followed by a semicolon, optional <blank>s, and another command verb.
However, when the s command verb is used with the w flag, following it
with another command in this manner produces undefined results.

A function can be preceded by one or more '!' characters, in which case
the function shall be applied if the addresses do not select the
pattern space. Zero or more <blank>s shall be accepted before the first
'!' character. It is unspecified whether <blank>s can follow a '!'
character, and conforming applications shall not follow a '!'
character with <blank>s.

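As a brief sketch of the '!' prefix, the following applies a function to every pattern space that the address does not select. The variable names are illustrative only.

```shell
# Write every line except line 2.
all_but_second=$(printf '1\n2\n3\n' | sed -n '2!p')

# Delete every line except the last.
only_last=$(printf '1\n2\n3\n' | sed '$!d')

printf '%s\n---\n%s\n' "$all_but_second" "$only_last"
```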
[2addr] {function
function
...
}
    Execute a list of sed functions only when the pattern space is
    selected. The list of sed functions shall be surrounded by
    braces and separated by <newline>s, and conform to the following
    rules. The braces can be preceded or followed by <blank>s. The
    functions can be preceded by <blank>s, but shall not be followed
    by <blank>s. The <right-brace> shall be preceded by a <newline>
    and can be preceded or followed by <blank>s.

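A minimal sketch of a brace group written in the portable form, with each function on its own line and the <right-brace> preceded by a <newline>. The variable name grouped is illustrative only.

```shell
# Both functions run only for pattern spaces selected by /b/.
grouped=$(printf 'a\nb\nc\n' | sed -n '/b/{
s/b/B/
p
}')

printf '%s\n' "$grouped"
```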
[1addr]a\
text
    Write text to standard output as described previously.

[2addr]b [label]
    Branch to the : function bearing the label. If label is not
    specified, branch to the end of the script. The implementation
    shall support labels recognized as unique up to at least 8
    characters; the actual length (greater than or equal to 8) that
    shall be supported by the implementation is unspecified. It is
    unspecified whether exceeding a label length causes an error or
    a silent truncation.

[2addr]c\
text
    Delete the pattern space. With a 0 or 1 address or at the end of
    a 2-address range, place text on the output and start the next
    cycle.

[2addr]d
    Delete the pattern space and start the next cycle.

[2addr]D
    Delete the initial segment of the pattern space through the
    first <newline> and start the next cycle.

[2addr]g
    Replace the contents of the pattern space by the contents of the
    hold space.

[2addr]G
    Append to the pattern space a <newline> followed by the contents
    of the hold space.

[2addr]h
    Replace the contents of the hold space with the contents of the
    pattern space.

[2addr]H
    Append to the hold space a <newline> followed by the contents of
    the pattern space.

[1addr]i\
text
    Write text to standard output.

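The a and i commands can be sketched together as follows, using the portable form in which the text begins on the line after the backslash. The variable name framed and the sample text are illustrative only.

```shell
# i writes its text immediately, before the pattern space is output;
# a schedules its text for output after the pattern space.
framed=$(printf 'body\n' | sed '1i\
header
1a\
footer
')

printf '%s\n' "$framed"
```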
[2addr]l
    (The letter ell.) Write the pattern space to standard output in
    a visually unambiguous form. The characters listed in the Base
    Definitions volume of IEEE Std 1003.1-2001, Table 5-1, Escape
    Sequences and Associated Actions ('\\', '\a', '\b', '\f',
    '\r', '\t', '\v') shall be written as the corresponding
    escape sequence; the '\n' in that table is not applicable.
    Non-printable characters not in that table shall be written as one
    three-digit octal number (with a preceding backslash) for each
    byte in the character (most significant byte first). If the size
    of a byte on the system is greater than 9 bits, the format used
    for non-printable characters is implementation-defined.

    Long lines shall be folded, with the point of folding indicated by
    writing a backslash followed by a <newline>; the length at which
    folding occurs is unspecified, but should be appropriate for the
    output device. The end of each line shall be marked with a '$'.

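A short sketch of the l command applied to a line containing a <tab>. The variable name visible is illustrative only, and the line is assumed to be short enough that no folding occurs.

```shell
# The <tab> is written as the escape sequence \t, and the end of the
# line is marked with '$'.
visible=$(printf 'a\tb\n' | sed -n 'l')

printf '%s\n' "$visible"
```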
[2addr]n
    Write the pattern space to standard output if the default output
    has not been suppressed, and replace the pattern space with the
    next line of input, less its terminating <newline>.

    If no next line of input is available, the n command verb shall
    branch to the end of the script and quit without starting a new
    cycle.

[2addr]N
    Append the next line of input, less its terminating <newline>,
    to the pattern space, using an embedded <newline> to separate
    the appended material from the original material. Note that the
    current line number changes.

    If no next line of input is available, the N command verb shall
    branch to the end of the script and quit without starting a new
    cycle or copying the pattern space to standard output.

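As a sketch of N combined with the '\n' escape, the following joins each pair of input lines. The variable name joined is illustrative only. The input deliberately has an even number of lines, so N always finds a next line and the end-of-input behavior described above is not exercised.

```shell
# N appends the next line after an embedded <newline>; the s command
# then replaces that <newline> with a space.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

printf '%s\n' "$joined"
```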
[2addr]p
    Write the pattern space to standard output.

[2addr]P
    Write the pattern space, up to the first <newline>, to standard
    output.

[1addr]q
    Branch to the end of the script and quit without starting a new
    cycle.

[1addr]r rfile
    Copy the contents of rfile to standard output as described
    previously. If rfile does not exist or cannot be read, it shall be
    treated as if it were an empty file, causing no error condition.

[2addr]s/BRE/replacement/flags
    Substitute the replacement string for instances of the BRE in
    the pattern space. Any character other than backslash or
    <newline> can be used instead of a slash to delimit the BRE and the
    replacement. Within the BRE and the replacement, the BRE
    delimiter itself can be used as a literal character if it is
    preceded by a backslash.

    The replacement string shall be scanned from beginning to end. An
    ampersand ('&') appearing in the replacement shall be replaced by
    the string matching the BRE. The special meaning of '&' in this
    context can be suppressed by preceding it by a backslash. The
    characters "\n", where n is a digit, shall be replaced by the text
    matched by the corresponding backreference expression. The special
    meaning of "\n" where n is a digit in this context, can be
    suppressed by preceding it by a backslash. For each other backslash
    ('\') encountered, the following character shall lose its special
    meaning (if any). The meaning of a '\' immediately followed by any
    character other than '&', '\', a digit, or the delimiter character
    used for this command, is unspecified.

    A line can be split by substituting a <newline> into it. The
    application shall escape the <newline> in the replacement by
    preceding it by a backslash. A substitution shall be considered to
    have been performed even if the replacement string is identical to
    the string that it replaces. Any backslash used to alter the
    default meaning of a subsequent character shall be discarded from
    the BRE or the replacement before evaluating the BRE or using the
    replacement.

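The replacement rules above can be sketched with a few small substitutions. The variable names and sample inputs are illustrative only.

```shell
# \1 and \2 refer to the first and second \( \) subexpressions.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# '&' stands for the matched text; '\&' is a literal ampersand.
wrapped=$(printf 'cat\n' | sed 's/a/[&]/')
literal=$(printf 'cat\n' | sed 's/a/\&/')

# A comma as delimiter avoids escaping the slashes in the BRE.
repathed=$(printf '/usr/bin\n' | sed 's,/usr,/opt,')

# Split a line by substituting a backslash-escaped <newline>.
split=$(printf 'a,b\n' | sed 's/,/\
/')

printf '%s\n' "$swapped" "$wrapped" "$literal" "$repathed" "$split"
```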
    The value of flags shall be zero or more of:

    n
        Substitute for the nth occurrence only of the BRE found within
        the pattern space.

    g
        Globally substitute for all non-overlapping instances of the BRE
        rather than just the first one. If both g and n are specified,
        the results are unspecified.

    p
        Write the pattern space to standard output if a replacement was
        made.

    w wfile
        Write. Append the pattern space to wfile if a replacement was
        made. A conforming application shall precede the wfile argument
        with one or more <blank>s. If the w flag is not the last flag
        value given in a concatenation of multiple flag values, the
        results are undefined.

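A minimal sketch of the n, g, and p flags follows. The variable names are illustrative only, and the w flag is omitted because it writes to a file.

```shell
# A numeric flag selects a single occurrence; g selects all of them.
second_only=$(printf 'a a a\n' | sed 's/a/A/2')
every_one=$(printf 'a a a\n' | sed 's/a/A/g')

# With -n, the p flag writes only lines on which a replacement was made.
changed_only=$(printf 'a\nb\n' | sed -n 's/a/A/p')

printf '%s\n' "$second_only" "$every_one" "$changed_only"
```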
[2addr]t [label]
    Test. Branch to the : command verb bearing the label if any
    substitutions have been made since the most recent reading of an
    input line or execution of a t. If label is not specified,
    branch to the end of the script.

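The t command is commonly paired with a label to repeat a substitution until it no longer applies. The sketch below uses separate -e options because the : and t command verbs cannot portably be followed by a semicolon; the label name loop and the variable name collapsed are illustrative only.

```shell
# Each pass replaces one "aa" with "a"; t branches back to :loop as long
# as the most recent s command made a substitution.
collapsed=$(printf 'aaaa\n' | sed -e ':loop' -e 's/aa/a/' -e 't loop')

printf '%s\n' "$collapsed"
```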
[2addr]w wfile
    Append (write) the pattern space to wfile.

[2addr]x
    Exchange the contents of the pattern and hold spaces.

[2addr]y/string1/string2/
    Replace all occurrences of characters in string1 with the
    corresponding characters in string2. If a backslash followed by an
    'n' appear in string1 or string2, the two characters shall be
    handled as a single <newline>. If the number of characters in
    string1 and string2 are not equal, or if any of the characters
    in string1 appear more than once, the results are undefined. Any
    character other than backslash or <newline> can be used instead
    of slash to delimit the strings. If the delimiter is not n,
    within string1 and string2, the delimiter itself can be used as
    a literal character if it is preceded by a backslash. If a
    backslash character is immediately followed by a backslash
    character in string1 or string2, the two backslash characters shall
    be counted as a single literal backslash character. The meaning
    of a backslash followed by any character that is not 'n', a
    backslash, or the delimiter character is undefined.

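A short sketch of y, mapping three characters to their uppercase forms. The variable name mapped is illustrative only; string1 and string2 have equal length and string1 contains no repeated characters, as required above.

```shell
# e->E, l->L, o->O; characters not listed in string1 are left unchanged.
mapped=$(printf 'hello\n' | sed 'y/elo/ELO/')

printf '%s\n' "$mapped"
```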
[0addr]:label
    Do nothing. This command bears a label to which the b and t
    commands branch.

[1addr]=
    Write the following to standard output:

        "%d\n", <current line number>

[0addr]
    Ignore this empty command.

[0addr]#
    Ignore the '#' and the remainder of the line (treat them as a
    comment), with the single exception that if the first two
    characters in the script are "#n", the default output shall be
    suppressed; this shall be the equivalent of specifying -n on the
    command line.

EXIT STATUS
The following exit values shall be returned:

0
    Successful completion.

>0
    An error occurred.

CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
Regular expressions match entire strings, not just individual lines,
but a <newline> is matched by '\n' in a sed RE; a <newline> is not
allowed by the general definition of regular expression in
IEEE Std 1003.1-2001. Also note that '\n' cannot be used to match a
<newline> at the end of an arbitrary input line; <newline>s appear in
the pattern space as a result of the N editing command.

EXAMPLES
This sed script simulates the BSD cat -s command, squeezing excess
blank lines from standard input.

    sed -n '
    # Write non-empty lines.
    /./ {
        p
        d
        }
    # Write a single empty line, then look for more empty lines.
    /^$/    p
    # Get next line, discard the held <newline> (empty line),
    # and look for more empty lines.
    :Empty
    /^$/    {
        N
        s/.//
        b Empty
        }
    # Write the non-empty line before going back to search
    # for the first in a set of empty lines.
        p
    '

RATIONALE
This volume of IEEE Std 1003.1-2001 requires implementations to support
at least ten distinct wfiles, matching historical practice on many
implementations. Implementations are encouraged to support more, but
conforming applications should not exceed this limit.

The exit status codes specified here are different from those in System
V. System V returns 2 for garbled sed commands, but returns zero with
its usage message or if the input file could not be opened. The
standard developers considered this to be a bug.

The manner in which the l command writes non-printable characters was
changed to avoid the historical backspace-overstrike method, and other
requirements to achieve unambiguous output were added. See the
RATIONALE for ed for details of the format chosen, which is the same as
that chosen for sed.

This volume of IEEE Std 1003.1-2001 requires implementations to provide
pattern and hold spaces of at least 8192 bytes, larger than the
4000-byte spaces used by some historical implementations, but less than
the 20480-byte limit used in an early proposal. Implementations are
encouraged to allocate dynamically larger pattern and hold spaces as
needed.

The requirements for acceptance of <blank>s and <space>s in command
lines have been made more explicit than in early proposals to describe
clearly the historical practice and to remove confusion about the
phrase "protect initial blanks [sic] and tabs from the stripping that
is done on every script line" that appears in much of the historical
documentation of the sed utility description of text. (Not all
implementations are known to have stripped <blank>s from text lines,
although they all have allowed leading <blank>s preceding the address
on a command line.)

The treatment of '#' comments differs from the SVID, which only allows
a comment as the first line of the script, but matches BSD-derived
implementations. The comment character is treated as a command, and it
has the same properties in terms of being accepted with leading
<blank>s; the BSD implementation has historically supported this.

Early proposals required that a script_file have at least one
non-comment line. Some historical implementations have behaved in
unexpected ways if this were not the case. The standard developers
considered that this was incorrect behavior and that application
developers should not have to avoid this feature. A correct
implementation of this volume of IEEE Std 1003.1-2001 shall permit
script_files that consist only of comment lines.

Early proposals indicated that if -e and -f options were intermixed,
all -e options were processed before any -f options. This has been
changed to process them in the order presented because it matches
historical practice and is more intuitive.

The treatment of the p flag to the s command differs between System V
and BSD-based systems when the default output is suppressed. In the two
examples:

    echo a | sed 's/a/A/p'
    echo a | sed -n 's/a/A/p'

this volume of IEEE Std 1003.1-2001, BSD, System V documentation, and
the SVID indicate that the first example should write two lines with A,
whereas the second should write one. Some System V systems write the A
only once in both examples because the p flag is ignored if the -n
option is not specified.

This is a case of a diametrical difference between systems that could
not be reconciled through the compromise of declaring the behavior to
be unspecified. The SVID/BSD/System V documentation behavior was
adopted for this volume of IEEE Std 1003.1-2001 because:

* No known documentation for any historic system describes the
  interaction between the p flag and the -n option.

* The selected behavior is more correct as there is no technical
  justification for any interaction between the p flag and the -n
  option. A relationship between -n and the p flag might imply that
  they are only used together, but this ignores valid scripts that
  interrupt the cyclical nature of the processing through the use of
  the D, d, q, or branching commands. Such scripts rely on the p suffix
  to write the pattern space because they do not make use of the
  default output at the "bottom" of the script.

* Because the -n option makes the p flag unnecessary, any interaction
  would only be useful if sed scripts were written to run both with
  and without the -n option. This is believed to be unlikely. It is
  even more unlikely that programmers have coded the p flag expecting
  it to be unnecessary. Because the interaction was not documented,
  the likelihood of a programmer discovering the interaction and
  depending on it is further decreased.

* Finally, scripts that break under the specified behavior produce too
  much output instead of too little, which is easier to diagnose and
  correct.

The form of the substitute command that uses the n suffix was limited
to the first 512 matches in an early proposal. This limit has been
removed because there is no reason an editor processing lines of
{LINE_MAX} length should have this restriction. The command s/a/A/2047
should be able to substitute the 2047th occurrence of a on a line.

The b, t, and : commands are documented to ignore leading white space,
but no mention is made of trailing white space. Historical
implementations of sed assigned different locations to the labels 'x'
and "x ". This is not useful, and leads to subtle programming errors,
but it is historical practice, and changing it could theoretically
break working scripts. Implementors are encouraged to provide warning
messages about labels that are never used or jumps to labels that do
not exist.

Historically, the sed ! and } editing commands did not permit multiple
commands on a single line using a semicolon as a command delimiter.
Implementations are permitted, but not required, to support this
extension.

FUTURE DIRECTIONS
None.

SEE ALSO
awk, ed, grep

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group. In the
event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online
at http://www.opengroup.org/unix/online.html .

IEEE/The Open Group                  2003                            SED(P)