PCREGREP(1)              General Commands Manual              PCREGREP(1)

NAME
pcregrep - a grep with Perl-compatible regular expressions.

SYNOPSIS
pcregrep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

pcregrep searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to
support patterns that are compatible with the regular expressions of
Perl 5. See pcrepattern(3) for a full description of the syntax and
semantics of the regular expressions that PCRE supports.

Patterns, whether supplied on the command line or in a separate file,
are given without delimiters. For example:

    pcregrep Thursday /etc/motd

If you attempt to use delimiters (for example, by surrounding a pattern
with slashes, as is common in Perl scripts), they are interpreted as
part of the pattern. Quotes can of course be used to delimit patterns on
the command line because they are interpreted by the shell, and indeed
they are required if a pattern contains white space or shell
metacharacters.

The first argument that follows any option settings is treated as the
single pattern to be matched when neither -e nor -f is present.
Conversely, when one or both of these options are used to specify
patterns, all arguments are treated as path names. At least one of -e,
-f, or an argument pattern must be provided.

If no files are specified, pcregrep reads the standard input. The
standard input can also be referenced by a name consisting of a single
hyphen. For example:

    pcregrep some-pattern /file1 - /file3

By default, each line that matches a pattern is copied to the standard
output, and if there is more than one file, the file name is output at
the start of each line, followed by a colon. However, there are options
that can change how pcregrep behaves. In particular, the -M option makes
it possible to search for patterns that span line boundaries. What
defines a line boundary is controlled by the -N (--newline) option.

Patterns are limited to 8K or BUFSIZ characters, whichever is the
greater. BUFSIZ is defined in <stdio.h>. When there is more than one
pattern (specified by the use of -e and/or -f), each pattern is applied
to each line in the order in which they are defined, except that all the
-e patterns are tried before the -f patterns. As soon as one pattern
matches (or fails to match when -v is used), no further patterns are
considered.

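The ordering rule for multiple patterns can be sketched in Python. This
is only an illustration: it uses Python's re module as a stand-in for
PCRE, the function name first_matching_pattern is hypothetical, and the
-v case is deliberately left out.

```python
import re


def first_matching_pattern(line, e_patterns, f_patterns=()):
    """Return the pattern that decides the line, or None if none matches.

    All -e patterns are tried first, in the order given, followed by the
    patterns that would have been read from the -f file.
    """
    for pattern in [*e_patterns, *f_patterns]:
        if re.search(pattern, line):
            # Stop at the first success; later patterns are not consulted.
            return pattern
    return None
```

For the line "cat and dog", with "dog" given by -e and "cat" read from a
file, the sketch reports "dog", because the -e pattern is tried first.
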
When --only-matching, --file-offsets, or --line-offsets is used, the
output is the part of the line that matched (either shown literally, or
as an offset). In this case, scanning resumes immediately following the
match, so that further matches on the same line can be found. If there
are multiple patterns, they are all tried on the remainder of the line.
However, patterns that follow the one that matched are not tried on the
earlier part of the line.

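The scanning rule above can be modelled with a short Python sketch. It
is an emulation built on Python's re module, not pcregrep's own code;
the function name only_matching is hypothetical, and stepping one
character past an empty match is an assumption made so that the loop
terminates.

```python
import re


def only_matching(line, patterns):
    """Collect the matched parts of a line as described in the text."""
    found = []
    pos = 0
    while pos <= len(line):
        for pattern in patterns:
            m = re.compile(pattern).search(line, pos)
            if m:
                break  # the first pattern that matches the remainder wins
        else:
            return found  # no pattern matches the rest of the line
        found.append(m.group(0))
        # Resume immediately after the match; advance by one character
        # after an empty match (an assumption of this sketch).
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found
```

With the two patterns X and Y and the line "aYbXcY", the sketch returns
X and then the final Y; the Y that precedes X is never reported. A
single pattern X|Y returns Y, X, Y.
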
If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses the
value to set a locale when calling the PCRE library. The --locale option
can be used to override this.

SUPPORT FOR COMPRESSED FILES

It is possible to compile pcregrep so that it uses libz or libbz2 to
read files whose names end in .gz or .bz2, respectively. You can find
out whether your binary has support for one or both of these file types
by running it with the --help option. If the appropriate support is not
present, files are treated as plain text. The standard input is always
so treated.

OPTIONS

--  This terminates the list of options. It is useful if the next item
    on the command line starts with a hyphen but is not an option. This
    allows for the processing of patterns and filenames that start with
    hyphens.

-A number, --after-context=number
    Output number lines of context after each matching line. If
    filenames and/or line numbers are being output, a hyphen separator
    is used instead of a colon for the context lines. A line containing
    "--" is output between each group of lines, unless they are in fact
    contiguous in the input file. The value of number is expected to be
    relatively small. However, pcregrep guarantees to have up to 8K of
    following text available for context output.

-B number, --before-context=number
    Output number lines of context before each matching line. If
    filenames and/or line numbers are being output, a hyphen separator
    is used instead of a colon for the context lines. A line containing
    "--" is output between each group of lines, unless they are in fact
    contiguous in the input file. The value of number is expected to be
    relatively small. However, pcregrep guarantees to have up to 8K of
    preceding text available for context output.

-C number, --context=number
    Output number lines of context both before and after each matching
    line. This is equivalent to setting both -A and -B to the same
    value.

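How the context options group their output, including the "--"
separator, can be sketched as follows. This is a simplified Python model
of the behaviour described for -A, -B and -C, not pcregrep's
implementation: the function name with_context is hypothetical, and file
names, line numbers and the hyphen separator for context lines are not
modelled.

```python
def with_context(lines, is_match, after=0, before=0):
    """Return matching lines plus context, with "--" between groups."""
    wanted = set()
    for i, line in enumerate(lines):
        if is_match(line):
            first = max(0, i - before)
            last = min(len(lines), i + after + 1)
            wanted.update(range(first, last))
    out = []
    previous = None
    for i in sorted(wanted):
        # A separator is needed only when a group does not directly
        # follow the previous one in the input.
        if previous is not None and i != previous + 1:
            out.append("--")
        out.append(lines[i])
        previous = i
    return out
```

Calling with_context(lines, predicate, after=2, before=2) corresponds to
-C 2 in this model.
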
-c, --count
    Do not output individual lines; instead just output a count of the
    number of lines that would otherwise have been output. If several
    files are given, a count is output for each of them. In this mode,
    the -A, -B, and -C options are ignored.

--colour, --color
    If this option is given without any data, it is equivalent to
    "--colour=auto". If data is required, it must be given in the same
    shell item, separated by an equals sign.

--colour=value, --color=value
    This option specifies under what circumstances the part of a line
    that matched a pattern should be coloured in the output. The value
    may be "never" (the default), "always", or "auto". In the last case,
    colouring happens only if the standard output is connected to a
    terminal. The colour can be specified by setting the environment
    variable PCREGREP_COLOUR or PCREGREP_COLOR. The value of this
    variable should be a string of two numbers, separated by a
    semicolon. They are copied directly into the control string for
    setting colour on a terminal, so it is your responsibility to ensure
    that they make sense. If neither of the environment variables is
    set, the default is "1;31", which gives red.

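The way the two numbers end up in a terminal control string can be
illustrated with a small Python sketch. Several details here are
assumptions rather than statements about pcregrep: that the control
string is a standard ANSI SGR escape sequence, that colouring is switched
off again with the reset sequence shown, and that PCREGREP_COLOUR is
consulted before PCREGREP_COLOR. The function name colour_match is
hypothetical.

```python
import os


def colour_match(line, start, end, environ=os.environ):
    """Wrap line[start:end] in escape sequences that select a colour."""
    colour = (environ.get("PCREGREP_COLOUR")
              or environ.get("PCREGREP_COLOR")
              or "1;31")  # the documented default, which gives red
    # ESC [ <numbers> m selects the colour; ESC [ 0 m resets it (assumed).
    return (line[:start]
            + f"\x1b[{colour}m" + line[start:end] + "\x1b[0m"
            + line[end:])
```

Because the numbers are copied verbatim, a value such as "1;32" would
select a different colour on terminals that follow the same convention.
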
-D action, --devices=action
    If an input path is not a regular file or a directory, "action"
    specifies how it is to be processed. Valid values are "read" (the
    default) or "skip" (silently skip the path).

-d action, --directories=action
    If an input path is a directory, "action" specifies how it is to be
    processed. Valid values are "read" (the default), "recurse"
    (equivalent to the -r option), or "skip" (silently skip the path).
    In the default case, directories are read as if they were ordinary
    files. In some operating systems the effect of reading a directory
    like this is an immediate end-of-file.

-e pattern, --regex=pattern, --regexp=pattern
    Specify a pattern to be matched. This option can be used multiple
    times in order to specify several patterns. It can also be used as a
    way of specifying a single pattern that starts with a hyphen. When
    -e is used, no argument pattern is taken from the command line; all
    arguments are treated as file names. There is an overall maximum of
    100 patterns. They are applied to each line in the order in which
    they are defined until one matches (or fails to match if -v is
    used). If -f is used with -e, the command line patterns are matched
    first, followed by the patterns from the file, independent of the
    order in which these options are specified. Note that multiple use
    of -e is not the same as a single pattern with alternatives. For
    example, X|Y finds the first character in a line that is X or Y,
    whereas if the two patterns are given separately, pcregrep finds X
    if it is present, even if it follows Y in the line. It finds Y only
    if there is no X in the line. This really matters only if you are
    using -o to show the part(s) of the line that matched.

--exclude=pattern
    When pcregrep is searching the files in a directory as a consequence
    of the -r (recursive search) option, any regular files whose names
    match the pattern are excluded. Subdirectories are not excluded by
    this option; they are searched recursively, subject to the
    --exclude_dir and --include_dir options. The pattern is a PCRE
    regular expression, and is matched against the final component of
    the file name (not the entire path). If a file name matches both
    --include and --exclude, it is excluded. There is no short form for
    this option.

--exclude_dir=pattern
    When pcregrep is searching the contents of a directory as a
    consequence of the -r (recursive search) option, any subdirectories
    whose names match the pattern are excluded. (Note that the --exclude
    option does not affect subdirectories.) The pattern is a PCRE
    regular expression, and is matched against the final component of
    the name (not the entire path). If a subdirectory name matches both
    --include_dir and --exclude_dir, it is excluded. There is no short
    form for this option.

-F, --fixed-strings
    Interpret each pattern as a list of fixed strings, separated by
    newlines, instead of as a regular expression. The -w (match as a
    word) and -x (match whole line) options can be used with -F. They
    apply to each of the fixed strings. A line is selected if any of the
    fixed strings are found in it (subject to -w or -x, if present).

-f filename, --file=filename
    Read a number of patterns from the file, one per line, and match
    them against each line of input. A data line is output if any of the
    patterns match it. The filename can be given as "-" to refer to the
    standard input. When -f is used, patterns specified on the command
    line using -e may also be present; they are tested before the file's
    patterns. However, no other pattern is taken from the command line;
    all arguments are treated as file names. There is an overall maximum
    of 100 patterns. Trailing white space is removed from each line, and
    blank lines are ignored. An empty file contains no patterns and
    therefore matches nothing. See also the comments about multiple
    patterns versus a single pattern with alternatives in the
    description of -e above.

--file-offsets
    Instead of showing lines or parts of lines that match, show each
    match as an offset from the start of the file and a length,
    separated by a comma. In this mode, no context is shown. That is,
    the -A, -B, and -C options are ignored. If there is more than one
    match in a line, each of them is shown separately. This option is
    mutually exclusive with --line-offsets and --only-matching.

-H, --with-filename
    Force the inclusion of the filename at the start of output lines
    when searching a single file. By default, the filename is not shown
    in this case. For matching lines, the filename is followed by a
    colon and a space; for context lines, a hyphen separator is used. If
    a line number is also being output, it follows the file name without
    a space.

-h, --no-filename
    Suppress the output of filenames when searching multiple files. By
    default, filenames are shown when multiple files are searched. For
    matching lines, the filename is followed by a colon and a space; for
    context lines, a hyphen separator is used. If a line number is also
    being output, it follows the file name without a space.

--help
    Output a help message, giving brief details of the command options
    and file type support, and then exit.

-i, --ignore-case
    Ignore upper/lower case distinctions during comparisons.

--include=pattern
    When pcregrep is searching the files in a directory as a consequence
    of the -r (recursive search) option, only those regular files whose
    names match the pattern are included. Subdirectories are always
    included and searched recursively, subject to the --include_dir and
    --exclude_dir options. The pattern is a PCRE regular expression, and
    is matched against the final component of the file name (not the
    entire path). If a file name matches both --include and --exclude,
    it is excluded. There is no short form for this option.

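The interaction between --include and --exclude for regular files can be
sketched in Python. This is an illustration of the documented rules
only: Python's re module stands in for PCRE, and the function name
file_is_searched is hypothetical.

```python
import os
import re


def file_is_searched(path, include=None, exclude=None):
    """Decide whether a regular file met during -r would be searched."""
    # Only the final component of the path is tested, never the whole path.
    name = os.path.basename(path)
    if exclude is not None and re.search(exclude, name):
        return False  # a name that matches both patterns is excluded
    if include is not None and not re.search(include, name):
        return False
    return True
```

In this model, a pattern such as ^c_ given to --include does not select
"c_dir/readme", because only "readme" is compared with the pattern.
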
--include_dir=pattern
    When pcregrep is searching the contents of a directory as a
    consequence of the -r (recursive search) option, only those
    subdirectories whose names match the pattern are included. (Note
    that the --include option does not affect subdirectories.) The
    pattern is a PCRE regular expression, and is matched against the
    final component of the name (not the entire path). If a subdirectory
    name matches both --include_dir and --exclude_dir, it is excluded.
    There is no short form for this option.

-L, --files-without-match
    Instead of outputting lines from the files, just output the names of
    the files that do not contain any lines that would have been output.
    Each file name is output once, on a separate line.

-l, --files-with-matches
    Instead of outputting lines from the files, just output the names of
    the files containing lines that would have been output. Each file
    name is output once, on a separate line. Searching stops as soon as
    a matching line is found in a file.

--label=name
    This option supplies a name to be used for the standard input when
    file names are being output. If not supplied, "(standard input)" is
    used. There is no short form for this option.

--line-offsets
    Instead of showing lines or parts of lines that match, show each
    match as a line number, the offset from the start of the line, and a
    length. The line number is terminated by a colon (as usual; see the
    -n option), and the offset and length are separated by a comma. In
    this mode, no context is shown. That is, the -A, -B, and -C options
    are ignored. If there is more than one match in a line, each of them
    is shown separately. This option is mutually exclusive with
    --file-offsets and --only-matching.

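The shape of --line-offsets output can be sketched in Python. This is an
emulation under stated assumptions, not pcregrep's code: offsets are
taken to count from zero, no space is written after the colon, lines are
separated by LF, and offsets are counted in characters of a Python
string. The function name line_offsets is hypothetical.

```python
import re


def line_offsets(text, pattern):
    """List each match as "line:offset,length", one entry per match."""
    out = []
    for number, line in enumerate(text.split("\n"), start=1):
        for m in re.finditer(pattern, line):
            length = m.end() - m.start()
            out.append(f"{number}:{m.start()},{length}")
    return out
```

Two matches on the same line produce two entries that share a line
number, which mirrors the statement that each match is shown separately.
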
--locale=locale-name
    This option specifies a locale to be used for pattern matching. It
    overrides the value in the LC_ALL or LC_CTYPE environment variables.
    If no locale is specified, the PCRE library's default (usually the
    "C" locale) is used. There is no short form for this option.

-M, --multiline
    Allow patterns to match more than one line. When this option is
    given, patterns may usefully contain literal newline characters and
    internal occurrences of ^ and $ characters. The output for any one
    match may consist of more than one line. When this option is set,
    the PCRE library is called in "multiline" mode. There is a limit to
    the number of lines that can be matched, imposed by the way that
    pcregrep buffers the input file as it scans it. However, pcregrep
    ensures that at least 8K characters or the rest of the document
    (whichever is the shorter) are available for forward matching, and
    similarly the previous 8K characters (or all the previous
    characters, if fewer than 8K) are guaranteed to be available for
    lookbehind assertions.

-N newline-type, --newline=newline-type
    The PCRE library supports five different conventions for indicating
    the ends of lines. They are the single-character sequences CR
    (carriage return) and LF (linefeed), the two-character sequence
    CRLF, an "anycrlf" convention, which recognizes any of the preceding
    three types, and an "any" convention, in which any Unicode line
    ending sequence is assumed to end a line. The Unicode sequences are
    the three just mentioned, plus VT (vertical tab, U+000B), FF (form
    feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028),
    and PS (paragraph separator, U+2029).

    When the PCRE library is built, a default line-ending sequence is
    specified. This is normally the standard sequence for the operating
    system. Unless otherwise specified by this option, pcregrep uses the
    library's default. The possible values for this option are CR, LF,
    CRLF, ANYCRLF, or ANY. This makes it possible to use pcregrep on
    files that have come from other environments without having to
    modify their line endings. If the data that is being scanned does
    not agree with the convention set by this option, pcregrep may
    behave in strange ways.

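The five conventions can be compared with a small Python sketch that
splits a string into lines under each of them. It is an illustration of
the definitions above rather than of how the PCRE library detects line
endings; the names NEWLINE_PATTERNS and split_lines are hypothetical.

```python
import re

# One regular expression per convention. CRLF is listed first in the
# alternations so that it is treated as a single line ending.
NEWLINE_PATTERNS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    "ANY": "\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, newline_type="LF"):
    """Split text into lines according to the named newline convention."""
    return re.split(NEWLINE_PATTERNS[newline_type], text)
```

Splitting "a\r\nb\nc" as LF leaves a stray carriage return at the end of
the first line, which is the kind of mismatch the last sentence of the
option description warns about.
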
-n, --line-number
    Precede each output line by its line number in the file, followed by
    a colon and a space for matching lines or a hyphen and a space for
    context lines. If the filename is also being output, it precedes the
    line number. This option is forced if --line-offsets is used.

-o, --only-matching
    Show only the part of the line that matched a pattern. In this mode,
    no context is shown. That is, the -A, -B, and -C options are
    ignored. If there is more than one match in a line, each of them is
    shown separately. If -o is combined with -v (invert the sense of the
    match to find non-matching lines), no output is generated, but the
    return code is set appropriately. This option is mutually exclusive
    with --file-offsets and --line-offsets.

-q, --quiet
    Work quietly, that is, display nothing except error messages. The
    exit status indicates whether or not any matches were found.

-r, --recursive
    If any given path is a directory, recursively scan the files it
    contains, taking note of any --include and --exclude settings. By
    default, a directory is read as a normal file; in some operating
    systems this gives an immediate end-of-file. This option is a
    shorthand for setting the -d option to "recurse".

-s, --no-messages
    Suppress error messages about non-existent or unreadable files. Such
    files are quietly skipped. However, the return code is still 2, even
    if matches were found in other files.

-u, --utf-8
    Operate in UTF-8 mode. This option is available only if PCRE has
    been compiled with UTF-8 support. Both patterns and subject lines
    must be valid strings of UTF-8 characters.

-V, --version
    Write the version numbers of pcregrep and the PCRE library that is
    being used to the standard error stream.

-v, --invert-match
    Invert the sense of the match, so that lines which do not match any
    of the patterns are the ones that are found.

-w, --word-regex, --word-regexp
    Force the patterns to match only whole words. This is equivalent to
    having \b at the start and end of the pattern.

-x, --line-regex, --line-regexp
    Force the patterns to be anchored (each must start matching at the
    beginning of a line) and in addition, require them to match entire
    lines. This is equivalent to having ^ and $ characters at the start
    and end of each alternative branch in every pattern.

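The equivalences stated for -w and -x can be written out as pattern
rewrites. The Python sketch below is illustrative only: it uses Python's
re module in place of PCRE, the helper names as_word and as_whole_line
are hypothetical, and wrapping the pattern in a non-capturing group is a
choice made here so that the anchors apply to every alternative branch.

```python
import re


def as_word(pattern):
    """Rewrite a pattern so that it matches only whole words (like -w)."""
    return rf"\b(?:{pattern})\b"


def as_whole_line(pattern):
    """Rewrite a pattern so that it must match an entire line (like -x)."""
    return rf"^(?:{pattern})$"
```

For example, as_word("cat") finds the separate word in "concat cat" but
nothing in "concatenate", and as_whole_line("a|b") accepts the line "b"
but not the line "ab".
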
ENVIRONMENT VARIABLES

The environment variables LC_ALL and LC_CTYPE are examined, in that
order, for a locale. The first one that is set is used. This can be
overridden by the --locale option. If no locale is set, the PCRE
library's default (usually the "C" locale) is used.

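The precedence just described can be expressed as a short Python
function. It is a sketch of the stated rules only; the function name
effective_locale is hypothetical, and None stands for "use the PCRE
library's default".

```python
def effective_locale(environ, locale_option=None):
    """Pick the locale name according to the documented precedence."""
    if locale_option is not None:
        return locale_option  # --locale overrides the environment
    for name in ("LC_ALL", "LC_CTYPE"):  # examined in this order
        if name in environ:
            return environ[name]
    return None  # nothing set: the library default applies
```

Passing a plain dictionary instead of the real environment keeps the
sketch easy to try out in isolation.
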
NEWLINES

The -N (--newline) option allows pcregrep to scan files with different
newline conventions from the default. However, the setting of this
option does not affect the way in which pcregrep writes information to
the standard error and output streams. It uses the string "\n" in C
printf() calls to indicate newlines, relying on the C I/O library to
convert this to an appropriate sequence if the output is sent to a file.

OPTIONS COMPATIBILITY

The majority of short and long forms of pcregrep's options are the same
as in the GNU grep program. Any long option of the form --xxx-regexp
(GNU terminology) is also available as --xxx-regex (PCRE terminology).
However, the --locale, -M, --multiline, -u, and --utf-8 options are
specific to pcregrep.

OPTIONS WITH DATA

There are four different ways in which an option with data can be
specified. If a short form option is used, the data may follow
immediately, or in the next command line item. For example:

    -f/some/file
    -f /some/file

If a long form option is used, the data may appear in the same command
line item, separated by an equals character, or (with one exception) it
may appear in the next command line item. For example:

    --file=/some/file
    --file /some/file

Note, however, that if you want to supply a file name beginning with ~
as data in a shell command, and have the shell expand ~ to a home
directory, you must separate the file name from the option, because the
shell does not treat ~ specially unless it is at the start of an item.

The exception to the above is the --colour (or --color) option, for
which the data is optional. If this option does have data, it must be
given in the first form, using an equals character. Otherwise it will be
assumed that it has no data.

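The four spellings can be tried out with Python's argparse module, which
happens to accept the same four forms for an option that takes data.
This is only a stand-in for demonstration: it is not pcregrep's option
parser, and the --colour exception is not modelled.

```python
import argparse

# A toy parser with a single option that takes data, mirroring -f/--file.
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--file")

forms = (
    ["-f/some/file"],        # short form, data follows immediately
    ["-f", "/some/file"],    # short form, data in the next item
    ["--file=/some/file"],   # long form, data after an equals character
    ["--file", "/some/file"],  # long form, data in the next item
)
values = [parser.parse_args(argv).file for argv in forms]
```

All four entries of values are the same string, "/some/file".
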
MATCHING ERRORS

It is possible to supply a regular expression that takes a very long
time to fail to match certain lines. Such patterns normally involve
nested indefinite repeats, for example: (a+)*\d when matched against a
line of a's with no final digit. The PCRE matching function has a
resource limit that causes it to abort in these circumstances. If this
happens, pcregrep outputs an error message and the line that caused the
problem to the standard error stream. If there are more than 20 such
errors, pcregrep gives up.

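A rough feel for why (a+)*\d can take so long comes from counting the
ways in which a run of a's can be divided between successive iterations
of the group. The Python sketch below does only that arithmetic. It
assumes that a backtracking matcher may, in the worst case, try each
division before reporting failure; real engines differ in detail and may
apply optimisations. The function name ways_to_split is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_to_split(n):
    """Count the ways to cut n a's into consecutive non-empty groups."""
    if n == 0:
        return 1  # nothing left to divide
    # The first group takes k characters; the rest is divided recursively.
    return sum(ways_to_split(n - k) for k in range(1, n + 1))
```

The count doubles with every additional character: a run of n a's can be
divided in 2^(n-1) ways, so 30 a's already allow more than five hundred
million divisions.
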
DIAGNOSTICS

Exit status is 0 if any matches were found, 1 if no matches were found,
and 2 for syntax errors and non-existent or inaccessible files (even if
matches were found in other files) or too many matching errors. Using
the -s option to suppress error messages about inaccessible files does
not affect the return code.

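A calling program can branch on these exit codes. The Python sketch
below shows one way to do so; it assumes that a pcregrep binary is
available on the PATH, and the function names interpret_status and
run_pcregrep are hypothetical.

```python
import subprocess


def interpret_status(code):
    """Translate a pcregrep exit status into a short description."""
    if code == 0:
        return "match"
    if code == 1:
        return "no match"
    return "error"  # 2: syntax error, inaccessible file, too many errors


def run_pcregrep(pattern, *paths):
    """Run pcregrep (assumed to be installed) and classify the result."""
    # The pattern is passed with -e so that it may begin with a hyphen.
    result = subprocess.run(["pcregrep", "-e", pattern, *paths],
                            capture_output=True, text=True)
    return interpret_status(result.returncode), result.stdout
```

Because an error takes precedence, a result of "error" does not mean
that the captured output is empty; matches from other files may still be
present in it.
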
SEE ALSO

pcrepattern(3), pcretest(1).

AUTHOR

Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.

REVISION

Last updated: 08 March 2008
Copyright (c) 1997-2008 University of Cambridge.