PCRE2GREP(1) General Commands Manual PCRE2GREP(1)

NAME
pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS
pcre2grep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION
13 pcre2grep searches files for character patterns, in the same way as
14 other grep commands do, but it uses the PCRE2 regular expression
15 library to support patterns that are compatible with the regular
16 expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
17 of pattern syntax, or pcre2pattern(3) for a full description of the
18 syntax and semantics of the regular expressions that PCRE2 supports.
19
20 Patterns, whether supplied on the command line or in a separate file,
21 are given without delimiters. For example:
22
23 pcre2grep Thursday /etc/motd
24
25 If you attempt to use delimiters (for example, by surrounding a pattern
26 with slashes, as is common in Perl scripts), they are interpreted as
27 part of the pattern. Quotes can of course be used to delimit patterns
28 on the command line because they are interpreted by the shell, and
29 indeed quotes are required if a pattern contains white space or shell
30 metacharacters.
31
32 The first argument that follows any option settings is treated as the
33 single pattern to be matched when neither -e nor -f is present. Con‐
34 versely, when one or both of these options are used to specify pat‐
35 terns, all arguments are treated as path names. At least one of -e, -f,
36 or an argument pattern must be provided.
37
38 If no files are specified, pcre2grep reads the standard input. The
39 standard input can also be referenced by a name consisting of a single
40 hyphen. For example:
41
42 pcre2grep some-pattern file1 - file3
43
44 Input files are searched line by line. By default, each line that
45 matches a pattern is copied to the standard output, and if there is
46 more than one file, the file name is output at the start of each line,
47 followed by a colon. However, there are options that can change how
48 pcre2grep behaves. In particular, the -M option makes it possible to
49 search for strings that span line boundaries. What defines a line
50 boundary is controlled by the -N (--newline) option.
51
The amount of memory used for buffering files that are being scanned is
controlled by parameters that can be set by the --buffer-size and
--max-buffer-size options. The first of these sets the size of buffer
that is obtained at the start of processing. If an input file contains
very long lines, a larger buffer may be needed; this is handled by
automatically extending the buffer, up to the limit specified by
--max-buffer-size. The default values for these parameters are specified
when pcre2grep is built; unless changed at build time, they are 20K and
1M respectively. An error occurs if a line is too long and the buffer
can no longer be expanded.
62
63 The block of memory that is actually used is three times the "buffer
64 size", to allow for buffering "before" and "after" lines. If the buffer
65 size is too small, fewer than requested "before" and "after" lines may
66 be output.
67
68 Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the
69 greater. BUFSIZ is defined in <stdio.h>. When there is more than one
70 pattern (specified by the use of -e and/or -f), each pattern is applied
71 to each line in the order in which they are defined, except that all
72 the -e patterns are tried before the -f patterns.
73
74 By default, as soon as one pattern matches a line, no further patterns
75 are considered. However, if --colour (or --color) is used to colour the
76 matching substrings, or if --only-matching, --file-offsets, or --line-
77 offsets is used to output only the part of the line that matched
78 (either shown literally, or as an offset), scanning resumes immediately
79 following the match, so that further matches on the same line can be
80 found. If there are multiple patterns, they are all tried on the
81 remainder of the line, but patterns that follow the one that matched
82 are not tried on the earlier part of the line.
83
84 This behaviour means that the order in which multiple patterns are
85 specified can affect the output when one of the above options is used.
86 This is no longer the same behaviour as GNU grep, which now manages to
87 display earlier matches for later patterns (as long as there is no
88 overlap).
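The scanning order described above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not pcre2grep's code: the function name only_matching is hypothetical, and Python's re module stands in for PCRE2.

```python
import re

def only_matching(patterns, line):
    """Model of the scan described above: after each match, every pattern
    is retried in order, but only on the rest of the line."""
    found, pos = [], 0
    while pos <= len(line):
        for pat in patterns:
            m = re.compile(pat).search(line, pos)
            if m and m.end() > m.start():   # skip empty matches in this sketch
                found.append(m.group())
                pos = m.end()
                break
        else:
            break   # no pattern matched the remainder of the line
    return found

# Two separate patterns, X first, versus one pattern with alternatives.
separate = only_matching(["X", "Y"], "aYbXcY")
combined = only_matching(["X|Y"], "aYbXcY")
```

With the patterns given separately, the Y that precedes X is never reported, because Y is tried only on the part of the line after the X match. The single pattern X|Y reports all three characters in line order.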
89
90 Patterns that can match an empty string are accepted, but empty string
91 matches are never recognized. An example is the pattern
92 "(super)?(man)?", in which all components are optional. This pattern
93 finds all occurrences of both "super" and "man"; the output differs
94 from matching with "super|man" when only the matching substrings are
95 being shown.
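The difference between the two patterns can be illustrated with Python's re module, discarding empty matches to approximate the rule above. This is a sketch under that assumption; shown_substrings is a hypothetical helper, and PCRE2 may differ from re in corner cases.

```python
import re

def shown_substrings(pattern, line):
    # Keep only non-empty matches, mirroring the rule that empty-string
    # matches are never recognized.
    return [m.group() for m in re.finditer(pattern, line) if m.group()]

optional = shown_substrings(r"(super)?(man)?", "superman and man")
alternation = shown_substrings(r"super|man", "superman and man")
```

Here the optional form reports "superman" as one substring, while the alternation reports "super" and "man" separately.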
96
97 If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
98 the value to set a locale when calling the PCRE2 library. The --locale
99 option can be used to override this.

SUPPORT FOR COMPRESSED FILES

103 It is possible to compile pcre2grep so that it uses libz or libbz2 to
104 read files whose names end in .gz or .bz2, respectively. You can find
105 out whether your binary has support for one or both of these file types
106 by running it with the --help option. If the appropriate support is not
107 present, files are treated as plain text. The standard input is always
108 so treated.

BINARY FILES

112 By default, a file that contains a binary zero byte within the first
113 1024 bytes is identified as a binary file, and is processed specially.
114 (GNU grep also identifies binary files in this manner.) See the
115 --binary-files option for a means of changing the way binary files are
116 handled.
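The default test for a binary file amounts to looking for a zero byte near the start of the data. A minimal sketch, with looks_binary as a hypothetical name:

```python
def looks_binary(data: bytes) -> bool:
    # A zero byte anywhere in the first 1024 bytes marks the data as binary.
    return b"\x00" in data[:1024]

text_sample = b"plain text\n" * 200
early_nul = b"ELF\x00" + text_sample     # zero byte at offset 3
late_nul = b"a" * 1024 + b"\x00"         # zero byte just past the window
```

A file whose first zero byte lies beyond the first 1024 bytes is therefore not identified as binary by this check.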

OPTIONS

120 The order in which some of the options appear can affect the output.
121 For example, both the -h and -l options affect the printing of file
122 names. Whichever comes later in the command line will be the one that
123 takes effect. Similarly, except where noted below, if an option is
124 given twice, the later setting is used. Numerical values for options
125 may be followed by K or M, to signify multiplication by 1024 or
126 1024*1024 respectively.
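As a worked example of the K and M suffixes, the following sketch converts option values to byte counts. It also applies the "three times" rule stated earlier for the buffer memory block. The helper parse_size is hypothetical and accepts only the upper-case suffixes named above.

```python
def parse_size(value: str) -> int:
    """Convert an option value such as "20K" or "1M" to a number."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = value[-1:]
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    return int(value)

initial = parse_size("20K")    # default --buffer-size
maximum = parse_size("1M")     # default --max-buffer-size
memory_block = 3 * initial     # the block actually used is three times the buffer size
```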
127
128 -- This terminates the list of options. It is useful if the next
129 item on the command line starts with a hyphen but is not an
130 option. This allows for the processing of patterns and file
131 names that start with hyphens.
132
133 -A number, --after-context=number
134 Output up to number lines of context after each matching
135 line. Fewer lines are output if the next match or the end of
136 the file is reached, or if the processing buffer size has
137 been set too small. If file names and/or line numbers are
138 being output, a hyphen separator is used instead of a colon
139 for the context lines. A line containing "--" is output
140 between each group of lines, unless they are in fact contigu‐
141 ous in the input file. The value of number is expected to be
142 relatively small. When -c is used, -A is ignored.
143
144 -a, --text
145 Treat binary files as text. This is equivalent to --binary-
146 files=text.
147
148 -B number, --before-context=number
149 Output up to number lines of context before each matching
150 line. Fewer lines are output if the previous match or the
151 start of the file is within number lines, or if the process‐
152 ing buffer size has been set too small. If file names and/or
153 line numbers are being output, a hyphen separator is used
154 instead of a colon for the context lines. A line containing
155 "--" is output between each group of lines, unless they are
156 in fact contiguous in the input file. The value of number is
157 expected to be relatively small. When -c is used, -B is
158 ignored.
159
160 --binary-files=word
161 Specify how binary files are to be processed. If the word is
162 "binary" (the default), pattern matching is performed on
163 binary files, but the only output is "Binary file <name>
164 matches" when a match succeeds. If the word is "text", which
165 is equivalent to the -a or --text option, binary files are
166 processed in the same way as any other file. In this case,
167 when a match succeeds, the output may be binary garbage,
168 which can have nasty effects if sent to a terminal. If the
169 word is "without-match", which is equivalent to the -I
170 option, binary files are not processed at all; they are
171 assumed not to be of interest and are skipped without causing
172 any output or affecting the return code.
173
174 --buffer-size=number
175 Set the parameter that controls how much memory is obtained
176 at the start of processing for buffering files that are being
177 scanned. See also --max-buffer-size below.
178
179 -C number, --context=number
180 Output number lines of context both before and after each
181 matching line. This is equivalent to setting both -A and -B
182 to the same value.
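The output layout for context lines can be sketched as follows, assuming line numbers are being output (as with -n). The function grep_with_context is a hypothetical model written with Python's re module, not pcre2grep's implementation.

```python
import re

def grep_with_context(lines, pattern, before=0, after=0):
    """Return numbered output lines: 'N:text' for matches, 'N-text' for
    context, and '--' between groups that are not contiguous."""
    hits = [i for i, line in enumerate(lines) if re.search(pattern, line)]
    wanted = set()
    for i in hits:
        wanted.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    out, previous = [], None
    for i in sorted(wanted):
        if previous is not None and i != previous + 1:
            out.append("--")              # gap between groups of lines
        sep = ":" if i in hits else "-"   # colon for matches, hyphen for context
        out.append(f"{i + 1}{sep}{lines[i]}")
        previous = i
    return out

sample = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta"]
result = grep_with_context(sample, "beta|zeta", before=1, after=1)
```

The call models a context of one line on each side. Line 4 ("delta") belongs to neither group, so a "--" line separates the two groups.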
183
184 -c, --count
185 Do not output lines from the files that are being scanned;
186 instead output the number of lines that would have been
187 shown, either because they matched, or, if -v is set, because
188 they failed to match. By default, this count is exactly the
189 same as the number of lines that would have been output, but
190 if the -M (multiline) option is used (without -v), there may
191 be more suppressed lines than the count (that is, the number
192 of matches).
193
If no lines are selected, the number zero is output. If several
files are being scanned, a count is output for each of them and
the -t option can be used to cause a total to be output at the
end. However, if the --files-with-matches option is also used,
only those files whose counts are greater than zero are listed.
When -c is used, the -A, -B, and -C options are ignored.
201
202 --colour, --color
203 If this option is given without any data, it is equivalent to
204 "--colour=auto". If data is required, it must be given in
205 the same shell item, separated by an equals sign.
206
207 --colour=value, --color=value
This option specifies under what circumstances the parts of a
line that matched a pattern should be coloured in the output.
By default, the output is not coloured. The value (which is
optional, see above) may be "never", "always", or "auto". In
the last case, colouring happens only if the standard output
is connected to a terminal. More resources are used when
colouring is enabled, because pcre2grep has to search for all
possible matches in a line, not just one, in order to colour
them all.
217
218 The colour that is used can be specified by setting one of
219 the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
220 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
221 order. If none of these are set, pcre2grep looks for
222 GREP_COLORS or GREP_COLOR (in that order). The value of the
223 variable should be a string of two numbers, separated by a
224 semicolon, except in the case of GREP_COLORS, which must
225 start with "ms=" or "mt=" followed by two semicolon-separated
226 colours, terminated by the end of the string or by a colon.
227 If GREP_COLORS does not start with "ms=" or "mt=" it is
228 ignored, and GREP_COLOR is checked.
229
230 If the string obtained from one of the above variables con‐
231 tains any characters other than semicolon or digits, the set‐
232 ting is ignored and the default colour is used. The string is
233 copied directly into the control string for setting colour on
234 a terminal, so it is your responsibility to ensure that the
235 values make sense. If no relevant environment variable is
236 set, the default is "1;31", which gives red.
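The lookup order for the colour string can be sketched as below. The function resolve_colour is hypothetical and takes the environment as a plain mapping; it follows the rules as stated above and is not taken from pcre2grep's source.

```python
import re

DEFAULT_COLOUR = "1;31"   # red

def resolve_colour(env):
    """Pick the colour string from an environment mapping, in the order
    described above."""
    value = None
    for name in ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
                 "PCREGREP_COLOUR", "PCREGREP_COLOR"):
        if name in env:
            value = env[name]
            break
    if value is None and "GREP_COLORS" in env:
        gc = env["GREP_COLORS"]
        if gc.startswith(("ms=", "mt=")):
            value = gc[3:].split(":", 1)[0]   # stop at a colon or the end
    if value is None:
        value = env.get("GREP_COLOR")
    # Anything other than digits and semicolons falls back to the default.
    if value is None or not re.fullmatch(r"[0-9;]*", value):
        return DEFAULT_COLOUR
    return value
```

For example, a GREP_COLORS value of "mt=01;32:fn=35" yields "01;32", while a value of "fn=35" is ignored and GREP_COLOR is consulted instead.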
237
238 -D action, --devices=action
239 If an input path is not a regular file or a directory,
240 "action" specifies how it is to be processed. Valid values
241 are "read" (the default) or "skip" (silently skip the path).
242
243 -d action, --directories=action
244 If an input path is a directory, "action" specifies how it is
245 to be processed. Valid values are "read" (the default in
246 non-Windows environments, for compatibility with GNU grep),
247 "recurse" (equivalent to the -r option), or "skip" (silently
248 skip the path, the default in Windows environments). In the
249 "read" case, directories are read as if they were ordinary
250 files. In some operating systems the effect of reading a
251 directory like this is an immediate end-of-file; in others it
252 may provoke an error.
253
254 -e pattern, --regex=pattern, --regexp=pattern
255 Specify a pattern to be matched. This option can be used mul‐
256 tiple times in order to specify several patterns. It can also
257 be used as a way of specifying a single pattern that starts
258 with a hyphen. When -e is used, no argument pattern is taken
259 from the command line; all arguments are treated as file
260 names. There is no limit to the number of patterns. They are
261 applied to each line in the order in which they are defined
262 until one matches.
263
264 If -f is used with -e, the command line patterns are matched
265 first, followed by the patterns from the file(s), independent
266 of the order in which these options are specified. Note that
267 multiple use of -e is not the same as a single pattern with
268 alternatives. For example, X|Y finds the first character in a
269 line that is X or Y, whereas if the two patterns are given
270 separately, with X first, pcre2grep finds X if it is present,
271 even if it follows Y in the line. It finds Y only if there is
272 no X in the line. This matters only if you are using -o or
273 --colo(u)r to show the part(s) of the line that matched.
274
275 --exclude=pattern
276 Files (but not directories) whose names match the pattern are
277 skipped without being processed. This applies to all files,
278 whether listed on the command line, obtained from --file-
279 list, or by scanning a directory. The pattern is a PCRE2 reg‐
280 ular expression, and is matched against the final component
281 of the file name, not the entire path. The -F, -w, and -x
282 options do not apply to this pattern. The option may be given
283 any number of times in order to specify multiple patterns. If
284 a file name matches both an --include and an --exclude pat‐
285 tern, it is excluded. There is no short form for this option.
286
287 --exclude-from=filename
288 Treat each non-empty line of the file as the data for an
289 --exclude option. What constitutes a newline when reading the
290 file is the operating system's default. The --newline option
291 has no effect on this option. This option may be given more
292 than once in order to specify a number of files to read.
293
294 --exclude-dir=pattern
295 Directories whose names match the pattern are skipped without
296 being processed, whatever the setting of the --recursive
297 option. This applies to all directories, whether listed on
298 the command line, obtained from --file-list, or by scanning a
299 parent directory. The pattern is a PCRE2 regular expression,
300 and is matched against the final component of the directory
301 name, not the entire path. The -F, -w, and -x options do not
302 apply to this pattern. The option may be given any number of
303 times in order to specify more than one pattern. If a direc‐
304 tory matches both --include-dir and --exclude-dir, it is
305 excluded. There is no short form for this option.
306
307 -F, --fixed-strings
308 Interpret each data-matching pattern as a list of fixed
309 strings, separated by newlines, instead of as a regular
310 expression. What constitutes a newline for this purpose is
311 controlled by the --newline option. The -w (match as a word)
312 and -x (match whole line) options can be used with -F. They
313 apply to each of the fixed strings. A line is selected if any
314 of the fixed strings are found in it (subject to -w or -x, if
315 present). This option applies only to the patterns that are
316 matched against the contents of files; it does not apply to
317 patterns specified by any of the --include or --exclude
318 options.
319
320 -f filename, --file=filename
321 Read patterns from the file, one per line, and match them
322 against each line of input. What constitutes a newline when
323 reading the file is the operating system's default. The
324 --newline option has no effect on this option. Trailing
325 white space is removed from each line, and blank lines are
326 ignored. An empty file contains no patterns and therefore
327 matches nothing. See also the comments about multiple pat‐
328 terns versus a single pattern with alternatives in the
329 description of -e above.
330
331 If this option is given more than once, all the specified
332 files are read. A data line is output if any of the patterns
333 match it. A file name can be given as "-" to refer to the
334 standard input. When -f is used, patterns specified on the
335 command line using -e may also be present; they are tested
336 before the file's patterns. However, no other pattern is
337 taken from the command line; all arguments are treated as the
338 names of paths to be searched.
339
340 --file-list=filename
341 Read a list of files and/or directories that are to be
342 scanned from the given file, one per line. Trailing white
343 space is removed from each line, and blank lines are ignored.
344 These paths are processed before any that are listed on the
345 command line. The file name can be given as "-" to refer to
346 the standard input. If --file and --file-list are both spec‐
347 ified as "-", patterns are read first. This is useful only
348 when the standard input is a terminal, from which further
349 lines (the list of files) can be read after an end-of-file
350 indication. If this option is given more than once, all the
351 specified files are read.
352
353 --file-offsets
354 Instead of showing lines or parts of lines that match, show
355 each match as an offset from the start of the file and a
356 length, separated by a comma. In this mode, no context is
357 shown. That is, the -A, -B, and -C options are ignored. If
358 there is more than one match in a line, each of them is shown
359 separately. This option is mutually exclusive with --line-
360 offsets and --only-matching.
361
362 -H, --with-filename
363 Force the inclusion of the file name at the start of output
364 lines when searching a single file. By default, the file name
365 is not shown in this case. For matching lines, the file name
366 is followed by a colon; for context lines, a hyphen separator
367 is used. If a line number is also being output, it follows
368 the file name. When the -M option causes a pattern to match
369 more than one line, only the first is preceded by the file
370 name.
371
372 -h, --no-filename
Suppress the output of file names when searching multiple files.
By default, file names are shown when multiple files are
searched. For matching lines, the file name is followed by a
colon; for context lines, a hyphen separator is used. If a
line number is also being output, it follows the file name.
378
379 --help Output a help message, giving brief details of the command
380 options and file type support, and then exit. Anything else
381 on the command line is ignored.
382
383 -I Ignore binary files. This is equivalent to --binary-
384 files=without-match.
385
386 -i, --ignore-case
387 Ignore upper/lower case distinctions during comparisons.
388
389 --include=pattern
390 If any --include patterns are specified, the only files that
391 are processed are those that match one of the patterns (and
392 do not match an --exclude pattern). This option does not
393 affect directories, but it applies to all files, whether
394 listed on the command line, obtained from --file-list, or by
395 scanning a directory. The pattern is a PCRE2 regular expres‐
396 sion, and is matched against the final component of the file
397 name, not the entire path. The -F, -w, and -x options do not
398 apply to this pattern. The option may be given any number of
399 times. If a file name matches both an --include and an
400 --exclude pattern, it is excluded. There is no short form
401 for this option.
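The interaction of --include and --exclude can be summarised in a short sketch. The helper is_processed is hypothetical; it uses Python's re module with an unanchored search, which is an assumption about how the patterns are applied to the final component of the name.

```python
import os
import re

def is_processed(path, includes=(), excludes=()):
    """Decide whether a file is scanned, based on its final path component."""
    name = os.path.basename(path)
    if any(re.search(p, name) for p in excludes):
        return False          # an exclude match always wins
    if includes and not any(re.search(p, name) for p in includes):
        return False          # include patterns exist, but none matched
    return True
```

Because only the final component is tested, an exclude pattern such as ^test_ skips "src/test_main.c" but has no effect on "test_dir/main.c".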
402
403 --include-from=filename
404 Treat each non-empty line of the file as the data for an
405 --include option. What constitutes a newline for this purpose
406 is the operating system's default. The --newline option has
407 no effect on this option. This option may be given any number
408 of times; all the files are read.
409
410 --include-dir=pattern
411 If any --include-dir patterns are specified, the only direc‐
412 tories that are processed are those that match one of the
413 patterns (and do not match an --exclude-dir pattern). This
414 applies to all directories, whether listed on the command
415 line, obtained from --file-list, or by scanning a parent
416 directory. The pattern is a PCRE2 regular expression, and is
417 matched against the final component of the directory name,
418 not the entire path. The -F, -w, and -x options do not apply
419 to this pattern. The option may be given any number of times.
420 If a directory matches both --include-dir and --exclude-dir,
421 it is excluded. There is no short form for this option.
422
423 -L, --files-without-match
424 Instead of outputting lines from the files, just output the
425 names of the files that do not contain any lines that would
426 have been output. Each file name is output once, on a sepa‐
427 rate line.
428
429 -l, --files-with-matches
430 Instead of outputting lines from the files, just output the
431 names of the files containing lines that would have been out‐
432 put. Each file name is output once, on a separate line.
433 Searching normally stops as soon as a matching line is found
434 in a file. However, if the -c (count) option is also used,
435 matching continues in order to obtain the correct count, and
436 those files that have at least one match are listed along
437 with their counts. Using this option with -c is a way of sup‐
438 pressing the listing of files with no matches.
439
440 --label=name
441 This option supplies a name to be used for the standard input
442 when file names are being output. If not supplied, "(standard
443 input)" is used. There is no short form for this option.
444
445 --line-buffered
When this option is given, input is read and processed line
by line, and the output is flushed after each write. By
default, input is read in large chunks, unless pcre2grep can
determine that it is reading from a terminal (which is
currently possible only in Unix-like environments). Output to
a terminal is normally flushed automatically by the operating
system. This option can be useful when the input or output is
attached to a pipe and you do not want pcre2grep to buffer up
large amounts of data. However, its use will affect
performance, and the -M (multiline) option ceases to work.
456
457 --line-offsets
458 Instead of showing lines or parts of lines that match, show
459 each match as a line number, the offset from the start of the
460 line, and a length. The line number is terminated by a colon
461 (as usual; see the -n option), and the offset and length are
462 separated by a comma. In this mode, no context is shown.
463 That is, the -A, -B, and -C options are ignored. If there is
464 more than one match in a line, each of them is shown sepa‐
465 rately. This option is mutually exclusive with --file-offsets
466 and --only-matching.
467
468 --locale=locale-name
469 This option specifies a locale to be used for pattern match‐
470 ing. It overrides the value in the LC_ALL or LC_CTYPE envi‐
471 ronment variables. If no locale is specified, the PCRE2
472 library's default (usually the "C" locale) is used. There is
473 no short form for this option.
474
475 --match-limit=number
476 Processing some regular expression patterns can require a
477 very large amount of memory, leading in some cases to a pro‐
478 gram crash if not enough is available. Other patterns may
479 take a very long time to search for all possible matching
480 strings. The pcre2_match() function that is called by
481 pcre2grep to do the matching has two parameters that can
482 limit the resources that it uses.
483
484 The --match-limit option provides a means of limiting
485 resource usage when processing patterns that are not going to
486 match, but which have a very large number of possibilities in
487 their search trees. The classic example is a pattern that
488 uses nested unlimited repeats. Internally, PCRE2 uses a func‐
489 tion called match() which it calls repeatedly (sometimes
490 recursively). The limit set by --match-limit is imposed on
491 the number of times this function is called during a match,
492 which has the effect of limiting the amount of backtracking
493 that can take place.
494
495 The --recursion-limit option is similar to --match-limit, but
496 instead of limiting the total number of times that match() is
497 called, it limits the depth of recursive calls, which in turn
498 limits the amount of memory that can be used. The recursion
499 depth is a smaller number than the total number of calls,
500 because not all calls to match() are recursive. This limit is
501 of use only if it is set smaller than --match-limit.
502
There are no short forms for these options. The default
settings are specified when the PCRE2 library is compiled,
with the built-in default being 10 million.
506
507 --max-buffer-size=number
508 This limits the expansion of the processing buffer, whose
509 initial size can be set by --buffer-size. The maximum buffer
510 size is silently forced to be no smaller than the starting
511 buffer size.
512
513 -M, --multiline
514 Allow patterns to match more than one line. When this option
515 is set, the PCRE2 library is called in "multiline" mode. This
516 allows a matched string to extend past the end of a line and
517 continue on one or more subsequent lines. Patterns used with
518 -M may usefully contain literal newline characters and inter‐
519 nal occurrences of ^ and $ characters. The output for a suc‐
520 cessful match may consist of more than one line. The first
521 line is the line in which the match started, and the last
522 line is the line in which the match ended. If the matched
523 string ends with a newline sequence, the output ends at the
524 end of that line. If -v is set, none of the lines in a
525 multi-line match are output. Once a match has been handled,
526 scanning restarts at the beginning of the line after the one
527 in which the match ended.
528
529 The newline sequence that separates multiple lines must be
530 matched as part of the pattern. For example, to find the
531 phrase "regular expression" in a file where "regular" might
532 be at the end of a line and "expression" at the start of the
533 next line, you could use this command:
534
535 pcre2grep -M 'regular\s+expression' <file>
536
537 The \s escape sequence matches any white space character,
538 including newlines, and is followed by + so as to match
539 trailing white space on the first line as well as possibly
540 handling a two-character newline sequence.
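The effect of \s+ across a line boundary can be checked outside pcre2grep. The sketch below uses Python's re module, which treats \s the same way for these characters; the sample strings are made up for illustration.

```python
import re

pattern = re.compile(r"regular\s+expression")

# Trailing space before an LF newline, and a two-character CRLF newline.
unix_text = "a regular \nexpression here\n"
dos_text = "a regular\r\nexpression here\r\n"

unix_match = pattern.search(unix_text)
dos_match = pattern.search(dos_text)
```

In both cases the match spans the line boundary, because \s+ consumes the trailing space and every character of the newline sequence.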
541
There is a limit to the number of lines that can be matched,
imposed by the way that pcre2grep buffers the input file as
it scans it. With a sufficiently large processing buffer,
this should not be a problem, but the -M option does not work
when input is read line by line (see --line-buffered).
547
548 -N newline-type, --newline=newline-type
549 The PCRE2 library supports five different conventions for
550 indicating the ends of lines. They are the single-character
551 sequences CR (carriage return) and LF (linefeed), the two-
552 character sequence CRLF, an "anycrlf" convention, which rec‐
553 ognizes any of the preceding three types, and an "any" con‐
554 vention, in which any Unicode line ending sequence is assumed
555 to end a line. The Unicode sequences are the three just men‐
556 tioned, plus VT (vertical tab, U+000B), FF (form feed,
557 U+000C), NEL (next line, U+0085), LS (line separator,
558 U+2028), and PS (paragraph separator, U+2029).
559
560 When the PCRE2 library is built, a default line-ending
561 sequence is specified. This is normally the standard
562 sequence for the operating system. Unless otherwise specified
563 by this option, pcre2grep uses the library's default. The
564 possible values for this option are CR, LF, CRLF, ANYCRLF, or
565 ANY. This makes it possible to use pcre2grep to scan files
566 that have come from other environments without having to mod‐
567 ify their line endings. If the data that is being scanned
568 does not agree with the convention set by this option,
569 pcre2grep may behave in strange ways. Note that this option
570 does not apply to files specified by the -f, --exclude-from,
571 or --include-from options, which are expected to use the
572 operating system's standard newline sequence.
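To show how the five conventions divide the same data differently, here is a small model in Python. The table NEWLINE_PATTERNS and the function split_lines are hypothetical names, and the model works on decoded text, not on the bytes that pcre2grep reads.

```python
import re

NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF first, then any single Unicode line-ending character.
    "ANY": "\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, convention):
    """Split text into lines under one of the five conventions."""
    parts = re.split(NEWLINE_PATTERNS[convention], text)
    # A trailing newline ends the last line and does not start an empty one.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

mixed = "one\r\ntwo\nthree\rfour\u2028five"
```

Under LF the sample has three lines, one of which keeps a stray carriage return; under ANYCRLF it has four lines, and under ANY it has five.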
573
574 -n, --line-number
575 Precede each output line by its line number in the file, fol‐
576 lowed by a colon for matching lines or a hyphen for context
577 lines. If the file name is also being output, it precedes the
578 line number. When the -M option causes a pattern to match
579 more than one line, only the first is preceded by its line
580 number. This option is forced if --line-offsets is used.
581
582 --no-jit If the PCRE2 library is built with support for just-in-time
583 compiling (which speeds up matching), pcre2grep automatically
584 makes use of this, unless it was explicitly disabled at build
585 time. This option can be used to disable the use of JIT at
586 run time. It is provided for testing and working round prob‐
587 lems. It should never be needed in normal use.
588
589 -o, --only-matching
590 Show only the part of the line that matched a pattern instead
591 of the whole line. In this mode, no context is shown. That
592 is, the -A, -B, and -C options are ignored. If there is more
593 than one match in a line, each of them is shown separately,
594 on a separate line of output. If -o is combined with -v
595 (invert the sense of the match to find non-matching lines),
596 no output is generated, but the return code is set appropri‐
597 ately. If the matched portion of the line is empty, nothing
598 is output unless the file name or line number are being
599 printed, in which case they are shown on an otherwise empty
600 line. This option is mutually exclusive with --file-offsets
601 and --line-offsets.
602
603 -onumber, --only-matching=number
604 Show only the part of the line that matched the capturing
605 parentheses of the given number. Up to 32 capturing parenthe‐
606 ses are supported, and -o0 is equivalent to -o without a num‐
607 ber. Because these options can be given without an argument
608 (see above), if an argument is present, it must be given in
609 the same shell item, for example, -o3 or --only-matching=2.
610 The comments given for the non-argument case above also apply
611 to this case. If the specified capturing parentheses do not
612 exist in the pattern, or were not set in the match, nothing
613 is output unless the file name or line number are being out‐
614 put.
615
616 If this option is given multiple times, multiple substrings
617 are output for each match, in the order the options are
618 given, and all on one line. For example, -o3 -o1 -o3 causes
619 the substrings matched by capturing parentheses 3 and 1 and
620 then 3 again to be output. By default, there is no separator
621 (but see the next option).
622
623 --om-separator=text
624 Specify a separating string for multiple occurrences of -o.
625 The default is an empty string. Separating strings are never
626 coloured.
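The combination of several numbered -o options with a separator can be modelled as follows. The function only_matching_groups is hypothetical and uses Python's re module. How the separator is handled when a group is unset is not modelled, so the example uses groups that are always set.

```python
import re

def only_matching_groups(pattern, line, numbers, separator=""):
    """For each match, join the requested capture groups in the order given."""
    out = []
    for m in re.finditer(pattern, line):
        parts = [m.group(n) or "" for n in numbers]   # group 0 is the whole match
        out.append(separator.join(parts))
    return out

line = "2024-03-15 and 1999-12-31"
date_pattern = r"(\d+)-(\d+)-(\d+)"
# Models -o3 -o1 -o3, first with a separator of "/" and then without one.
with_separator = only_matching_groups(date_pattern, line, [3, 1, 3], separator="/")
without_separator = only_matching_groups(date_pattern, line, [3, 1, 3])
```

Each match produces one output line, so the two dates in the sample give two results.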

-q, --quiet
    Work quietly, that is, display nothing except error messages. The
    exit status indicates whether or not any matches were found.

-r, --recursive
    If any given path is a directory, recursively scan the files it
    contains, taking note of any --include and --exclude settings. By
    default, a directory is read as a normal file; in some operating
    systems this gives an immediate end-of-file. This option is a
    shorthand for setting the -d option to "recurse".

--recursion-limit=number
    See --match-limit above.

-s, --no-messages
    Suppress error messages about non-existent or unreadable files.
    Such files are quietly skipped. However, the return code is still
    2, even if matches were found in other files.

-t, --total-count
    This option is useful when scanning more than one file. If used on
    its own, -t suppresses all output except for a grand total number
    of matching lines (or non-matching lines if -v is used) in all the
    files. If -t is used with -c, a grand total is output except when
    the previous output is just one line. In other words, it is not
    output when just one file's count is listed. If file names are
    being output, the grand total is preceded by "TOTAL:". Otherwise,
    it appears as just another number. The -t option is ignored when
    used with -L (list files without matches), because the grand total
    would always be zero.
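
The interaction of -c and -t described above can be written down as a
small Python sketch. The function count_output and its parameters are
hypothetical, and the choice to print a bare number when -t is used on
its own is an assumption drawn from the wording "suppresses all output
except for a grand total"; it does not cover -L or -v.

```python
def count_output(counts, with_c=True, with_t=True, show_names=True):
    """Return the output lines for per-file counts under -c and -t.

    counts is a list of (file_name, matching_line_count) pairs.
    show_names says whether file names are being output.
    """
    lines = []
    if with_c:
        for name, n in counts:
            lines.append(f"{name}:{n}" if show_names else str(n))
    if with_t:
        total = sum(n for _, n in counts)
        if not with_c:
            # -t alone: only the grand total appears (assumed bare number).
            lines.append(str(total))
        elif len(lines) > 1:
            # With -c, the total is dropped when only one count was listed.
            lines.append(f"TOTAL:{total}" if show_names else str(total))
    return lines

print(count_output([("a.txt", 3), ("b.txt", 0)]))
# ['a.txt:3', 'b.txt:0', 'TOTAL:3']
print(count_output([("a.txt", 3)], show_names=False))
# ['3']
```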

-u, --utf-8
    Operate in UTF-8 mode. This option is available only if PCRE2 has
    been compiled with UTF-8 support. All patterns (including those
    for any --exclude and --include options) and all subject lines
    that are scanned must be valid strings of UTF-8 characters.

-V, --version
    Write the version numbers of pcre2grep and the PCRE2 library to
    the standard output and then exit. Anything else on the command
    line is ignored.

-v, --invert-match
    Invert the sense of the match, so that lines which do not match
    any of the patterns are the ones that are found.

-w, --word-regex, --word-regexp
    Force the patterns to match only whole words. This is equivalent
    to having \b at the start and end of the pattern. This option
    applies only to the patterns that are matched against the contents
    of files; it does not apply to patterns specified by any of the
    --include or --exclude options.

-x, --line-regex, --line-regexp
    Force the patterns to be anchored (each must start matching at the
    beginning of a line) and in addition, require them to match entire
    lines. In multiline mode the match may be more than one line. This
    is equivalent to having \A and \Z characters at the start and end
    of each alternative top-level branch in every pattern. This option
    applies only to the patterns that are matched against the contents
    of files; it does not apply to patterns specified by any of the
    --include or --exclude options.
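
The following Python sketch illustrates the equivalences stated for -w
and -x, and why -x is described in terms of every top-level branch. It
uses Python's re module as a stand-in, so details differ from PCRE2
(for instance, PCRE2's \Z can also match before a final newline); the
helper names word_wrap and line_wrap are invented here.

```python
import re

def word_wrap(pattern):
    # -w analogue: \b at the start and end of the (grouped) pattern.
    return r"\b(?:" + pattern + r")\b"

def line_wrap(pattern):
    # -x analogue: anchoring the grouped alternation has the same effect
    # as putting \A and \Z around every top-level branch.
    return r"\A(?:" + pattern + r")\Z"

pat = "cat|dog"
naive = r"\A" + pat + r"\Z"   # anchors bind only to the nearest branch

print(bool(re.search(word_wrap("cat"), "concatenate")))  # False
print(bool(re.search(word_wrap("cat"), "the cat sat")))  # True
print(bool(re.search(naive, "catalog")))                 # True
print(bool(re.search(line_wrap(pat), "catalog")))        # False
print(bool(re.search(line_wrap(pat), "dog")))            # True
```

The naive form \Acat|dog\Z accepts "catalog" because \A applies only to
the first branch and \Z only to the second.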


The environment variables LC_ALL and LC_CTYPE are examined, in that
order, for a locale. The first one that is set is used. This can be
overridden by the --locale option. If no locale is set, the PCRE2
library's default (usually the "C" locale) is used.
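
The lookup order can be summarised in a few lines of Python. This is
only a sketch of the rule as stated; choose_locale is a made-up name,
and returning None stands for "use the PCRE2 library's default".

```python
def choose_locale(env, cli_locale=None):
    """Pick a locale name following the order given in the text.

    env is a mapping of environment variables; cli_locale is the value
    of --locale if that option was given.
    """
    if cli_locale is not None:           # --locale overrides the environment
        return cli_locale
    for name in ("LC_ALL", "LC_CTYPE"):  # first variable that is set wins
        if name in env:
            return env[name]
    return None                          # fall back to the library default

print(choose_locale({"LC_CTYPE": "fr_FR", "LC_ALL": "en_GB"}))  # en_GB
print(choose_locale({"LC_CTYPE": "fr_FR"}))                     # fr_FR
print(choose_locale({"LC_ALL": "en_GB"}, cli_locale="de_DE"))   # de_DE
```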


The -N (--newline) option allows pcre2grep to scan files with
different newline conventions from the default. Any parts of the input
files that are written to the standard output are copied identically,
with whatever newline sequences they have in the input. However, the
setting of this option does not affect the interpretation of files
specified by the -f, --exclude-from, or --include-from options, which
are assumed to use the operating system's standard newline sequence,
nor does it affect the way in which pcre2grep writes informational
messages to the standard error and output streams. For these it uses
the string "\n" to indicate newlines, relying on the C I/O library to
convert this to an appropriate sequence.


Many of the short and long forms of pcre2grep's options are the same
as in the GNU grep program. Any long option of the form --xxx-regexp
(GNU terminology) is also available as --xxx-regex (PCRE2
terminology). However, the --file-list, --file-offsets,
--include-dir, --line-offsets, --locale, --match-limit, -M,
--multiline, -N, --newline, --om-separator, --recursion-limit, -u, and
--utf-8 options are specific to pcre2grep, as is the use of the
--only-matching option with a capturing parentheses number.

Although most of the common options work the same way, a few are
different in pcre2grep. For example, the --include option's argument
is a glob for GNU grep, but a regular expression for pcre2grep. If
both the -c and -l options are given, GNU grep lists only file names,
without counts, but pcre2grep gives the counts as well.


There are four different ways in which an option with data can be
specified. If a short form option is used, the data may follow
immediately, or (with one exception) in the next command line item.
For example:

    -f/some/file
    -f /some/file

The exception is the -o option, which may appear with or without data.
Because of this, if data is present, it must follow immediately in the
same item, for example -o3.

If a long form option is used, the data may appear in the same command
line item, separated by an equals character, or (with two exceptions)
it may appear in the next command line item. For example:

    --file=/some/file
    --file /some/file

Note, however, that if you want to supply a file name beginning with ~
as data in a shell command, and have the shell expand ~ to a home
directory, you must separate the file name from the option, because
the shell does not treat ~ specially unless it is at the start of an
item.

The exceptions to the above are the --colour (or --color) and
--only-matching options, for which the data is optional. If one of
these options does have data, it must be given in the first form,
using an equals character. Otherwise pcre2grep will assume that it has
no data.
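
The four forms and their exceptions can be captured in a short Python
sketch. It is an illustration of the rules above, not pcre2grep's
parser: take_option is a hypothetical helper, it assumes the option at
the given position is one that accepts data, and it does not handle a
missing following item.

```python
# Options whose data is optional, as listed in the text.
OPTIONAL_SHORT = {"-o"}
OPTIONAL_LONG = {"--colour", "--color", "--only-matching"}

def take_option(argv, i):
    """Return (name, data, next_index) for the option at argv[i]."""
    item = argv[i]
    if item.startswith("--"):
        if "=" in item:                    # --file=/some/file
            name, data = item.split("=", 1)
            return name, data, i + 1
        if item in OPTIONAL_LONG:          # no '=' means no data
            return item, None, i + 1
        return item, argv[i + 1], i + 2    # --file /some/file
    name, rest = item[:2], item[2:]
    if rest:                               # -f/some/file or -o3
        return name, rest, i + 1
    if name in OPTIONAL_SHORT:             # bare -o never takes the next item
        return name, None, i + 1
    return name, argv[i + 1], i + 2        # -f /some/file

print(take_option(["-f", "/some/file"], 0))      # ('-f', '/some/file', 2)
print(take_option(["-o", "3"], 0))               # ('-o', None, 1)
print(take_option(["--only-matching=2"], 0))     # ('--only-matching', '2', 1)
```

In the second call the item "3" is left unconsumed, so it would be
treated as the next argument rather than as data for -o.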


pcre2grep has, by default, support for calling external programs or
scripts during matching by making use of PCRE2's callout facility.
However, this support can be disabled when pcre2grep is built. You can
find out whether your binary has support for callouts by running it
with the --help option. If the support is not enabled, all callouts in
patterns are ignored by pcre2grep.

A callout in a PCRE2 pattern is of the form (?C<arg>) where the
argument is either a number or a quoted string (see the pcre2callout
documentation for details). Numbered callouts are ignored by
pcre2grep. String arguments are parsed as a list of substrings
separated by pipe (vertical bar) characters. The first substring must
be an executable name, with the following substrings specifying
arguments:

    executable_name|arg1|arg2|...

Any substring (including the executable name) may contain escape
sequences started by a dollar character: $<digits> or ${<digits>} is
replaced by the captured substring of the given decimal number, which
must be greater than zero. If the number is greater than the number of
capturing substrings, or if the capture is unset, the replacement is
empty.

A dollar followed by any other character is replaced by that
character. In particular, $$ is replaced by a single dollar and $| is
replaced by a pipe character. Here is an example:

    echo -e "abcde\n12345" | pcre2grep \
      '(?x)(.)(..(.))
      (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

Output:

    Arg1: [a] [bcd] [d] Arg2: |a| ()
    abcde
    Arg1: [1] [234] [4] Arg2: |1| ()
    12345

The parameters for the execv() system call that is used to run the
program or script are zero-terminated strings. This means that binary
zero characters in the callout argument will cause premature
termination of their substrings, and therefore should not be present.
Any syntax errors in the string (for example, a dollar not followed by
another character) cause the callout to be ignored. If running the
program fails for any reason (including the non-existence of the
executable), a local matching failure occurs and the matcher
backtracks in the normal way.
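
The splitting and substitution rules for a callout string can be
sketched in Python as follows. This is a reading of the rules above,
not pcre2grep's code: expand_callout is a hypothetical name, and
treating $0 and an unterminated ${ as syntax errors is an assumption.

```python
import re

def expand_callout(arg, captures):
    """Split a callout string on unescaped pipes and expand $ escapes.

    captures[n - 1] holds capture group n, or None if it is unset.
    Returns [executable, arg1, ...], or None when the string has a
    syntax error (in which case the callout would be ignored).
    """
    parts, cur, i = [], [], 0
    while i < len(arg):
        ch = arg[i]
        if ch == "|":                       # unescaped pipe ends a substring
            parts.append("".join(cur))
            cur = []
            i += 1
        elif ch == "$":
            m = re.match(r"\$(?:(\d+)|\{(\d+)\})", arg[i:])
            if m:
                n = int(m.group(1) or m.group(2))
                if n == 0:                  # number must be greater than zero
                    return None
                value = captures[n - 1] if n <= len(captures) else None
                cur.append(value or "")     # unset or out of range: empty
                i += m.end()
            elif arg.startswith("${", i):   # malformed ${...} (assumed error)
                return None
            elif i + 1 < len(arg):          # $$ gives $, $| gives |
                cur.append(arg[i + 1])
                i += 2
            else:                           # dollar with nothing after it
                return None
        else:
            cur.append(ch)
            i += 1
    parts.append("".join(cur))
    return parts

arg = "/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)"
print(expand_callout(arg, ["a", "bcd", "d", None]))
# ['/bin/echo', 'Arg1: [a] [bcd] [d]', 'Arg2: |a| ()']
```

The sample call uses the callout string and the captures for the line
"abcde" from the example above; group 4 is still unset when the callout
is reached, so ($4) expands to ().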


It is possible to supply a regular expression that takes a very long
time to fail to match certain lines. Such patterns normally involve
nested indefinite repeats, for example: (a+)*\d when matched against a
line of a's with no final digit. The PCRE2 matching function has a
resource limit that causes it to abort in these circumstances. If this
happens, pcre2grep outputs an error message and the line that caused
the problem to the standard error stream. If there are more than 20
such errors, pcre2grep gives up.
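
To see why such a pattern is slow to fail, it helps to count how many
ways the nested repeat (a+)* can divide a run of n a's into groups; a
backtracking matcher may have to try something on this order before
concluding that no digit follows. The Python sketch below only counts
these divisions (the function name ways_to_split is invented); it does
not measure PCRE2 itself.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ways_to_split(n):
    """Number of ways (a+)* can carve a run of n a's into groups."""
    if n == 0:
        return 1
    # The first group takes k a's; the rest is split recursively.
    return sum(ways_to_split(n - k) for k in range(1, n + 1))

for n in (5, 10, 20, 30):
    print(n, ways_to_split(n))
# 5 16
# 10 512
# 20 524288
# 30 536870912
```

The count doubles with each extra character (it is 2 to the power
n - 1), which is why a resource limit is needed.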

The --match-limit option of pcre2grep can be used to set the overall
resource limit; there is a second option called --recursion-limit that
sets a limit on the amount of memory (usually stack) that is used (see
the discussion of these options above).


Exit status is 0 if any matches were found, 1 if no matches were
found, and 2 for syntax errors, overlong lines, non-existent or
inaccessible files (even if matches were found in other files) or too
many matching errors. Using the -s option to suppress error messages
about inaccessible files does not affect the return code.
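
The precedence among the three exit codes can be stated as a tiny
Python function. It is a sketch of the rule as written above, with a
hypothetical name and parameters.

```python
def exit_status(matches_found, errors_seen):
    """Exit code for a run, following the rule in the text.

    errors_seen covers syntax errors, overlong lines, missing or
    unreadable files, and too many matching errors. The -s option hides
    some messages but does not change this result.
    """
    if errors_seen:
        return 2                  # an error wins even if matches were found
    return 0 if matches_found else 1

print(exit_status(True, False))   # 0
print(exit_status(False, False))  # 1
print(exit_status(True, True))    # 2
```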


pcre2pattern(3), pcre2syntax(3), pcre2callout(3).


Philip Hazel
University Computing Service
Cambridge, England.


Last updated: 31 December 2016
Copyright (c) 1997-2016 University of Cambridge.


PCRE2 10.23                  31 December 2016                  PCRE2GREP(1)