PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)

NAME

       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS

       pcre2grep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcre2grep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE2 regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of pattern syntax, or pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed quotes are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcre2grep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcre2grep some-pattern file1 - file3

       By default, input files are searched line by line. Each line that
       matches a pattern is copied to the standard output, and if there is
       more than one file, the file name is output at the start of each line,
       followed by a colon. However, there are options that can change how
       pcre2grep behaves. For example, the -M option makes it possible to
       search for strings that span line boundaries. What defines a line
       boundary is controlled by the -N (--newline) option. The -h and -H
       options control whether or not file names are shown, and the -Z option
       changes the file name terminator to a zero byte.

       The amount of memory used for buffering files that are being scanned is
       controlled by parameters that can be set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size of the
       buffer that is obtained at the start of processing. If an input file
       contains very long lines, a larger buffer may be needed; this is
       handled by automatically extending the buffer, up to the limit
       specified by --max-buffer-size. The default values for these parameters
       can be set when pcre2grep is built; if nothing is specified, the
       defaults are set to 20KiB and 1MiB respectively. An error occurs if a
       line is too long and the buffer can no longer be expanded.

       The block of memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size is too small, fewer than requested "before" and "after" lines may
       be output.
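
       As a rough illustration of the arithmetic above, the following Python
       sketch computes the size of the memory block for the default settings.
       The function name is hypothetical, and applying the factor of three to
       the maximum size as well is an assumption of this sketch rather than
       something the text states.

```python
KIB = 1024


def working_memory(buffer_size: int) -> int:
    """Bytes used for a given "buffer size": three times that size."""
    return 3 * buffer_size


default_initial = 20 * KIB    # default --buffer-size (20KiB)
default_maximum = 1024 * KIB  # default --max-buffer-size (1MiB)

print(working_memory(default_initial))  # 61440
print(working_memory(default_maximum))  # 3145728 (assumes the same factor)
```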

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets,
       --line-offsets, or --output is used to output only the part of the line
       that matched (either shown literally, or as an offset), the behaviour
       is different. In this situation, all the patterns are applied to the
       line. If there is more than one match, the one that begins nearest to
       the start of the subject is processed; if there is more than one match
       at that position, the one with the longest matching substring is
       processed; if the matching substrings are equal, the first match found
       is processed.

       Scanning with all the patterns resumes immediately following the match,
       so that later matches on the same line can be found. Note, however,
       that an overlapping match that starts in the middle of another match
       will not be processed.
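
       The selection rule (earliest start, then longest, then first found) and
       the resumption after each match can be modelled as follows. This is a
       minimal sketch, not pcre2grep's implementation: Python's re module
       stands in for PCRE2, the name select_matches is hypothetical, and empty
       matches are simply skipped.

```python
import re


def select_matches(patterns, line):
    """Return the substrings processed when all patterns are applied to a line."""
    compiled = [re.compile(p) for p in patterns]
    pos, found = 0, []
    while True:
        best = None
        for rx in compiled:
            # First non-empty match of this pattern at or after pos.
            m = next((m for m in rx.finditer(line, pos) if m.end() > m.start()), None)
            if m is None:
                continue
            # Earlier start wins; at the same start the longer match wins;
            # on a complete tie the match found first is kept.
            if best is None or (m.start(), -m.end()) < (best.start(), -best.end()):
                best = m
        if best is None:
            return found
        found.append(best.group())
        pos = best.end()  # resume scanning immediately after the match


print(select_matches(["cd", "abc", "ab"], "abcd abc"))
```

       In this example "cd" is never reported, because its only occurrence
       starts in the middle of the earlier, already processed match "abc".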

       The above behaviour was changed at release 10.41 to be more compatible
       with GNU grep. In earlier releases, pcre2grep did not recognize matches
       from later patterns that were earlier in the subject.

       Patterns that can match an empty string are accepted, but empty string
       matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.

       If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library. The --locale
       option can be used to override this.

SUPPORT FOR COMPRESSED FILES

       Compile-time options for pcre2grep can set it up to use libz or libbz2
       for reading compressed files whose names end in .gz or .bz2,
       respectively. You can find out whether your pcre2grep binary has
       support for one or both of these file types by running it with the
       --help option. If the appropriate support is not present, all files are
       treated as plain text. The standard input is always so treated. If a
       file with a .gz or .bz2 extension is not in fact compressed, it is read
       as a plain text file. When input is from a compressed .gz or .bz2 file,
       the --line-buffered option is ignored.

BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed specially.
       However, if the newline type is specified as NUL, that is, the line
       terminator is a binary zero, the test for a binary file is not applied.
       See the --binary-files option for a means of changing the way binary
       files are handled.
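
       The test described above amounts to the following check. This is a
       sketch only; the function and parameter names are hypothetical and do
       not correspond to anything in the pcre2grep source.

```python
def looks_binary(data: bytes, newline_is_nul: bool = False) -> bool:
    """Apply the documented binary-file test to the start of a file."""
    if newline_is_nul:
        # With NUL as the line terminator the test is not applied at all.
        return False
    # Only the first 1024 bytes are examined for a binary zero.
    return b"\0" in data[:1024]


print(looks_binary(b"plain text\n"))          # False
print(looks_binary(b"abc\0def"))              # True
print(looks_binary(b"a" * 1024 + b"\0"))      # False: zero is past byte 1024
```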

BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that are terminated
       by a binary zero, so cannot contain internal zeros. However, patterns
       that are read from a file via the -f option may contain binary zeros.

OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.
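
       The K and M suffixes can be read as in the sketch below. The function
       name is hypothetical, and only the two upper-case letters named in the
       text are handled; how pcre2grep treats other suffixes is not covered
       here.

```python
def parse_number(text: str) -> int:
    """Decode a numerical option value with an optional K or M suffix."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


print(parse_number("20K"))  # 20480
print(parse_number("1M"))   # 1048576
print(parse_number("512"))  # 512
```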

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output up to number lines of context after each matching
                 line. Fewer lines are output if the next match or the end of
                 the file is reached, or if the processing buffer size has
                 been set too small. If file names and/or line numbers are
                 being output, a hyphen separator is used instead of a colon
                 for the context lines (the -Z option can be used to change
                 the file name terminator to a zero byte). A line containing
                 "--" is output between each group of lines, unless they are
                 in fact contiguous in the input file. The value of number is
                 expected to be relatively small. When -c is used, -A is
                 ignored.

       -a, --text
                 Treat binary files as text. This is equivalent to
                 --binary-files=text.

       --allow-lookaround-bsk
                 PCRE2 now forbids the use of \K in lookarounds by default, in
                 line with Perl. This option causes pcre2grep to set the
                 PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option, which enables this
                 somewhat dangerous usage.

       -B number, --before-context=number
                 Output up to number lines of context before each matching
                 line. Fewer lines are output if the previous match or the
                 start of the file is within number lines, or if the
                 processing buffer size has been set too small. If file names
                 and/or line numbers are being output, a hyphen separator is
                 used instead of a colon for the context lines (the -Z option
                 can be used to change the file name terminator to a zero
                 byte). A line containing "--" is output between each group of
                 lines, unless they are in fact contiguous in the input file.
                 The value of number is expected to be relatively small. When
                 -c is used, -B is ignored.

       --binary-files=word
                 Specify how binary files are to be processed. If the word is
                 "binary" (the default), pattern matching is performed on
                 binary files, but the only output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option, binary files are
                 processed in the same way as any other file. In this case,
                 when a match succeeds, the output may be binary garbage,
                 which can have nasty effects if sent to a terminal. If the
                 word is "without-match", which is equivalent to the -I
                 option, binary files are not processed at all; they are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory is obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output lines from the files that are being scanned;
                 instead output the number of lines that would have been
                 shown, either because they matched, or, if -v is set, because
                 they failed to match. By default, this count is exactly the
                 same as the number of lines that would have been output, but
                 if the -M (multiline) option is used (without -v), there may
                 be more suppressed lines than the count (that is, the number
                 of matches).

                 If no lines are selected, the number zero is output. If
                 several files are being scanned, a count is output for each
                 of them and the -t option can be used to cause a total to be
                 output at the end. However, if the --files-with-matches
                 option is also used, only those files whose counts are
                 greater than zero are listed. When -c is used, the -A, -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in the
                 same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 It is ignored if --file-offsets, --line-offsets, or --output
                 is set. By default, output is not coloured. The value for the
                 --colour option (which is optional, see above) may be
                 "never", "always", or "auto". In the last case, colouring
                 happens only if the standard output is connected to a
                 terminal. More resources are used when colouring is enabled,
                 because pcre2grep has to search for all possible matches in a
                 line, not just one, in order to colour them all.

                 The colour that is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order. If none of these are set, pcre2grep looks for
                 GREP_COLORS or GREP_COLOR (in that order). The value of the
                 variable should be a string of two numbers, separated by a
                 semicolon, except in the case of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours, terminated by the end of the string or by a colon.
                 If GREP_COLORS does not start with "ms=" or "mt=" it is
                 ignored, and GREP_COLOR is checked.

                 If the string obtained from one of the above variables
                 contains any characters other than semicolon or digits, the
                 setting is ignored and the default colour is used. The string
                 is copied directly into the control string for setting colour
                 on a terminal, so it is your responsibility to ensure that
                 the values make sense. If no relevant environment variable is
                 set, the default is "1;31", which gives red.
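
                 The lookup order described above can be summarized by the
                 following sketch. It is a model of the documented rules, not
                 the actual implementation: the function name is hypothetical,
                 the environment is passed in as a plain mapping, and treating
                 an empty value as unusable is an assumption.

```python
import re

DEFAULT_COLOUR = "1;31"  # red, the documented default


def colour_string(env):
    """Return the colour string chosen from a mapping of environment variables."""
    value = None
    # The four pcre2grep-specific variables are checked first, in this order.
    for name in ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
                 "PCREGREP_COLOUR", "PCREGREP_COLOR"):
        if name in env:
            value = env[name]
            break
    else:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors.startswith(("ms=", "mt=")):
            # Take the text after "ms=" or "mt=" up to a colon or the end.
            value = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is absent or unusable, so GREP_COLOR is checked.
            value = env.get("GREP_COLOR")
    # Anything other than digits and semicolons means the default is used.
    # Rejecting an empty value as well is an assumption of this sketch.
    if value is None or not re.fullmatch(r"[0-9;]+", value):
        return DEFAULT_COLOUR
    return value


print(colour_string({"GREP_COLORS": "ms=01;32:mc=01;31"}))  # 01;32
print(colour_string({}))                                     # 1;31
```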

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default in
                 non-Windows environments, for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments). In the
                 "read" case, directories are read as if they were ordinary
                 files. In some operating systems the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is no limit to the number of patterns. They
                 are applied to each line in the order in which they are
                 defined.

                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to all files,
                 whether listed on the command line, obtained from
                 --file-list, or by scanning a directory. The pattern is a
                 PCRE2 regular expression, and is matched against the final
                 component of the file name, not the entire path. The -F, -w,
                 and -x options do not apply to this pattern. The option may
                 be given any number of times in order to specify multiple
                 patterns. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.

       --exclude-from=filename
                 Treat each non-empty line of the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline option
                 has no effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever the setting of the --recursive
                 option. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a
                 directory matches both --include-dir and --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching pattern as a list of fixed
                 strings, separated by newlines, instead of as a regular
                 expression. What constitutes a newline for this purpose is
                 controlled by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F. They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns that are
                 matched against the contents of files; it does not apply to
                 patterns specified by any of the --include or --exclude
                 options.

       -f filename, --file=filename
                 Read patterns from the file, one per line. As is the case
                 with patterns on the command line, no delimiters should be
                 used. What constitutes a newline when reading the file is the
                 operating system's default interpretation of \n. The
                 --newline option has no effect on this option. Trailing white
                 space is removed from each line, and blank lines are ignored.
                 An empty file contains no patterns and therefore matches
                 nothing. Patterns read from a file in this way may contain
                 binary zeros, which are treated as ordinary data characters.

                 If this option is given more than once, all the specified
                 files are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-" to refer to the
                 standard input. When -f is used, patterns specified on the
                 command line using -e may also be present; they are matched
                 before the file's patterns. However, no pattern is taken from
                 the command line; all arguments are treated as the names of
                 paths to be searched.
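
                 The way a pattern file is reduced to a list of patterns can
                 be sketched as below. The function name is hypothetical, the
                 file content is assumed to be already available as a string
                 with \n line endings, and Python's notion of white space is
                 used in place of the C library's.

```python
def read_patterns(text):
    """Split pattern-file content into patterns, one per non-blank line."""
    patterns = []
    for line in text.split("\n"):
        line = line.rstrip()  # trailing white space is removed
        if line:              # blank lines are ignored
            patterns.append(line)
    return patterns


print(read_patterns("foo  \n\n  bar\t\n"))  # ['foo', '  bar']
print(read_patterns(""))                    # [] (an empty file has no patterns)
```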

       --file-list=filename
                 Read a list of files and/or directories that are to be
                 scanned from the given file, one per line. What constitutes a
                 newline when reading the file is the operating system's
                 default. Trailing white space is removed from each line, and
                 blank lines are ignored. These paths are processed before any
                 that are listed on the command line. The file name can be
                 given as "-" to refer to the standard input. If --file and
                 --file-list are both specified as "-", patterns are read
                 first. This is useful only when the standard input is a
                 terminal, from which further lines (the list of files) can be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as an offset from the start of the file and a
                 length, separated by a comma. In this mode, --colour has no
                 effect, and no context is shown. That is, the -A, -B, and -C
                 options are ignored. If there is more than one match in a
                 line, each of them is shown separately. This option is
                 mutually exclusive with --output, --line-offsets, and
                 --only-matching.

       -H, --with-filename
                 Force the inclusion of the file name at the start of output
                 lines when searching a single file. The file name is not
                 normally shown in this case. By default, for matching lines,
                 the file name is followed by a colon; for context lines, a
                 hyphen separator is used. The -Z option can be used to change
                 the terminator to a zero byte. If a line number is also being
                 output, it follows the file name. When the -M option causes a
                 pattern to match more than one line, only the first is
                 preceded by the file name. This option overrides any previous
                 -h, -l, or -L options.

       -h, --no-filename
                 Suppress the output of file names when searching multiple
                 files. File names are normally shown when multiple files are
                 searched. By default, for matching lines, the file name is
                 followed by a colon; for context lines, a hyphen separator is
                 used. The -Z option can be used to change the terminator to a
                 zero byte. If a line number is also being output, it follows
                 the file name. This option overrides any previous -H, -L, or
                 -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output a help message, giving brief details of the command
                 options and file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore binary files. This is equivalent to
                 --binary-files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files that
                 are processed are those whose names match one of the patterns
                 and do not match an --exclude pattern. This option does not
                 affect directories, but it applies to all files, whether
                 listed on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular
                 expression, and is matched against the final component of the
                 file name, not the entire path. The -F, -w, and -x options do
                 not apply to this pattern. The option may be given any number
                 of times. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.
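
                 The interaction of --include and --exclude can be modelled as
                 in the sketch below. It is illustrative only: Python's re
                 module stands in for PCRE2, an unanchored search is assumed,
                 and the function name is hypothetical.

```python
import os
import re


def file_is_processed(path, includes=(), excludes=()):
    """Decide whether a file is scanned, given include and exclude patterns."""
    name = os.path.basename(path)  # only the final component is matched
    if any(re.search(p, name) for p in excludes):
        return False               # an --exclude match always wins
    if includes:
        return any(re.search(p, name) for p in includes)
    return True                    # no --include patterns: nothing is filtered


print(file_is_processed("src/main.c", includes=[r"\.c$"]))                     # True
print(file_is_processed("src/main.c", includes=[r"\.c$"], excludes=["^main"]))  # False
print(file_is_processed("src.c/notes.txt", includes=[r"\.c$"]))                 # False
```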

       --include-from=filename
                 Treat each non-empty line of the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only
                 directories that are processed are those whose names match
                 one of the patterns and do not match an --exclude-dir
                 pattern. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times. If a directory matches both --include-dir and
                 --exclude-dir, it is excluded. There is no short form for
                 this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line by default, but if the -Z option is set, they
                 are separated by zero bytes instead of newlines. This option
                 overrides any previous -H, -h, or -l options.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line,
                 but if the -Z option is set, they are separated by zero bytes
                 instead of newlines. Searching normally stops as soon as a
                 matching line is found in a file. However, if the -c (count)
                 option is also used, matching continues in order to obtain
                 the correct count, and those files that have at least one
                 match are listed along with their counts. Using this option
                 with -c is a way of suppressing the listing of files with no
                 matches that occurs with -c on its own. This option overrides
                 any previous -H, -h, or -L options.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, non-compressed input is read and
                 processed line by line, and the output is flushed after each
                 write. By default, input is read in large chunks, unless
                 pcre2grep can determine that it is reading from a terminal,
                 which is currently possible only in Unix-like environments or
                 Windows. Output to a terminal is normally automatically
                 flushed by the operating system. This option can be useful
                 when the input or output is attached to a pipe and you do not
                 want pcre2grep to buffer up large amounts of data. However,
                 its use will affect performance, and the -M (multiline)
                 option ceases to work. When input is from a compressed .gz or
                 .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length are
                 separated by a comma. In this mode, --colour has no effect,
                 and no context is shown. That is, the -A, -B, and -C options
                 are ignored. If there is more than one match in a line, each
                 of them is shown separately. This option is mutually
                 exclusive with --output, --file-offsets, and --only-matching.
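
                 The output format described above can be imitated with the
                 following sketch. It is only a model: Python's re module
                 stands in for PCRE2, the function name is hypothetical, and
                 the offsets are assumed to be counted from zero.

```python
import re


def line_offsets(pattern, text):
    """Format each match as "line:offset,length", one string per match."""
    rx = re.compile(pattern)
    out = []
    for number, line in enumerate(text.splitlines(), start=1):
        for m in rx.finditer(line):
            if m.end() > m.start():  # empty matches are not reported
                out.append(f"{number}:{m.start()},{m.end() - m.start()}")
    return out


for item in line_offsets("an", "banana\nplan"):
    print(item)
```

                 With these inputs the sketch reports two matches on line 1
                 and one on line 2, each shown separately.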

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE2
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is set, the PCRE2 library is called in "multiline" mode. This
                 allows a matched string to extend past the end of a line and
                 continue on one or more subsequent lines. Patterns used with
                 -M may usefully contain literal newline characters and
                 internal occurrences of ^ and $ characters. The output for a
                 successful match may consist of more than one line. The first
                 line is the line in which the match started, and the last
                 line is the line in which the match ended. If the matched
                 string ends with a newline sequence, the output ends at the
                 end of that line. If -v is set, none of the lines in a
                 multi-line match are output. Once a match has been handled,
                 scanning restarts at the beginning of the line after the one
                 in which the match ended.

                 The newline sequence that separates multiple lines must be
                 matched as part of the pattern. For example, to find the
                 phrase "regular expression" in a file where "regular" might
                 be at the end of a line and "expression" at the start of the
                 next line, you could use this command:

                   pcre2grep -M 'regular\s+expression' <file>

                 The \s escape sequence matches any white space character,
                 including newlines, and is followed by + so as to match
                 trailing white space on the first line as well as possibly han‐
541                 dling a two-character newline sequence.
542
543                 There is a limit to the number of lines that can be  matched,
544                 imposed  by  the way that pcre2grep buffers the input file as
545                 it scans it. With a  sufficiently  large  processing  buffer,
546                 this should not be a problem, but the -M option does not work
                 when input is read line by line (see --line-buffered).
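
                 The phrase example can be tried on a small invented input, as
                 a sketch. Both lines spanned by the match are output, and the
                 unrelated third line is not:

```shell
# "regular" ends the first line and "expression" starts the second, so
# \s+ has to consume the newline that separates them.
out=$(printf 'a regular\nexpression here\nunrelated\n' |
      pcre2grep -M 'regular\s+expression')
printf '%s\n' "$out"
```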
548
549       -m number, --max-count=number
550                 Stop processing after finding number matching lines, or  non-
551                 matching  lines if -v is also set. Any trailing context lines
552                 are output after the final match.  In  multiline  mode,  each
553                 multiline  match counts as just one line for this purpose. If
554                 this limit is reached when reading the standard input from  a
555                 regular file, the file is left positioned just after the last
556                 matching line.  If -c is also set, the count that  is  output
557                 is  never  greater  than number. This option has no effect if
558                 used with -L, -l, or -q, or when just checking for a match in
559                 a binary file.
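
                 A short sketch with invented input: three lines contain "a",
                 but processing stops once two matching lines have been found.

```shell
# Only the first two matching lines (a1 and a2) are reported.
out=$(printf 'a1\nb\na2\na3\n' | pcre2grep -m 2 a)
printf '%s\n' "$out"
```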
560
561       --match-limit=number
562                 Processing  some  regular expression patterns may take a very
563                 long time to search for all possible matching strings. Others
564                 may  require  a  very large amount of memory. There are three
565                 options that set resource limits for matching.
566
567                 The --match-limit option provides a means of limiting comput‐
568                 ing  resource usage when processing patterns that are not go‐
569                 ing to match, but which have a very large number of possibil‐
570                 ities in their search trees. The classic example is a pattern
571                 that uses nested unlimited repeats. Internally, PCRE2  has  a
572                 counter  that  is  incremented each time around its main pro‐
573                 cessing loop. If the value set by --match-limit  is  reached,
574                 an error occurs.
575
576                 The  --heap-limit  option specifies, as a number of kibibytes
577                 (units of 1024 bytes), the maximum amount of heap memory that
578                 may be used for matching.
579
580                 The  --depth-limit  option  limits  the depth of nested back‐
581                 tracking points, which indirectly limits the amount of memory
582                 that is used. The amount of memory needed for each backtrack‐
583                 ing point depends on the number of capturing  parentheses  in
584                 the pattern, so the amount of memory that is used before this
585                 limit acts varies from pattern to pattern. This limit  is  of
586                 use only if it is set smaller than --match-limit.
587
588                 There  are no short forms for these options. The default lim‐
589                 its can be set when the PCRE2 library is  compiled;  if  they
590                 are  not specified, the defaults are very large and so effec‐
591                 tively unlimited.
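
                 A hedged sketch of --match-limit in use. The limit of 1000 and
                 the input line are arbitrary choices for the example, and
                 --no-jit is given only so that the behaviour does not depend
                 on whether JIT support is present. The pattern uses nested
                 unlimited repeats and the line contains no digit, so the
                 search is cut short and a message is written to the standard
                 error stream:

```shell
# A run of a's with no final digit makes (a+)*\d backtrack heavily.
line=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# Capture the standard error stream only; standard output is discarded.
msg=$(printf '%s\n' "$line" |
      pcre2grep --no-jit --match-limit=1000 '(a+)*\d' 2>&1 >/dev/null) || true
printf '%s\n' "$msg"
```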
592
593       --max-buffer-size=number
594                 This limits the expansion of  the  processing  buffer,  whose
595                 initial  size can be set by --buffer-size. The maximum buffer
596                 size is silently forced to be no smaller  than  the  starting
597                 buffer size.
598
599       -N newline-type, --newline=newline-type
600                 Six different conventions for indicating the ends of lines in
601                 scanned files are supported. For example:
602
603                   pcre2grep -N CRLF 'some pattern' <file>
604
605                 The newline type may be specified in upper, lower,  or  mixed
606                 case.  If the newline type is NUL, lines are separated by bi‐
607                 nary zero characters. The other types are the  single-charac‐
608                 ter  sequences  CR  (carriage  return) and LF (linefeed), the
609                 two-character sequence CRLF, an "anycrlf" type, which  recog‐
610                 nizes  any  of  the preceding three types, and an "any" type,
611                 for which any Unicode line ending sequence is assumed to  end
612                 a  line.  The Unicode sequences are the three just mentioned,
613                 plus VT (vertical tab, U+000B), FF (form feed,  U+000C),  NEL
614                 (next  line,  U+0085),  LS  (line  separator, U+2028), and PS
615                 (paragraph separator, U+2029).
616
617                 When the PCRE2 library is built, a  default  line-ending  se‐
618                 quence  is specified.  This is normally the standard sequence
619                 for the operating system. Unless otherwise specified by  this
620                 option, pcre2grep uses the library's default.
621
622                 This  option makes it possible to use pcre2grep to scan files
623                 that have come from other environments without having to mod‐
624                 ify  their  line  endings.  If the data that is being scanned
625                 does not agree  with  the  convention  set  by  this  option,
626                 pcre2grep  may  behave in strange ways. Note that this option
627                 does not apply to files specified by the -f,  --exclude-from,
628                 or  --include-from options, which are expected to use the op‐
629                 erating system's standard newline sequence.
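
                 As a small sketch with invented data, the following counts
                 matches in input that uses CRLF line endings:

```shell
# With -N CRLF the $ assertion matches just before "\r\n", so the first
# line counts as a match for 'one$'.
count=$(printf 'one\r\ntwo\r\n' | pcre2grep -N CRLF -c 'one$')
printf '%s\n' "$count"
```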
630
631       -n, --line-number
632                 Precede each output line by its line number in the file, fol‐
633                 lowed  by  a colon for matching lines or a hyphen for context
634                 lines. If the file name is also being output, it precedes the
635                 line  number.  When  the  -M option causes a pattern to match
636                 more than one line, only the first is preceded  by  its  line
637                 number. This option is forced if --line-offsets is used.
638
639       --no-jit  If  the  PCRE2 library is built with support for just-in-time
640                 compiling (which speeds up matching), pcre2grep automatically
641                 makes use of this, unless it was explicitly disabled at build
642                 time. This option can be used to disable the use  of  JIT  at
643                 run  time. It is provided for testing and working round prob‐
644                 lems.  It should never be needed in normal use.
645
646       -O text, --output=text
647                 When there is a match, instead of outputting  the  line  that
648                 matched,  output just the text specified in this option, fol‐
649                 lowed by an operating-system standard newline. In this  mode,
650                 --colour  has  no  effect, and no context is shown.  That is,
651                 the -A, -B, and -C options are ignored. The --newline  option
652                 has  no  effect  on  this option, which is mutually exclusive
653                 with  --only-matching,  --file-offsets,  and  --line-offsets.
654                 However,  like  --only-matching,  if  there  is more than one
655                 match in a line, each of them causes a line of output.
656
657                 Escape sequences starting with a dollar character may be used
658                 to insert the contents of the matched part of the line and/or
659                 captured substrings into the text.
660
661                 $<digits> or ${<digits>} is replaced  by  the  captured  sub‐
662                 string  of  the  given  decimal  number; zero substitutes the
663                 whole match. If the number is greater than the number of cap‐
664                 turing  substrings,  or if the capture is unset, the replace‐
665                 ment is empty.
666
667                 $a is replaced by bell; $b by backspace; $e by escape; $f  by
668                 form  feed;  $n by newline; $r by carriage return; $t by tab;
669                 $v by vertical tab.
670
671                 $o<digits> or $o{<digits>} is replaced by the character whose
672                 code  point  is the given octal number. In the first form, up
673                 to three octal digits are processed.  When  more  digits  are
674                 needed  in Unicode mode to specify a wide character, the sec‐
675                 ond form must be used.
676
677                 $x<digits> or $x{<digits>} is replaced by the character  rep‐
678                 resented  by the given hexadecimal number. In the first form,
679                 up to two hexadecimal digits are processed. When more  digits
680                 are  needed  in Unicode mode to specify a wide character, the
681                 second form must be used.
682
683                 Any other character is substituted by itself. In  particular,
684                 $$ is replaced by a single dollar.
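
                 A minimal sketch, assuming an invented key=value input line,
                 that swaps two captured substrings in the output text:

```shell
# $1 and $2 insert the first and second captured substrings. The single
# quotes stop the shell from expanding them itself.
out=$(printf 'key=value\n' | pcre2grep -O '$2 <- $1' '(\w+)=(\w+)')
printf '%s\n' "$out"
```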
685
686       -o, --only-matching
687                 Show only the part of the line that matched a pattern instead
688                 of the whole line. In this mode, no context  is  shown.  That
689                 is,  the -A, -B, and -C options are ignored. If there is more
690                 than one match in a line, each of them is  shown  separately,
691                 on  a separate line of output. If -o is combined with -v (in‐
692                 vert the sense of the match to find non-matching  lines),  no
693                 output  is  generated,  but  the return code is set appropri‐
694                 ately. If the matched portion of the line is  empty,  nothing
                 is output unless the file name or line number is being
696                 printed, in which case they are shown on an  otherwise  empty
697                 line.  This  option  is  mutually  exclusive  with  --output,
698                 --file-offsets and --line-offsets.
699
700       -onumber, --only-matching=number
701                 Show only the part of the line  that  matched  the  capturing
702                 parentheses of the given number. Up to 50 capturing parenthe‐
703                 ses are supported by default. This limit can be  changed  via
704                 the  --om-capture option. A pattern may contain any number of
705                 capturing parentheses, but only those whose number is  within
706                 the  limit can be accessed by -o. An error occurs if the num‐
707                 ber specified by -o is greater than the limit.
708
709                 -o0 is the same as -o without a number. Because these options
710                 can  be given without an argument (see above), if an argument
711                 is present, it must be given in the same shell item, for  ex‐
712                 ample,  -o3  or --only-matching=2. The comments given for the
713                 non-argument case above also apply to  this  option.  If  the
714                 specified  capturing parentheses do not exist in the pattern,
715                 or were not set in the match, nothing is  output  unless  the
                 file name or line number is being output.
717
718                 If  this  option is given multiple times, multiple substrings
719                 are output for each match,  in  the  order  the  options  are
720                 given,  and  all on one line. For example, -o3 -o1 -o3 causes
721                 the substrings matched by capturing parentheses 3 and  1  and
722                 then  3 again to be output. By default, there is no separator
                 (but see --om-separator below).
724
725       --om-capture=number
726                 Set the number of capturing parentheses that can be  accessed
727                 by -o. The default is 50.
728
729       --om-separator=text
730                 Specify  a  separating string for multiple occurrences of -o.
731                 The default is an empty string. Separating strings are  never
732                 coloured.
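
                 A sketch combining several -o options with a separator. The
                 input line and the comma separator are invented for the
                 example:

```shell
# Output capture group 1, then group 2, joined by a comma.
out=$(printf 'user=alice id=42\n' |
      pcre2grep -o1 -o2 --om-separator=, 'user=(\w+) id=(\d+)')
printf '%s\n' "$out"
```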
733
734       -q, --quiet
735                 Work quietly, that is, display nothing except error messages.
736                 The exit status indicates whether or  not  any  matches  were
737                 found.
738
739       -r, --recursive
740                 If  any given path is a directory, recursively scan the files
741                 it contains, taking note of any --include and --exclude  set‐
742                 tings.  By  default, a directory is read as a normal file; in
743                 some operating systems this gives an  immediate  end-of-file.
744                 This  option is a shorthand for setting the -d option to "re‐
745                 curse".
746
747       --recursion-limit=number
748                 This is an obsolete synonym for --depth-limit.  See  --match-
749                 limit above for details.
750
751       -s, --no-messages
752                 Suppress  error  messages  about  non-existent  or unreadable
753                 files. Such files are quietly skipped.  However,  the  return
754                 code is still 2, even if matches were found in other files.
755
756       -t, --total-count
757                 This  option  is  useful when scanning more than one file. If
758                 used on its own, -t suppresses all output except for a  grand
759                 total  number  of matching lines (or non-matching lines if -v
760                 is used) in all the files. If -t is used with -c, a grand to‐
761                 tal  is  output  except  when the previous output is just one
762                 line. In other words, it is not output when just  one  file's
763                 count  is  listed.  If file names are being output, the grand
764                 total is preceded by "TOTAL:". Otherwise, it appears as  just
765                 another  number.  The  -t option is ignored when used with -L
766                 (list files without matches), because the grand  total  would
767                 always be zero.
768
769       -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
770                 has been compiled with UTF-8 support. All patterns (including
771                 those  for any --exclude and --include options) and all lines
772                 that are scanned must be valid strings of  UTF-8  characters.
773                 If an invalid UTF-8 string is encountered, an error occurs.
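
                 A small sketch, assuming a pcre2grep that was built with
                 UTF-8 support. The input word is invented, and its last
                 character is written as two octal byte escapes:

```shell
# "e acute" is two bytes in UTF-8 (octal 303 251). In UTF-8 mode the dot
# matches it as a single character, so 'caf.$' matches the whole word.
out=$(printf 'caf\303\251\n' | pcre2grep -u -o 'caf.$')
printf '%s\n' "$out"
```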
774
775       -U, --utf-allow-invalid
776                 As  --utf,  but in addition subject lines may contain invalid
777                 UTF-8 code unit sequences. These can never form part  of  any
778                 pattern  match.  Patterns  themselves, however, must still be
779                 valid UTF-8 strings. This facility allows valid UTF-8 strings
780                 to be sought within arbitrary byte sequences in executable or
781                 other binary files. For more details about matching  in  non-
782                 valid UTF-8 strings, see the pcre2unicode(3) documentation.
783
784       -V, --version
785                 Write  the version numbers of pcre2grep and the PCRE2 library
786                 to the standard output and then exit. Anything  else  on  the
787                 command line is ignored.
788
789       -v, --invert-match
790                 Invert  the  sense  of  the match, so that lines which do not
791                 match any of the patterns are the ones that are  found.  When
792                 this  option  is  set,  options  such  as --only-matching and
793                 --output, which specify parts of a match that are to be  out‐
794                 put, are ignored.
795
796       -w, --word-regex, --word-regexp
797                 Force the patterns only to match "words". That is, there must
798                 be a word boundary at the  start  and  end  of  each  matched
799                 string.  This is equivalent to having "\b(?:" at the start of
800                 each pattern, and ")\b" at the end. This option applies  only
801                 to  the  patterns  that  are  matched against the contents of
802                 files; it does not apply to patterns specified by any of  the
803                 --include or --exclude options.
804
805       -x, --line-regex, --line-regexp
806                 Force  the  patterns to start matching only at the beginnings
807                 of lines, and in  addition,  require  them  to  match  entire
808                 lines. In multiline mode the match may be more than one line.
809                 This is equivalent to having "^(?:" at the start of each pat‐
810                 tern  and  ")$"  at  the end. This option applies only to the
811                 patterns that are matched against the contents of  files;  it
812                 does  not apply to patterns specified by any of the --include
813                 or --exclude options.
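
                 The difference between -w and -x can be sketched on the same
                 invented three-line input:

```shell
# -w: "cat" must be delimited by word boundaries, so "concatenate" is
# rejected but "cat food" is accepted.
words=$(printf 'cat\nconcatenate\ncat food\n' | pcre2grep -w cat)
# -x: the pattern must match the entire line, so only "cat" is accepted.
lines=$(printf 'cat\nconcatenate\ncat food\n' | pcre2grep -x cat)
printf '%s\n---\n%s\n' "$words" "$lines"
```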
814
815       -Z, --null
                 Terminate file names in the regular output with a zero byte
817                 (the  NUL  character)  instead of what would normally appear.
818                 This is useful when file  names  contain  unusual  characters
819                 such  as  colons,  hyphens, or even newlines. The option does
820                 not apply to file names in error messages.
821

ENVIRONMENT VARIABLES

823
824       The environment variables LC_ALL and LC_CTYPE are examined, in that or‐
825       der, for a locale. The first one that is set is used. This can be over‐
826       ridden by the --locale option. If no locale is set, the PCRE2 library's
827       default (usually the "C" locale) is used.
828

NEWLINES

830
831       The  -N  (--newline) option allows pcre2grep to scan files with newline
832       conventions that differ from the default. This option affects only  the
833       way  scanned files are processed. It does not affect the interpretation
834       of files specified by the -f,  --file-list,  --exclude-from,  or  --in‐
835       clude-from options.
836
837       Any  parts  of the scanned input files that are written to the standard
838       output are copied with whatever newline sequences they have in the  in‐
839       put.  However,  if  the final line of a file is output, and it does not
840       end with a newline sequence, a newline sequence is added. If  the  new‐
841       line  setting  is  CR, LF, CRLF or NUL, that line ending is output; for
842       the other settings (ANYCRLF or ANY) a single NL is used.
843
844       The newline setting does not affect the way in which  pcre2grep  writes
845       newlines  in  informational  messages  to the standard output and error
846       streams.  Under Windows, the standard output is set to  be  binary,  so
847       that  "\r\n" at the ends of output lines that are copied from the input
848       is not converted to "\r\r\n" by the C I/O library. This means that  any
849       messages  written  to the standard output must end with "\r\n". For all
850       other operating systems, and for all messages  to  the  standard  error
851       stream, "\n" is used.
852

OPTIONS COMPATIBILITY

854
855       Many of the short and long forms of pcre2grep's options are the same as
856       in the GNU grep program. Any long option of the form --xxx-regexp  (GNU
857       terminology) is also available as --xxx-regex (PCRE2 terminology). How‐
858       ever, the  --depth-limit,  --file-list,  --file-offsets,  --heap-limit,
859       --include-dir,  --line-offsets,  --locale,  --match-limit, -M, --multi‐
860       line, -N, --newline,  --om-separator,  --output,  -u,  --utf,  -U,  and
861       --utf-allow-invalid options are specific to pcre2grep, as is the use of
862       the --only-matching option with a capturing parentheses number.
863
864       Although most of the common options work the same way, a few  are  dif‐
865       ferent  in pcre2grep. For example, the --include option's argument is a
866       glob for GNU grep, but a regular expression for pcre2grep. If both  the
867       -c  and  -l  options are given, GNU grep lists only file names, without
868       counts, but pcre2grep gives the counts as well.
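
       The --include difference can be sketched as follows. The directory  and
       file names are invented for the example and are created in a temporary
       directory:

```shell
dir=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$dir/a.c"
printf 'main notes\n' > "$dir/a.txt"
# For pcre2grep the --include argument is a regular expression, so the
# GNU grep glob '*.c' is written here as '\.c$'.
out=$(pcre2grep -r -l --include='\.c$' main "$dir")
printf '%s\n' "$out"
rm -rf "$dir"
```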
869

OPTIONS WITH DATA

871
872       There are four different ways in which an option with data can be spec‐
873       ified.   If  a  short  form option is used, the data may follow immedi‐
874       ately, or (with one exception) in the next command line item. For exam‐
875       ple:
876
877         -f/some/file
878         -f /some/file
879
880       The  exception is the -o option, which may appear with or without data.
881       Because of this, if data is present, it must follow immediately in  the
882       same item, for example -o3.
883
884       If  a long form option is used, the data may appear in the same command
885       line item, separated by an equals character, or (with  two  exceptions)
886       it may appear in the next command line item. For example:
887
888         --file=/some/file
889         --file /some/file
890
891       Note,  however, that if you want to supply a file name beginning with ~
892       as data in a shell command, and have the shell expand ~ to a  home  di‐
893       rectory,  you  must separate the file name from the option, because the
894       shell does not treat ~ specially unless it is at the start of an item.
895
896       The exceptions to the above are the --colour (or --color)  and  --only-
897       matching  options,  for which the data is optional. If one of these op‐
898       tions does have data, it must be given in  the  first  form,  using  an
899       equals character. Otherwise pcre2grep will assume that it has no data.
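
       As a sketch, the four ways of supplying data can be compared using  -f
       and --file with a temporary pattern file (the file contents and the in-
       put are invented for the example). All four commands should select  the
       same line:

```shell
pat=$(mktemp)
printf 'b+\n' > "$pat"
r1=$(printf 'abc\nxyz\n' | pcre2grep -f"$pat")        # short form, same item
r2=$(printf 'abc\nxyz\n' | pcre2grep -f "$pat")       # short form, next item
r3=$(printf 'abc\nxyz\n' | pcre2grep --file="$pat")   # long form with equals
r4=$(printf 'abc\nxyz\n' | pcre2grep --file "$pat")   # long form, next item
printf '%s %s %s %s\n' "$r1" "$r2" "$r3" "$r4"
rm -f "$pat"
```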
900

USING PCRE2'S CALLOUT FACILITY

902
903       pcre2grep  has,  by  default,  support for calling external programs or
904       scripts or echoing specific strings during matching by  making  use  of
905       PCRE2's  callout  facility.  However, this support can be completely or
906       partially disabled when pcre2grep is built. You can  find  out  whether
907       your  binary has support for callouts by running it with the --help op‐
908       tion. If callout support is completely disabled, all callouts  in  pat‐
909       terns are ignored by pcre2grep.  If the facility is partially disabled,
910       calling external programs is not supported, and callouts  that  request
911       it are ignored.
912
913       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu‐
914       ment is either a number or a quoted string (see the pcre2callout  docu‐
915       mentation  for  details).  Numbered  callouts are ignored by pcre2grep;
916       only callouts with string arguments are useful.
917
918   Echoing a specific string
919
920       Starting the callout string with a pipe character  invokes  an  echoing
921       facility that avoids calling an external program or script. This facil‐
922       ity is always available, provided that  callouts  were  not  completely
923       disabled  when  pcre2grep  was built. The rest of the callout string is
924       processed as a zero-terminated string, which means it should  not  con‐
925       tain  any  internal  binary  zeros. It is written to the output, having
926       first been passed through the same escape processing as text  from  the
927       --output  (-O) option (see above). However, $0 cannot be used to insert
928       a matched substring because the match is still  in  progress.  Instead,
       the  single  character '0' is inserted. Any syntax error  in the string
930       (for example, a dollar not followed by another  character)  causes  the
931       callout  to be ignored. No terminator is added to the output string, so
932       if you want a newline, you must include it explicitly using the  escape
933       $n. For example:
934
935         pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
936
937       Matching  continues normally after the string is output. If you want to
938       see only the callout output but not any output from  an  actual  match,
939       you should end the pattern with (*FAIL).
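
       A self-contained variant of the example above, with  an  invented  one-
       line  input in place of a file. It assumes a pcre2grep built with call-
       out support; if callouts are disabled, only the matching line itself is
       expected to appear:

```shell
# The callout echoes the three captured substrings, followed by a
# newline ($n), while the match is still in progress.
out=$(printf 'abcd\n' | pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")')
printf '%s\n' "$out"
```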
940
941   Calling external programs or scripts
942
943       This facility can be independently disabled when pcre2grep is built. It
944       is supported for Windows, where a call to _spawnvp() is used, for  VMS,
945       where  lib$spawn()  is  used,  and  for any Unix-like environment where
946       fork() and execv() are available.
947
948       If the callout string does not start with a pipe (vertical bar) charac‐
949       ter,  it  is parsed into a list of substrings separated by pipe charac‐
950       ters. The first substring must be an executable name, with the  follow‐
951       ing substrings specifying arguments:
952
953         executable_name|arg1|arg2|...
954
955       Any  substring  (including  the executable name) may contain escape se‐
956       quences started by a dollar character. These are the same  as  for  the
957       --output (-O) option documented above, except that $0 cannot insert the
958       matched string because the match is still  in  progress.  Instead,  the
959       character '0' is inserted. If you need a literal dollar or pipe charac‐
960       ter in any substring, use $$ or $| respectively. Here is an example:
961
962         echo -e "abcde\n12345" | pcre2grep \
963           '(?x)(.)(..(.))
964           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
965
966         Output:
967
968           Arg1: [a] [bcd] [d] Arg2: |a| ()
969           abcde
970           Arg1: [1] [234] [4] Arg2: |1| ()
971           12345
972
973       The parameters for the system call that is used to run the  program  or
974       script are zero-terminated strings. This means that binary zero charac‐
975       ters in the callout argument will cause premature termination of  their
       substrings,  and  therefore should not be present. Any syntax error  in
977       the string (for example, a dollar not followed  by  another  character)
978       causes the callout to be ignored.  If running the program fails for any
979       reason (including the non-existence of the executable), a local  match‐
980       ing failure occurs and the matcher backtracks in the normal way.
981

MATCHING ERRORS

983
984       It  is  possible  to supply a regular expression that takes a very long
985       time to fail to match certain lines.  Such  patterns  normally  involve
986       nested  indefinite repeats, for example: (a+)*\d when matched against a
987       line of a's with no final digit. The PCRE2 matching function has a  re‐
988       source  limit  that  causes it to abort in these circumstances. If this
989       happens, pcre2grep outputs an error message and the  line  that  caused
990       the  problem  to  the  standard error stream. If there are more than 20
991       such errors, pcre2grep gives up.
992
993       The --match-limit option of pcre2grep can be used to  set  the  overall
994       resource  limit.  There are also other limits that affect the amount of
995       memory used during matching; see the  discussion  of  --heap-limit  and
996       --depth-limit above.
997

DIAGNOSTICS

999
1000       Exit status is 0 if any matches were found, 1 if no matches were found,
1001       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
1002       files  (even if matches were found in other files) or too many matching
1003       errors. Using the -s option to suppress error messages about inaccessi‐
1004       ble files does not affect the return code.
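
       The three exit statuses can be observed with a short sketch. The  input
       and patterns are invented; the third pattern is deliberately malformed:

```shell
rc_match=0; printf 'abc\n' | pcre2grep -q b || rc_match=$?
rc_none=0;  printf 'abc\n' | pcre2grep -q z || rc_none=$?
# An unmatched parenthesis is a pattern syntax error.
rc_error=0; printf 'abc\n' | pcre2grep -q '(' 2>/dev/null || rc_error=$?
printf '%s %s %s\n' "$rc_match" "$rc_none" "$rc_error"
```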
1005
1006       When   run  under  VMS,  the  return  code  is  placed  in  the  symbol
1007       PCRE2GREP_RC because VMS  does  not  distinguish  between  exit(0)  and
1008       exit(1).
1009

SEE ALSO

1011
1012       pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3).
1013

AUTHOR

1015
1016       Philip Hazel
1017       Retired from University Computing Service
1018       Cambridge, England.
1019

REVISION

1021
1022       Last updated: 21 November 2022
1023       Copyright (c) 1997-2022 University of Cambridge.
1024
1025
1026
1027PCRE2 10.41                    21 November 2022                   PCRE2GREP(1)