PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)

NAME

       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS

       pcre2grep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcre2grep  searches  files  for  character patterns, in the same way as
       other grep commands do,  but  it  uses  the  PCRE2  regular  expression
       library  to  support  patterns  that  are  compatible  with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of  pattern  syntax,  or  pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a  separate  file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they  are  interpreted  as
       part  of  the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are  interpreted  by  the  shell,  and
       indeed  quotes  are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated  as  the
       single  pattern  to be matched when neither -e nor -f is present.  Con‐
       versely, when one or both of these options are  used  to  specify  pat‐
       terns, all arguments are treated as path names. At least one of -e, -f,
       or an argument pattern must be provided.

       If no files are specified, pcre2grep  reads  the  standard  input.  The
       standard  input can also be referenced by a name consisting of a single
       hyphen.  For example:

         pcre2grep some-pattern file1 - file3

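       The rules above for deciding which arguments are patterns and which are
       paths can be modelled in a few lines of Python. This is only an
       illustrative sketch of the documented behaviour, not pcre2grep's own
       code; the function name split_arguments and its parameters are
       hypothetical.

```python
def split_arguments(arguments, have_e_or_f):
    """Return (pattern, paths) for the non-option arguments.

    have_e_or_f is True when at least one -e or -f option was given.
    """
    arguments = list(arguments)
    if have_e_or_f:
        # Patterns come from -e/-f, so every argument is a path.
        pattern, paths = None, arguments
    elif arguments:
        # Otherwise the first argument is the single pattern.
        pattern, paths = arguments[0], arguments[1:]
    else:
        raise ValueError("no pattern: give -e, -f, or an argument pattern")
    # With no paths, the standard input (named "-") is read.
    return pattern, paths or ["-"]
```
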
       Input files are searched line by  line.  By  default,  each  line  that
       matches  a  pattern  is  copied to the standard output, and if there is
       more than one file, the file name is output at the start of each  line,
       followed  by  a  colon.  However, there are options that can change how
       pcre2grep behaves. In particular, the -M option makes  it  possible  to
       search  for  strings  that  span  line  boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
       controlled  by  parameters  that  can  be  set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size  of  buffer
       that  is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be  needed;  this  is  handled  by
       automatically extending the buffer, up to the limit specified by --max-
       buffer-size. The default values for these parameters can  be  set  when
       pcre2grep  is  built;  if nothing is specified, the defaults are set to
       20KiB and 1MiB respectively. An error occurs if a line is too long  and
       the buffer can no longer be expanded.

       The  block  of  memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size  is too small, fewer than requested "before" and "after" lines may
       be output.

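       The buffer arithmetic described above can be sketched in Python. The
       sketch is illustrative only: the function names are hypothetical, and
       doubling the buffer on each extension is an assumption, since the text
       does not say by how much the buffer grows at each step.

```python
DEFAULT_BUFFER_SIZE = 20 * 1024          # 20KiB, the documented default
DEFAULT_MAX_BUFFER_SIZE = 1024 * 1024    # 1MiB, the documented default


def memory_block_size(buffer_size):
    """The memory actually used is three times the "buffer size"."""
    return 3 * buffer_size


def buffer_size_for_line(line_length,
                         buffer_size=DEFAULT_BUFFER_SIZE,
                         max_buffer_size=DEFAULT_MAX_BUFFER_SIZE):
    """Extend the buffer until the line fits, or fail at the maximum."""
    size = buffer_size
    while size < line_length:
        if size >= max_buffer_size:
            raise ValueError("line is too long for the buffer")
        # Assumption: the buffer doubles, capped at the maximum size.
        size = min(size * 2, max_buffer_size)
    return size
```

       With the default settings, this model uses a 60KiB block at the start
       and rejects any line longer than 1MiB.
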
       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever  is  the
       greater.   BUFSIZ  is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to  each  line  in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further  patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or  --line-
       offsets  is  used  to  output  only  the  part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following  the  match,  so that further matches on the same line can be
       found. If there are multiple  patterns,  they  are  all  tried  on  the
       remainder  of  the  line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.

       This behaviour means that the order  in  which  multiple  patterns  are
       specified  can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages  to
       display  earlier  matches  for  later  patterns (as long as there is no
       overlap).

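       The effect of pattern order can be modelled with a short Python sketch.
       It uses Python's re module as a stand-in for PCRE2, and the function
       name only_matching is hypothetical; it follows the description above
       rather than pcre2grep's source.

```python
import re


def only_matching(patterns, line):
    """List the matched substrings of one line, trying patterns in order."""
    found = []
    position = 0
    while True:
        for pattern in patterns:
            match = re.compile(pattern).search(line, position)
            # Empty-string matches are not treated as matches.
            if match and match.end() > match.start():
                found.append(match.group())
                # Resume immediately after this match; text before it is
                # never rescanned, even by patterns not yet tried there.
                position = match.end()
                break
        else:
            # No pattern matched the remainder of the line.
            return found
```

       With the patterns X and Y given separately, in that order, the line
       "aYbXcY" yields X and then the final Y in this model, whereas the
       single pattern X|Y yields Y, X, and Y.
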
       Patterns that can match an empty string are accepted, but empty  string
       matches   are   never   recognized.   An   example   is   the   pattern
       "(super)?(man)?", in which all components are  optional.  This  pattern
       finds  all  occurrences  of  both "super" and "man"; the output differs
       from matching with "super|man" when only the  matching  substrings  are
       being shown.

       If  the  LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library.  The --locale
       option can be used to override this.

SUPPORT FOR COMPRESSED FILES

       It  is  possible to compile pcre2grep so that it uses libz or libbz2 to
       read compressed files whose names end in .gz or .bz2, respectively. You
       can  find out whether your pcre2grep binary has support for one or both
       of these file types by running it with the --help option. If the appro‐
       priate support is not present, all files are treated as plain text. The
       standard input is always so treated. When input is  from  a  compressed
       .gz or .bz2 file, the --line-buffered option is ignored.

BINARY FILES

       By  default,  a  file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed  specially.
       (GNU grep identifies binary files in this manner.) However, if the new‐
       line type is specified as "nul", that is,  the  line  terminator  is  a
       binary  zero,  the  test  for  a  binary  file  is not applied. See the
       --binary-files option for a means of changing the way binary files  are
       handled.

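       The binary-file test described above amounts to the following check,
       shown here as a minimal Python sketch. The function name looks_binary
       and the newline_type parameter are hypothetical.

```python
def looks_binary(data, newline_type=None):
    """Return True if the file contents would be identified as binary.

    data holds the start of the file as bytes; newline_type is the value
    given to the newline option, if any.
    """
    if newline_type == "nul":
        # Lines end in binary zeros, so the test is not applied.
        return False
    # Only the first 1024 bytes are examined.
    return b"\0" in data[:1024]
```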

BINARY ZEROS IN PATTERNS

       Patterns  passed  from the command line are strings that are terminated
       by a binary zero, so cannot contain internal zeros.  However,  patterns
       that are read from a file via the -f option may contain binary zeros.

OPTIONS

       The  order  in  which some of the options appear can affect the output.
       For example, both the -H and -l options affect  the  printing  of  file
       names.  Whichever  comes later in the command line will be the one that
       takes effect. Similarly, except where noted  below,  if  an  option  is
       given  twice,  the  later setting is used. Numerical values for options
       may be followed by K  or  M,  to  signify  multiplication  by  1024  or
       1024*1024 respectively.

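       As an illustration of the K and M suffixes, here is a small Python
       sketch of how such a value could be converted to a number. The function
       name parse_number is hypothetical, and because the text above mentions
       only upper-case K and M, the sketch accepts only those.

```python
def parse_number(text):
    """Convert an option value such as "20K" or "1M" to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)
```

       For example, parse_number("20K") gives 20480, so --buffer-size=20K and
       --buffer-size=20480 would request the same size.
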
       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is  not  an
                 option.  This  allows for the processing of patterns and file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output up to number lines  of  context  after  each  matching
                 line.  Fewer lines are output if the next match or the end of
                 the file is reached, or if the  processing  buffer  size  has
                 been  set  too  small.  If file names and/or line numbers are
                 being output, a hyphen separator is used instead of  a  colon
                 for  the  context  lines.  A  line  containing "--" is output
                 between each group of lines, unless they are in fact contigu‐
                 ous  in the input file. The value of number is expected to be
                 relatively small. When -c is used, -A is ignored.

       -a, --text
                 Treat binary files as text. This is equivalent  to  --binary-
                 files=text.

       -B number, --before-context=number
                 Output  up  to  number  lines of context before each matching
                 line. Fewer lines are output if the  previous  match  or  the
                 start  of the file is within number lines, or if the process‐
                 ing buffer size has been set too small. If file names  and/or
                 line  numbers  are  being  output, a hyphen separator is used
                 instead of a colon for the context lines. A  line  containing
                 "--"  is  output between each group of lines, unless they are
                 in fact contiguous in the input file. The value of number  is
                 expected  to  be  relatively  small.  When  -c is used, -B is
                 ignored.

       --binary-files=word
                 Specify how binary files are to be processed. If the word  is
                 "binary"  (the  default),  pattern  matching  is performed on
                 binary files, but the only  output  is  "Binary  file  <name>
                 matches"  when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option,  binary  files  are
                 processed  in  the  same way as any other file. In this case,
                 when a match succeeds, the  output  may  be  binary  garbage,
                 which  can  have  nasty effects if sent to a terminal. If the
                 word is  "without-match",  which  is  equivalent  to  the  -I
                 option,  binary  files  are  not  processed  at all; they are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set  the  parameter that controls how much memory is obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output  number  lines  of  context both before and after each
                 matching line.  This is equivalent to setting both -A and  -B
                 to the same value.

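                 The way the -A, -B, and -C options group their output can
                 be modelled in Python. The sketch below is illustrative
                 only: it uses Python's re module in place of PCRE2, shows
                 line numbers so that the colon and hyphen separators are
                 visible, and the function name grep_context is
                 hypothetical.

```python
import re


def grep_context(pattern, lines, before=0, after=0):
    """Return output lines for matches plus their context lines."""
    shown = {}  # line index -> True for a match, False for context
    for index, line in enumerate(lines):
        if re.search(pattern, line):
            first = max(0, index - before)
            last = min(len(lines), index + after + 1)
            for context_index in range(first, last):
                shown.setdefault(context_index, False)
            shown[index] = True
    output = []
    previous = None
    for index in sorted(shown):
        # "--" separates groups that are not contiguous in the input.
        if previous is not None and index != previous + 1:
            output.append("--")
        separator = ":" if shown[index] else "-"
        output.append(f"{index + 1}{separator}{lines[index]}")
        previous = index
    return output
```

                 Calling grep_context with before and after both set to the
                 same number corresponds to -C with that number.
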
       -c, --count
                 Do  not  output  lines from the files that are being scanned;
                 instead output the number  of  lines  that  would  have  been
                 shown, either because they matched, or, if -v is set, because
                 they failed to match. By default, this count is  exactly  the
                 same  as the number of lines that would have been output, but
                 if the -M (multiline) option is used (without -v), there  may
                 be  more suppressed lines than the count (that is, the number
                 of matches).

                 If no lines are selected, the number zero is output. If  sev‐
                 eral  files are being scanned, a count is output for each
                 of them and the -t option can be used to cause a total to  be
                 output  at  the  end.  However,  if  the --files-with-matches
                 option is also  used,  only  those  files  whose  counts  are
                 greater  than  zero  are listed. When -c is used, the -A, -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto".   If  data  is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value  (which  is
                 optional,  see above) may be "never", "always", or "auto". In
                 the last case, colouring happens only if the standard  out‐
                 put  is connected to a terminal. More resources are used when
                 colouring is enabled, because pcre2grep has to search for all
                 possible  matches in a line, not just one, in order to colour
                 them all.

                 The colour that is used can be specified by  setting  one  of
                 the  environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order.  If  none  of  these  are  set,  pcre2grep  looks  for
                 GREP_COLORS or GREP_COLOR (in that order). The value  of  the
                 variable  should  be  a string of two numbers, separated by a
                 semicolon, except in the  case  of  GREP_COLORS,  which  must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours, terminated by the end of the string or by  a  colon.
                 If  GREP_COLORS  does  not  start  with  "ms=" or "mt=" it is
                 ignored, and GREP_COLOR is checked.

                 If the string obtained from one of the above  variables  con‐
                 tains any characters other than semicolon or digits, the set‐
                 ting is ignored and the default colour is used. The string is
                 copied directly into the control string for setting colour on
                 a terminal, so it is your responsibility to ensure  that  the
                 values  make  sense.  If  no relevant environment variable is
                 set, the default is "1;31", which gives red.

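                 The lookup order for the colour variables can be sketched
                 in Python as follows. This models the description above,
                 not pcre2grep's source; the function name match_colour is
                 hypothetical, and treating an empty variable as unset is
                 an assumption.

```python
DEFAULT_COLOUR = "1;31"  # red, the documented default

PCRE2GREP_VARIABLES = (
    "PCRE2GREP_COLOUR",
    "PCRE2GREP_COLOR",
    "PCREGREP_COLOUR",
    "PCREGREP_COLOR",
)


def match_colour(env):
    """Pick the colour string from a mapping of environment variables."""
    value = None
    for name in PCRE2GREP_VARIABLES:
        if env.get(name):  # assumption: an empty value counts as unset
            value = env[name]
            break
    if value is None:
        grep_colors = env.get("GREP_COLORS", "")
        if grep_colors.startswith(("ms=", "mt=")):
            # The colours end at the first colon or at the end of the string.
            value = grep_colors[3:].split(":", 1)[0]
        elif env.get("GREP_COLOR"):
            value = env["GREP_COLOR"]
    # Anything other than digits and semicolons selects the default.
    if not value or any(ch not in "0123456789;" for ch in value):
        return DEFAULT_COLOUR
    return value
```

                 Passing os.environ as the env argument applies the same
                 lookup to the current process environment.
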
       -D action, --devices=action
                 If an input path is  not  a  regular  file  or  a  directory,
                 "action"  specifies  how  it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to  be  processed.   Valid  values are "read" (the default in
                 non-Windows environments, for compatibility with  GNU  grep),
                 "recurse"  (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments).  In  the
                 "read"  case,  directories  are read as if they were ordinary
                 files. In some operating systems  the  effect  of  reading  a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used mul‐
                 tiple times in order to specify several patterns. It can also
                 be used as a way of specifying a single pattern  that  starts
                 with  a hyphen. When -e is used, no argument pattern is taken
                 from the command line; all  arguments  are  treated  as  file
                 names.  There is no limit to the number of patterns. They are
                 applied to each line in the order in which they  are  defined
                 until one matches.

                 If  -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of  the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single  pattern  with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two  patterns  are  given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no  X  in  the line. This matters only if you are using -o or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped  without  being processed. This applies to all files,
                 whether listed on the command  line,  obtained  from  --file-
                 list, or by scanning a directory. The pattern is a PCRE2 reg‐
                 ular expression, and is matched against the  final  component
                 of  the  file  name,  not the entire path. The -F, -w, and -x
                 options do not apply to this pattern. The option may be given
                 any number of times in order to specify multiple patterns. If
                 a file name matches both an --include and an  --exclude  pat‐
                 tern, it is excluded. There is no short form for this option.

       --exclude-from=filename
                 Treat  each  non-empty  line  of  the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file  is the operating system's default. The --newline option
                 has no effect on this option. This option may be  given  more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever  the  setting  of  the  --recursive
                 option.  This  applies  to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent  directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of  the  directory
                 name,  not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number  of
                 times  in order to specify more than one pattern. If a direc‐
                 tory matches both  --include-dir  and  --exclude-dir,  it  is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret  each  data-matching  pattern  as  a  list of fixed
                 strings, separated by  newlines,  instead  of  as  a  regular
                 expression.  What  constitutes  a newline for this purpose is
                 controlled by the --newline option. The -w (match as a  word)
                 and  -x (match whole line) options can be used with -F.  They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns  that  are
                 matched  against  the contents of files; it does not apply to
                 patterns specified by  any  of  the  --include  or  --exclude
                 options.

       -f filename, --file=filename
                 Read  patterns  from  the  file, one per line, and match them
                 against each line of input. As is the case with  patterns  on
                 the  command line, no delimiters should be used. What consti‐
                 tutes a newline when reading the file is the  operating  sys‐
                 tem's  default interpretation of \n. The --newline option has
                 no effect on this option. Trailing  white  space  is  removed
                 from  each  line,  and blank lines are ignored. An empty file
                 contains no patterns and therefore matches nothing.  Patterns
                 read  from a file in this way may contain binary zeros, which
                 are treated as ordinary data characters. See  also  the  com‐
                 ments  about  multiple  patterns versus a single pattern with
                 alternatives in the description of -e above.

                 If this option is given more than  once,  all  the  specified
                 files  are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-"  to  refer  to  the
                 standard  input.  When  -f is used, patterns specified on the
                 command line using -e may also be present;  they  are  tested
                 before  the  file's  patterns.  However,  no other pattern is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.

       --file-list=filename
                 Read  a  list  of  files  and/or  directories  that are to be
                 scanned from the given file, one per line. What constitutes a
                 newline  when  reading  the  file  is  the operating system's
                 default. Trailing white space is removed from each line,  and
                 blank lines are ignored. These paths are processed before any
                 that are listed on the command line. The  file  name  can  be
                 given  as  "-"  to refer to the standard input. If --file and
                 --file-list are both specified  as  "-",  patterns  are  read
                 first.  This is useful only when the standard input is a ter‐
                 minal, from which further lines (the list of  files)  can  be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that  match,  show
                 each  match  as  an  offset  from the start of the file and a
                 length, separated by a comma. In this  mode,  no  context  is
                 shown.  That  is,  the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately.  This option is mutually exclusive with --output,
                 --line-offsets, and --only-matching.

       -H, --with-filename
                 Force the inclusion of the file name at the start  of  output
                 lines when searching a single file. By default, the file name
                 is not shown in this case.  For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being  output,  it  follows
                 the  file  name. When the -M option causes a pattern to match
                 more than one line, only the first is preceded  by  the  file
                 name.  This  option  overrides  any  previous  -h,  -l, or -L
                 options.

       -h, --no-filename
                 Suppress the output of file names when searching multiple
                 files. By default, file names are shown when  multiple  files
                 are searched. For matching lines, the file name  is  followed
                 by a colon; for context lines, a hyphen separator is used. If
                 a line number is also being output, it follows the file name.
                 This option overrides any previous -H, -L, or -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output  a  help  message, giving brief details of the command
                 options and file type support, and then exit.  Anything  else
                 on the command line is ignored.

       -I        Ignore   binary   files.  This  is  equivalent  to  --binary-
                 files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files  that
                 are  processed  are those that match one of the patterns (and
                 do not match an --exclude  pattern).  This  option  does  not
                 affect  directories,  but  it  applies  to all files, whether
                 listed on the command line, obtained from --file-list, or  by
                 scanning  a directory. The pattern is a PCRE2 regular expres‐
                 sion, and is matched against the final component of the  file
                 name,  not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number  of
                 times.  If  a  file  name  matches  both  an --include and an
                 --exclude pattern, it is excluded.  There is  no  short  form
                 for this option.

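                 How --include and --exclude combine can be sketched in
                 Python. The sketch uses Python's re module as a stand-in
                 for PCRE2 and assumes "/" as the path separator; the
                 function name is_selected is hypothetical.

```python
import re
from pathlib import PurePosixPath


def is_selected(path, include=(), exclude=()):
    """Decide whether a file would be processed."""
    # Patterns are matched against the final component only.
    name = PurePosixPath(path).name
    if any(re.search(pattern, name) for pattern in exclude):
        # A file matching both kinds of pattern is excluded.
        return False
    if include:
        # With any --include patterns, a file must match one of them.
        return any(re.search(pattern, name) for pattern in include)
    return True
```

                 Because only the final component is tested, an exclude
                 pattern such as ^test_ skips test_main.c but has no effect
                 on test_dir/main.c in this model.
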
       --include-from=filename
                 Treat  each  non-empty  line  of  the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is  the  operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If  any --include-dir patterns are specified, the only direc‐
                 tories that are processed are those that  match  one  of  the
                 patterns  (and  do  not match an --exclude-dir pattern). This
                 applies to all directories, whether  listed  on  the  command
                 line,  obtained  from  --file-list,  or  by scanning a parent
                 directory. The pattern is a PCRE2 regular expression, and  is
                 matched  against  the  final component of the directory name,
                 not the entire path. The -F, -w, and -x options do not  apply
                 to this pattern. The option may be given any number of times.
                 If a directory matches both --include-dir and  --exclude-dir,
                 it is excluded. There is no short form for this option.

       -L, --files-without-match
                 Instead  of  outputting lines from the files, just output the
                 names of the files that do not contain any lines  that  would
                 have  been  output. Each file name is output once, on a sepa‐
                 rate line. This option overrides any previous -H, -h,  or  -l
                 options.

       -l, --files-with-matches
                 Instead  of  outputting lines from the files, just output the
                 names of the files containing lines that would have been out‐
                 put.  Each  file  name  is  output  once, on a separate line.
                 Searching normally stops as soon as a matching line is  found
                 in  a  file.  However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count,  and
                 those  files  that  have  at least one match are listed along
                 with their counts. Using this option with -c is a way of sup‐
                 pressing  the  listing  of files with no matches. This option
                 overrides any previous -H, -h, or -L options.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, non-compressed input is  read  and
                 processed  line by line, and the output is flushed after each
                 write. By default, input is  read  in  large  chunks,  unless
                 pcre2grep  can  determine  that it is reading from a terminal
                 (which is currently possible only in  Unix-like  environments
                 or  Windows).  Output to a terminal is normally automatically
                 flushed by the operating system. This option  can  be  useful
                 when the input or output is attached to a pipe and you do not
                 want pcre2grep to buffer up large amounts of data.   However,
                 its  use  will  affect  performance,  and  the -M (multiline)
                 option ceases to work. When input is from a compressed .gz or
                 .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead  of  showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line,  and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length  are
                 separated  by  a  comma.  In  this mode, no context is shown.
                 That is, the -A, -B, and -C options are ignored. If there  is
                 more  than  one  match in a line, each of them is shown sepa‐
                 rately. This option  is  mutually  exclusive  with  --output,
                 --file-offsets, and --only-matching.

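                 The output shapes of --line-offsets and --file-offsets can
                 be illustrated with a Python sketch. It uses Python's re
                 module instead of PCRE2, counts characters rather than
                 bytes (the same thing for ASCII text), and assumes that
                 offsets start at zero and line numbers at one. Both
                 function names are hypothetical.

```python
import re


def line_offsets(pattern, text):
    """Report each match as "line:offset,length"."""
    output = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(pattern, line):
            if match.end() > match.start():  # skip empty matches
                length = match.end() - match.start()
                output.append(f"{number}:{match.start()},{length}")
    return output


def file_offsets(pattern, text):
    """Report each match as "offset,length" from the start of the text."""
    output = []
    base = 0  # offset of the current line within the whole text
    for line in text.splitlines(keepends=True):
        for match in re.finditer(pattern, line.rstrip("\n")):
            if match.end() > match.start():
                length = match.end() - match.start()
                output.append(f"{base + match.start()},{length}")
        base += len(line)
    return output
```
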
493       --locale=locale-name
494                 This  option specifies a locale to be used for pattern match‐
495                 ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi‐
496                 ronment  variables.  If  no  locale  is  specified, the PCRE2
497                 library's default (usually the "C" locale) is used. There  is
498                 no short form for this option.
499
500       --match-limit=number
501                 Processing  some  regular expression patterns may take a very
502                 long time to search for all possible matching strings. Others
503                 may  require  a  very large amount of memory. There are three
504                 options that set resource limits for matching.
505
506                 The --match-limit option provides a means of limiting comput‐
507                 ing  resource  usage  when  processing  patterns that are not
508                 going to match, but which have a very large number of  possi‐
509                 bilities in their search trees. The classic example is a pat‐
510                 tern that uses nested unlimited  repeats.  Internally,  PCRE2
511                 has  a  counter that is incremented each time around its main
512                 processing  loop.  If  the  value  set  by  --match-limit  is
513                 reached, an error occurs.
514
515                 The  --heap-limit  option specifies, as a number of kibibytes
516                 (units of 1024 bytes), the amount of heap memory that may  be
517                 used for matching. Heap memory is needed only if matching the
518                 pattern requires a significant number of nested  backtracking
519                 points to be remembered. This parameter can be set to zero to
520                 forbid the use of heap memory altogether.
521
522                 The --depth-limit option limits the  depth  of  nested  back‐
523                 tracking points, which indirectly limits the amount of memory
524                 that is used. The amount of memory needed for each backtrack‐
525                 ing  point  depends on the number of capturing parentheses in
526                 the pattern, so the amount of memory that is used before this
527                 limit  acts  varies from pattern to pattern. This limit is of
528                 use only if it is set smaller than --match-limit.
529
530                 There are no short forms for these options. The default  lim‐
531                 its  can  be  set when the PCRE2 library is compiled; if they
532                 are not specified, the defaults are very large and so  effec‐
533                 tively unlimited.
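
                    A minimal sketch of a low match limit in use is shown here.
                    It reuses the nested-repeat pattern (a+)*\d against an
                    invented line of thirty a's; the limit of 1000 is an
                    arbitrary value chosen for illustration, and the wording of
                    any error message depends on the build.

```shell
# Thirty a's and no digit: a subject the pattern cannot match.
subject=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

# Cap the matching effort; any error report goes to standard error.
status=0
printf '%s\n' "$subject" | pcre2grep --match-limit=1000 '(a+)*\d' || status=$?

echo "exit status: $status"
```

                    The status is non-zero in every case, because the line
                    contains no digit and so can never match.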
534
535       --max-buffer-size=number
536                 This  limits  the  expansion  of the processing buffer, whose
537                 initial size can be set by --buffer-size. The maximum  buffer
538                 size  is  silently  forced to be no smaller than the starting
539                 buffer size.
540
541       -M, --multiline
542                 Allow patterns to match more than one line. When this  option
543                 is set, the PCRE2 library is called in "multiline" mode. This
544                 allows a matched string to extend past the end of a line  and
545                 continue  on one or more subsequent lines. Patterns used with
546                 -M may usefully contain literal newline characters and inter‐
547                 nal  occurrences of ^ and $ characters. The output for a suc‐
548                 cessful match may consist of more than one  line.  The  first
549                 line  is  the  line  in which the match started, and the last
550                 line is the line in which the match  ended.  If  the  matched
551                 string  ends  with a newline sequence, the output ends at the
552                 end of that line.  If -v is set,  none  of  the  lines  in  a
553                 multi-line  match  are output. Once a match has been handled,
554                 scanning restarts at the beginning of the line after the  one
555                 in which the match ended.
556
557                 The  newline  sequence  that separates multiple lines must be
558                 matched as part of the pattern.  For  example,  to  find  the
559                 phrase  "regular  expression" in a file where "regular" might
560                 be at the end of a line and "expression" at the start of  the
561                 next line, you could use this command:
562
563                   pcre2grep -M 'regular\s+expression' <file>
564
565                 The  \s  escape  sequence  matches any white space character,
566                 including newlines, and is followed  by  +  so  as  to  match
567                 trailing  white  space  on the first line as well as possibly
568                 handling a two-character newline sequence.
569
570                 There is a limit to the number of lines that can be  matched,
571                 imposed  by  the way that pcre2grep buffers the input file as
572                 it scans it. With a  sufficiently  large  processing  buffer,
573                 this should not be a problem, but the -M option does not work
574                 when input is read line by line (see --line-buffered).
575
576       -N newline-type, --newline=newline-type
577                 The PCRE2 library supports  five  different  conventions  for
578                 indicating  the  ends of lines. They are the single-character
579                 sequences CR (carriage return) and LF  (linefeed),  the  two-
580                 character  sequence CRLF, an "anycrlf" convention, which rec‐
581                 ognizes any of the preceding three types, and an  "any"  con‐
582                 vention, in which any Unicode line ending sequence is assumed
583                 to end a line. The Unicode sequences are the three just  men‐
584                 tioned,  plus  VT  (vertical  tab,  U+000B),  FF  (form feed,
585                 U+000C),  NEL  (next  line,  U+0085),  LS  (line   separator,
586                 U+2028), and PS (paragraph separator, U+2029).
587
588                 When  the  PCRE2  library  is  built,  a  default line-ending
589                 sequence  is  specified.   This  is  normally  the   standard
590                 sequence for the operating system. Unless otherwise specified
591                 by this option, pcre2grep uses the  library's  default.   The
592                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
593                 ANY. This makes it possible to use pcre2grep  to  scan  files
594                 that have come from other environments without having to mod‐
595                 ify their line endings. If the data  that  is  being  scanned
596                 does  not  agree  with  the  convention  set  by this option,
597                 pcre2grep may behave in strange ways. Note that  this  option
598                 does  not apply to files specified by the -f, --exclude-from,
599                 or --include-from options, which  are  expected  to  use  the
600                 operating system's standard newline sequence.
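
                    The effect of choosing the right convention can be sketched
                    as follows, using invented input with CRLF line endings and
                    the -c option to count matching lines. With LF endings
                    assumed, each line still carries a trailing CR, so an
                    anchored pattern is not expected to match.

```shell
# Count lines matching ^two$ when CRLF is declared as the line ending.
crlf_count=$(printf 'one\r\ntwo\r\n' | pcre2grep -c -N CRLF '^two$')

# Same data, but LF assumed: the stray CR sits between "two" and the end.
# Exit status 1 here only means "no match", hence the "|| true".
lf_count=$(printf 'one\r\ntwo\r\n' | pcre2grep -c -N LF '^two$') || true

echo "CRLF: $crlf_count, LF: $lf_count"
```

                    This should report a count of 1 for CRLF and 0 for LF.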
601
602       -n, --line-number
603                 Precede each output line by its line number in the file, fol‐
604                 lowed by a colon for matching lines or a hyphen  for  context
605                 lines. If the file name is also being output, it precedes the
606                 line number. When the -M option causes  a  pattern  to  match
607                 more  than  one  line, only the first is preceded by its line
608                 number. This option is forced if --line-offsets is used.
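
                    A short sketch of -n combined with -M, using invented input
                    in which the phrase "regular expression" is split across
                    two lines:

```shell
# The match starts on line 1 and ends on line 2.
out=$(printf 'a regular\nexpression here\nunrelated\n' |
      pcre2grep -n -M 'regular\s+expression')

printf '%s\n' "$out"
```

                    Only the first line of the match carries a line number, so
                    the output should be "1:a regular" followed by "expression
                    here" on a line of its own.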
609
610       --no-jit  If the PCRE2 library is built with support  for  just-in-time
611                 compiling (which speeds up matching), pcre2grep automatically
612                 makes use of this, unless it was explicitly disabled at build
613                 time.  This  option  can be used to disable the use of JIT at
614                 run time. It is provided for testing and working round  prob‐
615                 lems.  It should never be needed in normal use.
616
617       -O text, --output=text
618                 When  there  is a match, instead of outputting the whole line
619                 that matched, output just the  given  text.  This  option  is
620                 mutually  exclusive with --only-matching, --file-offsets, and
621                 --line-offsets. Escape sequences starting with a dollar char‐
622                 acter  may be used to insert the contents of the matched part
623                 of the line and/or captured substrings into the text.
624
625                 $<digits> or ${<digits>} is replaced  by  the  captured  sub‐
626                 string  of  the  given  decimal  number; zero substitutes the
627                 whole match. If the number is greater than the number of cap‐
628                 turing  substrings,  or if the capture is unset, the replace‐
629                 ment is empty.
630
631                 $a is replaced by bell; $b by backspace; $e by escape; $f  by
632                 form  feed;  $n by newline; $r by carriage return; $t by tab;
633                 $v by vertical tab.
634
635                 $o<digits> is replaced by the character  represented  by  the
636                 given octal number; up to three digits are processed.
637
638                 $x<digits>  is  replaced  by the character represented by the
639                 given hexadecimal number; up to two digits are processed.
640
641                 A dollar followed by any other character is replaced by that
642                 character. In particular, $$ is replaced by a single dollar.
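
                    The substitutions above can be sketched with an invented
                    key=value line. The output text is single-quoted so that
                    the shell leaves the dollar characters for pcre2grep.

```shell
# Swap the two captured substrings around an arrow.
swapped=$(printf 'key=value\n' | pcre2grep -O '$2 <- $1' '(\w+)=(\w+)')
printf '%s\n' "$swapped"

# $0 stands for the whole match; $$ yields a literal dollar.
whole=$(printf 'key=value\n' | pcre2grep -O '$$ $0' '(\w+)=(\w+)')
printf '%s\n' "$whole"
```

                    The first command should print "value <- key" and the
                    second "$ key=value".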
643
644       -o, --only-matching
645                 Show only the part of the line that matched a pattern instead
646                 of the whole line. In this mode, no context  is  shown.  That
647                 is,  the -A, -B, and -C options are ignored. If there is more
648                 than one match in a line, each of them is  shown  separately,
649                 on  a  separate  line  of  output.  If -o is combined with -v
650                 (invert the sense of the match to find  non-matching  lines),
651                 no  output is generated, but the return code is set appropri‐
652                 ately. If the matched portion of the line is  empty,  nothing
653                 is  output  unless  the  file  name  or line number are being
654                 printed, in which case they are shown on an  otherwise  empty
655                 line.  This  option  is  mutually  exclusive  with  --output,
656                 --file-offsets and --line-offsets.
657
658       -onumber, --only-matching=number
659                 Show only the part of the line  that  matched  the  capturing
660                 parentheses of the given number. Up to 50 capturing parenthe‐
661                 ses are supported by default. This limit can be  changed  via
662                 the  --om-capture option. A pattern may contain any number of
663                 capturing parentheses, but only those whose number is  within
664                 the  limit can be accessed by -o. An error occurs if the num‐
665                 ber specified by -o is greater than the limit.
666
667                 -o0 is the same as -o without a number. Because these options
668                 can  be given without an argument (see above), if an argument
669                 is present, it must be given in  the  same  shell  item,  for
670                 example, -o3 or --only-matching=2. The comments given for the
671                 non-argument case above also apply to  this  option.  If  the
672                 specified  capturing parentheses do not exist in the pattern,
673                 or were not set in the match, nothing is  output  unless  the
674                 file name or line number are being output.
675
676                 If  this  option is given multiple times, multiple substrings
677                 are output for each match,  in  the  order  the  options  are
678                 given,  and  all on one line. For example, -o3 -o1 -o3 causes
679                 the substrings matched by capturing parentheses 3 and  1  and
680                 then  3 again to be output. By default, there is no separator
681                 (but see --om-separator below).
682
683       --om-capture=number
684                 Set the number of capturing parentheses that can be  accessed
685                 by -o. The default is 50.
686
687       --om-separator=text
688                 Specify  a  separating string for multiple occurrences of -o.
689                 The default is an empty string. Separating strings are  never
690                 coloured.
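
                    A sketch of several numbered -o options together with a
                    separator, on an invented line of three words. The comma is
                    an arbitrary choice of separating string.

```shell
# Output capture group 3, then group 1, separated by a comma.
pair=$(printf 'alpha beta gamma\n' |
       pcre2grep -o3 -o1 --om-separator=, '(\w+) (\w+) (\w+)')

printf '%s\n' "$pair"
```

                    The substrings appear in the order the options are given,
                    so this should print "gamma,alpha".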
691
692       -q, --quiet
693                 Work quietly, that is, display nothing except error messages.
694                 The exit status indicates whether or  not  any  matches  were
695                 found.
696
697       -r, --recursive
698                 If  any given path is a directory, recursively scan the files
699                 it contains, taking note of any --include and --exclude  set‐
700                 tings.  By  default, a directory is read as a normal file; in
701                 some operating systems this gives an  immediate  end-of-file.
702                 This  option  is  a  shorthand  for  setting the -d option to
703                 "recurse".
704
705       --recursion-limit=number
706                 A synonym for --depth-limit. See --match-limit above.
707
708       -s, --no-messages
709                 Suppress error  messages  about  non-existent  or  unreadable
710                 files.  Such  files  are quietly skipped. However, the return
711                 code is still 2, even if matches were found in other files.
712
713       -t, --total-count
714                 This option is useful when scanning more than  one  file.  If
715                 used  on its own, -t suppresses all output except for a grand
716                 total number of matching lines (or non-matching lines  if  -v
717                 is  used)  in  all  the files. If -t is used with -c, a grand
718                 total is output except when the previous output is  just  one
719                 line.  In  other words, it is not output when just one file's
720                 count is listed. If file names are being  output,  the  grand
721                 total  is preceded by "TOTAL:". Otherwise, it appears as just
722                 another number. The -t option is ignored when  used  with  -L
723                 (list  files  without matches), because the grand total would
724                 always be zero.
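
                    The two uses of -t can be sketched with a pair of small
                    files. The directory comes from mktemp and the file names
                    a.txt and b.txt are invented for the example.

```shell
# Two scratch files containing one and two matching lines respectively.
dir=$(mktemp -d)
printf 'foo\nbar\n' > "$dir/a.txt"
printf 'foo\nfoo\n' > "$dir/b.txt"

# On its own, -t prints only the grand total of matching lines.
total=$(pcre2grep -t 'foo' "$dir/a.txt" "$dir/b.txt")
echo "$total"

# With -c, per-file counts are listed first, then the grand total.
pcre2grep -c -t 'foo' "$dir/a.txt" "$dir/b.txt"

rm -rf "$dir"
```

                    The first command should print just 3; the second lists a
                    count for each file before the total.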
725
726       -u, --utf-8
727                 Operate in UTF-8 mode. This option is available only if PCRE2
728                 has been compiled with UTF-8 support. All patterns (including
729                 those for any --exclude and --include options) and  all  sub‐
730                 ject  lines  that  are scanned must be valid strings of UTF-8
731                 characters.
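
                    One way to see the difference UTF-8 mode makes is to
                    compare match lengths with --line-offsets. This sketch
                    assumes a PCRE2 build with UTF-8 support; the input is the
                    word "caf" followed by an e-acute encoded as the two bytes
                    0xC3 0xA9.

```shell
# Without -u the dot matches one byte, so the match is 4 bytes long.
bytes=$(printf 'caf\303\251\n' | pcre2grep --line-offsets 'caf.')

# With -u the dot matches one whole character, here 2 bytes, giving 5.
chars=$(printf 'caf\303\251\n' | pcre2grep -u --line-offsets 'caf.')

echo "byte mode: $bytes, UTF-8 mode: $chars"
```

                    The reported lengths should be 4 and 5 respectively, since
                    offsets and lengths are counted in bytes.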
732
733       -V, --version
734                 Write the version numbers of pcre2grep and the PCRE2  library
735                 to  the  standard  output and then exit. Anything else on the
736                 command line is ignored.
737
738       -v, --invert-match
739                 Invert the sense of the match, so that  lines  which  do  not
740                 match any of the patterns are the ones that are found.
741
742       -w, --word-regex, --word-regexp
743                 Force the patterns only to match "words". That is, there must
744                 be a word boundary at the  start  and  end  of  each  matched
745                 string.  This is equivalent to having "\b(?:" at the start of
746                 each pattern, and ")\b" at the end. This option applies  only
747                 to  the  patterns  that  are  matched against the contents of
748                 files; it does not apply to patterns specified by any of  the
749                 --include or --exclude options.
750
751       -x, --line-regex, --line-regexp
752                 Force  the  patterns to start matching only at the beginnings
753                 of lines, and in  addition,  require  them  to  match  entire
754                 lines. In multiline mode the match may be more than one line.
755                 This is equivalent to having "^(?:" at the start of each pat‐
756                 tern  and  ")$"  at  the end. This option applies only to the
757                 patterns that are matched against the contents of  files;  it
758                 does  not apply to patterns specified by any of the --include
759                 or --exclude options.
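
                    The difference between -w and -x can be sketched on three
                    invented input lines:

```shell
# -w: "cat" must be delimited by word boundaries, anywhere in the line.
words=$(printf 'cat\ncatalog\nthe cat\n' | pcre2grep -w 'cat')
printf '%s\n' "$words"

# -x: the pattern must match the entire line.
lines=$(printf 'cat\ncatalog\nthe cat\n' | pcre2grep -x 'cat')
printf '%s\n' "$lines"
```

                    With -w the lines "cat" and "the cat" are selected; with -x
                    only "cat" is. The line "catalog" is rejected in both
                    cases.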
760

ENVIRONMENT VARIABLES

762
763       The environment variables LC_ALL and LC_CTYPE  are  examined,  in  that
764       order,  for  a  locale.  The first one that is set is used. This can be
765       overridden by the --locale option. If  no  locale  is  set,  the  PCRE2
766       library's default (usually the "C" locale) is used.
767

NEWLINES

769
770       The -N (--newline) option allows pcre2grep to scan files with different
771       newline conventions from the default. Any parts of the input files that
772       are  written  to the standard output are copied identically, with what‐
773       ever newline sequences they have in the input. However, the setting  of
774       this  option  affects only the way scanned files are processed. It does
775       not affect the interpretation of files specified  by  the  -f,  --file-
776       list, --exclude-from, or --include-from options, nor does it affect the
777       way in which pcre2grep writes informational messages  to  the  standard
778       error and output streams. For these it uses the string "\n" to indicate
779       newlines, relying on the C I/O library to convert this to an  appropri‐
780       ate sequence.
781

OPTIONS COMPATIBILITY

783
784       Many of the short and long forms of pcre2grep's options are the same as
785       in the GNU grep program. Any long option of the form --xxx-regexp  (GNU
786       terminology) is also available as --xxx-regex (PCRE2 terminology). How‐
787       ever, the  --depth-limit,  --file-list,  --file-offsets,  --heap-limit,
788       --include-dir,  --line-offsets,  --locale,  --match-limit, -M, --multi‐
789       line, -N, --newline, --om-separator, --output, -u, and --utf-8  options
790       are  specific to pcre2grep, as is the use of the --only-matching option
791       with a capturing parentheses number.
792
793       Although most of the common options work the same way, a few  are  dif‐
794       ferent  in pcre2grep. For example, the --include option's argument is a
795       glob for GNU grep, but a regular expression for pcre2grep. If both  the
796       -c  and  -l  options are given, GNU grep lists only file names, without
797       counts, but pcre2grep gives the counts as well.
798

OPTIONS WITH DATA

800
801       There are four different ways in which an option with data can be spec‐
802       ified.   If  a  short  form option is used, the data may follow immedi‐
803       ately, or (with one exception) in the next command line item. For exam‐
804       ple:
805
806         -f/some/file
807         -f /some/file
808
809       The  exception is the -o option, which may appear with or without data.
810       Because of this, if data is present, it must follow immediately in  the
811       same item, for example -o3.
812
813       If  a long form option is used, the data may appear in the same command
814       line item, separated by an equals character, or (with  two  exceptions)
815       it may appear in the next command line item. For example:
816
817         --file=/some/file
818         --file /some/file
819
820       Note,  however, that if you want to supply a file name beginning with ~
821       as data in a shell command, and have the  shell  expand  ~  to  a  home
822       directory, you must separate the file name from the option, because the
823       shell does not treat ~ specially unless it is at the start of an item.
824
825       The exceptions to the above are the --colour (or --color)  and  --only-
826       matching  options,  for  which  the  data  is optional. If one of these
827       options does have data, it must be given in the first  form,  using  an
828       equals character. Otherwise pcre2grep will assume that it has no data.
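
          The four forms can be tried with the -e option, whose long form is
          --regex. The sketch below uses invented two-line input; every
          variant should select the same line.

```shell
input='foo
bar'

# Short form: data immediately following, or in the next item.
a=$(printf '%s\n' "$input" | pcre2grep -efoo)
b=$(printf '%s\n' "$input" | pcre2grep -e foo)

# Long form: data after an equals character, or in the next item.
c=$(printf '%s\n' "$input" | pcre2grep --regex=foo)
d=$(printf '%s\n' "$input" | pcre2grep --regex foo)

echo "$a $b $c $d"
```

          By contrast, -o and --only-matching accept data only in the same
          item, as in -o3 or --only-matching=3.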
829

USING PCRE2'S CALLOUT FACILITY

831
832       pcre2grep  has,  by  default,  support for calling external programs or
833       scripts or echoing specific strings during matching by  making  use  of
834       PCRE2's  callout  facility.  However, this support can be completely or
835       partially disabled when pcre2grep is built. You can  find  out  whether
836       your  binary  has  support  for  callouts by running it with the --help
837       option. If callout support is completely disabled, all callouts in pat‐
838       terns are ignored by pcre2grep.  If the facility is partially disabled,
839       calling external programs is not supported, and callouts  that  request
840       it are ignored.
841
842       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu‐
843       ment is either a number or a quoted string (see the pcre2callout  docu‐
844       mentation  for  details).  Numbered  callouts are ignored by pcre2grep;
845       only callouts with string arguments are useful.
846
847   Calling external programs or scripts
848
849       This facility can be independently disabled when pcre2grep is built. It
850       is  supported for Windows, where a call to _spawnvp() is used, for VMS,
851       where lib$spawn() is used, and  for  any  other  Unix-like  environment
852       where fork() and execv() are available.
853
854       If the callout string does not start with a pipe (vertical bar) charac‐
855       ter, it is parsed into a list of substrings separated by  pipe  charac‐
856       ters.  The first substring must be an executable name, with the follow‐
857       ing substrings specifying arguments:
858
859         executable_name|arg1|arg2|...
860
861       Any substring  (including  the  executable  name)  may  contain  escape
862       sequences  started  by  a dollar character: $<digits> or ${<digits>} is
863       replaced by the captured substring of the given decimal  number,  which
864       must  be greater than zero. If the number is greater than the number of
865       capturing substrings, or if the capture is unset,  the  replacement  is
866       empty.
867
868       A dollar followed by any other character is replaced by that
869       character. In particular, $$ is replaced by a single dollar and $| is
870       replaced by a pipe character. Here is an example:
871
872         echo -e "abcde\n12345" | pcre2grep \
873           '(?x)(.)(..(.))
874           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
875
876         Output:
877
878           Arg1: [a] [bcd] [d] Arg2: |a| ()
879           abcde
880           Arg1: [1] [234] [4] Arg2: |1| ()
881           12345
882
883       The  parameters  for the system call that is used to run the program or
884       script are zero-terminated strings. This means that binary zero charac‐
885       ters  in the callout argument will cause premature termination of their
886       substrings, and therefore should not be present. Any syntax  errors  in
887       the  string  (for  example, a dollar not followed by another character)
888       cause the callout to be ignored. If running the program fails  for  any
889       reason  (including the non-existence of the executable), a local match‐
890       ing failure occurs and the matcher backtracks in the normal way.
891
892   Echoing a specific string
893
894       This facility is always available, provided that callouts were not com‐
895       pletely disabled when pcre2grep was built. If the callout string starts
896       with a pipe (vertical bar) character, the rest of the string is written
897       to the output, having been passed through the same escape processing as
898       text from the --output option. This provides a simple echoing  facility
899       that  avoids  calling  an  external program or script. No terminator is
900       added to the string, so if you want a  newline,  you  must  include  it
901       explicitly.  Matching continues normally after the string is output. If
902       you want to see only the callout output but  not  any  output  from  an
903       actual match, you should end the relevant pattern with (*FAIL).
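
          A minimal sketch of the echoing form, assuming a binary in which
          callouts have not been disabled. The input line and the text
          "found" are invented; $n supplies the newline and (*FAIL)
          suppresses the normal output of the matching line.

```shell
# The callout runs after group 1 has captured "b"; the match then fails.
# Exit status 1 only reflects the forced failure, hence the "|| true".
echoed=$(printf 'abc\n' | pcre2grep '(b)(?C"|found $1$n")(*FAIL)') || true

printf '%s\n' "$echoed"
```

          Because the pattern can reach the callout at only one position in
          the line, this should print "found b" once and nothing else.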
904

MATCHING ERRORS

906
907       It  is  possible  to supply a regular expression that takes a very long
908       time to fail to match certain lines.  Such  patterns  normally  involve
909       nested  indefinite repeats, for example: (a+)*\d when matched against a
910       line of a's with no final digit. The  PCRE2  matching  function  has  a
911       resource  limit that causes it to abort in these circumstances. If this
912       happens, pcre2grep outputs an error message and the  line  that  caused
913       the  problem  to  the  standard error stream. If there are more than 20
914       such errors, pcre2grep gives up.
915
916       The --match-limit option of pcre2grep can be used to  set  the  overall
917       resource  limit.  There are also other limits that affect the amount of
918       memory used during matching; see the  discussion  of  --heap-limit  and
919       --depth-limit above.
920

DIAGNOSTICS

922
923       Exit status is 0 if any matches were found, 1 if no matches were found,
924       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
925       files  (even if matches were found in other files) or too many matching
926       errors. Using the -s option to suppress error messages about inaccessi‐
927       ble files does not affect the return code.
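
          The three exit statuses can be observed with a sketch such as the
          following. The path /nonexistent/file is hypothetical and is
          assumed not to exist.

```shell
# 0: at least one match was found (-q suppresses normal output).
found=0
printf 'hello\n' | pcre2grep -q 'hello' || found=$?

# 1: no match was found.
missing=0
printf 'hello\n' | pcre2grep -q 'absent' || missing=$?

# 2: a file could not be read; -s hides the message but not the status.
nofile=0
pcre2grep -s 'hello' /nonexistent/file || nofile=$?

echo "$found $missing $nofile"
```

          This should print "0 1 2".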
928
929       When   run  under  VMS,  the  return  code  is  placed  in  the  symbol
930       PCRE2GREP_RC because VMS  does  not  distinguish  between  exit(0)  and
931       exit(1).
932

SEE ALSO

934
935       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).
936

AUTHOR

938
939       Philip Hazel
940       University Computing Service
941       Cambridge, England.
942

REVISION

944
945       Last updated: 24 November 2018
946       Copyright (c) 1997-2018 University of Cambridge.
947
948
949
950PCRE2 10.33                    24 November 2018                   PCRE2GREP(1)