PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)

NAME

       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS

       pcre2grep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcre2grep  searches  files  for  character patterns, in the same way as
       other grep commands do,  but  it  uses  the  PCRE2  regular  expression
       library  to  support  patterns  that  are  compatible  with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of  pattern  syntax,  or  pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a  separate  file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they  are  interpreted  as
       part  of  the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are  interpreted  by  the  shell,  and
       indeed  quotes  are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated  as  the
       single  pattern  to be matched when neither -e nor -f is present.  Con‐
       versely, when one or both of these options are  used  to  specify  pat‐
       terns, all arguments are treated as path names. At least one of -e, -f,
       or an argument pattern must be provided.

       If no files are specified, pcre2grep  reads  the  standard  input.  The
       standard  input can also be referenced by a name consisting of a single
       hyphen.  For example:

         pcre2grep some-pattern file1 - file3

       Input files are searched line by  line.  By  default,  each  line  that
       matches  a  pattern  is  copied to the standard output, and if there is
       more than one file, the file name is output at the start of each  line,
       followed  by  a  colon.  However, there are options that can change how
       pcre2grep behaves. In particular, the -M option makes  it  possible  to
       search  for  strings  that  span  line  boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.
       
       The amount of memory used for buffering files that are being scanned is
       controlled  by  parameters  that  can  be  set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size  of  buffer
       that  is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be  needed;  this  is  handled  by
       automatically extending the buffer, up to the limit specified by --max-
       buffer-size. The default values for these parameters can  be  set  when
       pcre2grep  is  built;  if nothing is specified, the defaults are set to
       20KiB and 1MiB respectively. An error occurs if a line is too long  and
       the buffer can no longer be expanded.

       The  block  of  memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size  is too small, fewer than requested "before" and "after" lines may
       be output.

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever  is  the
       greater.   BUFSIZ  is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to  each  line  in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further  patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or  --line-
       offsets  is  used  to  output  only  the  part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following  the  match,  so that further matches on the same line can be
       found. If there are multiple  patterns,  they  are  all  tried  on  the
       remainder  of  the  line, but patterns that follow the one that matched
       are not tried on the earlier matched part of the line.

       This behaviour means that the order  in  which  multiple  patterns  are
       specified  can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages  to
       display  earlier  matches  for  later  patterns (as long as there is no
       overlap).
       
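       The ordering rule described above can be modelled with a short sketch.
       It is an illustration only, not pcre2grep's code: Python's re module
       stands in for PCRE2, the function name matched_parts is hypothetical,
       and a pattern whose only match would be empty is simply skipped. It
       shows why two separate patterns X and Y can give different output from
       the single pattern X|Y when only the matched parts are displayed.

```python
import re


def matched_parts(patterns, line):
    """Model the scan used when only the matched parts of a line are shown.

    The patterns are tried in the order given on the remainder of the line.
    The first pattern that matches anywhere in that remainder wins, and
    scanning resumes immediately after its match.
    """
    compiled = [re.compile(p) for p in patterns]
    parts = []
    pos = 0
    while pos <= len(line):
        for regex in compiled:
            m = regex.search(line, pos)
            # Simplification: an empty match is treated as "no match".
            if m and m.end() > m.start():
                parts.append(m.group())
                pos = m.end()
                break
        else:
            break  # no pattern matched the remainder of the line
    return parts


# Two patterns, X first: X is found even though it follows Y, and Y is
# never tried on the part of the line before the X match.
print(matched_parts(["X", "Y"], "aYbXc"))  # ['X']

# One pattern with alternatives: matches are found left to right.
print(matched_parts(["X|Y"], "aYbXc"))     # ['Y', 'X']
```
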
       Patterns that can match an empty string are accepted, but empty  string
       matches   are   never   recognized.   An   example   is   the   pattern
       "(super)?(man)?", in which all components are  optional.  This  pattern
       finds  all  occurrences  of  both "super" and "man"; the output differs
       from matching with "super|man" when only the  matching  substrings  are
       being shown.

       If  the  LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library.  The --locale
       option can be used to override this.

SUPPORT FOR COMPRESSED FILES

       It  is  possible to compile pcre2grep so that it uses libz or libbz2 to
       read compressed files whose names end in .gz or .bz2, respectively. You
       can  find out whether your pcre2grep binary has support for one or both
       of these file types by running it with the --help option. If the appro‐
       priate support is not present, all files are treated as plain text. The
       standard input is always so treated. When input is  from  a  compressed
       .gz or .bz2 file, the --line-buffered option is ignored.

BINARY FILES

       By  default,  a  file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed  specially.
       However,  if  the  newline  type is specified as NUL, that is, the line
       terminator is a binary zero, the test for a binary file is not applied.
       See  the  --binary-files  option for a means of changing the way binary
       files are handled.
       
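       The test described above is small enough to sketch directly. The
       function name looks_binary and its newline_is_nul parameter are
       hypothetical; the sketch models the documented rule and is not taken
       from the pcre2grep source.

```python
def looks_binary(data: bytes, newline_is_nul: bool = False) -> bool:
    """Return True if data would be classed as binary by the documented rule.

    A binary zero within the first 1024 bytes marks the file as binary,
    unless the newline type is NUL, in which case the test is not applied.
    """
    if newline_is_nul:
        return False
    return b"\x00" in data[:1024]


print(looks_binary(b"plain text\n"))            # False
print(looks_binary(b"abc\x00def"))              # True
print(looks_binary(b"a" * 1024 + b"\x00"))      # False: zero is past byte 1024
```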

BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that  are  terminated
       by  a  binary zero, so cannot contain internal zeros. However, patterns
       that are read from a file via the -f option may contain binary zeros.

OPTIONS

       The order in which some of the options appear can  affect  the  output.
       For  example,  both  the  -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the  one  that
       takes  effect.  Similarly,  except  where  noted below, if an option is
       given twice, the later setting is used. Numerical  values  for  options
       may  be  followed  by  K  or  M,  to  signify multiplication by 1024 or
       1024*1024 respectively.
       
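       As a worked example of the K and M suffixes, the following sketch
       converts an option value to a number. The function name parse_number
       is hypothetical, and only the upper-case suffixes named above are
       handled. The last line applies the "three times the buffer size" rule
       from the DESCRIPTION section to the default 20KiB buffer.

```python
def parse_number(text: str) -> int:
    """Convert a numerical option value such as '20K' or '1M' to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


print(parse_number("100"))      # 100
print(parse_number("20K"))      # 20480
print(parse_number("1M"))       # 1048576

# Memory block actually used for a 20K buffer: three times the buffer size.
print(3 * parse_number("20K"))  # 61440
```
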
       --        This terminates the list of options. It is useful if the next
                 item  on  the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns  and  file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output  up  to  number  lines  of context after each matching
                 line. Fewer lines are output if the next match or the end  of
                 the  file  is  reached,  or if the processing buffer size has
                 been set too small. If file names  and/or  line  numbers  are
                 being  output,  a hyphen separator is used instead of a colon
                 for the context lines.  A  line  containing  "--"  is  output
                 between each group of lines, unless they are in fact contigu‐
                 ous in the input file. The value of number is expected to  be
                 relatively small. When -c is used, -A is ignored.

       -a, --text
                 Treat  binary  files as text. This is equivalent to --binary-
                 files=text.

       -B number, --before-context=number
                 Output up to number lines of  context  before  each  matching
                 line.  Fewer  lines  are  output if the previous match or the
                 start of the file is within number lines, or if the  process‐
                 ing  buffer size has been set too small. If file names and/or
                 line numbers are being output, a  hyphen  separator  is  used
                 instead  of  a colon for the context lines. A line containing
                 "--" is output between each group of lines, unless  they  are
                 in  fact contiguous in the input file. The value of number is
                 expected to be relatively small.  When  -c  is  used,  -B  is
                 ignored.

       --binary-files=word
                 Specify  how binary files are to be processed. If the word is
                 "binary" (the default),  pattern  matching  is  performed  on
                 binary  files,  but  the  only  output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text",  which
                 is  equivalent  to  the -a or --text option, binary files are
                 processed in the same way as any other file.  In  this  case,
                 when  a  match  succeeds,  the  output may be binary garbage,
                 which can have nasty effects if sent to a  terminal.  If  the
                 word  is  "without-match",  which  is  equivalent  to  the -I
                 option, binary files are  not  processed  at  all;  they  are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory  is  obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.
       
       -C number, --context=number
                 Output number lines of context both  before  and  after  each
                 matching  line.  This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output lines from the files that  are  being  scanned;
                 instead  output  the  number  of  lines  that would have been
                 shown, either because they matched, or, if -v is set, because
                 they  failed  to match. By default, this count is exactly the
                 same as the number of lines that would have been output,  but
                 if  the -M (multiline) option is used (without -v), there may
                 be more suppressed lines than the count (that is, the  number
                 of matches).

                 If  no lines are selected, the number zero is output. If sev‐
                 eral files are being scanned, a count is output for each
                 of  them and the -t option can be used to cause a total to be
                 output at  the  end.  However,  if  the  --files-with-matches
                 option  is  also  used,  only  those  files  whose counts are
                 greater than zero are listed. When -c is used,  the  -A,  -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto".  If data is required, it must  be  given  in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By  default,  the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto".  In
                 the last case, colouring happens only if the standard out‐
                 put is connected to a terminal. More resources are used  when
                 colouring is enabled, because pcre2grep has to search for all
                 possible matches in a line, not just one, in order to  colour
                 them all.

                 The  colour  that  is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR,  PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order.  If  none  of  these  are  set,  pcre2grep  looks  for
                 GREP_COLORS  or  GREP_COLOR (in that order). The value of the
                 variable should be a string of two numbers,  separated  by  a
                 semicolon,  except  in  the  case  of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours,  terminated  by the end of the string or by a colon.
                 If GREP_COLORS does not start  with  "ms="  or  "mt="  it  is
                 ignored, and GREP_COLOR is checked.

                 If  the  string obtained from one of the above variables con‐
                 tains any characters other than semicolon or digits, the set‐
                 ting is ignored and the default colour is used. The string is
                 copied directly into the control string for setting colour on
                 a  terminal,  so it is your responsibility to ensure that the
                 values make sense. If no  relevant  environment  variable  is
                 set, the default is "1;31", which gives red.
       
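       The lookup order for the colour variables can be summarised in a
       sketch. This is a model of the rules as documented above, not
       pcre2grep's implementation: the function name pick_colour is
       hypothetical, the environment is passed in as a dictionary, and
       treating an empty value as invalid is an assumption.

```python
DEFAULT_COLOUR = "1;31"
DIRECT_VARS = (
    "PCRE2GREP_COLOUR",
    "PCRE2GREP_COLOR",
    "PCREGREP_COLOUR",
    "PCREGREP_COLOR",
)


def pick_colour(env):
    """Return the colour string chosen from a dict of environment variables."""
    value = None
    # The four pcre2grep/pcregrep variables are checked first, in order.
    for name in DIRECT_VARS:
        if name in env:
            value = env[name]
            break
    if value is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors.startswith(("ms=", "mt=")):
            # Take the text after "ms=" or "mt=", up to a colon or the end.
            value = grep_colors[3:].split(":", 1)[0]
        elif "GREP_COLOR" in env:
            # GREP_COLORS is absent or ignored, so GREP_COLOR is checked.
            value = env["GREP_COLOR"]
    if value is None:
        return DEFAULT_COLOUR
    # Anything other than digits and semicolons means the default is used.
    if value and all(ch in "0123456789;" for ch in value):
        return value
    return DEFAULT_COLOUR


print(pick_colour({}))                                            # 1;31
print(pick_colour({"GREP_COLORS": "ms=01;33:mc=01;31"}))          # 01;33
print(pick_colour({"GREP_COLORS": "sl=1", "GREP_COLOR": "1;34"})) # 1;34
```
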
       -D action, --devices=action
                 If  an  input  path  is  not  a  regular file or a directory,
                 "action" specifies how it is to be  processed.  Valid  values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed.  Valid values are  "read"  (the  default  in
                 non-Windows  environments,  for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip"  (silently
                 skip  the  path, the default in Windows environments). In the
                 "read" case, directories are read as if  they  were  ordinary
                 files.  In  some  operating  systems  the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used mul‐
                 tiple times in order to specify several patterns. It can also
                 be  used  as a way of specifying a single pattern that starts
                 with a hyphen. When -e is used, no argument pattern is  taken
                 from  the  command  line;  all  arguments are treated as file
                 names. There is no limit to the number of patterns. They  are
                 applied  to  each line in the order in which they are defined
                 until one matches.

                 If -f is used with -e, the command line patterns are  matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified. Note  that
                 multiple  use  of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line  that  is  X or Y, whereas if the two patterns are given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no X in the line. This matters only if you are  using  -o  or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to  all  files,
                 whether  listed  on  the  command line, obtained from --file-
                 list, or by scanning a directory. The pattern is a PCRE2 reg‐
                 ular  expression,  and is matched against the final component
                 of the file name, not the entire path. The  -F,  -w,  and  -x
                 options do not apply to this pattern. The option may be given
                 any number of times in order to specify multiple patterns. If
                 a  file  name matches both an --include and an --exclude pat‐
                 tern, it is excluded. There is no short form for this option.
       
       --exclude-from=filename
                 Treat each non-empty line of the file  as  the  data  for  an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline  option
                 has  no  effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being  processed,  whatever  the  setting  of the --recursive
                 option. This applies to all directories,  whether  listed  on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular  expression,
                 and  is  matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do  not
                 apply  to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a  direc‐
                 tory  matches  both  --include-dir  and  --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching  pattern  as  a  list  of  fixed
                 strings,  separated  by  newlines,  instead  of  as a regular
                 expression. What constitutes a newline for  this  purpose  is
                 controlled  by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F.   They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present).  This  option applies only to the patterns that are
                 matched against the contents of files; it does not  apply  to
                 patterns  specified  by  any  of  the  --include or --exclude
                 options.

       -f filename, --file=filename
                 Read patterns from the file, one per  line,  and  match  them
                 against  each  line of input. As is the case with patterns on
                 the command line, no delimiters should be used. What  consti‐
                 tutes  a  newline when reading the file is the operating sys‐
                 tem's default interpretation of \n. The --newline option  has
                 no  effect  on  this  option. Trailing white space is removed
                 from each line, and blank lines are ignored.  An  empty  file
                 contains  no patterns and therefore matches nothing. Patterns
                 read from a file in this way may contain binary zeros,  which
                 are  treated  as  ordinary data characters. See also the com‐
                 ments about multiple patterns versus a  single  pattern  with
                 alternatives in the description of -e above.

                 If  this  option  is  given more than once, all the specified
                 files are read. A data line is output if any of the  patterns
                 match  it.  A  file  name can be given as "-" to refer to the
                 standard input. When -f is used, patterns  specified  on  the
                 command  line  using  -e may also be present; they are tested
                 before the file's patterns.  However,  no  other  pattern  is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.
       
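       The way lines of a pattern file are turned into patterns, as described
       for -f above, can be sketched as follows. The function name
       read_patterns is hypothetical, and the exact set of characters counted
       as white space is an assumption. Leading white space is kept, because
       only trailing white space is documented as being removed.

```python
def read_patterns(lines):
    """Return the patterns found in an iterable of lines from a pattern file."""
    patterns = []
    for line in lines:
        # Remove the line terminator and any trailing white space.
        stripped = line.rstrip(" \t\r\n\f\v")
        # Blank lines carry no pattern and are ignored.
        if stripped:
            patterns.append(stripped)
    return patterns


print(read_patterns(["foo  \n", "\n", "  bar\t\n"]))  # ['foo', '  bar']
print(read_patterns([]))                              # []
```
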
       --file-list=filename
                 Read a list of  files  and/or  directories  that  are  to  be
                 scanned from the given file, one per line. What constitutes a
                 newline when reading  the  file  is  the  operating  system's
                 default.  Trailing white space is removed from each line, and
                 blank lines are ignored. These paths are processed before any
                 that  are  listed  on  the command line. The file name can be
                 given as "-" to refer to the standard input.  If  --file  and
                 --file-list  are  both  specified  as  "-", patterns are read
                 first. This is useful only when the standard input is a  ter‐
                 minal,  from  which  further lines (the list of files) can be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead  of  showing lines or parts of lines that match, show
                 each match as an offset from the start  of  the  file  and  a
                 length,  separated  by  a  comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options  are  ignored.  If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with  --output,
                 --line-offsets, and --only-matching.

       -H, --with-filename
                 Force  the  inclusion of the file name at the start of output
                 lines when searching a single file. By default, the file name
                 is not shown in this case.  For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is  used.  If  a line number is also being output, it follows
                 the file name. When the -M option causes a pattern  to  match
                 more  than  one  line, only the first is preceded by the file
                 name. This option  overrides  any  previous  -h,  -l,  or  -L
                 options.

       -h, --no-filename
                 Suppress the output of file names when searching multiple
                 files. By default, file names are shown when  multiple  files
                 are searched. For matching lines, the file name  is  followed
                 by a colon; for context lines, a hyphen separator is used. If
                 a line number is also being output, it follows the file name.
                 This option overrides any previous -H, -L, or -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output a help message, giving brief details  of  the  command
                 options  and  file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore  binary  files.  This  is  equivalent   to   --binary-
                 files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.
       
       --include=pattern
                 If  any --include patterns are specified, the only files that
                 are processed are those whose names match one of the patterns
                 and  do  not match an --exclude pattern. This option does not
                 affect directories, but it  applies  to  all  files,  whether
                 listed  on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular  expres‐
                 sion,  and is matched against the final component of the file
                 name, not the entire path. The -F, -w, and -x options do  not
                 apply  to this pattern. The option may be given any number of
                 times. If a file  name  matches  both  an  --include  and  an
                 --exclude  pattern,  it  is excluded.  There is no short form
                 for this option.
       
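       The interaction between --include and --exclude can be expressed as a
       small decision function. This is a sketch under stated assumptions:
       the function name file_is_processed is hypothetical, Python's re
       module stands in for PCRE2, and the patterns are assumed to be
       unanchored (a match anywhere in the final path component counts).

```python
import os
import re


def file_is_processed(path, includes=(), excludes=()):
    """Decide whether a file would be scanned, per the documented rules."""
    # Patterns are matched against the final component, not the whole path.
    name = os.path.basename(path)
    # A name that matches any --exclude pattern is always excluded.
    if any(re.search(p, name) for p in excludes):
        return False
    # If --include patterns exist, the name must match one of them.
    if includes:
        return any(re.search(p, name) for p in includes)
    return True


print(file_is_processed("src/main.c", includes=[r"\.c$"]))        # True
print(file_is_processed("src/main.c", includes=[r"\.c$"],
                        excludes=[r"^main"]))                     # False
print(file_is_processed("docs/readme.txt", includes=[r"\.c$"]))   # False
```
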
       --include-from=filename
                 Treat each non-empty line of the file  as  the  data  for  an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline  option  has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only  direc‐
                 tories  that are processed are those whose names match one of
                 the patterns and do not match an --exclude-dir pattern.  This
                 applies  to  all  directories,  whether listed on the command
                 line, obtained from --file-list,  or  by  scanning  a  parent
                 directory.  The pattern is a PCRE2 regular expression, and is
                 matched against the final component of  the  directory  name,
                 not  the entire path. The -F, -w, and -x options do not apply
                 to this pattern. The option may be given any number of times.
                 If  a directory matches both --include-dir and --exclude-dir,
                 it is excluded. There is no short form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just  output  the
                 names  of  the files that do not contain any lines that would
                 have been output. Each file name is output once, on  a  sepa‐
                 rate  line.  This option overrides any previous -H, -h, or -l
                 options.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just  output  the
                 names of the files containing lines that would have been out‐
                 put. Each file name is  output  once,  on  a  separate  line.
                 Searching  normally stops as soon as a matching line is found
                 in a file. However, if the -c (count) option  is  also  used,
                 matching  continues in order to obtain the correct count, and
                 those files that have at least one  match  are  listed  along
                 with their counts. Using this option with -c is a way of sup‐
                 pressing the listing of files with  no  matches  that  occurs
                 with  -c  on  its own. This option overrides any previous -H,
                 -h, or -L options.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, non-compressed input is  read  and
                 processed  line by line, and the output is flushed after each
                 write. By default, input is  read  in  large  chunks,  unless
                 pcre2grep  can  determine that it is reading from a terminal,
                 which is currently possible only in Unix-like environments or
                 Windows. Output to a terminal is normally  flushed  automati‐
                 cally by the operating system. This option can be useful when
                 the input or output is attached to a pipe and you do not want
                 pcre2grep to buffer up large amounts of data.   However,  its
                 use  will  affect  performance, and the -M (multiline) option
                 ceases to work. When input is from a compressed .gz  or  .bz2
                 file, --line-buffered is ignored.
       
468       --line-buffered
469                 When this option is given, non-compressed input is  read  and
470                 processed  line by line, and the output is flushed after each
471                 write. By default, input is  read  in  large  chunks,  unless
472                 pcre2grep  can  determine that it is reading from a terminal,
473                 which is currently possible only in Unix-like environments or
474                 Windows. Output to terminal is normally automatically flushed
475                 by the operating system. This option can be useful  when  the
476                 input  or  output  is  attached to a pipe and you do not want
477                 pcre2grep to buffer up large amounts of data.   However,  its
478                 use  will  affect  performance, and the -M (multiline) option
479                 ceases to work. When input is from a compressed .gz  or  .bz2
480                 file, --line-buffered is ignored.
481
482       --line-offsets
483                 Instead  of  showing lines or parts of lines that match, show
484                 each match as a line number, the offset from the start of the
485                 line,  and a length. The line number is terminated by a colon
486                 (as usual; see the -n option), and the offset and length  are
487                 separated  by  a  comma.  In  this mode, no context is shown.
488                 That is, the -A, -B, and -C options are ignored. If there  is
489                 more  than  one  match in a line, each of them is shown sepa‐
490                 rately. This option  is  mutually  exclusive  with  --output,
491                 --file-offsets, and --only-matching.
492
493       --locale=locale-name
494                 This  option specifies a locale to be used for pattern match‐
495                 ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi‐
496                 ronment  variables.  If  no  locale  is  specified, the PCRE2
497                 library's default (usually the "C" locale) is used. There  is
498                 no short form for this option.
499
500       -M, --multiline
501                 Allow  patterns to match more than one line. When this option
502                 is set, the PCRE2 library is called in "multiline" mode. This
503                 allows  a matched string to extend past the end of a line and
504                 continue on one or more subsequent lines. Patterns used  with
505                 -M may usefully contain literal newline characters and inter‐
506                 nal occurrences of ^ and $ characters. The output for a  suc‐
507                 cessful  match  may  consist of more than one line. The first
508                 line is the line in which the match  started,  and  the  last
509                 line  is  the  line  in which the match ended. If the matched
510                 string ends with a newline sequence, the output ends  at  the
511                 end  of  that  line.   If  -v  is set, none of the lines in a
512                 multi-line match are output. Once a match has  been  handled,
513                 scanning  restarts at the beginning of the line after the one
514                 in which the match ended.
515
516                 The newline sequence that separates multiple  lines  must  be
517                 matched  as  part  of  the  pattern. For example, to find the
518                 phrase "regular expression" in a file where  "regular"  might
519                 be  at the end of a line and "expression" at the start of the
520                 next line, you could use this command:
521
522                   pcre2grep -M 'regular\s+expression' <file>
523
524                 The \s escape sequence matches  any  white  space  character,
525                 including  newlines,  and  is  followed  by  + so as to match
526                 trailing white space on the first line as  well  as  possibly
527                 handling a two-character newline sequence.
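
                 The effect of \s+ can be checked on a small buffer. The
                 sketch below uses Python's re module as a stand-in for
                 PCRE2 (for these ASCII inputs \s behaves the same way);
                 the sample text is made up for illustration.

```python
import re

pattern = re.compile(r"regular\s+expression")

# "regular" ends the first line, followed by a trailing space and a
# two-character CRLF line ending; "expression" starts the next line.
text = "a regular \r\nexpression spans two lines\n"

match = pattern.search(text)
print(match is not None)
print(repr(match.group(0)))

# A single \s (without the +) cannot cover the space plus CR plus LF.
print(re.search(r"regular\sexpression", text) is None)
```

                 The matched string contains the trailing space and both
                 characters of the line ending, which is why the repeat
                 is needed.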
528
529                 There  is a limit to the number of lines that can be matched,
530                 imposed by the way that pcre2grep buffers the input  file  as
531                 it  scans  it.  With  a sufficiently large processing buffer,
532                 this should not be a problem, but the -M option does not work
                 when input is read line by line (see --line-buffered).
534
535       -m number, --max-count=number
536                 Stop  processing after finding number matching lines, or non-
537                 matching lines if -v is also set. Any trailing context  lines
538                 are  output  after  the  final match. In multiline mode, each
539                 multiline match counts as just one line for this purpose.  If
540                 this  limit is reached when reading the standard input from a
541                 regular file, the file is left positioned just after the last
542                 matching  line.   If -c is also set, the count that is output
543                 is never greater than number. This option has  no  effect  if
544                 used with -L, -l, or -q, or when just checking for a match in
545                 a binary file.
546
547       --match-limit=number
548                 Processing some regular expression patterns may take  a  very
549                 long time to search for all possible matching strings. Others
550                 may require a very large amount of memory.  There  are  three
551                 options that set resource limits for matching.
552
553                 The --match-limit option provides a means of limiting comput‐
554                 ing resource usage when  processing  patterns  that  are  not
555                 going  to match, but which have a very large number of possi‐
556                 bilities in their search trees. The classic example is a pat‐
557                 tern  that  uses  nested unlimited repeats. Internally, PCRE2
558                 has a counter that is incremented each time around  its  main
559                 processing  loop.  If  the  value  set  by  --match-limit  is
560                 reached, an error occurs.
561
562                 The --heap-limit option specifies, as a number  of  kibibytes
563                 (units  of 1024 bytes), the amount of heap memory that may be
564                 used for matching. Heap memory is needed only if matching the
565                 pattern  requires a significant number of nested backtracking
566                 points to be remembered. This parameter can be set to zero to
567                 forbid the use of heap memory altogether.
568
569                 The  --depth-limit  option  limits  the depth of nested back‐
570                 tracking points, which indirectly limits the amount of memory
571                 that is used. The amount of memory needed for each backtrack‐
572                 ing point depends on the number of capturing  parentheses  in
573                 the pattern, so the amount of memory that is used before this
574                 limit acts varies from pattern to pattern. This limit  is  of
575                 use only if it is set smaller than --match-limit.
576
577                 There  are no short forms for these options. The default lim‐
578                 its can be set when the PCRE2 library is  compiled;  if  they
579                 are  not specified, the defaults are very large and so effec‐
580                 tively unlimited.
581
582       --max-buffer-size=number
583                 This limits the expansion of  the  processing  buffer,  whose
584                 initial  size can be set by --buffer-size. The maximum buffer
585                 size is silently forced to be no smaller  than  the  starting
586                 buffer size.
587
588       -N newline-type, --newline=newline-type
589                 Six different conventions for indicating the ends of lines in
590                 scanned files are supported. For example:
591
592                   pcre2grep -N CRLF 'some pattern' <file>
593
594                 The newline type may be specified in upper, lower,  or  mixed
595                 case.  If  the  newline  type  is NUL, lines are separated by
596                 binary zero characters. The other types are the  single-char‐
597                 acter  sequences  CR (carriage return) and LF (linefeed), the
598                 two-character sequence CRLF, an "anycrlf" type, which  recog‐
599                 nizes  any  of  the preceding three types, and an "any" type,
600                 for which any Unicode line ending sequence is assumed to  end
601                 a  line.  The Unicode sequences are the three just mentioned,
602                 plus VT (vertical tab, U+000B), FF (form feed,  U+000C),  NEL
603                 (next  line,  U+0085),  LS  (line  separator, U+2028), and PS
604                 (paragraph separator, U+2029).
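
                 The six conventions can be pictured as six ways of
                 splitting the same text into lines. The following Python
                 sketch is an illustration only, not pcre2grep's code; the
                 names NEWLINE_PATTERNS and split_lines are invented here.

```python
import re

# One regular expression per newline type named in the text above.
# CRLF comes first in the alternations so that "\r\n" counts as a
# single line ending rather than two.
NEWLINE_PATTERNS = {
    "cr": r"\r",
    "lf": r"\n",
    "crlf": r"\r\n",
    "nul": r"\x00",
    "anycrlf": r"\r\n|\r|\n",
    "any": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, newline_type):
    """Split text into lines under the given convention (any letter case)."""
    pattern = NEWLINE_PATTERNS[newline_type.lower()]
    lines = re.split(pattern, text)
    # A terminator at the very end does not start another line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

if __name__ == "__main__":
    data = "one\r\ntwo\nthree\rfour"
    for name in ("LF", "CRLF", "AnyCRLF"):
        print(name, split_lines(data, name))
```

                 With the mixed sample data, only the "anycrlf" setting
                 yields four clean lines; the stricter settings leave stray
                 CR or LF characters inside the lines.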
605
606                 When the  PCRE2  library  is  built,  a  default  line-ending
607                 sequence   is  specified.   This  is  normally  the  standard
608                 sequence for the operating system. Unless otherwise specified
609                 by this option, pcre2grep uses the library's default.
610
611                 This  option makes it possible to use pcre2grep to scan files
612                 that have come from other environments without having to mod‐
613                 ify  their  line  endings.  If the data that is being scanned
614                 does not agree  with  the  convention  set  by  this  option,
615                 pcre2grep  may  behave in strange ways. Note that this option
616                 does not apply to files specified by the -f,  --exclude-from,
617                 or  --include-from  options,  which  are  expected to use the
618                 operating system's standard newline sequence.
619
620       -n, --line-number
621                 Precede each output line by its line number in the file, fol‐
622                 lowed  by  a colon for matching lines or a hyphen for context
623                 lines. If the file name is also being output, it precedes the
624                 line  number.  When  the  -M option causes a pattern to match
625                 more than one line, only the first is preceded  by  its  line
626                 number. This option is forced if --line-offsets is used.
627
628       --no-jit  If  the  PCRE2 library is built with support for just-in-time
629                 compiling (which speeds up matching), pcre2grep automatically
630                 makes use of this, unless it was explicitly disabled at build
631                 time. This option can be used to disable the use  of  JIT  at
632                 run  time. It is provided for testing and working round prob‐
633                 lems.  It should never be needed in normal use.
634
635       -O text, --output=text
636                 When there is a match, instead of outputting  the  line  that
637                 matched,  output just the text specified in this option, fol‐
638                 lowed by an operating-system standard newline. In this  mode,
639                 no  context is shown. That is, the -A, -B, and -C options are
640                 ignored. The --newline option has no effect on  this  option,
641                 which is mutually exclusive with --only-matching, --file-off‐
642                 sets, and --line-offsets. However, like  --only-matching,  if
643                 there is more than one match in a line, each of them causes a
644                 line of output.
645
646                 Escape sequences starting with a dollar character may be used
647                 to insert the contents of the matched part of the line and/or
648                 captured substrings into the text.
649
650                 $<digits> or ${<digits>} is replaced  by  the  captured  sub‐
651                 string  of  the  given  decimal  number; zero substitutes the
652                 whole match. If the number is greater than the number of cap‐
653                 turing  substrings,  or if the capture is unset, the replace‐
654                 ment is empty.
655
656                 $a is replaced by bell; $b by backspace; $e by escape; $f  by
657                 form  feed;  $n by newline; $r by carriage return; $t by tab;
658                 $v by vertical tab.
659
660                 $o<digits> or $o{<digits>} is replaced by the character whose
661                 code  point  is the given octal number. In the first form, up
662                 to three octal digits are processed.  When  more  digits  are
663                 needed  in Unicode mode to specify a wide character, the sec‐
664                 ond form must be used.
665
666                 $x<digits> or $x{<digits>} is replaced by the character  rep‐
667                 resented  by the given hexadecimal number. In the first form,
668                 up to two hexadecimal digits are processed. When more  digits
669                 are  needed  in Unicode mode to specify a wide character, the
670                 second form must be used.
671
672                 Any other character is substituted by itself. In  particular,
673                 $$ is replaced by a single dollar.
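
                 The substitution rules above can be modelled in Python as
                 follows. This is a simplified sketch, not pcre2grep's
                 implementation: the names SIMPLE, ESCAPE, and expand are
                 invented, matching is done with Python's re module, and a
                 malformed form such as "$o" without digits is simply
                 treated as "any other character", which may differ from
                 the real program.

```python
import re

# Single-letter escapes listed in the text above.
SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# One alternative per escape form; braced forms are tried first.
ESCAPE = re.compile(
    r"\$(?:\{(?P<bgroup>[0-9]+)\}|(?P<group>[0-9]+)"
    r"|o\{(?P<boct>[0-7]+)\}|o(?P<oct>[0-7]{1,3})"
    r"|x\{(?P<bhex>[0-9A-Fa-f]+)\}|x(?P<hex>[0-9A-Fa-f]{1,2})"
    r"|(?P<other>.))?",
    re.DOTALL,
)

def expand(template, match):
    """Expand $-escapes in template using a Python re match object."""
    def replace(m):
        if m.group(0) == "$":
            raise ValueError("dollar not followed by another character")
        number = m.group("bgroup") or m.group("group")
        if number is not None:
            index = int(number)
            if index > match.re.groups:
                return ""                     # no such capture group
            return match.group(index) or ""   # unset group gives empty text
        octal = m.group("boct") or m.group("oct")
        if octal is not None:
            return chr(int(octal, 8))
        hexa = m.group("bhex") or m.group("hex")
        if hexa is not None:
            return chr(int(hexa, 16))
        other = m.group("other")
        return SIMPLE.get(other, other)       # "$$" yields "$"
    return ESCAPE.sub(replace, template)

if __name__ == "__main__":
    m = re.search(r"(\w+)@(\w+)(\.org)?", "mail bob@example today")
    print(expand("user=$1 host=${2} tld=[$3] all=$0 cost=$$5 $x41$o{102}", m))
```

                 In the sample, group 3 is unset and therefore expands to
                 nothing, and "$$5" produces a literal dollar followed by
                 the digit 5.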
674
675       -o, --only-matching
676                 Show only the part of the line that matched a pattern instead
677                 of the whole line. In this mode, no context  is  shown.  That
678                 is,  the -A, -B, and -C options are ignored. If there is more
679                 than one match in a line, each of them is  shown  separately,
680                 on  a  separate  line  of  output.  If -o is combined with -v
681                 (invert the sense of the match to find  non-matching  lines),
682                 no  output is generated, but the return code is set appropri‐
683                 ately. If the matched portion of the line is  empty,  nothing
684                 is  output  unless  the  file  name  or line number are being
685                 printed, in which case they are shown on an  otherwise  empty
686                 line.  This  option  is  mutually  exclusive  with  --output,
687                 --file-offsets and --line-offsets.
688
689       -onumber, --only-matching=number
690                 Show only the part of the line  that  matched  the  capturing
691                 parentheses of the given number. Up to 50 capturing parenthe‐
692                 ses are supported by default. This limit can be  changed  via
693                 the  --om-capture option. A pattern may contain any number of
694                 capturing parentheses, but only those whose number is  within
695                 the  limit can be accessed by -o. An error occurs if the num‐
696                 ber specified by -o is greater than the limit.
697
698                 -o0 is the same as -o without a number. Because these options
699                 can  be given without an argument (see above), if an argument
700                 is present, it must be given in  the  same  shell  item,  for
701                 example, -o3 or --only-matching=2. The comments given for the
702                 non-argument case above also apply to  this  option.  If  the
703                 specified  capturing parentheses do not exist in the pattern,
704                 or were not set in the match, nothing is  output  unless  the
705                 file name or line number are being output.
706
707                 If  this  option is given multiple times, multiple substrings
708                 are output for each match,  in  the  order  the  options  are
709                 given,  and  all on one line. For example, -o3 -o1 -o3 causes
710                 the substrings matched by capturing parentheses 3 and  1  and
                 then 3 again to be output. By default, there is no separator
                 (but see the --om-separator option below).
713
714       --om-capture=number
715                 Set the number of capturing parentheses that can be  accessed
716                 by -o. The default is 50.
717
718       --om-separator=text
719                 Specify  a  separating string for multiple occurrences of -o.
720                 The default is an empty string. Separating strings are  never
721                 coloured.
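
                 How repeated -o options and a separator combine can be
                 sketched in Python. This is a model with invented names
                 (only_matching and its parameters), using Python's re
                 module; it assumes that empty or missing substrings are
                 skipped without emitting a separator, which the text above
                 does not state.

```python
import re

def only_matching(pattern, line, groups, separator=""):
    """Return one output string per match, built from the requested groups.

    groups lists capture numbers in the order the -o options were given;
    0 stands for the whole match. Unset or non-existent groups are
    skipped (an assumption of this sketch).
    """
    regex = re.compile(pattern)
    results = []
    for match in regex.finditer(line):
        parts = []
        for number in groups:
            if number <= regex.groups:
                text = match.group(number) or ""
                if text:
                    parts.append(text)
        results.append(separator.join(parts))
    return results

if __name__ == "__main__":
    line = "2020-10-04 and 1997-01-31"
    date = r"(\d+)-(\d+)-(\d+)"
    print(only_matching(date, line, [3, 1, 3]))
    print(only_matching(date, line, [3, 1, 3], separator="/"))
```

                 The group list [3, 1, 3] mirrors the -o3 -o1 -o3 example:
                 the third substring appears twice, in the order requested.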
722
723       -q, --quiet
724                 Work quietly, that is, display nothing except error messages.
725                 The exit status indicates whether or  not  any  matches  were
726                 found.
727
728       -r, --recursive
729                 If  any given path is a directory, recursively scan the files
730                 it contains, taking note of any --include and --exclude  set‐
731                 tings.  By  default, a directory is read as a normal file; in
732                 some operating systems this gives an  immediate  end-of-file.
733                 This  option  is  a  shorthand  for  setting the -d option to
734                 "recurse".
735
736       --recursion-limit=number
737                 This is an obsolete synonym for --depth-limit.  See  --match-
738                 limit above for details.
739
740       -s, --no-messages
741                 Suppress  error  messages  about  non-existent  or unreadable
742                 files. Such files are quietly skipped.  However,  the  return
743                 code is still 2, even if matches were found in other files.
744
745       -t, --total-count
746                 This  option  is  useful when scanning more than one file. If
747                 used on its own, -t suppresses all output except for a  grand
748                 total  number  of matching lines (or non-matching lines if -v
749                 is used) in all the files. If -t is used  with  -c,  a  grand
750                 total  is  output except when the previous output is just one
751                 line. In other words, it is not output when just  one  file's
752                 count  is  listed.  If file names are being output, the grand
753                 total is preceded by "TOTAL:". Otherwise, it appears as  just
754                 another  number.  The  -t option is ignored when used with -L
755                 (list files without matches), because the grand  total  would
756                 always be zero.
757
758       -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
759                 has been compiled with UTF-8 support. All patterns (including
760                 those  for any --exclude and --include options) and all lines
761                 that are scanned must be valid strings of  UTF-8  characters.
762                 If an invalid UTF-8 string is encountered, an error occurs.
763
764       -U, --utf-allow-invalid
765                 As  --utf,  but in addition subject lines may contain invalid
766                 UTF-8 code unit sequences. These can never form part  of  any
767                 pattern  match.  Patterns  themselves, however, must still be
768                 valid UTF-8 strings. This facility allows valid UTF-8 strings
769                 to be sought within arbitrary byte sequences in executable or
770                 other binary files. For more details about matching  in  non-
771                 valid UTF-8 strings, see the pcre2unicode(3) documentation.
772
773       -V, --version
774                 Write  the version numbers of pcre2grep and the PCRE2 library
775                 to the standard output and then exit. Anything  else  on  the
776                 command line is ignored.
777
778       -v, --invert-match
779                 Invert  the  sense  of  the match, so that lines which do not
780                 match any of the patterns are the ones that are  found.  When
781                 this  option  is  set,  options  such  as --only-matching and
782                 --output, which specify parts of a match that are to be  out‐
783                 put, are ignored.
784
785       -w, --word-regex, --word-regexp
786                 Force the patterns only to match "words". That is, there must
787                 be a word boundary at the  start  and  end  of  each  matched
788                 string.  This is equivalent to having "\b(?:" at the start of
789                 each pattern, and ")\b" at the end. This option applies  only
790                 to  the  patterns  that  are  matched against the contents of
791                 files; it does not apply to patterns specified by any of  the
792                 --include or --exclude options.
793
794       -x, --line-regex, --line-regexp
795                 Force  the  patterns to start matching only at the beginnings
796                 of lines, and in  addition,  require  them  to  match  entire
797                 lines. In multiline mode the match may be more than one line.
798                 This is equivalent to having "^(?:" at the start of each pat‐
799                 tern  and  ")$"  at  the end. This option applies only to the
800                 patterns that are matched against the contents of  files;  it
801                 does  not apply to patterns specified by any of the --include
802                 or --exclude options.
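
                 The non-capturing group in the equivalents given for -w
                 and -x matters when a pattern contains alternation. The
                 Python sketch below shows the difference; the helper names
                 word_regex and line_regex are invented, and Python's re
                 module stands in for PCRE2.

```python
import re

def word_regex(pattern):
    # Wrapping described for -w.
    return r"\b(?:" + pattern + r")\b"

def line_regex(pattern):
    # Wrapping described for -x.
    return "^(?:" + pattern + ")$"

if __name__ == "__main__":
    pattern = "cat|dog"
    # Without the group, each \b binds to only one alternative:
    # \bcat|dog\b still finds the "dog" at the end of "hotdog".
    naive = r"\b" + pattern + r"\b"
    print(bool(re.search(naive, "hotdog")))
    print(bool(re.search(word_regex(pattern), "hotdog")))
    print(bool(re.search(word_regex(pattern), "a dog barked")))
    print(bool(re.search(line_regex(pattern), "dog")))
    print(bool(re.search(line_regex(pattern), "dogs")))
```

                 Only the grouped forms apply the boundary or anchor tests
                 to every alternative.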
803

ENVIRONMENT VARIABLES

805
806       The environment variables LC_ALL and LC_CTYPE  are  examined,  in  that
807       order,  for  a  locale.  The first one that is set is used. This can be
808       overridden by the --locale option. If  no  locale  is  set,  the  PCRE2
809       library's default (usually the "C" locale) is used.
810

NEWLINES

812
813       The  -N  (--newline) option allows pcre2grep to scan files with newline
814       conventions that differ from the default. This option affects only  the
815       way  scanned files are processed. It does not affect the interpretation
816       of  files  specified  by  the  -f,  --file-list,   --exclude-from,   or
817       --include-from options.
818
819       Any  parts  of the scanned input files that are written to the standard
820       output are copied with whatever newline  sequences  they  have  in  the
821       input.  However, if the final line of a file is output, and it does not
822       end with a newline sequence, a newline sequence is added. If  the  new‐
823       line  setting  is  CR, LF, CRLF or NUL, that line ending is output; for
       the other settings (ANYCRLF or ANY) a single LF is used.
825
826       The newline setting does not affect the way in which  pcre2grep  writes
827       newlines  in  informational  messages  to the standard output and error
828       streams.  Under Windows, the standard output is set to  be  binary,  so
829       that  "\r\n" at the ends of output lines that are copied from the input
830       is not converted to "\r\r\n" by the C I/O library. This means that  any
831       messages  written  to the standard output must end with "\r\n". For all
832       other operating systems, and for all messages  to  the  standard  error
833       stream, "\n" is used.
834

OPTIONS COMPATIBILITY

836
837       Many of the short and long forms of pcre2grep's options are the same as
838       in the GNU grep program. Any long option of the form --xxx-regexp  (GNU
839       terminology) is also available as --xxx-regex (PCRE2 terminology). How‐
840       ever, the  --depth-limit,  --file-list,  --file-offsets,  --heap-limit,
841       --include-dir,  --line-offsets,  --locale,  --match-limit, -M, --multi‐
842       line, -N, --newline,  --om-separator,  --output,  -u,  --utf,  -U,  and
843       --utf-allow-invalid options are specific to pcre2grep, as is the use of
844       the --only-matching option with a capturing parentheses number.
845
846       Although most of the common options work the same way, a few  are  dif‐
847       ferent  in pcre2grep. For example, the --include option's argument is a
848       glob for GNU grep, but a regular expression for pcre2grep. If both  the
849       -c  and  -l  options are given, GNU grep lists only file names, without
850       counts, but pcre2grep gives the counts as well.
851

OPTIONS WITH DATA

853
854       There are four different ways in which an option with data can be spec‐
855       ified.   If  a  short  form option is used, the data may follow immedi‐
856       ately, or (with one exception) in the next command line item. For exam‐
857       ple:
858
859         -f/some/file
860         -f /some/file
861
862       The  exception is the -o option, which may appear with or without data.
863       Because of this, if data is present, it must follow immediately in  the
864       same item, for example -o3.
865
866       If  a long form option is used, the data may appear in the same command
867       line item, separated by an equals character, or (with  two  exceptions)
868       it may appear in the next command line item. For example:
869
870         --file=/some/file
871         --file /some/file
872
873       Note,  however, that if you want to supply a file name beginning with ~
874       as data in a shell command, and have the  shell  expand  ~  to  a  home
875       directory, you must separate the file name from the option, because the
876       shell does not treat ~ specially unless it is at the start of an item.
877
878       The exceptions to the above are the --colour (or --color)  and  --only-
879       matching  options,  for  which  the  data  is optional. If one of these
880       options does have data, it must be given in the first  form,  using  an
881       equals character. Otherwise pcre2grep will assume that it has no data.
882

USING PCRE2'S CALLOUT FACILITY

884
885       pcre2grep  has,  by  default,  support for calling external programs or
886       scripts or echoing specific strings during matching by  making  use  of
887       PCRE2's  callout  facility.  However, this support can be completely or
888       partially disabled when pcre2grep is built. You can  find  out  whether
889       your  binary  has  support  for  callouts by running it with the --help
890       option. If callout support is completely disabled, all callouts in pat‐
891       terns are ignored by pcre2grep.  If the facility is partially disabled,
892       calling external programs is not supported, and callouts  that  request
893       it are ignored.
894
895       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu‐
896       ment is either a number or a quoted string (see the pcre2callout  docu‐
897       mentation  for  details).  Numbered  callouts are ignored by pcre2grep;
898       only callouts with string arguments are useful.
899
900   Echoing a specific string
901
902       Starting the callout string with a pipe character  invokes  an  echoing
903       facility that avoids calling an external program or script. This facil‐
904       ity is always available, provided that  callouts  were  not  completely
905       disabled  when  pcre2grep  was built. The rest of the callout string is
906       processed as a zero-terminated string, which means it should  not  con‐
907       tain  any  internal  binary  zeros. It is written to the output, having
908       first been passed through the same escape processing as text  from  the
909       --output  (-O) option (see above). However, $0 cannot be used to insert
910       a matched substring because the match is still  in  progress.  Instead,
       the single character '0' is inserted. Any syntax error in the string
       (for example, a dollar not followed by another character) causes the
913       callout  to be ignored. No terminator is added to the output string, so
914       if you want a newline, you must include it explicitly using the  escape
915       $n. For example:
916
917         pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
918
919       Matching  continues normally after the string is output. If you want to
920       see only the callout output but not any output from  an  actual  match,
921       you should end the pattern with (*FAIL).
922
923   Calling external programs or scripts
924
925       This facility can be independently disabled when pcre2grep is built. It
926       is supported for Windows, where a call to _spawnvp() is used, for  VMS,
927       where  lib$spawn()  is  used,  and  for any Unix-like environment where
928       fork() and execv() are available.
929
930       If the callout string does not start with a pipe (vertical bar) charac‐
931       ter,  it  is parsed into a list of substrings separated by pipe charac‐
932       ters. The first substring must be an executable name, with the  follow‐
933       ing substrings specifying arguments:
934
935         executable_name|arg1|arg2|...
936
937       Any  substring  (including  the  executable  name)  may  contain escape
938       sequences started by a dollar character. These are the same as for  the
939       --output (-O) option documented above, except that $0 cannot insert the
940       matched string because the match is still  in  progress.  Instead,  the
941       character '0' is inserted. If you need a literal dollar or pipe charac‐
942       ter in any substring, use $$ or $| respectively. Here is an example:
943
944         echo -e "abcde\n12345" | pcre2grep \
945           '(?x)(.)(..(.))
946           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
947
948         Output:
949
950           Arg1: [a] [bcd] [d] Arg2: |a| ()
951           abcde
952           Arg1: [1] [234] [4] Arg2: |1| ()
953           12345
954
955       The parameters for the system call that is used to run the  program  or
956       script are zero-terminated strings. This means that binary zero charac‐
957       ters in the callout argument will cause premature termination of  their
       substrings, and therefore should not be present. Any syntax error in
959       the string (for example, a dollar not followed  by  another  character)
960       causes the callout to be ignored.  If running the program fails for any
961       reason (including the non-existence of the executable), a local  match‐
962       ing failure occurs and the matcher backtracks in the normal way.
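
       The way a callout string breaks down into an executable name and
       arguments can be modelled in Python. The sketch below is illustrative
       only: TOKEN and callout_argv are invented names, only the escapes
       used in the example above ($N, ${N}, $| and $$) are handled, and the
       program is not actually run.

```python
import re

# "$|" and "$$" are a literal pipe and dollar; "$N" or "${N}" inserts
# capture N; a bare "|" separates substrings.
TOKEN = re.compile(
    r"\$(?:(?P<lit>[|$])|\{(?P<bnum>[0-9]+)\}|(?P<num>[0-9]+))"
    r"|(?P<sep>\|)"
)

def callout_argv(callout, captures):
    """Split a callout string into an argument list.

    captures[0] holds the text of capture group 1, and so on. $0 becomes
    the character '0', as the text describes.
    """
    argv, current, pos = [], [], 0
    for m in TOKEN.finditer(callout):
        current.append(callout[pos:m.start()])
        pos = m.end()
        if m.group("sep"):
            argv.append("".join(current))     # unescaped pipe ends a substring
            current = []
        elif m.group("lit"):
            current.append(m.group("lit"))
        else:
            n = int(m.group("bnum") or m.group("num"))
            if n == 0:
                current.append("0")
            elif n <= len(captures):
                current.append(captures[n - 1] or "")
            # capture numbers beyond the last group insert nothing
    current.append(callout[pos:])
    argv.append("".join(current))
    return argv

if __name__ == "__main__":
    callout = "/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)"
    print(callout_argv(callout, ["a", "bcd", "d", ""]))
```

       For the first input line of the example, the resulting list has three
       items: the executable name and two arguments, which /bin/echo joins
       with a space to give the output shown above.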
963

MATCHING ERRORS

965
966       It  is  possible  to supply a regular expression that takes a very long
967       time to fail to match certain lines.  Such  patterns  normally  involve
968       nested  indefinite repeats, for example: (a+)*\d when matched against a
969       line of a's with no final digit. The  PCRE2  matching  function  has  a
970       resource  limit that causes it to abort in these circumstances. If this
971       happens, pcre2grep outputs an error message and the  line  that  caused
972       the  problem  to  the  standard error stream. If there are more than 20
973       such errors, pcre2grep gives up.
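
       Why a pattern such as (a+)*\d is costly when it fails can be seen
       with a toy backtracking matcher. The Python sketch below is not how
       PCRE2 works internally; toy_match, MatchLimitExceeded, and the step
       counter are invented stand-ins for the idea of a match limit.

```python
class MatchLimitExceeded(Exception):
    """Raised by the toy matcher when its step counter passes the limit."""

def toy_match(subject, limit):
    r"""Match (a+)*\d at the start of subject by naive backtracking.

    Returns (matched, steps). The counter only illustrates the idea of a
    match limit; it does not reproduce how PCRE2 counts.
    """
    steps = 0

    def star(pos):
        nonlocal steps
        steps += 1
        if steps > limit:
            raise MatchLimitExceeded(f"gave up after {limit} steps")
        # Find how far a run of a's extends from pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Try each possible length for one more a+ group, longest first.
        for stop in range(end, pos, -1):
            if star(stop):
                return True
        # No further groups: the next character must be a digit.
        return pos < len(subject) and subject[pos].isdigit()

    return star(0), steps

if __name__ == "__main__":
    print(toy_match("aaa1", 100))
    for n in (5, 10, 15):
        print(n, toy_match("a" * n, 10**6))
    try:
        toy_match("a" * 30, 10000)
    except MatchLimitExceeded as err:
        print("error:", err)
```

       In this model a successful match takes a couple of steps, while each
       extra "a" in a failing line doubles the work, so a limit is the only
       practical way to stop.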
974
975       The --match-limit option of pcre2grep can be used to  set  the  overall
976       resource  limit.  There are also other limits that affect the amount of
977       memory used during matching; see the  discussion  of  --heap-limit  and
978       --depth-limit above.
979

DIAGNOSTICS

981
982       Exit status is 0 if any matches were found, 1 if no matches were found,
983       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
984       files  (even if matches were found in other files) or too many matching
985       errors. Using the -s option to suppress error messages about inaccessi‐
986       ble files does not affect the return code.
987
988       When   run  under  VMS,  the  return  code  is  placed  in  the  symbol
989       PCRE2GREP_RC because VMS  does  not  distinguish  between  exit(0)  and
990       exit(1).
991

SEE ALSO

993
994       pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3).
995

AUTHOR

997
998       Philip Hazel
999       University Computing Service
1000       Cambridge, England.
1001

REVISION

1003
1004       Last updated: 04 October 2020
1005       Copyright (c) 1997-2020 University of Cambridge.
1006
1007
1008
1009PCRE2 10.36                     04 October 2020                   PCRE2GREP(1)