PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)

NAME

       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS

       pcre2grep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcre2grep  searches  files  for  character patterns, in the same way as
       other grep commands do,  but  it  uses  the  PCRE2  regular  expression
       library  to  support  patterns  that  are  compatible  with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of  pattern  syntax,  or  pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a  separate  file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they  are  interpreted  as
       part  of  the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are  interpreted  by  the  shell,  and
       indeed  quotes  are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated  as  the
       single  pattern  to be matched when neither -e nor -f is present.  Con‐
       versely, when one or both of these options are  used  to  specify  pat‐
       terns, all arguments are treated as path names. At least one of -e, -f,
       or an argument pattern must be provided.

       If no files are specified, pcre2grep  reads  the  standard  input.  The
       standard  input can also be referenced by a name consisting of a single
       hyphen.  For example:

         pcre2grep some-pattern file1 - file3

       Input files are searched line by  line.  By  default,  each  line  that
       matches  a  pattern  is  copied to the standard output, and if there is
       more than one file, the file name is output at the start of each  line,
       followed  by  a  colon.  However, there are options that can change how
       pcre2grep behaves. In particular, the -M option makes  it  possible  to
       search  for  strings  that  span  line  boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
       controlled  by  parameters  that  can  be  set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size  of  buffer
       that  is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be  needed;  this  is  handled  by
       automatically extending the buffer, up to the limit specified by --max-
       buffer-size. The default values for these parameters can  be  set  when
       pcre2grep  is  built;  if nothing is specified, the defaults are set to
       20KiB and 1MiB respectively. An error occurs if a line is too long  and
       the buffer can no longer be expanded.

       The  block  of  memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size  is too small, fewer than requested "before" and "after" lines may
       be output.
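
       As a rough illustration of the arithmetic above, the following shell
       sketch computes the size of the memory block for the documented default
       values. It assumes that the factor of three also applies once the
       buffer has been expanded to its maximum; the variable names are
       illustrative and are not pcre2grep settings.

```shell
# Documented defaults when nothing is specified at build time.
buffer_size=$((20 * 1024))          # 20KiB starting buffer
max_buffer_size=$((1024 * 1024))    # 1MiB expansion limit

# The block actually used is three times the "buffer size".
initial_block=$((3 * buffer_size))
largest_block=$((3 * max_buffer_size))   # assumption: same factor after expansion

echo "initial block: $initial_block bytes"
echo "largest block: $largest_block bytes"
```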

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever  is  the
       greater.   BUFSIZ  is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to  each  line  in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further  patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or  --line-
       offsets  is  used  to  output  only  the  part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following  the  match,  so that further matches on the same line can be
       found. If there are multiple  patterns,  they  are  all  tried  on  the
       remainder  of  the  line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.

       This behaviour means that the order  in  which  multiple  patterns  are
       specified  can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages  to
       display  earlier  matches  for  later  patterns (as long as there is no
       overlap).

       Patterns that can match an empty string are accepted, but empty  string
       matches   are   never   recognized.   An   example   is   the   pattern
       "(super)?(man)?", in which all components are  optional.  This  pattern
       finds  all  occurrences  of  both "super" and "man"; the output differs
       from matching with "super|man" when only the  matching  substrings  are
       being shown.

       If  the  LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library.  The --locale
       option can be used to override this.

SUPPORT FOR COMPRESSED FILES

       It  is  possible to compile pcre2grep so that it uses libz or libbz2 to
       read compressed files whose names end in .gz or .bz2, respectively. You
       can  find out whether your pcre2grep binary has support for one or both
       of these file types by running it with the --help option. If the appro‐
       priate support is not present, all files are treated as plain text. The
       standard input is always so treated. When input is  from  a  compressed
       .gz or .bz2 file, the --line-buffered option is ignored.

BINARY FILES

       By  default,  a  file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed  specially.
       (GNU grep identifies binary files in this manner.) However, if the new‐
       line type is specified as NUL, that is, the line terminator is a binary
       zero, the test for a binary file is not applied. See the --binary-files
       option for a means of changing the way binary files are handled.
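
       The test described above can be approximated in the shell. This is a
       sketch of the documented rule only, not pcre2grep's own code;
       looks_binary is a hypothetical helper name, and head -c is assumed to
       be available on the system.

```shell
# Succeed if a NUL byte occurs within the first 1024 bytes of the file.
looks_binary() {
    total=$(head -c 1024 "$1" | wc -c)
    kept=$(head -c 1024 "$1" | tr -d '\000' | wc -c)
    # If deleting NULs changed the byte count, at least one was present.
    [ "$((total))" -ne "$((kept))" ]
}

sample=$(mktemp)
printf 'abc\000def\n' > "$sample"
if looks_binary "$sample"; then echo "binary"; else echo "text"; fi
rm -f "$sample"
```

       A file that this helper reports as binary can still be searched as
       text by using the -a option.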

BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that  are  terminated
       by  a  binary zero, so cannot contain internal zeros. However, patterns
       that are read from a file via the -f option may contain binary zeros.

OPTIONS

       The order in which some of the options appear can  affect  the  output.
       For  example,  both  the  -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the  one  that
       takes  effect.  Similarly,  except  where  noted below, if an option is
       given twice, the later setting is used. Numerical  values  for  options
       may  be  followed  by  K  or  M,  to  signify multiplication by 1024 or
       1024*1024 respectively.
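
       The K and M suffixes can be pictured with a small shell helper.
       expand_size is a hypothetical name used only for this sketch, which
       handles the two documented suffixes and nothing else.

```shell
# Expand a numeric option value with an optional K or M suffix.
expand_size() {
    case $1 in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

expand_size 20K    # prints 20480
expand_size 1M     # prints 1048576

# Such values would be passed to pcre2grep like this, for example:
#   pcre2grep --buffer-size=64K --max-buffer-size=4M pattern file
```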

       --        This terminates the list of options. It is useful if the next
                 item  on  the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns  and  file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output  up  to  number  lines  of context after each matching
                 line. Fewer lines are output if the next match or the end  of
                 the  file  is  reached,  or if the processing buffer size has
                 been set too small. If file names  and/or  line  numbers  are
                 being  output,  a hyphen separator is used instead of a colon
                 for the context lines.  A  line  containing  "--"  is  output
                 between each group of lines, unless they are in fact contigu‐
                 ous in the input file. The value of number is expected to  be
                 relatively small. When -c is used, -A is ignored.

       -a, --text
                 Treat  binary  files as text. This is equivalent to --binary-
                 files=text.

       -B number, --before-context=number
                 Output up to number lines of  context  before  each  matching
                 line.  Fewer  lines  are  output if the previous match or the
                 start of the file is within number lines, or if the  process‐
                 ing  buffer size has been set too small. If file names and/or
                 line numbers are being output, a  hyphen  separator  is  used
                 instead  of  a colon for the context lines. A line containing
                 "--" is output between each group of lines, unless  they  are
                 in  fact contiguous in the input file. The value of number is
                 expected to be relatively small.  When  -c  is  used,  -B  is
                 ignored.

       --binary-files=word
                 Specify  how binary files are to be processed. If the word is
                 "binary" (the default),  pattern  matching  is  performed  on
                 binary  files,  but  the  only  output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text",  which
                 is  equivalent  to  the -a or --text option, binary files are
                 processed in the same way as any other file.  In  this  case,
                 when  a  match  succeeds,  the  output may be binary garbage,
                 which can have nasty effects if sent to a  terminal.  If  the
                 word  is  "without-match",  which  is  equivalent  to  the -I
                 option, binary files are  not  processed  at  all;  they  are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory  is  obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output number lines of context both  before  and  after  each
                 matching  line.  This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output lines from the files that  are  being  scanned;
                 instead  output  the  number  of  lines  that would have been
                 shown, either because they matched, or, if -v is set, because
                 they  failed  to match. By default, this count is exactly the
                 same as the number of lines that would have been output,  but
                 if  the -M (multiline) option is used (without -v), there may
                 be more suppressed lines than the count (that is, the  number
                 of matches).

                 If  no lines are selected, the number zero is output. If sev‐
                 eral files are being scanned, a count is output for  each  of
                 them  and  the  -t option can be used to cause a total to  be
                 output at  the  end.  However,  if  the  --files-with-matches
                 option  is  also  used,  only  those  files  whose counts are
                 greater than zero are listed. When -c is used,  the  -A,  -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto".  If data is required, it must  be  given  in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By  default,  the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto".  In
                 the  last  case,  colouring happens only if the standard out‐
                 put is connected to a terminal. More resources are used  when
                 colouring is enabled, because pcre2grep has to search for all
                 possible matches in a line, not just one, in order to  colour
                 them all.

                 The  colour  that  is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR,  PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order.  If  none  of  these  are  set,  pcre2grep  looks  for
                 GREP_COLORS  or  GREP_COLOR (in that order). The value of the
                 variable should be a string of two numbers,  separated  by  a
                 semicolon,  except  in  the  case  of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours,  terminated  by the end of the string or by a colon.
                 If GREP_COLORS does not start  with  "ms="  or  "mt="  it  is
                 ignored, and GREP_COLOR is checked.

                 If  the  string obtained from one of the above variables con‐
                 tains any characters other than semicolon or digits, the set‐
                 ting is ignored and the default colour is used. The string is
                 copied directly into the control string for setting colour on
                 a  terminal,  so it is your responsibility to ensure that the
                 values make sense. If no  relevant  environment  variable  is
                 set, the default is "1;31", which gives red.
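
                 The validation rule above can be sketched in the shell.
                 pick_colour is a hypothetical helper, the "ms="/"mt="
                 handling for GREP_COLORS is left out, and the escape
                 sequence in the last line is an assumption about how the
                 value is embedded in the terminal control string.

```shell
# Fall back to the documented default unless the value consists solely of
# digits and semicolons.
pick_colour() {
    case $1 in
        ''|*[!0-9\;]*) echo '1;31' ;;
        *)             echo "$1" ;;
    esac
}

colour=$(pick_colour "${PCRE2GREP_COLOUR:-}")
# Assumed form of the control string: ESC [ <value> m ... ESC [ 0 m
printf '\033[%sm%s\033[0m\n' "$colour" "matched text"
```

                 With this rule a value such as "1;32" is accepted, whereas a
                 value such as "green" is ignored and "1;31" is used instead.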

       -D action, --devices=action
                 If  an  input  path  is  not  a  regular file or a directory,
                 "action" specifies how it is to be  processed.  Valid  values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed.  Valid values are  "read"  (the  default  in
                 non-Windows  environments,  for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip"  (silently
                 skip  the  path, the default in Windows environments). In the
                 "read" case, directories are read as if  they  were  ordinary
                 files.  In  some  operating  systems  the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used mul‐
                 tiple times in order to specify several patterns. It can also
                 be  used  as a way of specifying a single pattern that starts
                 with a hyphen. When -e is used, no argument pattern is  taken
                 from  the  command  line;  all  arguments are treated as file
                 names. There is no limit to the number of patterns. They  are
                 applied  to  each line in the order in which they are defined
                 until one matches.

                 If -f is used with -e, the command line patterns are  matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified. Note  that
                 multiple  use  of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line  that  is  X or Y, whereas if the two patterns are given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no X in the line. This matters only if you are  using  -o  or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to  all  files,
                 whether  listed  on  the  command line, obtained from --file-
                 list, or by scanning a directory. The pattern is a PCRE2 reg‐
                 ular  expression,  and is matched against the final component
                 of the file name, not the entire path. The  -F,  -w,  and  -x
                 options do not apply to this pattern. The option may be given
                 any number of times in order to specify multiple patterns. If
                 a  file  name matches both an --include and an --exclude pat‐
                 tern, it is excluded. There is no short form for this option.

       --exclude-from=filename
                 Treat each non-empty line of the file  as  the  data  for  an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline  option
                 has  no  effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being  processed,  whatever  the  setting  of the --recursive
                 option. This applies to all directories,  whether  listed  on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular  expression,
                 and  is  matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do  not
                 apply  to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a  direc‐
                 tory  matches  both  --include-dir  and  --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching  pattern  as  a  list  of  fixed
                 strings,  separated  by  newlines,  instead  of  as a regular
                 expression. What constitutes a newline for  this  purpose  is
                 controlled  by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F.   They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present).  This  option applies only to the patterns that are
                 matched against the contents of files; it does not  apply  to
                 patterns  specified  by  any  of  the  --include or --exclude
                 options.

       -f filename, --file=filename
                 Read patterns from the file, one per  line,  and  match  them
                 against  each  line of input. As is the case with patterns on
                 the command line, no delimiters should be used. What  consti‐
                 tutes  a  newline when reading the file is the operating sys‐
                 tem's default interpretation of \n. The --newline option  has
                 no  effect  on  this  option. Trailing white space is removed
                 from each line, and blank lines are ignored.  An  empty  file
                 contains  no patterns and therefore matches nothing. Patterns
                 read from a file in this way may contain binary zeros,  which
                 are  treated  as  ordinary data characters. See also the com‐
                 ments about multiple patterns versus a  single  pattern  with
                 alternatives in the description of -e above.

                 If  this  option  is  given more than once, all the specified
                 files are read. A data line is output if any of the  patterns
                 match  it.  A  file  name can be given as "-" to refer to the
                 standard input. When -f is used, patterns  specified  on  the
                 command  line  using  -e may also be present; they are tested
                 before the file's patterns.  However,  no  other  pattern  is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.
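
                 The preprocessing of a pattern file described above (trailing
                 white space removed, blank lines ignored) can be imitated
                 with sed in order to preview which patterns would be used.
                 This is only an approximation of the documented behaviour,
                 and the file content is made up for the example.

```shell
patterns=$(mktemp)
# Two real patterns, one empty line, and one line of spaces only.
printf 'foo   \n\n\tbar\t\n   \n' > "$patterns"

# Strip trailing white space, then drop lines that are now empty.
cleaned=$(sed -e 's/[[:space:]]*$//' -e '/^$/d' "$patterns")
printf '%s\n' "$cleaned"

# The file itself would be passed to pcre2grep like this:
#   pcre2grep -f "$patterns" file1 file2
rm -f "$patterns"
```

                 Leading white space is kept, so the second pattern still
                 begins with a tab character.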

       --file-list=filename
                 Read a list of  files  and/or  directories  that  are  to  be
                 scanned from the given file, one per line. What constitutes a
                 newline when reading  the  file  is  the  operating  system's
                 default.  Trailing white space is removed from each line, and
                 blank lines are ignored. These paths are processed before any
                 that  are  listed  on  the command line. The file name can be
                 given as "-" to refer to the standard input.  If  --file  and
                 --file-list  are  both  specified  as  "-", patterns are read
                 first. This is useful only when the standard input is a  ter‐
                 minal,  from  which  further lines (the list of files) can be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead  of  showing lines or parts of lines that match, show
                 each match as an offset from the start  of  the  file  and  a
                 length,  separated  by  a  comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options  are  ignored.  If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with  --output,
                 --line-offsets, and --only-matching.
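
                 To make the output format concrete, this sketch computes the
                 offset and length of a fixed string in a one-line input using
                 only shell parameter expansion. It assumes ASCII text (so
                 that characters equal bytes) and a zero-based offset, and it
                 does not call pcre2grep.

```shell
line='say hello world'   # stands in for the content of a one-line file
match='hello'            # the substring that the pattern matched

prefix=${line%%"$match"*}        # text before the first occurrence
echo "${#prefix},${#match}"      # offset,length: prints 4,5
```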

       -H, --with-filename
                 Force  the  inclusion of the file name at the start of output
                 lines when searching a single file. By default, the file name
                 is not shown in this case.  For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is  used.  If  a line number is also being output, it follows
                 the file name. When the -M option causes a pattern  to  match
                 more  than  one  line, only the first is preceded by the file
                 name. This option  overrides  any  previous  -h,  -l,  or  -L
                 options.

       -h, --no-filename
                 Suppress  the  output  of  file names when searching multiple
                 files. By default, file names are shown when  multiple  files
                 are searched. For matching lines, the file name  is  followed
                 by a colon; for context lines, a hyphen separator is used. If
                 a line number is also being output, it follows the file name.
                 This option overrides any previous -H, -L, or -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output a help message, giving brief details  of  the  command
                 options  and  file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore  binary  files.  This  is  equivalent   to   --binary-
                 files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If  any --include patterns are specified, the only files that
                 are processed are those that match one of the  patterns  (and
                 do  not  match  an  --exclude  pattern). This option does not
                 affect directories, but it  applies  to  all  files,  whether
                 listed  on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular  expres‐
                 sion,  and is matched against the final component of the file
                 name, not the entire path. The -F, -w, and -x options do  not
                 apply  to this pattern. The option may be given any number of
                 times. If a file  name  matches  both  an  --include  and  an
                 --exclude  pattern,  it  is excluded.  There is no short form
                 for this option.
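
                 The interaction of --include and --exclude can be sketched as
                 a shell predicate. The two patterns are examples only,
                 would_search is a hypothetical helper, and grep -E stands in
                 for PCRE2 matching, which is adequate for these simple
                 expressions.

```shell
# Decide whether a file would be searched, given
#   --include='\.c$' --exclude='^test_'
# Only the final path component is tested, and exclusion wins.
would_search() {
    name=$(basename "$1")
    printf '%s\n' "$name" | grep -Eq '\.c$' || return 1
    if printf '%s\n' "$name" | grep -Eq '^test_'; then return 1; fi
    return 0
}

for path in src/main.c src/test_main.c test_dir/util.c src/notes.md; do
    if would_search "$path"; then echo "search $path"; else echo "skip   $path"; fi
done
```

                 Note that test_dir/util.c is searched: the exclude pattern is
                 applied to "util.c", not to the directory part of the path.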

       --include-from=filename
                 Treat each non-empty line of the file  as  the  data  for  an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline  option  has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only  direc‐
                 tories  that  are  processed  are those that match one of the
                 patterns (and do not match an  --exclude-dir  pattern).  This
                 applies  to  all  directories,  whether listed on the command
                 line, obtained from --file-list,  or  by  scanning  a  parent
                 directory.  The pattern is a PCRE2 regular expression, and is
                 matched against the final component of  the  directory  name,
                 not  the entire path. The -F, -w, and -x options do not apply
                 to this pattern. The option may be given any number of times.
                 If  a directory matches both --include-dir and --exclude-dir,
                 it is excluded. There is no short form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just  output  the
                 names  of  the files that do not contain any lines that would
                 have been output. Each file name is output once, on  a  sepa‐
                 rate  line.  This option overrides any previous -H, -h, or -l
                 options.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just  output  the
                 names of the files containing lines that would have been out‐
                 put. Each file name is  output  once,  on  a  separate  line.
                 Searching  normally stops as soon as a matching line is found
                 in a file. However, if the -c (count) option  is  also  used,
                 matching  continues in order to obtain the correct count, and
                 those files that have at least one  match  are  listed  along
                 with their counts. Using this option with -c is a way of sup‐
                 pressing the listing of files with no  matches.  This  option
                 overrides any previous -H, -h, or -L options.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When  this  option is given, non-compressed input is read and
                 processed line by line, and the output is flushed after  each
                 write.  By  default,  input  is  read in large chunks, unless
                 pcre2grep can determine that it is reading  from  a  terminal
                 (which  is  currently possible only in Unix-like environments
                 or Windows). Output to  terminal  is  normally  automatically
                 flushed  by  the  operating system. This option can be useful
                 when the input or output is attached to a pipe and you do not
                 want  pcre2grep to buffer up large amounts of data.  However,
                 its use will  affect  performance,  and  the  -M  (multiline)
                 option ceases to work. When input is from a compressed .gz or
                 .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead of showing lines or parts of lines that  match,  show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a  colon
                 (as  usual; see the -n option), and the offset and length are
                 separated by a comma. In this  mode,  no  context  is  shown.
                 That  is, the -A, -B, and -C options are ignored. If there is
                 more than one match in a line, each of them  is  shown  sepa‐
                 rately.  This  option  is  mutually  exclusive with --output,
                 --file-offsets, and --only-matching.

       --locale=locale-name
                 This option specifies a locale to be used for pattern  match‐
                 ing.  It  overrides the value in the LC_ALL or LC_CTYPE envi‐
                 ronment variables. If  no  locale  is  specified,  the  PCRE2
                 library's  default (usually the "C" locale) is used. There is
                 no short form for this option.

       --match-limit=number
                 Processing some regular expression patterns may take  a  very
                 long time to search for all possible matching strings. Others
                 may require a very large amount of memory.  There  are  three
                 options that set resource limits for matching.

                 The --match-limit option provides a means of limiting comput‐
                 ing resource usage when  processing  patterns  that  are  not
                 going  to match, but which have a very large number of possi‐
                 bilities in their search trees. The classic example is a pat‐
                 tern  that  uses  nested unlimited repeats. Internally, PCRE2
                 has a counter that is incremented each time around  its  main
                 processing  loop.  If  the  value  set  by  --match-limit  is
                 reached, an error occurs.

                 The --heap-limit option specifies, as a number  of  kibibytes
                 (units  of 1024 bytes), the amount of heap memory that may be
                 used for matching. Heap memory is needed only if matching the
                 pattern  requires a significant number of nested backtracking
                 points to be remembered. This parameter can be set to zero to
                 forbid the use of heap memory altogether.

                 The  --depth-limit  option  limits  the depth of nested back‐
                 tracking points, which indirectly limits the amount of memory
                 that is used. The amount of memory needed for each backtrack‐
                 ing point depends on the number of capturing  parentheses  in
                 the pattern, so the amount of memory that is used before this
                 limit acts varies from pattern to pattern. This limit  is  of
                 use only if it is set smaller than --match-limit.

                 There  are no short forms for these options. The default lim‐
                 its can be set when the PCRE2 library is  compiled;  if  they
                 are  not specified, the defaults are very large and so effec‐
                 tively unlimited.

       --max-buffer-size=number
                 This limits the expansion of  the  processing  buffer,  whose
                 initial  size can be set by --buffer-size. The maximum buffer
                 size is silently forced to be no smaller  than  the  starting
                 buffer size.
525                 the pattern, so the amount of memory that is used before this
526                 limit acts varies from pattern to pattern. This limit  is  of
527                 use only if it is set smaller than --match-limit.
528
529                 There  are no short forms for these options. The default lim‐
530                 its can be set when the PCRE2 library is  compiled;  if  they
531                 are  not specified, the defaults are very large and so effec‐
532                 tively unlimited.
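
                    The sketch below applies a deliberately small
                    --match-limit to the nested-repeat pattern discussed
                    under MATCHING ERRORS. The limit of 1000 and the sample
                    line are arbitrary assumptions; a sensible limit depends
                    on the patterns and data in use.

```shell
# A line of a's with no digit can never match (a+)*\d, but establishing
# that by backtracking is expensive. 1000 is an arbitrary example limit.
limit_status=0
printf 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n' |
  pcre2grep --match-limit=1000 '(a+)*\d' || limit_status=$?
echo "exit status: $limit_status"
```

                    Because the line contains no digit, the exit status is
                    non-zero whether or not the limit is reached; when it is
                    reached, an error message is expected on the standard
                    error stream.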
533
534       --max-buffer-size=number
535                 This limits the expansion of  the  processing  buffer,  whose
536                 initial  size can be set by --buffer-size. The maximum buffer
537                 size is silently forced to be no smaller  than  the  starting
538                 buffer size.
539
540       -M, --multiline
541                 Allow  patterns to match more than one line. When this option
542                 is set, the PCRE2 library is called in "multiline" mode. This
543                 allows  a matched string to extend past the end of a line and
544                 continue on one or more subsequent lines. Patterns used  with
545                 -M may usefully contain literal newline characters and inter‐
546                 nal occurrences of ^ and $ characters. The output for a  suc‐
547                 cessful  match  may  consist of more than one line. The first
548                 line is the line in which the match  started,  and  the  last
549                 line  is  the  line  in which the match ended. If the matched
550                 string ends with a newline sequence, the output ends  at  the
551                 end  of  that  line.   If  -v  is set, none of the lines in a
552                 multi-line match are output. Once a match has  been  handled,
553                 scanning  restarts at the beginning of the line after the one
554                 in which the match ended.
555
556                 The newline sequence that separates multiple  lines  must  be
557                 matched  as  part  of  the  pattern. For example, to find the
558                 phrase "regular expression" in a file where  "regular"  might
559                 be  at the end of a line and "expression" at the start of the
560                 next line, you could use this command:
561
562                   pcre2grep -M 'regular\s+expression' <file>
563
564                 The \s escape sequence matches  any  white  space  character,
565                 including  newlines,  and  is  followed  by  + so as to match
566                 trailing white space on the first line as  well  as  possibly
567                 handling a two-character newline sequence.
568
569                 There  is a limit to the number of lines that can be matched,
570                 imposed by the way that pcre2grep buffers the input  file  as
571                 it  scans  it.  With  a sufficiently large processing buffer,
572                 this should not be a problem, but the -M option does not work
573                 when input is read line by line (see --line-buffered).
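
                    A minimal sketch of the command above, run against
                    made-up input in which the phrase is split across two
                    lines (the sample text and variable name are assumptions,
                    and LF line endings are assumed):

```shell
# Hypothetical three-line input; only the first two lines hold the phrase.
multi=$(printf 'a regular\nexpression here\nunrelated\n' |
  pcre2grep -M 'regular\s+expression')
printf '%s\n' "$multi"
```

                    Both lines spanned by the match are output, and the
                    unrelated third line is not.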
574
575       -N newline-type, --newline=newline-type
576                 Six different conventions for indicating the ends of lines in
577                 scanned files are supported. For example:
578
579                   pcre2grep -N CRLF 'some pattern' <file>
580
581                 The newline type may be specified in upper, lower,  or  mixed
582                 case.  If  the  newline  type  is NUL, lines are separated by
583                 binary zero characters. The other types are the  single-char‐
584                 acter  sequences  CR (carriage return) and LF (linefeed), the
585                 two-character sequence CRLF, an "anycrlf" type, which  recog‐
586                 nizes  any  of  the preceding three types, and an "any" type,
587                 for which any Unicode line ending sequence is assumed to  end
588                 a  line.  The Unicode sequences are the three just mentioned,
589                 plus VT (vertical tab, U+000B), FF (form feed,  U+000C),  NEL
590                 (next  line,  U+0085),  LS  (line  separator, U+2028), and PS
591                 (paragraph separator, U+2029).
592
593                 When the  PCRE2  library  is  built,  a  default  line-ending
594                 sequence   is  specified.   This  is  normally  the  standard
595                 sequence for the operating system. Unless otherwise specified
596                 by this option, pcre2grep uses the library's default.
597
598                 This  option makes it possible to use pcre2grep to scan files
599                 that have come from other environments without having to mod‐
600                 ify  their  line  endings.  If the data that is being scanned
601                 does not agree  with  the  convention  set  by  this  option,
602                 pcre2grep  may  behave in strange ways. Note that this option
603                 does not apply to files specified by the -f,  --exclude-from,
604                 or  --include-from  options,  which  are  expected to use the
605                 operating system's standard newline sequence.
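
                    To illustrate why the convention matters, the sketch
                    below counts lines ending in "o" in CRLF-terminated input
                    under two different settings. The sample data and
                    variable names are assumptions made for the example.

```shell
# The same CRLF-terminated input, scanned with two newline conventions.
# "|| true" keeps the script going when no line matches (exit status 1).
crlf_count=$(printf 'one\r\ntwo\r\n' | pcre2grep -N CRLF -c 'o$') || true
lf_count=$(printf 'one\r\ntwo\r\n' | pcre2grep -N LF -c 'o$') || true
echo "CRLF: $crlf_count  LF: $lf_count"
```

                    With -N LF the carriage return is treated as part of each
                    line, so "o" is no longer the last character and the
                    count drops from 1 to 0.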
606
607       -n, --line-number
608                 Precede each output line by its line number in the file, fol‐
609                 lowed  by  a colon for matching lines or a hyphen for context
610                 lines. If the file name is also being output, it precedes the
611                 line  number.  When  the  -M option causes a pattern to match
612                 more than one line, only the first is preceded  by  its  line
613                 number. This option is forced if --line-offsets is used.
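
                    The colon and hyphen separators can be seen side by side
                    when -n is combined with a context option. This is a
                    sketch on made-up input, with -A1 chosen arbitrarily:

```shell
# Matching lines are numbered with ":", context lines with "-".
numbered=$(printf 'a\nb\na\n' | pcre2grep -n -A1 'a')
printf '%s\n' "$numbered"
```

                    Lines 1 and 3 match and are shown as 1:a and 3:a, while
                    the context line between them is shown as 2-b.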
614
615       --no-jit  If  the  PCRE2 library is built with support for just-in-time
616                 compiling (which speeds up matching), pcre2grep automatically
617                 makes use of this, unless it was explicitly disabled at build
618                 time. This option can be used to disable the use  of  JIT  at
619                 run  time. It is provided for testing and working round prob‐
620                 lems.  It should never be needed in normal use.
621
622       -O text, --output=text
623                 When there is a match, instead of outputting the  whole  line
624                 that  matched,  output  just  the  given text, followed by an
625                 operating-system standard newline.  The --newline option  has
626                 no  effect  on  this option, which is mutually exclusive with
627                 --only-matching, --file-offsets, and  --line-offsets.  Escape
628                 sequences  starting  with  a  dollar character may be used to
629                 insert the contents of the matched part of  the  line  and/or
630                 captured substrings into the text.
631
632                 $<digits>  or  ${<digits>}  is  replaced by the captured sub‐
633                 string of the given  decimal  number;  zero  substitutes  the
634                 whole match. If the number is greater than the number of cap‐
635                 turing substrings, or if the capture is unset,  the  replace‐
636                 ment is empty.
637
638                 $a  is replaced by bell; $b by backspace; $e by escape; $f by
639                 form feed; $n by newline; $r by carriage return; $t  by  tab;
640                 $v by vertical tab.
641
642                 $o<digits>  is  replaced  by the character represented by the
643                 given octal number; up to three digits are processed.
644
645                 $x<digits> is replaced by the character  represented  by  the
646                 given hexadecimal number; up to two digits are processed.
647
648                 Any  other character is substituted by itself. In particular,
649                 $$ is replaced by a single dollar.
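
                    As a sketch of the substitution syntax, the following
                    swaps the two halves of a made-up key=value line. The
                    input, the pattern, and the variable name are assumptions
                    for illustration. Single quotes stop the shell from
                    expanding the dollar sequences itself.

```shell
# $1 and $2 refer to the first and second capturing parentheses.
swapped=$(printf 'key=value\n' | pcre2grep -O '$2 <- $1' '(\w+)=(\w+)')
printf '%s\n' "$swapped"
```

                    The output text replaces the matched line, giving
                    "value <- key".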
650
651       -o, --only-matching
652                 Show only the part of the line that matched a pattern instead
653                 of  the  whole  line. In this mode, no context is shown. That
654                 is, the -A, -B, and -C options are ignored. If there is  more
655                 than  one  match in a line, each of them is shown separately,
656                 on a separate line of output.  If  -o  is  combined  with  -v
657                 (invert  the  sense of the match to find non-matching lines),
658                 no output is generated, but the return code is set  appropri‐
659                 ately.  If  the matched portion of the line is empty, nothing
660                 is output unless the file name or line number is being
661                 printed,  in  which case they are shown on an otherwise empty
662                 line.  This  option  is  mutually  exclusive  with  --output,
663                 --file-offsets and --line-offsets.
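
                    A short sketch of -o on a made-up line that contains
                    three matches (the input and variable name are
                    assumptions):

```shell
# Each match is written on its own output line.
words=$(printf 'cat hat bat\n' | pcre2grep -o '[a-z]at')
printf '%s\n' "$words"
```

                    The three matches cat, hat, and bat are shown
                    separately, one per line.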
664
665       -onumber, --only-matching=number
666                 Show  only  the  part  of the line that matched the capturing
667                 parentheses of the given number. Up to 50 capturing parenthe‐
668                 ses  are  supported by default. This limit can be changed via
669                 the --om-capture option. A pattern may contain any number  of
670                 capturing  parentheses, but only those whose number is within
671                 the limit can be accessed by -o. An error occurs if the  num‐
672                 ber specified by -o is greater than the limit.
673
674                 -o0 is the same as -o without a number. Because these options
675                 can be given without an argument (see above), if an  argument
676                 is  present,  it  must  be  given in the same shell item, for
677                 example, -o3 or --only-matching=2. The comments given for the
678                 non-argument  case  above  also  apply to this option. If the
679                 specified capturing parentheses do not exist in the  pattern,
680                 or  were  not  set in the match, nothing is output unless the
681                 file name or line number is being output.
682
683                 If this option is given multiple times,  multiple  substrings
684                 are  output  for  each  match,  in  the order the options are
685                 given, and all on one line. For example, -o3 -o1  -o3  causes
686                 the  substrings  matched by capturing parentheses 3 and 1 and
687                 then 3 again to be output. By default, there is no  separator
688                 (but see --om-separator below).
689
690       --om-capture=number
691                 Set  the number of capturing parentheses that can be accessed
692                 by -o. The default is 50.
693
694       --om-separator=text
695                 Specify a separating string for multiple occurrences  of  -o.
696                 The  default is an empty string. Separating strings are never
697                 coloured.
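
                    The numbered form of -o and --om-separator can be
                    combined to reorder captured substrings. The sketch below
                    uses a made-up date string and an arbitrary "/"
                    separator; both are assumptions for illustration.

```shell
# Output capture 3, then capture 1, separated by "/".
reordered=$(printf '2020-01-25\n' |
  pcre2grep -o3 -o1 --om-separator=/ '(\d+)-(\d+)-(\d+)')
printf '%s\n' "$reordered"
```

                    The substrings appear in the order the options are
                    given, here the day followed by the year: 25/2020.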
698
699       -q, --quiet
700                 Work quietly, that is, display nothing except error messages.
701                 The  exit  status  indicates  whether or not any matches were
702                 found.
703
704       -r, --recursive
705                 If any given path is a directory, recursively scan the  files
706                 it  contains, taking note of any --include and --exclude set‐
707                 tings. By default, a directory is read as a normal  file;  in
708                 some  operating  systems this gives an immediate end-of-file.
709                 This option is a shorthand  for  setting  the  -d  option  to
710                 "recurse".
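
                    The following sketch builds a throwaway directory tree
                    and scans it recursively. The directory layout and file
                    names are hypothetical. As noted under OPTIONS
                    COMPATIBILITY, the argument of --include is a regular
                    expression in pcre2grep, not a glob.

```shell
# Create a scratch tree containing one .txt file and one .log file.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'needle\n' > "$workdir/sub/a.txt"
printf 'needle\n' > "$workdir/sub/b.log"

# List matching files, restricted to names ending in ".txt".
found=$(pcre2grep -r -l --include='\.txt$' 'needle' "$workdir")
printf '%s\n' "$found"
rm -r "$workdir"
```

                    Only the path of a.txt is listed; b.log also contains
                    the text but is excluded by the --include setting.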
711
712       --recursion-limit=number
713                 Obsolete synonym for --depth-limit; see --match-limit above.
714
715       -s, --no-messages
716                 Suppress  error  messages  about  non-existent  or unreadable
717                 files. Such files are quietly skipped.  However,  the  return
718                 code is still 2, even if matches were found in other files.
719
720       -t, --total-count
721                 This  option  is  useful when scanning more than one file. If
722                 used on its own, -t suppresses all output except for a  grand
723                 total  number  of matching lines (or non-matching lines if -v
724                 is used) in all the files. If -t is used  with  -c,  a  grand
725                 total  is  output except when the previous output is just one
726                 line. In other words, it is not output when just  one  file's
727                 count  is  listed.  If file names are being output, the grand
728                 total is preceded by "TOTAL:". Otherwise, it appears as  just
729                 another  number.  The  -t option is ignored when used with -L
730                 (list files without matches), because the grand  total  would
731                 always be zero.
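
                    A sketch of -t used together with -c on two scratch
                    files (the file names f1 and f2 and their contents are
                    made up for the example):

```shell
# Two hypothetical files with two and one matching lines respectively.
countdir=$(mktemp -d)
printf 'x\nx\ny\n' > "$countdir/f1"
printf 'x\n' > "$countdir/f2"

# Per-file counts followed by the grand total.
totals=$(cd "$countdir" && pcre2grep -c -t 'x' f1 f2)
printf '%s\n' "$totals"
rm -r "$countdir"
```

                    Because file names are being output, the final line is
                    expected to carry the TOTAL: prefix.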
732
733       -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
734                 has been compiled with UTF-8 support. All patterns (including
735                 those  for  any --exclude and --include options) and all sub‐
736                 ject lines that are scanned must be valid  strings  of  UTF-8
737                 characters.
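
                    The effect of UTF-8 mode can be sketched with a pattern
                    whose dot has to match a two-byte character. The sample
                    word is an assumption, and the example presumes a PCRE2
                    build with UTF-8 support.

```shell
# \303\257 is the two-byte UTF-8 encoding of U+00EF (i with diaeresis).
# "|| true" keeps the script going when no line matches (exit status 1).
utf_count=$(printf 'na\303\257ve\n' | pcre2grep -u -c 'na.ve') || true
byte_count=$(printf 'na\303\257ve\n' | pcre2grep -c 'na.ve') || true
echo "with -u: $utf_count  without: $byte_count"
```

                    With -u the dot matches the whole character, so the line
                    is counted; without it the dot matches a single byte and
                    the pattern fails.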
738
739       -U, --utf-allow-invalid
740                 As  --utf,  but in addition subject lines may contain invalid
741                 UTF-8 code unit sequences. These can never form part  of  any
742                 pattern match. This facility allows valid UTF-8 strings to be
743                 sought in executable or other binary files.  For more details
744                 about  matching in non-valid UTF-8 strings, see the pcre2uni‐
745                 code(3) documentation.
746
747       -V, --version
748                 Write the version numbers of pcre2grep and the PCRE2  library
749                 to  the  standard  output and then exit. Anything else on the
750                 command line is ignored.
751
752       -v, --invert-match
753                 Invert the sense of the match, so that  lines  which  do  not
754                 match any of the patterns are the ones that are found.
755
756       -w, --word-regex, --word-regexp
757                 Force the patterns only to match "words". That is, there must
758                 be a word boundary at the  start  and  end  of  each  matched
759                 string.  This is equivalent to having "\b(?:" at the start of
760                 each pattern, and ")\b" at the end. This option applies  only
761                 to  the  patterns  that  are  matched against the contents of
762                 files; it does not apply to patterns specified by any of  the
763                 --include or --exclude options.
764
765       -x, --line-regex, --line-regexp
766                 Force  the  patterns to start matching only at the beginnings
767                 of lines, and in  addition,  require  them  to  match  entire
768                 lines. In multiline mode the match may be more than one line.
769                 This is equivalent to having "^(?:" at the start of each pat‐
770                 tern  and  ")$"  at  the end. This option applies only to the
771                 patterns that are matched against the contents of  files;  it
772                 does  not apply to patterns specified by any of the --include
773                 or --exclude options.
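
                    The difference between -w and -x can be sketched on
                    three made-up lines (the sample text and variable names
                    are assumptions):

```shell
# The same input is scanned with word matching and with line matching.
sample='cat\ncatalog\nthe cat sat\n'
word_hits=$(printf "$sample" | pcre2grep -w 'cat')
line_hits=$(printf "$sample" | pcre2grep -x 'cat')
printf 'with -w:\n%s\nwith -x:\n%s\n' "$word_hits" "$line_hits"
```

                    With -w, "catalog" is rejected because there is no word
                    boundary after "cat"; with -x, only the line consisting
                    of "cat" alone is found.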
774

ENVIRONMENT VARIABLES

776
777       The environment variables LC_ALL and LC_CTYPE  are  examined,  in  that
778       order,  for  a  locale.  The first one that is set is used. This can be
779       overridden by the --locale option. If  no  locale  is  set,  the  PCRE2
780       library's default (usually the "C" locale) is used.
781

NEWLINES

783
784       The  -N  (--newline) option allows pcre2grep to scan files with newline
785       conventions that differ from the default. This option affects only  the
786       way  scanned files are processed. It does not affect the interpretation
787       of  files  specified  by  the  -f,  --file-list,   --exclude-from,   or
788       --include-from options.
789
790       Any  parts  of the scanned input files that are written to the standard
791       output are copied with whatever newline  sequences  they  have  in  the
792       input.  However, if the final line of a file is output, and it does not
793       end with a newline sequence, a newline sequence is added. If  the  new‐
794       line  setting  is  CR, LF, CRLF or NUL, that line ending is output; for
795       the other settings (ANYCRLF or ANY) a single LF is used.
796
797       The newline setting does not affect the way in which  pcre2grep  writes
798       newlines  in  informational  messages  to the standard output and error
799       streams.  Under Windows, the standard output is set to  be  binary,  so
800       that  "\r\n" at the ends of output lines that are copied from the input
801       is not converted to "\r\r\n" by the C I/O library. This means that  any
802       messages  written  to the standard output must end with "\r\n". For all
803       other operating systems, and for all messages  to  the  standard  error
804       stream, "\n" is used.
805

OPTIONS COMPATIBILITY

807
808       Many of the short and long forms of pcre2grep's options are the same as
809       in the GNU grep program. Any long option of the form --xxx-regexp  (GNU
810       terminology) is also available as --xxx-regex (PCRE2 terminology). How‐
811       ever, the  --depth-limit,  --file-list,  --file-offsets,  --heap-limit,
812       --include-dir,  --line-offsets,  --locale,  --match-limit, -M, --multi‐
813       line, -N, --newline,  --om-separator,  --output,  -u,  --utf,  -U,  and
814       --utf-allow-invalid options are specific to pcre2grep, as is the use of
815       the --only-matching option with a capturing parentheses number.
816
817       Although most of the common options work the same way, a few  are  dif‐
818       ferent  in pcre2grep. For example, the --include option's argument is a
819       glob for GNU grep, but a regular expression for pcre2grep. If both  the
820       -c  and  -l  options are given, GNU grep lists only file names, without
821       counts, but pcre2grep gives the counts as well.
822

OPTIONS WITH DATA

824
825       There are four different ways in which an option with data can be spec‐
826       ified.   If  a  short  form option is used, the data may follow immedi‐
827       ately, or (with one exception) in the next command line item. For exam‐
828       ple:
829
830         -f/some/file
831         -f /some/file
832
833       The  exception is the -o option, which may appear with or without data.
834       Because of this, if data is present, it must follow immediately in  the
835       same item, for example -o3.
836
837       If  a long form option is used, the data may appear in the same command
838       line item, separated by an equals character, or (with  two  exceptions)
839       it may appear in the next command line item. For example:
840
841         --file=/some/file
842         --file /some/file
843
844       Note,  however, that if you want to supply a file name beginning with ~
845       as data in a shell command, and have the  shell  expand  ~  to  a  home
846       directory, you must separate the file name from the option, because the
847       shell does not treat ~ specially unless it is at the start of an item.
848
849       The exceptions to the above are the --colour (or --color)  and  --only-
850       matching  options,  for  which  the  data  is optional. If one of these
851       options does have data, it must be given in the first  form,  using  an
852       equals character. Otherwise pcre2grep will assume that it has no data.
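
          The forms described in this section can be tried out with the -e
          option, whose long form is --regex. The sketch below is
          illustrative only; the input text and variable names are
          assumptions.

```shell
# Four equivalent ways of supplying the pattern "b" as option data.
four_forms=$(for form in '-eb' '-e b' '--regex=b' '--regex b'; do
  # $form is deliberately unquoted so that '-e b' splits into two items.
  printf 'abc\n' | pcre2grep $form
done)
printf '%s\n' "$four_forms"

# The -o exception: its data must be attached, as in -o2.
with_data=$(printf 'ab\n' | pcre2grep -o2 '(a)(b)')
printf '%s\n' "$with_data"
```

          All four forms find the line "abc", and -o2 outputs only the
          second captured substring.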
853

USING PCRE2'S CALLOUT FACILITY

855
856       pcre2grep  has,  by  default,  support for calling external programs or
857       scripts or echoing specific strings during matching by  making  use  of
858       PCRE2's  callout  facility.  However, this support can be completely or
859       partially disabled when pcre2grep is built. You can  find  out  whether
860       your  binary  has  support  for  callouts by running it with the --help
861       option. If callout support is completely disabled, all callouts in pat‐
862       terns are ignored by pcre2grep.  If the facility is partially disabled,
863       calling external programs is not supported, and callouts  that  request
864       it are ignored.
865
866       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu‐
867       ment is either a number or a quoted string (see the pcre2callout  docu‐
868       mentation  for  details).  Numbered  callouts are ignored by pcre2grep;
869       only callouts with string arguments are useful.
870
871   Calling external programs or scripts
872
873       This facility can be independently disabled when pcre2grep is built. It
874       is  supported for Windows, where a call to _spawnvp() is used, for VMS,
875       where lib$spawn() is used, and for any Unix-like environment
876       where fork() and execv() are available.
877
878       If the callout string does not start with a pipe (vertical bar) charac‐
879       ter, it is parsed into a list of substrings separated by  pipe  charac‐
880       ters.  The first substring must be an executable name, with the follow‐
881       ing substrings specifying arguments:
882
883         executable_name|arg1|arg2|...
884
885       Any substring  (including  the  executable  name)  may  contain  escape
886       sequences  started  by  a dollar character: $<digits> or ${<digits>} is
887       replaced by the captured substring of the given decimal  number,  which
888       must  be greater than zero. If the number is greater than the number of
889       capturing substrings, or if the capture is unset,  the  replacement  is
890       empty.
891
892       Any  other  character  is  substituted  by itself. In particular, $$ is
893       replaced by a single dollar and $| is replaced  by  a  pipe  character.
894       Here is an example:
895
896         echo -e "abcde\n12345" | pcre2grep \
897           '(?x)(.)(..(.))
898           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
899
900         Output:
901
902           Arg1: [a] [bcd] [d] Arg2: |a| ()
903           abcde
904           Arg1: [1] [234] [4] Arg2: |1| ()
905           12345
906
907       The  parameters  for the system call that is used to run the program or
908       script are zero-terminated strings. This means that binary zero charac‐
909       ters  in the callout argument will cause premature termination of their
910       substrings, and therefore should not be present. Any syntax  errors  in
911       the  string  (for  example, a dollar not followed by another character)
912       cause the callout to be ignored. If running the program fails  for  any
913       reason  (including the non-existence of the executable), a local match‐
914       ing failure occurs and the matcher backtracks in the normal way.
915
916   Echoing a specific string
917
918       This facility is always available, provided that callouts were not com‐
919       pletely disabled when pcre2grep was built. If the callout string starts
920       with a pipe (vertical bar) character, the rest of the string is written
921       to the output, having been passed through the same escape processing as
922       text from the --output option. This provides a simple echoing  facility
923       that  avoids  calling  an  external program or script. No terminator is
924       added to the string, so if you want a  newline,  you  must  include  it
925       explicitly.  Matching continues normally after the string is output. If
926       you want to see only the callout output but  not  any  output  from  an
927       actual match, you should end the relevant pattern with (*FAIL).
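
          A minimal sketch of the echoing facility, assuming a pcre2grep
          binary built with callout support. The input line and the text
          "found" are made up for the example; $1 and $n are processed as
          they would be for --output.

```shell
# The callout string starts with "|", so the rest is echoed, not executed.
# (*FAIL) suppresses the normal output of the matching line, which also
# makes the exit status 1; "|| true" keeps the script going.
echoed=$(printf 'abc\n' | pcre2grep '(b)(?C"|found $1$n")(*FAIL)') || true
printf '%s\n' "$echoed"
```

          Only the callout text is written; the line "abc" itself is not
          output because the pattern as a whole never matches.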
928

MATCHING ERRORS

930
931       It  is  possible  to supply a regular expression that takes a very long
932       time to fail to match certain lines.  Such  patterns  normally  involve
933       nested  indefinite repeats, for example: (a+)*\d when matched against a
934       line of a's with no final digit. The  PCRE2  matching  function  has  a
935       resource  limit that causes it to abort in these circumstances. If this
936       happens, pcre2grep outputs an error message and the  line  that  caused
937       the  problem  to  the  standard error stream. If there are more than 20
938       such errors, pcre2grep gives up.
939
940       The --match-limit option of pcre2grep can be used to  set  the  overall
941       resource  limit.  There are also other limits that affect the amount of
942       memory used during matching; see the  discussion  of  --heap-limit  and
943       --depth-limit above.
944

DIAGNOSTICS

946
947       Exit status is 0 if any matches were found, 1 if no matches were found,
948       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
949       files  (even if matches were found in other files) or too many matching
950       errors. Using the -s option to suppress error messages about inaccessi‐
951       ble files does not affect the return code.
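
          The three exit statuses can be observed with a short sketch such
          as the following, which uses -q to suppress normal output. The
          input and variable names are assumptions; the unbalanced
          parenthesis is a deliberate syntax error.

```shell
# Capture the exit status for a match, a non-match, and a bad pattern.
hit_status=0
printf 'abc\n' | pcre2grep -q 'b' || hit_status=$?
miss_status=0
printf 'abc\n' | pcre2grep -q 'z' || miss_status=$?
bad_status=0
printf 'abc\n' | pcre2grep -q '(' 2>/dev/null || bad_status=$?
echo "$hit_status $miss_status $bad_status"
```

          The three values printed are 0, 1, and 2 respectively.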
952
953       When   run  under  VMS,  the  return  code  is  placed  in  the  symbol
954       PCRE2GREP_RC because VMS  does  not  distinguish  between  exit(0)  and
955       exit(1).
956

SEE ALSO

958
959       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).
960

AUTHOR

962
963       Philip Hazel
964       University Computing Service
965       Cambridge, England.
966

REVISION

968
969       Last updated: 25 January 2020
970       Copyright (c) 1997-2020 University of Cambridge.
971
972
973
974PCRE2 10.35                     25 January 2020                   PCRE2GREP(1)