PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)

NAME

       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS

       pcre2grep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcre2grep  searches  files  for  character patterns, in the same way as
       other grep commands do,  but  it  uses  the  PCRE2  regular  expression
       library  to  support  patterns  that  are  compatible  with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of  pattern  syntax,  or  pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a  separate  file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they  are  interpreted  as
       part  of  the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are  interpreted  by  the  shell,  and
       indeed  quotes  are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated  as  the
       single  pattern  to be matched when neither -e nor -f is present.  Con‐
       versely, when one or both of these options are  used  to  specify  pat‐
       terns, all arguments are treated as path names. At least one of -e, -f,
       or an argument pattern must be provided.

       If no files are specified, pcre2grep  reads  the  standard  input.  The
       standard  input can also be referenced by a name consisting of a single
       hyphen.  For example:

         pcre2grep some-pattern file1 - file3

       Input files are searched line by  line.  By  default,  each  line  that
       matches  a  pattern  is  copied to the standard output, and if there is
       more than one file, the file name is output at the start of each  line,
       followed  by  a  colon.  However, there are options that can change how
       pcre2grep behaves. In particular, the -M option makes  it  possible  to
       search  for  strings  that  span  line  boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
       controlled  by  parameters  that  can  be  set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size  of  buffer
       that  is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be  needed;  this  is  handled  by
       automatically extending the buffer, up to the limit specified by --max-
       buffer-size. The default values for these parameters are specified when
       pcre2grep is built; if nothing is specified at build time, the defaults
       are 20K and 1M respectively. An error occurs if a line is too long  and
       the buffer can no longer be expanded.

       The  block  of  memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size  is too small, fewer than requested "before" and "after" lines may
       be output.

       Patterns can be no longer than 8K or BUFSIZ  bytes,  whichever  is  the
       greater.   BUFSIZ  is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to  each  line  in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further  patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or  --line-
       offsets  is  used  to  output  only  the  part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following  the  match,  so that further matches on the same line can be
       found. If there are multiple  patterns,  they  are  all  tried  on  the
       remainder  of  the  line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.

       This behaviour means that the order  in  which  multiple  patterns  are
       specified  can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages  to
       display  earlier  matches  for  later  patterns (as long as there is no
       overlap).
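
       The effect of pattern order can be illustrated with a small  Python
       model.  This is only a sketch of the scan described above: Python's re
       module stands in for PCRE2, and the function names are hypothetical. It
       is not how pcre2grep itself is implemented.

```python
import re

def first_nonempty(rx, line, offset):
    """Return the first non-empty match of rx at or after offset, or None."""
    for m in rx.finditer(line, offset):
        if m.end() > m.start():
            return m
    return None

def only_matching(patterns, line):
    """Model the documented scan when only the matched parts are shown."""
    compiled = [re.compile(p) for p in patterns]
    found, offset = [], 0
    while True:
        for rx in compiled:              # patterns are tried in order
            m = first_nonempty(rx, line, offset)
            if m:
                found.append(m.group(0))
                offset = m.end()         # resume right after the match
                break
        else:
            return found                 # no pattern matched the remainder

print(only_matching(["X", "Y"], "aYbXcY"))   # X first: the early Y is missed
print(only_matching(["X|Y"], "aYbXcY"))      # one pattern with alternatives
```

       In this model, giving X before Y reports the X and only the Y that
       follows it, whereas the single pattern X|Y reports all three matches.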

       Patterns that can match an empty string are accepted, but empty  string
       matches   are   never   recognized.   An   example   is   the   pattern
       "(super)?(man)?", in which all components are  optional.  This  pattern
       finds  all  occurrences  of  both "super" and "man"; the output differs
       from matching with "super|man" when only the  matching  substrings  are
       being shown.
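
       The difference between the two patterns can be sketched  in  Python,
       again using the re module as a stand-in for PCRE2. The helper name and
       the sample line are made up for illustration; empty matches are simply
       filtered out to mirror the rule stated above.

```python
import re

def shown_substrings(pattern, line):
    """Return the non-empty matches only; empty matches are discarded."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

line = "superman and a man"
print(shown_substrings("(super)?(man)?", line))   # "superman" is one match
print(shown_substrings("super|man", line))        # "super" and "man" apart
```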

       If  the  LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library.  The --locale
       option can be used to override this.

SUPPORT FOR COMPRESSED FILES

       It  is  possible to compile pcre2grep so that it uses libz or libbz2 to
       read files whose names end in .gz or .bz2, respectively. You  can  find
       out whether your binary has support for one or both of these file types
       by running it with the --help option. If the appropriate support is not
       present,  files are treated as plain text. The standard input is always
       treated as plain text.

BINARY FILES

       By default, a file that contains a binary zero byte  within  the  first
       1024  bytes is identified as a binary file, and is processed specially.
       (GNU grep also  identifies  binary  files  in  this  manner.)  See  the
       --binary-files  option for a means of changing the way binary files are
       handled.
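
       The test described above amounts to looking for a zero byte  in  the
       first 1024 bytes. A minimal Python sketch of that rule follows; the
       function name and parameter are hypothetical, and this is not the
       pcre2grep source.

```python
def looks_binary(data: bytes, probe_size: int = 1024) -> bool:
    """Return True if a zero byte occurs within the first probe_size bytes."""
    return b"\x00" in data[:probe_size]

print(looks_binary(b"plain text\n"))            # no zero byte at all
print(looks_binary(b"ELF\x00\x01\x02"))         # zero byte near the start
print(looks_binary(b"a" * 1024 + b"\x00"))      # zero byte beyond the probe
```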

OPTIONS

       The order in which some of the options appear can  affect  the  output.
       For  example,  both  the  -h and -l options affect the printing of file
       names. Whichever comes later in the command line will be the  one  that
       takes  effect.  Similarly,  except  where  noted below, if an option is
       given twice, the later setting is used. Numerical  values  for  options
       may  be  followed  by  K  or  M,  to  signify multiplication by 1024 or
       1024*1024 respectively.
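
       The K and M suffixes can be worked through in a few lines of  Python.
       This is a sketch only: the function name is hypothetical and it handles
       just the upper-case suffixes named above. The last line applies the
       "three times the buffer size" rule from the DESCRIPTION section to the
       20K default.

```python
def parse_number(text: str) -> int:
    """Convert an option value such as "20K" or "1M" to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)

print(parse_number("20K"))        # 20 * 1024
print(parse_number("1M"))         # 1024 * 1024
print(3 * parse_number("20K"))    # memory block used for a 20K buffer size
```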

       --        This terminates the list of options. It is useful if the next
                 item  on  the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns  and  file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output  up  to  number  lines  of context after each matching
                 line. Fewer lines are output if the next match or the end  of
                 the  file  is  reached,  or if the processing buffer size has
                 been set too small. If file names  and/or  line  numbers  are
                 being  output,  a hyphen separator is used instead of a colon
                 for the context lines.  A  line  containing  "--"  is  output
                 between each group of lines, unless they are in fact contigu‐
                 ous in the input file. The value of number is expected to  be
                 relatively small. When -c is used, -A is ignored.

       -a, --text
                 Treat  binary  files as text. This is equivalent to --binary-
                 files=text.

       -B number, --before-context=number
                 Output up to number lines of  context  before  each  matching
                 line.  Fewer  lines  are  output if the previous match or the
                 start of the file is within number lines, or if the  process‐
                 ing  buffer size has been set too small. If file names and/or
                 line numbers are being output, a  hyphen  separator  is  used
                 instead  of  a colon for the context lines. A line containing
                 "--" is output between each group of lines, unless  they  are
                 in  fact contiguous in the input file. The value of number is
                 expected to be relatively small.  When  -c  is  used,  -B  is
                 ignored.

       --binary-files=word
                 Specify  how binary files are to be processed. If the word is
                 "binary" (the default),  pattern  matching  is  performed  on
                 binary  files,  but  the  only  output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text",  which
                 is  equivalent  to  the -a or --text option, binary files are
                 processed in the same way as any other file.  In  this  case,
                 when  a  match  succeeds,  the  output may be binary garbage,
                 which can have nasty effects if sent to a  terminal.  If  the
                 word  is  "without-match",  which  is  equivalent  to  the -I
                 option, binary files are  not  processed  at  all;  they  are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory  is  obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output number lines of context both  before  and  after  each
                 matching  line.  This is equivalent to setting both -A and -B
                 to the same value.
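
                 The context rules for -A, -B, and -C can be modelled roughly
                 in Python. This sketch assumes line numbers are being output
                 so that the colon and hyphen separators are visible; the
                 function name is hypothetical and buffer-size effects are
                 ignored.

```python
def grep_context(lines, is_match, before=0, after=0):
    """Return numbered output lines with context and "--" group separators."""
    matched = {i for i, text in enumerate(lines) if is_match(text)}
    shown = set()
    for i in matched:
        shown.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    out, prev = [], None
    for i in sorted(shown):
        # A "--" line separates groups that are not contiguous in the input.
        if (before or after) and prev is not None and i != prev + 1:
            out.append("--")
        sep = ":" if i in matched else "-"   # colon for matches, hyphen for context
        out.append(f"{i + 1}{sep}{lines[i]}")
        prev = i
    return out

lines = ["a", "hit", "b", "c", "d", "e", "hit", "f"]
for row in grep_context(lines, lambda t: "hit" in t, before=1, after=1):
    print(row)
```

                 With one line of context on each side, the two groups are
                 separated by "--"; with two lines they become contiguous and
                 no separator is produced by this model.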

       -c, --count
                 Do not output lines from the files that  are  being  scanned;
                 instead  output  the  number  of  lines  that would have been
                 shown, either because they matched, or, if -v is set, because
                 they  failed  to match. By default, this count is exactly the
                 same as the number of lines that would have been output,  but
                 if  the -M (multiline) option is used (without -v), there may
                 be more suppressed lines than the count (that is, the  number
                 of matches).

                 If  no lines are selected, the number zero is output. If sev‐
                 eral files are being scanned, a count is output for  each  of
                 them  and  the  -t option can be used to cause a total to be
                 output at  the  end.  However,  if  the  --files-with-matches
                 option  is  also  used,  only  those  files  whose counts are
                 greater than zero are listed. When -c is used,  the  -A,  -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto".  If data is required, it must  be  given  in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By  default,  the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto".  In
                 the  latter case, colouring happens only if the standard out‐
                 put is connected to a terminal. More resources are used  when
                 colouring is enabled, because pcre2grep has to search for all
                 possible matches in a line, not just one, in order to  colour
                 them all.

                 The  colour  that  is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR,  PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order.  If  none  of  these  are  set,  pcre2grep  looks  for
                 GREP_COLORS  or  GREP_COLOR (in that order). The value of the
                 variable should be a string of two numbers,  separated  by  a
                 semicolon,  except  in  the  case  of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours,  terminated  by the end of the string or by a colon.
                 If GREP_COLORS does not start  with  "ms="  or  "mt="  it  is
                 ignored, and GREP_COLOR is checked.

                 If  the  string obtained from one of the above variables con‐
                 tains any characters other than semicolon or digits, the set‐
                 ting is ignored and the default colour is used. The string is
                 copied directly into the control string for setting colour on
                 a  terminal,  so it is your responsibility to ensure that the
                 values make sense. If no  relevant  environment  variable  is
                 set, the default is "1;31", which gives red.
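
                 The lookup order for the colour string can be summarised  in
                 a short Python sketch. It follows the rules as worded above;
                 the function name is hypothetical, the environment is passed
                 in as a dictionary, and treating an empty value as invalid is
                 an assumption of this sketch.

```python
import re

DEFAULT_COLOUR = "1;31"
OWN_VARIABLES = ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
                 "PCREGREP_COLOUR", "PCREGREP_COLOR")

def choose_colour(env):
    """Pick the colour string from an environment mapping."""
    value = None
    for name in OWN_VARIABLES:               # checked in this order
        if name in env:
            value = env[name]
            break
    if value is None and "GREP_COLORS" in env:
        # Must start with "ms=" or "mt="; the value ends at a colon or the end.
        m = re.match(r"m[st]=([^:]*)", env["GREP_COLORS"])
        if m:
            value = m.group(1)
    if value is None and "GREP_COLOR" in env:
        value = env["GREP_COLOR"]
    # Anything other than digits and semicolons falls back to the default.
    if value is None or not re.fullmatch(r"[0-9;]+", value):
        return DEFAULT_COLOUR
    return value

print(choose_colour({}))
print(choose_colour({"GREP_COLORS": "ms=01;32:mc=01;31"}))
print(choose_colour({"GREP_COLORS": "sl=1:ms=4;34", "GREP_COLOR": "1;33"}))
```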

       -D action, --devices=action
                 If  an  input  path  is  not  a  regular file or a directory,
                 "action" specifies how it is to be  processed.  Valid  values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed.  Valid values are  "read"  (the  default  in
                 non-Windows  environments,  for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip"  (silently
                 skip  the  path, the default in Windows environments). In the
                 "read" case, directories are read as if  they  were  ordinary
                 files.  In  some  operating  systems  the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used mul‐
                 tiple times in order to specify several patterns. It can also
                 be  used  as a way of specifying a single pattern that starts
                 with a hyphen. When -e is used, no argument pattern is  taken
                 from  the  command  line;  all  arguments are treated as file
                 names. There is no limit to the number of patterns. They  are
                 applied  to  each line in the order in which they are defined
                 until one matches.

                 If -f is used with -e, the command line patterns are  matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified. Note  that
                 multiple  use  of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line  that  is  X or Y, whereas if the two patterns are given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no X in the line. This matters only if you are  using  -o  or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to  all  files,
                 whether  listed  on  the  command line, obtained from --file-
                 list, or by scanning a directory. The pattern is a PCRE2 reg‐
                 ular  expression,  and is matched against the final component
                 of the file name, not the entire path. The  -F,  -w,  and  -x
                 options do not apply to this pattern. The option may be given
                 any number of times in order to specify multiple patterns. If
                 a  file  name matches both an --include and an --exclude pat‐
                 tern, it is excluded. There is no short form for this option.

       --exclude-from=filename
                 Treat each non-empty line of the file  as  the  data  for  an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline  option
                 has  no  effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being  processed,  whatever  the  setting  of the --recursive
                 option. This applies to all directories,  whether  listed  on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular  expression,
                 and  is  matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do  not
                 apply  to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a  direc‐
                 tory  matches  both  --include-dir  and  --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching  pattern  as  a  list  of  fixed
                 strings,  separated  by  newlines,  instead  of  as a regular
                 expression. What constitutes a newline for  this  purpose  is
                 controlled  by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F.   They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present).  This  option applies only to the patterns that are
                 matched against the contents of files; it does not  apply  to
                 patterns  specified  by  any  of  the  --include or --exclude
                 options.

       -f filename, --file=filename
                 Read patterns from the file, one per  line,  and  match  them
                 against  each  line of input. What constitutes a newline when
                 reading the file  is  the  operating  system's  default.  The
                 --newline  option  has  no  effect  on this option.  Trailing
                 white space is removed from each line, and  blank  lines  are
                 ignored.  An  empty  file  contains no patterns and therefore
                 matches nothing. See also the comments  about  multiple  pat‐
                 terns  versus  a  single  pattern  with  alternatives  in the
                 description of -e above.

                 If this option is given more than  once,  all  the  specified
                 files  are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-"  to  refer  to  the
                 standard  input.  When  -f is used, patterns specified on the
                 command line using -e may also be present;  they  are  tested
                 before  the  file's  patterns.  However,  no other pattern is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.

       --file-list=filename
                 Read  a  list  of  files  and/or  directories  that are to be
                 scanned from the given file, one  per  line.  Trailing  white
                 space is removed from each line, and blank lines are ignored.
                 These paths are processed before any that are listed  on  the
                 command  line.  The file name can be given as "-" to refer to
                 the standard input.  If --file and --file-list are both spec‐
                 ified  as  "-",  patterns are read first. This is useful only
                 when the standard input is a  terminal,  from  which  further
                 lines  (the  list  of files) can be read after an end-of-file
                 indication. If this option is given more than once,  all  the
                 specified files are read.

       --file-offsets
                 Instead  of  showing lines or parts of lines that match, show
                 each match as an offset from the start  of  the  file  and  a
                 length,  separated  by  a  comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options  are  ignored.  If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually  exclusive  with  --line-
                 offsets and --only-matching.

       -H, --with-filename
                 Force  the  inclusion of the file name at the start of output
                 lines when searching a single file. By default, the file name
                 is not shown in this case.  For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is  used.  If  a line number is also being output, it follows
                 the file name. When the -M option causes a pattern  to  match
                 more  than  one  line, only the first is preceded by the file
                 name.

       -h, --no-filename
                 Suppress the output of file names  when  searching  multiple
                 files. By default, file names are shown when multiple files
                 are searched. For matching lines, the file name is followed
                 by a colon; for context lines, a hyphen separator is used. If
                 a line number is also being output, it follows the file name.

       --help    Output a help message, giving brief details  of  the  command
                 options  and  file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore  binary  files.  This  is  equivalent   to   --binary-
                 files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If  any --include patterns are specified, the only files that
                 are processed are those that match one of the  patterns  (and
                 do  not  match  an  --exclude  pattern). This option does not
                 affect directories, but it  applies  to  all  files,  whether
                 listed  on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular  expres‐
                 sion,  and is matched against the final component of the file
                 name, not the entire path. The -F, -w, and -x options do  not
                 apply  to this pattern. The option may be given any number of
                 times. If a file  name  matches  both  an  --include  and  an
                 --exclude  pattern,  it  is excluded.  There is no short form
                 for this option.
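
                 How --include and --exclude combine can be sketched in Python.
                 The function name is hypothetical, Python's re module stands
                 in for PCRE2, and the use of an unanchored search against the
                 final path component is an assumption of this sketch.

```python
import posixpath
import re

def is_file_processed(path, include=(), exclude=()):
    """Decide whether a file would be scanned under the documented rules."""
    name = posixpath.basename(path)          # only the final component counts
    if any(re.search(p, name) for p in exclude):
        return False                         # an --exclude match always wins
    if include and not any(re.search(p, name) for p in include):
        return False                         # must match one --include pattern
    return True

print(is_file_processed("src/main.c", include=[r"\.c$"]))
print(is_file_processed("src/main.c", include=[r"\.c$"], exclude=[r"^main"]))
print(is_file_processed("src/readme.txt", include=[r"\.c$"]))
```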

       --include-from=filename
                 Treat each non-empty line of the file  as  the  data  for  an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline  option  has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only  direc‐
                 tories  that  are  processed  are those that match one of the
                 patterns (and do not match an  --exclude-dir  pattern).  This
                 applies  to  all  directories,  whether listed on the command
                 line, obtained from --file-list,  or  by  scanning  a  parent
                 directory.  The pattern is a PCRE2 regular expression, and is
                 matched against the final component of  the  directory  name,
                 not  the entire path. The -F, -w, and -x options do not apply
                 to this pattern. The option may be given any number of times.
                 If  a directory matches both --include-dir and --exclude-dir,
                 it is excluded. There is no short form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just  output  the
                 names  of  the files that do not contain any lines that would
                 have been output. Each file name is output once, on  a  sepa‐
                 rate line.

       -l, --files-with-matches
                 Instead  of  outputting lines from the files, just output the
                 names of the files containing lines that would have been out‐
                 put.  Each  file  name  is  output  once, on a separate line.
                 Searching normally stops as soon as a matching line is  found
                 in  a  file.  However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count,  and
                 those  files  that  have  at least one match are listed along
                 with their counts. Using this option with -c is a way of sup‐
                 pressing the listing of files with no matches.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When  this  option is given, input is read and processed line
                 by line, and the output  is  flushed  after  each  write.  By
                 default,  input is read in large chunks, unless pcre2grep can
                 determine that it is reading from a terminal (which  is  cur‐
                 rently  possible  only  in Unix-like environments). Output to
                 a terminal is normally automatically flushed by the operating
                 system. This option can be useful when the input or output is
                 attached to a pipe and you do not want pcre2grep to buffer up
                 large  amounts  of data. However, its use will affect perfor‐
                 mance, and the -M (multiline) option ceases to work.

       --line-offsets
                 Instead of showing lines or parts of lines that  match,  show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a  colon
                 (as  usual; see the -n option), and the offset and length are
                 separated by a comma. In this  mode,  no  context  is  shown.
                 That  is, the -A, -B, and -C options are ignored. If there is
                 more than one match in a line, each of them  is  shown  sepa‐
                 rately. This option is mutually exclusive with --file-offsets
                 and --only-matching.

       --locale=locale-name
                 This option specifies a locale to be used for pattern  match‐
                 ing.  It  overrides the value in the LC_ALL or LC_CTYPE envi‐
                 ronment variables. If  no  locale  is  specified,  the  PCRE2
                 library's  default (usually the "C" locale) is used. There is
                 no short form for this option.

       --match-limit=number
                 Processing some regular expression  patterns  can  require  a
                 very  large amount of memory, leading in some cases to a pro‐
                 gram crash if not enough is available.   Other  patterns  may
                 take  a  very  long  time to search for all possible matching
                 strings.  The  pcre2_match()  function  that  is  called   by
                 pcre2grep  to  do  the  matching  has two parameters that can
                 limit the resources that it uses.

                 The  --match-limit  option  provides  a  means  of   limiting
                 resource usage when processing patterns that are not going to
                 match, but which have a very large number of possibilities in
                 their  search  trees.  The  classic example is a pattern that
                 uses nested unlimited repeats. Internally, PCRE2 uses a func‐
                 tion  called  match()  which  it  calls repeatedly (sometimes
                 recursively). The limit set by --match-limit  is  imposed  on
                 the  number  of times this function is called during a match,
                 which has the effect of limiting the amount  of  backtracking
                 that can take place.

                 The --recursion-limit option is similar to --match-limit, but
                 instead of limiting the total number of times that match() is
                 called, it limits the depth of recursive calls, which in turn
                 limits the amount of memory that can be used.  The  recursion
                 depth  is  a  smaller  number than the total number of calls,
                 because not all calls to match() are recursive. This limit is
                 of use only if it is set smaller than --match-limit.

                 There  are no short forms for these options. The default set‐
                 tings are specified when the PCRE2 library is compiled,  with
                 the built-in default being 10 million.
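
                 The effect of a call limit on nested unlimited repeats can be
                 seen in a toy backtracking matcher written in Python. It
                 handles only the fixed pattern (a+)+b, anchored at the start
                 of the subject. All names are hypothetical, and this is not
                 PCRE2's match() function, merely an illustration of how
                 counting calls bounds the backtracking.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher is called more often than the limit."""

def match_nested_repeat(subject, limit):
    """Match (a+)+b at the start of subject by naive backtracking.

    Returns (matched, calls), where calls counts the match() invocations.
    """
    calls = 0

    def match(pos):
        nonlocal calls
        calls += 1
        if calls > limit:
            raise MatchLimitExceeded(f"more than {limit} calls")
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Try every possible length for the next a+ group, longest first.
        for stop in range(end, pos, -1):
            if match(stop):
                return True
        # No further group worked: the repeat must end here with a "b".
        return pos > 0 and subject[pos:pos + 1] == "b"

    return match(0), calls

print(match_nested_repeat("aaab", 1000))     # succeeds almost immediately
print(match_nested_repeat("aaaac", 1000))    # fails after trying every split
try:
    match_nested_repeat("a" * 25 + "c", 1000)
except MatchLimitExceeded as exc:
    print("gave up:", exc)
```

                 In this toy, each extra "a" in a non-matching subject doubles
                 the number of calls, which is why a limit on calls stops the
                 search early.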
506
507       --max-buffer-size=number
508                 This  limits  the  expansion  of the processing buffer, whose
509                 initial size can be set by --buffer-size. The maximum  buffer
510                 size  is  silently  forced to be no smaller than the starting
511                 buffer size.
512
513       -M, --multiline
514                 Allow patterns to match more than one line. When this  option
515                 is set, the PCRE2 library is called in "multiline" mode. This
516                 allows a matched string to extend past the end of a line  and
517                 continue  on one or more subsequent lines. Patterns used with
518                 -M may usefully contain literal newline characters and inter‐
519                 nal  occurrences of ^ and $ characters. The output for a suc‐
520                 cessful match may consist of more than one  line.  The  first
521                 line  is  the  line  in which the match started, and the last
522                 line is the line in which the match  ended.  If  the  matched
523                 string  ends  with a newline sequence, the output ends at the
524                 end of that line.  If -v is set,  none  of  the  lines  in  a
525                 multi-line  match  are output. Once a match has been handled,
526                 scanning restarts at the beginning of the line after the  one
527                 in which the match ended.
528
529                 The  newline  sequence  that separates multiple lines must be
530                 matched as part of the pattern.  For  example,  to  find  the
531                 phrase  "regular  expression" in a file where "regular" might
532                 be at the end of a line and "expression" at the start of  the
533                 next line, you could use this command:
534
                      pcre2grep -M 'regular\s+expression' <file>
536
537                 The  \s  escape  sequence  matches any white space character,
538                 including newlines, and is followed  by  +  so  as  to  match
539                 trailing  white  space  on the first line as well as possibly
540                 handling a two-character newline sequence.
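
                    The command above can be tried on a small sample file, as
                    in this sketch. The file contents and the use of mktemp
                    are assumptions made for illustration only.

```shell
# Sample input in which the phrase is split across two lines.
tmp=$(mktemp)
printf 'This is a regular\nexpression split in two.\nUnrelated line.\n' > "$tmp"

# -M lets \s+ consume the newline between "regular" and "expression".
out=$(pcre2grep -M 'regular\s+expression' "$tmp")
printf '%s\n' "$out"
# Prints both lines spanned by the match:
#   This is a regular
#   expression split in two.

rm -f "$tmp"
```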
541
542                 There is a limit to the number of lines that can be  matched,
543                 imposed  by  the way that pcre2grep buffers the input file as
544                 it scans it. With a  sufficiently  large  processing  buffer,
545                 this should not be a problem, but the -M option does not work
                    when input is read line by line (see --line-buffered).
547
548       -N newline-type, --newline=newline-type
549                 The PCRE2 library supports  five  different  conventions  for
550                 indicating  the  ends of lines. They are the single-character
551                 sequences CR (carriage return) and LF  (linefeed),  the  two-
552                 character  sequence CRLF, an "anycrlf" convention, which rec‐
553                 ognizes any of the preceding three types, and an  "any"  con‐
554                 vention, in which any Unicode line ending sequence is assumed
555                 to end a line. The Unicode sequences are the three just  men‐
556                 tioned,  plus  VT  (vertical  tab,  U+000B),  FF  (form feed,
557                 U+000C),  NEL  (next  line,  U+0085),  LS  (line   separator,
558                 U+2028), and PS (paragraph separator, U+2029).
559
560                 When  the  PCRE2  library  is  built,  a  default line-ending
561                 sequence  is  specified.   This  is  normally  the   standard
562                 sequence for the operating system. Unless otherwise specified
563                 by this option, pcre2grep uses the  library's  default.   The
564                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
565                 ANY. This makes it possible to use pcre2grep  to  scan  files
566                 that have come from other environments without having to mod‐
567                 ify their line endings. If the data  that  is  being  scanned
568                 does  not  agree  with  the  convention  set  by this option,
569                 pcre2grep may behave in strange ways. Note that  this  option
570                 does  not apply to files specified by the -f, --exclude-from,
571                 or --include-from options, which  are  expected  to  use  the
572                 operating system's standard newline sequence.
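
                    A short sketch of scanning a file with CRLF line endings
                    on a system whose default is something else. The sample
                    data is invented for illustration.

```shell
tmp=$(mktemp)
# A file with DOS-style CRLF line endings.
printf 'one\r\ntwo\r\n' > "$tmp"

# With -N CRLF the two-character sequence ends each line, so the CR is not
# part of the line and ^two$ can match. -c prints the count of matching lines.
out=$(pcre2grep -N CRLF -c '^two$' "$tmp")
printf '%s\n' "$out"    # prints 1

rm -f "$tmp"
```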
573
574       -n, --line-number
575                 Precede each output line by its line number in the file, fol‐
576                 lowed by a colon for matching lines or a hyphen  for  context
577                 lines. If the file name is also being output, it precedes the
578                 line number. When the -M option causes  a  pattern  to  match
579                 more  than  one  line, only the first is preceded by its line
580                 number. This option is forced if --line-offsets is used.
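
                    The colon and hyphen markers can be seen by combining -n
                    with one line of trailing context (-A1). The sample file
                    below is an assumption for illustration.

```shell
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

# Line 2 matches (colon); line 3 is shown only as context (hyphen).
out=$(pcre2grep -n -A1 'beta' "$tmp")
printf '%s\n' "$out"
# 2:beta
# 3-gamma

rm -f "$tmp"
```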
581
582       --no-jit  If the PCRE2 library is built with support  for  just-in-time
583                 compiling (which speeds up matching), pcre2grep automatically
584                 makes use of this, unless it was explicitly disabled at build
585                 time.  This  option  can be used to disable the use of JIT at
586                 run time. It is provided for testing and working round  prob‐
587                 lems.  It should never be needed in normal use.
588
589       -o, --only-matching
590                 Show only the part of the line that matched a pattern instead
591                 of the whole line. In this mode, no context  is  shown.  That
592                 is,  the -A, -B, and -C options are ignored. If there is more
593                 than one match in a line, each of them is  shown  separately,
594                 on  a  separate  line  of  output.  If -o is combined with -v
595                 (invert the sense of the match to find  non-matching  lines),
596                 no  output is generated, but the return code is set appropri‐
597                 ately. If the matched portion of the line is  empty,  nothing
                    is output unless the file name or line number is being
599                 printed, in which case they are shown on an  otherwise  empty
600                 line.  This  option is mutually exclusive with --file-offsets
601                 and --line-offsets.
602
603       -onumber, --only-matching=number
604                 Show only the part of the line  that  matched  the  capturing
605                 parentheses of the given number. Up to 32 capturing parenthe‐
606                 ses are supported, and -o0 is equivalent to -o without a num‐
607                 ber.  Because  these options can be given without an argument
608                 (see above), if an argument is present, it must be  given  in
609                 the  same  shell item, for example, -o3 or --only-matching=2.
610                 The comments given for the non-argument case above also apply
611                 to  this  case. If the specified capturing parentheses do not
612                 exist in the pattern, or were not set in the  match,  nothing
                    is output unless the file name or line number is being
                    output.
615
616                 If this option is given multiple times,  multiple  substrings
617                 are  output  for  each  match,  in  the order the options are
618                 given, and all on one line. For example, -o3 -o1  -o3  causes
619                 the  substrings  matched by capturing parentheses 3 and 1 and
620                 then 3 again to be output. By default, there is no  separator
621                 (but see the next option).
622
623       --om-separator=text
624                 Specify  a  separating string for multiple occurrences of -o.
625                 The default is an empty string. Separating strings are  never
626                 coloured.
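
                    The following sketch shows plain -o, and then numbered -o
                    options joined by a separator. The input strings and the
                    "/" separator are arbitrary choices for illustration.

```shell
# Plain -o: each match in the line is shown on its own output line.
out1=$(printf 'a1b22c333\n' | pcre2grep -o '\d+')
printf '%s\n' "$out1"
# 1
# 22
# 333

# Capturing parentheses 3 and then 1, joined by "/" on one line.
out2=$(printf '2016-12-31\n' |
  pcre2grep -o3 -o1 --om-separator=/ '(\d+)-(\d+)-(\d+)')
printf '%s\n' "$out2"
# 31/2016
```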
627
628       -q, --quiet
629                 Work quietly, that is, display nothing except error messages.
630                 The exit status indicates whether or  not  any  matches  were
631                 found.
632
633       -r, --recursive
634                 If  any given path is a directory, recursively scan the files
635                 it contains, taking note of any --include and --exclude  set‐
636                 tings.  By  default, a directory is read as a normal file; in
637                 some operating systems this gives an  immediate  end-of-file.
638                 This  option  is  a  shorthand  for  setting the -d option to
639                 "recurse".
640
641       --recursion-limit=number
642                 See --match-limit above.
643
644       -s, --no-messages
645                 Suppress error  messages  about  non-existent  or  unreadable
646                 files.  Such  files  are quietly skipped. However, the return
647                 code is still 2, even if matches were found in other files.
648
649       -t, --total-count
650                 This option is useful when scanning more than  one  file.  If
651                 used  on its own, -t suppresses all output except for a grand
652                 total number of matching lines (or non-matching lines  if  -v
653                 is  used)  in  all  the files. If -t is used with -c, a grand
654                 total is output except when the previous output is  just  one
655                 line.  In  other words, it is not output when just one file's
656                 count is listed. If file names are being  output,  the  grand
657                 total  is preceded by "TOTAL:". Otherwise, it appears as just
658                 another number. The -t option is ignored when  used  with  -L
659                 (list  files  without matches), because the grand total would
660                 always be zero.
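
                    A small sketch of -t on two files. The file names and
                    contents are invented for illustration.

```shell
dir=$(mktemp -d)
printf 'error: a\nok\nerror: b\n' > "$dir/one.log"
printf 'ok\nerror: c\n' > "$dir/two.log"

# -t alone: only the grand total of matching lines across all files.
total=$(pcre2grep -t 'error' "$dir/one.log" "$dir/two.log")
printf '%s\n' "$total"    # prints 3

# -c with -t: per-file counts, followed by the grand total after "TOTAL:".
pcre2grep -c -t 'error' "$dir/one.log" "$dir/two.log"

rm -rf "$dir"
```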
661
662       -u, --utf-8
663                 Operate in UTF-8 mode. This option is available only if PCRE2
664                 has been compiled with UTF-8 support. All patterns (including
665                 those for any --exclude and --include options) and  all  sub‐
666                 ject  lines  that  are scanned must be valid strings of UTF-8
667                 characters.
668
669       -V, --version
670                 Write the version numbers of pcre2grep and the PCRE2  library
671                 to  the  standard  output and then exit. Anything else on the
672                 command line is ignored.
673
674       -v, --invert-match
675                 Invert the sense of the match, so that  lines  which  do  not
676                 match any of the patterns are the ones that are found.
677
678       -w, --word-regex, --word-regexp
679                 Force the patterns to match only whole words. This is equiva‐
680                 lent to having \b at the start and end of the  pattern.  This
681                 option  applies only to the patterns that are matched against
682                 the contents of files; it does not apply to  patterns  speci‐
683                 fied by any of the --include or --exclude options.
684
685       -x, --line-regex, --line-regexp
686                 Force  the  patterns to be anchored (each must start matching
687                 at the beginning of a line) and in addition, require them  to
688                 match  entire  lines. In multiline mode the match may be more
                    than one line. This is equivalent to having the \A and \Z
                    assertions at the start and end of each alternative top-level
691                 branch in every pattern. This option applies only to the pat‐
692                 terns that are matched against the contents of files; it does
693                 not apply to patterns specified by any of  the  --include  or
694                 --exclude options.
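
                    The difference between -w and -x can be sketched on a
                    three-line sample file, whose contents are an assumption
                    for illustration.

```shell
tmp=$(mktemp)
printf 'cat\ncatalog\nthe cat sat\n' > "$tmp"

# -w: "cat" must be a whole word, so "catalog" is rejected.
words=$(pcre2grep -w 'cat' "$tmp")
printf '%s\n' "$words"
# cat
# the cat sat

# -x: the pattern must match the entire line.
lines=$(pcre2grep -x 'cat' "$tmp")
printf '%s\n' "$lines"
# cat

rm -f "$tmp"
```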
695

ENVIRONMENT VARIABLES

697
698       The  environment  variables  LC_ALL  and LC_CTYPE are examined, in that
699       order, for a locale. The first one that is set is  used.  This  can  be
700       overridden  by  the  --locale  option.  If  no locale is set, the PCRE2
701       library's default (usually the "C" locale) is used.
702

NEWLINES

704
705       The -N (--newline) option allows pcre2grep to scan files with different
706       newline conventions from the default. Any parts of the input files that
707       are written to the standard output are copied identically,  with  what‐
708       ever  newline sequences they have in the input. However, the setting of
709       this option does not affect the interpretation of  files  specified  by
710       the -f, --exclude-from, or --include-from options, which are assumed to
711       use the operating system's  standard  newline  sequence,  nor  does  it
712       affect  the way in which pcre2grep writes informational messages to the
713       standard error and output streams. For these it uses the string "\n" to
714       indicate  newlines,  relying on the C I/O library to convert this to an
715       appropriate sequence.
716

OPTIONS COMPATIBILITY

718
719       Many of the short and long forms of pcre2grep's options are the same as
720       in  the GNU grep program. Any long option of the form --xxx-regexp (GNU
721       terminology) is also available as --xxx-regex (PCRE2 terminology). How‐
722       ever,  the  --file-list, --file-offsets, --include-dir, --line-offsets,
723       --locale, --match-limit, -M, --multiline, -N,  --newline,  --om-separa‐
724       tor,  --recursion-limit,  -u,  and  --utf-8  options  are  specific  to
725       pcre2grep, as is the use of the --only-matching option with a capturing
726       parentheses number.
727
728       Although  most  of the common options work the same way, a few are dif‐
729       ferent in pcre2grep. For example, the --include option's argument is  a
730       glob  for GNU grep, but a regular expression for pcre2grep. If both the
731       -c and -l options are given, GNU grep lists only  file  names,  without
732       counts, but pcre2grep gives the counts as well.
733

OPTIONS WITH DATA

735
736       There are four different ways in which an option with data can be spec‐
737       ified.  If a short form option is used, the  data  may  follow  immedi‐
738       ately, or (with one exception) in the next command line item. For exam‐
739       ple:
740
            -f/some/file
            -f /some/file
743
744       The exception is the -o option, which may appear with or without  data.
745       Because  of this, if data is present, it must follow immediately in the
746       same item, for example -o3.
747
748       If a long form option is used, the data may appear in the same  command
749       line  item,  separated by an equals character, or (with two exceptions)
750       it may appear in the next command line item. For example:
751
            --file=/some/file
            --file /some/file
754
755       Note, however, that if you want to supply a file name beginning with  ~
756       as  data  in  a  shell  command,  and have the shell expand ~ to a home
757       directory, you must separate the file name from the option, because the
758       shell does not treat ~ specially unless it is at the start of an item.
759
760       The  exceptions  to the above are the --colour (or --color) and --only-
761       matching options, for which the data  is  optional.  If  one  of  these
762       options  does  have  data, it must be given in the first form, using an
763       equals character. Otherwise pcre2grep will assume that it has no data.
764

CALLING EXTERNAL SCRIPTS

766
767       pcre2grep has, by default, support for  calling  external  programs  or
768       scripts during matching by making use of PCRE2's callout facility. How‐
769       ever, this support can be disabled when pcre2grep  is  built.  You  can
770       find  out  whether  your  binary has support for callouts by running it
771       with the --help option. If the support is not enabled, all callouts  in
772       patterns are ignored by pcre2grep.
773
774       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu‐
775       ment is either a number or a quoted string (see the pcre2callout  docu‐
776       mentation  for  details).  Numbered  callouts are ignored by pcre2grep.
777       String arguments are parsed as a list of substrings separated  by  pipe
778       (vertical  bar)  characters.  The first substring must be an executable
779       name, with the following substrings specifying arguments:
780
            executable_name|arg1|arg2|...
782
783       Any substring  (including  the  executable  name)  may  contain  escape
784       sequences  started  by  a dollar character: $<digits> or ${<digits>} is
785       replaced by the captured substring of the given decimal  number,  which
786       must  be greater than zero. If the number is greater than the number of
787       capturing substrings, or if the capture is unset,  the  replacement  is
788       empty.
789
790       Any  other  character  is  substituted  by itself. In particular, $$ is
791       replaced by a single dollar and $| is replaced  by  a  pipe  character.
792       Here is an example:
793
            echo -e "abcde\n12345" | pcre2grep \
              '(?x)(.)(..(.))
              (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

            Output:

              Arg1: [a] [bcd] [d] Arg2: |a| ()
              abcde
              Arg1: [1] [234] [4] Arg2: |1| ()
              12345
804
805       The parameters for the execv() system call that is used to run the pro‐
806       gram or script are zero-terminated strings. This means that binary zero
807       characters  in the callout argument will cause premature termination of
808       their substrings, and therefore  should  not  be  present.  Any  syntax
809       errors  in  the  string  (for example, a dollar not followed by another
810       character) cause the callout to be  ignored.  If  running  the  program
811       fails for any reason (including the non-existence of the executable), a
812       local matching failure occurs and the matcher backtracks in the  normal
813       way.
814

MATCHING ERRORS

816
817       It  is  possible  to supply a regular expression that takes a very long
818       time to fail to match certain lines.  Such  patterns  normally  involve
819       nested  indefinite repeats, for example: (a+)*\d when matched against a
820       line of a's with no final digit. The  PCRE2  matching  function  has  a
821       resource  limit that causes it to abort in these circumstances. If this
822       happens, pcre2grep outputs an error message and the  line  that  caused
823       the  problem  to  the  standard error stream. If there are more than 20
824       such errors, pcre2grep gives up.
825
826       The --match-limit option of pcre2grep can be used to  set  the  overall
827       resource  limit; there is a second option called --recursion-limit that
828       sets a limit on the amount of memory (usually stack) that is used  (see
829       the discussion of these options above).
830

DIAGNOSTICS

832
833       Exit status is 0 if any matches were found, 1 if no matches were found,
834       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
835       files  (even if matches were found in other files) or too many matching
836       errors. Using the -s option to suppress error messages about inaccessi‐
837       ble files does not affect the return code.
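
          A script can branch on these values. The sketch below collects the
          three exit statuses; the sample file and the missing file name are
          assumptions for illustration.

```shell
tmp=$(mktemp)
printf 'Thursday\n' > "$tmp"

# -q suppresses normal output, so only the exit status carries the result.
found=0;   pcre2grep -q 'Thursday' "$tmp" || found=$?     # 0: match found
missing=0; pcre2grep -q 'Friday' "$tmp"   || missing=$?   # 1: no match

# A non-existent file gives 2; -s hides the message but not the status.
failed=0;  pcre2grep -q -s 'Thursday' "${tmp}.missing" || failed=$?

echo "$found $missing $failed"    # prints 0 1 2
rm -f "$tmp"
```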
838

SEE ALSO

840
841       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).
842

AUTHOR

844
845       Philip Hazel
846       University Computing Service
847       Cambridge, England.
848

REVISION

850
851       Last updated: 31 December 2016
852       Copyright (c) 1997-2016 University of Cambridge.
853
854
855
856PCRE2 10.23                    31 December 2016                   PCRE2GREP(1)