PCREGREP(1)                 General Commands Manual                PCREGREP(1)

NAME

       pcregrep - a grep with Perl-compatible regular expressions.

SYNOPSIS

       pcregrep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression library
       to support patterns that are compatible with the regular expressions of
       Perl 5. See pcrepattern(3) for a full description of syntax and
       semantics of the regular expressions that PCRE supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed they are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches a pattern is copied to the standard
       output, and if there is more than one file, the file name is output at
       the start of each line, followed by a colon. However, there are options
       that can change how pcregrep behaves. In particular, the -M option
       makes it possible to search for patterns that span line boundaries.
       What defines a line boundary is controlled by the -N (--newline)
       option.

       Patterns are limited to 8K or BUFSIZ characters, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches (or fails to match when -v
       is used), no further patterns are considered. However, if --colour (or
       --color) is used to colour the matching substrings, or if
       --only-matching, --file-offsets, or --line-offsets is used to output
       only the part of the line that matched (either shown literally, or as
       an offset), scanning resumes immediately following the match, so that
       further matches on the same line can be found. If there are multiple
       patterns, they are all tried on the remainder of the line, but patterns
       that follow the one that matched are not tried on the earlier part of
       the line.

       This is the same behaviour as GNU grep, but it does mean that the order
       in which multiple patterns are specified can affect the output when one
       of the above options is used.

       Patterns that can match an empty string are accepted, but empty string
       matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.
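
       The difference between the two patterns can be modelled outside
       pcregrep. The following Python sketch is an illustration only: it uses
       Python's re module rather than PCRE, and the helper name
       nonempty_matches is hypothetical. It simply discards empty matches,
       which approximates the behaviour described above for this particular
       pattern.

```python
import re

def nonempty_matches(pattern, line):
    """Return the non-empty substrings matched by pattern, left to right."""
    return [m.group() for m in re.finditer(pattern, line) if m.group()]

line = "superman is a man, super"

# All components optional: "superman" is reported as a single match.
print(nonempty_matches(r"(super)?(man)?", line))  # ['superman', 'man', 'super']

# Plain alternation: "super" and "man" are always reported separately.
print(nonempty_matches(r"super|man", line))  # ['super', 'man', 'man', 'super']
```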

       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.

SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcregrep so that it uses libz or libbz2 to
       read files whose names end in .gz or .bz2, respectively. You can find
       out whether your binary has support for one or both of these file types
       by running it with the --help option. If the appropriate support is not
       present, files are treated as plain text. The standard input is always
       so treated.

OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -h and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect.

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and
                 filenames that start with hyphens.

       -A number, --after-context=number
                 Output number lines of context after each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees to have up to 8K of following text
                 available for context output.

       -B number, --before-context=number
                 Output number lines of context before each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees to have up to 8K of preceding text
                 available for context output.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.
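
                 The separator rules shared by -A, -B, and -C can be modelled
                 in a few lines. The Python sketch below is not pcregrep's
                 implementation: it uses Python's re module instead of PCRE,
                 the function name grep_context is hypothetical, and it
                 assumes that line numbers are being output (as with -n) so
                 that the colon and hyphen separators are visible. It also
                 assumes that "--" appears only when some context was
                 requested.

```python
import re

def grep_context(lines, pattern, before=0, after=0):
    """Model of -n output with -B/-A context (illustrative only)."""
    rx = re.compile(pattern)
    hits = {i for i, text in enumerate(lines) if rx.search(text)}

    # Collect the index of every line to show: matches plus their context.
    shown = set()
    for i in hits:
        first = max(0, i - before)
        last = min(len(lines) - 1, i + after)
        shown.update(range(first, last + 1))

    use_separator = before > 0 or after > 0  # assumption, see lead-in
    out = []
    previous = None
    for i in sorted(shown):
        # A "--" line marks a gap between non-contiguous groups.
        if use_separator and previous is not None and i != previous + 1:
            out.append("--")
        sep = ":" if i in hits else "-"  # colon for matches, hyphen for context
        out.append(f"{i + 1}{sep}{lines[i]}")
        previous = i
    return out
```

                 Calling grep_context(lines, "hit", before=1, after=1) is the
                 model's counterpart of using -C 1 together with -n.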

       -c, --count
                 Do not output individual lines from the files that are being
                 scanned; instead output the number of lines that would
                 otherwise have been shown. If no lines are selected, the
                 number zero is output. If several files are being scanned, a
                 count is output for each of them. However, if the
                 --files-with-matches option is also used, only those files
                 whose counts are greater than zero are listed. When -c is
                 used, the -A, -B, and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto". In
                 the last case, colouring happens only if the standard output
                 is connected to a terminal. More resources are used when
                 colouring is enabled, because pcregrep has to search for all
                 possible matches in a line, not just one, in order to colour
                 them all.

                 The colour that is used can be specified by setting the
                 environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The
                 value of this variable should be a string of two numbers,
                 separated by a semicolon. They are copied directly into the
                 control string for setting colour on a terminal, so it is
                 your responsibility to ensure that they make sense. If
                 neither of the environment variables is set, the default is
                 "1;31", which gives red.
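
                 To show how such a value ends up in a terminal control
                 string, here is a hedged Python sketch. It is a model, not
                 pcregrep's code: the function name colourise is hypothetical,
                 the control strings are the common ANSI "ESC [ ... m"
                 sequences, and the order in which the two environment
                 variables are consulted is an assumption.

```python
import os
import re

DEFAULT_COLOUR = "1;31"  # the documented default, which gives red

def colourise(line, pattern, environ=None):
    """Wrap each match of pattern in line in terminal colour codes."""
    env = os.environ if environ is None else environ
    # Assumption: PCREGREP_COLOUR is consulted before PCREGREP_COLOR.
    colour = (env.get("PCREGREP_COLOUR")
              or env.get("PCREGREP_COLOR")
              or DEFAULT_COLOUR)
    start = f"\x1b[{colour}m"  # the numbers are copied in unchanged
    reset = "\x1b[0m"          # restore the terminal's default rendition
    return re.sub(pattern, lambda m: start + m.group() + reset, line)
```

                 With a value such as "1;32" in either variable, the same
                 sequence would typically select bold green instead.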

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path). In the default case, directories are read as
                 if they were ordinary files. In some operating systems the
                 effect of reading a directory like this is an immediate
                 end-of-file.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is an overall maximum of 100 patterns. They
                 are applied to each line in the order in which they are
                 defined until one matches (or fails to match if -v is used).
                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file, independent of
                 the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two patterns are given
                 separately, pcregrep finds X if it is present, even if it
                 follows Y in the line. It finds Y only if there is no X in
                 the line. This really matters only if you are using -o to
                 show the part(s) of the line that matched.
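
                 The X and Y example can be reproduced with a small model of
                 the scanning loop. This Python sketch is an approximation
                 under stated assumptions: it uses Python's re module rather
                 than PCRE, the name only_matching is hypothetical, and
                 patterns that match the empty string are simply treated as
                 not matching.

```python
import re

def only_matching(line, patterns):
    """Model of -o with several patterns, tried in the order given."""
    found = []
    pos = 0
    while pos < len(line):
        for pattern in patterns:
            m = re.compile(pattern).search(line, pos)
            if m and m.end() > m.start():
                # The first pattern that matches wins; later patterns are
                # not tried on the part of the line before this match.
                found.append(m.group())
                pos = m.end()
                break
        else:
            break  # no pattern matched the remainder of the line
    return found

print(only_matching("aYbX", ["X", "Y"]))  # ['X']
print(only_matching("aYbX", ["X|Y"]))     # ['Y', 'X']
```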

       --exclude=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, any regular
                 files whose names match the pattern are excluded.
                 Subdirectories are not excluded by this option; they are
                 searched recursively, subject to the --exclude_dir and
                 --include_dir options. The pattern is a PCRE regular
                 expression, and is matched against the final component of the
                 file name (not the entire path). If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.

       --exclude_dir=pattern
                 When pcregrep is searching the contents of a directory as a
                 consequence of the -r (recursive search) option, any
                 subdirectories whose names match the pattern are excluded.
                 (Note that the --exclude option does not affect
                 subdirectories.) The pattern is a PCRE regular expression,
                 and is matched against the final component of the name (not
                 the entire path). If a subdirectory name matches both
                 --include_dir and --exclude_dir, it is excluded. There is no
                 short form for this option.

       -F, --fixed-strings
                 Interpret each pattern as a list of fixed strings, separated
                 by newlines, instead of as a regular expression. The -w
                 (match as a word) and -x (match whole line) options can be
                 used with -F. They apply to each of the fixed strings. A line
                 is selected if any of the fixed strings are found in it
                 (subject to -w or -x, if present).

       -f filename, --file=filename
                 Read a number of patterns from the file, one per line, and
                 match them against each line of input. A data line is output
                 if any of the patterns match it. The filename can be given as
                 "-" to refer to the standard input. When -f is used, patterns
                 specified on the command line using -e may also be present;
                 they are tested before the file's patterns. However, no other
                 pattern is taken from the command line; all arguments are
                 treated as file names. There is an overall maximum of 100
                 patterns. Trailing white space is removed from each line, and
                 blank lines are ignored. An empty file contains no patterns
                 and therefore matches nothing. See also the comments about
                 multiple patterns versus a single pattern with alternatives
                 in the description of -e above.
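
                 The clean-up applied to a pattern file can be sketched as
                 follows. This is a Python model of the rules just stated,
                 not pcregrep's own code; the name load_patterns is
                 hypothetical, and applying the limit of 100 to the file
                 alone (ignoring any -e patterns) is a simplification.

```python
MAX_PATTERNS = 100  # the overall maximum stated above

def load_patterns(text):
    """Return the patterns held in the text of a pattern file."""
    # Trailing white space is removed from each line...
    stripped = [line.rstrip() for line in text.splitlines()]
    # ...and lines that are then empty are ignored.
    patterns = [line for line in stripped if line]
    if len(patterns) > MAX_PATTERNS:
        raise ValueError("too many patterns")
    return patterns

print(load_patterns("foo  \n\n   \nbar\t\n"))  # ['foo', 'bar']
```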

       --file-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as an offset from the start of the file and a
                 length, separated by a comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with
                 --line-offsets and --only-matching.
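
                 Output in this form is convenient for post-processing. The
                 Python sketch below recovers the matched bytes from
                 "offset,length" lines. It rests on assumptions: the offsets
                 are taken to be zero-based byte offsets, a single file is
                 searched so that no file name precedes each entry, and the
                 name extract_matches is hypothetical.

```python
def extract_matches(data, offsets_output):
    """Slice the matched byte strings out of the file contents."""
    matches = []
    for entry in offsets_output.splitlines():
        # Each entry is an offset and a length, separated by a comma.
        offset, length = (int(field) for field in entry.split(","))
        matches.append(data[offset:offset + length])
    return matches

data = b"one two\nthree two\n"
print(extract_matches(data, "4,3\n14,3\n"))  # [b'two', b'two']
```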

       -H, --with-filename
                 Force the inclusion of the filename at the start of output
                 lines when searching a single file. By default, the filename
                 is not shown in this case. For matching lines, the filename
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being output, it follows
                 the file name.

       -h, --no-filename
                 Suppress the output of filenames when searching multiple
                 files. By default, filenames are shown when multiple files
                 are searched. For matching lines, the filename is followed by
                 a colon; for context lines, a hyphen separator is used. If a
                 line number is also being output, it follows the file name.

       --help    Output a help message, giving brief details of the command
                 options and file type support, and then exit.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, only those
                 regular files whose names match the pattern are included.
                 Subdirectories are always included and searched recursively,
                 subject to the --include_dir and --exclude_dir options. The
                 pattern is a PCRE regular expression, and is matched against
                 the final component of the file name (not the entire path).
                 If a file name matches both --include and --exclude, it is
                 excluded. There is no short form for this option.
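
                 How --include and --exclude interact for regular files can
                 be expressed as a small predicate. The Python sketch below
                 is a model only: it uses Python's re module in place of
                 PCRE, and the name file_selected is hypothetical.

```python
import os
import re

def file_selected(path, include=None, exclude=None):
    """Decide whether a regular file would be searched during -r."""
    name = os.path.basename(path)  # only the final component is tested
    if exclude is not None and re.search(exclude, name):
        return False               # exclusion wins over inclusion
    if include is not None and not re.search(include, name):
        return False
    return True

print(file_selected("src/main.c", include=r"\.c$"))                   # True
print(file_selected("src/main.c", include=r"\.c$", exclude="^main"))  # False
print(file_selected("main/readme.txt", exclude="^main"))              # True
```

                 The last call returns True because the directory part of
                 the path is not examined.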

       --include_dir=pattern
                 When pcregrep is searching the contents of a directory as a
                 consequence of the -r (recursive search) option, only those
                 subdirectories whose names match the pattern are included.
                 (Note that the --include option does not affect
                 subdirectories.) The pattern is a PCRE regular expression,
                 and is matched against the final component of the name (not
                 the entire path). If a subdirectory name matches both
                 --include_dir and --exclude_dir, it is excluded. There is no
                 short form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching normally stops as soon as a matching line is found
                 in a file. However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count, and
                 those files that have at least one match are listed along
                 with their counts. Using this option with -c is a way of
                 suppressing the listing of files with no matches.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, input is read and processed line
                 by line, and the output is flushed after each write. By
                 default, input is read in large chunks, unless pcregrep can
                 determine that it is reading from a terminal (which is
                 currently possible only in Unix environments). Output to a
                 terminal is normally automatically flushed by the operating
                 system. This option can be useful when the input or output is
                 attached to a pipe and you do not want pcregrep to buffer up
                 large amounts of data. However, its use will affect
                 performance, and the -M (multiline) option ceases to work.

       --line-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length are
                 separated by a comma. In this mode, no context is shown.
                 That is, the -A, -B, and -C options are ignored. If there is
                 more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with
                 --file-offsets and --only-matching.

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is given, patterns may usefully contain literal newline
                 characters and internal occurrences of ^ and $ characters.
                 The output for any one match may consist of more than one
                 line. When this option is set, the PCRE library is called in
                 "multiline" mode. There is a limit to the number of lines
                 that can be matched, imposed by the way that pcregrep buffers
                 the input file as it scans it. However, pcregrep ensures that
                 at least 8K characters or the rest of the document (whichever
                 is the shorter) are available for forward matching, and
                 similarly the previous 8K characters (or all the previous
                 characters, if fewer than 8K) are guaranteed to be available
                 for lookbehind assertions. This option does not work when
                 input is read line by line (see --line-buffered).

       -N newline-type, --newline=newline-type
                 The PCRE library supports five different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the
                 two-character sequence CRLF, an "anycrlf" convention, which
                 recognizes any of the preceding three types, and an "any"
                 convention, in which any Unicode line ending sequence is
                 assumed to end a line. The Unicode sequences are the three
                 just mentioned, plus VT (vertical tab, U+000B), FF (formfeed,
                 U+000C), NEL (next line, U+0085), LS (line separator,
                 U+2028), and PS (paragraph separator, U+2029).

                 When the PCRE library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise specified
                 by this option, pcregrep uses the library's default. The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
                 ANY. This makes it possible to use pcregrep on files that
                 have come from other environments without having to modify
                 their line endings. If the data that is being scanned does
                 not agree with the convention set by this option, pcregrep
                 may behave in strange ways.
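
                 The five conventions can be illustrated by splitting the
                 same text in different ways. The Python sketch below is a
                 model of the definitions above, not of PCRE's internals; it
                 works on decoded text rather than raw bytes, and the names
                 NEWLINE_PATTERNS and split_lines are hypothetical.

```python
import re

# One regular expression per convention, following the list above.
NEWLINE_PATTERNS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    "ANY": "\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}

def split_lines(text, convention="LF"):
    """Split text into lines under the named newline convention."""
    parts = re.split(NEWLINE_PATTERNS[convention], text)
    if parts and parts[-1] == "":
        parts.pop()  # a terminator at the very end adds no extra line
    return parts

print(split_lines("a\r\nb\nc", "LF"))       # ['a\r', 'b', 'c']
print(split_lines("a\r\nb\nc", "CRLF"))     # ['a', 'b\nc']
print(split_lines("a\r\nb\nc", "ANYCRLF"))  # ['a', 'b', 'c']
```

                 The first call shows the kind of mismatch warned about
                 above: with LF selected, a CRLF file leaves a stray carriage
                 return at the end of each line.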

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon for matching lines or a hyphen for
                 context lines. If the filename is also being output, it
                 precedes the line number. This option is forced if
                 --line-offsets is used.

       -o, --only-matching
                 Show only the part of the line that matched a pattern. In
                 this mode, no context is shown. That is, the -A, -B, and -C
                 options are ignored. If there is more than one match in a
                 line, each of them is shown separately. If -o is combined
                 with -v (invert the sense of the match to find non-matching
                 lines), no output is generated, but the return code is set
                 appropriately. This option is mutually exclusive with
                 --file-offsets and --line-offsets.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude
                 settings. By default, a directory is read as a normal file;
                 in some operating systems this gives an immediate
                 end-of-file. This option is a shorthand for setting the -d
                 option to "recurse".

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE
                 has been compiled with UTF-8 support. Both patterns and
                 subject lines must be valid strings of UTF-8 characters.

       -V, --version
                 Write the version numbers of pcregrep and the PCRE library
                 that is being used to the standard error stream.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is
                 equivalent to having \b at the start and end of the pattern.

       -x, --line-regex, --line-regexp
                 Force the patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them to
                 match entire lines. This is equivalent to having ^ and $
                 characters at the start and end of each alternative branch in
                 every pattern.
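
                 The equivalences stated for -w and -x can be tried out
                 directly. The Python sketch below follows the wording above
                 and uses Python's re module rather than PCRE; the helper
                 names word_regex and line_regex are hypothetical, and the
                 non-capturing group in line_regex is one way of anchoring
                 every alternative branch at once.

```python
import re

def word_regex(pattern):
    r"""Model -w by placing \b at the start and end of the pattern."""
    return r"\b" + pattern + r"\b"

def line_regex(pattern):
    """Model -x by anchoring every alternative branch with ^ and $."""
    return "^(?:" + pattern + ")$"

print(bool(re.search(word_regex("man"), "superman")))   # False
print(bool(re.search(word_regex("man"), "a man here")))  # True
print(bool(re.search(line_regex("cat|dog"), "hotdog")))  # False
print(bool(re.search(line_regex("cat|dog"), "dog")))     # True
```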

ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE
       library's default (usually the "C" locale) is used.
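
       The lookup order can be written down as a short function. This Python
       sketch is a model of the rules above, not pcregrep's code; the name
       effective_locale is hypothetical, and returning None stands for
       falling back to the PCRE library's default.

```python
def effective_locale(environ, locale_option=None):
    """Return the locale name that would be used, or None for the default."""
    if locale_option is not None:
        return locale_option      # --locale overrides the environment
    for name in ("LC_ALL", "LC_CTYPE"):
        if name in environ:
            return environ[name]  # the first variable that is set wins
    return None

print(effective_locale({"LC_CTYPE": "fr_FR", "LC_ALL": "en_GB"}))  # en_GB
print(effective_locale({"LC_ALL": "en_GB"}, "de_DE"))              # de_DE
```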

NEWLINES

       The -N (--newline) option allows pcregrep to scan files with different
       newline conventions from the default. However, the setting of this
       option does not affect the way in which pcregrep writes information to
       the standard error and output streams. It uses the string "\n" in C
       printf() calls to indicate newlines, relying on the C I/O library to
       convert this to an appropriate sequence if the output is sent to a
       file.

OPTIONS COMPATIBILITY

       The majority of short and long forms of pcregrep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE terminology).
       However, the --locale, -M, --multiline, -u, and --utf-8 options are
       specific to pcregrep. If both the -c and -l options are given, GNU grep
       lists only file names, without counts, but pcregrep gives the counts.

OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with one exception) it
       may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exception to the above is the --colour (or --color) option, for
       which the data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise it will
       be assumed that it has no data.

MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcregrep outputs an error message and the line that caused the
       problem to the standard error stream. If there are more than 20 such
       errors, pcregrep gives up.

DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors and non-existent or inaccessible files (even if
       matches were found in other files) or too many matching errors. Using
       the -s option to suppress error messages about inaccessible files does
       not affect the return code.
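
       A calling program can act on these values. The Python sketch below is
       one possible wrapper, offered under assumptions: the names
       classify_status and run_pcregrep are hypothetical, pcregrep is assumed
       to be on the PATH, and any status other than 0 or 1 is treated as an
       error.

```python
import subprocess

def classify_status(status):
    """Map a pcregrep exit status to a short description."""
    if status == 0:
        return "match"
    if status == 1:
        return "no match"
    return "error"  # 2: syntax error, inaccessible file, or matching errors

def run_pcregrep(args):
    """Run pcregrep with the given argument list and classify the result."""
    completed = subprocess.run(["pcregrep", *args],
                               capture_output=True, text=True)
    return classify_status(completed.returncode), completed.stdout
```

       Because status 2 is returned even if matches were found in other
       files, the captured output may still be worth reading when the result
       is "error".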

SEE ALSO

       pcrepattern(3), pcretest(1).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 21 May 2010
       Copyright (c) 1997-2010 University of Cambridge.

                                                                   PCREGREP(1)