PCREGREP(1)                 General Commands Manual                PCREGREP(1)

NAME

       pcregrep - a grep with Perl-compatible regular expressions.

SYNOPSIS

       pcregrep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression library
       to support patterns that are compatible with the regular expressions of
       Perl 5. See pcrepattern(3) for a full description of syntax and
       semantics of the regular expressions that PCRE supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used on the command line
       because they are interpreted by the shell, and indeed they are required
       if a pattern contains white space or shell metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.
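
       The argument rule above can be modelled in a few lines. This is a
       minimal Python sketch, not pcregrep's source: the function split_args
       and its inputs (the non-option arguments left after option processing,
       and a flag saying whether -e or -f was given) are hypothetical. It also
       falls back to "-" (standard input) when no paths remain.

```python
from typing import List, Optional, Tuple


def split_args(args: List[str], have_e_or_f: bool) -> Tuple[Optional[str], List[str]]:
    """Hypothetical model: split the non-option arguments into (pattern, paths)."""
    if have_e_or_f:
        # Patterns came from -e and/or -f, so every argument is a path name.
        pattern = None
        paths = list(args)
    else:
        if not args:
            raise ValueError("no pattern: give -e, -f, or an argument pattern")
        # The first argument is the single pattern; the rest are paths.
        pattern, paths = args[0], list(args[1:])
    # With no paths, standard input is read ("-" is its explicit name).
    return pattern, paths or ["-"]


print(split_args(["Thursday", "/etc/motd"], have_e_or_f=False))
# ('Thursday', ['/etc/motd'])
```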

       If no files are specified, pcregrep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches the pattern is copied to the
       standard output, and if there is more than one file, the file name is
       output at the start of each line. However, there are options that can
       change how pcregrep behaves. In particular, the -M option makes it
       possible to search for patterns that span line boundaries. What defines
       a line boundary is controlled by the -N (--newline) option.

       Patterns are limited to 8K or BUFSIZ characters, whichever is the
       greater. BUFSIZ is defined in <stdio.h>.

       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.

OPTIONS

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and
                 filenames that start with hyphens.

       -A number, --after-context=number
                 Output number lines of context after each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees that up to 8K of following text is
                 available for context output.

       -B number, --before-context=number
                 Output number lines of context before each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees that up to 8K of preceding text is
                 available for context output.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.
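
                 The selection of context lines and "--" separators can be
                 sketched in Python. This is an illustrative model, not
                 pcregrep's buffer-based implementation: the function
                 with_context is hypothetical, works on a list of lines held
                 in memory, and omits filename and line-number prefixes.
                 -C n corresponds to passing before=n and after=n.

```python
import re
from typing import List


def with_context(lines: List[str], pattern: str, before: int = 0, after: int = 0) -> List[str]:
    """Hypothetical model of -B/-A: matching lines plus context, "--" between groups."""
    rx = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            # Select the match and up to `before`/`after` neighbouring lines.
            keep.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    out: List[str] = []
    prev = -1
    for i in sorted(keep):
        # "--" separates groups that are not contiguous in the input; it is
        # only relevant when context lines were requested.
        if (before or after) and out and i != prev + 1:
            out.append("--")
        out.append(lines[i])
        prev = i
    return out


lines = ["a", "x1", "b", "c", "d", "x2", "e"]
print(with_context(lines, "x", before=1))
# ['a', 'x1', '--', 'd', 'x2']
```

                 With before=2 and after=2 the two groups overlap, so they
                 are printed as one block without a separator.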

       -c, --count
                 Do not output individual lines; instead just output a count
                 of the number of lines that would otherwise have been output.
                 If several files are given, a count is output for each of
                 them. In this mode, the -A, -B, and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is supplied, it must be given in the
                 same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the part of a
                 line that matched a pattern should be coloured in the output.
                 The value may be "never" (the default), "always", or "auto".
                 In the latter case, colouring happens only if the standard
                 output is connected to a terminal. The colour can be
                 specified by setting the environment variable PCREGREP_COLOUR
                 or PCREGREP_COLOR. The value of this variable should be a
                 string of two numbers, separated by a semicolon. They are
                 copied directly into the control string for setting colour on
                 a terminal, so it is your responsibility to ensure that they
                 make sense. If neither of the environment variables is set,
                 the default is "1;31", which gives red.
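
                 The following Python sketch shows how a colour value taken
                 from the environment might be wrapped around the matched
                 text. The function colour_match is hypothetical; the order
                 in which the two variables are consulted is assumed to follow
                 the order listed above, and the use of ANSI SGR sequences
                 (ESC [ value m to start, ESC [ 0 m to reset) is an assumption
                 rather than something taken from pcregrep's source.

```python
import os
from typing import Mapping, Optional


def colour_match(line: str, start: int, end: int,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical sketch: wrap line[start:end] in terminal colour codes."""
    if env is None:
        env = os.environ
    # Assumed lookup order: PCREGREP_COLOUR, then PCREGREP_COLOR, then "1;31".
    value = env.get("PCREGREP_COLOUR")
    if value is None:
        value = env.get("PCREGREP_COLOR")
    if value is None:
        value = "1;31"
    # Assumption: ANSI SGR sequences, ESC [ value m to start, ESC [ 0 m to reset.
    return f"{line[:start]}\x1b[{value}m{line[start:end]}\x1b[0m{line[end:]}"


print(repr(colour_match("hello world", 6, 11, env={})))
```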

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path). In the default case, directories are read as
                 if they were ordinary files. In some operating systems the
                 effect of reading a directory like this is an immediate
                 end-of-file.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is an overall maximum of 100 patterns. They
                 are applied to each line in the order in which they are
                 defined until one matches (or fails to match if -v is used).
                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file, independent of
                 the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two patterns are given
                 separately, pcregrep finds X if it is present, even if it
                 follows Y in the line. It finds Y only if there is no X in
                 the line. This really matters only if you are using -o to
                 show the portion of the line that matched.
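
                 The difference between several -e patterns and one pattern
                 with alternatives can be reproduced with Python's re module,
                 which, like PCRE, returns the leftmost match and tries
                 alternatives from left to right. The helper names are
                 hypothetical; the point is the order in which matches are
                 tried.

```python
import re
from typing import List, Optional


def first_pattern_match(line: str, patterns: List[str]) -> Optional[str]:
    """Several -e patterns: try each in order; the first one that matches wins."""
    for p in patterns:
        m = re.search(p, line)
        if m:
            return m.group(0)
    return None


def alternation_match(line: str, pattern: str) -> Optional[str]:
    """One pattern with alternatives: the leftmost match in the line wins."""
    m = re.search(pattern, line)
    return m.group(0) if m else None


line = "Y comes before X"
print(alternation_match(line, "X|Y"))         # Y
print(first_pattern_match(line, ["X", "Y"]))  # X
```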

       --exclude=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, any files
                 whose names match the pattern are excluded. The pattern is a
                 PCRE regular expression. If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.

       -F, --fixed-strings
                 Interpret each pattern as a list of fixed strings, separated
                 by newlines, instead of as a regular expression. The -w
                 (match as a word) and -x (match whole line) options can be
                 used with -F. They apply to each of the fixed strings. A line
                 is selected if any of the fixed strings are found in it
                 (subject to -w or -x, if present).
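
                 One way to model -F is to escape each fixed string, join the
                 results as alternatives, and apply the -w or -x anchoring
                 around the whole group. This Python sketch is an assumption
                 about the approach, not pcregrep's code; fixed_strings_regex
                 is a hypothetical name, and its word and whole_line
                 parameters stand for -w and -x.

```python
import re


def fixed_strings_regex(pattern: str, word: bool = False, whole_line: bool = False):
    """Hypothetical model of -F: newline-separated fixed strings as escaped alternatives."""
    strings = pattern.split("\n")
    body = "(?:" + "|".join(re.escape(s) for s in strings) + ")"
    if word:
        body = r"\b" + body + r"\b"  # -w, applied to every fixed string
    if whole_line:
        body = "^" + body + "$"      # -x, applied to every fixed string
    return re.compile(body)


rx = fixed_strings_regex("a.b\nfoo")
print(bool(rx.search("xa.by")), bool(rx.search("axb")))  # True False
```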

       -f filename, --file=filename
                 Read a number of patterns from the file, one per line, and
                 match them against each line of input. A data line is output
                 if any of the patterns match it. The filename can be given as
                 "-" to refer to the standard input. When -f is used, patterns
                 specified on the command line using -e may also be present;
                 they are tested before the file's patterns. However, no other
                 pattern is taken from the command line; all arguments are
                 treated as file names. There is an overall maximum of 100
                 patterns. Trailing white space is removed from each line, and
                 blank lines are ignored. An empty file contains no patterns
                 and therefore matches nothing.
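
                 The pattern-file rules above (command-line -e patterns
                 first, trailing white space removed, blank lines ignored, at
                 most 100 patterns in total) can be sketched in Python. The
                 function collect_patterns is hypothetical and takes the
                 file's lines as already-read strings.

```python
from typing import Iterable, List, Sequence

MAX_PATTERNS = 100  # overall maximum for -e and -f patterns together


def collect_patterns(e_patterns: Sequence[str], file_lines: Iterable[str]) -> List[str]:
    """Hypothetical model of -e with -f: -e patterns first, then one per file line."""
    patterns = list(e_patterns)
    for raw in file_lines:
        line = raw.rstrip()  # trailing white space (and the newline) is removed
        if line:             # blank lines are ignored
            patterns.append(line)
    if len(patterns) > MAX_PATTERNS:
        raise ValueError(f"too many patterns (maximum is {MAX_PATTERNS})")
    return patterns


print(collect_patterns(["^From:"], ["Subject  \n", "\n", "   \n", "Date\n"]))
# ['^From:', 'Subject', 'Date']
```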

       -H, --with-filename
                 Force the inclusion of the filename at the start of output
                 lines when searching a single file. By default, the filename
                 is not shown in this case. For matching lines, the filename
                 is followed by a colon and a space; for context lines, a
                 hyphen separator is used. If a line number is also being
                 output, it follows the file name without a space.

       -h, --no-filename
                 Suppress the output of filenames when searching multiple
                 files. By default, filenames are shown when multiple files
                 are searched. For matching lines, the filename is followed by
                 a colon and a space; for context lines, a hyphen separator is
                 used. If a line number is also being output, it follows the
                 file name without a space.

       --help    Output a brief help message and exit.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, only those
                 files whose names match the pattern are included. The pattern
                 is a PCRE regular expression. If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.
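
                 The interaction of --include and --exclude can be sketched
                 as a predicate. This Python model is hypothetical; in
                 particular, it assumes each pattern is searched for anywhere
                 in the bare file name (not the full path), which this page
                 does not specify.

```python
import re
from typing import Optional


def should_search(name: str, include: Optional[str] = None,
                  exclude: Optional[str] = None) -> bool:
    """Hypothetical model of --include/--exclude for a file found during recursion."""
    # Exclusion wins: a name that matches both patterns is excluded.
    if exclude is not None and re.search(exclude, name):
        return False
    # When --include is given, only names that match it are searched.
    if include is not None and not re.search(include, name):
        return False
    return True


print(should_search("main.c", include=r"\.c$"))                        # True
print(should_search("test_main.c", include=r"\.c$", exclude="^test"))  # False
```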

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching stops as soon as a matching line is found in a
                 file.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is given, patterns may usefully contain literal newline
                 characters and internal occurrences of ^ and $ characters.
                 The output for any one match may consist of more than one
                 line. When this option is set, the PCRE library is called in
                 "multiline" mode. There is a limit to the number of lines
                 that can be matched, imposed by the way that pcregrep buffers
                 the input file as it scans it. However, pcregrep ensures that
                 at least 8K characters or the rest of the document (whichever
                 is the shorter) are available for forward matching, and
                 similarly the previous 8K characters (or all the previous
                 characters, if fewer than 8K) are guaranteed to be available
                 for lookbehind assertions.
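
                 The effect of -M can be illustrated with Python's re module,
                 where re.MULTILINE plays the role of PCRE's "multiline" mode.
                 This is a sketch of the idea only: unlike pcregrep, it holds
                 the whole input in memory rather than scanning it through a
                 buffer, and widening the match to the lines it touches is an
                 assumed model of the multi-line output.

```python
import re

text = "first line\nfoo\nbar\nlast line\n"
pattern = r"^foo\nbar$"

# Line-by-line matching (the default) never sees a newline inside a line.
line_by_line = any(re.search(pattern, ln) for ln in text.splitlines())

# With -M the buffer is searched as a whole; re.MULTILINE lets ^ and $
# match at internal line boundaries.
m = re.search(pattern, text, re.MULTILINE)
assert m is not None

# Widen the match to the complete lines it touches, for output.
start = text.rfind("\n", 0, m.start()) + 1
end = text.find("\n", m.end())
if end == -1:
    end = len(text)
print(line_by_line)     # False
print(text[start:end])  # foo, then bar on the next line
```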

       -N newline-type, --newline=newline-type
                 The PCRE library supports five different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the
                 two-character sequence CRLF, an "anycrlf" convention, which
                 recognizes any of the preceding three types, and an "any"
                 convention, in which any Unicode line ending sequence is
                 assumed to end a line. The Unicode sequences are the three
                 just mentioned, plus VT (vertical tab, U+000B), FF (formfeed,
                 U+000C), NEL (next line, U+0085), LS (line separator,
                 U+2028), and PS (paragraph separator, U+2029).

                 When the PCRE library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise specified
                 by this option, pcregrep uses the library's default. The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
                 ANY. This makes it possible to use pcregrep on files that
                 have come from other environments without having to modify
                 their line endings. If the data that is being scanned does
                 not agree with the convention set by this option, pcregrep
                 may behave in strange ways.
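
                 The five conventions can be expressed as regular expressions
                 for splitting input into lines. This Python sketch is
                 illustrative (split_lines and NEWLINE_RE are hypothetical
                 names); CRLF is tried before a lone CR or LF so that it
                 counts as a single line ending.

```python
import re
from typing import List

# One splitting regex per -N value. CRLF is listed before a lone CR or LF so
# that it is treated as a single line ending.
NEWLINE_RE = {
    "CR": re.compile(r"\r"),
    "LF": re.compile(r"\n"),
    "CRLF": re.compile(r"\r\n"),
    "ANYCRLF": re.compile(r"\r\n|\r|\n"),
    "ANY": re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]"),
}


def split_lines(data: str, newline: str) -> List[str]:
    """Split data into lines under one convention, dropping the terminators."""
    parts = NEWLINE_RE[newline.upper()].split(data)
    if parts and parts[-1] == "":
        parts.pop()  # a final terminator does not start another line
    return parts


data = "a\r\nb\rc\nd"
for convention in ("LF", "CRLF", "ANYCRLF"):
    print(convention, split_lines(data, convention))
```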

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon and a space for matching lines or a
                 hyphen and a space for context lines. If the filename is also
                 being output, it precedes the line number.

       -o, --only-matching
                 Show only the part of the line that matched a pattern. In
                 this mode, no context is shown. That is, the -A, -B, and -C
                 options are ignored.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude
                 settings. By default, a directory is read as a normal file;
                 in some operating systems this gives an immediate
                 end-of-file. This option is a shorthand for setting the -d
                 option to "recurse".

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE
                 has been compiled with UTF-8 support. Both patterns and
                 subject lines must be valid strings of UTF-8 characters.

       -V, --version
                 Write the version numbers of pcregrep and the PCRE library
                 that is being used to the standard error stream.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is
                 equivalent to having \b at the start and end of the pattern.
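
                 Taken literally, the -w rule is a string transformation of
                 the pattern. This Python sketch (word_regexp is a
                 hypothetical name) follows that wording and shows the
                 whole-word effect. Its last lines show what the literal
                 reading implies for a pattern with top-level alternatives:
                 only the first branch gets a leading \b and only the last
                 gets a trailing one.

```python
import re


def word_regexp(pattern: str) -> str:
    r"""Hypothetical model of -w as described: \b at the start and end of the pattern."""
    return r"\b" + pattern + r"\b"


print(bool(re.search(word_regexp("cat"), "a cat sat")))    # True
print(bool(re.search(word_regexp("cat"), "concatenate")))  # False

# With top-level alternatives, "\bcat|dog\b" bounds only the outer ends,
# so the "cat" branch still matches at the start of "catalog".
print(bool(re.search(word_regexp("cat|dog"), "catalog")))  # True
```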

       -x, --line-regex, --line-regexp
                 Force the patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them to
                 match entire lines. This is equivalent to having ^ and $
                 characters at the start and end of each alternative branch in
                 every pattern.
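
                 The -x rule can be modelled by wrapping the pattern in a
                 non-capturing group, so that ^ and $ apply to every
                 alternative branch at once. This Python sketch is
                 illustrative; line_regexp is a hypothetical name.

```python
import re


def line_regexp(pattern: str) -> str:
    """Hypothetical model of -x: anchor every alternative branch at both ends."""
    # The non-capturing group makes ^ and $ apply to each branch of the pattern.
    return "^(?:" + pattern + ")$"


for candidate in ("dog", "hotdog", "cat food"):
    print(candidate, bool(re.search(line_regexp("cat|dog"), candidate)))
```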

ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE
       library's default (usually the "C" locale) is used.
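
       The lookup order can be written as a small function. This Python
       sketch is hypothetical (effective_locale is not part of pcregrep); it
       treats a variable as "set" if it is present in the environment mapping,
       and returns None to stand for the PCRE library's default.

```python
from typing import Mapping, Optional


def effective_locale(option: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical model of locale selection; None means the PCRE default."""
    if option is not None:               # --locale overrides the environment
        return option
    for name in ("LC_ALL", "LC_CTYPE"):  # examined in this order; first set wins
        if name in env:
            return env[name]
    return None                          # library default, usually "C"


print(effective_locale(None, {"LC_ALL": "de_DE", "LC_CTYPE": "fr_FR"}))  # de_DE
```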

NEWLINES

       The -N (--newline) option allows pcregrep to scan files with different
       newline conventions from the default. However, the setting of this
       option does not affect the way in which pcregrep writes information to
       the standard error and output streams. It uses the string "\n" in C
       printf() calls to indicate newlines, relying on the C I/O library to
       convert this to an appropriate sequence if the output is sent to a
       file.

OPTIONS COMPATIBILITY

       The majority of short and long forms of pcregrep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE terminology).
       However, the --locale, -M, --multiline, -u, and --utf-8 options are
       specific to pcregrep.

OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with one exception) it
       may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exception to the above is the --colour (or --color) option, for
       which the data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise it will
       be assumed that it has no data.
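
       The four forms, and the --colour exception, can be sketched as a parser
       for a single data-carrying option. This Python model is hypothetical:
       the function parse_data_option and its small option table cover only
       -f/--file, -e/--regex, and --colour/--color, and do not reproduce
       pcregrep's real option handling.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of options that take data: short letter -> long name.
SHORT_TO_LONG = {"f": "file", "e": "regex"}
LONG_WITH_DATA = {"file", "regex"}
OPTIONAL_DATA = {"colour", "color"}  # data may only be attached with "="


def parse_data_option(argv: List[str], i: int) -> Tuple[str, Optional[str], int]:
    """Parse a data-taking option at argv[i].

    Returns (long name, data, index of the next unconsumed item).
    """
    item = argv[i]
    if item.startswith("--"):
        name, eq, data = item[2:].partition("=")
        if eq:                                 # --file=/some/file
            return name, data, i + 1
        if name in OPTIONAL_DATA:              # bare --colour means --colour=auto
            return name, "auto", i + 1
        if name in LONG_WITH_DATA:             # --file /some/file
            return name, argv[i + 1], i + 2
        raise ValueError(f"--{name} is not a data option in this sketch")
    name = SHORT_TO_LONG[item[1]]
    if len(item) > 2:                          # -f/some/file
        return name, item[2:], i + 1
    return name, argv[i + 1], i + 2            # -f /some/file
```

       With ["--colour", "always"], only the first item is consumed; "always"
       is left to be treated as a pattern or path.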

MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcregrep outputs an error message and the line that caused the
       problem to the standard error stream. If there are more than 20 such
       errors, pcregrep gives up.

DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors and non-existent or inaccessible files (even if
       matches were found in other files) or too many matching errors. Using
       the -s option to suppress error messages about inaccessible files does
       not affect the return code.
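
       The exit status rules can be summarised as a function. This Python
       sketch is hypothetical (exit_status is not part of pcregrep): error
       conditions take precedence over matches, and the -s option does not
       appear because it does not affect the result.

```python
def exit_status(matched: bool, file_error: bool = False, syntax_error: bool = False,
                too_many_match_errors: bool = False) -> int:
    """Hypothetical model of pcregrep's exit status (-s does not change it)."""
    # Any error condition gives 2, even if matches were found in other files.
    if syntax_error or file_error or too_many_match_errors:
        return 2
    # Otherwise 0 means something matched and 1 means nothing did.
    return 0 if matched else 1


print(exit_status(matched=True, file_error=True))  # 2
```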

SEE ALSO

       pcrepattern(3), pcretest(1).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 16 April 2007
       Copyright (c) 1997-2007 University of Cambridge.

                                                                  PCREGREP(1)