pdfgrep(1)

PDFGREP(1)                      Pdfgrep Manual                      PDFGREP(1)

NAME

       pdfgrep - search PDF files for a regular expression

SYNOPSIS

       pdfgrep [OPTION...] PATTERN [FILE...]
       pdfgrep [OPTION...] [-e PATTERN | -f FILE] [FILE...]

DESCRIPTION

       Search for PATTERN in each PDF FILE and print matching lines. By
       default, PATTERN is an extended regular expression.

       pdfgrep tries to be mostly compatible with GNU grep, with some
       PDF-specific distinctions and additional options. Most notably, -n
       prints page numbers instead of line numbers.

OPTIONS

   General Information
       --help
           Print a short summary of the options.

       -V, --version
           Show version information.

   Pattern Interpretation
       -F, --fixed-strings
           Interpret PATTERN as a list of fixed strings separated by newlines,
           any of which is to be matched.

       -P, --perl-regexp
           Interpret PATTERN as a Perl compatible regular expression (PCRE).
           See pcresyntax(3) for a quick overview.

   Matching Control
       -e PATTERN, --regexp=PATTERN
           Use PATTERN as the pattern to search for. If this option is
           specified multiple times or combined with --file, all patterns are
           tried in turn until one of them matches.

       -f FILE, --file=FILE
           Read patterns from FILE, one per line. If FILE contains multiple
           patterns or if this option is applied multiple times or combined
           with -e, all patterns are tried in turn until one of them matches.
           An empty pattern list matches nothing.
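
           As a rough sketch of combining -e and -f, the helper below writes
           one pattern per line to a file that can then be passed to -f. The
           name make_pattern_file and the file names in the usage comment are
           assumptions made for illustration; they are not part of pdfgrep.

```shell
# Hypothetical helper: write every argument after the first as one pattern
# per line into the file named by the first argument.
make_pattern_file() {
    out=$1
    shift
    : > "$out"                        # start with an empty pattern list
    for pattern in "$@"; do
        printf '%s\n' "$pattern" >> "$out"
    done
}

# Usage (assumes pdfgrep is installed and report.pdf exists):
#   make_pattern_file patterns.txt 'error' 'warn(ing)?'
#   pdfgrep -n -f patterns.txt -e 'fatal' report.pdf
```

           Calling the helper without any patterns leaves the file empty,
           which corresponds to the empty pattern list that matches nothing.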

       -i, --ignore-case
           Ignore case distinctions in both the PATTERN and the input files.

   General Output Control
       -c, --count
           Suppress normal output. Instead print the number of matches for
           each input file. Note that unlike grep, multiple matches on the
           same page will be counted individually.
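
           A minimal sketch of summing -c output into a grand total across
           files. It assumes that, as with grep, -c prints one FILE:COUNT
           line per input file when file names are shown (forced here with
           -H); that output format and the helper name total_matches are
           assumptions, not something this manual specifies.

```shell
# Hypothetical helper: total number of matches over several PDFs.
total_matches() {
    # $1: pattern; remaining arguments: PDF files
    pattern=$1
    shift
    # Assumed output format: FILE:COUNT. The count is read as the last
    # colon-separated field, so colons inside file names do not disturb it.
    pdfgrep -c -H -e "$pattern" "$@" |
        awk -F: '{ sum += $NF } END { print sum + 0 }'
}

# Usage (assumes pdfgrep is installed and the files exist):
#   total_matches 'invoice' a.pdf b.pdf
```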

       -p, --page-count
           Like -c, but prints the number of matches per page. Implies -n.

       --color WHEN
           Surround file names, page numbers and matched text with escape
           sequences to display them in color on the terminal. WHEN can be:

           always   Always use colors, even when stdout is not a terminal.
           never    Do not use colors.
           auto     Use colors only when stdout is a terminal (this is the
                    default).

       -L, --files-without-match
           Suppress normal output. Instead print the name of each input file
           that doesn’t contain a match. This works well with -Z, but many
           other output options like -n or -c are ignored when -L is
           specified.

       -l, --files-with-matches
           Suppress normal output. Instead print the name of each input file
           that contains a match. This works well with -Z, but many other
           output options like -n or -c are ignored when -l is specified.

       -m, --max-count NUM
           Stop reading a file after NUM matches. When the -c or --count
           option is also used, pdfgrep does not output a count greater than
           NUM.

       -o, --only-matching
           Print only the matched part of a line without any surrounding
           context.

       -q, --quiet
           Suppress all normal output to stdout. Exit immediately with exit
           status 0 if a match is found, even in case of errors. Use this if
           you only care about the presence of matches, not their number or
           content.

   Line Prefix Control
       -H, --with-filename
           Print the file name for each match. This is the default setting
           when there is more than one file to search.

       -h, --no-filename
           Suppress the prefixing of file name on output. This is the default
           setting when there is only one file to search.

       -n, --page-number
           Prefix each match with the number of the page where it was found.

       -Z, --null
           Output a null byte (called NUL in ASCII and '\0' in C) instead of
           the colon that usually separates a filename from the rest of the
           line. This option makes the output unambiguous in the presence of
           colons, spaces or newlines in the filename. It can be used in
           conjunction with commands such as xargs -0 or perl -0.

       --match-prefix-separator SEP
           Change the colon used to separate filename, page number and text
           in the output to SEP, which can be an arbitrary string. This is
           useful when filenames contain colons, but only for interactive
           usage. For scripting, --null should be used.

   Context Control
       -A NUM, --after-context=NUM
           Print NUM lines of context after matching lines. Contiguous groups
           of matches are separated by a line containing --. With -o, this
           option has no effect.

       -B NUM, --before-context=NUM
           Print NUM lines of context before matching lines. Contiguous groups
           of matches are separated by a line containing --. With -o, this
           option has no effect.

       -C NUM, --context=NUM
           Print NUM lines of context before and after matching lines.
           Contiguous groups of matches are separated by a line containing --.
           With -o, this option has no effect.

   File Selection
       -r, --recursive
           Recursively search all files (restricted by --include and
           --exclude) under each directory, following symlinks only if they
           are on the command line.

       -R, --dereference-recursive
           Same as -r, but follows all symlinks.

       --exclude=GLOB
           Skip files whose base name matches GLOB. See glob(7) for wildcards
           you can use. You can use this option multiple times to exclude more
           patterns. It takes precedence over --include. Note that includes
           and excludes apply only to files found via --recursive and not to
           the argument list.

       --include=GLOB
           Only search files whose base name matches GLOB. See --exclude for
           details. The default is *.pdf.
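
           To illustrate how --include and --exclude interact, here is a
           hedged sketch of a wrapper that searches a directory tree but
           skips drafts. The function name search_reports, the draft-* naming
           scheme and the ./reports directory are assumptions made for the
           example.

```shell
# Hypothetical wrapper around a recursive search.
search_reports() {
    # $1: pattern, $2: directory to search recursively
    # --exclude takes precedence over --include, so a file such as
    # draft-2024.pdf is skipped even though it also matches *.pdf.
    pdfgrep -r -n --include='*.pdf' --exclude='draft-*' -e "$1" "$2"
}

# Usage (assumes pdfgrep is installed and ./reports exists):
#   search_reports 'quarterly revenue' ./reports
```

           The globs are quoted so that the shell passes them to pdfgrep
           unexpanded.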

   Other Options
       --cache
           Use a cache for the rendered text to speed up the operation on
           large files.

       --password=PASSWORD
           Use PASSWORD to decrypt the PDF files. Can be specified multiple
           times; all passwords will be tried on all PDFs. Note that this
           password will show up in your command history and the output of
           ps(1). So please do not use this if the security of PASSWORD is
           important.

       --page-range=RANGE
           Limit search to a specified set of pages. RANGE is a
           comma-separated list of either a single page number or a range
           expression of the form PAGE1-PAGE2. Example: 2-3,5,7-10.
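
           The RANGE syntax can be made concrete with a small sketch that
           expands a range expression into the individual page numbers it
           covers. This only illustrates the syntax described above; it is
           not pdfgrep's own parser, and expand_page_range is a hypothetical
           helper name.

```shell
# Hypothetical helper: print one page number per line for a RANGE string.
expand_page_range() {
    # Split the list on commas, then split each item on "-".
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        last=${last:-$first}          # a single page N is treated as N-N
        page=$first
        while [ "$page" -le "$last" ]; do
            printf '%s\n' "$page"
            page=$((page + 1))
        done
    done
}

# Usage:
#   expand_page_range '2-3,5,7-10'
# The same string can be passed to pdfgrep directly (assumes foo.pdf exists):
#   pdfgrep -n --page-range=2-3,5,7-10 pattern foo.pdf
```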

       --debug
           Enable debug output. Note: Due to limitations of poppler before
           version 0.30.0, some debug output is also printed without --debug
           when using such a poppler version.

       --warn-empty
           Print a warning to stderr if a PDF contains no searchable text.
           This is the case for PDFs that consist only of images, for example
           scanned documents.

       --unac
           Remove accents and ligatures from both the search pattern and the
           PDF documents. This is useful if you want to search for a word
           containing "ae", but the PDF uses the single character "æ" instead.
           See unac(3) and unaccent(1) for details.

           This option is experimental and only available if pdfgrep is
           compiled with unac support.

EXIT STATUS

       Normally, the exit status is 0 if at least one match is found, 1 if no
       match is found and 2 if an error occurred. But if the --quiet or -q
       option is used and a match was found, pdfgrep will return 0 regardless
       of errors.
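
       A script can branch on these three values. The following is a minimal
       sketch; check_pdf is a hypothetical function name, and the messages
       are only examples.

```shell
# Hypothetical helper: report the outcome of a quiet search in words.
check_pdf() {
    # $1: pattern, $2: PDF file
    pdfgrep -q -e "$1" "$2"
    status=$?
    case $status in
        0) echo "match" ;;            # with -q, also returned despite errors
        1) echo "no match" ;;
        2) echo "error" ;;
        *) echo "unexpected status $status" ;;
    esac
    return "$status"
}

# Usage (assumes pdfgrep is installed and scan.pdf exists):
#   check_pdf 'invoice' scan.pdf
```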

ENVIRONMENT VARIABLES

       The behavior of pdfgrep is affected by the following environment
       variable.

       GREP_COLORS
           Specifies the colors and other attributes used to highlight various
           parts of the output. The syntax and values are like GREP_COLORS of
           grep. See grep(1) for more details. Currently only the capabilities
           mt, ms, mc, fn, ln and se are used by pdfgrep, where mt, ms and mc
           have the same effect.
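
           As a sketch of how the variable can be set for a single
           invocation, the wrapper below uses only capabilities listed
           above. The specific color codes are arbitrary choices for the
           example, and pdfgrep_colored is a hypothetical name.

```shell
# Hypothetical wrapper: run pdfgrep with a custom color scheme.
pdfgrep_colored() {
    # mt: matched text, fn: file names, ln: page numbers, se: separators.
    # Values are SGR codes as in grep's GREP_COLORS (01;33 = bold yellow).
    GREP_COLORS='mt=01;33:fn=35:ln=32:se=36' \
        pdfgrep --color always -n "$@"
}

# Usage (assumes pdfgrep is installed and foo.pdf exists):
#   pdfgrep_colored pattern foo.pdf
```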

FILES

       ${XDG_CACHE_HOME}/pdfgrep/*
           Cache files written and used when --cache is enabled. At most 200
           cache entries older than a day are retained.
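
           A small sketch for locating the cache directory from a script.
           The fallback to $HOME/.cache when XDG_CACHE_HOME is unset follows
           the XDG Base Directory convention; that pdfgrep applies the same
           fallback is an assumption here, and pdfgrep_cache_dir is a
           hypothetical helper name.

```shell
# Hypothetical helper: print the directory where cache files are expected.
pdfgrep_cache_dir() {
    # Assumption: an unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/pdfgrep"
}

# Usage: list the cache entries, if any.
#   ls "$(pdfgrep_cache_dir)"
```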

EXAMPLES

       Print the first ten lines matching pattern and print their page number:

               pdfgrep -n --max-count 10 pattern foo.pdf

       Search all .pdf files whose names begin with foo recursively in the
       current directory:

               pdfgrep -r --include "foo*.pdf" pattern

       Search all PDFs in the current directory for foo that also contain bar:

               pdfgrep -Z --files-with-matches "bar" *.pdf | xargs -0 pdfgrep -H foo

       Search all .pdf files that are smaller than 12M recursively in the
       current directory:

               find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern

           Note that in contrast to the previous examples, this task could not
           be solved with pdfgrep alone, but the Unix tools find(1) and
           xargs(1) had to be used. That’s because pdfgrep itself doesn’t
           include options to exclude files by their size. But as you see, it
           doesn’t have to!

BUGS

   Reporting Bugs
       Bugs can be reported either to the mailing list
       (pdfgrep-users@pdfgrep.org) or to the bug tracker on GitLab
       (https://gitlab.com/pdfgrep/pdfgrep/issues).

AUTHORS

       pdfgrep is maintained by Hans-Peter Deifel.

       See the AUTHORS file in the source for a full list of contributors.