PDFGREP(1) Pdfgrep Manual PDFGREP(1)

NAME
pdfgrep - search PDF files for a regular expression

SYNOPSIS
pdfgrep [OPTION...] PATTERN [FILE...]
pdfgrep [OPTION...] [-e PATTERN | -f FILE] [FILE...]

DESCRIPTION
Search for PATTERN in each PDF FILE and print matching lines. By
default, PATTERN is an extended regular expression.

pdfgrep tries to be mostly compatible with GNU grep, with some
PDF-specific distinctions and additional options. Most notably, -n
prints page numbers instead of line numbers.

OPTIONS
General Information
--help
Print a short summary of the options.

-V, --version
Show version information.

Pattern Interpretation
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings separated by newlines,
any of which is to be matched.

-P, --perl-regexp
Interpret PATTERN as a Perl-compatible regular expression (PCRE).
See pcresyntax(3) for a quick overview.

Matching Control
-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern to search for. If this option is
specified multiple times or combined with --file, all patterns are
tried in turn until one of them matches.

-f FILE, --file=FILE
Read patterns from FILE, one per line. If FILE contains multiple
patterns or if this option is applied multiple times or combined
with -e, all patterns are tried in turn until one of them matches.
An empty pattern list matches nothing.

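As an illustration of combining several patterns, the following sketch
turns each of its arguments into a separate -e option. The function name
any_of is hypothetical and not part of pdfgrep; only the -n and -e
options documented in this manual are used.

```shell
# any_of FILE PATTERN...
# Hypothetical helper: print the lines of FILE that match at least one
# PATTERN, each prefixed with its page number.
any_of() {
    if [ "$#" -lt 2 ]; then
        echo "usage: any_of FILE PATTERN..." >&2
        return 2
    fi
    file=$1
    shift
    # Rewrite the argument list in place: every PATTERN becomes "-e PATTERN".
    for pattern in "$@"; do
        set -- "$@" -e "$pattern"
        shift
    done
    pdfgrep -n "$@" "$file"
}
```

For example, any_of doc.pdf foo bar runs pdfgrep -n -e foo -e bar doc.pdf
(doc.pdf is a placeholder file name).
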
-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files.

General Output Control
-c, --count
Suppress normal output. Instead print the number of matches for
each input file. Note that unlike grep, multiple matches on the
same page will be counted individually.

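The per-file counts can be added up with standard tools. The sketch
below assumes that -c together with -H prints one FILE:COUNT line per
input file, as grep does; that output format is an assumption, and the
function name total_matches is hypothetical.

```shell
# total_matches PATTERN FILE...
# Hypothetical helper: print the total number of matches across all FILEs.
total_matches() {
    # -H forces the file name prefix even for a single file. awk sums the
    # last colon-separated field, so colons in file names do not disturb
    # the result.
    pdfgrep -c -H "$@" | awk -F: '{ sum += $NF } END { print sum + 0 }'
}
```

Typical use would be total_matches pattern *.pdf. File names containing
newlines are not handled by this sketch.
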
-p, --page-count
Like -c, but prints the number of matches per page. Implies -n.

--color WHEN
Surround file names, page numbers and matched text with escape
sequences to display them in color on the terminal. WHEN can be:

always  Always use colors, even when stdout is not a terminal.
never   Do not use colors.
auto    Use colors only when stdout is a terminal (this is the default).

-L, --files-without-match
Suppress normal output. Instead print the name of each input file
that doesn’t contain a match. This works well with -Z, but many
other output options like -n or -c are ignored when -L is
specified.

-l, --files-with-matches
Suppress normal output. Instead print the name of each input file
that contains a match. This works well with -Z, but many other
output options like -n or -c are ignored when -l is specified.

-m, --max-count NUM
Stop reading a file after NUM matches. When the -c or --count
option is also used, pdfgrep does not output a count greater than
NUM.

-o, --only-matching
Print only the matched part of a line without any surrounding
context.

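Because -o prints only the matched text, it combines well with sort(1)
for listing every distinct string that a pattern matches. The helper
name unique_matches below is hypothetical; the sketch relies only on the
-o and -h options.

```shell
# unique_matches PATTERN FILE...
# Hypothetical helper: print each distinct matched string once, sorted.
unique_matches() {
    # -o keeps only the matched text; -h drops the file name prefix so
    # that identical matches from different files compare equal.
    pdfgrep -o -h "$@" | sort -u
}
```

For instance, unique_matches 'RFC [0-9]+' *.pdf would list every RFC
number mentioned in the PDFs of the current directory.
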
-q, --quiet
Suppress all normal output to stdout. Exit immediately with exit
status 0 if a match is found, even in case of errors. Use this if
you only care about the presence of matches, not their number or
content.

Line Prefix Control
-H, --with-filename
Print the file name for each match. This is the default setting
when there is more than one file to search.

-h, --no-filename
Suppress the prefixing of file name on output. This is the default
setting when there is only one file to search.

-n, --page-number
Prefix each match with the number of the page where it was found.

-Z, --null
Output a null byte (called NUL in ASCII and '\0' in C) instead of
the colon that usually separates a filename from the rest of the
line. This option makes the output unambiguous in the presence of
colons, spaces or newlines in the filename. It can be used in
conjunction with commands such as xargs -0 or perl -0.

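The NUL separator also makes it possible to count matching files
reliably. The following sketch counts the NUL bytes that -l -Z emits,
one per matching file; the function name count_matching_files is
hypothetical.

```shell
# count_matching_files PATTERN FILE...
# Hypothetical helper: print how many FILEs contain a match.
count_matching_files() {
    # With -l and -Z every matching file name is followed by a NUL byte.
    # tr keeps only those bytes and wc counts them, so names containing
    # spaces or newlines are still counted exactly once.
    pdfgrep -l -Z "$@" | tr -cd '\000' | wc -c | tr -d ' '
}
```
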
--match-prefix-separator SEP
Change the colon used to separate filename, page number and text
in the output to SEP, which can be an arbitrary string. This is
useful when filenames contain colons, but only for interactive
usage. For scripting, --null should be used.

Context Control
-A NUM, --after-context=NUM
Print NUM lines of context after matching lines. Contiguous groups
of matches are separated by a line containing --. With -o, this
option has no effect.

-B NUM, --before-context=NUM
Print NUM lines of context before matching lines. Contiguous groups
of matches are separated by a line containing --. With -o, this
option has no effect.

-C NUM, --context=NUM
Print NUM lines of context before and after matching lines.
Contiguous groups of matches are separated by a line containing --.
With -o, this option has no effect.

File Selection
-r, --recursive
Recursively search all files (restricted by --include and
--exclude) under each directory, following symlinks only if they
are on the command line.

-R, --dereference-recursive
Same as -r, but follows all symlinks.

--exclude=GLOB
Skip files whose base name matches GLOB. See glob(7) for wildcards
you can use. You can use this option multiple times to exclude more
patterns. It takes precedence over --include. Note that includes and
excludes apply only to files found via --recursive, not to files
given on the command line.

--include=GLOB
Only search files whose base name matches GLOB. See --exclude for
details. The default is *.pdf.

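A recursive search that combines both filters might look like the
sketch below. The globs report-*.pdf and *-draft.pdf and the function
name search_tree are made up for illustration.

```shell
# search_tree PATTERN [DIR]
# Hypothetical helper: search DIR (default: current directory) recursively.
search_tree() {
    # Only files named report-*.pdf are searched. A file that also matches
    # *-draft.pdf is skipped, because --exclude takes precedence.
    pdfgrep -r -n --include='report-*.pdf' --exclude='*-draft.pdf' \
        "$1" "${2:-.}"
}
```

The globs are quoted so that the shell passes them to pdfgrep unexpanded.
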
Other Options
--cache
Use a cache for the rendered text to speed up the operation on
large files.

--password=PASSWORD
Use PASSWORD to decrypt the PDF files. Can be specified multiple
times; all passwords will be tried on all PDFs. Note that this
password will show up in your command history and in the output of
ps(1), so do not use this option if the security of PASSWORD is
important.

--page-range=RANGE
Limit search to a specified set of pages. RANGE is a comma-separated
list of single page numbers and range expressions of the form
PAGE1-PAGE2. Example: 2-3,5,7-10.

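A script that accepts RANGE from a user may want to check its shape
before calling pdfgrep. The sketch below tests a string against the
syntax described above; it is an approximation, since pdfgrep's own
parser may accept or reject more than this. The function name
valid_page_range is hypothetical.

```shell
# valid_page_range RANGE
# Hypothetical helper: succeed if RANGE is a comma-separated list of
# page numbers (N) and page ranges (N-M), for example 2-3,5,7-10.
valid_page_range() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'
}
```

It could be used as
valid_page_range "$range" && pdfgrep -n --page-range="$range" pattern foo.pdf.
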
--debug
Enable debug output. Note: Due to limitations of poppler before
version 0.30.0, some debug output is also printed without --debug
when using such a poppler version.

--warn-empty
Print a warning to stderr if a PDF contains no searchable text.
This is the case for PDFs that consist only of images, for example
scanned documents.

--unac
Remove accents and ligatures from both the search pattern and the
PDF documents. This is useful if you want to search for a word
containing "ae", but the PDF uses the single character "æ" instead.
See unac(3) and unaccent(1) for details.

This option is experimental and only available if pdfgrep is
compiled with unac support.

EXIT STATUS
Normally, the exit status is 0 if at least one match is found, 1 if no
match is found and 2 if an error occurred. However, if the --quiet or -q
option is used and a match is found, pdfgrep returns 0 regardless of
errors.

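In a script, the three cases can be told apart as in the following
sketch. The function name report_match and the messages it prints are
hypothetical.

```shell
# report_match PATTERN FILE
# Hypothetical helper: describe the outcome of a quiet search.
report_match() {
    status=0
    # "|| status=$?" records a non-zero exit status without aborting
    # scripts that run under "set -e".
    pdfgrep -q "$1" "$2" || status=$?
    case $status in
        0) echo "match" ;;      # with -q, reported even if errors occurred
        1) echo "no match" ;;
        *) echo "error" ;;
    esac
}
```
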
ENVIRONMENT
The behavior of pdfgrep is affected by the following environment
variable.

GREP_COLORS
Specifies the colors and other attributes used to highlight various
parts of the output. The syntax and values are like GREP_COLORS of
grep. See grep(1) for more details. Currently only the capabilities
mt, ms, mc, fn, ln and se are used by pdfgrep, where mt, ms and mc
have the same effect.

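A color scheme can be set for a single invocation by prefixing the
command with the variable. In the sketch below the SGR values are
arbitrary choices, the function name color_search is hypothetical, and
the reading of ln as the page number color is an assumption carried
over from grep's line number capability.

```shell
# color_search PATTERN FILE...
# Hypothetical helper: search with a custom color scheme.
color_search() {
    # ms: matched text (bold green), fn: file names (magenta),
    # ln: page numbers (yellow), se: separators (cyan).
    GREP_COLORS='ms=01;32:fn=35:ln=33:se=36' \
        pdfgrep --color always -H -n "$@"
}
```

--color always keeps the escape sequences even when the output is piped,
for example into less -R.
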
FILES
${XDG_CACHE_HOME}/pdfgrep/*
Cache files written and used when --cache is enabled. At most 200
cache entries older than a day are retained.

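To inspect the cache, its directory can be resolved as sketched below.
The fallback to $HOME/.cache when XDG_CACHE_HOME is unset is the default
of the XDG Base Directory specification; that pdfgrep applies the same
fallback is an assumption. The function name pdfgrep_cache_dir is
hypothetical.

```shell
# pdfgrep_cache_dir
# Hypothetical helper: print the directory holding pdfgrep's cache files.
pdfgrep_cache_dir() {
    # ${VAR:-default} substitutes the default when VAR is unset or empty.
    printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/pdfgrep"
}
```

For example, ls -l "$(pdfgrep_cache_dir)" lists the current cache entries.
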
EXAMPLES
Print the first ten lines matching pattern, each prefixed with its page
number:

pdfgrep -n --max-count 10 pattern foo.pdf

Search all .pdf files whose names begin with foo recursively in the
current directory:

pdfgrep -r --include "foo*.pdf" pattern

Search for foo in all PDFs in the current directory that also contain
bar:

pdfgrep -Z --files-with-matches "bar" *.pdf | xargs -0 pdfgrep -H foo

Search all .pdf files that are smaller than 12M recursively in the
current directory:

find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern

Unlike the previous examples, this task cannot be solved with pdfgrep
alone, because pdfgrep has no option to exclude files by size. The
Unix tools find(1) and xargs(1) fill that gap.

Reporting Bugs
Bugs can be reported either to the mailing list
(pdfgrep-users@pdfgrep.org) or to the bug tracker on GitLab
(https://gitlab.com/pdfgrep/pdfgrep/issues).

AUTHOR
pdfgrep is maintained by Hans-Peter Deifel.

See the AUTHORS file in the source for a full list of contributors.

SEE ALSO
grep(1), pcre(3), regex(7)

See pdfgrep’s website, https://pdfgrep.org, for more information,
downloads and the git repository.

Pdfgrep 2.1.1 11/19/2018 PDFGREP(1)