PDFTOHTML(1)               General Commands Manual              PDFTOHTML(1)

NAME
       pdftohtml - program to convert PDF files into HTML, XML and PNG images

SYNOPSIS
       pdftohtml [options] <PDF-file> [<HTML-file> <XML-file>]

DESCRIPTION
       This manual page briefly documents the pdftohtml command. It was
       written for the Debian GNU/Linux distribution because the original
       program does not have a manual page.

       pdftohtml is a program that converts PDF documents into HTML. It
       generates its output in the current working directory. If PDF-file
       is '-', it reads the PDF file from stdin.
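
       The invocation described above can be sketched from a script. This
       is a minimal sketch, not part of pdftohtml: the helper names
       pdftohtml_argv and convert_stdin are hypothetical, and passing an
       explicit output name when reading from stdin is an assumption.

```python
import subprocess
from typing import List, Optional


def pdftohtml_argv(pdf_file: str, html_file: Optional[str] = None) -> List[str]:
    """Build a basic pdftohtml command line (hypothetical helper).

    Passing "-" as pdf_file asks pdftohtml to read the PDF from stdin.
    """
    argv = ["pdftohtml", pdf_file]
    if html_file is not None:
        argv.append(html_file)
    return argv


def convert_stdin(pdf_bytes: bytes, html_file: str, workdir: str = ".") -> None:
    """Pipe PDF data to pdftohtml (hypothetical helper).

    pdftohtml writes into its current working directory, so the
    subprocess is started in workdir. Requires pdftohtml on PATH.
    """
    subprocess.run(
        pdftohtml_argv("-", html_file),
        input=pdf_bytes,
        cwd=workdir,
        check=True,
    )
```

       Running the subprocess with cwd set is one way to control where the
       output lands, since the program writes to the working directory.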

OPTIONS
       A summary of options is included below.

       -h, -help
              Show summary of options.

       -f <int>
              first page to print

       -l <int>
              last page to print

       -q     do not print any messages or errors

       -v     print copyright and version info

       -p     exchange .pdf links with .html

       -c     generate complex output

       -s     generate single HTML that includes all pages

       -dataurls
              use data URLs instead of external images in HTML. Not available
              on all platforms.

       -i     ignore images

       -noframes
              generate no frames. Not supported in complex output mode.

       -stdout
              use standard output

       -zoom <fp>
              zoom the PDF document (default 1.5) (1 means 72 DPI)

       -xml   output for XML post-processing

       -noroundcoord
              do not round coordinates (with XML output only)

       -enc <string>
              output text encoding name

       -opw <string>
              owner password (for encrypted files)

       -upw <string>
              user password (for encrypted files)

       -hidden
              force hidden text extraction

       -fmt   image file format for Splash output (png or jpg). If complex
              output (-c) is selected but -fmt is not specified, -fmt png
              is assumed.

       -nomerge
              do not merge paragraphs

       -nodrm override document DRM settings

       -wbt <fp>
              adjust the word break threshold percent. Default is 10. A word
              break occurs when the distance between two adjacent characters
              is greater than this percent of the character height.

       -fontfullname
              outputs the font name without any substitutions.
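
       As an illustration of combining the options listed above, the
       following sketch assembles an option list using only flags from this
       page. The function pdftohtml_options and its parameter names are
       hypothetical, and rejecting -noframes together with -c up front is a
       design choice of the sketch, based on the note that -noframes is not
       supported in complex output mode.

```python
from typing import List, Optional


def pdftohtml_options(
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    zoom: Optional[float] = None,
    xml: bool = False,
    complex_output: bool = False,
    no_frames: bool = False,
    user_password: Optional[str] = None,
) -> List[str]:
    """Translate keyword arguments into pdftohtml flags (hypothetical helper)."""
    if complex_output and no_frames:
        # The page states -noframes is not supported in complex output mode.
        raise ValueError("-noframes is not supported with -c")
    opts: List[str] = []
    if first_page is not None:
        opts += ["-f", str(first_page)]
    if last_page is not None:
        opts += ["-l", str(last_page)]
    if zoom is not None:
        opts += ["-zoom", str(zoom)]
    if xml:
        opts.append("-xml")
    if complex_output:
        opts.append("-c")
    if no_frames:
        opts.append("-noframes")
    if user_password is not None:
        opts += ["-upw", user_password]
    return opts
```

       The returned list is meant to be placed between the program name and
       the PDF file name, for example
       ["pdftohtml", *pdftohtml_options(xml=True), "doc.pdf"].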
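
       The numeric rules given for -zoom and -wbt can be worked through as
       plain arithmetic. This sketch only restates the descriptions above
       (1 means 72 DPI; a break occurs when the gap exceeds the given
       percent of the character height). The function names are
       hypothetical, the gap and height are assumed to share one unit, and
       pdftohtml's internal computation may differ in detail.

```python
def zoom_to_dpi(zoom: float) -> float:
    """A zoom of 1 corresponds to 72 DPI, so the default 1.5 is 108 DPI."""
    return zoom * 72.0


def dpi_to_zoom(dpi: float) -> float:
    """Inverse of zoom_to_dpi: the -zoom value for a target resolution."""
    return dpi / 72.0


def is_word_break(gap: float, char_height: float, wbt_percent: float = 10.0) -> bool:
    """Apply the -wbt rule as stated on this page (illustrative only).

    gap is the distance between two adjacent characters; a break is
    reported when it exceeds wbt_percent percent of char_height.
    """
    threshold = char_height * wbt_percent / 100.0
    return gap > threshold
```

       With the default threshold of 10 and a character height of 10 units,
       any gap wider than 1 unit counts as a word break under this reading.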

AUTHOR
       Pdftohtml was developed by Gueorgui Ovtcharov and Rainer Dorsch. It is
       based on, and benefits a lot from, Derek Noonburg's xpdf package.

       This manual page was written by Søren Boll Overgaard <boll@debian.org>,
       for the Debian GNU/Linux system (but may be used by others).

SEE ALSO
       pdfdetach(1), pdffonts(1), pdfimages(1), pdfinfo(1), pdftocairo(1),
       pdftoppm(1), pdftops(1), pdftotext(1), pdfseparate(1), pdfsig(1),
       pdfunite(1)