PDF::Builder::Docs(3)  User Contributed Perl Documentation  PDF::Builder::Docs(3)
4
6 PDF::Builder::Docs - additional documentation for Builder module
7
9 Software Development Kit
10 There are four levels of involvement with PDF::Builder. Depending on
11 what you want to do, different kinds of installs are recommended.
12
13 1. Simply installing PDF::Builder as a prerequisite for running some
14 other package. All you need to do is install the CPAN package for
15 PDF::Builder, and it will load the .pm files into your Perl library. If
16 the other package prereqs PDF::Builder, its installer may download and
17 install PDF::Builder automatically.
18
19 2. You want to write a Perl program that uses PDF::Builder functions.
20 In addition to installing PDF::Builder from CPAN, you will want
21 documentation on it. Obtain a copy of the product from GitHub
22 (https://github.com/PhilterPaper/Perl-PDF-Builder) or as a gzipped tar
23 file from CPAN. This includes a utility to build (from POD) a library
24 of HTML documents, as well as examples (examples/ directory) and
25 contributed sample programs (contrib/ directory).
26
27 3. You want to modify PDF::Builder files. In addition to the CPAN and
28 GitHub distributions, you may choose to keep a local Git repository for
29 tracking your changes. Depending on whether or not your PDF::Builder
30 copy is being used for production purposes, you may want to do your
31 editing and testing in the Perl library installation (live) or in a
32 different place. The "t" tests (t/ directory) and examples provide good
33 regression tests to ensure that you haven't broken anything. If you do
34 your editing on the live code, don't forget when done to copy the
35 changes back into the master version you keep!
36
37 4. You want to contribute to the development of PDF::Builder. You will
38 need a local Git repository (and a GitHub account), so that when you've
39 got it all done, you can issue a "Pull Request" to bring it to our
40 attention. We can't guarantee that your work will be incorporated into
41 the project, but at least we will look at it. From time to time, a new
42 CPAN version will be issued.
43
44 If you want to make substantial changes for public use, and can't come
45 to a meeting of minds with us, you can even start your own GitHub
46 project and register a new CPAN project (that's what we did, forking
47 PDF::API2). Please don't just assume that we don't want your changes --
48 at least propose what you want to do in writing, so we can consider it.
49 We're always looking for people to help out and expand PDF::Builder.
50
51 Optional Libraries
52 PDF::Builder can make use of some optional libraries, which are not
53 required for a successful installation. If you want improved speed and
54 capabilities for certain functions, you may want to install and use
55 these libraries:
56
57 * Graphics::TIFF -- PDF::Builder inherited a rather slow, buggy, and
58 limited TIFF image library from PDF::API2. If Graphics::TIFF (available
59 on CPAN, uses libtiff.a) is installed, PDF::Builder will use that
60 instead, unless you specify that it is to use the old, pure Perl
61 library. The only time you might want to consider this is when you need
62 to pass an open filehandle to "image_tiff" instead of a file name. See
63 resolved bug reports RT 84665 and RT 118047, as well as "image_tiff",
64 for more information.
65
66 * Image::PNG::Libpng -- PDF::Builder inherited a rather slow and buggy
67 pure Perl PNG image library from PDF::API2. If Image::PNG::Libpng
68 (available on CPAN, uses libpng.a) is installed, PDF::Builder will use
69 that instead, unless you specify that it is to use the old, pure Perl
70 library. Using the new library will give you improved speed, the
71 ability to use 16 bit samples, and the ability to read interlaced PNG
72 files. See resolved bug report RT 124349, as well as "image_png", for
73 more information.
74
75 * HarfBuzz::Shaper -- This library enables PDF::Builder to handle
76 complex scripts (Arabic, Devanagari, etc.) as well as non-LTR writing
77 systems. It is also useful for Latin and other simple scripts, for
78 ligatures and improved kerning. HarfBuzz::Shaper is based on a set of
79 HarfBuzz libraries, which it will attempt to build if they are not
80 found. See "textHS" for more information.
81
82 * Text::Markdown -- This library is used if you want to format
83 "Markdown" style code in PDF::Builder, via the column() method. It
84 translates a certain dialect of Markdown into HTML, which is then
85 further processed.
86
87 * HTML::TreeBuilder -- This library is used to format HTML input into a
88 data structure which PDF::Builder can interpret, via the column()
89 method. Note that if Markdown input is used, it will also need
90 HTML::TreeBuilder to handle the HTML the Markdown is translated to.
91
Note that the installation process will not attempt to install these
libraries automatically. If you don't wish to use one or more of them,
you are free not to install them. If you think you may want to use one
or more, consider installing them before installing PDF::Builder, so
that any t-tests and examples that make use of these libraries can be
run during installation and checkout of PDF::Builder. Remember, you can
always install an optional library later.
100
101 Strings (Character Text)
102 Perl, and hence PDF::Builder, use strings that support the full range
103 of Unicode characters. When importing strings into a Perl program, for
104 example by reading text from a file, you must be aware of what their
105 character encoding is. Single-byte encodings (default is 'latin1'),
106 represented as bytes of value 0x00 through 0xFF (0..255), will produce
107 different results if you do something that depends on the encoding,
108 such as sorting, searching, or comparing any two non-ASCII characters.
109 This also applies to any characters (text) hard coded into the Perl
110 program.
111
112 You can always decode the text from external encoding (ASCII, UTF-8,
113 Latin-3, etc.) into the Perl (internal) UTF-8 multibyte encoding. This
114 uses one to four bytes to represent each character. See pragma "utf8"
115 and module "Encode" for details about decoding text. Note that only
116 TrueType fonts ("ttfont") can make direct use of UTF-8-encoded text.
117 Other font types (core, T1, etc.) can only use single-byte encoded
118 text. If your text is ASCII, Latin-1, or CP-1252, you can just leave
119 the Perl strings as the default single-byte encoding.
120
121 Then, there is the matter of encoding the output to match up with
122 available font character sets. You're not actually translating the text
123 on output, but are telling the output system (and Reader) what encoding
124 the output byte stream represents, and what character glyphs they
125 should generate.
126
127 If you confine your text to plain ASCII (0x00 .. 0x7F byte values) or
128 even Latin-1 or CP-1252 (0x00 .. 0xFF byte values), you can use default
129 (non-UTF-8) Perl strings and use the default output encoding
130 (WinAnsiEncoding), which is more-or-less Windows CP-1252 (a superset in
131 turn, of ISO-8859-1 Latin-1). If your text uses any other characters,
132 you will need to be aware of what encoding your text strings are (in
133 the Perl string and for declaring output glyph generation). See "Core
134 Fonts", "PS Fonts" and "TrueType Fonts" in "FONT METHODS" for
135 additional information.
136
137 Some Internal Details
138
139 Some of the following may be a bit scary or confusing to beginners, so
140 don't be afraid to skip over it until you're ready for it...
141
142 Perl (and PDF::Builder) internally use strings which are either single-
143 byte (ISO-8859-1/Latin-1) or multibyte UTF-8 encoded (there is an
144 internal flag marking the string as UTF-8 or not). If you work
145 strictly in ASCII or Latin-1 or CP-1252 (each a superset of the
146 previous), you should be OK in not doing anything special about your
147 string encoding. You can just use the default Perl single byte strings
148 (internally marked as not UTF-8) and the default output encoding
149 (WinAnsiEncoding).
150
151 If you intend to use input from a variety of sources, you should
152 consider decoding (converting) your text to UTF-8, which will provide
153 an internally consistent representation (and your Perl code itself
154 should be saved in UTF-8, in case you want to use any hard coded non-
155 ASCII characters). In any string, non-ASCII characters (0x80 or higher)
156 would be converted to the Perl UTF-8 internal representation, via
157 "$string = Encode::decode(MY_ENCODING, $input);". "MY_ENCODING" would
158 be a string like 'latin1', 'cp-1252', 'utf8', etc. Similar capabilities
159 are available for declaring a file to be in a certain encoding.
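As a minimal sketch of the decoding step described above (the sample byte strings are made up for illustration, and only the core "Encode" module is used), the same word arriving in two different external encodings decodes to the same internal Perl string:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input bytes, as they might arrive from files read in
# raw (binary) mode: the word "cafe" with an acute accent on the "e".
my $latin1_bytes = "caf\xE9";       # ISO-8859-1: one byte per character
my $utf8_bytes   = "caf\xC3\xA9";   # UTF-8: the accented "e" takes two bytes

# Decode each from its external encoding into Perl's internal form.
my $from_latin1 = decode('iso-8859-1', $latin1_bytes);
my $from_utf8   = decode('UTF-8',      $utf8_bytes);

# Both are now the same 4-character string, whatever the source encoding.
my $same = ($from_latin1 eq $from_utf8) ? 'yes' : 'no';
print "characters: ", length($from_utf8), ", identical: $same\n";
```

Once everything is decoded this way, sorting, searching, and comparing operate on characters rather than on encoding-dependent bytes.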
160
Be aware that if you use UTF-8 encoding for your text, only TrueType
font output ("ttfont") can handle it directly. Corefont and Type1
output require the text to be converted back into a single-byte
encoding (using "Encode::encode"), which may need to be declared with
"encode" (for "corefont" or "psfont"). If you have any characters that
are not found in the selected single-byte encoding (but are found in
the font itself), you will need to use "automap" to break up the font
glyphs into 256 character planes, map such characters to 0x00 .. 0xFF
in the appropriate plane, and switch between font planes as necessary.
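Before handing text to a core or Type1 font, it may help to check whether it fits the chosen single-byte encoding at all. The following is only a sketch using the core "Encode" module; the sample strings are hypothetical, and it does not show the PDF::Builder font calls themselves:

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string containing one character outside Latin-1 (U+0151).
my $text = "Erd\x{151}s and caf\x{E9}";

# Strict conversion: FB_CROAK makes encode() die on any character the
# target encoding cannot represent. Work on a copy, because passing a
# CHECK argument allows encode() to modify its source string.
my $copy  = $text;
my $bytes = eval { encode('iso-8859-1', $copy, Encode::FB_CROAK) };
my $fits  = defined $bytes ? 1 : 0;
print "fits in Latin-1: $fits\n";

# A string limited to Latin-1 characters becomes one byte per character.
my $latin = encode('iso-8859-1', "caf\x{E9}");
print "bytes: ", length($latin), "\n";
```

If the strict conversion fails, the text contains characters that would need a different encoding, "automap" planes, or a TrueType font.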
171
172 Core and Type1 fonts (output) use the byte values in the string
173 (single-byte encoding only!) and provide a byte-to-glyph mapping record
174 for each plane. TrueType outputs a group of four hexadecimal digits
175 representing the "CId" (character ID) of each character. The CId does
176 not correspond to either the single-byte or UTF-8 internal
177 representations of the characters.
178
179 The bottom line is that you need to know what the internal
180 representation of your text is, so that the output routines can tell
181 the PDF reader about it (via the PDF file). The text will not be
182 translated upon output, but the PDF reader needs to know what the
183 encoding in use is, so it knows what glyph to associate with each byte
184 (or byte sequence).
185
186 Note that some operating systems and Perl flavors are reputed to be
187 strict about encoding names. For example, latin1 (an alias) may be
188 rejected as invalid, while iso-8859-1 (a canonical value) will work.
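One way to sidestep alias problems is to ask the core "Encode" module for the canonical name and use that everywhere. This is a sketch; the third name in the list is deliberately bogus to show the failure case:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Look up each name. find_encoding() returns an encoding object, or
# undef if the name is not recognized on this installation.
for my $name ('latin1', 'iso-8859-1', 'no-such-encoding') {
    my $enc = find_encoding($name);
    printf "%-18s => %s\n", $name, defined $enc ? $enc->name : '(unknown)';
}

# The canonical name is the safest one to pass around.
my $canonical = find_encoding('latin1')->name;
```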
189
190 By the way, it is recommended that you be using at least Perl 5.10 if
191 you are going to be using any non-ASCII characters. Perl 5.8 may be a
192 little unpredictable in handling such text.
193
194 Rendering Order
195 For better or worse, for compatibility purposes, PDF::Builder continues
196 the same rendering model as used by PDF::API2 (and possibly its
197 predecessors). That is, all graphics for one graphics object are put
198 into one record, and all text output for one text object goes into
199 another record. Which one is output first, is whichever is declared
200 first. This can lead to unexpected results, where items are rendered in
201 (apparently) the wrong order. That is, text and graphics items are not
202 necessarily output (rendered) in the same order as they were created in
203 code. Two items in the same object (e.g., $text) will be rendered in
204 the same order as they were coded, but items from different objects may
205 not be rendered in the expected order. The following example (source
206 code and annotated PDF excerpts) will hopefully illustrate the issue:
207
208 use strict;
209 use warnings;
210 use PDF::Builder;
211
212 # demonstrate text and graphics object order
213 #
214 my $fname = "objorder";
215
216 my $paper_size = "Letter";
217
218 # see the text and graphics stream contents
219 my $pdf = PDF::Builder->new(compress => 'none');
220 $pdf->mediabox($paper_size);
221 my $page = $pdf->page();
222 # adjust path for your operating system
223 my $fontTR = $pdf->ttfont('C:\\Windows\\Fonts\\timesbd.ttf');
224
For the first group, you might expect the "under" line to be output,
then the filled circle (disc) partly covering it, then the "over" line
covering the disc, and finally a filled rectangle (bar) over both
lines. What actually happens is that the graphics object $grfx1 was
declared first, so everything in that object (the disc and bar) is
output first, and the text object $text1 (both lines) comes afterwards.
The result is that the text lines are on top of the graphics drawings.
232
233 # ----------------------------
234 # 1. text, orange ball over, text over, bar over
235
236 my $grfx1 = $page->gfx();
237 my $text1 = $page->text();
238 $text1->font($fontTR, 20); # 20 pt Times Roman bold
239
240 $text1->fillcolor('black');
241 $grfx1->strokecolor('blue');
242 $grfx1->fillcolor('orange');
243
244 $text1->translate(50,700);
245 $text1->text_left("This text should be under everything.");
246
247 $grfx1->circle(100,690, 30);
248 $grfx1->fillstroke();
249
250 $text1->translate(50,670);
251 $text1->text_left("This text should be over the ball and under the bar.");
252
253 $grfx1->rect(160,660, 20,70);
254 $grfx1->fillstroke();
255
256 % ---------------- group 1: define graphics object first, then text
257 11 0 obj << /Length 690 >> stream % obj 11 is graphics for (1)
258 0 0 1 RG % stroke blue
259 1 0.647059 0 rg % fill orange
260 130 690 m ... c h B % draw and fill circle
261 160 660 20 70 re B % draw and fill bar
262 endstream endobj
263
264 12 0 obj << /Length 438 >> stream % obj 12 is text for (1)
265 BT
266 /TiCBA 20 Tf % Times Roman Bold 20pt
267 0 0 0 rg % fill black
268 1 0 0 1 50 700 Tm % position text
269 <0037 ... 0011> Tj % "under" line
270 1 0 0 1 50 670 Tm % position text
271 <0037 ... 0011> Tj % "over" line
272 ET
273 endstream endobj
274
275 The second group is the same as the first, with the only difference
276 being that the text object was declared first, and then the graphics
277 object. The result is that the two text lines are rendered first, and
278 then the disc and bar are drawn over them.
279
280 # ----------------------------
281 # 2. (1) again, with graphics and text order reversed
282
283 my $text2 = $page->text();
284 my $grfx2 = $page->gfx();
285 $text2->font($fontTR, 20); # 20 pt Times Roman bold
286
287 $text2->fillcolor('black');
288 $grfx2->strokecolor('blue');
289 $grfx2->fillcolor('orange');
290
291 $text2->translate(50,600);
292 $text2->text_left("This text should be under everything.");
293
294 $grfx2->circle(100,590, 30);
295 $grfx2->fillstroke();
296
297 $text2->translate(50,570);
298 $text2->text_left("This text should be over the ball and under the bar.");
299
300 $grfx2->rect(160,560, 20,70);
301 $grfx2->fillstroke();
302
303 % ---------------- group 2: define text object first, then graphics
304 13 0 obj << /Length 438 >> stream % obj 13 is text for (2)
305 BT
306 /TiCBA 20 Tf % Times Roman Bold 20pt
307 0 0 0 rg % fill black
308 1 0 0 1 50 600 Tm % position text
309 <0037 ... 0011> Tj % "under" line
310 1 0 0 1 50 570 Tm % position text
311 <0037 ... 0011> Tj % "over" line
312 ET
313 endstream endobj
314
315 14 0 obj << /Length 690 >> stream % obj 14 is graphics for (2)
316 0 0 1 RG % stroke blue
317 1 0.647059 0 rg % fill orange
318 130 590 m ... h B % draw and fill circle
319 160 560 20 70 re B % draw and fill bar
320 endstream endobj
321
The third group defines two text and two graphics objects, in the order
in which they are expected to be rendered. The "under" text line is
output first, then the orange disc graphics is output, partly covering
the text. The "over" text line is output next. It really is over the
disc, but it is orange: the previous object stream (the first graphics
object) left the fill color (also used for text) as orange, and we
didn't explicitly set the fill color before outputting the second text
line. This is not "inheritance" so much as persistence: whatever
graphics (drawing) state (used for both "graphics" and "text") is in
effect at the end of one object is the state at the beginning of the
next object. If you wish to control this, consider surrounding the
graphics or text calls with save() and restore() calls to save and
restore (push and pop) the graphics state to what it was at the save().
Finally, the bar is drawn over everything.
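The save() and restore() approach might look like the following sketch. It assumes PDF::Builder is installed, uses only graphics objects, and reuses method calls from the surrounding examples; the coordinates are arbitrary:

```perl
use strict;
use warnings;
use PDF::Builder;

my $pdf = PDF::Builder->new(compress => 'none');
$pdf->mediabox('Letter');
my $page = $pdf->page();

my $grfx1 = $page->gfx();   # declared first, so rendered first
my $grfx2 = $page->gfx();

# Bracket the color changes so they do not leak into the next object.
$grfx1->save();                  # push the current graphics state
$grfx1->strokecolor('blue');
$grfx1->fillcolor('orange');
$grfx1->circle(100,690, 30);
$grfx1->fillstroke();
$grfx1->restore();               # pop it: blue and orange are discarded

# $grfx2 now starts from the state that was in effect at the save(),
# not from the orange fill left behind by the disc.
$grfx2->rect(160,660, 20,70);
$grfx2->fillstroke();

my $out = $pdf->stringify();
```

Every save() should be matched by a restore() within the same object, so that the push and pop stay balanced.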
336
337 # ----------------------------
338 # 3. (2) again, with two graphics and two text objects
339
340 my $text3 = $page->text();
341 my $grfx3 = $page->gfx();
342 $text3->font($fontTR, 20); # 20 pt Times Roman bold
343 my $text4 = $page->text();
344 my $grfx4 = $page->gfx();
345 $text4->font($fontTR, 20); # 20 pt Times Roman bold
346
347 $text3->fillcolor('black');
348 $grfx3->strokecolor('blue');
349 $grfx3->fillcolor('orange');
350 # $text4->fillcolor('yellow');
351 # $grfx4->strokecolor('red');
352 # $grfx4->fillcolor('purple');
353
354 $text3->translate(50,500);
355 $text3->text_left("This text should be under everything.");
356
357 $grfx3->circle(100,490, 30);
358 $grfx3->fillstroke();
359
360 $text4->translate(50,470);
361 $text4->text_left("This text should be over the ball and under the bar.");
362
363 $grfx4->rect(160,460, 20,70);
364 $grfx4->fillstroke();
365
366 % ---------------- group 3: define text1, graphics1, text2, graphics2
367 15 0 obj << /Length 206 >> stream % obj 15 is text1 for (3)
368 BT
369 /TiCBA 20 Tf % Times Roman Bold 20pt
370 0 0 0 rg % fill black
371 1 0 0 1 50 500 Tm % position text
372 <0037 ... 0011> Tj % "under" line
373 ET
374 endstream endobj
375
376 16 0 obj << /Length 671 >> stream % obj 16 is graphics1 for (3) circle
377 0 0 1 RG % stroke blue
378 1 0.647059 0 rg % fill orange
379 130 490 m ... h B % draw and fill circle
380 endstream endobj
381
382 17 0 obj << /Length 257 >> stream % obj 17 is text2 for (3)
383 BT
384 /TiCBA 20 Tf % Times Roman Bold 20pt
385 1 0 0 1 50 470 Tm % position text
386 <0037 ... 0011> Tj % "over" line
387 ET
388 endstream endobj
389
390 18 0 obj << /Length 20 >> stream % obj 18 is graphics for (3) bar
391 160 460 20 70 re B % draw and fill bar
392 endstream endobj
393
394 The fourth group is the same as the third, except that we define the
395 fill color for the text in the second line. This makes it clear that
396 the "over" line (in yellow) was written after the orange disc, and
397 still before the bar.
398
399 # ----------------------------
400 # 4. (3) again, a new set of colors for second group
401
# new variable names, so that the group 3 objects are not redeclared
my $text5 = $page->text();
my $grfx5 = $page->gfx();
$text5->font($fontTR, 20); # 20 pt Times Roman bold
my $text6 = $page->text();
my $grfx6 = $page->gfx();
$text6->font($fontTR, 20); # 20 pt Times Roman bold

$text5->fillcolor('black');
$grfx5->strokecolor('blue');
$grfx5->fillcolor('orange');
$text6->fillcolor('yellow');
$grfx6->strokecolor('red');
$grfx6->fillcolor('purple');

$text5->translate(50,400);
$text5->text_left("This text should be under everything.");

$grfx5->circle(100,390, 30);
$grfx5->fillstroke();

$text6->translate(50,370);
$text6->text_left("This text should be over the ball and under the bar.");

$grfx6->rect(160,360, 20,70);
$grfx6->fillstroke();
427
428 % ---------------- group 4: define text1, graphics1, text2, graphics2 with colors for 2
429 19 0 obj << /Length 206 >> stream % obj 19 is text1 for (4)
430 BT
431 /TiCBA 20 Tf % Times Roman Bold 20pt
432 0 0 0 rg % fill black
433 1 0 0 1 50 400 Tm % position text
434 <0037 ... 0011> Tj % "under" line
435 ET
436 endstream endobj
437
438 20 0 obj << /Length 671 >> stream % obj 20 is graphics1 for (4) circle
439 0 0 1 RG % stroke blue
440 1 0.647059 0 rg % fill orange
441 130 390 m ... h B % draw and fill circle
442 endstream endobj
443
444 21 0 obj << /Length 266 >> stream % obj 21 is text2 for (4)
445 BT
446 /TiCBA 20 Tf % Times Roman Bold 20pt
447 1 1 0 rg % fill yellow
448 1 0 0 1 50 370 Tm % position text
449 <0037 ... 0011> Tj % "over" line
450 ET
451 endstream endobj
452
453 22 0 obj << /Length 52 >> stream % obj 22 is graphics for (4) bar
454 1 0 0 RG % stroke red
455 0.498039 0 0.498039 rg % fill purple
456 160 360 20 70 re B % draw and fill rectangle (bar)
457 endstream endobj
458
459 # ----------------------------
460 $pdf->saveas("$fname.pdf");
461
462 The separation of text and graphics means that only some text methods
463 are available in a graphics object, and only some graphics methods are
464 available in a text object. There is much overlap, but they differ.
465 There's really no reason the code couldn't have been written (in
466 PDF::API2, or earlier) as outputting to a single object, which would
467 keep everything in the same order as the method calls. An advantage
468 would be less object and stream overhead in the PDF file. The only
469 drawback might be that an object might more easily overflow and require
470 splitting into multiple objects, but that should be rare.
471
472 You should always be able to manually split an object by simply ending
473 output to the first object, and picking up with output to the second
474 object, so long as it was created immediately after the first object.
475 The graphics state at the end of the first object should be the initial
476 state at the beginning of the second object. However, use caution when
477 dealing with text objects -- the PDF specification states that the Text
478 matrices are not carried over from one object to the next (BT resets
479 them), so you may need to reset some settings.
480
481 $grfx1 = $page->gfx();
482 $grfx2 = $page->gfx();
483 # write a huge amount of stuff to $grfx1
484 # write a huge amount of stuff to $grfx2, picking up where $grfx1 left off
485
In any case, now that you understand the rendering order and how the
order of object declarations affects it, you can completely control how
text and graphics are drawn. There is really no need to add another
"both" type object that will handle all graphics and text objects, as
that would probably be a major code bloat for very little benefit.
However, it could be considered in the future if there is a
demonstrated need for it, such as serious PDF file size bloat due to
the extra object overhead when interleaving text and graphics output.
495
There is not currently a general facility for mixed-use objects, but a
limited example is the current implementation of underline,
line-through, and overline text (within column() markup). These are
performed within the text object, temporarily exiting (ET) to graphics
mode to draw the lines, and then returning (BT) to text mode. This was
done so that baseline coordinate adjustments could be easily made.
Since "BT" resets some text settings, this needs to be done with care!
503
504 PDF Versions Supported
505 When creating a PDF file using the functions in PDF::Builder, the
506 output is marked as PDF 1.4. This does not mean that all PDF
507 functionality up through 1.4 is supported! There are almost surely
508 features missing as far back as the PDF 1.0 standard.
509
510 The big problem is when a PDF of version 1.5 or higher is imported or
511 opened in PDF::Builder. If it contains content that is actually
512 unsupported by this software, there is a chance that something will
513 break. This does not guarantee that a PDF marked as "1.7" will go down
514 in flames when read by PDF::Builder, or that a PDF written back out
515 will break in a Reader, but the possibility is there. Much PDF writer
516 software simply marks its output as the highest version of PDF at the
517 time (usually 1.7), even if there is no content beyond, say, 1.2.
518 There is some handling of PDF 1.5 items in PDF::Builder, such as cross
519 reference streams, but support beyond 1.4 is very limited. All we can
520 say is to be careful when handling PDFs whose version is above 1.4, and
521 test thoroughly, as they may break at some point.
522
523 PDF::Builder includes a simple version control mechanism, where the
524 initial PDF version to be output (default 1.4) can be set by the
525 programmer. Input PDFs greater than 1.4 (current output level) will
526 receive a warning (can be suppressed) that the output level will be
527 raised to that level. The use of PDF features greater than the current
528 output level will likewise trigger a warning that the output level is
529 to be raised to the necessary level. If this is not desired, you should
530 avoid using those PDF features which are higher than the desired PDF
531 output level.
532
533 History
534 PDF::API2 was originally written by Alfred Reibenschuh, derived from
535 Martin Hosken's Text::PDF via the Text::PDF::API wrapper. In 2009,
536 Otto Hirr started the PDF::API3 fork, but it never went anywhere. In
537 2011, PDF::API2 maintenance was taken over by Steve Simms. In 2017,
538 PDF::Builder was forked by Phil M. Perry, who desired a more aggressive
539 schedule of new features and bug fixes than Simms was providing,
540 although some of Simms's work has been ported from PDF::API2.
541
542 According to "pdfapi2_for_fun_and_profit_APW2005.pdf" (on
543 http://pdfapi2.sourceforge.net, an unmaintained site), the history of
544 PDF::API2 (the predecessor to PDF::Builder) goes as such:
545
546 • First Code implemented based on PDFlib-0.6 (AFPL)
547 • Changed to Text::PDF with a total rewrite as Text::PDF::API
548 (procedural)
549 • Unmaintainable Code triggered rewrite into new Namespace
550 PDF::API2 (object-oriented, LGPL)
551 • Object-Structure streamlined in 0.4x
552
553 At Simms's request, the name of the new offering was changed from
554 PDF::API4 to PDF::Builder, to reduce the chance of confusion due to
555 parallel development. Perry's intent is to keep all internal methods
556 as upwardly compatible with PDF::API2 as possible, although it is
557 likely that there will be some drift (incompatibilities) over time. At
558 least initially, any program written based on PDF::API2 should be
559 convertible to PDF::Builder simply by changing "API2" anywhere it
560 occurs to "Builder". See the INFO/KNOWN_INCOMP known incompatibilities
561 file for further information.
562
563 Thanks...
564
565 Many users have helped out by reporting bugs and requesting
566 enhancements. A special shout out goes to those who have contributed
567 code and tests, or coordinated their package development with the needs
568 of PDF::Builder: Ben Bullock, Cary Gravel, Gregor Herrmann, Petr Pisar,
569 Jeffrey Ratcliffe, Steve Simms (via PDF::API2 fixes), and Johan
570 Vromans. Drop me a line if I've overlooked your contribution!
571
573 Note: older versions of this package named various (hash element)
574 options with leading dashes (hyphens) in the name, e.g., '-encode'. The
575 use of a dash is now optional, and options are documented with names
576 not using dashes. At some point in the future, it is possible that
577 support for dashed names will be deprecated (and eventually withdrawn),
578 so it would be good practice to start using undashed names in new and
579 revised code.
580
581 After saving a file...
Note that a PDF object such as $pdf cannot continue to be used after
saving an output PDF file or string with $pdf->save(), saveas(), or
stringify(). There is some cleanup and other operations done internally
which make the object unusable for further operations. If you try to
keep using the PDF object, you will likely receive an error message
such as "can't call method new_obj on an undefined value".
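If several outputs are needed, one approach is to build a fresh object for each. This is a sketch, assuming PDF::Builder is installed; "build_pdf" is a hypothetical helper, not part of the library:

```perl
use strict;
use warnings;
use PDF::Builder;

# Hypothetical helper: build a complete one-page document from scratch,
# drawing a filled bar at the given x position.
sub build_pdf {
    my ($x) = @_;
    my $pdf = PDF::Builder->new();
    $pdf->mediabox('Letter');
    my $page = $pdf->page();
    my $grfx = $page->gfx();
    $grfx->fillcolor('orange');
    $grfx->rect($x,660, 20,70);
    $grfx->fill();
    return $pdf;
}

# Each output comes from its own object; no object is touched again
# after stringify() has been called on it.
my $first  = build_pdf(100)->stringify();
my $second = build_pdf(200)->stringify();
```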
588
589 IntegrityCheck
The PDF::Builder methods that open an existing PDF file pass it through
the integrity checker method, "$self->IntegrityCheck(level, content)".
This method serves two purposes: 1) to find any "/Version" settings
that override the PDF version found in the PDF header, and 2) to
perform some basic validations on the contents of the PDF.
595
596 The "level" parameter accepts the following values:
597
0 = Do not output any diagnostic messages; just return any version
override.

1 = Output error-level (serious) diagnostic messages, as well as
returning any version override. Errors include:

* The /Root object was not specified anywhere, or it was specified but
the indicated object was not found.
* An object claims another object as its child (/Kids list), but a
different object has already claimed that child.
* An object claims a child, but that child does not list a Parent, or
the child lists a different Parent.

2 = Output error- (serious) and warning- (less serious) level
diagnostic messages, as well as returning any version override. This is
the default.

3 = Output error- (serious), warning- (less serious), and note-
(informational) level diagnostic messages, as well as returning any
version override. Notes include:

* The (optional) /Info object was not specified anywhere, or it was
specified but the indicated object was not found.
* An object was referenced, but no entry for it was found among the
objects. (This may be OK if the object is not defined, or is on the
free list, as the reference will then be ignored.)
* An object is defined, but it appears that no other object is
referencing it.

4 = Output error-, warning-, and note-level diagnostic messages, as
well as returning any version override. Also dump the diagnostic data
structure.

5 = Output error-, warning-, and note-level diagnostic messages, as
well as returning any version override. Also dump the diagnostic data
structure and the $self data structure (generally useful only if you
have already read in the PDF file).
629
630 The version is a string (e.g., '1.5') if found, otherwise "undef"
631 (undefined value) is returned.
632
633 For controlling the "automatic" call to IntegrityCheck (via opens), the
634 level may be given with the option (flag) "diaglevel => n", where "n"
635 is between 0 and 5.
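As a sketch of passing the level on an open (assuming PDF::Builder is installed; the temporary file merely stands in for an existing PDF you want to read):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use PDF::Builder;

# Create a small PDF to read back, as a stand-in for an existing file.
my (undef, $file) = tempfile(SUFFIX => '.pdf', UNLINK => 1);
my $new = PDF::Builder->new();
$new->mediabox('Letter');
$new->page();
$new->saveas($file);

# Reopen it, asking the integrity check for error-level messages only
# (level 1), rather than the default level 2.
my $pdf = PDF::Builder->open($file, diaglevel => 1);
print "opened: ", (defined $pdf ? 'yes' : 'no'), "\n";
```

A level of 0 would silence the diagnostics entirely, which may suit production code that has already been tested against its input files.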
636
637 Preferences - set user display preferences
638 $pdf->preferences(%options)
639 Controls viewing preferences for the PDF.
640
641 Page Mode Options
642
643 fullscreen
644 Full-screen mode, with no menu bar, window controls, or any
645 other window visible.
646
647 thumbs
648 Thumbnail images visible.
649
650 outlines
651 Document outline visible.
652
653 Page Layout Options
654
655 singlepage
656 Display one page at a time.
657
658 onecolumn
659 Display the pages in one column.
660
661 twocolumnleft
Display the pages in two columns, with odd-numbered pages on the
left.
664
665 twocolumnright
Display the pages in two columns, with odd-numbered pages on the
right.
668
669 Viewer Options
670
671 hidetoolbar
672 Specifying whether to hide tool bars.
673
674 hidemenubar
675 Specifying whether to hide menu bars.
676
677 hidewindowui
678 Specifying whether to hide user interface elements.
679
680 fitwindow
681 Specifying whether to resize the document's window to the size
682 of the displayed page.
683
684 centerwindow
685 Specifying whether to position the document's window in the
686 center of the screen.
687
688 displaytitle
689 Specifying whether the window's title bar should display the
690 document title taken from the Title entry of the document
691 information dictionary.
692
693 afterfullscreenthumbs
694 Thumbnail images visible after Full-screen mode.
695
696 afterfullscreenoutlines
697 Document outline visible after Full-screen mode.
698
699 printscalingnone
700 Set the default print setting for page scaling to none.
701
702 simplex
703 Print single-sided by default.
704
705 duplexflipshortedge
706 Print duplex by default and flip on the short edge of the
707 sheet.
708
709 duplexfliplongedge
710 Print duplex by default and flip on the long edge of the sheet.
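The page mode, page layout, and viewer options above might be combined as in this sketch. It assumes PDF::Builder is installed, that each option is switched on with a true value, and that the installed version accepts the undashed option names (older versions may need a leading dash, e.g., '-outlines'):

```perl
use strict;
use warnings;
use PDF::Builder;

my $pdf = PDF::Builder->new();
$pdf->mediabox('Letter');
$pdf->page();

$pdf->preferences(
    outlines           => 1,   # page mode: show the document outline
    onecolumn          => 1,   # page layout: one continuous column
    hidetoolbar        => 1,   # viewer: hide tool bars
    displaytitle       => 1,   # viewer: show the Title in the title bar
    duplexfliplongedge => 1,   # printing: duplex, flip on the long edge
);

my $out = $pdf->stringify();
```

These settings are requests to the PDF reader, so how they are honored depends on the viewer.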
711
712 Page Fit Options
713
714 These options are used for the "firstpage" layout, as well as for
715 Annotations, Named Destinations and Outlines.
716
717 'fit' => 1
718 Display the page designated by $page, with its contents magnified
719 just enough to fit the entire page within the window both
720 horizontally and vertically. If the required horizontal and
721 vertical magnification factors are different, use the smaller of
722 the two, centering the page within the window in the other
723 dimension.
724
725 'fith' => $top
726 Display the page designated by $page, with the vertical coordinate
727 $top positioned at the top edge of the window and the contents of
728 the page magnified just enough to fit the entire width of the page
729 within the window.
730
731 'fitv' => $left
732 Display the page designated by $page, with the horizontal
733 coordinate $left positioned at the left edge of the window and the
734 contents of the page magnified just enough to fit the entire height
735 of the page within the window.
736
737 'fitr' => [ $left, $bottom, $right, $top ]
738 Display the page designated by $page, with its contents magnified
739 just enough to fit the rectangle specified by the coordinates
740 $left, $bottom, $right, and $top entirely within the window both
741 horizontally and vertically. If the required horizontal and
742 vertical magnification factors are different, use the smaller of
743 the two, centering the rectangle within the window in the other
744 dimension.
745
746 'fitb' => 1
747 Display the page designated by $page, with its contents magnified
748 just enough to fit its bounding box entirely within the window both
749 horizontally and vertically. If the required horizontal and
750 vertical magnification factors are different, use the smaller of
751 the two, centering the bounding box within the window in the other
752 dimension.
753
754 'fitbh' => $top
755 Display the page designated by $page, with the vertical coordinate
756 $top positioned at the top edge of the window and the contents of
757 the page magnified just enough to fit the entire width of its
758 bounding box within the window.
759
760 'fitbv' => $left
761 Display the page designated by $page, with the horizontal
762 coordinate $left positioned at the left edge of the window and the
763 contents of the page magnified just enough to fit the entire height
764 of its bounding box within the window.
765
766 'xyz' => [ $left, $top, $zoom ]
767 Display the page designated by $page, with the coordinates
768 "[$left, $top]" positioned at the top-left corner of the window
769 and the contents of the page magnified by the factor $zoom. A zero
770 (0) value for any of the parameters $left, $top, or $zoom specifies
771 that the current value of that parameter is to be retained
772 unchanged.
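
The "use the smaller of the two" rule for 'fit', 'fitr', and 'fitb' can be sketched in a few lines of plain Perl. This is only an illustration of the arithmetic a viewer might perform; the helper name fit_zoom and the window dimensions are assumptions made for the example, not part of PDF::Builder.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not a PDF::Builder call): the zoom factor that
# fits a page (or rectangle, or bounding box) entirely inside a window.
sub fit_zoom {
    my ($win_w, $win_h, $page_w, $page_h) = @_;
    my $horizontal = $win_w / $page_w;   # factor needed to fill the width
    my $vertical   = $win_h / $page_h;   # factor needed to fill the height
    return min($horizontal, $vertical);  # the smaller one keeps everything visible
}

# An A4 page (595 x 842 points) in an assumed 1190 x 1000 window:
# the vertical factor is the smaller one, so the page is centered horizontally.
my $zoom = fit_zoom(1190, 1000, 595, 842);
printf "zoom = %.3f\n", $zoom;
```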
773
774 Initial Page Options
775
776 firstpage => [ $page, %options ]
777 Specifying the page (either a page number or a page object) to be
778 displayed, plus one of the location options listed above in "Page
779 Fit Options".
780
781 Example
782
783 $pdf->preferences(
784 fullscreen => 1,
785 onecolumn => 1,
786 afterfullscreenoutlines => 1,
787 firstpage => [$page, fit => 1],
788 );
789
790 info Example
791 %h = $pdf->info(
792 'Author' => "Alfred Reibenschuh",
793 'CreationDate' => "D:20020911000000+01'00'",
794 'ModDate' => "D:YYYYMMDDhhmmssOHH'mm'",
795 'Creator' => "fredos-script.pl",
796 'Producer' => "PDF::Builder",
797 'Title' => "some Publication",
798 'Subject' => "perl ?",
799 'Keywords' => "all good things are pdf"
800 );
801 print "Author: $h{'Author'}\n";
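
The date strings in the example follow the layout D:YYYYMMDDhhmmssOHH'mm', where O is the sign of the offset from UTC. As a minimal sketch, assuming you are content to record times in UTC, such a string can be built with the core POSIX module. The helper name pdf_date_utc is made up for this example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not a PDF::Builder call): format an epoch time
# as a PDF date string, always expressed in UTC (offset +00'00').
sub pdf_date_utc {
    my ($epoch) = @_;
    return strftime("D:%Y%m%d%H%M%S+00'00'", gmtime($epoch));
}

# The current time, suitable for a 'ModDate' value.
print pdf_date_utc(time()), "\n";
```

The result could then be passed as the 'CreationDate' or 'ModDate' value in the info() call shown above.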
802
803 XMP XML example
804 $xml = $pdf->xmpMetadata();
805 print "PDF metadata reads: $xml\n";
806 $xml=<<EOT;
807 <?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
808 <?adobe-xap-filters esc="CRLF"?>
809 <x:xmpmeta
810 xmlns:x='adobe:ns:meta/'
811 x:xmptk='XMP toolkit 2.9.1-14, framework 1.6'>
812 <rdf:RDF
813 xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
814 xmlns:iX='http://ns.adobe.com/iX/1.0/'>
815 <rdf:Description
816 rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
817 xmlns:pdf='http://ns.adobe.com/pdf/1.3/'
818 pdf:Producer='Acrobat Distiller 6.0.1 for Macintosh'></rdf:Description>
819 <rdf:Description
820 rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
821 xmlns:xap='http://ns.adobe.com/xap/1.0/'
822 xap:CreateDate='2004-11-14T08:41:16Z'
823 xap:ModifyDate='2004-11-14T16:38:50-08:00'
824 xap:CreatorTool='FrameMaker 7.0'
825 xap:MetadataDate='2004-11-14T16:38:50-08:00'></rdf:Description>
826 <rdf:Description
827 rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
828 xmlns:xapMM='http://ns.adobe.com/xap/1.0/mm/'
829 xapMM:DocumentID='uuid:919b9378-369c-11d9-a2b5-000393c97fd8'></rdf:Description>
830 <rdf:Description
831 rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
832 xmlns:dc='http://purl.org/dc/elements/1.1/'
833 dc:format='application/pdf'>
834 <dc:description>
835 <rdf:Alt>
836 <rdf:li xml:lang='x-default'>Adobe Portable Document Format (PDF)</rdf:li>
837 </rdf:Alt>
838 </dc:description>
839 <dc:creator>
840 <rdf:Seq>
841 <rdf:li>Adobe Systems Incorporated</rdf:li>
842 </rdf:Seq>
843 </dc:creator>
844 <dc:title>
845 <rdf:Alt>
846 <rdf:li xml:lang='x-default'>PDF Reference, version 1.6</rdf:li>
847 </rdf:Alt>
848 </dc:title>
849 </rdf:Description>
850 </rdf:RDF>
851 </x:xmpmeta>
852 <?xpacket end='w'?>
853 EOT
854
855 $xml = $pdf->xmpMetadata($xml);
856 print "PDF metadata now reads: $xml\n";
857
858 "BOX" METHODS
859 A general note: if you give a page a Media Box (or other "box") that
860 differs from the global "box" setting, take care to define the whole
861 "chain" of boxes on that page, to avoid surprises. For example, if you
862 define a global Media Box (paper size) and a global Crop Box, and then
863 define a new page-level Media Box without defining a new page-level
864 Crop Box, you may get odd results in the resultant cropping. Such
865 combinations are not well defined.
866
867 All dimensions in boxes default to the default User Unit, which is
868 points (1/72 inch). Note that the PDF specification limits sizes and
869 coordinates to 14400 User Units (200 inches, for the default User Unit
870 of one point), and Adobe products (so far) follow this limit for
871 Acrobat and Distiller. It is worth noting that other PDF writers and
872 readers may choose to ignore the 14400 unit limit, with or without the
873 use of a specified User Unit. Therefore, PDF::Builder does not enforce
874 any limits on coordinates -- it's your responsibility to consider what
875 readers and other PDF tools may be used with a PDF you produce! Also
876 note that earlier Acrobat readers had coordinate limits as small as
877 3240 User Units (45 inches), and minimum media size of 72 or 3 User
878 Units.
879
880 User Units
881
882 $pdf->userunit($number)
883 The default User Unit in the PDF coordinate system is one point
884 (1/72 inch). You can think of it as a scale factor to enable larger
885 (or even, smaller) documents. This method may be used (for PDF 1.6
886 and higher) to set the User Unit to some number of points. For
887 example, userunit(72) will set the scale multiplier to 72.0 points
888 per User Unit, or 1 inch to the User Unit. Any number greater than
889 zero is acceptable, although some readers and tools may not handle
890 User Units of less than 1.0 very well.
891
892 Not all readers respect the User Unit, if you give one, or handle
893 it in exactly the same way. Adobe Distiller, for one, does not use
894 it. How User Units are handled may vary from reader to reader.
895 Adobe Acrobat, at this writing, respects User Unit in version 7.0
896 and up, but limits it to 75000 (giving a maximum document size of
897 15 million inches or 236.7 miles or 381 km). Other readers and PDF
898 tools may allow a larger (or smaller) limit.
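
The size limits quoted above follow directly from the 14400 User Unit limit. The following sketch just redoes that arithmetic in plain Perl; the helper name max_side_inches is an assumption for illustration and nothing here calls PDF::Builder.

```perl
use strict;
use warnings;

my $MAX_UNITS = 14400;   # per-side limit in User Units, as quoted above

# Hypothetical helper: the longest side, in inches, for a given
# User Unit (expressed in points per User Unit; 72 points = 1 inch).
sub max_side_inches {
    my ($userunit) = @_;
    return $MAX_UNITS * $userunit / 72;
}

my $default = max_side_inches(1);       # default User Unit of one point
my $acrobat = max_side_inches(75000);   # the Acrobat limit mentioned above
my $miles   = $acrobat / 63360;         # 63360 inches per mile

printf "%d in, %d in, %.1f miles\n", $default, $acrobat, $miles;
```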
899
900 Your Mileage May Vary: Some readers ignore a global User Unit
901 setting and do not have pages inherit it (PDF::Builder duplicates
902 it on each page to simulate inheritance). Some readers may give
903 spurious warnings about truncated content when a Media Box is
904 changed while User Units are being used. Some readers do strange
905 things with Crop Boxes when a User Unit is in effect.
906
907 Depending on the reader used, the effect of a larger User Unit
908 (greater than 1) may mean lower resolution (chunkier or coarser
909 appearance) in the rendered document. If you're printing something
910 the size of a highway billboard, this may not matter to you, but
911 you should be aware of the possibility (even with fractional
912 coordinates). Conversely, a User Unit of less than 1.0 (if
913 permitted) reduces the allowable size of your document, but may
914 result in greater resolution.
915
916 A global (PDF level) User Unit setting is inherited by each page
917 (an action by PDF::Builder, not necessarily automatically done by
918 the reader), or can be overridden by calling userunit in the page.
919 Do not give more than one global userunit setting, as only the last
920 one will be used. Setting a page's User Unit (if "$page->"
921 instead) is permitted (overriding the global setting for this
922 page). However, many sources recommend against doing this, as
923 results may not be as expected (once again, depending on the quirks
924 of the reader).
925
926 Remember to call "userunit" before calling anything having to do
927 with page or box sizes, or coordinates. Especially when setting
928 'named' box sizes, the methods need to know the current User Unit
929 so that named page sizes (in points) may be scaled down to the
930 current User Unit.
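
To see why the User Unit must be known first, consider what scaling a named size means. The sketch below shows only the arithmetic (points divided by points-per-User-Unit); the helper scale_to_userunit is hypothetical, and how PDF::Builder performs this step internally is not described here.

```perl
use strict;
use warnings;

# Hypothetical helper: convert dimensions given in points into the
# current User Unit (expressed as points per User Unit).
sub scale_to_userunit {
    my ($userunit, @points) = @_;
    return map { $_ / $userunit } @points;
}

# A4 is 595 x 842 points. With a User Unit of 72 points (1 inch),
# the same sheet is about 8.26 x 11.69 User Units.
my ($w, $h) = scale_to_userunit(72, 595, 842);
printf "%.2f x %.2f User Units\n", $w, $h;
```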
931
932 Media Box
933
934 $pdf->mediabox($name)
935 $pdf->mediabox($name, orient => 'orientation' )
936 $pdf->mediabox($w,$h)
937 $pdf->mediabox($llx,$lly, $urx,$ury)
938 ($llx,$lly, $urx,$ury) = $pdf->mediabox()
939 Sets the global Media Box (or page's Media Box, if "$page->"
940 instead). This defines the width and height (or by corner
941 coordinates, or by standard name) of the output page itself, such
942 as the physical paper size. This is normally the largest of the
943 "boxes". If any subsidiary box (within it) exceeds the media box,
944 the portion of the material or boxes outside of the Media Box will
945 be ignored. That is, the Media Box is the One Box to Rule Them All,
946 and is the overall limit for other boxes (some documentation refers
947 to the Media Box as "clipping" other boxes). In addition, the Media
948 Box defines the overall coordinate system for text and graphics
949 operations.
950
951 If no arguments are given, the current Media Box (global or page)
952 coordinates are returned instead. The former "get_mediabox" (page
953 only) function is deprecated and will likely be removed some time
954 in the future. In addition, when setting the Media Box, the
955 resulting coordinates are returned. This permits you to specify the
956 page size by a name (alias) and get the dimensions back, all in one
957 call.
958
959 Note that many printers cannot print all the way to the physical
960 edge of the paper, so you should plan to leave some blank margin,
961 even outside of any crop marks and bleeds. Printers and on-screen
962 readers are free to discard any content found outside the Media
963 Box, and printers may discard some material just inside the Media
964 Box.
965
966 A global Media Box is required by the PDF spec; if not explicitly
967 given, PDF::Builder will set the global Media Box to US Letter size
968 (8.5in x 11in). This is the media size that will be used for all
969 pages if you do not specify a "mediabox" call on a page. That is, a
970 global (PDF level) mediabox setting is inherited by each page, or
971 can be overridden by setting mediabox in the page. Do not give more
972 than one global mediabox setting, as only the last one will be
973 used.
974
975 If you give a single string name (e.g., 'A4'), you may optionally
976 add an orientation to turn the page 90 degrees into Landscape mode:
977 "orient => 'L'" or "orient => 'l'". "orient" is the only option
978 recognized, and a string beginning with an 'L' or 'l' (for
979 Landscape) is the only value of interest (anything else is treated
980 as Portrait mode). The y axis still runs from 0 at the bottom of
981 the page to what used to be the page width (now, height) at the
982 top, and likewise for the x axis: 0 at left to (former) height at
983 the right. That is, the coordinate system is the same as before,
984 except that the height and width are different.
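
The effect of "orient" on a named size amounts to swapping width and height. Here is a minimal sketch of that rule, assuming a hypothetical helper oriented_size; it mirrors the description above (only a value beginning with 'L' or 'l' selects Landscape) and is not PDF::Builder's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: return (width, height) for a portrait size,
# swapped if the orientation string begins with 'L' or 'l'.
sub oriented_size {
    my ($w, $h, $orient) = @_;
    return ($h, $w) if defined $orient && $orient =~ /^l/i;
    return ($w, $h);   # anything else is treated as Portrait
}

my @landscape = oriented_size(595, 842, 'L');   # A4 turned 90 degrees
my @portrait  = oriented_size(595, 842, 'P');
print "landscape: @landscape, portrait: @portrait\n";
```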
985
986 The lower left corner does not have to be 0,0. It can be any values
987 you want, including negative values (so long as the resulting
988 media's sides are at least one point long). "mediabox" sets the
989 coordinate system (including the origin) of the graphics and text
990 that will be drawn, as well as for subsequent "boxes". It's even
991 possible to give any two opposite corners (such as upper left and
992 lower right). The coordinate system will be rearranged (by the
993 Reader) to still be the conventional minimum "x" and "y" in the
994 lower left (i.e., you can't make "y" increase from top to bottom!).
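
If you would rather not depend on the Reader to rearrange corners, you can normalize them yourself before calling "mediabox". The following is a sketch in plain Perl, with a hypothetical helper named normalize_box; it simply picks the minimum and maximum of each coordinate pair.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: given any two opposite corners, return the box
# as (lower-left x, lower-left y, upper-right x, upper-right y).
sub normalize_box {
    my ($x1, $y1, $x2, $y2) = @_;
    return (min($x1, $x2), min($y1, $y2), max($x1, $x2), max($y1, $y2));
}

# Upper left and lower right corners of an A4 sheet.
my @box = normalize_box(0, 842, 595, 0);
print "normalized: @box\n";
```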
995
996 Example:
997
998 $pdf = PDF::Builder->new();
999 $pdf->mediabox('A4'); # A4 size (595 Pt wide by 842 Pt high)
1000 ...
1001 $pdf->saveas('our/new.pdf');
1002
1003 $pdf = PDF::Builder->new();
1004 $pdf->mediabox(595, 842); # A4 size, with implicit 0,0 LL corner
1005 ...
1006 $pdf->saveas('our/new.pdf');
1007
1008 $pdf = PDF::Builder->new;
1009 $pdf->mediabox(0, 0, 595, 842); # A4 size, with explicit 0,0 LL corner
1010 ...
1011 $pdf->saveas('our/new.pdf');
1012
1013 See the PDF::Builder::Resource::PaperSizes source code for the full
1014 list of supported names (aliases) and their dimensions in points.
1015 You are free to add additional paper sizes to this file, if you
1016 wish. You might want to do this if you frequently use a standard
1017 page size in rotated (Landscape) mode. See also the "getPaperSizes"
1018 call in PDF::Builder::Util. These names (aliases) are also usable
1019 in other "box" calls, although useful only if the "box" is the same
1020 size as the full media (Media Box), and you don't mind their
1021 starting at 0,0.
1022
1023 Crop Box
1024
1025 $pdf->cropbox($name)
1026 $pdf->cropbox($name, orient => 'orientation')
1027 $pdf->cropbox($w,$h)
1028 $pdf->cropbox($llx,$lly, $urx,$ury)
1029 ($llx,$lly, $urx,$ury) = $pdf->cropbox()
1030 Sets the global Crop Box (or page's Crop Box, if "$page->"
1031 instead). This will define the media size to which the output will
1032 later be clipped. Note that this does not itself output any crop
1033 marks to guide cutting of the paper! PDF Readers should consider
1034 this to be the visible portion of the page, and anything found
1035 outside it may be clipped (invisible). By default, it is equal to
1036 the Media Box, but may be defined to be smaller, in the coordinate
1037 system set by the Media Box. A global setting will be inherited by
1038 each page, but can be overridden on a per-page basis.
1039
1040 A Reader or Printer may choose to discard any clipped (invisible)
1041 part of the page, and show only the area within the Crop Box. For
1042 example, if your page Media Box is A4 (0,0 to 595,842 Points), and
1043 your Crop Box is (100,100 to 495,742), a reader such as Adobe
1044 Acrobat Reader may show you a page 395 by 642 Points in size (i.e.,
1045 just the visible area of your page). Other Readers may show you the
1046 full media size (Media Box) and a 100 Point wide blank area (in
1047 this example) around the visible content.
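
The numbers in this example are easy to check. The sketch below computes the visible size from the Crop Box corners in plain Perl; the helper box_size is an illustrative assumption, not a PDF::Builder method.

```perl
use strict;
use warnings;

# Hypothetical helper: width and height of a box given its corners.
sub box_size {
    my ($llx, $lly, $urx, $ury) = @_;
    return ($urx - $llx, $ury - $lly);
}

my @media = (0, 0, 595, 842);        # A4 Media Box
my @crop  = (100, 100, 495, 742);    # Crop Box from the example above

my ($w, $h) = box_size(@crop);       # visible area
my $margin  = $crop[0] - $media[0];  # blank strip at the left edge
print "visible area: $w x $h points, margin: $margin points\n";
```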
1048
1049 If no arguments are given, the current Crop Box (global or page)
1050 coordinates are returned instead. The former "get_cropbox" (page
1051 only) function is deprecated and will likely be removed some time
1052 in the future. If a Crop Box has not been defined, the Media Box
1053 coordinates (which always exist) will be returned instead. In
1054 addition, when setting the Crop Box, the resulting coordinates are
1055 returned. This permits you to specify the crop box by a name
1056 (alias) and get the dimensions back, all in one call.
1057
1058 Do not confuse the Crop Box with the "Trim Box", which shows where
1059 printed paper is expected to actually be cut. Some PDF Readers may
1060 reduce the visible "paper" background to the size of the crop box;
1061 others may simply omit any content outside it. Either way, you
1062 would lose any trim or crop marks, printer instructions, color
1063 alignment dots, or other content outside the Crop Box. A good use
1064 of the Crop Box would be to limit printing to the area where a printer
1065 can reliably put down ink, and leave white the edge areas where
1066 paper-handling mechanisms prevent ink or toner from being applied.
1067 This would keep you from accidentally putting valuable content in
1068 an area where a printer will refuse to print, yet permit you to
1069 include a bleed area and space for printer's marks and
1070 instructions. Needless to say, if your printer cannot print to the
1071 very edge of the paper, you will need to trim (cut) the printed
1072 sheets to get true bleeds.
1073
1074 A global (PDF level) cropbox setting is inherited by each page, or
1075 can be overridden by setting cropbox in the page. As with
1076 "mediabox", only one crop box may be set at this (PDF) level. As
1077 with "mediabox", a named media size may have an orientation (l or
1078 L) for Landscape mode. Note that the PDF level global Crop Box
1079 will be used even if the page gets its own Media Box. That is, the
1080 page's Crop Box inherits the global Crop Box, not the page Media
1081 Box, even if the page has its own media size! If you set the page's
1082 own Media Box, you should consider also explicitly setting the page
1083 Crop Box (and other boxes).
1084
1085 Bleed Box
1086
1087 $pdf->bleedbox($name)
1088 $pdf->bleedbox($name, orient => 'orientation')
1089 $pdf->bleedbox($w,$h)
1090 $pdf->bleedbox($llx,$lly, $urx,$ury)
1091 ($llx,$lly, $urx,$ury) = $pdf->bleedbox()
1092 Sets the global Bleed Box (or page's Bleed Box, if "$page->"
1093 instead). This is typically used in printing on paper, where you
1094 want ink or color (such as thumb tabs) to be printed a bit beyond
1095 the final paper size, to ensure that the cut paper bleeds (the cut
1096 goes through the ink), rather than accidentally leaving some white
1097 paper visible outside. Allow enough "bleed" over the expected trim
1098 line to account for minor variations in paper handling, folding,
1099 and cutting, to avoid showing white paper at the edge. The Bleed
1100 Box is where printing could actually extend to; the Trim Box is
1101 normally within it, where the paper would actually be cut. The
1102 default value is equal to the Crop Box, but is often a bit smaller.
1103 The space between the Bleed Box and the Crop Box is available for
1104 printer instructions, color alignment dots, etc., while crop marks
1105 (trim guides) are at least partly within the bleed area (and should
1106 be printed after content is printed).
1107
1108 If no arguments are given, the current Bleed Box (global or page)
1109 coordinates are returned instead. The former "get_bleedbox" (page
1110 only) function is deprecated and will likely be removed some time
1111 in the future. If a Bleed Box has not been defined, the Crop Box
1112 coordinates (if defined) will be returned, otherwise the Media Box
1113 coordinates (which always exist) will be returned. In addition,
1114 when setting the Bleed Box, the resulting coordinates are returned.
1115 This permits you to specify the bleed box by a name (alias) and get
1116 the dimensions back, all in one call.
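
The fallback order used when querying a box (the box itself, else the Crop Box, else the Media Box) can be modeled with a small lookup. This is a sketch of the described behavior only, using a plain hash and a hypothetical helper effective_box; it does not reflect PDF::Builder's internal data structures.

```perl
use strict;
use warnings;

# Hypothetical helper: return the requested box if defined, otherwise
# fall back to the Crop Box, and finally to the Media Box.
sub effective_box {
    my ($boxes, $name) = @_;
    return $boxes->{$name} // $boxes->{crop} // $boxes->{media};
}

# Only a Media Box (US Letter) is defined at first.
my %boxes = (media => [0, 0, 612, 792]);
my $before = effective_box(\%boxes, 'bleed');   # falls through to media

# Once a Crop Box exists, an undefined Bleed Box reports that instead.
$boxes{crop} = [18, 18, 594, 774];
my $after = effective_box(\%boxes, 'bleed');

print "before: @$before\nafter:  @$after\n";
```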
1117
1118 A global (PDF level) bleedbox setting is inherited by each page, or
1119 can be overridden by setting bleedbox in the page. As with
1120 "mediabox", only one bleed box may be set at this (PDF) level. As
1121 with "mediabox", a named media size may have an orientation (l or
1122 L) for Landscape mode. Note that the PDF level global Bleed Box
1123 will be used even if the page gets its own Crop Box. That is, the
1124 page's Bleed Box inherits the global Bleed Box, not the page Crop
1125 Box, even if the page has its own media size! If you set the page's
1126 own Media Box or Crop Box, you should consider also explicitly
1127 setting the page Bleed Box (and other boxes).
1128
1129 Trim Box
1130
1131 $pdf->trimbox($name)
1132 $pdf->trimbox($name, orient => 'orientation')
1133 $pdf->trimbox($w,$h)
1134 $pdf->trimbox($llx,$lly, $urx,$ury)
1135 ($llx,$lly, $urx,$ury) = $pdf->trimbox()
1136 Sets the global Trim Box (or page's Trim Box, if "$page->"
1137 instead). This is supposed to be the actual dimensions of the
1138 finished page (after trimming of the paper). In some production
1139 environments, it is useful to have printer's instructions, cut
1140 marks, and so on outside of the trim box. The default value is
1141 equal to the Crop Box, but is often a bit smaller than any Bleed Box,
1142 to allow the desired "bleed" effect.
1143
1144 If no arguments are given, the current Trim Box (global or page)
1145 coordinates are returned instead. The former "get_trimbox" (page
1146 only) function is deprecated and will likely be removed some time
1147 in the future. If a Trim Box has not been defined, the Crop Box
1148 coordinates (if defined) will be returned, otherwise the Media Box
1149 coordinates (which always exist) will be returned. In addition,
1150 when setting the Trim Box, the resulting coordinates are returned.
1151 This permits you to specify the trim box by a name (alias) and get
1152 the dimensions back, all in one call.
1153
1154 A global (PDF level) trimbox setting is inherited by each page, or
1155 can be overridden by setting trimbox in the page. As with
1156 "mediabox", only one trim box may be set at this (PDF) level. As
1157 with "mediabox", a named media size may have an orientation (l or
1158 L) for Landscape mode. Note that the PDF level global Trim Box
1159 will be used even if the page gets its own Crop Box. That is, the
1160 page's Trim Box inherits the global Trim Box, not the page Crop
1161 Box, even if the page has its own media size! If you set the page's
1162 own Media Box or Crop Box, you should consider also explicitly
1163 setting the page Trim Box (and other boxes).
1164
1165 Art Box
1166
1167 $pdf->artbox($name)
1168 $pdf->artbox($name, orient => 'orientation')
1169 $pdf->artbox($w,$h)
1170 $pdf->artbox($llx,$lly, $urx,$ury)
1171 ($llx,$lly, $urx,$ury) = $pdf->artbox()
1172 Sets the global Art Box (or page's Art Box, if "$page->" instead).
1173 This is supposed to define "the extent of the page's meaningful
1174 content (including [margins])". It might exclude some content, such
1175 as Headlines or headings. Any binding or punched-holes margin would
1176 typically be outside of the Art Box, as would be page numbers and
1177 running headers and footers. The default value is equal to the Crop
1178 Box, although normally it would be no larger than any Trim Box. The
1179 Art Box may often be used for defining "important" content (e.g.,
1180 excluding advertisements) that may or may not be brought over to
1181 another page (e.g., N-up printing).
1182
1183 If no arguments are given, the current Art Box (global or page)
1184 coordinates are returned instead. The former "get_artbox" (page
1185 only) function is deprecated and will likely be removed some time
1186 in the future. If an Art Box has not been defined, the Crop Box
1187 coordinates (if defined) will be returned, otherwise the Media Box
1188 coordinates (which always exist) will be returned. In addition,
1189 when setting the Art Box, the resulting coordinates are returned.
1190 This permits you to specify the art box by a name (alias) and get
1191 the dimensions back, all in one call.
1192
1193 A global (PDF level) artbox setting is inherited by each page, or
1194 can be overridden by setting artbox in the page. As with
1195 "mediabox", only one art box may be set at this (PDF) level. As
1196 with "mediabox", a named media size may have an orientation (l or
1197 L) for Landscape mode. Note that the PDF level global Art Box will
1198 be used even if the page gets its own Crop Box. That is, the page's
1199 Art Box inherits the global Art Box, not the page Crop Box, even if
1200 the page has its own media size! If you set the page's own Media
1201 Box or Crop Box, you should consider also explicitly setting the
1202 page Art Box (and other boxes).
1203
1204 Suggested Box Usage
1205
1206 See "examples/Boxes.pl" for an example of using boxes.
1207
1208 How you define your boxes (or let them default) is up to you, depending
1209 on whether you're duplex printing US Letter or A4 on your laser
1210 printer, to be spiral bound on the bind margin, or engaging a
1211 professional printer. In the latter case, discuss in advance with the
1212 print firm what capabilities (and limitations) they have and what
1213 information they need from a PDF file. For instance, they may not want
1214 a Crop Box defined, and may call for very specific box sizes. For large
1215 press runs, they may print multiple pages (N-up) duplexed on large web
1216 roll "signatures", which are then intricately folded and guillotined
1217 (trimmed) and bound together into books or magazines. You would usually
1218 just supply a PDF with all the pages; they would take care of the
1219 signature layout (which includes offsets and 180 degree rotations).
1220
1221 (As an aside, don't count on a printer having any particular font
1222 available, so be sure to ask. Usually they will want you to embed all
1223 fonts used, but ask first, and double-check before handing over the
1224 print job! TTF/OTF fonts (ttfont()) are embedded by default, but other
1225 fonts (core, ps, bdf, cjk) are not! A printer may have a core font
1226 collection, but they are free to substitute a "workalike" font for any
1227 given core font, and the results may not match what you saw on your
1228 PC!)
1229
1230 On the assumption that you're using a single sheet (US Letter or A4)
1231 laser or inkjet printer, are you planning to trim each sheet down to a
1232 smaller final size? If so, you can do true bleeds by defining a Trim
1233 Box and a slightly larger Bleed Box. You would print bleeds (all the
1234 way to the finished edge) out to the Bleed Box, but nothing is enforced
1235 about the Bleed Box. At the other end of the spectrum, you would define
1236 the Media Box to be the physical paper size being printed on. Most
1237 printers reserve a little space on the sides (and possibly top and
1238 bottom) for paper handling, so it is often good to define your Crop Box
1239 as the printable area. Remember that the Media Box sets the coordinate
1240 system used, so you still need to avoid going outside the Crop Box with
1241 content (most readers and printers will not show any ink outside of the
1242 Crop Box). Whether or not you define a Crop Box, you're going to almost
1243 always end up with white paper on at least the sides.
1244
1245 For small in-house jobs, you probably won't need color alignment dots
1246 and other such professional instructions and information between the
1247 Bleed Box and the Crop Box, but crop marks for trimming (if used)
1248 should go just outside the Trim Box (partly or wholly within the Bleed
1249 Box), and be drawn after all content. If you're not trimming the paper,
1250 don't try to do any bleed effects (including solid background color
1251 pages/covers), as you will usually have a white edge around the sheet
1252 anyway. Don't count on a PDF document never being physically printed,
1253 and not just displayed (where you can do things like bleed all the way
1254 to the media edge). Finally, for single sheet printing, an Art Box is
1255 probably unnecessary, but if you're combining pages into N-up prints,
1256 or doing other manipulations, it may be useful.
1257
1258 Box Inheritance
1259
1260 What Media, Crop, Bleed, Trim, and Art Boxes a page gets can be a
1261 little complicated. Note that usually, only the Media and Crop Boxes
1262 will have a clear visual effect. The visual effect of the other boxes
1263 (if any) may be very subtle.
1264
1265 First, everything is set at the global (PDF) level. The Media Box is
1266 always defined, and defaults to US Letter (8.5 inches wide by 11 inches
1267 high). The global Crop Box inherits the Media Box, unless explicitly
1268 defined. The Bleed, Trim, and Art Boxes inherit the Crop Box, unless
1269 explicitly defined. A global box should only be defined once, as the
1270 last one defined is the one that will be written to the PDF!
1271
1272 Second, a page inherits the global boxes, for its initial settings. You
1273 may call any of the box set methods ("cropbox", "trimbox", etc.) to
1274 explicitly set (override) any box for this page. Note that setting a
1275 new Media Box for the page does not reset the page's Crop Box -- it
1276 still uses whatever it inherited from the global Crop Box. You would
1277 need to explicitly set the page's Crop Box if you want a different
1278 setting. Likewise, the page's Bleed, Trim, and Art Boxes will not be
1279 reset by a new page Crop Box -- they will still inherit from the global
1280 (PDF) settings.
1281
1282 Third, the page Media Box (the one actually used for output pages),
1283 clips or limits all the other boxes to extend no larger than its size.
1284 For example, if the Media Box is US Letter, and you set a Crop Box of
1285 A4 size, the smaller of the two heights (11 inches) would be effective,
1286 and the smaller of the two widths (8.26 inches, 595 Points) would be
1287 effective. The given dimensions of a box are returned on query (get),
1288 not the effective dimensions clipped by the Media Box.
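
The clipping described above is a rectangle intersection. As a sketch under that assumption, the hypothetical helper clip_to_media below computes the effective box; remember that a query on the box itself still returns the dimensions you gave, not this clipped result.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: limit a box so it extends no further than the
# Media Box. Both arguments are [llx, lly, urx, ury] array references.
sub clip_to_media {
    my ($box, $media) = @_;
    return [
        max($box->[0], $media->[0]), max($box->[1], $media->[1]),
        min($box->[2], $media->[2]), min($box->[3], $media->[3]),
    ];
}

my $letter = [0, 0, 612, 792];   # US Letter Media Box (8.5in x 11in)
my $a4     = [0, 0, 595, 842];   # Crop Box given as A4

# Effective crop area: A4 width (595 points), Letter height (792 points).
my $effective = clip_to_media($a4, $letter);
print "effective crop: @$effective\n";
```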
1289
1290 FONT METHODS
1291 Core Fonts
1292
1293 Core fonts are limited to single byte encodings. You cannot use UTF-8
1294 or other multibyte encodings with core fonts. The default encoding for
1295 the core fonts is WinAnsiEncoding (roughly the CP-1252 superset of
1296 ISO-8859-1). See the "encode" option below to change this encoding.
1297 See the "font automap" method in PDF::Builder::Resource::Font for
1298 information on accessing more than 256 glyphs in a font, using planes,
1299 although there is no guarantee that future changes to font files will
1300 permit consistent results.
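
Before sending text to a core font, it can be useful to check that every character has a single byte representation. The sketch below uses the core Encode module and treats CP-1252 as a stand-in for WinAnsiEncoding, which the text above describes as only roughly equivalent; the helper name fits_winansi is an assumption for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if every character of $text can be encoded
# as a single byte in CP-1252 (used here as an approximation of
# WinAnsiEncoding).
sub fits_winansi {
    my ($text) = @_;
    my $copy = $text;   # encode() may modify its argument when checking
    my $ok = eval { encode('cp1252', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

my $latin = "caf\x{e9}";          # e-acute exists in CP-1252
my $cjk   = "\x{4e2d}\x{6587}";   # Chinese characters do not

print "latin fits: ", fits_winansi($latin), "\n";
print "cjk fits:   ", fits_winansi($cjk), "\n";
```

Text that fails such a check is a candidate for a TrueType or OpenType font instead.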
1301
1302 Note that core fonts use fixed lists of expected glyphs, along with
1303 metrics such as their widths. This may not exactly match up with
1304 whatever local font file is used by the PDF reader. It's usually pretty
1305 close, but many cases have been found where the list of glyphs is
1306 different between the core fonts and various local font files, so be
1307 aware of this.
1308
1309 To allow UTF-8 text and extended glyph counts, you should consider
1310 replacing your use of core fonts with TrueType (.ttf) and OpenType
1311 (.otf) fonts. There are tools, such as FontForge, which can do a fairly
1312 good (though, not perfect) job of converting a Type1 font library to
1313 OTF.
1314
1315 Examples:
1316
1317 $font1 = $pdf->corefont('Times-Roman', encode => 'latin2');
1318 $font2 = $pdf->corefont('Times-Bold');
1319 $font3 = $pdf->corefont('Helvetica');
1320 $font4 = $pdf->corefont('ZapfDingbats');
1321
1322 Valid %options are:
1323
1324 encode
1325 Changes the encoding of the font from its default. Notice that the
1326 encoding (not the entire font's glyph list) is shown in a PDF
1327 object (record), listing 256 glyphs associated with this encoding
1328 (and that are available in this font).
1329
1330 dokern
1331 Enables kerning if data is available.
1332
1333 Notes:
1334
1335 Even though these are called "core" fonts, they are not shipped with
1336 PDF::Builder, but are expected to be found on the machine with the PDF
1337 reader. Most core fonts are installed with a PDF reader, and thus are
1338 not coordinated with PDF::Builder. PDF::Builder does ship with core
1339 font metrics files (width, glyph names, etc.), but these cannot be
1340 guaranteed to be in sync with what the PDF reader has installed!
1341
1342 There are some 14 core fonts (regular, italic, bold, and bold-italic
1343 for Times [serif], Helvetica [sans serif], Courier [fixed pitch]; plus
1344 two symbol fonts) that are supposed to be available on any PDF reader,
1345 although other fonts with very similar metrics are often substituted.
1346 You should not count on any of the 15 Windows core fonts (Bank Gothic,
1347 Georgia, Trebuchet, Verdana, and two more symbol fonts) being present,
1348 especially on Linux, Mac, or other non-Windows platforms. Be aware if
1349 you are producing PDFs to be read on a variety of different systems!
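
As a rough illustration of the list above, the following sketch checks a
font name against the 14 standard font names. The helper is_standard14()
is hypothetical and is not part of PDF::Builder; the names are the
standard 14 base font names from the PDF specification, where the
"italic" faces of Helvetica and Courier are called "Oblique".

```perl
use strict;
use warnings;

# The 14 standard PDF font names: four styles each of Times, Helvetica,
# and Courier, plus the two symbol fonts.
my @STANDARD14 = qw(
    Times-Roman Times-Bold Times-Italic Times-BoldItalic
    Helvetica Helvetica-Bold Helvetica-Oblique Helvetica-BoldOblique
    Courier Courier-Bold Courier-Oblique Courier-BoldOblique
    Symbol ZapfDingbats
);
my %is_std = map { $_ => 1 } @STANDARD14;

# Hypothetical helper: returns 1 if $name is one of the 14 standard
# fonts, 0 otherwise.
sub is_standard14 {
    my ($name) = @_;
    return exists $is_std{$name} ? 1 : 0;
}

print "$_: ", (is_standard14($_) ? "standard" : "not standard"), "\n"
    for qw(Times-Roman Verdana);
```

A check like this could be used to warn when a document relies on a font
that a non-Windows PDF reader is unlikely to have.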
1350
1351 If you want to ensure the widest portability for a PDF document you
1352 produce, you should consider using TTF fonts (instead of core fonts)
1353 and embedding them in the document. This ensures that there will be no
1354 substitutions, that all metrics are known and match the glyphs, UTF-8
1355 encoding can be used, and that the glyphs will be available on the
1356 reader's machine. At least on Windows platforms, most of the fonts are
1357 TTF anyway; they are used behind the scenes for "core" fonts, but
1358 without most of the capabilities of TTF (now or possibly later in
1359 PDF::Builder) such as embedding, ligatures, UTF-8, etc. The downside
1360 is, obviously, that the resulting PDF file will be larger because it
1361 includes the font(s). There might also be copyright or licensing issues
1362 with the redistribution of font files in this manner (you might want to
1363 check before widely distributing a PDF document with embedded fonts,
1364 although many licenses do permit the part of the font used to be embedded).
1365
1366 See also PDF::Builder::Resource::Font::CoreFont.
1367
1368 PS Fonts
1369
1370 PS (T1) fonts are limited to single byte encodings. You cannot use
1371 UTF-8 or other multibyte encodings with T1 fonts. The default encoding
1372 for the T1 fonts is WinAnsiEncoding (roughly the CP-1252 superset of
1373 ISO-8859-1). See the "encode" option below to change this encoding.
1374 See the "font automap" method in PDF::Builder::Resource::Font for
1375 information on accessing more than 256 glyphs in a font, using planes,
1376 although there is no guarantee that future changes to font files will
1377 permit consistent results. Note: many Type1 fonts are limited to 256
1378 glyphs, but some are available with more than 256 glyphs. Still, a
1379 maximum of 256 at a time are usable.
1380
1381 "psfont" accepts both ASCII (.pfa) and binary (.pfb) Type1 glyph files.
1382 Font metrics can be supplied in either ASCII (.afm) or binary (.pfm)
1383 format, as can be seen in the examples given below. It is possible to
1384 use .pfa with .pfm and .pfb with .afm if that's what's available. The
1385 ASCII and binary files have the same content, just in different
1386 formats.
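
The pairing rules above can be sketched as a small helper that picks the
"afmfile" or "pfmfile" option from the metrics file's extension. The
helper name psfont_args() is hypothetical (it is not part of
PDF::Builder), and it assumes the file extensions reflect the actual
file formats.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for psfont() from a glyph
# file and a metrics file, choosing "afmfile" or "pfmfile" by extension.
sub psfont_args {
    my ($glyph_file, $metrics_file) = @_;
    die "glyph file must be .pfa or .pfb\n"
        unless $glyph_file =~ /\.pf[ab]$/i;
    my $opt = $metrics_file =~ /\.afm$/i ? 'afmfile'
            : $metrics_file =~ /\.pfm$/i ? 'pfmfile'
            : die "metrics file must be .afm or .pfm\n";
    return ($glyph_file, $opt => $metrics_file);
}

# Mixed pairing: binary glyph file with ASCII metrics.
my @args = psfont_args('/fonts/Synest-FB.pfb', '/fonts/Synest-FB.afm');
print join(', ', @args), "\n";
# With a PDF::Builder object in $pdf:  my $font = $pdf->psfont(@args);
```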
1387
1388 To allow UTF-8 text and extended glyph counts in one font, you should
1389 consider replacing your use of Type1 fonts with TrueType (.ttf) and
1390 OpenType (.otf) fonts. There are tools, such as FontForge, which can do
1391 a fairly good (though not perfect) job of converting your font library
1392 to OTF.
1393
1394 Examples:
1395
1396 $font1 = $pdf->psfont('Times-Book.pfa', afmfile => 'Times-Book.afm');
1397 $font2 = $pdf->psfont('/fonts/Synest-FB.pfb', pfmfile => '/fonts/Synest-FB.pfm');
1398
1399 Valid %options are:
1400
1401 encode
1402 Changes the encoding of the font from its default. Notice that the
1403 encoding (not the entire font's glyph list) is shown in a PDF
1404 object (record), listing 256 glyphs associated with this encoding
1405 (and that are available in this font).
1406
1407 afmfile
1408 Specifies the location of the ASCII font metrics file (.afm). It
1409 may be used with either an ASCII (.pfa) or binary (.pfb) glyph
1410 file.
1411
1412 pfmfile
1413 Specifies the location of the binary font metrics file (.pfm). It
1414 may be used with either an ASCII (.pfa) or binary (.pfb) glyph
1415 file.
1416
1417 dokern
1418 Enables kerning if data is available.
1419
1420 Note: these T1 (Type1) fonts are not shipped with PDF::Builder, but are
1421 expected to be found on the machine with the PDF reader. Most PDF
1422 readers do not install T1 fonts, and it is up to the user of the PDF
1423 reader to install the needed fonts. Unlike TrueType fonts, PS (T1)
1424 fonts are not embedded in the PDF, and must be supplied on the Reader
1425 end.
1426
1427 See also PDF::Builder::Resource::Font::Postscript.
1428
1429 TrueType Fonts
1430
1431 Warning: BaseEncoding is not set by default for TrueType fonts, so text
1432 in the PDF isn't searchable (by the PDF reader) unless a ToUnicode CMap
1433 is included. PDF::Builder includes a ToUnicode CMap by default
1434 (unicodemap set to 1), but allows it to be disabled (for performance and
1435 file size reasons) by setting unicodemap to 0. Doing so will produce
1436 non-searchable text, which, besides being annoying to users, may prevent
1437 screen readers and other aids to disabled users from working correctly!
1438
1439 Examples:
1440
1441 $font1 = $pdf->ttfont('Times.ttf');
1442 $font2 = $pdf->ttfont('Georgia.otf');
1443
1444 Valid %options are:
1445
1446 encode
1447 Changes the encoding of the font from its default
1448 (WinAnsiEncoding).
1449
1450 Note that for a single byte encoding (e.g., 'latin1'), you are
1451 limited to 256 characters defined for that encoding. 'automap' does
1452 not work with TrueType. If you want more characters than that, use
1453 'utf8' encoding with a UTF-8 encoded text string.
1454
1455 isocmap
1456 Use the ISO Unicode Map instead of the default MS Unicode Map.
1457
1458 unicodemap
1459 If 1 (default), output ToUnicode CMap to permit text searches and
1460 screen readers. Set to 0 to save space by not including the
1461 ToUnicode CMap, but text searching and screen reading will not be
1462 possible.
1463
1464 dokern
1465 Enables kerning if data is available.
1466
1467 noembed
1468 Disables embedding of the font file. Note that this is potentially
1469 hazardous, as the glyphs provided on the PDF reader machine may not
1470 match what was used on the PDF writer machine (the one running
1471 PDF::Builder)! If you know for sure that all PDF readers will be
1472 using the same TTF or OTF file you're using with PDF::Builder, not
1473 embedding the font may be acceptable, in return for a smaller PDF
1474 file size. Note that the Reader needs to know where to find the
1475 font file -- it can't be in any random place, but typically needs
1476 to be listed in a path that the Reader follows. Otherwise, it will
1477 be unable to render the text!
1478
1479 The only value for the "noembed" flag currently checked for is 1,
1480 which means to not embed the font file in the PDF. Any other value
1481 currently results in the font file being embedded (by default),
1482 although in the future, other values might be given significance
1483 (such as checking permission bits).
1484
1485 Some additional comments on embedding font file(s) into the PDF:
1486 besides substantially increasing the size of the PDF (even if the
1487 font is subsetted, by default), PDF::Builder does not check the
1488 font file for any flags indicating font licensing issues and
1489 limitations on use. A font foundry may not permit embedding at all,
1490 may permit a subset of the font to be embedded, may permit a full
1491 font to be embedded, and may specify what can be done with an
1492 embedded font (e.g., may or may not be extracted for further use
1493 beyond displaying this one PDF). When you choose to use (and embed)
1494 a font, you should be aware of any such licensing issues.
1495
1496 nosubset
1497 Disables subsetting of a TTF/OTF font, when embedded. By default,
1498 only the glyphs used by a document are included in the file, and
1499 not the entire font. This can result in tremendous savings in
1500 PDF file size. If you intend to allow the PDF to be edited by
1501 users, not having the entire font glyph set available may cause
1502 problems, so be aware of that (and consider using "nosubset => 1").
1503 Setting this flag to any value results in the entire font glyph set
1504 being embedded in the file. It might be a good idea to use only the
1505 value 1, in case other values are assigned roles in the future.
1506
1507 debug
1508 If set to 1 (default is 0), diagnostic information is output about
1509 the CMap processing.
1510
1511 usecmf
1512 If set to 1 (default is 0), the first priority is to make use of
1513 one of the four ".cmap" files for CJK fonts. This is the old way of
1514 processing TTF files. If, after all is said and done, a working
1515 internal CMap hasn't been found (for usecmf=>0), ttfont() will fall
1516 back to using a ".cmap" file if possible.
1517
1518 cmaps
1519 This flag may be set to a string listing the Platform/Encoding
1520 pairs to look for of any internal CMaps in the font file, in the
1521 desired order (highest priority first). If one list (comma and/or
1522 space-separated pairs) is given, it is used for both Windows and
1523 non-Windows platforms (on which PDF::Builder is running, not the
1524 PDF reader's). Two lists, separated by a semicolon ; may be given,
1525 with the first being used for a Windows platform and the second for
1526 non-Windows. The default list is "0/6 3/10 0/4 3/1 0/3; 0/6 0/4
1527 3/10 0/3 3/1". Finally, instead of a P/E list, a string "find_ms"
1528 may be given to tell it to simply call the Font::TTF find_ms()
1529 method to find a (preferably Windows) internal CMap. "cmaps" set to
1530 'find_ms' would emulate the old way of looking for CMaps. Symbol
1531 fonts (3/0) always use find_ms(), and the new default lookup is (if
1532 ".cmap" isn't used, see "usecmf") to try to get a match with the
1533 default list for the appropriate OS. If none can be found,
1534 find_ms() is tried, and as a last resort the ".cmap" file is used (if
1535 available), even if "usecmf" is not 1.
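
To make the format of the "cmaps" string concrete, here is a minimal
sketch of how such a string could be split into an ordered list of
Platform/Encoding pairs. The function cmaps_for_platform() is
hypothetical and only follows the description above; it is not the
parsing code that ttfont() actually uses.

```perl
use strict;
use warnings;

# Hypothetical parser for a "cmaps" option string. Returns a reference
# to the list of Platform/Encoding pairs to try, highest priority first.
sub cmaps_for_platform {
    my ($cmaps, $is_windows) = @_;
    return ['find_ms'] if $cmaps eq 'find_ms';
    my @lists = split /;/, $cmaps;
    # one list serves both platforms; with two, the first is for Windows
    my $list = (@lists > 1 && !$is_windows) ? $lists[1] : $lists[0];
    # pairs may be separated by commas and/or spaces
    my @pairs = grep { length } split /[\s,]+/, $list;
    return \@pairs;
}

my $default = '0/6 3/10 0/4 3/1 0/3; 0/6 0/4 3/10 0/3 3/1';
print "Windows: @{ cmaps_for_platform($default, 1) }\n";
print "other:   @{ cmaps_for_platform($default, 0) }\n";
```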
1536
1537 CJK Fonts
1538
1539 Examples:
1540
1541 $font = $pdf->cjkfont('korean');
1542 $font = $pdf->cjkfont('traditional');
1543
1544 Valid %options are:
1545
1546 encode
1547 Changes the encoding of the font from its default.
1548
1549 Warning: Unlike "ttfont", the font file is not embedded in the output
1550 PDF file. This is evidently behavior left over from the early days of
1551 CJK fonts, where the "Cmap" and "Data" were always external files,
1552 rather than internal tables. If you need a CJK-using PDF file to embed
1553 the font, for portability, you can create a PDF using "cjkfont", and
1554 then use an external utility (e.g., "pdfcairo") to embed the font in
1555 the PDF. It may also be possible to use "ttfont" instead, to produce
1556 the PDF, provided you can deduce the correct font file name from
1557 examining the PDF file (e.g., on my Windows system, the "Ming" font
1558 would be "$font = $pdf->ttfont("C:/Program Files/Adobe/Acrobat
1559 DC/Resource/CIDFont/AdobeMingStd-Light.otf")"). Of course, the font
1560 file used would have to be ".ttf" or ".otf". It may act a little
1561 differently than "cjkfont" (due to a different Cmap), but you should be
1562 able to embed the font file into the PDF.
1563
1564 See also PDF::Builder::Resource::CIDFont::CJKFont
1565
1566 Synthetic Fonts
1567
1568 Warning: BaseEncoding is not set by default for these fonts, so text in
1569 the PDF isn't searchable (by the PDF reader) unless a ToUnicode CMap is
1570 included. PDF::Builder includes a ToUnicode CMap by default (unicodemap
1571 set to 1), but allows it to be disabled (for performance and file size
1572 reasons) by setting unicodemap to 0. Doing so will produce
1573 non-searchable text, which, besides being annoying to users, may prevent
1574 screen readers and other aids to disabled users from working correctly!
1575
1576 Examples:
1577
1578 $cf = $pdf->corefont('Times-Roman', encode => 'latin1');
1579 $sf = $pdf->synfont($cf, condense => 0.85); # compressed 85%
1580 $sfb = $pdf->synfont($cf, bold => 1); # embolden by 10em
1581 $sfi = $pdf->synfont($cf, oblique => -12); # italic at -12 degrees
1582
1583 Valid %options are:
1584
1585 condense
1586 Character width condense/expand factor (0.1-0.9 = condense, 1 =
1587 normal/default, 1.1+ = expand). It is the multiplier to apply to
1588 the width of each character.
1589
1590 oblique
1591 Italic angle (+/- degrees, default 0), sets skew of character box.
1592
1593 bold
1594 Emboldening factor (0.1+, bold = 1, heavy = 2, ...), additional
1595 thickness to draw outline of character (with a heavier line width)
1596 before filling.
1597
1598 space
1599 Additional character spacing in milliems (0-1000)
1600
1601 caps
1602 0 for normal text, 1 for small caps. Implemented by asking the
1603 font what the uppercased translation (single character) is for a
1604 given character, and outputting it at 80% height and 88% width
1605 (heavier vertical stems are better looking than a straight 80%
1606 scale).
1607
1608 Note that only lower case letters which appear in the "standard"
1609 font (plane 0 for core fonts and PS fonts) will be small-capped.
1610 This may include eszett (German sharp s), which becomes SS, and
1611 dotless i and j which become I and J respectively. There are many
1612 other accented Latin alphabet letters which may show up in planes 1
1613 and higher. Ligatures (e.g., ij and ffl) do not have uppercase
1614 equivalents, nor does a long s. If you have text which includes
1615 such characters, you may want to consider preprocessing it to
1616 replace them with Latin character expansions (e.g., i+j and f+f+l)
1617 before small-capping.
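
The preprocessing suggested above might look something like the
following sketch. The table and the function expand_for_small_caps() are
hypothetical; the list of characters is illustrative rather than
complete, and the text is assumed to be a decoded Perl character string.

```perl
use strict;
use warnings;

# Characters with no single-character uppercase form, mapped to plain
# Latin letters (an illustrative, not exhaustive, table).
my %expand = (
    "\x{0133}" => 'ij',     # ij ligature
    "\x{FB00}" => 'ff',     # ff ligature
    "\x{FB01}" => 'fi',     # fi ligature
    "\x{FB02}" => 'fl',     # fl ligature
    "\x{FB03}" => 'ffi',    # ffi ligature
    "\x{FB04}" => 'ffl',    # ffl ligature
    "\x{017F}" => 's',      # long s
);

# Hypothetical helper: replace each listed character by its expansion,
# leaving all other characters untouched.
sub expand_for_small_caps {
    my ($text) = @_;
    $text =~ s/(.)/exists $expand{$1} ? $expand{$1} : $1/ges;
    return $text;
}

print expand_for_small_caps("wa\x{FB04}e"), "\n";
```

The expanded string could then be passed to a font created with
"caps => 1".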
1618
1619 Note that CJK fonts (created with the "cjkfont" method) do not work
1620 properly with "synfont". This is due to a different internal structure
1621 of the CJK fonts, as compared to corefont, ttfont, and psfont base
1622 fonts. If you require a synthesized (modified) CJK font, you might try
1623 finding the TTF or OTF original, use "ttfont" to create the base font,
1624 and running "synfont" against that, in the manner described for
1625 embedding "CJK Fonts".
1626
1627 See also PDF::Builder::Resource::Font::SynFont
1628
1629 IMAGE METHODS
1630 This is additional information on enhanced libraries available for TIFF
1631 and PNG images. See specific information listings for GD, GIF, JPEG,
1632 and PNM image formats. In addition, see "examples/Content.pl" for an
1633 example of placing an image on a page, as well as using one in a "Form".
1634
1635 Why is my image flipped or rotated?
1636
1637 Something not uncommonly seen when using JPEG photos in a PDF is that
1638 the images will be rotated and/or mirrored (flipped). This may happen
1639 when using TIFF images too. What happens is that the camera stores an
1640 image just as it comes off the CCD sensor, regardless of the camera
1641 orientation, and does not rotate it to the correct orientation! It does
1642 store a separate "orientation" flag to suggest how the image might be
1643 corrected, but not all image processing obeys this flag (PDF::Builder
1644 does not). For example, if you take a "portrait" (tall) photo of a
1645 tree (with the phone held vertically), and then use it in a PDF, the
1646 tree may appear to have been cut down (it appears in landscape mode)!
1647
1648 I have found some code that should allow the "image_jpeg" or "image"
1649 routine to auto-rotate to (supposedly) the correct orientation, by
1650 looking for the Exif metadata "Orientation" tag in the file. However,
1651 three problems arise: 1) if a photo has been edited, and rotated or
1652 flipped in the process, there is no guarantee that the Orientation tag
1653 has been corrected. 2) more than one Orientation tag may exist (e.g.,
1654 in the binary APP1/Exif header, and in XML data), and they may not
1655 agree with each other -- which should be used? 3) the code would need
1656 to uncompress the raster data, swap and/or transpose rows and/or
1657 columns, and recompress the raster data for inclusion into the PDF.
1658 This is costly and error-prone. In any case, the user would need to be
1659 able to override any auto-rotate function.
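
For reference, the eight values of the Exif Orientation tag can be
tabulated as in the sketch below. The hash and the helper name are
hypothetical, the descriptions state what would have to be done to the
stored raster to display it upright (rotations are clockwise), and
reading the tag out of the file is assumed to be done by some other
tool. PDF::Builder itself does not perform any of these corrections.

```perl
use strict;
use warnings;

# Exif Orientation value => correction needed to display the stored
# raster upright (all rotations are clockwise).
my %ORIENTATION_FIX = (
    1 => 'none (already upright, "Top-left")',
    2 => 'mirror left/right',
    3 => 'rotate 180 degrees',
    4 => 'mirror top/bottom',
    5 => 'transpose (mirror left/right, then rotate 270 degrees)',
    6 => 'rotate 90 degrees',
    7 => 'transverse (mirror left/right, then rotate 90 degrees)',
    8 => 'rotate 270 degrees',
);

# Hypothetical helper: describe the fix for a given Orientation value.
sub orientation_fix {
    my ($value) = @_;
    return $ORIENTATION_FIX{$value} // 'unknown Orientation value';
}

print "$_: ", orientation_fix($_), "\n" for 1 .. 8;
```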
1660
1661 For the time being, PDF::Builder will simply leave it up to the user of
1662 the library to take care of rotating and/or flipping an image which
1663 displays incorrectly. It is possible that we will consider adding some
1664 sort of query or warning that the image appears to not be "normally"
1665 oriented (Orientation value 1 or "Top-left"), according to the
1666 Orientation flag. You can consider either (re-)saving the photo in an
1667 editor such as PhotoShop or GIMP, or using PDF::Builder code similar to
1668 the following (for images rotated 180 degrees):
1669
1670 $pW = 612; $pH = 792; # page dimensions (US Letter)
1671 my $img = $pdf->image_jpeg("AliceLake.jpeg");
1672 # raw size WxH 4032x3024, scaled down to 504x378
1673 $sW = 4032/8; $sH = 3024/8;
1674 # intent is to center on US Letter sized page (LL at 54,207)
1675 # Orientation flag on this image is 3 (rotated 180 degrees).
1676 # if naively displayed (just $gfx->image call), it will be upside down
1677
1678 $gfx->save();
1679
1680 ## method 0: simple display, is rotated 180 degrees!
1681 #$gfx->image($img, ($pW-$sW)/2,($pH-$sH)/2, $sW,$sH);
1682
1683 ## method 1: translate, then rotate
1684 #$gfx->translate($pW,$pH); # to new origin (media UR corner)
1685 #$gfx->rotate(180); # rotate around new origin
1686 #$gfx->image($img, ($pW-$sW)/2,($pH-$sH)/2, $sW,$sH);
1687 # image's UR corner, not LL
1688
1689 # method 2: rotate, then translate
1690 $gfx->rotate(180); # rotate around current origin
1691 $gfx->translate(-$sW,-$sH); # translate in rotated coordinates
1692 $gfx->image($img, -($pW-$sW)/2,-($pH-$sH)/2, $sW,$sH);
1693 # image's UR corner, not LL
1694
1695 ## method 3: flip (mirror) twice
1696 #$scale = 1; # not rescaling here
1697 #$size_page = $pH/$scale;
1698 #$invScale = 1.0/$scale;
1699 #$gfx->add("-$invScale 0 0 -$invScale 0 $size_page cm");
1700 #$gfx->image($img, -($pW-$sW)/2-$sW,($pH-$sH)/2, $sW,$sH);
1701
1702 $gfx->restore();
1703
1704 If your image is also mirrored (flipped about an axis), simple rotation
1705 will not suffice. You could do something with a reversal of the
1706 coordinate system, as in "method 3" above (see "Advanced Methods" in
1707 PDF::Builder::Content). To mirror only left/right, the second $invScale
1708 would be positive; to mirror only top/bottom, the first would be
1709 positive. If all else fails, you could save a mirrored copy in a photo
1710 editor. 90 or 270 degree rotations will require a "rotate" call,
1711 possibly with "cm" usage to reverse mirroring. Incidentally, do not
1712 confuse this issue with the coordinate flipping performed by some
1713 Chrome browsers when printing a page to PDF.
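
As a sketch of the single-axis mirroring described above, the following
code builds the six numbers for a "cm" operator that mirrors an image
box in place, and checks where the box corners end up. The function
names are hypothetical, and the translation terms are an assumption
chosen so that the box maps onto itself; they differ from the
page-height translation used in "method 3".

```perl
use strict;
use warnings;

# Hypothetical helper: matrix (a b c d e f) that mirrors the box at
# ($x,$y) of size ($w,$h) about its own center line.
# 'lr' mirrors left/right, anything else mirrors top/bottom.
sub mirror_matrix {
    my ($axis, $x, $y, $w, $h) = @_;
    return $axis eq 'lr' ? (-1, 0, 0,  1, 2*$x + $w, 0)
                         : ( 1, 0, 0, -1, 0, 2*$y + $h);
}

# Apply a PDF-style matrix to a point: x' = a*x + c*y + e, y' = b*x + d*y + f
sub apply_matrix {
    my ($px, $py, @m) = @_;
    return ($m[0]*$px + $m[2]*$py + $m[4],
            $m[1]*$px + $m[3]*$py + $m[5]);
}

# The 504x378 image centered on a US Letter page, as in the example above
my @box = (54, 207, 504, 378);
my @m   = mirror_matrix('lr', @box);
my ($nx, $ny) = apply_matrix(54, 207, @m);
print "cm operands: @m\n";
print "lower left corner maps to ($nx, $ny)\n";
# With a graphics object in $gfx and an image in $img, something like:
#   $gfx->add(join(' ', @m, 'cm'));
#   $gfx->image($img, @box);
```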
1714
1715 Note that TIFF images may have the same rotation/mirroring problems as
1716 JPEG, which is not surprising, as the Exif format was lifted from TIFF
1717 for use in JPEG. The cure will be similar to JPEG's.
1718
1719 TIFF Images
1720
1721 Note that the Graphics::TIFF support library does not currently permit
1722 a filehandle for $file.
1723
1724 PDF::Builder will use the Graphics::TIFF support library for TIFF
1725 functions, if it is available, unless explicitly told not to. Your code
1726 can test whether Graphics::TIFF is available by examining
1727 "$tiff->usesLib()" or "$pdf->LA_GT()".
1728
1729 = -1
1730 Graphics::TIFF is installed, but your code has specified "nouseGT",
1731 to not use it. The old, pure Perl, code (buggy!) will be used
1732 instead, as if Graphics::TIFF was not installed.
1733
1734 = 0 Graphics::TIFF is not installed. Not all systems are able to
1735 successfully install this package, as it requires libtiff.a.
1736
1737 = 1 Graphics::TIFF is installed and is being used.
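
The three status values might be reported to a user along these lines.
The helper describe_lib_status() is hypothetical; only the meanings of
-1, 0, and 1 are taken from the list above, and the same values apply to
Image::PNG::Libpng.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a library status value into a message.
sub describe_lib_status {
    my ($lib, $status) = @_;
    return $status ==  1 ? "$lib is installed and in use"
         : $status ==  0 ? "$lib is not installed"
         : $status == -1 ? "$lib is installed, but was disabled by request"
         :                 "unexpected status '$status' for $lib";
}

print describe_lib_status('Graphics::TIFF', $_), "\n" for (1, 0, -1);
# With a PDF::Builder object in $pdf, the status value would come from
# $pdf->LA_GT() for TIFF, or $pdf->LA_IPL() for PNG.
```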
1738
1739 Options:
1740
1741 nouseGT => 1
1742 Do not use the Graphics::TIFF library, even if it's available.
1743 Normally you would want to use this library, but there may be cases
1744 where you don't, such as when you want to use a file handle instead
1745 of a name.
1746
1747 silent => 1
1748 Do not give the message that Graphics::TIFF is not installed. This
1749 message will be given only once, but you may want to suppress it,
1750 such as during t-tests.
1751
1752 PNG Images
1753
1754 PDF::Builder will use the Image::PNG::Libpng support library for PNG
1755 functions, if it is available, unless explicitly told not to. Your code
1756 can test whether Image::PNG::Libpng is available by examining
1757 "$png->usesLib()" or "$pdf->LA_IPL()".
1758
1759 = -1
1760 Image::PNG::Libpng is installed, but your code has specified
1761 "nouseIPL", to not use it. The old, pure Perl, code (slower and
1762 less capable) will be used instead, as if Image::PNG::Libpng was
1763 not installed.
1764
1765 = 0 Image::PNG::Libpng is not installed. Not all systems are able to
1766 successfully install this package, as it requires libpng.a.
1767
1768 = 1 Image::PNG::Libpng is installed and is being used.
1769
1770 Options:
1771
1772 nouseIPL => 1
1773 Do not use the Image::PNG::Libpng library, even if it's available.
1774 Normally you would want to use this library, when available, but
1775 there may be cases where you don't.
1776
1777 silent => 1
1778 Do not give the message that Image::PNG::Libpng is not installed.
1779 This message will be given only once, but you may want to suppress
1780 it, such as during t-tests.
1781
1782 notrans => 1
1783 No transparency -- ignore tRNS chunk if provided, ignore Alpha
1784 channel if provided.
1785
1786 USING SHAPER (HarfBuzz::Shaper library)
1787 # if HarfBuzz::Shaper is not installed, either bail out, or try to
1788 # use regular TTF calls instead
1789 my $rc;
1790 $rc = eval {
1791 require HarfBuzz::Shaper;
1792 1;
1793 };
1794 if (!defined $rc) { $rc = 0; }
1795 if ($rc == 0) {
1796 # bail out in some manner
1797 } else {
1798 # can use Shaper
1799 }
1800
1801 my $fontfile = '/WINDOWS/Fonts/times.ttf'; # used by both Shaper and textHS
1802 my $fontsize = 15; # used by both Shaper and textHS
1803 my $font = $pdf->ttfont($fontfile);
1804 $text->font($font, $fontsize);
1805
1806 my $hb = HarfBuzz::Shaper->new(); # only need to set up once
1807 my %settings; # for textHS(), not Shaper
1808 $settings{'dump'} = 1; # see the diagnostics
1809 $settings{'script'} = 'Latn';
1810 $settings{'dir'} = 'L'; # LTR
1811 $settings{'features'} = (); # required
1812
1813 # -- set language (override automatic setting)
1814 #$settings{'language'} = 'en';
1815 #$hb->set_language( 'en_US' );
1816 # -- turn OFF ligatures
1817 #push @{ $settings{'features'} }, 'liga';
1818 #$hb->add_features( 'liga' );
1819 # -- turn OFF kerning
1820 #push @{ $settings{'features'} }, 'kern';
1821 #$hb->add_features( 'kern' );
1822 $hb->set_font($fontfile);
1823 $hb->set_size($fontsize);
1824 $hb->set_text("Let's eat waffles in the field for brunch.");
1825 # expect ffl and fi ligatures, and perhaps some kerning
1826
1827 my $info = $hb->shaper();
1828 $text->textHS($info, \%settings); # strikethru, underline allowed
1829
1830 The package HarfBuzz::Shaper may be optionally installed in order to
1831 use the text-shaping capabilities of the HarfBuzz library. These
1832 include kerning and ligatures in Western scripts (such as the Latin
1833 alphabet). More complex scripts can be handled, such as Arabic family
1834 and Indic scripts, where multiple forms of a character may be
1835 automatically selected, characters may be reordered, and other
1836 modifications made. The examples/HarfBuzz.pl script gives some examples
1837 of what may be done.
1838
1839 Keep in mind that HarfBuzz works only with TrueType (.ttf) and OpenType
1840 (.otf) font files. It will not work with PostScript (Type1), core,
1841 bitmapped, or CJK fonts. Not all .ttf fonts have the instructions
1842 necessary to guide HarfBuzz, but most proper .otf fonts do. In other
1843 words, there are no guarantees that a particular font file will work
1844 with Shaper!
1845
1846 The basic idea is to break up text into "chunks" which are of the same
1847 script (alphabet), language, direction, font face, font size, and
1848 variant (italic, bold, etc.). These could range from a single character
1849 to paragraph-length strings of text. These are fed to HarfBuzz::Shaper,
1850 along with flags, the font file to be used, and other supporting
1851 information, to create an array of output glyphs. Each element is a
1852 hash describing the glyph to be output, including its name (if
1853 available), its glyph ID (number) in the selected font, its x and y
1854 displacement (usually 0), and its "advance" x and y values, all in
1855 points. For horizontal languages (LTR and RTL), the y advance is
1856 normally 0 and the x advance is the font's character width, less any
1857 kerning amount.
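
To illustrate the structure just described, here is a hand-built array
in the shape of a Shaper result, with a helper that totals the x
advances. The data are invented for the example, the key names 'g' and
'name' are assumptions about the hash layout ('ax', 'dx', and 'dy' are
named elsewhere in this section), and chunk_width() is a hypothetical
helper, not a PDF::Builder method.

```perl
use strict;
use warnings;

# Invented sample data: one hash per output glyph, values in points.
my @glyphs = (
    { name => 'T',     g => 55, ax => 9.5, ay => 0, dx => 0, dy => 0 },
    { name => 'o',     g => 82, ax => 7.5, ay => 0, dx => 0, dy => 0 },
    { name => 'space', g => 3,  ax => 4,   ay => 0, dx => 0, dy => 0 },
);

# Hypothetical helper: total horizontal advance of a glyph array.
sub chunk_width {
    my ($glyphs) = @_;
    my $width = 0;
    $width += $_->{ax} for @$glyphs;
    return $width;
}

printf "chunk is %.1f points wide\n", chunk_width(\@glyphs);
```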
1858
1859 Shaper will attempt to determine the script used and the text
1860 direction, based on the Unicode range, and to make a reasonable guess at
1861 the language used. The language can be overridden, but currently the
1862 script and text direction cannot be.
1863
1864 An important note: the number of glyphs (array elements) may not be
1865 equal to the number of Unicode points (characters) given in the chunk's
1866 text string! Sometimes a character will be decomposed into several
1867 pieces (multiple glyphs); sometimes multiple characters may be combined
1868 into a single ligature glyph; and characters may be reordered
1869 (especially in Indic and Southeast Asian languages). As well, for
1870 Right-to-Left (bidirectional) scripts such as Hebrew and Arabic
1871 families, the text is output in Left-to-Right order (reversed from the
1872 input).
1873
1874 With due care, a Shaper array can be manipulated in code. The elements
1875 are more or less independent of each other, so elements can be
1876 modified, rearranged, inserted, or deleted. You might adjust the
1877 position of a glyph with 'dx' and 'dy' hash elements. The 'ax' value
1878 should be left alone, so that the wrong kerning isn't calculated, but
1879 you might need to adjust the "advance x" value by means of one of the
1880 following:
1881
1882 axs is a value to be substituted for 'ax' (points)
1883 axsp is a substituted value (percentage) of the original 'ax'
1884 axr reduces 'ax' by the value (points). If negative, increase 'ax'
1885 axrp reduces 'ax' by the given percentage. Again, negative increases
1886 'ax'
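
The four adjustments can be expressed as simple arithmetic, as in the
sketch below. The function effective_ax() is hypothetical, and the order
in which the keys are checked (when more than one is present) is an
assumption made for the example, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: the advance that results from one glyph hash,
# checking axs, axsp, axr, axrp in that (assumed) order.
sub effective_ax {
    my ($glyph) = @_;
    my $ax = $glyph->{ax};
    return $glyph->{axs}                    if defined $glyph->{axs};
    return $ax * $glyph->{axsp} / 100       if defined $glyph->{axsp};
    return $ax - $glyph->{axr}              if defined $glyph->{axr};
    return $ax * (1 - $glyph->{axrp} / 100) if defined $glyph->{axrp};
    return $ax;    # no adjustment requested
}

for my $adj ({}, { axs => 8 }, { axsp => 50 }, { axr => -2 }, { axrp => 25 }) {
    my %glyph = (ax => 10, %$adj);
    my ($key) = keys %$adj;
    printf "%-5s -> %s\n", $key // 'none', effective_ax(\%glyph);
}
```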
1887
1888 Caution: a given character's glyph ID is not necessarily going to be
1889 the same between any two fonts! For example, an ASCII space (U+0020)
1890 might be "<0001>" in one font, and "<0003>" in another font (even one
1891 closely related!). A U+00A0 required blank (non-breaking space) may be
1892 output as a regular ASCII space U+0020. Take care if you need to find a
1893 particular glyph in the array, especially if the number of elements
1894 doesn't match. Consider making a text string of "marker" characters
1895 (space, nbsp, hyphen, soft hyphen, etc.) and processing it through
1896 HarfBuzz::Shaper to get the corresponding glyph numbers. You may have
1897 to count spaces, say, to see where you could break a glyph array to fit
1898 a line.
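
A minimal sketch of such a search is shown below. It assumes that the
space glyph's ID has already been found (for example, by shaping a
marker string), that the glyph ID is stored under the key 'g', and that
breaking is only done at spaces; find_break() is a hypothetical helper
and the glyph data are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index of the last space glyph that
# fits within $max_width, the last index if the whole array fits, or -1
# if no space occurs before the limit is exceeded.
sub find_break {
    my ($glyphs, $space_gid, $max_width) = @_;
    my ($width, $break, $fits) = (0, -1, 1);
    for my $i (0 .. $#$glyphs) {
        $width += $glyphs->[$i]{ax};
        if ($width > $max_width) { $fits = 0; last; }
        $break = $i if $glyphs->[$i]{g} == $space_gid;
    }
    return $fits ? $#$glyphs : $break;
}

# Invented data: "ab cd ef", every glyph 5 points wide, space is glyph 3
my @glyphs = map { +{ g => $_, ax => 5 } } (10, 11, 3, 12, 13, 3, 14, 15);
my $break  = find_break(\@glyphs, 3, 32);
my @line   = @glyphs[0 .. $break];               # goes on this line
my @rest   = @glyphs[$break + 1 .. $#glyphs];    # carried to the next
printf "break after index %d: %d glyphs now, %d left\n",
    $break, scalar(@line), scalar(@rest);
```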
1899
1900 The advancewidthHS() method uses the same inputs as does textHS().
1901 Like advancewidth(), it returns the chunk length in points. Unlike
1902 advancewidth(), you cannot override the glyph array's font, font size,
1903 etc.
1904
1905 Once you have your (possibly modified) array of glyphs, you feed it to
1906 the textHS() method to render it to the page. Remember that this method
1907 handles only a single line of text; it does not do line splitting or
1908 fitting -- that you currently need to do manually. For Western scripts
1909 (e.g., Latin), that might not be too difficult, but for other scripts
1910 that involve extensive modification of the raw characters, it may be
1911 quite difficult to split words, but you still may be able to split at
1912 inter-word spaces.
1913
1914 A useful, but not exhaustive, set of functions is supported by
1915 textHS(). Support includes direction setting (top-to-bottom and
1916 bottom-to-top directions, e.g., for Far Eastern languages in traditional
1917 orientation), and explicit script names and language (depending on what
1918 support HarfBuzz itself gives). Not yet supported are features such as
1919 discretionary ligatures and manual selection of glyphs (e.g., swashes
1920 and alternate forms).
1921
1922 Currently, textHS() can only handle a single text string. We are
1923 looking at how fitting to a line length (splitting up an array) could
1924 be done, as well as how words might be split on hard and soft hyphens.
1925 At some point, full paragraph and page shaping could be possible.
1926
1927
1928
1929perl v5.36.0 2023-01-23 PDF::Builder::Docs(3)