PDF::Builder::Docs(3)  User Contributed Perl Documentation  PDF::Builder::Docs(3)

PDF::Builder::Docs - additional documentation for Builder module

Software Development Kit
There are four levels of involvement with PDF::Builder. Depending on
what you want to do, different kinds of installs are recommended.

1. Simply installing PDF::Builder as a prerequisite for running some
other package. All you need to do is install the CPAN package for
PDF::Builder, and it will load the .pm files into your Perl library. If
the other package lists PDF::Builder as a prerequisite, its installer
may download and install PDF::Builder automatically.

2. You want to write a Perl program that uses PDF::Builder functions.
In addition to installing PDF::Builder from CPAN, you will want
documentation on it. Obtain a copy of the product from GitHub
(https://github.com/PhilterPaper/Perl-PDF-Builder) or as a gzipped tar
file from CPAN. This includes a utility to build (from POD) a library
of HTML documents, as well as examples (examples/ directory) and
contributed sample programs (contrib/ directory).

3. You want to modify PDF::Builder files. In addition to the CPAN and
GitHub distributions, you may choose to keep a local Git repository for
tracking your changes. Depending on whether or not your PDF::Builder
copy is being used for production purposes, you may want to do your
editing and testing in the Perl library installation (live) or in a
different place. The "t" tests (t/ directory) and examples provide good
regression tests to ensure that you haven't broken anything. If you do
your editing on the live code, don't forget to copy the changes back
into the master version you keep when you are done!

4. You want to contribute to the development of PDF::Builder. You will
need a local Git repository (and a GitHub account), so that when your
work is complete, you can issue a "Pull Request" to bring it to our
attention. We can't guarantee that your work will be incorporated into
the project, but at least we will look at it. From time to time, a new
CPAN version will be issued.

If you want to make substantial changes for public use, and can't come
to a meeting of minds with us, you can even start your own GitHub
project and register a new CPAN project (that's what we did, forking
PDF::API2). Please don't just assume that we don't want your changes --
at least propose what you want to do in writing, so we can consider it.
We're always looking for people to help out and expand PDF::Builder.

Optional Libraries
PDF::Builder can make use of some optional libraries, which are not
required for a successful installation. If you want improved speed and
capabilities for certain functions, you may want to install and use
these libraries:

* Graphics::TIFF -- PDF::Builder inherited a rather slow, buggy, and
limited TIFF image library from PDF::API2. If Graphics::TIFF (available
on CPAN, uses libtiff.a) is installed, PDF::Builder will use that
instead, unless you specify that it is to use the old, pure Perl
library. The only time you might want to use the old library is when
you need to pass an open filehandle to "image_tiff" instead of a file
name. See resolved bug reports RT 84665 and RT 118047, as well as
"image_tiff", for more information.

* Image::PNG::Libpng -- PDF::Builder inherited a rather slow and buggy
pure Perl PNG image library from PDF::API2. If Image::PNG::Libpng
(available on CPAN, uses libpng.a) is installed, PDF::Builder will use
that instead, unless you specify that it is to use the old, pure Perl
library. Using the new library will give you improved speed, the
ability to use 16 bit samples, and the ability to read interlaced PNG
files. See resolved bug report RT 124349, as well as "image_png", for
more information.

* HarfBuzz::Shaper -- This library enables PDF::Builder to handle
complex scripts (Arabic, Devanagari, etc.) as well as non-LTR writing
systems. It is also useful for Latin and other simple scripts, for
ligatures and improved kerning. HarfBuzz::Shaper is based on a set of
HarfBuzz libraries, which it will attempt to build if they are not
found. See "textHS" for more information.

Note that the installation process will attempt to install these
libraries automatically. If you don't wish to use one or more of them,
you are free to uninstall them. If one or more failed to install, there
is no need to panic -- you simply won't be able to use some advanced
features, unless you are able to manually install the modules (e.g.,
with "cpan install").

Strings (Character Text)
Perl, and hence PDF::Builder, uses strings that support the full range
of Unicode characters. When importing strings into a Perl program, for
example by reading text from a file, you must be aware of what their
character encoding is. Single-byte encodings (default is 'latin1'),
represented as bytes of value 0x00 through 0xFF (0..255), will produce
different results if you do something that depends on the encoding,
such as sorting, searching, or comparing any two non-ASCII characters.
This also applies to any characters (text) hard coded into the Perl
program.

You can always decode the text from its external encoding (ASCII,
UTF-8, Latin-3, etc.) into the Perl (internal) UTF-8 multibyte
encoding. This uses one to four bytes to represent each character. See
pragma "utf8" and module "Encode" for details about decoding text. Note
that only TrueType fonts ("ttfont") can make direct use of
UTF-8-encoded text. Other font types (core, T1, etc.) can only use
single-byte encoded text. If your text is ASCII, Latin-1, or CP-1252,
you can just leave the Perl strings as the default single-byte
encoding.

Then, there is the matter of encoding the output to match up with
available font character sets. You're not actually translating the text
on output, but are telling the output system (and PDF Reader) what
encoding the output byte stream represents, and what character glyphs
it should generate.

If you confine your text to plain ASCII (0x00 .. 0x7F byte values) or
even Latin-1 or CP-1252 (0x00 .. 0xFF byte values), you can use default
(non-UTF-8) Perl strings and use the default output encoding
(WinAnsiEncoding), which is more-or-less Windows CP-1252 (in turn, a
superset of ISO-8859-1 Latin-1). If your text uses any other
characters, you will need to be aware of what encoding your text
strings are (in the Perl string and for declaring output glyph
generation). See "Core Fonts", "PS Fonts" and "TrueType Fonts" in "FONT
METHODS" for additional information.

Some Internal Details

Some of the following may be a bit scary or confusing to beginners, so
don't be afraid to skip over it until you're ready for it...

Perl (and PDF::Builder) internally use strings which are either
single-byte (ISO-8859-1/Latin-1) or multibyte UTF-8 encoded (there is
an internal flag marking the string as UTF-8 or not). If you work
strictly in ASCII or Latin-1 or CP-1252 (each a superset of the
previous), you should be OK in not doing anything special about your
string encoding. You can just use the default Perl single byte strings
(internally marked as not UTF-8) and the default output encoding
(WinAnsiEncoding).

If you intend to use input from a variety of sources, you should
consider decoding (converting) your text to UTF-8, which will provide
an internally consistent representation (and your Perl code itself
should be saved in UTF-8, in case you want to use any hard coded
non-ASCII characters). In any string, non-ASCII characters (0x80 or
higher) would be converted to the Perl UTF-8 internal representation,
via "$string = Encode::decode(MY_ENCODING, $input);". "MY_ENCODING"
would be a string like 'latin1', 'cp-1252', 'utf8', etc. Similar
capabilities are available for declaring a file to be in a certain
encoding.

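As a minimal sketch of the decoding step described above, the following
uses only the core "Encode" module. The sample byte strings and the
variable names ($latin1_bytes, $utf8_bytes, and so on) are made up for
illustration and are not part of PDF::Builder.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "cafe" with an e-acute, as it might arrive from a Latin-1 file and
# from a UTF-8 file (hypothetical sample data, written as raw bytes)
my $latin1_bytes = "caf\xE9";
my $utf8_bytes   = "caf\xC3\xA9";

# decode each input from its external encoding into Perl characters
my $from_latin1 = decode('iso-8859-1', $latin1_bytes);
my $from_utf8   = decode('UTF-8',      $utf8_bytes);

# once decoded, both strings hold the same four characters
printf "%d characters, equal: %s\n",
    length($from_latin1), ($from_latin1 eq $from_utf8 ? 'yes' : 'no');
```

After decoding, the two inputs compare equal even though their byte
representations differed, which is the "internally consistent
representation" the text refers to.
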
Be aware that if you use UTF-8 encoding for your text, only TrueType
font output ("ttfont") can handle it directly. Corefont and Type1
output require that the text be converted back into a single-byte
encoding (using "Encode::encode"), which may need to be declared with
"encode" (for "corefont" or "psfont"). If you have any characters that
are not found in the selected single-byte encoding (but are found in
the font itself), you will need to use "automap" to break up the font
glyphs into 256 character planes, map such characters to 0x00 .. 0xFF
in the appropriate plane, and switch between font planes as necessary.

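The conversion back to a single-byte encoding might look like the
following sketch, which again uses only the core "Encode" module. The
sample string is hypothetical. A strict CHECK value is an assumption
made here so that characters missing from the target encoding are
reported rather than silently replaced. Those are the characters for
which the "automap" approach mentioned above would be needed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# a decoded (character) string; \x{E9} is e-acute, \x{20AC} is the Euro sign
my $string = "caf\x{E9} 5\x{20AC}";

# Latin-1 has no Euro sign, so a strict encode fails for this string
my $copy   = $string;   # encode() with a CHECK value may modify its input
my $latin1 = eval { encode('iso-8859-1', $copy, Encode::FB_CROAK()) };
print "iso-8859-1: ", (defined $latin1 ? "ok" : "unmappable character"), "\n";

# CP-1252 does include the Euro sign (as byte 0x80)
$copy = $string;
my $cp1252 = encode('cp1252', $copy, Encode::FB_CROAK());
printf "cp1252: %d bytes, last byte 0x%02X\n",
    length($cp1252), ord(substr($cp1252, -1));
```
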
Core and Type1 fonts (output) use the byte values in the string
(single-byte encoding only!) and provide a byte-to-glyph mapping record
for each plane. TrueType outputs a group of four hexadecimal digits
representing the "CId" (character ID) of each character. The CId does
not correspond to either the single-byte or UTF-8 internal
representations of the characters.

The bottom line is that you need to know what the internal
representation of your text is, so that the output routines can tell
the PDF reader about it (via the PDF file). The text will not be
translated upon output, but the PDF reader needs to know what the
encoding in use is, so it knows what glyph to associate with each byte
(or byte sequence).

Note that some operating systems and Perl flavors are reputed to be
strict about encoding names. For example, latin1 (an alias) may be
rejected as invalid, while iso-8859-1 (a canonical value) will work.

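One way to sidestep alias problems is to look up the canonical name
first and pass that along. The sketch below uses
"Encode::resolve_alias" from the core "Encode" module. The list of
candidate names is purely illustrative.

```perl
use strict;
use warnings;
use Encode ();

# candidate names as a user might type them (illustrative only)
for my $name ('latin1', 'iso-8859-1', 'no-such-encoding') {
    # returns the canonical name, or a false value if the name is unknown
    my $canonical = Encode::resolve_alias($name);
    printf "%-18s => %s\n", $name, ($canonical ? $canonical : '(not recognized)');
}
```
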
By the way, it is recommended that you use at least Perl 5.10 if you
are going to be using any non-ASCII characters. Perl 5.8 may be a
little unpredictable in handling such text.

Rendering Order
For better or worse, for compatibility purposes, PDF::Builder continues
the same rendering model as used by PDF::API2 (and possibly its
predecessors). That is, all graphics for one graphics object are put
into one record, and all text output for one text object goes into
another record. Whichever object is declared first is output first.
This can lead to unexpected results, where items are rendered in
(apparently) the wrong order. That is, text and graphics items are not
necessarily output (rendered) in the same order as they were created in
code. Two items in the same object (e.g., $text) will be rendered in
the same order as they were coded, but items from different objects may
not be rendered in the expected order. The following example (source
code and annotated PDF excerpts) will hopefully illustrate the issue:

    use strict;
    use warnings;
    use PDF::Builder;

    # demonstrate text and graphics object order
    #
    my $fname = "objorder";

    my $paper_size = "Letter";

    # see the text and graphics stream contents
    my $pdf = PDF::Builder->new(compress => 'none');
    $pdf->mediabox($paper_size);
    my $page = $pdf->page();
    # adjust path for your operating system
    my $fontTR = $pdf->ttfont('C:\\Windows\\Fonts\\timesbd.ttf');

For the first group, you might expect the "under" line to be output,
then the filled circle (disc) partly covering it, then the "over" line
covering the disc, and finally a filled rectangle (bar) over both
lines. What actually happened is that the $grfx1 graphics object was
declared first, so everything in that object (the disc and bar) is
output first, and the text object $text1 (both lines) comes afterwards.
The result is that the text lines are on top of the graphics drawings.

    # ----------------------------
    # 1. text, orange ball over, text over, bar over

    my $grfx1 = $page->gfx();
    my $text1 = $page->text();
    $text1->font($fontTR, 20);  # 20 pt Times Roman bold

    $text1->fillcolor('black');
    $grfx1->strokecolor('blue');
    $grfx1->fillcolor('orange');

    $text1->translate(50,700);
    $text1->text_left("This text should be under everything.");

    $grfx1->circle(100,690, 30);
    $grfx1->fillstroke();

    $text1->translate(50,670);
    $text1->text_left("This text should be over the ball and under the bar.");

    $grfx1->rect(160,660, 20,70);
    $grfx1->fillstroke();

    % ---------------- group 1: define graphics object first, then text
    11 0 obj << /Length 690 >> stream   % obj 11 is graphics for (1)
    0 0 1 RG                            % stroke blue
    1 0.647059 0 rg                     % fill orange
    130 690 m ... c h B                 % draw and fill circle
    160 660 20 70 re B                  % draw and fill bar
    endstream endobj

    12 0 obj << /Length 438 >> stream   % obj 12 is text for (1)
    BT
    /TiCBA 20 Tf                        % Times Roman Bold 20pt
    0 0 0 rg                            % fill black
    1 0 0 1 50 700 Tm                   % position text
    <0037 ... 0011> Tj                  % "under" line
    1 0 0 1 50 670 Tm                   % position text
    <0037 ... 0011> Tj                  % "over" line
    ET
    endstream endobj

The second group is the same as the first, with the only difference
being that the text object was declared first, and then the graphics
object. The result is that the two text lines are rendered first, and
then the disc and bar are drawn over them.

    # ----------------------------
    # 2. (1) again, with graphics and text order reversed

    my $text2 = $page->text();
    my $grfx2 = $page->gfx();
    $text2->font($fontTR, 20);  # 20 pt Times Roman bold

    $text2->fillcolor('black');
    $grfx2->strokecolor('blue');
    $grfx2->fillcolor('orange');

    $text2->translate(50,600);
    $text2->text_left("This text should be under everything.");

    $grfx2->circle(100,590, 30);
    $grfx2->fillstroke();

    $text2->translate(50,570);
    $text2->text_left("This text should be over the ball and under the bar.");

    $grfx2->rect(160,560, 20,70);
    $grfx2->fillstroke();

    % ---------------- group 2: define text object first, then graphics
    13 0 obj << /Length 438 >> stream   % obj 13 is text for (2)
    BT
    /TiCBA 20 Tf                        % Times Roman Bold 20pt
    0 0 0 rg                            % fill black
    1 0 0 1 50 600 Tm                   % position text
    <0037 ... 0011> Tj                  % "under" line
    1 0 0 1 50 570 Tm                   % position text
    <0037 ... 0011> Tj                  % "over" line
    ET
    endstream endobj

    14 0 obj << /Length 690 >> stream   % obj 14 is graphics for (2)
    0 0 1 RG                            % stroke blue
    1 0.647059 0 rg                     % fill orange
    130 590 m ... h B                   % draw and fill circle
    160 560 20 70 re B                  % draw and fill bar
    endstream endobj

The third group defines two text and two graphics objects, in the order
in which they are expected to render. The "under" text line is output
first, then the orange disc graphics is output, partly covering the
text. The "over" text line is output next -- it's actually over the
disc, but is orange because the previous object stream (first graphics
object) left the fill color (also used for text) as orange, and we
didn't explicitly set the fill color before outputting the second text
line. This is not "inheritance" so much as carry-over: whatever
graphics (drawing) state (used for both "graphics" and "text") is in
effect at the end of one object is the state at the beginning of the
next object. If you wish to control this, consider surrounding the
graphics or text calls with "save()" and "restore()" calls to save and
restore (push and pop) the graphics state to what it was at the
"save()". Finally, the bar is drawn over everything.

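The "save()" and "restore()" suggestion can be sketched as follows.
This is an illustration under assumptions, not an excerpt from the
example program: it uses a core font ("Helvetica") so that no font file
path is needed, and the variable names and message strings are
hypothetical. It assumes an installed PDF::Builder that accepts the
undashed option names used elsewhere in this document.

```perl
use strict;
use warnings;
use PDF::Builder;

my $pdf  = PDF::Builder->new(compress => 'none');
my $page = $pdf->page();
my $font = $pdf->corefont('Helvetica');   # core font, so no font file is needed

# declaration order is rendering order: text, then graphics, then text
my $text_a = $page->text();
my $grfx   = $page->gfx();
my $text_b = $page->text();

$text_a->font($font, 20);
$text_a->translate(50, 500);
$text_a->text('This text is under the disc.');

# bracket the color change so it does not carry over into $text_b
$grfx->save();
$grfx->fillcolor('orange');
$grfx->circle(100, 490, 30);
$grfx->fill();
$grfx->restore();

# no fillcolor() call here: after the restore() above, the fill color
# is whatever it was before the disc was drawn
$text_b->font($font, 20);
$text_b->translate(50, 470);
$text_b->text('This text is over the disc.');

my $pdf_string = $pdf->stringify();
print length($pdf_string), " bytes of PDF generated\n";
```
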
    # ----------------------------
    # 3. (2) again, with two graphics and two text objects

    my $text3 = $page->text();
    my $grfx3 = $page->gfx();
    $text3->font($fontTR, 20);  # 20 pt Times Roman bold
    my $text4 = $page->text();
    my $grfx4 = $page->gfx();
    $text4->font($fontTR, 20);  # 20 pt Times Roman bold

    $text3->fillcolor('black');
    $grfx3->strokecolor('blue');
    $grfx3->fillcolor('orange');
    # $text4->fillcolor('yellow');
    # $grfx4->strokecolor('red');
    # $grfx4->fillcolor('purple');

    $text3->translate(50,500);
    $text3->text_left("This text should be under everything.");

    $grfx3->circle(100,490, 30);
    $grfx3->fillstroke();

    $text4->translate(50,470);
    $text4->text_left("This text should be over the ball and under the bar.");

    $grfx4->rect(160,460, 20,70);
    $grfx4->fillstroke();

    % ---------------- group 3: define text1, graphics1, text2, graphics2
    15 0 obj << /Length 206 >> stream   % obj 15 is text1 for (3)
    BT
    /TiCBA 20 Tf                        % Times Roman Bold 20pt
    0 0 0 rg                            % fill black
    1 0 0 1 50 500 Tm                   % position text
    <0037 ... 0011> Tj                  % "under" line
    ET
    endstream endobj

    16 0 obj << /Length 671 >> stream   % obj 16 is graphics1 for (3) circle
    0 0 1 RG                            % stroke blue
    1 0.647059 0 rg                     % fill orange
    130 490 m ... h B                   % draw and fill circle
    endstream endobj

    17 0 obj << /Length 257 >> stream   % obj 17 is text2 for (3)
    BT
    /TiCBA 20 Tf                        % Times Roman Bold 20pt
    1 0 0 1 50 470 Tm                   % position text
    <0037 ... 0011> Tj                  % "over" line
    ET
    endstream endobj

    18 0 obj << /Length 20 >> stream    % obj 18 is graphics2 for (3) bar
    160 460 20 70 re B                  % draw and fill bar
    endstream endobj

The fourth group is the same as the third, except that we define the
fill color for the text in the second line. This makes it clear that
the "over" line (in yellow) was written after the orange disc, and
still before the bar.

    # ----------------------------
    # 4. (3) again, a new set of colors for second group
    # (new variable names, because redeclaring $text3 etc. with "my"
    # in the same scope would trigger a warning)

    my $text5 = $page->text();
    my $grfx5 = $page->gfx();
    $text5->font($fontTR, 20);  # 20 pt Times Roman bold
    my $text6 = $page->text();
    my $grfx6 = $page->gfx();
    $text6->font($fontTR, 20);  # 20 pt Times Roman bold

    $text5->fillcolor('black');
    $grfx5->strokecolor('blue');
    $grfx5->fillcolor('orange');
    $text6->fillcolor('yellow');
    $grfx6->strokecolor('red');
    $grfx6->fillcolor('purple');

    $text5->translate(50,400);
    $text5->text_left("This text should be under everything.");

    $grfx5->circle(100,390, 30);
    $grfx5->fillstroke();

    $text6->translate(50,370);
    $text6->text_left("This text should be over the ball and under the bar.");

    $grfx6->rect(160,360, 20,70);
    $grfx6->fillstroke();

    % ---------------- group 4: define text1, graphics1, text2, graphics2 with colors for 2
    19 0 obj << /Length 206 >> stream   % obj 19 is text1 for (4)
    BT
    /TiCBA 20 Tf                        % Times Roman Bold 20pt
    0 0 0 rg                            % fill black
    1 0 0 1 50 400 Tm                   % position text
    <0037 ... 0011> Tj                  % "under" line
    ET
    endstream endobj

    20 0 obj << /Length 671 >> stream   % obj 20 is graphics1 for (4) circle
    0 0 1 RG                            % stroke blue
    1 0.647059 0 rg                     % fill orange
    130 390 m ... h B                   % draw and fill circle
    endstream endobj

    21 0 obj << /Length 266 >> stream   % obj 21 is text2 for (4)
    BT
    /TiCBA 20 Tf                        % Times Roman Bold 20pt
    1 1 0 rg                            % fill yellow
    1 0 0 1 50 370 Tm                   % position text
    <0037 ... 0011> Tj                  % "over" line
    ET
    endstream endobj

    22 0 obj << /Length 52 >> stream    % obj 22 is graphics2 for (4) bar
    1 0 0 RG                            % stroke red
    0.498039 0 0.498039 rg              % fill purple
    160 360 20 70 re B                  % draw and fill rectangle (bar)
    endstream endobj

    # ----------------------------
    $pdf->saveas("$fname.pdf");

The separation of text and graphics means that only some text methods
are available in a graphics object, and only some graphics methods are
available in a text object. There is much overlap, but they differ.
There's really no reason the code couldn't have been written (in
PDF::API2, or earlier) as outputting to a single object, which would
keep everything in the same order as the method calls. An advantage
would be less object and stream overhead in the PDF file. The only
drawback might be that an object might more easily overflow and require
splitting into multiple objects, but that should be rare.

You should always be able to manually split an object by simply ending
output to the first object, and picking up with output to the second
object, so long as it was created immediately after the first object.
The graphics state at the end of the first object should be the initial
state at the beginning of the second object. However, use caution when
dealing with text objects -- the PDF specification states that the Text
matrices are not carried over from one object to the next (BT resets
them), so you may need to reset some settings.

    $grfx1 = $page->gfx();
    $grfx2 = $page->gfx();
    # write a huge amount of stuff to $grfx1
    # write a huge amount of stuff to $grfx2, picking up where $grfx1 left off

In any case, now that you understand the rendering order and how the
order of object declarations affects it, you can completely control how
text and graphics are drawn. There is really no need to add another
"both" type object that will handle all graphics and text objects, as
that would probably be a major code bloat for very little benefit.
However, it could be considered in the future if there is a
demonstrated need for it, such as serious PDF file size bloat due to
the extra object overhead when interleaving text and graphics output.

PDF Versions Supported
When creating a PDF file using the functions in PDF::Builder, the
output is marked as PDF 1.4. This does not mean that all PDF
functionality up through 1.4 is supported! There are almost surely
features missing as far back as the PDF 1.0 standard.

The big problem is when a PDF of version 1.5 or higher is imported or
opened in PDF::Builder. If it contains content that is actually
unsupported by this software, there is a chance that something will
break. This does not mean that a PDF marked as "1.7" will necessarily
go down in flames when read by PDF::Builder, or that a PDF written back
out will break in a Reader, but the possibility is there. Much PDF
writer software simply marks its output as the highest version of PDF
at the time (usually 1.7), even if there is no content beyond, say,
1.2. There is some handling of PDF 1.5 items in PDF::Builder, such as
cross reference streams, but support beyond 1.4 is very limited. All we
can say is to be careful when handling PDFs whose version is above 1.4,
and test thoroughly, as they may break at some point.

PDF::Builder includes a simple version control mechanism, where the
initial PDF version to be output (default 1.4) can be set by the
programmer. Input PDFs of a version greater than 1.4 (the current
output level) will trigger a warning (which can be suppressed) that the
output level will be raised to that level. The use of PDF features
greater than the current output level will likewise trigger a warning
that the output level is to be raised to the necessary level. If this
is not desired, you should avoid using those PDF features which are
higher than the desired PDF output level.

History
PDF::API2 was originally written by Alfred Reibenschuh, derived from
Martin Hosken's Text::PDF via the Text::PDF::API wrapper. In 2009,
Otto Hirr started the PDF::API3 fork, but it never went anywhere. In
2011, PDF::API2 maintenance was taken over by Steve Simms. In 2017,
PDF::Builder was forked by Phil M. Perry, who desired a more aggressive
schedule of new features and bug fixes than Simms was providing,
although some of Simms's work has been ported from PDF::API2.

According to "pdfapi2_for_fun_and_profit_APW2005.pdf" (on
http://pdfapi2.sourceforge.net, an unmaintained site), the history of
PDF::API2 (the predecessor to PDF::Builder) goes as such:

  • First Code implemented based on PDFlib-0.6 (AFPL)
  • Changed to Text::PDF with a total rewrite as Text::PDF::API
    (procedural)
  • Unmaintainable Code triggered rewrite into new Namespace
    PDF::API2 (object-oriented, LGPL)
  • Object-Structure streamlined in 0.4x

At Simms's request, the name of the new offering was changed from
PDF::API4 to PDF::Builder, to reduce the chance of confusion due to
parallel development. Perry's intent is to keep all internal methods
as upwardly compatible with PDF::API2 as possible, although it is
likely that there will be some drift (incompatibilities) over time. At
least initially, any program written based on PDF::API2 should be
convertible to PDF::Builder simply by changing "API2" anywhere it
occurs to "Builder". See the INFO/KNOWN_INCOMP known incompatibilities
file for further information.

Thanks...

Many users have helped out by reporting bugs and requesting
enhancements. A special shout out goes to those who have contributed
code and tests, or coordinated their package development with the needs
of PDF::Builder: Ben Bullock, Cary Gravel, Gregor Herrmann, Petr Pisar,
Jeffrey Ratcliffe, Steve Simms (via PDF::API2 fixes), and Johan
Vromans. Drop me a line if I've overlooked your contribution!

Note: older versions of this package named various (hash element)
options with leading dashes (hyphens) in the name, e.g., '-encode'. The
use of a dash is now optional, and options are documented with names
not using dashes. At some point in the future, it is possible that
support for dashed names will be deprecated (and eventually withdrawn),
so it would be good practice to start using undashed names in new and
revised code.

After saving a file...
Note that a PDF object such as $pdf cannot continue to be used after
saving an output PDF file or string with "$pdf->save()", "saveas()", or
"stringify()". There is some cleanup and other operations done
internally which make the object unusable for further operations. If
you try to keep using the PDF object, you will likely receive an error
message along the lines of "can't call method new_obj on an undefined
value".

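In practice, this means creating a fresh object for each output
document. The sketch below is one way to arrange that. The helper name
"build_pdf", the core font choice, and the message text are all
hypothetical. Only the methods already named in this document
("new", "page", "text", "corefont", "stringify", and so on) are used.

```perl
use strict;
use warnings;
use PDF::Builder;

# hypothetical helper: build a one-page document and return it as a string
sub build_pdf {
    my ($message) = @_;
    my $pdf  = PDF::Builder->new();
    my $page = $pdf->page();
    my $text = $page->text();
    $text->font($pdf->corefont('Helvetica'), 12);
    $text->translate(72, 720);
    $text->text($message);
    return $pdf->stringify();   # $pdf must not be used after this call
}

# each output gets its own PDF::Builder object
my @documents = map { build_pdf("Copy number $_") } 1 .. 2;
print scalar(@documents), " documents built\n";
```
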
IntegrityCheck
The PDF::Builder methods that open an existing PDF file pass it to the
integrity checker method, "$self->IntegrityCheck(level, content)". This
method serves two purposes: 1) to find any "/Version" settings that
override the PDF version found in the PDF header, and 2) to perform
some basic validations on the contents of the PDF.

The "level" parameter accepts the following values:

0 = Do not output any diagnostic messages; just return any version
    override.
1 = Output error-level (serious) diagnostic messages, as well as
    returning any version override.
    Errors include: the /Root object was not specified anywhere, or it
    was specified but the indicated object was not found; an object
    claims another object as its child (/Kids list), but another
    object has already claimed that child; an object claims a child,
    but that child does not list a Parent, or the child lists a
    different Parent.

2 = Output error- (serious) and warning- (less serious) level
    diagnostic messages, as well as returning any version override.
    This is the default.
3 = Output error- (serious), warning- (less serious), and note-
    (informational) level diagnostic messages, as well as returning
    any version override.
    Notes include: the (optional) /Info object was not specified
    anywhere, or it was specified but the indicated object was not
    found; an object was referenced, but no entry for it was found
    among the objects (this may be OK if the object is not defined, or
    is on the free list, as the reference will then be ignored); an
    object is defined, but it appears that no other object is
    referencing it.

4 = Output error-, warning-, and note-level diagnostic messages, as
    well as returning any version override. Also dump the diagnostic
    data structure.
5 = Output error-, warning-, and note-level diagnostic messages, as
    well as returning any version override. Also dump the diagnostic
    data structure and the $self data structure (generally useful only
    if you have already read in the PDF file).

The version is a string (e.g., '1.5') if found, otherwise "undef"
(undefined value) is returned.

For controlling the "automatic" call to IntegrityCheck (via opens), the
level may be given with the option (flag) "diaglevel => n", where "n"
is between 0 and 5.

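As a hedged illustration of the "diaglevel" option, the sketch below
builds a small PDF in memory and reopens it with error-level
diagnostics only. It assumes that the installed PDF::Builder provides
the "open_scalar" method for opening a PDF held in a string and that it
accepts the undashed "diaglevel" name. Check the documentation of your
installed version if either assumption does not hold.

```perl
use strict;
use warnings;
use PDF::Builder;

# build a small PDF in memory so the example needs no input file
my $source = PDF::Builder->new();
$source->page();
my $pdf_string = $source->stringify();

# reopen it, asking for error-level diagnostics only (level 1)
my $pdf = PDF::Builder->open_scalar($pdf_string, diaglevel => 1);
print "pages: ", scalar($pdf->pages()), "\n";
```
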
Preferences - set user display preferences
$pdf->preferences(%options)
    Controls viewing preferences for the PDF.

    Page Mode Options

        fullscreen
            Full-screen mode, with no menu bar, window controls, or any
            other window visible.

        thumbs
            Thumbnail images visible.

        outlines
            Document outline visible.

    Page Layout Options

        singlepage
            Display one page at a time.

        onecolumn
            Display the pages in one column.

        twocolumnleft
            Display the pages in two columns, with odd-numbered pages
            on the left.

        twocolumnright
            Display the pages in two columns, with odd-numbered pages
            on the right.

    Viewer Options

        hidetoolbar
            Specifying whether to hide tool bars.

        hidemenubar
            Specifying whether to hide menu bars.

        hidewindowui
            Specifying whether to hide user interface elements.

        fitwindow
            Specifying whether to resize the document's window to the
            size of the displayed page.

        centerwindow
            Specifying whether to position the document's window in the
            center of the screen.

        displaytitle
            Specifying whether the window's title bar should display
            the document title taken from the Title entry of the
            document information dictionary.

        afterfullscreenthumbs
            Thumbnail images visible after Full-screen mode.

        afterfullscreenoutlines
            Document outline visible after Full-screen mode.

        printscalingnone
            Set the default print setting for page scaling to none.

        simplex
            Print single-sided by default.

        duplexflipshortedge
            Print duplex by default and flip on the short edge of the
            sheet.

        duplexfliplongedge
            Print duplex by default and flip on the long edge of the
            sheet.

692 Page Fit Options
693
694 These options are used for the "firstpage" layout, as well as for
695 Annotations, Named Destinations and Outlines.
696
697 'fit' => 1
698 Display the page designated by $page, with its contents magnified
699 just enough to fit the entire page within the window both
700 horizontally and vertically. If the required horizontal and
701 vertical magnification factors are different, use the smaller of
702 the two, centering the page within the window in the other
703 dimension.
704
705 'fith' => $top
706 Display the page designated by $page, with the vertical coordinate
707 $top positioned at the top edge of the window and the contents of
708 the page magnified just enough to fit the entire width of the page
709 within the window.
710
711 'fitv' => $left
712 Display the page designated by $page, with the horizontal
713 coordinate $left positioned at the left edge of the window and the
714 contents of the page magnified just enough to fit the entire height
715 of the page within the window.
716
717 'fitr' => [ $left, $bottom, $right, $top ]
718 Display the page designated by $page, with its contents magnified
719 just enough to fit the rectangle specified by the coordinates
720 $left, $bottom, $right, and $top entirely within the window both
721 horizontally and vertically. If the required horizontal and
722 vertical magnification factors are different, use the smaller of
723 the two, centering the rectangle within the window in the other
724 dimension.
725
726 'fitb' => 1
727 Display the page designated by $page, with its contents magnified
728 just enough to fit its bounding box entirely within the window both
729 horizontally and vertically. If the required horizontal and
730 vertical magnification factors are different, use the smaller of
731 the two, centering the bounding box within the window in the other
732 dimension.
733
734 'fitbh' => $top
735 Display the page designated by $page, with the vertical coordinate
736 $top positioned at the top edge of the window and the contents of
737 the page magnified just enough to fit the entire width of its
738 bounding box within the window.
739
740 'fitbv' => $left
741 Display the page designated by $page, with the horizontal
742 coordinate $left positioned at the left edge of the window and the
743 contents of the page magnified just enough to fit the entire height
744 of its bounding box within the window.
745
746 'xyz' => [ $left, $top, $zoom ]
Display the page designated by $page, with the coordinates
[$left, $top] positioned at the top-left corner of the window
749 and the contents of the page magnified by the factor $zoom. A zero
750 (0) value for any of the parameters $left, $top, or $zoom specifies
751 that the current value of that parameter is to be retained
752 unchanged.
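
The magnification rules described for 'fit', 'fith', and 'fitv' can be sketched as simple arithmetic. This is only an illustration of what a reader is expected to compute; the helper names and the window size are assumptions, and none of this is part of PDF::Builder.

```perl
use strict;
use warnings;
use List::Util qw(min);

# 'fit': use the smaller of the horizontal and vertical factors,
# so that the whole page stays inside the window.
sub fit_zoom {
    my ($page_w, $page_h, $win_w, $win_h) = @_;
    return min($win_w / $page_w, $win_h / $page_h);
}

# 'fith' considers only the width; 'fitv' considers only the height.
sub fith_zoom { my ($page_w, $win_w) = @_; return $win_w / $page_w }
sub fitv_zoom { my ($page_h, $win_h) = @_; return $win_h / $page_h }

# An A4 page (595 x 842 points) in a hypothetical 1190 x 842 window.
my $zoom = fit_zoom(595, 842, 1190, 842);
printf "fit: %.2f  fith: %.2f  fitv: %.2f\n",
    $zoom, fith_zoom(595, 1190), fitv_zoom(842, 842);
```

Here the horizontal factor is 2 and the vertical factor is 1, so 'fit' uses 1 and the page is centered horizontally.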
753
754 Initial Page Options
755
756 firstpage => [ $page, %options ]
757 Specifying the page (either a page number or a page object) to be
758 displayed, plus one of the location options listed above in "Page
759 Fit Options".
760
761 Example
762
$pdf->preferences(
    fullscreen => 1,
    onecolumn => 1,
    afterfullscreenoutlines => 1,
    firstpage => [$page, fit => 1],
);
769
770 info Example
%h = $pdf->info(
    'Author'       => "Alfred Reibenschuh",
    'CreationDate' => "D:20020911000000+01'00'",
    'ModDate'      => "D:YYYYMMDDhhmmssOHH'mm'",   # date pattern; O is '+', '-', or 'Z'
    'Creator'      => "fredos-script.pl",
    'Producer'     => "PDF::Builder",
    'Title'        => "some Publication",
    'Subject'      => "perl ?",
    'Keywords'     => "all good things are pdf"
);
print "Author: $h{'Author'}\n";
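
The date strings in the example follow the pattern D:YYYYMMDDhhmmssOHH'mm'. A minimal sketch of building such a string from an epoch time and a UTC offset is given here; pdf_date is a hypothetical helper, not a PDF::Builder function, and the offset is supplied by the caller rather than looked up from the system time zone.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Format an epoch time as a PDF date string for a given offset from
# UTC, in minutes (for example, 60 for +01'00', -480 for -08'00').
sub pdf_date {
    my ($epoch, $offset_min) = @_;
    my @t    = gmtime($epoch + $offset_min * 60);   # local wall-clock time
    my $sign = $offset_min < 0 ? '-' : '+';
    my $abs  = abs $offset_min;
    return sprintf "D:%04d%02d%02d%02d%02d%02d%s%02d'%02d'",
        $t[5] + 1900, $t[4] + 1, $t[3], $t[2], $t[1], $t[0],
        $sign, int($abs / 60), $abs % 60;
}

# 2002-09-10 23:00:00 UTC is midnight on 2002-09-11 at UTC+1.
my $stamp = pdf_date(timegm(0, 0, 23, 10, 8, 2002), 60);
print "$stamp\n";
```

The resulting string could then be given as the 'CreationDate' or 'ModDate' value in the call shown above.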
782
783 XMP XML example
$xml = $pdf->xmpMetadata();
print "PDF's metadata reads: $xml\n";
$xml=<<EOT;
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta
xmlns:x='adobe:ns:meta/'
x:xmptk='XMP toolkit 2.9.1-14, framework 1.6'>
<rdf:RDF
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:iX='http://ns.adobe.com/iX/1.0/'>
<rdf:Description
rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
xmlns:pdf='http://ns.adobe.com/pdf/1.3/'
pdf:Producer='Acrobat Distiller 6.0.1 for Macintosh'></rdf:Description>
<rdf:Description
rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
xmlns:xap='http://ns.adobe.com/xap/1.0/'
xap:CreateDate='2004-11-14T08:41:16Z'
xap:ModifyDate='2004-11-14T16:38:50-08:00'
xap:CreatorTool='FrameMaker 7.0'
xap:MetadataDate='2004-11-14T16:38:50-08:00'></rdf:Description>
<rdf:Description
rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
xmlns:xapMM='http://ns.adobe.com/xap/1.0/mm/'
xapMM:DocumentID='uuid:919b9378-369c-11d9-a2b5-000393c97fd8'></rdf:Description>
<rdf:Description
rdf:about='uuid:b8659d3a-369e-11d9-b951-000393c97fd8'
xmlns:dc='http://purl.org/dc/elements/1.1/'
dc:format='application/pdf'>
<dc:description>
<rdf:Alt>
<rdf:li xml:lang='x-default'>Adobe Portable Document Format (PDF)</rdf:li>
</rdf:Alt>
</dc:description>
<dc:creator>
<rdf:Seq>
<rdf:li>Adobe Systems Incorporated</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang='x-default'>PDF Reference, version 1.6</rdf:li>
</rdf:Alt>
</dc:title>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>
EOT

$xml = $pdf->xmpMetadata($xml);
print "PDF metadata now reads: $xml\n";
837
838 "BOX" METHODS
A general note: if you give a page a Media Box (or other "box") that
differs from the global "box" setting, take care to define the whole
"chain" of boxes on that page, to avoid surprises. For example,
defining a global Media Box (paper size) and a global Crop Box, and then
defining a new page-level Media Box without a new page-level Crop Box,
may give odd results in the resultant cropping. Such combinations are
not well defined.
846
847 All dimensions in boxes default to the default User Unit, which is
848 points (1/72 inch). Note that the PDF specification limits sizes and
849 coordinates to 14400 User Units (200 inches, for the default User Unit
850 of one point), and Adobe products (so far) follow this limit for
851 Acrobat and Distiller. It is worth noting that other PDF writers and
852 readers may choose to ignore the 14400 unit limit, with or without the
853 use of a specified User Unit. Therefore, PDF::Builder does not enforce
854 any limits on coordinates -- it's your responsibility to consider what
855 readers and other PDF tools may be used with a PDF you produce! Also
856 note that earlier Acrobat readers had coordinate limits as small as
857 3240 User Units (45 inches), and minimum media size of 72 or 3 User
858 Units.
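
Since PDF::Builder does not enforce the limit, a pre-flight check is left to you. A minimal sketch follows, assuming you want to flag any coordinate or side length beyond 14400 User Units; over_limit is a hypothetical helper and the poster dimensions are made up for illustration.

```perl
use strict;
use warnings;

# Return the names of the box values (corner coordinates and side
# lengths) whose magnitude exceeds $limit; an empty list means none do.
sub over_limit {
    my ($limit, $llx, $lly, $urx, $ury) = @_;
    my %value = (
        llx => $llx, lly => $lly, urx => $urx, ury => $ury,
        width => abs($urx - $llx), height => abs($ury - $lly),
    );
    my @names = grep { abs($value{$_}) > $limit } sort keys %value;
    return @names;
}

my @poster = (0, 0, 7_200, 18_000);    # 100in x 250in at one point per unit
my @bad    = over_limit(14_400, @poster);
print "over the limit: @bad\n" if @bad;
```

A box that fails such a check is a candidate for a larger User Unit (see "User Units"), bearing in mind the reader differences noted there.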
859
860 User Units
861
862 $pdf->userunit($number)
863 The default User Unit in the PDF coordinate system is one point
864 (1/72 inch). You can think of it as a scale factor to enable larger
865 (or even, smaller) documents. This method may be used (for PDF 1.6
866 and higher) to set the User Unit to some number of points. For
867 example, "userunit(72)" will set the scale multiplier to 72.0
868 points per User Unit, or 1 inch to the User Unit. Any number
869 greater than zero is acceptable, although some readers and tools
870 may not handle User Units of less than 1.0 very well.
871
872 Not all readers respect the User Unit, if you give one, or handle
873 it in exactly the same way. Adobe Distiller, for one, does not use
874 it. How User Units are handled may vary from reader to reader.
875 Adobe Acrobat, at this writing, respects User Unit in version 7.0
876 and up, but limits it to 75000 (giving a maximum document size of
877 15 million inches or 236.7 miles or 381 km). Other readers and PDF
878 tools may allow a larger (or smaller) limit.
879
880 Your Mileage May Vary: Some readers ignore a global User Unit
881 setting and do not have pages inherit it (PDF::Builder duplicates
882 it on each page to simulate inheritance). Some readers may give
883 spurious warnings about truncated content when a Media Box is
884 changed while User Units are being used. Some readers do strange
885 things with Crop Boxes when a User Unit is in effect.
886
887 Depending on the reader used, the effect of a larger User Unit
888 (greater than 1) may mean lower resolution (chunkier or coarser
889 appearance) in the rendered document. If you're printing something
890 the size of a highway billboard, this may not matter to you, but
891 you should be aware of the possibility (even with fractional
892 coordinates). Conversely, a User Unit of less than 1.0 (if
893 permitted) reduces the allowable size of your document, but may
894 result in greater resolution.
895
896 A global (PDF level) User Unit setting is inherited by each page
897 (an action by PDF::Builder, not necessarily automatically done by
898 the reader), or can be overridden by calling userunit in the page.
899 Do not give more than one global userunit setting, as only the last
900 one will be used. Setting a page's User Unit (if "$page->"
901 instead) is permitted (overriding the global setting for this
902 page). However, many sources recommend against doing this, as
903 results may not be as expected (once again, depending on the quirks
904 of the reader).
905
906 Remember to call "userunit" before calling anything having to do
907 with page or box sizes, or coordinates. Especially when setting
908 'named' box sizes, the methods need to know the current User Unit
909 so that named page sizes (in points) may be scaled down to the
910 current User Unit.
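
The arithmetic behind the User Unit can be sketched as follows. Both helpers are hypothetical and only restate the figures given above (points per User Unit, the 14400 unit limit, and Acrobat's 75000 cap); they do not call PDF::Builder.

```perl
use strict;
use warnings;

# Convert a length in points to User Units, given points per User Unit.
sub to_user_units {
    my ($points, $userunit) = @_;
    return $points / $userunit;
}

# Largest dimension in inches for a coordinate limit and a User Unit.
sub max_inches {
    my ($limit, $userunit) = @_;
    return $limit * $userunit / 72;
}

my $uu = 72;    # one User Unit = one inch, as after userunit(72)
printf "A4 in User Units: %.2f x %.2f\n",
    to_user_units(595, $uu), to_user_units(842, $uu);
printf "maximum size at User Unit 75000: %d inches\n",
    max_inches(14_400, 75_000);
```

This is why named page sizes, which are stored in points, have to be scaled down once a User Unit larger than 1 is in effect.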
911
912 Media Box
913
914 $pdf->mediabox($name)
915 $pdf->mediabox($name, orient => 'orientation' )
916 $pdf->mediabox($w,$h)
917 $pdf->mediabox($llx,$lly, $urx,$ury)
918 ($llx,$lly, $urx,$ury) = $pdf->mediabox()
919 Sets the global Media Box (or page's Media Box, if "$page->"
920 instead). This defines the width and height (or by corner
921 coordinates, or by standard name) of the output page itself, such
922 as the physical paper size. This is normally the largest of the
923 "boxes". If any subsidiary box (within it) exceeds the media box,
924 the portion of the material or boxes outside of the Media Box will
925 be ignored. That is, the Media Box is the One Box to Rule Them All,
926 and is the overall limit for other boxes (some documentation refers
927 to the Media Box as "clipping" other boxes). In addition, the Media
928 Box defines the overall coordinate system for text and graphics
929 operations.
930
931 If no arguments are given, the current Media Box (global or page)
932 coordinates are returned instead. The former "get_mediabox" (page
933 only) function is deprecated and will likely be removed some time
934 in the future. In addition, when setting the Media Box, the
935 resulting coordinates are returned. This permits you to specify the
936 page size by a name (alias) and get the dimensions back, all in one
937 call.
938
Note that many printers cannot print all the way to the physical
940 edge of the paper, so you should plan to leave some blank margin,
941 even outside of any crop marks and bleeds. Printers and on-screen
942 readers are free to discard any content found outside the Media
943 Box, and printers may discard some material just inside the Media
944 Box.
945
946 A global Media Box is required by the PDF spec; if not explicitly
947 given, PDF::Builder will set the global Media Box to US Letter size
948 (8.5in x 11in). This is the media size that will be used for all
949 pages if you do not specify a "mediabox" call on a page. That is, a
950 global (PDF level) mediabox setting is inherited by each page, or
951 can be overridden by setting mediabox in the page. Do not give more
952 than one global mediabox setting, as only the last one will be
953 used.
954
955 If you give a single string name (e.g., 'A4'), you may optionally
956 add an orientation to turn the page 90 degrees into Landscape mode:
957 "orient => 'L'" or "orient => 'l'". "orient" is the only option
958 recognized, and a string beginning with an 'L' or 'l' (for
959 Landscape) is the only value of interest (anything else is treated
960 as Portrait mode). The y axis still runs from 0 at the bottom of
961 the page to what used to be the page width (now, height) at the
962 top, and likewise for the x axis: 0 at left to (former) height at
963 the right. That is, the coordinate system is the same as before,
964 except that the height and width are different.
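
The effect of the "orient" option can be modeled with a short sketch. The two-entry size table and the named_box helper are assumptions made for illustration; the real table lives in PDF::Builder::Resource::PaperSizes and may use different keys.

```perl
use strict;
use warnings;

# Illustrative subset of page sizes in points; not the module's table.
my %SIZE = (a4 => [595, 842], letter => [612, 792]);

# Return (llx, lly, urx, ury) for a named size. Any orient value
# starting with 'L' or 'l' swaps width and height (Landscape);
# anything else is treated as Portrait.
sub named_box {
    my ($name, %opts) = @_;
    my ($w, $h) = @{ $SIZE{ lc $name } or die "unknown size '$name'\n" };
    ($w, $h) = ($h, $w) if ($opts{orient} // '') =~ /^l/i;
    return (0, 0, $w, $h);
}

my @box = named_box('A4', orient => 'L');
print "@box\n";
```

The origin stays at the lower left; only the width and height trade places.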
965
966 The lower left corner does not have to be 0,0. It can be any values
967 you want, including negative values (so long as the resulting
968 media's sides are at least one point long). "mediabox" sets the
969 coordinate system (including the origin) of the graphics and text
970 that will be drawn, as well as for subsequent "boxes". It's even
971 possible to give any two opposite corners (such as upper left and
972 lower right). The coordinate system will be rearranged (by the
973 Reader) to still be the conventional minimum "x" and "y" in the
974 lower left (i.e., you can't make "y" increase from top to bottom!).
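
The rearrangement of two opposite corners into the conventional lower left and upper right can be sketched as below. normalize_box is a hypothetical helper that mimics what a Reader is described as doing; it is not a PDF::Builder call.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Given any two opposite corners, return (llx, lly, urx, ury).
sub normalize_box {
    my ($x1, $y1, $x2, $y2) = @_;
    return (min($x1, $x2), min($y1, $y2), max($x1, $x2), max($y1, $y2));
}

# Upper left and lower right corners of an A4 sheet.
my @box = normalize_box(0, 842, 595, 0);
print "@box\n";
```

Negative coordinates pass through unchanged, so an origin in the middle of the page works the same way.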
975
976 Example:
977
$pdf = PDF::Builder->new();
$pdf->mediabox('A4'); # A4 size (595 Pt wide by 842 Pt high)
...
$pdf->saveas('our/new.pdf');

$pdf = PDF::Builder->new();
$pdf->mediabox(595, 842); # A4 size, with implicit 0,0 LL corner
...
$pdf->saveas('our/new.pdf');

$pdf = PDF::Builder->new();
$pdf->mediabox(0, 0, 595, 842); # A4 size, with explicit 0,0 LL corner
...
$pdf->saveas('our/new.pdf');
992
993 See the PDF::Builder::Resource::PaperSizes source code for the full
994 list of supported names (aliases) and their dimensions in points.
995 You are free to add additional paper sizes to this file, if you
996 wish. You might want to do this if you frequently use a standard
997 page size in rotated (Landscape) mode. See also the "getPaperSizes"
998 call in PDF::Builder::Util. These names (aliases) are also usable
999 in other "box" calls, although useful only if the "box" is the same
1000 size as the full media (Media Box), and you don't mind their
1001 starting at 0,0.
1002
1003 Crop Box
1004
1005 $pdf->cropbox($name)
1006 $pdf->cropbox($name, orient => 'orientation')
1007 $pdf->cropbox($w,$h)
1008 $pdf->cropbox($llx,$lly, $urx,$ury)
1009 ($llx,$lly, $urx,$ury) = $pdf->cropbox()
1010 Sets the global Crop Box (or page's Crop Box, if "$page->"
1011 instead). This will define the media size to which the output will
1012 later be clipped. Note that this does not itself output any crop
1013 marks to guide cutting of the paper! PDF Readers should consider
1014 this to be the visible portion of the page, and anything found
1015 outside it may be clipped (invisible). By default, it is equal to
1016 the Media Box, but may be defined to be smaller, in the coordinate
1017 system set by the Media Box. A global setting will be inherited by
1018 each page, but can be overridden on a per-page basis.
1019
1020 A Reader or Printer may choose to discard any clipped (invisible)
1021 part of the page, and show only the area within the Crop Box. For
1022 example, if your page Media Box is A4 (0,0 to 595,842 Points), and
1023 your Crop Box is (100,100 to 495,742), a reader such as Adobe
1024 Acrobat Reader may show you a page 395 by 642 Points in size (i.e.,
1025 just the visible area of your page). Other Readers may show you the
1026 full media size (Media Box) and a 100 Point wide blank area (in
1027 this example) around the visible content.
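
The numbers in this example can be checked with a few lines of arithmetic. box_size is a hypothetical helper, shown only to make the relationship between the two boxes explicit.

```perl
use strict;
use warnings;

# Width and height of a box given as (llx, lly, urx, ury).
sub box_size {
    my ($llx, $lly, $urx, $ury) = @_;
    return ($urx - $llx, $ury - $lly);
}

my @media = (0, 0, 595, 842);        # A4
my @crop  = (100, 100, 495, 742);

my ($w, $h) = box_size(@crop);
my $margin  = $crop[0] - $media[0];  # blank strip at the left edge
print "visible area: $w x $h points, margin: $margin points\n";
```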
1028
1029 If no arguments are given, the current Crop Box (global or page)
1030 coordinates are returned instead. The former "get_cropbox" (page
1031 only) function is deprecated and will likely be removed some time
1032 in the future. If a Crop Box has not been defined, the Media Box
1033 coordinates (which always exist) will be returned instead. In
1034 addition, when setting the Crop Box, the resulting coordinates are
1035 returned. This permits you to specify the crop box by a name
1036 (alias) and get the dimensions back, all in one call.
1037
1038 Do not confuse the Crop Box with the "Trim Box", which shows where
1039 printed paper is expected to actually be cut. Some PDF Readers may
1040 reduce the visible "paper" background to the size of the crop box;
1041 others may simply omit any content outside it. Either way, you
1042 would lose any trim or crop marks, printer instructions, color
alignment dots, or other content outside the Crop Box. A good use
of the Crop Box would be to limit printing to the area where a printer
1045 can reliably put down ink, and leave white the edge areas where
1046 paper-handling mechanisms prevent ink or toner from being applied.
1047 This would keep you from accidentally putting valuable content in
1048 an area where a printer will refuse to print, yet permit you to
1049 include a bleed area and space for printer's marks and
1050 instructions. Needless to say, if your printer cannot print to the
1051 very edge of the paper, you will need to trim (cut) the printed
1052 sheets to get true bleeds.
1053
1054 A global (PDF level) cropbox setting is inherited by each page, or
1055 can be overridden by setting cropbox in the page. As with
1056 "mediabox", only one crop box may be set at this (PDF) level. As
1057 with "mediabox", a named media size may have an orientation (l or
1058 L) for Landscape mode. Note that the PDF level global Crop Box
1059 will be used even if the page gets its own Media Box. That is, the
1060 page's Crop Box inherits the global Crop Box, not the page Media
1061 Box, even if the page has its own media size! If you set the page's
1062 own Media Box, you should consider also explicitly setting the page
1063 Crop Box (and other boxes).
1064
1065 Bleed Box
1066
1067 $pdf->bleedbox($name)
1068 $pdf->bleedbox($name, orient => 'orientation')
1069 $pdf->bleedbox($w,$h)
1070 $pdf->bleedbox($llx,$lly, $urx,$ury)
1071 ($llx,$lly, $urx,$ury) = $pdf->bleedbox()
1072 Sets the global Bleed Box (or page's Bleed Box, if "$page->"
1073 instead). This is typically used in printing on paper, where you
1074 want ink or color (such as thumb tabs) to be printed a bit beyond
1075 the final paper size, to ensure that the cut paper bleeds (the cut
1076 goes through the ink), rather than accidentally leaving some white
paper visible outside. Allow enough "bleed" over the expected trim
line to account for minor variations in paper handling, folding,
and cutting, to avoid showing white paper at the edge. The Bleed
1080 Box is where printing could actually extend to; the Trim Box is
1081 normally within it, where the paper would actually be cut. The
1082 default value is equal to the Crop Box, but is often a bit smaller.
1083 The space between the Bleed Box and the Crop Box is available for
1084 printer instructions, color alignment dots, etc., while crop marks
1085 (trim guides) are at least partly within the bleed area (and should
1086 be printed after content is printed).
1087
1088 If no arguments are given, the current Bleed Box (global or page)
1089 coordinates are returned instead. The former "get_bleedbox" (page
1090 only) function is deprecated and will likely be removed some time
1091 in the future. If a Bleed Box has not been defined, the Crop Box
1092 coordinates (if defined) will be returned, otherwise the Media Box
1093 coordinates (which always exist) will be returned. In addition,
1094 when setting the Bleed Box, the resulting coordinates are returned.
1095 This permits you to specify the bleed box by a name (alias) and get
1096 the dimensions back, all in one call.
1097
1098 A global (PDF level) bleedbox setting is inherited by each page, or
1099 can be overridden by setting bleedbox in the page. As with
1100 "mediabox", only one bleed box may be set at this (PDF) level. As
1101 with "mediabox", a named media size may have an orientation (l or
1102 L) for Landscape mode. Note that the PDF level global Bleed Box
1103 will be used even if the page gets its own Crop Box. That is, the
1104 page's Bleed Box inherits the global Bleed Box, not the page Crop
1105 Box, even if the page has its own media size! If you set the page's
1106 own Media Box or Crop Box, you should consider also explicitly
1107 setting the page Bleed Box (and other boxes).
1108
1109 Trim Box
1110
1111 $pdf->trimbox($name)
1112 $pdf->trimbox($name, orient => 'orientation')
1113 $pdf->trimbox($w,$h)
1114 $pdf->trimbox($llx,$lly, $urx,$ury)
1115 ($llx,$lly, $urx,$ury) = $pdf->trimbox()
1116 Sets the global Trim Box (or page's Trim Box, if "$page->"
1117 instead). This is supposed to be the actual dimensions of the
1118 finished page (after trimming of the paper). In some production
1119 environments, it is useful to have printer's instructions, cut
1120 marks, and so on outside of the trim box. The default value is
1121 equal to Crop Box, but is often a bit smaller than any Bleed Box,
1122 to allow the desired "bleed" effect.
1123
1124 If no arguments are given, the current Trim Box (global or page)
1125 coordinates are returned instead. The former "get_trimbox" (page
1126 only) function is deprecated and will likely be removed some time
1127 in the future. If a Trim Box has not been defined, the Crop Box
1128 coordinates (if defined) will be returned, otherwise the Media Box
1129 coordinates (which always exist) will be returned. In addition,
1130 when setting the Trim Box, the resulting coordinates are returned.
1131 This permits you to specify the trim box by a name (alias) and get
1132 the dimensions back, all in one call.
1133
1134 A global (PDF level) trimbox setting is inherited by each page, or
1135 can be overridden by setting trimbox in the page. As with
1136 "mediabox", only one trim box may be set at this (PDF) level. As
1137 with "mediabox", a named media size may have an orientation (l or
1138 L) for Landscape mode. Note that the PDF level global Trim Box
1139 will be used even if the page gets its own Crop Box. That is, the
1140 page's Trim Box inherits the global Trim Box, not the page Crop
1141 Box, even if the page has its own media size! If you set the page's
1142 own Media Box or Crop Box, you should consider also explicitly
1143 setting the page Trim Box (and other boxes).
1144
1145 Art Box
1146
1147 $pdf->artbox($name)
1148 $pdf->artbox($name, orient => 'orientation')
1149 $pdf->artbox($w,$h)
1150 $pdf->artbox($llx,$lly, $urx,$ury)
1151 ($llx,$lly, $urx,$ury) = $pdf->artbox()
1152 Sets the global Art Box (or page's Art Box, if "$page->" instead).
1153 This is supposed to define "the extent of the page's meaningful
1154 content (including [margins])". It might exclude some content, such
1155 as Headlines or headings. Any binding or punched-holes margin would
1156 typically be outside of the Art Box, as would be page numbers and
1157 running headers and footers. The default value is equal to the Crop
1158 Box, although normally it would be no larger than any Trim Box. The
1159 Art Box may often be used for defining "important" content (e.g.,
1160 excluding advertisements) that may or may not be brought over to
1161 another page (e.g., N-up printing).
1162
1163 If no arguments are given, the current Art Box (global or page)
1164 coordinates are returned instead. The former "get_artbox" (page
1165 only) function is deprecated and will likely be removed some time
1166 in the future. If an Art Box has not been defined, the Crop Box
1167 coordinates (if defined) will be returned, otherwise the Media Box
1168 coordinates (which always exist) will be returned. In addition,
1169 when setting the Art Box, the resulting coordinates are returned.
1170 This permits you to specify the art box by a name (alias) and get
1171 the dimensions back, all in one call.
1172
1173 A global (PDF level) artbox setting is inherited by each page, or
1174 can be overridden by setting artbox in the page. As with
1175 "mediabox", only one art box may be set at this (PDF) level. As
1176 with "mediabox", a named media size may have an orientation (l or
1177 L) for Landscape mode. Note that the PDF level global Art Box will
1178 be used even if the page gets its own Crop Box. That is, the page's
1179 Art Box inherits the global Art Box, not the page Crop Box, even if
1180 the page has its own media size! If you set the page's own Media
1181 Box or Crop Box, you should consider also explicitly setting the
1182 page Art Box (and other boxes).
1183
1184 Suggested Box Usage
1185
1186 See "examples/Boxes.pl" for an example of using boxes.
1187
1188 How you define your boxes (or let them default) is up to you, depending
1189 on whether you're duplex printing US Letter or A4 on your laser
1190 printer, to be spiral bound on the bind margin, or engaging a
1191 professional printer. In the latter case, discuss in advance with the
1192 print firm what capabilities (and limitations) they have and what
1193 information they need from a PDF file. For instance, they may not want
1194 a Crop Box defined, and may call for very specific box sizes. For large
1195 press runs, they may print multiple pages (N-up) duplexed on large web
1196 roll "signatures", which are then intricately folded and guillotined
1197 (trimmed) and bound together into books or magazines. You would usually
1198 just supply a PDF with all the pages; they would take care of the
1199 signature layout (which includes offsets and 180 degree rotations).
1200
1201 (As an aside, don't count on a printer having any particular font
1202 available, so be sure to ask. Usually they will want you to embed all
1203 fonts used, but ask first, and double-check before handing over the
1204 print job! TTF/OTF fonts ("ttfont()") are embedded by default, but
1205 other fonts (core, ps, bdf, cjk) are not! A printer may have a core
1206 font collection, but they are free to substitute a "workalike" font for
1207 any given core font, and the results may not match what you saw on your
1208 PC!)
1209
1210 On the assumption that you're using a single sheet (US Letter or A4)
1211 laser or inkjet printer, are you planning to trim each sheet down to a
1212 smaller final size? If so, you can do true bleeds by defining a Trim
1213 Box and a slightly larger Bleed Box. You would print bleeds (all the
1214 way to the finished edge) out to the Bleed Box, but nothing is enforced
1215 about the Bleed Box. At the other end of the spectrum, you would define
1216 the Media Box to be the physical paper size being printed on. Most
1217 printers reserve a little space on the sides (and possibly top and
1218 bottom) for paper handling, so it is often good to define your Crop Box
1219 as the printable area. Remember that the Media Box sets the coordinate
1220 system used, so you still need to avoid going outside the Crop Box with
1221 content (most readers and printers will not show any ink outside of the
1222 Crop Box). Whether or not you define a Crop Box, you're going to almost
1223 always end up with white paper on at least the sides.
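
As an illustration of the single-sheet case, the following sketch derives a Trim Box and a slightly larger Bleed Box from a US Letter Media Box. The finished size (7in x 9in), the 9 point bleed, and both helper names are assumptions chosen for the example, not recommendations.

```perl
use strict;
use warnings;

# Center a box of size $w x $h inside an outer box.
sub centered_box {
    my ($w, $h, @outer) = @_;
    my $llx = $outer[0] + (($outer[2] - $outer[0]) - $w) / 2;
    my $lly = $outer[1] + (($outer[3] - $outer[1]) - $h) / 2;
    return ($llx, $lly, $llx + $w, $lly + $h);
}

# Grow a box outward by $by on every side.
sub grow_box {
    my ($by, @b) = @_;
    return ($b[0] - $by, $b[1] - $by, $b[2] + $by, $b[3] + $by);
}

my @media = (0, 0, 612, 792);                       # US Letter sheet
my @trim  = centered_box(7 * 72, 9 * 72, @media);   # finished page
my @bleed = grow_box(9, @trim);                     # 1/8 inch bleed

print "trim:  @trim\n";
print "bleed: @bleed\n";
```

The resulting lists are in the four-coordinate form accepted by the "trimbox" and "bleedbox" methods described above.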
1224
1225 For small in-house jobs, you probably won't need color alignment dots
1226 and other such professional instructions and information between the
1227 Bleed Box and the Crop Box, but crop marks for trimming (if used)
1228 should go just outside the Trim Box (partly or wholly within the Bleed
1229 Box), and be drawn after all content. If you're not trimming the paper,
1230 don't try to do any bleed effects (including solid background color
1231 pages/covers), as you will usually have a white edge around the sheet
anyway. Don't assume that a PDF document will only ever be displayed
(where you can do things like bleed all the way to the media edge) and
never physically printed. Finally, for single sheet printing, an Art Box is
1235 probably unnecessary, but if you're combining pages into N-up prints,
1236 or doing other manipulations, it may be useful.
1237
1238 Box Inheritance
1239
1240 What Media, Crop, Bleed, Trim, and Art Boxes a page gets can be a
1241 little complicated. Note that usually, only the Media and Crop Boxes
1242 will have a clear visual effect. The visual effect of the other boxes
1243 (if any) may be very subtle.
1244
1245 First, everything is set at the global (PDF) level. The Media Box is
1246 always defined, and defaults to US Letter (8.5 inches wide by 11 inches
1247 high). The global Crop Box inherits the Media Box, unless explicitly
1248 defined. The Bleed, Trim, and Art Boxes inherit the Crop Box, unless
1249 explicitly defined. A global box should only be defined once, as the
1250 last one defined is the one that will be written to the PDF!
1251
1252 Second, a page inherits the global boxes, for its initial settings. You
1253 may call any of the box set methods ("cropbox", "trimbox", etc.) to
1254 explicitly set (override) any box for this page. Note that setting a
1255 new Media Box for the page does not reset the page's Crop Box -- it
1256 still uses whatever it inherited from the global Crop Box. You would
1257 need to explicitly set the page's Crop Box if you want a different
1258 setting. Likewise, the page's Bleed, Trim, and Art Boxes will not be
1259 reset by a new page Crop Box -- they will still inherit from the global
1260 (PDF) settings.
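
The second rule can be modeled with two hashes, one for the global settings and one for a page's overrides. This is a deliberately simplified model, assuming every box of interest has been set explicitly at the global level; it does not reproduce PDF::Builder's internal lookup or its defaults, and page_box is a hypothetical helper.

```perl
use strict;
use warnings;

my %global = (
    media => [0, 0, 612, 792],       # US Letter
    crop  => [36, 36, 576, 756],     # half-inch margin
);
my %page = (
    media => [0, 0, 595, 842],       # this page overrides the Media Box only
);

# A page-level setting wins; otherwise the global setting is used.
sub page_box {
    my ($name, $page, $global) = @_;
    return @{ $page->{$name} // $global->{$name}
              // die "box '$name' is not set explicitly\n" };
}

my @media = page_box('media', \%page, \%global);
my @crop  = page_box('crop',  \%page, \%global);
print "media: @media\ncrop:  @crop\n";
```

The page ends up with an A4 Media Box but still carries the Crop Box that was sized for US Letter, which is the kind of surprise an explicit page-level Crop Box avoids.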
1261
Third, the page Media Box (the one actually used for output pages)
clips or limits all the other boxes so that they extend no further than its size.
1264 For example, if the Media Box is US Letter, and you set a Crop Box of
1265 A4 size, the smaller of the two heights (11 inches) would be effective,
1266 and the smaller of the two widths (8.26 inches, 595 Points) would be
1267 effective. The given dimensions of a box are returned on query (get),
1268 not the effective dimensions clipped by the Media Box.
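
The clipping in this example amounts to intersecting the box with the Media Box. A minimal sketch, with clip_box as a hypothetical helper:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Intersect a box with the Media Box; both are array references
# holding (llx, lly, urx, ury).
sub clip_box {
    my ($box, $media) = @_;
    return (
        max($box->[0], $media->[0]), max($box->[1], $media->[1]),
        min($box->[2], $media->[2]), min($box->[3], $media->[3]),
    );
}

my @letter = (0, 0, 612, 792);   # Media Box
my @a4     = (0, 0, 595, 842);   # Crop Box as given

my @effective = clip_box(\@a4, \@letter);
print "effective crop: @effective\n";
```

A query of the Crop Box would still report the A4 dimensions as given; only the effective area is reduced.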
1269
1270 FONT METHODS
1271 Core Fonts
1272
1273 Core fonts are limited to single byte encodings. You cannot use UTF-8
1274 or other multibyte encodings with core fonts. The default encoding for
1275 the core fonts is WinAnsiEncoding (roughly the CP-1252 superset of
1276 ISO-8859-1). See the "encode" option below to change this encoding.
See the "font automap" method in PDF::Builder::Resource::Font for
information on accessing more than 256 glyphs in a font, using planes,
1279 although there is no guarantee that future changes to font files will
1280 permit consistent results.
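
One way to find out whether a piece of text fits a single byte encoding is to try the conversion with the core Encode module. The sketch below is an assumption about how you might pre-check text yourself; fits_single_byte is a hypothetical helper and says nothing about which glyphs a particular font actually contains.

```perl
use strict;
use warnings;
use Encode qw(encode);

# True if every character of $text can be represented in $encoding.
sub fits_single_byte {
    my ($text, $encoding) = @_;
    my $copy = $text;    # encode() with a CHECK value may modify its input
    return eval { encode($encoding, $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

for my $case (
    [ "caf\x{e9}",        'cp1252'     ],   # e-acute is in CP-1252
    [ "\x{20ac}",         'iso-8859-1' ],   # Euro sign is not in Latin-1
    [ "\x{4e2d}\x{6587}", 'cp1252'     ],   # CJK needs a multibyte encoding
) {
    printf "%-10s %s\n", $case->[1],
        fits_single_byte(@$case) ? 'fits' : 'does not fit';
}
```

Text that does not fit is a sign that a different "encode" value, or a TrueType or OpenType font, is needed.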
1281
1282 Note that core fonts use fixed lists of expected glyphs, along with
1283 metrics such as their widths. This may not exactly match up with
1284 whatever local font file is used by the PDF reader. It's usually pretty
1285 close, but many cases have been found where the list of glyphs is
1286 different between the core fonts and various local font files, so be
1287 aware of this.
1288
1289 To allow UTF-8 text and extended glyph counts, you should consider
1290 replacing your use of core fonts with TrueType (.ttf) and OpenType
1291 (.otf) fonts. There are tools, such as FontForge, which can do a fairly
1292 good (though, not perfect) job of converting a Type1 font library to
1293 OTF.
1294
1295 Examples:
1296
$font1 = $pdf->corefont('Times-Roman', encode => 'latin2');
$font2 = $pdf->corefont('Times-Bold');
$font3 = $pdf->corefont('Helvetica');
$font4 = $pdf->corefont('ZapfDingbats');
1301
1302 Valid %options are:
1303
1304 encode
1305 Changes the encoding of the font from its default. Notice that the
1306 encoding (not the entire font's glyph list) is shown in a PDF
1307 object (record), listing 256 glyphs associated with this encoding
1308 (and that are available in this font).
1309
1310 dokern
1311 Enables kerning if data is available.
1312
1313 Notes:
1314
1315 Even though these are called "core" fonts, they are not shipped with
1316 PDF::Builder, but are expected to be found on the machine with the PDF
1317 reader. Most core fonts are installed with a PDF reader, and thus are
1318 not coordinated with PDF::Builder. PDF::Builder does ship with core
1319 font metrics files (width, glyph names, etc.), but these cannot be
1320 guaranteed to be in sync with what the PDF reader has installed!
1321
1322 There are some 14 core fonts (regular, italic, bold, and bold-italic
1323 for Times [serif], Helvetica [sans serif], Courier [fixed pitch]; plus
1324 two symbol fonts) that are supposed to be available on any PDF reader,
1325 although other fonts with very similar metrics are often substituted.
1326 You should not count on any of the 15 Windows core fonts (Bank Gothic,
1327 Georgia, Trebuchet, Verdana, and two more symbol fonts) being present,
1328 especially on Linux, Mac, or other non-Windows platforms. Be aware if
1329 you are producing PDFs to be read on a variety of different systems!
1330
1331 If you want to ensure the widest portability for a PDF document you
1332 produce, you should consider using TTF fonts (instead of core fonts)
1333 and embedding them in the document. This ensures that there will be no
1334 substitutions, that all metrics are known and match the glyphs, that
1335 UTF-8 encoding can be used, and that the glyphs will be available on the
1336 reader's machine. At least on Windows platforms, most of the fonts are
1337 TTF anyway; they are used behind the scenes for "core" fonts, but
1338 without most of the capabilities of TTF (now or possibly later in
1339 PDF::Builder) such as embedding, ligatures, UTF-8, etc. The downside
1340 is, obviously, that the resulting PDF file will be larger because it
1341 includes the font(s). There might also be copyright or licensing issues
1342 with the redistribution of font files in this manner. You might want to
1343 check before widely distributing a PDF document with embedded fonts,
1344 although many licenses do permit the part of the font used to be embedded.
1345
1346 See also PDF::Builder::Resource::Font::CoreFont.
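
As a minimal sketch of the two options above used together, a core font
might be requested with a specific encoding and with kerning enabled.
This assumes PDF::Builder is installed (the guard simply skips the calls
if it is not); the font name and the sample string are arbitrary choices
for illustration.

```perl
use strict;
use warnings;

# Load PDF::Builder only if it is installed, so the sketch runs either way.
my $have_pb = eval { require PDF::Builder; 1 } ? 1 : 0;

my ($font, $width);
if ($have_pb) {
    my $pdf = PDF::Builder->new();
    # 'encode' selects a single byte encoding; 'dokern' asks for kerning
    # to be applied if the shipped metrics contain kerning data
    $font  = $pdf->corefont('Helvetica', encode => 'latin1', dokern => 1);
    # width() gives the width of the string as if set at font size 1
    $width = $font->width('AVATAR');
    print "width at size 1: $width\n";
} else {
    print "PDF::Builder is not installed; nothing to demonstrate\n";
}
```

Remember that the metrics used here come from PDF::Builder, while the
glyphs come from whatever font the PDF reader supplies.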
1347
1348 PS Fonts
1349
1350 PS (T1) fonts are limited to single byte encodings. You cannot use
1351 UTF-8 or other multibyte encodings with T1 fonts. The default encoding
1352 for the T1 fonts is WinAnsiEncoding (roughly the CP-1252 superset of
1353 ISO-8859-1). See the "encode" option below to change this encoding.
1354 See the "font automap" method in PDF::Builder::Resource::Font for
1355 information on accessing more than 256 glyphs in a font, using planes,
1356 although there is no guarantee that future changes to font files will
1357 permit consistent results. Note: many Type1 fonts are limited to 256
1358 glyphs, but some are available with more than 256 glyphs. Still, a
1359 maximum of 256 at a time are usable.
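
Because of the single byte limit, it can be useful to check text before
handing it to a T1 font. The following is a minimal sketch and not part
of PDF::Builder: it uses the core Encode module and treats CP-1252 as a
stand-in for WinAnsiEncoding (which, as noted, is only roughly the same).
The helper name fits_cp1252 is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Return 1 if every character of $text can be encoded as CP-1252,
# 0 otherwise. This is an approximation of "usable with the default
# T1 encoding", not a check against any particular font's glyph list.
sub fits_cp1252 {
    my ($text) = @_;
    my $copy = $text;   # encode() with a CHECK value may modify its input
    my $ok = eval { Encode::encode('cp1252', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

for my $sample ("caf\x{e9}", "\x{20ac}100", "\x{4e2d}\x{6587}") {
    printf "%d character(s): %s\n", length($sample),
        fits_cp1252($sample) ? 'fits' : 'does not fit';
}
```

Text that does not fit would need a different encoding, the automap
planes, or a switch to a TrueType or OpenType font.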
1360
1361 "psfont" accepts both ASCII (.pfa) and binary (.pfb) Type1 glyph files.
1362 Font metrics can be supplied in either ASCII (.afm) or binary (.pfm)
1363 format, as can be seen in the examples given below. It is possible to
1364 use .pfa with .pfm and .pfb with .afm if that's what's available. The
1365 ASCII and binary files have the same content, just in different
1366 formats.
1367
1368 To allow UTF-8 text and extended glyph counts in one font, you should
1369 consider replacing your use of Type1 fonts with TrueType (.ttf) and
1370 OpenType (.otf) fonts. There are tools, such as FontForge, which can do
1371 a fairly good (though not perfect) job of converting your font library
1372 to OTF.
1373
1374 Examples:
1375
1376 $font1 = $pdf->psfont('Times-Book.pfa', afmfile => 'Times-Book.afm');
1377 $font2 = $pdf->psfont('/fonts/Synest-FB.pfb', pfmfile => '/fonts/Synest-FB.pfm');
1378
1379 Valid %options are:
1380
1381 encode
1382 Changes the encoding of the font from its default. Notice that the
1383 encoding (not the entire font's glyph list) is shown in a PDF
1384 object (record), listing 256 glyphs associated with this encoding
1385 (and that are available in this font).
1386
1387 afmfile
1388 Specifies the location of the ASCII font metrics file (.afm). It
1389 may be used with either an ASCII (.pfa) or binary (.pfb) glyph
1390 file.
1391
1392 pfmfile
1393 Specifies the location of the binary font metrics file (.pfm). It
1394 may be used with either an ASCII (.pfa) or binary (.pfb) glyph
1395 file.
1396
1397 dokern
1398 Enables kerning if data is available.
1399
1400 Note: these T1 (Type1) fonts are not shipped with PDF::Builder, but are
1401 expected to be found on the machine with the PDF reader. Most PDF
1402 readers do not install T1 fonts, and it is up to the user of the PDF
1403 reader to install the needed fonts. Unlike TrueType fonts, PS (T1)
1404 fonts are not embedded in the PDF, and must be supplied on the Reader
1405 end.
1406
1407 See also PDF::Builder::Resource::Font::Postscript.
1408
1409 TrueType Fonts
1410
1411 Warning: BaseEncoding is not set by default for TrueType fonts, so text
1412 in the PDF isn't searchable (by the PDF reader) unless a ToUnicode CMap
1413 is included. PDF::Builder includes a ToUnicode CMap by default
1414 (unicodemap set to 1), but allows it to be disabled (for performance and
1415 file size reasons) by setting unicodemap to 0. Doing so will produce
1416 non-searchable text, which, besides being annoying to users, may prevent
1417 screen readers and other aids to disabled users from working correctly!
1418
1419 Examples:
1420
1421 $font1 = $pdf->ttfont('Times.ttf');
1422 $font2 = $pdf->ttfont('Georgia.otf');
1423
1424 Valid %options are:
1425
1426 encode
1427 Changes the encoding of the font from its default
1428 (WinAnsiEncoding).
1429
1430 Note that for a single byte encoding (e.g., 'latin1'), you are
1431 limited to 256 characters defined for that encoding. 'automap' does
1432 not work with TrueType. If you want more characters than that, use
1433 'utf8' encoding with a UTF-8 encoded text string.
1434
1435 isocmap
1436 Use the ISO Unicode Map instead of the default MS Unicode Map.
1437
1438 unicodemap
1439 If 1 (default), output ToUnicode CMap to permit text searches and
1440 screen readers. Set to 0 to save space by not including the
1441 ToUnicode CMap, but text searching and screen reading will not be
1442 possible.
1443
1444 dokern
1445 Enables kerning if data is available.
1446
1447 noembed
1448 Disables embedding of the font file. Note that this is potentially
1449 hazardous, as the glyphs provided on the PDF reader machine may not
1450 match what was used on the PDF writer machine (the one running
1451 PDF::Builder)! If you know for sure that all PDF readers will be
1452 using the same TTF or OTF file you're using with PDF::Builder, not
1453 embedding the font may be acceptable, in return for a smaller PDF
1454 file size. Note that the Reader needs to know where to find the
1455 font file -- it can't be in any random place, but typically needs
1456 to be listed in a path that the Reader follows. Otherwise, it will
1457 be unable to render the text!
1458
1459 The only value for the "noembed" flag currently checked for is 1,
1460 which means to not embed the font file in the PDF. Any other value
1461 currently results in the font file being embedded (by default),
1462 although in the future, other values might be given significance
1463 (such as checking permission bits).
1464
1465 Some additional comments on embedding font file(s) into the PDF:
1466 besides substantially increasing the size of the PDF (even if the
1467 font is subsetted, by default), PDF::Builder does not check the
1468 font file for any flags indicating font licensing issues and
1469 limitations on use. A font foundry may not permit embedding at all,
1470 may permit a subset of the font to be embedded, may permit a full
1471 font to be embedded, and may specify what can be done with an
1472 embedded font (e.g., may or may not be extracted for further use
1473 beyond displaying this one PDF). When you choose to use (and embed)
1474 a font, you should be aware of any such licensing issues.
1475
1476 nosubset
1477 Disables subsetting of a TTF/OTF font, when embedded. By default,
1478 only the glyphs used by a document are included in the file, and
1479 not the entire font. This can result in tremendous savings in
1480 PDF file size. If you intend to allow the PDF to be edited by
1481 users, not having the entire font glyph set available may cause
1482 problems, so be aware of that (and consider using "nosubset => 1").
1483 Setting this flag to any value results in the entire font glyph set
1484 being embedded in the file. It might be a good idea to use only the
1485 value 1, in case other values are assigned roles in the future.
1486
1487 debug
1488 If set to 1 (default is 0), diagnostic information is output about
1489 the CMap processing.
1490
1491 usecmf
1492 If set to 1 (default is 0), the first priority is to make use of
1493 one of the four ".cmap" files for CJK fonts. This is the old way of
1494 processing TTF files. If, after all is said and done, a working
1495 internal CMap hasn't been found (for usecmf=>0), "ttfont()" will
1496 fall back to using a ".cmap" file if possible.
1497
1498 cmaps
1499 This flag may be set to a string listing the Platform/Encoding
1500 pairs of the internal CMaps to look for in the font file, in the
1501 desired order (highest priority first). If one list (comma and/or
1502 space-separated pairs) is given, it is used for both Windows and
1503 non-Windows platforms (on which PDF::Builder is running, not the
1504 PDF reader's). Two lists, separated by a semicolon (;), may be given,
1505 with the first being used for a Windows platform and the second for
1506 non-Windows. The default list is "0/6 3/10 0/4 3/1 0/3; 0/6 0/4
1507 3/10 0/3 3/1". Finally, instead of a P/E list, a string "find_ms"
1508 may be given to tell it to simply call the Font::TTF "find_ms()"
1509 method to find a (preferably Windows) internal CMap. "cmaps" set to
1510 'find_ms' would emulate the old way of looking for CMaps. Symbol
1511 fonts (3/0) always use find_ms(). Otherwise, the new default lookup
1512 (if ".cmap" isn't used, see "usecmf") is to try to get a match with
1513 the default list for the appropriate OS. If none can be found,
1514 find_ms() is tried, and as a last resort the ".cmap" is used (if
1515 available), even if "usecmf" is not 1.
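
To make the "cmaps" string syntax concrete, here is a small sketch that
splits such a string into the list of Platform/Encoding pairs for one
platform. It only illustrates the syntax described above and is not
PDF::Builder's own parsing code; the helper name pick_cmap_list and its
$is_windows argument are hypothetical.

```perl
use strict;
use warnings;

# Pick the Platform/Encoding list that applies to the current platform
# from a "cmaps"-style string. Returns an array ref of "P/E" strings,
# or ['find_ms'] if that keyword was given instead of a list.
sub pick_cmap_list {
    my ($cmaps, $is_windows) = @_;
    return ['find_ms'] if $cmaps eq 'find_ms';

    my @lists = split /;/, $cmaps;
    # one list serves both platforms; with two, the Windows list is first
    my $chosen = (@lists > 1 && !$is_windows) ? $lists[1] : $lists[0];
    # pairs are separated by commas and/or spaces; drop empty fields
    my @pairs = grep { length } split /[,\s]+/, $chosen;
    return \@pairs;
}

my $default = '0/6 3/10 0/4 3/1 0/3; 0/6 0/4 3/10 0/3 3/1';
my $win     = pick_cmap_list($default, 1);
my $other   = pick_cmap_list($default, 0);
print "Windows:     @$win\n";
print "non-Windows: @$other\n";
```

A caller would then pass its own string to "ttfont" with the "cmaps"
option, for instance a single list such as "3/1, 0/3" to be used on all
platforms.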
1516
1517 CJK Fonts
1518
1519 Examples:
1520
1521 $font = $pdf->cjkfont('korean');
1522 $font = $pdf->cjkfont('traditional');
1523
1524 Valid %options are:
1525
1526 encode
1527 Changes the encoding of the font from its default.
1528
1529 Warning: Unlike "ttfont", the font file is not embedded in the output
1530 PDF file. This is evidently behavior left over from the early days of
1531 CJK fonts, where the "Cmap" and "Data" were always external files,
1532 rather than internal tables. If you need a CJK-using PDF file to embed
1533 the font, for portability, you can create a PDF using "cjkfont", and
1534 then use an external utility (e.g., "pdfcairo") to embed the font in
1535 the PDF. It may also be possible to use "ttfont" instead, to produce
1536 the PDF, provided you can deduce the correct font file name from
1537 examining the PDF file (e.g., on my Windows system, the "Ming" font
1538 would be "$font = $pdf->ttfont("C:/Program Files/Adobe/Acrobat
1539 DC/Resource/CIDFont/AdobeMingStd-Light.otf")"). Of course, the font
1540 file used would have to be ".ttf" or ".otf". It may act a little
1541 differently than "cjkfont" (due to a different Cmap), but you should be
1542 able to embed the font file into the PDF.
1543
1544 See also PDF::Builder::Resource::CIDFont::CJKFont
1545
1546 Synthetic Fonts
1547
1548 Warning: BaseEncoding is not set by default for these fonts, so text in
1549 the PDF isn't searchable (by the PDF reader) unless a ToUnicode CMap is
1550 included. PDF::Builder includes a ToUnicode CMap by default (unicodemap
1551 set to 1), but allows it to be disabled (for performance and file
1552 size reasons) by setting unicodemap to 0. Doing so will produce
1553 non-searchable text, which, besides being annoying to users, may prevent
1554 screen readers and other aids to disabled users from working correctly!
1555
1556 Examples:
1557
1558 $cf = $pdf->corefont('Times-Roman', encode => 'latin1');
1559 $sf = $pdf->synfont($cf, condense => 0.85); # condensed to 85% width
1560 $sfb = $pdf->synfont($cf, bold => 1); # embolden by 10em
1561 $sfi = $pdf->synfont($cf, oblique => -12); # italic at -12 degrees
1562
1563 Valid %options are:
1564
1565 condense
1566 Character width condense/expand factor (0.1-0.9 = condense, 1 =
1567 normal/default, 1.1+ = expand). It is the multiplier to apply to
1568 the width of each character.
1569
1570 oblique
1571 Italic angle (+/- degrees, default 0), sets skew of character box.
1572
1573 bold
1574 Emboldening factor (0.1+, bold = 1, heavy = 2, ...), additional
1575 thickness to draw outline of character (with a heavier line width)
1576 before filling.
1577
1578 space
1579 Additional character spacing in milliems (0-1000)
1580
1581 caps
1582 0 for normal text, 1 for small caps. Implemented by asking the
1583 font what the uppercased translation (single character) is for a
1584 given character, and outputting it at 80% height and 88% width
1585 (heavier vertical stems are better looking than a straight 80%
1586 scale).
1587
1588 Note that only lower case letters which appear in the "standard"
1589 font (plane 0 for core fonts and PS fonts) will be small-capped.
1590 This may include eszett (German sharp s), which becomes SS, and
1591 dotless i and j which become I and J respectively. There are many
1592 other accented Latin alphabet letters which may show up in planes 1
1593 and higher. Ligatures (e.g., ij and ffl) do not have uppercase
1594 equivalents, nor does a long s. If you have text which includes
1595 such characters, you may want to consider preprocessing it to
1596 replace them with Latin character expansions (e.g., i+j and f+f+l)
1597 before small-capping.
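
The preprocessing suggested above could look something like the
following sketch. It is an assumption about how one might do it, not
something "synfont" provides: the helper name expand_for_small_caps is
hypothetical, and the table covers only a handful of Latin ligatures and
the long s.

```perl
use strict;
use warnings;

# Characters with no single uppercase equivalent, and the plain Latin
# letters to substitute before small-capping.
my %expand = (
    "\x{0133}" => 'ij',    # LATIN SMALL LIGATURE IJ
    "\x{FB00}" => 'ff',    # LATIN SMALL LIGATURE FF
    "\x{FB01}" => 'fi',    # LATIN SMALL LIGATURE FI
    "\x{FB02}" => 'fl',    # LATIN SMALL LIGATURE FL
    "\x{FB03}" => 'ffi',   # LATIN SMALL LIGATURE FFI
    "\x{FB04}" => 'ffl',   # LATIN SMALL LIGATURE FFL
    "\x{017F}" => 's',     # LATIN SMALL LETTER LONG S
);
my $class = join '', keys %expand;

# Replace each listed character with its expansion; other text is
# returned unchanged.
sub expand_for_small_caps {
    my ($text) = @_;
    $text =~ s/([$class])/$expand{$1}/g;
    return $text;
}

print expand_for_small_caps("wa\x{FB04}e"), "\n";
print expand_for_small_caps("\x{0133}sselmeer"), "\n";
```

The expanded string would then be written with the small caps font in
the usual way.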
1598
1599 Note that CJK fonts (created with the "cjkfont" method) do not work
1600 properly with "synfont". This is due to a different internal structure
1601 of the CJK fonts, as compared to corefont, ttfont, and psfont base
1602 fonts. If you require a synthesized (modified) CJK font, you might try
1603 finding the TTF or OTF original, using "ttfont" to create the base font,
1604 and running "synfont" against that, in the manner described for
1605 embedding "CJK Fonts".
1606
1607 See also PDF::Builder::Resource::Font::SynFont
1608
1609 IMAGE METHODS
1610 This is additional information on enhanced libraries available for TIFF
1611 and PNG images. See specific information listings for GD, GIF, JPEG,
1612 and PNM image formats. In addition, see "examples/Content.pl" for an
1613 example of placing an image on a page, as well as using it in a "Form".
1614
1615 Why is my image flipped or rotated?
1616
1617 Something not uncommonly seen when using JPEG photos in a PDF is that
1618 the images will be rotated and/or mirrored (flipped). This may happen
1619 when using TIFF images too. What happens is that the camera stores an
1620 image just as it comes off the CCD sensor, regardless of the camera
1621 orientation, and does not rotate it to the correct orientation! It does
1622 store a separate "orientation" flag to suggest how the image might be
1623 corrected, but not all image processing obeys this flag (PDF::Builder
1624 does not). For example, if you take a "portrait" (tall) photo of a
1625 tree (with the phone held vertically), and then use it in a PDF, the
1626 tree may appear to have been cut down (it appears in landscape mode)!
1627
1628 I have found some code that should allow the "image_jpeg" or "image"
1629 routine to auto-rotate to (supposedly) the correct orientation, by
1630 looking for the Exif metadata "Orientation" tag in the file. However,
1631 three problems arise: 1) if a photo has been edited, and rotated or
1632 flipped in the process, there is no guarantee that the Orientation tag
1633 has been corrected. 2) more than one Orientation tag may exist (e.g.,
1634 in the binary APP1/Exif header, and in XML data), and they may not
1635 agree with each other -- which should be used? 3) the code would need
1636 to uncompress the raster data, swap and/or transpose rows and/or
1637 columns, and recompress the raster data for inclusion into the PDF.
1638 This is costly and error-prone. In any case, the user would need to be
1639 able to override any auto-rotate function.
1640
1641 For the time being, PDF::Builder will simply leave it up to the user of
1642 the library to take care of rotating and/or flipping an image which
1643 displays incorrectly. It is possible that we will consider adding some
1644 sort of query or warning that the image appears to not be "normally"
1645 oriented (Orientation value 1 or "Top-left"), according to the
1646 Orientation flag. You can consider either (re-)saving the photo in an
1647 editor such as PhotoShop or GIMP, or using PDF::Builder code similar to
1648 the following (for images rotated 180 degrees):
1649
1650 $pW = 612; $pH = 792; # page dimensions (US Letter)
1651 my $img = $pdf->image_jpeg("AliceLake.jpeg");
1652 # raw size WxH 4032x3024, scaled down to 504x378
1653 $sW = 4032/8; $sH = 3024/8;
1654 # intent is to center on US Letter sized page (LL at 54,207)
1655 # Orientation flag on this image is 3 (rotated 180 degrees).
1656 # if naively displayed (just $gfx->image call), it will be upside down
1657
1658 $gfx->save();
1659
1660 ## method 0: simple display, is rotated 180 degrees!
1661 #$gfx->image($img, ($pW-$sW)/2,($pH-$sH)/2, $sW,$sH);
1662
1663 ## method 1: translate, then rotate
1664 #$gfx->translate($pW,$pH); # to new origin (media UR corner)
1665 #$gfx->rotate(180); # rotate around new origin
1666 #$gfx->image($img, ($pW-$sW)/2,($pH-$sH)/2, $sW,$sH);
1667 # image's UR corner, not LL
1668
1669 # method 2: rotate, then translate
1670 $gfx->rotate(180); # rotate around current origin
1671 $gfx->translate(-$sW,-$sH); # translate in rotated coordinates
1672 $gfx->image($img, -($pW-$sW)/2,-($pH-$sH)/2, $sW,$sH);
1673 # image's UR corner, not LL
1674
1675 ## method 3: flip (mirror) twice
1676 #$scale = 1; # not rescaling here
1677 #$size_page = $pH/$scale;
1678 #$invScale = 1.0/$scale;
1679 #$gfx->add("-$invScale 0 0 -$invScale 0 $size_page cm");
1680 #$gfx->image($img, -($pW-$sW)/2-$sW,($pH-$sH)/2, $sW,$sH);
1681
1682 $gfx->restore();
1683
1684 If your image is also mirrored (flipped about an axis), simple rotation
1685 will not suffice. You could do something with a reversal of the
1686 coordinate system, as in "method 3" above (see "Advanced Methods" in
1687 PDF::Builder::Content). To mirror only left/right, the second $invScale
1688 would be positive; to mirror only top/bottom, the first would be
1689 positive. If all else fails, you could save a mirrored copy in a photo
1690 editor. 90 or 270 degree rotations will require a "rotate" call,
1691 possibly with "cm" usage to reverse mirroring. Incidentally, do not
1692 confuse this issue with the coordinate flipping performed by some
1693 Chrome browsers when printing a page to PDF.
1694
1695 Note that TIFF images may have the same rotation/mirroring problems as
1696 JPEG, which is not surprising, as the Exif format was lifted from TIFF
1697 for use in JPEG. The cure will be similar to JPEG's.
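
For reference, the eight Exif Orientation values can be tabulated as the
correction needed to display the stored image upright. The sketch below
is only a lookup table; it does not read the tag from a file (how the
value is obtained is left open), and PDF::Builder does none of this
itself. The helper name describe_fix is hypothetical.

```perl
use strict;
use warnings;

# Exif Orientation value => correction needed for upright display.
# rotate_cw is in degrees clockwise; mirror is a left/right flip that
# is applied before the rotation.
my %exif_orientation = (
    1 => { rotate_cw =>   0, mirror => 0 },  # normal ("Top-left")
    2 => { rotate_cw =>   0, mirror => 1 },
    3 => { rotate_cw => 180, mirror => 0 },  # upside down
    4 => { rotate_cw => 180, mirror => 1 },  # same as a top/bottom flip
    5 => { rotate_cw => 270, mirror => 1 },
    6 => { rotate_cw =>  90, mirror => 0 },
    7 => { rotate_cw =>  90, mirror => 1 },
    8 => { rotate_cw => 270, mirror => 0 },
);

# Describe in words what needs to be done for a given Orientation value.
sub describe_fix {
    my ($value) = @_;
    my $fix = $exif_orientation{$value};
    return 'unknown Orientation value' unless defined $fix;
    my @steps;
    push @steps, 'mirror left/right' if $fix->{'mirror'};
    push @steps, "rotate $fix->{'rotate_cw'} degrees clockwise"
        if $fix->{'rotate_cw'};
    return @steps ? join(', then ', @steps) : 'no correction needed';
}

print "$_: ", describe_fix($_), "\n" for 1 .. 8;
```

Value 3 corresponds to the 180 degree example above; values 2, 4, 5,
and 7 involve mirroring, for which a simple rotation will not suffice.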
1698
1699 TIFF Images
1700
1701 Note that the Graphics::TIFF support library does not currently permit
1702 a filehandle for $file.
1703
1704 PDF::Builder will use the Graphics::TIFF support library for TIFF
1705 functions, if it is available, unless explicitly told not to. Your code
1706 can test whether Graphics::TIFF is available by examining
1707 "$tiff->usesLib()" or "$pdf->LA_GT()".
1708
1709 = -1
1710 Graphics::TIFF is installed, but your code has specified "nouseGT",
1711 to not use it. The old, pure Perl, code (buggy!) will be used
1712 instead, as if Graphics::TIFF was not installed.
1713
1714 = 0 Graphics::TIFF is not installed. Not all systems are able to
1715 successfully install this package, as it requires libtiff.a.
1716
1717 = 1 Graphics::TIFF is installed and is being used.
1718
1719 Options:
1720
1721 nouseGT => 1
1722 Do not use the Graphics::TIFF library, even if it's available.
1723 Normally you would want to use this library, but there may be cases
1724 where you don't, such as when you want to use a file handle instead
1725 of a name.
1726
1727 silent => 1
1728 Do not give the message that Graphics::TIFF is not installed. This
1729 message will be given only once, but you may want to suppress it,
1730 such as during t-tests.
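
The three status values can be pictured with a sketch like the one
below. It is a generic illustration of the -1, 0, and 1 cases described
above, written in the same "eval" and "require" style used elsewhere in
this document; it is not the code behind "usesLib()" or "LA_GT()", and
the helper name library_status is hypothetical.

```perl
use strict;
use warnings;

# Report whether an optional module would be used:
#   0  the module is not installed (or cannot be loaded)
#  -1  the module is installed, but the caller asked not to use it
#   1  the module is installed and will be used
sub library_status {
    my ($module, $nouse) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Graphics/TIFF.pm, etc.
    my $installed = eval { require $file; 1 } ? 1 : 0;
    return  0 unless $installed;
    return -1 if $nouse;
    return  1;
}

my $status = library_status('Graphics::TIFF', 0);
print "Graphics::TIFF status: $status\n";
```

The same pattern applies to Image::PNG::Libpng for PNG images.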
1731
1732 PNG Images
1733
1734 PDF::Builder will use the Image::PNG::Libpng support library for PNG
1735 functions, if it is available, unless explicitly told not to. Your code
1736 can test whether Image::PNG::Libpng is available by examining
1737 "$png->usesLib()" or "$pdf->LA_IPL()".
1738
1739 = -1
1740 Image::PNG::Libpng is installed, but your code has specified
1741 "nouseIPL", to not use it. The old, pure Perl, code (slower and
1742 less capable) will be used instead, as if Image::PNG::Libpng was
1743 not installed.
1744
1745 = 0 Image::PNG::Libpng is not installed. Not all systems are able to
1746 successfully install this package, as it requires libpng.a.
1747
1748 = 1 Image::PNG::Libpng is installed and is being used.
1749
1750 Options:
1751
1752 nouseIPL => 1
1753 Do not use the Image::PNG::Libpng library, even if it's available.
1754 Normally you would want to use this library, when available, but
1755 there may be cases where you don't.
1756
1757 silent => 1
1758 Do not give the message that Image::PNG::Libpng is not installed.
1759 This message will be given only once, but you may want to suppress
1760 it, such as during t-tests.
1761
1762 notrans => 1
1763 No transparency -- ignore tRNS chunk if provided, ignore Alpha
1764 channel if provided.
1765
1766 USING SHAPER (HarfBuzz::Shaper library)
1767 # if HarfBuzz::Shaper is not installed, either bail out, or try to
1768 # use regular TTF calls instead
1769 my $rc;
1770 $rc = eval {
1771 require HarfBuzz::Shaper;
1772 1;
1773 };
1774 if (!defined $rc) { $rc = 0; }
1775 if ($rc == 0) {
1776 # bail out in some manner
1777 } else {
1778 # can use Shaper
1779 }
1780
1781 my $fontfile = '/WINDOWS/Fonts/times.ttf'; # used by both Shaper and textHS
1782 my $fontsize = 15; # used by both Shaper and textHS
1783 my $font = $pdf->ttfont($fontfile);
1784 $text->font($font, $fontsize);
1785
1786 my $hb = HarfBuzz::Shaper->new(); # only need to set up once
1787 my %settings; # for textHS(), not Shaper
1788 $settings{'dump'} = 1; # see the diagnostics
1789 $settings{'script'} = 'Latn';
1790 $settings{'dir'} = 'L'; # LTR
1791 $settings{'features'} = []; # required (array reference, may be empty)
1792
1793 # -- set language (override automatic setting)
1794 #$settings{'language'} = 'en';
1795 #$hb->set_language( 'en_US' );
1796 # -- turn OFF ligatures
1797 #push @{ $settings{'features'} }, 'liga';
1798 #$hb->add_features( 'liga' );
1799 # -- turn OFF kerning
1800 #push @{ $settings{'features'} }, 'kern';
1801 #$hb->add_features( 'kern' );
1802 $hb->set_font($fontfile);
1803 $hb->set_size($fontsize);
1804 $hb->set_text("Let's eat waffles in the field for brunch.");
1805 # expect ffl and fi ligatures, and perhaps some kerning
1806
1807 my $info = $hb->shaper();
1808 $text->textHS($info, \%settings); # strikethru, underline allowed
1809
1810 The package HarfBuzz::Shaper may be optionally installed in order to
1811 use the text-shaping capabilities of the HarfBuzz library. These
1812 include kerning and ligatures in Western scripts (such as the Latin
1813 alphabet). More complex scripts can be handled, such as the Arabic family
1814 and Indic scripts, where multiple forms of a character may be
1815 automatically selected, characters may be reordered, and other
1816 modifications made. The examples/HarfBuzz.pl script gives some examples
1817 of what may be done.
1818
1819 Keep in mind that HarfBuzz works only with TrueType (.ttf) and OpenType
1820 (.otf) font files. It will not work with PostScript (Type1), core,
1821 bitmapped, or CJK fonts. Not all .ttf fonts have the instructions
1822 necessary to guide HarfBuzz, but most proper .otf fonts do. In other
1823 words, there are no guarantees that a particular font file will work
1824 with Shaper!
1825
1826 The basic idea is to break up text into "chunks" which are of the same
1827 script (alphabet), language, direction, font face, font size, and
1828 variant (italic, bold, etc.). These could range from a single character
1829 to paragraph-length strings of text. These are fed to HarfBuzz::Shaper,
1830 along with flags, the font file to be used, and other supporting
1831 information, to create an array of output glyphs. Each element is a
1832 hash describing the glyph to be output, including its name (if
1833 available), its glyph ID (number) in the selected font, its x and y
1834 displacement (usually 0), and its "advance" x and y values, all in
1835 points. For horizontal languages (LTR and RTL), the y advance is
1836 normally 0 and the x advance is the font's character width, less any
1837 kerning amount.
1838
1839 Shaper will attempt to figure out the script used and the text
1840 direction, based on the Unicode range, and to make a reasonable guess at the
1841 language used. The language can be overridden, but currently the script
1842 and text direction cannot be overridden.
1843
1844 An important note: the number of glyphs (array elements) may not be
1845 equal to the number of Unicode points (characters) given in the chunk's
1846 text string! Sometimes a character will be decomposed into several
1847 pieces (multiple glyphs); sometimes multiple characters may be combined
1848 into a single ligature glyph; and characters may be reordered
1849 (especially in Indic and Southeast Asian languages). As well, for
1850 Right-to-Left (bidirectional) scripts such as Hebrew and Arabic
1851 families, the text is output in Left-to-Right order (reversed from the
1852 input).
1853
1854 With due care, a Shaper array can be manipulated in code. The elements
1855 are more or less independent of each other, so elements can be
1856 modified, rearranged, inserted, or deleted. You might adjust the
1857 position of a glyph with 'dx' and 'dy' hash elements. The 'ax' value
1858 should be left alone, so that the wrong kerning isn't calculated, but
1859 you might need to adjust the "advance x" value by means of one of the
1860 following:
1861
1862 axs is a value to be substituted for 'ax' (points)
1863 axsp is a substituted value (percentage) of the original 'ax'
1864 axr reduces 'ax' by the value (points). If negative, increase 'ax'
1865 axrp reduces 'ax' by the given percentage. Again, negative increases
1866 'ax'
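
The arithmetic behind these four adjustments can be sketched as follows.
This is an illustration of the descriptions above, not the code used by
"textHS()". In particular, the order in which the keys are checked when
more than one is present is an assumption, and the helper name
effective_ax is hypothetical.

```perl
use strict;
use warnings;

# Compute the advance that results from one glyph element, leaving the
# original 'ax' untouched.
sub effective_ax {
    my ($glyph) = @_;
    my $ax = $glyph->{'ax'};
    # axs: substitute an absolute value (points)
    return $glyph->{'axs'} if defined $glyph->{'axs'};
    # axsp: substitute a percentage of the original 'ax'
    return $ax * $glyph->{'axsp'} / 100 if defined $glyph->{'axsp'};
    # axr: reduce by points (a negative value increases)
    return $ax - $glyph->{'axr'} if defined $glyph->{'axr'};
    # axrp: reduce by a percentage (a negative value increases)
    return $ax - $ax * $glyph->{'axrp'} / 100 if defined $glyph->{'axrp'};
    return $ax;
}

my @samples = (
    { ax => 10 },
    { ax => 10, axs  => 7 },
    { ax => 10, axsp => 50 },
    { ax => 10, axr  => -2 },
    { ax => 10, axrp => 10 },
);
print effective_ax($_), "\n" for @samples;
```

In real use, the extra key would be added to an element of the array
returned by the Shaper before it is passed to "textHS()".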
1867
1868 Caution: a given character's glyph ID is not necessarily going to be
1869 the same between any two fonts! For example, an ASCII space (U+0020)
1870 might be "<0001>" in one font, and "<0003>" in another font (even one
1871 closely related!). A U+00A0 required blank (non-breaking space) may be
1872 output as a regular ASCII space U+0020. Take care if you need to find a
1873 particular glyph in the array, especially if the number of elements
1874 doesn't match. Consider making a text string of "marker" characters
1875 (space, nbsp, hyphen, soft hyphen, etc.) and processing it through
1876 HarfBuzz::Shaper to get the corresponding glyph numbers. You may have
1877 to count spaces, say, to see where you could break a glyph array to fit
1878 a line.
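
Counting spaces to find a break point might be sketched like this. It
assumes a Left-to-Right chunk, that each element carries its glyph ID
under the key 'g' (an assumption about the Shaper's output) along with
'ax', and that the space glyph ID for the font has already been found
with a "marker" string as suggested above. The helper name
split_at_space is hypothetical; nothing like it is provided by
PDF::Builder.

```perl
use strict;
use warnings;

# Split a glyph array at the last space glyph seen by the time the
# running width first exceeds $max_width. The space itself is dropped.
# Returns two array refs: glyphs for this line, and the remainder.
# A word wider than the line is kept whole (the line will be overlong).
sub split_at_space {
    my ($glyphs, $space_gid, $max_width) = @_;
    my ($width, $last_space) = (0, -1);
    for my $i (0 .. $#$glyphs) {
        my $glyph = $glyphs->[$i];
        $last_space = $i if $glyph->{'g'} == $space_gid;
        $width += $glyph->{'ax'};
        if ($width > $max_width && $last_space >= 0) {
            my @line = @{$glyphs}[0 .. $last_space - 1];
            my @rest = @{$glyphs}[$last_space + 1 .. $#$glyphs];
            return (\@line, \@rest);
        }
    }
    return ([ @$glyphs ], []);   # everything fits on one line
}

# Made-up glyph IDs standing in for "ab cd ef"; 3 is the space glyph.
my @glyphs = map { { g => $_, ax => 5 } } (10, 11, 3, 12, 13, 3, 14, 15);
my ($line, $rest) = split_at_space(\@glyphs, 3, 27);
printf "line has %d glyphs, %d left over\n", scalar @$line, scalar @$rest;
```

Each piece could then be given to "textHS()" as its own line. Scripts in
which glyphs are reordered or output in reverse order would need more
care than this.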
1879
1880 The "advancewidthHS()" method uses the same inputs as does "textHS()".
1881 Like "advancewidth()", it returns the chunk length in points. Unlike
1882 "advancewidth()", you cannot override the glyph array's font, font
1883 size, etc.
1884
1885 Once you have your (possibly modified) array of glyphs, you feed it to
1886 the "textHS()" method to render it to the page. Remember that this
1887 method handles only a single line of text; it does not do line
1888 splitting or fitting -- that you currently need to do manually. For
1889 Western scripts (e.g., Latin), that might not be too difficult, but for
1890 other scripts that involve extensive modification of the raw
1891 characters, it may be quite difficult to split words, but you still may
1892 be able to split at inter-word spaces.
1893
1894 A useful, but not exhaustive, set of functions is supported by
1895 "textHS()". Support includes direction setting (top-to-bottom and
1896 bottom-to-top directions, e.g., for Far Eastern languages in
1897 traditional orientation), and explicit script names and language
1898 (depending on what support HarfBuzz itself gives). Not yet supported
1899 are features such as discretionary ligatures and manual selection of
1900 glyphs (e.g., swashes and alternate forms).
1901
1902 Currently, "textHS()" can only handle a single text string. We are
1903 looking at how fitting to a line length (splitting up an array) could
1904 be done, as well as how words might be split on hard and soft hyphens.
1905 At some point, full paragraph and page shaping could be possible.
1906
1907
1908
1909perl v5.36.0 2022-09-13 PDF::Builder::Docs(3)