1Spreadsheet::ParseExcelU(s3e)r Contributed Perl DocumentaStpiroenadsheet::ParseExcel(3)
2
3
4

NAME

6       Spreadsheet::ParseExcel - Extract information from an Excel file.
7

SYNOPSIS

9           #!/usr/bin/perl -w
10
11           use strict;
12           use Spreadsheet::ParseExcel;
13
14           my $parser   = Spreadsheet::ParseExcel->new();
15           my $workbook = $parser->Parse('Book1.xls');
16
17           for my $worksheet ( $workbook->worksheets() ) {
18
19               my ( $row_min, $row_max ) = $worksheet->row_range();
20               my ( $col_min, $col_max ) = $worksheet->col_range();
21
22               for my $row ( $row_min .. $row_max ) {
23                   for my $col ( $col_min .. $col_max ) {
24
25                       my $cell = $worksheet->get_cell( $row, $col );
26                       next unless $cell;
27
28                       print "Row, Col    = ($row, $col)\n";
29                       print "Value       = ", $cell->value(),       "\n";
30                       print "Unformatted = ", $cell->unformatted(), "\n";
31                       print "\n";
32                   }
33               }
34           }
35

DESCRIPTION

37       The Spreadsheet::ParseExcel module can be used to read information from
38       an Excel 95-2003 file.
39

Parser

41   new()
42       The "new()" method is used to create a new "Spreadsheet::ParseExcel"
43       parser object.
44
45           my $parser = Spreadsheet::ParseExcel->new();
46
47       As an advanced feature it is also possible to pass a call-back handler
48       to the parser to control the parsing of the spreadsheet.
49
50           $parser = Spreadsheet::ParseExcel->new(
51                               [ CellHandler => \&cell_handler,
52                                 NotSetCell  => 1,
53                               ]);
54
55       The call-back can be used to ignore certain cells or to reduce memory
56       usage. See the section "Reducing the memory usage of
57       Spreadsheet::ParseExcel" for more information.
58
59   Parse($filename, [$formatter])
60       The Parser "Parse()" method return a "Workbook" object.
61
62           my $parser   = Spreadsheet::ParseExcel->new();
63           my $workbook = $parser->Parse('Book1.xls');
64
65       If an error occurs "Parse()" returns "undef".
66
67       The $filename parameter is generally the file to be parsed. However, it
68       can also be a filehandle or a scalar reference.
69
70       The optional $formatter array ref can be an reference to a "Formatter
71       Class" to format the value of cells.
72
73   ColorIdxToRGB()
74       The "ColorIdxToRGB()" method returns a RGB string corresponding to a
75       specified color index. The RGB string has 6 characters, representing
76       the RGB hex value, for example 'FF0000'. The color index is generally
77       obtained from a FONT object.
78
79           $RGB = $parser->ColorIdxToRGB($color_index);
80

Workbook

82       A "Spreadsheet::ParseExcel::Workbook" is created via the
83       "Spreadsheet::ParseExcel" "Parse()" method:
84
85           my $parser   = Spreadsheet::ParseExcel->new();
86           my $workbook = $parser->Parse('Book1.xls');
87
88       The Workbook class has methods and properties that are outlined in the
89       following sections.
90

Workbook Methods

92   Parse()
93       As a syntactic shorthand you can create a Parser and Workbook object in
94       one go using the Workbook "Parse()" method. The following examples are
95       equivalent:
96
97           # Method 1
98           my $parser   = Spreadsheet::ParseExcel->new();
99           my $workbook = $parser->Parse('Book1.xls');
100
101           # Method 2
102           my $workbook = Spreadsheet::ParseExcel::Workbook->Parse('Book1.xls');
103
104   worksheets()
105       Returns an array of "Worksheet" objects. This was most commonly used to
106       iterate over the worksheets in a workbook:
107
108           for my $worksheet ( $workbook->worksheets() ) {
109               ...
110           }
111
112   Worksheet()
113       The "Worksheet()" method returns a single "Worksheet" object using
114       either its name or index:
115
116           $worksheet = $workbook->Worksheet('Sheet1');
117           $worksheet = $workbook->Worksheet(0);
118
119       Returns "undef" if the sheet name or index doesn't exist.
120

Workbook Properties

122       A workbook object exposes a number of properties as shown below:
123
124           $workbook->{Worksheet }->[$index]
125           $workbook->{File}
126           $workbook->{Author}
127           $workbook->{Flg1904}
128           $workbook->{Version}
129           $workbook->{SheetCount}
130           $workbook->{PrintArea }->[$index]
131           $workbook->{PrintTitle}->[$index]
132
133       These properties are generally only of interest to advanced users.
134       Casual users can skip this section.
135
136   $workbook->{Worksheet}->[$index]
137       Returns an array of "Worksheet" objects. This was most commonly used to
138       iterate over the worksheets in a workbook:
139
140           for my $worksheet (@{$workbook->{Worksheet}}) {
141               ...
142           }
143
144       It is now deprecated, use worksheets()) instead.
145
146   $workbook->{File}
147       Returns the name of the Excel file.
148
149   $workbook->{Author}
150       Returns the author of the Excel file.
151
152   $workbook->{Flg1904}
153       Returns true if the Excel file is using the 1904 date epoch instead of
154       the 1900 epoch. The Windows version of Excel generally uses the 1900
155       epoch while the Mac version of Excel generally uses the 1904 epoch.
156
157   $workbook->{Version}
158       Returns the version of the Excel file.
159
160   $workbook->{SheetCount}
161       Returns the numbers of "Worksheet" objects in the Workbook.
162
163   $workbook->{PrintArea}->[$index]
164       Returns an array ref of print areas. Each print area is as follows:
165
166           [ $start_row, $start_col, $end_row, $end_col]
167
168   $workbook->{PrintTitle}->[$index]
169       Returns an array ref  of print title hash refs. Each print title is as
170       follows:
171
172           {
173               Row    => [$start_row, $end_row],
174               Column => [$start_col, $end_col]
175           }
176

Worksheet

178       The "Spreadsheet::ParseExcel::Worksheet" class has the following
179       methods and properties.
180

Worksheet methods

182   get_cell($row, $col)
183       Return the "Cell" object at row $row and column $col if it is defined.
184       Otherwise returns undef.
185
186           my $cell = $worksheet->get_cell($row, $col);
187
188   row_range()
189       Return a two-element list "($min, $max)" containing the minimum and
190       maximum defined rows in the worksheet. If there is no row defined $max
191       is smaller than $min.
192
193           my ( $row_min, $row_max ) = $worksheet->row_range();
194
195   col_range()
196       Return a two-element list "($min, $max)" containing the minimum and
197       maximum of defined columns in the worksheet. If there is no column
198       defined $max is smaller than $min.
199
200           my ( $col_min, $col_max ) = $worksheet->col_range();
201

Worksheet Properties

203       A worksheet object exposes a number of properties as shown below:
204
205           $worksheet->{Name}
206           $worksheet->{DefRowHeight}
207           $worksheet->{DefColWidth}
208           $worksheet->{RowHeight}->[$row]
209           $worksheet->{ColWidth}->[$col]
210           $worksheet->{Cells}->[$row]->[$col]
211           $worksheet->{Landscape}
212           $worksheet->{Scale}
213           $worksheet->{PageFit}
214           $worksheet->{FitWidth}
215           $worksheet->{FitHeight}
216           $worksheet->{PaperSize}
217           $worksheet->{PageStart}
218           $worksheet->{UsePage}
219           $worksheet->{$margin}
220           $worksheet->{HCenter}
221           $worksheet->{VCenter}
222           $worksheet->{Header}
223           $worksheet->{Footer}
224           $worksheet->{PrintGrid}
225           $worksheet->{PrintHeaders}
226           $worksheet->{NoColor}
227           $worksheet->{Draft}
228           $worksheet->{Notes}
229           $worksheet->{LeftToRight}
230           $worksheet->{HPageBreak}
231           $worksheet->{VPageBreak}
232           $worksheet->{MergedArea}
233
234       These properties are generally only of interest to advanced users.
235       Casual users can skip this section.
236
237   $worksheet->{Name}
238       Returns the name of the worksheet such as 'Sheet1'.
239
240   $worksheet->{DefRowHeight}
241       Returns default height of the rows in the worksheet.
242
243   $worksheet->{DefColWidth}
244       Returns default width of columns in the worksheet.
245
246   $worksheet->{RowHeight}->[$row]
247       Returns an array of row heights.
248
249   $worksheet->{ColWidth}->[$col]
250       Returns array of column widths. A value of "undef" means the column has
251       the "DefColWidth".
252
253   $worksheet->{Cells}->[$row]->[$col]
254       Returns array of "Cell" objects in the worksheet.
255
256           my $cell = $worksheet->{Cells}->[$row]->[$col];
257
258   $worksheet->{Landscape}
259       Returns 0 for horizontal or 1 for vertical.
260
261   $worksheet->{Scale}
262       Returns the worksheet print scale.
263
264   $worksheet->{PageFit}
265       Returns true if the "fit to" print option is set.
266
267   $worksheet->{FitWidth}
268       Return the number of pages in the "fit to width" option.
269
270   $worksheet->{FitHeight}
271       Return the number of pages in the "fit to height" option.
272
273   $worksheet->{PaperSize}
274       Returns the printer paper size. The value corresponds to the formats
275       shown below:
276
277           Index   Paper format            Paper size
278           =====   ============            ==========
279             0     Printer default         -
280             1     Letter                  8 1/2 x 11 in
281             2     Letter Small            8 1/2 x 11 in
282             3     Tabloid                 11 x 17 in
283             4     Ledger                  17 x 11 in
284             5     Legal                   8 1/2 x 14 in
285             6     Statement               5 1/2 x 8 1/2 in
286             7     Executive               7 1/4 x 10 1/2 in
287             8     A3                      297 x 420 mm
288             9     A4                      210 x 297 mm
289            10     A4 Small                210 x 297 mm
290            11     A5                      148 x 210 mm
291            12     B4                      250 x 354 mm
292            13     B5                      182 x 257 mm
293            14     Folio                   8 1/2 x 13 in
294            15     Quarto                  215 x 275 mm
295            16     -                       10x14 in
296            17     -                       11x17 in
297            18     Note                    8 1/2 x 11 in
298            19     Envelope  9             3 7/8 x 8 7/8
299            20     Envelope 10             4 1/8 x 9 1/2
300            21     Envelope 11             4 1/2 x 10 3/8
301            22     Envelope 12             4 3/4 x 11
302            23     Envelope 14             5 x 11 1/2
303            24     C size sheet            -
304            25     D size sheet            -
305            26     E size sheet            -
306            27     Envelope DL             110 x 220 mm
307            28     Envelope C3             324 x 458 mm
308            29     Envelope C4             229 x 324 mm
309            30     Envelope C5             162 x 229 mm
310            31     Envelope C6             114 x 162 mm
311            32     Envelope C65            114 x 229 mm
312            33     Envelope B4             250 x 353 mm
313            34     Envelope B5             176 x 250 mm
314            35     Envelope B6             176 x 125 mm
315            36     Envelope                110 x 230 mm
316            37     Monarch                 3.875 x 7.5 in
317            38     Envelope                3 5/8 x 6 1/2 in
318            39     Fanfold                 14 7/8 x 11 in
319            40     German Std Fanfold      8 1/2 x 12 in
320            41     German Legal Fanfold    8 1/2 x 13 in
321            256    User defined
322
323       The two most common paper sizes are "1 = "US Letter"" and "9 = A4".
324
325   $worksheet->{PageStart}
326       Returns the page number where printing starts.
327
328   $worksheet->{UsePage}
329       Returns whether a user defined start page is in use.
330
331   $worksheet->{$margin}
332       Returns the worksheet margin for left, right, top, bottom, header and
333       footer where $margin has one of the following values:
334
335           LeftMargin
336           RightMargin
337           TopMargin
338           BottomMargin
339           HeaderMargin
340           FooterMargin
341
342   $worksheet->{HCenter}
343       Returns true if the "Center horizontally when Printing" option is set.
344
345   $worksheet->{VCenter}
346       Returns true if the "Center vertically when Printing" option is set.
347
348   $worksheet->{Header}
349       Returns the print header string. This can contain control codes for
350       alignment and font properties. Refer to the Excel on-line help on
351       headers and footers or to the Spreadsheet::WriteExcel documentation for
352       "set_header()".
353
354   $worksheet->{Footer}
355       Returns the print footer string. This can contain control codes for
356       alignment and font properties. Refer to the Excel on-line help on
357       headers and footers or to the Spreadsheet::WriteExcel documentation for
358       "set_header()".
359
360   $worksheet->{PrintGrid}
361       Returns true if Print with gridlines is set.
362
363   $worksheet->{PrintHeaders}
364       Returns true if Print with headings is set.
365
366   $worksheet->{NoColor}
367       Returns true if Print in black and white is set.
368
369   $worksheet->{Draft}
370       Returns true if the "draft mode" print option is set.
371
372   $worksheet->{Notes}
373       Returns true if print with notes option is set.
374
375   $worksheet->{LeftToRight}
376       Returns the print order for the worksheet. Returns 0 for "left to
377       right" printing and 1 for "top down" printing.
378
379   $worksheet->{HPageBreak}
380       Return an array ref of horizontal page breaks.
381
382   $worksheet->{VPageBreak}
383       Return an array ref of vertical page breaks.
384
385   $worksheet->{MergedArea}
386       Return an array ref of merged areas. Each merged area is:
387
388           [ $start_row, $start_col, $end_row, $end_col]
389

Cell

391       The "Spreadsheet::ParseExcel::Cell" class has the following methods and
392       properties.
393

Cell methods

395   value()
396       Formatted value of the cell.
397
398   unformatted()
399       Unformatted value of the cell.
400

Cell properties

402           $cell->{Val}
403           $cell->{Type}
404           $cell->{Code}
405           $cell->{Format}
406           $cell->{Merged}
407           $cell->{Rich}
408
409   $cell->{Val}
410       Returns the unformatted value of the cell. This is Deprecated, use
411       "$cell->unformatted()" instead.
412
413   $cell->{Type}
414       Returns the type of cell such as "Text", "Numeric" or "Date".
415
416       If the type was detected as "Numeric", and the Cell Format matches
417       "m{^[dmy][-\\/dmy]*$}", it will be treated as a "Date" type.
418
419   $cell->{Code}
420       Returns the character encoding of the cell. It is either  "undef",
421       "ucs2" or "_native_".
422
423       If "undef" then the character encoding seems to be "ascii".
424
425       If "_native_" it means that cell seems to be 'sjis' or something
426       similar.
427
428   $cell->{Format}
429       Returns the "Format" object for the cell.
430
431   $cell->{Merged}
432       Returns true if the cell is merged.
433
434   $cell->{Rich}
435       Returns an array ref of font information about each string block in a
436       "rich", i.e. multi-format, string. Each entry has the form:
437
438           [ $start_position>, $font_object ]
439
440       For more information refer to the example program "sample/dmpExR.pl".
441

Format

443       The "Spreadsheet::ParseExcel::Format" class has the following
444       properties:
445
446   Format properties
447           $format->{Font}
448           $format->{AlignH}
449           $format->{AlignV}
450           $format->{Indent}
451           $format->{Wrap}
452           $format->{Shrink}
453           $format->{Rotate}
454           $format->{JustLast}
455           $format->{ReadDir}
456           $format->{BdrStyle}
457           $format->{BdrColor}
458           $format->{BdrDiag}
459           $format->{Fill}
460           $format->{Lock}
461           $format->{Hidden}
462           $format->{Style}
463
464       These properties are generally only of interest to advanced users.
465       Casual users can skip this section.
466
467   $format->{Font}
468       Returns the "Font" object for the Format.
469
470   $format->{AlignH}
471       Returns the horizontal alignment of the format where the value has the
472       following meaning:
473
474           0 => No alignment
475           1 => Left
476           2 => Center
477           3 => Right
478           4 => Fill
479           5 => Justify
480           6 => Center across
481           7 => Distributed/Equal spaced
482
483   $format->{AlignV}
484       Returns the vertical alignment of the format where the value has the
485       following meaning:
486
487           0 => Top
488           1 => Center
489           2 => Bottom
490           3 => Justify
491           4 => Distributed/Equal spaced
492
493   $format->{Indent}
494       Returns the indent level of the "Left" horizontal alignment.
495
496   $format->{Wrap}
497       Returns true if textwrap is on.
498
499   $format->{Shrink}
500       Returns true if "Shrink to fit" is set for the format.
501
502   $format->{Rotate}
503       Returns the text rotation. In Excel97+, it returns the angle in degrees
504       of the text rotation.
505
506       In Excel95 or earlier it returns a value as follows:
507
508           0 => No rotation
509           1 => Top down
510           2 => 90 degrees anti-clockwise,
511           3 => 90 clockwise
512
513   $format->{JustLast}
514       Return true if the "justify last" property is set for the format.
515
516   $format->{ReadDir}
517       Returns the direction that the text is read from.
518
519   $format->{BdrStyle}
520       Returns an array ref of border styles as follows:
521
522           [ $left, $right, $top, $bottom ]
523
524   $format->{BdrColor}
525       Returns an array ref of border color indexes as follows:
526
527           [ $left, $right, $top, $bottom ]
528
529   $format->{BdrDiag}
530       Returns an array ref of diagonal border kind, style and color index as
531       follows:
532
533           [$kind, $style, $color ]
534
535       Where kind is:
536
537           0 => None
538           1 => Right-Down
539           2 => Right-Up
540           3 => Both
541
542   $format->{Fill}
543       Returns an array ref of fill pattern and color indexes as follows:
544
545           [ $pattern, $front_color, $back_color ]
546
547   $format->{Lock}
548       Returns true if the cell is locked.
549
550   $format->{Hidden}
551       Returns true if the cell is Hidden.
552
553   $format->{Style}
554       Returns true if the format is a Style format.
555

Font

557       Spreadsheet::ParseExcel::Font
558
559       Format class has these properties:
560

Font Properties

562           $font->{Name}
563           $font->{Bold}
564           $font->{Italic}
565           $font->{Height}
566           $font->{Underline}
567           $font->{UnderlineStyle}
568           $font->{Color}
569           $font->{Strikeout}
570           $font->{Super}
571
572   $font->{Name}
573       Returns the name of the font, for example 'Arial'.
574
575   $font->{Bold}
576       Returns true if the font is bold.
577
578   $font->{Italic}
579       Returns true if the font is italic.
580
581   $font->{Height}
582       Returns the size (height) of the font.
583
584   $font->{Underline}
585       Returns true if the font in underlined.
586
587   $font->{UnderlineStyle}
588       Returns the style of an underlined font where the value has the
589       following meaning:
590
591            0 => None
592            1 => Single
593            2 => Double
594           33 => Single accounting
595           34 => Double accounting
596
597   $font->{Color}
598       Returns the color index for the font. The index can be converted to a
599       RGB string using the "ColorIdxToRGB()" Parser method.
600
601   $font->{Strikeout}
602       Returns true if the font has the strikeout property set.
603
604   $font->{Super}
605       Returns one of the following values if the superscript or subscript
606       property of the font is set:
607
608           0 => None
609           1 => Superscript
610           2 => Subscript
611

Formatter class

613       Spreadsheet::ParseExcel::Fmt*
614
615       Formatter class will convert cell data.
616
617       Spreadsheet::ParseExcel includes 2 formatter classes. "FmtDefault" and
618       "FmtJapanese". It is also possible to create a user defined formatting
619       class.
620
621       The formatter class "Spreadsheet::ParseExcel::Fmt*" should provide the
622       following functions:
623
624   ChkType($self, $is_numeric, $format_index)
625       Method to check the the type of data in the cell. Should return "Date",
626       "Numeric" or "Text". It is passed the following parameters:
627
628       $self
629           A scalar reference to the Formatter object.
630
631       $is_numeric
632           If true, the value seems to be number.
633
634       $format_index
635           The index number for the cell Format object.
636
637   TextFmt($self, $string_data, $string_encoding)
638       Converts the string data in the cell into the correct encoding.  It is
639       passed the following parameters:
640
641       $self
642           A scalar reference to the Formatter object.
643
644       $string_data
645           The original string/text data.
646
647       $string_encoding
648           The character encoding of original string/text.
649
650   ValFmt($self, $cell, $workbook)
651       Convert the original unformatted cell value into the appropriate
652       formatted value. For instance turn a number into a formatted date.  It
653       is passed the following parameters:
654
655       $self
656           A scalar reference to the Formatter object.
657
658       $cell
659           A scalar reference to the Cell object.
660
661       $workbook
662           A scalar reference to the Workbook object.
663
664   FmtString($self, $cell, $workbook)
665       Get the format string for the Cell.  It is passed the following
666       parameters:
667
668       $self
669           A scalar reference to the Formatter object.
670
671       $cell
672           A scalar reference to the Cell object.
673
674       $workbook
675           A scalar reference to the Workbook object.
676

Reducing the memory usage of Spreadsheet::ParseExcel

678       In some cases a "Spreadsheet::ParseExcel" application may consume a lot
679       of memory when processing a large Excel file and, as a result, may fail
680       to complete. The following explains why this can occur and how to
681       resolve it.
682
683       "Spreadsheet::ParseExcel" processes an Excel file in two stages. In the
684       first stage it extracts the Excel binary stream from the OLE container
685       file using "OLE::Storage_Lite". In the second stage it parses the
686       binary stream to read workbook, worksheet and cell data which it then
687       stores in memory. The majority of the memory usage is required for
688       storing cell data.
689
690       The reason for this is that as the Excel file is parsed and each cell
691       is encountered a cell handling function creates a relatively large
692       nested cell object that contains the cell value and all of the data
693       that relates to the cell formatting. For large files (a 10MB Excel file
694       on a 256MB system) this overhead can cause the system to grind to a
695       halt.
696
697       However, in a lot of cases when an Excel file is being processed the
698       only information that is required are the cell values. In these cases
699       it is possible to avoid most of the memory overhead by specifying your
700       own cell handling function and by telling Spreadsheet::ParseExcel not
701       to store the parsed cell data. This is achieved by passing a cell
702       handler function to "new()" when creating the parse object. Here is an
703       example.
704
705           #!/usr/bin/perl -w
706
707           use strict;
708           use Spreadsheet::ParseExcel;
709
710           my $parser = Spreadsheet::ParseExcel->new(
711               CellHandler => \&cell_handler,
712               NotSetCell  => 1
713           );
714
715           my $workbook = $parser->Parse('file.xls');
716
717           sub cell_handler {
718
719               my $workbook    = $_[0];
720               my $sheet_index = $_[1];
721               my $row         = $_[2];
722               my $col         = $_[3];
723               my $cell        = $_[4];
724
725               # Do something useful with the formatted cell value
726               print $cell->value(), "\n";
727
728           }
729
730       The user specified cell handler is passed as a code reference to
731       "new()" along with the parameter "NotSetCell" which tells
732       Spreadsheet::ParseExcel not to store the parsed cell. Note, you don't
733       have to iterate over the rows and columns, this happens automatically
734       as part of the parsing.
735
736       The cell handler is passed 5 arguments. The first, $workbook, is a
737       reference to the "Spreadsheet::ParseExcel::Workbook" object that
738       represent the parsed workbook. This can be used to access any of the
739       "Spreadsheet::ParseExcel::Workbook" methods, see "Workbook". The second
740       $sheet_index is the zero-based index of the worksheet being parsed. The
741       third and fourth, $row and $col, are the zero-based row and column
742       number of the cell. The fifth, $cell, is a reference to the
743       "Spreadsheet::ParseExcel::Cell" object. This is used to extract the
744       data from the cell. See "Cell" for more information.
745
746       This technique can be useful if you are writing an Excel to database
747       filter since you can put your DB calls in the cell handler.
748
749       If you don't want all of the data in the spreadsheet you can add some
750       control logic to the cell handler. For example we can extend the
751       previous example so that it only prints the first 10 rows of the first
752       two worksheets in the parsed workbook by adding some "if()" statements
753       to the cell handler:
754
755           #!/usr/bin/perl -w
756
757           use strict;
758           use Spreadsheet::ParseExcel;
759
760           my $parser = Spreadsheet::ParseExcel->new(
761               CellHandler => \&cell_handler,
762               NotSetCell  => 1
763           );
764
765           my $workbook = $parser->Parse('file.xls');
766
767           sub cell_handler {
768
769               my $workbook    = $_[0];
770               my $sheet_index = $_[1];
771               my $row         = $_[2];
772               my $col         = $_[3];
773               my $cell        = $_[4];
774
775               # Skip some worksheets and rows (inefficiently).
776               return if $sheet_index >= 3;
777               return if $row >= 10;
778
779               # Do something with the formatted cell value
780               print $cell->value(), "\n";
781
782           }
783
784       However, this still processes the entire workbook. If you wish to save
785       some additional processing time you can abort the parsing after you
786       have read the data that you want, using the workbook "ParseAbort"
787       method:
788
789           #!/usr/bin/perl -w
790
791           use strict;
792           use Spreadsheet::ParseExcel;
793
794           my $parser = Spreadsheet::ParseExcel->new(
795               CellHandler => \&cell_handler,
796               NotSetCell  => 1
797           );
798
799           my $workbook = $parser->Parse('file.xls');
800
801           sub cell_handler {
802
803               my $workbook    = $_[0];
804               my $sheet_index = $_[1];
805               my $row         = $_[2];
806               my $col         = $_[3];
807               my $cell        = $_[4];
808
809               # Skip some worksheets and rows (more efficiently).
810               if ( $sheet_index >= 1 and $row >= 10 ) {
811                   $workbook->ParseAbort(1);
812                   return;
813               }
814
815               # Do something with the formatted cell value
816               print $cell->value(), "\n";
817
818           }
819

KNOWN PROBLEMS

821       ·   Issues reported by users:
822           http://rt.cpan.org/Public/Dist/Display.html?Name=Spreadsheet-ParseExcel
823
824       ·   This module cannot read the values of formulas from files created
825           with Spreadsheet::WriteExcel unless the user specified the values
826           when creating the file (which is generally not the case). The
827           reason for this is that Spreadsheet::WriteExcel writes the formula
828           but not the formula result since it isn't in a position to
829           calculate arbitrary Excel formulas without access to Excel's
830           formula engine.
831
832       ·   If Excel has date fields where the specified format is equal to the
833           system-default for the short-date locale, Excel does not store the
834           format, but defaults to an internal format which is system
835           dependent. In these cases ParseExcel uses the date format
836           'yyyy-mm-dd'.
837

REPORTING A BUG

839       Bugs can be reported via rt.cpan.org. See the following for
840       instructions on bug reporting for Spreadsheet::ParseExcel
841
842       http://rt.cpan.org/Public/Dist/Display.html?Name=Spreadsheet-ParseExcel
843

SEE ALSO

845       ·   xls2csv by Ken Prows
846           (http://search.cpan.org/~ken/xls2csv-1.06/script/xls2csv).
847
848       ·   xls2csv and xlscat by H.Merijn Brand (these utilities are part of
849           Spreadsheet::Read, see below).
850
851       ·   excel2txt by Ken Youens-Clark,
852           (http://search.cpan.org/~kclark/excel2txt/excel2txt). This is an
853           excellent example of an Excel filter using Spreadsheet::ParseExcel.
854           It can produce CSV, Tab delimited, Html, XML and Yaml.
855
856       ·   XLSperl by Jon Allen
857           (http://search.cpan.org/~jonallen/XLSperl/bin/XLSperl). This
858           application allows you to use Perl "one-liners" with Microsoft
859           Excel files.
860
861       ·   Spreadsheet::XLSX
862           (http://search.cpan.org/~dmow/Spreadsheet-XLSX/lib/Spreadsheet/XLSX.pm)
863           by Dmitry Ovsyanko. A module with a similar interface to
864           Spreadsheet::ParseExcel for parsing Excel 2007 XLSX OpenXML files.
865
866       ·   Spreadsheet::Read
867           (http://search.cpan.org/~hmbrand/Spreadsheet-Read/Read.pm) by
868           H.Merijn Brand. A single interface for reading several different
869           spreadsheet formats.
870
871       ·   Spreadsheet::WriteExcel
872           (http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm).
873           A perl module for creating new Excel files.
874
875       ·   Spreadsheet::ParseExcel::SaveParser
876           (http://search.cpan.org/~jmcnamara/Spreadsheet-ParseExcel/lib/Spreadsheet/ParseExcel/SaveParser.pm).
877           This is a combination of Spreadsheet::ParseExcel and
878           Spreadsheet::WriteExcel and it allows you to "rewrite" an Excel
879           file. See the following example
880           (http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm#MODIFYING_AND_REWRITING_EXCEL_FILES).
881           It is part of the Spreadsheet::ParseExcel distro.
882
883       ·   Text::CSV_XS
884           (http://search.cpan.org/~hmbrand/Text-CSV_XS/CSV_XS.pm) by H.Merijn
885           Brand. A fast and rigorous module for reading and writing CSV data.
886           Don't consider rolling your own CSV handling, use this module
887           instead.
888

MAILING LIST

890       There is a Google group for discussing and asking questions about
891       Spreadsheet::ParseExcel. This is a good place to search to see if your
892       question has been asked before:
893       http://groups-beta.google.com/group/spreadsheet-parseexcel/
894

DONATIONS

896       If you'd care to donate to the Spreadsheet::ParseExcel project, you can
897       do so via PayPal: http://tinyurl.com/7ayes
898

TODO

900       ·   The current maintenance work is directed towards making the
901           documentation more useful, improving and simplifying the API, and
902           improving the maintainability of the code base. After that new
903           features will be added.
904
905       ·   Fix open bugs and documentation for SaveParser.
906
907       ·   Add Formula support, Hyperlink support, Named Range support.
908
909       ·   Improve Spreadsheet::ParseExcel::SaveParser compatibility with
910           Spreadsheet::WriteExcel.
911
912       ·   Improve Unicode and other encoding support. This will probably
913           require dropping support for perls prior to 5.8+.
914

ACKNOWLEDGEMENTS

916       From Kawai Takanori:
917
918       First of all, I would like to acknowledge the following valuable
919       programs and modules: XHTML, OLE::Storage and Spreadsheet::WriteExcel.
920
921       In no particular order: Yamaji Haruna, Simamoto Takesi, Noguchi Harumi,
922       Ikezawa Kazuhiro, Suwazono Shugo, Hirofumi Morisada, Michael Edwards,
923       Kim Namusk, Slaven Rezic, Grant Stevens, H.Merijn Brand and many many
924       people + Kawai Mikako.
925

DISCLAIMER OF WARRANTY

927       Because this software is licensed free of charge, there is no warranty
928       for the software, to the extent permitted by applicable law. Except
929       when otherwise stated in writing the copyright holders and/or other
930       parties provide the software "as is" without warranty of any kind,
931       either expressed or implied, including, but not limited to, the implied
932       warranties of merchantability and fitness for a particular purpose. The
933       entire risk as to the quality and performance of the software is with
934       you. Should the software prove defective, you assume the cost of all
935       necessary servicing, repair, or correction.
936
937       In no event unless required by applicable law or agreed to in writing
938       will any copyright holder, or any other party who may modify and/or
939       redistribute the software as permitted by the above licence, be liable
940       to you for damages, including any general, special, incidental, or
941       consequential damages arising out of the use or inability to use the
942       software (including but not limited to loss of data or data being
943       rendered inaccurate or losses sustained by you or third parties or a
944       failure of the software to operate with any other software), even if
945       such holder or other party has been advised of the possibility of such
946       damages.
947

LICENSE

949       Either the Perl Artistic Licence
950       http://dev.perl.org/licenses/artistic.html or the GPL
951       http://www.opensource.org/licenses/gpl-license.php
952

AUTHOR

954       Current maintainer 0.40+: John McNamara jmcnamara@cpan.org
955
956       Maintainer 0.27-0.33: Gabor Szabo szabgab@cpan.org
957
958       Original author: Kawai Takanori (Hippo2000) kwitknr@cpan.org
959
961       Copyright (c) 2009 John McNamara
962
963       Copyright (c) 2006-2008 Gabor Szabo
964
965       Copyright (c) 2000-2006 Kawai Takanori
966
967       All rights reserved. This is free software. You may distribute under
968       the terms of either the GNU General Public License or the Artistic
969       License.
970
971
972
973perl v5.12.0                      2010-05-06        Spreadsheet::ParseExcel(3)
Impressum