Convert::Binary::C(3)  User Contributed Perl Documentation  Convert::Binary::C(3)
2
3
4
NAME

6 Convert::Binary::C - Binary Data Conversion using C Types
7
SYNOPSIS

9 Simple
10
11 use Convert::Binary::C;
12
13 #---------------------------------------------
14 # Create a new object and parse embedded code
15 #---------------------------------------------
16 my $c = Convert::Binary::C->new->parse(<<ENDC);
17
18 enum Month { JAN, FEB, MAR, APR, MAY, JUN,
19 JUL, AUG, SEP, OCT, NOV, DEC };
20
21 struct Date {
22 int year;
23 enum Month month;
24 int day;
25 };
26
27 ENDC
28
29 #-----------------------------------------------
30 # Pack Perl data structure into a binary string
31 #-----------------------------------------------
32 my $date = { year => 2002, month => 'DEC', day => 24 };
33
34 my $packed = $c->pack('Date', $date);
35
36 Advanced
37
38 use Convert::Binary::C;
39 use Data::Dumper;
40
41 #---------------------
42 # Create a new object
43 #---------------------
44 my $c = new Convert::Binary::C ByteOrder => 'BigEndian';
45
46 #---------------------------------------------------
47 # Add include paths and global preprocessor defines
48 #---------------------------------------------------
49 $c->Include('/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include',
50 '/usr/include')
51 ->Define(qw( __USE_POSIX __USE_ISOC99=1 ));
52
53 #----------------------------------
54 # Parse the 'time.h' header file
55 #----------------------------------
56 $c->parse_file('time.h');
57
58 #---------------------------------------
59 # See which files the object depends on
60 #---------------------------------------
61 print Dumper([$c->dependencies]);
62
63 #-----------------------------------------------------------
64 # See if struct timespec is defined and dump its definition
65 #-----------------------------------------------------------
66 if ($c->def('struct timespec')) {
67 print Dumper($c->struct('timespec'));
68 }
69
70 #-------------------------------
71 # Create some binary dummy data
72 #-------------------------------
73 my $data = "binary_test_string";
74
75 #--------------------------------------------------------
76 # Unpack $data according to 'struct timespec' definition
77 #--------------------------------------------------------
78 if (length($data) >= $c->sizeof('timespec')) {
79 my $perl = $c->unpack('timespec', $data);
80 print Dumper($perl);
81 }
82
83 #--------------------------------------------------------
84 # See which member lies at offset 5 of 'struct timespec'
85 #--------------------------------------------------------
86 my $member = $c->member('timespec', 5);
87 print "member('timespec', 5) = '$member'\n";
88
DESCRIPTION

Convert::Binary::C is a preprocessor and parser for C type definitions.
It is highly configurable and supports arbitrarily complex data
structures. Its object-oriented interface has "pack" and "unpack"
methods that act as replacements for Perl's "pack" and "unpack". They
let you use C types, instead of a string representation of the data
structure, to convert binary data to and from Perl's complex data
structures.
97
98 Actually, what Convert::Binary::C does is not very different from what
99 a C compiler does, just that it doesn't compile the source code into an
100 object file or executable, but only parses the code and allows Perl to
101 use the enumerations, structs, unions and typedefs that have been
102 defined within your C source for binary data conversion, similar to
103 Perl's "pack" and "unpack".
104
105 Beyond that, the module offers a lot of convenience methods to retrieve
106 information about the C types that have been parsed.
107
108 Background and History
109
110 In late 2000 I wrote a real-time debugging interface for an embedded
111 medical device that allowed me to send out data from that device over
112 its integrated Ethernet adapter. The interface was "printf()"-like, so
113 you could easily send out strings or numbers. But you could also send
114 out what I called arbitrary data, which was intended for arbitrary
115 blocks of the device's memory.
116
117 Another part of this real-time debugger was a Perl application running
118 on my workstation that gathered all the messages that were sent out
119 from the embedded device. It printed all the strings and numbers, and
hex-dumped the arbitrary data. However, manually parsing a couple of
300-byte hex-dumps of a complex C structure is not only frustrating,
but also error-prone and time-consuming.
123
124 Using "unpack" to retrieve the contents of a C structure works fine for
125 small structures and if you don't have to deal with struct member
126 alignment. But otherwise, maintaining such code can be as awful as
127 deciphering hex-dumps.
128
129 As I didn't find anything to solve my problem on the CPAN, I wrote a
130 little module that translated simple C structs into "unpack" strings.
131 It worked, but it was slow. And since it couldn't deal with struct mem‐
132 ber alignment, I soon found myself adding padding bytes everywhere. So
133 again, I had to maintain two sources, and changing one of them forced
134 me to touch the other one.
135
136 All in all, this little module seemed to make my task a bit easier, but
137 it was far from being what I was thinking of:
138
139 · A module that could directly use the source I've been coding for the
140 embedded device without any modifications.
141
142 · A module that could be configured to match the properties of the dif‐
143 ferent compilers and target platforms I was using.
144
145 · A module that was fast enough to decode a great amount of binary data
146 even on my slow workstation.
147
148 I didn't know how to accomplish these tasks until I read something
149 about XS. At least, it seemed as if it could solve my performance prob‐
150 lems. However, writing a C parser in C isn't easier than it is in Perl.
151 But writing a C preprocessor from scratch is even worse.
152
Fortunately, after a few weeks of searching I found both a lean,
open-source C preprocessor library and a reusable YACC grammar for
ANSI C. That was the beginning of the development of
Convert::Binary::C in late 2001.
157
I have been using the module successfully in my embedded environment
since long before it appeared on CPAN. From my point of view, it is
exactly what I had in mind. It's fast, flexible, easy to use and
portable. It doesn't require external programs or other Perl modules.
162
163 About this document
164
This document describes how to use Convert::Binary::C. A lot of
different features are presented, and the example code sometimes uses
Perl's more advanced language elements. If your experience with Perl is
limited, it helps to know how to use Perl's documentation system.
170
171 To look up one of the manpages, use the "perldoc" command. For exam‐
172 ple,
173
174 perldoc perl
175
176 will show you Perl's main manpage. To look up a specific Perl function,
177 use "perldoc -f":
178
179 perldoc -f map
180
181 gives you more information about the "map" function. You can also
182 search the FAQ using "perldoc -q":
183
184 perldoc -q array
185
186 will give you everything you ever wanted to know about Perl arrays. But
187 now, let's go on with some real stuff!
188
189 Why use Convert::Binary::C?
190
191 Say you want to pack (or unpack) data according to the following C
192 structure:
193
194 struct foo {
195 char ary[3];
196 unsigned short baz;
197 int bar;
198 };
199
200 You could of course use Perl's "pack" and "unpack" functions:
201
202 @ary = (1, 2, 3);
203 $baz = 40000;
204 $bar = -4711;
205 $binary = pack 'c3 S i', @ary, $baz, $bar;
206
207 But this implies that the struct members are byte aligned. If they were
208 long aligned (which is the default for most compilers), you'd have to
209 write
210
211 $binary = pack 'c3 x S x2 i', @ary, $baz, $bar;
212
213 which doesn't really increase readability.
214
215 Now imagine that you need to pack the data for a completely different
216 architecture with different byte order. You would look into the "pack"
217 manpage again and perhaps come up with this:
218
219 $binary = pack 'c3 x n x2 N', @ary, $baz, $bar;
220
However, if you unpack $binary again with the same template, your
signed values will have turned into unsigned ones.
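
The alignment and signedness pitfalls described above can be reproduced
with core Perl alone. The following is a minimal sketch and not part of
the module. As an assumption made for portability, it uses the
fixed-width "l" format in place of "i", and "l>" (available from Perl
5.10 on) to read the value back as a signed big-endian integer:

```perl
use strict;
use warnings;

my @ary = (1, 2, 3);
my ($baz, $bar) = (40000, -4711);

# Byte-aligned versus long-aligned layout of struct foo.
# 'S' is always 16 bits and 'l' always 32 bits wide.
my $byte_aligned = pack 'c3 S l',      @ary, $baz, $bar;
my $long_aligned = pack 'c3 x S x2 l', @ary, $baz, $bar;
printf "byte aligned: %d bytes, long aligned: %d bytes\n",
    length($byte_aligned), length($long_aligned);

# Big-endian template: 'N' is unsigned, so the sign of $bar is lost.
my $big_endian = pack 'c3 x n x2 N', @ary, $baz, $bar;
my $unsigned   = (unpack 'c3 x n x2 N',  $big_endian)[4];

# 'l>' reads the same four bytes as a signed big-endian value.
my $signed     = (unpack 'c3 x n x2 l>', $big_endian)[4];

print "unpacked with N:  $unsigned\n";
print "unpacked with l>: $signed\n";
```

Every such fix is another detail that has to be kept in sync with the C
source by hand, which is the maintenance burden this section is about.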
223
All this can still be managed with Perl. But what if your structures
get more complex? What if you need to support different platforms, or
make changes to the structures? You'll not only have to change the C
source but also dozens of "pack" strings in your Perl code. This is no
fun. And Perl should be fun.
229
230 Now, wouldn't it be great if you could just read in the C source you've
231 already written and use all the types defined there for packing and
232 unpacking? That's what Convert::Binary::C does.
233
234 Creating a Convert::Binary::C object
235
236 To use Convert::Binary::C just say
237
238 use Convert::Binary::C;
239
240 to load the module. Its interface is completely object oriented, so it
241 doesn't export any functions.
242
243 Next, you need to create a new Convert::Binary::C object. This can be
244 done by either
245
246 $c = Convert::Binary::C->new;
247
248 or
249
250 $c = new Convert::Binary::C;
251
252 You can optionally pass configuration options to the constructor as
253 described in the next section.
254
255 Configuring the object
256
257 To configure a Convert::Binary::C object, you can either call the "con‐
258 figure" method or directly pass the configuration options to the con‐
259 structor. If you want to change byte order and alignment, you can use
260
261 $c->configure(ByteOrder => 'LittleEndian',
262 Alignment => 2);
263
264 or you can change the construction code to
265
266 $c = new Convert::Binary::C ByteOrder => 'LittleEndian',
267 Alignment => 2;
268
269 Either way, the object will now know that it should use little endian
270 (Intel) byte order and 2-byte struct member alignment for packing and
271 unpacking.
272
273 Alternatively, you can use the option names as names of methods to con‐
274 figure the object, like:
275
276 $c->ByteOrder('LittleEndian');
277
278 You can also retrieve information about the current configuration of a
279 Convert::Binary::C object. For details, see the section about the "con‐
280 figure" method.
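
As a rough illustration of how configuration affects the layout, the
sketch below reads options back through the option-named methods and
compares the size of "struct foo" under two alignments. The explicit
"ShortSize" and "IntSize" values are assumptions chosen so that the
result does not depend on the host platform:

```perl
use strict;
use warnings;
use Convert::Binary::C;

# Fix the type sizes explicitly (illustrative values, not defaults).
my $c = Convert::Binary::C->new(
    ByteOrder => 'LittleEndian',
    Alignment => 2,
    ShortSize => 2,
    IntSize   => 4,
);

# Called without an argument, an option method returns the current value.
my $byte_order = $c->ByteOrder;
my $alignment  = $c->Alignment;
print "ByteOrder=$byte_order Alignment=$alignment\n";

$c->parse('struct foo { char ary[3]; unsigned short baz; int bar; };');

# The same parsed type is laid out differently once Alignment changes.
my $size_align2 = $c->sizeof('foo');
$c->Alignment(4);
my $size_align4 = $c->sizeof('foo');

print "sizeof(foo): $size_align2 with Alignment 2, ",
      "$size_align4 with Alignment 4\n";
```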
281
282 Parsing C code
283
284 Convert::Binary::C allows two ways of parsing C source. Either by pars‐
285 ing external C header or C source files:
286
287 $c->parse_file('header.h');
288
289 Or by parsing C code embedded in your script:
290
291 $c->parse(<<'CCODE');
292 struct foo {
293 char ary[3];
294 unsigned short baz;
295 int bar;
296 };
297 CCODE
298
Now the object $c will know everything about "struct foo". The example
above uses a so-called here-document, which lets you easily embed
multi-line strings in your code. You can find more about here-documents
in perldata or perlop.
303
304 Since the "parse" and "parse_file" methods throw an exception when a
305 parse error occurs, you usually want to catch these in an "eval" block:
306
307 eval { $c->parse_file('header.h') };
308 if ($@) {
309 # handle error appropriately
310 }
311
312 Perl's special $@ variable will contain an empty string (which evalu‐
313 ates to a false value in boolean context) on success or an error string
314 on failure.
315
316 As another feature, "parse" and "parse_file" return a reference to
317 their object on success, just like "configure" does when you're config‐
318 uring the object. This will allow you to write constructs like this:
319
320 my $c = eval {
321 Convert::Binary::C->new(Include => ['/usr/include'])
322 ->parse_file('header.h')
323 };
324 if ($@) {
325 # handle error appropriately
326 }
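
To see the error handling in action, the following sketch feeds
deliberately incomplete C code to "parse". The struct name "broken" is
made up for illustration. Ending the "eval" block with a true value and
testing its result is a common Perl idiom that avoids relying on $@
alone:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new;

# The closing brace and semicolon are missing, so parse() is
# expected to throw an exception.
my $ok    = eval { $c->parse('struct broken { int a;'); 1 };
my $error = $ok ? '' : $@;

if ($ok) {
    print "parsed fine\n";
}
else {
    print "parse failed: $error\n";
}
```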
327
328 Packing and unpacking
329
Convert::Binary::C has two methods, "pack" and "unpack", that act
similarly to the Perl functions of the same name. To perform the
packing described in the example above, you could write:
333
334 $data = {
335 ary => [1, 2, 3],
336 baz => 40000,
337 bar => -4711,
338 };
339 $binary = $c->pack('foo', $data);
340
341 Unpacking will work exactly the same way, just that the "unpack" method
342 will take a byte string as its input and will return a reference to a
343 (possibly very complex) Perl data structure.
344
345 $binary = get_data_from_memory();
346 $data = $c->unpack('foo', $binary);
347
348 You can now easily access all of the values:
349
350 print "foo.ary[1] = $data->{ary}[1]\n";
351
352 Or you can even more conveniently use the Data::Dumper module:
353
354 use Data::Dumper;
355 print Dumper($data);
356
357 The output would look something like this:
358
359 $VAR1 = {
360 'bar' => -271,
361 'baz' => 5000,
362 'ary' => [
363 42,
364 48,
365 100
366 ]
367 };
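
The following sketch ties this back to the hand-written templates shown
earlier. It packs "struct foo" with the module, compares the result with
the big-endian "pack" template, and unpacks it again. The configuration
values (4-byte alignment, 2-byte short, 4-byte int) are assumptions that
match that template and are not necessarily the module's defaults on
your platform:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(
    ByteOrder => 'BigEndian',
    Alignment => 4,
    ShortSize => 2,
    IntSize   => 4,
);

$c->parse(<<'CCODE');
struct foo {
  char ary[3];
  unsigned short baz;
  int bar;
};
CCODE

my $data   = { ary => [1, 2, 3], baz => 40000, bar => -4711 };
my $binary = $c->pack('foo', $data);

# Hand-written equivalent for a long-aligned, big-endian target.
my $by_hand = pack 'c3 x n x2 N', 1, 2, 3, 40000, -4711;
print "same bytes as the hand-written template\n"
    if $binary eq $by_hand;

# Unlike the 'N' template, unpack() knows that 'bar' is signed.
my $copy = $c->unpack('foo', $binary);
print "bar = $copy->{bar}, baz = $copy->{baz}, ",
      "ary = @{ $copy->{ary} }\n";
```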
368
369 Preprocessor configuration
370
Convert::Binary::C uses Thomas Pornin's "ucpp" as an internal C
preprocessor. It is compliant with ISO C99, so you don't have to worry
about using even weird preprocessor constructs in your code.
374
375 If your C source contains includes or depends upon preprocessor
376 defines, you may need to configure the internal preprocessor. Use the
377 "Include" and "Define" configuration options for that:
378
379 $c->configure(Include => ['/usr/include',
380 '/home/mhx/include'],
381 Define => [qw( NDEBUG FOO=42 )]);
382
383 If your code uses system includes, it is most likely that you will need
384 to define the symbols that are usually defined by the compiler.
385
386 On some operating systems, the system includes require the preprocessor
387 to predefine a certain set of assertions. Assertions are supported by
388 "ucpp", and you can define them either in the source code using
389 "#assert" or as a property of the Convert::Binary::C object using
390 "Assert":
391
392 $c->configure(Assert => ['predicate(answer)']);
393
394 Information about defined macros can be retrieved from the preprocessor
395 as long as its configuration isn't changed. The preprocessor is implic‐
396 itly reset if you change one of the following configuration options:
397
398 Include
399 Define
400 Assert
401 HasCPPComments
402 HasMacroVAARGS
403
404 Supported pragma directives
405
406 Convert::Binary::C supports the "pack" pragma to locally override
407 struct member alignment. The supported syntax is as follows:
408
409 #pragma pack( ALIGN )
410 Sets the new alignment to ALIGN. If ALIGN is 0, resets the align‐
411 ment to its original value.
412
413 #pragma pack
414 Resets the alignment to its original value.
415
416 #pragma pack( push, ALIGN )
417 Saves the current alignment on a stack and sets the new alignment
418 to ALIGN. If ALIGN is 0, sets the alignment to the default align‐
419 ment.
420
421 #pragma pack( pop )
422 Restores the alignment to the last value saved on the stack.
423
424 /* Example assumes sizeof( short ) == 2, sizeof( long ) == 4. */
425
426 #pragma pack(1)
427
428 struct nopad {
429 char a; /* no padding bytes between 'a' and 'b' */
430 long b;
431 };
432
433 #pragma pack /* reset to "native" alignment */
434
435 #pragma pack( push, 2 )
436
437 struct pad {
438 char a; /* one padding byte between 'a' and 'b' */
439 long b;
440
441 #pragma pack( push, 1 )
442
443 struct {
444 char c; /* no padding between 'c' and 'd' */
445 short d;
446 } e; /* sizeof( e ) == 3 */
447
448 #pragma pack( pop ); /* back to pack( 2 ) */
449
450 long f; /* one padding byte between 'e' and 'f' */
451 };
452
453 #pragma pack( pop ); /* back to "native" */
454
The "pack" pragma as it is currently implemented only affects the
maximum struct member alignment. There are compilers that also allow
you to specify the minimum struct member alignment. This is not
supported by Convert::Binary::C.
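
The effect of the pragma can be checked from Perl with "sizeof" and
"offsetof". This is a small sketch. The struct names "pad2" and
"native" are invented for the example, and the configured sizes
(4-byte long, 4-byte default alignment) are assumptions:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(
    Alignment => 4,
    ShortSize => 2,
    LongSize  => 4,
);

$c->parse(<<'ENDC');
#pragma pack( push, 1 )
struct nopad { char a; long b; };
#pragma pack( pop )

#pragma pack( push, 2 )
struct pad2 { char a; long b; };
#pragma pack( pop )

struct native { char a; long b; };
ENDC

# Collect size and offset of 'b' for each variant.
my %layout;
for my $type (qw( nopad pad2 native )) {
    $layout{$type} = [ $c->sizeof($type), $c->offsetof($type, 'b') ];
    printf "%-6s sizeof=%d offsetof(b)=%d\n", $type, @{ $layout{$type} };
}
```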
459
460 Automatic configuration using "ccconfig"
461
462 As there are over 20 different configuration options, setting all of
463 them correctly can be a lengthy and tedious task.
464
The "ccconfig" script, which is bundled with this module, aims at
automatically determining the correct compiler configuration by testing
the compiler executable. It works for both native and cross compilers.
468
This section covers one of the fundamental features of
Convert::Binary::C: how type expressions, referred to as TYPEs in the
method reference, are handled by the module.
473
474 Many of the methods, namely "pack", "unpack", "sizeof", "typeof", "mem‐
475 ber", "offsetof", "def", "initializer" and "tag", are passed a TYPE to
476 operate on as their first argument.
477
478 Standard Types
479
480 These are trivial. Standard types are simply enum names, struct names,
481 union names, or typedefs. Almost every method that wants a TYPE will
482 accept a standard type.
483
484 For enums, structs and unions, the prefixes "enum", "struct" and
485 "union" are optional. However, if a typedef with the same name exists,
486 like in
487
488 struct foo {
489 int bar;
490 };
491
492 typedef int foo;
493
494 you will have to use the prefix to distinguish between the struct and
495 the typedef. Otherwise, a typedef is always given preference.
496
497 Basic Types
498
499 Basic types, or atomic types, are "int" or "char", for example. It's
500 possible to use these basic types without having parsed any code. You
501 can simply do
502
503 $c = new Convert::Binary::C;
504 $size = $c->sizeof('unsigned long');
505 $data = $c->pack('short int', 42);
506
507 Even though the above works fine, it is not possible to define more
508 complex types on the fly, so
509
510 $size = $c->sizeof('struct { int a, b; }');
511
512 will result in an error.
513
514 Basic types are not supported by all methods. For example, it makes no
515 sense to use "member" or "offsetof" on a basic type. Using "typeof"
516 isn't very useful, but supported.
517
518 Member Expressions
519
520 This is by far the most complex part, depending on the complexity of
521 your data structures. Any standard type that defines a compound or an
522 array may be followed by a member expression to select only a certain
523 part of the data type. Say you have parsed the following C code:
524
525 struct foo {
526 long type;
527 struct {
528 short x, y;
529 } array[20];
530 };
531
532 typedef struct foo matrix[8][8];
533
534 You may want to know the size of the "array" member of "struct foo".
535 This is quite easy:
536
537 print $c->sizeof('foo.array'), " bytes";
538
539 will print
540
541 80 bytes
542
543 depending of course on the "ShortSize" you configured.
544
545 If you wanted to unpack only a single column of "matrix", that's easy
546 as well (and of course it doesn't matter which index you use):
547
548 $column = $c->unpack('matrix[2]', $data);
549
Just like in C, it is possible to use out-of-bounds array indices.
This means that, for example, although "array" is declared to have 20
elements, the following code
553
554 $size = $c->sizeof('foo.array[4711]');
555 $offset = $c->offsetof('foo', 'array[-13]');
556
557 is perfectly valid and will result in:
558
559 $size = 4
560 $offset = -48
561
562 Member expressions can be arbitrarily complex:
563
564 $type = $c->typeof('matrix[2][3].array[7].y');
565 print "the type is $type";
566
567 will, for example, print
568
569 the type is short
570
571 Member expressions are also used as the second argument to "offsetof".
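
The member expressions discussed in this section can be tried out in
one self-contained script. This is only a sketch. The "ShortSize",
"LongSize" and "Alignment" values are assumptions chosen to match the
numbers quoted above:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(
    Alignment => 4,
    ShortSize => 2,
    LongSize  => 4,
);

$c->parse(<<'ENDC');
struct foo {
  long type;
  struct {
    short x, y;
  } array[20];
};

typedef struct foo matrix[8][8];
ENDC

# Size of a member, and of one row (8 elements) of the matrix.
my $array_size = $c->sizeof('foo.array');
my $row_size   = $c->sizeof('matrix[2]');

# Member expression as the second argument to offsetof().
my $y_offset = $c->offsetof('foo', 'array[9].y');

# Type of a deeply nested member.
my $y_type = $c->typeof('matrix[2][3].array[7].y');

print "foo.array: $array_size bytes, matrix row: $row_size bytes\n";
print "array[9].y is a $y_type at offset $y_offset\n";
```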
572
573 Offsets
574
575 Members returned by the "member" method have an optional offset suffix
576 to indicate that the given offset doesn't point to the start of that
577 member. For example,
578
579 $member = $c->member('matrix', 1431);
580 print $member;
581
582 will print
583
584 [2][1].type+3
585
If you use this as a member expression, as in
587
588 $size = $c->sizeof("matrix $member");
589
590 the offset suffix will simply be ignored. Actually, it will be ignored
591 for all methods if it's used in the first argument.
592
When used in the second argument to "offsetof", it will usually do what
you mean, i.e. the offset suffix, if present, will be considered when
determining the offset. This behaviour ensures that
596
597 $member = $c->member('foo', 43);
598 $offset = $c->offsetof('foo', $member);
599 print "'$member' is located at offset $offset of struct foo";
600
601 will always correctly set $offset:
602
603 '.array[9].y+1' is located at offset 43 of struct foo
604
605 If this is not what you mean, e.g. because you want to know the offset
606 where the member returned by "member" starts, you just have to remove
607 the suffix:
608
609 $member =~ s/\+\d+$//;
610 $offset = $c->offsetof('foo', $member);
611 print "'$member' starts at offset $offset of struct foo";
612
613 This would then print:
614
615 '.array[9].y' starts at offset 42 of struct foo
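
The round trip between "member" and "offsetof" can be checked for every
byte of "struct foo". The sketch below assumes a 4-byte long and a
2-byte short, so that the struct contains no padding bytes:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(
    Alignment => 4,
    ShortSize => 2,
    LongSize  => 4,
);
$c->parse('struct foo { long type; struct { short x, y; } array[20]; };');

# member() may return an offset suffix such as "+1"; offsetof()
# takes that suffix into account, so the original offset comes back.
my $mismatches = 0;
for my $offset (0 .. $c->sizeof('foo') - 1) {
    my $m = $c->member('foo', $offset);
    $mismatches++ if $c->offsetof('foo', $m) != $offset;
}
print "mismatches: $mismatches\n";

# Strip the suffix to find where the member itself starts.
my $member = $c->member('foo', 43);
(my $start = $member) =~ s/\+\d+$//;
my $start_offset = $c->offsetof('foo', $start);
print "'$member' starts at offset $start_offset\n";
```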
616
618 In a nutshell, tags are properties that you can attach to types.
619
620 You can add tags to types using the "tag" method, and remove them using
621 "tag" or "untag", for example:
622
623 # Attach 'Format' and 'Hooks' tags
624 $c->tag('type', Format => 'String', Hooks => { pack => \&rout });
625
626 $c->untag('type', 'Format'); # Remove only 'Format' tag
627 $c->untag('type'); # Remove all tags
628
629 You can also use "tag" to see which tags are attached to a type, for
630 example:
631
632 $tags = $c->tag('type');
633
634 This would give you:
635
636 $tags = {
637 'Hooks' => {
638 'pack' => \&rout
639 },
640 'Format' => 'String'
641 };
642
643 Currently, there are only a couple of different tags that influence the
644 way data is packed and unpacked. There are probably more tags to come
645 in the future.
646
647 The Format Tag
648
649 One of the tags currently available is the "Format" tag. Using this
650 tag, you can tell a Convert::Binary::C object to pack and unpack a cer‐
651 tain data type in a special way.
652
653 For example, if you have a (fixed length) string type
654
655 typedef char str_type[40];
656
657 this type would, by default, be unpacked as an array of "char"s. That's
658 because it is only an array of "char"s, and Convert::Binary::C doesn't
659 know it is actually used as a string.
660
661 But you can tell Convert::Binary::C that "str_type" is a C string using
662 the "Format" tag:
663
664 $c->tag('str_type', Format => 'String');
665
666 This will make "unpack" (and of course also "pack") treat the binary
667 data like a null-terminated C string:
668
669 $binary = "Hello World!\n\0 this is just some dummy data";
670 $hello = $c->unpack('str_type', $binary);
671 print $hello;
672
would print:
674
675 Hello World!
676
677 Of course, this also works the other way round:
678
679 use Data::Hexdumper;
680
681 $binary = $c->pack('str_type', "Just another C::B::C hacker");
682 print hexdump(data => $binary);
683
684 would print:
685
686 0x0000 : 4A 75 73 74 20 61 6E 6F 74 68 65 72 20 43 3A 3A : Just.another.C::
687 0x0010 : 42 3A 3A 43 20 68 61 63 6B 65 72 00 00 00 00 00 : B::C.hacker.....
688 0x0020 : 00 00 00 00 00 00 00 00 : ........
689
If you want Convert::Binary::C to not interpret the binary data at all,
you can set the "Format" tag to "Binary". This might not seem very
useful, as "pack" and "unpack" would just pass through the unmodified
binary data. But you can tag not only whole types, but also compound
members. For example
695
696 $c->parse(<<ENDC);
697 struct packet {
698 unsigned short header;
699 unsigned short flags;
700 unsigned char payload[28];
701 };
702 ENDC
703
704 $c->tag('packet.payload', Format => 'Binary');
705
706 would allow you to write:
707
708 read FILE, $payload, $c->sizeof('packet.payload');
709
710 $packet = {
711 header => 4711,
712 flags => 0xf00f,
713 payload => $payload,
714 };
715
716 $binary = $c->pack('packet', $packet);
717
718 print hexdump(data => $binary);
719
720 This would print something like:
721
722 0x0000 : 12 67 F0 0F 6E 6F 0A 6E 6F 0A 6E 6F 0A 6E 6F 0A : .g..no.no.no.no.
723 0x0010 : 6E 6F 0A 6E 6F 0A 6E 6F 0A 6E 6F 0A 6E 6F 0A 6E : no.no.no.no.no.n
724
725 For obvious reasons, it is not allowed to attach a "Format" tag to bit‐
726 field members. Trying to do so will result in an exception being thrown
727 by the "tag" method.
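
The difference the "Format" tag makes is easy to see by unpacking the
same bytes before and after tagging. This is a minimal sketch. The
dummy data after the terminating null byte is arbitrary filler:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new;
$c->parse('typedef char str_type[40];');

# 12 characters, a terminating null byte and 27 filler bytes.
my $binary = "Hello World!\0" . ('x' x 27);

# Untagged, the type is unpacked as an array of 40 chars.
my $untagged = $c->unpack('str_type', $binary);

# Tagged, it is treated like a null-terminated C string.
$c->tag('str_type', Format => 'String');
my $string = $c->unpack('str_type', $binary);
my $packed = $c->pack('str_type', 'Hi');

printf "untagged: %s with %d elements\n",
    ref($untagged), scalar @$untagged;
print  "tagged:   '$string'\n";
printf "packed 'Hi' into %d bytes\n", length $packed;
```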
728
729 The ByteOrder Tag
730
731 The "ByteOrder" tag allows you to override the byte order of certain
732 types or members. The implementation of this tag is considered experi‐
733 mental and may be subject to changes in the future.
734
735 Usually it doesn't make much sense to override the byte order, but
736 there may be applications where a sub-structure is packed in a differ‐
737 ent byte order than the surrounding structure.
738
739 Take, for example, the following code:
740
741 $c = Convert::Binary::C->new(ByteOrder => 'BigEndian',
742 OrderMembers => 1);
743 $c->parse(<<'ENDC');
744
745 typedef unsigned short u_16;
746
747 struct coords_3d {
748 long x, y, z;
749 };
750
751 struct coords_msg {
752 u_16 header;
753 u_16 length;
754 struct coords_3d coords;
755 };
756
757 ENDC
758
759 Assume that while "coords_msg" is big endian, the embedded coordinates
760 "coords_3d" are stored in little endian format for some reason. In C,
761 you'll have to handle this manually.
762
763 But using Convert::Binary::C, you can simply attach a "ByteOrder" tag
764 to either the "coords_3d" structure or to the "coords" member of the
765 "coords_msg" structure. Both will work in this case. The only differ‐
766 ence is that if you tag the "coords" member, "coords_3d" will only be
767 treated as little endian if you "pack" or "unpack" the "coords_msg"
768 structure. (BTW, you could also tag all members of "coords_3d" individ‐
769 ually, but that would be inefficient.)
770
771 So, let's attach the "ByteOrder" tag to the "coords" member:
772
773 $c->tag('coords_msg.coords', ByteOrder => 'LittleEndian');
774
775 Assume the following binary message:
776
777 0x0000 : 00 2A 00 0C FF FF FF FF 02 00 00 00 2A 00 00 00 : .*..........*...
778
779 If you unpack this message...
780
781 $msg = $c->unpack('coords_msg', $binary);
782
783 ...you will get the following data structure:
784
785 $msg = {
786 'header' => 42,
787 'length' => 12,
788 'coords' => {
789 'x' => -1,
790 'y' => 2,
791 'z' => 42
792 }
793 };
794
795 Without the "ByteOrder" tag, you would get:
796
797 $msg = {
798 'header' => 42,
799 'length' => 12,
800 'coords' => {
801 'x' => -1,
802 'y' => 33554432,
803 'z' => 704643072
804 }
805 };
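
The comparison above can be reproduced with the script below. It is a
sketch under stated assumptions: the binary message is rebuilt from the
hex dump with Perl's "pack 'H*'", and the type sizes (2-byte short,
4-byte long) are configured explicitly rather than taken from the host:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(
    ByteOrder => 'BigEndian',
    Alignment => 4,
    ShortSize => 2,
    LongSize  => 4,
);

$c->parse(<<'ENDC');
typedef unsigned short u_16;
struct coords_3d  { long x, y, z; };
struct coords_msg { u_16 header; u_16 length; struct coords_3d coords; };
ENDC

# The 16-byte message from the hex dump.
my $binary = pack 'H*', '002A000CFFFFFFFF020000002A000000';

# First without, then with the ByteOrder tag on the 'coords' member.
my $plain = $c->unpack('coords_msg', $binary);
$c->tag('coords_msg.coords', ByteOrder => 'LittleEndian');
my $tagged = $c->unpack('coords_msg', $binary);

printf "y: %d untagged, %d tagged\n",
    $plain->{coords}{y}, $tagged->{coords}{y};
printf "z: %d untagged, %d tagged\n",
    $plain->{coords}{z}, $tagged->{coords}{z};
```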
806
807 The "ByteOrder" tag is a recursive tag, i.e. it applies to all children
808 of the tagged object recursively. Of course, it is also possible to
809 override a "ByteOrder" tag by attaching another "ByteOrder" tag to a
810 child type. Confused? Here's an example. In addition to tagging the
811 "coords" member as little endian, we now tag "coords_3d.y" as big
812 endian:
813
814 $c->tag('coords_3d.y', ByteOrder => 'BigEndian');
815 $msg = $c->unpack('coords_msg', $binary);
816
817 This will return the following data structure:
818
819 $msg = {
820 'header' => 42,
821 'length' => 12,
822 'coords' => {
823 'x' => -1,
824 'y' => 33554432,
825 'z' => 42
826 }
827 };
828
Note that if you tag both a type and a member of that type within a
compound, the tag attached to the type itself has higher precedence.
Using the example above, if you attached a "ByteOrder" tag to both
"coords_msg.coords" and "coords_3d", the tag attached to "coords_3d"
would always win.
834
835 Also note that the "ByteOrder" tag might not work as expected along
836 with bitfields, which is why the implementation is considered experi‐
837 mental. Bitfields are currently not affected by the "ByteOrder" tag at
838 all. This is because the byte order would affect the bitfield layout,
839 and a consistent implementation supporting multiple layouts of the same
840 struct would be quite bulky and probably slow down the whole module.
841
842 If you really need the correct behaviour, you can use the following
843 trick:
844
845 $le = Convert::Binary::C->new(ByteOrder => 'LittleEndian');
846
847 $le->parse(<<'ENDC');
848
849 typedef unsigned short u_16;
850 typedef unsigned long u_32;
851
852 struct message {
853 u_16 header;
854 u_16 length;
855 struct {
856 u_32 a;
857 u_32 b;
858 u_32 c : 7;
859 u_32 d : 5;
860 u_32 e : 20;
861 } data;
862 };
863
864 ENDC
865
866 $be = $le->clone->ByteOrder('BigEndian');
867
868 $le->tag('message.data', Format => 'Binary', Hooks => {
869 unpack => sub { $be->unpack('message.data', @_) },
870 pack => sub { $be->pack('message.data', @_) },
871 });
872
873 $msg = $le->unpack('message', $binary);
874
875 This uses the "Format" and "Hooks" tags along with a big endian "clone"
876 of the original little endian object. It attaches hooks to the little
877 endian object and in the hooks it uses the big endian object to "pack"
878 and "unpack" the binary data.
879
880 The Dimension Tag
881
882 The "Dimension" tag allows you to override the declared dimension of an
883 array for packing or unpacking data. The implementation of this tag is
884 considered very experimental and will definitely change in a future
885 release.
886
887 That being said, the "Dimension" tag is primarily useful to support
888 variable length arrays. Usually, you have to write the following code
889 for such a variable length array in C:
890
891 struct c_message
892 {
893 unsigned count;
894 char data[1];
895 };
896
So, because you cannot declare an empty array, you declare an array
with a single element. If you have an ISO C99 compliant compiler, you
can write this code instead:
900
901 struct c99_message
902 {
903 unsigned count;
904 char data[];
905 };
906
907 This explicitly tells the compiler that "data" is a flexible array mem‐
908 ber. Convert::Binary::C already uses this information to handle flexi‐
909 ble array members in a special way.
910
911 As you can see in the following example, the two types are treated dif‐
912 ferently:
913
914 $data = pack 'NC*', 3, 1..8;
915 $uc = $c->unpack('c_message', $data);
916 $uc99 = $c->unpack('c99_message', $data);
917
918 This will result in:
919
920 $uc = {'count' => 3,'data' => [1]};
921 $uc99 = {'count' => 3,'data' => [1,2,3,4,5,6,7,8]};
922
However, only a few compilers support ISO C99, and you probably don't
want to change your existing code just to get some extra features when
using Convert::Binary::C.
926
927 So it is possible to attach a tag to the "data" member of the "c_mes‐
928 sage" struct that tells Convert::Binary::C to treat the array as if it
929 were flexible:
930
931 $c->tag('c_message.data', Dimension => '*');
932
933 Now both "c_message" and "c99_message" will behave exactly the same
934 when using "pack" or "unpack". Repeating the above code:
935
936 $uc = $c->unpack('c_message', $data);
937
938 This will result in:
939
940 $uc = {'count' => 3,'data' => [1,2,3,4,5,6,7,8]};
941
942 But there's more you can do. Even though it probably doesn't make much
943 sense, you can tag a fixed dimension to an array:
944
945 $c->tag('c_message.data', Dimension => '5');
946
947 This will obviously result in:
948
949 $uc = {'count' => 3,'data' => [1,2,3,4,5]};
950
951 A more useful way to use the "Dimension" tag is to set it to the name
952 of a member in the same compound:
953
954 $c->tag('c_message.data', Dimension => 'count');
955
956 Convert::Binary::C will now use the value of that member to determine
957 the size of the array, so unpacking will result in:
958
959 $uc = {'count' => 3,'data' => [1,2,3]};
960
961 Of course, you can also tag flexible array members. And yes, it's also
962 possible to use more complex member expressions:
963
964 $c->parse(<<ENDC);
965 struct msg_header
966 {
967 unsigned len[2];
968 };
969
970 struct more_complex
971 {
972 struct msg_header hdr;
973 char data[];
974 };
975 ENDC
976
977 $data = pack 'NNC*', 42, 7, 1 .. 10;
978
979 $c->tag('more_complex.data', Dimension => 'hdr.len[1]');
980
981 $u = $c->unpack('more_complex', $data);
982
983 The result will be:
984
985 $u = {
986 'hdr' => {
987 'len' => [
988 42,
989 7
990 ]
991 },
992 'data' => [
993 1,
994 2,
995 3,
996 4,
997 5,
998 6,
999 7
1000 ]
1001 };
1002
1003 By the way, it's also possible to tag arrays that are not embedded
1004 inside a compound:
1005
1006 $c->parse(<<ENDC);
1007 typedef unsigned short short_array[];
1008 ENDC
1009
1010 $c->tag('short_array', Dimension => '5');
1011
1012 $u = $c->unpack('short_array', $data);
1013
1014 Resulting in:
1015
1016 $u = [0,42,0,7,258];
1017
1018 The final and most powerful way to define a "Dimension" tag is to pass
1019 it a subroutine reference. The referenced subroutine can execute
1020 whatever code is necessary to determine the size of the tagged array:
1021
1022 sub get_size
1023 {
1024 my $m = shift;
1025 return $m->{hdr}{len}[0] / $m->{hdr}{len}[1];
1026 }
1027
1028 $c->tag('more_complex.data', Dimension => \&get_size);
1029
1030 $u = $c->unpack('more_complex', $data);
1031
1032 As you can guess from the above code, the subroutine is passed a
1033 reference to the hash that stores the already unpacked part of the
1034 compound embedding the tagged array. This is the result:
1035
1036 $u = {
1037 'hdr' => {
1038 'len' => [
1039 42,
1040 7
1041 ]
1042 },
1043 'data' => [
1044 1,
1045 2,
1046 3,
1047 4,
1048 5,
1049 6
1050 ]
1051 };
1052
1053 You can also pass custom arguments to the subroutines by using the
1054 "arg" method. This is similar to the functionality offered by the
1055 "Hooks" tag.
1056
1057 Of course, all of that works for the "pack" method as well.
1058
1059 However, the current implementation has at least one shortcoming,
1060 which is why it's experimental: the "Dimension" tag doesn't affect
1061 compound layout. This means that while you can alter the size of an
1062 array in the middle of a compound, the offsets of the members after
1063 that array won't change. I'd rather like to see the layout adapt
1064 dynamically, so this is what I'm hoping to implement in the future.
1065
1066 The Hooks Tag
1067
1068 Hooks are a special kind of tag that can be extremely useful.
1069
1070 Using hooks, you can easily override the way "pack" and "unpack" handle
1071 data using your own subroutines. If you define hooks for a certain
1072 data type, each time this data type is processed the corresponding hook
1073 will be called to allow you to modify that data.
1074
1075 Basic Hooks
1076
1077 Here's an example. Let's assume the following C code has been parsed:
1078
1079 typedef unsigned long u_32;
1080 typedef u_32 ProtoId;
1081 typedef ProtoId MyProtoId;
1082
1083 struct MsgHeader {
1084 MyProtoId id;
1085 u_32 len;
1086 };
1087
1088 struct String {
1089 u_32 len;
1090 char buf[];
1091 };
1092
1093 You could now use the types above and, for example, unpack binary data
1094 representing a "MsgHeader" like this:
1095
1096 $msg_header = $c->unpack('MsgHeader', $data);
1097
1098 This would give you:
1099
1100 $msg_header = {
1101 'len' => 13,
1102 'id' => 42
1103 };
1104
1105 Instead of dealing with "ProtoId"s as integers, you might prefer
1106 to have them as clear text. You could provide subroutines to convert
1107 between clear text and integers:
1108
1109 %proto = (
1110 CATS => 1,
1111 DOGS => 42,
1112 HEDGEHOGS => 4711,
1113 );
1114
1115 %rproto = reverse %proto;
1116
1117 sub ProtoId_unpack {
1118 $rproto{$_[0]} || 'unknown protocol'
1119 }
1120
1121 sub ProtoId_pack {
1122 $proto{$_[0]} or die 'unknown protocol'
1123 }
1124
1125 You can now register these subroutines by attaching a "Hooks" tag to
1126 "ProtoId" using the "tag" method:
1127
1128 $c->tag('ProtoId', Hooks => { pack => \&ProtoId_pack,
1129 unpack => \&ProtoId_unpack });
1130
1131 Doing exactly the same unpack on "MsgHeader" again would now return:
1132
1133 $msg_header = {
1134 'len' => 13,
1135 'id' => 'DOGS'
1136 };
1137
1138 Actually, if you don't need the reverse operation, you don't even have
1139 to register a "pack" hook. Or, even better, you can have a more intel‐
1140 ligent "unpack" hook that creates a dual-typed variable:
1141
1142 use Scalar::Util qw(dualvar);
1143
1144 sub ProtoId_unpack2 {
1145 dualvar $_[0], $rproto{$_[0]} || 'unknown protocol'
1146 }
1147
1148 $c->tag('ProtoId', Hooks => { unpack => \&ProtoId_unpack2 });
1149
1150 $msg_header = $c->unpack('MsgHeader', $data);
1151
1152 Just as before, this would give you
1153
1154 $msg_header = {
1155 'len' => 13,
1156 'id' => 'DOGS'
1157 };
1158
1159 but without requiring a "pack" hook for packing, at least as long as
1160 you keep the variable dual-typed.
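To see why a dual-typed value makes the "pack" hook unnecessary, here is a small sketch using only Scalar::Util and Perl's built-in pack. The helper name proto_dualvar is made up for this illustration; it has the same shape as the unpack hook above.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

my %proto  = (CATS => 1, DOGS => 42, HEDGEHOGS => 4711);
my %rproto = reverse %proto;

# Numeric value plus readable name stored in a single scalar.
sub proto_dualvar {
    my $id = shift;
    return dualvar $id, $rproto{$id} || 'unknown protocol';
}

my $id = proto_dualvar(42);

print "as string: $id\n";        # DOGS
printf "as number: %d\n", $id;   # 42

# Numeric consumers such as pack see the integer, so no reverse lookup
# is needed as long as the scalar keeps both values.
my $bytes = pack 'N', $id;
print unpack('H*', $bytes), "\n";   # 0000002a
```

Note that copying the value through string operations (for example, interpolating it into another string) loses the numeric half.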
1161
1162 Hooks are usually called with exactly one argument, which is the data
1163 that should be processed (see "Advanced Hooks" for details on how to
1164 customize hook arguments). They are called in scalar context and
1165 expected to return the processed data.
1166
1167 To get rid of registered hooks, you can either undefine only certain
1168 hooks
1169
1170 $c->tag('ProtoId', Hooks => { pack => undef });
1171
1172 or all hooks:
1173
1174 $c->tag('ProtoId', Hooks => undef);
1175
1176 Of course, hooks are not restricted to handling integer values. You
1177 could just as well attach hooks for the "String" struct from the code
1178 above. A useful example would be to have these hooks:
1179
1180 sub string_unpack {
1181 my $s = shift;
1182 pack "c$s->{len}", @{$s->{buf}};
1183 }
1184
1185 sub string_pack {
1186 my $s = shift;
1187 return {
1188 len => length $s,
1189 buf => [ unpack 'c*', $s ],
1190 }
1191 }
1192
1193 (Don't be confused by the fact that the "unpack" hook uses "pack" and
1194 the "pack" hook uses "unpack". And also see "Advanced Hooks" for a
1195 more clever approach.)
1196
1197 While you would normally get the following output when unpacking a
1198 "String"
1199
1200 $string = {
1201 'len' => 12,
1202 'buf' => [
1203 72,
1204 101,
1205 108,
1206 108,
1207 111,
1208 32,
1209 87,
1210 111,
1211 114,
1212 108,
1213 100,
1214 33
1215 ]
1216 };
1217
1218 you could just register the hooks using
1219
1220 $c->tag('String', Hooks => { pack => \&string_pack,
1221 unpack => \&string_unpack });
1222
1223 and you would get a nice human-readable Perl string:
1224
1225 $string = 'Hello World!';
1226
1227 Packing a string turns out to be just as easy:
1228
1229 use Data::Hexdumper;
1230
1231 $data = $c->pack('String', 'Just another Perl hacker,');
1232
1233 print hexdump(data => $data);
1234
1235 This would print:
1236
1237 0x0000 : 00 00 00 19 4A 75 73 74 20 61 6E 6F 74 68 65 72 : ....Just.another
1238 0x0010 : 20 50 65 72 6C 20 68 61 63 6B 65 72 2C : .Perl.hacker,
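Since hooks are ordinary subroutines, you can try them out without any binary data. The following sketch calls the two "String" hooks directly and checks that they are inverses of each other; it doesn't involve Convert::Binary::C.

```perl
use strict;
use warnings;

sub string_unpack {
    my $s = shift;
    return pack "c$s->{len}", @{ $s->{buf} };
}

sub string_pack {
    my $s = shift;
    return { len => length $s, buf => [ unpack 'c*', $s ] };
}

# The pack hook turns a Perl string into the structure the C type expects.
my $struct = string_pack('Hello World!');
print "len=$struct->{len} first=$struct->{buf}[0]\n";   # len=12 first=72

# The unpack hook turns that structure back into the string.
my $string = string_unpack($struct);
print "$string\n";                                       # Hello World!
```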
1239
1240 If you want to find out if or which hooks are registered for a certain
1241 type, you can also use the "tag" method:
1242
1243 $hooks = $c->tag('String', 'Hooks');
1244
1245 This would return:
1246
1247 $hooks = {
1248 'unpack' => \&string_unpack,
1249 'pack' => \&string_pack
1250 };
1251
1252 Advanced Hooks
1253
1254 It is also possible to combine hooks with the "Format" tag. This
1255 can be useful if you know better than Convert::Binary::C how to
1256 interpret the binary data. In the previous section, we've handled this type
1257
1258 struct String {
1259 u_32 len;
1260 char buf[];
1261 };
1262
1263 with the following hooks:
1264
1265 sub string_unpack {
1266 my $s = shift;
1267 pack "c$s->{len}", @{$s->{buf}};
1268 }
1269
1270 sub string_pack {
1271 my $s = shift;
1272 return {
1273 len => length $s,
1274 buf => [ unpack 'c*', $s ],
1275 }
1276 }
1277
1278 $c->tag('String', Hooks => { pack => \&string_pack,
1279 unpack => \&string_unpack });
1280
1281 As you can see in the hook code, "buf" is expected to be an array of
1282 characters. For the "unpack" case Convert::Binary::C first turns the
1283 binary data into a Perl array, and then the hook packs it back into a
1284 string. The intermediate array creation and destruction is completely
1285 useless. Same thing, of course, for the "pack" case.
1286
1287 Here's a clever way to handle this. Just tag "buf" as binary
1288
1289 $c->tag('String.buf', Format => 'Binary');
1290
1291 and use the following hooks instead:
1292
1293 sub string_unpack2 {
1294 my $s = shift;
1295 substr $s->{buf}, 0, $s->{len};
1296 }
1297
1298 sub string_pack2 {
1299 my $s = shift;
1300 return {
1301 len => length $s,
1302 buf => $s,
1303 }
1304 }
1305
1306 $c->tag('String', Hooks => { pack => \&string_pack2,
1307 unpack => \&string_unpack2 });
1308
1309 This will be exactly equivalent to the old code, but faster and
1310 probably even easier to understand.
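Here is a quick way to convince yourself that both "unpack" hooks give the same result. The sketch feeds each hook the kind of value it would see and compares the outputs. The two hash references are hand-made stand-ins, and the trailing NUL bytes in the binary variant are an assumption meant to represent data beyond the string's length.

```perl
use strict;
use warnings;

sub string_unpack  { my $s = shift; return pack "c$s->{len}", @{ $s->{buf} } }
sub string_unpack2 { my $s = shift; return substr $s->{buf}, 0, $s->{len} }

my $text = 'Hello World!';

# Without the Format tag the hook sees "buf" as an array of character codes.
my $as_array = { len => length $text, buf => [ unpack 'c*', $text ] };

# With Format => 'Binary' the hook sees "buf" as one string. The trailing
# bytes are hypothetical and show why substr is limited to "len".
my $as_binary = { len => length $text, buf => $text . "\0\0" };

my $old = string_unpack($as_array);
my $new = string_unpack2($as_binary);

my $verdict = $old eq $new ? 'same result' : 'different results';
print "$verdict: $new\n";
```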
1311
1312 But hooks are even more powerful. You can customize the arguments that
1313 are passed to your hooks and you can use "arg" to pass certain special
1314 arguments, such as the name of the type that is currently being pro‐
1315 cessed by the hook.
1316
1317 The following example shows how it is easily possible to peek into the
1318 perl internals using hooks.
1319
1320 use Config;
1321
1322 $c = new Convert::Binary::C %CC, OrderMembers => 1;
1323 $c->Include(["$Config{archlib}/CORE", @{$c->Include}]);
1324 $c->parse(<<ENDC);
1325 #include "EXTERN.h"
1326 #include "perl.h"
1327 ENDC
1328
1329 $c->tag($_, Hooks => { unpack_ptr => [\&unpack_ptr,
1330 $c->arg(qw(SELF TYPE DATA))] })
1331 for qw( XPVAV XPVHV );
1332
1333 First, we add the perl core include path and parse perl.h. Then, we add
1334 an "unpack_ptr" hook for a couple of the internal data types.
1335
1336 The "unpack_ptr" and "pack_ptr" hooks are called whenever a pointer to
1337 a certain data structure is processed. This is by far the most experi‐
1338 mental part of the hooks feature, as this includes any kind of pointer.
1339 There's no way for the hook to know the difference between a plain
1340 pointer, or a pointer to a pointer, or a pointer to an array (this is
1341 because the difference doesn't matter anywhere else in Con‐
1342 vert::Binary::C).
1343
1344 But the hook above makes use of another very interesting feature: It
1345 uses "arg" to pass special arguments to the hook subroutine. Usually,
1346 the hook subroutine is simply passed a single data argument. But using
1347 the above definition, it'll get a reference to the calling object
1348 ("SELF"), the name of the type being processed ("TYPE") and the data
1349 ("DATA").
1350
1351 But what does our hook look like?
1352
1353 sub unpack_ptr {
1354 my($self, $type, $ptr) = @_;
1355 $ptr or return '<NULL>';
1356 my $size = $self->sizeof($type);
1357 $self->unpack($type, unpack("P$size", pack('I', $ptr)));
1358 }
1359
1360 As you can see, the hook is rather simple. First, it receives the argu‐
1361 ments mentioned above. It performs a quick check if the pointer is
1362 "NULL" and shouldn't be processed any further. Next, it determines the
1363 size of the type being processed. And finally, it'll just use the "P"n
1364 unpack template to read from that memory location and recursively call
1365 "unpack" to unpack the type. (And yes, this may of course again call
1366 other hooks.)
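If the "P" template is new to you, the following sketch shows it in isolation, using only Perl's built-in pack and unpack. It packs a pointer to a string that the script itself owns and reads the bytes back through that pointer, so there's no risk of touching foreign memory. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $text = 'Hello World!';

# pack 'P' stores the address of $text's string buffer as a native pointer.
my $pointer = pack 'P', $text;

# unpack "P$size" dereferences that pointer and copies $size bytes from it.
my $size = length $text;
my $copy = unpack "P$size", $pointer;

print "$copy\n";   # Hello World!
```

This only works as long as $text stays alive and unmodified; the hook above has no such guarantee, which is why it is so easy to crash.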
1367
1368 Now, let's test that:
1369
1370 my $ref = { foo => 42, bar => 4711 };
1371 my $ptr = hex(("$ref" =~ /\(0x([[:xdigit:]]+)\)$/)[0]);
1372
1373 print Dumper(unpack_ptr($c, 'AV', $ptr));
1374
1375 Just for the fun of it, we create a blessed array reference. But how do
1376 we get a pointer to the corresponding "AV"? This is rather easy, as the
1377 address of the "AV" is just the hex value that appears when using the
1378 array reference in string context. So we just grab that and turn it
1379 into decimal. All that's left to do is just call our hook, as it can
1380 already handle "AV" pointers. And this is what we get:
1381
1382 $VAR1 = {
1383 'sv_any' => {
1384 'xnv_u' => {
1385 'xnv_nv' => '2.18376848395956105e-4933',
1386 'xgv_stash' => 0,
1387 'xpad_cop_seq' => {
1388 'xlow' => 0,
1389 'xhigh' => 139484332
1390 },
1391 'xbm_s' => {
1392 'xbm_previous' => 0,
1393 'xbm_flags' => 172,
1394 'xbm_rare' => 92
1395 }
1396 },
1397 'xav_fill' => 2,
1398 'xav_max' => 7,
1399 'xiv_u' => {
1400 'xivu_iv' => 2,
1401 'xivu_uv' => 2,
1402 'xivu_p1' => 2,
1403 'xivu_i32' => 2,
1404 'xivu_namehek' => 2,
1405 'xivu_hv' => 2
1406 },
1407 'xmg_u' => {
1408 'xmg_magic' => 0,
1409 'xmg_ourstash' => 0
1410 },
1411 'xmg_stash' => 0
1412 },
1413 'sv_refcnt' => 1,
1414 'sv_flags' => 536870924,
1415 'sv_u' => {
1416 'svu_iv' => 139483844,
1417 'svu_uv' => 139483844,
1418 'svu_rv' => 139483844,
1419 'svu_pv' => 139483844,
1420 'svu_array' => 139483844,
1421 'svu_hash' => 139483844,
1422 'svu_gp' => 139483844
1423 }
1424 };
1425
1426 Even though it is rather easy to do such stuff using "unpack_ptr"
1427 hooks, you should really know what you're doing and do it with extreme
1428 care because of the limitations mentioned above. It's really easy to
1429 run into segmentation faults when you're dereferencing pointers that
1430 point to memory which you don't own.
1431
1432 Performance
1433
1434 Hooks don't come for free. In performance-critical applications you
1435 have to keep in mind that hooks are actually perl subroutines and that
1436 they are called once for every value of a registered type that is being
1437 packed or unpacked. If only about 10% of the values require hooks to be
1438 called, you'll hardly notice the difference (if your hooks are imple‐
1439 mented efficiently, that is). But if all values would require hooks to
1440 be called, that alone could easily make packing and unpacking very
1441 slow.
1442
1443 Tag Order
1444
1445 Since it is possible to attach multiple tags to a single type, the
1446 order in which the tags are processed is important. Here's a small ta‐
1447 ble that shows the processing order.
1448
1449 pack         unpack
1450 ----------------------
1451 Hooks        Format
1452 Format       ByteOrder
1453 ByteOrder    Hooks
1454
1455 As a general rule, the "Hooks" tag is always the first thing processed
1456 when packing data, and the last thing processed when unpacking data.
1457
1458 The "Format" and "ByteOrder" tags are mutually exclusive; when both
1459 are given, the "Format" tag wins.
1460
1462 new
1463
1464 "new"
1465 "new" OPTION1 => VALUE1, OPTION2 => VALUE2, ...
1466 The constructor is used to create a new Convert::Binary::C
1467 object. You can simply use
1468
1469 $c = new Convert::Binary::C;
1470
1471 without additional arguments to create an object, or you can
1472 optionally pass any arguments to the constructor that are
1473 described for the "configure" method.
1474
1475 configure
1476
1477 "configure"
1478 "configure" OPTION
1479 "configure" OPTION1 => VALUE1, OPTION2 => VALUE2, ...
1480 This method can be used to configure an existing Con‐
1481 vert::Binary::C object or to retrieve its current configura‐
1482 tion.
1483
1484 To configure the object, the list of options consists of key
1485 and value pairs and must therefore contain an even number of
1486 elements. "configure" (and also "new" if used with configura‐
1487 tion options) will throw an exception if you pass an odd number
1488 of elements. Configuration will normally look like this:
1489
1490 $c->configure(ByteOrder => 'BigEndian', IntSize => 2);
1491
1492 To retrieve the current value of a configuration option, you
1493 must pass a single argument to "configure" that holds the name
1494 of the option, just like
1495
1496 $order = $c->configure('ByteOrder');
1497
1498 If you want to get the values of all configuration options at
1499 once, you can call "configure" without any arguments and it
1500 will return a reference to a hash table that holds the whole
1501 object configuration. This can be conveniently used with the
1502 Data::Dumper module, for example:
1503
1504 use Convert::Binary::C;
1505 use Data::Dumper;
1506
1507 $c = new Convert::Binary::C Define => ['DEBUGGING', 'FOO=123'],
1508 Include => ['/usr/include'];
1509
1510 print Dumper($c->configure);
1511
1512 Which will print something like this:
1513
1514 $VAR1 = {
1515 'Define' => [
1516 'DEBUGGING',
1517 'FOO=123'
1518 ],
1519 'StdCVersion' => 199901,
1520 'ByteOrder' => 'LittleEndian',
1521 'LongSize' => 4,
1522 'IntSize' => 4,
1523 'HostedC' => 1,
1524 'ShortSize' => 2,
1525 'HasMacroVAARGS' => 1,
1526 'Assert' => [],
1527 'UnsignedChars' => 0,
1528 'DoubleSize' => 8,
1529 'CharSize' => 1,
1530 'EnumType' => 'Integer',
1531 'PointerSize' => 4,
1532 'EnumSize' => 4,
1533 'DisabledKeywords' => [],
1534 'FloatSize' => 4,
1535 'Alignment' => 1,
1536 'LongLongSize' => 8,
1537 'LongDoubleSize' => 12,
1538 'KeywordMap' => {},
1539 'Include' => [
1540 '/usr/include'
1541 ],
1542 'HasCPPComments' => 1,
1543 'Bitfields' => {
1544 'Engine' => 'Generic'
1545 },
1546 'UnsignedBitfields' => 0,
1547 'Warnings' => 0,
1548 'CompoundAlignment' => 1,
1549 'OrderMembers' => 0
1550 };
1551
1552 Since you may not always want to write a "configure" call when
1553 you only want to change a single configuration item, you can
1554 use any configuration option name as a method name, like:
1555
1556 $c->ByteOrder('LittleEndian') if $c->IntSize < 4;
1557
1558 (Yes, the example doesn't make very much sense... ;-)
1559
1560 However, you should keep in mind that configuration methods
1561 that can take lists (namely "Include", "Define" and "Assert",
1562 but not "DisabledKeywords") may behave slightly differently from
1563 their "configure" equivalent. If you pass these methods a single
1564 argument that is an array reference, the current list will
1565 be replaced by the new one, which is just the behaviour of the
1566 corresponding "configure" call. So the following are
1567 equivalent:
1568
1569 $c->configure(Define => ['foo', 'bar=123']);
1570 $c->Define(['foo', 'bar=123']);
1571
1572 But if you pass a list of strings instead of an array reference
1573 (which cannot be done when using "configure"), the new list
1574 items are appended to the current list, so
1575
1576 $c = new Convert::Binary::C Include => ['/include'];
1577 $c->Include('/usr/include', '/usr/local/include');
1578 print Dumper($c->Include);
1579
1580 $c->Include(['/usr/local/include']);
1581 print Dumper($c->Include);
1582
1583 will first print all three include paths, but finally only
1584 "/usr/local/include" will be configured:
1585
1586 $VAR1 = [
1587 '/include',
1588 '/usr/include',
1589 '/usr/local/include'
1590 ];
1591 $VAR1 = [
1592 '/usr/local/include'
1593 ];
1594
1595 Furthermore, configuration methods can be chained together, as
1596 they return a reference to their object if called as a set
1597 method. So, if you like, you can configure your object like
1598 this:
1599
1600 $c = Convert::Binary::C->new(IntSize => 4)
1601 ->Define(qw( __DEBUG__ DB_LEVEL=3 ))
1602 ->ByteOrder('BigEndian');
1603
1604 $c->configure(EnumType => 'Both', Alignment => 4)
1605 ->Include('/usr/include', '/usr/local/include');
1606
1607 In the example above, "qw( ... )" is the word list quoting
1608 operator. It returns a list of all non-whitespace sequences,
1609 and is especially useful for configuring preprocessor defines
1610 or assertions. The following assignments are equivalent:
1611
1612 @array = ('one', 'two', 'three');
1613 @array = qw(one two three);
1614
1615 You can configure the following options. Unknown options, as
1616 well as invalid values for an option, will cause the object to
1617 throw exceptions.
1618
1619 "IntSize" => 0 | 1 | 2 | 4 | 8
1620 Set the number of bytes that are occupied by an integer.
1621 This is in most cases 2 or 4. If you set it to zero, the
1622 size of an integer on the host system will be used. This is
1623 also the default unless overridden by
1624 "CBC_DEFAULT_INT_SIZE" at compile time.
1625
1626 "CharSize" => 0 | 1 | 2 | 4 | 8
1627 Set the number of bytes that are occupied by a "char".
1628 This rarely needs to be changed, except for some platforms
1629 that don't care about bytes, for example DSPs. If you set
1630 this to zero, the size of a "char" on the host system will
1631 be used. This is also the default unless overridden by
1632 "CBC_DEFAULT_CHAR_SIZE" at compile time.
1633
1634 "ShortSize" => 0 | 1 | 2 | 4 | 8
1635 Set the number of bytes that are occupied by a short
1636 integer. Although integers explicitly declared as "short"
1637 should always be 16 bits wide, there are compilers that make
1638 a short 8 bits wide. If you set it to zero, the size of a
1639 short integer on the host system will be used. This is also
1640 the default unless overridden by "CBC_DEFAULT_SHORT_SIZE"
1641 at compile time.
1642
1643 "LongSize" => 0 | 1 | 2 | 4 | 8
1644 Set the number of bytes that are occupied by a long inte‐
1645 ger. If set to zero, the size of a long integer on the
1646 host system will be used. This is also the default unless
1647 overridden by "CBC_DEFAULT_LONG_SIZE" at compile time.
1648
1649 "LongLongSize" => 0 | 1 | 2 | 4 | 8
1650 Set the number of bytes that are occupied by a long long
1651 integer. If set to zero, the size of a long long integer on
1652 the host system, or 8, will be used. This is also the
1653 default unless overridden by "CBC_DEFAULT_LONG_LONG_SIZE"
1654 at compile time.
1655
1656 "FloatSize" => 0 | 1 | 2 | 4 | 8 | 12 | 16
1657 Set the number of bytes that are occupied by a single pre‐
1658 cision floating point value. If you set it to zero, the
1659 size of a "float" on the host system will be used. This is
1660 also the default unless overridden by
1661 "CBC_DEFAULT_FLOAT_SIZE" at compile time. For details on
1662 floating point support, see "FLOATING POINT VALUES".
1663
1664 "DoubleSize" => 0 | 1 | 2 | 4 | 8 | 12 | 16
1665 Set the number of bytes that are occupied by a double pre‐
1666 cision floating point value. If you set it to zero, the
1667 size of a "double" on the host system will be used. This is
1668 also the default unless overridden by "CBC_DEFAULT_DOU‐
1669 BLE_SIZE" at compile time. For details on floating point
1670 support, see "FLOATING POINT VALUES".
1671
1672 "LongDoubleSize" => 0 | 1 | 2 | 4 | 8 | 12 | 16
1673 Set the number of bytes that are occupied by a "long
1674 double" floating point value. If you set it to zero, the
1675 size of a "long double" on the host system, or 12, will be
1676 used. This is also the default unless overridden by
1677 "CBC_DEFAULT_LONG_DOUBLE_SIZE" at compile time. For details
1678 on floating point support, see "FLOATING POINT VALUES".
1679
1680 "PointerSize" => 0 | 1 | 2 | 4 | 8
1681 Set the number of bytes that are occupied by a pointer.
1682 This is in most cases 2 or 4. If you set it to zero, the
1683 size of a pointer on the host system will be used. This is
1684 also the default unless overridden by
1685 "CBC_DEFAULT_PTR_SIZE" at compile time.
1686
1687 "EnumSize" => -1 | 0 | 1 | 2 | 4 | 8
1688 Set the number of bytes that are occupied by an enumeration
1689 type. On most systems, this is equal to the size of an
1690 integer, which is also the default. However, for some com‐
1691 pilers, the size of an enumeration type depends on the size
1692 occupied by the largest enumerator. So the size may vary
1693 between 1 and 8. If you have
1694
1695 enum foo {
1696 ONE = 100, TWO = 200
1697 };
1698
1699 this will occupy one byte because the enum can be repre‐
1700 sented as an unsigned one-byte value. However,
1701
1702 enum foo {
1703 ONE = -100, TWO = 200
1704 };
1705
1706 will occupy two bytes, because the -100 forces the type to
1707 be signed, and 200 doesn't fit into a signed one-byte
1708 value. Therefore, the type used is a signed two-byte
1709 value. If this is the behaviour you need, set the EnumSize
1710 to 0.
1711
1712 Some compilers try to follow this strategy, but don't care
1713 whether the enumeration has signed values or not. They
1714 always declare an enum as signed. On such a compiler, given
1715
1716 enum one { ONE = -100, TWO = 100 };
1717 enum two { ONE = 100, TWO = 200 };
1718
1719 enum "one" will occupy only one byte, while enum "two" will
1720 occupy two bytes, even though it could be represented by an
1721 unsigned one-byte value. If this is the behaviour of your
1722 compiler, set EnumSize to "-1".
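The two size strategies can be written down in a few lines of plain Perl. This is only a sketch of the rules as described above, not the code Convert::Binary::C uses internally; the function name enum_size and its calling convention are made up for the illustration.

```perl
use strict;
use warnings;

# Smallest of 1, 2, 4 or 8 bytes able to hold every enumerator.
# A true $always_signed mimics the EnumSize => -1 strategy; otherwise the
# type is signed only if a negative enumerator exists (EnumSize => 0).
sub enum_size {
    my ($always_signed, @values) = @_;
    my $signed = $always_signed || grep { $_ < 0 } @values;
    for my $bytes (1, 2, 4) {
        my $bits = 8 * $bytes;
        my ($min, $max) = $signed
            ? (-2 ** ($bits - 1), 2 ** ($bits - 1) - 1)
            : (0, 2 ** $bits - 1);
        return $bytes unless grep { $_ < $min || $_ > $max } @values;
    }
    return 8;
}

printf "EnumSize  0, (100, 200):  %d\n", enum_size(0, 100, 200);    # 1
printf "EnumSize  0, (-100, 200): %d\n", enum_size(0, -100, 200);   # 2
printf "EnumSize -1, (-100, 100): %d\n", enum_size(1, -100, 100);   # 1
printf "EnumSize -1, (100, 200):  %d\n", enum_size(1, 100, 200);    # 2
```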
1723
1724 "Alignment" => 0 | 1 | 2 | 4 | 8 | 16
1725 Set the struct member alignment. This option controls where
1726 padding bytes are inserted between struct members. It glob‐
1727 ally sets the alignment for all structs/unions. However,
1728 this can be overridden from within the source code with the
1729 common "pack" pragma as explained in "Supported pragma
1730 directives". The default alignment is 1, which means no
1731 padding bytes are inserted. A setting of 0 means native
1732 alignment, i.e. the alignment of the system that Con‐
1733 vert::Binary::C has been compiled on. You can determine the
1734 native properties using the "native" function.
1735
1736 The "Alignment" option is similar to the "-Zp[n]" option of
1737 the Intel compiler. It globally specifies the maximum
1738 boundary to which struct members are aligned. Consider the
1739 following structure and the sizes of "char", "short",
1740 "long" and "double" being 1, 2, 4 and 8, respectively.
1741
1742 struct align {
1743 char a;
1744 short b, c;
1745 long d;
1746 double e;
1747 };
1748
1749 With an alignment of 1 (the default), the struct members
1750 would be packed tightly:
1751
1752     0   1   2   3   4   5   6   7   8   9  10  11  12
1753     +---+---+---+---+---+---+---+---+---+---+---+---+
1754     | a |   b   |   c   |       d       |        ...
1755     +---+---+---+---+---+---+---+---+---+---+---+---+
1756
1757    12  13  14  15  16  17
1758     +---+---+---+---+---+
1759     ...        e        |
1760     +---+---+---+---+---+
1761
1762 With an alignment of 2, the struct members larger than one
1763 byte would be aligned to 2-byte boundaries, which results
1764 in a single padding byte between "a" and "b".
1765
1766     0   1   2   3   4   5   6   7   8   9  10  11  12
1767     +---+---+---+---+---+---+---+---+---+---+---+---+
1768     | a | * |   b   |   c   |       d       |    ...
1769     +---+---+---+---+---+---+---+---+---+---+---+---+
1770
1771    12  13  14  15  16  17  18
1772     +---+---+---+---+---+---+
1773     ...          e          |
1774     +---+---+---+---+---+---+
1775
1776 With an alignment of 4, the struct members of size 2 would
1777 be aligned to 2-byte boundaries and larger struct members
1778 would be aligned to 4-byte boundaries:
1779
1780     0   1   2   3   4   5   6   7   8   9  10  11  12
1781     +---+---+---+---+---+---+---+---+---+---+---+---+
1782     | a | * |   b   |   c   | * | * |       d       | ...
1783     +---+---+---+---+---+---+---+---+---+---+---+---+
1784
1785    12  13  14  15  16  17  18  19  20
1786     +---+---+---+---+---+---+---+---+
1787 ... |               e               |
1788     +---+---+---+---+---+---+---+---+
1789
1790 This layout of the struct members allows the compiler to
1791 generate optimized code because aligned members can be
1792 accessed more easily by the underlying architecture.
1793
1794 Finally, setting the alignment to 8 will align "double"s to
1795 8-byte boundaries:
1796
1797     0   1   2   3   4   5   6   7   8   9  10  11  12
1798     +---+---+---+---+---+---+---+---+---+---+---+---+
1799     | a | * |   b   |   c   | * | * |       d       | ...
1800     +---+---+---+---+---+---+---+---+---+---+---+---+
1801
1802    12  13  14  15  16  17  18  19  20  21  22  23  24
1803     +---+---+---+---+---+---+---+---+---+---+---+---+
1804 ... | * | * | * | * |               e               |
1805     +---+---+---+---+---+---+---+---+---+---+---+---+
1806
1807 Further increasing the alignment does not alter the layout
1808 of our structure, as only members larger than 8 bytes would
1809 be affected.
1810
1811 The alignment of a structure depends on its largest member
1812 and on the setting of the "Alignment" option. With "Align‐
1813 ment" set to 2, a structure holding a "long" would be
1814 aligned to a 2-byte boundary, while a structure containing
1815 only "char"s would have no alignment restrictions. (Unfor‐
1816 tunately, that's not the whole story. See the "Com‐
1817 poundAlignment" option for details.)
1818
1819 Here's another example. Assuming 8-byte alignment, the fol‐
1820 lowing two structs will both have a size of 16 bytes:
1821
1822 struct one {
1823 char c;
1824 double d;
1825 };
1826
1827 struct two {
1828 double d;
1829 char c;
1830 };
1831
1832 This is clear for "struct one", because the member "d" has
1833 to be aligned to an 8-byte boundary, and thus 7 padding
1834 bytes are inserted after "c". But for "struct two", the
1835 padding bytes are inserted at the end of the structure,
1836 which may not seem to make sense at first. However, it
1837 makes perfect sense if you think about an array of "struct
1838 two". Each "double" has to be aligned to an 8-byte boundary,
1839 and thus each array element would have to occupy 16
1840 bytes. With that in mind, it would be strange if a "struct
1841 two" variable had a different size. And it would
1842 make the widely used construct
1843
1844 struct two array[] = { {1.0, 0}, {2.0, 1} };
1845 int elements = sizeof(array) / sizeof(struct two);
1846
1847 impossible.
1848
1849 The alignment behaviour described here seems to be common
1850 for all compilers. However, not all compilers have an
1851 option to configure their default alignment.
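The alignment rules described in this section can be sketched as a small layout calculator in plain Perl. This is a simplified model based only on the description above (scalar members, no bitfields, no "pack" pragma), not the implementation used by Convert::Binary::C. The function name layout and the [name, size] member format are made up for the illustration.

```perl
use strict;
use warnings;

# Compute the total size and member offsets of a struct of scalar members.
# Each member is [name, size]; $alignment plays the role of "Alignment".
sub layout {
    my ($alignment, @members) = @_;
    my ($offset, $struct_align, %offset_of) = (0, 1);
    for my $member (@members) {
        my ($name, $size) = @$member;
        # A member is aligned to its own size, capped by the Alignment option.
        my $align = $size < $alignment ? $size : $alignment;
        $offset += ($align - $offset % $align) % $align;   # padding bytes
        $offset_of{$name} = $offset;
        $offset += $size;
        $struct_align = $align if $align > $struct_align;
    }
    # Trailing padding keeps every element of an array properly aligned.
    $offset += ($struct_align - $offset % $struct_align) % $struct_align;
    return ($offset, \%offset_of);
}

my @align_members = ([a => 1], [b => 2], [c => 2], [d => 4], [e => 8]);
for my $alignment (1, 2, 4, 8) {
    my ($size, $off) = layout($alignment, @align_members);
    printf "Alignment %d: d at %2d, e at %2d, size %2d\n",
        $alignment, $off->{d}, $off->{e}, $size;
}

my ($size_one) = layout(8, [c => 1], [d => 8]);
my ($size_two) = layout(8, [d => 8], [c => 1]);
print "struct one: $size_one, struct two: $size_two\n";   # 16 and 16
```

For "struct align" the sketch yields sizes of 17, 18, 20 and 24 bytes for alignments of 1, 2, 4 and 8.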
1852
1853 "CompoundAlignment" => 0 | 1 | 2 | 4 | 8 | 16
1854 Usually, the alignment of a compound (i.e. a "struct" or a
1855 "union") depends only on its largest member and on the set‐
1856 ting of the "Alignment" option. There are, however, archi‐
1857 tectures and compilers where compounds can have different
1858 alignment constraints.
1859
1860 For most platforms and compilers, the alignment constraint
1861 for compounds is 1 byte. That is, on most platforms
1862
1863 struct onebyte {
1864 char byte;
1865 };
1866
1867 will have an alignment of 1 and also a size of 1. But if
1868 you take an ARM architecture, the above "struct onebyte"
1869 will have an alignment of 4, and thus also a size of 4.
1870
1871 You can configure this by setting "CompoundAlignment" to 4.
1872 This will ensure that the alignment of compounds is always
1873 4.
1874
1875 Setting "CompoundAlignment" to 0 means native compound
1876 alignment, i.e. the compound alignment of the system that
1877 Convert::Binary::C has been compiled on. You can determine
1878 the native properties using the "native" function.
1879
1880 There are also compilers for certain platforms that allow
1881 you to adjust the compound alignment. If you're not aware
1882 of the fact that your compiler/architecture has a compound
1883 alignment other than 1, strange things can happen. If, for
1884 example, the compound alignment is 2 and you have something
1885 like
1886
1887 typedef unsigned char U8;
1888
1889 struct msg_head {
1890 U8 cmd;
1891 struct {
1892 U8 hi;
1893 U8 low;
1894 } crc16;
1895 U8 len;
1896 };
1897
1898 there will be one padding byte inserted before the embedded
1899 "crc16" struct and after the "len" member, which is most
1900 probably not what was intended:
1901
1902     0     1     2     3     4     5     6
1903     +-----+-----+-----+-----+-----+-----+
1904     | cmd |  *  | hi  | low | len |  *  |
1905     +-----+-----+-----+-----+-----+-----+
1906
1907 Note that both "#pragma pack" and the "Alignment" option
1908 can override "CompoundAlignment". If you set "Com‐
1909 poundAlignment" to 4, but "Alignment" to 2, compounds will
1910 actually be aligned on 2-byte boundaries.
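The interaction between "Alignment" and "CompoundAlignment" can be modelled in plain Perl as well. The following sketch is my reading of the rules described above, not the module's actual implementation: a compound is aligned to the larger of "CompoundAlignment" and its strictest member, and "Alignment" caps the result. The function name compound_layout and the nested member format are made up, and both options are assumed to be at least 1.

```perl
use strict;
use warnings;

# Returns (size, alignment, "name@offset" strings) for a compound.
# A member is [name, type], where type is a byte count for a scalar
# or an array reference holding the members of a nested compound.
sub compound_layout {
    my ($alignment, $compound_alignment, @members) = @_;
    my ($offset, $max_align, @rows) = (0, $compound_alignment);
    for my $member (@members) {
        my ($name, $type) = @$member;
        my ($size, $align) = ref($type)
            ? compound_layout($alignment, $compound_alignment, @$type)
            : ($type, $type);
        $align = $alignment if $align > $alignment;   # "Alignment" wins
        $offset += ($align - $offset % $align) % $align;
        push @rows, $name . '@' . $offset;
        $offset += $size;
        $max_align = $align if $align > $max_align;
    }
    $max_align = $alignment if $max_align > $alignment;
    $offset += ($max_align - $offset % $max_align) % $max_align;
    return ($offset, $max_align, @rows);
}

my @msg_head = (
    [cmd   => 1],
    [crc16 => [[hi => 1], [low => 1]]],
    [len   => 1],
);

# Alignment 4 and CompoundAlignment 2 are assumed values for this example.
my ($size, $align, @rows) = compound_layout(4, 2, @msg_head);
print "size=$size align=$align (@rows)\n";   # size=6 align=2 (cmd@0 crc16@2 len@4)
```

With a compound alignment of 1 the same struct shrinks to 4 bytes, and with "Alignment" set to 2 a compound alignment of 4 is capped at 2.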
1911
1912 "ByteOrder" => 'BigEndian' | 'LittleEndian'
1913 Set the byte order for integers larger than a single byte.
1914 Little endian (Intel, least significant byte first) and big
1915 endian (Motorola, most significant byte first) byte order
1916 are supported. The default byte order is the same as the
1917 byte order of the host system unless overridden by
1918 "CBC_DEFAULT_BYTEORDER" at compile time.
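If you're unsure what the two byte orders look like on the wire, Perl's built-in pack can show you. This sketch doesn't use Convert::Binary::C; "N" and "V" are Perl's templates for big endian and little endian 32-bit integers.

```perl
use strict;
use warnings;

my $value = 0x12345678;

my $big    = pack 'N', $value;   # most significant byte first
my $little = pack 'V', $value;   # least significant byte first

print 'big endian:    ', unpack('H*', $big),    "\n";   # 12345678
print 'little endian: ', unpack('H*', $little), "\n";   # 78563412

# Reading data with the wrong byte order silently yields a different number.
my $misread = unpack 'V', $big;
printf "misread: 0x%08x\n", $misread;                   # 0x78563412
```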
1919
1920 "EnumType" => 'Integer' | 'String' | 'Both'
1921 This option controls the type that enumeration constants
1922 will have in data structures returned by the "unpack"
1923 method. If you have the following definitions:
1924
1925 typedef enum {
1926 SUNDAY, MONDAY, TUESDAY, WEDNESDAY,
1927 THURSDAY, FRIDAY, SATURDAY
1928 } Weekday;
1929
1930 typedef enum {
1931 JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY,
1932 AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER
1933 } Month;
1934
1935 typedef struct {
1936 int year;
1937 Month month;
1938 int day;
1939 Weekday weekday;
1940 } Date;
1941
1942 and a byte string that holds a packed Date struct, then
1943 you'll get the following results from a call to the
1944 "unpack" method.
1945
1946 "Integer"
1947 Enumeration constants are returned as plain integers.
This is fast, but may not be very useful. It is also
the default.
1950
1951 $date = {
1952 'weekday' => 1,
1953 'month' => 0,
1954 'day' => 7,
1955 'year' => 2002
1956 };
1957
1958 "String"
1959 Enumeration constants are returned as strings. This
1960 will create a string constant for every unpacked enu‐
1961 meration constant and thus consumes more time and mem‐
1962 ory. However, the result may be more useful.
1963
1964 $date = {
1965 'weekday' => 'MONDAY',
1966 'month' => 'JANUARY',
1967 'day' => 7,
1968 'year' => 2002
1969 };
1970
1971 "Both"
1972 Enumeration constants are returned as double typed
1973 scalars. If evaluated in string context, the enumera‐
1974 tion constant will be a string, if evaluated in numeric
1975 context, the enumeration constant will be an integer.
1976
1977 $date = $c->EnumType('Both')->unpack('Date', $binary);
1978
1979 printf "Weekday = %s (%d)\n\n", $date->{weekday},
1980 $date->{weekday};
1981
1982 if ($date->{month} == 0) {
1983 print "It's $date->{month}, happy new year!\n\n";
1984 }
1985
1986 print Dumper($date);
1987
1988 This will print:
1989
1990 Weekday = MONDAY (1)
1991
1992 It's JANUARY, happy new year!
1993
1994 $VAR1 = {
1995 'weekday' => 'MONDAY',
1996 'month' => 'JANUARY',
1997 'day' => 7,
1998 'year' => 2002
1999 };
2000
2001 "DisabledKeywords" => [ KEYWORDS ]
2002 This option allows you to selectively deactivate certain
keywords in the C parser. Some C compilers don't have the
complete ANSI keyword set; for example, they may not recognize
the keywords "const" or "void". If you do
2006
2007 typedef int void;
2008
2009 on such a compiler, this will usually be ok. But if you
2010 parse this with an ANSI compiler, it will be a syntax
2011 error. To parse the above code correctly, you have to dis‐
2012 able the "void" keyword in the Convert::Binary::C parser:
2013
2014 $c->DisabledKeywords([qw( void )]);
2015
2016 By default, the Convert::Binary::C parser will recognize
2017 the keywords "inline" and "restrict". If your compiler
2018 doesn't have these new keywords, it usually doesn't matter.
2019 Only if you're using the keywords as identifiers, like in
2020
2021 typedef struct inline {
2022 int a, b;
2023 } restrict;
2024
2025 you'll have to disable these ISO-C99 keywords:
2026
2027 $c->DisabledKeywords([qw( inline restrict )]);
2028
2029 The parser allows you to disable the following keywords:
2030
2031 asm
2032 auto
2033 const
2034 double
2035 enum
2036 extern
2037 float
2038 inline
2039 long
2040 register
2041 restrict
2042 short
2043 signed
2044 static
2045 unsigned
2046 void
2047 volatile
2048
2049 "KeywordMap" => { KEYWORD => TOKEN, ... }
2050 This option allows you to add new keywords to the parser.
2051 These new keywords can either be mapped to existing tokens
2052 or simply ignored. For example, recent versions of the GNU
2053 compiler recognize the keywords "__signed__" and "__exten‐
2054 sion__". The first one obviously is a synonym for
2055 "signed", while the second one is only a marker for a lan‐
2056 guage extension.
2057
2058 Using the preprocessor, you could of course do the follow‐
2059 ing:
2060
2061 $c->Define(qw( __signed__=signed __extension__= ));
2062
2063 However, the preprocessor symbols could be undefined or
2064 redefined in the code, and
2065
2066 #ifdef __signed__
2067 # undef __signed__
2068 #endif
2069
2070 typedef __extension__ __signed__ long long s_quad;
2071
2072 would generate a parse error, because "__signed__" is an
2073 unexpected identifier.
2074
2075 Instead of utilizing the preprocessor, you'll have to cre‐
2076 ate mappings for the new keywords directly in the parser
2077 using "KeywordMap". In the above example, you want to map
2078 "__signed__" to the built-in C keyword "signed" and ignore
2079 "__extension__". This could be done with the following
2080 code:
2081
2082 $c->KeywordMap({ __signed__ => 'signed',
2083 __extension__ => undef });
2084
2085 You can specify any valid identifier as hash key, and
2086 either a valid C keyword or "undef" as hash value. Having
2087 configured the object that way, you could parse even
2088
2089 #ifdef __signed__
2090 # undef __signed__
2091 #endif
2092
2093 typedef __extension__ __signed__ long long s_quad;
2094
2095 without problems.
2096
Note that "KeywordMap" and "DisabledKeywords" work together
perfectly. You could, for example, disable the "signed"
2099 keyword, but still have "__signed__" mapped to the original
2100 "signed" token:
2101
2102 $c->configure(DisabledKeywords => [ 'signed' ],
2103 KeywordMap => { __signed__ => 'signed' });
2104
2105 This would allow you to define
2106
2107 typedef __signed__ long signed;
2108
2109 which would normally be a syntax error because "signed"
2110 cannot be used as an identifier.
2111
2112 "UnsignedChars" => 0 ⎪ 1
2113 Use this boolean option if you want characters to be
2114 unsigned if specified without an explicit "signed" or
2115 "unsigned" type specifier. By default, characters are
2116 signed.
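As a small sketch of what this means in practice, the byte 0xFF can be unpacked as a plain "char" with both settings. The typedef name "byte_t" is only an illustration, and a separate object is created for each setting to keep the two cases independent.

```perl
use strict;
use warnings;
use Convert::Binary::C;

my %value;

for my $flag (0, 1) {
  my $c = Convert::Binary::C->new(UnsignedChars => $flag)
                            ->parse('typedef char byte_t;');
  $value{$flag} = $c->unpack('byte_t', "\xFF");
  print "UnsignedChars => $flag: $value{$flag}\n";
}
```

With signed characters the byte should be interpreted as -1, with unsigned characters as 255.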
2117
2118 "UnsignedBitfields" => 0 ⎪ 1
2119 Use this boolean option if you want bitfields to be
2120 unsigned if specified without an explicit "signed" or
2121 "unsigned" type specifier. By default, bitfields are
2122 signed.
2123
2124 "Warnings" => 0 ⎪ 1
2125 Use this boolean option if you want warnings to be issued
2126 during the parsing of source code. Currently, warnings are
2127 only reported by the preprocessor, so don't expect the out‐
2128 put to cover everything.
2129
2130 By default, warnings are turned off and only errors will be
2131 reported. However, even these errors are turned off if you
2132 run without the "-w" flag.
2133
2134 "HasCPPComments" => 0 ⎪ 1
2135 Use this option to turn C++ comments on or off. By default,
2136 C++ comments are enabled. Disabling C++ comments may be
2137 necessary if your code includes strange things like:
2138
2139 one = 4 //* <- divide */ 4;
2140 two = 2;
2141
2142 With C++ comments, the above will be interpreted as
2143
2144 one = 4
2145 two = 2;
2146
2147 which will obviously be a syntax error, but without C++
2148 comments, it will be interpreted as
2149
2150 one = 4 / 4;
2151 two = 2;
2152
2153 which is correct.
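The same effect can be made visible with an array dimension. This is only a sketch; the typedef "buf" is made up for the purpose of the example.

```perl
use strict;
use warnings;
use Convert::Binary::C;

# With C++ comments, everything after "//" on the first line is dropped
# and the dimension is 8. Without them, "/* <- divide */" is an ordinary
# C comment and the dimension is 8 / 2.
my $code = <<'ENDC';
typedef char buf[ 8 //* <- divide */ 2
                ];
ENDC

my %size;

for my $flag (1, 0) {
  my $c = Convert::Binary::C->new(HasCPPComments => $flag)->parse($code);
  $size{$flag} = $c->sizeof('buf');
  print "HasCPPComments => $flag: sizeof(buf) = $size{$flag}\n";
}
```

Following the explanation above, the size should be 8 with C++ comments enabled and 4 with C++ comments disabled.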
2154
2155 "HasMacroVAARGS" => 0 ⎪ 1
2156 Use this option to turn the "__VA_ARGS__" macro expansion
2157 on or off. If this is enabled (which is the default), you
2158 can use variable length argument lists in your preprocessor
2159 macros.
2160
2161 #define DEBUG( ... ) fprintf( stderr, __VA_ARGS__ )
2162
2163 There's normally no reason to turn that feature off.
2164
2165 "StdCVersion" => undef ⎪ INTEGER
2166 Use this option to change the value of the preprocessor's
2167 predefined "__STDC_VERSION__" macro. When set to "undef",
2168 the macro will not be defined.
2169
2170 "HostedC" => undef ⎪ 0 ⎪ 1
2171 Use this option to change the value of the preprocessor's
2172 predefined "__STDC_HOSTED__" macro. When set to "undef",
2173 the macro will not be defined.
2174
2175 "Include" => [ INCLUDES ]
2176 Use this option to set the include path for the internal
2177 preprocessor. The option value is a reference to an array
2178 of strings, each string holding a directory that should be
2179 searched for includes.
2180
2181 "Define" => [ DEFINES ]
2182 Use this option to define symbols in the preprocessor. The
2183 option value is, again, a reference to an array of strings.
2184 Each string can be either just a symbol or an assignment to
2185 a symbol. This is completely equivalent to what the "-D"
2186 option does for most preprocessors.
2187
2188 The following will define the symbol "FOO" and define "BAR"
2189 to be 12345:
2190
2191 $c->configure(Define => [qw( FOO BAR=12345 )]);
2192
2193 "Assert" => [ ASSERTIONS ]
2194 Use this option to make assertions in the preprocessor. If
2195 you don't know what assertions are, don't be concerned,
2196 since they're deprecated anyway. They are, however, used in
some systems' include files. The value is an array reference,
just like for the macro definitions. Only the way the
2199 assertions are defined is a bit different and mimics the
2200 way they are defined with the "#assert" directive:
2201
2202 $c->configure(Assert => ['foo(bar)']);
2203
2204 "OrderMembers" => 0 ⎪ 1
2205 When using "unpack" on compounds and iterating over the
2206 returned hash, the order of the compound members is gener‐
2207 ally not preserved due to the nature of hash tables. It is
not even guaranteed that the order is the same between different
runs of the same program. This can be very annoying if you simply
use Data::Dumper to dump your data structures and the compound
members always show up in a different order.
2212
2213 By setting "OrderMembers" to a non-zero value, all hashes
2214 returned by "unpack" are tied to a class that preserves the
2215 order of the hash keys. This way, all compound members
2216 will be returned in the correct order just as they are
2217 defined in your C code.
2218
2219 use Convert::Binary::C;
2220 use Data::Dumper;
2221
2222 $c = Convert::Binary::C->new->parse(<<'ENDC');
2223 struct test {
2224 char one;
2225 char two;
2226 struct {
2227 char never;
2228 char change;
2229 char this;
2230 char order;
2231 } three;
2232 char four;
2233 };
2234 ENDC
2235
2236 $data = "Convert";
2237
2238 $u1 = $c->unpack('test', $data);
2239 $c->OrderMembers(1);
2240 $u2 = $c->unpack('test', $data);
2241
2242 print Data::Dumper->Dump([$u1, $u2], [qw(u1 u2)]);
2243
2244 This will print something like:
2245
2246 $u1 = {
2247 'three' => {
2248 'change' => 118,
2249 'order' => 114,
2250 'this' => 101,
2251 'never' => 110
2252 },
2253 'one' => 67,
2254 'two' => 111,
2255 'four' => 116
2256 };
2257 $u2 = {
2258 'one' => 67,
2259 'two' => 111,
2260 'three' => {
2261 'never' => 110,
2262 'change' => 118,
2263 'this' => 101,
2264 'order' => 114
2265 },
2266 'four' => 116
2267 };
2268
2269 To be able to use this option, you have to install either
2270 the Tie::Hash::Indexed or the Tie::IxHash module. If both
2271 are installed, Convert::Binary::C will give preference to
2272 Tie::Hash::Indexed because it's faster.
2273
2274 When using this option, you should keep in mind that tied
2275 hashes are significantly slower and consume more memory
2276 than ordinary hashes, even when the class they're tied to
2277 is implemented efficiently. So don't turn this option on if
2278 you don't have to.
2279
2280 You can also influence hash member ordering by using the
2281 "CBC_ORDER_MEMBERS" environment variable.
2282
2283 "Bitfields" => { OPTION => VALUE, ... }
2284 Use this option to specify and configure a bitfield layout‐
2285 ing engine. You can choose an engine by passing its name to
2286 the "Engine" option, like:
2287
2288 $c->configure(Bitfields => { Engine => 'Generic' });
2289
2290 Each engine can have its own set of options, although cur‐
2291 rently none of them does.
2292
2293 You can choose between the following bitfield engines:
2294
2295 "Generic"
2296 This engine implements the behaviour of most UNIX C
2297 compilers, including GCC. It does not handle packed
2298 bitfields yet.
2299
2300 "Microsoft"
2301 This engine implements the behaviour of Microsoft's
2302 "cl" compiler. It should be fairly complete and can
2303 handle packed bitfields.
2304
2305 "Simple"
2306 This engine is only used for testing the bitfield in‐
2307 frastructure in Convert::Binary::C. There's usually no
2308 reason to use it.
2309
2310 You can reconfigure all options even after you have parsed some
2311 code. The changes will be applied to the already parsed defini‐
2312 tions. This works as long as array lengths are not affected by
2313 the changes. If you have Alignment and IntSize set to 4 and
2314 parse code like this
2315
2316 typedef struct {
2317 char abc;
2318 int day;
2319 } foo;
2320
2321 struct bar {
2322 foo zap[2*sizeof(foo)];
2323 };
2324
the array "zap" in "struct bar" will obviously have 16 elements.
If you then reconfigure the alignment to 1, the size of "foo"
becomes 5 instead of 8. While the alignment is adjusted correctly,
the number of elements in array "zap" will still be 16 and will
not be changed to 10.
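The behaviour described above can be observed with "sizeof". The following sketch simply parses the code from the example and reconfigures the alignment afterwards:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(Alignment => 4, IntSize => 4)
                          ->parse(<<'ENDC');
typedef struct {
  char abc;
  int  day;
} foo;

struct bar {
  foo zap[2*sizeof(foo)];
};
ENDC

my @before = ($c->sizeof('foo'), $c->sizeof('bar'));

# Only the layout is recomputed; the array dimension stays at 16.
$c->configure(Alignment => 1);

my @after = ($c->sizeof('foo'), $c->sizeof('bar'));

printf "Alignment 4: foo = %d, bar = %d\n", @before;
printf "Alignment 1: foo = %d, bar = %d\n", @after;
```

Since "zap" keeps its 16 elements, "struct bar" should shrink from 128 bytes (16 * 8) to 80 bytes (16 * 5), not to 50.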
2330
2331 parse
2332
2333 "parse" CODE
2334 Parses a string of valid C code. All enumeration, compound and
2335 type definitions are extracted. You can call the "parse" and
2336 "parse_file" methods as often as you like to add further defi‐
2337 nitions to the Convert::Binary::C object.
2338
2339 "parse" will throw an exception if an error occurs. On suc‐
2340 cess, the method returns a reference to its object.
2341
2342 See "Parsing C code" for an example.
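Since "parse" throws an exception on errors and returns its object on success, you can both guard it with "eval" and chain calls. A minimal sketch, with made-up struct names:

```perl
use strict;
use warnings;
use Convert::Binary::C;

# The closing brace is missing on purpose, so parse() should throw.
my $bad = Convert::Binary::C->new;
my $ok  = eval { $bad->parse('struct broken { int a;'); 1 };

print $ok ? "parsed\n" : "parse failed\n";

# parse() returns the object, so calls can be chained.
my $c = Convert::Binary::C->new
          ->parse('struct point { int x, y; };')
          ->parse('struct rect { struct point a, b; };');

my $kind = $c->def('rect');

print "rect is a '$kind'\n";
```

The text of the exception is available in $@ after the failed "eval"; it is not shown here because its exact wording depends on the parser.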
2343
2344 parse_file
2345
2346 "parse_file" FILE
2347 Parses a C source file. All enumeration, compound and type def‐
2348 initions are extracted. You can call the "parse" and
2349 "parse_file" methods as often as you like to add further defi‐
2350 nitions to the Convert::Binary::C object.
2351
2352 "parse_file" will search the include path given via the
2353 "Include" option for the file if it cannot find it in the cur‐
2354 rent directory.
2355
2356 "parse_file" will throw an exception if an error occurs. On
2357 success, the method returns a reference to its object.
2358
2359 See "Parsing C code" for an example.
2360
2361 When calling "parse" or "parse_file" multiple times, you may
2362 use types previously defined, but you are not allowed to rede‐
2363 fine types. The state of the preprocessor is also saved, so you
2364 may also use defines from a previous parse. This works only as
2365 long as the preprocessor is not reset. See "Preprocessor con‐
2366 figuration" for details.
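The following sketch illustrates how a define from one call to "parse" can be used in a later call, as long as the preprocessor configuration is left alone in between. The names "COUNT" and "vec" are only examples.

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(IntSize => 4);

# First call: only a preprocessor define.
$c->parse("#define COUNT 4\n");

# Second call: the define from the first call is still known.
$c->parse("typedef int vec[COUNT];\n");

my $size = $c->sizeof('vec');

print "sizeof(vec) = $size\n";
```

With four 4-byte integers, the reported size should be 16.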
2367
2368 When you're parsing C source files instead of C header files,
2369 note that local definitions are ignored. This means that type
2370 definitions hidden within functions will not be recognized by
2371 Convert::Binary::C. This is necessary because different func‐
2372 tions (even different blocks within the same function) can
2373 define types with the same name:
2374
2375 void my_func(int i)
2376 {
2377 if (i < 10)
2378 {
2379 enum digit { ONE, TWO, THREE } x = ONE;
2380 printf("%d, %d\n", i, x);
2381 }
2382 else
2383 {
2384 enum digit { THREE, TWO, ONE } x = ONE;
2385 printf("%d, %d\n", i, x);
2386 }
2387 }
2388
2389 The above is a valid piece of C code, but it's not possible for
2390 Convert::Binary::C to distinguish between the different defini‐
2391 tions of "enum digit", as they're only defined locally within
2392 the corresponding block.
2393
2394 clean
2395
2396 "clean" Clears all information that has been collected during previous
2397 calls to "parse" or "parse_file". You can use this method if
2398 you want to parse some entirely different code, but with the
2399 same configuration.
2400
2401 The "clean" method returns a reference to its object.
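As a short sketch, "clean" lets you reuse a type name that would otherwise count as a redefinition, while the configuration (here "IntSize") is kept. The typedef name "word" is only an example.

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(IntSize => 2)
                          ->parse('typedef int word;');

my $first = $c->sizeof('word');

# clean() drops the parsed definitions, but not the configuration,
# and returns the object, so parse() can be chained.
$c->clean->parse('typedef int word[2];');

my $second = $c->sizeof('word');

print "before clean: $first, after clean: $second\n";
```

The size should change from 2 to 4, because the second definition of "word" is still laid out with two-byte integers.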
2402
2403 clone
2404
2405 "clone" Makes the object return an exact independent copy of itself.
2406
2407 $c = new Convert::Binary::C Include => ['/usr/include'];
2408 $c->parse_file('definitions.c');
2409 $clone = $c->clone;
2410
The above code is technically equivalent to the code below, with
one caveat: using "sourcify" and "parse" might alter the order of
the parsed data, which would make methods such as "compound"
return the definitions in a different order.
2415
2416 $c = new Convert::Binary::C Include => ['/usr/include'];
2417 $c->parse_file('definitions.c');
2418 $clone = new Convert::Binary::C %{$c->configure};
2419 $clone->parse($c->sourcify);
2420
2421 Using "clone" is just a lot faster.
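That the copy is independent can be seen by reconfiguring only the clone. A minimal sketch, using a made-up typedef:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(IntSize => 4)
                          ->parse('typedef int word;');

my $clone = $c->clone;

# Changing the clone must not affect the original object.
$clone->configure(IntSize => 2);

my $orig_size  = $c->sizeof('word');
my $clone_size = $clone->sizeof('word');

print "original: $orig_size, clone: $clone_size\n";
```

The original should still report 4 bytes, while the clone reports 2.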
2422
2423 def
2424
2425 "def" NAME
2426 "def" TYPE
2427 If you need to know if a definition for a certain type name
2428 exists, use this method. You pass it the name of an enum,
2429 struct, union or typedef, and it will return a non-empty string
2430 being either "enum", "struct", "union", or "typedef" if there's
2431 a definition for the type in question, an empty string if
2432 there's no such definition, or "undef" if the name is com‐
2433 pletely unknown. If the type can be interpreted as a basic
2434 type, "basic" will be returned.
2435
2436 If you pass in a TYPE, the output will be slightly different.
2437 If the specified member exists, the "def" method will return
2438 "member". If the member doesn't exist, or if the type cannot
2439 have members, the empty string will be returned. Again, if the
2440 name of the type is completely unknown, "undef" will be
2441 returned. This may be useful if you want to check if a certain
2442 member exists within a compound, for example.
2443
2444 use Convert::Binary::C;
2445
2446 my $c = Convert::Binary::C->new->parse(<<'ENDC');
2447
2448 typedef struct __not not;
2449 typedef struct __not *ptr;
2450
2451 struct foo {
2452 enum bar *xxx;
2453 };
2454
2455 typedef int quad[4];
2456
2457 ENDC
2458
2459 for my $type (qw( not ptr foo bar xxx foo.xxx foo.abc xxx.yyy
2460 quad quad[3] quad[5] quad[-3] short[1] ),
2461 'unsigned long')
2462 {
2463 my $def = $c->def($type);
2464 printf "%-14s => %s\n",
2465 $type, defined $def ? "'$def'" : 'undef';
2466 }
2467
2468 The following would be returned by the "def" method:
2469
2470 not => ''
2471 ptr => 'typedef'
2472 foo => 'struct'
2473 bar => ''
2474 xxx => undef
2475 foo.xxx => 'member'
2476 foo.abc => ''
2477 xxx.yyy => undef
2478 quad => 'typedef'
2479 quad[3] => 'member'
2480 quad[5] => 'member'
2481 quad[-3] => 'member'
2482 short[1] => undef
2483 unsigned long => 'basic'
2484
2485 So, if "def" returns a non-empty string, you can safely use any
2486 other method with that type's name or with that member expres‐
2487 sion.
2488
2489 Concerning arrays, note that the index into an array doesn't
2490 need to be within the bounds of the array's definition, just
2491 like in C. In the above example, "quad[5]" and "quad[-3]" are
2492 valid members of the "quad" array, even though it is declared
2493 to have only four elements.
2494
2495 In cases where the typedef namespace overlaps with the names‐
2496 pace of enums/structs/unions, the "def" method will give pref‐
2497 erence to the typedef and will thus return the string "type‐
2498 def". You could however force interpretation as an enum, struct
2499 or union by putting "enum", "struct" or "union" in front of the
2500 type's name.
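A short sketch of such an overlap, using the common C idiom of naming a typedef after its struct:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new->parse(<<'ENDC');
typedef struct foo {
  int a;
} foo;
ENDC

my %def;

for my $type ('foo', 'struct foo') {
  $def{$type} = $c->def($type);
  printf "%-10s => '%s'\n", $type, $def{$type};
}
```

According to the rule above, the plain name should be reported as "typedef", and only the explicit "struct foo" as "struct".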
2501
2502 defined
2503
2504 "defined" MACRO
2505 You can use the "defined" method to find out if a certain macro
2506 is defined, just like you would use the "defined" operator of
2507 the preprocessor. For example, the following code
2508
2509 use Convert::Binary::C;
2510
2511 my $c = Convert::Binary::C->new->parse(<<'ENDC');
2512
2513 #define ADD(a, b) ((a) + (b))
2514
2515 #if 1
2516 # define DEFINED
2517 #else
2518 # define UNDEFINED
2519 #endif
2520
2521 ENDC
2522
2523 for my $macro (qw( ADD DEFINED UNDEFINED )) {
2524 my $not = $c->defined($macro) ? '' : ' not';
2525 print "Macro '$macro' is$not defined.\n";
2526 }
2527
2528 would print:
2529
2530 Macro 'ADD' is defined.
2531 Macro 'DEFINED' is defined.
2532 Macro 'UNDEFINED' is not defined.
2533
2534 You have to keep in mind that this works only as long as the
2535 preprocessor is not reset. See "Preprocessor configuration" for
2536 details.
2537
2538 pack
2539
2540 "pack" TYPE
2541 "pack" TYPE, DATA
2542 "pack" TYPE, DATA, STRING
2543 Use this method to pack a complex data structure into a binary
2544 string according to a type definition that has been previously
2545 parsed. DATA must be a scalar matching the type definition. C
2546 structures and unions are represented by references to Perl
2547 hashes, C arrays by references to Perl arrays.
2548
2549 use Convert::Binary::C;
2550 use Data::Dumper;
2551 use Data::Hexdumper;
2552
2553 $c = Convert::Binary::C->new( ByteOrder => 'BigEndian'
2554 , LongSize => 4
2555 , ShortSize => 2
2556 )
2557 ->parse(<<'ENDC');
2558 struct test {
2559 char ary[3];
2560 union {
2561 short word[2];
2562 long quad;
2563 } uni;
2564 };
2565 ENDC
2566
2567 Hashes don't have to contain a key for each compound member and
2568 arrays may be truncated:
2569
2570 $binary = $c->pack('test', { ary => [1, 2], uni => { quad => 42 } });
2571
Elements not defined in the Perl data structure will be set to
zero in the packed byte string. If you pass "undef" as the second
parameter or simply omit it, the whole string will be initialized
with zero bytes. On success, the packed byte string is returned.
2577
2578 print hexdump(data => $binary);
2579
2580 The above code would print:
2581
2582 0x0000 : 01 02 00 00 00 00 2A : ......*
2583
2584 You could also use "unpack" and dump the data structure.
2585
2586 $unpacked = $c->unpack('test', $binary);
2587 print Data::Dumper->Dump([$unpacked], ['unpacked']);
2588
2589 This would print:
2590
2591 $unpacked = {
2592 'uni' => {
2593 'word' => [
2594 0,
2595 42
2596 ],
2597 'quad' => 42
2598 },
2599 'ary' => [
2600 1,
2601 2,
2602 0
2603 ]
2604 };
2605
2606 If TYPE refers to a compound object, you may pack any member of
2607 that compound object. Simply add a member expression to the
2608 type name, just as you would access the member in C:
2609
2610 $array = $c->pack('test.ary', [1, 2, 3]);
2611 print hexdump(data => $array);
2612
2613 $value = $c->pack('test.uni.word[1]', 2);
2614 print hexdump(data => $value);
2615
2616 This would give you:
2617
2618 0x0000 : 01 02 03 : ...
2619 0x0000 : 00 02 : ..
2620
2621 Call "pack" with the optional STRING argument if you want to
2622 use an existing binary string to insert the data. If called in
2623 a void context, "pack" will directly modify the string you
2624 passed as the third argument. Otherwise, a copy of the string
2625 is created, and "pack" will modify and return the copy, so the
2626 original string will remain unchanged.
2627
2628 The 3-argument version may be useful if you want to change only
2629 a few members of a complex data structure without having to
2630 "unpack" everything, change the members, and then "pack" again
2631 (which could waste lots of memory and CPU cycles). So, instead
2632 of doing something like
2633
2634 $test = $c->unpack('test', $binary);
2635 $test->{uni}{quad} = 4711;
2636 $new = $c->pack('test', $test);
2637
to change the "uni.quad" member of $binary, you could simply do
either
2640
2641 $new = $c->pack('test', { uni => { quad => 4711 } }, $binary);
2642
2643 or
2644
2645 $c->pack('test', { uni => { quad => 4711 } }, $binary);
2646
while the latter would directly modify $binary. Besides this
2648 code being a lot shorter (and perhaps even more readable), it
2649 can be significantly faster if you're dealing with really big
2650 data blocks.
2651
2652 If the length of the input string is less than the size
2653 required by the type, the string (or its copy) is extended and
the extended part is initialized to zero. If the length is more
than the size required by the type, the string keeps its length,
and the extra bytes are left unchanged, both in the string itself
and in a returned copy.
2658
2659 $too_short = pack "C*", (1 .. 4);
2660 $too_long = pack "C*", (1 .. 20);
2661
2662 $c->pack('test', { uni => { quad => 0x4711 } }, $too_short);
2663 print "too_short:\n", hexdump(data => $too_short);
2664
2665 $copy = $c->pack('test', { uni => { quad => 0x4711 } }, $too_long);
2666 print "\ncopy:\n", hexdump(data => $copy);
2667
2668 This would print:
2669
2670 too_short:
2671 0x0000 : 01 02 03 00 00 47 11 : .....G.
2672
2673 copy:
2674 0x0000 : 01 02 03 00 00 47 11 08 09 0A 0B 0C 0D 0E 0F 10 : .....G..........
2675 0x0010 : 11 12 13 14 : ....
2676
2677 unpack
2678
2679 "unpack" TYPE, STRING
2680 Use this method to unpack a binary string and create an arbi‐
2681 trarily complex Perl data structure based on a previously
2682 parsed type definition.
2683
2684 use Convert::Binary::C;
2685 use Data::Dumper;
2686
2687 $c = Convert::Binary::C->new( ByteOrder => 'BigEndian'
2688 , LongSize => 4
2689 , ShortSize => 2
2690 )
2691 ->parse( <<'ENDC' );
2692 struct test {
2693 char ary[3];
2694 union {
2695 short word[2];
2696 long *quad;
2697 } uni;
2698 };
2699 ENDC
2700
2701 # Generate some binary dummy data
2702 $binary = pack "C*", 1 .. $c->sizeof('test');
2703
2704 On failure, e.g. if the specified type cannot be found, the
2705 method will throw an exception. On success, a reference to a
2706 complex Perl data structure is returned, which can directly be
2707 dumped using the Data::Dumper module:
2708
2709 $unpacked = $c->unpack('test', $binary);
2710 print Dumper($unpacked);
2711
2712 This would print:
2713
2714 $VAR1 = {
2715 'uni' => {
2716 'word' => [
2717 1029,
2718 1543
2719 ],
2720 'quad' => 67438087
2721 },
2722 'ary' => [
2723 1,
2724 2,
2725 3
2726 ]
2727 };
2728
2729 If TYPE refers to a compound object, you may unpack any member
2730 of that compound object. Simply add a member expression to the
2731 type name, just as you would access the member in C:
2732
2733 $binary2 = substr $binary, $c->offsetof('test', 'uni.word');
2734
2735 $unpack1 = $unpacked->{uni}{word};
2736 $unpack2 = $c->unpack('test.uni.word', $binary2);
2737
2738 print Data::Dumper->Dump([$unpack1, $unpack2], [qw(unpack1 unpack2)]);
2739
2740 You will find that the output is exactly the same for both
2741 $unpack1 and $unpack2:
2742
2743 $unpack1 = [
2744 1029,
2745 1543
2746 ];
2747 $unpack2 = [
2748 1029,
2749 1543
2750 ];
2751
2752 When "unpack" is called in list context, it will unpack as many
2753 elements as possible from STRING, including zero if STRING is
2754 not long enough.
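A minimal sketch of the list context behaviour, assuming a two-byte type named "u16" for illustration:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new(ByteOrder => 'BigEndian', ShortSize => 2)
                          ->parse('typedef unsigned short u16;');

# Seven bytes hold three complete u16 elements; the last byte is left over.
my @all = $c->unpack('u16', pack('C*', 1 .. 7));

# A single byte is too short for even one element.
my @none = $c->unpack('u16', "\x01");

print "all:  @all\n";
print "none: ", scalar(@none), " elements\n";
```

The first call should return the three values 258, 772 and 1286 (0x0102, 0x0304 and 0x0506), the second an empty list.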
2755
2756 initializer
2757
2758 "initializer" TYPE
2759 "initializer" TYPE, DATA
The "initializer" method can be used to retrieve an initializer
2761 string for a certain TYPE. This can be useful if you have to
2762 initialize only a couple of members in a huge compound type or
2763 if you simply want to generate initializers automatically.
2764
2765 struct date {
2766 unsigned year : 12;
2767 unsigned month: 4;
2768 unsigned day : 5;
2769 unsigned hour : 5;
2770 unsigned min : 6;
2771 };
2772
2773 typedef struct {
2774 enum { DATE, QWORD } type;
2775 short number;
2776 union {
2777 struct date date;
2778 unsigned long qword;
2779 } choice;
2780 } data;
2781
2782 Given the above code has been parsed
2783
2784 $init = $c->initializer('data');
2785 print "data x = $init;\n";
2786
2787 would print the following:
2788
2789 data x = {
2790 0,
2791 0,
2792 {
2793 {
2794 0,
2795 0,
2796 0,
2797 0,
2798 0
2799 }
2800 }
2801 };
2802
2803 You could directly put that into a C program, although it prob‐
2804 ably isn't very useful yet. It becomes more useful if you actu‐
2805 ally specify how you want to initialize the type:
2806
2807 $data = {
2808 type => 'QWORD',
2809 choice => {
2810 date => { month => 12, day => 24 },
2811 qword => 4711,
2812 },
2813 stuff => 'yes?',
2814 };
2815
2816 $init = $c->initializer('data', $data);
2817 print "data x = $init;\n";
2818
2819 This would print the following:
2820
2821 data x = {
2822 QWORD,
2823 0,
2824 {
2825 {
2826 0,
2827 12,
2828 24,
2829 0,
2830 0
2831 }
2832 }
2833 };
2834
2835 As only the first member of a "union" can be initialized,
2836 "choice.qword" is ignored. You will not be warned about the
2837 fact that you probably tried to initialize a member other than
2838 the first. This is considered a feature, because it allows you
2839 to use "unpack" to generate the initializer data:
2840
2841 $data = $c->unpack('data', $binary);
2842 $init = $c->initializer('data', $data);
2843
Since "unpack" unpacks all union members, you would otherwise
have to delete all but the first one before feeding it
into "initializer".
2847
2848 Also, "stuff" is ignored, because it actually isn't a member of
2849 "data". You won't be warned about that either.
2850
2851 sizeof
2852
2853 "sizeof" TYPE
2854 This method will return the size of a C type in bytes. If it
2855 cannot find the type, it will throw an exception.
2856
2857 If the type defines some kind of compound object, you may ask
2858 for the size of a member of that compound object:
2859
2860 $size = $c->sizeof('test.uni.word[1]');
2861
2862 This would set $size to 2.
2863
2864 typeof
2865
2866 "typeof" TYPE
2867 This method will return the type of a C member. While this
2868 only makes sense for compound types, it's legal to also use it
2869 for non-compound types. If it cannot find the type, it will
2870 throw an exception.
2871
2872 The "typeof" method can be used on any valid member, even on
2873 arrays or unnamed types. It will always return a string that
2874 holds the name (or in case of unnamed types only the class) of
2875 the type, optionally followed by a '*' character to indicate
2876 it's a pointer type, and optionally followed by one or more
2877 array dimensions if it's an array type. If the type is a bit‐
2878 field, the type name is followed by a colon and the number of
2879 bits.
2880
2881 struct test {
2882 char ary[3];
2883 union {
2884 short word[2];
2885 long *quad;
2886 } uni;
2887 struct {
2888 unsigned short six:6;
2889 unsigned short ten:10;
2890 } bits;
2891 };
2892
2893 Given the above C code has been parsed, calls to "typeof" would
2894 return the following values:
2895
2896 $c->typeof('test') => 'struct test'
2897 $c->typeof('test.ary') => 'char [3]'
2898 $c->typeof('test.uni') => 'union'
2899 $c->typeof('test.uni.quad') => 'long *'
2900 $c->typeof('test.uni.word') => 'short [2]'
2901 $c->typeof('test.uni.word[1]') => 'short'
2902 $c->typeof('test.bits') => 'struct'
2903 $c->typeof('test.bits.six') => 'unsigned short :6'
2904 $c->typeof('test.bits.ten') => 'unsigned short :10'
2905
2906 offsetof
2907
2908 "offsetof" TYPE, MEMBER
You can use "offsetof" just like the C macro of the same name.
It will simply return the offset (in bytes) of MEMBER relative
to TYPE.
2912
2913 use Convert::Binary::C;
2914
2915 $c = Convert::Binary::C->new( Alignment => 4
2916 , LongSize => 4
2917 , PointerSize => 4
2918 )
2919 ->parse(<<'ENDC');
2920 typedef struct {
2921 char abc;
2922 long day;
2923 int *ptr;
2924 } week;
2925
2926 struct test {
2927 week zap[8];
2928 };
2929 ENDC
2930
2931 @args = (
2932 ['test', 'zap[5].day' ],
2933 ['test.zap[2]', 'day' ],
2934 ['test', 'zap[5].day+1'],
2935 ['test', 'zap[-3].ptr' ],
2936 );
2937
2938 for (@args) {
2939 my $offset = eval { $c->offsetof(@$_) };
2940 printf "\$c->offsetof('%s', '%s') => $offset\n", @$_;
2941 }
2942
2943 The final loop will print:
2944
2945 $c->offsetof('test', 'zap[5].day') => 64
2946 $c->offsetof('test.zap[2]', 'day') => 4
2947 $c->offsetof('test', 'zap[5].day+1') => 65
2948 $c->offsetof('test', 'zap[-3].ptr') => -28
2949
2950 * The first iteration simply shows that the offset of
2951 "zap[5].day" is 64 relative to the beginning of "struct
2952 test".
2953
2954 * You may additionally specify a member for the type passed as
2955 the first argument, as shown in the second iteration.
2956
2957 * The offset suffix is also supported by "offsetof", so the
2958 third iteration will correctly print 65.
2959
2960 * The last iteration demonstrates that even out-of-bounds array
2961 indices are handled correctly, just as they are handled in C.
2962
2963 Unlike the C macro, "offsetof" also works on array types.
2964
2965 $offset = $c->offsetof('test.zap', '[3].ptr+2');
2966 print "offset = $offset";
2967
2968 This will print:
2969
2970 offset = 46
2971
2972 If TYPE is a compound, MEMBER may optionally be prefixed with a
2973 dot, so
2974
2975 printf "offset = %d\n", $c->offsetof('week', 'day');
2976 printf "offset = %d\n", $c->offsetof('week', '.day');
2977
2978 are both equivalent and will print
2979
2980 offset = 4
2981 offset = 4
2982
This allows you to
2984
2985 * use the C macro style, without a leading dot, and
2986
2987 * directly use the output of the "member" method, which
2988 includes a leading dot for compound types, as input for the
2989 MEMBER argument.
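The second point can be sketched as a round trip: the string returned by "member" is passed straight back to "offsetof". The offsets used here are just examples that fall on or inside actual members of "week".

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new( Alignment   => 4
                               , LongSize    => 4
                               , PointerSize => 4
                               )
                          ->parse(<<'ENDC');
typedef struct {
  char abc;
  long day;
  int *ptr;
} week;
ENDC

my %back;

for my $offset (0, 4, 8, 10) {
  # member() returns a member expression with a leading dot...
  my $member = $c->member('week', $offset);

  # ...which offsetof() accepts unchanged as its MEMBER argument.
  $back{$offset} = $c->offsetof('week', $member);

  print "$offset => '$member' => $back{$offset}\n";
}
```

Each line should end with the same offset it started with.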
2990
2991 member
2992
2993 "member" TYPE
2994 "member" TYPE, OFFSET
2995 You can think of "member" as being the reverse of the
2996 "offsetof" method. However, as this is more complex, there's no
2997 equivalent to "member" in the C language.
2998
2999 Usually this method is used if you want to retrieve the name of
3000 the member that is located at a specific offset of a previously
3001 parsed type.
3002
3003 use Convert::Binary::C;
3004
3005 $c = Convert::Binary::C->new( Alignment => 4
3006 , LongSize => 4
3007 , PointerSize => 4
3008 )
3009 ->parse(<<'ENDC');
3010 typedef struct {
3011 char abc;
3012 long day;
3013 int *ptr;
3014 } week;
3015
3016 struct test {
3017 week zap[8];
3018 };
3019 ENDC
3020
3021 for my $offset (24, 39, 69, 99) {
3022 print "\$c->member('test', $offset)";
3023 my $member = eval { $c->member('test', $offset) };
3024 print $@ ? "\n exception: $@" : " => '$member'\n";
3025 }
3026
3027 This will print:
3028
3029 $c->member('test', 24) => '.zap[2].abc'
3030 $c->member('test', 39) => '.zap[3]+3'
3031 $c->member('test', 69) => '.zap[5].ptr+1'
3032 $c->member('test', 99)
3033 exception: Offset 99 out of range (0 <= offset < 96)
3034
3035 * The output of the first iteration is obvious. The member
3036 "zap[2].abc" is located at offset 24 of "struct test".
3037
3038 * In the second iteration, the offset points into a region of
3039 padding bytes and thus no member of "week" can be named.
3040 Instead of a member name the offset relative to "zap[3]" is
3041 appended.
3042
3043 * In the third iteration, the offset points to "zap[5].ptr".
3044 However, "zap[5].ptr" is located at 68, not at 69, and thus
3045 the remaining offset of 1 is also appended.
3046
3047 * The last iteration causes an exception because the offset of
3048 99 is not valid for "struct test" since the size of "struct
3049 test" is only 96. You might argue that this is inconsistent,
3050 since "offsetof" can also handle out-of-bounds array members.
3051 But as soon as you have more than one level of array nesting,
3052 there's an infinite number of out-of-bounds members for a
3053 single given offset, so it would be impossible to return a
3054 list of all members.
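
The strings returned by "member" follow a fixed shape: a member path, optionally followed by "+N" for the remaining byte offset. The following is a minimal sketch in plain Perl that takes such a string apart. The helper name split_member is an assumption made for illustration and is not part of Convert::Binary::C.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Convert::Binary::C): split a string
# as returned by "member" into the member path and the trailing byte
# offset. A missing "+N" suffix is reported as an offset of 0.
sub split_member {
    my ($member) = @_;
    my ($path, $extra) = $member =~ /^(.*?)(?:\+(\d+))?$/
        or die "unexpected member string '$member'";
    return ($path, defined $extra ? $extra : 0);
}

# The three member strings from the example output above.
for my $m ('.zap[2].abc', '.zap[3]+3', '.zap[5].ptr+1') {
    my ($path, $extra) = split_member($m);
    print "$m => path '$path', remaining offset $extra\n";
}
```

A remaining offset of 0 means the given offset points exactly at the start of the named member.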
3055
3056 You can additionally specify a member for the type passed as
3057 the first argument:
3058
3059 $member = $c->member('test.zap[2]', 6);
3060 print $member;
3061
3062 This will print:
3063
3064 .day+2
3065
3066 Like "offsetof", "member" also works on array types:
3067
3068 $member = $c->member('test.zap', 42);
3069 print $member;
3070
3071 This will print:
3072
3073 [3].day+2
3074
3075 While the behaviour for "struct"s is quite obvious, the behav‐
3076 iour for "union"s is rather tricky. As a single offset usually
3077 references more than one member of a union, there are certain
3078 rules that the algorithm uses for determining the best member.
3079
3080 * The first non-compound member that is referenced without an
3081 offset has the highest priority.
3082
3083 * If no member is referenced without an offset, the first non-
3084 compound member that is referenced with an offset will be
3085 returned.
3086
3087 * Otherwise the first padding region that is encountered will
3088 be taken.
3089
3090 As an example, given 4-byte-alignment and the union
3091
3092 union choice {
3093 struct {
3094 char color[2];
3095 long size;
3096 char taste;
3097 } apple;
3098 char grape[3];
3099 struct {
3100 long weight;
3101 short price[3];
3102 } melon;
3103 };
3104
3105 the "member" method would return what is shown in the Member
3106 column of the following table. The Type column shows the result
3107 of the "typeof" method when passing the corresponding member.
3108
3109 Offset Member Type
3110 --------------------------------------
3111 0 .apple.color[0] 'char'
3112 1 .apple.color[1] 'char'
3113 2 .grape[2] 'char'
3114 3 .melon.weight+3 'long'
3115 4 .apple.size 'long'
3116 5 .apple.size+1 'long'
3117 6 .melon.price[1] 'short'
3118 7 .apple.size+3 'long'
3119 8 .apple.taste 'char'
3120 9 .melon.price[2]+1 'short'
3121 10 .apple+10 'struct'
3122 11 .apple+11 'struct'
3123
3124 It's like having a stack of all the union members and looking
3125 through the stack for the shiniest piece you can see. The
3126 beginning of a member (denoted by uppercase letters) is always
3127 shinier than the rest of a member, while padding regions
3128 (denoted by dashes) aren't shiny at all.
3129
3130 Offset 0 1 2 3 4 5 6 7 8 9 10 11
3131 -------------------------------------------------------
3132 apple (C) (C) - - (S) (s) s (s) (T) - (-) (-)
3133 grape G G (G)
3134 melon W w w (w) P p (P) p P (p) - -
3135
3136 If you look through that stack from top to bottom, you'll end
3137 up at the parenthesized members.
3138
3139 Alternatively, if you're not only interested in the best mem‐
3140 ber, you can call "member" in list context, which makes it
3141 return all members referenced by the given offset.
3142
3143 Offset Member Type
3144 --------------------------------------
3145 0 .apple.color[0] 'char'
3146 .grape[0] 'char'
3147 .melon.weight 'long'
3148 1 .apple.color[1] 'char'
3149 .grape[1] 'char'
3150 .melon.weight+1 'long'
3151 2 .grape[2] 'char'
3152 .melon.weight+2 'long'
3153 .apple+2 'struct'
3154 3 .melon.weight+3 'long'
3155 .apple+3 'struct'
3156 4 .apple.size 'long'
3157 .melon.price[0] 'short'
3158 5 .apple.size+1 'long'
3159 .melon.price[0]+1 'short'
3160 6 .melon.price[1] 'short'
3161 .apple.size+2 'long'
3162 7 .apple.size+3 'long'
3163 .melon.price[1]+1 'short'
3164 8 .apple.taste 'char'
3165 .melon.price[2] 'short'
3166 9 .melon.price[2]+1 'short'
3167 .apple+9 'struct'
3168 10 .apple+10 'struct'
3169 .melon+10 'struct'
3170 11 .apple+11 'struct'
3171 .melon+11 'struct'
3172
3173 The first member returned is always the best member. The other
3174 members are sorted according to the rules given above. This
3175 means that members referenced without an offset are followed by
3176 members referenced with an offset. Padding regions will be at
3177 the end.
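
The ordering rules can be sketched in plain Perl as follows. This is an illustration only and not the module's actual implementation. It assumes the candidates are given as [member, type] pairs in declaration order, and that a compound type in the Type column indicates a padding region. The name rank_members is made up for this sketch.

```perl
use strict;
use warnings;

# Sketch of the ordering rules described above (not the module's code).
# Rank 0: non-compound member referenced without an offset
# Rank 1: non-compound member referenced with an offset
# Rank 2: padding region (compound type plus offset)
sub rank_members {
    my @cand = @_;
    my $rank = sub {
        my ($member, $type) = @{ $_[0] };
        return 2 if $type eq 'struct' || $type eq 'union';
        return $member =~ /\+\d+$/ ? 1 : 0;
    };
    # Sort indices by rank; ties keep the original (declaration) order.
    my @order = sort { $rank->($cand[$a]) <=> $rank->($cand[$b]) || $a <=> $b }
                0 .. $#cand;
    return map { $cand[$_][0] } @order;
}

# Candidates for offset 2 of "union choice", in declaration order.
my @members = rank_members(
    [ '.apple+2',        'struct' ],
    [ '.grape[2]',       'char'   ],
    [ '.melon.weight+2', 'long'   ],
);
print "$_\n" for @members;
```

For offset 2 this yields the same order as the table: ".grape[2]", then ".melon.weight+2", then ".apple+2".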
3178
3179 If OFFSET is not given in the method call, "member" will return
3180 a list of all possible members of TYPE.
3181
3182 print "$_\n" for $c->member('choice');
3183
3184 This will print:
3185
3186 .apple.color[0]
3187 .apple.color[1]
3188 .apple.size
3189 .apple.taste
3190 .grape[0]
3191 .grape[1]
3192 .grape[2]
3193 .melon.weight
3194 .melon.price[0]
3195 .melon.price[1]
3196 .melon.price[2]
3197
3198 In scalar context, the number of possible members is returned.
3199
3200 tag
3201
3202 "tag" TYPE
3203 "tag" TYPE, TAG
3204 "tag" TYPE, TAG1 => VALUE1, TAG2 => VALUE2, ...
3205 The "tag" method can be used to tag properties to a TYPE. It's
3206 a bit like having "configure" for individual types.
3207
3208 See "USING TAGS" for an example.
3209
3210 Note that while you can tag whole types as well as compound
3211 members, it is not possible to tag array members, i.e. you can‐
3212 not treat, for example, "a[1]" and "a[2]" differently.
3213
3214 Also note that in code like this
3215
3216 struct test {
3217 int a;
3218 struct {
3219 int x;
3220 } b, c;
3221 };
3222
3223 if you tag "test.b.x", this will also tag "test.c.x" implic‐
3224 itly.
3225
3226 It is also possible to tag basic types if you really want to do
3227 that, for example:
3228
3229 $c->tag('int', Format => 'Binary');
3230
3231 To remove a tag from a type, you can either set that tag to
3232 "undef", for example
3233
3234 $c->tag('test', Hooks => undef);
3235
3236 or use "untag".
3237
3238 To see if a tag is attached to a type or to get the value of a
3239 tag, pass only the type and tag name to "tag":
3240
3241 $c->tag('test.a', Format => 'Binary');
3242
3243 $hooks = $c->tag('test.a', 'Hooks');
3244 $format = $c->tag('test.a', 'Format');
3245
3246 This will give you:
3247
3248 $hooks = undef;
3249 $format = 'Binary';
3250
3251 To see which tags are attached to a type, pass only the type.
3252 The "tag" method will now return a hash reference containing
3253 all tags attached to the type:
3254
3255 $tags = $c->tag('test.a');
3256
3257 This will give you:
3258
3259 $tags = {
3260 'Format' => 'Binary'
3261 };
3262
3263 "tag" will throw an exception if an error occurs. If called as
3264 a 'set' method, it will return a reference to its object,
3265 allowing you to chain together consecutive method calls.
3266
3267 Note that when a compound is inlined, tags attached to the
3268 inlined compound are ignored, for example:
3269
3270 $c->parse(<<ENDC);
3271 struct header {
3272 int id;
3273 int len;
3274 unsigned flags;
3275 };
3276
3277 struct message {
3278 struct header;
3279 short samples[32];
3280 };
3281 ENDC
3282
3283 for my $type (qw( header message header.len )) {
3284 $c->tag($type, Hooks => { unpack => sub { print "unpack: $type\n"; @_ } });
3285 }
3286
3287 for my $type (qw( header message )) {
3288 print "[unpacking $type]\n";
3289 $u = $c->unpack($type, $data);
3290 }
3291
3292 This will print:
3293
3294 [unpacking header]
3295 unpack: header.len
3296 unpack: header
3297 [unpacking message]
3298 unpack: header.len
3299 unpack: message
3300
3301 As you can see from the above output, tags attached to members
3302 of inlined compounds ("header.len") are still handled.
3303
3304 The following tags can be configured:
3305
3306 "Format" => 'Binary' | 'String'
3307 The "Format" tag allows you to control the way binary data
3308 is converted by "pack" and "unpack".
3309
3310 If you tag a "TYPE" as "Binary", it will not be converted
3311 at all, i.e. it will be passed through as a binary string.
3312
3313 If you tag it as "String", it will be treated like a null-
3314 terminated C string, i.e. "unpack" will convert the C
3315 string to a Perl string and vice versa.
3316
3317 See "The Format Tag" for an example.
3318
3319 "ByteOrder" => 'BigEndian' | 'LittleEndian'
3320 The "ByteOrder" tag allows you to explicitly set the byte
3321 order of a TYPE.
3322
3323 See "The ByteOrder Tag" for an example.
3324
3325 "Dimension" => '*'
3326 "Dimension" => VALUE
3327 "Dimension" => MEMBER
3328 "Dimension" => SUB
3329 "Dimension" => [ SUB, ARGS ]
3330 The "Dimension" tag allows you to alter the size of an
3331 array dynamically.
3332
3333 You can tag fixed size arrays as being flexible using '*'.
3334 This is useful if you cannot use flexible array members in
3335 your source code.
3336
3337 $c->tag('type.array', Dimension => '*');
3338
3339 You can also tag an array to have a fixed size different
3340 from the one it was originally declared with.
3341
3342 $c->tag('type.array', Dimension => 42);
3343
3344 If the array is a member of a compound, you can also tag it
3345 to have a size corresponding to the value of another
3346 member in that compound.
3347
3348 $c->tag('type.array', Dimension => 'count');
3349
3350 Finally, you can specify a subroutine that is called when
3351 the size of the array needs to be determined.
3352
3353 $c->tag('type.array', Dimension => \&get_count);
3354
3355 By default, and if the array is a compound member, that
3356 subroutine will be passed a reference to the hash storing
3357 the data for the compound.
3358
3359 You can also instruct Convert::Binary::C to pass additional
3360 arguments to the subroutine by passing an array reference
3361 instead of the subroutine reference. This array contains
3362 the subroutine reference as well as a list of arguments.
3363 It is possible to define certain special arguments using
3364 the "arg" method.
3365
3366 $c->tag('type.array', Dimension => [\&get_count, $c->arg('SELF'), 42]);
3367
3368 See "The Dimension Tag" for various examples.
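
A dimension callback is an ordinary Perl subroutine, so it can be tried out on its own. The sketch below assumes the default calling convention described above (one argument, a reference to the hash holding the compound's data). The subroutine body and the member names "length" and "header_size" are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical callback for Dimension => \&get_count. By default it
# receives a reference to the hash holding the data of the enclosing
# compound. The member names used here are assumptions.
sub get_count {
    my ($compound) = @_;
    my $n = $compound->{length} - $compound->{header_size};
    return $n > 0 ? $n : 0;   # never report a negative dimension
}

# Exercise the callback without any binary data by passing a hash
# that mimics the data of the compound.
print get_count({ length => 10, header_size => 4 }), "\n";
print get_count({ length => 2,  header_size => 4 }), "\n";
```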
3369
3370 "Hooks" => { HOOK => SUB, HOOK => [ SUB, ARGS ], ... }, ...
3371 The "Hooks" tag allows you to register subroutines as
3372 hooks.
3373
3374 Hooks are called whenever a certain "TYPE" is packed or
3375 unpacked. Hooks are currently considered an experimental
3376 feature.
3377
3378 "HOOK" can be one of the following:
3379
3380 pack
3381 unpack
3382 pack_ptr
3383 unpack_ptr
3384
3385 "pack" and "unpack" hooks are called when processing their
3386 "TYPE", while "pack_ptr" and "unpack_ptr" hooks are called
3387 when processing pointers to their "TYPE".
3388
3389 "SUB" is a reference to a subroutine that usually takes one
3390 input argument, processes it and returns one output argu‐
3391 ment.
3392
3393 Alternatively, you can pass a custom list of arguments to
3394 the hook by using an array reference instead of "SUB" that
3395 holds the subroutine reference in the first element and the
3396 arguments to be passed to the subroutine as the other ele‐
3397 ments. This way, you can even pass special arguments to
3398 the hook using the "arg" method.
3399
3400 Here are a few examples for registering hooks:
3401
3402 $c->tag('ObjectType', Hooks => {
3403 pack => \&obj_pack,
3404 unpack => \&obj_unpack
3405 });
3406
3407 $c->tag('ProtocolId', Hooks => {
3408 unpack => sub { $protos[$_[0]] }
3409 });
3410
3411 $c->tag('ProtocolId', Hooks => {
3412 unpack_ptr => [sub {
3413 sprintf "$_[0]:{0x%X}", $_[1]
3414 },
3415 $c->arg('TYPE', 'DATA')
3416 ],
3417 });
3418
3419 Note that the above example registers both an "unpack" hook
3420 and an "unpack_ptr" hook for "ProtocolId" with two separate
3421 calls to "tag". As long as you don't explicitly overwrite a
3422 previously registered hook, it won't be modified or removed
3423 by registering other hooks for the same "TYPE".
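
Since each hook takes one value and returns one value, a pair of "pack" and "unpack" hooks can be checked against each other before they are registered. The following sketch uses a made-up protocol table; the names @protos, %proto_id and %hooks are assumptions for this example.

```perl
use strict;
use warnings;

# Hypothetical lookup table mapping numeric protocol IDs to names.
my @protos = qw( ICMP TCP UDP );
my %proto_id;
@proto_id{@protos} = 0 .. $#protos;

# Each hook takes one value and returns one value.
my %hooks = (
    unpack => sub { $protos[ $_[0] ] },      # number -> name
    pack   => sub { $proto_id{ $_[0] } },    # name   -> number
);

# With a Convert::Binary::C object $c, these could be registered as:
#   $c->tag('ProtocolId', Hooks => \%hooks);

# Called directly, the two hooks should be inverses of each other.
my $name = $hooks{unpack}->(1);
my $id   = $hooks{pack}->($name);
print "$id <=> $name\n";
```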
3424
3425 To remove all registered hooks for a type, simply remove
3426 the "Hooks" tag:
3427
3428 $c->untag('ProtocolId', 'Hooks');
3429
3430 To remove only a single hook, pass "undef" as "SUB" instead
3431 of a subroutine reference:
3432
3433 $c->tag('ObjectType', Hooks => { pack => undef });
3434
3435 If all hooks are removed, the whole "Hooks" tag is removed.
3436
3437 See "The Hooks Tag" for examples on how to use hooks.
3438
3439 untag
3440
3441 "untag" TYPE
3442 "untag" TYPE, TAG1, TAG2, ...
3443 Use the "untag" method to remove one, more, or all tags from a
3444 type. If you don't pass any tag names, all tags attached to the
3445 type will be removed. Otherwise only the listed tags will be
3446 removed.
3447
3448 See "USING TAGS" for an example.
3449
3450 arg
3451
3452 "arg" 'ARG', ...
3453 Creates placeholders for special arguments to be passed to
3454 hooks or other subroutines. These arguments are currently:
3455
3456 "SELF"
3457 A reference to the calling Convert::Binary::C object. This
3458 may be useful if you need to work with the object inside
3459 the subroutine.
3460
3461 "TYPE"
3462 The name of the type that is currently being processed by
3463 the hook.
3464
3465 "DATA"
3466 The data argument that is passed to the subroutine.
3467
3468 "HOOK"
3469 The kind of hook the subroutine has been called as, for
3470 example "pack" or "unpack_ptr".
3471
3472 "arg" will return a placeholder for each argument it is being
3473 passed. Note that not all arguments may be supported depending
3474 on the context of the subroutine.
3475
3476 dependencies
3477
3478 "dependencies"
3479 After some code has been parsed using either the "parse" or
3480 "parse_file" methods, the "dependencies" method can be used to
3481 retrieve information about all files that the object depends
3482 on, i.e. all files that have been parsed.
3483
3484 In scalar context, the method returns a hash reference. Each
3485 key is the name of a file. The values are again hash refer‐
3486 ences, each of which holds the size, modification time (mtime),
3487 and change time (ctime) of the file at the moment it was
3488 parsed.
3489
3490 use Convert::Binary::C;
3491 use Data::Dumper;
3492
3493 #----------------------------------------------------------
3494 # Create object, set include path, parse 'string.h' header
3495 #----------------------------------------------------------
3496 my $c = Convert::Binary::C->new
3497 ->Include('/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include',
3498 '/usr/include')
3499 ->parse_file('string.h');
3500
3501 #----------------------------------------------------------
3502 # Get dependencies of the object, extract dependency files
3503 #----------------------------------------------------------
3504 my $depend = $c->dependencies;
3505 my @files = keys %$depend;
3506
3507 #-----------------------------
3508 # Dump dependencies and files
3509 #-----------------------------
3510 print Data::Dumper->Dump([$depend, \@files],
3511 [qw( depend *files )]);
3512
3513 The above code would print something like this:
3514
3515 $depend = {
3516 '/usr/include/features.h' => {
3517 'ctime' => 1196609327,
3518 'mtime' => 1196609232,
3519 'size' => 11688
3520 },
3521 '/usr/include/gnu/stubs-32.h' => {
3522 'ctime' => 1196609327,
3523 'mtime' => 1196609305,
3524 'size' => 624
3525 },
3526 '/usr/include/sys/cdefs.h' => {
3527 'ctime' => 1196609327,
3528 'mtime' => 1196609269,
3529 'size' => 11773
3530 },
3531 '/usr/include/gnu/stubs.h' => {
3532 'ctime' => 1196609327,
3533 'mtime' => 1196609232,
3534 'size' => 315
3535 },
3536 '/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include/stddef.h' => {
3537 'ctime' => 1203359674,
3538 'mtime' => 1203357922,
3539 'size' => 12695
3540 },
3541 '/usr/include/string.h' => {
3542 'ctime' => 1196609327,
3543 'mtime' => 1196609262,
3544 'size' => 16438
3545 },
3546 '/usr/include/bits/wordsize.h' => {
3547 'ctime' => 1196609327,
3548 'mtime' => 1196609257,
3549 'size' => 873
3550 }
3551 };
3552 @files = (
3553 '/usr/include/features.h',
3554 '/usr/include/gnu/stubs-32.h',
3555 '/usr/include/sys/cdefs.h',
3556 '/usr/include/gnu/stubs.h',
3557 '/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include/stddef.h',
3558 '/usr/include/string.h',
3559 '/usr/include/bits/wordsize.h'
3560 );
3561
3562 In list context, the method returns the names of all files that
3563 have been parsed, i.e. the following lines are equivalent:
3564
3565 @files = keys %{$c->dependencies};
3566 @files = $c->dependencies;
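
The recorded size and timestamps make it possible to detect whether any parsed file has changed since. The following is a minimal sketch, assuming a hash reference shaped like $depend above. The helper name changed_files is an assumption and not part of the module.

```perl
use strict;
use warnings;

# Sketch (helper name is an assumption): return the files from a
# dependency hash, as returned by "dependencies" in scalar context,
# whose size, mtime or ctime no longer match the recorded values.
# Files that can no longer be stat'ed are reported as changed, too.
sub changed_files {
    my ($depend) = @_;
    my @changed;
    for my $file (sort keys %$depend) {
        my @st  = stat $file;
        my $old = $depend->{$file};
        push @changed, $file
            if !@st
            || $st[7]  != $old->{size}
            || $st[9]  != $old->{mtime}
            || $st[10] != $old->{ctime};
    }
    return @changed;
}

# Usage with a Convert::Binary::C object $c that has parsed some code:
#   my @changed = changed_files(scalar $c->dependencies);
#   print "reparse needed: @changed\n" if @changed;
```

The explicit "scalar" is needed in the usage line because an argument list imposes list context, in which "dependencies" returns only the file names.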
3567
3568 sourcify
3569
3570 "sourcify"
3571 "sourcify" CONFIG
3572 Returns a string that holds the C source code necessary to rep‐
3573 resent all parsed C data structures.
3574
3575 use Convert::Binary::C;
3576
3577 $c = new Convert::Binary::C;
3578 $c->parse(<<'END');
3579
3580 #define ADD(a, b) ((a) + (b))
3581 #define NUMBER 42
3582
3583 typedef struct _mytype mytype;
3584
3585 struct _mytype {
3586 union {
3587 int iCount;
3588 enum count *pCount;
3589 } counter;
3590 #pragma pack( push, 1 )
3591 struct {
3592 char string[NUMBER];
3593 int array[NUMBER/sizeof(int)];
3594 } storage;
3595 #pragma pack( pop )
3596 mytype *next;
3597 };
3598
3599 enum count { ZERO, ONE, TWO, THREE };
3600
3601 END
3602
3603 print $c->sourcify;
3604
3605 The above code would print something like this:
3606
3607 /* typedef predeclarations */
3608
3609 typedef struct _mytype mytype;
3610
3611 /* defined enums */
3612
3613 enum count
3614 {
3615 ZERO,
3616 ONE,
3617 TWO,
3618 THREE
3619 };
3620
3621 /* defined structs and unions */
3622
3623 struct _mytype
3624 {
3625 union
3626 {
3627 int iCount;
3628 enum count *pCount;
3629 } counter;
3630 #pragma pack(push, 1)
3631 struct
3632 {
3633 char string[42];
3634 int array[10];
3635 } storage;
3636 #pragma pack(pop)
3637 mytype *next;
3638 };
3639
3640 The purpose of the "sourcify" method is to enable some kind of
3641 platform-independent caching. The C code generated by
3642 "sourcify" can be parsed by any standard C compiler and, of
3643 course, by the Convert::Binary::C parser. However, the code may
3644 be significantly shorter than the code that was originally
3645 parsed.
3646
3647 When parsing a typical header file, it's easily possible that
3648 you need to open dozens of other files that are included from
3649 that file, and end up parsing several hundred kilobytes of C
3650 code. Since most of it is usually preprocessor directives,
3651 function prototypes and comments, the "sourcify" function
3652 strips this down to a few kilobytes. Saving the "sourcify"
3653 string and parsing it next time instead of the original code
3654 may be a lot faster.
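
The caching idea can be sketched as follows. The helper name parse_cached and the cache file handling are assumptions made for this example; only the "parse", "parse_file" and "sourcify" methods are called on the object. The sketch does not check whether the cache is out of date.

```perl
use strict;
use warnings;

# Sketch of the caching idea described above. $c is expected to be a
# Convert::Binary::C object. Returns 'cache' or 'source' to indicate
# where the definitions came from.
sub parse_cached {
    my ($c, $header, $cache_file) = @_;

    if (-e $cache_file) {
        # Cache hit: parse the previously saved, stripped-down code.
        open my $in, '<', $cache_file or die "$cache_file: $!";
        my $code = do { local $/; <$in> } // '';
        close $in;
        $c->parse($code);
        return 'cache';
    }

    # Cache miss: parse the real header, then save the sourcify output.
    $c->parse_file($header);
    open my $out, '>', $cache_file or die "$cache_file: $!";
    print {$out} $c->sourcify;
    close $out or die "$cache_file: $!";
    return 'source';
}

# Usage (paths are examples only):
#   my $c = Convert::Binary::C->new->Include('/usr/include');
#   my $from = parse_cached($c, 'time.h', '/tmp/time_h.cache');
```

Note that the default "sourcify" output does not contain macro definitions, so a cache written this way only preserves the type definitions.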
3655
3656 The "sourcify" method takes a hash reference as an optional
3657 argument. It can be used to tweak the method's output. The
3658 following options can be configured.
3659
3660 "Context" => 0 | 1
3661 Turns preprocessor context information on or off. If this
3662 is turned on, "sourcify" will insert "#line" preprocessor
3663 directives in its output. So in the above example
3664
3665 print $c->sourcify({ Context => 1 });
3666
3667 would print:
3668
3669 /* typedef predeclarations */
3670
3671 typedef struct _mytype mytype;
3672
3673 /* defined enums */
3674
3675 #line 21 "[buffer]"
3676 enum count
3677 {
3678 ZERO,
3679 ONE,
3680 TWO,
3681 THREE
3682 };
3683
3684 /* defined structs and unions */
3685
3686 #line 7 "[buffer]"
3687 struct _mytype
3688 {
3689 #line 8 "[buffer]"
3690 union
3691 {
3692 int iCount;
3693 enum count *pCount;
3694 } counter;
3695 #pragma pack(push, 1)
3696 #line 13 "[buffer]"
3697 struct
3698 {
3699 char string[42];
3700 int array[10];
3701 } storage;
3702 #pragma pack(pop)
3703 mytype *next;
3704 };
3705
3706 Note that "[buffer]" refers to the here-doc buffer when
3707 using "parse".
3708
3709 "Defines" => 0 | 1
3710 Turn this on if you want all the defined macros to be part
3711 of the source code output. Given the example code above
3712
3713 print $c->sourcify({ Defines => 1 });
3714
3715 would print:
3716
3717 /* typedef predeclarations */
3718
3719 typedef struct _mytype mytype;
3720
3721 /* defined enums */
3722
3723 enum count
3724 {
3725 ZERO,
3726 ONE,
3727 TWO,
3728 THREE
3729 };
3730
3731 /* defined structs and unions */
3732
3733 struct _mytype
3734 {
3735 union
3736 {
3737 int iCount;
3738 enum count *pCount;
3739 } counter;
3740 #pragma pack(push, 1)
3741 struct
3742 {
3743 char string[42];
3744 int array[10];
3745 } storage;
3746 #pragma pack(pop)
3747 mytype *next;
3748 };
3749
3750 /* preprocessor defines */
3751
3752 #define ADD(a, b) ((a) + (b))
3753 #define NUMBER 42
3754
3755 The macro definitions always appear at the end of the
3756 source code. The order of the macro definitions is unde‐
3757 fined.
3758
3759 The following methods can be used to retrieve information about the
3760 definitions that have been parsed. The examples given in the descrip‐
3761 tion for "enum", "compound" and "typedef" all assume this piece of C
3762 code has been parsed:
3763
3764 #define ABC_SIZE 2
3765 #define MULTIPLY(x, y) ((x)*(y))
3766
3767 #ifdef ABC_SIZE
3768 # define DEFINED
3769 #else
3770 # define NOT_DEFINED
3771 #endif
3772
3773 typedef unsigned long U32;
3774 typedef void *any;
3775
3776 enum __socket_type
3777 {
3778 SOCK_STREAM = 1,
3779 SOCK_DGRAM = 2,
3780 SOCK_RAW = 3,
3781 SOCK_RDM = 4,
3782 SOCK_SEQPACKET = 5,
3783 SOCK_PACKET = 10
3784 };
3785
3786 struct STRUCT_SV {
3787 void *sv_any;
3788 U32 sv_refcnt;
3789 U32 sv_flags;
3790 };
3791
3792 typedef union {
3793 int abc[ABC_SIZE];
3794 struct xxx {
3795 int a;
3796 int b;
3797 } ab[3][4];
3798 any ptr;
3799 } test;
3800
3801 enum_names
3802
3803 "enum_names"
3804 Returns a list of identifiers of all defined enumeration
3805 objects. Enumeration objects don't necessarily have an identi‐
3806 fier, so something like
3807
3808 enum { A, B, C };
3809
3810 will obviously not appear in the list returned by the
3811 "enum_names" method. Also, enumerations that are not defined
3812 within the source code - like in
3813
3814 struct foo {
3815 enum weekday *pWeekday;
3816 unsigned long year;
3817 };
3818
3819 where only a pointer to the "weekday" enumeration object is
3820 used - will not be returned, even though they have an identi‐
3821 fier. So for the above two enumerations, "enum_names" will
3822 return an empty list:
3823
3824 @names = $c->enum_names;
3825
3826 The only way to retrieve a list of all enumeration identifiers
3827 is to use the "enum" method without additional arguments. You
3828 can get a list of all enumeration objects that have an identi‐
3829 fier by using
3830
3831 @enums = map { $_->{identifier} || () } $c->enum;
3832
3833 but these may not have a definition. Thus, the two arrays would
3834 look like this:
3835
3836 @names = ();
3837 @enums = ('weekday');
3838
3839 The "def" method returns a true value for all identifiers
3840 returned by "enum_names".
3841
3842 enum
3843
3844 "enum"
3845 "enum" LIST
3846 Returns a list of references to hashes containing detailed
3847 information about all enumerations that have been parsed.
3848
3849 If a list of enumeration identifiers is passed to the method,
3850 the returned list will only contain hash references for those
3851 enumerations. The enumeration identifiers may optionally be
3852 prefixed by "enum".
3853
3854 If an enumeration identifier cannot be found, the returned list
3855 will contain an undefined value at that position.
3856
3857 In scalar context, the number of enumerations will be returned
3858 as long as the number of arguments to the method call is not 1.
3859 In the latter case, a hash reference holding information for
3860 the enumeration will be returned.
3861
3862 The list returned by the "enum" method looks similar to this:
3863
3864 @enum = (
3865 {
3866 'enumerators' => {
3867 'SOCK_STREAM' => 1,
3868 'SOCK_RAW' => 3,
3869 'SOCK_SEQPACKET' => 5,
3870 'SOCK_RDM' => 4,
3871 'SOCK_PACKET' => 10,
3872 'SOCK_DGRAM' => 2
3873 },
3874 'identifier' => '__socket_type',
3875 'context' => 'definitions.c(13)',
3876 'size' => 4,
3877 'sign' => 0
3878 }
3879 );
3880
3881 "identifier"
3882 holds the enumeration identifier. This key is not present
3883 if the enumeration has no identifier.
3884
3885 "context"
3886 is the context in which the enumeration is defined. This is
3887 the filename followed by the line number in parentheses.
3888
3889 "enumerators"
3890 is a reference to a hash table that holds all enumerators
3891 of the enumeration.
3892
3893 "sign"
3894 is a boolean indicating if the enumeration is signed (i.e.
3895 has negative values).
3896
3897 One useful application may be to create a hash table that holds
3898 all enumerators of all defined enumerations:
3899
3900 %enum = map %{ $_->{enumerators} || {} }, $c->enum;
3901
3902 The %enum hash table would then be:
3903
3904 %enum = (
3905 'SOCK_STREAM' => 1,
3906 'SOCK_RAW' => 3,
3907 'SOCK_SEQPACKET' => 5,
3908 'SOCK_RDM' => 4,
3909 'SOCK_DGRAM' => 2,
3910 'SOCK_PACKET' => 10
3911 );
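
A table like this can also be inverted to translate numeric values back into enumerator names. The following is a minimal sketch using the values shown above; the hash name %name_of is an assumption.

```perl
use strict;
use warnings;

# Enumerators as shown above for "enum __socket_type".
my %enum = (
    SOCK_STREAM    => 1,
    SOCK_DGRAM     => 2,
    SOCK_RAW       => 3,
    SOCK_RDM       => 4,
    SOCK_SEQPACKET => 5,
    SOCK_PACKET    => 10,
);

# Invert the table: numeric value -> enumerator name.
my %name_of = reverse %enum;

for my $value (2, 10, 7) {
    my $name = exists $name_of{$value} ? $name_of{$value} : '<unknown>';
    print "$value => $name\n";
}
```

If two enumerators share the same value, only one of them survives the inversion, so this approach is best suited to enumerations with unique values.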
3912
3913 compound_names
3914
3915 "compound_names"
3916 Returns a list of identifiers of all structs and unions (com‐
3917 pound data structures) that are defined in the parsed source
3918 code. Like enumerations, compounds don't need to have an iden‐
3919 tifier, nor do they need to be defined.
3920
3921 Again, the only way to retrieve information about all struct
3922 and union objects is to use the "compound" method without
3923 passing it any arguments. If you need a list of all struct
3924 and union identifiers, you can use:
3925
3926 @compound = map { $_->{identifier} || () } $c->compound;
3927
3928 The "def" method returns a true value for all identifiers
3929 returned by "compound_names".
3930
3931 If you need the names of only the structs or only the unions,
3932 use the "struct_names" and "union_names" methods respectively.
3933
3934 compound
3935
3936 "compound"
3937 "compound" LIST
3938 Returns a list of references to hashes containing detailed
3939 information about all compounds (structs and unions) that have
3940 been parsed.
3941
3942 If a list of struct/union identifiers is passed to the method,
3943 the returned list will only contain hash references for those
3944 compounds. The identifiers may optionally be prefixed by
3945 "struct" or "union", which limits the search to the specified
3946 kind of compound.
3947
3948 If an identifier cannot be found, the returned list will con‐
3949 tain an undefined value at that position.
3950
3951 In scalar context, the number of compounds will be returned as
3952 long as the number of arguments to the method call is not 1. In
3953 the latter case, a hash reference holding information for the
3954 compound will be returned.
3955
3956 The list returned by the "compound" method looks similar to
3957 this:
3958
3959 @compound = (
3960 {
3961 'identifier' => 'STRUCT_SV',
3962 'align' => 1,
3963 'context' => 'definitions.c(23)',
3964 'pack' => 0,
3965 'type' => 'struct',
3966 'declarations' => [
3967 {
3968 'declarators' => [
3969 {
3970 'declarator' => '*sv_any',
3971 'size' => 4,
3972 'offset' => 0
3973 }
3974 ],
3975 'type' => 'void'
3976 },
3977 {
3978 'declarators' => [
3979 {
3980 'declarator' => 'sv_refcnt',
3981 'size' => 4,
3982 'offset' => 4
3983 }
3984 ],
3985 'type' => 'U32'
3986 },
3987 {
3988 'declarators' => [
3989 {
3990 'declarator' => 'sv_flags',
3991 'size' => 4,
3992 'offset' => 8
3993 }
3994 ],
3995 'type' => 'U32'
3996 }
3997 ],
3998 'size' => 12
3999 },
4000 {
4001 'identifier' => 'xxx',
4002 'align' => 1,
4003 'context' => 'definitions.c(31)',
4004 'pack' => 0,
4005 'type' => 'struct',
4006 'declarations' => [
4007 {
4008 'declarators' => [
4009 {
4010 'declarator' => 'a',
4011 'size' => 4,
4012 'offset' => 0
4013 }
4014 ],
4015 'type' => 'int'
4016 },
4017 {
4018 'declarators' => [
4019 {
4020 'declarator' => 'b',
4021 'size' => 4,
4022 'offset' => 4
4023 }
4024 ],
4025 'type' => 'int'
4026 }
4027 ],
4028 'size' => 8
4029 },
4030 {
4031 'align' => 1,
4032 'context' => 'definitions.c(29)',
4033 'pack' => 0,
4034 'type' => 'union',
4035 'declarations' => [
4036 {
4037 'declarators' => [
4038 {
4039 'declarator' => 'abc[2]',
4040 'size' => 8,
4041 'offset' => 0
4042 }
4043 ],
4044 'type' => 'int'
4045 },
4046 {
4047 'declarators' => [
4048 {
4049 'declarator' => 'ab[3][4]',
4050 'size' => 96,
4051 'offset' => 0
4052 }
4053 ],
4054 'type' => 'struct xxx'
4055 },
4056 {
4057 'declarators' => [
4058 {
4059 'declarator' => 'ptr',
4060 'size' => 4,
4061 'offset' => 0
4062 }
4063 ],
4064 'type' => 'any'
4065 }
4066 ],
4067 'size' => 96
4068 }
4069 );
4070
4071 "identifier"
4072 holds the struct or union identifier. This key is not
4073 present if the compound has no identifier.
4074
4075 "context"
4076 is the context in which the struct or union is defined.
4077 This is the filename followed by the line number in paren‐
4078 theses.
4079
4080 "type"
4081 is either 'struct' or 'union'.
4082
4083 "size"
4084 is the size of the struct or union.
4085
4086 "align"
4087 is the alignment of the struct or union.
4088
4089 "pack"
4090 is the struct member alignment if the compound is packed,
4091 or zero otherwise.
4092
4093 "declarations"
4094 is an array of hash references describing each struct dec‐
4095 laration:
4096
4097 "type"
4098 is the type of the struct declaration. This may be a
4099 string or a reference to a hash describing the type.
4100
4101 "declarators"
4102 is an array of hashes describing each declarator:
4103
4104 "declarator"
4105 is a string representation of the declarator.
4106
4107 "offset"
4108 is the offset of the struct member represented by
4109 the current declarator relative to the beginning of
4110 the struct or union.
4111
4112 "size"
4113 is the size occupied by the struct member repre‐
4114 sented by the current declarator.
4115
4116 It may be useful to have separate lists for structs and unions.
4117 One way to retrieve such lists would be to use
4118
4119 push @{$_->{type} eq 'union' ? \@unions : \@structs}, $_
4120 for $c->compound;
4121
4122 However, it is a lot simpler to use the "struct" and "union"
4123 methods:
4124
4125 @structs = $c->struct;
4126 @unions = $c->union;
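
The nested structure returned for each compound can be flattened into a simple layout listing. The sketch below works on a hash shaped like the "STRUCT_SV" entry shown above (abridged here). The helper name layout_rows is an assumption and not part of the module.

```perl
use strict;
use warnings;

# Sketch (helper name is an assumption): flatten one hash as returned
# by "compound" into "offset size type declarator" rows.
sub layout_rows {
    my ($compound) = @_;
    my @rows;
    for my $decl (@{ $compound->{declarations} }) {
        # "type" is either a string or a hash describing a nested type.
        my $type = ref $decl->{type}
                 ? $decl->{type}{type} . ' {...}'
                 : $decl->{type};
        for my $d (@{ $decl->{declarators} }) {
            push @rows, sprintf '%3d %3d  %-6s %s',
                $d->{offset}, $d->{size}, $type, $d->{declarator};
        }
    }
    return @rows;
}

# Data copied from the "STRUCT_SV" entry shown above (abridged).
my $sv = {
    identifier   => 'STRUCT_SV',
    type         => 'struct',
    size         => 12,
    declarations => [
        { type => 'void', declarators => [ { declarator => '*sv_any',   size => 4, offset => 0 } ] },
        { type => 'U32',  declarators => [ { declarator => 'sv_refcnt', size => 4, offset => 4 } ] },
        { type => 'U32',  declarators => [ { declarator => 'sv_flags',  size => 4, offset => 8 } ] },
    ],
};

print "$sv->{type} $sv->{identifier} ($sv->{size} bytes)\n";
print "$_\n" for layout_rows($sv);
```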
4127
4128 struct_names
4129
4130 "struct_names"
4131 Returns a list of all defined struct identifiers. This is
4132 equivalent to calling "compound_names", except that it only
4133 returns the names of the struct identifiers and doesn't return
4134 the names of the union identifiers.
4135
4136 struct
4137
4138 "struct"
4139 "struct" LIST
4140 Like the "compound" method, but only allows for structs.
4141
4142 union_names
4143
4144 "union_names"
4145 Returns a list of all defined union identifiers. This is
4146 equivalent to calling "compound_names", except that it only
4147 returns the names of the union identifiers and doesn't return
4148 the names of the struct identifiers.
4149
4150 union
4151
4152 "union"
4153 "union" LIST
4154 Like the "compound" method, but only allows for unions.
4155
4156 typedef_names
4157
4158 "typedef_names"
4159 Returns a list of all defined typedef identifiers. Typedefs
4160 that do not specify a type that you could actually work with
4161 will not be returned.
4162
4163 The "def" method returns a true value for all identifiers
4164 returned by "typedef_names".
4165
4166 typedef
4167
4168 "typedef"
4169 "typedef" LIST
4170 Returns a list of references to hashes containing detailed
4171 information about all typedefs that have been parsed.
4172
4173 If a list of typedef identifiers is passed to the method, the
4174 returned list will only contain hash references for those type‐
4175 defs.
4176
4177 If an identifier cannot be found, the returned list will con‐
4178 tain an undefined value at that position.
4179
4180 In scalar context, the number of typedefs will be returned
4181 unless exactly one argument is passed to the method call. In
4182 that case, a hash reference holding information for the
4183 typedef will be returned.
4184
4185 The list returned by the "typedef" method looks similar to
4186 this:
4187
4188 @typedef = (
4189 {
4190 'declarator' => 'U32',
4191 'type' => 'unsigned long'
4192 },
4193 {
4194 'declarator' => '*any',
4195 'type' => 'void'
4196 },
4197 {
4198 'declarator' => 'test',
4199 'type' => {
4200 'align' => 1,
4201 'context' => 'definitions.c(29)',
4202 'pack' => 0,
4203 'type' => 'union',
4204 'declarations' => [
4205 {
4206 'declarators' => [
4207 {
4208 'declarator' => 'abc[2]',
4209 'size' => 8,
4210 'offset' => 0
4211 }
4212 ],
4213 'type' => 'int'
4214 },
4215 {
4216 'declarators' => [
4217 {
4218 'declarator' => 'ab[3][4]',
4219 'size' => 96,
4220 'offset' => 0
4221 }
4222 ],
4223 'type' => 'struct xxx'
4224 },
4225 {
4226 'declarators' => [
4227 {
4228 'declarator' => 'ptr',
4229 'size' => 4,
4230 'offset' => 0
4231 }
4232 ],
4233 'type' => 'any'
4234 }
4235 ],
4236 'size' => 96
4237 }
4238 }
4239 );
4240
4241 "declarator"
4242 is the type declarator.
4243
4244 "type"
4245 is the type specification. This may be a string or a reference
4246 to a hash describing the type. See "enum" and "compound"
4247 for a description of how to interpret this hash.
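
The context rules described above can be sketched as follows. The two typedefs and the name "no_such_type" are made up for illustration.

```perl
use strict;
use warnings;
use Convert::Binary::C;

# Two hypothetical typedefs, mirroring the first entries of the sample list.
my $c = Convert::Binary::C->new->parse(<<'ENDC');
typedef unsigned long U32;
typedef void *any;
ENDC

my @all   = $c->typedef;           # list context, no arguments: all typedefs
my $count = $c->typedef;           # scalar context, no arguments: a count
my $u32   = $c->typedef('U32');    # scalar context, one argument: a hash reference

# An identifier that cannot be found yields undef at its position.
my @some = $c->typedef('U32', 'no_such_type');

printf "%d typedefs parsed\n", $count;
printf "%s is declared as %s\n", $u32->{declarator}, $u32->{type};
print  "no_such_type: ", (defined $some[1] ? 'found' : 'not found'), "\n";
```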
4248
4249 macro_names
4250
4251 "macro_names"
4252 Returns a list of all defined macro names.
4253
4254 The list returned by the "macro_names" method looks similar to
4255 this:
4256
4257 @macro_names = (
4258 '__STDC_VERSION__',
4259 '__STDC_HOSTED__',
4260 'DEFINED',
4261 'MULTIPLY',
4262 'ABC_SIZE'
4263 );
4264
4265 This works only as long as the preprocessor is not reset. See
4266 "Preprocessor configuration" for details.
4267
4268 macro
4269
4270 "macro"
4271 "macro" LIST
4272 Returns the definitions for all defined macros.
4273
4274 If a list of macro names is passed to the method, the returned
4275 list will only contain the definitions for those macros. For
4276 undefined macros, "undef" will be returned.
4277
4278 The list returned by the "macro" method looks similar to this:
4279
4280 @macro = (
4281 '__STDC_VERSION__ 199901L',
4282 '__STDC_HOSTED__ 1',
4283 'DEFINED',
4284 'MULTIPLY(x, y) ((x)*(y))',
4285 'ABC_SIZE 2'
4286 );
4287
4288 This works only as long as the preprocessor is not reset. See
4289 "Preprocessor configuration" for details.
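
As a minimal sketch, the two macro methods can be combined to look up selected definitions. The macros below are the ones from the sample output; "NOT_DEFINED" is a hypothetical name that is deliberately left undefined.

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new->parse(<<'ENDC');
#define ABC_SIZE 2
#define MULTIPLY(x, y) ((x)*(y))
ENDC

# Build a lookup table from the list of macro names.
my %is_defined = map { $_ => 1 } $c->macro_names;

# Ask for two definitions; the undefined macro comes back as undef.
my ($abc, $missing) = $c->macro('ABC_SIZE', 'NOT_DEFINED');

print "ABC_SIZE is defined\n" if $is_defined{ABC_SIZE};
print "definition: $abc\n"    if defined $abc;
print "NOT_DEFINED has no definition\n" unless defined $missing;
```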
4290
4291 FUNCTIONS
4292 You can alternatively call the following functions as methods on Con‐
4293 vert::Binary::C objects.
4294
4295 feature
4296
4297 "feature" STRING
4298 Checks if Convert::Binary::C was built with certain features.
4299 For example,
4300
4301 print "debugging version"
4302 if Convert::Binary::C::feature('debug');
4303
4304 will check if Convert::Binary::C was built with debugging sup‐
4305 port enabled. The "feature" function returns 1 if the feature
4306 is enabled, 0 if the feature is disabled, and "undef" if the
4307 feature is unknown. Currently the only features that can be
4308 checked are "ieeefp" and "debug".
4309
4310 You can enable or disable certain features at compile time of
4311 the module by using the
4312
4313 perl Makefile.PL enable-feature disable-feature
4314
4315 syntax.
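
The three possible return values can be told apart as in the following sketch. The name "no-such-feature" is made up and is assumed to be unknown to the module.

```perl
use strict;
use warnings;
use Convert::Binary::C;

# Map each queried feature to a readable state.
my %state;
for my $name (qw( ieeefp debug no-such-feature )) {
    my $value = Convert::Binary::C::feature($name);
    $state{$name} = !defined $value ? 'unknown'
                  : $value          ? 'enabled'
                  :                   'disabled';
    printf "%-16s %s\n", $name, $state{$name};
}
```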
4316
4317 native
4318
4319 "native"
4320 "native" STRING
4321 Returns the value of a property of the native system that Con‐
4322 vert::Binary::C was built on. For example,
4323
4324 $size = Convert::Binary::C::native('IntSize');
4325
4326 will fetch the size of an "int" on the native system. The fol‐
4327 lowing properties can be queried:
4328
4329 Alignment
4330 ByteOrder
4331 CharSize
4332 CompoundAlignment
4333 DoubleSize
4334 EnumSize
4335 FloatSize
4336 HostedC
4337 IntSize
4338 LongDoubleSize
4339 LongLongSize
4340 LongSize
4341 PointerSize
4342 ShortSize
4343 StdCVersion
4344 UnsignedBitfields
4345 UnsignedChars
4346
4347 You can also call "native" without arguments, in which case it
4348 will return a reference to a hash with all properties, like:
4349
4350 $native = {
4351 'StdCVersion' => undef,
4352 'ByteOrder' => 'LittleEndian',
4353 'LongSize' => 4,
4354 'IntSize' => 4,
4355 'HostedC' => 1,
4356 'ShortSize' => 2,
4357 'UnsignedChars' => 0,
4358 'DoubleSize' => 8,
4359 'CharSize' => 1,
4360 'EnumSize' => 4,
4361 'PointerSize' => 4,
4362 'FloatSize' => 4,
4363 'LongLongSize' => 8,
4364 'Alignment' => 4,
4365 'LongDoubleSize' => 12,
4366 'UnsignedBitfields' => 0,
4367 'CompoundAlignment' => 1
4368 };
4369
4370 The contents of that hash are suitable for passing to the
4371 "configure" method.
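
A short sketch of that round trip, assuming nothing beyond the "native" function, the "configure" method and the option accessor methods of the object:

```perl
use strict;
use warnings;
use Convert::Binary::C;

# Query a single property and the complete property hash.
my $int_size = Convert::Binary::C::native('IntSize');
my $native   = Convert::Binary::C::native();

# Configure an object explicitly with the properties of the native system.
my $c = Convert::Binary::C->new;
$c->configure(%$native);

printf "native int size: %d, configured int size: %d\n",
       $int_size, $c->IntSize;
```

Configuring explicitly like this can be handy when the same settings need to be written down, modified and reused for a different target later on.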
4372
4373 DEBUGGING
4374 Like perl itself, Convert::Binary::C can be compiled with debugging
4375 support that can then be selectively enabled at runtime. You can specify
4376 whether you would like to build Convert::Binary::C with debugging support
4377 or not by explicitly giving an argument to Makefile.PL. Use
4378
4379 perl Makefile.PL enable-debug
4380
4381 to enable debugging, or
4382
4383 perl Makefile.PL disable-debug
4384
4385 to disable debugging. The default will depend on how your perl binary
4386 was built. If it was built with "-DDEBUGGING", Convert::Binary::C will
4387 be built with debugging support, too.
4388
4389 Once you have built Convert::Binary::C with debugging support, you can
4390 use the following syntax to enable debug output. Instead of
4391
4392 use Convert::Binary::C;
4393
4394 you simply say
4395
4396 use Convert::Binary::C debug => 'all';
4397
4398 which will enable all debug output. However, I don't recommend
4399 enabling all debug output, because the volume can be fairly large.
4400
4401 Debugging options
4402
4403 Instead of saying "all", you can pass a string that consists of one or
4404 more of the following characters:
4405
4406 m enable memory allocation tracing
4407 M enable memory allocation & assertion tracing
4408
4409 h enable hash table debugging
4410 H enable hash table dumps
4411
4412 d enable debug output from the XS module
4413 c enable debug output from the ctlib
4414 t enable debug output about type objects
4415
4416 l enable debug output from the C lexer
4417 p enable debug output from the C parser
4418 P enable debug output from the C preprocessor
4419 r enable debug output from the #pragma parser
4420
4421 y enable debug output from yacc (bison)
4422
4423 So the following might give you a brief overview of what's going on
4424 inside Convert::Binary::C:
4425
4426 use Convert::Binary::C debug => 'dct';
4427
4428 When you want to debug memory allocation using
4429
4430 use Convert::Binary::C debug => 'm';
4431
4432 you can use the Perl script check_alloc.pl that resides in the
4433 ctlib/util/tool directory to extract statistics about memory usage and
4434 information about memory leaks from the resulting debug output.
4435
4436 Redirecting debug output
4437
4438 By default, all debug output is written to "stderr". You can, however,
4439 redirect the debug output to a file with the "debugfile" option:
4440
4441 use Convert::Binary::C debug => 'dcthHm',
4442 debugfile => './debug.out';
4443
4444 If the file cannot be opened, you'll receive a warning and the output
4445 will go to "stderr" instead.
4446
4447 Alternatively, you can use the environment variables "CBC_DEBUG_OPT"
4448 and "CBC_DEBUG_FILE" to turn on debug output.
4449
4450 If Convert::Binary::C is built without debugging support, passing the
4451 "debug" or "debugfile" options will cause a warning to be issued. The
4452 corresponding environment variables will simply be ignored.
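
To avoid that warning in scripts that must run against both kinds of builds, the options can be passed only when the "debug" feature is present. This is a sketch that relies on the general Perl rule that "use Module LIST" is equivalent to a "require" followed by a call to "import" inside a BEGIN block; the option string and file name are the ones used above.

```perl
use strict;
use warnings;

BEGIN {
    require Convert::Binary::C;

    # Only request debug output if the module was built with debugging support.
    my @options = Convert::Binary::C::feature('debug')
                ? (debug => 'dct', debugfile => './debug.out')
                : ();

    Convert::Binary::C->import(@options);
}

my $c = Convert::Binary::C->new;
print "debugging support: ",
      (Convert::Binary::C::feature('debug') ? 'yes' : 'no'), "\n";
```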
4453
4454 ENVIRONMENT
4455 "CBC_ORDER_MEMBERS"
4456
4457 Setting this variable to a non-zero value will globally turn on hash
4458 key ordering for compound members. Have a look at the "OrderMembers"
4459 option for details.
4460
4461 Setting the variable to the name of a perl module will additionally use
4462 this module instead of the predefined modules for member ordering to
4463 tie the hashes to.
4464
4465 "CBC_DEBUG_OPT"
4466
4467 If Convert::Binary::C is built with debugging support, you can use this
4468 variable to specify the debugging options.
4469
4470 "CBC_DEBUG_FILE"
4471
4472 If Convert::Binary::C is built with debugging support, you can use this
4473 variable to redirect the debug output to a file.
4474
4475 "CBC_DISABLE_PARSER"
4476
4477 This variable is intended purely for development. Setting it to a non-
4478 zero value disables the Convert::Binary::C parser, which means that no
4479 information is collected from the file or code that is parsed. However,
4480 the preprocessor will run, which is useful for benchmarking the pre‐
4481 processor.
4482
4483 FLEXIBLE ARRAY MEMBERS AND INCOMPLETE TYPES
4484 Flexible array members are a feature introduced with ISO-C99. It's a
4485 common problem that you have a variable length data field at the end of
4486 a structure, for example an array of characters at the end of a message
4487 struct. ISO-C99 allows you to write this as:
4488
4489 struct message {
4490 long header;
4491 char data[];
4492 };
4493
4494 The advantage is that you clearly indicate that the size of the
4495 appended data is variable, and that the "data" member doesn't contrib‐
4496 ute to the size of the "message" structure.
4497
4498 When packing or unpacking data, Convert::Binary::C deals with flexible
4499 array members as if their length were adjustable. For example, "unpack"
4500 will adapt the length of the array depending on the input string:
4501
4502 $msg1 = $c->unpack('message', 'abcdefg');
4503 $msg2 = $c->unpack('message', 'abcdefghijkl');
4504
4505 The following data is unpacked:
4506
4507 $msg1 = {
4508 'data' => [
4509 101,
4510 102,
4511 103
4512 ],
4513 'header' => 1633837924
4514 };
4515 $msg2 = {
4516 'data' => [
4517 101,
4518 102,
4519 103,
4520 104,
4521 105,
4522 106,
4523 107,
4524 108
4525 ],
4526 'header' => 1633837924
4527 };
4528
4529 Similarly, pack will adjust the length of the output string according
4530 to the data you feed in:
4531
4532 use Data::Hexdumper;
4533
4534 $msg = {
4535 header => 4711,
4536 data => [0x10, 0x20, 0x30, 0x40, 0x77..0x88],
4537 };
4538
4539 $data = $c->pack('message', $msg);
4540
4541 print hexdump(data => $data);
4542
4543 This would print:
4544
4545 0x0000 : 00 00 12 67 10 20 30 40 77 78 79 7A 7B 7C 7D 7E : ...g..0@wxyz{|}~
4546 0x0010 : 7F 80 81 82 83 84 85 86 87 88 : ..........
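
The relationship between the fixed part of the struct and the flexible part can be checked with a round trip. The sketch below assumes a big-endian configuration with a 4-byte "long", matching the output shown above; the payload lengths are arbitrary.

```perl
use strict;
use warnings;
use Convert::Binary::C;

# Assumed configuration: big-endian, 4-byte long (as in the dump above).
my $c = Convert::Binary::C->new(ByteOrder => 'BigEndian', LongSize => 4);
$c->parse(<<'ENDC');
struct message {
  long header;
  char data[];
};
ENDC

# The flexible array member does not contribute to the struct size.
my $fixed = $c->sizeof('message');
print "sizeof(message) = $fixed\n";

my %packed_length;
for my $n (3, 8) {
    my $packed = $c->pack('message', { header => 4711, data => [ (0x41) x $n ] });
    my $back   = $c->unpack('message', $packed);
    $packed_length{$n} = length($packed);
    printf "%d data bytes: %d bytes packed, %d elements unpacked\n",
           $n, length($packed), scalar @{ $back->{data} };
}
```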
4547
4548 Incomplete types such as
4549
4550 typedef unsigned long array[];
4551
4552 are handled in exactly the same way. Thus, you can simply call
4553
4554 $array = $c->unpack('array', '?'x20);
4555
4556 which will unpack the following array:
4557
4558 $array = [
4559 1061109567,
4560 1061109567,
4561 1061109567,
4562 1061109567,
4563 1061109567
4564 ];
4565
4566 You can also alter the length of an array using the "Dimension" tag.
4567
4568 FLOATING POINT VALUES
4569 When using Convert::Binary::C to handle floating point values, you have
4570 to be aware of some limitations.
4571
4572 You're usually safe if all your platforms are using the IEEE floating
4573 point format. During the Convert::Binary::C build process, the "ieeefp"
4574 feature will automatically be enabled if the host is using IEEE float‐
4575 ing point. You can check for this feature at runtime using the "fea‐
4576 ture" function:
4577
4578 if (Convert::Binary::C::feature('ieeefp')) {
4579 # do something
4580 }
4581
4582 When IEEE floating point support is enabled, the module can also handle
4583 floating point values of a different byte order.
4584
4585 If your host platform is not using IEEE floating point, the "ieeefp"
4586 feature will be disabled. Convert::Binary::C then will be more restric‐
4587 tive, refusing to handle any non-native floating point values.
4588
4589 However, Convert::Binary::C cannot detect the floating point format
4590 used by your target platform. It can only try to prevent problems in
4591 obvious cases. If you know your target platform has a completely dif‐
4592 ferent floating point format, don't use floating point conversion at
4593 all.
4594
4595 Whenever Convert::Binary::C detects that it cannot properly do floating
4596 point value conversion, it will issue a warning and will not attempt to
4597 convert the floating point value.
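
One way to act on this is to guard floating point conversions with the "ieeefp" check. The following is a sketch only; the struct "sample" and its member are made up, and a 4-byte big-endian "float" is assumed for the target.

```perl
use strict;
use warnings;
use Convert::Binary::C;

# Hypothetical target: big-endian with 4-byte floats.
my $c = Convert::Binary::C->new(ByteOrder => 'BigEndian', FloatSize => 4);
$c->parse(<<'ENDC');
struct sample {
  float value;
};
ENDC

my $back;
if (Convert::Binary::C::feature('ieeefp')) {
    # 1.5 is exactly representable, so the round trip should be lossless.
    my $packed = $c->pack('sample', { value => 1.5 });
    $back = $c->unpack('sample', $packed);
    printf "packed bytes: %s, unpacked value: %s\n",
           unpack('H*', $packed), $back->{value};
}
else {
    warn "no IEEE floating point support, skipping float conversion\n";
}
```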
4598
4599 BITFIELDS
4600 Bitfield support in Convert::Binary::C is currently in an experimental
4601 state. You are encouraged to test it, but you should not blindly rely
4602 on its results.
4603
4604 You are also encouraged to supply layout algorithms for compilers
4605 whose bitfield implementation is not handled correctly at the moment.
4606 Even better than the plain algorithm is, of course, a patch that adds a
4607 new bitfield layout engine.
4608
4609 While bitfields may not be handled correctly by the conversion routines
4610 yet, they are always parsed correctly. This means that you can reliably
4611 use the declarator fields as returned by the "struct" or "typedef"
4612 methods. Given the following source
4613
4614 struct bitfield {
4615 int seven:7;
4616 int :1;
4617 int four:4, :0;
4618 int integer;
4619 };
4620
4621 a call to "struct" will return
4622
4623 @struct = (
4624 {
4625 'identifier' => 'bitfield',
4626 'align' => 1,
4627 'context' => 'bitfields.c(1)',
4628 'pack' => 0,
4629 'type' => 'struct',
4630 'declarations' => [
4631 {
4632 'declarators' => [
4633 {
4634 'declarator' => 'seven:7'
4635 }
4636 ],
4637 'type' => 'int'
4638 },
4639 {
4640 'declarators' => [
4641 {
4642 'declarator' => ':1'
4643 }
4644 ],
4645 'type' => 'int'
4646 },
4647 {
4648 'declarators' => [
4649 {
4650 'declarator' => 'four:4'
4651 },
4652 {
4653 'declarator' => ':0'
4654 }
4655 ],
4656 'type' => 'int'
4657 },
4658 {
4659 'declarators' => [
4660 {
4661 'declarator' => 'integer',
4662 'size' => 4,
4663 'offset' => 4
4664 }
4665 ],
4666 'type' => 'int'
4667 }
4668 ],
4669 'size' => 8
4670 }
4671 );
4672
4673 No size/offset keys will currently be returned for bitfield entries.
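
Code that walks the declarators should therefore not assume that these keys exist. A minimal sketch, based on the structure shown above, tests for the "offset" key before using it:

```perl
use strict;
use warnings;
use Convert::Binary::C;

my $c = Convert::Binary::C->new->parse(<<'ENDC');
struct bitfield {
  int seven:7;
  int :1;
  int four:4, :0;
  int integer;
};
ENDC

# Scalar context with one argument: a single hash reference.
my $info = $c->struct('bitfield');

my @names;
for my $declaration (@{ $info->{declarations} }) {
    for my $d (@{ $declaration->{declarators} }) {
        push @names, $d->{declarator};
        my $layout = exists $d->{offset}
                   ? "offset $d->{offset}, size $d->{size}"
                   : 'no size/offset reported';
        print "$d->{declarator}: $layout\n";
    }
}
```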
4674
4675 MULTITHREADING
4676 Convert::Binary::C was designed to be thread-safe.
4677
4678 INHERITANCE
4679 If you wish to derive a new class from Convert::Binary::C, this is rel‐
4680 atively easy. Despite their XS implementation, Convert::Binary::C
4681 objects are actually blessed hash references.
4682
4683 The XS data is stored in a read-only hash value for the key that is the
4684 empty string. So it is safe to use any non-empty hash key when deriving
4685 your own class. In addition, Convert::Binary::C does quite a lot of
4686 checks to detect corruption in the object hash.
4687
4688 If you store private data in the hash, you should override the "clone"
4689 method and provide the necessary code to clone your private data.
4690 You'll have to call "SUPER::clone", but this will only clone the Con‐
4691 vert::Binary::C part of the object.
4692
4693 For an example of a derived class, you can have a look at Con‐
4694 vert::Binary::C::Cached.
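
The following is a small sketch of such a derived class. The package name "My::Converter", the "Label" option and the "label" hash key are all hypothetical; only "new", "clone" and the rule about non-empty hash keys come from the text above.

```perl
use strict;
use warnings;

package My::Converter;

use Convert::Binary::C;
our @ISA = ('Convert::Binary::C');

sub new {
    my ($class, %options) = @_;
    my $label = delete $options{Label};      # hypothetical private option
    my $self  = $class->SUPER::new(%options);
    $self->{label} = $label;                 # any non-empty key is safe to use
    return $self;
}

sub clone {
    my $self = shift;
    my $copy = $self->SUPER::clone;          # clones only the Convert::Binary::C part
    $copy->{label} = $self->{label};         # copy the private data ourselves
    return $copy;
}

package main;

my $original = My::Converter->new(Label => 'target-a', ByteOrder => 'BigEndian');
my $copy     = $original->clone;

print "label: $copy->{label}, byte order: ", $copy->ByteOrder, "\n";
```

If the private data were a nested structure, a deep copy would be needed in "clone" instead of the plain assignment used here.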
4695
4696 PORTABILITY
4697 Convert::Binary::C should build and run on most of the platforms that
4698 Perl runs on:
4699
4700 · Various Linux systems
4701
4702 · Various BSD systems
4703
4704 · HP-UX
4705
4706 · Compaq/HP Tru64 Unix
4707
4708 · Mac-OS X
4709
4710 · Cygwin
4711
4712 · Windows 98/NT/2000/XP
4713
4714 Also, many architectures are supported:
4715
4716 · Various Intel Pentium and Itanium systems
4717
4718 · Various Alpha systems
4719
4720 · HP PA-RISC
4721
4722 · Power-PC
4723
4724 · StrongARM
4725
4726 The module should build with any perl binary from 5.004 up to the lat‐
4727 est development version.
4728
4729 COMPARISON WITH SIMILAR MODULES
4730 Most of the time when you're really looking for Convert::Binary::C
4731 you'll actually end up finding one of the following modules. Some of
4732 them have different goals, so it's probably worth pointing out the dif‐
4733 ferences.
4734
4735 C::Include
4736
4737 Like Convert::Binary::C, this module aims at doing conversion from and
4738 to binary data based on C types. However, its configurability is very
4739 limited compared to Convert::Binary::C. Also, it does not parse all C
4740 code correctly. It's slower than Convert::Binary::C and doesn't have a
4741 preprocessor. On the plus side, it's written in pure Perl.
4742
4743 C::DynaLib::Struct
4744
4745 This module doesn't allow you to reuse your C source code. One main
4746 goal of Convert::Binary::C was to avoid code duplication or, even
4747 worse, having to maintain different representations of your data struc‐
4748 tures. Like C::Include, C::DynaLib::Struct is rather limited in its
4749 configurability.
4750
4751 Win32::API::Struct
4752
4753 This module has a special purpose. It aims at building structs for
4754 interfacing Perl code with Windows API code.
4755
4756 CREDITS
4757 · My love Jennifer for always being there, for filling my life with joy
4758 and last but not least for proofreading the documentation.
4759
4760 · Alain Barbet <alian@cpan.org> for testing and debugging support.
4761
4762 · Mitchell N. Charity for giving me pointers into various interesting
4763 directions.
4764
4765 · Alexis Denis for making me improve (externally) and simplify (inter‐
4766 nally) floating point support. He can also be blamed (indirectly) for
4767 the "initializer" method, as I need it in my effort to support bit‐
4768 fields some day.
4769
4770 · Michael J. Hohmann <mjh@scientist.de> for endless discussions on our
4771 way to and back home from work, and for making me think about sup‐
4772 porting "pack" and "unpack" for compound members.
4773
4774 · Thorsten Jens <thojens@gmx.de> for testing the package on various
4775 platforms.
4776
4777 · Mark Overmeer <mark@overmeer.net> for suggesting the module name and
4778 giving invaluable feedback.
4779
4780 · Thomas Pornin <pornin@bolet.org> for his excellent "ucpp" preproces‐
4781 sor library.
4782
4783 · Marc Rosenthal for his suggestions and support.
4784
4785 · James Roskind, as his C parser was a great starting point to fix all
4786 the problems I had with my original parser based only on the ANSI
4787 ruleset.
4788
4789 · Gisbert W. Selke for spotting some interesting bugs and providing
4790 extensive reports.
4791
4792 · Steffen Zimmermann for a prolific discussion on the cloning algo‐
4793 rithm.
4794
4795 MAILING LIST
4796 There's also a mailing list that you can join:
4797
4798 convert-binary-c@yahoogroups.com
4799
4800 To subscribe, simply send mail to:
4801
4802 convert-binary-c-subscribe@yahoogroups.com
4803
4804 You can use this mailing list for non-bug problems, questions or dis‐
4805 cussions.
4806
4807 BUGS
4808 I'm sure there are still lots of bugs in the code for this module. If
4809 you find any bugs, if Convert::Binary::C doesn't seem to build on your
4810 system, or if any of its tests fail, please use the CPAN Request Tracker at
4811 <http://rt.cpan.org/> to create a ticket for the module. Alternatively,
4812 just send a mail to <mhx@cpan.org>.
4813
4814 EXPERIMENTAL FEATURES
4815 Some features in Convert::Binary::C are marked as experimental. This
4816 is most probably for one of the following reasons:
4817
4818 · The feature does not behave in exactly the way that I wish it did,
4819 possibly due to some limitations in the current design of the module.
4820
4821 · The feature hasn't been tested enough and may completely fail to pro‐
4822 duce the expected results.
4823
4824 I hope to fix most issues with these experimental features someday, but
4825 this may mean that I have to change the way they currently work in a
4826 way that's not backwards compatible. So if any of these features is
4827 useful to you, you can use it, but you should be aware that the behav‐
4828 iour or the interface may change in future releases of this module.
4829
4830 TODO
4831 If you're interested in what I currently plan to improve (or fix), have
4832 a look at the TODO file.
4833
4834 POSTCARDS
4835 If you're using my module and like it, you can show your appreciation
4836 by sending me a postcard from where you live. I won't urge you to do
4837 it; it's completely up to you. To me, this is just a very nice way of
4838 receiving feedback about my work. Please send your postcard to:
4839
4840 Marcus Holland-Moritz
4841 Kuppinger Weg 28
4842 71116 Gaertringen
4843 GERMANY
4844
4845 If you feel that sending a postcard is too much effort, you may want
4846 to rate the module at <http://cpanratings.perl.org/>.
4847
4848 COPYRIGHT
4849 Copyright (c) 2002-2008 Marcus Holland-Moritz. All rights reserved.
4850 This program is free software; you can redistribute it and/or modify it
4851 under the same terms as Perl itself.
4852
4853 The "ucpp" library is (c) 1998-2002 Thomas Pornin. For license and
4854 redistribution details refer to ctlib/ucpp/README.
4855
4856 Portions copyright (c) 1989, 1990 James A. Roskind.
4857
4858 The include files located in tests/include/include, which are used in
4859 some of the test scripts, are (c) 1991-1999, 2000, 2001 Free Software
4860 Foundation, Inc. They are neither required to create the binary nor
4861 linked to the source code of this module in any other way.
4862
4863 SEE ALSO
4864 See ccconfig, perl, perldata, perlop, perlvar, Data::Dumper and
4865 Scalar::Util.
4866
4867
4868
4869perl v5.8.8 2008-04-15 Convert::Binary::C(3)