MARC::Charset(3)      User Contributed Perl Documentation      MARC::Charset(3)

MARC::Charset - convert MARC-8 encoded strings to UTF-8

    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for UTF-8 output
    binmode(STDOUT, ':utf8');

    # print out some MARC-8 as UTF-8
    print marc8_to_utf8($marc8_string);

MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8
strings. MARC-8 is a single-byte character encoding that predates
Unicode, and allows you to put non-Roman scripts in MARC bibliographic
records.

http://www.loc.gov/marc/specifications/spechome.html

ignore_errors()
    Tells MARC::Charset whether or not to ignore all encoding errors, and
    returns the current setting. This is helpful if you have records that
    contain both MARC-8 and Unicode characters.

        my $ignore = MARC::Charset->ignore_errors();

        MARC::Charset->ignore_errors(1); # ignore errors
        MARC::Charset->ignore_errors(0); # DO NOT ignore errors
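
Because ignore_errors() both sets and returns the setting, a caller can switch it on for one batch and put the previous value back afterwards. A minimal sketch; the helper name convert_leniently is hypothetical and not part of MARC::Charset, and the input strings are plain ASCII placeholders:

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper (not part of MARC::Charset): convert a list of
# MARC-8 strings with errors ignored, then restore the caller's setting.
sub convert_leniently {
    my @marc8_strings = @_;

    # Remember the current setting as a plain 0/1 value.
    my $previous = MARC::Charset->ignore_errors() ? 1 : 0;

    MARC::Charset->ignore_errors(1);    # ignore errors for this batch
    my @converted = map { marc8_to_utf8($_) } @marc8_strings;

    MARC::Charset->ignore_errors($previous);    # put the old setting back
    return @converted;
}

my @titles = convert_leniently('Moby Dick', 'Walden');
```

Restoring the saved value keeps the lenient mode from leaking into unrelated code that relies on the stricter default.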

assume_unicode()
    Tells MARC::Charset whether or not to assume Unicode when an error is
    encountered in ignore_errors mode, and returns the current setting.
    This is helpful if you have records that contain both MARC-8 and
    Unicode characters.

        my $setting = MARC::Charset->assume_unicode();

        MARC::Charset->assume_unicode(1); # assume characters are Unicode (UTF-8)
        MARC::Charset->assume_unicode(0); # DO NOT assume characters are Unicode

assume_encoding()
    Tells MARC::Charset which encoding to assume when an error is
    encountered in ignore_errors mode, and returns the current setting.
    Pass an empty string to stop assuming any encoding. This is helpful
    if you have records that contain both MARC-8 and characters in
    another encoding.

        my $setting = MARC::Charset->assume_encoding();

        MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
        MARC::Charset->assume_encoding('');      # DO NOT assume any encoding
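
Both fallbacks apply only in ignore_errors mode, so the settings are combined in practice. The sketch below shows one possible configuration for a collection that mixes MARC-8 with cp850 data. How the fallback is applied to the offending bytes is not described above, so the sketch only sets the options, reads them back, and converts a plain ASCII placeholder string:

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

MARC::Charset->ignore_errors(1);            # fallbacks only apply in this mode
MARC::Charset->assume_encoding('cp850');    # fallback for input that fails as MARC-8

# Called without arguments, both accessors return the current setting.
my $ignoring = MARC::Charset->ignore_errors();
my $fallback = MARC::Charset->assume_encoding();
print "ignore_errors=$ignoring assume_encoding=$fallback\n";

# Valid MARC-8 input (plain ASCII here) still converts as usual.
my $utf8 = marc8_to_utf8('Walden');
```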

marc8_to_utf8()
    Converts a MARC-8 encoded string to UTF-8.

        my $utf8 = marc8_to_utf8($marc8);

    If you'd like to ignore errors, pass in a true value as the second
    parameter or call MARC::Charset->ignore_errors() with a true value:

        my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

    or

        MARC::Charset->ignore_errors(1);
        my $utf8 = marc8_to_utf8($marc8);
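
As a worked example, the sketch below converts a short MARC-8 byte string containing a combining acute accent. It assumes the default ANSEL character set, in which byte 0xE2 is the combining acute and precedes its base letter; those byte values come from the MARC-8 specification, not from the text above. The result is normalized with Unicode::Normalize so that the comparison does not depend on whether the converter returns composed or decomposed characters:

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use Unicode::Normalize 'NFC';

# "cafe" with an acute on the final "e", as MARC-8 bytes:
# the combining acute (0xE2) comes BEFORE the letter it modifies.
my $marc8 = "caf\xE2e";

my $utf8 = marc8_to_utf8($marc8);

# Normalize to composed form before comparing against U+00E9.
my $matches = defined $utf8 && NFC($utf8) eq "caf\x{E9}";

binmode(STDOUT, ':utf8');
print NFC($utf8), "\n" if $matches;
```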

utf8_to_marc8()
    Attempts to translate UTF-8 into MARC-8.

        my $marc8 = utf8_to_marc8($utf8);

    If you'd like to ignore errors, or characters that can't be converted
    to MARC-8, pass in a true value as the second parameter:

        my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

    or

        MARC::Charset->ignore_errors(1);
        my $marc8 = utf8_to_marc8($utf8);
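
One way to check that a string survives conversion is a round trip through both functions. A sketch, assuming the input is a Perl character string whose characters all have MARC-8 mappings. The input is decomposed with NFD first and the results are compared after NFC normalization, because the normalization form the module expects and returns is not stated above:

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);
use Unicode::Normalize qw(NFC NFD);

my $original = "caf\x{E9}";    # "cafe" with an acute on the final "e"

# Decompose so the accent is a separate combining character,
# then convert to MARC-8 and back again.
my $marc8     = utf8_to_marc8(NFD($original));
my $roundtrip = marc8_to_utf8($marc8);

# Compare in composed form so only real differences count.
my $ok = defined $roundtrip && NFC($roundtrip) eq NFC($original);

print $ok ? "round trip ok\n" : "round trip changed the text\n";
```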

If you need to alter the default character sets you can set the
$MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to
the appropriate character set code:

    use MARC::Charset::Constants qw(:all);
    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
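
Since these are ordinary package variables, Perl's local can confine the change to one block so that later conversions see the original defaults again. A sketch; the helper name is hypothetical, the single-space input is only a placeholder for real Arabic MARC-8 data, and it assumes the defaults are consulted on each conversion call:

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper: convert one string with Arabic default sets
# without permanently changing the module-wide defaults.
sub convert_with_arabic_defaults {
    my ($marc8) = @_;
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
    return marc8_to_utf8($marc8);
}   # both variables revert to their previous values here

my $g0_before = $MARC::Charset::DEFAULT_G0;
my $utf8      = convert_with_arabic_defaults(' ');    # placeholder input
my $restored  = $MARC::Charset::DEFAULT_G0 eq $g0_before;
```

Plain assignment, as in the snippet above, remains the simpler choice when every record in a program uses the same defaults.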

•   MARC::Charset::Constants

•   MARC::Charset::Table

•   MARC::Charset::Code

•   MARC::Charset::Compiler

•   MARC::Record

•   MARC::XML

Ed Summers (ehs@pobox.com)

perl v5.36.0                      2022-07-22                  MARC::Charset(3)