MARC::Charset(3)      User Contributed Perl Documentation     MARC::Charset(3)

NAME

       MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS

           # import the marc8_to_utf8 function
           use MARC::Charset 'marc8_to_utf8';

           # prepare STDOUT for UTF-8 output
           binmode(STDOUT, ':utf8');

           # print out some MARC-8 as UTF-8
           print marc8_to_utf8($marc8_string);

DESCRIPTION

       MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8
       strings. MARC-8 is a single-byte character encoding that predates
       Unicode, and allows you to put non-Roman scripts in MARC bibliographic
       records. The specification is available from the Library of Congress:

           http://www.loc.gov/marc/specifications/spechome.html

EXPORTS

   ignore_errors()
       Tells MARC::Charset whether or not to ignore all encoding errors, and
       returns the current setting. This is helpful if you have records that
       contain both MARC-8 and Unicode characters.

           my $ignore = MARC::Charset->ignore_errors();

           MARC::Charset->ignore_errors(1); # ignore errors
           MARC::Charset->ignore_errors(0); # DO NOT ignore errors

   assume_unicode()
       Tells MARC::Charset whether or not to assume Unicode when an error is
       encountered in ignore_errors mode, and returns the current setting.
       This is helpful if you have records that contain both MARC-8 and
       Unicode characters.

           my $setting = MARC::Charset->assume_unicode();

           MARC::Charset->assume_unicode(1); # assume characters are Unicode (UTF-8)
           MARC::Charset->assume_unicode(0); # DO NOT assume characters are Unicode

   assume_encoding()
       Tells MARC::Charset which encoding, if any, to assume when an error
       is encountered in ignore_errors mode, and returns the current
       setting. This is helpful if you have records that contain both MARC-8
       and characters in another encoding.

           my $setting = MARC::Charset->assume_encoding();

           MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
           MARC::Charset->assume_encoding(''); # DO NOT assume any encoding

   marc8_to_utf8()
       Converts a MARC-8 encoded string to UTF-8.

           my $utf8 = marc8_to_utf8($marc8);

       If you'd like to ignore errors, pass in a true value as the second
       parameter or call MARC::Charset->ignore_errors() with a true value:

           my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

         or

           MARC::Charset->ignore_errors(1);
           my $utf8 = marc8_to_utf8($marc8);

   utf8_to_marc8()
       Attempts to convert a UTF-8 string to MARC-8.

           my $marc8 = utf8_to_marc8($utf8);

       If you'd like to ignore errors, or characters that can't be converted
       to MARC-8, pass in a true value as the second parameter or call
       MARC::Charset->ignore_errors() with a true value:

           my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

         or

           MARC::Charset->ignore_errors(1);
           my $marc8 = utf8_to_marc8($utf8);
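
       The ignore_errors() and assume_encoding() settings are set on the
       MARC::Charset class, so changing them in one place affects later
       conversions elsewhere in the program. A minimal sketch of one way
       to contain that, relying only on the documented behaviour that each
       method returns the current setting when called without an argument.
       lenient_marc8_to_utf8 is a hypothetical helper, not part of
       MARC::Charset:

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper (not part of MARC::Charset): convert with errors
# ignored and a fallback encoding assumed, then restore the previous
# class-wide settings.
sub lenient_marc8_to_utf8 {
    my ($marc8, $fallback) = @_;

    # Remember the current settings so they can be put back afterwards.
    my $old_ignore   = MARC::Charset->ignore_errors();
    my $old_encoding = MARC::Charset->assume_encoding();

    MARC::Charset->ignore_errors(1);
    MARC::Charset->assume_encoding($fallback);

    # Use eval so the settings are restored even if the conversion dies.
    my $utf8  = eval { marc8_to_utf8($marc8) };
    my $error = $@;

    MARC::Charset->ignore_errors($old_ignore);
    MARC::Charset->assume_encoding($old_encoding);

    die $error if $error;
    return $utf8;
}

# Example call; 'cp850' is only an illustrative fallback encoding.
my $utf8 = lenient_marc8_to_utf8("plain ASCII text", 'cp850');
```

       Passing a true second argument to marc8_to_utf8() is the simpler
       choice when only errors need to be ignored; a wrapper like this is
       mainly useful when a fallback encoding is also wanted for a single
       call.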

DEFAULT CHARACTER SETS

       If you need to alter the default character sets, you can set the
       $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to
       the appropriate character set code:

           use MARC::Charset::Constants qw(:all);
           $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
           $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
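
       Because these are package variables, a plain assignment changes the
       defaults for the rest of the program. A sketch of limiting the
       change to one call with Perl's local, under the assumption (not
       confirmed here) that MARC::Charset reads the defaults each time a
       conversion starts. arabic_marc8_to_utf8 is a hypothetical helper:

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper (not part of MARC::Charset): convert a string whose
# default G0/G1 sets are Arabic, without changing the defaults for other
# callers.
sub arabic_marc8_to_utf8 {
    my ($marc8) = @_;

    # local restores the previous values when this sub returns or dies.
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    return marc8_to_utf8($marc8);
}
```

       If the assumption does not hold for your installed version, set the
       variables once at program start as shown above instead.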

SEE ALSO

       •   MARC::Charset::Constants

       •   MARC::Charset::Table

       •   MARC::Charset::Code

       •   MARC::Charset::Compiler

       •   MARC::Record

       •   MARC::XML

AUTHOR

       Ed Summers (ehs@pobox.com)

perl v5.34.0                      2022-01-21                  MARC::Charset(3)