Jcode(3)              User Contributed Perl Documentation             Jcode(3)

NAME

       Jcode - Japanese Charset Handler

SYNOPSIS

        use Jcode;
        #
        # traditional
        Jcode::convert(\$str, $ocode, $icode, "z");
        # or OOP!
        print Jcode->new($str)->h2z->tr($from, $to)->utf8;

DESCRIPTION

       A Japanese version of this document is available as Jcode::Nihongo.

       Jcode.pm supports both an object-oriented and a traditional
       (procedural) approach.  With the object-oriented approach, you can
       write:

         $iso_2022_jp = Jcode->new($str)->h2z->jis;

       which is more elegant than:

         $iso_2022_jp = $str;
         &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");

       For those unfamiliar with objects, Jcode.pm still supports "getcode()"
       and "convert()".

       If the perl version is 5.8.1 or later, Jcode acts as a wrapper to
       Encode, the standard charset handler module for Perl 5.8 or later.

Methods

       All methods described here return the Jcode object unless otherwise
       noted.

       Constructors

       $j = Jcode->new($str [, $icode])
         Creates the Jcode object $j from $str.  The input encoding is
         detected automatically unless you set $icode explicitly.  For the
         available charsets, see getcode below.

         For perl 5.8.1 or better, $icode can be any encoding name that Encode
         understands.

           $j = Jcode->new($european, 'iso-latin1');

         When the object is stringified, it returns the EUC-converted string,
         so you can "print $j" instead of "print $j->euc".

         Passing Reference
           Instead of a scalar value, you can pass a reference:

           Jcode->new(\$str);

           This saves a little time, at the cost of the value of $str being
           converted.  (In a way, $str is now "tied" to the Jcode object.)

       $j->set($str [, $icode])
         Sets $j's internal string to $str.  Handy when you use a Jcode object
         repeatedly (saves the time and memory needed to create a new object).

          # converts mailbox to SJIS format
          my $jconv = Jcode->new;
          $/ = 00;
          while(<>){
              print $jconv->set(\$_)->mime_decode->sjis;
          }

       $j->append($str [, $icode]);
         Appends $str to $j's internal string.

       $j = jcode($str [, $icode]);
         Shortcut for Jcode->new(), so you can write "jcode($str)->sjis".

       Encoded Strings

       In general, you can retrieve the string in a given encoding as
       $j->encoded, where "encoded" is the name of the encoding.

       $sjis = jcode($str)->sjis
       $euc = $j->euc
       $jis = $j->jis
       $sjis = $j->sjis
       $ucs2 = $j->ucs2
       $utf8 = $j->utf8
         What you code is what you get :)

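       As a minimal sketch of the constructor and the accessors above,
       assuming perl 5.8.1 or better with Jcode and Encode installed.  The
       sample text (three kanji) and all variable names are illustrative
       only, and the input encoding is passed explicitly instead of being
       auto-detected.

```perl
use strict;
use warnings;
use Encode ();
use Jcode;

# Three kanji ("Nihongo"), written with escapes so that this sketch does
# not depend on the encoding of the source file.
my $chars      = "\x{65e5}\x{672c}\x{8a9e}";
my $utf8_bytes = Encode::encode('UTF-8', $chars);

# Give the input encoding explicitly instead of relying on auto-detection.
my $j = Jcode->new($utf8_bytes, 'utf8');

my $euc    = $j->euc;        # EUC-JP bytes
my $sjis   = $j->sjis;       # Shift_JIS bytes
my $string = "$j";           # stringification yields the EUC-JP form
my $length = $j->jlength;    # length in characters, not bytes

printf "euc: %d bytes, sjis: %d bytes, %d characters\n",
    length($euc), length($sjis), $length;
```

       Passing $icode is optional; it is used here only so that the sketch
       does not depend on how a very short string is auto-detected.
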
       $iso_2022_jp = $j->iso_2022_jp
         Same as "$j->h2z->jis".  Hankaku kana are forcibly converted to
         Zenkaku.

         For perl 5.8.1 and better, you can also use any encoding name or
         alias that Encode supports.  For example:

           $european = $j->iso_latin1; # replace '-' with '_' for names.

         FYI: Encode::Encoder uses a similar trick.

         $j->fallback($fallback)
           For perl 5.8.1 or better, Jcode stores the internal string in
           UTF-8.  Any character that does not map to the requested
           "->encoding" is replaced with a '?', which is the Encode standard.

             my $unistr = "\x{262f}"; # YIN YANG
             my $j = jcode($unistr);  # $j->euc is '?'

           You can change this behavior by specifying a fallback, as in
           Encode.  The values are the same as Encode's.  "Jcode::FB_PERLQQ",
           "Jcode::FB_XMLCREF" and "Jcode::FB_HTMLCREF" are aliased to those
           of Encode for convenience.

             print $j->fallback(Jcode::FB_PERLQQ)->euc;   # '\x{262f}'
             print $j->fallback(Jcode::FB_XMLCREF)->euc;  # '&#x262f;'
             print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'

           The global variable $Jcode::FALLBACK stores the default fallback,
           so you can override it by assigning a new value.

             $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme

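       Since the fallback values are those of Encode, their effect can be
       sketched with Encode alone (perl 5.8.1 or better assumed).  The helper
       euc_with_fallback() below is hypothetical; it is not part of Jcode or
       Encode and only wraps Encode::encode() for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Jcode or Encode): encode $chars to EUC-JP
# using the given Encode CHECK value.  $chars is a private copy, so it does
# not matter whether Encode modifies its input when CHECK is set.
sub euc_with_fallback {
    my ($chars, $check) = @_;
    return Encode::encode('euc-jp', $chars, $check);
}

my $yin_yang = "\x{262f}";    # YIN YANG, which has no EUC-JP mapping

my $perlqq   = euc_with_fallback($yin_yang, Encode::FB_PERLQQ());
my $xmlcref  = euc_with_fallback($yin_yang, Encode::FB_XMLCREF());
my $htmlcref = euc_with_fallback($yin_yang, Encode::FB_HTMLCREF());

print "$perlqq\n";      # \x{262f}
print "$xmlcref\n";     # &#x262f;
print "$htmlcref\n";    # &#9775;
```

       With Jcode, the same constants would be passed to "$j->fallback()" as
       shown above.
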
       [@lines =] $jcode->jfold([$width, $newline_str, $kref])
         Folds lines in the Jcode string every $width (default: 72), with a
         newline string specified by $newline_str (default: "\n").  $width
         is the number of "halfwidth" characters; fullwidth characters are
         counted as two.

         Rudimentary kinsoku support is available for Perl 5.8.1 and better.

       $length = $jcode->jlength();
         Returns the length in characters, rather than the length in bytes.

       Methods that use MIME::Base64

       To use the methods below, you need MIME::Base64.  To install it, simply
       run:

          perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

       If your perl is 5.6 or better, this is unnecessary since MIME::Base64
       is bundled.

       $mime_header = $j->mime_encode([$lf, $bpl])
         Converts $str to a MIME header as documented in RFC1522.  When $lf
         is specified, it uses $lf to fold lines (default: \n).  When $bpl is
         specified, it uses $bpl as the number of bytes per line (default:
         76; this number must not exceed 76).

         For Perl 5.8.1 or better, you can also encode a MIME header as:

           $mime_header = $j->MIME_Header;

         In which case the resulting $mime_header is MIME-B-encoded UTF-8,
         whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP.
         Most modern MUAs support both.

       $j->mime_decode;
         Decodes the MIME header in the Jcode object.  For perl 5.8.1 or
         better, you can also do the same with:

           Jcode->new($str, 'MIME-Header')

       Hankaku vs. Zenkaku

       $j->h2z([$keep_dakuten])
         Converts X201 kana (Hankaku) to X208 kana (Zenkaku).  When
         $keep_dakuten is set, it leaves dakuten as is (that is, "ka +
         dakuten" is left as is instead of being converted to "ga").

         You can retrieve the number of matches via $j->nmatch.

       $j->z2h
         Converts X208 kana (Zenkaku) to X201 kana (Hankaku).

         You can retrieve the number of matches via $j->nmatch.

       Regexp emulators

       To use "->m()" and "->s()", you need perl 5.8.1 or better.

       $j->tr($from, $to, $opt);
         Applies "tr/$from/$to/" on the Jcode object, where $from and $to are
         EUC-JP strings.  On perl 5.8.1 or better, $from and $to can also be
         flagged UTF-8 strings.

         If $opt is set, "tr/$from/$to/$opt" is applied.  $opt must be 'c',
         'd' or a combination thereof.

         You can retrieve the number of matches via $j->nmatch.

         The following methods are available only for perl 5.8.1 or better.

       $j->s($pattern, $replace, $opt);
         Applies "s/$pattern/$replace/$opt".  $pattern and $replace must be in
         EUC-JP or flagged UTF-8.  $opt is the same as the regexp options; see
         perlre for details.

         Like "$j->tr()", "$j->s()" returns the object itself, so you can
         chain the operations as follows:

           $j->tr("a-z", "A-Z")->s("foo", "bar");

       [@match = ] $j->m($pattern, $opt);
         Applies "m/$pattern/$opt".  Note that this method DOES NOT RETURN AN
         OBJECT, so you can't chain it like "$j->s()".

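       The chaining described above can be sketched as follows, assuming
       perl 5.8.1 or better with Jcode installed.  The sample string and the
       variable names are illustrative only; plain ASCII is used so that the
       result does not depend on any Japanese encoding.

```perl
use strict;
use warnings;
use Jcode;

# tr() returns the object, so nmatch can be read right after it.
my $j = Jcode->new('foo bar', 'ascii');
$j->tr('a-z', 'A-Z');
my $translated = $j->nmatch;    # number of characters tr() translated
my $upper      = $j->euc;       # string after tr()

# tr() and s() both return the object, so they can be chained;
# the final ->euc returns the resulting string.
my $result = Jcode->new('foo bar', 'ascii')
                  ->tr('a-z', 'A-Z')
                  ->s('FOO', 'baz')
                  ->euc;

print "$upper ($translated translated) -> $result\n";
```

       Note that "->m()" cannot appear in the middle of such a chain, since
       it returns the match list rather than the object.
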
       Instance Variables

       If you need to access the instance variables of a Jcode object, use
       the access methods below instead of accessing them directly (that's
       what OOP is all about).

       FYI, Jcode uses a ref to an array instead of a ref to a hash (the
       common way) to optimize speed.  (Actually, you don't have to know this
       as long as you use the access methods; once again, that's OOP.)

       $j->r_str
         Reference to the EUC-coded string.

       $j->icode
         Input character code in the most recent operation.

       $j->nmatch
         Number of matches (used in $j->tr, etc.)

Subroutines

       ($code, [$nmatch]) = getcode($str)
         Returns the character code of $str.  The return codes are as follows:

          ascii   Ascii (Contains no Japanese Code)
          binary  Binary (Not Text File)
          euc     EUC-JP
          sjis    SHIFT_JIS
          jis     JIS (ISO-2022-JP)
          ucs2    UCS2 (Raw Unicode)
          utf8    UTF8

         When called in list context instead of scalar context, it also
         returns how many character codes were found.  As mentioned above,
         $str can be \$str instead.

         jcode.pl Users:  This function is 100% upward-compatible with
         jcode::getcode() -- well, almost;

          * When its return value is an array, the order is the opposite;
            jcode::getcode() returns $nmatch first.

          * jcode::getcode() returns 'undef' when the number of EUC characters
            is equal to that of SJIS.  Jcode::getcode() returns EUC.  For
            Jcode.pm there are no in-betweens.

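       A minimal sketch of the two calling contexts, assuming Jcode is
       installed.  The sample text and variable names are illustrative only;
       plain ASCII is used because its detection result is unambiguous.

```perl
use strict;
use warnings;
use Jcode;

my $text = 'Hello, world';

# Scalar context: only the code name is returned.
my $code = Jcode::getcode($text);

# List context, passing a reference: the code name comes first,
# followed by the match count (the reverse of jcode.pl's order).
my ($code_again, $nmatch) = Jcode::getcode(\$text);

print "$code\n";    # ascii
```
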
       Jcode::convert($str, [$ocode, $icode, $opt])
         Converts $str to the character code specified by $ocode.  When
         $icode is also specified, it is assumed as the encoding of the input
         string instead of the one detected by getcode().  As mentioned above,
         $str can be \$str instead.

         jcode.pl Users:  This function is 100% upward-compatible with
         jcode::convert() !

BUGS

       For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning
       Jcode is subject to bugs therein.

ACKNOWLEDGEMENTS

       This package owes a lot in motivation, design, and code to the
       jcode.pl for Perl4 by Kazumasa Utashiro <utashiro@iij.ad.jp>.

       Hiroki Ohzaki <ohzaki@iod.ricoh.co.jp> has helped me polish regexps
       from the very first stage of development.

       JEncode by makamaka@donzoko.net has inspired me to integrate Encode
       into Jcode.  He has also contributed the Japanese POD.

       And folks at the Jcode Mailing list <jcode5@ring.gr.jp>.  Without
       them, I couldn't have coded this far.

SEE ALSO

       Encode

       Jcode::Nihongo

       <http://www.iana.org/assignments/character-sets>

COPYRIGHT

       Copyright 1999-2005 Dan Kogai <dankogai@dan.co.jp>

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.



perl v5.8.8                       2005-02-19                          Jcode(3)