Jcode(3)              User Contributed Perl Documentation             Jcode(3)

NAME

       Jcode - Japanese Charset Handler

SYNOPSIS

        use Jcode;
        #
        # traditional
        Jcode::convert(\$str, $ocode, $icode, "z");
        # or OOP!
        print Jcode->new($str)->h2z->tr($from, $to)->utf8;

DESCRIPTION

       Japanese documentation is available as Jcode::Nihongo.

       Jcode.pm supports both an object-oriented and a traditional approach.
       With the object-oriented approach, you can write:

         $iso_2022_jp = Jcode->new($str)->h2z->jis;

       which is more elegant than:

         $iso_2022_jp = $str;
         &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");

       For those unfamiliar with objects, Jcode.pm still supports "getcode()"
       and "convert()".

       On perl 5.8.1 or later, Jcode acts as a wrapper to Encode, the
       standard charset handler module for Perl 5.8 or later.

Methods

       All methods described here return the Jcode object unless otherwise
       noted.

   Constructors
       $j = Jcode->new($str [, $icode])
         Creates the Jcode object $j from $str.  The input code is detected
         automatically unless you explicitly set $icode.  For the available
         charsets, see getcode below.

         For perl 5.8.1 or better, $icode can be any encoding name that Encode
         understands.

           $j = Jcode->new($european, 'iso-latin1');

         When the object is stringified, it returns the EUC-converted string,
         so you can write "print $j" instead of "print $j->euc".

         Passing Reference
           Instead of a scalar value, you can pass a reference:

           Jcode->new(\$str);

           This saves a little time, at the cost of $str itself being
           converted.  (In a way, $str is now "tied" to the Jcode object.)

       $j->set($str [, $icode])
         Sets $j's internal string to $str.  Handy when you reuse a Jcode
         object, since it saves the time and memory of creating a new one.

          # converts mailbox to SJIS format
          my $jconv = new Jcode;
          $/ = 00;
          while (<>) {
              print $jconv->set(\$_)->mime_decode->sjis;
          }

       $j->append($str [, $icode]);
         Appends $str to $j's internal string.

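         A minimal sketch of reusing one object with set() and append(),
         assuming Jcode is installed.  The sample strings are illustrative
         UTF-8 bytes written as hex escapes, and $icode is passed explicitly
         so that nothing depends on auto-detection.

```perl
use strict;
use warnings;
use Jcode;

# Illustrative UTF-8 bytes, written as escapes so the script's own
# encoding does not matter.
my $nihon = "\xe6\x97\xa5\xe6\x9c\xac";   # "nihon" (2 kanji)
my $go    = "\xe8\xaa\x9e";               # "go"    (1 kanji)

my $j = Jcode->new('');                   # one object, reused below

# set() replaces the internal string; append() adds to it.
my $euc_hex = unpack 'H*',
    $j->set($nihon, 'utf8')->append($go, 'utf8')->euc;
print "$euc_hex\n";                       # c6fccbdcb8ec

# The same object can then take an unrelated string.
my $ascii = $j->set('plain text')->euc;
print "$ascii\n";                         # plain text
```

         Because set() and append() both return the object, the calls chain
         directly into an output method such as euc().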
       $j = jcode($str [, $icode]);
         Shortcut for Jcode->new(), so you can write calls such as
         jcode($str)->sjis.

   Encoded Strings
       In general, you can retrieve the string in a given encoding as
       $j->encoded, where "encoded" is the name of the encoding.

       $sjis = jcode($str)->sjis
       $euc = $j->euc
       $jis = $j->jis
       $sjis = $j->sjis
       $ucs2 = $j->ucs2
       $utf8 = $j->utf8
         What you code is what you get :)

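         As an illustration, the sketch below converts one three-character
         string to each of the encodings listed above and prints the bytes in
         hex.  It assumes Jcode is installed; the input is an arbitrary
         sample, given as UTF-8 bytes with an explicit $icode.

```perl
use strict;
use warnings;
use Jcode;

# Sample input: three kanji as UTF-8 bytes (an illustrative choice).
my $j = Jcode->new("\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e", 'utf8');

# Call each output method by name and keep the result as a hex string.
my @names = qw(euc sjis jis ucs2 utf8);
my %hex   = map { $_ => unpack('H*', $j->$_) } @names;

printf "%-4s %s\n", $_, $hex{$_} for @names;
# euc  c6fccbdcb8ec
# sjis 93fa967b8cea
# jis  1b2442467c4b5c386c1b2842
# ucs2 65e5672c8a9e
# utf8 e697a5e69cace8aa9e
```

         The jis output is wrapped in the ISO-2022-JP escape sequences (1b 24
         42 at the start, 1b 28 42 at the end), which is why it is longer
         than the other double-byte forms.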
       $iso_2022_jp = $j->iso_2022_jp
         Same as "$j->h2z->jis".  Hankaku kana are forcibly converted to
         Zenkaku.

         For perl 5.8.1 and better, you can also use any encoding names and
         aliases that Encode supports.  For example:

           $european = $j->iso_latin1; # replace '-' with '_' for names.

         FYI: Encode::Encoder uses a similar trick.

         $j->fallback($fallback)
           On perl 5.8.1 or better, Jcode stores the internal string in
           UTF-8.  Any character that does not map to ->encoding is replaced
           with a '?', which is the Encode standard.

             my $unistr = "\x{262f}"; # YIN YANG
             my $j = jcode($unistr);  # $j->euc is '?'

           You can change this behavior by specifying a fallback, as with
           Encode.  The values are the same as Encode's.
           "Jcode::FB_PERLQQ", "Jcode::FB_XMLCREF", and "Jcode::FB_HTMLCREF"
           are aliased to those of Encode for convenience.

             print $j->fallback(Jcode::FB_PERLQQ)->euc;   # '\x{262f}'
             print $j->fallback(Jcode::FB_XMLCREF)->euc;  # '&#x262f;'
             print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'

           The global variable $Jcode::FALLBACK stores the default fallback,
           so you can override it by assigning a value.

             $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme

       [@lines =] $jcode->jfold([$width, $newline_str, $kref])
         Folds lines in the Jcode string every $width (default: 72), joining
         them with the newline string specified by $newline_str (default:
         "\n").  $width is counted in "halfwidth" characters; fullwidth
         characters count as two.

         Rudimentary kinsoku support is available for Perl 5.8.1 and better.

       $length = $jcode->jlength();
         Returns the length in characters rather than in bytes.

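         To make the difference between byte length and character length
         concrete, here is a small sketch, assuming Jcode is installed.  The
         sample is three kanji encoded as nine UTF-8 bytes.

```perl
use strict;
use warnings;
use Jcode;

# Three kanji as UTF-8: each character occupies three bytes.
my $utf8 = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";

my $bytes = length $utf8;                          # plain byte count
my $chars = Jcode->new($utf8, 'utf8')->jlength;    # character count

print "bytes=$bytes chars=$chars\n";               # bytes=9 chars=3
```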
   Methods that use MIME::Base64
       To use the methods below, you need MIME::Base64.  To install it, run:

          perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

       If your perl is 5.6 or better, this is unnecessary since MIME::Base64
       is bundled.

       $mime_header = $j->mime_encode([$lf, $bpl])
         Converts $str to a MIME header as documented in RFC 1522.  When $lf
         is specified, it is used to fold lines (default: \n).  When $bpl is
         specified, it is used as the number of bytes per line (default: 76;
         this number must not exceed 76).

         For Perl 5.8.1 or better, you can also encode a MIME header as:

           $mime_header = $j->MIME_Header;

         In this case the resulting $mime_header is MIME-B-encoded UTF-8,
         whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP.
         Most modern MUAs support both.

       $j->mime_decode;
         Decodes the MIME header in the Jcode object.  For perl 5.8.1 or
         better, you can do the same with:

           Jcode->new($str, 'MIME-Header')

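         The round trip through mime_encode() and mime_decode() can be
         sketched as follows, assuming Jcode (and MIME::Base64) is installed.
         The subject text is an illustrative sample.  The exact header text
         may differ between Jcode and Encode versions, so the sketch only
         relies on the header being ISO-2022-JP and decoding back to the
         original.

```perl
use strict;
use warnings;
use Jcode;

# Sample subject: three kanji as UTF-8 bytes (illustrative).
my $subject = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";

# Encode as a MIME header (MIME-B-encoded ISO-2022-JP per the text above).
my $header = Jcode->new($subject, 'utf8')->mime_encode;
print "$header\n";

# Decode the header again and ask for UTF-8 bytes.
my $decoded = Jcode->new($header)->mime_decode->utf8;
print $decoded eq $subject ? "round trip ok\n" : "round trip differs\n";
```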
   Hankaku vs. Zenkaku
       $j->h2z([$keep_dakuten])
         Converts JIS X 0201 kana (Hankaku) to JIS X 0208 kana (Zenkaku).
         When $keep_dakuten is set, dakuten are left as is (that is, "ka +
         dakuten" stays as two characters instead of being converted to
         "ga").

         You can retrieve the number of matches via $j->nmatch.

       $j->z2h
         Converts JIS X 0208 kana (Zenkaku) to JIS X 0201 kana (Hankaku).

         You can retrieve the number of matches via $j->nmatch.

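         A short sketch of both directions, assuming Jcode is installed.  The
         input is the halfwidth pair "ka + dakuten" from the description
         above, given as UTF-8 bytes; the default h2z() merges it into the
         single fullwidth "ga".

```perl
use strict;
use warnings;
use Jcode;

# Halfwidth KA (U+FF76) followed by halfwidth dakuten (U+FF9E), as UTF-8.
my $hankaku = "\xef\xbd\xb6\xef\xbe\x9e";

# Hankaku to Zenkaku: the pair becomes fullwidth GA (U+30AC).
my $zenkaku = Jcode->new($hankaku, 'utf8')->h2z->utf8;
print unpack('H*', $zenkaku), "\n";     # e382ac

# Zenkaku to Hankaku: GA is split back into KA + dakuten.
my $back = Jcode->new($zenkaku, 'utf8')->z2h->utf8;
print unpack('H*', $back), "\n";        # efbdb6efbe9e
```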
   Regexp emulators
       To use "->m()" and "->s()", you need perl 5.8.1 or better.

       $j->tr($from, $to, $opt);
         Applies "tr/$from/$to/" on the Jcode object, where $from and $to are
         EUC-JP strings.  On perl 5.8.1 or better, $from and $to can also be
         flagged UTF-8 strings.

         If $opt is set, "tr/$from/$to/$opt" is applied.  $opt must be 'c',
         'd' or a combination thereof.

         You can retrieve the number of matches via $j->nmatch.

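         For instance, a range-based tr() can turn hiragana into katakana.
         This is a sketch, assuming Jcode is installed; all three strings are
         EUC-JP bytes written as hex escapes, and the sample text is
         illustrative.

```perl
use strict;
use warnings;
use Jcode;

my $hira = "\xa4\xa2\xa4\xa4";        # hiragana "a" + "i" in EUC-JP
my $from = "\xa4\xa1-\xa4\xf3";       # range: small hiragana "a" to "n"
my $to   = "\xa5\xa1-\xa5\xf3";       # range: small katakana "a" to "n"

# tr() returns the object, so the result can be read back afterwards.
my $j    = Jcode->new($hira, 'euc')->tr($from, $to);
my $kata = unpack 'H*', $j->euc;

print "$kata\n";                      # a5a2a5a4 (katakana "a" + "i")
print $j->nmatch, "\n";               # 2
```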
         The following methods are available only for perl 5.8.1 or better.

       $j->s($pattern, $replace, $opt);
         Applies "s/$pattern/$replace/$opt".  $pattern and $replace must be
         in EUC-JP or flagged UTF-8.  $opt takes the same values as regexp
         options.  See perlre for regexp options.

         Like "$j->tr()", "$j->s()" returns the object itself, so you can
         chain the operations as follows:

           $j->tr("a-z", "A-Z")->s("foo", "bar");

       [@match = ] $j->m($pattern, $opt);
         Applies "m/$pattern/$opt".  Note that this method DOES NOT RETURN AN
         OBJECT, so you can't chain it the way you can chain "$j->s()".

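         The difference between the chainable s() and the non-chainable m()
         can be sketched like this, assuming Jcode on perl 5.8.1 or better.
         The input string and patterns are illustrative ASCII samples.

```perl
use strict;
use warnings;
use Jcode;

my $j = Jcode->new('foo 123');

# tr() and s() both return the object, so they chain.
$j->tr('a-z', 'A-Z')->s('FOO', 'BAR');
my $text = $j->euc;
print "$text\n";                 # BAR 123

# m() returns the list of captures instead of the object,
# so it has to come last.
my @digits = $j->m('(\d+)');
print "@digits\n";               # 123
```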
   Instance Variables
       If you need to access instance variables of a Jcode object, use the
       access methods below instead of accessing them directly (that's what
       OOP is all about).

       FYI, Jcode uses a ref to an array instead of the more common ref to a
       hash to optimize speed.  (You don't need to know this as long as you
       use the access methods; once again, that's OOP.)

       $j->r_str
         Reference to the EUC-coded string.

       $j->icode
         Input character code of the most recent operation.

       $j->nmatch
         Number of matches (used in $j->tr, etc.)

Subroutines

       ($code, [$nmatch]) = getcode($str)
         Returns the character code of $str.  The return codes are as
         follows:

          ascii   Ascii (Contains no Japanese Code)
          binary  Binary (Not Text File)
          euc     EUC-JP
          sjis    SHIFT_JIS
          jis     JIS (ISO-2022-JP)
          ucs2    UCS2 (Raw Unicode)
          utf8    UTF8

         In list context rather than scalar context, it also returns how
         many character codes were found.  As mentioned above, $str can be
         \$str instead.

         jcode.pl Users:  This function is almost 100% upward-compatible
         with jcode::getcode(), with two differences:

          * When its return value is an array, the order is the opposite;
            jcode::getcode() returns $nmatch first.

          * jcode::getcode() returns 'undef' when the number of EUC characters
            is equal to that of SJIS.  Jcode::getcode() returns EUC.  For
            Jcode.pm there are no in-betweens.

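         A sketch of getcode() in list context, assuming Jcode is installed.
         The samples are the same three kanji in several encodings plus a
         plain ASCII word; detection is heuristic, so other inputs
         (especially very short ones) may be classified differently.

```perl
use strict;
use warnings;
use Jcode;

# Illustrative samples, keyed by the code we expect getcode() to report.
my %sample = (
    ascii => 'hello',
    euc   => "\xc6\xfc\xcb\xdc\xb8\xec",
    sjis  => "\x93\xfa\x96\x7b\x8c\xea",
    jis   => "\e\$BF|K\\8l\e(B",
    utf8  => "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e",
);

my %detected;
for my $name (sort keys %sample) {
    # List context: the code comes first, then the match count.
    my ($code, $nmatch) = getcode($sample{$name});
    $detected{$name} = $code;
    print "$name => $code\n";
}
```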
       Jcode::convert($str, [$ocode, $icode, $opt])
         Converts $str to the character code specified by $ocode.  When
         $icode is also specified, it is used as the input code instead of
         the one detected by getcode().  As mentioned above, $str can be
         \$str instead.

         jcode.pl Users:  This function is 100% upward-compatible with
         jcode::convert() !

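         As a sketch of how the traditional and object interfaces relate,
         the code below performs the same conversion both ways.  It assumes
         Jcode is installed; the sample is UTF-8 bytes, and $icode is given
         explicitly so that detection plays no part.

```perl
use strict;
use warnings;
use Jcode;

my $utf8 = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";   # illustrative sample

# Object interface: the source string is left untouched.
my $via_object = Jcode->new($utf8, 'utf8')->sjis;

# Traditional interface: the referenced string is converted in place,
# so work on a copy.
my $in_place = $utf8;
Jcode::convert(\$in_place, 'sjis', 'utf8');

print unpack('H*', $in_place), "\n";                 # 93fa967b8cea
print $in_place eq $via_object ? "same\n" : "different\n";
```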

BUGS

       On perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning
       that Jcode is subject to the bugs therein.

ACKNOWLEDGEMENTS

       This package owes a lot in motivation, design, and code, to the
       jcode.pl for Perl4 by Kazumasa Utashiro <utashiro@iij.ad.jp>.

       Hiroki Ohzaki <ohzaki@iod.ricoh.co.jp> has helped me polish regexp from
       the very first stage of development.

       JEncode by makamaka@donzoko.net has inspired me to integrate Encode to
       Jcode.  He has also contributed Japanese POD.

       And folks at Jcode Mailing list <jcode5@ring.gr.jp>.  Without them, I
       couldn't have coded this far.

SEE ALSO

       Encode

       Jcode::Nihongo

       <http://www.iana.org/assignments/character-sets>

COPYRIGHT

       Copyright 1999-2005 Dan Kogai <dankogai@dan.co.jp>

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.28.1                      2008-05-10                          Jcode(3)