Jcode(3) User Contributed Perl Documentation Jcode(3)

NAME
Jcode - Japanese Charset Handler

SYNOPSIS
    use Jcode;
    #
    # traditional
    Jcode::convert(\$str, $ocode, $icode, "z");
    # or OOP!
    print Jcode->new($str)->h2z->tr($from, $to)->utf8;

DESCRIPTION
A Japanese version of this document is available as Jcode::Nihongo.

Jcode.pm supports both an object-oriented and a traditional approach. With
the object approach, you can write:

    $iso_2022_jp = Jcode->new($str)->h2z->jis;

which is more elegant than:

    $iso_2022_jp = $str;
    &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");

For those unfamiliar with objects, Jcode.pm still supports "getcode()"
and "convert()".

If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode,
the standard charset handler module for Perl 5.8 and later.
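
As a rough illustration of the two styles, the sketch below converts the same UTF-8 bytes to Shift_JIS both ways. It assumes perl 5.8.1 or later with Jcode installed. The sample text and variable names are made up for the example, and Encode is used only to build the input.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;

# Sample input (hypothetical): three kanji as UTF-8 bytes, built with
# Encode so the example does not depend on the source file encoding.
my $utf8 = encode('UTF-8', "\x{65e5}\x{672c}\x{8a9e}");

# Object approach: $utf8 itself is left untouched.
my $sjis_oo = Jcode->new($utf8, 'utf8')->sjis;

# Traditional approach: convert() may rewrite the string it is given,
# so it works on a copy here and the return value is used.
my $copy      = $utf8;
my $sjis_trad = Jcode::convert(\$copy, 'sjis', 'utf8');

print $sjis_oo eq $sjis_trad ? "same bytes\n" : "different bytes\n";
```

Passing $icode explicitly, as done here, skips the automatic detection step, which can be unreliable on very short strings.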

Methods
Methods mentioned here all return a Jcode object unless otherwise
mentioned.

Constructors

$j = Jcode->new($str [, $icode])
Creates a Jcode object $j from $str. The input code is detected
automatically unless you explicitly set $icode. For the available
charsets, see getcode below.

For perl 5.8.1 or later, $icode can be any encoding name that Encode
understands.

    $j = Jcode->new($european, 'iso-latin1');

When the object is stringified, it returns the EUC-converted string,
so you can write "print $j" instead of "print $j->euc".

Passing Reference
Instead of a scalar value, you can pass a reference:

    Jcode->new(\$str);

This saves a little time, in exchange for the value of $str being
converted. (In a way, $str is now "tied" to the Jcode object.)

$j->set($str [, $icode])
Sets $j's internal string to $str. Handy when you use a Jcode object
repeatedly (saves the time and memory of creating a new object).

    # converts mailbox to SJIS format
    my $jconv = new Jcode;
    $/ = 00;
    while(<>){
        print $jconv->set(\$_)->mime_decode->sjis;
    }
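
A smaller, self-contained sketch of the same reuse pattern follows. It assumes perl 5.8.1 or later with Jcode installed; the word list is invented for the example and is built with Encode rather than read from a file.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;

# Hypothetical input: two kanji words as EUC-JP bytes.
my @words     = ("\x{65e5}\x{672c}", "\x{6771}\x{4eac}");
my @words_euc = map { encode('euc-jp', $_) } @words;

# One object is created up front and refilled with set() for each item,
# instead of calling Jcode->new() inside the loop.
my $jconv = Jcode->new('');
my @words_utf8;
for my $euc (@words_euc) {
    push @words_utf8, $jconv->set($euc, 'euc')->utf8;
}

printf "%d strings converted with one object\n", scalar @words_utf8;
```

Here set() receives plain scalars, so the input array keeps its EUC-JP bytes; passing \$euc instead would let Jcode rewrite each element.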

$j->append($str [, $icode]);
Appends $str to $j's internal string.

$j = jcode($str [, $icode]);
Shortcut for Jcode->new(), so you can write calls such as
"jcode($str)->sjis".

Encoded Strings

In general, you can retrieve the encoded string as $j->encoded, where
"encoded" is the name of the target encoding.

$sjis = jcode($str)->sjis
$euc = $j->euc
$jis = $j->jis
$sjis = $j->sjis
$ucs2 = $j->ucs2
$utf8 = $j->utf8
What you code is what you get :)

$iso_2022_jp = $j->iso_2022_jp
Same as "$j->h2z->jis". Hankaku kana are forcibly converted to
Zenkaku.

For perl 5.8.1 or later, you can also use any encoding names and
aliases that Encode supports. For example:

    $european = $j->iso_latin1; # replace '-' with '_' for names.

FYI: Encode::Encoder uses a similar trick.
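
To make the accessor methods concrete, here is a minimal sketch that encodes one string several ways and reports the byte length of each result. It assumes perl 5.8.1 or later with Jcode installed; the sample text is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;

# Three kanji, handed to Jcode as UTF-8 bytes with an explicit input code.
my $j = jcode(encode('UTF-8', "\x{65e5}\x{672c}\x{8a9e}"), 'utf8');

# Call each accessor by name and record how many bytes it returns.
my %len = map { $_ => length($j->$_) } qw(euc sjis jis ucs2 utf8);
printf "%-4s %2d bytes\n", $_, $len{$_} for sort keys %len;

# Stringifying the object is equivalent to calling ->euc.
print "stringifies to EUC\n" if "$j" eq $j->euc;
```

The byte counts differ because each kanji takes two bytes in EUC-JP and Shift_JIS and three in UTF-8, while the JIS output also carries escape sequences.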

$j->fallback($fallback)
For perl 5.8.1 or later, Jcode stores the internal string in UTF-8. Any
character that does not map to ->encoding is replaced with a '?', which
is the Encode standard.

    my $unistr = "\x{262f}"; # YIN YANG
    my $j = jcode($unistr);  # $j->euc is '?'

You can change this behavior by specifying a fallback, as in Encode.
The values are the same as in Encode. "Jcode::FB_PERLQQ",
"Jcode::FB_XMLCREF" and "Jcode::FB_HTMLCREF" are aliased to those of
Encode for convenience.

    print $j->fallback(Jcode::FB_PERLQQ)->euc;   # '\x{262f}'
    print $j->fallback(Jcode::FB_XMLCREF)->euc;  # '&#x262f;'
    print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'

The global variable $Jcode::FALLBACK stores the default fallback, so
you can override it by assigning a new value.

    $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
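
The fragments above can be put together into a runnable sketch. It assumes perl 5.8.1 or later with Jcode installed, and sets the fallback per object rather than through the global variable.

```perl
use strict;
use warnings;
use Jcode;

# U+262F (YIN YANG) has no EUC-JP mapping, so the fallback decides
# what appears in the output.
my $unistr = "\x{262f}";

my $perlqq   = jcode($unistr)->fallback(Jcode::FB_PERLQQ)->euc;
my $xmlcref  = jcode($unistr)->fallback(Jcode::FB_XMLCREF)->euc;
my $htmlcref = jcode($unistr)->fallback(Jcode::FB_HTMLCREF)->euc;

print "$_\n" for $perlqq, $xmlcref, $htmlcref;
```

Setting the fallback on each object keeps the choice local; assigning to $Jcode::FALLBACK instead changes the default for objects created afterwards.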

[@lines =] $jcode->jfold([$width, $newline_str, $kref])
Folds lines in the Jcode string every $width (default: 72) with the
newline string specified by $newline_str (default: "\n"). $width is
the number of "halfwidth" characters; fullwidth characters are counted
as two.

Rudimentary kinsoku support is available for Perl 5.8.1 or later.

$length = $jcode->jlength();
Returns the length in characters, rather than in bytes.
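
The difference between byte length and character length can be seen with a short sketch like the one below. It assumes Jcode is installed; the three-kanji sample is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;

# Three kanji as UTF-8 bytes: three bytes per character.
my $bytes = encode('UTF-8', "\x{65e5}\x{672c}\x{8a9e}");

my $nbytes = length($bytes);                       # counts bytes
my $nchars = jcode($bytes, 'utf8')->jlength;       # counts characters

print "$nbytes bytes, $nchars characters\n";
```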

Methods that use MIME::Base64

To use the methods below, you need MIME::Base64. To install it, simply run

    perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

If your perl is 5.6 or later, there is no need, since MIME::Base64 is
bundled.

$mime_header = $j->mime_encode([$lf, $bpl])
Converts $str to a MIME header as documented in RFC 1522. When $lf is
specified, it uses $lf to fold lines (default: \n). When $bpl is
specified, it uses $bpl as the number of bytes per line (default: 76;
this number must be 76 or smaller).

For Perl 5.8.1 or later, you can also encode a MIME header as:

    $mime_header = $j->MIME_Header;

in which case the resulting $mime_header is MIME-B-encoded UTF-8,
whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP.
Most modern MUAs support both.

$j->mime_decode;
Decodes the MIME header in the Jcode object. For perl 5.8.1 or later,
you can also do the same with:

    Jcode->new($str, 'MIME-Header')
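
A round trip through mime_encode and mime_decode might look like the sketch below. It assumes perl 5.8.1 or later with Jcode and MIME::Base64 available; the subject text is invented, and the exact header bytes are not shown because they are not asserted here.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;

# Hypothetical subject line: three kanji as UTF-8 bytes.
my $subject = encode('UTF-8', "\x{65e5}\x{672c}\x{8a9e}");

# Encode as a MIME-B header in ISO-2022-JP.
my $header = jcode($subject, 'utf8')->mime_encode;

# The header itself is plain ASCII, so the input code is left to
# autodetection before decoding it back.
my $decoded = jcode($header)->mime_decode->utf8;

print "$header\n";
print "round trip ok\n" if $decoded eq $subject;
```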

Hankaku vs. Zenkaku

$j->h2z([$keep_dakuten])
Converts X201 kana (Hankaku) to X208 kana (Zenkaku). When
$keep_dakuten is set, it leaves dakuten as is (that is, "ka +
dakuten" is left as is instead of being converted to "ga").

You can retrieve the number of matches via $j->nmatch.

$j->z2h
Converts X208 kana (Zenkaku) to X201 kana (Hankaku).

You can retrieve the number of matches via $j->nmatch.
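
The "ka + dakuten" case described above can be sketched as follows. This assumes perl 5.8.1 or later with Jcode installed; the code points are written as escapes so the example does not depend on the source file encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;

# Halfwidth KA (U+FF76) followed by the halfwidth dakuten (U+FF9E).
my $hankaku = encode('UTF-8', "\x{ff76}\x{ff9e}");

# h2z without $keep_dakuten merges the pair into fullwidth GA (U+30AC).
my $j       = jcode($hankaku, 'utf8')->h2z;
my $zenkaku = $j->utf8;
printf "h2z replaced %d sequence(s)\n", $j->nmatch;

# z2h goes the other way and splits GA back into KA + dakuten.
my $back = jcode($zenkaku, 'utf8')->z2h->utf8;
print "round trip ok\n" if $back eq $hankaku;
```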

Regexp emulators

To use "->m()" and "->s()", you need perl 5.8.1 or later.

$j->tr($from, $to, $opt);
Applies "tr/$from/$to/" to the Jcode object, where $from and $to are
EUC-JP strings. On perl 5.8.1 or later, $from and $to can also be
flagged UTF-8 strings.

If $opt is set, "tr/$from/$to/$opt" is applied. $opt must be 'c',
'd' or a combination thereof.

You can retrieve the number of matches via $j->nmatch.

The following methods are available only for perl 5.8.1 or later.

$j->s($pattern, $replace, $opt);
Applies "s/$pattern/$replace/$opt". $pattern and $replace must be in
EUC-JP or flagged UTF-8. $opt takes the same values as the regexp
options; see perlre.

Like "$j->tr()", "$j->s()" returns the object itself, so you can chain
the operations as follows:

    $j->tr("a-z", "A-Z")->s("foo", "bar");

[@match = ] $j->m($pattern, $opt);
Applies "m/$pattern/$opt". Note that this method DOES NOT RETURN AN
OBJECT, so you can't chain it the way you can "$j->s()".
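
The three emulators can be tried on plain ASCII, which is valid EUC-JP, with a sketch like the one below. It assumes perl 5.8.1 or later with Jcode installed; the strings and patterns are made up for the example.

```perl
use strict;
use warnings;
use Jcode;

# tr() and s() return the object, so they can be chained.
# Note that s() runs after tr(), so it must match the uppercased text.
my $j = jcode('foo bar');
$j->tr('a-z', 'A-Z')->s('FOO', 'BAZ');
my $result = $j->euc;
print "$result\n";

# m() returns the match list rather than the object, so it ends a chain.
my @caps = jcode('foo bar')->m('(o+) (b)');
print join(',', @caps), "\n";
```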

Instance Variables

If you need to access the instance variables of a Jcode object, use the
access methods below instead of accessing them directly (that's what OOP
is all about).

FYI, Jcode uses a ref to an array instead of a ref to a hash (the common
way) to optimize speed. (Actually, you don't have to know this as long as
you use the access methods; once again, that's OOP.)

$j->r_str
Reference to the EUC-coded string.

$j->icode
Input character code of the most recent operation.

$j->nmatch
Number of matches (used in $j->tr, etc.)

Subroutines
($code, [$nmatch]) = getcode($str)
Returns the character code of $str. The return codes are as follows:

    ascii   Ascii (Contains no Japanese Code)
    binary  Binary (Not Text File)
    euc     EUC-JP
    sjis    SHIFT_JIS
    jis     JIS (ISO-2022-JP)
    ucs2    UCS2 (Raw Unicode)
    utf8    UTF8

When called in list context instead of scalar context, it also returns
how many character codes were found. As mentioned above, $str can be
\$str instead.

jcode.pl users: This function is 100% upward-compatible with
jcode::getcode() -- well, almost:

* When its return value is a list, the order is the opposite;
  jcode::getcode() returns $nmatch first.

* jcode::getcode() returns 'undef' when the number of EUC characters
  is equal to that of SJIS. Jcode::getcode() returns EUC. For
  Jcode.pm there is no in-between.
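
As an illustration of getcode() in list context, the sketch below runs it over a few samples whose encoding is known in advance. It assumes Jcode is installed; the samples are built with Encode, and detection is a heuristic, so other inputs (especially very short ones) may be guessed differently.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;

# The same three kanji in several encodings, plus a plain ASCII string.
# The hash keys are the codes getcode() is expected to report.
my $text   = "\x{65e5}\x{672c}\x{8a9e}";
my %sample = (
    ascii => 'hello',
    euc   => encode('euc-jp',   $text),
    sjis  => encode('shiftjis', $text),
    utf8  => encode('UTF-8',    $text),
);

my %detected;
for my $name (sort keys %sample) {
    # List context: the code comes first, then the match count.
    my ($code, $nmatch) = getcode($sample{$name});
    $detected{$name} = $code;
    printf "%-5s detected as %s\n", $name, defined $code ? $code : 'undef';
}
```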

Jcode::convert($str, [$ocode, $icode, $opt])
Converts $str to the character code specified by $ocode. When $icode is
also specified, it assumes $icode for the input string instead of the
one detected by getcode(). As mentioned above, $str can be \$str instead.

jcode.pl users: This function is 100% upward-compatible with
jcode::convert()!

BUGS
For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning
Jcode is subject to the bugs therein.

ACKNOWLEDGEMENTS
This package owes a lot in motivation, design, and code to
jcode.pl for Perl4 by Kazumasa Utashiro <utashiro@iij.ad.jp>.

Hiroki Ohzaki <ohzaki@iod.ricoh.co.jp> has helped me polish the regexps
from the very first stage of development.

JEncode by makamaka@donzoko.net inspired me to integrate Encode into
Jcode. He has also contributed the Japanese POD.

And the folks at the Jcode mailing list <jcode5@ring.gr.jp>. Without them,
I couldn't have coded this far.

SEE ALSO
Encode

Jcode::Nihongo

<http://www.iana.org/assignments/character-sets>

COPYRIGHT
Copyright 1999-2005 Dan Kogai <dankogai@dan.co.jp>

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

perl v5.8.8 2005-02-19 Jcode(3)