Jcode(3)          User Contributed Perl Documentation          Jcode(3)

NAME
Jcode - Japanese Charset Handler

SYNOPSIS
    use Jcode;
    #
    # traditional
    Jcode::convert(\$str, $ocode, $icode, "z");
    # or OOP!
    print Jcode->new($str)->h2z->tr($from, $to)->utf8;

DESCRIPTION
<Japanese document is now available as Jcode::Nihongo.>

Jcode.pm supports both the object-oriented and the traditional
approach. With the object-oriented approach, you can write:

    $iso_2022_jp = Jcode->new($str)->h2z->jis;

which is more elegant than:

    $iso_2022_jp = $str;
    &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");

For those unfamiliar with objects, Jcode.pm still supports "getcode()"
and "convert()".

If the perl version is 5.8.1 or later, Jcode acts as a wrapper to
Encode, the standard charset handler module for Perl 5.8 or later.

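As a rough illustration of the two styles described above, the
following sketch converts the same UTF-8 byte string to EUC-JP both
ways. It assumes Jcode is installed on perl 5.8.1 or later; the sample
text and the variable names are illustrative and not part of the
original documentation.

```perl
use strict;
use warnings;
use Jcode;

# "Nihongo" (three kanji) as raw UTF-8 bytes; illustrative sample input.
my $utf8 = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";

# Object-oriented style: build an object, then ask for the encoding.
my $oop = Jcode->new($utf8, 'utf8')->euc;

# Traditional style: convert a copy through a reference and keep the
# string that convert() returns.
my $copy        = $utf8;
my $traditional = Jcode::convert(\$copy, 'euc', 'utf8');

print $oop eq $traditional ? "same bytes\n" : "different bytes\n";
```
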
Methods
Methods mentioned here all return a Jcode object unless otherwise
mentioned.

Constructors
$j = Jcode->new($str [, $icode])
Creates a Jcode object $j from $str. The input code is detected
automatically unless you explicitly set $icode. For the available
charsets, see getcode below.

For perl 5.8.1 or better, $icode can be any encoding name that Encode
understands.

    $j = Jcode->new($european, 'iso-latin1');

When the object is stringified, it returns the EUC-converted string,
so you can write <print $j> instead of <print $j->euc>.

Passing Reference
Instead of a scalar value, you can pass a reference:

    Jcode->new(\$str);

This saves a little time, at the cost of the value of $str being
converted. (In a way, $str is now "tied" to the Jcode object.)

$j->set($str [, $icode])
Sets $j's internal string to $str. Handy when you use a Jcode object
repeatedly (saves the time and memory of creating a new object).

    # converts mailbox to SJIS format
    my $jconv = new Jcode;
    $/ = 00;
    while(<>){
        print $jconv->set(\$_)->mime_decode->sjis;
    }

$j->append($str [, $icode]);
Appends $str to $j's internal string.

$j = jcode($str [, $icode]);
Shortcut for Jcode->new(), so you can write, for example,
jcode($str)->sjis.

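The constructor-related methods above can be combined as in the
following minimal sketch. It assumes Jcode on perl 5.8.1 or later; the
byte strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Jcode;

# Illustrative UTF-8 byte strings: two kanji, then one kanji.
my $nihon = "\xe6\x97\xa5\xe6\x9c\xac";
my $go    = "\xe8\xaa\x9e";

# Passing $icode explicitly skips automatic detection.
my $j = Jcode->new($nihon, 'utf8');

# Stringifying the object yields the same bytes as ->euc.
my $stringified = "$j";
my $same        = $stringified eq $j->euc;

# append() extends the internal string; utf8() reads it back out.
my $joined = $j->append($go, 'utf8')->utf8;

# set() lets one object be reused for several inputs.
my $conv = Jcode->new('');
my @sjis = map { $conv->set($_, 'utf8')->sjis } ($nihon, $go);
```
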
Encoded Strings
In general, you can retrieve the encoded string as $j->encoded.

$sjis = jcode($str)->sjis
$euc = $j->euc
$jis = $j->jis
$sjis = $j->sjis
$ucs2 = $j->ucs2
$utf8 = $j->utf8
What you code is what you get :)

$iso_2022_jp = $j->iso_2022_jp
Same as "$j->h2z->jis". Hankaku Kanas are forcibly converted to
Zenkaku.

For perl 5.8.1 and better, you can also use any encoding names and
aliases that Encode supports. For example:

    $european = $j->iso_latin1; # replace '-' with '_' for names.

FYI: Encode::Encoder uses a similar trick.

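To make the accessor methods above concrete, here is a small sketch
that prints one string in several encodings as hex bytes, and then
checks the documented equivalence between iso_2022_jp and h2z followed
by jis. It assumes Jcode on perl 5.8.1 or later; the sample bytes and
variable names are illustrative.

```perl
use strict;
use warnings;
use Jcode;

# Three kanji as raw UTF-8 bytes; illustrative sample input.
my $j = Jcode->new("\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e", 'utf8');

my %bytes = (
    euc  => $j->euc,
    sjis => $j->sjis,
    jis  => $j->jis,
    utf8 => $j->utf8,
);

# Print each encoding as space-separated hex bytes.
for my $name (sort keys %bytes) {
    my $hex = join ' ', map { sprintf '%02x', ord } split //, $bytes{$name};
    printf "%-4s %s\n", $name, $hex;
}

# Two halfwidth katakana in EUC-JP. Separate objects are used because
# h2z changes the object it is called on.
my $kana      = "\x8e\xb6\x8e\xc5";
my $via_alias = Jcode->new($kana, 'euc')->iso_2022_jp;
my $via_chain = Jcode->new($kana, 'euc')->h2z->jis;
```
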
$j->fallback($fallback)
For perl 5.8.1 or better, Jcode stores the internal string in UTF-8.
Any character that does not map to ->encoding is replaced with a
'?', which is the Encode standard.

    my $unistr = "\x{262f}"; # YIN YANG
    my $j = jcode($unistr);  # $j->euc is '?'

You can change this behavior by specifying a fallback, as in Encode.
The values are the same as Encode's. "Jcode::FB_PERLQQ",
"Jcode::FB_XMLCREF" and "Jcode::FB_HTMLCREF" are aliased to those of
Encode for convenience.

    print $j->fallback(Jcode::FB_PERLQQ)->euc;   # '\x{262f}'
    print $j->fallback(Jcode::FB_XMLCREF)->euc;  # '&#x262f;'
    print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'

The global variable $Jcode::FALLBACK stores the default fallback, so
you can override it by assigning a value.

    $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme

[@lines =] $jcode->jfold([$width, $newline_str, $kref])
Folds lines in the jcode string every $width (default: 72) with a
newline string specified by $newline_str (default: "\n"). $width is
the number of "halfwidth" characters; fullwidth characters are counted
as two.

Rudimentary kinsoku support is now available for Perl 5.8.1 and
better.

$length = $jcode->jlength();
Returns the character length properly, rather than the byte length.

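The difference between byte length, character length, and folding
width can be sketched as follows. This assumes Jcode on perl 5.8.1 or
later; the sample text, the width of 4, and the variable names are
illustrative.

```perl
use strict;
use warnings;
use Jcode;

# Six fullwidth characters as UTF-8 bytes (3 bytes each).
my $text = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" x 2;
my $j    = Jcode->new($text, 'utf8');

my $bytes = length $text;    # byte length of the raw input
my $chars = $j->jlength;     # character length of the same text

# Fold at 4 halfwidth columns. Each fullwidth character counts as two,
# so at most two of them fit on a line. In list context jfold returns
# the folded lines.
my @lines = $j->jfold(4);

printf "%d bytes, %d characters, %d lines\n",
    $bytes, $chars, scalar @lines;
```
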
Methods that use MIME::Base64
To use the methods below, you need MIME::Base64. To install, simply run

    perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

If your perl is 5.8 or better, there is no need since MIME::Base64 is
bundled.

$mime_header = $j->mime_encode([$lf, $bpl])
Converts $str to a MIME header as documented in RFC 1522 (since
obsoleted by RFC 2047). When $lf is specified, it uses $lf to fold
lines (default: \n). When $bpl is specified, it uses $bpl as the
number of bytes per line (default: 76; this number must not exceed
76).

For Perl 5.8.1 or better, you can also encode a MIME header as:

    $mime_header = $j->MIME_Header;

In which case the resulting $mime_header is MIME-B-encoded UTF-8,
whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP.
Most modern MUAs support both.

$j->mime_decode;
Decodes the MIME header in the Jcode object. For perl 5.8.1 or better,
you can also do the same with:

    Jcode->new($str, 'MIME-Header')

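A round trip through mime_encode and mime_decode might look like the
sketch below. It assumes Jcode on perl 5.8.1 or later with
MIME::Base64 available; the subject text and variable names are
illustrative.

```perl
use strict;
use warnings;
use Jcode;

# Illustrative subject line: three kanji as raw UTF-8 bytes.
my $subject = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";

# mime_encode returns the encoded header string, not an object.
my $header = Jcode->new($subject, 'utf8')->mime_encode;

# mime_decode returns the object, so an encoding accessor can follow.
my $decoded = Jcode->new($header)->mime_decode->utf8;

print "$header\n";
print $decoded eq $subject ? "round trip ok\n" : "round trip failed\n";
```
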
Hankaku vs. Zenkaku
$j->h2z([$keep_dakuten])
Converts JIS X 0201 kana (Hankaku) to JIS X 0208 kana (Zenkaku). When
$keep_dakuten is set, it leaves dakuten as is (that is, "ka +
dakuten" is left as is instead of being converted to "ga").

You can retrieve the number of matches via $j->nmatch.

$j->z2h
Converts JIS X 0208 kana (Zenkaku) to JIS X 0201 kana (Hankaku).

You can retrieve the number of matches via $j->nmatch.

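The effect of $keep_dakuten is easiest to see on the "ka + dakuten"
case mentioned above. The following is a minimal sketch assuming Jcode
on perl 5.8.1 or later; the EUC-JP byte values and variable names are
illustrative.

```perl
use strict;
use warnings;
use Jcode;

# Halfwidth "ka" followed by the halfwidth dakuten, as EUC-JP bytes.
my $hankaku = "\x8e\xb6\x8e\xde";

# Default: the pair is merged into the single fullwidth character "ga".
my $merged = Jcode->new($hankaku, 'euc')->h2z->euc;

# With $keep_dakuten set, "ka" and the dakuten stay separate characters.
my $kept = Jcode->new($hankaku, 'euc')->h2z(1)->euc;

# z2h goes the other way; nmatch reports the number of matches.
my $j    = Jcode->new($merged, 'euc')->z2h;
my $back = $j->euc;
printf "z2h matched %d time(s)\n", $j->nmatch;
```
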
Regexp emulators
To use "->m()" and "->s()", you need perl 5.8.1 or better.

$j->tr($from, $to, $opt);
Applies "tr/$from/$to/" on the Jcode object, where $from and $to are
EUC-JP strings. On perl 5.8.1 or better, $from and $to can also be
flagged UTF-8 strings.

If $opt is set, "tr/$from/$to/$opt" is applied. $opt must be 'c',
'd' or a combination thereof.

You can retrieve the number of matches via $j->nmatch.

The following methods are available only for perl 5.8.1 or better.

$j->s($pattern, $replace, $opt);
Applies "s/$pattern/$replace/$opt". $pattern and $replace must be in
EUC-JP or flagged UTF-8. $opt is the same as the regexp options. See
perlre for regexp options.

Like "$j->tr()", "$j->s()" returns the object itself, so you can chain
the operations as follows:

    $j->tr("a-z", "A-Z")->s("foo", "bar");

[@match = ] $j->m($pattern, $opt);
Applies "m/$pattern/$opt". Note that this method DOES NOT RETURN AN
OBJECT, so you can't chain it the way you can "$j->s()".

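The chaining rules above can be tried on plain ASCII text, which needs
no Japanese input to follow. This is a sketch assuming Jcode on perl
5.8.1 or later; the sample string and variable names are illustrative.

```perl
use strict;
use warnings;
use Jcode;

# tr() and s() return the object, so they chain up to an accessor.
my $result = Jcode->new('foo baz')
                  ->tr('a-z', 'A-Z')
                  ->s('FOO', 'bar')
                  ->euc;    # plain ASCII is unchanged in EUC-JP

# nmatch after tr() gives the number of characters translated.
my $translated = Jcode->new('foo baz')->tr('a-z', 'A-Z')->nmatch;

# m() returns the list of matches rather than the object, so it has
# to come last.
my @match = Jcode->new($result)->m('(\w+) (\w+)');

print "$result ($translated translated, ", scalar(@match), " captures)\n";
```
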
Instance Variables
If you need to access instance variables of a Jcode object, use the
access methods below instead of accessing them directly (that's what
OOP is all about).

FYI, Jcode uses a ref to array instead of a ref to hash (the common
way) to optimize speed. (Actually you don't have to know this as long
as you use the access methods; once again, that's OOP.)

$j->r_str
Reference to the EUC-coded string.

$j->icode
Input charcode of the most recent operation.

$j->nmatch
Number of matches (used in $j->tr, etc.)

Subroutines
($code, [$nmatch]) = getcode($str)
Returns the char code of $str. Return codes are as follows:

    ascii   Ascii (Contains no Japanese Code)
    binary  Binary (Not Text File)
    euc     EUC-JP
    sjis    SHIFT_JIS
    jis     JIS (ISO-2022-JP)
    ucs2    UCS2 (Raw Unicode)
    utf8    UTF8

When list context is used instead of scalar, it also returns how
many character codes are found. As mentioned above, $str can be
\$str instead.

jcode.pl Users: This function is 100% upward-compatible with
jcode::getcode() -- well, almost;

* When its return value is a list, the order is the opposite;
  jcode::getcode() returns $nmatch first.

* jcode::getcode() returns 'undef' when the number of EUC characters
  is equal to that of SJIS. Jcode::getcode() returns EUC. For
  Jcode.pm there are no in-betweens.

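The return codes listed above can be observed with a short sketch that
feeds getcode the same text in three encodings. It assumes Jcode on
perl 5.8.1 or later; the byte strings and variable names are
illustrative, and very short or ambiguous inputs may be detected
differently.

```perl
use strict;
use warnings;
use Jcode;

# The same three kanji as raw bytes in three encodings.
my %sample = (
    euc  => "\xc6\xfc\xcb\xdc\xb8\xec",
    sjis => "\x93\xfa\x96\x7b\x8c\xea",
    utf8 => "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e",
);

# Scalar context: only the detected code is returned.
my %detected;
for my $name (keys %sample) {
    $detected{$name} = Jcode::getcode($sample{$name});
}
my $plain = Jcode::getcode('hello');

# List context: the code comes first, then the match count.
my ($code, $nmatch) = Jcode::getcode($sample{utf8});

print "$_ bytes detected as $detected{$_}\n" for sort keys %detected;
print "plain text detected as $plain\n";
```
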
Jcode::convert($str, [$ocode, $icode, $opt])
Converts $str to the char code specified by $ocode. When $icode is
also specified, it assumes $icode for the input string instead of the
one detected by getcode(). As mentioned above, $str can be \$str
instead.

jcode.pl Users: This function is 100% upward-compatible with
jcode::convert() !

BUGS
For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning
Jcode is subject to bugs therein.

ACKNOWLEDGEMENTS
This package owes a lot in motivation, design, and code to the
jcode.pl for Perl4 by Kazumasa Utashiro <utashiro@iij.ad.jp>.

Hiroki Ohzaki <ohzaki@iod.ricoh.co.jp> has helped me polish regexps
from the very first stage of development.

JEncode by makamaka@donzoko.net has inspired me to integrate Encode
into Jcode. He has also contributed the Japanese POD.

And folks at the Jcode mailing list <jcode5@ring.gr.jp>. Without them,
I couldn't have coded this far.

SEE ALSO
Encode

Jcode::Nihongo

<http://www.iana.org/assignments/character-sets>

COPYRIGHT
Copyright 1999-2005 Dan Kogai <dankogai@dan.co.jp>

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

perl v5.32.0                      2020-07-28                      Jcode(3)