NKF(3)            User Contributed Perl Documentation            NKF(3)

NAME
NKF - Perl extension for Network Kanji Filter

SYNOPSIS
    use NKF;
    $output = nkf("-s", $input);

DESCRIPTION
This is a Perl extension version of nkf (Network Kanji Filter). It
converts the last argument and returns the converted result. Conversion
details are specified by flags before the last argument. Nkf is yet
another kanji code converter among networks, hosts and terminals. It
converts input kanji code to a designated kanji code such as ISO-2022-JP,
Shift_JIS, EUC-JP, UTF-8, UTF-16 or UTF-32.

One of the most distinctive features of nkf is that it guesses the
input kanji encoding. It currently recognizes ISO-2022-JP, Shift_JIS,
EUC-JP, UTF-8, UTF-16 and UTF-32, so users do not need to set the input
kanji code explicitly.
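
The guessing behaviour can be sketched as follows. This is a minimal example, assuming the NKF module is installed; the variable names are illustrative only. The sample is written as raw UTF-8 bytes so that the script does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use NKF;

# "日本語" as raw UTF-8 bytes (a byte string, not a character string).
my $utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# No input flag: nkf guesses the input encoding.
my $sjis = nkf('-s', $utf8);    # Shift_JIS output
my $euc  = nkf('-e', $utf8);    # EUC-JP output

# The same conversion with the input encoding stated explicitly.
my $sjis_explicit = nkf('-W', '-s', $utf8);

# Show the Shift_JIS bytes in hex.
printf "%s\n", join ' ', map { sprintf '%02X', ord($_) } split //, $sjis;
# prints "93 FA 96 7B 8C EA"
```

For very short or ambiguous input the guess may not be reliable, so stating the input encoding explicitly is the safer choice.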

By default, X0201 kana is converted into X0208 kana. For X0201 kana,
the SO/SI, SS2 and ESC-(-I methods are supported. For automatic code
detection, nkf assumes that Shift_JIS input contains no X0201 kana. To
accept X0201 kana in Shift_JIS, use -X, -x or -S.

Every argument except the last is treated as an option. Multiple
options are specified as separate strings, such as

    print nkf('--ic=UTF8-MAC', '-w', $string), "\n";
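
Because the options are ordinary list elements, they can be kept in an array and reused. The sketch below assumes the NKF module is installed; to_euc_unix is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;
use NKF;

# Hypothetical helper: the options live in a list, the text always goes last.
sub to_euc_unix {
    my ($text) = @_;
    my @opts = ('--ic=UTF-8', '--oc=EUC-JP', '-Lu');
    return nkf(@opts, $text);
}

# "日" followed by CRLF, as raw UTF-8 bytes.
my $euc = to_euc_unix("\xE6\x97\xA5\r\n");
```

Here $euc holds the EUC-JP bytes C6 FC followed by a single LF, since -Lu converts the CRLF.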

OPTIONS
-J -S -E -W -W16 -W32 -j -s -e -w -w16 -w32
    Specify input and output encodings. Uppercase flags select the
    input encoding and lowercase flags the output encoding. See also
    --ic and --oc.

    -J  ISO-2022-JP (JIS code).

    -S  Shift_JIS and JIS X 0201 kana. EUC-JP is recognized as X0201
        kana. Without the -x flag, JIS X 0201 Katakana (a.k.a.
        halfwidth kana) is converted into JIS X 0208. If you use
        Windows, see Windows-31J (CP932).

    -E  EUC-JP.

    -W  UTF-8N.

    -W16[BL][0]
        UTF-16. B or L selects big endian or little endian. 0
        controls whether a BOM is used.

    -W32[BL][0]
        UTF-32. B or L selects big endian or little endian. 0
        controls whether a BOM is used.
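
As an illustration of the endian and BOM suffixes, the sketch below converts one character to UTF-16 in both byte orders. It assumes the NKF module is installed and that a trailing 0 suppresses the BOM on output; variable names are illustrative.

```perl
use strict;
use warnings;
use NKF;

my $utf8 = "\xE6\x97\xA5";              # "日" (U+65E5) as UTF-8 bytes

my $be = nkf('-W', '-w16B0', $utf8);    # UTF-16 big endian, no BOM
my $le = nkf('-W', '-w16L0', $utf8);    # UTF-16 little endian, no BOM

# Uppercase flag on the input side: little-endian UTF-16 back to UTF-8.
my $back = nkf('-W16L', '-w', $le);
```

The big-endian result is the byte pair 65 E5 and the little-endian result is E5 65.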

-b -u
    Output is buffered with -b (DEFAULT) and unbuffered with -u.

-t  No conversion.

-i[@B]
    Specify the escape sequence for JIS X 0208.

    -i@ Use ESC $ @. (JIS X 0208-1978)

    -iB Use ESC $ B. (JIS X 0208-1983/1990 DEFAULT)

-o[BJ]
    Specify the escape sequence for US-ASCII/JIS X 0201 Roman.
    (DEFAULT B)
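
The effect of -i and -o is visible in the raw bytes of ISO-2022-JP output. The following is a sketch only, assuming the NKF module is installed; show_escapes is a hypothetical helper that makes the ESC byte readable.

```perl
use strict;
use warnings;
use NKF;

my $utf8 = "\xE6\x97\xA5";    # "日" as UTF-8 bytes

my $jis_default = nkf('-W', '-j', $utf8);          # default kanji-in and ASCII-out
my $jis_1978    = nkf('-W', '-j', '-i@', $utf8);   # JIS X 0208-1978 kanji-in
my $jis_roman   = nkf('-W', '-j', '-oJ', $utf8);   # return to JIS X 0201 Roman

# Hypothetical helper: replace the ESC byte with a visible marker.
sub show_escapes {
    my ($bytes) = @_;
    (my $shown = $bytes) =~ s/\x1B/<ESC>/g;
    return $shown;
}

print show_escapes($_), "\n" for $jis_default, $jis_1978, $jis_roman;
```

Only the escape sequences differ between the three results; the two bytes that encode the kanji itself stay the same.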

-r  Encrypt or decrypt with ROT13/47.
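
ROT13 is its own inverse, so applying -r twice restores the input. A minimal sketch with ASCII text, assuming the NKF module is installed:

```perl
use strict;
use warnings;
use NKF;

my $secret = nkf('-r', 'Hello, World');   # letters rotated by 13 places
my $plain  = nkf('-r', $secret);          # rotated back

print "$secret\n";   # prints "Uryyb, Jbeyq"
print "$plain\n";    # prints "Hello, World"
```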

-h[123] --hiragana --katakana --katakana-hiragana
    -h1 --hiragana
        Katakana to Hiragana conversion.

    -h2 --katakana
        Hiragana to Katakana conversion.

    -h3 --katakana-hiragana
        Katakana to Hiragana and Hiragana to Katakana conversion.
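
The three kana modes can be compared side by side. This sketch assumes the NKF module is installed and that the script is saved as UTF-8 without "use utf8", so the string literals are UTF-8 byte strings.

```perl
use strict;
use warnings;
use NKF;

# Input and output are both UTF-8 (-W and -w).
my $kata = nkf('-W', '-w', '--katakana', 'ひらがな');   # hiragana to katakana
my $hira = nkf('-W', '-w', '--hiragana', 'カタカナ');   # katakana to hiragana
my $swap = nkf('-W', '-w', '-h3',        'ひらカタ');   # both directions at once

print "$kata $hira $swap\n";   # prints "ヒラガナ かたかな ヒラかた"
```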

-T  Text mode output (MS-DOS).

-f[m[-n]]
    Fold lines at length m with margin n. When m and n are omitted,
    the fold length is 60 and the fold margin is 10.

-F  Line folding that preserves existing newlines.

-Z[0-3]
    Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.

    -Z -Z0
        Convert X0208 alphabet to ASCII.

    -Z1 Convert X0208 kankaku (fullwidth space) to a single ASCII
        space.

    -Z2 Convert X0208 kankaku (fullwidth space) to two ASCII spaces.

    -Z3 Replace fullwidth >, <, ", & with '&gt;', '&lt;', '&quot;',
        '&amp;' as in HTML.
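
A short sketch of the -Z levels follows. It assumes the NKF module is installed and that the script is saved as UTF-8 without "use utf8"; the sample strings contain fullwidth letters, digits and a fullwidth space.

```perl
use strict;
use warnings;
use NKF;

my $ascii = nkf('-W', '-w', '-Z',  'ＡＢＣ１２３');   # fullwidth alphanumerics to ASCII
my $one   = nkf('-W', '-w', '-Z1', 'Ａ　Ｂ');         # fullwidth space to one space
my $two   = nkf('-W', '-w', '-Z2', 'Ａ　Ｂ');         # fullwidth space to two spaces
my $html  = nkf('-W', '-w', '-Z3', 'ａ＜ｂ');         # fullwidth < as an HTML entity

print "[$ascii] [$one] [$two] [$html]\n";
```

With plain -Z the fullwidth space is left untouched; only -Z1 and -Z2 replace it.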

-X -x
    With -X or without this option, X0201 kana is converted into X0208
    kana. With -x, X0201 kana is preserved and not converted to X0208.
    In JIS output, ESC-(-I is used. In EUC output, SS2 is used.
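
The difference between the default and -x shows up with halfwidth kana input. A minimal sketch, assuming the NKF module is installed; the sample is given as raw UTF-8 bytes.

```perl
use strict;
use warnings;
use NKF;

# Halfwidth katakana KA and NA (U+FF76, U+FF85) as UTF-8 bytes.
my $half = "\xEF\xBD\xB6\xEF\xBE\x85";

my $full = nkf('-W', '-w', $half);         # default: converted to fullwidth kana
my $kept = nkf('-W', '-w', '-x', $half);   # -x: halfwidth kana preserved

print $full eq $half ? "unchanged\n" : "converted\n";   # prints "converted"
print $kept eq $half ? "unchanged\n" : "converted\n";   # prints "unchanged"
```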

-B[0-2]
    Assume broken JIS kanji input that has lost its ESC characters.
    Useful when your site is using the old B-News Nihongo patch.

    -B1 Allow any characters after ESC-( or ESC-$.

    -B2 Force ASCII after NL.

-I  Replace non-ISO-2022-JP characters with a geta character (the
    substitute character in Japanese).

-m[BQN0]
    MIME ISO-2022-JP/ISO8859-1 decode (DEFAULT). To see ISO8859-1
    (Latin-1), -l is also necessary.

    -mB Decode a MIME base64 encoded stream. Remove headers and other
        parts before conversion.

    -mQ Decode a MIME quoted stream. '_' in a quoted stream is converted
        to a space.

    -mN Non-strict decoding. It allows line breaks in the middle of
        base64 encoded text.

    -m0 No MIME decode.

-M  MIME encode, header style. All ASCII codes and control characters
    are left intact.

    -MB MIME encode as a Base64 stream. Kanji conversion is performed
        before encoding, so this cannot be used as a picture encoder.

    -MQ Perform quoted encoding.
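
Header decoding and encoding can be sketched as below. The example assumes the NKF module is installed; the encoded word is the ISO-2022-JP, base64 form of a three-kanji sample and the variable names are illustrative.

```perl
use strict;
use warnings;
use NKF;

# A MIME encoded-word as it might appear in a mail header.
my $header = '=?ISO-2022-JP?B?GyRCRnxLXDhsGyhC?=';

my $decoded   = nkf('-w', '-m',  $header);   # decode, output as UTF-8
my $untouched = nkf('-w', '-m0', $header);   # -m0: no MIME decoding

# Encode UTF-8 text as an ISO-2022-JP header-style encoded-word.
my $encoded = nkf('-W', '-j', '-M', $decoded);

print "$untouched\n$encoded\n";
```

With -m0 the encoded word passes through unchanged, which is useful when the header must be kept in its transport form.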

-l  Input and output codes are ISO8859-1 (Latin-1) and ISO-2022-JP.
    The -s, -e and -x options are not compatible with this option.

-L[uwm] -d -c
    Convert line breaks.

    -Lu -d
        unix (LF)

    -Lw -c
        windows (CRLF)

    -Lm mac (CR)

    Without this option, nkf doesn't convert line breaks.
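
The line break options can be tried on plain ASCII text. A minimal sketch, assuming the NKF module is installed:

```perl
use strict;
use warnings;
use NKF;

my $crlf = nkf('-Lw', "a\nb\n");       # LF   to CRLF
my $lf   = nkf('-Lu', "a\r\nb\r\n");   # CRLF to LF
my $cr   = nkf('-Lm', "a\nb\n");       # LF   to CR

# Show the control characters in a readable form.
for my $s ($crlf, $lf, $cr) {
    (my $shown = $s) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$shown\n";
}
# prints "a\r\nb\r\n", then "a\nb\n", then "a\rb\r"
```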

--fj --unix --mac --msdos --windows
    Convert for these systems.

--jis --euc --sjis --mime --base64
    Convert to the named code.

--jis-input --euc-input --sjis-input --mime-input --base64-input
    Assume the named input code.

--ic=<input codeset> --oc=<output codeset>
    Set the input or output codeset. NKF supports the following
    codesets; codeset names are case insensitive.

    ISO-2022-JP
        a.k.a. RFC1468, 7bit JIS, JUNET

    EUC-JP (eucJP-nkf)
        a.k.a. AT&T JIS, Japanese EUC, UJIS

    eucJP-ascii
    eucJP-ms
    CP51932
        Microsoft Version of EUC-JP.

    Shift_JIS
        a.k.a. SJIS, MS_Kanji

    Windows-31J
        a.k.a. CP932

    UTF-8
        same as UTF-8N

    UTF-8N
        UTF-8 without BOM

    UTF-8-BOM
        UTF-8 with BOM

    UTF8-MAC (input only)
        decomposed UTF-8

    UTF-16
        same as UTF-16BE

    UTF-16BE
        UTF-16 Big Endian without BOM

    UTF-16BE-BOM
        UTF-16 Big Endian with BOM

    UTF-16LE
        UTF-16 Little Endian without BOM

    UTF-16LE-BOM
        UTF-16 Little Endian with BOM

    UTF-32
        same as UTF-32BE

    UTF-32BE
        UTF-32 Big Endian without BOM

    UTF-32BE-BOM
        UTF-32 Big Endian with BOM

    UTF-32LE
        UTF-32 Little Endian without BOM

    UTF-32LE-BOM
        UTF-32 Little Endian with BOM
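
The codeset names can be used in place of the single-letter flags. The sketch below assumes the NKF module is installed and uses one character as raw UTF-8 bytes; variable names are illustrative.

```perl
use strict;
use warnings;
use NKF;

my $utf8 = "\xE6\x97\xA5";    # "日" (U+65E5) as UTF-8 bytes

my $with_bom = nkf('--ic=UTF-8', '--oc=UTF-8-BOM',    $utf8);
my $utf16le  = nkf('--ic=UTF-8', '--oc=UTF-16LE-BOM', $utf8);

# Codeset names are case insensitive.
my $cp932 = nkf('--ic=utf-8', '--oc=windows-31j', $utf8);

for my $bytes ($with_bom, $utf16le, $cp932) {
    print join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";
}
# prints "EF BB BF E6 97 A5", then "FF FE E5 65", then "93 FA"
```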

--fb-{skip, html, xml, perl, java, subchar}
    Specify how nkf handles unassigned characters. Without this
    option, --fb-skip is assumed.
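
The fallback modes matter when the output codeset cannot represent a character. The sketch below assumes the NKF module is installed and that the euro sign (U+20AC) has no Shift_JIS mapping in nkf, which is an assumption of this example rather than something stated above.

```perl
use strict;
use warnings;
use NKF;

# "a", the euro sign and "b" as UTF-8 bytes.
my $text = "a\xE2\x82\xACb";

my $skip = nkf('-W', '-s', $text);                # default: same as --fb-skip
my $html = nkf('-W', '-s', '--fb-html', $text);   # numeric character reference
my $xml  = nkf('-W', '-s', '--fb-xml',  $text);   # hexadecimal character reference

print "$skip\n$html\n$xml\n";
```

With --fb-skip the unassigned character is dropped silently, so one of the other modes is preferable when data loss must be noticed.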

--prefix=<escape character><target character>..
    When nkf converts to Shift_JIS, it adds the specified escape
    character to the specified 2nd bytes of Shift_JIS characters. The
    1st byte of the argument is the escape character and the following
    bytes are the target characters.

--no-cp932ext
    Handle the characters extended in CP932 as unassigned characters.

--no-best-fit-chars
    When converting from Unicode to an encoded byte sequence, do not
    convert characters that are not round-trip safe. For Unicode to
    Unicode conversion, using this together with the -x option lets
    nkf be used as a UTF converter. (In other words, without this
    option and -x, nkf does not preserve some characters.)

    When nkf converts strings that relate to paths, you should use
    this option.

--cap-input
    Decode hex encoded characters.

--url-input
    Unescape percent escaped characters.

--numchar-input
    Decode character references, such as "&#....;".
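
Two of the input decoders can be sketched as follows, assuming the NKF module is installed. Both inputs are plain ASCII and both results are UTF-8 byte strings; variable names are illustrative.

```perl
use strict;
use warnings;
use NKF;

# Percent-escaped UTF-8 bytes for two kanji (U+65E5, U+672C).
my $from_url = nkf('-W', '-w', '--url-input', '%E6%97%A5%E6%9C%AC');

# The same two characters as decimal character references.
my $from_ref = nkf('-W', '-w', '--numchar-input', '&#26085;&#26412;');

print $from_url eq $from_ref ? "same\n" : "different\n";   # prints "same"
```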

--  Ignore the rest of the -options.

AUTHOR
Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).

Copyright (c) 1996-2015, The nkf Project.

SEE ALSO
perl(1), nkf(1)

perl v5.32.1                      2021-01-26                      NKF(3)