1SKF(1) General Commands Manual SKF(1)
2
3
4
6 skf - simple Kanji Filter (v2.1)
7
9 skf [-EIJKNQRSXZbehjknqrsuvxz] [ long_format_options ] [infiles..]
10
12 skf is yet another i18n-capable kanji filter, designed for reading
13 various CJK-coded files on the Net. skf converts input kanji texts or
14 streams into a character stream in the designated codeset and outputs
15 it to standard output. Specifically, skf is designed to be a versatile
16 filter for reading documents in various code sets, and does not provide
17 features unrelated to code conversion.
18
19 Like nkf, skf automatically recognizes the input file code when it is a
20 kind of ISO-2022 compliant code, and also detects EUC-variant codes if
21 the input file is Japanese text without X 0201 kana. skf 2.1 can read
22 various ISO-2022 compliant character sets, including JIS Kanji codes (X
23 0208, X 0212 and X 0213), EUC encodings (euc-jp (with X 0213 support),
24 euc-cn, euc-kr and euc-tw), ISO European Latin sets (ISO-8859-1 to 11,
25 13/14/15/16) and many regional character sets. skf can also read some
26 non-ISO-2022 compliant sets, including Microsoft Shift-JIS code,
27 KOI-8-R/U, GB2312 (HZ), big5, VISCII (rfc1456, including VIQR), Unicode
28 standard (UCS2/UTF-16, UTF7 and UTF8), some MS codesets (cp1250
29 etc.) and some other vendor-specific codes (KEIS83, JEF etc).
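
To illustrate why ISO-2022 codes are the easiest to recognize, the following Python sketch encodes the same text in three ways using Python's standard codecs (not skf itself). The helper name looks_like_iso2022 is hypothetical and checks for only one escape sequence; it is not a description of skf's actual detection logic.

```python
# Illustration only: Python's standard codecs, not skf's detector.
text = "漢字"

jis = text.encode("iso2022_jp")    # 7-bit, announced by ESC sequences
euc = text.encode("euc_jp")        # every kanji byte has the high bit set
sjis = text.encode("shift_jis")    # lead bytes have the high bit set


def looks_like_iso2022(data: bytes) -> bool:
    """Hypothetical check: ESC $ B designates JIS X 0208 in ISO-2022-JP."""
    return b"\x1b$B" in data


for name, data in (("jis", jis), ("euc-jp", euc), ("sjis", sjis)):
    print(name, data.hex(" "), looks_like_iso2022(data))
```

The escape sequence makes the ISO-2022 stream self-describing, whereas EUC and Shift-JIS can only be told apart by looking at byte-range patterns.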
30
31 Supported output character sets of skf are more limited, but still in‐
32 clude X 0208/X 0212/X 0213 JIS, X 0201 JIS, ASCII, Microsoft Shift-JIS,
33 EUC-jp/-kr/-cn, HZ, iso-2022-jp/kr, big5, VISCII and Unicode.
34
35 skf also provides basic decoding features for some common encodings,
36 including MIME, Punycode and URI code points. Unicode decomposition
37 has also been supported since 1.96.
38
39 As noted above, skf is designed to convert input text into some kind of
40 human-readable forms under a local environment (i.e. codeset), and has
41 several extra conversion features like GNU recode type folding. Such
42 conversions include Windows/Macintosh specific code swaps and old-new
43 jis glyph changes, html-format/TeX format conversion and variant unifi‐
44 cations.
45
46 skf also can be compiled as an extension of some lightweight languages.
47 See README.txt for details.
48
49 If one or more file names are given, skf reads the files and outputs the
50 converted stream to stdout. If no file names are given, input is taken
51 from stdin and output also goes to stdout. Options are taken from the
52 environment variables SKFENV and skfenv and from the command line, in
53 that order. Environment variables are not used when skf is running as a
54 privileged user. skf does not use locale-related environment variables
55 for conversions, but the language of error messages is controlled by
56 the given locale.
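
The option order described above can be sketched in Python as follows. This is only a model of the stated order: the function name collect_options is hypothetical, and the assumption that environment values are split like shell words is not confirmed by this manual.

```python
import shlex


def collect_options(environ, argv, privileged=False):
    """Return options in the order SKFENV, skfenv, command line.

    Assumption: environment values are tokenized like shell words.
    """
    options = []
    if not privileged:  # environment is skipped for a privileged user
        for var in ("SKFENV", "skfenv"):
            options += shlex.split(environ.get(var, ""))
    return options + list(argv)


env = {"SKFENV": "--oc=utf8", "skfenv": "-u"}
print(collect_options(env, ["--ic=sjis"]))
print(collect_options(env, ["--ic=sjis"], privileged=True))
```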
57
59 skf is written from scratch, and inherits no code from nkf. However,
60 skf is intended to be a drop-in replacement for nkf (v1.4) and supports
61 a similar set of commonly used nkf options.
62 skf 2.1 recognizes the following options. All are off by default unless
63 explicitly stated otherwise.
64
65 buffering control
66 -b use buffered output. This is default.
67
68 -u use unbuffered output. Code detection feature is disabled when
69 this option is on.
70
71 Input/Output codeset options
72 --ic= input_code_set
73 Specify input_code_set as the input codeset. Possible candidates
74 are shown below.
75
76 --oc= output_code_set
77 Specify output_code_set as the output codeset. Possible candidates
78 are shown below. The default codeset in the distribution package is
79 euc-jp, but this depends on compile options. The default codeset is
80 shown by ´skf -h´.
81
82 Supported codeset
83 skf recognizes the following codesets as input/output codesets. These
84 codeset names are case insensitive, and minus ('-') and underscore
85 ('_') are ignored. Note that an iso-2022 escape-based input codeset
86 (registered with IANA) is recognized automatically, even when a
87 non-iso2022 codeset (except Unicode and B-Right/V) is specified. An o in
88 the in column means the named codeset can be specified as input, and an
89 x means it cannot. The out column has the same meaning for output.
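
The name-matching rule above (case insensitive, '-' and '_' ignored) can be modelled with a small Python sketch. The function name normalize_codeset_name is hypothetical and this is not skf's own lookup code.

```python
def normalize_codeset_name(name: str) -> str:
    """Fold case and drop '-' and '_' so equivalent spellings compare equal."""
    return name.lower().replace("-", "").replace("_", "")


for spelling in ("euc-jp", "EUC_JP", "EucJP"):
    print(spelling, "->", normalize_codeset_name(spelling))
```

Under this rule, spellings such as euc-jp and EUC_JP name the same codeset.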
90
91 in out name description
92 o o iso8859-1 ascii + iso-8859-1 (latin-1)
93 o o iso8859-2 ascii + iso-8859-2 (latin-2)
94 o o iso8859-3 ascii + iso-8859-3 (latin-3)
95 o o iso8859-4 ascii + iso-8859-4 (latin-4)
96 o o iso8859-5 ascii + iso-8859-5 (Cyrillic)
97 o o iso8859-6 ascii + iso-8859-6 (Arabic)
98 o o iso8859-7 ascii + iso-8859-7 (Greek)
99 o o iso8859-8 ascii + iso-8859-8 (Hebrew)
100 o o iso8859-9 ascii + iso-8859-9 (latin-5)
101 o o iso8859-10 ascii + iso-8859-10 (latin-6)
102 o o iso8859-11 ascii + iso-8859-11 (Thai)
103 o o iso8859-13 ascii + iso-8859-13 (Baltic Rim)
104 o o iso8859-14 ascii + iso-8859-14 (Celtic)
105 o o iso8859-15 ascii + iso-8859-15 (Latin-9)
106 o o iso8859-16 ascii + iso-8859-16
107 o o koi-8r koi-8r (Russian)
108 o o koi-8u koi-8u (Ukrainian)
109 o o cp1251 Cyrillic latin MS cp1251
110 o o jis iso-2022-jp (rfc1468 7bit JIS)
111 o o iso-2022-jp-x0213 iso-2022-jp-3 (JIS X 0213:2000)
112 a.k.a. jis-x0213
113 o o jis-x0213-strict iso-2022-jp-3-strict
114 o o iso-2022-jp-2004 iso-2022-jp-2004(JIS X 0213:2004)
115 a.k.a. jis-x0213-2004
116 o o oldjis iso-2022-jp-1978(JIS X 0208:1978)
117 o o cp50220 Microsoft codepage 50220
118 o o cp50221 Microsoft codepage 50221
119 o o cp50222 Microsoft codepage 50222
120 o o euc-jp EUC-encoded JIS X 0208:1997
121 o o euc-x0213 EUC-encoded JIS X 0213:2000
122 o o euc-jis-2004 EUC-encoded JIS X 0213:2004
123 o o cp51932 EUC-encoded Microsoft codepage 932
124 o o euc-kr EUC-encoded KS X 1001 Korean
125 o o euc7-kr 7bit EUC-encoded KS X 1001 Korean
126 o o uhc Unified Hangul (Windows cp949)
127 o o johab KS X 1001-johab Korean
128 o o euc-cn EUC-encoded GB2312 Chinese
129 o o euc7-cn 7bit EUC-encoded GB2312 Chinese
130 o o hz HZ-encoded GB2312 Chinese
131 o o euc-tw EUC-encoded CNS 11643 Chinese
132 o o gb12345 EUC-encoded GB12345 Chinese
133 o o gbk GB2312 Extension(cp936) Chinese
134 o o gb18030 GB18030 chinese
135 o o big5 BIG5 (with Eten extension + EURO)
136 o o cp950 BIG5 (Microsoft cp950 + EURO)
137 o o big5-hkscs BIG5 with HKSCS
138 o o big5-2003 BIG5-2003
139 o o big5-uao Big5 Unicode-At-On (UAO)
140 o o sjis Shift-jis (Microsoft cp943)
141 o o shiftjis-x0213 Shiftjis-encoded JIS X 0213:2000
142 o o shiftjis-2004 Shiftjis-encoded JIS X 0213:2004
143 o o sjis-docomo Shiftjis-encoded with NTT Docomo emoticons.
144 o o sjis-au Shiftjis-encoded with AU emoticons.
145 o o sjis-softbank Shiftjis-encoded with SoftBank emoticons.
146 o o oldsjis Shift-jis (JIS X 0208:1978)
147 o o cp932 Shift-jis-encoded MS cp932
148 o o cp932w Shift-jis-encoded MS cp932 with
149 MS compatibility
150 o o viscii VISCII (rfc1456) Vietnamese
151 o o viqr VISCII (rfc1456-VIQR) Vietnamese
152 o o keis Hitachi KEIS83/90
153 o x jef Fujitsu JEF (basic support only)
154 o x ibm930 IBM EBCDIC DBCS Japanese
155 o x ibm931 IBM EBCDIC DBCS Japanese w.latin
156 o x ibm933 IBM EBCDIC DBCS Korean
157 o x ibm935 IBM EBCDIC DBCS Simpl. Chinese
158 o x ibm937 IBM EBCDIC DBCS Trad. Chinese
159 o o unicode Unicode(TM) UTF-16LE
160 o o unicodefffe Unicode(TM) UTF-16BE
161 o o utf7 Unicode(TM) UTF-7
162 o o utf8 Unicode(TM) UTF-8
163 o o utf8-bom Unicode(TM) UTF-8 with BOM
164 o o utf7-imap IMAP modified Unicode(TM) UTF-7 (RFC2060)
165 o o mutf8 Java modified Unicode(TM) UTF-8
166 o o cesu8 CESU-8 (Unicode Technical Report #26)
167 x o nyukan-utf-8 Nyukan-moji (Japanese nyukoku-kanrikyoku gaiji), utf-8
168 x o nyukan-utf-16 Nyukan-moji (Japanese nyukoku-kanrikyoku gaiji), utf-16
169 o x arib-b24 ARIB B24 8-bit JIS-based
170 o x arib-b24-sj ARIB B24 8-bit SJIS-based
171 x o transparent Transparent mode (see below)
172 o x x-iscii-de India ISCII-91(IS13194:1991)
173 o x asmiscii-8 Armenian ARMSCII-8
174 o x geostd8 Georgian Geostd 8
175 o x mik Bulgarian MIK
176 o x tscii Tamil TSCII 1.7
177 o o locale codeset specified in locale. See below.
178
179
180 Codeset explanations
181 iso-8859-*
182 When specified as output, G0 = GL is ascii and G1 = GR is
183 iso-8859-*. 8bit encoding is used.
184
185 iso-2022-jp, jis
186 Encoding is iso-2022-jp-2 (RFC1554). G0 = GL is JIS X 0201 roman,
187 G1 = GR is JIS X 0201 kana, G2 is iso-8859-1 and G3 is JIS X
188 0212:1990 Supplementary Kanji.
189
190 jis-x0213, iso-2022-jp-3
191 Encoding is iso-2022-jp-3 (JIS X 0213:2000 based). For output, G0 =
192 GL is JIS X 0201 roman, G1 = GR is JIS X 0201 kana, G2 is
193 iso-8859-1 and G3 is JIS X 0213 plane2 Kanji.
194
195 jis-x0213-strict
196 Encoding is a subset of iso-2022-jp-3-strict (uses Plane 1 only).
197 For output, G0 = GL is JIS X 0201 roman, G1 = GR is JIS X 0201
198 kana, G2 is iso-8859-1 and G3 is not set. Output uses JIS X 0208
199 whenever possible. JIS X 0213 input is automatically
200 recognized.
201
202 jis-x0213-2004, iso-2022-jp-2004
203 Encoding is iso-2022-jp-2004. For output, G0 = GL is JIS X
204 0201 roman, G1 = GR is JIS X 0201 kana, G2 is iso-8859-1 and G3
205 is JIS X 0213 plane2 Kanji.
206
207 oldjis
208 Encoding is iso-2022-jp using old JIS X 0208:1978. G0 = GL is
209 JIS X 0201 roman, G1 = GR is JIS X 0201 kana, G2 is iso-8859-1
210 and G3 is JIS X 0212 Supplementary Kanji.
211
212 euc-jp, euc
213 Encoding is 8-bit EUC using JIS X 0208:1997 character set. G0 =
214 GL is ascii, G1 = GR is JIS X 0208, G2 is JIS X 0201 kana and G3
215 is JIS X 0212 Supplementary Kanji.
216
217 euc-x0213, euc-jis-2003
218 Encoding is 8-bit EUC-based JIS X 0213:2000. G0 = GL is ascii,
219 G1 = GR is X 0213:2000 plane 1, G2 is iso-8859-1 and G3 is JIS X
220 0213:2000 plane2 Kanji.
221
222 euc-jis-2004
223 Encoding is 8-bit EUC-based JIS X0213:2004. G0 = GL is ascii,
224 G1 = GR is X0213:2004 plane 1, G2 is iso-8859-1 and G3 is JIS
225 x0213:2004 plane2 Kanji.
226
227 euc-kr
228 Encoding is 8-bit EUC using KS X 1001 Wansung character set. G0
229 = GL is KS X1003, G1 = GR is KS X1001, G2 and G3 are not set.
230
231 euc7-kr, iso-2022-kr
232 Encoding is iso-2022-kr (rfc1557): 7-bit EUC using KS X 1001
233 Wansung character set. G0 = GL is KS X1003, G1 is KS X1001, G2
234 and G3 are not set.
235
236 euc-cn
237 Encoding is 8-bit EUC using GB 2312 simplified Chinese character
238 set. G0 = GL is ASCII, G1 = GR is GB2312, G2 and G3 are not set.
239
240 euc7-cn
241 Encoding is 7-bit EUC using GB 2312 simplified Chinese character
242 set. G0 = GL is ASCII, G1 is GB2312, G2 and G3 are not set.
243
244 hz
245 Encoding is HZ encoded (rfc1842) GB 2312 simplified Chinese
246 character set. G0 = GL is ASCII, G1 = GR is GB2312, G2 and G3
247 are not set.
248
249 euc-tw
250 Encoding is EUC encoded CNS11643 Plane1/2 traditional Chinese
251 character set. Subset of iso-2022-cn. G0 = GL is ASCII, G1 = GR
252 is CNS11643 plane 1, G2 is CNS11643 plane 2 and G3 is not set.
253
254 gb12345
255 Encoding is 8-bit EUC using GB 12345 (GBF) traditional Chinese
256 character set. G0 = GL is ASCII, G1 = GR is GB12345, G2 and G3
257 are not set.
258
259 gbk, cp936
260 Encoding is GBK simplified Chinese character set. G0 = GL is
261 ASCII and G1 = GR is GBK. G2 and G3 are not set.
262
263 gb18030 (experimental)
264 Encoding is GB18030 (ibm-1392, Windows cp54936) chinese charac‐
265 ter set. Uses ASCII as latin part.
266
267 big5
268 Encoding is Big5 traditional Chinese character set with ETen
269 extension. Includes Euro mapping. Uses ASCII as latin part.
270
271 cp950
272 Encoding is Microsoft cp950-Big5 traditional chinese character
273 set. Uses ASCII as latin part.
274
275 big5-hkscs (experimental)
276 Encoding is cp950-Big5 traditional chinese character set with
277 HKSCS extension. Uses ASCII as latin part.
278
279 big5-2003 (experimental)
280 Encoding is Big5-2003 Taiwanese standard traditional chinese
281 character set. Uses ASCII as latin part.
282
283 big5-uao (experimental)
284 Encoding is Big5-UAO (http://uao.cpatch.org) traditional chinese
285 character set. Uses ASCII as latin part.
286
287 VISCII (experimental)
288 Vietnamese VISCII (rfc1456) character set. Not TCVN-5712.
289
290 VIQR (experimental)
291 Vietnamese VISCII character set with VIQR encoding (rfc1456).
292
293 sjis
294 Encoding is Shift-encoded JIS X 0208:1997 character set. Note
295 that this is not cp932. Uses JIS X 0201 latin as latin(GL) part.
296
297 sjis-x0213, shift_jis-2000
298 Encoding is Shift-encoded JIS using JIS X 0213:2000 character
299 set.
300
301 sjis-x0213-2004, shift_jis-2004
302 Encoding is Shift-encoded JIS using JIS X 0213:2004 character
303 set. 10 newly defined characters are added, but the Unicode mapping
304 is the same as JIS X 0213:2000. Uses JIS X 0201 latin as latin(GL)
305 part.
306
307 sjis-cellular (experimental)
308 Encoding is Shift-encoded JIS X 0208:1997 character set with NTT
309 Docomo/Vodafone(SoftBank) cellular phone glyph mapping. Output
310 is not supported.
311
312 cp932 cp932w
313 Encoding is Microsoft SJIS cp932 with NEC/IBM gaiji area, based
314 on Windows XP mapping. Uses ASCII as latin(GL) part. --use-compat
315 and --use-ms-compat are automatically enabled. cp932w provides
316 further WideCharToMultiByte compatibility.
317
318 cp51932
319 Encoding is Microsoft EUC-based cp51932 with NEC/IBM gaiji area,
320 based on Windows XP mapping. Uses ASCII as G0 and JIS X 0201
321 kana as EUC G2 part. G3 is not used for output, and is JIS X
322 0212:2000 for input. --use-compat and --use-ms-compat are
323 automatically enabled.
324
325 cp50220, cp50221, cp50222
326 Encoding is Microsoft JIS-based cp50220, cp50221, cp50222 with
327 NEC/IBM gaiji area, based on Windows XP mapping. For input, skf
328 accepts cp50220, 50221 and 50222. Note that this codeset is NOT
329 compatible with iso-2022. Uses ASCII as default character set.
330 --use-compat and --use-ms-compat are automatically enabled.
331
332 oldsjis
333 Encoding is Microsoft SJIS (JIS X 0208:1978 a.k.a. old JIS).
334 Uses JIS X 0201 latin as latin(GL) part.
335
336 johab
337 Encoding is KS X1001(Johab) character set. Uses KS X1003 latin
338 as latin(GL) part.
339
340 uhc
341 Encoding is UHC (cp949) character set. Uses ASCII as latin(GL)
342 part.
343
344 unicode, unicodefffe, utf16, utf16le
345 Encoding is Unicode UTF-16 (v15.0). The default input/output byte
346 order is little-endian for unicode and big-endian for unicodefffe,
347 and an input byte order mark is recognized. utf16 and unicodefffe are
348 big-endian. utf16le and unicode are little-endian. Output includes an
349 endian mark by default unless --disable-endian-mark is specified.
350 Output range is within UTF-32 with surrogate pairs unless
351 --limit-to-ucs2 is specified.
352 Note that ucs2 is not supported within the lightweight language
353 extension for either input or output, because of a limitation in how
354 SWIG passes data structures. Specifying ucs2 generates an error.
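
The byte order and endian mark behaviour described for the UTF-16 codesets can be illustrated with Python's standard codecs. This is a sketch of the byte layout only; the function name utf16_bytes and its parameters are hypothetical and do not correspond to skf internals.

```python
import codecs


def utf16_bytes(text, little_endian=True, endian_mark=True):
    """Encode text as UTF-16 with an optional leading byte order mark."""
    codec = "utf-16-le" if little_endian else "utf-16-be"
    bom = codecs.BOM_UTF16_LE if little_endian else codecs.BOM_UTF16_BE
    body = text.encode(codec)
    return bom + body if endian_mark else body


print(utf16_bytes("A").hex(" "))                       # little-endian + mark
print(utf16_bytes("A", little_endian=False).hex(" "))  # big-endian + mark
# A character outside the BMP is written as a surrogate pair.
print(utf16_bytes("\U00020000", endian_mark=False).hex(" "))
```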
355
356 utf8
357 Encoding is UTF-8 encoded Unicode (v15.0). Output doesn't in‐
358 clude byte order mark unless --enable-endian-mark is specified.
359 Output range is within UTF-32 unless --limit-to-ucs2 is speci‐
360 fied. By default, CESU-8 is not accepted as input. Option --en‐
361 able-cesu8 enables CESU-8 input for utf-8 converter. CESU-8 out‐
362 put is not supported. For UTF-8, endian mark (BOM) is always
363 ignored.
364
365 utf7
366 Encoding is UTF-7 encoded Unicode (v15.0). Input/output range is
367 limited to UTF-16, and value above U+10000 is regarded as unde‐
368 fined. BOM is always ignored for input, and never used for out‐
369 put.
370
371 utf7-imap
372 Modified utf-7 for IMAP protocol described in RFC2060. BOM is
373 always ignored for input, and never used for output.
374
375 mutf8
376 Modified utf-8 for the Java language: CESU-8 plus U+0000 encoding.
377 BOM is always ignored for input, and never used for output.
378
379 cesu-8
380 Modified utf-8 described in unicode technical report #26. BOM
381 is always ignored for input, and never used for output.
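
The difference between UTF-8, CESU-8 and Java modified UTF-8 can be shown with a short Python sketch. It follows the published definitions (supplementary characters written as two 3-byte surrogate encodings, and U+0000 written as C0 80 in the Java variant); the function name cesu8_encode is hypothetical and this is not skf's converter.

```python
def cesu8_encode(text: str, java_modified: bool = False) -> bytes:
    """Encode text as CESU-8, or as Java modified UTF-8 if requested."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0 and java_modified:
            out += b"\xc0\x80"          # mutf8 encodes U+0000 in two bytes
        elif cp >= 0x10000:
            cp -= 0x10000               # split into a UTF-16 surrogate pair
            for unit in (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")   # BMP characters match plain UTF-8
    return bytes(out)


sample = "\U00010400"
print("utf-8 :", sample.encode("utf-8").hex(" "))
print("cesu-8:", cesu8_encode(sample).hex(" "))
print("mutf8 :", cesu8_encode("\x00", java_modified=True).hex(" "))
```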
382
383 keis (experimental)
384 Encoding is Hitachi KEIS83/90. Output range is limited to EBCDIK
385 and JIS X 0208 area.
386
387 jef (experimental)
388 Encoding is Fujitsu JEF. Input only. Only basic part is sup‐
389 ported.
390
391 ibm930 (experimental)
392 Encoding is IBM DBCS Japanese with EBCDIC Kana
393
394 ibm931 (experimental)
395 Encoding is IBM DBCS Japanese with EBCDIC latin (ibm037)
396
397 ibm933 (experimental)
398 Encoding is IBM DBCS Korean with EBCDIC Wansung character set
399
400 ibm935 (experimental)
401 Encoding is IBM DBCS Simplified Chinese with EBCDIC Chinese
402
403 ibm937 (experimental)
404 Encoding is IBM DBCS Traditional Chinese with EBCDIC Chinese
405
406 koi8r
407 Russian KOI-8R code.
408
409 cp1250
410 Central European latin Microsoft cp1250 code
411
412 cp1251
413 Eastern European cyrillic Microsoft cp1251 code
414
415 arib-b24 arib-b24-sj
416 ARIB B24 code defined in ARIB STD-B24 vol.1 part.2 chapt. 7.3.
417 b24 is 8-bit jis based, and b24-sj is sjis based.
418
419 nyukan-utf-8 nyukan-utf-16
420 Normalized Unicode UTF-8/UTF-16 based on Japanese Ministry of
421 Justice kokuji (notification) No. 582.
422
423 locale
424 Use locale-specified codeset. Since locale only provides partial
425 information as codeset, whether this option works as expected or
426 not depends on environmental settings.
427
428 transparent
429 Transparent mode. Various code control features, including folding
430 and line end code conversion, are also ignored.
431
432
433 Shortcuts
434 -j same as --oc=jis
435
436 -s same as --oc=sjis
437
438 -e same as --oc=euc-jp
439
440 -q same as --oc=unicode
441
442 -z same as --oc=utf8
443
444 -E same as --ic=euc-jp. Assume input codeset is EUC-JP.
445
446 -J same as --ic=jis. Assume input codeset is iso-2022-jp.
447
448 -S same as --ic=sjis. Assume input codeset is shift JIS
449
450 -Q same as --ic=utf-16 --input-little-endian.
451
452 -Z same as --ic=utf8.
453
454
455 ISO-2022 Specific controls
456 After G0-G3 have been set up according to the specified input codeset,
457 these options replace them with the given character set. Note that this
458 doesn't change any codeset properties of the original codeset, such as
459 language and encoding.
460
461 --set-g0=`charset name'
462 Predefines specified code set to plane 0 (G0). Also set to GL at
463 initial state.
464
465 --set-g1=`charset name'
466 Predefines specified code set to right plane (G1). Also set to
467 GR at initial state.
468
469 --set-g2=`charset name'
470 Predefines specified code set to right plane (G2).
471
472 --set-g3=`charset name'
473 Predefines specified code set to right plane (G3).
474
475
476 Supported charset names are as follows. 'o' means the charset can be
477 assigned to the plane; 'x' means it cannot. For Unicode family
478 codesets, this option is ignored. For other non-iso2022 categories,
479 this option is not supported, and the result is unpredictable.
480
481
482 g0 g1 g2 g3 codeset name description
483 o o o o ascii ANSI X3.4 ASCII
484 o o o o x0201 JIS X 0201 (latin part)
485 x o o o iso8859-1 ISO 8859-1 latin
486 x o o o iso8859-2 ISO 8859-2 latin
487 x o o o iso8859-3 ISO 8859-3 latin
488 x o o o iso8859-4 ISO 8859-4 latin
489 x o o o iso8859-5 ISO 8859-5 Cyrillic
490 x o o o iso8859-6 ISO 8859-6 Arabic
491 x o o o iso8859-7 ISO 8859-7 Greek-latin
492 x o o o iso8859-8 ISO 8859-8 Hebrew
493 x o o o iso8859-9 ISO 8859-9 latin
494 x o o o iso8859-10 ISO 8859-10 latin
495 x o o o iso8859-11 ISO 8859-11 Thai
496 x o o o iso8859-13 ISO 8859-13 latin
497 x o o o iso8859-14 ISO 8859-14 latin
498 x o o o iso8859-15 ISO 8859-15 latin
499 x o o o iso8859-16 ISO 8859-16 latin
500 x o o o tcvn5712 TCVN 5712 (Vietnamese)
501 x o o o ecma94 ECMA 94 Cyrillic (KOI-8e)
502 o o o o x0212 JIS X 0212:1990
503 o o o o x0208 JIS X 0208:1997
504 o o o o x0213 JIS X 0213 Plane 1:2000
505 o o o o x0213-2 JIS X 0213 Plane 2:2000
506 o o o o x0213n JIS X 0213 Plane 1:2004
507 o o o o gb2312 Simplified Chinese GB2312
508 o o o o gb1988 Chinese GB1988(latin)
509 o o o o gb12345 Traditional Chinese GB12345
510 o o o o ksx1003 Korean KS X 1003(latin)
511 o o o o ksx1001 Korean KS X 1001
512 x o o o koi8-r Cyrillic KOI-8R
513 x o o o koi8-u Ukrainian Cyrillic KOI-8U
514 o o o o cns11643-1 Traditional Chinese CNS11643-1
515 x o o o viscii-r RFC1456 VISCII (right plane)
516 o o o o viscii-l RFC1456 VISCII (left plane)
517 x o o o cp437 Microsoft cp437 (US latin)
518 x o o o cp737 Microsoft cp737
519 x o o o cp775 Microsoft cp775
520 x o o o cp850 Microsoft cp850
521 x o o o cp852 Microsoft cp852
522 x o o o cp855 Microsoft cp855
523 x o o o cp857 Microsoft cp857
524 x o o o cp860 Microsoft cp860
525 x o o o cp861 Microsoft cp861
526 x o o o cp862 Microsoft cp862
527 x o o o cp863 Microsoft cp863
528 x o o o cp864 Microsoft cp864
529 x o o o cp865 Microsoft cp865
530 x o o o cp866 Microsoft cp866
531 x o o o cp869 Microsoft cp869
532 x o o o cp874 Microsoft cp874
533 x o o o cp932 Microsoft cp932 (Japanese)
534 x o o o cp1250 Microsoft cp1250(Central Europe)
535 x o o o cp1251 Microsoft cp1251 (Cyrillic)
536 x o o o cp1252 Microsoft cp1252 (Latin-1)
537 x o o o cp1253 Microsoft cp1253 (Greek)
538 x o o o cp1254 Microsoft cp1254 (Turkish)
539 x o o o cp1255 Microsoft cp1255
540 x o o o cp1256 Microsoft cp1256
541 x o o o cp1257 Microsoft cp1257
542 x o o o cp1258 Microsoft cp1258
543
544 --euc-protect-g1
545 In EUC input mode, suppress sequences to set a charset to G1.
546 Such sequences are discarded.
547
548 --add-annon
549 Add announcer for JIS X 0208:1997 to X 0208 designate sequence.
550 This option works only with iso-2022-based output.
551
552 --input-detect-jis78
553 Distinguish JIS X 0208:1978 codeset and JIS X 0208:1997 codeset.
554 By default, these two charsets are regarded as X 0208:1997. This
555 option is valid only when input encoding is JIS (iso-2022-jp).
556
557
558 JIS X 0212(Supplement Kanji code) Support
559 --x0212-enable
560 skf by default does not output JIS X 0212 code in JIS/EUC mode.
561 This option enables use of JIS X 0212 part. Non-Japanese code,
562 Shift_JIS variants, Unicode or KEIS output ignore this option.
563 Note that this option is supported for backward compatibility.
564 It may not be supported in future versions.
565
566
567 Unicode coding specific control options
568 skf-2.10 conforms to the Unicode 11.0 specification.
569
570 --use-compat --suppress-compat
571 With --suppress-compat, skf substitutes characters in the Unicode
572 compatibility planes (U+F900 - U+FFFD) with appropriate characters in
573 non-compatibility planes. If this substitution is enabled, these
574 characters are converted to variants or treated as undefined. With
575 --use-compat, skf outputs characters in this area as they are. Default
576 is --use-compat. Several codesets control this as a codeset feature
577 (i.e. they use the compatibility planes). See the codeset section.
578
579 --use-ms-compat
580 When output is Unicode, make the Unicode mapping Microsoft Windows
581 compatible. This only changes conversion for some symbols in
582 JIS-Kanji, and adding the --use-compat option is recommended for
583 roundtrip conversion. If you need stricter compatibility, try
584 cp932w as the input codeset.
585
586 --use-cde-compat
587 When output is Unicode, make translation CDE standard codeset
588 compatible.
589
590 --little-endian
591 When output is UTF-16le/be, use little endian byte-order.
592
593 --big-endian
594 When output is UTF-16le/be, use big endian byte-order.
595
596 --disable-endian-mark --enable-endian-mark
597 When output is UTF-16 or UTF-8, do not use/use byte order mark‐
598 ing. To make UTF-16N, use this option with --little-endian. By
599 default, BOM is enabled for UTF-16 and disabled for UTF-8.
600
601 --input-little-endian
602 When input is UTF-16le/be, assume input is little endian byte-
603 ordered.
604
605 --input-big-endian
606 When input is UTF-16le/be, assume input is big endian byte-or‐
607 dered.
608
609 --endian-protect
610 Do not use endian mark in input stream. Endian mark is just dis‐
611 carded. This is off by default.
612
613 --limit-to-ucs2
614 Do not use > 0x10000 area code in Unicode (i.e. limits code to
615 BMP area). This option doesn't limit internal code range in
616 skf. This is off by default.
617
618 --disable-cjk-extension
619 Treat CJK extension A/B areas as undefined. This is off (i.e.
620 these areas are enabled) by default.
621
622 --enable-cesu8
623 Enable CESU-8 input in utf-8 codeset. Ignored for any other
624 codesets.
625
626 --non-strict-utf8
627 Enable broken (decodable but not obeying specs.) utf-8 input. If
628 you need this option, proceed with extra care.
629
630 --enable-nfd-decomposition --disable-nfd-decomposition
631 Enable/Disable Unicode Normalized decomposition. Default is dis‐
632 abled.
633
634 --enable-nfda-decomposition --disable-nfda-decomposition
635 Enable/Disable Apple-compatible Unicode Normalized decomposi‐
636 tion. Default is disabled.
637
638 --oldcell-to-emoticon
639 Convert old cell-phone gaiji area in Unicode PUA to emoticon.
640 Supported: NTT Docomo/AU emoticons. A reverse mapping is not
641 supported.
642
643 --fix-ms-radical-bug
644 mscvrt in Windows Vista or later has an infamous bug which
645 converts some Kanji to Kanji radicals. This option converts the
646 radical area back to the appropriate Kanji. This option is valid for
647 Unicode output.
648
649
650
651 Miscellaneous codeset related options
652 --old-nec-compat
653 Enable old NEC kanji sequence (ESC-K,H). Needs compile option
654 --enable-oldnec at configuration.
655
656 --no-utf7
657 Assume input codeset is *NOT* UTF-7 encoded Unicode. This op‐
658 tion disables input utf7 testing.
659
660 --no-kana
661 Assume input codeset does *NOT* include JIS X 0201 kana.
662
663 --input-limit-to-jp
664 Tell detection mechanism that input is some kind of Japanese
665 codeset.
666
667
668 OUTPUT Conversions options
669 skf is intended to output a stream to stdout, but an nkf-compatible
670 option for changing file encoding in place is also provided.
671
672 --overwrite[=SUFFIX] --in-place[=SUFFIX]
673 Converts the encoding of the file(s) specified as input. --overwrite
674 preserves the file modification date. If the SUFFIX parameter is
675 given, the input file is backed up under a name with SUFFIX appended.
676
677 skf has various features to adapt output files to the local
678 environment. Most of these are controlled by extended control switches
679 described in this section.
680
681 --use-g0-ascii
682 set G0(=GL) for output encoding to ASCII, ignoring codeset des‐
683 ignation.
684
685 X-0201 Kana/latin conversions
686 skf by default converts X-0201 kanas to X-0208 kanas. To output X-0201
687 kana as it is, use one of following options. When output is designated
688 to EUC or SJIS, these three options enable X-0201 kana output by ways
689 provided by each encoding. When Unicode output is specified, (equiv.)
690 kana part output is controlled by --use-compat, not following switches.
691 Valid only when output codeset is NOT Unicode family.
692
693 --kana-jis7
694 use SI/SO locking shift sequence to designate X-0201 kana. This
695 switch is valid for jis, jis-x0213 and cp50220 (i.e. cp50221)
696 encoding. For other codesets, this option is ignored.
697
698 --kana-jis8
699 output X-0201 kana using 8-bit code right plane. This switch is
700 valid for jis and jis-x0213 encoding. For other codeset, this
701 option is ignored.
702
703 --kana-esci --kana-call
704 use ESC-(-I to designate X-0201 kana. This switch is valid for
705 jis, jis-x0213 and cp50220 (i.e. cp50222) encoding. For other
706 codeset, this option is ignored.
707
708 --kana-enable
709 If output is EUC-JP or cp51932, use X-0201 kana with G2. If
710 SJIS output, it is same as --kana-jis8. When JIS output, it is
711 same as --kana-call.
712
713 --use-iso8859-1
714 Enable iso-8859-1 output. Iso-8859-1 is invoked to G1 and set to
715 GR plane.
716
717
718 URI/TeX format conversion feature options
719 With Unicode(tm) family output codings, skf outputs non-ascii latin
720 characters as they are, but with other output codings, skf converts
721 these characters using the following rules:
722
723 (1) If a code is defined in a specified output codeset, specified code
724 point is used for output.
725 (2) If one of the html convert modes is enabled (i.e. --convert-html,
726 --convert-sgml) and the code is defined in the html/sgml codeset,
727 it is converted to an entity reference or code point reference.
728 (3) If tex convert mode enabled and the code is defined in tex expres‐
729 sion, it is converted to tex format.
730 (4) If the code is a kind of combined ligatures, it is shown by a set
731 of characters.
732 (5) A kind of replacement character is shown, with warning.
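
The fallback order in rules (1) to (5) can be sketched in Python as below. This is a simplified model, not skf's implementation: the function names are hypothetical, rule (3) (TeX) is left out, rule (4) is approximated with Unicode NFKD decomposition, and "?" stands in for whatever replacement character skf actually emits.

```python
import unicodedata
from html.entities import codepoint2name


def can_encode(text: str, codec: str) -> bool:
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def render_char(ch: str, codec: str, html_mode: bool = False) -> str:
    if can_encode(ch, codec):                      # rule (1): use as is
        return ch
    if html_mode and ord(ch) in codepoint2name:    # rule (2): entity
        return "&" + codepoint2name[ord(ch)] + ";"
    # rule (3), TeX output, is omitted from this sketch
    parts = unicodedata.normalize("NFKD", ch)      # rule (4): stand-in
    if parts != ch and can_encode(parts, codec):
        return parts
    return "?"                                     # rule (5): placeholder


for ch in ("é", "ﬁ", "€"):
    print(ch, render_char(ch, "ascii"), render_char(ch, "ascii", True))
```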
733
734 --convert-html --convert-sgml --convert-xml
735 Enable html convert mode. This mode is cleared by --reset. These
736 options are synonyms, and are treated as the same option.
737
738 --convert-html-decimal
739 Enable html code-point decimal convert mode. This mode is
740 cleared by --reset.
741
742 --convert-html-hexadecimal
743 Enable html code-point hexadecimal convert mode. This mode is
744 cleared by --reset.
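
For reference, the two code-point reference forms look as follows in a minimal Python sketch. The function name numeric_reference is hypothetical, and details such as hexadecimal letter case or zero padding in skf's actual output are not specified here.

```python
def numeric_reference(ch: str, hexadecimal: bool = False) -> str:
    """Return an HTML numeric character reference for one character."""
    cp = ord(ch)
    return f"&#x{cp:x};" if hexadecimal else f"&#{cp};"


print(numeric_reference("é"))                    # decimal form
print(numeric_reference("é", hexadecimal=True))  # hexadecimal form
```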
745
746 --convert-tex
747 Enable TeX convert mode. This mode is cleared by --reset.
748
749 --convert-perl
750 Enable Perl5 literal convert mode. This mode is cleared by --re‐
751 set.
752
753 --convert-java
754 Enable Java literal convert mode. This mode is cleared by --re‐
755 set.
756
757 --convert-python
758 Enable Python literal convert mode. This mode is cleared by
759 --reset.
760
761 --use-replace-char
762 In Unicode, use the Unicode replacement character (U+fffc) for
763 undefined characters.
764
765
766 Extended Options
767 Encoding/Decoding control options
768 --decode=`encoding scheme'
769
770 --encode=`encoding scheme'
771 Specify a decoding/encoding scheme for the input stream. Supported
772 encoding schemes for decoding are `hex', 'mime', 'mime_q',
773 'mime_b', 'uri', 'ace', 'hex_perc_encode', 'base64', 'qencode',
774 'rfc2231', `rot' and 'none'. Each option means CAP hex-code,
775 mime, mime Q-encoding, mime B-encoding, uri character reference,
776 ACE punycode, uri percent notation, base64, Q-encoding, rfc2231
777 and rot13/47 respectively. 'none' means no decode.
778 For encoding, 'hex', 'mime_b', 'mime_q', 'uri', 'ace', 'cap',
779 'hex_perc_encode', 'base64' and 'none' are supported. EBCDIC
780 related codesets and some already ascii-encoded codeset (e.g.
781 UTF-7) output with encoding is not supported.
782 Only one decode/encode option is valid, and if more than one option
783 is specified, the last one is used. When one of the mime
784 decodings is specified, base text is assumed to be EUC encoded
785 unless specified otherwise. Except for rot, which assumes the input
786 stream is Shift_JIS, EUC or iso-2022-jp, these encodings assume the
787 input stream is ascii (as defined in RFC2045). Some of these
788 schemes may co-exist, but this is not guaranteed. In
789 particular, if input is UTF-16/UCS2 code, these encodings are ignored
790 by skf.
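
To show what several of the schemes named above look like, the following Python sketch decodes small samples with the standard library. It only illustrates the formats (MIME B-encoding, URI percent notation, rot13, punycode); the helper name decode_mime is hypothetical and none of this is skf code.

```python
import codecs
from email.header import decode_header, make_header
from urllib.parse import unquote


def decode_mime(text: str) -> str:
    """Decode RFC 2047 encoded words (B or Q encoding) to a string."""
    return str(make_header(decode_header(text)))


print(decode_mime("=?ISO-2022-JP?B?GyRCNEE7ehsoQg==?="))  # mime_b
print(unquote("%E6%BC%A2%E5%AD%97"))                      # percent notation
print(codecs.decode("Uryyb", "rot13"))                    # rot13
# Punycode (ACE) label for the same text, with the usual xn-- prefix.
print("xn--" + "漢字".encode("punycode").decode("ascii"))
```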
791
792 --mime-ms-compat
793 treat japanese generic codesets as Microsoft cp932 compatible.
794 More specifically, with this option skf treats iso-2022-jp as
795 cp50220, euc-jp as cp51932 and Shift_JIS as cp932w.
796
797 --mime-persistent
798 skf detects address-like strings and excludes them from mime en‐
799 coding. This option disables such behavior. Default in nkf-com‐
800 patible mode.
801
802 --mime-limit-aware
803 In address-like string detection, skf respects character count
804 limits for a line.
805
806
807 Shortcut
808 -m same as --decode=mime
809
810 -mB same as --decode=mime_b
811
812 -mQ same as --decode=qencode
813
814 -m0 same as --decode=none
815
816 -M same as --encode=mime_b
817
818 -MB same as --encode=base64
819
820 -MQ same as --encode=qencode
821
822 End of line control options
823 --lineend-thru
824 Output end-of-line code as it is. Also output ^Z code as it is.
825 This is default.
826
827 --lineend-cr --lineend-mac -Lm
828 Use CR as end-of-line code. Also delete ^Z code from input
829 stream.
830
831 --lineend-lf --lineend-unix -Lu
832 Use LF as end-of-line code. Also delete ^Z code from input
833 stream.
834
835 --lineend-crlf --lineend-windows -Lw
836 Use CR+LF as end-of-line code. Also delete ^Z code from input
837 stream. This option doesn't preserve original order of cr and
838 lf.
839
840 --input-cr
841 Assume input stream uses CR as end-of-line code.
842
843 --input-lf
844 Assume input stream uses LF as end-of-line code.
845
846 --input-crlf
847 Assume input stream uses CR+LF as end-of-line code.
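
The line-end output options above can be modelled by the following Python sketch. It is an illustration only, not skf's implementation: the function name and the mode strings are hypothetical, and it assumes that CR+LF, lone CR and lone LF each count as one line end.

```python
CTRL_Z = "\x1a"  # the ^Z code mentioned above

# Hypothetical mode names standing in for the --lineend-* options.
EOL_BY_MODE = {"cr": "\r", "lf": "\n", "crlf": "\r\n"}


def convert_lineend(text: str, mode: str = "thru") -> str:
    """Rewrite line ends in text according to mode."""
    if mode == "thru":
        # Pass line ends and ^Z through untouched (the default).
        return text
    eol = EOL_BY_MODE[mode]
    # The cr/lf/crlf modes also delete ^Z from the stream.
    text = text.replace(CTRL_Z, "")
    # Collapse CR+LF first so it is not counted as two line ends.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", eol)
```

Because every line end is first reduced to a single marker, an unusual LF+CR order in the input is not reproduced in the output, which matches the caveat given for the crlf mode.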
848
849 -F[line_length[-kinsoku]]
850
851 -f[line_length[-kinsoku]] -f[line_length[+kinsoku]]
852 Wrap input lines at line_length columns. The f option deletes
853 CR/LFs in input, and the F option doesn't delete them. For Japanese
854 convention, both gyoutou-kinsoku (by burasage-gumi) and
855 gyoumatsu-kinsoku (by oidasi-gumi) are supported. The burasage
856 length is controlled by the kinsoku option. Default value for
857 line_length is 66, and must be < 1000. Default value for kinsoku
858 is 5, and must be <= 10. In 'f' option, skf autodetects para‐
859 graph and retains some CR/LF. 2nd 'f' option format (with '+')
860 disables this behaviour. In nkf compatible mode, some fold be‐
861 haviors change as follows.
862 (1) Default line_length is set to 60, and kinsoku value is 10.
863 (2) alpha numeric characters become gyoutou-kinsoku characters.
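
The folding rule with burasage (hanging) can be sketched in Python as follows. This is a simplified model under stated assumptions, not skf's algorithm: the set of gyoutou-kinsoku characters is a small assumed sample, column widths are taken from the Unicode East Asian Width property, gyoumatsu-kinsoku (oidasi) and paragraph detection are not modelled, and the function names are hypothetical.

```python
import unicodedata

# Assumed sample of characters that must not start a line (gyoutou-kinsoku).
GYOUTOU_KINSOKU = set("、。，．）」』！？")


def char_width(ch: str) -> int:
    """Count wide and fullwidth characters as two columns, others as one."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def fold(text: str, line_length: int = 66, kinsoku: int = 5) -> list:
    """Fold text into lines, letting kinsoku characters hang past the limit."""
    if not 0 < line_length < 1000 or not 0 <= kinsoku <= 10:
        raise ValueError("line_length must be < 1000 and kinsoku <= 10")
    lines, current, width = [], [], 0
    # Like the f option, drop CR/LF from the input before folding.
    for ch in text.replace("\r", "").replace("\n", ""):
        w = char_width(ch)
        if width + w > line_length:
            # Burasage: a character that may not start a line is allowed to
            # hang, as long as it stays within kinsoku extra columns.
            hangs = ch in GYOUTOU_KINSOKU and width + w <= line_length + kinsoku
            if not hangs and current:
                lines.append("".join(current))
                current, width = [], 0
        current.append(ch)
        width += w
    if current:
        lines.append("".join(current))
    return lines
```

For example, fold("あいう。えお", line_length=6, kinsoku=2) keeps the full stop on the first line, while kinsoku=0 lets it start the second line.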
864
865 File control options
866 --filewise-detect --force-reset
867 Reset and re-detect input code set at the start of each file.
868
869 --linewise-detect
870 Reset and re-detect input code set at the start of each line.
871
872
873 Compatibility options
874 --nkf-compat
875 Interpret the following options in an nkf compatible manner. -l,
876 -d, -c, -x, -X, -w and -W work as in nkf2.x. -f and -F behavior is
877 changed as shown above. -T, -i and -o are not supported. Most of
878 other nkf options and switches also work like nkf, except in
879 case of error.
880
881 --skf-compat
882 Interpret the following options in the skf-native manner.
883
884 -r nkf-compatible rot. Works only with --nkf-compat mode. Allowed
885 input encodings are limited to JIS/Shift_JIS/EUC.
886
887 -h[123] --hiragana --katakana --katakana-hiragana
888 -h, -h1 and --hiragana convert all kanas to hiragana. -h2 and
889 --katakana convert all kanas to katakana. -h3 and
890 --katakana-hiragana swap katakana and hiragana.
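
The three kana conversions can be sketched in Python by shifting code points between the Unicode hiragana and katakana blocks. This is an illustrative model only: it covers fullwidth kana (U+3041 to U+3096 and U+30A1 to U+30F6), leaves halfwidth kana and the prolonged sound mark untouched, and the function names are hypothetical.

```python
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096
KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6
OFFSET = KATAKANA_FIRST - HIRAGANA_FIRST  # 0x60 between the two blocks


def _is_hiragana(ch: str) -> bool:
    return HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST


def _is_katakana(ch: str) -> bool:
    return KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST


def to_hiragana(text: str) -> str:
    """Model of -h1: katakana becomes hiragana, everything else is kept."""
    return "".join(chr(ord(c) - OFFSET) if _is_katakana(c) else c for c in text)


def to_katakana(text: str) -> str:
    """Model of -h2: hiragana becomes katakana, everything else is kept."""
    return "".join(chr(ord(c) + OFFSET) if _is_hiragana(c) else c for c in text)


def swap_kana(text: str) -> str:
    """Model of -h3: hiragana and katakana trade places."""
    out = []
    for c in text:
        if _is_hiragana(c):
            out.append(chr(ord(c) + OFFSET))
        elif _is_katakana(c):
            out.append(chr(ord(c) - OFFSET))
        else:
            out.append(c)
    return "".join(out)
```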
891
892 --nkf-help
893 show option difference/compatibility between skf and nkf.
894
895 --in-place[=SUF] --overwrite[=SUF]
896 Replace the specified file with its converted content. --overwrite
897 retains the file creation time stamp. If a suffix is given, the suffix
898 is added to the output file name and the input file is not removed.
899
900
901 Lightweight language specific options
902 The skf plugins for lightweight languages support a subset of options.
903 More specifically, file input/output related options (-b, -u,
904 --overwrite, --in-place, --filewise-detect, --linewise-detect,
905 --show-filename, --suppress-filename) and UTF-16 output are disabled
906 (except ruby or python3). The calling methods differ depending on LWL,
907 but each extension takes two parameters, an option string and a string
908 to convert. From 2.1.15, ruby is not supported.
909
910
911 Python-3.x specific options
912 Since the native codeset representation in python3.x is LATIN-1/UCS2/UCS4,
913 skf behaves differently depending on the output codeset option. If the
914 output codeset is either ASCII, UTF-16 or UTF-32 (in wide mode), skf
915 returns a Unicode object, and for all other codesets skf returns a binary
916 array object. The following options change this behavior. Codesets assumed
917 as ascii (UTF-7) and MIME encoded strings are returned as strings.
918
919 --py-out-binary
920 Use a pseudo unicode binary array stream for output. BOM is
921 enabled.
922
923 --py-out-string
924 Use binary array object on ASCII, UTF-16/32 output. This is the
925 default.
926 skf accepts either a binary array or a unicode object for
927 input. BOM is disabled.
928
929
930 Misc. Control options
931 --disable-space-convert --enable-space-convert
932 skf converts an ideographic space into two ascii spaces. Dis‐
933 able option disables, and enable option enables this behavior.
934 Default is disabled.
935
936 --html-sanitize
937 Convert several characters in HTML document to entity reference
938 expression. Specifically, "!#$&%()/<>:;?´ are escaped by entity-
939 references.
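
A minimal Python sketch of this kind of sanitizing is shown below. It is not skf's implementation: the character set is copied from the list above as printed, numeric character references are an assumed output form (skf may emit named entities instead), and the function names are hypothetical.

```python
import html

# Characters listed for --html-sanitize above, copied as printed.
SANITIZE_CHARS = "\"!#$&%()/<>:;?\u00b4"


def html_sanitize(text: str) -> str:
    """Replace each listed character with a numeric character reference."""
    # One pass per input character, so the '&', '#' and ';' that belong to
    # an emitted reference are never escaped a second time.
    return "".join(f"&#{ord(c)};" if c in SANITIZE_CHARS else c for c in text)


def is_lossless(text: str) -> bool:
    """Check that an HTML parser would recover the original text."""
    return html.unescape(html_sanitize(text)) == text
```

Escaping in a single pass matters here because the escape syntax itself uses characters from the sanitized set.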
940
941 --filewise-detect --force-reset
942 If multiple input files are given, detect input codeset for each
943 file.
944
945 --linewise-detect
946 Detect input code line-wise. Note this option weakens code de‐
947 tect correctness.
948
949 --reset
950 Reset all flags specified by extended controls and environment
951 variables.
952
953 --inquiry --guess
954 skf detects the input code and outputs the detection result to stdout.
955 No filtering output is performed. If multiple input files are given,
956 --show-filename is automatically enabled.
957
958 --hard-inquiry
959 Similar to --inquiry, but reports both code and an end-of-line
960 character.
961
962 --suppress-filename
963 When inquiry(--inquiry) is on, this option disables file name
964 output. This option overrides --show-filename.
965
966 --show-filename
967 When inquiry(--inquiry) is on, this option adds each file name
968 to output.
969
970 --invis-strip
971 Delete all escape sequences not belonging to ISO-2022 code ex‐
972 tension. This is intended to replace invisstrip command bundled
973 in inews package.
974
975 -I Warn if input has unassigned code points.
976
977 -v print version information and exit.
978
979 --help print brief help and exit.
980
981 --show-supported-codeset
982 Display supported codesets (input) and exit. Both canonical
983 names (left side) and detailed names are shown. This canonical
984 name can be used as MIME charset and also as ic-option code
985 specification.
986
987 --show-supported-charset
988 Display supported character sets (output) and exit. Both canonical
989 names and detailed names are shown. Some charsets with special
990 treatments (i.e. meaningless as set-g* parameters) intentionally
991 lack addressable cnames.
992
993
995 /usr/(local/)share/skf/lib/ (Unices)
996
997 /Program Files/skf/share/lib (MS Windows)
998 These directories are where external codeset conversion tables
999 go. The location that the current skf assumes is shown by the -h
1000 option.
1001
1002
1004 skf is written by Seiji Kaneko (efialtes@osdn.jp) based on an idea from
1005 nkf written by Itaru Ichikawa (ichikawa@flab.fujitsu.co.jp). The X 0213
1006 code table is derived from the work of earthian@tama.or.jp. Some codeset
1007 mappings are derived from various sources. Detailed origins are shown in
1008 the copyright document included in this distribution. Unicode Database is
1009 copyrighted(c) by Unicode(R), Inc.
1010
1011
1013 skf is inspired by works or requests by shinoda@cs.titech,
1014 kato@cs.titech, uematsu@cs.titech, void@global, ohta@ricoh, Hinata(HKE),
1015 Ashizawa(CRL), Kunimoto(SDL), Oohara(Univ of Kyoto), Jokagi(elf2000) and
1016 Naruse (at osdn.jp). Thanks.
1017
1018
1020 1. skf can handle mixed coding with some limitations. However, code de‐
1021 tection tends to fail for mixed code, and giving explicit input code
1022 set is strongly encouraged, if codeset is known beforehand.
1023 In case of need, --linewise-detect option may help, but code detecting
1024 will more likely fail.
1025
1026 2. skf implements ISO-2022 with following exceptions.
1027 i) GL 0x20 is always space, even when a 96-character codeset is invoked
1028 to GL.
1029 ii) Sequences for setting codes to C1 and C2 are ignored.
1030 iii) If an unknown sequence is given to G0, G0 is set to ascii, and
1031 locking/single shift is cleared. An unknown sequence call to set G1-G3 is
1032 just ignored.
1033 Private charset is also not supported and is ignored.
1034 iv) Sequences for 96 character multibyte coding are ignored (Currently,
1035 no codeset is registered).
1036 v) Calling UTF-8, UTF-16 coding system from iso-2022 is supported, and
1037 returns to previous coding system by standard return.
1038 Callings and returns to/from other coding schemes are ignored.
1039 vi) For supporting some of cellular phone glyphs, several private (not
1040 registered) codesets are defined in skf, and can be called by appropri‐
1041 ate sequences.
1042
1043 3. Error output coding is controlled by LOCALE environment variables on
1044 UN*X systems. skf doesn't handle situations where stdout and stderr
1045 are redirected into the same stream. Such cases should be handled by the
1046 user.
1047
1048 4. skf converts KEIS/JIS X 0213 code using CJK-extension B area and CJK
1049 compatibility area. For this reason, X 0213 and KEIS convert result
1050 varies depending on --use-compat and --limit-to-ucs2 switches.
1051
1052 5. JIS X 0207:1979 is not supported. JIS X 0211:1987 is designed to be
1053 supported (i.e. common terminal control sequence will be transparently
1054 passed to output).
1055
1056 6. Even if unbuffer option(-u) is specified, some code-translation re‐
1057 lated bufferings are still performed (in MIME, kana, VIQR etc.).
1058
1059 7. skf-1.9x or later recognizes and handles languages in iso639-1(alpha
1060 2). iso639-2 is not supported as a valid language set.
1061
1062 8. Unicode IVS is not supported. Sequences are just discarded.
1063
1064 9. skf-1.9x or later does not retain Macintosh RLO-ordered character
1065 property. Codesets with this kind of codes are not supported.
1066
1067 10. CNS11643 4th, 5th, 6th planes are not supported.
1068
1069 11. In the python 3 extension, the codeset detected by inquiry for input
1070 unicode strings is always UTF-32be.
1071
1072 12. In lightweight language extension except ruby and python,
1073 UCS2/UTF-16 are not supported.
1074
1075
1076
1078 1. Extended options are changed extensively since skf-1.9. Some archaic
1079 options (eg. -B, -@ and -r) have been deleted from this version.
1080
1081 2. skf was originally forked from nkf, but no longer contains any
1082 nkf code. The copyright notice is retained as a courtesy.
1083
1084 3. From version 1.9, default Japanese character set assumed by skf has
1085 changed to JIS X 0208:1990 with Microsoft Japanese Windows gaiji (i.e.
1086 CP932).
1087
1088 4. Code autodetection is not perfect by design. If it has failed to de‐
1089 tect input code properly, please give input code information explic‐
1090 itly.
1091
1092 5. Some ligatures in Unicode, cp932 gaiji and KEIS83 are converted us‐
1093 ing JIS X 0124 and other convention. During this conversion, its byte
1094 length is not preserved.
1095
1096 6. skf is intended to pass ANSI compatible terminal control codes
1097 transparently, but this is not guaranteed.
1098
1099 7. nkf's -i and -o options work only in nkf-compat mode. They are
1100 obsolete as of 1.97, and valid only for iso-2022-jp, without
1101 considering output codeset specifications.
1102
1103 8. For unconverted character, skf uses geta and undefined character as
1104 --use-replace-char option. If output codeset doesn't contain geta
1105 code, skf prefers 'black square character', then uses '.' respectively.
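
The fallback order described in note 8 can be sketched in Python as follows. This models only the preference chain (geta mark, then black square, then '.'); it relies on Python's own codec tables rather than skf's, it assumes the geta mark is U+3013 and the black square is U+25A0, and the function names are hypothetical.

```python
GETA_MARK = "\u3013"     # assumed code point of the geta mark
BLACK_SQUARE = "\u25a0"  # assumed code point of the black square


def can_encode(ch: str, codec: str) -> bool:
    """Return True if the output codec has a code for ch."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def pick_replacement(codec: str) -> str:
    """Choose the first replacement character the output codec can hold."""
    for candidate in (GETA_MARK, BLACK_SQUARE):
        if can_encode(candidate, codec):
            return candidate
    return "."  # last resort, available in every ascii-based codeset


def convert(text: str, codec: str) -> bytes:
    """Encode text, substituting the chosen replacement for unmappable characters."""
    replacement = pick_replacement(codec)
    safe = "".join(c if can_encode(c, codec) else replacement for c in text)
    return safe.encode(codec)
```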
1106
1107 9. There are some undocumented options. These options should be consid‐
1108 ered as highly experimental.
1109
1110 10. In lineend_thru mode with folding, skf remembers the order in which
1111 cr and lf appear in the stream, and uses that order. Because of this
1112 design, if skf needs to output a line-end character before any line-end
1113 character appears in the input stream, input order may not be preserved.
1114
1115 11. NKF-compatibility
1116 1) --prefix, some --fb's and --no-best-fit-chars are not supported.
1117 Error behaviors are not compatible.
1118 2) The -r option and --decode=rot are different. See each option
1119 description.
1120 3) MSDOS (and -T), --exec-in and --exec-out are not supported. -O is
1121 supported.
1122 4) MIME decoding/encoding handling behaviors differ in various ways.
1123 5) lineend conversion acts differently. Results may not be same for
1124 text with multiple lineend characters.
1125 6) detected codeset name is not compatible with nkf. --help and --ver‐
1126 sion return different results.
1127 7) in-place and overwrite suffix with * is not supported.
1128
1129 12. Conversion to NYUUKAN GAIJI is as follows
1130 1) Kanji codes in JIS X0208(1997), JIS X0212(1990), JIS
1131 X0213(2004/2012), Houmusho-kokuji No.582 beppyou No.1 are sent to
1132 output as they are.
1133 2) Kanji codes in beppyou No.4-2 leftmost columns are converted to the
1134 first priority character in the table. If the second priority
1135 characters appear, the codes are sent to output as they are.
1136 3) Other kanji codes are converted as undefined codes. See above
1137 conversion method. Non-kanji codes (latins, glyphs etc.) are sent to
1138 output as they are.
1141
1142 13. ARIB B24 compatibility
1143 1) Input only. ARIB B24 output is not supported.
1144 2) Neither international encoding nor X0213 extension is supported.
1145 3) Macro define sequences are suppressed. These sequences are
1146 recognized and discarded.
1147 4) Without specifying arib codeset, skf treats Arib-defined codepage as
1148 follows.
1149 i) private codepages are supported. ascii/jis x-0201 0x5f is not
1150 modified.
1151 ii) macro define/invoke and rpc invoke do not work. These characters
1152 are discarded.
1155
1156 14. option mnemonic table for -v option
1157 AA: aware ascii-art in code detection; DBG: Debugging feature enabled;
1158 F64: Large file enabled(default); NE: Environment variable handling
1159 disabled; NFJ: suppress fj-newsgroup convention; NLS: Native language
1160 messaging enabled(default); NN: detect skf is called under nkf name;
1161 OMST: Have mkstemp; PEP: Python3 PEP393 support enabled; SG: Slow getc
1162 enabled; SPNC: Space convert disabled; STT: Use Static codeset table;
1163 UFY_A_J: Unify JIS x-0201 to ascii; UID/EUID: Have UID/EUID; ULM: UCS2
1164 generic latin support; WIN32: Windows environment.
1165
1166
1167 15. feature mnemonic table for -v option
1168 98: old-nec-compat (ESC-H/ESC-K) feature enabled; ACE: punycode support
1169 enabled; ARIB: ARIB B24 support enabled; FD: fold feature enabled; KD:
1170 KEIS90 auto-detect enabled; KX: KEIS90 extra region enabled; MIMEREC:
1171 Mime recovery feature enabled; NFD: Unicode decompose enabled; ROT:
1172 rot13/47 support enabled; UK: UTF16 hankaku-kana disabled; UN: UTF16
1173 normalize enabled; ONKF: nkf old -i, -o option enabled; LE_*: lineend
1174 handling.
1175
1176
1178 Unicode(TM) is a trademark of Unicode, Inc. Microsoft and Windows are
1179 registered trademarks of Microsoft Corporation. Macintosh is a
1180 registered trademark of Apple Inc. Vodafone is a trademark of Vodafone K.K.
1181 Other names and terms may be trademarks or registered trademarks of
1182 their respective owners. Trademark symbol (TM) may be omitted in this
1183 manual page.
1184
1185
1186
1187
1188 10/Aug/2018 SKF(1)