SKF(1)                      General Commands Manual                     SKF(1)

NAME

       skf - simple Kanji Filter (v2.1)

SYNOPSIS

       skf [-EIJKNQRSXZbehjknqrsuvxz] [ long_format_options ] [infiles..]

DESCRIPTION

       skf is yet another i18n-capable kanji filter, designed for reading
       various CJK-coded files on the Net.  skf converts input kanji texts or
       streams into a character stream in the designated codeset and writes
       it to standard output.  Specifically, skf is designed to be a
       versatile filter for reading documents in various code sets, and does
       not provide features unrelated to code conversion.

       Like nkf, skf automatically recognizes the input file code when it is
       a kind of ISO-2022 compliant code, and also detects EUC-variant codes
       if the input file is Japanese text without X 0201 kanas.  skf 2.1 can
       read various iso-2022 compliant character sets, including JIS Kanji
       codes (X 0208, X 0212 and X 0213), EUC encodings (euc-jp (with X 0213
       support), euc-cn, euc-kr and euc-tw), ISO European latins (ISO-8859-1
       to 11, 13/14/15/16) and many regional character sets.  skf can also
       read some non-iso2022 compliant sets, including Microsoft Shift-JIS
       code, KOI-8-R/U, GB2312 (HZ), big5, VISCII (rfc1456, including VIQR),
       the Unicode standard (UCS2/UTF-16, UTF7 and UTF8), some MS codesets
       (cp1250 etc.) and some other vendor-specific codes (KEIS83, JEF etc).
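
       The escape-based recognition of ISO-2022 input can be illustrated
       outside skf.  The Python sketch below is not skf's detection code; it
       only uses Python's standard iso2022_jp and euc_jp codecs to show the
       designation escape sequences that make ISO-2022 input
       self-identifying.  The helper name has_iso2022_designation is
       hypothetical.

```python
ESC = b"\x1b"


def has_iso2022_designation(data: bytes) -> bool:
    """Return True if data contains an ISO-2022 designation escape.

    Only two common intermediate bytes are checked: '$' (multi-byte sets
    such as JIS X 0208) and '(' (94-character sets designated to G0).
    """
    return ESC + b"$" in data or ESC + b"(" in data


jis_bytes = "\u6f22\u5b57".encode("iso2022_jp")  # the word "kanji"
euc_bytes = "\u6f22\u5b57".encode("euc_jp")

print(jis_bytes)                           # b'\x1b$B4A;z\x1b(B'
print(has_iso2022_designation(jis_bytes))  # True
print(has_iso2022_designation(euc_bytes))  # False
```

       EUC input carries no such marker, which is why it has to be detected
       from byte patterns instead.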
30
31       Supported output character sets of skf are more limited, but still  in‐
32       clude X 0208/X 0212/X 0213 JIS, X 0201 JIS, ASCII, Microsoft Shift-JIS,
33       EUC-jp/-kr/-cn, HZ, iso-2022-jp/kr, big5, VISCII and Unicode.
34
35       skf also provides some basic decoding features for some  common  encod‐
36       ings including MIME, Punycode and URI codepoint.  Unicode decomposition
37       feature is also supported since 1.96.
38
       As noted above, skf is designed to convert input text into a
       human-readable form under the local environment (i.e. codeset), and
       has several extra conversion features such as GNU recode-type folding.
       Such conversions include Windows/Macintosh-specific code swaps and
       old-new jis glyph changes, html-format/TeX-format conversion and
       variant unifications.
45
46       skf also can be compiled as an extension of some lightweight languages.
47       See README.txt for details.
48
       If one or more file names are given, skf reads the files and writes
       the converted stream to stdout.  If no file names are given, input is
       taken from stdin and output also goes to stdout.  OPTIONS are taken
       from the environment variables SKFENV and skfenv and from the command
       line, in this order.  Environment variables are not used when skf is
       running as a privileged user.  skf does not use LOCALE-related
       environment variables for conversions, but error message output is
       controlled by the given LOCALES.
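
       The option order described above can be sketched as follows.  This is
       an illustration only, not skf's parser: the function name
       collect_options is hypothetical, and shell-like splitting of the
       environment values with shlex is an assumption.  How a later option
       overrides an earlier one is not modeled.

```python
import shlex


def collect_options(environ, argv, privileged=False):
    """Gather options in the documented order: SKFENV, skfenv, command line."""
    options = []
    if not privileged:
        # Environment variables are skipped for a privileged user.
        for name in ("SKFENV", "skfenv"):
            options.extend(shlex.split(environ.get(name, "")))
    options.extend(argv)
    return options


env = {"SKFENV": "--oc=utf8", "skfenv": "-u"}
print(collect_options(env, ["--ic=sjis"]))
# ['--oc=utf8', '-u', '--ic=sjis']
print(collect_options(env, ["--ic=sjis"], privileged=True))
# ['--ic=sjis']
```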
57

CODESET OPTIONS

       skf is written from scratch and inherits no code from nkf.  However,
       skf is intended to be a drop-in replacement for nkf (v1.4) and
       supports a similar set of commonly used nkf options.
       skf 2.1 recognizes the following options.  Options are off by default
       unless stated otherwise.
64
65   buffering control
66       -b     use buffered output. This is default.
67
68       -u     use unbuffered output.  Code detection feature is disabled  when
69              this option is on.
70
71   Input/Output codeset options
       --ic=  input_code_set
              specify input_code_set as the input codeset.  Possible
              candidates are shown below.

       --oc=  output_code_set
              specify output_code_set as the output codeset.  Possible
              candidates are shown below.  The default codeset in the
              distribution package is euc-jp, but it depends on compile
              options.  The default codeset is shown by 'skf -h'.
81
82     Supported codeset
       skf recognizes the following codesets as input/output codesets.
       These codeset names are case insensitive, and minus ('-') and
       underscore ('_') are ignored.  Note that an iso-2022 escape-based
       input codeset (registered with IANA) is recognized automatically, even
       when a non-iso2022 codeset (except Unicode and B-Right/V) is
       specified.  In the in column, o means the named codeset can be
       specified as input and x means it cannot.  The out column has the
       same meaning for output.
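
       The name matching rule (case insensitive, '-' and '_' ignored) can be
       pictured with a small Python sketch.  The function name
       normalize_codeset_name is hypothetical and this is not skf's lookup
       code; it only shows which spellings the rule treats as equal.

```python
def normalize_codeset_name(name: str) -> str:
    """Fold case and drop '-' and '_' so equivalent spellings compare equal."""
    return name.lower().replace("-", "").replace("_", "")


for spelling in ("EUC-JP", "euc_jp", "eucjp", "Euc-Jp"):
    print(spelling, "->", normalize_codeset_name(spelling))
# every spelling maps to "eucjp"
```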
90
91       in out  name            description
92       o  o    iso8859-1       ascii + iso-8859-1 (latin-1)
93       o  o    iso8859-2       ascii + iso-8859-2 (latin-2)
94       o  o    iso8859-3       ascii + iso-8859-3 (latin-3)
95       o  o    iso8859-4       ascii + iso-8859-4 (latin-4)
96       o  o    iso8859-5       ascii + iso-8859-5 (Cyrillic)
97       o  o    iso8859-6       ascii + iso-8859-6 (Arabic)
98       o  o    iso8859-7       ascii + iso-8859-7 (Greek)
99       o  o    iso8859-8       ascii + iso-8859-8 (Hebrew)
100       o  o    iso8859-9       ascii + iso-8859-9 (latin-5)
101       o  o    iso8859-10      ascii + iso-8859-10 (latin-6)
102       o  o    iso8859-11      ascii + iso-8859-11 (Thai)
103       o  o    iso8859-13      ascii + iso-8859-13 (Baltic Rim)
104       o  o    iso8859-14      ascii + iso-8859-14 (Celtic)
105       o  o    iso8859-15      ascii + iso-8859-15 (Latin-9)
106       o  o    iso8859-16      ascii + iso-8859-16
107       o  o    koi-8r          koi-8r (Russian)
       o  o    koi-8u          koi-8u (Ukrainian)
       o  o    cp1251          Cyrillic latin MS cp1251
       o  o    jis             iso-2022-jp (rfc1468 7bit JIS)
111       o  o    iso-2022-jp-x0213 iso-2022-jp-3 (JIS X 0213:2000)
112                               a.k.a. jis-x0213
113       o  o    jis-x0213-strict iso-2022-jp-3-strict
114       o  o    iso-2022-jp-2004 iso-2022-jp-2004(JIS X 0213:2004)
115                               a.k.a. jis-x0213-2004
116       o  o    oldjis          iso-2022-jp-1978(JIS X 0208:1978)
117       o  o    cp50220         Microsoft codepage 50220
118       o  o    cp50221         Microsoft codepage 50221
119       o  o    cp50222         Microsoft codepage 50222
120       o  o    euc-jp          EUC-encoded JIS X 0208:1997
121       o  o    euc-x0213       EUC-encoded JIS X 0213:2000
122       o  o    euc-jis-2004    EUC-encoded JIS X 0213:2004
123       o  o    cp51932         EUC-encoded Microsoft codepage 932
       o  o    euc-kr          EUC-encoded KS X 1001 Korean
       o  o    euc7-kr         7bit EUC-encoded KS X 1001 Korean
       o  o    uhc             Unified Hangul (Windows cp949)
       o  o    johab           KS X 1001-johab Korean
128       o  o    euc-cn          EUC-encoded GB2312 Chinese
129       o  o    euc7-cn         7bit EUC-encoded GB2312 Chinese
130       o  o    hz              HZ-encoded GB2312 Chinese
131       o  o    euc-tw          EUC-encoded CNS 11643 Chinese
132       o  o    gb12345         EUC-encoded GB12345 Chinese
133       o  o    gbk             GB2312 Extension(cp936) Chinese
       o  o    gb18030         GB18030 Chinese
135       o  o    big5            BIG5 (with Eten extension + EURO)
136       o  o    cp950           BIG5 (Microsoft cp950 + EURO)
137       o  o    big5-hkscs      BIG5 with HKSCS
138       o  o    big5-2003       BIG5-2003
139       o  o    big5-uao        BIG5-Unicode at On
140       o  o    sjis            Shift-jis (Microsoft cp943)
141       o  o    shiftjis-x0213  Shiftjis-encoded JIS X 0213:2000
142       o  o    shiftjis-2004   Shiftjis-encoded JIS X 0213:2004
       o  o    sjis-docomo     Shiftjis-encoded with NTT Docomo emoticons.
       o  o    sjis-au         Shiftjis-encoded with AU emoticons.
       o  o    sjis-softbank   Shiftjis-encoded with SoftBank emoticons.
146       o  o    oldsjis         Shift-jis (JIS X 0208:1978)
147       o  o    cp932           Shift-jis-encoded MS cp932
148       o  o    cp932w          Shift-jis-encoded MS cp932 with
149                               MS compatibility
       o  o    viscii          VISCII (rfc1456) Vietnamese
       o  o    viqr            VISCII (rfc1456-VIQR) Vietnamese
152       o  o    keis            Hitachi KEIS83/90
153       o  x    jef             Fujitsu JEF (basic support only)
154       o  x    ibm930          IBM EBCDIC DBCS Japanese
155       o  x    ibm931          IBM EBCDIC DBCS Japanese w.latin
       o  x    ibm933          IBM EBCDIC DBCS Korean
157       o  x    ibm935          IBM EBCDIC DBCS Simpl. Chinese
158       o  x    ibm937          IBM EBCDIC DBCS Trad. Chinese
159       o  o    unicode         Unicode(TM) UTF-16LE
160       o  o    unicodefffe     Unicode(TM) UTF-16BE
161       o  o    utf7            Unicode(TM) UTF-7
162       o  o    utf8            Unicode(TM) UTF-8
163       o  o    utf8-bom        Unicode(TM) UTF-8 with BOM
164       o  o    utf7-imap       IMAP modified Unicode(TM) UTF-7 (RFC2060)
165       o  o    mutf8           Java modified Unicode(TM) UTF-8
166       o  o    cesu8           CESU-8 (Unicode Technical Report #26)
       x  o    nyukan-utf-8    Nyukan-moji (Japanese nyukoku-kanrikyoku
                               gaiji), UTF-8 encoded
       x  o    nyukan-utf-16   Nyukan-moji (Japanese nyukoku-kanrikyoku
                               gaiji), UTF-16 encoded
169       o  x    arib-b24        ARIB B24 8-bit JIS-based
170       o  x    arib-b24-sj     ARIB B24 8-bit SJIS-based
171       x  o    transparent     Transparent mode (see below)
172       o  x    x-iscii-de      India ISCII-91(IS13194:1991)
       o  x    asmiscii-8      Armenian ArmSCII-8
       o  x    geostd8         Georgian Geostd 8
       o  x    mik             Bulgarian MIK
       o  x    tscii           Tamil TSCII 1.7
       o  o    locale          codeset specified in locale. See below.
178
179
180     Codeset explanations
181       iso-8859-*
182              When specified as output, G0 = GL  is  ascii  and  G1  =  GR  is
183              iso-8859-*. 8bit encoding is used.
184
185       iso-2022-jp, jis
              Encoding is iso-2022-jp-2 (RFC1554).  G0 = GL is JIS X 0201
              roman, G1 = GR is JIS X 0201 kana, G2 is iso-8859-1 and G3 is
              JIS X 0212:1990 Supplementary Kanji.
189
190       jis-x0213, iso-2022-jp-3
              Encoding is iso-2022-jp-3 (JIS X 0213:2000 based).  For output,
              G0 = GL is JIS X 0201 roman, G1 = GR is JIS X 0201 kana, G2 is
              iso-8859-1 and G3 is JIS X 0213 plane2 Kanji.
194
195       jis-x0213-strict
              Encoding is a subset of iso-2022-jp-3-strict (uses Plane 1
              only).  For output, G0 = GL is JIS X 0201 roman, G1 = GR is JIS
              X 0201 kana, G2 is iso-8859-1 and G3 is not set.  Output uses
              JIS X 0208 whenever possible.  JIS X 0213 input is
              automatically recognized.
201
202       jis-x0213-2004, iso-2022-jp-2004
              Encoding is iso-2022-jp-2004.  For output, G0 = GL is JIS X
              0201 roman, G1 = GR is JIS X 0201 kana, G2 is iso-8859-1 and G3
              is JIS X 0213 plane2 Kanji.
206
207       oldjis
              Encoding is iso-2022-jp using the old JIS X 0208:1978.  G0 = GL
              is JIS X 0201 roman, G1 = GR is JIS X 0201 kana, G2 is
              iso-8859-1 and G3 is JIS X 0212 Supplementary Kanji.
211
212       euc-jp, euc
213              Encoding is 8-bit EUC using JIS X 0208:1997 character set.  G0 =
214              GL is ascii, G1 = GR is JIS X 0208, G2 is JIS X 0201 kana and G3
215              is JIS X 0212 Supplementary Kanji.
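
       The G0 to G3 layout of euc-jp shows up directly in the bytes.  The
       sketch below uses Python's standard euc_jp codec (not skf) and a
       hypothetical helper, euc_jp_plane, to classify an encoded character
       by its lead byte: plain 7-bit bytes are G0, the single-shift bytes
       0x8E and 0x8F select G2 and G3, and any other 8-bit lead byte is G1.

```python
SS2 = 0x8E  # single shift 2: the next character comes from G2
SS3 = 0x8F  # single shift 3: the next character comes from G3


def euc_jp_plane(encoded: bytes) -> str:
    """Classify one EUC-JP encoded character by the G set it comes from."""
    first = encoded[0]
    if first < 0x80:
        return "G0"
    if first == SS2:
        return "G2"
    if first == SS3:
        return "G3"
    return "G1"


# ASCII letter, a JIS X 0208 kanji, and a JIS X 0201 (half-width) kana.
for ch in ("A", "\u6f22", "\uff71"):
    encoded = ch.encode("euc_jp")
    print(f"U+{ord(ch):04X}", encoded.hex(), euc_jp_plane(encoded))
# U+0041 41 G0
# U+6F22 b4c1 G1
# U+FF71 8eb1 G2
```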
216
217       euc-x0213, euc-jis-2003
218              Encoding  is 8-bit EUC-based JIS X 0213:2000.  G0 = GL is ascii,
219              G1 = GR is X 0213:2000 plane 1, G2 is iso-8859-1 and G3 is JIS X
220              0213:2000 plane2 Kanji.
221
222       euc-jis-2004
223              Encoding  is  8-bit EUC-based JIS X0213:2004.  G0 = GL is ascii,
224              G1 = GR is X0213:2004 plane 1, G2 is iso-8859-1 and  G3  is  JIS
225              x0213:2004 plane2 Kanji.
226
227       euc-kr
              Encoding is 8-bit EUC using KS X 1001 Wansung character set.
              G0 = GL is KS X1003, G1 = GR is KS X1001, G2 and G3 are not
              set.

       euc7-kr iso-2022-kr
              Encoding is iso-2022-kr (rfc1557): 7-bit EUC using KS X 1001
              Wansung character set.  G0 = GL is KS X1003, G1 is KS X1001, G2
              and G3 are not set.

       euc-cn
              Encoding is 8-bit EUC using GB 2312 simplified chinese
              character set.  G0 = GL is ASCII, G1 = GR is GB2312, G2 and G3
              are not set.

       euc7-cn
              Encoding is 7-bit EUC using GB 2312 simplified chinese
              character set.  G0 = GL is ASCII, G1 is GB2312, G2 and G3 are
              not set.
243
244       hz
              Encoding is HZ encoded (rfc1842) GB 2312 simplified chinese
              character set.  G0 = GL is ASCII, G1 = GR is GB2312, G2 and G3
              are not set.
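
       As a rough illustration of what HZ-encoded text looks like, the
       sketch below uses Python's standard hz codec rather than skf.  GB 2312
       characters are wrapped between "~{" and "~}" and sent as 7-bit byte
       pairs, so the whole stream stays within ASCII.

```python
text = "\u6c49\u5b57 ok"  # two GB 2312 characters followed by ASCII
hz_bytes = text.encode("hz")

print(hz_bytes)                          # b'~{::WV~} ok'
print(all(b < 0x80 for b in hz_bytes))   # True: every byte is 7-bit
print(hz_bytes.decode("hz") == text)     # True: the round trip is lossless
```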
248
249       euc-tw
              Encoding is EUC encoded CNS11643 Plane1/2 traditional chinese
              character set.  Subset of iso-2022-cn.  G0 = GL is ASCII, G1 =
              GR is CNS11643 plane 1, G2 is CNS11643 plane 2 and G3 is not
              set.

       gb12345
              Encoding is 8-bit EUC using GB 12345 (GBF) traditional chinese
              character set.  G0 = GL is ASCII, G1 = GR is GB12345, G2 and G3
              are not set.

       gbk, cp936
              Encoding is GBK simplified chinese character set.  G0 = GL is
              ASCII and G1 = GR is GBK.  G2 and G3 are not set.
262
263       gb18030 (experimental)
264              Encoding  is GB18030 (ibm-1392, Windows cp54936) chinese charac‐
265              ter set.  Uses ASCII as latin part.
266
267       big5
              Encoding is Big5 traditional chinese character set with ETen
              extension.  Includes Euro mapping.  Uses ASCII as latin part.
270
271       cp950
272              Encoding  is  Microsoft cp950-Big5 traditional chinese character
273              set.  Uses ASCII as latin part.
274
275       big5-hkscs (experimental)
276              Encoding is cp950-Big5 traditional chinese  character  set  with
277              HKSCS extension.  Uses ASCII as latin part.
278
279       big5-2003 (experimental)
280              Encoding  is  Big5-2003  Taiwanese  standard traditional chinese
281              character set.  Uses ASCII as latin part.
282
283       big5-uao (experimental)
284              Encoding is Big5-UAO (http://uao.cpatch.org) traditional chinese
285              character set.  Uses ASCII as latin part.
286
287       VISCII (experimental)
              Vietnamese VISCII (rfc1456) character set. Not TCVN-5712.

       VIQR (experimental)
              Vietnamese VISCII character set with VIQR encoding (rfc1456).
292
293       sjis
294              Encoding  is  Shift-encoded JIS X 0208:1997 character set.  Note
295              that this is not cp932. Uses JIS X 0201 latin as latin(GL) part.
296
297       sjis-x0213, shift_jis-2000
298              Encoding is Shift-encoded JIS using JIS  X  0213:2000  character
299              set.
300
301       sjis-x0213-2004, shift_jis-2004
              Encoding is Shift-encoded JIS using the JIS X 0213:2004
              character set.  Ten newly defined characters are added, but the
              Unicode mapping is the same as JIS X 0213:2000.  Uses JIS X
              0201 latin as the latin(GL) part.
306
307       sjis-cellular (experimental)
308              Encoding is Shift-encoded JIS X 0208:1997 character set with NTT
309              Docomo/Vodafone(SoftBank)  cellular phone glyph mapping.  Output
310              is not supported.
311
312       cp932 cp932w
              Encoding is Microsoft SJIS cp932 with NEC/IBM gaiji area, based
              on the Windows XP mapping.  Uses ASCII as the latin(GL) part.
              --use-compat and --use-ms-compat are automatically enabled.
              cp932w provides further WideCharToMultiByte compatibility.
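
       The difference between sjis and cp932 is mostly a matter of how a
       few symbols map to Unicode.  The sketch below uses Python's standard
       shift_jis and cp932 codecs as stand-ins; skf's own tables may differ
       in detail.  It decodes the same two bytes under both mappings.

```python
raw = b"\x81\x60"  # one double-byte symbol in the Shift-JIS symbol area

as_sjis = raw.decode("shift_jis")
as_cp932 = raw.decode("cp932")

print(f"shift_jis -> U+{ord(as_sjis):04X}")   # U+301C (WAVE DASH)
print(f"cp932     -> U+{ord(as_cp932):04X}")  # U+FF5E (FULLWIDTH TILDE)
```

       A round trip through Unicode only returns the original bytes when the
       same mapping is used in both directions.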
317
318       cp51932
              Encoding is Microsoft EUC-based cp51932 with NEC/IBM gaiji
              area, based on the Windows XP mapping.  Uses ASCII as G0 and
              JIS X 0201 kana as the EUC G2 part.  G3 is not used for output,
              and is JIS X 0212:2000 for input.  --use-compat and
              --use-ms-compat are automatically enabled.
324
325       cp50220, cp50221, cp50222
              Encoding is Microsoft JIS-based cp50220, cp50221 or cp50222
              with NEC/IBM gaiji area, based on the Windows XP mapping.  For
              input, skf accepts cp50220, 50221 and 50222.  Note that this
              codeset is NOT compatible with iso-2022.  Uses ASCII as the
              default character set.  --use-compat and --use-ms-compat are
              automatically enabled.
331
332       oldsjis
333              Encoding  is  Microsoft  SJIS  (JIS X 0208:1978 a.k.a. old JIS).
334              Uses JIS X 0201 latin as latin(GL) part.
335
336       johab
337              Encoding is KS X1001(Johab) character set. Uses KS  X1003  latin
338              as latin(GL) part.
339
340       uhc
341              Encoding  is  UHC (cp949) character set. Uses ASCII as latin(GL)
342              part.
343
344       unicode, unicodefffe, utf16, utf16le
              Encoding is Unicode UTF-16 (v15.0).  The default input/output
              byte order is little endian for unicode and utf16le, and big
              endian for unicodefffe and utf16; an input byte order mark is
              recognized.  Output includes an endian mark by default unless
              --disable-endian-mark is specified.  The output range is
              within UTF-32 with surrogate pairs unless --limit-to-ucs2 is
              specified.
              Note that ucs2 is not supported within the lightweight language
              extension for either input or output, because of a limitation
              in the data structures SWIG passes.  Specifying ucs2 generates
              an error.
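
       Byte order and the endian mark can be seen with a few lines of
       Python.  The sketch uses only the standard codecs module and is not
       skf code; detect_utf16_order is a hypothetical helper that mimics the
       idea of recognizing an input byte order mark and otherwise falling
       back to a default.

```python
import codecs

little = "A".encode("utf-16-le")
big = "A".encode("utf-16-be")
print(little, big)  # b'A\x00' b'\x00A'


def detect_utf16_order(data: bytes, default: str = "little") -> str:
    """Return the byte order named by a leading BOM, else the default."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    return default


print(detect_utf16_order(codecs.BOM_UTF16_BE + big))  # big
print(detect_utf16_order(little))                      # little (default)
```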
355
356       utf8
357              Encoding is UTF-8 encoded Unicode (v15.0).  Output  doesn't  in‐
358              clude  byte order mark unless --enable-endian-mark is specified.
359              Output range is within UTF-32 unless --limit-to-ucs2  is  speci‐
360              fied.  By default, CESU-8 is not accepted as input. Option --en‐
361              able-cesu8 enables CESU-8 input for utf-8 converter. CESU-8 out‐
362              put  is  not  supported.  For UTF-8, endian mark (BOM) is always
363              ignored.
364
365       utf7
366              Encoding is UTF-7 encoded Unicode (v15.0). Input/output range is
367              limited  to UTF-16, and value above U+10000 is regarded as unde‐
368              fined.  BOM is always ignored for input, and never used for out‐
369              put.
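
       For readers unfamiliar with UTF-7, the following sketch (Python's
       standard utf-7 codec, not skf) shows how a non-ASCII character is
       carried as a base64 run between '+' and '-', keeping the stream
       7-bit clean.

```python
encoded = "\u3042".encode("utf-7")  # HIRAGANA LETTER A

print(encoded)                          # b'+MEI-'
print(all(b < 0x80 for b in encoded))   # True
print(encoded.decode("utf-7") == "\u3042")  # True
```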
370
371       utf7-imap
372              Modified  utf-7  for IMAP protocol described in RFC2060.  BOM is
373              always ignored for input, and never used for output.
374
375       mutf8
              Modified utf-8 for the Java language: CESU-8 plus a special
              encoding of U+0000.  BOM is always ignored for input, and never
              used for output.
378
379       cesu-8
380              Modified  utf-8  described in unicode technical report #26.  BOM
381              is always ignored for input, and never used for output.
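
       How CESU-8 and Java's modified utf-8 differ from plain UTF-8 can be
       shown with a short Python sketch.  This is an independent
       illustration, not skf's converter; the function cesu8_encode and its
       java_modified flag are hypothetical names.  Characters above U+FFFF
       are split into a surrogate pair and each half is written as a 3-byte
       sequence; the Java variant additionally writes U+0000 as C0 80.

```python
def cesu8_encode(text: str, java_modified: bool = False) -> bytes:
    """Encode text as CESU-8, or as Java modified UTF-8 when requested."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0 and java_modified:
            out += b"\xc0\x80"  # two-byte form of U+0000
        elif cp > 0xFFFF:
            # Split into a UTF-16 surrogate pair, then encode each half.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


sample = "\U0001F600"
print(sample.encode("utf-8").hex())   # f09f9880
print(cesu8_encode(sample).hex())     # eda0bdedb880
print(cesu8_encode("\x00", java_modified=True).hex())  # c080
```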
382
383       keis (experimental)
384              Encoding is Hitachi KEIS83/90. Output range is limited to EBCDIK
385              and JIS X 0208 area.
386
387       jef (experimental)
388              Encoding  is  Fujitsu  JEF.  Input only. Only basic part is sup‐
389              ported.
390
391       ibm930 (experimental)
392              Encoding is IBM DBCS Japanese with EBCDIC Kana
393
394       ibm931 (experimental)
395              Encoding is IBM DBCS Japanese with EBCDIC latin (ibm037)
396
397       ibm933 (experimental)
              Encoding is IBM DBCS Korean with EBCDIC Wansung character set
399
400       ibm935 (experimental)
401              Encoding is IBM DBCS Simplified Chinese with EBCDIC Chinese
402
403       ibm937 (experimental)
404              Encoding is IBM DBCS Traditional Chinese with EBCDIC Chinese
405
406       koi8r
407              Russian KOI-8R code.
408
409       cp1250
              Central European latin Microsoft cp1250 code

       cp1251
              Eastern European cyrillic Microsoft cp1251 code
414
415       arib-b24 arib-b24-sj
              ARIB B24 code defined in ARIB STD-B24 vol.1 part 2 chapter 7.3.
              b24 is 8-bit jis based, and b24-sj is sjis based.
418
419       nyukan-utf-8 nyukan-utf-16
              Normalized Unicode UTF-8/UTF-16 based on Japanese Ministry of
              Justice kokuji (notification) No. 582.
422
423       locale
424              Use locale-specified codeset. Since locale only provides partial
425              information as codeset, whether this option works as expected or
426              not depends on environmental settings.
427
428       transparent
              Transparent mode.  Various code control features, including
              folding and line end code conversion, are also ignored.
431
432
433     Shortcuts
434       -j     same as --oc=jis
435
436       -s     same as --oc=sjis
437
438       -e     same as --oc=euc-jp
439
440       -q     same as --oc=unicode
441
       -z     same as --oc=utf8
443
444       -E     same as --ic=euc-jp. Assume input codeset is EUC-JP.
445
446       -J     same as --ic=jis. Assume input codeset is iso-2022-jp.
447
448       -S     same as --ic=sjis. Assume input codeset is shift JIS
449
450       -Q     same as --ic=utf-16 --input-little-endian.
451
452       -Z     same as --ic=utf8.
453
454
455     ISO-2022 Specific controls
       These options replace G0 to G3 with the assigned character set after
       the planes have been set up according to the specified input codeset.
       Note that this doesn't change any properties of the original codeset,
       such as language and encoding.
460
461       --set-g0=`charset name'
462              Predefines specified code set to plane 0 (G0). Also set to GL at
463              initial state.
464
465       --set-g1=`charset name'
466              Predefines  specified  code set to right plane (G1). Also set to
467              GR at initial state.
468
469       --set-g2=`charset name'
470              Predefines specified code set to right plane (G2).
471
472       --set-g3=`charset name'
473              Predefines specified code set to right plane (G3).
474
475
       Supported charset names are as follows.  'o' means the charset can be
       set to the plane; 'x' means it cannot.  For unicode family codesets,
       this option is ignored.  For other non-iso2022 categories, this option
       is not supported, and the result is unpredictable.
480
481
482       g0 g1 g2 g3    codeset name   description
483       o  o  o  o     ascii          ANSI X3.4 ASCII
484       o  o  o  o     x0201          JIS X 0201 (latin part)
485       x  o  o  o     iso8859-1      ISO 8859-1 latin
486       x  o  o  o     iso8859-2      ISO 8859-2 latin
487       x  o  o  o     iso8859-3      ISO 8859-3 latin
488       x  o  o  o     iso8859-4      ISO 8859-4 latin
489       x  o  o  o     iso8859-5      ISO 8859-5 Cyrillic
490       x  o  o  o     iso8859-6      ISO 8859-6 Arabic
491       x  o  o  o     iso8859-7      ISO 8859-7 Greek-latin
492       x  o  o  o     iso8859-8      ISO 8859-8 Hebrew
493       x  o  o  o     iso8859-9      ISO 8859-9 latin
494       x  o  o  o     iso8859-10     ISO 8859-10 latin
495       x  o  o  o     iso8859-11     ISO 8859-11 Thai
496       x  o  o  o     iso8859-13     ISO 8859-13 latin
497       x  o  o  o     iso8859-14     ISO 8859-14 latin
498       x  o  o  o     iso8859-15     ISO 8859-15 latin
499       x  o  o  o     iso8859-16     ISO 8859-16 latin
500       x  o  o  o     tcvn5712       TCVN 5712 (Vietnamese)
501       x  o  o  o     ecma94         ECMA 94 Cyrillic (KOI-8e)
502       o  o  o  o     x0212          JIS X 0212:1990
503       o  o  o  o     x0208          JIS X 0208:1997
504       o  o  o  o     x0213          JIS X 0213 Plane 1:2000
505       o  o  o  o     x0213-2        JIS X 0213 Plane 2:2000
506       o  o  o  o     x0213n         JIS X 0213 Plane 1:2004
507       o  o  o  o     gb2312         Simplified Chinese GB2312
508       o  o  o  o     gb1988         Chinese GB1988(latin)
509       o  o  o  o     gb12345        Traditional Chinese GB12345
       o  o  o  o     ksx1003        Korean KS X 1003(latin)
       o  o  o  o     ksx1001        Korean KS X 1001
       x  o  o  o     koi8-r         Cyrillic KOI-8R
       x  o  o  o     koi8-u         Ukrainian Cyrillic KOI-8U
       o  o  o  o     cns11643-1     Traditional Chinese CNS11643-1
       x  o  o  o     viscii-r       RFC1456 VISCII (right plane)
       o  o  o  o     viscii-l       RFC1456 VISCII (left plane)
517       x  o  o  o     cp437          Microsoft cp437 (US latin)
518       x  o  o  o     cp737          Microsoft cp737
519       x  o  o  o     cp775          Microsoft cp775
520       x  o  o  o     cp850          Microsoft cp850
521       x  o  o  o     cp852          Microsoft cp852
522       x  o  o  o     cp855          Microsoft cp855
523       x  o  o  o     cp857          Microsoft cp857
524       x  o  o  o     cp860          Microsoft cp860
525       x  o  o  o     cp861          Microsoft cp861
526       x  o  o  o     cp862          Microsoft cp862
527       x  o  o  o     cp863          Microsoft cp863
528       x  o  o  o     cp864          Microsoft cp864
529       x  o  o  o     cp865          Microsoft cp865
530       x  o  o  o     cp866          Microsoft cp866
531       x  o  o  o     cp869          Microsoft cp869
532       x  o  o  o     cp874          Microsoft cp874
533       x  o  o  o     cp932          Microsoft cp932 (Japanese)
       x  o  o  o     cp1250         Microsoft cp1250 (Central Europe)
535       x  o  o  o     cp1251         Microsoft cp1251 (Cyrillic)
536       x  o  o  o     cp1252         Microsoft cp1252 (Latin-1)
537       x  o  o  o     cp1253         Microsoft cp1253 (Greek)
538       x  o  o  o     cp1254         Microsoft cp1254 (Turkish)
539       x  o  o  o     cp1255         Microsoft cp1255
540       x  o  o  o     cp1256         Microsoft cp1256
541       x  o  o  o     cp1257         Microsoft cp1257
542       x  o  o  o     cp1258         Microsoft cp1258
543
544       --euc-protect-g1
545              In  EUC  input  mode, suppress sequences to set a charset to G1.
546              Such sequences are discarded.
547
548       --add-annon
549              Add announcer for JIS X 0208:1997 to X 0208 designate  sequence.
550              This option works only with iso-2022-based output.
551
552       --input-detect-jis78
553              Distinguish JIS X 0208:1978 codeset and JIS X 0208:1997 codeset.
554              By default, these two charsets are regarded as X 0208:1997. This
555              option is valid only when input encoding is JIS (iso-2022-jp).
556
557
558     JIS X 0212(Supplement Kanji code) Support
559       --x0212-enable
560              skf  by default does not output JIS X 0212 code in JIS/EUC mode.
561              This option enables use of JIS X 0212 part.  Non-Japanese  code,
562              Shift_JIS  variants,  Unicode or KEIS output ignore this option.
563              Note that this option is supported for  backward  compatibility.
564              It may not be supported in future versions.
565
566
567     Unicode coding specific control options
       skf-2.10 conforms to the Unicode 11.0 specification.
569
570       --use-compat --suppress-compat
              With --suppress-compat, skf substitutes characters in the
              Unicode compatibility planes (U+F900 - U+FFFD) with appropriate
              characters in non-compatibility planes.  When this substitution
              is enabled, these characters are converted to variants or
              treated as undefined.  With --use-compat, skf outputs
              characters in this area as they are.  The default is
              --use-compat.  Several codesets control this as a codeset
              feature (i.e. use compatibility planes).  See the codeset
              section.
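
       The kind of substitution --suppress-compat performs can be pictured
       with Unicode normalization.  This Python sketch uses the standard
       unicodedata module as a stand-in; skf's own substitution tables are
       not shown here and may give different results for some characters.
       The helper in_compat_area is hypothetical and simply tests the range
       quoted above.

```python
import unicodedata


def in_compat_area(ch: str) -> bool:
    """True if ch lies in the U+F900 - U+FFFD compatibility area."""
    return 0xF900 <= ord(ch) <= 0xFFFD


compat = "\uf900"  # a CJK compatibility ideograph
unified = unicodedata.normalize("NFC", compat)

print(in_compat_area(compat), in_compat_area(unified))  # True False
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")   # U+F900 -> U+8C48
```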
578
579       --use-ms-compat
              When output is Unicode, make the Unicode mapping Microsoft
              Windows compatible.  This only changes the conversion of some
              symbols in JIS Kanji, and adding the --use-compat option is
              recommended for roundtrip conversion.  If you need stricter
              compatibility, try cp932w as the input codeset.
585
586       --use-cde-compat
587              When  output  is  Unicode, make translation CDE standard codeset
588              compatible.
589
590       --little-endian
591              When output is UTF-16le/be, use little endian byte-order.
592
593       --big-endian
594              When output is UTF-16le/be, use big endian byte-order.
595
596       --disable-endian-mark --enable-endian-mark
597              When output is UTF-16 or UTF-8, do not use/use byte order  mark‐
598              ing.  To  make UTF-16N, use this option with --little-endian. By
599              default, BOM is enabled for UTF-16 and disabled for UTF-8.
600
601       --input-little-endian
602              When input is UTF-16le/be, assume input is little  endian  byte-
603              ordered.
604
605       --input-big-endian
606              When  input  is UTF-16le/be, assume input is big endian byte-or‐
607              dered.
608
609       --endian-protect
610              Do not use endian mark in input stream. Endian mark is just dis‐
611              carded.  This is off by default.
612
613       --limit-to-ucs2
              Do not use code points at or above U+10000 in Unicode output
              (i.e. limit codes to the BMP area).  This option doesn't limit
              the internal code range in skf.  This is off by default.
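
       What falls outside the BMP, and why it needs a surrogate pair in
       UTF-16, is sketched below in Python.  The helpers utf16_units and
       fits_in_ucs2 are hypothetical names used only for this illustration
       and are not part of skf.

```python
def utf16_units(ch):
    """Return the UTF-16 code units of ch as a list of integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_in_ucs2(text):
    """True if every character of text lies in the BMP (U+0000 - U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print([hex(u) for u in utf16_units("\u3042")])      # ['0x3042']
print([hex(u) for u in utf16_units("\U0001F600")])  # ['0xd83d', '0xde00']
print(fits_in_ucs2("\u3042"), fits_in_ucs2("\U0001F600"))  # True False
```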
617
618       --disable-cjk-extension
619              Treat  CJK  extension  A/B areas as undefined. This is off (i.e.
620              these areas are enabled) by default.
621
622       --enable-cesu8
623              Enable CESU-8 input in utf-8  codeset.  Ignored  for  any  other
624              codesets.
625
626       --non-strict-utf8
              Enable broken (decodable but not spec-conforming) utf-8 input.
              If you need this option, proceed with extra care.
629
630       --enable-nfd-decomposition --disable-nfd-decomposition
631              Enable/Disable Unicode Normalized decomposition. Default is dis‐
632              abled.
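
       The effect of NFD decomposition on Japanese text can be seen with
       Python's standard unicodedata module.  This is only an illustration
       of the Unicode normalization form itself; it does not call skf, and
       the Apple-compatible variant is not covered.

```python
import unicodedata

composed = "\u304c"  # HIRAGANA LETTER GA as a single code point
decomposed = unicodedata.normalize("NFD", composed)

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+304B', 'U+3099']
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

       The decomposed form is a base kana followed by a combining voiced
       sound mark, which is why string lengths and byte counts change.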
633
634       --enable-nfda-decomposition --disable-nfda-decomposition
635              Enable/Disable  Apple-compatible  Unicode  Normalized decomposi‐
636              tion.  Default is disabled.
637
638       --oldcell-to-emoticon
639              Convert old cell-phone gaiji area in Unicode  PUA  to  emoticon.
640              Supported:  NTT  Docomo/AU  emoticons.  A reverse mapping is not
641              supported.
642
643       --fix-ms-radical-bug
              mscvrt on Windows Vista or later has an infamous bug that
              converts some Kanji to Kanji radicals.  This option converts
              the radical area back to the appropriate Kanji.  This option is
              valid for Unicode output.
648
649
650
651     Miscellaneous codeset related options
652       --old-nec-compat
653              Enable  old  NEC  kanji sequence (ESC-K,H). Needs compile option
654              --enable-oldnec at configuration.
655
656       --no-utf7
657              Assume input codeset is *NOT* UTF-7 encoded Unicode.   This  op‐
658              tion disables input utf7 testing.
659
660       --no-kana
661              Assume input codeset does *NOT* include JIS X 0201 kana.
662
663       --input-limit-to-jp
664              Tell  detection  mechanism  that  input is some kind of Japanese
665              codeset.
666
667
668   OUTPUT Conversions options
669       skf is intended to output a stream to stdout, but an nkf-compatible
670       file-encoding change option is also provided.
671
672       --overwrite[=SUFFIX] --in-place[=SUFFIX]
673              Convert the encoding of the file(s) specified as input.  --overwrite
674              preserves the file change date. If a SUFFIX parameter is given, the
675              input file is backed up under its name with SUFFIX appended.
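The in-place behavior can be pictured with the following Python sketch. It is an assumption-laden illustration, not how skf is implemented: the function convert_in_place and its parameters are hypothetical, and the backup naming follows the SUFFIX description in this section.

```python
import os
import shutil
import tempfile


def convert_in_place(path, src_enc, dst_enc, suffix=None, keep_mtime=False):
    """Re-encode a file in place (hypothetical helper, illustration only)."""
    stat = os.stat(path)
    # newline="" keeps the original end-of-line bytes untouched.
    with open(path, "r", encoding=src_enc, newline="") as f:
        text = f.read()
    if suffix:
        # Keep a copy of the original under "<name><suffix>".
        shutil.copy2(path, path + suffix)
    # Write to a temporary file in the same directory, then swap it in,
    # so a failed conversion never leaves a half-written file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w", encoding=dst_enc, newline="") as f:
        f.write(text)
    os.replace(tmp, path)
    if keep_mtime:
        # Restore the original timestamps on the converted file.
        os.utime(path, (stat.st_atime, stat.st_mtime))
```

For example, convert_in_place("memo.txt", "euc_jp", "utf-8", suffix=".bak") would leave the original bytes in memo.txt.bak.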
676
677       skf has various features to adapt output files to the local environment.
678       Most of these are controlled by the extended control switches described
679       in this section.
680
681       --use-g0-ascii
682              set  G0(=GL) for output encoding to ASCII, ignoring codeset des‐
683              ignation.
684
685     X-0201 Kana/latin conversions
686       skf by default converts X-0201 kanas to X-0208 kanas. To output  X-0201
687       kana  as it is, use one of following options. When output is designated
688       to EUC or SJIS, these three options enable X-0201 kana output  by  ways
689       provided  by  each encoding. When Unicode output is specified, (equiv.)
690       kana part output is controlled by --use-compat, not following switches.
691       Valid only when output codeset is NOT Unicode family.
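In Unicode terms, the default X-0201 to X-0208 kana conversion resembles mapping halfwidth katakana to their fullwidth forms. The Python sketch below approximates this with NFKC normalization. This is an analogy and not skf's conversion table; NFKC also rewrites other compatibility characters.

```python
import unicodedata

# Halfwidth KA + halfwidth voiced sound mark + halfwidth NA.
halfwidth = "\uff76\uff9e\uff85"
fullwidth = unicodedata.normalize("NFKC", halfwidth)

# The voiced mark is merged into the preceding kana, so three halfwidth
# characters become two fullwidth ones (GA, NA).
print([hex(ord(c)) for c in fullwidth])  # ['0x30ac', '0x30ca']
```

The change in character count is the reason byte length is not preserved across this kind of conversion.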
692
693       --kana-jis7
694              use SI/SO locking shift sequence to designate X-0201 kana.  This
695              switch is valid for jis, jis-x0213 and  cp50220  (i.e.  cp50221)
696              encoding.  For other codesets, this option is ignored.
697
698       --kana-jis8
699              output X-0201 kana using 8-bit code right plane.  This switch is
700              valid for jis and jis-x0213 encoding.  For other  codeset,  this
701              option is ignored.
702
703       --kana-esci --kana-call
704              use  ESC-(-I to designate X-0201 kana.  This switch is valid for
705              jis, jis-x0213 and cp50220 (i.e. cp50222) encoding.   For  other
706              codeset, this option is ignored.
707
708       --kana-enable
709              If  output  is  EUC-JP  or cp51932, use X-0201 kana with G2.  If
710              SJIS output, it is same as --kana-jis8.  When JIS output, it  is
711              same as --kana-call.
712
713       --use-iso8859-1
714              Enable iso-8859-1 output. Iso-8859-1 is invoked to G1 and set to
715              GR plane.
716
717
718     URI/TeX format conversion feature options
719       With Unicode(tm) family output codings, skf  outputs  non-ascii  latin
720       characters  as  they  are, but with other output codings, skf converts
721       these characters using the following rules:
722
723       (1) If a code is defined in a specified output codeset, specified  code
724       point is used for output.
725       (2)  If  one  of  following html convert modes are enabled (i.e. --con‐
726       vert-html --convert-sgml) and the code is defined in html/sgml codeset,
727       it is converted to entity-reference or codepoint reference.
728       (3)  If tex convert mode enabled and the code is defined in tex expres‐
729       sion, it is converted to tex format.
730       (4) If the code is a kind of combined ligatures, it is shown by  a  set
731       of characters.
732       (5) A kind of replacement character is shown, with warning.
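The fallback order above can be sketched in Python as follows. Only rules (1), (2) and (5) are modeled; the TeX and ligature steps are omitted. The function render, its parameters, and the "?" placeholder are assumptions for illustration, not skf behavior.

```python
from html.entities import codepoint2name


def render(ch, codec, html_mode=False):
    """Return the output form of one character (hypothetical helper)."""
    # Rule (1): the character exists in the output codeset, use it as is.
    try:
        ch.encode(codec)
        return ch
    except UnicodeEncodeError:
        pass
    # Rule (2): in html convert mode, prefer a named entity reference and
    # fall back to a decimal code point reference.
    if html_mode:
        name = codepoint2name.get(ord(ch))
        return f"&{name};" if name else f"&#{ord(ch)};"
    # Rule (5): nothing else applies, emit a replacement character.
    return "?"


print(render("\u00e9", "latin-1"))      # é
print(render("\u00e9", "ascii", True))  # &eacute;
print(render("\u4e2d", "ascii", True))  # &#20013;
print(render("\u00e9", "ascii"))        # ?
```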
733
734       --convert-html --convert-sgml --convert-xml
735              Enable html convert mode. This mode is cleared by --reset. These
736              options are synonyms, and are treated as the same option.
737
738       --convert-html-decimal
739              Enable html  code-point  decimal  convert  mode.  This  mode  is
740              cleared by --reset.
741
742       --convert-html-hexadecimal
743              Enable  html  code-point  hexadecimal convert mode. This mode is
744              cleared by --reset.
745
746       --convert-tex
747              Enable TeX convert mode. This mode is cleared by --reset.
748
749       --convert-perl
750              Enable Perl5 literal convert mode. This mode is cleared by --re‐
751              set.
752
753       --convert-java
754              Enable  Java literal convert mode. This mode is cleared by --re‐
755              set.
756
757       --convert-python
758              Enable Python literal convert mode.  This  mode  is  cleared  by
759              --reset.
760
761       --use-replace-char
762              In Unicode, use the unicode replacement character (U+fffc)  for
763              undefined characters.
764
765
766 Extended Options
767   Encoding/Decoding control options
768       --decode=`encoding scheme'
769
770       --encode=`encoding scheme'
771              Specify a decoding/encoding scheme for input stream.  Supported
772              encoding  schemes  for  decoding  are  'hex',  'mime', 'mime_q',
773              'mime_b', 'uri', 'ace', 'hex_perc_encode', 'base64',  'qencode',
774              'rfc2231',  'rot'  and  'none'.  Each option means CAP hex-code,
775              mime, mime Q-encoding, mime B-encoding, uri character reference,
776              ACE  punycode, uri percent notation, base64, Q-encoding, rfc2231
777              and rot13/47 respectively. 'none' means no decode.
778              For encoding, 'hex', 'mime_b', 'mime_q', 'uri', 'ace', 'cap',
779              'hex_perc_encode', 'base64' and 'none'  are  supported.  EBCDIC
780              related  codesets  and  some already ascii-encoded codeset (e.g.
781              UTF-7) output with encoding is not supported.
782              Only one decode/encode option is valid, and if more than one op‐
783              tion  is  specified, the last one is used.  When one of mime de‐
784              codings is specified, base text is assumed to  be  EUC  encoding
785              unless specified otherwise.  Except for rot, which  assumes  the
786              input  stream  is Shift_JIS, EUC or iso-2022-jp, these encodings
787              assume the input stream is ascii (as defined in RFC2045).   Some
788              decodings may co-exist with encoding, but this is not guaranteed.
789              In particular, if the input is UTF-16/UCS2 code, these encodings
790              are ignored in skf.
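To make several of these scheme names concrete, the Python sketch below decodes one sample of each with the standard library. It shows what the schemes look like, under the assumption of UTF-8 sample text; it does not call skf and covers only rot13, not rot47.

```python
import base64
import codecs
from email.header import decode_header
from urllib.parse import unquote

text = "日本語"

# mime_b: an RFC 2047 encoded-word with a base64 payload.
encoded_word = "=?UTF-8?B?" + base64.b64encode(text.encode("utf-8")).decode("ascii") + "?="
payload, charset = decode_header(encoded_word)[0]
mime_decoded = payload.decode(charset)

# hex_perc_encode: URI percent notation.
uri_decoded = unquote("%E6%97%A5%E6%9C%AC%E8%AA%9E")

# ace: punycode, as used in internationalized domain labels.
ace_decoded = b"bcher-kva".decode("punycode")

# rot: rot13 on ASCII letters.
rot_decoded = codecs.decode("Uryyb", "rot_13")

print(mime_decoded, uri_decoded, ace_decoded, rot_decoded)
```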
791
792       --mime-ms-compat
793              treat japanese generic codesets as Microsoft  cp932  compatible.
794              More  specifically,  with  this option skf treats iso-2022-jp as
795              cp50220, euc-jp as cp51932 and Shift_JIS as cp932w.
796
797       --mime-persistent
798              skf detects address-like strings and excludes them from mime en‐
799              coding.  This option disables such behavior. Default in nkf-com‐
800              patible mode.
801
802       --mime-limit-aware
803              In address-like string detection, skf respects  character  count
804              limits for a line.
805
806
807   Shortcut
808       -m     same as --decode=mime
809
810       -mB    same as --decode=mime_b
811
812       -mQ    same as --decode=qencode
813
814       -m0    same as --decode=none
815
816       -M     same as --encode=mime_b
817
818       -MB    same as --encode=base64
819
820       -MQ    same as --encode=qencode
821
822   End of line control options
823       --lineend-thru
824              Output  end-of-line code as it is. Also output ^Z code as it is.
825              This is default.
826
827       --lineend-cr --lineend-mac -Lm
828              Use CR as end-of-line code.  Also  delete  ^Z  code  from  input
829              stream.
830
831       --lineend-lf --lineend-unix -Lu
832              Use  LF  as  end-of-line  code.  Also  delete ^Z code from input
833              stream.
834
835       --lineend-crlf --lineend-windows -Lw
836              Use CR+LF as end-of-line code. Also delete ^Z  code  from  input
837              stream.   This  option doesn't preserve original order of cr and
838              lf.
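A rough Python picture of the three converting modes, assuming that any of CR, LF or CR+LF counts as one line end and that ^Z (0x1A) is dropped. The function convert_lineend is a hypothetical helper, not part of skf.

```python
import re


def convert_lineend(text, eol="\n"):
    """Normalize every line end to eol and drop ^Z (illustration only)."""
    text = text.replace("\x1a", "")  # delete ^Z from the stream
    # CR+LF must be tried first so it is not counted as two line ends.
    return re.sub(r"\r\n|\r|\n", lambda _m: eol, text)


mixed = "a\r\nb\rc\nd\x1a"
print(repr(convert_lineend(mixed, "\r\n")))  # 'a\r\nb\r\nc\r\nd'
print(repr(convert_lineend(mixed, "\n")))    # 'a\nb\nc\nd'
```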
839
840       --input-cr
841              Assume input stream uses CR as end-of-line code.
842
843       --input-lf
844              Assume input stream uses LF as end-of-line code.
845
846       --input-crlf
847              Assume input stream uses CR+LF as end-of-line code.
848
849       -F[line_length[-kinsoku]]
850
851       -f[line_length[-kinsoku]] -f[line_length[+kinsoku]]
852              Wrap input  lines  by  line_length  columns.  f  option  deletes
853              CR/LF's in input, and F option doesn't delete them. For Japanese
854              convention,  both  gyoutou-kinsoku(by  burasage-gumi)  and   gy‐
855              oumatsu-kinsoku(by  oidasi-gumi)  are supported.  The  burasage-
856              length is  controlled  by  kinsoku  option.  Default  value  for
857              line_length is 66, and must be < 1000. Default value for kinsoku
858              is 5, and must be <= 10. In 'f' option,  skf  autodetects  para‐
859              graph  and  retains some CR/LF. 2nd 'f' option format (with '+')
860              disables this behaviour.  In nkf compatible mode, some fold  be‐
861              haviors change as follows.
862              (1) Default line_length is set to 60, and kinsoku value is 10.
863              (2) alpha numeric characters become gyoutou-kinsoku characters.
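The gyoutou-kinsoku idea (characters that may not start a line are hung past the line limit instead) can be sketched in Python as below. This is a simplification under stated assumptions: it counts characters rather than display columns, the forbidden set is a small sample, and the function fold is hypothetical rather than skf's algorithm.

```python
# Sample of characters that should not begin a line (assumption).
LINE_START_FORBIDDEN = set("、。，．）」』")


def fold(text, line_length=66, kinsoku=5):
    """Wrap text, hanging forbidden line-start characters (illustration)."""
    lines = []
    current = ""
    for ch in text:
        if len(current) < line_length:
            current += ch
        elif ch in LINE_START_FORBIDDEN and len(current) < line_length + kinsoku:
            # burasage: let the punctuation hang beyond the limit.
            current += ch
        else:
            lines.append(current)
            current = ch
    if current:
        lines.append(current)
    return lines


print(fold("あいうえ。かき", line_length=4, kinsoku=1))  # ['あいうえ。', 'かき']
print(fold("あいうえ。かき", line_length=4, kinsoku=0))  # ['あいうえ', '。かき']
```

With kinsoku set to 0 there is no room to hang the full stop, so it is pushed to the start of the next line.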
864
865   File control options
866       --filewise-detect --force-reset
867              Reset and re-detect input code set at the start of each file.
868
869       --linewise-detect
870              Reset and re-detect input code set at the start of each line.
871
872
873   Compatibility options
874       --nkf-compat
875              Interpret the following options in an nkf-compatible manner.  -l,
876              -d, -c, -x, -X, -w and -W work as in nkf2.x.  -f and -F behavior
877              is changed as shown above.  -T, -i and -o are not supported.  Most
878              other nkf options and switches also work  like  nkf,  except  in
879              case of error.
880
881       --skf-compat
882              Interpret the following options in the skf-native manner.
883
884       -r     nkf-compatible  rot.  Works only with --nkf-compat mode. Allowed
885              input encodings are limited to JIS/Shift_JIS/EUC.
886
887       -h[123] --hiragana --katakana --katakana-hiragana
888              -h, -h1 and --hiragana convert all kanas to hiragana.  -h2  and
889              --katakana   convert   all   kanas   to   katakana.    -h3   and
890              --katakana-hiragana swap katakana and hiragana.
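In Unicode the hiragana and katakana blocks are laid out in parallel, 0x60 code points apart, so the three conversions can be sketched as simple offsets. The Python below is an illustration with hypothetical function names; it handles only the basic ranges U+3041-U+3096 and U+30A1-U+30F6 and is not skf code.

```python
HIRA_FIRST, HIRA_LAST = 0x3041, 0x3096
KATA_FIRST, KATA_LAST = 0x30A1, 0x30F6
OFFSET = 0x60  # distance between the two blocks


def to_hiragana(text):
    """Like -h1: katakana become hiragana, everything else is kept."""
    return "".join(
        chr(ord(c) - OFFSET) if KATA_FIRST <= ord(c) <= KATA_LAST else c
        for c in text
    )


def to_katakana(text):
    """Like -h2: hiragana become katakana, everything else is kept."""
    return "".join(
        chr(ord(c) + OFFSET) if HIRA_FIRST <= ord(c) <= HIRA_LAST else c
        for c in text
    )


def swap_kana(text):
    """Like -h3: each kana is moved to the other block."""
    out = []
    for c in text:
        cp = ord(c)
        if HIRA_FIRST <= cp <= HIRA_LAST:
            out.append(chr(cp + OFFSET))
        elif KATA_FIRST <= cp <= KATA_LAST:
            out.append(chr(cp - OFFSET))
        else:
            out.append(c)
    return "".join(out)


print(to_hiragana("カタカナ"), to_katakana("ひらがな"), swap_kana("ひらカナ"))
```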
891
892       --nkf-help
893              show option difference/compatibility between skf and nkf.
894
895       --in-place[=SUF] --overwrite[=SUF]
896              Replace the specified file with the converted result.  --overwrite
897              retains the file create time stamp.  If a suffix is given, the suffix
898              is added to the output file name and the input file is not removed.
899
900
901   Lightweight language specific options
902       The skf plugin for lightweight languages (LWL) has a subset of options.
903       More specifically, file input/output related options (-b, -u, --overwrite,
904       --in-place, --filewise-detect, --linewise-detect, --show-filename,
905       --suppress-filename) and UTF-16 output are disabled (except in ruby or
906       python3).  The calling methods differ depending on the LWL, but each
907       extension has two parameters, an option string and a string to convert.
908       From 2.1.15, ruby is not supported.
909
910
911     Python-3.x specific options
912       Since native codeset representation in python3.x is  LATIN-1/UCS2/UCS4,
913       skf  behaves differently with output codeset option.  If output codeset
914       is either ASCII, UTF-16 or UTF-32(in wide mode),  skf  returns  Unicode
915       object,  and  for  all  other codesets skf returns binary array object.
916       Following options change this  behavior.   Codesets  assumed  as  ascii
917       (UTF-7) and MIME encoded strings are returned as strings.
918
919       --py-out-binary
920              use  pseudo  unicode  binary  array stream to output. BOM is en‐
921              abled.
922
923       --py-out-string
924              use binary array object on ASCII, UTF-16/32 output. This is  de‐
925              fault.
926              skf  accepts  either a binary array or a unicode object for  in‐
927              put.  BOM is disabled.
928
929
930   Misc. Control options
931       --disable-space-convert --enable-space-convert
932              skf converts an ideographic space into two ascii  spaces.   Dis‐
933              able  option  disables, and enable option enables this behavior.
934              Default is disabled.
935
936       --html-sanitize
937              Convert several characters in HTML document to entity  reference
938              expression. Specifically, "!#$&%()/<>:;?´ are escaped by entity-
939              references.
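A minimal Python sketch of this kind of sanitizing follows. The character set is taken from the ASCII characters listed above, and the output uses decimal code point references; whether skf emits named or numeric references is not stated here, so treat both the set and the output form as assumptions. The function sanitize is hypothetical.

```python
# ASCII characters named in the option description (assumption).
ESCAPED = set("\"!#$&%()/<>:;?")


def sanitize(text):
    """Replace listed characters by decimal references (illustration)."""
    return "".join(f"&#{ord(c)};" if c in ESCAPED else c for c in text)


print(sanitize("<a href=x>"))  # &#60;a href=x&#62;
```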
940
941       --filewise-detect --force-reset
942              If multiple input files are given, detect input codeset for each
943              file.
944
945       --linewise-detect
946              Detect  input  code line-wise. Note this option weakens code de‐
947              tect correctness.
948
949       --reset
950              Reset all flags specified by extended controls  and  environment
951              variables.
952
953       --inquiry --guess
954              skf detects the code and outputs the detection result to stdout.
955              No filtering output is performed.  If multiple input  files  are
956              given, --show-filename is automatically enabled.
957
958       --hard-inquiry
959              Similar to --inquiry, but reports both the code and the end-of-line
960              character.
961
962       --suppress-filename
963              When inquiry(--inquiry) is on, this option  disables  file  name
964              output.  This option overrides --show-filename.
965
966       --show-filename
967              When  inquiry(--inquiry)  is on, this option adds each file name
968              to output.
969
970       --invis-strip
971              Delete all escape sequences not belonging to ISO-2022  code  ex‐
972              tension.  This is intended to replace invisstrip command bundled
973              in inews package.
974
975       -I     Warn if input has unassigned code points.
976
977       -v     print version information and exit.
978
979       --help print brief help and exit.
980
981       --show-supported-codeset
982              Display supported codesets  (input)  and  exit.  Both  canonical
983              names  (left  side) and detailed names are shown. This canonical
984              name can be used as MIME charset  and  also  as  ic-option  code
985              specification.
986
987       --show-supported-charset
988              Display supported character sets (output) and exit. Both canoni‐
989              cal names and detailed names are shown. Some charsets with  spe‐
990              cial  treatments (i.e.  meaningless as set-g* parameters) inten‐
991              tionally lack addressable cnames.
992
993

FILES

995       /usr/(local/)share/skf/lib/   (Unices)
996
997       /Program Files/skf/share/lib (MS Windows)
998              These directories are where external codeset  conversion  tables
999              go.  The location that the current skf assumes is shown by the -h
1000              option.
1001
1002

AUTHOR

1004       skf is written by Seiji Kaneko (efialtes@osdn.jp) based on ideas  from
1005       nkf written by Itaru Ichikawa (ichikawa@flab.fujitsu.co.jp). X 0213 code
1006       table is derived from work of earthian@tama.or.jp.  Some  codeset  map‐
1007       ping is derived from various sources. Detailed origin is shown in copy‐
1008       right document included in  this  distribution.   Unicode  Database  is
1009       copyrighted(c) by Unicode(R), Inc.
1010
1011

ACKNOWLEDGEMENT

1013       skf   is   inspired   by   works   or  requests  by  shinoda@cs.titech,
1014       kato@cs.titech, uematsu@cs.titech, void@global ohta@ricoh,  Hinata(HKE)
1015       Ashizawa(CRL)  Kunimoto(SDL) Oohara(Univ of Kyoto), Jokagi(elf2000) and
1016       Naruse (at osdn.jp). Thanks.
1017
1018

BUGS AND LIMITATIONS

1020       1. skf can handle mixed coding with some limitations. However, code de‐
1021       tection  tends  to  fail for mixed code, and giving explicit input code
1022       set is strongly encouraged, if codeset is known beforehand.
1023       In case of need, --linewise-detect option may help, but code  detecting
1024       will more likely fail.
1025
1026       2. skf implements ISO-2022 with following exceptions.
1027        i)  GL 0x20 is always space. Even when 96-character codeset is invoked
1028       to GL.
1029        ii) Sequences for setting codes to C1 and C2 are ignored.
1030        iii) If an unknown sequence is given to G0, G0 is set to ascii, and lock‐
1031       ing/single shift is cleared. An unknown sequence call to set G1-G3  is
1032       just ignored.
1033        Private charset is also not supported and is ignored.
1034        iv) Sequences for 96 character multibyte coding are ignored (Currently,
1035       no codeset is registered).
1036        v) Calling UTF-8, UTF-16 coding system from iso-2022 is supported, and
1037       returns to previous coding system by standard return.
1038        Callings and returns to/from other coding schemes are ignored.
1039        vi) For supporting some of cellular phone glyphs, several private (not
1040       registered) codesets are defined in skf, and can be called by appropri‐
1041       ate sequences.
1042
1043       3. Error output coding is controlled by LOCALE environment variables on
1044       UN*X systems. skf doesn't handle situations  where  stdout  and  stderr
1045       are redirected into the same stream. Such cases should be handled on the
1046       user side.
1047
1048       4. skf converts KEIS/JIS X 0213 code using CJK-extension B area and CJK
1049       compatibility area. For this reason, X 0213  and  KEIS  convert  result
1050       varies depending on --use-compat and --limit-to-ucs2 switches.
1051
1052       5.  JIS X 0207:1979 is not supported. JIS X 0211:1987 is designed to be
1053       supported (i.e. common terminal control sequence will be  transparently
1054       passed to output).
1055
1056       6.  Even if unbuffer option(-u) is specified, some code-translation re‐
1057       lated bufferings are still performed (in MIME, kana, VIQR etc.).
1058
1059       7. skf-1.9x or later recognizes and handles languages in iso639-1(alpha
1060       2).  iso639-2 is not supported as a valid language set.
1061
1062       8. Unicode IVS is not supported. Sequences are just discarded.
1063
1064       9.  skf-1.9x  or  later does not retain Macintosh RLO-ordered character
1065       property.  Codesets with this kind of codes are not supported.
1066
1067       10. CNS11643 4th, 5th, 6th planes are not supported.
1068
1069       11. In the python 3 extension, the codeset detected by inquiry for input
1070       unicode strings is always UTF-32be.
1071
1072       12.   In   lightweight  language  extension  except  ruby  and  python,
1073       UCS2/UTF-16 are not supported.
1074
1075
1076

Notes

1078       1. Extended options are changed extensively since skf-1.9. Some archaic
1079       options (eg. -B, -@ and -r) have been deleted from this version.
1080
1081       2. skf was originally a project forked from nkf, but no longer contains
1082       any nkf code.  The copyright notice is retained in its honor.
1083
1084       3. From version 1.9, default Japanese character set assumed by skf  has
1085       changed  to JIS X 0208:1990 with Microsoft Japanese Windows gaiji (i.e.
1086       CP932).
1087
1088       4. Code autodetection is not perfect by design. If it has failed to de‐
1089       tect  input  code  properly, please give input code information explic‐
1090       itly.
1091
1092       5. Some ligatures in Unicode, cp932 gaiji and KEIS83 are converted  us‐
1093       ing  JIS X 0124 and other convention.  During this conversion, its byte
1094       length is not preserved.
1095
1096       6. skf is intended to  pass  ANSI  compatible  terminal  control  codes
1097       transparently, but this is not guaranteed.
1098
1099       7. nkf's -i and -o options work only in nkf-compat mode. They are obso‐
1100       lete options in 1.97, and are valid only for iso-2022-jp, without  con‐
1101       sidering output codeset specifications.
1102
1103       8.  For unconverted character, skf uses geta and undefined character as
1104       --use-replace-char option.  If  output  codeset  doesn't  contain  geta
1105       code, skf prefers 'black square character', then uses '.' respectively.
1106
1107       9. There are some undocumented options. These options should be consid‐
1108       ered as highly experimental.
1109
1110       10. In lineend_thru mode with folding, skf remembers the order in which
1111       cr and lf appear in the stream, and uses that order.  Because of this, if
1112       skf needs to output a line-end character before any line-end  character
1113       appears in the input stream, the input order may not be preserved.
1114
1115       11. NKF-compatibility
1116       1)  --prefix,  some  --fb's  and --no-best-fit-chars are not supported.
1117       Error behaviors are not compatible.
1118       2) The -r option and --decode=rot are different. See each option descrip‐
1119       tion.
1120       3)  MSDOS  (and  -T), --exec-in and --exec-out are not supported. -O is
1121       supported.
1122       4) MIME decoding/encoding handling behaviors differ in various ways.
1123       5) lineend conversion acts differently. Results may  not  be  same  for
1124       text with multiple lineend characters.
1125       6)  detected codeset name is not compatible with nkf. --help and --ver‐
1126       sion return different results.
1127       7) in-place and overwrite suffix with * is not supported.
1128
1129       12. Conversion to NYUUKAN GAIJI is as follows
1130       1)   Kanji   codes   in   JIS   X0208(1997),   JIS   X0212(1990),   JIS
1131       X0213(2004/2012),
1132        Houmusho-kokuji No.582 beppyou No.1 are sent to output as it is.
1133       2)  Kanji codes in beppyou No.4-2 leftmost columns are converted to the
1134       first
1135        priority character in the table. If the second priority characters ap‐
1136       pear,
1137        the codes are sent to output as it is.
1138       3)  Other  kanji codes are converted as undefined codes. See above con‐
1139       version method.  Non-kanji codes (latins, glyphs etc.) are sent to out‐
1140       put as it is.
1141
1142       13. ARIB B24 compatibility
1143       1) Input only. ARIB B24 output is not supported.
1144       2) Neither international encoding nor X0213 extension are supported.
1145       3)  Macro  define  sequences are suppressed. These sequences are recog‐
1146       nized and
1147        discarded.
1148       4) Without specifying arib codeset, skf treats Arib-defined codepage as
1149       follows.
1150         i) private codepage are supported. ascii/jis x-0201 0x5f is not modi‐
1151       fied.
1152         ii) macro define/invoke and rpc invoke does not work.  These  charac‐
1153       ters are
1154           discarded.
1155
1156       14. option mnemonic table for -v option
1157       AA:  aware  ascii-art  in code detection DBG: Debugging feature enabled
1158       F64: Large file enabled(default) NE: Environment variable handling dis‐
1159       abled  NFJ:  suppress fj-newsgroup convension NLS: Native language mes‐
1160       saging enabled(default) NN: detect skf is called under nkf  name  OMST:
1161       Have  mkstemp PEP: Python3 PEP393 support enabled SG: Slow getc enabled
1162       SPNC: Space convert disabled.  STT: Use Static codeset  table  UFY_A_J:
1163       Unify  JIS  x-0201 to ascii UID/EUID: Have UID/EUID.  ULM: UCS2 generic
1164       latin support.  WIN32: Windows environment.
1165
1166
1167       15. feature mnemonic table for -v option
1168       98: old-nec-compat (ESC-H/ESC-K) feature enabled ACE: punycode  support
1169       enabled  ARIB:  ARIB  B24  support enabled FD: fold feature enabled KD:
1170       KEIS90 auto-detect enabled KX: KEIS90  extra  region  enabled  MIMEREC:
1171       Mime  recovery  feature  anabled  NFD:  Unic*de  decompose enabled ROT:
1172       rot13/47 support enabled UK: UTF16 hankaku-kana disabled UN: UTF16 nor‐
1173       malize  enabled  ONKF: nkf old -i, -o option enabled LE_*: lineend han‐
1174       dling.
1175
1176

Notice

1178       Unicode(TM) is a trademark of Unicode, Inc. Microsoft and  Windows  are
1179       registered  trademarks  of Microsoft corporation. Macintosh is a regis‐
1180       tered trademark of Apple Inc. Vodafone is a trademark of Vodafone  K.K.
1181       Other  names  and  terms  may be trademarks or registered trademarks of
1182       their respective owner.  Trademark symbol (TM) may be omitted  in  this
1183       manual page.
1184
1185
1186
1187
1188                                  10/Aug/2018                           SKF(1)