SKF(1)                      General Commands Manual                     SKF(1)

NAME

       skf - simple Kanji Filter (v2.1)

SYNOPSIS

       skf [-EIJKNQRSXZbehjknqrsuvxz] [long_format_options] [infiles ...]

DESCRIPTION

       skf is yet another i18n-capable kanji filter, designed for reading
       files in the various CJK encodings found on the Net. skf converts
       input kanji texts or streams into a character stream in the
       designated codeset and writes it to standard output. skf is designed
       to be a versatile filter for reading documents in many code sets,
       and deliberately provides no features unrelated to code conversion.

       Like nkf, skf automatically recognizes the input code when it is an
       ISO-2022 compliant code, and also detects EUC-variant codes if the
       input is Japanese text without X 0201 kana. skf 2.1 can read various
       ISO-2022 compliant character sets, including the JIS kanji codes
       (X 0208, X 0212 and X 0213), EUC encodings (euc-jp (with X 0213
       support), euc-cn, euc-kr and euc-tw), the ISO European Latin sets
       (ISO-8859-1 to 11 and 13/14/15/16) and many regional character sets.
       skf can also read some non-ISO-2022 compliant sets, including
       Microsoft Shift-JIS, KOI-8-R/U, GB2312 (HZ), big5, VISCII (RFC 1456,
       including VIQR), the Unicode standard encodings (UCS2/UTF-16, UTF-7
       and UTF-8), some Microsoft codesets (cp1250 etc.) and some other
       vendor-specific codes (KEIS83, JEF etc.).

       The supported output character sets are more limited, but still
       include X 0208/X 0212/X 0213 JIS, X 0201 JIS, ASCII, Microsoft
       Shift-JIS, EUC-jp/-kr/-cn, HZ, iso-2022-jp/kr, big5, VISCII and
       Unicode.

       skf also provides basic decoding for some common encodings,
       including MIME, Punycode and URI code points. Unicode decomposition
       has been supported since version 1.96.

       As noted above, skf is designed to convert input text into a
       human-readable form for the local environment (i.e. codeset), and
       has several extra conversion features such as GNU recode-style
       folding. These include Windows/Macintosh-specific code swaps,
       old/new JIS glyph changes, HTML/TeX format conversion and variant
       unification.

       skf can also be compiled as an extension of some lightweight
       languages. See README.txt for details.

       If one or more file names are given, skf reads the files and writes
       the converted stream to stdout. If no file names are given, input is
       taken from stdin and output again goes to stdout. Options are taken
       from the environment variables SKFENV and skfenv, and then from the
       command line, in that order. Environment variables are not used when
       skf runs as a privileged user. skf does not use locale-related
       environment variables for conversion, but the language of its error
       messages follows the locale settings.

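       The order in which options are gathered can be pictured with the
       short Python sketch that follows. It is an illustration, not skf
       source code: the function name is hypothetical, variable values are
       assumed to be split with shell-style quoting, and the idea that a
       later option overrides an earlier one is an assumption about how
       skf resolves conflicts.

```python
import shlex


def collect_skf_options(environ, argv, privileged):
    """Hypothetical sketch: return options in the order the manual says skf reads them.

    SKFENV comes first, then skfenv, then the command line. Assumption: when
    two options conflict, the one appearing later in this list wins.
    """
    options = []
    if not privileged:  # environment variables are ignored for privileged users
        for name in ("SKFENV", "skfenv"):
            value = environ.get(name)
            if value:
                # Assumption: the value is a shell-style, space-separated option list.
                options.extend(shlex.split(value))
    options.extend(argv)
    return options
```

       For example, with SKFENV set to --oc=utf8 and the command line
       -s input.txt, the sketch returns ['--oc=utf8', '-s', 'input.txt'],
       so -s is seen last; for a privileged user only ['-s', 'input.txt']
       is returned.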

CODESET OPTIONS

       skf is written from scratch and inherits no code from nkf. However,
       skf is intended to be a drop-in replacement for nkf (v1.4) and
       supports a similar set of commonly used nkf options.
       skf 2.1 recognizes the following options. Unless stated otherwise,
       all options are off by default.

   buffering control
       -b     use buffered output. This is the default.

       -u     use unbuffered output. The code detection feature is disabled
              when this option is on.

   Input/Output codeset options
       --ic=input_code_set
              specify the input codeset. Possible values are listed below.

       --oc=output_code_set
              specify the output codeset. Possible values are listed below.
              The default output codeset of the distribution package is
              euc-jp, but it depends on compile options; the actual default
              is shown by 'skf -h'.

     Supported codeset
       skf recognizes the following codesets as input/output codesets.
       Codeset names are case-insensitive, and minus ('-') and underscore
       ('_') characters are ignored. Note that ISO-2022 escape-based input
       codesets (registered with IANA) are recognized automatically, even
       when a non-ISO-2022 codeset (other than Unicode and B-Right/V) is
       specified. In the "in" column, o means the codeset can be specified
       for input and x means it cannot; the "out" column means the same for
       output.

       in out  name              description
       o  o    iso8859-1         ascii + iso-8859-1 (latin-1)
       o  o    iso8859-2         ascii + iso-8859-2 (latin-2)
       o  o    iso8859-3         ascii + iso-8859-3 (latin-3)
       o  o    iso8859-4         ascii + iso-8859-4 (latin-4)
       o  o    iso8859-5         ascii + iso-8859-5 (Cyrillic)
       o  o    iso8859-6         ascii + iso-8859-6 (Arabic)
       o  o    iso8859-7         ascii + iso-8859-7 (Greek)
       o  o    iso8859-8         ascii + iso-8859-8 (Hebrew)
       o  o    iso8859-9         ascii + iso-8859-9 (latin-5)
       o  o    iso8859-10        ascii + iso-8859-10 (latin-6)
       o  o    iso8859-11        ascii + iso-8859-11 (Thai)
       o  o    iso8859-13        ascii + iso-8859-13 (Baltic Rim)
       o  o    iso8859-14        ascii + iso-8859-14 (Celtic)
       o  o    iso8859-15        ascii + iso-8859-15 (Latin-9)
       o  o    iso8859-16        ascii + iso-8859-16
       o  o    koi-8r            koi-8r (Russian)
       o  o    cp1251            Cyrillic, Microsoft cp1251
       o  o    jis               iso-2022-jp (RFC 1468, 7-bit JIS)
       o  o    iso-2022-jp-x0213 iso-2022-jp-3 (JIS X 0213:2000),
                                 a.k.a. jis-x0213
       o  o    jis-x0213-strict  iso-2022-jp-3-strict
       o  o    iso-2022-jp-2004  iso-2022-jp-2004 (JIS X 0213:2004),
                                 a.k.a. jis-x0213-2004
       o  o    oldjis            iso-2022-jp-1978 (JIS X 0208:1978)
       o  o    cp50220           Microsoft codepage 50220
       o  o    cp50221           Microsoft codepage 50221
       o  o    cp50222           Microsoft codepage 50222
       o  o    euc-jp            EUC-encoded JIS X 0208:1997
       o  o    euc-x0213         EUC-encoded JIS X 0213:2000
       o  o    euc-jis-2004      EUC-encoded JIS X 0213:2004
       o  o    cp51932           EUC-encoded Microsoft codepage 932
       o  o    euc-kr            EUC-encoded KS X 1001 Korean
       o  o    euc7-kr           7-bit EUC-encoded KS X 1001 Korean
       o  o    uhc               Unified Hangul (Windows cp949)
       o  o    johab             KS X 1001 Johab Korean
       o  o    euc-cn            EUC-encoded GB2312 Chinese
       o  o    euc7-cn           7-bit EUC-encoded GB2312 Chinese
       o  o    hz                HZ-encoded GB2312 Chinese
       o  o    euc-tw            EUC-encoded CNS 11643 Chinese
       o  o    gb12345           EUC-encoded GB12345 Chinese
       o  o    gbk               GB2312 extension (cp936) Chinese
       o  o    gb18030           GB18030 Chinese
       o  o    big5              BIG5 (with ETen extension + Euro)
       o  o    cp950             BIG5 (Microsoft cp950 + Euro)
       o  o    big5-hkscs        BIG5 with HKSCS
       o  o    big5-2003         BIG5-2003
       o  o    big5-uao          BIG5 Unicode-at-On (UAO)
       o  o    sjis              Shift-JIS (Microsoft cp943)
       o  o    shiftjis-x0213    Shift-JIS-encoded JIS X 0213:2000
       o  o    shiftjis-2004     Shift-JIS-encoded JIS X 0213:2004
       o  o    sjis-docomo       Shift-JIS with NTT Docomo emoticons
       o  o    sjis-au           Shift-JIS with AU emoticons
       o  o    sjis-softbank     Shift-JIS with SoftBank emoticons
       o  o    oldsjis           Shift-JIS (JIS X 0208:1978)
       o  o    cp932             Shift-JIS-encoded MS cp932
       o  o    cp932w            Shift-JIS-encoded MS cp932 with
                                 MS compatibility
       o  o    viscii            VISCII (RFC 1456) Vietnamese
       o  o    viqr              VISCII (RFC 1456 VIQR) Vietnamese
       o  o    keis              Hitachi KEIS83/90
       o  x    jef               Fujitsu JEF (basic support only)
       o  x    ibm930            IBM EBCDIC DBCS Japanese
       o  x    ibm931            IBM EBCDIC DBCS Japanese with Latin
       o  x    ibm933            IBM EBCDIC DBCS Korean
       o  x    ibm935            IBM EBCDIC DBCS Simplified Chinese
       o  x    ibm937            IBM EBCDIC DBCS Traditional Chinese
       o  o    unicode           Unicode(TM) UTF-16LE
       o  o    unicodefffe       Unicode(TM) UTF-16BE
       o  o    utf7              Unicode(TM) UTF-7
       o  o    utf8              Unicode(TM) UTF-8
       o  o    utf7-imap         IMAP modified Unicode(TM) UTF-7 (RFC 2060)
       o  o    mutf8             Java modified Unicode(TM) UTF-8
       o  o    cesu8             CESU-8 (Unicode Technical Report #26)
       x  o    nyukan-utf-8      Nyukan-moji (Japanese Immigration Bureau
                                 gaiji), UTF-8 encoding
       x  o    nyukan-utf-16     Nyukan-moji (Japanese Immigration Bureau
                                 gaiji), UTF-16 encoding
       o  x    arib-b24          ARIB B24 8-bit JIS-based
       o  x    arib-b24-sj       ARIB B24 8-bit SJIS-based
       x  o    transparent       Transparent mode (see below)

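       The name-matching rule above (case is ignored, as are '-' and '_')
       amounts to a normalization step before lookup. The sketch below is a
       hypothetical illustration in Python, not skf's internal table; the
       lookup dictionary holds only a few names from the list for brevity.

```python
def normalize_codeset_name(name):
    """Fold a codeset name as the manual describes: ignore case, '-' and '_'."""
    return name.lower().replace("-", "").replace("_", "")


# Hypothetical lookup table keyed by normalized names (a small subset of the list).
KNOWN_CODESETS = {
    normalize_codeset_name(n): n
    for n in ("euc-jp", "sjis", "iso-2022-jp-2004", "utf8", "big5-hkscs")
}


def lookup_codeset(name):
    """Return the listed spelling for a user-supplied name, or None if unknown."""
    return KNOWN_CODESETS.get(normalize_codeset_name(name))
```

       With this rule, lookup_codeset("EUC_JP") returns "euc-jp" and
       lookup_codeset("UTF-8") returns "utf8".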

     Codeset explanations
       iso-8859-*
              When specified for output, G0 = GL is ASCII and G1 = GR is
              iso-8859-*. 8-bit encoding is used.

       iso-2022-jp, jis
              Encoding is iso-2022-jp-2 (RFC 1554). G0 = GL is JIS X 0201
              Roman, G1 = GR is JIS X 0201 kana, G2 is iso-8859-1 and G3 is
              JIS X 0212:1990 supplementary kanji.

       jis-x0213, iso-2022-jp-3
              Encoding is iso-2022-jp-3 (based on JIS X 0213:2000). G0 = GL
              is JIS X 0201 Roman. For output, G1 = GR is JIS X 0201 kana,
              G2 is iso-8859-1 and G3 is JIS X 0213 plane 2 kanji.

       jis-x0213-strict
              Encoding is a subset of iso-2022-jp-3-strict (uses plane 1
              only). For output, G0 = GL is JIS X 0201 Roman, G1 = GR is
              JIS X 0201 kana, G2 is iso-8859-1 and G3 is not set. Output
              uses JIS X 0208 whenever possible. JIS X 0213 input is
              recognized automatically.

       jis-x0213-2004, iso-2022-jp-2004
              Encoding is iso-2022-jp-2004 (JIS X 0213:2004). For output,
              G0 = GL is JIS X 0201 Roman, G1 = GR is JIS X 0201 kana, G2 is
              iso-8859-1 and G3 is JIS X 0213 plane 2 kanji.

       oldjis
              Encoding is iso-2022-jp using the old JIS X 0208:1978. G0 = GL
              is JIS X 0201 Roman, G1 = GR is JIS X 0201 kana, G2 is
              iso-8859-1 and G3 is JIS X 0212 supplementary kanji.

       euc-jp, euc
              Encoding is 8-bit EUC using the JIS X 0208:1997 character set.
              G0 = GL is ASCII, G1 = GR is JIS X 0208, G2 is JIS X 0201
              kana and G3 is JIS X 0212 supplementary kanji.

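       In EUC-JP, a G1 character is two bytes in the range 0xA1-0xFE,
       while G2 (JIS X 0201 kana) and G3 (JIS X 0212) characters are
       introduced by the single-shift bytes SS2 (0x8E) and SS3 (0x8F). The
       hypothetical Python helper below labels each character of an EUC-JP
       byte string with its plane; it assumes well-formed input and
       performs no validation.

```python
SS2 = 0x8E  # single shift 2: next byte is a G2 (JIS X 0201 kana) character
SS3 = 0x8F  # single shift 3: next two bytes are a G3 (JIS X 0212) character


def euc_jp_planes(data):
    """Label each character in well-formed EUC-JP bytes with G0..G3."""
    planes = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            planes.append("G0")  # ASCII, one byte
            i += 1
        elif b == SS2:
            planes.append("G2")  # SS2 + one kana byte
            i += 2
        elif b == SS3:
            planes.append("G3")  # SS3 + two bytes
            i += 3
        else:
            planes.append("G1")  # two bytes in 0xA1..0xFE
            i += 2
    return planes
```

       For example, euc_jp_planes("aあｱ".encode("euc_jp")) returns
       ['G0', 'G1', 'G2'], using Python's standard euc_jp codec to build
       the input.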
       euc-x0213, euc-jis-2003
              Encoding is 8-bit EUC based on JIS X 0213:2000. G0 = GL is
              ASCII, G1 = GR is X 0213:2000 plane 1, G2 is iso-8859-1 and
              G3 is JIS X 0213:2000 plane 2 kanji.

       euc-jis-2004
              Encoding is 8-bit EUC based on JIS X 0213:2004. G0 = GL is
              ASCII, G1 = GR is X 0213:2004 plane 1, G2 is iso-8859-1 and
              G3 is JIS X 0213:2004 plane 2 kanji.

       euc-kr
              Encoding is 8-bit EUC using the KS X 1001 Wansung character
              set. G0 = GL is KS X 1003, G1 = GR is KS X 1001, and G2 and G3
              are not set.

       euc7-kr, iso-2022-kr
              Encoding is iso-2022-kr (RFC 1557): 7-bit EUC using the KS X
              1001 Wansung character set. G0 = GL is KS X 1003, G1 is KS X
              1001, and G2 and G3 are not set.

       euc-cn
              Encoding is 8-bit EUC using the GB 2312 simplified Chinese
              character set. G0 = GL is ASCII, G1 = GR is GB2312, and G2 and
              G3 are not set.

       euc7-cn
              Encoding is 7-bit EUC using the GB 2312 simplified Chinese
              character set. G0 = GL is ASCII, G1 is GB2312, and G2 and G3
              are not set.

       hz
              Encoding is the HZ-encoded (RFC 1842) GB 2312 simplified
              Chinese character set. G0 = GL is ASCII, G1 = GR is GB2312,
              and G2 and G3 are not set.

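       HZ (RFC 1842) carries GB 2312 text in 7 bits: '~{' enters GB mode,
       '~}' returns to ASCII, a literal '~' is written as '~~', and each
       GB 2312 byte is sent with its high bit cleared. The minimal encoder
       below is a Python sketch built on the standard gb2312 codec; it
       ignores line-length limits and the '~' line-continuation escape, and
       the function name is hypothetical.

```python
def hz_encode(text):
    """Minimal HZ (RFC 1842) encoder sketch for text limited to ASCII + GB 2312."""
    out = bytearray()
    in_gb = False
    for ch in text:
        if ord(ch) < 0x80:
            if in_gb:
                out += b"~}"  # leave GB mode before any ASCII character
                in_gb = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            gb = ch.encode("gb2312")  # two bytes, each in 0xA1..0xFE
            if not in_gb:
                out += b"~{"  # enter GB mode
                in_gb = True
            out += bytes(b & 0x7F for b in gb)  # drop the high bit
    if in_gb:
        out += b"~}"
    return bytes(out)
```

       hz_encode("中文") yields b'~{VPND~}', which Python's own hz codec
       decodes back to the original string.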
       euc-tw
              Encoding is the EUC-encoded CNS 11643 plane 1/2 traditional
              Chinese character set, a subset of iso-2022-cn. G0 = GL is
              ASCII, G1 = GR is CNS 11643 plane 1, G2 is CNS 11643 plane 2
              and G3 is not set.

       gb12345
              Encoding is 8-bit EUC using the GB 12345 (GBF) traditional
              Chinese character set. G0 = GL is ASCII, G1 = GR is GB12345,
              and G2 and G3 are not set.

       gbk, cp936
              Encoding is the GBK simplified Chinese character set. G0 = GL
              is ASCII and G1 = GR is GBK. G2 and G3 are not set.

       gb18030 (experimental)
              Encoding is the GB18030 (ibm-1392, Windows cp54936) Chinese
              character set. Uses ASCII as the Latin part.

       big5
              Encoding is the Big5 traditional Chinese character set with
              the ETen extension, including the Euro mapping. Uses ASCII as
              the Latin part.

       cp950
              Encoding is the Microsoft cp950 Big5 traditional Chinese
              character set. Uses ASCII as the Latin part.

       big5-hkscs (experimental)
              Encoding is the cp950 Big5 traditional Chinese character set
              with the HKSCS extension. Uses ASCII as the Latin part.

       big5-2003 (experimental)
              Encoding is the Big5-2003 Taiwanese standard traditional
              Chinese character set. Uses ASCII as the Latin part.

       big5-uao (experimental)
              Encoding is the Big5-UAO (http://uao.cpatch.org) traditional
              Chinese character set. Uses ASCII as the Latin part.

       VISCII (experimental)
              Vietnamese VISCII (RFC 1456) character set. Not TCVN-5712.

       VIQR (experimental)
              Vietnamese VISCII character set with VIQR encoding (RFC 1456).

       sjis
              Encoding is the Shift-encoded JIS X 0208:1997 character set.
              Note that this is not cp932. Uses JIS X 0201 Latin as the
              Latin (GL) part.

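       Shift encoding folds two JIS X 0208 rows into one lead byte (in
       0x81-0x9F or 0xE0-0xEF) and maps the cell to a trail byte in
       0x40-0xFC, skipping 0x7F. The Python function below is the commonly
       used arithmetic for that mapping, given as an illustration rather
       than skf's own code; the function name is hypothetical.

```python
def jis_to_sjis(j1, j2):
    """Convert a JIS X 0208 byte pair (0x21..0x7E each) to a Shift_JIS byte pair."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a JIS X 0208 code point")
    # Two consecutive JIS rows share one Shift_JIS lead byte.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:
        # Odd rows use trail bytes 0x40..0x9E, skipping 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even rows use trail bytes 0x9F..0xFC.
        s2 = j2 + 0x7E
    return s1, s2
```

       For example, jis_to_sjis(0x24, 0x22) (hiragana A) returns
       (0x82, 0xA0), the same bytes Python's shift_jis codec produces for
       "あ".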
       sjis-x0213, shift_jis-2000
              Encoding is Shift-encoded JIS using the JIS X 0213:2000
              character set.

       sjis-x0213-2004, shift_jis-2004
              Encoding is Shift-encoded JIS using the JIS X 0213:2004
              character set. Ten newly defined characters are added, but
              the Unicode mapping is the same as for JIS X 0213:2000. Uses
              JIS X 0201 Latin as the Latin (GL) part.

       sjis-cellular (experimental)
              Encoding is the Shift-encoded JIS X 0208:1997 character set
              with the NTT Docomo/Vodafone (SoftBank) cellular phone glyph
              mapping. Output is not supported.

       cp932, cp932w
              Encoding is Microsoft SJIS cp932 with the NEC/IBM gaiji area,
              based on the Windows XP mapping. Uses ASCII as the Latin (GL)
              part. --use-compat and --use-ms-compat are enabled
              automatically. cp932w provides further WideCharToMultiByte
              compatibility.

       cp51932
              Encoding is Microsoft EUC-based cp51932 with the NEC/IBM gaiji
              area, based on the Windows XP mapping. Uses ASCII as G0 and
              JIS X 0201 kana as the EUC G2 part. G3 is not used for output,
              and is JIS X 0212:2000 for input. --use-compat and
              --use-ms-compat are enabled automatically.

       cp50220, cp50221, cp50222
              Encoding is Microsoft JIS-based cp50220, cp50221 or cp50222
              with the NEC/IBM gaiji area, based on the Windows XP mapping.
              For input, skf accepts cp50220, 50221 and 50222. Note that
              these codesets are NOT compatible with iso-2022. Uses ASCII as
              the default character set. --use-compat and --use-ms-compat
              are enabled automatically.

       oldsjis
              Encoding is Microsoft SJIS (JIS X 0208:1978, a.k.a. old JIS).
              Uses JIS X 0201 Latin as the Latin (GL) part.

       johab
              Encoding is the KS X 1001 (Johab) character set. Uses KS X
              1003 Latin as the Latin (GL) part.

       uhc
              Encoding is the UHC (cp949) character set. Uses ASCII as the
              Latin (GL) part.

       unicode, unicodefffe, utf16, utf16le
              Encoding is Unicode UTF-16 (v11.0). The default byte order for
              input and output is little-endian for unicode and big-endian
              for unicodefffe, and an input byte order mark is recognized.
              utf16 and unicodefffe are big-endian; utf16le and unicode are
              little-endian. Output includes a byte order mark by default
              unless --disable-endian-mark is specified. Output covers the
              full UTF-32 range, using surrogate pairs, unless
              --limit-to-ucs2 is specified.
              Note that UCS-2 is not supported for either input or output in
              the lightweight language extensions, because of a limitation
              in the data structures SWIG uses for passing data. Specifying
              it generates an error.

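       The byte-order and BOM defaults above can be illustrated with
       Python's standard codecs. The helper below is hypothetical: its
       parameters only mirror the skf options by name, and raising an error
       for characters outside the BMP when limit_to_ucs2 is set is an
       assumption of the sketch, not documented skf behaviour.

```python
def encode_utf16(text, little_endian=True, bom=True, limit_to_ucs2=False):
    """Sketch of UTF-16 output: pick byte order, optional BOM, optional BMP limit."""
    if limit_to_ucs2:
        for ch in text:
            if ord(ch) > 0xFFFF:
                # Assumption for this sketch: treat non-BMP characters as errors.
                raise ValueError(f"U+{ord(ch):X} is outside the BMP")
    codec = "utf-16-le" if little_endian else "utf-16-be"
    body = text.encode(codec)  # non-BMP characters become surrogate pairs
    mark = "\ufeff".encode(codec) if bom else b""
    return mark + body
```

       encode_utf16("A") gives b'\xff\xfeA\x00' (the unicode default),
       encode_utf16("A", little_endian=False) gives b'\xfe\xff\x00A' (as
       for unicodefffe), and U+20B9F becomes the surrogate pair D842 DF9F.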
       utf8
              Encoding is UTF-8 encoded Unicode (v11.0). Output does not
              include a byte order mark unless --enable-endian-mark is
              specified. Output covers the full UTF-32 range unless
              --limit-to-ucs2 is specified. By default, CESU-8 is not
              accepted as input; the --enable-cesu8 option enables CESU-8
              input for the utf-8 converter. CESU-8 output is not supported.
              A UTF-8 byte order mark (BOM) in the input is always ignored.

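       UTF-8 writes a supplementary character as one 4-byte sequence,
       whereas CESU-8 writes it as two 3-byte sequences, one per UTF-16
       surrogate. A strict UTF-8 decoder rejects the CESU-8 form, which is
       why it has to be enabled explicitly. The Python snippet below shows
       this with the standard utf-8 codec, using U+20B9F as the example; it
       illustrates the encodings, not skf's decoder.

```python
UTF8 = b"\xf0\xa0\xae\x9f"           # U+20B9F as one 4-byte UTF-8 sequence
CESU8 = b"\xed\xa1\x82\xed\xbe\x9f"  # same character as two 3-byte surrogates


def decode_strict_utf8(data):
    """Decode with Python's strict UTF-8 codec; return None if the bytes are rejected."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

       decode_strict_utf8(UTF8) returns "\U00020B9F", while
       decode_strict_utf8(CESU8) returns None because encoded surrogates
       are not valid UTF-8.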
       utf7
              Encoding is UTF-7 encoded Unicode (v11.0). The input/output
              range is limited to UTF-16, and values above U+10000 are
              treated as undefined. A BOM is always ignored on input and
              never written on output.

       utf7-imap
              Modified UTF-7 for the IMAP protocol, as described in RFC
              2060. A BOM is always ignored on input and never written on
              output.

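       IMAP modified UTF-7 differs from plain UTF-7 in three ways: a run
       of characters outside printable ASCII starts with '&' instead of
       '+', the base64 alphabet uses ',' in place of '/', and a literal '&'
       is written as '&-'. The Python encoder below follows those rules
       using only the standard base64 module; it is a hypothetical sketch,
       not skf's implementation, and decoding is not shown.

```python
import base64


def imap_utf7_encode(text):
    """Encode a string as IMAP modified UTF-7 (a minimal sketch)."""
    out = []
    pending = []  # characters waiting to be base64-encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).rstrip(b"=").replace(b"/", b",")
            out.append(b"&" + b64 + b"-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:  # printable ASCII is written directly
            flush()
            out.append(b"&-" if ch == "&" else ch.encode("ascii"))
        else:
            pending.append(ch)
    flush()
    return b"".join(out)
```

       imap_utf7_encode("~peter/mail/日本語/台北") returns
       b'~peter/mail/&ZeVnLIqe-/&U,BTFw-'.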
       mutf8
              Modified UTF-8 for the Java language: CESU-8 plus the
              two-byte encoding of U+0000 (0xC0 0x80). A BOM is always
              ignored on input and never written on output.

       cesu-8
              Modified UTF-8 described in Unicode Technical Report #26. A
              BOM is always ignored on input and never written on output.

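       Both codesets differ from UTF-8 only for a few code points:
       characters above U+FFFF are written as two 3-byte surrogate
       sequences, and Java's modified UTF-8 also writes U+0000 as the
       two-byte form 0xC0 0x80. A minimal Python encoder sketch follows;
       the function and its java_nul flag are hypothetical, and decoding is
       not shown.

```python
def encode_cesu8(text, java_nul=False):
    """Encode text as CESU-8, or as Java modified UTF-8 when java_nul is True."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0 and java_nul:
            out += b"\xc0\x80"  # modified UTF-8: NUL never appears as a 0x00 byte
        elif cp > 0xFFFF:
            # Split into a UTF-16 surrogate pair and encode each half as 3 bytes.
            high, low = divmod(cp - 0x10000, 0x400)
            for unit in (0xD800 + high, 0xDC00 + low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)
```

       encode_cesu8("a\x00", java_nul=True) returns b'a\xc0\x80', and
       U+20B9F becomes b'\xed\xa1\x82\xed\xbe\x9f'.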
       keis (experimental)
              Encoding is Hitachi KEIS83/90. Output range is limited to the
              EBCDIK and JIS X 0208 areas.

       jef (experimental)
              Encoding is Fujitsu JEF. Input only; only the basic part is
              supported.

       ibm930 (experimental)
              Encoding is IBM DBCS Japanese with EBCDIC kana.

       ibm931 (experimental)
              Encoding is IBM DBCS Japanese with EBCDIC Latin (ibm037).

       ibm933 (experimental)
              Encoding is IBM DBCS Korean with the EBCDIC Wansung character
              set.

       ibm935 (experimental)
              Encoding is IBM DBCS Simplified Chinese with EBCDIC Chinese.

       ibm937 (experimental)
              Encoding is IBM DBCS Traditional Chinese with EBCDIC Chinese.

       koi8r
              Russian KOI-8R code.

       cp1250
              Central European Latin Microsoft cp1250 code.

       cp1251
              Eastern European Cyrillic Microsoft cp1251 code.

       arib-b24, arib-b24-sj
              ARIB B24 code defined in ARIB-STD-B24 Vol. 1, Part 2, Chapter
              7.3. arib-b24 is 8-bit JIS-based, and arib-b24-sj is Shift
              JIS-based.

       nyukan-utf-8, nyukan-utf-16
              Normalized Unicode UTF-8/UTF-16 based on Japanese Ministry of
              Justice public notice (kokuji) No. 582.

       transparent
              Transparent mode. Code control features, including folding
              and line-end conversion, are also disabled.

     Shortcuts
       -j     same as --oc=jis

       -s     same as --oc=sjis

       -e     same as --oc=euc-jp

       -q     same as --oc=unicode

       -z     same as --oc=sjis

       -E     same as --ic=euc-jp. Assume the input codeset is EUC-JP.

       -J     same as --ic=jis. Assume the input codeset is iso-2022-jp.

       -S     same as --ic=sjis. Assume the input codeset is Shift JIS.

       -Q     same as --ic=utf-16 --input-little-endian.

       -Z     same as --ic=utf8.

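       Because each shortcut stands for a fixed long-option spelling,
       expanding shortcuts is a table lookup. The Python sketch below
       covers a subset of the list above; the dictionary and function are
       hypothetical helpers (for example, for writing out equivalent
       long-form command lines), not part of skf, and bundled flags such as
       -Se are not handled.

```python
# Long-option equivalents of some skf shortcuts, as listed in this manual.
SHORTCUTS = {
    "-j": ["--oc=jis"],
    "-s": ["--oc=sjis"],
    "-e": ["--oc=euc-jp"],
    "-q": ["--oc=unicode"],
    "-E": ["--ic=euc-jp"],
    "-J": ["--ic=jis"],
    "-S": ["--ic=sjis"],
    "-Q": ["--ic=utf-16", "--input-little-endian"],
    "-Z": ["--ic=utf8"],
}


def expand_shortcuts(args):
    """Replace known shortcuts with their long forms; keep other arguments unchanged."""
    expanded = []
    for arg in args:
        expanded.extend(SHORTCUTS.get(arg, [arg]))
    return expanded
```

       expand_shortcuts(["-S", "-e", "in.txt"]) returns ["--ic=sjis",
       "--oc=euc-jp", "in.txt"].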

     ISO-2022 Specific controls
       These options replace the G0 to G3 designations, which are first
       set up according to the specified input codeset, with the given
       character set. Note that this does not change any other property of
       the original codeset, such as its language or encoding.

       --set-g0=`charset name'
              Predefines the specified character set as G0. G0 is also
              invoked into GL in the initial state.

       --set-g1=`charset name'
              Predefines the specified character set as G1 (right plane).
              G1 is also invoked into GR in the initial state.

       --set-g2=`charset name'
              Predefines the specified character set as G2.

       --set-g3=`charset name'
              Predefines the specified character set as G3.

       The supported character sets are listed below. 'o' means the
       character set can be designated to that plane; 'x' means it cannot.
       For Unicode-family codesets these options are ignored. For other
       non-ISO-2022 codesets they are not supported, and the result is
       unpredictable.

       g0 g1 g2 g3    codeset name   description
       o  o  o  o     ascii          ANSI X3.4 ASCII
       o  o  o  o     x0201          JIS X 0201 (latin part)
       x  o  o  o     iso8859-1      ISO 8859-1 latin
       x  o  o  o     iso8859-2      ISO 8859-2 latin
       x  o  o  o     iso8859-3      ISO 8859-3 latin
       x  o  o  o     iso8859-4      ISO 8859-4 latin
       x  o  o  o     iso8859-5      ISO 8859-5 Cyrillic
       x  o  o  o     iso8859-6      ISO 8859-6 Arabic
       x  o  o  o     iso8859-7      ISO 8859-7 Greek-latin
       x  o  o  o     iso8859-8      ISO 8859-8 Hebrew
       x  o  o  o     iso8859-9      ISO 8859-9 latin
       x  o  o  o     iso8859-10     ISO 8859-10 latin
       x  o  o  o     iso8859-11     ISO 8859-11 Thai
       x  o  o  o     iso8859-13     ISO 8859-13 latin
       x  o  o  o     iso8859-14     ISO 8859-14 latin
       x  o  o  o     iso8859-15     ISO 8859-15 latin
       x  o  o  o     iso8859-16     ISO 8859-16 latin
       x  o  o  o     tcvn5712       TCVN 5712 (Vietnamese)
       x  o  o  o     ecma94         ECMA 94 Cyrillic (KOI-8e)
       o  o  o  o     x0212          JIS X 0212:1990
       o  o  o  o     x0208          JIS X 0208:1997
       o  o  o  o     x0213          JIS X 0213 Plane 1:2000
       o  o  o  o     x0213-2        JIS X 0213 Plane 2:2000
       o  o  o  o     x0213n         JIS X 0213 Plane 1:2004
       o  o  o  o     gb2312         Simplified Chinese GB2312
       o  o  o  o     gb1988         Chinese GB1988 (latin)
       o  o  o  o     gb12345        Traditional Chinese GB12345
       o  o  o  o     ksx1003        Korean KS X 1003 (latin)
       o  o  o  o     ksx1001        Korean KS X 1001
       x  o  o  o     koi8-r         Cyrillic KOI-8R
       x  o  o  o     koi8-u         Ukrainian Cyrillic KOI-8U
       o  o  o  o     cns11643-1     Traditional Chinese CNS11643-1
       x  o  o  o     viscii-r       RFC 1456 VISCII (right plane)
       o  o  o  o     viscii-l       RFC 1456 VISCII (left plane)
       x  o  o  o     cp437          Microsoft cp437 (US latin)
       x  o  o  o     cp737          Microsoft cp737
       x  o  o  o     cp775          Microsoft cp775
       x  o  o  o     cp850          Microsoft cp850
       x  o  o  o     cp852          Microsoft cp852
       x  o  o  o     cp855          Microsoft cp855
       x  o  o  o     cp857          Microsoft cp857
       x  o  o  o     cp860          Microsoft cp860
       x  o  o  o     cp861          Microsoft cp861
       x  o  o  o     cp862          Microsoft cp862
       x  o  o  o     cp863          Microsoft cp863
       x  o  o  o     cp864          Microsoft cp864
       x  o  o  o     cp865          Microsoft cp865
       x  o  o  o     cp866          Microsoft cp866
       x  o  o  o     cp869          Microsoft cp869
       x  o  o  o     cp874          Microsoft cp874
       x  o  o  o     cp932          Microsoft cp932 (Japanese)
       x  o  o  o     cp1250         Microsoft cp1250 (Central Europe)
       x  o  o  o     cp1251         Microsoft cp1251 (Cyrillic)
       x  o  o  o     cp1252         Microsoft cp1252 (Latin-1)
       x  o  o  o     cp1253         Microsoft cp1253 (Greek)
       x  o  o  o     cp1254         Microsoft cp1254 (Turkish)
       x  o  o  o     cp1255         Microsoft cp1255
       x  o  o  o     cp1256         Microsoft cp1256
       x  o  o  o     cp1257         Microsoft cp1257
       x  o  o  o     cp1258         Microsoft cp1258

       --euc-protect-g1
              In EUC input mode, suppress escape sequences that designate a
              character set to G1; such sequences are discarded.

       --add-annon
              Add the announcer for JIS X 0208:1997 to the X 0208
              designation sequence. This option works only with
              ISO-2022-based output.

       --input-detect-jis78
              Distinguish the JIS X 0208:1978 codeset from the JIS X
              0208:1997 codeset. By default, both are treated as X
              0208:1997. This option is valid only when the input encoding
              is JIS (iso-2022-jp).

     JIS X 0212 (Supplementary Kanji) Support
       --x0212-enable
              By default, skf does not output JIS X 0212 code in JIS/EUC
              mode. This option enables use of the JIS X 0212 part.
              Non-Japanese, Shift_JIS-variant, Unicode and KEIS output
              ignore this option. Note that this option is supported only
              for backward compatibility and may not be supported in future
              versions.

     Unicode coding specific control options
       skf 2.10 conforms to the Unicode 11.0 specification.

       --use-compat, --suppress-compat
              With --suppress-compat, skf substitutes characters in the
              Unicode compatibility planes (U+F900 - U+FFFD) with
              appropriate characters outside those planes. When this
              substitution is enabled, these characters are converted to
              variants or treated as undefined. With --use-compat, skf
              outputs characters in this area as they are. The default is
              --use-compat. Several codesets control this as a codeset
              feature (i.e. they use the compatibility planes); see the
              codeset section.

       --use-ms-compat
              When output is Unicode, make the Unicode mapping compatible
              with Microsoft Windows. This only changes the conversion of
              some symbols in JIS kanji, and adding the --use-compat option
              is recommended for round-trip conversion. If you need
              stricter compatibility, try cp932w as the input codeset.

       --use-cde-compat
              When output is Unicode, make the translation compatible with
              the CDE standard codeset.

       --little-endian
              When output is UTF-16 (le/be), use little-endian byte order.

       --big-endian
              When output is UTF-16 (le/be), use big-endian byte order.

       --disable-endian-mark, --enable-endian-mark
              When output is UTF-16 or UTF-8, do not use/use a byte order
              mark. To produce UTF-16N, use --disable-endian-mark together
              with --little-endian. By default, the BOM is enabled for
              UTF-16 and disabled for UTF-8.

       --input-little-endian
              When input is UTF-16 (le/be), assume little-endian byte
              order.

       --input-big-endian
              When input is UTF-16 (le/be), assume big-endian byte order.

       --endian-protect
              Do not honor byte order marks in the input stream; they are
              simply discarded. This is off by default.

       --limit-to-ucs2
              Do not use code points at or above U+10000 in Unicode output
              (i.e. limit output to the BMP). This option does not limit
              the internal code range of skf. This is off by default.

       --disable-cjk-extension
              Treat the CJK Extension A/B areas as undefined. This is off
              (i.e. these areas are enabled) by default.

       --enable-cesu8
              Enable CESU-8 input for the utf-8 codeset. Ignored for any
              other codeset.

       --non-strict-utf8
              Accept broken (decodable but non-conforming) UTF-8 input. If
              you need this option, proceed with extra care.

       --enable-nfd-decomposition, --disable-nfd-decomposition
              Enable/disable Unicode normalized decomposition. The default
              is disabled.

       --enable-nfda-decomposition, --disable-nfda-decomposition
              Enable/disable Apple-compatible Unicode normalized
              decomposition. The default is disabled.

       --oldcell-to-emoticon
              Convert the old cell-phone gaiji area to emoticons. Supported:
              NTT Docomo/AU emoticons. The reverse mapping is not supported.

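       Normalized decomposition (NFD) splits a precomposed character into
       a base character followed by combining marks. Python's standard
       unicodedata module shows the effect on a voiced kana; this
       demonstrates standard NFD only and does not model the
       Apple-compatible variant selected by --enable-nfda-decomposition.

```python
import unicodedata

composed = "\u304c"  # HIRAGANA LETTER GA, a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)

# NFD yields KA (U+304B) followed by COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (U+3099).
print([f"U+{ord(c):04X}" for c in decomposed])

# NFC recomposes the pair, so the round trip is lossless for this character.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

       The first print shows ['U+304B', 'U+3099'] and the second shows
       True.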

     Miscellaneous codeset related options
       --old-nec-compat
              Enable the old NEC kanji sequences (ESC K and ESC H). Requires
              the compile option --enable-oldnec at configuration time.

       --no-utf7
              Assume the input codeset is *NOT* UTF-7 encoded Unicode. This
              option disables UTF-7 input detection.

       --no-kana
              Assume the input codeset does *NOT* include JIS X 0201 kana.

       --input-limit-to-jp
              Tell the detection mechanism that the input is in some kind
              of Japanese codeset.

   OUTPUT Conversions options
       skf is intended to write its output stream to stdout, but an
       nkf-compatible option for changing the encoding of files is also
       provided.

       --overwrite[=SUFFIX] --in-place[=SUFFIX]
              Convert the encoding of the file(s) specified as input.
              --overwrite preserves the file timestamp. If a SUFFIX
              parameter is given, the input file is backed up under a name
              with SUFFIX appended.

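       The in-place behavior described above can be sketched in Python as
       follows. This is a hypothetical helper, not skf's implementation.
       Python codecs stand in for skf's converters. A given suffix produces
       a backup copy of the original, and keep_mtime mimics --overwrite by
       restoring the original timestamps.

```python
import os
import shutil

def convert_file(path, from_enc, to_enc, suffix=None, keep_mtime=False):
    """Re-encode a file in place (hypothetical helper for illustration)."""
    before = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    if suffix:
        # Keep the original bytes under PATH + SUFFIX.
        shutil.copy2(path, path + suffix)
    converted = data.decode(from_enc).encode(to_enc)
    with open(path, "wb") as f:
        f.write(converted)
    if keep_mtime:
        # --overwrite-like behavior: put the original timestamps back.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```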
       skf has various features for adapting output files to the local
       environment. Most of these are controlled by the extended control
       switches described in this section.

       --use-g0-ascii
              Set G0 (= GL) of the output encoding to ASCII, ignoring
              codeset designation.

     X-0201 Kana/latin conversions
       By default, skf converts X-0201 kana to X-0208 kana. To output X-0201
       kana as is, use one of the following options. When the output is EUC
       or SJIS, these options enable X-0201 kana output in the way provided
       by each encoding. When Unicode output is specified, output of the
       (equivalent) kana part is controlled by --use-compat, not by the
       following switches. These switches are valid only when the output
       codeset is NOT in the Unicode family.

       --kana-jis7
              Use the SI/SO locking shift sequence to designate X-0201 kana.
              This switch is valid for jis, jis-x0213 and cp50220 (i.e.
              cp50221) encodings. For other codesets, this option is
              ignored.

       --kana-jis8
              Output X-0201 kana using the 8-bit right-hand plane. This
              switch is valid for jis and jis-x0213 encodings. For other
              codesets, this option is ignored.

       --kana-esci --kana-call
              Use ESC-(-I to designate X-0201 kana. This switch is valid for
              jis, jis-x0213 and cp50220 (i.e. cp50222) encodings. For other
              codesets, this option is ignored.

       --kana-enable
              If the output is EUC-JP or cp51932, use X-0201 kana via G2.
              For SJIS output, it is the same as --kana-jis8. For JIS
              output, it is the same as --kana-call.

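       The byte patterns behind these switches can be illustrated with a
       small Python sketch. Half-width katakana U+FF61..U+FF9F map to JIS X
       0201 codes 0xA1..0xDF, and their 7-bit form is 0x21..0x5F. The mode
       names are hypothetical. Ending the ESC ( I run with ESC ( B (back to
       ASCII) is this sketch's choice, not necessarily what skf emits.

```python
def x0201_kana_code(ch: str) -> int:
    """Map a half-width katakana (U+FF61..U+FF9F) to its 8-bit JIS X 0201 code."""
    cp = ord(ch)
    if not 0xFF61 <= cp <= 0xFF9F:
        raise ValueError("not a JIS X 0201 kana: %r" % ch)
    return cp - 0xFF61 + 0xA1

def encode_kana(text: str, mode: str) -> bytes:
    """Emit X 0201 kana in one of the styles above (mode names are hypothetical)."""
    codes = bytes(x0201_kana_code(c) for c in text)
    seven_bit = bytes(b & 0x7F for b in codes)      # 0xA1..0xDF -> 0x21..0x5F
    if mode == "jis7":      # like --kana-jis7: SO ... SI locking shift
        return b"\x0e" + seven_bit + b"\x0f"
    if mode == "jis8":      # like --kana-jis8: raw 8-bit right-hand plane
        return codes
    if mode == "esci":      # like --kana-esci: ESC ( I ... then back to ASCII
        return b"\x1b(I" + seven_bit + b"\x1b(B"
    if mode == "euc":       # like --kana-enable for EUC-JP: SS2 before each byte
        return b"".join(b"\x8e" + bytes([b]) for b in codes)
    raise ValueError("unknown mode: " + mode)
```

       For example, "ｱ" (U+FF71) becomes 0xB1 in the 8-bit form and 0x31 in
       the 7-bit forms.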
       --use-iso8859-1
              Enable iso-8859-1 output. ISO-8859-1 is designated to G1 and
              invoked into the GR plane.


     URI/TeX format conversion feature options
       With Unicode(tm) family output codings, skf outputs non-ASCII Latin
       characters as is, but with other output codings, skf converts these
       characters using the following rules:

       (1) If the code is defined in the specified output codeset, that
       code point is used for output.
       (2) If one of the HTML convert modes is enabled (i.e. --convert-html
       or --convert-sgml) and the code is defined in the HTML/SGML codeset,
       it is converted to an entity reference or a code point reference.
       (3) If TeX convert mode is enabled and the code has a TeX
       expression, it is converted to TeX format.
       (4) If the code is a combined ligature, it is shown as a sequence of
       its component characters.
       (5) Otherwise, a replacement character is shown, with a warning.

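       The five rules form a fallback chain, which can be sketched per
       character in Python. The stand-ins are assumptions, not skf
       internals:
         - Python codecs decide whether the output codeset defines a
           character.
         - html.entities supplies the entity names.
         - TEX is a tiny hypothetical table.
         - NFKD compatibility decomposition stands in for ligature
           splitting.
         - '?' is only a placeholder replacement character.

```python
import html.entities
import unicodedata
import warnings

# Tiny, hypothetical TeX table; a real one would be much larger.
TEX = {"\u00e9": "\\'{e}", "\u00df": "\\ss{}"}

def convert_char(ch, codec, html_mode=False, tex_mode=False):
    """Apply rules (1)-(5) above to one character (illustrative only)."""
    # (1) The output codeset has the character: emit it unchanged.
    try:
        ch.encode(codec)
        return ch
    except UnicodeEncodeError:
        pass
    # (2) HTML mode: a named entity if one exists, else a numeric reference.
    if html_mode:
        name = html.entities.codepoint2name.get(ord(ch))
        return "&%s;" % name if name else "&#%d;" % ord(ch)
    # (3) TeX mode: use the TeX expression if the table has one.
    if tex_mode and ch in TEX:
        return TEX[ch]
    # (4) Ligatures: spell out the components if the codeset has them.
    parts = unicodedata.normalize("NFKD", ch)
    if parts != ch:
        try:
            parts.encode(codec)
            return parts
        except UnicodeEncodeError:
            pass
    # (5) Give up: placeholder replacement character plus a warning.
    warnings.warn("U+%04X has no mapping in %s" % (ord(ch), codec))
    return "?"

def convert(text, codec, **modes):
    return "".join(convert_char(c, codec, **modes) for c in text)
```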
       --convert-html --convert-sgml --convert-xml
              Enable HTML convert mode. This mode is cleared by --reset.
              These options are synonyms and are treated as the same option.

       --convert-html-decimal
              Enable HTML decimal code point convert mode. This mode is
              cleared by --reset.

       --convert-html-hexadecimal
              Enable HTML hexadecimal code point convert mode. This mode is
              cleared by --reset.

       --convert-tex
              Enable TeX convert mode. This mode is cleared by --reset.

       --convert-perl
              Enable Perl5 literal convert mode. This mode is cleared by
              --reset.

       --convert-java
              Enable Java literal convert mode. This mode is cleared by
              --reset.

       --convert-python
              Enable Python literal convert mode. This mode is cleared by
              --reset.

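       This page does not specify the exact text skf emits in the code point
       and literal modes. The Python sketch below assumes each language's
       standard escape syntax. Java string literals are UTF-16, so a
       supplementary character needs a surrogate pair there. The style names
       are hypothetical.

```python
def code_point_literal(ch: str, style: str) -> str:
    """Render one character as a code point literal (formats are assumptions)."""
    cp = ord(ch)
    if style == "html-decimal":
        return "&#%d;" % cp
    if style == "html-hex":
        return "&#x%X;" % cp
    if style == "perl":
        return "\\x{%X}" % cp
    if style == "python":
        return "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp
    if style == "java":
        # Java literals are UTF-16: emit one \uXXXX per code unit.
        units = ch.encode("utf-16-be")
        return "".join("\\u%02X%02X" % (units[i], units[i + 1])
                       for i in range(0, len(units), 2))
    raise ValueError("unknown style: " + style)
```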
       --use-replace-char
              In Unicode, use a Unicode replacement character (U+FFFC) for
              undefined characters.


 Extended Options
   Encoding/Decoding control options
       --decode=`encoding scheme'

       --encode=`encoding scheme'
              Specify a decoding scheme for the input stream (--decode) or
              an encoding scheme for the output stream (--encode).
              Supported schemes for decoding are 'hex', 'mime', 'mime_q',
              'mime_b', 'uri', 'ace', 'hex_perc_encode', 'base64',
              'qencode', 'rfc2231', 'rot' and 'none'. These mean CAP hex
              code, MIME, MIME Q-encoding, MIME B-encoding, URI character
              reference, ACE punycode, URI percent notation, base64,
              Q-encoding, RFC 2231 and rot13/47 respectively. 'none' means
              no decoding.
              For encoding, 'hex', 'mime_b', 'mime_q', 'uri', 'ace', 'cap',
              'hex_perc_encode', 'base64' and 'none' are supported. Encoded
              output is not supported for EBCDIC-related codesets or for
              codesets that are already ASCII-encoded (e.g. UTF-7).
              Only one decode/encode option is valid; if more than one is
              specified, the last one is used. When one of the MIME
              decodings is specified, the base text is assumed to be EUC
              encoded unless specified otherwise. Except for rot, which
              assumes the input stream is Shift_JIS, EUC or iso-2022-jp,
              these encodings assume the input stream is ASCII (as defined
              in RFC 2045). Some codesets may work together with these
              encodings, but this is not guaranteed. In particular, if the
              input is UTF-16/UCS2, these encodings are ignored by skf.

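       Several of these decoding schemes have direct Python standard
       library counterparts:
         - email.header handles RFC 2047 MIME encoded-words.
         - The punycode codec handles ACE labels.
         - urllib.parse handles percent notation.
       The dispatcher below is a hypothetical sketch named after a few
       --decode values; it is not skf code. It assumes base64 and percent
       decoded bytes are UTF-8, whereas skf would apply its own codeset
       handling.

```python
import base64
import email.header
import urllib.parse

def decode_mime(text: str) -> str:
    """Decode RFC 2047 encoded-words (=?charset?B?...?= or =?charset?Q?...?=)."""
    out = []
    for chunk, charset in email.header.decode_header(text):
        if isinstance(chunk, bytes):
            # Unlabelled chunks are plain ASCII header text (RFC 2045 assumption).
            chunk = chunk.decode(charset or "ascii")
        out.append(chunk)
    return "".join(out)

def decode(scheme: str, text: str) -> str:
    """Hypothetical dispatcher named after a few --decode scheme values."""
    if scheme in ("mime", "mime_b", "mime_q"):
        return decode_mime(text)
    if scheme == "ace":
        # ACE labels carry the "xn--" prefix followed by RFC 3492 Punycode.
        return ".".join(
            label[4:].encode("ascii").decode("punycode")
            if label.lower().startswith("xn--") else label
            for label in text.split("."))
    if scheme == "hex_perc_encode":
        return urllib.parse.unquote(text, encoding="utf-8")
    if scheme == "base64":
        return base64.b64decode(text).decode("utf-8")
    if scheme == "none":
        return text
    raise ValueError("scheme not covered by this sketch: " + scheme)
```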
       --mime-ms-compat
              Treat Japanese generic codesets as Microsoft cp932 compatible.
              More specifically, with this option skf treats iso-2022-jp as
              cp50220, euc-jp as cp51932 and Shift_JIS as cp932w.

       --mime-persistent
              By default, skf detects address-like strings and excludes them
              from MIME encoding. This option disables that behavior. It is
              the default in nkf-compatible mode.


   Shortcut
       -m     same as --decode=mime

       -mB    same as --decode=mime_b

       -mQ    same as --decode=qencode

       -m0    same as --decode=none

       -M     same as --encode=mime_b

       -MB    same as --encode=base64

       -MQ    same as --encode=qencode

   End of line control options
       --lineend-thru
              Output end-of-line codes as they are. Also output ^Z codes as
              they are. This is the default.

       --lineend-cr --lineend-mac -Lm
              Use CR as the end-of-line code. Also delete ^Z codes from the
              input stream.

       --lineend-lf --lineend-unix -Lu
              Use LF as the end-of-line code. Also delete ^Z codes from the
              input stream.

       --lineend-crlf --lineend-windows -Lw
              Use CR+LF as the end-of-line code. Also delete ^Z codes from
              the input stream. This option doesn't preserve the original
              order of CR and LF.

       --input-cr
              Assume the input stream uses CR as the end-of-line code.

       --input-lf
              Assume the input stream uses LF as the end-of-line code.

       --input-crlf
              Assume the input stream uses CR+LF as the end-of-line code.

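       The --lineend-* modes above can be approximated in Python. In this
       simplified sketch (the mode names are hypothetical), 'thru' returns
       the text untouched. Every other mode deletes ^Z (0x1A) and rewrites
       each CR+LF, lone CR or lone LF to the chosen end-of-line code. CR+LF
       is matched first, so it counts as one line end.

```python
import re

EOL = {"cr": "\r", "lf": "\n", "crlf": "\r\n"}

def convert_lineends(text: str, mode: str) -> str:
    """Sketch of --lineend-*: 'thru' keeps everything, other modes normalize."""
    if mode == "thru":
        return text
    text = text.replace("\x1a", "")                  # delete ^Z codes
    return re.sub(r"\r\n|\r|\n", EOL[mode], text)     # CRLF before lone CR/LF
```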
       -F[line_length[-kinsoku]]

       -f[line_length[-kinsoku]] -f[line_length[+kinsoku]]
              Wrap input lines at line_length columns. The f option deletes
              CR/LFs in the input, and the F option doesn't delete them. For
              Japanese conventions, both gyoutou-kinsoku (by burasage-gumi)
              and gyoumatsu-kinsoku (by oidasi-gumi) are supported. The
              burasage length is controlled by the kinsoku option. The
              default value for line_length is 66, and it must be < 1000.
              The default value for kinsoku is 5, and it must be <= 10. With
              the 'f' option, skf autodetects paragraphs and retains some
              CR/LFs. The second 'f' option format (with '+') disables this
              behavior. In nkf-compatible mode, some fold behaviors change
              as follows.
              (1) The default line_length is 60, and the kinsoku value is
              10.
              (2) Alphanumeric characters become gyoutou-kinsoku characters.

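       Folding with kinsoku can be sketched in Python under these
       assumptions:
         - The kinsoku character sets are a small hypothetical sample.
         - Width comes from Unicode East Asian Width (W/F = 2 columns).
         - kinsoku is the number of extra columns a line-start-prohibited
           character may hang past the margin (burasage).
       Paragraph detection and the nkf-mode rules are not modeled.

```python
import unicodedata

# Hypothetical, abbreviated kinsoku sets; real tables are much larger.
NO_LINE_START = set("、。，．」』）！？ー")  # gyoutou-kinsoku characters
NO_LINE_END = set("「『（")                 # gyoumatsu-kinsoku characters

def width(ch: str) -> int:
    """Display columns: East Asian Wide/Fullwidth characters take two."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def fold(text: str, line_length: int = 66, kinsoku: int = 5) -> list:
    lines, cur, cols = [], "", 0
    for ch in text:
        w = width(ch)
        if cur and cols + w > line_length:
            if ch in NO_LINE_START and cols + w <= line_length + kinsoku:
                # burasage: let the punctuation hang past the right margin
                cur, cols = cur + ch, cols + w
                continue
            # oidashi: push trailing opening brackets onto the next line
            carry = ""
            while len(cur) > 1 and cur[-1] in NO_LINE_END:
                carry, cur = cur[-1] + carry, cur[:-1]
            lines.append(cur)
            cur, cols = carry, sum(width(c) for c in carry)
        cur, cols = cur + ch, cols + w
    if cur:
        lines.append(cur)
    return lines
```

       With line_length=10 and kinsoku=2, "あいうえお。" stays on one line
       because the final "。" is allowed to hang. With kinsoku=0, the "。"
       wraps to a line of its own.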
   File control options
       --filewise-detect --force-reset
              Reset and re-detect the input codeset at the start of each
              file.

       --linewise-detect
              Reset and re-detect the input codeset at the start of each
              line.


   Compatibility options
       --nkf-compat
              Interpret the following options in nkf-compatible manners:
              -l, -d, -c, -x, -X, -w and -W work as in nkf 2.x, and the -f
              and -F behaviors change as shown above. -T, -i and -o are not
              supported. Most other nkf options and switches also work as in
              nkf, except in case of errors.

       --skf-compat
              Interpret the following options in skf-native manners.

       -r     nkf-compatible rot. Works only in --nkf-compat mode. Allowed
              input encodings are limited to JIS/Shift_JIS/EUC.

       -h[123] --hiragana --katakana --katakana-hiragana
              -h, -h1 and --hiragana convert all kana to hiragana. -h2 and
              --katakana convert all kana to katakana. -h3 and
              --katakana-hiragana swap katakana and hiragana.

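       In Unicode, the hiragana letters U+3041..U+3096 and the katakana
       letters U+30A1..U+30F6 sit exactly 0x60 apart, so the three -h modes
       can be sketched as code point shifts. This Python illustration is not
       skf's code. It ignores half-width kana and katakana-only letters such
       as U+30F7..U+30FA.

```python
HIRA_START, HIRA_END = 0x3041, 0x3096   # ぁ .. ゖ
OFFSET = 0x60                            # katakana = hiragana + 0x60

def is_hira(c: str) -> bool:
    return HIRA_START <= ord(c) <= HIRA_END

def is_kata(c: str) -> bool:
    return HIRA_START + OFFSET <= ord(c) <= HIRA_END + OFFSET

def to_katakana(text: str) -> str:      # like -h2 / --katakana
    return "".join(chr(ord(c) + OFFSET) if is_hira(c) else c for c in text)

def to_hiragana(text: str) -> str:      # like -h1 / --hiragana
    return "".join(chr(ord(c) - OFFSET) if is_kata(c) else c for c in text)

def swap_kana(text: str) -> str:        # like -h3 / --katakana-hiragana
    return "".join(to_katakana(c) if is_hira(c) else to_hiragana(c) for c in text)
```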
       --nkf-help
              Show option differences/compatibility between skf and nkf.

       --in-place[=SUF] --overwrite[=SUF]
              Replace the specified file with the converted codeset.
              --overwrite retains the file timestamp. If a suffix is given,
              the suffix is added to the output file name and the input file
              is not removed.


   Lightweight language specific options
       The skf plugins for lightweight languages support a subset of the
       options. More specifically, file input/output related options (-b,
       -u, --overwrite, --in-place, --filewise-detect, --linewise-detect,
       --show-filename, --suppress-filename) are disabled, and so is UTF-16
       output (except for Ruby and Python 3).


     Ruby-1.9.x/2.x specific options
       Since Ruby 1.9, Ruby uses CCS string handling, and skf returns the
       output string in the specified codeset. The following options
       override this behavior.

       --rb-out-ascii8bit
              Return the string with ASCII-8BIT encoding.

       --rb-out-string
              Return the string with the specified encoding.

     Python-3.x specific options
       Since the native string representation in Python 3.x is UCS2/UCS4,
       skf behaves differently depending on the output codeset option. If
       the output codeset is either UTF-16 or UTF-32 (in wide mode), skf
       returns a Unicode object; for all other codesets, skf returns a
       binary array object. The following options change this behavior.

       --py-out-binary
              Use a pseudo-Unicode binary stream for output.

       --py-out-string
              Use a binary array object for UTF-16/32 output. BOM is
              enabled.
              skf accepts either a binary array or a Unicode object as
              input.


   Misc. Control options
       --disable-space-convert --enable-space-convert
              skf can convert an ideographic space into two ASCII spaces.
              The disable option disables this behavior and the enable
              option enables it. Default is disabled.

       --html-sanitize
              Convert several characters in HTML documents to entity
              reference expressions. Specifically, "!#$&%()/<>:;?´ are
              escaped as entity references.

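       A Python sketch of --html-sanitize. The exact reference form skf
       emits is not stated here; this sketch assumes numeric character
       references and treats the final ´ in the list as the ASCII
       apostrophe. Working one character at a time means an '&' produced by
       an earlier replacement is never escaped a second time.

```python
# Characters from the --html-sanitize list (´ assumed to mean the ASCII apostrophe).
SANITIZE = set("\"!#$&%()/<>:;?'")

def html_sanitize(text: str) -> str:
    """Replace each listed character with a numeric character reference."""
    return "".join("&#%d;" % ord(c) if c in SANITIZE else c for c in text)
```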
       --filewise-detect --force-reset
              If multiple input files are given, detect the input codeset
              for each file.

       --linewise-detect
              Detect the input code line by line. Note that this option
              weakens code detection accuracy.

       --reset
              Reset all flags specified by extended controls and environment
              variables.

       --inquiry --guess
              skf detects the code and writes the detection result to
              stdout. No filtered output is produced. If multiple input
              files are given, --show-filename is automatically enabled.

       --hard-inquiry
              Similar to --inquiry, but reports both the code and the
              end-of-line character.

       --suppress-filename
              When inquiry (--inquiry) is on, this option disables file name
              output. This option overrides --show-filename.

       --show-filename
              When inquiry (--inquiry) is on, this option adds each file
              name to the output.

       --invis-strip
              Delete all escape sequences not belonging to ISO-2022 code
              extension. This is intended to replace the invisstrip command
              bundled in the inews package.

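       A Python approximation of --invis-strip follows the ISO 2022
       (ECMA-35) sequence structure. This is an assumption about which
       sequences count as "code extension", not skf's actual rule set.
         - Kept: designations (ESC, intermediate bytes 0x20-0x2F, final
           byte 0x30-0x7E) and the escape-form shift functions ESC N/O/n/o
           and ESC ~/}/|.
         - Removed: control sequences such as ESC [ 31 m, and any other
           escape.

```python
import re

# ISO 2022 code extension: designations plus SS2/SS3/LS2/LS3/LS1R/LS2R/LS3R.
CODE_EXT = r"\x1b[\x20-\x2f]+[\x30-\x7e]|\x1b[NOno~}|]"
# Control sequence (CSI): ESC [ parameters, intermediates, final byte.
CSI = r"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"
TOKEN = re.compile("(" + CODE_EXT + ")|" + CSI + r"|\x1b[\x30-\x7e]?")

def invis_strip(text: str) -> str:
    """Keep code-extension escapes; drop every other escape sequence."""
    return TOKEN.sub(lambda m: m.group(1) or "", text)
```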
       -I     Warn if the input has unassigned code points.

       -v     Print version information and exit.

       --help Print brief help and exit.

       --show-supported-codeset
              Display supported codesets (input) and exit. Both canonical
              names (left side) and detailed names are shown. The canonical
              name can be used as a MIME charset and also as an ic-option
              code specification.

       --show-supported-charset
              Display supported character sets (output) and exit. Both
              canonical names and detailed names are shown. Some charsets
              that need special treatment (i.e. that are meaningless as
              set-g* parameters) intentionally lack addressable cnames.



FILES

       /usr/(local/)share/skf/lib/   (Unices)

       /Program Files/skf/share/lib (MS Windows)
              These directories are where external codeset conversion
              tables go. The location that the current skf assumes is shown
              by the -h option.



AUTHOR

       skf is written by Seiji Kaneko (efialtes@osdn.jp) based on ideas from
       nkf, written by Itaru Ichikawa (ichikawa@flab.fujitsu.co.jp). The
       X 0213 code table is derived from the work of earthian@tama.or.jp.
       Some codeset mappings are derived from various sources; their
       detailed origins are shown in the copyright document included in
       this distribution.



ACKNOWLEDGEMENT

       skf is inspired by works or requests by shinoda@cs.titech,
       kato@cs.titech, uematsu@cs.titech, void@global, ohta@ricoh,
       Hinata(HKE), Ashizawa(CRL), Kunimoto(SDL), Oohara(Univ of Kyoto),
       Jokagi(elf2000) and Naruse (at osdn.jp). Thanks.



BUGS AND LIMITATIONS

       1. skf can handle mixed coding with some limitations. However, code
       detection tends to fail for mixed code, so giving the input codeset
       explicitly is strongly encouraged if it is known beforehand.
       If needed, the --linewise-detect option may help, but code detection
       will be more likely to fail.

       2. skf implements ISO-2022 with the following exceptions.
        i) GL 0x20 is always a space, even when a 96-character codeset is
       invoked to GL.
        ii) Sequences for setting codes to C1 and C2 are always ignored.
        iii) If an unknown sequence is given for G0, G0 is set to ASCII and
       any locking/single shift is cleared. Unknown sequences for G1-G3 are
       just ignored.
        Private charsets are also not supported and are ignored.
        iv) Sequences for 96-character multibyte codings are ignored
       (currently, no such codeset is registered).
        v) Calling the UTF-8 and UTF-16 coding systems from ISO-2022 is
       supported, and the standard return sequence returns to the previous
       coding system.
        Calls and returns to/from other coding systems are ignored.
        vi) To support some cellular phone glyphs, several private (not
       registered) codesets are defined in skf and can be called by the
       appropriate sequences.

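       Exceptions iii) and iv) concern how designation sequences update the
       G0-G3 state. They can be modeled in a few lines of Python. The
       final-byte tables are small hypothetical subsets, '~' is used only as
       a final byte missing from those tables, and shifts and invocation are
       not modeled.

```python
# Hypothetical subsets of registered final bytes.
SETS_94 = {"B": "ascii", "J": "jis-x0201-roman", "I": "jis-x0201-kana"}
SETS_94N = {"@": "jis-c6226", "B": "jis-x0208", "D": "jis-x0212"}
SLOTS = "()*+"  # intermediate byte selecting G0, G1, G2, G3

def apply_escape(state: dict, seq: str) -> dict:
    """Apply one escape sequence (the text after ESC) per rules iii) and iv)."""
    if seq.startswith("$"):
        rest, table = seq[1:], SETS_94N
        if rest[:1] in ("-", ".", "/"):
            return state                      # iv) 96-char multibyte: ignored
        if rest[:1] and rest[0] in SLOTS:
            slot, final = SLOTS.index(rest[0]), rest[1:]
        else:
            slot, final = 0, rest             # ESC $ @ / ESC $ B short form
    elif seq[:1] and seq[0] in SLOTS:
        slot, final, table = SLOTS.index(seq[0]), seq[1:], SETS_94
    else:
        return state                          # not a designation
    charset = table.get(final)
    if charset is not None:
        state["G%d" % slot] = charset
    elif slot == 0:
        state.update(G0="ascii", shift=None)  # iii) unknown G0: ASCII, clear shift
    # iii) unknown G1-G3 designations fall through and are ignored
    return state
```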
       3. The error output coding is controlled by the LOCALE environment
       variables on UN*X systems. skf doesn't handle situations such as
       stdout and stderr being redirected into the same stream; such cases
       should be handled on the user side.

       4. skf converts KEIS/JIS X 0213 codes using the CJK Extension B area
       and the CJK compatibility area. For this reason, X 0213 and KEIS
       conversion results vary depending on the --use-compat and
       --limit-to-ucs2 switches.

       5. JIS X 0207:1979 is not supported. JIS X 0211:1987 is designed to
       be supported (i.e. common terminal control sequences are passed
       transparently to the output).

       6. Even if the unbuffer option (-u) is specified, some
       code-translation related buffering is still performed (in MIME,
       kana, VIQR etc.).

       7. skf-1.9x or later recognizes and handles languages in ISO 639-1
       (alpha-2). ISO 639-2 is not supported as a valid language set.

       8. Unicode IVS is not supported. Variation sequences are just
       discarded.

       9. skf-1.9x or later does not retain the Macintosh RLO-ordered
       character property. Codesets with this kind of code are not
       supported.



Notes

       1. Extended options have changed extensively since skf-1.9. Some
       archaic options (e.g. -B, -@ and -r) have been deleted from this
       version.

       2. skf was originally forked from nkf, but no longer contains any
       nkf code. The copyright notice is retained as an honor.

       3. Since version 1.9, the default Japanese character set assumed by
       skf has been JIS X 0208:1990 with Microsoft Japanese Windows gaiji
       (i.e. CP932).

       4. Code autodetection is not perfect by design. If it fails to
       detect the input code properly, please specify the input code
       explicitly.

       5. Some ligatures in Unicode, cp932 gaiji and KEIS83 are converted
       using JIS X 0124 and other conventions. During this conversion, the
       byte length is not preserved.

       6. skf is intended to pass ANSI-compatible terminal control codes
       transparently, but this is not guaranteed.

       7. nkf's -i and -o options work only in nkf-compat mode. They are
       obsolete as of 1.97, and are valid only for iso-2022-jp, without
       considering output codeset specifications.

       8. For an unconverted character, skf outputs geta as the undefined
       character, in the same way as the --use-replace-char option. If the
       output codeset doesn't contain the geta code, skf prefers the
       'black square' character, and then falls back to '.'.

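       The fallback order in note 8 (geta, then black square, then '.') can
       be sketched in Python. Python codecs stand in for "the output codeset
       contains this code", and the helper name is hypothetical.

```python
GETA = "\u3013"          # 〓 GETA MARK
BLACK_SQUARE = "\u25a0"  # ■ BLACK SQUARE

def undefined_char(codec: str) -> str:
    """Pick the undefined-character glyph for an output codec, per note 8."""
    for candidate in (GETA, BLACK_SQUARE, "."):
        try:
            candidate.encode(codec)
            return candidate
        except UnicodeEncodeError:
            continue
    raise LookupError("codec cannot encode any fallback character")
```

       For example, EUC-JP has the geta mark, code page 437 has only the
       black square, and plain ASCII falls through to '.'.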
       9. There are some undocumented options. These options should be
       considered highly experimental.

       10. In lineend_thru mode with folding, skf remembers the order in
       which CR and LF appear in the stream and uses that order. Because of
       this design, if skf needs to output a line-end character before any
       line-end character appears in the input stream, the input order may
       not be preserved.

       11. NKF compatibility
       1) --prefix, some --fb's and --no-best-fit-chars are not supported.
       2) MSDOS (and -T), --exec-in and --exec-out are not supported.
       3) MIME decoding/encoding behaviors differ in various ways.
       4) Line-end conversion acts differently. Results may not be the same
       for some messy text.
       5) The -r option and --decode=rot are different. See each option's
       description.
       6) Detected codeset names are not compatible with nkf. --help and
       --version return different results.
       7) In-place and overwrite suffixes with * are not supported.

       12. Conversion to NYUUKAN GAIJI is as follows.
       1) Kanji codes in JIS X0208(1997), JIS X0212(1990), JIS
       X0213(2004/2012) and Houmusho-kokuji No.582 beppyou No.1 are sent to
       the output as is.
       2) Kanji codes in the leftmost columns of beppyou No.4-2 are
       converted to the first-priority character in the table. If
       second-priority characters appear, those codes are sent to the
       output as is.
       3) Other kanji codes are converted as undefined codes; see the
       conversion method above. Non-kanji codes (Latin characters, glyphs
       etc.) are sent to the output as is.

       13. ARIB B24 compatibility
       1) Input only. ARIB B24 output is not supported.
       2) Neither international encoding nor the X0213 extension is
       supported.
       3) Macro define sequences are suppressed. These sequences are
       recognized and discarded.
       4) Without specifying the arib codeset, skf treats ARIB-defined
       codepages as follows.
         i) Private codepages are supported. ASCII/JIS X 0201 0x5f is not
       modified.
         ii) Macro define/invoke and RPC invoke do not work. These
       characters are discarded.



Notice

       Unicode(TM) is a trademark of Unicode, Inc. Microsoft and Windows are
       registered trademarks of Microsoft Corporation. Macintosh is a
       registered trademark of Apple Inc. Vodafone is a trademark of
       Vodafone K.K. Other names and terms may be trademarks or registered
       trademarks of their respective owners. The trademark symbol (TM) may
       be omitted in this manual page.




                                  10/Aug/2018                           SKF(1)