skf(1) - f36

SKF(1)                      General Commands Manual                     SKF(1)

NAME

       skf - simple Kanji Filter (v2.1)

SYNOPSIS

       skf [-EIJKNQRSXZbehjknqrsuvxz] [long_format_options] [infiles...]

DESCRIPTION

       skf is yet another i18n-capable kanji filter, designed for reading
       various CJK-coded files on the Net. skf converts input kanji texts or
       streams into a character stream in the designated codeset and writes
       it to standard output. Specifically, skf is designed to be a versatile
       filter for reading documents in various codesets, and does not provide
       features unrelated to code conversion.

       Like nkf, skf automatically recognizes the input encoding when it is
       ISO-2022 compliant, and also detects EUC-variant encodings if the
       input is Japanese text without X 0201 kana. skf 2.1 can read various
       ISO-2022 compliant character sets, including JIS Kanji codes (X 0208,
       X 0212 and X 0213), EUC encodings (euc-jp (with X 0213 support),
       euc-cn, euc-kr and euc-tw), ISO European Latin sets (ISO-8859-1 to 11,
       13/14/15/16) and many regional character sets. skf can also read some
       non-ISO-2022 compliant sets, including Microsoft Shift-JIS,
       KOI-8-R/U, GB2312 (HZ), Big5, VISCII (RFC 1456, including VIQR), the
       Unicode standard (UCS2/UTF-16, UTF-7 and UTF-8), some Microsoft
       codesets (cp1250 etc.) and some other vendor-specific codes (KEIS83,
       JEF etc.).

       The output character sets supported by skf are more limited, but
       still include X 0208/X 0212/X 0213 JIS, X 0201 JIS, ASCII, Microsoft
       Shift-JIS, EUC-jp/-kr/-cn, HZ, iso-2022-jp/kr, Big5, VISCII and
       Unicode.

       skf also provides basic decoding for some common encodings, including
       MIME, Punycode and URI code points. Unicode decomposition has been
       supported since 1.96.

       As noted above, skf is designed to convert input text into
       human-readable form for a local environment (i.e. codeset), and has
       several extra conversion features such as GNU recode-style folding.
       These include Windows/Macintosh-specific code swaps, old/new JIS glyph
       changes, HTML-format/TeX-format conversion and variant unification.

       skf can also be compiled as an extension for some lightweight
       languages. See README.txt for details.

       If one or more file names are given, skf reads the files and writes
       the converted stream to stdout. If no file names are given, input is
       taken from stdin and output still goes to stdout. Options are taken
       from the environment variables SKFENV and skfenv and then from the
       command line, in that order. Environment variables are not used when
       skf is running as a privileged user. skf does not use locale-related
       environment variables for conversion, but the language of error
       messages is controlled by the locale settings.
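       The option order described above can be pictured as a single list
       built from SKFENV, then skfenv, then the command line. The Python
       sketch below is a hypothetical model, not skf's code: the function
       collect_options is invented for illustration, and it assumes that the
       variables hold shell-style option strings, that options are applied
       left to right (so later ones may override earlier ones), and that a
       "privileged user" means an effective UID of 0.

```python
import os
import shlex
from typing import List, Mapping, Optional, Sequence


def collect_options(argv: Sequence[str],
                    env: Mapping[str, str] = os.environ,
                    privileged: Optional[bool] = None) -> List[str]:
    """Hypothetical model of skf's option order: SKFENV, skfenv, argv.

    Assumption: each variable holds shell-style options, and options are
    applied left to right, so later ones can override earlier ones.
    """
    if privileged is None:
        # Assumption: "privileged" means effective UID 0 (POSIX only).
        privileged = hasattr(os, "geteuid") and os.geteuid() == 0
    options: List[str] = []
    if not privileged:
        # Environment variables are skipped entirely for privileged users.
        for name in ("SKFENV", "skfenv"):
            options.extend(shlex.split(env.get(name, "")))
    options.extend(argv)
    return options


if __name__ == "__main__":
    fake_env = {"SKFENV": "--oc=utf8", "skfenv": "-b"}
    print(collect_options(["--ic=sjis", "in.txt"], fake_env, privileged=False))
    print(collect_options(["--ic=sjis", "in.txt"], fake_env, privileged=True))
```

       Passing privileged=True mirrors the rule that environment variables
       are ignored for privileged users.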

CODESET OPTIONS

       skf is written from scratch and inherits no code from nkf. However,
       skf is intended to be a drop-in replacement for nkf (v1.4) and
       supports a similar set of commonly used nkf options.
       skf 2.1 recognizes the following options. All options are off by
       default unless explicitly stated otherwise.

   buffering control
       -b     use buffered output. This is the default.

       -u     use unbuffered output. Code detection is disabled when this
              option is on.

   Input/Output codeset options
       --ic=input_code_set
              specify that the input codeset is input_code_set. Possible
              candidates are listed below.

       --oc=output_code_set
              specify that the output codeset is output_code_set. Possible
              candidates are listed below. The default codeset in the
              distribution package is euc-jp, but it depends on compile
              options. The default codeset is shown by 'skf -h'.

     Supported codeset
       skf recognizes the following codesets as input/output codesets.
       Codeset names are case-insensitive, and minus ('-') and underscore
       ('_') are ignored. Note that ISO-2022 escape-based input codesets
       (registered with IANA) are recognized automatically, even when a
       non-ISO-2022 codeset (except Unicode and B-Right/V) is specified. An
       'o' in the 'in' column means the named codeset can be specified as
       input, and an 'x' means it cannot. The 'out' column has the same
       meaning for output.

       in out  name            description
       o  o    iso8859-1       ascii + iso-8859-1 (latin-1)
       o  o    iso8859-2       ascii + iso-8859-2 (latin-2)
       o  o    iso8859-3       ascii + iso-8859-3 (latin-3)
       o  o    iso8859-4       ascii + iso-8859-4 (latin-4)
       o  o    iso8859-5       ascii + iso-8859-5 (Cyrillic)
       o  o    iso8859-6       ascii + iso-8859-6 (Arabic)
       o  o    iso8859-7       ascii + iso-8859-7 (Greek)
       o  o    iso8859-8       ascii + iso-8859-8 (Hebrew)
       o  o    iso8859-9       ascii + iso-8859-9 (latin-5)
       o  o    iso8859-10      ascii + iso-8859-10 (latin-6)
       o  o    iso8859-11      ascii + iso-8859-11 (Thai)
       o  o    iso8859-13      ascii + iso-8859-13 (Baltic Rim)
       o  o    iso8859-14      ascii + iso-8859-14 (Celtic)
       o  o    iso8859-15      ascii + iso-8859-15 (Latin-9)
       o  o    iso8859-16      ascii + iso-8859-16 (Latin-10)
       o  o    koi-8r          koi-8r (Russian)
       o  o    cp1251          Microsoft cp1251 (Cyrillic)
       o  o    jis             iso-2022-jp (RFC 1468, 7-bit JIS)
       o  o    iso-2022-jp-x0213 iso-2022-jp-3 (JIS X 0213:2000)
                               a.k.a. jis-x0213
       o  o    jis-x0213-strict iso-2022-jp-3-strict
       o  o    iso-2022-jp-2004 iso-2022-jp-2004 (JIS X 0213:2004)
                               a.k.a. jis-x0213-2004
       o  o    oldjis          iso-2022-jp-1978 (JIS X 0208:1978)
       o  o    cp50220         Microsoft codepage 50220
       o  o    cp50221         Microsoft codepage 50221
       o  o    cp50222         Microsoft codepage 50222
       o  o    euc-jp          EUC-encoded JIS X 0208:1997
       o  o    euc-x0213       EUC-encoded JIS X 0213:2000
       o  o    euc-jis-2004    EUC-encoded JIS X 0213:2004
       o  o    cp51932         EUC-encoded Microsoft codepage 932
       o  o    euc-kr          EUC-encoded KS X 1001 Korean
       o  o    euc7-kr         7-bit EUC-encoded KS X 1001 Korean
       o  o    uhc             Unified Hangul Code (Windows cp949)
       o  o    johab           KS X 1001-johab Korean
       o  o    euc-cn          EUC-encoded GB2312 Chinese
       o  o    euc7-cn         7-bit EUC-encoded GB2312 Chinese
       o  o    hz              HZ-encoded GB2312 Chinese
       o  o    euc-tw          EUC-encoded CNS 11643 Chinese
       o  o    gb12345         EUC-encoded GB12345 Chinese
       o  o    gbk             GB2312 Extension (cp936) Chinese
       o  o    gb18030         GB18030 Chinese
       o  o    big5            BIG5 (with Eten extension + EURO)
       o  o    cp950           BIG5 (Microsoft cp950 + EURO)
       o  o    big5-hkscs      BIG5 with HKSCS
       o  o    big5-2003       BIG5-2003
       o  o    big5-uao        BIG5-Unicode at On
       o  o    sjis            Shift-jis (Microsoft cp943)
       o  o    shiftjis-x0213  Shiftjis-encoded JIS X 0213:2000
       o  o    shiftjis-2004   Shiftjis-encoded JIS X 0213:2004
       o  o    sjis-docomo     Shiftjis-encoded with NTT Docomo emoticons
       o  o    sjis-au         Shiftjis-encoded with AU emoticons
       o  o    sjis-softbank   Shiftjis-encoded with SoftBank emoticons
       o  o    oldsjis         Shift-jis (JIS X 0208:1978)
       o  o    cp932           Shift-jis-encoded MS cp932
       o  o    cp932w          Shift-jis-encoded MS cp932 with
                               MS compatibility
       o  o    viscii          VISCII (RFC 1456) Vietnamese
       o  o    viqr            VISCII (RFC 1456-VIQR) Vietnamese
       o  o    keis            Hitachi KEIS83/90
       o  x    jef             Fujitsu JEF (basic support only)
       o  x    ibm930          IBM EBCDIC DBCS Japanese
       o  x    ibm931          IBM EBCDIC DBCS Japanese with Latin
       o  x    ibm933          IBM EBCDIC DBCS Korean
       o  x    ibm935          IBM EBCDIC DBCS Simplified Chinese
       o  x    ibm937          IBM EBCDIC DBCS Traditional Chinese
       o  o    unicode         Unicode(TM) UTF-16LE
       o  o    unicodefffe     Unicode(TM) UTF-16BE
       o  o    utf7            Unicode(TM) UTF-7
       o  o    utf8            Unicode(TM) UTF-8
       o  o    utf7-imap       IMAP modified Unicode(TM) UTF-7 (RFC 2060)
       o  o    mutf8           Java modified Unicode(TM) UTF-8
       o  o    cesu8           CESU-8 (Unicode Technical Report #26)
       x  o    nyukan-utf-8    Nyukan-moji (Japanese nyukoku-kanrikyoku
                               gaiji), UTF-8 encoding
       x  o    nyukan-utf-16   Nyukan-moji, UTF-16 encoding
       o  x    arib-b24        ARIB B24 8-bit JIS-based
       o  x    arib-b24-sj     ARIB B24 8-bit SJIS-based
       x  o    transparent     Transparent mode (see below)
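       The name-matching rule above (case-insensitive, with '-' and '_'
       ignored) amounts to normalizing a name before looking it up, so that
       EUC_JP, euc-jp and eucjp all select the same codeset. A minimal Python
       sketch, assuming a hypothetical lookup table that copies only three
       rows of the table above (it is not skf's implementation):

```python
from typing import Optional, Tuple


def normalize_codeset_name(name: str) -> str:
    """Fold case and drop '-' and '_', per the matching rule above."""
    return name.lower().replace("-", "").replace("_", "")


# Hypothetical excerpt of the table: normalized name -> (input?, output?).
CODESETS = {
    normalize_codeset_name(name): (can_in, can_out)
    for name, can_in, can_out in [
        ("euc-jp", True, True),
        ("jef", True, False),
        ("transparent", False, True),
    ]
}


def lookup_codeset(name: str) -> Optional[Tuple[bool, bool]]:
    """Return (usable_as_input, usable_as_output), or None if unknown."""
    return CODESETS.get(normalize_codeset_name(name))


print(lookup_codeset("EUC_JP"))  # (True, True)
print(lookup_codeset("JEF"))     # (True, False)
```

       Under this rule, iso8859-1 and iso-8859-1 also normalize to the same
       key.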


     Codeset explanations
       iso-8859-*
              When specified as output, G0 = GL is ASCII and G1 = GR is
              iso-8859-*. 8-bit encoding is used.

       iso-2022-jp, jis
              Encoding is iso-2022-jp-2 (RFC 1554). G0 = GL is JIS X 0201
              Roman, G1 = GR is JIS X 0201 kana, G2 is iso-8859-1 and G3 is
              JIS X 0212:1990 Supplementary Kanji.
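       Automatic recognition of ISO-2022 input depends on the escape
       sequences that designate character sets into G0-G3. As a rough
       illustration only (not skf's detector), the Python sketch below scans
       a byte string for a small, assumed subset of standard designations;
       Python's built-in iso2022_jp codec is used just to produce sample
       input.

```python
from typing import List

# A few standard ISO-2022 designation escape sequences (illustrative subset).
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b(I": "JIS X 0201 Katakana",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b$(D": "JIS X 0212-1990",
}


def find_designations(data: bytes) -> List[str]:
    """List the character sets designated in data, in order of appearance."""
    found: List[str] = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:  # ESC
            for seq, name in DESIGNATIONS.items():
                if data.startswith(seq, i):
                    found.append(name)
                    i += len(seq)
                    break
            else:
                i += 1  # unknown escape sequence: step past the ESC byte
        else:
            i += 1
    return found


sample = "日本".encode("iso2022_jp")
print(find_designations(sample))  # ['JIS X 0208-1983', 'ASCII']
```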

       jis-x0213, iso-2022-jp-3
              Encoding is iso-2022-jp-3 (JIS X 0213:2000 based). G0 = GL is
              JIS X 0201 Roman. For output, G1 = GR is JIS X 0201 kana, G2 is
              iso-8859-1 and G3 is JIS X 0213 plane 2 Kanji.

       jis-x0213-strict
              Encoding is a subset of iso-2022-jp-3-strict (uses plane 1
              only). For output, G0 = GL is JIS X 0201 Roman, G1 = GR is JIS
              X 0201 kana, G2 is iso-8859-1 and G3 is not set. Output uses
              JIS X 0208 whenever possible. JIS X 0213 input is recognized
              automatically.

       jis-x0213-2004, iso-2022-jp-2004
              Encoding is iso-2022-jp-2004 (JIS X 0213:2004 based). For
              output, G0 = GL is JIS X 0201 Roman, G1 = GR is JIS X 0201
              kana, G2 is iso-8859-1 and G3 is JIS X 0213 plane 2 Kanji.

       oldjis
              Encoding is iso-2022-jp using the old JIS X 0208:1978. G0 = GL
              is JIS X 0201 Roman, G1 = GR is JIS X 0201 kana, G2 is
              iso-8859-1 and G3 is JIS X 0212 Supplementary Kanji.

       euc-jp, euc
              Encoding is 8-bit EUC using the JIS X 0208:1997 character set.
              G0 = GL is ASCII, G1 = GR is JIS X 0208, G2 is JIS X 0201 kana
              and G3 is JIS X 0212 Supplementary Kanji.
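       The euc-jp layout above determines how a byte stream splits into
       characters: a byte below 0x80 is G0, two GR bytes are G1, SS2 (0x8E)
       plus one byte is G2, and SS3 (0x8F) plus two bytes is G3. The Python
       sketch below shows that split and the high-bit relation between a
       7-bit JIS X 0208 code and its G1 bytes; the function names are
       hypothetical and the code only checks sequence length, not byte
       validity.

```python
from typing import List, Tuple

SS2, SS3 = 0x8E, 0x8F  # EUC single-shift codes for G2 and G3


def split_euc_jp(data: bytes) -> List[Tuple[str, bytes]]:
    """Split EUC-JP bytes into (graphic_set, bytes) tuples."""
    out: List[Tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            size, gset = 1, "G0"   # ASCII
        elif b == SS2:
            size, gset = 2, "G2"   # SS2 + one JIS X 0201 kana byte
        elif b == SS3:
            size, gset = 3, "G3"   # SS3 + two JIS X 0212 bytes
        else:
            size, gset = 2, "G1"   # two GR bytes: JIS X 0208
        if i + size > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        out.append((gset, data[i:i + size]))
        i += size
    return out


def jis_to_euc_g1(row_byte: int, cell_byte: int) -> bytes:
    """A 7-bit JIS X 0208 code becomes G1 bytes by setting each high bit."""
    return bytes([row_byte | 0x80, cell_byte | 0x80])


print(split_euc_jp("A日ｱ".encode("euc_jp")))
# [('G0', b'A'), ('G1', b'\xc6\xfc'), ('G2', b'\x8e\xb1')]
```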

       euc-x0213, euc-jis-2003
              Encoding is 8-bit EUC-based JIS X 0213:2000. G0 = GL is ASCII,
              G1 = GR is JIS X 0213:2000 plane 1, G2 is iso-8859-1 and G3 is
              JIS X 0213:2000 plane 2 Kanji.

       euc-jis-2004
              Encoding is 8-bit EUC-based JIS X 0213:2004. G0 = GL is ASCII,
              G1 = GR is JIS X 0213:2004 plane 1, G2 is iso-8859-1 and G3 is
              JIS X 0213:2004 plane 2 Kanji.

       euc-kr
              Encoding is 8-bit EUC using the KS X 1001 Wansung character
              set. G0 = GL is KS X 1003, G1 = GR is KS X 1001, and G2 and G3
              are not set.

       euc7-kr, iso-2022-kr
              Encoding is iso-2022-kr (RFC 1557): 7-bit EUC using the KS X
              1001 Wansung character set. G0 = GL is KS X 1003, G1 is KS X
              1001, and G2 and G3 are not set.

       euc-cn
              Encoding is 8-bit EUC using the GB 2312 simplified Chinese
              character set. G0 = GL is ASCII, G1 = GR is GB 2312, and G2 and
              G3 are not set.

       euc7-cn
              Encoding is 7-bit EUC using the GB 2312 simplified Chinese
              character set. G0 = GL is ASCII, G1 is GB 2312, and G2 and G3
              are not set.

       hz
              Encoding is the HZ-encoded (RFC 1842) GB 2312 simplified
              Chinese character set. G0 = GL is ASCII, G1 = GR is GB 2312,
              and G2 and G3 are not set.
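       HZ (RFC 1842) keeps text 7-bit by wrapping GB 2312 runs in '~{' and
       '~}', writing each GB 2312 byte with its high bit cleared, and writing
       a literal '~' as '~~'. The simplified Python encoder below is a sketch
       built on the standard gb2312 codec; it ignores the RFC's line-length
       and '~' newline rules and is not skf's HZ writer.

```python
def hz_encode(text: str) -> bytes:
    """Simplified HZ (RFC 1842) encoder using Python's gb2312 codec."""
    out = bytearray()
    in_gb = False
    for ch in text:
        if ord(ch) < 0x80:
            if in_gb:
                out += b"~}"           # leave GB mode before ASCII
                in_gb = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            gb = ch.encode("gb2312")   # two bytes, both in 0xA1..0xFE
            if not in_gb:
                out += b"~{"           # enter GB mode
                in_gb = True
            out += bytes(b & 0x7F for b in gb)  # clear the high bit
    if in_gb:
        out += b"~}"
    return bytes(out)


print(hz_encode("HZ: 中文 ~"))  # b'HZ: ~{VPND~} ~~'
```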

       euc-tw
              Encoding is the EUC-encoded CNS 11643 plane 1/2 traditional
              Chinese character set, a subset of iso-2022-cn. G0 = GL is
              ASCII, G1 = GR is CNS 11643 plane 1, G2 is CNS 11643 plane 2
              and G3 is not set.

       gb12345
              Encoding is 8-bit EUC using the GB 12345 (GBF) traditional
              Chinese character set. G0 = GL is ASCII, G1 = GR is GB 12345,
              and G2 and G3 are not set.

       gbk, cp936
              Encoding is the GBK simplified Chinese character set. G0 = GL
              is ASCII and G1 = GR is GBK. G2 and G3 are not set.

       gb18030 (experimental)
              Encoding is the GB18030 (ibm-1392, Windows cp54936) Chinese
              character set. Uses ASCII as the Latin part.

       big5
              Encoding is the Big5 traditional Chinese character set with
              the ETen extension, including the Euro mapping. Uses ASCII as
              the Latin part.

       cp950
              Encoding is the Microsoft cp950 Big5 traditional Chinese
              character set. Uses ASCII as the Latin part.

       big5-hkscs (experimental)
              Encoding is the cp950 Big5 traditional Chinese character set
              with the HKSCS extension. Uses ASCII as the Latin part.

       big5-2003 (experimental)
              Encoding is the Big5-2003 Taiwanese standard traditional
              Chinese character set. Uses ASCII as the Latin part.

       big5-uao (experimental)
              Encoding is the Big5-UAO (http://uao.cpatch.org) traditional
              Chinese character set. Uses ASCII as the Latin part.

       VISCII (experimental)
              Vietnamese VISCII (RFC 1456) character set. Not TCVN 5712.

       VIQR (experimental)
              Vietnamese VISCII character set with VIQR encoding (RFC 1456).

       sjis
              Encoding is the Shift-encoded JIS X 0208:1997 character set.
              Note that this is not cp932. Uses JIS X 0201 Latin as the
              Latin (GL) part.
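       Shift-encoding folds two JIS X 0208 rows into one lead byte
       (0x81-0x9F or 0xE0-0xEF) and spreads the cells over trail bytes
       0x40-0xFC, which leaves 0xA1-0xDF free for single-byte JIS X 0201
       kana. The Python sketch below uses the common JIS-to-Shift_JIS
       arithmetic and checks it against Python's shift_jis codec; it is an
       illustration, not code taken from skf.

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Convert a JIS X 0208 code (j1, j2 in 0x21..0x7E) to Shift_JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("JIS bytes must be in 0x21..0x7E")
    # Two JIS rows share one lead byte; lead bytes jump from 0x9F to 0xE0,
    # skipping 0xA0..0xDF, which single-byte kana use.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:  # odd row: trail byte 0x40..0x9E, skipping 0x7F
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:       # even row: trail byte 0x9F..0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


# 日 is JIS 0x467C and 本 is JIS 0x4B5C.
converted = jis_to_sjis(0x46, 0x7C) + jis_to_sjis(0x4B, 0x5C)
print(converted == "日本".encode("shift_jis"))  # True
```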

       sjis-x0213, shift_jis-2000
              Encoding is Shift-encoded JIS using the JIS X 0213:2000
              character set.

       sjis-x0213-2004, shift_jis-2004
              Encoding is Shift-encoded JIS using the JIS X 0213:2004
              character set. Ten newly defined characters are added, but
              the Unicode mapping is the same as JIS X 0213:2000. Uses JIS X
              0201 Latin as the Latin (GL) part.

       sjis-cellular (experimental)
              Encoding is the Shift-encoded JIS X 0208:1997 character set
              with NTT Docomo/Vodafone (SoftBank) cellular phone glyph
              mapping. Output is not supported.

       cp932, cp932w
              Encoding is Microsoft SJIS cp932 with the NEC/IBM gaiji area,
              based on the Windows XP mapping. Uses ASCII as the Latin (GL)
              part. --use-compat and --use-ms-compat are enabled
              automatically. cp932w provides further WideCharToMultiByte
              compatibility.

       cp51932
              Encoding is Microsoft EUC-based cp51932 with the NEC/IBM gaiji
              area, based on the Windows XP mapping. Uses ASCII as G0 and JIS
              X 0201 kana as the EUC G2 part. G3 is not used for output; for
              input it is JIS X 0212:2000. --use-compat and --use-ms-compat
              are enabled automatically.

       cp50220, cp50221, cp50222
              Encoding is Microsoft JIS-based cp50220, cp50221 or cp50222
              with the NEC/IBM gaiji area, based on the Windows XP mapping.
              For input, skf accepts cp50220, cp50221 and cp50222. Note that
              these codesets are NOT compatible with iso-2022. Uses ASCII as
              the default character set. --use-compat and --use-ms-compat
              are enabled automatically.

       oldsjis
              Encoding is Microsoft SJIS (JIS X 0208:1978, a.k.a. old JIS).
              Uses JIS X 0201 Latin as the Latin (GL) part.

       johab
              Encoding is the KS X 1001 (Johab) character set. Uses KS X
              1003 Latin as the Latin (GL) part.

       uhc
              Encoding is the UHC (cp949) character set. Uses ASCII as the
              Latin (GL) part.

       unicode, unicodefffe, utf16, utf16le
              Encoding is Unicode UTF-16 (v11.0). The default input/output
              byte order is little-endian for unicode and big-endian for
              unicodefffe, and an input byte order mark is recognized. utf16
              and unicodefffe are big-endian; utf16le and unicode are
              little-endian. Output includes a byte order mark by default
              unless --disable-endian-mark is specified. Output covers the
              full UTF-32 range, using surrogate pairs, unless
              --limit-to-ucs2 is specified.
              Note that ucs2 is not supported in the lightweight language
              extensions, for either input or output, because of a
              limitation in the data structures SWIG uses to pass data.
              Specifying ucs2 there generates an error.
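       The byte-order rules above (a leading BOM wins; otherwise the
       codeset's default order applies) and the surrogate pairs used for
       code points above U+FFFF can be sketched in Python as follows. The
       function names and the default_little_endian parameter are
       hypothetical, and the sketch ignores the --endian-protect and
       --input-*-endian overrides.

```python
import codecs
from typing import Tuple


def decode_utf16(data: bytes, default_little_endian: bool = True) -> str:
    """Decode UTF-16, honoring a leading BOM, else using the default order."""
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return data[2:].decode("utf-16-be")
    return data.decode("utf-16-le" if default_little_endian else "utf-16-be")


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


print(decode_utf16(b"\xfe\xff\x00A"))            # A (BOM says big-endian)
print([hex(u) for u in to_surrogates(0x20B9F)])  # ['0xd842', '0xdf9f']
```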

       utf8
              Encoding is UTF-8 encoded Unicode (v11.0). Output does not
              include a byte order mark unless --enable-endian-mark is
              specified. Output covers the full UTF-32 range unless
              --limit-to-ucs2 is specified. By default, CESU-8 is not
              accepted as input; the --enable-cesu8 option enables CESU-8
              input for the UTF-8 converter. CESU-8 output is not
              supported. For UTF-8 input, a byte order mark (BOM) is always
              ignored.

       utf7
              Encoding is UTF-7 encoded Unicode (v11.0). The input/output
              range is limited to UTF-16, and values from U+10000 upward are
              regarded as undefined. A BOM is always ignored on input and
              never written on output.

       utf7-imap
              Modified UTF-7 for the IMAP protocol, described in RFC 2060. A
              BOM is always ignored on input and never written on output.
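       IMAP's modified UTF-7 writes printable ASCII (0x20-0x7E) literally,
       except that '&' becomes '&-'; any other run of characters is converted
       to UTF-16BE, base64-encoded with ',' instead of '/' and without '='
       padding, and enclosed in '&' and '-'. The Python sketch below follows
       those rules from the IMAP specification (RFC 2060, later RFC 3501);
       the function name is hypothetical and this is not skf's encoder.

```python
import base64
from typing import List


def imap_utf7_encode(text: str) -> str:
    """Encode a mailbox name in IMAP modified UTF-7."""
    out: List[str] = []
    pending: List[str] = []  # current run of characters needing base64

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(imap_utf7_encode("~peter/mail/台北/日本語"))
# ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```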

       mutf8
              Modified UTF-8 for the Java language: CESU-8 plus a special
              encoding of U+0000. A BOM is always ignored on input and never
              written on output.

       cesu-8
              Modified UTF-8 described in Unicode Technical Report #26. A
              BOM is always ignored on input and never written on output.
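       CESU-8 differs from UTF-8 only for characters above U+FFFF: each one
       is split into a UTF-16 surrogate pair and each surrogate is written as
       its own 3-byte sequence, giving 6 bytes instead of 4. Java's modified
       UTF-8 (mutf8) additionally writes U+0000 as 0xC0 0x80 so that the
       output never contains a NUL byte. A Python sketch, where the
       java_modified flag is a hypothetical switch for the mutf8 variant
       (this is not skf's encoder):

```python
def cesu8_encode(text: str, java_modified: bool = False) -> bytes:
    """Encode text as CESU-8, or as Java modified UTF-8 if java_modified."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if java_modified and cp == 0:
            out += b"\xc0\x80"  # two-byte NUL form used by Java
        elif cp >= 0x10000:
            v = cp - 0x10000
            for unit in (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)):
                # 'surrogatepass' lets Python emit a lone surrogate as 3 bytes.
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


print(cesu8_encode("\U00020B9F").hex())            # eda182edbe9f
print(cesu8_encode("a\x00b", java_modified=True))  # b'a\xc0\x80b'
```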

       keis (experimental)
              Encoding is Hitachi KEIS83/90. The output range is limited to
              the EBCDIK and JIS X 0208 areas.

       jef (experimental)
              Encoding is Fujitsu JEF. Input only; only the basic part is
              supported.

       ibm930 (experimental)
              Encoding is IBM DBCS Japanese with EBCDIC kana.

       ibm931 (experimental)
              Encoding is IBM DBCS Japanese with EBCDIC Latin (ibm037).

       ibm933 (experimental)
              Encoding is IBM DBCS Korean with the EBCDIC Wansung character
              set.

       ibm935 (experimental)
              Encoding is IBM DBCS Simplified Chinese with EBCDIC Chinese.

       ibm937 (experimental)
              Encoding is IBM DBCS Traditional Chinese with EBCDIC Chinese.

       koi8r
              Russian KOI-8R code.

       cp1250
              Central European Latin Microsoft cp1250 code.

       cp1251
              Eastern European Cyrillic Microsoft cp1251 code.

       arib-b24, arib-b24-sj
              ARIB B24 code defined in ARIB STD-B24 vol. 1 part 2 chapter
              7.3. arib-b24 is 8-bit JIS-based, and arib-b24-sj is
              SJIS-based.

       nyukan-utf-8, nyukan-utf-16
              Normalized Unicode UTF-8/UTF-16 based on Japanese Ministry of
              Justice notice (kokuji) No. 582.

       transparent
              Transparent mode. Various code control features, including
              folding and line-end code conversion, are also ignored.


     Shortcuts
       -j     same as --oc=jis

       -s     same as --oc=sjis

       -e     same as --oc=euc-jp

       -q     same as --oc=unicode

       -z     same as --oc=sjis

       -E     same as --ic=euc-jp. Assume the input codeset is EUC-JP.

       -J     same as --ic=jis. Assume the input codeset is iso-2022-jp.

       -S     same as --ic=sjis. Assume the input codeset is Shift JIS.

       -Q     same as --ic=utf-16 --input-little-endian.

       -Z     same as --ic=utf8.


     ISO-2022 Specific controls
       After G0-G3 have been set up for the specified input codeset, these
       options replace them with the assigned character sets. Note that this
       does not change any codeset properties of the original codeset, such
       as language and encoding.

       --set-g0=`charset name'
              Predefines the specified code set in plane 0 (G0). It is also
              invoked into GL in the initial state.

       --set-g1=`charset name'
              Predefines the specified code set in the right plane (G1). It
              is also invoked into GR in the initial state.

       --set-g2=`charset name'
              Predefines the specified code set in G2.

       --set-g3=`charset name'
              Predefines the specified code set in G3.


       Supported `charset name' values are listed below. 'o' means the
       codeset can be assigned to that plane; 'x' means it cannot. For
       Unicode-family codesets, these options are ignored. For other
       non-ISO-2022 codesets, these options are not supported and the result
       is unpredictable.


       g0 g1 g2 g3    codeset name   description
       o  o  o  o     ascii          ANSI X3.4 ASCII
       o  o  o  o     x0201          JIS X 0201 (latin part)
       x  o  o  o     iso8859-1      ISO 8859-1 latin
       x  o  o  o     iso8859-2      ISO 8859-2 latin
       x  o  o  o     iso8859-3      ISO 8859-3 latin
       x  o  o  o     iso8859-4      ISO 8859-4 latin
       x  o  o  o     iso8859-5      ISO 8859-5 Cyrillic
       x  o  o  o     iso8859-6      ISO 8859-6 Arabic
       x  o  o  o     iso8859-7      ISO 8859-7 Greek
       x  o  o  o     iso8859-8      ISO 8859-8 Hebrew
       x  o  o  o     iso8859-9      ISO 8859-9 latin
       x  o  o  o     iso8859-10     ISO 8859-10 latin
       x  o  o  o     iso8859-11     ISO 8859-11 Thai
       x  o  o  o     iso8859-13     ISO 8859-13 latin
       x  o  o  o     iso8859-14     ISO 8859-14 latin
       x  o  o  o     iso8859-15     ISO 8859-15 latin
       x  o  o  o     iso8859-16     ISO 8859-16 latin
       x  o  o  o     tcvn5712       TCVN 5712 (Vietnamese)
       x  o  o  o     ecma94         ECMA 94 Cyrillic (KOI-8e)
       o  o  o  o     x0212          JIS X 0212:1990
       o  o  o  o     x0208          JIS X 0208:1997
       o  o  o  o     x0213          JIS X 0213 Plane 1:2000
       o  o  o  o     x0213-2        JIS X 0213 Plane 2:2000
       o  o  o  o     x0213n         JIS X 0213 Plane 1:2004
       o  o  o  o     gb2312         Simplified Chinese GB2312
       o  o  o  o     gb1988         Chinese GB1988 (latin)
       o  o  o  o     gb12345        Traditional Chinese GB12345
       o  o  o  o     ksx1003        Korean KS X 1003 (latin)
       o  o  o  o     ksx1001        Korean KS X 1001
       x  o  o  o     koi8-r         Cyrillic KOI-8R
       x  o  o  o     koi8-u         Ukrainian Cyrillic KOI-8U
       o  o  o  o     cns11643-1     Traditional Chinese CNS11643-1
       x  o  o  o     viscii-r       RFC 1456 VISCII (right plane)
       o  o  o  o     viscii-l       RFC 1456 VISCII (left plane)
       x  o  o  o     cp437          Microsoft cp437 (US latin)
       x  o  o  o     cp737          Microsoft cp737
       x  o  o  o     cp775          Microsoft cp775
       x  o  o  o     cp850          Microsoft cp850
       x  o  o  o     cp852          Microsoft cp852
       x  o  o  o     cp855          Microsoft cp855
       x  o  o  o     cp857          Microsoft cp857
       x  o  o  o     cp860          Microsoft cp860
       x  o  o  o     cp861          Microsoft cp861
       x  o  o  o     cp862          Microsoft cp862
       x  o  o  o     cp863          Microsoft cp863
       x  o  o  o     cp864          Microsoft cp864
       x  o  o  o     cp865          Microsoft cp865
       x  o  o  o     cp866          Microsoft cp866
       x  o  o  o     cp869          Microsoft cp869
       x  o  o  o     cp874          Microsoft cp874
       x  o  o  o     cp932          Microsoft cp932 (Japanese)
       x  o  o  o     cp1250         Microsoft cp1250 (Central Europe)
       x  o  o  o     cp1251         Microsoft cp1251 (Cyrillic)
       x  o  o  o     cp1252         Microsoft cp1252 (Latin-1)
       x  o  o  o     cp1253         Microsoft cp1253 (Greek)
       x  o  o  o     cp1254         Microsoft cp1254 (Turkish)
       x  o  o  o     cp1255         Microsoft cp1255
       x  o  o  o     cp1256         Microsoft cp1256
       x  o  o  o     cp1257         Microsoft cp1257
       x  o  o  o     cp1258         Microsoft cp1258

       --euc-protect-g1
              In EUC input mode, suppress sequences that designate a charset
              to G1. Such sequences are discarded.

       --add-annon
              Add the announcer for JIS X 0208:1997 to the X 0208
              designation sequence. This option works only with
              iso-2022-based output.

       --input-detect-jis78
              Distinguish the JIS X 0208:1978 codeset from the JIS X
              0208:1997 codeset. By default, both charsets are treated as X
              0208:1997. This option is valid only when the input encoding
              is JIS (iso-2022-jp).


     JIS X 0212 (Supplementary Kanji code) Support
       --x0212-enable
              By default, skf does not output JIS X 0212 codes in JIS/EUC
              mode. This option enables use of the JIS X 0212 part.
              Non-Japanese codesets, Shift_JIS variants, Unicode and KEIS
              output ignore this option. Note that this option is supported
              for backward compatibility and may not be supported in future
              versions.


     Unicode coding specific control options
       skf 2.10 conforms to the Unicode 11.0 specification.

       --use-compat, --suppress-compat
              With --suppress-compat, skf substitutes characters in the
              Unicode compatibility area (U+F900 - U+FFFD) with appropriate
              characters outside the compatibility area. When this
              substitution is enabled, such characters are converted to
              variants or treated as undefined. With --use-compat, skf
              outputs characters in this area as they are. The default is
              --use-compat. Several codesets control this as a codeset
              feature (i.e. use of the compatibility area); see the codeset
              section.

       --use-ms-compat
              When output is Unicode, make the Unicode mapping compatible
              with Microsoft Windows. This only changes the conversion of
              some symbols in JIS Kanji; adding the --use-compat option is
              recommended for round-trip conversion. If you need stricter
              compatibility, try cp932w as the input codeset.

       --use-cde-compat
              When output is Unicode, make the translation compatible with
              the CDE standard codeset.

       --little-endian
              When output is UTF-16, use little-endian byte order.

       --big-endian
              When output is UTF-16, use big-endian byte order.

       --disable-endian-mark, --enable-endian-mark
              When output is UTF-16 or UTF-8, do not use/use a byte order
              mark. To produce UTF-16N (UTF-16 without a BOM), use
              --disable-endian-mark together with --little-endian. By
              default, the BOM is enabled for UTF-16 and disabled for UTF-8.

       --input-little-endian
              When input is UTF-16, assume the input is little-endian.

       --input-big-endian
              When input is UTF-16, assume the input is big-endian.

       --endian-protect
              Do not use the byte order mark in the input stream; it is
              simply discarded. This is off by default.

       --limit-to-ucs2
              Do not use code points above U+FFFF in Unicode output (i.e.
              limit codes to the BMP). This option does not limit the
              internal code range in skf. This is off by default.

       --disable-cjk-extension
              Treat the CJK Extension A/B areas as undefined. This is off
              (i.e. these areas are enabled) by default.

       --enable-cesu8
              Enable CESU-8 input for the utf-8 codeset. Ignored for all
              other codesets.

       --non-strict-utf8
              Accept broken (decodable but not spec-conforming) UTF-8 input.
              If you need this option, proceed with extra care.

       --enable-nfd-decomposition, --disable-nfd-decomposition
              Enable/disable Unicode normalized (NFD) decomposition. The
              default is disabled.

       --enable-nfda-decomposition, --disable-nfda-decomposition
              Enable/disable Apple-compatible Unicode normalized
              decomposition. The default is disabled.
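       To illustrate what NFD decomposition does, Python's standard
       unicodedata module splits a precomposed kana or accented letter into a
       base character and a combining mark. This shows only the standard NFD
       form; it says nothing about skf's own tables or about how the
       Apple-compatible variant differs. show_nfd is a hypothetical helper.

```python
import unicodedata
from typing import List


def show_nfd(s: str) -> List[str]:
    """Return the code points of the NFD (canonical decomposition) of s."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]


print(show_nfd("\u304c"))  # が -> ['U+304B', 'U+3099']
print(show_nfd("\u00e9"))  # é -> ['U+0065', 'U+0301']
```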

       --oldcell-to-emoticon
              Convert the old cell-phone gaiji area to emoticons. Supported:
              NTT Docomo/AU emoticons. The reverse mapping is not supported.

       --fix-ms-radical-bug
              mscvrt for Windows 10 20H1 or later has an infamous bug which
              converts some Kanji to Kanji radicals. This option converts
              the radical area back to the appropriate Kanji. This option is
              for Unicode output.
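       Characters in the Unicode Kangxi Radicals block (U+2F00-U+2FDF) have
       compatibility decompositions to ordinary CJK unified ideographs, so
       one plausible way to undo such a substitution is to apply NFKC to
       that block only. The Python sketch below illustrates that assumption;
       it is not skf's mapping, fix_kangxi_radicals is a hypothetical name,
       and it leaves the CJK Radicals Supplement block (U+2E80-U+2EFF),
       which mostly lacks decompositions, unchanged.

```python
import unicodedata


def fix_kangxi_radicals(text: str) -> str:
    """Map Kangxi Radicals (U+2F00..U+2FDF) to their unified ideographs."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if 0x2F00 <= ord(ch) <= 0x2FDF else ch
        for ch in text
    )


# U+2F08 (KANGXI RADICAL MAN) becomes U+4EBA, the ordinary ideograph 人.
print(fix_kangxi_radicals("\u2f08\u53e3") == "\u4eba\u53e3")  # True
```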
634
635
636
637     Miscellanious codeset related options
638       --old-nec-compat
639              Enable old NEC kanji sequence (ESC-K,H).  Needs  compile  option
640              --enable-oldnec at configuration.
641
642       --no-utf7
643              Assume  input  codeset  is  *NOT*  UTF-7  encoded Unicode.  This
644              option disables input utf7 testing.
645
646       --no-kana
647              Assume input codeset does *NOT* include JIS X 0201 kana.
648
649       --input-limit-to-jp
              Tell the detection mechanism that the input is some kind of
              Japanese codeset.
652
653
654   OUTPUT Conversions options
       skf is intended to write its output stream to stdout, but an
       nkf-compatible option for changing the encoding of files is also
       provided.
657
658       --overwrite[=SUFFIX] --in-place[=SUFFIX]
              Convert the encoding of the file(s) specified as input.
              --overwrite preserves the file time stamp. If the SUFFIX
              parameter is given, the input file is backed up under a name
              with SUFFIX appended.
662
       skf has various features to adapt its output to the local
       environment. Most of these are controlled by the extended control
       switches described in this section.
666
667       --use-g0-ascii
              Set G0 (= GL) of the output encoding to ASCII, ignoring
              codeset designation.
670
671     X-0201 Kana/latin conversions
       By default, skf converts X-0201 kana to X-0208 kana. To output X-0201
       kana as is, use one of the following options. When the output is EUC
       or SJIS, these options enable X-0201 kana output in the way provided
       by each encoding. When Unicode output is specified, output of the
       (equivalent) kana is controlled by --use-compat instead of the
       following switches, which are valid only when the output codeset is
       NOT of the Unicode family.
678
679       --kana-jis7
              Use the SI/SO locking shift sequences to switch to X-0201 kana.
              This switch is valid for the jis, jis-x0213 and cp50220 (i.e.
              cp50221) encodings. For other codesets, this option is ignored.
683
684       --kana-jis8
              Output X-0201 kana using the 8-bit right code plane. This
              switch is valid for the jis and jis-x0213 encodings. For other
              codesets, this option is ignored.
688
689       --kana-esci --kana-call
              Use ESC ( I to designate X-0201 kana. This switch is valid for
              the jis, jis-x0213 and cp50220 (i.e. cp50222) encodings. For
              other codesets, this option is ignored.
693
694       --kana-enable
              If the output is EUC-JP or cp51932, use X-0201 kana via G2. If
              the output is SJIS, this is the same as --kana-jis8. If the
              output is JIS, it is the same as --kana-call.
698
699       --use-iso8859-1
              Enable ISO-8859-1 output. ISO-8859-1 is designated to G1 and
              invoked into the GR plane.
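       The three JIS kana output forms differ only in how the X 0201 kana
       bytes are framed. The Python sketch below, which is not skf code,
       shows the framing for text made only of ASCII and half-width
       katakana (U+FF61..U+FF9F). The mode names "jis7", "jis8" and "esci"
       and the function encode_kana() are hypothetical labels mirroring the
       option names; JIS X 0208 kanji and the EUC/SJIS cases are not
       handled.

```python
SO, SI = b"\x0e", b"\x0f"  # locking shifts (--kana-jis7 style)
ESC_KANA = b"\x1b(I"       # designate JIS X 0201 katakana to G0 (--kana-esci style)
ESC_ASCII = b"\x1b(B"      # designate ASCII back to G0

# mode -> (sequence entering kana, sequence leaving kana, value subtracted)
MODES = {
    "jis7": (SO, SI, 0x80),              # kana bytes 0x21..0x5F between SO and SI
    "jis8": (b"", b"", 0x00),            # kana bytes 0xA1..0xDF in the right plane
    "esci": (ESC_KANA, ESC_ASCII, 0x80),  # kana bytes 0x21..0x5F after ESC ( I
}


def x0201_kana_byte(ch: str):
    """Return the JIS X 0201 byte (0xA1..0xDF) of a half-width katakana, else None."""
    cp = ord(ch)
    if 0xFF61 <= cp <= 0xFF9F:
        return cp - 0xFEC0
    return None


def encode_kana(text: str, mode: str) -> bytes:
    enter, leave, offset = MODES[mode]
    out = bytearray()
    in_kana = False
    for ch in text:
        b = x0201_kana_byte(ch)
        if b is not None:
            if not in_kana:
                out += enter
                in_kana = True
            out.append(b - offset)
        else:
            if in_kana:
                out += leave
                in_kana = False
            if ord(ch) > 0x7F:
                raise ValueError(f"not ASCII or half-width katakana: {ch!r}")
            out.append(ord(ch))
    if in_kana:
        out += leave  # end the stream back in ASCII
    return bytes(out)


if __name__ == "__main__":
    for mode in MODES:
        print(mode, encode_kana("A\uFF71B", mode))
```

       Half-width katakana "A" (U+FF71) is byte 0xB1 in JIS X 0201; the
       7-bit forms carry it as 0x31 inside the kana shift state.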
702
703
704     URI/TeX format conversion feature options
       With Unicode(tm) family output codings, skf outputs non-ASCII Latin
       characters as they are. With other output codings, skf converts these
       characters using the following rules:

       (1) If the code is defined in the specified output codeset, that code
       point is used for output.
       (2) If one of the HTML convert modes (i.e. --convert-html,
       --convert-sgml) is enabled and the code is defined in the HTML/SGML
       codeset, it is converted to an entity reference or a code-point
       reference.
       (3) If TeX convert mode is enabled and the code has a TeX expression,
       it is converted to TeX format.
       (4) If the code is a combined ligature, it is shown as a sequence of
       characters.
       (5) Otherwise, a replacement character is shown, with a warning.
719
       --convert-html --convert-sgml --convert-xml
              Enable HTML convert mode. This mode is cleared by --reset. These
              three options are synonyms and are treated as the same option.
723
724       --convert-html-decimal
725              Enable  html  code-point  decimal  convert  mode.  This  mode is
726              cleared by --reset.
727
728       --convert-html-hexadecimal
729              Enable html code-point hexadecimal convert mode.  This  mode  is
730              cleared by --reset.
731
732       --convert-tex
733              Enable TeX convert mode. This mode is cleared by --reset.
734
735       --convert-perl
736              Enable  Perl5  literal  convert  mode.  This  mode is cleared by
737              --reset.
738
739       --convert-java
740              Enable Java literal  convert  mode.  This  mode  is  cleared  by
741              --reset.
742
743       --convert-python
744              Enable  Python  literal  convert  mode.  This mode is cleared by
745              --reset.
746
747       --use-replace-char
              In Unicode, use the Unicode replacement character (U+FFFC) for
              undefined characters.
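       Rules (1) and (2) can be sketched for the decimal and hexadecimal
       code-point modes (--convert-html-decimal, --convert-html-hexadecimal).
       The Python sketch below is illustrative only: it assumes an
       ASCII-only output codeset, skips named entities and the TeX, ligature
       and replacement rules, and its function names are hypothetical.
       Whether skf writes hexadecimal digits in upper or lower case is not
       specified here.

```python
def codepoint_ref(ch: str, hexadecimal: bool = False) -> str:
    """Return an HTML numeric character reference for one character."""
    cp = ord(ch)
    return f"&#x{cp:X};" if hexadecimal else f"&#{cp};"


def convert_non_ascii(text: str, hexadecimal: bool = False) -> str:
    """Keep ASCII as is (rule 1, assuming an ASCII-only output codeset)
    and turn every other character into a numeric reference (rule 2)."""
    return "".join(
        ch if ord(ch) < 0x80 else codepoint_ref(ch, hexadecimal) for ch in text
    )


if __name__ == "__main__":
    print(convert_non_ascii("caf\u00e9"))        # caf&#233;
    print(convert_non_ascii("caf\u00e9", True))  # caf&#xE9;
```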
750
751
752 Extended Options
753   Encoding/Decoding control options
754       --decode=`encoding scheme'
755
756       --encode=`encoding scheme'
              Specify a decoding scheme for the input stream (--decode) or an
              encoding scheme for the output stream (--encode). Supported
              schemes for decoding are 'hex', 'mime', 'mime_q', 'mime_b',
              'uri', 'ace', 'hex_perc_encode', 'base64', 'qencode',
              'rfc2231', 'rot' and 'none', meaning CAP hex-code, MIME, MIME
              Q-encoding, MIME B-encoding, URI character reference, ACE
              punycode, URI percent notation, base64, Q-encoding, RFC 2231
              and rot13/47 respectively. 'none' means no decoding.
              For encoding, 'hex', 'mime_b', 'mime_q', 'uri', 'ace', 'cap',
              'hex_perc_encode', 'base64' and 'none' are supported. Encoded
              output is not supported for EBCDIC-related codesets or for
              codesets that are already ASCII-encoded (e.g. UTF-7).
              Only one decode/encode option is valid; if more than one is
              specified, the last one is used. When one of the MIME decodings
              is specified, the base text is assumed to be EUC-encoded unless
              specified otherwise. Except for rot, which assumes the input
              stream is Shift_JIS, EUC or iso-2022-jp, these decodings assume
              the input stream is ASCII (as defined in RFC 2045). Some
              decodings may be combined with an output encoding, but this is
              not guaranteed. In particular, if the input is UTF-16/UCS2,
              these encodings are ignored by skf.
777
778       --mime-ms-compat
              Treat Japanese generic codesets as Microsoft cp932 compatible.
              More specifically, with this option skf treats iso-2022-jp as
              cp50220, euc-jp as cp51932 and Shift_JIS as cp932w.

       --mime-persistent
              skf detects address-like strings and excludes them from MIME
              encoding. This option disables such behavior. Default in
              nkf-compatible mode.
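       The 'rot' scheme can be sketched for an iso-2022-jp byte stream.
       This Python sketch assumes the convention common in Japanese netnews,
       where ASCII letters are rotated by 13 and each byte of a JIS X 0208
       two-byte code (0x21..0x7E) is rotated by 47 within the 94-value
       range. Only the four designation sequences listed are recognized
       (and copied unchanged); other escape sequences are not handled. The
       function names are hypothetical, and skf's handling of Shift_JIS,
       EUC and X 0201 kana is not reproduced.

```python
def rot13(c: int) -> int:
    """Rotate an ASCII letter by 13; leave any other byte unchanged."""
    if 0x41 <= c <= 0x5A:  # 'A'..'Z'
        return (c - 0x41 + 13) % 26 + 0x41
    if 0x61 <= c <= 0x7A:  # 'a'..'z'
        return (c - 0x61 + 13) % 26 + 0x61
    return c


def rot47(c: int) -> int:
    """Rotate a byte in 0x21..0x7E by 47 within that 94-value range."""
    if 0x21 <= c <= 0x7E:
        return (c - 0x21 + 47) % 94 + 0x21
    return c


KANJI_IN = (b"\x1b$B", b"\x1b$@")   # designate JIS X 0208 to G0
KANJI_OUT = (b"\x1b(B", b"\x1b(J")  # designate ASCII / JIS Roman to G0


def rot_iso2022jp(data: bytes) -> bytes:
    out = bytearray()
    kanji = False
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in KANJI_IN or seq in KANJI_OUT:
            kanji = seq in KANJI_IN
            out += seq  # designation sequences pass through untouched
            i += 3
            continue
        c = data[i]
        out.append(rot47(c) if kanji else rot13(c))
        i += 1
    return bytes(out)


if __name__ == "__main__":
    print(rot_iso2022jp(b"Hello"))  # b'Uryyb'
```

       Both rotations are their own inverse (13 + 13 = 26 and 47 + 47 = 94),
       so applying the function twice restores the original stream.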
785
786
787   Shortcut
788       -m     same as --decode=mime
789
790       -mB    same as --decode=mime_b
791
792       -mQ    same as --decode=qencode
793
794       -m0    same as --decode=none
795
796       -M     same as --encode=mime_b
797
798       -MB    same as --encode=base64
799
800       -MQ    same as --encode=qencode
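       -M (--encode=mime_b) writes RFC 2047 "B" encoded-words. A minimal
       Python sketch of that format, using only the standard base64 and re
       modules, follows. It is not skf code: the UTF-8 default charset
       label is an assumption, the function names are hypothetical, and
       line folding and the 75-character encoded-word limit are not
       modelled.

```python
import base64
import re

# RFC 2047 "B" encoded-word: =?charset?B?base64-payload?=
_B_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([A-Za-z0-9+/=]*)\?=")


def mime_b_encode(text: str, charset: str = "UTF-8") -> str:
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def mime_b_decode(word: str) -> str:
    m = _B_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not a B encoded-word: {word!r}")
    charset, payload = m.groups()
    return base64.b64decode(payload).decode(charset)


if __name__ == "__main__":
    print(mime_b_encode("Hello"))  # =?UTF-8?B?SGVsbG8=?=
```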
801
802   End of line control options
803       --lineend-thru
804              Output end-of-line code as it is. Also output ^Z code as it  is.
805              This is default.
806
       --lineend-cr --lineend-mac -Lm
808              Use  CR  as  end-of-line  code.  Also  delete ^Z code from input
809              stream.
810
       --lineend-lf --lineend-unix -Lu
812              Use LF as end-of-line code.  Also  delete  ^Z  code  from  input
813              stream.
814
       --lineend-crlf --lineend-windows -Lw
              Use CR+LF as end-of-line code. Also delete ^Z code from input
              stream. This option doesn't preserve the original order of CR
              and LF.
819
820       --input-cr
821              Assume input stream uses CR as end-of-line code.
822
823       --input-lf
824              Assume input stream uses LF as end-of-line code.
825
826       --input-crlf
827              Assume input stream uses CR+LF as end-of-line code.
828
829       -F[line_length[-kinsoku]]
830
831       -f[line_length[-kinsoku]] -f[line_length[+kinsoku]]
              Fold input lines at line_length columns. The f option deletes
              CR/LFs in the input; the F option doesn't. Following Japanese
              convention, both gyoutou-kinsoku (by burasage-gumi) and
              gyoumatsu-kinsoku (by oidasi-gumi) are supported. The burasage
              length is controlled by the kinsoku value. The default
              line_length is 66 and must be < 1000. The default kinsoku value
              is 5 and must be <= 10. With the f option, skf autodetects
              paragraphs and retains some CR/LFs; the second f format (with
              '+') disables this behavior. In nkf-compatible mode, some
              folding behaviors change as follows.
              (1) The default line_length is 60 and the kinsoku value is 10.
              (2) Alphanumeric characters become gyoutou-kinsoku characters.
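       The --lineend-cr, --lineend-lf and --lineend-crlf conversions can be
       sketched in Python as a single regular-expression substitution. This
       is an illustration, not skf code: convert_lineend() is a
       hypothetical name, --lineend-thru and folding are not modelled, and
       an LF followed by a CR is treated here as two line ends, which may
       differ from skf on messy input.

```python
import re

# CR+LF must be tried before a lone CR so that it counts as one line end.
_EOL = re.compile(r"\r\n|\r|\n")


def convert_lineend(text: str, eol: str) -> str:
    """Replace every CR, LF or CR+LF with `eol` and delete ^Z (0x1A)."""
    return _EOL.sub(eol, text.replace("\x1a", ""))


if __name__ == "__main__":
    print(repr(convert_lineend("a\r\nb\rc\nd\x1a", "\r\n")))  # 'a\r\nb\r\nc\r\nd'
```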
844
845   File control options
846       --filewise-detect --force-reset
847              Reset and re-detect input code set at the start of each file.
848
849       --linewise-detect
850              Reset and re-detect input code set at the start of each line.
851
852
853   Compatibility options
854       --nkf-compat
              Interpret the following options in an nkf-compatible manner.
              -l, -d, -c, -x, -X, -w and -W work as in nkf 2.x, and the -f
              and -F behavior is changed as shown above. -T, -i and -o are
              not supported. Most other nkf options and switches also work
              like nkf, except in case of error.
860
861       --skf-compat
              Interpret the following options in the skf-native manner.
863
       -r     nkf-compatible rot. Works only in --nkf-compat mode. Allowed
              input encodings are limited to JIS/Shift_JIS/EUC.
866
       -h[123] --hiragana --katakana --katakana-hiragana
              -h, -h1 and --hiragana convert all kana to hiragana. -h2 and
              --katakana convert all kana to katakana. -h3 and
              --katakana-hiragana swap katakana and hiragana.
871
872       --nkf-help
              Show option differences/compatibility between skf and nkf.
874
       --in-place[=SUF] --overwrite[=SUF]
              Replace the specified file with the converted output.
              --overwrite retains the file creation time stamp. If a suffix
              is given, the suffix is added to the output file name and the
              input file is not removed.
879
880
881   Lightweight language specific options
       skf plugins for lightweight languages support a subset of the
       options. More specifically, file input/output related options (-b,
       -u, --overwrite, --in-place, --filewise-detect, --linewise-detect,
       --show-filename, --suppress-filename) are disabled, and so is UTF-16
       output (except in ruby or python3).
886
887
888     Ruby-1.9.x/2.x specific options
       Since ruby 1.9, ruby uses CCS string handling. skf returns the output
       string in the specified codeset. The following options override this
       behavior.
892
893       --rb-out-ascii8bit
894              returns string with ascii-8bit encoding.
895
896       --rb-out-string
897              returns string with specified encoding.
898
899     Python-3.x specific options
       Since the native string representation in python 3.x is UCS2/UCS4,
       skf behaves differently depending on the output codeset option. If
       the output codeset is either UTF-16 or UTF-32 (in wide mode), skf
       returns a Unicode object; for all other codesets, skf returns a
       binary array object. The following options change this behavior.
905
906       --py-out-binary
              Use a pseudo-Unicode binary stream for output.
908
909       --py-out-string
              Use a binary array object for UTF-16/32 output. BOM is enabled.
              skf accepts either a binary array or a Unicode object for
              input.
913
914
915   Misc. Control options
916       --disable-space-convert --enable-space-convert
              skf can convert an ideographic space into two ASCII spaces. The
              disable option disables this behavior and the enable option
              enables it. Default is disabled.
920
921       --html-sanitize
              Convert several characters in an HTML document to entity
              reference expressions. Specifically, " ! # $ & % ( ) / < > : ;
              ? ' are escaped as entity references.
925
926       --filewise-detect --force-reset
927              If multiple input files are given, detect input codeset for each
928              file.
929
930       --linewise-detect
931              Detect input code  line-wise.  Note  this  option  weakens  code
932              detect correctness.
933
934       --reset
              Reset all flags specified by extended controls and environment
              variables.
937
938       --inquiry --guess
              skf detects the input code and writes the detection result to
              stdout. No filtered output is produced. If multiple input files
              are given, --show-filename is automatically enabled.
942
943       --hard-inquiry
              Similar to --inquiry, but reports both the code and the
              end-of-line character.
946
947       --suppress-filename
948              When  inquiry(--inquiry)  is  on, this option disables file name
949              output.  This option overrides --show-filename.
950
951       --show-filename
952              When inquiry(--inquiry) is on, this option adds each  file  name
953              to output.
954
955       --invis-strip
              Delete all escape sequences not belonging to the ISO-2022 code
              extension. This is intended to replace the invisstrip command
              bundled in the inews package.
959
960       -I     Warn if input has unassigned code points.
961
962       -v     print version information and exit.
963
964       --help print brief help and exit.
965
966       --show-supported-codeset
967              Display  supported  codesets  (input)  and  exit. Both canonical
968              names (left side) and detailed names are shown.  This  canonical
969              name  can  be  used  as  MIME charset and also as ic-option code
970              specification.
971
972       --show-supported-charset
              Display supported character sets (output) and exit. Both
              canonical names and detailed names are shown. Some charsets
              with special treatment (i.e. meaningless as set-g* parameters)
              intentionally lack addressable cnames.
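       --html-sanitize and --enable-space-convert are simple character
       substitutions. The Python sketch below is illustrative only: it
       writes every escaped character as a numeric character reference
       (&#NN;), which may not match the exact references skf emits, it
       reads the last character of the --html-sanitize list as an ASCII
       apostrophe (an assumption), and its function names are hypothetical.

```python
# Characters listed for --html-sanitize (last one assumed to be U+0027).
SANITIZE_CHARS = frozenset("\"!#$&%()/<>:;?'")
IDEOGRAPHIC_SPACE = "\u3000"


def html_sanitize(text: str) -> str:
    """Replace each listed character with a numeric character reference."""
    return "".join(f"&#{ord(c)};" if c in SANITIZE_CHARS else c for c in text)


def convert_ideographic_space(text: str) -> str:
    """Replace each ideographic space (U+3000) with two ASCII spaces."""
    return text.replace(IDEOGRAPHIC_SPACE, "  ")


if __name__ == "__main__":
    print(html_sanitize('<a href="x">'))  # &#60;a href=&#34;x&#34;&#62;
```

       Building the result character by character means an "&" introduced
       by a reference is never escaped a second time.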
977
978

FILES

980       /usr/(local/)share/skf/lib/   (Unices)
981
982       /Program Files/skf/share/lib (MS Windows)
              These directories are where external codeset conversion tables
              are placed. The location that the current skf assumes is shown
              by the -h option.
986
987

AUTHOR

       skf is written by Seiji Kaneko (efialtes@osdn.jp) based on ideas from
       nkf, written by Itaru Ichikawa (ichikawa@flab.fujitsu.co.jp). The X
       0213 code table is derived from the work of earthian@tama.or.jp. Some
       codeset mappings are derived from various sources; their detailed
       origins are shown in the copyright document included in this
       distribution.
994
995

ACKNOWLEDGEMENT

       skf is inspired by works or requests by shinoda@cs.titech,
       kato@cs.titech, uematsu@cs.titech, void@global, ohta@ricoh,
       Hinata(HKE), Ashizawa(CRL), Kunimoto(SDL), Oohara(Univ of Kyoto),
       Jokagi(elf2000) and Naruse (at osdn.jp). Thanks.
1001
1002

BUGS AND LIMITATIONS

       1. skf can handle mixed coding with some limitations. However, code
       detection tends to fail for mixed code, and specifying the input
       codeset explicitly is strongly encouraged if it is known beforehand.
       If needed, the --linewise-detect option may help, but code detection
       will then be more likely to fail.
1009
       2. skf implements ISO-2022 with the following exceptions.
        i) GL 0x20 is always a space, even when a 96-character codeset is
       invoked to GL.
        ii) Sequences for setting codes to C1 and C2 are always ignored.
        iii) If an unknown sequence is given for G0, G0 is set to ASCII and
       locking/single shifts are cleared. Unknown sequences designating to
       G1-G3 are just ignored. Private charsets are also not supported and
       are ignored.
        iv) Sequences for 96-character multibyte codings are ignored
       (currently, no such codeset is registered).
        v) Calling the UTF-8 and UTF-16 coding systems from ISO-2022 is
       supported, and skf returns to the previous coding system by the
       standard return. Calls and returns to/from other coding schemes are
       ignored.
        vi) To support some cellular phone glyphs, several private
       (unregistered) codesets are defined in skf and can be called by the
       appropriate sequences.
1026
       3. Error output coding is controlled by the LOCALE environment
       variables on UN*X systems. skf doesn't handle situations such as
       stdout and stderr being redirected into the same stream; such cases
       should be handled on the user side.
1031
       4. skf converts KEIS/JIS X 0213 codes using the CJK extension B area
       and the CJK compatibility area. For this reason, X 0213 and KEIS
       conversion results vary depending on the --use-compat and
       --limit-to-ucs2 switches.
1035
       5. JIS X 0207:1979 is not supported. JIS X 0211:1987 is designed to be
       supported (i.e. common terminal control sequences are passed
       transparently to the output).
1039
       6. Even if the unbuffer option (-u) is specified, some
       code-translation related buffering is still performed (in MIME, kana,
       VIQR etc.).
1042
       7. skf-1.9x or later recognizes and handles languages in ISO 639-1
       (alpha-2). ISO 639-2 is not supported as a valid language set.
1045
1046       8. Unicode IVS is not supported. Sequences are just discarded.
1047
       9. skf-1.9x or later does not retain the Macintosh RLO-ordered
       character property. Codesets with this kind of code are not
       supported.
1050
1051       10. CNS11643 4th, 5th, 6th planes are not supported.
1052
       11. In the python 3 extension, the codeset detected by inquiry for
       input Unicode strings is always UTF-32be.
1055
       12. In lightweight language extensions other than ruby and python,
       UCS2/UTF-16 is not supported.
1058
1059
1060

Notes

       1. Extended options have changed extensively since skf-1.9. Some
       archaic options (e.g. -B, -@ and -r) have been removed in this
       version.
1064
       2. skf was originally forked from nkf, but no longer contains any nkf
       code. The copyright notice is retained out of respect.
1067
       3. Since version 1.9, the default Japanese character set assumed by
       skf has been JIS X 0208:1990 with Microsoft Japanese Windows gaiji
       (i.e. CP932).
1071
       4. Code autodetection is not perfect by design. If it fails to detect
       the input code properly, please specify the input code explicitly.
1075
       5. Some ligatures in Unicode, cp932 gaiji and KEIS83 are converted
       using JIS X 0124 and other conventions. This conversion does not
       preserve byte length.
1079
1080       6. skf is intended to  pass  ANSI  compatible  terminal  control  codes
1081       transparently, but this is not guaranteed.
1082
       7. nkf's -i and -o options work only in nkf-compat mode. They are
       obsolete as of 1.97, are valid only for iso-2022-jp, and do not take
       output codeset specifications into account.
1086
       8. For characters that cannot be converted, skf outputs the geta mark
       as the undefined character (see also the --use-replace-char option).
       If the output codeset doesn't contain the geta mark, skf uses the
       'black square' character if available, and '.' otherwise.
1090
       9. There are some undocumented options. These options should be
       considered highly experimental.
1093
       10. In lineend_thru mode with folding, skf remembers the order in
       which CR and LF appear in the stream and uses that order. Because of
       this design, if skf needs to output a line-end character before any
       line-end character has appeared in the input stream, the input order
       may not be preserved.
1098
1099       11. NKF-compatibility
1100       1) --prefix, some --fb's and --no-best-fit-chars are not supported.
1101       2)  MSDOS  (and  -T), --exec-in and --exec-out are not supported. -O is
1102       supported.
1103       3) MIME decoding/encoding handling behaviors differ in various ways.
       4) Line-end conversion acts differently. Results may not be the same
       for some messy text.
       5) The -r option and --decode=rot are different. See each option
       description.
       6) Detected codeset names are not compatible with nkf. --help and
       --version return different results.
       7) In-place and overwrite suffixes with * are not supported.
1111
       12. Conversion to NYUUKAN GAIJI is as follows.
       1) Kanji codes in JIS X0208(1997), JIS X0212(1990), JIS
       X0213(2004/2012) and Houmusho-kokuji No.582 beppyou No.1 are sent to
       output as they are.
       2) Kanji codes in the leftmost columns of beppyou No.4-2 are converted
       to the first-priority character in the table. If second-priority
       characters appear, those codes are sent to output as they are.
       3) Other kanji codes are converted as undefined codes. See the
       conversion method above. Non-kanji codes (Latin letters, glyphs etc.)
       are sent to output as they are.
1124
       13. ARIB B24 compatibility
       1) Input only. ARIB B24 output is not supported.
       2) Neither international encoding nor the X0213 extension is
       supported.
       3) Macro define sequences are suppressed. These sequences are
       recognized and discarded.
       4) Without specifying the arib codeset, skf treats the ARIB-defined
       codepage as follows.
         i) Private codepages are supported. ascii/jis x-0201 0x5f is not
       modified.
         ii) Macro define/invoke and rpc invoke do not work. These
       characters are discarded.
1138
1139

Notice

1141       Unicode(TM) is a trademark of Unicode, Inc. Microsoft and  Windows  are
1142       registered  trademarks  of Microsoft corporation. Macintosh is a regis‐
1143       tered trademark of Apple Inc. Vodafone is a trademark of Vodafone  K.K.
       Other names and terms may be trademarks or registered trademarks of
       their respective owners. The trademark symbol (TM) may be omitted in
       this manual page.
1147
1148
1149
1150
1151                                  10/Aug/2018                           SKF(1)