
KCC(L)                                                                  KCC(L)

NAME

       kcc - Kanji code converter with automatic encoding detection

SYNOPSIS

       kcc [ -IOchnvxz ] [ -b bufsize ] [ file ] ...

DESCRIPTION

       kcc is a filter that reads each file sequentially, converts its kanji
       encoding, and writes the result to stdout.  If no file is specified,
       or if - is given as a filename, it reads from stdin.  You can specify
       the kanji encodings for input and output.  If you do not specify the
       input encoding, kcc detects it automatically.

       The available kanji encodings are JIS (7 bit and/or 8 bit), Shift JIS,
       EUC, and DEC.  Input may mix two encodings, provided the pair is 7 bit
       JIS together with one of EUC, DEC, or Shift JIS.  In JIS input, both
       SI/SO and ESC(I are recognized as halfwidth kana designations.

OPTIONS

       -O
       -IO    I selects the input kanji encoding, and O selects the output
              kanji encoding.  When no input encoding is specified, it is
              detected automatically; when neither input nor output encoding
              is specified, the output encoding is 7 bit JIS.

              You can specify one of the following for the input encoding
              option, I.

                 e      EUC (may be mixed with 7 bit JIS)
                 d      DEC (may be mixed with 7 bit JIS)
                 s      Shift JIS (may be mixed with 7 bit JIS)
                 j, 7 or k
                        7 bit JIS
                 8      8 bit JIS

              You can specify one of the following for the output encoding
              option, O.

                 e      EUC
                 d      DEC
                 s      Shift JIS
                 jXY or 7XY
                        7 bit JIS (using SI/SO for JIS kana designation)
                 kXY    7 bit JIS (using ESC(I for JIS kana designation)
                 8XY    8 bit JIS

              The XY suffix of the O option selects which escape sequences
              are used in JIS encoding.  The default is BJ.  The supplemental
              kanji designation is fixed to ESC$(D.

                 X      Kanji is designated by:
                      B      ESC$B (JIS X0208-1983)
                      @      ESC$@ (JIS X0208-1978)
                      +      ESC&@ESC$B (JIS X0208-1990)
                 Y      Alphanumerics are designated by:
                      B      ESC(B (ASCII)
                      J      ESC(J (JIS Roman; JIS X0201)
                      H      ESC(H (Swedish; strongly deprecated)

       -v     Output the result of input encoding detection to stderr.

       -x     Extended mode.  During automatic detection of the input
              encoding, also recognize user-defined characters and extended
              character regions (the user-defined area of EUC, the undefined
              halfwidth kana region, the C1 control character region, and the
              extended character region of Shift JIS).  DEC and EUC are
              distinguished from each other only in this mode.

       -z     Shrink mode.  Do not recognize halfwidth kana (except in 7 bit
              JIS) during input encoding detection.  With this option,
              automatic detection becomes much more accurate for files that
              contain no halfwidth kana.

       -h     Normally, halfwidth kana converted to DEC become fullwidth
              katakana.  With this option, they become hiragana.

       -n     User-defined characters, extended characters, and supplemental
              kanji characters are converted to a fullwidth white box, and the
              undefined region of halfwidth kana is converted to a halfwidth
              centered dot.

       -b bufsize
              Specify the buffer size.  The default is 8 kbytes.

       -c     Do not convert; check the input encoding and print the result
              to stdout.  Unlike normal automatic detection, the whole
              contents of the file are checked.  However, when an
              inconsistency between encodings is found, kcc stops reading and
              prints "data".  Options other than -x and -z are ignored.
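       The XY suffix of the output option maps directly onto escape
       sequences, which a short Python sketch can make concrete.  This is
       not kcc's implementation: the helper name euc_to_jis7 is hypothetical,
       it handles only ASCII and two-byte EUC kanji, and it assumes the
       stream starts in alphanumeric mode, so the Y designation is emitted
       only when switching back from kanji.

```python
ESC = b"\x1b"

# Kanji designations selected by X (taken from the table in OPTIONS).
KANJI = {
    "B": ESC + b"$B",
    "@": ESC + b"$@",
    "+": ESC + b"&@" + ESC + b"$B",
}

# Alphanumeric designations selected by Y.
ALNUM = {
    "B": ESC + b"(B",  # ASCII
    "J": ESC + b"(J",  # JIS Roman
    "H": ESC + b"(H",  # Swedish
}


def euc_to_jis7(data: bytes, xy: str = "BJ") -> bytes:
    """Hypothetical helper: ASCII plus two-byte EUC kanji to 7 bit JIS."""
    kanji_esc, alnum_esc = KANJI[xy[0]], ALNUM[xy[1]]
    out = bytearray()
    in_kanji = False  # assumption: the stream starts in alphanumeric mode
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            if in_kanji:
                out += alnum_esc
                in_kanji = False
            out.append(b)
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            if not in_kanji:
                out += kanji_esc
                in_kanji = True
            # A two-byte EUC kanji is the JIS code with the high bits set.
            out += bytes((b & 0x7F, data[i + 1] & 0x7F))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02x} at offset {i}")
    if in_kanji:
        out += alnum_esc  # leave the stream in alphanumeric mode
    return bytes(out)
```

       For example, euc_to_jis7(b"\xb4\xc1\xbb\xfa", "BB") yields the bytes
       ESC$B 4A;z ESC(B.  Halfwidth kana, supplemental kanji, and the special
       handling of control characters described in NOTES are left out of
       this sketch.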

EXAMPLES

       % kcc -e file
              The input encoding is detected automatically, and the output is
              in EUC encoding.

       % kcc -sj file1 file2
              Two Shift JIS files are concatenated and converted to 7 bit JIS.

       % command | kcc -k+J
              The output of command is converted to 7 bit JIS (kanji
              designated as JIS X0208-1990, alphanumerics as JIS Roman, and
              halfwidth kana by ESC(I).

       % kcc -c file
              The encoding of the contents of file is detected (no
              conversion).

BUG

       Automatic detection of the input encoding works well in normal cases;
       however, it has the following problems.

       7 bit JIS is recognized with certainty by its escape sequences.  EUC
       and DEC are treated as the same (referred to as the EUC series).
       Halfwidth kana of 8 bit JIS are the same as halfwidth kana of Shift
       JIS (referred to as the Shift JIS series).  However, the EUC series
       and the Shift JIS series, which are both 8 bit encodings, share large
       regions of the code space.  So the problem in automatic detection is
       telling these two series apart.

       Detection of the EUC series versus the Shift JIS series is done line
       by line.  The encoding is determined as soon as the input is found
       not to be the Shift JIS series, or not to be the EUC series.  When an
       inconsistency is found, the input is treated as "data" and the
       contents of the output are not guaranteed.
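       The line-by-line elimination described above can be sketched in
       Python.  This is an illustration under stated assumptions, not kcc's
       actual code: the function names are hypothetical, the byte ranges are
       the commonly documented ones for EUC and Shift JIS, and the extended
       regions that kcc only accepts with -x are treated as invalid.

```python
def could_be_euc(line: bytes) -> bool:
    """True if every byte sequence in the line is valid in the EUC series."""
    i, n = 0, len(line)
    while i < n:
        b = line[i]
        if b < 0x80:                      # ASCII
            i += 1
        elif b == 0x8E:                   # SS2 + halfwidth kana
            if i + 1 >= n or not 0xA1 <= line[i + 1] <= 0xDF:
                return False
            i += 2
        elif b == 0x8F:                   # SS3 + two-byte supplemental kanji
            if i + 2 >= n or not all(0xA1 <= c <= 0xFE for c in line[i + 1:i + 3]):
                return False
            i += 3
        elif 0xA1 <= b <= 0xFE:           # two-byte kanji
            if i + 1 >= n or not 0xA1 <= line[i + 1] <= 0xFE:
                return False
            i += 2
        else:
            return False
    return True


def could_be_sjis(line: bytes) -> bool:
    """True if every byte sequence in the line is valid in the Shift JIS series."""
    i, n = 0, len(line)
    while i < n:
        b = line[i]
        if b < 0x80 or 0xA1 <= b <= 0xDF:  # ASCII or one-byte halfwidth kana
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:  # lead byte of a kanji
            if i + 1 >= n:
                return False
            t = line[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                return False
            i += 2
        else:
            return False
    return True


def classify(line: bytes) -> str:
    """Return "euc", "sjis", "undecided", or "data" for one line."""
    euc, sjis = could_be_euc(line), could_be_sjis(line)
    if euc and sjis:
        return "undecided"   # the line lies entirely in the shared region
    if euc:
        return "euc"
    if sjis:
        return "sjis"
    return "data"            # inconsistent: valid in neither series
```

       With these ranges, the two halfwidth kana bytes B1 B2 are also a valid
       EUC kanji, so classify(b"\xb1\xb2") stays "undecided", while the three
       bytes B1 B2 B3 can only be Shift JIS series.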

       After an 8 bit code is found, and while the choice between the EUC
       series and the Shift JIS series is still undetermined, conversion is
       held pending and the input data is kept in a buffer.  If the buffer
       becomes full, kcc assumes the EUC series and forces conversion to
       start.  Rationale: documents with kanji can usually be assumed to
       include JIS non-kanji or JIS first standard kanji.  In Shift JIS these
       do not share a region with EUC, so Shift JIS can be detected with
       certainty.  So if the encoding cannot be determined, it is very likely
       to be EUC.
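       The buffer-full fallback can be sketched as follows.  This is a
       simplified model, not kcc's code: resolve_series and its classify
       callback are hypothetical names, the callback is assumed to return
       None while a line is still ambiguous, and returning None at end of
       input is an assumption, since the text above does not say what
       happens in that case.

```python
from typing import Callable, Iterable, Optional


def resolve_series(
    lines: Iterable[bytes],
    classify: Callable[[bytes], Optional[str]],
    bufsize: int = 8192,
) -> Optional[str]:
    """Decide between the series, falling back to "euc" when the buffer fills."""
    pending = 0          # bytes held back while the encoding is undecided
    seen_8bit = False    # buffering only starts once an 8 bit code appears
    for line in lines:
        verdict = classify(line)
        if verdict is not None:
            return verdict           # a line settled the question
        seen_8bit = seen_8bit or any(b >= 0x80 for b in line)
        if seen_8bit:
            pending += len(line)
            if pending >= bufsize:
                return "euc"         # buffer full: assume the EUC series
    return None                      # assumption: undecided at end of input
```

       A larger -b value gives detection more input to work with before the
       fallback applies, at the cost of holding more data in memory.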

       If the input is 8 bit JIS and its halfwidth kana always appear in
       sequences of even length, it will be wrongly detected as EUC kanji.
       Be careful.

       If the input contains no halfwidth kana, use -z and detection becomes
       much more accurate.  This is because the shared region is then
       restricted to the area of JIS second standard kanji.

       The extended region of Shift JIS, the user-defined area of EUC, the C1
       control characters of EUC, and the undefined region of halfwidth kana
       of EUC are outside the range of automatic detection, so detection
       fails if the input contains such characters.  Use the -x option to
       select extended mode, or specify the input encoding.

NOTES

       Usually, user-defined characters, extended characters, and
       supplemental kanji characters are each mapped to their counterparts.
       However, characters that are out of the range of extended characters
       become FCFC in hexadecimal when converted to Shift JIS.  Although the
       C1 control character region of EUC and DEC is preserved when converted
       to JIS, it is deleted when converted to Shift JIS.  The undefined area
       of halfwidth kana becomes a halfwidth centered dot when converted to
       Shift JIS.  Halfwidth kana become fullwidth kana when converted to DEC.
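       The widening of halfwidth kana, and the hiragana variant selected by
       -h, can be illustrated on Unicode text.  This sketch does not operate
       on DEC bytes and is not kcc's conversion table: widen_kana is a
       hypothetical helper that relies on Unicode NFKC normalization and on
       the fixed offset of 0x60 between the katakana and hiragana blocks.

```python
import re
import unicodedata

# Runs of halfwidth kana and halfwidth punctuation (U+FF61 to U+FF9F).
HALFWIDTH_RUN = re.compile("[\uff61-\uff9f]+")


def widen_kana(text: str, hiragana: bool = False) -> str:
    """Hypothetical helper: widen halfwidth kana, optionally to hiragana."""

    def widen(match: "re.Match[str]") -> str:
        # NFKC maps halfwidth kana to fullwidth and composes voiced marks.
        wide = unicodedata.normalize("NFKC", match.group())
        if hiragana:
            # Katakana U+30A1..U+30F6 sit exactly 0x60 above hiragana.
            wide = "".join(
                chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c
                for c in wide
            )
        return wide

    return HALFWIDTH_RUN.sub(widen, text)
```

       Only the halfwidth runs are touched, so fullwidth katakana already
       present in the text stay as katakana even when hiragana=True.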

       When the output is a JIS encoding, control characters such as newline,
       TAB, and DEL, and the (halfwidth) space, are output in ASCII mode.

       When the input encoding is detected wrongly, or the input contains
       characters undefined in the expected character sets, the output is
       undefined.

       This manual was translated by Fumitoshi UKAI <ukai@debian.or.jp> for
       the Debian system, but you can use it for any purpose.



Y. Tonooka                     November 19, 1992                        KCC(L)