KCC(L)                                                                  KCC(L)



kcc - Kanji code converter with encoding auto-detection

kcc [ -IOchnvxz ] [ -b bufsize ] [ file ] ...

kcc is a filter that reads each file sequentially, converts its kanji
encoding, and writes the result to stdout. If no file is specified, or
if - is given as a filename, it reads from stdin. You can specify the
kanji encodings for input and output. If you do not specify an input
encoding, kcc detects it automatically.

The available kanji encodings are JIS (7 bit and/or 8 bit), Shift JIS,
EUC, and DEC. The input may mix two encodings only when the pair is 7
bit JIS together with one of EUC, DEC, or Shift JIS. SI/SO and ESC(I
are both recognized as halfwidth kana designations in JIS.

-O
-IO    I is the input kanji encoding and O is the output kanji encoding.
       When no input encoding is specified, it is detected
       automatically. If neither the input nor the output encoding is
       specified, the output encoding is 7 bit JIS.

       You can specify one of the following for the input encoding
       option, I.

       e           EUC (may be mixed with 7 bit JIS)
       d           DEC (may be mixed with 7 bit JIS)
       s           Shift JIS (may be mixed with 7 bit JIS)
       j, 7 or k   7 bit JIS
       8           8 bit JIS

       You can specify one of the following for the output encoding
       option, O.

       e           EUC
       d           DEC
       s           Shift JIS
       jXY or 7XY  7 bit JIS (using SI/SO for JIS kana designation)
       kXY         7 bit JIS (using ESC(I for JIS kana designation)
       8XY         8 bit JIS

       XY in the O option selects which escape sequences are used in
       JIS encoding. BJ is the default. The supplemental kanji
       designation is fixed to ESC$(D.

       X   Kanji is designated by:
           B   ESC$B (JIS X0208-1983)
           @   ESC$@ (JIS X0208-1978)
           +   ESC&@ESC$B (JIS X0208-1990)
       Y   Alphanumerics are designated by:
           B   ESC(B (ASCII)
           J   ESC(J (JIS Roman; JIS X0201)
           H   ESC(H (Swedish; strongly deprecated)

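As an illustration of the XY suffix, the following Python sketch maps
the X and Y letters to the escape sequences listed above. It is not
part of kcc: the names KANJI_DESIGNATION, ALNUM_DESIGNATION, and
designations() are hypothetical, and only the byte sequences are taken
from the tables.

```python
# Illustrative sketch only; not kcc source code.
ESC = b"\x1b"

# X: escape sequence that designates kanji.
KANJI_DESIGNATION = {
    "B": ESC + b"$B",
    "@": ESC + b"$@",
    "+": ESC + b"&@" + ESC + b"$B",  # ESC&@ followed by ESC$B
}

# Y: escape sequence that designates alphanumerics.
ALNUM_DESIGNATION = {
    "B": ESC + b"(B",  # ASCII
    "J": ESC + b"(J",  # JIS Roman (JIS X0201)
    "H": ESC + b"(H",  # Swedish; strongly deprecated
}


def designations(xy="BJ"):
    """Return the (kanji, alphanumeric) escape sequences for an XY suffix."""
    if len(xy) != 2:
        raise ValueError("XY must be exactly two characters")
    x, y = xy
    return KANJI_DESIGNATION[x], ALNUM_DESIGNATION[y]
```

Calling designations() with no argument returns the sequences for the
default BJ; an unknown letter raises KeyError in this sketch.
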
-v     Print the result of input encoding detection to stderr.

-x     Extension mode. During auto-detection of the input encoding,
       also recognize user-defined characters and extended character
       regions (characters outside the standard range of EUC, undefined
       halfwidth kana, C1 control characters, and the extended
       character region of Shift JIS). The distinction between DEC and
       EUC is made in this mode.

-z     Shrink mode. Do not recognize halfwidth kana (except in 7 bit
       JIS) during input encoding detection. With this option,
       auto-detection is much more accurate for files that contain no
       halfwidth kana.

-h     Normally, halfwidth kana become fullwidth katakana when
       converted to DEC. With this option, they become hiragana.

-n     User-defined characters, extended characters, and supplemental
       kanji characters are converted to a fullwidth white box, and
       undefined regions of halfwidth kana are converted to a
       halfwidth centered dot.

-b bufsize
       Specify the buffer size. The default is 8 kbytes.

-c     Do not convert; instead check the input encoding and print the
       result to stdout. Unlike normal auto-detection, the whole
       contents of the file are checked. However, when an inconsistency
       between encodings is found, kcc stops reading and prints "data".
       All options except -x and -z are ignored.

% kcc -e file
       The input encoding is detected automatically, and the output is
       in EUC encoding.

% kcc -sj file1 file2
       Two Shift JIS files are concatenated and converted to JIS.

% command | kcc -k+J
       The output of command is converted to 7 bit JIS, with kanji
       (JIS X0208) designated by ESC&@ESC$B, alphanumerics as JIS
       Roman, and halfwidth kana designated by ESC(I.

% kcc -c file
       The encoding of the contents of file is detected (no
       conversion).

Auto-detection of the input encoding works well in normal cases;
however, it has the following problems.

7 bit JIS is recognized with certainty by its escape sequences. EUC and
DEC are treated as the same (referred to as the EUC series). Halfwidth
kana of 8 bit JIS are the same as halfwidth kana of Shift JIS (referred
to as the Shift JIS series). However, the EUC series and the Shift JIS
series, which are both 8 bit encodings, share wide regions of code
space. The problem in auto-detection is therefore telling these two
series apart.

Detection between the EUC series and the Shift JIS series is done line
by line. The encoding is determined as soon as the input is found not
to be Shift JIS series, or not to be EUC series. When an inconsistency
is found, the input is treated as "data" and the contents of the output
are not guaranteed.

After an 8 bit code is found, and until the choice between the EUC
series and the Shift JIS series is determined, conversion is held
pending and the input data is kept in the buffer. If the buffer becomes
full, kcc assumes the EUC series and forces conversion to start.
Rationale: documents containing kanji can usually be assumed to include
JIS non-kanji or JIS first-level kanji, and if such a document is Shift
JIS it can be detected with certainty, because those characters do not
share a region with EUC. So if the encoding cannot be determined, it is
very likely to be EUC.

If the input is 8 bit JIS and its halfwidth kana always occur in
sequences of even length, it will be wrongly detected as EUC kanji. Be
careful.

If the input contains no halfwidth kana, use -z and detection becomes
much more accurate. This is because the shared region is then
restricted to the area of JIS second-level kanji.

The extended region of Shift JIS, the user-defined area of EUC, the C1
control characters of EUC, and the undefined region of halfwidth kana
of EUC are outside the range of auto-detection, so detection fails if
the input contains these characters. Use the -x option to select
extension mode, or specify the input encoding.

cat(1)

Usually, user-defined characters, extended characters, and supplemental
kanji characters are mapped to their respective counterparts. However,
characters that are out of the range of extended characters become FCFC
in hexadecimal when converted to Shift JIS. Although the C1 control
character region of EUC and DEC is preserved when converted to JIS,
these characters are deleted when converted to Shift JIS. The undefined
area of halfwidth kana becomes a halfwidth centered dot when converted
to Shift JIS. Halfwidth kana become fullwidth kana when converted to
DEC.

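The folding of halfwidth kana to fullwidth katakana, and to hiragana
as with -h, can be illustrated on Unicode text. The Python sketch below
is an analogy only: kcc works on encoded bytes, whereas this uses
Unicode NFKC normalization and a code point offset, and both function
names are hypothetical.

```python
# Illustrative sketch only; not kcc source code.
import re
import unicodedata

# Halfwidth katakana and related punctuation: U+FF61..U+FF9F.
_HALFWIDTH_KANA = re.compile("[\uff61-\uff9f]+")


def halfwidth_to_katakana(text: str) -> str:
    """Fold each run of halfwidth kana to fullwidth katakana."""
    return _HALFWIDTH_KANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana U+30A1..U+30F6 down by 0x60 to the hiragana block."""
    return "".join(
        chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c
        for c in text
    )
```

Applying katakana_to_hiragana() to the result of
halfwidth_to_katakana() gives the hiragana form; characters outside
the two ranges are left unchanged.
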
When the output is in JIS encoding, control characters such as newline,
TAB, and DEL, as well as (halfwidth) white space, are output in ASCII
mode.

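The same behavior can be observed with Python's iso2022_jp codec, used
here only as a stand-in for kcc's 7 bit JIS output. The helper name
encode_jis() is hypothetical, and the bytes are produced by Python,
not by kcc.

```python
# Illustrative sketch only; Python's codec stands in for kcc.
ESC_KANJI = b"\x1b$B"  # designate kanji
ESC_ASCII = b"\x1b(B"  # designate ASCII


def encode_jis(text: str) -> bytes:
    """Encode text as 7 bit JIS using Python's iso2022_jp codec."""
    return text.encode("iso2022_jp")


encoded = encode_jis("日本\n")
# The kanji are introduced by ESC$B, and the encoder switches back to
# ASCII with ESC(B before it writes the newline.
starts_in_kanji = encoded.startswith(ESC_KANJI)
newline_in_ascii = encoded.endswith(ESC_ASCII + b"\n")
```

Both flags are True for this input, showing that the newline is
written only after the return to ASCII mode.
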
When the encoding of the input is detected wrongly, or the input
contains characters undefined in the expected character sets, the
output is undefined.

This manual was translated by Fumitoshi UKAI <ukai@debian.or.jp> for
the Debian system, but you can use it for any purpose.




Y. Tonooka                     November 19, 1992                        KCC(L)