KAKASI(1)              General Commands Manual              KAKASI(1)

KAKASI - Kanji kana simple inverter (between Kanji, both Kana and
Romaji)

kakasi [options] [jisyo1 [jisyo2 [jisyo1,,]]]

Japanese sentences are often written in a mixture of Chinese
characters (Kanji), Kana (Hiragana and Katakana), and Romaji (a
phonetic transcription in Latin letters). KAKASI converts between
these four ways of writing Japanese.

This program is useful for those whose terminal or desktop cannot
display Japanese natively. It is also a good tool for people who are
learning Japanese, such as international students and children.

Text is read from standard input (stdin), converted, and written to
standard output (stdout). In the following example, the Kanji in the
text are converted into Hiragana.

    kakasi -JH < document

Since version 2.3.0, output with spaces between words has been
supported. In the following example, the output has a space between
each pair of words.

    kakasi -w < document

Since version 2.3.5, level conversion mode has been supported. In the
following example, simple Kanji are left unconverted and difficult
Kanji are converted into Hiragana.

    kakasi -l4 < document

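The filter usage above can also be driven from a script. The following
is a minimal sketch, not part of KAKASI itself: the helper names
kakasi_argv and run_kakasi are hypothetical, and it assumes a kakasi
build that accepts the utf8 coding name given to the -i and -o options
described in this manual.

```python
import shutil
import subprocess


def kakasi_argv(*conversions):
    """Build an argument list such as ['kakasi', '-i', 'utf8', '-o', 'utf8', '-JH']."""
    return ["kakasi", "-i", "utf8", "-o", "utf8", *conversions]


def run_kakasi(text, *conversions):
    """Pipe text through kakasi on stdin and return what it writes to stdout."""
    if shutil.which("kakasi") is None:
        raise FileNotFoundError("kakasi executable not found on PATH")
    result = subprocess.run(
        kakasi_argv(*conversions),
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a nonzero exit status
    )
    return result.stdout.decode("utf-8")
```

With kakasi installed, run_kakasi(text, "-JH") corresponds to the first
command line above.
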
KAKASI can convert Japanese characters to alphabetical characters.
Katakana in the JIS x0201 character set and Hiragana in the JIS x0208
character set can also be converted into each other.

KAKASI distinguishes the following character sets. The letter in
parentheses is the mnemonic used for each set.

ASCII (a)
    Known as the "ascii" character set.

JISROMAN (j)
    Known as the "jis roman" character set.

GRAPHIC (g)
    The DEC graphic character set.

Katakana (k)
    JIS x0201 Katakana, defined as part of the GR character set.

For convenience, JIS x0208 is divided as follows.

Kanji (J)
    JIS x0208 characters in sections 16 through 94.

Hiragana (H)
    JIS x0208 characters in section 4 (Hiragana).

Katakana (K)
    JIS x0208 characters in section 5 (Katakana).

Sign (E)
    JIS x0208 characters in sections 1, 2, 3, 6, 7, and 8.
    (Note that sections 9-15 are undefined in JIS x0208.)

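The section ranges above can be checked with a short sketch. It is an
illustration only, not KAKASI's implementation: the function name
classify is hypothetical, and it relies on Python's euc_jp codec, in
which the first byte of a JIS x0208 character is the section number
plus 0xA0 and JIS x0201 Katakana carries the prefix byte 0x8E. A plain
7-bit character is reported as "a", because JISROMAN and GRAPHIC
cannot be told apart from ASCII by the character value alone.

```python
def classify(ch):
    """Return the mnemonic for one character, or None if it fits no class."""
    if ord(ch) < 0x80:
        return "a"  # 7-bit character; j and g are not distinguishable here
    try:
        data = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return None  # not representable in EUC-JP at all
    if len(data) == 2 and data[0] == 0x8E:
        return "k"  # 0x8E prefix marks JIS x0201 Katakana
    if len(data) != 2:
        return None  # three-byte forms are outside the classes listed above
    section = data[0] - 0xA0  # JIS x0208 section number, 1..94
    if section == 4:
        return "H"
    if section == 5:
        return "K"
    if 16 <= section <= 94:
        return "J"
    if section in (1, 2, 3, 6, 7, 8):
        return "E"
    return None  # sections 9-15 are undefined


for ch in "漢あアｱＡa":
    print(ch, classify(ch))
```
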
Conversion between the following character sets is available.

ASCII             -> JISROMAN, Sign
JISROMAN          -> ASCII, Sign
GRAPHIC           -> ASCII, JISROMAN, Sign
JISx0201 Katakana -> ASCII, JISROMAN, Katakana, Hiragana
Sign              -> ASCII, JISROMAN
Katakana          -> ASCII, JISROMAN, JISx0201 Katakana, Hiragana
Hiragana          -> ASCII, JISROMAN, JISx0201 Katakana, Katakana
Kanji             -> ASCII, JISROMAN, JISx0201 Katakana, Katakana,
                     Hiragana

When the target is ASCII or JISROMAN, JISx0201 Katakana, Katakana,
Hiragana, and Kanji are converted to alphabetical characters.

Examples:

1. Convert all Kanji characters to Hiragana.

    kakasi -JH

2. Convert all JIS x0208 characters to JIS x0201.

    kakasi -Hk -Kk -Jk -Ea

3. Convert all characters to JIS x0208.

    kakasi -aE -jE -gE -kK

4. Convert all characters to ASCII, with words separated.

    kakasi -Ha -Ka -Ja -Ea -ka

5. Exchange Katakana and Hiragana characters.

    kakasi -HK -KH

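To make the effect of example 5 concrete, the sketch below swaps the
two kinds of Kana in a Unicode string. It is an illustration under
stated assumptions, not KAKASI's own code: it works on Unicode code
points rather than JIS x0208 codes and uses the fact that each
Katakana in the basic block lies 0x60 above the matching Hiragana. The
name swap_kana is hypothetical.

```python
HIRAGANA = range(0x3041, 0x3097)  # U+3041 .. U+3096
KATAKANA = range(0x30A1, 0x30F7)  # U+30A1 .. U+30F6
OFFSET = 0x60  # distance between a Hiragana and the matching Katakana


def swap_kana(text):
    """Turn Hiragana into Katakana and Katakana into Hiragana."""
    out = []
    for ch in text:
        code = ord(ch)
        if code in HIRAGANA:
            out.append(chr(code + OFFSET))
        elif code in KATAKANA:
            out.append(chr(code - OFFSET))
        else:
            out.append(ch)  # Kanji, signs, and ASCII pass through unchanged
    return "".join(out)


print(swap_kana("ひらがな カタカナ"))
```
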
Character sets are categorized by kakasi and indicated by the
following mnemonics: a, j, g, k, E, H, K, J.

a --- ASCII characters
j --- JIS ROMAN characters defined by JIS x0201 (nearly equal to
      ASCII; "~" and "\" are different)
g --- DEC graphic characters
k --- KATAKANA defined by JIS x0201

E, H, K, and J are included in the JIS x0208 character set.

J --- KANJI characters of JIS x0208.
H --- HIRAGANA characters of JIS x0208.
K --- KATAKANA characters of JIS x0208.
E --- The remaining characters of JIS x0208, which include
      alphabets, numbers, symbols, and so on.

-(from)(to) means conversion from character set (from) to character
set (to). For example, the -JH option converts KANJI characters to
HIRAGANA. The combinations in the following table are available. (You
do not need to memorize it, because the -h option shows the same
information.)

to\from| a   j   k   E   H   K   J   g
-------+--------------------------------
   a   | -   o   o1  o   o1  o1  o12 o
   j   | o   -   o1  o   o1  o1  o12 o
   k   |         -       o   o   o2
   E   | o   o       -               o
   H   |         o       -   o   o2
   K   |         o       o   -   o2

o -- converted.
1 -- converted to Romaji.
2 -- Kanji -> Kana conversion.

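The available combinations can be written down as a small lookup
table. This is only a sketch of the list of conversions given in this
manual. The names CONVERSIONS and conversion_option are hypothetical,
and the authoritative list for an installed version is whatever the -h
option prints.

```python
# Source mnemonic -> mnemonics of the character sets it can be converted to.
CONVERSIONS = {
    "a": "jE",
    "j": "aE",
    "g": "ajE",
    "k": "ajKH",
    "E": "aj",
    "K": "ajkH",
    "H": "ajkK",
    "J": "ajkKH",
}


def conversion_option(src, dst):
    """Return the -(from)(to) option string, or raise if it is not listed."""
    if len(dst) != 1 or dst not in CONVERSIONS.get(src, ""):
        raise ValueError(f"no conversion from {src!r} to {dst!r}")
    return f"-{src}{dst}"


print(conversion_option("J", "H"))
```
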
Unfortunately, several coding systems are used in Japan, and the JIS
x0208 standard was revised in 1983. KAKASI therefore detects the
coding system and the revision of the input automatically and uses
the same coding system for output, provided the document does not
include JIS x0201 KATAKANA. If JIS x0201 KATAKANA is included, or if
you wish to change the kanji coding system, use the following options.

-i : input coding
-o : output coding

jis     -- Widely used on the Internet (e.g. the fj and jp
           newsgroups). Derived from the ISO-2022 coding scheme.
           newjis: JISx0208 (1983), invoked by ESC-$-B.
           oldjis: JISx0208 (1978), invoked by ESC-$-@.
euc,dec -- Often used on UNIX-like computers. JISx0208 is
           assigned to GR (MSB is 1). The major difference between
           euc and dec is the assignment of JISx0201 KATAKANA and
           the DEC graphic characters.
sjis    -- Defined by Microsoft Corp. Widely used on personal
           computers (MS-DOS, Mac, ...).
utf8    -- Current international standard. Most modern operating
           systems use this encoding of the Unicode character set
           as the default.

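The differences between these coding systems are easy to see by
encoding the same two Kanji in each of them. The sketch below uses
Python's standard codecs. The mapping from the KAKASI coding names to
Python codec names is an assumption made for illustration, and the dec
coding has no counterpart here.

```python
# Assumed correspondence between KAKASI coding names and Python codecs.
CODECS = {
    "jis": "iso2022_jp",
    "euc": "euc_jp",
    "sjis": "shift_jis",
    "utf8": "utf-8",
}


def encode_all(text):
    """Encode text in every coding system listed in CODECS."""
    return {name: text.encode(codec) for name, codec in CODECS.items()}


encoded = encode_all("漢字")
for name, data in encoded.items():
    print(f"{name:5} {data!r}")
```

The jis result begins with the escape sequence ESC $ B that invokes
JISx0208 (1983), and every byte of the euc result has its most
significant bit set, as described above.
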
Kanji-kana conversion options, used with the -J? options. There are
two systems of Romaji spelling. The first is the Kunrei method,
defined by the Japanese government, and the second is the Hepburn
method. The Hepburn method probably sounds more natural to
foreigners.

-rhepburn : Hepburn method (default)
-rkunrei  : Kunrei method

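The two methods differ only for certain syllables. The sketch below
lists a few well-known cases. It is not KAKASI's conversion table: the
names ROMAJI and romanize are hypothetical, and only the handful of
Hiragana shown are covered.

```python
# kana: (Hepburn spelling, Kunrei spelling)
ROMAJI = {
    "し": ("shi", "si"),
    "ち": ("chi", "ti"),
    "つ": ("tsu", "tu"),
    "ふ": ("fu", "hu"),
    "じ": ("ji", "zi"),
    "す": ("su", "su"),
}


def romanize(kana, system="hepburn"):
    """Spell a string of the Hiragana above in the chosen system."""
    index = {"hepburn": 0, "kunrei": 1}[system]
    return "".join(ROMAJI[ch][index] for ch in kana)


print(romanize("すし"), romanize("すし", "kunrei"))
```
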
-p: List all possible readings. If two or more readings are
    possible, KAKASI shows them in braces: {aaa,bbb}.
-s: Insert a separator character between words.
-f: Furigana mode. Shows the original Kanji word together with its
    reading.
-c: Skip characters within a word (default: TAB, CR, LF, BLANK).
-C: Capitalize Romaji words (with the -Ja or -Jj option).
-U: Convert Romaji words to upper case (with the -Ja or -Jj option).
-u: Call fflush().
-w: Wakatigaki mode. "Wakatigaki" is word segmentation of Japanese
    sentences.

KAKASI accepts additional dictionaries that supplement the system
dictionary. Additional dictionaries may be in SKK format, Wnn format,
and so on. In each case, a record is one line with two fields: Yomi
(reading) and Jukugo (idiom). Fields are separated by commas, TABs,
or blanks. The kanji code is restricted to JIS or EUC. See the
separate document named JISYO for more details.

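The record layout described above can be sketched as a small parser.
It handles only the simple two-field form mentioned in this manual,
and the JISYO document remains the reference for the full formats. The
function name parse_record and the sample entries are hypothetical.

```python
import re


def parse_record(line):
    """Split one dictionary line into (yomi, jukugo)."""
    # Fields are separated by commas, TABs, or blanks; split only once
    # so that anything after the second field starts is kept intact.
    fields = re.split(r"[,\t ]+", line.strip(), maxsplit=1)
    if len(fields) != 2 or not all(fields):
        raise ValueError(f"not a two-field record: {line!r}")
    return fields[0], fields[1]


for sample in ("かんじ 漢字", "にほんご,日本語", "とうきょう\t東京"):
    print(parse_record(sample))
```
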
The behavior of KAKASI is affected by the following environment
variables.

KANWADICTPATH
    Specifies the path of kanwadict (full path including the file
    name). The default value is $prefix/share/kakasi/kanwadict.

ITAIJIDICTPATH
    Specifies the path of itaijidict (full path including the file
    name). The default value is $prefix/share/kakasi/itaijidict.

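The lookup rule above, environment variable first and default path
otherwise, can be sketched as follows. The function name
dictionary_paths is hypothetical, and the installation prefix must be
supplied by the caller because its value depends on how KAKASI was
built.

```python
import os


def dictionary_paths(prefix, environ=None):
    """Return the dictionary paths that would apply for a given prefix."""
    env = os.environ if environ is None else environ
    share = f"{prefix}/share/kakasi"
    return {
        "kanwadict": env.get("KANWADICTPATH", f"{share}/kanwadict"),
        "itaijidict": env.get("ITAIJIDICTPATH", f"{share}/itaijidict"),
    }


# "/usr/local" is only an example prefix.
print(dictionary_paths("/usr/local"))
```
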
Hironobu Takahasi <takahasi@tiny.or.jp>

$prefix/share/kakasi/kanwadict
    The binary dictionary of KAKASI. It is generated automatically
    from kakasidict by mkkanwa when the package is installed.

mkkanwa(1)

KAKASI returns a nonzero exit status if any error occurs.

Report bugs to the KAKASI Project <kakasi-dev@namazu.org>. Please DO
NOT contact the original author (Takahasi-san).

The content of the English manual is not exactly the same as that of
the Japanese manual.

4.3 Berkeley Distribution            LOCAL                  KAKASI(1)