NKF(3pm) - f36

NKF(3)                User Contributed Perl Documentation               NKF(3)

NAME

       NKF - Perl extension for Network Kanji Filter

SYNOPSIS

         use NKF;
         $output = nkf("-s", $input);

DESCRIPTION

       This is a Perl extension version of nkf (Network Kanji Filter).  It
       converts the last argument and returns the converted result.
       Conversion details are specified by the flags before the last
       argument.  nkf is yet another kanji code converter for use among
       networks, hosts and terminals.  It converts input kanji code to a
       designated kanji code such as ISO-2022-JP, Shift_JIS, EUC-JP, UTF-8,
       UTF-16 or UTF-32.

       One of nkf's most distinctive features is that it guesses the input
       kanji encoding.  It currently recognizes ISO-2022-JP, Shift_JIS,
       EUC-JP, UTF-8, UTF-16 and UTF-32, so users need not specify the input
       kanji code explicitly.

       By default, X0201 kana is converted into X0208 kana.  For X0201 kana,
       SO/SI, SSO and ESC-(-I methods are supported.  For automatic code
       detection, nkf assumes no X0201 kana in Shift_JIS.  To accept X0201 in
       Shift_JIS, use -X, -x or -S.

       Multiple options are specified as separate strings, each passed
       before the last argument, for example:

         print nkf('--ic=UTF8-MAC', '-w', $string), "\n";
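
       As a minimal sketch of the calling convention described above, the
       following converts a short UTF-8 string to Shift_JIS and EUC-JP and
       back again.  It assumes that nkf() is given raw byte strings rather
       than decoded Perl character strings; the variable names are
       illustrative only.

```perl
use strict;
use warnings;
use NKF;

# "日本語" as raw UTF-8 bytes (assumption: nkf() operates on bytes).
my $utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# Flags come first; the text to convert is always the last argument.
my $sjis = nkf('-W', '-s', $utf8);   # UTF-8 in, Shift_JIS out
my $euc  = nkf('-W', '-e', $utf8);   # UTF-8 in, EUC-JP out

# Convert back, naming the input encoding explicitly instead of
# relying on automatic detection for such a short string.
my $back = nkf('-S', '-w', $sjis);

printf "Shift_JIS:  %s\n", unpack('H*', $sjis);
printf "EUC-JP:     %s\n", unpack('H*', $euc);
printf "round trip: %s\n", $back eq $utf8 ? 'ok' : 'differs';
```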

OPTIONS

       -J -S -E -W -W16 -W32 -j -s -e -w -w16 -w32
           Specify input and output encodings.  Upper-case flags select the
           input encoding; lower-case flags select the output encoding.  cf.
           --ic and --oc.

           -J  ISO-2022-JP (JIS code).

           -S  Shift_JIS and JIS X 0201 kana.  EUC-JP is recognized as X0201
               kana.  Without the -x flag, JIS X 0201 Katakana (a.k.a.
               halfwidth kana) is converted into JIS X 0208.  If you use
               Windows, see Windows-31J (CP932).

           -E  EUC-JP.

           -W  UTF-8N.

           -W16[BL][0]
               UTF-16.  B or L selects Big Endian or Little Endian.  0
               specifies whether a BOM is used or not.

           -W32[BL][0]
               UTF-32.  B or L selects Big Endian or Little Endian.  0
               specifies whether a BOM is used or not.

       -b -u
           Output is buffered with -b (DEFAULT) and unbuffered with -u.

       -t  No conversion.

       -i[@B]
           Specify the escape sequence for JIS X 0208.

           -i@ Use ESC $ @. (JIS X 0208-1978)

           -iB Use ESC $ B. (JIS X 0208-1983/1990 DEFAULT)

       -o[BJ]
           Specify the escape sequence for US-ASCII/JIS X 0201 Roman. (DEFAULT
           B)

       -r  {de/en}crypt ROT13/47

       -h[123] --hiragana --katakana --katakana-hiragana
           -h1 --hiragana
               Katakana to Hiragana conversion.

           -h2 --katakana
               Hiragana to Katakana conversion.

           -h3 --katakana-hiragana
               Katakana to Hiragana and Hiragana to Katakana conversion.

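           The three kana conversions can be sketched as follows.  This is
           an illustration only: the input is written as raw UTF-8 bytes,
           and -W and -w are added so that input and output are both UTF-8.

```perl
use strict;
use warnings;
use NKF;

my $hira = "\xE3\x81\x82";   # HIRAGANA LETTER A (U+3042) as UTF-8 bytes
my $kata = "\xE3\x82\xA2";   # KATAKANA LETTER A (U+30A2) as UTF-8 bytes

my $to_kata = nkf('-W', '-w', '--katakana', $hira);       # same as -h2
my $to_hira = nkf('-W', '-w', '--hiragana', $kata);       # same as -h1
my $swapped = nkf('-W', '-w', '-h3', $hira . $kata);      # swaps both kinds

printf "-h2: %s\n", unpack('H*', $to_kata);
printf "-h1: %s\n", unpack('H*', $to_hira);
printf "-h3: %s\n", unpack('H*', $swapped);
```
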
       -T  Text mode output (MS-DOS)

       -f[m [- n]]
           Fold lines at length m with margin n.  When m and n are omitted,
           the fold length is 60 and the fold margin is 10.

       -F  New line preserving line folding.

       -Z[0-3]
           Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.

           -Z -Z0
               Convert X0208 alphabet to ASCII.

           -Z1 Convert X0208 kankaku (fullwidth space) to a single ASCII
               space.

           -Z2 Convert X0208 kankaku (fullwidth space) to two ASCII spaces.

           -Z3 Replace fullwidth >, <, ", & with '&gt;', '&lt;', '&quot;',
               '&amp;' as in HTML.

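           A small sketch of the -Z variants, assuming UTF-8 input and
           output (-W and -w) and raw byte strings; the sample text is made
           up for illustration.

```perl
use strict;
use warnings;
use NKF;

# FULLWIDTH A, IDEOGRAPHIC SPACE, FULLWIDTH B as raw UTF-8 bytes.
my $wide = "\xEF\xBC\xA1" . "\xE3\x80\x80" . "\xEF\xBC\xA2";

my $one_space  = nkf('-W', '-w', '-Z1', $wide);   # fullwidth space -> one space
my $two_spaces = nkf('-W', '-w', '-Z2', $wide);   # fullwidth space -> two spaces

# Fullwidth letters only, converted with plain -Z.
my $letters = nkf('-W', '-w', '-Z', "\xEF\xBC\xA1\xEF\xBC\xA2\xEF\xBC\xA3");

print "[$one_space] [$two_spaces] [$letters]\n";
```
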
       -X -x
           With -X or without this option, X0201 is converted into X0208 Kana.
           With -x, try to preserve X0208 kana and do not convert X0201 kana
           to X0208.  In JIS output, ESC-(-I is used. In EUC output, SS2 is
           used.

       -B[0-2]
           Assume broken JIS-Kanji input, which lost ESC.  Useful when your
           site is using old B-News Nihongo patch.

           -B1 allows any chars after ESC-( or ESC-$.

           -B2 forces ASCII after NL.

       -I  Replace non-ISO-2022-JP characters with a geta character
           (substitute character in Japanese).

       -m[BQN0]
           MIME ISO-2022-JP/ISO8859-1 decode. (DEFAULT) To see ISO8859-1
           (Latin-1) -l is necessary.

           -mB Decode MIME base64 encoded stream. Remove header or other part
               before conversion.

           -mQ Decode MIME quoted stream. '_' in quoted stream is converted to
               space.

           -mN Non-strict decoding.  It allows line breaks in the middle of
               the base64 encoding.

           -m0 No MIME decode.

       -M  MIME encode. Header style. All ASCII code and control characters
           are left intact.

           -MB MIME encode Base64 stream.  Kanji conversion is performed
               before encoding, so this cannot be used as a picture encoder.

           -MQ Perform quoted encoding.

       -l  Input and output code is ISO8859-1 (Latin-1) and ISO-2022-JP.  -s,
           -e and -x are not compatible with this option.

       -L[uwm] -d -c
           Convert line breaks.

           -Lu -d
               unix (LF)

           -Lw -c
               windows (CRLF)

           -Lm mac (CR)

           Without this option, nkf doesn't convert line breaks.

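           The line break options can be tried on plain ASCII text, as in
           the sketch below.  The sample strings are illustrative; no
           encoding flags are given because ASCII is expected to pass
           through unchanged.

```perl
use strict;
use warnings;
use NKF;

my $unix_text = "one\ntwo\n";
my $dos_text  = "one\r\ntwo\r\n";

my $to_crlf = nkf('-Lw', $unix_text);   # LF   -> CRLF
my $to_lf   = nkf('-Lu', $dos_text);    # CRLF -> LF
my $to_cr   = nkf('-Lm', $unix_text);   # LF   -> CR

# Show the results as hex so the line terminators are visible.
printf "%s\n", unpack('H*', $_) for $to_crlf, $to_lf, $to_cr;
```
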
       --fj --unix --mac --msdos --windows
           Convert for these systems.

       --jis --euc --sjis --mime --base64
           Convert to named code.

       --jis-input --euc-input --sjis-input --mime-input --base64-input
           Assume input system.

       --ic=input codeset --oc=output codeset
           Set the input or output codeset.  NKF supports the following
           codesets; codeset names are case insensitive.

           ISO-2022-JP
               a.k.a. RFC1468, 7bit JIS, JUNET

           EUC-JP (eucJP-nkf)
               a.k.a. AT&T JIS, Japanese EUC, UJIS

           eucJP-ascii
           eucJP-ms
           CP51932
               Microsoft Version of EUC-JP.

           Shift_JIS
               a.k.a. SJIS, MS_Kanji

           Windows-31J
               a.k.a. CP932

           UTF-8
               same as UTF-8N

           UTF-8N
               UTF-8 without BOM

           UTF-8-BOM
               UTF-8 with BOM

           UTF8-MAC (input only)
               decomposed UTF-8

           UTF-16
               same as UTF-16BE

           UTF-16BE
               UTF-16 Big Endian without BOM

           UTF-16BE-BOM
               UTF-16 Big Endian with BOM

           UTF-16LE
               UTF-16 Little Endian without BOM

           UTF-16LE-BOM
               UTF-16 Little Endian with BOM

           UTF-32
               same as UTF-32BE

           UTF-32BE
               UTF-32 Big Endian without BOM

           UTF-32BE-BOM
               UTF-32 Big Endian with BOM

           UTF-32LE
               UTF-32 Little Endian without BOM

           UTF-32LE-BOM
               UTF-32 Little Endian with BOM

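           To illustrate how the BOM and byte-order variants listed above
           differ, the sketch below converts one character with several
           --oc values and prints the resulting bytes in hex.  It assumes
           raw UTF-8 bytes as input; the choice of character is arbitrary.

```perl
use strict;
use warnings;
use NKF;

my $utf8 = "\xE6\x97\xA5";   # U+65E5 as raw UTF-8 bytes

# Same input, different output codesets selected with --oc.
my %out;
for my $codeset (qw(UTF-16BE UTF-16LE-BOM UTF-8-BOM)) {
    $out{$codeset} = nkf('--ic=UTF-8', "--oc=$codeset", $utf8);
    printf "%-13s %s\n", $codeset, unpack('H*', $out{$codeset});
}
```
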
       --fb-{skip, html, xml, perl, java, subchar}
           Specify the way that nkf handles unassigned characters.  Without
           this option, --fb-skip is assumed.

       --prefix=<escape character><target character>..
           When nkf converts to Shift_JIS, nkf adds a specified escape
           character to specified 2nd byte of Shift_JIS characters.  1st byte
           of argument is the escape character and following bytes are target
           characters.

       --no-cp932ext
           Handle the characters extended in CP932 as unassigned characters.

       --no-best-fit-chars
           When converting from Unicode to an encoded byte sequence, do not
           convert characters that are not round-trip safe.  When converting
           from Unicode to Unicode, nkf can be used as a UTF converter with
           this option and -x.  (In other words, without this option and -x,
           nkf does not preserve some characters.)

           When nkf converts strings that relate to paths, you should use
           this option.

       --cap-input
           Decode hex encoded characters.

       --url-input
           Unescape percent escaped characters.

       --numchar-input
           Decode character references, such as "&#....;".

       --  Ignore the rest of the -options.
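
       The input-decoding options can be sketched as follows.  This is an
       illustration under the assumption that the escaped text represents
       UTF-8 (hence -W) and that UTF-8 output is wanted (-w); the sample
       strings are made up.

```perl
use strict;
use warnings;
use NKF;

# Percent-escaped UTF-8 bytes for U+65E5.
my $from_url = nkf('-W', '-w', '--url-input', '%E6%97%A5');

# Decimal numeric character reference for the same character.
my $from_ref = nkf('-W', '-w', '--numchar-input', '&#26085;');

printf "url:     %s\n", unpack('H*', $from_url);
printf "numchar: %s\n", unpack('H*', $from_ref);
```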

AUTHOR

       Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).

       Copyright (c) 1996-2015, The nkf Project.
