enca(1) enca(1)



NAME
enca -- detect and convert encoding of text files

SYNOPSIS
enca [-L LANGUAGE] [OPTION]... [FILE]...
enconv [-L LANGUAGE] [OPTION]... [FILE]...

INTRODUCTION AND EXAMPLES
If you are lucky enough, the only two things you will ever need to know
are: the command

enca FILE

will tell you which encoding file FILE uses (without changing it), and

enconv FILE

will convert file FILE to your locale's native encoding. To convert the
file to some other encoding, use the -x option (see the -x entry in
section OPTIONS and sections CONVERSION and ENCODINGS for details).

Both work with multiple files and with standard input (output), too.
E.g.

enca -x latin2 <sometext | lpr

ensures the file `sometext' is in ISO Latin 2 when it is sent to the
printer.

The main reason why these commands may fail and turn your files into
garbage is that Enca needs to know their language to detect the
encoding. It tries to determine your language and preferred charset
from locale settings, which might not be what you want.

You can (or have to) use the -L option to tell it the right language.
Suppose you downloaded some Russian HTML file, `file.htm', that claims
to be windows-1251 but isn't. So you run

enca -L ru file.htm

and find out it's KOI8-R (for example). Be warned, currently there are
not many supported languages (see section LANGUAGES).

Another warning: several of Enca's features, namely its charset
conversion capabilities, strongly depend on what other tools are
installed on your system (see section CONVERSION). Run

enca --version

to get the list of features (see section FEATURES). Also try

enca --help

to get a description of all other Enca options (and to find the rest of
this manual page redundant).

DESCRIPTION
Enca reads the given text files, or standard input when none are given,
and uses knowledge about their language (which must be supplied by you)
and a mixture of parsing, statistical analysis, guessing and black magic
to determine their encodings, which it then prints to standard output
(or it confesses it doesn't have any idea what the encoding could be).
By default, Enca presents results as multiline human-readable
descriptions; several other formats are available (see Output type
selectors below).

Enca can also convert files to some other encoding ENC when you ask for
it, using either a built-in converter, some conversion library, or an
external converter.

Enca's primary goal is to be usable unattended, as an automatic
conversion tool, though it has perhaps not reached this point yet
(please see section SECURITY).

Please note that, except in rare cases, Enca really has to know the
language of input files to give you a reliable answer. On the other
hand, it can then cope quite well with files that are not purely textual
or even detect the charset of text strings inside some binary file; of
course, it depends on the character of the non-text component.

Enca doesn't care about the structure of input files; it views them as a
uniform piece of text/data. In case of multipart files (e.g.
mailboxes), you have to use some tool that knows the structure to
extract the individual parts first. This is the cost of the ability to
detect encodings of any damaged, incomplete or otherwise incorrect
files.

OPTIONS
There are several categories of options: operation mode options, output
type selectors, guessing parameters, conversion parameters, general
options and listings.

All long options can be abbreviated as long as they are unambiguous.
Mandatory parameters of long options are mandatory for short options
too.

Operation modes
are the following:

-c, --auto-convert
Equivalent to calling Enca as enconv.

If no output type selector is specified, detect file encodings, guess
your preferred charset from locales, and convert files to it (only
available with the +target-charset-auto feature).

-g, --guess
Equivalent to calling Enca as enca.

If no output type selector is specified, detect file encodings and
report them.

Output type selectors
select what action Enca will take when it determines the encoding. Most
of them just choose between different names, formats and conventions
for printing encodings, but one of them (-x) is special: it tells Enca
to recode files to some other encoding ENC. These options are mutually
exclusive; if you specify more than one output type selector, the last
one takes precedence.

Several output types represent the charset name used by some other
program, but not all these programs know all the charsets which Enca
recognises. Be warned, in such situations Enca does not distinguish
between an unrecognised charset and a charset that has no name in the
given namespace.

-d, --details
It used to print a few pages of details about the guessing process, but
since Enca is now just a program linked against the Enca library, this
is no longer possible. This option is roughly equivalent to
--human-readable, except that it reports the failure reason when Enca
doesn't recognize the encoding.

-e, --enca-name
Prints Enca's nice name of the charset, i.e., perhaps the most generally
accepted and more or less human-readable charset identifier, with
surfaces appended.

This name is used when calling an external converter, too.

-f, --human-readable
Prints a verbal description of the detected charset and surfaces,
something a human understands best. This is the default behaviour.

The precise format is the following: the first line contains the
charset name alone, and it's followed by zero or more indented lines
containing names of detected surfaces. This format is not, however,
suitable or intended for further machine processing, and the verbal
charset descriptions are likely to change in the future.

-i, --iconv-name
Prints how iconv(3) (and/or iconv(1)) calls the detected charset. More
precisely, it prints one, more or less arbitrarily chosen, alias
accepted by iconv. A charset unknown to iconv counts as unknown.

This output type makes sense only when Enca is compiled with iconv
support (feature +iconv-interface).

-r, --rfc1345-name
Prints the RFC 1345 charset name. When such a name doesn't exist
because RFC 1345 doesn't define a given encoding, some other name
defined in some other RFC, or just the name which the author considers
`the most canonical', is printed.

Since RFC 1345 doesn't define surfaces, no surface info is appended.

-m, --mime-name
Prints the preferred MIME name of the detected charset. This is the
name you should normally use when fixing e-mails or web pages.

A charset not present in http://www.iana.org/assignments/character-sets
counts as unknown.

-s, --cstocs-name
Prints how cstocs(1) calls the detected charset. A charset unknown to
cstocs counts as unknown.

-n, --name=WORD
Prints the charset (encoding) name selected by WORD (can be abbreviated
as long as it is unambiguous). For names listed above, --name=WORD is
equivalent to --WORD.

Using aliases as the output type causes Enca to print the list of all
accepted aliases of the detected charset.

-x, --convert-to=[..]ENC
Converts file to encoding ENC.

The optional `..' before the encoding name has no special meaning,
except that you can use it to remind yourself that, unlike in recode(1),
you should specify the desired encoding instead of the current one.

You can use recode(1) recoding chains or any other kind of braindead
recoding specification for ENC, provided that you tell Enca to use some
tool that understands it for conversion (see section CONVERSION).

When Enca fails to determine the encoding, it prints a warning and
leaves the file as is; when it is run as a filter, it does its best to
copy standard input to standard output unchanged. Nevertheless, you
should not rely on this; make a backup first.
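
Since the backup has to be made by hand, a small wrapper can take care
of it. The following is only a sketch: the function name
convert_with_backup and the `.orig' suffix are made up for illustration,
and it assumes that any non-zero exit status from enca means the file
should be put back as it was.

```shell
# convert_with_backup LANG ENC FILE
# Hypothetical helper: copies FILE to FILE.orig, then asks Enca to
# convert FILE in place. If enca exits with a non-zero status, the
# original content is restored from the copy.
convert_with_backup() {
    cwb_lang=$1 cwb_enc=$2 cwb_file=$3
    cwb_backup=$cwb_file.orig

    # -p keeps mode and timestamps on the backup copy
    cp -p -- "$cwb_file" "$cwb_backup" || return 2

    cwb_status=0
    enca -L "$cwb_lang" -x "$cwb_enc" "$cwb_file" || cwb_status=$?

    if [ "$cwb_status" -ne 0 ]; then
        # cat rewrites the content but keeps FILE's own permissions
        cat -- "$cwb_backup" > "$cwb_file"
    fi
    return "$cwb_status"
}
```

For example, `convert_with_backup ru utf-8 file.htm' leaves
file.htm.orig behind; remove it once you have checked the result.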

Guessing parameters
There's only one: -L, which sets the language of input files. This
option is mandatory (but see below).

-L, --language=LANG
Sets the language of input files to LANG.

More precisely, LANG can be any valid locale name (or alias, with the
+locale-alias feature) of some supported language. You can also specify
`none' as the language name; only multibyte encodings are recognised
then. Run

enca --list languages

to get the list of supported languages. When you don't specify any
language, Enca tries to guess your language from locale settings and
assumes input files use this language. See section LANGUAGES for
details.

Conversion parameters
give you finer control of how charset conversion will be performed.
They don't affect anything when -x is not specified as the output type.
Please see section CONVERSION for the gory conversion details.

-C, --try-converters=LIST
Appends the comma-separated LIST to the list of converters that will be
tried when you ask for conversion. Their names can be abbreviated as
long as they are unambiguous. Run

enca --list converters

to get the list of all valid converter names (and see section
CONVERSION for their description).

The default list depends on how Enca has been compiled; run

enca --help

to find out the default converter list.

Note that the default list is used only when you don't specify -C at
all. Otherwise, the list is built as if it were initially empty and
every -C adds new converter(s) to it. Moreover, specifying none as a
converter name clears the converter list.
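
The rules for building the converter list can be restated as a few
lines of shell. This is a toy model of the description above, not
Enca's code; the function name build_converter_list is hypothetical,
and the default list is passed in by hand because it depends on the
build.

```shell
# build_converter_list DEFAULT [LIST]...
# DEFAULT is the compiled-in list; each further argument is the value
# of one -C option, in command-line order.
build_converter_list() {
    bcl_default=$1
    shift

    # No -C at all: the default list is used unchanged.
    if [ "$#" -eq 0 ]; then
        printf '%s\n' "$bcl_default"
        return 0
    fi

    # At least one -C: start from an empty list.
    bcl_list=
    for bcl_arg in "$@"; do
        # Split one comma-separated -C value into names.
        for bcl_name in $(printf '%s\n' "$bcl_arg" | tr ',' ' '); do
            if [ "$bcl_name" = none ]; then
                bcl_list=    # `none' clears what was collected so far
            else
                bcl_list=${bcl_list:+$bcl_list,}$bcl_name
            fi
        done
    done
    printf '%s\n' "$bcl_list"
}
```

With a default of `built-in,iconv', giving `-C iconv -C extern' yields
`iconv,extern' in this model: the default list is not consulted once any
-C is present.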

-E, --external-converter-program=PATH
Sets the external converter program name to PATH. The default external
converter depends on how enca has been compiled, and the possibility to
use external converters may not be available at all. Run

enca --help

to find out the default converter program in your enca build.

General options
don't fit into the other option categories...

-p, --with-filename
Forces Enca to prefix each result with the corresponding file name. By
default, Enca prefixes results with file names when run on multiple
files.

Standard input is printed as STDIN and standard output as STDOUT (the
latter can probably be seen in error messages only).

-P, --no-filename
Forces Enca not to prefix results with file names. By default, Enca
doesn't prefix the result with the file name when run on a single file
(including standard input).

-V, --verbose
Increases verbosity level (each use increases it by one).

Currently this option is not very useful because different parts of
Enca respond differently to the same verbosity level, mostly not at all.

Listings
are all terminal, i.e. when Enca encounters one of them it prints the
required listing and terminates without processing any following
options.

-h, --help
Prints brief usage help.

-G, --license
Prints the full Enca license (through a pager, if possible).

-l, --list=WORD
Prints the list specified by WORD (can be abbreviated as long as it is
unambiguous). Available lists include:

built-in-charsets. All encodings convertible by the built-in converter,
by group (both input and output encoding must be from this list and
belong to the same group for internal conversion).

built-in-encodings. Equivalent to built-in-charsets, but considered
obsolete; will be accepted with a warning, for a while.

converters. All valid converter names (to be used with -C).

charsets. All encodings (charsets). You can select what names will be
printed with --name or any name output type selector (of course, only
encodings having a name in the given namespace will be printed then);
the selector must be specified before --list.

encodings. Equivalent to charsets, but considered obsolete; will be
accepted with a warning, for a while.

languages. All supported languages together with the charsets belonging
to them. Note that the output type selects the language name style, not
the charset name style, here.

names. All possible values of the --name option.

lists. All possible values of this option. (Crazy?)

surfaces. All surfaces Enca recognises.

-v, --version
Prints the program version and the list of features (see section
FEATURES).

CONVERSION
Though Enca was originally designed as a tool for guessing encodings
only, it now features several methods of charset conversion. You can
control which of them will be used with -C.

Enca sequentially tries converters from the list specified by -C until
it finds one that is able to perform the required conversion or until
it exhausts the list. You should specify preferred converters first,
less preferred ones later. The external converter (extern) should
always be specified last, only as a last resort, since it's usually not
possible to recover when it fails. The default list of converters
always starts with built-in and then continues with the first one
available from: librecode, iconv, nothing.

Note that when Enca says it is not able to perform the conversion, it
only means that none of the converters is able to perform it. It may
still be possible to perform the required conversion in several steps,
using several converters, but figuring out how probably needs human
intelligence.

Built-in converter
is the simplest and by far the fastest of all. It can perform only a
few byte-to-byte conversions and modifies files directly in place (which
may be considered dangerous, but is pretty efficient). You can get the
list of all encodings it can convert with

enca --list built-in

Besides speed, its main advantage (and also disadvantage) is that it
doesn't care: it simply converts characters having a representation in
the target encoding, doesn't touch anything else and never prints any
error message.

This converter can be specified as built-in with -C.

Librecode converter
is an interface to the GNU recode library, which does the actual
recoding job. It may or may not be compiled in; run

enca --version

to find out its availability in your enca build (feature
+librecode-interface).

You should be familiar with recode(1) before using it, since recode is
quite a sophisticated and powerful charset conversion tool. You may run
into problems using it together with Enca, particularly because Enca's
support for surfaces is not 100% compatible, because recode tries too
hard to make the transformation reversible, because it sometimes
silently ignores I/O errors, and because it's incredibly buggy. Please
see the GNU recode info pages for details about the recode library.

This converter can be specified as librecode with -C.

Iconv converter
is an interface to the UNIX98 iconv(3) conversion functions, which do
the actual recoding job. It may or may not be compiled in; run

enca --version

to find out its availability in your enca build (feature
+iconv-interface).

While iconv is present on most of today's systems, it only rarely
offers a useful set of available conversions, the only notable
exception being iconv from GNU libc. It is usually quite picky about
surfaces, too (while, at the same time, not implementing surface
conversion). It probably represents, however, the only standard(ized)
tool able to perform conversion from/to Unicode. Please see the iconv
documentation for details about its capabilities on your particular
system.

This converter can be specified as iconv with -C.

External converter
is an arbitrary external conversion tool that can be specified with the
-E option (at most one can be defined at a time). Some standard ones
are provided together with enca: cstocs, recode, map, umap, and piconv.
All are wrapper scripts for cstocs(1), recode(1), map(1), umap(1), and
piconv(1), respectively.

Please note enca has little control over what the external converter
really does. If you set it to /bin/rm, you are fully responsible for
the consequences.

If you want to make your own converter to use with enca, you should
know it is always called

CONVERTER ENC_CURRENT ENC FILE [-]

where CONVERTER is what has been set by -E, ENC_CURRENT is the detected
encoding, ENC is what has been specified with -x, and FILE is the file
to convert, i.e. it is called for each file separately. The optional
fourth parameter, -, should cause (when present) the result of the
conversion to be sent to standard output instead of overwriting the
file FILE. The converter should also take care of not changing file
permissions, returning error code 1 when it fails, and cleaning up its
temporary files. Please see the standard external converters for
examples.

This converter can be specified as extern with -C.
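
A converter obeying this calling convention might look roughly like the
function below, which delegates to iconv(1). It is a sketch under
several assumptions: the name enca_iconv_converter is made up, the
encoding names passed in are assumed to be ones your iconv accepts
(names carrying a surface suffix would need to be mapped first), and it
is not one of the wrapper scripts shipped with enca.

```shell
# enca_iconv_converter ENC_CURRENT ENC FILE [-]
# Hypothetical converter body following the calling convention above.
enca_iconv_converter() {
    [ "$#" -ge 3 ] || return 1
    eic_from=$1 eic_to=$2 eic_file=$3

    # Optional fourth parameter `-': send the result to standard output.
    if [ "${4:-}" = "-" ]; then
        iconv -f "$eic_from" -t "$eic_to" "$eic_file" || return 1
        return 0
    fi

    # Convert into a temporary file first, so FILE stays untouched
    # when iconv fails.
    eic_tmp=$(mktemp) || return 1
    if iconv -f "$eic_from" -t "$eic_to" "$eic_file" > "$eic_tmp" &&
       cat "$eic_tmp" > "$eic_file"   # rewriting keeps FILE's permissions
    then
        eic_status=0
    else
        eic_status=1                  # error code 1 on failure
    fi
    rm -f "$eic_tmp"                  # clean up the temporary file
    return "$eic_status"
}
```

To try it with enca, the body would go into an executable script that
ends with `enca_iconv_converter "$@"', and the script's path would be
given with -E together with `-C extern'.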

Default target charset
The straightforward way of specifying the target charset is the -x
option, which overrides any defaults. When Enca is called as enconv,
the default target charset is selected exactly the same way as
recode(1) does it.

If the DEFAULT_CHARSET environment variable is set, it's used as the
target charset.

Otherwise, if your system provides the nl_langinfo(3) function, the
current locale's native charset is used as the target charset.

When both methods fail, Enca complains and terminates.
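
To see which charset this order would probably pick in your
environment, the steps can be imitated in shell. This is a rough model
only: the function name default_target_charset is hypothetical, an empty
DEFAULT_CHARSET is treated as unset, and `locale charmap' is used as a
stand-in for nl_langinfo(3), which is an assumption about equivalence
rather than what Enca itself calls.

```shell
# default_target_charset
# Prints the charset chosen by the order described above, or fails.
default_target_charset() {
    # 1. DEFAULT_CHARSET wins when it is set (and non-empty here).
    if [ -n "${DEFAULT_CHARSET:-}" ]; then
        printf '%s\n' "$DEFAULT_CHARSET"
        return 0
    fi

    # 2. Otherwise ask the current locale for its native charset.
    dtc_charmap=$(locale charmap 2>/dev/null) || dtc_charmap=
    if [ -n "$dtc_charmap" ]; then
        printf '%s\n' "$dtc_charmap"
        return 0
    fi

    # 3. Both methods failed.
    echo "cannot determine default target charset" >&2
    return 1
}
```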

Reversibility notes
If reversibility is crucial for you, you shouldn't use enca as a
converter at all (or maybe you can, with a very specifically designed
recode(1) wrapper). Otherwise you should at least know that there are
four basic means of handling inconvertible character entities:

fail--this is a possibility, too, and incidentally it's exactly what the
current GNU libc iconv implementation does (recode can also be told to
do it)

don't touch them--this is what the enca internal converter always does
and recode can do; though it is not reversible, a human being is usually
able to reconstruct the original (at least in principle)

approximate them--this is what cstocs can do, and recode too, though
differently; it is the best choice if you just want to make the
accursed text readable

drop them out--this is what both recode and cstocs can do (cstocs can
also replace these characters with some fixed character instead of
merely ignoring them); useful when the to-be-omitted characters contain
only noise.
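
Two of these behaviours, failing and dropping, can be tried directly
with iconv(1). The helper names below are made up, and the behaviour
shown is that of GNU libc iconv; other iconv implementations may react
differently to inconvertible input, so treat this as an illustration
rather than a guarantee.

```shell
# Both helpers read UTF-8 on standard input and target plain ASCII.

# "fail": iconv stops with an error at the first inconvertible
# character.
to_ascii_strict() {
    iconv -f UTF-8 -t ASCII
}

# "drop them out": -c asks iconv to omit inconvertible characters
# from the output.
to_ascii_drop() {
    iconv -c -f UTF-8 -t ASCII
}
```

Feeding a word with one accented letter through to_ascii_drop returns
the word without that letter, while to_ascii_strict exits with a
non-zero status on the same input.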

Please consult your favourite converter's manual for details on this
issue. Generally, if you are not lucky enough to have only convertible
characters in your file, manual intervention is needed anyway.

Performance notes
Poor performance of available converters has been one of the main
reasons for including the built-in converter in enca. Try to use it
whenever possible, i.e. when the files in question are charset-clean
enough or charset-messy enough that its zero built-in intelligence
doesn't matter. It requires no extra disk space or extra memory and
can outperform recode(1) more than 10 times on large files and the Perl
version (i.e. the faster one) of cstocs(1) more than 400 times on small
files (in fact it's almost as fast as mere cp(1)).

Try to avoid external converters when they are not absolutely necessary,
since all the forking and moving stuff around is incredibly slow.

ENCODINGS
You can get the list of recognised character sets with

enca --list charsets

and using the --name parameter you can select any name you want to be
used in the listing. You can also list all surfaces with

enca --list surfaces

Encoding and surface names are case insensitive and non-alphanumeric
characters are not taken into account. However, non-alphanumeric
characters are mostly not allowed at all. The only ones allowed are:
`-', `_', `.', `:', and `/' (as charset/surface separator). So `ibm852'
and `IBM-852' are the same, while `IBM 852' is not accepted.
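
The comparison rule can be modelled in a few lines of shell. This is a
toy restatement of the paragraph above, not Enca's implementation; the
function name normalize_encoding_name is hypothetical.

```shell
# normalize_encoding_name NAME
# Prints a canonical form of NAME, or fails when NAME contains a
# character that is not allowed.
normalize_encoding_name() {
    case $1 in
        '' | *[!A-Za-z0-9_.:/-]*) return 1 ;;   # empty or not allowed
    esac
    # Fold case, then drop the punctuation that is not taken into
    # account; `/' is kept because it separates charset and surface.
    printf '%s\n' "$1" |
        LC_ALL=C tr '[:upper:]' '[:lower:]' |
        LC_ALL=C tr -d '_.:-'
}
```

In this model two names refer to the same encoding when their
normalized forms are equal, so `IBM-852' and `ibm852' both become
`ibm852', and `IBM 852' is rejected because of the space.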

Charsets
The following list of recognised charsets uses Enca's names (-e) and
verbal descriptions as reported by Enca (-f):

ASCII         7bit ASCII characters
ISO-8859-2    ISO 8859-2 standard; ISO Latin 2
ISO-8859-4    ISO 8859-4 standard; Latin 4
ISO-8859-5    ISO 8859-5 standard; ISO Cyrillic
ISO-8859-13   ISO 8859-13 standard; ISO Baltic; Latin 7
ISO-8859-16   ISO 8859-16 standard
CP1125        MS-Windows code page 1125
CP1250        MS-Windows code page 1250
CP1251        MS-Windows code page 1251
CP1257        MS-Windows code page 1257; WinBaltRim
IBM852        IBM/MS code page 852; PC (DOS) Latin 2
IBM855        IBM/MS code page 855
IBM775        IBM/MS code page 775
IBM866        IBM/MS code page 866
baltic        ISO-IR-179; Baltic
KEYBCS2       Kamenicky encoding; KEYBCS2
macce         Macintosh Central European
maccyr        Macintosh Cyrillic
ECMA-113      Ecma Cyrillic; ECMA-113
KOI-8_CS_2    KOI8-CS2 code (`T602')
KOI8-R        KOI8-R Cyrillic
KOI8-U        KOI8-U Cyrillic
KOI8-UNI      KOI8-Unified Cyrillic
TeX           (La)TeX control sequences
UCS-2         Universal character set 2 bytes; UCS-2; BMP
UCS-4         Universal character set 4 bytes; UCS-4; ISO-10646
UTF-7         Universal transformation format 7 bits; UTF-7
UTF-8         Universal transformation format 8 bits; UTF-8
CORK          Cork encoding; T1
GBK           Simplified Chinese National Standard; GB2312
BIG5          Traditional Chinese Industrial Standard; Big5
HZ            HZ encoded GB2312
unknown       Unrecognized encoding

where unknown is not any real encoding, it's reported when Enca is not
able to give a reliable answer.

Surfaces
Enca has some experimental support for so-called surfaces (see below).
It detects the following surfaces (not all can be applied to all
charsets):

/CR     CR line terminators
/LF     LF line terminators
/CRLF   CRLF line terminators
N.A.    Mixed line terminators
N.A.    Surrounded by/intermixed with non-text data
/21     Byte order reversed in pairs (1,2 -> 2,1)
/4321   Byte order reversed in quadruples (1,2,3,4 -> 4,3,2,1)
N.A.    Both little and big endian chunks, concatenated
/qp     Quoted-printable encoded

Note that some surfaces have N.A. in place of an identifier: they cannot
be specified on the command line, they can only be reported by Enca.
This is intentional because they only inform you why the file cannot be
considered surface-consistent instead of representing a real surface.

Each charset has its natural surface (called `implied' in recode) which
is not reported, e.g., for the IBM 852 charset it's `CRLF line
terminators'. For UCS encodings, big endian is considered the natural
surface; unusual byte orders are constructed from 21 and 4321
permutations: 2143 is reported simply as 21, while 3412 is reported as
a combination of 4321 and 21.
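
The way these two permutations combine can be checked with a toy model
in which each character of an input line stands for one byte position.
The function names surface_21 and surface_4321 are made up, and the
functions only shuffle characters of text lines; they are not meant for
converting real UCS data.

```shell
# Each character of an input line stands for one byte position.

surface_21() {      # reverse byte order in pairs: 1,2 -> 2,1
    LC_ALL=C sed 's/\(.\)\(.\)/\2\1/g'
}

surface_4321() {    # reverse byte order in quadruples: 1,2,3,4 -> 4,3,2,1
    LC_ALL=C sed 's/\(.\)\(.\)\(.\)\(.\)/\4\3\2\1/g'
}
```

Piping the line `1234' through surface_21 alone gives the 2143 order,
and piping it through surface_4321 and then surface_21 gives the 3412
order, matching the description above.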

Doubly-encoded UTF-8 is neither a charset nor a surface; it's just
reported.

About charsets, encodings and surfaces
A charset is a set of character entities, while an encoding is its
representation in terms of bytes and bits. In Enca, the word encoding
means the same as `representation of text', i.e. the relation between
the sequence of character entities constituting the text and the
sequence of bytes (bits) constituting the file.

So, an encoding is both a character set and a so-called surface (line
terminators, byte order, combining, Base64 transformation, etc.).
Nevertheless, it proves convenient to work with some {charset,surface}
pairs as with genuine charsets. So, as in recode(1), all UCS- and UTF-
encodings of the Universal character set are called charsets. Please
see the recode documentation for more details on this issue.

The only good thing about surfaces is: when you don't start playing
with them, Enca won't start either, and it will try to behave as much as
possible like a surface-unaware program, even when talking to recode.

LANGUAGES
Enca needs to know the language of input files to work reliably, at
least in the case of regular 8bit encodings. Multibyte encodings should
be recognised for any Latin, Cyrillic or Greek language.

You can (or have to) use the -L option to tell Enca the language. Since
people most often work with files in the same language for which they
have configured locales, Enca tries to guess the language by examining
the value of LC_CTYPE and other locale categories (please see
locale(7)) and uses it as the language when you don't specify any. Of
course, the guess may be completely wrong, in which case Enca will give
you nonsense answers and damage your files, so please don't forget to
use the -L option. You can also use the ENCAOPT environment variable to
set a default language (see section ENVIRONMENT).

The following languages are supported by Enca (each language is listed
together with its supported 8bit encodings).

Belarusian   CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855
Bulgarian    CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
Czech        ISO-8859-2 CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
Estonian     ISO-8859-4 CP1257 IBM775 ISO-8859-13 macce baltic
Croatian     CP1250 ISO-8859-2 IBM852 macce CORK
Hungarian    ISO-8859-2 CP1250 IBM852 macce CORK
Lithuanian   CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
Latvian      CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
Polish       ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
Russian      KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
Slovak       CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
Slovene      ISO-8859-2 CP1250 IBM852 macce CORK
Ukrainian    CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr
Chinese      GBK BIG5 HZ
none

The special language none can be shortened to __, it contains no 8bit
encodings, so only multibyte encodings are detected.

You can also use locale names instead of languages:

Belarusian   be
Bulgarian    bg
Czech        cs
Estonian     et
Croatian     hr
Hungarian    hu
Lithuanian   lt
Latvian      lv
Polish       pl
Russian      ru
Slovak       sk
Slovene      sl
Ukrainian    uk
Chinese      zh

FEATURES
Several of Enca's features depend on what is available on your system
and how it was compiled. You can get their list with

enca --version

A plus sign before a feature name means it's available, a minus sign
means this build lacks the particular feature.

librecode-interface. Enca has an interface to the GNU recode library
charset conversion functions.

iconv-interface. Enca has an interface to the UNIX98 iconv charset
conversion functions.

external-converter. Enca can use external conversion programs (if you
have some suitable ones installed).

language-detection. Enca tries to guess the language (-L) from locales.
You don't need the --language option, at least in principle.

locale-alias. Enca is able to decrypt locale aliases used for language
names.

target-charset-auto. Enca tries to detect your preferred charset from
locales. The --auto-convert option and calling Enca as enconv work, at
least in principle.

ENCAOPT. Enca is able to correctly parse this environment variable
before command line parameters. Simple stuff like ENCAOPT="-L uk" will
work even without this feature.
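
A script that depends on one of these features can test for it before
doing any work. The helper below is a sketch: the name enca_has_feature
is made up, and it assumes that `enca --version' prints the features as
whitespace-separated tokens of the form +NAME or -NAME, which is
suggested by the description above but not spelled out there.

```shell
# enca_has_feature NAME
# Succeeds when the output of `enca --version' contains the token
# +NAME. Assumes features are whitespace-separated +NAME / -NAME tokens.
enca_has_feature() {
    enca --version | tr -s ' \t' '\n\n' | grep -qxF -- "+$1"
}
```

A conversion script could then run, say, `enca_has_feature
iconv-interface' first and fall back to another converter when the test
fails.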

ENVIRONMENT
The variable ENCAOPT can hold a set of default Enca options. Its
content is interpreted before command line arguments. Unfortunately,
this doesn't work everywhere (the +ENCAOPT feature is required).

LC_CTYPE, LC_COLLATE, LC_MESSAGES (possibly inherited from LC_ALL or
LANG) are used for guessing your language (the +language-detection
feature is required).

The variable DEFAULT_CHARSET can be used by enconv as the default
target charset.

DIAGNOSTICS
Enca returns exit code 0 when all input files were successfully
processed (i.e. all encodings were detected and all files were
converted to the required encoding, if conversion was asked for). Exit
code 1 is returned when Enca wasn't able to either guess the encoding or
perform the conversion on at least one input file because it's not
clever enough. Exit code 2 is returned in case of serious (e.g. I/O)
troubles.
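
In scripts, the exit code is a more dependable signal than the printed
text. The wrapper below shows one way to act on it; the function name
detect_mime_name and the wording of its messages are made up, and it
uses only the -L and -m options described in section OPTIONS.

```shell
# detect_mime_name LANG FILE
# Prints the MIME charset name on success; otherwise explains the exit
# status on standard error and returns it unchanged.
detect_mime_name() {
    dmn_status=0
    dmn_name=$(enca -L "$1" -m "$2") || dmn_status=$?
    case $dmn_status in
        0) printf '%s\n' "$dmn_name" ;;
        1) echo "enca: could not guess the encoding of $2" >&2 ;;
        2) echo "enca: serious (e.g. I/O) trouble with $2" >&2 ;;
        *) echo "enca: unexpected exit status $dmn_status" >&2 ;;
    esac
    return "$dmn_status"
}
```

A caller can then write `charset=$(detect_mime_name ru file.htm) ||
exit' and be sure that the variable holds a name only when detection
succeeded.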

SECURITY
It should be possible to let Enca work unattended; that is its goal.
However:

There's no warranty the detection works 100%. Don't bet on it, you can
easily lose valuable data.

Don't use enca (the program); link to libenca instead if you want
anything resembling security. You then have to perform the eventual
conversion yourself.

Don't use external converters. Ideally, disable them at compile time.

Be aware of ENCAOPT and all the built-in automagic that guesses various
things from the environment, namely locales.

SEE ALSO
autoconvert(1), cstocs(1), file(1), iconv(1), iconv(3), nl_langinfo(3),
map(1), piconv(1), recode(1), locale(5), locale(7), ltt(1), umap(1),
unicode(7), utf-8(7), xcode(1)

KNOWN BUGS
It has too many unknown bugs.

The idea of using LC_* value for language is certainly braindead.
However I like it.

It can't back up files before mangling them.

In certain situations, it may behave incorrectly on >31bit file systems
and/or over NFS (both untested but shouldn't cause problems in
practice).

The built-in converter does not convert character `ch' from KOI8-CS2,
and possibly some other characters you've probably never heard about
anyway.

EOL type recognition works poorly on Quoted-printable encoded files.
This should be fixed someday.

There are no command line options to tune libenca parameters. This is
intentional (Enca should DWIM) but sometimes this is a nuisance.

The manual page is too long, especially this section. This doesn't
matter since nobody reads it.

Send bug reports to <https://github.com/nijel/enca/issues>.

TRIVIA
Enca is Extremely Naive Charset Analyser. Nevertheless, the `enc'
originally comes from `encoding', so the leading `e' should be read as
in `encoding', not as in `extreme'.

AUTHORS
David Necas (Yeti) <yeti@physics.muni.cz>

Michal Cihar <michal@cihar.com>

Unicode data has been generated from various (free) on-line resources
or using GNU recode. Statistical data has been generated from various
texts on the Net; I hope character counting doesn't break anyone's
copyright.

ACKNOWLEDGEMENTS
Please see the file THANKS in the distribution.

COPYRIGHT
Copyright (C) 2000-2003 David Necas (Yeti).

Copyright (C) 2009 Michal Cihar <michal@cihar.com>.

Enca is free software; you can redistribute it and/or modify it under
the terms of version 2 of the GNU General Public License as published
by the Free Software Foundation.

Enca is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with Enca; if not, write to the Free Software Foundation, Inc., 675
Mass Ave, Cambridge, MA 02139, USA.



enca 1.11 Sep 2009 enca(1)