UTRAC(1) Alliance MCA UTRAC(1)

NAME
utrac - recognize and convert charset and end-of-line of text files

SYNOPSIS
utrac [OPTION] [FILE]

DESCRIPTION
Utrac is a tool (and a library) that recognizes the charset and the
end-of-line type used in a text file. It can also convert them. For
8-bit charsets, recognition is not certain, so Utrac can also help the
user choose the correct charset, for instance by filtering the text and
displaying only the lines that matter.

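The difference between certain and uncertain recognition can be
illustrated with a minimal Python sketch. It is not Utrac's algorithm:
the function names are hypothetical, and the EOL names LF and CR are
assumed by analogy with the CRLF name used in this page. A file with no
byte above 0x7F is ASCII, a file whose extended bytes form valid UTF-8
sequences is almost certainly UTF-8, and anything else is some 8-bit
charset that the bytes alone cannot identify.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ASCII', 'UTF-8', or '8-bit' for a byte string."""
    if all(b < 0x80 for b in data):
        return "ASCII"          # no extended bytes at all
    try:
        data.decode("utf-8")    # strict decoding rejects malformed sequences
        return "UTF-8"
    except UnicodeDecodeError:
        return "8-bit"          # some single-byte charset; cannot tell which


def detect_eol(data: bytes) -> str:
    """Return the most frequent end-of-line type, or 'none' if there is none."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf   # lone CR, not part of a CRLF pair
    lf = data.count(b"\n") - crlf   # lone LF, not part of a CRLF pair
    counts = {"CRLF": crlf, "CR": cr, "LF": lf}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"
```

In the 8-bit case a ranking with language and system bonuses, as
described under OPTIONS, is needed to pick a likely charset.
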
OPTIONS
With no FILE, read standard input. With no OPTION, recognize and write
converted text to standard output.

-p, --print-charset
       Print the name of the charset that best suits the input file.

-P, --print-all-charset
       Print a ranked list of charsets. The first column is the score
       with the locale bonus (language and system), the second is the
       raw score, the third is the checksum of all extended characters
       (to show which charsets produce the same results), and the
       fourth is the charset name (charsets share a line if their
       score with bonus and their checksum are identical).
       If the recognition is certain (ASCII or UTF-8), print only one
       name.

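The purpose of the checksum column can be illustrated with a short
Python sketch: charsets that decode the file's extended bytes to the
same characters get the same checksum and can be grouped together. This
is only an illustration under assumptions. The function name is
hypothetical, CRC-32 stands in for whatever checksum Utrac computes, and
the charset names are Python codec names, which may differ from the
names Utrac lists.

```python
import zlib
from collections import defaultdict


def group_by_rendering(data: bytes, charsets: list[str]) -> list[list[str]]:
    """Group charsets that decode the extended bytes of data identically."""
    # Keep each distinct extended byte (>= 0x80) once, in a stable order.
    extended = bytes(sorted({b for b in data if b >= 0x80}))
    groups = defaultdict(list)
    for name in charsets:
        try:
            text = extended.decode(name)
        except UnicodeDecodeError:
            continue  # this charset has no character for one of the bytes
        # Assumption: CRC-32 of the decoded characters as the checksum.
        checksum = zlib.crc32(text.encode("utf-8"))
        groups[checksum].append(name)
    return list(groups.values())
```

For a file whose only extended byte is 0xE9, the Latin-1, Windows-1252
and Latin-9 codecs all yield the same character and fall into one group,
so choosing between them makes no difference for that file.
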
-f, --from
       Force the input charset (disabling recognition) and/or EOL
       type, for instance "UTF-8/CRLF".

-t, --to
       Select the output charset and/or EOL type. See -f above.

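What a conversion between two "CHARSET/EOL" specifications amounts to
can be sketched in Python as follows. This is a simplified illustration,
not Utrac's code: the function and table names are hypothetical, only
the complete "CHARSET/EOL" form shown above is handled, the EOL names LF
and CR are assumptions, and the charset names are passed to Python's
codecs, whose naming may differ from Utrac's.

```python
# Assumption: only "CRLF" appears in this page; "LF" and "CR" are guesses.
EOLS = {"LF": "\n", "CRLF": "\r\n", "CR": "\r"}


def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert data from one 'CHARSET/EOL' spec to another."""
    src_charset, src_eol = source.split("/")
    dst_charset, dst_eol = target.split("/")
    text = data.decode(src_charset)
    # Normalise the declared input EOL to "\n", then emit the requested one.
    text = text.replace(EOLS[src_eol], "\n")
    text = text.replace("\n", EOLS[dst_eol])
    return text.encode(dst_charset)
```

Decoding with a forced charset skips recognition entirely, which is why
a wrong value for the input specification silently produces wrong
characters rather than an error for most 8-bit charsets.
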
-L, --language
       Select the language. All charsets that fit this language get a
       bonus during recognition. If none is specified, the LC_*
       variables are used.

-S, --system
       Select the system. All charsets that fit this system get a
       bonus during recognition.

-x, --ext-chars
       Print lines with extended characters (trying to print each
       extended character no more than once).

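The filtering idea behind -x can be sketched in Python with a simple
greedy pass: keep a line only if it contains an extended byte that no
earlier kept line has shown. This is an assumption about one way to get
that behaviour, with a hypothetical function name; Utrac's actual
selection may differ.

```python
def lines_with_new_extended(data: bytes) -> list[bytes]:
    """Return the lines that introduce at least one unseen extended byte."""
    seen: set[int] = set()
    selected = []
    for line in data.splitlines():
        extended = {b for b in line if b >= 0x80}
        if extended - seen:      # the line shows at least one unseen byte
            selected.append(line)
            seen |= extended
    return selected
```

The result is a short sample of the file that still shows every
extended character in context, which is what a user needs to judge
whether a candidate charset renders the text correctly.
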
-d, --distribution
       Print the distribution, i.e. the count of each 8-bit character.

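A distribution of this kind can be computed in a few lines of Python.
The sketch below uses a hypothetical function name and, since the text
does not say whether ASCII bytes are counted too, exposes that choice as
a parameter.

```python
from collections import Counter


def byte_distribution(data: bytes, extended_only: bool = True) -> list[tuple[int, int]]:
    """Return (byte value, count) pairs sorted by byte value."""
    counts = Counter(b for b in data if b >= 0x80 or not extended_only)
    return sorted(counts.items())


if __name__ == "__main__":
    for value, count in byte_distribution(b"caf\xe9 \xe9t\xe8"):
        print(f"0x{value:02X}  {count}")
```
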
-a, --all-ext-chars
       Print each extended character of the file in each different
       charset (UTF-8 output is recommended).

-c, --colors
       (with -x or -a) Use colors.

-b, --bar
       Display a progress bar.

-i, --info
       Print default/chosen parameters.

-l, --list
       List charsets, EOL types, languages and systems.

-h, --help
       Print some help.

-v, --version
       Print version.

FILES
charset.dat
       This file should be located in /usr/local/share/utrac/ or
       /usr/share/utrac/. It contains information about charsets and
       their related charmaps. If you want to add new charsets (they
       must be 8-bit and ASCII compatible), check the script merge.pl
       in the Utrac source directory.

BUGS
Utrac is still a beta version, so you can expect to find some bugs.
Please report them to <antoine@alliancemca.net>. If you have a text
file that Utrac does not recognize well, please send it so that the
recognition algorithm can be improved.

AUTHOR
Written by Antoine Calando <antoine@alliancemca.net>.

COPYRIGHT
Copyright © 2004 Alliance MCA.
This is free software; see the source for copying conditions. There is
NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

SEE ALSO
You can find more documentation at http://utrac.sourceforge.net

Utrac 0.3 January 2005 UTRAC(1)