CONVMV(1)                                                            CONVMV(1)

NAME

       convmv - converts filenames from one encoding to another

SYNOPSIS

       convmv [options] FILE(S) ... DIRECTORY(S)

OPTIONS

       -f ENCODING
           specify the current encoding of the filename(s), i.e. the encoding
           to convert from

       -t ENCODING
           specify the encoding to which the filename(s) should be converted

       -i  interactive mode (ask y/n for each action)

       -r  recursively go through directories

       --nfc
           target files will be normalization form C for UTF-8 (Linux etc.)

       --nfd
           target files will be normalization form D for UTF-8 (OS X etc.)

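           To make the difference between the two normalization forms
           concrete, here is a minimal Python sketch (not part of convmv; the
           file name is an invented example) showing that the same visible
           name yields different UTF-8 bytes in NFC and NFD:

```python
import unicodedata

# "M\u00fcller.txt" contains the precomposed character U+00FC (u with diaeresis).
name_nfc = unicodedata.normalize("NFC", "M\u00fcller.txt")
# NFD splits it into a plain "u" followed by the combining diaeresis U+0308.
name_nfd = unicodedata.normalize("NFD", name_nfc)

bytes_nfc = name_nfc.encode("utf-8")
bytes_nfd = name_nfd.encode("utf-8")

print(bytes_nfc)               # b'M\xc3\xbcller.txt'
print(bytes_nfd)               # b'Mu\xcc\x88ller.txt'
print(bytes_nfc == bytes_nfd)  # False
```

           Because the byte sequences differ, a filesystem that does not
           normalize treats the two names as different files.
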
       --qfrom , --qto
           be more quiet about the "from" or "to" of a rename (e.g. if it
           screws up your terminal). This does nothing more than replace any
           non-ASCII character (bytewise) with ? and any control character
           with * on printout; it does not affect the rename operation itself.

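           The masking described above can be sketched in Python as follows.
           This is an illustration only, not convmv's code: the function name
           quiet() is hypothetical, and treating bytes below 0x20 plus 0x7F
           as "control characters" is an assumption.

```python
def quiet(name: bytes) -> str:
    """Return a terminal-safe rendering of a raw filename (display only)."""
    out = []
    for b in name:
        if b >= 0x80:
            out.append("?")      # any non-ASCII byte
        elif b < 0x20 or b == 0x7F:
            out.append("*")      # assumed set of control characters
        else:
            out.append(chr(b))   # printable ASCII is shown unchanged
    return "".join(out)

# A UTF-8 "u with diaeresis" is two bytes, so it shows up as two question marks.
print(quiet("M\u00fcller\t.txt".encode("utf-8")))  # M??ller*.txt
```
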
       --exec command
           execute the given command. You have to quote the command; #1 will
           be substituted by the old and #2 by the new filename. With this
           option link targets will stay untouched. Keep in mind that #1 and
           #2 are already quoted by convmv, so you must not add extra
           quotation marks around them.

           Example:

           convmv -f latin1 -t utf-8 -r --exec "echo #1 should be renamed to
           #2" path/to/files

       --list
           list all available encodings. To get support for more Chinese or
           Japanese encodings install the Perl HanExtra or JIS2K Encode
           packages.

       --lowmem
           keep the memory footprint low by not creating a hash of all files.
           This disables checking whether symlink targets are in the subtree;
           symlink target pointers will be converted regardless. If you
           convert several hundred thousand or millions of files, the memory
           usage of convmv might grow quite high. This option helps in that
           case.

       --nosmart
           by default convmv will detect if a filename is already UTF-8
           encoded and will skip this file if conversion from some charset to
           UTF-8 should be performed. "--nosmart" will also force conversion
           to UTF-8 for such files, which might result in "double encoded
           UTF-8" (see section below).

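           One simple way to test whether a name is already UTF-8 is to try
           decoding its bytes. The Python sketch below illustrates the idea;
           looks_like_utf8() is a hypothetical helper, and convmv's own
           detection may work differently.

```python
def looks_like_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(looks_like_utf8(b"M\xc3\xbcller.txt"))  # True  (UTF-8 encoded)
print(looks_like_utf8(b"M\xfcller.txt"))      # False (Latin-1 encoded)
```

           Note that pure ASCII names also pass this check, since ASCII is a
           subset of UTF-8.
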
       --fixdouble
           with the "--fixdouble" option convmv only converts files which
           will still be UTF-8 encoded after conversion. That is useful for
           fixing double-encoded UTF-8 files. All files which are not UTF-8 or
           will not result in UTF-8 after conversion will not be touched. Also
           see the section "How to undo double UTF-8 ..." below.

       --notest
           Needed to actually rename the files. By default convmv will just
           print what it wants to do.

       --parsable
           This is an advanced option that people who want to write a GUI
           front end will find useful (some others maybe, too). It makes
           convmv print out what it would do in an easily parsable way. The
           first column contains the action or some kind of information, the
           second column mostly contains the file that is to be modified and,
           if appropriate, the third column contains the modified value. Each
           column is separated by \0\n (nullbyte newline). Each row (one
           action) is separated by \0\0\n (nullbyte nullbyte newline).

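           A front end could split such output as in the Python sketch below.
           Only the two separators are taken from the description above; the
           function name and the sample data (including the labels "ACTION"
           and "INFO") are invented for illustration and are not real convmv
           output.

```python
def parse_parsable(data: bytes) -> list[list[bytes]]:
    """Split raw --parsable style output into rows of columns."""
    rows = []
    for row in data.split(b"\0\0\n"):      # rows end with NUL NUL newline
        if row:                            # skip the empty tail after the last row
            rows.append(row.split(b"\0\n"))  # columns end with NUL newline
    return rows

# Hypothetical sample: one three-column row and one two-column row.
sample = (b"ACTION\0\nold\xe4.txt\0\nold\xc3\xa4.txt\0\0\n"
          b"INFO\0\nsome message\0\0\n")

for columns in parse_parsable(sample):
    print(columns)
```

           The data is kept as bytes on purpose, because the old filename is
           by definition not guaranteed to be valid in any one encoding.
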
       --no-preserve-mtimes
           modifying filenames usually causes the parent directory's mtime to
           be updated. Since version 2 convmv by default resets the mtime to
           the old value. If your filesystem supports sub-second resolution,
           the sub-second part of the atime and mtime will be lost as Perl
           does not yet support that. With this option you can disable the
           preservation of the mtimes.

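           The idea of resetting the parent directory's timestamps after a
           rename can be sketched in Python as follows. The helper name is
           hypothetical and the sketch assumes both names are in the same
           directory. Unlike the behaviour described above, it restores the
           times with nanosecond precision.

```python
import os
import tempfile

def rename_preserving_parent_mtime(old: str, new: str) -> None:
    """Rename old to new, then restore the parent directory's atime/mtime."""
    parent = os.path.dirname(os.path.abspath(old))
    before = os.stat(parent)               # remember the times before renaming
    os.rename(old, new)                    # this updates the parent's mtime
    os.utime(parent, ns=(before.st_atime_ns, before.st_mtime_ns))

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    old_path = os.path.join(demo_dir, "old.txt")
    new_path = os.path.join(demo_dir, "new.txt")
    open(old_path, "w").close()
    mtime_before = os.stat(demo_dir).st_mtime_ns
    rename_preserving_parent_mtime(old_path, new_path)
    print(os.stat(demo_dir).st_mtime_ns == mtime_before)
```
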
       --replace
           if the file to be renamed to already exists, it will be
           overwritten if the content of both files is equal.

       --unescape
           this option will remove those ugly % hex sequences from filenames
           and turn them into (hopefully) nicer 8-bit characters. After
           --unescape you might want to do a charset conversion. Sequences
           like %20 are sometimes produced when downloading via http or ftp.

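           What unescaping followed by a charset conversion amounts to can be
           shown with Python's standard library. This is an illustration, not
           convmv's implementation; the file name is invented, and the
           assumption that the escaped byte %E4 is Latin-1 is part of the
           example.

```python
from urllib.parse import unquote_to_bytes

escaped = "My%20file%E4.txt"

# Step 1: turn the % hex sequences back into raw 8-bit bytes.
raw = unquote_to_bytes(escaped)
print(raw)        # b'My file\xe4.txt'

# Step 2: charset conversion, here assuming the bytes are Latin-1.
utf8_name = raw.decode("latin-1").encode("utf-8")
print(utf8_name)  # b'My file\xc3\xa4.txt'
```
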
       --upper , --lower
           turn filenames into all upper or all lower case. When the filename
           is not ASCII-encoded, convmv expects a charset to be entered via
           the -f switch.

       --map=some-extra-mapping
           apply some custom character mappings, currently supported are:

           ntfs-sfm(-undo), ntfs-sfu(-undo) for the mapping of illegal ntfs
           characters for Linux or Macintosh cifs clients (see MS KB 117258
           and also the mapchars mount option of mount.cifs on Linux).

           ntfs-pretty(-undo) for the mapping of illegal ntfs characters to
           pretty legal Japanese versions of them.

           See the map_get_newname() function for how to easily add your own
           mappings if needed. Let me know if you think convmv is missing some
           useful mapping here.

       --dotlessi
           care about the dotless i/I issue. A lowercase version of "I" will
           also be dotless while an uppercase version of "i" will also be
           dotted. This is an issue for Turkish and Azeri.

           By the way: The superscript dot of the letter i was added in the
           Middle Ages to distinguish the letter (in manuscripts) from
           adjacent vertical strokes in such letters as u, m, and n. J is a
           variant form of i which emerged at this time and subsequently
           became a separate letter.

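           The Turkish and Azeri case pairs are I / dotless i (U+0131) and
           dotted capital I (U+0130) / i. The Python sketch below shows how
           they differ from the default case mapping; lower_tr() and
           upper_tr() are hypothetical helpers, not convmv functions.

```python
# Turkish/Azeri pairs: "I" <-> U+0131 (dotless i), U+0130 (dotted I) <-> "i".
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})

def lower_tr(text: str) -> str:
    """Lowercase with the dotless-i rule applied first."""
    return text.translate(_TR_LOWER).lower()

def upper_tr(text: str) -> str:
    """Uppercase with the dotted-I rule applied first."""
    return text.translate(_TR_UPPER).upper()

print("I".lower() == "i")                            # True: default mapping
print(lower_tr("ISTANBUL") == "\u0131stanbul")       # True: dotless lowercase
print(upper_tr("istanbul") == "\u0130STANBUL")       # True: dotted uppercase
```
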
       --help
           print a short summary of available options

       --dump-options
           print a list of all available options

DESCRIPTION

       convmv is meant to help convert a single filename, a directory tree and
       the contained files, or a whole filesystem into a different encoding.
       It just converts the filenames, not the content of the files. A special
       feature of convmv is that it also takes care of symlinks: it also
       converts the symlink target pointer in case the symlink target is being
       converted, too.

       All this comes in very handy when one wants to switch over from old
       8-bit locales to UTF-8 locales. It is also possible to convert
       directories to UTF-8 which are already partly UTF-8 encoded. convmv is
       able to detect if certain files are UTF-8 encoded and will skip them by
       default. To turn this smartness off use the "--nosmart" switch.

   Filesystem issues
       Almost all POSIX filesystems do not care about how filenames are
       encoded. Here are some exceptions:

       HFS+ on OS X / Darwin

       Linux and (most?) other Unix-like operating systems use the so-called
       normalization form C (NFC) for their UTF-8 encoding by default but do
       not enforce this. Darwin, the base of the Macintosh OS, enforces
       normalization form D (NFD), where a few characters are encoded in a
       different way. On OS X it's not possible to create NFC UTF-8 filenames
       because this is prevented at the filesystem layer. On HFS+ filenames
       are internally stored in UTF-16, and when they are converted back to
       UTF-8 so that the underlying BSD system can handle them, NFD is
       created. See http://developer.apple.com/qa/qa2001/qa1173.html for
       details. I think it was a very bad idea and breaks many things under
       OS X which expect a normal POSIX conforming system. Anywhere else
       convmv is able to convert files from NFC to NFD or vice versa, which
       makes interoperability with such systems a lot easier.

       JFS

       If people mount JFS partitions with iocharset=utf8, there is a similar
       problem, because JFS is designed to store filenames internally in
       UTF-16, too; that is because Linux' JFS is really JFS2, which was a
       rewrite of JFS for OS/2. JFS partitions should always be mounted with
       iocharset=iso8859-1, which is also the default with recent 2.6.6
       kernels. If this is not done, JFS does not behave like a POSIX
       filesystem and it might happen that certain files cannot be created at
       all, for example filenames in ISO-8859-1 encoding. Only when
       interoperation with OS/2 is needed should iocharset be set according to
       your locale charmap.

       NFS4

       Unlike other POSIX filesystems, RFC3530 (NFS 4) mandates UTF-8 but also
       says: "The nfs4_cs_prep profile does not specify a normalization form.
       A later revision of this specification may specify a particular
       normalization form." In other words, if you want to use NFS4 you might
       find the conversion and normalization features of convmv quite useful.

       FAT/VFAT and NTFS

       NTFS and VFAT (for long filenames) use UTF-16 internally to store
       filenames. You should not need to convert filenames if you mount one
       of those filesystems. Use appropriate mount options instead!

   How to undo double UTF-8 (or other) encoded filenames
       Sometimes it might happen that you "double-encoded" certain filenames,
       for example the file names were already UTF-8 encoded and you
       accidentally did another conversion from some charset to UTF-8. You
       can simply undo that by converting the other way round. The
       from-charset has to be UTF-8 and the to-charset has to be the
       from-charset you previously accidentally used. If you use the
       "--fixdouble" option convmv will make sure that only files will be
       processed that will still be UTF-8 encoded after conversion and it
       will leave non-UTF-8 files untouched. You should check that you get the
       correct results by doing the conversion without "--notest" first. The
       "--qfrom" option might also be helpful, because the double UTF-8 file
       names might screw up your terminal if they are being printed - they
       often contain control sequences which do funny things with your
       terminal window. If you are not sure about the charset which was
       accidentally converted from, using "--qfrom" is a good way to find
       the required encoding without finally destroying the file names.

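       The following Python sketch reproduces a double encoding and its
       reversal on a single invented name, assuming the accidental conversion
       used latin1 as the from-charset. The helper still_utf8_after() is
       hypothetical; it only mimics the kind of check "--fixdouble" is
       described as making and is not convmv's code.

```python
original = "M\u00fcller.txt".encode("utf-8")
print(original)   # b'M\xc3\xbcller.txt'

# The accident: UTF-8 bytes are treated as latin1 and converted to UTF-8 again.
doubled = original.decode("latin-1").encode("utf-8")
print(doubled)    # b'M\xc3\x83\xc2\xbcller.txt'

# The undo: convert from UTF-8 back to the charset that was wrongly assumed.
repaired = doubled.decode("utf-8").encode("latin-1")
print(repaired == original)  # True

def still_utf8_after(name: bytes, charset: str) -> bool:
    """True if converting name from UTF-8 to charset yields valid UTF-8 again."""
    try:
        name.decode("utf-8").encode(charset).decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return True

print(still_utf8_after(doubled, "latin-1"))   # True:  would be converted
print(still_utf8_after(original, "latin-1"))  # False: would be left untouched
```
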
   How to repair Samba files
       When a correct "character set" variable has not been set in the
       smb.conf (of Samba 2.x), files created from Win* clients are created
       in the client's codepage, e.g. cp850 for western European languages.
       As a result, files whose names contain non-ASCII characters are
       screwed up if you "ls" them on the Unix server. If you change the
       "character set" variable afterwards to iso8859-1, newly created files
       are okay, but the old files are still screwed up in the Windows
       encoding. In this case convmv can also be used to convert the old
       Samba-shared files from cp850 to iso8859-1.

       By the way: Samba 3.x finally maps to UTF-8 filenames by default, so
       when you migrate from Samba 2 to Samba 3 you might also have to
       convert your file names.

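       On the byte level the cp850 to iso8859-1 conversion looks like the
       Python sketch below. The file name is an invented example; Python's
       "latin-1" codec is used for iso8859-1.

```python
# "Mueller.txt" with a u-umlaut, as a cp850 client would have written it.
cp850_name = b"M\x81ller.txt"

# Read as iso8859-1, byte 0x81 is a control character, hence the garbled "ls".
print(cp850_name.decode("latin-1").isprintable())  # False

# Decode with the real source charset, then encode in the target charset.
latin1_name = cp850_name.decode("cp850").encode("latin-1")
print(latin1_name)                                 # b'M\xfcller.txt'
```
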
   Netatalk interoperability issues
       When Netatalk is being switched to UTF-8, which is supported in
       version 2, it is NOT sufficient to rename the files. More needs to be
       done. See
       http://netatalk.sourceforge.net/2.0/htmldocs/upgrade.html#volumes-and-filenames
       and the uniconv utility of Netatalk for details.

SEE ALSO

       locale(1) utf-8(7) charsets(7)

BUGS

       no bugs or fleas known

       You can support convmv by making a donation, see
       <https://www.j3e.de/donate.html>

AUTHOR

       Bjoern JACKE

       Send mail to bjoern [at] j3e.de for bug reports and suggestions.



perl v5.26.3                      2017-05-04                         CONVMV(1)