CONVMV(1) CONVMV(1)

NAME
convmv - converts filenames from one encoding to another

SYNOPSIS
convmv [options] FILE(S) ... DIRECTORY(S)

OPTIONS
-f ENCODING
specify the current encoding of the filename(s), i.e. the encoding to
convert from

-t ENCODING
specify the encoding to which the filename(s) should be converted

-i interactive mode (ask y/n for each action)

-r recursively go through directories

--nfc
target files will be normalization form C for UTF-8 (Linux etc.)

--nfd
target files will be normalization form D for UTF-8 (OS X etc.).

--qfrom, --qto
be more quiet about the "from" or "to" part of a rename (e.g. if it
messes up your terminal). This does nothing more than replace any
non-ASCII character (bytewise) with ? and any control character with *
on printout; it does not affect the rename operation itself.
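The replacement rule can be sketched in a few lines of Python. This is
only an illustration of the behaviour described above, not convmv's own
(Perl) code: the function name quiet_name is hypothetical, and treating
byte 0x7F (DEL) as a control character is an assumption.

```python
def quiet_name(name: bytes) -> str:
    """Return a terminal-safe rendering of a raw filename."""
    out = []
    for b in name:
        if b >= 0x80:
            out.append("?")      # any non-ASCII byte
        elif b < 0x20 or b == 0x7F:
            out.append("*")      # control character (0x7F is an assumption)
        else:
            out.append(chr(b))   # printable ASCII is shown unchanged
    return "".join(out)


# UTF-8 bytes of a name containing two umlaut-like letters and an ESC byte
print(quiet_name("Gr\u00fc\u00dfe\x1b[0m.txt".encode("utf-8")))
# prints: Gr????e*[0m.txt
```

Because the substitution works bytewise, one multi-byte UTF-8 character
shows up as several question marks.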

--exec command
execute the given command. You have to quote the command; #1 will be
substituted by the old filename and #2 by the new one. With this
option, link targets stay untouched. Keep in mind that #1 and #2 are
already quoted by convmv, so you must not add extra quotation marks
around them.

Example:

convmv -f latin1 -t utf-8 -r --exec "echo #1 should be renamed to #2" path/to/files
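How such a placeholder substitution might work can be sketched in
Python. This is an assumption-laden illustration, not convmv's
implementation: build_command is a hypothetical name, and shlex.quote
stands in for whatever quoting convmv itself applies.

```python
import re
import shlex


def build_command(template: str, old: str, new: str) -> str:
    """Substitute #1 and #2 with shell-quoted old and new filenames."""
    names = {"#1": shlex.quote(old), "#2": shlex.quote(new)}
    # A single pass, so a '#2' inside the old filename is not substituted again.
    return re.sub(r"#[12]", lambda m: names[m.group(0)], template)


print(build_command("echo #1 should be renamed to #2", "old name", "new"))
# prints: echo 'old name' should be renamed to new
```

The sketch also shows why extra quotation marks around #1 and #2 would
be harmful: the substituted value already carries its own quoting.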

--list
list all available encodings. To get support for more Chinese or
Japanese encodings, install the Perl HanExtra or JIS2K Encode
packages.

--lowmem
keep the memory footprint low by not creating a hash of all files. This
disables checking whether symlink targets are in the subtree; symlink
target pointers will be converted regardless. If you convert several
hundred thousand or millions of files, the memory usage of convmv
might grow quite high. This option helps in that case.

--nosmart
by default convmv detects whether a filename is already UTF-8 encoded
and skips the file if a conversion from some charset to UTF-8 was
requested. "--nosmart" forces conversion to UTF-8 for such files as
well, which might result in "double encoded UTF-8" (see the section
"How to undo double UTF-8 (or other) encoded filenames" below).
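A simple way to picture this kind of detection is a strict UTF-8 decode
attempt on the raw filename bytes. The following Python sketch is a
heuristic under stated assumptions, not the check convmv actually
performs; looks_like_utf8 is a hypothetical name, and treating pure
ASCII names as "nothing to detect" is a choice made here.

```python
def looks_like_utf8(name: bytes) -> bool:
    """Heuristic: True if name has non-ASCII bytes that form valid UTF-8."""
    if name.isascii():
        return False  # pure ASCII carries no evidence either way
    try:
        name.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("caf\u00e9".encode("utf-8")))    # True
print(looks_like_utf8("caf\u00e9".encode("latin-1")))  # False
```

Typical 8-bit text rarely happens to form valid UTF-8 sequences, which
is why such a check works well in practice, although it cannot be
perfect.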

--fixdouble
with the "--fixdouble" option convmv only converts files which will
still be UTF-8 encoded after conversion. That is useful for fixing
double-encoded UTF-8 files. Files which are not UTF-8, or which would
not result in UTF-8 after conversion, are not touched. Also see the
section "How to undo double UTF-8 (or other) encoded filenames" below.

--notest
needed to actually rename the files. By default convmv just prints
what it wants to do.

--parsable
this is an advanced option that people who want to write a GUI front
end will find useful (some others maybe, too). It makes convmv print
out what it would do in an easily parsable way. The first column
contains the action or some kind of information, the second column
mostly contains the file that is to be modified and, if appropriate,
the third column contains the modified value. Columns are separated by
\0\n (nullbyte newline). Rows (one action each) are separated by
\0\0\n (nullbyte nullbyte newline).
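A front end could split such output as in the following Python sketch.
It relies only on the separators described above; the function name
parse_parsable is hypothetical, and the sample bytes (including the
ACTION and INFO labels) are made up for illustration and are not real
convmv output.

```python
def parse_parsable(data: bytes) -> list:
    """Split --parsable style output into rows of byte-string columns."""
    rows = []
    for raw_row in data.split(b"\0\0\n"):   # rows end with NUL NUL newline
        if not raw_row:
            continue                        # skip the empty tail after the last row
        rows.append(raw_row.split(b"\0\n")) # columns are separated by NUL newline
    return rows


# Made-up sample input, for illustration only.
sample = b"ACTION\0\nold-name\0\nnew-name\0\0\nINFO\0\nsome-file\0\0\n"
for row in parse_parsable(sample):
    print(row)
```

Splitting on the row separator first matters, because the row separator
ends with the same two bytes as the column separator. The columns are
kept as bytes since the filenames may be in any encoding.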

--no-preserve-mtimes
modifying filenames usually causes the parent directory's mtime to be
updated. Since version 2, convmv by default resets the mtime to the
old value. If your filesystem supports sub-second resolution, the
sub-second part of the atime and mtime will be lost, as Perl does not
yet support that. With this option you can disable the preservation
of the mtimes.
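The idea of preserving the parent directory's timestamps can be
sketched in Python as follows. This is an illustration of the
technique, not convmv's code: rename_preserving_parent_mtime is a
hypothetical name, the sketch assumes both names live in the same
directory, and it truncates to whole seconds on purpose to mirror the
limitation described above.

```python
import os


def rename_preserving_parent_mtime(old: str, new: str) -> None:
    """Rename old to new, then restore the parent directory's atime/mtime."""
    parent = os.path.dirname(os.path.abspath(old))
    before = os.stat(parent)          # remember timestamps before the rename
    os.rename(old, new)               # this updates the parent's mtime
    # Restore whole seconds only (the sub-second part is dropped here on purpose).
    os.utime(parent, (int(before.st_atime), int(before.st_mtime)))
```

Calling it on a file in some directory leaves that directory's mtime
at its previous whole-second value even though an entry was renamed.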

--replace
if the file to be renamed to already exists, it will be overwritten
if the content of the two files is identical.
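The rule "overwrite only when the contents are equal" can be sketched
in Python like this. It is a minimal sketch of the described behaviour,
not convmv's implementation; rename_if_safe is a hypothetical name, and
a byte-by-byte comparison via filecmp is an assumption about how
equality is decided.

```python
import filecmp
import os


def rename_if_safe(old: str, new: str) -> bool:
    """Rename old to new; overwrite an existing target only if contents match."""
    if os.path.exists(new) and not filecmp.cmp(old, new, shallow=False):
        return False       # target exists with different content: touch nothing
    os.replace(old, new)   # plain rename, or overwrite of an identical file
    return True
```

The function returns False and leaves both files in place when the
target exists with different content.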

--unescape
this option removes those ugly % hex sequences from filenames and
turns them into (hopefully) nicer 8-bit characters. After --unescape
you might want to do a charset conversion. Sequences like %20 are
sometimes produced when downloading via http or ftp.
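What unescaping means at the byte level can be shown with Python's
standard library. This is only an illustration of the concept, not
convmv's code; the function name unescape and the sample filename are
made up.

```python
from urllib.parse import unquote_to_bytes


def unescape(name: str) -> bytes:
    """Turn %XX hex sequences into the raw bytes they stand for."""
    return unquote_to_bytes(name)


raw = unescape("My%20File%E4.txt")
print(raw)  # b'My File\xe4.txt'
```

The result is a byte string in whatever 8-bit charset the escaped name
originally used (the 0xE4 byte above would be an a-umlaut in
ISO-8859-1), which is why a charset conversion is often the next step.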

--upper, --lower
turn filenames into all upper or all lower case. When the filename is
not ASCII-encoded, convmv expects a charset to be given via the -f
switch.

--map=some-extra-mapping
apply some custom character mappings. Currently supported are:

ntfs-sfm(-undo), ntfs-sfu(-undo) for the mapping of illegal ntfs
characters for Linux or Macintosh cifs clients (see MS KB 117258 and
the mapchars mount option of mount.cifs on Linux).

ntfs-pretty(-undo) for the mapping of illegal ntfs characters to
pretty legal Japanese versions of them.

See the map_get_newname() function for how to easily add your own
mappings if needed. Let me know if you think convmv is missing some
useful mapping here.

--dotlessi
care about the dotless i/I issue: a lowercase version of "I" will be
dotless, while an uppercase version of "i" will be dotted. This is an
issue for Turkish and Azeri.
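The casing rule can be illustrated in Python, whose default str.lower()
and str.upper() do not apply it. The sketch below only demonstrates the
rule and is not convmv's code; the function names are hypothetical.

```python
DOTLESS_LOWER_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I
DOTTED_UPPER_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def lower_dotless(s: str) -> str:
    """Lowercase with Turkish/Azeri rules: I -> dotless i, dotted I -> i."""
    s = s.replace("I", DOTLESS_LOWER_I).replace(DOTTED_UPPER_I, "i")
    return s.lower()


def upper_dotted(s: str) -> str:
    """Uppercase with Turkish/Azeri rules: i -> dotted I, dotless i -> I."""
    s = s.replace("i", DOTTED_UPPER_I).replace(DOTLESS_LOWER_I, "I")
    return s.upper()


print(ascii(lower_dotless("ISIK")))  # '\u0131s\u0131k'
print(ascii(upper_dotted("dil")))    # 'D\u0130L'
```

The special letters are replaced before the generic case conversion so
that the default mapping never sees them.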

By the way: The superscript dot of the letter i was added in the
Middle Ages to distinguish the letter (in manuscripts) from adjacent
vertical strokes in such letters as u, m, and n. J is a variant form of
i which emerged at this time and subsequently became a separate
letter.

--help
print a short summary of available options

--dump-options
print a list of all available options

DESCRIPTION
convmv is meant to help convert a single filename, a directory tree and
the contained files, or a whole filesystem into a different encoding. It
converts only the filenames, not the content of the files. A special
feature of convmv is that it also takes care of symlinks: it converts
the symlink target pointer as well if the symlink target is being
converted.

All this comes in very handy when one wants to switch over from old
8-bit locales to UTF-8 locales. It is also possible to convert
directories to UTF-8 which are already partly UTF-8 encoded. convmv is
able to detect if certain files are UTF-8 encoded and will skip them by
default. To turn this smartness off use the "--nosmart" switch.

Filesystem issues
Almost all POSIX filesystems do not care about how filenames are
encoded. Here are some exceptions:

HFS+ on OS X / Darwin

Linux and (most?) other Unix-like operating systems use the so-called
normalization form C (NFC) for their UTF-8 encoding by default but do
not enforce this. Darwin, the base of the Macintosh OS, enforces
normalization form D (NFD), where a few characters are encoded in a
different way. On OS X it is not possible to create NFC UTF-8 filenames
because this is prevented at the filesystem layer. On HFS+, filenames
are stored internally in UTF-16, and when they are converted back to
UTF-8 so that the underlying BSD system can handle them, NFD is created.
See http://developer.apple.com/qa/qa2001/qa1173.html for details. I think
it was a very bad idea and breaks many things under OS X which expect a
normal POSIX conforming system. Anywhere else convmv is able to convert
files from NFC to NFD or vice versa, which makes interoperability with
such systems a lot easier.
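The difference between the two normalization forms is easy to see at
the byte level. The following Python snippet uses the standard
unicodedata module purely as an illustration; the sample filename is
made up.

```python
import unicodedata

name = "B\u00fcro.txt"  # "Buero.txt" written with a u-umlaut

nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

print(nfc)  # b'B\xc3\xbcro.txt'   one precomposed character (U+00FC)
print(nfd)  # b'Bu\xcc\x88ro.txt'  plain "u" plus combining diaeresis (U+0308)
```

Both byte strings display as the same name, but a filesystem that does
not normalize treats them as two different files.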

JFS

If people mount JFS partitions with iocharset=utf8, there is a similar
problem, because JFS is designed to store filenames internally in
UTF-16, too; that is because Linux' JFS is really JFS2, which was a
rewrite of JFS for OS/2. JFS partitions should always be mounted with
iocharset=iso8859-1, which is also the default with recent 2.6.6
kernels. If this is not done, JFS does not behave like a POSIX
filesystem and it might happen that certain files cannot be created at
all, for example filenames in ISO-8859-1 encoding. Only when
interoperation with OS/2 is needed should iocharset be set according to
your locale charmap.

NFS4

Unlike other POSIX filesystems, RFC3530 (NFS 4) mandates UTF-8 but also
says: "The nfs4_cs_prep profile does not specify a normalization form.
A later revision of this specification may specify a particular
normalization form." In other words, if you want to use NFS4 you might
find the conversion and normalization features of convmv quite useful.

FAT/VFAT and NTFS

NTFS and VFAT (for long filenames) use UTF-16 internally to store
filenames. You should not need to convert filenames if you mount one
of those filesystems. Use appropriate mount options instead!

How to undo double UTF-8 (or other) encoded filenames
Sometimes it might happen that you "double-encoded" certain filenames,
for example the filenames were already UTF-8 encoded and you
accidentally did another conversion from some charset to UTF-8. You can
simply undo that by converting the other way round. The from-charset
has to be UTF-8 and the to-charset has to be the from-charset you
previously used by accident. If you use the "--fixdouble" option,
convmv makes sure that only files are processed that will still be
UTF-8 encoded after conversion, and it leaves non-UTF-8 files
untouched. You should check that you get the correct results by doing
the conversion without "--notest" first. The "--qfrom" option might
also be helpful, because the double UTF-8 filenames might mess up your
terminal if they are printed: they often contain control sequences
which do funny things with your terminal window. If you are not sure
which charset was accidentally converted from, using "--qfrom" is a
good way to work out the required encoding without destroying the
filenames for good.
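The mechanics of double encoding and its reversal can be reproduced in
Python. The sketch below illustrates the idea only and is not convmv's
code; undo_double is a hypothetical name, and latin-1 is just an
example of the charset that was wrongly assumed.

```python
from typing import Optional


def undo_double(name: bytes, wrong_charset: str) -> Optional[bytes]:
    """Reverse one accidental 'wrong_charset to UTF-8' conversion.

    Returns None when the input is not UTF-8 or the result would not be
    UTF-8, which is the safety condition described for --fixdouble.
    """
    try:
        candidate = name.decode("utf-8").encode(wrong_charset)
        candidate.decode("utf-8")  # the result must still be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None
    return candidate


good = "K\u00e4se.txt".encode("utf-8")            # b'K\xc3\xa4se.txt'
double = good.decode("latin-1").encode("utf-8")   # the accidental second pass
print(double)                          # b'K\xc3\x83\xc2\xa4se.txt'
print(undo_double(double, "latin-1"))  # b'K\xc3\xa4se.txt'
print(undo_double(good, "latin-1"))    # None
```

The last call shows the safety check at work: a correctly encoded name
would turn into non-UTF-8 bytes, so it is left alone.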

How to repair Samba files
When no correct "character set" variable has been set in the smb.conf
(of Samba 2.x), files which are created from Win* clients are created
in the client's codepage, e.g. cp850 for western European languages.
As a result, files whose names contain non-ASCII characters look
garbled if you "ls" them on the Unix server. If you change the
"character set" variable afterwards to iso8859-1, newly created files
are okay, but the old files are still garbled in the Windows encoding.
In this case convmv can also be used to convert the old Samba-shared
files from cp850 to iso8859-1.
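What such a conversion does to the raw bytes of a name can be shown in
Python. This only illustrates the byte-level effect and is not how
convmv is implemented; cp850_to_latin1 is a hypothetical helper and
the sample name is made up.

```python
def cp850_to_latin1(name: bytes) -> bytes:
    """Re-encode a filename from codepage 850 to ISO-8859-1."""
    # Raises UnicodeEncodeError for cp850 characters that ISO-8859-1 lacks,
    # e.g. the box-drawing characters.
    return name.decode("cp850").encode("iso8859-1")


old = b"B\x81ro.txt"         # u-umlaut is byte 0x81 in cp850
print(cp850_to_latin1(old))  # b'B\xfcro.txt'
```

The character stays the same; only the byte that represents it changes
from 0x81 to 0xFC.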

By the way: Samba 3.x finally maps to UTF-8 filenames by default, so
when you migrate from Samba 2 to Samba 3 you might also have to convert
your filenames.

Netatalk interoperability issues
When Netatalk is switched to UTF-8, which is supported in version 2,
it is NOT sufficient to rename the files. More needs to be done. See
http://netatalk.sourceforge.net/2.0/htmldocs/upgrade.html#volumes-and-filenames
and the uniconv utility of Netatalk for details.

SEE ALSO
locale(1) utf-8(7) charsets(7)

BUGS
no bugs or fleas known

DONATE
You can support convmv by making a donation, see
<https://www.j3e.de/donate.html>

AUTHOR
Bjoern JACKE

Send mail to bjoern [at] j3e.de for bug reports and suggestions.

perl v5.26.3 2017-05-04 CONVMV(1)