convmv(1)


NAME

       convmv - converts filenames from one encoding to another

SYNOPSIS

       convmv [options] FILE(S) ... DIRECTORY(S)

OPTIONS

       -f ENCODING
           specify the current encoding of the filename(s), i.e. the encoding
           to convert from

       -t ENCODING
           specify the encoding to which the filename(s) should be converted

       -i  interactive mode (ask y/n for each action)

       -r  recursively go through directories

       --nfc
           target filenames will be in normalization form C for UTF-8 (Linux
           etc.)

       --nfd
           target filenames will be in normalization form D for UTF-8 (OS X
           etc.)

       --qfrom , --qto
           be more quiet about the "from" or "to" of a rename (e.g. if it
           screws up your terminal). On printout, this simply replaces every
           non-ASCII character (bytewise) with ? and every control character
           with *. It does not affect the rename operation itself.

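           The masking described for --qfrom and --qto can be illustrated
           with a minimal Python sketch. It is not convmv's code:
           mask_for_display is a hypothetical helper, and treating bytes
           below 0x20 plus 0x7F as "control characters" is an assumption.

```python
def mask_for_display(raw_name: bytes) -> str:
    """Return a terminal-safe rendering of a raw filename (hypothetical helper)."""
    out = []
    for byte in raw_name:
        if byte >= 0x80:
            out.append("?")        # non-ASCII byte
        elif byte < 0x20 or byte == 0x7F:
            out.append("*")        # ASCII control character (assumed definition)
        else:
            out.append(chr(byte))  # printable ASCII is shown unchanged
    return "".join(out)


# A UTF-8 name with one two-byte character and an embedded tab.
print(mask_for_display("caf\u00e9\tmenu.txt".encode("utf-8")))
```

           Because the replacement works bytewise, one multi-byte UTF-8
           character shows up as several question marks.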
       --exec command
           execute the given command. You have to quote the command; #1 will
           be substituted by the old and #2 by the new filename. With this
           option, link targets stay untouched. Keep in mind that #1 and #2
           are already quoted by convmv, so you must not add extra quotation
           marks around them.

           Example:

           convmv -f latin1 -t utf-8 -r --exec "echo #1 should be renamed to #2" path/to/files

       --list
           list all available encodings. To get support for more Chinese or
           Japanese encodings, install the Perl HanExtra or JIS2K Encode
           packages.

       --lowmem
           keep the memory footprint low by not creating a hash of all files.
           This disables checking whether symlink targets are in the subtree;
           symlink target pointers will be converted regardless. If you
           convert several hundred thousand or millions of files, the memory
           usage of convmv might grow quite high. This option helps in that
           case.

       --nosmart
           by default convmv detects if a filename is already UTF-8 encoded
           and skips this file if a conversion from some charset to UTF-8 is
           to be performed. "--nosmart" forces the conversion to UTF-8 for
           such files as well, which might result in "double encoded UTF-8"
           (see section below).

       --fixdouble
           with the "--fixdouble" option convmv only converts files which
           will still be UTF-8 encoded after conversion. That's useful for
           fixing double-encoded UTF-8 files. Files which are not UTF-8, or
           which will not result in UTF-8 after conversion, will not be
           touched. Also see the chapter "How to undo double UTF-8 ..."
           below.

       --notest
           Needed to actually rename the files. By default convmv will just
           print what it wants to do.

       --parsable
           This is an advanced option that people who want to write a GUI
           front end will find useful (some others maybe, too). It makes
           convmv print out what it would do in an easily parsable way. The
           first column contains the action or some kind of information, the
           second column mostly contains the file that is to be modified and,
           if appropriate, the third column contains the modified value.
           Columns are separated by \0\n (nullbyte newline). Rows (one action
           each) are separated by \0\0\n (nullbyte nullbyte newline).

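           A front end could split such output as in the following Python
           sketch. It relies only on the two separators described above;
           parse_parsable is a hypothetical helper and the sample data,
           including the action names, is made up for illustration.

```python
def parse_parsable(data):
    """Split --parsable style output into rows of byte-string columns.

    Hypothetical helper, based only on the separators described above.
    """
    rows = []
    # Rows are separated by NUL NUL newline.
    for row in data.split(b"\0\0\n"):
        if row:  # skip the empty chunk after a trailing row separator
            # Columns within a row are separated by NUL newline.
            rows.append(row.split(b"\0\n"))
    return rows


# Made-up sample data; the action names are placeholders, not convmv output.
sample = b"action-a\0\nold.txt\0\nnew.txt\0\0\ninfo-b\0\nsome/dir\0\0\n"
for columns in parse_parsable(sample):
    print(columns)
```

           The columns are kept as bytes because filenames on most POSIX
           filesystems are plain byte strings without a guaranteed encoding.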
       --run-parsable
           This option can be used to blindly execute the output of a previous
           --parsable run. This way it's possible to rename a huge number of
           files in a minimum of time.

       --no-preserve-mtimes
           modifying filenames usually causes the parent directory's mtime to
           be updated. Since version 2, convmv by default resets the mtime to
           the old value. If your filesystem supports sub-second resolution,
           the sub-second part of the atime and mtime will be lost, as Perl
           does not yet support that. With this option you can disable the
           preservation of the mtimes.

       --replace
           if the target filename of a rename already exists, the existing
           file will be overwritten, provided the content of both files is
           identical.

       --unescape
           this option removes those ugly % hex sequences from filenames and
           turns them into (hopefully) nicer 8-bit characters. After
           --unescape you might want to do a charset conversion. Sequences
           like %20 etc. are sometimes produced when downloading via http or
           ftp.

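           What unescaping means can be sketched in Python with the standard
           urllib.parse module. This only illustrates the idea and is not
           how convmv implements it; the example filename and the assumption
           that the escaped byte is Latin-1 are made up.

```python
from urllib.parse import unquote_to_bytes

escaped = "My%20file%E4.txt"       # hypothetical downloaded filename

# Step 1: turn each %XX sequence into the raw byte it stands for.
raw = unquote_to_bytes(escaped)

# Step 2: the result is an 8-bit name; a charset conversion may follow.
# Here the byte 0xE4 is assumed to be Latin-1 and is re-encoded as UTF-8.
name = raw.decode("latin-1")
utf8_name = name.encode("utf-8")

print(raw)
print(utf8_name)
```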
       --upper , --lower
           turn filenames into all upper or all lower case. When the file is
           not ASCII-encoded, convmv expects a charset to be entered via the
           -f switch.

       --map=some-extra-mapping
           apply some custom character mappings, currently supported are:

           ntfs-sfm(-undo), ntfs-sfu(-undo) for the mapping of illegal ntfs
           characters for Linux or Macintosh cifs clients (see MS KB 117258
           and the mapchars mount option of mount.cifs on Linux).

           ntfs-pretty(-undo) for the mapping of illegal ntfs characters to
           pretty legal Japanese versions of them.

           See the map_get_newname() function for how to easily add your own
           mappings if needed. Let me know if you think convmv is missing
           some useful mapping here.

       --dotlessi
           take care of the dotless i/I issue. A lowercase version of "I"
           will also be dotless while an uppercase version of "i" will also
           be dotted. This is an issue for Turkish and Azeri.

           By the way: The superscript dot of the letter i was added in the
           Middle Ages to distinguish the letter (in manuscripts) from
           adjacent vertical strokes in such letters as u, m, and n. J is a
           variant form of i which emerged at this time and subsequently
           became a separate letter.

       --caseful-sz
           let convmv convert the sz ligature (U+00DF) to the uppercase
           version (U+1E9E) and vice versa. As of 2017 most fs case mapping
           tables don't treat those two code points as case equivalents. Thus
           convmv also treats it as caseless by default for now (unless this
           option is used).

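           The difference between the two treatments of the sz ligature can
           be seen with Python's built-in Unicode case mapping, used here
           only as an example of a standard mapping table. The helper
           upper_caseful_sz is hypothetical and is not convmv's code.

```python
SMALL_SZ = "\u00df"    # LATIN SMALL LETTER SHARP S
CAPITAL_SZ = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def upper_caseful_sz(name: str) -> str:
    """Uppercase a name, mapping U+00DF to U+1E9E (hypothetical helper)."""
    # Replace the small sz first; str.upper() leaves U+1E9E unchanged.
    return name.replace(SMALL_SZ, CAPITAL_SZ).upper()


# Python's default uppercase mapping expands U+00DF to "SS".
print(("stra" + SMALL_SZ + "e").upper())
# With the caseful treatment the name keeps its length.
print(upper_caseful_sz("stra" + SMALL_SZ + "e"))
```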
       --help
           print a short summary of available options

       --dump-options
           print a list of all available options

DESCRIPTION

       convmv is meant to help convert a single filename, a directory tree and
       the contained files or a whole filesystem into a different encoding. It
       just converts the filenames, not the content of the files. A special
       feature of convmv is that it also takes care of symlinks: it converts
       the symlink target pointer as well if the symlink target is being
       converted.

       All this comes in very handy when one wants to switch over from old
       8-bit locales to UTF-8 locales. It is also possible to convert
       directories to UTF-8 which are already partly UTF-8 encoded. convmv is
       able to detect if certain files are UTF-8 encoded and will skip them by
       default. To turn this smartness off use the "--nosmart" switch.

   Filesystem issues
       Almost all POSIX filesystems do not care about how filenames are
       encoded; here are some exceptions:

       HFS+ on OS X / Darwin

       Linux and (most?) other Unix-like operating systems use the so-called
       normalization form C (NFC) for their UTF-8 encoding by default but do
       not enforce this. HFS+ on the Macintosh OS enforces normalization form
       D (NFD), where a few characters are encoded in a different way. On OS X
       it's not possible to create NFC UTF-8 filenames because this is
       prevented at the filesystem layer. On HFS+ filenames are internally
       stored in UTF-16 and when converted back to UTF-8 (because the Unix
       based OS can't deal with UTF-16 directly), NFD is created for whatever
       reason. See http://developer.apple.com/qa/qa2001/qa1173.html for
       details. I think it was a very bad idea and breaks many things under OS
       X which expect a normal POSIX conforming system. Anywhere else convmv
       is able to convert files from NFC to NFD or vice versa, which makes
       interoperability with such systems a lot easier.

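       The difference between the two normalization forms can be shown with
       Python's standard unicodedata module. This is only an illustration of
       NFC versus NFD, not of how convmv performs the conversion; the sample
       filename is made up.

```python
import unicodedata

name = "caf\u00e9.txt"  # hypothetical filename containing e with acute

nfc = unicodedata.normalize("NFC", name)  # precomposed: U+00E9
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" + U+0301

# Both forms look the same on screen but differ as UTF-8 byte strings,
# so a byte-oriented filesystem sees two different filenames.
print(nfc.encode("utf-8"))
print(nfd.encode("utf-8"))
print(nfc == nfd)
```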
       APFS on macOS

       With the introduction of APFS in macOS 10.13, Apple gave up imposing
       NFD on user space. But once NFD has been enforced there is no easy way
       back without breaking existing applications. So they had to make APFS
       normalization-insensitive: a file can be created in NFC or NFD in the
       filesystem and can be accessed with both forms, too. Under the hood
       they store hashes of the normalized form of the filename to provide
       normalization insensitivity. Sounds like a great idea? Let's see: If
       you readdir a directory, you get back the files in the normalization
       form that was used when those files were created. If you stat a file in
       NFC or in NFD form, you get back whatever normalization form you used
       in the stat call. So user space applications can't expect that a file
       that can be stat'ed and accessed successfully is also part of directory
       listings, because the returned normalization form is faked to match
       what the user asked for. In theory, user space will also have to
       normalize strings all the time. This is the same problem as the case
       insensitivity of filenames before, which still breaks many user space
       applications. Just that the latter was much more obvious to spot and
       to implement than this thing. So long, and thanks for all the fish.

       JFS

       If people mount JFS partitions with iocharset=utf8, there is a similar
       problem, because JFS is designed to store filenames internally in
       UTF-16, too; that is because Linux's JFS is really JFS2, which was a
       rewrite of JFS for OS/2. JFS partitions should always be mounted with
       iocharset=iso8859-1, which is also the default with recent 2.6.6
       kernels. If this is not done, JFS does not behave like a POSIX
       filesystem and it might happen that certain files cannot be created at
       all, for example filenames in ISO-8859-1 encoding. Only when
       interoperation with OS/2 is needed should iocharset be set according to
       the charmap of your locale.

       NFS4

       Unlike other POSIX filesystems, RFC3530 (NFS 4) mandates UTF-8 but also
       says: "The nfs4_cs_prep profile does not specify a normalization form.
       A later revision of this specification may specify a particular
       normalization form." In other words, if you want to use NFS4 you might
       find the conversion and normalization features of convmv quite useful.

       FAT/VFAT and NTFS

       NTFS and VFAT (for long filenames) use UTF-16 internally to store
       filenames. You should not need to convert filenames if you mount one
       of those filesystems. Use appropriate mount options instead!

   How to undo double UTF-8 (or other) encoded filenames
       Sometimes it might happen that you "double-encoded" certain filenames,
       for example when the file names were already UTF-8 encoded and you
       accidentally did another conversion from some charset to UTF-8. You can
       simply undo that by converting the other way round: the from-charset
       has to be UTF-8 and the to-charset has to be the from-charset you
       previously used by accident. If you use the "--fixdouble" option,
       convmv will make sure that only files are processed that will still be
       UTF-8 encoded after conversion, and it will leave non-UTF-8 files
       untouched. You should first check that you get the correct results by
       doing the conversion without "--notest". The "--qfrom" option might
       also be helpful, because the double UTF-8 file names might screw up
       your terminal when they are printed; they often contain control
       sequences which do funny things with your terminal window. If you are
       not sure about the charset which was accidentally converted from, using
       "--qfrom" is a good way to figure out the required encoding without
       finally destroying the file names.

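       The following Python sketch shows how double encoding arises and how
       the reverse conversion undoes it. It also mimics the check described
       for "--fixdouble" by accepting a result only if it is still valid
       UTF-8. The helper undo_double_encoding is hypothetical, and Latin-1
       as the accidentally used charset is an assumption for the example.

```python
def undo_double_encoding(raw, wrong_charset):
    """Reverse one accidental "wrong_charset to UTF-8" conversion.

    Hypothetical helper: returns the repaired bytes, or None if the name
    should be left untouched.
    """
    try:
        # Convert from UTF-8 back to the charset that was used by accident.
        repaired = raw.decode("utf-8").encode(wrong_charset)
        # Accept the result only if it is still valid UTF-8.
        repaired.decode("utf-8")
    except UnicodeError:
        return None
    return repaired


good = "\u00e4.txt".encode("utf-8")              # correct UTF-8 name
double = good.decode("latin-1").encode("utf-8")  # the accidental conversion

print(double)
print(undo_double_encoding(double, "latin-1"))
print(undo_double_encoding(good, "latin-1"))     # not double-encoded
```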
   How to repair Samba files
       When no correct "character set" variable has been set in the smb.conf
       (of Samba 2.x), files which are created from Win* clients are created
       in the client's codepage, e.g. cp850 for Western European languages.
       As a result, files which contain non-ASCII characters are screwed up
       if you "ls" them on the Unix server. If you change the "character set"
       variable afterwards to iso8859-1, newly created files are okay, but the
       old files are still screwed up in the Windows encoding. In this case
       convmv can also be used to convert the old Samba-shared files from
       cp850 to iso8859-1.

       By the way: Samba 3.x finally maps to UTF-8 filenames by default, so
       when you migrate from Samba 2 to Samba 3 you might have to convert
       your file names as well.

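       What a conversion from cp850 to iso8859-1 does to the bytes of a
       filename can be sketched with Python's standard codecs. This only
       illustrates the re-encoding step, not convmv itself; the filename is
       made up.

```python
# Hypothetical name as written by a cp850 client: 0x84 is "a with diaeresis".
cp850_name = b"B\x84r.txt"

# Decode with the charset the bytes are really in, then encode in the
# charset the server expects.
text = cp850_name.decode("cp850")
latin1_name = text.encode("iso8859-1")

print(text)
print(latin1_name)
```

       Read as iso8859-1, the original byte 0x84 is an unprintable control
       code, which is why such names look broken on the server.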
   Netatalk interoperability issues
       When Netatalk is being switched to UTF-8, which is supported in version
       2, it is NOT sufficient to rename the file names. More needs to be
       done. See
       http://netatalk.sourceforge.net/2.0/htmldocs/upgrade.html#volumes-and-filenames
       and the uniconv utility of Netatalk for details.

BUGS

       no bugs or fleas known

DONATE

       You can support convmv by making a donation, see
       <https://www.j3e.de/donate.html>

AUTHOR

       Bjoern JACKE

       Send mail to bjoern [at] j3e.de for bug reports and suggestions.

perl v5.32.0                      2020-07-27                         CONVMV(1)