rdfind(1)                            rdfind                            rdfind(1)

NAME
       rdfind - finds duplicate files

SYNOPSIS
       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION
       rdfind finds duplicate files across and/or within several directories.
       It calculates checksums only if necessary. rdfind runs in O(N log(N))
       time, with N being the number of files.

       If two (or more) equal files are found, the program decides which of
       them is the original; the rest are considered duplicates. This is
       done by ranking the files against each other and deciding which has
       the highest rank. See section RANKING for details.

       By default, no action is taken besides creating a results file listing
       the detected files and showing the amount of space that could be saved.

       If you need finer control over the ranking, you can use a preprocessor
       that sorts the file names in the desired order and then run the
       program using xargs. See the examples below for how to use find and
       xargs in conjunction with rdfind.

       To include files or directories whose names start with -, prefix them
       with ./ (as in rdfind ./-) so that they are not confused with options.
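
       As a minimal sketch of the ./ prefix convention described above, the
       following shell function rewrites any name that begins with - so that
       it cannot be mistaken for an option. The name protect_dash is a
       hypothetical helper introduced here, not part of rdfind, and it only
       transforms names; it does not run rdfind.

```shell
# protect_dash: hypothetical helper, not part of rdfind.
# Prints each argument on its own line, prefixing names that start
# with '-' with './' so they are not parsed as options.
protect_dash() {
    for name in "$@"; do
        case "$name" in
            -*) printf './%s\n' "$name" ;;
            *)  printf '%s\n' "$name" ;;
        esac
    done
}
```

       For example, protect_dash -old-backups prints ./-old-backups, which can
       then be given to rdfind as rdfind ./-old-backups.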

RANKING
       Given two or more equal files, the one with the highest rank is
       selected to be the original and the rest are duplicates. The ranking
       rules are given below; they are applied in order until an original
       has been found. Given two files A and B which have equal size and
       content, the ranking is as follows:

       If A was found while scanning an input argument earlier than B, A is
       higher ranked.

       If A was found at a directory depth lower than B, A is higher ranked
       (A is closer to the root).

       If A and B are found during scanning of the same input argument and
       share the same directory depth, the one that ranks highest depends on
       whether deterministic operation is enabled (this is on by default, see
       option -deterministic). If enabled, which one ranks highest is
       unspecified but deterministic. If disabled, the one that was reported
       first from the file system is highest ranked.
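
       The first two rules can be modelled with an ordinary sort. The sketch
       below is only an illustration and not rdfind's implementation: it
       assumes each candidate is written as a line of the hypothetical form
       "ARG_INDEX DEPTH PATH" (paths without spaces) and prints the line that
       would rank highest. Ties are left in input order, which only loosely
       mirrors -deterministic false. The name pick_original is hypothetical.

```shell
# pick_original: hypothetical helper illustrating ranking rules 1 and 2.
# Input (stdin): lines "ARG_INDEX DEPTH PATH" for files with equal content.
# Output: the line with the lowest argument index, then the lowest depth.
pick_original() {
    # -s keeps the input order for records that tie on both keys.
    sort -s -k1,1n -k2,2n | head -n 1
}
```

       Feeding it one record from the second input argument and two from the
       first selects the shallower record of the first argument.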

OPTIONS
       Searching options etc:

       -ignoreempty true|false
              Ignore empty files. Setting this to true (the default) is
              equivalent to -minsize 1; false is equivalent to -minsize 0.

       -minsize N
              Ignores files smaller than N bytes. Default is 1, meaning
              empty files are ignored.

       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Removes items found which have identical inode and device ID.
              Default is true.

       -checksum md5|sha1|sha256
              Which type of checksum to use: md5, sha1 or sha256. The
              default is sha1 since version 1.4.0.

       -deterministic true|false
              If set (the default), sort files of equal rank in an
              unspecified but deterministic order. This makes the behaviour
              independent of the order in which files are listed when
              querying the file system.

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links. Default is false.

       -makehardlinks true|false
              Replace duplicate files with hard links. Default is false.

       -makeresultsfile true|false
              Make a results file in the current directory. Default is true.
              If the file exists, it is overwritten.

       -outputname name
              Name the results file "name" instead of the default
              results.txt.

       -deleteduplicates true|false
              Delete (unlink) duplicate files. Default is false.

       General options:

       -sleep Xms
              Sleeps X milliseconds between reading each file, to reduce
              load. Default is 0 (no sleep). Note that only a few values
              are supported at present: 0, 1-5, 10, 25, 50 and 100
              milliseconds.

       -n, -dryrun true|false
              Displays what would have been done, without actually deleting
              or linking anything. Default is false.

       -h, -help, --help
              Displays a brief help message.

       -v, -version, --version
              Displays the version number.
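
       Because the action options modify or delete files, one cautious
       pattern is to run the same command with -dryrun true first. The sketch
       below wraps that in two functions. The names preview_dedupe and
       apply_dedupe and the RDFIND variable are hypothetical, introduced here
       for illustration (RDFIND lets the program be swapped, for instance
       for echo, to inspect the command line); the options themselves are
       the ones documented above.

```shell
# RDFIND: hypothetical override for the program to run (defaults to rdfind).
: "${RDFIND:=rdfind}"

# Show what would be deleted without touching anything.
preview_dedupe() {
    "$RDFIND" -dryrun true -deleteduplicates true "$@"
}

# Actually delete duplicates; run only after reviewing the preview.
apply_dedupe() {
    "$RDFIND" -deleteduplicates true "$@"
}
```

       A typical session would call preview_dedupe /mnt/backup, read the
       output, and only then call apply_dedupe /mnt/backup.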

EXAMPLES
       Search for duplicate files in the home directory and a backup
       directory:
              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 | xargs -0 rdfind
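
       To control ranking more tightly, the find output can be sorted before
       it reaches xargs, since input arguments scanned earlier rank higher.
       The following sketch assumes find, sort and xargs implementations
       with NUL-separator support (-print0, -z, -0), as in the GNU tools.
       The name sorted_dirs is a hypothetical helper, and plain byte-order
       sorting is just one possible ordering.

```shell
# sorted_dirs: hypothetical helper, not part of rdfind.
# Emits NUL-separated directories named "$2" below "$1", sorted by path,
# so that xargs passes them to rdfind in a predictable order.
sorted_dirs() {
    find "$1" -type d -name "$2" -print0 | LC_ALL=C sort -z
}
```

       It would be used as sorted_dirs . foo | xargs -0 rdfind; replacing the
       sort step with another ordering changes which copy is treated as the
       original.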

FILES
       results.txt (the default name, which can be changed with option
       -outputname, see above). The results file will contain one row per
       duplicate file found, along with a header row explaining the columns.
       Each row carries a label describing why the file is considered a
       duplicate:

       DUPTYPE_UNKNOWN
              Some internal error.

       DUPTYPE_FIRST_OCCURRENCE
              The file that is considered to be the original.

       DUPTYPE_WITHIN_SAME_TREE
              Files in the same tree (found when processing the directory in
              the same input argument as the original).

       DUPTYPE_OUTSIDE_TREE
              The file was found while processing a different input argument
              than the original.
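
       A results file can be summarised with standard tools. The sketch below
       counts rows per duplicate type. It assumes, beyond what is stated
       above, that the type label is the first whitespace-separated field of
       each row and that header lines start with #; check the header row of
       an actual results file before relying on it. The name count_duptypes
       is hypothetical.

```shell
# count_duptypes: hypothetical helper, not part of rdfind.
# Assumes the DUPTYPE_* label is the first field and header lines begin
# with '#'. Prints "LABEL COUNT" lines sorted by label.
count_duptypes() {
    awk '!/^#/ && NF > 0 { n[$1]++ }
         END { for (t in n) print t, n[t] }' "$1" | LC_ALL=C sort
}
```

       It would be called as count_duptypes results.txt.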

EXIT VALUES
       0 on success, nonzero otherwise.

BUGS/FEATURES
       When the same directory is specified twice, rdfind keeps the first
       encountered as the most important (original) and treats the rest as
       duplicates. This might not be what you want.

       The -makesymlinks option creates absolute links. This might not be
       what you want. To create relative links instead, you may use the
       symlinks(2) command, which is able to convert absolute links to
       relative links.
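
       Where the symlinks command is not installed, a single link can be
       converted with GNU ln instead. This is a different technique from the
       one named above, shown only as a sketch: it assumes GNU coreutils ln
       with the -r (--relative) option, and relativize_link is a
       hypothetical helper name.

```shell
# relativize_link: hypothetical helper, not part of rdfind.
# Rewrites one absolute symlink as a relative one. Requires GNU ln (-r).
relativize_link() {
    target=$(readlink "$1") || return 1
    case "$target" in
        /*) ln -sfnr -- "$target" "$1" ;;   # absolute: recreate as relative
        *)  : ;;                            # already relative: leave as is
    esac
}
```

       It takes the path of the link as its only argument and leaves links
       that are already relative untouched.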

       Older versions unfortunately contained a misspelling of the word
       "occurrence". This is now corrected (since 1.3), which might affect
       user scripts parsing the output file written by rdfind.

SECURITY CONSIDERATIONS
       Avoid manipulating the directories while rdfind is reading them;
       rdfind is quite brittle in that case. In particular, when deleting or
       making links, rdfind can be subject to a symlink attack. Use with
       care!

AUTHOR
       Paul Dreik 2006-2018, reachable at rdfind@pauldreik.se. Rdfind can be
       found at https://rdfind.pauldreik.se/

       Do you find rdfind useful? Drop me a line! It is always fun to hear
       from people who actually use it and what data collections they run it
       on.

THANKS
       Several people have helped with suggestions and improvements: Niels
       Möller, Carl Payne and Salvatore Ansani. Thanks also to those of you
       who tested the program and sent me feedback.

VERSION
       1.4.1 (release date 2018-11-12)

COPYRIGHT
       This program is distributed under GPLv2 or later, at your option.

SEE ALSO
       md5sum(1), sha1sum(1), find(1), symlinks(2)

Nov 2018                              1.4.1                              rdfind(1)