rdfind(1)                            rdfind                            rdfind(1)

NAME
rdfind - finds duplicate files

SYNOPSIS
rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION
rdfind finds duplicate files across and/or within several directories.
It calculates checksums only when necessary. rdfind runs in O(N log(N))
time, where N is the number of files.
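
As a rough illustration of why checksums are not always needed: a file whose size is unique cannot have a duplicate, so only sizes that occur more than once call for a closer comparison. The sketch below is not rdfind's implementation. The function name candidate_sizes is hypothetical, and GNU find (for -printf) is assumed.

```shell
# Hypothetical helper, not part of rdfind: print every file size (in bytes)
# that occurs more than once below directory $1. Only files with such a size
# can be duplicates, so only they would need a content comparison.
# Assumes GNU find, which provides -printf.
candidate_sizes() {
    find "$1" -type f -printf '%s\n' | sort -n | uniq -d
}
```

Calling candidate_sizes on a directory prints one line per repeated size; an empty result means no two files can be equal.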

If two (or more) equal files are found, the program decides which of
them is the original; the rest are considered duplicates. This is
done by ranking the files against each other and picking the one with
the highest rank. See section RANKING for details.

By default, no action is taken besides creating a file listing the
detected files and showing the amount of space that could be saved.

If you need finer control over the ranking, you can use a
preprocessor that sorts the file names in the desired order and then
run the program through xargs. See EXAMPLES for how to use find and
xargs in conjunction with rdfind.
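
One possible shape for such a preprocessing step is sketched here. The function name rdfind_sorted_dirs is hypothetical, and GNU sort (-z) and GNU xargs (-0, -r) are assumed; only the -dryrun option documented in this manual is passed to rdfind.

```shell
# Hypothetical helper: run rdfind in dry-run mode on every directory named $1
# found below $2. The directory names are sorted first, so the order of the
# input arguments (and with it the ranking) does not depend on the order in
# which find happens to report them.
# Assumes GNU sort (-z) and GNU xargs (-0, -r).
rdfind_sorted_dirs() {
    find "$2" -type d -name "$1" -print0 |
        LC_ALL=C sort -z |
        xargs -0 -r rdfind -dryrun true
}
```

For example, rdfind_sorted_dirs foo . would rank files in ./a/foo above equal files in ./b/foo. A different sort command can be substituted to express another preference.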

To include files or directories whose names start with -, prefix them
with ./ (as in rdfind ./-) so that they are not confused with options.
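
In a script, the prefix can be added automatically. This is a small sketch; the function name safe_path and the directory name in the usage note are hypothetical.

```shell
# Hypothetical helper: print path $1 in a form that cannot be mistaken for an
# option, by prefixing ./ when the name starts with a dash.
safe_path() {
    case $1 in
        -*) printf './%s\n' "$1" ;;
        *)  printf '%s\n' "$1" ;;
    esac
}
```

It could be used as rdfind -dryrun true "$(safe_path -old-backups)", which passes ./-old-backups to rdfind.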

RANKING
Given two or more equal files, the one with the highest rank is
selected as the original and the rest are duplicates. The ranking
rules are listed below and are applied in order until an original has
been found. Given two files A and B which have equal size and content,
the ranking is as follows:

If A was found while scanning an input argument earlier than B, A is
higher ranked.

If A was found at a directory depth lower than B, A is higher ranked (A
is closer to the root).

If A and B were found while scanning the same input argument and
share the same directory depth, the one that ranks highest depends on
whether deterministic operation is enabled (this is on by default, see
option -deterministic). If enabled, which one ranks highest is
unspecified but deterministic. If disabled, the one that was reported
first by the file system is highest ranked.
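
The rules above amount to a sort on two keys plus a tie-break. The sketch below expresses that with sort; it is not rdfind's own code. The function name and the tab-separated input format are hypothetical, and breaking ties by path name is an assumption, since the manual only promises an unspecified but deterministic order.

```shell
# Hypothetical sketch of the ranking rules, not rdfind's own code.
# Reads one candidate per line from standard input, formatted as
#   <input argument index> TAB <directory depth> TAB <path>
# and prints the candidates from highest to lowest rank.
# Assumption: ties are broken by path name, which is only one possible
# "unspecified but deterministic" order.
rank_candidates() {
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -k1,1n -k2,2n -k3
}
```

The first line of the output is the file that would be treated as the original under these assumptions.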

OPTIONS
Searching options etc:

-ignoreempty true|false
    Ignore empty files. Setting this to true (the default) is
    equivalent to -minsize 1; false is equivalent to -minsize 0.

-minsize N
    Ignores files smaller than N bytes. Default is 1, meaning
    empty files are ignored.

-maxsize N
    Ignores files of N bytes or more. Default is 0, which means
    this check is disabled.

-followsymlinks true|false
    Follow symlinks. Default is false.

-removeidentinode true|false
    Removes items found which have identical inode and device ID.
    Default is true.

-checksum md5|sha1|sha256|sha512
    The type of checksum to use: md5, sha1, sha256 or sha512.
    The default is sha1 since version 1.4.0.

-deterministic true|false
    If set (the default), sort files of equal rank in an unspecified
    but deterministic order. This makes the behaviour independent of
    the order in which files are listed when querying the file system.
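
The two size limits are easy to get wrong by one, since -minsize is inclusive and -maxsize is exclusive. The following sketch restates the documented behaviour as a predicate; the function name size_is_considered is hypothetical and the code is not taken from rdfind.

```shell
# Hypothetical helper mirroring the documented size limits: succeeds (exit
# status 0) if a file of $1 bytes would be kept with -minsize $2 and
# -maxsize $3, where a maxsize of 0 disables the upper limit.
size_is_considered() {
    size=$1 min=$2 max=$3
    [ "$size" -ge "$min" ] || return 1
    [ "$max" -eq 0 ] || [ "$size" -lt "$max" ]
}
```

With the defaults (-minsize 1, -maxsize 0), an empty file is skipped and every non-empty file is considered.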

Action options:

-makesymlinks true|false
    Replace duplicate files with symbolic links. Default is false.

-makehardlinks true|false
    Replace duplicate files with hard links. Default is false.

-makeresultsfile true|false
    Make a results file in the current directory. Default is true.
    If the file exists, it is overwritten. This does not affect
    whether items are deleted. See -dryrun for how to disable
    deletions.

-outputname name
    Name the results file "name" instead of the default
    results.txt.

-deleteduplicates true|false
    Delete (unlink) duplicate files. Default is false.

General options:

-sleep Xms
    Sleeps X milliseconds between reading each file, to reduce load.
    Default is 0 (no sleep). Note that only a few values are
    supported at present: 0, 1-5, 10, 25, 50 and 100 milliseconds.

-n, -dryrun true|false
    Displays what would have been done, without actually deleting or
    linking anything. Default is false.

-h, -help, --help
    Displays a brief help message.

-v, -version, --version
    Displays the version number.
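
Because the action options modify the file system, it can be useful to combine them with -dryrun first. The wrapper below is a sketch of that workflow using only the options documented above; the function name and the prompt text are hypothetical.

```shell
# Hypothetical wrapper: show what -makehardlinks would do, then ask before
# doing it. Uses only options documented above; the prompt text and the
# function name are illustrative.
preview_then_hardlink() {
    rdfind -dryrun true -makehardlinks true "$@" || return 1
    printf 'Replace the duplicates with hard links? [y/N] '
    read -r answer || answer=
    case $answer in
        y|Y) rdfind -makehardlinks true "$@" ;;
        *)   echo 'Nothing was changed.' ;;
    esac
}
```

Any answer other than y or Y leaves the files untouched, so the default is the safe choice.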

EXAMPLES
Search for duplicate files in the home directory and a backup directory:
    rdfind ~ /mnt/backup

Delete duplicates in a backup directory:
    rdfind -deleteduplicates true /mnt/backup

Search for duplicate files in directories called foo:
    find . -type d -name foo -print0 | xargs -0 rdfind

FILES
results.txt (the default name, which can be changed with option
-outputname, see above). The results file contains one row per
duplicate file found, along with a header row explaining the columns.
A text label describes why the file is considered a duplicate:

DUPTYPE_UNKNOWN
    Some internal error.

DUPTYPE_FIRST_OCCURRENCE
    The file that is considered to be the original.

DUPTYPE_WITHIN_SAME_TREE
    Files in the same tree (found when processing the directory in the
    same input argument as the original).

DUPTYPE_OUTSIDE_TREE
    The file was found while processing a different input argument
    than the original.
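
A script that consumes the results file might start by counting rows per duplicate type, as sketched here. The exact file layout is not specified in this manual, so the sketch rests on two assumptions: header lines start with '#', and the duplicate type is the first whitespace-separated field of each row. The function name summarize_results is hypothetical.

```shell
# Hypothetical helper: count the rows of a results file per duplicate type.
# Assumptions (not stated in this manual): header lines start with '#', and
# the duplicate type is the first whitespace-separated field of each row.
summarize_results() {
    awk '!/^#/ && NF { count[$1]++ }
         END { for (t in count) print t, count[t] }' "${1:-results.txt}" |
        LC_ALL=C sort
}
```

Without an argument the function reads results.txt in the current directory; a file written with -outputname can be passed as the first argument.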

EXIT VALUES
0 on success, nonzero otherwise.

BUGS
When the same directory is specified twice, rdfind keeps the first
encountered copy as the most important (original) and the rest as
duplicates. This might not be what you want.

The -makesymlinks option creates absolute links. This might not be what
you want. To create relative links instead, you may use the symlinks
command, which is able to convert absolute links to relative links.

Older versions unfortunately contained a misspelling of the word
occurrence. This is now corrected (since 1.3), which might affect user
scripts parsing the output file written by rdfind.

SECURITY CONSIDERATIONS
Avoid manipulating the directories while rdfind is reading them. rdfind
is quite brittle in that case. In particular, when deleting or making
links, rdfind can be subject to a symlink attack. Use with care!

AUTHOR
Paul Dreik 2006-2018, reachable at rdfind@pauldreik.se. Rdfind can be
found at https://rdfind.pauldreik.se/

Do you find rdfind useful? Drop me a line! It is always fun to hear
from people who actually use it and what data collections they run it
on.

THANKS
Several people have helped with suggestions and improvements: Niels
Möller, Carl Payne and Salvatore Ansani. Thanks also to you who tested
the program and sent me feedback.

VERSION
1.6.0 (release date 2023-06-17)

COPYRIGHT
This program is distributed under GPLv2 or later, at your option.

SEE ALSO
md5sum, sha1sum, find, symlinks

Jun 2023                              1.6.0                          rdfind(1)