rdfind(1)                            rdfind                            rdfind(1)

NAME
       rdfind - finds duplicate files

SYNOPSIS
       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION
       rdfind finds duplicate files across and/or within several directories.
       It calculates checksums only when necessary. rdfind runs in
       O(N log(N)) time, where N is the number of files.
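
The idea of computing checksums only when necessary can be illustrated with a small shell sketch. It is not rdfind's implementation; it only shows the general technique of discarding files with a unique size before hashing the rest. The temporary files and variable names are made up for the illustration, and md5sum from GNU coreutils is assumed to be available.

```shell
# Illustration only: a size filter followed by checksums, not rdfind's actual code.
tmp=$(mktemp -d)
printf 'same\n' > "$tmp/a"
printf 'same\n' > "$tmp/b"            # duplicate of a
printf 'diff\n' > "$tmp/c"            # same size as a, different content
printf 'longer content\n' > "$tmp/d"  # unique size, never checksummed

# Pass 1: print "size path" for each file and keep only sizes seen more than once.
candidates=$(
  for f in "$tmp"/*; do
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done |
  awk '{ n[$1]++; size[NR] = $1; path[NR] = $2 }
       END { for (i = 1; i <= NR; i++) if (n[size[i]] > 1) print path[i] }'
)

# Pass 2: checksum the remaining candidates and print groups that share a checksum.
dups=$(
  printf '%s\n' "$candidates" | xargs md5sum | sort |
  awk '{ c[$1]++; p[$1] = p[$1] " " $2 }
       END { for (k in c) if (c[k] > 1) print p[k] }'
)
printf 'duplicates:%s\n' "$dups"
```

Only three of the four files are hashed here, because the file with a unique size cannot have a duplicate. The sketch does not handle paths containing whitespace.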

       If two (or more) equal files are found, the program decides which of
       them is the original; the rest are considered duplicates. This is
       done by ranking the files against each other and picking the one with
       the highest rank. See section RANKING for details.

       If you need better control over the ranking than this, you can use a
       preprocessor that sorts the file names in the desired order and then
       run the program through xargs. See the examples below for how to use
       find and xargs in conjunction with rdfind.
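
One possible preprocessor is sort(1). The sketch below sorts file names in reverse order before handing them to rdfind, so that the name sorting last becomes the first input argument and therefore ranks highest. It assumes GNU find, sort and xargs (for -print0, -z and -0); the directory layout is hypothetical, and the rdfind call is a dry run that is skipped when rdfind is not installed.

```shell
# Hypothetical layout: two snapshot directories holding the same file.
tmp=$(mktemp -d)
mkdir -p "$tmp/2016" "$tmp/2017"
printf 'data\n' > "$tmp/2016/notes.txt"
printf 'data\n' > "$tmp/2017/notes.txt"
cd "$tmp" || exit 1

# Show the order in which the names will be passed: newest snapshot first.
order=$(find . -type f -print0 | sort -z -r | tr '\0' '\n')
printf '%s\n' "$order"

# Each file becomes its own input argument, so the first name ranks highest.
if command -v rdfind >/dev/null 2>&1; then
  find . -type f -print0 | sort -z -r |
    xargs -0 rdfind -dryrun true -makeresultsfile false
fi
```

Any other ordering can be substituted for sort -z -r, as long as the names stay NUL-separated on their way to xargs -0.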

       To include files or directories whose names start with -, use
       rdfind ./- so that they are not confused with options.
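
A short sketch of the same advice applied to a list of names: every name that begins with - is given a ./ prefix before it would be passed to rdfind. The file names are invented for the example, and the command line is only printed, not run.

```shell
# Hypothetical names; two of them would otherwise look like options.
safe=""
for name in -old notes.txt -n; do
  case $name in
    -*) name="./$name" ;;   # prefix so the name cannot be parsed as an option
  esac
  safe="$safe $name"
done
echo "rdfind$safe"
```

Building a single string like this breaks on names containing whitespace; it is only meant to show the prefixing step.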

RANKING
       Given two or more equal files, the one with the highest rank is
       selected to be the original and the rest are duplicates. The ranking
       rules are listed below and are applied in order until an original
       has been found. Given two files A and B which have equal content,
       the ranking is as follows:

       1. If A was found while scanning an input argument earlier than B,
          A is higher ranked.

       2. If A was found at a lower depth than B (A is closer to the root),
          A is higher ranked.

       3. If A was found earlier than B, A is higher ranked.

       The last rule is needed when two files are found in the same
       directory (obviously not given in separate arguments, otherwise the
       first rule applies). It gives the same order between the files as
       the order in which the operating system delivers them when listing
       the directory. This is operating system specific behaviour.
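
The three rules behave like a sort on three keys. The sketch below models them with sort(1) on hypothetical records of the form "argument index, depth, discovery order, path" for three files with equal content. The record layout and the paths are assumptions made for the illustration; rdfind does not produce or read such records.

```shell
# Each line: <input argument index> <depth> <discovery order> <path>
# (hypothetical records for three files with equal content)
ranked=$(sort -k1,1n -k2,2n -k3,3n <<'EOF'
2 0 5 /mnt/backup/report.txt
1 2 3 /home/me/docs/old/report.txt
1 1 4 /home/me/docs/report.txt
EOF
)

# The first record after sorting has the highest rank and is the original.
original=$(printf '%s\n' "$ranked" | head -n 1 | cut -d' ' -f4-)
echo "original: $original"
```

The file from the second input argument loses despite its lower depth, because the argument order is compared first. Among the two files from the first argument, the shallower one wins even though it was discovered later.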

OPTIONS
       Searching options etc:

       -ignoreempty true|false
              Ignore empty files (default).

       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Remove items found which have identical inode and device ID.
              Default is true.

       -checksum md5|sha1
              Type of checksum to use: md5 or sha1. Default is md5.

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links.

       -makehardlinks true|false
              Replace duplicate files with hard links.

       -makeresultsfile true|false
              Make a results file results.txt (default) in the current
              directory.

       -outputname name
              Name the results file "name" instead of the default
              results.txt.

       -deleteduplicates true|false
              Delete (unlink) duplicate files.

       General options:

       -sleep Xms
              Sleep X milliseconds between reading each file, to reduce
              load. Default is 0 (no sleep). Note that only a few values
              are supported at present: 0, 1-5, 10, 25, 50, 100
              milliseconds.

       -n, -dryrun true|(false)
              Display what would have been done without actually deleting
              or linking anything.

       -h, -help, --help
              Display a brief help message.

       -v, -version, --version
              Display the version number.
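
A cautious way to combine these options is to preview an action with -dryrun true before running it for real. The sketch below uses only options documented above; the temporary directory and file names are made up, and the rdfind call is skipped when rdfind is not installed.

```shell
# Hypothetical test data: two files with identical content.
tmp=$(mktemp -d)
printf 'payload\n' > "$tmp/original.bin"
printf 'payload\n' > "$tmp/copy.bin"

if command -v rdfind >/dev/null 2>&1; then
  # Preview: report what would be deleted, but change nothing.
  rdfind -dryrun true -makeresultsfile false -deleteduplicates true "$tmp"
fi

# After the dry run both files should still be present.
remaining=$(ls "$tmp" | wc -l | tr -d ' ')
echo "files after dry run: $remaining"
```

Once the preview looks right, the same command without -dryrun true performs the deletion.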

EXAMPLES
       Search for duplicate files in the home directory and a backup
       directory:
              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 | xargs -0 rdfind

FILES
       results.txt (the default name, which can be changed with the option
       -outputname, see above). The results file will contain one row per
       duplicate file found, along with a header row explaining the
       columns. A text describes why the file is considered a duplicate:

       DUPTYPE_UNKNOWN
              Some internal error.

       DUPTYPE_FIRST_OCCURRENCE
              The file that is considered to be the original.

       DUPTYPE_WITHIN_SAME_TREE
              Files in the same tree (found when processing the directory
              in the same input argument as the original).

       DUPTYPE_OUTSIDE_TREE
              The file was found while processing a different input
              argument than the original.
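
A script that reads the results file can key on the DUPTYPE text. The sketch below counts rows per type with awk. The sample file is hypothetical and simplified: it assumes the DUPTYPE text is the first field of each row and that non-data lines do not start with DUPTYPE_. The single path column stands in for whatever columns the real header row describes.

```shell
# Hypothetical, simplified stand-in for a results file.
results=$(mktemp)
cat > "$results" <<'EOF'
# placeholder for the header row
DUPTYPE_FIRST_OCCURRENCE ./a/photo.jpg
DUPTYPE_WITHIN_SAME_TREE ./a/copy/photo.jpg
DUPTYPE_WITHIN_SAME_TREE ./a/old/photo.jpg
DUPTYPE_OUTSIDE_TREE ./b/photo.jpg
EOF

# Count rows per duplicate type, relying only on the first field.
summary=$(awk '/^DUPTYPE_/ { count[$1]++ }
               END { for (t in count) print t, count[t] }' "$results" | sort)
printf '%s\n' "$summary"
```

Matching on the DUPTYPE_ prefix rather than on column positions keeps the script independent of the exact column layout, which should be checked against the header row of a real results file.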

EXIT VALUES
       0 on success, nonzero otherwise.

BUGS/FEATURES
       When the same directory is specified twice, rdfind keeps the first
       encountered as the most important (original), and the rest as
       duplicates. This might not be what you want.

       The -makesymlinks option creates absolute links. This might not be
       what you want. To create relative links instead, you may use the
       symlinks(2) command, which is able to convert absolute links to
       relative links.

       Older versions unfortunately contained a misspelling of the word
       occurrence. This is now corrected (since 1.3), which might affect
       user scripts parsing the output file written by rdfind.

       There are lots of enhancements left to do. Please contribute!

SECURITY CONSIDERATIONS
       Avoid manipulating the directories while rdfind is reading them.
       rdfind is quite brittle in that case. In particular, when deleting
       or making links, rdfind can be subject to a symlink attack. Use with
       care!

AUTHOR
       Paul Dreik 2006, reachable at rdfind@pauldreik.se. Rdfind can be
       found at https://rdfind.pauldreik.se/

       Do you find rdfind useful? Drop me a line! It is always fun to hear
       from people who actually use it and what data collections they run
       it on.

THANKS
       Several people have helped with suggestions and improvements: Niels
       Möller, Carl Payne and Salvatore Ansani. Thanks also to you who
       tested the program and sent me feedback.

VERSION
       1.3.5 (release date 2017-01-04)

COPYRIGHT
       This program is distributed under GPLv2 or later, at your option.

SEE ALSO
       md5sum(1), sha1sum(1), find(1), symlinks(2)

Jan 2017                               1.3.5                           rdfind(1)