rdfind(1)                           rdfind                           rdfind(1)

NAME

       rdfind - finds duplicate files

SYNOPSIS

       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION

       rdfind finds duplicate files across and/or within several directories.
       It calculates checksums only when necessary. rdfind runs in
       O(N log N) time, where N is the number of files.

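The idea of checksumming only when necessary can be sketched with standard tools: files are first grouped by size, and only files that share their size with another file are hashed at all. This is an illustrative pipeline, not rdfind's implementation; the function name dup_candidates is hypothetical, and it assumes GNU findutils and coreutils.

```shell
# Hypothetical helper: print groups of byte-identical, non-empty regular
# files under directory "$1". Sizes are compared first; only files that
# share their size with another file are checksummed.
# Assumes GNU find (-printf), xargs (-r) and uniq (-w, --all-repeated);
# unusual file names (containing newlines or backslashes) are not supported.
dup_candidates() {
    find "$1" -type f -size +0c -printf '%s %p\n' |
        awk '
            {
                size = $1
                path = substr($0, length(size) + 2)   # text after "SIZE "
                count[size]++
                files[size] = files[size] path "\n"
            }
            END {
                # Emit only files whose size is not unique.
                for (s in count) if (count[s] > 1) printf "%s", files[s]
            }
        ' |
        tr '\n' '\0' |
        xargs -0 -r sha1sum |
        sort |
        uniq -w 40 --all-repeated=separate   # 40 = length of a SHA-1 hex digest
}
```

For example, dup_candidates ~/Documents prints each set of identical files as a block of sha1sum lines, with a blank line between sets. A file with a unique size is never read.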
       If two (or more) equal files are found, the program decides which of
       them is the original; the rest are considered duplicates. This is
       done by ranking the files against each other and selecting the one
       with the highest rank. See section RANKING for details.

       By default, no action is taken besides creating a file listing the
       detected files and showing the amount of space that could be saved.

       If you need finer control over the ranking than the built-in rules
       give, you can use a preprocessor that sorts the file names in the
       desired order and then run the program using xargs. See EXAMPLES
       below for how to use find and xargs in conjunction with rdfind.

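As an illustration of such a preprocessor, the hypothetical function below orders directories so that the most recently modified one comes first. Because files found under an earlier input argument rank higher, files in that directory would then be kept as originals. The function name newest_first is an assumption for this sketch, and it relies on GNU find, sort and sed.

```shell
# Hypothetical preprocessor: print the given directories NUL-separated,
# most recently modified first. Assumes GNU find (-printf), sort (-z)
# and sed (-z).
newest_first() {
    find "$@" -maxdepth 0 -type d -printf '%T@ %p\0' |
        sort -z -r -n |
        sed -z 's/^[^ ]* //'    # drop the timestamp, keep the path
}

# Example use (a dry run, so nothing is changed):
#   newest_first /mnt/backup1 /mnt/backup2 | xargs -0 rdfind -dryrun true
```

NUL separators keep directory names with spaces intact all the way through xargs -0.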
       To include files or directories whose names start with -, prefix them
       with ./ (for example, rdfind ./-name) so that they are not confused
       with options.

RANKING

       Given two or more equal files, the one with the highest rank is
       selected as the original and the rest are duplicates. The ranking
       rules are given below; they are applied in order until an original
       has been found. Given two files A and B which have equal size and
       content, the ranking is as follows:

       If A was found while scanning an input argument earlier than B, A is
       higher ranked.

       If A was found at a lower directory depth than B, A is higher ranked
       (A is closer to the root).

       If A and B are found while scanning the same input argument and share
       the same directory depth, which one ranks highest depends on whether
       deterministic operation is enabled (it is by default; see option
       -deterministic). If enabled, which one ranks highest is unspecified
       but deterministic. If disabled, the one that was reported first by
       the file system is ranked highest.
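The rules can be mimicked with sort(1) on a small table describing one set of equal files. This is an illustration only: the input format ARGINDEX DEPTH PATH and the function name pick_original are hypothetical, and breaking the final tie by path name is an assumption standing in for rdfind's unspecified deterministic order.

```shell
# Illustration of the ranking rules, not rdfind's code. Each input line
# describes one file from a set of equal files as "ARGINDEX DEPTH PATH",
# where ARGINDEX is the position of the input argument the file was found
# under and DEPTH its directory depth. Lower ARGINDEX ranks higher, then
# lower DEPTH. Breaking the remaining ties by PATH is an assumption that
# stands in for rdfind's unspecified (but deterministic) order.
pick_original() {
    LC_ALL=C sort -k1,1n -k2,2n -k3 | head -n 1 | cut -d ' ' -f 3-
}
```

For example, given the lines "2 0 /b/x", "1 3 /a/deep/y" and "1 1 /a/z", the first rule rules out /b/x and the second picks /a/z as the original.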

OPTIONS

       Searching options:

       -ignoreempty true|false
              Ignore empty files. Setting this to true (the default) is
              equivalent to -minsize 1; false is equivalent to -minsize 0.

       -minsize N
              Ignore files with fewer than N bytes. The default is 1, meaning
              empty files are ignored.

       -maxsize N
              Ignore files with N bytes or more. The default is 0, which
              disables this check.

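How -minsize and -maxsize combine can be written as a single test, based only on the option descriptions above. The function size_considered is hypothetical and not part of rdfind.

```shell
# Hypothetical helper mirroring the -minsize/-maxsize descriptions above:
# succeed if a file of SIZE bytes would be considered, given MIN
# (default 1) and MAX (default 0, meaning no upper limit).
size_considered() {
    size=$1 min=${2:-1} max=${3:-0}
    [ "$size" -ge "$min" ] || return 1
    [ "$max" -eq 0 ] || [ "$size" -lt "$max" ]
}
```

With the defaults, size_considered 0 fails (empty files are ignored); size_considered 0 0 succeeds, which corresponds to -ignoreempty false. Note that -maxsize is exclusive: size_considered 11 1 11 fails.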
       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Remove items that have identical inode and device ID from the
              list of found files. Default is true.

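Two paths have identical inode and device ID when they are hard links to the same file. The hypothetical helper below checks that condition with GNU stat; it illustrates the idea and is not how rdfind itself is implemented.

```shell
# Hypothetical helper: succeed if two paths are the same file, i.e. they
# have the same device ID and inode number (as hard links do). This is
# the condition -removeidentinode refers to. Assumes GNU stat (-c).
same_inode() {
    [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}
```

A copy made with cp has equal content but a different inode, so same_inode fails for it while it succeeds for a hard link created with ln.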
       -checksum md5|sha1|sha256|sha512
              The type of checksum to use: md5, sha1, sha256 or sha512. The
              default is sha1 since version 1.4.0.

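The four checksum names correspond to the GNU coreutils tools of the same family (two of which are listed under SEE ALSO). The hypothetical helper below maps a -checksum name to its tool; it is a convenience sketch for inspecting files by hand, not part of rdfind.

```shell
# Hypothetical helper: print the digest of FILE for one of the checksum
# names accepted by -checksum, using the matching GNU coreutils tool
# (md5sum, sha1sum, sha256sum, sha512sum).
digest() {
    case $1 in
        md5|sha1|sha256|sha512) "${1}sum" -- "$2" | cut -d ' ' -f 1 ;;
        *) echo "unsupported checksum: $1" >&2; return 2 ;;
    esac
}
```

For a file containing exactly the three bytes abc, digest sha1 FILE prints a9993e364706816aba3e25717850c26c9cd0d89d.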
       -deterministic true|false
              If set (the default), files of equal rank are sorted in an
              unspecified but deterministic order. This makes the behaviour
              independent of the order in which the file system lists files.

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links. Default is false.

       -makehardlinks true|false
              Replace duplicate files with hard links. Default is false.

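The effect of the two link options on a single duplicate can be reproduced with ln. The helpers below are hypothetical and only illustrate the end result; rdfind's own replacement procedure may differ (for example in how it handles errors).

```shell
# Illustration of the effect of the link options on one duplicate DUP
# whose original is ORIG; not rdfind's code. Hypothetical helpers.
# Hard link: DUP becomes another name for ORIG's inode. Both paths must
# be on the same file system; ln -f removes DUP before linking.
replace_with_hardlink() {
    ln -f -- "$1" "$2"
}

# Symbolic link: DUP becomes a link to ORIG's absolute path, matching the
# absolute links rdfind creates (see BUGS/FEATURES). Assumes GNU realpath.
replace_with_symlink() {
    ln -sf -- "$(realpath -- "$1")" "$2"
}
```

Usage: replace_with_hardlink ORIG DUP. After a hard link, deleting ORIG leaves the data reachable through DUP; after a symbolic link, it does not.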
       -makeresultsfile true|false
              Make a results file in the current directory. Default is true.
              If the file exists, it is overwritten. This does not affect
              whether items are deleted; see -dryrun for how to disable
              deletions.

       -outputname name
              Use "name" as the results file name instead of the default
              results.txt.

       -deleteduplicates true|false
              Delete (unlink) duplicate files. Default is false.

       General options:

       -sleep Xms
              Sleep X milliseconds between reading each file, to reduce load.
              Default is 0 (no sleep). Note that only a few values are
              supported at present: 0, 1-5, 10, 25, 50 and 100 milliseconds.

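Since only the listed values are supported, a wrapper script may want to check a -sleep argument before calling rdfind. The function valid_sleep is hypothetical; it assumes the value is written as in the option synopsis, for example 25ms.

```shell
# Hypothetical wrapper check: accept only the -sleep values listed above
# (written as Xms, as in the option synopsis).
valid_sleep() {
    case $1 in
        0ms|[1-5]ms|10ms|25ms|50ms|100ms) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   s=25ms; valid_sleep "$s" && rdfind -sleep "$s" /mnt/backup
```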
       -n, -dryrun true|false
              Display what would have been done, without actually deleting or
              linking anything. Default is false.

       -h, -help, --help
              Display a brief help message.

       -v, -version, --version
              Display the version number.

EXAMPLES

       Search for duplicate files in the home directory and a backup
       directory:
              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 | xargs -0 rdfind

FILES

       results.txt
              The results file (default name results.txt; it can be changed
              with the -outputname option, see above) contains one row per
              file in each set of duplicates, along with a header row
              explaining the columns. Each row includes a text describing why
              the file is considered a duplicate:

       DUPTYPE_UNKNOWN some internal error

       DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the
       original.

       DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing
       the directory in the same input argument as the original).

       DUPTYPE_OUTSIDE_TREE the file was found while processing an input
       argument other than the one containing the original.
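A script that post-processes the results file usually wants only the rows marked as duplicates. The hypothetical helper below filters them with awk. It assumes that header lines start with # and that the DUPTYPE text is the first whitespace-separated field of each row; check those assumptions against an actual results file before relying on it.

```shell
# Hypothetical helper: print the rows of a results file (default
# results.txt) that describe duplicates rather than originals.
# Assumes header lines start with '#' and that the DUPTYPE text is the
# first whitespace-separated field of each row.
list_duplicates() {
    awk '!/^#/ && ($1 == "DUPTYPE_WITHIN_SAME_TREE" || $1 == "DUPTYPE_OUTSIDE_TREE")' \
        "${1:-results.txt}"
}
```

Matching on the exact DUPTYPE strings means such a script depends on their spelling (see the note on "occurrence" under BUGS/FEATURES).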

ENVIRONMENT

DIAGNOSTICS

EXIT VALUES

       0 on success, nonzero otherwise.

BUGS/FEATURES

       When the same directory is specified twice, rdfind keeps the first
       encountered as the most important (original) and the rest as
       duplicates. This might not be what you want.

       rdfind creates absolute symbolic links. This might not be what you
       want. To create relative links instead, you may use the symlinks
       utility, which can convert absolute links to relative links.

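With GNU coreutils, a single absolute link can also be rewritten by hand. The hypothetical function make_relative below is a sketch of that, offered as an alternative to the symlinks utility for one link at a time. Note that readlink -f resolves the whole chain of links, so the new link points directly at the final target.

```shell
# Hypothetical helper: rewrite one absolute symlink LINK as a relative
# one pointing at the same final target. Assumes GNU coreutils
# (readlink -f, realpath --relative-to, ln -n).
make_relative() {
    link=$1
    target=$(readlink -f -- "$link") || return
    dir=$(dirname -- "$link")
    # Compute the target's path relative to the link's own directory.
    ln -sfn -- "$(realpath --relative-to="$dir" -- "$target")" "$link"
}
```

For example, a link y/l pointing at the absolute path of x/f becomes a link to ../x/f, which keeps working if the whole tree is moved.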
       Older versions unfortunately contained a misspelling of the word
       occurrence. This was corrected in version 1.3, which might affect user
       scripts that parse the output file written by rdfind.

SECURITY CONSIDERATIONS

       Avoid manipulating the directories while rdfind is reading them.
       rdfind is quite brittle in that case. In particular, when deleting or
       making links, rdfind can be subject to a symlink attack. Use with
       care!

AUTHOR

       Paul Dreik 2006-2018, reachable at rdfind@pauldreik.se. Rdfind can be
       found at https://rdfind.pauldreik.se/

       Do you find rdfind useful? Drop me a line! It is always fun to hear
       from people who actually use it and what data collections they run it
       on.

THANKS

       Several people have helped with suggestions and improvements: Niels
       Möller, Carl Payne and Salvatore Ansani. Thanks also to you who tested
       the program and sent me feedback.

VERSION

       1.6.0 (release date 2023-06-17)

       This program is distributed under GPLv2 or later, at your option.

SEE ALSO

       md5sum, sha1sum, find, symlinks
Jun 2023                             1.6.0                           rdfind(1)