rdfind(1)                           rdfind                           rdfind(1)

NAME

       rdfind - finds duplicate files

SYNOPSIS

       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION

       rdfind finds duplicate files across and/or within several directories.
       It calculates checksums only if necessary. rdfind runs in O(N log(N))
       time, with N being the number of files.

       If two (or more) equal files are found, the program decides which of
       them is the original; the rest are considered duplicates. This is
       done by ranking the files against each other and deciding which has
       the highest rank. See section RANKING for details.

       By default, no action is taken besides creating a file listing the
       detected files and showing the amount of space that could be saved.

       If you need better control over the ranking than given, you can use
       a preprocessor which sorts the file names in the desired order and
       then run the program using xargs. See EXAMPLES below for how to use
       find and xargs in conjunction with rdfind.

       To include files or directories whose names start with -, write them
       as ./-name (for example, rdfind ./-name) so that they are not
       mistaken for options.
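
       The ./ prefix rule can be sketched as a small shell helper. The name
       dash_safe is hypothetical and not part of rdfind; it only shows how
       a script might rewrite arguments before passing them on:

```shell
# dash_safe is a hypothetical helper, not part of rdfind.
# It prints each argument on its own line, prefixing ./ to names that
# begin with "-" so they are read as paths rather than options.
dash_safe() {
    for name in "$@"; do
        case "$name" in
            -*) printf '%s\n' "./$name" ;;
            *)  printf '%s\n' "$name" ;;
        esac
    done
}

# A directory literally named "-old" is rewritten to ./-old;
# "photos" is left unchanged.
dash_safe -old photos
```

       With rdfind itself this corresponds to running rdfind ./-old photos.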

RANKING

       Given two or more equal files, the one with the highest rank is
       selected to be the original and the rest are duplicates. The ranking
       rules are given below; they are applied in order until an original
       has been found. Given two files A and B which have equal size and
       content, the ranking is as follows:

       If A was found while scanning an input argument earlier than B, A
       is higher ranked.

       If A was found at a directory depth lower than B, A is higher ranked
       (A is closer to the root).

       If A and B are found during scanning of the same input argument and
       share the same directory depth, the one that ranks highest depends
       on whether deterministic operation is enabled (this is on by
       default, see option -deterministic). If enabled, which one ranks
       highest is unspecified but deterministic. If disabled, the one that
       was reported first from the file system is highest ranked.
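
       The first two rules behave like a two-key sort. The sketch below
       assumes each candidate is described by a line of the made-up form
       "argument-index depth path"; rdfind does not produce this format,
       and pick_original is a hypothetical name. The final tie-break by
       path is only a stand-in for the unspecified deterministic order:

```shell
# pick_original is a hypothetical helper illustrating the ranking rules.
# Input lines: "<argument index> <directory depth> <path>" (assumed format).
# A lower argument index wins; ties are broken by lower depth. The last
# sort key (path) is an assumption standing in for the unspecified order.
pick_original() {
    LC_ALL=C sort -k1,1n -k2,2n -k3 | head -n 1 | cut -d' ' -f3-
}

# Three equal files: one from the second argument, two from the first.
printf '%s\n' \
    '2 0 /mnt/backup/report.txt' \
    '1 2 /home/user/docs/old/report.txt' \
    '1 1 /home/user/docs/report.txt' | pick_original
```

       Here the file from the first argument at the lower depth,
       /home/user/docs/report.txt, is printed as the original.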

OPTIONS

       Searching options etc:

       -ignoreempty true|false
              Ignore empty files. Setting this to true (the default) is
              equivalent to -minsize 1; false is equivalent to -minsize 0.

       -minsize N
              Ignores files smaller than N bytes. Default is 1, meaning
              empty files are ignored.

       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Removes items found which have identical inode and device ID.
              Default is true.

       -checksum md5|sha1|sha256
              What type of checksum to use: md5, sha1 or sha256. The
              default is sha1 since version 1.4.0.

       -deterministic true|false
              If true (the default), sort files of equal rank in an
              unspecified but deterministic order. This makes the behaviour
              independent of the order in which files are listed when
              querying the file system.

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links. Default is false.

       -makehardlinks true|false
              Replace duplicate files with hard links. Default is false.

       -makeresultsfile true|false
              Make a results file in the current directory. Default is true.
              If the file exists, it is overwritten.

       -outputname name
              Name the results file "name" instead of the default
              results.txt.

       -deleteduplicates true|false
              Delete (unlink) files. Default is false.

       General options:

       -sleep Xms
              Sleeps X milliseconds between reading each file, to reduce
              load. Default is 0 (no sleep). Note that only a few values are
              supported at present: 0, 1-5, 10, 25, 50 and 100 milliseconds.

       -n, -dryrun true|false
              Displays what would have been done, without actually deleting
              or linking anything. Default is false.

       -h, -help, --help
              Displays a brief help message.

       -v, -version, --version
              Displays the version number.

EXAMPLES

       Search for duplicate files in the home directory and a backup
       directory:
              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 |xargs -0 rdfind
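
       Because earlier input arguments rank higher, sorting the list before
       it reaches xargs gives some control over which file is kept as the
       original. The following is a minimal sketch: ordered_args and
       scan_sorted are hypothetical helper names, plain byte-order sorting
       is just one possible ordering, and sort -z is a GNU extension that
       is assumed to be available:

```shell
# ordered_args is a hypothetical helper: it emits the directories named
# "foo" below the given root as a NUL-separated list in sorted order, so
# that earlier names become earlier (higher ranked) input arguments.
# sort -z is a GNU extension and is assumed to be available.
ordered_args() {
    find "$1" -type d -name foo -print0 | LC_ALL=C sort -z
}

# scan_sorted is a hypothetical wrapper that passes the ordered list to
# rdfind in dry-run mode, so nothing is deleted or linked.
scan_sorted() {
    ordered_args "$1" | xargs -0 rdfind -dryrun true
}
```

       Calling scan_sorted . would then run rdfind on the sorted list
       without deleting or linking anything.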

FILES

       results.txt (the default name; it can be changed with option
       -outputname, see above). The results file contains one row per
       duplicate file found, along with a header row explaining the
       columns. A label on each row describes why the file is considered a
       duplicate:

       DUPTYPE_UNKNOWN some internal error

       DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the
       original.

       DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when
       processing the directory in the same input argument as the original)

       DUPTYPE_OUTSIDE_TREE the file was found while processing a different
       input argument than the original.
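
       Scripts that parse the results file can filter on these labels. The
       sketch below is not part of rdfind: list_duplicates is a
       hypothetical helper, and it assumes that header lines start with #,
       that the label is the first column, and that the file name is the
       last column after seven others. Check these assumptions against the
       header row of your own results file:

```shell
# list_duplicates is a hypothetical helper. It assumes that header lines
# start with "#", that the first column holds the DUPTYPE_* label, and that
# the file name is the last column, preceded by seven other columns.
list_duplicates() {
    awk '
        /^#/ || NF == 0 { next }                    # skip header and blank lines
        $1 == "DUPTYPE_FIRST_OCCURRENCE" { next }   # skip originals
        {
            line = $0
            # Drop the first seven space-separated columns, keep the name.
            for (i = 0; i < 7; i++) sub(/^[^ ]+ +/, "", line)
            print line
        }
    ' "$1"
}
```

       Running list_duplicates results.txt would then print only the names
       of files that are not considered originals, including names that
       contain spaces.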

ENVIRONMENT

DIAGNOSTICS

EXIT VALUES

       0 on success, nonzero otherwise.

BUGS/FEATURES

       When the same directory is specified twice, rdfind keeps the first
       encountered as the most important (original), and the rest as
       duplicates. This might not be what you want.

       The -makesymlinks option creates absolute links. This might not be
       what you want. To create relative links instead, you may use the
       symlinks (2) command, which is able to convert absolute links to
       relative links.

       Older versions unfortunately contained a misspelling of the word
       occurrence. This is now corrected (since 1.3), which might affect
       user scripts parsing the output file written by rdfind.

SECURITY CONSIDERATIONS

       Avoid manipulating the directories while rdfind is reading them.
       rdfind is quite brittle in that case. In particular, when deleting
       or making links, rdfind can be subject to a symlink attack. Use with
       care!

AUTHOR

       Paul Dreik 2006-2018, reachable at rdfind@pauldreik.se. Rdfind can be
       found at https://rdfind.pauldreik.se/

       Do you find rdfind useful? Drop me a line! It is always fun to hear
       from people who actually use it and what data collections they run
       it on.

THANKS

       Several people have helped with suggestions and improvements: Niels
       Möller, Carl Payne and Salvatore Ansani. Thanks also to you who
       tested the program and sent me feedback.

VERSION

       1.4.1 (release date 2018-11-12)

       This program is distributed under GPLv2 or later, at your option.

SEE ALSO

       md5sum(1), sha1sum(1), find(1), symlinks(2)

Nov 2018                             1.4.1                           rdfind(1)