rdfind(1)                           rdfind                           rdfind(1)

NAME

       rdfind - finds duplicate files

SYNOPSIS

       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION

       rdfind finds duplicate files across and/or within several directories.
       It calculates checksums only when necessary. rdfind runs in O(N log(N))
       time, where N is the number of files.

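       The idea of calculating checksums only when necessary can be
       illustrated with a minimal Python sketch. It assumes a simple
       strategy (group files by size first, then hash only the files whose
       size collides) and is not necessarily how rdfind is implemented. The
       function name find_duplicates is hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(paths, algorithm="md5"):
    """Group paths with identical content, hashing only files whose size collides.

    Illustrative sketch only; the strategy is an assumption, not rdfind's code.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate: no checksum needed
        by_digest = defaultdict(list)
        for path in same_size:
            digest = hashlib.new(algorithm)  # "md5" or "sha1", as in -checksum
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

       Files with a unique size are never opened, so the checksum cost is
       paid only for candidates that could actually be duplicates.
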
       If two (or more) equal files are found, the program decides which of
       them is the original; the rest are considered duplicates. This is
       done by ranking the files against each other and selecting the one
       with the highest rank. See section RANKING for details.

       If you need finer control over the ranking, you can use a
       preprocessor that sorts the file names in the desired order and then
       run the program using xargs. See EXAMPLES below for how to use find
       and xargs in conjunction with rdfind.

       To include files or directories whose names start with -, prefix
       them with ./ (for example, rdfind ./-) so that they are not mistaken
       for options.

RANKING

       Given two or more equal files, the one with the highest rank is
       selected as the original and the rest are duplicates. The ranking
       rules below are applied in order until an original has been found.
       Given two files A and B with equal content, the ranking is as
       follows:

       If A was found while scanning an input argument earlier than B, A
       is higher ranked.

       If A was found at a lower depth than B (A is closer to the root), A
       is higher ranked.

       If A was found earlier than B, A is higher ranked.

       The last rule is needed when two files are found in the same
       directory (obviously not given in separate arguments, otherwise the
       first rule applies). It gives the same order between the files as
       the operating system delivers them while listing the directory.
       This is operating system specific behaviour.

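       The three ranking rules can be read as a lexicographic sort key. The
       following Python sketch shows that reading; the Found record and its
       field names are hypothetical and only model the information the
       rules refer to.

```python
from typing import NamedTuple


class Found(NamedTuple):
    """Hypothetical record of one scanned file (not an rdfind data structure)."""
    arg_index: int  # position of the input argument under which the file was found
    depth: int      # directory depth below that argument (0 = closest to the root)
    order: int      # sequence number in which the scan encountered the file
    path: str


def rank_key(found):
    # Rule 1: earlier input argument; rule 2: lower depth; rule 3: found earlier.
    return (found.arg_index, found.depth, found.order)


def pick_original(equal_files):
    """Return (original, duplicates) for a group of files with equal content."""
    ranked = sorted(equal_files, key=rank_key)
    return ranked[0], ranked[1:]


group = [
    Found(arg_index=1, depth=0, order=7, path="/mnt/backup/a.txt"),
    Found(arg_index=0, depth=2, order=5, path="home/docs/old/a.txt"),
    Found(arg_index=0, depth=1, order=9, path="home/docs/a.txt"),
]
original, duplicates = pick_original(group)
```

       Here the file under the first argument at the lower depth is chosen
       as the original, even though another copy was encountered earlier
       during the scan.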

OPTIONS

       Searching options etc:

       -ignoreempty true|false
              Ignore empty files. Default is true.

       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Removes found items that have identical inode and device ID.
              Default is true.

       -checksum md5|sha1
              The type of checksum to use: md5 or sha1. Default is md5.

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links.

       -makehardlinks true|false
              Replace duplicate files with hard links.

       -makeresultsfile true|false
              Make a results file results.txt (default) in the current
              directory.

       -outputname name
              Name the results file "name" instead of the default
              results.txt.

       -deleteduplicates true|false
              Delete (unlink) duplicate files.

       General options:

       -sleep Xms
              Sleeps X milliseconds between reading each file, to reduce
              load. Default is 0 (no sleep). Note that only a few values are
              supported at present: 0, 1-5, 10, 25, 50 and 100 milliseconds.

       -n, -dryrun true|false
              Displays what would have been done without actually deleting
              or linking anything. Default is false.

       -h, -help, --help
              Displays a brief help message.

       -v, -version, --version
              Displays the version number.

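       As an illustration of how the options combine, the Python sketch
       below assembles an argument list for a dry run that writes a results
       file. It only builds the list and does not execute anything. The
       helper name build_dryrun_command is hypothetical; the option names
       are the ones documented above.

```python
def build_dryrun_command(paths, results_name="results.txt"):
    """Build an rdfind argument list for a dry run (nothing is deleted or linked)."""
    argv = [
        "rdfind",
        "-dryrun", "true",
        "-makeresultsfile", "true",
        "-outputname", results_name,
    ]
    for path in paths:
        # Per DESCRIPTION: names starting with "-" are prefixed with "./"
        # so that they are not mistaken for options.
        argv.append("./" + path if path.startswith("-") else path)
    return argv


command = build_dryrun_command(["/mnt/backup", "-odd-name"], results_name="dups.txt")
```

       The resulting list can be passed to a process launcher such as
       subprocess.run once the dry-run output has been reviewed.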

EXAMPLES

       Search for duplicate files in home directory and a backup directory:
              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 | xargs -0 rdfind

FILES

       results.txt (the default name; it can be changed with the
       -outputname option, see above). The results file contains one row
       per duplicate file found, along with a header row explaining the
       columns. A text field describes why the file is considered a
       duplicate:

       DUPTYPE_UNKNOWN some internal error

       DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the
       original.

       DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when
       processing the directory in the same input argument as the
       original)

       DUPTYPE_OUTSIDE_TREE the file was found while processing a
       different input argument than the original.

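       A script that reads the results file might look like the Python
       sketch below. The column layout is an assumption not stated on this
       page: header lines starting with #, eight whitespace-separated
       columns, the duplicate type first and the file name last. Check the
       header row of your own results file before relying on it. The sample
       text is made up for illustration.

```python
from collections import Counter

# Made-up sample in the ASSUMED layout; verify against a real header row.
SAMPLE = """\
# duptype id depth size device inode priority name
DUPTYPE_FIRST_OCCURRENCE 1 0 12 2049 1001 1 ./a.txt
DUPTYPE_WITHIN_SAME_TREE -1 1 12 2049 1002 1 ./sub/copy of a.txt
DUPTYPE_OUTSIDE_TREE -1 0 12 2049 1003 2 /mnt/backup/a.txt
"""


def parse_results(text):
    """Return (duptype, name) pairs, skipping header and comment lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split into at most 8 fields so spaces inside the name survive.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # not in the assumed layout
        rows.append((fields[0], fields[7]))
    return rows


rows = parse_results(SAMPLE)
counts = Counter(duptype for duptype, _ in rows)
```

       Limiting the split to eight fields keeps file names that contain
       spaces intact.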

ENVIRONMENT

DIAGNOSTICS

EXIT VALUES

       0 on success, nonzero otherwise.

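       A calling script can therefore treat any nonzero status as failure.
       A minimal Python sketch, with the hypothetical helper name run_ok:

```python
import subprocess
import sys


def run_ok(argv):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0


# Demonstrated with the Python interpreter itself so the sketch runs anywhere;
# for rdfind one would pass e.g. ["rdfind", "-dryrun", "true", "/mnt/backup"].
succeeded = run_ok([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_ok([sys.executable, "-c", "raise SystemExit(3)"])
```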

BUGS/FEATURES

       When the same directory is specified twice, rdfind keeps the first
       encountered copy of each file as the most important (original) and
       treats the rest as duplicates. This might not be what you want.

       The -makesymlinks option creates absolute links. This might not be
       what you want. To create relative links instead, you may use the
       symlinks(2) command, which is able to convert absolute links to
       relative links.

       Older versions unfortunately contained a misspelling of the word
       occurrence. This is now corrected (since 1.3), which might affect
       user scripts parsing the output file written by rdfind.

       There are lots of enhancements left to do. Please contribute!

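       Regarding the absolute links noted above: where the symlinks command
       is not available, a single link can be converted with a few lines of
       Python. This is a sketch under the assumption that the directory
       holding the link is not itself reached through another symlink (the
       relative path is computed textually). The function name
       make_relative is hypothetical.

```python
import os


def make_relative(link_path):
    """Rewrite an absolute symlink as a relative one; return the new target."""
    target = os.readlink(link_path)
    if not os.path.isabs(target):
        return target  # already relative, nothing to do
    link_dir = os.path.dirname(os.path.abspath(link_path))
    relative = os.path.relpath(target, start=link_dir)
    # Create the new link under a temporary name, then rename it over the
    # old link so the path never disappears in between.
    temporary = link_path + ".tmp"
    os.symlink(relative, temporary)
    os.replace(temporary, link_path)
    return relative
```

       For a link /data/sub/dup.txt pointing to /data/orig.txt, the target
       becomes ../orig.txt.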

SECURITY CONSIDERATIONS

       Avoid manipulating the directories while rdfind is reading them;
       rdfind is quite brittle in that case. In particular, when deleting
       files or making links, rdfind can be subject to a symlink attack.
       Use with care!

AUTHOR

       Paul Dreik 2006, reachable at rdfind@pauldreik.se. Rdfind can be
       found at https://rdfind.pauldreik.se/

       Do you find rdfind useful? Drop me a line! It is always fun to hear
       from people who actually use it and what data collections they run
       it on.

THANKS

       Several people have helped with suggestions and improvements: Niels
       Möller, Carl Payne and Salvatore Ansani. Thanks also to you who
       tested the program and sent me feedback.

VERSION

       1.3.5 (release date 2017-01-04)

       This program is distributed under GPLv2 or later, at your option.

SEE ALSO

       md5sum(1), sha1sum(1), find(1), symlinks(2)

Jan 2017                             1.3.5                           rdfind(1)