JDUPES(1)                   General Commands Manual                  JDUPES(1)



NAME
       jdupes - finds and performs actions upon duplicate files

SYNOPSIS
       jdupes [ options ] DIRECTORIES ...

DESCRIPTION
       Searches the given path(s) for duplicate files. Such files are found
       by comparing file sizes, then partial and full file hashes, followed
       by a byte-by-byte comparison. The default behavior with no other
       "action options" specified (delete, summarize, link, dedupe, etc.) is
       to print sets of matching files.
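
       For example, a plain recursive scan of a single tree (the directory
       name photos/ here is only illustrative) prints each group of
       duplicate files it finds:

              jdupes -r photos/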

OPTIONS
       -@ --loud
              output annoying low-level debug info while running

       -0 --print-null
              when printing matches, use null bytes instead of CR/LF bytes,
              just like 'find -print0' does. This has no effect with any
              action mode other than the default "print matches" (delete,
              link, etc. will still print normal line endings in the output).

       -1 --one-file-system
              do not match files that are on different filesystems or devices

       -A --no-hidden
              exclude hidden files from consideration

       -B --dedupe
              call the same-extents ioctl or clonefile() to trigger a
              filesystem-level data deduplication on disk (known as
              copy-on-write, CoW, cloning, or reflink); only a few
              filesystems support this (BTRFS; XFS when mkfs.xfs was used
              with -m crc=1,reflink=1; Apple APFS)
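
              As an illustrative sketch only (the mount point
              /mnt/btrfs/backups is hypothetical and the filesystem must
              support reflinks), a recursive dedupe pass might look like:

                     jdupes -r -B /mnt/btrfs/backups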

       -C --chunk-size=number-of-KiB
              set the I/O chunk size manually; larger values may improve
              performance on rotating media by reducing the number of head
              seeks required, but also increase memory usage and can reduce
              performance in some cases

       -D --debug
              if this feature is compiled in, show debugging statistics and
              info at the end of program execution

       -d --delete
              prompt user for files to preserve, deleting all others (see
              CAVEATS below)

       -e --error-on-dupe
              exit on any duplicate found with status code 255

       -f --omit-first
              omit the first file in each set of matches

       -H --hard-links
              normally, when two or more files point to the same disk area
              they are treated as non-duplicates; this option will change
              this behavior

       -h --help
              displays help

       -i --reverse
              reverse (invert) the sort order of matches

       -I --isolate
              isolate each command-line parameter from one another; only
              match if the files are under different parameter
              specifications

       -j --json
              produce JSON (machine-readable) output

       -L --link-hard
              replace all duplicate files with hardlinks to the first file
              in each set of duplicates

       -m --summarize
              summarize duplicate file information

       -M --print-summarize
              print matches and summarize the duplicate file information at
              the end

       -N --no-prompt
              when used together with --delete, preserve the first file in
              each set of duplicates and delete the others without prompting
              the user

       -O --param-order
              parameter order preservation is more important than the chosen
              sort; this is particularly useful with the -N option to ensure
              that automatic deletion behaves in a controllable way

       -o --order=WORD
              order files according to WORD:

              time   sort by modification time

              name   sort by filename (default)
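
              For instance, to order the files in each match set by
              modification time rather than by name (the directory name is
              illustrative):

                     jdupes -r -o time music/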

       -p --permissions
              don't consider files with different owner/group or permission
              bits as duplicates

       -P --print=type
              print extra information to stdout; valid options are:

              early  matches that pass early size/permission/link/etc. checks

              partial
                     files whose partial hashes match

              fullhash
                     files whose full hashes match
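
              For example, a sketch that also reports files whose full
              hashes matched during the scan (the path name is illustrative):

                     jdupes -r -P fullhash archive/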

       -Q --quick
              [WARNING: RISK OF DATA LOSS, SEE CAVEATS] skip byte-for-byte
              verification of duplicate pairs (use hashes only)

       -q --quiet
              hide progress indicator

       -R --recurse:
              for each directory given after this option, follow
              subdirectories encountered within (note the ':' at the end of
              the option; see the EXAMPLES section below for further
              explanation)

       -r --recurse
              for every directory given, follow subdirectories encountered
              within

       -l --link-soft
              replace all duplicate files with symlinks to the first file in
              each set of duplicates

       -S --size
              show size of duplicate files

       -s --symlinks
              follow symlinked directories

       -T --partial-only
              [WARNING: EXTREME RISK OF DATA LOSS, SEE CAVEATS] match based
              on the hash of the first block of file data, ignoring the rest

       -U --no-trav-check
              disable double-traversal safety check (BE VERY CAREFUL)

       -u --print-unique
              print only a list of unique (non-duplicate, unmatched) files

       -v --version
              display jdupes version and compilation feature flags

       -y --hash-db=file
              create/use a hash database text file to speed up future runs
              by caching file hash data
       -X --ext-filter=spec:info
              exclude/filter files based on specified criteria; general
              format:

              jdupes -X filter[:value][size_suffix]

              Some filters take no value or multiple values. Filters that
              can take a numeric option generally support the size
              multipliers K/M/G/T/P/E with or without an added iB or B.
              Multipliers are binary-style unless the B suffix is used,
              which will use decimal multipliers. For example, 16k or
              16kib = 16384; 16kb = 16000. Multipliers are case-insensitive.

              Filters have cumulative effects: jdupes -X size+:99 -X
              size-:101 will cause only files of exactly 100 bytes in size
              to be included (see the combined example at the end of this
              option's description).

              Extension matching is case-insensitive. Path substring
              matching is case-sensitive.

              Supported filters are:

              `size[+-=]:number[suffix]'
                     match only if size is greater than (+), less than (-),
                     or equal to (=) the specified number. The +/- and =
                     specifiers can be combined, i.e. "size+=:4K" will only
                     consider files with a size greater than or equal to
                     four kilobytes (4096 bytes).

              `noext:ext1[,ext2,...]'
                     exclude files with certain extension(s), specified as a
                     comma-separated list. Do not use a leading dot.

              `onlyext:ext1[,ext2,...]'
                     only include files with certain extension(s), specified
                     as a comma-separated list. Do not use a leading dot.

              `nostr:text_string'
                     exclude all paths containing the substring text_string.
                     This scans the full file path, so it can be used to
                     match directories: -X nostr:dir_name/

              `onlystr:text_string'
                     require all paths to contain the substring text_string.
                     This scans the full file path, so it can be used to
                     match directories: -X onlystr:dir_name/

              `newer:datetime'
                     only include files newer than the specified date.
                     Date/time format: "YYYY-MM-DD HH:MM:SS" (time is
                     optional).

              `older:datetime'
                     only include files older than the specified date.
                     Date/time format: "YYYY-MM-DD HH:MM:SS" (time is
                     optional).
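
              As a combined illustration (all names below are hypothetical),
              the following restricts a recursive scan to JPEG and PNG files
              larger than 1 MiB:

                     jdupes -r -X onlyext:jpg,jpeg,png -X size+:1M pictures/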

       -z --zero-match
              consider zero-length files to be duplicates; this replaces the
              old default behavior when -n was not specified

       -Z --soft-abort
              if the user aborts the program (as with CTRL-C), act on the
              matches that were found before the abort was received. For
              example, if -L and -Z are specified, all matches found prior
              to the abort will be hard linked. The default behavior without
              -Z is to abort without taking any actions.

NOTES
       A set of arrows is used in hard linking to show what action was taken
       on each link candidate. These arrows are as follows:

       ---->  This file was successfully hard linked to the first file in
              the duplicate chain

       -@@->  This file was successfully symlinked to the first file in the
              chain

       -##->  This file was successfully cloned from the first file in the
              chain

       -==->  This file was already a hard link to the first file in the
              chain

       -//->  Linking this file failed due to an error during the linking
              process

       Duplicate files are listed together in groups with each file
       displayed on a separate line. The groups are then separated from each
       other by blank lines.
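
       For example, default output for two groups of duplicates (the file
       names are purely illustrative) would look like:

              photos/img_001.jpg
              backup/img_001.jpg

              notes/todo.txt
              old/todo.txt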

EXAMPLES
       jdupes a --recurse: b
              will follow subdirectories under b, but not those under a.

       jdupes a --recurse b
              will follow subdirectories under both a and b.

       jdupes -O dir1 dir3 dir2
              will always place 'dir1' results first in any match set (where
              relevant)
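
       jdupes -r -m photos/ backup/
              (illustrative paths) will recurse into both directories and
              print only a summary of the duplicate file information rather
              than listing every match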

CAVEATS
       Using -1 or --one-file-system prevents matches that cross
       filesystems, but a more relaxed form of this option may be added that
       allows cross-matching for all filesystems that each parameter is
       present on.

       When using -d or --delete, care should be taken to ensure against
       accidental data loss.

       -Z or --soft-abort used to be --hardabort in jdupes prior to v1.5 and
       had the opposite behavior. Defaulting to taking action on abort is
       probably not what most users would expect. The decision to invert
       rather than reassign to a different option was made because this
       feature was still fairly new at the time of the change.

       The -O or --param-order option allows the user greater control over
       what appears in the first position of a match set, specifically for
       keeping the -N option from deleting all but one file in a set in a
       seemingly random way. All directories specified on the command line
       will be used as the sorting order of result sets first, followed by
       the sorting algorithm set by the -o or --order option. This means
       that the order of all match pairs for a single directory
       specification will retain the old sorting behavior even if this
       option is specified.
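
       As an illustrative sketch (the directory names are hypothetical), the
       following places files from originals/ first in each match set and
       then deletes the remaining duplicates in each set without prompting:

              jdupes -O -N -d originals/ clutter/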

       When used together with options -s or --symlinks, a user could
       accidentally preserve a symlink while deleting the file it points to.

       The -Q or --quick option only reads each file once, hashes it, and
       performs comparisons based solely on the hashes. There is a small but
       significant risk of a hash collision, which is the reason for the
       failsafe byte-for-byte comparison that this option explicitly
       bypasses. Do not use it on ANY data set for which any amount of data
       loss is unacceptable. This option is not included in the help text
       for the program due to its risky nature. You have been warned!

       The -T or --partial-only option produces results based on a hash of
       the first block of file data in each file, ignoring everything else
       in the file. Partial hash checks have always been an important
       exclusion step in the jdupes algorithm, usually hashing the first
       4096 bytes of data and allowing files that are different at the start
       to be rejected early. In certain scenarios it may be a useful
       heuristic for a user to see that a set of files has the same size and
       the same starting data, even if the remaining data does not match;
       one example of this would be comparing files with data blocks that
       are damaged or missing, such as an incomplete file transfer, or
       checking a data recovery against known-good copies to see what
       damaged data can be deleted in favor of restoring the known-good
       copy. This option is meant to be used with informational actions and
       can result in EXTREME DATA LOSS if used with options that delete
       files, create hard links, or perform other destructive actions on
       data based on the matching output. Because of the potential for
       massive data destruction, this option MUST BE SPECIFIED TWICE to take
       effect and will error out if it is only specified once.
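
       For instance, an informational-only sketch (the paths are
       hypothetical) that lists files whose size and first block match,
       without taking any destructive action, might be:

              jdupes -r -T -T recovered/ known_good/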

       Using the -C or --chunk-size option to override the I/O chunk size
       can increase performance on rotating storage media by reducing "head
       thrashing," that is, by reading larger amounts of data sequentially
       from each file. This tunable size can have bad side effects; the
       default size maximizes algorithmic performance without regard to the
       I/O characteristics of any given device and uses a modest amount of
       memory, but other values may greatly increase memory usage or incur a
       lot more system call overhead. Try several different values to see
       how they affect performance for your hardware and data set. This
       option does not affect match results in any way, so even if it slows
       down the file matching process it will not hurt anything.
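
       As a sketch only (the value and path are illustrative, and the units
       are the KiB described under -C above), a larger chunk size for a big
       archive on a spinning disk might be tried as:

              jdupes -r -C 4096 /mnt/archive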

       The -y or --hash-db feature creates and maintains a text file with a
       list of file paths, hashes, and other metadata that enables jdupes to
       "remember" file data across runs. Specifying a period '.' as the
       database file name will use a name of "jdupes_hashdb.txt" instead;
       this alias makes it easy to use the hash database feature without
       typing a descriptive name each time. THIS FEATURE IS CURRENTLY UNDER
       DEVELOPMENT AND HAS MANY QUIRKS. USE IT AT YOUR OWN RISK. In
       particular, one of the biggest problems with this feature is that it
       stores every path exactly as specified on the command line; if any
       paths are passed into jdupes on a subsequent run with a different
       prefix then they will not be recognized and they will be treated as
       totally different files. For example, running jdupes -y . foo/ is not
       the same as jdupes -y . ./foo nor the same as (from a sibling
       directory) jdupes -y ../foo. You must run jdupes from the same
       working directory and with the same path specifications to take
       advantage of the hash database feature. When used correctly, a fully
       populated hash database can reduce subsequent runs with hundreds of
       thousands of files that normally take a very long time down to the
       directory scanning time plus a couple of seconds. If the directory
       data is already in the OS disk cache, this can make subsequent runs
       with over 100K files finish in under one second.
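
       A minimal sketch of consistent use (the '.' database alias is
       described above; the directory name photos/ is illustrative) is to
       run the same command from the same working directory each time:

              jdupes -y . -r photos/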

REPORTING BUGS
       Send bug reports and feature requests to jody@jodybruchon.com, or for
       general information and help, visit www.jdupes.com

SUPPORTING DEVELOPMENT
       If you find this software useful, please consider financially
       supporting its development through the author's home page:

       https://www.jodybruchon.com/

AUTHOR
       jdupes is created and maintained by Jody Bruchon
       <jody@jodybruchon.com> and was forked from fdupes 1.51 by Adrian
       Lopez <adrian2@caribe.net>

LICENSE
       MIT License

       Copyright (c) 2015-2023 Jody Lee Bruchon <jody@jodybruchon.com>

       Permission is hereby granted, free of charge, to any person obtaining
       a copy of this software and associated documentation files (the
       "Software"), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to
       permit persons to whom the Software is furnished to do so, subject to
       the following conditions:

       The above copyright notice and this permission notice shall be
       included in all copies or substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
       EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
       MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
       NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
       BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
       ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
       CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
       SOFTWARE.


                                                                     JDUPES(1)