JDUPES(1)                   General Commands Manual                  JDUPES(1)



NAME
       jdupes - finds and performs actions upon duplicate files

SYNOPSIS
       jdupes [ options ] DIRECTORIES ...

DESCRIPTION
       Searches the given path(s) for duplicate files. Such files are found
       by comparing file sizes, then partial and full file hashes, followed
       by a byte-by-byte comparison. The default behavior with no other
       "action options" specified (delete, summarize, link, dedupe, etc.) is
       to print sets of matching files.
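
       For example, a plain recursive scan of a single tree (the directory
       name photos/ here is only illustrative) prints each group of
       duplicate files it finds:

              jdupes -r photos/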

OPTIONS
       -@ --loud
              output annoying low-level debug info while running

       -0 --print-null
              when printing matches, use null bytes instead of CR/LF bytes,
              just like 'find -print0' does. This has no effect with any
              action mode other than the default "print matches" (delete,
              link, etc. will still print normal line endings in the output).

       -1 --one-file-system
              do not match files that are on different filesystems or devices

       -A --no-hidden
              exclude hidden files from consideration

       -B --dedupe
              call the same-extents ioctl or clonefile() to trigger a
              filesystem-level data deduplication on disk (known as
              copy-on-write, CoW, cloning, or reflink); only a few
              filesystems support this (BTRFS; XFS when mkfs.xfs was used
              with -m crc=1,reflink=1; Apple APFS)
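
              As an illustrative sketch only (the mount point
              /mnt/btrfs/backups is hypothetical and the filesystem must
              support reflinks), a recursive dedupe pass might look like:

                     jdupes -r -B /mnt/btrfs/backups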

       -C --chunk-size=number-of-KiB
              set the I/O chunk size manually; larger values may improve
              performance on rotating media by reducing the number of head
              seeks required, but also increase memory usage and can reduce
              performance in some cases

       -D --debug
              if this feature is compiled in, show debugging statistics and
              info at the end of program execution

       -d --delete
              prompt user for files to preserve, deleting all others (see
              CAVEATS below)

       -e --error-on-dupe
              exit on any duplicate found with status code 255

       -f --omit-first
              omit the first file in each set of matches

       -H --hard-links
              normally, when two or more files point to the same disk area
              they are treated as non-duplicates; this option will change
              this behavior

       -h --help
              displays help

       -i --reverse
              reverse (invert) the sort order of matches

       -I --isolate
              isolate each command-line parameter from one another; only
              match if the files are under different parameter
              specifications

       -j --json
              produce JSON (machine-readable) output

       -L --link-hard
              replace all duplicate files with hardlinks to the first file
              in each set of duplicates

       -m --summarize
              summarize duplicate file information

       -M --print-summarize
              print matches and summarize the duplicate file information at
              the end

       -N --no-prompt
              when used together with --delete, preserve the first file in
              each set of duplicates and delete the others without prompting
              the user

       -O --param-order
              parameter order preservation is more important than the chosen
              sort; this is particularly useful with the -N option to ensure
              that automatic deletion behaves in a controllable way

       -o --order=WORD
              order files according to WORD:

              time   sort by modification time

              name   sort by filename (default)
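
              For instance, to order the files in each match set by
              modification time rather than by name (the directory name is
              illustrative):

                     jdupes -r -o time music/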

       -p --permissions
              don't consider files with different owner/group or permission
              bits as duplicates

       -P --print=type
              print extra information to stdout; valid options are:

              early  matches that pass early size/permission/link/etc. checks

              partial
                     files whose partial hashes match

              fullhash
                     files whose full hashes match
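
              For example, a sketch that also reports files whose full
              hashes matched during the scan (the path name is illustrative):

                     jdupes -r -P fullhash archive/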

       -Q --quick
              [WARNING: RISK OF DATA LOSS, SEE CAVEATS] skip byte-for-byte
              verification of duplicate pairs (use hashes only)

       -q --quiet
              hide progress indicator

       -R --recurse:
              for each directory given after this option, follow
              subdirectories encountered within (note the ':' at the end of
              the option; see the EXAMPLES section below for further
              explanation)

       -r --recurse
              for every directory given, follow subdirectories encountered
              within

       -l --link-soft
              replace all duplicate files with symlinks to the first file in
              each set of duplicates

       -S --size
              show size of duplicate files

       -s --symlinks
              follow symlinked directories

       -T --partial-only
              [WARNING: EXTREME RISK OF DATA LOSS, SEE CAVEATS] match based
              on the hash of the first block of file data, ignoring the rest

       -U --no-trav-check
              disable double-traversal safety check (BE VERY CAREFUL)

       -u --print-unique
              print only a list of unique (non-duplicate, unmatched) files

       -v --version
              display jdupes version and compilation feature flags

       -y --hash-db=file
              create/use a hash database text file to speed up future runs
              by caching file hash data
       -X --ext-filter=spec:info
              exclude/filter files based on specified criteria; general
              format:

              jdupes -X filter[:value][size_suffix]

              Some filters take no value or multiple values. Filters that
              can take a numeric option generally support the size
              multipliers K/M/G/T/P/E with or without an added iB or B.
              Multipliers are binary-style unless the B suffix is used,
              which will use decimal multipliers. For example, 16k or
              16kib = 16384; 16kb = 16000. Multipliers are case-insensitive.

              Filters have cumulative effects: jdupes -X size+:99 -X
              size-:101 will cause only files of exactly 100 bytes in size
              to be included (see the combined example at the end of this
              option's description).

              Extension matching is case-insensitive. Path substring
              matching is case-sensitive.

              Supported filters are:

              `size[+-=]:number[suffix]'
                     match only if size is greater than (+), less than (-),
                     or equal to (=) the specified number. The +/- and =
                     specifiers can be combined, i.e. "size+=:4K" will only
                     consider files with a size greater than or equal to
                     four kilobytes (4096 bytes).

              `noext:ext1[,ext2,...]'
                     exclude files with certain extension(s), specified as a
                     comma-separated list. Do not use a leading dot.

              `onlyext:ext1[,ext2,...]'
                     only include files with certain extension(s), specified
                     as a comma-separated list. Do not use a leading dot.

              `nostr:text_string'
                     exclude all paths containing the substring text_string.
                     This scans the full file path, so it can be used to
                     match directories: -X nostr:dir_name/

              `onlystr:text_string'
                     require all paths to contain the substring text_string.
                     This scans the full file path, so it can be used to
                     match directories: -X onlystr:dir_name/

              `newer:datetime'
                     only include files newer than the specified date.
                     Date/time format: "YYYY-MM-DD HH:MM:SS" (time is
                     optional).

              `older:datetime'
                     only include files older than the specified date.
                     Date/time format: "YYYY-MM-DD HH:MM:SS" (time is
                     optional).
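
              As a combined illustration (all names below are hypothetical),
              the following restricts a recursive scan to JPEG and PNG files
              larger than 1 MiB:

                     jdupes -r -X onlyext:jpg,jpeg,png -X size+:1M pictures/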

       -z --zero-match
              consider zero-length files to be duplicates; this replaces the
              old default behavior when -n was not specified

       -Z --soft-abort
              if the user aborts the program (as with CTRL-C), act on the
              matches that were found before the abort was received. For
              example, if -L and -Z are specified, all matches found prior
              to the abort will be hard linked. The default behavior without
              -Z is to abort without taking any actions.

NOTES
       A set of arrows is used in hard linking to show what action was taken
       on each link candidate. These arrows are as follows:

       ---->  This file was successfully hard linked to the first file in
              the duplicate chain

       -@@->  This file was successfully symlinked to the first file in the
              chain

       -##->  This file was successfully cloned from the first file in the
              chain

       -==->  This file was already a hard link to the first file in the
              chain

       -//->  Linking this file failed due to an error during the linking
              process

       Duplicate files are listed together in groups with each file
       displayed on a separate line. The groups are then separated from each
       other by blank lines.
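
       For example, default output for two groups of duplicates (the file
       names are purely illustrative) would look like:

              photos/img_001.jpg
              backup/img_001.jpg

              notes/todo.txt
              old/todo.txt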

EXAMPLES
       jdupes a --recurse: b
              will follow subdirectories under b, but not those under a.

       jdupes a --recurse b
              will follow subdirectories under both a and b.

       jdupes -O dir1 dir3 dir2
              will always place 'dir1' results first in any match set (where
              relevant)
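
       jdupes -r -m photos/ backup/
              (illustrative paths) will recurse into both directories and
              print only a summary of the duplicate file information rather
              than listing every match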

CAVEATS
       Using -1 or --one-file-system prevents matches that cross
       filesystems, but a more relaxed form of this option may be added that
       allows cross-matching for all filesystems that each parameter is
       present on.

       When using -d or --delete, care should be taken to ensure against
       accidental data loss.

       -Z or --soft-abort used to be --hardabort in jdupes prior to v1.5 and
       had the opposite behavior. Defaulting to taking action on abort is
       probably not what most users would expect. The decision to invert
       rather than reassign to a different option was made because this
       feature was still fairly new at the time of the change.

       The -O or --param-order option allows the user greater control over
       what appears in the first position of a match set, specifically for
       keeping the -N option from deleting all but one file in a set in a
       seemingly random way. All directories specified on the command line
       will be used as the sorting order of result sets first, followed by
       the sorting algorithm set by the -o or --order option. This means
       that the order of all match pairs for a single directory
       specification will retain the old sorting behavior even if this
       option is specified.
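
       As an illustrative sketch (the directory names are hypothetical), the
       following places files from originals/ first in each match set and
       then deletes the remaining duplicates in each set without prompting:

              jdupes -O -N -d originals/ clutter/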

       When used together with options -s or --symlinks, a user could
       accidentally preserve a symlink while deleting the file it points to.

       The -Q or --quick option only reads each file once, hashes it, and
       performs comparisons based solely on the hashes. There is a small but
       significant risk of a hash collision, which is the reason for the
       failsafe byte-for-byte comparison that this option explicitly
       bypasses. Do not use it on ANY data set for which any amount of data
       loss is unacceptable. This option is not included in the help text
       for the program due to its risky nature. You have been warned!

       The -T or --partial-only option produces results based on a hash of
       the first block of file data in each file, ignoring everything else
       in the file. Partial hash checks have always been an important
       exclusion step in the jdupes algorithm, usually hashing the first
       4096 bytes of data and allowing files that are different at the start
       to be rejected early. In certain scenarios it may be a useful
       heuristic for a user to see that a set of files has the same size and
       the same starting data, even if the remaining data does not match;
       one example of this would be comparing files with data blocks that
       are damaged or missing, such as an incomplete file transfer, or
       checking a data recovery against known-good copies to see what
       damaged data can be deleted in favor of restoring the known-good
       copy. This option is meant to be used with informational actions and
       can result in EXTREME DATA LOSS if used with options that delete
       files, create hard links, or perform other destructive actions on
       data based on the matching output. Because of the potential for
       massive data destruction, this option MUST BE SPECIFIED TWICE to take
       effect and will error out if it is only specified once.
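
       For instance, an informational-only sketch (the paths are
       hypothetical) that lists files whose size and first block match,
       without taking any destructive action, might be:

              jdupes -r -T -T recovered/ known_good/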

       Using the -C or --chunk-size option to override the I/O chunk size
       can increase performance on rotating storage media by reducing "head
       thrashing," that is, by reading larger amounts of data sequentially
       from each file. This tunable size can have bad side effects; the
       default size maximizes algorithmic performance without regard to the
       I/O characteristics of any given device and uses a modest amount of
       memory, but other values may greatly increase memory usage or incur a
       lot more system call overhead. Try several different values to see
       how they affect performance for your hardware and data set. This
       option does not affect match results in any way, so even if it slows
       down the file matching process it will not hurt anything.
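
       As a sketch only (the value and path are illustrative, and the units
       are the KiB described under -C above), a larger chunk size for a big
       archive on a spinning disk might be tried as:

              jdupes -r -C 4096 /mnt/archive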

       The -y or --hash-db feature creates and maintains a text file with a
       list of file paths, hashes, and other metadata that enables jdupes to
       "remember" file data across runs. Specifying a period '.' as the
       database file name will use a name of "jdupes_hashdb.txt" instead;
       this alias makes it easy to use the hash database feature without
       typing a descriptive name each time. THIS FEATURE IS CURRENTLY UNDER
       DEVELOPMENT AND HAS MANY QUIRKS. USE IT AT YOUR OWN RISK. In
       particular, one of the biggest problems with this feature is that it
       stores every path exactly as specified on the command line; if any
       paths are passed into jdupes on a subsequent run with a different
       prefix then they will not be recognized and they will be treated as
       totally different files. For example, running jdupes -y . foo/ is not
       the same as jdupes -y . ./foo nor the same as (from a sibling
       directory) jdupes -y ../foo. You must run jdupes from the same
       working directory and with the same path specifications to take
       advantage of the hash database feature. When used correctly, a fully
       populated hash database can reduce subsequent runs with hundreds of
       thousands of files that normally take a very long time down to the
       directory scanning time plus a couple of seconds. If the directory
       data is already in the OS disk cache, this can make subsequent runs
       with over 100K files finish in under one second.
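
       A minimal sketch of consistent use (the '.' database alias is
       described above; the directory name photos/ is illustrative) is to
       run the same command from the same working directory each time:

              jdupes -y . -r photos/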

REPORTING BUGS
       Send bug reports and feature requests to jody@jodybruchon.com, or for
       general information and help, visit www.jdupes.com

SUPPORTING DEVELOPMENT
       If you find this software useful, please consider financially
       supporting its development through the author's home page:

       https://www.jodybruchon.com/

AUTHOR
       jdupes is created and maintained by Jody Bruchon
       <jody@jodybruchon.com> and was forked from fdupes 1.51 by Adrian
       Lopez <adrian2@caribe.net>

LICENSE
       MIT License

       Copyright (c) 2015-2023 Jody Lee Bruchon <jody@jodybruchon.com>

       Permission is hereby granted, free of charge, to any person obtaining
       a copy of this software and associated documentation files (the
       "Software"), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to
       permit persons to whom the Software is furnished to do so, subject to
       the following conditions:

       The above copyright notice and this permission notice shall be
       included in all copies or substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
       EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
       MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
       NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
       BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
       ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
       CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
       SOFTWARE.


                                                                     JDUPES(1)