XZ(1) XZ Utils XZ(1)

xz, unxz, xzcat, lzma, unlzma, lzcat - Compress or decompress .xz and
.lzma files

xz [option...] [file...]

unxz is equivalent to xz --decompress.
xzcat is equivalent to xz --decompress --stdout.
lzma is equivalent to xz --format=lzma.
unlzma is equivalent to xz --format=lzma --decompress.
lzcat is equivalent to xz --format=lzma --decompress --stdout.

When writing scripts that need to decompress files, it is recommended
to always use the name xz with appropriate arguments (xz -d or xz -dc)
instead of the names unxz and xzcat.

xz is a general-purpose data compression tool with command line syntax
similar to gzip(1) and bzip2(1). The native file format is the .xz
format, but the legacy .lzma format used by LZMA Utils and raw com‐
pressed streams with no container format headers are also supported.
In addition, decompression of the .lz format used by lzip is supported.

xz compresses or decompresses each file according to the selected oper‐
ation mode. If no files are given or file is -, xz reads from standard
input and writes the processed data to standard output. xz will refuse
(display an error and skip the file) to write compressed data to stan‐
dard output if it is a terminal. Similarly, xz will refuse to read
compressed data from standard input if it is a terminal.

Unless --stdout is specified, files other than - are written to a new
file whose name is derived from the source file name:

• When compressing, the suffix of the target file format (.xz or
  .lzma) is appended to the source filename to get the target file‐
  name.

• When decompressing, the .xz, .lzma, or .lz suffix is removed from
  the filename to get the target filename. xz also recognizes the
  suffixes .txz and .tlz, and replaces them with the .tar suffix.

If the target file already exists, an error is displayed and the file
is skipped.
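
The naming rules above can be sketched in a few lines of Python. This is
only an illustration, not code from xz: the function name target_name
and the two lookup tables are hypothetical, and the guard against a name
that consists of nothing but a suffix is an assumption of this sketch.

```python
# Suffix appended when compressing, keyed by target format.
COMPRESS_SUFFIX = {"xz": ".xz", "lzma": ".lzma"}

# Suffix recognized when decompressing, mapped to its replacement.
DECOMPRESS_SUFFIXES = {
    ".txz": ".tar",
    ".tlz": ".tar",
    ".xz": "",
    ".lzma": "",
    ".lz": "",
}


def target_name(source, decompress=False, fmt="xz"):
    """Return the derived target filename, or None if no suffix matches."""
    if not decompress:
        return source + COMPRESS_SUFFIX[fmt]
    for suffix, replacement in DECOMPRESS_SUFFIXES.items():
        # Assumption of this sketch: the name must be longer than the suffix.
        if source.endswith(suffix) and len(source) > len(suffix):
            return source[: -len(suffix)] + replacement
    return None
```

For instance, target_name("backup.txz", decompress=True) gives
"backup.tar", while a name without a recognized suffix gives None, which
corresponds to a file that would be skipped when decompressing.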

Unless writing to standard output, xz will display a warning and skip
the file if any of the following applies:

• File is not a regular file. Symbolic links are not followed, and
  thus they are not considered to be regular files.

• File has more than one hard link.

• File has setuid, setgid, or sticky bit set.

• The operation mode is set to compress and the file already has a
  suffix of the target file format (.xz or .txz when compressing to
  the .xz format, and .lzma or .tlz when compressing to the .lzma for‐
  mat).

• The operation mode is set to decompress and the file doesn't have a
  suffix of any of the supported file formats (.xz, .txz, .lzma, .tlz,
  or .lz).

After successfully compressing or decompressing the file, xz copies the
owner, group, permissions, access time, and modification time from the
source file to the target file. If copying the group fails, the per‐
missions are modified so that the target file doesn't become accessible
to users who didn't have permission to access the source file. xz
doesn't support copying other metadata like access control lists or ex‐
tended attributes yet.

Once the target file has been successfully closed, the source file is
removed unless --keep was specified. The source file is never removed
if the output is written to standard output or if an error occurs.

Sending SIGINFO or SIGUSR1 to the xz process makes it print progress
information to standard error. This has only limited use since when
standard error is a terminal, using --verbose will display an automati‐
cally updating progress indicator.

Memory usage
The memory usage of xz varies from a few hundred kilobytes to several
gigabytes depending on the compression settings. The settings used
when compressing a file determine the memory requirements of the decom‐
pressor. Typically the decompressor needs 5 % to 20 % of the amount of
memory that the compressor needed when creating the file. For example,
decompressing a file created with xz -9 currently requires 65 MiB of
memory. Still, it is possible to have .xz files that require several
gigabytes of memory to decompress.

Users of older systems in particular may find the possibility of very
large memory usage annoying. To prevent uncomfortable surprises, xz
has a built-in memory usage limiter, which is disabled by default.
While some operating systems provide ways to limit the memory usage of
processes, relying on them wasn't deemed flexible enough (for example,
using ulimit(1) to limit virtual memory tends to cripple mmap(2)).

The memory usage limiter can be enabled with the command line option
--memlimit=limit. Often it is more convenient to enable the limiter by
default by setting the environment variable XZ_DEFAULTS, for example,
XZ_DEFAULTS=--memlimit=150MiB. It is possible to set the limits sepa‐
rately for compression and decompression by using --memlimit-com‐
press=limit and --memlimit-decompress=limit. Using these two options
outside XZ_DEFAULTS is rarely useful because a single run of xz cannot
do both compression and decompression and --memlimit=limit (or -M
limit) is shorter to type on the command line.

If the specified memory usage limit is exceeded when decompressing, xz
will display an error and decompressing the file will fail. If the
limit is exceeded when compressing, xz will try to scale the settings
down so that the limit is no longer exceeded (except when using --for‐
mat=raw or --no-adjust). This way the operation won't fail unless the
limit is very small. The scaling of the settings is done in steps that
don't match the compression level presets. For example, if the limit is
only slightly less than the amount required for xz -9, the settings
will be scaled down only a little, not all the way down to xz -8.
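
The decompression half of this behavior can be observed with Python's
standard lzma module, which is a binding to liblzma rather than the xz
command itself. The sketch below is illustrative only: the helper name
try_decompress and the two limits are assumptions chosen for the
example. It does not show the compressor scaling its settings down,
which is specific to the xz tool.

```python
import lzma

# Preset 6 uses an 8 MiB dictionary, so the decoder needs roughly 9 MiB
# even though the payload itself is tiny.
payload = b"example " * 1000
blob = lzma.compress(payload, preset=6)


def try_decompress(data, limit_bytes):
    """Return the decompressed bytes, or None if decoding fails,
    for example because the memory limit is exceeded."""
    try:
        return lzma.decompress(data, memlimit=limit_bytes)
    except lzma.LZMAError:
        return None


too_small = try_decompress(blob, 1 * 1024 * 1024)     # 1 MiB limit
large_enough = try_decompress(blob, 64 * 1024 * 1024)  # 64 MiB limit
```

Here too_small is expected to be None because the limit is below what
the dictionary size stored in the file requires, while large_enough
holds the original payload.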

Concatenation and padding with .xz files
It is possible to concatenate .xz files as is. xz will decompress such
files as if they were a single .xz file.

It is possible to insert padding between the concatenated parts or af‐
ter the last part. The padding must consist of null bytes and the size
of the padding must be a multiple of four bytes. This can be useful,
for example, if the .xz file is stored on a medium that measures file
sizes in 512-byte blocks.

Concatenation and padding are not allowed with .lzma files or raw
streams.
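
Plain concatenation can be tried out with Python's standard lzma module
(a liblzma binding, used here instead of the xz command as an assumption
of this sketch). Two independently compressed .xz streams are joined
byte for byte and decoded in one call. Padding between the parts is
deliberately not shown.

```python
import lzma

# Two separate .xz streams, each complete on its own.
part1 = lzma.compress(b"first part\n")
part2 = lzma.compress(b"second part\n")

# Concatenate the compressed bytes as is and decode them together.
joined = part1 + part2
result = lzma.decompress(joined)
```

The value of result is the two payloads back to back, just as if they
had been compressed as a single file.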

Integer suffixes and special values
In most places where an integer argument is expected, an optional suf‐
fix is supported to easily indicate large integers. There must be no
space between the integer and the suffix.

KiB    Multiply the integer by 1,024 (2^10). Ki, k, kB, K, and KB are
       accepted as synonyms for KiB.

MiB    Multiply the integer by 1,048,576 (2^20). Mi, m, M, and MB are
       accepted as synonyms for MiB.

GiB    Multiply the integer by 1,073,741,824 (2^30). Gi, g, G, and GB
       are accepted as synonyms for GiB.

The special value max can be used to indicate the maximum integer value
supported by the option.
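
The suffix rules amount to a small lookup table. The following Python
sketch is not the parser used by xz: the names MULTIPLIERS and
parse_size are hypothetical, and the caller is assumed to supply the
option-specific maximum that max stands for.

```python
# Map every accepted suffix spelling to its multiplier.
MULTIPLIERS = {}
for names, factor in (
    (("KiB", "Ki", "k", "kB", "K", "KB"), 2**10),
    (("MiB", "Mi", "m", "M", "MB"), 2**20),
    (("GiB", "Gi", "g", "G", "GB"), 2**30),
):
    for name in names:
        MULTIPLIERS[name] = factor


def parse_size(text, maximum):
    """Parse an integer with an optional suffix; 'max' yields maximum."""
    if text == "max":
        return maximum
    # Split the leading ASCII digits from whatever follows them.
    i = 0
    while i < len(text) and text[i] in "0123456789":
        i += 1
    number, suffix = text[:i], text[i:]
    if not number:
        raise ValueError(f"no integer in {text!r}")
    if suffix and suffix not in MULTIPLIERS:
        # This also rejects a space between the integer and the suffix.
        raise ValueError(f"unknown suffix in {text!r}")
    return int(number) * MULTIPLIERS.get(suffix, 1)
```

With this sketch, parse_size("150MiB", maximum) evaluates to
150 * 1,048,576 = 157,286,400, and "8 MiB" is rejected because of the
space.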

Operation mode
If multiple operation mode options are given, the last one takes ef‐
fect.

-z, --compress
Compress. This is the default operation mode when no operation
mode option is specified and no other operation mode is implied
from the command name (for example, unxz implies --decompress).

-d, --decompress, --uncompress
Decompress.

-t, --test
Test the integrity of compressed files. This option is equiva‐
lent to --decompress --stdout except that the decompressed data
is discarded instead of being written to standard output. No
files are created or removed.
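
The idea behind --test, decoding everything and throwing the result
away, can be sketched with Python's standard lzma module. This is an
illustration built on the liblzma binding, not the implementation in xz,
and the function name is_intact is hypothetical.

```python
import lzma


def is_intact(blob):
    """Decode the data, discard the output, and report success."""
    try:
        lzma.decompress(blob)
    except lzma.LZMAError:
        return False
    return True


good = lzma.compress(b"payload " * 100)

# Simulate damage by inverting one byte in the middle of the file.
damaged = bytearray(good)
damaged[len(damaged) // 2] ^= 0xFF
damaged = bytes(damaged)
```

Calling is_intact(good) succeeds, while the damaged copy is rejected
because the decoder or the stored integrity check detects the change.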

-l, --list
Print information about compressed files. No uncompressed out‐
put is produced, and no files are created or removed. In list
mode, the program cannot read the compressed data from standard
input or from other unseekable sources.

The default listing shows basic information about files, one
file per line. To get more detailed information, also use the
--verbose option. For even more information, use --verbose
twice, but note that this may be slow, because getting all the
extra information requires many seeks. The width of verbose
output exceeds 80 characters, so piping the output to, for exam‐
ple, less -S may be convenient if the terminal isn't wide
enough.

The exact output may vary between xz versions and different lo‐
cales. For machine-readable output, --robot --list should be
used.

Operation modifiers
-k, --keep
Don't delete the input files.

Since xz 5.2.6, this option also makes xz compress or decompress
even if the input is a symbolic link to a regular file, has more
than one hard link, or has the setuid, setgid, or sticky bit
set. The setuid, setgid, and sticky bits are not copied to the
target file. In earlier versions this was only done with
--force.

-f, --force
This option has several effects:

• If the target file already exists, delete it before compress‐
  ing or decompressing.

• Compress or decompress even if the input is a symbolic link
  to a regular file, has more than one hard link, or has the
  setuid, setgid, or sticky bit set. The setuid, setgid, and
  sticky bits are not copied to the target file.

• When used with --decompress --stdout and xz cannot recognize
  the type of the source file, copy the source file as is to
  standard output. This allows xzcat --force to be used like
  cat(1) for files that have not been compressed with xz. Note
  that in the future, xz might support new compressed file
  formats, which may make xz decompress more types of files
  instead of copying them as is to standard output.
  --format=format can be used to restrict xz to decompress only
  a single file format.

-c, --stdout, --to-stdout
Write the compressed or decompressed data to standard output in‐
stead of a file. This implies --keep.

--single-stream
Decompress only the first .xz stream, and silently ignore possi‐
ble remaining input data following the stream. Normally such
trailing garbage makes xz display an error.

xz never decompresses more than one stream from .lzma files or
raw streams, but this option still makes xz ignore the possible
trailing data after the .lzma file or raw stream.

This option has no effect if the operation mode is not --decom‐
press or --test.
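
A comparable effect can be seen with Python's standard lzma module (a
liblzma binding, used here as an assumption of this sketch rather than
the xz command). An LZMADecompressor object decodes exactly one stream
and exposes whatever follows it, instead of silently dropping it as
--single-stream does.

```python
import lzma

first = lzma.compress(b"stream one")
second = lzma.compress(b"stream two")

# One decoder object handles one stream and then stops.
decoder = lzma.LZMADecompressor()
output = decoder.decompress(first + second)

# Bytes after the end of the first stream are left untouched.
leftover = decoder.unused_data
```

After this runs, output holds only the first payload, decoder.eof is
true, and leftover is the complete second stream.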

--no-sparse
Disable creation of sparse files. By default, if decompressing
into a regular file, xz tries to make the file sparse if the de‐
compressed data contains long sequences of binary zeros. It
also works when writing to standard output as long as standard
output is connected to a regular file and certain additional
conditions are met to make it safe. Creating sparse files may
save disk space and speed up the decompression by reducing the
amount of disk I/O.

-S .suf, --suffix=.suf
When compressing, use .suf as the suffix for the target file in‐
stead of .xz or .lzma. If not writing to standard output and
the source file already has the suffix .suf, a warning is dis‐
played and the file is skipped.

When decompressing, recognize files with the suffix .suf in ad‐
dition to files with the .xz, .txz, .lzma, .tlz, or .lz suffix.
If the source file has the suffix .suf, the suffix is removed to
get the target filename.

When compressing or decompressing raw streams (--format=raw),
the suffix must always be specified unless writing to standard
output, because there is no default suffix for raw streams.

--files[=file]
Read the filenames to process from file; if file is omitted,
filenames are read from standard input. Filenames must be ter‐
minated with the newline character. A dash (-) is taken as a
regular filename; it doesn't mean standard input. If filenames
are also given as command line arguments, they are processed be‐
fore the filenames read from file.

--files0[=file]
This is identical to --files[=file] except that each filename
must be terminated with the null character.

Basic file format and compression options
-F format, --format=format
Specify the file format to compress or decompress:

auto This is the default. When compressing, auto is equiva‐
lent to xz. When decompressing, the format of the input
file is automatically detected. Note that raw streams
(created with --format=raw) cannot be auto-detected.

xz Compress to the .xz file format, or accept only .xz files
when decompressing.

lzma, alone
Compress to the legacy .lzma file format, or accept only
.lzma files when decompressing. The alternative name
alone is provided for backwards compatibility with LZMA
Utils.

lzip Accept only .lz files when decompressing. Compression is
not supported.

The .lz format version 0 and the unextended version 1 are
supported. Version 0 files were produced by lzip 1.3 and
older. Such files aren't common but may be found from
file archives as a few source packages were released in
this format. People might have old personal files in
this format too. Decompression support for the format
version 0 was removed in lzip 1.18.

lzip 1.4 and later create files in the format version 1.
The sync flush marker extension to the format version 1
was added in lzip 1.6. This extension is rarely used and
isn't supported by xz (diagnosed as corrupt input).

raw Compress or uncompress a raw stream (no headers). This
is meant for advanced users only. To decode raw streams,
you need to use --format=raw and explicitly specify the
filter chain, which normally would have been stored in the
container headers.

-C check, --check=check
Specify the type of the integrity check. The check is calcu‐
lated from the uncompressed data and stored in the .xz file.
This option has an effect only when compressing into the .xz
format; the .lzma format doesn't support integrity checks. The
integrity check (if any) is verified when the .xz file is decom‐
pressed.

Supported check types:

none Don't calculate an integrity check at all. This is usu‐
ally a bad idea. This can be useful when integrity of
the data is verified by other means anyway.

crc32 Calculate CRC32 using the polynomial from IEEE-802.3
(Ethernet).

crc64 Calculate CRC64 using the polynomial from ECMA-182. This
is the default, since it is slightly better than CRC32 at
detecting damaged files and the speed difference is neg‐
ligible.

sha256 Calculate SHA-256. This is somewhat slower than CRC32
and CRC64.

Integrity of the .xz headers is always verified with CRC32. It
is not possible to change or disable it.
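
The effect of the check type on a .xz file can be inspected with
Python's standard lzma module, which exposes the same check types
through liblzma. The sketch below is illustrative and assumes that all
three check types are available in the local liblzma build. The variable
names are hypothetical.

```python
import lzma

payload = b"integrity check demo " * 50
checks = {
    "crc32": lzma.CHECK_CRC32,
    "crc64": lzma.CHECK_CRC64,
    "sha256": lzma.CHECK_SHA256,
}

sizes = {}      # compressed size per check type
detected = {}   # check ID reported by the decoder
for name, check_id in checks.items():
    blob = lzma.compress(payload, format=lzma.FORMAT_XZ, check=check_id)
    decoder = lzma.LZMADecompressor()
    decoder.decompress(blob)   # the stored check is verified while decoding
    sizes[name] = len(blob)
    detected[name] = decoder.check
```

The compressed sizes differ only by the size of the stored check value,
so the SHA-256 variant is the largest and the CRC32 variant the
smallest.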

--ignore-check
Don't verify the integrity check of the compressed data when de‐
compressing. The CRC32 values in the .xz headers will still be
verified normally.

Do not use this option unless you know what you are doing. Pos‐
sible reasons to use this option:

• Trying to recover data from a corrupt .xz file.

• Speeding up decompression. This matters mostly with SHA-256
  or with files that have compressed extremely well. It's rec‐
  ommended to not use this option for this purpose unless the
  file integrity is verified externally in some other way.

-0 ... -9
Select a compression preset level. The default is -6. If mul‐
tiple preset levels are specified, the last one takes effect.
If a custom filter chain was already specified, setting a com‐
pression preset level clears the custom filter chain.

The differences between the presets are more significant than
with gzip(1) and bzip2(1). The selected compression settings
determine the memory requirements of the decompressor, thus us‐
ing a too high preset level might make it painful to decompress
the file on an old system with little RAM. Specifically, it's
not a good idea to blindly use -9 for everything like it often
is with gzip(1) and bzip2(1).

-0 ... -3
These are somewhat fast presets. -0 is sometimes faster
than gzip -9 while compressing much better. The higher
ones often have speed comparable to bzip2(1) with compa‐
rable or better compression ratio, although the results
depend a lot on the type of data being compressed.

-4 ... -6
Good to very good compression while keeping decompressor
memory usage reasonable even for old systems. -6 is the
default, which is usually a good choice for distributing
files that need to be decompressible even on systems with
only 16 MiB RAM. (-5e or -6e may be worth considering
too. See --extreme.)

-7 ... -9
These are like -6 but with higher compressor and decom‐
pressor memory requirements. These are useful only when
compressing files bigger than 8 MiB, 16 MiB, and 32 MiB,
respectively.

On the same hardware, the decompression speed is approximately a
constant number of bytes of compressed data per second. In
other words, the better the compression, the faster the decom‐
pression will usually be. This also means that the amount of
uncompressed output produced per second can vary a lot.

The following table summarises the features of the presets:

Preset   DictSize   CompCPU   CompMem   DecMem
  -0     256 KiB       0        3 MiB    1 MiB
  -1       1 MiB       1        9 MiB    2 MiB
  -2       2 MiB       2       17 MiB    3 MiB
  -3       4 MiB       3       32 MiB    5 MiB
  -4       4 MiB       4       48 MiB    5 MiB
  -5       8 MiB       5       94 MiB    9 MiB
  -6       8 MiB       6       94 MiB    9 MiB
  -7      16 MiB       6      186 MiB   17 MiB
  -8      32 MiB       6      370 MiB   33 MiB
  -9      64 MiB       6      674 MiB   65 MiB

Column descriptions:

• DictSize is the LZMA2 dictionary size. It is a waste of memory
  to use a dictionary bigger than the size of the uncompressed
  file. This is why it is good to avoid using the presets -7
  ... -9 when there's no real need for them. At -6 and lower,
  the amount of memory wasted is usually low enough to not mat‐
  ter.

• CompCPU is a simplified representation of the LZMA2 settings
  that affect compression speed. The dictionary size affects
  speed too, so while CompCPU is the same for levels -6 ... -9,
  higher levels still tend to be a little slower. To get even
  slower and thus possibly better compression, see --extreme.

• CompMem contains the compressor memory requirements in the
  single-threaded mode. It may vary slightly between xz ver‐
  sions. Memory requirements of some of the future multi‐
  threaded modes may be dramatically higher than that of the
  single-threaded mode.

• DecMem contains the decompressor memory requirements. That
  is, the compression settings determine the memory require‐
  ments of the decompressor. The exact decompressor memory us‐
  age is slightly more than the LZMA2 dictionary size, but the
  values in the table have been rounded up to the next full
  MiB.
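
The advice about not choosing a dictionary larger than the file can be
turned into a small helper. The following Python sketch only encodes the
DictSize column of the table above together with the 8 MiB, 16 MiB, and
32 MiB thresholds given for -7 ... -9. The function name
highest_useful_preset is hypothetical, and xz itself makes no such
choice automatically.

```python
MIB = 1024 * 1024

# DictSize column of the preset table, in bytes.
PRESET_DICT_SIZE = {
    0: MIB // 4, 1: 1 * MIB, 2: 2 * MIB, 3: 4 * MIB, 4: 4 * MIB,
    5: 8 * MIB, 6: 8 * MIB, 7: 16 * MIB, 8: 32 * MIB, 9: 64 * MIB,
}


def highest_useful_preset(uncompressed_size, default=6):
    """Return the highest preset whose larger dictionary can still help."""
    best = default
    for preset in (7, 8, 9):
        # A preset above -6 pays off only when the file is bigger than
        # the dictionary of the preset directly below it.
        if uncompressed_size > PRESET_DICT_SIZE[preset - 1]:
            best = preset
    return best
```

For a 20 MiB file this returns 8, since -9 is described as useful only
for files bigger than 32 MiB.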

-e, --extreme
Use a slower variant of the selected compression preset level
(-0 ... -9) to hopefully get a little bit better compression ra‐
tio, but with bad luck this can also make it worse. Decompres‐
sor memory usage is not affected, but compressor memory usage
increases a little at preset levels -0 ... -3.

Since there are two presets with dictionary sizes 4 MiB and
8 MiB, the presets -3e and -5e use slightly faster settings
(lower CompCPU) than -4e and -6e, respectively. That way no two
presets are identical.

Preset   DictSize   CompCPU   CompMem   DecMem
 -0e     256 KiB       8        4 MiB    1 MiB
 -1e       1 MiB       8       13 MiB    2 MiB
 -2e       2 MiB       8       25 MiB    3 MiB
 -3e       4 MiB       7       48 MiB    5 MiB
 -4e       4 MiB       8       48 MiB    5 MiB
 -5e       8 MiB       7       94 MiB    9 MiB
 -6e       8 MiB       8       94 MiB    9 MiB
 -7e      16 MiB       8      186 MiB   17 MiB
 -8e      32 MiB       8      370 MiB   33 MiB
 -9e      64 MiB       8      674 MiB   65 MiB

For example, there are a total of four presets that use 8 MiB
dictionary, whose order from the fastest to the slowest is -5,
-6, -5e, and -6e.

--fast
--best These are somewhat misleading aliases for -0 and -9, respec‐
tively. These are provided only for backwards compatibility
with LZMA Utils. Avoid using these options.

--block-size=size
When compressing to the .xz format, split the input data into
blocks of size bytes. The blocks are compressed independently
from each other, which helps with multi-threading and makes lim‐
ited random-access decompression possible. This option is typi‐
cally used to override the default block size in multi-threaded
mode, but this option can be used in single-threaded mode too.

In multi-threaded mode about three times size bytes will be al‐
located in each thread for buffering input and output. The de‐
fault size is three times the LZMA2 dictionary size or 1 MiB,
whichever is more. Typically a good value is 2–4 times the size
of the LZMA2 dictionary or at least 1 MiB. Using size less than
the LZMA2 dictionary size is a waste of RAM because then the
LZMA2 dictionary buffer will never get fully used. The sizes of
the blocks are stored in the block headers, which a future version
of xz will use for multi-threaded decompression.

In single-threaded mode no block splitting is done by default.
Setting this option doesn't affect memory usage. No size infor‐
mation is stored in block headers, thus files created in single-
threaded mode won't be identical to files created in multi-
threaded mode. The lack of size information also means that a
future version of xz won't be able to decompress the files in
multi-threaded mode.

--block-list=sizes
When compressing to the .xz format, start a new block after the
given intervals of uncompressed data.

The uncompressed sizes of the blocks are specified as a comma-
separated list. Omitting a size (two or more consecutive com‐
mas) is a shorthand to use the size of the previous block.

If the input file is bigger than the sum of sizes, the last
value in sizes is repeated until the end of the file. A special
value of 0 may be used as the last value to indicate that the
rest of the file should be encoded as a single block.

If one specifies sizes that exceed the encoder's block size (ei‐
ther the default value in threaded mode or the value specified
with --block-size=size), the encoder will create additional
blocks while keeping the boundaries specified in sizes. For ex‐
ample, if one specifies --block-size=10MiB
--block-list=5MiB,10MiB,8MiB,12MiB,24MiB and the input file is
80 MiB, one will get 11 blocks: 5, 10, 8, 10, 2, 10, 10, 4, 10,
10, and 1 MiB.
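
The arithmetic of this example can be reproduced with a short Python
sketch. It is a model of the rules as described above, not the encoder's
actual code: the function name split_blocks is hypothetical, sizes are
plain integers (MiB in the example), and the special value 0 and omitted
sizes are not modelled.

```python
def split_blocks(input_size, block_list, block_size):
    """Return the block sizes produced for an input of input_size."""
    if not block_list or any(size <= 0 for size in block_list):
        raise ValueError("this sketch needs a non-empty list of positive sizes")
    blocks = []
    remaining = input_size
    index = 0
    while remaining > 0:
        # Past the end of the list, the last value is repeated.
        wanted = block_list[min(index, len(block_list) - 1)]
        index += 1
        wanted = min(wanted, remaining)
        remaining -= wanted
        # Intervals larger than the encoder's block size are split
        # further, while the requested boundary is kept.
        while wanted > 0:
            piece = min(wanted, block_size)
            blocks.append(piece)
            wanted -= piece
    return blocks
```

Calling split_blocks(80, [5, 10, 8, 12, 24], 10) yields the 11 blocks
listed in the example above.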
523
524 In multi-threaded mode the sizes of the blocks are stored in the
525 block headers. This isn't done in single-threaded mode, so the
526 encoded output won't be identical to that of the multi-threaded
527 mode.
528
529 --flush-timeout=timeout
530 When compressing, if more than timeout milliseconds (a positive
531 integer) has passed since the previous flush and reading more
532 input would block, all the pending input data is flushed from
533 the encoder and made available in the output stream. This can
534 be useful if xz is used to compress data that is streamed over a
535 network. Small timeout values make the data available at the
536 receiving end with a small delay, but large timeout values give
537 better compression ratio.
538
539 This feature is disabled by default. If this option is speci‐
540 fied more than once, the last one takes effect. The special
541 timeout value of 0 can be used to explicitly disable this fea‐
542 ture.
543
544 This feature is not available on non-POSIX systems.
545
546 This feature is still experimental. Currently xz is unsuitable
547 for decompressing the stream in real time due to how xz does
548 buffering.
549
550 --memlimit-compress=limit
551 Set a memory usage limit for compression. If this option is
552 specified multiple times, the last one takes effect.
553
554 If the compression settings exceed the limit, xz will attempt to
555 adjust the settings downwards so that the limit is no longer ex‐
556 ceeded and display a notice that automatic adjustment was done.
557 The adjustments are done in this order: reducing the number of
558 threads, switching to single-threaded mode if even one thread in
559 multi-threaded mode exceeds the limit, and finally reducing the
560 LZMA2 dictionary size.
561
562 When compressing with --format=raw or if --no-adjust has been
563 specified, only the number of threads may be reduced since it
564 can be done without affecting the compressed output.
565
566 If the limit cannot be met even with the adjustments described
567 above, an error is displayed and xz will exit with exit status
568 1.
569
570 The limit can be specified in multiple ways:
571
572 • The limit can be an absolute value in bytes. Using an inte‐
573 ger suffix like MiB can be useful. Example: --memlimit-com‐
574 press=80MiB
575
576 • The limit can be specified as a percentage of total physical
577 memory (RAM). This can be useful especially when setting the
578 XZ_DEFAULTS environment variable in a shell initialization
579 script that is shared between different computers. That way
580 the limit is automatically bigger on systems with more mem‐
581 ory. Example: --memlimit-compress=70%
582
583 • The limit can be reset back to its default value by setting
584 it to 0. This is currently equivalent to setting the limit
585 to max (no memory usage limit).
586
587 For 32-bit xz there is a special case: if the limit would be
588 over 4020 MiB, the limit is set to 4020 MiB. On MIPS32 2000 MiB
589 is used instead. (The values 0 and max aren't affected by this.
590 A similar feature doesn't exist for decompression.) This can be
591 helpful when a 32-bit executable has access to 4 GiB address
592 space (2 GiB on MIPS32) while hopefully doing no harm in other
593 situations.
594
595 See also the section Memory usage.
596
597 --memlimit-decompress=limit
598 Set a memory usage limit for decompression. This also affects
599 the --list mode. If the operation is not possible without ex‐
600 ceeding the limit, xz will display an error and decompressing
601 the file will fail. See --memlimit-compress=limit for possible
602 ways to specify the limit.
603
604 --memlimit-mt-decompress=limit
605 Set a memory usage limit for multi-threaded decompression. This
606 can only affect the number of threads; this will never make xz
607 refuse to decompress a file. If limit is too low to allow any
608 multi-threading, the limit is ignored and xz will continue in
609 single-threaded mode. Note that if also --memlimit-decompress
610 is used, it will always apply to both single-threaded and multi-
611 threaded modes, and so the effective limit for multi-threading
612 will never be higher than the limit set with --memlimit-decom‐
613 press.
614
615 In contrast to the other memory usage limit options, --mem‐
616 limit-mt-decompress=limit has a system-specific default limit.
617 xz --info-memory can be used to see the current value.
618
619 This option and its default value exist because without any
620 limit the threaded decompressor could end up allocating an in‐
621 sane amount of memory with some input files. If the default
622 limit is too low on your system, feel free to increase the limit
623 but never set it to a value larger than the amount of usable RAM
624 as with appropriate input files xz will attempt to use that
625 amount of memory even with a low number of threads. Running out
626 of memory or swapping will not improve decompression perfor‐
627 mance.
628
629 See --memlimit-compress=limit for possible ways to specify the
630 limit. Setting limit to 0 resets the limit to the default sys‐
631 tem-specific value.
632
633
634
635 -M limit, --memlimit=limit, --memory=limit
636 This is equivalent to specifying --memlimit-compress=limit
637 --memlimit-decompress=limit --memlimit-mt-decompress=limit.
638
639 --no-adjust
640 Display an error and exit if the memory usage limit cannot be
641 met without adjusting settings that affect the compressed out‐
642 put. That is, this prevents xz from switching the encoder from
643 multi-threaded mode to single-threaded mode and from reducing
644 the LZMA2 dictionary size. Even when this option is used the
645 number of threads may be reduced to meet the memory usage limit
646 as that won't affect the compressed output.
647
648 Automatic adjusting is always disabled when creating raw streams
649 (--format=raw).
650
651 -T threads, --threads=threads
652 Specify the number of worker threads to use. Setting threads to
653 a special value 0 makes xz use up to as many threads as the pro‐
654 cessor(s) on the system support. The actual number of threads
655 can be fewer than threads if the input file is not big enough
656 for threading with the given settings or if using more threads
657 would exceed the memory usage limit.
658
659 The single-threaded and multi-threaded compressors produce dif‐
660 ferent output. Single-threaded compressor will give the small‐
661 est file size but only the output from the multi-threaded com‐
662 pressor can be decompressed using multiple threads. Setting
663 threads to 1 will use the single-threaded mode. Setting threads
664 to any other value, including 0, will use the multi-threaded
665 compressor even if the system supports only one hardware thread.
666 (xz 5.2.x used single-threaded mode in this situation.)
667
668 To use multi-threaded mode with only one thread, set threads to
669 +1. The + prefix has no effect with values other than 1. A
670 memory usage limit can still make xz switch to single-threaded
671 mode unless --no-adjust is used. Support for the + prefix was
672 added in xz 5.4.0.
673
674 If an automatic number of threads has been requested and no mem‐
675 ory usage limit has been specified, then a system-specific de‐
676 fault soft limit will be used to possibly limit the number of
677 threads. It is a soft limit in sense that it is ignored if the
678 number of threads becomes one, thus a soft limit will never stop
679 xz from compressing or decompressing. This default soft limit
680 will not make xz switch from multi-threaded mode to single-
681 threaded mode. The active limits can be seen with xz
682 --info-memory.
683
684 Currently the only threading method is to split the input into
685 blocks and compress them independently from each other. The de‐
686 fault block size depends on the compression level and can be
687 overridden with the --block-size=size option.
688
689 Threaded decompression only works on files that contain multiple
690 blocks with size information in block headers. All large enough
691 files compressed in multi-threaded mode meet this condition, but
692 files compressed in single-threaded mode don't even if
693 --block-size=size has been used.
694
695 Custom compressor filter chains
696 A custom filter chain allows specifying the compression settings in de‐
697 tail instead of relying on the settings associated to the presets.
698 When a custom filter chain is specified, preset options (-0 ... -9 and
699 --extreme) earlier on the command line are forgotten. If a preset op‐
700 tion is specified after one or more custom filter chain options, the
701 new preset takes effect and the custom filter chain options specified
702 earlier are forgotten.
703
704 A filter chain is comparable to piping on the command line. When com‐
705 pressing, the uncompressed input goes to the first filter, whose output
706 goes to the next filter (if any). The output of the last filter gets
707 written to the compressed file. The maximum number of filters in the
708 chain is four, but typically a filter chain has only one or two fil‐
709 ters.
710
711 Many filters have limitations on where they can be in the filter chain:
712 some filters can work only as the last filter in the chain, some only
713 as a non-last filter, and some work in any position in the chain. De‐
714 pending on the filter, this limitation is either inherent to the filter
715 design or exists to prevent security issues.
716
717 A custom filter chain is specified by using one or more filter options
718 in the order they are wanted in the filter chain. That is, the order
719 of filter options is significant! When decoding raw streams (--for‐
720 mat=raw), the filter chain is specified in the same order as it was
721 specified when compressing.
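
The ordering rule can be tried with a short round trip through a raw stream. This is only a sketch: it assumes an xz binary with Delta and LZMA2 support is on PATH (otherwise result is set to skipped), and the dist and dict values are arbitrary choices for illustration.

```shell
# Round-trip through a raw stream: the decoder gets the same filter
# options, in the same order, as the encoder (Delta first, LZMA2 last).
chain="--delta=dist=4 --lzma2=dict=1MiB"
if command -v xz >/dev/null 2>&1; then
    result=$(printf 'hello filter chain' \
        | xz --format=raw $chain \
        | xz --decompress --format=raw $chain)
else
    result=skipped   # xz not installed; nothing to demonstrate
fi
printf '%s\n' "$result"
```

$chain is left unquoted on purpose so that the shell splits it into two separate options.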
722
723 Filters take filter-specific options as a comma-separated list. Extra
724 commas in options are ignored. Every option has a default value, so
725 you need to specify only those you want to change.
726
727 To see the whole filter chain and options, use xz -vv (that is, use
728 --verbose twice). This works also for viewing the filter chain options
729 used by presets.
730
731 --lzma1[=options]
732 --lzma2[=options]
733 Add LZMA1 or LZMA2 filter to the filter chain. These filters
734 can be used only as the last filter in the chain.
735
736 LZMA1 is a legacy filter, which is supported almost solely due
737 to the legacy .lzma file format, which supports only LZMA1.
738 LZMA2 is an updated version of LZMA1 to fix some practical is‐
739 sues of LZMA1. The .xz format uses LZMA2 and doesn't support
740 LZMA1 at all. Compression speed and ratios of LZMA1 and LZMA2
741 are practically the same.
742
743 LZMA1 and LZMA2 share the same set of options:
744
745 preset=preset
746 Reset all LZMA1 or LZMA2 options to preset. A preset
747 consists of an integer, which may be followed by
748 single-letter preset modifiers. The integer can be from 0 to 9,
749 matching the command line options -0 ... -9. The only
750 supported modifier is currently e, which matches --ex‐
751 treme. If no preset is specified, the default values of
752 LZMA1 or LZMA2 options are taken from the preset 6.
753
754 dict=size
755 Dictionary (history buffer) size indicates how many bytes
756 of the recently processed uncompressed data is kept in
757 memory. The algorithm tries to find repeating byte se‐
758 quences (matches) in the uncompressed data, and replace
759 them with references to the data currently in the dictio‐
760 nary. The bigger the dictionary, the higher is the
761 chance to find a match. Thus, increasing dictionary size
762 usually improves compression ratio, but a dictionary
763 bigger than the uncompressed file is a waste of memory.
764
765 Typical dictionary size is from 64 KiB to 64 MiB. The
766 minimum is 4 KiB. The maximum for compression is cur‐
767 rently 1.5 GiB (1536 MiB). The decompressor already sup‐
768 ports dictionaries up to one byte less than 4 GiB, which
769 is the maximum for the LZMA1 and LZMA2 stream formats.
770
771 Dictionary size and match finder (mf) together determine
772 the memory usage of the LZMA1 or LZMA2 encoder. The same
773 (or bigger) dictionary size is required for decompressing
774 that was used when compressing, thus the memory usage of
775 the decoder is determined by the dictionary size used
776 when compressing. The .xz headers store the dictionary
777 size either as 2^n or 2^n + 2^(n-1), so these sizes are
778 somewhat preferred for compression. Other sizes will get
779 rounded up when stored in the .xz headers.
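
The preferred sizes can be computed with shell arithmetic. The function below is a sketch of the rounding described above, not the code xz itself uses; the name round_dict_size is made up for this example, and the loop starts from the documented 4 KiB minimum.

```shell
# Round a dictionary size in bytes up to the next 2^n or 2^n + 2^(n-1).
round_dict_size() {
    want=$1
    size=4096                      # documented minimum (4 KiB = 2^12)
    while :; do
        if [ "$size" -ge "$want" ]; then
            echo "$size"; return
        fi
        mid=$((size + size / 2))   # 2^n + 2^(n-1)
        if [ "$mid" -ge "$want" ]; then
            echo "$mid"; return
        fi
        size=$((size * 2))
    done
}
round_dict_size 5000000    # prints 6291456 (4 MiB + 2 MiB)
round_dict_size 8388608    # prints 8388608 (already 2^23)
```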
780
781 lc=lc Specify the number of literal context bits. The minimum
782 is 0 and the maximum is 4; the default is 3. In addi‐
783 tion, the sum of lc and lp must not exceed 4.
784
785 All bytes that cannot be encoded as matches are encoded
786 as literals. That is, literals are simply 8-bit bytes
787 that are encoded one at a time.
788
789 The literal coding makes an assumption that the highest
790 lc bits of the previous uncompressed byte correlate with
791 the next byte. For example, in typical English text, an
792 upper-case letter is often followed by a lower-case let‐
793 ter, and a lower-case letter is usually followed by an‐
794 other lower-case letter. In the US-ASCII character set,
795 the highest three bits are 010 for upper-case letters and
796 011 for lower-case letters. When lc is at least 3, the
797 literal coding can take advantage of this property in the
798 uncompressed data.
799
800 The default value (3) is usually good. If you want maxi‐
801 mum compression, test lc=4. Sometimes it helps a little,
802 and sometimes it makes compression worse. If it makes it
803 worse, test lc=2 too.
804
805 lp=lp Specify the number of literal position bits. The minimum
806 is 0 and the maximum is 4; the default is 0.
807
808 Lp affects what kind of alignment in the uncompressed
809 data is assumed when encoding literals. See pb below for
810 more information about alignment.
811
812 pb=pb Specify the number of position bits. The minimum is 0
813 and the maximum is 4; the default is 2.
814
815 Pb affects what kind of alignment in the uncompressed
816 data is assumed in general. The default means four-byte
817 alignment (2^pb=2^2=4), which is often a good choice when
818 there's no better guess.
819
820 When the alignment is known, setting pb accordingly may
821 reduce the file size a little. For example, with text
822 files having one-byte alignment (US-ASCII, ISO-8859-*,
823 UTF-8), setting pb=0 can improve compression slightly.
824 For UTF-16 text, pb=1 is a good choice. If the alignment
825 is an odd number like 3 bytes, pb=0 might be the best
826 choice.
827
828 Even though the assumed alignment can be adjusted with pb
829 and lp, LZMA1 and LZMA2 still slightly favor 16-byte
830 alignment. It might be worth taking into account when
831 designing file formats that are likely to be often com‐
832 pressed with LZMA1 or LZMA2.
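
The documented ranges of lc, lp, and pb can be checked before the values are passed to xz. The following is a sketch only; check_lclppb is a hypothetical helper, and it encodes nothing beyond the limits stated above (each value 0-4, lc + lp at most 4, alignment 2^pb).

```shell
# Validate lc, lp, and pb and report the alignment implied by pb.
check_lclppb() {
    lc=$1 lp=$2 pb=$3
    for v in "$lc" "$lp" "$pb"; do
        if [ "$v" -lt 0 ] || [ "$v" -gt 4 ]; then
            echo "invalid: each value must be 0-4"; return 1
        fi
    done
    if [ $((lc + lp)) -gt 4 ]; then
        echo "invalid: lc + lp must not exceed 4"; return 1
    fi
    echo "ok: pb=$pb assumes $((1 << pb))-byte alignment"
}
check_lclppb 3 0 2           # ok: pb=2 assumes 4-byte alignment
check_lclppb 4 1 2 || true   # invalid: lc + lp must not exceed 4
check_lclppb 0 4 4           # ok: pb=4 assumes 16-byte alignment
```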
833
834 mf=mf Match finder has a major effect on encoder speed, memory
835 usage, and compression ratio. Usually Hash Chain match
836 finders are faster than Binary Tree match finders. The
837 default depends on the preset: 0 uses hc3, 1–3 use hc4,
838 and the rest use bt4.
839
840 The following match finders are supported. The memory
841 usage formulas below are rough approximations, which are
842 closest to the reality when dict is a power of two.
843
844 hc3 Hash Chain with 2- and 3-byte hashing
845 Minimum value for nice: 3
846 Memory usage:
847 dict * 7.5 (if dict <= 16 MiB);
848 dict * 5.5 + 64 MiB (if dict > 16 MiB)
849
850 hc4 Hash Chain with 2-, 3-, and 4-byte hashing
851 Minimum value for nice: 4
852 Memory usage:
853 dict * 7.5 (if dict <= 32 MiB);
854 dict * 6.5 (if dict > 32 MiB)
855
856 bt2 Binary Tree with 2-byte hashing
857 Minimum value for nice: 2
858 Memory usage: dict * 9.5
859
860 bt3 Binary Tree with 2- and 3-byte hashing
861 Minimum value for nice: 3
862 Memory usage:
863 dict * 11.5 (if dict <= 16 MiB);
864 dict * 9.5 + 64 MiB (if dict > 16 MiB)
865
866 bt4 Binary Tree with 2-, 3-, and 4-byte hashing
867 Minimum value for nice: 4
868 Memory usage:
869 dict * 11.5 (if dict <= 32 MiB);
870 dict * 10.5 (if dict > 32 MiB)
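
The formulas above can be turned into a quick estimate. This sketch takes the dictionary size in whole MiB and works in tenths of a MiB to stay within integer arithmetic; mf_mem_mib is a hypothetical name, and the results are as rough as the formulas themselves.

```shell
# Rough encoder memory estimate in MiB from the formulas above.
# $1 = match finder, $2 = dictionary size in MiB (whole number).
mf_mem_mib() {
    mf=$1 d=$2
    case $mf in
        hc3) if [ "$d" -le 16 ]; then t=$((d * 75));  else t=$((d * 55 + 640)); fi ;;
        hc4) if [ "$d" -le 32 ]; then t=$((d * 75));  else t=$((d * 65)); fi ;;
        bt2) t=$((d * 95)) ;;
        bt3) if [ "$d" -le 16 ]; then t=$((d * 115)); else t=$((d * 95 + 640)); fi ;;
        bt4) if [ "$d" -le 32 ]; then t=$((d * 115)); else t=$((d * 105)); fi ;;
        *) echo "unknown match finder: $mf" >&2; return 1 ;;
    esac
    # t is in tenths of a MiB; print it with one decimal.
    echo "$((t / 10)).$((t % 10))"
}
mf_mem_mib bt4 8     # prints 92.0
mf_mem_mib hc4 1     # prints 7.5
mf_mem_mib bt4 64    # prints 672.0
```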
871
872 mode=mode
873 Compression mode specifies the method to analyze the data
874 produced by the match finder. Supported modes are fast
875 and normal. The default is fast for presets 0–3 and nor‐
876 mal for presets 4–9.
877
878 Usually fast is used with Hash Chain match finders and
879 normal with Binary Tree match finders. This is also what
880 the presets do.
881
882 nice=nice
883 Specify what is considered to be a nice length for a
884 match. Once a match of at least nice bytes is found, the
885 algorithm stops looking for possibly better matches.
886
887 Nice can be 2–273 bytes. Higher values tend to give bet‐
888 ter compression ratio at the expense of speed. The de‐
889 fault depends on the preset.
890
891 depth=depth
892 Specify the maximum search depth in the match finder.
893 The default is the special value of 0, which makes the
894 compressor determine a reasonable depth from mf and nice.
895
896 Reasonable depth for Hash Chains is 4–100 and 16–1000 for
897 Binary Trees. Using very high values for depth can make
898 the encoder extremely slow with some files. Avoid set‐
899 ting the depth over 1000 unless you are prepared to in‐
900 terrupt the compression in case it is taking far too
901 long.
902
903 When decoding raw streams (--format=raw), LZMA2 needs only the
904 dictionary size. LZMA1 needs also lc, lp, and pb.
905
906 --x86[=options]
907 --arm[=options]
908 --armthumb[=options]
909 --arm64[=options]
910 --powerpc[=options]
911 --ia64[=options]
912 --sparc[=options]
913 Add a branch/call/jump (BCJ) filter to the filter chain. These
914 filters can be used only as a non-last filter in the filter
915 chain.
916
917 A BCJ filter converts relative addresses in the machine code to
918 their absolute counterparts. This doesn't change the size of
919 the data but it increases redundancy, which can help LZMA2 to
920 produce a 0–15 % smaller .xz file. The BCJ filters are always
921 reversible, so using a BCJ filter for the wrong type of data doesn't
922 cause any data loss, although it may make the compression ratio
923 slightly worse. The BCJ filters are very fast and use an in‐
924 significant amount of memory.
925
926 These BCJ filters have known problems related to the compression
927 ratio:
928
929 • Some types of files containing executable code (for example,
930 object files, static libraries, and Linux kernel modules)
931 have the addresses in the instructions filled with filler
932 values. These BCJ filters will still do the address conver‐
933 sion, which will make the compression worse with these files.
934
935 • If a BCJ filter is applied on an archive, it is possible that
936 it makes the compression ratio worse than not using a BCJ
937 filter. For example, if there are similar or even identical
938 executables then filtering will likely make the files less
939 similar and thus compression is worse. The contents of non-
940 executable files in the same archive can matter too. In
941 practice one has to try with and without a BCJ filter to see
942 which is better in each situation.
943
944 Different instruction sets have different alignment: the exe‐
945 cutable file must be aligned to a multiple of this value in the
946 input data to make the filter work.
947
948 Filter      Alignment   Notes
949 x86         1           32-bit or 64-bit x86
950 ARM         4
951 ARM-Thumb   2
952 ARM64       4           4096-byte alignment is best
953 PowerPC     4           Big endian only
954 IA-64       16          Itanium
955 SPARC       4
956
957 Since the BCJ-filtered data is usually compressed with LZMA2,
958 the compression ratio may be improved slightly if the LZMA2 op‐
959 tions are set to match the alignment of the selected BCJ filter.
960 For example, with the IA-64 filter, it's good to set pb=4 or
961 even pb=4,lp=4,lc=0 with LZMA2 (2^4=16). The x86 filter is an
962 exception; it's usually good to stick to LZMA2's default four-
963 byte alignment when compressing x86 executables.
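
As a sketch of the rule above, the function below maps a BCJ filter to a pb value whose 2^pb equals the filter's alignment, keeping the default pb=2 for x86. The helper name bcj_pb is hypothetical, and apart from IA-64 and x86 the suggested values are an extrapolation of the stated rule; whether they help has to be measured per file.

```shell
# Map a BCJ filter name to its alignment (from the table above) and to
# a pb value with 2^pb equal to that alignment. x86 is the documented
# exception and keeps LZMA2's default pb=2.
bcj_pb() {
    case $1 in
        x86)                      align=1 ;;
        armthumb)                 align=2 ;;
        arm|arm64|powerpc|sparc)  align=4 ;;
        ia64)                     align=16 ;;
        *) echo "unknown BCJ filter: $1" >&2; return 1 ;;
    esac
    if [ "$1" = x86 ]; then
        pb=2
    else
        pb=0
        while [ $((1 << pb)) -lt "$align" ]; do pb=$((pb + 1)); done
    fi
    printf '%s\n' "--$1 --lzma2=pb=$pb"
}
bcj_pb ia64       # prints --ia64 --lzma2=pb=4
bcj_pb armthumb   # prints --armthumb --lzma2=pb=1
bcj_pb x86        # prints --x86 --lzma2=pb=2
```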
964
965 All BCJ filters support the same options:
966
967 start=offset
968 Specify the start offset that is used when converting be‐
969 tween relative and absolute addresses. The offset must
970 be a multiple of the alignment of the filter (see the ta‐
971 ble above). The default is zero. In practice, the de‐
972 fault is good; specifying a custom offset is almost never
973 useful.
974
975 --delta[=options]
976 Add the Delta filter to the filter chain. The Delta filter can
977 be only used as a non-last filter in the filter chain.
978
979 Currently only simple byte-wise delta calculation is supported.
980 It can be useful when compressing, for example, uncompressed
981 bitmap images or uncompressed PCM audio. However, special pur‐
982 pose algorithms may give significantly better results than Delta
983 + LZMA2. This is true especially with audio, which compresses
984 faster and better, for example, with flac(1).
985
986 Supported options:
987
988 dist=distance
989 Specify the distance of the delta calculation in bytes.
990 distance must be 1–256. The default is 1.
991
992 For example, with dist=2 and eight-byte input A1 B1 A2 B3
993 A3 B5 A4 B7, the output will be A1 B1 01 02 01 02 01 02.
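
The calculation can be reproduced with shell arithmetic. This is a sketch of byte-wise delta encoding on hex byte values, not the liblzma implementation; delta_encode is a hypothetical name, and bytes with no predecessor at the given distance are compared against zero.

```shell
# Byte-wise delta encoding: each output byte is the input byte minus
# the byte "dist" positions earlier, modulo 256.
# Usage: delta_encode DIST BYTE... (bytes as two-digit hex values)
delta_encode() {
    dist=$1; shift
    out=
    i=1
    for cur in "$@"; do
        if [ "$i" -le "$dist" ]; then
            prev=00                          # no earlier byte yet
        else
            eval "prev=\${$((i - dist))}"    # byte dist positions back
        fi
        byte=$(( (0x$cur - 0x$prev) & 0xFF ))
        out="$out$(printf '%02X' "$byte") "
        i=$((i + 1))
    done
    printf '%s\n' "${out% }"
}
delta_encode 2 A1 B1 A2 B3 A3 B5 A4 B7   # prints A1 B1 01 02 01 02 01 02
```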
994
995 Other options
996 -q, --quiet
997 Suppress warnings and notices. Specify this twice to suppress
998 errors too. This option has no effect on the exit status. That
999 is, even if a warning was suppressed, the exit status to indi‐
1000 cate a warning is still used.
1001
1002 -v, --verbose
1003 Be verbose. If standard error is connected to a terminal, xz
1004 will display a progress indicator. Specifying --verbose twice
1005 will give even more verbose output.
1006
1007 The progress indicator shows the following information:
1008
1009 • Completion percentage is shown if the size of the input file
1010 is known. That is, the percentage cannot be shown in pipes.
1011
1012 • Amount of compressed data produced (compressing) or consumed
1013 (decompressing).
1014
1015 • Amount of uncompressed data consumed (compressing) or pro‐
1016 duced (decompressing).
1017
1018 • Compression ratio, which is calculated by dividing the amount
1019 of compressed data processed so far by the amount of uncom‐
1020 pressed data processed so far.
1021
1022 • Compression or decompression speed. This is measured as the
1023 amount of uncompressed data consumed (compression) or pro‐
1024 duced (decompression) per second. It is shown after a few
1025 seconds have passed since xz started processing the file.
1026
1027 • Elapsed time in the format M:SS or H:MM:SS.
1028
1029 • Estimated remaining time is shown only when the size of the
1030 input file is known and a couple of seconds have already
1031 passed since xz started processing the file. The time is
1032 shown in a less precise format which never has any colons,
1033 for example, 2 min 30 s.
1034
1035 When standard error is not a terminal, --verbose will make xz
1036 print the filename, compressed size, uncompressed size, compres‐
1037 sion ratio, and possibly also the speed and elapsed time on a
1038 single line to standard error after compressing or decompressing
1039 the file. The speed and elapsed time are included only when the
1040 operation took at least a few seconds. If the operation didn't
1041 finish, for example, due to user interruption, also the comple‐
1042 tion percentage is printed if the size of the input file is
1043 known.
1044
1045 -Q, --no-warn
1046 Don't set the exit status to 2 even if a condition worth a warn‐
1047 ing was detected. This option doesn't affect the verbosity
1048 level, thus both --quiet and --no-warn have to be used to not
1049 display warnings and to not alter the exit status.
1050
1051 --robot
1052 Print messages in a machine-parsable format. This is intended
1053 to ease writing frontends that want to use xz instead of li‐
1054 blzma, which may be the case with various scripts. The output
1055 with this option enabled is meant to be stable across xz re‐
1056 leases. See the section ROBOT MODE for details.
1057
1058 --info-memory
1059 Display, in human-readable format, how much physical memory
1060 (RAM) and how many processor threads xz thinks the system has
1061 and the memory usage limits for compression and decompression,
1062 and exit successfully.
1063
1064 -h, --help
1065 Display a help message describing the most commonly used op‐
1066 tions, and exit successfully.
1067
1068 -H, --long-help
1069 Display a help message describing all features of xz, and exit
1070 successfully.
1071
1072 -V, --version
1073 Display the version number of xz and liblzma in human readable
1074 format. To get machine-parsable output, specify --robot before
1075 --version.
1076
1077 ROBOT MODE
1078 The robot mode is activated with the --robot option. It makes the out‐
1079 put of xz easier to parse by other programs. Currently --robot is sup‐
1080 ported only together with --version, --info-memory, and --list. It
1081 will be supported for compression and decompression in the future.
1082
1083 Version
1084 xz --robot --version prints the version number of xz and liblzma in the
1085 following format:
1086
1087 XZ_VERSION=XYYYZZZS
1088 LIBLZMA_VERSION=XYYYZZZS
1089
1090 X Major version.
1091
1092 YYY Minor version. Even numbers are stable. Odd numbers are alpha
1093 or beta versions.
1094
1095 ZZZ Patch level for stable releases or just a counter for develop‐
1096 ment releases.
1097
1098 S Stability. 0 is alpha, 1 is beta, and 2 is stable. S should
1099 always be 2 when YYY is even.
1100
1101 XYYYZZZS are the same on both lines if xz and liblzma are from the same
1102 XZ Utils release.
1103
1104 Examples: 4.999.9beta is 49990091 and 5.0.0 is 50000002.
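
A script can turn the XYYYZZZS number back into a readable version string with integer arithmetic. The function name decode_xz_version is hypothetical; the field layout is the one described above.

```shell
# Decode XYYYZZZS into "X.YYY.ZZZ" plus a stability suffix.
decode_xz_version() {
    v=$1
    major=$((v / 10000000))
    minor=$((v / 10000 % 1000))
    patch=$((v / 10 % 1000))
    case $((v % 10)) in
        0) suffix=alpha ;;
        1) suffix=beta ;;
        2) suffix= ;;          # stable releases get no suffix
        *) suffix=unknown ;;
    esac
    echo "$major.$minor.$patch$suffix"
}
decode_xz_version 49990091   # prints 4.999.9beta
decode_xz_version 50000002   # prints 5.0.0
```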
1105
1106 Memory limit information
1107 xz --robot --info-memory prints a single line with multiple tab-sepa‐
1108 rated columns:
1109
1110 1. Total amount of physical memory (RAM) in bytes.
1111
1112 2. Memory usage limit for compression in bytes (--memlimit-compress).
1113 A special value of 0 indicates the default setting which for sin‐
1114 gle-threaded mode is the same as no limit.
1115
1116 3. Memory usage limit for decompression in bytes (--memlimit-decom‐
1117 press). A special value of 0 indicates the default setting which
1118 for single-threaded mode is the same as no limit.
1119
1120 4. Since xz 5.3.4alpha: Memory usage for multi-threaded decompression
1121 in bytes (--memlimit-mt-decompress). This is never zero because a
1122 system-specific default value shown in the column 5 is used if no
1123 limit has been specified explicitly. This is also never greater
1124 than the value in the column 3 even if a larger value has been
1125 specified with --memlimit-mt-decompress.
1126
1127 5. Since xz 5.3.4alpha: A system-specific default memory usage limit
1128 that is used to limit the number of threads when compressing with
1129 an automatic number of threads (--threads=0) and no memory usage
1130 limit has been specified (--memlimit-compress). This is also used
1131 as the default value for --memlimit-mt-decompress.
1132
1133 6. Since xz 5.3.4alpha: Number of available processor threads.
1134
1135 In the future, the output of xz --robot --info-memory may have more
1136 columns, but never more than a single line.
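
The columns can be split into named fields with read. This is a sketch: parse_info_memory is a hypothetical helper, the sample line below is invented for illustration, and columns 4 to 6 are treated as optional because older xz versions do not print them.

```shell
# Split one line of "xz --robot --info-memory" output into named fields.
# Columns 4-6 exist only since xz 5.3.4alpha, so they may be empty.
parse_info_memory() {
    tab=$(printf '\t')
    IFS=$tab read -r total comp decomp mt_decomp mt_default threads rest
    echo "total_ram=$total"
    echo "limit_compress=$comp"
    echo "limit_decompress=$decomp"
    echo "threads=${threads:-unknown}"
}
# Hypothetical sample line; real use:
#   xz --robot --info-memory | parse_info_memory
printf '8589934592\t0\t0\t2147483648\t2147483648\t4\n' | parse_info_memory
```

The trailing rest variable absorbs any columns added by future xz versions.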
1137
1138 List mode
1139 xz --robot --list uses tab-separated output. The first column of every
1140 line has a string that indicates the type of the information found on
1141 that line:
1142
1143 name This is always the first line when starting to list a file. The
1144 second column on the line is the filename.
1145
1146 file This line contains overall information about the .xz file. This
1147 line is always printed after the name line.
1148
1149 stream This line type is used only when --verbose was specified. There
1150 are as many stream lines as there are streams in the .xz file.
1151
1152 block This line type is used only when --verbose was specified. There
1153 are as many block lines as there are blocks in the .xz file.
1154 The block lines are shown after all the stream lines; different
1155 line types are not interleaved.
1156
1157 summary
1158 This line type is used only when --verbose was specified twice.
1159 This line is printed after all block lines. Like the file line,
1160 the summary line contains overall information about the .xz
1161 file.
1162
1163 totals This line is always the very last line of the list output. It
1164 shows the total counts and sizes.
1165
1166 The columns of the file lines:
1167 2. Number of streams in the file
1168 3. Total number of blocks in the stream(s)
1169 4. Compressed size of the file
1170 5. Uncompressed size of the file
1171 6. Compression ratio, for example, 0.123. If ratio is over
1172 9.999, three dashes (---) are displayed instead of the ra‐
1173 tio.
1174 7. Comma-separated list of integrity check names. The follow‐
1175 ing strings are used for the known check types: None, CRC32,
1176 CRC64, and SHA-256. For unknown check types, Unknown-N is
1177 used, where N is the Check ID as a decimal number (one or
1178 two digits).
1179 8. Total size of stream padding in the file
1180
1181 The columns of the stream lines:
1182 2. Stream number (the first stream is 1)
1183 3. Number of blocks in the stream
1184 4. Compressed start offset
1185 5. Uncompressed start offset
1186 6. Compressed size (does not include stream padding)
1187 7. Uncompressed size
1188 8. Compression ratio
1189 9. Name of the integrity check
1190 10. Size of stream padding
1191
1192 The columns of the block lines:
1193 2. Number of the stream containing this block
1194 3. Block number relative to the beginning of the stream (the
1195 first block is 1)
1196 4. Block number relative to the beginning of the file
1197 5. Compressed start offset relative to the beginning of the
1198 file
1199 6. Uncompressed start offset relative to the beginning of the
1200 file
1201 7. Total compressed size of the block (includes headers)
1202 8. Uncompressed size
1203 9. Compression ratio
1204 10. Name of the integrity check
1205
1206 If --verbose was specified twice, additional columns are included on
1207 the block lines. These are not displayed with a single --verbose, be‐
1208 cause getting this information requires many seeks and can thus be
1209 slow:
1210 11. Value of the integrity check in hexadecimal
1211 12. Block header size
1212 13. Block flags: c indicates that compressed size is present,
1213 and u indicates that uncompressed size is present. If the
1214 flag is not set, a dash (-) is shown instead to keep the
1215 string length fixed. New flags may be added to the end of
1216 the string in the future.
1217 14. Size of the actual compressed data in the block (this ex‐
1218 cludes the block header, block padding, and check fields)
1219 15. Amount of memory (in bytes) required to decompress this
1220 block with this xz version
1221 16. Filter chain. Note that most of the options used at com‐
1222 pression time cannot be known, because only the options that
1223 are needed for decompression are stored in the .xz headers.
1224
1225 The columns of the summary lines:
1226 2. Amount of memory (in bytes) required to decompress this file
1227 with this xz version
1228 3. yes or no indicating if all block headers have both com‐
1229 pressed size and uncompressed size stored in them
1230 Since xz 5.1.2alpha:
1231 4. Minimum xz version required to decompress the file
1232
1233 The columns of the totals line:
1234 2. Number of streams
1235 3. Number of blocks
1236 4. Compressed size
1237 5. Uncompressed size
1238 6. Average compression ratio
1239 7. Comma-separated list of integrity check names that were
1240 present in the files
1241 8. Stream padding size
1242 9. Number of files. This is here to keep the order of the ear‐
1243 lier columns the same as on file lines.
1244
1245 If --verbose was specified twice, additional columns are included on
1246 the totals line:
1247 10. Maximum amount of memory (in bytes) required to decompress
1248 the files with this xz version
1249 11. yes or no indicating if all block headers have both com‐
1250 pressed size and uncompressed size stored in them
1251 Since xz 5.1.2alpha:
1252 12. Minimum xz version required to decompress the file
1253
1254 Future versions may add new line types and new columns can be added to
1255 the existing line types, but the existing columns won't be changed.
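
As a sketch of how the line types fit together, the awk program below remembers the filename from each name line and prints the bytes saved when it reaches the matching file line. list_savings is a hypothetical name and the sample input is invented; columns 4 and 5 are the compressed and uncompressed sizes listed above.

```shell
# Print "filename<TAB>bytes saved" for every file, using the name and
# file lines of "xz --robot --list" output read from standard input.
list_savings() {
    awk -F '\t' '
        $1 == "name" { name = $2 }
        $1 == "file" { printf "%s\t%d\n", name, $5 - $4 }
    '
}
# Hypothetical sample; real use: xz --robot --list *.xz | list_savings
printf 'name\tfoo.xz\nfile\t1\t1\t1200\t5000\t0.240\tCRC64\t0\ntotals\t1\t1\t1200\t5000\t0.240\tCRC64\t0\t1\n' \
    | list_savings
```

Matching on the first column rather than on column positions alone keeps the script working if new line types are added.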
1256
1257 EXIT STATUS
1258 0 All is good.
1259
1260 1 An error occurred.
1261
1262 2 Something worth a warning occurred, but no actual errors oc‐
1263 curred.
1264
1265 Notices (not warnings or errors) printed on standard error don't affect
1266 the exit status.
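
A script can branch on these values. The helper below is a minimal sketch with a hypothetical name, describe_xz_status; the catch-all branch covers statuses not listed above, such as those reported by the shell when xz is killed by a signal.

```shell
# Translate an xz exit status into a short description.
describe_xz_status() {
    case $1 in
        0) echo "ok" ;;
        1) echo "error" ;;
        2) echo "warning" ;;
        *) echo "unexpected status $1" ;;
    esac
}
# Typical use (assumes a file named foo.xz exists):
#   xz -dk foo.xz; describe_xz_status $?
describe_xz_status 2   # prints warning
```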
1267
1268 ENVIRONMENT
1269 xz parses space-separated lists of options from the environment vari‐
1270 ables XZ_DEFAULTS and XZ_OPT, in this order, before parsing the options
1271 from the command line. Note that only options are parsed from the en‐
1272 vironment variables; all non-options are silently ignored. Parsing is
1273 done with getopt_long(3) which is used also for the command line argu‐
1274 ments.
1275
1276 XZ_DEFAULTS
1277 User-specific or system-wide default options. Typically this is
1278 set in a shell initialization script to enable xz's memory usage
1279 limiter by default. Excluding shell initialization scripts and
1280 similar special cases, scripts must never set or unset XZ_DE‐
1281 FAULTS.
1282
1283 XZ_OPT This is for passing options to xz when it is not possible to set
1284 the options directly on the xz command line. This is the case
1285 when xz is run by a script or tool, for example, GNU tar(1):
1286
1287 XZ_OPT=-2v tar caf foo.tar.xz foo
1288
1289 Scripts may use XZ_OPT, for example, to set script-specific de‐
1290 fault compression options. It is still recommended to allow
1291 users to override XZ_OPT if that is reasonable. For example, in
1292 sh(1) scripts one may use something like this:
1293
1294 XZ_OPT=${XZ_OPT-"-7e"}
1295 export XZ_OPT
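
The difference between an unset and an empty XZ_OPT matters with this idiom. The sketch below uses a hypothetical helper, effective_xz_opt, to show which value would be exported in each case; the default -7e is the one from the example above.

```shell
# ${XZ_OPT-default} substitutes the default only when XZ_OPT is unset;
# a value set by the user, even an empty one, is kept.
effective_xz_opt() {
    default=-7e
    printf '%s\n' "${XZ_OPT-$default}"
}
unset XZ_OPT;  effective_xz_opt   # prints -7e
XZ_OPT=-2v;    effective_xz_opt   # prints -2v
XZ_OPT=;       effective_xz_opt   # prints an empty line
unset XZ_OPT
```

Using ${XZ_OPT:-default} instead would also replace an empty value, which takes away the user's ability to clear the script's defaults.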
1296
1297 LZMA UTILS COMPATIBILITY
1298 The command line syntax of xz is practically a superset of lzma, un‐
1299 lzma, and lzcat as found from LZMA Utils 4.32.x. In most cases, it is
1300 possible to replace LZMA Utils with XZ Utils without breaking existing
1301 scripts. There are some incompatibilities though, which may sometimes
1302 cause problems.
1303
1304 Compression preset levels
1305 The numbering of the compression level presets is not identical in xz
1306 and LZMA Utils. The most important difference is how dictionary sizes
1307 are mapped to different presets. Dictionary size is roughly equal to
1308 the decompressor memory usage.
1309
1310 Level   xz        LZMA Utils
1311 -0      256 KiB   N/A
1312 -1      1 MiB     64 KiB
1313 -2      2 MiB     1 MiB
1314 -3      4 MiB     512 KiB
1315 -4      4 MiB     1 MiB
1316 -5      8 MiB     2 MiB
1317 -6      8 MiB     4 MiB
1318 -7      16 MiB    8 MiB
1319 -8      32 MiB    16 MiB
1320 -9      64 MiB    32 MiB
1321
1322 The dictionary size differences affect the compressor memory usage too,
1323 but there are some other differences between LZMA Utils and XZ Utils,
1324 which make the difference even bigger:
1325
1326 Level   xz        LZMA Utils 4.32.x
1327 -0      3 MiB     N/A
1328 -1      9 MiB     2 MiB
1329 -2      17 MiB    12 MiB
1330 -3      32 MiB    12 MiB
1331 -4      48 MiB    16 MiB
1332 -5      94 MiB    26 MiB
1333 -6      94 MiB    45 MiB
1334 -7      186 MiB   83 MiB
1335 -8      370 MiB   159 MiB
1336 -9      674 MiB   311 MiB
1337
1338 The default preset level in LZMA Utils is -7 while in XZ Utils it is
1339 -6, so both use an 8 MiB dictionary by default.
1340
1341 Streamed vs. non-streamed .lzma files
1342 The uncompressed size of the file can be stored in the .lzma header.
1343 LZMA Utils does that when compressing regular files. The alternative
1344 is to mark that uncompressed size is unknown and use end-of-payload
1345 marker to indicate where the decompressor should stop. LZMA Utils uses
1346 this method when uncompressed size isn't known, which is the case, for
1347 example, in pipes.
1348
1349 xz supports decompressing .lzma files with or without end-of-payload
1350 marker, but all .lzma files created by xz will use end-of-payload
1351 marker and have uncompressed size marked as unknown in the .lzma
1352 header. This may be a problem in some uncommon situations. For exam‐
1353 ple, a .lzma decompressor in an embedded device might work only with
1354 files that have known uncompressed size. If you hit this problem, you
1355 need to use LZMA Utils or LZMA SDK to create .lzma files with known un‐
1356 compressed size.
1357
1358 Unsupported .lzma files
1359 The .lzma format allows lc values up to 8, and lp values up to 4. LZMA
1360 Utils can decompress files with any lc and lp, but always creates files
1361 with lc=3 and lp=0. Creating files with other lc and lp is possible
1362 with xz and with LZMA SDK.
1363
1364 The implementation of the LZMA1 filter in liblzma requires that the sum
1365 of lc and lp must not exceed 4. Thus, .lzma files, which exceed this
1366 limitation, cannot be decompressed with xz.
1367
1368 LZMA Utils creates only .lzma files which have a dictionary size of 2^n
1369 (a power of 2) but accepts files with any dictionary size. liblzma ac‐
1370 cepts only .lzma files which have a dictionary size of 2^n or 2^n +
1371 2^(n-1). This is to decrease false positives when detecting .lzma
1372 files.
1373
1374 These limitations shouldn't be a problem in practice, since practically
1375 all .lzma files have been compressed with settings that liblzma will
1376 accept.
1377
1378 Trailing garbage
1379 When decompressing, LZMA Utils silently ignores everything after the
1380 first .lzma stream. In most situations, this is a bug. This also
1381 means that LZMA Utils doesn't support decompressing concatenated .lzma
1382 files.
1383
1384 If there is data left after the first .lzma stream, xz considers the
1385 file to be corrupt unless --single-stream was used. This may break ob‐
1386 scure scripts which have assumed that trailing garbage is ignored.
1387
1388 NOTES
1389 Compressed output may vary
1390 The exact compressed output produced from the same uncompressed input
1391 file may vary between XZ Utils versions even if compression options are
1392 identical. This is because the encoder can be improved (faster or bet‐
1393 ter compression) without affecting the file format. The output can
1394 vary even between different builds of the same XZ Utils version, if
1395 different build options are used.
1396
1397 The above means that once --rsyncable has been implemented, the result‐
1398 ing files won't necessarily be rsyncable unless both old and new files
1399 have been compressed with the same xz version. This problem can be
1400 fixed if a part of the encoder implementation is frozen to keep rsynca‐
1401 ble output stable across xz versions.
1402
1403 Embedded .xz decompressors
1404 Embedded .xz decompressor implementations like XZ Embedded don't neces‐
1405 sarily support files created with integrity check types other than none
1406 and crc32. Since the default is --check=crc64, you must use
1407 --check=none or --check=crc32 when creating files for embedded systems.
1408
1409 Outside embedded systems, all .xz format decompressors support all the
1410 check types, or at least are able to decompress the file without veri‐
1411 fying the integrity check if the particular check is not supported.
1412
1413 XZ Embedded supports BCJ filters, but only with the default start off‐
1414 set.
1415
1416 EXAMPLES
1417 Basics
1418 Compress the file foo into foo.xz using the default compression level
1419 (-6), and remove foo if compression is successful:
1420
1421 xz foo
1422
1423 Decompress bar.xz into bar and don't remove bar.xz even if decompres‐
1424 sion is successful:
1425
1426 xz -dk bar.xz
1427
1428 Create baz.tar.xz with the preset -4e (-4 --extreme), which is slower
1429 than the default -6, but needs less memory for compression and decom‐
1430 pression (48 MiB and 5 MiB, respectively):
1431
1432 tar cf - baz | xz -4e > baz.tar.xz
1433
1434 A mix of compressed and uncompressed files can be decompressed to stan‐
1435 dard output with a single command:
1436
1437 xz -dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt

Parallel compression of many files
On GNU and *BSD, find(1) and xargs(1) can be used to parallelize
compression of many files:

    find . -type f \! -name '*.xz' -print0 \
        | xargs -0r -P4 -n16 xz -T1

The -P option to xargs(1) sets the number of parallel xz processes.
The best value for the -n option depends on how many files there are to
be compressed. If there are only a couple of files, the value should
probably be 1; with tens of thousands of files, 100 or even more may be
appropriate to reduce the number of xz processes that xargs(1) will
eventually create.

The option -T1 for xz is there to force it to single-threaded mode,
because xargs(1) is used to control the amount of parallelization.
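
One way to turn the advice about -n into a number is sketched here. The
function pick_batch_size and its "about four batches per process" rule
are assumptions made for illustration. They do not come from xz or
xargs(1); only the 1 to 100 range is taken from the text above.

```shell
# pick_batch_size NFILES NPROCS
# Hypothetical heuristic: aim for roughly four batches per parallel
# process, then clamp the result to the range 1..100.
pick_batch_size() {
    nfiles=$1
    nprocs=$2
    batch=$((nfiles / (nprocs * 4)))
    if [ "$batch" -lt 1 ]; then
        batch=1
    fi
    if [ "$batch" -gt 100 ]; then
        batch=100
    fi
    echo "$batch"
}

# Possible wiring (GNU or *BSD find/xargs, as in the text above):
#   n=$(find . -type f ! -name '*.xz' | wc -l)
#   find . -type f ! -name '*.xz' -print0 \
#       | xargs -0r -P4 -n"$(pick_batch_size $n 4)" xz -T1
```

With 4 processes this gives a batch size of 1 for a handful of files
and 100 for tens of thousands of files. A different divisor may suit
workloads where file sizes vary a lot.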

Robot mode
Calculate how many bytes have been saved in total after compressing
multiple files:

    xz --robot --list *.xz | awk '/^totals/{print $5-$4}'
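
The same totals line can also yield a percentage. The sketch below
assumes the column layout used by the command above (column 4 is the
compressed size, column 5 the uncompressed size, tab-separated). The
function name saved_summary and the sample line are made up for
illustration; the sample is not real xz output.

```shell
# saved_summary: hypothetical helper. Reads "xz --robot --list" output
# on standard input and prints the saved bytes and the percentage saved,
# based on the totals line.
saved_summary() {
    awk -F '\t' '$1 == "totals" && $5 > 0 {
        printf "%.0f bytes saved (%.1f%%)\n", $5 - $4, ($5 - $4) * 100 / $5
    }'
}

# Made-up sample line, for illustration only:
printf 'totals\t2\t2\t2500\t10000\t0.250\tCRC64\t0\t2\n' | saved_summary
```

In real use the input would come from xz --robot --list *.xz instead of
the printf line.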

A script may want to know that it is using a new enough xz. The
following sh(1) script checks that the version number of the xz tool is
at least 5.0.0. This method is compatible with old beta versions, which
didn't support the --robot option:

    if ! eval "$(xz --robot --version 2> /dev/null)" ||
            [ "$XZ_VERSION" -lt 50000002 ]; then
        echo "Your xz is too old."
    fi
    unset XZ_VERSION LIBLZMA_VERSION
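
The constant 50000002 follows the XYYYZZZS layout of XZ_VERSION: X is
the major version, YYY the minor version, ZZZ the patch level, and S
the stability (0 alpha, 1 beta, 2 stable). As a rough sketch, a number
in that layout can be decoded as follows. The function name
xz_version_string is hypothetical.

```shell
# xz_version_string NUMBER: hypothetical helper. Converts an XYYYZZZS
# version number into "major.minor.patch", with "alpha" or "beta"
# appended for development versions.
xz_version_string() {
    v=$1
    major=$((v / 10000000))
    minor=$((v / 10000 % 1000))
    patch=$((v / 10 % 1000))
    case $((v % 10)) in
        0) suffix=alpha ;;
        1) suffix=beta ;;
        *) suffix= ;;
    esac
    echo "$major.$minor.$patch$suffix"
}
```

Because the fields are ordered from most to least significant, a plain
numeric comparison such as -lt 50000002 is enough to test for a minimum
version.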

Set a memory usage limit for decompression using XZ_OPT, but if a limit
has already been set, don't increase it:

    NEWLIM=$((123 << 20))  # 123 MiB
    OLDLIM=$(xz --robot --info-memory | cut -f3)
    if [ "$OLDLIM" -eq 0 ] || [ "$OLDLIM" -gt "$NEWLIM" ]; then
        XZ_OPT="$XZ_OPT --memlimit-decompress=$NEWLIM"
        export XZ_OPT
    fi
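
The decision in that script is: a value of 0 means no limit is set, and
otherwise the smaller of the two limits wins. A minimal sketch of just
that logic, with a hypothetical function name pick_memlimit, might look
like this:

```shell
# pick_memlimit OLD NEW: hypothetical helper. Prints the limit that
# should be in effect: NEW if no limit was set (OLD is 0) or if OLD is
# larger than NEW, otherwise the existing, stricter OLD.
pick_memlimit() {
    old=$1
    new=$2
    if [ "$old" -eq 0 ] || [ "$old" -gt "$new" ]; then
        echo "$new"
    else
        echo "$old"
    fi
}
```

Keeping the comparison in a function makes it easy to check the edge
cases (no limit, lower limit, higher limit) without running xz.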

Custom compressor filter chains
The simplest use for custom filter chains is customizing an LZMA2
preset. This can be useful, because the presets cover only a subset of
the potentially useful combinations of compression settings.

The CompCPU columns of the tables from the descriptions of the options
-0 ... -9 and --extreme are useful when customizing LZMA2 presets.
Here are the relevant parts collected from those two tables:

    Preset   CompCPU
     -0         0
     -1         1
     -2         2
     -3         3
     -4         4
     -5         5
     -6         6
     -5e        7
     -6e        8

If you know that a file requires a somewhat big dictionary (for
example, 32 MiB) to compress well, but you want to compress it quicker
than xz -8 would do, a preset with a low CompCPU value (for example, 1)
can be modified to use a bigger dictionary:

    xz --lzma2=preset=1,dict=32MiB foo.tar

With certain files, the above command may be faster than xz -6 while
compressing significantly better. However, it must be emphasized that
only some files benefit from a big dictionary while keeping the CompCPU
value low. The most obvious situation where a big dictionary can help
a lot is an archive containing very similar files of at least a few
megabytes each. The dictionary size has to be significantly bigger
than any individual file to allow LZMA2 to take full advantage of the
similarities between consecutive files.

If very high compressor and decompressor memory usage is fine, and the
file being compressed is at least several hundred megabytes, it may be
useful to use an even bigger dictionary than the 64 MiB that xz -9
would use:

    xz -vv --lzma2=dict=192MiB big_foo.tar

Using -vv (--verbose --verbose) like in the above example can be useful
to see the memory requirements of the compressor and decompressor.
Remember that using a dictionary bigger than the size of the
uncompressed file is a waste of memory, so the above command isn't
useful for small files.
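
A script can avoid that waste by capping the requested dictionary at
the input size. The sketch below is an illustration only: capped_dict
is a hypothetical name, and the lower bound of 4096 bytes reflects the
4 KiB minimum dictionary size of LZMA2.

```shell
# capped_dict FILE_BYTES WANTED_BYTES: hypothetical helper. Prints a
# dictionary size in bytes that is no larger than the uncompressed
# input, but not below 4096 (4 KiB).
capped_dict() {
    size=$1
    dict=$2
    if [ "$dict" -gt "$size" ]; then
        dict=$size
    fi
    if [ "$dict" -lt 4096 ]; then
        dict=4096
    fi
    echo "$dict"
}

# Possible use (file name taken from the example above):
#   d=$(capped_dict $(wc -c < big_foo.tar) $((192 << 20)))
#   xz -vv --lzma2=dict="$d" big_foo.tar
```

The result is a plain byte count, which the dict option accepts without
a suffix.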

Sometimes the compression time doesn't matter, but the decompressor
memory usage has to be kept low, for example, to make it possible to
decompress the file on an embedded system. The following command uses
-6e (-6 --extreme) as a base and sets the dictionary to only 64 KiB.
The resulting file can be decompressed with XZ Embedded (that's why
there is --check=crc32) using about 100 KiB of memory.

    xz --check=crc32 --lzma2=preset=6e,dict=64KiB foo

If you want to squeeze out as many bytes as possible, adjusting the
number of literal context bits (lc) and number of position bits (pb)
can sometimes help. Adjusting the number of literal position bits (lp)
might help too, but usually lc and pb are more important. For example,
a source code archive contains mostly US-ASCII text, so something like
the following might give a slightly (about 0.1 %) smaller file than
xz -6e (try also without lc=4):

    xz --lzma2=preset=6e,pb=0,lc=4 source_code.tar

Using another filter together with LZMA2 can improve compression with
certain file types. For example, to compress an x86-32 or x86-64 shared
library using the x86 BCJ filter:

    xz --x86 --lzma2 libfoo.so

Note that the order of the filter options is significant. If --x86 is
specified after --lzma2, xz will give an error, because there cannot be
any filter after LZMA2, and also because the x86 BCJ filter cannot be
used as the last filter in the chain.
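
The two ordering rules can be expressed as a small check. This is a
simplified sketch, not how xz validates filter chains: check_chain is a
hypothetical name, and it treats LZMA2 as the only filter allowed in
the last position.

```shell
# check_chain FILTER...: hypothetical helper that applies only the two
# ordering rules stated above. "lzma2" stands for LZMA2; any other name
# (for example x86 or delta) is treated as a filter that cannot be last.
check_chain() {
    [ "$#" -gt 0 ] || return 1
    while [ "$#" -gt 1 ]; do
        # No filter may follow LZMA2, so it cannot appear before the end.
        [ "$1" != lzma2 ] || return 1
        shift
    done
    # The last filter in this sketch must be LZMA2.
    [ "$1" = lzma2 ]
}
```

Here check_chain x86 lzma2 succeeds, while check_chain lzma2 x86 fails,
mirroring the error described above.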

The Delta filter together with LZMA2 can give good results with bitmap
images. It should usually beat PNG, which has a few more advanced
filters than simple delta but uses Deflate for the actual compression.

The image has to be saved in uncompressed format, for example, as
uncompressed TIFF. The distance parameter of the Delta filter is set to
match the number of bytes per pixel in the image. For example, a 24-bit
RGB bitmap needs dist=3, and it is also good to pass pb=0 to LZMA2 to
accommodate the three-byte alignment:

    xz --delta=dist=3 --lzma2=pb=0 foo.tiff
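
Deriving the distance from the pixel format is simple arithmetic. The
sketch below assumes pixels that occupy a whole number of bytes; the
function name delta_dist_for_bpp is hypothetical, and the 1 to 256
bound is the range of distances that the Delta filter accepts.

```shell
# delta_dist_for_bpp BITS_PER_PIXEL: hypothetical helper. Prints the
# Delta filter distance in bytes, or fails if the pixel size is not a
# whole number of bytes or the distance is outside 1..256.
delta_dist_for_bpp() {
    bits=$1
    [ $((bits % 8)) -eq 0 ] || return 1
    dist=$((bits / 8))
    if [ "$dist" -lt 1 ] || [ "$dist" -gt 256 ]; then
        return 1
    fi
    echo "$dist"
}

# Possible use for the 24-bit example above:
#   xz --delta=dist="$(delta_dist_for_bpp 24)" --lzma2=pb=0 foo.tiff
```

The pb value is left out of the helper on purpose, since the best
choice depends on the alignment of the data.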

If multiple images have been put into a single archive (for example,
.tar), the Delta filter will work on that too as long as all images
have the same number of bytes per pixel.

SEE ALSO
xzdec(1), xzdiff(1), xzgrep(1), xzless(1), xzmore(1), gzip(1),
bzip2(1), 7z(1)

XZ Utils: <https://tukaani.org/xz/>
XZ Embedded: <https://tukaani.org/xz/embedded.html>
LZMA SDK: <https://7-zip.org/sdk.html>

Tukaani 2023-07-17 XZ(1)