1SoX(1) Sound eXchange SoX(1)
2
3
4
5 NAME
6 SoX - Sound eXchange - The Swiss Army knife of audio manipulation
7
8 SYNOPSIS
9 sox [global-options] [format-options] infile1
10 [[format-options] infile2] ... [format-options] outfile
11 [effect [effect-options]] ...
12
13 play [global-options] [format-options] infile1
14 [[format-options] infile2] ... [format-options]
15 [effect [effect-options]] ...
16
17 rec [global-options] [format-options] outfile
18 [effect [effect-options]] ...
19
20 DESCRIPTION
21 SoX reads and writes audio files in most popular formats and can
22 optionally apply effects to them; it can combine multiple input
23 sources, synthesise audio, and, on many systems, act as a general pur‐
24 pose audio player or a multi-track audio recorder.
25
26 The entire SoX functionality is available using just the `sox' command;
27 however, to simplify playing and recording audio, if SoX is invoked as
28 `play', the output file is automatically set to be the default sound
29 device, and if invoked as `rec', the default sound device is used as an
30 input source.
31
32 The heart of SoX is a library called `Sound Tools'. Those interested
33 in extending SoX or using it in other programs should refer to the
34 Sound Tools manual page: libst(3).
35
36 The overall SoX processing chain can be summarised as follows:
37
38 Input(s) → Balancing → Combiner → Effects → Output
39
40 To show how this works in practice, here are some examples of how SoX
41 might be used. The simple:
42
43 sox recital.au recital.wav
44
45 translates an audio file in Sun AU format to a Microsoft WAV file,
46 whilst:
47
48 sox recital.au -r 12000 -b -c 1 recital.wav vol 0.7 dither
49
50 performs the same format translation, but also changes the audio sam‐
51 pling rate & sample size, down-mixes to mono, and applies the vol and
52 dither effects.
53
54 sox -r 8000 -u -b -c 1 voice-memo.raw voice-memo.wav
55
56 adds a header to a raw audio file,
57
58 sox slow.aiff fixed.aiff speed 1.027 rabbit -c0
59
60 adjusts audio speed using the most accurate rabbit algorithm,
61
62 sox short.au long.au longer.au
63
64 concatenates two audio files, and
65
66 sox -m music.mp3 voice.wav mixed.flac
67
68 mixes together two audio files.
69
70 play "The Moonbeams/Greatest/*.ogg" bass +3
71
72 plays a collection of audio files whilst applying a bass boosting
73 effect,
74
75 play -c 4 -n -c 1 synth sin %-12 sin %-9 sin %-5 sin %-2 vol 0.7
76 mixer fade q 0.1 1 0.1
77
78 plays a synthesised `A minor seventh' chord with a pipe-organ sound,
79
80 rec -c 2 test.aiff trim 0 10
81
82 records 10 seconds of stereo audio, and
83
84 rec -M take1.aiff take1-dub.aiff
85
86 records a new track in a multi-track recording.
87
88 Further examples are included throughout this manual; more-detailed
89 examples can be found in the separate soxexam(7) manual.
90
91 File Formats
92 There are two types of audio file format that SoX can work with. The
93 first is `self-describing'; these formats include a header that com‐
94 pletely describes the characteristics of the audio data that follows.
95 The second type is `headerless' (or `raw data'); here, the audio data
96 characteristics must be described using the SoX command line.
97
98 The following four characteristics are sufficient to describe the for‐
99 mat of audio data such that it can be processed with SoX:
100
101 sample rate
102 The sample rate in samples per second (`Hertz' or `Hz'). For
103 example, digital telephony traditionally uses a sample rate of
104 8000 Hz (8 kHz); audio Compact Discs use 44100 Hz (44.1 kHz).
105
106 sample size
107 The number of bits used to store each sample. Most popular are
108 8-bit (one byte) and 16-bit (two bytes). (Since many now-common
109 sound formats were invented when most computers used a 16-bit
110 word, two bytes is often called a `word', but since current per‐
111 sonal computers overwhelmingly have 32-bit or 64-bit words, this
112 usage is confusing, and is not used in the SoX documentation.)
113
114 data encoding
115 The way in which each audio sample is represented (or
116 `encoded'). Some encodings have variants with different
117 byte-orderings or bit-orderings; some `compress' the audio data, so that
118 the stored audio data takes up less space (disk space or
119 transmission bandwidth) than the other format parameters and
120 the number of samples would imply. Commonly-used encoding types
121 include floating-point, μ-law, ADPCM, signed linear, and FLAC.
122
123 channels
124 The number of audio channels contained in the file. One
125 (`mono') and two (`stereo') are widely used.
126
127 The term `bit-rate' is sometimes used as an overall measure of an audio
128 format and may incorporate elements of all of the above.
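
For uncompressed audio, the bit-rate follows directly from the first, second, and fourth characteristics above. The following is a minimal Python sketch of that arithmetic; the function name `pcm_bit_rate` is made up for illustration and is not part of SoX.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the bit-rate, in bits per second, of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels


# Digital telephony: 8000 Hz, 8-bit samples, mono.
print(pcm_bit_rate(8000, 8, 1))    # 64000

# Audio Compact Disc: 44100 Hz, 16-bit samples, stereo.
print(pcm_bit_rate(44100, 16, 2))  # 1411200
```

A compressing data encoding stores the same audio at a lower bit-rate than this figure, which is why the encoding also has to be known before the file can be read.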
129
130 Most self-describing formats also allow textual `comments' to be embed‐
131 ded in the file that can be used to describe the audio in some way,
132 e.g. for music, the title, the author, etc.
133
134 One important use of audio file comments is to convey `Replay Gain'
135 information. SoX supports applying Replay Gain information, but not
136 generating it. Note that by default, SoX copies input file comments to
137 output files that support comments, so output files may contain Replay
138 Gain information if some was present in the input file. In this case,
139 if anything other than a simple format conversion was performed then
140 the output file Replay Gain information is likely to be incorrect and
141 so should be recalculated using a tool that supports this (not SoX).
142
143 Determining & Setting The File Format
144 There are several mechanisms available for SoX to use to determine or
145 set the format characteristics of an audio file. Depending on the cir‐
146 cumstances, individual characteristics may be determined or set using
147 different mechanisms.
148
149 To determine the format of an input file, SoX will use, in order of
150 precedence and as given or available:
151
152
153 1. Command-line format options.
154 2. The contents of the file header.
155 3. The filename extension.
156
157 To set the output file format, SoX will use, in order of precedence and
158 as given or available:
159
160
161 1. Command-line format options.
162 2. The filename extension.
163 3. The input file format characteristics, or the closest to
164 them that is supported by the output file type.
165
166 For all files, SoX will exit with an error if the file type cannot be
167 determined; command-line format options may need to be added or changed
168 to resolve the problem.
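
The order of precedence for an input file can be sketched as a simple fallback chain. This Python fragment is an illustration only: `resolve_input_type` is a hypothetical helper, it handles just the file type rather than each characteristic separately, and it is not how SoX itself is implemented.

```python
import os
from typing import Optional


def resolve_input_type(option_type: Optional[str],
                       header_type: Optional[str],
                       filename: str) -> str:
    """Pick an input file type using the documented order of precedence."""
    # 3. The filename extension, without its leading dot (may be empty).
    extension = os.path.splitext(filename)[1].lstrip(".").lower()

    # 1. command-line option, 2. file header, 3. filename extension.
    for candidate in (option_type, header_type, extension):
        if candidate:
            return candidate

    # Mirrors the documented behaviour of exiting with an error.
    raise ValueError(f"cannot determine file type of {filename!r}")


print(resolve_input_type("raw", "wav", "memo.au"))  # raw
print(resolve_input_type(None, None, "memo.au"))    # au
```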
169
170 Accuracy
171 Many file formats that compress audio discard some of the audio signal
172 information whilst doing so; converting to such a format then convert‐
173 ing back again will not produce an exact copy of the original audio.
174 This is the case for many formats used in telephony (e.g. A-law, GSM)
175 where low signal bandwidth is more important than high audio fidelity,
176 and for many formats used in portable music players (e.g. MP3, Vorbis)
177 where adequate fidelity can be retained even with the large compression
178 ratios that are needed to make portable players practical.
179
180 Formats that discard audio signal information are called `lossy', and
181 formats that do not, `lossless'. The term `quality' is used as a mea‐
182 sure of how closely the original audio signal can be reproduced when
183 using a lossy format.
184
185 Audio file conversion with SoX is lossless when it can be, i.e. when
186 not using lossy compression, when not reducing the sampling rate or
187 number of channels, and when the number of bits used in the destination
188 format is not less than in the source format. E.g. converting from an
189 8-bit PCM format to a 16-bit PCM format is lossless but converting from
190 an 8-bit PCM format to (8-bit) A-law isn't.
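
The bit-depth condition can be checked with a small round-trip experiment. The Python sketch below assumes that widening and narrowing are done by plain bit shifts on signed integers; that is an assumption made for illustration, not a description of SoX's internal conversion.

```python
def widen_8_to_16(sample: int) -> int:
    """Scale a signed 8-bit sample up to the signed 16-bit range."""
    return sample << 8


def narrow_16_to_8(sample: int) -> int:
    """Scale a signed 16-bit sample down, discarding the low 8 bits."""
    return sample >> 8


# 8-bit -> 16-bit -> 8-bit: every possible sample survives unchanged.
lossless = all(narrow_16_to_8(widen_8_to_16(s)) == s for s in range(-128, 128))
print(lossless)  # True

# 16-bit -> 8-bit -> 16-bit: the discarded low bits cannot be recovered.
print(widen_8_to_16(narrow_16_to_8(12345)))  # 12288
```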
191
192 N.B. SoX converts all audio files to an internal uncompressed format
193 before performing any audio processing; this means that manipulating a
194 file that is stored in a lossy format can cause further losses in audio
195 fidelity. E.g. with
196
197 sox long.mp3 short.mp3 trim 10
198
199 SoX first decompresses the input MP3 file, then applies the trim
200 effect, and finally creates the output MP3 file by recompressing the
201 audio - with a possible reduction in fidelity above that which occurred
202 when the input file was created. Hence, if what is ultimately desired
203 is lossily compressed audio, it is highly recommended to perform all
204 audio processing using lossless file formats and then convert to the
205 lossy format at the final stage.
206
207 N.B. Applying multiple effects with a single SoX invocation will, in
208 general, produce more accurate results than those produced using multi‐
209 ple SoX invocations; hence this is also recommended.
210
211 Clipping
212 Clipping is distortion that occurs when an audio signal level (or `vol‐
213 ume') exceeds the range of the chosen representation. It is nearly
214 always undesirable and so should usually be corrected by adjusting the
215 volume prior to the point at which clipping occurs.
216
217 In SoX, clipping could occur, as you might expect, when using the vol
218 effect to increase the audio volume, but could also occur with many
219 other effects, when converting one format to another, and even when
220 simply playing the audio.
221
222 Playing an audio file often involves re-sampling, and processing by
223 analogue components that can introduce a small DC offset and/or ampli‐
224 fication, all of which can produce distortion if the audio signal level
225 was initially too close to the clipping point.
226
227 For these reasons, it is usual to make sure that an audio file's signal
228 level does not exceed around 70% of the maximum (linear) range avail‐
229 able, as this will avoid the majority of clipping problems. SoX's stat
230 effect can assist in determining the signal level in an audio file; the
231 vol effect can be used to prevent clipping, e.g.
232
233 sox dull.au bright.au vol -6 dB treble +6
234
235 guarantees that the treble boost will not clip.
236
237 If clipping occurs at any point during processing, then SoX will dis‐
238 play a warning message to that effect.
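
The decibel figures used with the vol effect and the 70% guideline above are related by the standard amplitude formula, gain = 10^(dB/20). The Python sketch below shows the conversion in both directions; the function names are illustrative only.

```python
import math


def db_to_amplitude(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def amplitude_to_db(factor: float) -> float:
    """Convert a positive linear amplitude factor to decibels."""
    return 20 * math.log10(factor)


# A 6 dB cut roughly halves the amplitude.
print(round(db_to_amplitude(-6), 3))   # 0.501

# Keeping peaks at 70% of full scale leaves about 3 dB of headroom.
print(round(amplitude_to_db(0.7), 1))  # -3.1
```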
239
240 Input File Combining
241 SoX's input combiner can combine multiple files using one of four dif‐
242 ferent methods: `concatenate', `sequence', `mix', or `merge'. The
243 default method is `sequence' for play, and `concatenate' for rec and
244 sox.
245
246 For all methods other than `sequence', multiple input files must have
247 the same sampling rate; if necessary, separate SoX invocations can be
248 used to make sampling rate adjustments prior to combining.
249
250 If the `concatenate' combining method is selected (usually, this will
251 be by default) then the input files must also have the same number of
252 channels. The audio from each input will be concatenated in the order
253 given to form the output file.
254
255 The `sequence' combining method is selected automatically for play. It
256 is similar to `concatenate' in that the audio from each input file is
257 sent serially to the output file; however, here the output file may be
258 closed and reopened at the corresponding transition between input files
259 - this may be just what is needed when sending audio to an output
260 device, but is not generally useful when the output file is a normal
261 file.
262
263 If the `mix' combining method is selected (with -m) then two or more
264 input files must be given and will be mixed together to form the output
265 file. The number of channels in each input file need not be the same;
266 however, SoX will issue a warning if they are not, and some channels in
267 the output file will not contain audio from every input file. A mixed
268 audio file cannot be un-mixed.
269
270 If the `merge' combining method is selected (with -M), then two or more
271 input files must be given and will be merged together to form the out‐
272 put file. The number of channels in each input file need not be the
273 same. A merged audio file comprises all of the channels from all of
274 the input files; un-merging is possible using multiple invocations of
275 SoX with the mixer effect. For example, two mono files could be merged
276 to form one stereo file; the first and second mono files would become
277 the left and right channels of the stereo file.
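
The two-mono-files example can be pictured as interleaving samples into stereo frames. This Python sketch models each input as a list of samples and requires both inputs to have the same length; that restriction is an assumption of the sketch, not a documented SoX rule.

```python
from typing import List, Sequence, Tuple


def merge_mono(left: Sequence[float],
               right: Sequence[float]) -> List[Tuple[float, float]]:
    """Combine two mono inputs into a list of (left, right) stereo frames."""
    if len(left) != len(right):
        raise ValueError("this sketch requires inputs of equal length")
    return list(zip(left, right))


def unmerge(frames: Sequence[Tuple[float, float]], channel: int) -> List[float]:
    """Extract one channel from stereo frames, recovering a mono input."""
    return [frame[channel] for frame in frames]


stereo = merge_mono([0.1, 0.2], [-0.1, -0.2])
print(stereo)              # [(0.1, -0.1), (0.2, -0.2)]
print(unmerge(stereo, 0))  # [0.1, 0.2]
```

Because every input channel is kept intact, merging can be undone, unlike mixing.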
278
279 When combining input files, SoX applies any specified effects (includ‐
280 ing, for example, the vol volume adjustment effect) after the audio has
281 been combined; however, it is often useful to be able to set the volume
282 of (i.e. `balance') the inputs individually, before combining takes
283 place.
284
285 For all combining methods, input file volume adjustments can be made
286 manually using the -v option (below) which can be given for one or more
287 input files; if it is given for only some of the input files then the
288 others receive no volume adjustment. In some circumstances, automatic
289 volume adjustments may be applied (see below).
290
291 The -V option (below) can be used to show the input file volume adjust‐
292 ments that have been selected (either manually or automatically).
293
294 There are some special considerations that need to be made when mixing
295 input files:
296
297 Unlike the other methods, `mix' combining has the potential to cause
298 clipping in the combiner if no balancing is performed. So here, if
299 manual volume adjustments are not given, to ensure that clipping does
300 not occur, SoX will automatically adjust the volume (amplitude) of each
301 input signal by a factor of ¹/n, where n is the number of input files.
302 If this results in audio that is too quiet or otherwise unbalanced then
303 the input file volumes should be set manually as described above.
304
305 If mixed audio seems loud enough at some points through the audio but
306 too quiet in others, then dynamic-range compression should be applied
307 to correct this - see the compand effect.
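
The effect of the automatic 1/n adjustment can be seen in a small model of the mixer. In this Python sketch, samples are floats in the range -1 to 1 and shorter inputs are treated as if padded with silence; both are assumptions of the sketch rather than statements about SoX internals.

```python
from typing import List, Optional, Sequence


def mix(inputs: Sequence[Sequence[float]],
        volumes: Optional[Sequence[float]] = None) -> List[float]:
    """Sum the inputs sample by sample after scaling each by its volume."""
    n = len(inputs)
    if n < 2:
        raise ValueError("mixing needs two or more inputs")
    if volumes is None:
        # Automatic balancing: scale every input by 1/n.
        volumes = [1.0 / n] * n
    length = max(len(samples) for samples in inputs)
    mixed = []
    for i in range(length):
        total = 0.0
        for samples, volume in zip(inputs, volumes):
            if i < len(samples):  # assumed: a shorter input adds silence
                total += volume * samples[i]
        mixed.append(total)
    return mixed


a = [1.0, -1.0]
b = [1.0, 1.0]
print(mix(a, b and [a, b][1:] and None) if False else mix([a, b]))  # [1.0, 0.0]
print(mix([a, b], volumes=[1.0, 1.0]))                               # [2.0, 0.0]
```

With the automatic factor the result stays within full scale; with both volumes forced to 1 the first sample reaches 2.0 and would clip.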
308
309 Stopping SoX
310 Usually SoX will complete its processing and exit automatically;
311 however, if desired, it can be terminated by pressing the keyboard
312 interrupt key (usually Ctrl-C). This is a natural requirement in some
313 circumstances, e.g. when using SoX to make a recording. Note that when
314 using SoX to play multiple files, Ctrl-C behaves slightly differently:
315 pressing it once causes SoX to skip to the next file; pressing it twice
316 in quick succession causes SoX to exit.
317
318 FILENAMES
319 The following `special' filenames may be used in certain circumstances
320 in place of a normal filename on the command line:
321
322 - SoX can be used in pipeline operations by using the special
323 filename `-' which, if used in place of an input filename, will
324 cause SoX to read audio data from `standard input' (stdin),
325 and which, if used in place of the output filename, will cause
326 SoX to send audio data to `standard output' (stdout). Note
327 that when using this option, the file-type (see -t below) must
328 also be given.
329
330 -n This can be used in place of an input or output filename to
331 specify that a `null file' is to be used. Note that here, `null
332 file' refers to a SoX-specific mechanism and is not related to
333 any operating-system mechanism with a similar name.
334
335 Using a null file to input audio is equivalent to using a normal
336 audio file that contains an infinite amount of silence, and as
337 such is not generally useful unless used with an effect that
338 specifies a finite time length (such as trim or synth).
339
340 Using a null file to output audio amounts to discarding the
341 audio and is useful mainly with effects that produce information
342 about the audio instead of affecting it (such as noiseprof or
343 stat).
344
345 The number of channels and the sampling rate associated with a
346 null file are by default 2 and 44.1 kHz respectively, but, as
347 with a normal file, these can be overridden if desired using
348 command-line format options (see below).
349
350 One other use of -n is to use it in conjunction with -V to dis‐
351 play information from the audio file header without having to
352 read any further into the file, e.g. sox -V *.wav -n will dis‐
353 play header information for each `WAV' file in the current
354 directory.
355
356 -e This is an alias of -n and is retained for backwards compatibil‐
357 ity only.
358
359 N.B. Giving SoX an input or output filename that is the same as a SoX
360 effect-name will not work since SoX will treat it as an effect specifi‐
361 cation. The only work-around to this is to avoid such filenames; how‐
362 ever, this is generally not difficult since most audio filenames have a
363 filename `extension', whilst effect-names do not.
364
365 OPTIONS
366 Global Options
367 These options can be specified on the command line at any point before
368 the first effect name.
369
370 -h, --help
371 Show version number and usage information.
372
373 --help-effect=name
374 Show usage information on the specified effect. The name all
375 can be used to show usage on all effects.
376
377 --interactive
378 Prompt before overwriting an existing file with the same name as
379 that given for the output file.
380
381 N.B. Unintentionally overwriting a file is easier than you
382 might think, for example, if you accidentally enter
383
384 sox file1 file2 effect1 effect2 ...
385
386 when what you really meant was
387
388 play file1 file2 effect1 effect2 ...
389
390 then, without this option, file2 will be overwritten. Hence,
391 using this option is strongly recommended; a `shell' alias,
392 script, or batch file may be an appropriate way of permanently
393 enabling it.
394
395 -m|-M|--combine=concatenate|merge|mix|sequence
396 Select the input file combining method; -m selects `mix', -M
397 selects `merge'.
398
399 See Input File Combining above for a description of the differ‐
400 ent combining methods.
401
402 --octave
403 Run in a mode that can be used, in conjunction with the GNU
404 Octave program, to assist with the selection and configuration
405 of many of the filtering effects. For the first given effect
406 that supports the --octave option, SoX will output Octave com‐
407 mands to plot the effect's transfer function, and then exit
408 without actually processing any audio. E.g.
409
410 sox --octave input-file -n highpass 1320 > plot.m
411 octave plot.m
412
413 -q, --no-show-progress
414 Run in quiet mode when SoX wouldn't otherwise do so; this is the
415 opposite of the -S option.
416
417 --replay-gain=track
418 --replay-gain=album
419 --replay-gain=off
420 Select whether or not to apply replay-gain adjustment to input
421 files. The default is track for play and off otherwise.
422
423 -S, --show-progress
424 Display input file format/header information and input file(s)
425 processing progress in terms of elapsed/remaining time and per‐
426 centage complete. This option is enabled by default when using
427 SoX to play or record audio.
428
429 --version
430 Show version number and exit.
431
432 -V[level]
433 Set verbosity. SoX prints messages to the console (stderr)
434 according to the following verbosity levels:
435
436 0 No messages are printed at all; use the exit status to
437 determine if an error has occurred.
438
439 1 Only error messages are printed. These are generated if
440 SoX cannot complete the requested commands.
441
442 2 Warning messages are also printed. These are generated
443 if SoX can complete the requested commands, but not
444 exactly according to the requested command parameters, or
445 if clipping occurs.
446
447 3 Descriptions of SoX's processing phases are also printed.
448 Useful for seeing exactly how SoX is mangling your audio.
449
450 4 and above
451 Messages to help with debugging SoX are also printed.
452
453 By default, the verbosity level is set to 2. Each occurrence of
454 the -V option increases the verbosity level by 1. Alterna‐
455 tively, the verbosity level can be set to an absolute number by
456 specifying it immediately after the -V, e.g. -V0 sets it to 0.
457
458 Input File Options
459 These options apply only to input files and may precede only input
460 filenames on the command line.
461
462 -v volume, --volume=volume
463 Adjust volume by a factor of volume. This is a linear (ampli‐
464 tude) adjustment, so a number less than 1 decreases the volume;
465 greater than 1 increases it. If a negative number is given,
466 then in addition to the volume adjustment, the audio signal will
467 be inverted.
468
469 See also the stat effect for information on how to find the max‐
470 imum volume of an audio file; this can be used to help select
471 suitable values for this option.
472
473 See also Input File Combining above.
474
475 Input & Output File Format Options
476 These options apply to the input or output file whose name they immedi‐
477 ately precede on the command line and are used mainly when working with
478 headerless file formats or when specifying a format for the output file
479 that is different to that of the input file.
480
481 -c channels, --channels=channels
482 The number of audio channels in the audio file. This may be 1,
483 2, or 4, for mono, stereo, or quad audio respectively. To cause the output
484 file to have a different number of channels than the input file,
485 include this option with the output file options. If the input
486 and output file have a different number of channels then the
487 mixer effect must be used. If the mixer effect is not specified
488 on the command line it will be invoked internally with default
489 parameters.
490
491 --comment text
492 Specify the comment text to store in the output file header
493 (where applicable).
494
495 SoX will provide a default comment if this option (or --com‐
496 ment-file) is not given; to specify that no comment should be
497 stored in the output file, use --comment "" or --comment=.
498
499 --comment-file filename
500 Specify a file containing the comment text to store in the out‐
501 put file header (where applicable).
502
503 -r rate, --rate=rate
504 Gives the sample rate in Hz of the file. To cause the output
505 file to have a different sample rate than the input file,
506 include this option with the output file format options.
507
508 If the input and output files have different rates then a sample
509 rate change effect must be run. Since SoX has multiple rate
510 changing effects, the user can specify which to use as an
511 effect. If no rate change effect is specified then a default
512 one will be chosen.
513
514 -t file-type, --type=file-type
515 Gives the type of the audio file. This is useful when the file
516 extension is non-standard or when the type cannot be determined
517 by looking at the header of the file.
518
519 The -t option can also be used to override the type implied by
520 an input filename extension, but if overriding with a type that
521 has a header, SoX will exit with an appropriate error message if
522 such a header is not actually present.
523
524 See FILE TYPES below for a list of supported file types.
525
526 -L, --endian=little
527 -B, --endian=big
528 -x, --endian=swap
529 These options specify whether the byte-order of the audio data
530 is, respectively, `little endian', `big endian', or the opposite
531 to that of the system on which SoX is being used. Endianness
532 applies only to data encoded as signed or unsigned integers of
533 16 or more bits. It is often necessary to specify one of these
534 options for headerless files, and sometimes necessary for (oth‐
535 erwise) self-describing files. A given endian-setting option
536 may be ignored for an input file whose header contains a spe‐
537 cific endianness identifier, or for an output file that is actu‐
538 ally an audio device.
539
540 N.B. Unlike normal format characteristics, the endianness
541 (byte, nibble, & bit ordering) of the input file is not automat‐
542 ically used for the output file; so, for example, when the fol‐
543 lowing is run on a little-endian system:
544
545 sox -B audio.uw trimmed.uw trim 2
546
547 trimmed.uw will be created as little-endian;
548
549 sox -B audio.uw -B trimmed.uw trim 2
550
551 must be used to preserve big-endianness in the output file.
552
553 The -V option can be used to check the selected orderings.
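
The difference between the two byte orders, and what a byte swap does, can be shown with Python's standard `struct` module. This is an illustrative sketch only; `swap_bytes_16` is a made-up helper name, not a SoX function.

```python
import struct


def swap_bytes_16(data: bytes) -> bytes:
    """Reverse the byte order of every 16-bit sample in a raw buffer."""
    if len(data) % 2:
        raise ValueError("buffer must hold whole 16-bit samples")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves first
    swapped[1::2] = data[0::2]  # first byte of each pair moves second
    return bytes(swapped)


little = struct.pack("<h", 0x1234)  # little endian: low byte first
big = struct.pack(">h", 0x1234)     # big endian: high byte first
print(little.hex())                 # 3412
print(big.hex())                    # 1234
print(swap_bytes_16(little) == big) # True
```

Reading headerless 16-bit data with the wrong byte order does not fail; it simply yields different sample values, which is why the ordering has to be stated explicitly.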
554
555 -N, --reverse-nibbles
556 Specifies that the nibble ordering (i.e. the 2 halves of a byte)
557 of the samples should be reversed; sometimes useful with ADPCM-
558 based formats.
559
560 N.B. See also N.B. in section on -x above.
561
562 -X, --reverse-bits
563 Specifies that the bit ordering of the samples should be
564 reversed; sometimes useful with a few (mostly headerless) for‐
565 mats.
566
567 N.B. See also N.B. in section on -x above.
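
Nibble and bit reversal both operate within a single byte. The Python sketch below shows the two operations on one byte value; the function names are illustrative and not taken from SoX.

```python
def reverse_nibbles(byte: int) -> int:
    """Swap the high and low 4-bit halves of a byte."""
    return ((byte & 0x0F) << 4) | ((byte & 0xF0) >> 4)


def reverse_bits(byte: int) -> int:
    """Reverse the order of all 8 bits, so the MSB becomes the LSB."""
    return int(f"{byte:08b}"[::-1], 2)


print(hex(reverse_nibbles(0x1F)))            # 0xf1
print(f"{reverse_bits(0b11010000):08b}")     # 00001011
```

Applying either operation twice returns the original byte.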
568
569 -s/-u/-U/-A/-a/-i/-g/-f
570 The audio data encoding is signed linear (2's complement),
571 unsigned linear, μ-law (logarithmic), A-law (logarithmic),
572 ADPCM, IMA-ADPCM, GSM, or floating-point.
573
574 μ-law (or mu-law) and A-law are, respectively, the U.S. and
575 international standards for logarithmic telephone audio compression.
576 When uncompressed, μ-law has roughly the precision of 14-bit PCM audio
577 and A-law has roughly the precision of 13-bit PCM audio.
578
579 A-law and μ-law are sometimes encoded using reversed bit-order‐
580 ing (i.e. MSB becomes LSB). Internally, SoX understands how to
581 work with these encodings but there is currently no command line
582 option to specify them. If you need this support then you can
583 use the pseudo file types of `.la' and `.lu' to inform SoX of
584 the encoding. See FILE TYPES below for more information.
585
586 ADPCM is a form of audio compression that offers a good compromise
587 between audio quality and encoding/decoding speed. It
588 is used for telephone audio compression and in places where full
589 fidelity is not as important. When uncompressed, it has roughly
590 the precision of 16-bit PCM audio. Popular versions of ADPCM
591 include G.726, MS ADPCM, and IMA ADPCM. The -a flag has
592 different meanings in different file handlers. In .wav files it
593 represents MS ADPCM; in all others it means G.726 ADPCM. IMA
594 ADPCM is a specific form of ADPCM compression, slightly simpler
595 and slightly lower fidelity than Microsoft's flavor of ADPCM.
596 IMA ADPCM is also called DVI ADPCM.
597
598 GSM is currently used for the vast majority of the world's digi‐
599 tal wireless telephone calls. It utilises several audio formats
600 with different bit-rates and associated speech quality. SoX has
601 support for GSM's original 13kbps `Full Rate' audio format. It
602 is usually CPU intensive to work with GSM audio.
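
The logarithmic character of the μ-law encoding described above can be illustrated with the textbook continuous companding curve, using μ = 255. This Python sketch is not the bit-exact segmented 8-bit encoder used in telephony, and it is not SoX's implementation; it only shows why quiet signals keep more precision than a linear 8-bit encoding would give them.

```python
import math

MU = 255  # companding parameter of the textbook mu-law curve


def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Invert mu_law_compress, returning a linear sample in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# A signal at 1% of full scale is mapped to over 20% of the coded range.
print(mu_law_compress(0.01) > 0.2)  # True

# Without quantisation the curve is exactly invertible (to rounding error).
print(abs(mu_law_expand(mu_law_compress(0.25)) - 0.25) < 1e-12)  # True
```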
603
604 -1/-2/-3/-4/-8
605 The sample datum size is 1, 2, 3, 4, or 8 bytes; i.e. 8, 16, 24,
606 32, or 64 bits.
607
608 The flags -b/-w/-l/-d, which are respectively aliases for
609 -1/-2/-4/-8 and abbreviate byte, word, long word, and
610 double long (long long) word,
611 are retained for backwards compatibility only.
612
613 Output File Format Options
614 These options apply only to the output file and may precede only the
615 output filename on the command line.
616
617 -C compression-factor, --compression=compression-factor
618 The compression factor for variably compressing output file for‐
619 mats. If this option is not given, then a default compression
620 factor will apply. The compression factor is interpreted dif‐
621 ferently for different compressing file formats. See the
622 description of the file formats that use this option for more
623 information.
624
625 FILE TYPES
626 File types can be set by the filename extension or the -t option (see
627 above). File types that can be determined by a filename extension are
628 listed with their names preceded by a dot. File types that require
629 optional libsndfile support are marked `(libsndfile)'. File types that
630 can be handled by libsndfile using -t sndfile are marked `(also with -t
631 sndfile)'. This might be useful if you have a file that doesn't work
632 with SoX's default format readers and writers, and there's a libsndfile
633 reader and writer for that format.
634
635
636 .raw (also with -t sndfile)
637 Raw (headerless) audio files. The sample rate, sample size, and
638 data encoding must be given using command-line format options;
639 the number of channels defaults to 1.
640
641 .ub, .sb, .uw, .sw, .ul, .al, .lu, .la, .sl (also with -t sndfile)
642 These filename extensions serve as shorthand for identifying the
643 format of headerless audio files. Thus, ub, sb, uw, sw, ul, al,
644 lu, la and sl indicate a file with a single audio channel, sam‐
645 ple rate of 8000 Hz, and samples encoded as `unsigned byte',
646 `signed byte', `unsigned word', `signed word', `μ-law' (byte),
647 `A-law' (byte), inverse bit order `μ-law', inverse bit order `A-
648 law', or `signed long' respectively. Command-line format
649 options can also be given to modify the selected format if it
650 does not provide an exact match for a particular file.
651
652 Headerless audio files on a SPARC computer are likely to be of
653 format ul; on a Mac, they're likely to be ub but with a sample
654 rate of 11025 or 22050 Hz.
655
656 .8svx (also with -t sndfile)
657 Amiga 8SVX musical instrument description format.
658
659 .aiff, .aif (also with -t sndfile)
660 AIFF files used on Apple IIc/IIgs and SGI. Note: the AIFF for‐
661 mat supports only one SSND chunk. It does not support multiple
662 audio chunks, or the 8SVX musical instrument description format.
663 AIFF files are multimedia archives and can have multiple audio
664 and picture chunks. You may need a separate archiver to work
665 with them.
666
667 .aiffc, .aifc (also with -t sndfile)
668 AIFF-C (not compressed, linear), defined in DAVIC 1.4 Part 9
669 Annex B. This format is referenced by ARIB STD-B24, which is
670 specified for Japanese data broadcasting. Private chunks
671 are not supported.
672
673 Note: The input file is currently processed as .aiff.
674
675 alsa ALSA default device driver. This is a pseudo-file type and can
676 be optionally compiled into SoX. Run sox -h to see if you have
677 support for this file type. When this driver is used it allows
678 you to open up an ALSA device and configure it to use the same
679 data format as passed in to SoX. It works for both playing and
680 recording audio files. When playing audio files it attempts to
681 set up the ALSA driver to use the same format as the input file.
682 It is suggested to always override the output values to use the
683 highest quality format your ALSA system can handle. Example:
684 sox infile -t alsa default
685
686 .au, .snd (also with -t sndfile)
687 Sun Microsystems AU files. There are many types of AU file; DEC
688 has invented its own with a different magic number and byte
689 order. SoX can read these files but will not write them. Some
690 .au files are known to have invalid AU headers; these are proba‐
691 bly original Sun μ-law 8000 Hz files and can be dealt with using
692 the .ul format (see above).
693
694 It is possible to override AU file header information with the
695 -r and -c options, in which case SoX will issue a warning to
696 that effect.
697
698 auto This format type name exists for backwards compatibility only.
699 If given for an input file it will be silently ignored; if given
700 for an output file it will cause SoX to exit with an error.
701
702 .avr Audio Visual Research. The AVR format is produced by a number
703 of commercial packages on the Mac.
704
705 .caf (libsndfile)
706 Core Audio File format.
707
708 .cdda, .cdr
709 `Red Book' Compact Disc Digital Audio. CDDA has two audio chan‐
710 nels formatted as 16-bit signed integers at a sample rate of
711 44.1 kHz. The number of (stereo) samples in each CDDA track is
712 always a multiple of 588 which is why it needs its own handler.
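
The figure of 588 comes from the disc layout: the Red Book format divides each second of audio into 75 sectors, and 44100 / 75 = 588 stereo samples per sector. The 75 sectors-per-second figure is taken from the CD standard rather than from this manual. The Python sketch below works through the arithmetic; rounding a track up to a whole sector is shown as an assumed policy, not as a description of what the SoX handler does.

```python
SAMPLE_RATE_HZ = 44100
SECTORS_PER_SECOND = 75  # from the Red Book CD standard, not this manual
SAMPLES_PER_SECTOR = SAMPLE_RATE_HZ // SECTORS_PER_SECOND
BYTES_PER_SECTOR = SAMPLES_PER_SECTOR * 2 * 2  # 2 channels x 2 bytes each


def round_up_to_sector(stereo_samples: int) -> int:
    """Round a stereo sample count up to a whole number of sectors."""
    sectors = -(-stereo_samples // SAMPLES_PER_SECTOR)  # ceiling division
    return sectors * SAMPLES_PER_SECTOR


print(SAMPLES_PER_SECTOR)       # 588
print(BYTES_PER_SECTOR)         # 2352
print(round_up_to_sector(589))  # 1176
```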
713
714 .cvsd, .cvs
715 Continuously Variable Slope Delta modulation. A headerless for‐
716 mat used to compress speech audio for applications such as voice
717 mail. This format is sometimes used with bit-reversed samples;
718 the -X format option can be used to set the bit-order.
719
720 .dat Text Data files. These files contain a textual representation
721 of the sample data. There is one line at the beginning that
722 contains the sample rate. Subsequent lines contain two numeric
723 data items: the time since the beginning of the first sample and
724 the sample value. Values are normalized so that the maximum and
725 minimum are 1 and -1. This file format can be used to create
726 data files for external programs such as FFT analysers or graph
727 routines. SoX can also convert a file in this format back into
728 one of the other file formats.
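
For external programs that consume such files, the layout described above can be read with a few lines of code. The following is only a sketch: the function name parse_dat is hypothetical, and it assumes that the first line carries the sample rate as its only integer and that every later non-empty line holds a time and a value separated by whitespace.

```python
import re


def parse_dat(text):
    """Parse .dat-style text into (sample_rate, [(time, value), ...])."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: the first line contains the sample rate as an integer.
    match = re.search(r"\d+", lines[0])
    if match is None:
        raise ValueError("no sample rate found on first line")
    rate = int(match.group())
    samples = []
    for ln in lines[1:]:
        t, v = ln.split()[:2]
        samples.append((float(t), float(v)))
    return rate, samples
```

The exact wording of the header line written by SoX is not assumed here; only the presence of the rate is.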
729
730 .dvms, .vms
731 Used to compress speech audio for applications such as voice
732 mail. A self-describing variant of cvsd.
733
734 .fap (libsndfile)
735 See .paf.
736
737 .flac (also with -t sndfile)
738 Free Lossless Audio CODEC compressed audio. FLAC is an open,
739 patent-free CODEC designed for compressing music. It is similar
740 to MP3 and Ogg Vorbis, but lossless, meaning that audio is com‐
741 pressed in FLAC without any loss in quality.
742
743 SoX can decode native FLAC files (.flac) but not Ogg FLAC files
744 (.ogg). [But see .ogg below for information relating to support
745 for Ogg Vorbis files.]
746
747 SoX has basic support for writing FLAC files: it can encode to
748 native FLAC using compression levels 0 to 8. 8 is the default
749 compression level and gives the best (but slowest) compression;
750 0 gives the least (but fastest) compression. The compression
751 level can be selected using the -C option (see above) with a
752 whole number from 0 to 8.
753
754 FLAC support in SoX is optional and requires optional FLAC
755 libraries. To see if there is support for FLAC run sox -h and
756 look for it under the list of supported file formats as `flac'.
757
758 .fssd An alias for the .ub format.
759
760 .gsm (also with -t sndfile)
761 GSM 06.10 Lossy Speech Compression. A lossy format for
762 compressing speech which is used in the Global System for Mobile
763 Communications (GSM). It's good for its purpose, shrinking
764 audio data size, but it will introduce lots of noise when a
765 given audio signal is encoded and decoded multiple times. This
766 format is used by some voice mail applications. It is rather
767 CPU intensive.
768
769 GSM in SoX is optional and requires access to an external GSM
770 library. To see if there is support for GSM run sox -h and look
771 for it under the list of supported file formats.
772
773 .hcom Macintosh HCOM files. These are (apparently) Mac FSSD files
774 with some variant of Huffman compression. The Macintosh has
775 wacky file formats and this format handler apparently doesn't
776 handle all the ones it should. Mac users will need their usual
777 arsenal of file converters to deal with an HCOM file on other
778 systems.
779
780 ircam (also with -t sndfile)
781 Another name for .sf.
782
783 .ima (also with -t sndfile)
784 A headerless file of IMA ADPCM audio data. IMA ADPCM claims
785 16-bit precision packed into only 4 bits, but in fact sounds no
786 better than .vox.
787
788 .mat, .mat4, .mat5 (libsndfile)
789 Matlab 4.2/5.0 (respectively GNU Octave 2.0/2.1) format (.mat is
790 the same as .mat4).
791
792 .maud An IFF-conforming audio file type, registered by MS MacroSystem
793 Computer GmbH, published along with the `Toccata' sound-card on
794 the Amiga. Allows 8bit linear, 16bit linear, A-Law, μ-law in
795 mono and stereo.
796
797 .mp3, .mp2
798 MP3 compressed audio. MP3 (MPEG Layer 3) is part of the MPEG
799 standards for audio and video compression. It is a lossy com‐
800 pression format that achieves good compression rates with little
801 quality loss. See also Ogg Vorbis for a similar format.
802
803 MP3 support in SoX is optional and requires access to either or
804 both the external libmad and libmp3lame libraries. To see if
805 there is support for MP3 run sox -h and look for it under the
806 list of supported file formats as `mp3'.
807
808
809 .nist (also with -t sndfile)
810 See .sph.
811
812 .ogg, .vorbis
813 Ogg Vorbis compressed audio. Ogg Vorbis is an open, patent-free
814 CODEC designed for compressing music and streaming audio. It is
815 a lossy compression format (similar to MP3, VQF & AAC) that
816 achieves good compression rates with a minimum amount of quality
817 loss. See also MP3 for a similar format.
818
819 SoX can decode all types of Ogg Vorbis files, and can encode at
820 different compression levels/qualities given as a number from -1
821 (highest compression/lowest quality) to 10 (lowest compression,
822 highest quality). By default the encoding quality level is 3
823 (which gives an encoded rate of approx. 112kbps), but this can
824 be changed using the -C option (see above) with a number from -1
825 to 10; fractional numbers (e.g. 3.6) are also allowed.
826
827 Decoding is somewhat CPU intensive and encoding is very CPU
828 intensive.
829
830 Ogg Vorbis in SoX is optional and requires access to external
831 Ogg Vorbis libraries. To see if there is support for Ogg Vorbis
832 run sox -h and look for it under the list of supported file for‐
833 mats as `vorbis'.
834
835 ossdsp OSS /dev/dsp device driver. This is a pseudo-file that can be
836 optionally compiled into SoX. Run sox -h to see if it is sup‐
837 ported. When this driver is used it allows you to play and
838 record sounds on supported systems. When playing audio files it
839 attempts to set up the OSS driver to use the same format as the
840 input file. It is suggested to always override the output values
841 to use the highest quality format your OSS system can handle.
842 Example: sox infile -t ossdsp -w -s /dev/dsp
843
844 .paf, .fap (libsndfile)
845 Ensoniq PARIS file format (big and little-endian respectively).
846
847 .prc Psion Record. Used in some Psion devices for System alarms and
848 recordings made by the built-in Record application. This format
849 is newer than the .wve format that is also used in some Psion
850 devices.
851
852 .pvf (libsndfile)
853 Portable Voice Format.
854
855 .sd2 (libsndfile)
856 Sound Designer 2 format.
857
858 .sds (libsndfile)
859 MIDI Sample Dump Standard.
860
861 .sf (also with -t sndfile)
862 IRCAM SDIF (Institut de Recherche et Coordination Acous‐
863 tique/Musique Sound Description Interchange Format). Used by
864 academic music software such as the CSound package, and the
865 MixView sound sample editor.
866
867 .sph, .nist (also with -t sndfile)
868 SPHERE (SPeech HEader Resources) is a file format defined by
869 NIST (National Institute of Standards and Technology) and is
870 used with speech audio. SoX can read these files when they con‐
871 tain μ-law and PCM data. It will ignore any header information
872 that says the data is compressed using shorten compression and
873 will treat the data as either μ-law or PCM. This will allow SoX
874 and the command line shorten program to be run together using
875 pipes to uncompress the data and then pass the result to SoX
876 for processing.
877
878 .smp Turtle Beach SampleVision files. SMP files are for use with the
879 PC-DOS package SampleVision by Turtle Beach Softworks. This
880 package is for communication to several MIDI samplers. All sam‐
881 ple rates are supported by the package, although not all are
882 supported by the samplers themselves. Currently loop points are
883 ignored.
884
885 .snd See .au.
886
887 sndfile
888 This is a pseudo-type that forces libsndfile to be used, even
889 for file types normally handled internally by SoX. For writing
890 files, the actual file type is then taken from the output file
891 name; for reading them, it is deduced from the file and any
892 other format parameters. This pseudo-type depends on SoX having
893 been built with optional libsndfile support.
894
895 .sndt SoundTool files. This is an older DOS file format.
896
897 .sou An alias for the .ub format.
898
899 sunau Sun /dev/audio device driver. This is a pseudo-file type and
900 can be optionally compiled into SoX. Run sox -h to see if you
901 have support for this file type. When this driver is used it
902 allows you to open up a Sun /dev/audio file and configure it to
903 use the same data type as passed in to SoX. It works for both
904 playing and recording audio files. When playing audio files it
905 attempts to set up the audio driver to use the same format as
906 the input file. It is suggested to always override the output
907 values to use the highest quality format your hardware can han‐
908 dle. Example: sox infile -t sunau -w -s /dev/audio or sox
909 infile -t sunau -U -c 1 /dev/audio for older sun equipment.
910
911 .txw Yamaha TX-16W sampler. A file format from a Yamaha sampling
912 keyboard which wrote IBM-PC format 3.5" floppies. Handles read‐
913 ing of files which do not have the sample rate field set to one
914 of the expected values by looking at some other bytes in the
915 attack/loop length fields, and defaulting to 33 kHz if the sam‐
916 ple rate is still unknown.
917
918 .vms See .dvms.
919
920 .voc (also with -t sndfile)
921 Sound Blaster VOC files. VOC files are multi-part and contain
922 silence parts, looping, and different sample rates for different
923 chunks. On input, the silence parts are filled out, loops are
924 rejected, and sample data with a new sample rate is rejected.
925 Silence with a different sample rate is generated appropriately.
926 On output, silence is not detected, nor are impossible sample
927 rates. Note, this version now supports playing VOC files with
928 multiple blocks and supports playing files containing μ-law and
929 A-law samples.
930
931 .vorbis
932 See .ogg.
933
934 .vox (also with -t sndfile)
935 A headerless file of Dialogic/OKI ADPCM audio data, which commonly
936 comes with the extension .vox. This ADPCM data has 12-bit
937 precision packed into only 4 bits.
938
939 .w64 (libsndfile)
940 Sonic Foundry's 64-bit RIFF/WAV format.
941
942 .wav (also with -t sndfile)
943 Microsoft .WAV RIFF files. This is the native audio file format
944 of Windows, and widely used for uncompressed audio.
945
946 Normally .wav files have all formatting information in their
947 headers, and so do not need any format options specified for an
948 input file. If any are, they will override the file header, and
949 you will be warned to this effect. You had better know what you
950 are doing! Output format options will cause a format conversion,
951 and the .wav will be written appropriately.
952
953 SoX currently can read PCM, μ-law, A-law, MS ADPCM, and IMA (or
954 DVI) ADPCM. It can write all of these formats including the
955 ADPCM encoding. Big endian versions of RIFF files, called RIFX,
956 can also be read and written. To write a RIFX file, use the -B
957 option with the output file options.
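
The difference between RIFF and RIFX shows up in the first four bytes of the file and in the byte order of the size field that follows them. As an illustrative sketch (the function names are hypothetical and this is not SoX's own reader), the two variants could be told apart like this:

```python
import struct


def riff_byte_order(header):
    """Return '<' for RIFF (little-endian) or '>' for RIFX (big-endian)."""
    magic = header[:4]
    if magic == b"RIFF":
        return "<"
    if magic == b"RIFX":
        return ">"
    raise ValueError("not a RIFF/RIFX file")


def riff_chunk_size(header):
    """Decode the 32-bit size field that follows the 4-byte magic."""
    order = riff_byte_order(header)
    (size,) = struct.unpack(order + "I", header[4:8])
    return size
```

The same size value is therefore stored with its bytes reversed in a RIFX file compared with a RIFF file.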
958
959 .wve Psion 8-bit A-law. Used on older Psion PDAs.
960
961 .xa Maxis XA files. These are 16-bit ADPCM audio files used by
962 Maxis games. Writing .xa files is currently not supported,
963 although adding write support should not be very difficult.
964
965 .xi (libsndfile)
966 Fasttracker 2 Extended Instrument format.
967
969 Multiple effects may be applied to the audio by specifying them one
970 after another at the end of the command line.
971
972 Note: Brackets [ ] are used to denote parameters that are optional,
973 braces { } to denote those that are both optional and repeatable, and
974 angle brackets < > to denote those that are repeatable but not
975 optional.
976
977 allpass frequency width[h|o|q]
978 Apply a two-pole all-pass filter with central frequency (in Hz)
979 frequency, and filter-width width: in Hz (the default, or if
980 appended with `h'), in octaves (if appended with `o'), or as a
981 Q-factor (if appended with `q'). An all-pass filter changes the
982 audio's frequency to phase relationship without changing its
983 frequency to amplitude relationship. The filter is described in
984 detail in [1].
985
986 This effect supports the --octave global option.
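
The three ways of giving width describe the same quantity in different units. The sketch below shows the textbook relations between them for a two-pole filter; the function names are hypothetical, and whether SoX uses exactly these conversions internally (for instance, with a correction for the sample rate) is an assumption.

```python
import math


def q_from_hz(center_hz, width_hz):
    """Q-factor as centre frequency divided by bandwidth in Hz."""
    return center_hz / width_hz


def q_from_octaves(width_octaves):
    """Q-factor for a bandwidth given in octaves (analogue relation)."""
    return 1.0 / (2.0 * math.sinh(math.log(2.0) / 2.0 * width_octaves))
```

Under these relations a width of one octave corresponds to a Q of about 1.41, and a 100 Hz width at 1 kHz corresponds to a Q of 10.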
987
988 band [-n] center [width[h|o|q]]
989 Apply a band-pass filter. The frequency response drops loga‐
990 rithmically around the center frequency. The width in Hz (the
991 default, or if appended with `h'), in octaves (if appended with
992 `o'), or as a Q-factor (if appended with `q'), gives the slope
993 of the drop. The frequencies at center + width and center -
994 width will be half of their original amplitudes. band defaults
995 to a mode oriented to pitched audio, i.e. voice, singing, or
996 instrumental music. The -n (for noise) option uses the alter‐
997 nate mode for un-pitched audio (e.g. percussion). Warning: -n
998 introduces a power-gain of about 11dB in the filter, so beware
999 of output clipping. band introduces noise in the shape of the
1000 filter, i.e. peaking at the center frequency and settling around
1001 it.
1002
1003 This effect supports the --octave global option.
1004
1005 See also filter for a bandpass filter with steeper shoulders.
1006
1007 bandpass|bandreject [-c] frequency width[h|o|q]
1008 Apply a two-pole Butterworth band-pass or band-reject filter
1009 with central frequency (in Hz) frequency, and (3dB-point) band-
1010 width width: in Hz (the default, or if appended with `h'), in
1011 octaves (if appended with `o'), or as a Q-factor (if appended
1012 with `q'). The -c option applies only to bandpass and selects a
1013 constant skirt gain (peak gain = Q) instead of the default: con‐
1014 stant 0dB peak gain. The filters roll off at 6dB per octave
1015 (20dB per decade) and are described in detail in [1].
1016
1017 These effects support the --octave global option.
1018
1019 See also filter for a bandpass filter with steeper shoulders.
1020
1021 bandreject frequency width[h|o|q]
1022 Apply a band-reject filter. See the description of the bandpass
1023 effect for details.
1024
1025 bass|treble gain [frequency [width[s|h|o|q]]]
1026 Boost or cut the bass (lower) or treble (upper) frequencies of
1027 the audio using a two-pole shelving filter with a response simi‐
1028 lar to that of a standard hi-fi's (Baxandall) tone-controls.
1029 This is also known as shelving equalisation (EQ).
1030
1031 gain gives the dB gain at 0 Hz (for bass), or whichever is the
1032 lower of ∼22 kHz and the Nyquist frequency (for treble). Its
1033 useful range is about -20 (for a large cut) to +20 (for a large
1034 boost). Beware of Clipping when using a positive gain.
1035
1036 If desired, the filter can be fine-tuned using the following
1037 optional parameters:
1038
1039 frequency sets the filter's central frequency and so can be used
1040 to extend or reduce the frequency range to be boosted or cut.
1041 The default value is 100 Hz (for bass) or 3 kHz (for treble).
1042
1043 width determines how steep the filter's shelf transition is and
1044 can be expressed as: a `slope' (the default, or if appended with
1045 `s'), a Q-factor (if appended with `q'), the transition width in
1046 octaves (if appended with `o'), or the transition width in Hz
1047 (if appended with `h'). The useful range of `slope' is about
1048 0.3, for a gentle slope, to 1 (the maximum), for a steep slope;
1049 the default value is 0.5.
1050
1051 The filters are described in detail in [1].
1052
1053 These effects support the --octave global option.
1054
1055 See also equalizer for a peaking equalisation effect.
1056
1057 chorus gain-in gain-out <delay decay speed depth -s|-t>
1058 Add a chorus effect to the audio. Each four-tuple
1059 delay/decay/speed/depth gives the delay in milliseconds and the
1060 decay (relative to gain-in) with a modulation speed in Hz using
1061 depth in milliseconds. The modulation is either sinusoidal (-s)
1062 or triangular (-t). Gain-out is the volume of the output.
1063
1064 compand attack1,decay1{,attack2,decay2}
1065 in-dB1,out-dB1{,in-dB2,out-dB2}
1066 [gain [initial-volume [delay]]]
1067
1068 Compand (compress or expand) the dynamic range of the audio.
1069 The attack and decay time specify the integration time over
1070 which the absolute value of the input signal is integrated to
1071 determine its volume; attacks refer to increases in volume and
1072 decays refer to decreases. Where more than one pair of
1073 attack/decay parameters are specified, each channel is treated
1074 separately and the number of pairs must agree with the number of
1075 input channels. The second parameter is a list of points on the
1076 compander's transfer function specified in dB relative to the
1077 maximum possible signal amplitude. The input values must be in
1078 a strictly increasing order but the transfer function does not
1079 have to be monotonically rising. The special value -inf may be
1080 used to indicate the input volume that should be associated with
1081 a given output volume. The points -inf,-inf and 0,0 are assumed;
1082 the latter may be overridden, but the former may not.
1083
1084 The third (optional) parameter is a post-processing gain in dB
1085 which is applied after the compression has taken place; the
1086 fourth (optional) parameter is an initial volume to be assumed
1087 for each channel when the effect starts. This permits the user
1088 to supply a nominal level initially, so that, for example, a
1089 very large gain is not applied to initial signal levels before
1090 the companding action has begun to operate: it is quite probable
1091 that in such an event, the output would be severely clipped
1092 while the compander gain properly adjusts itself.
1093
1094 The fifth (optional) parameter is a delay in seconds. The input
1095 signal is analysed immediately to control the compander, but it
1096 is delayed before being fed to the volume adjuster. Specifying
1097 a delay approximately equal to the attack/decay times allows the
1098 compander to effectively operate in a `predictive' rather than a
1099 reactive mode.
1100
1101 See also mcompand for a multiple-band companding effect.
1102
1103 dcshift shift [limitergain]
1104 DC Shift the audio, with basic linear amplitude formula. This
1105 is most useful if your audio tends to not be centered around a
1106 value of 0. Shifting it back will allow you to get the most
1107 volume adjustments without clipping.
1108
1109 The first option is the dcshift value. It is a floating point
1110 number that indicates the amount to shift.
1111
1112 An optional limitergain can be specified as well. It should
1113 have a value much less than 1 (e.g. 0.05 or 0.02) and is used
1114 only on peaks to prevent clipping.
1115
1116 deemph Apply a treble attenuation shelving filter to audio in audio-CD
1117 format. The frequency response of pre-emphasized recordings is
1118 rectified. The filter is defined in the standard document ISO
1119 908.
1120
1121 This effect supports the --octave global option.
1122
1123 See also the bass and treble shelving equalisation effects.
1124
1125 dither [depth]
1126 Apply dithering to the audio. Dithering deliberately adds digi‐
1127 tal white noise to the signal in order to mask audible quantiza‐
1128 tion effects that can occur if the output sample size is less
1129 than 24 bits. By default, the amount of noise added is ½ bit;
1130 the optional depth parameter is a (linear or voltage) multiplier
1131 of this amount.
1132
1133 This effect should not be followed by any other effect that
1134 affects the audio.
1135
1136 earwax Makes audio easier to listen to on headphones. Adds `cues' to
1137 audio in audio-CD format so that when listened to on headphones
1138 the stereo image is moved from inside your head (standard for
1139 headphones) to outside and in front of the listener (standard
1140 for speakers). See http://www.geocities.com/beinges for a full
1141 explanation.
1142
1143 echo gain-in gain-out <delay decay>
1144 Add echoing to the audio. Each delay decay pair gives the delay
1145 in milliseconds and the decay (relative to gain-in) of that
1146 echo. Gain-out is the volume of the output.
1147
1148 echos gain-in gain-out <delay decay>
1149 Add a sequence of echos to the audio. Each delay decay pair
1150 gives the delay in milliseconds and the decay (relative to gain-
1151 in) of that echo. Gain-out is the volume of the output.
1152
1153 equalizer frequency width[q|o|h] gain
1154 Apply a two-pole peaking equalisation (EQ) filter. With this
1155 filter, the signal-level at and around a selected frequency can
1156 be increased or decreased, whilst (unlike band-pass and band-
1157 reject filters) that at all other frequencies is unchanged.
1158
1159 frequency gives the filter's central frequency in Hz, width, the
1160 band-width, as a Q-factor [2] (the default, or if appended with
1161 `q'), in octaves (if appended with `o'), or in Hz (if appended
1162 with `h'), and gain the required gain or attenuation in dB.
1163 Beware of Clipping when using a positive gain.
1164
1165 In order to produce complex equalisation curves, this effect can
1166 be given several times, each with a different central frequency.
1167
1168 The filter is described in detail in [1].
1169
1170 This effect supports the --octave global option.
1171
1172 See also bass and treble for shelving equalisation effects.
1173
1174 fade [type] fade-in-length [stop-time [fade-out-length]]
1175 Add a fade effect to the beginning, end, or both of the audio.
1176
1177 For fade-ins, this starts from the first sample and ramps the
1178 volume of the audio from 0 to full volume over fade-in-length
1179 seconds. Specify 0 seconds if no fade-in is wanted.
1180
1181 For fade-outs, the audio will be truncated at stop-time and the
1182 volume will be ramped from full volume down to 0 starting at
1183 fade-out-length seconds before the stop-time. If fade-out-
1184 length is not specified, it defaults to the same value as fade-
1185 in-length. No fade-out is performed if stop-time is not speci‐
1186 fied.
1187
1188 All times can be specified in either periods of time or sample
1189 counts. To specify time periods, use the hh:mm:ss.frac
1190 format. To specify using sample counts, specify the number of
1191 samples and append the letter `s' to the sample count (for exam‐
1192 ple `8000s').
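
The two notations can be reduced to a single sample count once the sample rate is known. The following sketch is not SoX's parser: the function name parse_position is hypothetical, and it assumes that shortened forms such as mm:ss or plain seconds are accepted and that fractional results are rounded to the nearest sample.

```python
def parse_position(spec, sample_rate):
    """Convert 'hh:mm:ss.frac' or '<n>s' into a sample count."""
    if spec.endswith("s"):
        # Sample-count form, e.g. '8000s'.
        return int(spec[:-1])
    seconds = 0.0
    for field in spec.split(":"):
        # Each colon shifts the running total up by a factor of 60.
        seconds = seconds * 60.0 + float(field)
    return round(seconds * sample_rate)
```

For example, at a sample rate of 8000 Hz the specification 1:30 corresponds to 720000 samples, while 8000s is 8000 samples at any rate.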
1193
1194 An optional type can be specified to change the type of enve‐
1195 lope. Choices are q for quarter of a sine wave, h for half a
1196 sine wave, t for linear slope, l for logarithmic, and p for
1197 inverted parabola. The default is a linear slope.
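
To give a feel for how the envelope shapes differ, the sketch below computes a fade-in gain for four of the types from their names alone. These formulas are an assumption about the intended shapes rather than SoX's exact implementation, the function name fade_in_gain is hypothetical, and the logarithmic type is left out because its curve constants are not stated here.

```python
import math


def fade_in_gain(kind, x):
    """Gain in [0, 1] at fraction x (0..1) of the way through a fade-in."""
    if kind == "t":    # linear slope
        return x
    if kind == "q":    # quarter of a sine wave
        return math.sin(x * math.pi / 2.0)
    if kind == "h":    # half a sine wave
        return (1.0 - math.cos(x * math.pi)) / 2.0
    if kind == "p":    # inverted parabola
        return 1.0 - (1.0 - x) ** 2
    raise ValueError("unsupported envelope type: " + kind)
```

Half-way through the fade, the linear and half-sine shapes are both at half volume, whereas the quarter-sine and inverted-parabola shapes are already louder.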
1198
1199 filter [low]-[high] [window-len [beta]]
1200 Apply a sinc-windowed lowpass, highpass, or bandpass filter of
1201 given window length to the signal. low refers to the frequency
1202 of the lower 6dB corner of the filter. high refers to the fre‐
1203 quency of the upper 6dB corner of the filter.
1204
1205 A low-pass filter is obtained by leaving low unspecified, or 0.
1206 A high-pass filter is obtained by leaving high unspecified, or
1207 0, or greater than or equal to the Nyquist frequency.
1208
1209 The window-len, if unspecified, defaults to 128. Longer windows
1210 give a sharper cutoff, smaller windows a more gradual cutoff.
1211
1212 The beta, if unspecified, defaults to 16. This selects a Kaiser
1213 window. You can select a Nuttall window by specifying anything
1214 ≤ 2 here. For more discussion of beta, look under the resample
1215 effect.
1216
1217
1218 flanger [delay depth regen width speed shape phase interp]
1219 Apply a flanging effect to the audio. All parameters are
1220 optional (right to left).
1221
1222 ┌─────────────────────────────────────────────────────────────────┐
1223 │ Range Default Description │
1224 │delay 0 - 10 0 Base delay in milliseconds. │
1225 │depth 0 - 10 2 Added swept delay in milliseconds. │
1226 │regen -95 - 95 0 Percentage regeneration (delayed │
1227 │ signal feedback). │
1228 │width 0 - 100 71 Percentage of delayed signal mixed │
1229 │ with original. │
1230 │speed 0.1 - 10 0.5 Sweeps per second (Hz). │
1231 │shape sin Swept wave shape: sine|triangle. │
1232 │phase 0 - 100 25 Swept wave percentage phase-shift │
1233 │ for multi-channel (e.g. stereo) │
1234 │ flange; 0 = 100 = same phase on │
1235 │ each channel. │
1236 │interp lin Digital delay-line interpolation: │
1237 │ linear|quadratic. │
1238 └─────────────────────────────────────────────────────────────────┘
1239 See [3] for a detailed description of flanging.
1240
1241 highpass|lowpass [-1|-2] frequency [width[q|o|h]]
1242 Apply a high-pass or low-pass filter with 3dB point frequency.
1243 The filter can be either single-pole (with -1), or double-pole
1244 (the default, or with -2). width applies only to double-pole
1245 filters and is the filter-width: as a Q-factor (the default, or
1246 if appended with `q'), in octaves (if appended with `o'), or in
1247 Hz (if appended with `h'); the default Q is 0.707 and gives a
1248 Butterworth response. The filters roll off at 6dB per pole per
1249 octave (20dB per pole per decade). The double-pole filters are
1250 described in detail in [1].
1251
1252 These effects support the --octave global option.
1253
1254 See also filter for filters with a steeper roll-off.
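
The two roll-off figures quoted above are the same slope expressed over different frequency intervals, since 20·log10(2) is approximately 6.02. A small worked sketch (the function name is hypothetical, and the result is the asymptotic slope far from the corner frequency only):

```python
import math


def asymptotic_rolloff_db(freq_ratio, poles=1):
    """Attenuation in dB, far from the corner, for a given frequency ratio."""
    return 20.0 * poles * math.log10(freq_ratio)
```

A ratio of 2 (one octave) gives about 6 dB per pole and a ratio of 10 (one decade) gives 20 dB per pole, so a double-pole filter falls at roughly 12 dB per octave.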
1255
1256 lowpass [-1|-2] frequency [width[q|o|h]]
1257 Apply a low-pass filter. See the description of the highpass
1258 effect for details.
1259
1260 mcompand "attack1,decay1{,attack2,decay2}
1261 in-dB1,out-dB1{,in-dB2,out-dB2}
1262 [gain [initial-volume [delay]] ]" xover-freq
1263
1264 Multi-band compander is similar to the single band compander but
1265 the audio is first divided up into bands and then the compander
1266 is run on each band. See the compand effect for the definition
1267 of its options. Compand options are specified between double
1268 quotes and the crossover frequency for that band is specified
1269 separately with xover-freq. This can be repeated multiple times
1270 to create multiple bands.
1271
1272 mixer [ -l|-r|-f|-b|-1|-2|-3|-4|n{,n} ]
1273 Reduce the number of audio channels by mixing or selecting chan‐
1274 nels, or increase the number of channels by duplicating chan‐
1275 nels. Note: this effect operates on the audio channels within
1276 the SoX effects processing chain; it should not be confused with
1277 the -m global option (where multiple files are mix-combined
1278 before entering the effects chain).
1279
1280 This effect is automatically used when the number of input
1281 channels differs from the number of output channels. When reducing
1282 the number of channels it is possible to manually specify the
1283 mixer effect and use the -l, -r, -f, -b, -1, -2, -3, -4, options
1284 to select only the left, right, front, back channel(s) or spe‐
1285 cific channel for the output instead of averaging the channels.
1286 The -l, and -r options will do averaging in quad-channel files
1287 so select the exact channel to prevent this.
1288
1289 The mixer effect can also be invoked with up to 16 numbers, sep‐
1290 arated by commas, which specify the proportion (0 = 0% and 1 =
1291 100%) of each input channel that is to be mixed into each output
1292 channel. In two-channel mode, 4 numbers are given: l → l, l →
1293 r, r → l, and r → r, respectively. In four-channel mode, the
1294 first 4 numbers give the proportions for the left-front output
1295 channel, as follows: lf → lf, rf → lf, lb → lf, and rb → lf.
1296 The next 4 give the right-front output in the same order, then
1297 left-back and right-back.
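
The two-channel case amounts to a small matrix multiplication on each stereo frame. The sketch below illustrates the arithmetic implied by the ordering l → l, l → r, r → l, r → r; the function name mix_stereo is hypothetical and this is not SoX's own code.

```python
def mix_stereo(frames, ll, lr, rl, rr):
    """Mix stereo frames with proportions l->l, l->r, r->l, r->r."""
    out = []
    for left, right in frames:
        out.append((ll * left + rl * right,    # new left channel
                    lr * left + rr * right))   # new right channel
    return out
```

By this reading, the proportions 1,0,0,1 leave the audio unchanged and 0,1,1,0 swap the left and right channels.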
1298
1299 It is also possible to use the 16 numbers to expand or reduce
1300 the channel count; just specify 0 for unused channels.
1301
1302 Finally, certain reduced combinations of numbers can be specified
1303 for certain input/output channel combinations.
1304
1305 ┌──────────────────────────────────────────────────────┐
1306 │In Ch Out Ch Num Mappings │
1307 │ 2 1 2 l → l, r → l │
1308 │ 2 2 1 adjust balance │
1309 │ 4 1 4 lf → l, rf → l, lb → l, rb → l │
1310 │ 4 2 2 lf → l&rf → r, lb → l&rb → r │
1311 │ 4 4 1 adjust balance │
1312 │ 4 4 2 front balance, back balance │
1313 └──────────────────────────────────────────────────────┘
1314
1315 noiseprof [profile-file]
1316 Calculate a profile of the audio for use in noise reduction.
1317 See the description of the noisered effect for details.
1318
1319 noisered profile-file [threshold]
1320 Noise reduction filter with profiling. This filter is moder‐
1321 ately effective at removing consistent background noise such as
1322 hiss or hum. To use it, first run the noiseprof effect on a
1323 section of audio that ideally would contain silence but in fact
1324 contains noise. The noiseprof effect will write out a noise
1325 profile to profile-file, or to stdout if no profile-file is
1326 specified. If there is audio output on stdout then the profile
1327 will instead be directed to stderr.
1328
1329 To actually remove the noise, run SoX again with the noisered
1330 filter. The filter needs one parameter, profile-file, which
1331 contains the noise profile from noiseprof. threshold specifies
1332 how much noise should be removed, and may be between 0 and 1
1333 with a default of 0.5. Higher values will remove more noise but
1334 present a greater likelihood of distorting the desired audio
1335 signal. Experiment with different threshold values to find the
1336 optimal one for your audio.
1337
1338 pad { length[@position] }
1339 Pad the audio with silence, at the beginning, the end, or any
1340 specified points through the audio. Both length and position
1341 can specify a time or, if appended with an `s', a number of sam‐
1342 ples. length is the amount of silence to insert and position
1343 the position in the input audio stream at which to insert it.
1344 Any number of lengths and positions may be specified, provided
1345 that a specified position is not less than the previous one.
1346 position is optional for the first and last lengths specified
1347 and if omitted corresponds to the beginning and the end of the
1348 audio respectively. For example: pad 1.5 1.5 adds 1.5 seconds
1349 of silence padding at each end of the audio, whilst pad
1350 4000s@3:00 inserts 4000 samples of silence 3 minutes into the
1351 audio. If silence is wanted only at the end of the audio, spec‐
1352 ify either the end position or specify a zero-length pad at the
1353 start.
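
The effect of a list of lengths and positions can be pictured with the following sketch, which works on a plain list of samples. It is an illustration only: the function name pad is hypothetical, lengths and positions are taken to be already converted to sample counts, and positions refer to the input audio as described above.

```python
def pad(samples, pads):
    """Insert silence; pads is a list of (length, position) in samples,
    with positions in non-decreasing order."""
    out = []
    cursor = 0
    for length, position in pads:
        if position < cursor:
            raise ValueError("positions must not decrease")
        out.extend(samples[cursor:position])   # audio up to the pad point
        out.extend([0] * length)               # the inserted silence
        cursor = position
    out.extend(samples[cursor:])               # remaining audio
    return out
```

Padding a four-sample input with two samples at the start, one in the middle, and two at the end yields nine samples in total.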
1354
1355 pan direction
1356 Pan the audio from one channel to another. This is done by
1357 changing the volume of the input channels so that it fades out
1358 on one channel and fades-in on another. If the number of input
1359 channels is different from the number of output channels then
1360 this effect tries to intelligently handle this. For instance,
1361 if the input contains 1 channel and the output contains 2 chan‐
1362 nels, then it will create the missing channel itself. The
1363 direction is a value from -1 to 1. -1 represents far left and 1
1364 represents far right. Numbers in between will start the pan
1365 effect without totally muting the opposite channel.
1366
1367 phaser gain-in gain-out delay decay speed [-s|-t]
1368 Add a phasing effect to the audio. delay/decay/speed gives the
1369 delay in milliseconds and the decay (relative to gain-in) with a
1370 modulation speed in Hz. The modulation is either sinusoidal
1371 (-s) or triangular (-t). The decay should be less than 0.5 to
1372 avoid feedback. Gain-out is the volume of the output.
1373
1374 pitch shift [width interpolate fade]
1375 Change the pitch of the audio without affecting its duration by
1376 cross-fading shifted samples. shift is given in cents. Use a
1377 positive value to shift to treble, negative value to shift to
1378 bass. Default shift is 0. width of window is in ms. Default
1379 width is 20ms. Try 30ms to lower pitch, and 10ms to raise
1380 pitch. The interpolate option can be cubic or linear; the default
1381 is cubic. The fade option can be cos, hamming, linear or
1382 trapezoid; the default is cos.
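
A cent is one hundredth of an equal-tempered semitone, so 1200 cents make one octave. The frequency ratio that corresponds to a given shift can be worked out as in this short sketch (the function name cents_to_ratio is hypothetical):

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch shift given in cents."""
    return 2.0 ** (cents / 1200.0)
```

A shift of 100 raises the pitch by one semitone (a ratio of about 1.0595), and a shift of -1200 lowers it by an octave.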
1383
1384 polyphase [-w nut|ham] [-width n] [-cutoff c]
1385 Change the sampling rate using `polyphase interpolation', a DSP
1386 algorithm. This method is relatively slow and memory intensive.
1387
1388 If the -w parameter is nut, then a Nuttall (~90 dB stop-band)
1389 window will be used; ham selects a Hamming (~43 dB stop-band)
1390 window. The default is Nuttall.
1391
1392 The -width parameter specifies the (approximate) width of the
1393 filter. The default is 1024 samples, which produces reasonable
1394 results.
1395
1396 The -cutoff value (c) specifies the filter cutoff frequency in
1397 terms of a fraction of the frequency bandwidth, also known as the
1398 Nyquist frequency. See the resample effect for further informa‐
1399 tion on Nyquist frequency. If up-sampling, then this is the
1400 fraction of the original signal that should go through. If
1401 down-sampling, this is the fraction of the signal left after
1402 down-sampling. The default is 0.95.
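
As a rough illustration of the -cutoff arithmetic described above, the following Python sketch converts the fraction into a frequency in Hz. The function name polyphase_cutoff_hz is hypothetical, and the sketch assumes that the fraction applies to the Nyquist frequency of whichever of the two rates is lower; it is not taken from the SoX source.

```python
def polyphase_cutoff_hz(in_rate, out_rate, cutoff=0.95):
    """Approximate filter cutoff in Hz for a given -cutoff fraction.

    Assumption: the fraction is applied to the Nyquist frequency
    (half the sample rate) of the lower of the two rates.
    """
    nyquist = min(in_rate, out_rate) / 2.0
    return cutoff * nyquist


# Down-sampling 44.1 kHz to 22.05 kHz with the default cutoff of 0.95.
print(polyphase_cutoff_hz(44100, 22050))
# Up-sampling 8 kHz to 44.1 kHz with a cutoff of 0.9.
print(polyphase_cutoff_hz(8000, 44100, cutoff=0.9))
```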
1403
1404 See also rabbit and resample for other sample-rate changing
1405 effects.
1406
1407 rabbit [-c0|-c1|-c2|-c3|-c4]
1408 Change the sampling rate using `libsamplerate', also known as
1409 `Secret Rabbit Code'. This effect is optional and must have
1410 been selected at compile time of SoX. See
1411 http://www.mega-nerd.com/SRC for details of the algorithms. Algorithms 0
1412 through 2 are progressively faster and lower quality versions of
1413 the sinc algorithm; the default is -c0, which is probably the
1414 best quality algorithm for general use currently available in
1415 SoX. Algorithm 3 is zero-order hold, and 4 is linear interpola‐
1416 tion.
1417
1418 See also polyphase and resample for other sample-rate changing
1419 effects, and see resample for more discussion of resampling.
1420
1421 repeat count
1422 Repeat the entire audio count times. Requires disk space to
1423 store the data to be repeated. Note that repeating once yields
1424 two copies: the original audio and the repeated audio.
1425
1426 resample [-qs|-q|-ql] [rolloff [beta]]
1427 Change the sampling rate using simulated analog filtration.
1428 Other rate changing effects available are polyphase and rabbit.
1429 There is a detailed analysis of resample and polyphase at
1430 http://leute.server.de/wilde/resample.html; see rabbit for a
1431 pointer to its own documentation.
1432
1433 By default, linear interpolation is used, with a window width of
1434 about 45 samples at the lower of the two rates. This gives an
1435 accuracy of about 16 bits, but insufficient stop-band rejection
1436 in the case that you want to have roll-off greater than about
1437 0.8 of the Nyquist frequency.
1438
1439 The -q* options will change the default values for roll-off and
1440 beta as well as use quadratic interpolation of filter coeffi‐
1441 cients, resulting in about 24 bits precision. The -qs, -q, or
1442 -ql options specify increased accuracy at the cost of lower exe‐
1443 cution speed. It is optional to specify roll-off and beta
1444 parameters when using the -q* options.
1445
1446 Following is a table of the reasonable defaults which are built-
1447 in to SoX:
1448
1449
1450 ┌──────────────────────────────────────────────────┐
1451 │Option   Window   Roll-off   Beta   Interpolation │
1452 │(none)     45     0.80        16    linear        │
1453 │ -qs       45     0.80        16    quadratic     │
1454 │ -q        75     0.875       16    quadratic     │
1455 │ -ql      149     0.94        16    quadratic     │
1456 └──────────────────────────────────────────────────┘
1457 -qs, -q, or -ql use window lengths of 45, 75, or 149 samples,
1458 respectively, at the lower sample-rate of the two files. This
1459 means progressively sharper stop-band rejection, at proportion‐
1460 ally slower execution times.
1461
1462 rolloff refers to the cut-off frequency of the low pass filter
1463 and is given in terms of the Nyquist frequency for the lower
1464 sample rate. rolloff therefore should be something between 0
1465 and 1; in practice, 0.8-0.95. The defaults are indicated above.
1466
1467 The Nyquist frequency is equal to half the sample rate. Logically,
1468 this is because the A/D converter needs at least 2 samples to
1469 detect 1 cycle at the Nyquist frequency. Frequencies higher than
1470 the Nyquist frequency actually appear as lower frequencies to the
1471 A/D converter; this is called aliasing. Normally, A/D converters
1472 run the signal through a lowpass filter first to avoid these
1473 problems.
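
The folding behaviour described above can be worked through numerically. The Python sketch below is only an illustration of the standard aliasing arithmetic for an unfiltered, ideally sampled tone; the function names nyquist and aliased_frequency are hypothetical and are not part of SoX.

```python
def nyquist(rate):
    """Nyquist frequency: half the sample rate."""
    return rate / 2.0


def aliased_frequency(freq, rate):
    """Apparent frequency of a `freq` Hz tone sampled at `rate` Hz
    when no anti-aliasing filter is applied."""
    # Sampling cannot distinguish freq from freq + k * rate.
    folded = freq % rate
    # Anything above the Nyquist frequency is mirrored back below it.
    if folded > nyquist(rate):
        return rate - folded
    return folded


# A 5 kHz tone sampled at 8 kHz shows up at 3 kHz.
print(aliased_frequency(5000, 8000))
```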
1474
1475 Similar problems will happen in software when reducing the sam‐
1476 ple rate of an audio file (frequencies above the new Nyquist
1477 frequency can be aliased to lower frequencies). Therefore, a
1478 good resample effect will remove all frequency information above
1479 the new Nyquist frequency.
1480
1481 The rolloff refers to how close to the Nyquist frequency this
1482 cutoff is, with closer being better. When increasing the sample
1483 rate of an audio file, you would not expect any frequencies to
1484 exist above the original Nyquist frequency. However, because of
1485 the way resampling works, aliasing artifacts are commonly created
1486 above the old Nyquist frequency. In that case, rolloff refers to
1487 how close to the original Nyquist frequency the lowpass filter
1488 that removes these artifacts cuts off, with closer also being
1489 better.
1490
1491 The beta parameter determines the type of filter window used.
1492 Any value greater than 2 is the beta for a Kaiser window. Beta
1493 ≤ 2 selects a Nuttall window. If unspecified, the default is a
1494 Kaiser window with beta 16.
1495
1496 In the case of a Kaiser window (beta > 2), lower betas produce a
1497 somewhat faster transition from pass-band to stop-band, at the
1498 cost of noticeable artifacts. A beta of 16 is the default; a beta
1499 of less than 10 is not recommended. If you want a sharper cutoff,
1500 don't use low betas; use a longer sample window. A Nuttall
1501 window is selected by specifying any `beta' ≤ 2, and the Nuttall
1502 window has somewhat steeper cutoff than the default Kaiser win‐
1503 dow. You will probably not need to use the beta parameter at
1504 all, unless you are just curious about comparing the effects of
1505 Nuttall vs. Kaiser windows.
1506
1507 This is the default effect if the two files have different sam‐
1508 pling rates. Default parameters are, as indicated above, Kaiser
1509 window of length 45, roll-off 0.80, beta 16, linear interpola‐
1510 tion.
1511
1512 Note: -qs is only slightly slower, but more accurate for 16-bit
1513 or higher precision.
1514
1515 Note: In many cases of up-sampling, no interpolation is needed,
1516 as exact filter coefficients can be computed in a reasonable
1517 amount of space. To be precise, this is done when
1518
1519 input-rate < output-rate
1520 and
1521 output-rate ÷ gcd(input-rate, output-rate) ≤ 511
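
The condition above is easy to check for a given pair of rates. The following Python sketch evaluates it directly; the function name uses_exact_coefficients and the limit parameter are hypothetical names chosen for illustration, and the sketch only restates the inequality given above.

```python
from math import gcd


def uses_exact_coefficients(in_rate, out_rate, limit=511):
    """True when the up-sampling condition stated above holds:
    input-rate < output-rate and
    output-rate / gcd(input-rate, output-rate) <= limit."""
    if in_rate >= out_rate:
        return False
    return out_rate // gcd(in_rate, out_rate) <= limit


# 44100 -> 48000: gcd is 300, and 48000 / 300 = 160.
print(uses_exact_coefficients(44100, 48000))
# 11025 -> 48000: gcd is 75, and 48000 / 75 = 640.
print(uses_exact_coefficients(11025, 48000))
```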
1522
1523 reverb gain-out reverb-time <delay>
1524 Add reverberation to the audio. Each delay is given in
1525 milliseconds, and its feedback depends on the reverb-time, also in
1526 milliseconds. Each delay should be in the range of a quarter to a
1527 half of reverb-time to get a realistic reverberation. gain-out
1528 is the volume of the output.
1529
1530 reverse
1531 Reverse the audio completely. Requires disk space to store the
1532 data to be reversed.
1533
1534 silence above-periods [duration threshold[d|%] [below-periods duration
1535 threshold[d|%]]]
1536
1537 Removes silence from the beginning, middle, or end of the audio.
1538 Silence is anything below a specified threshold.
1539
1540 The above-periods value is used to indicate if audio should be
1541 trimmed at the beginning of the audio. A value of zero indi‐
1542 cates no silence should be trimmed from the beginning. When
1543 specifying a non-zero above-periods, it trims audio up until it
1544 finds non-silence. Normally, when trimming silence from begin‐
1545 ning of audio the above-periods will be 1 but it can be
1546 increased to higher values to trim all audio up to a specific
1547 count of non-silence periods. For example, if you had an audio
1548 file with two songs that each contained 2 seconds of silence
1549 before the song, you could specify an above-period of 2 to strip
1550 out both silence periods and the first song.
1551
1552 When above-periods is non-zero, you must also specify a duration
1553 and threshold. Duration indicates the amount of time that
1554 non-silence must be detected before it stops trimming audio. By
1555 increasing the duration, bursts of noise can be treated as
1556 silence and trimmed off.
1557
1558 Threshold is used to indicate what sample value you should treat
1559 as silence. For digital audio, a value of 0 may be fine but for
1560 audio recorded from analog, you may wish to increase the value
1561 to account for background noise.
1562
1563 When optionally trimming silence from the end of the audio, you
1564 specify a below-periods count. In this case, below-periods means
1565 to remove all audio after silence is detected. Normally, this
1566 will be a value of 1, but it can be increased to skip over
1567 periods of silence that are wanted. For example, if you have a song
1568 with 2 seconds of silence in the middle and 2 seconds at the end,
1569 you could set below-periods to a value of 2 to skip over the
1570 silence in the middle of the audio.
1571
1572 For below-periods, duration specifies a period of silence that
1573 must exist before audio is not copied any more. By specifying a
1574 higher duration, silence that is wanted can be left in the
1575 audio. For example, if you have a song with an expected 1 sec‐
1576 ond of silence in the middle and 2 seconds of silence at the
1577 end, a duration of 2 seconds could be used to skip over the mid‐
1578 dle silence.
1579
1580 Unfortunately, you must know the length of the silence at the
1581 end of your audio file to trim off silence reliably. A
1582 workaround is to use the silence effect in combination with the
1583 reverse effect. By first reversing the audio, you can use the
1584 above-periods to reliably trim all audio from what looks like
1585 the front of the file. Then reverse the file again to get back
1586 to normal.
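
The reverse-and-trim workaround can be sketched in a few lines of Python. This is a simplified model and not the SoX implementation: it assumes samples are floats between -1 and 1, that threshold is a fraction of full scale, that duration is a count of consecutive samples, and that only a single above-period is wanted. The function names are hypothetical.

```python
def trim_leading_silence(samples, threshold, duration=1):
    """Drop samples from the front until `duration` consecutive
    samples have a magnitude above `threshold`."""
    run = 0
    for i, sample in enumerate(samples):
        # Count consecutive non-silent samples; any silence resets the run.
        run = run + 1 if abs(sample) > threshold else 0
        if run >= duration:
            return samples[i - duration + 1:]
    return []  # no non-silence of the required duration was found


def trim_trailing_silence(samples, threshold, duration=1):
    """Reverse, trim what now looks like the front, reverse again."""
    reversed_audio = samples[::-1]
    trimmed = trim_leading_silence(reversed_audio, threshold, duration)
    return trimmed[::-1]


audio = [0.0, 0.0, 0.5, 0.0, 0.4, 0.6, 0.0, 0.0]
print(trim_leading_silence(audio, 0.1))
print(trim_trailing_silence(audio, 0.1))
```

With a duration of 2 in this model, the isolated 0.5 sample is treated as a burst of noise and trimmed along with the leading silence.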
1587
1588 To remove silence from the middle of a file, specify a below-
1589 periods that is negative. This value is then treated as a posi‐
1590 tive value and is also used to indicate the effect should
1591 restart processing as specified by the above-periods, making it
1592 suitable for removing periods of silence in the middle of the
1593 audio.
1594
1595 The period counts are in units of samples. Duration counts may
1596 be in the format of hh:mm:ss.frac, or the exact count of sam‐
1597 ples. Threshold numbers may be suffixed with d to indicate the
1598 value is in decibels, or % to indicate a percentage of maximum
1599 value of the sample value (0% specifies pure digital silence).
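
To compare thresholds given with different suffixes, it can help to convert both to a fraction of full scale. The Python sketch below does this under the assumption that the d suffix means amplitude decibels relative to the maximum sample value (so -40d is 1% of full scale); the function name threshold_to_fraction is hypothetical, and unsuffixed values are deliberately not handled.

```python
def threshold_to_fraction(spec):
    """Convert a threshold such as '1%' or '-40d' to a fraction of
    the maximum sample value."""
    if spec.endswith('%'):
        return float(spec[:-1]) / 100.0
    if spec.endswith('d'):
        # Assumption: amplitude decibels relative to full scale.
        return 10.0 ** (float(spec[:-1]) / 20.0)
    raise ValueError("this sketch expects a 'd' or '%' suffix: " + spec)


print(threshold_to_fraction('1%'))
print(threshold_to_fraction('-40d'))
```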
1600
1601 speed factor[c]
1602 Adjust the audio speed (pitch and tempo together). factor is
1603 either the ratio of the new speed to the old speed: greater than
1604 1 speeds up, less than 1 slows down, or, if appended with the
1605 letter `c', the number of cents (i.e. 100ths of a semitone) by
1606 which the pitch (and tempo) should be adjusted: greater than 0
1607 increases, less than 0 decreases.
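
The two forms of factor are related by the usual definition of a cent as 1/1200 of an octave. The Python sketch below converts either form to a plain speed ratio and shows the effect on duration; the function names speed_ratio and new_duration are hypothetical and the code is an illustration only.

```python
def speed_ratio(factor):
    """Convert a speed argument such as '1.027' or '100c' to a ratio
    of new speed to old speed."""
    if factor.endswith('c'):
        cents = float(factor[:-1])
        # 1200 cents make one octave, which doubles the speed.
        return 2.0 ** (cents / 1200.0)
    return float(factor)


def new_duration(seconds, factor):
    """Duration of the audio after the speed change."""
    return seconds / speed_ratio(factor)


print(speed_ratio('100c'))       # one semitone up
print(new_duration(60.0, '2'))   # twice as fast, half as long
```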
1608
1609 By default, the speed change is performed by the resample effect
1610 with its default parameters. For higher quality resampling, in
1611 addition to the speed effect, specify either the resample or the
1612 rabbit effect with appropriate parameters.
1613
1614 stat [-s n] [-rms] [-freq] [-v] [-d]
1615 Do a statistical check on the input file, and print results on
1616 the standard error file. Audio is passed unmodified through the
1617 SoX processing chain.
1618
1619 The `Volume Adjustment:' field in the statistics gives you the
1620 argument to the -v option which will make the audio as loud as
1621 possible without clipping. Note: See the discussion on Clipping
1622 above for reasons why it is rarely a good idea to actually do
1623 this.
1624
1625 The option -v will print out the `Volume Adjustment:' field's
1626 value only and return. This could be of use in scripts to
1627 adjust the volume automatically.
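
The idea behind the `Volume Adjustment:' figure can be sketched as follows. This is a simplified model rather than the stat implementation: it assumes samples are floats between -1 and 1 (SoX itself works on 32-bit signed integers), and the function names volume_adjustment and rms are hypothetical.

```python
import math


def volume_adjustment(samples):
    """Largest gain that keeps every sample within [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        raise ValueError("all samples are zero; no finite gain exists")
    return 1.0 / peak


def rms(samples):
    """Root mean square of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


audio = [0.25, -0.5, 0.1]
print(volume_adjustment(audio))
print(rms([0.5, -0.5]))
```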
1628
1629 The -s option is used to scale the input data by a given factor.
1630 The default value of n is the max value of a signed long vari‐
1631 able (0x7fffffff). Internal effects always work with signed
1632 long PCM data and so the value should relate to this fact.
1633
1634 The -rms option will convert all output average values to `root
1635 mean square' format.
1636
1637 The -freq option calculates the input's power spectrum and
1638 prints it to standard error.
1639
1640 There is also an optional parameter -d that will print out a hex
1641 dump of the audio from the internal buffer that is in 32-bit
1642 signed PCM data. This is mainly of use in tracking down
1643 endian problems that creep in to SoX on cross-platform versions.
1644
1645 stretch factor [window fade shift fading]
1646 Time stretch the audio by the given factor. Changes duration
1647 without affecting the pitch. A factor greater than 1 lengthens
1648 the audio; less than 1 shortens it. window is the window size in
1649 ms; the default is 20ms. The fade option can be `lin'. shift is
1650 a ratio in [0 1]; its default depends on the stretch factor: 1 to
1651 shorten, 0.8 to lengthen. fading is a ratio in [0 0.5]; its
1652 default depends on factor and shift.
1653
1654 swap [1 2 | 1 2 3 4]
1655 Swap channels in multi-channel audio files. Optionally, you may
1656 specify the channel order you would like the output in. This
1657 defaults to output channel 2 and then 1 for stereo and 2, 1, 4,
1658 3 for quad-channels. An interesting feature is that you may
1659 duplicate a given channel by overwriting another. This is done
1660 by repeating an output channel on the command-line. For exam‐
1661 ple, swap 2 2 will overwrite channel 1 with channel 2; creating
1662 a stereo file with both channels containing the same audio.
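
The channel ordering can be modelled as a simple re-indexing of each frame. The Python sketch below illustrates this, including the duplication trick; the function name swap_channels is hypothetical, and channel numbers are 1-based as on the command line.

```python
def swap_channels(frame, order=None):
    """Return the channels of one frame in the requested order."""
    if order is None:
        # Defaults described above: 2 1 for stereo, 2 1 4 3 for quad.
        if len(frame) == 2:
            order = [2, 1]
        elif len(frame) == 4:
            order = [2, 1, 4, 3]
        else:
            raise ValueError("this sketch handles stereo and quad only")
    return [frame[channel - 1] for channel in order]


print(swap_channels(['left', 'right']))          # default stereo swap
print(swap_channels(['left', 'right'], [2, 2]))  # duplicate channel 2
```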
1663
1664 synth [len] {[type] [combine] [freq[-freq2]] [off] [ph] [p1] [p2] [p3]}
1665 This effect can be used to generate fixed or swept frequency
1666 audio tones with various wave shapes, or to generate wide-band
1667 noise of various `colours'. Multiple synth effects can be cas‐
1668 caded to produce more complex waveforms; at each stage it is
1669 possible to choose whether the generated waveform will be mixed
1670 with, or modulated onto the output from the previous stage.
1671 Audio for each channel in a multi-channel audio file can be syn‐
1672 thesised independently.
1673
1674 Though this effect is used to generate audio, an input file must
1675 still be given, the characteristics of which will be used to set
1676 the synthesised audio length, the number of channels, and the
1677 sampling rate; however, since the input file's audio is not nor‐
1678 mally needed, a `null file' (with the special name -n) is often
1679 given instead (and the length specified as a parameter to synth
1680 or by another given effect that has an associated length).
1681
1682 For example, the following produces a 3 second, 44.1 kHz, stereo
1683 audio file containing a sine-wave swept from 300 to 3300 Hz:
1684
1685 sox -n output.au synth 3 sine 300-3300
1686
1687 and this produces an 8 kHz mono version:
1688
1689 sox -r 8000 -c 1 -n output.au synth 3 sine 300-3300
1690
1691 Multiple channels can be synthesised by specifying the set of
1692 parameters shown between braces multiple times; the following
1693 puts the swept tone in the left channel and adds `brown' noise
1694 in the right:
1695
1696 sox -n output.au synth 3 sine 300-3300 brownnoise
1697
1698 The following example shows how two synth effects can be cas‐
1699 caded to create a more complex waveform:
1700
1701 sox -n output.au synth 0.5 sine 200-500 synth 0.5 sine
1702 fmod 700-100
1703
1704 Frequencies can also be given as a number of musical semitones
1705 relative to `middle A' (440 Hz) by prefixing a `%' character;
1706 for example, the following could be used to help tune a guitar's
1707 `E' strings:
1708
1709 play -n synth sine %-17
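
The `%' notation follows the equal-tempered scale, in which each semitone multiplies the frequency by the twelfth root of 2. The Python sketch below shows the conversion to Hz; the function name semitones_to_hz is hypothetical.

```python
def semitones_to_hz(semitones, reference=440.0):
    """Frequency that lies `semitones` away from the reference A."""
    return reference * 2.0 ** (semitones / 12.0)


# %-17 from the example above is 17 semitones below 440 Hz,
# which is an E at roughly 164.8 Hz.
print(semitones_to_hz(-17))
```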
1710
1711 N.B. This effect generates audio at maximum volume, which means
1712 that there is a high chance of clipping when using the audio
1713 subsequently, so in most cases, you will want to follow this
1714 effect with the vol effect to prevent this from happening. (See
1715 also Clipping above.)
1716
1717 A detailed description of each synth parameter follows:
1718
1719 len is the length of audio to synthesise expressed as a time or
1720 as a number of samples; 0=inputlength, default=0.
1721
1722 The format for specifying lengths in time is hh:mm:ss.frac. The
1723 format for specifying sample counts is the number of samples
1724 with the letter `s' appended to it.
1725
1726 type is one of sine, square, triangle, sawtooth, trapezium, exp,
1727 [white]noise, pinknoise, brownnoise; default=sine
1728
1729 combine is one of create, mix, amod (amplitude modulation), fmod
1730 (frequency modulation); default=create
1731
1732 freq/freq2 are the frequencies at the beginning/end of synthesis
1733 in Hz or, if preceded with `%', semitones relative to A
1734 (440 Hz); for both, default=%0. If freq2 is given, then len
1735 must also have been given. Not used for noise.
1736
1737 off is the bias (DC-offset) of the signal in percent; default=0.
1738
1739 ph is the phase shift in percentage of 1 cycle; default=0. Not
1740 used for noise.
1741
1742 p1 is the percentage of each cycle that is `on' (square), or
1743 `rising' (triangle, exp, trapezium); default=50 (square, trian‐
1744 gle, exp), default=10 (trapezium).
1745
1746 p2 (trapezium): the percentage through each cycle at which
1747 `falling' begins; default=50. exp: the amplitude in percent;
1748 default=100.
1749
1750 p3 (trapezium): the percentage through each cycle at which
1751 `falling' ends; default=60.
1752
1753 treble gain [frequency [width[s|h|o|q]]]
1754 Apply a treble tone-control effect. See the description of the
1755 bass effect for details.
1756
1757 tremolo speed [depth]
1758 Apply a tremolo (low frequency amplitude modulation) effect to
1759 the audio. The tremolo frequency in Hz is given by speed, and
1760 the depth as a percentage by depth (default 40).
1761
1762 Note: This effect is a special case of the synth effect.
1763
1764 trim start [length]
1765 Trim can trim off unwanted audio from the beginning and end of
1766 the audio. Audio is not sent to the output stream until the
1767 start location is reached.
1768
1769 The optional length parameter tells the number of samples to
1770 output after the start sample and is used to trim off the back
1771 side of the audio. Using a value of 0 for the start parameter
1772 will allow trimming off the back side only.
1773
1774 Both options can be specified using either an amount of time or
1775 an exact count of samples. The format for specifying lengths in
1776 time is hh:mm:ss.frac. A start value of 1:30.5 will not start
1777 until 1 minute, thirty and ½ seconds into the audio. The format
1778 for specifying sample counts is the number of samples with the
1779 letter `s' appended to it. A value of 8000s will wait until
1780 8000 samples are read before starting to process audio.
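
The two position formats can be reduced to a sample count once the sampling rate is known. The Python sketch below is one way to do that; the function name position_to_samples is hypothetical, and the rounding of fractional samples is an assumption.

```python
def position_to_samples(spec, rate):
    """Convert 'hh:mm:ss.frac' or '<count>s' to a number of samples."""
    if spec.endswith('s'):
        # An exact sample count, for example '8000s'.
        return int(spec[:-1])
    seconds = 0.0
    for field in spec.split(':'):
        # Each colon-separated field is worth 60 times the next one.
        seconds = seconds * 60.0 + float(field)
    return int(round(seconds * rate))


print(position_to_samples('1:30.5', 8000))  # 90.5 seconds at 8 kHz
print(position_to_samples('8000s', 8000))
```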
1781
1782 vol gain [type [limitergain]]
1783 Apply an amplification or an attenuation to the audio signal.
1784 Unlike the -v option (which is used for balancing multiple input
1785 files as they enter the SoX effects processing chain), vol is an
1786 effect like any other so can be applied anywhere, and several
1787 times if necessary, during the processing chain.
1788
1789 The amount to change the volume is given by gain which is inter‐
1790 preted, according to the given type, as follows: if type is
1791 amplitude (or is omitted), then gain is an amplitude (i.e. volt‐
1792 age or linear) ratio, if power, then a power (i.e. wattage or
1793 voltage-squared) ratio, and if dB, then a power change in dB.
1794
1795 When type is amplitude or power, a gain of 1 leaves the volume
1796 unchanged, less than 1 decreases it, and greater than 1
1797 increases it; a negative gain inverts the audio signal in addi‐
1798 tion to adjusting its volume.
1799
1800 When type is dB, a gain of 0 leaves the volume unchanged, less
1801 than 0 decreases it, and greater than 0 increases it.
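
The three gain types can all be reduced to the factor by which each sample is multiplied. The Python sketch below shows the standard conversions; the function name amplitude_multiplier is hypothetical, and keeping the sign of a negative power gain is an assumption made so that inversion still works.

```python
import math


def amplitude_multiplier(gain, gain_type='amplitude'):
    """Factor applied to each sample for a given vol gain and type."""
    if gain_type == 'amplitude':
        return gain
    if gain_type == 'power':
        # Power is proportional to amplitude squared.
        # Assumption: the sign is kept so a negative gain still inverts.
        return math.copysign(math.sqrt(abs(gain)), gain)
    if gain_type == 'dB':
        # A change of 20 dB corresponds to a tenfold amplitude change.
        return 10.0 ** (gain / 20.0)
    raise ValueError("unknown gain type: " + gain_type)


print(amplitude_multiplier(4.0, 'power'))
print(amplitude_multiplier(-6.0, 'dB'))
```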
1802
1803 See [4] for a detailed discussion on electrical (and hence audio
1804 signal) voltage and power ratios.
1805
1806 Beware of Clipping when increasing the volume.
1807
1808 An optional limitergain value can be specified and should be a
1809 value much less than 1 (e.g. 0.05 or 0.02) and is used only on
1810 peaks to prevent clipping. Not specifying this parameter will
1811 cause no limiter to be used. In verbose mode, this effect will
1812 display the percentage of the audio that needed to be limited.
1813
1814 Deprecated Effects
1815 The following effects have been renamed or have their functionality
1816 included in another effect. They continue to work in this version of
1817 SoX but may be removed in future.
1818
1819 avg [ -l|-r|-f|-b|-1|-2|-3|-4|n{,n} ]
1820 Reduce the number of audio channels by mixing or selecting chan‐
1821 nels, or duplicate channels to increase the number of channels.
1822 This effect is just an alias of the mixer effect and is retained
1823 for backwards compatibility only.
1824
1825 highp frequency
1826 Apply a high-pass filter. This effect is just an alias for the
1827 highpass effect used with its -1 option; it is retained for
1828 backwards compatibility only.
1829
1830 lowp frequency
1831 Apply a low-pass filter. This effect is just an alias for the
1832 lowpass effect used with its -1 option; it is retained for back‐
1833 wards compatibility only.
1834
1835 mask [depth]
1836 This effect is just a deprecated alias for the dither effect,
1837 left for historical reasons.
1838
1839 pick [ -l|-r|-f|-b|-1|-2|-3|-4|n{,n} ]
1840 Pick a subset of channels to be copied into the output file.
1841 This effect is just an alias of the mixer effect and is retained
1842 for backwards compatibility only.
1843
1844 rate Does the same as resample with no parameters; it exists for
1845 backwards compatibility.
1846
1847 vibro speed [depth]
1848 This is a deprecated alias for the tremolo effect. It differs
1849 in that the depth parameter ranges from 0 to 1 and defaults to
1850 0.5.
1851
1853 Exit status is 0 for no error, 1 if there is a problem with the com‐
1854 mand-line parameters, or 2 if an error occurs during file processing.
1855
1857 Please report any bugs found in this version of SoX to the mailing list
1858 (sox-users@lists.sourceforge.net).
1859
1861 soxexam(7), libst(3)
1862
1863 The SoX web page at http://sox.sourceforge.net
1864
1865 References
1866 [1] R. Bristow-Johnson, Cookbook formulae for audio EQ biquad filter
1867 coefficients, http://musicdsp.org/files/Audio-EQ-Cookbook.txt
1868
1869 [2] Wikipedia, Q-factor, http://en.wikipedia.org/wiki/Q_factor
1870
1871 [3] Scott Lehman, Flanging,
1872 http://harmony-central.com/Effects/Articles/Flanging
1873
1874 [4] Wikipedia, Decibel, http://en.wikipedia.org/wiki/Decibel
1875
1877 Copyright 1991 Lance Norskog and Sundry Contributors. Copyright
1878 1998-2007 by Chris Bagwell and SoX Contributors.
1879
1880 This program is free software; you can redistribute it and/or modify it
1881 under the terms of the GNU General Public License as published by the
1882 Free Software Foundation; either version 2, or (at your option) any
1883 later version.
1884
1885 This program is distributed in the hope that it will be useful, but
1886 WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
1887 CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
1888 Public License for more details.
1889
1891 Chris Bagwell (cbagwell@users.sourceforge.net). Other authors and con‐
1892 tributors are listed in the AUTHORS file that is distributed with the
1893 source code.
1894
1895
1896
1897sox January 31, 2007 SoX(1)