1SoX(1) Sound eXchange SoX(1)
2
3
4
6 SoX - Sound eXchange, the Swiss Army knife of audio manipulation
7
9 sox [global-options] [format-options] infile1
10 [[format-options] infile2] ... [format-options] outfile
11 [effect [effect-options]] ...
12
13 play [global-options] [format-options] infile1
14 [[format-options] infile2] ... [format-options]
15 [effect [effect-options]] ...
16
17 rec [global-options] [format-options] outfile
18 [effect [effect-options]] ...
19
21 Introduction
22 SoX reads and writes audio files in most popular formats and can
23 optionally apply effects to them; it can combine multiple input
24 sources, synthesise audio, and, on many systems, act as a general pur‐
25 pose audio player or a multi-track audio recorder. It also has limited
26 ability to split the input into multiple output files.
27
28 Almost all SoX functionality is available using just the sox command;
29 however, to simplify playing and recording audio, if SoX is invoked as
30 play the output file is automatically set to be the default sound
31 device and if invoked as rec the default sound device is used as an
32 input source. Additionally, the soxi(1) command provides a convenient
33 way to just query audio file header information.
34
35 The heart of SoX is a library called libSoX. Those interested in
36 extending SoX or using it in other programs should refer to the libSoX
37 manual page: libsox(3).
38
39 SoX is a command-line audio processing tool, particularly suited to
40 making quick, simple edits and to batch processing. If you need an
41 interactive, graphical audio editor, use audacity(1).
42
43 * * *
44
45 The overall SoX processing chain can be summarised as follows:
46
47 Input(s) → Combiner → Effects → Output(s)
48
49 To show how this works in practice, here is a selection of examples of
50 how SoX might be used. The simple
51 sox recital.au recital.wav
52 translates an audio file in Sun AU format to a Microsoft WAV file,
53 whilst
54 sox recital.au -r 12k -b 8 -c 1 recital.wav vol 0.7 dither
55 performs the same format translation, but also changes the audio sam‐
56 pling rate & sample size, down-mixes to mono, and applies the vol and
57 dither effects.
58 sox -r 8k -u -b 8 -c 1 voice-memo.raw voice-memo.wav
59 converts `raw' (a.k.a. `headerless') audio to a self-describing file
60 format,
61 sox slow.aiff fixed.aiff speed 1.027
62 adjusts audio speed,
63 sox short.au long.au longer.au
64 concatenates two audio files, and
65 sox -m music.mp3 voice.wav mixed.flac
66 mixes together two audio files.
67 play "The Moonbeams/Greatest/*.ogg" bass +3
68 plays a collection of audio files whilst applying a bass boosting
69 effect,
70 play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade q 0.1 1 0.1
71 plays a synthesised `A minor seventh' chord with a pipe-organ sound,
72 rec -c 2 test.aiff trim 0 10
73 records 10 seconds of stereo audio, and
74 rec -M take1.aiff take1-dub.aiff
75 records a new track in a multi-track recording.
76 rec -r 44100 -2 -s -p silence 1 0.50 0.1% 1 10:00 0.1% | \
77 sox -p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \
78 newfile : restart
79 records a stream of audio such as LP/cassette and splits it into multiple
80 audio files at points with 2 seconds of silence. It also does not start
81 recording until it detects that audio is playing, and stops after 10
82 minutes of silence.
83
84 N.B. Detailed explanations of how to use all SoX parameters, file for‐
85 mats, and effects can be found below in this manual, and in soxfor‐
86 mat(7).
87
88 File Format Types
89 There are two types of audio file format that SoX can work with. The
90 first is `self-describing'; these formats include a header that com‐
91 pletely describes the characteristics of the audio data that follows.
92 The second type is `headerless' (or `raw data'); here, the audio data
93 characteristics must be described using the SoX command line.
94
95 The following four characteristics are sufficient to describe the for‐
96 mat of audio data such that it can be processed with SoX:
97
98 sample rate
99 The sample rate in samples per second (`Hertz' or `Hz'). For
100 example, digital telephony traditionally uses a sample rate of
101 8000 Hz (8 kHz); audio Compact Discs use 44100 Hz (44.1 kHz);
102 Digital Audio Tape and many computer systems use 48 kHz; profes‐
103 sional audio systems typically use 96 or 192 kHz.
104
105 sample size
106 The number of bits used to store each sample. The most popular
107 is 16-bit (two bytes); 8-bit (one byte) was popular in the early
108 days of computer audio, and is still used in telephony; 24-bit
109 (three bytes) is used, primarily as an intermediate format, in
110 the professional audio arena. Other sizes are also used.
111
112 data encoding
113 The way in which each audio sample is represented (or
114 `encoded'). Some encodings have variants with different byte-
115 orderings or bit-orderings; some `compress' the audio data, i.e.
116 the stored audio data takes up less space (i.e. disk-space or
117 transmission band-width) than the other format parameters and
118 the number of samples would imply. Commonly-used encoding types
119 include floating-point, μ-law, ADPCM, signed-integer PCM, and
120 FLAC.
121
122 channels
123 The number of audio channels contained in the file. One
124 (`mono') and two (`stereo') are widely used. `Surround sound'
125 audio typically contains six or more channels.
126
127 The term `bit-rate' is sometimes used as an overall measure of an audio
128 format and may incorporate elements of all of the above.
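As a rough illustration of how these characteristics combine, the bit-rate of uncompressed PCM audio is the product of sample rate, sample size, and number of channels. The sketch below uses a hypothetical shell helper, pcm_bitrate, which is not part of SoX; compressed encodings will generally need fewer bits than this figure.

```shell
# pcm_bitrate RATE_HZ BITS CHANNELS
# Hypothetical helper (not part of SoX): prints the bit-rate, in bits per
# second, of uncompressed PCM audio with the given characteristics.
pcm_bitrate() {
    echo $(( $1 * $2 * $3 ))
}

pcm_bitrate 44100 16 2   # Compact Disc audio: 44.1 kHz, 16-bit, stereo
pcm_bitrate 8000 8 1     # telephony-style audio: 8 kHz, 8-bit, mono
```

The first call prints 1411200 (about 1.4 Mbit/s) and the second prints 64000.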
129
130 Most self-describing formats also allow textual `comments' to be embed‐
131 ded in the file that can be used to describe the audio in some way,
132 e.g. for music, the title, the author, etc.
133
134 One important use of audio file comments is to convey `Replay Gain'
135 information. SoX supports applying Replay Gain information, but not
136 generating it. Note that by default, SoX copies input file comments to
137 output files that support comments, so output files may contain Replay
138 Gain information if some was present in the input file. In this case,
139 if anything other than a simple format conversion was performed then
140 the output file Replay Gain information is likely to be incorrect and
141 so should be recalculated using a tool that supports this (not SoX).
142
143 The soxi(1) command can be used to display information from audio file
144 headers.
145
146 Determining & Setting The File Format
147 There are several mechanisms available for SoX to use to determine or
148 set the format characteristics of an audio file. Depending on the cir‐
149 cumstances, individual characteristics may be determined or set using
150 different mechanisms.
151
152 To determine the format of an input file, SoX will use, in order of
153 precedence and as given or available:
154
155
156 1. Command-line format options.
157 2. The contents of the file header.
158 3. The filename extension.
159
160 To set the output file format, SoX will use, in order of precedence and
161 as given or available:
162
163
164 1. Command-line format options.
165 2. The filename extension.
166 3. The input file format characteristics, or the closest to
167 them that is supported by the output file type.
168
169 For all files, SoX will exit with an error if the file type cannot be
170 determined; command-line format options may need to be added or changed
171 to resolve the problem.
172
173 Play, Rec, & Default Audio Devices
174 Some systems provide more than one type of (SoX-compatible) audio
175 driver, e.g. ALSA & OSS, or SUNAU & AO. Systems can also have more
176 than one audio device (a.k.a. `sound card'). If more than one audio
177 driver has been built-in to SoX, and the default selected by SoX when
178 using rec or play is not the one that is wanted, then the AUDIODRIVER
179 environment variable can be used to override the default. For example
180 (on many systems):
181 set AUDIODRIVER=oss
182 play ...
183 For rec, play, and sox, the AUDIODEV environment variable can be used
184 to override the default audio device; e.g.
185 set AUDIODEV=/dev/dsp2
186 play ...
187 sox ... -t oss
188 or
189 set AUDIODEV=hw:0
190 play ...
191 sox ... -t alsa
192 (Note that the syntax of the set command may vary from system to sys‐
193 tem.)
194
195 When playing a file with a sample rate that is not supported by the
196 audio output device, SoX will automatically invoke the rate effect to
197 perform the necessary sample rate conversion. For compatibility with
198 old hardware, here, the default rate quality level is set to `low';
199 however, this can be changed if desired, by explicitly specifying the
200 rate effect with a different quality level, e.g.
201 play ... rate -m
202 or by setting the environment variable PLAY_RATE_ARG to the desired
203 quality option, e.g.
204 set PLAY_RATE_ARG=-m
205 play ...
206 (Note that the syntax of the set command may vary from system to sys‐
207 tem.)
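In Bourne-compatible shells (sh, bash, and similar), the settings above might be written with export rather than set. This is a sketch only: the driver and device names are the ones used in the examples above and may not exist on a given system.

```shell
# Bourne-style equivalents of the set examples above.
# The values are taken from the text; substitute ones valid on your system.
AUDIODRIVER=oss        # audio driver used by play and rec
AUDIODEV=/dev/dsp2     # audio device used by play, rec, and sox
PLAY_RATE_ARG=-m       # quality option for the automatically invoked rate effect
export AUDIODRIVER AUDIODEV PLAY_RATE_ARG

# A setting can also be limited to a single command, e.g.
#   AUDIODEV=hw:0 play ...
```

Exporting the variables makes them visible to play, rec, and sox when these are started from the same shell session.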
208
209 To help with setting a suitable recording level, SoX includes a simple
210 VU meter which can be invoked (before making the actual recording) as
211 follows:
212 rec -n
213 The recording level should be adjusted (using the system-provided mixer
214 program, not SoX) so that the meter is at most occasionally full scale,
215 and never `in the red' (an exclamation mark is shown).
216
217 Accuracy
218 Many file formats that compress audio discard some of the audio signal
219 information whilst doing so; converting to such a format then convert‐
220 ing back again will not produce an exact copy of the original audio.
221 This is the case for many formats used in telephony (e.g. A-law, GSM)
222 where low signal bandwidth is more important than high audio fidelity,
223 and for many formats used in portable music players (e.g. MP3, Vorbis)
224 where adequate fidelity can be retained even with the large compression
225 ratios that are needed to make portable players practical.
226
227 Formats that discard audio signal information are called `lossy', and
228 formats that do not, `lossless'. The term `quality' is used as a mea‐
229 sure of how closely the original audio signal can be reproduced when
230 using a lossy format.
231
232 Audio file conversion with SoX is lossless when it can be, i.e. when
233 not using lossy compression, when not reducing the sampling rate or
234 number of channels, and when the number of bits used in the destination
235 format is not less than in the source format. E.g. converting from an
236 8-bit PCM format to a 16-bit PCM format is lossless but converting from
237 an 8-bit PCM format to (8-bit) A-law isn't.
238
239 N.B. SoX converts all audio files to an internal uncompressed format
240 before performing any audio processing; this means that manipulating a
241 file that is stored in a lossy format can cause further losses in audio
242 fidelity. E.g. with
243 sox long.mp3 short.mp3 trim 10
244 SoX first decompresses the input MP3 file, then applies the trim
245 effect, and finally creates the output MP3 file by recompressing the
246 audio - with a possible reduction in fidelity above that which occurred
247 when the input file was created. Hence, if what is ultimately desired
248 is lossily compressed audio, it is highly recommended to perform all
249 audio processing using lossless file formats and then convert to the
250 lossy format only at the final stage.
251
252 N.B. Applying multiple effects with a single SoX invocation will, in
253 general, produce more accurate results than those produced using multi‐
254 ple SoX invocations; hence this is also recommended.
255
256 Clipping
257 Clipping is distortion that occurs when an audio signal level (or `vol‐
258 ume') exceeds the range of the chosen representation. It is nearly
259 always undesirable and so should usually be corrected by adjusting the
260 level prior to the point at which clipping occurs.
261
262 In SoX, clipping could occur, as you might expect, when using the vol
263 effect to increase the audio volume, but could also occur with many
264 other effects, when converting one format to another, and even when
265 simply playing the audio.
266
267 Playing an audio file often involves re-sampling, and processing by
268 analogue components that can introduce a small DC offset and/or ampli‐
269 fication, all of which can produce distortion if the audio signal level
270 was initially too close to the clipping point.
271
272 For these reasons, it is usual to make sure that an audio file's signal
273 level does not exceed around 70% of the maximum (linear) range avail‐
274 able, as this will avoid the majority of clipping problems. SoX's stat
275 effect can assist in determining the signal level in an audio file; the
276 gain or vol effect can be used to prevent clipping, e.g.
277 sox dull.au bright.au gain -6 treble +6
278 guarantees that the treble boost will not clip.
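The gain effect in the example above takes its argument in decibels, whereas the 70% guideline is a linear (amplitude) figure. The two can be related with the standard amplitude conversion, factor = 10^(dB/20). The helpers below are hypothetical shell functions (not SoX commands) that use awk for the arithmetic.

```shell
# Hypothetical helpers (not part of SoX) relating decibels to linear amplitude.

# db_to_linear DB: print the linear amplitude factor for a gain of DB decibels.
db_to_linear() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(db / 20 * log(10)) }'
}

# linear_to_db FACTOR: print the gain in decibels for a linear amplitude factor.
linear_to_db() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", 20 * log(f) / log(10) }'
}

db_to_linear -6     # gain -6 roughly halves the amplitude
linear_to_db 0.7    # 70% of full scale is about 3 dB below maximum
```

These print 0.501 and -3.1 respectively, so a 6 dB boost applied after gain -6 returns the signal to roughly its original peak level.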
279
280 If clipping occurs at any point during processing, then SoX will dis‐
281 play a warning message to that effect.
282
283 Input File Combining
284 SoX's input combiner can be configured (see OPTIONS below) to combine
285 multiple files using any of the following methods: `concatenate',
286 `sequence', `mix', `mix-power', or `merge'. The default method is
287 `sequence' for play, and `concatenate' for rec and sox.
288
289 For all methods other than `sequence', multiple input files must have
290 the same sampling rate; if necessary, separate SoX invocations can be
291 used to make sampling rate adjustments prior to combining.
292
293 If the `concatenate' combining method is selected (usually, this will
294 be by default) then the input files must also have the same number of
295 channels. The audio from each input will be concatenated in the order
296 given to form the output file.
297
298 The `sequence' combining method is selected automatically for play. It
299 is similar to `concatenate' in that the audio from each input file is
300 sent serially to the output file, however here the output file may be
301 closed and reopened at the corresponding transition between input files
302 - this may be just what is needed when sending different types of audio
303 to an output device, but is not generally useful when the output is a
304 normal file.
305
306 If either the `mix' or `mix-power' combining method is selected, then
307 two or more input files must be given and will be mixed together to
308 form the output file. The number of channels in each input file need
309 not be the same, however, SoX will issue a warning if they are not and
310 some channels in the output file will not contain audio from every
311 input file. A mixed audio file cannot be un-mixed (without reference
312 to the original input files).
313
314 If the `merge' combining method is selected, then two or more input
315 files must be given and will be merged together to form the output
316 file. The number of channels in each input file need not be the same.
317 A merged audio file comprises all of the channels from all of the input
318 files; un-merging is possible using multiple invocations of SoX with
319 the remix effect. For example, two mono files could be merged to form
320 one stereo file; the first and second mono files would become the left
321 and right channels of the stereo file.
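As a sketch of merging and un-merging (the file names are illustrative, and the commands are skipped if sox is not installed), the following builds two mono test tones, merges them into one stereo file, and then extracts the left channel again with remix:

```shell
# Sketch only: file names, tone frequencies, and durations are illustrative.
merge_status=skipped
if command -v sox >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    # Two mono inputs with the same sampling rate.
    sox -r 8k -c 1 -n "$workdir/left.wav" synth 1 sine 440
    sox -r 8k -c 1 -n "$workdir/right.wav" synth 1 sine 660
    # Merge: left.wav becomes channel 1 (left), right.wav channel 2 (right).
    sox -M "$workdir/left.wav" "$workdir/right.wav" "$workdir/stereo.wav"
    # Un-merge: keep only channel 1 of the stereo file.
    sox "$workdir/stereo.wav" "$workdir/left-again.wav" remix 1
    merge_status=done
fi
```

A second invocation with remix 2 would recover the right channel in the same way.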
322
323 When combining input files, SoX applies any specified effects (includ‐
324 ing, for example, the vol volume adjustment effect) after the audio has
325 been combined; however, it is often useful to be able to set the volume
326 of (i.e. `balance') the inputs individually, before combining takes
327 place.
328
329 For all combining methods, input file volume adjustments can be made
330 manually using the -v option (below) which can be given for one or more
331 input files; if it is given for only some of the input files then the
332 others receive no volume adjustment. In some circumstances, automatic
333 volume adjustments may be applied (see below).
334
335 The -V option (below) can be used to show the input file volume adjust‐
336 ments that have been selected (either manually or automatically).
337
338 There are some special considerations that need to be made when mixing
339 input files:
340
341 Unlike the other methods, `mix' combining has the potential to cause
342 clipping in the combiner if no balancing is performed. So here, if
343 manual volume adjustments are not given, to ensure that clipping does
344 not occur, SoX will automatically adjust the volume (amplitude) of each
345 input signal by a factor of ¹/n, where n is the number of input files.
346 If this results in audio that is too quiet or otherwise unbalanced then
347 the input file volumes can be set manually as described above; using
348 the norm effect on the mix is another alternative.
349
350 If mixed audio seems loud enough at some points through the mixed audio
351 but too quiet in others, then dynamic-range compression should be
352 applied to correct this - see the compand effect.
353
354 With the `mix-power' combine method, the mixed volume is approximately
355 equal to that of one of the input signals. This is achieved by balanc‐
356 ing using a factor of ¹/√n instead of ¹/n. Note that this balancing
357 factor does not guarantee that no clipping will occur, however, in many
358 cases, the number of clips will be low and the resultant distortion
359 imperceptible.
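To make the balancing factors concrete, the sketch below computes ¹/n and ¹/√n for a given number of input files. The function names are hypothetical and not part of SoX; the commented sox line assumes two input files and simply spells out, with -v, the adjustment that `mix' is described as applying automatically.

```shell
# Hypothetical helpers (not part of SoX) that print the per-input volume
# factor described above for N input files.

# mix_factor N: factor used by the mix method (1/N).
mix_factor() {
    awk -v n="$1" 'BEGIN { printf "%.4f\n", 1 / n }'
}

# mix_power_factor N: factor used by the mix-power method (1/sqrt(N)).
mix_power_factor() {
    awk -v n="$1" 'BEGIN { printf "%.4f\n", 1 / sqrt(n) }'
}

mix_factor 2          # 0.5000
mix_power_factor 2    # 0.7071
mix_power_factor 4    # 0.5000

# With two inputs, giving the 1/N factor explicitly should match the default:
#   sox -m -v 0.5 music.mp3 -v 0.5 voice.wav mixed.flac
```

The larger `mix-power' factor is what keeps the mix louder, and also why it can no longer guarantee freedom from clipping.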
360
361 Output Files
362 SoX's default behavior is to take one or more input files and write
363 them to a single output file.
364
365 This behavior can be changed by specifying the pseudo-effect 'newfile'
366 within the effects list. SoX will then enter multiple output mode.
367
368 In multiple output mode, a new file is created when the effects prior
369 to the 'newfile' indicate they are done. The effects chain listed
370 after 'newfile' is then started up and its output is saved to the new
371 file.
372
373 In multiple output mode, a unique number will automatically be appended
374 to the end of all filenames. If the filename has an extension then the
375 number is inserted before the extension. This behavior can be custom‐
376 ized by placing a %n anywhere in the filename where the number should
377 be substituted. An optional number can be placed after the % to indi‐
378 cate a minimum fixed width for the number.
379
380 Multiple output mode is not very useful unless an effect that will stop
381 the effects chain early is specified before the 'newfile'. If end of
382 file is reached before the effects chain stops itself then no new file
383 will be created as it would be empty.
384
385 The following is an example of splitting the first 60 seconds of an
386 input file into two 30 second files and ignoring the rest.
387 sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
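A self-contained variation on the example above, as a sketch: it synthesises a short test tone (so that no existing audio is needed), splits the first 4 seconds into two 2-second files, and lists the result. The file names, durations, and temporary directory are illustrative assumptions, and the commands are skipped if sox is not installed.

```shell
# Sketch only: file names and durations are illustrative.
split_status=skipped
if command -v sox >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    # Generate 6 seconds of a 440 Hz sine wave to act as the input file.
    sox -r 8k -c 1 -n "$workdir/tone.wav" synth 6 sine 440
    # Write two 2-second files; the remaining audio is ignored.
    sox "$workdir/tone.wav" "$workdir/part%1n.wav" trim 0 2 : newfile : trim 0 2
    # List the input and the numbered output files.
    ls "$workdir"
    split_status=done
fi
```

The %1n in the output name marks where the automatically generated number is placed, with a minimum width of one digit.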
388
389 Stopping SoX
390 Usually SoX will complete its processing and exit automatically once it
391 has read all available audio data from the input files.
392
393 If desired, it can be terminated earlier by sending an interrupt signal
394 to the process (usually by pressing the keyboard interrupt key which is
395 usually Ctrl-C). This is a natural requirement in some circumstances,
396 e.g. when using SoX to make a recording. Note that when using SoX to
397 play multiple files, Ctrl-C behaves slightly differently: pressing it
398 once causes SoX to skip to the next file; pressing it twice in quick
399 succession causes SoX to exit.
400
401 Another option to stop processing early is to use an effect that has a
402 time period or sample count to determine the stopping point. The trim
403 effect is an example of this. Once all effects chains have stopped
404 then SoX will also stop.
405
407 Filenames can be simple file names, absolute or relative path names, or
408 URLs (input files only). Note that URL support requires that wget(1)
409 is available.
410
411 Note: Giving SoX an input or output filename that is the same as a SoX
412 effect-name will not work since SoX will treat it as an effect
413 specification. The only work-around to this is to avoid such
414 filenames; however, this is generally not difficult since most audio
415 filenames have a filename `extension', whilst effect-names do not.
416
417 Special Filenames
418 The following special filenames may be used in certain circumstances in
419 place of a normal filename on the command line:
420
421 - SoX can be used in simple pipeline operations by using the
422 special filename `-' which, if used in place of an input
423 filename, will cause SoX to read audio data from `standard
424 input' (stdin), and which, if used in place of the output
425 filename, will cause SoX to send audio data to `standard
426 output' (stdout). Note that when using this option, the file-
427 type (see -t below) must also be given.
428
429 "|program [options] ..."
430 This can be used in place of an input filename to specify that
431 the given program's standard output (stdout) be used as an input
432 file. Unlike - (above), this can be used for several inputs to
433 one SoX command. For example, if `genw' generates mono WAV
434 formatted signals to its standard output, then the following
435 command makes a stereo file from two generated signals:
436 sox -M -t wav "|genw --imd -" -t wav "|genw --thd -" out.wav
437 If -t is not given then the signal is assumed (and checked) to
438 be in SoX's native .sox format (see -p below and soxformat(7)).
439
440 -p, --sox-pipe
441 This can be used in place of an output filename to specify that
442 the SoX command should be used as an input pipe to another SoX
443 command. For example, the command:
444 play "|sox -n -p synth 2" "|sox -n -p synth 2 tremolo 10" stat
445 plays two `files' in succession, each with different effects.
446
447 -p is in fact an alias for `-t sox -'.
448
449 -d, --default-device
450 This can be used in place of an input or output filename to
451 specify that the default audio device (if one has been built
452 into SoX) is to be used. This is akin to invoking rec or play
453 (as described above).
454
455 -n, --null
456 This can be used in place of an input or output filename to
457 specify that a `null file' is to be used. Note that here, `null
458 file' refers to a SoX-specific mechanism and is not related to
459 any operating-system mechanism with a similar name.
460
461 Using a null file to input audio is equivalent to using a normal
462 audio file that contains an infinite amount of silence, and as
463 such is not generally useful unless used with an effect that
464 specifies a finite time length (such as trim or synth).
465
466 Using a null file to output audio amounts to discarding the
467 audio and is useful mainly with effects that produce information
468 about the audio instead of affecting it (such as noiseprof or
469 stat).
470
471 The sampling rate associated with a null file is by default
472 48 kHz, but, as with a normal file, this can be overridden if
473 desired using command-line format options (see below).
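The special filenames above can be combined. As a sketch (skipped if sox is not installed, and with illustrative file names), the first command below synthesises a tone from a null input and writes it to a pipe with -p; the second reads that pipe as its input file and writes a WAV file.

```shell
# Sketch only: file names, tone, and gain value are illustrative.
pipe_status=skipped
if command -v sox >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    # First command: -n supplies a null input and -p sends the synthesised
    # audio to standard output in SoX's native format.
    # Second command: -p stands in for the input file (the piped audio);
    # gain -6 lowers the level before the WAV file is written.
    sox -r 8k -c 1 -n -p synth 1 sine 440 |
        sox -p "$workdir/tone.wav" gain -6
    pipe_status=done
fi
```

Because -p is an alias for `-t sox -', no further format options are needed on the reading side.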
474
475 Supported File & Audio Device Types
476 See soxformat(7) for a list and description of the supported file for‐
477 mats and audio device drivers.
478
480 Global Options
481 These options can be specified on the command line at any point before
482 the first effect name.
483
484 -h, --help
485 Show version number and usage information.
486
487 --help-effect=NAME
488 Show usage information on the specified effect. The name all
489 can be used to show usage on all effects.
490
491 --help-format=NAME
492 Show information about the specified file format. The name all
493 can be used to show information on all formats.
494
495 --buffer BYTES, --input-buffer BYTES
496 Set the size in bytes of the buffers used for processing audio
497 (default 8192). --buffer applies to input, effects, and output
498 processing; --input-buffer applies only to input processing (for
499 which it overrides --buffer if both are given).
500
501 Be aware that large values for --buffer will cause SoX to
502 become slow to respond to requests to terminate or to skip the
503 current input file.
504
505 --effects-file=FILENAME
506 Use FILENAME to obtain all effects and their arguments. The
507 file is parsed as if the values were specified on the command
508 line. A new line can be used in place of the special ":" marker
509 to separate effect chains. This option causes any effects spec‐
510 ified on the command line to be discarded.
511
512 --interactive
513 Prompt before overwriting an existing file with the same name as
514 that given for the output file.
515
516 N.B. Unintentionally overwriting a file is easier than you
517 might think, for example, if you accidentally enter
518 sox file1 file2 effect1 effect2 ...
519 when what you really meant was
520 play file1 file2 effect1 effect2 ...
521 then, without this option, file2 will be overwritten. Hence,
522 using this option is strongly recommended; a `shell' alias,
523 script, or batch file may be an appropriate way of permanently
524 enabling it.
525
526 -m|-M|--combine concatenate|merge|mix|mix-power|sequence
527 Select the input file combining method; -m selects `mix', -M
528 selects `merge'.
529
530 See Input File Combining above for a description of the differ‐
531 ent combining methods.
532
533 --plot gnuplot|octave|off
534 If not set to off (the default if --plot is not given), run in a
535 mode that can be used, in conjunction with the gnuplot program
536 or the GNU Octave program, to assist with the selection and con‐
537 figuration of many of the transfer-function based effects. For
538 the first given effect that supports the selected plotting pro‐
539 gram, SoX will output commands to plot the effect's transfer
540 function, and then exit without actually processing any audio.
541 E.g.
542 sox --plot octave input-file -n highpass 1320 > plot.m
543 octave plot.m
544
545 -q, --no-show-progress
546 Run in quiet mode when SoX wouldn't otherwise do so; this is the
547 opposite of the -S option.
548
549 --replay-gain track|album|off
550 Select whether or not to apply replay-gain adjustment to input
551 files. The default is off for sox and rec, album for play where
552 (at least) the first two input files are tagged with the same
553 Artist and Album names, and track for play otherwise.
554
555 -S, --show-progress
556 Display input file format/header information, and processing
557 progress as input file(s) percentage complete, elapsed time, and
558 remaining time (if known; shown in brackets), and the number of
559 samples written to the output file. Also shown is a VU meter,
560 and an indication if clipping has occurred. The VU meter shows
561 up to two channels and is calibrated for digital audio as fol‐
562 lows:
563
564 ┌────────────────────────────────────────┐
565 │dB FSD Display │
566 │ >= (right channel) │
567 │ -25 - │
568 │ -23 = │
569 │ -21 =- │
570 │ -19 == │
571 │ -17 ==- │
572 │ -15 === │
573 │ -13 ===- │
574 │ -11 ==== │
575 │ -9 ====- │
576 │ -7 ===== │
577 │ -5 =====- │
578 │ -3 ====== │
579 │ -1 =====! `In the red' │
580 └────────────────────────────────────────┘
581 A three-second peak-held value of headroom in dBs will be shown
582 to the right of the meter if this is below 6dB.
583
584 This option is enabled by default when using SoX to play or
585 record audio.
586
587 --version
588 Show SoX's version number and exit.
589
590 -V[level]
591 Set verbosity. SoX displays messages on the console (stderr)
592 according to the following verbosity levels:
593
594 0 No messages are shown at all; use the exit status to
595 determine if an error has occurred.
596
597 1 Only error messages are shown. These are generated if
598 SoX cannot complete the requested commands.
599
600 2 Warning messages are also shown. These are generated if
601 SoX can complete the requested commands, but not exactly
602 according to the requested command parameters, or if
603 clipping occurs.
604
605 3 Descriptions of SoX's processing phases are also shown.
606 Useful for seeing exactly how SoX is processing your
607 audio.
608
609 4 and above
610 Messages to help with debugging SoX are also shown.
611
612 By default, the verbosity level is set to 2; each occurrence of
613 the -V option increases the verbosity level by 1. Alterna‐
614 tively, the verbosity level can be set to an absolute number by
615 specifying it immediately after the -V; e.g. -V0 sets it to 0.
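One way to follow the --interactive recommendation above in a Bourne-compatible shell is an alias, typically placed in a shell start-up file. This is a sketch; which start-up file applies depends on the shell in use.

```shell
# Make every interactive use of sox prompt before overwriting a file.
alias sox='sox --interactive'

# Show the definition that is now in effect.
alias sox
```

Aliases normally apply only to interactive shells, so scripts that call sox are unaffected and must pass --interactive themselves if prompting is wanted.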
616
617 Input File Options
618 These options apply only to input files and may precede only input
619 filenames on the command line.
620
621 -v, --volume FACTOR
622 Adjust volume by a factor of FACTOR. This is a linear (ampli‐
623 tude) adjustment, so a number less than 1 decreases the volume;
624 greater than 1 increases it. If a negative number is given,
625 then in addition to the volume adjustment, the audio signal will
626 be inverted.
627
628 See also the stat effect for information on how to find the max‐
629 imum volume of an audio file; this can be used to help select
630 suitable values for this option.
631
632 See also Input File Combining above.
633
634 Input & Output File Format Options
635 These options apply to the input or output file whose name they immedi‐
636 ately precede on the command line and are used mainly when working with
637 headerless file formats or when specifying a format for the output file
638 that is different to that of the input file.
639
640 -b BITS, --bits BITS
641 The number of bits in each encoded sample. Not applicable to
642 complex encodings, e.g. MP3, GSM. Not necessary with encodings
643 that have a fixed number of bits, e.g. A/μ-law, ADPCM.
644
645 -1/-2/-3/-4/-8
646 The number of bytes in each encoded sample. Aliases for -b 8/-b
647 16/-b 24/-b 32/-b 64 respectively.
648
649 -c CHANNELS, --channels CHANNELS
650 The number of audio channels in the audio file; this can be any
651 number greater than zero. To cause the output file to have a
652 different number of channels than the input file, include this
653 option with the output file options. If the input and output
654 file have a different number of channels then the mixer effect
655 must be used. If the mixer effect is not specified on the com‐
656 mand line it will be invoked internally with default parameters.
657
658 Alternatively, some effects (e.g. synth, remix) determine what
659 will be the number of output channels; in this case, neither
660 this option nor the mixer effect is necessary.
661
662 -e ENCODING, --encoding ENCODING
663 The audio encoding type.
664
665 signed-integer
666 PCM data stored as signed (`two's complement') integers.
667 Commonly used with a 16 or 24-bit encoding size. A
668 value of 0 represents minimum signal power.
669
670 unsigned-integer
671 PCM data stored as unsigned integers.
672 Commonly used with an 8-bit encoding size. A value of 0
673 represents maximum signal power.
674
675 floating-point
676 PCM data stored as IEEE 754 single precision (32-bit) or
677 double precision (64-bit) floating-point ('real') num‐
678 bers. A value of 0 represents minimum signal power.
679
680 a-law International telephony standard for logarithmic encoding
681 to 8 bits per sample. It has a precision equivalent to
682 roughly 13-bit PCM and is sometimes encoded with reversed
683 bit-ordering (see the -X option).
684
685 u-law, mu-law
686 North American telephony standard for logarithmic encod‐
687 ing to 8 bits per sample. A.k.a. μ-law. It has a preci‐
688 sion equivalent to roughly 14-bit PCM and is sometimes
689 encoded with reversed bit-ordering (see the -X option).
690
691 oki-adpcm
692 OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM; it has
693 a precision equivalent to roughly 12-bit PCM. ADPCM is a
694 form of audio compression that has a good compromise
695 between audio quality and encoding/decoding speed.
696
697 ima-adpcm
698 IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision equiva‐
699 lent to roughly 13-bit PCM.
700
701 ms-adpcm
702 Microsoft 4-bit ADPCM; it has a precision equivalent to
703 roughly 14-bit PCM.
704
705 gsm-full-rate
706 GSM is currently used for the vast majority of the
707 world's digital wireless telephone calls. It utilises
708 several audio formats with different bit-rates and asso‐
709 ciated speech quality. SoX has support for GSM's origi‐
710 nal 13kbps `Full Rate' audio format. It is usually CPU
711 intensive to work with GSM audio.
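
The difference between the signed and unsigned integer encodings can be illustrated with a minimal Python sketch. It assumes the common offset-binary convention for 8-bit unsigned PCM, in which 128 is the midpoint (silence); the function names are hypothetical and are not part of SoX.

```python
def unsigned8_to_signed8(sample: int) -> int:
    """Map an unsigned 8-bit PCM sample (0..255) to signed (-128..127).

    Assumes offset-binary storage with 128 as the zero-signal midpoint.
    """
    if not 0 <= sample <= 255:
        raise ValueError("unsigned 8-bit samples must be in 0..255")
    return sample - 128


def signed8_to_unsigned8(sample: int) -> int:
    """Inverse mapping: signed (-128..127) to unsigned (0..255)."""
    if not -128 <= sample <= 127:
        raise ValueError("signed 8-bit samples must be in -128..127")
    return sample + 128
```

Under this assumption an unsigned value of 0 corresponds to the most negative signed value, which is why 0 represents maximum rather than minimum signal power in the unsigned encoding.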
712
713 Encoding names can be abbreviated where this would not be
714 ambiguous; e.g. 'unsigned-integer' can be given as 'un', but not
715 'u' (ambiguous with 'u-law'). For reasons of forward compati‐
716 bility, using abbreviations in scripts is not recommended.
717
718 Note that explicitly specifying other encoding types (e.g. MP3,
719 FLAC) is not necessary since they can be inferred from the file
720 type or header.
721
722 -s/-u/-f/-A/-U/-o/-i/-a/-g
723 Aliases for specifying the encoding types signed-integer,
724 unsigned-integer, floating-point, a-law, mu-law, oki-adpcm,
725 ima-adpcm, ms-adpcm, gsm-full-rate respectively.
726
727 -r, --rate RATE[k]
728 Gives the sample rate in Hz (or kHz if appended with `k') of the
729 file. To cause the output file to have a different sample rate
730 than the input file, include this option with the output file
731 format options.
732
733 If the input and output files have different rates then a sample
734 rate change effect must be run. Since SoX has multiple rate
735 changing effects, the user can specify which to use as an
736 effect. If no rate change effect is specified then the rate
737 effect will be chosen by default.
738
739 -t, --type file-type
740 Gives the type of the audio file. This is useful when the file
741 extension is non-standard or when the type can not be determined
742 by looking at the header of the file.
743
744 The -t option can also be used to override the type implied by
745 an input filename extension, but if overriding with a type that
746 has a header, SoX will exit with an appropriate error message if
747 such a header is not actually present.
748
749 See soxformat(7) for a list of supported file types.
750
751 -L, --endian little
752 -B, --endian big
753 -x, --endian swap
754 These options specify whether the byte-order of the audio data
755 is, respectively, `little endian', `big endian', or the opposite
756 to that of the system on which SoX is being used. Endianness
757 applies only to data encoded as signed or unsigned integers of
758 16 or more bits. It is often necessary to specify one of these
759 options for headerless files, and sometimes necessary for (oth‐
760 erwise) self-describing files. A given endian-setting option
761 may be ignored for an input file whose header contains a spe‐
762 cific endianness identifier, or for an output file that is actu‐
763 ally an audio device.
764
765 N.B. Unlike normal format characteristics, the endianness
766 (byte, nibble, & bit ordering) of the input file is not automat‐
767 ically used for the output file; so, for example, when the fol‐
768 lowing is run on a little-endian system:
769 sox -B audio.s2 trimmed.s2 trim 2
770 trimmed.s2 will be created as little-endian;
771 sox -B audio.s2 -B trimmed.s2 trim 2
772 must be used to preserve big-endianness in the output file.
773
774 The -V option can be used to check the selected orderings.
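
To make the byte-order distinction concrete, here is a small Python sketch that packs one 16-bit signed sample both ways using the standard struct module. The helper name pack_sample_16 is hypothetical; it only illustrates what `little endian' and `big endian' mean for a two-byte sample, not how SoX stores data internally.

```python
import struct


def pack_sample_16(value: int, big_endian: bool) -> bytes:
    """Pack one signed 16-bit sample as two bytes in the chosen byte order."""
    fmt = ">h" if big_endian else "<h"  # '>' = big endian, '<' = little endian
    return struct.pack(fmt, value)


sample = 0x1234
little = pack_sample_16(sample, big_endian=False)  # low byte first
big = pack_sample_16(sample, big_endian=True)      # high byte first
```

Reading big-endian bytes as if they were little-endian swaps the two bytes of every sample, which is the situation the -x option is intended to correct.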
775
776 -N, --reverse-nibbles
777 Specifies that the nibble ordering (i.e. the 2 halves of a byte)
778 of the samples should be reversed; sometimes useful with ADPCM-
779 based formats.
780
781 N.B. See also N.B. in section on -x above.
782
783 -X, --reverse-bits
784 Specifies that the bit ordering of the samples should be
785 reversed; sometimes useful with a few (mostly headerless) for‐
786 mats.
787
788 N.B. See also N.B. in section on -x above.
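
The nibble and bit reversals described for -N and -X can be sketched on a single byte as follows. This is only an illustration in Python with hypothetical function names; it shows the reordering itself and makes no claim about how SoX applies it to a given format.

```python
def swap_nibbles(byte: int) -> int:
    """Exchange the high and low 4-bit halves of one byte."""
    return ((byte & 0x0F) << 4) | ((byte & 0xF0) >> 4)


def reverse_bits(byte: int) -> int:
    """Reverse the order of the 8 bits in one byte."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)  # move the lowest bit to the top
        byte >>= 1
    return result
```

Both operations are their own inverse, so applying either one twice returns the original byte.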
789
790 Output File Format Options
791 These options apply only to the output file and may precede only the
792 output filename on the command line.
793
794 --add-comment TEXT
795 Append a comment in the output file header (where applicable).
796
797 --comment TEXT
798 Specify the comment text to store in the output file header
799 (where applicable).
800
801 SoX will provide a default comment if this option (or --com‐
802 ment-file) is not given; to specify that no comment should be
803 stored in the output file, use --comment "" .
804
805 --comment-file FILENAME
806 Specify a file containing the comment text to store in the out‐
807 put file header (where applicable).
808
809 -C, --compression FACTOR
810 The compression factor for variably compressing output file for‐
811 mats. If this option is not given, then a default compression
812 factor will apply. The compression factor is interpreted dif‐
813 ferently for different compressing file formats. See the
814 description of the file formats that use this option in soxfor‐
815 mat(7) for more information.
816
818 In addition to converting and playing audio files, SoX can be used to
819 invoke a number of audio `effects'. Multiple effects may be applied by
820 specifying them one after another at the end of the SoX command line;
821 forming an effects chain. Note that applying multiple effects in real-
822 time (i.e. when playing audio) is likely to need a high performance
823 computer; stopping other applications may alleviate performance issues
824 should they occur.
825
826 Some of the SoX effects are primarily intended to be applied to a sin‐
827 gle instrument or `voice'. To facilitate this, the remix effect and
828 the global SoX option -M can be used to isolate then recombine tracks
829 from a multi-track recording.
830
831 Multiple Effect Chains
832 A single effects chain is made up of one or more effects. Audio from
833 the input is run through the chain until either the input file reaches
834 end of file or an effect in the chain requests to terminate the chain.
835
836 SoX supports running multiple effects chains over the input audio. In
837 this case, when one chain indicates it is done processing audio the
838 audio data is then sent through the next effects chain. This continues
839 until either no more effects chains exist or the input has reached end of
840 file.
841
842 An effects chain is terminated by placing a : (colon) after an effect.
843 Any following effects are a part of a new effects chain.
844
845 It is important to place the effect that will stop the chain as the
846 first effect in the chain. This is because any samples that are
847 buffered by effects to the left of the terminating effect will be
848 discarded. The number of samples discarded is related to the --buffer
849 option, which should be kept small, relative to the sample rate, if the
850 terminating effect cannot be first. Further information on stopping
851 effects can be found in the Stopping SoX section.
852
853 There are a few pseudo-effects that aid using multiple effects chains.
854 These include newfile which will start writing to a new output file
855 before moving to the next effects chain and restart which will move
856 back to the first effects chain. Pseudo-effects must be specified as
857 the first effect in a chain and as the only effect in a chain (they
858 must have a : before and after they are specified).
859
860 The following is an example of multiple effects chains. It will split
861 the input file into multiple files of 30 seconds in length. Each
862 output filename will have a unique number in its name as documented in
863 the Output Files section.
864 sox infile.wav output.wav trim 0 30 : newfile : restart
865
866 Common Notation And Parameters
867 In the descriptions that follow, brackets [ ] are used to denote
868 parameters that are optional, braces { } to denote those that are both
869 optional and repeatable, and angle brackets < > to denote those that
870 are repeatable but not optional. Where applicable, default values for
871 optional parameters are shown in parentheses ( ).
872
873 The following parameters are used with, and have the same meaning for,
874 several effects:
875
876 centre[k]
877 See frequency.
878
879 frequency[k]
880 A frequency in Hz, or, if appended with `k', kHz.
881
882 gain A power gain in dB. Zero gives no gain; less than zero gives an
883 attenuation.
884
885 width[h|k|o|q]
886 Used to specify the band-width of a filter. A number of
887 different methods to specify the width are available (though not
888 all for every effect); one of the characters shown may be
889 appended to select the desired method as follows:
890
891 ┌───────────────────────┐
892 │ Method Notes │
893 │h Hz │
894 │k kHz │
895 │o Octaves │
896 │q Q-factor See [2] │
897 └───────────────────────┘
898 For each effect that uses this parameter, the default method
899 (i.e. if no character is appended) is the one that is listed
900 first in the effect's first line of description.
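
As a rough guide to how the four width methods relate, the following Python sketch converts each of them to a band-width in Hz. It relies on two textbook assumptions that are not stated in this manual: Q is taken as centre frequency divided by band-width, and a width in octaves is taken to be geometrically centred on the centre frequency. The function name width_to_hz is hypothetical, and the exact mapping used inside SoX may differ.

```python
def width_to_hz(value: float, method: str, centre_hz: float) -> float:
    """Convert a filter width given as h, k, o or q to a band-width in Hz.

    Assumptions (not taken from the SoX manual):
      q: band-width = centre / Q
      o: the band spans `value` octaves, centred geometrically on centre_hz
    """
    if value <= 0 or centre_hz <= 0:
        raise ValueError("value and centre_hz must be positive")
    if method == "h":
        return value
    if method == "k":
        return value * 1000.0
    if method == "q":
        return centre_hz / value
    if method == "o":
        upper = centre_hz * 2.0 ** (value / 2.0)
        lower = centre_hz * 2.0 ** (-value / 2.0)
        return upper - lower
    raise ValueError("method must be one of h, k, o, q")
```

For instance, under these assumptions a Q of 2 at a centre frequency of 1 kHz corresponds to a band-width of 500 Hz.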
901
902 To see if SoX has support for an optional effect, enter sox -h and look
903 for its name under the list: `EFFECTS'.
904
905 Supported Effects
906 allpass frequency[k] width[h|k|o|q]
907 Apply a two-pole all-pass filter with central frequency (in Hz)
908 frequency, and filter-width width. An all-pass filter changes
909 the audio's frequency to phase relationship without changing its
910 frequency to amplitude relationship. The filter is described in
911 detail in [1].
912
913 This effect supports the --plot global option.
914
915 band [-n] center[k] [width[h|k|o|q]]
916 Apply a band-pass filter. The frequency response drops
917 logarithmically around the center frequency. The width
918 parameter gives the slope of the drop. The frequencies at
919 center + width and center - width will be half of their original
920 amplitudes. band defaults to a mode oriented to pitched audio,
921 i.e. voice, singing, or instrumental music. The -n (for noise)
922 option uses the alternate mode for un-pitched audio (e.g.
923 percussion). Warning: -n introduces a power-gain of about 11dB
924 in the filter, so beware of output clipping. band introduces
925 noise in the shape of the filter, i.e. peaking at the center
926 frequency and settling around it.
927
928 This effect supports the --plot global option.
929
930 See also filter for a bandpass filter with steeper shoulders.
931
932 bandpass|bandreject [-c] frequency[k] width[h|k|o|q]
933 Apply a two-pole Butterworth band-pass or band-reject filter
934 with central frequency frequency, and (3dB-point) band-width
935 width. The -c option applies only to bandpass and selects a
936 constant skirt gain (peak gain = Q) instead of the default:
937 constant 0dB peak gain. The filters roll off at 6dB per octave
938 (20dB per decade) and are described in detail in [1].
939
940 These effects support the --plot global option.
941
942 See also filter for a bandpass filter with steeper shoulders.
943
944 bandreject frequency[k] width[h|k|o|q]
945 Apply a band-reject filter. See the description of the bandpass
946 effect for details.
947
948 bass|treble gain [frequency[k] [width[s|h|k|o|q]]]
949 Boost or cut the bass (lower) or treble (upper) frequencies of
950 the audio using a two-pole shelving filter with a response
951 similar to that of a standard hi-fi's tone-controls. This is
952 also known as shelving equalisation (EQ).
953
954 gain gives the gain at 0 Hz (for bass), or whichever is the
955 lower of ∼22 kHz and the Nyquist frequency (for treble). Its
956 useful range is about -20 (for a large cut) to +20 (for a large
957 boost). Beware of Clipping when using a positive gain.
958
959 If desired, the filter can be fine-tuned using the following
960 optional parameters:
961
962 frequency sets the filter's central frequency and so can be used
963 to extend or reduce the frequency range to be boosted or cut.
964 The default value is 100 Hz (for bass) or 3 kHz (for treble).
965
966 width determines how steep is the filter's shelf transition. In
967 addition to the common width specification methods described
968 above, `slope' (the default, or if appended with `s') may be
969 used. The useful range of `slope' is about 0.3, for a gentle
970 slope, to 1 (the maximum), for a steep slope; the default value
971 is 0.5.
972
973 The filters are described in detail in [1].
974
975 These effects support the --plot global option.
976
977 See also equalizer for a peaking equalisation effect.
978
979 bend [-f frame-rate(25)] [-o over-sample(16)] { delay,cents,duration }
980 Changes pitch by specified amounts at specified times. Each
981 given triple: delay,cents,duration specifies one bend. delay is
982 the amount of time after the start of the audio stream, or the
983 end of the previous bend, at which to start bending the pitch;
984 cents is the number of cents (100 cents = 1 semitone) by which
985 to bend the pitch, and duration the length of time over which
986 the pitch will be bent.
987
988 The pitch-bending algorithm utilises the Discrete Fourier
989 Transform (DFT) at a particular frame rate and over-sampling
990 rate. The -f and -o parameters may be used to adjust these
991 parameters and thus control the smoothness of the changes in
992 pitch.
993
994 For example, an initial tone is generated, then bent three
995 times, yielding four different notes in total:
996 play -n synth 2.5 sin 667 gain 1 \
997 bend .35,180,.25 .15,740,.53 0,-520,.3
998 Note that the clipping that is produced in this example is
999 deliberate; to remove it, use gain -5 in place of gain 1.
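
The relation between cents and frequency (100 cents = 1 semitone, so 1200 cents = 1 octave) can be worked through in a short Python sketch. It assumes that each bend is applied relative to the pitch reached by the previous bend, which the description above suggests but does not state outright; the function names are hypothetical.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch change in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def bent_frequencies(start_hz: float, bends_cents: list) -> list:
    """Successive note frequencies, assuming each bend builds on the last."""
    freqs = [start_hz]
    for cents in bends_cents:
        freqs.append(freqs[-1] * cents_to_ratio(cents))
    return freqs


# The 667 Hz tone and the three bends used in the example above.
notes = bent_frequencies(667.0, [180, 740, -520])
```

Under that assumption the three bends sum to +400 cents, so the final note sits a major third (in equal temperament) above the starting tone.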
1000
1001 chorus gain-in gain-out <delay decay speed depth -s|-t>
1002 Add a chorus effect to the audio. This can make a single vocal
1003 sound like a chorus, but can also be applied to instrumentation.
1004
1005 Chorus resembles an echo effect with a short delay, but whereas
1006 with echo the delay is constant, with chorus, it is varied using
1007 sinusoidal or triangular modulation. The modulation depth
1008 defines the range the modulated delay is played before or after
1009 the delay. Hence the delayed sound will sound slower or faster,
1010 that is the delayed sound tuned around the original one, like in
1011 a chorus where some vocals are slightly off key. See [3] for
1012 more discussion of the chorus effect.
1013
1014 Each four-tuple parameter delay/decay/speed/depth gives the
1015 delay in milliseconds and the decay (relative to gain-in) with a
1016 modulation speed in Hz using depth in milliseconds. The modula‐
1017 tion is either sinusoidal (-s) or triangular (-t). Gain-out is
1018 the volume of the output.
1019
1020 A typical delay is around 40ms to 60ms; the modulation speed is
1021 best near 0.25Hz and the modulation depth around 2ms. For exam‐
1022 ple, a single delay:
1023 play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 -t
1024 Two delays of the original samples:
1025 play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 -t \
1026 60 0.32 0.4 1.3 -s
1027 A fuller sounding chorus (with three additional delays):
1028 play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 -t \
1029 60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s
1030
1031 compand attack1,decay1{,attack2,decay2}
1032 [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
1033 [gain [initial-volume-dB [delay]]]
1034
1035 Compand (compress or expand) the dynamic range of the audio.
1036
1037 The attack and decay parameters (in seconds) determine the time
1038 over which the instantaneous level of the input signal is aver‐
1039 aged to determine its volume; attacks refer to increases in vol‐
1040 ume and decays refer to decreases. For most situations, the
1041 attack time (response to the music getting louder) should be
1042 shorter than the decay time because the human ear is more sensi‐
1043 tive to sudden loud music than sudden soft music. Where more
1044 than one pair of attack/decay parameters are specified, each
1045 input channel is companded separately and the number of pairs
1046 must agree with the number of input channels. Typical values
1047 are 0.3,0.8 seconds.
1048
1049 The second parameter is a list of points on the compander's
1050 transfer function specified in dB relative to the maximum possi‐
1051 ble signal amplitude. The input values must be in a strictly
1052 increasing order but the transfer function does not have to be
1053 monotonically rising. If omitted, the value of out-dB1 defaults
1054 to the same value as in-dB1; levels below in-dB1 are not com‐
1055 panded (but may have gain applied to them). The point 0,0 is
1056 assumed but may be overridden (by 0,out-dBn). If the list is
1057 preceded by a soft-knee-dB value, then the points where adjacent
1058 line segments on the transfer function meet will be rounded
1059 by the amount given. Typical values for the transfer function
1060 are 6:-70,-60,-20.
1061
1062 The third (optional) parameter is an additional gain in dB to be
1063 applied at all points on the transfer function and allows easy
1064 adjustment of the overall gain.
1065
1066 The fourth (optional) parameter is an initial level to be
1067 assumed for each channel when companding starts. This permits
1068 the user to supply a nominal level initially, so that, for exam‐
1069 ple, a very large gain is not applied to initial signal levels
1070 before the companding action has begun to operate: it is quite
1071 probable that in such an event, the output would be severely
1072 clipped while the compander gain properly adjusts itself. A
1073 typical value (for audio which is initially quiet) is -90 dB.
1074
1075 The fifth (optional) parameter is a delay in seconds. The input
1076 signal is analysed immediately to control the compander, but it
1077 is delayed before being fed to the volume adjuster. Specifying
1078 a delay approximately equal to the attack/decay times allows the
1079 compander to effectively operate in a `predictive' rather than a
1080 reactive mode. A typical value is 0.2 seconds.
1081
1082 * * *
1083
1084 The following example might be used to make a piece of music
1085 with both quiet and loud passages suitable for listening to in a
1086 noisy environment such as a moving vehicle:
1087 sox asz.au asz-car.au compand 0.3,1 6:-70,-60,-20 -5 -90 0.2
1088 The transfer function (`6:-70,...') says that very soft sounds
1089 (below -70dB) will remain unchanged. This will stop the compan‐
1090 der from boosting the volume on `silent' passages such as
1091 between movements. However, sounds in the range -60dB to 0dB
1092 (maximum volume) will be boosted so that the 60dB dynamic range
1093 of the original music will be compressed 3-to-1 into a 20dB
1094 range, which is wide enough to enjoy the music but narrow enough
1095 to get around the road noise. The `6:' selects 6dB soft-knee
1096 companding. The -5 (dB) output gain is needed to avoid clipping
1097 (the number is inexact, and was derived by experimentation).
1098 The -90 (dB) for the initial volume will work fine for a clip
1099 that starts with near silence, and the delay of 0.2 (seconds)
1100 has the effect of causing the compander to react a bit more
1101 quickly to sudden volume changes.
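
The transfer function in this example can be evaluated by hand with a simplified Python sketch. It treats the function as straight line segments between the given points, adds the implicit 0,0 point, leaves levels below the first point uncompanded, and ignores the soft-knee rounding and all attack/decay behaviour; those simplifications, and the name compand_transfer, are assumptions made for illustration only.

```python
def compand_transfer(in_db: float, points: list, gain_db: float = 0.0) -> float:
    """Output level in dB for a steady input level in dB.

    points: list of (in_dB, out_dB) pairs; the point (0, 0) is added if the
    list stops below 0 dB. Soft-knee rounding is deliberately not modelled.
    """
    pts = sorted(points)
    if pts[-1][0] < 0.0:
        pts.append((0.0, 0.0))  # the assumed 0,0 point
    first_in, first_out = pts[0]
    if in_db <= first_in:
        # Below the first point the level is not companded.
        out_db = in_db + (first_out - first_in)
    else:
        out_db = pts[-1][1]  # hold the last value above the final point
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if in_db <= x1:
                out_db = y0 + (y1 - y0) * (in_db - x0) / (x1 - x0)
                break
    return out_db + gain_db


# Points for -70,-60,-20: out-dB1 defaults to in-dB1, i.e. (-70, -70).
car_points = [(-70.0, -70.0), (-60.0, -20.0)]
```

With these points an input at -60dB comes out at -20dB and an input at -30dB at -10dB (before the -5dB gain), which is the 60dB-into-20dB compression described above.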
1102
1103 This effect supports the --plot global option (for the transfer
1104 function).
1105
1106 See also mcompand for a multiple-band companding effect.
1107
1108 contrast [enhancement-amount(75)]
1109 Comparable with compression, this effect modifies an audio sig‐
1110 nal to make it sound louder. enhancement-amount controls the
1111 amount of the enhancement and is a number in the range 0-100.
1112 Note that enhancement-amount = 0 still gives a significant con‐
1113 trast enhancement. contrast is often used in conjunction with
1114 the norm effect as follows:
1115 sox infile outfile norm -i contrast
1116
1117 dcshift shift [limitergain]
1118 DC Shift the audio, with basic linear amplitude formula. This
1119 is most useful if your audio tends to not be centered around a
1120 value of 0. Shifting it back will allow you to get the most
1121 volume adjustments without clipping.
1122
1123 The first option is the dcshift value. It is a floating point
1124 number that indicates the amount to shift.
1125
1126 An optional limitergain can be specified as well. It should
1127 have a value much less than 1 (e.g. 0.05 or 0.02) and is used
1128 only on peaks to prevent clipping.
1129
1130 An alternative approach to removing a DC offset (albeit with a
1131 short delay) is to use the highpass filter effect at a frequency
1132 of say 10Hz, as illustrated in the following example:
1133 sox -n out.au synth 5 sin %0 50 highpass 10
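
The dcshift effect takes the shift as a parameter; it does not measure the offset itself. As a sketch of how a suitable value might be chosen, the following Python helper estimates the DC offset as the mean of the samples and removes it. The function name and the use of the plain mean are assumptions for illustration, not a description of SoX internals.

```python
def remove_dc_offset(samples: list) -> tuple:
    """Return (estimated_offset, centred_samples) for a list of floats."""
    if not samples:
        raise ValueError("at least one sample is required")
    offset = sum(samples) / len(samples)  # mean level = DC component
    centred = [s - offset for s in samples]
    return offset, centred


offset, centred = remove_dc_offset([0.25, 0.75, 0.5, 0.5])
```

The negated offset is the kind of value that would be passed as the shift parameter.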
1134
1135 deemph Apply IEC 60908 de-emphasis (a treble attenuation shelving filter)
1136 to 44.1kHz (Compact Disc) audio.
1137
1138 Pre-emphasis was applied in the mastering of some CDs issued in
1139 the early 1980s. These included many classical music albums, as
1140 well as now sought-after issues of albums by The Beatles, Pink
1141 Floyd and others. Pre-emphasis should be removed at playback
1142 time by a de-emphasis filter in the playback device. However,
1143 not all modern CD players have this filter, and very few PC CD
1144 drives have it; playing pre-emphasised audio without the correct
1145 de-emphasis filter results in audio that sounds harsh and is far
1146 from what its creators intended.
1147
1148 With the deemph effect, it is possible to apply the necessary
1149 de-emphasis to audio that has been extracted from a pre-empha‐
1150 sised CD, and then either burn the de-emphasised audio to a new
1151 CD (which will then play correctly on any CD player), or simply
1152 play the correctly de-emphasised audio files on the PC. For
1153 example:
1154 sox track1.wav track1-deemph.wav deemph
1155 and then burn track1-deemph.wav to CD, or
1156 play track1-deemph.wav
1157 or simply
1158 play track1.wav deemph
1159 The de-emphasis filter is implemented as a biquad; its maximum
1160 deviation from the ideal response is only 0.06dB (up to 20kHz).
1161
1162 This effect supports the --plot global option.
1163
1164 See also the bass and treble shelving equalisation effects.
1165
1166 delay {length}
1167 Delay one or more audio channels. length can specify a time or,
1168 if appended with an `s', a number of samples. Do not specify
1169 both time and samples delays in the same command. For example,
1170 delay 1.5 0 0.5 delays the first channel by 1.5 seconds, the
1171 third channel by 0.5 seconds, and leaves the second channel (and
1172 any other channels that may be present) un-delayed. The follow‐
1173 ing (one long) command plays a chime sound:
1174 play -n synth sin %-21.5 sin %-14.5 sin %-9.5 sin %-5.5 \
1175 sin %-2.5 sin %2.5 gain -5.4 fade h 0.008 2 1.5 \
1176 delay 0 .27 .54 .76 1.01 1.3 remix - fade h 0.1 2.72 2.5
1177
1178 dither [-r|-t] [-s|-f filter] [depth]
1179 Apply dithering to the audio. Dithering deliberately adds a
1180 small amount of noise to the signal in order to mask audible
1181 quantization effects that can occur if the output sample size is
1182 less than 24 bits. The default (or with the -t option) is Tri‐
1183 angular (TPDF) white noise. The -r option can be used to select
1184 Rectangular Probability Density Function (RPDF) white noise.
1185 Noise-shaping (only for certain sample rates) can be selected
1186 with -s. With the -f option, it is possible to select a partic‐
1187 ular noise-shaping filter from the following list: lipshitz, f-
1188 weighted, modified-e-weighted, improved-e-weighted, gesemann,
1189 shibata, low-shibata, high-shibata. Note that most filter types
1190 are available only with 44100Hz sample rate. The filter types
1191 are distinguished by the following properties: audibility of
1192 noise, level of (inaudible, but in some circumstances, otherwise
1193 problematic) shaped high frequency noise, and processing speed.
1194
1195 By default, the amount of noise added is ±½ bit for RPDF, ±1 bit
1196 for TPDF; the optional depth parameter (0.5 to 1) is a (linear
1197 or voltage) multiplier of this amount. Reducing this value
1198 reduces the audibility of the added white noise, but correspond‐
1199 ingly creates residual quantization noise, so it should not nor‐
1200 mally be changed.
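
The idea of TPDF dithering can be sketched in a few lines of Python: two independent uniform random values are summed to give triangular noise of ±1 bit, which is added before rounding to the target sample size. This is a minimal illustration without noise-shaping; the function name, the floating-point input range, and the clamping are assumptions and do not reflect the SoX implementation.

```python
import random


def tpdf_dither_quantize(samples, bits, rng, depth=1.0):
    """Quantize floats in [-1, 1] to signed integers of `bits` bits with TPDF dither.

    depth is a linear multiplier of the noise amount (1.0 = ±1 bit).
    """
    scale = 2 ** (bits - 1)
    lowest, highest = -scale, scale - 1
    out = []
    for s in samples:
        # Sum of two uniform values in ±0.5 gives triangular noise in ±1.
        noise = depth * (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5))
        q = int(round(s * scale + noise))
        out.append(max(lowest, min(highest, q)))  # clamp to the legal range
    return out


quantized_silence = tpdf_dither_quantize([0.0] * 1000, 8, random.Random(0))
```

Even for a silent input the output toggles among the lowest few integer values; that small added noise is what masks the quantization effects.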
1201
1202 This effect should not be followed by any other effect that
1203 affects the audio.
1204
1205 earwax Makes audio easier to listen to on headphones. Adds `cues' to
1206 44.1kHz stereo (i.e. audio CD format) audio so that when lis‐
1207 tened to on headphones the stereo image is moved from inside
1208 your head (standard for headphones) to outside and in front of
1209 the listener (standard for speakers). See
1210 http://www.geocities.com/beinges for a full explanation.
1211
1212 echo gain-in gain-out <delay decay>
1213 Add echoing to the audio. Echoes are reflected sound and can
1214 occur naturally amongst mountains (and sometimes large build‐
1215 ings) when talking or shouting; digital echo effects emulate
1216 this behaviour and are often used to help fill out the sound of
1217 a single instrument or vocal. The time difference between the
1218 original signal and the reflection is the `delay' (time), and
1219 the loudness of the reflected signal is the `decay'. Multiple
1220 echoes can have different delays and decays.
1221
1222 Each given delay decay pair gives the delay in milliseconds and
1223 the decay (relative to gain-in) of that echo. Gain-out is the
1224 volume of the output. For example, this will make it sound as
1225 if there are twice as many instruments as are actually playing:
1226 play lead.aiff echo 0.8 0.88 60 0.4
1227 If the delay is very short, then it sounds like a (metallic) robot
1228 playing music:
1229 play lead.aiff echo 0.8 0.88 6 0.4
1230 A longer delay will sound like an open air concert in the moun‐
1231 tains:
1232 play lead.aiff echo 0.8 0.9 1000 0.3
1233 One mountain more, and:
1234 play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25
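
How the gain-in, gain-out, delay and decay parameters interact can be seen in a simplified Python model of an echo. It assumes that each echo is a delayed copy of the unprocessed input scaled by its decay, that delays are rounded to whole samples, and that the output is not extended to hold the echo tail; the function name and these details are illustrative assumptions rather than a description of the SoX implementation.

```python
def echo(samples, rate, gain_in, gain_out, taps):
    """Mix delayed, attenuated copies of the input into the signal.

    samples: list of floats; rate: sample rate in Hz;
    taps: list of (delay_ms, decay) pairs.
    """
    delays = [(int(round(ms * rate / 1000.0)), decay) for ms, decay in taps]
    out = []
    for n, x in enumerate(samples):
        y = gain_in * x
        for d, decay in delays:
            if n - d >= 0:
                y += decay * samples[n - d]  # delayed copy of the input
        out.append(gain_out * y)
    return out


# A single impulse at 1 kHz sample rate with one echo 2 ms later.
impulse_response = echo([1.0, 0.0, 0.0, 0.0], 1000, 1.0, 1.0, [(2, 0.5)])
```

The impulse reappears two samples later at half amplitude, which is the delay/decay pair at work.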
1235
1236 echos gain-in gain-out <delay decay>
1237 Add a sequence of echoes to the audio. Each delay decay pair
1238 gives the delay in milliseconds and the decay (relative to gain-
1239 in) of that echo. Gain-out is the volume of the output.
1240
1241 Similar to the echo effect; echos stands for `ECHO in Sequel', that is
1242 the first echos takes the input, the second the input and the
1243 first echos, the third the input and the first and the second
1244 echos, ... and so on. Care should be taken using many echos; a
1245 single echos has the same effect as a single echo.
1246
1247 The sample will be bounced twice in symmetric echos:
1248 play lead.aiff echos 0.8 0.7 700 0.25 700 0.3
1249 The sample will be bounced twice in asymmetric echos:
1250 play lead.aiff echos 0.8 0.7 700 0.25 900 0.3
1251 The sample will sound as if played in a garage:
1252 play lead.aiff echos 0.8 0.7 40 0.25 63 0.3
1253
1254 equalizer frequency[k] width[q|o|h|k] gain
1255 Apply a two-pole peaking equalisation (EQ) filter. With this
1256 filter, the signal-level at and around a selected frequency can
1257 be increased or decreased, whilst (unlike band-pass and band-
1258 reject filters) that at all other frequencies is unchanged.
1259
1260 frequency gives the filter's central frequency in Hz, width, the
1261 band-width, and gain the required gain or attenuation in dB.
1262 Beware of Clipping when using a positive gain.
1263
1264 In order to produce complex equalisation curves, this effect can
1265 be given several times, each with a different central frequency.
1266
1267 The filter is described in detail in [1].
1268
1269 This effect supports the --plot global option.
1270
1271 See also bass and treble for shelving equalisation effects.
1272
1273 fade [type] fade-in-length [stop-time [fade-out-length]]
1274 Add a fade effect to the beginning, end, or both of the audio.
1275
1276 For fade-ins, this starts from the first sample and ramps the
1277 volume of the audio from 0 to full volume over fade-in-length
1278 seconds. Specify 0 seconds if no fade-in is wanted.
1279
1280 For fade-outs, the audio will be truncated at stop-time and the
1281 volume will be ramped from full volume down to 0 starting at
1282 fade-out-length seconds before the stop-time. If fade-out-
1283 length is not specified, it defaults to the same value as fade-
1284 in-length. No fade-out is performed if stop-time is not speci‐
1285 fied. If the file length can be determined from the input file
1286 header and length-changing effects are not in effect, then 0 may
1287 be specified for stop-time to indicate the usual case of a fade-
1288 out that ends at the end of the input audio stream.
1289
1290 All times can be specified in either periods of time or sample
1291 counts. To specify time periods use the hh:mm:ss.frac
1292 format. To specify using sample counts, specify the number of
1293 samples and append the letter `s' to the sample count (for exam‐
1294 ple `8000s').
1295
1296 An optional type can be specified to change the type of enve‐
1297 lope. Choices are q for quarter of a sine wave, h for half a
1298 sine wave, t for linear slope, l for logarithmic, and p for
1299 inverted parabola. The default is logarithmic.
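
The envelope shapes can be compared with a small Python sketch that returns the fade-in gain at a relative position between 0 (start of fade) and 1 (end of fade). The formulas are the natural mathematical readings of the names above and are assumptions; the logarithmic type is omitted because its scaling is not given here. The function name is hypothetical.

```python
import math


def fade_in_gain(position: float, kind: str = "t") -> float:
    """Gain (0..1) at a relative position (0..1) within a fade-in."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    if kind == "t":  # linear slope
        return position
    if kind == "q":  # quarter of a sine wave
        return math.sin(position * math.pi / 2.0)
    if kind == "h":  # half a sine wave (raised cosine)
        return (1.0 - math.cos(position * math.pi)) / 2.0
    if kind == "p":  # inverted parabola
        return 1.0 - (1.0 - position) ** 2
    raise ValueError("kind must be one of t, q, h, p")
```

A fade-out would use the same curves mirrored in time, running from 1 down to 0.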
1300
1301 filter [low]-[high] [window-len [beta]]
1302 Apply a sinc-windowed lowpass, highpass, or bandpass filter of
1303 given window length to the signal. low refers to the frequency
1304 of the lower 6dB corner of the filter. high refers to the fre‐
1305 quency of the upper 6dB corner of the filter.
1306
1307 A low-pass filter is obtained by leaving low unspecified, or 0.
1308 A high-pass filter is obtained by leaving high unspecified, or
1309 0, or greater than or equal to the Nyquist frequency.
1310
1311 The window-len, if unspecified, defaults to 128. Longer windows
1312 give a sharper cut-off, smaller windows a more gradual cut-off.
1313
1314 The beta parameter determines the type of filter window used.
1315 Any value greater than 2 is the beta for a Kaiser window. Beta
1316 ≤ 2 selects a Blackman-Nuttall window. If unspecified, the
1317 default is a Kaiser window with beta 16.
1318
1319 In the case of Kaiser window (beta > 2), lower betas produce a
1320 somewhat faster transition from pass-band to stop-band, at the
1321 cost of noticeable artifacts. A beta of 16 is the default, beta
1322 less than 10 is not recommended. If you want a sharper cut-off,
1323 don't use low betas; use a longer sample window. A Blackman-
1324 Nuttall window is selected by specifying any `beta' ≤ 2, and the
1325 Blackman-Nuttall window has somewhat steeper cut-off than the
1326 default Kaiser window. You will probably not need to use the
1327 beta parameter at all, unless you are just curious about compar‐
1328 ing the effects of Blackman-Nuttall vs. Kaiser windows.
1329
1330 This effect supports the --plot global option.
1331
1332 flanger [delay depth regen width speed shape phase interp]
1333 Apply a flanging effect to the audio. See [3] for a detailed
1334 description of flanging.
1335
1336 All parameters are optional (right to left).
1337
1338 ┌─────────────────────────────────────────────────────────────────┐
1339 │ Range Default Description │
1340 │delay 0 - 10 0 Base delay in milliseconds. │
1341 │depth 0 - 10 2 Added swept delay in milliseconds. │
1342 │regen -95 - 95 0 Percentage regeneration (delayed │
1343 │ signal feedback). │
1344 │width 0 - 100 71 Percentage of delayed signal mixed │
1345 │ with original. │
1346 │speed 0.1 - 10 0.5 Sweeps per second (Hz). │
1347 │shape sin Swept wave shape: sine|triangle. │
1348 │phase 0 - 100 25 Swept wave percentage phase-shift │
1349 │ for multi-channel (e.g. stereo) │
1350 │ flange; 0 = 100 = same phase on │
1351 │ each channel. │
1352 │interp lin Digital delay-line interpolation: │
1353 │ linear|quadratic. │
1354 └─────────────────────────────────────────────────────────────────┘
1355 gain dB-gain
1356 Apply an amplification or an attenuation to the audio signal.
1357 The signal level is adjusted by the given number of dB - posi‐
1358 tive amplifies (beware of Clipping), negative attenuates.
1359
1360 See also the vol effect.
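
For reference, a gain in dB relates to linear multipliers by the standard decibel formulas, sketched here in Python with hypothetical function names: the amplitude (voltage) ratio is 10^(dB/20) and the power ratio is 10^(dB/10).

```python
def db_to_amplitude(db: float) -> float:
    """Linear amplitude (voltage) multiplier for a gain in dB."""
    return 10.0 ** (db / 20.0)


def db_to_power(db: float) -> float:
    """Linear power multiplier for a gain in dB."""
    return 10.0 ** (db / 10.0)
```

So a gain of about -6dB halves the amplitude, while +20dB multiplies it by ten, which shows how quickly a positive gain can lead to clipping.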
1361
1362 highpass|lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
1363 Apply a high-pass or low-pass filter with 3dB point frequency.
1364 The filter can be either single-pole (with -1), or double-pole
1365 (the default, or with -2). width applies only to double-pole
1366 filters; the default is Q = 0.707 and gives a Butterworth
1367 response. The filters roll off at 6dB per pole per octave (20dB
1368 per pole per decade). The double-pole filters are described in
1369 detail in [1].
1370
1371 These effects support the --plot global option.
1372
1373 See also filter for filters with a steeper roll-off.
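
The roll-off figure above can be turned into a quick estimate. This is
only an asymptotic sketch: it assumes the test frequency lies well into
the stop band and ignores the filter's behaviour near the 3dB point. The
function name rolloff_db is hypothetical.

```shell
# Hypothetical helper: approximate stop-band attenuation in dB at 6dB per
# pole per octave.  Arguments: poles cutoff-frequency test-frequency.
# The caller must make sure the test frequency is in the stop band
# (below the cut-off for highpass, above it for lowpass).
rolloff_db() {
    awk -v p="$1" -v fc="$2" -v f="$3" 'BEGIN {
        octaves = log(fc / f) / log(2)        # distance from cut-off
        if (octaves < 0) octaves = -octaves   # direction does not matter
        printf "%.1f\n", 6 * p * octaves
    }'
}

rolloff_db 2 200 50   # double-pole highpass at 200Hz, evaluated at 50Hz
```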
1374
1375 ladspa module [plugin] [argument...]
1376 Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API)
1377 plugin. Despite the name, LADSPA is not Linux-specific, and a
1378 wide range of effects is available as LADSPA plugins, such as
1379 cmt [6] (the Computer Music Toolkit) and Steve Harris's plugin
1380 collection [7]. The first argument is the plugin module, the
1381 second the name of the plugin (a module can contain more than
1382 one plugin) and any other arguments are for the control ports of
1383 the plugin. Missing arguments are supplied by default values if
1384 possible. Only plugins with at most one audio input and one
1385 audio output port can be used. If found, the environment vari‐
1386 able LADSPA_PATH will be used as the search path for plugins.
1387
1388 loudness [gain [reference]]
1389 Loudness control - similar to the gain effect, but provides
1390 equalisation for the human auditory system. See
1391 http://en.wikipedia.org/wiki/Loudness for a detailed description
1392 of loudness. The gain is adjusted by the given gain parameter
1393 (usually negative) and the signal equalised according to ISO 226
1394 w.r.t. a reference level of 65dB, though an alternative refer‐
1395 ence level may be given if the original audio has been equalised
1396 for some other optimal level. A default gain of -10dB is used
1397 if a gain value is not given.
1398
1399 See also the gain effect.
1400
1401 lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
1402 Apply a low-pass filter. See the description of the highpass
1403 effect for details.
1404
1405 mcompand "attack1,decay1{,attack2,decay2}
1406 [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
1407 [gain [initial-volume-dB [delay]]]" {crossover-freq[k]
1408 "attack1,..."}
1409
1410 The multi-band compander is similar to the single-band compander
1411 but the audio is first divided into bands using Linkwitz-Riley
1412 cross-over filters and a separately specifiable compander run on
1413 each band. See the compand effect for the definition of its
1414 parameters. Compand parameters are specified between double
1415 quotes and the crossover frequency for that band is given by
1416 crossover-freq; these can be repeated to create multiple bands.
1417
1418 For example, the following (one long) command shows how multi-
1419 band companding is typically used in FM radio:
1420 play track1.wav gain -3 filter 8000- 32 100 mcompand \
1421 "0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
1422 "0.003,0.05 -47,-40,-34,-34,-17,-33" 400 \
1423 "0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
1424 "0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
1425 "0,0.025 -38,-31,-28,-28,-0,-25" \
1426 gain 15 highpass 22 highpass 22 filter -17500 256 \
1427 gain 9 lowpass -1 17801
1428 The audio file is played with a simulated FM radio sound (or
1429 broadcast signal condition if the lowpass filter at the end is
1430 skipped). Note that the pipeline is set up with US-style 75us
1431 preemphasis.
1432
1433 See also compand for a single-band companding effect.
1434
1435 mixer [ -l|-r|-f|-b|-1|-2|-3|-4|n{,n} ]
1436 Reduce the number of audio channels by mixing or selecting chan‐
1437 nels, or increase the number of channels by duplicating chan‐
1438 nels. Note: this effect operates on the audio channels within
1439 the SoX effects processing chain; it should not be confused with
1440 the -m global option (where multiple files are mix-combined
1441 before entering the effects chain).
1442
1443 This effect is automatically used when the number of input chan‐
1444 nels differs from the number of output channels. When reducing
1445 the number of channels, it is possible to manually specify the
1446 mixer effect and use the -l, -r, -f, -b, -1, -2, -3, or -4 option
1447 to select only the left, right, front, or back channel(s), or a
1448 specific channel, for the output instead of averaging the channels.
1449 The -l and -r options average channels in quad-channel files,
1450 so select the exact channel to prevent this.
1451
1452 The mixer effect can also be invoked with up to 16 numbers, sep‐
1453 arated by commas, which specify the proportion (0 = 0% and 1 =
1454 100%) of each input channel that is to be mixed into each output
1455 channel. In two-channel mode, 4 numbers are given: l → l, l →
1456 r, r → l, and r → r, respectively. In four-channel mode, the
1457 first 4 numbers give the proportions for the left-front output
1458 channel, as follows: lf → lf, rf → lf, lb → lf, and rb → lf.
1459 The next 4 give the right-front output in the same order, then
1460 left-back and right-back.
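
To make the two-channel ordering concrete, the sketch below applies the
four proportions (l → l, l → r, r → l, r → r) to a single pair of sample
values. It only illustrates the arithmetic described above and is not how
SoX is implemented; the function name mix2 is hypothetical.

```shell
# Illustrative only: apply the four two-channel mixer proportions
# (l->l, l->r, r->l, r->r) to one pair of sample values.
# Arguments: ll lr rl rr left right
mix2() {
    awk -v ll="$1" -v lr="$2" -v rl="$3" -v rr="$4" -v l="$5" -v r="$6" 'BEGIN {
        out_l = ll * l + rl * r   # everything routed to the left output
        out_r = lr * l + rr * r   # everything routed to the right output
        printf "%.2f %.2f\n", out_l, out_r
    }'
}

mix2 0.5 0.5 0.5 0.5  0.8 0.2   # both outputs carry the average
mix2 0 1 1 0  0.8 0.2           # left and right exchanged
```

Under this reading, the second call corresponds to an invocation such as
mixer 0,1,1,0.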
1461
1462 It is also possible to use the 16 numbers to expand or reduce
1463 the channel count; just specify 0 for unused channels.
1464
1465 Finally, certain reduced combinations of numbers can be specified
1466 for certain input/output channel combinations.
1467
1468 ┌──────────────────────────────────────────────────────┐
1469 │In Ch Out Ch Num Mappings │
1470 │ 2 1 2 l → l, r → l │
1471 │ 2 2 1 adjust balance │
1472 │ 4 1 4 lf → l, rf → l, lb → l, rb → l │
1473 │ 4 2 2 lf → l&rf → r, lb → l&rb → r │
1474 │ 4 4 1 adjust balance │
1475 │ 4 4 2 front balance, back balance │
1476 └──────────────────────────────────────────────────────┘
1477 See also remix for a mixing effect that handles any number of
1478 channels.
1479
1480 noiseprof [profile-file]
1481 Calculate a profile of the audio for use in noise reduction.
1482 See the description of the noisered effect for details.
1483
1484 noisered [profile-file [amount]]
1485 Reduce noise in the audio signal by profiling and filtering.
1486 This effect is moderately effective at removing consistent back‐
1487 ground noise such as hiss or hum. To use it, first run SoX with
1488 the noiseprof effect on a section of audio that ideally would
1489 contain silence but in fact contains noise - such sections are
1490 typically found at the beginning or the end of a recording.
1491 noiseprof will write out a noise profile to profile-file, or to
1492 stdout if no profile-file or if `-' is given. E.g.
1493 sox speech.au -n trim 0 1.5 noiseprof speech.noise-profile
1494 To actually remove the noise, run SoX again, this time with the
1495 noisered effect; noisered will reduce noise according to a noise
1496 profile (which was generated by noiseprof), from profile-file,
1497 or from stdin if no profile-file or if `-' is given. E.g.
1498 sox speech.au cleaned.au noisered speech.noise-profile 0.3
1499 How much noise should be removed is specified by amount, a number
1500 between 0 and 1 with a default of 0.5. Higher numbers will
1501 remove more noise but present a greater likelihood of removing
1502 wanted components of the audio signal. Before replacing an
1503 original recording with a noise-reduced version, experiment with
1504 different amount values to find the optimal one for your audio;
1505 use headphones to check that you are happy with the results,
1506 paying particular attention to quieter sections of the audio.
1507
1508 On most systems, the two stages - profiling and reduction - can
1509 be combined using a pipe, e.g.
1510 sox noisy.au -n trim 0 1 noiseprof | play noisy.au noisered
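
Since the text recommends experimenting with different amount values, a
small loop can help. The sketch below only prints one sox command per
candidate amount, so they can be reviewed before being run. The function
name, the candidate values and the output naming scheme are all
hypothetical, and file names are assumed to contain no spaces.

```shell
# Sketch only: print one sox command per candidate noisered amount so the
# results can be compared by ear.  File names are hypothetical and are
# assumed to contain no spaces.  Usage: try_amounts infile profile-file
try_amounts() {
    in=$1
    prof=$2
    for amount in 0.1 0.2 0.3 0.5; do
        out="${in%.*}-nr$amount.${in##*.}"   # e.g. speech-nr0.3.au
        echo "sox $in $out noisered $prof $amount"
    done
}

try_amounts speech.au speech.noise-profile          # review the commands
# try_amounts speech.au speech.noise-profile | sh   # then run them
```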
1511
1512 norm [-i|-b] [level]
1513 Normalise audio to 0dB FSD, to a given level relative to 0dB, or
1514 normalise the balance of multi-channel audio. Requires tempo‐
1515 rary file space to store the audio to be normalised.
1516
1517 To create a normalised copy of an audio file,
1518 sox infile outfile norm
1519 can be used, though note that if `infile' has a simple encoding
1520 (e.g. PCM), then
1521 sox infile outfile vol `sox infile -n stat -v 2>&1`
1522 (on systems that support this construct) might be quicker to
1523 execute (though perhaps not to type!) as it doesn't require a
1524 temporary file.
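
The arithmetic behind peak normalisation can be sketched as follows:
given a measured peak amplitude (as a fraction of full scale) and a
target level in dB, the required multiplier is 10^(level/20) / peak. The
function name norm_factor is hypothetical, and the sketch makes no claim
about how norm is implemented internally.

```shell
# Hypothetical helper: linear multiplier that brings a measured peak
# amplitude (0..1, where 1 = full scale) to a target level in dB.
# Arguments: peak level-dB
norm_factor() {
    awk -v peak="$1" -v level="$2" 'BEGIN {
        printf "%.4f\n", exp(level * log(10) / 20) / peak
    }'
}

norm_factor 0.5 0     # reach 0dB
norm_factor 0.5 -3    # leave 3dB of headroom
```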
1525
1526 For a more complex example, suppose that `effect1' performs some
1527 unknown or unpredictable attenuation and that `effect2' requires
1528 up to 10dB of headroom, then
1529 sox infile outfile effect1 norm -10 effect2 norm
1530 gives both effect2 and the output file the highest possible sig‐
1531 nal levels.
1532
1533 Normally, audio is normalised based on the level of the channel
1534 with the highest peak level, which means that whilst all chan‐
1535 nels are adjusted, only one channel attains the normalised
1536 level. If the -i option is given, then each channel is treated
1537 individually and will attain the normalised level.
1538
1539 If the -b option is given (with a multi-channel audio file),
1540 then the audio channels will be balanced; i.e. the RMS level of
1541 each channel will be normalised to that of the channel with the
1542 highest RMS level. This can be used, for example, to correct
1543 stereo imbalance. Note that -b can cause clipping.
1544
1545 In most cases, norm -3 should be the maximum level used at the
1546 output file (to leave headroom for playback-resampling, etc.).
1547 See also the discussions of Clipping and Replay Gain above.
1548
1549 oops Out Of Phase Stereo effect. Mixes stereo to twin-mono where
1550 each mono channel contains the difference between the left and
1551 right stereo channels. This is sometimes known as the `karaoke'
1552 effect as it often has the effect of removing most or all of the
1553 vocals from a recording.
1554
1555 pad { length[@position] }
1556 Pad the audio with silence, at the beginning, the end, or any
1557 specified points through the audio. Both length and position
1558 can specify a time or, if appended with an `s', a number of sam‐
1559 ples. length is the amount of silence to insert and position
1560 the position in the input audio stream at which to insert it.
1561 Any number of lengths and positions may be specified, provided
1562 that a specified position is not less than the previous one.
1563 position is optional for the first and last lengths specified
1564 and, if omitted, corresponds to the beginning and the end of the
1565 audio respectively. For example, pad 1.5 1.5 adds 1.5 seconds
1566 of silence padding at each end of the audio, whilst pad
1567 4000s@3:00 inserts 4000 samples of silence 3 minutes into the
1568 audio. If silence is wanted only at the end of the audio, spec‐
1569 ify either the end position or specify a zero-length pad at the
1570 start.
1571
1572 phaser gain-in gain-out delay decay speed [-s|-t]
1573 Add a phasing effect to the audio. See [3] for a detailed
1574 description of phasing.
1575
1576 delay/decay/speed gives the delay in milliseconds and the decay
1577 (relative to gain-in) with a modulation speed in Hz. The modu‐
1578 lation is either sinusoidal (-s) - preferable for multiple
1579 instruments, or triangular (-t) - gives single instruments a
1580 sharper phasing effect. The decay should be less than 0.5 to
1581 avoid feedback, and usually no less than 0.1. Gain-out is the
1582 volume of the output.
1583
1584 For example:
1585 play snare.flac phaser 0.8 0.74 3 0.4 0.5 -t
1586 Gentler:
1587 play snare.flac phaser 0.9 0.85 4 0.23 1.3 -s
1588 A popular sound:
1589 play snare.flac phaser 0.89 0.85 1 0.24 2 -t
1590 More severe:
1591 play snare.flac phaser 0.6 0.66 3 0.6 2 -t
1592
1593 pitch [-q] shift [segment [search [overlap]]]
1594 Change the audio pitch (but not tempo).
1595
1596 shift gives the pitch shift as positive or negative `cents'
1597 (i.e. 100ths of a semitone). See the tempo effect for a
1598 description of the other parameters.
1599
1600 rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
1601 Change the audio sampling rate (i.e. resample the audio) to any
1602 given RATE (even non-integer if this is supported by the output
1603 file format) using a quality level defined as follows:
1604
1605 ┌───────────────────────────────────────────────────┐
1606 │ Quality Band- Rej dB Typical Use │
1607 │ width │
1608 │-q quick n/a ≈30 @ playback on │
1609 │ Fs/4 ancient hardware │
1610 │-l low 80% 100 playback on old │
1611 │ hardware │
1612 │-m medium 95% 100 audio playback │
1613 │-h high 95% 125 16-bit mastering │
1614 │ (use with dither) │
1615 │-v very high 95% 175 24-bit mastering │
1616 └───────────────────────────────────────────────────┘
1617 where Band-width is the percentage of the audio frequency band
1618 that is preserved and Rej dB is the level of noise rejection.
1619 Increasing levels of resampling quality come at the expense of
1620 increasing amounts of time to process the audio. If no quality
1621 option is given, the quality level used is `high'.
1622
1623 The `quick' algorithm uses cubic interpolation; all others use
1624 band-limited interpolation. By default, all algorithms have a
1625 `linear' phase response; for `medium', `high' and `very high',
1626 the phase response is configurable (see below).
1627
1628 The rate effect is invoked automatically if SoX's -r option
1629 specifies a rate that is different to that of the input file(s).
1630 Alternatively, if this effect is given explicitly, then SoX's -r
1631 option need not be given. For example, the following two com‐
1632 mands are equivalent:
1633 sox input.au -r 48k output.au bass -3
1634 sox input.au output.au bass -3 rate 48k
1635 though the second command is more flexible as it allows rate
1636 options to be given, and allows the effects to be ordered arbi‐
1637 trarily.
1638
1639 * * *
1640
1641 Warning: technically detailed discussion follows.
1642
1643 The simple quality selection described above provides settings
1644 that satisfy the needs of the vast majority of resampling tasks.
1645 Occasionally, however, it may be desirable to fine-tune the
1646 resampler's filter response; this can be achieved using over‐
1647 ride options, as detailed in the following table:
1648
1649 ┌──────────────────────────────────────────────────────────────────┐
1650 │-M/-I/-L Phase response = minimum/intermediate/linear │
1651 │-s Steep filter (band-width = 99%) │
1652 │-a Allow aliasing above the pass-band │
1653 │-b 74-99.7 Any band-width % │
1654 │-p 0-100 Any phase response (0 = minimum, 25 = intermediate, │
1655 │ 50 = linear, 100 = maximum) │
1656 └──────────────────────────────────────────────────────────────────┘
1657 N.B. Override options cannot be used with the `quick' or `low'
1658 quality algorithms.
1659
1660 All resamplers use filters that can sometimes create `echo'
1661 (a.k.a. `ringing') artefacts with transient signals such as
1662 those that occur with `finger snaps' or other highly percussive
1663 sounds. Such artefacts are much more noticeable to the human ear
1664 if they occur before the transient (`pre-echo') than if they
1665 occur after it (`post-echo'). Note that the frequency of any such
1666 artefacts is related to the smaller of the original and new sam‐
1667 pling rates but that if this is at least 44.1kHz, then the arte‐
1668 facts will lie outside the range of human hearing.
1669
1670 A phase response setting may be used to control the distribution
1671 of any transient echo between `pre' and `post': with minimum
1672 phase, there is no pre-echo but the longest post-echo; with lin‐
1673 ear phase, pre and post echo are in equal amounts (in signal
1674 terms, but not audibility terms); the intermediate phase setting
1675 attempts to find the best compromise by selecting a small length
1676 (and level) of pre-echo and a medium-length post-echo.
1677
1678 Minimum, intermediate, or linear phase response is selected
1679 using the -M, -I, or -L option; a custom phase response can be
1680 created with the -p option. Note that phase responses between
1681 `linear' and `maximum' (greater than 50) are rarely useful.
1682
1683 A resampler's band-width setting determines how much of the fre‐
1684 quency content of the original signal (w.r.t. the original sample
1685 rate when up-sampling, or the new sample rate when down-sam‐
1686 pling) is preserved during conversion. The term `pass-band' is
1687 used to refer to all frequencies up to the band-width point
1688 (e.g. for 44.1kHz sampling rate, and a resampling band-width of
1689 95%, the pass-band represents frequencies from 0Hz (D.C.) to
1690 circa 21kHz). Increasing the resampler's band-width results in
1691 a slower conversion and can increase transient echo artefacts
1692 (and vice versa).
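
The pass-band figure quoted above can be reproduced with a short
calculation: take the smaller of the two sampling rates, halve it, and
apply the band-width percentage. The function name passband_hz is
hypothetical.

```shell
# Hypothetical helper: upper edge of the pass-band in Hz.
# Arguments: input-rate output-rate band-width-percent
passband_hz() {
    awk -v a="$1" -v b="$2" -v bw="$3" 'BEGIN {
        rate = (a < b) ? a : b               # the smaller rate governs
        printf "%.1f\n", rate / 2 * bw / 100
    }'
}

passband_hz 44100 96000 95   # the "circa 21kHz" case described above
```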
1693
1694 The -s `steep filter' option changes resampling band-width from
1695 the default 95% (based on the 3dB point), to 99%. The -b option
1696 allows the band-width to be set to any value in the range
1697 74-99.7 %, but note that band-width values greater than 99% are
1698 not recommended for normal use as they can cause excessive tran‐
1699 sient echo.
1700
1701 If the -a option is given, then aliasing above the pass-band is
1702 allowed. For example, with 44.1kHz sampling rate, and a resam‐
1703 pling band-width of 95%, this means that frequency content above
1704 21kHz can be distorted; however, since this is above the pass-
1705 band (i.e. above the highest frequency of interest/audibility),
1706 this may not be a problem. The benefits of allowing aliasing
1707 are reduced processing time, and reduced (by almost half) tran‐
1708 sient echo artefacts. Note that if this option is given, then
1709 the minimum band-width allowable with -b increases to 85%.
1710
1711 Examples:
1712 sox input.wav -b 16 output.wav rate -s -a 44100 dither
1713 default (high) quality resampling; overrides: steep filter,
1714 allow aliasing; to 44.1kHz sample rate; dither output to 16-bit
1715 WAV file.
1716 sox input.wav -b 24 output.aiff rate -v -L -b 90 48k
1717 very high quality resampling; overrides: linear phase, band-
1718 width 90%; to 48k sample rate; store output to 24-bit AIFF file.
1719
1720
1721 * * *
1722
1723 The pitch, speed and tempo effects all use the rate effect at
1724 their core.
1725
1726 See also resample, polyphase and rabbit for other sample-rate
1727 changing effects.
1728
1729 remix [-a|-m|-p] <out-spec>
1730 out-spec = in-spec{,in-spec} | 0
1731 in-spec = [in-chan][-[in-chan2]][vol-spec]
1732 vol-spec = p|i|v[volume]
1733
1734 Select and mix input audio channels into output audio channels.
1735 Each output channel is specified, in turn, by a given out-spec:
1736 a list of contributing input channels and volume specifications.
1737
1738 Note that this effect operates on the audio channels within the
1739 SoX effects processing chain; it should not be confused with the
1740 -m global option (where multiple files are mix-combined before
1741 entering the effects chain).
1742
1743 An out-spec contains comma-separated input channel-numbers and
1744 hyphen-delimited channel-number ranges; alternatively, 0 may be
1745 given to create a silent output channel. For example,
1746 sox input.au output.au remix 6 7 8 0
1747 creates an output file with four channels, where channels 1, 2,
1748 and 3 are copies of channels 6, 7, and 8 in the input file, and
1749 channel 4 is silent. Whereas
1750 sox input.au output.au remix 1-3,7 3
1751 creates a (somewhat bizarre) stereo output file where the left
1752 channel is a mix-down of input channels 1, 2, 3, and 7, and the
1753 right channel is a copy of input channel 3.
1754
1755 Where a range of channels is specified, the channel numbers to
1756 the left and right of the hyphen are optional and default to 1
1757 and to the number of input channels respectively. Thus
1758 sox input.au output.au remix -
1759 performs a mix-down of all input channels to mono.
1760
1761 By default, where an output channel is mixed from multiple (n)
1762 input channels, each input channel will be scaled by a factor of
1763 ¹/n. Custom mixing volumes can be set by following a given
1764 input channel or range of input channels with a vol-spec (volume
1765 specification). This is one of the letters p, i, or v, followed
1766 by a volume number, the meaning of which depends on the given
1767 letter and is defined as follows:
1768
1769 Letter Volume number Notes
1770 p power adjust in dB 0 = no change
1771 i power adjust in dB As `p', but invert
1772 the audio
1773 v voltage multiplier 1 = no change, 0.5
1774 ≈ 6dB attenuation,
1775 2 ≈ 6dB gain, -1 =
1776 invert
1777
1778 If an out-spec includes at least one vol-spec then, by default,
1779 ¹/n scaling is not applied to any other channels in the same
1780 out-spec (though may be in other out-specs). The -a (automatic)
1781 option however, can be given to retain the automatic scaling in
1782 this case. For example,
1783 sox input.au output.au remix 1,2 3,4v0.8
1784 results in channel level multipliers of 0.5,0.5 1,0.8, whereas
1785 sox input.au output.au remix -a 1,2 3,4v0.8
1786 results in channel level multipliers of 0.5,0.5 0.5,0.8.
1787
1788 The -m (manual) option disables all automatic volume adjust‐
1789 ments, so
1790 sox input.au output.au remix -m 1,2 3,4v0.8
1791 results in channel level multipliers of 1,1 1,0.8.
1792
1793 The volume number is optional and omitting it corresponds to no
1794 volume change; however, the only case in which this is useful is
1795 in conjunction with i. For example, if input.au is stereo, then
1796 sox input.au output.au remix 1,2i
1797 is a mono equivalent of the oops effect.
1798
1799 If the -p option is given, then any automatic ¹/n scaling is
1800 replaced by ¹/√n (`power') scaling; this gives a louder mix but
1801 one that might occasionally clip.
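
For comparison, the two automatic scaling rules can be tabulated for a
given number of mixed input channels. This is only the arithmetic stated
above (¹/n and ¹/√n); the function name remix_scale is hypothetical.

```shell
# Hypothetical helper: print the default (1/n) and -p (1/sqrt(n))
# per-channel multipliers for n input channels mixed into one output.
remix_scale() {
    awk -v n="$1" 'BEGIN { printf "%.4f %.4f\n", 1 / n, 1 / sqrt(n) }'
}

remix_scale 2   # stereo mixed down to one channel
remix_scale 4   # four channels mixed down to one
```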
1802
1803 * * *
1804
1805 One use of the remix effect is to split an audio file into a set
1806 of files, each containing one of the constituent channels (in
1807 order to perform subsequent processing on individual audio chan‐
1808 nels). Where more than a few channels are involved, a script
1809 such as the following (Bourne shell script) is useful:
1810 #!/bin/sh
1811 chans=`soxi -c "$1"`
1812 while [ $chans -ge 1 ]; do
1813 chans0=`printf %02i $chans` # 2 digits hence up to 99 chans
1814 out=`echo "$1"|sed "s/\(.*\)\.\(.*\)/\1-$chans0.\2/"`
1815 sox "$1" "$out" remix $chans
1816 chans=`expr $chans - 1`
1817 done
1818 If a file input.au containing six audio channels were given, the
1819 script would produce six output files: input-01.au, input-02.au,
1820 ..., input-06.au.
1821
1822 See also mixer and swap for similar effects.
1823
1824 repeat count
1825 Repeat the entire audio count times. Requires temporary file
1826 space to store the audio to be repeated. Note that repeating
1827 once yields two copies: the original audio and the repeated
1828 audio.
1829
1830 reverb [-w|--wet-only] [reverberance (50%) [HF-damping (50%)
1831 [room-scale (100%) [stereo-depth (100%)
1832 [pre-delay (0ms) [wet-gain (0dB)]]]]]]
1833
1834 Add reverberation to the audio using the `freeverb' algorithm.
1835 A reverberation effect is sometimes desirable for concert halls
1836 that are too small or contain so many people that the hall's
1837 natural reverberance is diminished. Applying a small amount of
1838 stereo reverb to a (dry) mono signal will usually make it sound
1839 more natural. See [3] for a detailed description of reverbera‐
1840 tion.
1841
1842 Note that this effect increases both the volume and the length
1843 of the audio, so to prevent clipping in these domains, a typical
1844 invocation might be:
1845 play dry.au gain -3 pad 0 3 reverb
1846
1847 reverse
1848 Reverse the audio completely. Requires temporary file space to
1849 store the audio to be reversed.
1850
1851 riaa Apply RIAA vinyl playback equalisation. The sampling rate must
1852 be one of: 44.1, 48, 88.2, 96 kHz.
1853
1854 This effect supports the --plot global option.
1855
1856 silence [-l] above-periods [duration
1857 threshold[d|%] [below-periods duration threshold[d|%]]]
1858
1859 Removes silence from the beginning, middle, or end of the audio.
1860 Silence is anything below a specified threshold.
1861
1862 The above-periods value is used to indicate if audio should be
1863 trimmed at the beginning of the audio. A value of zero indicates
1864 no silence should be trimmed from the beginning. When specifying
1865 a non-zero above-periods, it trims audio up until it finds non-
1866 silence. Normally, when trimming silence from the beginning of audio,
1867 the above-periods will be 1 but it can be increased to higher
1868 values to trim all audio up to a specific count of non-silence
1869 periods. For example, if you had an audio file with two songs
1870 that each contained 2 seconds of silence before the song, you
1871 could specify an above-period of 2 to strip out both silence
1872 periods and the first song.
1873
1874 When above-periods is non-zero, you must also specify a duration
1875 and threshold. Duration indicates the amount of time that non-
1876 silence must be detected before it stops trimming audio. By
1877 increasing the duration, bursts of noise can be treated as
1878 silence and trimmed off.
1879
1880 Threshold is used to indicate what sample value you should treat
1881 as silence. For digital audio, a value of 0 may be fine but for
1882 audio recorded from analog, you may wish to increase the value
1883 to account for background noise.
1884
1885 When optionally trimming silence from the end of the audio, you
1886 specify a below-periods count. In this case, below-periods means
1887 to remove all audio after silence is detected. Normally, this
1888 will be a value of 1 but it can be increased to skip over peri‐
1889 ods of silence that are wanted. For example, if you have a song
1890 with 2 seconds of silence in the middle and 2 seconds at the end,
1891 you could set below-periods to a value of 2 to skip over the
1892 silence in the middle of the audio.
1893
1894 For below-periods, duration specifies a period of silence that
1895 must exist before audio is not copied any more. By specifying a
1896 higher duration, silence that is wanted can be left in the
1897 audio. For example, if you have a song with an expected 1 sec‐
1898 ond of silence in the middle and 2 seconds of silence at the
1899 end, a duration of 2 seconds could be used to skip over the mid‐
1900 dle silence.
1901
1902 Unfortunately, you must know the length of the silence at the
1903 end of your audio file to trim off silence reliably. A work
1904 around is to use the silence effect in combination with the
1905 reverse effect. By first reversing the audio, you can use the
1906 above-periods to reliably trim all audio from what looks like
1907 the front of the file. Then reverse the file again to get back
1908 to normal.
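
The reverse work-around can be written as a single effects chain. The
sketch below wraps it in a shell function; the function name, the
duration of 00:00:00.1 and the 1% threshold are hypothetical values
chosen for illustration and will need tuning for real material.

```shell
# Sketch: trim silence from both ends of a file using the reverse
# work-around.  Duration and threshold values are illustrative only.
# Usage: trim_both_ends infile outfile
trim_both_ends() {
    sox "$1" "$2" \
        silence 1 00:00:00.1 1% reverse \
        silence 1 00:00:00.1 1% reverse
}

# Example (file names hypothetical):
#   trim_both_ends take1.wav take1-trimmed.wav
```

The first silence trims the start; after the first reverse, the second
silence trims what was originally the end, and the final reverse restores
the normal direction.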
1909
1910 To remove silence from the middle of a file, specify a below-
1911 periods that is negative. This value is then treated as a posi‐
1912 tive value and is also used to indicate the effect should
1913 restart processing as specified by the above-periods, making it
1914 suitable for removing periods of silence in the middle of the
1915 audio.
1916
1917 The -l option indicates that a length of audio equal to the
1918 below-periods duration should be left intact at the beginning of
1919 each period of silence. This is useful, for example, if you want to
1920 shorten long pauses between words without removing them completely.
1921
1922 The period counts are in units of samples. Duration counts may
1923 be in the format of hh:mm:ss.frac, or the exact count of sam‐
1924 ples. Threshold numbers may be suffixed with d to indicate the
1925 value is in decibels, or % to indicate a percentage of maximum
1926 value of the sample value (0% specifies pure digital silence).
1927
1928 The following example shows how this effect can be used to start
1929 a recording that does not contain the delay at the start which
1930 usually occurs between `pressing the record button' and the
1931 start of the performance:
1932 rec parameters filename other-effects silence 1 5 2%
1933
1934 speed factor[c]
1935 Adjust the audio speed (pitch and tempo together). factor is
1936 either the ratio of the new speed to the old speed: greater than
1937 1 speeds up, less than 1 slows down, or, if appended with the
1938 letter `c', the number of cents (i.e. 100ths of a semitone) by
1939 which the pitch (and tempo) should be adjusted: greater than 0
1940 increases, less than 0 decreases.
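
The two ways of writing factor are related by ratio = 2^(cents/1200),
since 1200 cents make one octave. The following sketch performs the
conversion; the function name cents_to_ratio is hypothetical.

```shell
# Hypothetical helper: convert a change in cents to a speed ratio,
# ratio = 2^(cents/1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(c * log(2) / 1200) }'
}

cents_to_ratio 100     # one semitone up
cents_to_ratio -1200   # one octave down
```

On this basis, speed 100c and a speed ratio of about 1.0595 should
describe the same adjustment.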
1941
1942 By default, the speed change is performed by resampling with the
1943 rate effect using its default quality/speed. For higher quality
1944 or higher speed resampling, in addition to the speed effect,
1945 specify the rate effect with the desired quality option.
1946
1947 spectrogram [options]
1948 Create a spectrogram of the audio. This effect is optional;
1949 type sox --help and check the list of supported effects to see
1950 if it has been included.
1951
1952 The spectrogram is rendered in a Portable Network Graphic (PNG)
1953 file, and shows time in the X-axis, frequency in the Y-axis, and
1954 audio signal magnitude in the Z-axis. Z-axis values are repre‐
1955 sented by the colour (or intensity) of the pixels in the X-Y
1956 plane.
1957
1958 This effect supports only one channel; for multi-channel input
1959 files, use either SoX's -c 1 option with the output file (to
1960 obtain a spectrogram on the mix-down), or the remix n effect to
1961 select a particular channel. Be aware though, that both of
1962 these methods affect the audio in the effects chain.
1963
1964 -x num X-axis pixels/second, default 100. This controls the
1965 width of the spectrogram; num can be from 1 (low time
1966 resolution) to 5000 (high time resolution) and need not
1967 be an integer. SoX may make a slight adjustment to the
1968 given number for processing quantisation reasons; if so,
1969 SoX will report the actual number used (viewable when
1970 --verbose is in effect).
1971
1972 The maximum width of the spectrogram is 999 pixels; if
1973 the audio length and the given -x number are such that
1974 this would be exceeded, then the spectrogram (and the
1975 effects chain) will be truncated. To move the spectro‐
1976 gram to a point later in the audio stream, first invoke
1977 the trim effect; e.g.
1978 sox audio.ogg -n trim 1:00 spectrogram
1979 starts the spectrogram at 1 minute through the audio.
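
Whether a given -x value will hit the width limit can be
estimated beforehand as duration × pixels/second. This is a
rough sketch: SoX may adjust the -x number slightly, so the
result is approximate. The function name spectro_width is
hypothetical.

```shell
# Hypothetical helper: estimate spectrogram width in pixels and flag
# truncation at the 999-pixel limit described above.
# Arguments: duration-in-seconds pixels-per-second
spectro_width() {
    awk -v secs="$1" -v pps="$2" 'BEGIN {
        want = int(secs * pps)
        if (want > 999) printf "%d (truncated to 999)\n", want
        else            printf "%d\n", want
    }'
}

spectro_width 30 100   # 30 seconds at the default -x value
spectro_width 5 100    # 5 seconds fits within the limit
```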
1980
1981 -y num Y-axis resolution (1 - 4), default 2. This controls the
1982 height of the spectrogram; num can be from 1 (low fre‐
1983 quency resolution) to 4 (high frequency resolution). For
1984 values greater than 2, the resulting image may be too
1985 tall to display on the screen; if so, a graphic manipula‐
1986 tion package (such as ImageMagick(1)) can be used to re-
1987 size the image.
1988
1989 To increase the frequency resolution without increasing
1990 the height of the spectrogram, the rate effect may be
1991 invoked to reduce the sampling rate of the signal before
1992 invoking spectrogram; e.g.
1993 sox audio.ogg -r 4k -n rate spectrogram
1994 allows detailed analysis of frequencies up to 2kHz (half
1995 the sampling rate).
1996
1997 -z num Z-axis (colour) range in dB, default 120. This sets the
1998 dynamic-range of the spectrogram to be -num dBFS to
1999 0 dBFS. Num may range from 20 to 180. Decreasing
2000 dynamic-range effectively increases the `contrast' of the
2001 spectrogram display, and vice versa.
2002
2003 -Z num Sets the upper limit of the Z-axis in dBFS. A negative
2004 num effectively increases the `brightness' of the spec‐
2005 trogram display, and vice versa.
2006
2007 -q num Sets the Z-axis quantisation, i.e. the number of differ‐
2008 ent colours (or intensities) in which to render Z-axis
2009 values. A small number (e.g. 4) will give a
2010 `poster'-like effect making it easier to discern magni‐
2011 tude bands of similar level. Small numbers also usually
2012 result in small PNG files. The number given specifies
2013 the number of colours to use inside the Z-axis range; two
2014 colours are reserved to represent out-of-range values.
2015
2016 -w name
2017 Window: Hann (default), Hamming, Bartlett, Rectangular or
2018 Kaiser. The spectrogram is produced using the Discrete
2019 Fourier Transform (DFT) algorithm. A significant parame‐
2020 ter to this algorithm is the choice of `window function'.
2021 By default, SoX uses the Hann window which has good all-
2022 round frequency-resolution and dynamic-range properties.
2023 For better frequency resolution (but lower dynamic-
2024 range), select a Hamming window; for higher dynamic-range
2025 (but poorer frequency-resolution), select a Kaiser win‐
2026 dow. Bartlett and Rectangular windows are also avail‐
2027 able. Selecting a window other than Hann will usually
2028 require a corresponding -z setting.
2029
2030 -s Allow slack overlapping of DFT windows. This can, in
2031 some cases, increase image sharpness and give greater
2032 adherence to the -x value, but at the expense of a little
2033 spectral loss.
2034
2035 -m Creates a monochrome spectrogram (the default is colour).
2036
2037 -h Selects a high-colour palette - less visually pleasing
2038 than the default colour palette, but it may make it eas‐
2039 ier to differentiate different levels. If this option is
2040 used in conjunction with -m, the result will be a hybrid
2041 monochrome/colour palette.
2042
2043 -p num Permute the colours in a colour or hybrid palette. The
2044 num parameter (from 1 to 6) selects the permutation.
2045
2046 -l Creates a `printer friendly' spectrogram with a light
2047 background (the default has a dark background).
2048
2049 -a Suppress the display of the axis lines. This is some‐
2050 times useful in helping to discern artefacts at the spec‐
2051 trogram edges.
2052
2053 -t text
2054 Set the image title - text to display above the spectro‐
2055 gram.
2056
2057 -c text
2058 Set the image comment - text to display below and to the
2059 left of the spectrogram.
2060
2061 -o text
2062 Name of the spectrogram output PNG file, default `spec‐
2063 trogram.png'.
2064
2065 For example, let's see what the spectrogram of a swept triangu‐
2066 lar wave looks like:
2067 sox -n -n synth 6 tri 10k:14k spectrogram -z 100 -w k
2068 Append the following to the `chime' example in the delay effect
2069 to see its spectrogram:
2070 rate 2k spectrogram -x 200 -Z -15 -w k
2071 For the ability to perform off-line processing of spectral data,
2072 see the stat effect.
2073
2074 splice { position[,excess[,leeway]] }
2075 Splice together audio sections. This effect provides two things
2076 over simple audio concatenation: a (usually short) cross-fade is
2077 applied at the join, and a wave similarity comparison is made to
2078 help determine the best place at which to make the join.
2079
2080 To perform a splice, first use the trim effect to select the
2081 audio sections to be joined together. As when performing a tape
2082 splice, the end of the section to be spliced onto should be
2083 trimmed with a small excess (default 0.005 seconds) of audio
2084 after the ideal joining point. The beginning of the audio sec‐
2085 tion to splice on should be trimmed with the same excess (before
2086 the ideal joining point), plus an additional leeway (default
2087 0.005 seconds). SoX should then be invoked with the two audio
2088 sections as input files and the splice effect given with the
position at which to perform the splice - this is the length of the
first audio section (including the excess).
2091
2092 For example, a long song begins with two verses which start (as
2093 determined e.g. by using the play command with the trim (start)
2094 effect) at times 0:30.125 and 1:03.432. The following commands
2095 cut out the first verse:
2096 sox too-long.au part1.au trim 0 30.130
2097 (5 ms excess, after the first verse starts)
sox too-long.au part2.au trim 1:03.422
2099 (5 ms excess plus 5 ms leeway, before the second verse starts)
2100 sox part1.au part2.au just-right.au splice 30.130
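
The trim points used in this example can be derived mechanically from the two verse start times. The following is a minimal sketch of that arithmetic; the function name splice_trim_points is hypothetical, and times are taken in plain seconds rather than in mm:ss form.

```shell
#!/bin/sh
# Hypothetical helper: given the start time (in seconds) of the section to
# be removed and the start time of the section to be kept, print the two
# trim points. excess and leeway default to 0.005 s, as quoted above.
splice_trim_points() {
    awk -v a="$1" -v b="$2" -v e="${3:-0.005}" -v l="${4:-0.005}" 'BEGIN {
        printf "part1_end=%.3f\n", a + e         # excess after the join point
        printf "part2_start=%.3f\n", b - e - l   # excess plus leeway before it
    }'
}

# Verses start at 0:30.125 (30.125 s) and 1:03.432 (63.432 s).
splice_trim_points 30.125 63.432
```

The first value is also the position to give to splice, since it is the length of the first section including the excess.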
2101 Provided your arithmetic is good enough, multiple splices can be
2102 performed with a single splice invocation. For example:
2103 #!/bin/sh
2104 # Audio Copy and Paste Over
2105 # acpo infile copy-start copy-stop paste-over-start outfile
2106 # All times measured in samples.
2107 rate=`soxi -r "$1"`
2108 e=`expr $rate '*' 5 / 1000` # Using default excess
2109 l=$e # and leeway.
2110 sox "$1" piece.au trim `expr $2 - $e - $l`s \
2111 `expr $3 - $2 + $e + $l + $e`s
2112 sox "$1" part1.au trim 0 `expr $4 + $e`s
2113 sox "$1" part2.au trim `expr $4 + $3 - $2 - $e - $l`s
2114 sox part1.au piece.au part2.au "$5" splice \
2115 `expr $4 + $e`s \
2116 `expr $4 + $e + $3 - $2 + $e + $l + $e`s
2117 In the above Bourne shell script, two splices are used to `copy
2118 and paste' audio.
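
To check the arithmetic before running such a script on real audio, the two splice positions can be computed on their own. The following sketch repeats the script's calculation without invoking sox; the function name acpo_positions and the sample figures are illustrative only.

```shell
#!/bin/sh
# Hypothetical dry run of the acpo arithmetic: print the two splice
# positions (in samples) that the script would pass to the splice effect.
# Usage: acpo_positions rate copy-start copy-stop paste-over-start
acpo_positions() {
    e=$(($1 * 5 / 1000))   # default excess (5 ms) expressed in samples
    l=$e                   # default leeway, the same as the excess
    first=$(($4 + e))                          # length of part1
    second=$(($4 + e + $3 - $2 + e + l + e))   # length of part1 plus piece
    echo "${first}s ${second}s"
}

# 8 kHz audio: copy samples 16000-24000 and paste them over at 40000.
acpo_positions 8000 16000 24000 40000
```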
2119
2120 The SoX command
2121 play "|sox -n -p synth 1 sin %1" "|sox -n -p synth 1 sin %3"
2122 generates and plays two notes, but there is a nasty click at the
2123 transition; the click can be removed by appending splice 1 to
2124 the command. (Clicks at the beginning and end of the audio can
2125 be removed by preceding the splice effect with fade q .01 2
2126 .01).
2127
2128 * * *
2129
It is also possible to use this effect to perform general
cross-fades, e.g. to join two songs. In this case, excess would
typically be a number of seconds, and leeway should be set to zero.
2133
2134 stat [-s scale] [-rms] [-freq] [-v] [-d]
2135 Display time and frequency domain statistical information about
2136 the audio. Audio is passed unmodified through the SoX process‐
2137 ing chain.
2138
The information is output to the `standard error' (stderr)
stream and is calculated as follows, where n is the duration of
the audio in samples, c is the number of audio channels, r is
the audio sample rate, and x_k represents the PCM value (in the
range -1 to +1 by default) of each successive sample in the
audio:

Statistic           Formula                           Description
Samples read        n×c
Length (seconds)    n÷r
Scaled by                                             See -s below.
Maximum amplitude   max(x_k)                          The maximum sample value in the
                                                      audio; usually this will be a
                                                      positive number.
Minimum amplitude   min(x_k)                          The minimum sample value in the
                                                      audio; usually this will be a
                                                      negative number.
Midline amplitude   ½min(x_k) + ½max(x_k)
Mean norm           (1/n) Σ |x_k|                     The average of the absolute value
                                                      of each sample in the audio.
Mean amplitude      (1/n) Σ x_k                       The average of each sample in the
                                                      audio. If this figure is non-zero,
                                                      then it indicates the presence of
                                                      a D.C. offset (which could be
                                                      removed using the dcshift effect).
RMS amplitude       √((1/n) Σ x_k²)                   The level of a D.C. signal that
                                                      would have the same power as the
                                                      audio's average power.
Maximum delta       max(|x_k - x_(k-1)|)
Minimum delta       min(|x_k - x_(k-1)|)
Mean delta          (1/(n-1)) Σ |x_k - x_(k-1)|
RMS delta           √((1/(n-1)) Σ (x_k - x_(k-1))²)
Rough frequency                                       In Hz.
Volume Adjustment                                     The parameter to the vol effect
                                                      which would make the audio as loud
                                                      as possible without clipping.
                                                      Note: See the discussion on
                                                      Clipping above for reasons why it
                                                      is rarely a good idea actually to
                                                      do this.
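
To make the definitions in the table concrete, the sketch below evaluates several of them for a short list of sample values. It is an illustration of the formulas only: the function name sample_stats and the output layout are hypothetical, input is one sample per line in the range -1 to +1, and no attempt is made to reproduce the exact figures or format printed by the stat effect.

```shell
#!/bin/sh
# Hypothetical helper: read one sample value per line on stdin and print a
# subset of the statistics tabulated above.
sample_stats() {
    awk '
    NR == 1 { max = $1; min = $1 }
    {
        x = $1
        if (x > max) max = x
        if (x < min) min = x
        sum += x                          # for the mean amplitude
        sumabs += (x < 0 ? -x : x)        # for the mean norm
        sumsq += x * x                    # for the RMS amplitude
        if (NR > 1) {
            d = x - prev                  # difference from previous sample
            if (d < 0) d = -d
            if (NR == 2 || d > dmax) dmax = d
            if (NR == 2 || d < dmin) dmin = d
            dsum += d
        }
        prev = x
    }
    END {
        if (NR == 0) exit 1               # no samples, nothing to report
        printf "Maximum amplitude: %.6f\n", max
        printf "Minimum amplitude: %.6f\n", min
        printf "Midline amplitude: %.6f\n", min / 2 + max / 2
        printf "Mean norm: %.6f\n", sumabs / NR
        printf "Mean amplitude: %.6f\n", sum / NR
        printf "RMS amplitude: %.6f\n", sqrt(sumsq / NR)
        if (NR > 1) {
            printf "Maximum delta: %.6f\n", dmax
            printf "Minimum delta: %.6f\n", dmin
            printf "Mean delta: %.6f\n", dsum / (NR - 1)
        }
    }'
}

printf '%s\n' 0.5 -0.5 0.25 0.75 | sample_stats
```

For these four samples the mean amplitude is 0.25, i.e. the data carries a D.C. offset of the kind described in the table.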
2201
2202 The -s option can be used to scale the input data by a given
2203 factor. The default value of scale is 2147483647 (i.e. the max‐
2204 imum value of a 32-bit signed integer). Internal effects always
2205 work with signed long PCM data and so the value should relate to
2206 this fact.
2207
2208 The -rms option will convert all output average values to `root
2209 mean square' format.
2210
2211 The -v option displays only the `Volume Adjustment' value.
2212
2213 The -freq option calculates the input's power spectrum (4096
2214 point DFT) instead of the statistics listed above.
2215
The -d option displays a hex dump of the 32-bit signed PCM audio
data in SoX's internal buffer. This is mainly used to help
track down endian problems that sometimes occur in cross-platform
versions of SoX.
2220
2221 swap [1 2 | 1 2 3 4]
2222 Swap channels in multi-channel audio files. Optionally, you may
2223 specify the channel order you would like the output in. This
2224 defaults to output channel 2 and then 1 for stereo and 2, 1, 4,
2225 3 for quad-channels. An interesting feature is that you may
2226 duplicate a given channel by overwriting another. This is done
by repeating an output channel on the command-line. For example,
swap 2 2 will overwrite channel 1 with channel 2, creating a
stereo file with both channels containing the same audio.
2230
2231 See also the remix effect.
2232
2233 stretch factor [window fade shift fading]
2234 Change the audio duration (but not its pitch). This effect is
2235 broadly equivalent to the tempo effect with (factor inverted
2236 and) search set to zero, so in general, its results are compara‐
2237 tively poor; it is retained as it can sometimes out-perform
2238 tempo for small factors.
2239
factor sets the amount of stretching: >1 lengthens the audio,
<1 shortens it. window is the window size in ms; the default is
20 ms. The fade option can be `lin'. shift is a ratio in the
range [0, 1]; its default depends on factor: 1 to shorten, 0.8
to lengthen. fading is a ratio in the range [0, 0.5]; its
default depends on factor and shift.
2245
2246 See also the tempo effect.
2247
2248 synth [len] {[type] [combine] [[%]freq[k][:|+|/|-[%]freq2[k]]] [off]
2249 [ph] [p1] [p2] [p3]}
2250 This effect can be used to generate fixed or swept frequency
2251 audio tones with various wave shapes, or to generate wide-band
2252 noise of various `colours'. Multiple synth effects can be cas‐
2253 caded to produce more complex waveforms; at each stage it is
2254 possible to choose whether the generated waveform will be mixed
2255 with, or modulated onto the output from the previous stage.
2256 Audio for each channel in a multi-channel audio file can be syn‐
2257 thesised independently.
2258
2259 Though this effect is used to generate audio, an input file must
2260 still be given, the characteristics of which will be used to set
2261 the synthesised audio length, the number of channels, and the
2262 sampling rate; however, since the input file's audio is not nor‐
2263 mally needed, a `null file' (with the special name -n) is often
given instead (and the length specified as a parameter to synth
or by another given effect that has an associated length).
2266
2267 For example, the following produces a 3 second, 48kHz, audio
2268 file containing a sine-wave swept from 300 to 3300 Hz:
2269 sox -n output.au synth 3 sine 300-3300
2270 and this produces an 8 kHz version:
2271 sox -r 8000 -n output.au synth 3 sine 300-3300
2272 Multiple channels can be synthesised by specifying the set of
2273 parameters shown between braces multiple times; the following
2274 puts the swept tone in the left channel and adds `brown' noise
2275 in the right:
2276 sox -n output.au synth 3 sine 300-3300 brownnoise
2277 The following example shows how two synth effects can be cas‐
2278 caded to create a more complex waveform:
2279 sox -n output.au synth 0.5 sine 200-500 \
2280 synth 0.5 sine fmod 700-100
2281 Frequencies can also be given as a number of musical semitones
2282 relative to `middle A' (440 Hz) by prefixing a `%' character;
2283 for example, the following could be used to help tune a guitar's
2284 `E' strings:
2285 play -n synth sine %-17
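
To see which frequency a given `%' value corresponds to, the semitone offset can be converted to hertz. The following sketch assumes equal temperament, in which each semitone is a factor of the twelfth root of 2; the function name semitone_to_hz is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a semitone offset relative to A (440 Hz)
# into a frequency in Hz, assuming equal temperament.
semitone_to_hz() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * exp(log(2) * n / 12) }'
}

semitone_to_hz -17   # the E used in the guitar example above
```

An offset of -17 gives about 164.81 Hz; adding or subtracting 12 moves the result up or down by one octave.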
2286 N.B. This effect generates audio at maximum volume (0dBFS),
2287 which means that there is a high chance of clipping when using
2288 the audio subsequently, so in most cases, you will want to fol‐
2289 low this effect with the gain effect to prevent this from hap‐
2290 pening. (See also Clipping above.)
2291
2292 A detailed description of each synth parameter follows:
2293
len is the length of audio to synthesise expressed as a time or
as a number of samples; 0=input length, default=0.
2296
2297 The format for specifying lengths in time is hh:mm:ss.frac. The
2298 format for specifying sample counts is the number of samples
2299 with the letter `s' appended to it.
2300
2301 type is one of sine, square, triangle, sawtooth, trapezium, exp,
2302 [white]noise, pinknoise, brownnoise; default=sine
2303
2304 combine is one of create, mix, amod (amplitude modulation), fmod
2305 (frequency modulation); default=create
2306
2307 freq/freq2 are the frequencies at the beginning/end of synthesis
2308 in Hz or, if preceded with `%', semitones relative to A
2309 (440 Hz); for both, default=%0. If freq2 is given, then len
2310 must also have been given and the generated tone will be swept
2311 between the given frequencies. The two given frequencies must
2312 be separated by one of the characters `:', `+', `/', or `-'.
2313 This character is used to specify the sweep function as follows:
2314
2315 : Linear: the tone will change by a fixed number of hertz
2316 per second.
2317
2318 + Square: a second-order function is used to change the
2319 tone.
2320
2321 / Exponential: the tone will change by a fixed number of
2322 semitones per second.
2323
2324 - Exponential: as `/', but initial phase always zero, and
2325 stepped (less smooth) frequency changes.
2326
2327 Not used for noise.
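
The difference between the linear and exponential sweeps can be illustrated by computing the instantaneous frequency part-way through a sweep. This is a sketch of the two descriptions above, assuming the sweep runs over the whole of len; the `square' sweep is omitted because its exact second-order function is not stated here. The function name sweep_freq and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: frequency (Hz) reached t seconds into a sweep from
# f1 to f2 lasting len seconds.
# Usage: sweep_freq linear|exp f1 f2 len t
sweep_freq() {
    awk -v kind="$1" -v f1="$2" -v f2="$3" -v len="$4" -v t="$5" 'BEGIN {
        if (kind == "linear")
            f = f1 + (f2 - f1) * t / len           # fixed hertz per second
        else if (kind == "exp")
            f = f1 * exp(log(f2 / f1) * t / len)   # fixed semitones per second
        else
            exit 1
        printf "%.2f\n", f
    }'
}

sweep_freq linear 300 3300 3 1.5   # half-way through a 300:3300 sweep
sweep_freq exp 100 400 2 1         # half-way through a 100/400 sweep
```

Half-way through, the linear sweep has reached the arithmetic mean of the two frequencies (1800 Hz), whereas the exponential sweep has reached their geometric mean (200 Hz, one octave above the start).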
2328
2329 off is the bias (DC-offset) of the signal in percent; default=0.
2330
2331 ph is the phase shift in percentage of 1 cycle; default=0. Not
2332 used for noise.
2333
2334 p1 is the percentage of each cycle that is `on' (square), or
2335 `rising' (triangle, exp, trapezium); default=50 (square, trian‐
2336 gle, exp), default=10 (trapezium).
2337
2338 p2 (trapezium): the percentage through each cycle at which
2339 `falling' begins; default=50. exp: the amplitude in percent;
2340 default=100.
2341
2342 p3 (trapezium): the percentage through each cycle at which
2343 `falling' ends; default=60.
2344
2345 tempo [-q] factor [segment [search [overlap]]]
2346 Change the audio tempo (but not its pitch). The audio is
2347 chopped up into segments which are then shifted in the time
2348 domain and overlapped (cross-faded) at points where their wave‐
2349 forms are most similar (as determined by measurement of `least
2350 squares').
2351
2352 By default, linear searches are used to find the best overlap‐
2353 ping points; if the optional -q parameter is given, tree
2354 searches are used instead, giving a quicker, but possibly lower
2355 quality, result.
2356
2357 factor gives the ratio of new tempo to the old tempo, so e.g.
2358 1.1 speeds up the tempo by 10%, and 0.9 slows it down by 10%.
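
Since factor is a ratio of tempos, the resulting duration is the original duration divided by factor. A minimal sketch of that relationship follows; the function name tempo_duration is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: duration (seconds) after applying tempo with the
# given factor to audio of the given original duration.
tempo_duration() {
    awk -v d="$1" -v f="$2" 'BEGIN { printf "%.3f\n", d / f }'
}

tempo_duration 60 1.1   # one minute of audio, tempo raised by 10%
```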
2359
2360 The optional segment parameter selects the algorithm's segment
2361 size in milliseconds. The default value is 82 and is typically
2362 suited to making small changes to the tempo of music; for larger
2363 changes (e.g. a factor of 2), 50 ms may give a better result.
2364 When changing the tempo of speech, a segment size of around
2365 30 ms often works well.
2366
2367 The optional search parameter gives the audio length in mil‐
2368 liseconds (default 14) over which the algorithm will search for
2369 overlapping points. Larger values use more processing time and
2370 do not necessarily produce better results.
2371
2372 The optional overlap parameter gives the segment overlap length
2373 in milliseconds (default 12).
2374
2375 See also speed for an effect that changes tempo and pitch
2376 together, and pitch for an effect that changes pitch without
2377 changing tempo.
2378
2379 treble gain [frequency[k] [width[s|h|k|o|q]]]
2380 Apply a treble tone-control effect. See the description of the
2381 bass effect for details.
2382
2383 tremolo speed [depth]
2384 Apply a tremolo (low frequency amplitude modulation) effect to
2385 the audio. The tremolo frequency in Hz is given by speed, and
2386 the depth as a percentage by depth (default 40).
2387
2388 Note: This effect is a special case of the synth effect.
2389
2390 trim start [length]
2391 Trim can trim off unwanted audio from the beginning and end of
2392 the audio. Audio is not sent to the output stream until the
2393 start location is reached.
2394
The optional length parameter gives the amount of audio to
output after the start position and is used to trim off the end
of the audio. Using a value of 0 for the start parameter allows
trimming off the end only.

Both parameters can be specified using either an amount of time
or an exact count of samples. The format for specifying lengths
in time is hh:mm:ss.frac. A start value of 1:30.5 will not
start output until 1 minute 30.5 seconds into the audio. The
format for specifying sample counts is the number of samples
with the letter `s' appended to it. A value of 8000s will wait
until 8000 samples have been read before starting to process
audio.
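
The two ways of expressing a position are related through the sample rate. The sketch below converts a time in hh:mm:ss.frac form into a sample count; the function name time_to_samples is hypothetical, and the sample rate has to be supplied by the caller (e.g. as reported by soxi -r).

```shell
#!/bin/sh
# Hypothetical helper: convert a time ([[hh:]mm:]ss[.frac]) into a number
# of samples at the given sample rate, rounded to the nearest sample.
# Usage: time_to_samples time rate
time_to_samples() {
    awk -v t="$1" -v rate="$2" 'BEGIN {
        n = split(t, part, ":")
        secs = 0
        for (i = 1; i <= n; i++)
            secs = secs * 60 + part[i]      # fold hours and minutes into seconds
        printf "%d\n", secs * rate + 0.5    # %d truncates, so add 0.5 to round
    }'
}

time_to_samples 1:30.5 8000
```

At 8 kHz, a start value of 1:30.5 is therefore equivalent to 724000s.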
2407
2408 vol gain [type [limitergain]]
2409 Apply an amplification or an attenuation to the audio signal.
2410 Unlike the -v option (which is used for balancing multiple input
2411 files as they enter the SoX effects processing chain), vol is an
2412 effect like any other so can be applied anywhere, and several
2413 times if necessary, during the processing chain.
2414
2415 The amount to change the volume is given by gain which is inter‐
2416 preted, according to the given type, as follows: if type is
2417 amplitude (or is omitted), then gain is an amplitude (i.e. volt‐
2418 age or linear) ratio, if power, then a power (i.e. wattage or
2419 voltage-squared) ratio, and if dB, then a power change in dB.
2420
2421 When type is amplitude or power, a gain of 1 leaves the volume
2422 unchanged, less than 1 decreases it, and greater than 1
2423 increases it; a negative gain inverts the audio signal in addi‐
2424 tion to adjusting its volume.
2425
2426 When type is dB, a gain of 0 leaves the volume unchanged, less
2427 than 0 decreases it, and greater than 0 increases it.
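
The relationship between the three gain types follows the usual decibel definitions (see [4]): a change of d dB corresponds to an amplitude ratio of 10^(d/20) and a power ratio of 10^(d/10). The sketch below applies those formulas; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: convert a gain in dB into the equivalent
# amplitude ratio and power ratio.
db_to_amplitude() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(log(10) * db / 20) }'
}
db_to_power() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(log(10) * db / 10) }'
}

db_to_amplitude -6   # roughly half the amplitude
db_to_power 10       # ten times the power
```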
2428
2429 See [4] for a detailed discussion on electrical (and hence audio
2430 signal) voltage and power ratios.
2431
Beware of Clipping when increasing the volume.
2433
2434 The gain and the type parameters can be concatenated if desired,
2435 e.g. vol 10dB.
2436
2437 An optional limitergain value can be specified and should be a
2438 value much less than 1 (e.g. 0.05 or 0.02) and is used only on
2439 peaks to prevent clipping. Not specifying this parameter will
2440 cause no limiter to be used. In verbose mode, this effect will
2441 display the percentage of the audio that needed to be limited.
2442
2443 See also compand for a dynamic-range compression/expansion/lim‐
2444 iting effect.
2445
2446 Deprecated Effects
2447 The following effects have been renamed or have their functionality
2448 included in another effect; they continue to work in this version of
2449 SoX but may be removed in future.
2450
2451 key [-q] shift [segment [search [overlap]]]
2452 Change the audio key (i.e. pitch but not tempo). This is just
2453 an alias for the pitch effect.
2454
2455 pan direction
2456 Mix the audio from one channel to another. Use mixer or remix
2457 instead of this effect.
2458
2459 The direction is a value from -1 to 1. -1 represents far left
2460 and 1 represents far right.
2461
2462 polyphase [-w nut|ham] [-width n] [-cut-off c]
2463 Change the sampling rate using `polyphase interpolation', a DSP
2464 algorithm. polyphase copes with only certain rational fraction
2465 resampling ratios, and, compared with the rate effect, is gener‐
2466 ally slow, memory intensive, and has poorer stop-band rejection.
2467
2468 If the -w parameter is nut, then a Blackman-Nuttall (~90 dB
2469 stop-band) window will be used; ham selects a Hamming (~43 dB
2470 stop-band) window. The default is Blackman-Nuttall.
2471
2472 The -width parameter specifies the (approximate) width of the
2473 filter. The default is 1024 samples, which produces reasonable
2474 results.
2475
The -cut-off value (c) specifies the filter cut-off frequency in
terms of fraction of frequency bandwidth, also known as the
Nyquist frequency. See the resample effect for further
information on Nyquist frequency. If up-sampling, then this is the
2480 fraction of the original signal that should go through. If
2481 down-sampling, this is the fraction of the signal left after
2482 down-sampling. The default is 0.95.
2483
2484 See also rate, rabbit and resample for other sample-rate chang‐
2485 ing effects.
2486
2487 rabbit [-c0|-c1|-c2|-c3|-c4]
2488 Change the sampling rate using libsamplerate, also known as
2489 `Secret Rabbit Code'. This effect is optional and, due to
2490 licence issues, is not included in all versions of SoX. Com‐
2491 pared with the rate effect, rabbit is very slow.
2492
2493 See http://www.mega-nerd.com/SRC for details of the algorithms.
2494 Algorithms 0 through 2 are progressively faster and lower qual‐
2495 ity versions of the sinc algorithm; the default is -c0. Algo‐
2496 rithm 3 is zero-order hold, and 4 is linear interpolation.
2497
2498 See also rate, polyphase and resample for other sample-rate
2499 changing effects, and see resample for more discussion of resam‐
2500 pling.
2501
2502 resample [-qs|-q|-ql] [rolloff [beta]]
2503 Change the sampling rate using simulated analog filtration.
2504 Compared with the rate effect, resample is slow, and has poorer
2505 stop-band rejection. Only the low quality option works with all
2506 resampling ratios.
2507
By default, linear interpolation of the filter coefficients is
used, with a window width of about 45 samples at the lower of
the two rates. This gives an accuracy of about 16 bits, but
insufficient stop-band rejection in the case that you want to have
roll-off greater than about 0.8 of the Nyquist frequency.
2513
2514 The -q* options will change the default values for roll-off and
2515 beta as well as use quadratic interpolation of filter coeffi‐
2516 cients, resulting in about 24 bits precision. The -qs, -q, or
2517 -ql options specify increased accuracy at the cost of lower exe‐
2518 cution speed. It is optional to specify roll-off and beta
2519 parameters when using the -q* options.
2520
2521 Following is a table of the reasonable defaults which are built-
2522 in to SoX:
2523
2524
2525 ┌──────────────────────────────────────────────────┐
2526 │Option Window Roll-off Beta Interpolation │
2527 │(none) 45 0.80 16 linear │
2528 │ -qs 45 0.80 16 quadratic │
2529 │ -q 75 0.875 16 quadratic │
2530 │ -ql 149 0.94 16 quadratic │
2531 └──────────────────────────────────────────────────┘
2532 -qs, -q, or -ql use window lengths of 45, 75, or 149 samples,
2533 respectively, at the lower sample-rate of the two files. This
2534 means progressively sharper stop-band rejection, at proportion‐
2535 ally slower execution times.
2536
rolloff refers to the cut-off frequency of the low pass filter
and is given in terms of the Nyquist frequency for the lower
sample rate. rolloff therefore should be something between 0
and 1, in practice 0.8-0.95. The defaults are indicated above.
2541
The Nyquist frequency is equal to half the sample rate.
Logically, this is because the A/D converter needs at least 2
samples to detect 1 cycle at the Nyquist frequency. Frequencies
higher than the Nyquist frequency will actually appear as lower
frequencies to the A/D converter; this is called aliasing.
Normally, A/D converters run the signal through a lowpass filter
first to avoid these problems.
2549
2550 Similar problems will happen in software when reducing the sam‐
2551 ple rate of an audio file (frequencies above the new Nyquist
2552 frequency can be aliased to lower frequencies). Therefore, a
2553 good resample effect will remove all frequency information above
2554 the new Nyquist frequency.
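
The frequency at which an unfiltered component reappears can be worked out by folding it around the Nyquist frequency. The sketch below shows this standard calculation for a single tone; it illustrates aliasing in general rather than anything specific to the resample implementation, and the function name alias_freq is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: frequency (Hz) at which a tone of frequency f
# appears when sampled at the given rate without prior lowpass filtering.
# Usage: alias_freq f rate
alias_freq() {
    awk -v f="$1" -v rate="$2" 'BEGIN {
        m = f - rate * int(f / rate)     # f modulo the sample rate
        if (m > rate / 2) m = rate - m   # fold around the Nyquist frequency
        printf "%d\n", m
    }'
}

alias_freq 5000 8000   # 5 kHz is above the 4 kHz Nyquist frequency
```

A 5 kHz tone sampled at 8 kHz appears at 3 kHz, whereas tones at or below 4 kHz are returned unchanged.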
2555
The rolloff refers to how close to the Nyquist frequency this
cut-off is, with closer being better. When increasing the
sample rate of an audio file you would not expect to have any
frequencies exist that are past the original Nyquist frequency.
Because of resampling properties, it is common to have aliasing
artefacts created above the old Nyquist frequency. In that case
the rolloff refers to how close to the original Nyquist
frequency to use a lowpass filter to remove these artefacts, with
closer also being better.
2565
2566 The beta, if unspecified, defaults to 16. This selects a Kaiser
2567 window. You can select a Blackman-Nuttall window by specifying
2568 anything ≤ 2 here. For more discussion of beta, look under the
2569 filter effect.
2570
2571 Default parameters are, as indicated above, Kaiser window of
2572 length 45, roll-off 0.80, beta 16, linear interpolation.
2573
2574 Note: -qs is only slightly slower, but more accurate for 16-bit
2575 or higher precision.
2576
2577 See also rate, polyphase and rabbit for other sample-rate chang‐
2578 ing effects. There is a detailed analysis of resample and
2579 polyphase at http://leute.server.de/wilde/resample.html; see
2580 rabbit for a pointer to its own documentation.
2581
2583 Exit status is 0 for no error, 1 if there is a problem with the com‐
2584 mand-line parameters, or 2 if an error occurs during file processing.
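
In scripts, the exit status can be used to distinguish a usage mistake from a processing failure. The following is a minimal sketch; the function name sox_status_message and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: translate a SoX exit status into the meanings
# listed above.
sox_status_message() {
    case "$1" in
        0) echo "no error" ;;
        1) echo "problem with the command-line parameters" ;;
        2) echo "error during file processing" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Typical use (file names are placeholders):
#   sox in.wav out.wav; sox_status_message $?
```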
2585
2587 Please report any bugs found in this version of SoX to the mailing list
2588 (sox-users@lists.sourceforge.net).
2589
2591 soxi(1), soxformat(7), libsox(3)
2592 audacity(1), ImageMagick(1), gnuplot(1), octave(1), wget(1)
2593 The SoX web site at http://sox.sourceforge.net
2594 SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts
2595
2596 References
2597 [1] R. Bristow-Johnson, Cookbook formulae for audio EQ biquad filter
2598 coefficients, http://musicdsp.org/files/Audio-EQ-Cookbook.txt
2599
2600 [2] Wikipedia, Q-factor, http://en.wikipedia.org/wiki/Q_factor
2601
[3] Scott Lehman, Effects Explained,
http://harmony-central.com/Effects/effects-explained.html
2604
2605 [4] Wikipedia, Decibel, http://en.wikipedia.org/wiki/Decibel
2606
2607 [5] Richard Furse, Linux Audio Developer's Simple Plugin API,
2608 http://www.ladspa.org
2609
2610 [6] Richard Furse, Computer Music Toolkit, http://www.ladspa.org/cmt
2611
2612 [7] Steve Harris, LADSPA plugins, http://plugin.org.uk
2613
2615 Copyright 1991 Lance Norskog and Sundry Contributors.
2616 Copyright 1998-2008 Chris Bagwell and SoX Contributors.
2617
2618 This program is free software; you can redistribute it and/or modify it
2619 under the terms of the GNU General Public License as published by the
2620 Free Software Foundation; either version 2, or (at your option) any
2621 later version.
2622
2623 This program is distributed in the hope that it will be useful, but
2624 WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
2625 CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
2626 Public License for more details.
2627
2629 Chris Bagwell (cbagwell@users.sourceforge.net). Other authors and con‐
2630 tributors are listed in the AUTHORS file that is distributed with the
2631 source code.
2632
2633
2634
2635sox October 28, 2008 SoX(1)