1SoX(1) Sound eXchange SoX(1)
2
3
4
6 SoX - Sound eXchange, the Swiss Army knife of audio manipulation
7
9 sox [global-options] [format-options] infile1
10 [[format-options] infile2] ... [format-options] outfile
11 [effect [effect-options]] ...
12
13 play [global-options] [format-options] infile1
14 [[format-options] infile2] ... [format-options]
15 [effect [effect-options]] ...
16
17 rec [global-options] [format-options] outfile
18 [effect [effect-options]] ...
19
21 Introduction
22 SoX reads and writes audio files in most popular formats and can
23 optionally apply effects to them. It can combine multiple input
24 sources, synthesise audio, and, on many systems, act as a general pur‐
25 pose audio player or a multi-track audio recorder. It also has limited
26 ability to split the input into multiple output files.
27
28 All SoX functionality is available using just the sox command. To sim‐
29 plify playing and recording audio, if SoX is invoked as play, the out‐
30 put file is automatically set to be the default sound device, and if
31 invoked as rec, the default sound device is used as an input source.
32 Additionally, the soxi(1) command provides a convenient way to just
33 query audio file header information.
34
35 The heart of SoX is a library called libSoX. Those interested in
36 extending SoX or using it in other programs should refer to the libSoX
37 manual page: libsox(3).
38
39 SoX is a command-line audio processing tool, particularly suited to
40 making quick, simple edits and to batch processing. If you need an
41 interactive, graphical audio editor, use audacity(1).
42
43 * * *
44
45 The overall SoX processing chain can be summarised as follows:
46
47 Input(s) → Combiner → Effects → Output(s)
48
49 Note however, that on the SoX command line, the positions of the Out‐
50 put(s) and the Effects are swapped w.r.t. the logical flow just shown.
51 Note also that whilst options pertaining to files are placed before
52 their respective file name, the opposite is true for effects. To show
53 how this works in practice, here is a selection of examples of how SoX
54 might be used. The simple
55 sox recital.au recital.wav
56 translates an audio file in Sun AU format to a Microsoft WAV file,
57 whilst
58 sox recital.au -b 16 recital.wav channels 1 rate 16k fade 3 norm
59 performs the same format translation, but also applies four effects
60 (down-mix to one channel, sample rate change, fade-in, normalize), and
61 stores the result at a bit-depth of 16.
62 sox -r 16k -e signed -b 8 -c 1 voice-memo.raw voice-memo.wav
63 converts `raw' (a.k.a. `headerless') audio to a self-describing file
64 format,
65 sox slow.aiff fixed.aiff speed 1.027
66 adjusts audio speed,
67 sox short.wav long.wav longer.wav
68 concatenates two audio files, and
69 sox -m music.mp3 voice.wav mixed.flac
70 mixes together two audio files.
71 play "The Moonbeams/Greatest/*.ogg" bass +3
72 plays a collection of audio files whilst applying a bass boosting
73 effect,
74 play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1
75 plays a synthesised `A minor seventh' chord with a pipe-organ sound,
76 rec -c 2 radio.aiff trim 0 30:00
77 records half an hour of stereo audio, and
78 play -q take1.aiff & rec -M take1.aiff take1-dub.aiff
79 (with POSIX shell and where supported by hardware) records a new track
80 in a multi-track recording. Finally,
81 rec -r 44100 -b 16 -s -p silence 1 0.50 0.1% 1 10:00 0.1% | \
82 sox -p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \
83 newfile : restart
84 records a stream of audio such as LP/cassette and splits it into
85 multiple audio files at points with 2 seconds of silence. Also, it does
86 not start recording until it detects that audio is playing, and it stops
87 after it sees 10 minutes of silence.
88
89 N.B. The above is just an overview of SoX's capabilities; detailed
90 explanations of how to use all SoX parameters, file formats, and
91 effects can be found below in this manual, in soxformat(7), and in
92 soxi(1).
93
94 File Format Types
95 SoX can work with `self-describing' and `raw' audio files. `Self-
96 describing' formats (e.g. WAV, FLAC, MP3) have a header that completely
97 describes the signal and encoding attributes of the audio data that
98 follows. `Raw' or `headerless' formats do not contain this information,
99 so the audio characteristics of these must be described on the SoX com‐
100 mand line or inferred from those of the input file.
101
102 The following four characteristics are used to describe the format of
103 audio data such that it can be processed with SoX:
104
105 sample rate
106 The sample rate in samples per second (`Hertz' or `Hz'). Digi‐
107 tal telephony traditionally uses a sample rate of 8000 Hz
108 (8 kHz), though these days, 16 and even 32 kHz are becoming more
109 common. Audio Compact Discs use 44100 Hz (44.1 kHz). Digital
110 Audio Tape and many computer systems use 48 kHz. Professional
111 audio systems often use 96 kHz.
112
113 sample size
114 The number of bits used to store each sample. Today, 16-bit is
115 commonly used. 8-bit was popular in the early days of computer
116 audio. 24-bit is used in the professional audio arena. Other
117 sizes are also used.
118
119 data encoding
120 The way in which each audio sample is represented (or
121 `encoded'). Some encodings have variants with different byte-
122 orderings or bit-orderings. Some compress the audio data so
123 that the stored audio data takes up less space (i.e. disk space
124 or transmission bandwidth) than the other format parameters and
125 the number of samples would imply. Commonly-used encoding types
126 include floating-point, μ-law, ADPCM, signed-integer PCM, MP3,
127 and FLAC.
128
129 channels
130 The number of audio channels contained in the file. One
131 (`mono') and two (`stereo') are widely used. `Surround sound'
132 audio typically contains six or more channels.
133
134 The term `bit-rate' is a measure of the amount of storage occupied by
135 an encoded audio signal over a unit of time. It can depend on all of
136 the above and is typically denoted as a number of kilo-bits per second
137 (kbps). An A-law telephony signal has a bit-rate of 64 kbps.
138 MP3-encoded stereo music typically has a bit-rate of 128-196 kbps.
139 FLAC-encoded stereo music typically has a bit-rate of 550-760 kbps.
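
For encodings that store a fixed number of bits per sample, the bit-rate
follows directly from the numeric characteristics above: sample rate x
sample size x channels. The following POSIX shell sketch (the function
name bitrate_kbps is illustrative and not part of SoX) reproduces the
64 kbps A-law telephony figure and, for comparison, the rate of 16-bit
stereo CD audio. It does not apply to variable-rate compressed encodings
such as MP3 or FLAC.

```shell
# Bit-rate in kbps for encodings with a fixed number of bits per sample.
# usage: bitrate_kbps RATE_HZ BITS_PER_SAMPLE CHANNELS
bitrate_kbps() {
    awk -v rate="$1" -v bits="$2" -v chans="$3" \
        'BEGIN { printf "%.1f\n", rate * bits * chans / 1000 }'
}

bitrate_kbps 8000 8 1     # A-law telephony: 8 kHz, 8 bits, mono    -> 64.0
bitrate_kbps 44100 16 2   # CD audio: 44.1 kHz, 16 bits, stereo     -> 1411.2
```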
140
141 Most self-describing formats also allow textual `comments' to be embed‐
142 ded in the file that can be used to describe the audio in some way,
143 e.g. for music, the title, the author, etc.
144
145 One important use of audio file comments is to convey `Replay Gain'
146 information. SoX supports applying Replay Gain information, but not
147 generating it. Note that by default, SoX copies input file comments to
148 output files that support comments, so output files may contain Replay
149 Gain information if some was present in the input file. In this case,
150 if anything other than a simple format conversion was performed then
151 the output file Replay Gain information is likely to be incorrect and
152 so should be recalculated using a tool that supports this (not SoX).
153
154 The soxi(1) command can be used to display information from audio file
155 headers.
156
157 Determining & Setting The File Format
158 There are several mechanisms available for SoX to use to determine or
159 set the format characteristics of an audio file. Depending on the cir‐
160 cumstances, individual characteristics may be determined or set using
161 different mechanisms.
162
163 To determine the format of an input file, SoX will use, in order of
164 precedence and as given or available:
165
166 1. Command-line format options.
167
168 2. The contents of the file header.
169
170 3. The filename extension.
171
172 To set the output file format, SoX will use, in order of precedence and
173 as given or available:
174
175 1. Command-line format options.
176
177 2. The filename extension.
178
179 3. The input file format characteristics, or the closest that is sup‐
180 ported by the output file type.
181
182 For all files, SoX will exit with an error if the file type cannot be
183 determined. Command-line format options may need to be added or changed
184 to resolve the problem.
185
186 Playing & Recording Audio
187 The play and rec commands are provided so that basic playing and
188 recording is as simple as
189 play existing-file.wav
190 and
191 rec new-file.wav
192 These two commands are functionally equivalent to
193 sox existing-file.wav -d
194 and
195 sox -d new-file.wav
196 Of course, further options and effects (as described below) can be
197 added to the commands in either form.
198
199 * * *
200
201 Some systems provide more than one type of (SoX-compatible) audio
202 driver, e.g. ALSA & OSS, or SUNAU & AO. Systems can also have more
203 than one audio device (a.k.a. `sound card'). If more than one audio
204 driver has been built into SoX, and the default selected by SoX when
205 recording or playing is not the one that is wanted, then the AUDIO‐
206 DRIVER environment variable can be used to override the default. For
207 example (on many systems):
208 set AUDIODRIVER=oss
209 play ...
210 The AUDIODEV environment variable can be used to override the default
211 audio device, e.g.
212 set AUDIODEV=/dev/dsp2
213 play ...
214 sox ... -t oss
215 or
216 set AUDIODEV=hw:soundwave,1,2
217 play ...
218 sox ... -t alsa
219 Note that the way of setting environment variables varies from system
220 to system - for some specific examples, see `SOX_OPTS' below.
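
In a POSIX shell (sh, bash, and similar), a variable can be exported for
the whole session or set for a single command. In the sketch below, the
driver and device names are placeholders, and a child shell stands in
for play so that the effect is visible without any audio hardware.

```shell
# Export for every later command in this shell session.
AUDIODRIVER=alsa
export AUDIODRIVER

# Set AUDIODEV for one command only; the child shell stands in for `play'.
AUDIODEV=hw:0 sh -c 'echo "driver=$AUDIODRIVER device=$AUDIODEV"'

# The one-command assignment did not change the parent shell.
echo "device afterwards: ${AUDIODEV:-unset}"
```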
221
222 When playing a file with a sample rate that is not supported by the
223 audio output device, SoX will automatically invoke the rate effect to
224 perform the necessary sample rate conversion. For compatibility with
225 old hardware, the default rate quality level is set to `low'. This can
226 be changed by explicitly specifying the rate effect with a different
227 quality level, e.g.
228 play ... rate -m
229 or by using the --play-rate-arg option (see below).
230
231 * * *
232
233 On some systems, SoX allows audio playback volume to be adjusted whilst
234 using play. Where supported, this is achieved by tapping the `v' & `V'
235 keys during playback.
236
237 To help with setting a suitable recording level, SoX includes a peak-
238 level meter which can be invoked (before making the actual recording)
239 as follows:
240 rec -n
241 The recording level should be adjusted (using the system-provided mixer
242 program, not SoX) so that the meter is at most occasionally full scale,
243 and never `in the red' (an exclamation mark is shown). See also -S
244 below.
245
246 Accuracy
247 Many file formats that compress audio discard some of the audio signal
248 information whilst doing so. Converting to such a format and then con‐
249 verting back again will not produce an exact copy of the original
250 audio. This is the case for many formats used in telephony (e.g. A-
251 law, GSM) where low signal bandwidth is more important than high audio
252 fidelity, and for many formats used in portable music players (e.g.
253 MP3, Vorbis) where adequate fidelity can be retained even with the
254 large compression ratios that are needed to make portable players prac‐
255 tical.
256
257 Formats that discard audio signal information are called `lossy'. For‐
258 mats that do not are called `lossless'. The term `quality' is used as
259 a measure of how closely the original audio signal can be reproduced
260 when using a lossy format.
261
262 Audio file conversion with SoX is lossless when it can be, i.e. when
263 not using lossy compression, when not reducing the sampling rate or
264 number of channels, and when the number of bits used in the destination
265 format is not less than in the source format. E.g. converting from an
266 8-bit PCM format to a 16-bit PCM format is lossless but converting from
267 an 8-bit PCM format to (8-bit) A-law isn't.
268
269 N.B. SoX converts all audio files to an internal uncompressed format
270 before performing any audio processing. This means that manipulating a
271 file that is stored in a lossy format can cause further losses in audio
272 fidelity. E.g. with
273 sox long.mp3 short.mp3 trim 10
274 SoX first decompresses the input MP3 file, then applies the trim
275 effect, and finally creates the output MP3 file by re-compressing the
276 audio - with a possible reduction in fidelity above that which occurred
277 when the input file was created. Hence, if what is ultimately desired
278 is lossily compressed audio, it is highly recommended to perform all
279 audio processing using lossless file formats and then convert to the
280 lossy format only at the final stage.
281
282 N.B. Applying multiple effects with a single SoX invocation will, in
283 general, produce more accurate results than those produced using multi‐
284 ple SoX invocations.
285
286 Dithering
287 Dithering is a technique used to maximise the dynamic range of audio
288 stored at a particular bit-depth. Any distortion introduced by quanti‐
289 sation is decorrelated by adding a small amount of white noise to the
290 signal. In most cases, SoX can determine whether the selected process‐
291 ing requires dither and will add it during output formatting if appro‐
292 priate.
293
294 Specifically, by default, SoX automatically adds TPDF dither when the
295 output bit-depth is less than 24 and any of the following are true:
296
297 · bit-depth reduction has been specified explicitly using a command-
298 line option
299
300 · the output file format supports only bit-depths lower than that of
301 the input file format
302
303 · an effect has increased effective bit-depth within the internal
304 processing chain
305
306 For example, adjusting volume with vol 0.25 requires two additional
307 bits in which to losslessly store its results (since 0.25 decimal
308 equals 0.01 binary). So if the input file bit-depth is 16, then SoX's
309 internal representation will utilise 18 bits after processing this vol‐
310 ume change. In order to store the output at the same depth as the
311 input, dithering is used to remove the additional bits.
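
The bit-depth arithmetic in this example can be checked with integer
shell arithmetic. This is a simplified sketch, not SoX's internal code:
it places a non-negative 16-bit sample on an 18-bit scale so that
scaling by 0.25 is exact, then shows what plain truncation back to 16
bits would discard. All variable names are illustrative.

```shell
sample16=32765                  # a 16-bit sample value

full18=$(( sample16 * 4 ))      # the same sample on an 18-bit scale
scaled18=$(( full18 / 4 ))      # vol 0.25: exact, no remainder
scaled16=$(( scaled18 / 4 ))    # back on a 16-bit scale: remainder dropped
lost=$(( scaled18 % 4 ))        # what the two extra bits were holding

echo "18-bit result: $scaled18"
echo "16-bit result: $scaled16 (remainder lost: $lost)"
```

The discarded remainder is the quantisation error that dither
decorrelates from the signal.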
312
313 Use the -V option to see what processing SoX has automatically added.
314 The -D option may be given to override automatic dithering. To invoke
315 dithering manually (e.g. to select a noise-shaping curve), see the
316 dither effect.
317
318 Clipping
319 Clipping is distortion that occurs when an audio signal level (or `vol‐
320 ume') exceeds the range of the chosen representation. In most cases,
321 clipping is undesirable and so should be corrected by adjusting the
322 level prior to the point (in the processing chain) at which it occurs.
323
324 In SoX, clipping could occur, as you might expect, when using the vol
325 or gain effects to increase the audio volume. Clipping could also occur
326 with many other effects, when converting one format to another, and
327 even when simply playing the audio.
328
329 Playing an audio file often involves resampling, and processing by ana‐
330 logue components can introduce a small DC offset and/or amplification,
331 all of which can produce distortion if the audio signal level was ini‐
332 tially too close to the clipping point.
333
334 For these reasons, it is usual to make sure that an audio file's signal
335 level has some `headroom', i.e. it does not exceed a particular level
336 below the maximum possible level for the given representation. Some
337 standards bodies recommend as much as 9dB headroom, but in most cases,
338 3dB (≈ 70% linear) is enough. Note that this wisdom seems to have been
339 lost in modern music production; in fact, many CDs, MP3s, etc. are now
340 mastered at levels above 0dBFS i.e. the audio is clipped as delivered.
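
Headroom in decibels converts to a linear amplitude as 10^(-dB/20). A
small shell sketch (the function name is illustrative) confirms the 70%
figure quoted above and shows the equivalent for 9dB:

```shell
# Linear amplitude, as a fraction of full scale, that leaves the given
# headroom in dB: 10^(-dB/20).
headroom_to_linear() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(-db / 20 * log(10)) }'
}

headroom_to_linear 3   # -> 0.708
headroom_to_linear 9   # -> 0.355
```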
341
342 SoX's stat and stats effects can assist in determining the signal level
343 in an audio file. The gain or vol effect can be used to prevent clip‐
344 ping, e.g.
345 sox dull.wav bright.wav gain -6 treble +6
346 guarantees that the treble boost will not clip.
347
348 If clipping occurs at any point during processing, SoX will display a
349 warning message to that effect.
350
351 See also -G and the gain and norm effects.
352
353 Input File Combining
354 SoX's input combiner can be configured (see OPTIONS below) to combine
355 multiple files using any of the following methods: `concatenate',
356 `sequence', `mix', `mix-power', `merge', or `multiply'. The default
357 method is `sequence' for play, and `concatenate' for rec and sox.
358
359 For all methods other than `sequence', multiple input files must have
360 the same sampling rate. If necessary, separate SoX invocations can be
361 used to make sampling rate adjustments prior to combining.
362
363 If the `concatenate' combining method is selected (usually, this will
364 be by default) then the input files must also have the same number of
365 channels. The audio from each input will be concatenated in the order
366 given to form the output file.
367
368 The `sequence' combining method is selected automatically for play. It
369 is similar to `concatenate' in that the audio from each input file is
370 sent serially to the output file. However, here the output file may be
371 closed and reopened at the corresponding transition between input
372 files. This may be just what is needed when sending different types of
373 audio to an output device, but is not generally useful when the output
374 is a normal file.
375
376 If either the `mix' or `mix-power' combining method is selected then
377 two or more input files must be given and will be mixed together to
378 form the output file. The number of channels in each input file need
379 not be the same, but SoX will issue a warning if they are not and some
380 channels in the output file will not contain audio from every input
381 file. A mixed audio file cannot be un-mixed without reference to the
382 original input files.
383
384 If the `merge' combining method is selected then two or more input
385 files must be given and will be merged together to form the output
386 file. The number of channels in each input file need not be the same.
387 A merged audio file comprises all of the channels from all of the input
388 files. Un-merging is possible using multiple invocations of SoX with
389 the remix effect. For example, two mono files could be merged to form
390 one stereo file. The first and second mono files would become the left
391 and right channels of the stereo file.
392
393 The `multiply' combining method multiplies the sample values of corre‐
394 sponding channels (treated as numbers in the interval -1 to +1). If
395 the number of channels in the input files is not the same, the missing
396 channels are considered to contain all zero.
397
398 When combining input files, SoX applies any specified effects (includ‐
399 ing, for example, the vol volume adjustment effect) after the audio has
400 been combined. However, it is often useful to be able to set the volume
401 of (i.e. `balance') the inputs individually, before combining takes
402 place.
403
404 For all combining methods, input file volume adjustments can be made
405 manually using the -v option (below) which can be given for one or more
406 input files. If it is given for only some of the input files then the
407 others receive no volume adjustment. In some circumstances, automatic
408 volume adjustments may be applied (see below).
409
410 The -V option (below) can be used to show the input file volume adjust‐
411 ments that have been selected (either manually or automatically).
412
413 There are some special considerations that need to be made when mixing
414 input files:
415
416 Unlike the other methods, `mix' combining has the potential to cause
417 clipping in the combiner if no balancing is performed. In this case,
418 if manual volume adjustments are not given, SoX will try to ensure that
419 clipping does not occur by automatically adjusting the volume (ampli‐
420 tude) of each input signal by a factor of ¹/n, where n is the number of
421 input files. If this results in audio that is too quiet or otherwise
422 unbalanced then the input file volumes can be set manually as described
423 above. Using the norm effect on the mix is another alternative.
424
425 If mixed audio seems loud enough at some points but too quiet in others
426 then dynamic range compression should be applied to correct this - see
427 the compand effect.
428
429 With the `mix-power' combine method, the mixed volume is approximately
430 equal to that of one of the input signals. This is achieved by balanc‐
431 ing using a factor of ¹/√n instead of ¹/n. Note that this balancing
432 factor does not guarantee that clipping will not occur, but the number
433 of clips will usually be low and the resultant distortion is generally
434 imperceptible.
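
The two automatic balancing factors can be tabulated for a few input
counts. The sketch below only does the arithmetic described above (the
function name is illustrative); values like these could also be supplied
by hand with -v before each input file.

```shell
# Per-input volume factor described above for automatic balancing:
# 1/n for `mix', 1/sqrt(n) for `mix-power'.
mix_factors() {
    awk -v n="$1" \
        'BEGIN { printf "n=%d mix=%.3f mix-power=%.3f\n", n, 1 / n, 1 / sqrt(n) }'
}

for n in 2 3 4; do
    mix_factors "$n"
done
```

For two inputs this prints factors of 0.500 and 0.707, which is why a
`mix-power' result is louder than a plain `mix' of the same files.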
435
436 Output Files
437 SoX's default behaviour is to take one or more input files and write
438 them to a single output file.
439
440 This behaviour can be changed by specifying the pseudo-effect `newfile'
441 within the effects list. SoX will then enter multiple output mode.
442
443 In multiple output mode, a new file is created when the effects prior
444 to the `newfile' indicate they are done. The effects chain listed
445 after `newfile' is then started up and its output is saved to the new
446 file.
447
448 In multiple output mode, a unique number will automatically be appended
449 to the end of all filenames. If the filename has an extension then the
450 number is inserted before the extension. This behaviour can be custom‐
451 ized by placing a %n anywhere in the filename where the number should
452 be substituted. An optional number can be placed after the % to indi‐
453 cate a minimum fixed width for the number.
454
455 Multiple output mode is not very useful unless an effect that will stop
456 the effects chain early is specified before the `newfile'. If end of
457 file is reached before the effects chain stops itself then no new file
458 will be created as it would be empty.
459
460 The following is an example of splitting the first 60 seconds of an
461 input file into two 30 second files and ignoring the rest.
462 sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
463
464 Stopping SoX
465 Usually SoX will complete its processing and exit automatically once it
466 has read all available audio data from the input files.
467
468 If desired, it can be terminated earlier by sending an interrupt signal
469 to the process (usually by pressing the keyboard interrupt key which is
470 normally Ctrl-C). This is a natural requirement in some circumstances,
471 e.g. when using SoX to make a recording. Note that when using SoX to
472 play multiple files, Ctrl-C behaves slightly differently: pressing it
473 once causes SoX to skip to the next file; pressing it twice in quick
474 succession causes SoX to exit.
475
476 Another option to stop processing early is to use an effect that has a
477 time period or sample count to determine the stopping point. The trim
478 effect is an example of this. Once all effects chains have stopped
479 then SoX will also stop.
480
482 Filenames can be simple file names, absolute or relative path names, or
483 URLs (input files only). Note that URL support requires that wget(1)
484 is available.
485
486 Note: Giving SoX an input or output filename that is the same as a SoX
487 effect-name will not work since SoX will treat it as an effect
488 specification. The only work-around to this is to avoid such
489 filenames. This is generally not difficult since most audio filenames
490 have a filename `extension', whilst effect-names do not.
491
492 Special Filenames
493 The following special filenames may be used in certain circumstances in
494 place of a normal filename on the command line:
495
496 - SoX can be used in simple pipeline operations by using the
497 special filename `-' which, if used as an input filename, will
498 cause SoX to read audio data from `standard input' (stdin),
499 and which, if used as the output filename, will cause SoX to
500 send audio data to `standard output' (stdout). Note that when
501 using this option for the output file, and sometimes when using
502 it for an input file, the file-type (see -t below) must also be
503 given.
504
505 "|program [options] ..."
506 This can be used in place of an input filename to specify that
507 the given program's standard output (stdout) be used as an input
508 file. Unlike - (above), this can be used for several inputs to
509 one SoX command. For example, if `genw' generates mono WAV
510 formatted signals to its standard output, then the following
511 command makes a stereo file from two generated signals:
512 sox -M "|genw --imd -" "|genw --thd -" out.wav
513 For headerless (raw) audio, -t (and perhaps other format
514 options) will need to be given, preceding the input command.
515
516 "wildcard-filename"
517 Specifies that filename `globbing' (wild-card matching) should
518 be performed by SoX instead of by the shell. This allows a sin‐
519 gle set of file options to be applied to a group of files. For
520 example, if the current directory contains three `vox' files,
521 file1.vox, file2.vox, and file3.vox, then
522 play --rate 6k *.vox
523 will be expanded by the `shell' (in most environments) to
524 play --rate 6k file1.vox file2.vox file3.vox
525 which will treat only the first vox file as having a sample rate
526 of 6k. With
527 play --rate 6k "*.vox"
528 the given sample rate option will be applied to all three vox
529 files.
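
The difference between the two forms lies in what the shell hands to the
program. The sketch below uses a throwaway directory and a helper named
count_args (both illustrative) in place of play, so it runs without SoX:

```shell
# Print the number of arguments a command would receive.
count_args() { echo "$#"; }

dir=$(mktemp -d)
touch "$dir/file1.vox" "$dir/file2.vox" "$dir/file3.vox"

# Unquoted: the shell expands the pattern into three file names.
unquoted=$(cd "$dir" && count_args --rate 6k *.vox)
# Quoted: the pattern reaches the program as a single argument.
quoted=$(cd "$dir" && count_args --rate 6k "*.vox")

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"

rm -r "$dir"
```

In the quoted case the program sees one filename argument preceded by
the format option, which is what lets SoX apply that option to every
file the pattern matches.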
530
531 -p, --sox-pipe
532 This can be used in place of an output filename to specify that
533 the SoX command should be used as an input pipe to another SoX
534 command. For example, the command:
535 play "|sox -n -p synth 2" "|sox -n -p synth 2 tremolo 10" stat
536 plays two `files' in succession, each with different effects.
537
538 -p is in fact an alias for `-t sox -'.
539
540 -d, --default-device
541 This can be used in place of an input or output filename to
542 specify that the default audio device (if one has been built
543 into SoX) is to be used. This is akin to invoking rec or play
544 (as described above).
545
546 -n, --null
547 This can be used in place of an input or output filename to
548 specify that a `null file' is to be used. Note that here, `null
549 file' refers to a SoX-specific mechanism and is not related to
550 any operating-system mechanism with a similar name.
551
552 Using a null file to input audio is equivalent to using a normal
553 audio file that contains an infinite amount of silence, and as
554 such is not generally useful unless used with an effect that
555 specifies a finite time length (such as trim or synth).
556
557 Using a null file to output audio amounts to discarding the
558 audio and is useful mainly with effects that produce information
559 about the audio instead of affecting it (such as noiseprof or
560 stat).
561
562 The sampling rate associated with a null file is by default
563 48 kHz, but, as with a normal file, this can be overridden if
564 desired using command-line format options (see below).
565
566 Supported File & Audio Device Types
567 See soxformat(7) for a list and description of the supported file for‐
568 mats and audio device drivers.
569
571 Global Options
572 These options can be specified on the command line at any point before
573 the first effect name.
574
575 The SOX_OPTS environment variable can be used to provide alternative
576 default values for SoX's global options. For example:
577 SOX_OPTS="--buffer 20000 --play-rate-arg -hs --temp /mnt/temp"
578 Note that setting SOX_OPTS can potentially create unwanted changes in
579 the behaviour of scripts or other programs that invoke SoX. SOX_OPTS
580 might best be used for things (such as in the given example) that
581 reflect the environment in which SoX is being run. Enabling options
582 such as --no-clobber as default might be handled better using a shell
583 alias since a shell alias will not affect operation in scripts etc.
584
585 One way to ensure that a script cannot be affected by SOX_OPTS is to
586 clear SOX_OPTS at the start of the script, but this of course loses the
587 benefit of SOX_OPTS carrying some system-wide default options. An
588 alternative approach is to explicitly invoke SoX with default option
589 values, e.g.
590 SOX_OPTS="-V --no-clobber"
591 ...
592 sox -V2 --clobber $input $output ...
593 Note that the way to set environment variables varies from system to
594 system. Here are some examples:
595
596 Unix bash:
597 export SOX_OPTS="-V --no-clobber"
598 Unix csh:
599 setenv SOX_OPTS "-V --no-clobber"
600 MS-DOS/MS-Windows:
601 set SOX_OPTS=-V --no-clobber
602 MS-Windows GUI: via Control Panel : System : Advanced : Environment
603 Variables
604
605 Mac OS X GUI: Refer to Apple's Technical Q&A QA1067 document.
606
607 --buffer BYTES, --input-buffer BYTES
608 Set the size in bytes of the buffers used for processing audio
609 (default 8192). --buffer applies to input, effects, and output
610 processing; --input-buffer applies only to input processing (for
611 which it overrides --buffer if both are given).
612
613 Be aware that large values for --buffer will cause SoX to
614 become slow to respond to requests to terminate or to skip the
615 current input file.
616
617 --clobber
618 Don't prompt before overwriting an existing file with the same
619 name as that given for the output file. This is the default be‐
620 haviour.
621
622 --combine concatenate|merge|mix|mix-power|multiply|sequence
623 Select the input file combining method; for some of these, short
624 options are available: -m selects `mix', -M selects `merge', and
625 -T selects `multiply'.
626
627 See Input File Combining above for a description of the differ‐
628 ent combining methods.
629
630 -D, --no-dither
631 Disable automatic dither - see `Dithering' above. An example of
632 why this might occasionally be useful is if a file has been con‐
633 verted from 16 to 24 bit with the intention of doing some pro‐
634 cessing on it, but in fact no processing is needed after all and
635 the original 16 bit file has been lost, then, strictly speaking,
636 no dither is needed if converting the file back to 16 bit. See
637 also the stats effect for how to determine the actual bit depth
638 of the audio within a file.
639
640 --effects-file FILENAME
641 Use FILENAME to obtain all effects and their arguments. The
642 file is parsed as if the values were specified on the command
643 line. A new line can be used in place of the special : marker
644 to separate effect chains. For convenience, such markers at the
645 end of the file are normally ignored; if you want to specify an
646 empty last effects chain, use an explicit : by itself on the
647 last line of the file. This option causes any effects specified
648 on the command line to be discarded.
649
650 -G, --guard
651 Automatically invoke the gain effect to guard against clipping.
652 E.g.
653 sox -G infile -b 16 outfile rate 44100 dither -s
654 is shorthand for
655 sox infile -b 16 outfile gain -h rate 44100 gain -rh dither -s
656 See also -V, --norm, and the gain effect.
657
658 -h, --help
659 Show version number and usage information.
660
661 --help-effect NAME
662 Show usage information on the specified effect. The name all
663 can be used to show usage on all effects.
664
665 --help-format NAME
666 Show information about the specified file format. The name all
667 can be used to show information on all formats.
668
669 --i, --info
670 Only if given as the first parameter to sox, behave as soxi(1).
671
672 -m|-M Equivalent to --combine mix and --combine merge, respectively.
673
674 --magic
675 If SoX has been built with the optional `libmagic' library then
676 this option can be given to enable its use in helping to detect
677 audio file types.
678
679 --multi-threaded | --single-threaded
680 By default, SoX is `single threaded'. If the --multi-threaded
681 option is given however then SoX will process audio channels for
682 most multi-channel effects in parallel on hyper-threading/multi-
683 core architectures. This may reduce processing time, though
684 sometimes it may be necessary to use this option in conjunction
685 with a larger buffer size than is the default to gain any bene‐
686 fit from multi-threaded processing (e.g. 131072; see --buffer
687 above).
688
689 --no-clobber
690 Prompt before overwriting an existing file with the same name as
691 that given for the output file.
692
693 N.B. Unintentionally overwriting a file is easier than you
694 might think, for example, if you accidentally enter
695 sox file1 file2 effect1 effect2 ...
696 when what you really meant was
697 play file1 file2 effect1 effect2 ...
698 then, without this option, file2 will be overwritten. Hence,
699 using this option is recommended. SOX_OPTS (above), a `shell'
700 alias, script, or batch file may be an appropriate way of perma‐
701 nently enabling it.
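One possible way of enabling the option permanently through SOX_OPTS is sketched below. It assumes a POSIX-style shell and appends to whatever SOX_OPTS already holds rather than replacing it; where the lines are placed (for example, in a shell start-up file) is left to the user.

```shell
# Append --no-clobber to any options already present in SOX_OPTS.
SOX_OPTS="${SOX_OPTS:+$SOX_OPTS }--no-clobber"
export SOX_OPTS

# Show the resulting value.
echo "SOX_OPTS=$SOX_OPTS"
```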
702
703 --norm[=dB-level]
704 Automatically invoke the gain effect to guard against clipping
705 and to normalise the audio. E.g.
706 sox --norm infile -b 16 outfile rate 44100 dither -s
707 is shorthand for
708 sox infile -b 16 outfile gain -h rate 44100 gain -nh dither -s
709 Optionally, the audio can be normalised to a given level (usu‐
710 ally) below 0 dBFS:
711 sox --norm=-3 infile outfile
712
713 See also -V, -G, and the gain effect.
714
715 --play-rate-arg ARG
716 Selects a quality option to be used when the `rate' effect is
717 automatically invoked whilst playing audio. This option is typ‐
718 ically set via the SOX_OPTS environment variable (see above).
719
720 --plot gnuplot|octave|off
721 If not set to off (the default if --plot is not given), run in a
722 mode that can be used, in conjunction with the gnuplot program
723 or the GNU Octave program, to assist with the selection and con‐
724 figuration of many of the transfer-function based effects. For
725 the first given effect that supports the selected plotting pro‐
726 gram, SoX will output commands to plot the effect's transfer
727 function, and then exit without actually processing any audio.
728 E.g.
729 sox --plot octave input-file -n highpass 1320 > highpass.plt
730 octave highpass.plt
731
732 -q, --no-show-progress
733 Run in quiet mode when SoX wouldn't otherwise do so. This is
734 the opposite of the -S option.
735
736 -R Run in `repeatable' mode. When this option is given, where
737 applicable, SoX will embed a fixed time-stamp in the output file
738 (e.g. AIFF) and will `seed' pseudo random number generators
739 (e.g. dither) with a fixed number, thus ensuring that succes‐
740 sive SoX invocations with the same inputs and the same parame‐
741 ters yield the same output.
742
743 --replay-gain track|album|off
744 Select whether or not to apply replay-gain adjustment to input
745 files. The default is off for sox and rec, album for play where
746 (at least) the first two input files are tagged with the same
747 Artist and Album names, and track for play otherwise.
748
749 -S, --show-progress
750 Display input file format/header information, and processing
751 progress as input file(s) percentage complete, elapsed time, and
752 remaining time (if known; shown in brackets), and the number of
753 samples written to the output file. Also shown is a peak-level
754 meter, and an indication if clipping has occurred. The peak-
755 level meter shows up to two channels and is calibrated for digi‐
756 tal audio as follows (right channel shown):
757
758 dB FSD Display dB FSD Display
759 -25 - -11 ====
760 -23 = -9 ====-
761 -21 =- -7 =====
762 -19 == -5 =====-
763 -17 ==- -3 ======
764 -15 === -1 =====!
765 -13 ===-
766
767 A three-second peak-held value of headroom in dBs will be shown
768 to the right of the meter if this is below 6dB.
769
770 This option is enabled by default when using SoX to play or
771 record audio.
772
773 -T Equivalent to --combine multiply.
774
775 --temp DIRECTORY
776 Specify that any temporary files should be created in the given
777 DIRECTORY. This can be useful if there are permission or free-
778 space problems with the default location. In this case, using
779 `--temp .' (to use the current directory) is often a good solu‐
780 tion.
781
782 --version
783 Show SoX's version number and exit.
784
785 -V[level]
786 Set verbosity. This is particularly useful for seeing how any
787 automatic effects have been invoked by SoX.
788
789 SoX displays messages on the console (stderr) according to the
790 following verbosity levels:
791
792 0 No messages are shown at all; use the exit status to
793 determine if an error has occurred.
794
795 1 Only error messages are shown. These are generated if
796 SoX cannot complete the requested commands.
797
798 2 Warning messages are also shown. These are generated if
799 SoX can complete the requested commands, but not exactly
800 according to the requested command parameters, or if
801 clipping occurs.
802
803 3 Descriptions of SoX's processing phases are also shown.
804 Useful for seeing exactly how SoX is processing your
805 audio.
806
807 4 and above
808 Messages to help with debugging SoX are also shown.
809
810 By default, the verbosity level is set to 2 (shows errors and
811 warnings). Each occurrence of the -V option increases the ver‐
812 bosity level by 1. Alternatively, the verbosity level can be
813 set to an absolute number by specifying it immediately after the
814 -V, e.g. -V0 sets it to 0.
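The way repeated -V options combine can be modelled with a small shell function. This is only an illustration of the rule described above, not SoX's own option parser: the function name verbosity_level is hypothetical, and it assumes that the options are processed from left to right.

```shell
# verbosity_level [-V | -Vn]...
# Print the verbosity level that results from the given -V options.
verbosity_level() {
    level=2                                   # default: errors and warnings
    for arg in "$@"; do
        case $arg in
            -V)       level=$((level + 1)) ;; # each bare -V adds 1
            -V[0-9]*) level=${arg#-V} ;;      # -Vn sets an absolute level
        esac
    done
    echo "$level"
}

verbosity_level -V -V    # prints 4
verbosity_level -V0      # prints 0
```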
815
816 Input File Options
817 These options apply only to input files and may precede only input
818 filenames on the command line.
819
820 --ignore-length
821 Override an (incorrect) audio length given in an audio file's
822 header. If this option is given then SoX will keep reading audio
823 until it reaches the end of the input file.
824
825 -v, --volume FACTOR
826 Intended for use when combining multiple input files, this
827 option adjusts the volume of the file that follows it on the
828 command line by a factor of FACTOR. This allows it to be `bal‐
829 anced' w.r.t. the other input files. This is a linear (ampli‐
830 tude) adjustment, so a number less than 1 decreases the volume
831 and a number greater than 1 increases it. If a negative number
832 is given then in addition to the volume adjustment, the audio
833 signal will be inverted.
834
835 See also the norm, vol, and gain effects, and see Input File
836 Balancing above.
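Because FACTOR is a linear amplitude ratio, it can be related to a level change in dB by the usual formula dB = 20 × log10(|FACTOR|). The helper functions below are a hypothetical sketch of that conversion using awk; they are not part of SoX, and FACTOR must be non-zero.

```shell
# factor_to_db FACTOR: level change in dB for a linear amplitude factor.
factor_to_db() {
    LC_ALL=C awk -v f="$1" 'BEGIN {
        if (f < 0) f = -f                 # the sign only inverts the signal
        printf "%.2f\n", 20 * log(f) / log(10)
    }'
}

# db_to_factor DB: linear amplitude factor for a level change in dB.
db_to_factor() {
    LC_ALL=C awk -v d="$1" 'BEGIN { printf "%.3f\n", exp(d / 20 * log(10)) }'
}

factor_to_db 0.5    # prints -6.02
db_to_factor -6     # prints 0.501
```

So, for instance, -v 0.5 lowers the level of the following file by about 6 dB.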
837
838 Input & Output File Format Options
839 These options apply to the input or output file whose name they immedi‐
840 ately precede on the command line and are used mainly when working with
841 headerless file formats or when specifying a format for the output file
842 that is different to that of the input file.
843
844 -b BITS, --bits BITS
845 The number of bits (a.k.a. bit-depth or sometimes word-length)
846 in each encoded sample. Not applicable to complex encodings
847 such as MP3 or GSM. Not necessary with encodings that have a
848 fixed number of bits, e.g. A/μ-law, ADPCM.
849
850 For an input file, the most common use for this option is to
851 inform SoX of the number of bits per sample in a `raw' (`header‐
852 less') audio file. For example
853 sox -r 16k -e signed -b 8 input.raw output.wav
854 converts a particular `raw' file to a self-describing `WAV'
855 file.
856
857 For an output file, this option can be used (perhaps along with
858 -e) to set the output encoding size. By default (i.e. if this
859 option is not given), the output encoding size will (providing
860 it is supported by the output file type) be set to the input
861 encoding size. For example
862 sox input.cdda -b 24 output.wav
863 converts raw CD digital audio (16-bit, signed-integer) to a
864 24-bit (signed-integer) `WAV' file.
865
866 -1/-2/-3/-4/-8
867 The number of bytes in each encoded sample. Deprecated aliases
868 for -b 8, -b 16, -b 24, -b 32, -b 64 respectively.
869
870 -c CHANNELS, --channels CHANNELS
871 The number of audio channels in the audio file. This can be any
872 number greater than zero.
873
874 For an input file, the most common use for this option is to
875 inform SoX of the number of channels in a `raw' (`headerless')
876 audio file. Occasionally, it may be useful to use this option
877 with a `headered' file, in order to override the (presumably
878 incorrect) value in the header - note that this is only sup‐
879 ported with certain file types. Examples:
880 sox -r 48k -e float -b 32 -c 2 input.raw output.wav
881 converts a particular `raw' file to a self-describing `WAV'
882 file.
883 play -c 1 music.wav
884 interprets the file data as belonging to a single channel
885 regardless of what is indicated in the file header. Note that
886 if the file does in fact have two channels, this will result in
887 the file playing at half speed.
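The half-speed effect follows from simple arithmetic on uncompressed PCM data: duration = bytes / (sample rate × bytes per sample × channels). The sketch below uses a hypothetical helper, raw_duration, and made-up figures to show that interpreting the same data as one channel instead of two doubles its apparent duration.

```shell
# raw_duration BYTES RATE BITS CHANNELS
# Print the duration in seconds of uncompressed PCM data.
raw_duration() {
    LC_ALL=C awk -v bytes="$1" -v rate="$2" -v bits="$3" -v ch="$4" \
        'BEGIN { printf "%.3f\n", bytes / (rate * (bits / 8) * ch) }'
}

raw_duration 1536000 48000 32 2    # prints 4.000
raw_duration 1536000 48000 32 1    # prints 8.000
```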
888
889 For an output file, this option provides a shorthand for speci‐
890 fying that the channels effect should be invoked in order to
891 change (if necessary) the number of channels in the audio signal
892 to the number given. For example, the following two commands
893 are equivalent:
894 sox input.wav -c 1 output.wav bass -b 24
895 sox input.wav output.wav bass -b 24 channels 1
896 though the second form is more flexible as it allows the effects
897 to be ordered arbitrarily.
898
899 -e ENCODING, --encoding ENCODING
900 The audio encoding type. Sometimes needed with file-types that
901 support more than one encoding type. For example, with raw, WAV,
902 or AU (but not, for example, with MP3 or FLAC). The available
903 encoding types are as follows:
904
905 signed-integer
906 PCM data stored as signed (`two's complement') integers.
907 Commonly used with a 16- or 24-bit encoding size. A
908 value of 0 represents minimum signal power.
909
910 unsigned-integer
911 PCM data stored as unsigned integers. Commonly used with
912 an 8-bit encoding size. A value of 0 represents maximum
913 signal power.
914
915 floating-point
916 PCM data stored as IEEE 754 single precision (32-bit) or
917 double precision (64-bit) floating-point (`real') num‐
918 bers. A value of 0 represents minimum signal power.
919
920 a-law International telephony standard for logarithmic encoding
921 to 8 bits per sample. It has a precision equivalent to
922 roughly 13-bit PCM and is sometimes encoded with reversed
923 bit-ordering (see the -X option).
924
925 u-law, mu-law
926 North American telephony standard for logarithmic encod‐
927 ing to 8 bits per sample. A.k.a. μ-law. It has a preci‐
928 sion equivalent to roughly 14-bit PCM and is sometimes
929 encoded with reversed bit-ordering (see the -X option).
930
931 oki-adpcm
932 OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM; it has
933 a precision equivalent to roughly 12-bit PCM. ADPCM is a
934 form of audio compression that has a good compromise
935 between audio quality and encoding/decoding speed.
936
937 ima-adpcm
938 IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision equiva‐
939 lent to roughly 13-bit PCM.
940
941 ms-adpcm
942 Microsoft 4-bit ADPCM; it has a precision equivalent to
943 roughly 14-bit PCM.
944
945 gsm-full-rate
946 GSM is currently used for the vast majority of the
947 world's digital wireless telephone calls. It utilises
948 several audio formats with different bit-rates and asso‐
949 ciated speech quality. SoX has support for GSM's origi‐
950 nal 13kbps `Full Rate' audio format. It is usually CPU-
951 intensive to work with GSM audio.
952
953 Encoding names can be abbreviated where this would not be
954 ambiguous; e.g. `unsigned-integer' can be given as `un', but not
955 `u' (ambiguous with `u-law').
956
957 For an input file, the most common use for this option is to
958 inform SoX of the encoding of a `raw' (`headerless') audio file
959 (see the examples in -b and -c above).
960
961 For an output file, this option can be used (perhaps along with
962 -b) to set the output encoding type. For example
963 sox input.cdda -e float output1.wav
964
965 sox input.cdda -b 64 -e float output2.wav
966 convert raw CD digital audio (16-bit, signed-integer) to float‐
967 ing-point `WAV' files (single & double precision respectively).
968
969 By default (i.e. if this option is not given), the output encod‐
970 ing type will (providing it is supported by the output file
971 type) be set to the input encoding type.
972
973 -s/-u/-f/-A/-U/-o/-i/-a/-g
974 Deprecated aliases for specifying the encoding types signed-
975 integer, unsigned-integer, floating-point, a-law, mu-law, oki-
976 adpcm, ima-adpcm, ms-adpcm, gsm-full-rate respectively (see -e
977 above).
978
979 --no-glob
980 Specifies that filename `globbing' (wild-card matching) should
981 not be performed by SoX on the following filename. For example,
982 if the current directory contains the two files `five-sec‐
983 onds.wav' and `five*.wav', then
984 play --no-glob "five*.wav"
985 can be used to play just the single file `five*.wav'.
986
987 -r, --rate RATE[k]
988 Gives the sample rate in Hz (or kHz if appended with `k') of the
989 file.
990
991 For an input file, the most common use for this option is to
992 inform SoX of the sample rate of a `raw' (`headerless') audio
993 file (see the examples in -b and -c above). Occasionally it may
994 be useful to use this option with a `headered' file, in order to
995 override the (presumably incorrect) value in the header - note
996 that this is only supported with certain file types. For exam‐
997 ple, if audio was recorded with a sample-rate of say 48k from a
998 source that played back a little, say 1.5%, too slowly, then
999 sox -r 48720 input.wav output.wav
1000 effectively corrects the speed by changing only the file header
1001 (but see also the speed effect for the more usual solution to
1002 this problem).
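The figure of 48720 in the example above is the nominal rate scaled up by the playback error: 48000 × (1 + 1.5/100). A minimal sketch of that calculation follows; the helper name corrected_rate is hypothetical, and the sox command is only printed, not run.

```shell
# corrected_rate NOMINAL_RATE PERCENT_TOO_SLOW
# Print the sample rate (in Hz) to declare for audio that played too slowly.
corrected_rate() {
    LC_ALL=C awk -v rate="$1" -v pct="$2" \
        'BEGIN { printf "%.0f\n", rate * (1 + pct / 100) }'
}

rate=$(corrected_rate 48000 1.5)
echo sox -r "$rate" input.wav output.wav    # prints: sox -r 48720 input.wav output.wav
```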
1003
1004 For an output file, this option provides a shorthand for speci‐
1005 fying that the rate effect should be invoked in order to change
1006 (if necessary) the sample rate of the audio signal to the given
1007 value. For example, the following two commands are equivalent:
1008 sox input.wav -r 48k output.wav bass -b 24
1009 sox input.wav output.wav bass -b 24 rate 48k
1010 though the second form is more flexible as it allows rate
1011 options to be given, and allows the effects to be ordered arbi‐
1012 trarily.
1013
1014 -t, --type FILE-TYPE
1015 Gives the type of the audio file. For both input and output
1016 files, this option is commonly used to inform SoX of the type of a
1017 `headerless' audio file (e.g. raw, mp3) where the actual/desired
1018 type cannot be determined from a given filename extension. For
1019 example:
1020 another-command | sox -t mp3 - output.wav
1021
1022 sox input.wav -t raw output.bin
1023 It can also be used to override the type implied by an input
1024 filename extension, but if overriding with a type that has a
1025 header, SoX will exit with an appropriate error message if such
1026 a header is not actually present.
1027
1028 See soxformat(7) for a list of supported file types.
1029
1030 -L, --endian little
1031 -B, --endian big
1032 -x, --endian swap
1033 These options specify whether the byte-order of the audio data
1034 is, respectively, `little endian', `big endian', or the opposite
1035 to that of the system on which SoX is being used. Endianness
1036 applies only to data encoded as floating-point, or as signed or
1037 unsigned integers of 16 or more bits. It is often necessary to
1038 specify one of these options for headerless files, and sometimes
1039 necessary for (otherwise) self-describing files. A given
1040 endian-setting option may be ignored for an input file whose
1041 header contains a specific endianness identifier, or for an out‐
1042 put file that is actually an audio device.
1043
1044 N.B. Unlike other format characteristics, the endianness (byte,
1045 nibble, & bit ordering) of the input file is not automatically
1046 used for the output file; so, for example, when the following is
1047 run on a little-endian system:
1048 sox -B audio.s16 trimmed.s16 trim 2
1049 trimmed.s16 will be created as little-endian;
1050 sox -B audio.s16 -B trimmed.s16 trim 2
1051 must be used to preserve big-endianness in the output file.
1052
1053 The -V option can be used to check the selected orderings.
1054
1055 -N, --reverse-nibbles
1056 Specifies that the nibble ordering (i.e. the 2 halves of a byte)
1057 of the samples should be reversed; sometimes useful with ADPCM-
1058 based formats.
1059
1060 N.B. See also N.B. in section on -x above.
1061
1062 -X, --reverse-bits
1063 Specifies that the bit ordering of the samples should be
1064 reversed; sometimes useful with a few (mostly headerless) for‐
1065 mats.
1066
1067 N.B. See also N.B. in section on -x above.
1068
1069 Output File Format Options
1070 These options apply only to the output file and may precede only the
1071 output filename on the command line.
1072
1073 --add-comment TEXT
1074 Append a comment in the output file header (where applicable).
1075
1076 --comment TEXT
1077 Specify the comment text to store in the output file header
1078 (where applicable).
1079
1080 SoX will provide a default comment if this option (or --com‐
1081 ment-file) is not given. To specify that no comment should be
1082 stored in the output file, use --comment "" .
1083
1084 --comment-file FILENAME
1085 Specify a file containing the comment text to store in the out‐
1086 put file header (where applicable).
1087
1088 -C, --compression FACTOR
1089 The compression factor for variably compressing output file for‐
1090 mats. If this option is not given then a default compression
1091 factor will apply. The compression factor is interpreted dif‐
1092 ferently for different compressing file formats. See the
1093 description of the file formats that use this option in soxfor‐
1094 mat(7) for more information.
1095
1097 In addition to converting, playing and recording audio files, SoX can
1098 be used to invoke a number of audio `effects'. Multiple effects may be
1099 applied by specifying them one after another at the end of the SoX com‐
1100 mand line, forming an `effects chain'. Note that applying multiple
1101 effects in real-time (i.e. when playing audio) is likely to require a
1102 high performance computer. Stopping other applications may alleviate
1103 performance issues should they occur.
1104
1105 Some of the SoX effects are primarily intended to be applied to a sin‐
1106 gle instrument or `voice'. To facilitate this, the remix effect and
1107 the global SoX option -M can be used to isolate then recombine tracks
1108 from a multi-track recording.
1109
1110 Multiple Effects Chains
1111 A single effects chain is made up of one or more effects. Audio from
1112 the input runs through the chain until either the end of the input file
1113 is reached or an effect in the chain requests to terminate the chain.
1114
1115 SoX supports running multiple effects chains over the input audio. In
1116 this case, when one chain indicates it is done processing audio, the
1117 audio data is then sent through the next effects chain. This continues
1118 until either no more effects chains exist or the input has reached the
1119 end of the file.
1120
1121 An effects chain is terminated by placing a : (colon) after an effect.
1122 Any following effects are a part of a new effects chain.
1123
1124 It is important to place the effect that will stop the chain as the
1125 first effect in the chain. This is because any samples that are
1126 buffered by effects to the left of the terminating effect will be dis‐
1127 carded. The amount of samples discarded is related to the --buffer
1128 option and it should be kept small, relative to the sample rate, if the
1129 terminating effect cannot be first. Further information on stopping
1130 effects can be found in the Stopping SoX section.
1131
1132 There are a few pseudo-effects that aid using multiple effects chains.
1133 These include newfile which will start writing to a new output file
1134 before moving to the next effects chain and restart which will move
1135 back to the first effects chain. Pseudo-effects must be specified as
1136 the first effect in a chain and as the only effect in a chain (they
1137 must have a : before and after they are specified).
1138
1139 The following is an example of multiple effects chains. It will split
1140 the input file into multiple files of 30 seconds in length. Each out‐
1141 put filename will have a unique number in its name as documented in the
1142 Output Files section.
1143 sox infile.wav output.wav trim 0 30 : newfile : restart
1144
1145 Common Notation And Parameters
1146 In the descriptions that follow, brackets [ ] are used to denote param‐
1147 eters that are optional, braces { } to denote those that are both
1148 optional and repeatable, and angle brackets < > to denote those that
1149 are repeatable but not optional. Where applicable, default values for
1150 optional parameters are shown in parentheses ( ).
1151
1152 The following parameters are used with, and have the same meaning for,
1153 several effects:
1154
1155 center[k]
1156 See frequency.
1157
1158 frequency[k]
1159 A frequency in Hz, or, if appended with `k', kHz.
1160
1161 gain A power gain in dB. Zero gives no gain; less than zero gives an
1162 attenuation.
1163
1164 width[h|k|o|q]
1165 Used to specify the band-width of a filter. A number of differ‐
1166 ent methods to specify the width are available (though not all
1167 for every effect). One of the characters shown may be appended
1168 to select the desired method as follows:
1169
1170 Method Notes
1171 h Hz
1172 k kHz
1173 o Octaves
1174 q Q-factor See [2]
1175
1176 For each effect that uses this parameter, the default method
1177 (i.e. if no character is appended) is the one that is listed
1178 first in the first line of the effect's description.
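The Hz and Q-factor methods are related by the standard definition of Q for a band-pass response: Q = centre frequency / bandwidth. The helpers below are a hypothetical sketch of that relationship only; they do not reproduce whatever conversion SoX performs internally, and the octave method is not covered.

```shell
# q_from_hz CENTER_HZ WIDTH_HZ: Q-factor for a given bandwidth in Hz.
q_from_hz() {
    LC_ALL=C awk -v f="$1" -v bw="$2" 'BEGIN { printf "%.2f\n", f / bw }'
}

# hz_from_q CENTER_HZ Q: bandwidth in Hz for a given Q-factor.
hz_from_q() {
    LC_ALL=C awk -v f="$1" -v q="$2" 'BEGIN { printf "%.1f\n", f / q }'
}

q_from_hz 1000 200    # prints 5.00
hz_from_q 1000 5      # prints 200.0
```

Under this definition, a width of 200h around a 1 kHz centre would be expected to describe roughly the same filter as a width of 5q.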
1179
1180 To see if SoX has support for an optional effect, enter sox -h and look
1181 for its name under the list: `EFFECTS'.
1182
1183 Supported Effects
1184 Note: a categorised list of the effects can be found in the accompany‐
1185 ing `README' file.
1186
1187 allpass frequency[k] width[h|k|o|q]
1188 Apply a two-pole all-pass filter with central frequency (in Hz)
1189 frequency, and filter-width width. An all-pass filter changes
1190 the audio's frequency to phase relationship without changing its
1191 frequency to amplitude relationship. The filter is described in
1192 detail in [1].
1193
1194 This effect supports the --plot global option.
1195
1196 band [-n] center[k] [width[h|k|o|q]]
1197 Apply a band-pass filter. The frequency response drops loga‐
1198 rithmically around the center frequency. The width parameter
1199 gives the slope of the drop. The frequencies at center + width
1200 and center - width will be half of their original amplitudes.
1201 band defaults to a mode oriented to pitched audio, i.e. voice,
1202 singing, or instrumental music. The -n (for noise) option uses
1203 the alternate mode for un-pitched audio (e.g. percussion).
1204 Warning: -n introduces a power-gain of about 11dB in the filter,
1205 so beware of output clipping. band introduces noise in the
1206 shape of the filter, i.e. peaking at the center frequency and
1207 settling around it.
1208
1209 This effect supports the --plot global option.
1210
1211 See also sinc for a bandpass filter with steeper shoulders.
1212
1213 bandpass|bandreject [-c] frequency[k] width[h|k|o|q]
1214 Apply a two-pole Butterworth band-pass or band-reject filter
1215 with central frequency frequency, and (3dB-point) band-width
1216 width. The -c option applies only to bandpass and selects a
1217 constant skirt gain (peak gain = Q) instead of the default: con‐
1218 stant 0dB peak gain. The filters roll off at 6dB per octave
1219 (20dB per decade) and are described in detail in [1].
1220
1221 These effects support the --plot global option.
1222
1223 See also sinc for a bandpass filter with steeper shoulders.
1224
1225 bandreject frequency[k] width[h|k|o|q]
1226 Apply a band-reject filter. See the description of the bandpass
1227 effect for details.
1228
1229 bass|treble gain [frequency[k] [width[s|h|k|o|q]]]
1230 Boost or cut the bass (lower) or treble (upper) frequencies of
1231 the audio using a two-pole shelving filter with a response simi‐
1232 lar to that of a standard hi-fi's tone-controls. This is also
1233 known as shelving equalisation (EQ).
1234
1235 gain gives the gain at 0 Hz (for bass), or whichever is the
1236 lower of ∼22 kHz and the Nyquist frequency (for treble). Its
1237 useful range is about -20 (for a large cut) to +20 (for a large
1238 boost). Beware of Clipping when using a positive gain.
1239
1240 If desired, the filter can be fine-tuned using the following
1241 optional parameters:
1242
1243 frequency sets the filter's central frequency and so can be used
1244 to extend or reduce the frequency range to be boosted or cut.
1245 The default value is 100 Hz (for bass) or 3 kHz (for treble).
1246
1247 width determines how steep the filter's shelf transition is. In
1248 addition to the common width specification methods described
1249 above, `slope' (the default, or if appended with `s') may be
1250 used. The useful range of `slope' is about 0.3, for a gentle
1251 slope, to 1 (the maximum), for a steep slope; the default value
1252 is 0.5.
1253
1254 The filters are described in detail in [1].
1255
1256 These effects support the --plot global option.
1257
1258 See also equalizer for a peaking equalisation effect.
1259
1260 bend [-f frame-rate(25)] [-o over-sample(16)] { delay,cents,duration }
1261 Changes pitch by specified amounts at specified times. Each
1262 given triple: delay,cents,duration specifies one bend. delay is
1263 the amount of time after the start of the audio stream, or the
1264 end of the previous bend, at which to start bending the pitch;
1265 cents is the number of cents (100 cents = 1 semitone) by which
1266 to bend the pitch, and duration the length of time over which
1267 the pitch will be bent.
1268
1269 The pitch-bending algorithm utilises the Discrete Fourier Trans‐
1270 form (DFT) at a particular frame rate and over-sampling rate.
1271 The -f and -o parameters may be used to adjust these parameters
1272 and thus control the smoothness of the changes in pitch.
1273
1274 For example, an initial tone is generated, then bent three
1275 times, yielding four different notes in total:
1276 play -n synth 2.5 sin 667 gain 1 \
1277 bend .35,180,.25 .15,740,.53 0,-520,.3
1278 Note that the clipping that is produced in this example is
1279 deliberate; to remove it, use gain -5 in place of gain 1.
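A value in cents corresponds to a frequency ratio of 2^(cents/1200), so 1200 cents is one octave. The hypothetical helpers below sketch that arithmetic with awk; they only evaluate the formula for a single bend and say nothing about how SoX combines successive bends.

```shell
# cents_to_ratio CENTS: frequency ratio for a pitch change in cents.
cents_to_ratio() {
    LC_ALL=C awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(c / 1200 * log(2)) }'
}

# bent_frequency HZ CENTS: frequency after bending HZ by CENTS.
bent_frequency() {
    LC_ALL=C awk -v f="$1" -v c="$2" \
        'BEGIN { printf "%.1f\n", f * exp(c / 1200 * log(2)) }'
}

cents_to_ratio 100        # prints 1.0595
bent_frequency 667 180    # prints 740.1
```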
1280
1281 See also pitch.
1282
1283 biquad b0 b1 b2 a0 a1 a2
1284 Apply a biquad IIR filter with the given coefficients. Where b*
1285 and a* are the numerator and denominator coefficients respec‐
1286 tively.
1287
1288 See http://en.wikipedia.org/wiki/Digital_biquad_filter (where a0
1289 = 1).
1290
1291 This effect supports the --plot global option.
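For readers unfamiliar with the coefficients, the sketch below applies the standard direct-form difference equation y[n] = (b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]) / a0 to a list of samples using awk. The function name biquad_filter is hypothetical and this is not SoX's implementation; it is intended only to show what the six parameters mean.

```shell
# biquad_filter B0 B1 B2 A0 A1 A2
# Read one sample per line on stdin; write the filtered samples to stdout.
biquad_filter() {
    LC_ALL=C awk -v b0="$1" -v b1="$2" -v b2="$3" \
                 -v a0="$4" -v a1="$5" -v a2="$6" '
    {
        xin = $1 + 0
        # Delayed samples (xd1, xd2, yd1, yd2) start out as zero.
        yout = (b0 * xin + b1 * xd1 + b2 * xd2 - a1 * yd1 - a2 * yd2) / a0
        printf "%g\n", yout
        xd2 = xd1; xd1 = xin
        yd2 = yd1; yd1 = yout
    }'
}

# Impulse response of a two-point averaging filter: prints 0.5, 0.5, 0, 0.
printf '1\n0\n0\n0\n' | biquad_filter 0.5 0.5 0 1 0 0
```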
1292
1293 channels CHANNELS
1294 Invoke a simple algorithm to change the number of channels in
1295 the audio signal to the given number CHANNELS: mixing if
1296 decreasing the number of channels or duplicating if increasing
1297 the number of channels.
1298
1299 The channels effect is invoked automatically if SoX's -c option
1300 specifies a number of channels that is different to that of the
1301 input file(s). Alternatively, if this effect is given explic‐
1302 itly, then SoX's -c option need not be given. For example, the
1303 following two commands are equivalent:
1304 sox input.wav -c 1 output.wav bass -b 24
1305 sox input.wav output.wav bass -b 24 channels 1
1306 though the second form is more flexible as it allows the effects
1307 to be ordered arbitrarily.
1308
1309 See also remix for an effect that allows channels to be
1310 mixed/selected arbitrarily.
1311
1312 chorus gain-in gain-out <delay decay speed depth -s|-t>
1313 Add a chorus effect to the audio. This can make a single vocal
1314 sound like a chorus, but can also be applied to instrumentation.
1315
1316 Chorus resembles an echo effect with a short delay, but whereas
1317 with echo the delay is constant, with chorus, it is varied using
1318 sinusoidal or triangular modulation. The modulation depth
1319 defines how far the modulated delay swings either side of the
1320 nominal delay. Hence the delayed sound is heard slower or faster;
1321 that is, it is detuned around the original sound, as in
1322 a chorus where some vocals are slightly off key. See [3] for
1323 more discussion of the chorus effect.
1324
1325 Each four-tuple parameter delay/decay/speed/depth gives the
1326 delay in milliseconds and the decay (relative to gain-in) with a
1327 modulation speed in Hz using depth in milliseconds. The modula‐
1328 tion is either sinusoidal (-s) or triangular (-t). Gain-out is
1329 the volume of the output.
1330
1331 A typical delay is around 40ms to 60ms; the modulation speed is
1332 best near 0.25Hz and the modulation depth around 2ms. For exam‐
1333 ple, a single delay:
1334 play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 -t
1335 Two delays of the original samples:
1336 play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 -t \
1337 60 0.32 0.4 1.3 -s
1338 A fuller sounding chorus (with three additional delays):
1339 play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 -t \
1340 60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s
1341
1342 compand attack1,decay1{,attack2,decay2}
1343 [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
1344 [gain [initial-volume-dB [delay]]]
1345
1346 Compand (compress or expand) the dynamic range of the audio.
1347
1348 The attack and decay parameters (in seconds) determine the time
1349 over which the instantaneous level of the input signal is aver‐
1350 aged to determine its volume; attacks refer to increases in vol‐
1351 ume and decays refer to decreases. For most situations, the
1352 attack time (response to the music getting louder) should be
1353 shorter than the decay time because the human ear is more sensi‐
1354 tive to sudden loud music than sudden soft music. Where more
1355 than one pair of attack/decay parameters are specified, each
1356 input channel is companded separately and the number of pairs
1357 must agree with the number of input channels. Typical values
1358 are 0.3,0.8 seconds.
1359
1360 The second parameter is a list of points on the compander's
1361 transfer function specified in dB relative to the maximum possi‐
1362 ble signal amplitude. The input values must be in a strictly
1363 increasing order but the transfer function does not have to be
1364 monotonically rising. If omitted, the value of out-dB1 defaults
1365 to the same value as in-dB1; levels below in-dB1 are not com‐
1366 panded (but may have gain applied to them). The point 0,0 is
1367 assumed but may be overridden (by 0,out-dBn). If the list is
1368 preceded by a soft-knee-dB value, then the points where adja‐
1369 cent line segments on the transfer function meet will be rounded
1370 by the amount given. Typical values for the transfer function
1371 are 6:-70,-60,-20.
1372
1373 The third (optional) parameter is an additional gain in dB to be
1374 applied at all points on the transfer function and allows easy
1375 adjustment of the overall gain.
1376
1377 The fourth (optional) parameter is an initial level to be
1378 assumed for each channel when companding starts. This permits
1379 the user to supply a nominal level initially, so that, for exam‐
1380 ple, a very large gain is not applied to initial signal levels
1381 before the companding action has begun to operate: it is quite
1382 probable that in such an event, the output would be severely
1383 clipped while the compander gain properly adjusts itself. A
1384 typical value (for audio which is initially quiet) is -90 dB.
1385
1386 The fifth (optional) parameter is a delay in seconds. The input
1387 signal is analysed immediately to control the compander, but it
1388 is delayed before being fed to the volume adjuster. Specifying
1389 a delay approximately equal to the attack/decay times allows the
1390 compander to effectively operate in a `predictive' rather than a
1391 reactive mode. A typical value is 0.2 seconds.
1392
1393 * * *
1394
1395 The following example might be used to make a piece of music
1396 with both quiet and loud passages suitable for listening to in a
1397 noisy environment such as a moving vehicle:
1398 sox asz.wav asz-car.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2
1399 The transfer function (`6:-70,...') says that very soft sounds
1400 (below -70dB) will remain unchanged. This will stop the compan‐
1401 der from boosting the volume on `silent' passages such as
1402 between movements. However, sounds in the range -60dB to 0dB
1403 (maximum volume) will be boosted so that the 60dB dynamic range
1404 of the original music will be compressed 3-to-1 into a 20dB
1405 range, which is wide enough to enjoy the music but narrow enough
1406 to get around the road noise. The `6:' selects 6dB soft-knee
1407 companding. The -5 (dB) output gain is needed to avoid clipping
1408 (the number is inexact, and was derived by experimentation).
1409 The -90 (dB) for the initial volume will work fine for a clip
1410 that starts with near silence, and the delay of 0.2 (seconds)
1411 has the effect of causing the compander to react a bit more
1412 quickly to sudden volume changes.
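
The 3-to-1 mapping described above can be checked numerically. The following is only a sketch: it assumes the transfer function is piecewise linear in dB between the listed points, and it ignores the soft-knee rounding and the -5 dB gain. The helper name compand_out is hypothetical and not part of SoX.

```shell
# Hypothetical helper (not part of SoX): evaluate the example transfer
# function 6:-70,-60,-20 as the points (-70,-70) (-60,-20) (0,0).
# Assumptions: linear interpolation in dB; soft knee and -5 dB gain ignored.
compand_out() {
    awk -v x="$1" 'BEGIN {
        x += 0                      # force numeric comparison
        xs[1] = -70; ys[1] = -70    # out-dB1 defaults to in-dB1
        xs[2] = -60; ys[2] = -20
        xs[3] = 0;   ys[3] = 0      # the implicit 0,0 point
        n = 3
        if (x <= xs[1]) { printf "%.1f\n", x; exit }   # below in-dB1: unchanged
        for (i = 1; i < n; i++)
            if (x <= xs[i + 1]) {
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                printf "%.1f\n", ys[i] + t * (ys[i + 1] - ys[i])
                exit
            }
        printf "%.1f\n", ys[n]      # above the last point: clamp
    }'
}

compand_out -80   # very soft: left alone
compand_out -60   # bottom of the boosted range
compand_out -30   # halfway through the 3-to-1 region
```

Under these assumptions an input at -60 dB comes out at -20 dB, so the 60 dB input range from -60 dB to 0 dB occupies 20 dB at the output.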
1413
1414 In the next example, compand is being used as a noise-gate for
1415 when the noise is at a lower level than the signal:
1416 play infile compand .1,.2 -inf,-50.1,-inf,-50,-50 0 -90 .1
1417 Here is another noise-gate, this time for when the noise is at a
1418 higher level than the signal (making it, in some ways, similar
1419 to squelch):
1420 play infile compand .1,.1 -45.1,-45,-inf,0,-inf 45 -90 .1
1421 This effect supports the --plot global option (for the transfer
1422 function).
1423
1424 See also mcompand for a multiple-band companding effect.
1425
1426 contrast [enhancement-amount(75)]
1427 Comparable with compression, this effect modifies an audio sig‐
1428 nal to make it sound louder. enhancement-amount controls the
1429 amount of the enhancement and is a number in the range 0-100.
1430 Note that enhancement-amount = 0 still gives a significant con‐
1431 trast enhancement.
1432
1433 See also the compand and mcompand effects.
1434
1435 dcshift shift [limitergain]
1436 Apply a DC shift to the audio. This can be useful to remove a
1437 DC offset (caused perhaps by a hardware problem in the recording
1438 chain) from the audio. The effect of a DC offset is reduced
1439 headroom and hence volume. The stat or stats effect can be used
1440 to determine if a signal has a DC offset.
1441
1442 The given dcshift value is a floating point number in the range
1443 of ±2 that indicates the amount to shift the audio (which is in
1444 the range of ±1).
1445
1446 An optional limitergain can be specified as well. It should
1447 have a value much less than 1 (e.g. 0.05 or 0.02) and is used
1448 only on peaks to prevent clipping.
1449
1450 * * *
1451
1452 An alternative approach to removing a DC offset (albeit with a
1453 short delay) is to use the highpass filter effect at a frequency
1454 of say 10Hz, as illustrated in the following example:
1455 sox -n dc.wav synth 5 sin %0 50
1456 sox dc.wav fixed.wav highpass 10
1457
1458 deemph Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation
1459 shelving filter).
1460
1461 Pre-emphasis was applied in the mastering of some CDs issued in
1462 the early 1980s. These included many classical music albums, as
1463 well as now sought-after issues of albums by The Beatles, Pink
1464 Floyd and others. Pre-emphasis should be removed at playback
1465 time by a de-emphasis filter in the playback device. However,
1466 not all modern CD players have this filter, and very few PC CD
1467 drives have it; playing pre-emphasised audio without the correct
1468 de-emphasis filter results in audio that sounds harsh and is far
1469 from what its creators intended.
1470
1471 With the deemph effect, it is possible to apply the necessary
1472 de-emphasis to audio that has been extracted from a pre-empha‐
1473 sised CD, and then either burn the de-emphasised audio to a new
1474 CD (which will then play correctly on any CD player), or simply
1475 play the correctly de-emphasised audio files on the PC. For
1476 example:
1477 sox track1.wav track1-deemph.wav deemph
1478 and then burn track1-deemph.wav to CD, or
1479 play track1-deemph.wav
1480 or simply
1481 play track1.wav deemph
1482 The de-emphasis filter is implemented as a biquad; its maximum
1483 deviation from the ideal response is only 0.06dB (up to 20kHz).
1484
1485 This effect supports the --plot global option.
1486
1487 See also the bass and treble shelving equalisation effects.
1488
1489 delay {length}
1490 Delay one or more audio channels. length can specify a time or,
1491 if appended with an `s', a number of samples. Do not specify
1492 both time and samples delays in the same command. For example,
1493 delay 1.5 0 0.5 delays the first channel by 1.5 seconds, the
1494 third channel by 0.5 seconds, and leaves the second channel (and
1495 any other channels that may be present) un-delayed. The follow‐
1496 ing (one long) command plays a chime sound:
1497 play -n synth -j 3 sin %3 sin %-2 sin %-5 sin %-9 \
1498 sin %-14 sin %-21 fade h .01 2 1.5 delay \
1499 1.3 1 .76 .54 .27 remix - fade h 0 2.7 2.5 norm -1
1500 and this plays a guitar chord:
1501 play -n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \
1502 delay 0 .05 .1 .15 .2 .25 remix - fade 0 4 .1 norm -1
1503
1504 dither [-S|-s|-f filter] [-a] [-p precision]
1505 Apply dithering to the audio. Dithering deliberately adds a
1506 small amount of noise to the signal in order to mask audible
1507 quantization effects that can occur if the output sample size is
1508 less than 24 bits. With no options, this effect will add trian‐
1509 gular (TPDF) white noise. Noise-shaping (only for certain sam‐
1510 ple rates) can be selected with -s. With the -f option, it is
1511 possible to select a particular noise-shaping filter from the
1512 following list: lipshitz, f-weighted, modified-e-weighted,
1513 improved-e-weighted, gesemann, shibata, low-shibata, high-shi‐
1514 bata. Note that most filter types are available only with
1515 44100Hz sample rate. The filter types are distinguished by the
1516 following properties: audibility of noise, level of (inaudible,
1517 but in some circumstances, otherwise problematic) shaped high
1518 frequency noise, and processing speed.
1519 See http://sox.sourceforge.net/SoX/NoiseShaping for graphs of
1520 the different noise-shaping curves.
1521
1522 The -S option selects a slightly `sloped' TPDF, biased towards
1523 higher frequencies. It can be used at any sampling rate but
1524 below ≈22k, plain TPDF is probably better, and above ≈37k,
1525 noise-shaped is probably better.
1526
1527 The -a option enables a mode where dithering (and noise-shaping
1528 if applicable) are automatically enabled only when needed. The
1529 most likely use for this is when applying fade in or out to an
1530 already dithered file, so that the redithering applies only to
1531 the faded portions. However, auto dithering is not fool-proof,
1532 so the fades should be carefully checked for any noise modula‐
1533 tion; if this occurs, then either re-dither the whole file, or
1534 use trim, fade, and concatenate.
1535
1536 The -p option allows overriding the target precision.
1537
1538 If the SoX global option -R is not given, then the
1539 pseudo-random number generator used to generate the white noise
1540 will be `reseeded', i.e. the generated noise will be different
1541 between invocations.
1542
1543 This effect should not be followed by any other effect that
1544 affects the audio.
1545
1546 See also the `Dithering' section above.
1547
1548 downsample [factor(2)]
1549 Downsample the signal by an integer factor: Only the first out
1550 of each factor samples is retained, the others are discarded.
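
The selection rule can be illustrated without any audio. In this sketch the integers produced by seq merely stand in for samples, and the helper name decimate is hypothetical, not part of SoX.

```shell
# Hypothetical helper: keep the first of every `factor` input lines and
# discard the rest, mirroring the rule described for downsample.
decimate() {
    awk -v f="$1" '(NR - 1) % f == 0'
}

# Ten stand-in "samples", downsampled by a factor of 3.
seq 1 10 | decimate 3
```

No filtering of any kind takes place here, which mirrors the absence of a decimation filter in the effect itself.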
1551
1552 No decimation filter is applied. If the input is not a properly
1553 bandlimited baseband signal, aliasing will occur. This may be
1554 desirable, e.g., for frequency translation.
1555
1556 For a general resampling effect with anti-aliasing, see rate.
1557 See also upsample.
1558
1559 earwax Makes audio easier to listen to on headphones. Adds `cues' to
1560 44.1kHz stereo (i.e. audio CD format) audio so that when lis‐
1561 tened to on headphones the stereo image is moved from inside
1562 your head (standard for headphones) to outside and in front of
1563 the listener (standard for speakers).
1564
1565 echo gain-in gain-out <delay decay>
1566 Add echoing to the audio. Echoes are reflected sound and can
1567 occur naturally amongst mountains (and sometimes large build‐
1568 ings) when talking or shouting; digital echo effects emulate
1569 this behaviour and are often used to help fill out the sound of
1570 a single instrument or vocal. The time difference between the
1571 original signal and the reflection is the `delay' (time), and
1572 the loudness of the reflected signal is the `decay'. Multiple
1573 echoes can have different delays and decays.
1574
1575 Each given delay decay pair gives the delay in milliseconds and
1576 the decay (relative to gain-in) of that echo. Gain-out is the
1577 volume of the output. For example, this will make it sound as
1578 if there are twice as many instruments as are actually playing:
1579 play lead.aiff echo 0.8 0.88 60 0.4
1580 If the delay is very short, then it sounds like a (metallic) robot
1581 playing music:
1582 play lead.aiff echo 0.8 0.88 6 0.4
1583 A longer delay will sound like an open air concert in the moun‐
1584 tains:
1585 play lead.aiff echo 0.8 0.9 1000 0.3
1586 One mountain more, and:
1587 play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25
1588
1589 echos gain-in gain-out <delay decay>
1590 Add a sequence of echoes to the audio. Each delay decay pair
1591 gives the delay in milliseconds and the decay (relative to gain-
1592 in) of that echo. Gain-out is the volume of the output.
1593
1594 Like the echo effect, echos stand for `ECHO in Sequel', that is
1595 the first echos takes the input, the second the input and the
1596 first echos, the third the input and the first and the second
1597 echos, ... and so on. Care should be taken using many echos; a
1598 single echos has the same effect as a single echo.
1599
1600 The sample will be bounced twice in symmetric echos:
1601 play lead.aiff echos 0.8 0.7 700 0.25 700 0.3
1602 The sample will be bounced twice in asymmetric echos:
1603 play lead.aiff echos 0.8 0.7 700 0.25 900 0.3
1604 The sample will sound as if played in a garage:
1605 play lead.aiff echos 0.8 0.7 40 0.25 63 0.3
1606
1607 equalizer frequency[k] width[q|o|h|k] gain
1608 Apply a two-pole peaking equalisation (EQ) filter. With this
1609 filter, the signal-level at and around a selected frequency can
1610 be increased or decreased, whilst (unlike band-pass and band-
1611 reject filters) that at all other frequencies is unchanged.
1612
1613 frequency gives the filter's central frequency in Hz, width, the
1614 band-width, and gain the required gain or attenuation in dB.
1615 Beware of Clipping when using a positive gain.
1616
1617 In order to produce complex equalisation curves, this effect can
1618 be given several times, each with a different central frequency.
1619
1620 The filter is described in detail in [1].
1621
1622 This effect supports the --plot global option.
1623
1624 See also bass and treble for shelving equalisation effects.
1625
1626 fade [type] fade-in-length [stop-time [fade-out-length]]
1627 Apply a fade effect to the beginning, end, or both of the audio.
1628
1629 An optional type can be specified to select the shape of the
1630 fade curve: q for quarter of a sine wave, h for half a sine
1631 wave, t for linear (`triangular') slope, l for logarithmic, and
1632 p for inverted parabola. The default is logarithmic.
1633
1634 A fade-in starts from the first sample and ramps the signal
1635 level from 0 to full volume over fade-in-length seconds. Spec‐
1636 ify 0 seconds if no fade-in is wanted.
1637
1638 For fade-outs, the audio will be truncated at stop-time and the
1639 signal level will be ramped from full volume down to 0 starting
1640 at fade-out-length seconds before the stop-time. If fade-out-
1641 length is not specified, it defaults to the same value as fade-
1642 in-length. No fade-out is performed if stop-time is not speci‐
1643 fied. If the file length can be determined from the input file
1644 header and length-changing effects are not in effect, then 0 may
1645 be specified for stop-time to indicate the usual case of a fade-
1646 out that ends at the end of the input audio stream.
1647
1648 All times can be specified in either periods of time or sample
1649 counts. To specify time periods, use the hh:mm:ss.frac format.
1650 To specify sample counts, give the number of
1651 samples and append the letter `s' to the sample count (for exam‐
1652 ple `8000s').
1653
1654 See also the splice effect.
1655
1656 fir [coefs-file|coefs]
1657 Use SoX's FFT convolution engine with given FIR filter coeffi‐
1658 cients. If a single argument is given then this is treated as
1659 the name of a file containing the filter coefficients (white-
1660 space separated; may contain `#' comments). If the given file‐
1661 name is `-', or if no argument is given, then the coefficients
1662 are read from the `standard input' (stdin); otherwise, coeffi‐
1663 cients may be given on the command line. Examples:
1664 sox infile outfile fir 0.0195 -0.082 0.234 0.891 -0.145 0.043
1665 sox infile outfile fir coefs.txt
1666 with coefs.txt containing
1667 # HP filter
1668 # freq=10000
1669 1.2311233052619888e-01
1670 -4.4777096106211783e-01
1671 5.1031563346705155e-01
1672 -6.6502926320995331e-02
1673 ...
1674
1675 This effect supports the --plot global option.
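
A coefficient file in the layout described above can be generated with standard tools. This sketch writes an N-tap moving-average filter; that filter is chosen purely for illustration and is not taken from the SoX documentation, and the tap count is arbitrary.

```shell
# Write an N-tap moving-average (box) filter to coefs.txt, one
# coefficient per line, with `#' comment lines at the top.
# Assumption: the filter type and tap count are illustrative only.
taps=5
{
    echo "# moving-average low-pass"
    echo "# taps=$taps"
    awk -v n="$taps" 'BEGIN { for (i = 0; i < n; i++) printf "%.16e\n", 1 / n }'
} > coefs.txt

cat coefs.txt
# The file could then be applied with:
#   sox infile outfile fir coefs.txt
```

The coefficients sum to 1, so such a filter leaves the DC level of the signal unchanged.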
1676
1677 flanger [delay depth regen width speed shape phase interp]
1678 Apply a flanging effect to the audio. See [3] for a detailed
1679 description of flanging.
1680
1681 All parameters are optional (right to left).
1682
1683            Range      Default   Description
1684   delay    0 - 30     0         Base delay in milliseconds.
1685   depth    0 - 10     2         Added swept delay in milliseconds.
1686   regen    -95 - 95   0         Percentage regeneration (delayed signal feedback).
1688   width    0 - 100    71        Percentage of delayed signal mixed with original.
1690   speed    0.1 - 10   0.5       Sweeps per second (Hz).
1691   shape               sin       Swept wave shape: sine|triangle.
1692   phase    0 - 100    25        Swept wave percentage phase-shift for multi-channel (e.g. stereo) flange; 0 = 100 = same phase on each channel.
1696   interp              lin       Digital delay-line interpolation: linear|quadratic.
1698
1699 gain [-e|-B|-b|-r] [-n] [-l|-h] [gain-dB]
1700 Apply amplification or attenuation to the audio signal, or, in
1701 some cases, to some of its channels. Note that use of any of
1702 -e, -B, -b, -r, or -n requires temporary file space to store the
1703 audio to be processed, so may be unsuitable for use with
1704 `streamed' audio.
1705
1706 Without other options, gain-dB is used to adjust the signal
1707 power level by the given number of dB: positive amplifies
1708 (beware of Clipping), negative attenuates. With other options,
1709 the gain-dB amplification or attenuation is (logically) applied
1710 after the processing due to those options.
1711
1712 Given the -e option, the levels of the audio channels of a
1713 multi-channel file are `equalised', i.e. gain is applied to all
1714 channels other than that with the highest peak level, such that
1715 all channels attain the same peak level (but, without also giv‐
1716 ing -n, the audio is not `normalised').
1717
1718 The -B (balance) option is similar to -e, but with -B, the RMS
1719 level is used instead of the peak level. -B might be used to
1720 correct stereo imbalance caused by an imperfect record turntable
1721 cartridge. Note that unlike -e, -B might cause some clipping.
1722
1723 -b is similar to -B but has clipping protection, i.e. if neces‐
1724 sary to prevent clipping whilst balancing, attenuation is
1725 applied to all channels. Note, however, that in conjunction
1726 with -n, -B and -b are synonymous.
1727
1728 The -r option is used in conjunction with a prior invocation of
1729 gain with the -h option - see below for details.
1730
1731 The -n option normalises the audio to 0dB FSD; it is often used
1732 in conjunction with a negative gain-dB to the effect that the
1733 audio is normalised to a given level below 0dB. For example,
1734 sox infile outfile gain -n
1735 normalises to 0dB, and
1736 sox infile outfile gain -n -3
1737 normalises to -3dB.
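
For orientation, a level in dB can be converted to a linear amplitude factor. The sketch below uses the standard relation factor = 10^(dB/20); the helper name db_to_amplitude is hypothetical and not part of SoX.

```shell
# Hypothetical helper: convert a gain in dB to a linear amplitude factor,
# using the standard relation factor = 10^(dB/20).
db_to_amplitude() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(log(10) * db / 20) }'
}

db_to_amplitude 0     # unity gain
db_to_amplitude -20   # one tenth of the amplitude
db_to_amplitude 20    # ten times the amplitude
```

By the same relation, the -3 in gain -n -3 corresponds to a peak of roughly 0.71 of full scale.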
1738
1739 The -l option invokes a simple limiter, e.g.
1740 sox infile outfile gain -l 6
1741 will apply 6dB of gain but never clip. Note that limiting more
1742 than a few dBs more than occasionally (in a piece of audio) is
1743 not recommended as it can cause audible distortion. See the
1744 compand effect for a more capable limiter.
1745
1746 The -h option is used to apply gain to provide head-room for
1747 subsequent processing. For example, with
1748 sox infile outfile gain -h bass +6
1749 6dB of attenuation will be applied prior to the bass boosting
1750 effect thus ensuring that it will not clip. Of course, with
1751 bass, it is obvious how much headroom will be needed, but with
1752 other effects (e.g. rate, dither) it is not always as clear.
1753 Another advantage of using gain -h rather than an explicit
1754 attenuation, is that if the headroom is not used by subsequent
1755 effects, it can be reclaimed with gain -r, for example:
1756 sox infile outfile gain -h bass +6 rate 44100 gain -r
1757 The above effects chain guarantees never to clip nor amplify; it
1758 attenuates if necessary to prevent clipping, but by only as much
1759 as is needed to do so.
1760
1761 Output formatting (dithering and bit-depth reduction) also
1762 requires headroom (which cannot be `reclaimed'), e.g.
1763 sox infile outfile gain -h bass +6 rate 44100 gain -rh dither
1764 Here, the second gain invocation reclaims as much of the headroom
1765 as it can from the preceding effects, but retains as much
1766 headroom as is needed for subsequent processing. The SoX global
1767 option -G can be given to automatically invoke gain -h and gain
1768 -r.
1769
1770 See also the norm and vol effects.
1771
1772 highpass|lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
1773 Apply a high-pass or low-pass filter with 3dB point frequency.
1774 The filter can be either single-pole (with -1), or double-pole
1775 (the default, or with -2). width applies only to double-pole
1776 filters; the default is Q = 0.707 and gives a Butterworth
1777 response. The filters roll off at 6dB per pole per octave (20dB
1778 per pole per decade). The double-pole filters are described in
1779 detail in [1].
1780
1781 These effects support the --plot global option.
1782
1783 See also sinc for filters with a steeper roll-off.
1784
1785 hilbert [-n taps]
1786 Apply an odd-tap Hilbert transform filter, phase-shifting the
1787 signal by 90 degrees.
1788
1789 This is used in many matrix coding schemes and for analytic sig‐
1790 nal generation. The process is often written as a multiplica‐
1791 tion by i (or j), the imaginary unit.
1792
1793 An odd-tap Hilbert transform filter has a bandpass characteris‐
1794 tic, attenuating the lowest and highest frequencies. Its band‐
1795 width can be controlled by the number of filter taps, which can
1796 be specified with -n. By default, the number of taps is chosen
1797 for a cutoff frequency of about 75 Hz.
1798
1799 This effect supports the --plot global option.
1800
1801 ladspa module [plugin] [argument...]
1802 Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API)
1803 plugin. Despite the name, LADSPA is not Linux-specific, and a
1804 wide range of effects is available as LADSPA plugins, such as
1805 cmt [6] (the Computer Music Toolkit) and Steve Harris's plugin
1806 collection [7]. The first argument is the plugin module, the
1807 second the name of the plugin (a module can contain more than
1808 one plugin) and any other arguments are for the control ports of
1809 the plugin. Missing arguments are supplied by default values if
1810 possible. Only plugins with at most one audio input and one
1811 audio output port can be used. If found, the environment vari‐
1812 able LADSPA_PATH will be used as search path for plugins.
1813
1814 loudness [gain [reference]]
1815 Loudness control - similar to the gain effect, but provides
1816 equalisation for the human auditory system. See
1817 http://en.wikipedia.org/wiki/Loudness for a detailed description
1818 of loudness. The gain is adjusted by the given gain parameter
1819 (usually negative) and the signal equalised according to ISO 226
1820 w.r.t. a reference level of 65dB, though an alternative refer‐
1821 ence level may be given if the original audio has been equalised
1822 for some other optimal level. A default gain of -10dB is used
1823 if a gain value is not given.
1824
1825 See also the gain effect.
1826
1827 lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
1828 Apply a low-pass filter. See the description of the highpass
1829 effect for details.
1830
1831 mcompand "attack1,decay1{,attack2,decay2}
1832 [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
1833 [gain [initial-volume-dB [delay]]]" {crossover-freq[k]
1834 "attack1,..."}
1835
1836 The multi-band compander is similar to the single-band compander
1837 but the audio is first divided into bands using Linkwitz-Riley
1838 cross-over filters and a separately specifiable compander run on
1839 each band. See the compand effect for the definition of its
1840 parameters. Compand parameters are specified between double
1841 quotes and the crossover frequency for that band is given by
1842 crossover-freq; these can be repeated to create multiple bands.
1843
1844 For example, the following (one long) command shows how multi-
1845 band companding is typically used in FM radio:
1846 play track1.wav gain -3 sinc 8000- 29 100 mcompand \
1847 "0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
1848 "0.003,0.05 -47,-40,-34,-34,-17,-33" 400 \
1849 "0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
1850 "0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
1851 "0,0.025 -38,-31,-28,-28,-0,-25" \
1852 gain 15 highpass 22 highpass 22 sinc -n 255 -b 16 -17500 \
1853 gain 9 lowpass -1 17801
1854 The audio file is played with a simulated FM radio sound (or
1855 broadcast signal condition if the lowpass filter at the end is
1856 skipped). Note that the pipeline is set up with US-style 75us
1857 pre-emphasis.
1858
1859 See also compand for a single-band companding effect.
1860
1861 noiseprof [profile-file]
1862 Calculate a profile of the audio for use in noise reduction.
1863 See the description of the noisered effect for details.
1864
1865 noisered [profile-file [amount]]
1866 Reduce noise in the audio signal by profiling and filtering.
1867 This effect is moderately effective at removing consistent back‐
1868 ground noise such as hiss or hum. To use it, first run SoX with
1869 the noiseprof effect on a section of audio that ideally would
1870 contain silence but in fact contains noise - such sections are
1871 typically found at the beginning or the end of a recording.
1872 noiseprof will write out a noise profile to profile-file, or to
1873 stdout if no profile-file or if `-' is given. E.g.
1874 sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile
1875 To actually remove the noise, run SoX again, this time with the
1876 noisered effect; noisered will reduce noise according to a noise
1877 profile (which was generated by noiseprof), from profile-file,
1878 or from stdin if no profile-file or if `-' is given. E.g.
1879 sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
1880 How much noise should be removed is specified by amount, a number
1881 between 0 and 1 with a default of 0.5. Higher numbers will
1882 remove more noise but present a greater likelihood of removing
1883 wanted components of the audio signal. Before replacing an
1884 original recording with a noise-reduced version, experiment with
1885 different amount values to find the optimal one for your audio;
1886 use headphones to check that you are happy with the results,
1887 paying particular attention to quieter sections of the audio.
1888
1889 On most systems, the two stages - profiling and reduction - can
1890 be combined using a pipe, e.g.
1891 sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered
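
Where a pipe is inconvenient, the two stages can be wrapped in a small shell function that keeps the profile in a temporary file. This is only a sketch: the function name denoise and its argument order are hypothetical, and it assumes that the first few seconds of the input contain nothing but background noise.

```shell
# Hypothetical wrapper (not part of SoX) around the two stages above.
# Usage: denoise IN OUT [NOISE_SECONDS] [AMOUNT]
# Assumption: the first NOISE_SECONDS of IN hold only background noise.
# Note: POSIX sh has no local variables, so these names are global.
denoise() {
    in=$1 out=$2 noise_len=${3:-1.5} amount=${4:-0.5}
    profile=$(mktemp) || return 1
    # Stage 1 profiles the noise-only lead-in (-n: no audio output);
    # stage 2 filters the whole file using that profile.
    sox "$in" -n trim 0 "$noise_len" noiseprof "$profile" &&
        sox "$in" "$out" noisered "$profile" "$amount"
    status=$?
    rm -f "$profile"
    return "$status"
}
```

For example, denoise speech.wav cleaned.wav 1.5 0.3 would run the same two sox commands as shown earlier, with a throw-away profile file.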
1892
1893 norm [dB-level]
1894 Normalise the audio. norm is just an alias for gain -n; see the
1895 gain effect for details.
1896
1897 oops Out Of Phase Stereo effect. Mixes stereo to twin-mono where
1898 each mono channel contains the difference between the left and
1899 right stereo channels. This is sometimes known as the `karaoke'
1900 effect as it often has the effect of removing most or all of the
1901 vocals from a recording. It is equivalent to remix 1,2i 1,2i.
1902
1903 overdrive [gain(20) [colour(20)]]
1904 Non linear distortion. The colour parameter controls the amount
1905 of even harmonic content in the over-driven output.
1906
1907 pad { length[@position] }
1908 Pad the audio with silence, at the beginning, the end, or any
1909 specified points through the audio. Both length and position
1910 can specify a time or, if appended with an `s', a number of sam‐
1911 ples. length is the amount of silence to insert and position
1912 the position in the input audio stream at which to insert it.
1913 Any number of lengths and positions may be specified, provided
1914 that a specified position is not less than the previous one.
1915 position is optional for the first and last lengths specified
1916 and if omitted correspond to the beginning and the end of the
1917 audio respectively. For example, pad 1.5 1.5 adds 1.5 seconds
1918 of silence padding at each end of the audio, whilst pad
1919 4000s@3:00 inserts 4000 samples of silence 3 minutes into the
1920 audio. If silence is wanted only at the end of the audio, spec‐
1921 ify either the end position or specify a zero-length pad at the
1922 start.
1923
1924 See also delay for an effect that can add silence at the begin‐
1925 ning of the audio on a channel-by-channel basis.
1926
1927 phaser gain-in gain-out delay decay speed [-s|-t]
1928 Add a phasing effect to the audio. See [3] for a detailed
1929 description of phasing.
1930
1931 delay/decay/speed gives the delay in milliseconds and the decay
1932 (relative to gain-in) with a modulation speed in Hz. The modu‐
1933 lation is either sinusoidal (-s) - preferable for multiple
1934 instruments, or triangular (-t) - gives single instruments a
1935 sharper phasing effect. The decay should be less than 0.5 to
1936 avoid feedback, and usually no less than 0.1. Gain-out is the
1937 volume of the output.
1938
1939 For example:
1940 play snare.flac phaser 0.8 0.74 3 0.4 0.5 -t
1941 Gentler:
1942 play snare.flac phaser 0.9 0.85 4 0.23 1.3 -s
1943 A popular sound:
1944 play snare.flac phaser 0.89 0.85 1 0.24 2 -t
1945 More severe:
1946 play snare.flac phaser 0.6 0.66 3 0.6 2 -t
1947
1948 pitch [-q] shift [segment [search [overlap]]]
1949 Change the audio pitch (but not tempo).
1950
1951 shift gives the pitch shift as positive or negative `cents'
1952 (i.e. 100ths of a semitone). See the tempo effect for a
1953 description of the other parameters.
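
Since 100 cents make a semitone and 12 semitones make an octave, a shift in cents corresponds to a frequency ratio of 2^(cents/1200). The sketch below computes that ratio; the helper name cents_to_ratio is hypothetical and not part of SoX.

```shell
# Hypothetical helper: convert a pitch shift in cents to a frequency ratio.
# 1200 cents = 12 semitones = one octave, so ratio = 2^(cents/1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(log(2) * c / 1200) }'
}

cents_to_ratio 1200    # one octave up
cents_to_ratio -1200   # one octave down
cents_to_ratio 100     # one semitone up
```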
1954
1955 See also the bend, speed, and tempo effects.
1956
1957 rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
1958 Change the audio sampling rate (i.e. resample the audio) to any
1959 given RATE (even non-integer if this is supported by the output
1960 file format) using a quality level defined as follows:
1961
1962          Quality     Band-width   Rej dB        Typical Use
1964   -q     quick       n/a          ≈30 @ Fs/4    playback on ancient hardware
1966   -l     low         80%          100           playback on old hardware
1968   -m     medium      95%          100           audio playback
1969   -h     high        95%          125           16-bit mastering (use with dither)
1971   -v     very high   95%          175           24-bit mastering
1972
1973 where Band-width is the percentage of the audio frequency band
1974 that is preserved and Rej dB is the level of noise rejection.
1975 Increasing levels of resampling quality come at the expense of
1976 increasing amounts of time to process the audio. If no quality
1977 option is given, the quality level used is `high' (but see
1978 `Playing & Recording Audio' above regarding playback).
1979
1980 The `quick' algorithm uses cubic interpolation; all others use
1981 band-limited interpolation. By default, all algorithms have a
1982 `linear' phase response; for `medium', `high' and `very high',
1983 the phase response is configurable (see below).
1984
1985 The rate effect is invoked automatically if SoX's -r option
1986 specifies a rate that is different to that of the input file(s).
1987 Alternatively, if this effect is given explicitly, then SoX's -r
1988 option need not be given. For example, the following two com‐
1989 mands are equivalent:
1990 sox input.wav -r 48k output.wav bass -b 24
1991 sox input.wav output.wav bass -b 24 rate 48k
1992 though the second command is more flexible as it allows rate
1993 options to be given, and allows the effects to be ordered arbi‐
1994 trarily.
1995
1996 * * *
1997
1998 Warning: technically detailed discussion follows.
1999
2000 The simple quality selection described above provides settings
2001 that satisfy the needs of the vast majority of resampling tasks.
2002 Occasionally, however, it may be desirable to fine-tune the
2003 resampler's filter response; this can be achieved using over‐
2004 ride options, as detailed in the following table:
2005
2006 -M/-I/-L Phase response = minimum/intermediate/linear
2007 -s Steep filter (band-width = 99%)
2008 -a Allow aliasing/imaging above the pass-band
2009 -b 74-99.7 Any band-width %
2010
2011 -p 0-100 Any phase response (0 = minimum, 25 = intermediate,
2012 50 = linear, 100 = maximum)
2013
2014 N.B. Override options cannot be used with the `quick' or `low'
2015 quality algorithms.
2016
2017 All resamplers use filters that can sometimes create `echo'
2018 (a.k.a. `ringing') artefacts with transient signals such as
2019 those that occur with `finger snaps' or other highly percussive
2020 sounds. Such artefacts are much more noticeable to the human
2021 ear if they occur before the transient (`pre-echo') than if they
2022 occur after it (`post-echo'). Note that the frequency of any such
2023 artefacts is related to the smaller of the original and new sam‐
2024 pling rates but that if this is at least 44.1kHz, then the arte‐
2025 facts will lie outside the range of human hearing.
2026
2027 A phase response setting may be used to control the distribution
2028 of any transient echo between `pre' and `post': with minimum
2029 phase, there is no pre-echo but the longest post-echo; with lin‐
2030 ear phase, pre and post echo are in equal amounts (in signal
2031 terms, but not audibility terms); the intermediate phase setting
2032 attempts to find the best compromise by selecting a small length
2033 (and level) of pre-echo and a medium-length post-echo.
2034
2035 Minimum, intermediate, or linear phase response is selected
2036 using the -M, -I, or -L option; a custom phase response can be
2037 created with the -p option. Note that phase responses between
2038 `linear' and `maximum' (greater than 50) are rarely useful.
2039
2040 A resampler's band-width setting determines how much of the fre‐
2041 quency content of the original signal (w.r.t. the original sam‐
2042 ple rate when up-sampling, or the new sample rate when down-sam‐
2043 pling) is preserved during conversion. The term `pass-band' is
2044 used to refer to all frequencies up to the band-width point
2045 (e.g. for 44.1kHz sampling rate, and a resampling band-width of
2046 95%, the pass-band represents frequencies from 0Hz (D.C.) to
2047 circa 21kHz). Increasing the resampler's band-width results in
2048 a slower conversion and can increase transient echo artefacts
2049 (and vice versa).
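As a rough illustration of the pass-band arithmetic above, the following sketch computes the upper pass-band edge as the band-width percentage of the Nyquist frequency (half the relevant sample rate). The function name passband_hz is hypothetical and is not part of SoX; integer arithmetic is used, so the result is truncated to whole Hertz.

```shell
#!/bin/sh
# passband_hz RATE PERCENT
# Hypothetical helper (not part of SoX): prints the upper edge of the
# pass-band in Hz, i.e. PERCENT % of the Nyquist frequency RATE/2.
passband_hz() {
    echo $(( $1 * $2 / 200 ))
}

passband_hz 44100 95    # default band-width
passband_hz 44100 99    # band-width selected by the -s option
```

For a 44.1kHz rate and the default 95% band-width this gives a value just under 21kHz, in line with the `circa 21kHz' quoted above.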
2050
2051 The -s `steep filter' option changes resampling band-width from
2052 the default 95% (based on the 3dB point), to 99%. The -b option
2053 allows the band-width to be set to any value in the range
2054 74-99.7 %, but note that band-width values greater than 99% are
2055 not recommended for normal use as they can cause excessive tran‐
2056 sient echo.
2057
2058 If the -a option is given, then aliasing/imaging above the pass-
2059 band is allowed. For example, with 44.1kHz sampling rate, and a
2060 resampling band-width of 95%, this means that frequency content
2061 above 21kHz can be distorted; however, since this is above the
2062 pass-band (i.e. above the highest frequency of interest/audi‐
2063 bility), this may not be a problem. The benefits of allowing
2064 aliasing/imaging are reduced processing time, and reduced (by
2065 almost half) transient echo artefacts. Note that if this option
2066 is given, then the minimum band-width allowable with -b
2067 increases to 85%.
2068
2069 Examples:
2070 sox input.wav -b 16 output.wav rate -s -a 44100 dither -s
2071 default (high) quality resampling; overrides: steep filter,
2072 allow aliasing; to 44.1kHz sample rate; noise-shaped dither to
2073 16-bit WAV file.
2074 sox input.wav -b 24 output.aiff rate -v -I -b 90 48k
2075 very high quality resampling; overrides: intermediate phase,
2076 band-width 90%; to 48k sample rate; store output to 24-bit AIFF
2077 file.
2078
2079 * * *
2080
2081 The pitch and speed effects use the rate effect at their core.
2082
2083 remix [-a|-m|-p] <out-spec>
2084 out-spec = in-spec{,in-spec} | 0
2085 in-spec = [in-chan][-[in-chan2]][vol-spec]
2086 vol-spec = p|i|v[volume]
2087
2088 Select and mix input audio channels into output audio channels.
2089 Each output channel is specified, in turn, by a given out-spec:
2090 a list of contributing input channels and volume specifications.
2091
2092 Note that this effect operates on the audio channels within the
2093 SoX effects processing chain; it should not be confused with the
2094 -m global option (where multiple files are mix-combined before
2095 entering the effects chain).
2096
2097 An out-spec contains comma-separated input channel-numbers and
2098 hyphen-delimited channel-number ranges; alternatively, 0 may be
2099 given to create a silent output channel. For example,
2100 sox input.wav output.wav remix 6 7 8 0
2101 creates an output file with four channels, where channels 1, 2,
2102 and 3 are copies of channels 6, 7, and 8 in the input file, and
2103 channel 4 is silent. Whereas
2104 sox input.wav output.wav remix 1-3,7 3
2105 creates a (somewhat bizarre) stereo output file where the left
2106 channel is a mix-down of input channels 1, 2, 3, and 7, and the
2107 right channel is a copy of input channel 3.
2108
2109 Where a range of channels is specified, the channel numbers to
2110 the left and right of the hyphen are optional and default to 1
2111 and to the number of input channels respectively. Thus
2112 sox input.wav output.wav remix -
2113 performs a mix-down of all input channels to mono.
2114
2115 By default, where an output channel is mixed from multiple (n)
2116 input channels, each input channel will be scaled by a factor of
2117 ¹/n. Custom mixing volumes can be set by following a given
2118 input channel or range of input channels with a vol-spec (volume
2119 specification). This is one of the letters p, i, or v, followed
2120 by a volume number, the meaning of which depends on the given
2121 letter and is defined as follows:
2122
2123 Letter   Volume number        Notes
2124   p      power adjust in dB   0 = no change
2125   i      power adjust in dB   As `p', but invert
2126                               the audio
2127   v      voltage multiplier   1 = no change, 0.5
2128                               ≈ 6dB attenuation,
2129                               2 ≈ 6dB gain, -1 =
2130                               invert
2131
2132 If an out-spec includes at least one vol-spec then, by default,
2133 ¹/n scaling is not applied to any other channels in the same
2134 out-spec (though may be in other out-specs). The -a (automatic)
2135 option however, can be given to retain the automatic scaling in
2136 this case. For example,
2137 sox input.wav output.wav remix 1,2 3,4v0.8
2138 results in channel level multipliers of 0.5,0.5 1,0.8, whereas
2139 sox input.wav output.wav remix -a 1,2 3,4v0.8
2140 results in channel level multipliers of 0.5,0.5 0.5,0.8.
2141
2142 The -m (manual) option disables all automatic volume adjust‐
2143 ments, so
2144 sox input.wav output.wav remix -m 1,2 3,4v0.8
2145 results in channel level multipliers of 1,1 1,0.8.
2146
2147 The volume number is optional and omitting it corresponds to no
2148 volume change; however, the only case in which this is useful is
2149 in conjunction with i. For example, if input.wav is stereo,
2150 then
2151 sox input.wav output.wav remix 1,2i
2152 is a mono equivalent of the oops effect.
2153
2154 If the -p option is given, then any automatic ¹/n scaling is
2155 replaced by ¹/√n (`power') scaling; this gives a louder mix but
2156 one that might occasionally clip.
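The difference between the two automatic scaling rules can be seen by computing the per-channel multiplier for a given number of mixed input channels. This is only a sketch of the arithmetic described above; remix_gain is a hypothetical helper, not a SoX command, and it assumes an awk implementation is available.

```shell
#!/bin/sh
# remix_gain N MODE
# Hypothetical helper (not part of SoX): prints the multiplier applied
# to each of N input channels that are mixed into one output channel.
#   MODE = auto    default 1/n scaling
#   MODE = power   1/sqrt(n) scaling, as selected by -p
remix_gain() {
    awk -v n="$1" -v mode="$2" 'BEGIN {
        if (mode == "power") g = 1 / sqrt(n); else g = 1 / n
        printf "%.3f\n", g
    }'
}

remix_gain 2 auto
remix_gain 2 power
```

With two channels, the `power' multiplier is about 3dB higher than the default one, which is why the resulting mix is louder and may occasionally clip.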
2157
2158 * * *
2159
2160 One use of the remix effect is to split an audio file into a set
2161 of files, each containing one of the constituent channels (in
2162 order to perform subsequent processing on individual audio chan‐
2163 nels). Where more than a few channels are involved, a script
2164 such as the following (Bourne shell script) is useful:
2165 #!/bin/sh
2166 chans=`soxi -c "$1"`
2167 while [ $chans -ge 1 ]; do
2168 chans0=`printf %02i $chans` # 2 digits hence up to 99 chans
2169 out=`echo "$1"|sed "s/\(.*\)\.\(.*\)/\1-$chans0.\2/"`
2170 sox "$1" "$out" remix $chans
2171 chans=`expr $chans - 1`
2172 done
2173 If a file input.wav containing six audio channels were given,
2174 the script would produce six output files: input-01.wav,
2175 input-02.wav, ..., input-06.wav.
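The output file naming used by the script above can also be expressed with POSIX shell parameter expansion instead of sed. The following is a sketch only: chan_name is a hypothetical helper, and it assumes that the file name contains at least one dot.

```shell
#!/bin/sh
# chan_name FILE N
# Hypothetical helper: prints FILE with "-NN" inserted before its
# final extension, e.g. input.wav and 3 give input-03.wav.
# Assumes FILE contains at least one dot.
chan_name() {
    printf '%s-%02d.%s\n' "${1%.*}" "$2" "${1##*.}"
}

chan_name input.wav 3
```

As in the original script, two digits are used, so the names sort correctly for up to 99 channels.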
2176
2177 See also the swap effect.
2178
2179 repeat [count (1)]
2180 Repeat the entire audio count times, or once if count is not
2181 given. Requires temporary file space to store the audio to be
2182 repeated. Note that repeating once yields two copies: the orig‐
2183 inal audio and the repeated audio.
2184
2185 reverb [-w|--wet-only] [reverberance (50%) [HF-damping (50%)
2186 [room-scale (100%) [stereo-depth (100%)
2187 [pre-delay (0ms) [wet-gain (0dB)]]]]]]
2188
2189 Add reverberation to the audio using the `freeverb' algorithm.
2190 A reverberation effect is sometimes desirable for concert halls
2191 that are too small or contain so many people that the hall's
2192 natural reverberance is diminished. Applying a small amount of
2193 stereo reverb to a (dry) mono signal will usually make it sound
2194 more natural. See [3] for a detailed description of reverbera‐
2195 tion.
2196
2197 Note that this effect increases both the volume and the length
2198 of the audio, so to prevent clipping in these domains, a typical
2199 invocation might be:
2200 play dry.wav gain -3 pad 0 3 reverb
2201 The -w option can be given to select only the `wet' signal, thus
2202 allowing it to be processed further, independently of the `dry'
2203 signal. E.g.
2204 play -m voice.wav "|sox voice.wav -p reverse reverb -w reverse"
2205 for a reverse reverb effect.
2206
2207 reverse
2208 Reverse the audio completely. Requires temporary file space to
2209 store the audio to be reversed.
2210
2211 riaa Apply RIAA vinyl playback equalisation. The sampling rate must
2212 be one of: 44.1, 48, 88.2, 96 kHz.
2213
2214 This effect supports the --plot global option.
2215
2216 silence [-l] above-periods [duration threshold[d|%]
2217 [below-periods duration threshold[d|%]]]
2218
2219 Removes silence from the beginning, middle, or end of the audio.
2220 `Silence' is determined by a specified threshold.
2221
2222 The above-periods value is used to indicate if audio should be
2223 trimmed at the beginning of the audio. A value of zero indicates
2224 no silence should be trimmed from the beginning. When specifying
2225 a non-zero above-periods, it trims audio up until it finds non-
2226 silence. Normally, when trimming silence from the beginning of the
2227 audio, above-periods will be 1, but it can be increased to higher
2228 values to trim all audio up to a specific count of non-silence
2229 periods. For example, if you had an audio file with two songs
2230 that each contained 2 seconds of silence before the song, you
2231 could specify an above-period of 2 to strip out both silence
2232 periods and the first song.
2233
2234 When above-periods is non-zero, you must also specify a duration
2235 and threshold. Duration indicates the amount of time that non-
2236 silence must be detected before it stops trimming audio. By
2237 increasing the duration, bursts of noise can be treated as
2238 silence and trimmed off.
2239
2240 Threshold is used to indicate what sample value you should treat
2241 as silence. For digital audio, a value of 0 may be fine but for
2242 audio recorded from analog, you may wish to increase the value
2243 to account for background noise.
2244
2245 When optionally trimming silence from the end of the audio, you
2246 specify a below-periods count. In this case, below-period means
2247 to remove all audio after silence is detected. Normally, this
2248 will be a value of 1 but it can be increased to skip over
2249 periods of silence that are wanted. For example, if you have a song
2250 with 2 seconds of silence in the middle and 2 seconds at the end,
2251 you could set below-period to a value of 2 to skip over the
2252 silence in the middle of the audio.
2253
2254 For below-periods, duration specifies a period of silence that
2255 must exist before audio is not copied any more. By specifying a
2256 higher duration, silence that is wanted can be left in the
2257 audio. For example, if you have a song with an expected 1 sec‐
2258 ond of silence in the middle and 2 seconds of silence at the
2259 end, a duration of 2 seconds could be used to skip over the mid‐
2260 dle silence.
2261
2262 Unfortunately, you must know the length of the silence at the
2263 end of your audio file to trim off silence reliably. A
2264 workaround is to use the silence effect in combination with the
2265 reverse effect. By first reversing the audio, you can use the
2266 above-periods to reliably trim all audio from what looks like
2267 the front of the file. Then reverse the file again to get back
2268 to normal.
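The reverse workaround can be written as a single effects chain. The sketch below only prints the command rather than running it; trim_both_ends_cmd is a hypothetical helper, the duration and threshold values are illustrative, and file names containing spaces are not handled.

```shell
#!/bin/sh
# trim_both_ends_cmd IN OUT DURATION THRESHOLD
# Hypothetical helper: prints (does not run) a sox command that trims
# leading silence, reverses the audio, trims what is now the leading
# silence, and reverses the audio again.
trim_both_ends_cmd() {
    echo "sox $1 $2 silence 1 $3 $4 reverse silence 1 $3 $4 reverse"
}

trim_both_ends_cmd in.wav out.wav 0.1 1%
```

Note that each reverse effect requires temporary file space, as described for the reverse effect.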
2269
2270 To remove silence from the middle of a file, specify a below-
2271 periods that is negative. This value is then treated as a posi‐
2272 tive value and is also used to indicate the effect should
2273 restart processing as specified by the above-periods, making it
2274 suitable for removing periods of silence in the middle of the
2275 audio.
2276
2277 The option -l indicates that below-periods duration length of
2278 audio should be left intact at the beginning of each period of
2279 silence. This is useful, for example, if you want to shorten long
2280 pauses between words but do not want to remove them completely.
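A negative below-periods and the -l option are typically combined to shorten pauses throughout a recording. The following sketch prints such a command without running it; shorten_pauses_cmd is a hypothetical helper and the values shown (0.1 seconds of non-silence, 2 seconds of pause retained, 1% threshold) are illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# shorten_pauses_cmd IN OUT ABOVE_DUR BELOW_DUR THRESHOLD
# Hypothetical helper: prints (does not run) a sox command that trims
# leading silence and then shortens every later silent period,
# keeping BELOW_DUR of audio at the start of each one (-l).
shorten_pauses_cmd() {
    echo "sox $1 $2 silence -l 1 $3 $5 -1 $4 $5"
}

shorten_pauses_cmd in.wav out.wav 0.1 2.0 1%
```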
2281
2282 The period counts are in units of samples. Duration counts may
2283 be in the format of hh:mm:ss.frac, or the exact count of sam‐
2284 ples. Threshold numbers may be suffixed with d to indicate the
2285 value is in decibels, or % to indicate a percentage of maximum
2286 value of the sample value (0% specifies pure digital silence).
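To compare thresholds given in the two notations, a percentage can be converted to decibels. The sketch below assumes that the percentage is an amplitude ratio relative to the maximum sample value, so that the conversion is 20·log10(p/100); pct_to_db is a hypothetical helper and requires awk. The conversion is undefined for 0%.

```shell
#!/bin/sh
# pct_to_db PERCENT
# Hypothetical helper: converts an amplitude percentage (greater than
# zero) to decibels relative to full scale, assuming 20*log10(p/100).
pct_to_db() {
    awk -v p="$1" 'BEGIN { printf "%.2f\n", 20 * log(p / 100) / log(10) }'
}

pct_to_db 2
pct_to_db 50
```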
2287
2288 The following example shows how this effect can be used to start
2289 a recording that does not contain the delay at the start which
2290 usually occurs between `pressing the record button' and the
2291 start of the performance:
2292 rec parameters filename other-effects silence 1 5 2%
2293
2294 sinc [-a att|-b beta] [-p phase|-M|-I|-L] [-t tbw|-n taps] [freqHP]
2295 [-freqLP [-t tbw|-n taps]]
2296 Apply a sinc kaiser-windowed low-pass, high-pass, band-pass, or
2297 band-reject filter to the signal. The freqHP and freqLP parame‐
2298 ters give the frequencies of the 6dB points of a high-pass and
2299 low-pass filter that may be invoked individually, or together.
2300 If both are given, then freqHP less than freqLP creates a band-
2301 pass filter, freqHP greater than freqLP creates a band-reject
2302 filter. For example, the invocations
2303 sinc 3k
2304 sinc -4k
2305 sinc 3k-4k
2306 sinc 4k-3k
2307 create a high-pass, low-pass, band-pass, and band-reject filter
2308 respectively.
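The rule relating freqHP and freqLP to the resulting filter type can be summarised in a few lines of shell. This is a sketch of the rule as described above, not of SoX's own argument parsing; sinc_kind is a hypothetical helper that takes plain integer frequencies in Hz (an empty string meaning `not given'), and the case of equal frequencies is not covered by the description above.

```shell
#!/bin/sh
# sinc_kind FREQ_HP FREQ_LP
# Hypothetical helper: prints the kind of filter that results from the
# given high-pass and low-pass frequencies (integers in Hz; pass an
# empty string for a frequency that is not given).
sinc_kind() {
    hp=$1
    lp=$2
    if [ -z "$lp" ]; then echo high-pass
    elif [ -z "$hp" ]; then echo low-pass
    elif [ "$hp" -lt "$lp" ]; then echo band-pass
    elif [ "$hp" -gt "$lp" ]; then echo band-reject
    else echo undefined   # equal frequencies: not described above
    fi
}

sinc_kind 3000 ""      # cf. sinc 3k
sinc_kind "" 4000      # cf. sinc -4k
sinc_kind 3000 4000    # cf. sinc 3k-4k
sinc_kind 4000 3000    # cf. sinc 4k-3k
```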
2309
2310 The default stop-band attenuation of 120dB can be overridden
2311 with -a; alternatively, the kaiser-window `beta' parameter can
2312 be given directly with -b.
2313
2314 The default transition band-width of 5% of the total band can be
2315 overridden with -t (and tbw in Hertz); alternatively, the number
2316 of filter taps can be given directly with -n.
2317
2318 If both freqHP and freqLP are given, then a -t or -n option
2319 given to the left of the frequencies applies to both frequen‐
2320 cies; one of these options given to the right of the frequencies
2321 applies only to freqLP.
2322
2323 The -p, -M, -I, and -L options control the filter's phase
2324 response; see the rate effect for details.
2325
2326 This effect supports the --plot global option.
2327
2328 spectrogram [options]
2329 Create a spectrogram of the audio; the audio is passed unmodi‐
2330 fied through the SoX processing chain. This effect is optional
2331 - type sox --help and check the list of supported effects to see
2332 if it has been included.
2333
2334 The spectrogram is rendered in a Portable Network Graphic (PNG)
2335 file, and shows time in the X-axis, frequency in the Y-axis, and
2336 audio signal magnitude in the Z-axis. Z-axis values are repre‐
2337 sented by the colour (or optionally the intensity) of the pixels
2338 in the X-Y plane. If the audio signal contains multiple chan‐
2339 nels then these are shown from top to bottom starting from chan‐
2340 nel 1 (which is the left channel for stereo audio).
2341
2342 For example, if `my.wav' is a stereo file, then with
2343 sox my.wav -n spectrogram
2344 a spectrogram of the entire file will be created in the file
2345 `spectrogram.png'. More often though, analysis of a smaller
2346 portion of the audio is required; e.g. with
2347 sox my.wav -n remix 2 trim 20 30 spectrogram
2348 the spectrogram shows information only from the second (right)
2349 channel, and of thirty seconds of audio starting from twenty
2350 seconds in. To analyse a small portion of the frequency domain,
2351 the rate effect may be used, e.g.
2352 sox my.wav -n rate 6k spectrogram
2353 allows detailed analysis of frequencies up to 3kHz (half the
2354 sampling rate) i.e. where the human auditory system is most sen‐
2355 sitive. With
2356 sox my.wav -n trim 0 10 spectrogram -x 600 -y 200 -z 100
2357 the given options control the size of the spectrogram's X, Y & Z
2358 axes (in this case, the spectrogram area of the produced image
2359 will be 600 by 200 pixels in size and the Z-axis range will be
2360 100 dB). Note that the produced image includes axes legends
2361 etc. and so will be a little larger than the specified spectro‐
2362 gram size. In this example:
2363 sox -n -n synth 6 tri 10k:14k spectrogram -z 100 -w kaiser
2364 an analysis `window' with high dynamic range is selected to best
2365 display the spectrogram of a swept triangular wave. For a
2366 similar example, append the following to the `chime' command in
2367 the description of the delay effect (above):
2368 rate 2k spectrogram -X 200 -Z -10 -w kaiser
2369 Options are also available to control the appearance
2370 (colour-set, brightness, contrast, etc.) and filename of the
2371 spectrogram; e.g. with
2372 sox my.wav -n spectrogram -m -l -o print.png
2373 a spectrogram is created suitable for printing on a `black and
2374 white' printer.
2375
2376 Options:
2377
2378 -x num Change the (maximum) width (X-axis) of the spectrogram
2379 from its default value of 800 pixels to a given number
2380 between 100 and 200000. See also -X and -d.
2381
2382 -X num X-axis pixels/second; the default is auto-calculated to
2383 fit the given or known audio duration to the X-axis size,
2384 or 100 otherwise. If given in conjunction with -d, this
2385 option affects the width of the spectrogram; otherwise,
2386 it affects the duration of the spectrogram. num can be
2387 from 1 (low time resolution) to 5000 (high time resolu‐
2388 tion) and need not be an integer. SoX may make a slight
2389 adjustment to the given number for processing quantisa‐
2390 tion reasons; if so, SoX will report the actual number
2391 used (viewable when the SoX global option -V is in
2392 effect). See also -x and -d.
2393
2394 -y num Sets the Y-axis size in pixels (per channel); this is the
2395 number of frequency `bins' used in the Fourier analysis
2396 that produces the spectrogram. N.B. it can be slow to
2397 produce the spectrogram if this number is not one more
2398 than a power of two (e.g. 129). By default the Y-axis
2399 size is chosen automatically (depending on the number of
2400 channels). See -Y for an alternative way of setting the
2401 spectrogram height.
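Whether a candidate -y value is of the fast `one more than a power of two' form can be checked before running SoX. The sketch below uses POSIX shell arithmetic only; is_fast_y is a hypothetical helper and is not part of SoX.

```shell
#!/bin/sh
# is_fast_y NUM
# Hypothetical helper: succeeds if NUM is one more than a power of two
# (e.g. 129 or 257), which is the case described above as fast.
is_fast_y() {
    m=$(( $1 - 1 ))
    [ "$m" -gt 0 ] && [ $(( m & (m - 1) )) -eq 0 ]
}

for y in 129 200 257; do
    if is_fast_y "$y"; then
        echo "$y: one more than a power of two"
    else
        echo "$y: may be slow"
    fi
done
```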
2402
2403 -Y num Sets the target total height of the spectrogram(s). The
2404 default value is 550 pixels. Using this option (and by
2405 default), SoX will choose a height for individual spec‐
2406 trogram channels that is one more than a power of two, so
2407 the actual total height may fall short of the given num‐
2408 ber. However, there is also a minimum height per channel
2409 so if there are many channels, the number may be
2410 exceeded. See -y for an alternative way of setting the
2411 spectrogram height.
2412
2413 -z num Z-axis (colour) range in dB, default 120. This sets the
2414 dynamic-range of the spectrogram to be -num dBFS to
2415 0 dBFS. Num may range from 20 to 180. Decreasing
2416 dynamic-range effectively increases the `contrast' of the
2417 spectrogram display, and vice versa.
2418
2419 -Z num Sets the upper limit of the Z-axis in dBFS. A negative
2420 num effectively increases the `brightness' of the spec‐
2421 trogram display, and vice versa.
2422
2423 -q num Sets the Z-axis quantisation, i.e. the number of differ‐
2424 ent colours (or intensities) in which to render Z-axis
2425 values. A small number (e.g. 4) will give a
2426 `poster'-like effect making it easier to discern magni‐
2427 tude bands of similar level. Small numbers also usually
2428 result in small PNG files. The number given specifies
2429 the number of colours to use inside the Z-axis range; two
2430 colours are reserved to represent out-of-range values.
2431
2432 -w name
2433 Window: Hann (default), Hamming, Bartlett, Rectangular or
2434 Kaiser. The spectrogram is produced using the Discrete
2435 Fourier Transform (DFT) algorithm. A significant parame‐
2436 ter to this algorithm is the choice of `window function'.
2437 By default, SoX uses the Hann window which has good all-
2438 round frequency-resolution and dynamic-range properties.
2439 For better frequency resolution (but lower dynamic-
2440 range), select a Hamming window; for higher dynamic-range
2441 (but poorer frequency-resolution), select a Kaiser win‐
2442 dow. Bartlett and Rectangular windows are also avail‐
2443 able.
2444
2445 -W num Window adjustment parameter. This can be used to make
2446 small adjustments to the Kaiser window shape. A positive
2447 number (up to ten) increases its dynamic range, a nega‐
2448 tive number decreases it.
2449
2450 -s Allow slack overlapping of DFT windows. This can, in
2451 some cases, increase image sharpness and give greater
2452 adherence to the -x value, but at the expense of a little
2453 spectral loss.
2454
2455 -m Creates a monochrome spectrogram (the default is colour).
2456
2457 -h Selects a high-colour palette - less visually pleasing
2458 than the default colour palette, but it may make it eas‐
2459 ier to differentiate different levels. If this option is
2460 used in conjunction with -m, the result will be a hybrid
2461 monochrome/colour palette.
2462
2463 -p num Permute the colours in a colour or hybrid palette. The
2464 num parameter, from 1 (the default) to 6, selects the
2465 permutation.
2466
2467 -l Creates a `printer friendly' spectrogram with a light
2468 background (the default has a dark background).
2469
2470 -a Suppress the display of the axis lines. This is some‐
2471 times useful in helping to discern artefacts at the spec‐
2472 trogram edges.
2473
2474 -r Raw spectrogram: suppress the display of axes and leg‐
2475 ends.
2476
2477 -A Selects an alternative, fixed colour-set. This is pro‐
2478 vided only for compatibility with spectrograms produced
2479 by another package. It should not normally be used as it
2480 has some problems, not least, a lack of differentiation
2481 at the bottom end which results in masking of low-level
2482 artefacts.
2483
2484 -t text
2485 Set the image title - text to display above the spectro‐
2486 gram.
2487
2488 -c text
2489 Set (or clear) the image comment - text to display below
2490 and to the left of the spectrogram.
2491
2492 -o text
2493 Name of the spectrogram output PNG file, default `spec‐
2494 trogram.png'.
2495
2496 Advanced Options:
2497 In order to process a smaller section of audio without affecting
2498 other effects or the output signal (unlike when the trim effect
2499 is used), the following options may be used.
2500
2501 -d duration
2502 This option sets the X-axis resolution such that audio
2503 with the given duration ([[HH:]MM:]SS) fits the selected
2504 (or default) X-axis width. For example,
2505 sox input.mp3 output.wav spectrogram -d 1:00 stats
2506 creates a spectrogram showing the first minute of the
2507 audio, whilst the stats effect is applied to the entire
2508 audio signal.
2509
2510 See also -X for an alternative way of setting the X-axis
2511 resolution.
2512
2513 -S time
2514 Start the spectrogram at the given point in the audio
2515 stream. For example
2516 sox input.aiff output.wav spectrogram -S 1:00
2517 creates a spectrogram showing all but the first minute of
2518 the audio (the output file however, receives the entire
2519 audio stream).
2520
2521 For the ability to perform off-line processing of spectral data,
2522 see the stat effect.
2523
2524 speed factor[c]
2525 Adjust the audio speed (pitch and tempo together). factor is
2526 either the ratio of the new speed to the old speed: greater than
2527 1 speeds up, less than 1 slows down, or, if appended with the
2528 letter `c', the number of cents (i.e. 100ths of a semitone) by
2529 which the pitch (and tempo) should be adjusted: greater than 0
2530 increases, less than 0 decreases.
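The two forms of factor are related by the usual definition of the cent: a change of c cents corresponds to a speed ratio of 2^(c/1200). The following sketch performs that conversion; cents_to_factor is a hypothetical helper (not part of SoX) and requires awk.

```shell
#!/bin/sh
# cents_to_factor CENTS
# Hypothetical helper: prints the speed ratio that corresponds to a
# pitch change of CENTS cents, using ratio = 2^(CENTS/1200).
cents_to_factor() {
    awk -v c="$1" 'BEGIN { printf "%.6f\n", exp(c * log(2) / 1200) }'
}

cents_to_factor 100      # one semitone up
cents_to_factor 1200     # one octave up
cents_to_factor -1200    # one octave down
```

On this basis, speed 100c should be approximately equivalent to speed 1.059463.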
2531
2532 Technically, the speed effect only changes the sample rate
2533 information, leaving the samples themselves untouched. The rate
2534 effect is invoked automatically to resample to the output sample
2535 rate, using its default quality/speed. For higher quality or
2536 higher speed resampling, in addition to the speed effect, spec‐
2537 ify the rate effect with the desired quality option.
2538
2539 See also the bend, pitch, and tempo effects.
2540
2541 splice [-h|-t|-q] { position[,excess[,leeway]] }
2542 Splice together audio sections. This effect provides two things
2543 over simple audio concatenation: a (usually short) cross-fade is
2544 applied at the join, and a wave similarity comparison is made to
2545 help determine the best place at which to make the join.
2546
2547 One of the options -h, -t, or -q may be given to select the fade
2548 envelope as half-cosine wave (the default), triangular (a.k.a.
2549 linear), or quarter-cosine wave respectively.
2550
2551 Type   Audio          Fade level       Transitions
2552  t     correlated     constant gain    abrupt
2553  h     correlated     constant gain    smooth
2554  q     uncorrelated   constant power   smooth
2555
2556 To perform a splice, first use the trim effect to select the
2557 audio sections to be joined together. As when performing a tape
2558 splice, the end of the section to be spliced onto should be
2559 trimmed with a small excess (default 0.005 seconds) of audio
2560 after the ideal joining point. The beginning of the audio sec‐
2561 tion to splice on should be trimmed with the same excess (before
2562 the ideal joining point), plus an additional leeway (default
2563 0.005 seconds). SoX should then be invoked with the two audio
2564 sections as input files and the splice effect given with the
2565 position at which to perform the splice - this is the length of the
2566 first audio section (including the excess).
2567
2568 The following diagram uses the tape analogy to illustrate the
2569 splice operation. The effect simulates the diagonal cuts and
2570 joins the two pieces:
2571
2572       length1   excess
2573     -----------><--->
2574     _________   :   :  _________________
2575              \  :   : :\     `
2576               \ :   : : \     `
2577                \:   : :  \     `
2578                 *   : :   * - - *
2579                  \  : :   :\     `
2580                   \ : :   : \     `
2581     _______________\: :   :  \_____`____
2582                       :   : :     :
2583                       <---> <----->
2584                       excess leeway
2585
2586 where * indicates the joining points.
2587
2588 For example, a long song begins with two verses which start (as
2589 determined e.g. by using the play command with the trim (start)
2590 effect) at times 0:30.125 and 1:03.432. The following commands
2591 cut out the first verse:
2592 sox too-long.wav part1.wav trim 0 30.130
2593 (5 ms excess, after the first verse starts)
2594 sox too-long.wav part2.wav trim 1:03.422
2595 (5 ms excess plus 5 ms leeway, before the second verse starts)
2596 sox part1.wav part2.wav just-right.wav splice 30.130
2597 For another example, the SoX command
2598 play "|sox -n -p synth 1 sin %1" "|sox -n -p synth 1 sin %3"
2599 generates and plays two notes, but there is a nasty click at the
2600 transition; the click can be removed by splicing instead of con‐
2601 catenating the audio, i.e. by appending splice 1 to the command.
2602 (Clicks at the beginning and end of the audio can be removed by
2603 preceding the splice effect with fade q .01 2 .01).
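The trim points used in the first example above follow directly from the verse start times, the excess, and the leeway. The sketch below reproduces that arithmetic; splice_points is a hypothetical helper, times are given in seconds (so 1:03.432 is written as 63.432), and excess and leeway default to the 0.005 seconds mentioned above.

```shell
#!/bin/sh
# splice_points START1 START2 [EXCESS [LEEWAY]]
# Hypothetical helper: prints two times in seconds:
#   1. where to end the first section (START1 + EXCESS); this is also
#      the position to give to the splice effect
#   2. where to begin the second section (START2 - EXCESS - LEEWAY)
splice_points() {
    awk -v a="$1" -v b="$2" -v e="${3:-0.005}" -v l="${4:-0.005}" 'BEGIN {
        printf "%.3f %.3f\n", a + e, b - e - l
    }'
}

splice_points 30.125 63.432
```

The two values printed correspond to the 30.130 and 1:03.422 used in the trim commands above.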
2604
2605 Provided your arithmetic is good enough, multiple splices can be
2606 performed with a single splice invocation. For example:
2607 #!/bin/sh
2608 # Audio Copy and Paste Over
2609 # acpo infile copy-start copy-stop paste-over-start outfile
2610 # All times measured in samples.
2611 rate=`soxi -r "$1"`
2612 e=`expr $rate '*' 5 / 1000` # Using default excess
2613 l=$e # and leeway.
2614 sox "$1" piece.wav trim `expr $2 - $e - $l`s \
2615 `expr $3 - $2 + $e + $l + $e`s
2616 sox "$1" part1.wav trim 0 `expr $4 + $e`s
2617 sox "$1" part2.wav trim `expr $4 + $3 - $2 - $e - $l`s
2618 sox part1.wav piece.wav part2.wav "$5" splice \
2619 `expr $4 + $e`s \
2620 `expr $4 + $e + $3 - $2 + $e + $l + $e`s
2621 In the above Bourne shell script, two splices are used to `copy
2622 and paste' audio.
2623
2624 * * *
2625
2626 It is also possible to use this effect to perform general cross-
2627 fades, e.g. to join two songs. In this case, excess would
2628 typically be a number of seconds, the -q option would typically be
2629 given (to select an `equal power' cross-fade), and leeway should
2630 be zero (which is the default if -q is given). For example, if
2631 f1.wav and f2.wav are audio files to be cross-faded, then
2632 sox f1.wav f2.wav out.wav splice -q $(soxi -D f1.wav),3
2633 cross-fades the files where the point of equal loudness is 3
2634 seconds before the end of f1.wav, i.e. the total length of the
2635 cross-fade is 2 × 3 = 6 seconds (Note: the $(...) notation is
2636 POSIX shell).
2637
2638 stat [-s scale] [-rms] [-freq] [-v] [-d]
2639 Display time and frequency domain statistical information about
2640 the audio. Audio is passed unmodified through the SoX process‐
2641 ing chain.
2642
2643 The information is output to the `standard error' (stderr)
2644 stream and is calculated, where n is the duration of the audio
2645 in samples, c is the number of audio channels, r is the audio
2646 sample rate, and x[k] represents the PCM value (in the range -1 to
2647 +1 by default) of each successive sample in the audio, as
2648 follows:
2649
2650 Samples read        n×c
2651 Length (seconds)    n÷r
2652 Scaled by                                      See -s below.
2653 Maximum amplitude   max(x[k])                  The maximum sample
2654                                                value in the audio;
2655                                                usually this will
2656                                                be a positive
2657                                                number.
2658 Minimum amplitude   min(x[k])                  The minimum sample
2659                                                value in the audio;
2660                                                usually this will
2661                                                be a negative
2662                                                number.
2663 Midline amplitude   ½min(x[k])+½max(x[k])
2664 Mean norm           ¹/nΣ│x[k]│                 The average of the
2665                                                absolute value of
2666                                                each sample in the
2667                                                audio.
2668 Mean amplitude      ¹/nΣx[k]                   The average of each
2669                                                sample in the
2670                                                audio. If this
2671                                                figure is non-zero,
2672                                                then it indicates
2673                                                the presence of a
2674                                                D.C. offset (which
2675                                                could be removed
2676                                                using the dcshift
2677                                                effect).
2678 RMS amplitude       √(¹/nΣx[k]²)               The level of a D.C.
2679                                                signal that would
2680                                                have the same power
2681                                                as the audio's
2682                                                average power.
2683 Maximum delta       max(│x[k]-x[k-1]│)
2684 Minimum delta       min(│x[k]-x[k-1]│)
2685
2686 Mean delta          ¹/(n-1)Σ│x[k]-x[k-1]│
2687 RMS delta           √(¹/(n-1)Σ(x[k]-x[k-1])²)
2688 Rough frequency                                In Hz.
2689 Volume Adjustment                              The parameter to
2690                                                the vol effect
2691                                                which would make
2692                                                the audio as loud
2693                                                as possible without
2694                                                clipping. Note:
2695                                                See the discussion
2696                                                on Clipping above
2697                                                for reasons why it
2698                                                is rarely a good
2699                                                idea actually to do
2700                                                this.
2701
2702 Note that the delta measurements are not applicable for multi-
2703 channel audio.
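Three of the averages listed above can be reproduced on a short list of sample values to check one's understanding of the definitions. This is a sketch of the formulas only, for single-channel data; sample_stats is a hypothetical helper, it requires awk, and it does not read audio files or call SoX.

```shell
#!/bin/sh
# sample_stats X1 X2 ...
# Hypothetical helper: prints the mean norm, mean amplitude, and RMS
# amplitude of the given sample values (single channel, range -1..+1).
sample_stats() {
    printf '%s\n' "$@" | awk '
        {
            n++
            sum  += $1                       # for the mean amplitude
            asum += ($1 < 0 ? -$1 : $1)      # for the mean norm
            sq   += $1 * $1                  # for the RMS amplitude
        }
        END {
            printf "mean_norm=%.4f mean_amplitude=%.4f rms_amplitude=%.4f\n",
                   asum / n, sum / n, sqrt(sq / n)
        }'
}

sample_stats 0.5 -0.5 0.25 -0.25
```

A mean amplitude of zero, as in this example, indicates the absence of a D.C. offset.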
2704
2705 The -s option can be used to scale the input data by a given
2706 factor. The default value of scale is 2147483647 (i.e. the max‐
2707 imum value of a 32-bit signed integer). Internal effects always
2708 work with signed long PCM data and so the value should relate to
2709 this fact.
2710
2711 The -rms option will convert all output average values to `root
2712 mean square' format.
2713
2714 The -v option displays only the `Volume Adjustment' value.
2715
2716 The -freq option calculates the input's power spectrum (4096
2717 point DFT) instead of the statistics listed above. This should
2718 only be used with a single channel audio file.
2719
2720 The -d option displays a hex dump of the 32-bit signed PCM data
2721 audio in SoX's internal buffer. This is mainly used to help
2722 track down endian problems that sometimes occur in cross-plat‐
2723 form versions of SoX.
2724
2725 See also the stats effect.
2726
2727 stats [-b bits|-x bits|-s scale] [-w window-time]
2728 Display time domain statistical information about the audio
2729 channels; audio is passed unmodified through the SoX processing
2730 chain. Statistics are calculated and displayed for each audio
2731 channel and, where applicable, an overall figure is also given.
2732
2733 For example, for a typical well-mastered stereo music file:
2734
2735                 Overall      Left     Right
2736 DC offset      0.000803 -0.000391  0.000803
2737 Min level     -0.750977 -0.750977 -0.653412
2738 Max level      0.708801  0.708801  0.653534
2739 Pk lev dB         -2.49     -2.49     -3.69
2740 RMS lev dB       -19.41    -19.13    -19.71
2741 RMS Pk dB        -13.82    -13.82    -14.38
2742 RMS Tr dB        -85.25    -85.25    -82.66
2743 Crest factor          -      6.79      6.32
2744 Flat factor        0.00      0.00      0.00
2745 Pk count              2         2         2
2746 Bit-depth         16/16     16/16     16/16
2747 Num samples       7.72M
2748 Length s        174.973
2749 Scale max      1.000000
2750 Window s          0.050
2751
2752 DC offset, Min level, and Max level are shown, by default, in
2753 the range ±1. If the -b (bits) option is given, then these
2754 three measurements will be scaled to a signed integer with the
2755 given number of bits; for example, for 16 bits, the scale would
2756 be -32768 to +32767. The -x option behaves the same way as -b
2757 except that the signed integer values are displayed in hexadeci‐
2758 mal. The -s option scales the three measurements by a given
2759 floating-point number.
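The integer range that corresponds to a given -b value is that of a two's-complement signed integer. The sketch below computes it with POSIX shell arithmetic; bit_range is a hypothetical helper, not part of SoX, and it assumes a bit count small enough for the shell's integer type.

```shell
#!/bin/sh
# bit_range BITS
# Hypothetical helper: prints the minimum and maximum values of a
# signed (two's-complement) integer with the given number of bits.
bit_range() {
    echo "$(( -(1 << ($1 - 1)) )) $(( (1 << ($1 - 1)) - 1 ))"
}

bit_range 16
bit_range 24
```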
2760
2761 Pk lev dB and RMS lev dB are standard peak and RMS level mea‐
2762 sured in dBFS. RMS Pk dB and RMS Tr dB are peak and trough val‐
2763 ues for RMS level measured over a short window (default 50ms).
2764
2765 Crest factor is the standard ratio of peak to RMS level (note:
2766 not in dB).
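The relationship between the level, dB, and crest factor figures in the example table above can be checked numerically. The sketch below assumes the usual conventions, i.e. that a level x in the range ±1 corresponds to 20·log10(│x│) dBFS and that the crest factor is the linear ratio of the peak and RMS levels; level_to_db and crest_from_db are hypothetical helpers and require awk.

```shell
#!/bin/sh
# level_to_db LEVEL
# Hypothetical helper: converts a non-zero level in the range -1..+1
# to dBFS, using 20*log10(|LEVEL|).
level_to_db() {
    awk -v x="$1" 'BEGIN {
        if (x < 0) x = -x
        printf "%.2f\n", 20 * log(x) / log(10)
    }'
}

# crest_from_db PEAK_DB RMS_DB
# Hypothetical helper: prints the crest factor as a linear ratio,
# i.e. 10^((PEAK_DB - RMS_DB) / 20).
crest_from_db() {
    awk -v pk="$1" -v rms="$2" 'BEGIN {
        printf "%.2f\n", exp((pk - rms) / 20 * log(10))
    }'
}

level_to_db -0.750977        # Min level of the left channel
crest_from_db -2.49 -19.13   # Pk lev dB and RMS lev dB, left channel
```

The values printed agree with the Pk lev dB and Crest factor figures shown for the left channel in the example table, to the precision given there.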
2767
2768 Flat factor is a measure of the flatness (i.e. consecutive sam‐
2769 ples with the same value) of the signal at its peak levels (i.e.
2770 either Min level, or Max level). Pk count is the number of
2771 occasions (not the number of samples) that the signal attained
2772 either Min level, or Max level.
2773
2774 The right-hand Bit-depth figure is the standard definition of
2775 bit-depth i.e. bits less significant than the given number are
2776 fixed at zero. The left-hand figure is the number of most sig‐
2777 nificant bits that are fixed at zero (or one for negative num‐
2778 bers) subtracted from the right-hand figure (the number sub‐
2779 tracted is directly related to Pk lev dB).
2780
2781 For multi-channel audio, an overall figure for each of the above
2782 measurements is given and derived from the channel figures as
2783 follows: DC offset: maximum magnitude; Max level, Pk lev dB,
2784 RMS Pk dB, Bit-depth: maximum; Min level, RMS Tr dB: minimum;
2785 RMS lev dB, Flat factor, Pk count: average; Crest factor: not
2786 applicable.
2787
2788 Length s is the duration in seconds of the audio, and Num sam‐
2789 ples is equal to the sample-rate multiplied by Length.
2790 Scale max is the scaling applied to the first three measure‐
2791 ments; specifically, it is the maximum value that could apply to
2792 Max level. Window s is the length of the window used for the
2793 peak and trough RMS measurements.
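
The relation between Num samples and Length s can be checked by hand. The sketch below assumes a sample rate of 44100 Hz for the example file; the rate is not shown in the stats output, so that value is an assumption, and num_samples is a hypothetical helper.

```shell
# Hypothetical helper: number of samples per channel for a given
# sample rate (Hz) and length (seconds).
num_samples() {
    awk -v rate="$1" -v len="$2" 'BEGIN { printf "%.0f\n", rate * len }'
}

# Assumed 44100 Hz rate with the example's Length s of 174.973:
num_samples 44100 174.973   # prints 7716309, i.e. about 7.72M
```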
2794
2795 See also the stat effect.
2796
2797 swap Swap stereo channels. See also remix for an effect that allows
2798 arbitrary channel selection and ordering (and mixing).
2799
2800 stretch factor [window fade shift fading]
2801 Change the audio duration (but not its pitch). This effect is
2802 broadly equivalent to the tempo effect with (factor inverted
2803 and) search set to zero, so in general, its results are compara‐
2804 tively poor; it is retained as it can sometimes out-perform
2805 tempo for small factors.
2806
2807 factor is the stretch ratio: >1 lengthens, <1 shortens. window
2808 is the window size in ms (default 20). fade can be `lin'. shift
2809 is the shift ratio, in [0, 1]; by default it depends on factor:
2810 1 to shorten, 0.8 to lengthen. fading is the fading ratio, in
2811 [0, 0.5]; by default it depends on factor and shift.
2812
2813 See also the tempo effect.
2814
2815 synth [-j KEY] [-n] [len [off [ph [p1 [p2 [p3]]]]]] {[type] [combine]
2816 [[%]freq[k][:|+|/|-[%]freq2[k]]] [off [ph [p1 [p2 [p3]]]]]}
2817 This effect can be used to generate fixed or swept frequency
2818 audio tones with various wave shapes, or to generate wide-band
2819 noise of various `colours'. Multiple synth effects can be cas‐
2820 caded to produce more complex waveforms; at each stage it is
2821 possible to choose whether the generated waveform will be mixed
2822 with, or modulated onto the output from the previous stage.
2823 Audio for each channel in a multi-channel audio file can be syn‐
2824 thesised independently.
2825
2826 Though this effect is used to generate audio, an input file must
2827 still be given, the characteristics of which will be used to set
2828 the synthesised audio length, the number of channels, and the
2829 sampling rate; however, since the input file's audio is not nor‐
2830 mally needed, a `null file' (with the special name -n) is often
2831 given instead (and the length specified as a parameter to synth
2832 or by another given effect that has an associated length).
2833
2834 For example, the following produces a 3 second, 48kHz, audio
2835 file containing a sine-wave swept from 300 to 3300 Hz:
2836 sox -n output.wav synth 3 sine 300-3300
2837 and this produces an 8 kHz version:
2838 sox -r 8000 -n output.wav synth 3 sine 300-3300
2839 Multiple channels can be synthesised by specifying the set of
2840 parameters shown between braces multiple times; the following
2841 puts the swept tone in the left channel and adds `brown' noise
2842 in the right:
2843 sox -n output.wav synth 3 sine 300-3300 brownnoise
2844 The following example shows how two synth effects can be cas‐
2845 caded to create a more complex waveform:
2846 play -n synth 0.5 sine 200-500 synth 0.5 sine fmod 700-100
2847 Frequencies can also be given in `scientific' note notation, or,
2848 by prefixing a `%' character, as a number of semitones relative
2849 to `middle A' (440 Hz). For example, the following could be
2850 used to help tune a guitar's low `E' string:
2851 play -n synth 4 pluck %-29
2852 or with a (Bourne shell) loop, the whole guitar:
2853 for n in E2 A2 D3 G3 B3 E4; do
2854 play -n synth 4 pluck $n repeat 2; done
2855 See the delay effect (above) and the reference to `SoX scripting
2856 examples' (below) for more synth examples.
2857
2858 N.B. This effect generates audio at maximum volume (0dBFS),
2859 which means that there is a high chance of clipping when using
2860 the audio subsequently, so in many cases, you will want to fol‐
2861 low this effect with the gain effect to prevent this from hap‐
2862 pening. (See also Clipping above.) Note that, by default, the
2863 synth effect incorporates the functionality of gain -h (see the
2864 gain effect for details); synth's -n option may be given to dis‐
2865 able this behaviour.
2866
2867 A detailed description of each synth parameter follows:
2868
2869 len is the length of audio to synthesise expressed as a time or
2870 as a number of samples; 0=inputlength, default=0.
2871
2872 The format for specifying lengths in time is hh:mm:ss.frac. The
2873 format for specifying sample counts is the number of samples
2874 with the letter `s' appended to it.
2875
2876 type is one of sine, square, triangle, sawtooth, trapezium, exp,
2877 [white]noise, tpdfnoise, pinknoise, brownnoise, pluck;
2878 default=sine.
2879
2880 combine is one of create, mix, amod (amplitude modulation), fmod
2881 (frequency modulation); default=create.
2882
2883 freq/freq2 are the frequencies at the beginning/end of synthesis
2884 in Hz or, if preceded with `%', semitones relative to A
2885 (440 Hz); alternatively, `scientific' note notation (e.g. E2)
2886 may be used. The default frequency is 440Hz. By default, the
2887 tuning used with the note notations is `equal temperament'; the
2888 -j KEY option selects `just intonation', where KEY is an integer
2889 number of semitones relative to A (so for example, -9 or 3
2890 selects the key of C), or a note in scientific notation.
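
To see what frequency a `%' value stands for under the default equal temperament, the semitone offset can be converted to hertz with the standard relation 440 * 2^(n/12). This is only a sketch of the arithmetic; semitones_to_hz is a hypothetical helper, not a SoX command.

```shell
# Hypothetical helper: frequency in Hz of a note n semitones away
# from A (440 Hz), assuming equal temperament.
semitones_to_hz() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * 2 ^ (n / 12) }'
}

semitones_to_hz -29   # 82.41 (the guitar's low E, i.e. E2)
semitones_to_hz 0     # 440.00
semitones_to_hz 12    # 880.00 (one octave up)
```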
2891
2892 If freq2 is given, then len must also have been given and the
2893 generated tone will be swept between the given frequencies. The
2894 two given frequencies must be separated by one of the characters
2895 `:', `+', `/', or `-'. This character is used to specify the
2896 sweep function as follows:
2897
2898 : Linear: the tone will change by a fixed number of hertz
2899 per second.
2900
2901 + Square: a second-order function is used to change the
2902 tone.
2903
2904 / Exponential: the tone will change by a fixed number of
2905 semitones per second.
2906
2907 - Exponential: as `/', but initial phase always zero, and
2908 stepped (less smooth) frequency changes.
2909
2910 Not used for noise.
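
The difference between the linear (`:') and exponential (`/') sweeps can be made concrete by computing the constant rate each one implies. The sketch below derives both rates from the definitions above for a sweep from 300 to 3300 Hz over 3 seconds; sweep_rates is a hypothetical helper.

```shell
# Hypothetical helper: rates implied by sweeping from f1 to f2 (Hz)
# over len seconds, linearly (Hz/s) and exponentially (semitones/s).
sweep_rates() {
    awk -v f1="$1" -v f2="$2" -v len="$3" 'BEGIN {
        printf "%.2f Hz/s\n", (f2 - f1) / len
        printf "%.2f semitones/s\n", 12 * log(f2 / f1) / log(2) / len
    }'
}

sweep_rates 300 3300 3
# prints:
#   1000.00 Hz/s
#   13.84 semitones/s
```

The two sweeps would be requested with `synth 3 sine 300:3300` and `synth 3 sine 300/3300` respectively.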
2911
2912 off is the bias (DC-offset) of the signal in percent; default=0.
2913
2914 ph is the phase shift in percentage of 1 cycle; default=0. Not
2915 used for noise.
2916
2917 p1 is the percentage of each cycle that is `on' (square), or
2918 `rising' (triangle, exp, trapezium); default=50 (square, trian‐
2919 gle, exp), default=10 (trapezium), or sustain (pluck);
2920 default=40.
2921
2922 p2 (trapezium): the percentage through each cycle at which
2923 `falling' begins; default=50. exp: the amplitude in multiples of
2924 2dB; default=50, or tone-1 (pluck); default=20.
2925
2926 p3 (trapezium): the percentage through each cycle at which
2927 `falling' ends; default=60, or tone-2 (pluck); default=90.
2928
2929 tempo [-q] [-m|-s|-l] factor [segment [search [overlap]]]
2930 Change the audio playback speed but not its pitch. This effect
2931 uses the WSOLA algorithm. The audio is chopped up into segments
2932 which are then shifted in the time domain and overlapped (cross-
2933 faded) at points where their waveforms are most similar as
2934 determined by measurement of `least squares'.
2935
2936 By default, linear searches are used to find the best overlap‐
2937 ping points. If the optional -q parameter is given, tree
2938 searches are used instead. This makes the effect work more
2939 quickly, but the result may not sound as good. However, if you
2940 must improve the processing speed, this generally reduces the
2941 sound quality less than reducing the search or overlap values.
2942
2943 The -m option is used to optimize default values of segment,
2944 search and overlap for music processing.
2945
2946 The -s option is used to optimize default values of segment,
2947 search and overlap for speech processing.
2948
2949 The -l option is used to optimize default values of segment,
2950 search and overlap for `linear' processing that tends to cause
2951 more noticeable distortion but may be useful when factor is
2952 close to 1.
2953
2954 If -m, -s, or -l is specified, the default value of segment will
2955 be calculated based on factor, while default search and overlap
2956 values are based on segment. Any values you provide still over‐
2957 ride these default values.
2958
2959 factor gives the ratio of new tempo to the old tempo, so e.g.
2960 1.1 speeds up the tempo by 10%, and 0.9 slows it down by 10%.
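
Since factor is a ratio of tempos, the resulting duration is the original duration divided by factor. A minimal sketch of that arithmetic, using a hypothetical helper name:

```shell
# Hypothetical helper: expected duration (seconds) after applying
# tempo with the given factor to audio of the given length.
tempo_duration() {
    awk -v len="$1" -v factor="$2" 'BEGIN { printf "%.2f\n", len / factor }'
}

tempo_duration 60 1.1   # 54.55 (10% faster, so shorter)
tempo_duration 60 0.9   # 66.67 (10% slower, so longer)
```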
2961
2962 The optional segment parameter selects the algorithm's segment
2963 size in milliseconds. If no other flags are specified, the
2964 default value is 82 and is typically suited to making small
2965 changes to the tempo of music. For larger changes (e.g. a factor
2966 of 2), 41 ms may give a better result. The -m, -s, and -l flags
2967 will cause the segment default to be automatically adjusted
2968 based on factor. For example using -s (for speech) with a tempo
2969 of 1.25 will calculate a default segment value of 32.
2970
2971 The optional search parameter gives the audio length in mil‐
2972 liseconds over which the algorithm will search for overlapping
2973 points. If no other flags are specified, the default value is
2974 14.68. Larger values use more processing time and may or may
2975 not produce better results. A practical maximum is half the
2976 value of segment. Search can be reduced to cut processing time
2977 at the risk of degrading output quality. The -m, -s, and -l
2978 flags will cause the search default to be automatically adjusted
2979 based on segment.
2980
2981 The optional overlap parameter gives the segment overlap length
2982 in milliseconds. Default value is 12, but -m, -s, or -l flags
2983 automatically adjust overlap based on segment size. Increasing
2984 overlap increases processing time and may increase quality. A
2985 practical maximum for overlap is the value of search, with over‐
2986 lap typically being (at least) a little smaller than search.
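
The practical maxima mentioned above (search at most half of segment, overlap at most search) can be checked before running a long job. The following is a sketch only: check_tempo_params is a hypothetical helper, and SoX itself may accept values outside these limits.

```shell
# Hypothetical helper: compare segment, search and overlap (all in ms)
# against the practical maxima described in the text.
check_tempo_params() {
    awk -v seg="$1" -v search="$2" -v overlap="$3" 'BEGIN {
        ok = 1
        if (search > seg / 2) { print "search exceeds half of segment"; ok = 0 }
        if (overlap > search) { print "overlap exceeds search"; ok = 0 }
        if (ok) print "within the practical maxima"
        exit !ok
    }'
}

# The documented defaults (82, 14.68 and 12) pass the check:
check_tempo_params 82 14.68 12   # prints: within the practical maxima
```

The function returns a non-zero status when a limit is exceeded, so it can guard a command such as `tempo 1.1 82 14.68 12`.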
2987
2988 See also speed for an effect that changes tempo and pitch
2989 together, pitch and bend for effects that change pitch only, and
2990 stretch for an effect that changes tempo using a different algo‐
2991 rithm.
2992
2993 treble gain [frequency[k] [width[s|h|k|o|q]]]
2994 Apply a treble tone-control effect. See the description of the
2995 bass effect for details.
2996
2997 tremolo speed [depth]
2998 Apply a tremolo (low frequency amplitude modulation) effect to
2999 the audio. The tremolo frequency in Hz is given by speed, and
3000 the depth as a percentage by depth (default 40).
3001
3002 trim {[=|-]position}
3003 Cuts portions out of the audio. Any number of positions may be
3004 given; audio is not sent to the output until the first position
3005 is reached. The effect then alternates between copying and dis‐
3006 carding audio at each position.
3007
3008 If a position is preceded by an equals or minus sign, it is
3009 interpreted relative to the beginning or the end of the audio,
3010 respectively. (The audio length must be known for end-relative
3011 locations to work.) Otherwise, it is considered an offset from
3012 the last position, or from the start of audio for the first
3013 parameter. Using a value of 0 for the first position parameter
3014 allows copying from the beginning of the audio.
3015
3016 All parameters can be specified using either an amount of time
3017 or an exact count of samples. The format for specifying lengths
3018 in time is hh:mm:ss.frac. A value of 1:30.5 for the first
3019 parameter delays the start of copying until 1 minute 30.5 seconds
3020 into the audio. The format for specifying sample counts is the
3021 number of samples with the letter `s' appended to it. A value
3022 of 8000s for the first parameter will wait until 8000 samples
3023 are read before starting to process audio.
3024
3025 For example,
3026 sox infile outfile trim 0 10
3027 will copy the first ten seconds, while
3028 play infile trim 12:34 =15:00 -2:00
3029 will play from 12 minutes 34 seconds into the audio up to 15
3030 minutes into the audio (i.e. 2 minutes and 26 seconds long),
3031 then resume playing two minutes before the end of audio.
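
To check the arithmetic behind positions such as those in the examples above, a time in hh:mm:ss.frac form can be converted to seconds. The helper below is hypothetical and only handles the time format, not sample counts.

```shell
# Hypothetical helper: convert [[hh:]mm:]ss[.frac] to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        n = split(t, part, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + part[i]
        printf "%g\n", s
    }'
}

to_seconds 1:30.5   # 90.5
to_seconds 12:34    # 754
to_seconds 15:00    # 900
```

The difference between the last two values is 146 seconds, which is the 2 minutes and 26 seconds played by the first part of `trim 12:34 =15:00 -2:00`.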
3032
3033 upsample [factor]
3034 Upsample the signal by an integer factor: factor-1 zero-value
3035 samples are inserted between each pair of input samples. As a
3036 result, the original spectrum is replicated into the new fre‐
3037 quency space (aliasing) and attenuated. This attenuation can be
3038 compensated for by adding vol factor after any further process‐
3039 ing. The upsample effect is typically used in combination with
3040 filtering effects.
3041
3042 For a general resampling effect with anti-aliasing, see rate.
3043 See also downsample.
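
The zero insertion described for upsample can be pictured with a toy sequence of sample values. This sketch follows the wording above (zeros between each pair of input samples) and is not taken from the SoX source, which may treat the final sample differently; zero_stuff is a hypothetical helper.

```shell
# Hypothetical helper: insert factor-1 zeros between each pair of the
# given sample values. Usage: zero_stuff FACTOR SAMPLE...
zero_stuff() {
    factor="$1"; shift
    printf '%s\n' "$@" | awk -v factor="$factor" '
        NR > 1 { for (i = 1; i < factor; i++) out = out " 0" }
        { out = out (NR > 1 ? " " : "") $1 }
        END { print out }'
}

zero_stuff 3 1 2 3   # prints: 1 0 0 2 0 0 3
```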
3044
3045 vad [options]
3046 Voice Activity Detector. Attempts to trim silence and quiet
3047 background sounds from the ends of (fairly high resolution i.e.
3048 16-bit, 44-48kHz) recordings of speech. The algorithm currently
3049 uses a simple cepstral power measurement to detect voice, so may
3050 be fooled by other things, especially music. The effect can
3051 trim only from the front of the audio, so in order to trim from
3052 the back, the reverse effect must also be used. E.g.
3053 play speech.wav norm vad
3054 to trim from the front,
3055 play speech.wav norm reverse vad reverse
3056 to trim from the back, and
3057 play speech.wav norm vad reverse vad reverse
3058 to trim from both ends. The use of the norm effect is recom‐
3059 mended, but remember that neither reverse nor norm is suitable
3060 for use with streamed audio.
3061
3062 Options:
3063 Default values are shown in parentheses.
3064
3065 -t num (7)
3066 The measurement level used to trigger activity detection.
3067 This might need to be changed depending on the noise
3068 level, signal level and other characteristics of the input
3069 audio.
3070
3071 -T num (0.25)
3072 The time constant (in seconds) used to help ignore short
3073 bursts of sound.
3074
3075 -s num (1)
3076 The amount of audio (in seconds) to search for qui‐
3077 eter/shorter bursts of audio to include prior to the
3078 detected trigger point.
3079
3080 -g num (0.25)
3081 Allowed gap (in seconds) between quieter/shorter bursts
3082 of audio to include prior to the detected trigger point.
3083
3084 -p num (0)
3085 The amount of audio (in seconds) to preserve before the
3086 trigger point and any found quieter/shorter bursts.
3087
3088 Advanced Options:
3089 These allow fine tuning of the algorithm's internal parameters.
3090
3091 -b num The algorithm (internally) uses adaptive noise estima‐
3092 tion/reduction in order to detect the start of the wanted
3093 audio. This option sets the time for the initial noise
3094 estimate.
3095
3096 -N num Time constant used by the adaptive noise estimator for
3097 when the noise level is increasing.
3098
3099 -n num Time constant used by the adaptive noise estimator for
3100 when the noise level is decreasing.
3101
3102 -r num Amount of noise reduction to use in the detection algo‐
3103 rithm (e.g. 0, 0.5, ...).
3104
3105 -f num Frequency of the algorithm's processing/measurements.
3106
3107 -m num Measurement duration; by default, twice the measurement
3108 period; i.e. with overlap.
3109
3110 -M num Time constant used to smooth spectral measurements.
3111
3112 -h num `Brick-wall' frequency of high-pass filter applied at the
3113 input to the detector algorithm.
3114
3115 -l num `Brick-wall' frequency of low-pass filter applied at the
3116 input to the detector algorithm.
3117
3118 -H num `Brick-wall' frequency of high-pass lifter used in the
3119 detector algorithm.
3120
3121 -L num `Brick-wall' frequency of low-pass lifter used in the
3122 detector algorithm.
3123
3124 See also the silence effect.
3125
3126 vol gain [type [limitergain]]
3127 Apply an amplification or an attenuation to the audio signal.
3128 Unlike the -v option (which is used for balancing multiple input
3129 files as they enter the SoX effects processing chain), vol is an
3130 effect like any other so can be applied anywhere, and several
3131 times if necessary, during the processing chain.
3132
3133 The amount to change the volume is given by gain which is inter‐
3134 preted, according to the given type, as follows: if type is
3135 amplitude (or is omitted), then gain is an amplitude (i.e. volt‐
3136 age or linear) ratio, if power, then a power (i.e. wattage or
3137 voltage-squared) ratio, and if dB, then a power change in dB.
3138
3139 When type is amplitude or power, a gain of 1 leaves the volume
3140 unchanged, less than 1 decreases it, and greater than 1
3141 increases it; a negative gain inverts the audio signal in addi‐
3142 tion to adjusting its volume.
3143
3144 When type is dB, a gain of 0 leaves the volume unchanged, less
3145 than 0 decreases it, and greater than 0 increases it.
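
The three gain types describe the same change in different units. The sketch below converts a gain of any type to the equivalent amplitude ratio using the standard relations (amplitude = sqrt(power), amplitude = 10^(dB/20)). It is an illustration only: vol_as_amplitude is a hypothetical helper, and it handles only non-negative power gains.

```shell
# Hypothetical helper: express a vol gain of the given type
# (amplitude, power or dB; default amplitude) as an amplitude ratio.
vol_as_amplitude() {
    awk -v gain="$1" -v type="${2:-amplitude}" 'BEGIN {
        if (type == "amplitude")  a = gain
        else if (type == "power") a = sqrt(gain)
        else if (type == "dB")    a = 10 ^ (gain / 20)
        else { print "unknown type: " type; exit 1 }
        printf "%.4f\n", a
    }'
}

vol_as_amplitude 0.5        # 0.5000
vol_as_amplitude 4 power    # 2.0000
vol_as_amplitude 10 dB      # 3.1623
vol_as_amplitude -6 dB      # 0.5012
```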
3146
3147 See [4] for a detailed discussion on electrical (and hence audio
3148 signal) voltage and power ratios.
3149
3150 Beware of Clipping when increasing the volume.
3151
3152 The gain and the type parameters can be concatenated if desired,
3153 e.g. vol 10dB.
3154
3155 An optional limitergain value can be specified and should be a
3156 value much less than 1 (e.g. 0.05 or 0.02) and is used only on
3157 peaks to prevent clipping. Not specifying this parameter will
3158 cause no limiter to be used. In verbose mode, this effect will
3159 display the percentage of the audio that needed to be limited.
3160
3161 See also gain for a volume-changing effect with different capa‐
3162 bilities, and compand for a dynamic-range compression/expan‐
3163 sion/limiting effect.
3164
3165 Deprecated Effects
3166 The following effects have been renamed or have their functionality
3167 included in another effect; they continue to work in this version of
3168 SoX but may be removed in future.
3169
3170 mixer [ -l|-r|-f|-b|-1|-2|-3|-4|n{,n} ]
3171 Reduce the number of audio channels by mixing or selecting chan‐
3172 nels, or increase the number of channels by duplicating chan‐
3173 nels. Note: this effect operates on the audio channels within
3174 the SoX effects processing chain; it should not be confused with
3175 the -m global option (where multiple files are mix-combined
3176 before entering the effects chain).
3177
3178 When reducing the number of channels it is possible to use the
3179 -l, -r, -f, -b, -1, -2, -3, -4, options to select only the left,
3180 right, front, back channel(s) or specific channel for the output
3181 instead of averaging the channels. The -l, and -r options will
3182 do averaging in quad-channel files so select the exact channel
3183 to prevent this.
3184
3185 The mixer effect can also be invoked with up to 16 numbers, sep‐
3186 arated by commas, which specify the proportion (0 = 0% and 1 =
3187 100%) of each input channel that is to be mixed into each output
3188 channel. In two-channel mode, 4 numbers are given: l → l, l →
3189 r, r → l, and r → r, respectively. In four-channel mode, the
3190 first 4 numbers give the proportions for the left-front output
3191 channel, as follows: lf → lf, rf → lf, lb → lf, and rb → lf.
3192 The next 4 give the right-front output in the same order, then
3193 left-back and right-back.
3194
3195 It is also possible to use the 16 numbers to expand or reduce
3196 the channel count; just specify 0 for unused channels.
3197
3198 Finally, certain reduced combinations of numbers can be specified
3199 for certain input/output channel combinations.
3200
3201 In Ch Out Ch Num Mappings
3202 2 1 2 l → l, r → l
3203 2 2 1 adjust balance
3204 4 1 4 lf → l, rf → l, lb → l, rb → l
3205 4 2 2 lf → l&rf → r, lb → l&rb → r
3206 4 4 1 adjust balance
3207 4 4 2 front balance, back balance
3208
3209 This effect has been superseded by the remix effect that handles
3210 any number of channels.
3211
3213 Exit status is 0 for no error, 1 if there is a problem with the com‐
3214 mand-line parameters, or 2 if an error occurs during file processing.
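
In scripts, the exit status can be mapped back to the three cases listed above. A minimal sketch, with describe_sox_status being a hypothetical helper name:

```shell
# Hypothetical helper: translate a SoX exit status into the meaning
# documented above.
describe_sox_status() {
    case "$1" in
        0) echo "no error" ;;
        1) echo "problem with the command-line parameters" ;;
        2) echo "error during file processing" ;;
        *) echo "unexpected status $1" ;;
    esac
}

describe_sox_status 2   # prints: error during file processing
```

A typical use would be to run `sox infile outfile trim 0 10` and then call `describe_sox_status $?` on the next line.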
3215
3217 Please report any bugs found in this version of SoX to the mailing list
3218 (sox-users@lists.sourceforge.net).
3219
3221 soxi(1), soxformat(7), libsox(3)
3222 audacity(1), gnuplot(1), octave(1), wget(1)
3223 The SoX web site at http://sox.sourceforge.net
3224 SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts
3225
3226 References
3227 [1] R. Bristow-Johnson, Cookbook formulae for audio EQ biquad filter
3228 coefficients, http://musicdsp.org/files/Audio-EQ-Cookbook.txt
3229
3230 [2] Wikipedia, Q-factor, http://en.wikipedia.org/wiki/Q_factor
3231
3232 [3] Scott Lehman, Effects Explained,
3233 http://harmony-central.com/Effects/effects-explained.html
3234
3235 [4] Wikipedia, Decibel, http://en.wikipedia.org/wiki/Decibel
3236
3237 [5] Richard Furse, Linux Audio Developer's Simple Plugin API,
3238 http://www.ladspa.org
3239
3240 [6] Richard Furse, Computer Music Toolkit, http://www.ladspa.org/cmt
3241
3242 [7] Steve Harris, LADSPA plugins, http://plugin.org.uk
3243
3245 Copyright 1998-2013 Chris Bagwell and SoX Contributors.
3246 Copyright 1991 Lance Norskog and Sundry Contributors.
3247
3248 This program is free software; you can redistribute it and/or modify it
3249 under the terms of the GNU General Public License as published by the
3250 Free Software Foundation; either version 2, or (at your option) any
3251 later version.
3252
3253 This program is distributed in the hope that it will be useful, but
3254 WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
3255 CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
3256 Public License for more details.
3257
3259 Chris Bagwell (cbagwell@users.sourceforge.net). Other authors and con‐
3260 tributors are listed in the ChangeLog file that is distributed with the
3261 source code.
3262
3263
3264
3265sox February 1, 2013 SoX(1)