1 SoX(1) Sound eXchange SoX(1)
2
3
4
6 SoX - Sound eXchange, the Swiss Army knife of audio manipulation
7
9 sox [global-options] [format-options] infile1
10 [[format-options] infile2] ... [format-options] outfile
11 [effect [effect-options]] ...
12
13 play [global-options] [format-options] infile1
14 [[format-options] infile2] ... [format-options]
15 [effect [effect-options]] ...
16
17 rec [global-options] [format-options] outfile
18 [effect [effect-options]] ...
19
21 Introduction
22 SoX reads and writes audio files in most popular formats and can
23 optionally apply effects to them. It can combine multiple input
24 sources, synthesise audio, and, on many systems, act as a general pur‐
25 pose audio player or a multi-track audio recorder. It also has limited
26 ability to split the input into multiple output files.
27
28 All SoX functionality is available using just the sox command. To sim‐
29 plify playing and recording audio, if SoX is invoked as play, the out‐
30 put file is automatically set to be the default sound device, and if
31 invoked as rec, the default sound device is used as an input source.
32 Additionally, the soxi(1) command provides a convenient way to just
33 query audio file header information.
34
35 The heart of SoX is a library called libSoX. Those interested in
36 extending SoX or using it in other programs should refer to the libSoX
37 manual page: libsox(3).
38
39 SoX is a command-line audio processing tool, particularly suited to
40 making quick, simple edits and to batch processing. If you need an
41 interactive, graphical audio editor, use audacity(1).
42
43 * * *
44
45 The overall SoX processing chain can be summarised as follows:
46
47 Input(s) → Combiner → Effects → Output(s)
48
49 Note however, that on the SoX command line, the positions of the Out‐
50 put(s) and the Effects are swapped w.r.t. the logical flow just shown.
51 Note also that whilst options pertaining to files are placed before
52 their respective file name, the opposite is true for effects. To show
53 how this works in practice, here is a selection of examples of how SoX
54 might be used. The simple
55 sox recital.au recital.wav
56 translates an audio file in Sun AU format to a Microsoft WAV file,
57 whilst
58 sox recital.au -b 16 recital.wav channels 1 rate 16k fade 3 norm
59 performs the same format translation, but also applies four effects
60 (down-mix to one channel, sample rate change, fade-in, normalize), and
61 stores the result at a bit-depth of 16.
62 sox -r 16k -e signed -b 8 -c 1 voice-memo.raw voice-memo.wav
63 converts `raw' (a.k.a. `headerless') audio to a self-describing file
64 format,
65 sox slow.aiff fixed.aiff speed 1.027
66 adjusts audio speed,
67 sox short.wav long.wav longer.wav
68 concatenates two audio files, and
69 sox -m music.mp3 voice.wav mixed.flac
70 mixes together two audio files.
71 play "The Moonbeams/Greatest/*.ogg" bass +3
72 plays a collection of audio files whilst applying a bass boosting
73 effect,
74 play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1
75 plays a synthesised `A minor seventh' chord with a pipe-organ sound,
76 rec -c 2 radio.aiff trim 0 30:00
77 records half an hour of stereo audio, and
78 play -q take1.aiff & rec -M take1.aiff take1-dub.aiff
79 (with POSIX shell and where supported by hardware) records a new track
80 in a multi-track recording. Finally,
81 rec -r 44100 -b 16 -e signed-integer -p silence 1 0.50 0.1% 1 10:00 0.1% | \
82 sox -p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \
83 newfile : restart
84 records a stream of audio such as LP/cassette and splits it into multiple
85 audio files at points with 2 seconds of silence. Also, it does not
86 start recording until it detects that audio is playing, and it stops after
87 it detects 10 minutes of silence.
88
89 N.B. The above is just an overview of SoX's capabilities; detailed
90 explanations of how to use all SoX parameters, file formats, and
91 effects can be found below in this manual, in soxformat(7), and in
92 soxi(1).
93
94 File Format Types
95 SoX can work with `self-describing' and `raw' audio files. `Self-
96 describing' formats (e.g. WAV, FLAC, MP3) have a header that completely
97 describes the signal and encoding attributes of the audio data that
98 follows. `Raw' or `headerless' formats do not contain this information,
99 so the audio characteristics of these must be described on the SoX com‐
100 mand line or inferred from those of the input file.
101
102 The following four characteristics are used to describe the format of
103 audio data such that it can be processed with SoX:
104
105 sample rate
106 The sample rate in samples per second (`Hertz' or `Hz'). Digi‐
107 tal telephony traditionally uses a sample rate of 8000 Hz
108 (8 kHz), though these days, 16 and even 32 kHz are becoming more
109 common. Audio Compact Discs use 44100 Hz (44.1 kHz). Digital
110 Audio Tape and many computer systems use 48 kHz. Professional
111 audio systems often use 96 kHz.
112
113 sample size
114 The number of bits used to store each sample. Today, 16-bit is
115 commonly used. 8-bit was popular in the early days of computer
116 audio. 24-bit is used in the professional audio arena. Other
117 sizes are also used.
118
119 data encoding
120 The way in which each audio sample is represented (or
121 `encoded'). Some encodings have variants with different byte-
122 orderings or bit-orderings. Some compress the audio data so
123 that the stored audio data takes up less space (i.e. disk space
124 or transmission bandwidth) than the other format parameters and
125 the number of samples would imply. Commonly-used encoding types
126 include floating-point, μ-law, ADPCM, signed-integer PCM, MP3,
127 and FLAC.
128
129 channels
130 The number of audio channels contained in the file. One
131 (`mono') and two (`stereo') are widely used. `Surround sound'
132 audio typically contains six or more channels.
133
134 The term `bit-rate' is a measure of the amount of storage occupied by
135 an encoded audio signal over a unit of time. It can depend on all of
136 the above and is typically denoted as a number of kilo-bits per second
137 (kbps). An A-law telephony signal has a bit-rate of 64 kbps.
138 MP3-encoded stereo music typically has a bit-rate of 128-196 kbps.
139 FLAC-encoded stereo music typically has a bit-rate of 550-760 kbps.
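
For uncompressed audio, the bit-rate follows directly from the sample rate, sample size, and number of channels described above. The following is a minimal sketch of that arithmetic; pcm_kbps is a hypothetical helper name and is not part of SoX.

```shell
# Bit-rate of uncompressed audio in kbps:
#   sample rate (Hz) x sample size (bits) x channels / 1000
# pcm_kbps is a hypothetical helper, not a SoX command.
pcm_kbps() {
    LC_ALL=C awk -v r="$1" -v b="$2" -v c="$3" \
        'BEGIN { printf "%.1f\n", r * b * c / 1000 }'
}

pcm_kbps 8000 8 1     # 8 kHz, 8 bits per sample, mono (telephony)
pcm_kbps 44100 16 2   # Audio CD
```

The first call prints 64.0, matching the A-law telephony figure quoted above; the second prints 1411.2, which gives a sense of how much the compressed encodings save.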
140
141 Most self-describing formats also allow textual `comments' to be embed‐
142 ded in the file that can be used to describe the audio in some way,
143 e.g. for music, the title, the author, etc.
144
145 One important use of audio file comments is to convey `Replay Gain'
146 information. SoX supports applying Replay Gain information, but not
147 generating it. Note that by default, SoX copies input file comments to
148 output files that support comments, so output files may contain Replay
149 Gain information if some was present in the input file. In this case,
150 if anything other than a simple format conversion was performed then
151 the output file Replay Gain information is likely to be incorrect and
152 so should be recalculated using a tool that supports this (not SoX).
153
154 The soxi(1) command can be used to display information from audio file
155 headers.
156
157 Determining & Setting The File Format
158 There are several mechanisms available for SoX to use to determine or
159 set the format characteristics of an audio file. Depending on the cir‐
160 cumstances, individual characteristics may be determined or set using
161 different mechanisms.
162
163 To determine the format of an input file, SoX will use, in order of
164 precedence and as given or available:
165
166 1. Command-line format options.
167
168 2. The contents of the file header.
169
170 3. The filename extension.
171
172 To set the output file format, SoX will use, in order of precedence and
173 as given or available:
174
175 1. Command-line format options.
176
177 2. The filename extension.
178
179 3. The input file format characteristics, or the closest that is sup‐
180 ported by the output file type.
181
182 For all files, SoX will exit with an error if the file type cannot be
183 determined. Command-line format options may need to be added or changed
184 to resolve the problem.
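
As a sketch of the first mechanism: a headerless file gives SoX no header to inspect, so each characteristic is supplied as a format option placed before the input file name. The helper name and the file names below are hypothetical; the output file type is taken from the extension of the second argument, and the remaining output characteristics fall through to the input file's.

```shell
# Hypothetical helper: convert headerless 8 kHz, 8-bit, mono A-law data
# to a self-describing file. The format options before "$1" describe the
# input; the output type comes from the extension of "$2".
raw_alaw_to_wav() {
    sox -t raw -r 8000 -e a-law -b 8 -c 1 "$1" "$2"
}

# Example call (file names are placeholders):
#   raw_alaw_to_wav call.raw call.wav
```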
185
186 Playing & Recording Audio
187 The play and rec commands are provided so that basic playing and
188 recording is as simple as
189 play existing-file.wav
190 and
191 rec new-file.wav
192 These two commands are functionally equivalent to
193 sox existing-file.wav -d
194 and
195 sox -d new-file.wav
196 Of course, further options and effects (as described below) can be
197 added to the commands in either form.
198
199 * * *
200
201 Some systems provide more than one type of (SoX-compatible) audio
202 driver, e.g. ALSA & OSS, or SUNAU & AO. Systems can also have more
203 than one audio device (a.k.a. `sound card'). If more than one audio
204 driver has been built-in to SoX, and the default selected by SoX when
205 recording or playing is not the one that is wanted, then the AUDIO‐
206 DRIVER environment variable can be used to override the default. For
207 example (on many systems):
208 set AUDIODRIVER=oss
209 play ...
210 The AUDIODEV environment variable can be used to override the default
211 audio device, e.g.
212 set AUDIODEV=/dev/dsp2
213 play ...
214 sox ... -t oss
215 or
216 set AUDIODEV=hw:soundwave,1,2
217 play ...
218 sox ... -t alsa
219 Note that the way of setting environment variables varies from system
220 to system - for some specific examples, see `SOX_OPTS' below.
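
In a POSIX shell (sh, bash, etc.) the equivalent of the settings above uses export. The following sketch assumes an ALSA system; the driver and device values are examples only and should be replaced with whatever your system provides.

```shell
# POSIX-shell sketch: choose the driver and device for later play/rec calls.
# "alsa" and "hw:0,0" are example values, not defaults.
export AUDIODRIVER=alsa
export AUDIODEV=hw:0,0

# A variable can also be set for a single command only, e.g.:
#   AUDIODEV=hw:1,0 play existing-file.wav
echo "driver=$AUDIODRIVER device=$AUDIODEV"
```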
221
222 When playing a file with a sample rate that is not supported by the
223 audio output device, SoX will automatically invoke the rate effect to
224 perform the necessary sample rate conversion. For compatibility with
225 old hardware, the default rate quality level is set to `low'. This can
226 be changed by explicitly specifying the rate effect with a different
227 quality level, e.g.
228 play ... rate -m
229 or by using the --play-rate-arg option (see below).
230
231 * * *
232
233 On some systems, SoX allows audio playback volume to be adjusted whilst
234 using play. Where supported, this is achieved by tapping the `v' & `V'
235 keys during playback.
236
237 To help with setting a suitable recording level, SoX includes a peak-
238 level meter which can be invoked (before making the actual recording)
239 as follows:
240 rec -n
241 The recording level should be adjusted (using the system-provided mixer
242 program, not SoX) so that the meter is at most occasionally full scale,
243 and never `in the red' (an exclamation mark is shown). See also -S
244 below.
245
246 Accuracy
247 Many file formats that compress audio discard some of the audio signal
248 information whilst doing so. Converting to such a format and then con‐
249 verting back again will not produce an exact copy of the original
250 audio. This is the case for many formats used in telephony (e.g. A-
251 law, GSM) where low signal bandwidth is more important than high audio
252 fidelity, and for many formats used in portable music players (e.g.
253 MP3, Vorbis) where adequate fidelity can be retained even with the
254 large compression ratios that are needed to make portable players prac‐
255 tical.
256
257 Formats that discard audio signal information are called `lossy'. For‐
258 mats that do not are called `lossless'. The term `quality' is used as
259 a measure of how closely the original audio signal can be reproduced
260 when using a lossy format.
261
262 Audio file conversion with SoX is lossless when it can be, i.e. when
263 not using lossy compression, when not reducing the sampling rate or
264 number of channels, and when the number of bits used in the destination
265 format is not less than in the source format. E.g. converting from an
266 8-bit PCM format to a 16-bit PCM format is lossless but converting from
267 an 8-bit PCM format to (8-bit) A-law isn't.
268
269 N.B. SoX converts all audio files to an internal uncompressed format
270 before performing any audio processing. This means that manipulating a
271 file that is stored in a lossy format can cause further losses in audio
272 fidelity. E.g. with
273 sox long.mp3 short.mp3 trim 10
274 SoX first decompresses the input MP3 file, then applies the trim
275 effect, and finally creates the output MP3 file by re-compressing the
276 audio - with a possible reduction in fidelity above that which occurred
277 when the input file was created. Hence, if what is ultimately desired
278 is lossily compressed audio, it is highly recommended to perform all
279 audio processing using lossless file formats and then convert to the
280 lossy format only at the final stage.
281
282 N.B. Applying multiple effects with a single SoX invocation will, in
283 general, produce more accurate results than those produced using multi‐
284 ple SoX invocations.
285
286 Dithering
287 Dithering is a technique used to maximise the dynamic range of audio
288 stored at a particular bit-depth. Any distortion introduced by quanti‐
289 sation is decorrelated by adding a small amount of white noise to the
290 signal. In most cases, SoX can determine whether the selected process‐
291 ing requires dither and will add it during output formatting if appro‐
292 priate.
293
294 Specifically, by default, SoX automatically adds TPDF dither when the
295 output bit-depth is less than 24 and any of the following are true:
296
297 · bit-depth reduction has been specified explicitly using a command-
298 line option
299
300 · the output file format supports only bit-depths lower than that of
301 the input file format
302
303 · an effect has increased effective bit-depth within the internal
304 processing chain
305
306 For example, adjusting volume with vol 0.25 requires two additional
307 bits in which to losslessly store its results (since 0.25 decimal
308 equals 0.01 binary). So if the input file bit-depth is 16, then SoX's
309 internal representation will utilise 18 bits after processing this vol‐
310 ume change. In order to store the output at the same depth as the
311 input, dithering is used to remove the additional bits.
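
The bit-depth arithmetic in this example can be sketched as follows. The helper extra_bits is hypothetical and only handles the simple case of division by an exact power of two (vol 0.25 divides by 4); it is an illustration of the reasoning, not of SoX's internals.

```shell
# Count the extra bits needed to store a sample exactly after dividing it
# by a power of two. extra_bits is a hypothetical helper, not part of SoX.
extra_bits() {
    divisor=$1
    bits=0
    while [ "$divisor" -gt 1 ]; do
        divisor=$((divisor / 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}

input_depth=16
echo $((input_depth + $(extra_bits 4)))   # effective depth after vol 0.25
```

This prints 18, the internal bit-depth quoted above for a 16-bit input.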
312
313 Use the -V option to see what processing SoX has automatically added.
314 The -D option may be given to override automatic dithering. To invoke
315 dithering manually (e.g. to select a noise-shaping curve), see the
316 dither effect.
317
318 Clipping
319 Clipping is distortion that occurs when an audio signal level (or `vol‐
320 ume') exceeds the range of the chosen representation. In most cases,
321 clipping is undesirable and so should be corrected by adjusting the
322 level prior to the point (in the processing chain) at which it occurs.
323
324 In SoX, clipping could occur, as you might expect, when using the vol
325 or gain effects to increase the audio volume. Clipping could also occur
326 with many other effects, when converting one format to another, and
327 even when simply playing the audio.
328
329 Playing an audio file often involves resampling, and processing by ana‐
330 logue components can introduce a small DC offset and/or amplification,
331 all of which can produce distortion if the audio signal level was ini‐
332 tially too close to the clipping point.
333
334 For these reasons, it is usual to make sure that an audio file's signal
335 level has some `headroom', i.e. it does not exceed a particular level
336 below the maximum possible level for the given representation. Some
337 standards bodies recommend as much as 9dB headroom, but in most cases,
338 3dB (≈ 70% linear) is enough. Note that this wisdom seems to have been
339 lost in modern music production; in fact, many CDs, MP3s, etc. are now
340 mastered at levels above 0dBFS i.e. the audio is clipped as delivered.
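
The relation between headroom in dB and linear amplitude is 10^(-dB/20). A minimal sketch of the conversion, using a hypothetical helper name:

```shell
# Convert a headroom figure in dB to a linear amplitude ratio: 10^(-dB/20).
# headroom_linear is a hypothetical helper, not part of SoX.
headroom_linear() {
    LC_ALL=C awk -v db="$1" \
        'BEGIN { printf "%.2f\n", exp(-db / 20 * log(10)) }'
}

headroom_linear 3
headroom_linear 9
```

The calls print 0.71 and 0.35 respectively, i.e. 3dB of headroom leaves peaks at roughly 70% of full scale and 9dB at roughly 35%.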
341
342 SoX's stat and stats effects can assist in determining the signal level
343 in an audio file. The gain or vol effect can be used to prevent clip‐
344 ping, e.g.
345 sox dull.wav bright.wav gain -6 treble +6
346 guarantees that the treble boost will not clip.
347
348 If clipping occurs at any point during processing, SoX will display a
349 warning message to that effect.
350
351 See also -G and the gain and norm effects.
352
353 Input File Combining
354 SoX's input combiner can be configured (see OPTIONS below) to combine
355 multiple files using any of the following methods: `concatenate',
356 `sequence', `mix', `mix-power', `merge', or `multiply'. The default
357 method is `sequence' for play, and `concatenate' for rec and sox.
358
359 For all methods other than `sequence', multiple input files must have
360 the same sampling rate. If necessary, separate SoX invocations can be
361 used to make sampling rate adjustments prior to combining.
362
363 If the `concatenate' combining method is selected (usually, this will
364 be by default) then the input files must also have the same number of
365 channels. The audio from each input will be concatenated in the order
366 given to form the output file.
367
368 The `sequence' combining method is selected automatically for play. It
369 is similar to `concatenate' in that the audio from each input file is
370 sent serially to the output file. However, here the output file may be
371 closed and reopened at the corresponding transition between input
372 files. This may be just what is needed when sending different types of
373 audio to an output device, but is not generally useful when the output
374 is a normal file.
375
376 If either the `mix' or `mix-power' combining method is selected then
377 two or more input files must be given and will be mixed together to
378 form the output file. The number of channels in each input file need
379 not be the same, but SoX will issue a warning if they are not and some
380 channels in the output file will not contain audio from every input
381 file. A mixed audio file cannot be un-mixed without reference to the
382 original input files.
383
384 If the `merge' combining method is selected then two or more input
385 files must be given and will be merged together to form the output
386 file. The number of channels in each input file need not be the same.
387 A merged audio file comprises all of the channels from all of the input
388 files. Un-merging is possible using multiple invocations of SoX with
389 the remix effect. For example, two mono files could be merged to form
390 one stereo file. The first and second mono files would become the left
391 and right channels of the stereo file.
392
393 The `multiply' combining method multiplies the sample values of corre‐
394 sponding channels (treated as numbers in the interval -1 to +1). If
395 the number of channels in the input files is not the same, the missing
396 channels are considered to contain all zero.
397
398 When combining input files, SoX applies any specified effects (includ‐
399 ing, for example, the vol volume adjustment effect) after the audio has
400 been combined. However, it is often useful to be able to set the volume
401 of (i.e. `balance') the inputs individually, before combining takes
402 place.
403
404 For all combining methods, input file volume adjustments can be made
405 manually using the -v option (below) which can be given for one or more
406 input files. If it is given for only some of the input files then the
407 others receive no volume adjustment. In some circumstances, automatic
408 volume adjustments may be applied (see below).
409
410 The -V option (below) can be used to show the input file volume adjust‐
411 ments that have been selected (either manually or automatically).
412
413 There are some special considerations that need to be made when mixing
414 input files:
415
416 Unlike the other methods, `mix' combining has the potential to cause
417 clipping in the combiner if no balancing is performed. In this case,
418 if manual volume adjustments are not given, SoX will try to ensure that
419 clipping does not occur by automatically adjusting the volume (ampli‐
420 tude) of each input signal by a factor of ¹/n, where n is the number of
421 input files. If this results in audio that is too quiet or otherwise
422 unbalanced then the input file volumes can be set manually as described
423 above. Using the norm effect on the mix is another alternative.
424
425 If mixed audio seems loud enough at some points but too quiet in others
426 then dynamic range compression should be applied to correct this - see
427 the compand effect.
428
429 With the `mix-power' combine method, the mixed volume is approximately
430 equal to that of one of the input signals. This is achieved by balanc‐
431 ing using a factor of ¹/√n instead of ¹/n. Note that this balancing
432 factor does not guarantee that clipping will not occur, but the number
433 of clips will usually be low and the resultant distortion is generally
434 imperceptible.
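
The two automatic balancing factors can be compared with a short calculation. This is only a sketch of the arithmetic described above; mix_factors is a hypothetical helper name.

```shell
# Per-input scaling applied automatically for n input files:
#   mix       -> 1/n
#   mix-power -> 1/sqrt(n)
# mix_factors is a hypothetical helper, not part of SoX.
mix_factors() {
    LC_ALL=C awk -v n="$1" \
        'BEGIN { printf "mix=%.3f mix-power=%.3f\n", 1 / n, 1 / sqrt(n) }'
}

mix_factors 2
mix_factors 4
```

For two inputs this prints mix=0.500 mix-power=0.707, and for four inputs mix=0.250 mix-power=0.500. If neither suits, giving -v before each input file sets the factors manually instead.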
435
436 Output Files
437 SoX's default behaviour is to take one or more input files and write
438 them to a single output file.
439
440 This behaviour can be changed by specifying the pseudo-effect `newfile'
441 within the effects list. SoX will then enter multiple output mode.
442
443 In multiple output mode, a new file is created when the effects prior
444 to the `newfile' indicate they are done. The effects chain listed
445 after `newfile' is then started up and its output is saved to the new
446 file.
447
448 In multiple output mode, a unique number will automatically be appended
449 to the end of all filenames. If the filename has an extension then the
450 number is inserted before the extension. This behaviour can be custom‐
451 ized by placing a %n anywhere in the filename where the number should
452 be substituted. An optional number can be placed after the % to indi‐
453 cate a minimum fixed width for the number.
454
455 Multiple output mode is not very useful unless an effect that will stop
456 the effects chain early is specified before the `newfile'. If end of
457 file is reached before the effects chain stops itself then no new file
458 will be created as it would be empty.
459
460 The following is an example of splitting the first 60 seconds of an
461 input file into two 30 second files and ignoring the rest.
462 sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
463
464 Stopping SoX
465 Usually SoX will complete its processing and exit automatically once it
466 has read all available audio data from the input files.
467
468 If desired, it can be terminated earlier by sending an interrupt signal
469 to the process (usually by pressing the keyboard interrupt key which is
470 normally Ctrl-C). This is a natural requirement in some circumstances,
471 e.g. when using SoX to make a recording. Note that when using SoX to
472 play multiple files, Ctrl-C behaves slightly differently: pressing it
473 once causes SoX to skip to the next file; pressing it twice in quick
474 succession causes SoX to exit.
475
476 Another option to stop processing early is to use an effect that has a
477 time period or sample count to determine the stopping point. The trim
478 effect is an example of this. Once all effects chains have stopped
479 then SoX will also stop.
480
481 FILENAMES
482 Filenames can be simple file names, absolute or relative path names, or
483 URLs (input files only). Note that URL support requires that wget(1)
484 is available.
485
486 Note: Giving SoX an input or output filename that is the same as a SoX
487 effect-name will not work since SoX will treat it as an effect
488 specification. The only work-around to this is to avoid such
489 filenames. This is generally not difficult since most audio filenames
490 have a filename `extension', whilst effect-names do not.
491
492 Special Filenames
493 The following special filenames may be used in certain circumstances in
494 place of a normal filename on the command line:
495
496 - SoX can be used in simple pipeline operations by using the
497 special filename `-' which, if used as an input filename, will
498 cause SoX to read audio data from `standard input' (stdin),
499 and which, if used as the output filename, will cause SoX to
500 send audio data to `standard output' (stdout). Note that when
501 using this option for the output file, and sometimes when using
502 it for an input file, the file-type (see -t below) must also be
503 given.
504
505 "|program [options] ..."
506 This can be used in place of an input filename to specify that
507 the given program's standard output (stdout) be used as an input
508 file. Unlike - (above), this can be used for several inputs to
509 one SoX command. For example, if `genw' generates mono WAV
510 formatted signals to its standard output, then the following
511 command makes a stereo file from two generated signals:
512 sox -M "|genw --imd -" "|genw --thd -" out.wav
513 For headerless (raw) audio, -t (and perhaps other format
514 options) will need to be given, preceding the input command.
515
516 "wildcard-filename"
517 Specifies that filename `globbing' (wild-card matching) should
518 be performed by SoX instead of by the shell. This allows a sin‐
519 gle set of file options to be applied to a group of files. For
520 example, if the current directory contains three `vox' files,
521 file1.vox, file2.vox, and file3.vox, then
522 play --rate 6k *.vox
523 will be expanded by the `shell' (in most environments) to
524 play --rate 6k file1.vox file2.vox file3.vox
525 which will treat only the first vox file as having a sample rate
526 of 6k. With
527 play --rate 6k "*.vox"
528 the given sample rate option will be applied to all three vox
529 files.
530
531 -p, --sox-pipe
532 This can be used in place of an output filename to specify that
533 the SoX command should be used as an input pipe to another SoX
534 command. For example, the command:
535 play "|sox -n -p synth 2" "|sox -n -p synth 2 tremolo 10" stat
536 plays two `files' in succession, each with different effects.
537
538 -p is in fact an alias for `-t sox -'.
539
540 -d, --default-device
541 This can be used in place of an input or output filename to
542 specify that the default audio device (if one has been built
543 into SoX) is to be used. This is akin to invoking rec or play
544 (as described above).
545
546 -n, --null
547 This can be used in place of an input or output filename to
548 specify that a `null file' is to be used. Note that here, `null
549 file' refers to a SoX-specific mechanism and is not related to
550 any operating-system mechanism with a similar name.
551
552 Using a null file to input audio is equivalent to using a normal
553 audio file that contains an infinite amount of silence, and as
554 such is not generally useful unless used with an effect that
555 specifies a finite time length (such as trim or synth).
556
557 Using a null file to output audio amounts to discarding the
558 audio and is useful mainly with effects that produce information
559 about the audio instead of affecting it (such as noiseprof or
560 stat).
561
562 The sampling rate associated with a null file is by default
563 48 kHz, but, as with a normal file, this can be overridden if
564 desired using command-line format options (see below).
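
The difference between shell globbing and SoX globbing described under "wildcard-filename" above can be observed without playing any audio. The sketch below creates three empty placeholder files in a scratch directory and counts the arguments a command would receive in each case; the variable names are illustrative.

```shell
# Count the arguments a command would receive with and without quoting.
# Uses a scratch directory with three empty placeholder files.
start_dir=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1.vox file2.vox file3.vox

set -- --rate 6k *.vox        # pattern expanded by the shell
unquoted=$#
set -- --rate 6k "*.vox"      # pattern passed through unexpanded
quoted=$#

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
cd "$start_dir" && rm -rf "$dir"
```

In the unquoted case there are five arguments (the option, its value, and three file names), so the rate option is attached to the first file only. In the quoted case there are three, and the single pattern is left for SoX to expand.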
565
566 Supported File & Audio Device Types
567 See soxformat(7) for a list and description of the supported file for‐
568 mats and audio device drivers.
569
570 OPTIONS
571 Global Options
572 These options can be specified on the command line at any point before
573 the first effect name.
574
575 The SOX_OPTS environment variable can be used to provide alternative
576 default values for SoX's global options. For example:
577 SOX_OPTS="--buffer 20000 --play-rate-arg -hs --temp /mnt/temp"
578 Note that setting SOX_OPTS can potentially create unwanted changes in
579 the behaviour of scripts or other programs that invoke SoX. SOX_OPTS
580 might best be used for things (such as in the given example) that
581 reflect the environment in which SoX is being run. Enabling options
582 such as --no-clobber as default might be handled better using a shell
583 alias since a shell alias will not affect operation in scripts etc.
584
585 One way to ensure that a script cannot be affected by SOX_OPTS is to
586 clear SOX_OPTS at the start of the script, but this of course loses the
587 benefit of SOX_OPTS carrying some system-wide default options. An
588 alternative approach is to explicitly invoke SoX with default option
589 values, e.g.
590 SOX_OPTS="-V --no-clobber"
591 ...
592 sox -V2 --clobber $input $output ...
593 Note that the way to set environment variables varies from system to
594 system. Here are some examples:
595
596 Unix bash:
597 export SOX_OPTS="-V --no-clobber"
598 Unix csh:
599 setenv SOX_OPTS "-V --no-clobber"
600 MS-DOS/MS-Windows:
601 set SOX_OPTS=-V --no-clobber
602 MS-Windows GUI: via Control Panel : System : Advanced : Environment
603 Variables
604
605 Mac OS X GUI: Refer to Apple's Technical Q&A QA1067 document.
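
Returning to the first approach mentioned above (clearing SOX_OPTS at the start of a script), a minimal POSIX-shell sketch is shown below. The sox command in the comment is an example only; $input and $output are placeholders.

```shell
# Sketch of a script preamble that ignores any inherited SOX_OPTS, so that
# later sox calls in the script use only the options given explicitly.
unset SOX_OPTS

# From here on, only options on the command line apply, e.g.:
#   sox --no-clobber "$input" "$output" rate 44100
```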
606
607 --buffer BYTES, --input-buffer BYTES
608 Set the size in bytes of the buffers used for processing audio
609 (default 8192). --buffer applies to input, effects, and output
610 processing; --input-buffer applies only to input processing (for
611 which it overrides --buffer if both are given).
612
613 Be aware that large values for --buffer will cause SoX to
614 become slow to respond to requests to terminate or to skip the
615 current input file.
616
617 --clobber
618 Don't prompt before overwriting an existing file with the same
619 name as that given for the output file. This is the default be‐
620 haviour.
621
622 --combine concatenate|merge|mix|mix-power|multiply|sequence
623 Select the input file combining method; for some of these, short
624 options are available: -m selects `mix', -M selects `merge', and
625 -T selects `multiply'.
626
627 See Input File Combining above for a description of the differ‐
628 ent combining methods.
629
630 -D, --no-dither
631 Disable automatic dither - see `Dithering' above. An example of
632 why this might occasionally be useful is if a file has been con‐
633 verted from 16 to 24 bit with the intention of doing some pro‐
634 cessing on it, but in fact no processing is needed after all and
635 the original 16 bit file has been lost, then, strictly speaking,
636 no dither is needed if converting the file back to 16 bit. See
637 also the stats effect for how to determine the actual bit depth
638 of the audio within a file.
639
640 --effects-file FILENAME
641 Use FILENAME to obtain all effects and their arguments. The
642 file is parsed as if the values were specified on the command
643 line. A new line can be used in place of the special ":" marker
644 to separate effect chains. This option causes any effects spec‐
645 ified on the command line to be discarded.
646
647 -G, --guard
648 Automatically invoke the gain effect to guard against clipping.
649 E.g.
650 sox -G infile -b 16 outfile rate 44100 dither -s
651 is shorthand for
652 sox infile -b 16 outfile gain -h rate 44100 gain -rh dither -s
653 See also -V, --norm, and the gain effect.
654
655 -h, --help
656 Show version number and usage information.
657
658 --help-effect NAME
659 Show usage information on the specified effect. The name all
660 can be used to show usage on all effects.
661
662 --help-format NAME
663 Show information about the specified file format. The name all
664 can be used to show information on all formats.
665
666 --i, --info
667 Only if given as the first parameter to sox, behave as soxi(1).
668
669 --interactive
670 Deprecated alias for --no-clobber.
671
672 -m|-M Equivalent to --combine mix and --combine merge, respectively.
673
674 --magic
675 If SoX has been built with the optional `libmagic' library then
676 this option can be given to enable its use in helping to detect
677 audio file types.
678
679 --multi-threaded | --single-threaded
680 By default, SoX is `single threaded'. If the --multi-threaded
681 option is given however then SoX will process audio channels for
682 most multi-channel effects in parallel on hyper-threading/multi-
683 core architectures. This may reduce processing time, though
684 sometimes it may be necessary to use this option in conjunction
685 with a larger buffer size than is the default to gain any bene‐
686 fit from multi-threaded processing (e.g. 131072; see --buffer
687 above).
688
689 --no-clobber
690 Prompt before overwriting an existing file with the same name as
691 that given for the output file.
692
693 N.B. Unintentionally overwriting a file is easier than you
694 might think, for example, if you accidentally enter
695 sox file1 file2 effect1 effect2 ...
696 when what you really meant was
697 play file1 file2 effect1 effect2 ...
698 then, without this option, file2 will be overwritten. Hence,
699 using this option is recommended. SOX_OPTS (above), a `shell'
700 alias, script, or batch file may be an appropriate way of perma‐
701 nently enabling it.
702
703 --norm Automatically invoke the gain effect to guard against clipping
704 and to normalise the audio. E.g.
705 sox --norm infile -b 16 outfile rate 44100 dither -s
706 is shorthand for
707 sox infile -b 16 outfile gain -h rate 44100 gain -nh dither -s
708 See also -V, -G, and the gain effect.
709
710 --play-rate-arg ARG
711 Selects a quality option to be used when the `rate' effect is
712 automatically invoked whilst playing audio. This option is typ‐
713 ically set via the SOX_OPTS environment variable (see above).
714
715 --plot gnuplot|octave|off
716 If not set to off (the default if --plot is not given), run in a
717 mode that can be used, in conjunction with the gnuplot program
718 or the GNU Octave program, to assist with the selection and con‐
719 figuration of many of the transfer-function based effects. For
720 the first given effect that supports the selected plotting pro‐
721 gram, SoX will output commands to plot the effect's transfer
722 function, and then exit without actually processing any audio.
723 E.g.
724 sox --plot octave input-file -n highpass 1320 > highpass.plt
725 octave highpass.plt
726
727 -q, --no-show-progress
728 Run in quiet mode when SoX wouldn't otherwise do so. This is
729 the opposite of the -S option.
730
731 -R Run in `repeatable' mode. When this option is given, where
732 applicable, SoX will embed a fixed time-stamp in the output file
733 (e.g. AIFF) and will `seed' pseudo random number generators
734 (e.g. dither) with a fixed number, thus ensuring that succes‐
735 sive SoX invocations with the same inputs and the same parame‐
736 ters yield the same output.
737
738 --replay-gain track|album|off
739 Select whether or not to apply replay-gain adjustment to input
740 files. The default is off for sox and rec, album for play where
741 (at least) the first two input files are tagged with the same
742 Artist and Album names, and track for play otherwise.
743
744 -S, --show-progress
745 Display input file format/header information, and processing
746 progress as input file(s) percentage complete, elapsed time, and
747 remaining time (if known; shown in brackets), and the number of
748 samples written to the output file. Also shown is a peak-level
749 meter, and an indication if clipping has occurred. The peak-
750 level meter shows up to two channels and is calibrated for digi‐
751 tal audio as follows (right channel shown):
752
753 dB FSD Display dB FSD Display
754 -25 - -11 ====
755 -23 = -9 ====-
756 -21 =- -7 =====
757 -19 == -5 =====-
758 -17 ==- -3 ======
759 -15 === -1 =====!
760 -13 ===-
761
762 A three-second peak-held value of headroom in dBs will be shown
763 to the right of the meter if this is below 6dB.
764
765 This option is enabled by default when using SoX to play or
766 record audio.
767
768 -T Equivalent to --combine multiply.
769
770 --temp DIRECTORY
771 Specify that any temporary files should be created in the given
772 DIRECTORY. This can be useful if there are permission or free-
773 space problems with the default location. In this case, using
774 `--temp .' (to use the current directory) is often a good solu‐
775 tion.
776
777 --version
778 Show SoX's version number and exit.
779
780 -V[level]
781 Set verbosity. This is particularly useful for seeing how any
782 automatic effects have been invoked by SoX.
783
784 SoX displays messages on the console (stderr) according to the
785 following verbosity levels:
786
787 0 No messages are shown at all; use the exit status to
788 determine if an error has occurred.
789
790 1 Only error messages are shown. These are generated if
791 SoX cannot complete the requested commands.
792
793 2 Warning messages are also shown. These are generated if
794 SoX can complete the requested commands, but not exactly
795 according to the requested command parameters, or if
796 clipping occurs.
797
798 3 Descriptions of SoX's processing phases are also shown.
799 Useful for seeing exactly how SoX is processing your
800 audio.
801
802 4 and above
803 Messages to help with debugging SoX are also shown.
804
805 By default, the verbosity level is set to 2 (shows errors and
806 warnings). Each occurrence of the -V option increases the ver‐
807 bosity level by 1. Alternatively, the verbosity level can be
808 set to an absolute number by specifying it immediately after the
809 -V, e.g. -V0 sets it to 0.
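The rule for combining several -V options can be sketched as follows. This assumes that options take effect in left-to-right order; the function name is hypothetical and the code is not taken from SoX.

```python
def verbosity_level(options, default=2):
    """Return the verbosity implied by a sequence of -V options."""
    level = default
    for opt in options:
        if opt == "-V":
            level += 1                  # each bare -V raises the level by 1
        elif opt.startswith("-V") and opt[2:].isdigit():
            level = int(opt[2:])        # -VN sets an absolute level
    return level


print(verbosity_level([]))
print(verbosity_level(["-V"]))
print(verbosity_level(["-V0"]))
```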
810
811 Input File Options
812 These options apply only to input files and may precede only input
813 filenames on the command line.
814
815 --ignore-length
816 Override an (incorrect) audio length given in an audio file's
817 header. If this option is given then SoX will keep reading audio
818 until it reaches the end of the input file.
819
820 -v, --volume FACTOR
821 Intended for use when combining multiple input files, this
822 option adjusts the volume of the file that follows it on the
823 command line by a factor of FACTOR. This allows it to be `bal‐
824 anced' w.r.t. the other input files. This is a linear (ampli‐
825 tude) adjustment, so a number less than 1 decreases the volume
826 and a number greater than 1 increases it. If a negative number
827 is given then in addition to the volume adjustment, the audio
828 signal will be inverted.
829
830 See also the norm, vol, and gain effects, and see Input File
831 Balancing above.
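Since FACTOR is a linear amplitude multiplier, its equivalent in decibels is 20·log10(|FACTOR|). The sketch below shows that relationship and the inversion caused by a negative factor; the helper names are hypothetical.

```python
import math


def volume_factor_to_db(factor):
    """Convert a linear amplitude factor to a level change in dB."""
    if factor == 0:
        return float("-inf")
    return 20.0 * math.log10(abs(factor))


def apply_volume(samples, factor):
    """Scale samples linearly; a negative factor also inverts the signal."""
    return [s * factor for s in samples]


halved_db = volume_factor_to_db(0.5)      # roughly -6 dB
inverted = apply_volume([0.25, -0.5], -1)
```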
832
833 Input & Output File Format Options
834 These options apply to the input or output file whose name they immedi‐
835 ately precede on the command line and are used mainly when working with
836 headerless file formats or when specifying a format for the output file
837 that is different to that of the input file.
838
839 -b BITS, --bits BITS
840 The number of bits (a.k.a. bit-depth or sometimes word-length)
841 in each encoded sample. Not applicable to complex encodings
842 such as MP3 or GSM. Not necessary with encodings that have a
843 fixed number of bits, e.g. A/μ-law, ADPCM.
844
845 For an input file, the most common use for this option is to
846 inform SoX of the number of bits per sample in a `raw' (`header‐
847 less') audio file. For example
848 sox -r 16k -e signed -b 8 input.raw output.wav
849 converts a particular `raw' file to a self-describing `WAV'
850 file.
851
852 For an output file, this option can be used (perhaps along with
853 -e) to set the output encoding size. By default (i.e. if this
854 option is not given), the output encoding size will (providing
855 it is supported by the output file type) be set to the input
856 encoding size. For example
857 sox input.cdda -b 24 output.wav
858 converts raw CD digital audio (16-bit, signed-integer) to a
859 24-bit (signed-integer) `WAV' file.
860
861 -1/-2/-3/-4/-8
862 The number of bytes in each encoded sample. Deprecated aliases
863 for -b 8, -b 16, -b 24, -b 32, -b 64 respectively.
864
865 -c CHANNELS, --channels CHANNELS
866 The number of audio channels in the audio file. This can be any
867 number greater than zero.
868
869 For an input file, the most common use for this option is to
870 inform SoX of the number of channels in a `raw' (`headerless')
871 audio file. Occasionally, it may be useful to use this option
872 with a `headered' file, in order to override the (presumably
873 incorrect) value in the header - note that this is only sup‐
874 ported with certain file types. Examples:
875 sox -r 48k -e float -b 32 -c 2 input.raw output.wav
876 converts a particular `raw' file to a self-describing `WAV'
877 file.
878 play -c 1 music.wav
879 interprets the file data as belonging to a single channel
880 regardless of what is indicated in the file header. Note that
881 if the file does in fact have two channels, this will result in
882 the file playing at half speed.
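The half-speed remark follows from simple arithmetic on the raw data size. The sketch below is an illustration only (the function name and the figures are hypothetical): it computes how long a block of headerless data lasts for a given rate, sample size, and channel count.

```python
def raw_duration_seconds(size_bytes, rate, bits, channels):
    """Duration of headerless audio data under the given interpretation."""
    bytes_per_frame = (bits // 8) * channels
    return size_bytes / (bytes_per_frame * rate)


# 10 seconds of 48 kHz, 32-bit, 2-channel audio
size = 48000 * 4 * 2 * 10

as_stereo = raw_duration_seconds(size, 48000, 32, 2)
as_mono = raw_duration_seconds(size, 48000, 32, 1)
```

Interpreting the same data as a single channel doubles its duration, so the audio plays at half speed.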
883
884 For an output file, this option provides a shorthand for speci‐
885 fying that the channels effect should be invoked in order to
886 change (if necessary) the number of channels in the audio signal
887 to the number given. For example, the following two commands
888 are equivalent:
889 sox input.wav -c 1 output.wav bass -3
890 sox input.wav output.wav bass -3 channels 1
891 though the second form is more flexible as it allows the effects
892 to be ordered arbitrarily.
893
894 -e ENCODING, --encoding ENCODING
895 The audio encoding type. Sometimes needed with file-types that
896 support more than one encoding type. For example, with raw, WAV,
897 or AU (but not, for example, with MP3 or FLAC). The available
898 encoding types are as follows:
899
900 signed-integer
901 PCM data stored as signed (`two's complement') integers.
902 Commonly used with a 16- or 24-bit encoding size. A
903 value of 0 represents minimum signal power.
904
905 unsigned-integer
906 PCM data stored as unsigned integers.
907 Commonly used with an 8-bit encoding size. A value of 0
908 represents maximum signal power.
909
910 floating-point
911 PCM data stored as IEEE 754 single precision (32-bit) or
912 double precision (64-bit) floating-point (`real') num‐
913 bers. A value of 0 represents minimum signal power.
914
915 a-law International telephony standard for logarithmic encoding
916 to 8 bits per sample. It has a precision equivalent to
917 roughly 13-bit PCM and is sometimes encoded with reversed
918 bit-ordering (see the -X option).
919
920 u-law, mu-law
921 North American telephony standard for logarithmic encod‐
922 ing to 8 bits per sample. A.k.a. μ-law. It has a preci‐
923 sion equivalent to roughly 14-bit PCM and is sometimes
924 encoded with reversed bit-ordering (see the -X option).
925
926 oki-adpcm
927 OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM; it has
928 a precision equivalent to roughly 12-bit PCM. ADPCM is a
929 form of audio compression that has a good compromise
930 between audio quality and encoding/decoding speed.
931
932 ima-adpcm
933 IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision equiva‐
934 lent to roughly 13-bit PCM.
935
936 ms-adpcm
937 Microsoft 4-bit ADPCM; it has a precision equivalent to
938 roughly 14-bit PCM.
939
940 gsm-full-rate
941 GSM is currently used for the vast majority of the
942 world's digital wireless telephone calls. It utilises
943 several audio formats with different bit-rates and asso‐
944 ciated speech quality. SoX has support for GSM's origi‐
945 nal 13kbps `Full Rate' audio format. It is usually CPU-
946 intensive to work with GSM audio.
947
948 Encoding names can be abbreviated where this would not be
949 ambiguous; e.g. `unsigned-integer' can be given as `un', but not
950 `u' (ambiguous with `u-law').
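The abbreviation rule can be pictured as unambiguous prefix matching over the encoding names listed above. The following is a sketch of the rule as described here, not SoX's own matching code; the function name is hypothetical.

```python
ENCODINGS = [
    "signed-integer", "unsigned-integer", "floating-point",
    "a-law", "u-law", "mu-law",
    "oki-adpcm", "ima-adpcm", "ms-adpcm", "gsm-full-rate",
]


def resolve_encoding(abbrev):
    """Return the full encoding name for an unambiguous abbreviation."""
    if abbrev in ENCODINGS:
        return abbrev
    matches = [name for name in ENCODINGS if name.startswith(abbrev)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown encoding: " + abbrev)
    raise ValueError("ambiguous encoding: " + abbrev)


print(resolve_encoding("un"))
print(resolve_encoding("float"))
```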
951
952 For an input file, the most common use for this option is to
953 inform SoX of the encoding of a `raw' (`headerless') audio file
954 (see the examples in -b and -c above).
955
956 For an output file, this option can be used (perhaps along with
957 -b) to set the output encoding type. For example
958 sox input.cdda -e float output1.wav
959
960 sox input.cdda -b 64 -e float output2.wav
961 convert raw CD digital audio (16-bit, signed-integer) to float‐
962 ing-point `WAV' files (single & double precision respectively).
963
964 By default (i.e. if this option is not given), the output encod‐
965 ing type will (providing it is supported by the output file
966 type) be set to the input encoding type.
967
968 -s/-u/-f/-A/-U/-o/-i/-a/-g
969 Deprecated aliases for specifying the encoding types signed-
970 integer, unsigned-integer, floating-point, a-law, mu-law, oki-
971 adpcm, ima-adpcm, ms-adpcm, gsm-full-rate respectively (see -e
972 above).
973
974 --no-glob
975 Specifies that filename `globbing' (wild-card matching) should
976 not be performed by SoX on the following filename. For example,
977 if the current directory contains the two files `five-sec‐
978 onds.wav' and `five*.wav', then
979 play --no-glob "five*.wav"
980 can be used to play just the single file `five*.wav'.
981
982 -r, --rate RATE[k]
983 Gives the sample rate in Hz (or kHz if appended with `k') of the
984 file.
985
986 For an input file, the most common use for this option is to
987 inform SoX of the sample rate of a `raw' (`headerless') audio
988 file (see the examples in -b and -c above). Occasionally it may
989 be useful to use this option with a `headered' file, in order to
990 override the (presumably incorrect) value in the header - note
991 that this is only supported with certain file types. For exam‐
992 ple, if audio was recorded with a sample-rate of say 48k from a
993 source that played back a little, say 1.5%, too slowly, then
994 sox -r 48720 input.wav output.wav
995 effectively corrects the speed by changing only the file header
996 (but see also the speed effect for the more usual solution to
997 this problem).
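The figure of 48720 in the example above comes from scaling the nominal rate by the speed error. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def corrected_rate(nominal_rate, slow_percent):
    """Rate to declare so that audio recorded too slowly plays faster."""
    return round(nominal_rate * (1 + slow_percent / 100))


print(corrected_rate(48000, 1.5))
```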
998
999 For an output file, this option provides a shorthand for speci‐
1000 fying that the rate effect should be invoked in order to change
1001 (if necessary) the sample rate of the audio signal to the given
1002 value. For example, the following two commands are equivalent:
1003 sox input.wav -r 48k output.wav bass -3
1004 sox input.wav output.wav bass -3 rate 48k
1005 though the second form is more flexible as it allows rate
1006 options to be given, and allows the effects to be ordered arbi‐
1007 trarily.
1008
1009 -t, --type FILE-TYPE
1010 Gives the type of the audio file. For both input and output
1011 files, this option is commonly used to inform SoX of the type of a
1012 `headerless' audio file (e.g. raw, mp3) where the actual/desired
1013 type cannot be determined from a given filename extension. For
1014 example:
1015 another-command | sox -t mp3 - output.wav
1016
1017 sox input.wav -t raw output.bin
1018 It can also be used to override the type implied by an input
1019 filename extension, but if overriding with a type that has a
1020 header, SoX will exit with an appropriate error message if such
1021 a header is not actually present.
1022
1023 See soxformat(7) for a list of supported file types.
1024
1025 -L, --endian little
1026 -B, --endian big
1027 -x, --endian swap
1028 These options specify whether the byte-order of the audio data
1029 is, respectively, `little endian', `big endian', or the opposite
1030 to that of the system on which SoX is being used. Endianness
1031 applies only to data encoded as floating-point, or as signed or
1032 unsigned integers of 16 or more bits. It is often necessary to
1033 specify one of these options for headerless files, and sometimes
1034 necessary for (otherwise) self-describing files. A given
1035 endian-setting option may be ignored for an input file whose
1036 header contains a specific endianness identifier, or for an out‐
1037 put file that is actually an audio device.
1038
1039 N.B. Unlike other format characteristics, the endianness (byte,
1040 nibble, & bit ordering) of the input file is not automatically
1041 used for the output file; so, for example, when the following is
1042 run on a little-endian system:
1043 sox -B audio.s16 trimmed.s16 trim 2
1044 trimmed.s16 will be created as little-endian;
1045 sox -B audio.s16 -B trimmed.s16 trim 2
1046 must be used to preserve big-endianness in the output file.
1047
1048 The -V option can be used to check the selected orderings.
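To make the byte-order terminology concrete, the sketch below packs one signed 16-bit sample both ways using Python's struct module. It illustrates the data layout only and does not involve SoX.

```python
import struct

sample = 0x1234

little = struct.pack("<h", sample)   # bytes 34 12: least significant first
big = struct.pack(">h", sample)      # bytes 12 34: most significant first
swapped = little[::-1]               # reversing the bytes swaps the order

decoded = struct.unpack(">h", big)[0]
```

Reading big-endian data as little-endian (or vice versa) yields wrong sample values, which is why headerless files often need one of these options.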
1049
1050 -N, --reverse-nibbles
1051 Specifies that the nibble ordering (i.e. the 2 halves of a byte)
1052 of the samples should be reversed; sometimes useful with ADPCM-
1053 based formats.
1054
1055 N.B. See also N.B. in section on -x above.
1056
1057 -X, --reverse-bits
1058 Specifies that the bit ordering of the samples should be
1059 reversed; sometimes useful with a few (mostly headerless) for‐
1060 mats.
1061
1062 N.B. See also N.B. in section on -x above.
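The nibble and bit reorderings selected by -N and -X can be illustrated on a single byte. This is a sketch of the transformations themselves, not of SoX's code; the function names are hypothetical.

```python
def reverse_nibbles(byte):
    """Swap the two 4-bit halves of a byte."""
    return ((byte & 0x0F) << 4) | ((byte & 0xF0) >> 4)


def reverse_bits(byte):
    """Reverse the order of the 8 bits in a byte."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


print(hex(reverse_nibbles(0x1F)))
print(bin(reverse_bits(0b00000001)))
```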
1063
1064 Output File Format Options
1065 These options apply only to the output file and may precede only the
1066 output filename on the command line.
1067
1068 --add-comment TEXT
1069 Append a comment in the output file header (where applicable).
1070
1071 --comment TEXT
1072 Specify the comment text to store in the output file header
1073 (where applicable).
1074
1075 SoX will provide a default comment if this option (or --com‐
1076 ment-file) is not given. To specify that no comment should be
1077 stored in the output file, use --comment "" .
1078
1079 --comment-file FILENAME
1080 Specify a file containing the comment text to store in the out‐
1081 put file header (where applicable).
1082
1083 -C, --compression FACTOR
1084 The compression factor for variably compressing output file for‐
1085 mats. If this option is not given then a default compression
1086 factor will apply. The compression factor is interpreted dif‐
1087 ferently for different compressing file formats. See the
1088 description of the file formats that use this option in soxfor‐
1089 mat(7) for more information.
1090
1092 In addition to converting, playing and recording audio files, SoX can
1093 be used to invoke a number of audio `effects'. Multiple effects may be
1094 applied by specifying them one after another at the end of the SoX com‐
1095 mand line, forming an `effects chain'. Note that applying multiple
1096 effects in real-time (i.e. when playing audio) is likely to require a
1097 high performance computer. Stopping other applications may alleviate
1098 performance issues should they occur.
1099
1100 Some of the SoX effects are primarily intended to be applied to a sin‐
1101 gle instrument or `voice'. To facilitate this, the remix effect and
1102 the global SoX option -M can be used to isolate then recombine tracks
1103 from a multi-track recording.
1104
1105 Multiple Effect Chains
1106 A single effects chain is made up of one or more effects. Audio from
1107 the input runs through the chain until either the end of the input file
1108 is reached or an effect in the chain requests to terminate the chain.
1109
1110 SoX supports running multiple effects chains over the input audio. In
1111 this case, when one chain indicates it is done processing audio, the
1112 audio data is then sent through the next effects chain. This continues
1113 until either no more effects chains exist or the input has reached the
1114 end of the file.
1115
1116 An effects chain is terminated by placing a : (colon) after an effect.
1117 Any following effects are a part of a new effects chain.
1118
1119 It is important to place the effect that will stop the chain as the
1120 first effect in the chain. This is because any samples that are
1121 buffered by effects to the left of the terminating effect will be
1122 discarded. The number of samples discarded is related to the --buffer
1123 option and it should be kept small, relative to the sample rate, if the
1124 terminating effect cannot be first. Further information on stopping
1125 effects can be found in the Stopping SoX section.
1126
1127 There are a few pseudo-effects that aid using multiple effects chains.
1128 These include newfile which will start writing to a new output file
1129 before moving to the next effects chain and restart which will move
1130 back to the first effects chain. Pseudo-effects must be specified as
1131 the first effect in a chain and as the only effect in a chain (they
1132 must have a : before and after they are specified).
1133
1134 The following is an example of multiple effects chains. It will split
1135 the input file into multiple files of 30 seconds in length. Each
1136 output filename will have a unique number in its name as documented in the
1137 Output Files section.
1138 sox infile.wav output.wav trim 0 30 : newfile : restart
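The effect of repeating `trim 0 30' over the input can be pictured as cutting the timeline into consecutive pieces. The sketch below computes those pieces for a given input length; it is plain arithmetic under the assumption that the last piece is simply shorter, and it makes no claim about how SoX handles an input whose length is an exact multiple of 30 seconds.

```python
def split_segments(total_seconds, piece_seconds=30):
    """Return (start, end) pairs covering the input in consecutive pieces."""
    segments = []
    start = 0
    while start < total_seconds:
        end = min(start + piece_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


print(split_segments(70))
```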
1139
1140 Common Notation And Parameters
1141 In the descriptions that follow, brackets [ ] are used to denote param‐
1142 eters that are optional, braces { } to denote those that are both
1143 optional and repeatable, and angle brackets < > to denote those that
1144 are repeatable but not optional. Where applicable, default values for
1145 optional parameters are shown in parentheses ( ).
1146
1147 The following parameters are used with, and have the same meaning for,
1148 several effects:
1149
1150 centre[k]
1151 See frequency.
1152
1153 frequency[k]
1154 A frequency in Hz, or, if appended with `k', kHz.
1155
1156 gain A power gain in dB. Zero gives no gain; less than zero gives an
1157 attenuation.
1158
1159 width[h|k|o|q]
1160 Used to specify the band-width of a filter. A number of differ‐
1161 ent methods to specify the width are available (though not all
1162 for every effect). One of the characters shown may be appended
1163 to select the desired method as follows:
1164
1165 Method Notes
1166 h Hz
1167 k kHz
1168 o Octaves
1169 q Q-factor See [2]
1170
1171 For each effect that uses this parameter, the default method
1172 (i.e. if no character is appended) is the one that is listed
1173 first in the first line of the effect's description.
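The width methods are related by the textbook definitions Q = centre / bandwidth and, for a band of N octaves centred geometrically on the centre frequency, bandwidth = centre × (2^(N/2) − 2^(−N/2)). The sketch below applies those definitions; the function name is hypothetical, and SoX's digital filters may apply further corrections (see the references cited by each effect).

```python
def width_to_hz(value, method, centre_hz):
    """Convert a width given by method h, k, o or q to a bandwidth in Hz."""
    if method == "h":
        return float(value)
    if method == "k":
        return value * 1000.0
    if method == "q":
        return centre_hz / value
    if method == "o":
        return centre_hz * (2.0 ** (value / 2.0) - 2.0 ** (-value / 2.0))
    raise ValueError("unknown width method: " + method)


print(width_to_hz(2, "q", 1000))
```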
1174
1175 To see if SoX has support for an optional effect, enter sox -h and look
1176 for its name under the list: `EFFECTS'.
1177
1178 Supported Effects
1179 Note: a categorised list of the effects can be found in the accompany‐
1180 ing `README' file.
1181
1182 allpass frequency[k] width[h|k|o|q]
1183 Apply a two-pole all-pass filter with central frequency (in Hz)
1184 frequency, and filter-width width. An all-pass filter changes
1185 the audio's frequency to phase relationship without changing its
1186 frequency to amplitude relationship. The filter is described in
1187 detail in [1].
1188
1189 This effect supports the --plot global option.
1190
1191 band [-n] center[k] [width[h|k|o|q]]
1192 Apply a band-pass filter. The frequency response drops loga‐
1193 rithmically around the center frequency. The width parameter
1194 gives the slope of the drop. The frequencies at center + width
1195 and center - width will be half of their original amplitudes.
1196 band defaults to a mode oriented to pitched audio, i.e. voice,
1197 singing, or instrumental music. The -n (for noise) option uses
1198 the alternate mode for un-pitched audio (e.g. percussion).
1199 Warning: -n introduces a power-gain of about 11dB in the filter,
1200 so beware of output clipping. band introduces noise in the
1201 shape of the filter, i.e. peaking at the center frequency and
1202 settling around it.
1203
1204 This effect supports the --plot global option.
1205
1206 See also sinc for a bandpass filter with steeper shoulders.
1207
1208 bandpass|bandreject [-c] frequency[k] width[h|k|o|q]
1209 Apply a two-pole Butterworth band-pass or band-reject filter
1210 with central frequency frequency, and (3dB-point) band-width
1211 width. The -c option applies only to bandpass and selects a
1212 constant skirt gain (peak gain = Q) instead of the default: con‐
1213 stant 0dB peak gain. The filters roll off at 6dB per octave
1214 (20dB per decade) and are described in detail in [1].
1215
1216 These effects support the --plot global option.
1217
1218 See also sinc for a bandpass filter with steeper shoulders.
1219
1220 bandreject frequency[k] width[h|k|o|q]
1221 Apply a band-reject filter. See the description of the bandpass
1222 effect for details.
1223
1224 bass|treble gain [frequency[k] [width[s|h|k|o|q]]]
1225 Boost or cut the bass (lower) or treble (upper) frequencies of
1226 the audio using a two-pole shelving filter with a response simi‐
1227 lar to that of a standard hi-fi's tone-controls. This is also
1228 known as shelving equalisation (EQ).
1229
1230 gain gives the gain at 0 Hz (for bass), or whichever is the
1231 lower of ∼22 kHz and the Nyquist frequency (for treble). Its
1232 useful range is about -20 (for a large cut) to +20 (for a large
1233 boost). Beware of Clipping when using a positive gain.
1234
1235 If desired, the filter can be fine-tuned using the following
1236 optional parameters:
1237
1238 frequency sets the filter's central frequency and so can be used
1239 to extend or reduce the frequency range to be boosted or cut.
1240 The default value is 100 Hz (for bass) or 3 kHz (for treble).
1241
1242 width determines how steep is the filter's shelf transition. In
1243 addition to the common width specification methods described
1244 above, `slope' (the default, or if appended with `s') may be
1245 used. The useful range of `slope' is about 0.3, for a gentle
1246 slope, to 1 (the maximum), for a steep slope; the default value
1247 is 0.5.
1248
1249 The filters are described in detail in [1].
1250
1251 These effects support the --plot global option.
1252
1253 See also equalizer for a peaking equalisation effect.
1254
1255 bend [-f frame-rate(25)] [-o over-sample(16)] { delay,cents,duration }
1256 Changes pitch by specified amounts at specified times. Each
1257 given triple: delay,cents,duration specifies one bend. delay is
1258 the amount of time after the start of the audio stream, or the
1259 end of the previous bend, at which to start bending the pitch;
1260 cents is the number of cents (100 cents = 1 semitone) by which
1261 to bend the pitch, and duration the length of time over which
1262 the pitch will be bent.
1263
1264 The pitch-bending algorithm utilises the Discrete Fourier Trans‐
1265 form (DFT) at a particular frame rate and over-sampling rate.
1266 The -f and -o parameters may be used to adjust these parameters
1267 and thus control the smoothness of the changes in pitch.
1268
1269 For example, an initial tone is generated, then bent three
1270 times, yielding four different notes in total:
1271 play -n synth 2.5 sin 667 gain 1 \
1272 bend .35,180,.25 .15,740,.53 0,-520,.3
1273 Note that the clipping that is produced in this example is
1274 deliberate; to remove it, use gain -5 in place of gain 1.
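A bend given in cents corresponds to a frequency ratio of 2^(cents/1200), since 1200 cents make one octave. A minimal sketch of that conversion, with a hypothetical function name:

```python
def cents_to_ratio(cents):
    """Frequency ratio corresponding to a pitch change in cents."""
    return 2.0 ** (cents / 1200.0)


# Bending a 667 Hz tone up by 180 cents gives about 740 Hz.
bent = 667 * cents_to_ratio(180)
```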
1275
1276 biquad b0 b1 b2 a0 a1 a2
1277 Apply a biquad IIR filter with the given coefficients. Where b*
1278 and a* are the numerator and denominator coefficients respec‐
1279 tively.
1280
1281 See http://en.wikipedia.org/wiki/Digital_biquad_filter (where a0
1282 = 1).
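The six coefficients define the usual biquad difference equation, a0·y[n] = b0·x[n] + b1·x[n−1] + b2·x[n−2] − a1·y[n−1] − a2·y[n−2]. The sketch below is a direct-form I rendering of that equation, assuming zero initial state; it illustrates the formula and is not SoX's implementation.

```python
def biquad(samples, b0, b1, b2, a0, a1, a2):
    """Filter a sequence of samples with a direct-form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x      # shift the input history
        y2, y1 = y1, y      # shift the output history
    return out


# Impulse response of a one-pole low-pass: each output is half the last.
print(biquad([1.0, 0.0, 0.0, 0.0], 1, 0, 0, 1, -0.5, 0))
```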
1283
1284 channels CHANNELS
1285 Invoke a simple algorithm to change the number of channels in
1286 the audio signal to the given number CHANNELS: mixing if
1287 decreasing the number of channels or duplicating if increasing
1288 the number of channels.
1289
1290 The channels effect is invoked automatically if SoX's -c option
1291 specifies a number of channels that is different to that of the
1292 input file(s). Alternatively, if this effect is given explic‐
1293 itly, then SoX's -c option need not be given. For example, the
1294 following two commands are equivalent:
1295 sox input.wav -c 1 output.wav bass -3
1296 sox input.wav output.wav bass -3 channels 1
1297 though the second form is more flexible as it allows the effects
1298 to be ordered arbitrarily.
1299
1300 See also remix for an effect that allows channels to be
1301 mixed/selected arbitrarily.
1302
1303 chorus gain-in gain-out <delay decay speed depth -s|-t>
1304 Add a chorus effect to the audio. This can make a single vocal
1305 sound like a chorus, but can also be applied to instrumentation.
1306
1307 Chorus resembles an echo effect with a short delay, but whereas
1308 with echo the delay is constant, with chorus, it is varied using
1309 sinusoidal or triangular modulation. The modulation depth
1310 defines the range the modulated delay is played before or after
1311 the delay. Hence the delayed sound will sound slower or faster,
1312 that is the delayed sound tuned around the original one, like in
1313 a chorus where some vocals are slightly off key. See [3] for
1314 more discussion of the chorus effect.
1315
1316 Each four-tuple parameter delay/decay/speed/depth gives the
1317 delay in milliseconds and the decay (relative to gain-in) with a
1318 modulation speed in Hz using depth in milliseconds. The modula‐
1319 tion is either sinusoidal (-s) or triangular (-t). Gain-out is
1320 the volume of the output.
1321
1322 A typical delay is around 40ms to 60ms; the modulation speed is
1323 best near 0.25Hz and the modulation depth around 2ms. For exam‐
1324 ple, a single delay:
1325 play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 -t
1326 Two delays of the original samples:
1327 play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 -t \
1328 60 0.32 0.4 1.3 -s
1329 A fuller sounding chorus (with three additional delays):
1330 play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 -t \
1331 60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s
1332
1333 compand attack1,decay1{,attack2,decay2}
1334 [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
1335 [gain [initial-volume-dB [delay]]]
1336
1337 Compand (compress or expand) the dynamic range of the audio.
1338
1339 The attack and decay parameters (in seconds) determine the time
1340 over which the instantaneous level of the input signal is aver‐
1341 aged to determine its volume; attacks refer to increases in vol‐
1342 ume and decays refer to decreases. For most situations, the
1343 attack time (response to the music getting louder) should be
1344 shorter than the decay time because the human ear is more sensi‐
1345 tive to sudden loud music than sudden soft music. Where more
1346 than one pair of attack/decay parameters are specified, each
1347 input channel is companded separately and the number of pairs
1348 must agree with the number of input channels. Typical values
1349 are 0.3,0.8 seconds.
1350
1351 The second parameter is a list of points on the compander's
1352 transfer function specified in dB relative to the maximum possi‐
1353 ble signal amplitude. The input values must be in a strictly
1354 increasing order but the transfer function does not have to be
1355 monotonically rising. If omitted, the value of out-dB1 defaults
1356 to the same value as in-dB1; levels below in-dB1 are not com‐
1357 panded (but may have gain applied to them). The point 0,0 is
1358 assumed but may be overridden (by 0,out-dBn). If the list is
1359 preceded by a soft-knee-dB value, then the points where adjacent
1360 line segments on the transfer function meet will be rounded
1361 by the amount given. Typical values for the transfer function
1362 are 6:-70,-60,-20.
1363
1364 The third (optional) parameter is an additional gain in dB to be
1365 applied at all points on the transfer function and allows easy
1366 adjustment of the overall gain.
1367
1368 The fourth (optional) parameter is an initial level to be
1369 assumed for each channel when companding starts. This permits
1370 the user to supply a nominal level initially, so that, for exam‐
1371 ple, a very large gain is not applied to initial signal levels
1372 before the companding action has begun to operate: it is quite
1373 probable that in such an event, the output would be severely
1374 clipped while the compander gain properly adjusts itself. A
1375 typical value (for audio which is initially quiet) is -90 dB.
1376
1377 The fifth (optional) parameter is a delay in seconds. The input
1378 signal is analysed immediately to control the compander, but it
1379 is delayed before being fed to the volume adjuster. Specifying
1380 a delay approximately equal to the attack/decay times allows the
1381 compander to effectively operate in a `predictive' rather than a
1382 reactive mode. A typical value is 0.2 seconds.
1383
1384 * * *
1385
1386 The following example might be used to make a piece of music
1387 with both quiet and loud passages suitable for listening to in a
1388 noisy environment such as a moving vehicle:
1389 sox asz.wav asz-car.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2
1390 The transfer function (`6:-70,...') says that very soft sounds
1391 (below -70dB) will remain unchanged. This will stop the compan‐
1392 der from boosting the volume on `silent' passages such as
1393 between movements. However, sounds in the range -60dB to 0dB
1394 (maximum volume) will be boosted so that the 60dB dynamic range
1395 of the original music will be compressed 3-to-1 into a 20dB
1396 range, which is wide enough to enjoy the music but narrow enough
1397 to get around the road noise. The `6:' selects 6dB soft-knee
1398 companding. The -5 (dB) output gain is needed to avoid clipping
1399 (the number is inexact, and was derived by experimentation).
1400 The -90 (dB) for the initial volume will work fine for a clip
1401 that starts with near silence, and the delay of 0.2 (seconds)
1402 has the effect of causing the compander to react a bit more
1403 quickly to sudden volume changes.
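
The static shape of the transfer function used above can be checked
by hand. The following is a minimal sketch, not SoX's implementation:
it interpolates linearly between the points (-70,-70), (-60,-20) and
the assumed point (0,0), and deliberately ignores the soft knee, the
-5dB gain, and the attack/decay behaviour. The function name
compand_curve is hypothetical.

```shell
#!/bin/sh
# Static input/output curve for the point list "-70,-60,-20".
# Soft knee, gain, attack/decay and delay are ignored in this sketch.
# Break points: (-70,-70), (-60,-20), plus the implicit (0,0).
compand_curve() {   # usage: compand_curve INPUT_LEVEL_DB
    awk -v level="$1" 'BEGIN {
        n = split("-70 -60 0", x, " ")
        split("-70 -20 0", y, " ")
        v = level + 0
        if (v <= x[1] + 0) {
            out = v                 # below in-dB1: left unchanged
        } else if (v >= x[n] + 0) {
            out = y[n] + 0          # at or above 0dB: clamp (assumption)
        } else {
            for (i = 1; i < n; i++)
                if (v <= x[i + 1] + 0) {
                    # linear interpolation within segment i
                    out = y[i] + (v - x[i]) * (y[i + 1] - y[i]) / (x[i + 1] - x[i])
                    break
                }
        }
        printf "%.1f\n", out
    }'
}

for level in -80 -65 -60 -30 0; do
    printf '%4s dB in -> %s dB out\n' "$level" "$(compand_curve "$level")"
done
```

With these points, an input at -60dB maps to -20.0dB and an input at
-30dB maps to -10.0dB, which is the 3-to-1 compression of the upper
60dB described above; an input at -80dB is returned unchanged.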
1404
1405 In the next example, compand is being used as a noise-gate for
1406 when the noise is at a lower level than the signal:
1407 play infile compand .1,.2 -inf,-50.1,-inf,-50,-50 0 -90 .1
1408 Here is another noise-gate, this time for when the noise is at a
1409 higher level than the signal (making it, in some ways, similar
1410 to squelch):
1411 play infile compand .1,.1 -45.1,-45,-inf,0,-inf 45 -90 .1
1412 This effect supports the --plot global option (for the transfer
1413 function).
1414
1415 See also mcompand for a multiple-band companding effect.
1416
1417 contrast [enhancement-amount(75)]
1418 Comparable with compression, this effect modifies an audio sig‐
1419 nal to make it sound louder. enhancement-amount controls the
1420 amount of the enhancement and is a number in the range 0-100.
1421 Note that enhancement-amount = 0 still gives a significant con‐
1422 trast enhancement.
1423
1424 See also the compand and mcompand effects.
1425
1426 dcshift shift [limitergain]
1427 Apply a DC shift to the audio. This can be useful to remove a
1428 DC offset (caused perhaps by a hardware problem in the recording
1429 chain) from the audio. The effect of a DC offset is reduced
1430 headroom and hence volume. The stat or stats effect can be used
1431 to determine if a signal has a DC offset.
1432
1433 The given dcshift value is a floating point number in the range
1434 of ±2 that indicates the amount to shift the audio (which is in
1435 the range of ±1).
1436
1437 An optional limitergain can be specified as well. It should
1438 have a value much less than 1 (e.g. 0.05 or 0.02) and is used
1439 only on peaks to prevent clipping.
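
To make the `reduced headroom' remark concrete: if full scale is ±1
and the audio carries a DC offset d, the largest peak amplitude that
still fits is 1-|d|. The sketch below expresses that loss in dB. It
is plain arithmetic and does not call SoX; the function name
headroom_loss_db is hypothetical.

```shell
#!/bin/sh
# Headroom lost (in dB) to a DC offset, with full scale taken as +/-1.
# The largest undistorted peak amplitude is 1 - |offset|.
headroom_loss_db() {   # usage: headroom_loss_db OFFSET
    awk -v offset="$1" 'BEGIN {
        d = offset + 0
        if (d < 0) d = -d
        if (d >= 1) { print "-inf"; exit }   # no headroom left at all
        printf "%.2f\n", 20 * log(1 - d) / log(10)
    }'
}

headroom_loss_db 0.5
headroom_loss_db 0.1
```

An offset of 0.5 costs about 6dB of headroom (-6.02), whereas an
offset of 0.1 costs just under 1dB (-0.92).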
1440
1441 * * *
1442
1443 An alternative approach to removing a DC offset (albeit with a
1444 short delay) is to use the highpass filter effect at a frequency
1445 of say 10Hz, as illustrated in the following example:
1446 sox -n dc.wav synth 5 sin %0 50
1447 sox dc.wav fixed.wav highpass 10
1448
1449 deemph Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation
1450 shelving filter).
1451
1452 Pre-emphasis was applied in the mastering of some CDs issued in
1453 the early 1980s. These included many classical music albums, as
1454 well as now sought-after issues of albums by The Beatles, Pink
1455 Floyd and others. Pre-emphasis should be removed at playback
1456 time by a de-emphasis filter in the playback device. However,
1457 not all modern CD players have this filter, and very few PC CD
1458 drives have it; playing pre-emphasised audio without the correct
1459 de-emphasis filter results in audio that sounds harsh and is far
1460 from what its creators intended.
1461
1462 With the deemph effect, it is possible to apply the necessary
1463 de-emphasis to audio that has been extracted from a pre-empha‐
1464 sised CD, and then either burn the de-emphasised audio to a new
1465 CD (which will then play correctly on any CD player), or simply
1466 play the correctly de-emphasised audio files on the PC. For
1467 example:
1468 sox track1.wav track1-deemph.wav deemph
1469 and then burn track1-deemph.wav to CD, or
1470 play track1-deemph.wav
1471 or simply
1472 play track1.wav deemph
1473 The de-emphasis filter is implemented as a biquad; its maximum
1474 deviation from the ideal response is only 0.06dB (up to 20kHz).
1475
1476 This effect supports the --plot global option.
1477
1478 See also the bass and treble shelving equalisation effects.
1479
1480 delay {length}
1481 Delay one or more audio channels. length can specify a time or,
1482 if appended with an `s', a number of samples. Do not specify
1483 both time and samples delays in the same command. For example,
1484 delay 1.5 0 0.5 delays the first channel by 1.5 seconds, the
1485 third channel by 0.5 seconds, and leaves the second channel (and
1486 any other channels that may be present) un-delayed. The follow‐
1487 ing (one long) command plays a chime sound:
1488 play -n synth -j 3 sin %3 sin %-2 sin %-5 sin %-9 \
1489 sin %-14 sin %-21 fade h .01 2 1.5 delay \
1490 1.3 1 .76 .54 .27 remix - fade h 0 2.7 2.5 norm -1
1491 and this plays a guitar chord:
1492 play -n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \
1493 delay 0 .05 .1 .15 .2 .25 remix - fade 0 4 .1 norm -1
1494
1495 dither [-a] [-S|-s|-f filter]
1496 Apply dithering to the audio. Dithering deliberately adds a
1497 small amount of noise to the signal in order to mask audible
1498 quantization effects that can occur if the output sample size is
1499 less than 24 bits. With no options, this effect will add trian‐
1500 gular (TPDF) white noise. Noise-shaping (only for certain sam‐
1501 ple rates) can be selected with -s. With the -f option, it is
1502 possible to select a particular noise-shaping filter from the
1503 following list: lipshitz, f-weighted, modified-e-weighted,
1504 improved-e-weighted, gesemann, shibata, low-shibata, high-shi‐
1505 bata. Note that most filter types are available only with
1506 44100Hz sample rate. The filter types are distinguished by the
1507 following properties: audibility of noise, level of (inaudible,
1508 but in some circumstances, otherwise problematic) shaped high
1509 frequency noise, and processing speed.
1510 See http://sox.sourceforge.net/SoX/NoiseShaping for graphs of
1511 the different noise-shaping curves.
1512
1513 The -S option selects a slightly `sloped' TPDF, biased towards
1514 higher frequencies. It can be used at any sampling rate but
1515 below ≈22k, plain TPDF is probably better, and above ≈ 37k,
1516 noise-shaped is probably better.
1517
1518 The -a option enables a mode where dithering (and noise-shaping
1519 if applicable) are automatically enabled only when needed. The
1520 most likely use for this is when applying fade in or out to an
1521 already dithered file, so that the redithering applies only to
1522 the faded portions. However, auto dithering is not fool-proof,
1523 so the fades should be carefully checked for any noise modula‐
1524 tion; if this occurs, then either re-dither the whole file, or
1525 use trim, fade, and concatenate.
1526
1527 If the SoX global option -R is not given, then the
1528 pseudo-random number generator used to generate the white noise
1529 will be `reseeded', i.e. the generated noise will be different
1530 between invocations.
1531
1532 This effect should not be followed by any other effect that
1533 affects the audio.
1534
1535 See also the `Dither' section above.
1536
1537 earwax Makes audio easier to listen to on headphones. Adds `cues' to
1538 44.1kHz stereo (i.e. audio CD format) audio so that when lis‐
1539 tened to on headphones the stereo image is moved from inside
1540 your head (standard for headphones) to outside and in front of
1541 the listener (standard for speakers). See
1542 http://www.geocities.com/beinges for a full explanation.
1543
1544 echo gain-in gain-out <delay decay>
1545 Add echoing to the audio. Echoes are reflected sound and can
1546 occur naturally amongst mountains (and sometimes large build‐
1547 ings) when talking or shouting; digital echo effects emulate
1548 this behaviour and are often used to help fill out the sound of
1549 a single instrument or vocal. The time difference between the
1550 original signal and the reflection is the `delay' (time), and
1551 the loudness of the reflected signal is the `decay'. Multiple
1552 echoes can have different delays and decays.
1553
1554 Each given delay decay pair gives the delay in milliseconds and
1555 the decay (relative to gain-in) of that echo. Gain-out is the
1556 volume of the output. For example, this will make it sound as
1557 if there are twice as many instruments as are actually playing:
1558 play lead.aiff echo 0.8 0.88 60 0.4
1559 If the delay is very short, then it will sound like a (metallic)
1560 robot playing music:
1561 play lead.aiff echo 0.8 0.88 6 0.4
1562 A longer delay will sound like an open air concert in the moun‐
1563 tains:
1564 play lead.aiff echo 0.8 0.9 1000 0.3
1565 One mountain more, and:
1566 play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25
1567
1568 echos gain-in gain-out <delay decay>
1569 Add a sequence of echoes to the audio. Each delay decay pair
1570 gives the delay in milliseconds and the decay (relative to gain-
1571 in) of that echo. Gain-out is the volume of the output.
1572
1573 Like the echo effect, but echos stands for `ECHO in Sequel': the
1574 first echo takes the input, the second takes the input and the
1575 first echo, the third takes the input and the first and second
1576 echoes, and so on. Care should be taken when using many echoes;
1577 a single echos has the same effect as a single echo.
1578
1579 The sample will be bounced twice in symmetric echos:
1580 play lead.aiff echos 0.8 0.7 700 0.25 700 0.3
1581 The sample will be bounced twice in asymmetric echos:
1582 play lead.aiff echos 0.8 0.7 700 0.25 900 0.3
1583 The sample will sound as if played in a garage:
1584 play lead.aiff echos 0.8 0.7 40 0.25 63 0.3
1585
1586 equalizer frequency[k] width[q|o|h|k] gain
1587 Apply a two-pole peaking equalisation (EQ) filter. With this
1588 filter, the signal-level at and around a selected frequency can
1589 be increased or decreased, whilst (unlike band-pass and band-
1590 reject filters) that at all other frequencies is unchanged.
1591
1592 frequency gives the filter's central frequency in Hz, width, the
1593 band-width, and gain the required gain or attenuation in dB.
1594 Beware of Clipping when using a positive gain.
1595
1596 In order to produce complex equalisation curves, this effect can
1597 be given several times, each with a different central frequency.
1598
1599 The filter is described in detail in [1].
1600
1601 This effect supports the --plot global option.
1602
1603 See also bass and treble for shelving equalisation effects.
1604
1605 fade [type] fade-in-length [stop-time [fade-out-length]]
1606 Apply a fade effect to the beginning, end, or both of the audio.
1607
1608 An optional type can be specified to select the shape of the
1609 fade curve: q for quarter of a sine wave, h for half a sine
1610 wave, t for linear (`triangular') slope, l for logarithmic, and
1611 p for inverted parabola. The default is logarithmic.
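
The difference between the curve shapes is easiest to see by
evaluating the gain part-way through a fade-in. The sketch below does
this for four of the types. The formulas are an assumed reading of
the shape names given above (they are not taken from the SoX source),
the logarithmic type is omitted because its exact form is not stated
here, and the function name fade_gain is hypothetical.

```shell
#!/bin/sh
# Gain (0..1) at a fractional position through a fade-in.
# The formulas are assumptions based on the names of the curve shapes.
fade_gain() {   # usage: fade_gain TYPE POSITION   (POSITION in 0..1)
    awk -v type="$1" -v pos="$2" 'BEGIN {
        pi = atan2(0, -1)
        p = pos + 0
        if (type == "t") {
            g = p                        # linear slope
        } else if (type == "q") {
            g = sin(p * pi / 2)          # quarter of a sine wave
        } else if (type == "h") {
            g = (1 - cos(p * pi)) / 2    # half a sine wave
        } else if (type == "p") {
            g = 1 - (1 - p) * (1 - p)    # inverted parabola
        } else {
            print "unsupported type"
            exit 1
        }
        printf "%.4f\n", g
    }'
}

for type in t q h p; do
    printf '%s: %s\n' "$type" "$(fade_gain "$type" 0.5)"
done
```

Under these assumptions, half-way through the fade the linear and
half-sine curves are both at 0.5000, the quarter-sine curve is at
0.7071, and the inverted parabola is at 0.7500.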
1612
1613 A fade-in starts from the first sample and ramps the signal
1614 level from 0 to full volume over fade-in-length seconds. Spec‐
1615 ify 0 seconds if no fade-in is wanted.
1616
1617 For fade-outs, the audio will be truncated at stop-time and the
1618 signal level will be ramped from full volume down to 0 starting
1619 at fade-out-length seconds before the stop-time. If fade-out-
1620 length is not specified, it defaults to the same value as fade-
1621 in-length. No fade-out is performed if stop-time is not speci‐
1622 fied. If the file length can be determined from the input file
1623 header and length-changing effects are not in effect, then 0 may
1624 be specified for stop-time to indicate the usual case of a fade-
1625 out that ends at the end of the input audio stream.
1626
1627 All times can be specified in either periods of time or sample
1628 counts. To specify time periods, use the format hh:mm:ss.frac.
1629 To specify sample counts, give the number of
1630 samples and append the letter `s' to the sample count (for exam‐
1631 ple `8000s').
1632
1633 See also the splice effect.
1634
1635 fir [coefs-file|coefs]
1636 Use SoX's FFT convolution engine with given FIR filter coeffi‐
1637 cients. If a single argument is given then this is treated as
1638 the name of a file containing the filter coefficients (white-
1639 space separated; may contain `#' comments). If the given file‐
1640 name is `-', or if no argument is given, then the coefficients
1641 are read from the `standard input' (stdin); otherwise, coeffi‐
1642 cients may be given on the command line. Examples:
1643 sox infile outfile fir 0.0195 -0.082 0.234 0.891 -0.145 0.043
1644 sox infile outfile fir coefs.txt
1645 with coefs.txt containing
1646 # HP filter
1647 # freq=10000
1648 1.2311233052619888e-01
1649 -4.4777096106211783e-01
1650 5.1031563346705155e-01
1651 -6.6502926320995331e-02
1652 ...
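
What the coefficients mean can be shown with a direct-form
convolution, y[n] = h[0]x[n] + h[1]x[n-1] + ..., written here as a
small sketch. SoX itself uses FFT convolution, so this slow version
only illustrates the arithmetic; the function name fir_direct is
hypothetical.

```shell
#!/bin/sh
# Direct-form FIR filtering: y[n] = sum over k of h[k] * x[n-k].
# Both arguments are whitespace-separated lists of numbers.
fir_direct() {   # usage: fir_direct "COEFS" "SAMPLES"
    awk -v coefs="$1" -v samples="$2" 'BEGIN {
        nh = split(coefs, h, " ")
        nx = split(samples, x, " ")
        for (n = 1; n <= nx; n++) {
            acc = 0
            # awk arrays are 1-based, so h[k] pairs with x[n - k + 1]
            for (k = 1; k <= nh && k <= n; k++)
                acc += h[k] * x[n - k + 1]
            printf "%s%g", (n > 1 ? " " : ""), acc
        }
        printf "\n"
    }'
}

# A unit impulse returns the coefficients themselves.
fir_direct "0.0195 -0.082 0.234 0.891 -0.145 0.043" "1 0 0 0 0 0"
```

Feeding a unit impulse through the filter prints the coefficient list
back, which is why the coefficients are also called the filter's
impulse response.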
1653
1654 flanger [delay depth regen width speed shape phase interp]
1655 Apply a flanging effect to the audio. See [3] for a detailed
1656 description of flanging.
1657
1658 All parameters are optional (right to left).
1659
1660 Range Default Description
1661 delay 0 - 30 0 Base delay in milliseconds.
1662 depth 0 - 10 2 Added swept delay in milliseconds.
1663 regen -95 - 95 0 Percentage regeneration (delayed
1664 signal feedback).
1665 width 0 - 100 71 Percentage of delayed signal mixed
1666 with original.
1667 speed 0.1 - 10 0.5 Sweeps per second (Hz).
1668 shape sin Swept wave shape: sine|triangle.
1669 phase 0 - 100 25 Swept wave percentage phase-shift
1670 for multi-channel (e.g. stereo)
1671 flange; 0 = 100 = same phase on
1672 each channel.
1673 interp lin Digital delay-line interpolation:
1674 linear|quadratic.
1675
1676 gain [-e|-B|-b|-r] [-n] [-l|-h] [gain-dB]
1677 Apply amplification or attenuation to the audio signal, or, in
1678 some cases, to some of its channels. Note that use of any of
1679 -e, -B, -b, -r, or -n requires temporary file space to store the
1680 audio to be processed, so may be unsuitable for use with
1681 `streamed' audio.
1682
1683 Without other options, gain-dB is used to adjust the signal
1684 power level by the given number of dB: positive amplifies
1685 (beware of Clipping), negative attenuates. With other options,
1686 the gain-dB amplification or attenuation is (logically) applied
1687 after the processing due to those options.
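
As a rough guide to what a given gain-dB means, a change of G dB
corresponds to multiplying sample amplitudes by 10^(G/20). The sketch
below only performs that conversion and does not call SoX; the
function name db_to_amplitude is hypothetical.

```shell
#!/bin/sh
# Convert a gain in dB to the equivalent amplitude multiplier.
db_to_amplitude() {   # usage: db_to_amplitude GAIN_DB
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp((db + 0) / 20 * log(10)) }'
}

db_to_amplitude 6
db_to_amplitude -3
```

So +6dB roughly doubles the amplitude (1.9953), which is why positive
gains so easily clip, and -3dB scales it by about 0.7079.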
1688
1689 Given the -e option, the levels of the audio channels of a
1690 multi-channel file are `equalised', i.e. gain is applied to all
1691 channels other than that with the highest peak level, such that
1692 all channels attain the same peak level (but, without also giv‐
1693 ing -n, the audio is not `normalised').
1694
1695 The -B (balance) option is similar to -e, but with -B, the RMS
1696 level is used instead of the peak level. -B might be used to
1697 correct stereo imbalance caused by an imperfect record turntable
1698 cartridge. Note that unlike -e, -B might cause some clipping.
1699
1700 -b is similar to -B but has clipping protection, i.e. if neces‐
1701 sary to prevent clipping whilst balancing, attenuation is
1702 applied to all channels. Note, however, that in conjunction
1703 with -n, -B and -b are synonymous.
1704
1705 The -r option is used in conjunction with a prior invocation of
1706 gain with the -h option - see below for details.
1707
1708 The -n option normalises the audio to 0dB FSD; it is often used
1709 in conjunction with a negative gain-dB to the effect that the
1710 audio is normalised to a given level below 0dB. For example,
1711 sox infile outfile gain -n
1712 normalises to 0dB, and
1713 sox infile outfile gain -n -3
1714 normalises to -3dB.
1715
1716 The -l option invokes a simple limiter, e.g.
1717 sox infile outfile gain -l 6
1718 will apply 6dB of gain but never clip. Note that limiting by
1719 more than a few dB, more than occasionally within a piece of
1720 audio, is not recommended as it can cause audible distortion. See the
1721 compand effect for a more capable limiter.
1722
1723 The -h option is used to apply gain to provide head-room for
1724 subsequent processing. For example, with
1725 sox infile outfile gain -h bass +6
1726 6dB of attenuation will be applied prior to the bass boosting
1727 effect thus ensuring that it will not clip. Of course, with
1728 bass, it is obvious how much headroom will be needed, but with
1729 other effects (e.g. rate, dither) it is not always as clear.
1730 Another advantage of using gain -h rather than an explicit
1731 attenuation, is that if the headroom is not used by subsequent
1732 effects, it can be reclaimed with gain -r, for example:
1733 sox infile outfile gain -h bass +6 rate 44100 gain -r
1734 The above effects chain guarantees never to clip nor amplify; it
1735 attenuates if necessary to prevent clipping, but by only as much
1736 as is needed to do so.
1737
1738 Output formatting (dithering and bit-depth reduction) also
1739 requires headroom (which cannot be `reclaimed'), e.g.
1740 sox infile outfile gain -h bass +6 rate 44100 gain -rh dither
1741 Here, the second gain invocation reclaims as much of the headroom
1742 as it can from the preceding effects, but retains as much
1743 headroom as is needed for subsequent processing. The SoX global
1744 option -G can be given to automatically invoke gain -h and gain
1745 -r.
1746
1747 See also the norm and vol effects.
1748
1749 highpass|lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
1750 Apply a high-pass or low-pass filter with 3dB point frequency.
1751 The filter can be either single-pole (with -1), or double-pole
1752 (the default, or with -2). width applies only to double-pole
1753 filters; the default is Q = 0.707 and gives a Butterworth
1754 response. The filters roll off at 6dB per pole per octave (20dB
1755 per pole per decade). The double-pole filters are described in
1756 detail in [1].
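
The 3dB point and the roll-off figures can be sanity-checked with the
textbook magnitude response of an analogue Butterworth low-pass
filter, |H|^2 = 1 / (1 + (f/fc)^(2 x poles)). This is an
approximation made for illustration: SoX's digital filters will
deviate from it as the frequency approaches half the sample rate. The
function name lowpass_atten_db is hypothetical.

```shell
#!/bin/sh
# Approximate response (dB) of an ideal Butterworth low-pass filter
# at frequency FREQ_HZ, for a filter with the given number of poles.
lowpass_atten_db() {   # usage: lowpass_atten_db POLES CUTOFF_HZ FREQ_HZ
    awk -v poles="$1" -v fc="$2" -v f="$3" 'BEGIN {
        ratio = f / fc
        printf "%.1f\n", -10 * log(1 + ratio ^ (2 * poles)) / log(10)
    }'
}

lowpass_atten_db 1 1000 1000   # at the cut-off frequency
lowpass_atten_db 1 1000 8000   # three octaves above, single-pole
lowpass_atten_db 2 1000 8000   # three octaves above, double-pole
```

Under this model the response at the cut-off frequency is -3.0dB;
three octaves higher it is -18.1dB for one pole and -36.1dB for two,
in line with 6dB per pole per octave.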
1757
1758 These effects support the --plot global option.
1759
1760 See also sinc for filters with a steeper roll-off.
1761
1762 ladspa module [plugin] [argument...]
1763 Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API)
1764 plugin. Despite the name, LADSPA is not Linux-specific, and a
1765 wide range of effects is available as LADSPA plugins, such as
1766 cmt [6] (the Computer Music Toolkit) and Steve Harris's plugin
1767 collection [7]. The first argument is the plugin module, the
1768 second the name of the plugin (a module can contain more than
1769 one plugin) and any other arguments are for the control ports of
1770 the plugin. Missing arguments are supplied by default values if
1771 possible. Only plugins with at most one audio input and one
1772 audio output port can be used. If found, the environment vari‐
1773 able LADSPA_PATH will be used as search path for plugins.
1774
1775 loudness [gain [reference]]
1776 Loudness control - similar to the gain effect, but provides
1777 equalisation for the human auditory system. See
1778 http://en.wikipedia.org/wiki/Loudness for a detailed description
1779 of loudness. The gain is adjusted by the given gain parameter
1780 (usually negative) and the signal equalised according to ISO 226
1781 w.r.t. a reference level of 65dB, though an alternative refer‐
1782 ence level may be given if the original audio has been equalised
1783 for some other optimal level. A default gain of -10dB is used
1784 if a gain value is not given.
1785
1786 See also the gain effect.
1787
1788 lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
1789 Apply a low-pass filter. See the description of the highpass
1790 effect for details.
1791
1792 mcompand "attack1,decay1{,attack2,decay2}
1793 [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
1794 [gain [initial-volume-dB [delay]]]" {crossover-freq[k]
1795 "attack1,..."}
1796
1797 The multi-band compander is similar to the single-band compander
1798 but the audio is first divided into bands using Linkwitz-Riley
1799 cross-over filters and a separately specifiable compander run on
1800 each band. See the compand effect for the definition of its
1801 parameters. Compand parameters are specified between double
1802 quotes and the crossover frequency for that band is given by
1803 crossover-freq; these can be repeated to create multiple bands.
1804
1805 For example, the following (one long) command shows how multi-
1806 band companding is typically used in FM radio:
1807 play track1.wav gain -3 sinc 8000- 29 100 mcompand \
1808 "0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
1809 "0.003,0.05 -47,-40,-34,-34,-17,-33" 400 \
1810 "0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
1811 "0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
1812 "0,0.025 -38,-31,-28,-28,-0,-25" \
1813 gain 15 highpass 22 highpass 22 sinc -n 255 -b 16 -17500 \
1814 gain 9 lowpass -1 17801
1815 The audio file is played with a simulated FM radio sound (or
1816 broadcast signal condition if the lowpass filter at the end is
1817 skipped). Note that the pipeline is set up with US-style 75us
1818 pre-emphasis.
1819
1820 See also compand for a single-band companding effect.
1821
1822 mixer [ -l|-r|-f|-b|-1|-2|-3|-4|n{,n} ]
1823 Reduce the number of audio channels by mixing or selecting chan‐
1824 nels, or increase the number of channels by duplicating chan‐
1825 nels. Note: this effect operates on the audio channels within
1826 the SoX effects processing chain; it should not be confused with
1827 the -m global option (where multiple files are mix-combined
1828 before entering the effects chain).
1829
1830 When reducing the number of channels it is possible to use the
1831 -l, -r, -f, -b, -1, -2, -3, -4 options to select only the left,
1832 right, front, back channel(s) or specific channel for the output
1833 instead of averaging the channels. The -l and -r options will
1834 do averaging in quad-channel files, so select the exact channel
1835 to prevent this.
1836
1837 The mixer effect can also be invoked with up to 16 numbers, sep‐
1838 arated by commas, which specify the proportion (0 = 0% and 1 =
1839 100%) of each input channel that is to be mixed into each output
1840 channel. In two-channel mode, 4 numbers are given: l → l, l →
1841 r, r → l, and r → r, respectively. In four-channel mode, the
1842 first 4 numbers give the proportions for the left-front output
1843 channel, as follows: lf → lf, rf → lf, lb → lf, and rb → lf.
1844 The next 4 give the right-front output in the same order, then
1845 left-back and right-back.
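
For the two-channel case, the four numbers can be read as a small
mixing matrix. The sketch below applies such a matrix to one pair of
sample values, following the order l → l, l → r, r → l, r → r given
above. It is an illustration of the arithmetic only and does not call
SoX; the function name mix2 is hypothetical.

```shell
#!/bin/sh
# Apply four two-channel mixing proportions to one (left, right) pair.
# Order of the proportions: l->l, l->r, r->l, r->r.
mix2() {   # usage: mix2 "ll,lr,rl,rr" LEFT RIGHT
    awk -v m="$1" -v l="$2" -v r="$3" 'BEGIN {
        split(m, c, ",")
        out_l = c[1] * l + c[3] * r   # everything routed to the left output
        out_r = c[2] * l + c[4] * r   # everything routed to the right output
        printf "%g %g\n", out_l, out_r
    }'
}

mix2 "1,0,0,1" 0.5 -0.25           # channels passed through unchanged
mix2 "0.5,0.5,0.5,0.5" 0.5 -0.25   # both outputs carry the average
mix2 "0,1,1,0" 0.5 -0.25           # left and right swapped
```

The three calls print `0.5 -0.25', `0.125 0.125' and `-0.25 0.5'
respectively.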
1846
1847 It is also possible to use the 16 numbers to expand or reduce
1848 the channel count; just specify 0 for unused channels.
1849
1850 Finally, certain reduced combinations of numbers can be specified
1851 for certain input/output channel combinations.
1852
1853 In Ch Out Ch Num Mappings
1854 2 1 2 l → l, r → l
1855 2 2 1 adjust balance
1856 4 1 4 lf → l, rf → l, lb → l, rb → l
1857 4 2 2 lf → l&rf → r, lb → l&rb → r
1858 4 4 1 adjust balance
1859 4 4 2 front balance, back balance
1860
1861 See also remix for a mixing effect that handles any number of
1862 channels.
1863
1864 noiseprof [profile-file]
1865 Calculate a profile of the audio for use in noise reduction.
1866 See the description of the noisered effect for details.
1867
1868 noisered [profile-file [amount]]
1869 Reduce noise in the audio signal by profiling and filtering.
1870 This effect is moderately effective at removing consistent back‐
1871 ground noise such as hiss or hum. To use it, first run SoX with
1872 the noiseprof effect on a section of audio that ideally would
1873 contain silence but in fact contains noise - such sections are
1874 typically found at the beginning or the end of a recording.
1875 noiseprof will write out a noise profile to profile-file, or to
1876 stdout if no profile-file or if `-' is given. E.g.
1877 sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile
1878 To actually remove the noise, run SoX again, this time with the
1879 noisered effect; noisered will reduce noise according to a noise
1880 profile (which was generated by noiseprof), from profile-file,
1881 or from stdin if no profile-file or if `-' is given. E.g.
1882 sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
1883 How much noise should be removed is specified by amount, a number
1884 between 0 and 1 with a default of 0.5. Higher numbers will
1885 remove more noise but present a greater likelihood of removing
1886 wanted components of the audio signal. Before replacing an
1887 original recording with a noise-reduced version, experiment with
1888 different amount values to find the optimal one for your audio;
1889 use headphones to check that you are happy with the results,
1890 paying particular attention to quieter sections of the audio.
1891
1892 On most systems, the two stages - profiling and reduction - can
1893 be combined using a pipe, e.g.
1894 sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered
1895
1896 norm [dB-level]
1897 Normalise the audio. norm is just an alias for gain -n; see the
1898 gain effect for details.
1899
1900 Note that norm's -i and -b options are deprecated (having been
1901 superseded by gain -en and gain -B respectively) and will be
1902 removed in a future release.
1903
1904 oops Out Of Phase Stereo effect. Mixes stereo to twin-mono where
1905 each mono channel contains the difference between the left and
1906 right stereo channels. This is sometimes known as the `karaoke'
1907 effect as it often has the effect of removing most or all of the
1908 vocals from a recording.
1909
1910 overdrive [gain(20) [colour(20)]]
1911 Non linear distortion. The colour parameter controls the amount
1912 of even harmonic content in the over-driven output.
1913
1914 pad { length[@position] }
1915 Pad the audio with silence, at the beginning, the end, or any
1916 specified points through the audio. Both length and position
1917 can specify a time or, if appended with an `s', a number of sam‐
1918 ples. length is the amount of silence to insert and position
1919 the position in the input audio stream at which to insert it.
1920 Any number of lengths and positions may be specified, provided
1921 that a specified position is not less than the previous one.
1922 position is optional for the first and last lengths specified
1923 and if omitted correspond to the beginning and the end of the
1924 audio respectively. For example, pad 1.5 1.5 adds 1.5 seconds
1925 of silence padding at each end of the audio, whilst pad
1926 4000s@3:00 inserts 4000 samples of silence 3 minutes into the
1927 audio. If silence is wanted only at the end of the audio, spec‐
1928 ify either the end position or specify a zero-length pad at the
1929 start.
1930
1931 See also delay for an effect that can add silence at the begin‐
1932 ning of the audio on a channel-by-channel basis.
1933
1934 phaser gain-in gain-out delay decay speed [-s|-t]
1935 Add a phasing effect to the audio. See [3] for a detailed
1936 description of phasing.
1937
1938 delay/decay/speed gives the delay in milliseconds and the decay
1939 (relative to gain-in) with a modulation speed in Hz. The modu‐
1940 lation is either sinusoidal (-s) - preferable for multiple
1941 instruments, or triangular (-t) - gives single instruments a
1942 sharper phasing effect. The decay should be less than 0.5 to
1943 avoid feedback, and usually no less than 0.1. Gain-out is the
1944 volume of the output.
1945
1946 For example:
1947 play snare.flac phaser 0.8 0.74 3 0.4 0.5 -t
1948 Gentler:
1949 play snare.flac phaser 0.9 0.85 4 0.23 1.3 -s
1950 A popular sound:
1951 play snare.flac phaser 0.89 0.85 1 0.24 2 -t
1952 More severe:
1953 play snare.flac phaser 0.6 0.66 3 0.6 2 -t
1954
1955 pitch [-q] shift [segment [search [overlap]]]
1956 Change the audio pitch (but not tempo).
1957
1958 shift gives the pitch shift as positive or negative `cents'
1959 (i.e. 100ths of a semitone). See the tempo effect for a
1960 description of the other parameters.
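
Since a cent is 1/100 of a semitone and there are 12 semitones to an
octave, a shift of c cents multiplies every frequency by 2^(c/1200).
The sketch below performs only that conversion; the function name
cents_to_ratio is hypothetical.

```shell
#!/bin/sh
# Frequency ratio corresponding to a pitch shift given in cents.
cents_to_ratio() {   # usage: cents_to_ratio CENTS
    awk -v cents="$1" 'BEGIN { printf "%.6f\n", exp((cents + 0) / 1200 * log(2)) }'
}

cents_to_ratio 100
cents_to_ratio 1200
cents_to_ratio -100
```

A shift of 100 (one semitone up) gives a ratio of 1.059463, 1200 (one
octave up) gives 2.000000, and -100 gives 0.943874.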
1961
1962 See also the speed and tempo effects.
1963
1964 rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
1965 Change the audio sampling rate (i.e. resample the audio) to any
1966 given RATE (even non-integer if this is supported by the output
1967 file format) using a quality level defined as follows:
1968
1969 Quality Band- Rej dB Typical Use
1970 width
1971 -q quick n/a ≈30 @ playback on
1972 Fs/4 ancient hardware
1973 -l low 80% 100 playback on old
1974 hardware
1975 -m medium 95% 100 audio playback
1976 -h high 95% 125 16-bit mastering
1977 (use with dither)
1978 -v very high 95% 175 24-bit mastering
1979
1980 where Band-width is the percentage of the audio frequency band
1981 that is preserved and Rej dB is the level of noise rejection.
1982 Increasing levels of resampling quality come at the expense of
1983 increasing amounts of time to process the audio. If no quality
1984 option is given, the quality level used is `high'.
1985
1986 The `quick' algorithm uses cubic interpolation; all others use
1987 band-limited interpolation. By default, all algorithms have a
1988 `linear' phase response; for `medium', `high' and `very high',
1989 the phase response is configurable (see below).
1990
1991 The rate effect is invoked automatically if SoX's -r option
1992 specifies a rate that is different to that of the input file(s).
1993 Alternatively, if this effect is given explicitly, then SoX's -r
1994 option need not be given. For example, the following two com‐
1995 mands are equivalent:
1996 sox input.wav -r 48k output.wav bass -3
1997 sox input.wav output.wav bass -3 rate 48k
1998 though the second command is more flexible as it allows rate
1999 options to be given, and allows the effects to be ordered arbi‐
2000 trarily.
2001
2002 * * *
2003
2004 Warning: technically detailed discussion follows.
2005
2006 The simple quality selection described above provides settings
2007 that satisfy the needs of the vast majority of resampling tasks.
2008 Occasionally, however, it may be desirable to fine-tune the
2009 resampler's filter response; this can be achieved using over‐
2010 ride options, as detailed in the following table:
2011
2012 -M/-I/-L Phase response = minimum/intermediate/linear
2013 -s Steep filter (band-width = 99%)
2014 -a Allow aliasing/imaging above the pass-band
2015 -b 74-99.7 Any band-width %
2016 -p 0-100 Any phase response (0 = minimum, 25 = intermediate,
2017 50 = linear, 100 = maximum)
2018
2019 N.B. Override options cannot be used with the `quick' or `low'
2020 quality algorithms.
2021
2022 All resamplers use filters that can sometimes create `echo'
2023 (a.k.a. `ringing') artefacts with transient signals such as
2024 those that occur with `finger snaps' or other highly percussive
2025 sounds. Such artefacts are much more noticeable to the human
2026 ear if they occur before the transient (`pre-echo') than if they
2027 occur after it (`post-echo'). Note that the frequency of any such
2028 artefacts is related to the smaller of the original and new sam‐
2029 pling rates but that if this is at least 44.1kHz, then the arte‐
2030 facts will lie outside the range of human hearing.
2031
2032 A phase response setting may be used to control the distribution
2033 of any transient echo between `pre' and `post': with minimum
2034 phase, there is no pre-echo but the longest post-echo; with lin‐
2035 ear phase, pre and post echo are in equal amounts (in signal
2036 terms, but not audibility terms); the intermediate phase setting
2037 attempts to find the best compromise by selecting a small length
2038 (and level) of pre-echo and a medium-length post-echo.
2039
2040 Minimum, intermediate, or linear phase response is selected
2041 using the -M, -I, or -L option; a custom phase response can be
2042 created with the -p option. Note that phase responses between
2043 `linear' and `maximum' (greater than 50) are rarely useful.
2044
2045 A resampler's band-width setting determines how much of the fre‐
2046 quency content of the original signal (w.r.t. the original sam‐
2047 ple rate when up-sampling, or the new sample rate when down-sam‐
2048 pling) is preserved during conversion. The term `pass-band' is
2049 used to refer to all frequencies up to the band-width point
2050 (e.g. for 44.1kHz sampling rate, and a resampling band-width of
2051 95%, the pass-band represents frequencies from 0Hz (D.C.) to
2052 circa 21kHz). Increasing the resampler's band-width results in
2053 a slower conversion and can increase transient echo artefacts
2054 (and vice versa).
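
The pass-band arithmetic in the example above can be sketched in shell as follows. This is only an illustration: passband_hz is a hypothetical helper, not part of SoX, and it assumes (as the 44.1kHz example suggests) that the band-width percentage is taken relative to half of the smaller of the two sample rates.

```shell
#!/bin/sh
# passband_hz IN_RATE OUT_RATE [BANDWIDTH_PERCENT]
# Hypothetical helper (not part of SoX): prints the upper edge of the
# pass-band in Hz. Assumes band-width is a percentage of half the
# relevant sample rate.
passband_hz() {
    awk -v in_rate="$1" -v out_rate="$2" -v bw="${3:-95}" 'BEGIN {
        # Up-sampling uses the original rate, down-sampling the new rate;
        # in both cases that is the smaller of the two.
        rate = (in_rate < out_rate) ? in_rate : out_rate
        printf "%.1f\n", rate / 2 * bw / 100
    }'
}

passband_hz 44100 96000 95    # prints 20947.5, i.e. circa 21kHz
```

The same figure results when down-sampling from 96kHz to 44.1kHz, since the new rate is then the smaller one.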
2055
2056 The -s `steep filter' option changes resampling band-width from
2057 the default 95% (based on the 3dB point), to 99%. The -b option
2058 allows the band-width to be set to any value in the range
2059 74-99.7 %, but note that band-width values greater than 99% are
2060 not recommended for normal use as they can cause excessive tran‐
2061 sient echo.
2062
2063 If the -a option is given, then aliasing/imaging above the pass-
2064 band is allowed. For example, with 44.1kHz sampling rate, and a
2065 resampling band-width of 95%, this means that frequency content
2066 above 21kHz can be distorted; however, since this is above the
2067 pass-band (i.e. above the highest frequency of interest/audi‐
2068 bility), this may not be a problem. The benefits of allowing
2069 aliasing/imaging are reduced processing time, and reduced (by
2070 almost half) transient echo artefacts. Note that if this option
2071 is given, then the minimum band-width allowable with -b
2072 increases to 85%.
2073
2074 Examples:
2075 sox input.wav -b 16 output.wav rate -s -a 44100 dither -s
2076 default (high) quality resampling; overrides: steep filter,
2077 allow aliasing; to 44.1kHz sample rate; noise-shaped dither to
2078 16-bit WAV file.
2079 sox input.wav -b 24 output.aiff rate -v -I -b 90 48k
2080 very high quality resampling; overrides: intermediate phase,
2081 band-width 90%; to 48k sample rate; store output to 24-bit AIFF
2082 file.
2083
2084 * * *
2085
2086 The pitch, speed and tempo effects all use the rate effect at
2087 their core.
2088
2089 remix [-a|-m|-p] <out-spec>
2090 out-spec = in-spec{,in-spec} | 0
2091 in-spec = [in-chan][-[in-chan2]][vol-spec]
2092 vol-spec = p|i|v[volume]
2093
2094 Select and mix input audio channels into output audio channels.
2095 Each output channel is specified, in turn, by a given out-spec:
2096 a list of contributing input channels and volume specifications.
2097
2098 Note that this effect operates on the audio channels within the
2099 SoX effects processing chain; it should not be confused with the
2100 -m global option (where multiple files are mix-combined before
2101 entering the effects chain).
2102
2103 An out-spec contains comma-separated input channel-numbers and
2104 hyphen-delimited channel-number ranges; alternatively, 0 may be
2105 given to create a silent output channel. For example,
2106 sox input.wav output.wav remix 6 7 8 0
2107 creates an output file with four channels, where channels 1, 2,
2108 and 3 are copies of channels 6, 7, and 8 in the input file, and
2109 channel 4 is silent. Whereas
2110 sox input.wav output.wav remix 1-3,7 3
2111 creates a (somewhat bizarre) stereo output file where the left
2112 channel is a mix-down of input channels 1, 2, 3, and 7, and the
2113 right channel is a copy of input channel 3.
2114
2115 Where a range of channels is specified, the channel numbers to
2116 the left and right of the hyphen are optional and default to 1
2117 and to the number of input channels respectively. Thus
2118 sox input.wav output.wav remix -
2119 performs a mix-down of all input channels to mono.
2120
2121 By default, where an output channel is mixed from multiple (n)
2122 input channels, each input channel will be scaled by a factor of
2123 ¹/n. Custom mixing volumes can be set by following a given
2124 input channel or range of input channels with a vol-spec (volume
2125 specification). This is one of the letters p, i, or v, followed
2126 by a volume number, the meaning of which depends on the given
2127 letter and is defined as follows:
2128
2129 Letter Volume number Notes
2130 p power adjust in dB 0 = no change
2131 i power adjust in dB As `p', but invert
2132 the audio
2133 v voltage multiplier 1 = no change, 0.5
2134 ≈ 6dB attenuation,
2135 2 ≈ 6dB gain, -1 =
2136 invert
2137
2138 If an out-spec includes at least one vol-spec then, by default,
2139 ¹/n scaling is not applied to any other channels in the same
2140 out-spec (though it may be in other out-specs). The -a (automatic)
2141 option however, can be given to retain the automatic scaling in
2142 this case. For example,
2143 sox input.wav output.wav remix 1,2 3,4v0.8
2144 results in channel level multipliers of 0.5,0.5 1,0.8, whereas
2145 sox input.wav output.wav remix -a 1,2 3,4v0.8
2146 results in channel level multipliers of 0.5,0.5 0.5,0.8.
2147
2148 The -m (manual) option disables all automatic volume adjust‐
2149 ments, so
2150 sox input.wav output.wav remix -m 1,2 3,4v0.8
2151 results in channel level multipliers of 1,1 1,0.8.
2152
2153 The volume number is optional and omitting it corresponds to no
2154 volume change; however, the only case in which this is useful is
2155 in conjunction with i. For example, if input.wav is stereo,
2156 then
2157 sox input.wav output.wav remix 1,2i
2158 is a mono equivalent of the oops effect.
2159
2160 If the -p option is given, then any automatic ¹/n scaling is
2161 replaced by ¹/√n (`power') scaling; this gives a louder mix but
2162 one that might occasionally clip.
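
The two automatic scaling rules described above can be compared with a small sketch. remix_scale is a hypothetical helper written for illustration only; it simply evaluates ¹/n and ¹/√n and does not call SoX.

```shell
#!/bin/sh
# remix_scale N [-p]
# Hypothetical helper (not part of SoX): prints the per-channel
# multiplier for an output channel mixed from N input channels,
# using 1/N by default or 1/sqrt(N) when -p is given.
remix_scale() {
    awk -v n="$1" -v mode="${2:-}" 'BEGIN {
        if (n < 1) exit 1
        if (mode == "-p") printf "%.4f\n", 1 / sqrt(n)
        else              printf "%.4f\n", 1 / n
    }'
}

remix_scale 2       # prints 0.5000
remix_scale 2 -p    # prints 0.7071
```

With two input channels at full scale, the `power' multipliers sum to more than 1, which is why a mix made with -p might occasionally clip.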
2163
2164 * * *
2165
2166 One use of the remix effect is to split an audio file into a set
2167 of files, each containing one of the constituent channels (in
2168 order to perform subsequent processing on individual audio chan‐
2169 nels). Where more than a few channels are involved, a script
2170 such as the following (Bourne shell script) is useful:
2171 #!/bin/sh
2172 chans=`soxi -c "$1"`
2173 while [ $chans -ge 1 ]; do
2174 chans0=`printf %02i $chans` # 2 digits hence up to 99 chans
2175 out=`echo "$1"|sed "s/\(.*\)\.\(.*\)/\1-$chans0.\2/"`
2176 sox "$1" "$out" remix $chans
2177 chans=`expr $chans - 1`
2178 done
2179 If a file input.wav containing six audio channels were given,
2180 the script would produce six output files: input-01.wav,
2181 input-02.wav, ..., input-06.wav.
2182
2183 See also mixer and swap for similar effects.
2184
2185 repeat count
2186 Repeat the entire audio count times. Requires temporary file
2187 space to store the audio to be repeated. Note that repeating
2188 once yields two copies: the original audio and the repeated
2189 audio.
2190
2191 reverb [-w|--wet-only] [reverberance (50%) [HF-damping (50%)
2192 [room-scale (100%) [stereo-depth (100%)
2193 [pre-delay (0ms) [wet-gain (0dB)]]]]]]
2194
2195 Add reverberation to the audio using the `freeverb' algorithm.
2196 A reverberation effect is sometimes desirable for concert halls
2197 that are too small or contain so many people that the hall's
2198 natural reverberance is diminished. Applying a small amount of
2199 stereo reverb to a (dry) mono signal will usually make it sound
2200 more natural. See [3] for a detailed description of reverbera‐
2201 tion.
2202
2203 Note that this effect increases both the volume and the length
2204 of the audio, so to prevent clipping in these domains, a typical
2205 invocation might be:
2206 play dry.wav gain -3 pad 0 3 reverb
2207 The -w option can be given to select only the `wet' signal, thus
2208 allowing it to be processed further, independently of the `dry'
2209 signal. E.g.
2210 play -m voice.wav "|sox voice.wav -p reverse reverb -w reverse"
2211 for a reverse reverb effect.
2212
2213 reverse
2214 Reverse the audio completely. Requires temporary file space to
2215 store the audio to be reversed.
2216
2217 riaa Apply RIAA vinyl playback equalisation. The sampling rate must
2218 be one of: 44.1, 48, 88.2, 96 kHz.
2219
2220 This effect supports the --plot global option.
2221
2222 silence [-l] above-periods [duration threshold[d|%]
2223 [below-periods duration threshold[d|%]]]
2224
2225 Removes silence from the beginning, middle, or end of the audio.
2226 `Silence' is determined by a specified threshold.
2227
2228 The above-periods value is used to indicate if audio should be
2229 trimmed at the beginning of the audio. A value of zero indicates
2230 no silence should be trimmed from the beginning. When specifying
2231 a non-zero above-periods, it trims audio up until it finds non-
2232 silence. Normally, when trimming silence from the beginning of audio
2233 the above-periods will be 1 but it can be increased to higher
2234 values to trim all audio up to a specific count of non-silence
2235 periods. For example, if you had an audio file with two songs
2236 that each contained 2 seconds of silence before the song, you
2237 could specify an above-period of 2 to strip out both silence
2238 periods and the first song.
2239
2240 When above-periods is non-zero, you must also specify a duration
2241 and threshold. Duration indicates the amount of time that non-
2242 silence must be detected before it stops trimming audio. By
2243 increasing the duration, bursts of noise can be treated as
2244 silence and trimmed off.
2245
2246 Threshold is used to indicate what sample value you should treat
2247 as silence. For digital audio, a value of 0 may be fine but for
2248 audio recorded from analog, you may wish to increase the value
2249 to account for background noise.
2250
2251 When optionally trimming silence from the end of the audio, you
2252 specify a below-periods count. In this case, below-period means
2253 to remove all audio after silence is detected. Normally, this
2254 will be a value of 1 but it can be increased to skip over peri‐
2255 ods of silence that are wanted. For example, if you have a song
2256 with 2 seconds of silence in the middle and 2 seconds at the end,
2257 you could set below-period to a value of 2 to skip over the
2258 silence in the middle of the audio.
2259
2260 For below-periods, duration specifies a period of silence that
2261 must exist before audio is not copied any more. By specifying a
2262 higher duration, silence that is wanted can be left in the
2263 audio. For example, if you have a song with an expected 1 sec‐
2264 ond of silence in the middle and 2 seconds of silence at the
2265 end, a duration of 2 seconds could be used to skip over the mid‐
2266 dle silence.
2267
2268 Unfortunately, you must know the length of the silence at the
2269 end of your audio file to trim off silence reliably. A work
2270 around is to use the silence effect in combination with the
2271 reverse effect. By first reversing the audio, you can use the
2272 above-periods to reliably trim all audio from what looks like
2273 the front of the file. Then reverse the file again to get back
2274 to normal.
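
The reverse work-around just described might be wrapped up as follows. This is a sketch only: trim_trailing_silence is a hypothetical function name, and the default duration (0.1 seconds) and threshold (1%) are arbitrary illustrative values that will need tuning for real recordings. The SOX variable is an assumption added here so that the command can be inspected without running it.

```shell
#!/bin/sh
# trim_trailing_silence INFILE OUTFILE [DURATION [THRESHOLD]]
# Hypothetical wrapper: reverse the audio, trim what is now leading
# silence using above-periods = 1, then reverse back to normal.
# Set SOX=echo to print the command line instead of running sox.
trim_trailing_silence() {
    "${SOX:-sox}" "$1" "$2" reverse silence 1 "${3:-0.1}" "${4:-1%}" reverse
}

# Example (not run here):
#   trim_trailing_silence take1.wav take1-trimmed.wav 0.5 2%
```

Note that both reverse effects require temporary file space, as described under the reverse effect.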
2275
2276 To remove silence from the middle of a file, specify a below-
2277 periods that is negative. This value is then treated as a posi‐
2278 tive value and is also used to indicate the effect should
2279 restart processing as specified by the above-periods, making it
2280 suitable for removing periods of silence in the middle of the
2281 audio.
2282
2283 The option -l indicates that below-periods duration length of
2284 audio should be left intact at the beginning of each period of
2285 silence. This is useful, for example, if you want to shorten long
2286 pauses between words but do not want to remove them completely.
2287
2288 The period counts are in units of samples. Duration counts may
2289 be in the format of hh:mm:ss.frac, or the exact count of sam‐
2290 ples. Threshold numbers may be suffixed with d to indicate the
2291 value is in decibels, or % to indicate a percentage of maximum
2292 value of the sample value (0% specifies pure digital silence).
2293
2294 The following example shows how this effect can be used to start
2295 a recording that does not contain the delay at the start which
2296 usually occurs between `pressing the record button' and the
2297 start of the performance:
2298 rec parameters filename other-effects silence 1 5 2%
2299
2300 sinc [-a att|-b beta] [-p phase|-M|-I|-L] [-t tbw|-n taps] [fre‐
2301 qHP][-freqLP [-t tbw|-n taps]]
2302 Apply a sinc kaiser-windowed low-pass, high-pass, band-pass, or
2303 band-reject filter to the signal. The freqHP and freqLP parame‐
2304 ters give the frequencies of the 6dB points of a high-pass and
2305 low-pass filter that may be invoked individually, or together.
2306 If both are given, then freqHP < freqLP creates a band-pass fil‐
2307 ter, freqHP > freqLP creates a band-reject filter.
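
The way the frequency argument selects the filter type can be summarised with the sketch below. sinc_kind is a hypothetical helper for illustration; it assumes plain integer frequencies in Hz (no `k' suffix) and only classifies the argument, it does not run SoX.

```shell
#!/bin/sh
# sinc_kind [freqHP][-freqLP]
# Hypothetical helper (not part of SoX): reports which kind of filter
# the given frequency argument would select. Integer Hz values only.
sinc_kind() {
    case "$1" in
        -*)  echo low-pass ;;                      # only -freqLP given
        *-*) hp=${1%%-*}; lp=${1#*-}
             if   [ "$hp" -lt "$lp" ]; then echo band-pass
             elif [ "$hp" -gt "$lp" ]; then echo band-reject
             else echo unspecified                 # equal values: not covered above
             fi ;;
        *)   echo high-pass ;;                     # only freqHP given
    esac
}

sinc_kind 300-3000    # prints band-pass
sinc_kind 3000-300    # prints band-reject
```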
2308
2309 The default stop-band attenuation of 120dB can be overridden
2310 with -a; alternatively, the kaiser-window `beta' parameter can
2311 be given directly with -b.
2312
2313 The default transition band-width of 5% of the total band can be
2314 overridden with -t (and tbw in Hertz); alternatively, the number
2315 of filter taps can be given directly with -n.
2316
2317 If both freqHP and freqLP are given, then a -t or -n option
2318 given to the left of the frequencies applies to both frequen‐
2319 cies; one of these options given to the right of the frequencies
2320 applies only to freqLP.
2321
2322 The -p, -M, -I, and -L options control the filter's phase
2323 response; see the rate effect for details.
2324
2325 This effect supports the --plot global option.
2326
2327 spectrogram [options]
2328 Create a spectrogram of the audio; the audio is passed unmodi‐
2329 fied through the SoX processing chain. This effect is optional
2330 - type sox --help and check the list of supported effects to see
2331 if it has been included.
2332
2333 The spectrogram is rendered in a Portable Network Graphic (PNG)
2334 file, and shows time in the X-axis, frequency in the Y-axis, and
2335 audio signal magnitude in the Z-axis. Z-axis values are repre‐
2336 sented by the colour (or optionally the intensity) of the pixels
2337 in the X-Y plane. If the audio signal contains multiple chan‐
2338 nels then these are shown from top to bottom starting from chan‐
2339 nel 1 (which is the left channel for stereo audio).
2340
2341 For example, if `my.wav' is a stereo file, then with
2342 sox my.wav -n spectrogram
2343 a spectrogram of the entire file will be created in the file
2344 `spectrogram.png'. More often though, analysis of a smaller
2345 portion of the audio is required; e.g. with
2346 sox my.wav -n remix 2 trim 20 30 spectrogram
2347 the spectrogram shows information only from the second (right)
2348 channel, and of thirty seconds of audio starting from twenty
2349 seconds in. To analyse a small portion of the frequency domain,
2350 the rate effect may be used, e.g.
2351 sox my.wav -n rate 6k spectrogram
2352 allows detailed analysis of frequencies up to 3kHz (half the
2353 sampling rate) i.e. where the human auditory system is most sen‐
2354 sitive. With
2355 sox my.wav -n trim 0 10 spectrogram -x 600 -y 200 -z 100
2356 the given options control the size of the spectrogram's X, Y & Z
2357 axes (in this case, the spectrogram area of the produced image
2358 will be 600 by 200 pixels in size and the Z-axis range will be
2359 100 dB). Note that the produced image includes axes legends
2360 etc. and so will be a little larger than the specified spectro‐
2361 gram size. In this example:
2362 sox -n -n synth 6 tri 10k:14k spectrogram -z 100 -w kaiser
2363 an analysis `window' with high dynamic range is selected to best
2364 display the spectrogram of a swept triangular wave. For a simi‐
2365 lar example, append the following to the `chime' command in the
2366 description of the delay effect (above):
2367 rate 2k spectrogram -X 200 -Z -10 -w kaiser
2368 Options are also available to control the appearance (colour-
2369 set, brightness, contrast, etc.) and filename of the spectro‐
2370 gram; e.g. with
2371 sox my.wav -n spectrogram -m -l -o print.png
2372 a spectrogram is created suitable for printing on a `black and
2373 white' printer.
2374
2375 Options:
2376
2377 -x num Change the (maximum) width (X-axis) of the spectrogram
2378 from its default value of 800 pixels to a given number
2379 between 100 and 5000. See also -X and -d.
2380
2381 -X num X-axis pixels/second; the default is auto-calculated to
2382 fit the given or known audio duration to the X-axis size,
2383 or 100 otherwise. If given in conjunction with -d, this
2384 option affects the width of the spectrogram; otherwise,
2385 it affects the duration of the spectrogram. num can be
2386 from 1 (low time resolution) to 5000 (high time resolu‐
2387 tion) and need not be an integer. SoX may make a slight
2388 adjustment to the given number for processing quantisa‐
2389 tion reasons; if so, SoX will report the actual number
2390 used (viewable when the SoX global option -V is in
2391 effect). See also -x and -d.
2392
2393 -y num Sets the Y-axis size in pixels (per channel); this is the
2394 number of frequency `bins' used in the Fourier analysis
2395 that produces the spectrogram. N.B. it can be slow to
2396 produce the spectrogram if this number is not one more
2397 than a power of two (e.g. 129). By default the Y-axis
2398 size is chosen automatically (depending on the number of
2399 channels). See -Y for an alternative way of setting spec‐
2400 trogram height.
2401
2402 -Y num Sets the target total height of the spectrogram(s). The
2403 default value is 550 pixels. Using this option (and by
2404 default), SoX will choose a height for individual spec‐
2405 trogram channels that is one more than a power of two, so
2406 the actual total height may fall short of the given num‐
2407 ber. However, there is also a minimum height per channel
2408 so if there are many channels, the number may be
2409 exceeded. See -y for an alternative way of setting spectro‐
2410 gram height.
2411
2412 -z num Z-axis (colour) range in dB, default 120. This sets the
2413 dynamic-range of the spectrogram to be -num dBFS to
2414 0 dBFS. Num may range from 20 to 180. Decreasing
2415 dynamic-range effectively increases the `contrast' of the
2416 spectrogram display, and vice versa.
2417
2418 -Z num Sets the upper limit of the Z-axis in dBFS. A negative
2419 num effectively increases the `brightness' of the spec‐
2420 trogram display, and vice versa.
2421
2422 -q num Sets the Z-axis quantisation, i.e. the number of differ‐
2423 ent colours (or intensities) in which to render Z-axis
2424 values. A small number (e.g. 4) will give a
2425 `poster'-like effect making it easier to discern magni‐
2426 tude bands of similar level. Small numbers also usually
2427 result in small PNG files. The number given specifies
2428 the number of colours to use inside the Z-axis range; two
2429 colours are reserved to represent out-of-range values.
2430
2431 -w name
2432 Window: Hann (default), Hamming, Bartlett, Rectangular or
2433 Kaiser. The spectrogram is produced using the Discrete
2434 Fourier Transform (DFT) algorithm. A significant parame‐
2435 ter to this algorithm is the choice of `window function'.
2436 By default, SoX uses the Hann window which has good all-
2437 round frequency-resolution and dynamic-range properties.
2438 For better frequency resolution (but lower dynamic-
2439 range), select a Hamming window; for higher dynamic-range
2440 (but poorer frequency-resolution), select a Kaiser win‐
2441 dow. Bartlett and Rectangular windows are also avail‐
2442 able.
2443
2444 -W num Window adjustment parameter. This can be used to make
2445 small adjustments to the Kaiser window shape. A positive
2446 number (up to ten) increases its dynamic range, a nega‐
2447 tive number decreases it.
2448
2449 -s Allow slack overlapping of DFT windows. This can, in
2450 some cases, increase image sharpness and give greater
2451 adherence to the -x value, but at the expense of a little
2452 spectral loss.
2453
2454 -m Creates a monochrome spectrogram (the default is colour).
2455
2456 -h Selects a high-colour palette - less visually pleasing
2457 than the default colour palette, but it may make it eas‐
2458 ier to differentiate different levels. If this option is
2459 used in conjunction with -m, the result will be a hybrid
2460 monochrome/colour palette.
2461
2462 -p num Permute the colours in a colour or hybrid palette. The
2463 num parameter, from 1 (the default) to 6, selects the
2464 permutation.
2465
2466 -l Creates a `printer friendly' spectrogram with a light
2467 background (the default has a dark background).
2468
2469 -a Suppress the display of the axis lines. This is some‐
2470 times useful in helping to discern artefacts at the spec‐
2471 trogram edges.
2472
2473 -r Raw spectrogram: suppress the display of axes and leg‐
2474 ends.
2475
2476 -A Selects an alternative, fixed colour-set. This is pro‐
2477 vided only for compatibility with spectrograms produced
2478 by another package. It should not normally be used as it
2479 has some problems, not least, a lack of differentiation
2480 at the bottom end which results in masking of low-level
2481 artefacts.
2482
2483 -t text
2484 Set the image title - text to display above the spectro‐
2485 gram.
2486
2487 -c text
2488 Set (or clear) the image comment - text to display below
2489 and to the left of the spectrogram.
2490
2491 -o text
2492 Name of the spectrogram output PNG file, default `spec‐
2493 trogram.png'.
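
As noted for the -y option above, spectrogram production can be slow unless the Y-axis size is one more than a power of two. A candidate value can be checked before use with a sketch such as the following; fast_y_size is a hypothetical helper, not a SoX facility.

```shell
#!/bin/sh
# fast_y_size N
# Hypothetical helper: succeeds if N is one more than a power of two
# (2, 3, 5, 9, ..., 129, 257, ...), fails otherwise.
fast_y_size() {
    m=$(( $1 - 1 ))
    [ "$m" -ge 1 ] && [ $(( m & (m - 1) )) -eq 0 ]
}

for y in 129 200 257; do
    if fast_y_size "$y"; then
        echo "$y: one more than a power of two"
    else
        echo "$y: may be slow"
    fi
done
```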
2494
2495 Advanced Options:
2496 In order to process a smaller section of audio without affecting
2497 other effects or the output signal (unlike when the trim effect
2498 is used), the following options may be used.
2499
2500 -d duration
2501 This option sets the X-axis resolution such that audio
2502 with the given duration ([[HH:]MM:]SS) fits the selected
2503 (or default) X-axis width. For example,
2504 sox input.mp3 output.wav spectrogram -d 1:00 stats
2505 creates a spectrogram showing the first minute of the
2506 audio, whilst the stats effect is applied to the entire
2507 audio signal.
2508
2509 See also -X for an alternative way of setting the X-axis
2510 resolution.
2511
2512 -S time
2513 Start the spectrogram at the given point in the audio
2514 stream. For example
2515 sox input.aiff output.wav spectrogram -S 1:00
2516 creates a spectrogram showing all but the first minute of
2517 the audio (the output file however, receives the entire
2518 audio stream).
2519
2520 For the ability to perform off-line processing of spectral data,
2521 see the stat effect.
2522
2523 speed factor[c]
2524 Adjust the audio speed (pitch and tempo together). factor is
2525 either the ratio of the new speed to the old speed: greater than
2526 1 speeds up, less than 1 slows down, or, if appended with the
2527 letter `c', the number of cents (i.e. 100ths of a semitone) by
2528 which the pitch (and tempo) should be adjusted: greater than 0
2529 increases, less than 0 decreases.
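
The relationship between the two forms of factor can be illustrated with a short sketch. Since 1200 cents make one octave (a doubling of speed), a change of c cents corresponds to a ratio of 2^(c/1200). cents_to_ratio is a hypothetical helper for illustration and is not part of SoX.

```shell
#!/bin/sh
# cents_to_ratio CENTS
# Hypothetical helper: prints the speed ratio equivalent to a change
# of CENTS (100ths of a semitone), i.e. 2^(CENTS/1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(c / 1200 * log(2)) }'
}

cents_to_ratio 100      # one semitone up: prints 1.0595
cents_to_ratio -1200    # one octave down: prints 0.5000
```

So, under this reading, speed 100c and speed 1.0595 request approximately the same change.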
2530
2531 By default, the speed change is performed by resampling with the
2532 rate effect using its default quality/speed. For higher quality
2533 or higher speed resampling, in addition to the speed effect,
2534 specify the rate effect with the desired quality option.
2535
2536 See also the pitch and tempo effects.
2537
2538 splice [-h|-t|-q] { position[,excess[,leeway]] }
2539 Splice together audio sections. This effect provides two things
2540 over simple audio concatenation: a (usually short) cross-fade is
2541 applied at the join, and a wave similarity comparison is made to
2542 help determine the best place at which to make the join.
2543
2544 One of the options -t, -h, or -q may be given to select the fade
2545 envelope as triangular (a.k.a. linear) (the default), half-
2546 cosine wave, or quarter-cosine wave respectively.
2547
2548 Type Audio Fade level Transitions
2549 t correlated constant gain abrupt
2550 h correlated constant gain smooth
2551 q uncorrelated constant power smooth
2552
2553 To perform a splice, first use the trim effect to select the
2554 audio sections to be joined together. As when performing a tape
2555 splice, the end of the section to be spliced onto should be
2556 trimmed with a small excess (default 0.005 seconds) of audio
2557 after the ideal joining point. The beginning of the audio sec‐
2558 tion to splice on should be trimmed with the same excess (before
2559 the ideal joining point), plus an additional leeway (default
2560 0.005 seconds). SoX should then be invoked with the two audio
2561 sections as input files and the splice effect given with the
2562 position at which to perform the splice - this is the length of the
2563 first audio section (including the excess).
2564
2565 For example, a long song begins with two verses which start (as
2566 determined e.g. by using the play command with the trim (start)
2567 effect) at times 0:30.125 and 1:03.432. The following commands
2568 cut out the first verse:
2569 sox too-long.wav part1.wav trim 0 30.130
2570 (5 ms excess, after the first verse starts)
2571 sox too-long.wav part2.wav trim 1:03.422
2572 (5 ms excess plus 5 ms leeway, before the second verse starts)
2573 sox part1.wav part2.wav just-right.wav splice 30.130
2574 For another example, the SoX command
2575 play "|sox -n -p synth 1 sin %1" "|sox -n -p synth 1 sin %3"
2576 generates and plays two notes, but there is a nasty click at the
2577 transition; the click can be removed by splicing instead of con‐
2578 catenating the audio, i.e. by appending splice 1 to the command.
2579 (Clicks at the beginning and end of the audio can be removed by
2580 preceding the splice effect with fade q .01 2 .01).
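
The trim points used in the first example above follow directly from the excess and leeway values, and can be computed with a sketch such as this. splice_points is a hypothetical helper; times are given in seconds (so 1:03.432 is written as 63.432) and the defaults are the 0.005 second values mentioned above.

```shell
#!/bin/sh
# splice_points KEEP_UNTIL RESUME_AT [EXCESS [LEEWAY]]
# Hypothetical helper: prints the end of the first section (ideal
# join point plus excess) and the start of the second section (ideal
# join point minus excess and leeway), both in seconds.
splice_points() {
    awk -v a="$1" -v b="$2" -v e="${3:-0.005}" -v l="${4:-0.005}" 'BEGIN {
        printf "%.3f %.3f\n", a + e, b - e - l
    }'
}

splice_points 30.125 63.432    # prints 30.130 63.422
```

The first figure is also the position to give to the splice effect, since it is the length of the first audio section including the excess.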
2581
2582 Provided your arithmetic is good enough, multiple splices can be
2583 performed with a single splice invocation. For example:
2584 #!/bin/sh
2585 # Audio Copy and Paste Over
2586 # acpo infile copy-start copy-stop paste-over-start outfile
2587 # All times measured in samples.
2588 rate=`soxi -r "$1"`
2589 e=`expr $rate '*' 5 / 1000` # Using default excess
2590 l=$e # and leeway.
2591 sox "$1" piece.wav trim `expr $2 - $e - $l`s \
2592 `expr $3 - $2 + $e + $l + $e`s
2593 sox "$1" part1.wav trim 0 `expr $4 + $e`s
2594 sox "$1" part2.wav trim `expr $4 + $3 - $2 - $e - $l`s
2595 sox part1.wav piece.wav part2.wav "$5" splice \
2596 `expr $4 + $e`s \
2597 `expr $4 + $e + $3 - $2 + $e + $l + $e`s
2598 In the above Bourne shell script, two splices are used to `copy
2599 and paste' audio.
2600
2601 * * *
2602
2603 It is also possible to use this effect to perform general cross-
2604 fades, e.g. to join two songs. In this case, excess would typi‐
2605 cally be a number of seconds, the -q option would typically be
2606 given (to select an `equal power' cross-fade), and leeway should
2607 be zero (which is the default if -q is given). For example, if
2608 f1.wav and f2.wav are audio files to be cross-faded, then
2609 sox f1.wav f2.wav out.wav splice -q $(soxi -D f1.wav),3
2610 cross-fades the files where the point of equal loudness is 3
2611 seconds before the end of f1.wav, i.e. the total length of the
2612 cross-fade is 2 × 3 = 6 seconds (Note: the $(...) notation is
2613 POSIX shell).
2614
2615 stat [-s scale] [-rms] [-freq] [-v] [-d]
2616 Display time and frequency domain statistical information about
2617 the audio. Audio is passed unmodified through the SoX process‐
2618 ing chain.
2619
2620 The information is output to the `standard error' (stderr)
2621 stream and is calculated, where n is the duration of the audio
2622 in samples, c is the number of audio channels, r is the audio
2623 sample rate, and xk represents the PCM value (in the range -1 to
2624 +1 by default) of each successive sample in the audio, as fol‐
2625 lows:
2626
2627 Samples read n×c
2628 Length (seconds) n÷r
2629 Scaled by See -s below.
2630 Maximum amplitude max(xk) The maximum sample
2631 value in the audio;
2632 usually this will
2633 be a positive num‐
2634 ber.
2635 Minimum amplitude min(xk) The minimum sample
2636 value in the audio;
2637 usually this will
2638 be a negative num‐
2639 ber.
2640 Midline amplitude ½min(xk)+½max(xk)
2641 Mean norm ¹/nΣ│xk│ The average of the
2642 absolute value of
2643 each sample in the
2644 audio.
2645 Mean amplitude ¹/nΣxk The average of each
2646 sample in the
2647 audio. If this
2648 figure is non-zero,
2649 then it indicates
2650 the presence of a
2651 D.C. offset (which
2652 could be removed
2653 using the dcshift
2654 effect).
2655 RMS amplitude √(¹/nΣxk²) The level of a D.C.
2656 signal that would
2657 have the same power
2658 as the audio's
2659 average power.
2660 Maximum delta max(│xk-xk-1│)
2661 Minimum delta min(│xk-xk-1│)
2662 Mean delta ¹/n-1Σ│xk-xk-1│
2663 RMS delta √(¹/n-1Σ(xk-xk-1)²)
2664
2665 Rough frequency In Hz.
2666 Volume Adjustment The parameter to
2667 the vol effect
2668 which would make
2669 the audio as loud
2670 as possible without
2671 clipping. Note:
2672 See the discussion
2673 on Clipping above
2674 for reasons why it
2675 is rarely a good
2676 idea actually to do
2677 this.
2678
2679 Note that the delta measurements are not applicable for multi-
2680 channel audio.
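
To make the formulae in the table above concrete, the following sketch evaluates most of them for a single channel of sample values supplied as text, one value per line, in the range -1 to +1. It is an illustration of the arithmetic only: stat_sketch is a hypothetical function, its output layout is invented here, and it is not how SoX itself is implemented.

```shell
#!/bin/sh
# stat_sketch: reads one sample value per line on standard input and
# prints figures computed as in the table above (single channel).
stat_sketch() {
    awk '
    NF {
        x = $1 + 0
        n++
        if (n == 1 || x > max) max = x
        if (n == 1 || x < min) min = x
        sum    += x
        abssum += ((x < 0) ? -x : x)
        sumsq  += x * x
        if (n > 1) {                      # deltas need a previous sample
            d = x - prev
            if (d < 0) d = -d
            if (n == 2 || d > dmax) dmax = d
            if (n == 2 || d < dmin) dmin = d
            dsum += d
            dsumsq += d * d
        }
        prev = x
    }
    END {
        if (n < 2) exit 1                 # too few samples for the delta figures
        printf "%-18s%10.6f\n", "Maximum amplitude:", max
        printf "%-18s%10.6f\n", "Minimum amplitude:", min
        printf "%-18s%10.6f\n", "Midline amplitude:", min / 2 + max / 2
        printf "%-18s%10.6f\n", "Mean norm:", abssum / n
        printf "%-18s%10.6f\n", "Mean amplitude:", sum / n
        printf "%-18s%10.6f\n", "RMS amplitude:", sqrt(sumsq / n)
        printf "%-18s%10.6f\n", "Maximum delta:", dmax
        printf "%-18s%10.6f\n", "Minimum delta:", dmin
        printf "%-18s%10.6f\n", "Mean delta:", dsum / (n - 1)
        printf "%-18s%10.6f\n", "RMS delta:", sqrt(dsumsq / (n - 1))
    }'
}

# A square wave alternating between +0.5 and -0.5:
printf '%s\n' 0.5 -0.5 0.5 -0.5 | stat_sketch
```

For this input the mean amplitude is zero (no D.C. offset), the mean norm and RMS amplitude are both 0.5, and every delta figure is 1.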
2681
2682 The -s option can be used to scale the input data by a given
2683 factor. The default value of scale is 2147483647 (i.e. the max‐
2684 imum value of a 32-bit signed integer). Internal effects always
2685 work with signed long PCM data and so the value should relate to
2686 this fact.
2687
2688 The -rms option will convert all output average values to `root
2689 mean square' format.
2690
2691 The -v option displays only the `Volume Adjustment' value.
2692
2693 The -freq option calculates the input's power spectrum (4096
2694 point DFT) instead of the statistics listed above. This should
2695 only be used with a single channel audio file.
2696
2697 The -d option displays a hex dump of the 32-bit signed PCM data
2698 audio in SoX's internal buffer. This is mainly used to help
2699 track down endian problems that sometimes occur in cross-plat‐
2700 form versions of SoX.
2701
2702 See also the stats effect.
2703
2704 stats [-b bits|-x bits|-s scale] [-w window-time]
2705 Display time domain statistical information about the audio
2706 channels; audio is passed unmodified through the SoX processing
2707 chain. Statistics are calculated and displayed for each audio
2708 channel and, where applicable, an overall figure is also given.
2709
2710 For example, for a typical well-mastered stereo music file:
2711
2712 Overall Left Right
2713 DC offset 0.000803 -0.000391 0.000803
2714 Min level -0.750977 -0.750977 -0.653412
2715 Max level 0.708801 0.708801 0.653534
2716 Pk lev dB -2.49 -2.49 -3.69
2717 RMS lev dB -19.41 -19.13 -19.71
2718 RMS Pk dB -13.82 -13.82 -14.38
2719 RMS Tr dB -85.25 -85.25 -82.66
2720 Crest factor - 6.79 6.32
2721 Flat factor 0.00 0.00 0.00
2722 Pk count 2 2 2
2723 Bit-depth 16/16 16/16 16/16
2724 Num samples 7.72M
2725 Length s 174.973
2726 Scale max 1.000000
2727 Window s 0.050
2728
2729 DC offset, Min level, and Max level are shown, by default, in
2730 the range ±1. If the -b (bits) option is given, then these
2731 three measurements will be scaled to a signed integer with the
2732 given number of bits; for example, for 16 bits, the scale would
2733 be -32768 to +32767. The -x option behaves the same way as -b
2734 except that the signed integer values are displayed in hexadeci‐
2735 mal. The -s option scales the three measurements by a given
2736 floating-point number.
2737
2738 Pk lev dB and RMS lev dB are standard peak and RMS level mea‐
2739 sured in dBFS. RMS Pk dB and RMS Tr dB are peak and trough val‐
2740 ues for RMS level measured over a short window (default 50ms).
2741
2742 Crest factor is the standard ratio of peak to RMS level (note:
2743 not in dB).
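
The figures in the example table above can be related to one another with a little arithmetic, sketched below. It assumes the usual definitions: a level in dBFS is 20×log10 of the linear magnitude, and the crest factor is the linear ratio corresponding to the difference between the peak and RMS levels in dB. dbfs and crest are hypothetical helpers, not SoX commands.

```shell
#!/bin/sh
# dbfs LEVEL
# Hypothetical helper: converts a linear level (range ±1, non-zero)
# to dBFS.
dbfs() {
    awk -v x="$1" 'BEGIN {
        if (x < 0) x = -x
        printf "%.2f\n", 20 * log(x) / log(10)
    }'
}

# crest PEAK_DB RMS_DB
# Hypothetical helper: linear peak-to-RMS ratio from two dB figures.
crest() {
    awk -v pk="$1" -v rms="$2" 'BEGIN {
        printf "%.2f\n", exp((pk - rms) / 20 * log(10))
    }'
}

dbfs -0.750977         # left channel Min level: prints -2.49
crest -2.49 -19.13     # left channel Pk lev dB, RMS lev dB: prints 6.79
```

Both results agree with the Left column of the table (Pk lev dB -2.49, Crest factor 6.79).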
2744
2745 Flat factor is a measure of the flatness (i.e. consecutive sam‐
2746 ples with the same value) of the signal at its peak levels (i.e.
2747 either Min level, or Max level). Pk count is the number of
2748 occasions (not the number of samples) that the signal attained
2749 either Min level, or Max level.
2750
2751 The right-hand Bit-depth figure is the standard definition of
2752 bit-depth i.e. bits less significant than the given number are
2753 fixed at zero. The left-hand figure is the number of most sig‐
2754 nificant bits that are fixed at zero (or one for negative num‐
2755 bers) subtracted from the right-hand figure (the number sub‐
2756 tracted is directly related to Pk lev dB).
2757
2758 For multi-channel audio, an overall figure for each of the above
2759 measurements is given and derived from the channel figures as
2760 follows: DC offset: maximum magnitude; Max level, Pk lev dB,
2761 RMS Pk dB, Bit-depth: maximum; Min level, RMS Tr dB: minimum;
2762 RMS lev dB, Flat factor, Pk count: average; Crest factor: not
2763 applicable.
2764
2765 Length s is the duration in seconds of the audio, and Num sam‐
2766 ples is equal to the sample-rate multiplied by Length.
2767 Scale max is the scaling applied to the first three measure‐
2768 ments; specifically, it is the maximum value that could apply to
2769 Max level. Window s is the length of the window used for the
2770 peak and trough RMS measurements.
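
The relation between Num samples and Length can be checked with a short
calculation. The sketch below assumes a sample rate of 44100 Hz, which
the example above does not state; num_samples is a hypothetical helper:

```shell
# Number of samples per channel = sample rate (Hz) * length (s).
# Usage: num_samples RATE LENGTH
num_samples() {
    awk -v rate="$1" -v len="$2" \
        'BEGIN { printf "%.0f\n", rate * len }'
}

num_samples 44100 174.973   # about 7.72 million, as in the example
```
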
2771
2772 See also the stat effect.
2773
2774 swap Swap stereo channels. See also remix for an effect that allows
2775 arbitrary channel selection and ordering (and mixing).
2776
2777 stretch factor [window fade shift fading]
2778 Change the audio duration (but not its pitch). This effect is
2779 broadly equivalent to the tempo effect with (factor inverted
2780 and) search set to zero, so in general, its results are compara‐
2781 tively poor; it is retained as it can sometimes out-perform
2782 tempo for small factors.
2783
2784 factor sets the stretch: >1 lengthens, <1 shortens the duration.
2785 window is the window size in ms; the default is 20. fade selects
2786 the fade type and can be `lin'. shift is a ratio in [0, 1]; its
2787 default depends on factor: 1 to shorten, 0.8 to lengthen. fading
2788 is a ratio in [0, 0.5]; its default depends on factor and shift.
2789
2790 See also the tempo effect.
2791
2792 synth [-j KEY] [-n] [len [off [ph [p1 [p2 [p3]]]]]] {[type] [combine]
2793 [[%]freq[k][:|+|/|-[%]freq2[k]]] [off [ph [p1 [p2 [p3]]]]]}
2794 This effect can be used to generate fixed or swept frequency
2795 audio tones with various wave shapes, or to generate wide-band
2796 noise of various `colours'. Multiple synth effects can be cas‐
2797 caded to produce more complex waveforms; at each stage it is
2798 possible to choose whether the generated waveform will be mixed
2799 with, or modulated onto the output from the previous stage.
2800 Audio for each channel in a multi-channel audio file can be syn‐
2801 thesised independently.
2802
2803 Though this effect is used to generate audio, an input file must
2804 still be given, the characteristics of which will be used to set
2805 the synthesised audio length, the number of channels, and the
2806 sampling rate; however, since the input file's audio is not nor‐
2807 mally needed, a `null file' (with the special name -n) is often
2808 given instead (and the length specified as a parameter to synth
2809 or by another given effect that has an associated length).
2810
2811 For example, the following produces a 3 second, 48kHz, audio
2812 file containing a sine-wave swept from 300 to 3300 Hz:
2813 sox -n output.wav synth 3 sine 300-3300
2814 and this produces an 8 kHz version:
2815 sox -r 8000 -n output.wav synth 3 sine 300-3300
2816 Multiple channels can be synthesised by specifying the set of
2817 parameters shown between braces multiple times; the following
2818 puts the swept tone in the left channel and adds `brown' noise
2819 in the right:
2820 sox -n output.wav synth 3 sine 300-3300 brownnoise
2821 The following example shows how two synth effects can be cas‐
2822 caded to create a more complex waveform:
2823 play -n synth 0.5 sine 200-500 synth 0.5 sine fmod 700-100
2824 Frequencies can also be given in `scientific' note notation, or,
2825 by prefixing a `%' character, as a number of semitones relative
2826 to `middle A' (440 Hz). For example, the following could be
2827 used to help tune a guitar's low `E' string:
2828 play -n synth 4 pluck %-29
2829 or with a (Bourne shell) loop, the whole guitar:
2830 for n in E2 A2 D3 G3 B3 E4; do
2831 play -n synth 4 pluck $n repeat 2; done
2832 See the delay effect (above) and the reference to `SoX scripting
2833 examples' (below) for more synth examples.
2834
2835 N.B. This effect generates audio at maximum volume (0dBFS),
2836 which means that there is a high chance of clipping when using
2837 the audio subsequently, so in many cases, you will want to fol‐
2838 low this effect with the gain effect to prevent this from hap‐
2839 pening. (See also Clipping above.) Note that, by default, the
2840 synth effect incorporates the functionality of gain -h (see the
2841 gain effect for details); synth's -n option may be given to dis‐
2842 able this behaviour.
2843
2844 A detailed description of each synth parameter follows:
2845
2846 len is the length of audio to synthesise expressed as a time or
2847 as a number of samples; 0=inputlength, default=0.
2848
2849 The format for specifying lengths in time is hh:mm:ss.frac. The
2850 format for specifying sample counts is the number of samples
2851 with the letter `s' appended to it.
2852
2853 type is one of sine, square, triangle, sawtooth, trapezium, exp,
2854 [white]noise, tpdfnoise, pinknoise, brownnoise, pluck;
2855 default=sine.
2856
2857 combine is one of create, mix, amod (amplitude modulation), fmod
2858 (frequency modulation); default=create.
2859
2860 freq/freq2 are the frequencies at the beginning/end of synthesis
2861 in Hz or, if preceded with `%', semitones relative to A
2862 (440 Hz); alternatively, `scientific' note notation (e.g. E2)
2863 may be used. The default frequency is 440Hz. By default, the
2864 tuning used with the note notations is `equal temperament'; the
2865 -j KEY option selects `just intonation', where KEY is an integer
2866 number of semitones relative to A (so for example, -9 or 3
2867 selects the key of C), or a note in scientific notation.
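
To see which frequency a `%' value stands for, the semitone offset can
be converted to Hz by hand. A minimal sketch, assuming equal temperament
(12 semitones per octave) relative to A at 440 Hz; semitone_hz is a
hypothetical helper, not part of SoX:

```shell
# Frequency in Hz of a note N semitones away from A (440 Hz),
# assuming equal temperament: f = 440 * 2^(N/12).
semitone_hz() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * 2 ^ (n / 12) }'
}

semitone_hz -29    # the guitar low E used in the example above
semitone_hz 0      # A itself
```
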
2868
2869 If freq2 is given, then len must also have been given and the
2870 generated tone will be swept between the given frequencies. The
2871 two given frequencies must be separated by one of the characters
2872 `:', `+', `/', or `-'. This character is used to specify the
2873 sweep function as follows:
2874
2875 : Linear: the tone will change by a fixed number of hertz
2876 per second.
2877
2878 + Square: a second-order function is used to change the
2879 tone.
2880
2881 / Exponential: the tone will change by a fixed number of
2882 semitones per second.
2883
2884 - Exponential: as `/', but initial phase always zero, and
2885 stepped (less smooth) frequency changes.
2886
2887 Not used for noise.
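
The difference between the linear (`:') and exponential (`/') sweeps
can be illustrated by computing the frequency reached part-way through
a sweep. This is only a sketch of the arithmetic implied by the
descriptions above, not of SoX's implementation; sweep_freq and its
`linear' and `exp' keywords are hypothetical:

```shell
# Frequency reached at time T of a sweep from F1 to F2 lasting LEN.
# Usage: sweep_freq linear|exp F1 F2 LEN T
sweep_freq() {
    awk -v kind="$1" -v f1="$2" -v f2="$3" -v len="$4" -v t="$5" 'BEGIN {
        x = t / len                       # fraction of the sweep elapsed
        if (kind == "linear") {
            f = f1 + (f2 - f1) * x        # fixed number of Hz per second
        } else {
            f = f1 * (f2 / f1) ^ x        # fixed number of semitones per second
        }
        printf "%.2f\n", f
    }'
}

sweep_freq linear 300 3300 3 1.5
sweep_freq exp 300 3300 3 1.5
```

Half-way through a 300 to 3300 Hz sweep, the linear form has reached
the arithmetic mean of the two frequencies, whereas the exponential
form has reached their geometric mean.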
2888
2889 off is the bias (DC-offset) of the signal in percent; default=0.
2890
2891 ph is the phase shift in percentage of 1 cycle; default=0. Not
2892 used for noise.
2893
2894 p1 is the percentage of each cycle that is `on' (square), or
2895 `rising' (triangle, exp, trapezium); default=50 (square, trian‐
2896 gle, exp), default=10 (trapezium), or sustain (pluck);
2897 default=40.
2898
2899 p2 (trapezium): the percentage through each cycle at which
2900 `falling' begins; default=50. exp: the amplitude in multiples of
2901 2dB; default=50, or tone-1 (pluck); default=20.
2902
2903 p3 (trapezium): the percentage through each cycle at which
2904 `falling' ends; default=60, or tone-2 (pluck); default=90.
2905
2906 tempo [-q] [-m|-s|-l] factor [segment [search [overlap]]]
2907 Change the audio playback speed but not its pitch. This effect
2908 uses the WSOLA algorithm. The audio is chopped up into segments
2909 which are then shifted in the time domain and overlapped (cross-
2910 faded) at points where their waveforms are most similar as
2911 determined by measurement of `least squares'.
2912
2913 By default, linear searches are used to find the best overlap‐
2914 ping points. If the optional -q parameter is given, tree
2915 searches are used instead. This makes the effect work more
2916 quickly, but the result may not sound as good. However, if you
2917 must improve the processing speed, this generally reduces the
2918 sound quality less than reducing the search or overlap values.
2919
2920 The -m option is used to optimize default values of segment,
2921 search and overlap for music processing.
2922
2923 The -s option is used to optimize default values of segment,
2924 search and overlap for speech processing.
2925
2926 The -l option is used to optimize default values of segment,
2927 search and overlap for `linear' processing that tends to cause
2928 more noticeable distortion but may be useful when factor is
2929 close to 1.
2930
2931 If -m, -s, or -l is specified, the default value of segment will
2932 be calculated based on factor, while default search and overlap
2933 values are based on segment. Any values you provide still over‐
2934 ride these default values.
2935
2936 factor gives the ratio of new tempo to the old tempo, so e.g.
2937 1.1 speeds up the tempo by 10%, and 0.9 slows it down by 10%.
2938
2939 The optional segment parameter selects the algorithm's segment
2940 size in milliseconds. If no other flags are specified, the
2941 default value is 82 and is typically suited to making small
2942 changes to the tempo of music. For larger changes (e.g. a factor
2943 of 2), 41 ms may give a better result. The -m, -s, and -l flags
2944 will cause the segment default to be automatically adjusted
2945 based on factor. For example using -s (for speech) with a tempo
2946 of 1.25 will calculate a default segment value of 32.
2947
2948 The optional search parameter gives the audio length in mil‐
2949 liseconds over which the algorithm will search for overlapping
2950 points. If no other flags are specified, the default value is
2951 14.68. Larger values use more processing time and may or may
2952 not produce better results. A practical maximum is half the
2953 value of segment. Search can be reduced to cut processing time
2954 at the risk of degrading output quality. The -m, -s, and -l
2955 flags will cause the search default to be automatically adjusted
2956 based on segment.
2957
2958 The optional overlap parameter gives the segment overlap length
2959 in milliseconds. Default value is 12, but -m, -s, or -l flags
2960 automatically adjust overlap based on segment size. Increasing
2961 overlap increases processing time and may increase quality. A
2962 practical maximum for overlap is the value of search, with over‐
2963 lap typically being (at least) a little smaller than search.
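
When choosing segment, search and overlap by hand, the practical
maxima described above can be checked before running the effect. The
following sketch only encodes those two rules of thumb (they are not
hard limits enforced by SoX); check_tempo_params is a hypothetical
helper:

```shell
# Succeeds if search <= segment/2 and overlap <= search.
# Usage: check_tempo_params SEGMENT SEARCH OVERLAP   (all in ms)
check_tempo_params() {
    awk -v segment="$1" -v search="$2" -v overlap="$3" 'BEGIN {
        ok = (search <= segment / 2) && (overlap <= search)
        exit (ok ? 0 : 1)
    }'
}

check_tempo_params 82 14.68 12 && echo "defaults are within the suggested maxima"
```
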
2964
2965 See also speed for an effect that changes tempo and pitch
2966 together, pitch for an effect that changes pitch but not tempo,
2967 and stretch for an effect that changes tempo using a different
2968 algorithm.
2969
2970 treble gain [frequency[k] [width[s|h|k|o|q]]]
2971 Apply a treble tone-control effect. See the description of the
2972 bass effect for details.
2973
2974 tremolo speed [depth]
2975 Apply a tremolo (low frequency amplitude modulation) effect to
2976 the audio. The tremolo frequency in Hz is given by speed, and
2977 the depth as a percentage by depth (default 40).
2978
2979 trim start [length|=end]
2980 Trim can trim off unwanted audio from the beginning and end of
2981 the audio. Audio is not sent to the output stream until the
2982 start location is reached.
2983
2984 The optional length parameter gives the length of audio to out‐
2985 put after the start sample and is thus used to trim off the end
2986 of the audio. Alternatively, an absolute end location can be
2987 given by preceding it with an equals sign. Using a value of 0
2988 for the start parameter will allow trimming off the end only.
2989
2990 Both parameters can be specified using either an amount of time
2991 or an exact count of samples. The format for specifying lengths
2992 in time is hh:mm:ss.frac. A start value of 1:30.5 will not
2993 start until 1 minute, thirty and ½ seconds into the audio. The
2994 format for specifying sample counts is the number of samples
2995 with the letter `s' appended to it. A value of 8000s for the
2996 start parameter will wait until 8000 samples are read before
2997 starting to process audio.
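
As an aid to reading the time format, the sketch below converts a
value such as 1:30.5 to seconds. It assumes at most three
colon-separated fields (hours, minutes, seconds); to_seconds is a
hypothetical helper, not part of SoX:

```shell
# Convert [[hh:]mm:]ss[.frac] to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        n = split(t, part, ":")
        secs = 0
        for (i = 1; i <= n; i++) {
            secs = secs * 60 + part[i]    # each field is worth 60 of the next
        }
        printf "%g\n", secs
    }'
}

to_seconds 1:30.5     # the start value used in the text above
to_seconds 1:02:03
```
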
2998
2999 vad [options]
3000 Voice Activity Detector. Attempts to trim silence and quiet
3001 background sounds from the ends of (fairly high resolution i.e.
3002 16-bit, 44-48kHz) recordings of speech. The algorithm currently
3003 uses a simple cepstral power measurement to detect voice, so may
3004 be fooled by other things, especially music. The effect can
3005 trim only from the front of the audio, so in order to trim from
3006 the back, the reverse effect must also be used. E.g.
3007 play speech.wav norm vad
3008 to trim from the front,
3009 play speech.wav norm reverse vad reverse
3010 to trim from the back, and
3011 play speech.wav norm vad reverse vad reverse
3012 to trim from both ends. The use of the norm effect is recom‐
3013 mended, but remember that neither reverse nor norm is suitable
3014 for use with streamed audio.
3015
3016 Options:
3017 Default values are shown in parentheses.
3018
3019 -t num (7)
3020 The measurement level used to trigger activity detection.
3021 This might need to be changed depending on the noise
3022 level, signal level and other characteristics of the input
3023 audio.
3024
3025 -T num (0.25)
3026 The time constant (in seconds) used to help ignore short
3027 bursts of sound.
3028
3029 -s num (1)
3030 The amount of audio (in seconds) to search for qui‐
3031 eter/shorter bursts of audio to include prior to the
3032 detected trigger point.
3033
3034 -g num (0.25)
3035 Allowed gap (in seconds) between quieter/shorter bursts
3036 of audio to include prior to the detected trigger point.
3037
3038 -p num (0)
3039 The amount of audio (in seconds) to preserve before the
3040 trigger point and any found quieter/shorter bursts.
3041
3042 Advanced Options:
3043 These allow fine tuning of the algorithm's internal parameters.
3044
3045 -b num The algorithm (internally) uses adaptive noise estima‐
3046 tion/reduction in order to detect the start of the wanted
3047 audio. This option sets the time for the initial noise
3048 estimate.
3049
3050 -N num Time constant used by the adaptive noise estimator for
3051 when the noise level is increasing.
3052
3053 -n num Time constant used by the adaptive noise estimator for
3054 when the noise level is decreasing.
3055
3056 -r num Amount of noise reduction to use in the detection algo‐
3057 rithm (e.g. 0, 0.5, ...).
3058
3059 -f num Frequency of the algorithm's processing/measurements.
3060
3061 -m num Measurement duration; by default, twice the measurement
3062 period; i.e. with overlap.
3063
3064 -M num Time constant used to smooth spectral measurements.
3065
3066 -h num `Brick-wall' frequency of high-pass filter applied at the
3067 input to the detector algorithm.
3068
3069 -l num `Brick-wall' frequency of low-pass filter applied at the
3070 input to the detector algorithm.
3071
3072 -H num `Brick-wall' quefrency of high-pass lifter used in the
3073 detector algorithm.
3074
3075 -L num `Brick-wall' quefrency of low-pass lifter used in the
3076 detector algorithm.
3077
3078 See also the silence effect.
3079
3080 vol gain [type [limitergain]]
3081 Apply an amplification or an attenuation to the audio signal.
3082 Unlike the -v option (which is used for balancing multiple input
3083 files as they enter the SoX effects processing chain), vol is an
3084 effect like any other so can be applied anywhere, and several
3085 times if necessary, during the processing chain.
3086
3087 The amount to change the volume is given by gain which is inter‐
3088 preted, according to the given type, as follows: if type is
3089 amplitude (or is omitted), then gain is an amplitude (i.e. volt‐
3090 age or linear) ratio, if power, then a power (i.e. wattage or
3091 voltage-squared) ratio, and if dB, then a power change in dB.
3092
3093 When type is amplitude or power, a gain of 1 leaves the volume
3094 unchanged, less than 1 decreases it, and greater than 1
3095 increases it; a negative gain inverts the audio signal in addi‐
3096 tion to adjusting its volume.
3097
3098 When type is dB, a gain of 0 leaves the volume unchanged, less
3099 than 0 decreases it, and greater than 0 increases it.
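
The three gain types describe the same change on different scales. A
minimal sketch of the conversions, assuming the standard decibel
definitions (20*log10 for amplitude ratios, 10*log10 for power
ratios); db_to_amplitude and db_to_power are hypothetical helpers, not
part of SoX:

```shell
# Amplitude ratio equivalent to a change of DB decibels.
db_to_amplitude() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 20) }'
}

# Power ratio equivalent to a change of DB decibels.
db_to_power() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 10) }'
}

db_to_amplitude 10
db_to_power 10
```

Under these definitions, a gain of 10 with type dB corresponds to an
amplitude ratio of about 3.16 and to a power ratio of 10.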
3100
3101 See [4] for a detailed discussion on electrical (and hence audio
3102 signal) voltage and power ratios.
3103
3104 Beware of Clipping when increasing the volume.
3105
3106 The gain and the type parameters can be concatenated if desired,
3107 e.g. vol 10dB.
3108
3109 An optional limitergain value can be specified and should be a
3110 value much less than 1 (e.g. 0.05 or 0.02) and is used only on
3111 peaks to prevent clipping. Not specifying this parameter will
3112 cause no limiter to be used. In verbose mode, this effect will
3113 display the percentage of the audio that needed to be limited.
3114
3115 See also gain for a volume-changing effect with different capa‐
3116 bilities, and compand for a dynamic-range compression/expan‐
3117 sion/limiting effect.
3118
3119 Deprecated Effects
3120 The following effects have been renamed or have their functionality
3121 included in another effect; they continue to work in this version of
3122 SoX but may be removed in future.
3123
3124 filter [low]-[high] [window-len [beta]]
3125 Apply a sinc-windowed lowpass, highpass, or bandpass filter of
3126 given window length to the signal. This effect has been super‐
3127 seded by the sinc effect. Compared with `sinc', `filter' is
3128 slower and has fewer capabilities.
3129
3130 low refers to the frequency of the lower 6dB corner of the fil‐
3131 ter. high refers to the frequency of the upper 6dB corner of
3132 the filter.
3133
3134 A low-pass filter is obtained by leaving low unspecified, or 0.
3135 A high-pass filter is obtained by leaving high unspecified, or
3136 0, or greater than or equal to the Nyquist frequency.
3137
3138 The window-len, if unspecified, defaults to 128. Longer windows
3139 give a sharper cut-off, smaller windows a more gradual cut-off.
3140
3141 The beta parameter determines the type of filter window used.
3142 Any value greater than 2 is the beta for a Kaiser window. Beta
3143 ≤ 2 selects a Blackman-Nuttall window. If unspecified, the
3144 default is a Kaiser window with beta 16.
3145
3146 In the case of Kaiser window (beta > 2), lower betas produce a
3147 somewhat faster transition from pass-band to stop-band, at the
3148 cost of noticeable artifacts. A beta of 16 is the default, beta
3149 less than 10 is not recommended. If you want a sharper cut-off,
3150 don't use low betas; use a longer sample window. A Blackman-
3151 Nuttall window is selected by specifying any `beta' ≤ 2, and the
3152 Blackman-Nuttall window has somewhat steeper cut-off than the
3153 default Kaiser window. You will probably not need to use the
3154 beta parameter at all, unless you are just curious about compar‐
3155 ing the effects of Blackman-Nuttall vs. Kaiser windows.
3156
3157 This effect supports the --plot global option.
3158
3159 key [-q] shift [segment [search [overlap]]]
3160 Change the audio key (i.e. pitch but not tempo). This is just
3161 an alias for the pitch effect.
3162
3163 pan direction
3164 Mix the audio from one channel to another. Use mixer or remix
3165 instead of this effect.
3166
3167 The direction is a value from -1 to 1. -1 represents far left
3168 and 1 represents far right.
3169
3170 polyphase [-w nut|ham] [-width n] [-cut-off c]
3171 rabbit [-c0|-c1|-c2|-c3|-c4]
3172 resample [-qs|-q|-ql] [rolloff [beta]]
3173 Formerly sample-rate-changing effects in their own right, these
3174 are now just aliases for the rate effect.
3175
3177 Exit status is 0 for no error, 1 if there is a problem with the com‐
3178 mand-line parameters, or 2 if an error occurs during file processing.
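
In a script, the exit status can be used to distinguish the three
cases. The following is only a sketch: describe_sox_status is a
hypothetical helper, and in.wav and out.wav are placeholder file names:

```shell
# Map a SoX exit status to the meaning given above.
describe_sox_status() {
    case "$1" in
        0) echo "no error" ;;
        1) echo "problem with the command-line parameters" ;;
        2) echo "error during file processing" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Hypothetical usage:
#   sox in.wav out.wav trim 0 10
#   describe_sox_status "$?"
```
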
3179
3181 Please report any bugs found in this version of SoX to the mailing list
3182 (sox-users@lists.sourceforge.net).
3183
3185 soxi(1), soxformat(7), libsox(3)
3186 audacity(1), gnuplot(1), octave(1), wget(1)
3187 The SoX web site at http://sox.sourceforge.net
3188 SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts
3189
3190 References
3191 [1] R. Bristow-Johnson, Cookbook formulae for audio EQ biquad filter
3192 coefficients, http://musicdsp.org/files/Audio-EQ-Cookbook.txt
3193
3194 [2] Wikipedia, Q-factor, http://en.wikipedia.org/wiki/Q_factor
3195
3196 [3] Scott Lehman, Effects Explained,
3197 http://harmony-central.com/Effects/effects-explained.html
3198
3199 [4] Wikipedia, Decibel, http://en.wikipedia.org/wiki/Decibel
3200
3201 [5] Richard Furse, Linux Audio Developer's Simple Plugin API,
3202 http://www.ladspa.org
3203
3204 [6] Richard Furse, Computer Music Toolkit, http://www.ladspa.org/cmt
3205
3206 [7] Steve Harris, LADSPA plugins, http://plugin.org.uk
3207
3209 Copyright 1998-2011 Chris Bagwell and SoX Contributors.
3210 Copyright 1991 Lance Norskog and Sundry Contributors.
3211
3212 This program is free software; you can redistribute it and/or modify it
3213 under the terms of the GNU General Public License as published by the
3214 Free Software Foundation; either version 2, or (at your option) any
3215 later version.
3216
3217 This program is distributed in the hope that it will be useful, but
3218 WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
3219 CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
3220 Public License for more details.
3221
3223 Chris Bagwell (cbagwell@users.sourceforge.net). Other authors and con‐
3224 tributors are listed in the ChangeLog file that is distributed with the
3225 source code.
3226
3227
3228
3229sox February 19, 2011 SoX(1)