SoX(7) Sound eXchange SoX(7)
2
3
4
6 SoX - Sound eXchange, the Swiss Army knife of audio manipulation
7
9 This manual describes SoX supported file formats and audio device
10 types; the SoX manual set starts with sox(1).
11
Format types that SoX can determine from a filename extension are
listed with their names preceded by a dot. Format types that are
optionally built into SoX are marked `(optional)'.
15
16 Format types that can be handled by an external library via an optional
17 pseudo file type (currently sndfile or ffmpeg) are marked e.g. `(also
18 with -t sndfile)'. This might be useful if you have a file that
19 doesn't work with SoX's default format readers and writers, and there's
20 an external reader or writer for that format.
21
22 To see if SoX has support for an optional format or device, enter sox
23 -h and look for its name under the list: `AUDIO FILE FORMATS' or `AUDIO
24 DEVICE DRIVERS'.
25
26 SOX FORMATS & DEVICE DRIVERS
27 .raw (also with -t sndfile),
28 .f4, .f8,
29 .s1, .s2, .s3, .s4,
30 .u1, .u2, .u3, .u4,
31 .ul, .al, .lu, .la,
32 .sb, .sw, .ub, .uw
33 Raw (headerless) audio files. For raw, the sample rate and the
34 data encoding must be given using command-line format options;
35 for the other listed types, the sample rate defaults to 8kHz
36 (but may be overridden), and the data encoding is defined by the
37 given suffix. Thus f4 and f8 indicate files encoded as 4 and
38 8-byte (IEEE single and double precision) floating point PCM
39 respectively; s1, s2, s3, and s4 indicate 1, 2, 3, and 4-byte
40 signed integer PCM respectively; u1, u2, u3, and u4 indicate 1,
41 2, 3, and 4-byte unsigned integer PCM respectively; ul indicates
42 `μ-law' (byte), al indicates `A-law' (byte), and lu and la are
43 inverse bit order `μ-law' and inverse bit order `A-law' respec‐
44 tively. sb, sw, ub, uw, and sl are aliases for s1, s2, u1, u2,
45 and s4 respectively. For all raw formats, the number of chan‐
46 nels defaults to 1 (but may be overridden).
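
As an illustration of what `headerless' means for a type such as s2, the
following Python sketch packs and unpacks 2-byte signed integer PCM. The
helper names pack_s2 and unpack_s2 are hypothetical, and the little-endian
default is an assumption made for the example, not something this manual
specifies. Nothing in the resulting bytes records the sample rate or the
number of channels, which is why they must be supplied or defaulted.

```python
import struct

def pack_s2(samples, little_endian=True):
    """Pack integers in [-32768, 32767] as headerless 2-byte signed PCM.

    The byte order default is an assumption made for this sketch.
    """
    fmt = ("<" if little_endian else ">") + "%dh" % len(samples)
    return struct.pack(fmt, *samples)

def unpack_s2(data, little_endian=True):
    """Unpack headerless 2-byte signed PCM; the caller must know the layout."""
    count = len(data) // 2
    fmt = ("<" if little_endian else ">") + "%dh" % count
    return list(struct.unpack(fmt, data[:count * 2]))

raw = pack_s2([0, 1000, -1000, 32767, -32768])
print(len(raw))        # 5 samples x 2 bytes each, and no header bytes
print(unpack_s2(raw))  # the same five values come back
```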
47
48 Headerless audio files on a SPARC computer are likely to be of
49 format ul; on a Mac, they're likely to be u1 but with a sample
50 rate of 11025 or 22050 Hz.
51
52 See .ima and .vox for raw ADPCM formats.
53
54 .8svx (also with -t sndfile)
55 Amiga 8SVX musical instrument description format.
56
57 .aiff, .aif (also with -t sndfile)
58 AIFF files used on Apple Macs as well as older Apple IIc/IIgs
59 and SGI. Currently, SoX's AIFF support does not include multi‐
60 ple audio chunks, or the 8SVX musical instrument description
61 format. AIFF files are multimedia archives and can have multi‐
62 ple audio and picture chunks. You may need a separate archiver
63 to work with them.
64
65 .aiffc, .aifc (also with -t sndfile)
AIFF-C is a format based on AIFF that was created to allow handling
compressed audio. It can also handle little-endian uncompressed
linear data, often referred to as sowt encoding. This encoding has
also become the de facto format produced by modern Macs as well as
iTunes on any platform. AIFF-C files produced by other applications
typically have the file extension .aif, so their headers must be
inspected to detect the true format. The sowt encoding is the only
encoding that SoX can handle with this format.
75
AIFF-C is defined in DAVIC 1.4 Part 9 Annex B. This format is
referenced by ARIB STD-B24, which is specified for Japanese data
broadcasting. Private chunks are not supported.
79
80 alsa (optional)
81 Advanced Linux Sound Architecture device driver; supports both
82 playing and recording audio. ALSA is only used in Linux-based
83 operating systems, though these often support OSS (see below) as
84 well. Examples:
85 sox infile -t alsa
86 sox infile -t alsa default
87 sox infile -t alsa hw:0
88 sox -2 -t alsa hw:1 outfile
89 See also play(1) and rec(1).
90
91 .amb Ambisonic B-Format: a specialisation of .wav with between 3 and
92 16 channels of audio for use with an Ambisonic decoder. See
93 http://www.ambisonia.com/Members/mleese/file-format-for-b-format
94 for details. It is up to the user to get the channels together
95 in the right order and at the correct amplitude.
96
97 .amr-nb (optional)
98 Adaptive Multi Rate - Narrow Band speech codec; a lossy format
99 used in 3rd generation mobile telephony and defined in 3GPP TS
100 26.071 et al.
101
AMR-NB audio has a fixed sampling rate of 8 kHz and supports
encoding to the following bit-rates (as selected by the -C
option): 0 = 4.75 kbit/s, 1 = 5.15 kbit/s, 2 = 5.9 kbit/s, 3 =
6.7 kbit/s, 4 = 7.4 kbit/s, 5 = 7.95 kbit/s, 6 = 10.2 kbit/s, 7 =
12.2 kbit/s.
107
108 .amr-wb (optional)
109 Adaptive Multi Rate - Wide Band speech codec; a lossy format
110 used in 3rd generation mobile telephony and defined in 3GPP TS
111 26.171 et al.
112
AMR-WB audio has a fixed sampling rate of 16 kHz and supports
encoding to the following bit-rates (as selected by the -C
option): 0 = 6.6 kbit/s, 1 = 8.85 kbit/s, 2 = 12.65 kbit/s, 3 =
14.25 kbit/s, 4 = 15.85 kbit/s, 5 = 18.25 kbit/s, 6 = 19.85
kbit/s, 7 = 23.05 kbit/s, 8 = 23.85 kbit/s.
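
The -C tables for the two AMR variants can be written down as simple
lookups. The Python sketch below is only an illustration: the function
names are hypothetical, and the size estimate counts coded speech bits
only (taking 1 kbit as 1000 bits) and ignores any framing or file header.

```python
# Bit-rates in kbit/s, indexed by the -C value listed in this manual.
AMR_NB_KBPS = [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]
AMR_WB_KBPS = [6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]

def amr_bitrate(codec, mode):
    """Return the bit-rate in kbit/s for codec 'amr-nb' or 'amr-wb'."""
    table = {"amr-nb": AMR_NB_KBPS, "amr-wb": AMR_WB_KBPS}[codec]
    if not 0 <= mode < len(table):
        raise ValueError("mode %r is out of range for %s" % (mode, codec))
    return table[mode]

def approx_payload_bytes(codec, mode, seconds):
    """Rough size of the coded speech only; framing overhead is ignored."""
    return round(amr_bitrate(codec, mode) * 1000 * seconds / 8)

print(amr_bitrate("amr-wb", 2))                # 12.65
print(approx_payload_bytes("amr-nb", 7, 60))   # one minute at 12.2 kbit/s
```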
118
119 ao (optional)
120 Xiph.org's Audio Output device driver; works only for playing
121 audio. It supports a wide range of devices and sound systems -
122 see its documentation for the full range. For the most part,
123 SoX's use of libao cannot be configured directly; instead, libao
124 configuration files must be used.
125
126 The filename specified is used to determine which libao plugin
127 to use. Normally, you should specify `default' as the filename.
If that doesn't give the desired behavior then you can specify
the short name for a given plugin (such as pulse for the PulseAudio
plugin). Examples:
131 sox infile -t ao
132 sox infile -t ao default
133 sox infile -t ao pulse
134 See also play(1).
135
136 .au, .snd (also with -t sndfile)
137 Sun Microsystems AU files. There are many types of AU file; DEC
138 has invented its own with a different magic number and byte
139 order. To write a DEC file, use the -L option with the output
140 file options.
141
142 Some .au files are known to have invalid AU headers; these are
143 probably original Sun μ-law 8000 Hz files and can be dealt with
144 using the .ul format (see below).
145
146 It is possible to override AU file header information with the
147 -r and -c options, in which case SoX will issue a warning to
148 that effect.
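
For orientation, the sketch below reads the fixed part of a standard
(big-endian) AU header in Python. It relies on the commonly documented AU
layout, which this manual does not spell out: a `.snd' magic number
followed by five 32-bit fields (data offset, data size, encoding, sample
rate, channels). The function name is hypothetical, and the DEC variant
with its different magic number is deliberately not handled.

```python
import struct

AU_MAGIC = b".snd"  # magic number of the standard big-endian form

def parse_au_header(data):
    """Parse the 24-byte fixed AU header (standard big-endian form only)."""
    if len(data) < 24 or data[:4] != AU_MAGIC:
        raise ValueError("not a standard big-endian AU header")
    offset, size, encoding, rate, channels = struct.unpack(">5I", data[4:24])
    return {
        "data_offset": offset,
        "data_size": size,      # 0xFFFFFFFF conventionally means unknown
        "encoding": encoding,   # 1 is 8-bit mu-law in the AU encoding table
        "sample_rate": rate,
        "channels": channels,
    }

# A header describing 8000 Hz mono mu-law audio of unknown length.
header = struct.pack(">4s5I", AU_MAGIC, 24, 0xFFFFFFFF, 1, 8000, 1)
info = parse_au_header(header)
print(info["sample_rate"], info["channels"])
```

A file whose first bytes fail this kind of check is a candidate for being
treated as headerless .ul data instead.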
149
150 .avr Audio Visual Research format; used by a number of commercial
151 packages on the Mac.
152
153 .caf (optional)
154 Apple's Core Audio File format.
155
156 .cdda, .cdr
157 `Red Book' Compact Disc Digital Audio. CDDA has two audio chan‐
158 nels formatted as 16-bit signed integers at a sample rate of
159 44.1 kHz. The number of (stereo) samples in each CDDA track is
160 always a multiple of 588 which is why it needs its own handler.
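
The figure of 588 follows from the CD sector rate: a CD plays 75 sectors
per second, and 44100 / 75 = 588 stereo samples per sector (the sector
rate is background knowledge, not stated in this manual). The Python
sketch below, with a hypothetical helper name, shows how many samples of
padding a track length would need to reach the next multiple of 588.

```python
SAMPLES_PER_SECTOR = 44100 // 75                 # 588 stereo samples
BYTES_PER_SECTOR = SAMPLES_PER_SECTOR * 2 * 2    # 2 channels x 2 bytes each

def cdda_padding(stereo_samples):
    """Stereo samples to append so the length is a multiple of 588."""
    return -stereo_samples % SAMPLES_PER_SECTOR

print(SAMPLES_PER_SECTOR, BYTES_PER_SECTOR)
print(cdda_padding(44100))   # one second is already sector-aligned
print(cdda_padding(589))     # one sample past a boundary
```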
161
162 coreaudio (optional)
Mac OS X CoreAudio device driver; supports both playing and
recording audio. Examples:
165 sox infile -t coreaudio
166 sox infile -t coreaudio default
167 See also play(1) and rec(1).
168
169 .cvsd, .cvs
170 Continuously Variable Slope Delta modulation. A headerless for‐
171 mat used to compress speech audio for applications such as voice
172 mail. This format is sometimes used with bit-reversed samples -
173 the -X format option can be used to set the bit-order.
174
175 .cvu Continuously Variable Slope Delta modulation (unfiltered). This
176 is an alternative handler for CVSD that is unfiltered but can be
177 used with any bit-rate. E.g.
178 sox infile outfile.cvu rate 28k
179 play -r 28k outfile.cvu filter -3.4k
180
181 .dat Text Data files. These files contain a textual representation
182 of the sample data. There is one line at the beginning that
183 contains the sample rate. Subsequent lines contain two numeric
184 data items: the time since the beginning of the first sample and
185 the sample value. Values are normalized so that the maximum and
186 minimum are 1 and -1. This file format can be used to create
187 data files for external programs such as FFT analysers or graph
188 routines. SoX can also convert a file in this format back into
189 one of the other file formats.
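
A reader for this kind of text data might look like the Python sketch
below. It assumes that header lines start with `;' and that the sample
rate line has the form `; Sample Rate 8000'; that exact spelling is an
assumption of the example, as is the function name parse_dat. Each data
line is taken to be a time followed by one or more sample values.

```python
def parse_dat(text):
    """Return (sample_rate, rows); each row is (time, [values])."""
    rate = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            # Assumed header form: "; Sample Rate 8000"
            words = line.lstrip("; ").split()
            if words[:2] == ["Sample", "Rate"] and len(words) >= 3:
                rate = int(words[2])
            continue
        fields = [float(x) for x in line.split()]
        rows.append((fields[0], fields[1:]))
    return rate, rows

example = """; Sample Rate 8000
; Channels 1
0 0.5
0.000125 -0.25
"""
rate, rows = parse_dat(example)
print(rate, len(rows))
```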
190
191 .dvms, .vms
192 Used in Germany to compress speech audio for voice mail. A
193 self-describing variant of cvsd.
194
195 .fap (optional)
196 See .paf.
197
198 ffmpeg (optional)
199 This is a pseudo-type that forces ffmpeg to be used. The actual
200 file type is deduced from the file name (it cannot be used on
201 stdio). It can read a wide range of audio files, not all of
202 which are documented here, and also the audio track of many
203 video files (including AVI, WMV and MPEG). At present only the
204 first audio track of a file can be read.
205
206 .flac (optional; also with -t sndfile)
207 Xiph.org's Free Lossless Audio CODEC compressed audio. FLAC is
208 an open, patent-free CODEC designed for compressing music. It
209 is similar to MP3 and Ogg Vorbis, but lossless, meaning that
210 audio is compressed in FLAC without any loss in quality.
211
212 SoX can read native FLAC files (.flac) but not Ogg FLAC files
213 (.ogg). [But see .ogg below for information relating to support
214 for Ogg Vorbis files.]
215
216 SoX can write native FLAC files according to a given or default
217 compression level. 8 is the default compression level and gives
218 the best (but slowest) compression; 0 gives the least (but
219 fastest) compression. The compression level is selected using
220 the -C option [see sox(1)] with a whole number from 0 to 8.
221
222 .fssd An alias for the .u1 format.
223
224 .gsm (optional; also with -t sndfile)
GSM 06.10 Lossy Speech Compression. A lossy format for compressing
speech which is used in the Global System for Mobile Communications
(GSM). It's good for its purpose, shrinking audio data size, but it
will introduce lots of noise when a given audio signal is encoded
and decoded multiple times. This format is used by some voice mail
applications. It is rather CPU intensive.
232
233 .hcom Macintosh HCOM files. These are Mac FSSD files with Huffman
234 compression.
235
236 .htk Single channel 16-bit PCM format used by HTK, a toolkit for
237 building Hidden Markov Model speech processing tools.
238
239 .ircam (also with -t sndfile)
240 Another name for .sf.
241
242 .ima (also with -t sndfile)
243 A headerless file of IMA ADPCM audio data. IMA ADPCM claims
244 16-bit precision packed into only 4 bits, but in fact sounds no
245 better than .vox.
246
247 .lpc, .lpc10
248 LPC-10 is a compression scheme for speech developed in the
249 United States. See http://www.arl.wustl.edu/~jaf/lpc/ for
250 details. There is no associated file format, so SoX's implemen‐
251 tation is headerless.
252
253 .mat, .mat4, .mat5 (optional)
254 Matlab 4.2/5.0 (respectively GNU Octave 2.0/2.1) format (.mat is
255 the same as .mat4).
256
257 .m3u A playlist format; contains a list of audio files. SoX can
258 read, but not write this file format. See [1] for details of
259 this format.
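
In its simplest form an M3U playlist is one entry per line, with lines
beginning with `#' acting as comments or extended directives. The Python
sketch below reads a playlist on that basis; it is not how SoX itself
parses the file, and the function name is hypothetical.

```python
def parse_m3u(text):
    """Return the playlist entries, skipping blank and '#' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries

playlist = """#EXTM3U
#EXTINF:123,Example title
song1.wav

http://a.server/song2.mp3
"""
print(parse_m3u(playlist))
```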
260
261 .maud An IFF-conforming audio file type, registered by MS MacroSystem
262 Computer GmbH, published along with the `Toccata' sound-card on
the Amiga. Allows 8-bit linear, 16-bit linear, A-law, μ-law in
mono and stereo.
265
266 .mp3, .mp2 (optional read, optional write)
267 MP3 compressed audio; MP3 (MPEG Layer 3) is a part of the
268 patent-encumbered MPEG standards for audio and video compres‐
269 sion. It is a lossy compression format that achieves good com‐
270 pression rates with little quality loss.
271
272 Because MP3 is patented, SoX cannot be distributed with MP3 sup‐
273 port without incurring the patent holder's fees. Users who
274 require SoX with MP3 support must currently compile and build
275 SoX with the MP3 libraries (LAME & MAD) from source code.
276
277 See also Ogg Vorbis for a similar format.
278
279 .mp4, .m4a (optional)
MP4 compressed audio. MP4 (MPEG 4) is part of the MPEG standards
for audio and video compression. See .mp3 for more information.
283
284 .nist (also with -t sndfile)
285 See .sph.
286
287 .ogg, .vorbis (optional)
288 Xiph.org's Ogg Vorbis compressed audio; an open, patent-free
289 CODEC designed for music and streaming audio. It is a lossy
290 compression format (similar to MP3, VQF & AAC) that achieves
291 good compression rates with a minimum amount of quality loss.
292
293 SoX can decode all types of Ogg Vorbis files, and can encode at
294 different compression levels/qualities given as a number from -1
295 (highest compression/lowest quality) to 10 (lowest compression,
highest quality). By default the encoding quality level is 3
(which gives an encoded rate of approx. 112 kbps), but this can
be changed using the -C option [see sox(1)] with a number from -1
to 10; fractional numbers (e.g. 3.6) are also allowed. Decoding
is somewhat CPU intensive and encoding is very CPU intensive.
302
303 See also .mp3 for a similar format.
304
305 oss (optional)
306 Open Sound System /dev/dsp device driver; supports both playing
307 and recording audio. OSS support is available in Unix-like
308 operating systems, sometimes together with alternative sound
309 systems (such as ALSA). Examples:
310 sox infile -t oss
311 sox infile -t oss /dev/dsp
312 sox -2 -t oss /dev/dsp outfile
313 See also play(1) and rec(1).
314
315 .paf, .fap (optional)
316 Ensoniq PARIS file format (big and little-endian respectively).
317
318 .pls A playlist format; contains a list of audio files. SoX can
319 read, but not write this file format. See [2] for details of
320 this format.
321
322 Note: SoX support for SHOUTcast PLS relies on wget(1) and is
323 only partially supported: it's necessary to specify the audio
324 type manually, e.g.
325 play -t mp3 "http://a.server/pls?rn=265&file=filename.pls"
326 and SoX does not know about alternative servers - hit Ctrl-C
327 twice in quick succession to quit.
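
A PLS playlist is an INI-style file with a [playlist] section and
numbered FileN keys. The Python sketch below extracts the entries in
numeric order using the standard configparser module. It illustrates the
file layout only; it does not fetch anything over the network, and the
function name parse_pls is hypothetical.

```python
import configparser

def parse_pls(text):
    """Return the FileN entries of a PLS playlist in numeric order."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    if not parser.has_section("playlist"):
        raise ValueError("missing [playlist] section")
    files = []
    # configparser lower-cases option names, so File1 is seen as file1.
    for key, value in parser["playlist"].items():
        if key.startswith("file") and key[4:].isdigit():
            files.append((int(key[4:]), value))
    return [value for _, value in sorted(files)]

example = """[playlist]
NumberOfEntries=2
File2=two.wav
Title1=First
File1=http://a.server/one.mp3
"""
print(parse_pls(example))
```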
328
329 .prc Psion Record. Used in Psion EPOC PDAs (Series 5, Revo and simi‐
330 lar) for System alarms and recordings made by the built-in
331 Record application. When writing, SoX defaults to A-law, which
332 is recommended; if you must use ADPCM, then use the -i switch.
The sound quality is poor because Psion Record seems to insist
on frames of 800 samples or fewer, so that the ADPCM CODEC has
to be reset every 800 samples, which causes the sound to
glitch every tenth of a second.
337
338 .pvf (optional)
339 Portable Voice Format.
340
341 .sd2 (optional)
342 Sound Designer 2 format.
343
344 .sds (optional)
345 MIDI Sample Dump Standard.
346
347 .sf (also with -t sndfile)
348 IRCAM SDIF (Institut de Recherche et Coordination Acous‐
349 tique/Musique Sound Description Interchange Format). Used by
350 academic music software such as the CSound package, and the
351 MixView sound sample editor.
352
353 .sph, .nist (also with -t sndfile)
SPHERE (SPeech HEader Resources) is a file format defined by
NIST (National Institute of Standards and Technology) and is
used with speech audio. SoX can read these files when they contain
μ-law and PCM data. It will ignore any header information
that says the data is compressed using shorten compression and
will treat the data as either μ-law or PCM. This allows SoX
and the command-line shorten program to be run together using
pipes: shorten decompresses the data and passes the result to SoX
for processing.
363
364 .smp Turtle Beach SampleVision files. SMP files are for use with the
365 PC-DOS package SampleVision by Turtle Beach Softworks. This
366 package is for communication to several MIDI samplers. All sam‐
367 ple rates are supported by the package, although not all are
368 supported by the samplers themselves. Currently loop points are
369 ignored.
370
371 .snd See .au, .sndr and .sndt.
372
373 sndfile (optional)
374 This is a pseudo-type that forces libsndfile to be used. For
375 writing files, the actual file type is then taken from the out‐
376 put file name; for reading them, it is deduced from the file.
377
378 .sndr Sounder files. An MS-DOS/Windows format from the early '90s.
379 Sounder files usually have the extension `.SND'.
380
381 .sndt SoundTool files. An MS-DOS/Windows format from the early '90s.
382 SoundTool files usually have the extension `.SND'.
383
384 .sou An alias for the .u1 raw format.
385
386 .sox SoX's native uncompressed PCM format, intended for storing (or
387 piping) audio at intermediate processing points (i.e. between
388 SoX invocations). It has much in common with the popular WAV,
389 AIFF, and AU uncompressed PCM formats, but has the following
390 specific characteristics: the PCM samples are always stored as
391 32 bit signed integers, the samples are stored (by default) as
392 `native endian', and the number of samples in the file is
393 recorded as a 64-bit integer. Comments are also supported.
394
395 See `Special Filenames' in sox(1) for examples of using the .sox
396 format with `pipes'.
397
398 sunau (optional)
399 Sun /dev/audio device driver; supports both playing and record‐
400 ing audio. For example:
401 sox infile -t sunau /dev/audio
402 or
403 sox infile -t sunau -U -c 1 /dev/audio
for older Sun equipment.
405
406 See also play(1) and rec(1).
407
.txw Yamaha TX-16W sampler. A file format from a Yamaha sampling
keyboard which wrote IBM-PC format 3.5" floppies. When the sample
rate field is not set to one of the expected values, SoX looks at
some other bytes in the attack/loop length fields, and defaults to
33 kHz if the sample rate is still unknown.
414
415 .vms See .dvms.
416
417 .voc (also with -t sndfile)
418 Sound Blaster VOC files. VOC files are multi-part and contain
419 silence parts, looping, and different sample rates for different
420 chunks. On input, the silence parts are filled out, loops are
421 rejected, and sample data with a new sample rate is rejected.
422 Silence with a different sample rate is generated appropriately.
423 On output, silence is not detected, nor are impossible sample
424 rates. SoX supports reading (but not writing) VOC files with
425 multiple blocks, and files containing μ-law, A-law, and
426 2/3/4-bit ADPCM samples.
427
428 .vorbis
429 See .ogg.
430
431 .vox (also with -t sndfile)
432 A headerless file of Dialogic/OKI ADPCM audio data commonly
433 comes with the extension .vox. This ADPCM data has 12-bit pre‐
434 cision packed into only 4-bits.
435
436 Note: some early Dialogic hardware does not always reset the
437 ADPCM encoder at the start of each vox file. This can result in
438 clipping and/or DC offset problems when it comes to decoding the
439 audio. Whilst little can be done about the clipping, a DC off‐
440 set can be removed by passing the decoded audio through a high-
441 pass filter, e.g.:
442 sox input.vox output.au highpass 10
443
444 .w64 (optional)
445 Sonic Foundry's 64-bit RIFF/WAV format.
446
447 .wav (also with -t sndfile)
448 Microsoft .WAV RIFF files. This is the native audio file format
449 of Windows, and widely used for uncompressed audio.
450
451 Normally .wav files have all formatting information in their
452 headers, and so do not need any format options specified for an
453 input file. If any are, they will override the file header, and
454 you will be warned to this effect. You had better know what you
are doing! Output format options will cause a format conversion,
and the .wav will be written appropriately.
457
458 SoX can read and write PCM, μ-law, A-law, MS ADPCM, and IMA (or
459 DVI) ADPCM. Big endian versions of RIFF files, called RIFX, are
460 also supported. To write a RIFX file, use the -B option with
461 the output file options.
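
The difference between RIFF and RIFX is visible in the first four bytes
of the file and determines how the following size field is read. The
Python sketch below, with a hypothetical function name, checks the magic
and decodes the chunk size with the matching byte order; a small
little-endian WAV file is generated in memory with the standard wave
module for the demonstration.

```python
import io
import struct
import wave

def riff_form(data):
    """Return (byte_order, chunk_size, form_type) for RIFF or RIFX data."""
    magic = data[:4]
    if magic == b"RIFF":
        order = "<"   # little-endian
    elif magic == b"RIFX":
        order = ">"   # big-endian
    else:
        raise ValueError("neither RIFF nor RIFX")
    (chunk_size,) = struct.unpack(order + "I", data[4:8])
    return order, chunk_size, data[8:12]

# Build a tiny mono 16-bit WAV file in memory (4 frames of silence).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 4)
data = buf.getvalue()
print(riff_form(data))
```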
462
463 .wavpcm
464 A non-standard, but widely used, variant of .wav. Some applica‐
465 tions cannot read a standard WAV file header for PCM-encoded
466 data with sample-size greater than 16-bits or with more than two
467 channels, but can read a non-standard WAV header. It is likely
468 that such applications will eventually be updated to support the
standard header, but in the meantime, this SoX format can be
470 used to create files with the non-standard header that should
471 work with these applications. (Note that SoX will automatically
472 detect and read WAV files with the non-standard header.)
473
474 The most common use of this file-type is likely to be along the
475 following lines:
476 sox infile.any -t wavpcm -s outfile.wav
477
478 .wv (optional)
479 WavPack lossless audio compression. Note that, when converting
480 .wav to this format and back again, the RIFF header is not nec‐
481 essarily preserved losslessly (though the audio is).
482
483 .wve (also with -t sndfile)
484 Psion 8-bit A-law. Used on Psion SIBO PDAs (Series 3 and simi‐
485 lar). This format is deprecated in SoX, but will continue to be
486 used in libsndfile.
487
488 .xa Maxis XA files. These are 16-bit ADPCM audio files used by
489 Maxis games. Writing .xa files is currently not supported,
490 although adding write support should not be very difficult.
491
492 .xi (optional)
493 Fasttracker 2 Extended Instrument format.
494
496 sox(1), soxi(1), libsox(3), octave(1), wget(1)
497
498 The SoX web page at http://sox.sourceforge.net
499 SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts
500
501 References
502 [1] Wikipedia, M3U, http://en.wikipedia.org/wiki/M3U
503
504 [2] Wikipedia, PLS, http://en.wikipedia.org/wiki/PLS_(file_format)
505
507 Chris Bagwell (cbagwell@users.sourceforge.net). Other authors and con‐
508 tributors are listed in the AUTHORS file that is distributed with the
509 source code.
510
511
512
soxformat October 28, 2008 SoX(7)