SPHINX_FE(1) General Commands Manual SPHINX_FE(1)

NAME
sphinx_fe - Convert audio files to acoustic feature files

SYNOPSIS
sphinx_fe [ options ]...

DESCRIPTION
This program converts audio files (in Microsoft WAV, NIST Sphere, or
raw format) to acoustic feature files for input to batch-mode speech
recognition. The resulting files are also useful for various other
things. A list of options follows:

-alpha Preemphasis parameter

-argfile
Argument file (e.g. feat.params from an acoustic model) to read
parameters from. This overrides anything set in other command-line
arguments.

-blocksize
Number of samples to read at a time

-build_outdirs
Create missing subdirectories in output directory

-c Control file for batch processing

-cep2spec
Input is cepstral files, output is log spectral files

-di Input directory; input file names are relative to this, if defined

-dither
Add 1/2-bit noise

-do Output directory; output files are relative to this

-doublebw
Use double bandwidth filters (same center freq)

-ei Input extension to be applied to all input files

-eo Output extension to be applied to all output files

-example
Shows example of how to use the tool

-frate Frame rate

-help Shows the usage of the tool

-i Single audio input file

-input_endian
Endianness of input data, big or little; ignored if NIST or MS Wav

-lifter
Length of sin-curve for liftering, or 0 for no liftering

-logspec
Write out log spectral files instead of cepstra

-lowerf
Lower edge of filters

-mach_endian
Endianness of machine, big or little

-mswav Defines input format as Microsoft Wav (RIFF)

-ncep Number of cepstral coefficients

-nchans
Number of channels of data (interleaved samples assumed)

-nfft Size of FFT

-nfilt Number of filter banks

-nist Defines input format as NIST sphere

-npart Number of parts to run in (supersedes -nskip and -runlen if
non-zero)

-nskip If a control file was specified, the number of utterances to
skip at the head of the file

-o Single cepstral output file

-ofmt Format of output files: one of sphinx, htk, text

-part Index of the part to run (supersedes -nskip and -runlen if
non-zero)

-raw Defines input format as raw binary data

-remove_dc
Remove DC offset from each frame

-remove_noise
Remove noise with spectral subtraction in mel-energies

-remove_silence
Enables VAD (voice activity detection) and removes silence frames
from processing

-round_filters
Round mel filter frequencies to DFT points

-runlen
If a control file was specified, the number of utterances to
process, or -1 for all

-samprate
Sampling rate

-seed Seed for random number generator; if less than zero, a seed is
chosen automatically

-smoothspec
Write out cepstral-smoothed log spectral files

-spec2cep
Input is log spectral files, output is cepstral files

-sph2pipe
Input is NIST sphere (possibly with Shorten), use sph2pipe to
convert

-transform
Which type of transform to use to calculate cepstra (legacy,
dct, or htk)

-unit_area
Normalize mel filters to unit area

-upperf
Upper edge of filters

-vad_postspeech
Number of silence frames to keep after the transition from speech
to silence

-vad_prespeech
Number of speech frames to keep before the transition from silence
to speech

-vad_startspeech
Number of speech frames needed to trigger the VAD transition from
silence to speech

-vad_threshold
Threshold for decision between noise and silence frames. Log-ratio
between signal level and noise level.

-verbose
Show input filenames

-warp_params
Parameters defining the warping function

-warp_type
Warping function type (or shape)

-whichchan
Channel to process (numbered from 1), or 0 to mix all channels

-wlen Hamming window length
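
The batch options above (-c, -di, -do, -ei, -eo, -nskip, -runlen) can be pictured with a short Python sketch. It is not taken from the sphinx_fe source: the helper names select_utterances and build_paths are hypothetical, and the path layout (directory, then control-file entry, then a dot and the extension) is an assumption about how the pieces are joined. The -npart and -part options are not modeled.

```python
def select_utterances(entries, nskip=0, runlen=-1):
    """Skip `nskip` entries at the head, then keep `runlen` (all if -1)."""
    rest = entries[nskip:]
    return rest if runlen < 0 else rest[:runlen]


def build_paths(entry, di=None, do=None, ei=None, eo=None):
    """Compose (input, output) paths for one control-file entry.

    Assumed layout: <dir>/<entry>.<ext>; a piece that is not given
    is simply left out.
    """
    def compose(directory, ext):
        name = entry if ext is None else f"{entry}.{ext}"
        return name if directory is None else f"{directory}/{name}"

    return compose(di, ei), compose(do, eo)


if __name__ == "__main__":
    # Hypothetical control-file contents, one utterance per line.
    control = ["spk1/utt001", "spk1/utt002", "spk2/utt001", "spk2/utt002"]
    for utt in select_utterances(control, nskip=1, runlen=2):
        print(build_paths(utt, di="wav", do="mfc", ei="wav", eo="mfc"))
```

Under these assumptions, skipping one entry and running two selects the second and third utterances, and each maps to an input path under wav/ and an output path under mfc/.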

MFCCs (mel-frequency cepstral coefficients) are currently the only
kind of feature supported. Numerous options control the properties of
the output features. It is VERY important that you document the
specific set of flags used to create any given set of feature files,
since this information is NOT recorded in the files themselves, and
any mismatch between the parameters used to extract features for
recognition and those used to extract features for training will cause
recognition to fail.
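
One way to follow this advice is to keep the flags in a single place and save them next to the feature files. The Python sketch below is only an illustration: the helper names, the feat_flags.json file name, and the parameter values are hypothetical and are not defaults taken from sphinx_fe. It builds an argument list from a dictionary and writes the same dictionary to a sidecar file, without running the tool.

```python
import json
import os
import tempfile


def build_argv(params, program="sphinx_fe"):
    """Turn {"-flag": value} into an argv list, in sorted flag order."""
    argv = [program]
    for flag, value in sorted(params.items()):
        argv.extend([flag, str(value)])
    return argv


def record_flags(params, outdir, name="feat_flags.json"):
    """Save the exact flag set beside the feature files (hypothetical sidecar)."""
    path = os.path.join(outdir, name)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(params, fh, indent=2, sort_keys=True)
    return path


if __name__ == "__main__":
    # Illustrative values only; use whatever your acoustic model was trained with.
    params = {"-samprate": 16000, "-nfilt": 40, "-ncep": 13, "-transform": "dct"}
    with tempfile.TemporaryDirectory() as outdir:
        print(build_argv(params))
        print(record_flags(params, outdir))
```

Reusing the saved dictionary for both training and recognition runs keeps the two feature sets consistent.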

AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

COPYRIGHT
Copyright © 1994-2007 Carnegie Mellon University. See the file COPYING
included with this package for more information.

2007-08-27 SPHINX_FE(1)