SPHINX_FE(1)                General Commands Manual               SPHINX_FE(1)

NAME

       sphinx_fe - Convert audio files to acoustic feature files

SYNOPSIS

       sphinx_fe [ options ]...

DESCRIPTION

       This program converts audio files (in Microsoft WAV, NIST Sphere, or
       raw format) to acoustic feature files for input to batch-mode speech
       recognition.  The resulting files can also be used for other
       purposes, such as acoustic model training.  A list of options
       follows:

       -alpha Preemphasis parameter

       -argfile
              File (e.g. feat.params from an acoustic model) to read
              parameters from.  This will override anything set in other
              command-line arguments.

       -blocksize
              Number of samples to read at a time.

       -build_outdirs
              Create missing subdirectories in the output directory

       -c     Control file for batch processing

       -cep2spec
              Input is cepstral files, output is log spectral files

       -di    Input directory; input file names are relative to this, if
              defined

       -dither
              Add 1/2-bit noise

       -do    Output directory; output files are relative to this

       -doublebw
              Use double bandwidth filters (same center frequency)

       -ei    Extension to be applied to all input files

       -eo    Extension to be applied to all output files

       -example
              Shows an example of how to use the tool

       -frate Frame rate

       -help  Shows the usage of the tool

       -i     Audio input file

       -input_endian
              Endianness of input data, big or little; ignored if the input
              is NIST Sphere or MS Wav

       -lifter
              Length of sin-curve for liftering, or 0 for no liftering.

       -logspec
              Write out log spectral files instead of cepstra

       -lowerf
              Lower edge of filters

       -mach_endian
              Endianness of machine, big or little

       -mswav Defines input format as Microsoft Wav (RIFF)

       -ncep  Number of cepstral coefficients

       -nchans
              Number of channels of data (interleaved samples assumed)

       -nfft  Size of FFT

       -nfilt Number of filter banks

       -nist  Defines input format as NIST Sphere

       -npart Number of parts to run in (supersedes -nskip and -runlen if
              non-zero)

       -nskip If a control file was specified, the number of utterances to
              skip at the head of the file

       -o     Cepstral output file

       -ofmt  Format of output files: one of sphinx, htk, or text.

       -part  Index of the part to run (supersedes -nskip and -runlen if
              non-zero)

       -raw   Defines input format as raw binary data

       -remove_dc
              Remove DC offset from each frame

       -remove_noise
              Remove noise with spectral subtraction in mel-energies

       -remove_silence
              Enables voice activity detection (VAD) and removes silence
              frames from processing

       -round_filters
              Round mel filter frequencies to DFT points

       -runlen
              If a control file was specified, the number of utterances to
              process, or -1 for all

       -samprate
              Sampling rate

       -seed  Seed for random number generator; if less than zero, a seed is
              chosen automatically

       -smoothspec
              Write out cepstral-smoothed log spectral files

       -spec2cep
              Input is log spectral files, output is cepstral files

       -sph2pipe
              Input is NIST Sphere (possibly with Shorten), use sph2pipe to
              convert

       -transform
              Which type of transform to use to calculate cepstra (legacy,
              dct, or htk)

       -unit_area
              Normalize mel filters to unit area

       -upperf
              Upper edge of filters

       -vad_postspeech
              Number of silence frames to keep after the transition from
              speech to silence.

       -vad_prespeech
              Number of speech frames to keep before the transition from
              silence to speech.

       -vad_startspeech
              Number of speech frames needed to trigger VAD from silence to
              speech.

       -vad_threshold
              Threshold for the decision between noise and silence frames,
              given as the log-ratio between signal level and noise level.

       -verbose
              Show input filenames

       -warp_params
              Parameters defining the warping function

       -warp_type
              Warping function type (or shape)

       -whichchan
              Channel to process (numbered from 1), or 0 to mix all channels

       -wlen  Hamming window length

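       As a sketch of how the batch options listed above fit together, the
       following Python snippet builds (but does not run) a sphinx_fe
       command line using -c, -di, -do, -ei and -eo.  It rests on
       assumptions not stated on this page: that the control file lists one
       utterance name per line, relative to the input directory and without
       extension, and that boolean flags such as -mswav take an explicit
       "yes".  The helper names, file names and the 16000 sampling rate are
       hypothetical.

```python
import shlex


def control_file_text(utterance_ids):
    """Return control-file contents: one utterance name per line.

    Assumption: names are relative to -di and carry no extension.
    """
    return "".join(f"{utt}\n" for utt in utterance_ids)


def batch_command(ctl, in_dir, out_dir, in_ext="wav", out_ext="mfc",
                  argfile=None, samprate=16000):
    """Build an argv list for a batch-mode sphinx_fe run (nothing is executed)."""
    argv = ["sphinx_fe"]
    if argfile is not None:
        # e.g. feat.params from an acoustic model
        argv += ["-argfile", str(argfile)]
    argv += [
        "-samprate", str(samprate),
        "-c", str(ctl),
        "-di", str(in_dir),
        "-do", str(out_dir),
        "-ei", in_ext,
        "-eo", out_ext,
        "-mswav", "yes",  # assumption: boolean flags take an explicit yes/no
    ]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(control_file_text(["utt001", "utt002"]), end="")
    print(shlex.join(batch_command("utts.ctl", "wav", "feat")))
```

       Keeping the command as a list makes it easy to log the exact flags
       or to hand the list to subprocess.run once it has been checked
       against the installed version of the tool.
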
       Currently the only kind of feature supported is MFCCs (mel-frequency
       cepstral coefficients).  Numerous options control the properties of
       the output features.  It is VERY important that you document the
       specific set of flags used to create any given set of feature files,
       since this information is NOT recorded in the files themselves, and
       any mismatch between the parameters used to extract features for
       recognition and those used to extract features for training will
       cause recognition to fail.

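       One possible way to follow this advice is to store the flags in a
       small file next to the feature files and to compare the training and
       recognition sets before decoding.  The Python sketch below is only an
       illustration: the JSON sidecar format, the function names and the
       flag values shown are assumptions, not something sphinx_fe itself
       reads or writes.

```python
import json


def save_flags(flags, path):
    """Write the extraction flags to a JSON file with sorted keys."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(flags, fh, indent=2, sort_keys=True)


def load_flags(path):
    """Read back a flag dictionary written by save_flags."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def flag_mismatches(train_flags, recog_flags):
    """Return {flag: (training_value, recognition_value)} for differing flags.

    A flag present on only one side is reported with None on the other.
    """
    keys = sorted(set(train_flags) | set(recog_flags))
    return {
        key: (train_flags.get(key), recog_flags.get(key))
        for key in keys
        if train_flags.get(key) != recog_flags.get(key)
    }


if __name__ == "__main__":
    # Illustrative values only; use the flags actually passed to sphinx_fe.
    train = {"-samprate": "16000", "-nfilt": "40", "-ncep": "13"}
    recog = {"-samprate": "8000", "-nfilt": "40", "-ncep": "13"}
    for flag, (t, r) in flag_mismatches(train, recog).items():
        print(f"{flag}: training={t} recognition={r}")
```

       An empty result from flag_mismatches means the two recorded flag
       sets agree; it cannot detect flags that were never recorded.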

AUTHOR

       Written by numerous people at CMU from 1994 onwards.  This manual page
       was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

COPYRIGHT

       Copyright © 1994-2007 Carnegie Mellon University.  See the file COPYING
       included with this package for more information.

                                  2007-08-27                      SPHINX_FE(1)