POCKETSPHINX_CONTINUOUS(1) General Commands Manual POCKETSPHINX_CONTINUOUS(1)

NAME
pocketsphinx_continuous - Run speech recognition in continuous listening mode

SYNOPSIS
pocketsphinx_continuous [ -infile filename.wav ] [ -inmic yes ] [ options ]...

DESCRIPTION
This program opens the audio device or a file and waits for speech.
When it detects an utterance, it performs speech recognition on it.

To record from the microphone and decode, use

-inmic yes

To decode a 16kHz 16-bit mono WAV file, use

-infile filename.wav

You can also specify -lm, -fsg, or -kws, depending on whether you are
using a statistical language model, using a finite-state grammar, or
looking for a keyphrase.
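
The invocation patterns above can be sketched as a small helper that assembles an argument list. This is only an illustration: the helper name build_command and the file names are hypothetical, and treating -lm, -fsg, and -kws as mutually exclusive is an assumption drawn from the wording above, not a documented rule. The sketch builds the command without running it.

```python
import shlex


def build_command(infile=None, lm=None, fsg=None, kws=None):
    """Assemble an argument list for pocketsphinx_continuous (hypothetical helper)."""
    # Assumption: only one search mode is chosen per run.
    modes = [("-lm", lm), ("-fsg", fsg), ("-kws", kws)]
    chosen = [(flag, value) for flag, value in modes if value is not None]
    if len(chosen) > 1:
        raise ValueError("choose only one of -lm, -fsg, -kws")

    argv = ["pocketsphinx_continuous"]
    # Decode a WAV file if one is given, otherwise listen on the microphone.
    if infile is not None:
        argv += ["-infile", infile]
    else:
        argv += ["-inmic", "yes"]
    for flag, value in chosen:
        argv += [flag, value]
    return argv


# Placeholder file names; substitute real paths before running the command.
print(shlex.join(build_command(infile="utterance.wav", lm="model.lm")))
```

The resulting list can be passed to subprocess.run once the model and audio paths exist on the system.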

OPTIONS
-adcdev
Name of audio device to use for input.

-agc Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')

-agcthresh
Initial threshold for automatic gain control

-allphone
Perform phoneme decoding with phonetic lm

-allphone_ci
Perform phoneme decoding with phonetic lm and context-independent
units only

-alpha Preemphasis parameter
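
For orientation, a conventional first-order pre-emphasis filter computes y[n] = x[n] - alpha * x[n-1]. The sketch below shows that textbook form; it is assumed, not confirmed by this page, that the front end applies the parameter in the same way. The function name and sample values are illustrative.

```python
def preemphasize(samples, alpha, prior=0.0):
    """Apply y[n] = x[n] - alpha * x[n-1] to a sequence of samples."""
    out = []
    previous = prior  # sample assumed to precede the first one
    for x in samples:
        out.append(x - alpha * previous)
        previous = x
    return out


# A constant signal is attenuated after the first sample,
# which is the high-pass behaviour pre-emphasis is meant to provide.
print(preemphasize([1.0, 1.0, 1.0, 1.0], alpha=0.97))
```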

-argfile
Argument file giving extra arguments.

-ascale
Inverse of acoustic model scale for confidence score calculation

-aw Inverse weight applied to acoustic scores.

-backtrace
Print results and backtraces to log file.

-beam Beam width applied to every frame in Viterbi search (smaller
values mean wider beam)
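
The remark that smaller values mean a wider beam is easiest to see in the log domain. The sketch below shows generic beam pruning, assuming the beam is a probability ratio relative to the best-scoring hypothesis in a frame; the function name, hypothesis labels, and scores are made up for illustration and do not come from the decoder.

```python
import math


def prune(log10_scores, beam):
    """Keep hypotheses whose log10 score is within log10(beam) of the best."""
    best = max(log10_scores.values())
    threshold = best + math.log10(beam)  # beam < 1, so this lowers the bar
    return {name: s for name, s in log10_scores.items() if s >= threshold}


scores = {"a": -10.0, "b": -40.0, "c": -70.0}
for beam in (1e-20, 1e-48, 1e-80):
    # A smaller beam value gives a lower threshold, so more hypotheses survive.
    print(beam, sorted(prune(scores, beam)))
```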

-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)

-bestpathlw
Language model probability weight for bestpath search

-ceplen
Number of components in the input feature vector

-cmn Cepstral mean normalization scheme ('current', 'prior', or
'none')
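
As a rough illustration of the 'current' scheme, cepstral mean normalization subtracts the per-dimension mean of an utterance from every frame. The sketch below assumes that definition; it does not model the 'prior' scheme, and the function name and toy frames are hypothetical.

```python
def cmn_current(frames):
    """Subtract the utterance-level mean from each cepstral dimension."""
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each dimension over all frames of the utterance.
    mean = [sum(frame[i] for frame in frames) / n for i in range(dim)]
    return [[frame[i] - mean[i] for i in range(dim)] for frame in frames]


# Two toy frames with two coefficients each.
print(cmn_current([[1.0, 2.0], [3.0, 6.0]]))
```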

-cmninit
Initial values (comma-separated) for cepstral mean when 'prior'
is used

-compallsen
Compute all senone scores in every frame (can be faster when
there are many senones)

-debug Verbosity level for debugging messages

-dict Main pronunciation dictionary (lexicon) input file

-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies
to ASCII characters only)

-dither
Add 1/2-bit noise

-doublebw
Use double bandwidth filters (same center freq)

-ds Frame GMM computation downsampling ratio

-fdict Noise word pronunciation dictionary input file

-feat Feature stream type, depends on the acoustic model

-featparams
File containing feature extraction parameters.

-fillprob
Filler word transition probability

-frate Frame rate

-fsg Sphinx format finite state grammar file

-fsgusealtpron
Add alternate pronunciations to FSG

-fsgusefiller
Insert filler words at each state.

-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)

-fwdflatbeam
Beam width applied to every frame in second-pass flat search

-fwdflatefwid
Minimum number of end frames for a word to be searched in
fwdflat search

-fwdflatlw
Language model probability weight for flat lexicon (2nd pass)
decoding

-fwdflatsfwin
Window of frames in lattice to search for successor words in
fwdflat search

-fwdflatwbeam
Beam width applied to word exits in second-pass flat search

-fwdtree
Run forward lexicon-tree search (1st pass)

-hmm Directory containing acoustic model files.

-infile
Audio file to transcribe.

-inmic Transcribe audio from microphone.

-input_endian
Endianness of input data, big or little, ignored if NIST or MS
Wav

-jsgf JSGF grammar file

-keyphrase
Keyphrase to spot

-kws A file with keyphrases to spot, one per line
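
A keyphrase file in the one-per-line layout described for -kws can be produced with a few lines of code. The helper name, file name, and phrases below are hypothetical, and the sketch covers only the plain one-phrase-per-line form stated above.

```python
import os
import tempfile
from pathlib import Path


def write_keyphrase_file(path, phrases):
    """Write one keyphrase per line, skipping blank entries."""
    lines = [p.strip() for p in phrases if p.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Write to a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "keyphrases.list")
    write_keyphrase_file(target, ["hello world", "", "stop listening"])
    print(Path(target).read_text(encoding="utf-8"), end="")
```

The resulting path would then be passed as the value of -kws.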

-kws_delay
Delay to wait for best detection score

-kws_plp
Phone loop probability for keyword spotting

-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio

-latsize
Initial backpointer table size

-lda File containing transformation matrix to be applied to features
(single-stream features only)

-ldadim
Dimensionality of output of feature transformation (0 to use
entire matrix)

-lifter
Length of sin-curve for liftering, or 0 for no liftering.

-lm Word trigram language model input file

-lmctl Specify a set of language models

-lmname
Which language model in -lmctl to use by default

-logbase
Base in which all log-likelihoods are calculated

-logfn File to write log messages in

-logspec
Write out logspectral files instead of cepstra

-lowerf
Lower edge of filters

-lpbeam
Beam width applied to last phone in words

-lponlybeam
Beam width applied to last phone in single-phone words

-lw Language model probability weight

-maxhmmpf
Maximum number of active HMMs to maintain at each frame (or -1
for no pruning)

-maxwpf
Maximum number of distinct word exits at each frame (or -1 for
no pruning)

-mdef Model definition input file

-mean Mixture gaussian means input file

-mfclogdir
Directory to log feature files to

-min_endfr
Nodes ignored in lattice construction if they persist for fewer
than N frames

-mixw Senone mixture weights input file (uncompressed)

-mixwfloor
Senone mixture weights floor (applied to data from -mixw file)

-mllr MLLR transformation to apply to means and variances

-mmap Use memory-mapped I/O (if possible) for model files

-ncep Number of cep coefficients

-nfft Size of FFT

-nfilt Number of filter banks

-nwpen New word transition penalty

-pbeam Beam width applied to phone transitions

-pip Phone insertion penalty

-pl_beam
Beam width applied to phone loop search for lookahead

-pl_pbeam
Beam width applied to phone loop transitions for lookahead

-pl_pip
Phone insertion penalty for phone loop

-pl_weight
Weight for phoneme lookahead penalties

-pl_window
Phoneme lookahead window size, in frames

-rawlogdir
Directory to log raw audio files to

-remove_dc
Remove DC offset from each frame

-remove_noise
Remove noise with spectral subtraction in mel-energies

-remove_silence
Enables VAD, removes silence frames from processing

-round_filters
Round mel filter frequencies to DFT points

-samprate
Sampling rate

-seed Seed for random number generator; if less than zero, pick our
own

-sendump
Senone dump (compressed mixture weights) input file

-senlogdir
Directory to log senone score files to

-senmgau
Senone to codebook mapping input file (usually not needed)

-silprob
Silence word transition probability

-smoothspec
Write out cepstral-smoothed logspectral files

-svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or
0-12/13-25/26-38)
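
The two example strings suggest a simple grammar: subvectors separated by "/", each a comma-separated list of single indices or inclusive ranges written lo-hi. The parser below is a sketch under that reading of the examples; it is not the decoder's own parser, and the function name is hypothetical.

```python
def parse_svspec(spec):
    """Expand a subvector specification into lists of feature indices."""
    subvectors = []
    for part in spec.split("/"):          # one subvector per "/"-separated part
        dims = []
        for item in part.split(","):      # indices or ranges within a subvector
            if "-" in item:
                lo, hi = item.split("-")
                dims.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
            else:
                dims.append(int(item))
        subvectors.append(dims)
    return subvectors


for spec in ("24,0-11/25,12-23/26-38", "0-12/13-25/26-38"):
    print(spec, [len(v) for v in parse_svspec(spec)])
```

Under this reading, both example strings split a 39-dimensional feature vector into three subvectors of 13 components each; the first merely places components 24 and 25 at the front of their subvectors.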

-time Print word times in file transcription.

-tmat HMM state transition matrix input file

-tmatfloor
HMM state transition probability floor (applied to -tmat file)

-topn Maximum number of top Gaussians to use in scoring.

-topn_beam
Beam width used to determine top-N Gaussians (or a list,
per-feature)

-toprule
Start rule for JSGF (first public rule is default)

-transform
Which type of transform to use to calculate cepstra (legacy,
dct, or htk)

-unit_area
Normalize mel filters to unit area

-upperf
Upper edge of filters

-uw Unigram weight

-vad_postspeech
Number of silence frames to keep after the transition from speech
to silence.

-vad_prespeech
Number of speech frames to keep before the transition from silence
to speech.

-vad_startspeech
Number of speech frames needed to trigger VAD from silence to
speech.

-vad_threshold
Threshold for decision between noise and silence frames.
Log-ratio between signal level and noise level.

-var Mixture gaussian variances input file

-varfloor
Mixture gaussian variance floor (applied to data from -var file)

-varnorm
Variance normalize each utterance (only if CMN == current)

-verbose
Show input filenames

-warp_params
Parameters defining the warping function

-warp_type
Warping function type (or shape)

-wbeam Beam width applied to word exits

-wip Word insertion penalty

-wlen Hamming window length
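
The framing options -samprate, -frate, and -wlen interact through simple arithmetic. The sketch below assumes -samprate is in samples per second, -frate in frames per second, and -wlen in seconds, and that only complete windows are counted; none of these units is stated on this page, and the numbers used are example values, not defaults.

```python
def frame_layout(samprate, frate, wlen, n_samples):
    """Return (frame shift, window size, number of complete frames) in samples."""
    shift = round(samprate / frate)  # samples between successive frame starts
    size = round(wlen * samprate)    # samples covered by one analysis window
    if n_samples < size:
        return shift, size, 0
    return shift, size, 1 + (n_samples - size) // shift


# One second of 16 kHz audio with 100 frames per second and a 25 ms window.
shift, size, frames = frame_layout(16000, 100, 0.025, 16000)
print(shift, size, frames)
```

Because the window is longer than the shift, consecutive frames overlap, which is why slightly fewer than frate complete frames fit into one second of audio.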

AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

COPYRIGHT
Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE
included with this package for more information.

SEE ALSO
pocketsphinx_batch(1), sphinx_fe(1).

2016-04-01 POCKETSPHINX_CONTINUOUS(1)