POCKETSPHINX_BATCH(1) General Commands Manual POCKETSPHINX_BATCH(1)

NAME
pocketsphinx_batch - Run speech recognition in batch mode

SYNOPSIS
pocketsphinx_batch -ctl ctlfile -cepdir cepdir -cepext .mfc [ options ]...

DESCRIPTION
Run speech recognition over a list of utterances in batch mode. A list
of arguments follows:

-adchdr
Size of audio file header in bytes (headers are ignored)

-adcin Input is raw audio data

-agc Automatic gain control for c0 ('max', 'emax', 'noise', or
'none')

-agcthresh
Initial threshold for automatic gain control

-allphone
Perform phoneme decoding with phonetic lm

-allphone_ci
Perform phoneme decoding with phonetic lm and context-independent
units only

-alpha Preemphasis parameter

-argfile
Argument file giving extra arguments.

-ascale
Inverse of acoustic model scale for confidence score calculation

-aw Inverse weight applied to acoustic scores.

-backtrace
Print results and backtraces to log file.

-beam Beam width applied to every frame in Viterbi search (smaller
values mean wider beam)

-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)

-bestpathlw
Language model probability weight for bestpath search

-build_outdirs
Create missing subdirectories in output directory

-cepdir
Input files directory (prefixed to filespecs in control file)

-cepext
Input files extension (suffixed to filespecs in control file)

-ceplen
Number of components in the input feature vector

-cmn Cepstral mean normalization scheme ('current', 'prior', or
'none')

-cmninit
Initial values (comma-separated) for cepstral mean when 'prior'
is used

-compallsen
Compute all senone scores in every frame (can be faster when
there are many senones)

-ctl Control file listing utterances to be processed

-ctlcount
Number of utterances to be processed (after skipping -ctloffset
entries)

-ctlincr
Do every Nth line in the control file

-ctloffset
Number of utterances at the beginning of -ctl file to be skipped

-ctm Recognition output in CTM file format (may require post-sorting)

-debug Verbosity level for debugging messages

-dict Main pronunciation dictionary (lexicon) input file

-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies
to ASCII characters only)

-dither
Add 1/2-bit noise

-doublebw
Use double bandwidth filters (same center freq)

-ds Frame GMM computation downsampling ratio

-fdict Noise word pronunciation dictionary input file

-feat Feature stream type, depends on the acoustic model

-featparams
File containing feature extraction parameters.

-fillprob
Filler word transition probability

-frate Frame rate

-fsg Sphinx format finite state grammar file

-fsgctl
Control file listing FSG file to use for each utterance

-fsgdir
Base directory for FSG files

-fsgext
File extension for FSG files (including leading dot)

-fsgusealtpron
Add alternate pronunciations to FSG

-fsgusefiller
Insert filler words at each state.

-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)

-fwdflatbeam
Beam width applied to every frame in second-pass flat search

-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat
search

-fwdflatlw
Language model probability weight for flat lexicon (2nd pass)
decoding

-fwdflatsfwin
Window of frames in lattice to search for successor words in
fwdflat search

-fwdflatwbeam
Beam width applied to word exits in second-pass flat search

-fwdtree
Run forward lexicon-tree search (1st pass)

-hmm Directory containing acoustic model files.

-hyp Recognition output file name

-hypseg
Recognition output with segmentation file name

-input_endian
Endianness of input data, big or little, ignored if NIST or MS
Wav

-jsgf JSGF grammar file

-keyphrase
Keyphrase to spot

-kws A file with keyphrases to spot, one per line

-kws_delay
Delay to wait for best detection score

-kws_plp
Phone loop probability for keyword spotting

-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio

-latsize
Initial backpointer table size

-lda File containing transformation matrix to be applied to features
(single-stream features only)

-ldadim
Dimensionality of output of feature transformation (0 to use
entire matrix)

-lifter
Length of sin-curve for liftering, or 0 for no liftering.

-lm Word trigram language model input file

-lmctl Specify a set of language models

-lmname
Which language model in -lmctl to use by default

-lmnamectl
Control file listing LM name to use for each utterance

-logbase
Base in which all log-likelihoods are calculated

-logfn File to write log messages in

-logspec
Write out logspectral files instead of cepstra

-lowerf
Lower edge of filters

-lpbeam
Beam width applied to last phone in words

-lponlybeam
Beam width applied to last phone in single-phone words

-lw Language model probability weight

-maxhmmpf
Maximum number of active HMMs to maintain at each frame (or -1
for no pruning)

-maxwpf
Maximum number of distinct word exits at each frame (or -1 for
no pruning)

-mdef Model definition input file

-mean Mixture gaussian means input file

-mfclogdir
Directory to log feature files to

-min_endfr
Nodes ignored in lattice construction if they persist for fewer
than N frames

-mixw Senone mixture weights input file (uncompressed)

-mixwfloor
Senone mixture weights floor (applied to data from -mixw file)

-mllr MLLR transformation to apply to means and variances

-mllrctl
Control file listing MLLR transforms to use for each utterance

-mllrdir
Base directory for MLLR transforms

-mllrext
File extension for MLLR transforms (including leading dot)

-mmap Use memory-mapped I/O (if possible) for model files

-nbest Number of N-best hypotheses to write to -nbestdir (0 for no
N-best)

-nbestdir
Directory for writing N-best hypothesis lists

-nbestext
Extension for N-best hypothesis list files

-ncep Number of cepstral coefficients

-nfft Size of FFT

-nfilt Number of filter banks

-nwpen New word transition penalty

-outlatbeam
Minimum posterior probability for output lattice nodes

-outlatdir
Directory for dumping word lattices

-outlatext
Filename extension for dumping word lattices

-outlatfmt
Format for dumping word lattices (s3 or htk)

-pbeam Beam width applied to phone transitions

-pip Phone insertion penalty

-pl_beam
Beam width applied to phone loop search for lookahead

-pl_pbeam
Beam width applied to phone loop transitions for lookahead

-pl_pip
Phone insertion penalty for phone loop

-pl_weight
Weight for phoneme lookahead penalties

-pl_window
Phoneme lookahead window size, in frames

-rawlogdir
Directory to log raw audio files to

-remove_dc
Remove DC offset from each frame

-remove_noise
Remove noise with spectral subtraction in mel-energies

-remove_silence
Enables VAD, removes silence frames from processing

-round_filters
Round mel filter frequencies to DFT points

-samprate
Sampling rate

-seed Seed for random number generator; if less than zero, pick our
own

-sendump
Senone dump (compressed mixture weights) input file

-senin Input is senone score dump files

-senlogdir
Directory to log senone score files to

-senmgau
Senone to codebook mapping input file (usually not needed)

-silprob
Silence word transition probability

-smoothspec
Write out cepstral-smoothed logspectral files

-svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or
0-12/13-25/26-38)

-tmat HMM state transition matrix input file

-tmatfloor
HMM state transition probability floor (applied to -tmat file)

-topn Maximum number of top Gaussians to use in scoring.

-topn_beam
Beam width used to determine top-N Gaussians (or a list,
per-feature)

-toprule
Start rule for JSGF (first public rule is default)

-transform
Which type of transform to use to calculate cepstra (legacy,
dct, or htk)

-unit_area
Normalize mel filters to unit area

-upperf
Upper edge of filters

-uw Unigram weight

-vad_postspeech
Number of silence frames to keep after the transition from speech
to silence.

-vad_prespeech
Number of speech frames to keep before the transition from silence
to speech.

-vad_startspeech
Number of speech frames needed to trigger VAD from silence to
speech.

-vad_threshold
Threshold for decision between noise and silence frames. Log-ratio
between signal level and noise level.

-var Mixture gaussian variances input file

-varfloor
Mixture gaussian variance floor (applied to data from -var file)

-varnorm
Variance normalize each utterance (only if CMN == current)

-verbose
Show input filenames

-warp_params
Parameters defining the warping function

-warp_type
Warping function type (or shape)

-wbeam Beam width applied to word exits

-wip Word insertion penalty

-wlen Hamming window length

To do batch mode recognition, you will need to specify a control file
using -ctl. This is a simple text file containing one entry per line.
Each entry is the name of an input file relative to the -cepdir
directory, and without the filename extension (which is given in the
-cepext argument).

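As an illustration of the control file layout described above, the
following Python sketch collects input files under a directory and
writes one entry per line, relative to that directory and with the
extension removed. The function names and the example file names are
hypothetical and are not part of pocketsphinx; only the entry format
comes from this page.

```python
from pathlib import Path


def build_control_entries(cepdir, cepext=".mfc"):
    """Return sorted control-file entries for files under cepdir ending in cepext.

    Each entry is the file path relative to cepdir, with cepext removed,
    matching what -cepdir and -cepext add back at run time.
    """
    root = Path(cepdir)
    entries = []
    for path in root.rglob("*" + cepext):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            # Strip the extension by length so an empty cepext also works.
            entries.append(rel[: len(rel) - len(cepext)])
    return sorted(entries)


def write_control_file(entries, ctlfile):
    """Write one entry per line to ctlfile (hypothetical helper)."""
    text = "".join(entry + "\n" for entry in entries)
    Path(ctlfile).write_text(text, encoding="utf-8")
```

The resulting file would then be passed with -ctl, together with the
same directory as -cepdir and the same extension as -cepext.
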
If you are using acoustic feature files as input (see sphinx_fe(1) for
information on how to generate these), you can also specify a subpart
of a file, using the following format:

FILENAME START-FRAME END-FRAME UTTERANCE-ID

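A minimal Python sketch of producing and reading entries in the
four-field format above. The helper names are hypothetical, the fields
are assumed to be separated by whitespace (so names containing spaces
are not handled), and the frame rate of 100 used in the usage note is
only an illustrative value that should match whatever -frate is in
effect.

```python
def seconds_to_frame(seconds, frame_rate):
    """Convert a time offset in seconds to the nearest frame index."""
    return int(round(seconds * frame_rate))


def format_subpart_entry(filename, start_frame, end_frame, utterance_id):
    """Build one 'FILENAME START-FRAME END-FRAME UTTERANCE-ID' control line."""
    if start_frame < 0 or end_frame < start_frame:
        raise ValueError("need 0 <= start_frame <= end_frame")
    return f"{filename} {start_frame} {end_frame} {utterance_id}"


def parse_control_line(line):
    """Split a control line into (filename, start_frame, end_frame, utterance_id).

    A plain entry holds only a filename; the other fields are then None.
    """
    fields = line.split()
    if len(fields) == 1:
        return fields[0], None, None, None
    if len(fields) == 4:
        return fields[0], int(fields[1]), int(fields[2]), fields[3]
    raise ValueError(f"unexpected control line: {line!r}")
```

For example, format_subpart_entry("spk1/long", seconds_to_frame(1.5,
100), seconds_to_frame(3.0, 100), "spk1_long_0001") builds an entry
covering frames 150 to 300 of the hypothetical file spk1/long.
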
AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

COPYRIGHT
Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE
included with this package for more information.

SEE ALSO
pocketsphinx_continuous(1), sphinx_fe(1).

2007-08-27 POCKETSPHINX_BATCH(1)