pocketsphinx_batch(1)

POCKETSPHINX_BATCH(1)       General Commands Manual      POCKETSPHINX_BATCH(1)

NAME

       pocketsphinx_batch - Run speech recognition in batch mode

SYNOPSIS

       pocketsphinx_batch -ctl ctlfile -cepdir cepdir -cepext .mfc [options]...
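       A full invocation usually also names the models and an output file.
       The Python sketch below assembles such a command line. It uses only
       options documented on this page (-ctl, -cepdir, -cepext, -hmm, -lm,
       -dict, -hyp). The helper names batch_argv and run_batch, and every
       path in the example, are hypothetical placeholders. Which model
       options a given setup actually needs depends on the installation.

```python
import shlex
import subprocess
from typing import List


def batch_argv(ctl: str, cepdir: str, cepext: str, hmm: str, lm: str,
               dict_path: str, hyp: str) -> List[str]:
    """Build an argument vector for pocketsphinx_batch (hypothetical helper)."""
    return [
        "pocketsphinx_batch",
        "-ctl", ctl,         # control file: one utterance per line
        "-cepdir", cepdir,   # prefixed to each control-file entry
        "-cepext", cepext,   # suffixed to each entry, e.g. ".mfc"
        "-hmm", hmm,         # directory containing acoustic model files
        "-lm", lm,           # trigram language model
        "-dict", dict_path,  # pronunciation dictionary
        "-hyp", hyp,         # recognition output file
    ]


def run_batch(argv: List[str]) -> None:
    """Run the decoder; raises CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own model and data layout.
    argv = batch_argv("test.ctl", "feats", ".mfc", "model/hmm",
                      "model/lm.arpa", "model/words.dict", "test.hyp")
    print(shlex.join(argv))  # print the command instead of running it
```

       Building an argument list rather than a single shell string avoids
       quoting problems with paths that contain spaces.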

DESCRIPTION

       Run speech recognition over a list of utterances in batch mode. A list
       of arguments follows:

       -adchdr
              Size of audio file header in bytes (headers are ignored)

       -adcin Input is raw audio data

       -agc   Automatic gain control for c0 ('max', 'emax', 'noise', or
              'none')

       -agcthresh
              Initial threshold for automatic gain control

       -allphone
              Perform phoneme decoding with phonetic LM

       -allphone_ci
              Perform phoneme decoding with phonetic LM and context-independent
              units only

       -alpha Preemphasis parameter

       -argfile
              Argument file giving extra arguments.

       -ascale
              Inverse of acoustic model scale for confidence score calculation

       -aw    Inverse weight applied to acoustic scores.

       -backtrace
              Print results and backtraces to log file.

       -beam  Beam width applied to every frame in Viterbi search (smaller
              values mean a wider beam)

       -bestpath
              Run bestpath (Dijkstra) search over word lattice (3rd pass)

       -bestpathlw
              Language model probability weight for bestpath search

       -build_outdirs
              Create missing subdirectories in output directory

       -cepdir
              Input files directory (prefixed to filespecs in control file)

       -cepext
              Input files extension (suffixed to filespecs in control file)

       -ceplen
              Number of components in the input feature vector

       -cmn   Cepstral mean normalization scheme ('current', 'prior', or
              'none')

       -cmninit
              Initial values (comma-separated) for cepstral mean when 'prior'
              is used

       -compallsen
              Compute all senone scores in every frame (can be faster when
              there are many senones)

       -ctl   Control file listing utterances to be processed

       -ctlcount
              Number of utterances to be processed (after skipping -ctloffset
              entries)

       -ctlincr
              Process every Nth line of the control file

       -ctloffset
              Number of utterances at the beginning of the -ctl file to skip

       -ctm   Recognition output in CTM file format (may require post-sorting)

       -debug Verbosity level for debugging messages

       -dict  Main pronunciation dictionary (lexicon) input file

       -dictcase
              Dictionary is case sensitive (NOTE: case insensitivity applies
              to ASCII characters only)

       -dither
              Add 1/2-bit noise

       -doublebw
              Use double bandwidth filters (same center frequency)

       -ds    Frame GMM computation downsampling ratio

       -fdict Noise word pronunciation dictionary input file

       -feat  Feature stream type, depends on the acoustic model

       -featparams
              File containing feature extraction parameters.

       -fillprob
              Filler word transition probability

       -frate Frame rate

       -fsg   Sphinx format finite state grammar file

       -fsgctl
              Control file listing FSG file to use for each utterance

       -fsgdir
              Directory for FSG files

       -fsgext
              Extension for FSG files (including leading dot)

       -fsgusealtpron
              Add alternate pronunciations to FSG

       -fsgusefiller
              Insert filler words at each state.

       -fwdflat
              Run forward flat-lexicon search over word lattice (2nd pass)

       -fwdflatbeam
              Beam width applied to every frame in second-pass flat search

       -fwdflatefwid
              Minimum number of end frames for a word to be searched in
              fwdflat search

       -fwdflatlw
              Language model probability weight for flat lexicon (2nd pass)
              decoding

       -fwdflatsfwin
              Window of frames in lattice to search for successor words in
              fwdflat search

       -fwdflatwbeam
              Beam width applied to word exits in second-pass flat search

       -fwdtree
              Run forward lexicon-tree search (1st pass)

       -hmm   Directory containing acoustic model files.

       -hyp   Recognition output file name

       -hypseg
              Recognition output with segmentation file name

       -input_endian
              Endianness of input data, big or little (ignored for NIST or MS
              WAV files)

       -jsgf  JSGF grammar file

       -keyphrase
              Keyphrase to spot

       -kws   File with keyphrases to spot, one per line

       -kws_delay
              Delay to wait for best detection score

       -kws_plp
              Phone loop probability for keyword spotting

       -kws_threshold
              Threshold for p(hyp)/p(alternatives) ratio

       -latsize
              Initial backpointer table size

       -lda   File containing transformation matrix to be applied to features
              (single-stream features only)

       -ldadim
              Dimensionality of output of feature transformation (0 to use
              entire matrix)

       -lifter
              Length of sin-curve for liftering, or 0 for no liftering.

       -lm    Word trigram language model input file

       -lmctl Specify a set of language models

       -lmname
              Name of the language model in -lmctl to use by default

       -lmnamectl
              Control file listing LM name to use for each utterance

       -logbase
              Base in which all log-likelihoods are calculated

       -logfn File to write log messages to

       -logspec
              Write out logspectral files instead of cepstra

       -lowerf
              Lower edge of filters

       -lpbeam
              Beam width applied to last phone in words

       -lponlybeam
              Beam width applied to last phone in single-phone words

       -lw    Language model probability weight

       -maxhmmpf
              Maximum number of active HMMs to maintain at each frame (or -1
              for no pruning)

       -maxwpf
              Maximum number of distinct word exits at each frame (or -1 for
              no pruning)

       -mdef  Model definition input file

       -mean  Mixture Gaussian means input file

       -mfclogdir
              Directory to log feature files to

       -min_endfr
              Nodes ignored in lattice construction if they persist for fewer
              than N frames

       -mixw  Senone mixture weights input file (uncompressed)

       -mixwfloor
              Senone mixture weights floor (applied to data from -mixw file)

       -mllr  MLLR transformation to apply to means and variances

       -mllrctl
              Control file listing MLLR transforms to use for each utterance

       -mllrdir
              Directory for MLLR transforms

       -mllrext
              Extension for MLLR transforms (including leading dot)

       -mmap  Use memory-mapped I/O (if possible) for model files

       -nbest Number of N-best hypotheses to write to -nbestdir (0 for no
              N-best)

       -nbestdir
              Directory for writing N-best hypothesis lists

       -nbestext
              Extension for N-best hypothesis list files

       -ncep  Number of cepstral coefficients

       -nfft  Size of FFT

       -nfilt Number of filter banks

       -nwpen New word transition penalty

       -outlatbeam
              Minimum posterior probability for output lattice nodes

       -outlatdir
              Directory for dumping word lattices

       -outlatext
              Filename extension for dumping word lattices

       -outlatfmt
              Format for dumping word lattices (s3 or htk)

       -pbeam Beam width applied to phone transitions

       -pip   Phone insertion penalty

       -pl_beam
              Beam width applied to phone loop search for lookahead

       -pl_pbeam
              Beam width applied to phone loop transitions for lookahead

       -pl_pip
              Phone insertion penalty for phone loop

       -pl_weight
              Weight for phoneme lookahead penalties

       -pl_window
              Phoneme lookahead window size, in frames

       -rawlogdir
              Directory to log raw audio files to

       -remove_dc
              Remove DC offset from each frame

       -remove_noise
              Remove noise with spectral subtraction in mel-energies

       -remove_silence
              Enable VAD and remove silence frames from processing

       -round_filters
              Round mel filter frequencies to DFT points

       -samprate
              Sampling rate

       -seed  Seed for the random number generator; if less than zero, a
              seed is chosen automatically

       -sendump
              Senone dump (compressed mixture weights) input file

       -senin Input is senone score dump files

       -senlogdir
              Directory to log senone score files to

       -senmgau
              Senone to codebook mapping input file (usually not needed)

       -silprob
              Silence word transition probability

       -smoothspec
              Write out cepstral-smoothed logspectral files

       -svspec
              Subvector specification (e.g., 24,0-11/25,12-23/26-38 or
              0-12/13-25/26-38)

       -tmat  HMM state transition matrix input file

       -tmatfloor
              HMM state transition probability floor (applied to -tmat file)

       -topn  Maximum number of top Gaussians to use in scoring.

       -topn_beam
              Beam width used to determine top-N Gaussians (or a list,
              per-feature)

       -toprule
              Start rule for JSGF (first public rule is default)

       -transform
              Which type of transform to use to calculate cepstra (legacy,
              dct, or htk)

       -unit_area
              Normalize mel filters to unit area

       -upperf
              Upper edge of filters

       -uw    Unigram weight

       -vad_postspeech
              Number of silence frames to keep after a transition from
              speech to silence

       -vad_prespeech
              Number of speech frames to keep before a transition from
              silence to speech

       -vad_startspeech
              Number of speech frames needed to trigger the VAD transition
              from silence to speech

       -vad_threshold
              Threshold for the decision between noise and silence frames,
              expressed as the log-ratio between signal level and noise level

       -var   Mixture Gaussian variances input file

       -varfloor
              Mixture Gaussian variance floor (applied to data from -var file)

       -varnorm
              Variance normalize each utterance (only if CMN == current)

       -verbose
              Show input filenames

       -warp_params
              Parameters defining the warping function

       -warp_type
              Warping function type (or shape)

       -wbeam Beam width applied to word exits

       -wip   Word insertion penalty

       -wlen  Hamming window length
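       The note on -beam that smaller values mean a wider beam follows from
       how beam pruning is usually defined. The Python sketch below assumes
       the beam is a probability ratio (such as 1e-48) and that hypothesis
       scores are log-likelihoods. A hypothesis survives if its score is
       within log(beam) of the best score in the frame. A smaller beam
       therefore gives a more negative threshold and keeps more hypotheses.
       The function name and the scores are hypothetical. The decoder's own
       integer log arithmetic (see -logbase) is not modeled.

```python
import math
from typing import Dict


def beam_prune(scores: Dict[str, float], beam: float) -> Dict[str, float]:
    """Keep hypotheses whose log score is within log(beam) of the best one.

    `scores` are natural-log likelihoods; `beam` is a probability ratio
    in (0, 1], which is how beam widths are commonly expressed.
    """
    best = max(scores.values())
    threshold = best + math.log(beam)  # log(beam) <= 0, so threshold <= best
    return {h: s for h, s in scores.items() if s >= threshold}


if __name__ == "__main__":
    # Hypothetical per-frame HMM scores (natural log).
    frame = {"hmm_a": -100.0, "hmm_b": -105.0, "hmm_c": -130.0}
    narrow = beam_prune(frame, 1e-3)   # threshold is about -106.9
    wide = beam_prune(frame, 1e-20)    # threshold is about -146.1
    print(sorted(narrow), sorted(wide))
```

       With beam 1e-3 only hmm_a and hmm_b survive, while 1e-20 keeps all
       three. That is why a numerically smaller beam is a wider one.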
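       For -cmn 'current' combined with -varnorm, the following is a minimal
       Python sketch of textbook per-utterance normalization. It assumes
       that 'current' means the statistics are taken over the current
       utterance. The sketch subtracts each cepstral dimension's mean, then
       optionally divides by that dimension's standard deviation. The helper
       cmn_current is hypothetical and is not the decoder's own code. The
       'prior' scheme, which is initialized from -cmninit, is not covered.

```python
import math
from typing import List


def cmn_current(frames: List[List[float]],
                varnorm: bool = False) -> List[List[float]]:
    """Normalize a list of equal-length cepstral vectors (one per frame)."""
    if not frames:
        return []
    n = len(frames)
    dims = len(frames[0])
    # Per-dimension mean over the whole utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    out = [[f[d] - means[d] for d in range(dims)] for f in frames]
    if varnorm:
        for d in range(dims):
            # Standard deviation of the already mean-subtracted dimension.
            std = math.sqrt(sum(row[d] ** 2 for row in out) / n)
            if std > 0.0:  # leave constant dimensions untouched
                for row in out:
                    row[d] /= std
    return out


if __name__ == "__main__":
    # Hypothetical 2-dimensional cepstra for a three-frame utterance.
    utt = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
    print(cmn_current(utt))
    print(cmn_current(utt, varnorm=True))
```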
       To do batch mode recognition, you will need to specify a control file
       using -ctl. This is a simple text file containing one entry per line.
       Each entry is the name of an input file relative to the -cepdir
       directory, without the filename extension (which is given by the
       -cepext argument).

       If you are using acoustic feature files as input (see sphinx_fe(1) for
       information on how to generate these), you can also specify a subpart
       of a file, using the following format:

              FILENAME START-FRAME END-FRAME UTTERANCE-ID
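       Both control-file forms can be read with a small parser. The Python
       sketch below assumes whitespace-separated fields and accepts exactly
       the two layouts described above: a bare FILENAME, or all four fields.
       The input path is formed by joining -cepdir with FILENAME plus the
       -cepext extension. CtlEntry and parse_ctl_line are hypothetical
       names. The sketch does not model how the decoder itself treats other
       field counts or assigns IDs to bare entries.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class CtlEntry:
    path: str                   # -cepdir joined with FILENAME + -cepext
    start_frame: Optional[int]  # None: no frame range given (whole file)
    end_frame: Optional[int]    # None: no frame range given (whole file)
    utt_id: Optional[str]       # None: no explicit UTTERANCE-ID given


def parse_ctl_line(line: str, cepdir: str, cepext: str) -> Optional[CtlEntry]:
    """Parse one control-file line; return None for blank lines."""
    fields = line.split()
    if not fields:
        return None
    path = os.path.join(cepdir, fields[0] + cepext)
    if len(fields) == 1:
        return CtlEntry(path, None, None, None)
    if len(fields) == 4:
        # FILENAME START-FRAME END-FRAME UTTERANCE-ID
        return CtlEntry(path, int(fields[1]), int(fields[2]), fields[3])
    raise ValueError(f"unsupported control-file line: {line!r}")


if __name__ == "__main__":
    # Hypothetical control-file contents.
    ctl_lines = ["spk1/utt001", "spk1/session 200 450 seg_a", ""]
    for line in ctl_lines:
        entry = parse_ctl_line(line, "feats", ".mfc")
        if entry is not None:
            print(entry)
```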

AUTHOR

       Written by numerous people at CMU from 1994 onwards. This manual page
       was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

COPYRIGHT

       Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE
       included with this package for more information.
