SLMSEG(1) User Contributed Perl Documentation SLMSEG(1)

slmseg - segment Chinese text using maximum matching

slmseg -d dict_file [option]... [corpus_file]...

slmseg is a tool for segmenting Chinese text into words using the
maximum matching algorithm. It segments corpus_file, or standard input
if no filename is specified, and writes the segmented result to
standard output.

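As a rough illustration of what maximum matching means, the sketch below
implements the greedy forward variant in Python: at each position it takes
the longest word found in the lexicon and falls back to a single character
when nothing matches. The function name max_match_segment and the toy
lexicon are hypothetical, and the sketch is not slmseg's actual
implementation, which may handle ambiguity and unknown words differently.

```python
def max_match_segment(text, lexicon, max_word_len=None):
    """Greedy forward maximum matching over a set of known words."""
    if max_word_len is None:
        max_word_len = max((len(w) for w in lexicon), default=1)
    max_word_len = max(1, max_word_len)  # always try at least one character
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                # A single character is emitted as-is when nothing longer matches.
                words.append(candidate)
                i += length
                break
    return words


# Toy lexicon for illustration only; it is not taken from dict.utf8.
toy_lexicon = {"北京", "大学", "北京大学", "学生"}
print(" ".join(max_match_segment("北京大学学生", toy_lexicon)))
```

Because the longest candidate is tried first, "北京大学" is chosen over the
shorter "北京" even though both are in the toy lexicon.
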
-d dict_file
Use dict_file as the lexicon. A default lexicon can be found at
/usr/share/sunpinyin-slm/dict.utf8.

-f, --format (text|bin)
Output format, either 'text' or 'bin'. The default is 'bin'. In text
mode, the text of each word is output; in binary mode, the word IDs
are written to stdout as binary short integers.

-s, --stok STOK_ID
Sentence token ID. The default is 10. In binary mode, it is written
to the output after every sentence.

-i, --show-id
Show ID information. In text mode, attach the ID after each known
word. In binary mode, print the IDs as text.

-m, --model language-model-file
Specify the language model file. This file is generated by slmthread.

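The options above can be combined from a script. The following Python
sketch assembles an slmseg argument list using only the flags documented
here. The helper name build_slmseg_command and the file name corpus.txt are
hypothetical, and the sketch only builds the command; it does not check
that slmseg is installed.

```python
import shlex


def build_slmseg_command(dict_file, corpus_files=(), output_format=None,
                         stok_id=None, show_id=False, model_file=None):
    """Build an argument list for slmseg from the documented options."""
    cmd = ["slmseg", "-d", dict_file]
    if output_format is not None:
        if output_format not in ("text", "bin"):
            raise ValueError("output_format must be 'text' or 'bin'")
        cmd += ["-f", output_format]
    if stok_id is not None:
        cmd += ["-s", str(stok_id)]
    if show_id:
        cmd.append("-i")
    if model_file is not None:
        cmd += ["-m", model_file]
    # With no corpus files, slmseg reads standard input.
    cmd.extend(corpus_files)
    return cmd


cmd = build_slmseg_command(
    "/usr/share/sunpinyin-slm/dict.utf8",
    corpus_files=["corpus.txt"],  # hypothetical input file
    output_format="text",
    show_id=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True)
on a system where slmseg is available.
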
In binary mode, consecutive IDs of 0 are merged into a single 0. In
text mode, no spaces are inserted between unknown words.

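The binary-mode behaviour described above can be sketched in Python as
follows. Two points are assumptions rather than facts from this page: that
a "short integer" is an unsigned 16-bit value in native byte order, and
that zero merging happens before the sentence token is appended. The
function names and the sample word IDs are hypothetical.

```python
import struct


def merge_zero_ids(word_ids):
    """Collapse each run of consecutive 0 IDs into a single 0."""
    merged = []
    for wid in word_ids:
        if wid == 0 and merged and merged[-1] == 0:
            continue  # skip repeated 0 IDs
        merged.append(wid)
    return merged


def encode_sentence(word_ids, stok_id=10):
    """Pack one sentence's IDs, followed by the sentence token ID."""
    ids = merge_zero_ids(word_ids) + [stok_id]
    # ASSUMPTION: "H" is an unsigned 16-bit integer, "=" is native byte order.
    return struct.pack(f"={len(ids)}H", *ids)


def decode_ids(data):
    """Unpack a byte string of 16-bit IDs back into a list of integers."""
    count = len(data) // 2
    return list(struct.unpack(f"={count}H", data[:count * 2]))


# Sample IDs are made up for illustration.
encoded = encode_sentence([5, 0, 0, 7])
print(decode_ids(encoded))
```
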
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently
maintained by Kov.Chai <tchaikov@gmail.com>.

mmseg(1), ids2ngram(1).

perl v5.34.0 2021-07-23 SLMSEG(1)