PATGEN(1)                   General Commands Manual                  PATGEN(1)

NAME
       patgen - generate patterns for TeX hyphenation

SYNOPSIS
       patgen dictionary_file pattern_file patout_file translate_file

DESCRIPTION
       This manual page is not meant to be exhaustive. The complete
       documentation for this version of TeX can be found in the info file
       or manual Web2C: A TeX implementation.

       The patgen program reads the dictionary_file containing a list of
       hyphenated words and the pattern_file containing previously-generated
       patterns (if any) for a particular language, and produces the
       patout_file with (previously- plus newly-generated) hyphenation
       patterns for that language. The translate_file defines
       language-specific values for the parameters left_hyphen_min and
       right_hyphen_min used by TeX's hyphenation algorithm, as well as the
       external representation of the lower and upper case version(s) of all
       `letters' of that language. Further details of the pattern generation
       process, such as hyphenation levels and pattern lengths, are requested
       interactively from the user's terminal. Optionally, patgen creates a
       new dictionary file pattmp.n showing the good and bad hyphens found by
       the generated patterns, where n is the highest hyphenation level.

       The patterns generated by patgen can be read by initex for use in
       hyphenating words. For a (very) long example of patgen's output, see
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns
       TeX uses for English. At some sites, patterns for several other
       languages may be available, and the local tex programs may have them
       preloaded; consult your Local Guide or your system administrator for
       details.

       All filenames must be complete; no default extensions are added and no
       path searching is done.

FILE FORMATS
   Letters
       When initex digests hyphenation patterns, TeX first expands macros,
       and the result must consist entirely of digits (hyphenation levels),
       dots (`.', edge of a word), and letters. In pattern files for
       non-English languages, letters are often represented by macros or
       other expandable constructs. For the purpose of patgen these are just
       character sequences, subject to the condition that no such sequence is
       a prefix of another one.
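
       The prefix condition above can be checked mechanically before a
       set of letter representations is handed to patgen. The following
       is a minimal sketch, not part of patgen itself; the function name
       find_prefix_conflicts and the sample representations are
       hypothetical.

```python
def find_prefix_conflicts(representations):
    """Return (shorter, longer) pairs where one sequence is a prefix of another.

    Hypothetical helper: patgen itself only states the condition.
    """
    ordered = sorted(set(representations))
    conflicts = []
    for i, shorter in enumerate(ordered):
        for longer in ordered[i + 1:]:
            # In sorted order, all strings that start with `shorter` directly
            # follow it, so the first mismatch ends the search for this entry.
            if not longer.startswith(shorter):
                break
            conflicts.append((shorter, longer))
    return conflicts


if __name__ == "__main__":
    # "\o" is a prefix of "\oe", so this pair would violate the condition.
    print(find_prefix_conflicts(["a", "b", "\\o", "\\oe"]))
```

       An empty result means the set satisfies the condition; any
       reported pair needs one of its representations changed.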

   Dictionary file
       A dictionary file contains a weighted list of hyphenated words, one
       word per line starting in column 1. A digit in column 1 indicates a
       global word weight (initially 1) applicable to all following words up
       to the next global word weight. A digit at some intercharacter
       position indicates a weight for that position only.

       The hyphens in a word are indicated by `-', `*', or `.' (or their
       replacements as defined in the translate file) for hyphens yet to be
       found, `good' hyphens (correctly found by the patterns), and `bad'
       hyphens (erroneously found by the patterns) respectively; when reading
       a dictionary file `*' is treated like `-' and `.' is ignored.
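
       As an illustration of the rules above, the following sketch reads
       dictionary lines into a simple structure. It is not patgen's own
       reader. It assumes that every character other than a digit or a
       hyphen marker is a single-character letter, and that a word may
       follow a column-1 weight digit on the same line; the names Entry
       and parse_dictionary are hypothetical.

```python
from dataclasses import dataclass
from string import digits


@dataclass
class Entry:
    letters: str      # the word with all markers removed
    hyphens: set      # positions i: a hyphen follows letters[:i]
    weights: dict     # position -> weight, only where a digit overrides it
    word_weight: int  # global word weight in effect for this word


def parse_dictionary(lines, hyphen="-", good="*", bad="."):
    """Parse dictionary lines (sketch; single-character letters assumed)."""
    entries = []
    global_weight = 1  # initial global word weight
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line[0] in digits:
            # Digit in column 1: new global weight for the following words.
            global_weight = int(line[0])
            line = line[1:]
            if not line:
                continue
        letters, hyphens, weights = [], set(), {}
        for ch in line:
            pos = len(letters)  # intercharacter position reached so far
            if ch in (hyphen, good):
                hyphens.add(pos)        # `*' is treated like `-'
            elif ch == bad:
                continue                # `.' is ignored on input
            elif ch in digits:
                weights[pos] = int(ch)  # weight for this position only
            else:
                letters.append(ch)
        entries.append(Entry("".join(letters), hyphens, weights, global_weight))
    return entries


if __name__ == "__main__":
    for entry in parse_dictionary(["hy-phen-ate", "3ta*ble", "re.cord"]):
        print(entry)
```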

   Translate file
       A translate file starts with a line containing the value of
       left_hyphen_min in columns 1-2, the value of right_hyphen_min in
       columns 3-4, and, in columns 5, 6, and 7, either a blank or the
       replacement for the corresponding "hyphen" character `-', `*', or
       `.'. (Input lines are padded with blanks, as in many TeX-related
       programs.)

       Each following line defines one `letter': an arbitrary delimiter
       character in column 1, followed by one or more external
       representations of that letter (first the `lower' case one used for
       output), each one terminated by the delimiter and the whole sequence
       terminated by another delimiter.

       If the translate file is empty, the values left_hyphen_min=2,
       right_hyphen_min=3, and the 26 lower case letters a...z with their
       upper case representations A...Z are assumed.
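
       The layout described above can be sketched as a small parser. It
       is an illustration only, not patgen's own code. It assumes that
       both numeric fields on the first line are present, and it emulates
       blank padding by appending two blanks to each letter line; the
       name parse_translate and the sample lines are hypothetical.

```python
from string import ascii_lowercase


def parse_translate(lines):
    """Parse translate-file lines into a dict (sketch under stated assumptions)."""
    if not lines:
        # Empty file: the documented defaults apply.
        return {
            "left_hyphen_min": 2,
            "right_hyphen_min": 3,
            "hyphen_chars": ("-", "*", "."),
            "letters": [[c, c.upper()] for c in ascii_lowercase],
        }
    header = lines[0].rstrip("\n").ljust(7)  # pad with blanks to 7 columns
    left = int(header[0:2])    # columns 1-2
    right = int(header[2:4])   # columns 3-4
    # Columns 5, 6, 7: a blank keeps the default character.
    hyphen_chars = tuple(
        default if repl == " " else repl
        for default, repl in zip("-*.", header[4:7])
    )
    letters = []
    for raw in lines[1:]:
        padded = raw.rstrip("\n") + "  "  # emulate blank padding
        delim, rest = padded[0], padded[1:]
        reps = []
        while True:
            rep, sep, rest = rest.partition(delim)
            if not sep or rep == "":
                break  # unterminated text, or the closing extra delimiter
            reps.append(rep)
        if reps:
            letters.append(reps)  # first entry is the `lower' case form
    return {
        "left_hyphen_min": left,
        "right_hyphen_min": right,
        "hyphen_chars": hyphen_chars,
        "letters": letters,
    }


if __name__ == "__main__":
    print(parse_translate([" 2 3", " a A", "/\\ss/SS//"]))
```

       With a blank as the delimiter, a line such as " a A" needs no
       explicit terminators, because the padding blanks supply them.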

   Terminal input
       After reading the translate_file and any previously-generated
       patterns from pattern_file, patgen requests input from the user's
       terminal.

       First, the integer values of hyph_start and hyph_finish, the lowest
       and highest hyphenation levels for which patterns are to be generated.
       The value of hyph_start should be larger than any hyphenation level
       already present in pattern_file.

       Then, for each hyphenation level, the integer values of pat_start and
       pat_finish, the smallest and largest pattern lengths to be analyzed,
       as well as good weight, bad weight, and threshold: the weights for
       good and bad hyphens and a weight threshold for useful patterns.

       Finally, the decision (`y' or `Y' vs. anything else) whether or not to
       produce a hyphenated word list.
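
       Because the answers follow a fixed order, a run can be scripted by
       preparing them in advance and feeding them to patgen on standard
       input. The sketch below is one way to do that and has not been
       checked against every patgen build. It assumes that each group of
       values may be given on one line separated by blanks, and that
       patgen accepts piped input in place of a terminal. The helper
       names, file names, and numeric values are hypothetical.

```python
import subprocess


def build_patgen_answers(levels, hyph_start=1, make_word_list=False):
    """Build the text to feed patgen, in the order the prompts are described.

    `levels` holds one (pat_start, pat_finish, good_wt, bad_wt, threshold)
    tuple per hyphenation level, starting at hyph_start.
    """
    hyph_finish = hyph_start + len(levels) - 1
    lines = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good_wt, bad_wt, threshold in levels:
        lines.append(f"{pat_start} {pat_finish}")
        lines.append(f"{good_wt} {bad_wt} {threshold}")
    # `y' or `Y' requests the hyphenated word list; anything else declines.
    lines.append("y" if make_word_list else "n")
    return "\n".join(lines) + "\n"


def run_patgen(dictionary, patterns, patout, translate, answers):
    """Run patgen with prepared answers (assumes patgen is on PATH)."""
    return subprocess.run(
        ["patgen", dictionary, patterns, patout, translate],
        input=answers, text=True, check=True,
    )


if __name__ == "__main__":
    # Illustrative values only; real runs need parameters tuned per language.
    print(build_patgen_answers([(2, 4, 1, 2, 20), (2, 4, 2, 1, 8)]), end="")
```

       All four file names passed to run_patgen must be complete, since
       patgen adds no extensions and does no path searching.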

FILES
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
              Patterns for English.

SEE ALSO
       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977,
       Stanford University Ph.D. thesis, 1983.
       Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN
       0-201-13447-0, Appendix H.

AUTHORS
       Frank Liang wrote the first version of this program. Peter
       Breitenlohner made a substantial revision in 1991 for TeX 3. The first
       version was published as the appendix to the TeXware technical report,
       available from the TeX Users Group. Howard Trickey originally ported
       it to Unix.

Web2C 7.5.4                      23 August 2004                      PATGEN(1)