PATGEN(1)                General Commands Manual                PATGEN(1)

NAME
patgen - generate patterns for TeX hyphenation

SYNOPSIS
patgen dictionary_file pattern_file patout_file translate_file

DESCRIPTION
This manual page is not meant to be exhaustive. See also the Info file
or the manual Web2C: A TeX implementation, available as part of the TeX
Live distribution or at http://tug.org/web2c.

The patgen program reads two inputs for a particular language: the
dictionary_file, containing a list of hyphenated words, and the
pattern_file, containing previously generated patterns, if any (not a
complete TeX source file; see below). It produces the patout_file,
which contains the previously generated plus the newly generated
hyphenation patterns for that language. The translate_file defines
language-specific values for the parameters left_hyphen_min and
right_hyphen_min used by TeX's hyphenation algorithm, as well as the
external representation of the lower and upper case version(s) of all
`letters' of that language. Further details of the pattern generation
process, such as hyphenation levels and pattern lengths, are requested
interactively from the user's terminal. Optionally, patgen creates a
new dictionary file pattmp.n showing the good and bad hyphens found by
the generated patterns, where n is the highest hyphenation level.

The patterns generated by patgen can be read by initex for use in
hyphenating words. For a real-life example of patgen's output, see
$TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns
TeX uses for English by default. At some sites, patterns for (many)
other languages may be available, and the local tex programs may have
them preloaded.

All filenames must be complete; no default extensions are added and no
path searching is done.

FILE FORMATS
Letters
When initex digests hyphenation patterns, TeX first expands macros,
and the result must consist entirely of digits (hyphenation levels),
dots (`.', edge of a word), and letters. In pattern files for
non-English languages, letters are often represented by macros or
other expandable constructs. For the purposes of patgen these are just
character sequences, subject to the condition that no such sequence is
a prefix of another one.

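The prefix condition can be checked mechanically. The following Python
sketch is not part of patgen; the function name prefix_conflicts and
the sample inputs are hypothetical. It lists every pair of letter
representations in which one sequence is a prefix of the other, so an
empty result means the set is acceptable in this respect.

```python
def prefix_conflicts(representations):
    """Return (shorter, longer) pairs where one sequence is a prefix of another."""
    reps = sorted(set(representations))
    return [
        (a, b)
        for a in reps
        for b in reps
        if a != b and b.startswith(a)
    ]


# Illustrative inputs only; real representations come from a translate file.
print(prefix_conflicts(["a", "b", '"a']))   # no sequence is a prefix of another
print(prefix_conflicts(["s", "ss", "t"]))   # "s" is a prefix of "ss"
```
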
Dictionary file
A dictionary file contains a weighted list of hyphenated words, one
word per line starting in column 1. A digit in column 1 indicates a
global word weight (initially 1) applicable to all following words up
to the next global word weight. A digit at some intercharacter
position indicates a weight for that position only.

The hyphens in a word are indicated by `-', `*', or `.' (or their
replacements as defined in the translate file): `-' marks hyphens yet
to be found, `*' marks `good' hyphens (correctly found by the
patterns), and `.' marks `bad' hyphens (erroneously found by the
patterns). When reading a dictionary file, `*' is treated like `-' and
`.' is ignored.

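As an illustration of these rules, the Python sketch below reads one
dictionary line. It is a simplified reading of the description above,
not patgen's own code: it assumes the default hyphen characters,
letters that are single characters, and that a digit in column 1
already applies to the word on the same line. The name parse_entry and
the sample words are hypothetical.

```python
def parse_entry(line, global_weight=1):
    """Parse one dictionary line.

    Returns (word, hyphens, weights, global_weight), where hyphens is the
    set of positions i such that a hyphen follows the i-th letter, and
    weights maps such positions to a weight given for that position only.
    """
    letters = []
    hyphens = set()
    weights = {}
    for col, ch in enumerate(line.rstrip("\n")):
        if ch in "0123456789":
            if col == 0:
                global_weight = int(ch)          # digit in column 1
            else:
                weights[len(letters)] = int(ch)  # weight for this position
        elif ch in "-*":
            hyphens.add(len(letters))            # `*' is treated like `-'
        elif ch == ".":
            continue                             # `.' is ignored on input
        else:
            letters.append(ch)
    return "".join(letters), hyphens, weights, global_weight


print(parse_entry("hy-phen-a-tion"))
print(parse_entry("3ta*ble"))
```

The returned global weight can be passed to the next call so that it
carries over to the following words.
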
Pattern file
A pattern file contains only patterns in the format above, e.g., from
a previous run of patgen. It must not contain any TeX comments or
control sequences. For instance, this is not a valid pattern file:

    % this is a pattern file read by TeX.
    \patterns{%
    ...
    }

It can only contain the actual patterns, i.e., the `...'.

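One way to obtain the bare patterns from a TeX source like the invalid
example above is to strip the comments and the \patterns{...} wrapper.
The Python sketch below does this under stated assumptions: the source
contains a single \patterns group with no nested braces and no other
control sequences inside it. The name extract_patterns and the sample
patterns are illustrative only.

```python
import re


def extract_patterns(tex_source):
    """Return the patterns inside a single \\patterns{...} group as a list."""
    # Drop TeX comments: an unescaped % up to the end of its line.
    no_comments = re.sub(r"(?<!\\)%.*", "", tex_source)
    match = re.search(r"\\patterns\s*\{([^{}]*)\}", no_comments)
    if match is None:
        raise ValueError("no \\patterns{...} group found")
    return match.group(1).split()


sample = "% a comment\n\\patterns{%\n.ach4\n.ad4der\n}\n"
print("\n".join(extract_patterns(sample)))
```

Writing the result one pattern per line is a choice made here for
readability.
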
Translate file
A translate file starts with a line containing the value of
left_hyphen_min in columns 1-2, the value of right_hyphen_min in
columns 3-4, and, in columns 5, 6, and 7, either a blank or the
replacement for the corresponding "hyphen" character `-', `*', or `.'.
(Input lines are padded with blanks, as in many TeX-related programs.)

Each following line defines one `letter': an arbitrary delimiter
character in column 1, followed by one or more external
representations of that letter (first the `lower' case one, which is
used for output), each one terminated by the delimiter, and the whole
sequence terminated by another delimiter.

If the translate file is empty, the values left_hyphen_min=2,
right_hyphen_min=3, and the 26 lower case letters a...z with their
upper case representations A...Z are assumed.

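The layout described above can be produced by a short script. The
Python sketch below writes the first line and one line per letter,
following the column and delimiter rules as stated; the function name
translate_lines, its parameters, and the sample letters are
hypothetical, and right-aligning the two numbers in their columns is
an assumption.

```python
def translate_lines(left_min, right_min, letters, hyphen_chars="   ", delim=" "):
    """Yield the lines of a translate file.

    letters: iterable of tuples of external representations, lower case
    form first, e.g. ("a", "A").
    hyphen_chars: three characters for columns 5, 6, and 7 (a blank
    keeps the default `-', `*', or `.').
    """
    if len(hyphen_chars) != 3:
        raise ValueError("hyphen_chars must have exactly three characters")
    # Columns 1-2 and 3-4 hold the two values, right-aligned (assumed).
    yield f"{left_min:2d}{right_min:2d}{hyphen_chars}"
    for forms in letters:
        if any(delim in form for form in forms):
            raise ValueError("delimiter occurs inside a representation")
        # Delimiter, each form terminated by the delimiter, one more delimiter.
        yield delim + "".join(form + delim for form in forms) + delim


for line in translate_lines(2, 3, [("a", "A"), ("b", "B")]):
    print(repr(line))
```
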
Terminal input
After reading the translate_file and any previously generated patterns
from pattern_file, patgen requests input from the user's terminal.

First, it asks for the integer values of hyph_start and hyph_finish,
the lowest and highest hyphenation levels for which patterns are to be
generated. The value of hyph_start should be larger than any
hyphenation level already present in pattern_file.

Then, for each hyphenation level, it asks for the integer values of
pat_start and pat_finish, the smallest and largest pattern lengths to
be analyzed, as well as good weight, bad weight, and threshold: the
weights for good and bad hyphens and a weight threshold for useful
patterns.

Finally, it asks whether or not to produce a hyphenated word list (`y'
or `Y' for yes, anything else for no).

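Because these answers are read from the terminal, a run can be
scripted by supplying them on standard input. The Python sketch below
builds the replies in the order described above and passes them to
patgen. It assumes that each prompt accepts its integers separated by
blanks on one line and that each level takes two reply lines; the
function names and all parameter values are hypothetical.

```python
import subprocess


def terminal_answers(hyph_start, hyph_finish, levels, word_list=False):
    """Build the text patgen reads from the terminal, one reply per line.

    levels: one (pat_start, pat_finish, good_wt, bad_wt, threshold)
    tuple for each hyphenation level from hyph_start to hyph_finish.
    """
    if len(levels) != hyph_finish - hyph_start + 1:
        raise ValueError("need exactly one parameter tuple per level")
    replies = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good_wt, bad_wt, threshold in levels:
        replies.append(f"{pat_start} {pat_finish}")
        replies.append(f"{good_wt} {bad_wt} {threshold}")
    replies.append("y" if word_list else "n")
    return "\n".join(replies) + "\n"


def run_patgen(dictionary, patterns, patout, translate, answers):
    """Run patgen with the four file arguments from the synopsis."""
    return subprocess.run(
        ["patgen", dictionary, patterns, patout, translate],
        input=answers, text=True, check=True,
    )
```

For example, terminal_answers(1, 1, [(2, 3, 1, 1, 1)], word_list=True)
describes a single level, and its result can be passed to run_patgen
together with four complete file names.
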
FILES
$TEXMFMAIN/tex/generic/hyphen/hyphen.tex
The original hyphenation patterns for English, by Donald Knuth and
Frank Liang.

http://www.ctan.org/pkg/ushyph
Additional hyphenation patterns for English, extended by Gerard
Kuiken.

http://www.ctan.org/pkg/hyph-utf8
Collected hyphenation patterns for many languages in many formats.

http://www.ctan.org/tex-archive/language/
General CTAN directory for patterns and support for many other
languages.

SEE ALSO
Frank Liang and Peter Breitenlohner, patgen.web.

Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford
University Ph.D. thesis, 1983, http://tug.org/docs/liang.

Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0,
Appendix H.

AUTHORS
Frank Liang wrote the first version of this program. Peter
Breitenlohner made a substantial revision in 1991 for TeX 3. The first
version was published as the appendix to the TeXware technical report.
Howard Trickey originally ported it to Unix.

Web2C 2023                     16 June 2015                     PATGEN(1)