PATGEN(1)                   General Commands Manual                  PATGEN(1)

NAME

       patgen - generate patterns for TeX hyphenation

SYNOPSIS

       patgen dictionary_file pattern_file patout_file translate_file

DESCRIPTION

       This manual page is not meant to be exhaustive.  See also the Info file
       or the manual Web2C: A TeX implementation, available as part of the TeX
       Live distribution or at http://tug.org/web2c.

       The patgen program reads the dictionary_file, which contains a list of
       hyphenated words, and the pattern_file, which contains previously
       generated patterns (if any) for a particular language (not a complete
       TeX source file; see below).  It produces the patout_file with the
       previously generated plus newly generated hyphenation patterns for that
       language.  The translate_file defines language-specific values for the
       parameters left_hyphen_min and right_hyphen_min used by TeX's
       hyphenation algorithm, and the external representation of the lower and
       upper case version(s) of all `letters' of that language.  Further
       details of the pattern generation process, such as hyphenation levels
       and pattern lengths, are requested interactively from the user's
       terminal.  Optionally, patgen creates a new dictionary file pattmp.n
       showing the good and bad hyphens found by the generated patterns, where
       n is the highest hyphenation level.

       The patterns generated by patgen can be read by initex for use in
       hyphenating words.  For a real-life example of patgen's output, see
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns
       TeX uses for English by default.  At some sites, patterns for (many)
       other languages may be available, and the local tex programs may have
       them preloaded.

       All filenames must be complete; no default extensions are added and no
       path searching is done.

FILE FORMATS

       Letters
           When initex digests hyphenation patterns, TeX first expands macros,
           and the result must consist entirely of digits (hyphenation
           levels), dots (`.', edge of a word), and letters.  In pattern files
           for non-English languages, letters are often represented by macros
           or other expandable constructs.  For the purposes of patgen these
           are just character sequences, subject to the condition that no such
           sequence is a prefix of another one.
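
           The prefix condition can be checked mechanically.  The following
           Python sketch is only an illustration and not part of patgen; the
           function name find_prefix_conflicts and the sample
           representations in the usage note are hypothetical.

```python
def find_prefix_conflicts(representations):
    """Return (shorter, longer) pairs where `shorter` is a prefix of `longer`.

    `representations` is any iterable of the character sequences used as
    letters.  Duplicates are collapsed before the comparison.
    """
    reps = sorted(set(representations))
    return [
        (shorter, longer)
        for shorter in reps
        for longer in reps
        if shorter != longer and longer.startswith(shorter)
    ]
```

           For example, the hypothetical set a, b, "a, "o yields no
           conflicts, whereas s, ss, t yields the pair (s, ss), because s
           is a prefix of ss.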

       Dictionary file
           A dictionary file contains a weighted list of hyphenated words, one
           word per line starting in column 1.  A digit in column 1 indicates
           a global word weight (initially 1) applicable to all following
           words up to the next global word weight.  A digit at some
           intercharacter position indicates a weight for that position only.

           The hyphens in a word are indicated by `-', `*', or `.' (or their
           replacements as defined in the translate file) for hyphens yet to
           be found, `good' hyphens (correctly found by the patterns), and
           `bad' hyphens (erroneously found by the patterns), respectively.
           When reading a dictionary file, `*' is treated like `-' and `.' is
           ignored.
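
           The reading rule for hyphen markers can be sketched in Python as
           follows.  This is a simplified illustration, not patgen's actual
           code: the function name parse_dictionary_word is hypothetical, the
           default markers are assumed, letters are assumed to be single
           characters, and weight digits are skipped rather than modelled.

```python
def parse_dictionary_word(line, hyphen_chars="-*."):
    """Split one dictionary line into its letters and its hyphen positions.

    hyphen_chars holds the three markers in the order: hyphen yet to be
    found, good hyphen, bad hyphen.  A position p in the result means
    "a hyphen may follow the first p letters".
    """
    to_find, good, bad = hyphen_chars
    letters = []
    hyphens = []
    for ch in line.rstrip("\n"):
        if ch.isdigit():
            continue  # word or position weight: not modelled in this sketch
        if ch in (to_find, good):
            hyphens.append(len(letters))  # a good hyphen is read like '-'
        elif ch == bad:
            continue  # a bad hyphen is ignored on input
        else:
            letters.append(ch)
    return "".join(letters), hyphens
```

           With the default markers, the line hy-phen*a.tion is read as the
           word hyphenation with hyphens after the second and sixth letters;
           the `.' before tion is dropped.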

       Pattern file
           A pattern file contains only patterns in the format above, e.g.,
           from a previous run of patgen.  It may not contain any TeX comments
           or control sequences.  For instance, this is not a valid pattern
           file:

           % this is a pattern file read by TeX.
           \patterns{%
            ...
           }

           It can only contain the actual patterns, i.e., the `...'.

       Translate file
           A translate file starts with a line containing the value of
           left_hyphen_min in columns 1-2, the value of right_hyphen_min in
           columns 3-4, and, in each of columns 5, 6, and 7, either a blank or
           the replacement for the corresponding "hyphen" character `-', `*',
           or `.'.  (Input lines are padded with blanks, as in many
           TeX-related programs.)
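
           The column layout of this first line can be sketched in Python.
           This is an illustration under assumptions, not patgen's own
           parser: the function name parse_translate_header is hypothetical,
           and what patgen does with blank or invalid numbers is not
           modelled (the sketch simply raises ValueError).

```python
def parse_translate_header(line):
    """Parse the first line of a translate file.

    Returns (left_hyphen_min, right_hyphen_min, markers), where markers
    is a 3-tuple replacing '-', '*', and '.' in that order.
    """
    padded = line.rstrip("\n").ljust(7)   # short lines are padded with blanks
    left_hyphen_min = int(padded[0:2])    # columns 1-2
    right_hyphen_min = int(padded[2:4])   # columns 3-4
    defaults = "-*."
    # Columns 5, 6, and 7: a blank keeps the default marker for that slot.
    markers = tuple(
        default if ch == " " else ch
        for ch, default in zip(padded[4:7], defaults)
    )
    return left_hyphen_min, right_hyphen_min, markers
```

           For instance, a first line consisting of " 2 3" gives the values
           2 and 3 with the default markers, while " 1 1=" replaces only the
           first marker `-' by `='.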

           Each following line defines one `letter': an arbitrary delimiter
           character in column 1, followed by one or more external
           representations of that letter (first the `lower' case one used
           for output), each one terminated by the delimiter, and the whole
           sequence terminated by another delimiter.
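
           A letter line can be split along the same lines.  The Python
           sketch below is illustrative only; parse_letter_line is a
           hypothetical name, and the sketch is more lenient than the format
           described above in that it also accepts a line whose final
           delimiters are missing.

```python
def parse_letter_line(line):
    """Return the external representations defined by one letter line.

    The first entry is the `lower' case representation used for output.
    """
    line = line.rstrip("\n")
    if not line:
        raise ValueError("a letter line needs a delimiter in column 1")
    delimiter, rest = line[0], line[1:]
    representations = []
    for field in rest.split(delimiter):
        if field == "":
            break  # two delimiters in a row terminate the whole sequence
        representations.append(field)
    if not representations:
        raise ValueError("a letter line needs at least one representation")
    return representations
```

           With `/' as the delimiter, the line /a/A// defines the letter a
           with upper case representation A.  With a blank as the delimiter,
           the line " a A" followed by blank padding defines the same letter.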

           If the translate file is empty, the values left_hyphen_min=2 and
           right_hyphen_min=3 and the 26 lower case letters a...z with their
           upper case representations A...Z are assumed.

       Terminal input
           After reading the translate_file and any previously generated
           patterns from pattern_file, patgen requests input from the user's
           terminal.

           First, it asks for the integer values of hyph_start and
           hyph_finish, the lowest and highest hyphenation levels for which
           patterns are to be generated.  The value of hyph_start should be
           larger than any hyphenation level already present in pattern_file.

           Then, for each hyphenation level, it asks for the integer values of
           pat_start and pat_finish, the smallest and largest pattern lengths
           to be analyzed, as well as good weight, bad weight, and threshold:
           the weights for good and bad hyphens and a weight threshold for
           useful patterns.

           Finally, it asks whether or not to produce a hyphenated word list
           (`y' or `Y' for yes, anything else for no).
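
           For unattended runs, the answers can be prepared in advance.  The
           Python sketch below only builds the text to be supplied on
           patgen's standard input; it does not run patgen.  It assumes that
           each prompt accepts whitespace-separated integers on one line,
           which this page does not state.  The function name
           build_patgen_answers is hypothetical.

```python
def build_patgen_answers(hyph_start, hyph_finish, levels, write_word_list=False):
    """Build the terminal answers for one patgen run as a single string.

    levels holds one tuple per hyphenation level, in order:
    (pat_start, pat_finish, good_weight, bad_weight, threshold).
    """
    expected = hyph_finish - hyph_start + 1
    if len(levels) != expected:
        raise ValueError(f"expected {expected} level tuples, got {len(levels)}")
    lines = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good, bad, threshold in levels:
        lines.append(f"{pat_start} {pat_finish}")   # pattern length range
        lines.append(f"{good} {bad} {threshold}")   # weights and threshold
    lines.append("y" if write_word_list else "n")   # hyphenated word list?
    return "\n".join(lines) + "\n"
```

           The numbers passed in are chosen by the user for each language;
           any values shown in a call are placeholders, not recommended
           settings.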

FILES

       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
           The original hyphenation patterns for English, by Donald Knuth and
           Frank Liang.

       http://www.ctan.org/pkg/ushyph
           Additional hyphenation patterns for English, extended by Gerard
           Kuiken.

       http://www.ctan.org/pkg/hyph-utf8
           Collected hyphenation patterns for many languages in many formats.

       http://www.ctan.org/tex-archive/language/
           General CTAN directory for patterns and support for many other
           languages.

SEE ALSO

       Frank Liang and Peter Breitenlohner, patgen.web.

       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford
       University Ph.D. thesis, 1983, http://tug.org/docs/liang.

       Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0,
       Appendix H.

AUTHORS

       Frank Liang wrote the first version of this program.  Peter
       Breitenlohner made a substantial revision in 1991 for TeX 3.  The first
       version was published as the appendix to the TeXware technical report.
       Howard Trickey originally ported it to Unix.

Web2C 2021                       16 June 2015                        PATGEN(1)