PATGEN(1)                   General Commands Manual                  PATGEN(1)

NAME

       patgen - generate patterns for TeX hyphenation

SYNOPSIS

       patgen dictionary_file pattern_file patout_file translate_file

DESCRIPTION

       This manual page is not meant to be exhaustive. The complete
       documentation for this version of TeX can be found in the info file or
       manual Web2C: A TeX implementation.

       The patgen program reads the dictionary_file, which contains a list of
       hyphenated words, and the pattern_file, which contains previously
       generated patterns (if any) for a particular language, and produces
       the patout_file with the previously generated plus newly generated
       hyphenation patterns for that language. The translate_file defines
       language-specific values for the parameters left_hyphen_min and
       right_hyphen_min used by TeX's hyphenation algorithm, and the external
       representation of the lower and upper case version(s) of all `letters'
       of that language. Further details of the pattern generation process,
       such as hyphenation levels and pattern lengths, are requested
       interactively from the user's terminal. Optionally, patgen creates a
       new dictionary file pattmp.n showing the good and bad hyphens found by
       the generated patterns, where n is the highest hyphenation level.

       The patterns generated by patgen can be read by initex for use in
       hyphenating words. For a (very) long example of patgen's output, see
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns
       TeX uses for English. At some sites, patterns for several other
       languages may be available, and the local tex programs may have them
       preloaded; consult your Local Guide or your system administrator for
       details.

       All filenames must be complete; no default extensions are added and no
       path searching is done.

FILE FORMATS

       Letters
           When initex digests hyphenation patterns, TeX first expands
           macros, and the result must consist entirely of digits
           (hyphenation levels), dots (`.', edge of a word), and letters. In
           pattern files for non-English languages, letters are often
           represented by macros or other expandable constructs. For the
           purposes of patgen these are just character sequences, subject to
           the condition that no such sequence is a prefix of another one.

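           The prefix condition can be checked mechanically before running
           patgen. The following Python sketch is only an illustration; the
           function name find_prefix_conflicts and the sample
           representations are hypothetical and not part of patgen.

```python
def find_prefix_conflicts(representations):
    """Return (shorter, longer) pairs where one representation is a
    proper prefix of another; an empty list means the set is usable."""
    unique = sorted(set(representations))
    return [
        (a, b)
        for a in unique
        for b in unique
        if a != b and b.startswith(a)
    ]


if __name__ == "__main__":
    # Hypothetical external representations for a few letters.
    sample = ["a", "b", '\\"a', "\\ss", "\\s"]
    for shorter, longer in find_prefix_conflicts(sample):
        print(f"{shorter!r} is a prefix of {longer!r}")
```

           Here `\s' and `\ss' conflict, so one of them would have to be
           written differently in the translate file.
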
       Dictionary file
           A dictionary file contains a weighted list of hyphenated words,
           one word per line, starting in column 1. A digit in column 1 sets
           a global word weight (initially 1) that applies to all following
           words up to the next global word weight. A digit at some
           intercharacter position indicates a weight for that position only.

           The hyphens in a word are indicated by `-', `*', or `.' (or their
           replacements as defined in the translate file) for hyphens yet to
           be found, `good' hyphens (correctly found by the patterns), and
           `bad' hyphens (erroneously found by the patterns), respectively.
           When reading a dictionary file, `*' is treated like `-' and `.' is
           ignored.

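           As a rough illustration of these rules, the Python sketch below
           reads dictionary lines the way the description suggests. It is
           not patgen's own code. It assumes the default hyphen characters,
           single-character letters, and that a word may follow a global
           weight digit on the same line; the function name
           parse_dictionary is hypothetical.

```python
DIGITS = "0123456789"


def parse_dictionary(lines):
    """Yield (word, hyphen_positions, word_weight, position_weights).

    A position is the number of letters that precede it in the word.
    """
    word_weight = 1  # the global word weight is initially 1
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line[0] in DIGITS:
            # Digit in column 1: new global weight for the following words.
            word_weight = int(line[0])
            line = line[1:]  # assumption: a word may follow on this line
            if not line:
                continue
        letters = []
        hyphens = set()
        position_weights = {}
        for ch in line:
            pos = len(letters)
            if ch in "-*":      # '*' is treated like '-'
                hyphens.add(pos)
            elif ch == ".":     # 'bad' hyphens are ignored on input
                continue
            elif ch in DIGITS:  # weight for this position only
                position_weights[pos] = int(ch)
            else:
                letters.append(ch)
        yield "".join(letters), sorted(hyphens), word_weight, position_weights


if __name__ == "__main__":
    for entry in parse_dictionary(["hy-phen-a-tion", "3", "ta*ble", "re.c-ord"]):
        print(entry)
```

           Multi-character letters and replaced hyphen characters from the
           translate file would need a tokenizer in place of the
           per-character loop.
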
       Translate file
           A translate file starts with a line containing the value of
           left_hyphen_min in columns 1-2, the value of right_hyphen_min in
           columns 3-4, and, in columns 5, 6, and 7, either a blank or the
           replacement for the respective "hyphen" character `-', `*', and
           `.'. (Input lines are padded with blanks, as in many TeX-related
           programs.)

           Each following line defines one `letter': an arbitrary delimiter
           character in column 1, followed by one or more external
           representations of that letter (first the `lower' case one used
           for output), each one terminated by the delimiter and the whole
           sequence terminated by another delimiter.

           If the translate file is empty, the values left_hyphen_min=2,
           right_hyphen_min=3, and the 26 lower case letters a...z with their
           upper case representations A...Z are assumed.

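           The layout can be made concrete with a small reader. The Python
           sketch below follows the column rules as described above and is
           not taken from patgen. The function names are hypothetical, and
           it assumes that both numeric fields are present on the first line
           and that two trailing blanks are enough padding for a blank
           delimiter.

```python
import string

DEFAULT_HYPHEN_CHARS = "-*."


def parse_header(line):
    """Return (left_hyphen_min, right_hyphen_min, hyphen_chars)."""
    padded = line.rstrip("\n").ljust(7)  # pad with blanks to 7 columns
    left = int(padded[0:2])              # columns 1-2
    right = int(padded[2:4])             # columns 3-4
    # Columns 5-7: a blank keeps the default character for that role.
    chars = [c if c != " " else d
             for c, d in zip(padded[4:7], DEFAULT_HYPHEN_CHARS)]
    return left, right, chars


def parse_letter_line(line):
    """Return the external representations of one letter, lower case first."""
    padded = line.rstrip("\n") + "  "  # assumed padding (see lead-in)
    delim = padded[0]                  # delimiter is whatever is in column 1
    reps = []
    pos = 1
    while True:
        end = padded.find(delim, pos)
        if end == -1 or end == pos:    # unterminated, or the closing delimiter
            break
        reps.append(padded[pos:end])
        pos = end + 1
    return reps


def parse_translate(text):
    """Return (left, right, hyphen_chars, letters) for a translate file."""
    lines = text.splitlines()
    if not lines:  # empty file: the defaults stated in the description
        letters = [[lo, up] for lo, up in
                   zip(string.ascii_lowercase, string.ascii_uppercase)]
        return 2, 3, list(DEFAULT_HYPHEN_CHARS), letters
    left, right, chars = parse_header(lines[0])
    letters = [reps for reps in map(parse_letter_line, lines[1:]) if reps]
    return left, right, chars, letters


if __name__ == "__main__":
    print(parse_translate(" 2 2\n a A\n/b/B//\n"))
```

           The second letter line in the example uses `/' as its delimiter,
           which shows that each line may choose its own.
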
       Terminal input
           After reading the translate_file and any previously generated
           patterns from pattern_file, patgen requests input from the user's
           terminal.

           First, it asks for the integer values of hyph_start and
           hyph_finish, the lowest and highest hyphenation levels for which
           patterns are to be generated. The value of hyph_start should be
           larger than any hyphenation level already present in pattern_file.

           Then, for each hyphenation level, it asks for the integer values
           of pat_start and pat_finish, the smallest and largest pattern
           lengths to be analyzed, as well as good weight, bad weight, and
           threshold: the weights for good and bad hyphens and a weight
           threshold for useful patterns.

           Finally, it asks whether or not to produce a hyphenated word list
           (`y' or `Y' means yes; anything else means no).
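
           For unattended runs, the answers can be prepared in advance and
           fed to patgen on standard input. The Python sketch below only
           builds the answer text in the order described above. The function
           name build_answers is hypothetical, and the use of blanks and
           newlines as separators is an assumption that should be checked
           against the local patgen.

```python
def build_answers(hyph_start, hyph_finish, levels, make_word_list):
    """Build the text typed at patgen's terminal prompts.

    levels holds one (pat_start, pat_finish, good, bad, threshold)
    tuple per hyphenation level from hyph_start to hyph_finish.
    """
    expected = hyph_finish - hyph_start + 1
    if len(levels) != expected:
        raise ValueError(f"need {expected} level entries, got {len(levels)}")
    answers = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good, bad, threshold in levels:
        answers.append(f"{pat_start} {pat_finish}")
        answers.append(f"{good} {bad} {threshold}")
    # 'y' or 'Y' requests the hyphenated word list; anything else declines.
    answers.append("y" if make_word_list else "n")
    return "\n".join(answers) + "\n"


if __name__ == "__main__":
    # Hypothetical parameters for two levels; real values are language-specific.
    print(build_answers(1, 2, [(2, 4, 1, 4, 1), (2, 5, 1, 2, 1)], True), end="")
```

           The resulting text could be saved to a file and redirected into
           patgen, with the four file names given on the command line as
           shown in the SYNOPSIS.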

FILES

       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
           Patterns for English.

SEE ALSO

       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford
       University Ph.D. thesis, 1983.
       Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN
       0-201-13447-0, Appendix H.

AUTHORS

       Frank Liang wrote the first version of this program. Peter
       Breitenlohner made a substantial revision in 1991 for TeX 3. The first
       version was published as the appendix to the TeXware technical report,
       available from the TeX Users Group. Howard Trickey originally ported
       it to Unix.

Web2C 7.5.4                     23 August 2004                       PATGEN(1)