Lingua::EN::Alphabet::Shaw(3)  User Contributed Perl Documentation  Lingua::EN::Alphabet::Shaw(3)

NAME
Lingua::EN::Alphabet::Shaw - transliterate the Latin alphabet to the Shavian alphabet

AUTHOR
Thomas Thurman <tthurman@gnome.org>

SYNOPSIS
    use Lingua::EN::Alphabet::Shaw;

    my $shaw = Lingua::EN::Alphabet::Shaw->new();
    print $shaw->transliterate('I live near a live wire.');

DESCRIPTION
The Shaw, or Shavian, alphabet was commissioned under the will of the
playwright George Bernard Shaw in the early 1960s as a replacement for
the Latin alphabet for representing English. It is designed to have a
one-to-one phonemic (not phonetic) mapping to the sounds of English.

Its ISO 15924 code is "Shaw" (numeric code 281).

This module transliterates English text from the Latin alphabet into
the Shavian alphabet.

The API has changed since version 0.03 to be object-based.

If you find an error in the translation database, you can correct it
yourself at http://shavian.org.uk/wiki/ . You may download a current
copy of the dataset at http://shavian.org.uk/set/ . If you want to
override the database shipped with this module, place the new copy at
~/.cache/shavian/shavian-set.sqlite and it will be used in preference.
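
As a minimal sketch, the check below reports whether an override copy exists at the path named above. It only builds the path and tests for the file; it does not call into this module. Expanding "~" from the HOME environment variable is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Assumption: "~" is expanded from the HOME environment variable.
my $home = defined $ENV{HOME} ? $ENV{HOME} : '';

# Path components taken from the paragraph above.
my $override = File::Spec->catfile($home, '.cache', 'shavian', 'shavian-set.sqlite');

if (-e $override) {
    print "Override database present: $override\n";
}
else {
    print "No override database at $override\n";
}
```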

METHODS
Lingua::EN::Alphabet::Shaw->new()
Constructor. Currently takes no arguments.

$shaw->transliterate($phrase)
Returns the transliteration of the given phrase into the Shavian
alphabet. Can handle multi-word phrases. Does a reasonable job of
resolving homonym ambiguity ("does he like does?").

If you pass multiple arguments, the results will be concatenated, and
only the odd-numbered arguments (the first, third, and so on) will be
transliterated. The state of homonym resolution is maintained across
arguments. This allows you to embed chunks of text which should not be
transliterated into the line, such as XML tags.
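
The multi-argument form can be sketched as follows. The idea is to split marked-up text so that plain text lands in the odd-numbered positions and tags in the even-numbered ones. The simple tag regex is an assumption made for illustration and is not a real XML parser; the call into the module is wrapped in eval so the sketch still runs where the module is not installed.

```perl
use strict;
use warnings;

my $xml = '<p>I live near a <b>live</b> wire.</p>';

# A capturing group in split keeps the separators, so the list alternates
# text, tag, text, tag, ... The leading empty string keeps the text chunks
# in the odd-numbered positions even though the input starts with a tag.
my @parts = split /(<[^>]*>)/, $xml;

my $ok = eval {
    require Lingua::EN::Alphabet::Shaw;
    my $shaw = Lingua::EN::Alphabet::Shaw->new();
    # Odd-numbered arguments are transliterated; the tags pass through.
    print $shaw->transliterate(@parts), "\n";
    1;
};
print "Lingua::EN::Alphabet::Shaw not available; skipped the call\n" unless $ok;
```

Passing the pieces in one call, rather than calling transliterate() once per text chunk, is what lets the homonym resolution state carry across the tags.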

$shaw->unknown_handler([$handler])
If a word is not found in the translation database, the transliteration
routines will call a handler to find out what to do, passing the
unknown word as both its first and second arguments. (This is to
allow later expansion; see BUGS AND ISSUES, below.) The handler
should return a string, which will be inserted into the result of
the transliteration routine at the correct place.

This method allows you to set a new handler by passing it as an
argument. If you pass no argument, this method returns the current
handler.

The default handler simply returns its argument. A replacement handler
could, for example, make an attempt at guessing the transliteration; it
could die, to abort the transliteration process; or it could return its
argument but also store the word in a table so that a list of
missing words could later be reported to the user.
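
A minimal sketch of the last idea, a handler that returns the word unchanged but records it for a later report. The names %missing and $record_unknown, and the made-up word in the sample sentence, are hypothetical; the call into the module is wrapped in eval so the sketch still runs where the module is not installed.

```perl
use strict;
use warnings;

my %missing;    # unknown word => number of times it was seen

# The handler receives the unknown word (as both arguments, per the text
# above) and returns the string to insert into the output.
my $record_unknown = sub {
    my ($word) = @_;
    $missing{$word}++;
    return $word;    # same result as the default handler
};

my $ok = eval {
    require Lingua::EN::Alphabet::Shaw;
    my $shaw = Lingua::EN::Alphabet::Shaw->new();
    $shaw->unknown_handler($record_unknown);
    print $shaw->transliterate('I live near a zyxwv wire.'), "\n";
    1;
};
print "Lingua::EN::Alphabet::Shaw not available; skipped the call\n" unless $ok;

# Report whatever the handler collected.
print "Missing: $_\n" for sort keys %missing;
```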

$shaw->mapping($phrase)
There is a quasi-standard mapping of the conventional alphabet onto the
Shavian alphabet. This method maps Shavian text into the conventional
alphabet and vice versa. It does not transliterate. Think of this as a
kind of ASCII-armouring.

Various versions of the standard map the naming dot to "G", "B", and
"/". This method does not support "/", but maps both "G" and "B" to
the naming dot; in reverse, it maps the naming dot to "G".

The letters "K" and "L" have no mapping to Shavian letters, and are
left alone.

$shaw->normalise($shavian_text)
Certain letters in the Shavian alphabet are ligatures of pairs of other
letters, so those pairs should not appear as separate letters.
(For example, the letter YEW is a ligature of YEA and OOZE.) This
method replaces these pairs with their ligature equivalents.
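
To illustrate the kind of replacement involved, here is a standalone sketch covering only the YEA + OOZE example given above. It is not the module's implementation and handles no other pairs. The code points are taken from the Unicode Shavian block rather than from this document, and the function name join_yew is hypothetical.

```perl
use strict;
use warnings;

# Code points from the Unicode Shavian block (an assumption of this sketch).
my $YEA  = "\x{10458}";
my $OOZE = "\x{10475}";
my $YEW  = "\x{1047F}";

# Replace each YEA + OOZE pair with the single ligature letter YEW.
sub join_yew {
    my ($text) = @_;
    $text =~ s/\Q$YEA$OOZE\E/$YEW/g;
    return $text;
}

my $joined = join_yew($YEA . $OOZE);

# Print code points rather than raw characters, so no output encoding
# layer is needed. Prints "U+1047F".
printf "U+%04X\n", ord $_ for split //, $joined;
```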

$shaw->transliterate_html($html)
Given a block of text in the conventional alphabet which is formatted
as HTML, this will make a reasonable attempt at returning the same text
transliterated into the Shavian alphabet. It is aware of which tags
commonly break the flow of sentences, and handles homonym resolution
accordingly.

BUGS AND ISSUES
There should be a version of the main transliteration method that
returns a list of hashes, each of which gives the source and
destination forms of a word, part of speech and disambiguation
information, and a marking of the source (CMUdict or Shavian Wiki).

It should probably be possible to transliterate in reverse, from
Shavian to the conventional alphabet.

It should be possible to handle other alternative scripts, such as
Deseret and Tengwar. This shouldn't be very difficult. It would also
allow representation in the IPA, which would mean this module could be
used for simple text-to-speech processing.

The portion of the database which is taken from CMUdict exhibits
unhelpful mergers (notably father/bother). There isn't much that can
be done about this except extending the Shavian Wiki further. In
addition, in some cases it does not use the letters ARRAY and ADO in
unstressed syllables as it should; this could and should be fixed.

It would be useful on initialisation to read a text file in a standard
location, giving a local mapping that overrides the database for given
words.

It would be helpful if there were a callback for any words found from
the CMUdict data rather than from the Shavian Wiki data, so that the
wiki could be updated.

The HTML transliterator should mark its output as being encoded in
UTF-8, whatever the source encoding. (Shavian cannot be represented in
any other standard encoding.)

The HTML transliterator should have an option to put a span around
each word, with a title giving the word's spelling in the conventional
alphabet, in the manner of translate.google.com.

The HTML transliterator should have an option to rewrite the
destinations of links, and to add a target to them, so that it can be
used by a web script to link back to itself.

The HTML transliterator should add a "generator" META tag referencing
itself, if one is not already present.

The HTML transliterator should ignore sections marked as being written
in non-English languages.

The HTML transliterator should have an option to allow loading
documents in chunks, as "HTML::Parser" already does.

The mapping() method should have an extra parameter to cause it to map
in one direction only.

Most of these will be implemented before this module reaches version
1.00.

FONTS
You will need a Shavian Unicode font to use this module. There are
several such fonts at http://marnanel.org/shavian/fonts/ . Please be
sure to get a Unicode font and not one with the "Latin mapping".

However, the Mac can handle the Shavian alphabet out of the box.

COPYRIGHT
This Perl module is copyright (C) Thomas Thurman, 2009-2010. This is
free software, and can be used/modified under the same terms as Perl
itself.

The transliteration data is available under various free licences,
which are reproduced below.

Androcles and the Lion
Part of the transliteration data was taken from the 1962 Shavian
alphabet edition of "Androcles and the Lion"; this data is in the
public domain.

Shavian Wiki
Part of the transliteration data was taken from the Shavian Wiki, and
this is available under the Creative Commons cc-by-sa licence.

CMUdict
Another part of the transliteration data was taken from CMUdict. Its
licence is reproduced below.

Copyright (C) 1993-2008 Carnegie Mellon University. All rights
reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
The contents of this file are deemed to be source code.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.

This work was supported in part by funding from the Defense Advanced
Research Projects Agency, the Office of Naval Research and the National
Science Foundation of the United States of America, and by member
companies of the Carnegie Mellon Sphinx Speech Consortium. We
acknowledge the contributions of many volunteers to the expansion and
improvement of this dictionary.

THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND
ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Brown tagger
The part-of-speech data was taken from the Brown tagger (although the
tagger built into this module is not the Brown tagger, so its first
sentence is inaccurate). Its licence is also reproduced below:

This software was written by Eric Brill.

This software is being provided to you, the LICENSEE, by the
Massachusetts Institute of Technology (M.I.T.) under the following
license. By obtaining, using and/or copying this software, you agree
that you have read, understood, and will comply with these terms and
conditions:

Permission to [use, copy, modify and distribute, including the right to
grant others rights to distribute at any tier, this software and its
documentation for any purpose and without fee or royalty] is hereby
granted, provided that you agree to comply with the following copyright
notice and statements, including the disclaimer, and that the same
appear on ALL copies of the software and documentation, including
modifications that you make for internal use or for distribution:

Copyright 1993 by the Massachusetts Institute of Technology and the
University of Pennsylvania. All rights reserved.

THIS SOFTWARE IS PROVIDED "AS IS", AND M.I.T. MAKES NO REPRESENTATIONS
OR WARRANTIES, EXPRESS OR IMPLIED. By way of example, but not
limitation, M.I.T. MAKES NO REPRESENTATIONS OR WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE
OF THE LICENSED SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD
PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The name of the Massachusetts Institute of Technology or M.I.T. may NOT
be used in advertising or publicity pertaining to distribution of the
software. Title to copyright in this software and any associated
documentation shall at all times remain with M.I.T., and USER agrees to
preserve same.

perl v5.32.0                      2020-07-28      Lingua::EN::Alphabet::Shaw(3)