Lingua::EN::Alphabet::Shaw(3)  User Contributed Perl Documentation  Lingua::EN::Alphabet::Shaw(3)

NAME

       Lingua::EN::Alphabet::Shaw - transliterate from the Latin to the Shavian
       alphabet


AUTHOR

       Thomas Thurman <tthurman@gnome.org>


SYNOPSIS

         use Lingua::EN::Alphabet::Shaw;

         my $shaw = Lingua::EN::Alphabet::Shaw->new();
         print $shaw->transliterate('I live near a live wire.');

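       Shavian letters are encoded at U+10450 to U+1047F, outside Unicode's
       Basic Multilingual Plane, so each one takes four bytes in UTF-8.
       Assuming that transliterate() returns a Perl character string (this
       page does not say so explicitly), printing its result to a handle
       with no encoding layer produces "Wide character in print" warnings.
       A minimal sketch of preparing STDOUT first; the single letter YEW
       stands in for real transliterator output, and the module calls are
       left in comments so the sketch runs without the module installed:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode everything printed to STDOUT as UTF-8, so that Shavian
# characters do not trigger "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';

# U+1047F SHAVIAN LETTER YEW, standing in for transliterator output.
my $shavian = "\x{1047F}";

# Letters outside the Basic Multilingual Plane take four bytes in UTF-8.
my $bytes = encode('UTF-8', $shavian);
printf "%d character(s), %d bytes\n", length($shavian), length($bytes);
# prints: 1 character(s), 4 bytes

# With the module installed, the same layer covers its output:
#   use Lingua::EN::Alphabet::Shaw;
#   my $shaw = Lingua::EN::Alphabet::Shaw->new();
#   print $shaw->transliterate('I live near a live wire.'), "\n";
```

       The same applies to files: open them with '>:encoding(UTF-8)'.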

DESCRIPTION

       The Shaw or Shavian alphabet was commissioned under the will of the
       playwright George Bernard Shaw, and first appeared in print in the
       early 1960s, as a replacement for the Latin alphabet for writing
       English.  It is designed to have a one-to-one phonemic (not phonetic)
       mapping with the sounds of English.

       Its ISO 15924 code is "Shaw" (numeric code 281).

       This module transliterates English text from the Latin alphabet into
       the Shavian alphabet.

       The API is now object-based; it was changed after version 0.03.

       If you find an error in the translation database, you can correct it
       yourself at http://shavian.org.uk/wiki/ .  You may download a current
       copy of the dataset at http://shavian.org.uk/set/ .  To override the
       database shipped with this module, place the new copy at
       ~/.cache/shavian/shavian-set.sqlite; it will then be used in
       preference to the shipped copy.

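       A sketch of checking whether an override database is in place before
       constructing the transliterator.  It only builds the path given
       above; it is not the module's own lookup code, it assumes that "~"
       stands for the directory named by the HOME environment variable, and
       the function name override_database_path is hypothetical:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: builds the override location named on this page,
# ~/.cache/shavian/shavian-set.sqlite, assuming "~" is $ENV{HOME}.
sub override_database_path {
    my $home = $ENV{HOME} // '';
    return File::Spec->catfile($home, '.cache', 'shavian',
                               'shavian-set.sqlite');
}

my $path = override_database_path();
if (-e $path) {
    print "Override database found: $path\n";
}
else {
    print "No override database; the copy shipped with the module is used.\n";
}
```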

METHODS

   Lingua::EN::Alphabet::Shaw->new()
       Constructor.  Currently takes no arguments.

   $shaw->transliterate($phrase)
       Returns the transliteration of the given phrase into the Shavian
       alphabet.  It can handle multi-word phrases, and does a reasonable job
       of resolving homonym ambiguity ("does he like does?").

       If you pass multiple arguments, the results are concatenated, and only
       the odd-numbered arguments (the first, third, fifth, and so on) are
       transliterated; the others are copied through unchanged.  The state
       of homonym resolution is maintained across arguments.  This allows
       you to embed chunks of text which should not be transliterated, such
       as XML tags, in the line.

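       One way to build such an argument list from a line of marked-up text
       is to split it on tags with a capturing pattern: Perl's split() then
       returns text and tags alternately, always starting with a text chunk
       (possibly empty), so the text falls in the odd-numbered positions.
       The helper split_markup is hypothetical, and its simple pattern does
       not cope with ">" inside attribute values, comments or CDATA:

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into text, tag, text, tag, ...
# The capturing group makes split() return each tag as well as the text
# around it; an empty text chunk is kept before a leading tag and between
# adjacent tags, so text always sits at positions 1, 3, 5, ... (from one).
sub split_markup {
    my ($line) = @_;
    return split /(<[^>]*>)/, $line;
}

my @chunks = split_markup('I <em>live</em> near a live wire.');
# @chunks is ('I ', '<em>', 'live', '</em>', ' near a live wire.')

# With the module installed, the tags pass through untouched:
#   print $shaw->transliterate(@chunks);
```

       For whole HTML documents, transliterate_html() is the better fit.
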
   $shaw->unknown_handler([$handler])
       If a word is not found in the translation database, the
       transliteration routines call a handler to find out what to do,
       passing the unknown word as both its first and its second argument.
       (This is to allow later expansion; see BUGS AND ISSUES, below.)  The
       handler should return a string, which will be inserted into the
       result of the transliteration routine at the correct place.

       To set a new handler, pass it as the argument.  If you pass no
       argument, this method returns the current handler.

       The default handler simply returns its first argument unchanged.  A
       replacement handler could, for example, make an attempt at guessing
       the transliteration; it could die, to abort the transliteration
       process; or it could return its argument but also record the word in
       a table, so that a list of missing words could later be reported to
       the user.

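       A sketch of the last kind of handler, which records each missing word
       and leaves it in the conventional alphabet.  The names %missing and
       $recording_handler are illustrative; the handler relies only on the
       calling convention described above:

```perl
use strict;
use warnings;

# Unknown words seen so far, with a count for each.
my %missing;

# Receives the unknown word (currently passed as both the first and the
# second argument), records it, and returns it unchanged so that it
# appears in the output in the conventional alphabet.
my $recording_handler = sub {
    my ($word) = @_;
    $missing{$word}++;
    return $word;
};

# With the module installed:
#   $shaw->unknown_handler($recording_handler);
#   print $shaw->transliterate($text);
#   print "Missing: $_\n" for sort keys %missing;
```
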
   $shaw->mapping($phrase)
       There is a quasi-standard mapping of the conventional alphabet onto the
       Shavian alphabet.  This method uses it to map Shavian text into the
       conventional alphabet and vice versa; it does not transliterate.
       Think of it as a kind of ASCII-armouring.

       Various versions of the standard map the naming dot to "G", "B", or
       "/".  This method does not support "/", but maps both "G" and "B" to
       the naming dot; in reverse, it maps the naming dot to "G".

       The letters "K" and "L" have no mapping to Shavian letters, and are
       left alone.

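       The naming-dot rules above can be sketched as follows.  This is not
       the module's code: the table of Shavian letters is left out, the
       function name map_naming_dot is hypothetical, and the sketch assumes
       that the naming dot is U+00B7 MIDDLE DOT.  It works one character at
       a time because the mapping runs in both directions at once; two
       global substitutions in sequence would turn a dot produced from "G"
       straight back into "G".

```perl
use strict;
use warnings;

# Assumption: the naming dot is represented by U+00B7 MIDDLE DOT.
my $NAMING_DOT = "\x{B7}";

# Naming-dot rules only; the letter table is deliberately omitted.
sub map_naming_dot {
    my ($text) = @_;
    my $out = '';
    for my $char (split //, $text) {
        if ($char eq 'G' or $char eq 'B') {
            $out .= $NAMING_DOT;    # conventional -> Shavian
        }
        elsif ($char eq $NAMING_DOT) {
            $out .= 'G';            # Shavian -> conventional
        }
        else {
            $out .= $char;          # left alone in this sketch (as K and L
                                    # are by the real method)
        }
    }
    return $out;
}

# map_naming_dot('B') returns "\x{B7}"; map_naming_dot("\x{B7}") returns 'G'.
```

       Because both directions are applied together, a string containing
       both "G" and the naming dot has the two swapped, which is one reason
       a one-direction option would be useful.
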
   $shaw->normalise($shavian_text)
       Certain letters in the Shavian alphabet are ligatures of pairs of other
       letters; because of this, those pairs should not appear separately.
       (For example, the letter YEW is a ligature of YEA and OOZE.)  This
       method replaces such pairs with their ligature equivalents.

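       A minimal sketch of the YEW case, using the Unicode code points for
       YEA (U+10458), OOZE (U+10475) and YEW (U+1047F).  Only this one pair
       is listed; the other ligature letters would need entries of their
       own, and normalise_sketch is a hypothetical name, not the module's
       method:

```perl
use strict;
use warnings;

# Pair of separate letters => single ligature letter.
my %LIGATURES = (
    "\x{10458}\x{10475}" => "\x{1047F}",    # YEA + OOZE => YEW
);

# Replace every occurrence of each listed pair with its ligature.
sub normalise_sketch {
    my ($text) = @_;
    for my $pair (keys %LIGATURES) {
        $text =~ s/\Q$pair\E/$LIGATURES{$pair}/g;
    }
    return $text;
}
```

       With several pairs in the table, any pairs that overlap would need to
       be applied in a fixed order, since hash key order in Perl is not
       stable.
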
   $shaw->transliterate_html($html)
       Given a block of text in the conventional alphabet which is formatted
       as HTML, this will make a reasonable attempt at returning the same text
       transliterated into the Shavian alphabet.  It is aware of which tags
       commonly break the flow of sentences, and handles homonym resolution
       accordingly.


BUGS AND ISSUES

       There should be a version of the main transliteration method which
       returns a list of hashes, each giving the source and destination forms
       of a word, its part of speech and disambiguation information, and a
       note of its source (CMUdict or the Shavian Wiki).

       It should probably be possible to transliterate in reverse, from
       Shavian to the conventional alphabet.

       It should be possible to handle other alternative scripts, such as
       Deseret and Tengwar.  This shouldn't be very difficult.  It would also
       allow representation in the IPA, which would mean this module could be
       used for simple text-to-speech processing.

       The portion of the database which is taken from CMUdict exhibits
       unhelpful mergers (notably father/bother).  There isn't much that can
       be done about this except extending the Shavian Wiki further.  In
       addition, in some unstressed syllables it does not use the letters
       ARRAY and ADO where it should; this could and should be fixed.

       It would be useful, on initialisation, to read a text file in a
       standard location giving local transliterations that override the
       database for particular words.

       It would be helpful if there were a callback for any words found in
       the CMUdict data rather than in the Shavian Wiki data, so that the
       wiki could be updated.

       The HTML transliterator should mark its output as being encoded in
       UTF-8, whatever the source encoding.  (Of the encodings commonly used
       for web pages, only Unicode encodings such as UTF-8 can represent
       Shavian.)

       The HTML transliterator should have an option to put a span around
       each word, with a title giving the word's spelling in the conventional
       alphabet, in the manner of translate.google.com.

       The HTML transliterator should have an option to rewrite the
       destinations of links, and to add a target to them, so that it can be
       used by a web script to link back to itself.

       The HTML transliterator should add a "generator" META tag referencing
       itself, if one is not already present.

       The HTML transliterator should ignore sections marked as being written
       in languages other than English.

       The HTML transliterator should have an option to allow loading
       documents in chunks, as "HTML::Parser" already does.

       The mapping() method should have an extra parameter to make it map in
       one direction only.

       Most of these will be implemented before this module reaches version
       1.00.


FONTS

       You will need a Shavian Unicode font to use this module.  There are
       several such fonts at http://marnanel.org/shavian/fonts/ .  Please be
       sure to get a Unicode font and not one with the "Latin mapping".

       The Mac, however, can display the Shavian alphabet out of the box.


COPYRIGHT

       This Perl module is copyright (C) Thomas Thurman, 2009-2010.  This is
       free software, and can be used and modified under the same terms as
       Perl itself.

       The transliteration data is available under various free licences,
       which are reproduced below.


LICENCES

   Androcles and the Lion
       Part of the transliteration data was taken from the 1962 Shavian
       alphabet edition of "Androcles and the Lion"; this data is in the
       public domain.

   Shavian Wiki
       Part of the transliteration data was taken from the Shavian Wiki, and
       this is available under the Creative Commons cc-by-sa licence.

   CMUdict
       Another part of the transliteration data was taken from CMUdict.  Its
       licence is reproduced below.

       Copyright (C) 1993-2008 Carnegie Mellon University. All rights
       reserved.

       Redistribution and use in source and binary forms, with or without
       modification, are permitted provided that the following conditions are
       met:

       1. Redistributions of source code must retain the above copyright
          notice, this list of conditions and the following disclaimer.
          The contents of this file are deemed to be source code.

       2. Redistributions in binary form must reproduce the above copyright
          notice, this list of conditions and the following disclaimer in
          the documentation and/or other materials provided with the
          distribution.

       This work was supported in part by funding from the Defense Advanced
       Research Projects Agency, the Office of Naval Research and the National
       Science Foundation of the United States of America, and by member
       companies of the Carnegie Mellon Sphinx Speech Consortium. We
       acknowledge the contributions of many volunteers to the expansion and
       improvement of this dictionary.

       THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND
       ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
       IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
       PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
       NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
       SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
       LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
       DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
       THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
       (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
       OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

   Brill tagger
       The part-of-speech data was taken from the Brill tagger (although the
       tagger built into this module is not the Brill tagger, so the first
       sentence of the licence below is inaccurate).  Its licence is also
       reproduced below:

       This software was written by Eric Brill.

       This software is being provided to you, the LICENSEE, by the
       Massachusetts Institute of Technology (M.I.T.) under the following
       license.  By obtaining, using and/or copying this software, you agree
       that you have read, understood, and will comply with these terms and
       conditions:

       Permission to [use, copy, modify and distribute, including the right to
       grant others rights to distribute at any tier, this software and its
       documentation for any purpose and without fee or royalty] is hereby
       granted, provided that you agree to comply with the following copyright
       notice and statements, including the disclaimer, and that the same
       appear on ALL copies of the software and documentation, including
       modifications that you make for internal use or for distribution:

       Copyright 1993 by the Massachusetts Institute of Technology and the
       University of Pennsylvania.  All rights reserved.

       THIS SOFTWARE IS PROVIDED "AS IS", AND M.I.T. MAKES NO REPRESENTATIONS
       OR WARRANTIES, EXPRESS OR IMPLIED.  By way of example, but not
       limitation, M.I.T. MAKES NO REPRESENTATIONS OR WARRANTIES OF
       MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE
       OF THE LICENSED SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD
       PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

       The name of the Massachusetts Institute of Technology or M.I.T. may NOT
       be used in advertising or publicity pertaining to distribution of the
       software.  Title to copyright in this software and any associated
       documentation shall at all times remain with M.I.T., and USER agrees to
       preserve same.


perl v5.36.0                      2022-07-22     Lingua::EN::Alphabet::Shaw(3)