Chatbot::Eliza(3)     User Contributed Perl Documentation     Chatbot::Eliza(3)

Chatbot::Eliza - A clone of the classic Eliza program
7
9 use Chatbot::Eliza;
10
11 $mybot = new Chatbot::Eliza;
12 $mybot->command_interface;
13
14 # see below for details
15
17 This module implements the classic Eliza algorithm. The original Eliza
18 program was written by Joseph Weizenbaum and described in the
19 Communications of the ACM in 1966. Eliza is a mock Rogerian
20 psychotherapist. It prompts for user input, and uses a simple
21 transformation algorithm to change user input into a follow-up
22 question. The program is designed to give the appearance of
23 understanding.
24
25 This program is a faithful implementation of the program described by
26 Weizenbaum. It uses a simplified script language (devised by Charles
27 Hayden). The content of the script is the same as Weizenbaum's.
28
29 This module encapsulates the Eliza algorithm in the form of an object.
30 This should make the functionality easy to incorporate in larger
31 programs.
32
34 The current version of Chatbot::Eliza.pm is available on CPAN:
35
36 http://www.perl.com/CPAN/modules/by-module/Chatbot/
37
To install this package, change to the directory created by untarring
the package, and type the following:

    perl Makefile.PL
    make test
    make
    make install

This will copy Eliza.pm to your perl library directory for use by all
perl scripts. You will probably need root privileges to do this, unless
you have installed a personal copy of perl.
49
51 This is all you need to do to launch a simple Eliza session:
52
53 use Chatbot::Eliza;
54
55 $mybot = new Chatbot::Eliza;
56 $mybot->command_interface;
57
58 You can also customize certain features of the session:
59
60 $myotherbot = new Chatbot::Eliza;
61
62 $myotherbot->name( "Hortense" );
63 $myotherbot->debug( 1 );
64
65 $myotherbot->command_interface;
66
67 These lines set the name of the bot to be "Hortense" and turn on the
68 debugging output.
69
70 When creating an Eliza object, you can specify a name and an
71 alternative scriptfile:
72
73 $bot = new Chatbot::Eliza "Brian", "myscript.txt";
74
75 You can also use an anonymous hash to set these parameters. Any of the
76 fields can be initialized using this syntax:
77
78 $bot = new Chatbot::Eliza {
79 name => "Brian",
80 scriptfile => "myscript.txt",
81 debug => 1,
82 prompts_on => 1,
83 memory_on => 0,
84 myrand =>
85 sub { my $N = defined $_[0] ? $_[0] : 1; rand($N); },
86 };
87
88 If you don't specify a script file, then the new object will be
89 initialized with a default script. The module contains this script
90 within itself.
91
92 You can use any of the internal functions in a calling program. The
93 code below takes an arbitrary string and retrieves the reply from the
94 Eliza object:
95
96 my $string = "I have too many problems.";
97 my $reply = $mybot->transform( $string );
98
99 You can easily create two bots, each with a different script, and see
100 how they interact:
101
    use Chatbot::Eliza;

    my ($harry, $sally, $he_says, $she_says);

    $sally = new Chatbot::Eliza "Sally", "histext.txt";
    $harry = new Chatbot::Eliza "Harry", "hertext.txt";

    $he_says = "I am sad.";

    # Seed the random number generator.
    srand( time ^ ($$ + ($$ << 15)) );

    # Runs until interrupted (for example with Ctrl-C).
    while (1) {
        $she_says = $sally->transform( $he_says );
        print $sally->name, ": $she_says \n";

        $he_says = $harry->transform( $she_says );
        print $harry->name, ": $he_says \n";
    }
121
122 Mechanically, this works well. However, it critically depends on the
123 actual script data. Having two mock Rogerian therapists talk to each
124 other usually does not produce any sensible conversation, of course.
125
126 After each call to the transform() method, the debugging output for
127 that transformation is stored in a variable called $debug_text.
128
129 my $reply = $mybot->transform( "My foot hurts" );
130 my $debugging = $mybot->debug_text;
131
This feature is always available, even if the instance's $debug
variable is set to 0.
134
135 Calling programs can specify their own random-number generators. Use
136 this syntax:
137
    $chatbot = new Chatbot::Eliza;
    $chatbot->myrand(
        sub {
            # Your function goes here. This placeholder simply
            # wraps the built-in rand().
            my $n = defined $_[0] ? $_[0] : 1;
            return rand($n);
        }
    );
144
145 The custom random function should have the same prototype as perl's
146 built-in rand() function. That is, it should take a single (numeric)
147 expression as a parameter, and it should return a floating-point value
148 between 0 and that number.
149
150 What this code actually does is pass a reference to an anonymous
151 subroutine ("code reference"). Make sure you've read the perlref
152 manpage for details on how code references actually work.
153
154 If you don't specify any custom rand function, then the Eliza object
155 will just use the built-in rand() function.
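
As an illustration of the calling convention described above, here is a
minimal sketch of a reproducible generator. The helper name
make_seeded_rand() is hypothetical and is not part of Chatbot::Eliza; it
uses a small Park-Miller generator so that the same seed always produces
the same sequence, which can be handy when testing a script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chatbot::Eliza): returns a code
# reference with the same calling convention as rand(), so that a given
# seed always yields the same sequence of values.
sub make_seeded_rand {
    my ($seed) = @_;
    my $state = ($seed % 2147483646) + 1;    # keep state in 1 .. 2147483646

    return sub {
        my $n = defined $_[0] ? $_[0] : 1;   # like rand(), default to 1
        $state = ($state * 16807) % 2147483647;
        return $n * ($state - 1) / 2147483646;   # value in [0, $n)
    };
}

my $rand = make_seeded_rand(42);
printf "%.4f\n", $rand->(10) for 1 .. 3;

# With a bot object in hand, the generator would be installed like this:
#   $chatbot->myrand( make_seeded_rand(42) );
```

Because the closure keeps its own state, two bots can be given separate
generators without affecting each other or the built-in rand().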
156
158 Each Eliza object uses the following data structures to hold the script
159 data in memory:
160
%decomplist
Keys: the set of keywords; Values: strings containing the
decomposition rules.

%reasmblist
Keys: strings which are each the join of a keyword and a
corresponding decomposition rule; Values: the set of possible
reassembly statements for that keyword and decomposition rule.
169
170 %reasmblist_for_memory
171 This structure is identical to %reasmblist, except that these rules are
172 only invoked when a user comment is being retrieved from memory. These
173 contain comments such as "Earlier you mentioned that...," which are
174 only appropriate for remembered comments. Rules in the script must be
175 specially marked in order to be included in this list rather than
176 %reasmblist. The default script only has a few of these rules.
177
178 @memory
179 A list of user comments which an Eliza instance is remembering for
180 future use. Eliza does not remember everything, only some things. In
181 this implementation, Eliza will only remember comments which match a
182 decomposition rule which actually has reassembly rules that are marked
183 with the keyword "reasm_for_memory" rather than the normal "reasmb".
184 The default script only has a few of these.
185
%keyranks
Keys: the set of keywords; Values: the rank of each keyword.
188
189 @quit
190 "quit" words -- that is, words the user might use to try to exit the
191 program.
192
193 @initial
194 Possible greetings for the beginning of the program.
195
196 @final
197 Possible farewells for the end of the program.
198
%pre
Keys: words which are replaced before any transformations; Values: the
respective replacement words.

%post
Keys: words which are replaced after the transformations and after the
reply is constructed; Values: the respective replacement words.

%synon
Keys: words which are found in decomposition rules; Values: words which
are treated just like their corresponding synonyms during matching of
decomposition rules.
211
212 Other data members
213 There are several other internal data members. Hopefully these are
214 sufficiently obvious that you can learn about them just by reading the
215 source code.
216
218 new()
219 my $chatterbot = new Chatbot::Eliza;
220
221 new() creates a new Eliza object. This method also calls the internal
222 _initialize() method, which in turn calls the parse_script_data()
223 method, which initializes the script data.
224
225 my $chatterbot = new Chatbot::Eliza 'Ahmad', 'myfile.txt';
226
The Eliza object defaults to the name "Eliza", and it contains default
script data within itself. However, using the syntax above, you can
specify an alternative name and an alternative script file.

See the method parse_script_data() for a description of the format of
the script file.
233
234 command_interface()
235 $chatterbot->command_interface;
236
237 command_interface() opens an interactive session with the Eliza object,
238 just like the original Eliza program.
239
240 If you want to design your own session format, then you can write your
241 own while loop and your own functions for prompting for and reading
242 user input, and use the transform() method to generate Eliza's
243 responses. (Note: you do not need to invoke preprocess() and
244 postprocess() directly, because these are invoked from within the
245 transform() method.)
246
247 But if you're lazy and you want to skip all that, then just use
248 command_interface(). It's all done for you.
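
For those who do want their own session format, a minimal sketch of such
a loop follows. The helper name run_session() and its quit-word test are
assumptions made for illustration; the only methods it relies on are
name() and transform(), so it works with any object that provides them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chatbot::Eliza): run a minimal session.
# $bot is any object providing transform() and name(), such as a
# Chatbot::Eliza instance; $in and $out are filehandles.
sub run_session {
    my ($bot, $in, $out) = @_;

    while (defined(my $line = <$in>)) {
        chomp $line;
        last if $line =~ /^\s*(?:bye|quit)\b/i;   # our own quit test
        next unless $line =~ /\S/;                # ignore empty input
        print {$out} $bot->name, ": ", $bot->transform($line), "\n";
    }
    return;
}

# Typical use, assuming Chatbot::Eliza is installed:
#   use Chatbot::Eliza;
#   run_session( new Chatbot::Eliza, \*STDIN, \*STDOUT );
```

Passing the filehandles in, rather than reading STDIN directly, makes it
easy to drive the same loop from a file, a socket, or a test.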
249
250 During an interactive session invoked using command_interface(), you
251 can enter the word "debug" to toggle debug mode on and off. You can
252 also enter the keyword "memory" to invoke the _debug_memory() method
253 and print out the contents of the Eliza instance's memory.
254
255 preprocess()
256 $string = preprocess($string);
257
258 preprocess() applies simple substitution rules to the input string.
259 Mostly this is to catch varieties in spelling, misspellings,
260 contractions and the like.
261
preprocess() is called from within the transform() method. It is
applied to user-input text before any other processing, and before a
reassembly statement has been selected.

It uses the hash %pre, which is created during the parse of the
script.
268
269 postprocess()
270 $string = postprocess($string);
271
postprocess() applies simple substitution rules to the reassembly rule.
This is where all the "I"s and "you"s are exchanged. postprocess()
is called from within the transform() function.

It uses the hash %post, created during the parse of the script.
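
The kind of word-for-word substitution that preprocess() and
postprocess() perform can be sketched as follows. This is an
illustration only, not the module's own code: substitute_words() is a
hypothetical helper, and the table entries are toy data rather than the
default script.

```perl
use strict;
use warnings;

# Toy substitution tables (illustrative entries, not the default script).
my %pre  = ( dont => "don't", recollect => 'remember' );
my %post = ( my => 'your', your => 'my', i => 'you', am => 'are' );

# Replace each word that has an entry in the table, in a single pass, so
# that a swapped word is never swapped back ("my" -> "your" -> "my").
sub substitute_words {
    my ($table, $string) = @_;
    my @words = split ' ', $string;
    for my $word (@words) {
        my $key = lc $word;
        $word = $table->{$key} if exists $table->{$key};
    }
    return join ' ', @words;
}

print substitute_words(\%pre,  'I dont recollect'), "\n";
# prints: I don't remember
print substitute_words(\%post, 'my brother hates your dog'), "\n";
# prints: your brother hates my dog
```

Working word by word in one pass matters for the %post table, where
pairs such as "my" and "your" map onto each other.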
277
278 _testquit()
279 if ($self->_testquit($user_input) ) { ... }
280
281 _testquit() detects words like "bye" and "quit" and returns true if it
282 finds one of them as the first word in the sentence.
283
284 These words are listed in the script, under the keyword "quit".
285
286 _debug_memory()
287 $self->_debug_memory()
288
289 _debug_memory() is a special function which returns the contents of
290 Eliza's memory stack.
291
292 transform()
293 $reply = $chatterbot->transform( $string, $use_memory );
294
295 transform() applies transformation rules to the user input string. It
296 invokes preprocess(), does transformations, then invokes postprocess().
297 It returns the transformed output string, called $reasmb.
298
299 The algorithm embedded in the transform() method has three main parts:
300
301 1. Search the input string for a keyword.
302
303 2. If we find a keyword, use the list of decomposition rules for that
304 keyword, and pattern-match the input string against each rule.
305
306 3. If the input string matches any of the decomposition rules, then
307 randomly select one of the reassembly rules for that decomposition
308 rule, and use it to construct the reply.
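
The three steps above can be sketched in a few lines of standalone
Perl. This is a simplified illustration, not the module's actual code:
the names rule_regex() and toy_transform() are hypothetical, the data is
a toy script, joining keyword and rule with $; is an assumption about
the hash-key format, and preprocessing, postprocessing, synonyms,
"goto" rules, and memory are all left out.

```perl
use strict;
use warnings;

# Toy script data, shaped like the structures described earlier.
my %keyranks   = ( remember => 5, my => 2 );
my %decomplist = (
    remember => [ '* i remember *' ],
    my       => [ '* my *' ],
);
my %reasmblist = (
    join($;, 'remember', '* i remember *') => [ 'Do you often think of (2) ?' ],
    join($;, 'my', '* my *')               => [ 'Why do you say your (2) ?' ],
);

# Convert a decomposition rule into a regex; each "*" captures a run of words.
sub rule_regex {
    my ($rule) = @_;
    my @tokens;
    for my $token (split ' ', $rule) {
        push @tokens, $token eq '*' ? '(.*?)' : '\b' . quotemeta($token) . '\b';
    }
    my $pattern = join '\s*', @tokens;
    return qr/^\s*$pattern\s*$/i;
}

sub toy_transform {
    my ($input, $rand) = @_;
    $rand ||= sub { rand $_[0] };

    # 1. Search the input for keywords and keep the highest-ranked one.
    my ($keyword) = sort { $keyranks{$b} <=> $keyranks{$a} }
                    grep { exists $keyranks{$_} } split ' ', lc $input;
    return unless defined $keyword;

    # 2. Match the input against each decomposition rule for that keyword.
    for my $rule (@{ $decomplist{$keyword} }) {
        my @parts = $input =~ rule_regex($rule) or next;

        # 3. Pick a reassembly rule at random and fill in "(N)".
        my $choices = $reasmblist{ join $;, $keyword, $rule };
        my $reply   = $choices->[ int $rand->( scalar @$choices ) ];
        $reply =~ s/\((\d+)\)/$parts[$1 - 1]/g;
        return $reply;
    }
    return;
}

print toy_transform('I remember my first bicycle', sub { 0 }), "\n";
# prints: Do you often think of my first bicycle ?
```

Both "remember" and "my" occur in the sample input; "remember" wins
because it has the higher rank. The reply still says "my" because this
sketch skips the postprocessing step.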
309
transform() takes two parameters. The first is the string we want to
transform. The second is a flag which indicates where this string came
from. If the flag is set, then the string has been pulled from memory,
and we should use reassembly rules appropriate for that. If the flag
is not set, then the string is the most recent user input, and we can
use the ordinary reassembly rules.

The memory flag is only set when the transform() function is called
recursively. The mechanism for setting this parameter is embedded in
the transform() method itself. If the flag is set inappropriately, it
is ignored.
321
322 How memory is used
In the script, some reassembly rules are special. They are marked with
the keyword "reasm_for_memory", rather than just "reasmb". Eliza
"remembers" any comment when it matches a decomposition rule for which
there are any reassembly rules for memory. An Eliza object remembers
up to $max_memory_size (default: 5) user input strings.
328
329 If, during a subsequent run, the transform() method fails to find any
330 appropriate decomposition rule for a user's comment, and if there are
331 any comments inside the memory array, then Eliza may elect to ignore
332 the most recent comment and instead pull out one of the strings from
333 memory. In this case, the transform method is called recursively with
334 the memory flag.
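
A bounded memory of this kind can be sketched as a simple list with a
size limit. The helper names below are hypothetical, and two details are
assumptions, since the text above does not specify them: that the
oldest comment is the one dropped when the limit is exceeded, and that
the oldest comment is the one recalled first.

```perl
use strict;
use warnings;

my $max_memory_size = 5;    # same default as stated above
my @memory;

# Store a comment, dropping the oldest one once the limit is exceeded
# (assumption: the text does not say which comment is discarded).
sub remember_comment {
    my ($comment) = @_;
    push @memory, $comment;
    shift @memory while @memory > $max_memory_size;
    return scalar @memory;
}

# Take back the oldest stored comment, or undef if memory is empty
# (assumption: oldest first).
sub recall_comment {
    return shift @memory;
}

remember_comment("my comment number $_") for 1 .. 7;
print scalar(@memory), " comments kept, oldest: $memory[0]\n";
# prints: 5 comments kept, oldest: my comment number 3
```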
335
336 Honestly, I am not sure exactly how this memory functionality was
337 implemented in the original Eliza program. Hopefully this
338 implementation is not too far from Weizenbaum's.
339
340 If you don't want to use the memory functionality at all, then you can
341 disable it:
342
343 $mybot->memory_on(0);
344
345 You can also achieve the same effect by making sure that the script
346 data does not contain any reassembly rules marked with the keyword
347 "reasm_for_memory". The default script data only has 4 such items.
348
349 parse_script_data()
350 $self->parse_script_data;
351 $self->parse_script_data( $script_file );
352
353 parse_script_data() is invoked from the _initialize() method, which is
354 called from the new() function. However, you can also call this method
355 at any time against an already-instantiated Eliza instance. In that
356 case, the new script data is added to the old script data. The old
357 script data is not deleted.
358
359 You can pass a parameter to this function, which is the name of the
360 script file, and it will read in and parse that file. If you do not
361 pass any parameter to this method, then it will read the data embedded
362 at the end of the module as its default script data.
363
364 If you pass the name of a script file to parse_script_data(), and that
365 file is not available for reading, then the module dies.
366
368 This module includes a default script file within itself, so it is not
369 necessary to explicitly specify a script file when instantiating an
370 Eliza object.
371
372 Each line in the script file can specify a key, a decomposition rule,
373 or a reassembly rule.
374
375 key: remember 5
376 decomp: * i remember *
377 reasmb: Do you often think of (2) ?
378 reasmb: Does thinking of (2) bring anything else to mind ?
379 decomp: * do you remember *
380 reasmb: Did you think I would forget (2) ?
381 reasmb: What about (2) ?
382 reasmb: goto what
383 pre: equivalent alike
384 synon: belief feel think believe wish
385
386 The number after the key specifies the rank. If a user's input
387 contains the keyword, then the transform() function will try to match
388 one of the decomposition rules for that keyword. If one matches, then
389 it will select one of the reassembly rules at random. The number (2)
390 here means "use whatever set of words matched the second asterisk in
391 the decomposition rule."
392
If you specify a list of synonyms for a word, then you should use a "@"
when you use that word in a decomposition rule:
395
396 decomp: * i @belief i *
397 reasmb: Do you really think so ?
398 reasmb: But you are not sure you (3).
399
400 Otherwise, the script will never check to see if there are any synonyms
401 for that keyword.
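
One way to picture the "@" marker is as a group that matches the word or
any of its synonyms, and that counts as a matched component in its own
right. That is why the sample rule above refers to (3) rather than (2).
The sketch below is an illustration under that reading, not the module's
own matching code, and synon_rule_regex() is a hypothetical name.

```perl
use strict;
use warnings;

# Toy synonym table, taken from the "synon:" line shown earlier.
my %synon = ( belief => [qw(feel think believe wish)] );

# Build a regex from a decomposition rule. "*" captures a run of words and
# "@word" captures the word itself or any of its listed synonyms.
sub synon_rule_regex {
    my ($rule) = @_;
    my @tokens;
    for my $token (split ' ', $rule) {
        if ($token eq '*') {
            push @tokens, '(.*?)';
        }
        elsif ($token =~ /^\@(\w+)$/) {
            my @words = ($1, @{ $synon{$1} || [] });
            push @tokens, '\b(' . join('|', map { quotemeta($_) } @words) . ')\b';
        }
        else {
            push @tokens, '\b' . quotemeta($token) . '\b';
        }
    }
    my $pattern = join '\s*', @tokens;
    return qr/^\s*$pattern\s*$/i;
}

my @parts = 'well i think i am lost' =~ synon_rule_regex('* i @belief i *');
print "(2) = $parts[1], (3) = $parts[2]\n";
# prints: (2) = think, (3) = am lost
```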
402
Reassembly rules should be marked with reasm_for_memory rather than
reasmb when they are appropriate for use on a comment that has been
extracted from memory.
406
407 key: my 2
408 decomp: * my *
409 reasm_for_memory: Let's discuss further why your (2).
410 reasm_for_memory: Earlier you said your (2).
411 reasm_for_memory: But your (2).
412 reasm_for_memory: Does that have anything to do with the fact that your (2) ?
413
415 Each line in the script file contains an "entrytype" (key, decomp,
416 synon) and an "entry", separated by a colon. In turn, each "entry" can
417 itself be composed of a "key" and a "value", separated by a space. The
418 parse_script_data() function parses each line out, and splits the
419 "entry" and "entrytype" portion of each line into two variables, $entry
420 and $entrytype.
421
422 Next, it uses the string $entrytype to determine what sort of stuff to
423 expect in the $entry variable, if anything, and parses it accordingly.
424 In some cases, there is no second level of key-value pair, so the
425 function does not even bother to isolate or create $key and $value.
426
427 $key is always a single word. $value can be null, or one single word,
428 or a string composed of several words, or an array of words.
429
430 Based on all these entries and keys and values, the function creates
431 two giant hashes: %decomplist, which holds the decomposition rules for
432 each keyword, and %reasmblist, which holds the reassembly phrases for
433 each decomposition rule. It also creates %keyranks, which holds the
434 ranks for each key.
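
To make the parsing steps above concrete, here is a minimal sketch of a
parser that builds toy versions of the three hashes from script text. It
is not the module's parse_script_data(): the name parse_toy_script() is
hypothetical, only key, decomp, and reasmb lines are handled, the
default rank of 0 is a choice made for this sketch, and joining keyword
and rule with $; is an assumption about the hash-key format.

```perl
use strict;
use warnings;

# Build toy versions of %keyranks, %decomplist and %reasmblist from
# script text (illustrative only; see the assumptions in the text above).
sub parse_toy_script {
    my ($text) = @_;
    my (%keyranks, %decomplist, %reasmblist);
    my ($thiskey, $thisdecomp);

    for my $line (split /\n/, $text) {
        # Split "entrytype: entry" at the first colon.
        my ($entrytype, $entry) = $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/
            or next;                      # skip lines that do not fit
        next unless length $entry;

        if ($entrytype eq 'key') {
            # A key entry is itself "keyword rank".
            my ($key, $rank) = split ' ', $entry;
            $thiskey = $key;
            $keyranks{$key} = defined $rank ? $rank : 0;
        }
        elsif ($entrytype eq 'decomp' and defined $thiskey) {
            $thisdecomp = $entry;
            push @{ $decomplist{$thiskey} }, $entry;
        }
        elsif ($entrytype eq 'reasmb' and defined $thisdecomp) {
            push @{ $reasmblist{ join $;, $thiskey, $thisdecomp } }, $entry;
        }
    }
    return (\%keyranks, \%decomplist, \%reasmblist);
}

my $script = <<'END';
key: remember 5
  decomp: * i remember *
    reasmb: Do you often think of (2) ?
    reasmb: Does thinking of (2) bring anything else to mind ?
  decomp: * do you remember *
    reasmb: Did you think I would forget (2) ?
END

my ($ranks, $decomps, $reasmbs) = parse_toy_script($script);
print "rank of 'remember': $ranks->{remember}\n";
print "decomposition rules: ", scalar @{ $decomps->{remember} }, "\n";
# prints: rank of 'remember': 5
# prints: decomposition rules: 2
```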
435
Six other structures are created: the hashes %reasmblist_for_memory,
%pre, %post, and %synon, and the arrays @initial and @final.
438
440 This software is copyright (c) 2003 by John Nolan <jpnolan@sonic.net>.
441
442 This is free software; you can redistribute it and/or modify it under
443 the same terms as the Perl 5 programming language system itself.
444
446 John Nolan jpnolan@sonic.net January 2003.
447
448 Implements the classic Eliza algorithm by Prof. Joseph Weizenbaum.
449 Script format devised by Charles Hayden.
450
perl v5.34.0                      2022-01-21                 Chatbot::Eliza(3)