1Algorithm::Diff(3)    User Contributed Perl Documentation   Algorithm::Diff(3)
2
3
4

NAME

6       Algorithm::Diff - Compute `intelligent' differences between two files /
7       lists
8

SYNOPSIS

10           require Algorithm::Diff;
11
12           # This example produces traditional 'diff' output:
13
14           my $diff = Algorithm::Diff->new( \@seq1, \@seq2 );
15
16           $diff->Base( 1 );   # Return line numbers, not indices
17           while(  $diff->Next()  ) {
18               next   if  $diff->Same();
19               my $sep = '';
20               if(  ! $diff->Items(2)  ) {
21                   printf "%d,%dd%d\n",
22                       $diff->Get(qw( Min1 Max1 Max2 ));
23               } elsif(  ! $diff->Items(1)  ) {
24                   printf "%da%d,%d\n",
25                       $diff->Get(qw( Max1 Min2 Max2 ));
26               } else {
27                   $sep = "---\n";
28                   printf "%d,%dc%d,%d\n",
29                       $diff->Get(qw( Min1 Max1 Min2 Max2 ));
30               }
31               print "< $_"   for  $diff->Items(1);
32               print $sep;
33               print "> $_"   for  $diff->Items(2);
34           }
35
36
37           # Alternate interfaces:
38
39           use Algorithm::Diff qw(
40               LCS LCS_length LCSidx
41               diff sdiff compact_diff
42               traverse_sequences traverse_balanced );
43
44           @lcs    = LCS( \@seq1, \@seq2 );
45           $lcsref = LCS( \@seq1, \@seq2 );
46           $count  = LCS_length( \@seq1, \@seq2 );
47
48           ( $seq1idxref, $seq2idxref ) = LCSidx( \@seq1, \@seq2 );
49
50
51           # Complicated interfaces:
52
53           @diffs  = diff( \@seq1, \@seq2 );
54
55           @sdiffs = sdiff( \@seq1, \@seq2 );
56
57           @cdiffs = compact_diff( \@seq1, \@seq2 );
58
59           traverse_sequences(
60               \@seq1,
61               \@seq2,
62               {   MATCH     => \&callback1,
63                   DISCARD_A => \&callback2,
64                   DISCARD_B => \&callback3,
65               },
66               \&key_generator,
67               @extra_args,
68           );
69
70           traverse_balanced(
71               \@seq1,
72               \@seq2,
73               {   MATCH     => \&callback1,
74                   DISCARD_A => \&callback2,
75                   DISCARD_B => \&callback3,
76                   CHANGE    => \&callback4,
77               },
78               \&key_generator,
79               @extra_args,
80           );
81

INTRODUCTION

83       (by Mark-Jason Dominus)
84
85       I once read an article written by the authors of "diff"; they said that
86       they worked very hard on the algorithm until they found the right one.
87
88       I think what they ended up using (and I hope someone will correct me,
89       because I am not very confident about this) was the `longest common
90       subsequence' method.  In the LCS problem, you have two sequences of
91       items:
92
93           a b c d f g h j q z
94
95           a b c d e f g i j k r x y z
96
97       and you want to find the longest sequence of items that is present in
98       both original sequences in the same order.  That is, you want to find a
99       new sequence S which can be obtained from the first sequence by
100       deleting some items, and from the secend sequence by deleting other
101       items.  You also want S to be as long as possible.  In this case S is
102
103           a b c d f g j z
104
105       From there it's only a small step to get diff-like output:
106
107           e   h i   k   q r x y
108           +   - +   +   - + + +
109
110       This module solves the LCS problem.  It also includes a canned function
111       to generate "diff"-like output.
112
113       It might seem from the example above that the LCS of two sequences is
114       always pretty obvious, but that's not always the case, especially when
115       the two sequences have many repeated elements.  For example, consider
116
117           a x b y c z p d q
118           a b c a x b y c z
119
120       A naive approach might start by matching up the "a" and "b" that appear
121       at the beginning of each sequence, like this:
122
123           a x b y c         z p d q
124           a   b   c a b y c z
125
126       This finds the common subsequence "a b c z".  But actually, the LCS is
127       "a x b y c z":
128
129                 a x b y c z p d q
130           a b c a x b y c z
131
132       or
133
134           a       x b y c z p d q
135           a b c a x b y c z
136

USAGE

138       (See also the README file and several example scripts include with this
139       module.)
140
141       This module now provides an object-oriented interface that uses less
142       memory and is easier to use than most of the previous procedural
143       interfaces.  It also still provides several exportable functions.
144       We'll deal with these in ascending order of difficulty:  "LCS",
145       "LCS_length", "LCSidx", OO interface, "prepare", "diff", "sdiff",
146       "traverse_sequences", and "traverse_balanced".
147
148   "LCS"
149       Given references to two lists of items, LCS returns an array containing
150       their longest common subsequence.  In scalar context, it returns a
151       reference to such a list.
152
153           @lcs    = LCS( \@seq1, \@seq2 );
154           $lcsref = LCS( \@seq1, \@seq2 );
155
156       "LCS" may be passed an optional third parameter; this is a CODE
157       reference to a key generation function.  See "KEY GENERATION
158       FUNCTIONS".
159
160           @lcs    = LCS( \@seq1, \@seq2, \&keyGen, @args );
161           $lcsref = LCS( \@seq1, \@seq2, \&keyGen, @args );
162
163       Additional parameters, if any, will be passed to the key generation
164       routine.
165
166   "LCS_length"
167       This is just like "LCS" except it only returns the length of the
168       longest common subsequence.  This provides a performance gain of about
169       9% compared to "LCS".
170
171   "LCSidx"
172       Like "LCS" except it returns references to two arrays.  The first array
173       contains the indices into @seq1 where the LCS items are located.  The
174       second array contains the indices into @seq2 where the LCS items are
175       located.
176
177       Therefore, the following three lists will contain the same values:
178
179           my( $idx1, $idx2 ) = LCSidx( \@seq1, \@seq2 );
180           my @list1 = @seq1[ @$idx1 ];
181           my @list2 = @seq2[ @$idx2 ];
182           my @list3 = LCS( \@seq1, \@seq2 );
183
184   "new"
185           $diff = Algorithm::Diffs->new( \@seq1, \@seq2 );
186           $diff = Algorithm::Diffs->new( \@seq1, \@seq2, \%opts );
187
188       "new" computes the smallest set of additions and deletions necessary to
189       turn the first sequence into the second and compactly records them in
190       the object.
191
192       You use the object to iterate over hunks, where each hunk represents a
193       contiguous section of items which should be added, deleted, replaced,
194       or left unchanged.
195
196           The following summary of all of the methods looks a lot like Perl
197           code but some of the symbols have different meanings:
198
199               [ ]     Encloses optional arguments
200               :       Is followed by the default value for an optional argument
201               |       Separates alternate return results
202
203           Method summary:
204
205               $obj        = Algorithm::Diff->new( \@seq1, \@seq2, [ \%opts ] );
206               $pos        = $obj->Next(  [ $count : 1 ] );
207               $revPos     = $obj->Prev(  [ $count : 1 ] );
208               $obj        = $obj->Reset( [ $pos : 0 ] );
209               $copy       = $obj->Copy(  [ $pos, [ $newBase ] ] );
210               $oldBase    = $obj->Base(  [ $newBase ] );
211
212           Note that all of the following methods "die" if used on an object
213           that is "reset" (not currently pointing at any hunk).
214
215               $bits       = $obj->Diff(  );
216               @items|$cnt = $obj->Same(  );
217               @items|$cnt = $obj->Items( $seqNum );
218               @idxs |$cnt = $obj->Range( $seqNum, [ $base ] );
219               $minIdx     = $obj->Min(   $seqNum, [ $base ] );
220               $maxIdx     = $obj->Max(   $seqNum, [ $base ] );
221               @values     = $obj->Get(   @names );
222
223           Passing in "undef" for an optional argument is always treated the
224           same as if no argument were passed in.
225
226           "Next"
227
228               $pos = $diff->Next();    # Move forward 1 hunk
229               $pos = $diff->Next( 2 ); # Move forward 2 hunks
230               $pos = $diff->Next(-5);  # Move backward 5 hunks
231
232           "Next" moves the object to point at the next hunk.  The object
233           starts out "reset", which means it isn't pointing at any hunk.  If
234           the object is reset, then "Next()" moves to the first hunk.
235
236           "Next" returns a true value iff the move didn't go past the last
237           hunk.  So Next(0) will return true iff the object is not reset.
238
239           Actually, "Next" returns the object's new position, which is a
240           number between 1 and the number of hunks (inclusive), or returns a
241           false value.
242
243           "Prev"
244
245           "Prev($N)" is almost identical to "Next(-$N)"; it moves to the $Nth
246           previous hunk.  On a 'reset' object, "Prev()" [and "Next(-1)"] move
247           to the last hunk.
248
249           The position returned by "Prev" is relative to the end of the
250           hunks; -1 for the last hunk, -2 for the second-to-last, etc.
251
252           "Reset"
253
254               $diff->Reset();     # Reset the object's position
255               $diff->Reset($pos); # Move to the specified hunk
256               $diff->Reset(1);    # Move to the first hunk
257               $diff->Reset(-1);   # Move to the last hunk
258
259           "Reset" returns the object, so, for example, you could use
260           "$diff->Reset()->Next(-1)" to get the number of hunks.
261
262           "Copy"
263
264               $copy = $diff->Copy( $newPos, $newBase );
265
266           "Copy" returns a copy of the object.  The copy and the orignal
267           object share most of their data, so making copies takes very little
268           memory.  The copy maintains its own position (separate from the
269           original), which is the main purpose of copies.  It also maintains
270           its own base.
271
272           By default, the copy's position starts out the same as the original
273           object's position.  But "Copy" takes an optional first argument to
274           set the new position, so the following three snippets are
275           equivalent:
276
277               $copy = $diff->Copy($pos);
278
279               $copy = $diff->Copy();
280               $copy->Reset($pos);
281
282               $copy = $diff->Copy()->Reset($pos);
283
284           "Copy" takes an optional second argument to set the base for the
285           copy.  If you wish to change the base of the copy but leave the
286           position the same as in the original, here are two equivalent ways:
287
288               $copy = $diff->Copy();
289               $copy->Base( 0 );
290
291               $copy = $diff->Copy(undef,0);
292
293           Here are two equivalent way to get a "reset" copy:
294
295               $copy = $diff->Copy(0);
296
297               $copy = $diff->Copy()->Reset();
298
299           "Diff"
300
301               $bits = $obj->Diff();
302
303           "Diff" returns a true value iff the current hunk contains items
304           that are different between the two sequences.  It actually returns
305           one of the follow 4 values:
306
307           3   "3==(1|2)".  This hunk contains items from @seq1 and the items
308               from @seq2 that should replace them.  Both sequence 1 and 2
309               contain changed items so both the 1 and 2 bits are set.
310
311           2   This hunk only contains items from @seq2 that should be
312               inserted (not items from @seq1).  Only sequence 2 contains
313               changed items so only the 2 bit is set.
314
315           1   This hunk only contains items from @seq1 that should be deleted
316               (not items from @seq2).  Only sequence 1 contains changed items
317               so only the 1 bit is set.
318
319           0   This means that the items in this hunk are the same in both
320               sequences.  Neither sequence 1 nor 2 contain changed items so
321               neither the 1 nor the 2 bits are set.
322
323           "Same"
324
325           "Same" returns a true value iff the current hunk contains items
326           that are the same in both sequences.  It actually returns the list
327           of items if they are the same or an emty list if they aren't.  In a
328           scalar context, it returns the size of the list.
329
330           "Items"
331
332               $count = $diff->Items(2);
333               @items = $diff->Items($seqNum);
334
335           "Items" returns the (number of) items from the specified sequence
336           that are part of the current hunk.
337
338           If the current hunk contains only insertions, then
339           "$diff->Items(1)" will return an empty list (0 in a scalar conext).
340           If the current hunk contains only deletions, then "$diff->Items(2)"
341           will return an empty list (0 in a scalar conext).
342
343           If the hunk contains replacements, then both "$diff->Items(1)" and
344           "$diff->Items(2)" will return different, non-empty lists.
345
346           Otherwise, the hunk contains identical items and all of the
347           following will return the same lists:
348
349               @items = $diff->Items(1);
350               @items = $diff->Items(2);
351               @items = $diff->Same();
352
353           "Range"
354
355               $count = $diff->Range( $seqNum );
356               @indices = $diff->Range( $seqNum );
357               @indices = $diff->Range( $seqNum, $base );
358
359           "Range" is like "Items" except that it returns a list of indices to
360           the items rather than the items themselves.  By default, the index
361           of the first item (in each sequence) is 0 but this can be changed
362           by calling the "Base" method.  So, by default, the following two
363           snippets return the same lists:
364
365               @list = $diff->Items(2);
366               @list = @seq2[ $diff->Range(2) ];
367
368           You can also specify the base to use as the second argument.  So
369           the following two snippets always return the same lists:
370
371               @list = $diff->Items(1);
372               @list = @seq1[ $diff->Range(1,0) ];
373
374           "Base"
375
376               $curBase = $diff->Base();
377               $oldBase = $diff->Base($newBase);
378
379           "Base" sets and/or returns the current base (usually 0 or 1) that
380           is used when you request range information.  The base defaults to 0
381           so that range information is returned as array indices.  You can
382           set the base to 1 if you want to report traditional line numbers
383           instead.
384
385           "Min"
386
387               $min1 = $diff->Min(1);
388               $min = $diff->Min( $seqNum, $base );
389
390           "Min" returns the first value that "Range" would return (given the
391           same arguments) or returns "undef" if "Range" would return an empty
392           list.
393
394           "Max"
395
396           "Max" returns the last value that "Range" would return or "undef".
397
398           "Get"
399
400               ( $n, $x, $r ) = $diff->Get(qw( min1 max1 range1 ));
401               @values = $diff->Get(qw( 0min2 1max2 range2 same base ));
402
403           "Get" returns one or more scalar values.  You pass in a list of the
404           names of the values you want returned.  Each name must match one of
405           the following regexes:
406
407               /^(-?\d+)?(min|max)[12]$/i
408               /^(range[12]|same|diff|base)$/i
409
410           The 1 or 2 after a name says which sequence you want the
411           information for (and where allowed, it is required).  The optional
412           number before "min" or "max" is the base to use.  So the following
413           equalities hold:
414
415               $diff->Get('min1') == $diff->Min(1)
416               $diff->Get('0min2') == $diff->Min(2,0)
417
418           Using "Get" in a scalar context when you've passed in more than one
419           name is a fatal error ("die" is called).
420
421   "prepare"
422       Given a reference to a list of items, "prepare" returns a reference to
423       a hash which can be used when comparing this sequence to other
424       sequences with "LCS" or "LCS_length".
425
426           $prep = prepare( \@seq1 );
427           for $i ( 0 .. 10_000 )
428           {
429               @lcs = LCS( $prep, $seq[$i] );
430               # do something useful with @lcs
431           }
432
433       "prepare" may be passed an optional third parameter; this is a CODE
434       reference to a key generation function.  See "KEY GENERATION
435       FUNCTIONS".
436
437           $prep = prepare( \@seq1, \&keyGen );
438           for $i ( 0 .. 10_000 )
439           {
440               @lcs = LCS( $seq[$i], $prep, \&keyGen );
441               # do something useful with @lcs
442           }
443
444       Using "prepare" provides a performance gain of about 50% when calling
445       LCS many times compared with not preparing.
446
447   "diff"
448           @diffs     = diff( \@seq1, \@seq2 );
449           $diffs_ref = diff( \@seq1, \@seq2 );
450
451       "diff" computes the smallest set of additions and deletions necessary
452       to turn the first sequence into the second, and returns a description
453       of these changes.  The description is a list of hunks; each hunk
454       represents a contiguous section of items which should be added,
455       deleted, or replaced.  (Hunks containing unchanged items are not
456       included.)
457
458       The return value of "diff" is a list of hunks, or, in scalar context, a
459       reference to such a list.  If there are no differences, the list will
460       be empty.
461
462       Here is an example.  Calling "diff" for the following two sequences:
463
464           a b c e h j l m n p
465           b c d e f j k l m r s t
466
467       would produce the following list:
468
469           (
470             [ [ '-', 0, 'a' ] ],
471
472             [ [ '+', 2, 'd' ] ],
473
474             [ [ '-', 4, 'h' ],
475               [ '+', 4, 'f' ] ],
476
477             [ [ '+', 6, 'k' ] ],
478
479             [ [ '-',  8, 'n' ],
480               [ '-',  9, 'p' ],
481               [ '+',  9, 'r' ],
482               [ '+', 10, 's' ],
483               [ '+', 11, 't' ] ],
484           )
485
486       There are five hunks here.  The first hunk says that the "a" at
487       position 0 of the first sequence should be deleted ("-").  The second
488       hunk says that the "d" at position 2 of the second sequence should be
489       inserted ("+").  The third hunk says that the "h" at position 4 of the
490       first sequence should be removed and replaced with the "f" from
491       position 4 of the second sequence.  And so on.
492
493       "diff" may be passed an optional third parameter; this is a CODE
494       reference to a key generation function.  See "KEY GENERATION
495       FUNCTIONS".
496
497       Additional parameters, if any, will be passed to the key generation
498       routine.
499
500   "sdiff"
501           @sdiffs     = sdiff( \@seq1, \@seq2 );
502           $sdiffs_ref = sdiff( \@seq1, \@seq2 );
503
504       "sdiff" computes all necessary components to show two sequences and
505       their minimized differences side by side, just like the Unix-utility
506       sdiff does:
507
508           same             same
509           before     |     after
510           old        <     -
511           -          >     new
512
513       It returns a list of array refs, each pointing to an array of display
514       instructions. In scalar context it returns a reference to such a list.
515       If there are no differences, the list will have one entry per item,
516       each indicating that the item was unchanged.
517
518       Display instructions consist of three elements: A modifier indicator
519       ("+": Element added, "-": Element removed, "u": Element unmodified,
520       "c": Element changed) and the value of the old and new elements, to be
521       displayed side-by-side.
522
523       An "sdiff" of the following two sequences:
524
525           a b c e h j l m n p
526           b c d e f j k l m r s t
527
528       results in
529
530           ( [ '-', 'a', ''  ],
531             [ 'u', 'b', 'b' ],
532             [ 'u', 'c', 'c' ],
533             [ '+', '',  'd' ],
534             [ 'u', 'e', 'e' ],
535             [ 'c', 'h', 'f' ],
536             [ 'u', 'j', 'j' ],
537             [ '+', '',  'k' ],
538             [ 'u', 'l', 'l' ],
539             [ 'u', 'm', 'm' ],
540             [ 'c', 'n', 'r' ],
541             [ 'c', 'p', 's' ],
542             [ '+', '',  't' ],
543           )
544
545       "sdiff" may be passed an optional third parameter; this is a CODE
546       reference to a key generation function.  See "KEY GENERATION
547       FUNCTIONS".
548
549       Additional parameters, if any, will be passed to the key generation
550       routine.
551
552   "compact_diff"
553       "compact_diff" is much like "sdiff" except it returns a much more
554       compact description consisting of just one flat list of indices.  An
555       example helps explain the format:
556
557           my @a = qw( a b c   e  h j   l m n p      );
558           my @b = qw(   b c d e f  j k l m    r s t );
559           @cdiff = compact_diff( \@a, \@b );
560           # Returns:
561           #   @a      @b       @a       @b
562           #  start   start   values   values
563           (    0,      0,   #       =
564                0,      0,   #    a  !
565                1,      0,   #  b c  =  b c
566                3,      2,   #       !  d
567                3,      3,   #    e  =  e
568                4,      4,   #    f  !  h
569                5,      5,   #    j  =  j
570                6,      6,   #       !  k
571                6,      7,   #  l m  =  l m
572                8,      9,   #  n p  !  r s t
573               10,     12,   #
574           );
575
576       The 0th, 2nd, 4th, etc. entries are all indices into @seq1 (@a in the
577       above example) indicating where a hunk begins.  The 1st, 3rd, 5th, etc.
578       entries are all indices into @seq2 (@b in the above example) indicating
579       where the same hunk begins.
580
581       So each pair of indices (except the last pair) describes where a hunk
582       begins (in each sequence).  Since each hunk must end at the item just
583       before the item that starts the next hunk, the next pair of indices can
584       be used to determine where the hunk ends.
585
586       So, the first 4 entries (0..3) describe the first hunk.  Entries 0 and
587       1 describe where the first hunk begins (and so are always both 0).
588       Entries 2 and 3 describe where the next hunk begins, so subtracting 1
589       from each tells us where the first hunk ends.  That is, the first hunk
590       contains items $diff[0] through "$diff[2] - 1" of the first sequence
591       and contains items $diff[1] through "$diff[3] - 1" of the second
592       sequence.
593
594       In other words, the first hunk consists of the following two lists of
595       items:
596
597                      #  1st pair     2nd pair
598                      # of indices   of indices
599           @list1 = @a[ $cdiff[0] .. $cdiff[2]-1 ];
600           @list2 = @b[ $cdiff[1] .. $cdiff[3]-1 ];
601                      # Hunk start   Hunk end
602
603       Note that the hunks will always alternate between those that are part
604       of the LCS (those that contain unchanged items) and those that contain
605       changes.  This means that all we need to be told is whether the first
606       hunk is a 'same' or 'diff' hunk and we can determine which of the other
607       hunks contain 'same' items or 'diff' items.
608
609       By convention, we always make the first hunk contain unchanged items.
610       So the 1st, 3rd, 5th, etc. hunks (all odd-numbered hunks if you start
611       counting from 1) all contain unchanged items.  And the 2nd, 4th, 6th,
612       etc. hunks (all even-numbered hunks if you start counting from 1) all
613       contain changed items.
614
615       Since @a and @b don't begin with the same value, the first hunk in our
616       example is empty (otherwise we'd violate the above convention).  Note
617       that the first 4 index values in our example are all zero.  Plug these
618       values into our previous code block and we get:
619
620           @hunk1a = @a[ 0 .. 0-1 ];
621           @hunk1b = @b[ 0 .. 0-1 ];
622
623       And "0..-1" returns the empty list.
624
625       Move down one pair of indices (2..5) and we get the offset ranges for
626       the second hunk, which contains changed items.
627
628       Since @diff[2..5] contains (0,0,1,0) in our example, the second hunk
629       consists of these two lists of items:
630
631               @hunk2a = @a[ $cdiff[2] .. $cdiff[4]-1 ];
632               @hunk2b = @b[ $cdiff[3] .. $cdiff[5]-1 ];
633           # or
634               @hunk2a = @a[ 0 .. 1-1 ];
635               @hunk2b = @b[ 0 .. 0-1 ];
636           # or
637               @hunk2a = @a[ 0 .. 0 ];
638               @hunk2b = @b[ 0 .. -1 ];
639           # or
640               @hunk2a = ( 'a' );
641               @hunk2b = ( );
642
643       That is, we would delete item 0 ('a') from @a.
644
645       Since @diff[4..7] contains (1,0,3,2) in our example, the third hunk
646       consists of these two lists of items:
647
648               @hunk3a = @a[ $cdiff[4] .. $cdiff[6]-1 ];
649               @hunk3a = @b[ $cdiff[5] .. $cdiff[7]-1 ];
650           # or
651               @hunk3a = @a[ 1 .. 3-1 ];
652               @hunk3a = @b[ 0 .. 2-1 ];
653           # or
654               @hunk3a = @a[ 1 .. 2 ];
655               @hunk3a = @b[ 0 .. 1 ];
656           # or
657               @hunk3a = qw( b c );
658               @hunk3a = qw( b c );
659
660       Note that this third hunk contains unchanged items as our convention
661       demands.
662
663       You can continue this process until you reach the last two indices,
664       which will always be the number of items in each sequence.  This is
665       required so that subtracting one from each will give you the indices to
666       the last items in each sequence.
667
668   "traverse_sequences"
669       "traverse_sequences" used to be the most general facility provided by
670       this module (the new OO interface is more powerful and much easier to
671       use).
672
673       Imagine that there are two arrows.  Arrow A points to an element of
674       sequence A, and arrow B points to an element of the sequence B.
675       Initially, the arrows point to the first elements of the respective
676       sequences.  "traverse_sequences" will advance the arrows through the
677       sequences one element at a time, calling an appropriate user-specified
678       callback function before each advance.  It willadvance the arrows in
679       such a way that if there are equal elements $A[$i] and $B[$j] which are
680       equal and which are part of the LCS, there will be some moment during
681       the execution of "traverse_sequences" when arrow A is pointing to
682       $A[$i] and arrow B is pointing to $B[$j].  When this happens,
683       "traverse_sequences" will call the "MATCH" callback function and then
684       it will advance both arrows.
685
686       Otherwise, one of the arrows is pointing to an element of its sequence
687       that is not part of the LCS.  "traverse_sequences" will advance that
688       arrow and will call the "DISCARD_A" or the "DISCARD_B" callback,
689       depending on which arrow it advanced.  If both arrows point to elements
690       that are not part of the LCS, then "traverse_sequences" will advance
691       one of them and call the appropriate callback, but it is not specified
692       which it will call.
693
694       The arguments to "traverse_sequences" are the two sequences to
695       traverse, and a hash which specifies the callback functions, like this:
696
697           traverse_sequences(
698               \@seq1, \@seq2,
699               {   MATCH => $callback_1,
700                   DISCARD_A => $callback_2,
701                   DISCARD_B => $callback_3,
702               }
703           );
704
705       Callbacks for MATCH, DISCARD_A, and DISCARD_B are invoked with at least
706       the indices of the two arrows as their arguments.  They are not
707       expected to return any values.  If a callback is omitted from the
708       table, it is not called.
709
710       Callbacks for A_FINISHED and B_FINISHED are invoked with at least the
711       corresponding index in A or B.
712
713       If arrow A reaches the end of its sequence, before arrow B does,
714       "traverse_sequences" will call the "A_FINISHED" callback when it
715       advances arrow B, if there is such a function; if not it will call
716       "DISCARD_B" instead.  Similarly if arrow B finishes first.
717       "traverse_sequences" returns when both arrows are at the ends of their
718       respective sequences.  It returns true on success and false on failure.
719       At present there is no way to fail.
720
721       "traverse_sequences" may be passed an optional fourth parameter; this
722       is a CODE reference to a key generation function.  See "KEY GENERATION
723       FUNCTIONS".
724
725       Additional parameters, if any, will be passed to the key generation
726       function.
727
728       If you want to pass additional parameters to your callbacks, but don't
729       need a custom key generation function, you can get the default by
730       passing undef:
731
732           traverse_sequences(
733               \@seq1, \@seq2,
734               {   MATCH => $callback_1,
735                   DISCARD_A => $callback_2,
736                   DISCARD_B => $callback_3,
737               },
738               undef,     # default key-gen
739               $myArgument1,
740               $myArgument2,
741               $myArgument3,
742           );
743
744       "traverse_sequences" does not have a useful return value; you are
745       expected to plug in the appropriate behavior with the callback
746       functions.
747
748   "traverse_balanced"
749       "traverse_balanced" is an alternative to "traverse_sequences". It uses
750       a different algorithm to iterate through the entries in the computed
751       LCS. Instead of sticking to one side and showing element changes as
752       insertions and deletions only, it will jump back and forth between the
753       two sequences and report changes occurring as deletions on one side
754       followed immediatly by an insertion on the other side.
755
756       In addition to the "DISCARD_A", "DISCARD_B", and "MATCH" callbacks
757       supported by "traverse_sequences", "traverse_balanced" supports a
758       "CHANGE" callback indicating that one element got "replaced" by
759       another:
760
761           traverse_balanced(
762               \@seq1, \@seq2,
763               {   MATCH => $callback_1,
764                   DISCARD_A => $callback_2,
765                   DISCARD_B => $callback_3,
766                   CHANGE    => $callback_4,
767               }
768           );
769
770       If no "CHANGE" callback is specified, "traverse_balanced" will map
771       "CHANGE" events to "DISCARD_A" and "DISCARD_B" actions, therefore
772       resulting in a similar behaviour as "traverse_sequences" with different
773       order of events.
774
775       "traverse_balanced" might be a bit slower than "traverse_sequences",
776       noticable only while processing huge amounts of data.
777
778       The "sdiff" function of this module is implemented as call to
779       "traverse_balanced".
780
781       "traverse_balanced" does not have a useful return value; you are
782       expected to plug in the appropriate behavior with the callback
783       functions.
784

KEY GENERATION FUNCTIONS

786       Most of the functions accept an optional extra parameter.  This is a
787       CODE reference to a key generating (hashing) function that should
788       return a string that uniquely identifies a given element.  It should be
789       the case that if two elements are to be considered equal, their keys
790       should be the same (and the other way around).  If no key generation
791       function is provided, the key will be the element as a string.
792
793       By default, comparisons will use "eq" and elements will be turned into
794       keys using the default stringizing operator '""'.
795
796       Where this is important is when you're comparing something other than
797       strings.  If it is the case that you have multiple different objects
798       that should be considered to be equal, you should supply a key
799       generation function. Otherwise, you have to make sure that your arrays
800       contain unique references.
801
802       For instance, consider this example:
803
804           package Person;
805
806           sub new
807           {
808               my $package = shift;
809               return bless { name => '', ssn => '', @_ }, $package;
810           }
811
812           sub clone
813           {
814               my $old = shift;
815               my $new = bless { %$old }, ref($old);
816           }
817
818           sub hash
819           {
820               return shift()->{'ssn'};
821           }
822
823           my $person1 = Person->new( name => 'Joe', ssn => '123-45-6789' );
824           my $person2 = Person->new( name => 'Mary', ssn => '123-47-0000' );
825           my $person3 = Person->new( name => 'Pete', ssn => '999-45-2222' );
826           my $person4 = Person->new( name => 'Peggy', ssn => '123-45-9999' );
827           my $person5 = Person->new( name => 'Frank', ssn => '000-45-9999' );
828
829       If you did this:
830
831           my $array1 = [ $person1, $person2, $person4 ];
832           my $array2 = [ $person1, $person3, $person4, $person5 ];
833           Algorithm::Diff::diff( $array1, $array2 );
834
835       everything would work out OK (each of the objects would be converted
836       into a string like "Person=HASH(0x82425b0)" for comparison).
837
838       But if you did this:
839
840           my $array1 = [ $person1, $person2, $person4 ];
841           my $array2 = [ $person1, $person3, $person4->clone(), $person5 ];
842           Algorithm::Diff::diff( $array1, $array2 );
843
844       $person4 and $person4->clone() (which have the same name and SSN) would
845       be seen as different objects. If you wanted them to be considered
846       equivalent, you would have to pass in a key generation function:
847
848           my $array1 = [ $person1, $person2, $person4 ];
849           my $array2 = [ $person1, $person3, $person4->clone(), $person5 ];
850           Algorithm::Diff::diff( $array1, $array2, \&Person::hash );
851
852       This would use the 'ssn' field in each Person as a comparison key, and
853       so would consider $person4 and $person4->clone() as equal.
854
855       You may also pass additional parameters to the key generation function
856       if you wish.
857

ERROR CHECKING

859       If you pass these routines a non-reference and they expect a reference,
860       they will die with a message.
861

AUTHOR

863       This version released by Tye McQueen (http://perlmonks.org/?node=tye).
864

LICENSE

866       Parts Copyright (c) 2000-2004 Ned Konz.  All rights reserved.  Parts by
867       Tye McQueen.
868
869       This program is free software; you can redistribute it and/or modify it
870       under the same terms as Perl.
871

MAILING LIST

873       Mark-Jason still maintains a mailing list.  To join a low-volume
874       mailing list for announcements related to diff and Algorithm::Diff,
875       send an empty mail message to mjd-perl-diff-request@plover.com.
876

CREDITS

878       Versions through 0.59 (and much of this documentation) were written by:
879
880       Mark-Jason Dominus, mjd-perl-diff@plover.com
881
882       This version borrows some documentation and routine names from Mark-
883       Jason's, but Diff.pm's code was completely replaced.
884
885       This code was adapted from the Smalltalk code of Mario Wolczko
886       <mario@wolczko.com>, which is available at
887       ftp://st.cs.uiuc.edu/pub/Smalltalk/MANCHESTER/manchester/4.0/diff.st
888
889       "sdiff" and "traverse_balanced" were written by Mike Schilli
890       <m@perlmeister.com>.
891
892       The algorithm is that described in A Fast Algorithm for Computing
893       Longest Common Subsequences, CACM, vol.20, no.5, pp.350-353, May 1977,
894       with a few minor improvements to improve the speed.
895
896       Much work was done by Ned Konz (perl@bike-nomad.com).
897
898       The OO interface and some other changes are by Tye McQueen.
899

POD ERRORS

901       Hey! The above document had some coding errors, which are explained
902       below:
903
904       Around line 989:
905           You can't have =items (as at line 1021) unless the first thing
906           after the =over is an =item
907
908       Around line 1108:
909           Expected text after =item, not a number
910
911       Around line 1114:
912           Expected text after =item, not a number
913
914       Around line 1120:
915           Expected text after =item, not a number
916
917
918
919perl v5.16.3                      2006-07-31                Algorithm::Diff(3)
Impressum