XML::DifferenceMarkup(3)  User Contributed Perl Documentation  XML::DifferenceMarkup(3)

NAME
XML::DifferenceMarkup - XML diff and merge

SYNOPSIS
    use XML::DifferenceMarkup qw(make_diff);
    use XML::LibXML;

    $parser = XML::LibXML->new(keep_blanks => 0, load_ext_dtd => 0);
    $d1 = $parser->parse_file($fname1);
    $d2 = $parser->parse_file($fname2);

    $dom = make_diff($d1, $d2);
    print $dom->toString(1);

DESCRIPTION
This module implements an XML diff producing XML output. Both input and
output are DOM documents, as implemented by XML::LibXML.

The diff format used by XML::DifferenceMarkup is meant to be
human-readable (i.e. simple, as opposed to short) - basically the diff
is a subset of the input trees, annotated with instruction element
nodes specifying how to convert the source tree to the target by
inserting and deleting nodes. To prevent name collisions with input
trees, all added elements are in the namespace
"http://www.locus.cz/diffmark" (the diff will fail on input trees which
already use that namespace).

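Since inputs that already use the diffmark namespace make the diff
fail, a caller may want to check for that before diffing. The following
is a minimal sketch, not part of XML::DifferenceMarkup: the helper name
"uses_diffmark_ns" is hypothetical, and it assumes that "using the
namespace" means having an element or attribute in it (a namespace
declaration that is never used is not detected).

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace reserved for diff instructions, as stated in the description.
my $DIFFMARK_NS = 'http://www.locus.cz/diffmark';

# Hypothetical helper: count elements and attributes in the reserved namespace.
sub uses_diffmark_ns {
    my ($doc) = @_;
    my $xpath = qq{//*[namespace-uri()="$DIFFMARK_NS"]}
              . qq{ | //\@*[namespace-uri()="$DIFFMARK_NS"]};
    my @hits = $doc->findnodes($xpath);
    return scalar @hits;
}

my $parser = XML::LibXML->new(keep_blanks => 0, load_ext_dtd => 0);

# Illustrative inputs, inlined as strings.
my $plain  = $parser->parse_string('<a><b/></a>');
my $tagged = $parser->parse_string(qq{<dm:x xmlns:dm="$DIFFMARK_NS"/>});

print "plain input does not use the diffmark namespace\n"
    unless uses_diffmark_ns($plain);
print "tagged input uses the diffmark namespace\n"
    if uses_diffmark_ns($tagged);
```
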
The top-level node of the diff is always <diff/> (or rather <dm:diff
xmlns:dm="http://www.locus.cz/diffmark"> ... </dm:diff> - this
description omits the namespace specification from now on); under it
are fragments of the input trees and instruction nodes: <insert/>,
<delete/> and <copy/>. <copy/> is used in places where the input
subtrees are the same - in the limit, the diff of 2 identical documents
is

    <?xml version="1.0"?>
    <dm:diff xmlns:dm="http://www.locus.cz/diffmark">
      <dm:copy count="1"/>
    </dm:diff>

(copy always has the count attribute and no other content). <insert/>
and <delete/> have the obvious meaning - in the limit a diff of 2
documents which have nothing in common is something like

    <?xml version="1.0"?>
    <dm:diff xmlns:dm="http://www.locus.cz/diffmark">
      <dm:delete>
        <old/>
      </dm:delete>
      <dm:insert>
        <new>
          <tree>with the whole subtree, of course</tree>
        </new>
      </dm:insert>
    </dm:diff>

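Both limiting cases can be tried out with a short script. This is only
a sketch: the input documents are made up and inlined as strings, and
the exact serialization printed may differ from the listings above.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

my $parser = XML::LibXML->new(keep_blanks => 0, load_ext_dtd => 0);

# Hypothetical inputs, inlined as strings for illustration.
my $old  = $parser->parse_string('<old/>');
my $same = $parser->parse_string('<old/>');
my $new  = $parser->parse_string('<new><tree>text</tree></new>');

# Identical documents: the description says this yields a single dm:copy.
my $identical_diff = make_diff($old, $same);
print $identical_diff->toString(1);

# Unrelated documents: the description says this yields dm:delete and dm:insert.
my $unrelated_diff = make_diff($old, $new);
print $unrelated_diff->toString(1);
```
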
A combination of <insert/>, <delete/> and <copy/> can capture any
difference, but it's sub-optimal for the case where (for example) the
top-level elements in the two input documents differ while their
subtrees are exactly the same. This case is handled by putting the
element from the second document into the diff, adding to it a special
attribute dm:update (whose value is the element name from the first
document) marking the element change:

    <?xml version="1.0"?>
    <dm:diff xmlns:dm="http://www.locus.cz/diffmark">
      <top-of-second dm:update="top-of-first">
        <dm:copy count="42"/>
      </top-of-second>
    </dm:diff>

<delete/> contains just one level of nested nodes - their subtrees are
not included in the diff (but the element nodes which are included
always come with all their attributes). <insert/> and <delete/> don't
have any attributes and always contain some subtree.

Instruction nodes are never nested; all nodes above an instruction node
(except the top-level <diff/>) come from the input trees. A node from
the second input tree might be included in the output diff to provide
context for instruction nodes when it's an element node whose subtree
is not the same in the two input documents. When such an element has
the same name, attributes (names and values) and namespace declarations
in both input documents, it's always included in the diff (since its
subtrees differ, it is guaranteed to have some children there). If the
corresponding elements are different, the one from the second document
might still be included, with an added dm:update attribute, provided
that both corresponding elements have non-empty subtrees, and these
subtrees are so similar that deleting the first corresponding element
and inserting the second would lead to a larger diff. And if this
paragraph seems too complicated, don't despair - just ignore it and
look at some examples.

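One way to look at examples is to list the instruction nodes of a real
diff. The sketch below is an illustration only: the input documents are
made up, and it relies on XML::LibXML::XPathContext to bind the dm
prefix to the diffmark namespace. Which instructions actually appear
depends on the inputs.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;
use XML::DifferenceMarkup qw(make_diff);

my $DIFFMARK_NS = 'http://www.locus.cz/diffmark';
my $parser = XML::LibXML->new(keep_blanks => 0, load_ext_dtd => 0);

# Hypothetical inputs: the second document gains one <item>.
my $d1 = $parser->parse_string('<list><item>one</item></list>');
my $d2 = $parser->parse_string('<list><item>one</item><item>two</item></list>');

my $diff = make_diff($d1, $d2);

# Bind the dm prefix so XPath expressions can address instruction nodes.
my $xpc = XML::LibXML::XPathContext->new($diff);
$xpc->registerNs(dm => $DIFFMARK_NS);

# Instruction nodes found anywhere in the diff.
my @instructions = $xpc->findnodes('//dm:insert | //dm:delete | //dm:copy');
print $_->nodeName, "\n" for @instructions;

# Context elements taken from the second document under a changed name.
my @renamed = $xpc->findnodes('//*[@dm:update]');
print $_->nodeName, ' replaces ',
      $_->getAttributeNS($DIFFMARK_NS, 'update'), "\n"
    for @renamed;
```
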
FUNCTIONS
Note that XML::DifferenceMarkup functions must be explicitly imported
(i.e. with "use XML::DifferenceMarkup qw(make_diff merge_diff);")
before they can be called.

make_diff
"make_diff" takes 2 parameters (the input documents) and produces their
diff. Note that the diff is asymmetric - "make_diff($a, $b)" is
different from "make_diff($b, $a)".

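The asymmetry is easy to observe by diffing the same pair of documents
in both directions. A minimal sketch, with made-up one-element inputs:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

my $parser = XML::LibXML->new(keep_blanks => 0, load_ext_dtd => 0);

# Hypothetical inputs with nothing in common.
my $first  = $parser->parse_string('<a/>');
my $second = $parser->parse_string('<b/>');

# Diff in both directions and serialize for comparison.
my $forward  = make_diff($first, $second)->toString(1);
my $backward = make_diff($second, $first)->toString(1);

print $forward, $backward;
print "the two diffs differ\n" if $forward ne $backward;
```
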
merge_diff
"merge_diff" takes the first document passed to "make_diff" and its
return value and produces the second document. (More-or-less - the
document isn't canonicalized, so opinions on its "equality" may
differ.)

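A round trip through both functions might look like the sketch below.
The inputs are made up, and comparing the canonical (C14N) forms via
XML::LibXML's "toStringC14N" is just one possible notion of equality,
chosen here as an assumption; it is not something this module
prescribes.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff merge_diff);

my $parser = XML::LibXML->new(keep_blanks => 0, load_ext_dtd => 0);

# Hypothetical inputs differing in one text node.
my $d1 = $parser->parse_string('<list><item>one</item><item>two</item></list>');
my $d2 = $parser->parse_string('<list><item>one</item><item>three</item></list>');

# Diff, then apply the diff to the first document.
my $diff   = make_diff($d1, $d2);
my $merged = merge_diff($d1, $diff);

print $merged->toString(1);

# Compare canonical forms rather than raw serializations.
my $round_trip_ok = $merged->toStringC14N() eq $d2->toStringC14N();
print $round_trip_ok ? "round trip matches\n" : "round trip differs\n";
```
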
Error Handling
Both "make_diff" and "merge_diff" throw exceptions on invalid input -
their own exceptions as well as exceptions thrown by XML::LibXML. These
exceptions can usually (probably not always, though - it used to be
possible to construct an input which would crash the calling process)
be caught by calling the functions from an eval block.

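A minimal sketch of the eval approach follows. The wrapper name
"safe_diff" is hypothetical. The second call uses an input in the
diffmark namespace, which the description says makes the diff fail;
whether that surfaces as a catchable exception is an assumption here,
so the script reports either outcome.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

# Hypothetical wrapper: returns (diff, undef) on success, (undef, error) on failure.
sub safe_diff {
    my ($from, $to) = @_;
    my $result;
    my $ok = eval { $result = make_diff($from, $to); 1 };
    return $ok ? ($result, undef) : (undef, $@ || 'unknown error');
}

my $parser = XML::LibXML->new(keep_blanks => 0, load_ext_dtd => 0);
my $d1 = $parser->parse_string('<a><b/></a>');
my $d2 = $parser->parse_string('<a><c/></a>');

# Input that already uses the namespace reserved for the diff.
my $reserved = $parser->parse_string(
    '<dm:x xmlns:dm="http://www.locus.cz/diffmark"/>');

my ($diff, $err) = safe_diff($d1, $d2);
print defined $err ? "diff failed: $err\n" : $diff->toString(1);

my ($bad_diff, $bad_err) = safe_diff($d1, $reserved);
print defined $bad_err ? "diff failed: $bad_err\n" : "no exception was thrown\n";
```

Assigning inside the eval block and returning 1 avoids relying on $@
alone to decide whether the call succeeded.
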
BUGS
· information outside the document element is not processed

AUTHOR
Vaclav Barta <vbar@comp.cz>

SEE ALSO
XML::LibXML

perl v5.32.0  2020-07-28  XML::DifferenceMarkup(3)