1 LaTeXML::Core::Document(3) User Contributed Perl Documentation LaTeXML::Core::Document(3)
2
3
4
5 NAME
6 "LaTeXML::Core::Document" - represents an XML document under
7 construction.
8
9 DESCRIPTION
10 A "LaTeXML::Core::Document" represents an XML document being
11 constructed by LaTeXML, and also provides the methods for constructing
12 it. It extends LaTeXML::Common::Object.
13
14 LaTeXML will have digested the source material, resulting in a
15 LaTeXML::Core::List (from a LaTeXML::Core::Stomach) of
16 LaTeXML::Core::Box and LaTeXML::Core::Whatsit objects and sublists. At
17 this stage, a document is created, and it is responsible for `absorbing'
18 the digested material. Generally, LaTeXML::Core::Box and
19 LaTeXML::Core::List objects create text nodes, whereas
20 LaTeXML::Core::Whatsit objects create "XML" document fragments, elements
21 and attributes according to the defining
22 LaTeXML::Core::Definition::Constructor.
23
24 Most document construction occurs at a current insertion point where
25 material will be added, and which moves along with the inserted
26 material. The LaTeXML::Common::Model, derived from various
27 declarations and document type, is consulted to determine whether an
28 insertion is allowed and when elements may need to be automatically
29 opened or closed in order to carry out a given insertion. For example,
30 a "subsection" element will typically be closed automatically when a
31 "section" element is about to be opened.
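
The automatic closing described above can be pictured with a small toy
model. This is only a sketch in plain Perl, not LaTeXML's code: the
content model in %can_contain, the @open stack and the open_element
helper are all hypothetical stand-ins for the Model and the insertion
point.

```perl
use strict;
use warnings;

# Hypothetical content model: which children each element may contain.
my %can_contain = (
  document   => { section    => 1 },
  section    => { subsection => 1, para => 1 },
  subsection => { para       => 1 },
);

# Open elements, outermost first; the last entry is the insertion point.
my @open = ('document');
my @log;

# Open $tag, first closing open elements until one accepts it.
sub open_element {
  my ($tag) = @_;
  while (@open && !($can_contain{ $open[-1] } || {})->{$tag}) {
    my $closed = pop @open;
    push @log, "auto-close $closed";
  }
  die "No place to insert <$tag>\n" unless @open;
  push @open, $tag;
  push @log, "open $tag";
  return;
}

open_element($_) for qw(section subsection para section);
print "$_\n" for @log;
```

Opening the second "section" first closes the open "para",
"subsection" and "section", and the new element ends up directly
inside "document".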
32
33 In the methods described here, the term $qname is used for XML
34 qualified names. These are tag names with a namespace prefix. The
35 prefix should be one registered with the current Model, for use within
36 the code. This prefix is not necessarily the same as the one used in
37 any DTD, but should be mapped to a namespace URI that was
38 registered for the DTD.
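
As a rough illustration of the two prefix mappings, the sketch below
resolves a code-level $qname to a namespace URI and then to the name
used in the output document. The tables %code_ns and %doc_ns, the
document prefix "lx" and both helper functions are assumptions made
for this example; they are not part of the LaTeXML API.

```perl
use strict;
use warnings;

# Hypothetical registrations: prefix used in code => namespace URI,
# and namespace URI => prefix used in the output document.
my %code_ns = (ltx => 'http://dlmf.nist.gov/LaTeXML');
my %doc_ns  = ('http://dlmf.nist.gov/LaTeXML' => 'lx');

# Split a code-level qname and resolve its prefix to a namespace URI.
sub decode_qname {
  my ($qname) = @_;
  my ($prefix, $local) = split(/:/, $qname, 2);
  die "Not a prefixed name: $qname\n" unless defined $local;
  my $uri = $code_ns{$prefix} or die "Unregistered prefix: $prefix\n";
  return ($uri, $local);
}

# Re-express the name with the prefix registered for the document.
sub document_name {
  my ($qname) = @_;
  my ($uri, $local) = decode_qname($qname);
  my $prefix = $doc_ns{$uri};
  return (defined $prefix && length $prefix) ? "$prefix:$local" : $local;
}

print document_name('ltx:section'), "\n";    # prints "lx:section"
```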
39
40 The arguments named $node are an XML::LibXML node.
41
42 The methods here are grouped into three sections covering basic access
43 to the document, insertion methods at the current insertion point, and
44 less commonly used, lower-level, document manipulation methods.
45
46 Accessors
47 "$doc = $document->getDocument;"
48 Returns the "XML::LibXML::Document" currently being constructed.
49
50 "$model = $document->getModel;"
51 Returns the "LaTeXML::Common::Model" that represents the document
52 model used for this document.
53
54 "$node = $document->getNode;"
55 Returns the node at the current insertion point during
56 construction. This node is considered still to be `open'; any
57 insertions will go into it (if possible). The node will be an
58 "XML::LibXML::Element", "XML::LibXML::Text" or, initially,
59 "XML::LibXML::Document".
60
61 "$node = $document->getElement;"
62 Returns the closest ancestor to the current insertion point that is
63 an Element.
64
65 "@nodes = $document->getChildElement($node);"
66 Returns a list of the child elements, if any, of the $node.
67
68 "$node = $document->getLastChildElement($node);"
69 Returns the last child element of the $node, if it has one, else
70 undef.
71
72 "$node = $document->getFirstChildElement($node);"
73 Returns the first child element of the $node, if it has one, else
74 undef.
75
76 "@nodes = $document->findnodes($xpath,$node);"
77 Returns a list of nodes matching the given $xpath expression. The
78 context node for $xpath is $node, if given, otherwise it is the
79 document element.
80
81 "$node = $document->findnode($xpath,$node);"
82 Returns the first node matching the given $xpath expression. The
83 context node for $xpath is $node, if given, otherwise it is the
84 document element.
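
The effect of the context node can be seen with XML::LibXML alone. The
sketch below does not call the LaTeXML methods; it uses the underlying
XML::LibXML "findnodes" on a made-up document without namespaces, and
assumes that the methods above behave analogously.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = XML::LibXML->load_xml(string => <<'XML');
<document>
  <section><para>one</para><para>two</para></section>
  <section><para>three</para></section>
</document>
XML

my $root = $xml->documentElement;
my ($second) = $root->findnodes('section[2]');

# Relative path evaluated from the document element: every para.
my @all = $root->findnodes('section/para');
# A different context node restricts the search to its own subtree.
my @some = $second->findnodes('para');
# Keeping only the first match, as described for findnode.
my ($first) = $root->findnodes('section/para');

printf "%d %d %s\n", scalar(@all), scalar(@some), $first->textContent;
```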
85
86 "$qname = $document->getNodeQName($node);"
87 Returns the qualified name (localname with namespace prefix) of the
88 given $node. The namespace prefix mapping is the code mapping of
89 the current document model.
90
91 "$boolean = $document->canContain($tag,$child);"
92 Returns whether an element $tag can contain a child $child. $tag
93 and $child can be nodes, qualified names of nodes
94 (prefix:localname), or one of a set of special symbols "#PCDATA",
95 "#Comment", "#Document" or "#ProcessingInstruction".
96
97 "$boolean = $document->canContainIndirect($tag,$child);"
98 Returns whether an element $tag can contain a child $child either
99 directly, or after automatically opening one or more autoOpen-able
100 elements.
101
102 "$boolean = $document->canContainSomehow($tag,$child);"
103 Returns whether an element $tag can contain a child $child either
104 directly, or after automatically opening one or more autoOpen-able
105 elements.
106
107 "$boolean = $document->canHaveAttribute($tag,$attrib);"
108 Returns whether an element $tag can have an attribute named
109 $attrib.
110
111 "$boolean = $document->canAutoOpen($tag);"
112 Returns whether an element $tag is able to be automatically opened.
113
114 "$boolean = $document->canAutoClose($node);"
115 Returns whether the node $node can be automatically closed.
116
117 Construction Methods
118 These methods are the most common ones used for construction of
119 documents. They generally operate by creating new material at the
120 current insertion point. That point initially is just the document
121 itself, but it moves along to follow any new insertions. These methods
122 also adapt to the document model so as to automatically open or close
123 elements, when it is required for the pending insertion and allowed by
124 the document model (see Tag).
125
126 "$xmldoc = $document->finalize;"
127 This method finalizes the document by cleaning up various temporary
128 attributes, and returns the XML::LibXML::Document that was
129 constructed.
130
131 "@nodes = $document->absorb($digested);"
132 Absorb the $digested object into the document at the current
133 insertion point according to its type. Various of the other
134 methods are invoked as needed, and document nodes may be
135 automatically opened or closed according to the document model.
136
137 This method returns the nodes that were constructed. Note that the
138 nodes may include children of other nodes, and nodes that may
139 already have been removed from the document (see filterChildren and
140 filterDeletions). Also, text insertions are often merged with
141 existing text nodes; in such cases, the whole text node is included
142 in the result.
143
144 "$document->insertElement($qname,$content,%attributes);"
145 This is a shorthand for creating an element $qname (with given
146 attributes), absorbing $content from within that new node, and then
147 closing it. The $content must be digested material, either a
148 single box, or an array of boxes, which will be absorbed into the
149 element. This method returns the newly created node, although it
150 will no longer be the current insertion point.
151
152 "$document->insertMathToken($string,%attributes);"
153 Insert a math token (XMTok) containing the string $string with the
154 given attributes. Useful attributes would be name, role, font.
155 Returns the newly inserted node.
156
157 "$document->insertComment($text);"
158 Insert, and return, a comment with the given $text into the current
159 node.
160
161 "$document->insertPI($op,%attributes);"
162 Insert, and return, a ProcessingInstruction into the current node.
163
164 "$document->openText($text,$font);"
165 Open a text node in font $font, performing any required automatic
166 opening and closing of intermediate nodes (including those needed
167 for font changes) and inserting the string $text into it.
168
169 "$document->openElement($qname,%attributes);"
170 Open an element, named $qname and with the given attributes. This
171 will be inserted into the current node while performing any
172 required automatic opening and closing of intermediate nodes. The
173 new element is returned, and also becomes the current insertion
174 point. An error (fatal if in "Strict" mode) is signalled if there
175 is no allowed way to insert such an element into the current node.
176
177 "$document->closeElement($qname);"
178 Close the closest open element named $qname including any
179 intermediate nodes that may be automatically closed. If that is not
180 possible, signal an error. The closed node's parent becomes the
181 current node. This method returns the closed node.
182
183 "$node = $document->isOpenable($qname);"
184 Check whether it is possible to open a $qname element at the
185 current insertion point.
186
187 "$node = $document->isCloseable($qname);"
188 Check whether it is possible to close a $qname element, returning
189 the node that would be closed if possible, otherwise undef.
190
191 "$document->maybeCloseElement($qname);"
192 Close a $qname element, if it is possible to do so. Returns the
193 closed node if it was found, else undef.
194
195 "$document->addAttribute($key=>$value);"
196 Add the given attribute to the node nearest to the current
197 insertion point that is allowed to have it. This does not change
198 the current insertion point.
199
200 "$document->closeToNode($node);"
201 This method closes all children of $node until $node becomes the
202 insertion point. Note that it closes any open nodes, not only
203 autoCloseable ones.
204
205 Internal Insertion Methods
206
207 These are described as an aid to understanding the code; they should
208 rarely, if ever, be used outside this module.
209
210 "$document->setNode($node);"
211 Sets the current insertion point to be $node. This should be
212 rarely used, if at all; the construction methods of the document
213 generally maintain the notion of insertion point automatically.
214 This may be useful to allow insertion into a different part of the
215 document, but you probably want to set the insertion point back to
216 the previous node, afterwards.
217
218 "$string = $document->getInsertionContext($levels);"
219 For debugging, return a string showing the context of the current
220 insertion point; that is, the string of the nodes leading up to it.
221 If $levels is defined, show only that many nodes.
222
223 "$node = $document->find_insertion_point($qname);"
224 This internal method is used to find the appropriate point,
225 relative to the current insertion point, at which an element with
226 the specified $qname can be inserted. That position may require
227 automatic opening or closing of elements, according to what is
228 allowed by the document model.
229
230 "@nodes = getInsertionCandidates($node);"
231 Returns a list of elements where an arbitrary insertion might take
232 place. Roughly this is a list starting with $node, followed by its
233 parent and the parent's siblings (in reverse order), followed by the
234 grandparent and its siblings (in reverse order).
235
236 "$node = $document->floatToElement($qname);"
237 Finds the nearest element at or preceding the current insertion
238 point (see "getInsertionCandidates"), that can accept an element
239 $qname; it moves the insertion point to that point, and returns the
240 previous insertion point. Generally, after doing whatever you need
241 at the new insertion point, you should call
242 "$document->setNode($node);" to restore the insertion point. If no
243 such point is found, the insertion point is left unchanged, and
244 undef is returned.
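
The save, work, restore pattern described for "floatToElement" can be
sketched with a toy tree in plain Perl. Everything here is
hypothetical: the hash-based nodes, the %can_contain table and the
float_to_element helper only mimic the behavior described above, and
the search is simplified to ancestors only, ignoring the sibling
candidates that "getInsertionCandidates" also considers.

```perl
use strict;
use warnings;

# Hypothetical content model and a tiny tree of hash-based nodes.
my %can_contain = (section => { note => 1, para => 1 }, para => { text => 1 });
my $section = { name => 'section', parent => undef };
my $para    = { name => 'para',    parent => $section };
my $text    = { name => 'text',    parent => $para };

my $point = $text;    # current insertion point

# Move the insertion point up to the nearest node accepting $qname;
# return the previous insertion point, or nothing if none accepts it.
sub float_to_element {
  my ($qname) = @_;
  for (my $n = $point; $n; $n = $n->{parent}) {
    if (($can_contain{ $n->{name} } || {})->{$qname}) {
      my $saved = $point;
      $point = $n;
      return $saved;
    }
  }
  return;
}

my @log;
if (my $saved = float_to_element('note')) {
  push @log, "insert note into $point->{name}";
  $point = $saved;    # plays the role of $document->setNode($saved)
}
push @log, "back at $point->{name}";
print "$_\n" for @log;
```

When no candidate accepts the element, the helper returns nothing and
leaves the insertion point where it was, so the restore step is
skipped.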
245
246 "$node = $document->floatToAttribute($key);"
247 This method works the same as "floatToElement", but finds the
248 nearest element that can accept the attribute $key.
249
250 "$node = $document->openText_internal($text);"
251 This is an internal method, used by "openText", that assumes the
252 insertion point has been appropriately adjusted.
253
254 "$node = $document->openMathText_internal($text);"
255 This internal method appends $text to the current insertion point,
256 which is assumed to be a math node. It checks for math ligatures
257 and carries out any combinations called for.
258
259 "$node = $document->closeText_internal();"
260 This internal method closes the current node, which should be a
261 text node. It carries out any text ligatures on the content.
262
263 "$node = $document->closeNode_internal($node);"
264 This internal method closes any open text or element nodes starting
265 at the current insertion point, up to and including $node.
266 Afterwards, the parent of $node will be the current insertion
267 point. It condenses the tree to avoid redundant font switching
268 elements.
269
270 "$document->afterOpen($node);"
271 Carries out any afterOpen operations that have been recorded (using
272 "Tag") for the element name of $node.
273
274 "$document->afterClose($node);"
275 Carries out any afterClose operations that have been recorded
276 (using "Tag") for the element name of $node.
277
278 Document Modification
279 The following methods are used to perform various sorts of modification
280 and rearrangements of the document, after the normal flow of insertion
281 has taken place. These may be needed after an environment (or perhaps
282 the whole document) has been completed and one needs to analyze what it
283 contains to decide on the appropriate representation.
284
285 "$document->setAttribute($node,$key,$value);"
286 Sets the attribute $key to $value on $node. This method is
287 preferred over the direct LibXML one, since it takes care of
288 decoding namespaces (if $key is a qname), and also manages
289 recording of xml:id's.
290
291 "$document->recordID($id,$node);"
292 Records the association of the given $node with the $id, which
293 should be the "xml:id" attribute of the $node. Usually this
294 association will be maintained by the methods that create nodes or
295 set attributes.
296
297 "$document->unRecordID($id);"
298 Removes the node associated with the given $id, if any. This might
299 be needed if a node is deleted.
300
301 "$document->modifyID($id);"
302 Adjusts $id, if needed, so that it is unique. It does this by
303 appending a letter and incrementing until it finds an id that is
304 not yet associated with a node.
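
One way to read "appending a letter and incrementing" is sketched
below. The %id_to_node registry and the unique_id function are
hypothetical, and the exact suffix sequence used by LaTeXML may differ;
the sketch relies on Perl's string increment, which steps "a" to "b"
and "z" to "aa".

```perl
use strict;
use warnings;

# Hypothetical registry of ids already in use (id => node).
my %id_to_node = (S1 => 'node-1', S1a => 'node-2');

# Return $id unchanged if it is free; otherwise append a letter suffix,
# incrementing it until the combined id is unused.
sub unique_id {
  my ($id) = @_;
  return $id unless exists $id_to_node{$id};
  my $suffix = 'a';
  $suffix++ while exists $id_to_node{ $id . $suffix };
  return $id . $suffix;
}

print unique_id('S1'), ' ', unique_id('S2'), "\n";    # prints "S1b S2"
```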
305
306 "$node = $document->lookupID($id);"
307 Returns the node, if any, that is associated with the given $id.
308
309 "$document->setNodeBox($node,$box);"
310 Records the $box (being a Box, Whatsit or List), that was
311 (presumably) responsible for the creation of the element $node.
312 This information is useful for determining source locations,
313 original TeX strings, and so forth.
314
315 "$box = $document->getNodeBox($node);"
316 Returns the $box that was responsible for creating the element
317 $node.
318
319 "$document->setNodeFont($node,$font);"
320 Records the font object that encodes the font that should be used
321 to display any text within the element $node.
322
323 "$font = $document->getNodeFont($node);"
324 Returns the font object associated with the element $node.
325
326 "$node = $document->openElementAt($point,$qname,%attributes);"
327 Opens a new child element in $point with the qualified name $qname
328 and with the given attributes. This method is not affected by, nor
329 does it affect, the current insertion point. It does manage
330 namespaces, xml:id's and associating a box, font and locator with
331 the new element, as well as running any "afterOpen" operations.
332
333 "$node = $document->closeElementAt($node);"
334 Closes $node. This method is not affected by, nor does it affect,
335 the current insertion point. However, it does run any "afterClose"
336 operations, so any element that was created using the lower-level
337 "openElementAt" should be closed using this method.
338
339 "$node = $document->appendClone($node,@newchildren);"
340 Appends clones of @newchildren to $node. This method modifies any
341 ids found within @newchildren (using "modifyID"), and fixes up any
342 references to those ids within the clones so that they refer to the
343 modified id.
344
345 "$node = $document->wrapNodes($qname,@nodes);"
346 This method wraps the @nodes in a new element with qualified name
347 $qname; the new element replaces the first of @nodes. The remaining
348 nodes in @nodes must be following siblings of the first one.
349
350 NOTE: Does this need multiple nodes? If so, perhaps some kind of
351 movenodes helper? Otherwise, what about attributes?
352
353 "$node = $document->unwrapNodes($node);"
354 Unwrap the children of $node, by replacing $node by its children.
355
356 "$node = $document->replaceNode($node,@nodes);"
357 Replace $node by @nodes; presumably they are some sort of
358 descendant nodes.
359
360 "$node = $document->renameNode($node,$newname);"
361 Rename $node to the tagname $newname; equivalently replace $node by
362 a new node with name $newname and copy the attributes and contents.
363 It is assumed that $newname can contain those attributes and
364 contents.
365
366 "@nodes = $document->filterDeletions(@nodes);"
367 This function is useful with "$doc->absorb($box)", when you want to
368 filter out any nodes that have been deleted and no longer appear in
369 the document.
370
371 "@nodes = $document->filterChildren(@nodes);"
372 This function is useful with "$doc->absorb($box)", when you want to
373 filter out any nodes that are children of other nodes in @nodes.
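
The two filters can be illustrated on toy nodes. This is a sketch
under assumptions: the nodes are plain hashes with a parent link, a
deleted node is modeled as one with no path to the root, and
filter_children drops any node that has an ancestor in the list. The
function names are hypothetical, and the real methods operate on
XML::LibXML nodes.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $root = { name => 'document', parent => undef };
my $sec  = { name => 'section',  parent => $root };
my $para = { name => 'para',     parent => $sec };
my $gone = { name => 'note',     parent => undef };    # detached node

# Keep only nodes whose topmost ancestor is the document root.
sub filter_deletions {
  my (@nodes) = @_;
  return grep {
    my $top = $_;
    $top = $top->{parent} while $top->{parent};
    refaddr($top) == refaddr($root);
  } @nodes;
}

# Keep only nodes that have no ancestor among the listed nodes.
sub filter_children {
  my (@nodes) = @_;
  my %listed;
  $listed{ refaddr($_) } = 1 for @nodes;
  return grep {
    my $keep = 1;
    for (my $anc = $_->{parent}; $anc; $anc = $anc->{parent}) {
      if ($listed{ refaddr($anc) }) { $keep = 0; last; }
    }
    $keep;
  } @nodes;
}

my @made = ($sec, $para, $gone);
print join(' ', map { $_->{name} } filter_deletions(@made)), "\n";
print join(' ', map { $_->{name} } filter_children(@made)),  "\n";
```

Here the first filter keeps "section" and "para", while the second
keeps "section" and the detached "note".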
374
375 AUTHOR
376 Bruce Miller <bruce.miller@nist.gov>
377
378 COPYRIGHT
379 Public domain software, produced as part of work done by the United
380 States Government & not subject to copyright in the US.
381
382
383
384 perl v5.32.0 2020-11-17 LaTeXML::Core::Document(3)