HXUNENT(1)                      HTML-XML-utils                      HXUNENT(1)

NAME
hxunent - replace HTML predefined character entities by UTF-8

SYNOPSIS
hxunent [ -b ] [ -f ] [ file ]

DESCRIPTION
The hxunent command reads the file (or standard input) and copies it to
standard output with &-entities replaced by their equivalent characters
(encoded as UTF-8). E.g., &quot; is replaced by " and &lt; is replaced
by <.

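The replacement described above can be illustrated with a minimal Python
sketch. It is not hxunent's implementation: it relies on Python's own
html.unescape, whose entity table and edge cases may differ from
hxunent's, and the function name unent is hypothetical.

```python
import html


def unent(text: str) -> bytes:
    """Replace character entities, then encode the result as UTF-8.

    Illustration only: html.unescape follows HTML5 rules, which may not
    match hxunent in every edge case.
    """
    return html.unescape(text).encode("utf-8")
```

For instance, unent("&quot;") yields the single byte for a double quote,
and unent("&eacute;") yields the two-byte UTF-8 sequence for "é".
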
OPTIONS
The following options are supported:

-b    The five builtin entities of XML (&lt; &gt; &quot; &apos; &amp;)
      are not replaced but copied unchanged. This is necessary if the
      output has to be valid XML or SGML.

-f    This option changes how unknown entities or lone ampersands are
      handled. Normally they are copied unchanged, but this option
      tries to "fix" them by replacing ampersands by &amp;. Often such
      stray ampersands are the result of copying and pasting URLs into
      a document, in which case this option does fix them and makes
      the document valid.

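The effect of the two options can be sketched in Python as follows. This
is an approximation written from the descriptions above, not hxunent's
actual code. The function name unent and its parameters keep_builtins
(standing in for -b) and fix (standing in for -f) are hypothetical, the
entity table is Python's html.entities.html5, and numeric character
references are passed through untouched because the text above only
discusses named entities.

```python
import re
from html.entities import html5

# The five entities predefined by XML, which -b leaves alone.
XML_BUILTINS = {"lt", "gt", "quot", "apos", "amp"}

# Any ampersand, optionally followed by a complete reference body and ";".
AMP = re.compile(r"&(?:(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)?")


def unent(text, keep_builtins=False, fix=False):
    """Sketch of entity replacement with -b (keep_builtins) and -f (fix)."""

    def repl(match):
        name = match.group(1)
        if name is None:
            # Lone ampersand: copied unchanged unless fixing is requested.
            return "&amp;" if fix else "&"
        if name.startswith("#"):
            # Numeric reference: outside the scope of this sketch.
            return match.group(0)
        if keep_builtins and name in XML_BUILTINS:
            return match.group(0)
        char = html5.get(name + ";")
        if char is None:
            # Unknown entity: escape its ampersand only when fixing.
            return "&amp;" + match.group(0)[1:] if fix else match.group(0)
        return char

    return AMP.sub(repl, text)
```

Under these assumptions, unent("x?a=1&b=2", fix=True) returns
"x?a=1&amp;b=2", and unent("&eacute; &lt;", keep_builtins=True) returns
"é &lt;". The result is a Python string; encode it with
.encode("utf-8") to obtain the bytes that would be written out.
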
DIAGNOSTICS
The program's exit value is 0 if all went well, otherwise:

1     The input couldn't be read (file not found, file not readable...)

2     Wrong command line arguments.

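A calling script can branch on these exit values. The following Python
sketch shows one way to do so. The names EXIT_MEANINGS and
run_and_explain are hypothetical, and the helper takes the full argument
list so that it makes no assumption about where hxunent is installed.

```python
import subprocess

# Exit values as documented above.
EXIT_MEANINGS = {
    0: "ok",
    1: "input could not be read",
    2: "wrong command line arguments",
}


def run_and_explain(argv):
    """Run a command; return (exit value, meaning, captured stdout bytes)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    meaning = EXIT_MEANINGS.get(proc.returncode, "undocumented exit value")
    return proc.returncode, meaning, proc.stdout
```

Assuming hxunent is on the PATH and page.html is a placeholder file name,
run_and_explain(["hxunent", "-b", "page.html"]) would return the exit
value, its documented meaning, and the converted text.
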
SEE ALSO
asc2xml(1), xml2asc(1), UTF-8 (RFC 2279)

BUGS
The program assumes entities are as defined by HTML. It doesn't read a
document's DTD to find the actual definitions in use in a document.
With -f, it will even remove all entities that are not HTML entities.

5.x                               21 Nov 2008                       HXUNENT(1)