uwildmat(3)               InterNetNews Documentation               uwildmat(3)

NAME

       uwildmat, uwildmat_simple, uwildmat_poison - Perform wildmat matching

SYNOPSIS

       #include <libinn.h>

       bool uwildmat(const char *text, const char *pattern);

       bool uwildmat_simple(const char *text, const char *pattern);

       enum uwildmat uwildmat_poison(const char *text, const char *pattern);

DESCRIPTION

       uwildmat compares text against the wildmat expression pattern,
       returning true if and only if the expression matches the text.  "@" has
       no special meaning in pattern when passed to uwildmat.  Both text and
       pattern are assumed to be in the UTF-8 character encoding, although
       malformed UTF-8 sequences are treated in a way that attempts to be
       mostly compatible with single-octet character sets like ISO 8859-1.
       (In other words, if you try to match ISO 8859-1 text with these
       routines, everything should work as expected unless the ISO 8859-1 text
       contains valid UTF-8 sequences, which thankfully is somewhat rare.)
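
       As a rough illustration of the fallback described above, the following
       sketch computes how many bytes a single character occupies.
       char_length is a hypothetical helper, not a libinn function, and
       treating every malformed or truncated sequence as a single octet is an
       assumption about the behavior, not a description of INN's code.  The
       sketch does not reject overlong or out-of-range encodings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libinn): return the number of bytes in
 * the character starting at p, where end points one past the last byte of
 * the string.  A lead byte that is not followed by the right number of
 * continuation bytes (10xxxxxx) is counted as a one-byte character, which
 * is the assumed fallback for ISO 8859-1 style input.
 */
static size_t
char_length(const unsigned char *p, const unsigned char *end)
{
    size_t length, i;

    assert(p != NULL && p < end);
    if (*p < 0x80)
        return 1;                       /* plain ASCII */
    else if ((*p & 0xE0) == 0xC0)
        length = 2;                     /* 110xxxxx */
    else if ((*p & 0xF0) == 0xE0)
        length = 3;                     /* 1110xxxx */
    else if ((*p & 0xF8) == 0xF0)
        length = 4;                     /* 11110xxx */
    else
        return 1;                       /* stray continuation or invalid byte */
    if ((size_t) (end - p) < length)
        return 1;                       /* truncated sequence */
    for (i = 1; i < length; i++)
        if ((p[i] & 0xC0) != 0x80)
            return 1;                   /* malformed: fall back to one octet */
    return length;
}
```

       Under these assumptions, the two-byte UTF-8 encoding of an accented
       letter counts as one character, while the same letter encoded as a
       single ISO 8859-1 octet followed by ASCII text also counts as one
       character of length one.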

       uwildmat_simple is identical to uwildmat except that neither "!" nor
       "," has any special meaning and pattern is always treated as a single
       pattern.  This function exists solely to support legacy interfaces like
       NNTP's XPAT command, and should be avoided when implementing new
       features.

       uwildmat_poison works similarly to uwildmat, except that "@" as the
       first character of one of the patterns in the expression (see below)
       "poisons" the match if that pattern matches.  uwildmat_poison returns
       UWILDMAT_MATCH if the expression matches the text, UWILDMAT_FAIL if it
       doesn't, and UWILDMAT_POISON if the expression doesn't match because a
       poisoned pattern matched the text.  These enumeration constants are
       defined in the libinn.h header.

WILDMAT EXPRESSIONS

       A wildmat expression follows rules similar to those of shell filename
       wildcards but with some additions and changes.  A wildmat expression is
       composed of one or more wildmat patterns separated by commas.  Each
       character in the wildmat pattern matches a literal occurrence of that
       same character in the text, with the exception of the following
       metacharacters:

       ?       Matches any single character (including a single UTF-8
               multibyte character, so "?" can match more than one byte).

       *       Matches any sequence of zero or more characters.

       \       Turns off any special meaning of the following character; the
               following character will match itself in the text.  "\" will
               escape any character, including another backslash or a comma
               that otherwise would separate a pattern from the next pattern
               in an expression.  Note that "\" is not special inside a
               character set (no metacharacters are).

       [...]   A character set, which matches any single character that falls
               within that set.  The presence of a character between the
               brackets adds that character to the set; for example, "[amv]"
               specifies the set containing the characters "a", "m", and "v".
               A range of characters may be specified using "-"; for example,
               "[0-5abc]" is equivalent to "[012345abc]".  The order of
               characters is as defined in the UTF-8 character set, and if the
               start character of such a range falls after the ending
               character of the range in that ranking, the results of
               attempting a match with that pattern are undefined.

               In order to include a literal "]" character in the set, it must
               be the first character of the set (possibly following "^"); for
               example, "[]a]" matches either "]" or "a".  To include a
               literal "-" character in the set, it must be either the first
               or the last character of the set.  Backslashes have no special
               meaning inside a character set, nor do any other of the wildmat
               metacharacters.

       [^...]  A negated character set.  Follows the same rules as a character
               set above, but matches any character not contained in the set.
               So, for example, "[^]-]" matches any character except "]" and
               "-".
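
       The character set rules above can be sketched as follows.  match_set is
       a hypothetical name, not a libinn function, and the sketch compares
       single bytes only, whereas the real routines work on UTF-8 characters.
       A reversed range such as "z-a" simply matches nothing here; the
       documented behavior for that case is undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libinn).  p points just past the opening
 * "[" of a set.  On success, store in *matched whether the byte c is
 * accepted by the set and return a pointer just past the closing "]".
 * Return NULL if the set is never closed.
 */
static const char *
match_set(const char *p, unsigned char c, bool *matched)
{
    bool negated = false;
    bool found = false;
    bool first = true;

    assert(p != NULL && matched != NULL);
    if (*p == '^') {
        negated = true;
        p++;
    }

    /* A "]" is a literal member only as the first character of the set. */
    for (; *p != '\0' && (first || *p != ']'); first = false) {
        unsigned char low = (unsigned char) *p++;
        unsigned char high = low;

        /* "-" forms a range unless it is the last character of the set. */
        if (p[0] == '-' && p[1] != ']' && p[1] != '\0') {
            high = (unsigned char) p[1];
            p += 2;
        }
        if (c >= low && c <= high)
            found = true;
    }
    if (*p != ']')
        return NULL;                    /* unterminated set */
    *matched = (found != negated);
    return p + 1;
}
```

       No backslash handling appears in the loop because, as described above,
       no metacharacter is special inside a set.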

       In addition, "!" (and, when uwildmat_poison is used, "@") has special
       meaning as the first character of a pattern; see below.

       When matching a wildmat expression against some text, each
       comma-separated pattern is matched in order from left to right.  In
       order to match, the pattern must match the whole text; in regular
       expression terminology, it's implicitly anchored at both the beginning
       and the end.  For example, the pattern "a" matches only the text "a";
       it doesn't match "ab" or "ba" or even "aa".  If none of the patterns
       match, the whole expression doesn't match.  Otherwise, whether the
       expression matches is determined entirely by the rightmost matching
       pattern; the expression matches the text if and only if the rightmost
       matching pattern is not negated.

       For example, consider the text "news.misc".  The expression "*" matches
       this text, of course, as does "comp.*,news.*" (because the second
       pattern matches).  "news.*,!news.misc" does not match this text because
       both patterns match, meaning that the rightmost takes precedence, and
       the rightmost matching pattern is negated.  "news.*,!news.misc,*.misc"
       does match this text, since the rightmost matching pattern is not
       negated.

       Note that the expression "!news.misc" can't match anything.  Either the
       pattern doesn't match, in which case no patterns match and the
       expression doesn't match, or the pattern does match, in which case
       because it's negated the expression doesn't match.  "*,!news.misc", on
       the other hand, is a useful pattern that matches anything except
       "news.misc".
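
       The rightmost-match rule and the examples above can be sketched as
       follows.  This is a simplified illustration, not INN's implementation:
       match_one and match_expression are hypothetical names, the
       single-pattern matcher handles only "*", "?", and "\", and it compares
       bytes, not UTF-8 characters.  Character sets are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Match the whole of text against the first len bytes of pat.  Only "*",
 * "?", and "\" are special in this sketch.
 */
static bool
match_one(const char *text, const char *pat, size_t len)
{
    if (len == 0)
        return *text == '\0';           /* anchored at the end */
    if (*pat == '*') {
        /* Try every possible length for the star, including zero. */
        for (;; text++) {
            if (match_one(text, pat + 1, len - 1))
                return true;
            if (*text == '\0')
                return false;
        }
    }
    if (*text == '\0')
        return false;
    if (*pat == '?')
        return match_one(text + 1, pat + 1, len - 1);
    if (*pat == '\\' && len > 1) {
        pat++;                          /* next byte is taken literally */
        len--;
    }
    return *text == *pat && match_one(text + 1, pat + 1, len - 1);
}

/* Evaluate a comma-separated expression; the rightmost match decides. */
static bool
match_expression(const char *text, const char *expr)
{
    bool matched = false;
    const char *start = expr;
    const char *p = expr;

    assert(text != NULL && expr != NULL);
    for (;;) {
        if (*p == '\\' && p[1] != '\0') {
            p += 2;                     /* skip an escaped byte, maybe "," */
            continue;
        }
        if (*p == ',' || *p == '\0') {
            bool negated = (*start == '!');
            const char *pat = negated ? start + 1 : start;

            /* A later matching pattern overrides any earlier result. */
            if (match_one(text, pat, (size_t) (p - pat)))
                matched = !negated;
            if (*p == '\0')
                return matched;
            start = p + 1;
        }
        p++;
    }
}
```

       With this sketch, match_expression("news.misc", "news.*,!news.misc")
       returns false and match_expression("news.misc",
       "news.*,!news.misc,*.misc") returns true, mirroring the examples above.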

       "!" has significance only as the first character of a pattern; anywhere
       else in the pattern, it matches a literal "!" in the text like any
       other non-metacharacter.

       If the uwildmat_poison interface is used, then "@" behaves the same as
       "!" except that if an expression fails to match because the rightmost
       matching pattern began with "@", UWILDMAT_POISON is returned instead of
       UWILDMAT_FAIL.
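
       The three-way result can be sketched as below.  To keep the focus on
       the poison bookkeeping, the single-pattern matcher is passed in as a
       parameter, and a toy matcher that understands only a trailing "*" is
       used for demonstration.  All names here (sketch_result, match_fn,
       prefix_match, match_poison) are hypothetical and not part of libinn,
       and backslash escapes are ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the UWILDMAT_* constants, named differently on purpose. */
enum sketch_result { SKETCH_FAIL, SKETCH_MATCH, SKETCH_POISON };

/* Matches text against the first len bytes of pat. */
typedef bool (*match_fn)(const char *text, const char *pat, size_t len);

/* Toy matcher: a literal string, optionally ending in a single "*". */
static bool
prefix_match(const char *text, const char *pat, size_t len)
{
    if (len > 0 && pat[len - 1] == '*')
        return strncmp(text, pat, len - 1) == 0;
    return strlen(text) == len && strncmp(text, pat, len) == 0;
}

/* The rightmost matching pattern decides among the three outcomes. */
static enum sketch_result
match_poison(const char *text, const char *expr, match_fn match)
{
    enum sketch_result result = SKETCH_FAIL;
    const char *start = expr;
    const char *p;

    assert(text != NULL && expr != NULL && match != NULL);
    for (p = expr;; p++) {
        char flag;
        const char *pat;

        if (*p != ',' && *p != '\0')
            continue;
        flag = *start;
        pat = (flag == '!' || flag == '@') ? start + 1 : start;
        if (match(text, pat, (size_t) (p - pat))) {
            if (flag == '@')
                result = SKETCH_POISON;
            else if (flag == '!')
                result = SKETCH_FAIL;
            else
                result = SKETCH_MATCH;
        }
        if (*p == '\0')
            return result;
        start = p + 1;
    }
}
```

       A caller can then tell an ordinary failure from a poisoned one: with
       the expression "*,@alt.binaries.*", the text "alt.binaries.x" yields
       SKETCH_POISON while "comp.lang.c" yields SKETCH_MATCH.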

       If the uwildmat_simple interface is used, the matching rules are the
       same as above except that none of "!", "@", or "," have any special
       meaning at all and only match those literal characters.

BUGS

       All of these functions internally convert the passed arguments to const
       unsigned char pointers.  The only reason why they take regular char
       pointers instead of unsigned char is for the convenience of INN and
       other callers that may not be using unsigned char everywhere they
       should.  In a future revision, the public interface should be changed
       to just take unsigned char pointers.

HISTORY

       Written by Rich $alz <rsalz@uunet.uu.net> in 1986, and posted to Usenet
       several times since then, most notably in comp.sources.misc in March,
       1991.

       Lars Mathiesen <thorinn@diku.dk> enhanced the multi-asterisk failure
       mode in early 1991.

       Rich and Lars increased the efficiency of star patterns and reposted it
       to comp.sources.misc in April, 1991.

       Robert Elz <kre@munnari.oz.au> added minus sign and close bracket
       handling in June, 1991.

       Russ Allbery <rra@stanford.edu> added support for comma-separated
       patterns and the "!" and "@" metacharacters to the core wildmat
       routines in July, 2000.  He also added support for UTF-8 characters,
       changed the default behavior to assume that both the text and the
       pattern are in UTF-8, and largely rewrote this documentation to expand
       and clarify the description of how a wildmat expression matches.

       Please note that the interfaces to these functions are named uwildmat
       and the like rather than wildmat to distinguish them from the wildmat
       function provided by Rich $alz's original implementation.  While this
       code is heavily based on Rich's original code, it has substantial
       differences, including the extension to support UTF-8 characters, and
       has noticeable functionality changes.  Any bugs present in it aren't
       Rich's fault.

       $Id: uwildmat.3 5652 2002-08-24 17:25:23Z vinocur $

SEE ALSO

       grep(1), fnmatch(3), regex(3), regexp(3).

INN 2.4.0                         2002-08-10                       uwildmat(3)