PCRE2CONVERT(3) Library Functions Manual PCRE2CONVERT(3)

PCRE2 - Perl-compatible regular expressions (revised API)

This document describes a set of functions that can be used to convert
"foreign" patterns into PCRE2 regular expressions. This facility is
currently experimental, and may be changed in future releases. Two
kinds of pattern, globs and POSIX patterns, are supported.

pcre2_convert_context *pcre2_convert_context_create(
  pcre2_general_context *gcontext);

pcre2_convert_context *pcre2_convert_context_copy(
  pcre2_convert_context *cvcontext);

void pcre2_convert_context_free(pcre2_convert_context *cvcontext);

int pcre2_set_glob_escape(pcre2_convert_context *cvcontext,
  uint32_t escape_char);

int pcre2_set_glob_separator(pcre2_convert_context *cvcontext,
  uint32_t separator_char);

A convert context is used to hold parameters that affect the way that
pattern conversion works. As with all PCRE2 contexts, you need to use
one only if you want to override the defaults. There are the usual
create, copy, and free functions. If custom memory management functions
are set in a general context that is passed to
pcre2_convert_context_create(), they are used for all memory management
within the conversion functions.

There are only two parameters in the convert context at present. Both
apply only to glob conversions. The escape character defaults to grave
accent under Windows, otherwise backslash. It can be set to zero,
meaning no escape character, or to any punctuation character with a
code point less than 256. The separator character defaults to backslash
under Windows, otherwise forward slash. It can be set to forward slash,
backslash, or dot.

The two setting functions return zero on success, or
PCRE2_ERROR_BADDATA if their second argument is invalid.

int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length,
  uint32_t options, PCRE2_UCHAR **buffer,
  PCRE2_SIZE *blength, pcre2_convert_context *cvcontext);

void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern);

The first two arguments of pcre2_pattern_convert() define the foreign
pattern that is to be converted. The length may be given as
PCRE2_ZERO_TERMINATED. The options argument defines how the pattern is
to be processed. If the input is UTF, the PCRE2_CONVERT_UTF option
should be set. PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are
sure the input is valid. To define the type of conversion that is
required, either one or more of the following glob options, or exactly
one of the following POSIX options, must be set:

  PCRE2_CONVERT_GLOB
  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  PCRE2_CONVERT_GLOB_NO_STARSTAR
  PCRE2_CONVERT_POSIX_BASIC
  PCRE2_CONVERT_POSIX_EXTENDED

Details of the conversions are given below. The buffer and blength
arguments define how the output is handled:

If buffer is NULL, the function just returns the length of the
converted pattern via blength. This is one less than the length of the
buffer needed, because a terminating zero is always added to the
output.

If buffer points to a NULL pointer, an output buffer is obtained using
the allocator in the context, or malloc() if no context is supplied. A
pointer to this buffer is placed in the variable to which buffer
points. When no longer needed, the output buffer must be freed by
calling pcre2_converted_pattern_free(). If this function is called with
a NULL argument, it returns immediately without doing anything.

If buffer points to a non-NULL pointer, blength must be set to the
actual length of the buffer provided (in code units).

In all cases, after successful conversion, the variable pointed to by
blength is updated to the length actually used (in code units),
excluding the terminating zero that is always added.

If an error occurs, the length (via blength) is set to the offset
within the input pattern where the error was detected. Only gross
syntax errors are caught; there are plenty of errors that will get
passed on for pcre2_compile() to discover.

The return from pcre2_pattern_convert() is zero on success or a
non-zero PCRE2 error code. Note that PCRE2 error codes may be positive
or negative: pcre2_compile() uses mostly positive codes and
pcre2_match() negative ones; pcre2_pattern_convert() uses existing
codes of both kinds. A textual error message can be obtained by calling
pcre2_get_error_message().

Globs are used to match file names, and consequently have the concept
of a "path separator", which defaults to backslash under Windows and
forward slash otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards *
and ? are not permitted to match separator characters, but the
double-star (**) feature (which does match separators) is supported.

PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR converts globs with wildcards
allowed to match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR
converts globs with the double-star feature disabled. These options may
be given together.

POSIX defines two kinds of regular expression pattern: basic and
extended. These can be processed by setting PCRE2_CONVERT_POSIX_BASIC
or PCRE2_CONVERT_POSIX_EXTENDED, respectively.

In POSIX patterns, backslash is not special in a character class.
Unmatched closing parentheses are treated as literals.

In basic patterns, ? + | {} and () must be escaped to be recognized as
metacharacters outside a character class. If the first character in the
pattern is * it is treated as a literal. ^ is a metacharacter only at
the start of a branch.

In extended patterns, a backslash not in a character class always makes
the next character literal, whatever it is. There are no
backreferences.

Note: POSIX mandates that the longest possible match at the first
matching position must be found. This is not what pcre2_match() does;
it yields the first match that is found. An application can use
pcre2_dfa_match() to find the longest match. That function does not
support backreferences, but neither do POSIX extended patterns.

Philip Hazel
University Computing Service
Cambridge, England.

Last updated: 28 June 2018
Copyright (c) 1997-2018 University of Cambridge.

PCRE2 10.32 28 June 2018 PCRE2CONVERT(3)