PCRE2CONVERT(3)            Library Functions Manual            PCRE2CONVERT(3)

NAME

       PCRE2 - Perl-compatible regular expressions (revised API)

EXPERIMENTAL PATTERN CONVERSION FUNCTIONS

       This document describes a set of functions that can be used to convert
       "foreign" patterns into PCRE2 regular expressions. This facility is
       currently experimental and may be changed in future releases. Two
       kinds of pattern, globs and POSIX patterns, are supported.

THE CONVERT CONTEXT

       pcre2_convert_context *pcre2_convert_context_create(
         pcre2_general_context *gcontext);

       pcre2_convert_context *pcre2_convert_context_copy(
         pcre2_convert_context *cvcontext);

       void pcre2_convert_context_free(pcre2_convert_context *cvcontext);

       int pcre2_set_glob_escape(pcre2_convert_context *cvcontext,
         uint32_t escape_char);

       int pcre2_set_glob_separator(pcre2_convert_context *cvcontext,
         uint32_t separator_char);

       A convert context is used to hold parameters that affect the way that
       pattern conversion works. As with all PCRE2 contexts, you need to use
       one only if you want to override the defaults. There are the usual
       create, copy, and free functions. If custom memory management functions
       are set in a general context that is passed to
       pcre2_convert_context_create(), they are used for all memory management
       within the conversion functions.

       There are only two parameters in the convert context at present. Both
       apply only to glob conversions. The escape character defaults to grave
       accent under Windows, otherwise backslash. It can be set to zero,
       meaning no escape character, or to any punctuation character with a
       code point less than 256. The separator character defaults to backslash
       under Windows, otherwise forward slash. It can be set to forward slash,
       backslash, or dot.

       The two setting functions return zero on success, or
       PCRE2_ERROR_BADDATA if their second argument is invalid.

THE CONVERSION FUNCTION

       int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length,
         uint32_t options, PCRE2_UCHAR **buffer,
         PCRE2_SIZE *blength, pcre2_convert_context *cvcontext);

       void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern);

       The first two arguments of pcre2_pattern_convert() define the foreign
       pattern that is to be converted. The length may be given as
       PCRE2_ZERO_TERMINATED. The options argument defines how the pattern is
       to be processed. If the input is UTF, the PCRE2_CONVERT_UTF option
       should be set. PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are
       sure the input is valid. One or more of the following glob options, or
       one of the following POSIX options, must be set to define the type of
       conversion that is required:

         PCRE2_CONVERT_GLOB
         PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
         PCRE2_CONVERT_GLOB_NO_STARSTAR
         PCRE2_CONVERT_POSIX_BASIC
         PCRE2_CONVERT_POSIX_EXTENDED

       Details of the conversions are given below. The buffer and blength
       arguments define how the output is handled:

       If buffer is NULL, the function just returns the length of the
       converted pattern via blength. This is one less than the length of
       buffer needed, because a terminating zero is always added to the
       output.

       If buffer points to a NULL pointer, an output buffer is obtained using
       the allocator in the context, or malloc() if no context is supplied. A
       pointer to this buffer is placed in the variable to which buffer
       points. When no longer needed, the output buffer must be freed by
       calling pcre2_converted_pattern_free(). If this function is called with
       a NULL argument, it returns immediately without doing anything.

       If buffer points to a non-NULL pointer, blength must be set to the
       actual length of the buffer provided (in code units).

       In all cases, after successful conversion, the variable pointed to by
       blength is updated to the length actually used (in code units),
       excluding the terminating zero that is always added.

       If an error occurs, the length (via blength) is set to the offset
       within the input pattern where the error was detected. Only gross
       syntax errors are caught; there are plenty of errors that will get
       passed on for pcre2_compile() to discover.

       The return from pcre2_pattern_convert() is zero on success or a
       non-zero PCRE2 error code. Note that PCRE2 error codes may be positive
       or negative: pcre2_compile() uses mostly positive codes and
       pcre2_match() negative ones; pcre2_pattern_convert() uses existing
       codes of both kinds. A textual error message can be obtained by calling
       pcre2_get_error_message().

CONVERTING GLOBS

       Globs are used to match file names, and consequently have the concept
       of a "path separator", which defaults to backslash under Windows and
       forward slash otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards *
       and ? are not permitted to match separator characters, but the
       double-star (**) feature (which does match separators) is supported.

       PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR converts globs with wildcards
       allowed to match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR
       converts globs with the double-star feature disabled. These options may
       be given together.

CONVERTING POSIX PATTERNS

       POSIX defines two kinds of regular expression pattern: basic and
       extended. These can be processed by setting PCRE2_CONVERT_POSIX_BASIC
       or PCRE2_CONVERT_POSIX_EXTENDED, respectively.

       In POSIX patterns, backslash is not special in a character class.
       Unmatched closing parentheses are treated as literals.

       In basic patterns, ? + | {} and () must be escaped to be recognized as
       metacharacters outside a character class. If the first character in the
       pattern is * it is treated as a literal. ^ is a metacharacter only at
       the start of a branch.

       In extended patterns, a backslash not in a character class always makes
       the next character literal, whatever it is. There are no
       backreferences.

       Note: POSIX mandates that the longest possible match at the first
       matching position must be found. This is not what pcre2_match() does;
       it yields the first match that is found. An application can use
       pcre2_dfa_match() to find the longest match, but that does not support
       backreferences (but then neither do POSIX extended patterns).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 28 June 2018
       Copyright (c) 1997-2018 University of Cambridge.

PCRE2 10.32                      28 June 2018                  PCRE2CONVERT(3)