Str(3) OCaml library Str(3)

Str - Regular expressions and high-level string processing

Module Str
 : sig end

Regular expressions and high-level string processing


=== Regular expressions ===

type regexp

The type of compiled regular expressions.


val regexp : string -> regexp

Compile a regular expression. The following constructs are recognized:

- . Matches any character except newline.

- * (postfix) Matches the preceding expression zero, one or several
times.

- + (postfix) Matches the preceding expression one or several times.

- ? (postfix) Matches the preceding expression once or not at all.

- [..] Character set. Ranges are denoted with -, as in [a-z]. An
initial ^, as in [^0-9], complements the set. To include a ] character
in a set, make it the first character of the set. To include a -
character in a set, make it the first or the last character of the set.

- ^ Matches at beginning of line: either at the beginning of the
matched string, or just after a '\n' character.

- $ Matches at end of line: either at the end of the matched string, or
just before a '\n' character.

- \| (infix) Alternative between two expressions.

- \(..\) Grouping and naming of the enclosed expression.

- \1 The text matched by the first \(...\) expression (\2 for the
second expression, and so on up to \9).

- \b Matches word boundaries.

- \ Quotes special characters. The special characters are $^\.*+?[].

Note: the argument to regexp is usually a string literal. In this case,
any backslash character in the regular expression must be doubled to
make it past the OCaml string parser. For example, the following
expression:
  let r = Str.regexp "hello \\([A-Za-z]+\\)" in
  Str.replace_first r "\\1" "hello world"
returns the string "world".

In particular, if you want a regular expression that matches a single
backslash character, you need to quote it in the argument to regexp
(according to the last item of the list above) by adding a second
backslash. Then you need to quote both backslashes (according to the
syntax of string constants in OCaml) by doubling them again, so you need
to write four backslash characters: Str.regexp "\\\\".
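
The doubling rule can be checked with a small sketch. It compiles the pattern from the note twice: once with doubled backslashes, and once as an OCaml quoted string literal ({|...|}), which needs no doubling. The value names are illustrative only, and the program is assumed to be linked with the str library.

```ocaml
(* Assumes the program is linked with the str library. *)

(* Doubled backslashes inside an ordinary string literal. *)
let r_escaped = Str.regexp "hello \\([A-Za-z]+\\)"

(* The same pattern as a quoted string literal: no doubling needed. *)
let r_quoted = Str.regexp {|hello \([A-Za-z]+\)|}

(* Four backslashes in the source give a regexp matching one backslash. *)
let r_backslash = Str.regexp "\\\\"

let () =
  print_endline (Str.replace_first r_escaped "\\1" "hello world");
  print_endline (Str.replace_first r_quoted {|\1|} "hello world");
  (* "a\\b" is the three characters a, \, b; position 1 is the backslash. *)
  Printf.printf "%b\n" (Str.string_match r_backslash "a\\b" 1)
```

Quoted string literals are a convenience of the OCaml lexer, not a feature of Str; both forms hand the same pattern text to Str.regexp.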


val regexp_case_fold : string -> regexp

Same as regexp, but the compiled expression will match text in a
case-insensitive way: uppercase and lowercase letters will be
considered equivalent.


val quote : string -> string

Str.quote s returns a regexp string that matches exactly s and nothing
else.


val regexp_string : string -> regexp

Str.regexp_string s returns a regular expression that matches exactly s
and nothing else.


val regexp_string_case_fold : string -> regexp

Str.regexp_string_case_fold is similar to Str.regexp_string, but the
regexp matches in a case-insensitive way.
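
A minimal sketch of the difference between compiling a string as a pattern and matching it literally. The sample text "a.b" and the helper matches are made up for illustration.

```ocaml
(* Assumes the program is linked with the str library. *)

let literal = "a.b"

(* Compiled as a pattern, the dot matches any character. *)
let r_raw = Str.regexp literal

(* Two ways to match the text literally. *)
let r_quote = Str.regexp (Str.quote literal)
let r_string = Str.regexp_string literal

(* Literal matching that ignores letter case. *)
let r_fold = Str.regexp_string_case_fold "Hello"

(* Illustrative helper: does r match s starting at position 0? *)
let matches r s = Str.string_match r s 0

let () =
  Printf.printf "%b %b %b %b\n"
    (matches r_raw "axb") (matches r_quote "axb")
    (matches r_string "a.b") (matches r_fold "hELLO")
```

Str.quote is the one to reach for when a literal fragment has to be embedded inside a larger pattern; regexp_string is enough when the whole pattern is literal.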



=== String matching and searching ===

val string_match : regexp -> string -> int -> bool

string_match r s start tests whether a substring of s that starts at
position start matches the regular expression r. The first character
of a string has position 0, as usual.
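
A short sketch of how the start position affects string_match; the sample strings are illustrative. The match is anchored at start and does not have to reach the end of the string.

```ocaml
(* Assumes the program is linked with the str library. *)

let digits = Str.regexp "[0-9]+"

(* Position 0 holds 'a', so there is no match there. *)
let at_start = Str.string_match digits "abc123" 0

(* Position 3 holds '1', so the match succeeds. *)
let at_three = Str.string_match digits "abc123" 3

(* Only a substring starting at the position has to match. *)
let prefix_only = Str.string_match digits "12ab" 0

let () = Printf.printf "%b %b %b\n" at_start at_three prefix_only
```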


val search_forward : regexp -> string -> int -> int

search_forward r s start searches the string s for a substring matching
the regular expression r. The search starts at position start and
proceeds towards the end of the string. Returns the position of the
first character of the matched substring.

Raises Not_found if no substring matches.


val search_backward : regexp -> string -> int -> int

search_backward r s last searches the string s for a substring matching
the regular expression r. The search first considers substrings that
start at position last and proceeds towards the beginning of the string.
Returns the position of the first character of the matched substring.

Raises Not_found if no substring matches.
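
Since both search functions raise Not_found, callers often wrap them. The option-returning helpers below (find_opt and rfind_opt) are hypothetical names for this sketch, not part of Str.

```ocaml
(* Assumes the program is linked with the str library. *)

(* Hypothetical wrapper: forward search that returns an option. *)
let find_opt r s start =
  match Str.search_forward r s start with
  | pos -> Some pos
  | exception Not_found -> None

(* Hypothetical wrapper: backward search that returns an option. *)
let rfind_opt r s last =
  match Str.search_backward r s last with
  | pos -> Some pos
  | exception Not_found -> None

let o = Str.regexp "o"
let s = "hello world"

let () =
  let show = function Some p -> string_of_int p | None -> "none" in
  print_endline (show (find_opt o s 0));
  print_endline (show (rfind_opt o s (String.length s - 1)))
```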


val string_partial_match : regexp -> string -> int -> bool

Similar to Str.string_match, but also returns true if the argument
string is a prefix of a string that matches. This includes the case of
a true complete match.
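
One way to read this: string_partial_match answers "could this input still become a match if more characters arrived?". A sketch with an illustrative pattern and helper name:

```ocaml
(* Assumes the program is linked with the str library. *)

let keyword = Str.regexp "hello"

(* Illustrative helper: is the input a match, or a prefix of one? *)
let could_still_match input = Str.string_partial_match keyword input 0

let () =
  List.iter
    (fun input -> Printf.printf "%s: %b\n" input (could_still_match input))
    ["hel"; "hello"; "help"]
```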


val matched_string : string -> string

matched_string s returns the substring of s that was matched by the
last call to one of the following matching or searching functions:

- Str.string_match
- Str.search_forward
- Str.search_backward
- Str.string_partial_match
- Str.global_substitute
- Str.substitute_first

provided that none of the following functions was called in between:

- Str.global_replace
- Str.replace_first
- Str.split
- Str.bounded_split
- Str.split_delim
- Str.bounded_split_delim
- Str.full_split
- Str.bounded_full_split

Note: in the case of global_substitute and substitute_first, a call to
matched_string is only valid within the subst argument, not after
global_substitute or substitute_first returns.

The user must make sure that the parameter s is the same string that
was passed to the matching or searching function.


val match_beginning : unit -> int

match_beginning() returns the position of the first character of the
substring that was matched by the last call to a matching or searching
function (see Str.matched_string for details).


val match_end : unit -> int

match_end() returns the position of the character following the last
character of the substring that was matched by the last call to a
matching or searching function (see Str.matched_string for details).
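
A sketch of reading the match state right after a successful search. The helper first_number is hypothetical; it passes the searched string back to matched_string, as required.

```ocaml
(* Assumes the program is linked with the str library. *)

let number = Str.regexp "[0-9]+"

(* Hypothetical helper: (matched text, start, end) of the first number. *)
let first_number s =
  match Str.search_forward number s 0 with
  | _ ->
    (* Read the match state before any other Str call can overwrite it. *)
    Some (Str.matched_string s, Str.match_beginning (), Str.match_end ())
  | exception Not_found -> None

let () =
  match first_number "price: 42 USD" with
  | Some (text, b, e) -> Printf.printf "%s at [%d, %d)\n" text b e
  | None -> print_endline "no number"
```

The end position is exclusive, so match_end () - match_beginning () is the length of the matched text.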


val matched_group : int -> string -> string

matched_group n s returns the substring of s that was matched by the
n-th group \(...\) of the regular expression that was matched by the
last call to a matching or searching function (see Str.matched_string
for details). The user must make sure that the parameter s is the same
string that was passed to the matching or searching function.

Raises Not_found if the n-th group of the regular expression was not
matched. This can happen with groups inside alternatives \|, options ?
or repetitions *. For instance, the empty string will match \(a\)*,
but matched_group 1 will raise Not_found because the first group
itself was not matched.
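
The sketch below extracts two groups and reproduces the \(a\)* case described above. The helpers parse_pair and group_opt are hypothetical names, not Str functions.

```ocaml
(* Assumes the program is linked with the str library. *)

let pair = Str.regexp "\\([a-z]+\\)=\\([0-9]+\\)"

(* Hypothetical helper: split "key=42" into its two groups. *)
let parse_pair s =
  if Str.string_match pair s 0
  then Some (Str.matched_group 1 s, Str.matched_group 2 s)
  else None

(* Hypothetical helper: group lookup that turns Not_found into None. *)
let group_opt n s =
  match Str.matched_group n s with
  | g -> Some g
  | exception Not_found -> None

let star = Str.regexp "\\(a\\)*"

(* The empty string matches, but group 1 took part in no repetition. *)
let unmatched_group =
  let s = "" in
  if Str.string_match star s 0 then group_opt 1 s else None

let () =
  match parse_pair "key=42" with
  | Some (k, v) -> Printf.printf "%s -> %s\n" k v
  | None -> print_endline "no match"
```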


val group_beginning : int -> int

group_beginning n returns the position of the first character of the
substring that was matched by the n-th group of the regular expression
that was matched by the last call to a matching or searching function
(see Str.matched_string for details).

Raises Not_found if the n-th group of the regular expression was not
matched.

Raises Invalid_argument if there are fewer than n groups in the regular
expression.


val group_end : int -> int

group_end n returns the position of the character following the last
character of the substring that was matched by the n-th group of the
regular expression that was matched by the last call to a matching or
searching function (see Str.matched_string for details).

Raises Not_found if the n-th group of the regular expression was not
matched.

Raises Invalid_argument if there are fewer than n groups in the regular
expression.
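
A sketch of turning group positions into (start, end) spans, with end exclusive. The helpers group_span and spans are hypothetical. An unmatched group is mapped to None, while Invalid_argument is deliberately left to propagate, on the assumption that a wrong group number is a programming error.

```ocaml
(* Assumes the program is linked with the str library. *)

let pair = Str.regexp "\\([a-z]+\\)=\\([0-9]+\\)"

(* Hypothetical helper: span of group n for the last match. *)
let group_span n =
  match Str.group_beginning n, Str.group_end n with
  | b, e -> Some (b, e)
  | exception Not_found -> None

(* Hypothetical helper: spans of both groups of the first match in s. *)
let spans s =
  match Str.search_forward pair s 0 with
  | _ -> [group_span 1; group_span 2]
  | exception Not_found -> []

let () =
  List.iter
    (function
      | Some (b, e) -> Printf.printf "[%d, %d)\n" b e
      | None -> print_endline "unmatched")
    (spans "x: key=42")
```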



=== Replacement ===

val global_replace : regexp -> string -> string -> string

global_replace regexp templ s returns a string identical to s, except
that all substrings of s that match regexp have been replaced by templ.
The replacement template templ can contain \1, \2, etc.; these
sequences will be replaced by the text matched by the corresponding
group in the regular expression. \0 stands for the text matched by the
whole regular expression.


val replace_first : regexp -> string -> string -> string

Same as Str.global_replace, except that only the first substring
matching the regular expression is replaced.
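
A brief sketch of replacement templates, using \0 for the whole match and \1, \2 for groups. The inputs are illustrative; note that the backslashes in the templates are doubled for the OCaml string parser.

```ocaml
(* Assumes the program is linked with the str library. *)

let digits = Str.regexp "[0-9]+"
let dashed = Str.regexp "\\([a-z]+\\)-\\([a-z]+\\)"

(* \0 is the whole match: wrap every number in angle brackets. *)
let bracketed = Str.global_replace digits "<\\0>" "a1b22"

(* \1 and \2 are the groups: swap the two sides of each dash. *)
let swapped = Str.global_replace dashed "\\2-\\1" "foo-bar baz-qux"

(* replace_first stops after the first match. *)
let first_only = Str.replace_first digits "N" "a1b22"

let () = List.iter print_endline [bracketed; swapped; first_only]
```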


val global_substitute : regexp -> (string -> string) -> string -> string

global_substitute regexp subst s returns a string identical to s,
except that all substrings of s that match regexp have been replaced by
the result of function subst. The function subst is called once for
each matching substring, and receives s (the whole text) as argument.


val substitute_first : regexp -> (string -> string) -> string -> string

Same as Str.global_substitute, except that only the first substring
matching the regular expression is replaced.
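
Because subst receives the whole text rather than the match, it normally calls Str.matched_string to recover the matched part. A sketch, with upcase_match as a hypothetical callback:

```ocaml
(* Assumes the program is linked with the str library. *)

let word = Str.regexp "[a-z]+"

(* Hypothetical callback: the argument is the whole text, so the
   current match is recovered with Str.matched_string. *)
let upcase_match whole = String.uppercase_ascii (Str.matched_string whole)

let all_upper = Str.global_substitute word upcase_match "ab 12 cd"
let first_upper = Str.substitute_first word upcase_match "ab 12 cd"

let () = List.iter print_endline [all_upper; first_upper]
```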


val replace_matched : string -> string -> string

replace_matched repl s returns the replacement text repl in which \1,
\2, etc. have been replaced by the text matched by the corresponding
groups in the regular expression that was matched by the last call to a
matching or searching function (see Str.matched_string for details). s
must be the same string that was passed to the matching or searching
function.
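
Unlike global_replace, replace_matched returns only the expanded template, not a rewritten copy of s. A sketch with a hypothetical helper flip:

```ocaml
(* Assumes the program is linked with the str library. *)

let pair = Str.regexp "\\([a-z]+\\)=\\([0-9]+\\)"

(* Hypothetical helper: match first, then expand a template against
   the groups of that match. *)
let flip s =
  if Str.string_match pair s 0
  then Some (Str.replace_matched "\\2:\\1" s)
  else None

let () =
  match flip "key=42" with
  | Some text -> print_endline text
  | None -> print_endline "no match"
```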



=== Splitting ===

val split : regexp -> string -> string list

split r s splits s into substrings, taking as delimiters the substrings
that match r, and returns the list of substrings. For instance,
split (regexp "[ \t]+") s splits s into blank-separated words. An
occurrence of the delimiter at the beginning or at the end of the
string is ignored.


val bounded_split : regexp -> string -> int -> string list

Same as Str.split, but splits into at most n substrings, where n is
the extra integer parameter.
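
A sketch of both functions on made-up inputs. With a bound of 2, the second element holds the unsplit remainder of the string.

```ocaml
(* Assumes the program is linked with the str library. *)

let blanks = Str.regexp "[ \t]+"
let comma = Str.regexp ","

(* Leading and trailing blanks produce no empty strings. *)
let words = Str.split blanks "  one two\tthree "

(* At most two pieces: the first field and everything after it. *)
let head_and_rest = Str.bounded_split comma "a,b,c,d" 2

let () =
  print_endline (String.concat "|" words);
  print_endline (String.concat "|" head_and_rest)
```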


val split_delim : regexp -> string -> string list

Same as Str.split, but occurrences of the delimiter at the beginning
and at the end of the string are recognized and returned as empty
strings in the result. For instance, split_delim (regexp " ") " abc "
returns [""; "abc"; ""], while split with the same arguments returns
["abc"].


val bounded_split_delim : regexp -> string -> int -> string list

Same as Str.bounded_split, but occurrences of the delimiter at the
beginning and at the end of the string are recognized and returned as
empty strings in the result.
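
The documented example, plus two illustrative comma-separated inputs, can be sketched as follows. The _delim variants are the ones to use when empty leading or trailing fields carry meaning.

```ocaml
(* Assumes the program is linked with the str library. *)

let space = Str.regexp " "
let comma = Str.regexp ","

(* Delimiters at both ends yield empty strings with split_delim ... *)
let with_empties = Str.split_delim space " abc "

(* ... and are ignored by split. *)
let without_empties = Str.split space " abc "

(* A trailing comma gives a trailing empty field. *)
let trailing_field = Str.split_delim comma "a,b,"

(* Bounded variant: a leading comma still gives a leading empty field. *)
let bounded = Str.bounded_split_delim comma ",a,b,c" 2

let () =
  let show l = print_endline (String.concat "|" l) in
  List.iter show [with_empties; without_empties; trailing_field; bounded]
```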


type split_result =
 | Text of string
 | Delim of string


val full_split : regexp -> string -> split_result list

Same as Str.split_delim, but returns the delimiters as well as the
substrings contained between delimiters. The former are tagged Delim
in the result list; the latter are tagged Text. For instance,
full_split (regexp "[{}]") "{ab}" returns
[Delim "{"; Text "ab"; Delim "}"].


val bounded_full_split : regexp -> string -> int -> split_result list

Same as Str.bounded_split_delim, but returns the delimiters as well as
the substrings contained between delimiters. The former are tagged
Delim in the result list; the latter are tagged Text.
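
A sketch of consuming a split_result list by pattern matching. The helpers rejoin and delimiters are hypothetical; rejoin relies on full_split keeping every character of the input in either a Text or a Delim.

```ocaml
(* Assumes the program is linked with the str library. *)

let braces = Str.regexp "[{}]"
let parts = Str.full_split braces "{ab}"

(* Hypothetical helper: concatenating all payloads rebuilds the input. *)
let rejoin parts =
  String.concat ""
    (List.map (function Str.Text t | Str.Delim t -> t) parts)

(* Hypothetical helper: keep only the delimiter texts, in order. *)
let delimiters parts =
  List.fold_right
    (fun p acc ->
       match p with
       | Str.Delim d -> d :: acc
       | Str.Text _ -> acc)
    parts []

let () =
  print_endline (rejoin parts);
  print_endline (String.concat " " (delimiters parts))
```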



=== Extracting substrings ===

val string_before : string -> int -> string

string_before s n returns the substring of all characters of s that
precede position n (excluding the character at position n).


val string_after : string -> int -> string

string_after s n returns the substring of all characters of s that
follow position n (including the character at position n).


val first_chars : string -> int -> string

first_chars s n returns the first n characters of s. This is the same
function as Str.string_before.


val last_chars : string -> int -> string

last_chars s n returns the last n characters of s.
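
A sketch of the four extraction functions on one illustrative string. Since string_before excludes position n and string_after includes it, the two results at the same n concatenate back to the original.

```ocaml
(* Assumes the program is linked with the str library. *)

let s = "hello world"

let before = Str.string_before s 5   (* characters at positions 0 to 4 *)
let after = Str.string_after s 6     (* characters from position 6 on *)
let first2 = Str.first_chars s 2
let last3 = Str.last_chars s 3

let () = List.iter print_endline [before; after; first2; last3]
```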



OCamldoc 2019-02-02 Str(3)