String(3) OCaml library String(3)
2
3
4
6 String - Strings.
7
12 Module String
13 : sig end
14
15
16 Strings.
17
A string s of length n is an indexable and immutable sequence of n
bytes. For historical reasons these bytes are referred to as
characters.
21
22 The semantics of string functions is defined in terms of indices and
23 positions. These are depicted and described as follows.
24
positions  0   1   2   3   4    n-1    n
           +---+---+---+---+     +-----+
  indices  | 0 | 1 | 2 | 3 | ... | n-1 |
           +---+---+---+---+     +-----+
27
- An index i of s is an integer in the range [0; n-1]. It represents
the i-th byte (character) of s, which can be accessed using the
constant-time string indexing operator s.[i].

- A position i of s is an integer in the range [0; n]. It represents
either the point at the beginning of the string, or the point between
two indices, or the point at the end of the string. The byte at index i
lies between positions i and i+1.
36
37
38 Two integers start and len are said to define a valid substring of s if
39 len >= 0 and start , start+len are positions of s .
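The distinction between indices and positions can be illustrated with a
small sketch. The string "camel" and the binding names are chosen for
illustration only; String.sub is used here because it takes a start
position and a length.

```ocaml
let s = "camel"
let n = String.length s

(* Indices range over [0; n-1]. *)
let first = s.[0]
let last = s.[n - 1]

(* start = 1, len = 3: positions 1 and 4 both lie in [0; n]. *)
let middle = String.sub s 1 3

(* start = n, len = 0 is valid: n is a position, although not an index. *)
let empty_tail = String.sub s n 0

(* start = 3, len = 3 is invalid: 6 is not a position of s. *)
let invalid =
  match String.sub s 3 3 with
  | _ -> false
  | exception Invalid_argument _ -> true
```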
40
41 Unicode text. Strings being arbitrary sequences of bytes, they can hold
42 any kind of textual encoding. However the recommended encoding for
43 storing Unicode text in OCaml strings is UTF-8. This is the encoding
44 used by Unicode escapes in string literals. For example the string
45 "\u{1F42B}" is the UTF-8 encoding of the Unicode character U+1F42B.
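As a brief illustration, assuming a compiler that accepts Unicode
escapes in string literals, the following sketch shows that the length
of such a string counts bytes, not Unicode characters. The binding names
are illustrative.

```ocaml
let camel = "\u{1F42B}"

(* U+1F42B needs four bytes in UTF-8, so the string has length 4. *)
let byte_length = String.length camel

(* The same string spelled out byte by byte. *)
let spelled_out = "\xF0\x9F\x90\xAB"
```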
46
Past mutability. Before OCaml 4.02, strings used to be modifiable in
place, like the mutable byte sequences of type Bytes.t. OCaml 4 had
various compiler flags and configuration options to support the
transition period from mutable to immutable strings. Those options are
no longer available, and strings are now always immutable.
52
The labeled version of this module can be used as described in the
StdLabels module.
55
56
57
58
59
60
61
62 Strings
63 type t = string
64
65
66 The type for strings.
67
68
69
70 val make : int -> char -> string
71
72
73 make n c is a string of length n with each index holding the character
74 c .
75
76
77 Raises Invalid_argument if n < 0 or n > Sys.max_string_length .
78
79
80
81 val init : int -> (int -> char) -> string
82
83
84 init n f is a string of length n with index i holding the character f i
85 (called in increasing index order).
86
87
88 Since 4.02.0
89
90
91 Raises Invalid_argument if n < 0 or n > Sys.max_string_length .
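A minimal sketch of make and init follows; the binding names are
illustrative, and the last binding only demonstrates the documented
exception for a negative length.

```ocaml
(* Three copies of the same character. *)
let dashes = String.make 3 '-'

(* Index i holds the i-th letter of the alphabet, counting from 'a'. *)
let alphabet_prefix = String.init 5 (fun i -> Char.chr (Char.code 'a' + i))

(* A negative length is rejected with Invalid_argument. *)
let negative_length_rejected =
  match String.make (-1) 'x' with
  | _ -> false
  | exception Invalid_argument _ -> true
```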
92
93
94
95 val empty : string
96
97 The empty string.
98
99
100 Since 4.13.0
101
102
103
104 val of_bytes : bytes -> string
105
Return a new string that contains the same bytes as the given byte
sequence.
108
109
110 Since 4.13.0
111
112
113
114 val to_bytes : string -> bytes
115
116 Return a new byte sequence that contains the same bytes as the given
117 string.
118
119
120 Since 4.13.0
121
122
123
124 val length : string -> int
125
126
127 length s is the length (number of bytes/characters) of s .
128
129
130
131 val get : string -> int -> char
132
133
134 get s i is the character at index i in s . This is the same as writing
135 s.[i] .
136
137
Raises Invalid_argument if i is not an index of s.
139
140
141
142
143 Concatenating
144 Note. The (^) binary operator concatenates two strings.
145
146 val concat : string -> string list -> string
147
148
concat sep ss concatenates the list of strings ss, inserting the
separator string sep between each.
151
152
153 Raises Invalid_argument if the result is longer than
154 Sys.max_string_length bytes.
155
156
157
158 val cat : string -> string -> string
159
160
161 cat s1 s2 concatenates s1 and s2 ( s1 ^ s2 ).
162
163
164 Since 4.13.0
165
166
167 Raises Invalid_argument if the result is longer than
168 Sys.max_string_length bytes.
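The concatenation functions can be sketched as follows. The values are
illustrative; String.cat assumes OCaml 4.13 or later, as noted above.

```ocaml
(* The separator appears only between elements. *)
let csv = String.concat ", " ["a"; "b"; "c"]

(* With one element or none, no separator is inserted. *)
let single = String.concat ", " ["a"]
let none = String.concat ", " []

(* cat is the function form of the (^) operator. *)
let joined = String.cat "foo" "bar"
```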
169
170
171
172
173 Predicates and comparisons
174 val equal : t -> t -> bool
175
176
177 equal s0 s1 is true if and only if s0 and s1 are character-wise equal.
178
179
180 Since 4.03.0 (4.05.0 in StringLabels)
181
182
183
184 val compare : t -> t -> int
185
186
compare s0 s1 sorts s0 and s1 in lexicographical order. compare
behaves like Stdlib.compare on strings but may be more efficient.
189
190
191
192 val starts_with : prefix:string -> string -> bool
193
194
195 starts_with ~prefix s is true if and only if s starts with prefix .
196
197
198 Since 4.13.0
199
200
201
202 val ends_with : suffix:string -> string -> bool
203
204
205 ends_with ~suffix s is true if and only if s ends with suffix .
206
207
208 Since 4.13.0
209
210
211
212 val contains_from : string -> int -> char -> bool
213
214
contains_from s start c is true if and only if c appears in s after
position start.
217
218
219 Raises Invalid_argument if start is not a valid position in s .
220
221
222
223 val rcontains_from : string -> int -> char -> bool
224
225
226 rcontains_from s stop c is true if and only if c appears in s before
227 position stop+1 .
228
229
230 Raises Invalid_argument if stop < 0 or stop+1 is not a valid position
231 in s .
232
233
234
235 val contains : string -> char -> bool
236
237
238 contains s c is String.contains_from s 0 c .
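The following sketch exercises the predicates of this section on the
illustrative string "hello". Note that "after position start" includes
the byte at index start, and "before position stop+1" includes the byte
at index stop. starts_with and ends_with assume OCaml 4.13 or later.

```ocaml
let s = "hello"

let has_prefix = String.starts_with ~prefix:"he" s
let has_suffix = String.ends_with ~suffix:"lo" s

(* The 'l' at index 2 lies after position 2. *)
let l_from_2 = String.contains_from s 2 'l'

(* The only 'h' is at index 0, which lies before position 1. *)
let h_from_1 = String.contains_from s 1 'h'

(* 5 is a valid position (the end of the string); nothing follows it. *)
let o_from_end = String.contains_from s 5 'o'

(* Searching backwards: index 0 lies before position 0+1. *)
let h_up_to_0 = String.rcontains_from s 0 'h'

(* The only 'o' is at index 4, which does not lie before position 3+1. *)
let o_up_to_3 = String.rcontains_from s 3 'o'
```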
239
240
241
242
243 Extracting substrings
244 val sub : string -> int -> int -> string
245
246
247 sub s pos len is a string of length len , containing the substring of s
248 that starts at position pos and has length len .
249
250
Raises Invalid_argument if pos and len do not designate a valid
substring of s.
253
254
255
256 val split_on_char : char -> string -> string list
257
258
259 split_on_char sep s is the list of all (possibly empty) substrings of s
260 that are delimited by the character sep .
261
262 The function's result is specified by the following invariants:
263
264 -The list is not empty.
265
-Concatenating its elements using sep as a separator returns a string
equal to the input ( concat (make 1 sep) (split_on_char sep s) = s ).
269
270 -No string in the result contains the sep character.
271
272
273
274 Since 4.04.0 (4.05.0 in StringLabels)
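The invariants above can be checked on a few illustrative inputs. The
helper round_trip is hypothetical and simply restates the second
invariant for the separator ','.

```ocaml
(* Adjacent and trailing separators yield empty strings. *)
let fields = String.split_on_char ',' "a,,b,"

(* Without any separator the input comes back as a single element. *)
let no_sep = String.split_on_char ',' "abc"

(* The result is never the empty list, even for the empty string. *)
let of_empty = String.split_on_char ',' ""

(* Hypothetical helper: joining the pieces with sep restores the input. *)
let round_trip s =
  String.concat (String.make 1 ',') (String.split_on_char ',' s) = s
```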
275
276
277
278
279 Transforming
280 val map : (char -> char) -> string -> string
281
282
283 map f s is the string resulting from applying f to all the characters
284 of s in increasing order.
285
286
287 Since 4.00.0
288
289
290
291 val mapi : (int -> char -> char) -> string -> string
292
293
294 mapi f s is like String.map but the index of the character is also
295 passed to f .
296
297
298 Since 4.02.0
299
300
301
302 val fold_left : ('a -> char -> 'a) -> 'a -> string -> 'a
303
304
305 fold_left f x s computes f (... (f (f x s.[0]) s.[1]) ...) s.[n-1] ,
306 where n is the length of the string s .
307
308
309 Since 4.13.0
310
311
312
313 val fold_right : (char -> 'a -> 'a) -> string -> 'a -> 'a
314
315
316 fold_right f s x computes f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...)) ,
317 where n is the length of the string s .
318
319
320 Since 4.13.0
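As a sketch of the two folds, assuming OCaml 4.13 or later, the
hypothetical helpers below count occurrences of a character and convert
a string to a list. They also show how the traversal direction affects
the order in which an accumulator is built.

```ocaml
(* fold_left threads the accumulator from index 0 to n-1. *)
let count_char c s =
  String.fold_left (fun acc x -> if x = c then acc + 1 else acc) 0 s

(* fold_right conses from the last index, so the order is preserved. *)
let to_list s = String.fold_right (fun c acc -> c :: acc) s []

(* The same cons under fold_left yields the characters reversed. *)
let rev_list s = String.fold_left (fun acc c -> c :: acc) [] s
```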
321
322
323
324 val for_all : (char -> bool) -> string -> bool
325
326
327 for_all p s checks if all characters in s satisfy the predicate p .
328
329
330 Since 4.13.0
331
332
333
334 val exists : (char -> bool) -> string -> bool
335
336
exists p s checks if at least one character of s satisfies the
predicate p.
339
340
341 Since 4.13.0
342
343
344
345 val trim : string -> string
346
347
trim s is s without leading and trailing whitespace. Whitespace
characters are: ' ', '\x0C' (form feed), '\n', '\r', and '\t'.
350
351
352 Since 4.00.0
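A short sketch of trim on illustrative inputs; only leading and trailing
whitespace is removed, interior whitespace is left alone.

```ocaml
let cleaned = String.trim "  \t hello world \r\n"

(* A string made only of whitespace trims to the empty string. *)
let only_space = String.trim " \n\t "

(* Nothing to remove at either end. *)
let untouched = String.trim "a b"
```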
353
354
355
356 val escaped : string -> string
357
358
359 escaped s is s with special characters represented by escape sequences,
360 following the lexical conventions of OCaml.
361
All characters outside the US-ASCII printable range [0x20;0x7E] are
escaped, as well as backslash (0x5C) and double-quote (0x22).
364
365 The function Scanf.unescaped is a left inverse of escaped , i.e.
366 Scanf.unescaped (escaped s) = s for any string s (unless escaped s
367 fails).
368
369
370 Raises Invalid_argument if the result is longer than
371 Sys.max_string_length bytes.
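The relationship between escaped and Scanf.unescaped can be sketched as
follows. The sample string is illustrative and mixes a tab, double
quotes, a backslash and a non-printable byte.

```ocaml
let raw = "tab\there \"quoted\" back\\slash \x01"

(* Every byte of the result is printable US-ASCII. *)
let shown = String.escaped raw

(* Scanf.unescaped undoes the escaping. *)
let restored = Scanf.unescaped shown

(* Two small cases: a newline and a lone backslash. *)
let newline = String.escaped "a\nb"
let backslash = String.escaped "\\"
```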
372
373
374
375 val uppercase_ascii : string -> string
376
377
uppercase_ascii s is s with all lowercase letters translated to
uppercase, using the US-ASCII character set.
380
381
382 Since 4.03.0 (4.05.0 in StringLabels)
383
384
385
386 val lowercase_ascii : string -> string
387
388
lowercase_ascii s is s with all uppercase letters translated to
lowercase, using the US-ASCII character set.
391
392
393 Since 4.03.0 (4.05.0 in StringLabels)
394
395
396
397 val capitalize_ascii : string -> string
398
399
capitalize_ascii s is s with the first character set to uppercase,
using the US-ASCII character set.
402
403
404 Since 4.03.0 (4.05.0 in StringLabels)
405
406
407
408 val uncapitalize_ascii : string -> string
409
410
411 uncapitalize_ascii s is s with the first character set to lowercase,
412 using the US-ASCII character set.
413
414
415 Since 4.03.0 (4.05.0 in StringLabels)
416
417
418
419
420 Traversing
421 val iter : (char -> unit) -> string -> unit
422
423
424 iter f s applies function f in turn to all the characters of s . It is
425 equivalent to f s.[0]; f s.[1]; ...; f s.[length s - 1]; () .
426
427
428
429 val iteri : (int -> char -> unit) -> string -> unit
430
431
iteri is like String.iter, but the function is also given the
corresponding character index.
434
435
436 Since 4.00.0
437
438
439
440
441 Searching
442 val index_from : string -> int -> char -> int
443
444
445 index_from s i c is the index of the first occurrence of c in s after
446 position i .
447
448
449 Raises Not_found if c does not occur in s after position i .
450
451
452 Raises Invalid_argument if i is not a valid position in s .
453
454
455
456 val index_from_opt : string -> int -> char -> int option
457
458
index_from_opt s i c is the index of the first occurrence of c in s
after position i (if any).
461
462
463 Since 4.05
464
465
466 Raises Invalid_argument if i is not a valid position in s .
467
468
469
470 val rindex_from : string -> int -> char -> int
471
472
473 rindex_from s i c is the index of the last occurrence of c in s before
474 position i+1 .
475
476
477 Raises Not_found if c does not occur in s before position i+1 .
478
479
480 Raises Invalid_argument if i+1 is not a valid position in s .
481
482
483
484 val rindex_from_opt : string -> int -> char -> int option
485
486
rindex_from_opt s i c is the index of the last occurrence of c in s
before position i+1 (if any).
489
490
491 Since 4.05
492
493
494 Raises Invalid_argument if i+1 is not a valid position in s .
495
496
497
498 val index : string -> char -> int
499
500
501 index s c is String.index_from s 0 c .
502
503
504
505 val index_opt : string -> char -> int option
506
507
508 index_opt s c is String.index_from_opt s 0 c .
509
510
511 Since 4.05
512
513
514
515 val rindex : string -> char -> int
516
517
518 rindex s c is String.rindex_from s (length s - 1) c .
519
520
521
522 val rindex_opt : string -> char -> int option
523
524
525 rindex_opt s c is String.rindex_from_opt s (length s - 1) c .
526
527
528 Since 4.05
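The searching functions can be sketched on an illustrative path-like
string. The binding names are hypothetical; the _opt variants assume
OCaml 4.05 or later.

```ocaml
let path = "usr/local/bin"

(* First and last occurrence of the separator. *)
let first_slash = String.index path '/'
let last_slash = String.rindex path '/'

(* Resume the search just past the first hit. *)
let second_slash = String.index_from path (first_slash + 1) '/'

(* The _opt variants return None instead of raising Not_found. *)
let missing = String.index_opt path ':'

(* Everything after the last separator. *)
let basename =
  String.sub path (last_slash + 1) (String.length path - last_slash - 1)
```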
529
530
531
532
533 Strings and Sequences
534 val to_seq : t -> char Seq.t
535
536
to_seq s is a sequence made of the string's characters in increasing
order. In "unsafe-string" mode, modifications of the string during
iteration will be reflected in the sequence.
540
541
542 Since 4.07
543
544
545
546 val to_seqi : t -> (int * char) Seq.t
547
548
to_seqi s is like String.to_seq but also tuples the corresponding
index.
551
552
553 Since 4.07
554
555
556
557 val of_seq : char Seq.t -> t
558
559
560 of_seq s is a string made of the sequence's characters.
561
562
563 Since 4.07
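A small sketch of the sequence conversions, assuming OCaml 4.07 or
later. The helpers upper_via_seq and indexed are hypothetical and only
show a round trip through Seq.

```ocaml
(* String -> sequence -> transformed sequence -> string. *)
let upper_via_seq s =
  s |> String.to_seq |> Seq.map Char.uppercase_ascii |> String.of_seq

(* Pair every character with its index. *)
let indexed s = List.of_seq (String.to_seqi s)
```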
564
565
566
567
568 UTF decoding and validations
569 UTF-8
570 val get_utf_8_uchar : t -> int -> Uchar.utf_decode
571
572
get_utf_8_uchar b i decodes a UTF-8 character at index i in b.
574
575
576
577 val is_valid_utf_8 : t -> bool
578
579
580 is_valid_utf_8 b is true if and only if b contains valid UTF-8 data.
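One way to use get_utf_8_uchar is to walk a string decode by decode.
The sketch below assumes OCaml 4.14 or later and relies on the Uchar
functions utf_decode_length and utf_decode_uchar; the helper name
utf_8_uchars is hypothetical. Invalid bytes are reported by
utf_decode_uchar as the replacement character U+FFFD.

```ocaml
(* Decode every UTF-8 character of s, in order. *)
let utf_8_uchars s =
  let rec loop i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      (* Advance by the number of bytes this decode consumed. *)
      loop (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  loop 0 []

(* Code points of the decoded characters, as plain integers. *)
let code_points s = List.map Uchar.to_int (utf_8_uchars s)
```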
581
582
583
584
585 UTF-16BE
586 val get_utf_16be_uchar : t -> int -> Uchar.utf_decode
587
588
get_utf_16be_uchar b i decodes a UTF-16BE character at index i in b.
590
591
592
593 val is_valid_utf_16be : t -> bool
594
595
596 is_valid_utf_16be b is true if and only if b contains valid UTF-16BE
597 data.
598
599
600
601
602 UTF-16LE
603 val get_utf_16le_uchar : t -> int -> Uchar.utf_decode
604
605
get_utf_16le_uchar b i decodes a UTF-16LE character at index i in b.
607
608
609
610 val is_valid_utf_16le : t -> bool
611
612
613 is_valid_utf_16le b is true if and only if b contains valid UTF-16LE
614 data.
615
616
617
618 val blit : string -> int -> bytes -> int -> int -> unit
619
620
blit src src_pos dst dst_pos len copies len bytes from the string src,
starting at index src_pos, to the byte sequence dst, starting at index
dst_pos.
624
625
626 Raises Invalid_argument if src_pos and len do not designate a valid
627 range of src , or if dst_pos and len do not designate a valid range of
628 dst .
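A minimal sketch of blit, using an illustrative eight-byte buffer filled
with dots so the copied range is easy to see.

```ocaml
let buf = Bytes.make 8 '.'

(* Copy 3 bytes of "hello", from index 1, into buf from index 2. *)
let () = String.blit "hello" 1 buf 2 3

let result = Bytes.to_string buf

(* A range that runs past the end of the source is rejected. *)
let out_of_range =
  match String.blit "hello" 3 buf 0 3 with
  | () -> false
  | exception Invalid_argument _ -> true
```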
629
630
631
632
633 Binary decoding of integers
634 The functions in this section binary decode integers from strings.
635
636 All following functions raise Invalid_argument if the characters needed
637 at index i to decode the integer are not available.
638
Little-endian (resp. big-endian) encoding means that least (resp. most)
significant bytes are stored first. Big-endian is also known as
network byte order. Native-endian encoding is either little-endian or
big-endian depending on Sys.big_endian.
643
644 32-bit and 64-bit integers are represented by the int32 and int64
645 types, which can be interpreted either as signed or unsigned numbers.
646
8-bit and 16-bit integers are represented by the int type, which has
more bits than the binary encoding. These extra bits are sign-extended
(or zero-extended) for functions which decode 8-bit or 16-bit integers
and represent them with int values.
651
652 val get_uint8 : string -> int -> int
653
654
get_uint8 b i is b's unsigned 8-bit integer starting at character
index i.
657
658
659 Since 4.13.0
660
661
662
663 val get_int8 : string -> int -> int
664
665
get_int8 b i is b's signed 8-bit integer starting at character index
i.
668
669
670 Since 4.13.0
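The difference between zero extension and sign extension can be seen by
decoding the same byte both ways. This sketch assumes OCaml 4.13 or
later; the two-byte string is illustrative.

```ocaml
let b = "\xFF\x7F"

(* 0xFF zero-extended is 255; sign-extended it is -1. *)
let unsigned_ff = String.get_uint8 b 0
let signed_ff = String.get_int8 b 0

(* 0x7F has its top bit clear, so both readings agree. *)
let unsigned_7f = String.get_uint8 b 1
let signed_7f = String.get_int8 b 1

(* Index 2 is past the end: the needed byte is not available. *)
let past_end =
  match String.get_uint8 b 2 with
  | _ -> false
  | exception Invalid_argument _ -> true
```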
671
672
673
674 val get_uint16_ne : string -> int -> int
675
676
get_uint16_ne b i is b's native-endian unsigned 16-bit integer
starting at character index i.
679
680
681 Since 4.13.0
682
683
684
685 val get_uint16_be : string -> int -> int
686
687
688 get_uint16_be b i is b 's big-endian unsigned 16-bit integer starting
689 at character index i .
690
691
692 Since 4.13.0
693
694
695
696 val get_uint16_le : string -> int -> int
697
698
get_uint16_le b i is b's little-endian unsigned 16-bit integer
starting at character index i.
701
702
703 Since 4.13.0
704
705
706
707 val get_int16_ne : string -> int -> int
708
709
710 get_int16_ne b i is b 's native-endian signed 16-bit integer starting
711 at character index i .
712
713
714 Since 4.13.0
715
716
717
718 val get_int16_be : string -> int -> int
719
720
721 get_int16_be b i is b 's big-endian signed 16-bit integer starting at
722 character index i .
723
724
725 Since 4.13.0
726
727
728
729 val get_int16_le : string -> int -> int
730
731
732 get_int16_le b i is b 's little-endian signed 16-bit integer starting
733 at character index i .
734
735
736 Since 4.13.0
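The 16-bit decoders can be compared on one illustrative two-byte string.
The sketch assumes OCaml 4.13 or later and shows both the effect of byte
order and of signedness.

```ocaml
let b = "\xFF\xFE"

(* Big-endian reads 0xFFFE; little-endian reads 0xFEFF. *)
let u_be = String.get_uint16_be b 0
let u_le = String.get_uint16_le b 0

(* The signed variants sign-extend the same 16 bits. *)
let s_be = String.get_int16_be b 0
let s_le = String.get_int16_le b 0

(* Native-endian follows the byte order of the host. *)
let u_ne = String.get_uint16_ne b 0
```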
737
738
739
740 val get_int32_ne : string -> int -> int32
741
742
get_int32_ne b i is b's native-endian 32-bit integer starting at
character index i.
745
746
747 Since 4.13.0
748
749
750
751 val hash : t -> int
752
An unseeded hash function for strings, with the same output value as
Hashtbl.hash. This function allows this module to be passed as an
argument to the functor Hashtbl.Make.
756
757
758 Since 5.0.0
759
760
761
762 val seeded_hash : int -> t -> int
763
764 A seeded hash function for strings, with the same output value as
765 Hashtbl.seeded_hash . This function allows this module to be passed as
766 argument to the functor Hashtbl.MakeSeeded .
767
768
769 Since 5.0.0
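As a sketch of what this enables, assuming OCaml 5.0 or later, the
module can be handed directly to Hashtbl.Make. The module name StringTbl
and the table contents are illustrative.

```ocaml
(* String provides t, equal and hash, as Hashtbl.Make requires. *)
module StringTbl = Hashtbl.Make (String)

let tbl : int StringTbl.t = StringTbl.create 16
let () = StringTbl.replace tbl "camel" 1
let () = StringTbl.replace tbl "dromedary" 2

let found = StringTbl.find_opt tbl "camel"
let absent = StringTbl.find_opt tbl "llama"
```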
770
771
772
773 val get_int32_be : string -> int -> int32
774
775
get_int32_be b i is b's big-endian 32-bit integer starting at
character index i.
778
779
780 Since 4.13.0
781
782
783
784 val get_int32_le : string -> int -> int32
785
786
get_int32_le b i is b's little-endian 32-bit integer starting at
character index i.
789
790
791 Since 4.13.0
792
793
794
795 val get_int64_ne : string -> int -> int64
796
797
get_int64_ne b i is b's native-endian 64-bit integer starting at
character index i.
800
801
802 Since 4.13.0
803
804
805
806 val get_int64_be : string -> int -> int64
807
808
get_int64_be b i is b's big-endian 64-bit integer starting at
character index i.
811
812
813 Since 4.13.0
814
815
816
817 val get_int64_le : string -> int -> int64
818
819
get_int64_le b i is b's little-endian 64-bit integer starting at
character index i.
822
823
824 Since 4.13.0
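The 32-bit and 64-bit decoders return int32 and int64 values. The sketch
below assumes OCaml 4.13 or later and uses illustrative byte strings. It
also shows one way to view an int32 as unsigned, here by formatting it
with the Printf conversion %lu.

```ocaml
let word = "\x00\x00\x01\x00"

(* The same four bytes under the two byte orders. *)
let be32 = String.get_int32_be word 0
let le32 = String.get_int32_le word 0

(* All bits set: -1 when read as signed. *)
let all_ones = String.get_int32_be "\xFF\xFF\xFF\xFF" 0

(* The same 32 bits printed as an unsigned number. *)
let as_unsigned = Printf.sprintf "%lu" all_ones

(* Eight bytes, least significant byte first. *)
let one64 = String.get_int64_le "\x01\x00\x00\x00\x00\x00\x00\x00" 0
```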
825
826
827
828
829
OCamldoc 2023-07-20 String(3)