String(3)                     OCaml library                     String(3)

String - Strings.

Module String

Strings.

A string s of length n is an indexable and immutable sequence of n
bytes. For historical reasons these bytes are referred to as characters.

The semantics of string functions is defined in terms of indices and
positions. These are depicted and described as follows.

positions  0   1   2   3   4    n-1    n
           +---+---+---+---+     +-----+
  indices  | 0 | 1 | 2 | 3 | ... | n-1 |
           +---+---+---+---+     +-----+

- An index i of s is an integer in the range [0;n-1]. It represents
  the ith byte (character) of s, which can be accessed using the
  constant-time string indexing operator s.[i].

- A position i of s is an integer in the range [0;n]. It represents
  either the point at the beginning of the string, the point between
  two indices, or the point at the end of the string. The byte at
  index i lies between positions i and i+1.

Two integers start and len are said to define a valid substring of s if
len >= 0 and start and start+len are both positions of s.
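
To make the index/position distinction concrete, here is a small sketch. The helper is_valid_substring is hypothetical (it is not part of String); it simply restates the valid-substring condition above, and the assertions exercise String.sub at the boundary position n.

```ocaml
(* Hypothetical helper, not part of String: restates the valid-substring
   condition, i.e. len >= 0 and both start and start+len are positions
   of s, meaning they lie in [0; String.length s]. *)
let is_valid_substring s start len =
  let n = String.length s in
  len >= 0 && start >= 0 && start <= n && start + len <= n

let () =
  let s = "hello" in
  (* Indices 0..4 address bytes; positions 0..5 address the gaps. *)
  assert (s.[0] = 'h' && s.[4] = 'o');
  (* Position 5 (= n) is valid, so an empty substring may start there. *)
  assert (is_valid_substring s 5 0 && String.sub s 5 0 = "");
  (* 3 + 3 = 6 is not a position of "hello". *)
  assert (not (is_valid_substring s 3 3))
```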

Unicode text. Since strings are arbitrary sequences of bytes, they can
hold any kind of textual encoding. However, the recommended encoding for
storing Unicode text in OCaml strings is UTF-8. This is the encoding
used by Unicode escapes in string literals. For example, the string
"\u{1F42B}" is the UTF-8 encoding of the Unicode character U+1F42B.
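
A short check of the byte-level view of Unicode text, assuming an OCaml version that accepts \u{...} escapes in string literals: the escape is stored as UTF-8, so a single code point beyond U+FFFF occupies four bytes and String.length reports 4.

```ocaml
(* The \u{...} escape stores the UTF-8 encoding of the code point. *)
let s = "\u{1F42B}"

let () =
  (* String.length counts bytes, not Unicode characters. *)
  assert (String.length s = 4);
  (* The four UTF-8 bytes of U+1F42B. *)
  assert (s = "\xF0\x9F\x90\xAB")
```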

Past mutability. OCaml strings used to be modifiable in place, for
instance via the String.set and String.blit functions. This use is
nowadays only possible when the compiler is put in "unsafe-string" mode
by giving the -unsafe-string command-line option. This compatibility
mode makes the types string and bytes (see Bytes.t) interchangeable, so
that functions expecting byte sequences can also accept strings as
arguments and modify them.

The distinction between bytes and string was introduced in OCaml 4.02,
and the "unsafe-string" compatibility mode was the default until OCaml
4.05. Starting with 4.06, the compatibility mode is opt-in; we intend
to remove the option in the future.

The labeled version of this module can be used as described in the
StdLabels module.



Strings

type t = string

The type for strings.


val make : int -> char -> string

make n c is a string of length n with each index holding the character
c.

Raises Invalid_argument if n < 0 or n > Sys.max_string_length.


val init : int -> (int -> char) -> string

init n f is a string of length n with index i holding the character
f i (called in increasing index order).

Since 4.02.0

Raises Invalid_argument if n < 0 or n > Sys.max_string_length.
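
A minimal sketch of make and init; the value names are illustrative. The last binding shows the Invalid_argument raised for a negative length.

```ocaml
let dashes = String.make 3 '-'

(* f receives each index 0, 1, ..., n-1 in increasing order. *)
let digits = String.init 5 (fun i -> Char.chr (Char.code '0' + i))

(* A negative length is rejected with Invalid_argument. *)
let negative_rejected =
  match String.make (-1) 'x' with
  | _ -> false
  | exception Invalid_argument _ -> true
```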


val empty : string

The empty string.

Since 4.13.0


val of_bytes : bytes -> string

Return a new string that contains the same bytes as the given byte
sequence.

Since 4.13.0


val to_bytes : string -> bytes

Return a new byte sequence that contains the same bytes as the given
string.

Since 4.13.0
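
A sketch of the copy semantics of to_bytes and of_bytes, assuming OCaml 4.13 or later: both functions return fresh copies, so mutating the byte sequence never affects either string.

```ocaml
let original = "cat"

(* to_bytes returns a fresh, mutable copy; the literal is untouched. *)
let buf = String.to_bytes original
let () = Bytes.set buf 0 'b'

(* of_bytes copies again, so later writes to buf do not affect frozen. *)
let frozen = String.of_bytes buf
let () = Bytes.set buf 0 'h'
```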


val length : string -> int

length s is the length (number of bytes/characters) of s.


val get : string -> int -> char

get s i is the character at index i in s. This is the same as writing
s.[i].

Raises Invalid_argument if i is not an index of s.



Concatenating

Note. The (^) binary operator concatenates two strings.


val concat : string -> string list -> string

concat sep ss concatenates the list of strings ss, inserting the
separator string sep between consecutive elements.

Raises Invalid_argument if the result is longer than
Sys.max_string_length bytes.


val cat : string -> string -> string

cat s1 s2 concatenates s1 and s2 (s1 ^ s2).

Since 4.13.0

Raises Invalid_argument if the result is longer than
Sys.max_string_length bytes.
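
A few concatenation cases, assuming OCaml 4.13 or later for String.cat. Note how concat inserts sep only between elements, so an empty or one-element list gets no separator.

```ocaml
let csv = String.concat "," ["a"; "b"; "c"]
let nothing = String.concat "," []            (* empty list: empty string *)
let single = String.concat "," ["only"]       (* no separator is added *)
let greeting = String.cat "Hello, " "world"   (* same as "Hello, " ^ "world" *)
```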



Predicates and comparisons

val equal : t -> t -> bool

equal s0 s1 is true if and only if s0 and s1 are character-wise equal.

Since 4.03.0 (4.05.0 in StringLabels)


val compare : t -> t -> int

compare s0 s1 compares s0 and s1 in lexicographical order, returning a
negative integer, zero, or a positive integer when s0 is respectively
less than, equal to, or greater than s1. compare behaves like
Stdlib.compare on strings but may be more efficient.
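
Because compare works byte by byte, uppercase ASCII letters (which have smaller codes) sort before lowercase ones, and a proper prefix sorts before the longer string. A small sketch with illustrative values:

```ocaml
(* Byte-wise lexicographic order: 'A' (65) sorts before 'a' (97). *)
let sorted = List.sort String.compare ["pear"; "apple"; "Apple"; "fig"]

(* A proper prefix sorts before the longer string. *)
let prefix_first = String.compare "ab" "abc" < 0

(* equal compares contents, not physical identity. *)
let same = String.equal "ocaml" ("oc" ^ "aml")
```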


val starts_with : prefix:string -> string -> bool

starts_with ~prefix s is true if and only if s starts with prefix.

Since 4.13.0


val ends_with : suffix:string -> string -> bool

ends_with ~suffix s is true if and only if s ends with suffix.

Since 4.13.0
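
A sketch of the labeled arguments in use, assuming OCaml 4.13 or later; is_ml_source and is_dotfile are hypothetical helpers, not part of String.

```ocaml
(* The labels ~suffix and ~prefix name the pattern being looked for. *)
let is_ml_source name = String.ends_with ~suffix:".ml" name
let is_dotfile name = String.starts_with ~prefix:"." name
```

Every string starts (and ends) with the empty string, so an empty prefix or suffix always matches.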


val contains_from : string -> int -> char -> bool

contains_from s start c is true if and only if c appears in s after
position start.

Raises Invalid_argument if start is not a valid position in s.


val rcontains_from : string -> int -> char -> bool

rcontains_from s stop c is true if and only if c appears in s before
position stop+1.

Raises Invalid_argument if stop < 0 or stop+1 is not a valid position
in s.


val contains : string -> char -> bool

contains s c is String.contains_from s 0 c.
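
The boundaries of contains_from and rcontains_from are easiest to see on a concrete string. In this sketch, s and the result names are illustrative.

```ocaml
let s = "a.b.c"   (* indices: 0 'a', 1 '.', 2 'b', 3 '.', 4 'c' *)

let found_anywhere = String.contains s '.'
(* Searching after position 4 examines only index 4, which holds 'c'. *)
let found_from_4 = String.contains_from s 4 '.'
(* Searching before position 2 (stop = 1) examines indices 0 and 1. *)
let found_up_to_1 = String.rcontains_from s 1 '.'
(* Position 5 = length s is valid and simply finds nothing. *)
let found_from_end = String.contains_from s 5 '.'
(* Position 6 is not a position of s. *)
let bad_position =
  match String.contains_from s 6 '.' with
  | _ -> false
  | exception Invalid_argument _ -> true
```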



Extracting substrings

val sub : string -> int -> int -> string

sub s pos len is a string of length len, containing the substring of s
that starts at position pos and has length len.

Raises Invalid_argument if pos and len do not designate a valid
substring of s.
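
A sketch of sub in terms of positions; the last binding shows the Invalid_argument raised when pos+len is past the end.

```ocaml
let s = "hello world"

let word = String.sub s 6 5          (* bytes at indices 6..10 *)
let empty_at_end = String.sub s 11 0 (* position 11 = length s *)

(* 7 + 5 = 12 is not a position of s, so this is not a valid substring. *)
let rejected =
  match String.sub s 7 5 with
  | _ -> false
  | exception Invalid_argument _ -> true
```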


val split_on_char : char -> string -> string list

split_on_char sep s is the list of all (possibly empty) substrings of s
that are delimited by the character sep.

The function's result is specified by the following invariants:

- The list is not empty.

- Concatenating its elements using sep as a separator returns a string
  equal to the input (concat (make 1 sep) (split_on_char sep s) = s).

- No string in the result contains the sep character.

Since 4.04.0 (4.05.0 in StringLabels)
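
The invariants above can be checked directly. This sketch uses a doubled and a trailing separator to show where empty substrings come from; round_trips is a hypothetical helper that restates the second invariant for sep = ','.

```ocaml
let fields = String.split_on_char ',' "a,,b,"

(* Splitting the empty string still yields a one-element list. *)
let of_empty = String.split_on_char ',' ""

(* Second invariant: joining the pieces with the separator restores s. *)
let round_trips s = String.concat "," (String.split_on_char ',' s) = s
```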



Transforming

val map : (char -> char) -> string -> string

map f s is the string resulting from applying f to all the characters
of s in increasing order.

Since 4.00.0


val mapi : (int -> char -> char) -> string -> string

mapi f s is like String.map but the index of the character is also
passed to f.

Since 4.02.0
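
A minimal sketch of map and mapi; the input strings are illustrative.

```ocaml
(* map: replace every space with an underscore. *)
let snake = String.map (fun c -> if c = ' ' then '_' else c) "make it snake"

(* mapi: uppercase the characters at even indices. *)
let alternating =
  String.mapi
    (fun i c -> if i mod 2 = 0 then Char.uppercase_ascii c else c)
    "abcdef"
```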


val fold_left : ('a -> char -> 'a) -> 'a -> string -> 'a

fold_left f x s computes f (... (f (f x s.[0]) s.[1]) ...) s.[n-1],
where n is the length of the string s.

Since 4.13.0


val fold_right : (char -> 'a -> 'a) -> string -> 'a -> 'a

fold_right f s x computes f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...)),
where n is the length of the string s.

Since 4.13.0
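
The nesting in the two formulas above determines how characters are combined. This sketch, assuming OCaml 4.13 or later, shows that consing with fold_right preserves the order of s while the same cons with fold_left reverses it. count, to_list and to_rev_list are hypothetical helpers.

```ocaml
(* fold_left: count occurrences of a character. *)
let count c s = String.fold_left (fun n d -> if d = c then n + 1 else n) 0 s

(* fold_right applies f to s.[n-1] innermost, so consing keeps the order. *)
let to_list s = String.fold_right (fun c acc -> c :: acc) s []

(* fold_left with the same cons reverses the order. *)
let to_rev_list s = String.fold_left (fun acc c -> c :: acc) [] s
```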


val for_all : (char -> bool) -> string -> bool

for_all p s checks if all characters in s satisfy the predicate p.

Since 4.13.0


val exists : (char -> bool) -> string -> bool

exists p s checks if at least one character of s satisfies the
predicate p.

Since 4.13.0
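
A sketch of for_all and exists with a hypothetical is_digit predicate, assuming OCaml 4.13 or later. Note the empty-string cases: for_all holds vacuously and exists is false.

```ocaml
(* Hypothetical predicate: ASCII decimal digits only. *)
let is_digit c = c >= '0' && c <= '9'

let all_digits s = String.for_all is_digit s
let has_digit s = String.exists is_digit s
```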


val trim : string -> string

trim s is s without leading and trailing whitespace. Whitespace
characters are: ' ', '\x0C' (form feed), '\n', '\r', and '\t'.

Since 4.00.0
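
A sketch of trim. Interior whitespace is preserved, and since only the five characters listed above count as whitespace, a leading vertical tab ('\x0B') is kept.

```ocaml
let cleaned = String.trim " \t hello  world \r\n"

(* Vertical tab is not in the whitespace list, so it survives. *)
let vt_kept = String.trim "\x0Bx "
```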


val escaped : string -> string

escaped s is s with special characters represented by escape sequences,
following the lexical conventions of OCaml.

All characters outside the US-ASCII printable range [0x20;0x7E] are
escaped, as well as backslash (0x5C) and double-quote (0x22).

The function Scanf.unescaped is a left inverse of escaped, i.e.
Scanf.unescaped (escaped s) = s for any string s (unless escaped s
fails).

Raises Invalid_argument if the result is longer than
Sys.max_string_length bytes.
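
A round-trip sketch of escaped and Scanf.unescaped. The non-printable byte 0x01 is escaped as a three-digit decimal sequence, following OCaml's lexical conventions; the value names are illustrative.

```ocaml
(* Contains a tab, double quotes, a backslash and the control byte 0x01. *)
let original = "tab\there \"q\" \\ \x01"
let esc = String.escaped original

(* Scanf.unescaped is a left inverse of String.escaped. *)
let round_trips = Scanf.unescaped esc = original
```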


val uppercase_ascii : string -> string

uppercase_ascii s is s with all lowercase letters translated to
uppercase, using the US-ASCII character set.

Since 4.03.0 (4.05.0 in StringLabels)


val lowercase_ascii : string -> string

lowercase_ascii s is s with all uppercase letters translated to
lowercase, using the US-ASCII character set.

Since 4.03.0 (4.05.0 in StringLabels)


val capitalize_ascii : string -> string

capitalize_ascii s is s with the first character set to uppercase,
using the US-ASCII character set.

Since 4.03.0 (4.05.0 in StringLabels)


val uncapitalize_ascii : string -> string

uncapitalize_ascii s is s with the first character set to lowercase,
using the US-ASCII character set.

Since 4.03.0 (4.05.0 in StringLabels)
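
A sketch of the ASCII case functions; the last binding shows that a Latin-1 byte such as '\xE9' is left as is.

```ocaml
let shout = String.uppercase_ascii "Hello, World!"
let quiet = String.lowercase_ascii "Hello, World!"
let cap = String.capitalize_ascii "ocaml"
let uncap = String.uncapitalize_ascii "OCaml"

(* Bytes outside US-ASCII, e.g. Latin-1 '\xE9', are not translated. *)
let latin1 = String.uppercase_ascii "caf\xE9"
```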



Traversing

val iter : (char -> unit) -> string -> unit

iter f s applies function f in turn to all the characters of s. It is
equivalent to f s.[0]; f s.[1]; ...; f s.[length s - 1]; ().


val iteri : (int -> char -> unit) -> string -> unit

iteri is like String.iter, but the function is also given the
corresponding character index.

Since 4.00.0
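
A sketch of iter and iteri used for their side effects; describe and vowels are hypothetical helpers.

```ocaml
(* iteri passes each index together with its character. *)
let describe s =
  let buf = Buffer.create 16 in
  String.iteri (fun i c -> Buffer.add_string buf (Printf.sprintf "%d:%c " i c)) s;
  Buffer.contents buf

(* iter visits characters in increasing index order. *)
let vowels s =
  let n = ref 0 in
  String.iter (fun c -> match c with 'a' | 'e' | 'i' | 'o' | 'u' -> incr n | _ -> ()) s;
  !n
```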



Searching

val index_from : string -> int -> char -> int

index_from s i c is the index of the first occurrence of c in s after
position i.

Raises Not_found if c does not occur in s after position i.

Raises Invalid_argument if i is not a valid position in s.


val index_from_opt : string -> int -> char -> int option

index_from_opt s i c is the index of the first occurrence of c in s
after position i (if any).

Since 4.05

Raises Invalid_argument if i is not a valid position in s.


val rindex_from : string -> int -> char -> int

rindex_from s i c is the index of the last occurrence of c in s before
position i+1.

Raises Not_found if c does not occur in s before position i+1.

Raises Invalid_argument if i+1 is not a valid position in s.


val rindex_from_opt : string -> int -> char -> int option

rindex_from_opt s i c is the index of the last occurrence of c in s
before position i+1 (if any).

Since 4.05

Raises Invalid_argument if i+1 is not a valid position in s.


val index : string -> char -> int

index s c is String.index_from s 0 c.


val index_opt : string -> char -> int option

index_opt s c is String.index_from_opt s 0 c.

Since 4.05


val rindex : string -> char -> int

rindex s c is String.rindex_from s (length s - 1) c.


val rindex_opt : string -> char -> int option

rindex_opt s c is String.rindex_from_opt s (length s - 1) c.

Since 4.05
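
A sketch of the searching functions on a string with two occurrences of '='. split_once is a hypothetical helper (not part of String) that combines index_opt with sub.

```ocaml
let s = "key=value=x"   (* '=' at indices 3 and 9 *)

let first = String.index s '='                   (* 3 *)
let last = String.rindex s '='                   (* 9 *)
let next = String.index_from s (first + 1) '='   (* 9 *)
let missing = String.index_opt s '#'             (* None *)

(* Hypothetical helper: split at the first occurrence of sep. *)
let split_once sep s =
  match String.index_opt s sep with
  | None -> None
  | Some i ->
    Some (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
```

Using the _opt variants avoids handling Not_found; the non-option versions raise it when c does not occur.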



Strings and Sequences

val to_seq : t -> char Seq.t

to_seq s is a sequence made of the string's characters in increasing
order. In "unsafe-string" mode, modifications of the string during
iteration will be reflected in the sequence.

Since 4.07


val to_seqi : t -> (int * char) Seq.t

to_seqi s is like String.to_seq but also pairs each character with its
index.

Since 4.07


val of_seq : char Seq.t -> t

of_seq s is a string made of the sequence's characters.

Since 4.07
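
A sketch combining to_seq, Seq.filter and of_seq to filter a string through a sequence, plus to_seqi; letters_only and is_letter are hypothetical helpers.

```ocaml
(* Hypothetical predicate: ASCII letters only. *)
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Stream the characters, keep the letters, rebuild a string. *)
let letters_only s = String.to_seq s |> Seq.filter is_letter |> String.of_seq

(* to_seqi pairs each character with its index. *)
let indexed = List.of_seq (String.to_seqi "ab")
```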



UTF decoding and validations

UTF-8

val get_utf_8_uchar : t -> int -> Uchar.utf_decode

get_utf_8_uchar b i decodes a UTF-8 character at index i in b.


val is_valid_utf_8 : t -> bool

is_valid_utf_8 b is true if and only if b contains valid UTF-8 data.
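
A sketch of walking a string with get_utf_8_uchar, assuming OCaml 4.14 or later (where String.get_utf_8_uchar, Uchar.utf_decode_length and Uchar.utf_decode_uchar are available). Each decode reports how many bytes it consumed, so the loop advances by that amount; invalid data decodes to Uchar.rep (U+FFFD). utf_8_length and code_points are hypothetical helpers.

```ocaml
(* Count decoded characters; an invalid sequence counts as one
   (replacement) character and decoding resumes after it. *)
let utf_8_length s =
  let rec loop i count =
    if i >= String.length s then count
    else
      let d = String.get_utf_8_uchar s i in
      loop (i + Uchar.utf_decode_length d) (count + 1)
  in
  loop 0 0

(* Collect code points, using U+FFFD for invalid sequences. *)
let code_points s =
  let rec loop i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      let u = Uchar.to_int (Uchar.utf_decode_uchar d) in
      loop (i + Uchar.utf_decode_length d) (u :: acc)
  in
  loop 0 []
```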



UTF-16BE

val get_utf_16be_uchar : t -> int -> Uchar.utf_decode

get_utf_16be_uchar b i decodes a UTF-16BE character at index i in b.


val is_valid_utf_16be : t -> bool

is_valid_utf_16be b is true if and only if b contains valid UTF-16BE
data.



UTF-16LE

val get_utf_16le_uchar : t -> int -> Uchar.utf_decode

get_utf_16le_uchar b i decodes a UTF-16LE character at index i in b.


val is_valid_utf_16le : t -> bool

is_valid_utf_16le b is true if and only if b contains valid UTF-16LE
data.



Deprecated functions

val create : int -> bytes

Deprecated. This is a deprecated alias of Bytes.create /
BytesLabels.create.

create n returns a fresh byte sequence of length n. The sequence is
uninitialized and contains arbitrary bytes.

Raises Invalid_argument if n < 0 or n > Sys.max_string_length.


val set : bytes -> int -> char -> unit

Deprecated. This is a deprecated alias of Bytes.set / BytesLabels.set.

set s n c modifies byte sequence s in place, replacing the byte at
index n with c. You can also write s.[n] <- c instead of set s n c.

Raises Invalid_argument if n is not a valid index in s.


val blit : string -> int -> bytes -> int -> int -> unit

blit src src_pos dst dst_pos len copies len bytes from the string src,
starting at index src_pos, to byte sequence dst, starting at index
dst_pos.

Raises Invalid_argument if src_pos and len do not designate a valid
range of src, or if dst_pos and len do not designate a valid range of
dst.


val copy : string -> string

Deprecated. Because strings are immutable, it doesn't make much sense
to make identical copies of them.

Return a copy of the given string.


val fill : bytes -> int -> int -> char -> unit

Deprecated. This is a deprecated alias of Bytes.fill /
BytesLabels.fill.

fill s pos len c modifies byte sequence s in place, replacing len bytes
with c, starting at pos.

Raises Invalid_argument if pos and len do not designate a valid
substring of s.


val uppercase : string -> string

Deprecated. Functions operating on the Latin-1 character set are
deprecated.

Return a copy of the argument, with all lowercase letters translated to
uppercase, including accented letters of the ISO Latin-1 (8859-1)
character set.


val lowercase : string -> string

Deprecated. Functions operating on the Latin-1 character set are
deprecated.

Return a copy of the argument, with all uppercase letters translated to
lowercase, including accented letters of the ISO Latin-1 (8859-1)
character set.


val capitalize : string -> string

Deprecated. Functions operating on the Latin-1 character set are
deprecated.

Return a copy of the argument, with the first character set to
uppercase, using the ISO Latin-1 (8859-1) character set.


val uncapitalize : string -> string

Deprecated. Functions operating on the Latin-1 character set are
deprecated.

Return a copy of the argument, with the first character set to
lowercase, using the ISO Latin-1 (8859-1) character set.
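
For code that still calls the deprecated functions, a sketch of equivalents using the Bytes module: build the value in a mutable byte sequence, then freeze it with Bytes.to_string. Unlike the Latin-1 String.uppercase, String.uppercase_ascii leaves accented Latin-1 letters unchanged, so the replacement is not exact for non-ASCII input.

```ocaml
let banner =
  let b = Bytes.create 7 in       (* uninitialized, like String.create *)
  Bytes.fill b 0 7 '=';           (* instead of String.fill *)
  String.blit "abc" 0 b 2 3;      (* copy "abc" into indices 2..4 of b *)
  Bytes.set b 6 '!';              (* instead of String.set *)
  Bytes.to_string b               (* freeze into an immutable string *)

(* ASCII-only replacement for the deprecated String.uppercase. *)
let upper = String.uppercase_ascii "deprecated"
```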



Binary decoding of integers

The functions in this section decode integers from their binary
encoding in strings.

All following functions raise Invalid_argument if the characters needed
at index i to decode the integer are not available.

Little-endian (resp. big-endian) encoding means that least (resp. most)
significant bytes are stored first. Big-endian is also known as network
byte order. Native-endian encoding is either little-endian or
big-endian depending on Sys.big_endian.

32-bit and 64-bit integers are represented by the int32 and int64
types, which can be interpreted either as signed or unsigned numbers.

8-bit and 16-bit integers are represented by the int type, which has
more bits than the binary encoding. These extra bits are sign-extended
by the functions that decode signed 8-bit or 16-bit integers, and
zero-extended by those that decode unsigned ones.
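
A sketch of sign versus zero extension on the same bytes, assuming OCaml 4.13 or later; the value names are illustrative.

```ocaml
let data = "\xFF\x80\x7F"

let u0 = String.get_uint8 data 0   (* zero-extended: 255 *)
let s0 = String.get_int8 data 0    (* sign-extended: -1 *)
let s1 = String.get_int8 data 1    (* -128 *)
let s2 = String.get_int8 data 2    (* 127 *)

(* The same two bytes read as unsigned and signed big-endian 16-bit. *)
let u16 = String.get_uint16_be "\xFF\xFE" 0   (* 0xFFFE = 65534 *)
let i16 = String.get_int16_be "\xFF\xFE" 0    (* -2 *)

(* Reading past the end raises Invalid_argument. *)
let out_of_range =
  match String.get_uint8 data 3 with
  | _ -> false
  | exception Invalid_argument _ -> true
```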


val get_uint8 : string -> int -> int

get_uint8 b i is b's unsigned 8-bit integer starting at character
index i.

Since 4.13.0


val get_int8 : string -> int -> int

get_int8 b i is b's signed 8-bit integer starting at character index i.

Since 4.13.0


val get_uint16_ne : string -> int -> int

get_uint16_ne b i is b's native-endian unsigned 16-bit integer starting
at character index i.

Since 4.13.0


val get_uint16_be : string -> int -> int

get_uint16_be b i is b's big-endian unsigned 16-bit integer starting at
character index i.

Since 4.13.0


val get_uint16_le : string -> int -> int

get_uint16_le b i is b's little-endian unsigned 16-bit integer starting
at character index i.

Since 4.13.0


val get_int16_ne : string -> int -> int

get_int16_ne b i is b's native-endian signed 16-bit integer starting at
character index i.

Since 4.13.0


val get_int16_be : string -> int -> int

get_int16_be b i is b's big-endian signed 16-bit integer starting at
character index i.

Since 4.13.0


val get_int16_le : string -> int -> int

get_int16_le b i is b's little-endian signed 16-bit integer starting at
character index i.

Since 4.13.0


val get_int32_ne : string -> int -> int32

get_int32_ne b i is b's native-endian 32-bit integer starting at
character index i.

Since 4.13.0


val get_int32_be : string -> int -> int32

get_int32_be b i is b's big-endian 32-bit integer starting at character
index i.

Since 4.13.0


val get_int32_le : string -> int -> int32

get_int32_le b i is b's little-endian 32-bit integer starting at
character index i.

Since 4.13.0


val get_int64_ne : string -> int -> int64

get_int64_ne b i is b's native-endian 64-bit integer starting at
character index i.

Since 4.13.0


val get_int64_be : string -> int -> int64

get_int64_be b i is b's big-endian 64-bit integer starting at character
index i.

Since 4.13.0


val get_int64_le : string -> int -> int64

get_int64_le b i is b's little-endian 64-bit integer starting at
character index i.

Since 4.13.0
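
As a worked example, this sketch decodes a fixed-size binary header, assuming OCaml 4.13 or later. The layout (a 16-bit magic number, a 32-bit length and a 64-bit timestamp with mixed byte orders) and the names header, parse_header and sample are hypothetical, chosen only to exercise several of the functions above.

```ocaml
(* Hypothetical 14-byte layout, assumed for illustration:
     bytes 0-1   magic      unsigned 16-bit, big-endian
     bytes 2-5   length     32-bit, big-endian
     bytes 6-13  timestamp  64-bit, little-endian *)
type header = { magic : int; length : int32; timestamp : int64 }

let parse_header s =
  (* Check the size up front instead of catching Invalid_argument. *)
  if String.length s < 14 then None
  else
    Some {
      magic = String.get_uint16_be s 0;
      length = String.get_int32_be s 2;
      timestamp = String.get_int64_le s 6;
    }

let sample = "\xCA\xFE\x00\x00\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00"
```

Whether an int32 such as the one returned by get_int32_be "\xFF\xFF\xFF\xFF" 0 (which is -1l) means -1 or 4294967295 is up to the caller, as noted in the section introduction.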


OCamldoc                        2022-07-22                        String(3)