String(3)                        OCaml library                       String(3)

NAME

6       String - Strings.
7

Module

9       Module   String
10

Documentation

12       Module String
13        : sig end
14
15
16       Strings.
17
18       A  string  s  of  length  n is an indexable and immutable sequence of n
19       bytes. For historical reasons these bytes are referred  to  as  charac‐
20       ters.
21
22       The  semantics  of  string functions is defined in terms of indices and
23       positions. These are depicted and described as follows.
24
       positions  0   1   2   3   4    n-1    n
                  +---+---+---+---+     +-----+
         indices  | 0 | 1 | 2 | 3 | ... | n-1 |
                  +---+---+---+---+     +-----+
27
       - An index i of s is an integer in the range [0; n-1]. It represents
         the i-th byte (character) of s, which can be accessed using the
         constant-time string indexing operator s.[i].

       - A position i of s is an integer in the range [0; n]. It represents
         either the point at the beginning of the string, or the point between
         two indices, or the point at the end of the string. The byte at
         index i lies between positions i and i+1.
36
37
38       Two integers start and len are said to define a valid substring of s if
39       len >= 0 and start , start+len are positions of s .
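
       As an illustration of these definitions, the following minimal sketch
       checks the valid-substring condition by hand. The helper
       is_valid_substring is a hypothetical name introduced here; it is not
       part of the String module.

```ocaml
(* Hypothetical helper: true when start and len define a valid substring
   of s, i.e. len >= 0 and both start and start+len are positions of s.
   Written as len <= n - start to avoid overflow in start + len. *)
let is_valid_substring s start len =
  let n = String.length s in
  len >= 0 && start >= 0 && start <= n && len <= n - start

let s = "camel"

(* Indices range over [0; n-1]. *)
let first = s.[0]                             (* 'c' *)
let last = s.[String.length s - 1]            (* 'l' *)

(* Positions range over [0; n], so an empty substring at position n is valid. *)
let tail = String.sub s (String.length s) 0   (* "" *)
let middle = String.sub s 1 3                 (* "ame" *)
```

       Note that position n is valid while index n is not: s.[5] raises
       Invalid_argument for this string, but String.sub s 5 0 succeeds.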
40
41       Unicode text. Strings being arbitrary sequences of bytes, they can hold
42       any kind of textual encoding.  However  the  recommended  encoding  for
43       storing  Unicode  text  in OCaml strings is UTF-8. This is the encoding
44       used by Unicode escapes in string  literals.  For  example  the  string
45       "\u{1F42B}" is the UTF-8 encoding of the Unicode character U+1F42B.
46
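
       A short sketch of the point above: the Unicode escape produces UTF-8
       bytes, so String.length counts bytes rather than Unicode characters.
       The variable names are illustrative only.

```ocaml
(* U+1F42B written with a Unicode escape; the literal holds its UTF-8 bytes. *)
let camel = "\u{1F42B}"

(* The length is the number of bytes, not the number of Unicode characters. *)
let byte_length = String.length camel               (* 4 *)

(* The same string written byte by byte. *)
let same_bytes = String.equal camel "\xF0\x9F\x90\xAB"
```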
47       Past  mutability. OCaml strings used to be modifiable in place, for in‐
48       stance via the String.set and String.blit functions. This use is  nowa‐
49       days  only possible when the compiler is put in "unsafe-string" mode by
50       giving the -unsafe-string command-line option. This compatibility  mode
51       makes the types string and bytes (see Bytes.t ) interchangeable so that
52       functions expecting byte sequences can also accept strings as arguments
53       and modify them.
54
55       The  distinction between bytes and string was introduced in OCaml 4.02,
56       and the "unsafe-string" compatibility mode was the default until  OCaml
57       4.05.  Starting  with 4.06, the compatibility mode is opt-in; we intend
58       to remove the option in the future.
59
60       The labeled version of this module can be used as described in the Std‐
61       Labels module.
62
63
64
65
66
67
68
   Strings
70       type t = string
71
72
73       The type for strings.
74
75
76
77       val make : int -> char -> string
78
79
80       make  n c is a string of length n with each index holding the character
81       c .
82
83
84       Raises Invalid_argument if n < 0 or n > Sys.max_string_length .
85
86
87
88       val init : int -> (int -> char) -> string
89
90
91       init n f is a string of length n with index i holding the character f i
92       (called in increasing index order).
93
94
95       Since 4.02.0
96
97
98       Raises Invalid_argument if n < 0 or n > Sys.max_string_length .
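
       A minimal sketch of make and init, including the error case for a
       negative length. The names below are illustrative only.

```ocaml
(* Three copies of the same character. *)
let dashes = String.make 3 '-'                      (* "---" *)

(* Index i holds the character computed by the function for i. *)
let alphabet = String.init 26 (fun i -> Char.chr (Char.code 'a' + i))

(* A negative length is rejected with Invalid_argument. *)
let negative_length_fails =
  match String.make (-1) 'x' with
  | _ -> false
  | exception Invalid_argument _ -> true
```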
99
100
101
102       val empty : string
103
104       The empty string.
105
106
107       Since 4.13.0
108
109
110
111       val of_bytes : bytes -> string
112
113       Return  a new string that contains the same bytes as the given byte se‐
114       quence.
115
116
117       Since 4.13.0
118
119
120
121       val to_bytes : string -> bytes
122
123       Return a new byte sequence that contains the same bytes  as  the  given
124       string.
125
126
127       Since 4.13.0
128
129
130
131       val length : string -> int
132
133
134       length s is the length (number of bytes/characters) of s .
135
136
137
138       val get : string -> int -> char
139
140
141       get  s i is the character at index i in s . This is the same as writing
142       s.[i] .
143
144
       Raises Invalid_argument if i is not an index of s.
146
147
148
149
   Concatenating
151       Note. The (^) binary operator concatenates two strings.
152
153       val concat : string -> string list -> string
154
155
156       concat sep ss concatenates the list of strings ss , inserting the sepa‐
157       rator string sep between each.
158
159
160       Raises    Invalid_argument    if    the    result    is   longer   than
161       Sys.max_string_length bytes.
162
163
164
165       val cat : string -> string -> string
166
167
168       cat s1 s2 concatenates s1 and s2 ( s1 ^ s2 ).
169
170
171       Since 4.13.0
172
173
       Raises Invalid_argument if the result is longer than
175       Sys.max_string_length bytes.
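
       The concatenation functions can be compared side by side in a short
       sketch. It assumes OCaml 4.13 or later because it uses cat; the names
       are illustrative only.

```ocaml
(* The separator goes between elements, never before or after them. *)
let csv = String.concat "," ["a"; "b"; "c"]         (* "a,b,c" *)

(* An empty list gives the empty string. *)
let nothing = String.concat "," []                  (* "" *)

(* cat and (^) compute the same result. *)
let with_cat = String.cat "Hello, " "world"
let with_op = "Hello, " ^ "world"
```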
176
177
178
179
   Predicates and comparisons
181       val equal : t -> t -> bool
182
183
184       equal s0 s1 is true if and only if s0 and s1 are character-wise equal.
185
186
187       Since 4.03.0 (4.05.0 in StringLabels)
188
189
190
191       val compare : t -> t -> int
192
193
       compare s0 s1 sorts s0 and s1 in lexicographical order. compare
       behaves like Stdlib.compare on strings but may be more efficient.
196
197
198
199       val starts_with : prefix:string -> string -> bool
200
201
       starts_with ~prefix s is true if and only if s starts with prefix.
203
204
205       Since 4.13.0
206
207
208
209       val ends_with : suffix:string -> string -> bool
210
211
       ends_with ~suffix s is true if and only if s ends with suffix.
213
214
215       Since 4.13.0
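
       A small sketch of the labeled prefix and suffix tests, assuming OCaml
       4.13 or later. The example strings are arbitrary.

```ocaml
(* Both functions take the fragment as a labeled argument. *)
let has_scheme = String.starts_with ~prefix:"https://" "https://ocaml.org"
let is_ml_file = String.ends_with ~suffix:".ml" "string.ml"

(* The empty string is a prefix of every string. *)
let empty_prefix = String.starts_with ~prefix:"" "abc"

(* A prefix longer than the string never matches. *)
let too_long = String.starts_with ~prefix:"abcd" "abc"
```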
216
217
218
219       val contains_from : string -> int -> char -> bool
220
221
222       contains_from s start c is true if and only if c appears in s after po‐
223       sition start .
224
225
226       Raises Invalid_argument if start is not a valid position in s .
227
228
229
230       val rcontains_from : string -> int -> char -> bool
231
232
233       rcontains_from  s  stop  c is true if and only if c appears in s before
234       position stop+1 .
235
236
237       Raises Invalid_argument if stop < 0 or stop+1 is not a  valid  position
238       in s .
239
240
241
242       val contains : string -> char -> bool
243
244
245       contains s c is String.contains_from s 0 c .
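
       The boundary behaviour of contains_from and rcontains_from is easy to
       get wrong, so here is a minimal sketch. The helper raises is a
       hypothetical name used only to observe Invalid_argument.

```ocaml
let s = "abcabc"

(* The search starts at the given position, so index 3 itself is examined. *)
let from_3 = String.contains_from s 3 'a'          (* true *)
let from_4 = String.contains_from s 4 'a'          (* false *)

(* Position 6 (the length) is valid; nothing is left to search. *)
let at_end = String.contains_from s 6 'a'          (* false *)

(* rcontains_from examines indices stop, stop-1, ..., 0. *)
let up_to_1 = String.rcontains_from s 1 'c'        (* false *)
let up_to_2 = String.rcontains_from s 2 'c'        (* true *)

(* Hypothetical helper: true when the call raises Invalid_argument. *)
let raises f =
  match f () with
  | _ -> false
  | exception Invalid_argument _ -> true

let past_end = raises (fun () -> String.contains_from s 7 'a')
let r_past_end = raises (fun () -> String.rcontains_from s 6 'a')
```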
246
247
248
249
   Extracting substrings
251       val sub : string -> int -> int -> string
252
253
254       sub s pos len is a string of length len , containing the substring of s
255       that starts at position pos and has length len .
256
257
258       Raises Invalid_argument if pos and len do not designate  a  valid  sub‐
259       string of s .
260
261
262
263       val split_on_char : char -> string -> string list
264
265
266       split_on_char sep s is the list of all (possibly empty) substrings of s
267       that are delimited by the character sep .
268
269       The function's result is specified by the following invariants:
270
271       -The list is not empty.
272
273       -Concatenating its elements using sep as a separator returns  a  string
274       equal to the input ( concat (make 1 sep)
275             (split_on_char sep s) = s ).
276
277       -No string in the result contains the sep character.
278
279
280
281       Since 4.04.0 (4.05.0 in StringLabels)
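
       The invariants above can be checked on a few inputs. This is a sketch
       only; round_trip is a hypothetical helper name, not a library
       function.

```ocaml
(* Adjacent and trailing separators produce empty substrings. *)
let fields = String.split_on_char ',' "a,,b,"      (* ["a"; ""; "b"; ""] *)

(* The result is never the empty list, even for the empty string. *)
let of_empty = String.split_on_char ',' ""         (* [""] *)

(* Hypothetical helper: joining the pieces with sep gives back the input. *)
let round_trip sep s =
  String.concat (String.make 1 sep) (String.split_on_char sep s) = s
```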
282
283
284
285
   Transforming
287       val map : (char -> char) -> string -> string
288
289
290       map  f  s is the string resulting from applying f to all the characters
291       of s in increasing order.
292
293
294       Since 4.00.0
295
296
297
298       val mapi : (int -> char -> char) -> string -> string
299
300
301       mapi f s is like String.map but the index  of  the  character  is  also
302       passed to f .
303
304
305       Since 4.02.0
306
307
308
309       val fold_left : ('a -> char -> 'a) -> 'a -> string -> 'a
310
311
312       fold_left  f  x  s computes f (... (f (f x s.[0]) s.[1]) ...) s.[n-1] ,
313       where n is the length of the string s .
314
315
316       Since 4.13.0
317
318
319
320       val fold_right : (char -> 'a -> 'a) -> string -> 'a -> 'a
321
322
323       fold_right f s x computes f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...))  ,
324       where n is the length of the string s .
325
326
327       Since 4.13.0
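
       To make the two fold directions concrete, here is a minimal sketch
       assuming OCaml 4.13 or later. The helpers count_char and to_list are
       hypothetical names, not library functions.

```ocaml
(* fold_left threads an accumulator from index 0 to index n-1. *)
let count_char c s =
  String.fold_left (fun acc x -> if x = c then acc + 1 else acc) 0 s

(* fold_right starts from index n-1, so consing preserves the order. *)
let to_list s = String.fold_right (fun c acc -> c :: acc) s []
```

       For example, count_char 'a' "banana" is 3 and to_list "abc" is
       ['a'; 'b'; 'c'].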
328
329
330
331       val for_all : (char -> bool) -> string -> bool
332
333
334       for_all p s checks if all characters in s satisfy the predicate p .
335
336
337       Since 4.13.0
338
339
340
341       val exists : (char -> bool) -> string -> bool
342
343
344       exists  p  s checks if at least one character of s satisfies the predi‐
345       cate p .
346
347
348       Since 4.13.0
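
       A short sketch of for_all and exists, assuming OCaml 4.13 or later.
       The predicate is_digit is a hypothetical helper.

```ocaml
(* Hypothetical predicate: true for the ASCII digits '0' to '9'. *)
let is_digit c = c >= '0' && c <= '9'

let all_digits s = String.for_all is_digit s
let has_digit s = String.exists is_digit s
```

       On the empty string for_all holds for every predicate and exists holds
       for none, so all_digits "" is true and has_digit "" is false.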
349
350
351
352       val trim : string -> string
353
354
355       trim s is s without leading and trailing whitespace. Whitespace charac‐
356       ters are: ' ' , '\x0C' (form feed), '\n' , '\r' , and '\t' .
357
358
359       Since 4.00.0
360
361
362
363       val escaped : string -> string
364
365
366       escaped s is s with special characters represented by escape sequences,
367       following the lexical conventions of OCaml.
368
       All characters outside the US-ASCII printable range [0x20;0x7E] are
       escaped, as well as backslash (0x5C) and double-quote (0x22).
371
372       The  function  Scanf.unescaped  is  a  left  inverse  of escaped , i.e.
373       Scanf.unescaped (escaped s) = s for any  string  s  (unless  escaped  s
374       fails).
375
376
377       Raises    Invalid_argument    if    the    result    is   longer   than
378       Sys.max_string_length bytes.
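
       The left-inverse property can be observed directly. A minimal sketch
       with an arbitrary example string:

```ocaml
(* A string with a tab, double quotes and a backslash. *)
let raw = "tab\there \"quoted\" back\\slash"

(* Each special character is replaced by its escape sequence. *)
let esc = String.escaped raw

(* Scanf.unescaped undoes the escaping. *)
let back = Scanf.unescaped esc

(* Non-printable bytes without a short escape use a decimal escape. *)
let high_byte = String.escaped "\xff"              (* the four bytes \255 *)
```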
379
380
381
382       val uppercase_ascii : string -> string
383
384
385       uppercase_ascii s is s with all lowercase letters translated to  upper‐
386       case, using the US-ASCII character set.
387
388
389       Since 4.03.0 (4.05.0 in StringLabels)
390
391
392
393       val lowercase_ascii : string -> string
394
395
396       lowercase_ascii  s is s with all uppercase letters translated to lower‐
397       case, using the US-ASCII character set.
398
399
400       Since 4.03.0 (4.05.0 in StringLabels)
401
402
403
404       val capitalize_ascii : string -> string
405
406
407       capitalize_ascii s is s with the first character set to uppercase,  us‐
408       ing the US-ASCII character set.
409
410
411       Since 4.03.0 (4.05.0 in StringLabels)
412
413
414
415       val uncapitalize_ascii : string -> string
416
417
418       uncapitalize_ascii  s  is  s with the first character set to lowercase,
419       using the US-ASCII character set.
420
421
422       Since 4.03.0 (4.05.0 in StringLabels)
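
       A brief sketch of the four US-ASCII case functions. Bytes outside the
       US-ASCII letters, such as the UTF-8 encoding of an accented letter,
       are left unchanged. The example strings are arbitrary.

```ocaml
let upper = String.uppercase_ascii "ocaml"         (* "OCAML" *)
let lower = String.lowercase_ascii "OCaml"         (* "ocaml" *)
let cap = String.capitalize_ascii "ocaml"          (* "Ocaml" *)
let uncap = String.uncapitalize_ascii "OCaml"      (* "oCaml" *)

(* "\xC3\xA9" is the UTF-8 encoding of U+00E9; its bytes are not ASCII
   letters, so only the surrounding letters change. *)
let accented = String.uppercase_ascii "h\xC3\xA9llo"
```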
423
424
425
426
   Traversing
428       val iter : (char -> unit) -> string -> unit
429
430
431       iter f s applies function f in turn to all the characters of s .  It is
432       equivalent to f s.[0]; f s.[1]; ...; f s.[length s - 1]; () .
433
434
435
436       val iteri : (int -> char -> unit) -> string -> unit
437
438
439       iteri  is  like String.iter , but the function is also given the corre‐
440       sponding character index.
441
442
443       Since 4.00.0
444
445
446
447
   Searching
449       val index_from : string -> int -> char -> int
450
451
452       index_from s i c is the index of the first occurrence of c in  s  after
453       position i .
454
455
456       Raises Not_found if c does not occur in s after position i .
457
458
459       Raises Invalid_argument if i is not a valid position in s .
460
461
462
463       val index_from_opt : string -> int -> char -> int option
464
465
466       index_from_opt s i c is the index of the first occurrence of c in s af‐
467       ter position i (if any).
468
469
470       Since 4.05
471
472
473       Raises Invalid_argument if i is not a valid position in s .
474
475
476
477       val rindex_from : string -> int -> char -> int
478
479
480       rindex_from s i c is the index of the last occurrence of c in s  before
481       position i+1 .
482
483
484       Raises Not_found if c does not occur in s before position i+1 .
485
486
487       Raises Invalid_argument if i+1 is not a valid position in s .
488
489
490
491       val rindex_from_opt : string -> int -> char -> int option
492
493
494       rindex_from_opt s i c is the index of the last occurrence of c in s be‐
495       fore position i+1 (if any).
496
497
498       Since 4.05
499
500
501       Raises Invalid_argument if i+1 is not a valid position in s .
502
503
504
505       val index : string -> char -> int
506
507
508       index s c is String.index_from s 0 c .
509
510
511
512       val index_opt : string -> char -> int option
513
514
515       index_opt s c is String.index_from_opt s 0 c .
516
517
518       Since 4.05
519
520
521
522       val rindex : string -> char -> int
523
524
525       rindex s c is String.rindex_from s (length s - 1) c .
526
527
528
529       val rindex_opt : string -> char -> int option
530
531
532       rindex_opt s c is String.rindex_from_opt s (length s - 1) c .
533
534
535       Since 4.05
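
       The searching functions are sketched below on an arbitrary path-like
       string. The helper basename is a hypothetical name and is unrelated to
       Filename.basename.

```ocaml
let path = "a/b/c"

let first_slash = String.index path '/'            (* 1 *)
let last_slash = String.rindex path '/'            (* 3 *)

(* The search includes the starting index itself. *)
let next_slash = String.index_from path 2 '/'      (* 3 *)
let prev_slash = String.rindex_from path 2 '/'     (* 1 *)

(* The _opt variants return None instead of raising Not_found. *)
let no_colon = String.index_opt path ':'           (* None *)

(* Hypothetical helper: the part after the last '/', or the whole string. *)
let basename s =
  match String.rindex_opt s '/' with
  | None -> s
  | Some i -> String.sub s (i + 1) (String.length s - i - 1)
```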
536
537
538
539
   Strings and Sequences
541       val to_seq : t -> char Seq.t
542
543
544       to_seq s is a sequence made of the string's  characters  in  increasing
545       order.  In "unsafe-string" mode, modifications of the string during it‐
546       eration will be reflected in the sequence.
547
548
549       Since 4.07
550
551
552
553       val to_seqi : t -> (int * char) Seq.t
554
555
556       to_seqi s is like String.to_seq but also tuples the  corresponding  in‐
557       dex.
558
559
560       Since 4.07
561
562
563
564       val of_seq : char Seq.t -> t
565
566
567       of_seq s is a string made of the sequence's characters.
568
569
570       Since 4.07
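
       A minimal sketch of converting between strings and sequences, assuming
       OCaml 4.07 or later. The helpers reversed and indexed are hypothetical
       names.

```ocaml
(* Hypothetical helper: reverse a string byte by byte through a list. *)
let reversed s =
  s |> String.to_seq |> List.of_seq |> List.rev |> List.to_seq |> String.of_seq

(* Hypothetical helper: pair each character with its index. *)
let indexed s = List.of_seq (String.to_seqi s)
```

       Because reversed works on bytes, it is only meaningful for text in
       which every character is a single byte.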
571
572
573
574
   UTF decoding and validations
   UTF-8
577       val get_utf_8_uchar : t -> int -> Uchar.utf_decode
578
579
       get_utf_8_uchar b i decodes a UTF-8 character at index i in b.
581
582
583
584       val is_valid_utf_8 : t -> bool
585
586
587       is_valid_utf_8 b is true if and only if b contains valid UTF-8 data.
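
       As a sketch of how the decoding result can be consumed, the code below
       walks a string one decode at a time using the Uchar.utf_decode
       accessors. It assumes OCaml 4.14 or later; utf_8_length and
       first_scalar are hypothetical helper names.

```ocaml
(* Hypothetical helper: number of decodes needed to traverse s. Each
   invalid byte sequence counts as one decode, like a replacement
   character would. *)
let utf_8_length s =
  let rec loop i count =
    if i >= String.length s then count
    else
      let d = String.get_utf_8_uchar s i in
      (* utf_decode_length is the number of bytes consumed by this decode. *)
      loop (i + Uchar.utf_decode_length d) (count + 1)
  in
  loop 0 0

(* Hypothetical helper: scalar value of the first character, if valid. *)
let first_scalar s =
  if s = "" then None
  else
    let d = String.get_utf_8_uchar s 0 in
    if Uchar.utf_decode_is_valid d
    then Some (Uchar.to_int (Uchar.utf_decode_uchar d))
    else None
```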
588
589
590
591
   UTF-16BE
593       val get_utf_16be_uchar : t -> int -> Uchar.utf_decode
594
595
       get_utf_16be_uchar b i decodes a UTF-16BE character at index i in b.
597
598
599
600       val is_valid_utf_16be : t -> bool
601
602
603       is_valid_utf_16be  b  is  true if and only if b contains valid UTF-16BE
604       data.
605
606
607
608
   UTF-16LE
610       val get_utf_16le_uchar : t -> int -> Uchar.utf_decode
611
612
       get_utf_16le_uchar b i decodes a UTF-16LE character at index i in b.
614
615
616
617       val is_valid_utf_16le : t -> bool
618
619
620       is_valid_utf_16le b is true if and only if b  contains  valid  UTF-16LE
621       data.
622
623
624
625
   Deprecated functions
627       val create : int -> bytes
628
       Deprecated.  This is a deprecated alias of Bytes.create /
       BytesLabels.create.
631
632
633
634       create n returns a fresh byte sequence of length n .  The  sequence  is
635       uninitialized and contains arbitrary bytes.
636
637
638       Raises Invalid_argument if n < 0 or n > Sys.max_string_length .
639
640
641
642       val set : bytes -> int -> char -> unit
643
       Deprecated.  This is a deprecated alias of Bytes.set /
       BytesLabels.set.
646
647
648
649       set s n c modifies byte sequence s in place, replacing the byte at  in‐
650       dex n with c .  You can also write s.[n] <- c instead of set s n c .
651
652
653       Raises Invalid_argument if n is not a valid index in s .
654
655
656
657       val blit : string -> int -> bytes -> int -> int -> unit
658
659
660       blit src src_pos dst dst_pos len copies len bytes from the string src ,
661       starting at index src_pos , to byte sequence dst , starting at  charac‐
662       ter number dst_pos .
663
664
665       Raises  Invalid_argument  if  src_pos  and len do not designate a valid
666       range of src , or if dst_pos and len do not designate a valid range  of
667       dst .
668
669
670
671       val copy : string -> string
672
673       Deprecated.   Because strings are immutable, it doesn't make much sense
674       to make identical copies of them.
675
676
677       Return a copy of the given string.
678
679
680
681       val fill : bytes -> int -> int -> char -> unit
682
       Deprecated.  This is a deprecated alias of Bytes.fill /
       BytesLabels.fill.
685
686
687
688       fill s pos len c modifies byte sequence s in place, replacing len bytes
689       by c , starting at pos .
690
691
692       Raises Invalid_argument if pos and len do not designate  a  valid  sub‐
693       string of s .
694
695
696
697       val uppercase : string -> string
698
699       Deprecated.   Functions  operating  on Latin-1 character set are depre‐
700       cated.
701
702
703       Return a copy of the argument, with all lowercase letters translated to
704       uppercase, including accented letters of the ISO Latin-1 (8859-1) char‐
705       acter set.
706
707
708
709       val lowercase : string -> string
710
711       Deprecated.  Functions operating on Latin-1 character  set  are  depre‐
712       cated.
713
714
715       Return a copy of the argument, with all uppercase letters translated to
716       lowercase, including accented letters of the ISO Latin-1 (8859-1) char‐
717       acter set.
718
719
720
721       val capitalize : string -> string
722
723       Deprecated.   Functions  operating  on Latin-1 character set are depre‐
724       cated.
725
726
       Return a copy of the argument, with the first character set to
       uppercase, using the ISO Latin-1 (8859-1) character set.
729
730
731
732       val uncapitalize : string -> string
733
734       Deprecated.   Functions  operating  on Latin-1 character set are depre‐
735       cated.
736
737
738       Return a copy of the argument, with the first character set  to  lower‐
739       case, using the ISO Latin-1 (8859-1) character set.
740
741
742
743
   Binary decoding of integers
745       The functions in this section binary decode integers from strings.
746
747       All following functions raise Invalid_argument if the characters needed
748       at index i to decode the integer are not available.
749
750       Little-endian (resp. big-endian) encoding means that least (resp. most)
751       significant  bytes  are stored first.  Big-endian is also known as net‐
752       work byte order.  Native-endian encoding  is  either  little-endian  or
753       big-endian depending on Sys.big_endian .
754
755       32-bit  and  64-bit  integers  are  represented  by the int32 and int64
756       types, which can be interpreted either as signed or unsigned numbers.
757
       8-bit and 16-bit integers are represented by the int type, which has
       more bits than the binary encoding. These extra bits are sign-extended
       (or zero-extended) for functions which decode 8-bit or 16-bit integers
       and represent them with int values.
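
       The effect of endianness and of sign versus zero extension can be seen
       on a four-byte string. A minimal sketch, assuming OCaml 4.13 or later;
       the byte values are arbitrary.

```ocaml
let data = "\x01\x02\xFF\xFE"

(* The same two bytes read in both byte orders. *)
let be = String.get_uint16_be data 0               (* 0x0102 *)
let le = String.get_uint16_le data 0               (* 0x0201 *)

(* Byte 0xFF is 255 when zero-extended and -1 when sign-extended. *)
let u8 = String.get_uint8 data 2                   (* 255 *)
let i8 = String.get_int8 data 2                    (* -1 *)

(* Bytes 0xFF 0xFE as a signed big-endian 16-bit integer. *)
let i16 = String.get_int16_be data 2               (* -2 *)

(* All four bytes as a big-endian int32. *)
let word = String.get_int32_be data 0              (* 0x0102FFFEl *)

(* Reading two bytes at index 3 needs index 4, which does not exist. *)
let out_of_range =
  match String.get_uint16_be data 3 with
  | _ -> false
  | exception Invalid_argument _ -> true
```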
762
763       val get_uint8 : string -> int -> int
764
765
766       get_uint8  b i is b 's unsigned 8-bit integer starting at character in‐
767       dex i .
768
769
770       Since 4.13.0
771
772
773
774       val get_int8 : string -> int -> int
775
776
777       get_int8 b i is b 's signed 8-bit integer starting at character index i
778       .
779
780
781       Since 4.13.0
782
783
784
785       val get_uint16_ne : string -> int -> int
786
787
788       get_uint16_ne  b i is b 's native-endian unsigned 16-bit integer start‐
789       ing at character index i .
790
791
792       Since 4.13.0
793
794
795
796       val get_uint16_be : string -> int -> int
797
798
799       get_uint16_be b i is b 's big-endian unsigned 16-bit  integer  starting
800       at character index i .
801
802
803       Since 4.13.0
804
805
806
807       val get_uint16_le : string -> int -> int
808
809
810       get_uint16_le  b i is b 's little-endian unsigned 16-bit integer start‐
811       ing at character index i .
812
813
814       Since 4.13.0
815
816
817
818       val get_int16_ne : string -> int -> int
819
820
821       get_int16_ne b i is b 's native-endian signed 16-bit  integer  starting
822       at character index i .
823
824
825       Since 4.13.0
826
827
828
829       val get_int16_be : string -> int -> int
830
831
832       get_int16_be  b  i is b 's big-endian signed 16-bit integer starting at
833       character index i .
834
835
836       Since 4.13.0
837
838
839
840       val get_int16_le : string -> int -> int
841
842
843       get_int16_le b i is b 's little-endian signed 16-bit  integer  starting
844       at character index i .
845
846
847       Since 4.13.0
848
849
850
851       val get_int32_ne : string -> int -> int32
852
853
854       get_int32_ne b i is b 's native-endian 32-bit integer starting at char‐
855       acter index i .
856
857
858       Since 4.13.0
859
860
861
862       val get_int32_be : string -> int -> int32
863
864
865       get_int32_be b i is b 's big-endian 32-bit integer starting at  charac‐
866       ter index i .
867
868
869       Since 4.13.0
870
871
872
873       val get_int32_le : string -> int -> int32
874
875
876       get_int32_le b i is b 's little-endian 32-bit integer starting at char‐
877       acter index i .
878
879
880       Since 4.13.0
881
882
883
884       val get_int64_ne : string -> int -> int64
885
886
887       get_int64_ne b i is b 's native-endian 64-bit integer starting at char‐
888       acter index i .
889
890
891       Since 4.13.0
892
893
894
895       val get_int64_be : string -> int -> int64
896
897
898       get_int64_be  b i is b 's big-endian 64-bit integer starting at charac‐
899       ter index i .
900
901
902       Since 4.13.0
903
904
905
906       val get_int64_le : string -> int -> int64
907
908
909       get_int64_le b i is b 's little-endian 64-bit integer starting at char‐
910       acter index i .
911
912
913       Since 4.13.0
914
915
916
917
918
OCamldoc                          2023-01-23                         String(3)