StringLabels(3) OCaml library StringLabels(3)
2
3
4
6 StringLabels - Strings.
7
9 Module StringLabels
10
12 Module StringLabels
13 : sig end
14
15
16 Strings.
17
18 A string s of length n is an indexable and immutable sequence of n
19 bytes. For historical reasons these bytes are referred to as charac‐
20 ters.
21
22 The semantics of string functions is defined in terms of indices and
23 positions. These are depicted and described as follows.
24
positions  0   1   2   3   4    n-1    n
           +---+---+---+---+     +-----+
  indices  | 0 | 1 | 2 | 3 | ... | n-1 |
           +---+---+---+---+     +-----+
27
-An index i of s is an integer in the range [0; n-1]. It represents the
i-th byte (character) of s, which can be accessed using the constant-time
string indexing operator s.[i].

-A position i of s is an integer in the range [0; n]. It represents
either the point at the beginning of the string, the point between two
indices, or the point at the end of the string. The i-th byte index lies
between positions i and i+1.
36
37
38 Two integers start and len are said to define a valid substring of s if
39 len >= 0 and start , start+len are positions of s .
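
The distinction between indices and positions can be illustrated with a short sketch. The string "camel" and the binding names are chosen here for illustration only.

```ocaml
let s = "camel"                          (* n = 5 bytes *)

(* Indices range over [0; n-1]. *)
let first = s.[0]
let last = s.[StringLabels.length s - 1]

(* Positions range over [0; n]; start and start+len must both be positions. *)
let middle = StringLabels.sub s ~pos:1 ~len:3
let empty_at_end = StringLabels.sub s ~pos:5 ~len:0   (* position n is valid *)

(* Here pos + len = 6 is not a position of s, so Invalid_argument is raised. *)
let out_of_range =
  try Some (StringLabels.sub s ~pos:3 ~len:3) with Invalid_argument _ -> None
```

Note that position n is accepted as a start as long as len is 0, whereas n is never a valid index.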
40
41 Unicode text. Strings being arbitrary sequences of bytes, they can hold
42 any kind of textual encoding. However the recommended encoding for
43 storing Unicode text in OCaml strings is UTF-8. This is the encoding
44 used by Unicode escapes in string literals. For example the string
45 "\u{1F42B}" is the UTF-8 encoding of the Unicode character U+1F42B.
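
As a small illustration of the point above, the following sketch checks that the escape yields four bytes rather than one "character". The binding names are illustrative.

```ocaml
let camel = "\u{1F42B}"

(* The length is counted in bytes, not in Unicode characters. *)
let byte_count = StringLabels.length camel

(* The same string written byte by byte with hexadecimal escapes. *)
let same_bytes = "\xF0\x9F\x90\xAB"
let identical = StringLabels.equal camel same_bytes
```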
46
47 Past mutability. Before OCaml 4.02, strings used to be modifiable in
48 place like Bytes.t mutable sequences of bytes. OCaml 4 had various
49 compiler flags and configuration options to support the transition pe‐
50 riod from mutable to immutable strings. Those options are no longer
51 available, and strings are now always immutable.
52
The labeled version of this module can be used as described in the
StdLabels module.
55
56
57
58
59
60
61
62 Strings
63 type t = string
64
65
66 The type for strings.
67
68
69
70 val make : int -> char -> string
71
72
73 make n c is a string of length n with each index holding the character
74 c .
75
76
77 Raises Invalid_argument if n < 0 or n > Sys.max_string_length .
78
79
80
81 val init : int -> f:(int -> char) -> string
82
83
84 init n ~f is a string of length n with index i holding the character f
85 i (called in increasing index order).
86
87
88 Since 4.02.0
89
90
91 Raises Invalid_argument if n < 0 or n > Sys.max_string_length .
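
A minimal sketch of make and init follows; the values built are arbitrary examples.

```ocaml
(* Three copies of the same character. *)
let dashes = StringLabels.make 3 '-'

(* f is called with indices 0, 1, ..., 25 in increasing order. *)
let alphabet = StringLabels.init 26 ~f:(fun i -> Char.chr (Char.code 'a' + i))

(* A negative length is rejected with Invalid_argument. *)
let negative =
  try Some (StringLabels.make (-1) 'x') with Invalid_argument _ -> None
```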
92
93
94
95 val empty : string
96
97 The empty string.
98
99
100 Since 4.13.0
101
102
103
104 val of_bytes : bytes -> string
105
106 Return a new string that contains the same bytes as the given byte se‐
107 quence.
108
109
110 Since 4.13.0
111
112
113
114 val to_bytes : string -> bytes
115
116 Return a new byte sequence that contains the same bytes as the given
117 string.
118
119
120 Since 4.13.0
121
122
123
124 val length : string -> int
125
126
127 length s is the length (number of bytes/characters) of s .
128
129
130
131 val get : string -> int -> char
132
133
134 get s i is the character at index i in s . This is the same as writing
135 s.[i] .
136
137
Raises Invalid_argument if i is not an index of s .
139
140
141
142
143 Concatenating
144 Note. The (^) binary operator concatenates two strings.
145
146 val concat : sep:string -> string list -> string
147
148
149 concat ~sep ss concatenates the list of strings ss , inserting the sep‐
150 arator string sep between each.
151
152
153 Raises Invalid_argument if the result is longer than
154 Sys.max_string_length bytes.
155
156
157
158 val cat : string -> string -> string
159
160
161 cat s1 s2 concatenates s1 and s2 ( s1 ^ s2 ).
162
163
164 Since 4.13.0
165
166
167 Raises Invalid_argument if the result is longer than
168 Sys.max_string_length bytes.
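
The following sketch shows how concat places the separator only between elements; the inputs are illustrative.

```ocaml
let path = StringLabels.concat ~sep:"/" ["usr"; "local"; "bin"]

(* No separator is added for a single element or an empty list. *)
let singleton = StringLabels.concat ~sep:", " ["only"]
let nothing = StringLabels.concat ~sep:", " []

(* cat is the function form of the (^) operator. *)
let joined = StringLabels.cat "foo" "bar"
```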
169
170
171
172
173 Predicates and comparisons
174 val equal : t -> t -> bool
175
176
177 equal s0 s1 is true if and only if s0 and s1 are character-wise equal.
178
179
180 Since 4.05.0
181
182
183
184 val compare : t -> t -> int
185
186
compare s0 s1 sorts s0 and s1 in lexicographical order. compare behaves
like Stdlib.compare on strings but may be more efficient.
189
190
191
192 val starts_with : prefix:string -> string -> bool
193
194
195 starts_with ~prefix s is true if and only if s starts with prefix .
196
197
198 Since 4.13.0
199
200
201
202 val ends_with : suffix:string -> string -> bool
203
204
205 ends_with ~suffix s is true if and only if s ends with suffix .
206
207
208 Since 4.13.0
209
210
211
212 val contains_from : string -> int -> char -> bool
213
214
215 contains_from s start c is true if and only if c appears in s after po‐
216 sition start .
217
218
219 Raises Invalid_argument if start is not a valid position in s .
220
221
222
223 val rcontains_from : string -> int -> char -> bool
224
225
226 rcontains_from s stop c is true if and only if c appears in s before
227 position stop+1 .
228
229
230 Raises Invalid_argument if stop < 0 or stop+1 is not a valid position
231 in s .
232
233
234
235 val contains : string -> char -> bool
236
237
238 contains s c is String.contains_from s 0 c .
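
The predicates above can be exercised as in this sketch, which uses the string "hello" purely as an example.

```ocaml
let s = "hello"

let has_prefix = StringLabels.starts_with ~prefix:"he" s
let has_suffix = StringLabels.ends_with ~suffix:"lo" s

(* contains_from searches indices start, start+1, ..., n-1. *)
let l_from_2 = StringLabels.contains_from s 2 'l'
let h_from_1 = StringLabels.contains_from s 1 'h'
let from_end = StringLabels.contains_from s 5 'o'    (* 5 is a valid position *)

(* rcontains_from searches indices 0, ..., stop. *)
let l_up_to_1 = StringLabels.rcontains_from s 1 'l'
let l_up_to_2 = StringLabels.rcontains_from s 2 'l'

(* 6 is not a position of s. *)
let bad_start =
  try Some (StringLabels.contains_from s 6 'o') with Invalid_argument _ -> None
```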
239
240
241
242
243 Extracting substrings
244 val sub : string -> pos:int -> len:int -> string
245
246
247 sub s ~pos ~len is a string of length len , containing the substring of
248 s that starts at position pos and has length len .
249
250
251 Raises Invalid_argument if pos and len do not designate a valid sub‐
252 string of s .
253
254
255
256 val split_on_char : sep:char -> string -> string list
257
258
259 split_on_char ~sep s is the list of all (possibly empty) substrings of
260 s that are delimited by the character sep .
261
262 The function's result is specified by the following invariants:
263
264 -The list is not empty.
265
-Concatenating its elements using sep as a separator returns a string
equal to the input ( concat ~sep:(make 1 sep)
(split_on_char ~sep s) = s ).
269
270 -No string in the result contains the sep character.
271
272
273
274 Since 4.05.0
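
The three invariants can be checked on a sample input, as in this sketch. The input "a,,b," is an arbitrary example chosen to produce empty substrings.

```ocaml
let sep = ','
let s = "a,,b,"

(* Adjacent and trailing separators produce empty strings. *)
let parts = StringLabels.split_on_char ~sep s

(* Invariant 1: the list is never empty, even for the empty string. *)
let on_empty = StringLabels.split_on_char ~sep ""

(* Invariant 2: joining with the separator gives back the input. *)
let rejoined = StringLabels.concat ~sep:(StringLabels.make 1 sep) parts

(* Invariant 3: no element contains the separator. *)
let none_contains_sep =
  List.for_all (fun p -> not (StringLabels.contains p sep)) parts
```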
275
276
277
278
279 Transforming
280 val map : f:(char -> char) -> string -> string
281
282
map ~f s is the string resulting from applying f to all the characters
of s in increasing order.
285
286
287 Since 4.00.0
288
289
290
291 val mapi : f:(int -> char -> char) -> string -> string
292
293
294 mapi ~f s is like StringLabels.map but the index of the character is
295 also passed to f .
296
297
298 Since 4.02.0
299
300
301
302 val fold_left : f:('a -> char -> 'a) -> init:'a -> string -> 'a
303
304
fold_left ~f ~init:x s computes f (... (f (f x s.[0]) s.[1]) ...) s.[n-1] ,
where n is the length of the string s .
307
308
309 Since 4.13.0
310
311
312
313 val fold_right : f:(char -> 'a -> 'a) -> string -> init:'a -> 'a
314
315
fold_right ~f s ~init:x computes f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...)) ,
where n is the length of the string s .
318
319
320 Since 4.13.0
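
The two folds differ in the order in which the accumulator is threaded through the characters. A minimal sketch, with illustrative inputs:

```ocaml
(* Count occurrences of 'a' by threading a counter left to right. *)
let count_a =
  StringLabels.fold_left ~f:(fun n c -> if c = 'a' then n + 1 else n)
    ~init:0 "banana"

(* fold_right conses s.[0] last, so the list keeps the string order. *)
let chars = StringLabels.fold_right ~f:(fun c acc -> c :: acc) "abc" ~init:[]

(* fold_left conses s.[0] first, so the list comes out reversed. *)
let reversed = StringLabels.fold_left ~f:(fun acc c -> c :: acc) ~init:[] "abc"
```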
321
322
323
324 val for_all : f:(char -> bool) -> string -> bool
325
326
for_all ~f:p s checks if all characters in s satisfy the predicate p .
328
329
330 Since 4.13.0
331
332
333
334 val exists : f:(char -> bool) -> string -> bool
335
336
exists ~f:p s checks if at least one character of s satisfies the
predicate p .
339
340
341 Since 4.13.0
342
343
344
345 val trim : string -> string
346
347
348 trim s is s without leading and trailing whitespace. Whitespace charac‐
349 ters are: ' ' , '\x0C' (form feed), '\n' , '\r' , and '\t' .
350
351
352 Since 4.00.0
353
354
355
356 val escaped : string -> string
357
358
359 escaped s is s with special characters represented by escape sequences,
360 following the lexical conventions of OCaml.
361
All characters outside the US-ASCII printable range [0x20;0x7E] are
escaped, as well as backslash (0x5C) and double-quote (0x22).
364
365 The function Scanf.unescaped is a left inverse of escaped , i.e.
366 Scanf.unescaped (escaped s) = s for any string s (unless escaped s
367 fails).
368
369
370 Raises Invalid_argument if the result is longer than
371 Sys.max_string_length bytes.
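
The round trip through Scanf.unescaped can be sketched as follows. The input string is an arbitrary example containing a tab, double quotes and a backslash.

```ocaml
let raw = "tab\there \"quoted\" back\\slash"

(* The tab, the double quotes and the backslash each become a
   two-character escape sequence. *)
let shown = StringLabels.escaped raw

(* Scanf.unescaped undoes the escaping. *)
let round_trip = Scanf.unescaped shown
```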
372
373
374
375 val uppercase_ascii : string -> string
376
377
378 uppercase_ascii s is s with all lowercase letters translated to upper‐
379 case, using the US-ASCII character set.
380
381
382 Since 4.05.0
383
384
385
386 val lowercase_ascii : string -> string
387
388
389 lowercase_ascii s is s with all uppercase letters translated to lower‐
390 case, using the US-ASCII character set.
391
392
393 Since 4.05.0
394
395
396
397 val capitalize_ascii : string -> string
398
399
400 capitalize_ascii s is s with the first character set to uppercase, us‐
401 ing the US-ASCII character set.
402
403
404 Since 4.05.0
405
406
407
408 val uncapitalize_ascii : string -> string
409
410
411 uncapitalize_ascii s is s with the first character set to lowercase,
412 using the US-ASCII character set.
413
414
415 Since 4.05.0
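
A short sketch of trim and the ASCII case functions follows. It also shows that bytes outside US-ASCII (here the UTF-8 encoding of U+00E9) are left untouched; the inputs are illustrative.

```ocaml
let trimmed = StringLabels.trim " \t hi there \r\n"

let upper = StringLabels.uppercase_ascii "ocaml 5"
let capital = StringLabels.capitalize_ascii "ocaml"
let uncapital = StringLabels.uncapitalize_ascii "OCaml"

(* Only the ASCII letters change; the two bytes of U+00E9 are kept as is. *)
let accented = StringLabels.uppercase_ascii "h\u{E9}llo"
```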
416
417
418
419
420 Traversing
421 val iter : f:(char -> unit) -> string -> unit
422
423
424 iter ~f s applies function f in turn to all the characters of s . It
425 is equivalent to f s.[0]; f s.[1]; ...; f s.[length s - 1]; () .
426
427
428
429 val iteri : f:(int -> char -> unit) -> string -> unit
430
431
432 iteri is like StringLabels.iter , but the function is also given the
433 corresponding character index.
434
435
436 Since 4.00.0
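
As an illustration, this sketch uses iteri to record each index and character in a buffer. The buffer and the output format are choices made for the example.

```ocaml
let buf = Buffer.create 16

(* f is called as f 0 'a', then f 1 'b', then f 2 'c'. *)
let () =
  StringLabels.iteri "abc"
    ~f:(fun i c -> Buffer.add_string buf (Printf.sprintf "%d:%c " i c))

let trace = Buffer.contents buf
```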
437
438
439
440
441 Searching
442 val index_from : string -> int -> char -> int
443
444
445 index_from s i c is the index of the first occurrence of c in s after
446 position i .
447
448
449 Raises Not_found if c does not occur in s after position i .
450
451
452 Raises Invalid_argument if i is not a valid position in s .
453
454
455
456 val index_from_opt : string -> int -> char -> int option
457
458
459 index_from_opt s i c is the index of the first occurrence of c in s af‐
460 ter position i (if any).
461
462
463 Since 4.05
464
465
466 Raises Invalid_argument if i is not a valid position in s .
467
468
469
470 val rindex_from : string -> int -> char -> int
471
472
473 rindex_from s i c is the index of the last occurrence of c in s before
474 position i+1 .
475
476
477 Raises Not_found if c does not occur in s before position i+1 .
478
479
480 Raises Invalid_argument if i+1 is not a valid position in s .
481
482
483
484 val rindex_from_opt : string -> int -> char -> int option
485
486
487 rindex_from_opt s i c is the index of the last occurrence of c in s be‐
488 fore position i+1 (if any).
489
490
491 Since 4.05
492
493
494 Raises Invalid_argument if i+1 is not a valid position in s .
495
496
497
498 val index : string -> char -> int
499
500
501 index s c is String.index_from s 0 c .
502
503
504
505 val index_opt : string -> char -> int option
506
507
508 index_opt s c is String.index_from_opt s 0 c .
509
510
511 Since 4.05
512
513
514
515 val rindex : string -> char -> int
516
517
518 rindex s c is String.rindex_from s (length s - 1) c .
519
520
521
522 val rindex_opt : string -> char -> int option
523
524
525 rindex_opt s c is String.rindex_from_opt s (length s - 1) c .
526
527
528 Since 4.05
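
The search functions can be compared on a small example. In this sketch the string "a/b/c" and the binding names are illustrative.

```ocaml
let s = "a/b/c"

let first_slash = StringLabels.index s '/'
let last_slash = StringLabels.rindex s '/'

(* Search forward starting at index 2, and backward starting at index 2. *)
let next_slash = StringLabels.index_from s 2 '/'
let prev_slash = StringLabels.rindex_from s 2 '/'

(* The _opt variants return None instead of raising Not_found. *)
let missing = StringLabels.index_opt s 'z'
let past_last = StringLabels.index_from_opt s 4 '/'

let raises_not_found =
  try ignore (StringLabels.index s 'z'); false with Not_found -> true
```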
529
530
531
532
533 Strings and Sequences
534 val to_seq : t -> char Seq.t
535
536
537 to_seq s is a sequence made of the string's characters in increasing
538 order. In "unsafe-string" mode, modifications of the string during it‐
539 eration will be reflected in the sequence.
540
541
542 Since 4.07
543
544
545
546 val to_seqi : t -> (int * char) Seq.t
547
548
549 to_seqi s is like StringLabels.to_seq but also tuples the corresponding
550 index.
551
552
553 Since 4.07
554
555
556
557 val of_seq : char Seq.t -> t
558
559
560 of_seq s is a string made of the sequence's characters.
561
562
563 Since 4.07
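
Sequences make it possible to process a string through the Seq combinators and rebuild a string at the end. A minimal sketch, with an arbitrary filter chosen for the example:

```ocaml
(* Keep the vowels of a word and uppercase them. *)
let upper_vowels =
  "sequence"
  |> StringLabels.to_seq
  |> Seq.filter (fun c -> StringLabels.contains "aeiou" c)
  |> Seq.map Char.uppercase_ascii
  |> StringLabels.of_seq

(* to_seqi pairs each character with its index. *)
let indexed = List.of_seq (StringLabels.to_seqi "ab")
```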
564
565
566
567
568 UTF decoding and validations
569 UTF-8
570 val get_utf_8_uchar : t -> int -> Uchar.utf_decode
571
572
get_utf_8_uchar b i decodes a UTF-8 character at index i in b .
574
575
576
577 val is_valid_utf_8 : t -> bool
578
579
580 is_valid_utf_8 b is true if and only if b contains valid UTF-8 data.
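
The result of a decode is inspected with the Uchar.utf_decode_* functions from the Uchar module. The sketch below assumes OCaml 4.14 or later; count_uchars is a hypothetical helper written for this example, not part of the library.

```ocaml
let s = "a\u{1F42B}"

(* Decode the character that starts at index 1. *)
let d = StringLabels.get_utf_8_uchar s 1
let valid = Uchar.utf_decode_is_valid d
let len = Uchar.utf_decode_length d
let code = Uchar.to_int (Uchar.utf_decode_uchar d)

(* Index 2 is in the middle of the character: the decode is invalid. *)
let mid_valid = Uchar.utf_decode_is_valid (StringLabels.get_utf_8_uchar s 2)

(* Hypothetical helper: count decodes by advancing by each decode's length,
   which is always at least 1, so the loop terminates. *)
let count_uchars s =
  let rec go i n =
    if i >= StringLabels.length s then n
    else
      let d = StringLabels.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (n + 1)
  in
  go 0 0
```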
581
582
583
584
585 UTF-16BE
586 val get_utf_16be_uchar : t -> int -> Uchar.utf_decode
587
588
get_utf_16be_uchar b i decodes a UTF-16BE character at index i in b .
590
591
592
593 val is_valid_utf_16be : t -> bool
594
595
596 is_valid_utf_16be b is true if and only if b contains valid UTF-16BE
597 data.
598
599
600
601
602 UTF-16LE
603 val get_utf_16le_uchar : t -> int -> Uchar.utf_decode
604
605
get_utf_16le_uchar b i decodes a UTF-16LE character at index i in b .
607
608
609
610 val is_valid_utf_16le : t -> bool
611
612
613 is_valid_utf_16le b is true if and only if b contains valid UTF-16LE
614 data.
615
616
617
618 val blit : src:string -> src_pos:int -> dst:bytes -> dst_pos:int ->
619 len:int -> unit
620
621
622 blit ~src ~src_pos ~dst ~dst_pos ~len copies len bytes from the string
623 src , starting at index src_pos , to byte sequence dst , starting at
624 character number dst_pos .
625
626
627 Raises Invalid_argument if src_pos and len do not designate a valid
628 range of src , or if dst_pos and len do not designate a valid range of
629 dst .
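
A minimal sketch of blit, copying part of a string into a pre-filled byte sequence. The destination size and filler character are arbitrary.

```ocaml
let dst = Bytes.make 8 '.'

(* Copy "ell" (3 bytes from index 1 of "hello") to offset 2 of dst. *)
let () = StringLabels.blit ~src:"hello" ~src_pos:1 ~dst ~dst_pos:2 ~len:3
let result = Bytes.to_string dst

(* Writing 2 bytes at offset 7 would run past the end of dst. *)
let overflow =
  try
    StringLabels.blit ~src:"hi" ~src_pos:0 ~dst ~dst_pos:7 ~len:2;
    false
  with Invalid_argument _ -> true
```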
630
631
632
633
634 Binary decoding of integers
635 The functions in this section binary decode integers from strings.
636
637 All following functions raise Invalid_argument if the characters needed
638 at index i to decode the integer are not available.
639
640 Little-endian (resp. big-endian) encoding means that least (resp. most)
641 significant bytes are stored first. Big-endian is also known as net‐
642 work byte order. Native-endian encoding is either little-endian or
643 big-endian depending on Sys.big_endian .
644
645 32-bit and 64-bit integers are represented by the int32 and int64
646 types, which can be interpreted either as signed or unsigned numbers.
647
8-bit and 16-bit integers are represented by the int type, which has
more bits than the binary encoding. These extra bits are sign-extended
(or zero-extended) for functions which decode 8-bit or 16-bit integers
and represent them with int values.
652
653 val get_uint8 : string -> int -> int
654
655
656 get_uint8 b i is b 's unsigned 8-bit integer starting at character in‐
657 dex i .
658
659
660 Since 4.13.0
661
662
663
664 val get_int8 : string -> int -> int
665
666
get_int8 b i is b 's signed 8-bit integer starting at character index i .
669
670
671 Since 4.13.0
672
673
674
675 val get_uint16_ne : string -> int -> int
676
677
678 get_uint16_ne b i is b 's native-endian unsigned 16-bit integer start‐
679 ing at character index i .
680
681
682 Since 4.13.0
683
684
685
686 val get_uint16_be : string -> int -> int
687
688
689 get_uint16_be b i is b 's big-endian unsigned 16-bit integer starting
690 at character index i .
691
692
693 Since 4.13.0
694
695
696
697 val get_uint16_le : string -> int -> int
698
699
700 get_uint16_le b i is b 's little-endian unsigned 16-bit integer start‐
701 ing at character index i .
702
703
704 Since 4.13.0
705
706
707
708 val get_int16_ne : string -> int -> int
709
710
711 get_int16_ne b i is b 's native-endian signed 16-bit integer starting
712 at character index i .
713
714
715 Since 4.13.0
716
717
718
719 val get_int16_be : string -> int -> int
720
721
722 get_int16_be b i is b 's big-endian signed 16-bit integer starting at
723 character index i .
724
725
726 Since 4.13.0
727
728
729
730 val get_int16_le : string -> int -> int
731
732
733 get_int16_le b i is b 's little-endian signed 16-bit integer starting
734 at character index i .
735
736
737 Since 4.13.0
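
The difference between zero extension and sign extension, and between the two byte orders, can be seen on a few bytes. The byte values in this sketch are arbitrary examples.

```ocaml
let s = "\xFF\x01\x02"

(* The same byte read as unsigned (zero-extended) and signed (sign-extended). *)
let u8 = StringLabels.get_uint8 s 0
let i8 = StringLabels.get_int8 s 0

(* Bytes 0x01 0x02 read in both byte orders. *)
let be = StringLabels.get_uint16_be s 1
let le = StringLabels.get_uint16_le s 1

(* 0xFFFE as an unsigned and as a signed 16-bit integer. *)
let u16 = StringLabels.get_uint16_be "\xFF\xFE" 0
let i16 = StringLabels.get_int16_be "\xFF\xFE" 0

(* A 16-bit read at index 2 needs index 3, which does not exist. *)
let too_short =
  try Some (StringLabels.get_uint16_be s 2) with Invalid_argument _ -> None
```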
738
739
740
741 val get_int32_ne : string -> int -> int32
742
743
744 get_int32_ne b i is b 's native-endian 32-bit integer starting at char‐
745 acter index i .
746
747
748 Since 4.13.0
749
750
751
752 val hash : t -> int
753
754 An unseeded hash function for strings, with the same output value as
755 Hashtbl.hash . This function allows this module to be passed as argu‐
756 ment to the functor Hashtbl.Make .
757
758
759 Since 5.0.0
760
761
762
763 val seeded_hash : int -> t -> int
764
765 A seeded hash function for strings, with the same output value as
766 Hashtbl.seeded_hash . This function allows this module to be passed as
767 argument to the functor Hashtbl.MakeSeeded .
768
769
770 Since 5.0.0
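
Since the module provides t, equal and hash, it can be passed directly to Hashtbl.Make. The sketch below assumes OCaml 5.0 or later; the Table module name and the word-counting task are illustrative.

```ocaml
module Table = Hashtbl.Make (StringLabels)

(* Count how many times each word occurs. *)
let counts : int Table.t = Table.create 16

let () =
  List.iter
    (fun w ->
      let n = Option.value (Table.find_opt counts w) ~default:0 in
      Table.replace counts w (n + 1))
    ["a"; "b"; "a"]

(* The output values agree with the generic hash functions. *)
let same_hash = StringLabels.hash "camel" = Hashtbl.hash "camel"
let same_seeded =
  StringLabels.seeded_hash 42 "camel" = Hashtbl.seeded_hash 42 "camel"
```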
771
772
773
774 val get_int32_be : string -> int -> int32
775
776
777 get_int32_be b i is b 's big-endian 32-bit integer starting at charac‐
778 ter index i .
779
780
781 Since 4.13.0
782
783
784
785 val get_int32_le : string -> int -> int32
786
787
788 get_int32_le b i is b 's little-endian 32-bit integer starting at char‐
789 acter index i .
790
791
792 Since 4.13.0
793
794
795
796 val get_int64_ne : string -> int -> int64
797
798
799 get_int64_ne b i is b 's native-endian 64-bit integer starting at char‐
800 acter index i .
801
802
803 Since 4.13.0
804
805
806
807 val get_int64_be : string -> int -> int64
808
809
810 get_int64_be b i is b 's big-endian 64-bit integer starting at charac‐
811 ter index i .
812
813
814 Since 4.13.0
815
816
817
818 val get_int64_le : string -> int -> int64
819
820
821 get_int64_le b i is b 's little-endian 64-bit integer starting at char‐
822 acter index i .
823
824
825 Since 4.13.0
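
The 32-bit and 64-bit decoders return int32 and int64 values. The sketch below reads the same bytes in both byte orders; the byte values are arbitrary examples.

```ocaml
let four = "\x00\x00\x00\x2A"

(* Most significant byte first, then least significant byte first. *)
let be32 = StringLabels.get_int32_be four 0
let le32 = StringLabels.get_int32_le four 0

(* All bits set reads as -1 when the int32 is interpreted as signed. *)
let all_ones = StringLabels.get_int32_be "\xFF\xFF\xFF\xFF" 0

let eight = "\x00\x00\x00\x00\x00\x00\x01\x00"
let be64 = StringLabels.get_int64_be eight 0
let le64 = StringLabels.get_int64_le eight 0

(* Eight bytes are needed; four are not enough. *)
let too_short =
  try Some (StringLabels.get_int64_be four 0) with Invalid_argument _ -> None
```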
826
827
828
829
830
OCamldoc 2023-07-20 StringLabels(3)