Stdlib.Scanf(3)                  OCaml library                 Stdlib.Scanf(3)

NAME
       Stdlib.Scanf - no description

Module
       Module   Stdlib.Scanf

Documentation
       Module Scanf
        : (module Stdlib__Scanf)

   Introduction
   Functional input with format strings
       The module Scanf provides formatted input functions, or scanners.

       The formatted input functions can read from any kind of input,
       including strings, files, or anything that can return characters. The
       most general source of characters is named a formatted input channel
       (or scanning buffer) and has type Scanf.Scanning.in_channel. The most
       general formatted input function reads from any scanning buffer and is
       named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:

       - the first argument is a source of characters for the input,

       - the second argument is a format string that specifies the values to
       read,

       - the third argument is a receiver function that is applied to the
       values read.

       Hence, a typical call to the formatted input function Scanf.bscanf is
       bscanf ic fmt f, where:

       - ic is a source of characters (typically a formatted input channel
       with type Scanf.Scanning.in_channel),

       - fmt is a format string (the same format strings as those used to
       print material with module Printf or Format),

       - f is a function that has as many arguments as the number of values to
       read in the input according to fmt.
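
       The three arguments described above can be sketched as follows. This
       is a minimal illustration, not part of the original page: the input
       text and the receiver name describe are made up for the example, and
       the scanning buffer is built with Scanf.Scanning.from_string so that
       the code does not depend on standard input.

```ocaml
(* A scanning buffer backed by a string (the source of characters). *)
let ic = Scanf.Scanning.from_string "width 80"

(* Hypothetical receiver: one parameter per conversion in the format. *)
let describe name value = Printf.sprintf "%s=%d" name value

(* bscanf ic fmt f: source, format string, receiver. *)
let result = Scanf.bscanf ic "%s %d" describe
```

       Here the format "%s %d" has two conversions, so the receiver takes two
       arguments, a string and an int.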

   A simple example
       As suggested above, the expression bscanf ic "%d" f reads a decimal
       integer n from the source of characters ic and returns f n.

       For instance,

       - if we use stdin as the source of characters (Scanf.Scanning.stdin is
       the predefined formatted input channel that reads from standard input),

       - if we define the receiver f as let f x = x + 1,

       then bscanf Scanning.stdin "%d" f reads an integer n from the standard
       input and returns f n (that is n + 1). Thus, if we evaluate bscanf
       Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result we
       get is 42.
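
       The same computation can be reproduced without a keyboard by reading
       from a string instead of standard input. This sketch assumes that
       substitution only; the receiver f is the one defined in the text.

```ocaml
(* The receiver from the text. *)
let f x = x + 1

(* sscanf reads from the string "41" instead of Scanning.stdin. *)
let answer = Scanf.sscanf "41" "%d" f
```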

   Formatted input as a functional feature
       The OCaml scanning facility is reminiscent of the corresponding C
       feature. However, it is also largely different, simpler, and yet more
       powerful: the formatted input functions are higher-order functionals,
       and the parameter passing mechanism is just regular function
       application, not the variable-assignment-based mechanism that is
       typical for formatted input in imperative languages; the OCaml format
       strings also feature useful additions to easily define complex tokens;
       as expected within a functional programming language, the formatted
       input functions also support polymorphism, in particular arbitrary
       interaction with polymorphic user-defined scanners. Furthermore, the
       OCaml formatted input facility is fully type-checked at compile time.

       Unsynchronized accesses

       Unsynchronized accesses to a Scanf.Scanning.in_channel may lead to an
       invalid Scanf.Scanning.in_channel state. Thus, concurrent accesses to
       Scanf.Scanning.in_channel values must be synchronized (for instance
       with a Mutex.t).

   Formatted input channel
       module Scanning : sig end

   Type of formatted input functions
       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
       'd, 'd) format6 -> 'c

       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
       type of a formatted input function that reads from some formatted input
       channel according to some format string; more precisely, if scan is
       some formatted input function, then scan ic fmt f applies f to all the
       arguments specified by format string fmt, when scan has read those
       arguments from the Scanf.Scanning.in_channel formatted input channel
       ic.

       For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
       scanner, since it is a formatted input function that reads from
       Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
       by fmt, reading those arguments from stdin as expected.

       If the format fmt has some %r indications, the corresponding formatted
       input functions must be provided before receiver function f. For
       instance, if read_elem is an input function for values of type t, then
       bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
       character, and returns f v.
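
       A possible instance of the %r pattern is sketched below. The type
       color and the reader read_color are hypothetical names introduced for
       this example; they play the roles of t and read_elem in the text.

```ocaml
(* Hypothetical element type, standing in for the type t of the text. *)
type color = Red | Green

(* User-defined reader of type Scanning.in_channel -> color.
   It scans a run of lowercase letters and maps it to a constructor. *)
let read_color ic =
  Scanf.bscanf ic "%[a-z]" (function
    | "red" -> Red
    | "green" -> Green
    | other -> raise (Scanf.Scan_failure ("unknown color: " ^ other)))

(* The reader is passed before the receiver, as required for %r. *)
let c = Scanf.sscanf "green;" "%r;" read_color (fun v -> v)
```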

       Since 3.10.0

       type ('a, 'b, 'c, 'd) scanner_opt = ('a, Scanning.in_channel, 'b, 'c,
       'a -> 'd option, 'd) format6 -> 'c

       exception Scan_failure of string

       When the input cannot be read according to the format string
       specification, formatted input functions typically raise exception
       Scan_failure.

   The general formatted input function
       val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner

       bscanf ic fmt r1 ... rN f reads characters from the
       Scanf.Scanning.in_channel formatted input channel ic and converts them
       to values according to format string fmt. As a final step, receiver
       function f is applied to the values read and gives the result of the
       bscanf call.

       For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
       "x = 1" "%s = %i" f returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       arguments corresponding to the %r conversions specified in the format
       string.
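
       The sscanf example from the text, together with two successive bscanf
       calls on one scanning buffer, might look like this. The buffer
       contents "10 20" are an assumption made for the illustration.

```ocaml
(* The example from the text: the string argument is ignored. *)
let f _s i = i + 1
let two = Scanf.sscanf "x = 1" "%s = %i" f

(* Successive calls on the same buffer continue where the last stopped.
   The leading space in the format skips any whitespace before the number. *)
let ic = Scanf.Scanning.from_string "10 20"
let first = Scanf.bscanf ic " %d" (fun n -> n)
let second = Scanf.bscanf ic " %d" (fun n -> n)
```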

       val bscanf_opt : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner_opt

       Same as Scanf.bscanf, but returns None in case of scanning failure.

       Since 5.0

   Format string description
       The format string is a character string which contains three types of
       objects:

       - plain characters, which are simply matched with the characters of the
       input (with a special case for space and line feed, see "The space
       character in format strings"),

       - conversion specifications, each of which causes reading and
       conversion of one argument for the function f (see "Conversion
       specifications in format strings"),

       - scanning indications to specify boundaries of tokens (see "Scanning
       indications in format strings").

   The space character in format strings
       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two characters
       are special exceptions to this rule: the space character (' ' or ASCII
       code 32) and the line feed character ('\n' or ASCII code 10). A space
       does not match a single space character, but any amount of 'whitespace'
       in the input. More precisely, a space inside the format string matches
       any number of tab, space, line feed and carriage return characters.
       Similarly, a line feed character in the format string matches either a
       single line feed or a carriage return followed by a line feed.

       Since it matches any amount of whitespace, a space in the format string
       also matches no whitespace at all; hence, the call
       bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when
       reading an input with various whitespace in it, such as Price = 1 $,
       Price  =   1    $, or even Price=1$.
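
       The whitespace rule can be checked on the three inputs quoted above.
       This sketch reads from strings with sscanf rather than from a buffer
       ib; the helper name price is introduced only for the example.

```ocaml
(* Same format as in the text, applied to a string. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

(* Each space in the format matches zero or more whitespace characters. *)
let results =
  List.map price [ "Price = 1 $"; "Price  =   1    $"; "Price=1$" ]
```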

   Conversion specifications in format strings
       Conversion specifications consist of the % character, followed by an
       optional flag, an optional field width, and one or two conversion
       characters.

       The conversion characters and their meanings are:

       - d : reads an optionally signed decimal integer ( [0-9]+ ).

       - i : reads an optionally signed integer (usual input conventions for
       decimal ( [0-9]+ ), hexadecimal ( 0x[0-9a-f]+ and 0X[0-9A-F]+ ), octal
       ( 0o[0-7]+ ), and binary ( 0b[0-1]+ ) notations are understood).

       - u : reads an unsigned decimal integer.

       - x or X : reads an unsigned hexadecimal integer ( [0-9a-fA-F]+ ).

       - o : reads an unsigned octal integer ( [0-7]+ ).

       - s : reads a string argument that spreads as much as possible, until
       one of the following bounding conditions holds:

           - a whitespace has been found (see "The space character in format
           strings"),

           - a scanning indication (see "Scanning indications in format
           strings") has been encountered,

           - the end-of-input has been reached.

       Hence, this conversion always succeeds: it returns an empty string if
       the bounding condition holds when the scan begins.

       - S : reads a delimited string argument (delimiters and special escaped
       characters follow the lexical conventions of OCaml).

       - c : reads a single character. To test the current input character
       without reading it, specify a null field width, i.e. use specification
       %0c. Raises Invalid_argument if the field width specification is
       greater than 1.

       - C : reads a single delimited character (delimiters and special
       escaped characters follow the lexical conventions of OCaml).

       - f , e , E , g , G : reads an optionally signed floating-point number
       in decimal notation, in the style dddd.ddd e/E+-dd.

       - h , H : reads an optionally signed floating-point number in
       hexadecimal notation.

       - F : reads a floating-point number according to the lexical
       conventions of OCaml (hence the decimal point is mandatory if the
       exponent part is not mentioned).

       - B : reads a boolean argument ( true or false ).

       - b : reads a boolean argument (for backward compatibility; do not use
       in new programs).

       - ld , li , lu , lx , lX , lo : reads an int32 argument in the format
       specified by the second letter, as for regular integers.

       - nd , ni , nu , nx , nX , no : reads a nativeint argument in the
       format specified by the second letter, as for regular integers.

       - Ld , Li , Lu , Lx , LX , Lo : reads an int64 argument in the format
       specified by the second letter, as for regular integers.

       - [ range ] : reads characters that match one of the characters
       mentioned in the range of characters range (or not mentioned in it, if
       the range starts with ^ ). Reads a string that can be empty, if the
       next input character does not match the range. The set of characters
       from c1 to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9] returns
       a string representing a decimal number or an empty string if no decimal
       digit is found; similarly, %[0-9a-f] returns a string of hexadecimal
       digits. If a closing bracket appears in a range, it must occur as the
       first character of the range (or just after the ^ in case of range
       negation); hence []] matches a ] character and [^]] matches any
       character that is not ]. Use %% and %@ to include a % or a @ in a
       range.

       - r : user-defined reader. Takes the next ri formatted input function
       and applies it to the scanning buffer ib to read the next argument. The
       input function ri must therefore have type Scanning.in_channel -> 'a
       and the argument read has type 'a.

       - { fmt %} : reads a format string argument. The format string read
       must have the same type as the format string specification fmt. For
       instance, "%{ %i %}" reads any format string that can read a value of
       type int; hence, if s is the string "fmt:\"number is %u\"", then
       Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
       "number is %u".

       - ( fmt %) : scanning sub-format substitution. Reads a format string
       rf in the input, then goes on scanning with rf instead of scanning with
       fmt. The format string rf must have the same type as the format string
       specification fmt that it replaces. For instance, "%( %i %)" reads any
       format string that can read a value of type int. The conversion returns
       the format string read rf, and then a value read using rf. Hence, if s
       is the string "\"%4d\"1234.00", then
       Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
       ("%4d", 1234). This behaviour is not mere format substitution, since
       the conversion returns the format string read as an additional
       argument. If you need pure format substitution, use the special flag _
       to discard the extraneous argument: conversion %_( fmt %) reads a
       format string rf and then behaves the same as format string rf. Hence,
       if s is the string "\"%4d\"1234.00", then Scanf.sscanf s "%_(%i%)" is
       simply equivalent to Scanf.sscanf "1234.00" "%4d".

       - l : returns the number of lines read so far.

       - n : returns the number of characters read so far.

       - N or L : returns the number of tokens read so far.

       - ! : matches the end of input condition.

       - % : matches one % character in the input.

       - @ : matches one @ character in the input.

       - , : does nothing.
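
       A few of the conversions listed above are exercised in the sketch
       below. The input strings are made up for illustration; each binding
       uses sscanf with an identity-like receiver.

```ocaml
(* %i understands the 0x prefix for hexadecimal input. *)
let hex = Scanf.sscanf "0x1f" "%i" (fun n -> n)

(* %S reads an OCaml-style delimited string, spaces included. *)
let quoted = Scanf.sscanf "\"a b\" rest" "%S" (fun s -> s)

(* %[0-9] reads the leading digits; %s then reads up to whitespace. *)
let digits, rest = Scanf.sscanf "2024-rc" "%[0-9]%s" (fun d r -> (d, r))

(* A range conversion may return the empty string. *)
let no_digits = Scanf.sscanf "abc" "%[0-9]" (fun s -> s)

(* %B reads a boolean, %f a float in decimal notation. *)
let flag = Scanf.sscanf "true" "%B" (fun b -> b)
let ratio = Scanf.sscanf "3.5" "%f" (fun x -> x)

(* %! succeeds only if the whole input has been consumed. *)
let exact = Scanf.sscanf "7" "%d%!" (fun n -> n)
```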

       Following the % character that introduces a conversion, there may be
       the special flag _ : the conversion that follows occurs as usual, but
       the resulting value is discarded. For instance, if f is the function
       fun i -> i + 1, and s is the string "x = 1", then
       Scanf.sscanf s "%_s = %i" f returns 2.
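
       Written out as code, the example from the preceding paragraph reads as
       follows (a direct transcription, with bindings named as in the text):

```ocaml
(* The receiver takes a single int: the %_s value is scanned but dropped. *)
let f i = i + 1
let s = "x = 1"
let r = Scanf.sscanf s "%_s = %i" f
```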

       The field width is composed of an optional integer literal indicating
       the maximal width of the token to read. For instance, %6d reads an
       integer having at most 6 decimal digits; %4f reads a float with at most
       4 characters; and %8[\000-\255] returns the next 8 characters (or all
       the characters still available, if fewer than 8 characters are
       available in the input).
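
       The effect of a field width can be sketched with a few short inputs.
       The inputs are assumptions chosen for the example; the last binding
       uses the %0c form described in the conversion list to peek at a
       character without consuming it.

```ocaml
(* %3d stops after three digits; %d reads the remaining ones. *)
let a, b = Scanf.sscanf "1234567" "%3d%d" (fun a b -> (a, b))

(* %3s reads at most three characters of the string token. *)
let head, tail = Scanf.sscanf "abcdef" "%3s%s" (fun h t -> (h, t))

(* %0c tests the current character without reading it, so %c sees it too. *)
let peeked, read = Scanf.sscanf "xy" "%0c%c" (fun p c -> (p, c))
```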

       Notes:

       - as mentioned above, a %s conversion always succeeds, even if there is
       nothing to read in the input: in this case, it simply returns "".

       - in addition to the relevant digits, '_' characters may appear inside
       numbers (this is reminiscent of the usual OCaml lexical conventions).
       If stricter scanning is desired, use the range conversion facility
       instead of the number conversions.

       - the scanf facility is not intended for heavy duty lexical analysis
       and parsing. If it appears not expressive enough for your needs,
       several alternatives exist: regular expressions (module Str ), stream
       parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.
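
       The note about '_' characters can be illustrated by comparing a number
       conversion with a range conversion on the same input. The input
       "1_000" is an assumption made for this sketch.

```ocaml
(* %d accepts the underscore inside the number. *)
let lenient = Scanf.sscanf "1_000" "%d" (fun n -> n)

(* %[0-9] stops at the first character that is not a digit. *)
let strict = Scanf.sscanf "1_000" "%[0-9]" (fun s -> s)
```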

   Scanning indications in format strings
       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token. A scanning indication is
       introduced by a @ character, followed by some plain character c. It
       means that the string token should end just before the next matching c
       (which is skipped). If no c character is encountered, the string token
       spreads as much as possible. For instance, "%s@\t" reads a string up to
       the next tab character or to the end of input. If a @ character appears
       anywhere else in the format string, it is treated as a plain character.
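
       A scanning indication is a convenient way to split a token on a
       delimiter. The following sketch uses made-up inputs; the ':' delimiter
       is an assumption, while "%s@\t" is the format quoted in the text.

```ocaml
(* "%s@:" reads up to the next ':' and skips it; %s reads the rest. *)
let key, value = Scanf.sscanf "key:value" "%s@:%s" (fun k v -> (k, v))

(* The format from the text: read up to the next tab character. *)
let before_tab = Scanf.sscanf "a\tb" "%s@\t" (fun s -> s)

(* With no delimiter in the input, the token spreads to the end of input. *)
let whole = Scanf.sscanf "novalue" "%s@:" (fun s -> s)
```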

       Note:

       - As usual in format strings, % and @ characters must be escaped using
       %% and %@ ; this rule still holds within range specifications and
       scanning indications. For instance, format "%s@%%" reads a string up to
       the next % character, and format "%s@%@" reads a string up to the next
       @.

       - The scanning indications introduce slight differences in the syntax
       of Scanf format strings, compared to those used for the Printf module.
       However, the scanning indications are similar to those used in the
       Format module; hence, when producing formatted text to be scanned by
       Scanf.bscanf, it is wise to use printing functions from the Format
       module (or, if you need to use functions from Printf, banish or
       carefully double check the format strings that contain '@' characters).

   Exceptions during scanning
       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:

       - Scanf.Scan_failure if the input does not match the format.

       - Failure if a conversion to a number is not possible.

       - End_of_file if the end of input is encountered while some more
       characters are needed to read the current conversion specification.

       - Invalid_argument if the format string is invalid.

       Note:

       - as a consequence, scanning a %s conversion never raises exception
       End_of_file: if the end of input is reached, the conversion succeeds
       and simply returns the characters read so far, or "" if none were ever
       read.
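
       One way to deal with these exceptions is to turn them into an option.
       The helper parse_int below is a hypothetical wrapper written for this
       sketch; it handles the three input-related exceptions listed above and
       does not try to say which one a given input triggers.

```ocaml
(* Hypothetical helper: None when the input cannot be read as an int. *)
let parse_int s =
  try Some (Scanf.sscanf s "%d" (fun n -> n))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None
```

       Invalid_argument is deliberately not caught here, since it signals a
       problem with the format string rather than with the input.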

   Specialised formatted input functions
       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.

       val sscanf_opt : string -> ('a, 'b, 'c, 'd) scanner_opt

       Same as Scanf.sscanf, but returns None in case of scanning failure.

       Since 5.0

       val scanf : ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the predefined formatted input
       channel Scanf.Scanning.stdin that is connected to stdin.

       val scanf_opt : ('a, 'b, 'c, 'd) scanner_opt

       Same as Scanf.scanf, but returns None in case of scanning failure.

       Since 5.0

       val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
       -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the formatted input channel and the exception
       that aborted the scanning process as arguments.

       val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
       'c, 'd) scanner

       Same as Scanf.kscanf but reads from the given string.

       Since 4.02.0
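
       The error-handling argument of ksscanf can be used to return a result
       value instead of raising. The function parse below is a hypothetical
       wrapper written for this sketch; the handler ignores the channel and
       converts the exception to a string.

```ocaml
(* Hypothetical wrapper: Ok n on success, Error msg on scanning failure. *)
let parse s =
  Scanf.ksscanf s
    (fun _ic exn -> Error (Printexc.to_string exn)) (* ef: error handler *)
    "%d"
    (fun n -> Ok n)                                 (* f: receiver *)
```

       Both the handler and the receiver must return the same type, here
       (int, string) result.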

   Reading format strings from input
       val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
       format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       bscanf_format ic fmt f reads a format string token from the formatted
       input channel ic, according to the given format string fmt, and
       applies f to the resulting format string value.

       Since 3.09.0

       Raises Scan_failure if the format string value read does not have the
       same type as fmt.

       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
       'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.

       Since 3.09.0

       val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
       ('a, 'b, 'c, 'd, 'e, 'f) format6

       format_from_string s fmt converts a string argument to a format string,
       according to the given format string fmt.

       Since 3.10.0

       Raises Scan_failure if s, considered as a format string, does not have
       the same type as fmt.
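
       A typical use of format_from_string is to obtain a printing format
       from a string known only at run time. In the sketch below, the string
       "%d items" and the function name render are assumptions made for the
       example; the format "%d" serves as the type witness.

```ocaml
(* Convert a run-time string into a format of the same type as "%d",
   then use it with Printf.sprintf. *)
let render n =
  Printf.sprintf (Scanf.format_from_string "%d items" "%d") n

let three = render 3

(* A string whose conversions do not match the witness is rejected. *)
let mismatch =
  try
    ignore (Scanf.format_from_string "%s" "%d");
    false
  with Scanf.Scan_failure _ -> true
```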

       val unescaped : string -> string

       unescaped s returns a copy of s with escape sequences (according to
       the lexical conventions of OCaml) replaced by their corresponding
       special characters. More precisely, Scanf.unescaped has the following
       property: for all strings s, Scanf.unescaped (String.escaped s) = s.

       Always returns a copy of the argument, even if there is no escape
       sequence in the argument.

       Since 4.00.0

       Raises Scan_failure if s is not properly escaped (i.e. s has invalid
       escape sequences or special characters that are not properly escaped).
       For instance, Scanf.unescaped "\"" will fail.
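
       The behaviour of unescaped can be sketched on three cases: decoding an
       escape sequence, the round-trip property stated above, and the failing
       input quoted in the text. The sample strings other than "\"" are
       assumptions made for the example.

```ocaml
(* The two characters '\\' and 't' are replaced by one tab character. *)
let decoded = Scanf.unescaped "tab\\there"

(* Round-trip property from the text, for one sample string. *)
let round_trip s = Scanf.unescaped (String.escaped s) = s

(* A bare double quote is not properly escaped and is rejected. *)
let rejects_bare_quote =
  try
    ignore (Scanf.unescaped "\"");
    false
  with Scanf.Scan_failure _ -> true
```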

OCamldoc                          2023-07-20                   Stdlib.Scanf(3)