Scanf(3)                         OCaml library                        Scanf(3)

NAME

       Scanf - Formatted input functions.

Module

       Module Scanf

Documentation

       Module Scanf
        : sig end


       Formatted input functions.


       Alert unsynchronized_access. Unsynchronized accesses to
       Scanning.in_channel are a programming error.

   Introduction
   Functional input with format strings
       The module Scanf provides formatted input functions, or scanners.

       The formatted input functions can read from any kind of input,
       including strings, files, or anything that can return characters. The
       most general source of characters is called a formatted input channel
       (or scanning buffer) and has type Scanf.Scanning.in_channel. The most
       general formatted input function reads from any scanning buffer and is
       named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:

       -the first argument is a source of characters for the input,

       -the second argument is a format string that specifies the values to
       read,

       -the third argument is a receiver function that is applied to the
       values read.

       Hence, a typical call to the formatted input function Scanf.bscanf is
       bscanf ic fmt f, where:


       - ic is a source of characters (typically a formatted input channel
       with type Scanf.Scanning.in_channel),


       - fmt is a format string (the same format strings as those used to
       print material with module Printf or Format),


       - f is a function that has as many arguments as the number of values to
       read in the input according to fmt.
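
       A minimal sketch of this three-argument shape that runs without a
       keyboard: the source of characters is built from a string with
       Scanf.Scanning.from_string, the format reads two integers separated by
       whitespace, and the receiver adds them. The name sum_of_pair is
       illustrative only.

```ocaml
(* Source of characters: a formatted input channel reading from a string. *)
let ic = Scanf.Scanning.from_string "3 4"

(* bscanf ic fmt f: "%d %d" reads two integers, the receiver adds them. *)
let sum_of_pair = Scanf.bscanf ic "%d %d" (fun a b -> a + b)

let () = Printf.printf "sum = %d\n" sum_of_pair
```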

   A simple example
       As suggested above, the expression bscanf ic "%d" f reads a decimal
       integer n from the source of characters ic and returns f n.

       For instance,


       -if we use stdin as the source of characters (Scanf.Scanning.stdin is
       the predefined formatted input channel that reads from standard input),


       -if we define the receiver f as let f x = x + 1,

       then bscanf Scanning.stdin "%d" f reads an integer n from the standard
       input and returns f n (that is, n + 1). Thus, if we evaluate
       bscanf Scanning.stdin "%d" f and then enter 41 at the keyboard, the
       result we get is 42.
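
       The same example as a self-contained sketch; it substitutes
       Scanf.sscanf reading the string "41" for keyboard input on
       Scanning.stdin, so the result can be checked without user interaction.

```ocaml
(* The receiver from the text: it adds one to the integer read. *)
let f x = x + 1

(* Same scan as bscanf Scanning.stdin "%d" f, but the characters come from
   a string instead of standard input. *)
let result = Scanf.sscanf "41" "%d" f

let () = Printf.printf "%d\n" result
```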

   Formatted input as a functional feature
       The OCaml scanning facility is reminiscent of the corresponding C
       feature. However, it is also largely different, simpler, and yet more
       powerful: the formatted input functions are higher-order functionals,
       and the parameter passing mechanism is just regular function
       application, not the variable-assignment-based mechanism that is
       typical of formatted input in imperative languages; the OCaml format
       strings also feature useful additions to easily define complex tokens;
       as expected within a functional programming language, the formatted
       input functions also support polymorphism, in particular arbitrary
       interaction with polymorphic user-defined scanners. Furthermore, the
       OCaml formatted input facility is fully type-checked at compile time.

       Unsynchronized accesses

       Unsynchronized accesses to a Scanf.Scanning.in_channel may lead to an
       invalid Scanf.Scanning.in_channel state. Thus, concurrent accesses to
       Scanf.Scanning.in_channel values must be synchronized (for instance
       with a Mutex.t).

   Formatted input channel
       module Scanning : sig end

   Type of formatted input functions
       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c,
       'a -> 'd, 'd) format6 -> 'c


       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
       type of a formatted input function that reads from some formatted input
       channel according to some format string; more precisely, if scan is
       some formatted input function, then scan ic fmt f applies f to all the
       arguments specified by format string fmt, once scan has read those
       arguments from the Scanf.Scanning.in_channel formatted input channel
       ic.

       For instance, the Scanf.scanf function below has type
       ('a, 'b, 'c, 'd) scanner, since it is a formatted input function that
       reads from Scanf.Scanning.stdin: scanf fmt f applies f to the arguments
       specified by fmt, reading those arguments from stdin as expected.

       If the format fmt has some %r conversions, the corresponding formatted
       input functions must be provided before the receiver function f. For
       instance, if read_elem is an input function for values of type t, then
       bscanf ic "%r;" read_elem f reads a value v of type t followed by a
       ';' character, and returns f v.


       Since 3.10.0
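
       A minimal sketch of the %r mechanism, assuming a hypothetical reader
       read_elem for integers: the reader is passed before the receiver, it
       consumes the value from the same scanning buffer, and the format then
       matches the trailing ';'.

```ocaml
(* Hypothetical reader for a %r conversion, of type
   Scanf.Scanning.in_channel -> int: the leading space in " %d" skips
   whitespace before the decimal integer. *)
let read_elem ib = Scanf.bscanf ib " %d" (fun n -> n)

(* "%r;" takes read_elem first, then the receiver; after the reader has
   returned, the format matches the ';' character. *)
let v = Scanf.sscanf "  12;" "%r;" read_elem (fun v -> v * 2)

let () = Printf.printf "%d\n" v
```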


       type ('a, 'b, 'c, 'd) scanner_opt = ('a, Scanning.in_channel, 'b, 'c,
       'a -> 'd option, 'd) format6 -> 'c



       exception Scan_failure of string


       When the input cannot be read according to the format string
       specification, formatted input functions typically raise exception
       Scan_failure.

   The general formatted input function
       val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner


       bscanf ic fmt r1 ... rN f reads characters from the
       Scanf.Scanning.in_channel formatted input channel ic and converts them
       to values according to format string fmt. As a final step, receiver
       function f is applied to the values read and gives the result of the
       bscanf call.

       For instance, if f is the function fun s i -> i + 1, then
       Scanf.sscanf "x = 1" "%s = %i" f returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       arguments corresponding to the %r conversions specified in the format
       string.
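
       Because a formatted input channel keeps track of the characters it has
       already consumed, successive bscanf calls on the same channel read
       successive values. A small sketch, again using
       Scanf.Scanning.from_string as the source and Fun.id (OCaml 4.08 or
       later) as the receiver:

```ocaml
(* One formatted input channel shared by several bscanf calls. *)
let ic = Scanf.Scanning.from_string "10 20 30"

(* Each call resumes where the previous one stopped; the leading space in
   " %d" skips the whitespace left in front of the next number. *)
let a = Scanf.bscanf ic " %d" Fun.id
let b = Scanf.bscanf ic " %d" Fun.id
let c = Scanf.bscanf ic " %d" Fun.id

let () = Printf.printf "%d %d %d\n" a b c
```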

       val bscanf_opt : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner_opt

       Same as Scanf.bscanf, but returns None in case of scanning failure.


       Since 5.0

   Format string description
       The format string is a character string which contains three types of
       objects:

       -plain characters, which are simply matched with the characters of the
       input (with a special case for space and line feed, see Scanf.space),

       -conversion specifications, each of which causes reading and conversion
       of one argument for the function f (see Scanf.conversion),

       -scanning indications to specify boundaries of tokens (see
       Scanf.indication).

   The space character in format strings
       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two characters
       are special exceptions to this rule: the space character (' ' or ASCII
       code 32) and the line feed character ('\n' or ASCII code 10). A space
       does not match a single space character, but any amount of
       'whitespace' in the input. More precisely, a space inside the format
       string matches any number of tab, space, line feed and carriage return
       characters. Similarly, a line feed character in the format string
       matches either a single line feed or a carriage return followed by a
       line feed.

       Matching any amount of whitespace, a space in the format string also
       matches no whitespace at all; hence, the call
       bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when
       reading an input with various whitespace in it, such as Price = 1 $,
       Price  =  1    $, or even Price=1$.
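
       The whitespace rule can be checked on the Price example. In this
       sketch the helper name price is hypothetical, and the inputs differ
       only in their whitespace.

```ocaml
(* Each space in the format matches any amount of whitespace, including none. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %d\n" s (price s))
    [ "Price = 1 $"; "Price  =  1    $"; "Price=1$"; "Price\t=\n1 $" ]
```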

   Conversion specifications in format strings
       Conversion specifications consist of the % character, followed by an
       optional flag, an optional field width, and followed by one or two
       conversion characters.

       The conversion characters and their meanings are:


       - d : reads an optionally signed decimal integer ([0-9]+).

       - i : reads an optionally signed integer (usual input conventions for
       decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
       (0o[0-7]+), and binary (0b[0-1]+) notations are understood).

       - u : reads an unsigned decimal integer.

       - x or X : reads an unsigned hexadecimal integer ([0-9a-fA-F]+).

       - o : reads an unsigned octal integer ([0-7]+).

       - s : reads a string argument that spreads as much as possible, until
       the following bounding condition holds:

           -a whitespace has been found (see Scanf.space),

           -a scanning indication (see Scanf.indication) has been
           encountered,

           -the end-of-input has been reached.

       Hence, this conversion always succeeds: it returns an empty string if
       the bounding condition holds when the scan begins.

       - S : reads a delimited string argument (delimiters and special escaped
       characters follow the lexical conventions of OCaml).

       - c : reads a single character. To test the current input character
       without reading it, specify a null field width, i.e. use specification
       %0c. Raises Invalid_argument if the field width specification is
       greater than 1.

       - C : reads a single delimited character (delimiters and special
       escaped characters follow the lexical conventions of OCaml).

       - f, e, E, g, G : reads an optionally signed floating-point number in
       decimal notation, in the style dddd.ddd e/E+-dd.

       - h, H : reads an optionally signed floating-point number in
       hexadecimal notation.

       - F : reads a floating-point number according to the lexical
       conventions of OCaml (hence the decimal point is mandatory if the
       exponent part is not mentioned).

       - B : reads a boolean argument (true or false).

       - b : reads a boolean argument (for backward compatibility; do not use
       in new programs).

       - ld, li, lu, lx, lX, lo : reads an int32 argument in the format
       specified by the second letter, as for regular integers.

       - nd, ni, nu, nx, nX, no : reads a nativeint argument in the format
       specified by the second letter, as for regular integers.

       - Ld, Li, Lu, Lx, LX, Lo : reads an int64 argument in the format
       specified by the second letter, as for regular integers.

       - [ range ] : reads characters that match one of the characters
       mentioned in the range of characters range (or not mentioned in it, if
       the range starts with ^). Reads a string that can be empty, if the
       next input character does not match the range. The set of characters
       from c1 to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9] returns
       a string representing a decimal number, or an empty string if no
       decimal digit is found; similarly, %[0-9a-f] returns a string of
       hexadecimal digits. If a closing bracket appears in a range, it must
       occur as the first character of the range (or just after the ^ in case
       of range negation); hence []] matches a ] character and [^]] matches
       any character that is not ]. Use %% and %@ to include a % or a @ in a
       range.

       - r : user-defined reader. Takes the next ri formatted input function
       and applies it to the scanning buffer ib to read the next argument. The
       input function ri must therefore have type Scanning.in_channel -> 'a,
       and the argument read has type 'a.

       - { fmt %} : reads a format string argument. The format string read
       must have the same type as the format string specification fmt. For
       instance, "%{ %i %}" reads any format string that can read a value of
       type int; hence, if s is the string "fmt:\"number is %u\"", then
       Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
       "number is %u".

       - ( fmt %) : scanning sub-format substitution. Reads a format string
       rf in the input, then goes on scanning with rf instead of scanning with
       fmt. The format string rf must have the same type as the format string
       specification fmt that it replaces. For instance, "%( %i %)" reads any
       format string that can read a value of type int. The conversion
       returns the format string read rf, and then a value read using rf.
       Hence, if s is the string "\"%4d\"1234.00", then
       Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
       ("%4d", 1234). This behaviour is not mere format substitution, since
       the conversion returns the format string read as an additional
       argument. If you need pure format substitution, use the special flag _
       to discard the extraneous argument: conversion %_( fmt %) reads a
       format string rf and then behaves the same as format string rf. Hence,
       if s is the string "\"%4d\"1234.00", then Scanf.sscanf s "%_(%i%)" is
       simply equivalent to Scanf.sscanf "1234.00" "%4d".

       - l : returns the number of lines read so far.

       - n : returns the number of characters read so far.

       - N or L : returns the number of tokens read so far.

       - ! : matches the end of input condition.

       - % : matches one % character in the input.

       - @ : matches one @ character in the input.

       - , : does nothing.
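
       A few of the conversions above, shown as a sketch with Scanf.sscanf and
       the identity receiver Fun.id (OCaml 4.08 or later); the inputs are
       arbitrary sample strings.

```ocaml
(* Each binding reads one value from a string and returns it unchanged. *)
let hex_i = Scanf.sscanf "0x1f" "%i" Fun.id              (* %i understands 0x *)
let hex_x = Scanf.sscanf "ff" "%x" Fun.id                (* unsigned hexadecimal *)
let quoted = Scanf.sscanf "\"hello world\"" "%S" Fun.id  (* OCaml string literal *)
let flag = Scanf.sscanf "true" "%B" Fun.id               (* boolean *)
let big = Scanf.sscanf "9000000000" "%Ld" Fun.id         (* int64 *)
let real = Scanf.sscanf "2.5e3" "%f" Fun.id              (* decimal float *)

(* %[0-9] stops at the first non-digit; %s then reads the rest. *)
let digits, rest = Scanf.sscanf "123abc" "%[0-9]%s" (fun d r -> (d, r))

let () =
  Printf.printf "%d %d %S %b %Ld %g %S %S\n"
    hex_i hex_x quoted flag big real digits rest
```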

       Following the % character that introduces a conversion, there may be
       the special flag _: the conversion that follows occurs as usual, but
       the resulting value is discarded. For instance, if f is the function
       fun i -> i + 1, and s is the string "x = 1", then
       Scanf.sscanf s "%_s = %i" f returns 2.

       The field width is composed of an optional integer literal indicating
       the maximal width of the token to read. For instance, %6d reads an
       integer having at most 6 decimal digits; %4f reads a float with at
       most 4 characters; and %8[\000-\255] returns the next 8 characters (or
       all the characters still available, if fewer than 8 characters are
       available in the input).
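
       A sketch of field widths and of the _ flag; the compact date layout is
       an arbitrary example.

```ocaml
(* Field widths split "20240131" into a 4-digit year, a 2-digit month and
   a 2-digit day. *)
let ymd = Scanf.sscanf "20240131" "%4d%2d%2d" (fun y m d -> (y, m, d))

(* %3s reads at most 3 characters of a longer token. *)
let prefix = Scanf.sscanf "abcdef" "%3s" Fun.id

(* The _ flag reads the leading "x" but drops it, so the receiver only
   gets the integer. *)
let incremented = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

let () =
  let y, m, d = ymd in
  Printf.printf "%d-%02d-%02d %s %d\n" y m d prefix incremented
```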

       Notes:


       -as mentioned above, a %s conversion always succeeds, even if there is
       nothing to read in the input: in this case, it simply returns "".


       -in addition to the relevant digits, '_' characters may appear inside
       numbers (this is reminiscent of the usual OCaml lexical conventions).
       If stricter scanning is desired, use the range conversion facility
       instead of the number conversions.


       -the scanf facility is not intended for heavy-duty lexical analysis
       and parsing. If it appears not expressive enough for your needs,
       several alternatives exist: regular expressions (module Str), stream
       parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.
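
       The difference between a number conversion and a range conversion on
       an input containing '_' can be seen in this sketch:

```ocaml
(* %d skips the '_' inside the digits and reads one thousand... *)
let lenient = Scanf.sscanf "1_000" "%d" Fun.id

(* ...whereas %[0-9] keeps only the characters it lists and stops at '_'. *)
let strict = Scanf.sscanf "1_000" "%[0-9]" Fun.id

let () = Printf.printf "%d %S\n" lenient strict
```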


   Scanning indications in format strings
       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token. A scanning indication is
       introduced by a @ character, followed by some plain character c. It
       means that the string token should end just before the next matching c
       (which is skipped). If no c character is encountered, the string token
       spreads as much as possible. For instance, "%s@\t" reads a string up to
       the next tab character or to the end of input. If a @ character appears
       anywhere else in the format string, it is treated as a plain character.
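
       A sketch contrasting %s with and without a scanning indication; the
       semicolon-separated record is an arbitrary sample input.

```ocaml
(* "%s@;" reads everything, spaces included, up to the next ';', which is
   skipped; %d then reads the integer that follows. *)
let name, age = Scanf.sscanf "Alice Smith;42" "%s@;%d" (fun n a -> (n, a))

(* Without the indication, %s stops at the first whitespace. *)
let first = Scanf.sscanf "Alice Smith;42" "%s" Fun.id

let () = Printf.printf "%S %d %S\n" name age first
```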

       Note:


       -As usual in format strings, % and @ characters must be escaped using
       %% and %@; this rule still holds within range specifications and
       scanning indications. For instance, format "%s@%%" reads a string up to
       the next % character, and format "%s@%@" reads a string up to the next
       @.

       -The scanning indications introduce slight differences in the syntax of
       Scanf format strings, compared to those used for the Printf module.
       However, the scanning indications are similar to those used in the
       Format module; hence, when producing formatted text to be scanned by
       Scanf.bscanf, it is wise to use printing functions from the Format
       module (or, if you need to use functions from Printf, banish or
       carefully double-check the format strings that contain '@'
       characters).

   Exceptions during scanning
       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:


       -Raise Scanf.Scan_failure if the input does not match the format.


       -Raise Failure if a conversion to a number is not possible.


       -Raise End_of_file if the end of input is encountered while some more
       characters are needed to read the current conversion specification.


       -Raise Invalid_argument if the format string is invalid.

       Note:


       -as a consequence, scanning a %s conversion never raises exception
       End_of_file: if the end of input is reached, the conversion succeeds
       and simply returns the characters read so far, or "" if none were ever
       read.
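
       One way to act on these exceptions is to catch them around the call.
       In this sketch parse_int is a hypothetical helper, and the trailing %!
       additionally requires that the whole string be consumed.

```ocaml
(* Hypothetical helper: parse a whole string as one integer, turning the
   scanning exceptions listed above into None. *)
let parse_int s =
  try Some (Scanf.sscanf s "%d%!" Fun.id) with
  | Scanf.Scan_failure _ | Failure _ | End_of_file -> None

let () =
  List.iter
    (fun s ->
      match parse_int s with
      | Some n -> Printf.printf "%S -> Some %d\n" s n
      | None -> Printf.printf "%S -> None\n" s)
    [ "42"; "-17"; "abc"; ""; "42x" ]
```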


   Specialised formatted input functions
       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.


       val sscanf_opt : string -> ('a, 'b, 'c, 'd) scanner_opt

       Same as Scanf.sscanf, but returns None in case of scanning failure.


       Since 5.0


       val scanf : ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the predefined formatted input
       channel Scanf.Scanning.stdin that is connected to stdin.


       val scanf_opt : ('a, 'b, 'c, 'd) scanner_opt

       Same as Scanf.scanf, but returns None in case of scanning failure.


       Since 5.0


       val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
       -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the formatted input channel and the
       exception that aborted the scanning process as arguments.


       val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
       'c, 'd) scanner

       Same as Scanf.kscanf, but reads from the given string.


       Since 4.02.0
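
       A sketch of ksscanf used to supply a fallback value when scanning
       fails; int_or_default is a hypothetical name.

```ocaml
(* The error handler receives the scanning buffer and the exception that
   aborted the scan; here it ignores both and returns a default value. *)
let int_or_default default s =
  Scanf.ksscanf s (fun _ib _exn -> default) "%d" (fun n -> n)

let () =
  Printf.printf "%d %d\n" (int_or_default 0 "12") (int_or_default (-1) "abc")
```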


   Reading format strings from input
       val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
       format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g


       bscanf_format ic fmt f reads a format string token from the formatted
       input channel ic, according to the given format string fmt, and
       applies f to the resulting format string value.


       Since 3.09.0


       Raises Scan_failure if the format string value read does not have the
       same type as fmt.


       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
       (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.


       Since 3.09.0


       val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
       ('a, 'b, 'c, 'd, 'e, 'f) format6


       format_from_string s fmt converts a string argument to a format string,
       according to the given format string fmt.


       Since 3.10.0


       Raises Scan_failure if s, considered as a format string, does not have
       the same type as fmt.
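
       A sketch of format_from_string checking a run-time template against
       the type of the literal "%d"; render is a hypothetical helper that
       returns None when the template does not have that type.

```ocaml
(* The template must have the same type as "%d", i.e. take one int. *)
let render template n =
  match Scanf.format_from_string template "%d" with
  | fmt -> Some (Printf.sprintf fmt n)
  | exception Scanf.Scan_failure _ -> None

let () =
  List.iter
    (fun t ->
      match render t 7 with
      | Some s -> Printf.printf "%S -> %S\n" t s
      | None -> Printf.printf "%S -> rejected\n" t)
    [ "value: %d"; "value: %s" ]
```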


       val unescaped : string -> string


       unescaped s returns a copy of s with escape sequences (according to the
       lexical conventions of OCaml) replaced by their corresponding special
       characters. More precisely, Scanf.unescaped has the following
       property: for all strings s, Scanf.unescaped (String.escaped s) = s.

       Always returns a copy of the argument, even if there is no escape
       sequence in the argument.


       Since 4.00.0


       Raises Scan_failure if s is not properly escaped (i.e. s has invalid
       escape sequences or special characters that are not properly escaped).
       For instance, Scanf.unescaped "\"" will fail.
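
       A short sketch of the unescaping behaviour, of the round-trip property
       with String.escaped, and of the failure case mentioned above:

```ocaml
(* "a\\tb" is the 4 characters a, \, t, b; unescaped turns \t into a tab. *)
let tab = Scanf.unescaped "a\\tb"

(* Round trip with String.escaped. *)
let original = "quote \" and newline \n"
let round_trip_ok = Scanf.unescaped (String.escaped original) = original

(* A bare double quote is not properly escaped, so this raises Scan_failure. *)
let rejected =
  match Scanf.unescaped "\"" with
  | _ -> false
  | exception Scanf.Scan_failure _ -> true

let () = Printf.printf "%S %b %b\n" tab round_trip_ok rejected
```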


OCamldoc                          2023-07-20                          Scanf(3)