Scanf(3)                         OCaml library                        Scanf(3)

NAME
       Scanf - Formatted input functions.

Module
       Module   Scanf

Documentation
       Module Scanf
        : sig end

       Formatted input functions.

   Introduction
   Functional input with format strings
       The module Scanf provides formatted input functions or scanners.

       The formatted input functions can read from any kind of input,
       including strings, files, or anything that can return characters. The
       most general source of characters is named a formatted input channel
       (or scanning buffer) and has type Scanf.Scanning.in_channel. The most
       general formatted input function reads from any scanning buffer and is
       named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:

       - the first argument is a source of characters for the input,

       - the second argument is a format string that specifies the values to
       read,

       - the third argument is a receiver function that is applied to the
       values read.

       Hence, a typical call to the formatted input function Scanf.bscanf is
       bscanf ic fmt f, where:

       - ic is a source of characters (typically a formatted input channel
       with type Scanf.Scanning.in_channel),

       - fmt is a format string (the same format strings as those used to
       print material with module Printf or Format),

       - f is a function that has as many arguments as the number of values
       to read in the input according to fmt.

   A simple example
       As suggested above, the expression bscanf ic "%d" f reads a decimal
       integer n from the source of characters ic and returns f n.

       For instance,

       - if we use stdin as the source of characters (Scanf.Scanning.stdin is
       the predefined formatted input channel that reads from standard
       input),

       - if we define the receiver f as let f x = x + 1,

       then bscanf Scanning.stdin "%d" f reads an integer n from the standard
       input and returns f n (that is n + 1). Thus, if we evaluate bscanf
       Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result
       we get is 42.
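
       A minimal sketch of the example above. It assumes a string-backed
       scanning buffer built with Scanf.Scanning.from_string as a stand-in
       for standard input, so that the result is reproducible; the names ic
       and result are illustrative.

```ocaml
(* The receiver from the text: adds one to the integer read. *)
let f x = x + 1

(* Illustrative stand-in for standard input: a scanning buffer over "41". *)
let ic = Scanf.Scanning.from_string "41"

(* Reads the decimal integer 41 from ic and returns f 41. *)
let result = Scanf.bscanf ic "%d" f

let () = Printf.printf "%d\n" result
```

       Replacing ic with Scanf.Scanning.stdin gives the interactive version
       described in the text.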

   Formatted input as a functional feature
       The OCaml scanning facility is reminiscent of the corresponding C
       feature. However, it is also largely different, simpler, and yet more
       powerful: the formatted input functions are higher-order functionals,
       and the parameter passing mechanism is just regular function
       application, not the variable-assignment-based mechanism that is
       typical of formatted input in imperative languages. The OCaml format
       strings also feature useful additions to easily define complex tokens.
       As expected within a functional programming language, the formatted
       input functions also support polymorphism, in particular arbitrary
       interaction with polymorphic user-defined scanners. Furthermore, the
       OCaml formatted input facility is fully type-checked at compile time.

   Formatted input channel
       module Scanning : sig end

   Type of formatted input functions
       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
       'd, 'd) format6 -> 'c

       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
       type of a formatted input function that reads from some formatted
       input channel according to some format string; more precisely, if scan
       is some formatted input function, then scan ic fmt f applies f to all
       the arguments specified by format string fmt, when scan has read those
       arguments from the Scanf.Scanning.in_channel formatted input channel
       ic.

       For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
       scanner, since it is a formatted input function that reads from
       Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
       by fmt, reading those arguments from stdin as expected.

       If the format fmt has some %r indications, the corresponding formatted
       input functions must be provided before receiver function f. For
       instance, if read_elem is an input function for values of type t, then
       bscanf ic "%r;" read_elem f reads a value v of type t followed by a
       ';' character, and returns f v.
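
       A small sketch of the %r mechanism just described, assuming that the
       elements are plain integers. The reader read_elem here is a
       hypothetical example, not part of the Scanf module.

```ocaml
(* Hypothetical element reader: skips leading whitespace, reads one int. *)
let read_elem ib = Scanf.bscanf ib " %d" (fun x -> x)

(* The reader is passed before the receiver, as the text describes. *)
let v = Scanf.sscanf "12;" "%r;" read_elem (fun v -> v)

(* Two %r conversions take two readers, again before the receiver. *)
let pair =
  Scanf.sscanf "3; 4;" "%r; %r;" read_elem read_elem (fun a b -> (a, b))

let () = Printf.printf "%d (%d, %d)\n" v (fst pair) (snd pair)
```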

       Since 3.10.0

       exception Scan_failure of string

       When the input cannot be read according to the format string
       specification, formatted input functions typically raise exception
       Scan_failure.

   The general formatted input function
       val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner

       bscanf ic fmt r1 ... rN f reads characters from the
       Scanf.Scanning.in_channel formatted input channel ic and converts them
       to values according to format string fmt. As a final step, receiver
       function f is applied to the values read and gives the result of the
       bscanf call.

       For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
       "x = 1" "%s = %i" f returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       arguments corresponding to the %r conversions specified in the format
       string.
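
       The call shape described above can be sketched as follows, using
       Scanf.sscanf so that the input is a literal string. The second
       example and the names r and point are illustrative additions.

```ocaml
(* The receiver gets one argument per conversion, in order. *)
let f _s i = i + 1

(* %s reads "x", the literal = is matched, %i reads 1; the result is f "x" 1. *)
let r = Scanf.sscanf "x = 1" "%s = %i" f

(* Plain characters such as parentheses and commas are matched literally. *)
let point = Scanf.sscanf "(3, -4)" "(%d, %d)" (fun x y -> (x, y))

let () = Printf.printf "%d (%d, %d)\n" r (fst point) (snd point)
```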

   Format string description
       The format string is a character string which contains three types of
       objects:

       - plain characters, which are simply matched with the characters of
       the input (with a special case for space and line feed, see "The space
       character in format strings"),

       - conversion specifications, each of which causes reading and
       conversion of one argument for the function f (see "Conversion
       specifications in format strings"),

       - scanning indications to specify boundaries of tokens (see "Scanning
       indications in format strings").

   The space character in format strings
       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two characters
       are special exceptions to this rule: the space character (' ' or ASCII
       code 32) and the line feed character ('\n' or ASCII code 10). A space
       does not match a single space character, but any amount of
       'whitespace' in the input. More precisely, a space inside the format
       string matches any number of tab, space, line feed and carriage return
       characters. Similarly, a line feed character in the format string
       matches either a single line feed or a carriage return followed by a
       line feed.

       Since it matches any amount of whitespace, a space in the format
       string also matches no whitespace at all; hence, the call bscanf ib
       "Price = %d $" (fun p -> p) succeeds and returns 1 when reading an
       input with various whitespace in it, such as Price = 1 $, Price  =   1
       $, or even Price=1$.
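
       The whitespace rule can be checked with a short sketch. The helper
       name price and the sample inputs are illustrative; the last input
       adds a tab and a line feed, which a space in the format also matches.

```ocaml
(* Each space in the format matches zero or more whitespace characters. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

let results =
  List.map price
    [ "Price = 1 $"; "Price  =   1    $"; "Price=1$"; "Price\t=\n1 $" ]

let () = List.iter (Printf.printf "%d\n") results
```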

   Conversion specifications in format strings
       Conversion specifications consist of the % character, followed by an
       optional flag, an optional field width, and one or two conversion
       characters.

       The conversion characters and their meanings are:

       - d : reads an optionally signed decimal integer ([0-9]+).

       - i : reads an optionally signed integer (usual input conventions for
       decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
       (0o[0-7]+), and binary (0b[0-1]+) notations are understood).

       - u : reads an unsigned decimal integer.

       - x or X : reads an unsigned hexadecimal integer ([0-9a-fA-F]+).

       - o : reads an unsigned octal integer ([0-7]+).

       - s : reads a string argument that spreads as much as possible, until
       one of the following bounding conditions holds:

       - a whitespace has been found (see "The space character in format
       strings"),

       - a scanning indication (see "Scanning indications in format strings")
       has been encountered,

       - the end-of-input has been reached.

       Hence, this conversion always succeeds: it returns an empty string if
       the bounding condition holds when the scan begins.

       - S : reads a delimited string argument (delimiters and special
       escaped characters follow the lexical conventions of OCaml).

       - c : reads a single character. To test the current input character
       without reading it, specify a null field width, i.e. use specification
       %0c. Raises Invalid_argument if the field width specification is
       greater than 1.

       - C : reads a single delimited character (delimiters and special
       escaped characters follow the lexical conventions of OCaml).

       - f, e, E, g, G : reads an optionally signed floating-point number in
       decimal notation, in the style dddd.ddd e/E+-dd.

       - h, H : reads an optionally signed floating-point number in
       hexadecimal notation.

       - F : reads a floating-point number according to the lexical
       conventions of OCaml (hence the decimal point is mandatory if the
       exponent part is not mentioned).

       - B : reads a boolean argument (true or false).

       - b : reads a boolean argument (for backward compatibility; do not use
       in new programs).

       - ld, li, lu, lx, lX, lo : reads an int32 argument in the format
       specified by the second letter for regular integers.

       - nd, ni, nu, nx, nX, no : reads a nativeint argument in the format
       specified by the second letter for regular integers.

       - Ld, Li, Lu, Lx, LX, Lo : reads an int64 argument in the format
       specified by the second letter for regular integers.

       - [ range ] : reads characters that match one of the characters
       mentioned in the range of characters range (or not mentioned in it, if
       the range starts with ^). Reads a string that can be empty, if the
       next input character does not match the range. The set of characters
       from c1 to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9] returns
       a string representing a decimal number or an empty string if no
       decimal digit is found; similarly, %[0-9a-f] returns a string of
       hexadecimal digits. If a closing bracket appears in a range, it must
       occur as the first character of the range (or just after the ^ in case
       of range negation); hence []] matches a ] character and [^]] matches
       any character that is not ]. Use %% and %@ to include a % or a @ in a
       range.

       - r : user-defined reader. Takes the next ri formatted input function
       and applies it to the scanning buffer ib to read the next argument.
       The input function ri must therefore have type Scanning.in_channel ->
       'a and the argument read has type 'a.

       - { fmt %} : reads a format string argument. The format string read
       must have the same type as the format string specification fmt. For
       instance, "%{ %i %}" reads any format string that can read a value of
       type int; hence, if s is the string "fmt:\"number is %u\"", then
       Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
       "number is %u".

       - ( fmt %) : scanning sub-format substitution. Reads a format string
       rf in the input, then goes on scanning with rf instead of scanning
       with fmt. The format string rf must have the same type as the format
       string specification fmt that it replaces. For instance, "%( %i %)"
       reads any format string that can read a value of type int. The
       conversion returns the format string read rf, and then a value read
       using rf. Hence, if s is the string "\"%4d\"1234.00", then
       Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to ("%4d",
       1234). This behaviour is not mere format substitution, since the
       conversion returns the format string read as additional argument. If
       you need pure format substitution, use special flag _ to discard the
       extraneous argument: conversion %_( fmt %) reads a format string rf
       and then behaves the same as format string rf. Hence, if s is the
       string "\"%4d\"1234.00", then Scanf.sscanf s "%_(%i%)" is simply
       equivalent to Scanf.sscanf "1234.00" "%4d".

       - l : returns the number of lines read so far.

       - n : returns the number of characters read so far.

       - N or L : returns the number of tokens read so far.

       - ! : matches the end of input condition.

       - % : matches one % character in the input.

       - @ : matches one @ character in the input.

       - , : does nothing.
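
       A few of the conversions listed above, exercised on literal strings
       as a sketch. The inputs and binding names are illustrative; id is a
       local identity receiver defined here, not a Scanf function.

```ocaml
(* Identity receiver: returns the value read, unchanged. *)
let id x = x

let dec = Scanf.sscanf "-42" "%d" id            (* signed decimal *)
let hex_i = Scanf.sscanf "0x1F" "%i" id         (* %i understands the 0x prefix *)
let bin_i = Scanf.sscanf "0b101" "%i" id        (* ... and the 0b prefix *)
let hex = Scanf.sscanf "ff" "%x" id             (* bare hexadecimal digits *)
let oct = Scanf.sscanf "17" "%o" id             (* bare octal digits *)
let flt = Scanf.sscanf "3.5e2" "%f" id          (* decimal float notation *)
let word = Scanf.sscanf "hello world" "%s" id   (* stops at whitespace *)
let quoted = Scanf.sscanf "\"a b\" rest" "%S" id (* OCaml-style quoted string *)
let chars = Scanf.sscanf "xy" "%c%c" (fun a b -> (a, b))
let flag = Scanf.sscanf "true" "%B" id
let digits = Scanf.sscanf "2024-07" "%[0-9]" id (* stops at the first non-digit *)
let upto_comma = Scanf.sscanf "a b,c" "%[^,]" id (* negated range *)

let () = Printf.printf "%d %d %d %d %d %s %s\n" dec hex_i bin_i hex oct word digits
```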

       Following the % character that introduces a conversion, there may be
       the special flag _ : the conversion that follows occurs as usual, but
       the resulting value is discarded. For instance, if f is the function
       fun i -> i + 1, and s is the string "x = 1", then Scanf.sscanf s "%_s
       = %i" f returns 2.
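
       The _ flag can be sketched as follows. The second example, with its
       sample record "alice 30 paris", is an illustrative addition.

```ocaml
(* The string read by %_s is discarded, so the receiver takes only the int. *)
let r = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* Skipping the first two fields and keeping only the third. *)
let city = Scanf.sscanf "alice 30 paris" "%_s %_d %s" (fun c -> c)

let () = Printf.printf "%d %s\n" r city
```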

       The field width is composed of an optional integer literal indicating
       the maximal width of the token to read. For instance, %6d reads an
       integer, having at most 6 decimal digits; %4f reads a float with at
       most 4 characters; and %8[\000-\255] returns the next 8 characters (or
       all the characters still available, if fewer than 8 characters are
       available in the input).
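
       Field widths are handy for fixed-width input. A short sketch with
       illustrative inputs:

```ocaml
(* %6d stops after six digits; the following %d reads the rest. *)
let (a, b) = Scanf.sscanf "123456789" "%6d%d" (fun a b -> (a, b))

(* Splitting a compact date into year, month and day. *)
let (y, m, d) = Scanf.sscanf "20210722" "%4d%2d%2d" (fun y m d -> (y, m, d))

(* The width also bounds string tokens. *)
let (s1, s2) = Scanf.sscanf "abcdefgh" "%4s%s" (fun x y -> (x, y))

let () = Printf.printf "%d %d / %d-%d-%d / %s %s\n" a b y m d s1 s2
```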

       Notes:

       - as mentioned above, a %s conversion always succeeds, even if there
       is nothing to read in the input: in this case, it simply returns "".

       - in addition to the relevant digits, '_' characters may appear inside
       numbers (this is reminiscent of the usual OCaml lexical conventions).
       If stricter scanning is desired, use the range conversion facility
       instead of the number conversions.

       - the scanf facility is not intended for heavy duty lexical analysis
       and parsing. If it appears not expressive enough for your needs,
       several alternatives exist: regular expressions (module Str), stream
       parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.


   Scanning indications in format strings
       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token. A scanning indication is
       introduced by a @ character, followed by some plain character c. It
       means that the string token should end just before the next matching c
       (which is skipped). If no c character is encountered, the string token
       spreads as much as possible. For instance, "%s@\t" reads a string up
       to the next tab character or to the end of input. If a @ character
       appears anywhere else in the format string, it is treated as a plain
       character.
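
       A sketch of scanning indications on illustrative inputs. Note that
       with an indication, the token is bounded by the given character
       rather than by whitespace.

```ocaml
(* %s@: reads up to the next ':' and skips it. *)
let (key, value) = Scanf.sscanf "name:OCaml" "%s@:%s" (fun k v -> (k, v))

(* With @\t the first token may contain spaces; it ends at the tab. *)
let (left, right) = Scanf.sscanf "a b\tc" "%s@\t%s" (fun l r -> (l, r))

(* If the delimiter never occurs, the token spreads to the end of input. *)
let all = Scanf.sscanf "no-tab-here" "%s@\t" (fun s -> s)

let () = Printf.printf "%s=%s [%s] [%s] %s\n" key value left right all
```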

       Note:

       - As usual in format strings, % and @ characters must be escaped using
       %% and %@; this rule still holds within range specifications and
       scanning indications. For instance, format "%s@%%" reads a string up
       to the next % character, and format "%s@%@" reads a string up to the
       next @.

       - The scanning indications introduce slight differences in the syntax
       of Scanf format strings, compared to those used for the Printf module.
       However, the scanning indications are similar to those used in the
       Format module; hence, when producing formatted text to be scanned by
       Scanf.bscanf, it is wise to use printing functions from the Format
       module (or, if you need to use functions from Printf, banish or
       carefully double check the format strings that contain '@'
       characters).


   Exceptions during scanning
       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:

       - Scanf.Scan_failure if the input does not match the format.

       - Failure if a conversion to a number is not possible.

       - End_of_file if the end of input is encountered while some more
       characters are needed to read the current conversion specification.

       - Invalid_argument if the format string is invalid.

       Note:

       - as a consequence, scanning a %s conversion never raises exception
       End_of_file: if the end of input is reached the conversion succeeds
       and simply returns the characters read so far, or "" if none were ever
       read.
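
       A sketch of handling two of these exceptions around a call to
       Scanf.sscanf. The type outcome and the function read_int are
       hypothetical helpers introduced for illustration.

```ocaml
(* Illustrative result type distinguishing the two failure modes. *)
type outcome = Value of int | Mismatch | Eof

let read_int s =
  match Scanf.sscanf s "%d" (fun n -> n) with
  | n -> Value n
  | exception Scanf.Scan_failure _ -> Mismatch  (* input does not match %d *)
  | exception End_of_file -> Eof                (* input ended too early *)

(* %s never raises End_of_file: on empty input it returns "". *)
let empty = Scanf.sscanf "" "%s" (fun s -> s)

let () =
  List.iter
    (fun s ->
      match read_int s with
      | Value n -> Printf.printf "%S -> %d\n" s n
      | Mismatch -> Printf.printf "%S -> mismatch\n" s
      | Eof -> Printf.printf "%S -> end of input\n" s)
    [ "42"; "abc"; "" ]
```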

   Specialised formatted input functions
       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.

       val scanf : ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the predefined formatted input
       channel Scanf.Scanning.stdin that is connected to stdin.

       val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
       -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the formatted input channel and the
       exception that aborted the scanning process as arguments.
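
       A sketch of kscanf used to turn scanning errors into an option value
       instead of an exception. The helper parse_int is hypothetical; the
       error handler ignores both of its arguments here.

```ocaml
(* Returns Some n on success; the error handler supplies None on failure. *)
let parse_int s =
  let ib = Scanf.Scanning.from_string s in
  Scanf.kscanf ib (fun _ib _exn -> None) "%d" (fun n -> Some n)

let () =
  List.iter
    (fun s ->
      match parse_int s with
      | Some n -> Printf.printf "%S -> %d\n" s n
      | None -> Printf.printf "%S -> error\n" s)
    [ "7"; "seven"; "" ]
```

       The receiver and the error handler must return the same type, since
       either one may provide the result of the call.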

       val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
       'c, 'd) scanner

       Same as Scanf.kscanf, but reads from the given string.

       Since 4.02.0

   Reading format strings from input
       val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
       format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       bscanf_format ic fmt f reads a format string token from the formatted
       input channel ic, according to the given format string fmt, and
       applies f to the resulting format string value.

       Since 3.09.0

       Raises Scan_failure if the format string value read does not have the
       same type as fmt.

       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
       'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.

       Since 3.09.0

       val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
       ('a, 'b, 'c, 'd, 'e, 'f) format6

       format_from_string s fmt converts a string argument to a format
       string, according to the given format string fmt.

       Since 3.10.0

       Raises Scan_failure if s, considered as a format string, does not have
       the same type as fmt.
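
       A sketch of format_from_string, assuming the goal is to use a
       run-time string as a Printf format that takes one int. The helper
       render and the sample templates are hypothetical.

```ocaml
(* Converts template to a format of the same type as "%d", then prints n. *)
let render (template : string) (n : int) : string =
  let fmt = Scanf.format_from_string template "%d" in
  Printf.sprintf fmt n

let text = render "value: %d" 42

(* A template expecting a string does not have the type of "%d". *)
let mismatch =
  match render "name: %s" 1 with
  | _ -> false
  | exception Scanf.Scan_failure _ -> true

let () = print_endline text
```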

       val unescaped : string -> string

       unescaped s returns a copy of s with escape sequences (according to
       the lexical conventions of OCaml) replaced by their corresponding
       special characters. More precisely, Scanf.unescaped has the following
       property: for all string s, Scanf.unescaped (String.escaped s) = s.

       Always returns a copy of the argument, even if there is no escape
       sequence in the argument.

       Since 4.00.0

       Raises Scan_failure if s is not properly escaped (i.e. s has invalid
       escape sequences or special characters that are not properly escaped).
       For instance, Scanf.unescaped "\"" will fail.
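
       A sketch of unescaped, including the round-trip property stated
       above. The binding names and sample strings are illustrative.

```ocaml
(* The four characters a \ n b become the three characters a, line feed, b. *)
let s1 = Scanf.unescaped "a\\nb"

(* The property from the text, checked on one sample string. *)
let roundtrip s = Scanf.unescaped (String.escaped s) = s
let ok = roundtrip "tab\there \"quoted\"\n"

(* A lone double quote is not properly escaped. *)
let fails =
  match Scanf.unescaped "\"" with
  | _ -> false
  | exception Scanf.Scan_failure _ -> true

let () = Printf.printf "%d %b %b\n" (String.length s1) ok fails
```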

   Deprecated
       val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner

       Deprecated.

       Scanf.fscanf is error prone and deprecated since 4.03.0.

       This function violates the following invariant of the Scanf module: to
       preserve scanning semantics, all scanning functions defined in Scanf
       must read from a user-defined Scanf.Scanning.in_channel formatted
       input channel.

       If you need to read from an in_channel input channel ic, simply define
       a Scanf.Scanning.in_channel formatted input channel as in let ib =
       Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
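
       The recommended replacement can be sketched as follows. The example
       writes a small temporary file so that it is self-contained; the file
       contents and the function name sum_two_ints_from_file are
       illustrative.

```ocaml
let sum_two_ints_from_file () =
  (* Create a temporary file holding two integers. *)
  let path = Filename.temp_file "scanf_demo" ".txt" in
  let oc = open_out path in
  output_string oc "10 20\n";
  close_out oc;
  (* Wrap the regular input channel in a formatted input channel. *)
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  let sum = Scanf.bscanf ib "%d %d" (fun a b -> a + b) in
  close_in ic;
  Sys.remove path;
  sum

let () = Printf.printf "%d\n" (sum_two_ints_from_file ())
```

       Creating ib once and reusing it for every bscanf call on the same
       channel keeps all reads on one scanning buffer, in line with the
       invariant stated above.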

       val kfscanf : in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a,
       'b, 'c, 'd) scanner

       Deprecated.

       Scanf.kfscanf is error prone and deprecated since 4.03.0.

OCamldoc                          2021-07-22                          Scanf(3)