Scanf(3)                         OCaml library                        Scanf(3)

NAME

       Scanf - Formatted input functions.

Module

       Module   Scanf

Documentation

       Module Scanf
        : sig end


       Formatted input functions.

   Introduction
   Functional input with format strings
       The module Scanf provides formatted input functions, or scanners.

       The formatted input functions can read from any kind of input,
       including strings, files, or anything that can return characters. The
       most general source of characters is named a formatted input channel
       (or scanning buffer) and has type Scanf.Scanning.in_channel. The most
       general formatted input function reads from any scanning buffer and is
       named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:

       -the first argument is a source of characters for the input,

       -the second argument is a format string that specifies the values to
       read,

       -the third argument is a receiver function that is applied to the
       values read.

       Hence, a typical call to the formatted input function Scanf.bscanf is
       bscanf ic fmt f, where:

       - ic is a source of characters (typically a formatted input channel
       with type Scanf.Scanning.in_channel),

       - fmt is a format string (the same format strings as those used to
       print material with module Printf or Format),

       - f is a function that has as many arguments as the number of values
       to read in the input according to fmt.

   A simple example
       As suggested above, the expression bscanf ic "%d" f reads a decimal
       integer n from the source of characters ic and returns f n.

       For instance,

       -if we use stdin as the source of characters (Scanf.Scanning.stdin is
       the predefined formatted input channel that reads from standard input),

       -if we define the receiver f as let f x = x + 1,

       then bscanf Scanning.stdin "%d" f reads an integer n from the standard
       input and returns f n (that is n + 1). Thus, if we evaluate bscanf
       Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result
       we get is 42.
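
       The example above can be tried without a keyboard. The following is a
       minimal sketch, assuming that a string-backed channel built with
       Scanf.Scanning.from_string stands in for Scanf.Scanning.stdin; the
       helper name read_and_apply is introduced here for illustration only.

```ocaml
(* Receiver from the text: add one to the integer read. *)
let f x = x + 1

(* Hypothetical helper: scan one decimal integer from [input] and pass it
   to [f]. A string-backed channel replaces Scanf.Scanning.stdin. *)
let read_and_apply (input : string) : int =
  let ic = Scanf.Scanning.from_string input in
  Scanf.bscanf ic "%d" f

(* Prints 42, matching the keyboard scenario described above. *)
let () = Printf.printf "%d\n" (read_and_apply "41")
```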

   Formatted input as a functional feature
       The OCaml scanning facility is reminiscent of the corresponding C
       feature. However, it is also largely different, simpler, and yet more
       powerful: the formatted input functions are higher-order functionals,
       and the parameter passing mechanism is just regular function
       application, not the variable-assignment mechanism that is typical of
       formatted input in imperative languages; the OCaml format strings also
       feature useful additions to easily define complex tokens; as expected
       within a functional programming language, the formatted input functions
       also support polymorphism, in particular arbitrary interaction with
       polymorphic user-defined scanners. Furthermore, the OCaml formatted
       input facility is fully type-checked at compile time.

   Formatted input channel
       module Scanning : sig end

   Type of formatted input functions
       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
       'd, 'd) format6 -> 'c


       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
       type of a formatted input function that reads from some formatted input
       channel according to some format string; more precisely, if scan is
       some formatted input function, then scan ic fmt f applies f to all the
       arguments specified by format string fmt, when scan has read those
       arguments from the Scanf.Scanning.in_channel formatted input channel
       ic.

       For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
       scanner, since it is a formatted input function that reads from
       Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
       by fmt, reading those arguments from stdin as expected.

       If the format fmt has some %r indications, the corresponding formatted
       input functions must be provided before receiver function f. For
       instance, if read_elem is an input function for values of type t, then
       bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
       character, and returns f v.


       Since 3.10.0
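
       The %r mechanism described above can be sketched as follows. The type
       t, its constructor Elem, and the helper scan_elem are assumptions made
       for this illustration; only read_elem and the "%r;" format come from
       the text.

```ocaml
(* Hypothetical element type; the text only calls it [t]. *)
type t = Elem of int

(* User-defined reader: it must have type Scanning.in_channel -> t. *)
let read_elem (ic : Scanf.Scanning.in_channel) : t =
  Scanf.bscanf ic "%d" (fun n -> Elem n)

(* The reader is passed before the receiver, as required for %r. *)
let scan_elem (input : string) : t =
  let ic = Scanf.Scanning.from_string input in
  Scanf.bscanf ic "%r;" read_elem (fun v -> v)

let () =
  match scan_elem "7;" with
  | Elem n -> Printf.printf "Elem %d\n" n
```

       Passing the reader as an ordinary argument keeps the format string
       itself independent of how values of type t are parsed.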


       exception Scan_failure of string


       When the input cannot be read according to the format string
       specification, formatted input functions typically raise exception
       Scan_failure.

   The general formatted input function
       val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner


       bscanf ic fmt r1 ... rN f reads characters from the
       Scanf.Scanning.in_channel formatted input channel ic and converts them
       to values according to format string fmt. As a final step, receiver
       function f is applied to the values read and gives the result of the
       bscanf call.

       For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
       "x = 1" "%s = %i" f returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       argument corresponding to the %r conversions specified in the format
       string.
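
       As a small runnable illustration of the call shape bscanf ic fmt f,
       the sketch below uses Scanf.sscanf so that the input is a plain string.
       The names f, result and binding are chosen here for the example.

```ocaml
(* Receiver taking one argument per conversion: a string, then an int. *)
let f (_name : string) (i : int) : int = i + 1

(* %s reads "x", the literal " = " is matched, %i reads 1; f returns 2. *)
let result = Scanf.sscanf "x = 1" "%s = %i" f

(* The receiver can also return the values themselves, here as a pair. *)
let binding = Scanf.sscanf "width = 0x10" "%s = %i" (fun k v -> (k, v))

let () =
  Printf.printf "%d\n" result;
  Printf.printf "%s -> %d\n" (fst binding) (snd binding)
```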

   Format string description
       The format string is a character string which contains three types of
       objects:

       -plain characters, which are simply matched with the characters of the
       input (with a special case for space and line feed, see "The space
       character in format strings"),

       -conversion specifications, each of which causes reading and conversion
       of one argument for the function f (see "Conversion specifications in
       format strings"),

       -scanning indications to specify boundaries of tokens (see "Scanning
       indications in format strings").

   The space character in format strings
       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two characters
       are special exceptions to this rule: the space character (' ' or ASCII
       code 32) and the line feed character ('\n' or ASCII code 10). A space
       does not match a single space character, but any amount of 'whitespace'
       in the input. More precisely, a space inside the format string matches
       any number of tab, space, line feed and carriage return characters.
       Similarly, a line feed character in the format string matches either a
       single line feed or a carriage return followed by a line feed.

       Since it matches any amount of whitespace, a space in the format string
       also matches no whitespace at all; hence, the call
       bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when
       reading an input with various whitespace in it, such as "Price = 1 $",
       "Price  =  1    $", or even "Price=1$".
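
       The whitespace rule can be checked directly. This sketch reuses the
       "Price = %d $" format from the text with Scanf.sscanf; the helper name
       read_price is an assumption for the example.

```ocaml
(* Each space in the format matches zero or more whitespace characters. *)
let read_price (input : string) : int =
  Scanf.sscanf input "Price = %d $" (fun p -> p)

let () =
  List.iter
    (fun input -> Printf.printf "%S -> %d\n" input (read_price input))
    [ "Price = 1 $"; "Price  =  1    $"; "Price=1$" ]
```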

   Conversion specifications in format strings
       Conversion specifications consist of the % character, followed by an
       optional flag, an optional field width, and followed by one or two
       conversion characters.

       The conversion characters and their meanings are:

       - d : reads an optionally signed decimal integer ([0-9]+).

       - i : reads an optionally signed integer (usual input conventions for
       decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
       (0o[0-7]+), and binary (0b[0-1]+) notations are understood).

       - u : reads an unsigned decimal integer.

       - x or X : reads an unsigned hexadecimal integer ([0-9a-fA-F]+).

       - o : reads an unsigned octal integer ([0-7]+).

       - s : reads a string argument that spreads as much as possible, until
       one of the following bounding conditions holds:

       -a whitespace has been found (see "The space character in format
       strings"),

       -a scanning indication (see "Scanning indications in format strings")
       has been encountered,

       -the end-of-input has been reached.

       Hence, this conversion always succeeds: it returns an empty string if
       the bounding condition holds when the scan begins.

       - S : reads a delimited string argument (delimiters and special escaped
       characters follow the lexical conventions of OCaml).

       - c : reads a single character. To test the current input character
       without reading it, specify a null field width, i.e. use specification
       %0c. Raise Invalid_argument if the field width specification is
       greater than 1.

       - C : reads a single delimited character (delimiters and special
       escaped characters follow the lexical conventions of OCaml).

       - f, e, E, g, G : reads an optionally signed floating-point number in
       decimal notation, in the style dddd.ddd e/E+-dd.

       - h, H : reads an optionally signed floating-point number in
       hexadecimal notation.

       - F : reads a floating point number according to the lexical
       conventions of OCaml (hence the decimal point is mandatory if the
       exponent part is not mentioned).

       - B : reads a boolean argument (true or false).

       - b : reads a boolean argument (for backward compatibility; do not use
       in new programs).

       - ld, li, lu, lx, lX, lo : reads an int32 argument in the format
       specified by the second letter for regular integers.

       - nd, ni, nu, nx, nX, no : reads a nativeint argument in the format
       specified by the second letter for regular integers.

       - Ld, Li, Lu, Lx, LX, Lo : reads an int64 argument in the format
       specified by the second letter for regular integers.

       - [ range ] : reads characters that match one of the characters
       mentioned in the range of characters range (or not mentioned in it, if
       the range starts with ^). Reads a string that can be empty, if the next
       input character does not match the range. The set of characters from c1
       to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9] returns a string
       representing a decimal number or an empty string if no decimal digit is
       found; similarly, %[0-9a-f] returns a string of hexadecimal digits. If
       a closing bracket appears in a range, it must occur as the first
       character of the range (or just after the ^ in case of range negation);
       hence []] matches a ] character and [^]] matches any character that is
       not ]. Use %% and %@ to include a % or a @ in a range.

       - r : user-defined reader. Takes the next ri formatted input function
       and applies it to the scanning buffer ib to read the next argument. The
       input function ri must therefore have type Scanning.in_channel -> 'a
       and the argument read has type 'a.

       - { fmt %} : reads a format string argument. The format string read
       must have the same type as the format string specification fmt. For
       instance, "%{ %i %}" reads any format string that can read a value of
       type int; hence, if s is the string "fmt:\"number is %u\"", then
       Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
       "number is %u".

       - ( fmt %) : scanning sub-format substitution. Reads a format string
       rf in the input, then goes on scanning with rf instead of scanning with
       fmt. The format string rf must have the same type as the format string
       specification fmt that it replaces. For instance, "%( %i %)" reads any
       format string that can read a value of type int. The conversion returns
       the format string read rf, and then a value read using rf. Hence, if s
       is the string "\"%4d\"1234.00", then
       Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
       ("%4d", 1234). This behaviour is not mere format substitution, since
       the conversion returns the format string read as additional argument.
       If you need pure format substitution, use special flag _ to discard the
       extraneous argument: conversion %_( fmt %) reads a format string rf and
       then behaves the same as format string rf. Hence, if s is the string
       "\"%4d\"1234.00", then Scanf.sscanf s "%_(%i%)" is simply equivalent to
       Scanf.sscanf "1234.00" "%4d".

       - l : returns the number of lines read so far.

       - n : returns the number of characters read so far.

       - N or L : returns the number of tokens read so far.

       - ! : matches the end of input condition.

       - % : matches one % character in the input.

       - @ : matches one @ character in the input.

       - , : does nothing.
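
       A few of the conversions listed above can be exercised as follows.
       This is a minimal sketch using Scanf.sscanf; the inputs and the names
       ints, radix, quoted, ranges and misc are made up for the example.

```ocaml
(* %d reads decimal; %i also accepts the 0x, 0o and 0b prefixes. *)
let ints = Scanf.sscanf "-12 0x1F 0b101" "%d %i %i" (fun a b c -> (a, b, c))

(* %x and %o read unprefixed hexadecimal and octal digits. *)
let radix = Scanf.sscanf "ff 17" "%x %o" (fun h o -> (h, o))

(* %S and %C read OCaml-style delimited string and character literals. *)
let quoted = Scanf.sscanf "\"hello world\" 'c'" "%S %C" (fun s c -> (s, c))

(* %[range] reads the longest prefix made of characters in the range. *)
let ranges = Scanf.sscanf "2024xyz" "%[0-9]%[a-z]" (fun d l -> (d, l))

(* %B reads a boolean, %f a float in decimal notation. *)
let misc = Scanf.sscanf "true 3.5" "%B %f" (fun b x -> (b, x))

let () =
  let a, b, c = ints in
  Printf.printf "%d %d %d\n" a b c;
  Printf.printf "%d %d\n" (fst radix) (snd radix);
  Printf.printf "%s %c\n" (fst quoted) (snd quoted);
  Printf.printf "%s %s\n" (fst ranges) (snd ranges);
  Printf.printf "%b %g\n" (fst misc) (snd misc)
```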

       Following the % character that introduces a conversion, there may be
       the special flag _ : the conversion that follows occurs as usual, but
       the resulting value is discarded. For instance, if f is the function
       fun i -> i + 1, and s is the string "x = 1", then
       Scanf.sscanf s "%_s = %i" f returns 2.

       The field width is composed of an optional integer literal indicating
       the maximal width of the token to read. For instance, %6d reads an
       integer, having at most 6 decimal digits; %4f reads a float with at
       most 4 characters; and %8[\000-\255] returns the next 8 characters (or
       all the characters still available, if fewer than 8 characters are
       available in the input).
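
       The _ flag and the field width can be sketched as below. The inputs
       and the names skipped, split_int and split_str are assumptions made
       for the example.

```ocaml
(* %_s reads the token "x" and discards it, so f receives only the int. *)
let skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* %6d stops after six digits; the following %d reads what remains. *)
let split_int = Scanf.sscanf "123456789" "%6d%d" (fun a b -> (a, b))

(* A width also bounds string tokens: %3s takes at most three characters. *)
let split_str = Scanf.sscanf "abcdef" "%3s%s" (fun a b -> (a, b))

let () =
  Printf.printf "%d\n" skipped;
  Printf.printf "%d %d\n" (fst split_int) (snd split_int);
  Printf.printf "%s %s\n" (fst split_str) (snd split_str)
```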

       Notes:

       -as mentioned above, a %s conversion always succeeds, even if there is
       nothing to read in the input: in this case, it simply returns "".

       -in addition to the relevant digits, '_' characters may appear inside
       numbers (this is reminiscent of the usual OCaml lexical conventions).
       If stricter scanning is desired, use the range conversion facility
       instead of the number conversions.

       -the scanf facility is not intended for heavy duty lexical analysis and
       parsing. If it appears not expressive enough for your needs, several
       alternatives exist: regular expressions (module Str), stream parsers,
       ocamllex-generated lexers, ocamlyacc-generated parsers.

   Scanning indications in format strings
       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token. A scanning indication is
       introduced by a @ character, followed by some plain character c. It
       means that the string token should end just before the next matching c
       (which is skipped). If no c character is encountered, the string token
       spreads as much as possible. For instance, "%s@\t" reads a string up to
       the next tab character or to the end of input. If a @ character appears
       anywhere else in the format string, it is treated as a plain character.

       Note:

       -As usual in format strings, % and @ characters must be escaped using
       %% and %@; this rule still holds within range specifications and
       scanning indications. For instance, format "%s@%%" reads a string up to
       the next % character, and format "%s@%@" reads a string up to the
       next @ character.

       -The scanning indications introduce slight differences in the syntax of
       Scanf format strings, compared to those used for the Printf module.
       However, the scanning indications are similar to those used in the
       Format module; hence, when producing formatted text to be scanned by
       Scanf.bscanf, it is wise to use printing functions from the Format
       module (or, if you need to use functions from Printf, banish or
       carefully double check the format strings that contain '@' characters).
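
       Scanning indications are convenient for splitting delimited fields.
       The sketch below assumes ':' and ',' as delimiters; the helper
       split_pair and the sample inputs are hypothetical.

```ocaml
(* "%s@:" reads up to the next ':' and skips it; %s then reads the rest. *)
let split_pair (line : string) : string * string =
  Scanf.sscanf line "%s@:%s" (fun k v -> (k, v))

(* Several indications can be chained to read comma-separated fields. *)
let record =
  Scanf.sscanf "ada,lovelace,1815" "%s@,%s@,%d" (fun f l y -> (f, l, y))

(* Without any ':' in the input, the first token spreads to the end. *)
let no_stop = split_pair "plain"

let () =
  let k, v = split_pair "key:value" in
  Printf.printf "%s / %s\n" k v
```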


   Exceptions during scanning
       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:

       -Raise Scanf.Scan_failure if the input does not match the format.

       -Raise Failure if a conversion to a number is not possible.

       -Raise End_of_file if the end of input is encountered while some more
       characters are needed to read the current conversion specification.

       -Raise Invalid_argument if the format string is invalid.

       Note:

       -as a consequence, scanning a %s conversion never raises exception
       End_of_file: if the end of input is reached the conversion succeeds
       and simply returns the characters read so far, or "" if none were ever
       read.
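
       Two of these exceptions are easy to observe with Scanf.sscanf. The
       following is a sketch only: the outcome type and the helper read_int
       are introduced for the illustration, and the Failure and
       Invalid_argument cases are deliberately left out.

```ocaml
(* Hypothetical result type used to classify what happened. *)
type outcome = Value of int | Mismatch of string | Eof

let read_int (input : string) : outcome =
  match Scanf.sscanf input "%d" (fun n -> n) with
  | n -> Value n
  | exception Scanf.Scan_failure msg -> Mismatch msg  (* input is not a number *)
  | exception End_of_file -> Eof                      (* nothing left to read *)

(* %s never raises End_of_file: on empty input it returns "". *)
let empty_token = Scanf.sscanf "" "%s" (fun s -> s)

let () =
  List.iter
    (fun input ->
      match read_int input with
      | Value n -> Printf.printf "%S: value %d\n" input n
      | Mismatch _ -> Printf.printf "%S: Scan_failure\n" input
      | Eof -> Printf.printf "%S: End_of_file\n" input)
    [ "42"; "abc"; "" ]
```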


   Specialised formatted input functions
       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.



       val scanf : ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the predefined formatted input
       channel Scanf.Scanning.stdin that is connected to stdin.



       val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
       -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the formatted input channel and the exception
       that aborted the scanning process as arguments.



       val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
       'c, 'd) scanner

       Same as Scanf.kscanf, but reads from the given string.


       Since 4.02.0
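
       The error-handler variants make it possible to avoid exceptions
       altogether. This sketch assumes that returning None on any scanning
       error is acceptable; the helper name int_of_input is hypothetical.

```ocaml
(* The handler ef ignores the channel and the exception and returns None;
   the receiver wraps a successfully read integer in Some. Both must
   produce the same result type. *)
let int_of_input (input : string) : int option =
  Scanf.ksscanf input (fun _ic _exn -> None) "%d" (fun n -> Some n)

let () =
  List.iter
    (fun input ->
      match int_of_input input with
      | Some n -> Printf.printf "%S: %d\n" input n
      | None -> Printf.printf "%S: no integer\n" input)
    [ "42"; "abc"; "" ]
```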


   Reading format strings from input
       val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
       format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g


       bscanf_format ic fmt f reads a format string token from the formatted
       input channel ic, according to the given format string fmt, and
       applies f to the resulting format string value. Raise
       Scanf.Scan_failure if the format string value read does not have the
       same type as fmt.


       Since 3.09.0



       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
       'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.


       Since 3.09.0



       val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
       ('a, 'b, 'c, 'd, 'e, 'f) format6


       format_from_string s fmt converts a string argument to a format string,
       according to the given format string fmt. Raise Scanf.Scan_failure if
       s, considered as a format string, does not have the same type as fmt.


       Since 3.10.0
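
       One possible use of format_from_string is to turn a template known
       only at run time into a typed format. The sketch below assumes the
       template must read or print exactly one int; the helper name render is
       hypothetical.

```ocaml
(* Convert [template] to a format with the same type as "%i", then use it
   with Printf.sprintf. A template of another type raises Scan_failure. *)
let render (template : string) (n : int) : string =
  let fmt = Scanf.format_from_string template "%i" in
  Printf.sprintf fmt n

let () =
  print_endline (render "number is %u" 42);
  match render "name is %s" 42 with
  | s -> print_endline s
  | exception Scanf.Scan_failure _ -> print_endline "template rejected"
```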



       val unescaped : string -> string


       unescaped s returns a copy of s with escape sequences (according to the
       lexical conventions of OCaml) replaced by their corresponding special
       characters. More precisely, Scanf.unescaped has the following
       property: for all string s, Scanf.unescaped (String.escaped s) = s.

       Always returns a copy of the argument, even if there is no escape
       sequence in the argument. Raise Scanf.Scan_failure if s is not
       properly escaped (i.e. s has invalid escape sequences or special
       characters that are not properly escaped). For instance,
       Scanf.unescaped "\"" will fail.


       Since 4.00.0
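
       The behaviour of unescaped can be checked on small inputs. In this
       sketch the names decoded, round_trip and safe_unescaped are
       assumptions made for the example.

```ocaml
(* The two characters backslash and 't' become a single TAB character. *)
let decoded = Scanf.unescaped "a\\tb"

(* The round-trip property stated in the text. *)
let round_trip (s : string) : bool =
  Scanf.unescaped (String.escaped s) = s

(* Hypothetical wrapper turning the Scan_failure case into None. *)
let safe_unescaped (s : string) : string option =
  match Scanf.unescaped s with
  | u -> Some u
  | exception Scanf.Scan_failure _ -> None

let () =
  Printf.printf "%d\n" (String.length decoded);
  Printf.printf "%b\n" (round_trip "tab\there \"quoted\"\n")
```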



   Deprecated
       val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner

       Deprecated.

       Scanf.fscanf is error prone and deprecated since 4.03.0.

       This function violates the following invariant of the Scanf module: To
       preserve scanning semantics, all scanning functions defined in Scanf
       must read from a user defined Scanf.Scanning.in_channel formatted input
       channel.

       If you need to read from an in_channel input channel ic, simply define
       a Scanf.Scanning.in_channel formatted input channel as in
       let ib = Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
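
       The replacement suggested above can be sketched as follows. The helper
       read_first_int, the " %d" format and the temporary file are
       assumptions made for the example; only Scanning.from_channel and
       Scanf.bscanf come from the text.

```ocaml
(* Wrap an ordinary in_channel in a formatted input channel, scan one
   integer (skipping leading whitespace), and close the file either way. *)
let read_first_int (path : string) : int =
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  match Scanf.bscanf ib " %d" (fun n -> n) with
  | n -> close_in ic; n
  | exception e -> close_in_noerr ic; raise e

let () =
  (* Write a small temporary file so the example is self-contained. *)
  let path = Filename.temp_file "scanf_demo" ".txt" in
  let oc = open_out path in
  output_string oc "  17 trailing text\n";
  close_out oc;
  Printf.printf "%d\n" (read_first_int path);
  Sys.remove path
```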



       val kfscanf : in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a,
       'b, 'c, 'd) scanner

       Deprecated.

       Scanf.kfscanf is error prone and deprecated since 4.03.0.

OCamldoc                          2019-07-30                          Scanf(3)