FASTAVRO(1)                        fastavro                        FASTAVRO(1)


NAME

       fastavro - fastavro Documentation

       The current Python avro package is dog slow.

       On a test case of about 10K records, it takes about 14 seconds to
       iterate over all of them. In comparison, the Java avro SDK does it in
       about 1.9 seconds.

       fastavro is an alternative implementation that is much faster. It
       iterates over the same 10K records in 2.9 seconds, and if you use it
       with PyPy it’ll do it in 1.5 seconds (to be fair, the Java benchmark is
       doing some extra JSON encoding/decoding).

       If the optional C extension (generated by Cython) is available, then
       fastavro will be even faster. For the same 10K records it’ll run in
       about 1.7 seconds.

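       The timings above come from iterating over every record in a file. A
       minimal sketch of how such a measurement could be taken, assuming a
       hypothetical helper named time_iteration (not part of fastavro) that
       works on any iterable:

```python
import time


def time_iteration(records):
    """Iterate over `records` and return (count, elapsed_seconds).

    Hypothetical helper for illustration; it is not part of fastavro.
    """
    start = time.perf_counter()
    count = 0
    for _ in records:
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


# Any iterable works here; a range stands in for an Avro reader.
count, elapsed = time_iteration(range(10000))
print("%d records in %.3f sec" % (count, elapsed))
```

       To time fastavro itself, pass fastavro.reader(fo) for an open file in
       place of the range.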

SUPPORTED FEATURES

       · File Writer

       · File Reader (iterating via records or blocks)

       · Schemaless Writer

       · Schemaless Reader

       · Snappy and Deflate codecs

       · Schema resolution

       · Aliases

       · Logical Types


MISSING FEATURES

       · Anything involving Avro’s RPC features

       · Parsing schemas into the canonical form

       · Schema fingerprinting


EXAMPLE

          from fastavro import writer, reader, parse_schema

          schema = {
              'doc': 'A weather reading.',
              'name': 'Weather',
              'namespace': 'test',
              'type': 'record',
              'fields': [
                  {'name': 'station', 'type': 'string'},
                  {'name': 'time', 'type': 'long'},
                  {'name': 'temp', 'type': 'int'},
              ],
          }
          parsed_schema = parse_schema(schema)

          # 'records' can be an iterable (including generator)
          records = [
              {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
              {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
              {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
              {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
          ]

          # Writing
          with open('weather.avro', 'wb') as out:
              writer(out, parsed_schema, records)

          # Reading
          with open('weather.avro', 'rb') as fo:
              for record in reader(fo):
                  print(record)

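       Because the records can be any iterable, they do not have to be built
       as a list first. A minimal sketch, assuming a hypothetical
       comma-separated input format and a hypothetical helper named
       weather_records:

```python
def weather_records(lines):
    """Lazily yield weather records from 'station,time,temp' text lines.

    The input format and this helper are assumptions made for illustration.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        station, time, temp = line.split(',')
        yield {'station': station, 'time': int(time), 'temp': int(temp)}


lines = [
    '011990-99999,1433269388,0',
    '011990-99999,1433270389,22',
]
records = weather_records(lines)  # nothing is parsed until iteration starts
```

       The generator can then be passed directly as the records argument, for
       example writer(out, parsed_schema, weather_records(lines)).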

DOCUMENTATION

   fastavro.read
       class reader(fo, reader_schema=None)
              Iterator over records in an avro file.

              Parameters

                     · fo (file-like) – Input stream

                     · reader_schema (dict, optional) – Reader schema

              Example:

                 from fastavro import reader
                 with open('some-file.avro', 'rb') as fo:
                     avro_reader = reader(fo)
                     for record in avro_reader:
                         process_record(record)

              metadata
                     Key-value pairs in the header metadata

              codec  The codec used when writing

              writer_schema
                     The schema used when writing

              reader_schema
                     The schema used when reading (if provided)

       class block_reader(fo, reader_schema=None)
              Iterator over the Blocks in an avro file.

              Parameters

                     · fo (file-like) – Input stream

                     · reader_schema (dict, optional) – Reader schema

              Example:

                 from fastavro import block_reader
                 with open('some-file.avro', 'rb') as fo:
                     avro_reader = block_reader(fo)
                     for block in avro_reader:
                         process_block(block)

              metadata
                     Key-value pairs in the header metadata

              codec  The codec used when writing

              writer_schema
                     The schema used when writing

              reader_schema
                     The schema used when reading (if provided)

       class Block(bytes_, num_records, codec, reader_schema, writer_schema,
       offset, size)
              An avro block. Will yield records when iterated over.

              num_records
                     Number of records in the block

              writer_schema
                     The schema used when writing

              reader_schema
                     The schema used when reading (if provided)

              offset Offset of the block from the beginning of the avro file

              size   Size of the block in bytes

       schemaless_reader(fo, writer_schema, reader_schema=None)
              Reads a single record written using the schemaless_writer()

              Parameters

                     · fo (file-like) – Input stream

                     · writer_schema (dict) – Schema used when calling
                       schemaless_writer

                     · reader_schema (dict, optional) – If the schema has
                       changed since being written then the new schema can be
                       given to allow for schema migration

              Example:

                 import fastavro

                 parsed_schema = fastavro.parse_schema(schema)
                 with open('file.avro', 'rb') as fp:
                     record = fastavro.schemaless_reader(fp, parsed_schema)

              Note: The schemaless_reader can only read a single record.

       is_avro(path_or_buffer)
              Return True if path (or buffer) points to an Avro file.

              Parameters
                     path_or_buffer (path to file or file-like object) – Path
                     to file

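              Avro object container files begin with the four magic bytes
              'O', 'b', 'j', 1. A minimal sketch of a check of this kind,
              assuming a hypothetical helper named looks_like_avro; it
              illustrates the idea and is not necessarily how is_avro is
              implemented:

```python
from io import BytesIO

AVRO_MAGIC = b'Obj\x01'  # magic bytes defined by the Avro container format


def looks_like_avro(path_or_buffer):
    """Return True if the input starts with the Avro magic bytes.

    Hypothetical stand-in for illustration; accepts a path or a binary stream.
    """
    if isinstance(path_or_buffer, str):
        with open(path_or_buffer, 'rb') as fo:
            header = fo.read(len(AVRO_MAGIC))
    else:
        header = path_or_buffer.read(len(AVRO_MAGIC))
    return header == AVRO_MAGIC


print(looks_like_avro(BytesIO(b'Obj\x01rest-of-header')))  # True
print(looks_like_avro(BytesIO(b'not an avro file')))       # False
```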
   fastavro.write
       writer(fo, schema, records, codec='null', sync_interval=16000,
       metadata=None, validator=None, sync_marker=None)
              Write records to fo (stream) according to schema

              Parameters

                     · fo (file-like) – Output stream

                     · schema (dict) – Writer schema

                     · records (iterable) – Records to write. This is commonly
                       a list of the dictionary representation of the records,
                       but it can be any iterable

                     · codec (string, optional) – Compression codec, can be
                       ‘null’, ‘deflate’ or ‘snappy’ (if installed)

                     · sync_interval (int, optional) – Size of sync interval

                     · metadata (dict, optional) – Header metadata

                     · validator (None, True or a function) – Validator
                       function. If None (the default), no validation is
                       done. If True, then fastavro.validation.validate will
                       be used. If it’s a function, it should have the same
                       signature as fastavro.writer.validate and raise an
                       exception on error.

                     · sync_marker (bytes, optional) – A byte string used as
                       the avro sync marker. If not provided, a random byte
                       string will be used.

              Example:

                 from fastavro import writer, parse_schema

                 schema = {
                     'doc': 'A weather reading.',
                     'name': 'Weather',
                     'namespace': 'test',
                     'type': 'record',
                     'fields': [
                         {'name': 'station', 'type': 'string'},
                         {'name': 'time', 'type': 'long'},
                         {'name': 'temp', 'type': 'int'},
                     ],
                 }
                 parsed_schema = parse_schema(schema)

                 records = [
                     {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
                     {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
                     {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
                     {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
                 ]

                 with open('weather.avro', 'wb') as out:
                     writer(out, parsed_schema, records)

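              The validator parameter also accepts a user-supplied function.
              A minimal sketch, assuming the function is called with a
              record and the schema and signals failure by raising;
              require_all_fields is a hypothetical name and the check it
              performs is deliberately simplistic:

```python
def require_all_fields(datum, schema, field=None, raise_errors=True):
    """Raise ValueError if `datum` lacks a field named in a record schema.

    Hypothetical validator for illustration; it checks field presence only,
    not field types.
    """
    expected = [f['name'] for f in schema.get('fields', [])]
    missing = [name for name in expected if name not in datum]
    if missing:
        if raise_errors:
            raise ValueError('missing fields: %s' % ', '.join(missing))
        return False
    return True


schema = {
    'name': 'Weather',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'temp', 'type': 'int'},
    ],
}
print(require_all_fields({'station': '011990-99999', 'temp': 0}, schema))
```

              Such a function could then be supplied to the writer as
              validator=require_all_fields.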
              Given an existing avro file, it’s possible to append to it by
              re-opening the file in a+b mode. If the file is only opened in
              ab mode, we aren’t able to read some of the existing header
              information and an error will be raised. For example:

                 # Write initial records
                 with open('weather.avro', 'wb') as out:
                     writer(out, parsed_schema, records)

                 # Write some more records
                 with open('weather.avro', 'a+b') as out:
                     writer(out, parsed_schema, more_records)

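              The difference between the two modes can be seen with plain
              file objects: a file opened in ab mode is write-only, while
              a+b also allows reading what is already there. A minimal
              sketch using a temporary file with placeholder contents:

```python
import io
import os
import tempfile

# Create a scratch file with some placeholder "header" bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'existing-header')

# 'ab' is append-only: the existing bytes cannot be read back.
with open(path, 'ab') as fo:
    ab_readable = fo.readable()
    try:
        fo.read()
    except io.UnsupportedOperation:
        ab_read_failed = True
    else:
        ab_read_failed = False

# 'a+b' appends too, but also permits seeking back and reading.
with open(path, 'a+b') as fo:
    fo.seek(0)
    existing = fo.read()

os.remove(path)
print(ab_readable, ab_read_failed, existing)
```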
       schemaless_writer(fo, schema, record)
              Write a single record without the schema or header information

              Parameters

                     · fo (file-like) – Output file

                     · schema (dict) – Schema

                     · record (dict) – Record to write

              Example:

                 import fastavro

                 parsed_schema = fastavro.parse_schema(schema)
                 # The file must be opened for writing in binary mode
                 with open('file.avro', 'wb') as fp:
                     fastavro.schemaless_writer(fp, parsed_schema, record)

              Note: The schemaless_writer can only write a single record.

   fastavro.schema
       parse_schema(schema, _write_hint=True, _force=False)
              Returns a parsed avro schema

              It is not necessary to call parse_schema but doing so and saving
              the parsed schema for use later will make future operations
              faster as the schema will not need to be reparsed.

              Parameters

                     · schema (dict) – Input schema

                     · _write_hint (bool) – Internal API argument specifying
                       whether or not the __fastavro_parsed marker should be
                       added to the schema

                     · _force (bool) – Internal API argument. If True, the
                       schema will always be parsed even if it has been parsed
                       and has the __fastavro_parsed marker

              Example:

                 from fastavro import parse_schema
                 from fastavro import writer

                 parsed_schema = parse_schema(original_schema)
                 with open('weather.avro', 'wb') as out:
                     writer(out, parsed_schema, records)

   fastavro.validation
       validate(datum, schema, field=None, raise_errors=True)
              Determine if a python datum is an instance of a schema.

              Parameters

                     · datum (Any) – Data being validated

                     · schema (dict) – Schema

                     · field (str, optional) – Record field being validated

                     · raise_errors (bool, optional) – If true, errors are
                       raised for invalid data. If false, a simple True
                       (valid) or False (invalid) result is returned

              Example:

                 from fastavro.validation import validate
                 schema = {...}
                 record = {...}
                 validate(record, schema)

       validate_many(records, schema, raise_errors=True)
              Validate a list of data.

              Parameters

                     · records (iterable) – List of records to validate

                     · schema (dict) – Schema

                     · raise_errors (bool, optional) – If true, errors are
                       raised for invalid data. If false, a simple True
                       (valid) or False (invalid) result is returned

              Example:

                 from fastavro.validation import validate_many
                 schema = {...}
                 records = [{...}, {...}, ...]
                 validate_many(records, schema)

   fastavro command line script
       A command line script is installed with the library that can be used to
       dump the contents of avro file(s) to the standard output.

       Usage:

          usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

          iter over avro file, emit records as JSON

          positional arguments:
            file          file(s) to parse

          optional arguments:
            -h, --help    show this help message and exit
            --schema      dump schema instead of records
            --codecs      print supported codecs
            --version     show program's version number and exit
            -p, --pretty  pretty print json

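       The script's behaviour (one JSON document per record, optionally
       pretty-printed) can be approximated from Python. A minimal sketch,
       assuming a hypothetical helper named dump_json that takes any iterable
       of records; it is not the script's actual implementation:

```python
import io
import json
import sys


def dump_json(records, out=sys.stdout, pretty=False):
    """Write each record to `out` as one JSON document.

    Hypothetical helper for illustration; `pretty` mimics the idea of -p.
    """
    indent = 4 if pretty else None
    for record in records:
        out.write(json.dumps(record, indent=indent))
        out.write('\n')


# A StringIO buffer stands in for standard output here.
buf = io.StringIO()
dump_json([{'station': '011990-99999', 'temp': 0}], out=buf)
print(buf.getvalue(), end='')
```

       Passing fastavro.reader(fo) as the records argument would emit the
       contents of an open avro file in the same way.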
   Examples
       Read an avro file:

          $ fastavro weather.avro

          {"temp": 0, "station": "011990-99999", "time": -619524000000}
          {"temp": 22, "station": "011990-99999", "time": -619506000000}
          {"temp": -11, "station": "011990-99999", "time": -619484400000}
          {"temp": 111, "station": "012650-99999", "time": -655531200000}
          {"temp": 78, "station": "012650-99999", "time": -655509600000}

       Show the schema:

          $ fastavro --schema weather.avro

          {
           "type": "record",
           "namespace": "test",
           "doc": "A weather reading.",
           "fields": [
            {
             "type": "string",
             "name": "station"
            },
            {
             "type": "long",
             "name": "time"
            },
            {
             "type": "int",
             "name": "temp"
            }
           ],
           "name": "Weather"
          }


AUTHOR

       Miki Tebeka

COPYRIGHT

       2012, Miki Tebeka

0.21.24                          Jun 01, 2019                      FASTAVRO(1)