FASTAVRO(1)                      fastavro                      FASTAVRO(1)



NAME
fastavro - fastavro Documentation

The current Python avro package is dog slow.

On a test case of about 10K records, it takes about 14sec to iterate
over all of them. In comparison, the Java avro SDK does it in about
1.9sec.

fastavro is an alternative implementation that is much faster. It iterates
over the same 10K records in 2.9sec, and if you use it with PyPy it'll do
it in 1.5sec (to be fair, the Java benchmark is doing some extra JSON
encoding/decoding).

If the optional C extension (generated by Cython) is available, then
fastavro will be even faster. For the same 10K records it'll run in
about 1.7sec.

Supported Features

· File Writer

· File Reader (iterating via records or blocks)

· Schemaless Writer

· Schemaless Reader

· Snappy and Deflate codecs

· Schema resolution

· Aliases

· Logical Types

Missing Features

· Anything involving Avro's RPC features

· Parsing schemas into the canonical form

· Schema fingerprinting

Example

    from fastavro import writer, reader, parse_schema

    schema = {
        'doc': 'A weather reading.',
        'name': 'Weather',
        'namespace': 'test',
        'type': 'record',
        'fields': [
            {'name': 'station', 'type': 'string'},
            {'name': 'time', 'type': 'long'},
            {'name': 'temp', 'type': 'int'},
        ],
    }
    parsed_schema = parse_schema(schema)

    # 'records' can be an iterable (including generator)
    records = [
        {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
        {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
        {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
        {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
    ]

    # Writing
    with open('weather.avro', 'wb') as out:
        writer(out, parsed_schema, records)

    # Reading
    with open('weather.avro', 'rb') as fo:
        for record in reader(fo):
            print(record)

fastavro.read
class reader(fo, reader_schema=None)
Iterator over records in an avro file.

Parameters

· fo (file-like) – Input stream

· reader_schema (dict, optional) – Reader schema

Example:

    from fastavro import reader
    with open('some-file.avro', 'rb') as fo:
        avro_reader = reader(fo)
        for record in avro_reader:
            process_record(record)  # your own per-record handler

metadata
Key-value pairs in the header metadata

codec The codec used when writing

writer_schema
The schema used when writing

reader_schema
The schema used when reading (if provided)

class block_reader(fo, reader_schema=None)
Iterator over Block objects in an avro file.

Parameters

· fo (file-like) – Input stream

· reader_schema (dict, optional) – Reader schema

Example:

    from fastavro import block_reader
    with open('some-file.avro', 'rb') as fo:
        avro_reader = block_reader(fo)
        for block in avro_reader:
            process_block(block)  # your own per-block handler

metadata
Key-value pairs in the header metadata

codec The codec used when writing

writer_schema
The schema used when writing

reader_schema
The schema used when reading (if provided)

class Block(bytes_, num_records, codec, reader_schema, writer_schema,
offset, size)
An avro block. Will yield records when iterated over.

num_records
Number of records in the block

writer_schema
The schema used when writing

reader_schema
The schema used when reading (if provided)

offset Offset of the block from the beginning of the avro file

size Size of the block in bytes

schemaless_reader(fo, writer_schema, reader_schema=None)
Reads a single record written using schemaless_writer().

Parameters

· fo (file-like) – Input stream

· writer_schema (dict) – Schema used when calling schemaless_writer

· reader_schema (dict, optional) – If the schema has changed since being
  written, then the new schema can be given to allow for schema migration

Example:

    import fastavro

    parsed_schema = fastavro.parse_schema(schema)
    with open('file.avro', 'rb') as fp:
        record = fastavro.schemaless_reader(fp, parsed_schema)

Note: The schemaless_reader can only read a single record.

is_avro(path_or_buffer)
Return True if path (or buffer) points to an Avro file.

Parameters
path_or_buffer (path to file or file-like object) – Path to a file, or a
file-like object, to check

fastavro.write
writer(fo, schema, records, codec='null', sync_interval=16000, metadata=None,
validator=None, sync_marker=None)
Write records to fo (stream) according to schema.

Parameters

· fo (file-like) – Output stream

· schema (dict) – Writer schema

· records (iterable) – Records to write. This is commonly a list of the
  dictionary representation of the records, but it can be any iterable

· codec (string, optional) – Compression codec, can be ‘null’, ‘deflate’
  or ‘snappy’ (if installed)

· sync_interval (int, optional) – Approximate size, in bytes, of each data
  block; a sync marker is written after every block

· metadata (dict, optional) – Header metadata

· validator (None, True or a function) – Validator function. If None (the
  default), no validation is done. If True, then
  fastavro.validation.validate will be used. If it's a function, it should
  have the same signature as fastavro.validation.validate and raise an
  exception on error.

· sync_marker (bytes, optional) – A byte string used as the avro sync
  marker. If not provided, a random byte string will be used.

Example:

    from fastavro import writer, parse_schema

    schema = {
        'doc': 'A weather reading.',
        'name': 'Weather',
        'namespace': 'test',
        'type': 'record',
        'fields': [
            {'name': 'station', 'type': 'string'},
            {'name': 'time', 'type': 'long'},
            {'name': 'temp', 'type': 'int'},
        ],
    }
    parsed_schema = parse_schema(schema)

    records = [
        {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
        {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
        {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
        {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
    ]

    with open('weather.avro', 'wb') as out:
        writer(out, parsed_schema, records)

Given an existing avro file, it's possible to append to it by re-opening
the file in a+b mode. If the file is only opened in ab mode, fastavro
cannot read the existing header information and an error will be raised.
For example:

    # Write initial records
    with open('weather.avro', 'wb') as out:
        writer(out, parsed_schema, records)

    # Write some more records (more_records is any further iterable of
    # records matching parsed_schema)
    with open('weather.avro', 'a+b') as out:
        writer(out, parsed_schema, more_records)

schemaless_writer(fo, schema, record)
Write a single record without the schema or header information.

Parameters

· fo (file-like) – Output stream

· schema (dict) – Schema

· record (dict) – Record to write

Example:

    import fastavro

    parsed_schema = fastavro.parse_schema(schema)
    with open('file.avro', 'wb') as fp:
        fastavro.schemaless_writer(fp, parsed_schema, record)

Note: The schemaless_writer can only write a single record.

fastavro.schema
parse_schema(schema, _write_hint=True, _force=False)
Returns a parsed avro schema.

It is not necessary to call parse_schema, but doing so and saving the
parsed schema for later use will make future operations faster, as the
schema will not need to be reparsed.

Parameters

· schema (dict) – Input schema

· _write_hint (bool) – Internal API argument specifying whether or not
  the __fastavro_parsed marker should be added to the schema

· _force (bool) – Internal API argument. If True, the schema will always
  be parsed even if it has been parsed and has the __fastavro_parsed
  marker

Example:

    from fastavro import parse_schema
    from fastavro import writer

    # original_schema and records are defined elsewhere
    parsed_schema = parse_schema(original_schema)
    with open('weather.avro', 'wb') as out:
        writer(out, parsed_schema, records)

fastavro.validation
validate(datum, schema, field=None, raise_errors=True)
Determine if a python datum is an instance of a schema.

Parameters

· datum (Any) – Data being validated

· schema (dict) – Schema

· field (str, optional) – Record field being validated

· raise_errors (bool, optional) – If True, errors are raised for invalid
  data. If False, a simple True (valid) or False (invalid) result is
  returned

Example:

    from fastavro.validation import validate
    schema = {...}
    record = {...}
    validate(record, schema)

validate_many(records, schema, raise_errors=True)
Validate a list (or other iterable) of records.

Parameters

· records (iterable) – List of records to validate

· schema (dict) – Schema

· raise_errors (bool, optional) – If True, errors are raised for invalid
  data. If False, a simple True (valid) or False (invalid) result is
  returned

Example:

    from fastavro.validation import validate_many
    schema = {...}
    records = [{...}, {...}, ...]
    validate_many(records, schema)

fastavro command line script
A command line script is installed with the library that can be used to
dump the contents of avro file(s) to the standard output.

Usage:

    usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

    iter over avro file, emit records as JSON

    positional arguments:
      file          file(s) to parse

    optional arguments:
      -h, --help    show this help message and exit
      --schema      dump schema instead of records
      --codecs      print supported codecs
      --version     show program's version number and exit
      -p, --pretty  pretty print json

Examples
Read an avro file:

    $ fastavro weather.avro

    {"temp": 0, "station": "011990-99999", "time": -619524000000}
    {"temp": 22, "station": "011990-99999", "time": -619506000000}
    {"temp": -11, "station": "011990-99999", "time": -619484400000}
    {"temp": 111, "station": "012650-99999", "time": -655531200000}
    {"temp": 78, "station": "012650-99999", "time": -655509600000}

Show the schema:

    $ fastavro --schema weather.avro

    {
        "type": "record",
        "namespace": "test",
        "doc": "A weather reading.",
        "fields": [
            {
                "type": "string",
                "name": "station"
            },
            {
                "type": "long",
                "name": "time"
            },
            {
                "type": "int",
                "name": "temp"
            }
        ],
        "name": "Weather"
    }

AUTHOR
Miki Tebeka

COPYRIGHT
2012, Miki Tebeka



0.21.24                       Jun 01, 2019                       FASTAVRO(1)