FASTAVRO(1)                    fastavro                    FASTAVRO(1)

NAME
fastavro - fastavro Documentation

The current Python avro package is packed with features but dog slow.

On a test case of about 10K records, it takes about 14 seconds to
iterate over all of them. In comparison, the Java Avro SDK does it in
about 1.9 seconds.

fastavro is less feature-complete than avro, but it is much faster. It
iterates over the same 10K records in 2.9 seconds, and with PyPy it
does so in 1.5 seconds (to be fair, the Java benchmark is doing some
extra JSON encoding/decoding).

If the optional C extension (generated by Cython) is available,
fastavro is faster still: it processes the same 10K records in about
1.7 seconds.

# Writing
from fastavro import writer

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

# 'records' can be an iterable (including generator)
records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

with open('weather.avro', 'wb') as out:
    writer(out, schema, records)

# Reading
import fastavro

with open('weather.avro', 'rb') as fo:
    reader = fastavro.reader(fo)
    schema = reader.schema

    for record in reader:
        print(record)
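The comment in the writing example notes that records can be any iterable, including a generator, which avoids building the full list in memory before writing. A minimal sketch, assuming the raw readings arrive as 'station,time,temp' text lines; that input format and the weather_records helper are hypothetical and only illustrate the generator pattern:

```python
from typing import Dict, Iterable, Iterator, Union


def weather_records(lines: Iterable[str]) -> Iterator[Dict[str, Union[str, int]]]:
    """Lazily turn 'station,time,temp' lines into dicts matching the Weather schema.

    Hypothetical helper: the comma-separated input format is an assumption.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        station, time, temp = line.split(',')
        # Field names and Python types follow the Weather schema (string, long, int)
        yield {'station': station, 'time': int(time), 'temp': int(temp)}


raw = [
    '011990-99999,1433269388,0',
    '011990-99999,1433270389,22',
    '',
    '012650-99999,1433275478,111',
]

records = weather_records(raw)  # nothing is parsed until the generator is consumed
# With fastavro installed, the generator can be passed straight to writer:
#     with open('weather.avro', 'wb') as out:
#         writer(out, schema, records)
```

Because the generator yields one dict at a time, the input can be arbitrarily large without holding every record in memory at once.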

fastavro.read
class reader(fo, reader_schema=None)
Iterator over records in an avro file.

Parameters

· fo (file-like) – Input stream

· reader_schema (dict, optional) – Reader schema

Example:

from fastavro import reader

with open('some-file.avro', 'rb') as fo:
    avro_reader = reader(fo)
    schema = avro_reader.schema
    for record in avro_reader:
        process_record(record)  # process_record: placeholder for your own handler

class block_reader(fo, reader_schema=None)
Iterator over blocks in an avro file.

Parameters

· fo (file-like) – Input stream

· reader_schema (dict, optional) – Reader schema

Example:

from fastavro import block_reader

with open('some-file.avro', 'rb') as fo:
    avro_reader = block_reader(fo)
    schema = avro_reader.schema
    for block in avro_reader:
        process_block(block)  # process_block: placeholder for your own handler

schemaless_reader(fo, writer_schema, reader_schema=None)
Reads a single record written using the schemaless_writer.

Parameters

· fo (file-like) – Input stream

· writer_schema (dict) – Schema used when calling schemaless_writer

· reader_schema (dict, optional) – If the schema has changed since
  being written, the new schema can be given to allow for schema
  migration

Example:

import fastavro

# 'schema' must be the schema that was passed to schemaless_writer
with open('file.avro', 'rb') as fp:
    record = fastavro.schemaless_reader(fp, schema)

Note: The schemaless_reader can only read a single record.
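A schemaless record carries no header, so the reader has to know the writer schema to walk the bytes field by field. Per the Avro specification, int and long values are zigzag-encoded variable-length integers, and a string is a long byte count followed by UTF-8 bytes; record fields appear in schema order. The sketch below decodes such a record by hand using only the standard library. It handles only int, long and string fields, and its function names are illustrative, not fastavro internals:

```python
import io
from typing import BinaryIO, Dict, Union


def read_long(fo: BinaryIO) -> int:
    """Decode one zigzag, variable-length Avro int/long."""
    shift = 0
    acc = 0
    while True:
        byte = fo.read(1)
        if not byte:
            raise EOFError('truncated varint')
        b = byte[0]
        acc |= (b & 0x7F) << shift  # low 7 bits carry the payload
        if not b & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)  # undo the zigzag mapping


def read_string(fo: BinaryIO) -> str:
    """Decode an Avro string: a long length followed by UTF-8 bytes."""
    size = read_long(fo)
    return fo.read(size).decode('utf-8')


# Only the primitive types this sketch understands
READERS = {'int': read_long, 'long': read_long, 'string': read_string}


def decode_record(fo: BinaryIO, schema: dict) -> Dict[str, Union[int, str]]:
    """Read the fields of a flat record in schema order."""
    return {f['name']: READERS[f['type']](fo) for f in schema['fields']}


WEATHER = {
    'type': 'record',
    'name': 'Weather',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

# 'ab' -> b'\x04ab', 1 -> b'\x02', -1 -> b'\x01'
print(decode_record(io.BytesIO(b'\x04ab\x02\x01'), WEATHER))
# {'station': 'ab', 'time': 1, 'temp': -1}
```

This also shows why the writer schema is a required argument: without it, nothing in the byte stream says where one field ends and the next begins.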

is_avro(path_or_buffer)
Return True if path (or buffer) points to an Avro file.

Parameters
path_or_buffer (path to file or file-like object) – Path to a
  file, or a file-like object, to check
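One way such a check can work: the Avro specification defines that every object container file begins with the four magic bytes b'Obj\x01'. The sketch below, under the hypothetical name looks_like_avro, mirrors the path-or-buffer interface described above using only the standard library; it is an illustration, not fastavro's code. Output of schemaless_writer has no header, so this sketch would not recognize it:

```python
import io
from typing import BinaryIO, Union

AVRO_MAGIC = b'Obj\x01'  # 'O', 'b', 'j' followed by the format version byte 1


def looks_like_avro(path_or_buffer: Union[str, BinaryIO]) -> bool:
    """Hypothetical helper: True if the file or binary stream starts with the Avro magic."""
    if isinstance(path_or_buffer, str):
        with open(path_or_buffer, 'rb') as fo:
            return fo.read(len(AVRO_MAGIC)) == AVRO_MAGIC
    # File-like object: peek at the header, then restore the caller's position
    start = path_or_buffer.tell()
    try:
        return path_or_buffer.read(len(AVRO_MAGIC)) == AVRO_MAGIC
    finally:
        path_or_buffer.seek(start)


print(looks_like_avro(io.BytesIO(b'Obj\x01rest-of-header')))  # True
print(looks_like_avro(io.BytesIO(b'\x04ab\x02\x01')))  # False
```

Restoring the stream position means a caller can run the check and then hand the same buffer to a reader without rewinding it first.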

fastavro.write
writer(fo, schema, records, codec='null', sync_interval=16000, metadata=None, validator=None)
Write records to fo (stream) according to schema.

Parameters

· fo (file-like) – Output stream

· schema (dict) – Writer schema

· records (iterable) – Records to write

· codec (string, optional) – Compression codec, can be ‘null’,
  ‘deflate’ or ‘snappy’ (if installed)

· sync_interval (int, optional) – Size of sync interval

· metadata (dict, optional) – Header metadata

· validator (None, True or a function) – Validator function. If None
  (the default), no validation is done. If True, then
  fastavro.validation.validate will be used. If it is a function, it
  should have the same signature as fastavro.validation.validate and
  raise an exception on error.

Example:

from fastavro import writer

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

with open('weather.avro', 'wb') as out:
    writer(out, schema, records)

schemaless_writer(fo, schema, record)
Write a single record without the schema or header information.

Parameters

· fo (file-like) – Output file

· schema (dict) – Schema

· record (dict) – Record to write

Example:

import fastavro

# Open the output file in binary write mode
with open('file.avro', 'wb') as fp:
    fastavro.schemaless_writer(fp, schema, record)

Note: The schemaless_writer can only write a single record.
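Because schemaless_writer emits only the encoded datum, its output for a flat record should be just the Avro binary encoding of each field in schema order. The standard-library sketch below reproduces that encoding for records whose fields are int, long or string, following the Avro specification (zigzag plus variable-length integers, length-prefixed UTF-8 strings). The function names are illustrative; this is not fastavro's implementation:

```python
import io
from typing import BinaryIO


def write_long(fo: BinaryIO, n: int) -> None:
    """Encode an Avro int/long: zigzag, then 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:  # more than 7 significant bits remain
        out.append((z & 0x7F) | 0x80)  # set the continuation bit
        z >>= 7
    out.append(z)
    fo.write(bytes(out))


def write_string(fo: BinaryIO, s: str) -> None:
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode('utf-8')
    write_long(fo, len(data))
    fo.write(data)


# Only the primitive types this sketch understands
WRITERS = {'int': write_long, 'long': write_long, 'string': write_string}


def encode_record(fo: BinaryIO, schema: dict, record: dict) -> None:
    """Write each field of a flat record in schema order, with no header or schema."""
    for field in schema['fields']:
        WRITERS[field['type']](fo, record[field['name']])


WEATHER = {
    'type': 'record',
    'name': 'Weather',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

buf = io.BytesIO()
encode_record(buf, WEATHER, {'station': 'ab', 'time': 1, 'temp': -1})
print(buf.getvalue())  # b'\x04ab\x02\x01'
```

The five output bytes contain no field names, types or sync markers, which is why the same schema must be supplied again when reading the record back.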

fastavro.validation
validate(datum, schema, field=None, raise_errors=True)
Determine if a Python datum is an instance of a schema.

Parameters

· datum (Any) – Data being validated

· schema (dict) – Schema

· field (str, optional) – Record field being validated

· raise_errors (bool, optional) – If True, errors are raised for
  invalid data. If False, a simple True (valid) or False (invalid)
  result is returned

Example:

from fastavro.validation import validate

schema = {...}  # placeholder: your Avro schema dict
record = {...}  # placeholder: the datum to check
validate(record, schema)
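To make the raise_errors contract concrete, here is a deliberately simplified, standard-library stand-in for validate. It is not fastavro's implementation: it covers only primitive types, flat or nested records and unions, ignores field defaults, and raises a hypothetical SketchValidationError. The name validate_sketch is illustrative; it only keeps the documented parameter list:

```python
from typing import Any, Optional

INT_RANGE = (-(2 ** 31), 2 ** 31 - 1)   # Avro int: 32-bit signed
LONG_RANGE = (-(2 ** 63), 2 ** 63 - 1)  # Avro long: 64-bit signed


class SketchValidationError(ValueError):
    """Hypothetical error type for this sketch; not a fastavro class."""


def _is_integer(datum: Any, lo: int, hi: int) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly
    return isinstance(datum, int) and not isinstance(datum, bool) and lo <= datum <= hi


def _check(datum: Any, schema: Any) -> bool:
    """Return True if datum matches schema (primitives, records, unions only)."""
    if isinstance(schema, list):  # a union is valid if any branch matches
        return any(_check(datum, branch) for branch in schema)
    kind = schema['type'] if isinstance(schema, dict) else schema
    if kind == 'null':
        return datum is None
    if kind == 'boolean':
        return isinstance(datum, bool)
    if kind == 'int':
        return _is_integer(datum, *INT_RANGE)
    if kind == 'long':
        return _is_integer(datum, *LONG_RANGE)
    if kind in ('float', 'double'):
        return isinstance(datum, (int, float)) and not isinstance(datum, bool)
    if kind == 'string':
        return isinstance(datum, str)
    if kind == 'bytes':
        return isinstance(datum, bytes)
    if kind == 'record':
        # A missing field is looked up as None, so it passes only if its type allows null
        return isinstance(datum, dict) and all(
            _check(datum.get(f['name']), f['type']) for f in schema['fields']
        )
    raise NotImplementedError('type %r is not covered by this sketch' % (kind,))


def validate_sketch(datum: Any, schema: Any, field: Optional[str] = None,
                    raise_errors: bool = True) -> bool:
    """Simplified stand-in that mirrors the documented validate() parameters."""
    ok = _check(datum, schema)
    if not ok and raise_errors:
        raise SketchValidationError('%s does not match schema %r' % (field or 'datum', schema))
    return ok


WEATHER = {
    'type': 'record',
    'name': 'Weather',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}
good = {'station': '011990-99999', 'time': 1433269388, 'temp': 0}
print(validate_sketch(good, WEATHER))  # True
print(validate_sketch(dict(good, temp=2 ** 31), WEATHER, raise_errors=False))  # False: int overflow
```

The range checks matter because Python integers are unbounded while Avro int and long are not; a value that passes isinstance can still be unencodable.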

validate_many(records, schema, raise_errors=True)
Validate a list of records.

Parameters

· records (iterable) – Records to validate

· schema (dict) – Schema

· raise_errors (bool, optional) – If True, errors are raised for
  invalid data. If False, a simple True (valid) or False (invalid)
  result is returned

Example:

from fastavro.validation import validate_many

schema = {...}  # placeholder: your Avro schema dict
records = [{...}, {...}, ...]  # placeholder: the records to check
validate_many(records, schema)
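validate_many checks a whole batch against one schema. The standard-library sketch below shows one reasonable way to structure such a batch check: it evaluates every record and, when raise_errors is true, reports all failing positions in a single exception rather than stopping at the first bad record. Whether fastavro's own validate_many aggregates errors this way is not stated here; validate_many_sketch, BatchValidationError and the per-record check weather_ok are hypothetical, and weather_ok only compares Python types for the Weather fields (no int/long range checks):

```python
from typing import Any, Callable, Iterable, List


class BatchValidationError(ValueError):
    """Hypothetical error carrying the position of every invalid record."""

    def __init__(self, bad_indexes: List[int]) -> None:
        super().__init__('invalid records at positions %s' % bad_indexes)
        self.bad_indexes = bad_indexes


def validate_many_sketch(records: Iterable[Any], is_valid: Callable[[Any], bool],
                         raise_errors: bool = True) -> bool:
    """Check every record; report all failures at once instead of stopping at the first."""
    bad = [i for i, record in enumerate(records) if not is_valid(record)]
    if bad and raise_errors:
        raise BatchValidationError(bad)
    return not bad


# Hypothetical per-record check for the Weather schema: required fields and Python types
FIELD_TYPES = {'station': str, 'time': int, 'temp': int}


def weather_ok(record: Any) -> bool:
    if not isinstance(record, dict):
        return False
    return all(
        isinstance(record.get(name), kind) and not isinstance(record.get(name), bool)
        for name, kind in FIELD_TYPES.items()
    )


records = [
    {'station': '011990-99999', 'time': 1433269388, 'temp': 0},
    {'station': '011990-99999', 'time': '1433270389', 'temp': 22},  # time is a str
    {'station': '012650-99999', 'time': 1433275478},                # temp is missing
]

print(validate_many_sketch(records, weather_ok, raise_errors=False))  # False
try:
    validate_many_sketch(records, weather_ok)
except BatchValidationError as err:
    print(err.bad_indexes)  # [1, 2]
```

Collecting every failing index costs a full pass over the data, but it lets a caller fix all bad records in one round instead of rerunning the check after each fix.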

fastavro command line script
A command line script is installed with the library that can be used to
dump the contents of avro file(s) to the standard output.

Usage:

usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

iter over avro file, emit records as JSON

positional arguments:
  file          file(s) to parse

optional arguments:
  -h, --help    show this help message and exit
  --schema      dump schema instead of records
  --codecs      print supported codecs
  --version     show program's version number and exit
  -p, --pretty  pretty print json

Examples
Read an avro file:

$ fastavro weather.avro

{"temp": 0, "station": "011990-99999", "time": -619524000000}
{"temp": 22, "station": "011990-99999", "time": -619506000000}
{"temp": -11, "station": "011990-99999", "time": -619484400000}
{"temp": 111, "station": "012650-99999", "time": -655531200000}
{"temp": 78, "station": "012650-99999", "time": -655509600000}
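Because the script writes one JSON object per line, its output is easy to post-process. A minimal sketch, assuming the default (non --pretty) output has been captured as text, for example by redirecting `fastavro weather.avro` to a file; the sample input is the output shown above, and load_json_lines is an illustrative name:

```python
import json
from typing import Dict, Iterable, List


def load_json_lines(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON object per non-empty line, as emitted by the CLI."""
    return [json.loads(line) for line in lines if line.strip()]


output = '''\
{"temp": 0, "station": "011990-99999", "time": -619524000000}
{"temp": 22, "station": "011990-99999", "time": -619506000000}
{"temp": -11, "station": "011990-99999", "time": -619484400000}
{"temp": 111, "station": "012650-99999", "time": -655531200000}
{"temp": 78, "station": "012650-99999", "time": -655509600000}
'''

records = load_json_lines(output.splitlines())

# Example post-processing: highest temperature seen per station
hottest = {}
for rec in records:
    hottest[rec['station']] = max(rec['temp'], hottest.get(rec['station'], rec['temp']))
print(hottest)  # {'011990-99999': 22, '012650-99999': 111}
```

With -p/--pretty the records span several lines each, so this line-by-line parsing only applies to the default output.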

Show the schema:

$ fastavro --schema weather.avro

{
    "type": "record",
    "namespace": "test",
    "doc": "A weather reading.",
    "fields": [
        {
            "type": "string",
            "name": "station"
        },
        {
            "type": "long",
            "name": "time"
        },
        {
            "type": "int",
            "name": "temp"
        }
    ],
    "name": "Weather"
}

AUTHOR
Miki Tebeka

COPYRIGHT
2012, Miki Tebeka

0.19.8                        Jul 15, 2018                 FASTAVRO(1)