CREATE_AGGREGATE(7)

1CREATE AGGREGATE(7)      PostgreSQL 11.6 Documentation     CREATE AGGREGATE(7)
2
3
4

NAME

6       CREATE_AGGREGATE - define a new aggregate function
7

SYNOPSIS

9       CREATE AGGREGATE name ( [ argmode ] [ argname ] arg_data_type [ , ... ] ) (
10           SFUNC = sfunc,
11           STYPE = state_data_type
12           [ , SSPACE = state_data_size ]
13           [ , FINALFUNC = ffunc ]
14           [ , FINALFUNC_EXTRA ]
15           [ , FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]
16           [ , COMBINEFUNC = combinefunc ]
17           [ , SERIALFUNC = serialfunc ]
18           [ , DESERIALFUNC = deserialfunc ]
19           [ , INITCOND = initial_condition ]
20           [ , MSFUNC = msfunc ]
21           [ , MINVFUNC = minvfunc ]
22           [ , MSTYPE = mstate_data_type ]
23           [ , MSSPACE = mstate_data_size ]
24           [ , MFINALFUNC = mffunc ]
25           [ , MFINALFUNC_EXTRA ]
26           [ , MFINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]
27           [ , MINITCOND = minitial_condition ]
28           [ , SORTOP = sort_operator ]
29           [ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ]
30       )
31
32       CREATE AGGREGATE name ( [ [ argmode ] [ argname ] arg_data_type [ , ... ] ]
33                               ORDER BY [ argmode ] [ argname ] arg_data_type [ , ... ] ) (
34           SFUNC = sfunc,
35           STYPE = state_data_type
36           [ , SSPACE = state_data_size ]
37           [ , FINALFUNC = ffunc ]
38           [ , FINALFUNC_EXTRA ]
39           [ , FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]
40           [ , INITCOND = initial_condition ]
41           [ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ]
42           [ , HYPOTHETICAL ]
43       )
44
45       or the old syntax
46
47       CREATE AGGREGATE name (
48           BASETYPE = base_type,
49           SFUNC = sfunc,
50           STYPE = state_data_type
51           [ , SSPACE = state_data_size ]
52           [ , FINALFUNC = ffunc ]
53           [ , FINALFUNC_EXTRA ]
54           [ , FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]
55           [ , COMBINEFUNC = combinefunc ]
56           [ , SERIALFUNC = serialfunc ]
57           [ , DESERIALFUNC = deserialfunc ]
58           [ , INITCOND = initial_condition ]
59           [ , MSFUNC = msfunc ]
60           [ , MINVFUNC = minvfunc ]
61           [ , MSTYPE = mstate_data_type ]
62           [ , MSSPACE = mstate_data_size ]
63           [ , MFINALFUNC = mffunc ]
64           [ , MFINALFUNC_EXTRA ]
65           [ , MFINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]
66           [ , MINITCOND = minitial_condition ]
67           [ , SORTOP = sort_operator ]
68       )
69

DESCRIPTION

71       CREATE AGGREGATE defines a new aggregate function. Some basic and
72       commonly-used aggregate functions are included with the distribution;
73       they are documented in Section 9.20. If one defines new types or needs
74       an aggregate function not already provided, then CREATE AGGREGATE can
75       be used to provide the desired features.
76
77       If a schema name is given (for example, CREATE AGGREGATE myschema.myagg
78       ...) then the aggregate function is created in the specified schema.
79       Otherwise it is created in the current schema.
80
81       An aggregate function is identified by its name and input data type(s).
82       Two aggregates in the same schema can have the same name if they
83       operate on different input types. The name and input data type(s) of an
84       aggregate must also be distinct from the name and input data type(s) of
85       every ordinary function in the same schema. This behavior is identical
86       to overloading of ordinary function names (see CREATE FUNCTION
87       (CREATE_FUNCTION(7))).
88
89       A simple aggregate function is made from one or two ordinary functions:
90       a state transition function sfunc, and an optional final calculation
91       function ffunc. These are used as follows:
92
93           sfunc( internal-state, next-data-values ) ---> next-internal-state
94           ffunc( internal-state ) ---> aggregate-value
95
96       PostgreSQL creates a temporary variable of data type stype to hold the
97       current internal state of the aggregate. At each input row, the
98       aggregate argument value(s) are calculated and the state transition
99       function is invoked with the current state value and the new argument
100       value(s) to calculate a new internal state value. After all the rows
101       have been processed, the final function is invoked once to calculate
102       the aggregate's return value. If there is no final function then the
103       ending state value is returned as-is.
104
105       An aggregate function can provide an initial condition, that is, an
106       initial value for the internal state value. This is specified and
107       stored in the database as a value of type text, but it must be a valid
108       external representation of a constant of the state value data type. If
109       it is not supplied then the state value starts out null.
110
111       If the state transition function is declared “strict”, then it cannot
112       be called with null inputs. With such a transition function, aggregate
113       execution behaves as follows. Rows with any null input values are
114       ignored (the function is not called and the previous state value is
115       retained). If the initial state value is null, then at the first row
116       with all-nonnull input values, the first argument value replaces the
117       state value, and the transition function is invoked at each subsequent
118       row with all-nonnull input values. This is handy for implementing
119       aggregates like max. Note that this behavior is only available when
120       state_data_type is the same as the first arg_data_type. When these
121       types are different, you must supply a nonnull initial condition or use
122       a nonstrict transition function.
123
124       If the state transition function is not strict, then it will be called
125       unconditionally at each input row, and must deal with null inputs and
126       null state values for itself. This allows the aggregate author to have
127       full control over the aggregate's handling of null values.
128
129       If the final function is declared “strict”, then it will not be called
130       when the ending state value is null; instead a null result will be
131       returned automatically. (Of course this is just the normal behavior of
132       strict functions.) In any case the final function has the option of
133       returning a null value. For example, the final function for avg returns
134       null when it sees there were zero input rows.
135
136       Sometimes it is useful to declare the final function as taking not just
137       the state value, but extra parameters corresponding to the aggregate's
138       input values. The main reason for doing this is if the final function
139       is polymorphic and the state value's data type would be inadequate to
140       pin down the result type. These extra parameters are always passed as
141       NULL (and so the final function must not be strict when the
142       FINALFUNC_EXTRA option is used), but nonetheless they are valid
143       parameters. The final function could for example make use of
144       get_fn_expr_argtype to identify the actual argument type in the current
145       call.
146
147       An aggregate can optionally support moving-aggregate mode, as described
148       in Section 38.11.1. This requires specifying the MSFUNC, MINVFUNC, and
149       MSTYPE parameters, and optionally the MSSPACE, MFINALFUNC,
150       MFINALFUNC_EXTRA, MFINALFUNC_MODIFY, and MINITCOND parameters. Except
151       for MINVFUNC, these parameters work like the corresponding
152       simple-aggregate parameters without M; they define a separate
153       implementation of the aggregate that includes an inverse transition
154       function.
155
156       The syntax with ORDER BY in the parameter list creates a special type
157       of aggregate called an ordered-set aggregate; or if HYPOTHETICAL is
158       specified, then a hypothetical-set aggregate is created. These
159       aggregates operate over groups of sorted values in order-dependent
160       ways, so that specification of an input sort order is an essential part
161       of a call. Also, they can have direct arguments, which are arguments
162       that are evaluated only once per aggregation rather than once per input
163       row. Hypothetical-set aggregates are a subclass of ordered-set
164       aggregates in which some of the direct arguments are required to match,
165       in number and data types, the aggregated argument columns. This allows
166       the values of those direct arguments to be added to the collection of
167       aggregate-input rows as an additional “hypothetical” row.
168
169       An aggregate can optionally support partial aggregation, as described
170       in Section 38.11.4. This requires specifying the COMBINEFUNC parameter.
171       If the state_data_type is internal, it's usually also appropriate to
172       provide the SERIALFUNC and DESERIALFUNC parameters so that parallel
173       aggregation is possible. Note that the aggregate must also be marked
174       PARALLEL SAFE to enable parallel aggregation.
175
176       Aggregates that behave like MIN or MAX can sometimes be optimized by
177       looking into an index instead of scanning every input row. If this
178       aggregate can be so optimized, indicate it by specifying a sort
179       operator. The basic requirement is that the aggregate must yield the
180       first element in the sort ordering induced by the operator; in other
181       words:
182
183           SELECT agg(col) FROM tab;
184
185       must be equivalent to:
186
187           SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
188
189       Further assumptions are that the aggregate ignores null inputs, and
190       that it delivers a null result if and only if there were no non-null
191       inputs. Ordinarily, a data type's < operator is the proper sort
192       operator for MIN, and > is the proper sort operator for MAX. Note that
193       the optimization will never actually take effect unless the specified
194       operator is the “less than” or “greater than” strategy member of a
195       B-tree index operator class.
196
197       To be able to create an aggregate function, you must have USAGE
198       privilege on the argument types, the state type(s), and the return
199       type, as well as EXECUTE privilege on the supporting functions.
200

PARAMETERS

202       name
203           The name (optionally schema-qualified) of the aggregate function to
204           create.
205
206       argmode
207           The mode of an argument: IN or VARIADIC. (Aggregate functions do
208           not support OUT arguments.) If omitted, the default is IN. Only the
209           last argument can be marked VARIADIC.
210
211       argname
212           The name of an argument. This is currently only useful for
213           documentation purposes. If omitted, the argument has no name.
214
215       arg_data_type
216           An input data type on which this aggregate function operates. To
217           create a zero-argument aggregate function, write * in place of the
218           list of argument specifications. (An example of such an aggregate
219           is count(*).)
220
221       base_type
222           In the old syntax for CREATE AGGREGATE, the input data type is
223           specified by a basetype parameter rather than being written next to
224           the aggregate name. Note that this syntax allows only one input
225           parameter. To define a zero-argument aggregate function with this
226           syntax, specify the basetype as "ANY" (not *). Ordered-set
227           aggregates cannot be defined with the old syntax.
228
229       sfunc
230           The name of the state transition function to be called for each
231           input row. For a normal N-argument aggregate function, the sfunc
232           must take N+1 arguments, the first being of type state_data_type
233           and the rest matching the declared input data type(s) of the
234           aggregate. The function must return a value of type
235           state_data_type. This function takes the current state value and
236           the current input data value(s), and returns the next state value.
237
238           For ordered-set (including hypothetical-set) aggregates, the state
239           transition function receives only the current state value and the
240           aggregated arguments, not the direct arguments. Otherwise it is the
241           same.
242
243       state_data_type
244           The data type for the aggregate's state value.
245
246       state_data_size
247           The approximate average size (in bytes) of the aggregate's state
248           value. If this parameter is omitted or is zero, a default estimate
249           is used based on the state_data_type. The planner uses this value
250           to estimate the memory required for a grouped aggregate query. The
251           planner will consider using hash aggregation for such a query only
252           if the hash table is estimated to fit in work_mem; therefore, large
253           values of this parameter discourage use of hash aggregation.
254
255       ffunc
256           The name of the final function called to compute the aggregate's
257           result after all input rows have been traversed. For a normal
258           aggregate, this function must take a single argument of type
259           state_data_type. The return data type of the aggregate is defined
260           as the return type of this function. If ffunc is not specified,
261           then the ending state value is used as the aggregate's result, and
262           the return type is state_data_type.
263
264           For ordered-set (including hypothetical-set) aggregates, the final
265           function receives not only the final state value, but also the
266           values of all the direct arguments.
267
268           If FINALFUNC_EXTRA is specified, then in addition to the final
269           state value and any direct arguments, the final function receives
270           extra NULL values corresponding to the aggregate's regular
271           (aggregated) arguments. This is mainly useful to allow correct
272           resolution of the aggregate result type when a polymorphic
273           aggregate is being defined.
274
275       FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE }
276           This option specifies whether the final function is a pure function
277           that does not modify its arguments.  READ_ONLY indicates it does
278           not; the other two values indicate that it may change the
279           transition state value. See NOTES below for more detail. The
280           default is READ_ONLY, except for ordered-set aggregates, for which
281           the default is READ_WRITE.
282
283       combinefunc
284           The combinefunc function may optionally be specified to allow the
285           aggregate function to support partial aggregation. If provided, the
286           combinefunc must combine two state_data_type values, each
287           containing the result of aggregation over some subset of the input
288           values, to produce a new state_data_type that represents the result
289           of aggregating over both sets of inputs. This function can be
290           thought of as an sfunc, where instead of acting upon an individual
291           input row and adding it to the running aggregate state, it adds
292           another aggregate state to the running state.
293
294           The combinefunc must be declared as taking two arguments of the
295           state_data_type and returning a value of the state_data_type.
296           Optionally this function may be “strict”. In this case the function
297           will not be called when either of the input states are null; the
298           other state will be taken as the correct result.
299
300           For aggregate functions whose state_data_type is internal, the
301           combinefunc must not be strict. In this case the combinefunc must
302           ensure that null states are handled correctly and that the state
303           being returned is properly stored in the aggregate memory context.
304
305       serialfunc
306           An aggregate function whose state_data_type is internal can
307           participate in parallel aggregation only if it has a serialfunc
308           function, which must serialize the aggregate state into a bytea
309           value for transmission to another process. This function must take
310           a single argument of type internal and return type bytea. A
311           corresponding deserialfunc is also required.
312
313       deserialfunc
314           Deserialize a previously serialized aggregate state back into
315           state_data_type. This function must take two arguments of types
316           bytea and internal, and produce a result of type internal. (Note:
317           the second, internal argument is unused, but is required for type
318           safety reasons.)
319
320       initial_condition
321           The initial setting for the state value. This must be a string
322           constant in the form accepted for the data type state_data_type. If
323           not specified, the state value starts out null.
324
325       msfunc
326           The name of the forward state transition function to be called for
327           each input row in moving-aggregate mode. This is exactly like the
328           regular transition function, except that its first argument and
329           result are of type mstate_data_type, which might be different from
330           state_data_type.
331
332       minvfunc
333           The name of the inverse state transition function to be used in
334           moving-aggregate mode. This function has the same argument and
335           result types as msfunc, but it is used to remove a value from the
336           current aggregate state, rather than add a value to it. The inverse
337           transition function must have the same strictness attribute as the
338           forward state transition function.
339
340       mstate_data_type
341           The data type for the aggregate's state value, when using
342           moving-aggregate mode.
343
344       mstate_data_size
345           The approximate average size (in bytes) of the aggregate's state
346           value, when using moving-aggregate mode. This works the same as
347           state_data_size.
348
349       mffunc
350           The name of the final function called to compute the aggregate's
351           result after all input rows have been traversed, when using
352           moving-aggregate mode. This works the same as ffunc, except that
353           its first argument's type is mstate_data_type and extra dummy
354           arguments are specified by writing MFINALFUNC_EXTRA. The aggregate
355           result type determined by mffunc or mstate_data_type must match
356           that determined by the aggregate's regular implementation.
357
358       MFINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE }
359           This option is like FINALFUNC_MODIFY, but it describes the behavior
360           of the moving-aggregate final function.
361
362       minitial_condition
363           The initial setting for the state value, when using
364           moving-aggregate mode. This works the same as initial_condition.
365
366       sort_operator
367           The associated sort operator for a MIN- or MAX-like aggregate. This
368           is just an operator name (possibly schema-qualified). The operator
369           is assumed to have the same input data types as the aggregate
370           (which must be a single-argument normal aggregate).
371
372       PARALLEL = { SAFE | RESTRICTED | UNSAFE }
373           The meanings of PARALLEL SAFE, PARALLEL RESTRICTED, and PARALLEL
374           UNSAFE are the same as in CREATE FUNCTION (CREATE_FUNCTION(7)). An
375           aggregate will not be considered for parallelization if it is
376           marked PARALLEL UNSAFE (which is the default!) or PARALLEL
377           RESTRICTED. Note that the parallel-safety markings of the
378           aggregate's support functions are not consulted by the planner,
379           only the marking of the aggregate itself.
380
381       HYPOTHETICAL
382           For ordered-set aggregates only, this flag specifies that the
383           aggregate arguments are to be processed according to the
384           requirements for hypothetical-set aggregates: that is, the last few
385           direct arguments must match the data types of the aggregated
386           (WITHIN GROUP) arguments. The HYPOTHETICAL flag has no effect on
387           run-time behavior, only on parse-time resolution of the data types
388           and collations of the aggregate's arguments.
389
390       The parameters of CREATE AGGREGATE can be written in any order, not
391       just the order illustrated above.
392

NOTES

394       In parameters that specify support function names, you can write a
395       schema name if needed, for example SFUNC = public.sum. Do not write
396       argument types there, however — the argument types of the support
397       functions are determined from other parameters.
398
399       Ordinarily, PostgreSQL functions are expected to be true functions that
400       do not modify their input values. However, an aggregate transition
401       function, when used in the context of an aggregate, is allowed to cheat
402       and modify its transition-state argument in place. This can provide
403       substantial performance benefits compared to making a fresh copy of the
404       transition state each time.
405
406       Likewise, while an aggregate final function is normally expected not to
407       modify its input values, sometimes it is impractical to avoid modifying
408       the transition-state argument. Such behavior must be declared using the
409       FINALFUNC_MODIFY parameter. The READ_WRITE value indicates that the
410       final function modifies the transition state in unspecified ways. This
411       value prevents use of the aggregate as a window function, and it also
412       prevents merging of transition states for aggregate calls that share
413       the same input values and transition functions. The SHAREABLE value
414       indicates that the transition function cannot be applied after the
415       final function, but multiple final-function calls can be performed on
416       the ending transition state value. This value prevents use of the
417       aggregate as a window function, but it allows merging of transition
418       states. (That is, the optimization of interest here is not applying the
419       same final function repeatedly, but applying different final functions
420       to the same ending transition state value. This is allowed as long as
421       none of the final functions are marked READ_WRITE.)
422
423       If an aggregate supports moving-aggregate mode, it will improve
424       calculation efficiency when the aggregate is used as a window function
425       for a window with moving frame start (that is, a frame start mode other
426       than UNBOUNDED PRECEDING). Conceptually, the forward transition
427       function adds input values to the aggregate's state when they enter the
428       window frame from the bottom, and the inverse transition function
429       removes them again when they leave the frame at the top. So, when
430       values are removed, they are always removed in the same order they were
431       added. Whenever the inverse transition function is invoked, it will
432       thus receive the earliest added but not yet removed argument value(s).
433       The inverse transition function can assume that at least one row will
434       remain in the current state after it removes the oldest row. (When this
435       would not be the case, the window function mechanism simply starts a
436       fresh aggregation, rather than using the inverse transition function.)
437
438       The forward transition function for moving-aggregate mode is not
439       allowed to return NULL as the new state value. If the inverse
440       transition function returns NULL, this is taken as an indication that
441       the inverse function cannot reverse the state calculation for this
442       particular input, and so the aggregate calculation will be redone from
443       scratch for the current frame starting position. This convention allows
444       moving-aggregate mode to be used in situations where there are some
445       infrequent cases that are impractical to reverse out of the running
446       state value.
447
448       If no moving-aggregate implementation is supplied, the aggregate can
449       still be used with moving frames, but PostgreSQL will recompute the
450       whole aggregation whenever the start of the frame moves. Note that
451       whether or not the aggregate supports moving-aggregate mode, PostgreSQL
452       can handle a moving frame end without recalculation; this is done by
453       continuing to add new values to the aggregate's state. This is why use
454       of an aggregate as a window function requires that the final function
455       be read-only: it must not damage the aggregate's state value, so that
456       the aggregation can be continued even after an aggregate result value
457       has been obtained for one set of frame boundaries.
458
459       The syntax for ordered-set aggregates allows VARIADIC to be specified
460       for both the last direct parameter and the last aggregated (WITHIN
461       GROUP) parameter. However, the current implementation restricts use of
462       VARIADIC in two ways. First, ordered-set aggregates can only use
463       VARIADIC "any", not other variadic array types. Second, if the last
464       direct parameter is VARIADIC "any", then there can be only one
465       aggregated parameter and it must also be VARIADIC "any". (In the
466       representation used in the system catalogs, these two parameters are
467       merged into a single VARIADIC "any" item, since pg_proc cannot
468       represent functions with more than one VARIADIC parameter.) If the
469       aggregate is a hypothetical-set aggregate, the direct arguments that
470       match the VARIADIC "any" parameter are the hypothetical ones; any
471       preceding parameters represent additional direct arguments that are not
472       constrained to match the aggregated arguments.
473
474       Currently, ordered-set aggregates do not need to support
475       moving-aggregate mode, since they cannot be used as window functions.
476
477       Partial (including parallel) aggregation is currently not supported for
478       ordered-set aggregates. Also, it will never be used for aggregate calls
479       that include DISTINCT or ORDER BY clauses, since those semantics cannot
480       be supported during partial aggregation.
481

EXAMPLES

483       See Section 38.11.
484

COMPATIBILITY

486       CREATE AGGREGATE is a PostgreSQL language extension. The SQL standard
487       does not provide for user-defined aggregate functions.
488