Queries with Aggregate Expressions
An aggregate expression applies an aggregate function across the rows that a query selects. An aggregate function performs a calculation on a set of values and returns a single value, such as the sum or average of the set of values. The syntax of an aggregate expression is one of the following:
aggregate_name(expression [ , ... ] ) [ FILTER ( WHERE filter_clause ) ]
— operates across all input rows for which the expected result value is non-null.ALL
is the default.aggregate_name(ALL expression [ , ... ] ) [ FILTER ( WHERE filter_clause ) ]
— operates identically to the first form becauseALL
is the default.aggregate_name(DISTINCT expression [ , ... ] ) [ FILTER ( WHERE filter_clause ) ]
— operates across all distinct non-null values of input rows.aggregate_name(*) [ FILTER ( WHERE filter_clause ) ]
— operates on all rows with values both null and non-null. Generally, this form is most useful for thecount(*)
aggregate function.
Where aggregate_name
is a previously defined aggregate (possibly schema-qualified) and expression
is any value expression that does not contain an aggregate expression.
For example, count(*)
yields the total number of input rows, count(f1)
yields the number of input rows in which f1
is non-null, andcount(distinct f1)
yields the number of distinct non-null values of f1
.
If FILTER
is specified, then only the input rows for which the filter_clause
evaluates to true are fed to the aggregate function; other rows are discarded. For example:
SELECT
count(*) AS unfiltered,
count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);
unfiltered | filtered
------------+----------
10 | 4
(1 row)
You can also add custom aggregate functions.
Apache Cloudberry provides the MEDIAN
aggregate function, which returns the fiftieth percentile of the PERCENTILE_CONT
result and special aggregate expressions for inverse distribution functions as follows:
PERCENTILE_CONT(<percentage>) WITHIN GROUP (ORDER BY <expression>)
PERCENTILE_DISC(<percentage>) WITHIN GROUP (ORDER BY <expression>)
Currently you can use only these two expressions with the keyword WITHIN GROUP
.
Limitations of aggregate expressions
The following are current limitations of the aggregate expressions:
- Apache Cloudberry does not support the following keywords:
ALL
,DISTINCT
, andOVER
. - An aggregate expression can appear only in the result list or
HAVING
clause of aSELECT
command. It is forbidden in other clauses, such asWHERE
, because those clauses are logically evaluated before the results of aggregates form. This restriction applies to the query level to which the aggregate belongs. - When an aggregate expression appears in a subquery, the aggregate is normally evaluated over the rows of the subquery. If the aggregate's arguments (and
filter_clause
if any) contain only outer-level variables, the aggregate belongs to the nearest such outer level and evaluates over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery in which it appears, and the aggregate expression acts as a constant over any one evaluation of that subquery. The restriction about appearing only in the result list orHAVING
clause applies with respect to the query level at which the aggregate appears. - Apache Cloudberry does not support specifying an aggregate function as an argument to another aggregate function.
- Apache Cloudberry does not support specifying a window function as an argument to an aggregate function.