Queries with Window Functions
Window expressions allow application developers to more easily compose complex online analytical processing (OLAP) queries using standard SQL commands. For example, with window expressions, users can calculate moving averages or sums over various intervals, reset aggregations and ranks as selected column values change, and express complex ratios in simple terms.
A window expression represents the application of a <window function> to a <window frame>, which is defined with an OVER() clause. This is comparable to the type of calculation that can be done with an aggregate function and a GROUP BY clause. Unlike aggregate functions, which return a single result value for each group of rows, window functions return a result value for every row, but that value is calculated with respect to the set of rows in the window frame to which the row belongs. The OVER() clause allows dividing the rows into partitions and then further restricting the window frame by specifying which rows preceding or following the current row within its partition to include in the calculation.
Apache Cloudberry does not support specifying a window function as an argument to another window function.
The syntax of a window expression is:
<window_function> ( [<expression> [, ...]] ) [ FILTER ( WHERE <filter_clause> ) ] OVER ( <window_specification> )
Where <window_function> might be a user-defined window function, and <expression> is any value expression that does not contain a window expression, and <window_specification> is:
[<window_name>]
[PARTITION BY <expression> [, ...]]
[[ORDER BY <expression> [ASC | DESC | USING operator] [NULLS {FIRST | LAST}] [, ...]
[ <frame_clause> ]
The optional <frame_clause> can be one of the following:
{ RANGE | ROWS | GROUPS } <frame_start> [ <frame_exclusion> ]
{ RANGE | ROWS | GROUPS } BETWEEN <frame_start> AND <frame_end> [ <frame_exclusion> ]
Where <frame_start> and <frame_end> can be one of the following:
UNBOUNDED PRECEDING
<offset> PRECEDING
CURRENT ROW
<offset> FOLLOWING
UNBOUNDED FOLLOWING
and <frame_exclusion> can be one of the following:
EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS
A window expression can appear only in the select list of a SELECT command. For example:
SELECT count(*) OVER(PARTITION BY customer_id), * FROM sales;
If FILTER is specified, then only the input rows for which the <filter_clause> evaluates to true are fed to the window function; other rows are discarded. In a window expression, a FILTER clause can be used only with a <window_function> that is an aggregate function.
In a window expression, the expression must contain an OVER clause. The OVER clause specifies the window frame—the rows to be processed by the window function. This syntactically distinguishes the function from a regular or aggregate function.
In a window aggregate function that is used in a window expression, Apache Cloudberry does not support a DISTINCT clause with multiple input expressions.
A window specification has the following characteristics:
- The
PARTITION BYclause defines the window partitions to which the window function is applied. If omitted, the entire result set is treated as one partition. - The
ORDER BYclause defines the expression(s) for sorting rows within a window partition. TheORDER BYclause of a window specification is separate and distinct from theORDER BYclause of a regular query expression. TheORDER BYclause is required for the window functions that calculate rankings, as it identifies the measure(s) for the ranking values. For OLAP aggregations, theORDER BYclause is required to use window frames (theROWS,RANGEorGROUPSclause).
Columns of data types without a coherent ordering, such as time, are not good candidates for use in the ORDER BY clause of a window specification. Time, with or without a specified time zone, lacks a coherent ordering because addition and subtraction do not have the expected effects. For example, the following is not generally true: x::time < x::time + '2 hour'::interval
-
The
<frame_clause>specifies the set of rows constituting the<window frame>, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The set of rows in the frame can vary depending on which row is the current row. The frame can be specified inRANGE,ROWSorGROUPSmode; in each case, it runs from the<frame_start>to the<frame_end>. If<frame_end>is omitted, the end defaults toCURRENT ROW. -
A
<frame_start>ofUNBOUNDED PRECEDINGmeans that the frame starts with the first row of the partition, and similarly a<frame_end>ofUNBOUNDED FOLLOWINGmeans that the frame ends with the last row of the partition. -
In
RANGEorGROUPSmode, a<frame_start>ofCURRENT ROWmeans the frame starts with the current row's first peer row (a row that the window'sORDER BYclause sorts as equivalent to the current row), while a<frame_end>ofCURRENT ROWmeans the frame ends with the current row's last peer row. InROWSmode,CURRENT ROWsimply means the current row. -
In the
<offset> PRECEDINGand<offset> FOLLOWINGframe options, the<offset>must be an expression not containing any variables, aggregate functions, or window functions. The meaning of the<offset>depends on the frame mode:-
In
ROWSmode, the<offset>must yield a non-null, non-negative integer, and the option means that the frame starts or ends the specified number of rows before or after the current row. -
In
GROUPSmode, the<offset>again must yield a non-null, non-negative integer, and the option means that the frame starts or ends the specified number of peer groups before or after the current row's peer group, where a peer group is a set of rows that are equivalent in theORDER BYordering. (There must be anORDER BYclause in the window definition to useGROUPSmode). -
In
RANGEmode, these options require that theORDER BYclause specifies exactly one column. The<offset>specifies the maximum difference between the value of that column in the current row and its value in preceding or following rows of the frame. The data type of the<offset>expression varies depending on the data type of the ordering column. For numeric ordering columns it is typically of the same type as the ordering column, but for datetime ordering columns it is aninterval. For example, if the ordering column is of typedateortimestamp, one could writeRANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING. The<offset>is still required to be non-null and non-negative, though the meaning of “non-negative” depends on its data type.
In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.
-
-
Notice that in both
ROWSandGROUPSmode,0 PRECEDINGand0 FOLLOWINGare equivalent toCURRENT ROW. This normally holds inRANGEmode as well, for an appropriate data-type-specific meaning of “zero”. -
The
<frame_exclusion>option allows rows around the current row to be excluded from the frame, even if they would be included according to the frame start and frame end options.EXCLUDE CURRENT ROWexcludes the current row from the frame.EXCLUDE GROUPexcludes the current row and its ordering peers from the frame.EXCLUDE TIESexcludes any peers of the current row from the frame, but not the current row itself.EXCLUDE NO OTHERSsimply specifies explicitly the default behavior of not excluding the current row or its peers. -
The default framing option is
RANGE UNBOUNDED PRECEDING, which is the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. WithORDER BY, this sets the frame to be all rows from the partition start up through the current row's lastORDER BYpeer. WithoutORDER BY, this means all rows of the partition are included in the window frame, because all rows become peers of the current row. -
Restrictions are that
<frame_start>cannot beUNBOUNDED FOLLOWING,<frame_end>cannot beUNBOUNDED PRECEDING, and the<frame_end>choice cannot appear earlier in the above list than the<frame_start>choice. for exampleRANGE BETWEEN CURRENT ROW AND value PRECEDINGis not allowed.
Window examples
The following examples demonstrate using window functions with partitions and window frames.
Example 1 – Aggregate window function over a partition
The PARTITION BY list in the OVER clause divides the rows into groups, or partitions, that have the same values as the specified expressions.
This example compares employees' salaries with the average salaries for their departments:
SELECT depname, empno, salary, avg(salary) OVER(PARTITION BY depname)
FROM empsalary;
depname | empno | salary | avg
-----------+-------+--------+-----------------------
develop | 9 | 4500 | 5020.0000000000000000
develop | 10 | 5200 | 5020.0000000000000000
develop | 11 | 5200 | 5020.0000000000000000
develop | 7 | 4200 | 5020.0000000000000000
develop | 8 | 6000 | 5020.0000000000000000
personnel | 5 | 3500 | 3700.0000000000000000
personnel | 2 | 3900 | 3700.0000000000000000
sales | 1 | 5000 | 4866.6666666666666667
sales | 3 | 4800 | 4866.6666666666666667
sales | 4 | 4800 | 4866.6666666666666667
(10 rows)
The first three output columns come from the table empsalary, and there is one output row for each row in the table. The fourth column is the average calculated on all rows that have the same depname value as the current row. Rows that share the same depname value constitute a partition, and there are three partitions in this example. The avg function is the same as the regular avg aggregate function, but the OVER clause causes it to be applied as a window function.
You can also put the window specification in a WINDOW clause and reference it in the select list. This example is equivalent to the previous query:
SELECT depname, empno, salary, avg(salary) OVER(mywindow)
FROM empsalary
WINDOW mywindow AS (PARTITION BY depname);
Defining a named window is useful when the select list has multiple window functions using the same window specification.
Example 2 – Ranking window function with an ORDER BY clause
An ORDER BY clause within the OVER clause controls the order in which rows are processed by window functions. The ORDER BY list for the window function does not have to match the output order of the query. This example uses the rank() window function to rank employees' salaries within their departments:
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 11 | 5200 | 2
develop | 10 | 5200 | 2
develop | 9 | 4500 | 4
develop | 7 | 4200 | 5
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
sales | 3 | 4800 | 2
(10 rows)
Example 3 – Aggregate function over a row window frame
A RANGE, ROWS or GROUPS clause defines the window frame—a set of rows within a partition—that the window function includes in the calculation. ROWS specifies a physical set of rows to process, for example all rows from the beginning of the partition to the current row.
This example calculates a running total of employee's salaries by department using the sum() function to total rows from the start of the partition to the current row:
SELECT depname, empno, salary,
sum(salary) OVER (PARTITION BY depname ORDER BY salary
ROWS between UNBOUNDED PRECEDING AND CURRENT ROW)
FROM empsalary ORDER BY depname, sum;
depname | empno | salary | sum
-----------+-------+--------+-------
develop | 7 | 4200 | 4200
develop | 9 | 4500 | 8700
develop | 11 | 5200 | 13900
develop | 10 | 5200 | 19100
develop | 8 | 6000 | 25100
personnel | 5 | 3500 | 3500
personnel | 2 | 3900 | 7400
sales | 4 | 4800 | 4800
sales | 3 | 4800 | 9600
sales | 1 | 5000 | 14600
(10 rows)
Example 4 – Aggregate function for a range or groups window frame
RANGE and GROUPS modes specify logical values based on values of the ORDER BY expression in the OVER clause. This example demonstrates the difference between ROWS and RANGE or GROUPS. The frame contains all rows with salary values less than or equal to the current row. Unlike the previous example, for employees with the same salary, the sum is the same and includes the salaries of all of those employees.
SELECT depname, empno, salary,
sum(salary) OVER (PARTITION BY depname ORDER BY salary
RANGE between UNBOUNDED PRECEDING AND CURRENT ROW)
FROM empsalary ORDER BY depname, sum;
depname | empno | salary | sum
-----------+-------+--------+-------
develop | 7 | 4200 | 4200
develop | 9 | 4500 | 8700
develop | 11 | 5200 | 19100
develop | 10 | 5200 | 19100
develop | 8 | 6000 | 25100
personnel | 5 | 3500 | 3500
personnel | 2 | 3900 | 7400
sales | 4 | 4800 | 9600
sales | 3 | 4800 | 9600
sales | 1 | 5000 | 14600
(10 rows)
In this example, because the <frame_start> and <frame_end> are not using an <offset>, RANGE and GROUPS mode provide identical results:
SELECT depname, empno, salary,
sum(salary) OVER (PARTITION BY depname ORDER BY salary
GROUPS between UNBOUNDED PRECEDING AND CURRENT ROW)
FROM empsalary ORDER BY depname, sum;
depname | empno | salary | sum
-----------+-------+--------+-------
develop | 7 | 4200 | 4200
develop | 9 | 4500 | 8700
develop | 11 | 5200 | 19100
develop | 10 | 5200 | 19100
develop | 8 | 6000 | 25100
personnel | 5 | 3500 | 3500
personnel | 2 | 3900 | 7400
sales | 4 | 4800 | 9600
sales | 3 | 4800 | 9600
sales | 1 | 5000 | 14600
(10 rows)
Example 5 – Aggregate function for a range and a groups window frame
This example demonstrates the difference between RANGE and GROUPS. The table sales lists the sales a company's two shops during a period of four days.
When using GROUPS mode, the value of <offset> indicates the number of peer groups before the current row's peer group. The different peer groups are set by the ORDER BY clause, in this case the date. because the query does not specify a value for <frame_end>, the end defaults to CURRENT ROW.
SELECT date, shop, total, sum(total) OVER (PARTITION BY shop ORDER BY date asc GROUPS 2 PRECEDING)
FROM sales ORDER BY shop, date;
date | shop | total | sum
------------+--------+---------+----------
2022-01-07 | Shop 1 | 3000.00 | 3000.00
2022-01-08 | Shop 1 | 1000.00 | 4000.00
2022-01-09 | Shop 1 | 5000.00 | 11000.00
2022-01-09 | Shop 1 | 2000.00 | 11000.00
2022-01-07 | Shop 2 | 4000.00 | 10000.00
2022-01-07 | Shop 2 | 6000.00 | 10000.00
2022-01-09 | Shop 2 | 7000.00 | 21000.00
2022-01-09 | Shop 2 | 4000.00 | 21000.00
2022-01-10 | Shop 2 | 2000.00 | 23000.00
(9 rows)
In order to get the equivalent result when using the RANGE mode, the <frame_clause> must specify an <offset> using the same data type as the ordering column, in this case date. Using the numberic value 2 will return an error.
SELECT date, shop, total, sum(total) OVER (PARTITION BY shop ORDER BY date asc RANGE '2 days' PRECEDING)
FROM sales ORDER BY shop, date;
date | shop | total | sum
------------+--------+---------+----------
2022-01-07 | Shop 1 | 3000.00 | 3000.00
2022-01-08 | Shop 1 | 1000.00 | 4000.00
2022-01-09 | Shop 1 | 5000.00 | 11000.00
2022-01-09 | Shop 1 | 2000.00 | 11000.00
2022-01-07 | Shop 2 | 4000.00 | 10000.00
2022-01-07 | Shop 2 | 6000.00 | 10000.00
2022-01-09 | Shop 2 | 7000.00 | 21000.00
2022-01-09 | Shop 2 | 4000.00 | 21000.00
2022-01-10 | Shop 2 | 2000.00 | 13000.00
(9 rows)
Note that the above outputs differ in the last row because RANGE mode uses the entries from the previous two days and there is no entry for 2022-01-08 for Shop 2, but GROUPS mode uses the previous two peer groups, which are 2022-01-07 and 2022-01-09.