dev-resources.site
for different kinds of informations.
Unlocking the Secrets of SQL Window Functions
A Comprehensive Guide to SQL Window Functions: With Examples
Introduction to SQL Window Functions
Window functions in SQL are powerful tools that allow you to perform calculations across a set of table rows related to the current row without collapsing the data into a single result. Unlike aggregate functions like SUM()
or AVG()
that group rows, window functions retain the detail of each row while providing additional insight.
In this guide, we’ll explore the various types of window functions, including their syntax, and provide examples to demonstrate their practical applications.
What is a Window Function?
A window function computes a value for each row in a result set based on a window of rows. It does this using the OVER()
clause, which defines the window (or subset of rows) for each function.
The general syntax for a window function is:
SELECT
column_name,
window_function() OVER (
[PARTITION BY column_name]
[ORDER BY column_name]
[ROWS or RANGE frame_specification]
) AS alias_name
FROM table_name;
-
window_function()
: The function you apply (e.g.,ROW_NUMBER()
,SUM()
). -
OVER
: Specifies the window (partition and order). -
PARTITION BY
: Divides the result set into partitions (similar toGROUP BY
). -
ORDER BY
: Specifies the order of rows within each partition. -
ROWS or RANGE
: Defines the frame of rows to consider for the calculation.
1. Ranking Functions
Ranking functions are used to assign a rank or position to rows within a partition. These include ROW_NUMBER()
, RANK()
, DENSE_RANK()
, and NTILE()
.
1.1 ROW_NUMBER()
ROW_NUMBER()
assigns a unique sequential number to each row, starting from 1
.
Example: Assigning Row Numbers to Employees by Salary
SELECT
employee_id,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
- This assigns a rank to each employee based on their
salary
, with the highest salary ranked1
.
Output:
| employee_id | salary | salary_rank |
|-------------|--------|-------------|
| 3 | 90000 | 1 |
| 2 | 80000 | 2 |
| 1 | 70000 | 3 |
1.2 RANK()
RANK()
works similarly to ROW_NUMBER()
but assigns the same rank to identical values, creating gaps in the ranking.
Example: Ranking Employees with Possible Ties
SELECT
employee_id,
salary,
RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
- Employees with the same salary get the same rank, but the next rank skips numbers.
Output:
| employee_id | salary | salary_rank |
|-------------|--------|-------------|
| 3 | 90000 | 1 |
| 2 | 80000 | 2 |
| 4 | 80000 | 2 |
| 1 | 70000 | 4 |
1.3 DENSE_RANK()
DENSE_RANK()
is similar to RANK()
but does not skip rank numbers for ties.
Example: Dense Ranking Employees
SELECT
employee_id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
Output:
| employee_id | salary | salary_rank |
|-------------|--------|-------------|
| 3 | 90000 | 1 |
| 2 | 80000 | 2 |
| 4 | 80000 | 2 |
| 1 | 70000 | 3 |
2. Value Functions
Value functions allow you to access data from preceding or following rows. They include LEAD()
, LAG()
, FIRST_VALUE()
, and LAST_VALUE()
.
2.1 LEAD()
LEAD()
retrieves data from a subsequent row.
Example: Get Next Month's Salary
SELECT
employee_id,
salary,
LEAD(salary, 1) OVER (ORDER BY salary) AS next_salary
FROM employees;
- This returns the
salary
value from the next row.
Output:
| employee_id | salary | next_salary |
|-------------|--------|-------------|
| 1 | 70000 | 80000 |
| 2 | 80000 | 90000 |
| 3 | 90000 | NULL |
2.2 LAG()
LAG()
retrieves data from a preceding row.
Example: Get Previous Month's Salary
SELECT
employee_id,
salary,
LAG(salary, 1) OVER (ORDER BY salary) AS previous_salary
FROM employees;
- This returns the
salary
value from the previous row.
Output:
| employee_id | salary | previous_salary |
|-------------|--------|-----------------|
| 1 | 70000 | NULL |
| 2 | 80000 | 70000 |
| 3 | 90000 | 80000 |
3. Aggregate Window Functions
These functions perform aggregation across rows but keep the detail of each row. They include SUM()
, AVG()
, MIN()
, MAX()
, and COUNT()
.
3.1 SUM()
Calculates the running total of salaries.
Example: Running Total of Salaries
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY salary) AS running_total
FROM employees;
Output:
| employee_id | salary | running_total |
|-------------|--------|---------------|
| 1 | 70000 | 70000 |
| 2 | 80000 | 150000 |
| 3 | 90000 | 240000 |
3.2 AVG()
Calculates the average salary over a window.
Example: Average Salary in Each Department
SELECT
department_id,
employee_id,
salary,
AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
FROM employees;
- This calculates the average salary for each department while keeping individual employee details.
4. Cumulative and Distribution Functions
Cumulative functions like CUME_DIST()
and PERCENT_RANK()
provide statistical insight into data distributions.
4.1 CUME_DIST()
Calculates the cumulative distribution of a value.
Example: Cumulative Distribution of Salaries
SELECT
employee_id,
salary,
CUME_DIST() OVER (ORDER BY salary) AS cumulative_distribution
FROM employees;
Output:
| employee_id | salary | cumulative_distribution |
|-------------|--------|------------------------|
| 1 | 70000 | 0.33 |
| 2 | 80000 | 0.67 |
| 3 | 90000 | 1.00 |
4.2 PERCENT_RANK()
Calculates the relative rank of a row within a partition.
Example: Percent Rank of Salaries
SELECT
employee_id,
salary,
PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees;
5. Window Frames with ROWS
and RANGE
ROWS
and RANGE
clauses define the specific range of rows to consider for calculations.
Example: Rolling Sum of Last 3 Salaries
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY salary ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_sum
FROM employees;
- This calculates the sum of the current row and the two preceding rows.
Conclusion
Window functions in SQL offer a rich set of capabilities for performing complex calculations, ranking, and aggregation over partitions of data while preserving the detail of each row. They are invaluable for analytical queries where understanding relationships between rows or computing running totals is required.
By mastering these functions—ROW_NUMBER()
, RANK()
, LEAD()
, SUM()
, CUME_DIST()
, and others—you can unlock powerful data analysis and reporting capabilities directly within SQL.
Whether you're dealing with financial data, customer transactions, or time series, window functions provide the flexibility and power needed to extract deeper insights.
Looking to supercharge your team with a seasoned Data Engineer? Let’s connect on LinkedIn or drop me a message — I’d love to explore how I can help drive your data success! Happy querying!
Featured ones: