Logo

dev-resources.site

for different kinds of informations.

Unlocking the Secrets of SQL Window Functions

Published at
10/24/2024
Categories
sqlwindowfunctions
dataanalysis
datastorytelling
sqlforbeginners
Author
Chetan Gupta
Unlocking the Secrets of SQL Window Functions

A Comprehensive Guide to SQL Window Functions: With Examples

Introduction to SQL Window Functions

Window functions in SQL are powerful tools that allow you to perform calculations across a set of table rows related to the current row without collapsing the data into a single result. Unlike aggregate functions like SUM() or AVG() that group rows, window functions retain the detail of each row while providing additional insight.

In this guide, we’ll explore the various types of window functions, including their syntax, and provide examples to demonstrate their practical applications.

What is a Window Function?

A window function computes a value for each row in a result set based on a window of rows. It does this using the OVER() clause, which defines the window (or subset of rows) for each function.

The general syntax for a window function is:

SELECT 
    column_name, 
    window_function() OVER (
        [PARTITION BY column_name] 
        [ORDER BY column_name]
        [ROWS or RANGE frame_specification]
    ) AS alias_name
FROM table_name;
  • window_function(): The function you apply (e.g., ROW_NUMBER(), SUM()).
  • OVER: Specifies the window (partition and order).
  • PARTITION BY: Divides the result set into partitions (similar to GROUP BY).
  • ORDER BY: Specifies the order of rows within each partition.
  • ROWS or RANGE: Defines the frame of rows to consider for the calculation.

1. Ranking Functions

Ranking functions are used to assign a rank or position to rows within a partition. These include ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE().

1.1 ROW_NUMBER()

ROW_NUMBER() assigns a unique sequential number to each row, starting from 1.

Example: Assigning Row Numbers to Employees by Salary

SELECT 
    employee_id,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
  • This assigns a rank to each employee based on their salary, with the highest salary ranked 1.

Output:
| employee_id | salary | salary_rank |
|-------------|--------|-------------|
| 3 | 90000 | 1 |
| 2 | 80000 | 2 |
| 1 | 70000 | 3 |

1.2 RANK()

RANK() works similarly to ROW_NUMBER() but assigns the same rank to identical values, creating gaps in the ranking.

Example: Ranking Employees with Possible Ties

SELECT 
    employee_id,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
  • Employees with the same salary get the same rank, but the next rank skips numbers.

Output:
| employee_id | salary | salary_rank |
|-------------|--------|-------------|
| 3 | 90000 | 1 |
| 2 | 80000 | 2 |
| 4 | 80000 | 2 |
| 1 | 70000 | 4 |

1.3 DENSE_RANK()

DENSE_RANK() is similar to RANK() but does not skip rank numbers for ties.

Example: Dense Ranking Employees

SELECT 
    employee_id,
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;

Output:
| employee_id | salary | salary_rank |
|-------------|--------|-------------|
| 3 | 90000 | 1 |
| 2 | 80000 | 2 |
| 4 | 80000 | 2 |
| 1 | 70000 | 3 |

2. Value Functions

Value functions allow you to access data from preceding or following rows. They include LEAD(), LAG(), FIRST_VALUE(), and LAST_VALUE().

2.1 LEAD()

LEAD() retrieves data from a subsequent row.

Example: Get Next Month's Salary

SELECT 
    employee_id,
    salary,
    LEAD(salary, 1) OVER (ORDER BY salary) AS next_salary
FROM employees;
  • This returns the salary value from the next row.

Output:
| employee_id | salary | next_salary |
|-------------|--------|-------------|
| 1 | 70000 | 80000 |
| 2 | 80000 | 90000 |
| 3 | 90000 | NULL |

2.2 LAG()

LAG() retrieves data from a preceding row.

Example: Get Previous Month's Salary

SELECT 
    employee_id,
    salary,
    LAG(salary, 1) OVER (ORDER BY salary) AS previous_salary
FROM employees;
  • This returns the salary value from the previous row.

Output:
| employee_id | salary | previous_salary |
|-------------|--------|-----------------|
| 1 | 70000 | NULL |
| 2 | 80000 | 70000 |
| 3 | 90000 | 80000 |

3. Aggregate Window Functions

These functions perform aggregation across rows but keep the detail of each row. They include SUM(), AVG(), MIN(), MAX(), and COUNT().

3.1 SUM()

Calculates the running total of salaries.

Example: Running Total of Salaries

SELECT 
    employee_id,
    salary,
    SUM(salary) OVER (ORDER BY salary) AS running_total
FROM employees;

Output:
| employee_id | salary | running_total |
|-------------|--------|---------------|
| 1 | 70000 | 70000 |
| 2 | 80000 | 150000 |
| 3 | 90000 | 240000 |

3.2 AVG()

Calculates the average salary over a window.

Example: Average Salary in Each Department

SELECT 
    department_id,
    employee_id,
    salary,
    AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
FROM employees;
  • This calculates the average salary for each department while keeping individual employee details.

4. Cumulative and Distribution Functions

Cumulative functions like CUME_DIST() and PERCENT_RANK() provide statistical insight into data distributions.

4.1 CUME_DIST()

Calculates the cumulative distribution of a value.

Example: Cumulative Distribution of Salaries

SELECT 
    employee_id,
    salary,
    CUME_DIST() OVER (ORDER BY salary) AS cumulative_distribution
FROM employees;

Output:
| employee_id | salary | cumulative_distribution |
|-------------|--------|------------------------|
| 1 | 70000 | 0.33 |
| 2 | 80000 | 0.67 |
| 3 | 90000 | 1.00 |

4.2 PERCENT_RANK()

Calculates the relative rank of a row within a partition.

Example: Percent Rank of Salaries

SELECT 
    employee_id,
    salary,
    PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees;

5. Window Frames with ROWS and RANGE

ROWS and RANGE clauses define the specific range of rows to consider for calculations.

Example: Rolling Sum of Last 3 Salaries

SELECT 
    employee_id,
    salary,
    SUM(salary) OVER (ORDER BY salary ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_sum
FROM employees;
  • This calculates the sum of the current row and the two preceding rows.

Conclusion

Window functions in SQL offer a rich set of capabilities for performing complex calculations, ranking, and aggregation over partitions of data while preserving the detail of each row. They are invaluable for analytical queries where understanding relationships between rows or computing running totals is required.

By mastering these functions—ROW_NUMBER(), RANK(), LEAD(), SUM(), CUME_DIST(), and others—you can unlock powerful data analysis and reporting capabilities directly within SQL.

Whether you're dealing with financial data, customer transactions, or time series, window functions provide the flexibility and power needed to extract deeper insights.
Looking to supercharge your team with a seasoned Data Engineer? Let’s connect on LinkedIn or drop me a message — I’d love to explore how I can help drive your data success! Happy querying!

Featured ones: