SQL Window Functions are a powerful feature in SQL that allow for advanced data analysis and manipulation. Unlike traditional aggregate functions, window functions perform calculations across a set of table rows that are somehow related to the current row. This makes them invaluable for tasks such as running totals, moving averages, and ranking operations, which are essential in data analytics and reporting.
This article aims to prepare you for technical interviews by providing a curated list of questions and answers focused on SQL Window Functions. By understanding these concepts and practicing the provided examples, you’ll be better equipped to demonstrate your proficiency and problem-solving skills in SQL during your interview.
SQL Window Function Interview Questions and Answers
1. Describe the difference between ROW_NUMBER(), RANK(), and DENSE_RANK() functions.
The ROW_NUMBER(), RANK(), and DENSE_RANK() functions in SQL assign numbers to rows within a partition of a result set, differing in their handling of ties.
- ROW_NUMBER(): Assigns a unique number to each row, starting at 1 for the first row in each partition. Ties are handled arbitrarily.
- RANK(): Assigns the same rank to tied rows, with the next rank incremented by the number of tied rows.
- DENSE_RANK(): Similar to RANK(), but the next rank is incremented by 1 regardless of the number of tied rows.
Example:
SELECT employee_id, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num, RANK() OVER (ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees;
In this example, if two employees have the same salary, ROW_NUMBER() assigns unique numbers, while RANK() and DENSE_RANK() handle ties differently.
2. How would you use the PARTITION BY clause in a window function? Provide an example scenario.
The PARTITION BY clause divides the result set into partitions for calculations. For instance, to calculate total sales for each salesperson within their regions:
SELECT salesperson, region, sales_amount, SUM(sales_amount) OVER (PARTITION BY region) AS total_sales_by_region FROM sales;
Here, the data is partitioned by region, and the SUM function calculates total sales within each region.
3. What is the purpose of the ORDER BY clause in a window function?
The ORDER BY clause in a window function specifies the order of rows within each partition, essential for functions like ROW_NUMBER(), RANK(), and DENSE_RANK().
Example:
SELECT employee_id, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank FROM employees;
Here, the ORDER BY clause orders rows by salary in descending order, affecting the rank assignment.
4. Explain the difference between LAG() and LEAD() functions.
LAG() and LEAD() access data from a previous or subsequent row in the result set, useful for comparing values in a sequence.
- LAG(): Accesses a row at a specified offset before the current row.
- LEAD(): Accesses a row at a specified offset after the current row.
Example:
SELECT employee_id, salary, LAG(salary, 1) OVER (ORDER BY employee_id) AS previous_salary, LEAD(salary, 1) OVER (ORDER BY employee_id) AS next_salary FROM employees;
LAG() retrieves the previous row’s salary, while LEAD() retrieves the next row’s salary.
5. Write a query to calculate the difference in sales between the current month and the previous month for each product.
To calculate the difference in sales between the current and previous month for each product:
SELECT product_id, month, sales, sales - LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY month) AS sales_difference FROM sales_data;
LAG() accesses the previous month’s sales, with calculations done separately for each product.
6. Explain the concept of frame specification in window functions.
Frame specification in window functions determines the set of rows considered for calculations, using clauses like ROWS
or RANGE
.
Example:
SELECT employee_id, salary, SUM(salary) OVER ( PARTITION BY department_id ORDER BY employee_id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) AS moving_sum FROM employees;
Here, the frame includes the current row plus one preceding and one following row.
7. Write a query to find the top 3 highest-paid employees in each department.
To find the top 3 highest-paid employees in each department:
SELECT department, employee_name, salary FROM ( SELECT department, employee_name, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn FROM employees ) subquery WHERE rn <= 3;
ROW_NUMBER() ranks employees by salary within each department, and the outer query filters the top 3.
8. Write a query to calculate a moving average of sales over the last 3 months for each product.
To calculate a moving average of sales over the last 3 months for each product:
SELECT product_id, sale_date, sales, AVG(sales) OVER ( PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS moving_avg_sales FROM sales_table;
9. How would you handle NULL values in window functions? Provide an example.
Handling NULL values in window functions can be done using COALESCE
, IFNULL
, or CASE
statements to replace NULLs with specified values.
Example:
SELECT employee_id, department, salary, RANK() OVER ( PARTITION BY department ORDER BY COALESCE(salary, 0) DESC ) as rank FROM employees;
COALESCE replaces NULL salary values with 0 before applying RANK().
10. Write a query to calculate the cumulative distribution of salaries within each department.
To calculate the cumulative distribution of salaries within each department:
SELECT department_id, employee_id, salary, CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cumulative_distribution FROM employees;
CUME_DIST() calculates the cumulative distribution of salary within each department.