15 SQL Join Interview Questions and Answers
Prepare for your interview with this guide on SQL joins. Enhance your skills in combining data from multiple tables with practical examples and answers.
Prepare for your interview with this guide on SQL joins. Enhance your skills in combining data from multiple tables with practical examples and answers.
SQL joins are fundamental for querying relational databases, allowing you to combine rows from two or more tables based on related columns. Mastery of SQL joins is essential for data analysis, reporting, and backend development, making it a critical skill for many technical roles. Understanding the various types of joins—inner, outer, left, right, and cross—enables you to efficiently retrieve and manipulate data, providing deeper insights and more robust applications.
This article offers a curated selection of SQL join questions and answers to help you prepare for your upcoming interview. By familiarizing yourself with these scenarios, you will enhance your ability to tackle complex data queries and demonstrate your proficiency in SQL joins, setting you apart as a knowledgeable candidate.
An INNER JOIN combines rows from two or more tables based on a related column, returning only the rows with matches in both tables. Here’s an example using the employees and departments tables:
SELECT employees.employee_id, employees.employee_name, departments.department_name FROM employees INNER JOIN departments ON employees.department_id = departments.department_id;
A LEFT JOIN returns all records from the left table and the matched records from the right table. If there is no match, the result is NULL on the right side. Here’s an example with orders and customers:
SELECT orders.order_id, orders.order_date, customers.customer_name FROM orders LEFT JOIN customers ON orders.customer_id = customers.customer_id;
If there are no matching rows in customers, customer_name will be NULL.
A RIGHT JOIN returns all records from the right table and the matched records from the left table. If there is no match, the result is NULL on the left side. Here’s an example with products and suppliers:
SELECT products.product_id, products.product_name, suppliers.supplier_id, suppliers.supplier_name FROM products RIGHT JOIN suppliers ON products.supplier_id = suppliers.supplier_id;
If a supplier has no associated products, product_id and product_name will be NULL.
A FULL OUTER JOIN returns all records when there is a match in either table. If there is no match, the result is NULL on the side without a match. Here’s an example with students and courses:
SELECT students.student_id, students.student_name, courses.course_id, courses.course_name FROM students FULL OUTER JOIN courses ON students.student_id = courses.student_id;
The result includes all records from both tables, with NULLs where there are no matches.
A self-join compares rows within the same table. To find pairs of employees in the same department:
SELECT e1.employee_id AS Employee1, e2.employee_id AS Employee2, e1.department_id FROM employees e1 JOIN employees e2 ON e1.department_id = e2.department_id WHERE e1.employee_id < e2.employee_id;
This query uses table aliases and ensures each pair is listed only once.
To join users and transactions while handling NULL values, use a LEFT JOIN and COALESCE:
SELECT users.user_id, users.user_name, transactions.transaction_id, COALESCE(transactions.amount, 0) AS amount FROM users LEFT JOIN transactions ON users.user_id = transactions.user_id;
COALESCE replaces NULLs in transactions.amount with 0.
To optimize a join between large tables, ensure columns used in the join condition are indexed. Here’s an example with sales and regions:
CREATE INDEX idx_sales_region_id ON sales(region_id); CREATE INDEX idx_regions_region_id ON regions(region_id); SELECT s.*, r.* FROM sales s JOIN regions r ON s.region_id = r.region_id WHERE s.sale_date >= '2023-01-01';
Indexes on region_id improve performance, and a WHERE clause reduces processed data.
A subquery within a join can filter results. Here’s an example with employees:
SELECT e.employee_id, e.employee_name, d.department_name FROM employees e JOIN ( SELECT department_id FROM departments WHERE location = 'New York' ) d_sub ON e.department_id = d_sub.department_id JOIN departments d ON e.department_id = d.department_id;
The subquery filters departments by location before joining with employees.
To calculate running totals using a window function and a join:
SELECT s1.sales_id, s1.sales_date, s1.amount, SUM(s2.amount) OVER (PARTITION BY s1.sales_id ORDER BY s1.sales_date) AS running_total FROM sales s1 JOIN sales s2 ON s1.sales_id = s2.sales_id AND s1.sales_date >= s2.sales_date ORDER BY s1.sales_date;
The SUM window function calculates the running total of the amount column.
To perform a recursive join on categories to retrieve all subcategories:
WITH RECURSIVE subcategories AS ( SELECT id, name, parent_id FROM categories WHERE id = ? -- Replace ? with the given category ID UNION ALL SELECT c.id, c.name, c.parent_id FROM categories c INNER JOIN subcategories s ON s.id = c.parent_id ) SELECT * FROM subcategories;
This CTE recursively retrieves all subcategories of a given category.
To join projects and tasks using multiple columns:
SELECT p.project_id, p.project_name, t.task_id, t.task_name FROM projects p JOIN tasks t ON p.project_id = t.project_id AND p.manager_id = t.manager_id;
The join uses both project_id and manager_id as conditions.
To handle duplicate rows in joins, use DISTINCT, GROUP BY, or window functions like ROW_NUMBER():
WITH RankedData AS ( SELECT a.*, b.*, ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY b.date DESC) as row_num FROM TableA a JOIN TableB b ON a.id = b.a_id ) SELECT * FROM RankedData WHERE row_num = 1;
ROW_NUMBER() helps filter out duplicates by assigning a unique number to each row.
Indexes improve join performance by allowing quick row location. For example, indexing department_id in employees and departments speeds up joins:
CREATE INDEX idx_employees_department_id ON employees(department_id); CREATE INDEX idx_departments_department_id ON departments(department_id); SELECT e.name, d.department_name FROM employees e JOIN departments d ON e.department_id = d.department_id;
Indexes allow the database engine to efficiently find matching rows.
To achieve complex filtering using joins within subqueries:
SELECT e.employee_id, e.employee_name, d.department_name FROM employees e JOIN ( SELECT department_id, department_name FROM departments WHERE location_id = 100 ) d ON e.department_id = d.department_id WHERE e.salary > 50000;
The subquery filters departments by location, and the main query applies additional conditions.
Common mistakes in join operations include misunderstanding join types, creating Cartesian products, and using ambiguous column names. Ensure proper join conditions and indexing to avoid inefficiencies.
Example of a common mistake and correction:
-- Common mistake: Cartesian product due to missing join condition SELECT * FROM employees, departments; -- Correct approach: Using INNER JOIN with proper join condition SELECT * FROM employees e INNER JOIN departments d ON e.department_id = d.department_id;