50 SQL Interview Questions and Answers
Prepare for your next interview with our comprehensive guide to SQL interview questions, designed to enhance your data management skills.
Prepare for your next interview with our comprehensive guide to SQL interview questions, designed to enhance your data management skills.
SQL (Structured Query Language) is the backbone of relational database management and is essential for data manipulation and retrieval. It is widely used across various industries for tasks such as data analysis, reporting, and backend development. SQL’s ability to handle large datasets efficiently makes it a critical skill for roles involving data management and business intelligence.
This article offers a curated selection of SQL interview questions designed to test your understanding and proficiency. By working through these questions, you will gain the confidence and knowledge needed to demonstrate your SQL expertise in any technical interview setting.
To select all columns from a table named ’employees’ where the ‘department’ is ‘Sales’, use:
SELECT * FROM employees WHERE department = 'Sales';
This query retrieves all columns from the ’employees’ table where the ‘department’ is ‘Sales’.
In SQL, joins combine rows from two or more tables based on a related column. The main types are:
Example:
SELECT employees.name, departments.department_name FROM employees INNER JOIN departments ON employees.department_id = departments.id;
To count employees in each department:
SELECT department_id, COUNT(employee_id) AS employee_count FROM employees GROUP BY department_id;
To find departments with more than 10 employees:
SELECT d.department_name, COUNT(e.employee_id) AS employee_count FROM departments d JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name HAVING COUNT(e.employee_id) > 10;
To find employees with the highest salary in their department:
SELECT employee_id, department_id, salary FROM ( SELECT employee_id, department_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as rn FROM employees ) subquery WHERE rn = 1;
To join ’employees’ and ‘departments’ on ‘department_id’:
SELECT employees.employee_id, employees.employee_name, departments.department_name FROM employees JOIN departments ON employees.department_id = departments.department_id;
To combine results from two queries using UNION:
SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2;
To update the salary of an employee with a specific ’employee_id’:
UPDATE employees SET salary = 75000 WHERE employee_id = 12345;
A foreign key is a field in one table that uniquely identifies a row of another table, maintaining referential integrity between the two tables.
Example:
CREATE TABLE Customers ( CustomerID int PRIMARY KEY, CustomerName varchar(255) ); CREATE TABLE Orders ( OrderID int PRIMARY KEY, OrderDate date, CustomerID int, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) );
Normalization organizes data to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related ones and defining relationships between them.
There are several normal forms, each with specific rules:
Normalization reduces redundancy, enhances data integrity, and improves query performance.
ACID properties ensure reliable database transactions:
Stored procedures are precompiled SQL statements stored under a name and processed as a unit. They perform repetitive tasks, enforce business rules, and improve performance.
Advantages include:
Example:
CREATE PROCEDURE GetEmployeeDetails @EmployeeID INT AS BEGIN SELECT FirstName, LastName, Department FROM Employees WHERE EmployeeID = @EmployeeID; END; -- To execute the stored procedure EXEC GetEmployeeDetails @EmployeeID = 1;
Views are virtual tables defined by a SQL query. They do not store data physically but retrieve it dynamically from underlying tables.
Benefits include:
Differences from tables:
Triggers are stored procedures that execute automatically in response to certain events on a table, such as insertions, updates, or deletions.
Example:
CREATE TRIGGER update_timestamp BEFORE UPDATE ON employees FOR EACH ROW SET NEW.last_modified = NOW();
Constraints specify rules for data in a table, enforcing data integrity. Common types include:
Example:
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50) NOT NULL, LastName VARCHAR(50) NOT NULL, Email VARCHAR(100) UNIQUE, Salary DECIMAL(10, 2) CHECK (Salary > 0), DepartmentID INT, FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID) );
SQL databases use structured query language and are table-based, ideal for applications requiring multi-row transactions. NoSQL databases are non-relational, storing data in various formats, suitable for real-time analytics and handling large volumes of unstructured data.
Key Differences:
Use Cases:
A Database Administrator (DBA) manages databases, ensuring availability, integrity, and security. Key responsibilities include:
A data warehouse is a centralized repository for storing large volumes of data from multiple sources, optimized for query and analysis. It supports business intelligence activities, providing a single source of truth for accurate reporting and analysis.
Key benefits include:
ETL involves three main steps:
1. Extract: Retrieving data from various sources.
2. Transform: Cleaning and formatting data for analysis.
3. Load: Loading transformed data into a target repository.
ETL ensures data consistency, quality, and efficiency in data integration.
OLAP and OLTP systems serve different purposes:
OLAP systems are designed for complex queries and data analysis, supporting decision-making processes. They handle large volumes of data and enable multi-dimensional analysis.
Key characteristics of OLAP:
OLTP systems manage transactional data, optimized for write-heavy operations. They support day-to-day transactional processes and ensure data integrity through ACID properties.
Key characteristics of OLTP:
Use cases:
Database security protects sensitive information from unauthorized access and breaches. Measures include:
Database replication copies and maintains database objects in multiple instances, ensuring data consistency and availability. Types include:
Sharding splits a large database into smaller pieces called shards, distributing the load across multiple servers. Partitioning divides a single database into smaller pieces called partitions, improving query performance and manageability.
Common database indexing strategies include:
Database locking mechanisms maintain data integrity and consistency. Locks prevent concurrent transactions from interfering but can impact performance and concurrency.
Types of locks:
Strategies to mitigate issues:
Database tuning and optimization ensure efficient performance. Techniques include:
Metadata describes data structure, organization, and constraints. It aids in data management, integrity, query optimization, discovery, and documentation.
Data quality impacts the reliability of insights derived from data. Practices to ensure quality include:
Database migration involves transferring data from one database to another. Strategies include:
Big data technologies handle large volumes, high velocity, and a variety of data types. They offer scalability, flexibility, performance, cost-effectiveness, and real-time processing.
Cloud-based database services offer scalability, cost-efficiency, high availability, automatic updates, global accessibility, and security.
GDPR and data privacy regulations impact database management by imposing requirements on data storage, access, anonymization, subject rights, breach notification, and compliance documentation.
Tools for database auditing and monitoring include:
Machine learning in database management and analytics includes:
Real-time data processing requires databases to handle continuous input and provide immediate outputs. Implications for design include scalability, low latency, consistency, fault tolerance, data partitioning, indexing, and concurrency control.
Data lineage is important for data quality, compliance, impact analysis, governance, and troubleshooting.
Graph databases use graph structures to represent and store data, allowing efficient querying and analysis of complex relationships. Use cases include social networks, recommendation engines, fraud detection, network operations, and knowledge graphs.
Blockchain technology in database management offers immutable records, decentralization, transparency, smart contracts, and enhanced security.
Multi-model databases handle various data models within a single system, offering flexibility, reduced complexity, cost efficiency, consistency, and scalability.
Artificial intelligence impacts database management by automating tasks, optimizing performance, providing predictive analytics, detecting anomalies, enabling natural language processing, and assisting in data cleaning and integration.
Database virtualization abstracts the database layer, allowing multiple virtual databases on a single server. Benefits include resource optimization, cost efficiency, scalability, improved testing, disaster recovery, and isolation.
Data lakes store vast amounts of raw data in its native format, supporting big data analytics and machine learning. Benefits include scalability, flexibility, cost-effectiveness, and advanced analytics.
Serverless databases are managed by cloud providers, offering automatic scaling, cost efficiency, reduced operational overhead, high availability, and a focus on development.
Data cataloging improves data discovery, governance, collaboration, efficiency, and democratization.
Future trends in database technology include:
A subquery is a query nested inside another query. Subqueries can be correlated or non-correlated.
A non-correlated subquery is independent of the outer query and can be executed on its own.
Example:
SELECT employee_id, employee_name FROM employees WHERE department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');
A correlated subquery depends on the outer query for its values and is executed once for each row processed by the outer query.
Example:
SELECT e1.employee_id, e1.employee_name FROM employees e1 WHERE EXISTS ( SELECT 1 FROM employees e2 WHERE e2.manager_id = e1.employee_id );
Window functions perform calculations across a set of table rows related to the current row, useful for tasks like ranking and running totals.
Example of a running total:
SELECT employee_id, salary, SUM(salary) OVER (ORDER BY employee_id) AS running_total FROM employees;
Example of ranking:
SELECT employee_id, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank FROM employees;
Optimizing a slow-running query involves strategies like indexing, query structure improvement, database design, execution plan analysis, and caching.
Example:
-- Without Index SELECT * FROM orders WHERE customer_id = 123; -- With Index CREATE INDEX idx_customer_id ON orders(customer_id); SELECT * FROM orders WHERE customer_id = 123;
The DELETE, TRUNCATE, and DROP commands remove data from a database but differ in purpose and behavior.
DELETE: Removes specific rows based on a condition. It is a DML command and can be rolled back.
TRUNCATE: Removes all rows from a table. It is a DDL command and cannot be rolled back.
DROP: Removes an entire table or database. It is a DDL command and cannot be rolled back.
Common Table Expressions (CTEs) simplify complex queries by breaking them into smaller parts. Defined using the WITH keyword, CTEs can be recursive or non-recursive.
Example of a non-recursive CTE:
WITH Sales_CTE AS ( SELECT SalesPersonID, SUM(SalesAmount) AS TotalSales FROM Sales GROUP BY SalesPersonID ) SELECT SalesPersonID, TotalSales FROM Sales_CTE WHERE TotalSales > 10000;
Example of a recursive CTE:
WITH EmployeeHierarchy AS ( SELECT EmployeeID, ManagerID, 0 AS Level FROM Employees WHERE ManagerID IS NULL UNION ALL SELECT e.EmployeeID, e.ManagerID, eh.Level + 1 FROM Employees e INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID ) SELECT EmployeeID, ManagerID, Level FROM EmployeeHierarchy;