What Are the 5 Steps of Root Cause Analysis?

The five steps of root cause analysis (RCA) are: define the problem, collect data, identify possible causes, determine the root cause, and implement and verify a solution. This framework gives you a repeatable way to move past surface-level symptoms and fix the underlying issue so the same problem doesn’t keep coming back. While some organizations expand this into more granular phases, these five steps form the core process used across manufacturing, healthcare, IT, and general business problem-solving.

Step 1: Define the Problem

Before you can find a root cause, you need a precise description of what’s going wrong. This means stating the problem in specific, measurable terms rather than vague complaints. “Customer satisfaction is down” is too broad. “Our Net Promoter Score dropped 12 points among mid-tier accounts over the last quarter” gives you something concrete to investigate.

A good problem statement answers three questions: what is happening, where is it happening, and when did it start? It should also capture the gap between what you expected and what’s actually occurring. Getting this right matters more than it sounds. A vague problem statement sends your entire analysis in the wrong direction, and you end up solving the wrong thing. Spend time here, and involve the people closest to the issue to make sure you’re describing reality, not assumptions.
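As a rough illustration, a problem statement can be captured as a small record that forces you to fill in the what, where, and when, plus the gap between expected and actual. The class name, field names, and the NPS values of 45 and 33 below are assumptions made up for this sketch; only the 12-point drop comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    what: str        # what is happening
    where: str       # where it is happening
    since: str       # when it started
    expected: float  # the value you expected to see
    actual: float    # the value you are actually seeing

    def gap(self) -> float:
        """Shortfall between expectation and reality."""
        return self.expected - self.actual

    def is_complete(self) -> bool:
        """True only if what, where, and when are all filled in."""
        return all(s.strip() for s in (self.what, self.where, self.since))


# Hypothetical values: an NPS of 45 was expected, 33 was observed.
statement = ProblemStatement(
    what="Net Promoter Score dropped",
    where="mid-tier accounts",
    since="last quarter",
    expected=45,
    actual=33,
)

print(statement.is_complete(), statement.gap())
```

A statement that fails the completeness check, or that has no measurable gap, is a sign the problem is still too vague to investigate.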

Step 2: Collect Data

Once you’ve defined the problem, gather the evidence that surrounds it. This includes quantitative data (metrics, logs, timelines, error rates) and qualitative data (interviews, observations, customer feedback). You’re building a factual picture of what happened, in what sequence, and under what conditions.

An “is/is not” comparative analysis is particularly useful here. You map out where the problem does appear and where it doesn’t, when it happens and when it doesn’t, who is affected and who isn’t. These contrasts help you spot patterns. For example, if a product defect shows up only on one production line and only during night shifts, you’ve already narrowed the search dramatically.
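One way to sketch the is/is not comparison in code is to split each dimension of your observations into values where the problem appears and values where it never does. The observation records, line names, and the function name below are hypothetical and exist only to mirror the night-shift example.

```python
# Hypothetical defect observations: (production line, shift, defect found?)
observations = [
    ("line_a", "day", False),
    ("line_a", "night", False),
    ("line_b", "day", False),
    ("line_b", "night", True),
    ("line_c", "day", False),
    ("line_c", "night", False),
]


def is_is_not(records, index):
    """Split one dimension into values where the problem IS and IS NOT seen."""
    is_values = {r[index] for r in records if r[-1]}
    all_values = {r[index] for r in records}
    return is_values, all_values - is_values


lines_is, lines_is_not = is_is_not(observations, 0)
shifts_is, shifts_is_not = is_is_not(observations, 1)

print("Lines:  IS", sorted(lines_is), "| IS NOT", sorted(lines_is_not))
print("Shifts: IS", sorted(shifts_is), "| IS NOT", sorted(shifts_is_not))
```

With this sample data the problem is confined to one line and one shift, which is the kind of contrast that narrows the search.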

Talk to the people involved. In one case study of a ride-hailing company investigating a drop in daily active users in a specific city, the product team planned to interview 50 to 60 drivers who had recently deleted their accounts. Numbers tell you something changed, but conversations with people on the ground tell you why. Resist the temptation to skip this step or rush through it. Thin data leads to guesswork later.

Step 3: Identify Possible Causes

With your data in hand, brainstorm every plausible cause that could explain the problem. This is the step where analytical tools earn their keep.

The fishbone diagram (also called an Ishikawa diagram) is one of the most widely used. You place the problem at the “head” of the fish and draw branches for major cause categories, such as people, processes, equipment, materials, environment, and management. Then you brainstorm specific causes within each branch. This visual structure keeps you from fixating on one category and ignoring others.
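A fishbone diagram is usually drawn by hand or on a whiteboard, but its structure can be sketched as a simple mapping from category to causes. The six categories below come from the text; the problem, the sample causes, and the function names are invented for illustration. Listing the empty branches is one way to notice a category the team has ignored.

```python
CATEGORIES = ["people", "processes", "equipment",
              "materials", "environment", "management"]


def build_fishbone(problem, causes):
    """Group (category, cause) pairs under the six branches.

    An unknown category raises KeyError, which keeps the branches consistent.
    """
    diagram = {"problem": problem, "branches": {c: [] for c in CATEGORIES}}
    for category, cause in causes:
        diagram["branches"][category].append(cause)
    return diagram


def empty_branches(diagram):
    """Return the categories that have no brainstormed causes yet."""
    return [c for c, items in diagram["branches"].items() if not items]


# Hypothetical brainstorming output
fishbone = build_fishbone(
    "Defect rate doubled on one production line",
    [
        ("people", "new hires not trained on changeover"),
        ("equipment", "worn cutting tool"),
        ("equipment", "sensor out of calibration"),
        ("materials", "resin lot out of spec"),
    ],
)

print("Branches still empty:", empty_branches(fishbone))
```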

The 5 Whys technique takes a different approach. You state the problem and ask “why?” repeatedly, with each answer becoming the subject of the next question, until you drill past symptoms into deeper causes. If a machine stopped working, you ask why. The fuse blew. Why? The bearing was overloaded. Why? Insufficient lubrication. Why? The lubrication pump wasn’t functioning. Why? The pump shaft was worn and rattling. Now you’ve reached a cause you can act on through maintenance, instead of just replacing fuses.
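The chain in the machine example can be written down as an ordered list in which each entry answers “why?” for the one before it. This is only a minimal sketch of how a team might record the chain; the function name and output format are assumptions.

```python
# Each entry answers "why?" for the entry before it (chain from the example above).
why_chain = [
    "The machine stopped working",
    "The fuse blew",
    "The bearing was overloaded",
    "Insufficient lubrication",
    "The lubrication pump wasn't functioning",
    "The pump shaft was worn and rattling",
]


def five_whys(chain):
    """Return (question, answer) pairs and the deepest cause reached."""
    steps = [(f"Why? ({effect})", cause)
             for effect, cause in zip(chain, chain[1:])]
    return steps, chain[-1]


steps, deepest = five_whys(why_chain)
for question, answer in steps:
    print(question, "->", answer)
print("Deepest cause reached:", deepest)
```

Six statements yield five “why?” questions. The number five is a guideline rather than a rule, so the list can be shorter or longer depending on when you reach a cause you can act on.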

A Pareto chart helps when you have multiple potential causes and want to see which ones account for the largest share of the problem. It’s a bar graph that ranks causes by frequency or impact, making it easy to identify the vital few versus the trivial many. If three out of twenty possible causes account for 80% of your defects, that’s where your attention should go.
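The ranking behind a Pareto chart is simple arithmetic: sort causes by count, accumulate their share of the total, and stop once the cumulative share reaches a threshold such as 80%. The defect counts and cause names below are made up, and the function is a sketch of the calculation, not a charting tool.

```python
# Hypothetical defect counts per cause
defects = {
    "misaligned jig": 120,
    "worn tool": 90,
    "bad resin lot": 30,
    "operator setup": 6,
    "humidity": 4,
}


def pareto(counts, threshold=0.8):
    """Rank causes by count and return the 'vital few' that reach the threshold share."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    vital, cumulative = [], 0
    for cause, count in ranked:
        if cumulative / total >= threshold:
            break  # the causes collected so far already cover the threshold
        vital.append(cause)
        cumulative += count
    return ranked, vital


ranked, vital = pareto(defects)
print("Ranked:", ranked)
print("Vital few:", vital)
```

In this sample, the top two of five causes account for 210 of 250 defects (84%), so they are where the investigation would focus first.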

The goal of this step is divergent thinking. Generate a wide list before you start narrowing. One of the most common RCA mistakes is using analytical tools to justify a cause the team already decided on before the analysis started. Keep an open mind and let the data guide you.

Step 4: Determine the Root Cause

Now you narrow your list. Test each candidate cause against the evidence you collected in Step 2. A true root cause meets three criteria: removing it would prevent the problem from recurring, it’s something you can actually influence or fix, and it’s supported by evidence rather than just intuition.
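The three criteria can be treated as a checklist applied to every candidate from Step 3. The sketch below assumes the team has already judged each criterion as true or false; the class, the field names, and the sample candidates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    prevents_recurrence: bool  # would removing it stop the problem coming back?
    controllable: bool         # can the team influence or fix it?
    evidence_backed: bool      # does the Step 2 data support it?


def root_causes(candidates):
    """Keep only the candidates that satisfy all three criteria."""
    return [c.name for c in candidates
            if c.prevents_recurrence and c.controllable and c.evidence_backed]


# Hypothetical candidates carried over from brainstorming
candidates = [
    Candidate("unclear work procedure", True, True, True),
    Candidate("bad weather", False, False, True),
    Candidate("team hunch about vendor", True, True, False),
]

confirmed = root_causes(candidates)
print("Confirmed root causes:", confirmed)
```

The hard part is the judgment behind each true or false value. The filter only makes it visible when a favored cause is missing evidence.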

This is where teams often stumble by stopping too early. Blaming human error (“the operator made a mistake”) feels like a root cause, but it rarely is. Why did the operator make a mistake? Was the procedure unclear? Was the interface poorly designed? Was training inadequate? Human error is almost always a symptom of a systemic issue. Push past it.

You may find more than one root cause, and that’s normal. Complex problems often have multiple contributing factors that interact. The ride-hailing company investigating its user drop identified several distinct possibilities: a recent feature release that may have caused issues, software bugs, and changes to the driver incentive program. Each required a different fix, and each qualified as a legitimate root cause for different segments of the problem.

Step 5: Implement and Verify the Solution

Identifying the root cause is only valuable if you act on it. Develop specific corrective actions that address each confirmed root cause, assign ownership, and set timelines. The action plan should distinguish between immediate containment (stopping the bleeding right now) and permanent fixes (preventing the problem from ever returning).

In the ride-hailing example, the team proposed rolling back the problematic feature release with engineering, pushing bug fixes, and working with another team to restore or update the driver incentive program. Each root cause got a targeted response rather than a blanket fix.

Verification is the part most teams skip, and it’s arguably the most important. After you implement your solution, monitor the same metrics you used to define the problem in Step 1. Did the defect rate drop? Did the user numbers recover? Did the error stop recurring? Set a review date, typically 30 to 90 days out depending on the problem, and check whether your fix actually worked. If the problem persists, your analysis probably didn’t go deep enough or settled on the wrong cause, and you need to revisit Steps 3 and 4.
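A verification check can be as simple as comparing the Step 1 metric before and after the fix against the target you set. The weekly defect rates, the target of 1.5, and the function name below are illustrative assumptions, and the sketch assumes a metric where lower is better.

```python
from statistics import mean


def fix_verified(before, after, target):
    """Return True if the average after the fix improved and meets the target.

    Assumes a lower-is-better metric such as a defect rate.
    """
    return mean(after) < mean(before) and mean(after) <= target


# Hypothetical weekly defect rates (%) before and after the corrective action
before_fix = [4.1, 3.9, 4.3]
after_fix = [1.2, 1.0, 1.1]

print("Fix verified:", fix_verified(before_fix, after_fix, target=1.5))
```

A real review would also look at how long the improvement holds, which is why the review date matters as much as the comparison itself.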

Document the entire process: the problem statement, the data you gathered, the causes you considered, the root cause you confirmed, and the results of your fix. This record becomes a reference the next time a similar issue surfaces, and it builds your organization’s institutional knowledge over time.

When to Use Root Cause Analysis

RCA works best for recurring problems, significant one-time failures, or any situation where the obvious fix hasn’t worked. If you’ve solved the same issue three times and it keeps coming back, that’s a signal you’ve been treating symptoms. Manufacturing teams use RCA for quality defects. IT teams use it for system outages. Healthcare organizations use it for patient safety incidents. Product teams use it when key metrics drop unexpectedly.

It’s less useful for problems where the cause is already obvious and the solution is straightforward. You don’t need a fishbone diagram to figure out why the printer isn’t working when it’s unplugged. Save the full five-step process for problems where the cause genuinely isn’t clear, where the stakes justify the time investment, or where you need to build consensus across a team about what went wrong and how to fix it.