Machine downtime is any period when a piece of equipment is not producing output, whether that stoppage is scheduled for maintenance or caused by an unexpected breakdown. For manufacturers and other equipment-dependent businesses, downtime is one of the largest drains on profitability. Even small businesses can lose roughly $427 per minute of downtime, while medium and large operations face estimated costs near $9,000 per minute when you factor in lost production, labor sitting idle, and downstream delays.
Planned vs. Unplanned Downtime
Not all downtime is bad. Planned downtime happens when you deliberately take a machine offline for scheduled maintenance, software upgrades, tooling changes, or inspections. Because it’s anticipated, you can schedule it during off-peak hours, stage replacement parts in advance, and notify everyone affected. It’s a controlled cost that protects equipment longevity.
Unplanned downtime is the expensive kind. It strikes without warning when a component fails, a power outage hits, or an operator makes an error that halts the line. Because no one saw it coming, the response is reactive: technicians scramble to diagnose the problem, parts may need to be rush-ordered, and production schedules collapse. The goal of most downtime-reduction strategies is to convert as much unplanned downtime as possible into planned downtime by catching problems before they escalate.
What Causes Unplanned Downtime
Equipment failure is the most common trigger. Machines wear down over time, especially in harsh environments or when they’re pushed beyond rated capacity. Misuse, such as overloading a motor or skipping the manufacturer’s recommended operating procedures, accelerates that wear. In rarer cases, a manufacturing defect in the equipment itself is the culprit.
Human error is a close second. Operators may configure settings incorrectly, skip a step in a startup procedure, or troubleshoot a problem in a way that creates a new one. Maintenance personnel sometimes use the wrong tools or techniques during repairs, or simply forget to perform routine upkeep. Even documentation errors, like outdated procedures or unclear instructions posted at a workstation, can lead to mistakes that take a machine down.
Power outages, whether from severe weather, utility maintenance, or an accidentally severed line, shut down anything that runs on electricity. In manufacturing specifically, several less obvious causes add up: unbalanced tool assemblies that force operators to slow spindle speeds, poorly managed coolant that shortens tool life, disorganized tool cribs that leave workers waiting on parts, and leftover program stops in CNC code that idle a machine until someone walks over and pushes the start button.
Calculating the Cost
The simplest way to estimate what downtime costs your operation is a straightforward formula: multiply the number of minutes the machine was down by your cost per minute. That cost-per-minute figure should include lost production revenue, wages paid to idle workers, any expedited shipping for rush parts, scrap produced during startup, and penalties for late deliveries to customers.
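The formula can be sketched in a few lines of Python. This is a minimal illustration, not a costing standard: the function names, the idea of deriving the per-minute rate from one period's historical totals, and every dollar figure are assumptions made for the example.

```python
def cost_per_minute(lost_revenue: float, idle_wages: float,
                    expedited_shipping: float, startup_scrap: float,
                    late_penalties: float, minutes_down: float) -> float:
    """Derive a per-minute rate from one period's downtime-related totals."""
    if minutes_down <= 0:
        raise ValueError("minutes_down must be positive")
    total = (lost_revenue + idle_wages + expedited_shipping
             + startup_scrap + late_penalties)
    return total / minutes_down


def downtime_cost(minutes_down: float, rate_per_minute: float) -> float:
    """Minutes the machine was down multiplied by the cost per minute."""
    if minutes_down < 0:
        raise ValueError("minutes_down cannot be negative")
    return minutes_down * rate_per_minute


# Hypothetical quarter: 600 minutes of downtime and $120,000 in related costs.
rate = cost_per_minute(lost_revenue=90_000, idle_wages=12_000,
                       expedited_shipping=3_000, startup_scrap=4_500,
                       late_penalties=10_500, minutes_down=600)
print(rate)                     # 200.0 dollars per minute
print(downtime_cost(45, rate))  # 9000.0 dollars for a 45-minute stoppage
```

Deriving the rate from your own records, as assumed here, keeps one-off charges such as rush shipping and late penalties inside the per-minute figure instead of leaving them out of the estimate.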
For a rough benchmark, Atlassian’s framework suggests using $427 per minute for small businesses and $9,000 per minute for medium to large enterprises. Your actual figure will depend on how much revenue flows through the affected machine. A bottleneck machine that feeds every downstream process will cost far more per minute than a redundant unit with a backup sitting nearby. Tracking your real cost per minute, rather than relying on industry averages, gives you a concrete number to justify spending on preventive maintenance or backup equipment.
Metrics That Track Downtime Performance
Four metrics help you measure how often machines fail and how quickly your team recovers:
- Mean time between failures (MTBF) is the average operating time between breakdowns on a repairable machine. A rising MTBF means your equipment is running longer before something goes wrong.
- Mean time to failure (MTTF) applies to components that get replaced rather than repaired, like bearings or seals. It tells you the average lifespan of that part so you can schedule replacements before failure.
- Mean time to respond (MTTR) measures the average time from when an alert fires to when a technician begins working on the problem. A long response time often points to staffing gaps or poor notification systems.
- Mean time to resolve (MTTR, the same abbreviation as mean time to respond, so spell out which one you mean in reports) covers the full timeline from failure to the machine running normally again, including diagnosis, repair, testing, and any steps taken to prevent recurrence. This is the number that directly correlates with lost production.
Calculating any of these is simple: total up the relevant time spans across a period (a month or quarter) and divide by the number of incidents. Many facilities also track Overall Equipment Effectiveness (OEE), a composite score that multiplies three ratios, availability, performance (speed), and quality, into a single percentage. An OEE of 100% would mean the machine ran at full speed for the entire scheduled time with zero defects, so the gap between your actual OEE and 100% represents the total opportunity lost to downtime, slowdowns, and scrap.
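As a rough sketch of those calculations, the Python below averages response and resolution times, derives MTBF from scheduled time, and computes OEE as availability times performance times quality. The `Incident` record, the function names, and the sample shift are hypothetical, and the sketch assumes the alert fires at the moment of failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class Incident:
    alerted_at: datetime       # alert fired (assumed to coincide with the failure)
    work_started_at: datetime  # technician began working on the problem
    resolved_at: datetime      # machine running normally again


def mean_minutes(spans: Sequence[timedelta]) -> float:
    """Total the time spans and divide by the number of incidents."""
    if not spans:
        raise ValueError("need at least one incident")
    return sum(s.total_seconds() for s in spans) / 60 / len(spans)


def mean_time_to_respond(incidents: Sequence[Incident]) -> float:
    return mean_minutes([i.work_started_at - i.alerted_at for i in incidents])


def mean_time_to_resolve(incidents: Sequence[Incident]) -> float:
    return mean_minutes([i.resolved_at - i.alerted_at for i in incidents])


def mtbf(scheduled_minutes: float, incidents: Sequence[Incident]) -> float:
    """Operating time (scheduled time minus downtime) per failure."""
    if not incidents:
        raise ValueError("need at least one incident")
    downtime = sum((i.resolved_at - i.alerted_at).total_seconds()
                   for i in incidents) / 60
    return (scheduled_minutes - downtime) / len(incidents)


def oee(planned_minutes: float, downtime_minutes: float,
        ideal_cycle_minutes: float, total_count: int, good_count: int) -> float:
    """Availability x performance x quality, returned as a fraction of 1."""
    run_time = planned_minutes - downtime_minutes
    availability = run_time / planned_minutes
    performance = ideal_cycle_minutes * total_count / run_time
    quality = good_count / total_count
    return availability * performance * quality


# Hypothetical 500-minute shift with two stoppages (60 and 40 minutes).
shift = [
    Incident(datetime(2024, 1, 8, 8, 0), datetime(2024, 1, 8, 8, 10),
             datetime(2024, 1, 8, 9, 0)),
    Incident(datetime(2024, 1, 8, 13, 0), datetime(2024, 1, 8, 13, 20),
             datetime(2024, 1, 8, 13, 40)),
]
print(mean_time_to_respond(shift))  # 15.0 minutes
print(mean_time_to_resolve(shift))  # 50.0 minutes
print(mtbf(500, shift))             # 200.0 minutes
print(round(oee(500, 100, 0.5, 720, 684), 3))  # 0.684
```

MTTF is left out because it needs component lifespans rather than incident timestamps, but the same total-and-divide pattern would apply.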
Strategies to Reduce Downtime
The most basic defense is a consistent preventive maintenance schedule. Keeping filters clean, topping off lubricant and hydraulic fluid reservoirs, maintaining correct coolant ratios, and performing regular calibration checks all extend equipment life and catch small problems before they become emergency shutdowns.
Changeover time, the period spent switching a machine from one product setup to another, is often the single greatest improvement opportunity for shops running a high mix of parts. Standardizing setup procedures, pre-staging tooling, and using quick-change fixtures can cut changeover durations significantly.
Predictive maintenance goes a step further by using sensor data to detect trouble in real time. Vibration sensors, temperature probes, and pressure monitors installed on critical equipment feed data to software platforms that spot abnormal patterns, like a bearing vibration signature that’s drifting out of spec, before an actual failure occurs. Machine learning models trained on historical failure data improve the accuracy of these predictions over time. Edge computing, where data is processed on a device right at the machine rather than sent to a remote server, speeds up fault detection so maintenance teams can respond faster.
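To make the idea of spotting an abnormal pattern concrete, here is a deliberately simple stand-in for what such platforms do: a rolling z-score check that flags a reading when it sits far outside the recent window. It is a plain statistical threshold, not a machine learning model, and the function name, window size, threshold, and vibration values are all assumptions for illustration.

```python
from collections import deque
from statistics import mean, stdev
from typing import Iterable, List


def flag_drift(readings: Iterable[float], window: int = 20,
               z_threshold: float = 3.0) -> List[int]:
    """Return the indexes of readings far outside the trailing window."""
    history = deque(maxlen=window)  # most recent `window` readings
    flagged = []
    for index, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                flagged.append(index)
        history.append(value)
    return flagged


# Hypothetical bearing vibration amplitudes (mm/s); the last one jumps.
vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 2.0]
print(flag_drift(vibration, window=6))  # [7]
```

Because the check only needs the last few readings, it is light enough to run on a small device at the machine, which is the kind of workload edge computing is suited to.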
Digital twin technology creates a virtual replica of a physical machine or production line, allowing engineers to simulate scenarios and test “what if” questions without risking real equipment. If a sensor flags rising temperatures on a motor, the digital twin can model whether the machine can safely finish its current run or needs to be stopped immediately. Organizations managing large fleets of assets, from airport maintenance operations to power plant turbines, use these platforms to coordinate work across tens of thousands of components while tracking compliance and safety requirements in the same system.
Organizing Tool and Parts Access
Downtime caused by missing or mismanaged tooling is entirely preventable but surprisingly common. When operators can’t find the right insert, when someone forgets to reorder a critical cutter, or when obsolete inventory clutters the tool crib, production stalls while people search or wait for overnight shipments. Automated tool vending systems and inventory management software track what’s on hand, flag reorder points, and eliminate time spent hunting for parts. They also reduce excess inventory and procurement costs, freeing up budget for higher-impact investments.
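The reorder-point check such software performs can be approximated with the common rule that stock should be reordered once it falls to expected usage during the supplier lead time plus a safety buffer. The sketch below assumes that rule; the `ToolStock` record, part numbers, and quantities are hypothetical and not taken from any particular vending or inventory product.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToolStock:
    part_number: str
    on_hand: int
    daily_usage: float     # average units consumed per day
    lead_time_days: float  # supplier lead time
    safety_stock: int = 0  # buffer held against late deliveries

    @property
    def reorder_point(self) -> float:
        """Usage expected during the lead time, plus the safety buffer."""
        return self.daily_usage * self.lead_time_days + self.safety_stock


def items_to_reorder(stock: Sequence[ToolStock]) -> List[str]:
    """Part numbers whose on-hand quantity is at or below the reorder point."""
    return [s.part_number for s in stock if s.on_hand <= s.reorder_point]


# Hypothetical tool crib contents.
crib = [
    ToolStock("INSERT-A", on_hand=12, daily_usage=4, lead_time_days=5,
              safety_stock=5),   # reorder point 25, so it is flagged
    ToolStock("ENDMILL-B", on_hand=40, daily_usage=2, lead_time_days=3,
              safety_stock=4),   # reorder point 10, so it is not
]
print(items_to_reorder(crib))  # ['INSERT-A']
```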
What Downtime Means for Your Bottom Line
Beyond the direct cost of lost production, frequent unplanned downtime erodes customer satisfaction through missed delivery dates, increases overtime expenses when crews work extra shifts to catch up, and shortens equipment lifespan when machines are repeatedly stress-cycled through emergency stops and rushed restarts. Tracking your MTBF, mean time to resolve, and cost per minute gives you the data to prioritize which machines need attention first. The machines with the highest cost per minute of downtime and the lowest MTBF are where preventive and predictive maintenance investments pay off fastest.
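One possible way to turn those three numbers into a priority list is to estimate the downtime cost each machine incurs per operating hour: cost per minute times mean time to resolve, divided by MTBF. That scoring rule is an assumption chosen for illustration, as are the machine names and figures below; other weightings are equally defensible.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MachineStats:
    name: str
    cost_per_minute: float       # dollars lost per minute of downtime
    mtbf_hours: float            # mean operating hours between failures
    mean_resolve_minutes: float  # mean time to resolve, in minutes

    @property
    def cost_per_operating_hour(self) -> float:
        """Expected downtime cost accrued per hour of operation."""
        return self.cost_per_minute * self.mean_resolve_minutes / self.mtbf_hours


def rank_by_downtime_exposure(machines: Sequence[MachineStats]) -> List[MachineStats]:
    """Highest expected downtime cost first."""
    return sorted(machines, key=lambda m: m.cost_per_operating_hour, reverse=True)


# Hypothetical fleet.
fleet = [
    MachineStats("lathe", cost_per_minute=50, mtbf_hours=400, mean_resolve_minutes=40),
    MachineStats("press", cost_per_minute=200, mtbf_hours=100, mean_resolve_minutes=50),
    MachineStats("mill", cost_per_minute=120, mtbf_hours=60, mean_resolve_minutes=30),
]
for machine in rank_by_downtime_exposure(fleet):
    print(machine.name, machine.cost_per_operating_hour)
# press 100.0
# mill 60.0
# lathe 5.0
```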

