“Over the past twenty years, maintenance has changed, perhaps more so than any other management discipline. The changes are due to a huge increase in the complexity, number and variety of physical assets which must be maintained throughout the world….”

“RCM provides a framework which enables us to respond to these modern day challenges, quickly and simply, with a comprehensive, zero-based review of the maintenance requirements of each asset in its operating context.”

John Moubray, Maintenance Guru

RCM focuses efforts on avoiding, eliminating and minimising the consequences of failure, rather than maintaining an asset merely because the OEM’s schedule says so. Where a failure has only minor consequences or risks, the asset is simply repaired when it fails. This makes economic sense.

“A process used to determine what must be done to ensure that any physical asset continues to do what its users want it to do in its present operating context”

…therefore we could have three different maintenance strategies for an identical asset, depending on how it is being used:

• Periodic overhaul for a continuous running pump
• On-failure maintenance for a pump with stand-by available
• Periodic function test for a stand-by pump

In this way we maintain an asset according to its use and need, which is both economical and effective.
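The pump example can be sketched in code. This is a minimal illustration only: the names `OperatingContext` and `choose_strategy` are hypothetical, and the lookup covers just the three contexts listed above. A real RCM analysis would derive the strategy from the failure consequences rather than from a fixed table.

```python
from enum import Enum


class OperatingContext(Enum):
    """Hypothetical labels for the three pump contexts described above."""
    CONTINUOUS_DUTY = "continuous running pump"
    DUTY_WITH_STANDBY = "pump with stand-by available"
    STANDBY = "stand-by pump"


# Lookup taken directly from the three bullet points; not a general rule.
STRATEGY_BY_CONTEXT = {
    OperatingContext.CONTINUOUS_DUTY: "periodic overhaul",
    OperatingContext.DUTY_WITH_STANDBY: "on-failure maintenance",
    OperatingContext.STANDBY: "periodic function test",
}


def choose_strategy(context: OperatingContext) -> str:
    """Return the maintenance strategy for an identical pump in a given context."""
    return STRATEGY_BY_CONTEXT[context]


for ctx in OperatingContext:
    print(f"{ctx.value}: {choose_strategy(ctx)}")
```

The point of the sketch is that the asset type never appears as an input: only the operating context decides the strategy.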


Since the 1940s, the evolution of maintenance can be traced through three generations.

Development of RCM - Airline Industry:

The airline industry must be recognised as the birthplace of RCM. Historically, maintenance was based on the principle that every asset had a ‘right’ age for overhaul to ensure safety and reliability, irrespective of its use. During the 1960s a task force, including the FAA and the airlines, was set up to investigate the capabilities of preventive maintenance. The discoveries were striking:

• Scheduled maintenance had little effect on overall reliability unless there was a dominant failure mode.
• Many assets have no effective scheduled maintenance need.

This contradicted the conventional thinking that:

• The asset is more likely to fail, the older it gets
• The more maintenance the better the asset is protected

The first major improvement came from the task force’s first report, MSG-1 [Maintenance Steering Group-1], which defined the basis of what is now known as Reliability-centred Maintenance. A subsequent report, MSG-2, was prepared two years later incorporating a decision diagram, and this formed the basis of modern-day RCM.

This approach had amazing benefits in airline safety and cost:

• Accidents reduced from 60 to 2 per million take-offs
• Maintenance costs reduced from 4 million to 66,000 man-hours of structural maintenance for the first 20,000 flying hours of the Boeing 747 / DC8 aircraft

Developing RCM within your organization


Precursor: As with all new initiatives, the basics of change management need to be in place prior to and during the process of introducing RCM. Please refer to the relevant section on this website.

At a fundamental level, the introduction of RCM within your organization is best carried out by answering seven questions about the asset and/or its system under review. They are as follows:

Functions and Performance Standards [F]: Question 1

The first thing to do is ensure that the physical asset continues to do whatever its users want it to do in its present operating context, by:

• Determining what its users want it to do.
• Ensuring that it is capable of doing what its users want.

This is why the first step in the RCM process defines the functions of each asset in its operating context, together with the associated desired performance standards. There are two types of functions:

• Primary functions - why the asset was installed in the first place [speed, output, carrying or storage capacity, product quality, customer service etc].
• Secondary functions - which recognize that every asset is expected to do more than simply fulfill its primary functions [safety, control, containment, comfort, structural integrity, economy, protection, efficiency of operation, compliance with environmental regulations, appearance etc]

Functional Failures [FF]: Question 2

The only factor stopping any asset from performing to the standard required by its users is some kind of failure. This means that maintenance achieves its objectives by adopting an effective strategy for failure avoidance. First, however, we need to identify what failures can occur.

In the world of RCM, failed states are known as functional failures because they occur when an asset is unable to fulfill a function to a standard of performance which is acceptable to the user.

Failure Modes [FM]: Question 3

Once functional failures have been identified, the next step is to identify all the events which are reasonably likely to cause each failed state.

These events are known as failure modes. "Reasonably likely" causes include:

• those which have occurred on the same or similar equipment operating in the same context
• failures which are currently being prevented by existing maintenance procedures
• failures which have not happened yet but which are considered to be real possibilities

These will include:
• failures caused by deterioration or normal wear and tear.
• failures caused by human errors (on the part of operators and maintainers)
• design flaws

Failure Effects [FE]: Question 4

The fourth step is to list the failure effects, which describe what happens when each failure mode occurs. This should include all the information needed to support the evaluation of the consequences of the failure, such as:

• what is the evidence that the failure has occurred
• in what ways does it pose a threat to safety or the environment
• in what ways does it affect production or operations
• what physical damage is caused by the failure
• what must be done to repair the failure.

The process of identifying functions, failures, causes and their effects generates surprising and often very exciting opportunities for improving performance [reducing cost] and safety, as well as eliminating waste.

Failure Consequences [FC]: Question 5

The strength of RCM is that it recognizes that the consequences of failures are far more important than their technical characteristics.

In fact, it recognizes that the only reason for doing any kind of proactive maintenance is not to avoid failures per se, but to avoid or at least to reduce the consequences of failure.

The RCM process classifies these consequences into the following groups:

Hidden Function – failure will NOT become evident to operators under normal circumstances if it occurs on its own

Evident Function – failure will become evident to operators under normal circumstances with four types of consequences:

• Safety: a failure mode has safety consequences if it causes a loss of function or other damage which could injure or kill someone.
• Environmental: a failure mode has environmental consequences if it causes a loss of function or other damage which could lead to the breach of any known environmental standard or regulation.
• Operational: a failure has operational consequences if it has a direct effect on operational capability
• Non-operational: any evident failure not included above.
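The grouping above can be written as a short decision function. This is a sketch under stated assumptions: the `FailureMode` record and its field names are hypothetical, and checking the evident categories in the order they are listed (safety, then environmental, then operational) is an assumption about precedence rather than something the text states.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """Hypothetical record of the facts needed to classify one failure mode."""
    name: str
    evident_to_operators: bool                  # evident on its own under normal circumstances
    could_injure_or_kill: bool
    could_breach_environmental_standard: bool
    affects_operational_capability: bool


def classify_consequence(fm: FailureMode) -> str:
    """Return the consequence group for a failure mode."""
    if not fm.evident_to_operators:
        return "hidden"
    if fm.could_injure_or_kill:
        return "safety"
    if fm.could_breach_environmental_standard:
        return "environmental"
    if fm.affects_operational_capability:
        return "operational"
    # Any evident failure not covered by the categories above.
    return "non-operational"


# Illustrative inputs only; the assets and answers are made up.
stuck_relief_valve = FailureMode("relief valve stuck shut", False, True, False, False)
seized_duty_pump = FailureMode("duty pump bearing seized", True, False, False, True)

print(classify_consequence(stuck_relief_valve))  # hidden
print(classify_consequence(seized_duty_pump))    # operational
```

Note that a hidden failure is classified as hidden even if the related multiple failure could hurt someone, because the first question is whether operators would notice the failure on its own.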

RCM requires the failure consequences to be described for every failure mode, stating in what way the failure matters in each of the areas listed above.

Consequence Risk Assessment:

It is important to be able to assess the severity and importance of the consequences of failure. This can be done analytically using an RPN [risk priority number].

This attaches a numerical value to each significant failure, reflecting its probability relative to its size and priority. Three factors are rated, each on a scale of 1 to 10: Occurrence, Severity and Detection.
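The text names the three ratings but does not state how they are combined. In common FMEA practice the RPN is the product of the three, and the sketch below assumes that convention. The function name, the failure modes and their ratings are all made up for illustration.

```python
def risk_priority_number(occurrence: int, severity: int, detection: int) -> int:
    """Multiply the three 1-10 ratings (assumed FMEA convention: RPN = O x S x D)."""
    ratings = (("occurrence", occurrence), ("severity", severity), ("detection", detection))
    for label, rating in ratings:
        if not 1 <= rating <= 10:
            raise ValueError(f"{label} rating must be between 1 and 10, got {rating}")
    return occurrence * severity * detection


# Hypothetical failure modes with invented (occurrence, severity, detection) ratings.
failure_modes = {
    "bearing seizes": (4, 8, 3),
    "seal leaks": (6, 5, 2),
    "coupling shears": (2, 9, 7),
}

# Rank the failure modes from highest to lowest RPN.
ranked = sorted(
    failure_modes,
    key=lambda name: risk_priority_number(*failure_modes[name]),
    reverse=True,
)
print(ranked)  # ['coupling shears', 'bearing seizes', 'seal leaks']
```

With these invented ratings the rarely occurring coupling failure ranks first (RPN 126) because it is both severe and hard to detect, which is the kind of priority the ranking is meant to expose.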

RCM aims to reduce the RPN as part of its objective.

This is an example of what the RCM team could use to identify priorities alongside consequence, RPN and PCM [planned condition monitoring] maintenance policy

Proactive Tasks: Question 6

Many people still believe that the best way to optimize plant availability is proactive maintenance on a routine basis. Second Generation wisdom suggested that this should consist of overhauls or component replacements at fixed intervals.

This diagram shows the fixed interval view of failure.

The assumption is that most assets function reliably for a period of time, and then wear out. Classical thinking suggests that extensive records about failure will enable us to determine this life and so make plans to take preventive action shortly before the item is due to fail in future.

But this model is only true for certain types of simple equipment, and some complex items with dominant failure modes. In particular, wear-out characteristics are often found where equipment comes into direct contact with the product. Age-related failures are also often associated with fatigue, corrosion, abrasion and evaporation.

However, equipment in general is far more complex than it was twenty years ago. This has led to startling changes in the patterns of failure, as shown in the diagrams above.

Approximately 82% of failures conform to failure patterns E and F, which are not age related!

Studies done on civil aircraft showed that 4% of the items conformed to pattern A, 2% to B, 5% to C, 7% to D, 14% to E and no fewer than 68% to pattern F.
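The quoted percentages can be checked with simple arithmetic. The sketch below uses only the figures given above; the helper name `combined_share` is hypothetical.

```python
# Percentage of items conforming to each failure pattern, as quoted above.
PATTERN_SHARE = {"A": 4, "B": 2, "C": 5, "D": 7, "E": 14, "F": 68}


def combined_share(patterns: str) -> int:
    """Sum the percentage share of the given pattern letters, e.g. "EF"."""
    return sum(PATTERN_SHARE[p] for p in patterns)


print(combined_share("ABCDEF"))  # 100
print(combined_share("EF"))      # 82
print(combined_share("ABC"))     # 11
```

Patterns E and F together account for the 82% figure, while patterns A, B and C together cover only 11% of items.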

Pattern A is the well-known bathtub curve. It begins with a high incidence of failure (known as infant mortality) followed by a constant or gradually increasing conditional probability of failure, then by a wear-out zone.

Pattern B shows constant or slowly increasing conditional probability of failure, ending in a wear-out zone (the same as the above diagram).

Pattern C shows slowly increasing conditional probability of failure, but there is no identifiable wear-out age.

Pattern D shows low conditional probability of failure when the item is new or just out of the shop, then a rapid increase to a constant level.

Pattern E shows a constant conditional probability of failure at all ages (random failure).

Pattern F starts with high infant mortality, which drops eventually to a constant or very slowly increasing conditional probability of failure.

Proactive Maintenance: Ideas and Guidance

Use the consequences and risk facts to define the type and frequency of proactive maintenance tasks that must be carried out before failure occurs. These include:

• Predictive maintenance, i.e. on-condition maintenance: it is important to know what happens once a failure has started to occur [use the P-F Curve]
• Preventive maintenance, i.e. scheduled repair, replacement and overhaul: based on the relationship between age and likelihood of failure

Although many failure modes are not age related, most give some warning that they are about to occur. On-condition monitoring can then be used to decide when to take avoiding action.

Predictive Maintenance: Ideas and Guidance

Always choose predictive before preventive maintenance because it:

• Can be done on-line without affecting operations
• Identifies corrective action before preventive work starts
• Enables the asset to realize most of its working life

This means carrying out on-condition monitoring [so called because the equipment is left in service after inspection on the condition that it will continue to perform]. This is technically feasible if:

• It is possible to define a clear potential failure condition using the P-F Curve
• The P-F interval is reasonably consistent
• It is practicable to monitor the equipment at intervals less than the P-F interval.
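The three feasibility criteria can be expressed as a simple check. This is a sketch, not a definitive test: the function name and parameters are hypothetical, and "reasonably consistent" is interpreted here as the spread of observed P-F intervals staying within 25% of their mean, which is an assumed threshold chosen only for illustration.

```python
def on_condition_task_is_feasible(
    has_clear_potential_failure: bool,
    pf_intervals_days: list[float],
    monitoring_interval_days: float,
    max_relative_spread: float = 0.25,  # assumed meaning of "reasonably consistent"
) -> bool:
    """Apply the three feasibility criteria to observed P-F intervals (in days)."""
    if not has_clear_potential_failure or not pf_intervals_days:
        return False
    shortest = min(pf_intervals_days)
    longest = max(pf_intervals_days)
    if shortest <= 0:
        return False
    mean = sum(pf_intervals_days) / len(pf_intervals_days)
    reasonably_consistent = (longest - shortest) / mean <= max_relative_spread
    # Monitor more often than the shortest observed P-F interval.
    monitored_in_time = monitoring_interval_days < shortest
    return reasonably_consistent and monitored_in_time


# Invented observations: warning periods of roughly one month.
print(on_condition_task_is_feasible(True, [28, 30, 32], 14))  # True
print(on_condition_task_is_feasible(True, [28, 30, 32], 30))  # False
print(on_condition_task_is_feasible(True, [10, 60], 5))       # False
```

The second call fails because a 30-day inspection could miss a failure that develops in 28 days, and the third fails because the P-F interval is too erratic to plan around.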

On-Condition monitoring: Ideas and Guidance

There are several hundred techniques available; each designed to detect potential failures such as leaks, vibration, temperature changes, particles, etc.

On-condition monitoring can be carried out at intervals ranging from every minute to several months, depending on the P-F interval. The two major categories are:

• Primary effects monitoring – readings of speed, temperature, flow rates, pressures, power and current taken by the operator or maintenance engineer, with the actual readings compared against the potential failure condition defined from the P-F curve
• Human effects monitoring – where assessments are made based on the senses; look, listen, feel, smell.

Scheduled Repair / Replacement [incl. overhauls / turnarounds]

This work is done at pre-set intervals to prevent age-related failures.

It is used to restore the initial capability at or before a specified age, regardless of the asset’s apparent condition at the time. It should be applied to failure modes conforming to patterns A, B and C. This is technically feasible if:

• there is an identifiable age at which the equipment shows a rapid increase in the conditional probability of failure, and
• the task restores the original resistance to failure.

Failure modes conforming to patterns D, E and F require caution, as there is no relationship between reliability and age. Scheduled maintenance can actually increase overall failure rates by introducing infant mortality into otherwise stable systems. It is the operator who says: “It’s taken us till Wednesday to get this machine going again.”

Prevention Tasks – Understanding Age and Deterioration

For any asset to be maintainable, the desired performance of the asset must fall within the envelope of its initial capability. The stresses to which the asset is exposed cause it to deteriorate by lowering its resistance to future stress. Eventually the resistance drops too low, below the desired performance, and the asset fails.

Default Actions: Question 7

These deal with the failed state and are the best option when it is not possible to identify an effective proactive task. They include fault finding, re-design and run to failure.


Fault finding for hidden failures [proof testing] means satisfying ourselves that a protective device will work when required to do so, e.g. checking a pressure switch by dropping the oil pressure and seeing if the machine shuts down, or activating a fire alarm to check that it has not failed.

This is worth doing if it reduces the probability of the associated multiple failure to a tolerable level. If a fault-finding task cannot be found and the multiple failures do not affect safety or the environment, then it is acceptable to take no action.


Re-design may be justifiable if multiple failures have costly consequences, and is compulsory for safety or environmental consequences where proactive maintenance is not possible.

Run to Failure

Run to failure, as the name implies, entails making no effort to anticipate or prevent the failure modes to which it is applied, so those failures are simply allowed to happen and are then repaired.
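The choice between the default actions can be sketched as a small decision function. This is one reading of the text rather than a definitive rule set: the function name, parameters and returned labels are hypothetical, and the "take no action" outcome for hidden failures is treated here as equivalent to run to failure.

```python
def default_action(
    hidden: bool,
    safety_or_environmental: bool,
    fault_finding_effective: bool = False,
    costly: bool = False,
) -> str:
    """Pick a default action once no effective proactive task has been found.

    fault_finding_effective: a fault-finding task exists that reduces the
    probability of the multiple failure to a tolerable level.
    """
    if hidden and fault_finding_effective:
        return "fault finding"
    if safety_or_environmental:
        # Re-design is compulsory where proactive maintenance is not possible.
        return "re-design (compulsory)"
    if costly:
        return "run to failure; re-design may be justifiable"
    return "run to failure"


print(default_action(hidden=True, safety_or_environmental=False, fault_finding_effective=True))
print(default_action(hidden=False, safety_or_environmental=True))
print(default_action(hidden=False, safety_or_environmental=False, costly=True))
print(default_action(hidden=True, safety_or_environmental=False))
```

The four calls cover, in order, a testable protective device, an evident safety-critical failure, an expensive but harmless failure, and a hidden failure with no workable test and no safety impact.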


This is a major part of RCM and allows an effective analysis of the type of RCM required depending on the consequences of failure.


Importantly, the organization needs to implement the outcomes of the RCM process. To do this there has to be an effective and efficient process that entails consultation, development of new ways of working, signing off plans and procedures and introducing key performance indicators. This process is best described below:


Total Productive Maintenance (TPM) is a process to maximise the productivity of your equipment for its entire life. TPM fosters an environment where improvement efforts in safety, quality, reliability, delivery, cost and creativity are encouraged. It is a technique designed to optimise the utilisation, performance, and productivity of an organisation’s plant and equipment.

This is achieved through the involvement of the entire workforce in the pursuit of common and appropriate objectives. These people need to be drawn from Production Engineering, Maintenance, Setters, Operators, Team Leaders and Middle Managers.

The overall objective is to establish a system where all stakeholders work together to help reduce breakdowns, restore or improve on original or intended efficiencies, and reduce and eventually eliminate quality defects.

The goal of TPM is to maximise the Overall Equipment Effectiveness (OEE) and to reduce equipment downtime while improving quality and capacity.

TPM builds on Lean tools such as 5S, Visual Controls, SMED, Pokayoke, and others. The process is organised into several progressive steps, referred to as the "Six Pillars".

TPM is a powerful but often misunderstood strategy for eliminating equipment-related losses. In Lean Manufacturing this translates into eliminating equipment-related "wastes." Go for sustainable bottom line results with TPM and change the culture along the way by using all of the pillars of TPM the way they are intended to be used.

The Six Pillars of Total Productive Maintenance

TPM Organisational Assessment

ICP has developed a unique organisational assessment that addresses all of the areas identified in the model. This provides an excellent indicator of strengths and weaknesses and where prioritised actions are required. This forms an integral part of the TPM implementation process.

The ICP Model for TPM

TPM will turn your maintenance from a repair function into a reliability function and concentrate on getting the productivity needed from your current equipment assets. "Needed" means high overall equipment effectiveness (OEE) measured over the time you need that equipment to meet daily customer demand.

Total productive maintenance can give you overall equipment effectiveness (OEE) in the mid to upper 90% range without major capital expenditure!

TPM addresses the “Six Big Losses” associated with Plant and Equipment, which are:

• Equipment failure
• Changeover – set up and adjustments
• Idling and minor stoppages
• Reduced speed
• Process defects
• Reduced yield
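The text does not define how OEE is calculated. A widely used convention is OEE = availability × performance × quality, with the six losses above conventionally grouped in pairs under those three factors. The sketch below assumes that convention; the function name, parameters and shift figures are invented for illustration.

```python
def oee(
    planned_time_min: float,
    downtime_min: float,
    ideal_cycle_time_min: float,
    total_count: int,
    good_count: int,
) -> dict:
    """Compute OEE and its three factors using the conventional definition."""
    run_time = planned_time_min - downtime_min
    if planned_time_min <= 0 or run_time <= 0 or total_count <= 0:
        raise ValueError("planned time, run time and total count must be positive")
    # Availability: reduced by equipment failure and by changeover / set-up.
    availability = run_time / planned_time_min
    # Performance: reduced by idling, minor stoppages and reduced speed.
    performance = (ideal_cycle_time_min * total_count) / run_time
    # Quality: reduced by process defects and reduced yield.
    quality = good_count / total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Invented shift: 500 planned minutes, 100 lost to stoppages, 1-minute ideal
# cycle, 360 units made of which 342 were good.
result = oee(500, 100, 1.0, 360, 342)
print(round(result["oee"], 3))  # about 0.684
```

In this made-up shift each factor looks healthy on its own (80%, 90%, 95%), yet the combined OEE is about 68%, which is why TPM targets all six losses rather than breakdowns alone.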



• Greater safety and environmental integrity
• Improved operating performance (output, quality, and customer service)
• Greater maintenance cost-effectiveness
• Longer useful life of expensive items
• A comprehensive database
• Greater motivation of individuals
• Better teamwork


• Schedules to be done by the maintenance department
• Revised operating schedules for the operator of the asset
• One-off changes to the design of the asset or the way it is operated to deal with the situations where the asset cannot deliver the desired performance in its current configuration.

Copyright © The Innovation Consultancy Partnership Ltd.