Malcolm Ridgway and Alan Lipschultz: Doing It by the Numbers

When our clinical colleagues ask us, “How do I know that all of my equipment is safe to use on my patients?” and we answer with, “Well, all of our PMs are always done on time,” is that really the most reassuring response? Wouldn’t it be better if our response was more like the following?

“Well, there are some pieces of equipment, like the knee exerciser over there, that are not very complicated and that couldn’t possibly injure a patient, even if they failed completely. Such items are classified as noncritical, and they need only very simple maintenance to keep them functional. A lot of the equipment on the hospital’s inventory falls into this noncritical category.

“There are some other items, such as this critical care ventilator and this transport incubator, that are critical in the sense that if they fail completely, a patient could be injured. These kinds of devices are classified as reliability-critical, and we take a number of steps to make sure that the possibility of them failing completely is very rare. These kinds of devices are designed to be very reliable. For devices in this category, we follow all of the manufacturer’s recommendations for preventive maintenance, and we carefully analyze every instance in which one of these devices fails. We also monitor the statistics from our national database to confirm that each make and model of the devices that we classify as reliability-critical demonstrates an average, or mean, time between failures (MTBF) of at least several hundred years.

“We have some other types of devices, such as this patient monitoring system and this infant incubator, which can, in theory, cause patient injury if they fail in such a way that they are either providing substantially inaccurate and misleading information, or they have degraded to the point that they are no longer meeting the relevant safety specifications. All of the different ways that these kinds of devices, which we classify as performance/safety-critical devices, could fail have been identified as potentially critical failures. These kinds of failures are called hidden failures because they can usually be discovered only by performing some kind of periodic performance verification or safety testing. Such devices are also designed to be very reliable and, often, to warn the operator of any hidden failures. We have been collecting and aggregating the results of these performance and safety tests in our nationwide database and, again, we have confirmed that each make and model that we have of these performance/safety-critical devices develops potentially critical failures only very, very infrequently. Our database shows that these kinds of hidden failures occur, on average, less than once every 500 years.”

The suggested answer above is, of course, meant only to illustrate the principle. The quantitative claims, such as the quoted MTBF figures, will need to be customized to each specific facility and adjusted to reflect the levels of safety that can be substantiated by actual test data.

If you agree that this is the kind of response that we should aim to provide, then you are invited to join us in exploring how we could assemble just such a database. You can do this by visiting the project website. After clicking on the “here” link on the opening page, you will be able to see the main page. This is a password-protected, wiki-type website, so follow the instructions at the top of the main page to gain read-only access. When asked to log in, use “view” as both your username and your password. Do not attempt to create an account.

After you have reviewed the initial material, which is completely tentative at this point, e-mail your comments on the concept of the project and on the preliminary material to the address listed for the initial task force.

Malcolm Ridgway, PhD, CCE,  is chief clinical engineer at ARAMARK Healthcare Technologies. Alan Lipschultz, CCE, PE, CSP, is president and CEO of HealthCare Technology Consulting LLC.

3 thoughts on “Malcolm Ridgway and Alan Lipschultz: Doing It by the Numbers”

  1. Another thought: When our clinical colleagues ask us, “How do I know that all of my equipment is safe to use on my patients?” IMHO, we should answer with: “Well, you as the caregiver are responsible for the patient’s safety while using any medical device. Your responsibility is to be completely familiar with the device’s application and operating instructions, and to perform a pre-use inspection and functional check before applying the equipment to the patient, exactly as the device operator’s manual and AORN guidelines describe. As a biomed, my responsibility is to render technical assistance, perform maintenance at specified intervals, or provide repair services when your assessment of the device indicates my services are required.”

  2. I am grateful to my good friend Bill Hyman for his critique of the hypothetical dialog that Alan and I created as a model for a reassuring speech to a clinical colleague about equipment reliability and safety which he (rightfully) judges to be a little “over the top.” We too believe that true professionals should not have to resort to overstated superlatives! And his note of caution warning that historical averages are not reliable predictors of the future is also appropriate and well taken. (I am reminded of the story about the statistician who drowned in a river whose average depth was only three feet.) Every investment fund prospectus (rightfully) warns us that “past performance is no guarantee of future performance.” Yet many people who live in areas classified as a 100-year flood plain feel that they are better off living where they do than living in an area classified as a 20-year flood plain, even though they know full well that such a classification does not guarantee that there will be any particular minimum time between floods.

    The math in Bill’s “silly example” is, however, potentially misleading. Mean time between failures (MTBF) is defined as the reciprocal of the corresponding average failure rate and, as such, calculating it requires failure statistics for (preferably) a number of devices over some period of time. The way the Task Force is proposing to do the math is spelled out on the website in Section 4.3, using average device failure rates as a measure of device risk, and in more detail in Sections 4.4 – 4.6. For a “reasonably” representative average (and corresponding MTBF), the failure rate data needs to be collected from a “reasonably” large number of devices and aggregated over a “reasonably” long period of time. The Task Force has established a tentative threshold for considering the experience base to be “reasonable” as 50 device-years. Of course, bigger is better.
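    The pooled-fleet arithmetic described above can be sketched in a few lines of Python. This is only an illustrative sketch of the approach as summarized in this comment; the function name, the sample figures, and the exact use of the 50 device-year threshold are assumptions for illustration, not the Task Force's actual method or tooling.

```python
# Illustrative sketch (assumed names and sample numbers): pool the
# failure counts and exposure (device-years) for one make/model, then
# take MTBF as the reciprocal of the average failure rate.

def mtbf_from_pool(failures, device_years, threshold=50.0):
    """Return (MTBF in years, failures per device-year) for a pooled fleet."""
    if device_years < threshold:
        # Tentative 50 device-year threshold from the comment above:
        # below it, the average is not considered representative.
        raise ValueError("experience base below %.0f device-years" % threshold)
    rate = failures / device_years           # average failures per device-year
    mtbf = float("inf") if failures == 0 else 1.0 / rate
    return mtbf, rate

# e.g. 2 failures observed across 120 pooled device-years:
mtbf, rate = mtbf_from_pool(failures=2, device_years=120)
print(round(mtbf, 1))   # 60.0 years between failures, on average
```

    Note that the MTBF here characterizes the make/model across the pooled fleet, not any individual device; a large MTBF does not promise that a particular unit will not fail tomorrow.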

    His final observation that there is a “widely held belief” that traditional preventive maintenance has very little impact on medical device failure rates, and thus on current levels of patient safety, is the precise target of this project. Traditional PM will reduce the failure rate of certain medical devices – specifically for those devices that need periodic attention in the form of restoration or replacement of a part that would otherwise fail during the useful lifetime of the device. However, it is our belief that opting to not perform this restoration prospectively will change the failure rate (i.e., the reliability) of most medical devices by only a very small amount. And furthermore, there are only a few device types whose level of overall reliability affects their level of safety in any way whatsoever. The Task Force characterizes these types of devices as “reliability-critical.” See HTM ComDoc 3 and the related tables on the website for more on this analysis.

    Finally, it is probably fair to say that this “widely held belief” is widely held only within the HTM community itself. And the primary motivation for this project is the recognition that we probably won’t be able to extend our belief (that traditional PM performed strictly according to the device manufacturer’s recommendations on all medical devices is not necessary to achieve acceptable levels of patient safety) beyond our own community until we can present a case that is reasonably reassuring at a lay-person level, which probably means doing it by some kind of numbers, analogous to the 100-year flood plain.

  3. The effort to make PMs rational and efficient based on actual failure data is an important one, and I am participating in this project.

    However, as I have communicated to Malcolm, I have some issues with the admittedly hypothetical sample discussion that was presented here. For one, as noted, the “facts” are not actual facts. Second, the use of superlatives always makes me uncomfortable, e.g., “couldn’t possibly injure” and “all” of the failure modes. In addition, the MTBF, however large, does not mean that a device won’t fail tomorrow, or even later today, and maybe the “average” MTBF isn’t good enough anyway. As a silly example, if 10 devices fail in one day and one fails at one year, the M(average)TBF is 34 days, but 91% of the failures were much shorter than that (if I did this right).
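    (Editor's note: the arithmetic in the silly example does check out, and can be verified in a few lines of Python using exactly the numbers in the comment above.)

```python
# Ten devices fail after 1 day; one device fails after 365 days.
times = [1] * 10 + [365]                   # observed days to failure
mtbf = sum(times) / len(times)             # (10*1 + 365) / 11 = ~34.1 days
share_below = sum(t < mtbf for t in times) / len(times)
print(round(mtbf, 1))                      # 34.1
print(round(share_below * 100))            # 91
```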

    Perhaps more importantly, the above does not address medical device harm that is unrelated to PM, such as harm due to use error (no “r” in use). I believe it is widely held that most medical device incidents are not technical device failures, and in particular are not PM-mediated failures. Therefore, a rational allocation of resources in this domain would not support more PM.

    However supporting this proposition takes good data from a cross-section of the community. Please join this effort.
