New rules, titled 2015 Edition Health Information Technology (Health IT) Certification Criteria, 2015 Edition Base Electronic Health Record (EHR) Definition, and ONC Health IT Certification Program Modifications, were recently provided for public comment. In my opinion, these rules do not (yet) go far enough to assure safe and effective design and implementation of health IT. However, I believe they represent a reasonable, measured, incremental step toward uniform human factors engineering best practices that will be necessary for the health IT industry to consistently design safe, effective, efficient, and satisfying user interfaces.
I believe that I am uniquely qualified to comment on this topic since I have been a practicing clinician, human factors engineering (HFE) researcher, and educator. I am also a former medical device user interface standards developer, have worked as a consultant with industry to develop safe and effective medical technology, and oversaw Vanderbilt University’s summative usability testing (SUT) to comply with the safety-enhanced design (SED) requirements of Stage 2 of the federal government’s meaningful use program for EHRs. The following comments are based on 25 years of experience in both academia and industry. All of this information can be found in my 2011 textbook, The Handbook of Human Factors in Medical Device Design. While the book focuses on medical devices, my colleagues and I have been following the same human factors engineering processes with health IT for 10 years with uniform success.
Herein, I argue that the proposed rules are reasonable and necessary but, in a few cases, insufficient to comply with best practices already established for technology user interfaces across all industries where public safety is at risk (e.g., medical devices, aviation, and nuclear power).
Number of Test Participants (Criteria I & II). While it is conventional practice for formative usability testing to evaluate five to seven participants, for summative testing it is best practice to use at least 15 representative participants from each intended user group. In the case of health IT, this would include, for example, 15 physicians, 15 nurse practitioners (we have data to suggest that they use EHRs differently than do MDs), 15 nurses, and 15 ward clerks. Note that usage patterns vary within nursing, and there could be real reasons to study separate cohorts (n=15 each) of emergency department, intensive care unit, and ward nurses, for example. Thus, for a typical comprehensive EHR, the total number of SUT participants could be at least—and likely more than—75 users.
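The arithmetic above can be sketched briefly. This is a minimal, hypothetical helper (the function and group names are illustrative, not from the rules) that totals the best-practice minimum of 15 participants per distinct intended user group:

```python
# Minimum SUT participants implied by the best practice of at least
# 15 representative participants per intended user group.
MIN_PER_GROUP = 15  # best-practice minimum per user group

def min_sut_participants(user_groups):
    """Return a per-group minimum participant plan for a summative test."""
    return {group: MIN_PER_GROUP for group in user_groups}

# Example: a comprehensive EHR with five distinct user cohorts
groups = ["physician", "nurse practitioner", "ward nurse",
          "ICU nurse", "ward clerk"]
plan = min_sut_participants(groups)
total = sum(plan.values())  # 5 groups x 15 = 75 participants minimum
```

The point of the sketch is simply that the participant count scales with the number of distinct user groups, so segmenting nursing into separate cohorts (as suggested above) pushes the total well past 75.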
Representative has a very specific meaning here. For each user group, the essential characteristics of the test participants must be representative of the intended user population. Essential characteristics will depend on which ones are most likely to affect user behavior when interacting with the product and, particularly, the likelihood of use errors that are determined through risk analysis to lead to harm. In general, for all SUT, these essential characteristics will include the following: participant gender, age, clinical experience, and prior experience with similar health IT products. For an evolutionary or legacy product, prior experience with the specific health IT product is essential, but other characteristics may prove to be critical to include in the participant sample(s). Further, the characteristics on which to sample may well differ for different user groups (e.g., physicians vs. nurses).
No Off-Shoring of SUT. It is inappropriate for the SUT to be conducted in a developing country (to save costs) because the clinician participants would not be likely to have the same educational background, clinical training, clinical experience, and even cultural background as the intended American clinician users.
Structure of Usability Testing (Criteria II & IV). The critical elements of all usability testing, whether formative or summative, include the use of:
- Representative users
- Representative tasks (including those determined to include the highest risk of use error)
- Unguided task performance by normally trained users (and, where indicated by user research, by untrained users)
- Objective measures of use performance (e.g., time on task, use errors)
- A detailed structured report (including video) of results and findings.
Thus, it is absolutely essential that in every test the chosen participants be representative of the intended user population, not have been involved in the design of the product in any way (naïve), and not be given any advance testing or training regimen that exceeds what is expected to occur routinely in actual deployment.
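The objective measures listed above lend themselves to a simple per-task record. The sketch below is purely illustrative (the class and field names are hypothetical, not drawn from the rules or NISTIR 7804); it shows one way a test team might capture time on task and use errors per participant and derive a use-error rate:

```python
from dataclasses import dataclass

# Illustrative sketch: per-task record of the objective measures a
# summative usability test must capture (names are hypothetical).
@dataclass
class TaskResult:
    participant_id: str    # naive, representative user (never a designer)
    scenario: str          # realistic clinical scenario, not a click-path
    time_on_task_s: float  # objective measure: time on task, in seconds
    use_errors: int        # objective measure: count of use errors
    success: bool          # met the pre-specified task success criteria

def use_error_rate(results):
    """Fraction of task attempts with at least one use error."""
    return sum(1 for r in results if r.use_errors > 0) / len(results)

results = [
    TaskResult("P01", "order antibiotic", 142.0, 0, True),
    TaskResult("P02", "order antibiotic", 198.5, 2, False),
]
rate = use_error_rate(results)  # 0.5
```

Structured records like this also make the detailed report reproducible: the raw quantitative findings can be archived alongside the video and qualitative observations.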
Which Health IT Functions to Test? It is not possible (or at least not feasible) in a SUT to test all functions and use paths of the product. It is, however, essential to test those aspects of the product that are either common or high risk (i.e., highest likelihood of use errors leading to patient harm). Often, features that have significant use implications (e.g., high cost to the organization or the patient) are also tested. As alluded to in NISTIR 7804, use-related risk analysis is critical to identifying those product features that are highest risk. While ONC has specified particular (generally medication-related) health IT features required to be tested (§170.315 a & b), it is my recommendation that health IT developers have the flexibility—based on empirical evidence that includes a formal risk analysis—to adjust how they test these features to elicit the highest risk of use errors with their product.
A prudent health IT developer also would test (at least in formative usability testing, see below) any additional product feature that the use-related risk analysis identified as being associated with a significant risk of patient harm. An example might be a product feature, not otherwise specified in §170.315 a or b, that, in the case of use error, could foreseeably lead to a significant delay in a life-saving treatment, including delayed diagnosis. An admittedly atypical example of such a usability issue would be the inability of inpatient treating physicians to readily access a nurse’s emergency room intake note that documented East African travel in a patient who proved to have Ebola.
The use scenarios applied in the SUT must be designed to rigorously test those product functions that are identified as being the highest risk. The rules call out specific recommended use scenarios (provided in NISTIR 7804-1). However, health IT developers should be allowed to substitute other equally rigorous use scenarios that are more appropriate to their product, provided that they supply empirical evidence to support the substitution.
Thus, I suggest that the text at the end of Criteria V is grossly insufficient to assure adequate SUT for safety. Instead, this section should read, “When the test scenarios used for summative usability testing to validate safety related usability are not those recommended (the use cases specified in NISTIR 7804-1), the health IT developer must provide empirical evidence, supported by a valid use-related risk analysis, to support the test scenarios chosen for use.”
To assure that user performance is unbiased, it is important that participants be given realistic clinical scenarios and then be asked to achieve, using the health IT, clinically meaningful goals (e.g., “Ms. Jones, an IV drug user, has a Staphylococcus aureus skin infection; prescribe the appropriate treatment”) and not to complete pre-specified tasks. It is totally inappropriate to provide users with step-by-step directed tasks (e.g., “click on Prescribe Medication, select cefazolin”).
Further, a usability test (like any empirical experiment) is virtually useless if it is not documented sufficiently to allow someone else to replicate it. Thus, the SUT report must include complete details of the test scenarios, specific criteria for task success, the test procedures (including what was measured and how), the participants’ relevant characteristics, all of the findings (both quantitative and qualitative), the analyses (both how they were achieved and what was found), and the interpretation of the findings.
Further, there should be a thorough analysis of all use errors that could have led to patient harm, including a discussion of potential mitigations to prevent these use errors and/or to prevent the potential harm should they occur in actual use.
User-Centered Design Process (Criteria III). I have long been an advocate for a regulatory framework that emphasizes a high quality, fully auditable, evidence-based user-centered design (UCD) process throughout each product’s life cycle. The most essential elements of UCD are:
- A thorough understanding of the product’s use context (which typically requires extensive HFE)
- Recurrent iterative cycles of user interface design and formative usability testing
- Clear, complete, and traceable documentation
Finally, effective UCD requires the use of formal evidence-informed risk analyses to allow health IT developers to scale their UCD for particular aspects of the health IT user interface depending on its actual safety implications (i.e., use error that can contribute to patient harm). Further, UCD for all high risk health IT must be performed in the context of an effective quality management system.
I strongly encourage ONC to support the creation of and ultimately mandate compliance with consensus national standards for risk/safety management and for quality management of health IT systems. Toward this end, the AAMI recently convened a new Health IT Standards Committee charged with developing two new standards: HIT1000, Application of risk management processes for health IT systems and HIT2000, Application of quality management system processes for health IT systems. Ideally, these national standards will be integrated with relevant international standards (e.g., the IEC 80001 series) to facilitate a global approach to health IT risk and quality management.
Added Value of Formative Usability Test (FUT) Results. Besides being an essential component of UCD throughout product development, the results of FUTs may be part of the developer’s evidence for product safety as advocated by many human factors engineering practitioners. This can decrease the need for as much SUT late in development (i.e., when the product appears to be ready for market). Further, the conduct of FUT throughout the product life cycle will substantially reduce the risk of undesirable use errors during the SUT. However, FUTs will only meet these goals when they are conducted with the same rigor as that of SUTs (as described above).
Across the Product Life Cycle. There are a number of problems with the use of SUT of health IT as the definitive “test” of product safety. One problem is that the health IT as configured by the developer for such summative testing is unlikely to be the same as the system ultimately configured (by the developer, implementers, and health delivery systems) for actual use at numerous customer sites. A second problem is that the user interfaces of these systems change continuously over their life cycle, sometimes without health IT developer knowledge or control. Thus, as has been the case in other high-risk industries, the UCD/HFE framework and regulatory oversight for health IT needs to take a life cycle approach. Toward this end, implementers and healthcare delivery organizations that configure and deploy health IT systems also need to fall under these regulations.
Matthew B. Weinger, MD, MS, is vice chair of medical research on the AAMI Board of Directors. He is the Norman Ty Smith chair in patient safety and medical simulation, and a professor of anesthesiology, biomedical informatics, and medical education at Vanderbilt University School of Medicine in Nashville, TN.