International – Calculating Sample Sizes For Human Factors Studies: What’s The Magic Number?

Designing a research study can be a daunting process that calls for many considerations, including statistical power and sample size calculations. Although sample sizes for medical device usability testing are mostly defined by regulatory standards, manufacturers should still understand the underlying principles so that they can make sound decisions. This article provides a brief overview of statistical power and sample size within the context of usability.

Calculating Statistical Power

Statistical power is the probability that a study will detect an effect when one actually exists. Formally, power equals 1 − β, where β is the probability of making a type II error, also called a “false negative.” As statistical power increases, the probability of a type II error decreases. In other words, an underpowered study might fail to detect an effect that actually exists. In medical device usability testing, an underpowered study could lead to the conclusion that the test device is safe when it is not.
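
To make the relationship between power and type II error concrete, the sketch below approximates the power of a simple two-group comparison of means. It is an illustration only: the function name approx_power, the normal (z) approximation, the equal group sizes, and the standardized effect size of 0.5 are all assumptions made for this example and do not come from any regulatory standard.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    Assumptions (illustrative): normal (z) approximation, equal group
    sizes, and a standardized effect size (difference in means divided
    by the common standard deviation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic under the effect
    # Probability that the statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power rises, and the type II error rate (1 - power) falls, as n grows.
for n in (10, 30, 64, 100):
    power = approx_power(0.5, n)
    print(f"n per group = {n:>3}: power = {power:.2f}, type II error = {1 - power:.2f}")
```

Under these assumptions, a medium standardized effect needs roughly 64 participants per group to reach a power of about 0.8; with only 10 per group, the study would miss the effect most of the time.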

Smaller effects are more difficult to detect, so they require a larger sample size to achieve the same statistical power. Although it is best practice to calculate sample size for any research study, the effect size (and, consequently, the sample size) is harder to calculate for qualitative studies than for quantitative studies.
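
The inverse relationship between effect size and sample size can be sketched with a standard normal-approximation formula for comparing two group means, n = 2 × ((z for alpha + z for power) / effect size)², rounded up. The function name sample_size_per_group, the default power of 0.80, the alpha of 0.05, and the three example effect sizes are assumptions chosen for illustration; an exact t-test calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Participants needed per group for a two-sided, two-group comparison.

    Assumptions (illustrative): normal (z) approximation, equal group
    sizes, and a standardized effect size greater than zero.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Smaller standardized effects demand far larger samples for the same power.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: {sample_size_per_group(d)} participants per group")
```

With these defaults the sketch returns 25, 63, and 393 participants per group for effect sizes of 0.8, 0.5, and 0.2 respectively: shrinking the effect to a quarter of its size multiplies the required sample roughly sixteenfold.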

Quantitative usability studies mostly test for differences in quantifiable measures, such as time to completion, click-through rate, or accuracy. Qualitative usability studies for medical devices aim to detect safety-related design problems by investigating the root causes of use errors…