Standardization of Behavior Research Methods as a Determinant of Replicability

Global, open standards, like the fixed width of the soccer goal, reduce conflict when players from around the world come together. In psychology, such standards are few, and unsurprisingly “moving the goal posts” is a typical metaphor for the controversy that often erupts after a failed replication attempt. At the same time, difficulties with replicating empirical work (reproducing analyses based on the same data, repeating previously made observations in new data) have been documented across the social and behavioral sciences. As psychology grapples with this crisis, many ask what constitutes a direct replication: when are materials and methods sufficiently similar to be considered the same? If we find an effect when using one measure but not when using an altered version, we may gain insight into boundary conditions and generalizability. However, we may also chase false leads: differences in results may not be replicable if researchers exploited degrees of freedom in measurement to obtain the desired results. Global, open standards remove these degrees of freedom and transparently crystallize agreement on basic aspects of research: units, norms, and measurement procedures. Without standards, efforts to build a cumulative evidence base through replication and evidence synthesis will often end in screaming matches about the goal posts (but without the benefit of a referee). With them, planning new research, assessing replicability of previous research, and synthesizing evidence all become easier.

We propose a comprehensive work programme to study the role that standards (and their absence) play in the reproducibility, robustness, replicability, and generalizability of psychological research. We examine how open standards can help psychology transition to a mature, cumulative science. We develop SOBER, a rubric to describe and quantify measurement standardization in a machine-readable metadata standard. We demonstrate SOBER’s utility for predicting replicability in existing meta-analyses and large-scale replication projects, and we show how global standards ease evidence synthesis by reducing hypothesis-irrelevant variation. With SOBER, we lay the foundation to reduce redundancy, error, and flexibility in measurement. We catalogue flexible measurement practices, simulate their cost for psychometric quality and robustness of evidence, and test these costs empirically in a series of methodological experiments in which common psychological measures are modified. We integrate our findings into a framework to account for ad-hoc measurement modification effects in psychological research synthesis. Taken together, our plans to engineer a change towards a more standardized culture in psychology take the shape of tools, debates, and educational resources.
