Randomization Code Review
We generally use computer code to perform random assignment, and yet computer code is complex and notoriously vulnerable to mistakes. Before we use code for random assignment, we make sure it has been independently reviewed by an OES team member who is not directly involved in the project. The reviewer works through the code line by line and may test some or all of the code by running it on either real or mock data.
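As an illustration of the kind of code a reviewer would work through, here is a minimal sketch of a seeded, completely randomized assignment. This is not OES's actual code; the function name, seed value, and mock data are hypothetical. Fixing the seed makes the draw reproducible, so a reviewer can re-run the assignment on mock data and verify its properties line by line.

```python
import random

def assign_treatment(unit_ids, n_treated, seed=20240101):
    """Completely randomized assignment: exactly n_treated units
    are placed in the treatment group; the rest are controls."""
    rng = random.Random(seed)   # fixed seed => the draw is reproducible for review
    shuffled = list(unit_ids)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    treated = set(shuffled[:n_treated])
    return {uid: ("treatment" if uid in treated else "control")
            for uid in unit_ids}

# A reviewer can test the code on mock data and check group sizes.
mock_units = [f"unit_{i:03d}" for i in range(100)]
assignment = assign_treatment(mock_units, n_treated=50)
assert sum(v == "treatment" for v in assignment.values()) == 50
```

Because the assignment is a pure function of the unit list, the group size, and the seed, the reviewer can reproduce it exactly and confirm, for example, that every unit is assigned and the treatment group has the intended size.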
Among other things, statistical power depends on the number of units included in an evaluation and the method by which they are assigned to different treatment conditions. If an evaluation lacks sufficient statistical power, then there is a risk of ending up with a “false negative” result — failing to detect the effect of a program change that really was effective. When we design our evaluations, we pay particular attention to whether they will have adequate statistical power to support future decisions about program or policy changes. To progress to the “field” stage, every evaluation must have adequate power to detect meaningful, policy-relevant effects.
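To make the dependence of power on sample size concrete, here is an illustrative sketch of a standard normal-approximation power calculation for a two-sample comparison with equal arms. The function and its defaults are for illustration only and are not a prescribed OES method:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for a
    standardized effect size (Cohen's d), using the normal approximation
    and ignoring the negligible lower rejection tail."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # critical value, e.g. 1.96
    noncentrality = effect_size * sqrt(n_per_arm / 2)   # expected z under the alternative
    return NormalDist().cdf(noncentrality - z_crit)

# With a small effect (d = 0.2), roughly 394 units per arm are needed
# to reach the conventional 80% power at alpha = 0.05.
print(round(power_two_sample(0.2, 394), 2))
```

The calculation shows why power planning matters before fielding: halving the sample size or halving the detectable effect changes the false-negative risk substantially, so we check that the planned design can detect effects large enough to matter for policy.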
To ensure that our results mean what they are supposed to mean, we commit to a detailed Analysis Plan before we analyze data — a best practice that has received greater attention in the social sciences in recent years. In almost any analysis, there is a risk of inadvertently “fishing” for patterns in the data and finding results that are not reliable. To address this risk, we commit ourselves to specific outcome variables and analytic methods up front. We date-stamp the Analysis Plan and post it on our website so that others can hold us accountable, other researchers can verify that our methods are sound, and policymakers can base decisions on our results with confidence.
At the end of a project, when we report our results, we use the Analysis Plan to clearly distinguish between results based on planned (confirmatory) analyses and results based on unplanned (exploratory) analyses. In general, results based on planned analyses carry greater weight and provide strong evidence that a program or policy change was effective. By contrast, results of unplanned analyses carry less weight; they should be treated as suggestive evidence and verified through further research. We are committed to drawing a strong distinction between these two types of evidence, and pre-committing to Analysis Plans is the principal way in which we do this.