Randomization Code Review

We generally use computer code to perform random assignment, and yet computer code is complex and notoriously vulnerable to mistakes. Before we use code for random assignment, we make sure it has been independently reviewed by an OES team member who is not directly involved in the project and thus has “fresh eyes.” The reviewer works through the code line by line and may also test some or all of the code by running it on either real or mock data. By checking that our random assignment code is correct, we ensure that our agency collaborator’s investment in a field evaluation is well founded and that, at the end of the project, the results mean what they are supposed to mean.
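To make the idea concrete, here is a minimal sketch (in Python, not OES's actual code) of a seeded random assignment function, along with the kinds of checks a reviewer might run on mock data. The column names, group share, and seed are illustrative assumptions.

```python
# Sketch: seeded random assignment plus reviewer-style checks on mock data.
import numpy as np
import pandas as pd

def assign_treatment(df, seed=20240101, treat_share=0.5):
    """Randomly assign each row to 'treatment' or 'control' using a fixed seed."""
    rng = np.random.default_rng(seed)
    n = len(df)
    n_treat = int(round(n * treat_share))
    labels = np.array(["treatment"] * n_treat + ["control"] * (n - n_treat))
    rng.shuffle(labels)
    out = df.copy()
    out["assignment"] = labels
    return out

if __name__ == "__main__":
    # Mock data standing in for the real case list.
    mock = pd.DataFrame({"case_id": range(1000)})
    assigned = assign_treatment(mock)

    # Checks a reviewer might run: group sizes look right, no case is
    # duplicated, and the assignment is reproducible given the same seed.
    print(assigned["assignment"].value_counts())
    assert assigned["case_id"].is_unique
    assert assigned.equals(assign_treatment(mock))
```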

Statistical Power

Among other things, statistical power depends on the number of cases included in a study and the method by which they are assigned to different treatment conditions. If a study lacks sufficient statistical power, it risks ending up with a “false negative” result: a failure to detect that a program modification really was effective, which can have repercussions for future program design and policy making. When we vet our study designs to ensure they are as strong as possible, we pay particular attention to whether the study will have adequate power to inform the policy decision at hand. To progress to the “field” stage, every study must have adequate power to detect meaningful, policy-relevant effects.
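As an illustration of how such a check might be run, the sketch below estimates power by simulation for a simple two-arm design. The baseline rate, effect size, sample sizes, and test are assumed for the example and are not drawn from any particular OES study.

```python
# Simulation-based power check: for an assumed effect and sample size, how
# often would a simple two-sample test detect the effect at alpha = 0.05?
import numpy as np
from scipy import stats

def simulated_power(n_per_arm, effect=0.05, baseline=0.30,
                    sims=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(sims):
        control = rng.binomial(1, baseline, n_per_arm)
        treated = rng.binomial(1, baseline + effect, n_per_arm)
        # Two-sample t-test as a simple stand-in for the planned estimator.
        _, p_value = stats.ttest_ind(treated, control)
        rejections += p_value < alpha
    return rejections / sims

if __name__ == "__main__":
    for n in (500, 1000, 2000):
        print(f"n per arm = {n}: power ≈ {simulated_power(n):.2f}")
```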

Analysis Plans

Analyzing data in many different ways, and reporting only the analyses that appear to “work,” increases the chances of spurious findings. In our case, this would mean reporting “false positive” results that appear to indicate that a program or policy change was effective but instead reflect patterns or differences that appeared in the data by chance. To ensure that our positive results mean what they are supposed to mean, we commit to a detailed Analysis Plan before we analyze data, a best practice that has received greater attention in the social sciences in recent years. In particular, we commit ourselves to specific outcome variables and analytic methods, and we date-stamp the plan and post it on our website so that others can hold us accountable, other researchers can verify that our methods are sound, and policymakers can base decisions on our results with confidence.
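The sketch below shows what committing to a single pre-specified outcome and estimator can look like in code; the variable names, model, and mock data are hypothetical, and this is an illustration of the idea rather than an excerpt from an actual Analysis Plan.

```python
# Sketch: the primary outcome, model, and error structure are fixed in one
# place before any outcome data are analyzed; the analysis runs only that.
import pandas as pd
import statsmodels.formula.api as smf

PRE_SPECIFIED = {
    "primary_outcome": "enrolled",
    "model": "enrolled ~ treatment",   # difference in means via OLS
    "cov_type": "HC2",                 # robust standard errors
    "alpha": 0.05,
}

def confirmatory_analysis(df):
    """Run only the analysis committed to in the plan."""
    fit = smf.ols(PRE_SPECIFIED["model"], data=df).fit(
        cov_type=PRE_SPECIFIED["cov_type"])
    return fit.params["treatment"], fit.pvalues["treatment"]

if __name__ == "__main__":
    # Mock data in place of the real outcome file.
    df = pd.DataFrame({"treatment": [0, 1] * 200,
                       "enrolled": [0, 1, 1, 0] * 100})
    estimate, p_value = confirmatory_analysis(df)
    print(f"estimate = {estimate:.3f}, p = {p_value:.3f}")
```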

At the end of a project, when we report our results and findings, we use the Analysis Plan to clearly distinguish between results based on planned (confirmatory) analyses and results based on unplanned (exploratory) analyses. In general, results based on planned analyses carry greater weight and provide strong evidence that a program or policy modification was effective in bringing about a change in outcomes. By contrast, results of unplanned analyses carry less weight; they should be treated as suggestive evidence and verified through further research. We are committed to drawing a strong distinction between these two types of evidence, and pre-committing to Analysis Plans is the principal way in which we do this.