Because results from OES evaluations impact the lives of millions of Americans, the quality of our work is of paramount importance.

OES Evaluation Policy & Process

The OES Evaluation Policy lays out the principles that guide our work.
Our evaluation projects follow six steps to produce results that are relevant and reliable.


Learn more about our project process on our project process page.

Methods for Evaluation Design and Statistical Analysis

We have produced a series of methods papers for our own team’s use in designing randomized evaluations and conducting statistical analysis. Take a look if you would like to know more about our methods. If you find these useful in your own evaluation work, or if you have questions or would like to request additional resources, please let us know.

Reporting Statistical Results in Text and in Graphs

This guidance paper describes OES’s preferred methods for reporting statistical results from a randomized evaluation. It explains how to report a regression coefficient that estimates the effect of a treatment or intervention, as well as how to produce the graphs that OES includes in its project abstracts. Code for generating graphs, both in R and in Stata, is included.
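As a rough illustration of reporting a treatment-effect estimate in text, the sketch below turns a regression coefficient and its standard error into a short string with a confidence interval. It is not OES's reporting code (which the guidance paper provides in R and Stata). The function name `format_effect`, the use of a normal approximation for the critical value, and the output wording are all assumptions made for this example.

```python
from statistics import NormalDist


def format_effect(estimate, std_error, level=0.95, digits=2):
    """Hypothetical helper: format an estimate with a normal-approximation CI.

    Assumes the sampling distribution of the estimate is approximately normal;
    a t critical value would be slightly wider in small samples.
    """
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = estimate - z * std_error, estimate + z * std_error
    pct = round(level * 100)
    return (f"{estimate:.{digits}f} "
            f"({pct}% CI: {lower:.{digits}f} to {upper:.{digits}f})")


print(format_effect(2.5, 0.5))
```

For an estimate of 2.5 with a standard error of 0.5, this prints `2.50 (95% CI: 1.52 to 3.48)`.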

Blocking in Randomized Evaluations

Whenever possible, we incorporate background information about individuals (or other units) into an evaluation through block randomization. This helps make our estimates of the effects of a program or intervention as precise as possible. This guidance paper describes OES’s approach to block randomization.
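A minimal sketch of block randomization, assuming each unit already carries a block label (for example, a combination of background characteristics) and that half of each block, rounded down, should be treated. The function `block_randomize` and its parameters are hypothetical; the guidance paper describes OES's actual procedure, which may handle odd-sized blocks or unequal assignment probabilities differently.

```python
import random
from collections import defaultdict


def block_randomize(block_ids, seed=None):
    """Hypothetical helper: complete randomization within each block.

    Returns a list of 0/1 assignments aligned with block_ids. Within every
    block, exactly floor(block size / 2) units are assigned to treatment.
    """
    rng = random.Random(seed)  # seeded for a reproducible assignment
    members = defaultdict(list)
    for i, b in enumerate(block_ids):
        members[b].append(i)

    assignment = [0] * len(block_ids)
    for idx in members.values():
        # Draw the treated units for this block without replacement.
        for i in rng.sample(idx, len(idx) // 2):
            assignment[i] = 1
    return assignment


blocks = ["a"] * 4 + ["b"] * 6 + ["c"] * 3
print(block_randomize(blocks, seed=1))
```

Because treatment counts are fixed within each block, the treatment and control groups are balanced on whatever information defines the blocks, which is the source of the precision gain the paragraph describes.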

Calculating Standard Errors Guide

OES often analyzes the results of a randomized evaluation by estimating a statistical model — typically an ordinary least squares (OLS) regression — where one of the parameters represents the effect of an intervention. In order to decide whether a result is statistically significant, we must estimate the standard error for this parameter. This guidance paper describes our preferred method for doing this. In particular, it explains the reasons for using so-called HC2 standard errors — and how to calculate them in R and Stata.
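To show what the HC2 correction does, here is a minimal sketch for the simplest case only: an OLS regression of an outcome on an intercept and a single binary treatment indicator. It is not the R or Stata code from the guidance paper, and the function name `diff_in_means_hc2` is hypothetical. In this design the slope equals the difference in group means, each unit's leverage is 1/n for its own group, and HC2 divides each squared residual by (1 − leverage) before summing.

```python
import math
from statistics import mean, variance


def diff_in_means_hc2(y, d):
    """Hypothetical helper: OLS slope on a 0/1 treatment and its HC2 SE.

    Covers only the regression y ~ 1 + d. Requires at least two units
    in each group so that 1 - leverage is nonzero.
    """
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    n1, n0 = len(treated), len(control)
    if n1 < 2 or n0 < 2:
        raise ValueError("need at least two units per group")
    m1, m0 = mean(treated), mean(control)
    estimate = m1 - m0  # equals the OLS coefficient on d

    # The slope is a weighted sum of outcomes (weight 1/n1 or -1/n0), so its
    # sandwich variance is the sum of weight^2 * rescaled squared residual.
    var_hc2 = 0.0
    for yi, di in zip(y, d):
        if di == 1:
            weight, leverage, resid = 1 / n1, 1 / n1, yi - m1
        else:
            weight, leverage, resid = -1 / n0, 1 / n0, yi - m0
        var_hc2 += weight ** 2 * resid ** 2 / (1 - leverage)
    return estimate, math.sqrt(var_hc2)


est, se = diff_in_means_hc2([3, 5, 4, 6, 2, 1, 3, 2], [1, 1, 1, 1, 0, 0, 0, 0])
print(est, se)
```

Here the estimate is 2.5 and the HC2 standard error is √(7/12) ≈ 0.764. In this two-group case HC2 reduces exactly to the familiar unequal-variances formula √(s₁²/n₁ + s₀²/n₀); models with covariates need the general matrix form, which standard regression software can compute.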

Multiple Comparison Adjustment Guide

When evaluators run multiple statistical tests — for example, looking at multiple possible outcomes of a program or intervention, or testing multiple versions of an intervention — they run the risk of getting a “false positive” result unless they account for these multiple tests in some way. There are various approaches to this, and OES’s preferred approach is described in this guidance paper.
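As one concrete example of a multiple comparison adjustment, the sketch below computes Holm step-down adjusted p-values, which control the family-wise error rate. This is a common approach chosen for illustration; it is not necessarily the approach OES prefers, which the guidance paper describes. The function name `holm_adjust` is hypothetical.

```python
def holm_adjust(pvalues):
    """Hypothetical helper: Holm step-down adjusted p-values.

    Sort p-values in ascending order, multiply the k-th smallest (k = 0, 1, ...)
    by (m - k), cap at 1, and enforce monotonicity with a running maximum.
    Results are returned in the original order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

With four tests, the raw p-values 0.01, 0.04, 0.03, and 0.005 become roughly 0.03, 0.06, 0.06, and 0.02, so only two of them remain below 0.05 after adjustment.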