Because results from OES evaluations impact the lives of millions of Americans, the quality of our work is of paramount importance.

OES evaluation policy and process

The OES Evaluation Policy (PDF) lays out the principles that guide our work. Our evaluation projects follow six steps to produce results that are relevant and reliable.

Project Process Diagram
  1. Partner with federal agencies to target priority outcomes
  2. Translate behavioral insights into concrete recommendations
  3. Embed evaluations
  4. Analyze results using existing administrative data
  5. Ensure our work meets evaluation best practices
  6. Measure impact and generate evidence to continuously improve

Learn more about our project process

Statistical analysis resources

We have produced a series of methods papers for our own team’s use in designing randomized evaluations and conducting statistical analysis. Take a look if you would like to know more about our methods. If you find these useful in your own evaluation work, or if you have questions or would like to request additional resources, please let us know.

Reporting statistical results in text and in graphs

This guidance paper describes OES’s preferred methods for reporting statistical results from a randomized evaluation. It explains how to report a regression coefficient that estimates the effect of a treatment or intervention, as well as how to produce the graphs that OES includes in its project abstracts. Code for generating graphs, both in R and in Stata, is included.
Reporting statistical results in text and in graphs (PDF)
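
As a rough illustration of the kind of estimate the guide covers, the sketch below fits a simple model for a two-arm randomized evaluation in R. The data frame and variable names are hypothetical, and the estimatr package is one common choice rather than the code shipped with the guide.

  # Hypothetical data: df holds one row per individual, with a binary
  # treatment indicator and an outcome took_up_benefit.
  library(estimatr)

  fit <- lm_robust(took_up_benefit ~ treatment, data = df)
  summary(fit)  # the treatment row gives the estimated effect, its
                # standard error, and a 95% confidence interval for reporting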

Blocking in randomized evaluations

Whenever possible, we incorporate background information about individuals (or other units) into an evaluation through block randomization. This helps make our estimates of the effects of a program or intervention as precise as possible. This guidance paper describes OES’s approach to block randomization.
Blocking in randomized evaluations (PDF)
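
For illustration only, the sketch below shows one way to carry out block randomization in R with the randomizr package; the blocking variable and assignment probability are hypothetical, and the guide itself describes the approach OES actually follows.

  # Hypothetical example: assign treatment within blocks defined by state.
  library(randomizr)

  set.seed(20230401)                         # make the assignment reproducible
  df$treatment <- block_ra(blocks = df$state, prob = 0.5)
  table(df$state, df$treatment)              # check the assignment within each block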

Calculating standard errors guide

OES often analyzes the results of a randomized evaluation by estimating a statistical model — typically an ordinary least squares (OLS) regression — where one of the parameters represents the effect of an intervention. In order to decide whether a result is statistically significant, we must estimate the standard error for this parameter. This guide describes our preferred method for doing this. In particular, it explains the reasons for using so-called HC2 standard errors — and how to calculate them in R and Stata.
Calculating standard errors guide (PDF)
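
As a minimal sketch (not the guide's own code), HC2 standard errors can be computed in R with the sandwich and lmtest packages after fitting an OLS model; the variable names below are placeholders.

  library(sandwich)
  library(lmtest)

  fit <- lm(outcome ~ treatment, data = df)        # ordinary least squares
  coeftest(fit, vcov = vcovHC(fit, type = "HC2"))  # inference with HC2 standard errors

The estimatr package's lm_robust() reports HC2 standard errors by default, and Stata's regress command offers an analogous vce(hc2) option.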

Multiple comparison adjustment guide

When evaluators run multiple statistical tests — for example, looking at multiple possible outcomes of a program or intervention, or testing multiple versions of an intervention — they run the risk of getting a “false positive” result unless they account for these multiple tests in some way. There are various approaches to this, and OES’s preferred approach is described in this guide.
Multiple comparison adjustment guide (PDF)
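
The guide spells out which adjustment OES prefers and when to apply it. Purely to illustrate the mechanics, base R's p.adjust() implements several common corrections; the p-values below are made up.

  # Hypothetical p-values from four tests of the same intervention.
  p_values <- c(0.012, 0.049, 0.200, 0.003)

  p.adjust(p_values, method = "holm")  # controls the family-wise error rate
  p.adjust(p_values, method = "BH")    # controls the false discovery rate (Benjamini-Hochberg)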

Guidance on using multinomial tests for differences in distribution

Some descriptive and causal research questions at OES involve comparing how two samples are distributed across multiple categories of a policy outcome or behavior. Alternatively, we may wish to compare how a benefit was distributed among a sample of beneficiaries relative to a well-defined target or eligible population. In either case, we may use a multinomial statistical test, such as a chi-squared test, to draw inferences about whether (1) two sub-samples were drawn from the same population or (2) the sample of beneficiaries of a program reflects the population of eligible individuals.
Using multinomial tests for differences in distribution (PDF)
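
As a hypothetical sketch of both comparisons in R (the data frame, counts, and shares below are invented for illustration):

  # (1) Were two sub-samples drawn from the same population?
  chisq.test(table(df$subsample, df$outcome_category))

  # (2) Does the beneficiary sample reflect the eligible population?
  beneficiaries   <- c(small = 120, medium = 300, large = 580)     # observed counts by category
  eligible_shares <- c(small = 0.20, medium = 0.35, large = 0.45)  # known population proportions
  chisq.test(x = beneficiaries, p = eligible_shares)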

Evaluation resources

Effect size and evaluation: The basics

An impact evaluation aims to detect and measure the effect of a program or policy on a priority outcome. To plan an evaluation, we need to decide how large or small an effect we want to be able to detect. This important decision influences all aspects of evaluation planning, including budget, operations, duration, and sample. This resource explains what effect sizes are and why they matter in designing an evaluation.
Effect size guide (PDF)
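
To make the trade-off concrete, the sketch below uses base R's power.prop.test() with hypothetical numbers: detecting a 2-percentage-point increase in take-up, from 18 percent to 20 percent, at conventional significance and power levels.

  # Sample size needed per arm to detect an 18% -> 20% change in take-up.
  power.prop.test(p1 = 0.18, p2 = 0.20, sig.level = 0.05, power = 0.80)

  # With the sample size fixed instead, the power available for that same effect.
  power.prop.test(n = 5000, p1 = 0.18, p2 = 0.20, sig.level = 0.05)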

Evidence reviews to support evidence-based policymaking

The Foundations for Evidence-Based Policymaking Act of 2018 (the Evidence Act) directs federal agencies to develop evidence to support policymaking. A crucial component of developing evidence is understanding what evidence already exists. This helps ensure that key lessons are incorporated into new and existing programming, and that resources for evidence-building activities are targeted toward the areas with the largest evidence gaps. This resource introduces a framework for conducting a review of existing evidence and provides additional resources for those seeking to conduct more systematic reviews.
Evidence reviews guide (PDF)

Preregistration as a tool for strengthening federal evaluation

To ensure that evaluation findings are reliable and that statistical results are well founded, evaluators must commit to specific design choices and analytic methods in advance. Making these details publicly available, a practice known as preregistration, promotes transparency and reduces the risk of inadvertently tailoring methods to obtain certain results or of selectively reporting positive results. This guidance paper describes the importance and benefits of preregistration and addresses concerns that federal evaluators might have.
Preregistration guide (PDF)

How to use unexpected and null results

Recent research shows that null results in federal evaluations are more common than we think and occur for a variety of reasons. When agencies share both expected and unexpected results, we can learn which programs work and what effect sizes are realistic, and improve federal evaluations overall. This post dispels misconceptions about null results and highlights the different ways they can be used and the lessons they offer.
Unexpected results guide (PDF)

Observational causal evaluations with quasi-experimental designs

This document provides helpful resources for OES team members engaged in observational, usually retrospective, causal projects. In particular, it is meant to support and augment conversations with agency partners, especially those unfamiliar with designs for such projects. It does not prescribe guidance to be followed during analysis; rather, it outlines OES's perspective on observational causal studies. We expect agency partners to work closely with OES team members on the details of their particular designs.
Resources for quasi-experimental designs (PDF)