What do we mean when we say “evidence-based” and “test”?
Because results from OES tests affect the lives of millions of Americans, the quality of our work is of paramount importance. We follow the OES Evaluation Policy and the six steps below to ensure our findings are relevant and reliable.
Step 1: Partner with Federal Agencies to target priority outcomes
In early conversations with collaborators, we identify the most important questions to answer in order to improve program implementation and performance, and we define a meaningful outcome at the outset. Agencies set out their priorities in their Congressional Justifications, Annual Performance Plans, Strategic Plans, Agency Priority Goals, Cross-Agency Priority Goals, Learning Agendas, and many other planning efforts. Each project is vetted for feasibility and for its potential impact on a key priority in a Federal program or policy.
Step 2: Translate evidence-based insights into concrete recommendations
Our collaborators, who are civil servants with years of experience delivering programs across the government, are experts on how their programs work and often have the best ideas for how to improve them. OES team members support their efforts by bringing diverse academic and applied expertise to understand program bottlenecks more deeply and by offering recommendations drawn from peer-reviewed evidence in the social and behavioral sciences.
Step 3: Embed tests using randomized evaluations
To OES, a “test” is an opportunity to truly learn what works, grounded in scientific methods.
- Whenever possible, we randomly assign individuals or groups either to a treatment condition (the evidence-based program change) or to a comparison condition. Because chance alone determines who receives the change, the groups are similar on average in every other respect, which is what allows us to conclude that differences in outcomes were actually caused by the program change(s) we tested.
- In designing tests, we give particular attention to statistical power: the probability that a test will detect an effect of a given size, assuming the program change truly has that effect. A test with too little power can easily miss a real improvement.
- Finally, one of the most important steps we take is committing to a detailed analysis plan before we begin working with the data. As the recent replication crisis in the social sciences has shown, when researchers allow themselves too much flexibility in analyzing data, they may obtain results that are unreliable and instead reflect inadvertent “fishing” or “p-hacking.”
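The two design ideas above, random assignment and statistical power, can be sketched in a few lines. This is a minimal illustration, not OES's own tooling: `randomize` and `power_two_sample` are hypothetical helper names, the sketch assumes complete random assignment of half the units to treatment with a recorded seed, and the power formula uses the normal approximation for a two-sided difference-in-means test with a known outcome standard deviation.

```python
import random
from statistics import NormalDist


def randomize(units, seed=2024):
    """Hypothetical helper: complete random assignment.

    Exactly len(units) // 2 units receive treatment (1); the rest are
    comparison units (0). Fixing the seed makes the assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    treated = set(shuffled[: len(shuffled) // 2])
    return {u: (1 if u in treated else 0) for u in units}


def power_two_sample(effect, sd, n_treat, n_control, alpha=0.05):
    """Hypothetical helper: approximate power of a two-sided difference-in-means test.

    Assumes a normal approximation and the same outcome standard deviation
    `sd` in both groups.
    """
    z = NormalDist()
    se = sd * (1 / n_treat + 1 / n_control) ** 0.5  # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)                # two-sided critical value
    shift = abs(effect) / se                         # expected test statistic under the effect
    # Probability the test statistic lands beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
```

For example, with 1,000 units per group, an outcome standard deviation of 1, and a true effect of 0.1, this approximation gives power of about 0.61; doubling the sample size raises it. When the true effect is zero, the formula returns `alpha`, the rate of false positives.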
Step 4: Analyze results using existing administrative data
OES team members work with agency collaborators to leverage existing data to measure the effect of a program change on a priority outcome of interest. Because OES seeks to embed tests into ongoing operations, new data collection is generally not an option: it can be costly and time-intensive, and it can require extensive approval processes. Administrative data collected by government entities, however, offer rich information that is often underutilized.
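Once administrative records have been linked to the random assignment, the simplest estimate of a program change's effect is the difference in average outcomes between the treatment and comparison groups. The sketch below is a hypothetical illustration, not the estimator from any particular OES analysis plan: it assumes each record is a `(treatment, outcome)` pair with treatment coded 0 or 1, uses the conventional unpooled standard error, and builds a normal-approximation confidence interval.

```python
from statistics import NormalDist, mean, variance


def difference_in_means(records, alpha=0.05):
    """Hypothetical helper: estimate a treatment effect from (treatment, outcome) pairs.

    Returns the difference in group means, its standard error, and a
    two-sided (1 - alpha) normal-approximation confidence interval.
    """
    treated = [y for d, y in records if d == 1]
    control = [y for d, y in records if d == 0]
    estimate = mean(treated) - mean(control)
    # Unpooled standard error: sample variance of each group divided by its size
    se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, se, (estimate - z * se, estimate + z * se)
```

For a toy dataset in which three of four treated units and one of four comparison units reach the outcome, the estimate is 0.5 with a standard error of about 0.354; real administrative datasets are far larger, and an analysis plan may specify covariate adjustment or other refinements.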
Step 5: Ensure our work meets evaluation best practice
In keeping with our team’s commitment to reproducibility, we conduct an internal replication that we call Reanalysis before we finalize an analysis. An independent reanalyst (an analyst who does not know the results of the initial analysis) writes new code to analyze the administrative data and independently generate results that address the study’s research objectives. Reanalysis serves as a check on (1) the computer code that the first analyst used to analyze the data, (2) any exploratory analyses that might have been conducted, and (3) any departures from the Analysis Plan that might have been necessary due to unanticipated features of the data. The reanalyst’s goal is to replicate the initial analysis from scratch, working only from the raw data and the Analysis Plan. Discrepancies between the two analyses are resolved through careful discussion, producing more reliable results.
Step 6: Measure impact and build evidence to continuously improve
As part of our commitment to transparency and learning, OES shares findings from every completed test. This helps ensure Federal collaborators can learn what works and, just as importantly, what does not. Results that are surprising or run counter to our expectations are just as important to share and often offer valuable lessons. Our first priority is producing materials that enable decision makers to quickly digest results and understand their implications in a policy-relevant time frame. We produce a range of project summary documents, from high-level summaries to one-page abstracts and presentations, that our agency collaborators can circulate among their program teams, peers, and leadership to facilitate learning and evidence-based policymaking.