When Is Randomization Right for Evaluation?

Public policy researchers grapple with choosing appropriate research designs to guide their work. Choices are typically influenced by the policy questions underlying the research and by numerous practical realities.

I advocate using randomized experiments whenever possible because they provide a high level of confidence in the results. But randomized experiments are not always possible and sometimes are inappropriate. This raises a provocative question: what criteria should researchers use to decide when to use an experimental design and when to look for alternatives?
 
When to Experiment?
 
A randomized experiment is appropriate under the following conditions:
 
1.     When getting an unbiased, causal estimate of the policy/program impact matters. Because random assignment makes the treatment and control groups alike on average in everything except access to the program, an experiment produces an unbiased, causal estimate of program impacts. When the research or policy question requires such information, an experimental evaluation design is the preferred one.
 
2.     When randomization is feasible, legal and ethical. The process of randomizing cases to treatment and control groups must be practically feasible, and creating a control group that is embargoed from program services for some period of time must be both legal and ethical. For example, researchers cannot exclude eligible applicants to an entitlement program from receiving that entitlement, but they could use an experiment to assess whether issuing a higher level of entitlement payment is more effective in achieving certain goals.
 
3.     When the evaluation is prospective. Usually, an experiment can only be implemented with advance planning, integrating randomization into the program intake process and allowing treatment and control group cases to move forward under their experimental conditions.
 
4.     When the resulting information will be timely enough to make a difference. Because an experiment is challenging to implement successfully, the value of its results for policy and program decisions must justify the effort, both in their content and in their timing.
 
5.     When the study population is a reasonable proxy for the population of interest. A common critique of experiments is that their findings are limited to the population under study. If a broad policy decision will be made based on the evaluation’s results, then the study population must not be so idiosyncratic as to prevent generalization, and the sample should be designed to support it.
 
6.     When the cost of the evaluation is commensurate with the value of the information produced. To the extent it can be gauged, the value of an evaluation’s results should exceed the cost of carrying out the research, a criterion that applies regardless of the selected evaluation design.
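
To make the first condition concrete, here is a minimal sketch in Python of how random assignment supports an unbiased impact estimate. Everything in it is an illustrative assumption rather than a description of any actual evaluation: the function names (assign_groups, difference_in_means, simulate), the simulated outcome scale, and the "true" impact of 5 points are all hypothetical.

```python
import random
import statistics


def assign_groups(case_ids, seed=0):
    """Randomly split cases into treatment and control groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def difference_in_means(outcomes, treatment, control):
    """Estimate the impact as mean(treatment) - mean(control), with its standard error."""
    treated = [outcomes[i] for i in treatment]
    untreated = [outcomes[i] for i in control]
    impact = statistics.mean(treated) - statistics.mean(untreated)
    # Standard error of a difference between two independent sample means.
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(untreated) / len(untreated)) ** 0.5
    return impact, se


def simulate(n=2000, true_impact=5.0, seed=1):
    """Simulate one hypothetical experiment with a known, assumed true impact."""
    rng = random.Random(seed)
    case_ids = range(n)
    treatment, control = assign_groups(case_ids, seed=seed)
    outcomes = {}
    for i in case_ids:
        baseline = rng.gauss(50.0, 10.0)  # assumed outcome without the program
        outcomes[i] = baseline + (true_impact if i in treatment else 0.0)
    return difference_in_means(outcomes, treatment, control)


if __name__ == "__main__":
    impact, se = simulate()
    print(f"estimated impact: {impact:.2f} (standard error {se:.2f})")
```

Because chance alone decides who lands in which group, the simple difference in mean outcomes should fall close to the assumed true impact, with the standard error indicating how far sampling noise might push it. A real evaluation would add much more, such as covariate adjustment, attention to attrition, and a pre-specified analysis plan.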
 
When Not to Experiment?
 
Even when all of the criteria above are met, an experimental evaluation design may still not be appropriate. An evaluator should not use a randomized experiment in the following circumstances:
 
1.     When effects are so large that causality is obvious. A central tenet of social science research is that correlation is not causation. That said, some associations are so large and so consistent that a causal interpretation is hard to dispute, such as the relationship between tobacco smoking and cancer. Randomizing who smokes (in addition to being impractical and unethical) is not necessary to establish a causal claim.
 
2.     When an alternative evaluation design could provide a highly reliable answer at a lower cost. A future blog post will consider alternative evaluation designs; for now, suffice it to say that there are conditions under which non-experimental evaluations are fitting. In those cases, if such an evaluation costs less than an experimental evaluation, then it would be appropriate to use it.
 
3.     When a program is not yet ready for an impact evaluation. Measuring the impacts of newly crafted, not-yet-fully-developed programs is premature. Impact evaluations, including experiments, are best used once programs reach a steady state of implementation.
 
4.     When evaluation questions are not about impact. Perhaps most obviously, an experimental evaluation cannot answer non-impact questions. Other evaluation designs and methods are more fitting for questions regarding program processes and operations or program outputs or outcomes.
 
These are a few guideposts for researchers to consider when deciding whether an experimental design is the right fit. When an experiment is not the right choice, another set of criteria is needed to select among alternative designs. That will be the topic of a future blog post, so please stay tuned!
 
In the meantime, listen to a recent conversation about the issues raised in this blog:


Comments
Laura Peck
I would love to hear from readers/listeners about any "when to" or "when not to" conditions that I have missed. Please share your thoughts!
5/16/2017 2:07:24 PM
