Impact evaluation asks a specific question: did a program change outcomes, compared with what would have happened otherwise? That unobserved “otherwise” is the counterfactual.
Different methods estimate that counterfactual under different assumptions. Choosing a method is therefore a design decision, not just a software decision.
Start with Design Logic
Before estimating models, define:
- Treatment and comparison groups
- Timing of intervention and measurement
- Primary outcomes
- Main sources of potential bias
If this logic is unclear, even advanced models can produce misleading conclusions.
Common Evaluation Approaches
Randomized Controlled Trials (RCTs)
Random assignment creates groups that are comparable on average, which supports strong causal claims when implementation preserves the randomization (for example, no differential attrition or contamination across arms).
Best suited when:
- Randomization is feasible and ethical
- Baseline and follow-up can be collected
- Spillovers can be managed or measured
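Because random assignment makes the comparison group a valid stand-in for the counterfactual, the effect estimate can be as simple as a difference in means. A minimal Python sketch with simulated data (the sample size, outcome distribution, and true effect of 2.0 are all invented for illustration):

```python
import random

random.seed(42)

# Simulate a trial: true program effect is 2.0 on an outcome centered at 10.
treat, comp = [], []
for _ in range(10_000):
    assigned = random.random() < 0.5            # coin-flip assignment
    y = random.gauss(10, 3) + (2.0 if assigned else 0.0)
    (treat if assigned else comp).append(y)

# Difference in means estimates the average treatment effect.
effect = sum(treat) / len(treat) - sum(comp) / len(comp)
print(round(effect, 2))  # close to the true 2.0
```

In a real trial the same difference in means would come with a standard error and, typically, covariate adjustment for precision; the point here is only that randomization is what licenses the comparison.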
Difference-in-Differences (DiD)
DiD compares outcome changes over time between treated and comparison groups.
Key requirement: the groups would have followed similar trends in the absence of treatment (parallel trends).
Regression Discontinuity Design (RDD)
RDD uses eligibility cutoffs (for example, a score threshold) to compare units just above and below the threshold.
Works well when cutoff rules are clearly enforced and cannot be manipulated easily.
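The intuition can be made concrete with a local linear fit on each side of the cutoff; the gap between the two fitted values at the threshold is the RDD estimate. A Python sketch with simulated data (the cutoff of 50, bandwidth of 10, and jump of 4.0 are illustrative assumptions, and real applications use more careful bandwidth and specification choices):

```python
import random

random.seed(1)

CUTOFF, BANDWIDTH, JUMP = 50.0, 10.0, 4.0

# Simulate: outcome rises smoothly with the score, plus a jump at the cutoff.
data = []
for _ in range(20_000):
    score = random.uniform(0, 100)
    y = 0.1 * score + (JUMP if score >= CUTOFF else 0.0) + random.gauss(0, 1)
    data.append((score, y))

def fit_at_cutoff(points):
    """OLS of y on the centered score; returns the fitted value at the cutoff."""
    xs = [s - CUTOFF for s, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (v - my) for x, v in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx  # intercept == prediction at centered score 0

below = [(s, y) for s, y in data if CUTOFF - BANDWIDTH <= s < CUTOFF]
above = [(s, y) for s, y in data if CUTOFF <= s <= CUTOFF + BANDWIDTH]
rdd_estimate = fit_at_cutoff(above) - fit_at_cutoff(below)
print(round(rdd_estimate, 2))  # close to the true jump of 4.0
```

The estimate is local: it applies to units near the threshold, which is exactly the external validity limit to report.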
Matching Methods
Matching can improve comparability on observed characteristics, but it does not solve bias from unobserved differences.
Use with caution and report limits transparently.
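A one-covariate nearest-neighbor sketch makes both the mechanism and the limit concrete: matching removes the imbalance on the observed covariate, but any unobserved driver of outcomes is untouched. All names and numbers below are invented for illustration:

```python
import random

random.seed(2)

# Simulate: outcome depends on observed age and on treatment (true effect 1.5).
def make_unit(treated):
    age = random.gauss(40 if treated else 35, 5)   # treated units are older
    y = 0.2 * age + (1.5 if treated else 0.0) + random.gauss(0, 1)
    return age, y

treated = [make_unit(True) for _ in range(500)]
controls = [make_unit(False) for _ in range(5_000)]

# Match each treated unit to the control with the closest age.
diffs = []
for age_t, y_t in treated:
    age_c, y_c = min(controls, key=lambda c: abs(c[0] - age_t))
    diffs.append(y_t - y_c)

att = sum(diffs) / len(diffs)
print(round(att, 1))  # near the true 1.5 once the age imbalance is removed
```

A raw difference in means here would be biased upward by the age gap; matching fixes that, but only because age was observed. If motivation or ability drove both enrollment and outcomes, the matched estimate would still be biased.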
Example: Basic DiD in Stata
* Outcome on the treated x post interaction, with cluster-robust SEs
reg outcome treated##post, vce(cluster cluster_id)
* With covariates
reg outcome treated##post age education baseline_outcome, vce(cluster cluster_id)
The coefficient on the treated-by-post interaction is the DiD estimate; it has a causal interpretation only under the design assumptions, most importantly parallel trends.
Diagnostic Questions to Ask
- Are treatment and comparison groups defined before seeing outcomes?
- Is timing aligned with plausible program effects?
- Are standard errors clustered at the correct level?
- Are robustness checks pre-specified or clearly justified?
- Are design limits explained in plain language?
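The clustering question above matters because units in the same cluster share shocks, so treating them as independent understates uncertainty. A pure-Python illustration of the direction of the error (cluster counts, sizes, and variances are invented; this compares a naive iid standard error of a mean with one built from cluster means):

```python
import random

random.seed(3)

# Simulate 50 clusters of 40 units, each cluster sharing a common shock.
clusters = []
for _ in range(50):
    shock = random.gauss(0, 1)                 # within-cluster common component
    clusters.append([shock + random.gauss(0, 1) for _ in range(40)])

flat = [y for c in clusters for y in c]
n = len(flat)
grand_mean = sum(flat) / n

# Naive iid standard error of the mean (ignores clustering).
var = sum((y - grand_mean) ** 2 for y in flat) / (n - 1)
se_naive = (var / n) ** 0.5

# Cluster-level standard error: treat each cluster mean as one observation.
cmeans = [sum(c) / len(c) for c in clusters]
cm = sum(cmeans) / len(cmeans)
cvar = sum((m - cm) ** 2 for m in cmeans) / (len(cmeans) - 1)
se_cluster = (cvar / len(cmeans)) ** 0.5

print(se_naive < se_cluster)  # True: the iid formula understates uncertainty
```

This is why standard errors should be clustered at the level of treatment assignment (for example, villages or schools), not at the individual level.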
Reporting With Integrity
A strong evaluation report includes:
- Design rationale
- Assumptions and their plausibility
- Main and robustness estimates
- Limitations and external validity constraints
- Policy implications with realistic scope
Impact evaluation is most useful when methods, assumptions, and interpretation stay aligned.