The Importance of Diagnostics
Credible Difference-in-Differences (DiD) requires systematic diagnostics. The goal is to assess identifying assumptions, verify the robustness of conclusions, and check the influence of specific modeling choices. Section 4.8 emphasizes a structured diagnostic workflow tailored to modern staggered-adoption DiD.
Pre-Trend Assessments
Pre-trend checks diagnose whether treated and control groups were on parallel trajectories before the treatment.
- Visual inspection: Always plot pre-treatment outcomes.
- Event-study specification: Estimate multiple pre-treatment leads. Lead coefficients near zero are necessary evidence for parallel trends, but they do not prove that parallel trends would have held in the post-treatment period.
- Joint Wald tests: Test the null hypothesis that all lead coefficients are zero. Be sure to use cluster-robust standard errors.
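The event-study leads and joint Wald test above can be sketched as follows. This is a minimal numpy/scipy illustration on simulated data (all variable names and the data-generating process are hypothetical); it omits small-sample corrections that production software such as statsmodels would apply.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated panel: 60 units observed at event times -4..2; the first half is
# treated starting at t = 0, and the data satisfy parallel pre-trends by design.
n_units = 60
periods = np.arange(-4, 3)
unit = np.repeat(np.arange(n_units), len(periods))
t = np.tile(periods, n_units)
treated = np.repeat(np.arange(n_units) < n_units // 2, len(periods))
y = 0.5 * t + 2.0 * treated + 1.0 * (treated & (t >= 0)) + rng.normal(0, 1, t.size)

# Event-study design: intercept, common trend, treated dummy, and one dummy per
# event time for treated units, with t = -1 as the omitted reference period.
event_times = [k for k in periods if k != -1]
X = np.column_stack(
    [np.ones(t.size), t.astype(float), treated.astype(float)]
    + [((t == k) & treated).astype(float) for k in event_times]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Cluster-robust covariance, clustered by unit (no small-sample correction).
e = y - X @ beta
bread = np.linalg.inv(X.T @ X)
meat = np.zeros((X.shape[1], X.shape[1]))
for g in range(n_units):
    score = X[unit == g].T @ e[unit == g]
    meat += np.outer(score, score)
V = bread @ meat @ bread

# Joint Wald test that all lead coefficients (event times before -1) are zero.
lead_cols = [3 + i for i, k in enumerate(event_times) if k < -1]
b = beta[lead_cols]
W = float(b @ np.linalg.solve(V[np.ix_(lead_cols, lead_cols)], b))
p = float(stats.chi2.sf(W, df=len(lead_cols)))
print(f"joint Wald on {len(lead_cols)} leads: W = {W:.2f}, p = {p:.3f}")
```

Because the simulated data impose parallel pre-trends, the lead coefficients should be small relative to their cluster-robust standard errors; the same code applied to real data provides the formal check described above.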
What if pre-trends are present?
If pre-trends appear non-zero:
- Try conditioning on observed covariates to see if imbalance explains the differential trend.
- Consider factor models or synthetic control variants to accommodate latent trends.
- Apply Rambachan–Roth bounds to conduct a sensitivity analysis. This quantifies how much post-treatment estimates would shift assuming pre-trends continued into the post-period.
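To make the sensitivity-analysis idea concrete, here is a deliberately simplified, illustrative bound in the spirit of Rambachan–Roth: assume post-period violations of parallel trends drift by at most the largest observed pre-period deviation per period. This is *not* the full Rambachan–Roth procedure (which is implemented, for example, in the HonestiD-style R tooling accompanying their paper); all numbers below are hypothetical.

```python
def sensitivity_bounds(post_estimate, horizon, M):
    """Bound a post-period estimate assuming parallel-trend violations drift
    by at most M per period -- a simplified relative-magnitudes sketch, not
    the actual Rambachan-Roth confidence-set construction."""
    drift = M * horizon
    return post_estimate - drift, post_estimate + drift

pre_leads = [-0.02, 0.05, -0.04]        # hypothetical lead coefficients
M = max(abs(b) for b in pre_leads)      # worst observed per-period deviation
lo, hi = sensitivity_bounds(1.10, horizon=3, M=M)
print(f"estimate 1.10 under max drift M={M:.2f}: [{lo:.2f}, {hi:.2f}]")
```

If the bounded interval still excludes zero, the conclusion survives the assumed continuation of pre-trends; if not, the finding hinges on exact parallel trends.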
Placebo Tests
Placebo tests help detect spurious effects by applying DiD logic where no actual treatment exists.
- Placebo-in-time: Treat a pre-treatment period as if it were the intervention date. An estimate significantly different from zero suggests the design itself can generate spurious effects.
- Placebo-in-units: Randomly assign never-treated units into a fictitious “treated” group, leaving the rest as controls. Repeated many times, this builds a null distribution. If the actual treatment estimate falls within the middle 95% of these placebo estimates, the evidence for a genuine effect is weak.
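The placebo-in-units procedure can be sketched with a simple permutation loop. The data and the "actual" estimate below are hypothetical stand-ins; the 2x2 estimator is the simplest possible DiD for clarity.

```python
import numpy as np

rng = np.random.default_rng(1)

def did_estimate(pre, post, treated_mask):
    """Simple 2x2 DiD: (post-pre change for treated) minus (change for controls)."""
    change = post - pre
    return change[treated_mask].mean() - change[~treated_mask].mean()

# Hypothetical never-treated pool: 40 control units, no true effect anywhere.
n = 40
pre = rng.normal(0, 1, n)
post = pre + rng.normal(0.1, 1, n)   # common shock, no treatment effect

# Build a null distribution by randomly labelling half the units "treated".
placebo = np.array([
    did_estimate(pre, post, np.isin(np.arange(n),
                                    rng.choice(n, n // 2, replace=False)))
    for _ in range(2000)
])

# Compare an actual estimate against the middle 95% of placebo estimates.
actual = 0.9  # hypothetical estimate from the real treated group
lo, hi = np.percentile(placebo, [2.5, 97.5])
print(f"placebo 95% band: [{lo:.2f}, {hi:.2f}]; actual outside: {not lo <= actual <= hi}")
```

An actual estimate well outside the placebo band is hard to attribute to the mechanics of the design alone.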
Covariate Balance and Overlap
Check whether the treated and control units are properly matched on observables.
- Standardised Mean Differences (SMDs): Since SMDs are already expressed in standard-deviation units, an absolute SMD above roughly 0.1 to 0.2 warrants attention. However, this benchmark varies with how strongly the covariate predicts the outcome.
- Adjustments: If balance is initially poor, re-weight, match on covariates, or use them as controls in the outcome regression. Keep in mind this addresses observed factors, but unobserved confounding may remain.
- Regression Check: Assess the $R^2$ from regressing the treatment indicator on covariates. A high $R^2$ means treatment assignment is strongly predicted by observables, signalling a high risk of confounding.
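Both balance checks fit in a few lines of numpy. The covariate values and sample sizes below are hypothetical; the SMD uses the standard pooled-SD denominator, and the $R^2$ comes from a linear probability regression of treatment on the covariate.

```python
import numpy as np

rng = np.random.default_rng(2)

def smd(x_treat, x_ctrl):
    """Standardised mean difference: difference in means over the pooled SD."""
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_ctrl.var(ddof=1)) / 2)
    return (x_treat.mean() - x_ctrl.mean()) / pooled_sd

# Hypothetical covariate for 50 treated and 70 control units (imbalanced by 0.3 SD).
x_treat = rng.normal(0.3, 1, 50)
x_ctrl = rng.normal(0.0, 1, 70)
print(f"SMD = {smd(x_treat, x_ctrl):.2f}")

# R^2 from regressing the treatment indicator on the covariate(s).
d = np.r_[np.ones(50), np.zeros(70)]
x = np.r_[x_treat, x_ctrl]
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, d, rcond=None)
r2 = 1 - (d - X @ beta).var() / d.var()
print(f"R^2 of treatment on covariates = {r2:.3f}")
```

With many covariates, stack them as additional columns of `X` and report the SMD for each covariate separately alongside the joint $R^2$.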
Influence and Robustness
Modern DiD estimators aggregate various cohort-time comparisons. Ensure conclusions are not driven by extreme observations.
- Leave-one-cohort-out: Exclude each cohort individually and re-estimate. If doing so shifts the estimate by roughly 25% or more, or flips the sign/significance, investigate that cohort.
- Leave-one-period-out: This reveals whether a single transient period (e.g., an outlier in early post-treatment) drives the overall estimate.
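The leave-one-cohort-out loop, with the 25% flagging rule from above, can be sketched generically. Everything here is a hypothetical stand-in: `estimate_fn` would be your full DiD pipeline, and the per-cohort effects are toy numbers with cohort 2005 as a deliberate outlier.

```python
import numpy as np

def leave_one_cohort_out(estimate_fn, data, cohorts):
    """Re-estimate the aggregate effect dropping each cohort in turn; flag
    cohorts whose removal moves the estimate by roughly 25% or more."""
    full = estimate_fn(data)
    flagged = {}
    for c in cohorts:
        est = estimate_fn([row for row in data if row["cohort"] != c])
        if abs(est - full) > 0.25 * abs(full):
            flagged[c] = est
    return full, flagged

# Hypothetical per-cohort effects; cohort 2005 is an outlier.
data = [{"cohort": c, "effect": e} for c, e in
        [(2003, 1.0), (2003, 1.2), (2004, 0.9), (2004, 1.1), (2005, 3.0)]]
mean_effect = lambda rows: np.mean([r["effect"] for r in rows])
full, flagged = leave_one_cohort_out(mean_effect, data, {2003, 2004, 2005})
print(f"full estimate {full:.2f}; influential cohorts: {flagged}")
```

The same loop with `row["period"]` implements leave-one-period-out.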
Specification Curves
To demonstrate that a significant finding isn’t merely an artifact of an arbitrary analytical choice, summarize the stability of findings across many defensible specifications:
- Alter the pool of control units (e.g., never-treated vs. not-yet-treated).
- Vary covariate adjustments, fixed effects, estimating windows, and binning.
- Compare multiple estimators (e.g., Callaway–Sant’Anna, Sun–Abraham, Borusyak–Jaravel–Spiess).
Plotting the distribution of these aggregate estimates yields a specification curve. A tight cluster across variations indicates robust conclusions, while wide variance highlights sensitivity to modeling choices.
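The scaffolding for a specification curve is a loop over the cross-product of analytical choices. The `estimate` function below is a hypothetical stand-in for a full DiD pipeline (its spec-dependent shift and noise are simulated); in practice it would dispatch to the chosen estimator, control group, covariate set, and window.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)

# Hypothetical analytical choices; each combination is one specification.
controls = ["never_treated", "not_yet_treated"]
covariates = [True, False]
windows = [4, 6, 8]
estimators = ["callaway_santanna", "sun_abraham", "bjs_imputation"]

def estimate(i):
    """Stand-in for a full DiD pipeline: a true effect of 1.0 plus a small
    deterministic spec-dependent shift and simulated estimation noise."""
    return 1.0 + 0.01 * (i % 5) + rng.normal(0, 0.03)

specs = list(product(controls, covariates, windows, estimators))
curve = sorted(estimate(i) for i, _ in enumerate(specs))
print(f"{len(curve)} specifications; range [{curve[0]:.2f}, {curve[-1]:.2f}]")
```

Plotting `curve` as a sorted dot plot, with markers indicating which choices each specification made, gives the standard specification-curve figure; a narrow range signals robustness.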