The Importance of Diagnostics

Credible Difference-in-Differences (DiD) requires systematic diagnostics. The goal is to assess identifying assumptions, verify the robustness of conclusions, and check the influence of specific modeling choices. Section 4.8 emphasizes a structured diagnostic workflow tailored to modern staggered-adoption DiD.

Pre-Trend Assessments

Pre-trend checks diagnose whether treated and control groups were on parallel trajectories before the treatment.

  • Visual inspection: Always plot pre-treatment outcomes.
  • Event-study specification: Estimate multiple pre-treatment leads. Leads that are small and precisely estimated support parallel trends in the pre-period; noisy leads may simply reflect low power. Either way, clean pre-trends do not prove that parallel trends hold in the post-treatment period.
  • Joint Wald tests: Test the null hypothesis that all lead coefficients are zero. Be sure to use cluster-robust standard errors.
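The event-study and joint-test steps above can be sketched as follows. This is a minimal illustration on simulated data (a single hypothetical treatment cohort starting at period 5; all variable names are made up), estimating lead/lag dummies with unit and time fixed effects, cluster-robust standard errors, and a joint Wald test on the leads:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical two-group panel: treatment begins at t = 5 for the first half
# of units, with a true post-treatment effect of 2.0 and a common time trend.
n_units, n_periods, t0 = 40, 10, 5
rows = []
for i in range(n_units):
    treated = int(i < n_units // 2)
    for t in range(n_periods):
        y = 0.5 * t + rng.normal()           # common trend + noise
        if treated and t >= t0:
            y += 2.0                          # true effect
        rows.append({"unit": i, "t": t, "treated": treated, "y": y})
df = pd.DataFrame(rows)

# Event-time dummies for the treated group, omitting event time -1 (reference).
event_terms = []
for k in range(-t0, n_periods - t0):
    if k == -1:
        continue
    name = f"ev_m{-k}" if k < 0 else f"ev_p{k}"
    df[name] = ((df["t"] - t0 == k) & (df["treated"] == 1)).astype(float)
    event_terms.append(name)

formula = "y ~ C(unit) + C(t) + " + " + ".join(event_terms)
fit = smf.ols(formula, data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]}  # cluster-robust SEs
)

# Joint Wald test that all pre-treatment leads are zero.
leads = [n for n in event_terms if n.startswith("ev_m")]
wald = fit.wald_test(", ".join(f"{n} = 0" for n in leads))
print(f"joint lead p-value: {float(wald.pvalue):.3f}")
print(f"effect at event time 0: {fit.params['ev_p0']:.2f}")
```

With staggered adoption one would instead interact cohort-specific event times, but the logic of the joint test on leads is the same.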

If pre-trends appear non-zero:

  1. Try conditioning on observed covariates to see if imbalance explains the differential trend.
  2. Consider factor models or synthetic control variants to accommodate latent trends.
  3. Apply Rambachan–Roth bounds as a sensitivity analysis: quantify how far post-treatment estimates would shift if the pre-trend continued into the post-period, and how severe such a violation could be before the conclusions change.
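The extrapolation logic behind step 3 can be illustrated with a stylized calculation (this is not the full Rambachan–Roth procedure, which bounds a whole class of violations; all coefficient values here are hypothetical):

```python
import numpy as np

# Stylized sensitivity check: fit a linear slope to the estimated leads and
# ask how much of the post-treatment effect survives if that differential
# trend continued into the post-period. Numbers are hypothetical.
pre_coefs = np.array([-0.15, -0.10, -0.05])   # leads at event times -4, -3, -2
pre_times = np.array([-4, -3, -2])
post_coefs = np.array([1.8, 2.0, 2.1])        # lags at event times 0, 1, 2
post_times = np.array([0, 1, 2])

slope = np.polyfit(pre_times, pre_coefs, 1)[0]   # linear pre-trend slope

# Relative to the omitted period -1, a continued trend would inflate the
# coefficient at event time k by slope * (k + 1); subtract that off.
adjusted = post_coefs - slope * (post_times + 1)
print("pre-trend slope:", round(slope, 3))        # 0.05 here
print("adjusted post effects:", np.round(adjusted, 3))
```

If the adjusted effects remain economically meaningful, the headline conclusion is less fragile to a continuation of the pre-trend.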

Placebo Tests

Placebo tests help detect spurious effects by applying DiD logic where no actual treatment exists.

  • Placebo-in-time: Treat a pre-treatment period as if it were the intervention date. An estimate significantly different from zero implies the design itself might generate phantom effects.
  • Placebo-in-units: Randomly assign never-treated units to a fictitious “treated” group, leaving the rest as controls. Repeating this many times builds a null distribution. If the actual treatment estimate falls within the middle 95% of the placebo estimates, it cannot be distinguished from placebo noise.
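The placebo-in-units procedure can be sketched as a small permutation exercise on simulated never-treated data (the headline estimate of 1.5 below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Never-treated panel with a common trend but no treatment anywhere.
n_units, n_pre, n_post = 60, 5, 5
y = 0.4 * np.arange(n_pre + n_post) + rng.normal(size=(n_units, n_pre + n_post))

def did(y, treated_idx):
    """Simple diff-in-means DiD for a fictitious treated group."""
    t = y[treated_idx]
    c = np.delete(y, treated_idx, axis=0)
    return (t[:, n_pre:].mean() - t[:, :n_pre].mean()) - (
        c[:, n_pre:].mean() - c[:, :n_pre].mean()
    )

# Repeatedly draw fictitious "treated" groups to build a null distribution.
placebo = np.array([
    did(y, rng.choice(n_units, size=n_units // 2, replace=False))
    for _ in range(500)
])

actual = 1.5                                   # hypothetical headline estimate
lo, hi = np.quantile(placebo, [0.025, 0.975])
print(f"placebo 95% band: [{lo:.2f}, {hi:.2f}]")
print("outside band:", actual < lo or actual > hi)
```

An actual estimate well outside the placebo band is hard to attribute to the mechanics of the design alone.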

Covariate Balance and Overlap

Check whether the treated and control units are properly matched on observables.

  • Standardised Mean Differences (SMDs): An SMD of roughly 0.1 to 0.2 or higher warrants attention (SMDs are already expressed in standard-deviation units). The appropriate threshold also depends on how strongly the covariate predicts the outcome: demand tighter balance on strong predictors.
  • Adjustments: If balance is initially poor, re-weight, match on covariates, or use them as controls in the outcome regression. Keep in mind this addresses observed factors, but unobserved confounding may remain.
  • Regression Check: Assess the $R^2$ from regressing the treatment indicator on the covariates. A high $R^2$ indicates that treatment assignment is strongly predicted by observables, signalling an elevated risk of confounding.
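The SMD calculation is simple enough to write directly. A minimal sketch on hypothetical covariates, where the first covariate is deliberately imbalanced:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical covariates: treated units have a shifted mean on the first
# covariate, so its SMD should land in the "warrants attention" range.
treated = rng.normal(loc=[0.5, 0.0], scale=1.0, size=(200, 2))
control = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(300, 2))

def smd(a, b):
    """Standardised mean difference per covariate, using the pooled SD."""
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / pooled_sd

print("SMDs:", np.round(smd(treated, control), 2))
```

Here the first SMD should sit near 0.5 and the second near zero, flagging the first covariate for re-weighting, matching, or regression adjustment.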

Influence and Robustness

Modern DiD estimators aggregate various cohort-time comparisons. Ensure conclusions are not driven by extreme observations.

  • Leave-one-cohort-out: Exclude each cohort individually and re-estimate. If doing so shifts the estimate by roughly 25% or more, or flips the sign/significance, investigate that cohort.
  • Leave-one-period-out: This reveals whether a single transient period (e.g., an outlier in early post-treatment) overrides the entire effect.
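The leave-one-cohort-out loop can be sketched with hypothetical cohort-level estimates (all cohort years, effects, and weights below are made up; the 25% threshold is the rule of thumb from above):

```python
import numpy as np
import pandas as pd

# Hypothetical cohort-level estimates; the 2008 cohort is a deliberate outlier.
est = pd.DataFrame({
    "cohort": [2005, 2006, 2007, 2008],
    "effect": [1.9, 2.1, 2.0, 6.0],
    "weight": [0.25, 0.25, 0.25, 0.25],
})

full = np.average(est["effect"], weights=est["weight"])
for c in est["cohort"]:
    sub = est[est["cohort"] != c]
    loo = np.average(sub["effect"], weights=sub["weight"])
    shift = abs(loo - full) / abs(full)
    flag = "  <-- investigate" if shift >= 0.25 else ""
    print(f"drop {c}: estimate {loo:.2f} (shift {shift:.0%}){flag}")
```

In practice `effect` and `weight` would come from a staggered-adoption estimator's cohort-level output rather than being typed in by hand; leave-one-period-out works the same way over periods.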

Specification Curves

To demonstrate that a significant finding isn’t merely an artifact of an arbitrary analytical choice, summarize the stability of results across many defensible specifications:

  • Alter the pool of control units (e.g., never-treated vs. not-yet-treated).
  • Vary covariate adjustments, fixed effects, estimating windows, and binning.
  • Compare multiple estimators (e.g., Callaway–Sant’Anna, Sun–Abraham, Borusyak–Jaravel–Spiess).

Plotting the distribution of these aggregate estimates yields a specification curve. A tight cluster across variations indicates robust conclusions, while wide variance highlights sensitivity to modeling choices.
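A minimal sketch of the specification-curve loop on simulated data, varying a few defensible choices (control pool, estimation window, length of the pre-period used for differencing). The choice labels are hypothetical, and a simple diff-in-means DiD stands in for the full estimators named above:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)

# Simulated panel: first half treated at t = 5 with a true effect of 2.0.
n, T, t0, tau = 80, 10, 5, 2.0
treated = np.arange(n) < n // 2
y = 0.5 * np.arange(T) + rng.normal(size=(n, T))
y[treated, t0:] += tau

def did(y, ctrl_mask, t_end, n_pre):
    """Diff-in-means DiD for one specification."""
    pre, post = slice(t0 - n_pre, t0), slice(t0, t_end)
    tr, ct = y[treated], y[ctrl_mask]
    return (tr[:, post].mean() - tr[:, pre].mean()) - (
        ct[:, post].mean() - ct[:, pre].mean()
    )

# Vary the control pool, the end of the window, and the pre-period length.
ctrl_pools = {"all_controls": ~treated,
              "late_controls": np.arange(n) >= 3 * n // 4}
estimates = [
    did(y, pool, t_end, n_pre)
    for (name, pool), t_end, n_pre in product(ctrl_pools.items(), [8, 10], [2, 5])
]
print(f"{len(estimates)} specs: median {np.median(estimates):.2f}, "
      f"range [{min(estimates):.2f}, {max(estimates):.2f}]")
```

Sorting and plotting `estimates` against their specification labels yields the specification curve; in a real application each entry would be one estimator–sample–adjustment combination.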