The Importance of Diagnostics

Credible Difference-in-Differences (DiD) requires systematic diagnostics. The goal is to assess identifying assumptions, verify the robustness of conclusions, and check the influence of specific modeling choices. Section 4.8 emphasizes a structured diagnostic workflow tailored to modern staggered-adoption DiD.

Pre-Trend Assessments

Pre-trend checks diagnose whether treated and control groups were on parallel trajectories before the treatment.

  • Visual inspection: Always plot pre-treatment outcomes.
  • Event-study specification: Estimate multiple pre-treatment leads. Leads that are small and precisely estimated support parallel trends in the pre-period; noisy leads may simply reflect low power. Either way, clean pre-trends do not prove that parallel trends hold in the post-treatment period.
  • Joint Wald tests: Test the null hypothesis that all lead coefficients are zero. Be sure to use cluster-robust standard errors.
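The event-study and joint-test steps above can be sketched as follows. This is a minimal illustration on simulated data (a single hypothetical treatment cohort starting at period 5; all variable names are made up), estimating lead/lag dummies with unit and time fixed effects, cluster-robust standard errors, and a joint Wald test on the leads:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical two-group panel: treatment begins at t = 5 for the first half
# of units, with a true post-treatment effect of 2.0 and a common time trend.
n_units, n_periods, t0 = 40, 10, 5
rows = []
for i in range(n_units):
    treated = int(i < n_units // 2)
    for t in range(n_periods):
        y = 0.5 * t + rng.normal()           # common trend + noise
        if treated and t >= t0:
            y += 2.0                          # true effect
        rows.append({"unit": i, "t": t, "treated": treated, "y": y})
df = pd.DataFrame(rows)

# Event-time dummies for the treated group, omitting event time -1 (reference).
event_terms = []
for k in range(-t0, n_periods - t0):
    if k == -1:
        continue
    name = f"ev_m{-k}" if k < 0 else f"ev_p{k}"
    df[name] = ((df["t"] - t0 == k) & (df["treated"] == 1)).astype(float)
    event_terms.append(name)

formula = "y ~ C(unit) + C(t) + " + " + ".join(event_terms)
fit = smf.ols(formula, data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]}  # cluster-robust SEs
)

# Joint Wald test that all pre-treatment leads are zero.
leads = [n for n in event_terms if n.startswith("ev_m")]
wald = fit.wald_test(", ".join(f"{n} = 0" for n in leads))
print(f"joint lead p-value: {float(wald.pvalue):.3f}")
print(f"effect at event time 0: {fit.params['ev_p0']:.2f}")
```

With staggered adoption one would instead interact cohort-specific event times, but the logic of the joint test on leads is the same.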

If pre-trends appear non-zero:

  1. Try conditioning on observed covariates to see if imbalance explains the differential trend.
  2. Consider factor models or synthetic control variants to accommodate latent trends.
  3. Apply Rambachan–Roth bounds as a sensitivity analysis: quantify how far post-treatment estimates would shift if the pre-trend continued into the post-period, and how severe such a violation could be before the conclusions change.
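The extrapolation logic behind step 3 can be illustrated with a stylized calculation (this is not the full Rambachan–Roth procedure, which bounds a whole class of violations; all coefficient values here are hypothetical):

```python
import numpy as np

# Stylized sensitivity check: fit a linear slope to the estimated leads and
# ask how much of the post-treatment effect survives if that differential
# trend continued into the post-period. Numbers are hypothetical.
pre_coefs = np.array([-0.15, -0.10, -0.05])   # leads at event times -4, -3, -2
pre_times = np.array([-4, -3, -2])
post_coefs = np.array([1.8, 2.0, 2.1])        # lags at event times 0, 1, 2
post_times = np.array([0, 1, 2])

slope = np.polyfit(pre_times, pre_coefs, 1)[0]   # linear pre-trend slope

# Relative to the omitted period -1, a continued trend would inflate the
# coefficient at event time k by slope * (k + 1); subtract that off.
adjusted = post_coefs - slope * (post_times + 1)
print("pre-trend slope:", round(slope, 3))        # 0.05 here
print("adjusted post effects:", np.round(adjusted, 3))
```

If the adjusted effects remain economically meaningful, the headline conclusion is less fragile to a continuation of the pre-trend.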

Placebo Tests

Placebo tests help detect spurious effects by applying DiD logic where no actual treatment exists.

  • Placebo-in-time: Treat a pre-treatment period as if it were the intervention date. An estimate significantly different from zero implies the design itself might generate phantom effects.
  • Placebo-in-units: Randomly assign never-treated units to a fictitious “treated” group, leaving the rest as controls. Repeating this many times builds a null distribution. If the actual treatment estimate falls within the middle 95% of the placebo estimates, it cannot be distinguished from placebo noise.
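The placebo-in-units procedure can be sketched as a small permutation exercise on simulated never-treated data (the headline estimate of 1.5 below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Never-treated panel with a common trend but no treatment anywhere.
n_units, n_pre, n_post = 60, 5, 5
y = 0.4 * np.arange(n_pre + n_post) + rng.normal(size=(n_units, n_pre + n_post))

def did(y, treated_idx):
    """Simple diff-in-means DiD for a fictitious treated group."""
    t = y[treated_idx]
    c = np.delete(y, treated_idx, axis=0)
    return (t[:, n_pre:].mean() - t[:, :n_pre].mean()) - (
        c[:, n_pre:].mean() - c[:, :n_pre].mean()
    )

# Repeatedly draw fictitious "treated" groups to build a null distribution.
placebo = np.array([
    did(y, rng.choice(n_units, size=n_units // 2, replace=False))
    for _ in range(500)
])

actual = 1.5                                   # hypothetical headline estimate
lo, hi = np.quantile(placebo, [0.025, 0.975])
print(f"placebo 95% band: [{lo:.2f}, {hi:.2f}]")
print("outside band:", actual < lo or actual > hi)
```

An actual estimate well outside the placebo band is hard to attribute to the mechanics of the design alone.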

Covariate Balance and Overlap

Check whether the treated and control units are properly matched on observables.

  • Standardised Mean Differences (SMDs): An SMD of roughly 0.1 to 0.2 or higher warrants attention (SMDs are already expressed in standard-deviation units). The appropriate threshold also depends on how strongly the covariate predicts the outcome: demand tighter balance on strong predictors.
  • Adjustments: If balance is initially poor, re-weight, match on covariates, or use them as controls in the outcome regression. Keep in mind this addresses observed factors, but unobserved confounding may remain.
  • Regression Check: Assess the $R^2$ from regressing the treatment indicator on the covariates. A high $R^2$ indicates that treatment assignment is strongly predicted by observables, signalling an elevated risk of confounding.
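The SMD calculation is simple enough to write directly. A minimal sketch on hypothetical covariates, where the first covariate is deliberately imbalanced:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical covariates: treated units have a shifted mean on the first
# covariate, so its SMD should land in the "warrants attention" range.
treated = rng.normal(loc=[0.5, 0.0], scale=1.0, size=(200, 2))
control = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(300, 2))

def smd(a, b):
    """Standardised mean difference per covariate, using the pooled SD."""
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / pooled_sd

print("SMDs:", np.round(smd(treated, control), 2))
```

Here the first SMD should sit near 0.5 and the second near zero, flagging the first covariate for re-weighting, matching, or regression adjustment.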

Influence and Robustness

Modern DiD estimators aggregate various cohort-time comparisons. Ensure conclusions are not driven by extreme observations.

  • Leave-one-cohort-out: Exclude each cohort individually and re-estimate. If doing so shifts the estimate by roughly 25% or more, or flips the sign/significance, investigate that cohort.
  • Leave-one-period-out: This reveals whether a single transient period (e.g., an outlier in early post-treatment) overrides the entire effect.
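The leave-one-cohort-out loop can be sketched with hypothetical cohort-level estimates (all cohort years, effects, and weights below are made up; the 25% threshold is the rule of thumb from above):

```python
import numpy as np
import pandas as pd

# Hypothetical cohort-level estimates; the 2008 cohort is a deliberate outlier.
est = pd.DataFrame({
    "cohort": [2005, 2006, 2007, 2008],
    "effect": [1.9, 2.1, 2.0, 6.0],
    "weight": [0.25, 0.25, 0.25, 0.25],
})

full = np.average(est["effect"], weights=est["weight"])
for c in est["cohort"]:
    sub = est[est["cohort"] != c]
    loo = np.average(sub["effect"], weights=sub["weight"])
    shift = abs(loo - full) / abs(full)
    flag = "  <-- investigate" if shift >= 0.25 else ""
    print(f"drop {c}: estimate {loo:.2f} (shift {shift:.0%}){flag}")
```

In practice `effect` and `weight` would come from a staggered-adoption estimator's cohort-level output rather than being typed in by hand; leave-one-period-out works the same way over periods.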

Specification Curves

To demonstrate that a significant finding isn’t merely an artifact of an arbitrary analytical choice, summarize the stability of results across many defensible specifications:

  • Alter the pool of control units (e.g., never-treated vs. not-yet-treated).
  • Vary covariate adjustments, fixed effects, estimating windows, and binning.
  • Compare multiple estimators (e.g., Callaway–Sant’Anna, Sun–Abraham, Borusyak–Jaravel–Spiess).

Plotting the distribution of these aggregate estimates yields a specification curve. A tight cluster across variations indicates robust conclusions, while wide variance highlights sensitivity to modeling choices.
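A minimal sketch of the specification-curve loop on simulated data, varying a few defensible choices (control pool, estimation window, length of the pre-period used for differencing). The choice labels are hypothetical, and a simple diff-in-means DiD stands in for the full estimators named above:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)

# Simulated panel: first half treated at t = 5 with a true effect of 2.0.
n, T, t0, tau = 80, 10, 5, 2.0
treated = np.arange(n) < n // 2
y = 0.5 * np.arange(T) + rng.normal(size=(n, T))
y[treated, t0:] += tau

def did(y, ctrl_mask, t_end, n_pre):
    """Diff-in-means DiD for one specification."""
    pre, post = slice(t0 - n_pre, t0), slice(t0, t_end)
    tr, ct = y[treated], y[ctrl_mask]
    return (tr[:, post].mean() - tr[:, pre].mean()) - (
        ct[:, post].mean() - ct[:, pre].mean()
    )

# Vary the control pool, the end of the window, and the pre-period length.
ctrl_pools = {"all_controls": ~treated,
              "late_controls": np.arange(n) >= 3 * n // 4}
estimates = [
    did(y, pool, t_end, n_pre)
    for (name, pool), t_end, n_pre in product(ctrl_pools.items(), [8, 10], [2, 5])
]
print(f"{len(estimates)} specs: median {np.median(estimates):.2f}, "
      f"range [{min(estimates):.2f}, {max(estimates):.2f}]")
```

Sorting and plotting `estimates` against their specification labels yields the specification curve; in a real application each entry would be one estimator–sample–adjustment combination.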