MMM 605: Diagnostics and Goodness of Fit for Synthetic Control

Credible synthetic control analysis requires rigorous diagnostics that assess the quality of the pre-treatment fit, the sensitivity of conclusions to specification choices, and the plausibility of identification assumptions. This post connects these diagnostics to the identification theory from [MMM 603] and provides practical guidance.

1. Connection to Identification Theory

Diagnostics in synthetic control are not merely descriptive—they help you assess whether the identification assumptions are plausible. Recall the bias decomposition from MMM 603: the bias equals $f_t' \Delta\lambda$, where $\Delta\lambda = \lambda_1 - \sum_j w_j^* \lambda_j$ is the factor loading mismatch.

The fundamental diagnostic question is: How large is $\Delta\lambda$?

Since factor loadings are unobserved, we cannot compute $\Delta\lambda$ directly. Pre-treatment fit metrics serve as empirical proxies. The pre-treatment RMSPE effectively aggregates the discrepancy $f_t' \Delta\lambda + \text{noise}_t$. A small $\text{RMSPE}_{\text{pre}}$ is necessary for small bias because it indicates that the weighted donors track the treated unit closely. However, it is not sufficient—if factors $f_t$ change drastically after treatment, small pre-treatment mismatch can still amplify into large post-treatment bias.

2. Pre-Treatment Fit Metrics

To interpret $\text{RMSPE}_{\text{pre}}$, it is useful to scale it relative to outcome variability. Relative RMSPE divides the pre-treatment RMSPE by the standard deviation of the treated unit’s pre-treatment outcomes:

$$\text{Relative RMSPE} = \frac{\text{RMSPE}_{\text{pre}}}{\text{SD}(Y_{1,\text{pre}})}$$

This is easiest to interpret when benchmarked against decision-relevant effect sizes. If the typical pre-treatment discrepancy implied by the fit is large enough to change a managerial decision, treat the design as fragile.
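The two fit metrics above can be computed directly from the treated and synthetic series. A minimal sketch (the helper name `pre_treatment_fit` is illustrative, not from a specific library):

```python
import numpy as np

def pre_treatment_fit(y_treated_pre, y_synth_pre):
    """Pre-treatment RMSPE and relative RMSPE.

    y_treated_pre, y_synth_pre: pre-treatment outcomes for the treated
    unit and its synthetic control.
    """
    gap = np.asarray(y_treated_pre) - np.asarray(y_synth_pre)
    rmspe = np.sqrt(np.mean(gap ** 2))
    # scale by the treated unit's pre-treatment outcome variability
    rel_rmspe = rmspe / np.std(y_treated_pre)
    return rmspe, rel_rmspe
```

A relative RMSPE near, say, 0.05 means the typical pre-treatment discrepancy is about 5% of the treated unit's own outcome variation; whether that is "small" depends on the effect sizes that matter for the decision at hand.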

3. Convex Hull Diagnostic

Identification requires the convex hull condition ($\lambda_1 \in \text{conv}\{\lambda_j\}$). Since factor loadings are unobserved, we diagnose it indirectly. Warning signs include:

  • $\text{RMSPE}_{\text{pre}}$ remains large despite optimisation;
  • weights pile up on a single donor;
  • the treated unit’s predictors are extreme relative to the donors’.

A standard diagnostic is a PCA plot of treated and donor units in the first two principal components of the predictor matrix. If the treated unit lies outside the donor cloud, extrapolation is likely. This is a proxy for convex-hull concerns, but because it reduces dimensionality, lying inside the cloud doesn’t guarantee the condition holds in the full space.
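The PCA diagnostic can be sketched with a plain SVD, flagging a treated unit that falls outside the donors' range on any retained component. This is a crude extrapolation signal, not a full convex-hull test, and the helper name is hypothetical:

```python
import numpy as np

def pca_hull_check(X_donors, x_treated, k=2):
    """Project units onto the first k principal components of the donor
    predictor matrix; flag a treated unit outside the donors' range on
    any component. A rough proxy for the convex hull condition."""
    mu = X_donors.mean(axis=0)
    # principal directions from the centred donor matrix
    _, _, Vt = np.linalg.svd(X_donors - mu, full_matrices=False)
    donors_pc = (X_donors - mu) @ Vt[:k].T
    treated_pc = (x_treated - mu) @ Vt[:k].T
    outside = np.any((treated_pc < donors_pc.min(axis=0)) |
                     (treated_pc > donors_pc.max(axis=0)))
    return treated_pc, bool(outside)
```

Plotting `donors_pc` and `treated_pc` gives the usual two-dimensional picture; the boolean is just a quick screening flag.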

4. Bounds Width as Diagnostic

Under Synthetic Parallel Trends (MMM 603), when point identification fails, the width of the identified bounds provides a diagnostic for identification strength:

$$\text{Width}_t = \bar{\tau}_t - \underline{\tau}_t$$

  • Narrow bounds: Pre-treatment data strongly constrain the counterfactual.
  • Wide bounds: Many weighting schemes are consistent with the data. If the width is large relative to plausible effects, identification uncertainty dominates estimation uncertainty; report bounds rather than point estimates.
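Given per-period bound estimates, the width and the "identification uncertainty dominates" flag are a one-liner each. A sketch with illustrative names, where `plausible_effect` is the decision-relevant effect size the analyst supplies:

```python
import numpy as np

def bounds_width(tau_lower, tau_upper, plausible_effect):
    """Width of the identified bounds per post-treatment period, plus a
    flag for periods where the width exceeds a decision-relevant effect
    size (i.e. identification uncertainty dominates)."""
    width = np.asarray(tau_upper) - np.asarray(tau_lower)
    dominated = width > abs(plausible_effect)
    return width, dominated
```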

5. Visual and Weight Diagnostics

Visual diagnostics provide intuitive evidence on fit quality and effects:

  • Trajectory Plot: Treated and synthetic outcomes over time. Pre-treatment trajectories should align closely; post-treatment divergence is the estimated effect.
  • Gap Plot: The treated-minus-synthetic difference over time. It should fluctuate near zero before treatment and show a persistent shift after.
  • In-Space Placebo Gap Plot: The treated unit’s gap overlaid on the gaps obtained by treating each donor, in turn, as a placebo. A credible effect stands out from the placebo distribution.
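The gap and placebo-gap series can be computed with one loop over units. In the sketch below the default `fit` is plain pre-period least squares, which is only a stand-in for a proper simplex-constrained synthetic control fit; swap in your actual estimator:

```python
import numpy as np

def placebo_gaps(Y, treated_idx, T0, fit=None):
    """Gap series for every unit, treating each donor in turn as a placebo.

    Y: units x time outcome matrix; T0: number of pre-treatment periods.
    `fit` maps (donor pre-period matrix, target pre-period series) to
    donor weights.
    """
    if fit is None:
        fit = lambda D_pre, y_pre: np.linalg.lstsq(D_pre.T, y_pre, rcond=None)[0]
    gaps = {}
    for i in range(Y.shape[0]):
        # donor pool: everyone except the target unit and the true treated unit
        pool = [j for j in range(Y.shape[0]) if j != i and j != treated_idx]
        D = Y[pool]
        w = fit(D[:, :T0], Y[i, :T0])
        gaps[i] = Y[i] - w @ D
    return gaps
```

Overlaying `gaps[treated_idx]` against the remaining series gives the in-space placebo plot; excluding the treated unit from the placebo donor pools avoids contaminating the placebos with the treatment itself.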

For weights, we can compute the effective number of donors:

$$N_{\text{eff}} = \frac{1}{\sum_j (w_j^*)^2}$$

This is the reciprocal of the Herfindahl index of the weights. A small $N_{\text{eff}}$ indicates concentrated weights (e.g. tracking a single donor closely), which inflates idiosyncratic variance; a large $N_{\text{eff}}$ indicates weights diffused across many donors. Good practice also involves checking Predictor Balance to ensure the synthetic control matches the treated unit on observed characteristics.
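The effective donor count is a one-line computation on the weight vector:

```python
import numpy as np

def effective_donors(w):
    """Effective number of donors: reciprocal of the Herfindahl index
    of the weight vector w (assumed to sum to one)."""
    w = np.asarray(w)
    return 1.0 / np.sum(w ** 2)
```

Uniform weights over $J$ donors give $N_{\text{eff}} = J$; all weight on one donor gives $N_{\text{eff}} = 1$.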

6. Sensitivity Analyses and Overfitting

Stability across different specifications supports robustness.

  • Leave-One-Donor-Out: Re-estimate excluding each donor one by one. If excluding a single donor changes the estimate substantially, identification rests heavily on that unit.
  • Sensitivity to Predictors & Donor Pool: Re-estimate varying the predictors used or modifying the donor pool (e.g., removing geographic neighbours to mitigate spillovers).

Overfitting Diagnostics: Cross-validation within the pre-treatment period (splitting into training and validation sets) helps diagnose overfitting. If validation RMSPE is much larger than training RMSPE, weights are overfitted to idiosyncratic noise.
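The pre-period cross-validation split can be sketched as follows, again with plain least squares as a placeholder for the actual constrained fit:

```python
import numpy as np

def pre_period_cv(Y, T0, T_train, fit):
    """Fit weights on the first T_train pre-treatment periods only, then
    compare training and validation RMSPE within the pre-period.
    Y: units x time, treated unit in row 0; validation covers periods
    T_train..T0. A large validation/training gap signals overfitting."""
    y1, donors = Y[0], Y[1:]
    w = fit(donors[:, :T_train], y1[:T_train])
    synth = w @ donors
    rmspe = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
    train = rmspe(y1[:T_train], synth[:T_train])
    valid = rmspe(y1[T_train:T0], synth[T_train:T0])
    return train, valid
```

If `valid` is several times `train`, the weights are likely chasing idiosyncratic noise in the training window rather than common factor structure.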

7. Diagnostic Summary Table

A quick heuristic checklist for evaluating a synthetic control specification:

| Diagnostic | What It Assesses | Warning Signal |
| --- | --- | --- |
| $\text{RMSPE}_{\text{pre}}$ | Pre-treatment fit | Large relative to $\text{SD}(Y_{1,\text{pre}})$ |
| Relative RMSPE | Fit relative to variability | Substantial fraction of SD |
| Bounds width | Identification strength | Width comparable to plausible effects |
| $N_{\text{eff}}$ | Weight concentration | Very small (≈1–2 donors) |
| Predictor balance | Covariate matching | Large discrepancies |
| Placebo gaps | Stability | Placebo gaps as large as the treated gap |
| Leave-one-out range | Donor sensitivity | Wide range (high influence) |
| CV validation RMSPE | Overfitting | $\gg$ training RMSPE |

8. Practical Workflow

A rigorous diagnostic workflow proceeds in stages:

  1. Assess pre-treatment fit (RMSPE and relative RMSPE).
  2. Check the convex hull constraint (PCA plot, boundary weights).
  3. If using bounds, compute and assess the bounds width.
  4. Visualise results (trajectories, gaps, placebos).
  5. Examine weight concentration ($N_{\text{eff}}$ and balance).
  6. Conduct sensitivity analyses (leave-one-out, alternative predictors/donors).
  7. Assess stability via in-time placebo checks.
  8. Check for overfitting via cross-validation.

Reporting these diagnostics transparently builds a cumulative case for credibility and makes assumptions explicit.