MMM 604: Inference for Synthetic Control
With an SC estimate in hand (MMM 602) and the identification theory understood (MMM 603), the next question is: how confident should we be? This post covers the inference toolkit for Synthetic Control—quantifying uncertainty and testing whether the post-treatment gap is statistically meaningful.
1. Permutation-Based Inference (In-Space Placebo Checks)
The cornerstone of SC inference is the in-space placebo test introduced by Abadie et al. (2010). The idea: apply the SC procedure to every donor unit as if it were treated, and compare the treated unit’s post-treatment gap to the distribution of placebo gaps.
Algorithm:
- For each donor unit $j \in J$, construct a synthetic control for $j$ using the remaining $N-2$ donors. The treated unit is excluded from these placebo donor pools, since its post-treatment outcomes are affected by the intervention.
- Compute the post-treatment gap $\hat{\tau}_{jt} = Y_{jt} - \hat{Y}_{jt}^{\text{syn}}$ for each placebo unit $j$.
- Compare the treated unit’s gap to the placebo distribution.
The key statistic is the RMSPE ratio. Define the pre-treatment RMSPE for unit $i$:
$$\text{RMSPE}_{i,\text{pre}} = \sqrt{\frac{1}{T_0} \sum_{t=1}^{T_0} \left(Y_{it} - \hat{Y}_{it}^{\text{syn}}\right)^2}$$
and the post-treatment RMSPE:
$$\text{RMSPE}_{i,\text{post}} = \sqrt{\frac{1}{T - T_0} \sum_{t=T_0+1}^{T} \left(Y_{it} - \hat{Y}_{it}^{\text{syn}}\right)^2}$$
The RMSPE ratio normalises the post-treatment gap by the pre-treatment fit:
$$r_i = \frac{\text{RMSPE}_{i,\text{post}}}{\text{RMSPE}_{i,\text{pre}}}$$
A large ratio for the treated unit indicates a big post-treatment departure relative to how well that unit was fit pre-treatment. The permutation p-value is:
$$p = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{r_i \geq r_1\}$$
where $r_1$ is the treated unit’s ratio. If only $1$ out of $N$ units has a ratio as large as the treated unit, $p = 1/N$.
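The ratio and p-value computations above fit in a few lines. Everything in this sketch (the simulated `gaps` matrix, the choice of `T0`, the injected +3 effect) is illustrative rather than from any specific dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, T0 = 21, 30, 20                    # 1 treated unit + 20 donors
gaps = rng.normal(0, 1, size=(N, T))     # simulated gap series; row 0 = treated
gaps[0, T0:] += 3.0                      # give the treated unit a post-treatment effect

rmspe_pre = np.sqrt((gaps[:, :T0] ** 2).mean(axis=1))
rmspe_post = np.sqrt((gaps[:, T0:] ** 2).mean(axis=1))
ratios = rmspe_post / rmspe_pre          # r_i for every unit

# Permutation p-value: share of units whose ratio is at least the
# treated unit's (the treated unit itself always counts, so p >= 1/N).
p_value = np.mean(ratios >= ratios[0])
print(f"treated ratio r1 = {ratios[0]:.2f}, p = {p_value:.3f}")
```

In real applications `gaps` would be built by running the SC fit once per unit, which is the computationally expensive step.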
Why ratios matter: Raw gaps are misleading because donor units with poor pre-treatment fit mechanically produce large post-treatment gaps. RMSPE ratios level the playing field by normalising for pre-treatment fit quality.
How to interpret RMSPE in practice:
- $\text{RMSPE}_{i,\text{pre}}$ is an average tracking error during the fit period. Smaller values mean the synthetic control replicates the unit well before treatment.
- $\text{RMSPE}_{i,\text{post}}$ is an average discrepancy after treatment. Large values can reflect treatment effects, model drift, or both.
- The ratio $r_i$ asks: “how many times larger is post-period error than baseline tracking error?” For example, $r_i = 3$ means post-period mismatch is three times the pre-period mismatch.
Suppose the treated unit has $\text{RMSPE}_{\text{pre}}=2$ and $\text{RMSPE}_{\text{post}}=8$, so $r_1=4$. A placebo donor has $\text{RMSPE}_{\text{pre}}=6$ and $\text{RMSPE}_{\text{post}}=9$, so $r_j=1.5$. Even though the placebo’s raw post gap (9) is larger than the treated unit’s pre gap (2), its relative deterioration is smaller. This is exactly why ratios are preferred for rank-based inference.
Common implementation rule: Exclude placebo units with very poor pre-treatment fit before ranking ratios (for example, placebo units with pre-treatment RMSPE above 2x or 5x the treated unit’s pre-treatment RMSPE). This avoids inflating the placebo benchmark with units that were never credibly fit. Report the threshold explicitly and show sensitivity to alternative cutoffs.
Numerical caution: If $\text{RMSPE}_{i,\text{pre}}$ is near zero, ratios can become unstable. In that case, report both the ratio and the raw post-treatment RMSPE, and consider a small denominator floor (e.g., $\max(\text{RMSPE}_{i,\text{pre}}, \epsilon)$) for robustness checks.
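Both rules (the pre-fit filter and the denominator floor) can be applied in one pass. The RMSPE values below are hypothetical:

```python
import numpy as np

# Hypothetical RMSPEs; index 0 is the treated unit.
rmspe_pre = np.array([2.0, 12.0, 0.001, 1.5, 2.5])
rmspe_post = np.array([8.0, 9.0, 0.5, 1.4, 2.2])

eps = 0.1                                         # denominator floor
ratios = rmspe_post / np.maximum(rmspe_pre, eps)  # unit 2 floored: 0.5/0.1, not 0.5/0.001

keep = rmspe_pre <= 5 * rmspe_pre[0]              # drop placebos with pre-RMSPE > 5x treated's
keep[0] = True                                    # the treated unit is always kept
p = np.mean(ratios[keep] >= ratios[0])
print(f"ratios = {np.round(ratios, 2)}, kept = {keep.sum()}, p = {p}")
```

Here unit 1 is dropped by the filter and unit 2's ratio is tamed by the floor; as the text recommends, both `eps` and the 5x cutoff should be reported and varied as a sensitivity check.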
Power concerns: With $N$ units in total (one treated plus $N-1$ donors), the smallest achievable p-value is $1/N$. Conventional significance at $\alpha = 0.05$ therefore requires at least 20 units (19 donors). Small donor pools ($N < 10$) leave the test underpowered; this is a structural limitation, not a modelling failure.
2. Conformal Inference
Conformal inference (Chernozhukov, Wüthrich, and Zhu, 2021) provides formal confidence intervals for $\tau_{1t}$ without requiring exchangeability of treatment assignment across units; its validity instead rests on approximate exchangeability or stationarity of the residuals over time.
Construction:
- Compute pre-treatment residuals $\{e_{1s} : s \leq T_0\}$ from the SC fit.
- For a hypothesised treatment effect $\tau_0$, compute the adjusted outcome $\tilde{Y}_{1t} = Y_{1t} - \tau_0$.
- Compute the adjusted residual $\tilde{e}_{1t} = \tilde{Y}_{1t} - \hat{Y}_{1t}^{\text{syn}}$.
- The conformal p-value is the fraction of residuals (the pre-treatment residuals together with the adjusted residual itself) whose absolute value is at least $|\tilde{e}_{1t}|$.
Confidence interval. Inverting this test yields:
$$\text{CI}_{1-\alpha}(\tau_{1t}) = \{\tau_0 : p(\tau_0) \geq \alpha\}$$
which contains all values of $\tau_0$ that cannot be rejected at level $\alpha$.
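A minimal sketch of the test inversion for a single post-treatment period. All inputs (`e_pre`, `y_obs`, `y_syn`) are simulated placeholders; in practice the residuals come from the fitted SC, e.g. via scpi:

```python
import numpy as np

rng = np.random.default_rng(1)
e_pre = rng.normal(0, 1, size=40)          # pre-treatment SC residuals
y_obs, y_syn = 12.0, 9.0                   # observed and synthetic outcome at t > T0

def conformal_p(tau0):
    """p-value for H0: the treatment effect at this period equals tau0."""
    e_adj = (y_obs - tau0) - y_syn         # adjusted residual under H0
    pool = np.append(np.abs(e_pre), abs(e_adj))
    return np.mean(pool >= abs(e_adj))

alpha = 0.10
grid = np.linspace(-5.0, 10.0, 1501)       # candidate effects, step 0.01
accepted = [t for t in grid if conformal_p(t) >= alpha]
ci = (min(accepted), max(accepted))
print(f"{1 - alpha:.0%} conformal CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The interval is centred near the raw gap (3.0 here) with width governed by the spread of the pre-treatment residuals.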
The scpi package (R, Stata, Python) implements these procedures, including block structures for serial correlation and both pointwise and uniform confidence bands. Conformal methods quantify uncertainty conditional on the SC specification—they do not alter the identification assumptions from MMM 603. If those assumptions fail, conformal intervals will be centred on a biased counterfactual.
Caveat: If pre-treatment residuals exhibit nonstationarity (drift in mean or variance), conformal coverage guarantees may not hold. In such cases, consider using only recent pre-treatment periods or employing weighted conformal methods that downweight distant periods.
3. Analytical Variance Decomposition
Under the factor model (MMM 603, Assumption 16), the variance of the SC estimator decomposes into two sources:
$$\text{Var}(\hat{\tau}_{1t}) \approx \sigma^2 \left(1 + \sum_{j \in J} (w_j^*)^2 \right) + f_t' \, \text{Var}(\hat{\Delta}\lambda) \, f_t$$
where:
- $\sigma^2 = \text{Var}(\varepsilon_{it})$ is the idiosyncratic error variance,
- $\sum_j (w_j^*)^2$ is the Herfindahl index of weights (the “effective sample size” adjustment),
- $f_t$ is the factor vector at time $t$,
- $\text{Var}(\hat{\Delta}\lambda)$ is the variance of the estimated factor loading mismatch.
First term (idiosyncratic): Sparse weights—few donors carrying large weight—inflate this term. Diversified weights reduce it.
Second term (factor loading uncertainty): Decreases as $T_0$ grows and the factor structure is better identified. In practice, $\sigma^2$ is estimated from pre-treatment residuals and the factor loading variance is approximated via bootstrap or analytical methods.
This decomposition is approximate and relies on linearisation. In most applications, practitioners rely on bootstrap or conformal methods (via scpi) rather than on closed-form variance formulas.
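The idiosyncratic term is straightforward to estimate. The sketch below uses hypothetical weights and residuals and omits the factor-loading term, which is usually approximated by bootstrap:

```python
import numpy as np

weights = np.array([0.55, 0.30, 0.15])       # hypothetical SC weights
resid_pre = np.array([0.4, -0.3, 0.1, -0.5, 0.2, 0.3, -0.2, 0.0])

sigma2 = np.mean(resid_pre ** 2)             # idiosyncratic variance from pre-fit residuals
herfindahl = np.sum(weights ** 2)            # ranges from 1/n_donors (diversified) to 1 (one donor)
var_idio = sigma2 * (1 + herfindahl)         # first term of the decomposition
print(f"H = {herfindahl:.3f}, idiosyncratic SE = {np.sqrt(var_idio):.3f}")
```

The Herfindahl index makes the sparsity penalty concrete: these three weights give $H = 0.415$, versus $H = 1/3$ for equal weights across the same three donors.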
4. In-Time Placebo Checks
In-time placebo checks assess stability by applying the SC method with pseudo-intervention dates in the pre-treatment period.
Procedure:
- Choose a pseudo-intervention date $T_0^* < T_0$.
- Use periods $1, \ldots, T_0^*$ as the pseudo pre-treatment window.
- Construct SC weights using this truncated window.
- Compute pseudo-gaps for periods $T_0^* + 1, \ldots, T_0$.
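The pseudo-date loop can be sketched as follows. The weight fit here is a deliberately crude stand-in (uniform donor weights on simulated data); a real application would re-solve the SC optimisation on each truncated window:

```python
import numpy as np

rng = np.random.default_rng(2)
T, T0 = 24, 16
donors = rng.normal(10.0, 1.0, size=(5, T))
treated = donors.mean(axis=0) + rng.normal(0, 0.2, size=T)  # tracks the donor mean

for T0_star in (8, 10, 12):
    # Stand-in fit: uniform donor weights. A real application would
    # re-fit the SC weights using only periods 1..T0_star.
    w = np.full(donors.shape[0], 1 / donors.shape[0])
    synthetic = w @ donors
    pseudo_gaps = treated[T0_star:T0] - synthetic[T0_star:T0]
    print(f"T0* = {T0_star}: max |pseudo-gap| = {np.abs(pseudo_gaps).max():.2f}")
```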
If pseudo-gaps are near zero, this supports stability: the synthetic control continues to track the treated unit even outside the fitting window. Large pseudo-gaps suggest the SC overfits the pre-treatment period or that the factor structure is unstable.
In-time placebo checks are diagnostics, not formal tests. They provide evidence on whether the SC is a stable counterfactual, and consistently small pseudo-gaps across multiple pseudo-intervention dates support the assumption that the factor structure governing untreated outcomes is stable over time.
5. Inference for Bounds under Synthetic Parallel Trends
When point identification fails (MMM 603, Section 6), we need inference for the identified set $\mathcal{I}_t = [\underline{\tau}_t, \bar{\tau}_t]$. A $(1 - \alpha)$ confidence set for the identified set takes the form:
$$\text{CS}_{1-\alpha} = \left[\underline{\tau}_t - c_\alpha \hat{\sigma}_{\underline{\tau}}, \; \bar{\tau}_t + c_\alpha \hat{\sigma}_{\bar{\tau}}\right]$$
where $c_\alpha$ is an appropriate critical value (bootstrap quantile or normal approximation) and $\hat{\sigma}_{\underline{\tau}}, \hat{\sigma}_{\bar{\tau}}$ are estimated standard errors for the bounds.
This confidence set covers the entire identified interval with probability approximately $1-\alpha$—not just a single point. Even when $\text{CS}_{1-\alpha}$ is relatively tight, the treatment effect $\tau_{1t}$ remains only partially identified if $\underline{\tau}_t < \bar{\tau}_t$. Reporting both the bounds and the confidence set makes this distinction transparent.
When DiD and SC give different estimates, analysts should report: point estimates from each method, the identified set bounds $[\underline{\tau}_t, \bar{\tau}_t]$, and the associated confidence set—then discuss which weighting schemes are more credible given the marketing context.
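A numeric sketch of the confidence set for the identified set, with hypothetical bounds and standard errors. Splitting $\alpha$ across the two bounds via a normal quantile is one common convention; refinements such as Imbens-Manski critical values exist:

```python
from statistics import NormalDist

tau_lo, tau_hi = 1.2, 3.8        # hypothetical identified-set bounds
se_lo, se_hi = 0.5, 0.6          # hypothetical bootstrap SEs for the bounds
alpha = 0.10

# Normal critical value with alpha split across the two bounds.
c = NormalDist().inv_cdf(1 - alpha / 2)
cs = (tau_lo - c * se_lo, tau_hi + c * se_hi)
print(f"{1 - alpha:.0%} confidence set: [{cs[0]:.2f}, {cs[1]:.2f}]")
```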
6. Multiple Testing
Testing many post-treatment periods or outcomes raises multiple testing concerns:
- Pointwise inference tests each period separately at level $\alpha$—simple but inflates family-wise error.
- Uniform inference controls error across all periods simultaneously, using either Bonferroni correction (conservative) or sup-$t$ bands (tighter).
For SC with many post-treatment periods, analysts should: report both pointwise and uniform confidence bands, focus on cumulative or average effects rather than period-by-period tests, and use sup-$t$ bands from conformal inference when available.
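The gain of sup-$t$ bands over Bonferroni comes from exploiting correlation across periods. A small simulation illustrates this; the common-factor correlation structure is a hypothetical stand-in for what a real application would bootstrap from residuals:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
K, alpha, B = 12, 0.05, 20000              # 12 post-treatment periods

# Simulated standardized deviations with a common factor, so periods
# are positively correlated (as post-treatment gaps typically are).
common = rng.normal(size=(B, 1))
idio = rng.normal(size=(B, K))
draws = 0.8 * common + 0.6 * idio          # marginal variance = 1

# Sup-t critical value: 1-alpha quantile of the max absolute deviation.
sup_t = np.quantile(np.abs(draws).max(axis=1), 1 - alpha)
bonferroni = NormalDist().inv_cdf(1 - alpha / (2 * K))
print(f"sup-t c = {sup_t:.2f} vs Bonferroni c = {bonferroni:.2f}")
```

Under independence the two critical values nearly coincide; the stronger the cross-period correlation, the tighter the sup-$t$ band relative to Bonferroni.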
7. Practical Guidance
The theory delivers five actionable principles:
- Visual placebo-check comparison: Always conduct in-space placebo checks and plot the treated unit’s gap alongside all placebo gaps. Visual comparison is powerful and intuitive.
- Use normalised statistics: Report RMSPE ratios rather than raw gaps. Ratios account for differential pre-treatment fit and make comparisons meaningful across units.
- Formal inference via conformal methods: Use conformal inference for formal confidence intervals. Conformal methods provide valid inference under exchangeability (or approximate stationarity) conditions on the residuals.
- Stability assessment: Conduct in-time placebo checks to assess stability of the synthetic control across different fitting windows.
- Report intervals and identified sets: When point identification is uncertain (e.g. when DiD and SC disagree), report the identified set $[\underline{\tau}_t, \bar{\tau}_t]$ with confidence sets. Define a small number of primary estimands ex ante and treat period-by-period tests as exploratory to keep the multiple-testing burden manageable.
Summary
Inference for Synthetic Control combines design-based and residual-based methods to quantify uncertainty. Permutation-based placebo checks provide the visual foundation—comparing the treated unit’s gap to the distribution of donor gaps via RMSPE ratios. Conformal inference formalises this into confidence intervals without requiring exchangeability of treatment assignment. The analytical variance decomposition reveals that sparse weights inflate idiosyncratic variance while longer pre-treatment windows reduce factor loading uncertainty. When point identification fails, bounds inference under Synthetic Parallel Trends honestly reports the identified set alongside confidence sets. By combining these tools, practitioners can draw credible conclusions without relying solely on large-sample approximations.