MMM 606: Extensions to Synthetic Control – ASCM and SDiD

In the previous posts, we covered the core foundations of Synthetic Control (MMM 601-602), Identification Theory (MMM 603), Inference (MMM 604), and Diagnostics (MMM 605). While the canonical Synthetic Control Method (SCM) is powerful, it has limitations, particularly when pre-treatment fit is imperfect or when dealing with multiple treated units.

In this post, we explore two major modern extensions that address these limitations and provide greater robustness in practice: the Augmented Synthetic Control Method (ASCM) and Synthetic Difference-in-Differences (SDiD).

1. The Pre-Treatment Fit Problem

Recall from MMM 603 that the bias in standard SCM depends heavily on how well the synthetic control tracks the treated unit in the pre-treatment period. SCM enforces non-negative weights that sum to one. This constraint prevents dangerous extrapolation and ensures a convex combination of donors, but it also means the synthetic control can never leave the range spanned by the donors. If the treated unit is an outlier (e.g., its baseline is higher than that of every donor), no set of admissible weights can reproduce it, and pre-treatment fit is poor.

When the pre-treatment fit is poor, there is little reason to believe that the synthetic control will trace the treated unit's counterfactual trajectory after treatment, and the resulting effect estimate can be badly biased.
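To make the convex-hull limitation concrete, here is a minimal sketch on simulated data. The helper `scm_weights`, the donor levels, and the noise scale are all illustrative assumptions, and the fit uses outcomes only (no covariates), so it is a simplified stand-in for a full SCM implementation.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(X0, x1):
    """Simplex-constrained least squares (hypothetical helper).

    X0: (J, T_pre) donor pre-treatment outcomes, one row per donor.
    x1: (T_pre,) treated-unit pre-treatment outcomes.
    """
    J = X0.shape[0]
    result = minimize(
        lambda w: np.sum((x1 - w @ X0) ** 2),
        x0=np.full(J, 1.0 / J),                    # start from equal weights
        jac=lambda w: -2.0 * X0 @ (x1 - w @ X0),   # analytic gradient
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,                   # non-negative weights
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x


rng = np.random.default_rng(0)
T_pre = 20
trend = np.linspace(0.0, 2.0, T_pre)
levels = np.array([10.0, 11.0, 12.0, 13.0, 14.0])   # simulated donor baselines
X0 = levels[:, None] + trend[None, :] + rng.normal(0.0, 0.1, (5, T_pre))

cases = {
    "inside hull": 12.5 + trend,        # baseline within the donor range
    "above all donors": 16.0 + trend,   # baseline higher than every donor
}
rmse = {}
for label, x1 in cases.items():
    w = scm_weights(X0, x1)
    rmse[label] = float(np.sqrt(np.mean((x1 - w @ X0) ** 2)))
    print(f"{label}: weights={np.round(w, 2)}, pre-period RMSE={rmse[label]:.3f}")
```

In the second case the trend is identical to the donors' trend, yet the fit is poor purely because of the level gap: the best the simplex constraint allows is to load on the highest donor.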

2. Augmented Synthetic Control (ASCM)

The Augmented Synthetic Control Method (ASCM), introduced by Ben-Michael, Feller, and Rothstein (2021), directly tackles the problem of imperfect pre-treatment fit via a bias correction mechanism.

The Intuition

ASCM uses an outcome model (typically ridge regression) to estimate the bias resulting from the fact that the synthetic control does not perfectly match the treated unit’s pre-treatment characteristics. It then adds this bias estimate back to the standard SCM estimate.

The Estimator

Let $\hat{Y}_{1t}^{\text{syn}} = \sum_{j} w_j^* Y_{jt}$ be the standard synthetic control prediction for the treated unit (unit 1), where $j$ runs over the donor units. Let $\hat{m}_{jt}$ be the prediction for unit $j$ at time $t$ from a regularised outcome model fitted on the donor pool, typically using pre-treatment outcomes as predictors.

The ASCM estimator adjusts the standard prediction:

$$ \hat{Y}_{1t}^{\text{aug}} = \sum_{j} w_j^* Y_{jt} + \left( \hat{m}_{1t} - \sum_j w_j^* \hat{m}_{jt} \right) $$

The term in the parentheses is the bias correction. It is the difference between the model’s prediction for the treated unit and the model’s prediction for the synthetic control.
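The formula can be sketched in a few lines. The code below is an illustration under stated assumptions, not the augsynth implementation: the outcome model is a closed-form ridge regression of donor post-period outcomes on donor pre-period outcomes, the penalty `lam=1.0` is an untuned placeholder, and `scm_weights` and `ridge_ascm` are hypothetical helper names. The simulated treated unit has no treatment effect, so prediction error can be measured directly.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(X0, x1):
    """Simplex-constrained least-squares weights; X0 is (J, T_pre), x1 is (T_pre,)."""
    J = X0.shape[0]
    res = minimize(
        lambda w: np.sum((x1 - w @ X0) ** 2),
        x0=np.full(J, 1.0 / J),
        jac=lambda w: -2.0 * X0 @ (x1 - w @ X0),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x


def ridge_ascm(X0, x1, Y0_post, lam=1.0):
    """Return (SCM prediction, augmented prediction) for each post-treatment period.

    Y0_post: (J, T_post) donor outcomes after treatment starts.
    lam: ridge penalty (illustrative default, not a tuned value).
    """
    w = scm_weights(X0, x1)
    x_bar, y_bar = X0.mean(axis=0), Y0_post.mean(axis=0)
    Xc = X0 - x_bar
    # Ridge fit on donors only: post-period outcomes regressed on pre-period outcomes.
    eta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X0.shape[1]), Xc.T @ (Y0_post - y_bar))
    m_hat = lambda X: y_bar + (X - x_bar) @ eta    # model prediction m_hat_{jt}
    scm_pred = w @ Y0_post                         # sum_j w_j Y_{jt}
    bias_correction = m_hat(x1) - w @ m_hat(X0)    # m_hat_{1t} - sum_j w_j m_hat_{jt}
    return scm_pred, scm_pred + bias_correction


rng = np.random.default_rng(1)
T_pre, T_post = 20, 5
trend = np.linspace(0.0, 2.5, T_pre + T_post)
levels = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
Y0 = levels[:, None] + trend[None, :] + rng.normal(0.0, 0.05, (5, T_pre + T_post))
y1 = 16.0 + trend + rng.normal(0.0, 0.05, T_pre + T_post)   # no treatment effect

scm_pred, aug_pred = ridge_ascm(Y0[:, :T_pre], y1[:T_pre], Y0[:, T_pre:])
scm_err = float(np.mean(np.abs(scm_pred - y1[T_pre:])))
aug_err = float(np.mean(np.abs(aug_pred - y1[T_pre:])))
print(f"mean abs error, SCM: {scm_err:.2f}, augmented: {aug_err:.2f}")
```

Because the treated unit sits above every donor, plain SCM under-predicts by roughly the level gap, while the ridge term recovers most of it. How much is recovered depends on the penalty: a larger `lam` shrinks the correction back toward plain SCM.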

Key Properties

Double Robustness: ASCM combines two models in the spirit of doubly robust estimation. It is designed to perform well if either the SCM weights achieve good balance (as under the underlying factor model) or the regularised outcome model is approximately correct.
Extrapolation: Through the outcome-model correction, ASCM allows controlled extrapolation outside the convex hull of the donors, with the regularisation strength governing how far it departs from pure SCM. This helps in cases where the treated unit has a slightly higher baseline than the donors.
Negative Weights: The resulting implicit weights in ASCM can be negative. This relaxes the strict non-negativity constraint of pure SCM in exchange for better pre-treatment fit.
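The implicit weights can be written down explicitly when the outcome model is a ridge regression on centred pre-treatment outcomes. The sketch below works through that algebra on simulated data and checks numerically that the weight form and the "SCM plus bias correction" form give the same prediction. The starting weights `w_scm` are a hypothetical simplex solution (all mass on the highest donor), and `lam` is an illustrative penalty.

```python
import numpy as np

rng = np.random.default_rng(2)
J, T_pre, T_post = 5, 20, 5
trend = np.linspace(0.0, 2.5, T_pre + T_post)
levels = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
Y0 = levels[:, None] + trend[None, :] + rng.normal(0.0, 0.05, (J, T_pre + T_post))
X0, Y0_post = Y0[:, :T_pre], Y0[:, T_pre:]
x1 = 16.0 + trend[:T_pre]                      # treated unit sits above every donor

w_scm = np.array([0.0, 0.0, 0.0, 0.0, 1.0])    # hypothetical simplex weights
lam = 1.0                                      # illustrative ridge penalty

Xc = X0 - X0.mean(axis=0)                      # centre donor features
A_inv = np.linalg.inv(Xc.T @ Xc + lam * np.eye(T_pre))
imbalance = x1 - w_scm @ X0                    # pre-treatment gap left by the SCM weights

# Implicit ASCM weights: SCM weights plus a ridge adjustment driven by the imbalance.
w_aug = w_scm + imbalance @ A_inv @ Xc.T

# Cross-check against the "SCM prediction + bias correction" form.
eta = A_inv @ Xc.T @ (Y0_post - Y0_post.mean(axis=0))
aug_pred = w_scm @ Y0_post + imbalance @ eta

print("implicit weights:", np.round(w_aug, 3))
print("sum of weights:  ", round(float(w_aug.sum()), 6))
print("forms agree:     ", bool(np.allclose(w_aug @ Y0_post, aug_pred)))
```

The adjusted weights still sum to one because the centred features sum to zero across donors, but the lowest donors receive negative weight so that the weighted average can reach above the donor range.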

3. Synthetic Difference-in-Differences (SDiD)

While ASCM focuses on bias correction, Synthetic Difference-in-Differences (SDiD), introduced by Arkhangelsky et al. (2021), merges the strengths of both canonical Difference-in-Differences (DiD) and Synthetic Control.

The Intuition

DiD removes permanent differences between units using unit fixed effects and common time shocks using time fixed effects, assigning equal weight to all donors.
SCM finds a weighted average of donors to match pre-treatment trends but typically does not include an additive unit fixed effect, making it sensitive to baseline level shifts.

SDiD introduces both unit weights (to find good donor matches like SCM) and time weights (to emphasize pre-treatment periods that best predict post-treatment periods), integrated directly into a two-way fixed effects framework.

The Estimator

The SDiD estimator solves a weighted optimization problem where the standard DiD equation is weighted by $\hat{\omega}_i$ (unit weights) and $\hat{\lambda}_t$ (time weights):

$$ (\hat{\tau}^{\text{sdid}}, \hat{\mu}, \hat{\alpha}, \hat{\beta}) = \arg \min_{\tau, \mu, \alpha, \beta} \sum_{i,t} \hat{\omega}_i \hat{\lambda}_t \left( Y_{it} - \mu - \alpha_i - \beta_t - \tau D_{it} \right)^2 $$

Where $D_{it}$ is the treatment indicator.
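For a block design (all treated units start treatment in the same period), the weighted regression reduces to a weighted double difference. The sketch below checks this numerically on simulated data with a known effect of 3. The weights here are random placeholders rather than SDiD-optimised weights, treated units and post-treatment periods receive uniform weights, and the constant $\mu$ is absorbed by the unit and time dummies.

```python
import numpy as np

rng = np.random.default_rng(3)
N_co, N_tr, T_pre, T_post = 6, 2, 8, 3
N, T = N_co + N_tr, T_pre + T_post
true_tau = 3.0

alpha = rng.normal(10.0, 2.0, N)               # unit fixed effects
beta = np.linspace(0.0, 1.0, T)                # common time effects
treated = np.arange(N) >= N_co
post = np.arange(T) >= T_pre
D = np.outer(treated, post).astype(float)      # treatment indicator D_it
Y = alpha[:, None] + beta[None, :] + true_tau * D + rng.normal(0.0, 0.1, (N, T))

# Placeholder weights (assumption): random simplex weights for controls and
# pre-periods, uniform weights for treated units and post-periods.
omega = np.concatenate([rng.dirichlet(np.ones(N_co)), np.full(N_tr, 1.0 / N_tr)])
lam = np.concatenate([rng.dirichlet(np.ones(T_pre)), np.full(T_post, 1.0 / T_post)])

# (a) Weighted double difference.
post_minus_pre = Y[:, post] @ lam[post] - Y[:, ~post] @ lam[~post]   # one value per unit
tau_dd = omega[treated] @ post_minus_pre[treated] - omega[~treated] @ post_minus_pre[~treated]

# (b) Weighted two-way fixed effects regression solved by least squares.
unit_dummies = np.kron(np.eye(N), np.ones((T, 1)))
time_dummies = np.kron(np.ones((N, 1)), np.eye(T))
design = np.column_stack([unit_dummies, time_dummies, D.ravel()])
sqrt_w = np.sqrt(np.outer(omega, lam).ravel())
coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], Y.ravel() * sqrt_w, rcond=None)
tau_wls = float(coef[-1])

print(f"double difference: {tau_dd:.4f}, weighted TWFE: {tau_wls:.4f}")
```

Setting every control weight to $1/N_{co}$ and every pre-period weight to $1/T_{pre}$ in this sketch gives canonical DiD, which is one way to see SDiD as a reweighted DiD.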

Unit and Time Weights Formation

Unit Weights ($\hat{\omega}$): Similar to SCM, weights are optimized to match the treated unit’s pre-treatment trajectory, but they include an intercept (which handles baseline differences, acting like the unit fixed effect in DiD) and utilize $L_2$ regularization to ensure dispersion.
Time Weights ($\hat{\lambda}$): These weights are chosen so that, for the donor units, a weighted average of pre-treatment outcomes (plus an intercept) matches their average post-treatment outcome. Pre-treatment periods that are more predictive of the post-treatment era get higher weight.
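Both weight problems share the same shape: simplex-constrained least squares with a free intercept and an optional ridge penalty. The sketch below illustrates that shape on simulated data and is not the synthdid implementation. The helper `simplex_fit_with_intercept` is hypothetical, `zeta` is an arbitrary illustrative value rather than a data-driven choice, and the tiny ridge on the time weights is an assumption added only to make the solution unique.

```python
import numpy as np
from scipy.optimize import minimize


def simplex_fit_with_intercept(A, b, ridge=0.0):
    """Minimise ||w0 + A @ w - b||^2 + ridge * ||w||^2 over simplex weights w
    and a free intercept w0 (hypothetical helper).

    The optimal intercept is the mean residual, so it is profiled out by de-meaning.
    """
    k = A.shape[1]

    def objective(w):
        resid = A @ w - b
        resid = resid - resid.mean()           # equivalent to optimising the intercept
        return np.sum(resid ** 2) + ridge * np.sum(w ** 2)

    res = minimize(
        objective,
        x0=np.full(k, 1.0 / k),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    w = res.x
    return w, float(np.mean(b - A @ w))


rng = np.random.default_rng(4)
N_co, T_pre, T_post = 5, 20, 5
trend = np.linspace(0.0, 2.5, T_pre + T_post)
levels = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
Y_co = levels[:, None] + trend[None, :] + rng.normal(0.0, 0.1, (N_co, T_pre + T_post))
y_tr_pre = 16.0 + trend[:T_pre] + rng.normal(0.0, 0.1, T_pre)   # above every donor

zeta = 0.1   # illustrative regularisation strength (assumption)

# Unit weights: donors combine to track the treated pre-period path up to a constant shift.
omega, omega0 = simplex_fit_with_intercept(
    Y_co[:, :T_pre].T, y_tr_pre, ridge=zeta ** 2 * T_pre
)

# Time weights: pre-periods combine to predict each donor's post-period average.
lam, lam0 = simplex_fit_with_intercept(
    Y_co[:, :T_pre], Y_co[:, T_pre:].mean(axis=1), ridge=1e-6
)

fit_resid = y_tr_pre - (omega0 + omega @ Y_co[:, :T_pre])
unit_rmse = float(np.sqrt(np.mean(fit_resid ** 2)))
print("unit weights:", np.round(omega, 2), "intercept:", round(omega0, 2))
print("pre-period RMSE after intercept shift:", round(unit_rmse, 3))
print("time weights:", np.round(lam, 2))
```

The intercept absorbs the level gap between the treated unit and the donors, so the weights only need to match the shape of the trajectory. That is why the fit stays tight even though no convex combination of donors reaches the treated unit's level.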

Key Advantages of SDiD

Handles Baseline Differences: The intercept in the weight calculation makes SDiD robust to treated units that sit outside the bounds of the donors in absolute levels.
Robustness to Large Panels: It tends to perform well in large panels, where unweighted DiD often violates parallel trends and standard SCM is prone to overfitting the pre-treatment period.
Inference: Because the estimator is a weighted regression, Arkhangelsky et al. (2021) provide large-sample theory together with bootstrap, jackknife, and placebo variance estimators. Note that with a single treated unit only the placebo approach applies, so inference is more streamlined than in canonical SCM mainly when several units are treated.

4. Comparing the Choices

When should you use standard SCM versus ASCM versus SDiD in MMM applications?

Method	Best Use Case	Handling of Baseline Shifts	Extrapolation
Standard SCM	Small $N$, clear structural breaks, strict interpretation needs.	Poor (treated unit must lie within the donors' convex hull).	None (strictly bounded).
ASCM	Good donor pool but slight pre-treatment mismatch; need double robustness.	Moderate (adjusts via outcome model).	Mild extrapolation permitted.
SDiD	Large panel data with many donors and periods; structural fixed effects exist.	Excellent (intercept naturally handles it).	Implicit via intercept shifts.

5. Practical Implementation Guidelines

In practice, data scientists should rarely rely solely on canonical SCM without running these extensions as sensitivity checks.

Always Check the Baseline Level: If your treated unit has a consistently higher baseline than all donors, no convex combination of donors can match it, and standard SCM will be biased. Use SDiD or ASCM instead, or de-mean each unit by its pre-treatment average before applying SCM.
Use Established Packages:
- For ASCM, use the augsynth package in R.
- For SDiD, use the synthdid package in R or Python ports.
Compare Estimates: Run canonical DiD, canonical SCM, ASCM, and SDiD. If they agree, the estimate is less likely to be an artifact of any one method's assumptions. If SDiD diverges substantially from SCM, inspect the SDiD time weights and the ASCM bias correction term to understand why.

Summary

The toolkit for synthetic control has moved far beyond the original 2010 formulation. Augmented Synthetic Control provides a principled way to correct for imperfect pre-treatment fit via an outcome model. Synthetic Difference-in-Differences elegantly bridges two distinct causal inference traditions, bringing unit weights, time weights, and fixed effects into a single robust estimator. By incorporating these extensions, causal inference in Marketing Mix Modeling becomes significantly more reliable, especially when faced with imperfect quasi-experiments.