MMM 606: Extensions to Synthetic Control – ASCM and SDiD
In the previous posts, we covered the core foundations of Synthetic Control (MMM 601-602), Identification Theory (MMM 603), Inference (MMM 604), and Diagnostics (MMM 605). While the canonical Synthetic Control Method (SCM) is powerful, it has limitations, particularly when pre-treatment fit is imperfect or when dealing with multiple treated units.
In this post, we explore two major modern extensions that address these limitations and provide greater robustness in practice: the Augmented Synthetic Control Method (ASCM) and Synthetic Difference-in-Differences (SDiD).
1. The Pre-Treatment Fit Problem
Recall from MMM 603 that the bias of standard SCM depends heavily on how closely the synthetic control tracks the treated unit in the pre-treatment period. SCM constrains the donor weights to be non-negative and to sum to one. This constraint prevents dangerous extrapolation, because the synthetic control is always a convex combination of donors. For the same reason, however, the synthetic control can never leave the convex hull of the donors. If the treated unit is an outlier (e.g., its baseline is higher than that of every donor), no admissible set of weights can reproduce its pre-treatment path, and fit is poor.
When pre-treatment fit is poor, we can no longer assume that the synthetic control traces the treated unit’s counterfactual post-treatment trajectory; the pre-treatment mismatch carries over into the effect estimate as bias.
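To see the convex-hull problem concretely, here is a minimal Python sketch on simulated data (all numbers hypothetical). The helper `scm_weights` is our own illustration, not a library function: it solves the canonical SCM problem with SciPy's SLSQP solver under the non-negativity and sum-to-one constraints. Because the treated unit sits five units above a donor pool whose level offsets lie between 0 and 2, even the best admissible weights leave a large pre-treatment gap.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(y_treated_pre, Y_donors_pre):
    """Canonical SCM weights (non-negative, summing to one).

    y_treated_pre: (T_pre,) pre-treatment outcomes of the treated unit
    Y_donors_pre:  (T_pre, J) pre-treatment outcomes of the J donors
    """
    J = Y_donors_pre.shape[1]

    def loss(w):
        resid = y_treated_pre - Y_donors_pre @ w
        return resid @ resid

    res = minimize(
        loss,
        np.full(J, 1.0 / J),                      # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,                  # non-negativity
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to one
    )
    return res.x


# Hypothetical panel: every unit shares a common time path plus a level offset.
rng = np.random.default_rng(0)
T_pre, J = 30, 8
common = np.cumsum(rng.normal(size=T_pre))
offsets = rng.uniform(0.0, 2.0, size=J)           # donor levels between 0 and 2
Y_donors = common[:, None] + offsets + rng.normal(0.0, 0.3, size=(T_pre, J))
y_treated = common + 5.0                          # above every donor

w = scm_weights(y_treated, Y_donors)
rmse = np.sqrt(np.mean((y_treated - Y_donors @ w) ** 2))
print("weights:", w.round(3))
print("pre-treatment RMSE:", round(rmse, 2))      # stays large: no convex combination reaches the treated level
```

The weights typically pile onto the highest donor, yet the fit error stays near the gap between the treated level and the best donor level.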
2. Augmented Synthetic Control (ASCM)
The Augmented Synthetic Control Method (ASCM), introduced by Ben-Michael, Feller, and Rothstein (2021), directly tackles the problem of imperfect pre-treatment fit via a bias correction mechanism.
The Intuition
ASCM uses an outcome model (typically ridge regression) to estimate the bias that arises because the synthetic control does not perfectly match the treated unit’s pre-treatment characteristics. It then uses this estimate to correct the standard SCM prediction of the counterfactual.
The Estimator
Let $\hat{Y}_{1t}^{\text{syn}} = \sum_{j} w_j^* Y_{jt}$ be the standard synthetic control prediction. Let $\hat{m}_{jt}$ be a prediction for unit $j$ at time $t$ derived from a regularised outcome model trained on the donor pool.
The ASCM estimator adjusts the standard prediction:
$$ \hat{Y}_{1t}^{\text{aug}} = \sum_{j} w_j^* Y_{jt} + \left( \hat{m}_{1t} - \sum_j w_j^* \hat{m}_{jt} \right) $$
The term in parentheses is the bias correction: the difference between the outcome model’s prediction for the treated unit and its prediction for the synthetic control.
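The bias-corrected prediction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the `augsynth` implementation: the outcome model $\hat{m}$ is a ridge regression (with an unpenalised intercept) that predicts one post-treatment outcome from the pre-treatment outcomes and is fitted on donors only; the SCM weights come from SciPy's SLSQP solver; and the function names `scm_weights`, `ridge_fit`, and `ascm_prediction` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(x1, X0):
    """Non-negative weights summing to one that match x1 (T_pre,) with rows of X0 (J, T_pre)."""
    J = X0.shape[0]
    res = minimize(
        lambda w: np.sum((x1 - w @ X0) ** 2),
        np.full(J, 1.0 / J),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x


def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalised intercept; returns a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: y_mean + (Z - x_mean) @ beta


def ascm_prediction(x1, X0, y0_post, lam=1.0):
    """Ridge-augmented SCM counterfactual for a single post-treatment period.

    x1:      (T_pre,) treated unit's pre-treatment outcomes
    X0:      (J, T_pre) donors' pre-treatment outcomes
    y0_post: (J,) donors' outcomes in the post-treatment period of interest
    """
    w = scm_weights(x1, X0)
    m_hat = ridge_fit(X0, y0_post, lam)       # outcome model trained on the donor pool
    y_syn = w @ y0_post                        # standard SCM prediction
    correction = m_hat(x1) - w @ m_hat(X0)     # m_1t minus the weighted m_jt
    return y_syn + correction, y_syn, correction


rng = np.random.default_rng(1)
J, T_pre = 10, 12
X0 = rng.normal(size=(J, T_pre)).cumsum(axis=1)          # hypothetical donor paths
y0_post = X0[:, -1] + rng.normal(0.0, 0.1, size=J)
x1 = X0.max(axis=0) + 1.0                                 # treated unit above the donor hull

aug, syn, corr = ascm_prediction(x1, X0, y0_post)
print(f"SCM: {syn:.2f}  correction: {corr:.2f}  ASCM: {aug:.2f}")

# With a huge penalty the outcome model is just its intercept, which cancels.
_, _, corr_flat = ascm_prediction(x1, X0, y0_post, lam=1e12)
print(f"correction with lam=1e12: {corr_flat:.6f}")
```

Because the SCM weights sum to one, the outcome model's intercept cancels in the correction; with a very large ridge penalty the model collapses to that intercept and ASCM reduces to plain SCM, which is what the last two lines check.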
Key Properties
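- Double Robustness: In the spirit of doubly robust estimators, ASCM performs well if either the SCM weights achieve good pre-treatment balance (for example, when outcomes follow a factor model that SCM can fit) or the regularised outcome model predicts the counterfactual well.
- Extrapolation: Because the bias correction is a regularised linear adjustment, ASCM can move the estimate outside the convex hull of the donors, fixing cases where the treated unit’s baseline is slightly higher than any donor’s.
- Negative Weights: ASCM can be rewritten as a weighting estimator whose implicit donor weights may be negative. This breaks the strict non-negativity constraint of pure SCM but can substantially improve pre-treatment fit; the ridge penalty controls how far the weights move away from the SCM solution.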
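The negative-weights property can be made concrete. Assuming a ridge outcome model with an intercept, some algebra shows that ASCM equals a weighting estimator with implicit donor weights $\hat{\gamma} = w^* + X_c(X_c^\top X_c + \lambda I)^{-1}(X_1 - X_0^\top w^*)$, where $X_0$ holds the donors' pre-treatment outcomes (one row per donor), $X_c$ is its column-centred version, and $X_1$ is the treated unit's pre-treatment outcomes. The Python sketch below computes these weights with NumPy. The function name is hypothetical, and instead of solving for the SCM weights it uses a stand-in that puts all weight on the donor with the highest average level.

```python
import numpy as np


def ridge_ascm_implicit_weights(x1, X0, w_scm, lam):
    """Implicit donor weights of ridge-augmented SCM.

    x1:    (T_pre,) treated unit's pre-treatment outcomes
    X0:    (J, T_pre) donors' pre-treatment outcomes
    w_scm: (J,) SCM weights (non-negative, summing to one)
    lam:   ridge penalty of the outcome model
    """
    Xc = X0 - X0.mean(axis=0)                             # centring absorbs the intercept
    imbalance = x1 - w_scm @ X0                           # pre-treatment mismatch left by SCM
    gram = Xc.T @ Xc + lam * np.eye(X0.shape[1])
    adjustment = Xc @ np.linalg.solve(gram, imbalance)    # sums to zero across donors
    return w_scm + adjustment


rng = np.random.default_rng(2)
J, T_pre = 6, 10
X0 = rng.normal(size=(J, T_pre)).cumsum(axis=1)   # hypothetical donor paths
x1 = X0.max(axis=0) + 3.0                          # treated unit above every donor

# Stand-in for SCM weights: everything on the donor with the highest average level.
w_scm = np.zeros(J)
w_scm[np.argmax(X0.mean(axis=1))] = 1.0

gamma = ridge_ascm_implicit_weights(x1, X0, w_scm, lam=1.0)
print("implicit weights:", gamma.round(3), "sum:", round(gamma.sum(), 6))
print("pre-fit error, SCM :", round(np.linalg.norm(x1 - w_scm @ X0), 3))
print("pre-fit error, ASCM:", round(np.linalg.norm(x1 - gamma @ X0), 3))
```

Whenever the adjustment is non-zero it sums to zero, so some donors are pushed down and, with a large enough imbalance, their weights turn negative. The remaining pre-treatment error equals $\lambda (X_c^\top X_c + \lambda I)^{-1}$ times the SCM imbalance, so it can only shrink; raising `lam` pulls the weights back towards the SCM solution.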
3. Synthetic Difference-in-Differences (SDiD)
While ASCM focuses on bias correction, Synthetic Difference-in-Differences (SDiD), introduced by Arkhangelsky et al. (2021), merges the strengths of both canonical Difference-in-Differences (DiD) and Synthetic Control.
The Intuition
- DiD removes permanent differences between units using unit fixed effects and common time shocks using time fixed effects, assigning equal weight to all donors.
- SCM finds a weighted average of donors to match pre-treatment trends but typically does not include an additive unit fixed effect, making it sensitive to baseline level shifts.
SDiD introduces both unit weights (to find good donor matches like SCM) and time weights (to emphasize pre-treatment periods that best predict post-treatment periods), integrated directly into a two-way fixed effects framework.
The Estimator
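The SDiD estimator solves a weighted two-way fixed effects regression, in which each squared residual of the standard DiD model is weighted by a unit weight $\hat{\omega}_i$ and a time weight $\hat{\lambda}_t$:
$$ (\hat{\tau}^{\text{sdid}}, \hat{\mu}, \hat{\alpha}, \hat{\beta}) = \arg \min_{\tau, \mu, \alpha, \beta} \sum_{i,t} \hat{\omega}_i \hat{\lambda}_t \left( Y_{it} - \mu - \alpha_i - \beta_t - \tau D_{it} \right)^2 $$
where $D_{it}$ is the treatment indicator. The optimised weights apply to the control units and pre-treatment periods; each treated unit receives weight $\hat{\omega}_i = 1/N_{\text{tr}}$ and each post-treatment period receives weight $\hat{\lambda}_t = 1/T_{\text{post}}$.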
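For the common block design (all treated units adopt at the same time), the minimiser's $\hat{\tau}^{\text{sdid}}$ reduces to a weighted double difference: the treated units' change from the $\hat{\lambda}$-weighted pre-period to the post-period, minus the $\hat{\omega}$-weighted average of the same change for the controls. The Python sketch below checks this numerically by also solving the weighted least squares problem directly. The weights are random positive stand-ins rather than estimated SDiD weights, and both function names are hypothetical.

```python
import numpy as np


def sdid_tau_closed_form(Y, omega, lam, n_treated, t_pre):
    """Weighted double difference for a block design.

    Y:      (N, T) outcomes; treated units are the last n_treated rows and
            treatment starts at column t_pre.
    omega:  (N,) non-negative unit weights; lam: (T,) non-negative time weights.
    """
    co, tr = slice(0, -n_treated), slice(-n_treated, None)
    pre, post = slice(0, t_pre), slice(t_pre, None)

    def change(rows):
        # weighted post-period mean minus weighted pre-period mean, per unit
        post_mean = Y[rows, post] @ lam[post] / lam[post].sum()
        pre_mean = Y[rows, pre] @ lam[pre] / lam[pre].sum()
        return post_mean - pre_mean

    return (change(tr) @ omega[tr] / omega[tr].sum()
            - change(co) @ omega[co] / omega[co].sum())


def sdid_tau_regression(Y, omega, lam, n_treated, t_pre):
    """Solve the weighted two-way fixed effects problem by least squares."""
    N, T = Y.shape
    D = np.zeros((N, T))
    D[-n_treated:, t_pre:] = 1.0                   # treatment indicator D_it
    unit = np.repeat(np.eye(N), T, axis=0)         # alpha_i dummies (row i*T + t)
    time = np.tile(np.eye(T), (N, 1))              # beta_t dummies
    X = np.column_stack([D.ravel(), np.ones(N * T), unit, time])
    sw = np.sqrt(np.outer(omega, lam).ravel())     # sqrt(omega_i * lambda_t)
    # The dummies are collinear; lstsq returns a minimum-norm solution, but the
    # coefficient on D_it (tau) is still uniquely determined.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], Y.ravel() * sw, rcond=None)
    return coef[0]


rng = np.random.default_rng(3)
N, T, n_tr, t_pre = 8, 10, 2, 7
Y = rng.normal(size=(N, T)) + np.arange(N)[:, None] + np.linspace(0.0, 2.0, T)
Y[-n_tr:, t_pre:] += 1.5                                   # hypothetical effect
omega = np.concatenate([rng.dirichlet(np.ones(N - n_tr)), np.full(n_tr, 1 / n_tr)])
lam = np.concatenate([rng.dirichlet(np.ones(t_pre)), np.full(T - t_pre, 1 / (T - t_pre))])

tau_cf = sdid_tau_closed_form(Y, omega, lam, n_tr, t_pre)
tau_reg = sdid_tau_regression(Y, omega, lam, n_tr, t_pre)
print(round(tau_cf, 6), round(tau_reg, 6))                 # the two agree
```

Weighting each cell by the product $\hat{\omega}_i \hat{\lambda}_t$ is what makes the closed form exact: with product weights, the unit and time dummies can be partialled out by weighted double de-meaning.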
Unit and Time Weights Formation
- Unit Weights ($\hat{\omega}$): As in SCM, the weights are non-negative, sum to one, and are chosen to match the treated units’ average pre-treatment trajectory. Unlike SCM, the fit includes an intercept, so the weighted donors only need to match the treated path up to a constant level shift (playing the role of the unit fixed effect in DiD), and an $L_2$ penalty spreads weight across donors.
- Time Weights ($\hat{\lambda}$): These weights (also non-negative, summing to one, and fitted with an intercept) are chosen so that, for each donor, a weighted average of its pre-treatment outcomes approximates its average post-treatment outcome. Pre-treatment periods that look more like the post-treatment era receive higher weight.
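Both weight problems share one structure: choose non-negative weights summing to one, plus an unconstrained intercept, to minimise a squared error, optionally with an $L_2$ penalty. The Python sketch below (a hypothetical helper, assuming SciPy's SLSQP solver) profiles out the intercept by centring and solves that problem once for unit weights (rows are pre-treatment periods) and once for time weights (rows are donors). Arkhangelsky et al. scale the unit-weight penalty as $\zeta^2 T_{\text{pre}} \lVert \omega \rVert_2^2$ with a data-driven $\zeta$; here `penalty` is a free parameter, left at zero so that the toy example has an exact answer.

```python
import numpy as np
from scipy.optimize import minimize


def simplex_fit_with_intercept(A, b, penalty=0.0):
    """Minimise ||w0 + A @ w - b||^2 + penalty * ||w||^2 subject to w >= 0, sum(w) == 1.

    The intercept w0 has a closed form given w, so it is profiled out by centring
    the columns of A and the vector b.
    """
    A_c = A - A.mean(axis=0)
    b_c = b - b.mean()
    k = A.shape[1]

    def loss(w):
        r = A_c @ w - b_c
        return r @ r + penalty * (w @ w)

    res = minimize(
        loss,
        np.full(k, 1.0 / k),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    w = res.x
    w0 = np.mean(b - A @ w)                        # optimal intercept for these weights
    return w0, w


rng = np.random.default_rng(4)
N_co, T_pre, T_post = 6, 20, 5
Y_co = rng.normal(size=(N_co, T_pre + T_post)).cumsum(axis=1)   # hypothetical donors
y_tr = Y_co[0] + 10.0                     # treated unit: donor 0 shifted up by 10

# Unit weights: rows are pre-treatment periods, columns are donors.
omega0, omega = simplex_fit_with_intercept(Y_co[:, :T_pre].T, y_tr[:T_pre])
# Time weights: rows are donors, columns are pre-treatment periods;
# the target is each donor's average post-treatment outcome.
lam0, lam = simplex_fit_with_intercept(Y_co[:, :T_pre], Y_co[:, T_pre:].mean(axis=1))

print("intercept:", round(omega0, 2), "unit weights:", omega.round(3))
print("time weights:", lam.round(3))
```

Because the treated unit is donor 0 shifted up by a constant, the intercept absorbs the level gap and the unit weights concentrate on donor 0, something canonical SCM (no intercept) could not do.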
Key Advantages of SDiD
- Handles Baseline Differences: The intercept in the weight calculation makes SDiD robust to treated units that sit outside the bounds of the donors in absolute levels.
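- Robustness to Large Panels: It is well suited to panels with many units and periods, where parallel trends often fails for DiD and unregularised SCM weights can overfit pre-treatment noise. In the simulation studies of Arkhangelsky et al. (2021), SDiD compares favourably with both DiD and SCM.
- Inference: Because SDiD is a weighted regression with established asymptotic theory, Arkhangelsky et al. (2021) propose practical variance estimators: a bootstrap and a jackknife when there are several treated units, and a placebo method that also works with a single treated unit. This makes inference more streamlined than for canonical SCM.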
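The placebo variance estimator is simple enough to sketch. Assuming a single adoption date, it repeatedly assigns "treatment" to randomly chosen control units, re-runs the estimator on the controls alone, and uses the spread of those placebo estimates as the variance. To keep the Python sketch short, the plug-in estimator is plain DiD (a hypothetical stand-in; in practice you would pass an SDiD routine), and all function names are our own.

```python
import numpy as np


def did_estimate(Y, treated_rows, t_pre):
    """Plain DiD for a block design (stand-in for an SDiD estimator)."""
    is_treated = np.zeros(Y.shape[0], dtype=bool)
    is_treated[treated_rows] = True
    change = Y[:, t_pre:].mean(axis=1) - Y[:, :t_pre].mean(axis=1)
    return change[is_treated].mean() - change[~is_treated].mean()


def placebo_se(Y_controls, n_treated, t_pre, estimator, n_reps=200, seed=0):
    """Placebo standard error using control units only.

    Each replication picks n_treated controls at random, treats them as if they
    had been treated from t_pre onward, and re-estimates the effect. None of these
    units was actually treated, so the spread reflects estimation noise.
    """
    rng = np.random.default_rng(seed)
    n_co = Y_controls.shape[0]
    draws = np.array([
        estimator(Y_controls, rng.choice(n_co, size=n_treated, replace=False), t_pre)
        for _ in range(n_reps)
    ])
    return np.sqrt(np.mean((draws - draws.mean()) ** 2))


rng = np.random.default_rng(5)
N_co, T, t_pre = 30, 12, 8
unit_fe = rng.normal(size=(N_co, 1))
time_fe = np.linspace(0.0, 1.0, T)
Y_controls = unit_fe + time_fe + rng.normal(0.0, 0.5, size=(N_co, T))

# Treated unit with a hypothetical effect of 2 in the post-period.
y_treated = rng.normal() + time_fe + rng.normal(0.0, 0.5, size=T)
y_treated[t_pre:] += 2.0
Y_all = np.vstack([Y_controls, y_treated])

tau = did_estimate(Y_all, [N_co], t_pre)
se = placebo_se(Y_controls, n_treated=1, t_pre=t_pre, estimator=did_estimate)
print(f"estimate: {tau:.2f}, placebo SE: {se:.2f}")
```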
4. Comparing the Choices
When should you use standard SCM versus ASCM versus SDiD in MMM applications?
| Method | Best Use Case | Handling of Baseline Shifts | Extrapolation |
|---|---|---|---|
| Standard SCM | Small $N$, clear structural breaks, strict interpretation needs. | Poor (fails when the treated unit lies outside the convex hull). | None (strictly bounded). |
| ASCM | Good donor pool but slight pre-treatment mismatch; need double robustness. | Moderate (adjusts via outcome model). | Mild extrapolation permitted. |
| SDiD | Large panel data with many donors and periods; unit and time fixed effects are plausible. | Excellent (intercept naturally handles it). | Implicit via intercept shifts. |
5. Practical Implementation Guidelines
In practice, data scientists should rarely rely on canonical SCM alone; run these extensions as sensitivity checks.
- Always Check the Baseline Level: If your treated unit has a consistently higher baseline than every donor, standard SCM cannot achieve a good pre-treatment fit, because no convex combination of donors can exceed the highest donor. Use SDiD or ASCM, or de-mean each unit by its pre-treatment average before applying SCM.
- Use Established Packages:
  - For ASCM, use the `augsynth` package in R.
  - For SDiD, use the `synthdid` package in R or its Python ports.
- Compare Estimates: Run canonical DiD, canonical SCM, ASCM, and SDiD. If they agree, you can be considerably more confident in the causal effect. If SDiD diverges significantly from SCM, inspect the SDiD time weights and the ASCM bias correction term to understand why.
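A quick illustration of why comparing estimators is informative: in the hypothetical simulation below, every unit shares the same time path, so parallel trends holds, but the treated unit sits well above all donors. The Python sketch (helper names are our own; SCM weights via SciPy's SLSQP) compares plain DiD, canonical SCM, and SCM after de-meaning each unit by its pre-treatment average. Only canonical SCM strays far from the simulated effect of 2, which is exactly the kind of divergence that signals a baseline-level problem.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(x1, X0):
    """Canonical SCM weights: non-negative, summing to one; X0 is (J, T_pre)."""
    J = X0.shape[0]
    res = minimize(lambda w: np.sum((x1 - w @ X0) ** 2), np.full(J, 1.0 / J),
                   method="SLSQP", bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x


def scm_effect(y1, Y0, t_pre, demean=False):
    """Average post-period gap between the treated unit and its synthetic control."""
    if demean:
        # subtract each unit's own pre-treatment mean to remove level differences
        y1 = y1 - y1[:t_pre].mean()
        Y0 = Y0 - Y0[:, :t_pre].mean(axis=1, keepdims=True)
    w = scm_weights(y1[:t_pre], Y0[:, :t_pre])
    return np.mean(y1[t_pre:] - w @ Y0[:, t_pre:])


def did_effect(y1, Y0, t_pre):
    """Plain DiD with equal weights on all donors."""
    d1 = y1[t_pre:].mean() - y1[:t_pre].mean()
    d0 = Y0[:, t_pre:].mean(axis=1) - Y0[:, :t_pre].mean(axis=1)
    return d1 - d0.mean()


rng = np.random.default_rng(6)
J, T, t_pre, effect = 10, 30, 20, 2.0
trend = np.cumsum(rng.normal(size=T))                      # shared time path
Y0 = trend + rng.uniform(0.0, 3.0, size=(J, 1)) + rng.normal(0.0, 0.2, size=(J, T))
y1 = trend + 8.0 + rng.normal(0.0, 0.2, size=T)            # above every donor
y1[t_pre:] += effect

est_did = did_effect(y1, Y0, t_pre)
est_scm = scm_effect(y1, Y0, t_pre)
est_scm_dm = scm_effect(y1, Y0, t_pre, demean=True)
print(f"DiD: {est_did:.2f}  SCM: {est_scm:.2f}  de-meaned SCM: {est_scm_dm:.2f}")
```

Canonical SCM carries the unresolved level gap into the post-period, so its estimate absorbs that gap on top of the true effect; the other two estimators remove the level difference first.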
Summary
The toolkit for synthetic control has moved far beyond the canonical 2010 formulation. Augmented Synthetic Control provides a principled way to correct for imperfect pre-treatment fit via an outcome model. Synthetic Difference-in-Differences elegantly bridges two distinct causal inference traditions, bringing unit weights, time weights, and fixed effects into a single robust estimator. By incorporating these extensions, causal inference in Marketing Mix Modeling becomes significantly more reliable, especially when faced with imperfect quasi-experiments.