MMM 702: Augmented Synthetic Control (ASCM)

The Problem That Motivates Augmentation

Return to the five flagship stores piloting a loyalty programme. Standard synthetic control searches for weights that make a convex combination of control stores match the flagships’ pre-treatment revenue trajectory. In this case the optimisation stalls short of that goal. The flagships anchor high-traffic urban centres with customer profiles that no suburban or regional store replicates. Even the best synthetic control undershoots the flagships’ baseline revenue by around 8%. That gap carries forward into the post-treatment period, so the synthetic control estimator mixes the treatment effect with residual bias from imperfect pre-period fit.

Augmented synthetic control attacks this problem directly. It pairs the weighting estimator with a regression adjustment that corrects the gap in expected outcomes. If the synthetic control undershoots by 8% in periods where the regression predicts the treated store should be higher, the augmentation shifts the counterfactual up to reflect that systematic difference. The benefit is that we can improve fit when the treated unit sits near, but not squarely inside, the donor convex hull. The cost is model dependence: once we add a regression component, the estimator inherits whatever misspecification that model carries.

The Estimator

Let unit 1 be treated and let $J$ index donor stores. Denote by $\hat{w}$ the weights from the standard synthetic control optimisation, with components $\hat{w}_j$, and let $\hat{m}_{it}$ be predictions from an auxiliary outcome model for $Y_{it}(0)$, estimated on pre-treatment data for donors and (where available) the treated unit. In practice $\hat{m}_{it}$ often comes from a ridge regression of outcomes on a covariate vector $X_{it}$. In many applications $X_{it}$ is time-invariant and the $t$ subscript simply keeps the notation consistent with the panel setup. For a post-treatment period $t > T_0$, the augmented counterfactual for the treated unit takes the form

$$ \hat{Y}_{1t}^{ASCM}(0) = \sum_{j \in J} \hat{w}_j Y_{jt} + \hat{m}_{1t} - \sum_{j \in J} \hat{w}_j \hat{m}_{jt}. $$

The first term is the standard synthetic control prediction. The remaining two terms together form the augmentation, which corrects for any residual imbalance that the auxiliary model attributes to systematic differences between the treated unit and its synthetic control. When, according to the regression model, the treated unit and the synthetic control have the same expected outcome in period $t$, this correction vanishes and the estimator collapses back to standard synthetic control.

A useful diagnostic is to report the size of the augmentation term, $\hat{m}_{1t} - \sum_{j \in J} \hat{w}_j \hat{m}_{jt}$, relative to the weighted-donor term, $\sum_{j \in J} \hat{w}_j Y_{jt}$. If the augmentation dominates, the estimate is driven primarily by extrapolation through the outcome model rather than by donor interpolation.

The period-$t$ treatment effect estimator is

$$ \hat{\tau}_{1t} = Y_{1t} - \hat{Y}_{1t}^{ASCM}(0) = (Y_{1t} - \hat{m}_{1t}) - \sum_{j \in J} \hat{w}_j (Y_{jt} - \hat{m}_{jt}). $$
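
As a minimal sketch of the two formulas above, the following NumPy snippet computes the augmented counterfactual, the treatment effect, and the diagnostic ratio of augmentation to weighted-donor term. The function name `ascm_counterfactual`, the array layout, and all numbers are hypothetical and chosen only for illustration; the weights and model predictions are taken as given rather than estimated.

```python
import numpy as np

def ascm_counterfactual(w, Y_donors, m_treated, m_donors):
    """Augmented counterfactual for a block of periods.

    w         : (J,)   synthetic control weights (taken as given)
    Y_donors  : (J, T) observed donor outcomes
    m_treated : (T,)   outcome-model predictions for the treated unit
    m_donors  : (J, T) outcome-model predictions for the donors
    Returns (counterfactual, weighted-donor term, augmentation term).
    """
    w = np.asarray(w, dtype=float)
    sc_term = w @ np.asarray(Y_donors, dtype=float)              # sum_j w_j Y_jt
    augmentation = np.asarray(m_treated, dtype=float) - w @ np.asarray(m_donors, dtype=float)
    return sc_term + augmentation, sc_term, augmentation

# Hypothetical inputs: 3 donors, 2 post-treatment periods
w = np.array([0.5, 0.3, 0.2])
Y_donors = np.array([[100.0, 102.0],
                     [ 90.0,  91.0],
                     [ 80.0,  84.0]])
m_donors = np.array([[ 98.0, 101.0],
                     [ 91.0,  92.0],
                     [ 82.0,  83.0]])
m_treated = np.array([101.0, 103.0])
Y_treated = np.array([110.0, 113.0])

y0_hat, sc_term, augmentation = ascm_counterfactual(w, Y_donors, m_treated, m_donors)
tau_hat = Y_treated - y0_hat

# Equivalent "residualised" form of the same estimator
tau_hat_resid = (Y_treated - m_treated) - w @ (Y_donors - m_donors)

# Diagnostic: size of the augmentation relative to the weighted-donor term
aug_share = np.abs(augmentation) / np.abs(sc_term)

print(np.round(y0_hat, 2), np.round(tau_hat, 2), np.round(aug_share, 3))
```

With these made-up inputs the weighted-donor term is 93.0 and 95.1, the augmentation is 8.3 in both periods, and the two expressions for the treatment effect agree, as the algebra requires.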

Why Augmentation Helps, and When It Hurts

Augmentation is attractive because it gives us another route to good counterfactuals. If the synthetic control weights alone already capture the relationship between treated and donor units, the augmentation term is small and ASCM behaves much like standard synthetic control. If the weights are imperfect but the regression model captures how covariates predict outcomes across stores and time, the augmentation can correct much of the bias from imperfect pre-treatment fit.

This “two chances to get it roughly right” story is sometimes described as a form of double robustness [Ben-Michael et al., 2021]. In our setting, the analogy is that if either (i) the SC weights approximate the untreated counterfactual well or (ii) the regression adjustment $\hat{m}_{it}$ is close to $E[Y_{it}(0) \mid X_{it}]$, then ASCM can substantially reduce bias relative to pure SC. In the strict econometric sense, however, ASCM is not generically doubly robust in the way classical missing-data estimators are. In finite marketing panels both components are estimated, both are noisy, and both can be misspecified. The practical message is more modest: augmentation can reduce bias when one of the components tracks the untreated potential outcomes well, but it can increase variance and even bias when both components are off in different directions.

A deeper issue is extrapolation. Standard synthetic control constrains the treated unit’s counterfactual to live inside the convex hull of donor outcomes. When the treated store is an outlier relative to donors, this constraint forces you to admit that no credible synthetic control exists. ASCM keeps the convex-hull restriction for the weighted donor outcomes, but then adds a regression correction that can push the augmented counterfactual outside that hull. For the flagship stores, the regression might infer that urban locations systematically load more heavily on an “urban-consumer” factor than any individual donor store. The augmentation then shifts the counterfactual up to reflect that higher loading. If this model of how covariates map into outcomes is right, extrapolation buys you a better counterfactual. If it is wrong, the same mechanism projects you into the wrong part of outcome space.
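
To make the extrapolation point concrete, here is a small single-period illustration with made-up numbers (every value below is hypothetical). In one period the convex hull of donor outcomes is simply the interval between the smallest and largest donor revenue, so the weighted-donor prediction cannot exceed the largest donor, while the augmented prediction can.

```python
import numpy as np

# Hypothetical single-period revenues for three donor stores
Y_donors_t = np.array([100.0, 90.0, 80.0])
# Assumed outcome-model predictions for the donors and for the treated store
m_donors_t = np.array([99.0, 91.0, 81.0])
m_treated_t = 108.0

# In this single-period illustration, the highest value a convex
# combination can reach puts all weight on the largest donor.
w = np.array([1.0, 0.0, 0.0])

sc_prediction = float(w @ Y_donors_t)
augmentation = float(m_treated_t - w @ m_donors_t)
ascm_prediction = sc_prediction + augmentation

hull_low, hull_high = Y_donors_t.min(), Y_donors_t.max()
sc_inside = bool(hull_low <= sc_prediction <= hull_high)
ascm_inside = bool(hull_low <= ascm_prediction <= hull_high)

print(sc_prediction, sc_inside)       # 100.0 True
print(ascm_prediction, ascm_inside)   # 109.0 False
```

Whether the move from 100 to 109 is an improvement depends entirely on whether the assumed model prediction of 108 for the treated store is credible, which is the model-dependence trade-off described above.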

Identification Assumptions

ASCM targets the same estimand as standard synthetic control: the treatment effect on the treated unit in each post-treatment period, $\tau_{1t} = Y_{1t}(1) - Y_{1t}(0)$. The estimator replaces the unobserved $Y_{1t}(0)$ with the augmented counterfactual $\hat{Y}_{1t}^{ASCM}(0)$. The assumptions extend those for synthetic control by adding structure on the regression component.

First, the no-anticipation and no-interference conditions introduced in Chapter 6 continue to apply. Pre-treatment outcomes for the treated unit must equal their untreated potential outcomes, $Y_{1t} = Y_{1t}(0)$ for $t \le T_0$, and donor outcomes must not be affected by the treated unit’s loyalty programme, so $Y_{jt} = Y_{jt}(0)$ for all donors and all periods. In the loyalty-programme example this rules out substantial competitive responses that meaningfully shift donor revenue paths.

Second, we require that the outcome model and the weighting step together approximate the treated unit’s untreated path. A convenient way to express this is to define an outcome model $m_{it} := E[Y_{it}(0) \mid X_{it}]$ and residualised untreated outcomes $u_{it} := Y_{it}(0) - m_{it}$. The ASCM logic is that the weights should approximately balance these residualised outcomes in the pre-treatment period,

$$ u_{1t} \approx \sum_{j \in J} w_j^* u_{jt}, \quad t \le T_0, $$

while the fitted model $\hat{m}_{it}$ provides a stable approximation to $m_{it}$ for imputing untreated outcomes in the post-treatment period.

Finally, ASCM needs stability of the regression relationship across the treatment boundary. The outcome model that underlies $\hat{m}_{it}$, estimated from pre-treatment donor data (and, where relevant, the treated unit), must continue to describe how covariates relate to untreated outcomes in the post-treatment period. Structural breaks that change this mapping, such as a major platform algorithm shift or a sharp macroeconomic shock, will cause the regression component to extrapolate incorrectly. Because we never observe $Y_{1t}(0)$ after treatment, this stability condition cannot be verified directly. You must justify it using institutional knowledge and auxiliary evidence about how $X_{it}$ and outcomes co-move around the intervention.

Some parts of this structure can be partially assessed. You can check whether the augmentation improves fit within the pre-treatment period by splitting it into training and validation segments. Estimate the weights and regression on the training segment, predict the validation segment, and compare prediction errors with and without augmentation. If the augmented model consistently worsens pre-period fit, it is unlikely to repair problems after treatment. What you cannot observe is whether the same regression relationship holds once the loyalty programme goes live. That extrapolation across the treatment boundary is, as in many causal designs, the part you must defend with economic argument rather than direct data.
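
The balance condition on residualised outcomes can be inspected directly in the pre-treatment period. The sketch below is an illustration under assumptions: the helper name `residual_imbalance` is hypothetical, the estimated weights stand in for $w_j^*$, the fitted predictions stand in for $m_{it}$, and the numbers are invented.

```python
import numpy as np

def residual_imbalance(w, Y1_pre, Y0_pre, m1_pre, m0_pre):
    """Per-period gap u_1t - sum_j w_j u_jt and its root mean square.

    Y1_pre, m1_pre : (T0,)   treated outcomes and model predictions
    Y0_pre, m0_pre : (J, T0) donor outcomes and model predictions
    """
    u1 = Y1_pre - m1_pre          # residualised treated outcomes
    u0 = Y0_pre - m0_pre          # residualised donor outcomes
    gap = u1 - w @ u0
    return gap, float(np.sqrt(np.mean(gap ** 2)))

# Hypothetical pre-treatment data: 2 donors, 3 periods
w = np.array([0.6, 0.4])
Y0_pre = np.array([[10.0, 12.0, 11.0],
                   [20.0, 21.0, 23.0]])
m0_pre = np.array([[ 9.0, 11.0, 12.0],
                   [21.0, 20.0, 22.0]])
Y1_pre = np.array([15.0, 17.0, 16.0])
m1_pre = np.array([14.5, 16.0, 16.5])

gap, rms_gap = residual_imbalance(w, Y1_pre, Y0_pre, m1_pre, m0_pre)
print(np.round(gap, 3), round(rms_gap, 3))
```

A small root-mean-square gap relative to the scale of the outcome is consistent with the balance condition in the pre-period. It says nothing about whether the outcome model stays stable after treatment begins.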

Implementation

Implementing ASCM requires three linked choices: the predictors for synthetic control, the covariates for augmentation, and the form of the regression model. The predictors for synthetic control typically include a vector of pre-treatment outcomes and a small set of time-invariant store characteristics, mirroring the baseline synthetic control set-up from Chapter 6. The regression covariates may overlap this set or extend it with transformations, trends, or interactions that capture how store attributes relate to revenue dynamics. Ridge regression is a natural default because it handles collinearity in marketing covariates and shrinks coefficient estimates when we only have a short pre-treatment history. When the covariate set is rich relative to $T_0$, use sample splitting or cross-fitting for the regression step so that the same noise in pre-treatment outcomes does not drive both weight choice and regression adjustment.

In the flagship example, the weighting step might use monthly revenue over twenty-four pre-treatment months along with store characteristics such as square footage, average foot traffic, and a product-mix index. The regression step then uses these same covariates to predict each store’s revenue. You choose the ridge penalty $\lambda$ by cross-validation within the pre-treatment period, trading off in-sample fit against stability of predictions.

Short pre-treatment periods create a tension. With few time points, the regression has limited data and will overfit unless regularised. Ridge and elastic net penalties help by shrinking coefficients towards zero and, in the elastic net case, encouraging sparsity in covariate selection. At the same time, a large donor pool can make the synthetic control weights diffuse, spreading small positive weights across many donors whose characteristics only loosely resemble the flagship stores. In that setting the regression correction does much of the work, effectively pulling the counterfactual towards what the model thinks an “urban flagship” should look like given its covariates. When the treated unit’s characteristics sit far from the donor distribution, that correction becomes a strong extrapolation and should be interpreted with care.

A sensible validation strategy mirrors what you did for basic synthetic control. Split the pre-treatment period into a training block and a holdout block. Estimate synthetic control weights and the regression on the training block, use them to predict the holdout block, and compare root mean squared prediction error with and without augmentation. This checks predictive stability within the untreated regime. It does not validate that the same relationship will hold once treatment starts. If ASCM does not improve prediction in the holdout, there is little reason to trust it in the post-treatment period.

You can also run placebo checks with pseudo intervention dates inside the pre-period, exactly as in Chapter 6, and examine whether the resulting pseudo treatment effects concentrate near zero. In the ASCM setting, placebo checks stress-test the weighting and regression components: large pseudo-effects indicate that at least one of them is capturing noise or unstable structure rather than the untreated trajectory.
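
The training/holdout comparison can be sketched end to end on simulated data. Everything here is an assumption made for illustration: the data-generating process, the single time-invariant covariate, the fixed penalty `lam=1.0` (in practice you would cross-validate it as described above), and the use of projected gradient descent as a simple stand-in solver for the simplex-constrained weights. It is not a reference implementation of any ASCM package.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sc_weights(y1, Y0, n_iter=5000):
    """Projected gradient descent for min_w ||y1 - Y0.T @ w||^2 on the simplex.

    y1 : (T,) treated outcomes; Y0 : (J, T) donor outcomes.
    Runs a fixed number of iterations with no convergence check.
    """
    w = np.full(Y0.shape[0], 1.0 / Y0.shape[0])
    step = 1.0 / (2.0 * np.linalg.norm(Y0, 2) ** 2)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * Y0 @ (Y0.T @ w - y1)
        w = project_simplex(w - step * grad)
    return w

def ridge_fit(X, y, lam):
    """Ridge with an unpenalised intercept; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def rmspe(pred, actual):
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

# Simulated untreated panel (hypothetical): 6 donors, 24 pre-treatment months
rng = np.random.default_rng(0)
J, T_train, T_hold = 6, 18, 6
T = T_train + T_hold
x_donors = np.linspace(0.0, 1.0, J)   # made-up "urban intensity" covariate
x_treated = 1.5                       # flagship lies outside the donor range
trend = 0.5 * np.arange(T)            # common time factor
Y0 = 50 + 10 * x_donors[:, None] + trend[None, :] + rng.normal(0, 1, (J, T))
y1 = 50 + 10 * x_treated + trend + rng.normal(0, 1, T)

train, hold = slice(0, T_train), slice(T_train, T)

# Step 1: weights from the training block only
w = sc_weights(y1[train], Y0[:, train])

# Step 2: pooled ridge of training-block donor outcomes on the covariate
X_pool = np.repeat(x_donors, T_train)[:, None]
y_pool = Y0[:, train].ravel()
a, b = ridge_fit(X_pool, y_pool, lam=1.0)
m_donors = a + b[0] * x_donors
m_treated = a + b[0] * x_treated

# Step 3: predict the holdout block with and without augmentation
sc_hold = w @ Y0[:, hold]
ascm_hold = sc_hold + (m_treated - w @ m_donors)

rmspe_sc = rmspe(sc_hold, y1[hold])
rmspe_ascm = rmspe(ascm_hold, y1[hold])
print(f"holdout RMSPE  SC: {rmspe_sc:.2f}  ASCM: {rmspe_ascm:.2f}")
```

The simulation is built so that the regression is correctly specified and the treated unit sits outside the donor range, which is the case where augmentation should help. On real data the comparison can go either way, and an ASCM holdout error that is no better than plain synthetic control is a signal to distrust the regression component.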

Comparing ASCM to Alternatives

ASCM is most attractive when the treated unit lies near the boundary of the donor convex hull and simple synthetic control cannot achieve good pre-treatment fit. In that setting the regression adjustment can use observed differences in store attributes and pre-trends to repair part of the mismatch. The trade-off is that ASCM is more model dependent: any misspecification in the regression component directly feeds into the adjusted counterfactual.

Good empirical practice is to treat ASCM as one estimator in a small ensemble rather than as a replacement for simpler designs. You can run standard synthetic control, event-study difference-in-differences, and ASCM on the same loyalty-programme experiment, all targeting the same post-treatment ATT for the flagship stores. If the three estimators tell a consistent story, you gain confidence that conclusions are not driven by the modelling choices specific to any one method. If they diverge, that divergence is itself information. For example, large differences between ASCM and standard SC with similar pre-period fit point to sensitivity to the regression specification, whereas differences between SC and event-study DiD with similar covariates highlight tension between convex-hull and parallel-trends assumptions. In that case the right response is not to pick the most “favourable” estimate but to diagnose why the methods disagree and to reflect that uncertainty in how you present the results.

The next section turns to a different way of addressing poor fit. Rather than adding a regression correction on top of synthetic control, regularised synthetic control modifies the weight construction itself, shrinking towards simpler weighting schemes or enforcing explicit balance constraints.

References

Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 7.2.