MMM 704: Synthetic Difference-in-Differences (SDID)

The Problem That Motivates Time Weights

Let us return to the brand launching campaigns in twenty markets over three years. Markets adopt treatment at different times: some in Q1 2022, others in Q3 2022, still others in Q2 2023. Standard synthetic control matches each treated market to a weighted combination of controls based on pre-treatment outcomes. The difficulty is that pre-treatment windows differ across cohorts. With a common panel start, early adopters have short pre-treatment histories and late adopters have long ones. Their pre-treatment periods also overlap different seasonal patterns, macro conditions and competitive environments.

Standard synthetic control treats all pre-treatment periods equally. If a treated market experiences an unusual spike in month eight because of a local event, that spike receives as much weight in the matching objective as any other month. The optimisation then seeks donors that also spiked in month eight, perhaps for entirely unrelated reasons. The resulting match reflects coincidence rather than structural similarity.

Synthetic difference-in-differences, introduced by Arkhangelsky et al. [2021], addresses this by weighting time periods as well as units. Periods with idiosyncratic shocks receive lower weight. The matching objective focuses on periods when treated and control markets behave more comparably. The result is a synthetic control that reflects stable patterns rather than particular quirks of the calendar.

The Estimator

Let $Y_{it}$ denote the outcome for unit $i$ in period $t$. Let $I$ be the set of treated units and $J$ the set of control units. SDID constructs two sets of weights from the pre-treatment data: unit weights $w_j$ for control units $j \in J$ and time weights $v_t$ for pre-treatment periods $t \in T_{\mathrm{pre}}$. Both sets satisfy convexity constraints: the weights are non-negative and sum to one. One convenient formulation chooses the weights by minimising a regularised, weighted squared-error loss over the pre-treatment panel,

$$ \min_{w, v} \sum_{i \in I} \sum_{t \in T_{\mathrm{pre}}} v_t \left(Y_{it} - \sum_{j \in J} w_j Y_{jt}\right)^2 + \eta_w \|w\|_2^2 + \eta_v \|v\|_2^2, $$

subject to $w_j \ge 0$ and $\sum_{j \in J} w_j = 1$, and similarly $v_t \ge 0$ and $\sum_{t \in T_{\mathrm{pre}}} v_t = 1$. Here $\|w\|_2^2 := \sum_j w_j^2$ and $\|v\|_2^2 := \sum_t v_t^2$. The regularisation parameters $\eta_w$ and $\eta_v$ control how strongly we penalise concentrated unit and time weights. Under the simplex constraints, minimising these norms pushes the solution towards more diffuse weights, reducing reliance on any single donor or any single pre-treatment period.

Given the estimated unit weights $\hat{w}$ and time weights $\hat{v}$, SDID estimates an average treatment effect on the treated (ATT) by comparing doubly-differenced means. The estimator targets a single scalar ATT summary over a user-chosen post-treatment window.
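The role of the time-weight penalty in the objective above can be illustrated with a minimal sketch. It assumes the unit weights are held fixed, so that each pre-treatment period contributes a known squared misfit $c_t$; under the objective as written, the time-weight subproblem is then $\min_v \sum_t v_t c_t + \eta_v \|v\|_2^2$ on the simplex, whose solution is the Euclidean projection of $-c/(2\eta_v)$ onto the simplex. The function names and the misfit values are hypothetical and chosen only for illustration; this is not presented as the procedure used by any particular SDID package.

```python
def project_to_simplex(z):
    """Euclidean projection of z onto {v : v_t >= 0, sum_t v_t = 1}."""
    ordered = sorted(z, reverse=True)
    running_sum, shift = 0.0, 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            shift = candidate  # keep the shift from the last index that stays positive
    return [max(x - shift, 0.0) for x in z]


def time_weights(misfit, eta_v):
    """Minimise sum_t v_t * misfit[t] + eta_v * sum_t v_t**2 on the simplex (eta_v > 0)."""
    return project_to_simplex([-c / (2.0 * eta_v) for c in misfit])


# Hypothetical squared misfit per pre-treatment month; the third month has a one-off spike.
misfit = [1.0, 1.0, 9.0, 1.0]
v_light = time_weights(misfit, eta_v=2.0)    # weak penalty: the spike month is dropped
v_heavy = time_weights(misfit, eta_v=100.0)  # strong penalty: weights close to uniform
print(v_light)
print(v_heavy)
```

With a small $\eta_v$ the spike month receives zero weight, while a large $\eta_v$ pulls all four weights towards one quarter. In a full fit the misfit itself depends on the unit weights, so the two sets of weights would have to be updated together.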

To make the averaging explicit, let $T_{\mathrm{post}}$ denote the post-treatment periods included in the summary, and define

$$ \bar{Y}_{\mathrm{treated,post}} := \frac{1}{|I| |T_{\mathrm{post}}|} \sum_{i \in I} \sum_{t \in T_{\mathrm{post}}} Y_{it}, $$

$$ \bar{Y}_{j,\mathrm{post}} := \frac{1}{|T_{\mathrm{post}}|} \sum_{t \in T_{\mathrm{post}}} Y_{jt}, $$

$$ \bar{Y}_{\mathrm{treated},t} := \frac{1}{|I|} \sum_{i \in I} Y_{it}. $$

One convenient representation of the estimator is

$$ \widehat{\mathrm{ATT}}^{\mathrm{SDID}} = \bar{Y}_{\mathrm{treated,post}} - \sum_{j \in J} \hat{w}_j \bar{Y}_{j,\mathrm{post}} - \sum_{t \in T_{\mathrm{pre}}} \hat{v}_t \bar{Y}_{\mathrm{treated},t} + \sum_{j \in J} \sum_{t \in T_{\mathrm{pre}}} \hat{w}_j \hat{v}_t Y_{jt}. $$
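The double difference above can be computed directly once weights are in hand. The following is a minimal sketch, assuming a balanced panel stored as lists of per-unit outcome paths with the pre-treatment periods first; the function name `sdid_att`, the data and the weights are all hypothetical.

```python
def sdid_att(treated, controls, w, v, n_pre):
    """Doubly weighted difference: post-period gap minus v-weighted pre-period gap.

    treated, controls: lists of outcome paths (one list per unit, same length).
    w: unit weights over controls; v: time weights over the first n_pre periods.
    """
    n_periods = len(treated[0])
    # Average treated outcome in each period.
    treated_mean = [sum(row[t] for row in treated) / len(treated) for t in range(n_periods)]
    # Synthetic control outcome in each period.
    synthetic = [sum(w_j * row[t] for w_j, row in zip(w, controls)) for t in range(n_periods)]
    gap = [a - b for a, b in zip(treated_mean, synthetic)]
    post_gap = sum(gap[n_pre:]) / (n_periods - n_pre)   # unweighted mean over T_post
    pre_gap = sum(v_t * g for v_t, g in zip(v, gap[:n_pre]))  # v-weighted mean over T_pre
    return post_gap - pre_gap


# Hypothetical panel: 3 pre-treatment periods, 2 post-treatment periods.
treated = [[10.0, 12.0, 15.0, 18.0, 19.0]]   # third period contains a one-off spike
controls = [[8.0, 10.0, 9.0, 11.0, 12.0],
            [6.0, 8.0, 7.0, 9.0, 10.0]]
w = [0.5, 0.5]

att_uniform_time = sdid_att(treated, controls, w, [1 / 3, 1 / 3, 1 / 3], n_pre=3)
att_downweighted = sdid_att(treated, controls, w, [0.5, 0.5, 0.0], n_pre=3)
print(att_uniform_time, att_downweighted)
```

In this toy panel the spike in the third pre-treatment period inflates the pre-period gap, so uniform time weights give a smaller estimate than time weights that exclude the spike.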

What SDID Does Differently

The central innovation is the time weights. Standard synthetic control assigns equal importance to every pre-treatment period and asks which donors match the treated unit’s entire trajectory. When that trajectory includes strong idiosyncratic shocks, the resulting match may be driven by local coincidences rather than by enduring relationships. Difference-in-differences, by contrast, assigns equal weight to all units and periods and relies on unit and time fixed effects to absorb level differences. When units differ in their exposure to time-varying shocks, the unweighted parallel trends assumption can fail.

SDID relaxes both rigidities. The unit weights allow the estimator to focus on control units that look like the treated markets on observables and pre-treatment dynamics. The time weights allow the estimator to focus on periods when treated and control units are most comparable. This double flexibility is attractive in marketing applications where treated and control markets face different seasonal or macro environments, or where the pre-treatment period includes one-off shocks.

Apply this to the campaign launch. Suppose treated markets include both Sun Belt cities with strong summer sales and Midwestern cities with strong holiday sales. A single unregularised synthetic control will often struggle to match both patterns, because no fixed set of donor weights fits the entire year for both cohorts. SDID instead estimates time weights that down-weight the months when Sun Belt and Midwest diverge most sharply and puts more emphasis on spring and autumn, when their patterns converge. The resulting synthetic control captures the common trend without being dominated by region-specific seasonality.
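The contrast with difference-in-differences can be made concrete with a short worked special case. Using the SDID representation and the averages defined in the estimator discussion, set the weights to uniform values, $\hat{w}_j = 1/|J|$ and $\hat{v}_t = 1/|T_{\mathrm{pre}}|$. The four terms then collapse to the familiar two-group, two-period comparison of means:

$$ \widehat{\mathrm{ATT}}^{\mathrm{DiD}} = \left( \bar{Y}_{\mathrm{treated,post}} - \frac{1}{|T_{\mathrm{pre}}|} \sum_{t \in T_{\mathrm{pre}}} \bar{Y}_{\mathrm{treated},t} \right) - \left( \frac{1}{|J|} \sum_{j \in J} \bar{Y}_{j,\mathrm{post}} - \frac{1}{|J| |T_{\mathrm{pre}}|} \sum_{j \in J} \sum_{t \in T_{\mathrm{pre}}} Y_{jt} \right). $$

Read this way, SDID keeps the double-difference structure of DiD and changes only which control units and which pre-treatment periods enter the averages.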

7.4 Synthetic Difference-in-Differences (SDID)

Identification

SDID targets the same type of estimand as DiD, namely an average treatment effect on the treated over the chosen post-treatment window, but under a weighted version of the parallel trends assumption. Let $Y_{it}(0)$ denote the untreated potential outcome. Weighted parallel trends requires that, for each post-treatment period $t$, there exist population weights $w_j^*$ on control units and $v_s^*$ on pre-treatment periods such that

$$ E\left[ Y_{it}(0) - \sum_{j \in J} w_j^* Y_{jt}(0) - \sum_{s \in T_{\mathrm{pre}}} v_s^* \left( Y_{is}(0) - \sum_{j \in J} w_j^* Y_{js}(0) \right) \;\bigg|\; i \in I \right] = 0. $$

Expectations are taken over the sampling distribution (or a superpopulation model) for outcomes, with $i \in I$ indicating the treated group. In words, after reweighting pre-treatment periods and control units, treated units and their synthetic control would display the same change from the weighted pre-treatment mean in the absence of treatment.

This condition weakens unweighted parallel trends along one dimension and strengthens it along another. It is weaker in that we require parallelism only after reweighting, which can accommodate differential exposure to shocks that are balanced by the unit and time weights. It is stronger in that it depends on the existence of suitable weights and on the estimator’s ability to approximate them from the pre-treatment data. As in the generalised parallel trends discussion in Chapter 4, you trade an unconditional assumption for a weighted one and pay for that flexibility by estimating the weights. By contrast, unweighted parallel trends would require, for a fixed reference pre-treatment period $t_0$, $E[Y_{it}(0) - Y_{it_0}(0) \mid i \in I] = E[Y_{jt}(0) - Y_{jt_0}(0) \mid j \in J]$ for all post-treatment $t$. SDID replaces this with a reweighted version that holds only after applying $w_j^*$ and $v_s^*$.

Two further ingredients are implicit and should be remembered from earlier chapters. First, the usual no-anticipation and no-interference conditions apply. Write $G_i$ for the adoption time of unit $i$. Units must not adjust behaviour in anticipation of the campaign, so outcomes must satisfy $Y_{it} = Y_{it}(0)$ for all $t < G_i$. Control markets must remain unaffected by others’ campaigns. Second, the mapping from unit and time characteristics into untreated outcomes must be sufficiently stable across the pre- and post-treatment periods that weights estimated from pre-treatment data remain informative after treatment.

Whether trading the unconditional assumption for a weighted one is favourable depends on the data. When the donor pool is large and the pre-treatment period contains enough informative variation, the optimisation can find weights that balance pre-treatment paths in a way that plausibly generalises. When the donor pool is thin or the pre-treatment window is short, the weight estimates become noisy and the weighted parallel trends condition becomes more of a modelling claim than a restriction you can pressure-test.

Implementation

Implementing SDID in the campaign example requires the same design choices as DiD and SC, plus regularisation parameters. You must specify the donor pool, the pre-treatment window, the post-treatment evaluation window, and how to tune $\eta_w$ and $\eta_v$. For a staggered launch over three years, the donor pool includes markets that never receive the campaign and those that have not yet received it at a given event time, mirroring the staggered DiD set-up in Chapter 4. The pre-treatment window should be long enough to identify stable weights; in marketing panels this typically means at least a year of monthly data. The post-treatment window covers the horizon over which you care about effects, for example the first year after launch.

The regularisation parameters control the bias–variance trade-off for the weights. Larger values shrink unit and time weights towards something close to uniform, improving stability but reducing the estimator’s ability to exploit heterogeneity across markets and time. A practical choice again uses cross-validation on the pre-treatment panel. Split the pre-treatment period into training and validation segments, estimate SDID weights over a grid of $(\eta_w, \eta_v)$ values on the training segment, and select the pair that minimises prediction error on the validation segment. This can reduce overfitting to pre-period noise, but it cannot validate post-treatment counterfactuals. After tuning, verify that the chosen weights deliver acceptable pre-treatment RMSPE, covariate balance, and placebo behaviour, using the diagnostics in Section 6.5.

In a typical campaign application, this procedure yields moderate regularisation. The resulting unit weights concentrate on a handful of donor markets with similar demographics and baseline sales, while the time weights down-weight months dominated by unrelated promotions or macro shocks. The SDID estimate then typically sits between the DiD and SC estimates in this campaign example, with tighter standard errors than DiD when the weights succeed in removing noise and a more plausible level than SC when plain SC suffers from poor pre-treatment fit. This is a pattern seen in many marketing panels, not a general guarantee. Where SDID lands relative to the other two estimates is diagnostic of how much the double weighting buys you in a particular design.
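The cross-validation step described above can be sketched as follows. This is a deliberately simplified illustration: it tunes only $\eta_w$, holds the time weights uniform, and fits the unit weights by projected gradient descent on the simplex. The solver, the function names, the grid and the data are all assumptions made for the example, not the tuning procedure of any specific SDID implementation.

```python
def project_to_simplex(z):
    """Euclidean projection of z onto the probability simplex."""
    ordered = sorted(z, reverse=True)
    running_sum, shift = 0.0, 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            shift = candidate
    return [max(x - shift, 0.0) for x in z]


def fit_unit_weights(target, donors, eta, n_iter=2000):
    """Minimise sum_t (target[t] - sum_j w_j donors[j][t])**2 + eta * ||w||^2 on the simplex."""
    n_donors, n_periods = len(donors), len(target)
    w = [1.0 / n_donors] * n_donors
    # Conservative step size from an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * (sum(x * x for row in donors for x in row) + eta))
    for _ in range(n_iter):
        resid = [target[t] - sum(w[j] * donors[j][t] for j in range(n_donors))
                 for t in range(n_periods)]
        grad = [-2.0 * sum(resid[t] * donors[j][t] for t in range(n_periods)) + 2.0 * eta * w[j]
                for j in range(n_donors)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_donors)])
    return w


def rmse(target, donors, w, periods):
    """Root mean squared prediction error of the synthetic path over the given periods."""
    errors = [(target[t] - sum(w_j * row[t] for w_j, row in zip(w, donors))) ** 2 for t in periods]
    return (sum(errors) / len(errors)) ** 0.5


def tune_eta_w(target, donors, n_train, grid):
    """Fit on the first n_train pre-treatment periods, score on the remaining ones."""
    train_target = target[:n_train]
    train_donors = [row[:n_train] for row in donors]
    validation = range(n_train, len(target))
    scores = {}
    for eta in grid:
        w = fit_unit_weights(train_target, train_donors, eta)
        scores[eta] = rmse(target, donors, w, validation)
    return min(scores, key=scores.get), scores


# Hypothetical pre-treatment data only: treated-group mean and three donor markets.
target = [10.0, 11.0, 12.5, 11.5, 12.0, 13.0, 12.5, 13.5]
donors = [[9.0, 10.2, 11.4, 10.6, 11.1, 12.0, 11.6, 12.4],
          [12.0, 12.5, 14.5, 13.0, 13.4, 14.6, 14.0, 15.1],
          [5.0, 7.0, 4.0, 6.5, 5.5, 6.0, 7.5, 5.0]]
grid = [0.01, 1.0, 10.0, 100.0]
best_eta, scores = tune_eta_w(target, donors, n_train=5, grid=grid)
final_w = fit_unit_weights(target, donors, best_eta)  # refit on the full pre-treatment window
print(best_eta, final_w)
```

Extending the sketch to a grid over both parameters would mean looping over pairs and fitting the time weights inside the loop as well. The validation score uses pre-treatment periods only, so it speaks to overfitting of pre-period noise rather than to post-treatment validity.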

Connection to Factor Models

SDID fits naturally into the factor model perspective developed in Chapter 6. If untreated potential outcomes admit a low-rank representation with unit-specific loadings on a small number of time-varying factors, then unit weights can be interpreted as approximating the treated units’ factor loadings by convex combinations of donor loadings, while time weights emphasise pre-treatment periods that are most informative about the factors that matter for the post-treatment comparison. Together, the weights aim to reconstruct the relevant part of the low-rank structure that governs untreated outcomes for the treated units.

This perspective clarifies when SDID is likely to work well. When the outcome matrix is close to low rank and a moderate number of factors explain most of the variation, the doubly weighted averages can track the treated units’ latent factors and produce credible counterfactuals. When outcomes are driven by many independent shocks with weak common structure, no choice of unit and time weights can summarise the data effectively, and SDID may fail to fit pre-treatment paths. As with SC and ASCM, pre-treatment fit and balance diagnostics provide the main empirical check on whether the factor-structure story is plausible.
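The loading interpretation of the unit weights can be checked on a toy example. The sketch below assumes an exact, noise-free two-factor model in which each outcome is the inner product of a unit loading and a period factor; all numbers are hypothetical. If the treated loading is a convex combination of donor loadings, the same weights reproduce the treated untreated path in every period.

```python
# Hypothetical noise-free factor model: Y_jt = loading_j . factor_t
factors = [[1.0, 0.5], [1.2, 0.1], [0.9, 0.8], [1.5, 0.4]]  # one row per period
donor_loadings = [[2.0, 1.0], [4.0, 3.0], [6.0, 1.0]]       # one row per donor
w = [0.5, 0.25, 0.25]                                        # convex unit weights


def outcome(loading, factor):
    """Untreated outcome implied by a loading vector and a factor vector."""
    return sum(l * f for l, f in zip(loading, factor))


# Treated loading constructed as the w-weighted average of donor loadings.
treated_loading = [sum(w_j * lam[k] for w_j, lam in zip(w, donor_loadings)) for k in range(2)]

treated_path = [outcome(treated_loading, f) for f in factors]
synthetic_path = [sum(w_j * outcome(lam, f) for w_j, lam in zip(w, donor_loadings))
                  for f in factors]
max_gap = max(abs(a - b) for a, b in zip(treated_path, synthetic_path))
print(treated_loading, max_gap)
```

With noise or with a treated loading outside the convex hull of the donor loadings, the match would only be approximate, which is where pre-treatment fit diagnostics come in.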

Costs and Limitations

SDID buys flexibility at a cost. The estimator is more complex than basic SC or DiD because it requires two sets of weights, two regularisation parameters and an additional set of diagnostics. Explaining a time-weighted, unit-weighted double difference to a non-technical stakeholder is harder than explaining a simple before–after comparison or a single-unit SC gap plot.

The dependence on estimated weights also creates fragility. When the pre-treatment period is short, the optimisation problem for the weights is underpowered and the resulting $\hat{w}$ and $\hat{v}$ are imprecise. In that case, the weighted parallel trends condition becomes largely untestable in practice. When the donor pool is small, the unit weights have limited room to move and tend towards simple averages, pushing SDID back towards a DiD-like comparison with only modest gains from the synthetic component.

Staggered adoption adds further layers. In designs with multiple cohorts, you are effectively estimating separate sets of weights by cohort and then aggregating cohort-specific effects into an overall effect. As discussed in Chapter 4, the aggregation scheme matters: event-time averages, population-weighted averages and variance-weighted averages can all yield different summaries. SDID does not remove these aggregation issues. It provides an alternative set of cohort-level estimates that you can plug into the same aggregation framework.
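The sensitivity to the aggregation scheme is easy to see numerically. The sketch below takes three hypothetical cohort-level SDID estimates and combines them with equal weights, with weights proportional to the number of treated markets (used here as a stand-in for population weighting), and with inverse-variance weights. The cohort labels, estimates, counts and standard errors are invented for illustration.

```python
def weighted_average(values, weights):
    """Weighted mean with weights normalised to sum to one."""
    total = sum(weights)
    return sum(value * weight for value, weight in zip(values, weights)) / total


# Hypothetical cohort-level results.
cohorts = ["2022Q1", "2022Q3", "2023Q2"]
att = [4.0, 2.0, 1.0]          # cohort-specific ATT estimates
n_treated = [2, 5, 3]          # treated markets per cohort
std_err = [1.0, 0.5, 1.0]      # standard errors of the cohort estimates

equal_weighted = weighted_average(att, [1.0] * len(att))
size_weighted = weighted_average(att, n_treated)
precision_weighted = weighted_average(att, [1.0 / se ** 2 for se in std_err])

for label, value in [("equal", equal_weighted),
                     ("by treated count", size_weighted),
                     ("inverse variance", precision_weighted)]:
    print(label, round(value, 3))
```

The three summaries differ even though the cohort-level inputs are identical, so the aggregation rule should be chosen and reported deliberately rather than left implicit.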

When to Use SDID

SDID is most useful when you have a moderate number of treated units, a donor pool rich enough to support meaningful unit reweighting, and a pre-treatment period long enough to support meaningful time reweighting. It shines in settings where treated and control markets face different seasonal patterns or macro conditions, and where pre-treatment periods contain idiosyncratic shocks that you would like the design to down-weight rather than fit.

It is not a default. When the donor pool is very small or the pre-treatment period is short, the weight estimates will be noisy and the extra complexity may not buy you much beyond carefully specified DiD or augmented SC. In those cases, the simpler methods discussed earlier in the chapter often deliver more stable answers and are easier to explain.

In practice, you can treat SDID as another estimator in the same ensemble that already includes SC, ASCM and DiD. Applying all of them to the same campaign, inspecting pre-treatment diagnostics and comparing estimated treatment paths gives you a richer picture of how sensitive your conclusions are to the way you weight units and time. When SDID and the simpler methods agree, you gain confidence that your substantive conclusions are not driven by the extra modelling structure. When they diverge, that divergence is itself informative and should be reflected openly in how you present and interpret the results.

7.5 Triply Robust Panel (TROP) Estimators

References

Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 7.4.