MMM 706: Identification and Assumptions

Hybrid methods build on the same basic causal ingredients as synthetic control and difference-in-differences. They still ask what the treated units’ outcomes would have been in the absence of treatment, still rely on comparisons to donor units, and still use pre-treatment data to learn relationships they then extrapolate into the post-treatment period. What changes is how these relationships are modelled and how much structure is imposed. This section states the identification assumptions that hybrids inherit from earlier chapters, highlights what each hybrid adds on top, and connects these requirements to the factor-model perspective developed in Chapter 6.

For clarity we focus on a single treated unit with index 1 and donor units indexed by $j \in J$, as in the basic synthetic control setup in Chapter 6. Extension to multiple treated units proceeds by averaging unit-specific effects, as in the DiD and SC chapters. Throughout, let $T_0$ denote the last pre-treatment period; treatment begins after $T_0$. The primary estimand is the post-treatment path effect $\tau_{1t} = Y_{1t}(1) - Y_{1t}(0)$ for $t > T_0$. Later sections summarise this path by averaging over post-treatment windows or aggregating across treated units.

The core identification logic for hybrid estimators mirrors that of synthetic control. We require that, in expectation, the untreated potential outcome for the treated unit can be represented by a combination of donor outcomes and a model-based adjustment that we can estimate from pre-treatment data and plausibly extrapolate to post-treatment periods. Schematically, we can write untreated outcomes for the treated unit as a sum of a weighted donor component and a model-based adjustment. For example, for augmented synthetic control we work with a representation of the form:

$$Y_{1t}(0) \approx \sum_{j \in J} w_j^* Y_{jt}(0) + X_{1t} \beta^* + \varepsilon_{1t}$$

where $w_j^*$ are population weights and $\beta^*$ are outcome-model parameters. For SDID, we work with a representation that combines unit weighting with additive time adjustments:

$$Y_{1t}(0) \approx \sum_{j \in J} w_j^* Y_{jt}(0) + \alpha_1 + \lambda_t + \varepsilon_{1t}$$

with $\alpha_1$ a unit effect and $\lambda_t$ a time effect. In SDID the time weights $v_t$ defined in Section 7.4 determine which pre-treatment periods matter most for learning the comparison, rather than redefining time effects.

These expressions are approximations that summarise the rôle of weights and adjustments. They are best read as imputation decompositions used to motivate estimation, not as causal structural models for $Y_{it}(0)$. In the pre-treatment period we estimate the relevant weights and adjustment parameters from observed data. In the post-treatment period we hold those relationships fixed and use them to construct counterfactuals for $Y_{1t}(0)$.
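The fit-then-hold-fixed logic described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not any particular package's implementation: it uses plain simplex-constrained least squares (the pure synthetic control case, with no augmentation term), solves it by projected gradient descent, and runs on a simulated standardised panel. The function names `project_to_simplex` and `fit_simplex_weights`, and every number in the simulation, are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def fit_simplex_weights(y_pre, Y0_pre, n_iter=5000):
    """Simplex-constrained least squares via projected gradient descent."""
    n_donors = Y0_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    # Step size 1/L, where L is the largest eigenvalue of Y0_pre' Y0_pre.
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated standardised panel (all values hypothetical).
rng = np.random.default_rng(0)
T0, T, J = 30, 40, 5                      # first T0 rows are pre-treatment
Y0 = rng.normal(size=(T, J))              # donor outcomes, never treated
w_true = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
y_obs = Y0 @ w_true + rng.normal(scale=0.05, size=T)
y_obs[T0:] += 2.0                         # true effect after T0

# Step 1: learn weights from pre-treatment data only.
w_hat = fit_simplex_weights(y_obs[:T0], Y0[:T0])
# Step 2: hold weights fixed and impute the untreated path after T0.
y0_hat_post = Y0[T0:] @ w_hat
# Step 3: effect path = observed outcome minus imputed counterfactual.
tau_hat = y_obs[T0:] - y0_hat_post
print(np.round(w_hat, 3), round(float(tau_hat.mean()), 3))
```

The hybrids in this chapter change step 1 (how weights and adjustment parameters are chosen) but keep steps 2 and 3: whatever is learnt before $T_0$ is frozen and applied afterwards.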

Generic Assumptions

All hybrid methods in this chapter inherit a common set of identification conditions from the general framework in Chapter 2, the synthetic control chapter and the DiD chapter. We restate them here in compact form to fix ideas and to emphasise that hybrids do not relax these fundamentals.

Assumption 20 (No Anticipation in Pre-Treatment Periods) For the treated unit, pre-treatment outcomes coincide with untreated potential outcomes:

$$Y_{1t} = Y_{1t}(0) \quad \text{for all } t \le T_0.$$

No anticipation, introduced in Chapter 2, rules out behavioural responses before the recorded treatment date. If there is evidence that outcomes start to move in response to the campaign before $T_0$, then those periods do not provide clean information about the untreated trajectory. In practice, you respond by redefining the intervention to include the anticipation period or by restricting the pre-treatment window used to estimate weights and adjustments to earlier periods.

Assumption 21 (No Interference or Explicit Exposure Modelling) Either the treatment applied to the treated unit does not affect donor outcomes,

$$Y_{jt} = Y_{jt}(0) \quad \text{for all } j \in J,\ t = 1, \dots, T,$$

or spillovers are represented through an exposure mapping $h_j(D_{-j,t})$ with spillover-aware potential outcomes $Y_{jt}(d, h)$, and the analysis targets effects holding the exposure process fixed.

This is the same no-spillovers component of SUTVA that underpins synthetic control and DiD. If the treated unit’s campaign materially changes competitors’ behaviour or market conditions in donor units, then donor outcomes no longer represent valid counterfactuals. Hybrid designs cannot solve this problem mechanically. The remedy remains design-based: curate the donor pool, impose geographic or competitive buffers, or build an explicit exposure model.

Assumption 22 (Pre/Post Stability of the Imputation Relationship) The relationships used to impute $Y_{1t}(0)$ from donor outcomes, covariates, and any adjustment model remain stable across the treatment boundary. In particular, interpret the weights and adjustment parameters used by a given hybrid method as converging to population limits $w^*$ and $\psi^*$ learned from pre-treatment data. We require that the same imputation rule that fits the pre-treatment untreated outcomes continues to apply to $Y_{1t}(0)$ after $T_0$.

Stability says that the way we map donor outcomes, predictors, and weights into counterfactual outcomes does not jump at the treatment date. For ASCM, this means the outcome regression extrapolates sensibly beyond the pre-treatment window. For SDID, it means that the reweighted two-way fixed-effects structure that fits pre-treatment data remains a good approximation afterwards. For factor-based hybrids such as TROP, it means that the factor structure learnt from pre-treatment data continues to describe untreated outcomes in the post-treatment period. We cannot test this directly for $Y_{1t}(0)$, but placebo checks and cross-validation within the pre-treatment period provide indirect evidence on whether the mechanism generalises.
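One way to run such a within-pre-treatment check is to hold out the last few pre-treatment periods, fit on the earlier ones, and compare fit across the two windows. The sketch below is illustrative only. It swaps in closed-form ridge-regularised donor weights (unconstrained, so not the simplex weights of pure synthetic control) purely to keep the example short; the helper names, the penalty value and the simulated data are all assumptions.

```python
import numpy as np

def ridge_weights(y, Y0, penalty):
    """Closed-form ridge solution; weights are NOT constrained to the simplex."""
    n_donors = Y0.shape[1]
    return np.linalg.solve(Y0.T @ Y0 + penalty * np.eye(n_donors), Y0.T @ y)

def rmspe(y, y_hat):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def pretreatment_holdout_check(y_pre, Y0_pre, n_holdout, penalty=1.0):
    """Fit on early pre-treatment periods, evaluate on the held-out late ones."""
    split = y_pre.shape[0] - n_holdout
    w = ridge_weights(y_pre[:split], Y0_pre[:split], penalty)
    fit_train = rmspe(y_pre[:split], Y0_pre[:split] @ w)
    fit_holdout = rmspe(y_pre[split:], Y0_pre[split:] @ w)
    return fit_train, fit_holdout

# Simulated pre-treatment data (hypothetical values).
rng = np.random.default_rng(1)
T0, J, n_holdout = 40, 4, 10
Y0_pre = rng.normal(size=(T0, J))
noise = rng.normal(scale=0.1, size=T0)

# Case A: the donor relationship is stable throughout the pre-period.
y_stable = Y0_pre @ np.array([0.5, 0.3, 0.2, 0.0]) + noise

# Case B: the relationship shifts inside the held-out window.
y_shift = y_stable.copy()
y_shift[-n_holdout:] = (
    Y0_pre[-n_holdout:] @ np.array([0.0, 0.0, 0.2, 0.8]) + noise[-n_holdout:]
)

stable_train, stable_holdout = pretreatment_holdout_check(y_stable, Y0_pre, n_holdout)
shift_train, shift_holdout = pretreatment_holdout_check(y_shift, Y0_pre, n_holdout)
print(stable_train, stable_holdout, shift_train, shift_holdout)
```

A held-out error that is much larger than the training error, as in case B, is a warning that the imputation rule may not travel across time. A small held-out error is reassuring but does not prove stability across the actual treatment boundary.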

7.6 Identification and Assumptions

Assumption 23 (Overlap and Feasibility) There exist unit weights $w_j^*$, augmentation parameters $\psi^*$, and where relevant, time weights $v_t^*$ such that the hybrid representation achieves good pre-treatment fit for the treated unit, with stable and interpretable weights.

Overlap here means that the treated unit is not so extreme that no combination of donors and adjustment terms can approximate its untreated path. In the pure SC case this reduces to the convex-hull condition on factor loadings; in hybrids it adds the requirement that the chosen adjustment model and time weights can repair any residual mismatch without creating implausible extrapolation.

In practice you diagnose this by inspecting pre-treatment RMSPE relative to the scale of $Y$, by plotting pre-treatment paths for the treated unit and its synthetic counterpart, and by examining how concentrated and specification-sensitive the weights are. There is no universal numerical threshold that guarantees identification. The rule of thumb is that pre-treatment discrepancies should be small relative to both the natural volatility of outcomes and the size of the effects you care about, and that small perturbations in the design should not cause weights or fits to swing wildly.
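The numerical parts of these diagnostics are simple to compute. The sketch below is one possible set of summaries, not a standard: it scales pre-treatment RMSPE by the sample standard deviation of the treated unit's pre-treatment outcome, and summarises weight concentration by the largest weight and by the inverse Herfindahl index (an "effective number of donors"). The function name `fit_diagnostics` and the toy numbers are hypothetical.

```python
import numpy as np

def fit_diagnostics(y_pre, y_synth_pre, weights):
    """Summaries of pre-treatment fit and weight concentration."""
    y_pre = np.asarray(y_pre, dtype=float)
    y_synth_pre = np.asarray(y_synth_pre, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rmspe = float(np.sqrt(np.mean((y_pre - y_synth_pre) ** 2)))
    return {
        "rmspe": rmspe,
        # RMSPE relative to the natural volatility of the outcome.
        "rmspe_over_sd": rmspe / float(np.std(y_pre, ddof=1)),
        # Largest single donor weight.
        "max_weight": float(weights.max()),
        # Inverse Herfindahl index: equals J for uniform weights, 1 for a single donor.
        "effective_donors": 1.0 / float(np.sum(weights ** 2)),
    }

# Toy example (hypothetical numbers).
diag = fit_diagnostics(
    y_pre=[10.0, 12.0, 11.0, 13.0],
    y_synth_pre=[10.5, 11.5, 11.5, 12.5],
    weights=[0.5, 0.3, 0.2],
)
print(diag)
```

Re-running the same summaries after dropping a donor or shortening the pre-treatment window gives a rough sense of how specification-sensitive the fit and weights are.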

Method-Specific Assumptions

On top of these generic conditions, each hybrid introduces its own additional structure. These method-specific assumptions are where identification is gained or lost.

For ASCM, the key extra ingredient is the outcome regression. As discussed in Section 7.2, within a suitable factor-model framework the estimator can remain close to unbiased if either the SC weights achieve good balance or the regression model accurately captures the remaining imbalance, even if the other component is imperfect. In finite marketing panels both components will typically be imperfect, so the practical interpretation is more modest: ASCM buys you another route to good counterfactuals, not a guarantee that one of the two is correct. For a formal treatment of the augmentation logic, see [Ben-Michael et al., 2021].

For ridge and balancing SC, the additional structure comes from regularisation [Doudchenko and Imbens, 2016]. Ridge SC shrinks weights towards more diffuse configurations; balancing SC enforces explicit covariate-balance constraints. Identification then hinges on choosing the penalty or tolerance so that you do not regularise away the very heterogeneity you need to capture. The stability assumption in Assumption 22 must hold for the particular regularised weights chosen by cross-validation or other tuning rules, not just for some hypothetical unregularised solution.

For SDID, the central new requirement is weighted parallel trends. After applying population unit weights $w_j^*$ and time weights $v_t^*$, the doubly differenced untreated outcomes for treated units must evolve in parallel to those of the weighted donors, as set out formally in Section 7.4. This assumption is weaker than unweighted parallel trends, because it allows you to reweight the comparison group and pre-treatment periods to balance differential exposure to shocks. It is stronger than simply observing good pre-treatment fit, because it asserts that the reweighted relationship continues to hold for $Y_{1t}(0)$ after treatment.
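The role of the double difference can be seen in a small numerical sketch. The code below takes unit and time weights as given (it does not estimate them, and it is not the formal condition of Section 7.4) and computes a weighted double difference for a single treated unit. The function name `weighted_double_difference`, the convention that the first `T0` columns are pre-treatment, and all numbers are assumptions made for illustration.

```python
import numpy as np

def weighted_double_difference(Y, T0, unit_w, time_w):
    """Weighted DiD for one treated unit.

    Y       : array of shape (1 + J, T); row 0 is the treated unit.
    T0      : number of pre-treatment columns (columns 0..T0-1).
    unit_w  : donor weights, length J, summing to one.
    time_w  : pre-treatment period weights, length T0, summing to one.
    """
    treated, donors = Y[0], Y[1:]
    # Post-period average minus weighted pre-period average, unit by unit.
    treated_change = treated[T0:].mean() - time_w @ treated[:T0]
    donor_change = donors[:, T0:].mean(axis=1) - donors[:, :T0] @ time_w
    return float(treated_change - unit_w @ donor_change)

# Untreated outcomes are exactly additive here: Y_it(0) = alpha_i + lambda_t.
alpha = np.array([5.0, 1.0, 2.0, 3.0])      # treated unit has a higher level
lam = 4.0 * np.linspace(0.0, 1.0, 8) ** 2   # common, non-linear time path
T0 = 5
Y = alpha[:, None] + lam[None, :]
Y[0, T0:] += 3.0                             # true effect on the treated unit

unit_w = np.array([0.2, 0.5, 0.3])
time_w = np.array([0.0, 0.0, 0.2, 0.3, 0.5])

tau_dd = weighted_double_difference(Y, T0, unit_w, time_w)
# Comparing post-period levels only leaves the unit-level gap in the estimate.
level_gap = float(Y[0, T0:].mean() - unit_w @ Y[1:, T0:].mean(axis=1))
print(tau_dd, level_gap)
```

Because the simulated untreated outcomes are exactly additive, any weights that sum to one recover the true effect of 3, while the level-only comparison is contaminated by the difference in unit effects. With real data the untreated path is not exactly additive, which is where the choice of weights starts to matter.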

For TROP, the method-specific assumption is the factor structure discussed in Section 7.5. Untreated potential outcomes must admit a low-rank decomposition into unit and time effects plus a small number of latent factors. Triple robustness – in the sense that bias can be expressed as a product of unit imbalance, time imbalance and factor-model error – operates inside this factor framework. If outcomes in fact do not exhibit a strong low-rank structure, the factor-model error component of the product-of-errors bound is large, so even small unit and time imbalance can generate substantial bias.
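A rough, informal way to gauge whether a low-rank description is plausible is to look at how much variation a few singular vectors capture once additive unit and time effects have been removed. The sketch below is a heuristic under stated assumptions, not a formal rank test and not part of the TROP procedure itself: it should be applied to untreated observations only (for example donor units, or the pre-treatment block), and the function names and simulated panels are hypothetical.

```python
import numpy as np

def double_demean(Y):
    """Remove additive unit (row) and time (column) effects."""
    return (
        Y
        - Y.mean(axis=1, keepdims=True)
        - Y.mean(axis=0, keepdims=True)
        + Y.mean()
    )

def variance_share_by_rank(Y):
    """Cumulative share of residual variation captured by the top singular values."""
    s = np.linalg.svd(double_demean(Y), compute_uv=False)
    return np.cumsum(s ** 2) / np.sum(s ** 2)

# Simulated untreated panels (hypothetical values).
rng = np.random.default_rng(2)
N, T, R = 30, 40, 2

# Panel with two latent factors plus small idiosyncratic noise.
loadings = rng.normal(size=(N, R))
factors = rng.normal(size=(T, R))
Y_factor = (
    rng.normal(size=(N, 1))               # unit effects
    + rng.normal(size=(1, T))             # time effects
    + loadings @ factors.T
    + rng.normal(scale=0.1, size=(N, T))
)

# Panel with no factor structure beyond the additive effects.
Y_noise = rng.normal(size=(N, 1)) + rng.normal(size=(1, T)) + rng.normal(size=(N, T))

share_factor = variance_share_by_rank(Y_factor)
share_noise = variance_share_by_rank(Y_noise)
print(round(float(share_factor[R - 1]), 3), round(float(share_noise[R - 1]), 3))
```

When the first few singular values account for most of the residual variation, a small number of factors is a reasonable working description. When the share rises slowly, as in the pure-noise panel, the factor-model error term is likely to be large.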

Connection to Factor and Imputation Models

The factor-model perspective from Chapter 6 provides a unifying lens for these identification assumptions. Suppose untreated potential outcomes follow a factor structure:

$$Y_{it}(0) = \alpha_i + \lambda_t + \sum_{r=1}^{R} \lambda_{ir} f_{tr} + \varepsilon_{it}$$

with a small number of common factors $f_{tr}$ and unit-specific loadings $\lambda_{ir}$. Synthetic control implicitly constrains the treated unit’s loadings to lie in the convex hull of donor loadings: $\lambda_{1r} = \sum_j w_j \lambda_{jr}$ for each factor $r$, with non-negative weights summing to one. This convexity delivers interpretability but can fail when the treated unit sits near or outside the hull.

Interactive fixed-effects and matrix-completion methods estimate loadings freely without convexity constraints. They can match pre-treatment paths very closely, but rely on regularisation and factor-rank restrictions to avoid overfitting and to extrapolate credibly. They shift the burden of identification from geometric coverage (convex hull) to structural assumptions about low rank and stability of factors.

Hybrid estimators sit between these poles. ASCM keeps the convex-hull logic but adds a regression adjustment that can soak up residual factor-structure differences. Ridge and balancing SC adjust the geometry of the hull by regularising weights or enforcing covariate balance. SDID adds an additive structure that reweights units and periods while remaining within an extended fixed-effects framework. TROP combines unit weights, time weights and a factor model, letting the data decide which component carries most of the explanatory power. Athey et al. [2021] and Arkhangelsky and Imbens [2024] formalise this view by treating these procedures as versions of imputation with different constraints on how the counterfactual surface may vary across units and time.

Practical Guidance

From an identification standpoint, hybrid methods are most useful when neither pure synthetic control nor pure DiD gives a comfortable answer on its own. If the treated unit lies well inside the donor convex hull and unregularised synthetic control achieves excellent pre-treatment fit, the extra structure of ASCM, SDID or TROP is unlikely to change the substantive conclusions and may add unnecessary complexity. If unconditional parallel trends between treated and control groups is plausible and panel structure is simple, classical DiD or event-study designs may suffice.

Hybrids earn their keep in the intermediate regions. When the treated unit is near but not squarely inside the donor hull, or when treated and control groups violate unweighted parallel trends but can be brought into alignment by reweighting, hybrids allow you to trade additional modelling structure for better pre-treatment balance and more credible counterfactuals. The price is stronger assumptions about how that extra structure behaves out of sample.

The right way to use these methods is cumulative. Start with the simplest design whose identification assumptions you can defend. Use hybrid estimators to see whether conclusions are sensitive to relaxing convexity or unweighted parallel trends, and to check whether richer factor structures improve pre-treatment fit in a way that aligns with your marketing context. When hybrids and simpler methods agree, you gain confidence that the underlying causal story is robust. When they disagree, the identification assumptions in this section tell you exactly which features of the data-generating process you need to scrutinise to explain why.

References

Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 7.6.