MMM 403: Identification with Staggered Timing

Why identification deserves its own section

Knowing what you want to estimate ($\tau(g,t)$) is different from knowing when you can trust the estimate. Section 4.3 articulates the four assumptions that make cohort-time effects identifiable in staggered adoption designs, explains what each requires, and describes the marketing contexts where each is most likely to fail.

Two types of control units

Staggered designs offer two pools of controls:

Never-treated units ($G_i = \infty$): remain untreated throughout the panel. They provide a stable baseline but may be systematically different from treated units (e.g., persistently low-performing stores that were never selected for rollout).
Not-yet-treated units ($G_i > t$): will eventually adopt but have not done so in period $t$. They are more similar to treated units and can be used as valid controls under weaker assumptions, but may exhibit anticipation effects as their adoption date approaches.

Modern heterogeneity-robust estimators can exploit both pools. The choice between them is a substantive decision, not a technical one.
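
The two pools can be made concrete with a minimal sketch. The function name, the unit labels, and the adoption dates below are hypothetical, and never-treated units are encoded as math.inf to mirror $G_i = \infty$. Following the definition above, not-yet-treated units are those that eventually adopt.

```python
import math

NEVER = math.inf  # encodes G_i = infinity (never treated)

def control_pools(adoption, t):
    """Split units into the two control pools available in period t.

    adoption maps unit id -> adoption period G_i (math.inf if never treated).
    Returns (never_treated, not_yet_treated) as sorted lists of unit ids.
    """
    never_treated = sorted(u for u, g in adoption.items() if g == NEVER)
    not_yet_treated = sorted(
        u for u, g in adoption.items() if g != NEVER and g > t
    )
    return never_treated, not_yet_treated

# Hypothetical store panel: adoption periods are made up for illustration.
adoption = {"A": 2, "B": 4, "C": 6, "D": NEVER}
never, not_yet = control_pools(adoption, t=3)
print(never)    # ['D']
print(not_yet)  # ['B', 'C']
```

Note that the not-yet-treated pool shrinks as $t$ grows, while the never-treated pool is fixed for the whole panel.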

Assumption 1: Parallel Trends

The staggered parallel trends assumption requires that cohorts adopting at different times would have followed similar period-to-period trajectories in the absence of treatment. Formally, for any two cohorts $g < g'$ and every period $t < g'$, that is, for as long as the later cohort is still untreated:

$$ E[Y_{it}(\infty) - Y_{i,t-1}(\infty) \mid G_i = g] = E[Y_{it}(\infty) - Y_{i,t-1}(\infty) \mid G_i = g']. $$

This is the staggered counterpart of the canonical parallel trends assumption. It requires equal changes, not equal levels or growth rates. Crucially, it must hold across all cohort pairs, not just between treated and never-treated units. If early adopters (cohort $g=2$) would have grown faster than late adopters (cohort $g=6$) absent treatment, using $g=6$ as a control for $g=2$ introduces bias.
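
A rough way to look for such differences is to compare average period-to-period changes across cohorts in periods where both are still untreated. The sketch below is a descriptive check only, not a test of the assumption. The helper name and the toy panel are hypothetical.

```python
from statistics import mean

def mean_changes_by_cohort(panel, adoption):
    """Average change Y_t - Y_{t-1} per cohort, using only periods
    before that cohort's own adoption date.

    panel maps unit -> list of outcomes indexed by period 0..T-1.
    adoption maps unit -> adoption period G_i.
    Returns {cohort g: {period t: mean change}}.
    """
    out = {}
    for g in sorted(set(adoption.values())):
        units = [u for u, gu in adoption.items() if gu == g]
        n_periods = len(panel[units[0]])
        out[g] = {
            t: mean(panel[u][t] - panel[u][t - 1] for u in units)
            for t in range(1, min(g, n_periods))
        }
    return out

# Toy data (made up): cohort 2 grows by 2 per period before adoption,
# cohort 4 grows by 1 per period before adoption.
panel = {
    "a": [10, 12, 15, 18, 21],
    "b": [20, 22, 25, 28, 31],
    "c": [5, 6, 7, 8, 11],
    "d": [7, 8, 9, 10, 13],
}
adoption = {"a": 2, "b": 2, "c": 4, "d": 4}

changes = mean_changes_by_cohort(panel, adoption)
common = sorted(set(changes[2]) & set(changes[4]))
gaps = {t: changes[2][t] - changes[4][t] for t in common}
print(gaps)  # {1: 1}
```

A non-zero gap in the shared pre-periods suggests the early cohort was already growing faster, which is the situation in which the later cohort is a poor control.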

A stronger version requires that treated cohorts and never-treated units share parallel untreated trends:

$$ E[Y_{it}(\infty) - Y_{i,t-1}(\infty) \mid G_i = g] = E[Y_{it}(\infty) - Y_{i,t-1}(\infty) \mid G_i = \infty]. $$

This Strong Parallel Trends assumption (Assumption 11 in the book) is sufficient when identification relies exclusively on never-treated controls. In marketing settings it is often implausible: never-treated stores may be in declining markets or face different competitive pressures. Using not-yet-treated controls relaxes this to the weaker condition that cohorts share common untreated trends only up to the point the later cohort adopts.
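
To illustrate how the choice of control pool can matter, here is a minimal sketch of an unadjusted 2x2 difference-in-differences for one cohort-time cell, with base period $g-1$. It is not the estimator from the book or from any package. The function name, the pool labels, and the toy data are assumptions made for illustration.

```python
import math
from statistics import mean

def att_gt(panel, adoption, g, t, pool="not_yet"):
    """Unadjusted 2x2 DiD for cohort g in period t >= g, base period g - 1.

    pool="never":   controls are units with G_i = inf.
    pool="not_yet": controls are eventual adopters with G_i > t.
    """
    if pool == "never":
        controls = [u for u, gu in adoption.items() if gu == math.inf]
    else:
        controls = [u for u, gu in adoption.items()
                    if gu != math.inf and gu > t]
    treated = [u for u, gu in adoption.items() if gu == g]
    base = g - 1

    def avg_change(units):
        return mean(panel[u][t] - panel[u][base] for u in units)

    return avg_change(treated) - avg_change(controls)

# Toy data (made up): adopters trend up by 2 per period and gain 3 at
# adoption; the never-treated unit trends up by only 1 per period.
panel = {
    "a": [10, 12, 17, 19, 21, 23],
    "b": [20, 22, 27, 29, 31, 33],
    "c": [5, 7, 9, 11, 16, 18],
    "d": [7, 9, 11, 13, 18, 20],
    "e": [3, 4, 5, 6, 7, 8],
}
adoption = {"a": 2, "b": 2, "c": 4, "d": 4, "e": math.inf}

print(att_gt(panel, adoption, g=2, t=3, pool="not_yet"))  # 3
print(att_gt(panel, adoption, g=2, t=3, pool="never"))    # 5
```

In this constructed example the not-yet-treated comparison recovers the built-in effect of 3, while the never-treated comparison overstates it because the never-treated unit is on a slower trend.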

Conditional parallel trends relaxes the assumption further: untreated changes need to be equal only among units with the same covariates $X_{it}$. It does not require adoption timing to be independent of potential outcomes.
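
In the notation of the unconditional assumption, one way to write the conditional version is the following. The exact conditioning set (time-varying $X_{it}$ versus pre-treatment covariates only) is an assumption here and should be checked against the book's definition:

$$ E[Y_{it}(\infty) - Y_{i,t-1}(\infty) \mid G_i = g, X_{it}] = E[Y_{it}(\infty) - Y_{i,t-1}(\infty) \mid G_i = g', X_{it}]. $$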

Assumption 2: Overlap and support

Overlap requires that comparison units exist and are comparable to treated units on observables. With never-treated controls, this means never-treated units must be present and must resemble the treated cohorts. When every unit eventually adopts (no never-treated units), identification must rely entirely on not-yet-treated controls, which requires sufficient variation in adoption timing.

Practical diagnostics:

Standardised mean differences (SMDs): values around 0.1–0.2 warrant attention; values above 0.25 suggest serious imbalance likely to threaten parallel trends.
Propensity score distributions $e(X_{it})$ or $e(X_{it}, \alpha_i, \lambda_t)$: if treated and never-treated units cluster in different regions of covariate space, comparisons are extrapolations rather than interpolations, and parallel trends is doing heavy lifting.
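
The SMD diagnostic can be sketched as follows, assuming the common definition that divides the difference in means by the square root of the average of the two sample variances. The flagging rule treats anything from 0.1 up to 0.25 as "warrants attention", which is an interpolation of the thresholds above. The covariate values are made up.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """SMD = (mean_T - mean_C) / sqrt((var_T + var_C) / 2),
    using sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

def flag(smd):
    """Map an SMD to the rough bands described in the text (assumed cutoffs)."""
    size = abs(smd)
    if size > 0.25:
        return "serious imbalance"
    if size >= 0.1:
        return "warrants attention"
    return "balanced"

# Hypothetical pre-treatment weekly sales for treated and never-treated stores.
treated_sales = [100, 110, 120, 130]
control_sales = [90, 100, 110, 120]

smd = standardized_mean_difference(treated_sales, control_sales)
print(round(smd, 3), flag(smd))  # 0.775 serious imbalance
```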

When treated units are selected on performance (e.g., the programme rolls out first to top-performing stores), never-treated units will be systematically different. Conditional parallel trends or factor models (Chapters 8–9) may be required.

Assumption 3: No anticipation

No Anticipation (Assumption 12) asserts that pre-treatment potential outcomes are unaffected by future treatment assignment:

$$ Y_{it}(g) = Y_{it}(\infty) \quad \text{for all } g > t. $$

In marketing, anticipation arises when customers or stores learn about an impending programme and adjust behaviour in advance — for example, delaying purchases to qualify for loyalty rewards. This contaminates pre-treatment outcomes and biases event-time estimates.

Several nuances matter:

Anticipation is not binary. It can be partial and heterogeneous: informed insiders may anticipate more than uninformed customers.
Some outcomes are more susceptible: purchases can be delayed; brand awareness cannot be “saved up.”
Anticipation may vary with the time horizon: units may not anticipate treatment six months ahead but may anticipate it one month ahead.

Diagnostic: event-study specifications with pre-treatment leads (Section 4.6). If leads are systematically non-zero, this is evidence of anticipation or differential pre-trends — the two are observationally equivalent. Only institutional knowledge distinguishes them. If units could not have known about impending treatment, non-zero leads indicate pre-trend violations rather than anticipation.
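
The idea behind the lead diagnostic can be sketched with placebo 2x2 comparisons: for each pre-period, compare the treated cohort's change relative to period $g-1$ with the controls' change. This is a simplified stand-in, not the Section 4.6 specification. The function name and the toy panel are hypothetical.

```python
from statistics import mean

def lead_estimates(panel, adoption, g, controls):
    """Placebo DiD for each pre-period t < g - 1, relative to base g - 1.

    Returns {event time t - g: estimate}. Values near zero are what
    parallel trends plus no anticipation would predict.
    """
    treated = [u for u, gu in adoption.items() if gu == g]
    base = g - 1

    def avg_change(units, t):
        return mean(panel[u][t] - panel[u][base] for u in units)

    return {
        t - g: avg_change(treated, t) - avg_change(controls, t)
        for t in range(base)
    }

# Toy data (made up): the treated cohort jumps by an extra 2 units in
# period 3, one period before its adoption date g = 4.
panel = {
    "c": [5, 6, 7, 10, 15, 16],
    "d": [7, 8, 9, 12, 17, 18],
    "e": [3, 4, 5, 6, 7, 8],
    "f": [4, 5, 6, 7, 8, 9],
}
adoption = {"c": 4, "d": 4}

leads = lead_estimates(panel, adoption, g=4, controls=["e", "f"])
print(leads)  # {-4: -2, -3: -2, -2: -2}
```

Here every lead is -2 because the base period itself is elevated. The data alone cannot say whether that reflects anticipation in period $g-1$ or a pre-trend deviation.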

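Practical remedy: the did R package accepts an anticipation argument (for example, anticipation = 1 in att_gt) that relaxes pure no-anticipation. Each cohort's reference period moves back from $g-1$ to $g-2$, and units about to be treated within the anticipation window are dropped from the not-yet-treated comparison group, so identification is maintained under limited anticipation.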
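
The relaxed assumption can be written with an anticipation horizon $\delta \ge 0$. The symbol $\delta$ is introduced here for illustration and does not appear elsewhere in this section:

$$ Y_{it}(g) = Y_{it}(\infty) \quad \text{for all } t < g - \delta. $$

Setting $\delta = 0$ recovers the No Anticipation assumption stated above. Setting $\delta = 1$ allows outcomes in period $g-1$ to respond to the upcoming treatment, which is why that period can no longer serve as the untreated baseline.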

Assumption 4: SUTVA and spillovers

The stable unit treatment value assumption (SUTVA) requires that potential outcomes for unit $i$ do not depend on the treatment assignments of other units. SUTVA is routinely violated in marketing:

A loyalty programme at one store may generate word-of-mouth that influences buying at nearby stores (positive spillover → biases DiD toward zero).
Advertising in one market may spill over to adjacent markets via media overlap.
A pricing change may trigger competitive responses that alter outcomes for rival firms (negative/competitive spillover → biases DiD away from zero).

When spillovers contaminate control units, not-yet-treated and never-treated units no longer provide valid counterfactuals. Both positive and negative spillovers may operate simultaneously, making the direction of bias difficult to determine a priori.

Design-based solutions (covered in Chapter 3):

Define clusters that internalise spillovers, treating both direct and spillover effects jointly.
Create buffer zones that separate treated and control units geographically or along other dimensions, allowing spillovers to dissipate before reaching controls.
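
A buffer zone can be sketched as a distance filter on candidate controls. The example assumes planar coordinates and Euclidean distance, which is a simplification. The function name, store labels, coordinates, and radius are all hypothetical.

```python
from math import hypot

def controls_outside_buffer(locations, treated, candidates, radius):
    """Keep candidate controls farther than `radius` from every treated unit.

    locations maps unit -> (x, y) in the same distance units as `radius`.
    """
    kept = []
    for c in candidates:
        cx, cy = locations[c]
        if all(
            hypot(cx - locations[t][0], cy - locations[t][1]) > radius
            for t in treated
        ):
            kept.append(c)
    return kept

# Hypothetical store coordinates in kilometres.
locations = {"T1": (0, 0), "C1": (1, 0), "C2": (5, 0), "C3": (0, 12)}
kept = controls_outside_buffer(
    locations, treated=["T1"], candidates=["C1", "C2", "C3"], radius=3
)
print(kept)  # ['C2', 'C3']
```

The radius is a design choice: a wider buffer reduces contamination but discards nearby controls, which are often the most comparable ones.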

Model-based solution: when spillovers are the research question, explicit spillover models (Chapter 11) estimate direct and indirect effects separately using an exposure mapping $h_i(D_{-i,t})$ that describes how neighbours’ treatments affect unit $i$’s outcome. This requires knowledge of the network or geographic adjacency structure.
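
One simple exposure mapping is the share of a unit's neighbours that are treated in a given period. The sketch below assumes this particular form of $h_i(D_{-i,t})$, which is only one of many possibilities. The adjacency structure and treatment set are made up.

```python
def treated_neighbour_share(unit, neighbours, treated_in_t):
    """h_i(D_{-i,t}) as the fraction of unit i's neighbours treated in period t.

    neighbours maps unit -> list of adjacent units.
    treated_in_t is the set of units treated in period t.
    Units with no neighbours get exposure 0.0.
    """
    nbrs = neighbours.get(unit, [])
    if not nbrs:
        return 0.0
    return sum(1 for j in nbrs if j in treated_in_t) / len(nbrs)

# Hypothetical adjacency between four stores; only s2 is treated.
neighbours = {
    "s1": ["s2", "s3"],
    "s2": ["s1"],
    "s3": ["s1", "s2", "s4"],
    "s4": [],
}
treated_in_t = {"s2"}

exposure = {
    u: treated_neighbour_share(u, neighbours, treated_in_t)
    for u in neighbours
}
print(exposure["s1"])  # 0.5
```

Under this mapping, s1 and s3 are untreated but exposed, so using them as clean controls would mix the direct effect with the spillover.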

Factor structure relaxations

When standard parallel trends is implausible but units are subject to common time-varying shocks with differential exposure, interactive fixed effects (IFE) models provide an alternative. The model posits:

$$ Y_{it}(\infty) = \alpha_i + \lambda_t + \sum_{r=1}^{R} \lambda_{ir} f_{tr} + \varepsilon_{it}, $$

where $f_{tr}$ are latent factors common to all units in period $t$ and $\lambda_{ir}$ are unit-specific loadings capturing differential exposure. This low-rank representation accommodates differential trends driven by macroeconomic conditions, industry demand shifts, or platform algorithm changes — without requiring that period-to-period changes are identical across units.
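
To see why this model departs from parallel trends, take first differences of the equation above. The sketch assumes $E[\varepsilon_{it} \mid G_i] = 0$ and treats the factors as fixed quantities. Both are simplifying assumptions made here, not statements from the source:

$$ E[Y_{it}(\infty) - Y_{i,t-1}(\infty) \mid G_i = g] = (\lambda_t - \lambda_{t-1}) + \sum_{r=1}^{R} E[\lambda_{ir} \mid G_i = g]\,(f_{tr} - f_{t-1,r}). $$

The gap in expected untreated changes between cohorts $g$ and $g'$ is therefore

$$ \sum_{r=1}^{R} \big( E[\lambda_{ir} \mid G_i = g] - E[\lambda_{ir} \mid G_i = g'] \big)\,(f_{tr} - f_{t-1,r}), $$

which is zero only if the cohorts have the same average loadings or the factors do not change between $t-1$ and $t$. Parallel trends is thus the special case in which differential exposure averages out across cohorts.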

Identification relies on a low-rank assumption: $R$ is small relative to $\min(N, T)$, so factors and loadings can be estimated from untreated observations and used to impute counterfactual trajectories for treated units. These methods trade the untestable parallel-trends restriction for the untestable low-rank restriction; neither dominates universally, and the choice should be guided by prior knowledge about the data-generating process.

Summary of identification assumptions

Assumption	What it requires	Key marketing failure mode
Parallel Trends	Equal untreated period-to-period changes across cohorts	Programme rolled out to systematically high/low-performing units
Overlap	Comparable pre-treatment covariates across treated/control	All units eventually treated; treated units selected on performance
No Anticipation	Pre-treatment outcomes unaffected by future treatment	Customers delay purchases ahead of loyalty programme launch
SUTVA	Potential outcomes independent across units	Word-of-mouth, competitive spillovers, media overlap

Takeaway

The four assumptions are jointly sufficient for identifying cohort-time effects $\tau(g,t)$ in staggered designs. None can be verified from data alone: overlap can be checked on observed covariates and pre-treatment leads are informative, but parallel trends, no anticipation, and SUTVA all restrict unobserved potential outcomes. Evaluating them requires institutional knowledge of the study design, the selection mechanism, and the competitive environment. When any assumption is implausible, the appropriate response is to adjust the design (buffer zones, cluster definitions), condition on covariates (conditional parallel trends), use factor structure relaxations (IFE), or model spillovers explicitly. Ignoring the violation is not an acceptable response.

References

Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 4.3.
Callaway, B., and Sant’Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics.
Rambachan, A., and Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies.