What is staggered adoption?
Staggered adoption means units adopt treatment at different times. Let $G_i$ be the first treated period (or $\infty$ if never treated), and assume absorbing treatment:
$$ D_{it}=\mathbf{1}\{t\ge G_i\}. $$
This pattern is common in phased rollouts, platform entry sequences, and campaign launches. One caveat follows immediately: if treatment can switch off, the absorbing assumption fails and the design requires extensions.
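As a minimal sketch of the indicator above, the snippet below builds $D_{it}$ from adoption dates and checks that each treatment path is absorbing. The unit names, adoption periods, and the helper names `treatment_indicator` and `is_absorbing` are illustrative assumptions, not part of the original text.

```python
import math

def treatment_indicator(t, g):
    """Return D_it = 1{t >= G_i}; use g = math.inf for never-treated units."""
    return int(t >= g)

# Hypothetical adoption dates G_i for three units; C is never treated.
adoption = {"A": 3, "B": 5, "C": math.inf}
periods = range(1, 7)  # calendar periods 1..6

# Treatment path of each unit across the calendar periods.
D = {
    unit: [treatment_indicator(t, g) for t in periods]
    for unit, g in adoption.items()
}

def is_absorbing(path):
    """True if the path never switches from treated back to untreated."""
    return all(later >= earlier for earlier, later in zip(path, path[1:]))

for unit, path in D.items():
    print(unit, path, is_absorbing(path))
```

A path such as `[0, 1, 0]` would fail `is_absorbing`, which is the switch-off case that calls for extensions.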
The core estimand: cohort–time effects
The natural target is the cohort–time effect:
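$$ \tau(g,t)=\mathbb{E}[Y_{it}(g)-Y_{it}(\infty)\mid G_i=g],\quad t\ge g. $$
Here $Y_{it}(g)$ is the potential outcome of unit $i$ in period $t$ if first treated at $g$, and $Y_{it}(\infty)$ is the outcome if never treated. So $\tau(g,t)$ is the average effect for the cohort that adopts at time $g$, evaluated in calendar time $t$. Aggregating $\tau(g,t)$ gives an overall average treatment effect on the treated (ATT) or event-time effects $\theta_k$.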
Why this matters: different estimators apply different weights to $\tau(g,t)$, so understanding the estimand is essential for interpreting results.
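To make the role of weights concrete, one way to write a generic aggregate is as a weighted sum of cohort–time effects. The weight function $w(g,t)$ and the cohort sizes $N_g$ below are notation introduced here for illustration; the cohort-size weighting is one possible choice, not the only one.
$$ \theta_w=\sum_{g}\sum_{t\ge g} w(g,t)\,\tau(g,t),\qquad w(g,t)\ge 0,\quad \sum_{g}\sum_{t\ge g} w(g,t)=1. $$
For example, weighting every treated cohort–period cell by cohort size gives
$$ w(g,t)=\frac{N_g}{\sum_{g'}\sum_{t'\ge g'} N_{g'}}, $$
which corresponds to an overall ATT across all treated unit-periods. An estimator whose implicit weights are negative for some $(g,t)$ does not fit this convex-combination form.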
Event-time effects and dynamics
Event time is $k=t-G_i$. Event-time effects average $\tau(g, g+k)$ across cohorts with data at that $k$:
$$ \theta_k=\mathbb{E}[Y_{i,G_i+k}(G_i)-Y_{i,G_i+k}(\infty)\mid G_i<\infty]. $$
These profiles show whether effects grow, fade, or reverse after adoption. Estimates at pre-treatment event times ($k<0$) should be close to zero under parallel trends and no anticipation, which makes them a primary diagnostic for that assumption.
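The averaging over cohorts can be sketched as follows, assuming cohort-size weights among the cohorts observed at each $k$. The effect values, cohort sizes, and the function name `event_time_effects` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical cohort-time effects tau[(g, t)] and cohort sizes N_g.
tau = {
    (2, 2): 1.0, (2, 3): 2.0, (2, 4): 3.0,  # cohort adopting in period 2
    (3, 3): 0.5, (3, 4): 1.5,               # cohort adopting in period 3
}
cohort_size = {2: 100, 3: 300}

def event_time_effects(tau, cohort_size):
    """Average tau(g, g + k) over the cohorts observed at each event time k,
    weighting each cohort by its size."""
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (g, t), effect in tau.items():
        k = t - g  # event time
        weighted_sum[k] += cohort_size[g] * effect
        total_weight[k] += cohort_size[g]
    return {k: weighted_sum[k] / total_weight[k] for k in sorted(weighted_sum)}

theta = event_time_effects(tau, cohort_size)
print(theta)
```

In this toy panel only the early cohort is observed at $k=2$, so $\theta_2$ reflects a different mix of cohorts than $\theta_0$ and $\theta_1$. Such composition changes are worth checking before reading the profile as pure dynamics.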
Identification: parallel trends across cohorts
Staggered adoption relies on parallel trends across adoption cohorts. In words: absent treatment, early- and late-adopting units would have evolved similarly.
Because the assumption concerns unobserved counterfactual outcomes, it cannot be tested directly, so we rely on diagnostics:
- Cohort-specific event-study plots.
- Placebo tests in pre-treatment windows.
- Sensitivity analysis when pre-trends diverge.
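A placebo check in the pre-treatment window can be sketched as a difference-in-differences between consecutive pre-periods, comparing a cohort with never-treated units. The group means and the function name `placebo_did` are hypothetical, and this is a simplified illustration rather than a full testing procedure.

```python
import math

# Hypothetical mean outcomes by (cohort, period); math.inf marks never-treated.
mean_outcome = {
    (4, 1): 10.0, (4, 2): 11.0, (4, 3): 12.0, (4, 4): 15.0,
    (math.inf, 1): 5.0, (math.inf, 2): 6.0, (math.inf, 3): 7.0, (math.inf, 4): 8.0,
}

def placebo_did(means, g, t, control=math.inf):
    """Pre-period DiD: change from t-1 to t for cohort g minus the same
    change for the control group. Only defined for t < g."""
    if t >= g:
        raise ValueError("placebo periods must precede adoption (t < g)")
    treated_change = means[(g, t)] - means[(g, t - 1)]
    control_change = means[(control, t)] - means[(control, t - 1)]
    return treated_change - control_change

# Placebo estimates for the cohort adopting in period 4.
placebos = {t: placebo_did(mean_outcome, 4, t) for t in (2, 3)}
print(placebos)
```

Placebo estimates near zero are consistent with parallel trends before adoption, but they do not prove that trends would have stayed parallel afterwards.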
If parallel trends is implausible, alternatives include factor models or synthetic control.
Why TWFE can fail
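A traditional two-way fixed effects (TWFE) regression, with unit and period fixed effects and a single treatment dummy, mixes comparisons across cohorts and times, often using already-treated units as controls. When effects differ across cohorts or change with time since adoption, some $\tau(g,t)$ can enter the coefficient with negative weights, so the estimate can be misleading in magnitude and even in sign.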
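The toy example below illustrates the problem under stated assumptions: two units, three periods, no never-treated group, and an effect for the early adopter that grows over time. All numbers are invented. The TWFE coefficient is computed by double-demeaning the treatment dummy, which for a balanced panel is equivalent to including unit and period fixed effects.

```python
def twfe_coefficient(y, d):
    """TWFE slope on d for a balanced panel, computed by double-demeaning d
    (Frisch-Waugh-Lovell): beta = sum(d_tilde * y) / sum(d_tilde ** 2)."""
    units = list(d)
    n_periods = len(d[units[0]])

    def double_demean(x):
        unit_mean = {i: sum(x[i]) / n_periods for i in units}
        time_mean = [sum(x[i][t] for i in units) / len(units)
                     for t in range(n_periods)]
        grand_mean = sum(unit_mean.values()) / len(units)
        return {i: [x[i][t] - unit_mean[i] - time_mean[t] + grand_mean
                    for t in range(n_periods)] for i in units}

    d_tilde = double_demean(d)
    cells = [(i, t) for i in units for t in range(n_periods)]
    numerator = sum(d_tilde[i][t] * y[i][t] for i, t in cells)
    denominator = sum(d_tilde[i][t] ** 2 for i, t in cells)
    return numerator / denominator

# Hypothetical panel: periods 1..3, unit E adopts in period 2, unit L in period 3.
d = {"E": [0, 1, 1], "L": [0, 0, 1]}
effect = {"E": [0.0, 1.0, 4.0], "L": [0.0, 0.0, 1.0]}  # true effects, all >= 0
unit_level = {"E": 10.0, "L": 20.0}   # unit fixed effects
period_shock = [0.0, 1.0, 2.0]        # common period effects (parallel trends)
y = {i: [unit_level[i] + period_shock[t] + effect[i][t] for t in range(3)]
     for i in d}

beta = twfe_coefficient(y, d)
treated_effects = [effect[i][t] for i in d for t in range(3) if d[i][t] == 1]
true_att = sum(treated_effects) / len(treated_effects)
print(beta, true_att)  # beta is about -0.5, true_att is 2.0
```

In this design the coefficient works out to $\tau(2,2)-0.5\,\tau(2,3)+0.5\,\tau(3,3)$. The negative weight on $\tau(2,3)$ arises because the already-treated early unit serves as the control for the late adopter in period 3, so a TWFE estimate of about $-0.5$ coexists with true effects that are all positive.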
Modern estimators (e.g., Callaway–Sant’Anna, Sun–Abraham) construct cleaner comparisons between treated units and not-yet-treated or never-treated controls.
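A simplified sketch of such a clean comparison is shown below: for cohort $g$ and period $t$, the change since the last pre-treatment period $g-1$ is compared with the same change among units not yet treated by $t$. This is written in the spirit of these estimators and is not the implementation from any package. The data and the function name `cohort_time_did` are hypothetical.

```python
import math
from statistics import mean

# Hypothetical adoption dates and outcomes; N is never treated.
adoption = {"E": 2, "L": 3, "N": math.inf}
outcome = {
    "E": {1: 10.0, 2: 12.0, 3: 16.0},
    "L": {1: 20.0, 2: 21.0, 3: 23.0},
    "N": {1: 30.0, 2: 31.0, 3: 32.0},
}

def cohort_time_did(outcome, adoption, g, t):
    """Estimate tau(g, t) as the change from period g-1 to t for cohort g,
    minus the same change for units with G_i > t (not yet or never treated).
    Returns None when no treated or no clean control units exist."""
    if t < g:
        raise ValueError("tau(g, t) is defined for t >= g")
    base = g - 1  # last pre-treatment period for cohort g
    treated = [i for i, g_i in adoption.items() if g_i == g]
    controls = [i for i, g_i in adoption.items() if g_i > t]
    if not treated or not controls:
        return None
    treated_change = mean(outcome[i][t] - outcome[i][base] for i in treated)
    control_change = mean(outcome[i][t] - outcome[i][base] for i in controls)
    return treated_change - control_change

estimates = {(g, t): cohort_time_did(outcome, adoption, g, t)
             for g, t in [(2, 2), (2, 3), (3, 3)]}
print(estimates)
```

Because already-treated units never enter the control group, each estimate targets a single $\tau(g,t)$. The choice of how to aggregate them is then an explicit, separate step.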
Takeaway
Staggered adoption is rich but delicate. The design creates natural variation for identification, but only if parallel trends is plausible and estimators respect heterogeneity. Always identify the cohort–time estimand first, then choose an estimator that targets it transparently.
References
- Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 3.2.2.
- Callaway, B., and Sant’Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics.
- Sun, L., and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.
- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics.