Why phased rollouts are attractive

Phased rollouts assign treatment to batches of units over time: a loyalty program starts with one cohort of stores, expands to more stores later, and leaves some units untreated (temporarily or permanently). This is operationally convenient and scientifically useful: it spreads implementation costs, allows learning from early cohorts, and creates staggered timing that supports causal inference.

The design caveat is critical: if rollout timing is adjusted in response to early outcomes, the assignment mechanism becomes outcome-driven and the parallel-trends assumption across cohorts is likely to fail. To preserve identification, fix the schedule in advance, or alter it only for operational reasons unrelated to outcomes.
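One way to make a fixed schedule auditable is to draw it once from a recorded random seed before any outcome data exist. The sketch below is an illustration under that assumption; the function name `assign_cohorts`, the store IDs, the adoption periods, and the seed are all hypothetical, and the source only requires that the schedule not respond to outcomes, not that it be randomized.

```python
import random

def assign_cohorts(unit_ids, adoption_periods, seed, n_never_treated=0):
    """Assign units to adoption periods from a pre-registered seed.

    Returns a dict unit_id -> adoption period (None means never treated).
    """
    units = sorted(unit_ids)      # canonical order: result depends only on the seed
    rng = random.Random(seed)     # local generator, leaves global random state alone
    rng.shuffle(units)

    # The first n_never_treated units after shuffling are held out permanently.
    schedule = {u: None for u in units[:n_never_treated]}

    # Deal the rest round-robin so cohort sizes differ by at most one.
    for idx, u in enumerate(units[n_never_treated:]):
        schedule[u] = adoption_periods[idx % len(adoption_periods)]
    return schedule

# Hypothetical example: 12 stores, three waves, three permanent holdouts.
stores = [f"store_{i:02d}" for i in range(12)]
schedule = assign_cohorts(stores, adoption_periods=[3, 5, 7],
                          seed=20250101, n_never_treated=3)
```

Re-running the function with the same seed reproduces the schedule, so any later deviation from it is visible and can be documented with its operational reason.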

Cohort mapping and not-yet-treated controls

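The key object is the adoption time $G_i$, the first period in which unit $i$ is treated, with $G_i=\infty$ for never-treated units. Units with $G_i=g$ form cohort $g$; in period $t$, not-yet-treated units ($G_i>t$) and never-treated units serve as controls for that cohort.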

Identification comes from clean comparisons of treated units to units that have not yet adopted in the same period. This logic requires:

  • Parallel trends across cohorts absent treatment.
  • No anticipation (later cohorts are unaffected before adoption).
  • Absorbing treatment so adoption time is well-defined.
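The control-group logic above can be sketched as a small lookup. This is an illustrative helper, not part of any library: the name `controls_for`, the `NEVER` constant, and the toy adoption map are assumptions made for the example.

```python
import math

NEVER = math.inf  # assumed convention: never-treated units have G_i = infinity

def controls_for(adoption, g, t, include_never_treated=True):
    """Units usable as controls for cohort g in a post-adoption period t >= g.

    adoption maps unit -> adoption period G_i. A unit qualifies if it has not
    adopted by period t (G_i > t) or, optionally, if it is never treated.
    """
    if t < g:
        raise ValueError("this sketch only covers post-adoption periods t >= g")
    controls = []
    for unit, G in adoption.items():
        if G == NEVER:
            if include_never_treated:
                controls.append(unit)
        elif G > t:
            controls.append(unit)
    return sorted(controls)

# Hypothetical rollout: two stores adopt in period 3, one in 5, one in 7.
adoption = {"a": 3, "b": 3, "c": 5, "d": 7, "e": NEVER}

early_controls = controls_for(adoption, g=3, t=4)   # c, d, e
later_controls = controls_for(adoption, g=3, t=5)   # d, e (c has adopted)
no_controls = controls_for(adoption, g=5, t=7, include_never_treated=False)  # empty
```

The last call shows why the control pool matters: without never-treated units, no comparison group remains once the final cohort adopts.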

Modern DiD estimators formalize this by estimating cohort–time effects:

$$ \tau(g,t)=\mathbb{E}[Y_{it}(g)-Y_{it}(\infty)\mid G_i=g],\quad t\ge g. $$

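Here $Y_{it}(g)$ is the potential outcome of unit $i$ in period $t$ if it first adopts at time $g$, and $Y_{it}(\infty)$ is its outcome if it never adopts. These are the primitive estimands. Everything else is aggregation.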
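To make the estimand concrete, the following sketch computes a simple unconditional difference-in-differences for one cohort–time cell, using period $g-1$ as the baseline and units with $G_i>t$ as controls. It is a simplified illustration in the spirit of the not-yet-treated comparison, not the full Callaway–Sant’Anna estimator; the function name `att_gt` and the toy data (unit levels, a common trend of 2 per period, constant cohort effects) are assumptions.

```python
import math
from statistics import mean

def att_gt(y, adoption, g, t):
    """Simple DiD estimate of tau(g, t) for t >= g.

    y maps unit -> {period: outcome}; adoption maps unit -> G_i
    (math.inf for never-treated units). Baseline period is g - 1.
    """
    if t < g:
        raise ValueError("tau(g, t) is defined here only for t >= g")
    base = g - 1
    treated = [u for u, G in adoption.items() if G == g]
    controls = [u for u, G in adoption.items() if G > t]  # not yet or never treated
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    change_treated = mean(y[u][t] - y[u][base] for u in treated)
    change_control = mean(y[u][t] - y[u][base] for u in controls)
    return change_treated - change_control

# Hypothetical panel: periods 1..4, parallel trends hold by construction.
adoption = {"a": 2, "b": 2, "c": 4, "d": math.inf}
unit_level = {"a": 10.0, "b": 12.0, "c": 8.0, "d": 11.0}
effect = {2: 5.0, 4: 3.0}  # assumed constant effect for each cohort

y = {
    u: {t: unit_level[u] + 2.0 * t + (effect[G] if t >= G else 0.0)
        for t in range(1, 5)}
    for u, G in adoption.items()
}

tau_2_3 = att_gt(y, adoption, g=2, t=3)  # recovers cohort 2's effect of 5.0
tau_4_4 = att_gt(y, adoption, g=4, t=4)  # recovers cohort 4's effect of 3.0
```

Because the toy data satisfy parallel trends and no anticipation exactly, each cell recovers the effect that was built in; with real data the same difference is only as credible as those assumptions.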

Aggregation choices: calendar time vs event time

Phased rollouts allow two distinct aggregation strategies.

Calendar-time aggregation answers: what is the average effect in period $t$? It pools the units already treated in the same calendar period and yields a sequence $\{ATT_t\}$ that can be read directly against seasonality and macro conditions in each period.
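One way to write this aggregation, assuming cohort-size weights (the symbols $N_g$ and $w_{gt}$ are notation introduced here for illustration), is

$$ ATT_t=\sum_{g\le t} w_{gt}\,\tau(g,t),\qquad w_{gt}=\frac{N_g}{\sum_{g'\le t}N_{g'}}, $$

where $N_g$ is the number of units in cohort $g$. Only cohorts that have adopted by period $t$ contribute, so the composition of $ATT_t$ changes as the rollout proceeds.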

Event-time aggregation answers: how does the effect evolve with time since adoption? It uses $k=t-G_i$ and aggregates across cohorts at the same event time:

$$ \theta_k = \sum_{g:g+k\le T} w_{gk}\,\tau(g,g+k). $$

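This yields a dynamic profile $\{\theta_k\}$ and is the basis of event-study plots. The weights $w_{gk}$ are typically nonnegative and sum to one across the cohorts observed at event time $k$; cohort-size shares are a common choice.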
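The event-time sum can be sketched directly from a table of cohort–time effects. The example assumes cohort-size weights and uses made-up values for $\tau(g,t)$ and cohort sizes; the function name `event_time_profile` is hypothetical.

```python
def event_time_profile(tau, cohort_sizes, T):
    """Aggregate cohort-time effects tau[(g, t)] into theta_k, k = t - g.

    Weights are cohort-size shares among cohorts observed at event time k,
    i.e. cohorts with g + k <= T (an assumed weighting choice).
    """
    profile = {}
    event_times = sorted({t - g for (g, t) in tau if t >= g})
    for k in event_times:
        cells = [(g, g + k) for g in cohort_sizes
                 if g + k <= T and (g, g + k) in tau]
        total = sum(cohort_sizes[g] for g, _ in cells)
        profile[k] = sum(cohort_sizes[g] / total * tau[(g, t)] for g, t in cells)
    return profile

# Hypothetical inputs: cohort 2 has 10 units, cohort 4 has 30, last period T = 4.
tau = {(2, 2): 1.0, (2, 3): 2.0, (2, 4): 3.0, (4, 4): 2.0}
cohort_sizes = {2: 10, 4: 30}

theta = event_time_profile(tau, cohort_sizes, T=4)
# theta[0] averages both cohorts (0.25 * 1.0 + 0.75 * 2.0 = 1.75);
# theta[1] and theta[2] use cohort 2 only, because cohort 4 is not observed that long.
```

The example also shows a caveat worth planning for: the set of contributing cohorts shrinks as $k$ grows, so changes in $\theta_k$ can reflect cohort composition as well as dynamics.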

The choice is substantive, not cosmetic. Calendar time is natural for contemporaneous impact. Event time is natural for dynamics, learning, and carryover.

Why TWFE can mislead

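Traditional two-way fixed effects (TWFE) regressions implicitly average cohort–time effects with weights that can be negative when effects are heterogeneous. The negative weights arise because already-treated units implicitly serve as controls for later adopters, so their evolving treatment effects are differenced out as if they were trends. This can distort both signs and magnitudes. Modern estimators (Callaway–Sant’Anna, Sun–Abraham) construct cleaner comparisons that exclude already-treated units from the control group, using not-yet-treated, last-treated, or never-treated units depending on the estimator, and yield transparent weights.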
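A small worked example makes the negative weights visible. In a balanced panel where outcomes equal unit effects plus period effects plus cell-specific treatment effects, the TWFE coefficient on the treatment dummy is a weighted sum of the treated cells' effects, with weights proportional to the double-demeaned treatment indicator. The sketch below computes those weights for a hypothetical two-unit, three-period rollout; the function name and the effect sizes are assumptions chosen for illustration.

```python
from fractions import Fraction

def twfe_weights(D):
    """Weights that TWFE places on each treated cell in a balanced panel.

    D maps unit -> list of 0/1 treatment indicators, one per period.
    The weight of a treated cell is its double-demeaned treatment value
    divided by the sum of those values over all treated cells.
    """
    units = list(D)
    n, T = len(units), len(next(iter(D.values())))
    unit_mean = {u: Fraction(sum(D[u]), T) for u in units}
    time_mean = [Fraction(sum(D[u][t] for u in units), n) for t in range(T)]
    grand_mean = Fraction(sum(sum(D[u]) for u in units), n * T)

    resid = {(u, t): D[u][t] - unit_mean[u] - time_mean[t] + grand_mean
             for u in units for t in range(T)}
    denom = sum(r for (u, t), r in resid.items() if D[u][t] == 1)
    return {(u, t): r / denom for (u, t), r in resid.items() if D[u][t] == 1}

# Hypothetical staggered design: periods 0, 1, 2; no never-treated unit.
D = {"early": [0, 1, 1], "late": [0, 0, 1]}
weights = twfe_weights(D)
# weights: ("early", 1) -> 1, ("early", 2) -> -1/2, ("late", 2) -> 1/2

# Assumed effects: all positive, but the early cohort's effect grows over time.
effects = {("early", 1): 1, ("early", 2): 4, ("late", 2): 1}
twfe_coefficient = sum(weights[c] * effects[c] for c in weights)  # -1/2
```

Every cell-level effect in this example is positive, yet the implied TWFE coefficient is negative, because the early adopter's later period receives weight $-1/2$.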

Practical implication: plan for heterogeneity-robust estimators at design time, not after you see the data.

Planning for heterogeneity

Rollouts almost always induce heterogeneity:

  • Early adopters differ from late adopters.
  • Effects evolve with tenure and network dynamics.
  • Treatment response varies by market characteristics.

Design choices that help:

  • Ensure not-yet-treated or never-treated controls exist throughout the window.
  • Balance cohort sizes so no single cohort dominates weights.
  • Collect covariates that enable subgroup analyses.
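The first two design choices can be checked mechanically before launch. The following is a minimal audit sketch, assuming adoption times are stored in a dict and never-treated units are coded as `math.inf`; the function name `audit_design` and the example schedule are hypothetical.

```python
import math
from collections import Counter

def audit_design(adoption, periods):
    """Summarize control availability and cohort balance for a planned rollout.

    adoption maps unit -> adoption period (math.inf for never-treated units).
    """
    cohort_sizes = Counter(G for G in adoption.values() if G != math.inf)
    n_ever_treated = sum(cohort_sizes.values())

    # Units still untreated in each period (not yet treated or never treated).
    controls_by_period = {
        t: sum(1 for G in adoption.values() if G > t) for t in periods
    }
    return {
        "cohort_sizes": dict(cohort_sizes),
        "largest_cohort_share": max(cohort_sizes.values()) / n_ever_treated,
        "controls_by_period": controls_by_period,
        "periods_without_controls": [t for t, n in controls_by_period.items()
                                     if n == 0],
    }

# Hypothetical plan with no permanent holdout, observed over periods 1..8.
plan = {"a": 3, "b": 3, "c": 5, "d": 7}
report = audit_design(plan, periods=range(1, 9))
# report["periods_without_controls"] lists periods 7 and 8, where every unit is treated.
```

In this plan, no effect can be estimated from period 7 onward without adding a never-treated group or a later cohort, which is the kind of gap that is cheap to fix at design time.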

Trade-off: never-treated vs not-yet-treated controls

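Never-treated controls strengthen identification but may be ethically or operationally costly. If permanent withholding is infeasible, you must rely on not-yet-treated controls. That requires sufficiently late adopters and a design that preserves parallel trends across cohorts, and it means effects are identified only for periods before the last cohort adopts, because no untreated comparison group remains afterward.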

This choice should be made ex ante because it determines the estimand and the identification strategy.

Pre-specify what you will report

Phased rollouts create many valid estimands. To avoid cherry-picking:

  • Pre-specify which aggregations will be reported (overall ATT, cohort effects, event-time profiles).
  • If subgroup effects are important, define them in advance.
  • Plan inference accordingly (joint confidence bands, multiple testing adjustments).
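As one example of a multiple-testing adjustment that can be written into the analysis plan, the sketch below applies the Holm step-down correction to a pre-specified family of tests. Holm is only one possible choice (the source does not name a specific procedure), and the p-values shown are invented for illustration.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values for a pre-specified family of tests.

    pvalues maps a label (e.g. an event time) to its unadjusted p-value.
    """
    m = len(pvalues)
    ordered = sorted(pvalues.items(), key=lambda item: item[1])
    adjusted = {}
    running_max = 0.0
    for rank, (label, p) in enumerate(ordered):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p)
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[label] = running_max
    return adjusted

# Hypothetical unadjusted p-values for three pre-specified event-time effects.
raw = {"k=0": 0.01, "k=1": 0.04, "k=2": 0.03}
adjusted = holm_adjust(raw)
```

Fixing the family of hypotheses in advance is what gives the adjustment meaning; adding or dropping event times after seeing the results changes the correction.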

Practical design checklist

  • Fix rollout timing in advance or document non-outcome-based adjustments.
  • Map cohorts clearly and verify not-yet-treated controls in each period.
  • Decide on calendar-time vs event-time aggregation before analysis.
  • Choose heterogeneity-robust estimators that align with your estimand.
  • Plan diagnostics for pre-trends and sensitivity analysis.

Takeaway

Phased rollouts combine operational flexibility with credible inference, but only if the assignment mechanism is insulated from early outcomes. Cohort–time effects are the primitive causal objects; aggregation choices define the estimand. When designed carefully, phased rollouts provide a powerful, realistic path to causal inference in marketing panels.

References

  • Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 3.5.
  • Callaway, B., and Sant’Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics.
  • Sun, L., and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.
  • Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics.