What event studies add to DiD

DiD estimators produce scalar summaries — $ATT_{agg}$, $\theta_k$ — but do not automatically reveal how effects evolve over time or whether treated units were already diverging before treatment. Event-study specifications do exactly that: they estimate a separate coefficient for each event time $k$ (periods since adoption), tracing the full dynamic profile, diagnosing anticipation, and checking parallel trends.

The event-study regression

The standard specification is:

$$ Y_{it} = \alpha_i + \lambda_t + \sum_{k \ne -1} \beta_k \mathbf{1}\{t - G_i = k\} + \varepsilon_{it}, $$

where $\mathbf{1}\{t - G_i = k\}$ is an indicator for event time $k$ and $k = -1$ is the omitted reference period (the quarter immediately before adoption), so $\beta_{-1} \equiv 0$ by normalisation. The sequence $\{\beta_k\}$ traces:

  • Leads ($k < -1$): pre-treatment coefficients. Under parallel trends and no anticipation these should be near zero.
  • Immediate effect ($k = 0$): the jump at adoption.
  • Lags ($k \ge 1$): post-treatment dynamics — growth, decay, or a steady state.

Normalisation caveat. If anticipation has already materialised by $k = -1$ (e.g., customers start changing behaviour after a programme announcement), the baseline is contaminated and post-treatment coefficients understate the true effect. When anticipation is suspected, normalise to an earlier period ($k = -2$ or $k = -3$) or to the average of distant pre-treatment periods.

TWFE event-study contamination

The specification above is a TWFE-style event study and inherits the negative weighting problem from Section 4.4. Under heterogeneous treatment effects, TWFE event-study coefficients $\beta_k$ are weighted sums of cohort-time effects $\tau(g,t)$ with potentially negative weights attached to already-treated comparison units. Even the shape of the estimated dynamic profile can be distorted.

Practical rule: use the TWFE event study only as a rough diagnostic. For effect magnitudes and formal inference, rely on Sun–Abraham (sunab) or Callaway–Sant’Anna event-time aggregations, which produce event-study plots that reflect true dynamics rather than heterogeneity artefacts.

Anticipation versus differential pre-trends

Non-zero pre-treatment leads ($k < -1$) can reflect two distinct problems:

| Pattern | Likely source |
| --- | --- |
| Leads trending toward the post-treatment effect | Anticipation (behaviour change in response to expected treatment) |
| Leads drifting up or down with no clear direction | Differential pre-trends (units were on different trajectories before treatment) |

Both patterns produce the same statistical signature: non-zero pre-treatment coefficients. Only institutional knowledge separates them. If units could not have known about impending treatment, non-zero leads indicate pre-trend violations rather than anticipation.

Anticipation examples in marketing:

  • Negative anticipation: customers delay purchases to qualify for a loyalty programme (negative pre-treatment coefficients growing in magnitude near launch).
  • Positive anticipation: customers accelerate purchases before an expected price increase, or firms ramp up advertising ahead of a product launch (positive pre-treatment coefficients).

Low power caveat (Roth 2022): pre-trend tests have limited power against many plausible violations. Small estimated pre-trends do not validate parallel trends. Substantive arguments about the assignment mechanism remain essential.

Dynamic effects

Post-treatment lags ($k \ge 0$) reveal the nature of the effect:

  • Constant: all $\beta_k$ roughly equal — immediate, permanent effect.
  • Growing: $\beta_k$ increases with $k$ — consistent with habit formation, learning, or network effects (e.g., loyalty reward accumulation).
  • Decaying: $\beta_k$ decreases with $k$ — consistent with a temporary promotion or a one-time advertising spike.

The sequence $\{\beta_k\}_{k=0}^K$ is a reduced-form summary of the dynamic response without imposing parametric restrictions on the lag structure.
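As a rough illustration (the function name and the `tol` cutoff are arbitrary choices, not from the text), the three shapes can be told apart by the slope of a line fitted through the post-treatment coefficients:

```python
import numpy as np

def classify_dynamics(lags, tol=0.1):
    """Label a post-treatment profile by the sign of a fitted linear slope.

    `lags` holds the estimates beta_0, beta_1, ..., beta_K in event-time order.
    """
    slope = np.polyfit(np.arange(len(lags)), lags, 1)[0]
    if slope > tol:
        return "growing"
    if slope < -tol:
        return "decaying"
    return "constant"
```

Applied to the post-treatment estimates $(5.2,\ 7.1,\ 8.3,\ 8.1)$ from the worked example that follows, this heuristic returns "growing".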

Worked example

A loyalty programme rollout yields (in £000s quarterly sales):

| Event time $k$ | $\hat{\beta}_k$ | SE | 95% CI |
| --- | --- | --- | --- |
| $-3$ | 0.8 | 1.2 | $[-1.6,\ 3.2]$ |
| $-2$ | 0.4 | 1.0 | $[-1.6,\ 2.4]$ |
| $-1$ | 0 | (reference) | |
| $0$ | 5.2 | 1.1 | $[3.0,\ 7.4]$ |
| $1$ | 7.1 | 1.3 | $[4.5,\ 9.7]$ |
| $2$ | 8.3 | 1.4 | $[5.5,\ 11.1]$ |
| $3$ | 8.1 | 1.5 | $[5.1,\ 11.1]$ |

Reading the plot:

  • Pre-treatment CIs all include zero → consistent with parallel trends.
  • Discrete jump at $k=0$ → immediate effect of ~£5,200/store/quarter.
  • Rising pattern $k=0 \to k=2$ ($5.2 \to 8.3$) → growing effects, consistent with habit formation as customers accumulate points.
  • Levelling at $k=3$ ($8.1 \approx 8.3$) → effect near steady state.
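The intervals in the table are consistent with the common $\hat{\beta}_k \pm 2 \times \mathrm{SE}$ rule of thumb, which is easy to verify:

```python
rows = {  # event time k: (beta_hat, se), from the table above
    -3: (0.8, 1.2), -2: (0.4, 1.0),
     0: (5.2, 1.1),  1: (7.1, 1.3), 2: (8.3, 1.4), 3: (8.1, 1.5),
}
# rebuild each interval as beta_hat +/- 2*SE, rounded to one decimal
cis = {k: (round(b - 2 * se, 1), round(b + 2 * se, 1)) for k, (b, se) in rows.items()}
includes_zero = {k: lo <= 0 <= hi for k, (lo, hi) in cis.items()}
# pre-treatment intervals straddle zero; post-treatment intervals do not
```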

Binning and reference window choices

When the number of event times is large relative to sample size, estimating a separate coefficient per $k$ is impractical.

Binning groups adjacent event times: e.g., $k \in \{0\}$, $k \in \{1,2\}$, $k \in \{3,4,5\}$, $k \ge 6$. The trade-off: finer bins give more dynamic detail; coarser bins pool more observations and reduce standard errors.

Practical threshold: aim for at least 50–100 treated unit-period observations per bin, or at least 10 treated units per bin (whichever is more restrictive). When $N$ is small, prioritise the unit-count floor over fine time resolution.

Endpoint binning caveat: a binned coefficient at $k \ge 6$ averages effects across different durations. If effects are still growing (habit formation, learning), the binned estimate is a lower bound on the long-run effect. Report the cutoff explicitly and check robustness to alternative cutoffs (e.g., $k \ge 4$ vs $k \ge 8$).
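A binning scheme is just a deterministic map from event time to bin label. A sketch using the example bins above (the labels and the choice to leave leads unbinned are illustrative):

```python
def bin_label(k):
    """Map event time k into the bins {0}, {1,2}, {3,4,5}, and k >= 6."""
    if k < 0:
        return f"lead {k}"   # leads kept unbinned in this sketch
    if k == 0:
        return "0"
    if k <= 2:
        return "1-2"
    if k <= 5:
        return "3-5"
    return "6+"
```

Counting treated unit-period observations per label then lets you check each bin against the 50–100 observation floor before estimation.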

Reference window: the convention is $k = -1$. Changing the reference shifts all coefficients by a constant but leaves differences between coefficients unchanged (these differences estimate treatment-effect contrasts and are invariant to the normalisation). Report the reference choice and check robustness to alternatives so that conclusions are not artefacts of normalisation.
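For the point estimates of a saturated dummy specification, re-normalising to a different reference period subtracts that period's original coefficient from every estimate, so contrasts between event times are unchanged. A quick check with hypothetical numbers:

```python
import numpy as np

# hypothetical profile for k = -3..2, normalised to k = -1 (the 0.0 entry)
beta = np.array([0.8, 0.4, 0.0, 5.2, 7.1, 8.3])
beta_alt = beta - beta[0]   # re-normalised so that k = -3 is the reference

# levels shift by a constant...
shift_is_constant = np.allclose(beta - beta_alt, beta[0])
# ...but differences between adjacent event times are identical
diffs_unchanged = np.allclose(np.diff(beta), np.diff(beta_alt))
```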

Interpretation and diagnostics

A well-behaved event-study plot has three features:

  1. Flat near-zero pre-treatment coefficients ($k < 0$) — consistent with parallel trends.
  2. Discrete jump at $k = 0$ — suggesting an immediate treatment effect.
  3. Clear post-treatment trajectory — whether constant, growing, or decaying.

When pre-trends are visible, options include:

  • Condition on covariates to restore Conditional Parallel Trends.
  • Use factor models that accommodate differential trends (Chapters 8–9).
  • Conduct sensitivity analysis with Rambachan–Roth bounds, which quantify how robust conclusions are to parallel-trends violations of various magnitudes.

Ignoring pre-trend violations and proceeding is not acceptable.

Joint test: a Wald test $H_0: \beta_{-K} = \cdots = \beta_{-2} = 0$ using cluster-robust standard errors complements visual inspection. Rejection is suggestive evidence of pre-trends or anticipation. Failure to reject is only weakly informative given low power. Report both visual and test evidence. When many event-time coefficients are examined jointly, control for multiplicity (Romano–Wolf or FDR procedures) rather than relying solely on unadjusted pointwise $p$-values.
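With the lead estimates and their (cluster-robust) covariance block in hand, the Wald statistic is a short computation. The numbers below reuse the worked example's two leads and, purely for illustration, assume a diagonal covariance built from the reported standard errors:

```python
import numpy as np
from scipy import stats

def pretrend_wald(beta_pre, V_pre):
    """W = b' V^{-1} b, chi-squared with q = len(b) df under H0: all leads zero."""
    b = np.asarray(beta_pre, dtype=float)
    W = float(b @ np.linalg.solve(np.asarray(V_pre, dtype=float), b))
    return W, float(stats.chi2.sf(W, df=len(b)))

# leads from the worked example; diagonal covariance is an assumption
W, p = pretrend_wald([0.8, 0.4], np.diag([1.2**2, 1.0**2]))
# here W is small and p is large: no evidence against flat pre-trends
```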

Relationship to distributed lag models

Event-study specifications impose minimal restrictions on the lag structure, providing flexibility at the cost of efficiency and interpretability. Distributed lag models (Chapter 10) parameterise the decay or persistence function, enabling extrapolation beyond observed event times and direct long-run multiplier estimation — at the cost of stronger functional-form assumptions. The choice is between flexibility and parsimony, guided by how much the dynamic shape is known a priori.
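For instance, if persistence is parameterised as geometric decay (a hypothetical choice for illustration; Chapter 10's models are not specified here), the effect at lag $k$ is $\beta_0 \delta^k$ and the long-run cumulative multiplier has a closed form:

```python
beta0, delta = 5.2, 0.6          # hypothetical impact effect and decay rate
long_run = beta0 / (1 - delta)   # closed-form sum of beta0 * delta**k over k >= 0
approx = sum(beta0 * delta**k for k in range(50))  # finite-sum check
```

Two parameters now summarise the entire lag profile, which is exactly the parsimony-for-flexibility trade made relative to the unrestricted event-study coefficients.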

Marketing applications

  • Loyalty programme rollout: staggered adoption across stores enables detailed tracing of how effects grow as customers accumulate points and develop habits.
  • Advertising campaign launches: reveals whether effects peak immediately (direct response) or build over time (brand building).
  • Pricing policy waves: traces competitive responses and demand adjustments across product categories over time.

Takeaway

Event-study coefficients are a transparent window into dynamics, anticipation, and long-run effects — but the TWFE version inherits negative weighting problems under heterogeneity. Use Sun–Abraham or Callaway–Sant’Anna event-time aggregations for formal inference; reserve the TWFE event study for initial diagnostics. Always pair the plot with a joint pre-trend test, report binning and reference choices, and address any pre-trend violations before proceeding.

References

  • Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 4.6.
  • Roth, J. (2022). Pretest with caution: Event-study estimates after testing for pre-existing trends. American Economic Review: Insights, 4(3), 305–322.
  • Rambachan, A., and Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies, 90(5), 2555–2591.
  • Sun, L., and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175–199.