MMM 503: Specifying Leads and Lags in Event Studies

From Estimands to Regressions

Section 5.2 defined the causal estimands $\theta_k$. Section 5.3 covers how to turn them into empirical regression specifications. Choices of reference period, binning, and window length shape both the precision and the credibility of your estimates.

The Basic TWFE Event-Study Specification

The traditional Two-Way Fixed Effects (TWFE) event-study regression takes the form:

$$ Y_{it} = \alpha_i + \lambda_t + \sum_{k \in \mathcal{K} \setminus \{-1\}} \beta_k^{\text{TWFE}} \mathbb{1}\{t - G_i = k\} + \varepsilon_{it} $$

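where $\alpha_i$ is a unit fixed effect, $\lambda_t$ is a time fixed effect, $G_i$ is the period in which unit $i$ is first treated (so $t - G_i$ is event time), $\mathcal{K}$ is the set of event times included in the regression, and $k = -1$ is the omitted reference period.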
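To make the specification concrete, here is a minimal sketch that builds the design matrix by hand and fits it by least squares with NumPy. The panel, the cohort dates in `G`, and the effect sizes in `true_beta` are all made up for illustration, and the outcome is noise-free with effects that are identical across cohorts. That homogeneity is what lets the regression recover the effects exactly here; it would not under the heterogeneity discussed in the caveat.

```python
import numpy as np

# Hypothetical balanced panel: 6 units, 8 periods (t = 0..7).
# G[i] is unit i's first treated period; None marks a never-treated unit.
G = [3, 3, 5, 5, None, None]
T = 8
# Assumed dynamic effects by event time (zero before treatment).
true_beta = {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: 3.0}

# Event times present in the data, minus the omitted reference k = -1.
event_times = sorted({t - g for g in G if g is not None for t in range(T)} - {-1})

rows, y = [], []
for i, g in enumerate(G):
    for t in range(T):
        unit_fe = [1.0 if j == i else 0.0 for j in range(len(G))]
        # Drop the t = 0 dummy so unit and time dummies are not collinear.
        time_fe = [1.0 if s == t else 0.0 for s in range(1, T)]
        k = None if g is None else t - g  # never-treated: all indicators are zero
        event = [1.0 if k == e else 0.0 for e in event_times]
        rows.append(unit_fe + time_fe + event)
        # Noise-free outcome: unit effect + time effect + treatment effect.
        y.append(0.5 * i + 0.1 * t + true_beta.get(k, 0.0))

X = np.array(rows)
coef, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
# The event-time coefficients follow the unit and time dummies.
beta_hat = dict(zip(event_times, coef[len(G) + T - 1:]))
```

In applied work one would normally use a regression package with clustered standard errors rather than raw least squares. The point of the sketch is only which indicators enter and which one is left out.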

Important caveat: Under treatment effect heterogeneity, $\beta_k^{\text{TWFE}}$ does not in general estimate the causal $\theta_k$. Instead, it is a weighted combination of cohort-specific effects, possibly including effects from other event times, with potentially opaque (and occasionally negative) weights. When effects may differ across cohorts, prefer heterogeneity-robust estimators such as Sun–Abraham or Callaway–Sant’Anna.

Reference Period and Normalization

By convention, the indicator for the period immediately before treatment ($k = -1$) is omitted from the regression, which normalizes its coefficient to zero. All other coefficients, including the post-treatment $\beta_0^{\text{TWFE}}, \beta_1^{\text{TWFE}}, \ldots$, are then interpreted relative to this last untreated period.

Occasionally, analysts omit $k = 0$ (the treatment period itself) when:

The treatment is implemented mid-period, contaminating adoption-period outcomes.
Anticipation is severe, so $k=0$ already contains behavioral responses to expected treatment.

However, omitting $k = 0$ complicates interpretation: every other coefficient then measures a change relative to the adoption period, which may itself be partly treated, rather than relative to an untreated baseline. The cleanest approach remains omitting $k = -1$.
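A short worked identity shows what the change of reference does. Assume the two regressions use the same event-time indicators, that $\mathcal{K}$ covers every event time observed for treated units, and that the model is otherwise identified (for example, through never-treated units). The superscripts $(0)$ and $(-1)$ are notation introduced here to mark the omitted period. Switching the reference from $k = -1$ to $k = 0$ then shifts every coefficient by the same constant:

$$ \beta_k^{(0)} = \beta_k^{(-1)} - \beta_0^{(-1)} \quad \text{for all } k \in \mathcal{K} $$

For instance, if $\beta_0^{(-1)} = 2$ and $\beta_3^{(-1)} = 5$, then $\beta_3^{(0)} = 3$ and $\beta_{-1}^{(0)} = -2$. The shape of the profile is unchanged, but its level is now anchored to a period that already contains a treatment effect.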

Binning: Aggregating Sparse Event Times

Panel data rarely provide dense support across all event times, especially at the tails ($k \ll 0$ or $k \gg 0$).

Binning Rationale: Group extreme event times into pooled bins to stabilize estimates. For example:

$k \le -8$: Pre-treatment aggregate bin
$k \in \{-7, -6, \ldots, -2\}$: Separate coefficients
$k = -1$: Omitted reference
$k \in \{0, 1, \ldots, 10\}$: Separate coefficients
$k \ge 11$: Post-treatment aggregate bin
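The scheme above can be sketched as a small mapping from raw event time to bin label. The function name `bin_event_time` and the string labels are hypothetical. The cutoffs default to the illustrative values in the list and should be set from your own support table.

```python
def bin_event_time(k, lo=-8, hi=11, ref=-1):
    """Map a raw event time k to a regression bin label."""
    if k <= lo:
        return f"<={lo}"   # pooled pre-treatment tail
    if k >= hi:
        return f">={hi}"   # pooled post-treatment tail
    if k == ref:
        return None        # omitted reference period: no coefficient
    return str(k)          # separate coefficient


labels = [bin_event_time(k) for k in (-12, -8, -7, -1, 0, 10, 11, 15)]
# labels == ['<=-8', '<=-8', '-7', None, '0', '10', '>=11', '>=11']
```

Each distinct non-`None` label then becomes one indicator column in the regression.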

Trade-off: Binning reduces variance (tighter confidence intervals) but introduces bias if true effects vary within the bin.

Heuristic support threshold: Aim for at least 50–100 treated unit–period observations per bin, or at least 10 treated units. The exact threshold depends on outcome variance and the number of independent clusters.
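One way to apply this heuristic is to count observations and distinct units per bin and flag the bins that fall short. The sketch below reads the rule literally: a bin counts as supported if it meets either threshold. The helper name `thin_bins`, the input layout, and the default thresholds (50 observations, 10 units) are assumptions for illustration.

```python
from collections import defaultdict


def thin_bins(observations, min_obs=50, min_units=10):
    """Return the bins that meet neither support threshold.

    observations: iterable of (unit_id, bin_label) pairs,
    one pair per treated unit-period.
    """
    n_obs = defaultdict(int)
    units = defaultdict(set)
    for unit_id, label in observations:
        n_obs[label] += 1
        units[label].add(unit_id)
    return sorted(b for b in n_obs
                  if n_obs[b] < min_obs and len(units[b]) < min_units)


# Toy data: 12 units observed at k = 0, only 3 units in the pooled tail.
obs = [(u, "0") for u in range(12)] + [(u, ">=11") for u in range(3)]
flagged = thin_bins(obs)
# flagged == ['>=11']
```

A flagged bin is a candidate for merging with its neighbor or for a narrower window.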

Window Selection: Balancing Coverage, Support, and Power

The window specifies which event times to include.

Considerations:

Coverage: Must span the periods where effects are substantively expected to evolve.
Support: Do not extend so far that many event times have only a handful of observations.
Statistical Power: Wider windows use more observations, which can improve precision, but each added event time is another parameter to estimate.
Diagnostic Value: Pre-treatment windows should span at least 3–5 periods to credibly test for pre-trends.

Symmetric windows (e.g., $k \in \{-5, \ldots, 5\}$) are defensible defaults, but asymmetric windows with more lags (post-treatment periods, $k \ge 0$) than leads (pre-treatment periods, $k < 0$) are common when the focus is mainly on post-treatment dynamics.

Cohort Composition and Support Tables

At different event times, different cohorts contribute data. This composition effect can bias interpretation if treatment heterogeneity is substantial.

Example: Suppose early adopters experience large effects and late adopters experience small effects. At $k = 0$, both cohorts contribute equally. At $k = 4$, only early adopters contribute, because late adopters are not observed for 4 post-treatment periods. If the plotted estimates $\hat{\theta}_k$ appear to grow with $k$, this may reflect changing cohort composition rather than true effect dynamics.
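The arithmetic behind this example can be checked with a few lines. The effect sizes and the number of post-treatment periods per cohort below are invented for illustration, and the pooled profile is a simple equal-weight average over the cohorts still observed at each event time.

```python
# Assumed cohort-specific effects, constant over event time within a cohort.
effect = {"early": 4.0, "late": 1.0}
# Hypothetical last post-treatment event time observed for each cohort.
max_k = {"early": 5, "late": 2}


def pooled_effect(k):
    """Equal-weight average over the cohorts still observed at event time k."""
    present = [g for g in effect if k <= max_k[g]]
    return sum(effect[g] for g in present) / len(present)


profile = {k: pooled_effect(k) for k in range(6)}
# profile[0] == 2.5 (both cohorts), profile[4] == 4.0 (early adopters only)
```

Neither cohort's effect changes over time, yet the pooled profile jumps from 2.5 to 4.0 once the late adopters drop out.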

Diagnostics:

Produce a support table showing, for each $k$: number of observations, number of contributing cohorts, and their identities.
Plot support figures showing effective sample size across event times.
Estimate and plot cohort-specific profiles $\theta_{g,k}$ to directly assess heterogeneity.
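A support table of the kind described above can be sketched as follows. The function name `support_table` and the `(unit_id, cohort, t)` row layout are assumptions; the cohort label is taken to be the unit's first treated period $G_i$, and never-treated units are left out because they have no event time.

```python
from collections import defaultdict


def support_table(panel):
    """Summarize support by event time.

    panel: iterable of (unit_id, cohort, t) rows for treated units,
    where cohort is the unit's first treated period G_i.
    Returns rows of (k, n_obs, n_cohorts, cohort_ids), sorted by k.
    """
    n_obs = defaultdict(int)
    cohorts = defaultdict(set)
    for unit_id, g, t in panel:
        k = t - g
        n_obs[k] += 1
        cohorts[k].add(g)
    return [(k, n_obs[k], len(cohorts[k]), sorted(cohorts[k]))
            for k in sorted(n_obs)]


# Toy panel: periods 0..5, two units treated at t = 2, one unit treated at t = 4.
panel = [(u, g, t) for u, g in [(1, 2), (2, 2), (3, 4)] for t in range(6)]
table = support_table(panel)
for k, n, n_cohorts, ids in table:
    print(f"k={k:>3}  obs={n}  cohorts={n_cohorts}  {ids}")
```

In the toy panel, only the later cohort supports $k \le -3$ and only the earlier cohort supports $k \ge 2$, which is the pattern to look for before reading a trend into the tails.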

These tools expose whether trends are driven by genuine dynamics within cohorts or by compositional shifts in which cohorts are in the sample.