Why this section matters

Two-way fixed effects (TWFE) is the default tool in panel analysis, but it is easy to misuse. Section 2.5 explains what TWFE actually identifies, the assumptions behind it, and how inference can fail when those assumptions are violated. It also outlines alternatives and diagnostics that align better with modern marketing-mix modeling (MMM) data.

The canonical TWFE model

The workhorse specification is:

$$ Y_{it} = \alpha_i + \lambda_t + \tau D_{it} + X_{it}'\gamma + \varepsilon_{it}. $$

Here, $\alpha_i$ absorbs time-invariant unit differences, $\lambda_t$ absorbs common shocks, and $\tau$ is interpreted as the average treatment effect only under strong assumptions.

Strict exogeneity

Consistency of the within estimator requires strict exogeneity:

$$ \mathbb{E}[\varepsilon_{it} \mid D_{i1},\ldots,D_{iT}, X_{i1},\ldots,X_{iT}, \alpha_i] = 0, \quad \forall t. $$

This is stronger than a contemporaneous “no omitted confounders” claim. It rules out feedback from past outcomes to current treatment and demands that, conditional on the full history, the remaining variation in $D_{it}$ is as-good-as-random.

Within estimation in one line

The TWFE estimator is OLS on demeaned data:

$$ \ddot{Y}_{it} = Y_{it} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot t} + \bar{Y}_{\cdot\cdot}, $$

with the same transformation applied to $D_{it}$ and $X_{it}$. This clarifies that identification comes from within-unit, over-time variation after removing unit and period means.
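For a balanced panel, the transformation is a few lines of NumPy. The toy data and variable names below are illustrative, not from the text:

```python
import numpy as np

def two_way_demean(v, unit, time):
    """Two-way within transformation for a balanced panel.
    unit and time are integer labels 0..N-1 and 0..T-1."""
    v = np.asarray(v, dtype=float)
    unit_means = np.array([v[unit == i].mean() for i in np.unique(unit)])
    time_means = np.array([v[time == t].mean() for t in np.unique(time)])
    # y_it - ybar_i. - ybar_.t + ybar_..
    return v - unit_means[unit] - time_means[time] + v.mean()

# Toy balanced panel: 3 units observed over 4 periods
rng = np.random.default_rng(0)
unit = np.repeat(np.arange(3), 4)
time = np.tile(np.arange(4), 3)
y = 2.0 * unit + 0.5 * time + rng.normal(size=12)
y_dd = two_way_demean(y, unit, time)
# After the transformation, every unit mean and period mean of y_dd is zero.
```

In a balanced panel the transformed variable has exactly zero unit means and zero period means, which is what "removing unit and period means" delivers.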

The big pitfall: heterogeneity with staggered timing

TWFE implicitly assumes homogeneous treatment effects:

$$ \tau_{it} = \tau \quad \forall i,t. $$

When adoption is staggered and effects vary by cohort or over time since adoption, the TWFE coefficient becomes a weighted average of many two-by-two comparisons. Some of those comparisons use already-treated units as controls, which can produce negative weights and misleading estimates.

Practical diagnostic: event-study coefficients for pre-treatment periods ($k < 0$) should be near zero. Large or erratic pre-trends signal that TWFE is likely confounded.
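A deliberately noiseless toy illustrates the diagnostic: with a flat pre-period and a constant post-adoption jump, dummies for each event time (omitting $k = -1$ as the reference period) recover zero pre-treatment coefficients exactly. This sketch omits the unit and time fixed effects a real event study would include:

```python
import numpy as np

# Toy event-study check: 50 units observed at event times k = -3,...,3,
# with a flat pre-period and a constant jump of 2.0 from adoption (k >= 0).
# No noise, so the dummy coefficients are recovered exactly.
ks = np.arange(-3, 4)
k = np.tile(ks, 50)
y = np.where(k >= 0, 2.0, 0.0)

# Event-time dummies, omitting k = -1 as the reference period
event_times = [e for e in ks if e != -1]
X = np.column_stack([np.ones(k.size)] +
                    [(k == e).astype(float) for e in event_times])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
coefs = dict(zip(event_times, beta[1:]))
# Pre-treatment coefficients (k = -3, -2) are ~0; post coefficients are ~2.0.
```

In real data the question is whether the pre-period coefficients are jointly close to zero relative to their confidence bands, not whether they are exactly zero.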

Random effects and the Mundlak device

Random effects models exploit both within- and between-unit variation, but they require:

$$ \mathbb{E}[\alpha_i \mid D_{i1},\ldots,D_{iT}, X_{i1},\ldots,X_{iT}] = 0, $$

which is often implausible in marketing panels.

The Mundlak device adds unit-level means of time-varying covariates, allowing correlation between $\alpha_i$ and regressors while preserving a within-unit interpretation of $\tau$.
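The device is easy to verify numerically: in a balanced panel, pooled OLS that includes the unit mean of a regressor reproduces the within estimator's coefficient on that regressor exactly. A minimal sketch with simulated data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 40, 6
unit = np.repeat(np.arange(n), T)
alpha = rng.normal(size=n)                    # unit effects
x = alpha[unit] + rng.normal(size=n * T)      # regressor correlated with alpha_i
y = 1.5 * x + alpha[unit] + rng.normal(scale=0.1, size=n * T)

# Mundlak device: add the unit-level mean of x as a regressor
xbar = np.array([x[unit == i].mean() for i in range(n)])[unit]
Xm = np.column_stack([np.ones_like(x), x, xbar])
b_mundlak = np.linalg.lstsq(Xm, y, rcond=None)[0][1]

# Within (fixed-effects) estimator for comparison
ybar = np.array([y[unit == i].mean() for i in range(n)])[unit]
b_within = ((x - xbar) @ (y - ybar)) / ((x - xbar) @ (x - xbar))
# The two slope estimates coincide (up to floating point).
```

The equivalence follows from Frisch–Waugh–Lovell: residualizing $x$ on its unit means leaves exactly the within variation $x - \bar{x}_i$.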

Hausman–Taylor as a bridge

When time-invariant regressors are important and correlated with unit effects, Hausman–Taylor provides internal instruments using:

  • Within-unit variation in exogenous time-varying regressors, and
  • Unit means of those regressors as instruments for endogenous time-invariant terms.

It can be useful, but only when the exogenous/endogenous partition is credible and instruments are strong.

Inference: serial correlation and clustering

Panel errors are rarely independent. Ignoring serial correlation yields standard errors that are too small.

Default guidance: cluster by unit. This allows arbitrary correlation within units over time while assuming independence across units. If errors also correlate across units within periods (e.g., regional shocks), two-way clustering (unit and time) is safer.
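A one-way cluster-robust sandwich estimator can be sketched by hand; the small-sample correction below follows the common CR1 convention, and the simulated panel is illustrative:

```python
import numpy as np

def cluster_se(X, resid, cluster):
    """One-way (CR1) cluster-robust standard errors for OLS."""
    n, p = X.shape
    groups = np.unique(cluster)
    G = len(groups)
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((p, p))
    for g in groups:
        s = X[cluster == g].T @ resid[cluster == g]   # within-cluster score sum
        meat += np.outer(s, s)
    c = (G / (G - 1)) * ((n - 1) / (n - p))           # CR1 small-sample factor
    V = c * XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(V))

# Simulated panel: errors share a unit-level component, so observations
# are correlated within units but independent across them.
rng = np.random.default_rng(2)
n_units, T = 30, 8
unit = np.repeat(np.arange(n_units), T)
x = rng.normal(size=n_units * T)
u = rng.normal(size=n_units)[unit] + rng.normal(scale=0.5, size=n_units * T)
y = 1.0 * x + u
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
se = cluster_se(X, resid, unit)
```

Summing scores within each cluster before forming the outer product is what allows arbitrary within-cluster correlation; the independence assumption lives entirely in the across-cluster sum.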

When the number of clusters is small, cluster-robust standard errors can be unreliable. Two common fixes:

  • Wild cluster bootstrap
  • Permutation tests

For long $T$ with few units, HAC standard errors (e.g., Newey–West) can be more appropriate than clustering.
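A minimal Newey–West implementation with Bartlett weights, for illustration (at zero lags it collapses to the White/HC0 estimator; data and lag choice are illustrative):

```python
import numpy as np

def newey_west_se(X, resid, lags):
    """Newey-West HAC standard errors for OLS with Bartlett-kernel weights."""
    XtX_inv = np.linalg.inv(X.T @ X)
    scores = X * resid[:, None]           # per-observation score contributions
    S = scores.T @ scores                 # lag-0 term (equals White/HC0 meat)
    for L in range(1, lags + 1):
        w = 1.0 - L / (lags + 1.0)        # Bartlett weight, declining in lag
        Gamma = scores[L:].T @ scores[:-L]
        S += w * (Gamma + Gamma.T)
    V = XtX_inv @ S @ XtX_inv
    return np.sqrt(np.diag(V))

# Example: a single series with AR(1) errors, where ignoring serial
# correlation would understate the slope's standard error.
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.empty(n)
u[0] = e[0]
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + e[t]
y = 0.5 * x + u
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
se_hac = newey_west_se(X, resid, lags=4)
```

The Bartlett weights guarantee a positive semi-definite variance estimate; the lag truncation is a tuning choice that should grow with $T$.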

Nonstationarity and spurious regression

Panel regressions can go wrong when variables drift over time. Key distinctions:

  • Deterministic trends: changing mean with stable variance.
  • Structural breaks: parameter shifts at unknown dates.
  • Unit roots: stochastic trends that accumulate shocks.

An $I(1)$ process is a unit-root process where the level is nonstationary but the first difference is stationary. A canonical example is a random walk:

$$ y_t = y_{t-1} + \varepsilon_t. $$

Here, shocks never fully die out, so the variance of $y_t$ grows with $t$ and the series does not revert to a fixed mean. Differencing yields $\Delta y_t = y_t - y_{t-1} = \varepsilon_t$, which is stationary if $\varepsilon_t$ is. In panel settings, an $I(1)$ outcome can make regressions on other trending variables look significant even when there is no causal link. This is the classic spurious regression problem: high $R^2$, small p-values, and unstable coefficients that flip when you change the sample window.
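A short simulation makes the point concrete: a cumulative sum of i.i.d. shocks produces a level series that never settles, while its first difference is just the shocks again (seed and series length are arbitrary):

```python
import numpy as np

# A random walk accumulates every shock; its first difference recovers them.
rng = np.random.default_rng(4)
eps = rng.normal(size=500)
y = np.cumsum(eps)      # I(1): level wanders, variance grows with t
dy = np.diff(y)         # I(0): equal to eps[1:], stationary by construction
```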

When $T$ is short, unit-root concerns are secondary; structural breaks matter more. With long $T$, unit roots and cointegration become central. In such cases:

  1. Test for unit roots (LLC, IPS, CIPS).
  2. If I(1) variables are cointegrated, use error-correction models.
  3. If not, difference to stationarity and interpret coefficients carefully.

A quick decision map

  • Staggered timing + heterogeneity. Symptom: odd TWFE sign or magnitude. Safer response: use modern DiD / cohort-time estimators.
  • Serial correlation. Symptom: tiny SEs, over-rejection. Safer response: cluster by unit or use the wild bootstrap.
  • Cross-unit shocks. Symptom: common-period residual spikes. Safer response: two-way clustering or factor models.
  • Long $T$ with trending levels. Symptom: high $R^2$, unstable coefficients. Safer response: test unit roots, consider cointegration.

Takeaway

TWFE is a baseline, not a default. It is credible when strict exogeneity and homogeneous effects are plausible. When those assumptions fail, the right move is not to tweak the regression, but to change the design or estimator and strengthen diagnostics.

References

  • Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 2.5.
  • Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics.
  • Cameron, A. C., Gelbach, J. B., and Miller, D. L. (2008). Bootstrap-based improvements for inference with clustered errors.
  • Rambachan, A., and Roth, J. (2023). A sensitivity analysis for parallel trends.