MMM 202: Panel Data Structures and Indexing

Why structure matters

Panel methods live or die by the data shape: how many units ($N$), how many periods ($T$), and how much of the unit-by-period grid is missing. These features determine which estimators are viable and which asymptotics apply.

Proper panels: balanced vs unbalanced

A proper panel tracks the same units over time. Balanced panels have no missing cells: every unit is observed in every period. Unbalanced panels have gaps, often due to churn, entry, or intermittent observation. Missingness can be informative, which turns the observation process itself into a selection problem. Methods such as synthetic control and factor models typically need long, contiguous histories, so gaps may have to be imputed before those methods can be applied.
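
As a minimal sketch of the balanced versus unbalanced distinction, the snippet below summarizes a small long-format table with pandas. The data, the column names (unit, period, y), and the helper panel_summary are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical long-format panel: one row per observed (unit, period) cell.
df = pd.DataFrame({
    "unit":   ["a", "a", "a", "b", "b", "c", "c", "c"],
    "period": [1, 2, 3, 1, 3, 1, 2, 3],
    "y":      [1.0, 1.2, 1.1, 0.7, 0.9, 2.0, 2.1, 2.3],
})


def panel_summary(df, unit="unit", period="period"):
    """Return N, T, the share of observed cells, and a balanced flag."""
    n_units = df[unit].nunique()
    n_periods = df[period].nunique()
    # Count distinct cells so duplicate rows do not inflate the fill rate.
    n_cells = len(df[[unit, period]].drop_duplicates())
    full_grid = n_units * n_periods
    return {
        "N": n_units,
        "T": n_periods,
        "fill_rate": n_cells / full_grid,
        "balanced": n_cells == full_grid,
    }


summary = panel_summary(df)

# Units with at least one gap relative to the full set of periods.
periods_per_unit = df.groupby("unit")["period"].nunique()
units_with_gaps = periods_per_unit[periods_per_unit < summary["T"]].index.tolist()
```

Here unit b is missing period 2, so the panel is unbalanced. A summary like this describes how much is missing, not why; whether the gaps are informative still has to be argued from the data-generating process.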

Three common shapes

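Thin panels ($N \gg T$): many units, few periods. Fixed effects work well in linear models, where the within transformation removes the unit effects, but nonlinear fixed-effects estimators suffer from the incidental-parameter problem (Neyman & Scott, 1948): the number of unit effects grows with $N$ while each one is estimated from only $T$ observations.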
Fat panels ($N \ll T$): few units, many periods. Time-series issues dominate: serial correlation calls for HAC standard errors, and cluster-robust inference is fragile because it relies on a large number of clusters, which a small $N$ does not provide.
Square panels ($N \approx T$): both dimensions large and of comparable size. Matrix-style methods (interactive fixed effects, synthetic DiD) exploit low-rank structure and rely on asymptotics in which $N$ and $T$ grow together.
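
To make the thin-panel case concrete, here is a small simulation sketch of linear fixed effects via the within transformation. The data-generating process, the sample sizes, and the true slope of 2.0 are assumptions made for illustration, not values from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 500, 4, 2.0  # thin panel, N >> T (illustrative values)

alpha = rng.normal(size=(N, 1))          # unobserved unit effects
x = alpha + rng.normal(size=(N, T))      # regressor correlated with alpha
y = beta * x + alpha + rng.normal(size=(N, T))

# Pooled OLS slope (with an intercept) ignores alpha and is biased here.
xc, yc = x - x.mean(), y - y.mean()
beta_pooled = (xc * yc).sum() / (xc ** 2).sum()

# Within transformation: subtract each unit's own time mean, which
# removes alpha before the slope is estimated.
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_within = (x_w * y_w).sum() / (x_w ** 2).sum()
```

In this linear setup the within estimate should land near the true slope even with only four periods per unit, while the pooled estimate is pulled upward by the correlation between x and the unit effect. The same short-$T$ design is where nonlinear fixed-effects estimators run into trouble.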

Other configurations

Not all data are proper panels. Two important alternatives:

Grouped repeated cross-sections: different units are sampled each period, but group means can be tracked over time to form a panel at the group–time level.
Row–column matrices: tables such as customer–product matrices have no time ordering, so they rely on low-rank or matrix-completion assumptions instead of parallel trends.
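
The grouped repeated cross-section case above can be sketched as a simple aggregation. The survey-style table, the grouping variable region, and the outcome spend are hypothetical; the point is only that the group–time cell, not the respondent, becomes the unit of analysis.

```python
import pandas as pd

# Hypothetical repeated cross-sections: respondent ids never repeat
# across periods, so there is no unit-level panel.
rcs = pd.DataFrame({
    "respondent": [1, 2, 3, 4, 5, 6, 7, 8],
    "period":     [1, 1, 1, 1, 2, 2, 2, 2],
    "region":     ["north", "north", "south", "south"] * 2,
    "spend":      [10.0, 12.0, 7.0, 9.0, 11.0, 15.0, 8.0, 8.0],
})

# Collapse to group-time cells, keeping the cell size for later weighting.
cells = (
    rcs.groupby(["region", "period"])
       .agg(mean_spend=("spend", "mean"), n_obs=("spend", "size"))
       .reset_index()
)

# Wide layout: one row per group, one column per period.
pseudo_panel = cells.pivot(index="region", columns="period", values="mean_spend")
```

Keeping n_obs alongside the means is a design choice: cell means from small groups are noisier, so the counts are useful for weighting or for flagging thin cells.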

Event-time indexing for staggered adoption

When adoption is staggered, we define each unit's adoption time $G_i$, which also identifies its cohort, and event time $k = t - G_i$. Event-study regressions align units at $k=0$ to trace dynamic effects, with pre-period coefficients ($k<0$) serving as diagnostics for anticipation or selection.
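
A minimal sketch of event-time indexing in pandas follows. The toy panel, the adoption dates, the coding of never-treated units as missing $G_i$, the choice of $k=-1$ as the omitted reference period, and the helper dummy_name are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical balanced panel: three units observed over four periods.
panel = pd.DataFrame({
    "unit":   ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "period": [1, 2, 3, 4] * 3,
})

# Assumed adoption times G_i; unit "c" never adopts and is left out.
adoption = {"a": 2, "b": 3}

panel["G"] = panel["unit"].map(adoption)    # NaN for never-treated units
panel["k"] = panel["period"] - panel["G"]   # event time k = t - G_i
# Comparisons with NaN are False, so never-treated rows get treated = 0.
panel["treated"] = (panel["k"] >= 0).astype(int)


def dummy_name(k):
    """Hypothetical naming helper: ev_m2 for k = -2, ev_p1 for k = +1."""
    k = int(k)
    return f"ev_m{-k}" if k < 0 else f"ev_p{k}"


# One indicator per event time, omitting k = -1 as the reference period.
for k in sorted(panel["k"].dropna().unique()):
    if k == -1:
        continue
    panel[dummy_name(k)] = (panel["k"] == k).astype(int)
```

The indicators with negative $k$ (here ev_m2) are the pre-period terms whose coefficients serve as diagnostics; never-treated rows have all indicators equal to zero.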

Takeaway

Panel structure is not a footnote; it drives identification strategy, estimator choice, and valid inference. A good causal analysis starts by describing the panel shape and missingness pattern before selecting a method.

References

Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 2.2.
Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16(1), 1–32.