Why power is a design decision

Power is the probability of detecting a true effect of a given size. Underpowered studies miss meaningful effects; oversized studies waste resources declaring trivial effects statistically significant. The solution is not to chase p-values but to design around effect sizes and precision.

In panels, power depends on design choices: number of clusters, length of pre- and post-windows, and how treatment is assigned. Clustering and serial dependence are the main killers of effective sample size.

Minimum detectable effects under clustering

For clustered designs, the minimum detectable effect (MDE) is the smallest true effect likely to be detected at a given significance level $\alpha$ and power $1-\beta$.

A back-of-the-envelope formula for a two-group comparison using cluster-period outcomes is:

$$ \text{MDE} \approx (z_{1-\alpha/2}+z_{1-\beta})\times \sqrt{\frac{\sigma^2[1+(T-1)\rho]}{T}}\times\sqrt{\frac{1}{pC}+\frac{1}{(1-p)C}}. $$

Where:

  • $C$ is the number of clusters,
  • $p$ is the treated share,
  • $T$ is the number of periods,
  • $\rho$ is the intra-cluster correlation (ICC),
  • $\sigma^2$ is the total outcome variance.

Key implications:

  • Increasing $C$ raises power far more than increasing $T$ when $\rho$ is high.
  • Equal allocation ($p=0.5$) minimizes variance; imbalanced assignment increases the MDE.
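The formula above translates directly into a few lines of Python using only the standard library. This is a sketch of the back-of-the-envelope calculation, not a full power analysis; the function name is illustrative:

```python
from statistics import NormalDist

def mde_clustered(sigma2, rho, T, C, p, alpha=0.05, power=0.8):
    """Back-of-the-envelope MDE for a two-group comparison of cluster-period means."""
    N = NormalDist()
    z = N.inv_cdf(1 - alpha / 2) + N.inv_cdf(power)
    design_effect = 1 + (T - 1) * rho         # variance inflation from clustering
    var_term = sigma2 * design_effect / T     # variance of a cluster mean over T periods
    alloc = 1 / (p * C) + 1 / ((1 - p) * C)   # allocation penalty, minimized at p = 0.5
    return z * (var_term * alloc) ** 0.5
```

For example, with $\sigma^2=1$, $\rho=0.6$, $T=12$, $C=40$, and $p=0.5$, the MDE at 80% power comes out to roughly 0.7 standard deviations, and any departure from equal allocation pushes it higher.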

What ICC means in practice. The intra-cluster correlation $\rho$ measures how similar outcomes are within the same cluster over time. A simple variance decomposition is $\varepsilon_{it}=\nu_i+\eta_{it}$, where $\nu_i$ is a persistent cluster component and $\eta_{it}$ is idiosyncratic noise. Then $\rho=\sigma_{\nu}^2/(\sigma_{\nu}^2+\sigma_{\eta}^2)$. If $\rho$ is near 0, within-cluster observations behave almost independently and extra periods add information. If $\rho$ is near 1, outcomes within a cluster move together and extra periods add little new information. This is why the design effect grows with $1+(T-1)\rho$: you get diminishing returns from longer panels when clusters are highly correlated.

In practice, estimate $\rho$ from historical data using a random-effects or ANOVA-style estimator, and treat it as a sensitivity parameter in power calculations. For example, with $T=12$ and $\rho=0.6$, the design effect is $1+(12-1)\cdot 0.6=7.6$, so the effective information is closer to $12/7.6\approx1.6$ independent periods per cluster.
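An ANOVA-style ICC estimator for a balanced panel can be sketched as follows, assuming outcomes are arranged as a clusters-by-periods array (the function name is illustrative):

```python
import numpy as np

def icc_anova(y):
    """One-way ANOVA estimator of the ICC.
    y: 2-D array of outcomes, shape (clusters, periods), balanced panel."""
    C, T = y.shape
    cluster_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-cluster and within-cluster mean squares
    msb = T * ((cluster_means - grand_mean) ** 2).sum() / (C - 1)
    msw = ((y - cluster_means[:, None]) ** 2).sum() / (C * (T - 1))
    return (msb - msw) / (msb + (T - 1) * msw)
```

On data simulated from the decomposition above ($\sigma_{\nu}^2=0.6$, $\sigma_{\eta}^2=0.4$), the estimate recovers $\rho\approx0.6$ when the number of clusters is large.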

Serial dependence shrinks effective sample size

Outcomes within a unit are correlated across time. If outcomes follow an AR(1) process with autocorrelation $\phi$, the effective number of independent observations per unit is approximately:

$$ T_\text{eff} \approx T\,\frac{1-\phi}{1+\phi}. $$

When $\phi$ is large, additional periods add little information. This is why long panels do not automatically imply high power.

Practical guidance:

  • Estimate $\phi$ from historical data when possible.
  • Use conservative values (e.g., $\phi=0.7$) when uncertain.
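Both pieces of guidance can be sketched with a lag-1 autocorrelation estimate and the effective-sample-size formula above (a sketch; function names are illustrative):

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float((x[:-1] * x[1:]).sum() / (x ** 2).sum())

def effective_periods(T, phi):
    """Approximate number of independent observations per unit under AR(1)."""
    return T * (1 - phi) / (1 + phi)
```

With the conservative $\phi=0.7$, a 24-period panel carries the information of roughly $24\times 0.3/1.7\approx 4$ independent periods per unit.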

Simulation-based power using historical panels

Analytical formulas rely on simplifying assumptions. Simulation-based power analysis is more robust:

  1. Calibrate a model to historical data (means, variance, dependence).
  2. Generate synthetic datasets under null and alternative effects.
  3. Randomize treatment under the proposed design.
  4. Run the same estimator and inference planned for the real study.
  5. Repeat thousands of times and compute the rejection rate.

This captures skewness, heteroskedasticity, and complex dependence that formulas miss. It is especially valuable for stratified geo-experiments, switchbacks with carryover, and staggered rollouts.
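The five steps above can be sketched end to end. This is a toy calibration (a persistent cluster component plus idiosyncratic noise, with illustrative parameter values rather than estimates from real data), using a difference in cluster means as the estimator and a normal critical value for the test:

```python
import numpy as np
from statistics import NormalDist

def simulated_power(effect, C=40, T=12, sigma_nu=0.6 ** 0.5, sigma_eta=0.4 ** 0.5,
                    p=0.5, alpha=0.05, n_sims=2000, seed=0):
    """Steps 1-5: generate panels, randomize, estimate, test, count rejections."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_treat = int(p * C)
    rejections = 0
    for _ in range(n_sims):
        nu = rng.normal(0, sigma_nu, (C, 1))        # persistent cluster component
        y = nu + rng.normal(0, sigma_eta, (C, T))   # idiosyncratic noise
        treated = np.zeros(C, dtype=bool)
        treated[rng.choice(C, n_treat, replace=False)] = True
        y[treated] += effect                        # impose the true effect
        m = y.mean(axis=1)                          # collapse to cluster means
        diff = m[treated].mean() - m[~treated].mean()
        se = (m[treated].var(ddof=1) / n_treat
              + m[~treated].var(ddof=1) / (C - n_treat)) ** 0.5
        rejections += abs(diff / se) > z
    return rejections / n_sims
```

Replacing the toy data generator with draws calibrated to your historical panel, and the cluster-means t-test with your actual estimator and inference procedure, is what makes the exercise realistic.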

Align power with the inferential plan

If your analysis will use cluster-robust SEs, power should be computed under that assumption. If you plan randomization inference or wild cluster bootstrap, power must simulate those procedures. If multiple outcomes or event-time effects are tested, power must account for multiplicity adjustments.
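As a small illustration of the multiplicity point, a Bonferroni correction for $K$ outcomes tightens $\alpha$ and inflates the MDE. This sketch takes the standard error of the treatment-effect estimate as given (the function name is illustrative):

```python
from statistics import NormalDist

def mde_adjusted(se, K=1, alpha=0.05, power=0.8):
    """MDE when testing K outcomes with a Bonferroni-corrected alpha."""
    N = NormalDist()
    return (N.inv_cdf(1 - alpha / (2 * K)) + N.inv_cdf(power)) * se
```

Testing four outcomes instead of one raises the MDE by roughly 20% in this setup, which is why the adjustment belongs inside the power calculation, not after it.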

Power is a curve, not a point

Power depends on the assumed true effect size. Rather than a single point estimate, report a power curve over a plausible range of effects. Conservative practice is to design for the lower end of substantively meaningful effects.
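A power curve can be traced from the analytic approximation, inverting the MDE formula to give power as a function of the assumed true effect (illustrative parameters; the tiny lower-tail rejection probability is ignored):

```python
from statistics import NormalDist

def analytic_power(effect, sigma2, rho, T, C, p, alpha=0.05):
    """Approximate power of a two-sided test for a given true effect."""
    N = NormalDist()
    se = (sigma2 * (1 + (T - 1) * rho) / T
          * (1 / (p * C) + 1 / ((1 - p) * C))) ** 0.5
    return 1 - N.cdf(N.inv_cdf(1 - alpha / 2) - effect / se)

# Power over a grid of plausible effect sizes (illustrative parameters)
curve = {e: analytic_power(e, 1.0, 0.6, 12, 40, 0.5) for e in (0.3, 0.5, 0.7, 0.9)}
```

Reporting the whole grid, rather than the single point where power crosses 80%, makes the design's sensitivity to the assumed effect size explicit.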

Practical checklist

  • Estimate ICC and autocorrelation from historical data.
  • Prioritize increasing the number of clusters over longer time windows when $\rho$ is high.
  • Simulate power under the actual estimator and inference plan.
  • Pre-specify multiple-testing adjustments and include them in power simulations.
  • Report power across a range of effect sizes, not a single point.

Takeaway

Power planning in panels is about realism: clustering and serial dependence dramatically shrink effective sample size. Use analytical formulas as a guide, but rely on simulation when designs are complex. A well-powered design protects against both false negatives and misleadingly precise trivial effects.
