Why design beats model fit
Section 3.1 shifts the focus from outcome models to assignment mechanisms. The key question is not which regression fits best, but how treatment was assigned. Design-based reasoning builds identification on how units receive treatment, rather than on functional-form assumptions.
This perspective matters because the strongest causal statements come from how treatment variation was generated, not from statistical convenience. In marketing panels, assignment is often messy, so being explicit about design is the difference between credible and fragile inference.
Two regimes: experimental vs quasi-experimental
We can separate panel designs into two broad regimes.
Experimental designs use known randomization mechanisms. In notation, the assignment matrix $D$ is independent of potential outcomes, often conditional on strata:
$$ \Pr\bigl(D \mid \{Y_{it}(d_{ti})\}_{i,t}, X\bigr)=\Pr(D \mid X). $$
This is the cleanest route to causal inference and underpins geo-experiments and platform A/B tests.
Quasi-experimental designs do not randomize treatment. Instead, institutional rules, rollouts, thresholds, or targeting create as-if-random variation that can be exploited with assumptions like parallel trends, unconfoundedness, or factor structure.
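To make the experimental regime concrete, here is a minimal sketch of randomization within strata, so that assignment depends only on $X$ (the stratum label) and a random seed, never on outcomes. The function name `stratified_assignment`, the unit labels, and the "large"/"small" strata are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_assignment(units, strata, seed=0):
    """Assign half of each stratum to treatment (1) and the rest to control (0).

    units  : list of unit identifiers (hypothetical geo labels here)
    strata : dict mapping unit -> stratum label (the X we condition on)
    """
    rng = random.Random(seed)

    # Group units by stratum so randomization happens within each stratum.
    by_stratum = defaultdict(list)
    for u in units:
        by_stratum[strata[u]].append(u)

    assignment = {}
    for members in by_stratum.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_treated = len(shuffled) // 2
        for k, u in enumerate(shuffled):
            assignment[u] = 1 if k < n_treated else 0
    return assignment


# Hypothetical example: 12 geos, the first 6 "large" and the last 6 "small".
units = [f"geo_{k}" for k in range(12)]
strata = {u: ("large" if k < 6 else "small") for k, u in enumerate(units)}
assignment = stratified_assignment(units, strata, seed=42)

treated_per_stratum = {
    s: sum(assignment[u] for u in units if strata[u] == s)
    for s in ("large", "small")
}
print(treated_per_stratum)  # {'large': 3, 'small': 3}
```

Because the mechanism never looks at outcomes, $\Pr(D \mid \{Y\}, X)=\Pr(D \mid X)$ holds by construction. Which geos end up treated changes with the seed, but the treated count per stratum does not.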
Assignment mechanism: the formal object
Section 3.1 formalizes the assignment mechanism as the conditional distribution:
$$ \Pr\bigl(D \mid \{Y_{it}(d_{ti})\}_{i,t}, X\bigr). $$
This expression captures the data-generating process for treatment. If it depends on potential outcomes, identification is compromised unless we can condition on the right variables to remove that dependence.
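A small simulation can illustrate why this dependence matters. The sketch below is an assumption-laden toy, not an example from Section 3.1: it uses a cross-section rather than a panel, a constant true effect of 2, and a hypothetical targeting rule that treats units whose untreated outcome is above average.

```python
import random
import statistics


def diff_in_means(y, d):
    """Naive contrast: mean outcome of treated minus mean outcome of controls."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


rng = random.Random(0)
n = 20_000
true_effect = 2.0

# Potential outcomes: Y(0) is noisy, Y(1) = Y(0) + constant effect.
y0 = [rng.gauss(10.0, 2.0) for _ in range(n)]
y1 = [v + true_effect for v in y0]

# Mechanism A: coin flip, independent of potential outcomes.
d_random = [rng.randint(0, 1) for _ in range(n)]
# Mechanism B: targeting that depends on Y(0) (hypothetical rule).
d_selected = [1 if v > 10.0 else 0 for v in y0]


def observe(d):
    """Reveal Y(1) for treated units and Y(0) for controls."""
    return [a if di == 1 else b for a, b, di in zip(y1, y0, d)]


est_random = diff_in_means(observe(d_random), d_random)
est_selected = diff_in_means(observe(d_selected), d_selected)
print(f"random assignment:   {est_random:.2f}")
print(f"targeted assignment: {est_selected:.2f}")
```

Under the coin flip the naive contrast should land close to the true effect. Under targeting it is inflated, because the treated group would have had higher outcomes even without treatment.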
Overlap and why it is hard in marketing
Even under unconfoundedness, identification requires overlap (positivity): units must have a positive probability of receiving the treatment levels you care about, given $X_{it}$ and fixed effects. In marketing, targeted campaigns and feedback loops can violate overlap because the platform systematically avoids certain segments or timing windows.
Practical implication: if, among units with similar covariates, all are treated or all are untreated, there is no comparison group for them in the data, and no method can recover a credible causal contrast without extrapolating.
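A basic overlap check can be sketched by coarsening covariates into cells and counting treated and control units in each. The helper `overlap_report`, the cell labels, and the `min_per_arm` threshold are hypothetical. In practice the cells would come from binning $X_{it}$ or an estimated propensity score.

```python
from collections import defaultdict


def overlap_report(rows, min_per_arm=1):
    """Count treated and control units per covariate cell.

    rows : iterable of (cell, d) pairs, where cell is a coarsened covariate
           profile and d is 0 (control) or 1 (treated).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_control, n_treated]
    for cell, d in rows:
        counts[cell][d] += 1
    return {
        cell: {
            "control": c,
            "treated": t,
            "overlap": c >= min_per_arm and t >= min_per_arm,
        }
        for cell, (c, t) in counts.items()
    }


# Hypothetical targeting pattern: the platform always treats high-value
# customers and never treats low-value ones.
rows = [
    ("high_value", 1), ("high_value", 1), ("high_value", 1),
    ("mid_value", 1), ("mid_value", 0), ("mid_value", 0),
    ("low_value", 0), ("low_value", 0),
]
report = overlap_report(rows)
no_overlap = sorted(cell for cell, r in report.items() if not r["overlap"])
print(no_overlap)  # ['high_value', 'low_value']
```

Only the `mid_value` cell supports a within-cell comparison here. Any estimate for the other two cells would rest on extrapolation rather than on observed contrasts.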
Design-based vs model-based logic
A regression model can improve precision, but it does not create identification. The causal legitimacy of a coefficient comes from design-based assumptions, not from a good fit or a high $R^2$.
- Design-based: argues the comparison is valid because of the assignment mechanism.
- Model-based: assumes the outcome model is correctly specified.
Modern practice combines them: design for identification, regression for efficiency.
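The division of labor can be illustrated with a simulated randomized experiment, in which randomization supplies identification and a covariate adjustment only tightens the estimate. This is a sketch under assumed values (a true effect of 1, one strongly predictive pre-period covariate `x`, 500 replications). The adjusted estimate is the coefficient on treatment from regressing the outcome on treatment and `x`, computed by residualizing both on `x` (the Frisch-Waugh-Lovell result).

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x (with intercept)."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def one_experiment(rng, n=200, effect=1.0):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # pre-period covariate
    d = [1] * (n // 2) + [0] * (n - n // 2)       # complete randomization
    rng.shuffle(d)
    y = [effect * di + 3.0 * xi + rng.gauss(0.0, 1.0) for di, xi in zip(d, x)]
    unadjusted = slope(d, y)  # equals the difference in means for binary d
    adjusted = slope(residuals(x, d), residuals(x, y))  # partial out x
    return unadjusted, adjusted


rng = random.Random(1)
results = [one_experiment(rng) for _ in range(500)]
unadj = [r[0] for r in results]
adj = [r[1] for r in results]

sd_unadj = statistics.stdev(unadj)
sd_adj = statistics.stdev(adj)
print(f"unadjusted: mean={statistics.fmean(unadj):.2f}, sd={sd_unadj:.2f}")
print(f"adjusted:   mean={statistics.fmean(adj):.2f}, sd={sd_adj:.2f}")
```

Both estimators should center near the true effect because assignment was randomized. The adjusted one should simply vary less across replications. Neither the covariate nor the fit is doing the identifying work.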
Marketing panel archetypes
Section 3.1 previews the main panel design types that recur throughout the book. Below is a more detailed map from design to identifying logic and diagnostics.
Randomized cluster designs (geo-experiments, A/B tests). Treatment is assigned at a cluster level (DMA, region, store group), creating clean randomization while managing interference. Identification is strongest, but inference must respect clustering and potential spillovers across cluster borders. Diagnostics focus on balance across clusters and compliance with the randomization protocol.
Staggered adoption designs (rollouts, phased launches). Units adopt at different times, often for operational reasons. Identification relies on parallel trends in untreated outcomes across cohorts and on no anticipation. Diagnostics emphasize pre-trends, cohort-specific event studies, and sensitivity to alternative aggregation weights.
Single treated unit designs (case studies, flagship launches). Only one or a few units receive treatment, making counterfactual construction the main challenge. Synthetic control and SDID are natural tools, with credibility tied to pre-treatment fit under a stable factor structure. Diagnostics are placebo tests and leave-one-out robustness checks.
Common shock designs (policy changes, platform updates). All units are exposed at the same time, but impacts can vary across units. Identification often hinges on event-study dynamics or factor models that separate common shocks from heterogeneous responses. Diagnostics check for pre-shock stability and alternative control series.
Continuous treatment intensity (spend, exposure). Treatment varies in magnitude rather than on/off status. Identification requires unconfoundedness conditional on rich controls and fixed effects, or a factor structure that isolates endogenous demand shocks. Diagnostics focus on overlap across intensity levels, sensitivity to control sets, and stability of dose-response estimates.
Each archetype maps to specific methods and diagnostic workflows in later chapters.
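As one example of such a workflow, the pre-trend diagnostic for staggered adoption can be sketched as a placebo difference-in-differences computed entirely on pre-treatment periods. The `did` helper, the cohort labels, and the numbers below are hypothetical toy values, not data from the book.

```python
import statistics


def did(panel, treated_group, control_group, before, after):
    """2x2 difference-in-differences on group-by-period mean outcomes.

    panel : dict mapping (group, period) -> list of unit outcomes
    """
    def mean(group, period):
        return statistics.fmean(panel[(group, period)])

    treated_change = mean(treated_group, after) - mean(treated_group, before)
    control_change = mean(control_group, after) - mean(control_group, before)
    return treated_change - control_change


# Hypothetical panel: cohort "early" adopts in period 3, "never" stays untreated.
panel = {
    ("early", 1): [10.0, 12.0], ("early", 2): [11.0, 13.0], ("early", 3): [15.0, 17.0],
    ("never", 1): [5.0, 7.0],   ("never", 2): [6.0, 8.0],   ("never", 3): [7.0, 9.0],
}

# Placebo: both periods are pre-treatment, so a nonzero value signals
# diverging pre-trends.
placebo = did(panel, "early", "never", before=1, after=2)
# Actual contrast: last pre-period vs first treated period.
effect = did(panel, "early", "never", before=2, after=3)
print(placebo, effect)  # 0.0 3.0
```

A placebo near zero is consistent with parallel trends but does not prove it, since the assumption concerns untreated outcomes in the post-period, which are never observed for the treated cohort.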
Takeaway
The design-based view forces clarity. It asks: Who assigned treatment? Why did they do it? Once that is explicit, estimands and estimators follow logically. In marketing mix modeling (MMM), this discipline prevents over-interpreting regression output and keeps the analysis anchored to credible causal variation.
References
- Shaw, C. (2025). Causal Inference in Marketing: Panel Data and Machine Learning Methods (Community Review Edition), Section 3.1.
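- Angrist, J. D., and Pischke, J.-S. (2010). The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives, 24(2), 3-30.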
- Goldfarb, A., et al. (2022). Marketing research in the age of platforms.