MMM 602: Constructing the Synthetic Control
In this post, we walk through the practical and theoretical steps for constructing a synthetic control, a method for causal inference in marketing and other panel data settings. The approach estimates the effect of an intervention (such as a marketing campaign) by comparing the treated unit to a weighted combination of control units, drawn from the donor pool, that best resembles the treated unit's pre-treatment characteristics.
1. Predictor Selection
The first step is to select predictors—variables used to match the treated unit to the donor pool. These typically include:
- Pre-treatment outcomes: Historical values of the outcome variable, capturing trends, seasonality, and baseline levels.
- Covariates: Additional variables (e.g., market size, demographics) that predict outcomes but may not be fully reflected in the outcome trajectory.
Matching on pre-treatment outcomes is crucial, as it proxies for matching on latent factors that drive outcome dynamics. Covariates help when pre-treatment periods are short or the donor pool is heterogeneous.
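To make the predictor step concrete, here is a minimal sketch of how the predictor matrices might be assembled with NumPy. The function name build_predictors, the array layout (treated unit in column 0), the example numbers, and the choice to scale each predictor by its standard deviation across units are all assumptions for illustration, not something this post prescribes.

```python
import numpy as np

def build_predictors(Y_pre, Z=None):
    """Stack pre-treatment outcomes and optional covariates into predictors.

    Y_pre : (T0, J+1) pre-treatment outcomes, treated unit in column 0 (assumed layout).
    Z     : (K, J+1) covariates, same column order, or None.
    Returns X1 (treated unit's predictors) and X0 (donors' predictors).
    """
    X = np.asarray(Y_pre, dtype=float)
    if Z is not None:
        X = np.vstack([X, np.asarray(Z, dtype=float)])
    # Scale each predictor (row) by its spread across units so that
    # predictors measured in different units are comparable.
    scale = X.std(axis=1, keepdims=True)
    scale[scale == 0] = 1.0  # leave constant predictors unscaled
    X = X / scale
    return X[:, 0], X[:, 1:]

# Made-up numbers: 3 pre-treatment periods, 1 treated unit + 2 donors, 1 covariate.
Y_pre = np.array([[100.0, 90.0, 110.0],
                  [104.0, 95.0, 113.0],
                  [108.0, 97.0, 119.0]])
Z = np.array([[2.5, 1.8, 3.1]])
X1, X0 = build_predictors(Y_pre, Z)
```

Here X1 has one entry per predictor and X0 has one column per donor, which is the shape the weight optimization in the next step expects.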
2. Weight Optimization
Synthetic control weights are chosen to minimize the discrepancy between the treated unit and the synthetic control in predictors, subject to convexity constraints (weights are non-negative and sum to one). The optimization involves:
- Inner optimization: Find weights that best match the treated unit’s predictors.
- Outer optimization: Optionally, choose the predictor-importance matrix V (typically diagonal) so that the weights it produces minimize the pre-treatment prediction error of the outcome.
Regularization (e.g., ridge or entropy penalties) can be added to address non-uniqueness and reduce overfitting.
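The inner optimization can be sketched in a few lines. The version below is an illustration under stated assumptions: it uses projected gradient descent onto the simplex rather than a dedicated quadratic-programming solver, it takes V as a fixed diagonal (the vector v), and it supports only a ridge penalty. The names fit_sc_weights and project_to_simplex are hypothetical, and the outer search over V is not shown.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    tau = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - tau, 0.0)

def fit_sc_weights(X1, X0, v=None, ridge=0.0, n_iter=5000):
    """Minimize (X1 - X0 w)' diag(v) (X1 - X0 w) + ridge * ||w||^2 over the simplex."""
    X1 = np.asarray(X1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    n_pred, n_donors = X0.shape
    v = np.ones(n_pred) if v is None else np.asarray(v, dtype=float)
    A = X0.T @ (v[:, None] * X0) + ridge * np.eye(n_donors)  # half the Hessian
    b = X0.T @ (v * X1)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(A)[-1])  # 1 / Lipschitz constant
    w = np.full(n_donors, 1.0 / n_donors)           # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * (A @ w - b)
        w = project_to_simplex(w - step * grad)
    return w

# Toy check: the treated unit is an exact convex combination of three donors.
X0 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 1.0, 1.0]])
w_true = np.array([0.5, 0.3, 0.2])
X1 = X0 @ w_true
w_hat = fit_sc_weights(X1, X0)
```

Setting ridge above zero pulls the solution toward more evenly spread weights, which is one way to pick a single answer when several weight vectors fit the predictors equally well.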
3. Connection to Factor Models
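The method is grounded in a factor model framework. Matching pre-treatment outcomes helps align the treated unit’s latent factor loadings with those of the synthetic control. Under that model, a close fit over a long pre-treatment window limits the bias of the estimate, but a good fit alone does not guarantee identification: with few pre-treatment periods or noisy outcomes, the weights can end up fitting noise rather than the underlying factors. In practice, the fit is always approximate.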
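For readers who want the notation, the linear factor model usually written down for this argument (in the style of Abadie, Diamond, and Hainmueller) is shown below. The symbols are not defined elsewhere in this post and are introduced here as assumptions: unit 1 is treated, units 2 through J+1 are donors, and T_0 is the last pre-treatment period.

```latex
% Untreated potential outcome for unit i in period t
Y_{it}^{N} = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it}

% Synthetic control estimate of the effect on the treated unit
\hat{\tau}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt}, \qquad t > T_0
```

Here \delta_t is a common time effect, Z_i are observed covariates with time-varying coefficients \theta_t, \mu_i are unobserved factor loadings with common factors \lambda_t, and \varepsilon_{it} is noise. Weights that reproduce the treated unit's Z_1 and its pre-treatment outcomes are meant to reproduce \mu_1 as well, which is what the matching step is really after.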
4. Donor Pool Curation
Careful selection of the donor pool is essential. Exclude:
- Units that received similar treatments
- Units affected by spillovers from the treated unit
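- Units whose outcomes are driven by fundamentally different dynamics than the treated unit's (averaging over very dissimilar units invites interpolation bias)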
Pre-specify the donor pool and report sensitivity analyses to ensure credibility.
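One lightweight way to pre-specify the pool is to record each exclusion and its reason in code before any weights are estimated. The sketch below assumes each candidate has already been reviewed by hand and flagged; the Unit fields, the curate_donor_pool function, and the market names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str
    received_similar_treatment: bool = False
    exposed_to_spillover: bool = False
    structurally_comparable: bool = True

def curate_donor_pool(candidates):
    """Return (kept names, {dropped name: reason}) so exclusions are auditable."""
    kept, dropped = [], {}
    for u in candidates:
        if u.received_similar_treatment:
            dropped[u.name] = "received a similar treatment"
        elif u.exposed_to_spillover:
            dropped[u.name] = "affected by spillovers from the treated unit"
        elif not u.structurally_comparable:
            dropped[u.name] = "fundamentally different from the treated unit"
        else:
            kept.append(u.name)
    return kept, dropped

# Hypothetical candidate markets, flagged during manual review.
candidates = [
    Unit("market_a"),
    Unit("market_b", received_similar_treatment=True),
    Unit("market_c", exposed_to_spillover=True),
    Unit("market_d", structurally_comparable=False),
    Unit("market_e"),
]
kept, dropped = curate_donor_pool(candidates)
```

Keeping the dropped dictionary alongside the results makes it easy to rerun the analysis with an excluded unit added back as a sensitivity check.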
5. Diagnostics and Reporting
- Pre-treatment fit: Use root mean squared prediction error (RMSPE) over the pre-treatment window to assess fit, but remember that a good fit is necessary, not sufficient, for a credible estimate.
- Transparency: Report the weights and document every design choice (predictors, donor pool, regularization) so that readers can audit and reproduce the analysis.
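As a small illustration of the fit diagnostic, the pre-treatment RMSPE can be computed directly from the outcome series and the fitted weights. The function name pre_treatment_rmspe, the array layout, and the numbers are made up for this sketch.

```python
import numpy as np

def pre_treatment_rmspe(y_treated, Y_donors, w, t0):
    """RMSPE between the treated unit and its synthetic control over periods [0, t0).

    y_treated : (T,) outcome series for the treated unit.
    Y_donors  : (T, J) outcome series for the donors, one column per donor.
    w         : (J,) synthetic control weights.
    t0        : number of pre-treatment periods.
    """
    y_treated = np.asarray(y_treated, dtype=float)
    synthetic = np.asarray(Y_donors, dtype=float) @ np.asarray(w, dtype=float)
    gap = y_treated[:t0] - synthetic[:t0]
    return float(np.sqrt(np.mean(gap ** 2)))

# Made-up example: 3 pre-treatment periods, 1 post-treatment period, 2 donors.
y_treated = np.array([10.0, 12.0, 14.0, 20.0])
Y_donors = np.array([[8.0, 10.0],
                     [12.0, 12.0],
                     [16.0, 16.0],
                     [15.0, 17.0]])
w = np.array([0.5, 0.5])
rmspe = pre_treatment_rmspe(y_treated, Y_donors, w, t0=3)
```

The pre-treatment gaps in this example are 1, 0, and -2, so the RMSPE is the square root of 5/3, roughly 1.29 in outcome units. Whether that counts as a good fit depends on the scale and volatility of the outcome.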
6. Practical Workflow
- Assemble data for treated and donor units
- Define intervention timing
- Curate the donor pool
- Select predictors
- Choose estimation approach (standard or penalized SC)
- Solve for weights
- Evaluate pre-treatment fit
- If fit is poor, revisit donor pool or constraints
- Report weights and diagnostics
Synthetic control offers a transparent, design-based approach to causal inference in marketing, but its credibility depends on careful design, diagnostics, and reporting.