MMM 602: Constructing the Synthetic Control

In this post, we walk through the practical and theoretical steps for constructing a synthetic control, a method for causal inference in marketing and other panel data settings. The approach estimates the effect of an intervention (such as a marketing campaign) by comparing the treated unit to a weighted combination of control units (the donor pool), with weights chosen so that the combination resembles the treated unit's pre-treatment characteristics as closely as possible.

1. Predictor Selection

The first step is to select predictors: the variables used to match the treated unit to the donor pool. These typically include:

Pre-treatment outcomes: Historical values of the outcome variable, capturing trends, seasonality, and baseline levels.
Covariates: Additional variables (e.g., market size, demographics) that predict outcomes but may not be fully reflected in the outcome trajectory.

Matching on pre-treatment outcomes is crucial, as it proxies for matching on latent factors that drive outcome dynamics. Covariates help when pre-treatment periods are short or the donor pool is heterogeneous.
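To make the predictor step concrete, here is a minimal sketch of how pre-treatment outcomes and covariates might be stacked into a single predictor matrix. The function name `build_predictor_matrix`, the array layout (rows are predictors, columns are units), and the toy numbers are all assumptions made for illustration. Scaling each predictor to unit standard deviation is one common choice, not the only one.

```python
import numpy as np

def build_predictor_matrix(pre_outcomes, covariates):
    """Stack pre-treatment outcomes and covariates into one predictor matrix.

    pre_outcomes: shape (T0, n_units), one row per pre-treatment period.
    covariates:   shape (K, n_units), one row per covariate.
    Returns shape (T0 + K, n_units), each row scaled to unit standard
    deviation across units so no predictor dominates by scale alone.
    """
    stacked = np.vstack([
        np.asarray(pre_outcomes, dtype=float),
        np.asarray(covariates, dtype=float),
    ])
    scale = stacked.std(axis=1, keepdims=True)
    scale[scale == 0] = 1.0  # leave constant rows unscaled rather than divide by zero
    return stacked / scale

# Hypothetical toy data: column 0 is the treated market, columns 1-3 are donors.
pre_outcomes = np.array([
    [100.0, 90.0, 110.0, 105.0],
    [104.0, 93.0, 113.0, 109.0],
    [108.0, 95.0, 118.0, 112.0],
])
covariates = np.array([
    [2.5, 1.8, 3.1, 2.7],  # e.g. market size (made-up values)
])

X = build_predictor_matrix(pre_outcomes, covariates)
x1, X0 = X[:, 0], X[:, 1:]  # treated unit's predictors, donor predictors
print(X.shape, x1.shape, X0.shape)
```

In this layout, `x1` and `X0` are the inputs a weight solver would take. With many pre-treatment periods, one might average outcomes over sub-windows instead of using every period as its own predictor.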

2. Weight Optimization

Synthetic control weights are chosen to minimize the discrepancy between the treated unit and the synthetic control in predictors, subject to convexity constraints (weights are non-negative and sum to one). The optimization involves:

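Inner optimization: For a given predictor-importance matrix V, find the weights that best match the treated unit’s predictors.
Outer optimization: Optionally, choose V itself (the relative importance of each predictor) so that the resulting synthetic control minimizes pre-treatment prediction error in the outcome.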

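Regularization (e.g., ridge or entropy penalties) can be added to address non-uniqueness, which arises when many different weight vectors fit the treated unit equally well (for example, when it lies inside the convex hull of the donors), and to reduce overfitting to pre-treatment noise.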
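The inner optimization can be sketched in a few lines. The version below is an illustrative implementation, not a reference one: it uses projected gradient descent with a Euclidean projection onto the simplex, holds V fixed as a diagonal matrix (passed as the vector `v`), and offers an optional ridge penalty. The function names, the solver choice, and the toy data are our assumptions; the outer optimization over V is not shown.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = ks[u - (css - 1.0) / ks > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

def solve_sc_weights(x1, X0, v=None, ridge=0.0, n_iter=5000):
    """Minimize sum_p v_p * (x1_p - (X0 @ w)_p)**2 + ridge * ||w||^2
    subject to w >= 0 and sum(w) = 1.

    x1: treated unit's predictors, shape (P,)
    X0: donor predictors, shape (P, J)
    v:  diagonal of the predictor-importance matrix V (defaults to all ones)
    """
    x1 = np.asarray(x1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    n_pred, n_donors = X0.shape
    v = np.ones(n_pred) if v is None else np.asarray(v, dtype=float)
    A = X0.T @ (v[:, None] * X0) + ridge * np.eye(n_donors)  # half the Hessian
    b = X0.T @ (v * x1)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(A)[-1])  # 1 / Lipschitz constant of the gradient
    w = np.full(n_donors, 1.0 / n_donors)           # start from uniform weights
    for _ in range(n_iter):
        grad = 2.0 * (A @ w - b)
        w = project_to_simplex(w - step * grad)
    return w

# Hypothetical toy problem: 4 predictors, 3 donors. The treated unit is built
# as a known mix of donors 1 and 2 so the result can be sanity-checked.
X0 = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [0.5, 1.5, 2.0],
    [1.5, 0.5, 1.0],
])
x1 = X0 @ np.array([0.6, 0.4, 0.0])
w = solve_sc_weights(x1, X0)
print(np.round(w, 3))
```

Setting `ridge` above zero pulls the solution toward more evenly spread weights, which is one way to pick a single answer when the unpenalized problem has many. A general-purpose constrained solver would work equally well here; projected gradient was chosen only to keep the sketch dependency-light.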

3. Connection to Factor Models

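The method is grounded in a factor model framework, in which untreated outcomes are driven by unobserved common factors with unit-specific loadings. Matching pre-treatment outcomes helps align the treated unit’s latent factor loadings with those of the synthetic control. Under this model, a close fit over a long pre-treatment period bounds the estimator's bias, but good fit alone does not guarantee identification: with a short pre-period or noisy outcomes, the weights can fit noise rather than the underlying factors, and in practice the fit is only approximate.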
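For readers who want the model written out, the linear factor model commonly used in the synthetic control literature can be stated as follows. The notation is ours, not something defined earlier in this post: unit 1 is the treated unit, units 2 through J+1 are donors, and T_0 is the last pre-treatment period.

```latex
% Untreated potential outcome for unit i in period t
Y_{it}^{N} = \delta_t + \theta_t^{\top} Z_i + \lambda_t^{\top} \mu_i + \varepsilon_{it}

% Synthetic control estimate of the effect on the treated unit after T_0
\hat{\tau}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j \, Y_{jt}, \qquad t > T_0
```

Here \delta_t is a common time effect, Z_i are observed covariates with time-varying coefficients \theta_t, \lambda_t are unobserved common factors, \mu_i are the unit-specific loadings, and \varepsilon_{it} are transitory shocks. The loadings \mu_i cannot be matched directly, which is why matching on a long run of pre-treatment outcomes serves as a proxy.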

4. Donor Pool Curation

Careful selection of the donor pool is essential. Exclude:

Units that received similar treatments
Units affected by spillovers from the treated unit
Units fundamentally different from the treated unit

Pre-specify the donor pool and report sensitivity analyses to ensure credibility.
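One way to make donor pool decisions auditable is to encode the exclusion rules and log a reason for every dropped unit. The sketch below assumes each candidate has been flagged by hand beforehand. The `Unit` fields, function names, and market names are hypothetical, and the leave-one-out helper is just one simple form of sensitivity analysis.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    received_similar_treatment: bool = False
    exposed_to_spillover: bool = False
    structurally_comparable: bool = True

def curate_donor_pool(candidates):
    """Return (kept donor names, {excluded name: reason})."""
    kept, excluded = [], {}
    for unit in candidates:
        if unit.received_similar_treatment:
            excluded[unit.name] = "received a similar treatment"
        elif unit.exposed_to_spillover:
            excluded[unit.name] = "affected by spillovers from the treated unit"
        elif not unit.structurally_comparable:
            excluded[unit.name] = "fundamentally different from the treated unit"
        else:
            kept.append(unit.name)
    return kept, excluded

def leave_one_out_pools(donors):
    """Donor pools with one donor dropped at a time, for sensitivity checks."""
    return [[d for d in donors if d != dropped] for dropped in donors]

# Hypothetical candidate markets.
candidates = [
    Unit("market_a"),
    Unit("market_b", received_similar_treatment=True),
    Unit("market_c", exposed_to_spillover=True),
    Unit("market_d"),
    Unit("market_e", structurally_comparable=False),
]
donors, exclusion_log = curate_donor_pool(candidates)
print(donors)
print(exclusion_log)
print(leave_one_out_pools(donors))
```

Re-estimating the weights on each leave-one-out pool shows whether the result hinges on a single donor.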

5. Diagnostics and Reporting

Pre-treatment fit: Use root mean squared prediction error (RMSPE) over the pre-treatment period to assess fit, but remember that a low RMSPE is necessary, not sufficient, for a credible estimate.
Transparency: Report the weights and document every design choice (predictors, donor pool, regularization) so that readers can audit and reproduce the analysis.
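The RMSPE itself is a short computation. A minimal sketch follows, with a hypothetical function name and made-up series; it takes the treated unit's pre-treatment outcomes and the synthetic control's values over the same periods.

```python
import math

def rmspe(actual, synthetic):
    """Root mean squared prediction error between two equal-length series."""
    if len(actual) != len(synthetic) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(a - s) ** 2 for a, s in zip(actual, synthetic)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical pre-treatment series for the treated unit and its synthetic control.
pre_actual = [100.0, 104.0, 108.0, 111.0]
pre_synthetic = [101.0, 103.5, 108.5, 110.0]
print(rmspe(pre_actual, pre_synthetic))
```

RMSPE is in the units of the outcome, so it is easier to judge when compared to the outcome's typical level or variability than when read as a raw number.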

6. Practical Workflow

Assemble data for treated and donor units
Define intervention timing
Curate the donor pool
Select predictors
Choose estimation approach (standard or penalized SC)
Solve for weights
Evaluate pre-treatment fit
If fit is poor, revisit donor pool or constraints
Report weights and diagnostics
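Once weights are in hand, the last steps of the workflow reduce to building the synthetic series and comparing it with the treated unit. The sketch below assumes the weights have already been estimated. The function name, the array layout (rows are time periods, columns are donors), and the numbers are illustrative assumptions.

```python
import numpy as np

def synthetic_control_gaps(y_treated, Y_donors, weights, t0):
    """Compare the treated unit with its synthetic control.

    y_treated: outcomes for the treated unit, shape (T,)
    Y_donors:  outcomes for the donors, shape (T, J)
    weights:   donor weights, shape (J,)
    t0:        number of pre-treatment periods
    """
    y_treated = np.asarray(y_treated, dtype=float)
    Y_donors = np.asarray(Y_donors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    synthetic = Y_donors @ weights   # weighted combination of donors
    gaps = y_treated - synthetic     # treated minus synthetic, per period
    return {
        "synthetic": synthetic,
        "pre_gaps": gaps[:t0],       # should be near zero if fit is good
        "post_gaps": gaps[t0:],      # period-by-period effect estimates
        "avg_post_gap": float(gaps[t0:].mean()),
    }

# Hypothetical data: 2 pre-treatment periods, 2 post-treatment periods, 2 donors.
Y_donors = np.array([
    [10.0, 20.0],
    [12.0, 22.0],
    [14.0, 24.0],
    [16.0, 26.0],
])
y_treated = np.array([15.0, 17.0, 22.0, 24.0])
result = synthetic_control_gaps(y_treated, Y_donors, weights=[0.5, 0.5], t0=2)
print(result["pre_gaps"], result["post_gaps"], result["avg_post_gap"])
```

Large pre-period gaps are the signal to go back to the donor pool or constraints before reading anything into the post-period gaps.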

Synthetic control offers a transparent, design-based approach to causal inference in marketing, but its credibility depends on careful design, diagnostics, and reporting.