MMM 708: Tuning, Implementation, and Donor Curation

Implementing hybrid methods in marketing panels means making a series of design choices: which predictors to include, how to tune penalties, and how to curate the donor pool. These choices interact with the estimator and affect both pre-treatment fit and the credibility of the counterfactual. The central tension is that pushing too hard on prediction accuracy in the pre-period can harm causal identification if the model then extrapolates poorly. This section offers practical guidance on navigating that trade-off in a way that respects the identification logic developed earlier in the chapter.

Predictor Selection

Predictor selection plays different roles for different hybrid methods.

For synthetic control and ridge SC, the predictor set defines the space in which the treated unit must be approximated. Including more pre-treatment outcome periods and key covariates raises the dimensionality of the matching problem: a close match then pins down more features of the treated unit, but it also becomes harder for the treated unit to lie inside the donors’ convex hull. In marketing panels, a small number of latent demand and seasonality factors often drive most of the variation, so a modest number of carefully chosen pre-periods can go a long way. In factor-model terms (Chapter 6), you need the pre-treatment window to be long enough that $T_0$ (the last pre-treatment period, and so the number of pre-periods) is comfortably larger than the number of factors $R$, so that the pre-periods can distinguish signal from noise; in typical marketing panels $R$ is small, so 12–24 well-chosen months can be enough even when the full panel is longer. This rule of thumb is safest when seasonality is stable, the pre-period contains no major breaks, and the donor pool is rich enough to support matching. If the panel is long and outcomes are highly autocorrelated, using every second or third period, or summarising history through moving averages, can reduce dimensionality without losing the structure that matters, provided these choices still capture the relevant seasonal and business-cycle patterns. Holding out a block of pre-treatment periods for validation, as in Section 7.3, helps check that the chosen predictor set generalises within the pre-period.

For ASCM, the augmentation model uses covariates to correct residual differences between the treated unit and its synthetic control. Here predictor selection is closer to standard outcome regression. You should include variables that plausibly predict outcomes and differ between treated and control units (such as store size, location demographics, competitive intensity or channel mix) and be cautious about mechanical variable selection that loads the model with weak predictors.
Pre-specify this covariate set based on institutional knowledge, together with a small number of pre-planned sensitivity checks that add or remove candidate variables. Do not include predictors that are themselves affected by treatment (or by anticipation), since that can build post-treatment information into the counterfactual.

For SDID, predictors enter implicitly through outcomes rather than through a separate covariate matrix. The algorithm constructs unit and time weights to balance pre-treatment paths directly, so the main design choice is the length and placement of the pre-treatment window. You still need enough pre-treatment periods to identify stable weight patterns, typically enough to span the relevant seasonal or business-cycle variation, but you do not have to specify a separate predictor set.

For TROP and related factor-based hybrids, predictors matter through the distance metrics used for unit weighting and through the factor model itself. The considerations mirror those for synthetic control and interactive fixed-effects models in Chapter 6: you need enough pre-treatment periods to identify a low-rank structure and enough variation across units to distinguish factor loadings. In short marketing panels with only a handful of pre-periods, the factor component will necessarily be simple, and you should treat any rich factor structure as exploratory rather than definitive.
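As a rough illustration of the dimensionality-reduction options mentioned above for long, autocorrelated panels, the sketch below thins a pre-treatment outcome matrix, summarises it with non-overlapping block means (a simple stand-in for moving averages), and reports how much variation the leading singular values explain as an informal gauge of how small $R$ might be. The function names and the simulated two-factor panel are illustrative assumptions, not part of any library or of the chapter's own code.

```python
import numpy as np

def thin_periods(Y_pre, step):
    """Keep every `step`-th pre-treatment period (rows are periods, columns are units)."""
    return Y_pre[::step]

def block_means(Y_pre, window):
    """Average non-overlapping blocks of `window` periods.

    If the number of periods is not a multiple of `window`, the earliest
    leftover periods are dropped so the most recent history is kept.
    """
    T = Y_pre.shape[0]
    n_blocks = T // window
    trimmed = Y_pre[T - n_blocks * window:]
    return trimmed.reshape(n_blocks, window, -1).mean(axis=1)

def singular_value_shares(Y_pre):
    """Share of unit-demeaned variation captured by each singular value."""
    centred = Y_pre - Y_pre.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centred, compute_uv=False)
    return s**2 / np.sum(s**2)

# Hypothetical panel: 36 monthly pre-periods, 20 units, two latent factors plus noise.
rng = np.random.default_rng(0)
T0, N, R = 36, 20, 2
factors = np.column_stack([
    np.sin(2 * np.pi * np.arange(T0) / 12),   # seasonal factor
    np.linspace(0.0, 1.0, T0),                # slow demand trend
])
loadings = rng.normal(size=(R, N))
Y_pre = factors @ loadings + 0.05 * rng.normal(size=(T0, N))

X_thin = thin_periods(Y_pre, step=3)     # every third month
X_avg = block_means(Y_pre, window=3)     # quarterly averages
shares = singular_value_shares(Y_pre)

print("thinned predictors:", X_thin.shape)
print("block-mean predictors:", X_avg.shape)
print("variation explained by first three singular values:", np.round(shares[:3], 3))
```

If the first few shares account for nearly all of the variation, a short predictor set is more defensible; if they do not, thinning or averaging may discard structure that the weights need.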

Penalty Parameter Tuning

Regularisation penalties control the balance between fit and stability. Tuning them well is critical for hybrids that rely on weighting and augmentation.

For ridge SC, the ridge penalty on the weights governs how concentrated or diffuse the donor weights become. Small penalties allow the optimisation to place large weight on one or two donors and chase tight pre-treatment fit. Larger penalties spread weight across more donors and sacrifice some fit in exchange for stability. A practical way to choose the penalty is to split the pre-treatment period into a training block and a validation block, estimate ridge SC over a reasonable grid of penalty values on the training block and pick the value that predicts the validation block best. Cross-validation here tunes out-of-sample prediction within the pre-treatment period. Good predictive performance is necessary for credible identification, but it does not on its own guarantee that the same weights produce unbiased counterfactuals after treatment. The exact grid is not sacred; what matters is that you explore a range from very little shrinkage to quite strong shrinkage and that the selected value is not at an extreme of the grid.

For ASCM, the outcome regression often includes its own regularisation, such as ridge or elastic net on the covariate coefficients. The same training–validation logic applies. Given the typically short pre-periods in marketing panels, it is safer to err on the side of modest complexity in the augmentation model. The double-robustness logic discussed in Section 7.2 provides some insurance across the weighting and regression components, but in finite samples both components are estimated and often both are somewhat wrong. Regularisation should therefore be viewed as a way to tame variance and overfitting in the regression step, not as a licence to greatly increase covariate or functional-form complexity.
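The training and validation split described above for ridge SC can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the weights are constrained to sum to one but the non-negativity constraint is dropped so that a closed-form solution exists (weights can therefore be negative), the panel is simulated, and the function names `ridge_sc_weights` and `select_penalty` are hypothetical.

```python
import numpy as np

def ridge_sc_weights(Y0, y1, lam):
    """Minimise ||y1 - Y0 w||^2 + lam * ||w||^2 subject to sum(w) = 1.

    Y0: (T, J) donor outcomes, y1: (T,) treated outcomes, lam > 0.
    Simplifying assumption: no non-negativity constraint on w.
    """
    J = Y0.shape[1]
    A = Y0.T @ Y0 + lam * np.eye(J)
    ones = np.ones(J)
    a = np.linalg.solve(A, Y0.T @ y1)   # unconstrained ridge solution
    b = np.linalg.solve(A, ones)        # direction that enforces sum-to-one
    return a + b * (1.0 - ones @ a) / (ones @ b)

def select_penalty(Y0_pre, y1_pre, n_val, grid):
    """Fit on the early pre-periods, score on the last `n_val` pre-periods."""
    Y0_tr, Y0_va = Y0_pre[:-n_val], Y0_pre[-n_val:]
    y1_tr, y1_va = y1_pre[:-n_val], y1_pre[-n_val:]
    rmspe = np.array([
        np.sqrt(np.mean((y1_va - Y0_va @ ridge_sc_weights(Y0_tr, y1_tr, lam)) ** 2))
        for lam in grid
    ])
    best = int(np.argmin(rmspe))
    at_edge = best in (0, len(grid) - 1)   # warning flag: optimum at a grid extreme
    return grid[best], rmspe, at_edge

# Simulated pre-treatment panel: 24 periods, 8 donors, two latent factors.
rng = np.random.default_rng(1)
T0, J = 24, 8
factors = rng.normal(size=(T0, 2))
Y0_pre = factors @ rng.normal(size=(2, J)) + 0.1 * rng.normal(size=(T0, J))
true_w = np.array([0.4, 0.3, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0])
y1_pre = Y0_pre @ true_w + 0.1 * rng.normal(size=T0)

grid = np.logspace(-3, 3, 13)   # from very light to quite strong shrinkage
best_lam, rmspe, at_edge = select_penalty(Y0_pre, y1_pre, n_val=6, grid=grid)
print("selected penalty:", best_lam, "| at grid edge:", at_edge)
print("validation RMSPE by penalty:", np.round(rmspe, 3))
```

If `at_edge` is true, widening the grid before settling on a value is the natural next step. As the penalty grows, the sum-to-one constraint pulls the weights towards uniform, which mirrors the spreading behaviour described in the text.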
For SDID, the main design choice is the implementation of the optimisation problem that delivers unit and time weights. Standard implementations incorporate stabilisation internally, so you do not typically search over penalty values by hand. You do, however, need to check that the optimisation has converged and that the resulting weights are interpretable — not all mass on one donor, not all mass on one pre-treatment period, and broadly in line with your understanding of which markets and periods are comparable. Even when you do not tune a single penalty explicitly, SDID still regularises implicitly through its objective and constraints. Treat the resulting unit and time weights as tuned objects, not as fixed design inputs.

7.8 Tuning, Implementation, and Donor Curation

For TROP and other flexible factor-based hybrids, penalty tuning is more demanding: you must decide how sharply to concentrate unit weights, how local to make time weights and how complex to make the factor structure. One approach, discussed in Section 7.5, is to use staged cross-validation [Athey et al., 2025b], tuning time weights first, then unit weights, then the factor penalty, each time using held-out untreated observations to assess predictive performance. This staged scheme is a pragmatic heuristic, not a theoretically guaranteed route to causal validity. Given the method’s current research status and lack of off-the-shelf software, we view these tuning rules as guidance for methodological work rather than prescriptions for routine marketing analysis.

Connection Between Tuning and Diagnostics

Tuning choices show up directly in the diagnostics you report, and diagnostics should in turn inform how you interpret tuning.

Pre-treatment RMSPE will vary with the strength of regularisation. Tight fit with very small penalties can signal that the weights are overfitting idiosyncratic donor fluctuations. Loose fit with very large penalties can signal that the estimator is effectively averaging over donors and ignoring meaningful structure. The key is not to maximise or minimise RMSPE mechanically, but to look at how RMSPE behaves across a range of penalties and how that behaviour lines up with weight patterns and institutional knowledge. If only extreme penalties deliver good validation performance or plausible weights, that is a warning sign.

Weight dispersion is another informative diagnostic. You can summarise how many donors meaningfully contribute by using, for example, an effective donor count $N_{\text{eff}} = \left(\sum_j \hat{w}_j^2\right)^{-1}$. Because $\sum_j \hat{w}_j^2 = 1/N_{\text{eff}}$ is the estimated counterpart of the sum inside the idiosyncratic-variance term $\sigma^2\left(1 + \sum_j (w_j^*)^2\right)$ from equation (6.24), a very small $N_{\text{eff}}$ mechanically inflates variance even when pre-period fit is tight. Values near one indicate near-singleton donor reliance and high sensitivity to that donor’s idiosyncrasies. Values near the number of donors indicate near-uniform weighting. In most marketing panels you want an interior solution where a modest number of donors carry most of the weight and those donors make sense given the business context.

Sensitivity to tuning choices completes the picture. If your estimated treatment effect changes little across a broad range of penalty values that all deliver acceptable pre-treatment fit, your conclusions are unlikely to hinge on fine details of the regularisation.
If estimates swing sharply as you move the penalty, especially in regions of the grid where fit and weights look similar, then any point estimate should be presented with caution and accompanied by a discussion of that instability. Complement these summaries with simple weight plots (sorted bar charts for donor weights, line plots for SDID time weights) so that outliers are obvious. Flat numerical diagnostics can hide pathologies that are visually clear.
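The dispersion and sensitivity summaries above are simple to compute. The sketch below evaluates the effective donor count, the implied multiplier on $\sigma^2$, and the spread of estimates across a penalty grid. The weight vectors and the list of estimates are made-up illustrations, and the helper names are hypothetical.

```python
import numpy as np

def effective_donor_count(weights):
    """N_eff = 1 / sum_j w_j^2, for weights that sum to one."""
    w = np.asarray(weights, dtype=float)
    return 1.0 / np.sum(w**2)

def variance_multiplier(weights):
    """The factor (1 + sum_j w_j^2) = (1 + 1/N_eff) that scales sigma^2."""
    return 1.0 + 1.0 / effective_donor_count(weights)

def effect_range(estimates):
    """Spread of treatment-effect estimates across the penalty grid."""
    est = np.asarray(estimates, dtype=float)
    return float(est.max() - est.min())

# Illustrative weight patterns over five donors.
patterns = {
    "singleton": [1.0, 0.0, 0.0, 0.0, 0.0],
    "interior": [0.5, 0.3, 0.2, 0.0, 0.0],
    "uniform": [0.2, 0.2, 0.2, 0.2, 0.2],
}
for name, w in patterns.items():
    print(name,
          "N_eff =", round(effective_donor_count(w), 2),
          "variance multiplier =", round(variance_multiplier(w), 2))

# Made-up effect estimates from refitting at several penalty values.
estimates_by_penalty = [0.112, 0.108, 0.115, 0.110]
print("range of estimates across penalties:", round(effect_range(estimates_by_penalty), 3))
```

The same `effective_donor_count` function applies to SDID time weights, where a value near one would mean that a single pre-treatment period carries almost all of the mass.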

Donor Pool Curation

Donor curation is at least as important as tuning. As in Chapter 6, donors must be comparable to treated units, free of treatment and unaffected by spillovers.

Comparability comes first. In a retail setting, it is rarely sensible to use outlets from very different formats, regions or customer segments as donors. Restrict the pool to units that plausibly face similar demand, competition and operational constraints. This often means staying within the same banner or region, or at least within a relatively homogeneous subset of markets.

Absence of treatment and contamination is next. Donor units must not themselves receive the intervention during the window you use for estimation and evaluation. More subtly, they should not be exposed to strong spillovers through competition, supply chains or shared customers. Buffered designs, where you drop donors within a certain geographic or competitive radius of treated units, are a simple way to reduce contamination, but they also shrink the donor pool. Hybrid estimators can sometimes tolerate a smaller, cleaner donor pool by using augmentation or regularisation to repair fit, but they cannot recover information that is simply not there. Overly aggressive donor restrictions can violate the convex-hull or overlap conditions in Section 7.6 by removing donors that help span the treated unit in predictor space. Curation should therefore balance business comparability against the need for geometric coverage. If spillovers are plausible, you either curate donors so the exposure mapping $h_i(D_{-i,t})$ is effectively constant (often zero), or you commit to an explicit exposure model.

These design choices interact with the method. Synthetic control needs donors that span the treated unit in the space of predictors. Excluding too many donors can push the treated unit outside the convex hull and force the estimator into extrapolation.
ASCM can absorb small extrapolations through its regression component, but still relies on donors that share the main drivers of outcomes. SDID requires that, after reweighting, treated and donor groups can plausibly satisfy weighted parallel trends. Factor-based hybrids such as TROP assume that donors and treated units share a common factor structure. In each case, donor curation should be justified both by business logic and by how the curated pool supports the specific hybrid’s assumptions.
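A geographic buffered design of the kind described above can be sketched in a few lines. The example drops any donor within a chosen radius of a treated unit, using great-circle distance, and reports how much of the pool survives. The store names, the 50 km buffer and the helper names are hypothetical; a competitive-radius buffer would need a different distance measure.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def curate_donors(donors, treated, buffer_km):
    """Split donors into kept and dropped by distance to the nearest treated unit."""
    kept, dropped = {}, {}
    for name, location in donors.items():
        nearest = min(haversine_km(location, t) for t in treated.values())
        target = dropped if nearest < buffer_km else kept
        target[name] = round(nearest, 1)   # record the distance for documentation
    return kept, dropped

# Hypothetical stores; coordinates are approximate city centres.
treated = {"london_flagship": (51.5074, -0.1278)}
donors = {
    "london_city": (51.5155, -0.0922),
    "reading": (51.4543, -0.9781),
    "manchester": (53.4808, -2.2426),
}

kept, dropped = curate_donors(donors, treated, buffer_km=50.0)
print("kept:", kept)
print("dropped:", dropped)
print(f"donor pool retained: {len(kept)} of {len(donors)}")
```

Recording the distances alongside the decision makes it easy to document the curation rule and to rerun the analysis with a wider or narrower buffer as a sensitivity check.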

Prediction vs Identification

It is tempting to treat pre-treatment prediction as the primary goal and to judge hybrids solely by how well they fit the pre-period. That is not what we want for causal work. The objective is to construct a counterfactual that approximates the treated unit’s untreated path in the post-treatment period, not to win a forecasting competition on pre-treatment data. Overfitting the pre-period by piling on predictors, using highly flexible models or driving penalties towards zero can produce impressive pre-treatment fit. But if the underlying factor structure or covariate relationships change at treatment (for example, a structural break in $f_t$ or a new channel shock), that same flexibility will happily track noise and then extrapolate it forward. Underfitting by imposing strong penalties or sparse models can produce looser pre-treatment fit but may extrapolate more stably when the pre-period is short or only partially representative of the post-period environment.

The practical lesson is to combine automatic tools, such as cross-validation within the pre-period, with economic judgement about how the intervention might change the data-generating process. Placebo checks, which treat pre-treatment periods as pseudo post-treatment periods, are especially valuable: they show whether an estimator that fits early pre-period data well can also predict later pre-period outcomes it has not seen.
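An in-time placebo check of the kind just described can be sketched as below. The estimator here is plain unconstrained least squares, used only as a stand-in so the example stays self-contained; in practice you would pass in whichever hybrid you are evaluating. The function names, the simulated panels and the placebo start dates are all illustrative assumptions.

```python
import numpy as np

def ols_weights(Y0, y1):
    """Stand-in estimator: unconstrained least-squares donor weights."""
    w, *_ = np.linalg.lstsq(Y0, y1, rcond=None)
    return w

def in_time_placebo(Y0_pre, y1_pre, placebo_start, fit=ols_weights):
    """Fit on periods before `placebo_start`, predict the remaining pre-periods.

    Returns the pseudo 'effects' (gaps), which should be close to zero because
    no treatment has occurred, plus in-sample and placebo RMSPE.
    """
    w = fit(Y0_pre[:placebo_start], y1_pre[:placebo_start])
    fit_resid = y1_pre[:placebo_start] - Y0_pre[:placebo_start] @ w
    gaps = y1_pre[placebo_start:] - Y0_pre[placebo_start:] @ w
    return gaps, np.sqrt(np.mean(fit_resid**2)), np.sqrt(np.mean(gaps**2))

rng = np.random.default_rng(2)
T0 = 20

# Case 1: treated unit is an exact combination of four donors (no noise).
Y0_clean = rng.normal(size=(T0, 4))
y1_clean = Y0_clean @ np.array([0.5, 0.3, 0.2, 0.0])
gaps_clean, fit_clean, placebo_clean = in_time_placebo(Y0_clean, y1_clean, placebo_start=14)

# Case 2: twelve noisy donors and only thirteen fitting periods, so the
# stand-in estimator can nearly interpolate the early pre-period.
Y0_noisy = rng.normal(size=(T0, 12))
y1_noisy = Y0_noisy[:, :3].mean(axis=1) + 0.3 * rng.normal(size=T0)
gaps_noisy, fit_noisy, placebo_noisy = in_time_placebo(Y0_noisy, y1_noisy, placebo_start=13)

print("clean case : in-sample RMSPE", fit_clean, "| placebo RMSPE", placebo_clean)
print("noisy case : in-sample RMSPE", fit_noisy, "| placebo RMSPE", placebo_noisy)

# Repeating the check at several pseudo start dates shows whether the
# placebo gaps are stable or depend on where the split falls.
for start in (12, 14, 16):
    g, _, p = in_time_placebo(Y0_clean, y1_clean, placebo_start=start)
    print("placebo start", start, "-> placebo RMSPE", p)
```

A large ratio of placebo RMSPE to in-sample RMSPE is the pattern to watch for: it suggests the estimator is fitting early pre-period noise rather than structure that carries forward.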

A Practical Workflow

A disciplined workflow for hybrids brings these elements together.

Start by pre-specifying a predictor set based on the business context, focusing on pre-treatment outcome histories and covariates that are plausibly related to both treatment assignment and outcomes. Decide in advance on a small number of alternative specifications that add or drop predictors to test robustness, rather than searching post hoc for combinations that deliver striking results.

Next, curate the donor pool. Exclude units that receive treatment during the estimation window, units that are clearly incomparable on business grounds and units that are likely to be heavily contaminated by spillovers. Document these choices and, where possible, illustrate how they affect pre-treatment fit and weight patterns.

Then tune regularisation using pre-treatment data. For ridge-based hybrids such as ridge SC and many ASCM implementations, use a training–validation split and a broad grid of penalties that spans from very light to quite strong shrinkage. For SDID, focus on checking convergence and inspecting the resulting unit and time weights rather than searching over penalties. For more complex hybrids like TROP, recognise that current tuning guidance is still research-oriented and only pursue that route if you have both the data and the technical capacity to justify it.

Finally, estimate treatment effects and present diagnostics. Report pre-treatment RMSPE, weight dispersion and sensitivity to tuning choices, not just point estimates. If a hybrid method achieves materially better pre-treatment fit than standard SC while producing plausible weights and stable estimates across tuning choices, the extra complexity is likely earning its keep. Better pre-treatment fit is supportive, but it is not sufficient if spillovers, anticipation, or breaks in the untreated outcome process remain plausible.
If the gains in fit are marginal and estimates are sensitive to tuning, the simpler design is usually preferable for transparency and ease of explanation. In all cases, the role of tuning and implementation is to support the identification arguments made earlier in the chapter, not to replace them. Careful predictor selection, thoughtful donor curation and transparent regularisation choices help ensure that hybrid methods deliver counterfactuals that are both statistically well behaved and substantively credible in real marketing applications.


References