Testing the Theory

In the final section of their paper, the authors subject TJAP to a battery of synthetic simulations to verify the theoretical regret bounds. They generate multi-market contextual MNL environments, carefully controlling the feature dimension ($d$), the number of source markets ($H$), and the sparsity of the preference shifts ($s_0$).
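
To make the setup concrete, here is a minimal sketch of such an environment: a shared base preference vector, an $s_0$-sparse shift per market, and a contextual MNL choice model over products. The variable names, the shift scale, and the outside-option convention are our illustrative assumptions, not the paper's exact generator.

```python
import numpy as np

rng = np.random.default_rng(0)

d, H, s0 = 10, 5, 2          # feature dim, source markets, shift sparsity
n_products = 8

# Shared "base" preference vector, plus an s0-sparse shift per market.
theta_base = rng.normal(size=d)

def shifted_theta(theta, s0, scale=0.5):
    delta = np.zeros(d)
    idx = rng.choice(d, size=s0, replace=False)  # coordinates that differ
    delta[idx] = rng.normal(scale=scale, size=s0)
    return theta + delta

theta_sources = [shifted_theta(theta_base, s0) for _ in range(H)]
theta_target = shifted_theta(theta_base, s0)

def mnl_choice_probs(X, prices, theta, price_sens=1.0):
    """Contextual MNL: utility = x_i @ theta - price_sens * p_i,
    with an outside (no-purchase) option of utility 0."""
    utils = X @ theta - price_sens * prices
    expu = np.exp(utils)
    denom = 1.0 + expu.sum()                     # 1.0 = outside option
    return expu / denom

X = rng.normal(size=(n_products, d))             # product features
prices = rng.uniform(1.0, 3.0, size=n_products)
print(mnl_choice_probs(X, prices, theta_target))
```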

They compare TJAP against three baselines:

  • CAP: A state-of-the-art joint assortment-pricing algorithm that learns only from the target market ($H=0$).
  • M3P / ONS-MPP: Pricing-centric algorithms forced to use heuristic assortment rules.
  • POOL(H): A naive approach that pools all data from the target and the $H$ source markets but performs no debiasing.

The Three Key Findings

1. More markets = faster learning (up to a point). As predicted by the $\sqrt{1/(1+H)}$ term in the regret bound, TJAP’s cumulative regret drops steadily as $H$ increases from 0 to 1, then 3, then 5.

2. Small $s_0$ yields huge gains; large $s_0$ tapers off. When the cross-market shifts were highly sparse ($s_0 / d$ small, e.g., 2 out of 10 features), TJAP with $H=5$ incurred a fraction of the regret of the single-market algorithms. However, as $s_0$ grew closer to the feature dimension (e.g., 15 out of 50 features), the gap between $H=5$ and $H=0$ narrowed considerably, because the algorithm must fall back on scarce target data to estimate the many shifted coordinates.

3. Naive pooling is operationally dangerous. When $s_0 > 0$, the POOL(H) baseline performed poorly. By blindly trusting source-market data on coordinates where the target market actually differed, it converged to biased parameters that no amount of additional data could correct. As a result, POOL with 5 source markets often performed worse than CAP with 0 source markets. The toy simulation below makes this concrete.
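
To keep the sketch short, we use a linear model instead of the paper's MNL utilities; the bias mechanism is the same. The sample sizes, noise level, and shift magnitude below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, s0 = 10, 2
n_source, n_target = 5000, 300   # abundant source data, scarce target data

theta_target = rng.normal(size=d)
delta = np.zeros(d)
delta[:s0] = 1.5                 # target differs on 2 of 10 coordinates
theta_source = theta_target + delta

def draw(n, theta, noise=0.1):
    X = rng.normal(size=(n, d))
    return X, X @ theta + rng.normal(scale=noise, size=n)

Xs, ys = draw(n_source, theta_source)
Xt, yt = draw(n_target, theta_target)

# POOL-style: one least-squares fit over everything, no debiasing.
# Dominated by source data, so it inherits the source bias.
Xp, yp = np.vstack([Xs, Xt]), np.concatenate([ys, yt])
theta_pool = np.linalg.lstsq(Xp, yp, rcond=None)[0]

# CAP-style: target data only, unbiased but higher variance.
theta_solo = np.linalg.lstsq(Xt, yt, rcond=None)[0]

print("pool error:", np.linalg.norm(theta_pool - theta_target))  # large
print("solo error:", np.linalg.norm(theta_solo - theta_target))  # small
```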

Practical Takeaways for Pricing Engines

For data scientists and pricing operators building multi-market revenue management systems, this paper offers several stark operational lessons:

Don’t reinvent the wheel, but don’t copy it either. When launching a new market, it is computationally and financially wasteful to start learning from a blank slate ($H=0$). Historical data is incredibly valuable for learning shared structure (like general product affinities). However, lifting a model trained in Market A directly into Market B without a debiasing step is a recipe for systematic pricing errors.

Estimation requires a pipeline, not a one-liner. The aggregate-then-debias architecture is highly practical. You can fit one expensive, pooled model on your data warehouse in an offline batch job once a week (Step 1). Then, online in the new market, you train a lightweight lasso-penalized residual model on fresh, local observations (Step 2) to adapt to local tastes.
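
Here is a minimal sketch of that two-step pattern, reusing the same toy linear setup as the earlier simulation rather than the paper's MNL model. `Lasso` comes from scikit-learn; its `alpha` and all dimensions below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
d, s0, n_source, n_target = 10, 2, 5000, 300

theta_target = rng.normal(size=d)
delta = np.zeros(d)
delta[:s0] = 1.5                 # sparse cross-market shift
theta_source = theta_target + delta

Xs = rng.normal(size=(n_source, d))
ys = Xs @ theta_source + rng.normal(scale=0.1, size=n_source)
Xt = rng.normal(size=(n_target, d))
yt = Xt @ theta_target + rng.normal(scale=0.1, size=n_target)

# Step 1 (offline, e.g. weekly): one big fit on all pooled data.
Xp, yp = np.vstack([Xs, Xt]), np.concatenate([ys, yt])
theta_pooled = np.linalg.lstsq(Xp, yp, rcond=None)[0]

# Step 2 (online, target market): a lasso on the *target residuals*
# recovers the sparse shift; the penalty keeps the correction sparse.
residuals = yt - Xt @ theta_pooled
correction = Lasso(alpha=0.05).fit(Xt, residuals).coef_
theta_local = theta_pooled + correction

print("pooled-only error:", np.linalg.norm(theta_pooled - theta_target))
print("after debiasing:  ", np.linalg.norm(theta_local - theta_target))
```

The lasso penalty is what makes Step 2 cheap and safe: most coordinates of `correction` come back exactly zero, so the online model only overrides the pooled fit where the local market genuinely differs.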

Reason about $s_0$ before you build. The degree of structural similarity ($s_0$) dictates the ROI of your transfer-learning infrastructure. If you are launching in a market with fundamentally distinct customer behavior, expensive multi-market data pipelines won’t save you; you simply need to buy time to collect local data.

Open Directions

This work opens up several exciting directions for applied research:

  1. Online Source Selection: TJAP assumes all $H$ sources share the $s_0$-sparse shift structure. In reality, some source markets might be entirely unrelated. The algorithm needs a mechanism to detect and eject “bad” source markets on the fly to prevent negative transfer; a toy gating heuristic is sketched after this list.
  2. Beyond Sparsity: What if markets differ smoothly rather than sparsely? Models using low-rank tensor factorization across markets, or latent geographic embeddings, might capture richer behavioral similarities.
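
As one toy illustration of direction 1 (not from the paper), a gating heuristic could score each source's fitted model on a held-out slice of target data and drop any source whose validation error blows up. The threshold and all data-generation choices below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_src, n_tgt = 10, 2000, 300

theta_t = rng.normal(size=d)
# Three sources: two share structure (small sparse shifts), one unrelated.
shifts = [np.r_[0.3, 0.3, np.zeros(d - 2)],
          np.r_[-0.3, 0.2, np.zeros(d - 2)],
          rng.normal(size=d) * 2.0]          # the "bad" source

Xv = rng.normal(size=(n_tgt, d))             # held-out target slice
yv = Xv @ theta_t + rng.normal(scale=0.1, size=n_tgt)

kept = []
for k, shift in enumerate(shifts):
    Xs = rng.normal(size=(n_src, d))
    ys = Xs @ (theta_t + shift) + rng.normal(scale=0.1, size=n_src)
    theta_k = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    err = np.mean((yv - Xv @ theta_k) ** 2)  # validation MSE on target
    if err < 1.0:                            # hypothetical threshold
        kept.append(k)
    print(f"source {k}: holdout MSE = {err:.2f}")
print("kept sources:", kept)
```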

This concludes our 5-part series on Transfer Learning for Joint Assortment Pricing, based on the 2026 paper by Chen, Chen & Zhang.