The Math Behind the Tradeoff

We’ve established that the TJAP algorithm uses an aggregate-then-debias approach paired with a two-radius UCB policy. But how well does it actually perform?

In Section 4 of their paper, Chen, Chen, and Zhang provide matching finite-time upper and lower regret bounds that cleanly separate the benefits of data pooling from the stubborn reality of structural mismatch.

Theorem 7: The Regret Upper Bound

Assuming the structure holds (continuous prices, MNL demand, $s_0$-sparse shifts), the expected cumulative regret of TJAP over horizon $T$ is bounded by:

$$ \text{Regret}(T) \lesssim d \sqrt{\frac{T}{1+H}} + s_0 \sqrt{T} $$

(Omitting logarithmic factors, metric entropy terms, and problem-dependent constants for clarity).

This bound is a beautiful, transparent quantification of the variance-bias tradeoff:

  1. The Shared Term ($d \sqrt{T / (1+H)}$): This is the statistical cost of learning the shared preference coordinates. Notice the denominator: as the number of source markets $H$ grows, this term shrinks toward zero. Transfer learning works here.
  2. The Shifted Term ($s_0 \sqrt{T}$): This is the residual transfer bias from the sparse preference shifts, and it must be paid with target-market data alone. It does not improve with $H$: no amount of source data can teach you about parameters unique to the target market. (The sketch below makes the contrast concrete.)
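
To get a feel for these magnitudes, here is a minimal Python sketch that evaluates both terms for hypothetical problem sizes. All constants, log factors, and metric-entropy terms are dropped, so only the relative magnitudes mean anything:

```python
import math

def tjap_bound_terms(d, s0, H, T):
    """Two terms of the Theorem 7 bound, constants and logs dropped.

    Illustrative scaling only; not the paper's exact expression.
    """
    shared = d * math.sqrt(T / (1 + H))   # shared coordinates; shrinks with H
    shifted = s0 * math.sqrt(T)           # sparse shifts; independent of H
    return shared, shifted

d, s0, T = 50, 2, 100_000                  # hypothetical problem sizes
solo = sum(tjap_bound_terms(d, s0, 0, T))  # H = 0: learning from scratch
for H in (0, 4, 24, 99):
    shared, shifted = tjap_bound_terms(d, s0, H, T)
    print(f"H={H:3d}  shared~{shared:8.0f}  shifted~{shifted:8.0f}  "
          f"gain over solo ~ {solo / (shared + shifted):4.1f}x")
```

The shared term falls by a factor of $\sqrt{1+H}$ while the shifted term does not move; with $s_0 \ll d$ the overall gain is large, and rerunning with $s_0 = 50$ (i.e., $s_0 \approx d$) makes it evaporate.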

Theorem 8: The Minimax Lower Bound

Upper bounds tell us how well a specific algorithm does. Lower bounds tell us how well any algorithm could possibly do. The authors establish a minimax lower bound over the structured preference-shift class:

$$ \inf_{\pi} \sup_{\text{instances}} \text{Regret}(T; \pi) \ge c_0 \left( \frac{\sqrt{(d - s_0)T}}{\sqrt{1+H}} + s_0\sqrt{T} \right) $$

Because the two forms match (up to log factors), TJAP is rate-optimal: the structural forces the algorithm isolates are fundamental to the problem itself, not artifacts of the analysis.
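
A quick sanity check, just algebra on the two displayed bounds rather than a claim from the paper: let the number of source markets grow without bound. The $H$-dependent terms in both bounds vanish, and the upper and lower bounds collapse onto the same irreducible target-specific cost:

$$ \lim_{H \to \infty} d\sqrt{\frac{T}{1+H}} = 0 \quad \Longrightarrow \quad \text{Regret}(T) \asymp s_0 \sqrt{T} \quad \text{(up to logs and constants)} $$

Even with infinite source data, the $s_0 \sqrt{T}$ floor remains.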

Structural Implications for Revenue Management

What do these bounds tell us operationally?

1. Transfer is fundamentally asymmetric. Transfer accelerates learning only along directions where markets are behaviorally aligned; it provides no benefit along directions where structural mismatch persists.

2. $s_0$ acts as a hard ceiling. When heterogeneity is highly localized (e.g., $s_0 = 1$ or $2$), transfer yields large gains over learning from scratch, as the numeric sketch above illustrates. If heterogeneity is diffuse ($s_0 \approx d$), the $s_0 \sqrt{T}$ term dominates, and data from $H$ source markets offers negligible improvement over ignoring them entirely.

3. Bias correction is not optional. If an algorithm pools naively (ignoring the $s_0$ shifts), its regret becomes linear, $O(T)$, due to uncorrected bias. TJAP’s $\ell_1$-debiasing step is what prevents this linear failure and restores the sublinear $\sqrt{T}$ rate for the target-specific parameters. A rough sketch of the aggregate-then-debias idea follows below.
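
To make the two-stage idea tangible, here is a deliberately simplified least-squares analogue, not the paper's MNL procedure: estimate the shared parameters on pooled source data, then fit the sparse target shift on the residuals with an $\ell_1$ penalty. All names, the linear model, and the use of scikit-learn's Lasso are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, n_src, n_tgt = 20, 5000, 300
theta = rng.normal(size=d)            # shared preference parameters
delta = np.zeros(d)
delta[:2] = 1.5                       # s0 = 2 sparse target shift

# Pooled source-market data (abundant) and target-market data (scarce).
X_src = rng.normal(size=(n_src, d))
y_src = X_src @ theta + 0.1 * rng.normal(size=n_src)
X_tgt = rng.normal(size=(n_tgt, d))
y_tgt = X_tgt @ (theta + delta) + 0.1 * rng.normal(size=n_tgt)

# Stage 1 (aggregate): estimate the shared parameters from pooled sources.
theta_hat = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

# Stage 2 (debias): an l1-penalized fit on the target residuals recovers
# the sparse shift without paying the full d-dimensional estimation cost.
residual = y_tgt - X_tgt @ theta_hat
delta_hat = Lasso(alpha=0.05).fit(X_tgt, residual).coef_

print("recovered shift coordinates:", np.flatnonzero(np.abs(delta_hat) > 0.1))
```

The $\ell_1$ penalty is what lets stage 2 pay only for the $s_0$ shifted coordinates rather than all $d$, which is one way to read why the second regret term scales with $s_0$ and not $d$.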

We have the theory; does it hold up in simulation? In our final post, we will look at the numerical experiments and summarize the practical takeaways for building multi-market pricing engines.