Section 5 of Sandoval, Waudby-Smith and Jordan (2026) is the technical core of the paper. It answers a question that standard bandit analyses take for granted: how do you construct confidence intervals for expected log-increments when the optimal portfolio is unknown?

Why standard UCB analyses fail

Standard UCB algorithms rely on two assumptions that don’t hold here:

  1. Observable rewards: You can compute the reward $\log(\lambda_Q(a)^\top E_n(a))$ after pulling arm $a$. But $\lambda_Q(a)$ is unknown.
  2. Sub-Gaussian tails: The rewards have light tails. But $\log(\lambda_Q(a)^\top E_n(a))$ can take arbitrarily large negative values (when the portfolio puts weight on an e-value that happens to be near zero).

The paper’s solution is elegant: instead of observing the true reward, use the empirical maximum over all portfolios as a proxy, and account for the gap using a pathwise regret bound.

The sub-exponential property

The first key technical result (Lemma 5.1) establishes that the log-increments have finite moment generating functions in a neighborhood of zero. Specifically, for $\theta \in [-1, 1]$:

$$E_Q\!\left[\exp\left(\theta \log(\lambda_Q(a)^\top E_1(a))\right)\right] < \infty.$$

This follows from properties of the numeraire portfolio — the portfolio $\lambda_Q(a)$ that maximizes expected log-wealth. The numeraire portfolio has the special property that $\lambda_Q(a)^\top E_1(a)$ has expectation 1 under the null, and this structure constrains the tail behavior of its log.

From this, it follows that the $p$-th absolute moment is uniformly bounded by $b^p p!$ (Lemma B.5), and the moment generating function can be bounded by that of a Gaussian — but only in the smaller neighborhood $\theta \in [-1/2, 1/2]$ (Lemma B.6). This makes the log-increments sub-exponential rather than sub-Gaussian.

Confidence intervals from regret bounds

Proposition 5.2 is the paper’s main concentration result. For any portfolio selection algorithm with pathwise portfolio regret bounded by $R^{\text{Port}}_n$:

$$P_Q\!\left(\max_{\lambda \in \Delta^d} \frac{1}{n}\sum_{i=1}^n \log(\lambda^\top E_i) - E_Q\!\left[\log(\lambda_Q^\top E_1)\right] \geq \sqrt{\frac{8b \log(1/\alpha)}{n}} + \frac{5\log(1/\alpha)}{n} + \frac{R^{\text{Port}}_n}{n}\right) \leq \alpha.$$

The confidence interval width has three components:

  1. $\sqrt{8b \log(1/\alpha) / n}$ — the sub-exponential concentration term
  2. $5\log(1/\alpha) / n$ — a higher-order correction
  3. $R^{\text{Port}}_n / n$ — the regret penalty

Crucially, you don’t need to run the portfolio algorithm — the existence of an algorithm with regret bound $R^{\text{Port}}_n$ is enough. Taking $R^{\text{Port}}_n = \frac{d \log(n+1)}{2} + \log(2)$ from Cover and Ordentlich gives a concrete, computable bound.

Allocation regret for SPRUCE

Theorem 5.3 shows that SPRUCE achieves logarithmic allocation regret:

$$R^{\text{Alloc}}_{n,Q} = O\left(\sum_{a \in \mathcal{A}} \frac{\log n}{\Delta_{a,Q}}\right),$$

where $\Delta_{a,Q} = E_Q[\log(\lambda_Q(a_Q)^\top E(a_Q))] - E_Q[\log(\lambda_Q(a)^\top E(a))]$ is the gap between arm $a$ and the optimal arm.

The proof adapts the standard UCB analysis to handle:

  • Unobservable rewards (replaced by empirical maxima minus regret bounds)
  • Sub-exponential rather than sub-Gaussian noise
  • Time-varying confidence widths

The resulting bound has the familiar $\log n / \Delta$ form from stochastic bandit theory, but the constants and proof technique are novel.

Why this may be of independent interest

The authors note that concentration inequalities for optimal growth rates that don’t require knowledge of $\lambda_Q$ are rare in the literature. The only similar result they cite is from Agrawal and Ramdas (2025, Appendix A.2).

This inequality could be useful in any setting where you need to:

  • Bound the deviation of empirical Kelly growth rates from their expectations
  • Construct confidence sequences for optimal betting strategies
  • Analyze algorithms that must learn optimal portfolios from data

Key takeaways

  1. Log-increments are sub-exponential, not sub-Gaussian — the numeraire portfolio structure gives finite MGFs but only in a restricted neighborhood.
  2. Regret bounds become confidence widths: the pathwise portfolio regret $R^{\text{Port}}_n$ appears as an additive term in the concentration inequality.
  3. You don’t need to know $\lambda_Q$: the existence of a low-regret algorithm is sufficient to bound the deviation of empirical maxima from true expectations.
  4. SPRUCE achieves $O(\log n)$ allocation regret: the same rate as standard UCB, but with a proof that handles unobservable rewards and sub-exponential noise.
  5. These inequalities are reusable: they apply to any setting involving Kelly-optimal wealth growth, not just the multi-armed testing problem.

References

  • Sandoval, Waudby-Smith, and Jordan (2026). Multi-Armed Sequential Hypothesis Testing by Betting.
  • Cover and Ordentlich (1996). Universal portfolios with side information.
  • Vershynin (2018). High-Dimensional Probability, Section 2.7 on sub-exponential random variables.
  • Agrawal and Ramdas (2025). Tight lower and upper bounds on stopping times of power-one sequential tests.
  • Karatzas and Kardaras (2007). The numeraire portfolio in semimartingale financial models.