Sandoval, Waudby-Smith, and Jordan (2026) generalize sequential testing by betting to a setting where the statistician chooses among multiple data sources (arms) at each time step. Before we can understand the multi-arm machinery, we need a firm grip on the single-arm foundation.

This post covers Sections 2.1 and 2.2 of the paper.

The basic setup

Testing by betting has roots in the work of Abraham Wald, Herbert Robbins, and more recently Ramdas, Shafer, Vovk, and others. The core idea is elegant: instead of constructing a fixed-sample test, you build a wealth process that grows when the data contradict the null hypothesis.

Formally, let $(Y_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. random variables with unknown common distribution, on a filtered probability space $(\Omega, \mathcal{F})$ where $\mathcal{F} \equiv (\mathcal{F}_n)_{n \in \mathbb{N}_0}$ is the filtration generated by $(Y_n)_{n \in \mathbb{N}}$ (with $\mathcal{F}_0$ trivial). We want to test a null hypothesis $\mathcal{P}$ against an alternative $\mathcal{Q}$ — both sets of candidate distributions — with $\mathcal{P} \cap \mathcal{Q} = \emptyset$.

The goal: fix a type-I error rate $\alpha \in (0, 1)$ and construct an $\mathcal{F}$-adapted sequence of binary tests $(\varphi^{(\alpha)}_n)_{n \in \mathbb{N}}$, where $\varphi^{(\alpha)}_n = 1$ means “reject” and $\varphi^{(\alpha)}_n = 0$ means “do not reject.” The key guarantee is type-I error control at arbitrary stopping times:

$$\sup_{P \in \mathcal{P}} P\!\left(\exists n \in \mathbb{N} : \varphi^{(\alpha)}_n = 1\right) \leq \alpha.$$

This means you can peek at the data as often as you want and stop whenever you like — the error guarantee holds regardless.

Test supermartingales and e-processes

The engine behind these guarantees is the test $\mathcal{P}$-supermartingale. A process $(W_n)_{n \in \mathbb{N}_0}$ is a test $\mathcal{P}$-supermartingale if:

  • $W_n \geq 0$ for all $n$ (nonnegative)
  • $W_0 = 1$ (starts at 1)
  • $\sup_{P \in \mathcal{P}} E_P[W_n \mid \mathcal{F}_{n-1}] \leq W_{n-1}$ for all $n$ (supermartingale under every null distribution)

Intuitively, under the null, your expected wealth never grows. If it does grow large, that’s evidence against the null.
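As a concrete sketch (illustrative, not from the paper): for a simple Bernoulli null against a fixed alternative, the canonical test supermartingale is the running likelihood ratio. The parameter values 0.5 and 0.7 below are arbitrary choices for illustration.

```python
import random

def likelihood_ratio_wealth(ys, p_null=0.5, q_alt=0.7):
    """Running likelihood ratio of Bernoulli(q_alt) against Bernoulli(p_null).
    Under the null, each factor has conditional expectation exactly 1,
    so the running product is a test supermartingale starting at 1."""
    w, path = 1.0, []
    for y in ys:
        w *= (q_alt if y else 1.0 - q_alt) / (p_null if y else 1.0 - p_null)
        path.append(w)
    return path

random.seed(0)
nulls = [random.random() < 0.5 for _ in range(1000)]  # data drawn from the null
alts = [random.random() < 0.7 for _ in range(1000)]   # data drawn from the alternative
w_null = likelihood_ratio_wealth(nulls)
w_alt = likelihood_ratio_wealth(alts)
print(w_null[-1], w_alt[-1])  # wealth shrinks under the null, grows under the alternative
```

With data from the alternative, wealth typically grows exponentially; with data from the null, it drifts toward zero — exactly the asymmetry the betting interpretation promises.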

An e-process is a slight relaxation: a nonnegative, adapted process such that, for each $P \in \mathcal{P}$, it is upper-bounded $P$-almost surely by some test $P$-supermartingale — the dominating supermartingale may differ across $P$. All test $\mathcal{P}$-supermartingales are e-processes, but not vice versa — e-processes give you more flexibility in construction.

Ville’s inequality: the safety net

The reason thresholding works comes from Ville’s inequality for nonnegative supermartingales. For any $\alpha \in (0, 1)$, if we define the sequential test as:

$$\varphi^{(\alpha)}_n := \mathbf{1}\{W_n \geq 1/\alpha\},$$

then:

$$\sup_{P \in \mathcal{P}} P\!\left(\exists n \in \mathbb{N} : W_n \geq 1/\alpha\right) \leq \alpha \cdot W_0 = \alpha.$$

This is the game-theoretic analogue of Markov’s inequality, but it holds uniformly over time. E-processes inherit the same guarantee, since under every $P \in \mathcal{P}$ they are dominated by some test supermartingale. You reject the null the moment your wealth crosses $1/\alpha$.
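A quick Monte Carlo sketch (illustrative, not the paper's code) makes the guarantee tangible: run a likelihood-ratio wealth process on null Bernoulli data many times and count how often it ever crosses $1/\alpha$. The alternative 0.7 being bet against is an arbitrary choice.

```python
import random

def ever_crosses(n_steps, alpha=0.05, p_null=0.5, q_alt=0.7):
    """Does a likelihood-ratio wealth process fed NULL data ever
    reach the Ville threshold 1/alpha within n_steps observations?"""
    w = 1.0
    for _ in range(n_steps):
        y = random.random() < p_null
        w *= (q_alt if y else 1.0 - q_alt) / (p_null if y else 1.0 - p_null)
        if w >= 1.0 / alpha:
            return True
    return False

random.seed(1)
n_trials = 2000
rate = sum(ever_crosses(500) for _ in range(n_trials)) / n_trials
print(rate)  # empirical crossing frequency; Ville guarantees the true rate is at most alpha
```

Note that the check at every step implements “continuous monitoring”: the test is allowed to reject at any data-dependent time, and the crossing frequency still stays below $\alpha$.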

The multiplicative wealth process

In practice, e-processes are often constructed multiplicatively:

$$W_n = \prod_{i=1}^n \lambda_i^\top E_i,$$

where:

  • $E_i$ is a vector of e-values — nonnegative random variables whose conditional expectation given $\mathcal{F}_{i-1}$ is at most 1 in each coordinate, under every null distribution
  • $\lambda_i$ is a portfolio — a vector of nonnegative betting weights summing to 1 — chosen predictably, i.e., $\lambda_i$ is $\mathcal{F}_{i-1}$-measurable

Each multiplicative increment $\lambda_i^\top E_i$ then has conditional expectation $\leq 1$ under the null — the weights are nonnegative and sum to 1 — so the product is a test supermartingale. Under the alternative, a well-chosen portfolio makes wealth grow exponentially.
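A minimal sketch of the portfolio update (the function name and the toy per-round e-values are invented for illustration):

```python
def portfolio_wealth(e_value_rounds, portfolios):
    """Multiplicative wealth W_n = prod_i lambda_i^T E_i, where each round
    supplies a vector of e-values E_i and a probability vector lambda_i."""
    w = 1.0
    for e_vec, lam in zip(e_value_rounds, portfolios):
        assert abs(sum(lam) - 1.0) < 1e-9 and min(lam) >= 0.0
        w *= sum(weight * e for weight, e in zip(lam, e_vec))
    return w

# Hedge 50/50 between a constant e-value of 1 ("keep your money") and a
# risky e-value; the mixed increment can never fall below half of either option.
rounds = [(1.0, 1.4), (1.0, 0.6), (1.0, 1.4), (1.0, 1.4)]  # toy numbers
lams = [(0.5, 0.5)] * len(rounds)
print(portfolio_wealth(rounds, lams))  # 1.2 * 0.8 * 1.2 * 1.2 = 1.3824
```

Including a constant e-value of 1 as one coordinate is a standard hedging trick: it caps per-round losses without sacrificing much growth when the risky bets pay off.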

Log-optimality: how fast can wealth grow?

When the alternative is true, we want wealth to grow as fast as possible. This leads to the notion of log-optimality.

Fix a null $\mathcal{P}$ and alternative $\mathcal{Q}$. For $Q \in \mathcal{Q}$, a $\mathcal{P}$-e-process $(W_n)_{n \in \mathbb{N}}$ is $Q$-log-optimal in a class of e-processes $\mathbb{W}$ if for any $W' \in \mathbb{W}$:

$$\liminf_{n \to \infty} \left(\frac{1}{n} \log W_n - \frac{1}{n} \log W'_n\right) \geq 0 \quad \text{Q-almost surely.}$$

In plain English: no other process in the class has a strictly larger asymptotic growth rate. The optimal process grows wealth at the Kelly-optimal rate — the same rate that maximizes expected log-wealth in repeated gambling.
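For Bernoulli data with a simple null, the log-optimal growth rate works out to the KL divergence from the alternative to the null, which gives a back-of-the-envelope sample-size estimate. A small sketch with illustrative parameter values:

```python
import math

def kl_bernoulli(q, p):
    """KL(Bernoulli(q) || Bernoulli(p)) in nats: the asymptotic growth rate
    of log-wealth per observation for the log-optimal (Kelly) bet."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

rate = kl_bernoulli(0.7, 0.5)
# Crossing the Ville threshold 1/alpha = 20 takes roughly log(20)/rate observations.
print(rate, math.log(20.0) / rate)
```

With truth 0.7 against null 0.5, log-wealth grows about 0.08 nats per observation, so rejection at $\alpha = 0.05$ typically takes on the order of a few dozen observations.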

Why this matters

The single-arm setting is well understood. The real challenge Sandoval, Waudby-Smith, and Jordan tackle is: what happens when you must choose which arm to sample from at each step, and you only observe the outcome of the arm you chose? This partial-information setting combines sequential testing with multi-armed bandits, and the notion of log-optimality needs to be generalized to account for the fact that you don’t observe counterfactual outcomes.

That’s where the multi-arm protocol comes in — covered in the next post.

Key takeaways

  1. E-processes generalize test supermartingales and provide anytime-valid type-I error control via Ville’s inequality.
  2. Wealth grows multiplicatively — each e-value is a bet, and the portfolio determines how you allocate across competing e-values.
  3. Log-optimality means your asymptotic growth rate matches the best possible process in the class, even without knowing the true alternative distribution.
  4. The single-arm case is the warmup — the paper’s contribution is extending all of this to multiple arms with partial information.

References

  • Sandoval, Waudby-Smith, and Jordan (2026). Multi-Armed Sequential Hypothesis Testing by Betting.
  • Ramdas et al. (2023). A review of testing by betting.
  • Shafer and Vovk (2001). Probability and Finance: It’s Only a Game!
  • Kelly (1956). A new interpretation of information rate.