Gauss 202 derived the log probability of the full 2D sample mean. The next natural question is what happens when downstream code needs just one scalar summary, such as $z = x_1 + 2x_2$ or a projected contrast $w^\top \mathbf{x}$. Because estimators, detectors, and filters often score these one-dimensional projections, it helps to have the distribution at our fingertips.
Setup: 2D Gaussian and a linear probe
Let $\mathbf{x} = (x_1, x_2)^\top \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$ with
$$ \boldsymbol{\mu} = \begin{pmatrix}\mu_1 \\ \mu_2\end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad |\rho| < 1. $$

Pick any weight vector $w = (a, b)^\top$. The scalar projection $z = w^\top \mathbf{x} = a x_1 + b x_2$ captures a linear combination of the original components.
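Numerically, the setup is just a $2\times 2$ matrix. The sketch below (my own illustration; the helper name `make_cov` and the parameter values are not prescribed by the text above) builds $\Sigma$ and checks that $|\rho| < 1$ yields a positive definite covariance:

```python
import numpy as np

def make_cov(sigma1, sigma2, rho):
    """Assemble the 2x2 covariance from marginal standard deviations and correlation."""
    return np.array([[sigma1**2, rho * sigma1 * sigma2],
                     [rho * sigma1 * sigma2, sigma2**2]])

Sigma = make_cov(2.0, 1.0, -0.3)      # same numbers reused in the worked example below
print(np.linalg.eigvalsh(Sigma))      # both eigenvalues positive whenever |rho| < 1
```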
Distribution of the projection
Any affine function of a jointly Gaussian vector is itself Gaussian, so joint Gaussianity plus linearity gives an immediate result:
$$ z \sim \mathcal{N}\bigl(w^\top \boldsymbol{\mu},\; w^\top \Sigma w\bigr). $$

Expanding the variance for the $2\times2$ case makes the dependencies explicit,
$$ \mathrm{Var}(z) = a^2 \sigma_1^2 + b^2 \sigma_2^2 + 2ab\,\rho \sigma_1 \sigma_2. $$

Key observations:
- The mean simply projects $\boldsymbol{\mu}$ onto $w$; any bias in $x_1$ and $x_2$ combines linearly.
- The variance picks up the covariance term $2ab\,\rho \sigma_1 \sigma_2$: positive correlation inflates projections that use same-sign weights, while opposite-sign weights partially cancel it (the roles flip when $\rho < 0$).
- The quadratic form $w^\top \Sigma w$ generalizes to higher dimensions, but in 2D you can reason in your head: start with the marginal variances, then adjust for the correlation. The short check below verifies the expansion numerically.
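As that sanity check (a minimal sketch; the function name `projection_variance` is only illustrative), compare the scalar formula against the quadratic form $w^\top \Sigma w$:

```python
import numpy as np

def projection_variance(a, b, sigma1, sigma2, rho):
    """Expanded 2D formula: a^2 s1^2 + b^2 s2^2 + 2ab rho s1 s2."""
    return a**2 * sigma1**2 + b**2 * sigma2**2 + 2 * a * b * rho * sigma1 * sigma2

sigma1, sigma2, rho = 2.0, 1.0, -0.3
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])
w = np.array([0.6, 0.8])

print(projection_variance(*w, sigma1, sigma2, rho))  # 1.504
print(w @ Sigma @ w)                                 # identical up to rounding
```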
Geometric reading
Imagine the 2D Gaussian as an ellipse. The projection $z$ measures the shadow of that ellipse along direction $w$. Rotating $w$ sweeps through all possible 1D marginals:
- When $w = (1, 0)$ you recover $x_1$ alone.
- When $w = (\cos\theta, \sin\theta)$ you read the marginal on the line making angle $\theta$ with the $x_1$ axis.
- The largest variance occurs when $w$ aligns with the ellipse's major axis (the eigenvector of $\Sigma$ with the largest eigenvalue); the smallest occurs when $w$ aligns with the minor axis (smallest eigenvalue). The numerical sweep below confirms both extremes.
Understanding these projections is crucial when a downstream algorithm gates on a single statistic: thresholding $x_1 - x_2$ corresponds to $w=(1,-1)$, for instance.
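A short sweep over unit vectors $w = (\cos\theta, \sin\theta)$ makes the eigenvector claim concrete. This is a sketch using the same illustrative parameters as the worked example below; the extremes of $w^\top \Sigma w$ should match the eigenvalues of $\Sigma$:

```python
import numpy as np

sigma1, sigma2, rho = 2.0, 1.0, -0.3
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])

# Sweep unit vectors w = (cos t, sin t) and record the projected variance.
thetas = np.linspace(0, np.pi, 1000)
W = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)   # shape (1000, 2)
variances = np.einsum("ij,jk,ik->i", W, Sigma, W)        # w^T Sigma w for each row

print("sweep min/max:", variances.min(), variances.max())
print("eigenvalues  :", np.linalg.eigvalsh(Sigma))       # sweep extremes converge to these
```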
Worked example
Suppose $\sigma_1 = 2$, $\sigma_2 = 1$, $\rho = -0.3$, and we probe with $w = (0.6, 0.8)$ (a unit vector). Set $\boldsymbol{\mu} = (0.2, -0.1)$ for concreteness.
- Mean: $w^\top \boldsymbol{\mu} = 0.6(0.2) + 0.8(-0.1) = 0.04$.
- Variance: $$ w^\top \Sigma w = 0.6^2 (4) + 0.8^2(1) + 2(0.6)(0.8)(-0.3)(2)(1) = 1.44 + 0.64 - 0.576 = 1.504. $$ So $z \sim \mathcal{N}(0.04, 1.504)$.
This tells us everything about the probability of any threshold on $z$. For example, $P(z > 1)$ is just $1 - \Phi\left(\frac{1-0.04}{\sqrt{1.504}}\right)$, no 2D integration required.
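Evaluated numerically (a small sketch; `scipy.stats.norm` is my choice here, not something the section prescribes):

```python
import numpy as np
from scipy.stats import norm

mean, var = 0.04, 1.504
p_tail = norm.sf(1.0, loc=mean, scale=np.sqrt(var))  # P(z > 1) = 1 - Phi((1 - 0.04)/sqrt(1.504))
print(f"P(z > 1) = {p_tail:.4f}")                    # roughly 0.22
```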
Connection to Gauss 202
Gauss 202 showed that averaging $n$ iid vectors keeps results Gaussian with shrunken covariance $\Sigma/n$. If we now project the sample mean with the same weights $w$, the distribution tightens further:
$$ w^\top \bar{\mathbf{x}} \sim \mathcal{N}\!\left(w^\top \boldsymbol{\mu}, \; \frac{1}{n} w^\top \Sigma w\right). $$

This is exactly what Kalman filters or generalized least squares exploit: aggregate first, then score a linear functional whose variance is analytically known.
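A quick Monte Carlo check of the $1/n$ shrinkage (a sketch; the batch size $n = 25$ and repetition count are arbitrary, illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.2, -0.1])
sigma1, sigma2, rho = 2.0, 1.0, -0.3
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])
w = np.array([0.6, 0.8])

n, reps = 25, 50_000
# Draw `reps` batches of n vectors, average each batch, then project with w.
batches = rng.multivariate_normal(mu, Sigma, size=(reps, n))  # shape (reps, n, 2)
z_bar = batches.mean(axis=1) @ w

print("empirical var    :", z_bar.var())
print("theory w'Sigma w/n:", (w @ Sigma @ w) / n)             # 1.504 / 25 ≈ 0.060
```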
Simulation sanity check
```python
import numpy as np

mu = np.array([0.2, -0.1])
sigma1, sigma2, rho = 2.0, 1.0, -0.3
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])
w = np.array([0.6, 0.8])

# Draw many samples from the 2D Gaussian and project each onto w.
samples = np.random.multivariate_normal(mu, Sigma, size=100_000)
z = samples @ w

print("Sample mean:", np.mean(z))
print("Sample var :", np.var(z))
print("Theory mean:", w @ mu)
print("Theory var :", w @ Sigma @ w)
```
The output shows the Monte Carlo mean and variance hugging their theoretical counterparts, reinforcing that no approximation was made—the linear combination is exactly Gaussian.
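For readers who want more than matching moments, an optional sketch (the Kolmogorov-Smirnov test is my addition, not part of the section) compares the simulated projection against the exact Gaussian:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu = np.array([0.2, -0.1])
Sigma = np.array([[4.0, -0.6], [-0.6, 1.0]])   # sigma1 = 2, sigma2 = 1, rho = -0.3
w = np.array([0.6, 0.8])

z = rng.multivariate_normal(mu, Sigma, size=100_000) @ w

# KS test against the exact N(w'mu, w'Sigma w) distribution.
result = stats.kstest(z, "norm", args=(w @ mu, np.sqrt(w @ Sigma @ w)))
print(result)   # expect a small KS statistic: the fit is exact up to sampling noise
```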
Takeaways
- A linear combination $a x_1 + b x_2$ of a 2D Gaussian remains Gaussian.
- The mean and variance reduce to $w^\top \boldsymbol{\mu}$ and $w^\top \Sigma w$, making probability queries one-dimensional.
- Choosing $w$ selects which ellipse direction you interrogate; aligning with eigenvectors reveals extreme spreads.
- Projecting the sample mean inherits the $1/n$ variance shrinkage from Gauss 202, so linear probes become more certain with larger batches.