Instrumental variables (IVs) are the workhorses of causal inference when randomized experiments are out of reach and confounding blocks the backdoor path. Yet the identifying assumptions often feel abstract until you see how instruments reshape the causal graph and estimation steps. This post mirrors the collider walkthrough with fresh diagrams to emphasize what the instrument must—and must not—do.
Visualizing the IV setup
The canonical IV story introduces an observed instrument Z that nudges treatment X without touching the outcome Y except through X. Any unobserved confounder U may still influence both X and Y, but it must remain disconnected from Z.
Figure: Z (left) affects the treatment X, which then changes the outcome Y. Unobserved confounders U may bias the naive X→Y estimate, but because Z is independent of U, the variation in X that Z induces is orthogonal to U.
This structure encodes the three core conditions found in standard textbooks, from Angrist & Pischke to Wooldridge, that keep IV designs credible:
- Relevance: Z shifts X in the data. Without a first stage, the instrument is moot.
- Exclusion: Z influences Y only through X. Any direct Z → Y arrow breaks identification.
- Independence: Z is as-if randomized with respect to the unobserved causes of Y, keeping Z and U independent.
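
To make the three conditions concrete, here is a minimal simulation sketch. The coefficients (0.8, 1.5), the sample size, and the true effect of 2.0 are illustrative assumptions, not values from any real study. The data are built so that all three conditions hold by construction, then the naive regression slope is compared with the simple IV ratio estimator cov(Z, Y) / cov(Z, X).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Simulate Z -> X -> Y with an unobserved confounder U of X and Y.

    All coefficients are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                    # unobserved confounder
        zi = rng.gauss(0, 1)                   # independence: Z drawn without reference to U
        xi = 0.8 * zi + u + rng.gauss(0, 1)    # relevance: Z shifts X
        yi = true_effect * xi + 1.5 * u + rng.gauss(0, 1)  # exclusion: no direct Z term
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()

# Naive regression slope of Y on X: contaminated by U.
ols_slope = cov(x, y) / cov(x, x)

# IV ratio estimator: uses only the variation in X that Z induces.
iv_slope = cov(z, y) / cov(z, x)

print(f"naive OLS slope: {ols_slope:.3f}")
print(f"IV slope:        {iv_slope:.3f}")
```

In this setup the naive slope lands above the true effect because U pushes X and Y in the same direction, while the IV ratio stays close to it. Adding a direct Z term to the line that generates `yi` is a quick way to watch an exclusion violation pull the IV estimate away from the truth.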
Diagnosing strength and exclusion in practice
Two-stage least squares (2SLS) makes the IV workflow explicit: isolate the part of X explained by Z, then use only that predicted component to estimate the effect on Y. Visualizing the pipeline highlights where diagnostics belong.
Figure: Stage 1 regresses X on Z to obtain the fitted values X̂. Stage 2 regresses the outcome on X̂. Weak-instrument tests (like the first-stage F-statistic) live in Stage 1, while overidentification checks probe exclusion by comparing multiple instruments against the Stage 2 fit.
When reviewing an IV design:
- Inspect the Stage 1 fit: a first-stage F-statistic above 10 is the conventional rule of thumb for ruling out a weak instrument, but context matters.
- Probe exclusion restrictions with subject-matter knowledge or overidentification tests when you have multiple instruments.
- Trace alternative pathways in the DAG to ensure Z is not a proxy for a hidden policy, trend, or selection effect that touches Y directly.
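
The two stages and the first-stage F-statistic can be sketched by hand for the simplest case of one instrument and one treatment. This is an illustrative sketch on simulated data: the helper names (`ols`, `first_stage_f`, `two_stage_least_squares`) and the data-generating coefficients are hypothetical, and a real analysis would normally rely on an established econometrics package.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def first_stage_f(z, x):
    """F-statistic for a single instrument (equal to the squared t-statistic)."""
    n = len(z)
    intercept, slope = ols(z, x)
    residuals = [xi - (intercept + slope * zi) for zi, xi in zip(z, x)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)   # residual variance
    mz = mean(z)
    szz = sum((zi - mz) ** 2 for zi in z)
    return slope ** 2 / (sigma2 / szz)


def two_stage_least_squares(z, x, y):
    """Stage 1: fit X on Z. Stage 2: regress Y on the fitted values."""
    intercept, slope = ols(z, x)
    x_hat = [intercept + slope * zi for zi in z]
    _, beta = ols(x_hat, y)
    return beta


# Simulated data with hypothetical coefficients and a true effect of 2.0.
rng = random.Random(1)
z, x, y = [], [], []
for _ in range(5000):
    u = rng.gauss(0, 1)                          # unobserved confounder
    zi = rng.gauss(0, 1)                         # instrument, independent of U
    xi = 0.8 * zi + u + rng.gauss(0, 1)
    yi = 2.0 * xi + 1.5 * u + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

f_stat = first_stage_f(z, x)
beta_2sls = two_stage_least_squares(z, x, y)
print(f"first-stage F: {f_stat:.1f}")
print(f"2SLS estimate: {beta_2sls:.3f}")
```

One caveat on running the stages by hand: the point estimate is the 2SLS estimate, but standard errors taken from the second regression are not valid, because they treat X̂ as observed data rather than as an estimate. Dedicated IV routines correct for this.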
Workflow checklist
- Map the causal story as a DAG and call out every path that connects Z to Y.
- Justify why each non-Z→X→Y path stays closed (policy rules, timing, institutional design).
- Quantify the first-stage relationship and communicate diagnostics, not just point estimates.
- Report sensitivity or robustness checks (placebo outcomes, alternative controls) that stress-test the exclusion restriction.
- Revisit the design whenever the instrument changes scope—new cohorts or jurisdictions can break independence.
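
As one way to approach the placebo-outcome item in the checklist, the sketch below regresses a placebo outcome on the instrument. It assumes a hypothetical placebo variable that is driven by the unobserved confounder U but cannot be affected by the treatment. If Z is independent of U, the slope should be near zero. If Z leaks information about U, the slope moves away from zero. The function names and coefficients are illustrative assumptions.

```python
import random


def slope(x, y):
    """Slope from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def placebo_slope(z_depends_on_u, n=5000, seed=2):
    """Regress a placebo outcome on Z in a simulated design.

    The placebo outcome depends on U only, never on the treatment,
    so any association with Z signals a violation of independence.
    """
    rng = random.Random(seed)
    z, placebo = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        leak = 0.5 * u if z_depends_on_u else 0.0   # hypothetical violation
        z.append(leak + rng.gauss(0, 1))
        placebo.append(u + rng.gauss(0, 1))
    return slope(z, placebo)


valid_slope = placebo_slope(z_depends_on_u=False)
broken_slope = placebo_slope(z_depends_on_u=True)
print(f"placebo slope, valid instrument:  {valid_slope:.3f}")
print(f"placebo slope, leaking instrument: {broken_slope:.3f}")
```

A near-zero placebo slope does not prove the instrument is valid. It only fails to reveal one kind of violation, which is why the checklist pairs it with other robustness checks.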
Further reading
- Angrist & Pischke, Mostly Harmless Econometrics for econometric intuition and the 2SLS playbook.
- Joshua Angrist’s MIT lecture notes for graphical IV examples and diagnostics.
- Imbens & Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences for LATE interpretations and monotonicity.
- Miguel Hernán & James Robins, Causal Inference: What If for potential-outcomes treatments of instruments alongside DAG reasoning.