This note covers advanced PyMC techniques for extending the framework beyond its built-in capabilities. You’ll learn how to create custom deterministic operations and define custom probability distributions when the standard library doesn’t meet your modeling needs.
Prerequisites
This guide assumes familiarity with:
- Basic PyMC modeling (see PyMC 102 and PyMC 104)
- PyTensor tensor operations
- Probability distributions and log-probabilities
- Python class inheritance
Arbitrary deterministics
Because it is built on PyTensor, PyMC provides many mathematical functions and operators for transforming random variables into new random variables. However, the library of functions in PyTensor is not exhaustive; therefore, PyTensor and PyMC provide functionality for creating arbitrary functions in pure Python and including those functions in PyMC models. This is supported with the as_op function decorator.
PyTensor needs to know the types of the inputs and outputs of a function, which are specified for as_op by itypes for inputs and otypes for outputs.
import pymc as pm
import pytensor.tensor as pt
from pytensor.compile.ops import as_op

@as_op(itypes=[pt.lscalar], otypes=[pt.lscalar])
def crazy_modulo3(value):
    if value > 0:
        return value % 3
    else:
        return (-value + 1) % 3

with pm.Model() as model_deterministic:
    a = pm.Poisson("a", 1)
    b = crazy_modulo3(a)
Important caveat: gradient limitations
An important drawback of this approach is that PyTensor cannot inspect these functions in order to compute the gradient required by the Hamiltonian-based samplers. Therefore, it is not possible to use the HMC or NUTS samplers for a model that uses such an operator. However, it is possible to add a gradient if we inherit from Op instead of using as_op. The PyMC example set includes a more elaborate example of the usage of as_op.
This limitation is why models that need gradient-based sampling typically use built-in PyTensor operations. For example, the coal-mining disasters change-point model uses the simpler pm.math.switch function (which is differentiable) rather than custom Python functions for the rate switching logic.
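To make this concrete, here is a minimal sketch of that change-point pattern; the years and disaster_data arrays below are made-up placeholders standing in for the real dataset:

import numpy as np
import pymc as pm

# Hypothetical stand-ins for the actual coal-mining data
years = np.arange(1900, 1950)
disaster_data = np.random.poisson(2, size=years.size)

with pm.Model() as switch_model:
    switchpoint = pm.DiscreteUniform("switchpoint", lower=years.min(), upper=years.max())
    early_rate = pm.Exponential("early_rate", 1.0)
    late_rate = pm.Exponential("late_rate", 1.0)
    # pm.math.switch builds a native PyTensor expression, so gradients
    # flow through early_rate and late_rate (switchpoint itself is discrete)
    rate = pm.math.switch(switchpoint >= years, early_rate, late_rate)
    disasters = pm.Poisson("disasters", rate, observed=disaster_data)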
When to use as_op
Use as_op when:
- You need a custom transformation not available in PyTensor
- You’re using discrete-only models (where gradients aren’t needed)
- You’re willing to use Metropolis or other gradient-free samplers
- The transformation is complex and would be difficult to express in PyTensor
Avoid as_op when:
- You want to use NUTS or HMC samplers
- The transformation can be expressed using built-in PyTensor operations (see the sketch after this list)
- Performance is critical (native PyTensor operations are faster)
- You need automatic differentiation
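For instance, the crazy_modulo3 transformation above can be expressed, as a sketch, entirely with built-in operations, so no black-box Python function is needed:

import pymc as pm
import pytensor.tensor as pt

with pm.Model() as model_builtin:
    a = pm.Poisson("a", 1)
    # Same branching logic as crazy_modulo3, written as a PyTensor expression
    b = pt.switch(a > 0, a % 3, (-a + 1) % 3)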
Modern alternative: Creating custom Ops
For more control and better integration with PyTensor, you can create custom operations by subclassing Op:
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Op
from pytensor.graph.basic import Apply

class CrazyModulo3(Op):
    def make_node(self, x):
        x = pt.as_tensor_variable(x)
        return Apply(self, [x], [x.type()])

    def perform(self, node, inputs, outputs):
        value = inputs[0]
        if value > 0:
            outputs[0][0] = value % 3
        else:
            outputs[0][0] = (-value + 1) % 3

crazy_modulo3 = CrazyModulo3()

with pm.Model() as model_deterministic:
    a = pm.Poisson("a", 1)
    b = crazy_modulo3(a)
This approach:
- Avoids deprecation warnings from as_op
- Provides better integration with PyTensor's graph system
- Allows for more sophisticated behavior and optimization
- Still has the same gradient limitations unless you implement the grad method (sketched below)
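To illustrate that last point, here is a minimal sketch of an Op that does implement grad; it wraps a differentiable toy function (squaring) rather than crazy_modulo3, which has no useful gradient:

import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.graph import Op
from pytensor.graph.basic import Apply

class Square(Op):
    def make_node(self, x):
        x = pt.as_tensor_variable(x)
        return Apply(self, [x], [x.type()])

    def perform(self, node, inputs, outputs):
        (x,) = inputs
        outputs[0][0] = np.asarray(x ** 2, dtype=node.outputs[0].dtype)

    def grad(self, inputs, output_gradients):
        (x,) = inputs
        (g,) = output_gradients
        # d(x^2)/dx = 2x, chained with the incoming gradient
        return [2 * x * g]

square = Square()
x = pt.dscalar("x")
dy_dx = pytensor.grad(square(x), x)  # now usable by gradient-based samplers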
Arbitrary distributions
Similarly, the library of statistical distributions in PyMC is not exhaustive, but PyMC allows for the creation of user-defined functions for an arbitrary probability distribution. For simple statistical distributions, the CustomDist class takes as an argument any function that calculates a log-probability $\log(p(x))$. This function may employ other random variables in its calculation.
Basic custom distributions with CustomDist
Here is an example inspired by a blog post by Jake Vanderplas on which priors to use for a linear regression (Vanderplas, 2014).
import pymc as pm
import pytensor.tensor as pt
import numpy as np

# Generate sample data for demonstration
np.random.seed(42)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
Y = np.array([2.5, 4.1, 5.8, 8.2, 10.5, 12.1, 14.3, 16.8, 18.9, 21.2])

with pm.Model() as model:
    alpha = pm.Uniform('intercept', -100, 100)
    # Create variables with custom log-densities
    beta = pm.CustomDist('beta', logp=lambda value: -1.5 * pt.log(1 + value**2))
    eps = pm.CustomDist('eps', logp=lambda value: -pt.log(pt.abs(value)))
    # Create likelihood
    like = pm.Normal('y_est', mu=alpha + beta * X, sigma=eps, observed=Y)
The custom log-probability functions can use any PyTensor operations, including:
- Mathematical operations: pt.log, pt.exp, pt.sqrt, etc.
- Conditional logic: pt.switch, pt.where
- Array operations: pt.sum, pt.prod, pt.mean
- Other random variables (for hierarchical models)
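For example, here is a small sketch of a log-density that combines these pieces; the half-normal form and the names folded_normal_logp and sketch_model are illustrative, not part of any PyMC API:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

def folded_normal_logp(value):
    # Log-density of |Z| with Z ~ Normal(0, 1): defined for value >= 0
    logp = pt.log(2.0) - 0.5 * pt.log(2 * np.pi) - 0.5 * value ** 2
    return pt.switch(value >= 0, logp, -np.inf)

with pm.Model() as sketch_model:
    theta = pm.CustomDist("theta", logp=folded_normal_logp)

(PyMC already provides HalfNormal; the point here is only to show the operations available inside a custom logp.)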
Advanced custom distributions
For more complex distributions, one can create a subclass of Continuous or Discrete and provide the custom logp function, as required. This is how the built-in distributions in PyMC are specified. As an example, fields like psychology and astrophysics have complex likelihood functions for particular processes that may require numerical approximation.
Example: Custom distribution with RandomVariable
Implementing the beta variable above as a Continuous subclass is shown below, along with an associated RandomVariable object, an instance of which becomes an attribute of the distribution.
import pytensor.tensor as pt
import pymc as pm

class BetaRV(pt.random.op.RandomVariable):
    name = "beta"
    ndim_supp = 0
    ndims_params = [0]  # one scalar parameter (mu)
    dtype = "floatX"

    @classmethod
    def rng_fn(cls, rng, mu, size):
        raise NotImplementedError("Cannot sample from beta variable")

beta = BetaRV()
Example: Full custom distribution subclass
class Beta(pm.Continuous):
    rv_op = beta

    @classmethod
    def dist(cls, mu=0, **kwargs):
        mu = pt.as_tensor_variable(mu)
        return super().dist([mu], **kwargs)

    def logp(value, mu):
        # In recent PyMC versions, logp receives the value and the
        # distribution's parameters directly rather than using self
        return beta_logp(value - mu)

def beta_logp(value):
    return -1.5 * pt.log(1 + value**2)

with pm.Model() as model:
    beta = Beta("beta", mu=0)
This example shows:
- Creating a custom RandomVariable class (BetaRV)
- Defining the distribution's properties (name, dimensionality, data type)
- Implementing a Continuous distribution subclass (Beta)
- Providing the dist class method for parameter handling
- Implementing the logp method for log-probability calculation
- Using the custom distribution in a model
Key components of custom distributions
When creating a custom distribution subclass, you typically need to implement:
- rv_op: the RandomVariable operation associated with the distribution
- dist() classmethod: constructs the distribution with the given parameters
- logp() method: computes the log-probability density/mass
- logcdf() method (optional): computes the log cumulative distribution function
- rng_fn() classmethod: defines how to sample from the distribution (in the RV class)
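For many problems the full subclass is unnecessary: CustomDist can also take a random callable so that prior and posterior predictive sampling work alongside the custom logp. The sketch below assumes CustomDist's documented random signature of random(*dist_params, rng=None, size=None); the shifted_* names are made up for illustration:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

def shifted_logp(value, mu):
    # Same unnormalized log-density as beta above, shifted by mu;
    # up to a constant this is a Student-t(nu=2, scale=1/sqrt(2)) density
    return -1.5 * pt.log(1 + (value - mu) ** 2)

def shifted_random(mu, rng=None, size=None):
    # Matching sampler: t(nu=2) scaled by 1/sqrt(2), shifted by mu
    return mu + rng.standard_t(df=2, size=size) / np.sqrt(2)

with pm.Model() as sampled_model:
    mu = pm.Normal("mu", 0, 1)
    x = pm.CustomDist("x", mu, logp=shifted_logp, random=shifted_random)
    prior = pm.sample_prior_predictive(100)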
Using as_op for log-probability functions
If your logp cannot be expressed in PyTensor, you can decorate the function with as_op as follows:
import pymc as pm
import pytensor.tensor as pt
from pytensor.compile.ops import as_op

@as_op(itypes=[pt.dscalar], otypes=[pt.dscalar])
def custom_logp(value):
    # Pure Python computation (some_complex_calculation is a placeholder)
    return some_complex_calculation(value)

with pm.Model() as model:
    custom_var = pm.CustomDist('custom', logp=custom_logp)
Note that this creates a black-box Python function that will be much slower and will not provide the gradients needed by, e.g., NUTS. It should only be used as a last resort, when the log-probability cannot be expressed using PyTensor operations.
Benefits of custom distributions
Creating a custom distribution subclass allows you to:
- Define complex probability models not available in the standard library
- Implement domain-specific likelihoods (e.g., for specialized scientific applications)
- Add custom validation and parameterization logic
- Provide custom sampling methods if needed
- Share and reuse distributions across multiple models
- Integrate with PyMC’s automatic inference machinery
This flexibility makes PyMC extensible to virtually any probabilistic model, from standard textbook examples to cutting-edge research applications.
Use cases and examples
Use case 1: Non-standard likelihood functions
In fields like psychology or neuroscience, you might have complex likelihood functions based on cognitive models:
def drift_diffusion_logp(value, drift, threshold, non_decision):
    """Log-probability for a drift diffusion model.

    value holds the observed reaction times; choice data could be
    passed as an additional distribution parameter.
    """
    # Complex calculation involving numerical integration
    # or lookup tables
    return logp_value

with pm.Model() as ddm_model:
    drift = pm.Normal('drift', 0, 1)
    threshold = pm.Gamma('threshold', 1, 1)
    non_decision = pm.Uniform('non_decision', 0, 1)
    likelihood = pm.CustomDist('rt',
                               drift, threshold, non_decision,
                               logp=drift_diffusion_logp,
                               observed=rt_data)
Use case 2: Astrophysical models
In astrophysics, you might need custom distributions for modeling astronomical phenomena:
def lognormal_mixture_logp(x, mu1, sigma1, mu2, sigma2, weight):
    """Mixture of two lognormal distributions"""
    logp1 = -pt.log(x * sigma1 * pt.sqrt(2 * np.pi)) - 0.5 * ((pt.log(x) - mu1) / sigma1)**2
    logp2 = -pt.log(x * sigma2 * pt.sqrt(2 * np.pi)) - 0.5 * ((pt.log(x) - mu2) / sigma2)**2
    return pt.log(weight * pt.exp(logp1) + (1 - weight) * pt.exp(logp2))
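Exponentiating log-densities like this can underflow for extreme values. As a sketch, assuming PyTensor's logaddexp behaves like NumPy's, the same mixture can be evaluated without leaving log space:

import numpy as np
import pytensor.tensor as pt

def lognormal_mixture_logp_stable(x, mu1, sigma1, mu2, sigma2, weight):
    logp1 = -pt.log(x * sigma1 * pt.sqrt(2 * np.pi)) - 0.5 * ((pt.log(x) - mu1) / sigma1) ** 2
    logp2 = -pt.log(x * sigma2 * pt.sqrt(2 * np.pi)) - 0.5 * ((pt.log(x) - mu2) / sigma2) ** 2
    # log(w * exp(logp1) + (1 - w) * exp(logp2)), computed stably
    return pt.logaddexp(pt.log(weight) + logp1, pt.log(1 - weight) + logp2)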
Use case 3: Truncated and censored distributions
When standard truncation isn’t sufficient:
def doubly_truncated_normal_logp(x, mu, sigma, lower, upper):
    """Normal distribution truncated on both sides"""
    z = (x - mu) / sigma
    z_lower = (lower - mu) / sigma
    z_upper = (upper - mu) / sigma
    normalizing_constant = pt.log(
        0.5 * (pt.erf(z_upper / pt.sqrt(2)) - pt.erf(z_lower / pt.sqrt(2)))
    )
    return -0.5 * z**2 - pt.log(sigma * pt.sqrt(2 * np.pi)) - normalizing_constant
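Such a function plugs into CustomDist like any other log-probability; the bounds and the y_data array below are illustrative assumptions, not part of the original example:

import pymc as pm

lower, upper = 0.0, 10.0  # illustrative fixed truncation bounds

with pm.Model() as truncated_model:
    mu = pm.Normal("mu", 0, 5)
    sigma = pm.HalfNormal("sigma", 2)
    # y_data: observed values assumed to lie within (lower, upper)
    y = pm.CustomDist("y", mu, sigma, lower, upper,
                      logp=doubly_truncated_normal_logp,
                      observed=y_data)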
Best practices
- Start simple: Try CustomDist with a lambda function before creating full subclasses
- Test thoroughly: Verify your log-probability is correct with known cases
- Document assumptions: Clearly state what your custom distribution represents
- Check gradients: If using NUTS/HMC, ensure gradients are available and correct
- Consider performance: Native PyTensor operations are much faster than Python loops
- Validate numerically: Check for numerical stability (overflow, underflow, NaN)
- Provide examples: Include usage examples when sharing custom distributions
Common pitfalls
- Dropping normalization terms: omitting additive constants is harmless for MCMC sampling, but any term that depends on the parameters must be kept, and fully normalized densities are needed for model comparison
- Mixing Python and PyTensor: Stick to PyTensor operations inside logp functions
- Not handling edge cases: Check for division by zero, log of negative numbers, etc.
- Incorrect broadcasting: Ensure your logp handles array inputs correctly
- Performance issues: Avoid Python loops; use PyTensor’s vectorized operations
Testing custom distributions
Always test your custom distributions before using them in production:
import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt

# Test 1: Check that the logp evaluates
# (my_custom_dist is a placeholder for your distribution)
test_value = 1.0
logp_result = pm.logp(my_custom_dist, test_value)
print(f"logp({test_value}) = {logp_result.eval()}")

# Test 2: Compare with a known distribution (if possible)
from scipy import stats
scipy_logp = stats.norm.logpdf(test_value, loc=0, scale=1)
pymc_logp = pm.logp(pm.Normal.dist(0, 1), test_value).eval()
assert np.isclose(scipy_logp, pymc_logp)

# Test 3: Check gradients (for continuous distributions)
x = pt.scalar('x')
logp_expr = my_custom_logp(x)
grad_expr = pytensor.gradient.grad(logp_expr, x)
grad_fn = pytensor.function([x], grad_expr)
print(f"Gradient at {test_value}: {grad_fn(test_value)}")
Summary
Custom operations and distributions are powerful tools for extending PyMC to handle specialized modeling needs. While the built-in library covers most common use cases, understanding how to create custom components allows you to:
- Implement domain-specific models from scientific literature
- Prototype new statistical methods
- Handle unusual data structures or likelihood functions
- Push the boundaries of probabilistic programming
Remember to start with the simplest approach (CustomDist with lambda functions) and only move to more complex implementations (subclassing distributions or creating custom Ops) when necessary. Always validate your implementations thoroughly before using them in production analyses.