Sampling Large VMF Mixtures Without Evaluating Them

Path guiding with mixture models hits a wall when the PDF evaluation is O(N). My thesis introduces a partial estimator that stays unbiased while only evaluating a subset. Also: writing GPU shaders in Rust and testing them on the CPU.

2026-01-18


Rendering photorealistic images means simulating how light bounces through a scene. With enough bounces and enough samples, you get a beautiful image. The catch is that “enough” can mean millions of samples, and most of them contribute almost nothing. My thesis is about making those samples count.

Path tracing is a Monte Carlo method for solving the rendering equation. You trace rays from the camera into the scene, let them bounce around, and accumulate light contributions along the way. The variance of this process depends heavily on how you choose directions at each bounce. Random uniform sampling works, but it wastes most of its budget on directions that contribute little.

Importance sampling is the standard technique in Monte Carlo integration for reducing variance by preferentially sampling high-contribution paths. Path guiding is one approach to importance sampling. It learns where incoming light comes from and steers sampling towards those directions. One way to represent this learned distribution is with a mixture model, where multiple probability distributions combine to approximate the incoming radiance at a surface point. von Mises-Fisher (VMF) mixtures work for this because they are defined on the sphere, matching the domain of light directions.

The problem is that evaluating the probability density of a sample requires summing over every component:

$$p(x) = \sum_{i=1}^{N} w_i \cdot p_i(x)$$

With a few thousand components, that O(N) sum dominates the computation, especially in real-time rendering where you might only get one sample per pixel per frame.
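To make the cost concrete, here is a minimal Rust sketch of a 3D VMF mixture and its full PDF evaluation. The types and names (`VmfLobe`, `mixture_pdf`) are illustrative, not the thesis framework's actual API:

```rust
#[derive(Clone, Copy)]
struct VmfLobe {
    weight: f32,    // mixture weight w_i; weights sum to 1
    mean: [f32; 3], // mean direction mu_i, unit length
    kappa: f32,     // concentration
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

// PDF of a single vMF lobe on the unit sphere:
// p_i(x) = kappa / (2*pi*(1 - exp(-2*kappa))) * exp(kappa * (dot(mu, x) - 1)),
// written in the exp-shifted form to avoid overflow for large kappa.
fn vmf_pdf(lobe: &VmfLobe, x: [f32; 3]) -> f32 {
    let norm = lobe.kappa
        / (2.0 * core::f32::consts::PI * (1.0 - (-2.0 * lobe.kappa).exp()));
    norm * (lobe.kappa * (dot(lobe.mean, x) - 1.0)).exp()
}

// Full mixture PDF: one pass over every component, O(N) per sample.
fn mixture_pdf(lobes: &[VmfLobe], x: [f32; 3]) -> f32 {
    lobes.iter().map(|l| l.weight * vmf_pdf(l, x)).sum()
}
```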

Simply skipping components introduces bias. The PDF you compute is wrong, and no matter how many samples you take, the image will never converge to the correct result.

My thesis introduces a partial estimator that solves this.

In standard Monte Carlo integration, you estimate an integral by dividing sample values by their probability. The partial estimator does the same thing, but with a twist. When you draw a sample from the full mixture, you know which component generated it. I call this the origin.

If the origin is one of the components you decided to evaluate, you use the standard Monte Carlo estimator with the partial PDF instead of the full PDF. If the origin is not in your subset, you return zero:

$$ \hat{I}(x) = \begin{cases} \frac{f(x)}{p_p(x)} & \text{if origin} \in \text{subset} \\ 0 & \text{otherwise} \end{cases} $$

At first glance, returning zero seems like it would bias the result. But given the sampled direction, the probability that the origin lands in your subset is exactly the ratio of the partial to the full PDF. These factors cancel in expectation, and the estimator remains unbiased.
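In symbols, with $p_p(x) = \sum_{i \in \text{subset}(x)} w_i \cdot p_i(x)$ denoting the partial PDF and $p_p(x)/p(x)$ the conditional probability that the origin is in the subset:

$$ \mathrm{E}\left[\hat{I}(x)\right] = \int p(x) \cdot \frac{p_p(x)}{p(x)} \cdot \frac{f(x)}{p_p(x)} \, dx = \int f(x) \, dx $$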

You might wonder why we don’t just always include the origin in the subset. The catch is that the selection may depend on the sample direction, but not on which component generated that sample. If you picked the subset based on the origin, the factors no longer cancel and the estimator becomes biased. Any selection works as long as it doesn’t peek at the origin: I focus on deterministic selection methods, but the same argument covers randomized selection, provided it is independent of the origin component.

This lets you evaluate just a handful of components instead of thousands while still converging to the correct image.
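As a rough sketch of the estimator, reusing the hypothetical `VmfLobe` and `vmf_pdf` from the mixture example above. The sample direction and its origin index are assumed to come from an existing full-mixture sampler, and the subset is assumed to have been chosen without looking at the origin:

```rust
// Partial estimator: returns f(x) / p_p(x) if the origin is in the subset,
// and zero otherwise. Unbiasedness holds because, given x, the origin lands
// in the subset with probability p_p(x) / p(x).
fn partial_estimate(
    lobes: &[VmfLobe],
    subset: &[usize], // indices selected from x alone, never from the origin
    origin: usize,    // component that generated x
    x: [f32; 3],
    f_x: f32,         // integrand value at x
) -> f32 {
    if !subset.contains(&origin) {
        return 0.0;
    }
    // Partial PDF: sum only over the selected components.
    let p_partial: f32 = subset
        .iter()
        .map(|&i| lobes[i].weight * vmf_pdf(&lobes[i], x))
        .sum();
    f_x / p_partial
}
```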

Interactive slice selection

Morton slice selection in action. Red ellipses show which VMF components are included in the partial model as the query direction changes.

The estimator is unbiased regardless of which components you select, but the variance depends entirely on the selection. I evaluated several approaches:

  • Spatial locality. Components whose mean direction is close to the sample direction are likely relevant. Fast to compute, but misses components that are far away yet still contribute significantly. (A minimal sketch of this strategy follows the list.)
  • Tree search. Build a hierarchy and traverse it to find promising components. Can get stuck on false hotspots where merged nodes look good but their children don’t.
  • Precomputation. Store the best components for a grid of directions ahead of time. Works well for static mixtures, but in online learning scenarios the mixture changes over time and would require recomputation.
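For illustration, here is a minimal spatial-locality selection that keeps the K components whose mean directions are most aligned with the query direction. This is a stand-in sketch built on the hypothetical types above, not the Morton slice selection the thesis uses:

```rust
// Keep the k lobes most aligned with the query direction. The selection only
// looks at the query direction, never at which component generated the sample,
// so it satisfies the requirement for unbiasedness.
fn select_by_alignment(lobes: &[VmfLobe], query: [f32; 3], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = lobes
        .iter()
        .enumerate()
        .map(|(i, l)| (i, dot(l.mean, query)))
        .collect();
    // Sort by alignment, most aligned first (assumes no NaN scores).
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```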

The README has the full breakdown.

Writing GPU shaders in Rust

The framework uses rust-gpu to compile Rust to SPIR-V. The same data structures and algorithms run on both CPU and GPU. Unit tests run natively, no GPU involved.
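To show the shape of that shared code, here is a hypothetical stand-in (not the framework's actual code): a small 2D Morton encoder in plain, no_std-friendly Rust that both the SPIR-V shader crate and the host can compile, together with a native unit test.

```rust
// Interleave the bits of x and y into a 2D Morton code. Pure integer ops,
// so the same function compiles for SPIR-V and for the host.
pub fn morton_encode_2d(x: u16, y: u16) -> u32 {
    fn spread(v: u16) -> u32 {
        // Spread the 16 input bits to the even bit positions of a u32.
        let mut v = v as u32;
        v = (v | (v << 8)) & 0x00FF_00FF;
        v = (v | (v << 4)) & 0x0F0F_0F0F;
        v = (v | (v << 2)) & 0x3333_3333;
        v = (v | (v << 1)) & 0x5555_5555;
        v
    }
    spread(x) | (spread(y) << 1)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Runs natively with `cargo test`; no GPU or SPIR-V toolchain involved.
    #[test]
    fn interleaves_bits() {
        assert_eq!(morton_encode_2d(0b11, 0b00), 0b0101);
        assert_eq!(morton_encode_2d(0b00, 0b11), 0b1010);
        assert_eq!(morton_encode_2d(0b10, 0b01), 0b0110);
    }
}
```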

During the entire development of this thesis, I never had to debug on the GPU. The only GPU-specific issue was a rust-gpu codegen bug in the rotate-right operation. Everything else was caught by CPU tests.

That’s a pretty strong argument for sharing code between CPU and GPU.
