Intro to Sampling
Hello! This article will introduce the concept of sampling  starting with Riemann sums, then moving on to uniform sampling, stratified sampling, and finishing with importance sampling. This article will explain sampling through a combination of text and interactive visual demos. If you're interested in a rigourous mathematical introduction to sampling, you may want to look elsewhere. But, if you like interactive demos, you've come to the right place! Before you begin reading, I suggest you have a good understanding of how to graph and evaluate functions. If you've taken Calculus 1 and 2, you should be able to understand this article.
Let's get started!
Riemann sums
Before we talk about sampling, let's talk about integrals. We typically think of an integral as the "area under a curve". Some functions are easy to integrate; we can write a nice formula to determine the exact area under a curve. However, some functions cannot be easily integrated, and we have to evaluate them analytically. One simple way to analytically evaluate an integral is called a Riemann sum. In a Riemann sum, we evaluate a function at evenly spaced points along the curve, then draw a rectangle with the height of the evaluated point. We calculate the area of each rectangle, sum the areas, and we get an approximation for the integral. Let's see what this looks like visually.
Riemann sum of y = 3x^{2} from 0 to 1.
Exact Area: 1.0
If you tinker with the visualization above, you can see how to approximate an integral with a Riemann sum. As the number of rectangles increases, the approximation gets better and better, and the error in the approximation is reduced.
🧠 Learn More
Random Sampling
But is there a different way to approximate the integral? Instead of evaluating the function at evenly spaced points on the curve, what if we evaluate at randomly selected points? Let's try it!
Approximation of y = 3x^{2} from 0 to 1.
Exact Area: 1.0
Since the function isn't evaluated at evenly spaced intervals, the rectangles overlap, so you can click the 'Align Rectangles' checkbox to eliminate the overlap and align the rectangles like the first visualization.
The randomly sampled integral estimation yields a slightly different result than the Riemann sum. Like the Riemann sum, the random approximation gets better as the number of rectangles (also called samples) increases. However, the randomly approximated area is different each time you evaluate the integral. This causes the error in the approximation to vary as well. Sometimes the random approximation error is smaller than the Riemann sum error, sometimes it is larger.
Let's visualize the error terms in a histogram. The histogram will show the errors generated from approximating the integral many times.
A histogram showing the error between the randomly approximated area and exact area.
Number of Approximations: 500
This graph is expensive to generate, especially for high number of samples, so you'll need to press the 'Generate Histogram' button to render the graph.
At a low number of samples per approximation, there is a large variation in the error terms. As the number of samples is increased, the errors converge towards 0. The amount of variation in the error terms is called variance. Variance is a measurement of how spread out a group of numbers is from the average value of the group. We can see that the variance in the error terms decreases towards 0 as the number of samples increases. A variance of 0 means that the random approximation is equal to the exact value of the integral, which is what we want. Our goal is to reduce the variance in the error terms as much as possible. If we compare the variances of our randomly sampled integral and Riemann sum approximation, we can determine which integration strategy produces a better approximation.
Display Options
Graph Options
The variance of the Riemann sum integral is always smaller than the variance of the randomly sampled integral. This means that our Riemann sum approximation is generally better than our random sampling strategy. However, even though Riemann has a smaller variance, we could still get lucky and run into a case where random sampling is better than Riemann. If you look at the visualization above, you can see a graph of the probability that random sampling has a smaller error than Riemann. You'll see that the random sampling has less than 30% chance of a better approximation once the sample count is larger than 20. Those are not good odds 😢. At the moment, it looks like Riemann gives a better approximation. But can we do better? Of course we can!
🧠 Learn More
Stratified sampling
Let's take a slight detour to talk about stratified sampling. In uniform random sampling (what we did previously), there are cases where the random samples clump together, leaving large gaps in the interval. Usually, we do not like large gaps. We want our samples to be randomly distributed, but not clumped. How can we avoid this? With stratified sampling! In stratified sampling, we split the interval into smaller subintervals (called strata), then generate random samples in each stratum.
Stratified sampling vs uniform sampling
Number of samples:
Stratified sampling reduces the clumping and empty spots present in uniform sampling. However, there are a few disadvantages to this approach. The number of stratified samples must be a multiple of the number of strata. We want to ensure that each stratum has the same number of samples as all the others. If the number of samples is not a multiple of the number of strata, some of the strata would have more samples than others. This would introduce bias into our sampling strategy, which we do not want right now. We want stratified samping to remain unbiased, just like uniform sampling.
Let's use stratified sampling to approximate an high frequency function.
Stratified sampling vs uniform sampling,
Graph of y = 25 * (max(0.7, sin(6 * x))  0.7) + 0.5 from 0 to 10.
Exact Area: 18.11364
Approximated Area:
Difference:
Number of samples:
Display Options
Graph Options
Based on the data, there isn't much difference between stratified and uniform sampling. However, the Riemann sum approximation occasionally does a very poor job of approximating the function at low sample counts. This is partially because the Riemann sum samples the function at evenly spaced intervals, while random sampling does not.
🧠 Learn More
Applications of Stratified Sampling
While we saw negligible results when using stratified sampling, there are cases where it yields a better approximation than uniform sampling. This can occur when sampling area lights, or generating camera rays through a pixel or viewport.
🧠 Learn More
PDFs and CDFs
So far, we've been generating random samples that are evenly distributed over the interval from 0 to 1. Essentially, this means if we generate a large number of random samples, we expect 50% of them to be in less than 0.5, and 50% of them to be greater than 0.5. But what if we want to generate samples so that 33% of the samples are less than 0.5 and 67% of the samples are greater than 0.5?
To accomplish this, we need to use a Probability Density Function, or PDF. A PDF is a function that describes the relative likelyhood that a random value is chosen. So far, we've been using a uniform PDF, which has the equation y = 1. This means that each random value is equally likely to appear.
A PDF is defined over the interval from 0 to 1. If the integral of the PDF from 0 to 1 is equal to 1, the PDF is normalized. If a PDF is not normalized, a normalized PDF can be created by dividing the PDF by the integral from 0 to 1. We will use only normalized PDFs going forward.
The integral of a PDF is called a Cumulative Distribution Function, or CDF. While PDF(x) descibes the relative probability of x being chosen from the distribution, CDF(x) describes the probability that a randomly generated value will have a value less than x. This means that CDF(0) = 0 and CDF(1) = 1.
How do we select a good PDF? The shape of the PDF should closely match the shape of the function that is being approximated. The closer the shapes, the better the approximation will be. Conversely, choosing a PDF with a shape that does not closely match the function will result in poor approximations. Additionally, if we have a separable function, like y = f(x) * g(x), using either f(x) or g(x) as the PDF will lead to good results.
Inverse Transform Sampling
Inverse Transform Sampling is a process to generate random samples that are distributed based on a desired PDF. Here are the steps.

Create a normalized PDF.

Integrate the PDF to create the CDF, y = CDF(x).

Create a function that is the inverse of the CDF, x = CDF(y) or y = CDF^{1}(x).

Generate a random value from a uniform random distribution (what we've been using so far), u.

Evaluate CDF^{1}(u).
As an example, let's use y = 2x as a PDF and follow the Inverse Transform Sampling steps.

y = 2x is a normalized PDF.

The integral of y = 2x is y = x^{2}.

The inverse function of y = x^{2} is y = sqrt(x).
So, if we plug in numbers from a uniform random number generator into sqrt(x), they will follow the distribution of y = 2x. Fortunately for us, sqrt(x) is an easy function to evaluate, so it will be very easy to generate samples distributed along the PDF. However, not all functions have nice integrals and inverses, so it's important to pick functions that can be quickly evalutated.
Below is a visualization that shows you various PDFs, CDFs, and how they generate different distributions of random samples.
PDF, CDF, Inverse Transform Sampling
Number of Points:
Display Options
🧠 Learn More
Importance Sampling
Is it finally time to use our PDF to generate random samples and approximate an integral? Almost. Using a nonuniform PDF will cause generated samples to be more or less dense in different regions of the interval. In areas where samples are more dense, each rectangle generated should have a smaller width, since there are more samples to better approximate the dense area. Likewise, in areas of less sample density, the width of the rectangles should increase, since that rectangle needs to approximate more of the function because fewer samples will be in that region. But how much to we scale these rectangle widths by? By the value of the PDF at the random sample point. Take a look at this concept in the visualization below.
Uniform sampling vs Importance sampling
With all this in mind, lets try using importance sampling on a separable equation and see what happens!
The graph of y = 2x * g(x) from 0 to 3 approximated with importance sampling.
Exact Area: 11.793940
Approximated Area:
Error:
Riemann Sum Error:
Number of Rectangles:
Display Options
Let's compare importance sampling to Riemann and see which evaluation strategy produces a lower variance.
Display Options
Graph Options
Hooray! With importance sampling, we are able to get an approximation with a lower variance than the Riemann sum! Also, if you look at the probability graph, you'll notice that importance sampling has a 99% chance of producing a closer approximation than Riemann when there are less than 20 samples. Additionally, while uniform sampling drops down to around 10% at 200 samples, importance sampling is around 50%. You may also notice that our third PDF does a terrible job of approximating the function; it has a variance that bottoms out around 50, and a probability of better appximation that drops to 0% almost instantly.
In this specific case, importance sampling is almost always better when the sample count less than 20 samples. On higher dimensional functions, this '20' number will be larger, since higher dimensional functions require more samples to approximate accurately. Because of this, we should use importance sampling wherever possible.
🧠 Learn More
Applications of importance sampling
Impotance sampling is heavily used in the computer graphics field. It is most commonly used in path tracers when solving the rendering equation. Sometimes, cosineweighted importance sampling is used (cos(x) is the PDF). Other times we use BRDF importance sampling, where the BRDF of the material (or parts of it) is the PDF. In other cases, we need to use multiple PDFs to efficiently sample the entire interval of an integral. This is called multiple importance sampling, and it's a topic for another day 😊.
🧠 Learn More
Conclusion
Sampling is a pretty useful technique for evaluating integrals. I hope you enjoyed the demos and learned something new!