Path Tracer: Physically Based Rendering
Introduction
A path tracer is a renderer that simulates the behavior of light in a scene. It uses the Monte Carlo method to estimate the radiance at each pixel of an image by tracing light paths through the scene. The algorithm is iterative and parallel in nature, so it maps naturally onto CUDA and runs fairly well there. It can also simulate many effects that are difficult to achieve with other rendering techniques, such as soft shadows, depth of field, caustics, ambient occlusion, and indirect lighting.
Coffee Shop | Stanford Bunny |
Cow | Gear |
All the above scenes were rendered in 2000x2000 resolution with 1000 samples per pixel and 8 light bounces.
We also have an interesting mirror scene, where a glossy sphere is placed inside a cube whose sides are all mirrors. It was rendered in 2000x2000 resolution with 200 samples per pixel and with different numbers of light bounces.
1 Bounce | 8 Bounces | 64 Bounces |
---|---|---|
Visual Features
Material System
The material system is adopted from the glTF specification.
- Albedo: The color of the material.
- Metallic: The ratio of diffuse and specular reflection. A value of `0` means the material is dielectric, and a value of `1` means the material is metal.
- Roughness: The roughness of the material. A value of `0` means the material is perfectly smooth, and a value of `1` means the material is pure diffuse reflection.
- IOR: The index of refraction of the material. A value of `1` means the material is vacuum, and a value of `1.5` is a good compromise for most opaque, dielectric materials.
- Opacity: The opacity of the material. A value of `0` means the material is fully transparent, and a value of `1` means the material is fully opaque.
- Emittance: The emittance of the material. A value of `0` means the material is not emissive, and a value greater than `0` means the material is emissive, controlling the brightness of the material.
Using the `metallic` and `roughness` parameters, a material can be either dielectric or metal, and its reflection model can be either diffuse or specular. Combined with multiple importance sampling, the path tracer can render imperfect specular materials and produce a more convincing roughness effect. Also, by controlling the `ior` and `opacity` of dielectrics, a material can produce glass-like refraction with the Fresnel effect.
Diffuse | Imperfect Specular | Pure Specular | |
---|---|---|---|
Dielectric | |||
Metal | |||
Glass |
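To make the parameters above more concrete, here is a minimal host-side C++ sketch of how they might feed a Fresnel term. The `Material` struct, its field names and defaults, and the helper functions are assumptions made for illustration, not the project's actual code. The reflectance at normal incidence is derived from the IOR and blended toward the albedo by `metallic` in the style of the glTF reference BRDF, and Schlick's approximation stands in for the full Fresnel equations.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter block mirroring the list above.
// Field names and default values are assumptions for this sketch.
struct Material {
    float albedo[3] = {1.0f, 1.0f, 1.0f};
    float metallic  = 0.0f;  // 0 = dielectric, 1 = metal
    float roughness = 0.5f;  // 0 = perfectly smooth, 1 = pure diffuse
    float ior       = 1.5f;  // index of refraction
    float opacity   = 1.0f;  // 0 = fully transparent, 1 = fully opaque
    float emittance = 0.0f;  // > 0 makes the material emissive
};

// Reflectance of a dielectric at normal incidence, from its index of refraction.
inline float dielectricF0(float ior) {
    float r = (ior - 1.0f) / (ior + 1.0f);
    return r * r;
}

// Per-channel base reflectance: dielectric F0 blended toward the albedo by `metallic`.
inline float baseReflectance(const Material& m, int channel) {
    assert(channel >= 0 && channel < 3);
    float f0 = dielectricF0(m.ior);
    return f0 + (m.albedo[channel] - f0) * m.metallic;
}

// Schlick's approximation of the Fresnel reflectance.
// cosTheta is the cosine between the surface normal and the view direction.
inline float schlickFresnel(float f0, float cosTheta) {
    float c = std::fmin(std::fmax(cosTheta, 0.0f), 1.0f);
    float k = 1.0f - c;
    return f0 + (1.0f - f0) * k * k * k * k * k;
}
```

With an IOR of `1.5`, `dielectricF0` evaluates to about 0.04, which is why that value works as a general-purpose default for dielectrics. Reflectance rises toward 1 at grazing angles, which is the Fresnel effect seen on the glass materials.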
Using the material system, we can mimic many real-world materials, such as aluminum, titanium, stainless steel, and different kinds of glass.
More of Metal | More of Glass |
---|---|
And many Suzannes:
Glass | Aluminum | Yellow Plastic | Stainless Steel |
---|---|---|---|
All scenes were rendered in 800x800 resolution with 2000 spp and 8 light bounces.
Anti-Aliasing
Anti-aliasing can be achieved by jittering rays within a pixel. In the following example, the image is rendered in low resolution to exaggerate the effect.
AA OFF | AA ON |
---|---|
All scenes were rendered in 200x200 (up-sampled to 800x800) resolution with 2000 spp and 8 light bounces.
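The per-pixel jitter described in this section can be sketched as follows. This is an illustrative host-side C++ version in which `pixelSample`, `Vec2`, and the use of `std::mt19937` are assumptions; a CUDA kernel would use a per-thread generator instead. With anti-aliasing off, every sample passes through the pixel center, and with it on, each sample lands at a uniformly random position inside the pixel.

```cpp
#include <cassert>
#include <random>

struct Vec2 { float x, y; };

// Sample position for pixel (px, py) in continuous pixel coordinates.
// Without jitter every sample goes through the pixel center (offset 0.5, 0.5).
inline Vec2 pixelSample(int px, int py, bool antialias, std::mt19937& rng) {
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float dx = antialias ? u01(rng) : 0.5f;
    float dy = antialias ? u01(rng) : 0.5f;
    return { px + dx, py + dy };
}
```

Averaging many such samples per pixel integrates the radiance over the pixel's area, which is what smooths the stair-stepped edges.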
Physically-Based Depth-of-Field
Depth-of-field can be achieved by jittering rays within an aperture. In the following example, the aperture is modeled as a circle with a radius of 0.5 and the focal length is 10.
DoF OFF | DoF ON |
---|---|
All scenes were rendered in 800x800 resolution with 2000 spp and 8 light bounces.
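A thin-lens camera is one common way to implement the aperture jitter described in this section. The sketch below is illustrative only: it works in a camera space that looks along +z, treats the "focal length" from the text as the distance to the plane of sharp focus, and uses hypothetical names (`thinLensRay`, `Vec3`, `Ray`). The ray origin is moved to a random point on the lens disk and re-aimed at the point where the original pinhole ray crosses the focus plane.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// pinholeDir: direction of the unjittered camera ray (camera space, z > 0).
// u1, u2:     uniform random numbers in [0, 1).
inline Ray thinLensRay(Vec3 pinholeDir, float apertureRadius, float focalDistance,
                       float u1, float u2) {
    assert(pinholeDir.z > 0.0f);
    // Point where the pinhole ray crosses the plane of focus z = focalDistance.
    float t = focalDistance / pinholeDir.z;
    Vec3 focus = { pinholeDir.x * t, pinholeDir.y * t, focalDistance };
    // Uniform sample on the lens disk (the sqrt keeps the density uniform in area).
    const float kTwoPi = 6.2831853f;
    float r = apertureRadius * std::sqrt(u1);
    float phi = kTwoPi * u2;
    Vec3 lens = { r * std::cos(phi), r * std::sin(phi), 0.0f };
    // Re-aim the ray from the lens sample toward the focus point.
    Vec3 dir = normalize({ focus.x - lens.x, focus.y - lens.y, focus.z - lens.z });
    return { lens, dir };
}
```

Every ray generated for a pixel passes through the same point on the focus plane, so objects at that distance stay sharp while nearer and farther objects blur in proportion to the aperture radius.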
Mesh Loading
With the help of the tinyobjloader and tinygltf libraries, the path tracer is able to load `.obj` and `.gltf` files (partially). This lets us render more complex scenes and put more stress on the path tracer.
Procedural Textures
Procedural textures are evaluated from the barycentrically interpolated UV coordinates of the intersection point. There is hardly any performance impact. Check out the following example.
Gradient Mario | Checkerboard Mario |
---|---|
All scenes were rendered in 800x800 resolution with 1000 spp and 8 light bounces.
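As a rough illustration of the idea in this section, the sketch below interpolates per-vertex UVs with barycentric weights and evaluates a checkerboard at the result. The names `UV`, `interpolateUV`, and `checkerboard`, and the `scale` parameter, are hypothetical. The actual patterns used for the renders above may be computed differently.

```cpp
#include <cassert>
#include <cmath>

struct UV { float u, v; };

// Barycentric interpolation of per-vertex UVs.
// b1 and b2 are the weights of vertices 1 and 2; vertex 0 gets the remainder.
inline UV interpolateUV(UV uv0, UV uv1, UV uv2, float b1, float b2) {
    assert(b1 >= 0.0f && b2 >= 0.0f && b1 + b2 <= 1.0f);
    float b0 = 1.0f - b1 - b2;
    return { b0 * uv0.u + b1 * uv1.u + b2 * uv2.u,
             b0 * uv0.v + b1 * uv1.v + b2 * uv2.v };
}

// Checkerboard with `scale` cells per unit of UV space.
// Returns 1 for "white" cells and 0 for "black" cells.
inline float checkerboard(UV uv, float scale) {
    int cu = static_cast<int>(std::floor(uv.u * scale));
    int cv = static_cast<int>(std::floor(uv.v * scale));
    return ((cu + cv) & 1) == 0 ? 1.0f : 0.0f;
}
```

Because the pattern is a handful of arithmetic operations with no memory lookups, it is plausible that it adds almost nothing to the cost of shading a hit.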
Open Image Denoise
Open Image Denoise is a high-performance, high-quality denoising library for ray tracing. It removes noise from rendered images without losing much detail. Auxiliary albedo and normal buffers are added as pre-filter inputs to the denoiser to improve the quality of the denoised image.
The denoiser is integrated into the system as a post-processing step. It is triggered once every fixed number of iterations, and the denoised image is merged into the original image using an exponential moving average:
$$ \text{Image} = (1 - \alpha) \cdot \text{Image} + \alpha \cdot \text{Denoised} $$
Although it has a small impact on performance, the image quality improves significantly, and we get a much cleaner image with far fewer samples.
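The exponential moving average used for the merge can be written as a short per-pixel loop. This is a minimal sketch over flat `float` buffers with a hypothetical function name, `blendDenoised`. The denoising call itself is left out, and the value of `alpha` is not given in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place blend: image = (1 - alpha) * image + alpha * denoised.
// Both buffers hold the same number of channel values.
inline void blendDenoised(std::vector<float>& image,
                          const std::vector<float>& denoised,
                          float alpha) {
    assert(image.size() == denoised.size());
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (std::size_t i = 0; i < image.size(); ++i) {
        image[i] = (1.0f - alpha) * image[i] + alpha * denoised[i];
    }
}
```

An `alpha` of 0 leaves the accumulated image untouched and an `alpha` of 1 replaces it with the denoised result. Values in between trade residual noise against the smoothing introduced by the denoiser.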
The following example shows the effect of the denoiser with 200 samples per pixel, a relatively low sample rate.
Denoiser OFF | Denoiser ON |
---|---|
All scenes were rendered in 800x800 resolution with 200 spp and 8 light bounces.
Performance Features
Stream Compaction
When a ray hits a light source, escapes into the void, or exceeds the maximum number of bounces, it is terminated. Terminated rays are removed from the ray pool using stream compaction. Luckily, stream compaction is already implemented in the CUDA Thrust library: we can use `thrust::remove_if`, or in this case `thrust::partition`, to remove the terminated rays from the ray pool. Any custom work-efficient stream compaction implementation with shared memory optimization and bank conflict avoidance, like Project2-Stream-Compaction, will do just fine.
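The compaction step can be illustrated on the host with `std::partition`, which takes the same (first, last, predicate) arguments as `thrust::partition`. The `PathSegment` layout, the `IsAlive` predicate, and `compactPaths` are assumptions for this sketch rather than the project's actual types. One plausible reason to prefer partitioning over `remove_if` is that partitioning keeps the terminated paths intact behind the split point, so their accumulated color can still be gathered later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical path state: which pixel the path belongs to and how many
// bounces it has left (0 means the path has terminated).
struct PathSegment {
    int pixelIndex;
    int remainingBounces;
};

struct IsAlive {
    bool operator()(const PathSegment& p) const { return p.remainingBounces > 0; }
};

// Moves live paths to the front of the first `numActive` entries and
// returns the new number of active paths.
inline std::size_t compactPaths(std::vector<PathSegment>& paths, std::size_t numActive) {
    assert(numActive <= paths.size());
    auto first = paths.begin();
    auto last = first + static_cast<std::ptrdiff_t>(numActive);
    auto firstDead = std::partition(first, last, IsAlive{});
    return static_cast<std::size_t>(firstDead - first);
}
```

After each bounce, the next kernel launch only needs to cover the returned number of paths, so fewer threads are spent on rays that have already finished.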
First Bounce Caching
When anti-aliasing is not enabled, the first ray from the camera is the same in every iteration, so we can cache the first bounce and reuse it in later iterations. However, this optimization is of little use once more advanced visual features like anti-aliasing, depth-of-field, and motion blur are enabled, because they make the camera rays differ between iterations.
Material Sorting
Additionally, we could sort the rays by material type to improve performance. The idea is that rays hitting the same material type take a similar time to shade, so grouping them reduces warp divergence. However, this optimization later proved to be not very useful, and even harmful to performance: the sorting pass itself is expensive, and there is no significant performance improvement to compensate for its cost.
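For reference, the sorting step amounts to a key-value sort with the material ID as the key. The host-side sketch below builds a sort order with `std::stable_sort` and applies it with a gather. On the GPU a key-value sort such as `thrust::sort_by_key` would be the usual choice, but whether the project uses it is an assumption. The helper names `materialSortOrder` and `gather` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the order that visits paths grouped by material ID.
// The sort is stable, so paths with equal IDs keep their relative order.
inline std::vector<std::size_t> materialSortOrder(const std::vector<int>& materialIds) {
    std::vector<std::size_t> order(materialIds.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&materialIds](std::size_t a, std::size_t b) {
                         return materialIds[a] < materialIds[b];
                     });
    return order;
}

// Reorders any per-path array (paths, intersections, ...) by the given order.
template <typename T>
std::vector<T> gather(const std::vector<T>& src, const std::vector<std::size_t>& order) {
    std::vector<T> out;
    out.reserve(order.size());
    for (std::size_t i : order) {
        assert(i < src.size());
        out.push_back(src[i]);
    }
    return out;
}
```

Every per-path array has to be reordered on each bounce, which hints at where the overhead comes from: the sort and the extra memory traffic are paid whether or not the shading kernel was actually divergent.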
Bounding Volume Hierarchy
A bounding volume hierarchy (BVH) is a tree structure built on top of the scene geometry to accelerate ray tracing. The idea is to group the scene geometry into a hierarchy of bounding volumes, so the ray tracer can quickly discard an entire group of primitives if the ray does not intersect their bounding volume.
The image from PBRT 4.3 is a good illustration of a BVH tree. The BVH is built using the equal-count partition method, which tries to split the primitives into two groups of equal size. It is built on the CPU in a linear buffer (a heap-like structure) and then copied to the GPU for ray tracing. The BVH could potentially be improved by using the SAH (Surface Area Heuristic) and by building it directly on the GPU.
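The sketch below illustrates the three ingredients just described: an equal-count (median) split, a flat node array, and stack-based traversal with a slab test against each bounding box. It is a simplified host-side C++ version under stated assumptions. Primitives are represented only by their bounding boxes, the node layout uses explicit child indices rather than the author's heap-like layout, and all names (`AABB`, `Node`, `build`, `hitAABB`, `traverse`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct AABB { float lo[3], hi[3]; };

// Flat-array node; left < 0 marks a leaf covering prims[first, first + count).
struct Node { AABB box; int left, right; std::size_t first, count; };

inline AABB unite(AABB a, const AABB& b) {
    for (int k = 0; k < 3; ++k) {
        a.lo[k] = std::min(a.lo[k], b.lo[k]);
        a.hi[k] = std::max(a.hi[k], b.hi[k]);
    }
    return a;
}

inline float centroid(const AABB& b, int axis) { return 0.5f * (b.lo[axis] + b.hi[axis]); }

// Equal-count split: order primitives along the widest axis and cut at the median.
inline int build(std::vector<AABB>& prims, std::size_t first, std::size_t count,
                 std::vector<Node>& nodes) {
    assert(count > 0 && first + count <= prims.size());
    AABB box = prims[first];
    for (std::size_t i = first + 1; i < first + count; ++i) box = unite(box, prims[i]);
    int self = static_cast<int>(nodes.size());
    nodes.push_back(Node{box, -1, -1, first, count});
    if (count == 1) return self;  // one primitive per leaf

    int axis = 0;
    for (int k = 1; k < 3; ++k)
        if (box.hi[k] - box.lo[k] > box.hi[axis] - box.lo[axis]) axis = k;

    std::size_t half = count / 2;
    auto begin = prims.begin() + static_cast<std::ptrdiff_t>(first);
    std::nth_element(begin, begin + static_cast<std::ptrdiff_t>(half),
                     begin + static_cast<std::ptrdiff_t>(count),
                     [axis](const AABB& a, const AABB& b) {
                         return centroid(a, axis) < centroid(b, axis);
                     });
    int left = build(prims, first, half, nodes);
    int right = build(prims, first + half, count - half, nodes);
    nodes[self].left = left;    // index again: `nodes` may have reallocated
    nodes[self].right = right;
    return self;
}

// Slab test. invD holds 1 / direction per axis; this sketch does not guard
// against the NaN that arises when an origin lies exactly on a slab plane
// of an axis with zero direction.
inline bool hitAABB(const AABB& b, const float o[3], const float invD[3], float tMax) {
    float t0 = 0.0f, t1 = tMax;
    for (int k = 0; k < 3; ++k) {
        float tn = (b.lo[k] - o[k]) * invD[k];
        float tf = (b.hi[k] - o[k]) * invD[k];
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
        if (t0 > t1) return false;
    }
    return true;
}

// Iterative traversal with an explicit stack. Returns how many leaf primitives
// had their box hit; a real tracer would intersect the triangles there instead.
inline int traverse(const std::vector<Node>& nodes, const float o[3],
                    const float invD[3], float tMax, int* visited) {
    int hits = 0;
    std::vector<int> stack{0};
    while (!stack.empty()) {
        int i = stack.back();
        stack.pop_back();
        ++*visited;
        const Node& n = nodes[static_cast<std::size_t>(i)];
        if (!hitAABB(n.box, o, invD, tMax)) continue;  // prune the whole subtree
        if (n.left < 0) { hits += static_cast<int>(n.count); continue; }
        stack.push_back(n.left);
        stack.push_back(n.right);
    }
    return hits;
}
```

For eight well-separated boxes, a ray that hits only one of them visits 7 of the 15 nodes, and the saving grows with scene size because each missed box discards an entire subtree. A SAH build would replace the median cut in `build` with a cost-based choice of split position.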
Performance Analysis
Let's take a look at the performance of the path tracer with different features enabled. Stream compaction plays an important role in the correctness of the algorithm in addition to its performance benefits, so it is enabled in all tests, and the path tracer with only stream compaction enabled serves as the baseline.
Cornell-Metal and Cornell-Glass are simple scenes with metal or glass balls inside the Cornell box. These spheres are not part of the mesh system, so the BVH has no effect on performance.
The more complex scenes Mario-Metal and Mario-Glass are the same as the previous two, except that the spheres are replaced with a Mario mesh. The mesh system is able to load `.obj` files or `.gltf` files (partially). The Mario mesh has about 5,000 triangles.
Lastly, the Teapot-Complex scene consists of 5 teapots with different materials. The teapots are loaded from an `.obj` file and placed uniformly in the scene; the total number of triangles is about 50,000.
Observations
- Material Sorting is not a good optimization; it slows the path tracer down. The reason, as hinted before, is that the sorting process itself is very expensive, and there is no significant performance improvement to compensate for the cost.
- First Bounce Caching yields only a limited performance improvement, because the first bounce is only a small part of the entire path tracing process. Besides, with more advanced visual features like anti-aliasing, depth-of-field, and motion blur enabled, the first bounce is no longer the same in every iteration.
- BVH is mind-blowing. It improves performance by a factor of 15x, reducing the rendering time by over 90%! The BVH enables quick discarding of whole groups of primitives when the ray does not intersect their bounding volume, which is especially useful when the scene is complex and the number of primitives is large. Although BVH traversal requires additional global memory accesses, the performance gain is still significant.
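As a quick sanity check on how the speedup and time-reduction figures relate (plain arithmetic, not a new measurement): if a feature speeds rendering up by a factor \(s\), the rendering time shrinks by a fraction

$$ 1 - \frac{1}{s}, \qquad s = 15 \;\Rightarrow\; 1 - \frac{1}{15} \approx 0.933 $$

so a 15x speedup corresponds to roughly a 93% reduction in rendering time, in line with the figure of about 90% quoted for the BVH. A 10x speedup would correspond to exactly 90%.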
Possible Improvements
- Subsurface scattering
- Wavelength dependent refraction
- Volumetric rendering
- Texture and normal map
- Motion blur
- Environment map
- Better random number generator
- BVH with SAH and BVH on GPU
- Occupancy optimization
- Shared memory optimization
References
- Physically Based Rendering: From Theory To Implementation
- glTF Specification and Example BxDF Implementation
- GPU-based Importance Sampling
- Axis-Aligned Bounding Box (AABB) intersection algorithm
- Iterative BVH Traversal with near \(O(1)\) Memory
- Open Image Denoise