Path Tracer: Physically Based Rendering

Tags: cuda, parallel computing, physically based rendering, ray tracing
Author: Zhenzhong "Anthony" Tang, Student, Researcher, and Developer
Path Tracing - This article is part of a series.
Part 1: This Article

Tested on: Windows 11 Pro 22H2, AMD EPYC 7V12 64-Core Processor (4 vCPU cores) @ 2.44GHz 28GiB, Tesla T4 16GiB (Azure)

Introduction

Path tracing is a rendering technique that simulates the behavior of light in a scene. It uses the Monte Carlo method to estimate the radiance at each pixel of an image by tracing the paths of light through the scene. The algorithm is iterative and parallel in nature, so it maps naturally onto CUDA and runs fairly well there. It can also simulate many effects that are difficult with other rendering techniques, such as soft shadows, depth of field, caustics, ambient occlusion, and indirect lighting.

Figures: Coffee Shop, Stanford Bunny, Cow, Gear.

All the above scenes were rendered in 2000x2000 resolution with 1000 samples per pixel and 8 light bounces.

There is also an interesting mirror scene, in which a glossy sphere is placed inside a cube whose sides are all mirrors. It was rendered at 2000x2000 resolution with 200 samples per pixel and with different numbers of light bounces.

Figures: 1 Bounce, 8 Bounces, 64 Bounces.

Visual Features

Material System

The material system is adopted from the glTF Specification.

  • Albedo: The color of the material.
  • Metallic: The ratio of diffuse and specular reflection. A value of 0 means the material is dielectric, and a value of 1 means the material is metal.
  • Roughness: The roughness of the material. A value of 0 means the material is perfectly smooth, and a value of 1 means its reflection is purely diffuse.
  • IOR: The index of refraction of the material. A value of 1 means the material is vacuum, and a value of 1.5 is a good compromise for most opaque, dielectric materials.
  • Opacity: The opacity of the material. A value of 0 means the material is fully transparent, and a value of 1 means the material is fully opaque.
  • Emittance: The emittance of the material. A value of 0 means the material is not emissive, and a value greater than 0 means the material is emissive, controlling the brightness of the material.

Using the metallic and roughness parameters, the material can be either dielectric or metal, and its reflection model can be either diffuse or specular. Combined with multiple importance sampling, the path tracer is able to render imperfect specular materials and produce a more convincing roughness effect. Also, by controlling the IOR and opacity of dielectrics, the material can produce glass-like refraction with the Fresnel effect.
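As a rough illustration of how these parameters might be grouped, and how the Fresnel effect of a dielectric is commonly approximated, here is a minimal C++ sketch. The `Material` layout, its default values, and the helpers `f0FromIor` and `schlickFresnel` are assumptions made for illustration, not the project's actual code. Schlick's approximation is one widely used choice and may differ from what the renderer does.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical material record mirroring the parameters listed above.
struct Material {
    float albedo[3] = {1.0f, 1.0f, 1.0f};
    float metallic  = 0.0f;  // 0 = dielectric, 1 = metal
    float roughness = 0.5f;  // 0 = perfectly smooth, 1 = purely diffuse
    float ior       = 1.5f;  // index of refraction
    float opacity   = 1.0f;  // 0 = fully transparent, 1 = fully opaque
    float emittance = 0.0f;  // > 0 makes the material emissive
};

// Reflectance at normal incidence for a dielectric surrounded by vacuum.
inline float f0FromIor(float ior) {
    assert(ior >= 1.0f);
    float r = (ior - 1.0f) / (ior + 1.0f);
    return r * r;
}

// Schlick's approximation of the Fresnel reflectance.
// cosTheta is the cosine of the angle between the ray and the surface normal.
inline float schlickFresnel(float cosTheta, float ior) {
    float f0 = f0FromIor(ior);
    float m = 1.0f - cosTheta;
    return f0 + (1.0f - f0) * m * m * m * m * m;
}
```

With an IOR of 1.5, this gives a reflectance of about 4% when looking straight at the surface, rising toward 100% at grazing angles, which is what makes glass look more mirror-like near its silhouette.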

Figure grid: columns show Diffuse, Imperfect Specular, and Pure Specular; rows show Dielectric, Metal, and Glass.

Using the material system, we can mimic many real-world materials. For example, the following scenes show aluminum, titanium, stainless steel, and several kinds of glass.

Figures: More of Metal, More of Glass.

And many Suzannes:

Figures: Glass, Aluminum, Yellow Plastic, Stainless Steel.

All scenes were rendered in 800x800 resolution with 2000 spp and 8 light bounces.

Anti-Aliasing

Anti-aliasing can be achieved by jittering rays within a pixel. In the following example, the image is rendered in low resolution to exaggerate the effect.
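The jittering idea can be sketched in a few lines of C++. The function name `pixelSample`, the `Sample2D` type, and the use of `std::mt19937` are assumptions for illustration; a CUDA kernel would use a per-thread generator instead.

```cpp
#include <cassert>
#include <random>

struct Sample2D { float x, y; };

// Returns the image-plane position sampled for pixel (px, py).
// Without jitter every iteration samples the pixel center; with jitter
// each iteration picks a random position inside the pixel footprint.
inline Sample2D pixelSample(int px, int py, bool jitter, std::mt19937& rng) {
    assert(px >= 0 && py >= 0);
    if (!jitter) return {px + 0.5f, py + 0.5f};
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float dx = u01(rng);
    float dy = u01(rng);
    return {px + dx, py + dy};
}
```

Averaging many iterations with different offsets integrates over the whole pixel area, which is what smooths out the stair-step edges.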

Figures: AA OFF, AA ON.

All scenes were rendered at 200x200 resolution (up-sampled to 800x800) with 2000 spp and 8 light bounces.

Physically-Based Depth-of-Field

Depth-of-field can be achieved by jittering rays within an aperture. In the following example, the aperture is modeled as a circle with a radius of 0.5 and the focal length is 10.
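A common way to implement this is the thin-lens model, sketched below in C++. This is an assumption about the approach rather than the project's code: the names `thinLensRay`, `Vec3`, and `Ray` are hypothetical, the camera is assumed to look down the negative z axis, and the "focal length" in the text is interpreted here as the distance to the plane of focus.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return v * (1.0f / len);
}

struct Ray { Vec3 origin, dir; };

// Assumed convention: the camera looks down -z, with +x right and +y up.
// pinholeDir is the normalized direction of the un-jittered camera ray.
inline Ray thinLensRay(Vec3 camPos, Vec3 pinholeDir, float apertureRadius,
                       float focalDistance, std::mt19937& rng) {
    assert(apertureRadius >= 0.0f && focalDistance > 0.0f);
    assert(pinholeDir.z < 0.0f);
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    // Sample the circular aperture; the sqrt keeps the density uniform in area.
    float r = apertureRadius * std::sqrt(u01(rng));
    float phi = 6.2831853f * u01(rng);
    Vec3 lensPoint = camPos + Vec3{r * std::cos(phi), r * std::sin(phi), 0.0f};
    // Point where the un-jittered ray crosses the plane of focus.
    float t = focalDistance / (-pinholeDir.z);
    Vec3 focusPoint = camPos + pinholeDir * t;
    // The new ray starts on the lens and still passes through that point.
    return {lensPoint, normalize(focusPoint - lensPoint)};
}
```

Every ray for a given pixel converges on the same point of the focal plane, so geometry at that distance stays sharp while anything nearer or farther is blurred in proportion to the aperture radius.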

Figures: DoF OFF, DoF ON.

All scenes were rendered in 800x800 resolution with 2000 spp and 8 light bounces.

Mesh Loading

With the help of the tinyobjloader and tinygltf libraries, the path tracer is able to load .obj files and, partially, .gltf files. Thus, we can render more complex scenes and put more stress on the path tracer.

Procedural Textures

Procedural textures can be achieved by using the barycentric-interpolated UV coordinates of the intersection point. There is hardly any performance impact. Check out the following example.
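As a sketch of the idea, the C++ below interpolates per-vertex UVs with barycentric weights and evaluates two simple patterns. The function names, the tile count parameter, and the color values are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float u, v; };
struct Color { float r, g, b; };

// Interpolates per-vertex UVs with barycentric weights (w0 + w1 + w2 == 1).
inline Vec2 interpolateUV(Vec2 a, Vec2 b, Vec2 c, float w0, float w1, float w2) {
    assert(std::fabs(w0 + w1 + w2 - 1.0f) < 1e-4f);
    return {w0 * a.u + w1 * b.u + w2 * c.u,
            w0 * a.v + w1 * b.v + w2 * c.v};
}

// Checkerboard with `tiles` squares along each UV axis.
inline Color checkerboard(Vec2 uv, int tiles) {
    int iu = static_cast<int>(std::floor(uv.u * tiles));
    int iv = static_cast<int>(std::floor(uv.v * tiles));
    bool dark = ((iu + iv) % 2) != 0;
    return dark ? Color{0.1f, 0.1f, 0.1f} : Color{0.9f, 0.9f, 0.9f};
}

// Gradient that maps u to red and v to green.
inline Color gradient(Vec2 uv) { return {uv.u, uv.v, 0.0f}; }
```

Because the color is computed from a handful of arithmetic operations and needs no texture memory fetch, the cost per shading call is small.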

Figures: Gradient Mario, Checkerboard Mario.

All scenes were rendered in 800x800 resolution with 1000 spp and 8 light bounces.

Open Image Denoise

Open Image Denoise is a high-performance, high-quality denoising library for ray tracing. It is able to remove noise from rendered images without losing much detail. Auxiliary inputs such as albedo and normal maps are fed to the denoiser's pre-filter to improve the quality of the denoised image.

The denoiser is integrated into the system as a post-processing step. It is triggered at a fixed interval of iterations, and the denoised image is blended into the original image using an exponential moving average.

$$ \text{Image} = (1 - \alpha) \cdot \text{Image} + \alpha \cdot \text{Denoised} $$

Although it does have a small impact on performance, the quality of the image is significantly improved, and we can get a much cleaner image with far fewer samples.
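The exponential moving average blend from the formula above can be sketched as follows. The function name `blendDenoised` and the flat `std::vector<float>` image layout are assumptions for illustration; the denoising call itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blends `denoised` into `image` in place, one channel value at a time:
//   image = (1 - alpha) * image + alpha * denoised
inline void blendDenoised(std::vector<float>& image,
                          const std::vector<float>& denoised, float alpha) {
    assert(image.size() == denoised.size());
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (std::size_t i = 0; i < image.size(); ++i) {
        image[i] = (1.0f - alpha) * image[i] + alpha * denoised[i];
    }
}
```

A small alpha keeps most of the raw accumulated image, while an alpha of 1 replaces it with the denoised result outright.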

The following example shows the effect of the denoiser with 200 samples per pixel, a relatively low sample rate.

Figures: Denoiser OFF, Denoiser ON.

All scenes were rendered in 800x800 resolution with 200 spp and 8 light bounces.

Performance Features

Stream Compaction

When a ray hits a light source, escapes into the void, or exceeds the maximum number of bounces, it is terminated. Terminated rays are removed from the ray pool using stream compaction. Luckily, stream compaction is already implemented in the CUDA Thrust library: we can use thrust::remove_if or, in this case, thrust::partition to remove the terminated rays from the ray pool. Any custom work-efficient stream compaction implementation with shared memory optimization and bank conflict avoidance, like Project2-Stream-Compaction, will do just fine.

toytag/Project2-Stream-Compaction
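The partition step can be sketched on the CPU with `std::partition`, which takes the same range-plus-predicate arguments as `thrust::partition`. The `PathSegment` fields and the `compactPaths` helper are hypothetical names used only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-ray state; a real path segment carries more data.
struct PathSegment {
    int pixelIndex;
    int remainingBounces;  // 0 means the path has terminated
};

// Moves live paths to the front of the first `numPaths` entries and
// returns how many of them are still alive.
inline int compactPaths(std::vector<PathSegment>& paths, int numPaths) {
    assert(numPaths >= 0 && numPaths <= static_cast<int>(paths.size()));
    auto alive = [](const PathSegment& p) { return p.remainingBounces > 0; };
    auto mid = std::partition(paths.begin(), paths.begin() + numPaths, alive);
    return static_cast<int>(mid - paths.begin());
}
```

One plausible reason to prefer a partition over a plain removal is that terminated paths stay in the buffer behind the live ones, so their accumulated color can still be gathered at the end of the iteration. The next bounce then launches only as many threads as the returned count.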


First Bounce Caching

When anti-aliasing is not enabled, the first ray from the camera is always the same for every iteration. So we can cache the first ray bounce and reuse it for every iteration. However, this optimization is not particularly useful when more advanced visual features like anti-aliasing, depth-of-field, and motion blur are enabled.
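The caching logic can be sketched as below. The `FirstBounceCache` class, the `Hit` record, and the callback-based interface are assumptions for illustration; on the GPU the cache would be a device buffer copied instead of a `std::vector`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical intersection record for one camera ray.
struct Hit { float t; int materialId; };

// Stores the camera-ray intersections computed on the first iteration
// and hands them back on later iterations instead of recomputing them.
class FirstBounceCache {
public:
    // `intersect` computes the first-bounce hits for all pixels.
    const std::vector<Hit>& get(bool cameraRaysAreDeterministic,
                                const std::function<std::vector<Hit>()>& intersect) {
        assert(static_cast<bool>(intersect));
        if (!cameraRaysAreDeterministic) {
            // Jittered rays differ every iteration, so the cache cannot be used.
            scratch_ = intersect();
            return scratch_;
        }
        if (!valid_) {
            cached_ = intersect();
            valid_ = true;
        }
        return cached_;
    }

private:
    bool valid_ = false;
    std::vector<Hit> cached_;
    std::vector<Hit> scratch_;
};
```

The flag makes the limitation explicit: as soon as any feature randomizes the camera rays, every call falls through to a fresh intersection pass.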

Material Sorting

Additionally, we could sort the rays by material type to improve performance. The idea is that rays with the same material type have similar processing time, so sorting them reduces warp divergence. However, this optimization later proved to be not very useful and even harmful to performance. The reason is that the sorting process itself is very expensive compared to the performance gain: there is no significant performance improvement to compensate for the cost.
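For reference, the sorting step amounts to a key sort on the material id, sketched here on the CPU. The `ShadeableHit` record and `sortByMaterial` helper are hypothetical; on the GPU the usual counterpart would be something like `thrust::sort_by_key` with the material ids as keys.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record tying a path to the material it hit.
struct ShadeableHit {
    int pathIndex;
    int materialId;
};

// Groups hits so that neighbouring entries share a material.
inline void sortByMaterial(std::vector<ShadeableHit>& hits) {
    auto byMaterial = [](const ShadeableHit& a, const ShadeableHit& b) {
        return a.materialId < b.materialId;
    };
    std::stable_sort(hits.begin(), hits.end(), byMaterial);
    assert(std::is_sorted(hits.begin(), hits.end(), byMaterial));
}
```

The sort touches every live ray on every bounce, which is why its cost can easily exceed whatever is saved by more coherent shading.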

Bounding Volume Hierarchy

Bounding volume hierarchy (BVH) is a tree structure on top of the scene geometry to accelerate ray tracing. The idea is to group the scene geometry into a hierarchy of bounding volumes, and the ray tracer can quickly discard the entire group of primitives if the ray does not intersect with the bounding volume.

The image from PBRT, Section 4.3, is a good illustration of a BVH tree. The BVH is built using the equal-count partition method, which tries to split the primitives into two groups of equal size. The BVH is built on the CPU in a linear buffer (a heap-like structure) and then copied to the GPU for ray tracing. The BVH could potentially be optimized by using SAH (Surface Area Heuristic) and by building it directly on the GPU.
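An equal-count build into a heap-like linear buffer might look like the C++ sketch below. All names (`Aabb`, `BvhNode`, `buildNode`, `buildBvh`) are hypothetical, and two details are assumptions not stated in the article: the split axis is taken to be the longest axis of the node's bounding box, and the children of node i are stored at indices 2i+1 and 2i+2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Aabb {
    float min[3] = { 1e30f,  1e30f,  1e30f};
    float max[3] = {-1e30f, -1e30f, -1e30f};
    void grow(const Aabb& o) {
        for (int a = 0; a < 3; ++a) {
            min[a] = std::min(min[a], o.min[a]);
            max[a] = std::max(max[a], o.max[a]);
        }
    }
    float centroid(int a) const { return 0.5f * (min[a] + max[a]); }
};

struct BvhNode {
    Aabb box;
    int first = 0;      // index of the first primitive covered by this node
    int count = 0;      // number of primitives covered (0 marks an unused slot)
    bool leaf = false;
};

// Fills `nodes` in heap order: the children of node i are 2i+1 and 2i+2.
// `prims` holds one bounding box per primitive and is reordered in place.
inline void buildNode(std::vector<BvhNode>& nodes, std::vector<Aabb>& prims,
                      std::size_t i, int first, int count, int leafSize) {
    if (i >= nodes.size()) nodes.resize(i + 1);
    BvhNode node;
    node.first = first;
    node.count = count;
    for (int p = first; p < first + count; ++p) node.box.grow(prims[p]);
    node.leaf = count <= leafSize;
    nodes[i] = node;
    if (node.leaf) return;

    // Assumed heuristic: split along the longest axis of the bounding box.
    int axis = 0;
    for (int a = 1; a < 3; ++a) {
        if (node.box.max[a] - node.box.min[a] >
            node.box.max[axis] - node.box.min[axis]) axis = a;
    }

    // Equal-count partition: the median centroid splits the range in half.
    int half = count / 2;
    auto begin = prims.begin() + first;
    std::nth_element(begin, begin + half, begin + count,
        [axis](const Aabb& l, const Aabb& r) {
            return l.centroid(axis) < r.centroid(axis);
        });

    buildNode(nodes, prims, 2 * i + 1, first, half, leafSize);
    buildNode(nodes, prims, 2 * i + 2, first + half, count - half, leafSize);
}

inline std::vector<BvhNode> buildBvh(std::vector<Aabb>& prims, int leafSize = 1) {
    assert(!prims.empty() && leafSize >= 1);
    std::vector<BvhNode> nodes;
    buildNode(nodes, prims, 0, 0, static_cast<int>(prims.size()), leafSize);
    return nodes;
}
```

Because an equal-count split keeps the tree balanced, the heap layout wastes few slots, and the flat array can be copied to the GPU in one transfer with no pointers to fix up.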

Performance Analysis

Let’s take a look at the performance of the path tracer with different features enabled. Stream compaction plays an important role in the correctness of the algorithm in addition to its performance benefits. So stream compaction is enabled in all tests, and the path tracer with only stream compaction enabled serves as the baseline.

Cornell-Metal and Cornell-Glass are simple scenes with metal or glass balls inside the Cornell box. These spheres are not part of the mesh system, so BVH has no effect on the performance.

More complex scenes like Mario-Metal and Mario-Glass are the same as the previous two scenes except that the spheres are replaced with a Mario mesh. The mesh system is able to load .obj files and, partially, .gltf files. The number of triangles in the Mario mesh is about 5,000.

Lastly, the Teapot-Complex scene consists of 5 teapots with different materials. The teapots are loaded from an .obj file. They are uniformly placed in the scene, and the total number of triangles is about 50,000.

Observations

  • Material Sorting is not a good optimization. It slows down the path tracer. The reason, as hinted before, is that the sorting process itself is very expensive compared to the performance gain: there is no significant performance improvement to compensate for the cost.
  • First Bounce Caching has limited performance improvement. The reason is that the first bounce is only a small part of the entire ray tracing process. Besides, when enabling more advanced visual features like anti-aliasing, depth-of-field, and motion blur, the first bounce is no longer the same for every iteration.
  • BVH is mind-blowing. It improves performance by a factor of 15x, cutting the rendering time by roughly 90%! BVH enables quick discarding of whole groups of primitives when the ray does not intersect their bounding volume. This is especially useful when the scene is complex and the number of primitives is large. Although BVH traversal requires additional global memory access, the performance gain is still significant.

Possible Improvements

  • Subsurface scattering
  • Wavelength dependent refraction
  • Volumetric rendering
  • Texture and normal map
  • Motion blur
  • Environment map
  • Better random number generator
  • BVH with SAH and BVH on GPU
  • Occupancy optimization
  • Shared memory optimization

References

  1. Physically Based Rendering: From Theory To Implementation
  2. glTF Specification and Example BxDF Implementation
  3. GPU-based Importance Sampling
  4. Axis-Aligned Bounding Box (AABB) intersection algorithm
  5. Iterative BVH Traversal with near \(O(1)\) Memory
  6. Open Image Denoise
toytag/CUDA-Path-Tracer
