Skip to main content
  1. Posts/

Path Tracing Denoiser

·689 words·4 mins
cuda parallel computing physically based rendering ray tracing
Zhenzhong "Anthony" Tang
Author
Zhenzhong “Anthony” Tang
Student, Researcher, and Developer
Table of Contents
Path Tracing - This article is part of a series.
Part 2: This Article

Overview
#

A path tracing denoiser that uses geometry buffers (G-buffers) to guide a smoothing filter.

CornellStanford Bunny

The denoise filter is based on the paper “Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering,” by Dammertz, Sewtz, Hanika, and Lensch.

rt), n) and x) are the input buffers(Ray Tracing, Normal, World Space Position). 1), 2)show 2 levels of edge-avoiding À-Trous filter. f)is the final output image after tone mapping. d)shows the optional diffuse buffer. Image from paper.

Filter
#

The filter is based on B3-spline interpolation \((1/16, 1/4, 3/8, 1/4, 1/16)\) and extended to a \(5\times5\) kernel. The kernel is applied in a-trous style, iteratively increasing the step width by a factor of 2.

However the filter itself is not edge-aware, meaning that it will smooth across edges and act essentially as a Gaussian blur.

80x80 A-Trous Filter80x80 Gaussian Filter

G-buffer
#

The G-buffer contains the following information:

  • Normal
  • Position
NormalPosition

Results
#

Combined with G-Buffer, the filter becomes edge-aware, meaning that it will not smooth across edges. The result is a much cleaner image.

Original80x80 FilterReference (1000 iterations)

Default Cornell scene with 10 samples per pixel, 800x800 resolution, 8 light bounces per sample.

Original80x80 FilterReference (1000 iterations)

Slightly more complex Stanford Bunny scene with 10 samples per pixel, 800x800 resolution, 8 light bounces per sample.

Performance Analysis
#

We have implemented the denoiser in CUDA, and applied some memory footprint optimizations to our G-buffer.

  • We adopted oct-encoding for normals, which reduces the vec3 normal to vec2.
  • Instead of storing the full vec3 position, we only store a float z-depth and use the pixel index to reconstruct the world space position.

The following graph compares the influence of filter size and image resolution on our denoising filters. It is tested on the stanford bunny scene with 10 samples per pixel, 800x800 resolution, 8 light bounces per sample.

From the comparison above, we can see that as the filter size doubles, the average runtime increases by approximately the runtime of one iteration of the filter. This is expected, as the a-trous filter is applied iteratively, and the runtime of each iteration is approximately the same. We can see the trend lines are also linear.

Our filter is heavily memory bound, meaning that by limiting the memory footprint of G-buffer, there are significant reduction in global memory access, and resulting in huge performance gain.

The next graph compares the influence of image resolution on our denoising filters. It is also tested on the stanford bunny scene with 10 samples per pixel, 80x80 filter size, 8 light bounces per sample.

There runtime is proportional to the number of pixels in the image. Since the resolution is increasing exponentially, the trend line is also exponential. With more heavy global memory access, G-buffer optimization is more effective.

We also have detail results for the filters with different sizes in different scenes. Shown below.

Original5x5 Filter10x10 Filter
20x20 Filter40x40 Filter80x80 Filter
Original5x5 Filter10x10 Filter
20x20 Filter40x40 Filter80x80 Filter

All scenes are rendered with 10 samples per pixel, 800x800 resolution, 8 light bounces per sample.

Across different scenes, cornell and bunny are two of the most representative scenes. The cornell scene is simple and mostly diffusive materials. Our denoising filter performed very well in this case. On the other hand, the bunny scene is more complex, with more specular and refractive materials. We can clearly see that the filter is starting to blur some details of the bunny and the reflection and refraction is not nearly as clear as the reference image. However, the filter performed consistently in different lighting conditions, in most cases there are only minor artifacts/over-exposure.

This is generally a good results, though it might not look as good in merely 10 iterations. The denoising filter significantly reduces the number of iterations needed to achieve a decent result. Typically 100 iterations with denoising is comparable to or even better than 1000 iterations without denoising.

References
#

  1. “Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering,” by Dammertz, Sewtz, Hanika, and Lensch
  2. Oct-encoding Normals
toytag/CUDA-Denoiser

null
0
0
Path Tracing - This article is part of a series.
Part 2: This Article

Related

Path Tracer: Physically Based Rendering
·1593 words·8 mins
cuda parallel computing physically based rendering ray tracing
Boids Flocking Simulation
·1028 words·5 mins
cuda parallel computing
Search Engine
·2112 words·10 mins
distributed system microservices apache storm apache spark docker swarm kubernetes