Below is a brief summary of our rendering framework and several key demonstrations of its power. To read the original paper, click on the title above. For a more detailed summary, including additional results not presented in the paper, as well as MATLAB and Python (Theano) implementations of the NLP distance model and code to reproduce the results shown here, please click here.


Abstract

We develop a framework for rendering photographic images by directly optimizing their perceptual similarity to the original visual scene. Specifically, over the set of all images that can be rendered on a given display, we minimize the normalized Laplacian pyramid distance (NLPD), a measure of perceptual dissimilarity that is derived from a simple model of the early stages of the human visual system. When rendering images acquired with a higher dynamic range than that of the display, we find that the optimization boosts the contrast of low-contrast features without introducing significant artifacts, yielding results of comparable visual quality to current state-of-the-art methods, but without manual intervention or parameter adjustment. We also demonstrate the effectiveness of the framework for a variety of other display constraints, including limitations on minimum luminance (black point), mean luminance (as a proxy for energy consumption), and quantized luminance levels (halftoning). We show that the method may generally be used to enhance details and contrast, and, in particular, can be used on images degraded by optical scattering (e.g., fog).



Perceptually optimized rendering framework. When we view a real-world scene, the luminances, specified by a vector S, give rise to an internal perceptual representation f(S). While luminances in the real world can range from complete darkness (0 cd/m²) to extremely bright (e.g., midday sun, roughly 10⁹ cd/m²), a typical display can generate a relatively narrow range of roughly 5 to 300 cd/m². The optimization goal is to adjust the luminances I generated by the display to minimize the difference between the perceptual representations f(S) and f(I), while remaining within the set of images that can be generated by the display.

Rendering Framework

Here, we formulate a general solution for perceptually accurate rendering, directly optimizing the rendered image to minimize its perceptual difference from the light intensities of the original scene, subject to all constraints imposed by the display. This constrained optimization formulation relies on four ingredients: knowledge of the original scene luminances (or calibration information that allows those luminances to be computed), a measure of the perceptual similarity between images, knowledge of the display constraints, and a method for optimizing the image to be rendered. We use a model of perceptual similarity loosely based on the transformations of the early stages of the human visual system [specifically, the retina and lateral geniculate nucleus (LGN)], which has previously been fit to a database of human psychophysical judgments. Because this model is continuous and differentiable, the optimization problem can be solved efficiently with first-order constrained optimization techniques. We show that the solution is well defined and general, and therefore represents a framework for solving a wide class of rendering problems.
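As a concrete illustration, the optimization can be carried out with projected gradient descent: take a gradient step on the perceptual distance, then project back onto the set of displayable images. The sketch below is a minimal version of this idea, not the paper's exact solver. It assumes a helper nlpd_grad(S, I) that returns the gradient of the perceptual distance with respect to the rendered image I (in practice, an automatic-differentiation framework such as Theano, used in the released code, can supply this), and it treats the display constraint as a simple per-pixel luminance range.

```python
import numpy as np

def render(S, nlpd_grad, I_min=5.0, I_max=300.0, n_steps=500, lr=1.0):
    """Minimal sketch: find display luminances I (cd/m^2) minimizing the
    perceptual distance to scene luminances S, subject to the display's
    luminance range. `nlpd_grad(S, I)` is an assumed helper returning
    the gradient of NLPD(S, I) with respect to I."""
    # Initialize with a linear rescaling of the scene into the display range.
    I = I_min + (S - S.min()) / (S.max() - S.min()) * (I_max - I_min)
    for _ in range(n_steps):
        I = I - lr * nlpd_grad(S, I)   # gradient step on the perceptual distance
        I = np.clip(I, I_min, I_max)   # project back onto the displayable set
    return I
```

Other display constraints (black point, mean luminance, quantized levels) amount to changing the projection step while the rest of the loop stays the same.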

Computing Perceptual Distance


Normalized Laplacian Pyramid Perceptual Transform. The scene luminances S (in cd/m²) are first transformed using a power function (top left). The transformed luminance image is then decomposed into frequency channels using the recursive implementation of the Laplacian pyramid. Each channel z_k is then divided by a weighted sum of local amplitudes (computed with lowpass filter P) plus a constant σ. The final lowpass channel x_N is also normalized, but with distinct parameters (top right). Symbols ↑ and ↓ indicate upsampling and downsampling by a factor of 2, respectively.
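The sketch below implements this transform in broad strokes with NumPy. The power-function exponent, the constant σ, and the binomial filter standing in for both the pyramid filter and the amplitude filter P are illustrative placeholders, not the parameter values fit to psychophysical data in the paper (which also uses distinct normalization parameters for the final lowpass channel).

```python
import numpy as np
from scipy.ndimage import convolve

# 5x5 binomial lowpass filter (sums to 1), a stand-in for the fitted filters.
_LP = np.outer([1., 4., 6., 4., 1.], [1., 4., 6., 4., 1.]) / 256.

def _blur(img):
    return convolve(img, _LP, mode='reflect')

def _down(img):                      # ↓: blur, then keep every other sample
    return _blur(img)[::2, ::2]

def _up(img, shape):                 # ↑: insert zeros, then blur (x4 restores gain)
    out = np.zeros(shape)
    out[::2, ::2] = img
    return 4. * _blur(out)

def nlp_transform(S, gamma=0.4, sigma=0.2, n_levels=5):
    """Sketch of the perceptual transform f(S). gamma and sigma are
    illustrative stand-ins for the fitted parameters."""
    x = S ** gamma                   # front-end power function
    channels = []
    for _ in range(n_levels):
        coarse = _down(x)
        z = x - _up(coarse, x.shape)          # bandpass channel z_k
        amp = _blur(np.abs(z))                # local amplitude (filter P)
        channels.append(z / (sigma + amp))    # divisive normalization
        x = coarse
    # Final lowpass channel x_N (normalized here with the same parameters
    # for simplicity; the paper uses distinct ones).
    channels.append(x / (sigma + _blur(np.abs(x))))
    return channels
```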


Construction of the NLP Distance Measure. Two images are transformed by f(·) to a perceptual representation, yielding two NLPs (see figure above). We compute the α-norm over the vector of differences for each frequency channel, and then combine these over channels using a β-norm. For all rendering results, we use α = 2.0 and β = 0.6, which are optimized to fit human perceptual ratings of distorted images.
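Given the transform, the distance itself is short. Below is a sketch that reuses the nlp_transform function from the previous section (so the exact normalization details inherit its simplifications); α = 2.0 and β = 0.6 are the values quoted above.

```python
def nlpd(S, I, alpha=2.0, beta=0.6):
    """Sketch of the NLP distance between a scene S and a rendered image I:
    an alpha-norm over each channel's differences, combined across channels
    with a beta-norm."""
    f_S, f_I = nlp_transform(S), nlp_transform(I)
    per_channel = [np.mean(np.abs(zs - zi) ** alpha) ** (1. / alpha)
                   for zs, zi in zip(f_S, f_I)]
    d = np.asarray(per_channel)
    return np.mean(d ** beta) ** (1. / beta)
```

Because every operation here is differentiable, implementing this in an autodiff framework (Theano in the released code) directly yields the nlpd_grad function assumed by the optimization sketch above.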

Varying Image Acquisition Conditions

We performed a set of experiments to test the capabilities of our optimization framework under different image acquisition conditions. We begin with calibrated images, for which we know the exact luminance values (in cd/m²) of the original scene.


Rendering of a calibrated HDR image on a display with a limited luminance range. The scene luminances for this image spanned the range from S_min = 0.78 cd/m² to S_max = 16,200 cd/m², whereas the display luminances are assumed to lie between 5 cd/m² and 300 cd/m². Left: the image rendered by linear rescaling of luminance values into the display range. Center: the image rendered using a state-of-the-art tone-mapping algorithm [23]. Right: the image rendered using the proposed method of minimizing the NLPD metric subject to the display constraints.

We followed this with uncalibrated HDR images, for which we must make an educated guess about the luminance range of the original scene.


Rendering of an uncalibrated HDR image on a display with a limited luminance range. Linear mapping of luminances leads to loss of detail (top left: rescaling of luminances to the display range, assuming S_max = 300 cd/m²; top center: rescaling of luminances, assuming a more realistic value of S_max = 10⁶ cd/m²). Top right: the image rendered using [23]. Bottom: the image optimized for NLPD, with different assumed maximum luminance values (bottom left: S_max = 10⁵ cd/m²; bottom center: S_max = 10⁶ cd/m²; bottom right: S_max = 10⁷ cd/m²).

Detail Enhancement and Haze Removal

We showed in the preceding sections that using knowledge about the image acquisition process helps greatly in automatically rendering images, given the display constraints. In some cases, however, detail visibility in the scene itself might be unsatisfactory. Intuitively, photographers know that the amount of detail visible in a scene depends on the amount of available light. Once the image has been acquired, it is of course not possible to alter the light sources. However, since the scene luminances scale linearly with the intensity of the light sources, our method allows us to simulate increased intensity post hoc, by linearly rescaling the scene luminances S.
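A minimal sketch of this rescaling is below; the target range [S_min, S_max] is an assumption chosen by the user. A pure change of light-source intensity corresponds to multiplying S by a constant; mapping onto an explicit range, as in the figure captions here, is the affine generalization of that idea.

```python
def rescale_luminances(S, S_min, S_max):
    """Linearly map scene luminances onto an assumed range
    [S_min, S_max] (cd/m^2), simulating a change in the intensity
    of the illumination before rendering."""
    return S_min + (S - S.min()) / (S.max() - S.min()) * (S_max - S_min)
```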

Perhaps surprisingly, this same method of detail enhancement can also be applied to the problem of haze removal. In a hazy scene, local contrast is effectively reduced (roughly speaking, by the addition of a constant level of scattered light), which makes detail more difficult to discern.
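Hypothetical usage matching the figure below, assuming the sketches above (hazy_luminances and nlpd_grad are placeholder names, not part of the released code):

```python
# Map the hazy image onto an assumed scene range of [5, 1e4] cd/m^2,
# then optimize the rendering for the display as before.
S_target = rescale_luminances(hazy_luminances, S_min=5.0, S_max=1e4)
dehazed = render(S_target, nlpd_grad)
```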


Example of haze removal. Left: the original image. Right: the image processed by optimizing NLPD, with S_min = 5 cd/m² and S_max = 10⁴ cd/m².

For more results and applications, including rendering with limited power consumption or a discrete set of grey levels, please click through to the paper.