Below is a brief summary of the model architecture and the key findings. For a more detailed summary, including more data and a downloadable software implementation, please visit the project page. To read the original paper, click on the title above.


Abstract

We present an image quality metric based on the transformations associated with the early visual system: local luminance subtraction and local contrast gain control. Images are first decomposed using a Laplacian pyramid, which subtracts a local estimate of the mean luminance at multiple scales. Each pyramid coefficient is then divided by a local estimate of amplitude (a weighted sum of absolute values). The quality of the distorted image, relative to its undistorted original, is the root mean squared error in this "normalized Laplacian" domain. The weights for the amplitude estimate are optimized on (undistorted) images from a separate database. We show that both the luminance and the contrast stages lead to significant reductions in redundancy, relative to the original image pixels. We also show that the resulting quality metric provides a better account of human perceptual judgments than either MS-SSIM or a recently published gain-control metric based on oriented filters.
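The two stages of the metric can be sketched in a few lines of NumPy. This is a simplified, single-scale illustration, not the published implementation: the box filter stands in for the model's optimized lowpass and amplitude-pooling filters, and the constant `sigma` is an arbitrary placeholder for the optimized scale-specific constant.

```python
import numpy as np

def box_mean(x, radius=2):
    """Local mean via a box filter (a stand-in for the model's
    optimized lowpass and amplitude-pooling filters)."""
    k = 2 * radius + 1
    pad = np.pad(x, radius, mode='edge')
    win = np.lib.stride_tricks.sliding_window_view(pad, (k, k))
    return win.mean(axis=(2, 3))

def normalized_laplacian(x, sigma=0.17):
    """Single-scale sketch: subtract the local mean (luminance
    stage), then divide by a local amplitude estimate plus a
    constant (contrast gain-control stage)."""
    z = x - box_mean(x)                 # local luminance subtraction
    amp = box_mean(np.abs(z)) + sigma   # local amplitude estimate
    return z / amp

def nlp_distance(ref, dist):
    """Quality metric: root mean squared error between reference and
    distorted images in the normalized-Laplacian domain."""
    d = normalized_laplacian(ref) - normalized_laplacian(dist)
    return float(np.sqrt(np.mean(d ** 2)))
```

An identical pair gives distance zero, and the distance grows with the magnitude of the distortion; unlike pixel-domain RMSE, it is computed after both normalization stages.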


Normalized Laplacian pyramid model diagram, shown for a single scale k. The input image at scale k, x(k) (k = 1 corresponds to the original image), is modified by subtracting the local mean (eq. 2). This is accomplished using the standard Laplacian pyramid construction: convolve with lowpass filter L(w), downsample by a factor of two in each dimension, upsample, convolve again with L(w), and subtract the result from the input image x(k). The intermediate image z(k) is then normalized by an estimate of local amplitude, obtained by computing the absolute value, convolving with the scale-specific filter P(k)(w), and adding the scale-specific constant s(k) (eq. 3). As in the standard Laplacian pyramid, the blurred and downsampled image x(k+1) serves as the input image for scale k + 1.

Representation of an example image. x is the original image (left). z is the decomposition of the image using the Laplacian pyramid (three scales shown), each image corresponding to a different scale. Note that the Laplacian pyramid downsamples at each scale; the examples shown here have been upsampled for visualization purposes. y are the corresponding locally contrast-normalized images.

Local mutual information between values and their spatial neighbors within an 11 x 11 local region.  Shown for three representations (image pixels, Laplacian pyramid sub-band, normalized Laplacian pyramid sub-band). Brightness is proportional to the mutual information between a central coefficient and the neighbor at that relative location. Values are estimated from one million image patches. The average mutual information over the whole neighborhood is given above each panel.
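The redundancy measurement behind this figure can be reproduced with a simple histogram ("plug-in") estimate of mutual information. The bin count and offset convention below are illustrative choices, not the paper's estimator; computing `neighbor_mi` for each relative offset would fill a map like the figure's 11 x 11 panels.

```python
import numpy as np

def mutual_info(a, b, bins=16):
    """Plug-in estimate of mutual information (in bits) between two
    equal-length 1-D samples, from their joint histogram."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def neighbor_mi(img, dy, dx, bins=16):
    """MI between each coefficient and its neighbor at relative
    offset (dy, dx), with dy, dx >= 0."""
    a = img[dy:, dx:].ravel()
    b = img[:img.shape[0] - dy, :img.shape[1] - dx].ravel()
    return mutual_info(a, b, bins)
```

Spatially correlated signals (like raw pixels) yield high neighbor MI, while a whitened or normalized representation drives it toward zero, which is the reduction in redundancy the figure quantifies.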

Comparison of quality metrics to human perceptual data. Each plot shows the difference mean opinion score of human observers (DMOS) as a function of the prediction of a quality metric, for 1700 images corrupted by different types and magnitudes of distortion. Performance of each metric is summarized with three numbers (provided above each plot): the Pearson correlation before fitting a logistic function (r1), and the Pearson correlation (r2) and the prediction error (RMSE) after fitting a logistic function (black line).
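The evaluation procedure in this caption (Pearson correlation before and after a logistic fit, plus the RMSE of the fit) can be sketched with SciPy. The four-parameter logistic and the synthetic data below are illustrative stand-ins for the actual metric/DMOS pairs.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b, c, d):
    """Four-parameter logistic mapping metric values to DMOS."""
    return a / (1.0 + np.exp(-(x - b) / c)) + d

def evaluate(metric, dmos):
    """Return (r1, r2, rmse): Pearson correlation before the fit,
    and correlation / prediction error after fitting the logistic."""
    r1 = np.corrcoef(metric, dmos)[0, 1]
    p0 = [dmos.max() - dmos.min(), np.median(metric),
          (metric.max() - metric.min()) / 4.0, dmos.min()]
    popt, _ = curve_fit(logistic, metric, dmos, p0=p0, maxfev=5000)
    pred = logistic(metric, *popt)
    r2 = np.corrcoef(pred, dmos)[0, 1]
    rmse = float(np.sqrt(np.mean((pred - dmos) ** 2)))
    return float(r1), float(r2), rmse
```

When the metric relates to DMOS through a saturating nonlinearity, r2 exceeds r1 because the logistic absorbs the monotone nonlinear part of the relationship, leaving the residual scatter as the RMSE.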