SPIE 3959-47


Spatial Summation of Face Information

Christopher W Tyler and Chien-Chung Chen




Do all parts of the face contribute equally to face detection, or are some parts more detectable than others? We addressed this question in a face summation study using two-alternative forced-choice psychophysics. The task was to detect the presence of normalized frontal-face images within aperture windows of varying extent. The face stimuli were scaled to equal eye-to-chin distance and foveated on the bridge of the nose. The images were windowed by a fourth-power Gaussian envelope ranging from the center of the nose to the full face width. Eight faces (4 male and 4 female) were presented in randomized order, intermixed with 8 control stimuli consisting of phase-scrambled versions of the images with equal Fourier energy.

The integration functions for detection of random images did not deviate significantly from a log-log slope of –0.5, suggesting the operation of a set of ideal integrators with probability summation over all aperture sizes. The data for face detection showed that observers were not ideal integrators for the information in the face images, but integrated linearly up to some small size and failed to gain any improvement for information beyond some larger size. This performance suggested the operation of a specialized face template filter at detection threshold, differing in extent among the observers.



The perceptual processing of face images has been the subject of scientific study since the time of Charles Darwin and Francis Galton in the middle of the 19th century. What has never been studied, however, is whether there is specialized processing for face information that can contribute to the detection of face images in comparison with other comparable images. One difficulty in this endeavor is to specify the other comparable images in a relevant fashion. For example, if one compared the detection of faces to random-dot noise, differences in detectability could be attributed to the differences in spatial frequency spectrum between the two types. The comparison stimuli will therefore be equated to the face stimuli in spectral characteristics (and in the retinal range within which they are presented).

A second property of faces is that they are bilaterally symmetric (to first approximation). This symmetry imposes a pairwise constraint on the points in the image. We therefore wished to provide a corresponding constraint in the noise stimulus. However, the bilateral symmetry may be an important component of the face detectability. We therefore imposed centric symmetry on the noise, where each point matches the point diametrically across the center of the pattern.
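The centric-symmetry constraint can be illustrated with a short sketch. Averaging an image with its own 180° rotation is only one possible way to impose the constraint, chosen here for brevity; it is an assumption, not necessarily the procedure used to build the stimuli, and it alters the amplitude spectrum, so a spectrum match would have to be reapplied afterwards. The function name is hypothetical.

```python
import numpy as np

def impose_centric_symmetry(image):
    """Make each pixel equal to the pixel diametrically opposite the array center."""
    rotated = image[::-1, ::-1]      # 180-degree rotation about the array center
    return 0.5 * (image + rotated)   # average of the image and its rotation

rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))   # placeholder noise, not a real stimulus
sym = impose_centric_symmetry(noise)
```

Unlike bilateral symmetry, which pairs points across a vertical axis, this operation pairs each point with its reflection through the center of the pattern.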

To evaluate the spatial processing properties at threshold, we varied the aperture in which the face stimuli were visible. For the noise stimuli, the match to a contrast energy prediction in previous studies implies that detection sensitivity should increase with the square root of the area of the aperture. The presence of specialized face-detection mechanisms would be revealed by a more rapid improvement in detectability with area for the face stimuli than for the noise stimuli. This increased rate of improvement would operate over the range where specialized detectors existed, asymptoting to the noise-alone sensitivity beyond that range. This is the hypothetical form of the data that would reveal any specialized processing for face information.



We used two types of images as stimuli (Fig 1). The face images comprised four male and four female faces. These face stimuli were scaled to the separation between the eyes and the mouth, processed by a high-pass filter and normalized to equalize the contrast energy in each. The control stimuli were noise patterns derived from the same power spectrum as the face images while the phase spectrum was generated from a uniformly distributed random number generator with a range from zero to 2π radians. This procedure is equivalent to generating random-dot images and filtering them to match the amplitude spectrum of each face.
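A minimal sketch of this kind of phase scrambling follows, using the equivalent formulation in which the phases are taken from a white-noise image so that the result is real-valued. The function name, image size and placeholder image are assumptions for illustration; the high-pass filtering and centric-symmetry steps are not reproduced here.

```python
import numpy as np

def phase_scramble(image, rng):
    """Noise image with the amplitude spectrum of `image` and random phases."""
    amplitude = np.abs(np.fft.fft2(image))
    # Phases of white noise are uniformly distributed over a 2*pi range and
    # conjugate-symmetric, so the inverse transform below is real-valued.
    white = rng.standard_normal(image.shape)
    phase = np.angle(np.fft.fft2(white))
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * phase))
    return scrambled.real  # imaginary part is only rounding error

rng = np.random.default_rng(1)
face = rng.random((128, 128))        # placeholder standing in for a face image
noise = phase_scramble(face, rng)
```

Because only the phases change, the noise image has the same amplitude spectrum, and hence the same Fourier energy, as the image it was derived from.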


Fig. 1. Sample stimuli. Left: high-pass filtered face image. Right: phase-randomized noise with matched spatial frequency spectrum.


Each image was then presented in a smooth-edged circular aperture whose width formed the independent variable of the experiment. The aperture was defined by a two-dimensional fourth-power Gaussian function $\exp(-(x^4+y^4)/\sigma^4)$, where $\sigma$ is the scale parameter that controls the size of the image. We used 6 values of $\sigma$ ranging from 0.2° to 1.2° in half-octave steps.
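The aperture family can be sketched as follows, assuming the scale parameter divides as a fourth power and using the 1' per pixel resolution reported for the display. The grid size and function name are illustrative assumptions. Note that six strict half-octave steps starting at 0.2° end near 1.13°, so the quoted upper value of 1.2° is presumably rounded.

```python
import numpy as np

PIXELS_PER_DEGREE = 60  # from the stated 1' of visual angle per pixel

def fourth_power_gaussian(size_px, sigma_px):
    """Aperture exp(-(x^4 + y^4) / sigma^4) on a square grid centered on the array."""
    half = (size_px - 1) / 2.0
    y, x = np.mgrid[0:size_px, 0:size_px] - half
    return np.exp(-(x**4 + y**4) / sigma_px**4)

# Six scale values in half-octave (factor sqrt(2)) steps, starting at 0.2 deg
sigma_deg = 0.2 * 2.0 ** (0.5 * np.arange(6))
apertures = [fourth_power_gaussian(257, s * PIXELS_PER_DEGREE) for s in sigma_deg]
```

Each windowed stimulus would then be the product of an image and one of these apertures. The fourth-power exponent gives a flatter plateau and a steeper edge than an ordinary Gaussian.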


Equipment & Procedure

The stimuli were presented on a CRT monitor controlled by a PowerMac 7100 host machine. The monitor input-output intensity function was measured with a LightMouse photometer (Tyler, 1997). A bit-stealing technique (Tyler et al., 1992) was used to compute the linear look-up table. The mean luminance of the monitor was 30 cd/m². At the viewing distance of 114 cm, each pixel on the monitor corresponded to 1' of visual angle.
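As a check on the viewing geometry, the pixel pitch implied by these figures can be computed directly. This is a sketch of the arithmetic only; the physical pixel size is not stated in the text and is derived here from the viewing distance and the angular size of a pixel.

```python
import math

viewing_distance_cm = 114.0
pixel_angle_deg = 1.0 / 60.0   # 1 arcmin per pixel

# Size subtending a given visual angle: 2 * d * tan(angle / 2)
pixel_size_cm = 2.0 * viewing_distance_cm * math.tan(math.radians(pixel_angle_deg) / 2.0)
pixels_per_degree = round(1.0 / pixel_angle_deg)
```

The result is a pitch of roughly a third of a millimeter, with 60 pixels per degree of visual angle.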


Fig. 2. Depiction of the six sizes of the fourth-power Gaussian aperture in which the faces were shown.


We used a temporal two-alternative forced-choice paradigm and the Psi dynamic threshold-seeking method (Kontsevich & Tyler, 1999) to measure thresholds at the 75% correct level. The temporal waveform of each stimulus interval was a square pulse with a duration of 300 ms, with an 800 ms blank period between the two intervals. The reported thresholds were averages of four measurements.
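To make the 75% correct criterion concrete, the sketch below assumes a Weibull psychometric function for the two-alternative task. The functional form and the parameter values are assumptions for illustration; the text does not specify them, and the adaptive Psi procedure itself is not reproduced here.

```python
import math

def weibull_2afc(contrast, alpha, beta):
    """Proportion correct in 2AFC: rises from 0.5 (guessing) toward 1.0."""
    return 1.0 - 0.5 * math.exp(-((contrast / alpha) ** beta))

def threshold_at(p_correct, alpha, beta):
    """Contrast giving `p_correct`, obtained by inverting weibull_2afc."""
    return alpha * (-math.log(2.0 * (1.0 - p_correct))) ** (1.0 / beta)

# Hypothetical parameters: alpha sets the contrast scale, beta the steepness
alpha, beta = 0.1, 2.0
threshold_75 = threshold_at(0.75, alpha, beta)
```

Under this form, 75% correct lies halfway between the 50% guessing rate and perfect performance.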

Three observers participated in the experiment, all naive to the purpose of the study but experienced in psychophysical experiments. Observers EST and HAB had corrected-to-normal visual acuity, and observer ASM had normal visual acuity (20/20).



To set the scene, consider first the data for summation of randomized information (dashed curves, Fig. 3). Sensitivity improves with aperture size up to the largest aperture. The improvements approximate a slope of –0.5 on double-log coordinates (thin straight line) for two of the three observers. This slope corresponds to the square-root power behavior of the contrast energy model, which bases detectability on the square of the absolute contrast summed throughout the image. (Note that if there were linear summation throughout the image, the slope would be –1.) In this respect, the noise summation data tend to conform to the simple model of contrast energy summation that characterized the responses to a battery of simple stimuli studied by the Modelfest Group (Watson, 2000).
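The two reference slopes can be reproduced numerically. The sketch assumes a stimulus of uniform contrast, expresses aperture size as area in arbitrary units, and fits a straight line in log-log coordinates; the variable names and unit constants are illustrative.

```python
import numpy as np

area = 2.0 ** np.arange(6)   # aperture area in arbitrary units, octave steps

# Contrast energy model: threshold is reached when contrast**2 * area is constant
energy_threshold = 1.0 / np.sqrt(area)
# Linear summation: threshold is reached when contrast * area is constant
linear_threshold = 1.0 / area

def loglog_slope(x, y):
    """Slope of the best-fitting straight line on double-log coordinates."""
    slope, _intercept = np.polyfit(np.log10(x), np.log10(y), 1)
    return slope

energy_slope = loglog_slope(area, energy_threshold)
linear_slope = loglog_slope(area, linear_threshold)
```

The same `loglog_slope` helper could be applied to measured thresholds to compare an observer's summation function with either prediction.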

The experimental question we address is: do observers perform any differently when the information is structured in a face configuration? The answer seems to be yes, in general. The data for face detection showed approximately linear integration up to some aperture size that varies among observers and showed no further improvement for information beyond some larger size (Fig. 4). The form of the detection functions suggests the operation of a single-sized filter with an aperture of 1/3 to 2/3 of the face width, with no influence of contrast information outside this aperture (i.e., for larger diameters). In some cases, face detection shows significantly greater sensitivity than noise detection. Observer EST, for example, has thresholds that are significantly lower, by as much as 5 dB (nearly a factor of 2), for face detection than for noise detection.
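The decibel figure can be converted to a contrast ratio as follows, assuming the usual convention for contrast thresholds of 20 times the base-10 logarithm of the ratio (the text does not state the convention explicitly).

```python
def db_to_contrast_ratio(db):
    """Contrast ratio corresponding to a difference in dB (20 * log10 convention)."""
    return 10.0 ** (db / 20.0)

ratio = db_to_contrast_ratio(5.0)   # about 1.78
```

Under this convention a full factor of 2 corresponds to about 6 dB, so 5 dB falls somewhat short of a doubling of sensitivity.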

Fig. 3. Contrast threshold for detectability of noise presented within circular apertures of the width given on the abscissa of the graphs. The different colorings identify the three observers. The data do not deviate significantly from a log-log slope of –0.5 (oblique thin line), based on the error bars of 1 sem.

Fig. 4. Contrast threshold for detectability of faces presented within circular apertures of the width given on the abscissa of the graphs for each of three observers. The different symbols identify the three observers; oblique thin line: log-log slope of –0.5.



The data indicate that there is specialized processing of human face information at detection threshold. All observers show a strong summation of face information up to an intermediate size, asymptoting to negligible summation beyond that size. In two observers, detectability of faces at this optimal size is significantly better than detectability of noise with amplitude spectra identical to those of the faces. For these observers, the specific configuration of the facial features was particularly salient for detection performance. All three observers showed no further advantage for the information around the edge of the face: the hair, ears and jawline. Although these facial characteristics are well known to be significant for facial discrimination (Ullman, 1996), they seem to play no role in threshold detectability, at least under our test conditions.

Detection of the noise, on the other hand, shows a reasonable adherence to a contrast energy model (Watson, 2000) that assumes that the observers are able to sum the contrast information linearly over the entire aperture of the fovea (which is much larger than a typical receptive field). It further assumes that there is an accelerating transducer for contrast preceding the summation, with a power exponent of 2. There seems to be no obvious reason for such a nonlinear process in neuronal image processing, but it does provide an acceptable description of the data.

The alternative framework that is often presented as generating a log-log slope of –0.5 is the Ideal Observer model (Green & Swets, 1966; Banks et al., 1987). This model assumes that the detection mechanism has access to a summing field matching the extent of every stimulus aperture. While it may be a plausible model for experiments with blocked trials, where the observer knows in advance which aperture is to be presented, it is more difficult to implement an Ideal Observer strategy in experiments with randomized trials, such as the present case. The observer cannot know in advance which size of aperture is to be presented, and therefore cannot target the correct filter size. However, there is an alternative strategy that could be employed by means of probability summation (Tyler and Chen, submitted for publication). If the observer can implement, say, fourth-power probability summation over the set of filter sizes, the most stimulated filter will tend to win out over all other filters by virtue of its larger signal. Probability summation could thus act as a self-selecting mechanism for the optimal size for analysis when the data have no other structure.
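The self-selecting property of fourth-power summation can be illustrated with a small sketch. It treats probability summation as Minkowski pooling with an exponent of 4, a common approximation that is assumed here rather than taken from the text, and the filter responses are hypothetical values chosen for illustration, not measurements.

```python
import numpy as np

def minkowski_pool(responses, exponent=4.0):
    """Pooled response (sum of |r|**exponent) ** (1 / exponent) across filters."""
    responses = np.abs(np.asarray(responses, dtype=float))
    return np.sum(responses ** exponent) ** (1.0 / exponent)

# Hypothetical responses of five filter sizes to one aperture; the third
# filter is the best match to the stimulus and so responds most strongly.
responses = np.array([0.2, 0.5, 1.0, 0.5, 0.2])
pooled = minkowski_pool(responses)
```

With these values the pooled response is only about 3% above the largest single response, so the output is dominated by the best-matched filter without the observer needing to know the aperture size in advance.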

On the other hand, there does not seem to be a similar range of filter sizes available in the case of the face information, where the functions are closer to the form expected for a single-sized filter. Again, these conditions were intermixed with the noise conditions, so the observers would not have had the opportunity to switch strategy between the two stimulus types. It seems to follow that they had a central-face-template filter available, in addition to the filters available for the noise detection, that again was selected by probability summation from among the available range. While of no value in improving the noise detectability, the template filter improved face detection performance within its range of effectiveness. Beyond its range, performance reverts to the level for noise images.



Banks M.S., Geisler W., Bennett P.J. (1987) The physical limits of grating visibility. Vision Research 27, 1915-24.

Green D.M., Swets J.A. (1966) Signal Detection Theory and Psychophysics. (Wiley: New York).

Kontsevich L.L., Tyler C.W. (1999) Bayesian adaptive estimation of psychometric slope and threshold. Vision Research 39, 2729-2737.

Tyler C.W. (1997) The Morphonome image psychophysics software and a calibrator for Macintosh systems. Spatial Vision 10, 479-484.

Tyler C.W., Chan H., Liu L., McBride B., Kontsevich L.L. (1992) Bit stealing: how to get 1786 or more gray levels from an 8-bit color monitor. In Human Vision, Visual Processing & Digital Display III, Ed. B. Rogowitz, SPIE, Bellingham, WA 351-364.

Tyler C.W., Chen C.-C. Signal detection theory in the 2AFC paradigm: Attention, channel uncertainty and probability summation. Vision Research (submitted for publication).

Ullman S. (1996) High-Level Vision. MIT Press: Cambridge Mass.

Watson A.B. (2000) Visual detection of spatial contrast patterns: Evaluation of five simple models. Optics Express 6, 12-33.