Is the richness of our visual world an illusion? - Trans-saccadic memory for complex scenes
School of Psychology, University of the West of England, St. Matthias College,
Bristol BS16 2JP,
Perceptual Systems Research Centre, Department of Psychology, University of
Bristol, 8 Woodland Rd, Bristol BS8 1TN, England.
Our construction of a stable
visual world, despite the presence of saccades, is discussed. A computer
graphics method was used to explore trans-saccadic memory for complex images.
Images of real-life scenes were presented under four conditions: they stayed
still or moved in an unpredictable direction (forcing an eye movement), while
simultaneously changing or staying the same. Changes were the appearance,
disappearance or rotation of an object in the scene. Subjects detected the
changes easily when the image did not move but when it moved their performance
fell to chance. A grey-out period was introduced to mimic that which occurs
during a saccade. This also reduced performance but not to chance levels.
These results reveal the poverty
of trans-saccadic memory for real-life complex scenes. They are discussed with
respect to Dennett's view that much less information is available in vision than
our subjective impression leads us to believe. Our stable visual world may be
constructed out of a brief retinal image and a very sketchy, higher-level
representation along with a pop-out mechanism to redirect attention. The
richness of our visual world is, to this extent, an illusion.
Not only are we blind to many
aspects of our personal visual world, we are also surprisingly unaware of this
fact. Under normal circumstances, for example, we do not notice that we blink;
that we have large retinal blind spots; that our instantaneous spatial,
chromatic, and temporal resolution varies dramatically with eccentricity; and
that our vision is interrupted several times a second by rapid eye movements
(saccades). Indeed, despite all of these considerable distractions, we believe
that we see a complete, dynamic picture of a stable, uniformly detailed and
It is even tempting to suppose
that there is a "me" in there and a place from where "I"
observe. Dennett (1991) calls this the "Cartesian Theatre" and argues
that this powerful illusion is propped up by a "nearly impenetrable barrier
of intuitions" (p 322). One of these intuitions is that a complete visual
picture of the observable world is present in the mind at any time - that
consciousness "contains" a rich model of the visible world. This,
claims Dennett, is simply not true. Only in the fovea is the information
detailed and rich, and every time the eyes move this detailed information is
The illusion that we are
simultaneously aware of every aspect of the view in front of us could arise
because almost every question asked of our visual system is seamlessly answered.
Rather than the answers being retained in visual memory, some argue that the
outside world itself acts as a visual memory - rapidly accessible by looking
again (Minsky, 1985). A sense of presence might build up over time as different
glances answer different questions. A time lag would thus be predicted between
the onset of the visual input at a given fixation and conscious awareness based
on it; the time taken to integrate this new input with at least one of the
existing higher-level representations of the scene. This might relate to delays
of up to half a second in sensory experience reported by Libet (1982).
If this is true it has some very
odd implications for the view out of my window. Unless I am looking straight at
that tree, it is not represented in any detail in my visual system, the
representation having been washed away at the last saccade. It only seems to be
available to consciousness because I can look again.
Dennett gives the example of
walking into a room covered with identical portraits of Marilyn Monroe. You can
immediately see that there are hundreds of portraits even though you can only
have looked at one or two of them. How can you "see" them all at once?
He answers that you don't: there is no representation of each one in the visual
system. Rather they are represented as being present; as perhaps "more of
the same". The details are not needed, are not represented and are not
"present in consciousness" - whatever it may feel like. Thus he
reveals an important distinction - between the presence of representation and
the representation of presence. The fact that we confuse these two contributes
to the illusion.
One may object that if one
picture had a slight difference, a moustache or scribbled on glasses, you would
notice. In fact, you may notice but because of separate mechanisms (discussed
below) that operate within each fixation.
Is this true? There are three
questions here: (1) How does the visual system detect changes in the
environment? (2) How much information is retained at each saccade? (3) How much
missing detail is "filled in" by the cognitive system? Although
"filling-in" is the topic of much current debate (e.g. Ramachandran,
1993; Ramachandran and Gregory, 1991; Dennett, 1993) it is the first two that
are most relevant here.
Detection of Change
If the "looking again"
strategy, described above, is to succeed, it may rely on the external world
remaining stable across the time required to change fixation (to "look
again"). Since this time is relatively short the visual system may be able
to get away with it. During a single fixation the visual system is, in fact,
highly sensitive to many kinds of spatial, temporal and chromatic changes in the
visual input. This high sensitivity to change is supported by several special
mechanisms: e.g. retinal adaptation (Ditchburn, 1973), "pop-out"
systems (Treisman & Gelade, 1980) and motion detectors (Reichardt, 1961). Is
it this special sensitivity to change during each fixation that gives us an
inflated impression of our awareness of change between fixations?
There is an odd implication of
this view - that changes occurring between fixations should not be easily
detectable. In other words, little information need be retained from one saccade
to the next.
Just what information is
retained after each saccade? There must presumably be some kind of
trans-saccadic integration otherwise no model of the world could be constructed
from visual information. The question is what kind. At one extreme might be a
very low level process in which successive retinal representations are fused
into a continuous and detailed representation. At the other extreme might be a
very high level or abstract integration.
Towards the former end of the
spectrum is the spatiotopic fusion hypothesis (e.g. Breitmeyer, 1984). This
suggests that eye movements are compensated for and successive pictures fused
according to environmental coordinates rather than retinal ones. Towards the
other extreme Pollatsek, Rayner & Henderson (1990) suggest something like
location independent object detectors. Dennett's view is also far out on this
extreme, implying that most information is lost on each saccade and the brain
does not bother to fill in the missing details. Only a high level abstract
The evidence clearly favours the
latter end of the spectrum. Stimulus displacements during a saccade are hard to
detect (Bridgeman & Mayer, 1983), and there is little effect on the reading
of words if the case of individual letters is changed (Pollatsek, Rayner &
Collins, 1984). Pollatsek et al (1984) also showed that subjects identified a
line drawing of an object faster when they had had an extrafoveal preview -
implying integration across the saccade. However, moving the target did not
abolish the preview benefit, suggesting that integration is not location
dependent (Pollatsek, Rayner and Henderson, 1990). Instead, it seems to rely on
internal spatial relationships within the stimulus, as Irwin (1991) demonstrated
using simple dot patterns - further evidence against spatiotopic fusion.
What sort of memory is
responsible? The very short lived iconic store (Sperling, 1960) is an unlikely
candidate because it is tied to retinal, not environmental, coordinates. More
plausible is the short term visual store (Phillips, 1974) which is not disrupted
by a bright light or pattern mask, is not tied to anatomical coordinates and is
affected by pattern complexity. Indeed, Irwin (1991) recently concluded that
transsaccadic memory is long lasting, undetailed, of limited capacity, and not
tied to spatial position. There need be no separate memory for this and it is
probably identical to visual short-term memory.
O'Regan and Levy-Schoen (1983)
suggested that information is only integrated across saccades if it is
semantically encoded (e.g. "in front of the red table"). However, more
recent work by Hayhoe, Lachter, and Feldman (1991) argues that the memory
representation is likely to be something intermediate between the purely
symbolic and purely semantic extremes. Hayhoe et al suggest that the most likely
candidate is a spatially ordered visual buffer (originally proposed by Feldman
(1985)). This is a representation similar to a map, in which a given visual
location has an associated set of features, or parameters. It precedes object
recognition, but is precise enough to support geometrical judgements such as the
angle-categorisation task used by Hayhoe et al.
However, the experiments
described above used synthetic, simple stimuli. It is unclear how the possible
memory representations deal with real-world images (or, indeed, what kind of
capacity they have and how this relates to objects in the scene). In order to
address this issue it is necessary to use complex images. Dennett (1991, p 361;
Grimes pers. comm.) has recently taken part in experiments using an eye-tracker
in which the stimulus is changed during saccades. Subjects can apparently read
text without noticing anything odd while words or letters are changed in this
way. Relatively large changes in complex images, such as the appearance or
disappearance of people or objects, also go unnoticed when they occur during a
saccade. In a lecture demonstration Dennett showed pairs of Grimes' pictures one
after the other. The changes were so obvious that most people laughed unless, by
chance, the change happened during a saccade, when it could not be seen. The
effect is dramatic and counter-intuitive. It implies, as Dennett suggested, that
very little information is retained after a saccade, even less than suggested by
Irwin, Zacks and Brown (1990).
Conceivably the effect could be
due to saccadic suppression, a mechanism supposed to suppress processing during
saccades. However, it has long been known that saccadic suppression, if it
exists, is not total (e.g. Dodge, 1900; see Carpenter, 1988, for a review). For
example, Brooks, Yates & Coleman (1980) report that a dot stimulus moving
relative to saccadic motion can be seen, suggesting there is no suppression. In
any case is there any need to suppress anything? If low level detailed
information is never stored there may not be. During saccades movement detectors
or pop-out mechanisms will overload and so convey no information. Dennett
concludes that the brain treats all this with benign neglect.
We have developed a novel method
to test Dennett's unexpected claims. The idea is simple and no eye tracking
system is required. Any change made to an image is synchronised with a rapid
displacement of the entire image in a random direction. Subjects must move their
eyes to see any change and this can be compared to a no-movement condition. This
provides a simple technique to explore what information is retained as we move
our eyes about the world.
In all experiments, images were
presented on the screen of an IBM RS/6000 workstation. The visual display unit
was an IBM 6019 monitor, the screen
of which was 36 cm wide and 30 cm high. When subjects sat at a comfortable
viewing distance (approximately 50 cm), the screen subtended approximately 41 x
34 degrees and the image approximately 10 degrees. The resolution of the screen
was 1280 x 1024 pixels.
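The viewing geometry quoted above can be checked with a short calculation. This is a minimal sketch, not the software used in the study. It assumes the image was drawn at one image pixel per screen pixel, which the text does not state, and the function names are illustrative. The exact formula gives a screen of roughly 40 x 33 degrees; the quoted 41 x 34 degrees matches the small-angle approximation (size divided by distance). Under the one-to-one pixel assumption, a 256-pixel image comes out at a little over 8 degrees, in line with the "about 8 degrees square" given later in this section.

```python
import math

# Values taken from the text.
SCREEN_W_CM, SCREEN_H_CM = 36.0, 30.0
SCREEN_W_PX = 1280
VIEW_DIST_CM = 50.0
IMAGE_PX = 256  # assumption: displayed at one image pixel per screen pixel


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Exact angle subtended by an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def small_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Small-angle approximation: angle in radians is size / distance."""
    return math.degrees(size_cm / distance_cm)


screen_w_deg = visual_angle_deg(SCREEN_W_CM, VIEW_DIST_CM)
screen_h_deg = visual_angle_deg(SCREEN_H_CM, VIEW_DIST_CM)

# Physical width of the image on the screen under the stated assumption.
image_cm = IMAGE_PX * SCREEN_W_CM / SCREEN_W_PX
image_deg = visual_angle_deg(image_cm, VIEW_DIST_CM)

print(f"screen: {screen_w_deg:.1f} x {screen_h_deg:.1f} deg (exact)")
print(f"screen: {small_angle_deg(SCREEN_W_CM, VIEW_DIST_CM):.1f} x "
      f"{small_angle_deg(SCREEN_H_CM, VIEW_DIST_CM):.1f} deg (small-angle)")
print(f"image:  {image_deg:.1f} deg wide")
```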
The images were obtained by
sampling a normal scene (typically of our laboratory) with a monochrome video
camera. Images were in pairs (as shown in Fig 1), for example one showed a
breakfast scene with a full glass of milk in one picture and nearly empty in
another; a second pair showed a street scene with and without a person; a third
showed a person's head with either one or two earrings. An example is given in
The images were digitised and
held in the computer's memory as 256 x 256, 8-bit (256 grey-level) images. Each
image was de-blurred by applying a moderate amount of high-pass filtering; each
image appeared as a normal, sharp, monochrome picture. The subtense of each
image was about 8 degrees square. Mean luminance was approximately 30 cd m⁻²
(Minolta Spot Chroma Meter). Each image could be positioned anywhere on the
larger screen. The background colour of the screen was set to mid-grey.
A new image could be displayed
in any location, and the old one extinguished, within a frame blanking interval
(approx 1 ms). All images used in an experimental run were stored in the frame
buffer and could therefore be displayed virtually instantaneously. This is a
standard computer-graphic technique (Troscianko and Low 1985; Harris, Makepeace,
and Troscianko 1987; Brelstaff and Wilson 1994).
We carried out a pilot
experiment and two experiments in the main study. In the pilot experiment, a
pair of images was prepared. The first image was presented in the centre of the
screen (for approx 5 sec). Next, this image was extinguished and the second
image was displayed in an unpredictable location on the screen after the frame
blanking interval. This shift in image location elicited at least one saccade,
since subjects were instructed to move their gaze to the new image location as
the image "jumped". In this preliminary experiment, five out of six
naive subjects failed to spot the mismatch between the number of chairs in the
two images; to them, the second image appeared identical to the first.
This pilot experiment suggested
that there was an interesting inability to register changes in the visual scene
across a saccadic eye-movement, and the two main experiments were designed to
address this question in greater detail.
In Experiment 1, we compared the cases where the:
(i) Image changes (or does not
change) and moves (as in pilot experiment);
(ii) Image changes (or does not
change) and stays in same place.
Thus, Experiment 1 was similar
to the pilot experiment except that larger sets of images and subjects were
used. We predicted that image changes would be easy to see in the absence of
image displacement (a saccade). However, even if we obtained results of this
nature, it would still be unclear whether the impairment in the displaced
condition arises (a) as a result of an eye-movement being made, or (b) from any
interruption of vision. Since saccades are often thought of as leading
to "grey-out" during the timecourse of the saccade, we wanted to test
a condition in which we introduced "grey-out" externally (by
presenting a blank grey interstimulus mask). This mid-grey ISI field was
presented for 250 ms. It was therefore hoped that Experiment 2 would clarify
whether an eye-movement was necessary to elicit the experimental effect of poor
In Experiment 2, the following
1. Image changes and moves (as
in Exp 1);
2. Image changes but stays in
the same place; a mid-grey interstimulus interval (ISI) separates the two images
The subject sat in a
darkened room at a comfortable distance from the screen. In both experiments,
the first image appeared at the centre of the screen for 2 sec, then a priming
beep sounded. The second image was then presented in either a different location
or the same location, depending on the condition. Different locations were
always a fixed angular distance away (8 deg) but in a random direction.
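The displacement rule (a fixed 8 deg jump in a random direction) can be sketched as follows. This is an illustration under stated assumptions, not the original display code. It assumes the jump is measured from the screen centre on a flat screen viewed at 50 cm, and that the direction is drawn uniformly from the full circle; `jump_offset_px` and `random_jump` are hypothetical names.

```python
import math
import random

# Screen geometry taken from the text.
SCREEN_W_PX, SCREEN_H_PX = 1280, 1024
SCREEN_W_CM, SCREEN_H_CM = 36.0, 30.0
VIEW_DIST_CM = 50.0
JUMP_DEG = 8.0  # fixed angular distance of the image jump


def jump_offset_px(direction_rad: float, jump_deg: float = JUMP_DEG):
    """Pixel offset (dx, dy) of the new image centre from the screen centre."""
    # Distance on the screen surface that subtends jump_deg from the centre.
    r_cm = VIEW_DIST_CM * math.tan(math.radians(jump_deg))
    # Convert to pixels separately per axis, since pixels need not be square.
    dx = r_cm * math.cos(direction_rad) * SCREEN_W_PX / SCREEN_W_CM
    dy = r_cm * math.sin(direction_rad) * SCREEN_H_PX / SCREEN_H_CM
    return dx, dy


def random_jump(rng: random.Random):
    """Offset for a jump in a direction drawn uniformly from 0 to 2*pi."""
    return jump_offset_px(rng.uniform(0.0, 2.0 * math.pi))


if __name__ == "__main__":
    dx, dy = random_jump(random.Random())
    print(f"new image centre: ({SCREEN_W_PX / 2 + dx:.0f}, {SCREEN_H_PX / 2 + dy:.0f})")
```

With these numbers the jump is about 250 pixels horizontally or 240 pixels vertically, so a 256-pixel image centred at the new location stays entirely on the 1280 x 1024 screen in every direction.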
The subject's task was to press
one of two mouse buttons, reporting whether the second image was the same as, or
different from, the first one. The same number of "same" and
"different" image pairs were used, with each stimulus pair being
randomly selected. Each subject saw a total of 30 pairs of images in Experiment
1 (15 image pairs with a change in one item, plus 15 "same" image
pairs). In Experiment 2, each subject saw 24 pairs of images. All orders were
In pilot experiments, we had
found some learning effects. To avoid these, no subject saw any image pair more
than once. Subjects were asked not to communicate about the experiment with
Each subject was given a
practice trial before each experiment, with a stimulus different from that used
in the experiment.
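The trial constraints described above (equal numbers of "same" and "different" pairs, random order, no pair seen twice) can be expressed as a short schedule generator. This is a sketch only: `make_trials` is a hypothetical helper, the pair identifiers are placeholders, and how trials were divided between the moving and stationary conditions is not covered because the text does not specify it.

```python
import random


def make_trials(pair_ids, rng: random.Random):
    """Return a shuffled list of (pair_id, kind) with kind 'same' or 'different'.

    Each pair id appears exactly once, and the two kinds occur equally often.
    """
    ids = list(pair_ids)
    if len(ids) % 2 != 0:
        raise ValueError("need an even number of image pairs")
    if len(set(ids)) != len(ids):
        raise ValueError("each image pair may be used only once")
    rng.shuffle(ids)  # random assignment of pairs to 'different' or 'same'
    half = len(ids) // 2
    trials = [(pid, "different") for pid in ids[:half]]
    trials += [(pid, "same") for pid in ids[half:]]
    rng.shuffle(trials)  # random presentation order
    return trials


# Experiment 1 used 30 pairs per subject: 15 'different' plus 15 'same'.
trials = make_trials(range(30), random.Random(0))
```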
Five naive subjects were used in
Experiment 1, and twelve in Experiment 2. They had normal or corrected-to-normal
vision.
Table 1 shows the results of Experiment 1.
Table 1. Mean results of Experiment 1.
Performance was better in the
no-saccade condition than in the saccade condition; the number of correct
responses was significantly different between the conditions (t=9.13, df=4,
Table 2 shows the results of Experiment 2.
Table 2. Mean results of Experiment 2.
Statistical analysis of these
results showed that:
The number of correct responses in the saccade condition is at chance (t=1.48,
The number of correct responses in the grey-out condition is significantly above
chance (t=8.16, p<0.01, 1-tailed).
The number of correct responses differs significantly between the two conditions
(t=2.79, p<0.02, 2-tailed).
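A comparison against chance of this kind is commonly done with a one-sample t-test on each subject's number of correct responses, with chance taken as half the number of trials. The sketch below assumes that is the form of test used here, which the text does not confirm. The scores are invented purely to show the arithmetic and are not the study's data.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores, chance):
    """Return (t, df) for a one-sample t-test of the mean score against chance."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    se = stdev(scores) / math.sqrt(n)  # standard error of the mean
    if se == 0:
        raise ValueError("scores have zero variance; t is undefined")
    return (mean(scores) - chance) / se, n - 1


# Hypothetical numbers of correct responses out of 24 trials (NOT real data).
hypothetical_scores = [12, 14, 13, 15, 11]
t, df = one_sample_t(hypothetical_scores, chance=12)
print(f"t = {t:.2f}, df = {df}")
```

The resulting t is then compared with the critical value for the given degrees of freedom, one-tailed when the only question is whether performance exceeds chance.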
These results show that when
subjects had to move their eyes to detect a change between two complex images,
they could not do so reliably. Indeed, for changes that could easily be detected
when the image did not move, their performance was close to chance when it did
move (forcing a saccade). A simple grey-out and delay also reduced performance
but not to chance levels. Taken together these results show the fragility of
visual memory for a complex scene. The ability to detect even large changes is
partly disrupted by a simple delay and more or less destroyed by a saccade.
This suggests that Dennett may
be right. Although we appear to have available "in consciousness" a
complete detailed and stable representation of the world this may in fact be
constructed out of only two things (a) a transient retinal image which is
detailed only in the fovea and lasts only the time between saccades, and (b) a
higher-level sketchy model of the whole scene that contains too little
information for even relatively large changes to be noticed. The suggestion by
Hayhoe et al (1991) of an intermediate-level store at a level before object
recognition is broadly in keeping with our data; but the implication of our
findings is that such a store has severely limited capacity to encode parts of
the scene that are not actively attended to. The impression that we can easily
see anything that changes is provided by the pop-out mechanisms that redirect
attention. If this is so it shows that our intuitions about our own visual
function are far from useful in understanding the construction of our convincing
and stable visual world.
The research facilities used in this study were
provided by a grant from the Defence Research Agency (grant no
Breitmeyer (1984) Visual masking: an integrative
approach. New York; OUP.
Brelstaff GJ, Wilson JB (1994) Generating colour and texture verniers. Int.
J. Psychophysiol. 16, 199-208
Bridgeman, Mayer,M. (1983) Failure to integrate visual information from successive
fixations. Bulletin of Psychonomic Soc.,
Brooks, Yates,J.T., Coleman,R.D. (1980) Perception of images moving at saccadic
velocities during saccades and during fixation. Experimental Brain Research, 40,
Carpenter (1988) Movements of the Eyes. London;
Dennett (1991) Consciousness Explained.
Boston; Little, Brown & Co.
Dennett (1993) Reply to my critics. In Dennett
and his Critics: Demystifying Mind. Ed. Dahlbom,B. Oxford; Blackwell.
Ditchburn (1973) Eye Movements and Visual Perception,
Oxford; Clarendon Press.
Dodge (1900) Visual perception during eye movement. Psychological Review, 7,
Feldman (1985) Four frames suffice: A provisional model of vision and space. Behavioral
and Brain Sciences 8, 265-289
Harris, Makepeace,A.P.W., Troscianko,T. (1987) Cathode ray tube displays in
psychophysiological research. J.
Psychophysiol 4, 413-429
Hayhoe, Lachter,J., Feldman,J. (1991) Integration of form across saccadic eye movements,
Irwin (1991) Information integration across saccadic eye movements, Cognitive
Psychology, 23, 420-456
Irwin, Zacks,J.L., Brown,J.S. (1990) Visual memory and the perception of a stable
visual environment, Perception and
Psychophysics, 47, 35-46
Libet (1982) Brain stimulation in the study of neuronal functions for conscious
sensory experiences, Human Neurobiology,
Minsky (1985) The Society of Mind, New York;
O'Regan, Levy-Schoen,A. (1983) Integrating visual information from successive fixations:
Does trans-saccadic fusion exist? Vision
Research, 23, 765-768
Phillips (1974) On the distinction between sensory storage and short-term visual memory, Perception
and Psychophysics, 16, 283-290
Pollatsek, Rayner,K., Collins,W.E. (1984) Integrating pictorial information across eye
movements, Journal of Experimental
Psychology: General, 113, 426-442
Pollatsek, Rayner,K., Henderson,J.M. (1990) Role of spatial location in integration of
pictorial information across saccades, Journal
of Experimental Psychology: Human Perception and Performance, 16,
Ramachandran (1993) Filling in gaps in perception: Part 2., scotomas and phantom limbs. Current
Directions in Psychological Science, 2,
Ramachandran, Gregory,R.L. (1991) Perceptual filling-in of artificially induced scotomas in
human vision, Nature, 350,
Reichardt (1961) Autocorrelation, a principle for the evaluation of sensory information by
the central nervous system. In Sensory
Coding, Ed. Rosenblith,W.A., New York; Wiley.
Sperling (1960) The information available in brief visual presentations, Psychological
Treisman, Gelade,G. (1980) A feature-integration theory of perception, Cognitive
Psychology, 12, 97-136
Troscianko, Low,I. (1985) A technique for presenting isoluminant stimuli using a
microcomputer. Spatial Vision, 1,