Abstract
An exciting development in the past few decades has been the growing realization that the seemingly distinct sensory systems, such as touch, vision, and hearing, are not strictly independent, as is often taught. Rather, these systems are intricately intertwined in the brain and in perception. Here, focusing particularly on human research from the past 25 years or so, we review work establishing the close perceptual and neural interactions between touch and vision in a number of domains, including object orientation, shape, texture, and motion, as well as the perceived ownership of the body and its parts. We then review some recent work examining perceptual and neural interactions between touch and hearing, with a focus on the domain of temporal frequency.
Interactions Between Vision and Touch
Although vision is a distance sense whereas touch is a contact sense, the sensory systems supporting vision and touch have much in common. Both these senses are used to assess object properties, such as orientation, shape, and surface texture; spatial relations between objects or parts of objects; and object motion. Thus, it should not be surprising that the perceptual characteristics of the visual and tactile systems resemble each other, or that the corresponding neural representations overlap. For ease of exposition, we have chosen to organize the following sections by domain, referring in each case to psychophysical studies (i.e., quantitative behavioral studies relating physical stimuli to perception) of perceptual interactions between vision and touch and then to their neural substrates.
Orientation discrimination
People perceive the orientation of objects in their environment visually, but also appreciate the orientation of small or graspable objects haptically (i.e., with the hands). Further, vision and touch interact in the perception of stimulus orientation, as exemplified in tactile modulation of a visual illusion called the tilt illusion: In vision, the perceived orientation of a central grating (a pattern of alternating light and dark bars) can be affected by the orientation of a surrounding grating. When the central and surrounding gratings differ in orientation by a few degrees, the central grating is perceived as tilted further away from the orientation of the surround than it actually is. This “repulsive” effect is enhanced by a simultaneously presented tactile surround grating whose orientation is congruent with that of the visual surround (Pérez-Bellido, Pappal, & Yau, 2018). Another example comes from the study of binocular rivalry, in which presenting two different stimuli independently to the left and right eyes results in percepts that switch unpredictably back and forth between them. For instance, when the two stimuli are gratings with perpendicular orientations, the perceptual experience oscillates between the two gratings. When a tactile grating is then added in an orientation matching that of one of the visual gratings, this visual grating tends to become dominant over its competitor in the rivalry (Lunghi & Alais, 2015).
Such interactions suggest that the representation of object orientation is common to vision and touch. This idea fits with neuroimaging studies using positron emission tomography (PET), which have shown that the same part of visual cortex is active during visual (Sergent et al., 1992) and tactile (Sathian et al., 1997) discrimination of grating orientation. This visual cortical region appears to be the human counterpart of the monkey visual cortical area known as the sixth visual area, or V6 (Fig. 1a; for a review, see Sathian, 2016). Transient disruption of this visual cortical area by focal transcranial magnetic stimulation (TMS; application of brief magnetic pulses to the head) interferes with tactile discrimination of grating orientation (Zangaladze et al., 1999), which demonstrates the functional role of this visual area in touch. These neuroimaging and neurostimulation findings, together with the psychophysical studies summarized above, converge on the conclusion that object orientation is encoded in a neuronal pool accessible to both vision and touch. Collectively, these observations illustrate that so-called visual cortical areas subserve not only visual but also corresponding nonvisual tasks, a point we return to repeatedly in this review. Accordingly, one could argue that referring to these areas as “visual” is incorrect, but in the present review we use this term, not only for the sake of simplicity, but also in recognition of the fact that these areas were originally identified as stations along the pathways of visual processing.

Fig. 1. Locations of the human brain regions referenced in the text (a–c) and a conceptual model of haptic shape perception (d). In all three brain images, the front (anterior) of the brain is on the left, and the back (posterior) is on the right (adapted from Sathian, 2016, Fig. 1). The images in (a) and (c) are sagittal slices (i.e., in planes oriented along the anterior-to-posterior axis of the head).
Shape perception
Although an object’s shape is a defining visual property, shape is also often assessed haptically, especially during grasping. Recognition of unfamiliar objects is view-dependent within both vision and touch, but cross-modal visuo-haptic recognition is view-independent.
Haptic shape is encoded not only in somatosensory cortical areas, the parts of the cerebral cortex classically associated with touch, but also in a region of visual cortex called the lateral occipital complex (LOC; Fig. 1b; for a review, see Sathian, 2016), which mediates visual shape perception. The similarity of spatial patterns of LOC activity in response to different shapes, whether encountered visually or haptically, correlates with the perceptual similarity of the same shapes (Masson et al., 2016). Even sound cues can activate the LOC during shape recognition in both sighted and blind people, which suggests that the LOC computes geometric representations of shape regardless of the input modality (for a review, see Sathian, 2016). Posterior parietal cortex, in and around the intraparietal sulcus (IPS; Fig. 1b), is also involved in the perception of shape through both vision and touch; however, its function appears to be to reconstruct representations of objects from their component parts. To compare the involvement of areas such as the LOC and IPS in haptic and visual shape perception, we undertook a series of functional MRI (fMRI) studies of activity and connectivity during multiple tasks (Lacey et al., 2014). On the basis of these studies, we proposed a conceptual model of haptic shape perception (Fig. 1d). An important feature of our model is its inclusion of the ability for mental imagery, which appears to be quite important for haptic shape perception. Indeed, mental imagery is also valuable for visual perception, especially under suboptimal viewing conditions, in which imagery is used to construct hypotheses against which incoming visual input is compared (Kosslyn, 1994, Chapter 5).
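The pattern-similarity logic behind findings such as Masson et al. (2016) can be illustrated in a few lines: compute the pairwise similarity of activity patterns evoked by different shapes, then correlate that similarity structure with a matrix of perceptual similarity judgments. The following is a conceptual sketch using synthetic random data (the array sizes and values are illustrative, not a reanalysis of any study):

```python
import numpy as np

rng = np.random.default_rng(0)
n_shapes, n_voxels = 4, 50

# Toy activity patterns (one row of voxel responses per shape) and toy
# perceptual similarity ratings for every pair of shapes.
patterns = rng.normal(size=(n_shapes, n_voxels))
perceptual_sim = rng.uniform(size=(n_shapes, n_shapes))
perceptual_sim = (perceptual_sim + perceptual_sim.T) / 2  # make it symmetric

# Neural similarity: correlation between each pair of activity patterns.
neural_sim = np.corrcoef(patterns)

# Compare the two similarity structures over the unique off-diagonal pairs.
iu = np.triu_indices(n_shapes, k=1)
r = np.corrcoef(neural_sim[iu], perceptual_sim[iu])[0, 1]
```

With real data, a reliably positive `r` would indicate that shapes that evoke similar neural patterns also tend to be judged perceptually similar, which is the form of evidence reported for the LOC.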
Visual imagery can be subdivided into object imagery (imagery of pictorial object properties, such as shape, color, and texture) and spatial imagery (imagery of spatial relations between objects or their parts).
The commonalities between visual and haptic shape perception resonate with the question famously posed by the Irish philosopher William Molyneux to the English philosopher John Locke: Would restoration of sight to someone blind from birth allow visual recognition of objects previously experienced only through touch (Locke, 1706/1997)? The empirical answer turns out to be nuanced: Five congenitally blind individuals who underwent surgical treatment to restore their vision were unable to match a haptically presented sample object with the correct visually presented object within 48 hr after surgery. However, three of the five participants were retested days to weeks later and showed substantial improvement on the cross-modal test (Held et al., 2011). Thus, the answer to Molyneux’s question appears to be negative immediately after sight restoration but positive after a short period of time, presumably reflecting the rapid effect of multimodal experience. Given the small sample size and the preliminary nature of this study, further work on this topic is desirable.
Texture perception
The texture of object surfaces is a property that is primarily sensed via touch, and touch is superior to vision in this domain (for a review, see Sathian, 2016). This is not surprising when one considers the multiple dimensions of texture, which include rough-smooth, hard-soft, and sticky-slippery (for a review, see Bensmaia, 2009)—judgments for which people tend to rely on touch. The rough-smooth dimension has been studied extensively from psychophysical and neurophysiological perspectives, and this work indicates that spatial patterns are particularly important (as they are in vision) for coarse tactile textures. Because perception of haptic texture depends on moving one’s fingers over surfaces, temporal cues (arising from the timing with which bumps on the surface are encountered) also contribute; temporal frequency (an important property of auditory as well as tactile stimuli) becomes increasingly important as tactile textures get finer (for a review, see Bensmaia, 2009). Tactile textures bias judgments of simultaneously encountered visual textures, but not vice versa (Guest & Spence, 2003), which is consistent with the dominance of touch in texture perception.
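The contribution of temporal cues can be made concrete with a bit of arithmetic: when a finger scans a periodic texture, the vibration frequency at the skin equals the scanning speed divided by the spatial period of the surface elements. A minimal sketch (the speeds and periods below are illustrative values, not data from the studies cited):

```python
def temporal_frequency(scan_speed_mm_s: float, spatial_period_mm: float) -> float:
    """Vibration frequency (Hz) produced when a finger moves at a given speed
    over a texture whose surface elements repeat at a given spatial period."""
    return scan_speed_mm_s / spatial_period_mm

# A coarse texture (2-mm period) scanned at 100 mm/s vibrates the skin at 50 Hz,
# whereas a fine texture (0.2-mm period) at the same speed vibrates it near 500 Hz,
# illustrating why temporal frequency matters more as textures get finer.
coarse = temporal_frequency(100.0, 2.0)  # 50.0 Hz
fine = temporal_frequency(100.0, 0.2)    # ~500 Hz
```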
The parietal operculum (Fig. 1b) is a key somatosensory cortical locus where haptic texture is represented. Sun et al. (2016) trained a multivariate classifier (a machine-learning algorithm that learns to identify stimulus categories from patterns of data) to distinguish visual presentations of glossy versus rough objects using spatial patterns of fMRI activity not only in visual cortical areas, but also in the parietal operculum. The classifier’s success implies that the visual stimuli evoked corresponding haptic representations. Reciprocally, haptic assessment of texture activates texture-selective visual cortical areas, especially in medial occipital cortex (MOC; Fig. 1c; for a review, see Sathian, 2016). Remarkably, the texture-selective parietal operculum is also active when people listen to sentences containing textural metaphors, such as “she had a rough day” (Lacey et al., 2012), which suggests that metaphorical roughness is understood by reference to its physical counterpart. This underscores the grounding of abstract concepts in relevant sensorimotor processes, an idea originally proposed by Aristotle, developed in modern cognitive psychology (see Barsalou, 2008, for a review), and applied to the subject of metaphors by Lakoff and Johnson (1980).
Motion perception
The motion aftereffect is well known in vision: After exposure to visual motion in one direction for about 10 s, a static visual stimulus seems to move in the opposite direction. This aftereffect is thought to be due to adaptation of the motion detectors. It is also manifest cross-modally: Adaptation to visual motion induces a tactile motion aftereffect, and vice versa, which indicates a shared visuotactile representation of motion (Konkle et al., 2009). Consistent with this hypothesis are numerous neuroimaging studies demonstrating that motion-selective visual cortical areas referred to as the MT complex (Fig. 1b) are also recruited by tactile or auditory motion, in both sighted and blind people, although some studies have failed to find cross-modal activation. TMS over the MT complex interferes with discriminating both the speed and the direction of tactile motion (Amemiya et al., 2017; this article also reviews previous neuroimaging and TMS studies relating to motion), which reinforces the idea that this and other visual cortical areas underlie the performance of domain-specific tasks (motion, shape, etc.) in a modality-independent manner.
Body ownership
The sense of body ownership—where one’s body is in space and what belongs to one’s own body and not someone else’s—should logically be one of the strongest percepts (for a review, see Ehrsson, 2020). However, the rubber-hand illusion (Botvinick & Cohen, 1998), which demonstrates an especially intriguing visuotactile interaction, challenges this idea: This illusion arises when one arm of the participant is concealed and “replaced” with a realistic fake arm. While the participant looks at the fake arm, the experimenter synchronously brushes the fake hand and the participant’s own (hidden) hand, which induces the illusory feeling of touch on the unseen hand, the sense that the fake arm belongs to the participant’s body (incorporation into the body image), and the sense that the unseen, real arm is located in the same position as the fake arm (proprioceptive drift). The self-reported strength of the illusion correlates with activity in ventral premotor cortex (for a review, see Ehrsson, 2020). The rubber-hand illusion can be induced in a matter of seconds in individuals who are susceptible to it, and recent work suggests that the effects are relatively long-lasting; in particular, the sense that the fake arm belongs to one’s body may persist for several minutes (Abdulkarim et al., 2021).
In a similar but even more dramatic visuotactile illusion, synchronous stroking of a mannequin and a participant’s own body at the same body location, along with a virtual-reality display that shows the mannequin in place of the participant’s body, induces the perception that the mannequin’s body is the participant’s own (for a review, see Ehrsson, 2020). These amazing observations indicate that constructing the perception of one’s body and its parts depends critically on multisensory integration. There are large individual differences in susceptibility to these illusions, the reasons for which have yet to be worked out; understanding such variability is vital because incorporating prosthetics or tele-operated devices into the body schema is important for their efficient use (Cutts et al., 2019).
Visuo-haptic object processing over the life span
Infants—even neonates—are capable of visuo-haptic cross-modal matching and are therefore sensitive to object properties common to vision and touch (for a review, see Lewkowicz & Bremmer, 2020). However, the statistically optimal visuo-haptic integration that adults demonstrate, in which input from the two modalities is flexibly weighted to minimize the variance of perceptual estimates, takes some time to develop: Up to about the age of 8 years, integration is suboptimal; haptics dominates size perception, and vision dominates orientation discrimination. By the age of 8 to 10 years, integration approaches statistical optimality, presumably reflecting calibration of the sensory systems by cross-modal comparison during development (Gori et al., 2008). That something important happens around the age of 8 to 10 years is consistent with the observation that cross-modal object recognition is view independent for children ages 9 to 10 years and older (as in adults; see the earlier section on shape perception) but not for younger children (Jüttner et al., 2006). Although proficiency at visual or haptic within-modal memory for objects is unaffected by age, this is not so for cross-modal memory. In childhood and early adulthood, cross-modal object recognition is unaffected by which modality is used for encoding, but older adults exhibit a marked asymmetry: Performance is much worse when haptic encoding is followed by visual retrieval than when visual encoding is followed by haptic retrieval (Lacey, Campbell, & Sathian, 2007; Norman et al., 2006).
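The “statistically optimal” integration referred to above has a standard formalization (maximum-likelihood cue combination): each modality’s estimate is weighted in proportion to its reliability (the inverse of its variance), which guarantees that the combined estimate is never more variable than the better single cue. A minimal sketch, with illustrative (not empirical) noise values:

```python
def integrate(est_v: float, var_v: float, est_h: float, var_h: float):
    """Minimum-variance (maximum-likelihood) combination of a visual and a
    haptic estimate of the same property (e.g., object size)."""
    w_v = (1 / var_v) / (1 / var_v + 1 / var_h)  # weight proportional to reliability
    w_h = 1 - w_v
    combined_est = w_v * est_v + w_h * est_h
    combined_var = (var_v * var_h) / (var_v + var_h)  # always <= min(var_v, var_h)
    return combined_est, combined_var

# If vision is more reliable (variance 1.0) than haptics (variance 4.0), the
# combined estimate lies closer to the visual one (10.4), and its variance
# (0.8) is lower than that of either cue alone.
est, var = integrate(est_v=10.0, var_v=1.0, est_h=12.0, var_h=4.0)
```

On this account, the developmental finding is that children younger than about 8 years do not weight the cues this way, instead letting one modality dominate a given property.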
Audiotactile Interactions
Temporal-frequency information is perhaps most important to audition, for example, in speech and music perception (Pérez-Bellido, Barnes, et al., 2018), but the ability to perceive temporal frequency by touch also contributes to tactile texture perception, as noted above (Bensmaia, 2009). Further, manipulating the frequency of the (unattended) sounds generated by touching textured surfaces influences tactile texture judgments, which suggests that auditory frequency is perceptually integrated with tactile texture (Guest et al., 2002). Thus, a reasonable question is whether audition and touch share a common representation of temporal-frequency information and/or a common neural basis.
A number of psychophysical studies point to a shared representation of temporal frequency between audition and touch. For instance, in one study, participants reported which of two successively presented stimuli had the higher temporal frequency while attending selectively to either auditory or tactile input and attempting to ignore distractors in the other modality. Symmetric audiotactile influences were found: The perceived frequency in the attended modality was consistently pulled toward the frequency in the unattended modality (Convento et al., 2019). Also, auditory adaptation effects can transfer to the tactile domain: Exposure to frequency-specific auditory noise improves subsequent discrimination of tactile frequency but not intensity (Crommett et al., 2017).
Neuroimaging and neurostimulation studies bear out the notion of convergent temporal-frequency representations of tactile and auditory inputs. For instance, in an fMRI study, the left auditory cortex was found to respond to vibrotactile frequencies of 20 Hz and 100 Hz, both in the audible range, but not to a 3-Hz stimulus, which is below the audible range (Nordmark et al., 2012). Conversely, another fMRI study revealed that multiple regions of somatosensory cortex responded to auditory inputs in a frequency-specific manner and that the similarity of spatial patterns of activity in response to different frequencies correlated with the similarity of corresponding perceptual judgments, although the somatosensory cortical responses were less robust and noisier than their auditory cortical counterparts (Pérez-Bellido, Barnes, et al., 2018). Moreover, TMS over primary somatosensory cortex impaired auditory frequency discrimination, but only when trials comprising unimodal auditory stimuli were interleaved with trials requiring (unimodal) tactile or (cross-modal) audiotactile frequency discrimination (Convento et al., 2018).
Conclusions
This brief review has provided some examples of multisensory interactions involving the tactile (haptic) system. Touch and vision represent object properties similarly in a variety of domains, and the neural representations of these properties converge in brain regions that should be considered as specialized for particular tasks rather than for particular modalities of sensory input. Although work on audiotactile interactions is less well developed than work on visuotactile interactions, a similar theme of perceptual and neural commonality emerges in the domain of temporal frequency. Tactile inputs can often be integrated with corresponding visual or auditory inputs, and dramatic illusions evoked by manipulating visuotactile interactions reveal that the very sense of body ownership depends critically on multisensory integration. Thus, the various sensory systems should no longer be considered independent. Rather, the goal in sensory neuroscience and psychophysics is to find out how they interact to produce the richness of sensory experience.
Recommended Reading
Cascio, C. J., Simon, D. M., Bryant, L. K., DiCarlo, G., & Wallace, M. T. (2020). Neurodevelopmental and neuropsychiatric disorders affecting multisensory processes. In K. Sathian & V. S. Ramachandran (Eds.)
Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory?
Lacey, S., & Sathian, K. (2020). Visuo-haptic object perception. In K. Sathian & V. S. Ramachandran (Eds.)
Sathian, K., Lacey, S., Stilla, R., Gibson, G. O., Deshpande, G., Hu, X., LaConte, S., & Glielmi, C. (2011). Dual pathways for haptic and visual perception of spatial and texture information.
Yau, J. M., Pasupathy, A., Fitzgerald, P. J., Hsiao, S. S., & Connor, C. E. (2009). Analogous intermediate shape coding in vision and touch.
