Acquiring and Utilizing Correlation Patterns across Multiple Input Modalities for Developmental Learning
The Intersensory Redundancy Theory formulated by Bahrick, Lickliter and Flom (2004) seeks to explain how selective attention is achieved and how it guides the development of perception and cognition. The concept of amodal information is central to the approach pursued here. Amodal information is information that is not specific to a single sensory modality but is redundant across more than one sense (Bahrick, Lickliter & Flom, 2004: 99). One example is fire, which can be seen, heard and smelled, so the event of fire reaches us through amodal information. The basic idea of Intersensory Redundancy Theory is that stimulus properties that are redundant or amodal reinforce each other, move into the foreground and thereby attract attention. In early development, this process promotes faster processing of redundantly specified properties than of other stimulus properties. The infant's initial sensitivity to amodal information thus provides an economical way of guiding perceptual processing toward meaningful, unique events. Intersensory redundancy refers to the spatially coordinated and temporally synchronous presentation of the same information across two or more senses and is therefore possible only for amodal properties (e.g., tempo, rhythm, duration, intensity). For example, the sights and sounds of hands clapping provide intersensory redundancy because they are temporally synchronous, originate in the same place, and convey the same rhythm, pace, and intensity patterns in vision and audition (Bahrick, Lickliter & Flom, 2004: 100).
We hypothesize that multi-modal redundancy is key to bootstrapping the learning of concepts because it allows conclusions to be drawn about the structure of the data. By finding segments in time that exhibit strong correlations between specific properties in different modalities, it is possible to separate these segments from the surrounding data, where less correlation is present. For example, when demonstrating an action to their infants, parents synchronize their speech patterns with their action patterns and thus segment the action into smaller sub-actions. Our assumption is that by extracting these sub-actions we can not only find building blocks of action but also build up multi-modal representations in which the different elements are related to each other through learned correlations.
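The idea of locating strongly correlated time segments can be illustrated with a minimal sketch. It is not the method proposed here, only one possible reading of it under stated assumptions: each modality has already been reduced to a one-dimensional feature sampled at a common rate (for instance speech energy and hand speed), correlation is measured with the Pearson coefficient over a sliding window, and a fixed threshold decides what counts as "strong". The function names, the window length and the threshold are all hypothetical.

```python
import numpy as np


def windowed_correlation(a, b, window):
    """Pearson correlation of two equally long 1-D signals in each sliding window.

    Assumption: both signals are already aligned and sampled at the same rate.
    Windows in which either signal is constant get a correlation of 0.0.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a) - window + 1
    corr = np.zeros(n)
    for i in range(n):
        wa = a[i:i + window] - a[i:i + window].mean()
        wb = b[i:i + window] - b[i:i + window].mean()
        denom = np.sqrt((wa ** 2).sum() * (wb ** 2).sum())
        corr[i] = (wa * wb).sum() / denom if denom > 0 else 0.0
    return corr


def correlated_segments(corr, threshold):
    """Return (start, end) pairs of window indices where corr >= threshold.

    The end index is exclusive. The threshold is a hypothetical tuning parameter.
    """
    segments = []
    start = None
    for i, above in enumerate(corr >= threshold):
        if above and start is None:
            start = i                      # a correlated run begins
        elif not above and start is not None:
            segments.append((start, i))    # the run ended at the previous window
            start = None
    if start is not None:                  # run extends to the last window
        segments.append((start, len(corr)))
    return segments


if __name__ == "__main__":
    # Toy data: the second "modality" follows the first only between samples 40 and 70.
    audio = np.sin(np.linspace(0, 20, 100))
    motion = np.zeros(100)
    motion[40:70] = audio[40:70]
    corr = windowed_correlation(audio, motion, window=10)
    print(correlated_segments(corr, threshold=0.999))
```

In this toy setup the windows lying entirely inside the synchronized stretch reach a correlation close to 1, while windows in which one signal is flat are assigned 0. Real multi-modal data would need feature extraction, temporal alignment and a more careful choice of window length and threshold than this sketch assumes.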