Infants' use of temporal and phonetic information in the encoding of audiovisual speech


  • D. Kyle Danielson The University of British Columbia
  • Cassie Tam The University of British Columbia
  • Padmapriya Kandhadai The University of British Columbia
  • Janet F. Werker The University of British Columbia


Infants match heard and seen speech in their own and unfamiliar languages (e.g., Kuhl & Meltzoff, 1982, 1984; Patterson & Werker, 1999, 2003; Pons et al., 2009; Kubicek et al., 2014) as early as two months of age, and, as with adults, the addition of mismatching visual information to auditory speech affects infants’ speech perception (e.g., Burnham & Dodd, 2004; Desjardins & Werker, 2004; Danielson et al., 2015). What has remained relatively unexplored, however, is whether and how infants with little linguistic experience use information from the auditory and visual domains differently, and how the temporal relationship between the auditory and visual speech signals may modify perception.

To explore this question, we familiarized 6-month-old English-learning infants to sequences of audiovisual syllables from Hindi, comprising a dental stop and a retroflex stop that adult English speakers cannot distinguish (Werker et al., 1981). In three familiarization conditions, syllables were audiovisually incongruent, such that the audio track from one syllable was paired with the video track from the other. In one of these conditions, the auditory and visual signals were presented simultaneously. In the other two conditions, the auditory and visual signals were offset from one another (auditory first or visual first) by 333 ms. Infants were then tested with auditory-only sequences of syllables from each category. We hypothesized that infants would categorize the stimuli during familiarization based on either the speech segment that they heard or the one that they saw, and that they would exhibit a matching preference for that stimulus type at test.

When familiarized to synchronous or visual-first stimuli, whose timing resembles that of natural speech, infants exhibited a matching preference at test for auditorily matched syllables. We interpret these results as evidence that, when the natural temporal dynamics of speech are roughly maintained, infants rely more heavily on auditory than on visual information.

Author Biographies

D. Kyle Danielson, The University of British Columbia

Lecturer, Department of Psychology

Cassie Tam, The University of British Columbia

BA Student, Department of Psychology

Padmapriya Kandhadai, The University of British Columbia

Research Associate, Department of Psychology

Janet F. Werker, The University of British Columbia

Professor and Canada Research Chair, Department of Psychology




How to Cite

Danielson DK, Tam C, Kandhadai P, Werker JF. Infants’ use of temporal and phonetic information in the encoding of audiovisual speech. Canadian Acoustics [Internet]. 2016 Aug. 25 [cited 2022 Jan. 20];44(3). Available from:



Proceedings of the Acoustics Week in Canada
