Abstract
The Pew Research Center reports a rapid rise in audiobook consumption among adults as digital device ownership increases (Zickuhr & Rainie, 2014). With the emergence of new technologies, long gone are the days of cassette tapes and CDs that made stopping, rewinding, studying, and restarting an audio recording a hassle. Digital media now makes it possible for individuals to easily listen, read, or do both simultaneously from their smartphones, e-readers, electronic tablets, and computers. Recent integration of digital content into electronic textbooks also offers multiple modalities from which to present content. Although audiobooks are gaining popularity, some argue that listening to a text is inferior to reading it and, in essence, “cheating” (Miller, 2010; Reimer, 2007). If listening to a text were inferior to reading it, one would predict that comprehension after listening to a digital audio recording would differ from comprehension after reading the electronic version of the same text.
Given the importance of text comprehension across the life span, and the strong opinions expressed by teachers and the public at large on the effectiveness of reading compared with listening for comprehension (Baskin & Harris, 1995; Beers, 1998; Bomar, 2006; Goldsmith, 2002; Moody, 1989), there is a surprising lack of empirical research that directly evaluates the effect of mode of input on comprehension. Furthermore, a review of the research on adults yields conflicting results. Across several studies, recall after reading text was better than recall after listening to text (Daniel & Woody, 2010; Dixon, Simon, Nowak, & Hultsch, 1982; Green, 1981; Lund, 1991). For example, Daniel and Woody (2010) found that participants who read an article scored significantly higher on a quiz than those who listened to a podcast of the same article. In contrast, Moyer (2011) found no significant differences between the two modalities. One possible explanation for the conflicting results across these studies pertains to whether participants were (Daniel & Woody, 2010) or were not (Moyer, 2011) given the opportunity to go back and study the information before being tested for comprehension.
Likewise, studies comparing single and dual modality modes of input have varied in design and yielded inconsistent results (Baddeley, 2003; Lee & Young, 1974; Low & Sweller, 2005; Paivio, 1991). Much of the research investigating the effects of dual modality presentation of verbal information on comprehension has been done with foreign language learners. For example, Chang (2009) did not include a read-only condition, but found gains in comprehension when students read and listened simultaneously to stories, rather than listening only. In contrast, Diao and Sweller (2007) found significant gains in comprehension in the reading-only condition rather than the reading-while-listening condition, but did not include a listening-only condition. Their results support cognitive load theory, wherein it is hypothesized that simultaneous inputs from two modes increase cognitive load and, hence, can be detrimental to learning compared with the presentation of the material in a single modality only (Plass, Moreno, & Brunken, 2010). In yet another study, Moreno and Mayer (2002) found that students who read while listening to text learned the material better than those who only listened, or those whose text was accompanied by animations. Unfortunately, the findings from these studies cannot be compared directly because of differences across studies in the nature of the readings, the opportunity to review content after reading, population characteristics, and instructional conditions.
Given the rapid advances in technology and the growing availability and use of multiple modes of input of content, coupled with a lack of consistency in the research literature pertaining to the effect of mode of input (listening, reading, or listening and reading simultaneously) on verbal comprehension for adults, we conducted an experiment to investigate the extent to which input modality (digital audio, e-text, dual modality) affects comprehension and retention of verbal material. Although the role of gender has been largely ignored in previous research on this topic, there may be gender differences in reading and listening comprehension. Pauls, Petermann, and Lepach (2013) found that women outperformed men on auditory episodic memory tasks and attributed the women’s superior auditory memory performance to their advantage in verbal ability. As such, gender differences may also contribute to inconsistency across studies. Therefore, we included an analysis of gender effects in our experiment as well.
The purpose of this experiment was to investigate the effect of input modality (digital audiobook, e-text, or dual modality) on participants’ immediate comprehension (Time 1) and 2-week retention (Time 2) of a “real-world” non-fiction book. As such, the following research questions were addressed:
Method
Participants
To be included in this study, participants had to meet the following inclusionary and exclusionary criteria: adults between the ages of 25 and 40, college educated (bachelor’s degree only), native speakers of English, of normal hearing and vision (with correction), and with no self-reported history of neurological or learning impairments. To increase population homogeneity, potential participants outside this age range, with more advanced degrees, or who had not graduated from college were excluded. Early to middle-aged adults with college degrees were chosen because they are the fastest growing population of audiobook consumers (Zickuhr & Rainie, 2014). Participants were recruited through flyers posted in coffee shops throughout Manhattan and through Craigslist advertisements.
Based on these study criteria, 121 participants from the New York City metropolitan area were selected. The participants (
This study was conducted in accordance with the prescribed standards of the university’s Institutional Review Board. All participants provided informed consent and were financially compensated for their participation.
The verbal comprehension aptitude of participants was assessed using a listening and reading comprehension test. The detailed methods and validity of the test used for assessing comprehension aptitude in this adult sample are reported in Rogowsky, Calhoun, and Tallal (2015).
To determine the effectiveness of randomization and assure that there were no significant differences between the three groups in verbal comprehension aptitude, scores of participants randomly assigned to each of the three input conditions were evaluated. The mean comprehension aptitude score for the participants in the digital audiobook condition was 25.5 (
Demographics by Instructional Condition.
Although there was no significant difference in comprehension aptitude scores across the three conditions (digital audiobook, e-text, or dual modality), the participants in the e-text condition had the highest scores. To ensure that individual differences in participants’ comprehension aptitude did not inadvertently affect the results pertaining to mode of input, all analyses were run both with and without controlling for comprehension aptitude. The results did not differ based on whether or not comprehension aptitude was used as a covariate. To avoid redundancy, only the data using aptitude as a covariate are reported.
Comprehension Measure
The content used in this experiment and across all input modality conditions was the preface and Chapter 17 of the non-fiction novel,
The question set that comprised the
The validity of the

The correlation between the participants’ score on the
Descriptive Statistics and Correlation Matrix for Comprehension Aptitude and
Procedure
Three input modality conditions were used in this study. In Condition 1 (digital audiobook), participants listened to both the preface and Chapter 17 of
Upon completion of Chapter 17, participants proceeded immediately (Time 1) to take the
Results
Immediate Comprehension (Time 1)
A one-way between-subjects ANCOVA was conducted to determine the effect of input modality condition (digital audiobook, e-text, or dual modality) on the mean scores of the

The mean scores on the Unbroken Comprehension test at Time 1 for each of the three groups were not significantly different.
Males
A one-way between-subjects ANCOVA was calculated to determine the effect of input modality condition (digital audiobook, e-text, or dual modality) on the mean scores for the males on the
Females
Table 3 shows the mean scores for the females on the
A final analysis was conducted to determine whether there were any significant interactions between input modality conditions and gender on
In sum, when participants were randomized into input modality conditions, the overall comprehension aptitude of participants did not significantly vary across conditions at Time 1, and no significant differences were found in the immediate comprehension of non-fiction content for the overall population, for males, or for females. All analyses in this experiment were done both with and without co-varying for comprehension aptitude. No differences were found for any analysis depending on whether comprehension aptitude was or was not co-varied out.
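The covariate-adjusted comparison described above can be illustrated with a minimal sketch of a classical one-way ANCOVA F statistic, computed by comparing a covariate-only regression with a model that adds a separate intercept for each condition and a common slope. This is not the software or the data used in this study: the function name `ancova_f` and the small data set are hypothetical and were invented purely for illustration, and the sketch assumes homogeneous regression slopes across conditions.

```python
from statistics import fmean


def ancova_f(groups):
    """Classical one-way ANCOVA F statistic (illustrative sketch).

    groups: dict mapping a condition name to a list of
            (covariate, score) pairs, e.g. (aptitude, test score).
    Returns (F, df_between, df_error) for the condition effect
    after adjusting for the covariate with a common slope.
    """
    def sums(pairs):
        # Sums of squares and cross-products around the means of `pairs`.
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        syy = sum((y - my) ** 2 for _, y in pairs)
        return sxx, sxy, syy

    all_pairs = [p for pairs in groups.values() for p in pairs]
    n_total = len(all_pairs)
    k = len(groups)

    # Reduced model: score ~ covariate (conditions ignored).
    sxx_t, sxy_t, syy_t = sums(all_pairs)
    rss_reduced = syy_t - sxy_t ** 2 / sxx_t

    # Full model: score ~ condition + covariate (pooled within-group slope).
    sxx_w = sxy_w = syy_w = 0.0
    for pairs in groups.values():
        sxx, sxy, syy = sums(pairs)
        sxx_w += sxx
        sxy_w += sxy
        syy_w += syy
    rss_full = syy_w - sxy_w ** 2 / sxx_w

    df_between = k - 1
    df_error = n_total - k - 1  # one extra df is spent on the covariate
    f_stat = ((rss_reduced - rss_full) / df_between) / (rss_full / df_error)
    return f_stat, df_between, df_error


# Hypothetical (aptitude, comprehension score) pairs; NOT data from this study.
example = {
    "audiobook": [(20, 40), (25, 52), (30, 58), (35, 72)],
    "e-text": [(21, 43), (26, 51), (31, 61), (36, 70)],
    "dual": [(19, 39), (24, 50), (29, 59), (34, 69)],
}
f_stat, df1, df2 = ancova_f(example)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

An F value near zero indicates that, once the covariate is taken into account, the condition means are essentially indistinguishable. Obtaining a p value would additionally require the F distribution, which a statistics package would normally supply.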
Attrition
Two weeks after completing the
A one-way between-subjects ANCOVA was calculated to determine the effect of input modality (digital audiobook, e-text, or dual modality) on the mean scores of the
Males
A one-way between-subjects ANCOVA was conducted to determine the effect of input modality (digital audiobook, e-text, or dual modality) on the mean scores for males on the
Females
A one-way between-subjects ANCOVA was calculated to determine the effect of input modality (digital audiobook, e-text, or dual modality) on the mean scores for females on the
A final analysis was conducted to determine whether there were any significant interactions between input modality condition and gender on
Discussion
There is considerable interest in potential effects on learning, especially verbal comprehension, based on input modality. As an increasing amount of information is becoming available electronically, especially for adults, there is a growing interest in whether there are differences between listening to audiobooks, podcasts, or webinars, as compared with reading the same material via e-text. The purpose of this study was to investigate empirically the effect of modality of input on verbal comprehension in native English-speaking college graduates. We compared the comprehension and retention of “real-world” content presented in three different modes of input: reading e-text presented on a Kindle®, listening to a digital audio recording presented via a Kindle®, and listening while reading the e-text that was highlighted in real time and synchronized by the Kindle® with the audio recording.
Participants were randomly assigned to one of three groups that received the same instructional material (the preface and Chapter 17 from the historical, non-fiction book,
The finding that there is no significant difference in comprehension for adults who read a book or listen to an audiobook is consistent with the findings of Moyer (2011), even though much longer passages were used in the current study than in the Moyer study. However, the current study’s findings are not consistent with those of Daniel and Woody (2010), who found significantly greater comprehension after reading as compared with listening to text, despite the similarity in length of text across the two studies. The main difference between the two studies is that in the Daniel and Woody (2010) study, participants were allowed to review and study content before taking the comprehension test, whereas in the current study participants were not able to review material. Because reviewing information in the podcast condition would have been more difficult than reviewing the written material, participants may have been less likely to re-examine the podcast, thus creating an advantage for the participants in the written condition.
Several studies found either an advantage for dual modality presentation (Chang, 2009) or a disadvantage (Diao & Sweller, 2007; Moreno & Mayer, 2002), whereas the results of the current study failed to show any significant difference between single modality input (either listening to digital audio or reading e-text) and dual modality input (reading and listening simultaneously). As such, our results failed to support either cognitive load theory, wherein simultaneous written and spoken presentations of the same material are thought to increase cognitive load and, hence, to be detrimental to learning compared with the presentation of the material in a single modality only (Plass et al., 2010), or the dual modality theory, wherein information presented simultaneously in both a reading and listening format elicits more elaborate memory traces, thereby facilitating better retrieval (Baddeley, 2003; Lee & Young, 1974; Paivio, 1991). However, the previous studies evaluating the effects of single as compared with dual modality presentation on verbal comprehension were conducted with students who were learning English as a second language, whereas our study was conducted with native English speakers. This suggests that different results may apply to adults who have different levels of English proficiency, with modality of presentation playing a less significant role for those individuals who already are proficient in English.
Limitations
The focus of this study was on newer technologies and whether mode of input using these technologies affected verbal comprehension. The absence of a print text condition is a limitation of the study. Neither age nor level of English proficiency was included as a variable, as this study focused only on college-educated, native English speakers. Thus, the extent to which these findings generalize to younger or older populations, children at different stages of reading proficiency, non-English language or English as a Second Language speakers, and/or more or less educated populations cannot be determined from this study. Similarly, only one non-fiction book was used to assess verbal comprehension and retention across three different modes of input (audiobook, e-text, dual modality). Thus, the extent to which these data can be generalized to comprehension and retention of other forms and genres of verbal materials (e.g., fiction, textbooks, professional manuals, and newspaper articles) and other modalities (e.g., animation and video) cannot be determined from this study. In addition, our research assistants oversaw the use of the technology; as such, participants’ prior experience using a Kindle® was not examined.
Future Research
This study was done only with college-educated adults, and so its findings can be generalized only to individuals with well-developed listening and reading comprehension skills. However, there is considerable interest in the use of computer technology in education, and there may be multiple advantages to offering differing modes of input for teaching different components of literacy. For example, using audio input when initially teaching phonological awareness skills, using visual presentations when teaching orthographic skills, and subsequently presenting both simultaneously to build vocabulary and comprehension skills may have differential benefits for children at different stages of reading acquisition or different levels of English proficiency. Similarly, being able to present audiobooks to populations with visual impairments or dyslexia may be an important adjunct to their education. As such, it will be important to conduct additional research that explores these and similar questions pertaining to verbal comprehension in children who are at various stages of learning to read, individuals with different levels of English proficiency, and populations with auditory or visual processing challenges. The effects of input modality may lead to very different outcomes at different stages of language and reading development and English language proficiency. Longitudinal studies evaluating academic outcomes for children and adults who have received ongoing instruction that does or does not include the use of audiobooks, e-text, or both simultaneously may have important educational and clinical implications.
Conclusion
Recent integration of multimedia and digital content into electronic text often offers multiple modalities from which to present content. For example, one may read the material, listen to the material, or do both simultaneously. Many new technologies also allow individuals to annotate text, review by keywords, or highlight specific segments for future study. To our knowledge, this is the first study to directly compare the effects of these new technologies, specifically listening to audiobooks, reading e-text, and dual modality presentation of non-fiction text, on immediate comprehension and 2-week retention. Our study found no significant differences based on whether a portion of a non-fiction book was presented via audiobook, e-text, or dual modality. We conclude that, for the average college-educated, native English-speaking reader, comparable comprehension and retention of text occur regardless of the modality of presentation. We caution, however, that the non-fiction text material used in this study was more narrative in style, and may not be representative of the discourse style typically found in textbooks. It may be that reading textbooks with the goal of learning, studying, and retaining new, factual information or difficult concepts yields different results than were found in this study with non-fiction literature.
