Language deficits are a core feature of autism spectrum disorder (ASD), and children with ASD demonstrate these language deficits early on in life (Ellis Weismer, Lord, & Esler, 2010; Paul, Chawarska, Cicchetti, & Volkmar, 2008). There is an extensive literature indicating that vocabulary learning is an area that poses a particular challenge to this population (Arunachalam & Luyster, 2016; Charman, Drew, Baird, & Baird, 2003; Herlihy, Knoch, Vibert, & Fein, 2015; Luyster, Lopez, & Lord, 2007; Luyster & Lord, 2009). Indeed, delayed onset of first words is often the first recognized symptom in ASD (Camarata, 2014; De Giacomo & Fombonne, 1998). Receptive language delay can be observed as early as 12 months in ASD and expressive vocabulary delay at 18 months (Mitchell et al., 2006). Children with ASD also experience significant delays in both expressive and receptive modalities of word learning (Hudry et al., 2010). Reduced word learning proficiency is witnessed in a substantial proportion of the ASD population (Hus, Pickles, Cook, Risi, & Lord, 2007). Given that vocabulary acquisition is strongly linked to later language skills in typically developing (TD) children and children with ASD (Howlin, Mawhood, & Rutter, 2000), additional research to understand word learning deficits and effective strategies to facilitate word learning in this population is warranted.
Children with ASD display a unique vocabulary learning profile in two aspects. First, they present with a larger weakness in receptive language than expected relative to their expressive language (Kover, McDuffie, Hagerman, & Abbeduto, 2013; Luyster, Kadlec, Carter, & Tager-Flusberg, 2008; Mitchell et al., 2006). In typical development, receptive vocabulary development usually precedes expressive vocabulary development in absolute terms and is strongly correlated with expressive language (Bornstein & Hendricks, 2012). The relative weakness in receptive vocabulary (as compared to expressive vocabulary) in ASD has been reported across age groups (Ellis Weismer et al., 2010; Kover et al., 2013; Pickles, Anderson, & Lord, 2014) and divergent cognitive profiles (Kjelgaard & Tager-Flusberg, 2001; Woynaroski, Yoder, & Watson, 2016). The atypical expressive-receptive vocabulary profile is evident not only cross-sectionally but also longitudinally: several studies have confirmed that the longitudinal association between receptive and expressive language is atypical in infants and preschoolers with ASD (Hudry et al., 2014; Woynaroski et al., 2016). A recent longitudinal study provides additional evidence that this discrepant vocabulary profile gradually changes across the preschool period, suggesting that it may be a particularly relevant marker for young children with ASD (Davidson & Ellis Weismer, 2017).
Second, another unique challenge for children with ASD is generalization of learned skills across materials, settings, or communication partners (Eldevik, Kazemi, & Elsky, 2016). For example, children and adolescents with ASD generalized learned word learning strategies less consistently to a new learning context compared to TD participants (de Marchena, Eigsti, & Yerys, 2015) and failed to generalize learned language skills across social communication partners (Pellecchia & Hineline, 2007). Difficulties with generalization have long been reported in the literature in ASD and have persisted despite advances in intervention (Lovaas, Koegel, Simmons, & Long, 1973; Tek & Naigles, 2017).
Cross-modal generalization
Cross-modal generalization is one specific type of generalization that children with ASD may have difficulty with even relative to other populations with language disabilities (e.g., intellectual disabilities). Cross-modal generalization is defined as the transfer of learned vocabulary from one modality (e.g. expressive vocabulary) to another modality (e.g. receptive vocabulary).
In contrast, more recent work has suggested that children with developmental disabilities and language impairments are capable of cross-modal generalization but are more likely to generalize from the expressive modality to receptive than vice versa (Davis, Lancaster, & Camarata, 2016b; Doyle, Wolery, Gast, Ault, & Wiley, 1990). In an intervention study that targeted expressive morphosyntax skills in children with specific language impairment (SLI), participants were found to display significant gains in receptive morphosyntax skills as incidental learning from the expressive treatment (Camarata, Nelson, Gillum, & Camarata, 2009). Specific to word learning, children with Down Syndrome and children with SLI similarly demonstrated some evidence of expressive-to-receptive generalization but consistent difficulty with receptive-to-expressive generalization compared to mental age-matched TD peers (Bird, Chapman, & Schwartz, 2004; Davis et al., 2016a; Gray, 2003). Regarding receptive-to-expressive generalization, only one study to date (Bucher & Keller, 1981) has provided evidence that receptive-to-expressive generalization is possible in children with language impairments without cognitive deficits but is influenced by factors such as complexity of target vocabulary, familiarity of stimuli, and the extent to which participants learned the vocabulary. Participants demonstrated enhanced receptive-to-expressive generalization with simpler and shorter vocabulary, more familiar target word pictures, and longer training in the receptive modality. These findings collectively suggest that cross-modal generalization cannot be assumed in children with different language impairment etiologies, especially in the receptive-to-expressive direction.
Gap in current literature on teaching vocabulary to children with ASD
In clinical settings, professionals continue to debate about when and in which sequence we should teach vocabulary when working with children with ASD. Based on a common vocabulary acquisition sequence in TD children, some researchers have recommended a receptive-before-expressive intervention sequence (Lovaas, 2003; Taylor & McDonough, 1996). However, a review by Petursdottir and Carr (2011) suggests that empirical support for this recommendation is limited. An alternative approach is to target vocabulary in the expressive modality before the receptive modality or only in the expressive modality. In fact, numerous intervention packages for children with ASD were designed to focus on language production and a substantial portion of intervention studies in this population examined the effects of treatment on expressive language as primary outcomes (i.e. verbalization, word production, initiation of communication, etc.) and receptive language as secondary outcomes (see Goldstein, 2002 for a review and Hampton & Kaiser, 2016 for a recent meta-analysis).
However, the assumption that teaching vocabulary in the expressive modality will improve receptive vocabulary in the population of ASD has only been tested in one study (Wynn & Smith, 2003). In this study, though targeting vocabulary in the expressive-only condition resulted in cross-modal generalization more often than targeting vocabulary in the receptive-only condition, not all children consistently demonstrated expressive-to-receptive generalization. Additionally, the vocabulary targets used in Wynn and Smith may have been quite simple for participants to learn. The number of training sessions in each condition ranged from one to five sessions for each word pair, which may have been too few to demonstrate conclusive within-subject comparisons. The reviewed literature shows that evidence for teaching expressive and receptive vocabulary in children with ASD is extremely limited. On one hand, past work on cross-modal generalization in children with developmental disabilities and language impairments suggests that children with ASD will likely display a similar expressive-to-receptive generalization pattern. On the other hand, given the distinctive expressive-receptive vocabulary profile and particular difficulty in generalization in children with ASD, there is some evidence to support that children with ASD may experience difficulty generalizing in both directions (Wynn & Smith, 2003). Taken together, it remains unclear whether children with ASD are capable of incidentally generalizing vocabulary learned in one modality to another.
The current study
Vocabulary is a foundation for many aspects of language development and is an important predictor of later academic success in children with typical development and with language impairments (Dickinson, McCabe, Anastasopoulos, Peisner-Feinberg, & Poe, 2003). Studying cross-modal generalization has the potential to help understand specific word learning deficits in ASD and increase cost efficiency in language intervention. A compelling practical rationale is that if children with ASD are capable of making gains in receptive vocabulary as an incidental effect of expressive-only language interventions, then expressive-only language interventions may be an efficient and sufficient program for teaching vocabulary. On the other hand, it is important to directly test whether teaching vocabulary in either the expressive or receptive modality will increase learning efficiency via incidental, untrained cross-modal generalization. The purpose of this study was to investigate receptive and expressive word acquisition and cross-modal generalization in children with ASD. Specifically, we asked the following research questions:
1. Do children with ASD generalize cross-modally from expressively-only trained targets to untrained receptive target identification following a combined storybook and play expressive vocabulary intervention?
2. Do children with ASD generalize cross-modally from receptively-only trained targets to untrained expressive target identification following a combined storybook and play receptive vocabulary intervention?
3. Does expressive-only vocabulary instruction lead to greater generalization as measured by a higher percentage of correct responses in cross-modal probes than receptive-only vocabulary instruction?
4. Does expressive-only vocabulary instruction lead to faster learning as measured by fewer sessions to reach mastery criterion in within-modal probes than receptive-only vocabulary instruction?
Methods
Data were collected between June 2009 and March 2011. The University Institutional Review Board reviewed and approved all study procedures. A parent or legal guardian provided informed consent for all participants. Assent was not obtained because participants’ low language levels precluded a plausible case that assent narratives could be adequately comprehended. Rather, specific behavioral guidelines were adopted in order to ensure that a participant could reasonably demonstrate their wish to terminate the study at any phase (and in any session).
Participants
Participants included nine boys with ASD and one girl with ASD, ages 3.0–7.4 years with a mean age of 4.6 years (
Participants’ demographic information and standardized test results.
EOWPVT-4: Expressive One-Word Picture Vocabulary Test—Fourth Edition (Martin & Brownell, 2011); Leiter–R: Revised Leiter International Performance Scale (Roid & Miller, 2011); PLS-3: Preschool Language Scales—Third Edition (Zimmerman et al., 1992); PPVT-4: Peabody Picture Vocabulary Test—Fourth Edition (Dunn & Dunn, 2007); TACL-3: Test of Auditory Language Comprehension—Third Edition (Carrow-Woolfolk, 1999).
Participants with scores below floor were entered as floor.
Target vocabulary selection
A four-step process was employed to identify a set of 16 individualized target words that were absent from each child’s receptive and expressive vocabulary repertoire. First, a pool of vocabulary was selected for each child’s parent to indicate which words were absent from their child’s repertoire. The pool of words included low-incidence vocabulary words from the upper levels of the MacArthur-Bates Communicative Development Inventories (Fenson, Marchman, Thal, Dale, & Reznick, 2007), such as toucan, abacus, canoe, flashlight, tepee, or blimp. Second, the experimenter engaged each child in a word imitation task. Words for which the child could not produce an intelligible imitation were excluded so that a child’s phonological limitations would not confound expressive production or generalization. Third, the child was instructed to match one picture for each of the vocabulary words to an identical picture. The experimenter did not name the vocabulary during the matching task. Only the stimuli that the child correctly matched were included. This step ensured that the child had the necessary prerequisite matching skills for receptive probes so that errors in receptive probes would not be confounded by lack of ability to match or point to a target picture. Fourth, the child was asked to identify each potential target word expressively and receptively. To test for expressive knowledge, the child was asked to name each picture. To test for receptive knowledge, the child was asked to identify each picture from a closed set of four pictures. Only vocabulary items that were not named correctly (0% across four expressive opportunities) and that were identified at or below chance level (≤ 25% across four receptive opportunities) were selected for intervention.
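The four screening steps can be read as a conjunction of eligibility checks on each candidate word. The following is a minimal sketch of that logic, not the study's actual procedure or software: the `ScreeningResult` record, its field names, and the example scores are hypothetical, and "at or below chance" is assumed to mean at most one correct identification in four trials with a closed set of four pictures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScreeningResult:
    """Hypothetical record of one candidate word's screening outcome."""
    word: str
    parent_reports_absent: bool   # Step 1: parent marks word as not in repertoire
    imitated_intelligibly: bool   # Step 2: child produced an intelligible imitation
    matched_picture: bool         # Step 3: child matched picture to identical picture
    expressive_correct: int       # Step 4: correct namings out of `trials`
    receptive_correct: int        # Step 4: correct identifications out of `trials`


def is_eligible(r: ScreeningResult, trials: int = 4, chance: float = 0.25) -> bool:
    """Return True if the word passes all four screening steps.

    Assumption: chance is 25% because receptive probes use four pictures,
    and 'at or below chance' is treated as a proportion <= chance.
    """
    return (
        r.parent_reports_absent
        and r.imitated_intelligibly
        and r.matched_picture
        and r.expressive_correct == 0                  # 0% named correctly
        and r.receptive_correct / trials <= chance     # at or below chance
    )


def select_targets(results: List[ScreeningResult]) -> List[ScreeningResult]:
    """Keep only the candidate words that pass every screening step."""
    return [r for r in results if is_eligible(r)]
```

Under these assumptions, a word named correctly even once, or identified receptively on two of four trials, would be excluded from the target pool.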
These procedures were important because we wanted to select vocabulary items that were absent from each child’s expressive and receptive repertoire and eliminate potential confounding effects of nonexperimental variables on demonstrating vocabulary learning such as cognitive ability to match pictures, motor ability to point to pictures, or oral-motor ability to express a target form intelligibly. After 16 vocabulary items were selected following these procedures, they were randomly assigned into four sets. Within each set, the words were randomly assigned to a modality so that two words within each set were taught expressively exclusively and two words were taught receptively exclusively.
Three sets were presented during intervention phases and a fourth set served as an untreated control set to monitor possible incidental learning of vocabulary and maturation effect. Control words were not taught in either modality but were included during each probe phase. Baseline procedures included at least three sessions of probes on all 16 words (targeted and control) to establish a stable data pattern before the intervention phase was implemented.
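The allocation described above (16 words, four random sets, two words per modality within each taught set, one untreated control set) can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary keys, the `seed` parameter, and the choice to treat the last shuffled set as the control set are assumptions, since the text does not specify how the control set was chosen.

```python
import random
from typing import Dict, List, Optional


def assign_words(words: List[str], seed: Optional[int] = None) -> List[Dict]:
    """Randomly split 16 target words into three intervention sets and one control set.

    Within each intervention set, two words are assigned to the
    expressive-only condition and two to the receptive-only condition.
    """
    if len(words) != 16 or len(set(words)) != 16:
        raise ValueError("expected 16 distinct target words")

    rng = random.Random(seed)      # seeded only to make the sketch reproducible
    shuffled = list(words)
    rng.shuffle(shuffled)

    # Four sets of four words each, in shuffled order.
    sets = [shuffled[i:i + 4] for i in range(0, 16, 4)]

    plan = []
    for phase, word_set in enumerate(sets[:3], start=1):
        plan.append({
            "role": "intervention",
            "phase": phase,
            # The order is already random, so slicing is a random assignment.
            "expressive_only": word_set[:2],
            "receptive_only": word_set[2:],
        })
    # Assumption: the remaining set is the untreated control set,
    # probed in every probe phase but never taught.
    plan.append({"role": "control", "words": sets[3]})
    return plan
```

Each participant would receive an individual plan of this kind, with 12 taught words (six per modality) and four control words.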
Experimental design
A single-case parallel treatments design across behavior sets (Gast & Ledford, 2018; Gast & Wolery, 1988) was used to compare word learning and cross-modal generalization of words learned within the receptive-only and expressive-only vocabulary intervention. This design is well suited for comparing interventions on nonreversible behaviors (Gast & Ledford, 2018). Conceptually, this design is equivalent to two concurrent multiple-probe designs implemented simultaneously with one variation of the intervention (Storytime Intervention + expressive prompts) implemented in one multiple-probe design whereas the second variation (Storytime Intervention + receptive prompts) in the other (Wolery, 2013). Experimental control was demonstrated through the staggered introduction of the independent variable to different sets of vocabulary with changes in learning behavior observed only after the introduction of the independent variable (Gast & Ledford, 2018).
One advantage of the parallel treatments design is that it does not require continuous baseline probe sessions for behaviors that have yet to be introduced to the intervention (Gast & Ledford, 2018; Jones & Schwartz, 2004). Therefore, it allows demonstration and comparison of two interventions for teaching nonreversible behaviors by time lagging the behavior sets (i.e. target word sets in this study) while controlling for maturation and history threats and avoiding instrumentation threats that are common in multiple baseline designs. This design also allowed expressive and receptive vocabulary to be targeted simultaneously yet independently so that word learning and generalization can be compared directly across two modalities. By “independent,” we are not arguing that vocabulary can be learned or taught completely in one modality in isolation. From an applied behavioral perspective on teaching communication and language, the behavior of the teacher and the behavior of the learner are functionally independent of one another (Sundberg & Michael, 2001; Sundberg & Partington, 1998). We are using the terms “isolation” or “independence” in modality to refer to the modality of the teaching prompts used in the intervention.
All children participated in receptive- and expressive-only vocabulary interventions and were taught vocabulary words that were absent in baseline expressive probes and at or below chance levels in baseline receptive probes. Vocabulary intervention modality (expressive or receptive) was randomly assigned to target words so that half of the target words were trained only expressively (expressive-only condition) and half of the words were trained only receptively (receptive-only condition) for each participant. No direct cross-modal training was delivered. A series of probe sessions were completed to assess baseline, within-modal learning, and cross-modal generalization of vocabulary learning. These probe sessions occurred during baseline assessment, during intervention, and post-intervention. Across both expressive- and receptive-only conditions, participants were trained to reach the same mastery criterion in the targeted modality (80% correct responses across three sessions) before they were probed in the untrained modality. The primary dependent variable was the percentage of correct responses on expressive and receptive cross-modal probes. As an exploratory analysis, our secondary dependent variable was the number of sessions needed to learn vocabulary to criterion (defined later in this section). A difference in the effectiveness of the expressive-only condition versus receptive-only condition would be indicated if one modality of vocabulary instruction reliably produced greater cross-modal generalization or more vocabulary learning in a shorter amount of time than the contrasting modality.
Vocabulary intervention
Individual intervention sessions were conducted at the university clinic for each participant by licensed speech-language pathologists. For all participants, intervention started within one month of initial consent and assessments. Participants attended multiple 24-minute treatment sessions per week with a range of two to four sessions per week based on their ability to attend the sessions. The intervention length varied for each participant with a mean of 19 weeks, ranging from 14 weeks to 36 weeks. A combined storybook and play intervention using milieu teaching strategies (Storytime Intervention; Wolery, Ault, & Doyle, 1992a; Wolery et al., 1992b) was adopted. Each session was divided equally into an expressive-only condition and a receptive-only condition (12 minutes in each modality).
The clinician, intervention setting, play context, and reinforcers were identical across the expressive- and receptive-only conditions. The clinician introduced targeted vocabulary words in a storybook followed by structured play. During the storybook session, the child and the clinician viewed a picture book of at least 10 pages together while the clinician told a story about the pictures. Each target word was verbally presented by the clinician 10 times. During the play context, the clinician and the child interacted with a set of toys that matched the storybook theme. The clinician used a commenting play strategy in which she used each target word meaningfully approximately 10 times after securing the child’s attention. The clinician also used other naturalistic play techniques, including environmental arrangement, shadow play, semantic expansion, and matched turn taking (Wolery et al., 1992a).
The expressive- and receptive-only interventions differed in the type of prompts provided by the clinician during intervention. For the expressive-only condition, the clinician prompted the child to name the vocabulary item when shown a picture that matched the word immediately after the clinician produced each vocabulary word. For the receptive-only condition, the clinician prompted the child to point to a picture that matched the word.
The clinician reinforced correct responses with verbal praise, smiles, or social routines such as high-fives. For incorrect responses, the clinician delivered the correct response (“Uh-oh, it’s a ____” for the expressive-only condition and “Uh-oh, here’s the ____” while pointing to the correct picture for the receptive-only condition) and readministered the prompt. If the child then identified or produced the word correctly, the clinician reinforced the correct response. If the child made no response after prompting, the clinician waited for a minimum of 2 seconds and then delivered the correct expressive or receptive exemplar.
Probe procedures
In addition to the within-modal probes, cross-modal probes (i.e. untrained modality) were also administered at baseline during one cross-modal probe session. For example, if the word “pelican” was assigned to the receptive-only condition for a child, it would be probed expressively during cross-modal probe sessions. All target words, including trained, untrained, and control sets of vocabulary were included in the cross-modal probe session. Similar to within-modal probe sessions, three trials of each word were included in each cross-modal probe session.
Measuring mastery and generalization
Participants’ responses to within-modal, cross-modal, and daily intervention probes were recorded during each probe or intervention session. The percentage of correct responses to probes was calculated and plotted for each word and each participant for visual analysis of data. The criterion for mastery of within-modal learning was defined as 80% correct across three consecutive intervention sessions. Similar to criterion on mastery of within-modal word learning, complete cross-modal generalization was defined as 80% for each direction of generalization. More broadly, to make decisions on whether each participant demonstrated successful cross-modal generalization of word learning, success in generalization for each direction was defined as meeting criterion on the majority of phases of intervention (two out of three phases).
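The two decision rules above can be expressed compactly. The sketch below is illustrative rather than the scoring procedure actually used: the function names are hypothetical, and "80%" is assumed to mean at least 80% correct. The first function returns the session at which the within-modal mastery criterion is first met (three consecutive sessions at or above threshold); the second applies the majority-of-phases rule for cross-modal generalization.

```python
from typing import Optional, Sequence


def sessions_to_mastery(
    session_pcts: Sequence[float], threshold: float = 80.0, run: int = 3
) -> Optional[int]:
    """Return the 1-based session number at which mastery is first met.

    Mastery is met at the last of `run` consecutive sessions scoring at or
    above `threshold`. Returns None if the criterion is never met.
    """
    streak = 0
    for session_number, pct in enumerate(session_pcts, start=1):
        streak = streak + 1 if pct >= threshold else 0
        if streak >= run:
            return session_number
    return None


def generalization_success(
    phase_pcts: Sequence[float], threshold: float = 80.0, min_phases: int = 2
) -> bool:
    """Return True if the cross-modal probe criterion is met on enough phases.

    With three intervention phases, `min_phases=2` encodes the
    'two out of three phases' rule.
    """
    phases_met = sum(1 for pct in phase_pcts if pct >= threshold)
    return phases_met >= min_phases
```

Applied to the phase-level cross-modal probe percentages reported in the Results for Participant 9, `generalization_success([100, 83, 50])` is true for the expressive-to-receptive direction and `generalization_success([33, 67, 33])` is false for the receptive-to-expressive direction, which matches the classification given there.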
Interobserver agreement (IOA)
IOA was calculated for 33% of probe sessions and at least 33% of intervention sessions (
Procedural fidelity
Procedural fidelity was measured for at least 33% of baseline and probe sessions (
Results
Data from individual participants were graphed and analyzed at the participant level to answer all four research questions. All 10 participants successfully learned target vocabulary words to criterion (80%, across three consecutive sessions at the end of each intervention phase) in trained modalities, indicating that the combined storybook and play intervention was effective in teaching new vocabulary and that cross-modal generalization could be tested in all participants. However, substantial individual variability was observed in the number of sessions needed to reach mastery criterion and in the pattern of word learning. The length of each intervention phase ranged from 6 to 34 sessions for each set of vocabulary trained to reach mastery criterion. Representative data from three participants are presented in Figures 1, 3, and 4. Plots of all other participants’ learning and testing data at baseline, during intervention, and at follow-up probes are available in supplementary materials (Figures S1–S7). All participants’ performance on the cross-modal probes is presented in Table 2.
Figure 1. Vocabulary learning and cross-modal generalization of a participant who demonstrated successful cross-modal generalization in both directions.
Figure 2. Averaged accuracy rate on cross-modal probes during probe sessions across three phases for each participant.
Figure 3. Vocabulary learning and cross-modal generalization of a participant who demonstrated successful cross-modal generalization from expressive to receptive modality only.
Figure 4. Vocabulary learning and cross-modal generalization of a participant who did not demonstrate successful cross-modal generalization in either direction.
Table 2. Participants’ success in cross-modality probes during the probe sessions immediately following intervention.
Cross-modal generalization
Overall, 3 out of 10 children (Participants 4, 5, and 7; see Table 2) demonstrated successful cross-modal generalization of trained vocabulary for both expressive-to-receptive and receptive-to-expressive directions (i.e. 80% on cross-modal probes for at least two out of three intervention phases), similar to what is often seen in typical development. An example plot from one participant (Participant 4) who demonstrated successful cross-modal generalization for both directions is displayed in Figure 1.
For this participant, during both expressive-only and receptive-only intervention conditions, his demonstration of target words increased in level as compared with the baseline probe conditions and had an overall accelerating trend for the majority of intervention phases. The percentage of correct identification remained at 0 or 25% for all target words prior to any intervention. For receptive-only trained vocabulary words, he reached criterion in seven, five, and seven sessions, respectively, for three phases. For all three phases, he demonstrated an immediate increase in the percentage of correct identification after intervention was initiated. For expressive-only trained vocabulary words, he reached criterion in five, five, and eleven sessions, respectively, for three phases. He showed an immediate increase in the percentage of correct identification for both Phase 1 and Phase 2 with intervention. For Phase 3, though his performance remained low for the first four sessions, he demonstrated a steady increase from the fifth session onward and eventually reached criterion during the 11th session. His performance on the untrained control set of vocabulary remained at 0 or 25% throughout the probe conditions, which eliminated the internal threats of maturation and incidental learning.
Overall, cross-modal generalization was higher for the expressive-to-receptive direction than for the receptive-to-expressive direction. Nine out of ten children demonstrated successful cross-modal generalization on the expressive-to-receptive probes for the majority of phases (i.e. two out of three phases), with an average expressive-to-receptive generalization rate of 89% (ranging from 44 to 100%; Figure 2). In contrast, only three children (Participants 4, 5, and 7) reached mastery criterion for cross-modal generalization on the receptive-to-expressive probes for the majority of intervention phases, with an average receptive-to-expressive generalization rate of 43% (ranging from 6 to 89%; see Figure 2). On average, participants demonstrated successful cross-modal generalization on four out of six words for expressive-only trained words but only one out of six words for receptive-only trained words.
An example plot from one participant (Participant 9) who demonstrated successful cross-modal generalization in the expressive-to-receptive direction is displayed in Figure 3. For this participant, the percentage of correct identification gradually and consistently increased after the intervention was introduced, although variability was observed for Phase 2 of the expressive-only condition and Phase 3 of the receptive-only condition. For the expressive-only condition, he reached criterion in 19, 14, and 22 sessions, respectively, for three phases. For receptive-only vocabulary words, he reached criterion in 9, 17, and 10 sessions, respectively. As shown in Figure 3, though this participant reached successful generalization criterion for two out of three phases for the expressive-to-receptive generalization (100, 83, and 50%), his performance on the receptive-to-expressive generalization remained low (33, 67, and 33%). His performance on the control set of vocabulary remained low for the receptive-only conditions, yet some variability (ranging from 0 to 50%) was observed for the control set of vocabulary for the expressive-only condition. However, because his percentage of accuracy was still under the mastery criterion, we concluded that the variability observed in the percentage of correct responses for the control vocabulary did not provide sufficient evidence for incidental learning. Five other participants (Participants 1, 2, 6, 8, and 10) demonstrated a similar pattern of cross-modal generalization only in the expressive-to-receptive direction.
Finally, one participant did not demonstrate successful cross-modal generalization in either direction (Participant 3, Figure 4). For Phase 1 and Phase 2, with the introduction of intervention, his percentage of correct identification remained low for the first half of the intervention phase but increased with an accelerating trend during the second half of both phases. For Phase 3, significant variability was observed for the expressive condition, but he eventually reached criterion after 34 sessions. For the receptive-only condition for Phase 3, he showed an overall increasing trend immediately after the introduction of the intervention and maintained a high percentage of correct identification after reaching the criterion. However, he did not demonstrate successful cross-modal generalization in either direction even after he demonstrated mastery in the trained modality. His percentage of correct identification on the cross-modal probes ranged from 33 to 50% for the expressive-to-receptive generalization probes and from 0 to 50% for the receptive-to-expressive generalization probes.
Expressive and receptive word learning
Even though all children were successful in learning targeted vocabulary to mastery criterion in both expressive and receptive modalities, large variability was observed with regard to number of intervention sessions needed to reach mastery criterion and there were individual patterns of word learning. Total length of intervention ranged from 17 to 70 sessions (
Number of sessions needed to reach within-modal mastery criterion for each participant.
Measure of incidental (untreated) vocabulary learning
A fourth set served as an untreated control set to monitor possible incidental learning of vocabulary. The mastery criterion used for daily intervention probes and cross-modal probes was also used for the control set. One participant (Participant 2) demonstrated evidence of learning for one control word in the receptive condition. No other children demonstrated evidence of incidental learning to criterion in either modality. Therefore, these results demonstrate vocabulary intervention effects well above the observed levels of incidental word learning.
Discussion
The primary purpose of this study was to evaluate cross-modal generalization of vocabulary in children with ASD. Cross-modal generalization can be and is often an assumed or “automatic” outcome in clinical practice. To be sure, rapid expressive-to-receptive and receptive-to-expressive generalization is common in TD children (Dollaghan, 1985; Fernandes, 2008). To systematically test this assumption in ASD, this study used an orthogonal test of expressive- and receptive-only vocabulary learning in which vocabulary targets were taught and probed independently in each modality. During intervention, all participants demonstrated within-modal learning for both expressive- and receptive-only conditions (30 replications across 10 participants). However, contrary to the assumption that vocabulary learning will be “automatically” generalized across modalities, results from this study indicate that cross-modal generalization at the word level is neither automatic nor consistent in children with ASD, particularly in the receptive-to-expressive direction. Additionally, as a secondary analysis, we compared the efficiency of word learning in the expressive-only condition and the receptive-only condition to understand whether learning in one modality precedes the other. Though all participants eventually demonstrated learning in both modalities, they varied in their pattern of learning the target vocabulary words in each modality.
Cross-modal generalization
In a broader sense, generalization refers to the process of transferring learned skills to a similar but not identical new context (Harris, 1975; Wolfe, Blankenship, & Rispoli, 2018). It is a ubiquitous challenge in clinical practice and is one of the most critical barriers to treatment success because skills learned in the clinical setting with trained materials may not readily generalize to functional language outcomes outside of the clinic context (de Marchena et al., 2015). Many children with intellectual disabilities showed poor performance on generalization tasks even when they mastered the materials or behaviors that were explicitly trained (Davis et al., 2016a, 2016b). For children with ASD, generalization is known to be a particular challenge, and past studies have consistently shown that children with ASD were less efficient in generalizing learned skills in different domains including language and literacy, emotion recognition, and social skills (for a review, see Wass & Porayska-Pomsta, 2014).
In this study, we extend the current literature on generalization in children with ASD by examining a specific type of generalization in word learning: cross-modal generalization. Unlike children with typical language development, children with ASD in this study displayed two main patterns of cross-modal word generalization. First, children with ASD demonstrated more cross-modal generalization in the expressive-to-receptive direction than in the receptive-to-expressive direction. Nine children demonstrated successful expressive-to-receptive generalization of trained words on at least two out of three intervention phases, yet only three children demonstrated successful generalization in the receptive-to-expressive direction. Second, even among the children who met the criteria for successful cross-modal generalization, very few demonstrated consistent generalization across all three intervention phases. Consistent with results from Wynn and Smith (2003), these results provide a convergent picture of an atypical vocabulary learning profile and a weakness in cross-modal generalization in children with ASD, particularly from the receptive to the expressive modality.
However, our results contrast with those of Wynn and Smith (2003), who found that children’s overall language ability had an impact on their levels of cross-modal generalization. In their study, the three participants who scored the highest on the standardized language assessment also demonstrated the highest level of cross-modal generalization. It is reasonable to speculate that a child’s language or cognitive ability could be associated with his or her ability to generalize across modalities or contexts. For instance, generalization requires the ability to relate newly encountered stimuli or contexts to past learning experiences (Rimland, 1964). Therefore, children with better working memory or short-term memory could be at an advantage for generalization tasks. Additionally, children with more advanced language skills could rely on their linguistic resources to draw meaningful connections between familiar and new contexts or materials, which could also contribute to greater success with generalization. However, visual analysis did not reveal any pattern relating children’s generalization ability to their language functioning, cognitive skills, or age in our study. The difference between this study and Wynn and Smith (2003) in the association between cognitive or language skills and cross-modal generalization could arise from several sources, including different sample distributions on these parameters. In general, single-subject designs do not include systematic replication across multiple participant features, so these kinds of analyses rely on categorical or binary distinctions. The hypothesis that cognitive or language abilities drive cross-modal generalization ability or broader generalization ability should be empirically tested in future studies with more extensive replication of these distinctions.
Additionally, our finding that children with ASD generalized more from the expressive modality to the receptive modality than in the opposite direction prompted us to consider more closely the relationship between expressive and receptive modalities of word learning and the factors that may facilitate one direction of generalization over the other. A task requirement analysis of cross-modal generalization suggests that expressive-to-receptive generalization requires a child to identify an item out of an array of four pictures and respond to a verbal prompt (e.g. “Where is the toucan?”) with a motor action (i.e. pointing), whereas receptive-to-expressive generalization requires a child to retrieve a label when presented with a picture of the individual item and respond to a verbal prompt (i.e. “What is it?”) with a verbal answer.
The comparison of task requirements suggests that the demand of verbally producing the item in receptive-to-expressive generalization may make the task more complex than the demand of pointing in expressive-to-receptive generalization. Even though we attempted to control for participants’ oral-motor and speech production skills by only including words that a child could imitate as target vocabulary, we did not explicitly test participants’ phonological skills. Previous theoretical models of lexical learning and access have suggested that a speaker typically goes through two stages to produce a word: lexical selection and phonological form encoding (Levelt, 2001; Levelt, Roelofs, & Meyer, 1999). During the lexical selection stage, the speaker selects the target lexical item from active lexical concepts in the mental lexicon. Once a lexical item is selected, the phonological form encoding system is activated to retrieve the sounds that correspond to the word that the speaker wants to produce.
It is possible that the additional step of retrieving phonological forms of the target vocabulary in receptive-to-expressive generalization posed additional challenges for children with ASD in this study. However, evidence from previous studies on the phonological skills of children with ASD, though limited, has generally suggested that differences in phonological processing skills are not a sufficient explanation of word learning difficulties in this population (Gladfelter & Goffman, 2017; Norbury, Griffiths, & Nation, 2010). In one study, children with ASD learned the phonological forms of new vocabulary and even outperformed TD peers when tested immediately after the word learning paradigm (Norbury et al., 2010). The authors interpreted their findings as indicative of enhanced attention to phonological forms in children with ASD and a “sound before meaning” strategy that they use to learn words. It is relevant to note that participants in the Norbury study were high-functioning children with ASD whereas our study included children with ASD with varying levels of cognitive skills. Further studies are needed to systematically examine the role that phonological skills play in word learning in children with ASD.
Given the atypical receptive-expressive vocabulary profile in children with ASD (Davidson & Ellis Weismer, 2017; Hudry et al., 2010; Kover et al., 2013), we suspect that the particular difficulty in the receptive-to-expressive generalization observed in this study reflects a unique difficulty in receptive vocabulary learning or reduced receptive advantage in children with ASD rather than a difference in task demands across expressive and receptive vocabulary learning. Previous studies on the discrepant receptive-expressive vocabulary profile in children with ASD have suggested that children with ASD may demonstrate focused impairments in receptive vocabulary learning because they benefit less from the ambient linguistic input in the environment and may require more supportive contexts to learn new words (McDaniel, Yoder, Woynaroski, & Watson, 2018; Woynaroski et al., 2016). Specifically, McDaniel et al. (2018) proposed that deficits in attention toward the speaker in children with ASD may disrupt receptive vocabulary learning more than expressive vocabulary learning. In the same study, the authors provided evidence that reduced attention toward the speaker was associated with larger later receptive-expressive vocabulary size discrepancy in children with ASD. One possible explanation of the better expressive-to-receptive generalization in our study is that the expressive-only vocabulary learning condition provided a more supportive context for vocabulary learning and generalization for children with ASD than that of the receptive-only condition.
Additionally, based on this post hoc consideration of the receptive-expressive asynchrony, we speculate that the disparity in the extent of generalization across the two directions could also potentially be attributed to differences in teaching and prompting contexts across conditions. Even though the clinician provided two prompts for each target word during daily probes for both conditions, in the expressive condition the child was presented with
In our study, the expressive prompts in the expressive-only condition involved the visual cue and the auditory cue of the target item with simultaneous focus on one item (the target word). In contrast, the receptive prompts in the receptive-only condition always involved the visual information of three other items in addition to the visual cue and the auditory cue of the target item. Word learning is inherently multisensory in nature because young learners are expected to combine complementary information from the auditory and visual (A–V) modalities when acquiring words. Thus, for children with ASD who may have variable temporal binding windows for A–V inputs and thus ambiguity in how these multisensory signals should be combined, the exclusive (and item-focused) binding between the A–V information of the target item in the expressive-only condition (i.e. one spoken word label and one referent) perhaps facilitated the mapping between the label and its referent by reducing the variability of stimuli and the complexity of cross-modal A–V multisensory integration. One hypothesis that could account for the greater expressive-to-receptive generalization observed is that the strengthened link between the label and its referent in the expressive-only condition contributed to a more consistent temporal binding of the auditory and visual features of the target word and facilitated cross-modal expressive-to-receptive generalization. Future studies are necessary to test this hypothesis and systematically examine the impact of potential deficits in multisensory integration on word learning in children with ASD.
Potential autonomy for expressive and receptive modalities in ASD
Another difference in the expressive and receptive prompts and teaching contexts could potentially explain the disparity in the extent of generalization across the two directions. In the expressive-only condition, the expressive prompts asked the child to name the target item, whereas in the receptive-only condition, the receptive prompts asked the child to point to a picture that matched the target word. Consequently, in the expressive condition, participants were expected to produce the labels and were only reinforced following their correct production, yet in the receptive condition, participants were reinforced after they correctly identified the target picture.
A plausible explanation is that the prompts in the expressive-only condition provided children with ASD with opportunities to practice retrieving phonological forms, which further helped them develop more robust lexical representations for target words in the expressive-only condition as compared to the receptive-only condition. In contrast, in the receptive-only condition, it is possible that participants learned to point to the picture of the referent via associative learning between the spoken label in the prompt and the picture instead of developing mental representations of the target words that capture the referential relations between the spoken label, the picture, and the symbolic representation of the word. This explanation is consistent with existing theories and evidence suggesting that children with ASD rely on associative learning mechanisms to learn words and are more likely to develop partial knowledge of learned words (Baron-Cohen, Baldwin, & Crowson, 1997; Henderson, Powell, Gareth Gaskell, & Norbury, 2014; McDuffie, Yoder, & Stone, 2006; McGregor et al., 2012; Parish-Morris, Hennon, Hirsh-Pasek, Golinkoff, & Tager-Flusberg, 2007; Preissler, 2008). To illustrate, young children with ASD have been previously found to associate spoken labels with pictures of the referent instead of real objects (Parish-Morris et al., 2007). Such findings suggest that even though an associative style of word learning may allow children to establish a connection between the spoken label and the picture, these connections may not be sufficiently robust to generalize a newly learned word label to a referent object, a different context, or to freely retrieve the label when prompted in a new modality.
However, it is also noteworthy that even though participants in this study demonstrated more expressive-to-receptive generalization, there were instances in which a participant was able to answer within-modal expressive probes accurately yet was unable to comprehend the target items that he or she was able to produce expressively. A similar notion that children with ASD demonstrate word production without understanding words’ meanings has been previously proposed by Charman et al. (2003). As we integrate this finding with the observed difference in generalization patterns, it may be important to bear in mind that Arunachalam and Luyster (2016) suggested that children with ASD are indeed capable of establishing robust lexical representations but may need more input, time, or practice than TD children. In our study, even though cross-modal generalization in either direction was not automatic, participants did perform better on cross-modal generalization probes in the expressive-only condition than in the receptive-only condition, perhaps because the expressive prompts included opportunities to practice retrieving (and producing) newly learned words. We argue that these findings support the aforementioned proposal that input and output that include more learning opportunities could potentially enhance lexical representation in children with ASD and promote generalization. Granted, this study was not designed to test lexical representations in children with ASD; more research is needed to improve our understanding of lexical representations in this population and to shed light on how to provide input and prompts to optimize word learning and cross-modal generalization in children with ASD.
Word learning efficiency and cross-modal generalization in ASD
In addition, a secondary finding from this study suggests that unlike TD children whose receptive vocabulary is often considered to precede expressive vocabulary learning (Bornstein & Hendricks, 2012), children with ASD in this study varied in terms of their word learning patterns of the target words. Half of the participants reached mastery criterion for within-modal learning faster in the receptive-only intervention condition compared to the expressive-only condition. When participants’ learning was analyzed by intervention phase separately, one participant consistently demonstrated faster learning in the receptive condition across all three intervention phases and two children consistently reached criterion in the expressive-only intervention prior to the receptive-only modality. A closer examination of the learning data of these three children reveals that the difference in number of sessions to reach criterion across modalities was less than three sessions for each participant. These differences likely reflect individual variability instead of etiology-specific vocabulary learning patterns. Overall, participants’ learning data suggest that receptive- and expressive-only target vocabulary were learned in parallel and in close proximity to one another on a session-by-session basis.
Clinical implications
Findings from this study provide new insights into teaching expressive and receptive vocabulary for children with ASD. There has been a long-standing debate about when and in which sequence professionals should teach vocabulary to children with ASD. Results from this study show that more children demonstrated cross-modal generalization in the expressive-to-receptive direction than the receptive-to-expressive direction. When averaging across target word sets, participants demonstrated successful cross-modal generalization on approximately four out of six words for expressively trained target words yet one out of six words for receptively trained words. These results suggest that targeting expressive vocabulary first with the goal of incidentally increasing receptive vocabulary may be more efficient than starting with the receptive modality. However, the finding that not all children demonstrated successful expressive-to-receptive cross-modal generalization indicates that teaching vocabulary in the expressive modality exclusively does not guarantee receptive understanding in this population. We recommend that practitioners periodically monitor children’s vocabulary learning in both modalities or set an explicit generalization goal to ensure complete learning of trained words.
Additionally, results from this study highlight the need to probe receptive understanding of trained words if an expressive-first intervention approach is adopted. Specifically, some children in this study demonstrated a 100% accuracy rate on within-modal expressive probes yet were unable to identify the target items that they were able to produce expressively. Such findings challenge the common clinical assumption that receptive knowledge of vocabulary typically precedes expressive production. It is thus essential to check children’s comprehension of trained words even after they have produced those words expressively.
Limitations and future directions
Results from this study should be interpreted within the context of some limitations. First, maturation and history are possible threats to internal validity in parallel treatments design studies because such a design usually requires a long time span to complete (Gast & Ledford, 2018). We assessed internal threats of maturation and history by examining participants’ performance on each vocabulary set in a probe session before instruction started. We also monitored incidental learning by including a fourth untrained control set of vocabulary. For all 10 participants, performance on the probes before the introduction of each intervention condition remained unchanged (0–25%), which suggests that history and maturation effects were not occurring. For the fourth set of control vocabulary, we detected some evidence of incidental learning for two participants: for Participant 1, his performance on the within-modal probes for the untrained expressive control set ranged between 50 and 100% and for Participant 2, his performance on the within-modal probes for the receptive control set ranged between 75 and 100%. Their performance on the cross-modal probes remained unchanged. No other children demonstrated evidence of incidental learning to criterion in either modality. Overall, these results demonstrate intervention effects well above the observed level of incidental learning.
Another limitation of this study is that maintenance of participants’ learning was not systematically targeted or tested. Vocabulary instruction also took place within the clinical setting using toys that were closely linked to the theme of the book. It is plausible that teaching vocabulary in multiple contexts with an explicit treatment goal of maintaining learned words may enhance participants’ cross-modal generalization. Additional research is needed to determine the extent to which cross-modal generalization of word learning is malleable and to explore effective treatment strategies that could improve cross-modal generalization.
In conclusion, this study is the first to indicate that children with ASD demonstrated inconsistent cross-modal generalization even after extensive within-modal training. Findings from this study warrant future studies to further understand (a) the relationship between modalities during word learning; (b) the optimal order of vocabulary instruction (i.e. simultaneous instruction, isolated instruction with one modality before another, expressive-before-receptive instruction, receptive-before-expressive instruction); and (c) potential interaction between language functioning, developmental level, and cognitive skills and children’s cross-modal generalization ability. For instance, does teaching vocabulary simultaneously in both expressive and receptive modalities help children with ASD develop more robust lexical representations for learned vocabulary? Does teaching expressive vocabulary globally translate to gains in receptive vocabulary globally? Answers to these questions will provide valuable insights into potential mechanisms of deficits in cross-modal generalization, identification of a subgroup of children with ASD who need additional support with word acquisition and generalization, and new treatment strategies to improve treatment efficiency in this population.
Supplemental Material
Supplemental Material for Cross-modal generalization of receptive and expressive vocabulary in children with autism spectrum disorder by Pumpki L Su, George Castle and Stephen Camarata in Autism & Developmental Language Impairments