Abstract
About 1 in 59 children has autism spectrum disorder (ASD), as estimated by the US Centers for Disease Control and Prevention (CDC) in 2019. The prevalence estimates for preschoolers are generally lower, for example, about 1 in 125 in the United States (Soke et al., 2017) or 1 in 132 in China (Wang et al., 2011). With early appropriate interventions, children with ASD—especially less severe ASD—stand a better chance of living more independently, having friends, and being in a steady relationship (Fein et al., 2013; Orinstein, Suh, et al., 2015; Orinstein, Tyson, et al., 2015; Roux et al., 2013). Such interventions are available for young children (e.g. Chang et al., 2016; Kasari et al., 2008; Reichow et al., 2012; Schreibman et al., 2015; Warren et al., 2011), but all too often children in need do not get them. In that case, better developmental outcomes tend to be elusive, even for those with relatively high IQs and intact verbal and nonverbal skills (Billstedt et al., 2005; Cederlund et al., 2007; Szatmari et al., 2003).
Many children miss out because they are not diagnosed in time, if at all. Children with ASD vary considerably in the severity of their deficits in social interaction, social communication, and social imagination (American Psychiatric Association, 2013). While severe cases can be diagnosed by age 2 or 3 years (Lord et al., 2006; Moore & Goodson, 2003), milder cases often go undiagnosed until age 6 or 7 years (CDC, 2012), and some are never diagnosed at all. Here in Hong Kong, about 10% of the cases of childhood autism based on the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10, F84.0; World Health Organization, 2004) and 17% of the cases of other ASD conditions (ICD-10, F84.1 and F84.5) are not referred for assessment until after the first grade, when social demands in the classroom and on the playground finally make the children’s social impairments more obvious and problematic (Department of Health, Hong Kong, 2007).
Better tools are needed for early identification of children with ASD—especially less severe cases—for timely clinical assessment. Offering children with ASD effective treatment by age 3 years, for instance, instead of the usual age 6 or 7 years, can launch them on a better lifelong trajectory, giving them a crucial head start on understanding the social world and developing healthy social bonds. Parents are often the first to notice when something does not seem right. Reportedly, this first happens with children on the milder end of the autism spectrum around age 20 months (McConachie et al., 2005). Why then are they not diagnosed until so much older? For one thing, young children often relate better to supportive adults—like their parents—than to peers. Hence, especially in single-child families, children may not show obvious symptoms of ASD at home. With family size trending downward, recognizing signs of ASD in their young children is a major challenge for more and more parents (De Giacomo & Fombonne, 1998; Zwaigenbaum et al., 2005). It would be helpful to supplement parent reports (based on existing instruments such as Autism Behavior Checklist, Volkmar et al., 1988; Developmental Behavior Checklist—Early Screen, Gray & Tonge, 2005; Modified Checklist for Autism in Toddlers, Revised with Follow-up (M-CHAT-R/F), Robins et al., 2014; Pervasive Developmental Disorder Screening Test-II, Siegel, 2004) with other sources of information.
With professional training and regular opportunities to observe children interacting with their peers, preschool teachers are in a good position to notice children’s ASD symptomatology (Duvekot et al., 2015). Yet even when a preschool teacher suspects that a child may have ASD, fear of raising a false alarm may hold the teacher back from alerting the parents, let alone suggesting that they consider clinical assessment for the child.
A valid and convenient screening tool can help preschool teachers make more informed and hence more confident judgments. However, while there are many screening tools to help identify older children with less severe ASD in community settings (e.g. Asperger Syndrome Diagnostic Scale, Myles et al., 2001; Autism Spectrum Screening Questionnaire, Ehlers et al., 1999; Childhood Asperger Syndrome Test, Scott et al., 2002; Social and Communication Disorders Checklist, Skuse et al., 2005; Social Communication Questionnaire, Berument et al., 1999), there are far fewer tools to use with preschool children below age 4 years. For instance, the M-CHAT-R/F and the Rapid Interactive Screening Test for Autism in Toddlers (RITA-T) are screening tools widely used for children up to 30 and 36 months old, respectively (Choueiri & Wagner, 2015; Robins et al., 2001, 2014; Siu et al., 2016), but no preschool version is available to capitalize on preschool teachers’ regular opportunities to see children in peer interaction.
There is, however, the Diagnostic and Statistical Manual of Mental Disorders Autism Spectrum Problems Scale (DSM-ASD Scale; Achenbach, 2014) from the Child Behavior Checklist for Ages 1½–5 (CBCL/1½–5) and the Caregiver-Teacher Report Form (C-TRF; Achenbach & Rescorla, 2000) for ASD screening in preschool populations. The 12 items on this scale can be grouped into a 7-item social communication/interaction (SCI) subscale and a 5-item restricted interests, repetitive behaviors (RRB) subscale (Rescorla, Ghassabian, et al., 2019). Rescorla, Given, et al. (2019) compared the item scores on the DSM-ASD Scale across preschool population samples from different countries and found lower similarity in mean item ratings between international samples for the C-TRF than for the CBCL/1½–5. This might be due to greater variations across societies in early childhood settings in schools (e.g. in the teacher–student ratio, classroom structure, and preschool program) than in families (Rescorla, Given, et al., 2019). Such variations in schools may affect the relationships between preschool teachers and the children and hence the teachers’ ability to observe and notice certain behaviors included in the checklist. For instance, in preschool settings wherein the teacher–student ratio is less favorable, teachers will probably have less time to interact with and observe each child in class. As such, they may be less likely to pick up some of the behaviors described on the DSM-ASD Scale of the C-TRF, such as those that involve often fleeting social responding (e.g. “Seems unresponsive to affection”) and those that require greater familiarity with the child (e.g. “Disturbed by any change in routine,” “Can’t stand having things out of place”). Moreover, while previous studies have provided evidence on the diagnostic accuracy of the CBCL/1½–5 as a screening instrument for ASD (Levy et al., 2019; Rescorla et al., 2015), such information is lacking for the C-TRF DSM-ASD Scale.
We therefore set out to develop a new ASD screening tool for use by teachers and other observers with minimal clinical training, who may not have known the child for very long or have had a lot of time to observe the child in his or her naturalistic settings. This new observation scale is based on the idea of a natural “stress test.” According to the
For young preschoolers with less severe ASD, peer interaction without adult scaffolding and instructions can make their ASD-related deficits more apparent. When resources are sufficient, clinical psychologists sometimes make preschool visits if a clinical assessment suggests a case in the milder range of the autism spectrum. It can be telling to observe how a child interacts with peers (or does not) during free play. Indeed, a prior study has shown that preschool observation of children’s free-field behavior in group activities and free play, based on a protocol derived from the Autism Diagnostic Observation Schedule (ADOS), yielded information similar to that obtained from ADOS assessments performed by clinicians in a clinic (Westman Andersson et al., 2013). Yet preschool observation is not typically considered the most cost-effective use of a diagnostician’s time, and hence, it is rarely done locally or in most other countries. Even where school observations are more common (e.g. in the United Kingdom), this practice can still be improved; for instance, less experienced clinicians could benefit from having a valid and simple classroom observation scale.
Such a screening tool can be used by an assistant therapist or preschool teacher to gather valid observations of peer interaction as a naturally occurring “stress test.” We intended the classroom observation scale (1) to assist clinical diagnosis of milder cases of ASD in lieu of clinicians making preschool visits; (2) to identify children early on (e.g. first year in preschool) who are more likely to have ASD than their peers, so that these children can be kept under closer watch (i.e. surveillance); and (3) to help preschool teachers make better informed and more confident decisions about whether to raise their concerns with the parents of children whom they suspect may have ASD.
Method
Participants
Ethical approval for this research was granted by the Human Research Ethics Committee of the authors’ university. Written parental consent was obtained prior to data collection. There were two phases to this study. The Classroom Observation Scale (COS) was developed in phase 1, which involved 304 children (age 3;0 to 4;11,
We validated the COS in phase 2 of this study. There was no overlap in the participants between phases 1 and 2. We received parental consent for 322 children (age 2;10 to 4;5,
Procedure
Phase 1: development of the 13-item COS
An item pool was generated from several sources: (1) an extensive review of research on peer interaction, (2) existing instruments for screening and assessing children with ASD, (3) a review of books with retrospective accounts from parents of children with ASD, and (4) interviews with experienced diagnosticians for ASD. The draft checklist consisted of over 100 items. Two local clinicians who specialized in ASD and had considerable experience of school observation suggested item revision in light of the preschool classroom context. The revised draft checklist was then sent to two clinical psychologists specialized in ASD in the United States and a developmental psychologist in Hong Kong for expert review. A preliminary checklist thus created comprised a set of 84 symptomatic/healthy behaviors and a 3-point rating system (1 = occurring rarely/most of the time, 2 = occurring less/more frequently than average, 3 = occurring at a similar rate as average peer).
An experienced clinical psychologist trained six research assistants (who had taken university-level psychology courses) to use the 84-item rating system in a special education classroom for high-functioning preschoolers with ASD. Both the clinical psychologist and the research assistants (one or two assistants at a time) observed the children simultaneously on site, and each research assistant’s ratings were compared item-by-item against the psychologist’s ratings at the end of the observation. After about 9 h of training, each rater achieved an item-by-item agreement greater than 75% with the psychologist.
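The training criterion described above can be illustrated with a minimal sketch. It assumes that "item-by-item agreement" means the share of items on which a trainee's rating exactly matches the psychologist's; the text does not spell out the formula, and the function name and ratings below are hypothetical, not study data.

```python
def percent_agreement(trainee, expert):
    """Share of items on which the two raters gave exactly the same rating."""
    if len(trainee) != len(expert) or not trainee:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(t == e for t, e in zip(trainee, expert))
    return matches / len(trainee)


# Hypothetical ratings on the 3-point scale for 8 items (illustration only).
expert = [3, 3, 2, 1, 3, 2, 3, 3]
trainee = [3, 3, 2, 2, 3, 2, 3, 3]

agreement = percent_agreement(trainee, expert)
# Training criterion from the text: agreement greater than 75%.
meets_criterion = agreement > 0.75
print(f"agreement = {agreement:.1%}, criterion met: {meets_criterion}")
```

Exact-match agreement is the simplest reading; a chance-corrected index would be stricter, but nothing in the text indicates that one was used for training.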
The six raters then observed the children in the four preschools—1 school day per child, and four to five children per school day. To establish interrater reliability, a portion of the cases (
Data-driven item reduction of our 84-item preliminary list yielded a much shorter 13-item COS. Then, as a first validity check, we examined whether COS scores were related to ASD symptomatology, as indicated by scores based on a widely used assessment tool, namely the ADOS-2 (Lord et al., 2012). ADOS-2 was administered by a clinical psychologist formally trained and qualified to use it for both research and clinical purposes and kept blind to the children’s COS scores. Of the 304 children observed in phase 1, 185 of them—whose parents granted further consent—underwent the ADOS-2 assessment (Figure 1).

Figure 1. A flowchart indicating the number of children at each stage of data collection in phase 1 of the study.
Phase 2: validation of the COS
We further evaluated how well observers with little or no clinical training (i.e. research assistants and preschool teachers) could use the 13-item COS to identify preschoolers under age 4.5 years more likely than their peers to have ASD. Parents were invited to participate about 2 months after their children had started preschool. Interested parents returned a signed consent form to the school.
The same clinical psychologist from phase 1 trained two new research assistants (with university-level psychology coursework but no prior clinical training) to use the 13-item COS, reaching good interrater reliability after about 6 h of training using the same method and criteria. The two research assistants then observed each child participant on 2 school days no more than 19 days apart (
A teacher in each classroom was asked to use COS and Social Responsiveness Scale–2 (SRS-2 teacher-report; Constantino, 2012) to rate the children. The SRS-2 was used as a measure of convergent validity for the COS. Altogether, 30 teachers from the five preschools did so; they were all briefed beforehand on the scoring of the checklist items for about 30–45 min by a clinical psychologist on our research team.
Children of interest were identified based on the COS teacher-report (COS-Teacher) and researcher-report (COS-Researcher). Between the 15th and 85th percentile (i.e. within about one standard deviation of the mean) is typically considered within the normal range for clinical measures (e.g. IQ scores; Sattler, 2008), so we used the bottom 15% as a cutoff for COS-Teacher and COS-Researcher. This cutoff seemed like a reasonable first approximation for bootstrapping our way to find an evidence-based cutoff for the COS. We adopted two approaches to identify young children more likely than their peers to have ASD near preschool onset (Figure 2):

The two screening approaches to identifying ASD in phase 2 of the study (
We did not give ASD assessment to all 322 children in this community sample for two obvious reasons: (1) ethical concerns of clinically assessing a large number of children without clinical referrals (further discussed in a later section) and (2) financial costs. Instead, we used these two approaches and identified 54 of 322 children as more likely to have ASD, noting considerable overlap of screen-positives between the two approaches.
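The two screening rules can be sketched as follows, using the definitions given in the footnotes to Table 2: below the 15th percentile on one informant's COS and below the median on the other's. This is only an illustration under stated assumptions. The nearest-rank percentile, the "at or below the cutoff" comparison, the function names, and all scores are hypothetical choices, not the authors' procedure or data.

```python
import math
from statistics import median


def nearest_rank_percentile(scores, pct):
    """Score at the given percentile using the nearest-rank method (an assumption)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def screen(teacher, researcher, pct=15):
    """Return the children flagged by either approach; lower scores mean more concern."""
    t_cut = nearest_rank_percentile(list(teacher.values()), pct)
    r_cut = nearest_rank_percentile(list(researcher.values()), pct)
    t_med = median(teacher.values())
    r_med = median(researcher.values())
    flagged = {}
    for child in teacher:
        # Approach a: low on COS-Teacher, below the median on COS-Researcher.
        teacher_led = teacher[child] <= t_cut and researcher[child] < r_med
        # Approach b: low on COS-Researcher, below the median on COS-Teacher.
        researcher_led = researcher[child] <= r_cut and teacher[child] < t_med
        if teacher_led or researcher_led:
            flagged[child] = {"teacher_led": teacher_led, "researcher_led": researcher_led}
    return flagged


# Hypothetical total scores for ten children (illustration only).
teacher = {"c01": 20, "c02": 24, "c03": 30, "c04": 31, "c05": 33,
           "c06": 34, "c07": 35, "c08": 36, "c09": 37, "c10": 38}
researcher = {"c01": 22, "c02": 35, "c03": 21, "c04": 30, "c05": 32,
              "c06": 33, "c07": 34, "c08": 36, "c09": 37, "c10": 38}

flagged = screen(teacher, researcher)
print(flagged)
```

In this toy sample one child is flagged by both approaches and one by the researcher-led approach only, which mirrors the kind of overlap between screen-positives noted above.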
In the second semester of the children’s second preschool year—generally about 1.5 years after the COS data collection—these 54 screen-positive children were mixed with 28 randomly selected screen-negative peers (i.e. typically-developing control) for ASD assessment using ADOS-2. Hence, a total of 82 children were assessed on ADOS-2. The clinical assessments were done by the same clinical psychologist as in phase 1, who was trained and qualified for using ADOS-2 for research and clinical purposes and did not know the children’s COS screen-positive versus control status or their scores on other measures.
Instruments
Statistical analysis
Data analyses were performed using SPSS-25. Raw scores were used for all analyses unless specified otherwise. To yield the 13-item COS from the list of 84 items used in phase 1, variance in item scores and the collinearity among items were checked. The latter was examined by computing Spearman’s correlation coefficients between items. Psychometric properties of the COS were calculated based on data collected in both phase 1 and phase 2. Cronbach’s alphas were reported for internal reliability. Intraclass correlation coefficients (ICCs) were reported for interrater and test–retest reliabilities. Cross-informant agreement between teachers’ and researchers’ ratings on COS was assessed by calculating the Pearson correlation coefficient between the two measures. Convergent validity was assessed based on the correlations of the COS with ADOS-2 in phase 1 and with SRS-2 in phase 2 of this study.
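The item-reduction logic can be sketched in a few lines of plain Python. The analyses themselves were run in SPSS, so this is an illustration, not the authors' code. It assumes the low-variance criterion reported in the Results (fewer than 5% of children rated 1), a hypothetical collinearity threshold `max_rho`, and a hypothetical greedy rule that keeps the first item of any collinear pair.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho computed as Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def reduce_items(ratings, min_share_of_ones=0.05, max_rho=0.8):
    """ratings maps item name -> list of 1-3 ratings, one per child.

    max_rho is a hypothetical threshold; keeping the first item of a
    collinear pair is also an assumption.
    """
    varied = {name: r for name, r in ratings.items()
              if r.count(1) / len(r) >= min_share_of_ones}
    kept = []
    for name, r in varied.items():
        if all(abs(spearman(r, varied[other])) < max_rho for other in kept):
            kept.append(name)
    return kept


# Hypothetical ratings for 20 children on four items (illustration only).
ratings = {
    "item_a": [1, 1, 2, 2, 2] + [3] * 15,
    "item_b": [1, 1, 2, 2, 2] + [3] * 15,   # duplicates item_a, so collinear
    "item_c": [3] * 18 + [2, 2],            # nobody rated 1, so low variance
    "item_d": [3, 3, 3, 2, 3, 2, 3, 3, 1, 3, 3, 2, 3, 3, 3, 1, 3, 3, 2, 3],
}
kept = reduce_items(ratings)
print(kept)
```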
In the validation phase, ADOS-2 was conducted around 1.5 years after the classroom observation. One-way analysis of variance (ANOVA) compared the mean scores on COS-Researcher and COS-Teacher between the non-ASD and ASD groups classified based on the ADOS-2 assessment. To examine the predictive validity of the two screening approaches (Figure 2) in detecting ASD versus non-ASD, Pearson chi-square tests were conducted for each approach to test for significant relations between the categorization based on screening and subsequent diagnoses of ASD. Cramer’s
Receiver operating characteristic (ROC) analyses were done separately for the COS-Teacher and COS-Researcher to further evaluate their predictive validity. A larger area under the ROC curve (AUC) suggests higher screening accuracy in classifying ASD cases versus non-cases (Fawcett, 2006), with an area of 1 representing perfect classification and an area of 0.5 representing chance-level classification. Cutoff criteria at fixed levels of sensitivity (i.e. 0.8 and 0.9), together with the corresponding specificity and OR, were also reported. These analyses speak to whether a specific cutoff criterion on either COS-Teacher or COS-Researcher might be informative in clinical practice, specifically how well it could classify “cases” versus “non-cases” (Grund & Sabin, 2010).
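To make the ROC quantities concrete, the sketch below computes an AUC and finds the cutoff that reaches a target sensitivity. It is not the SPSS procedure used in the study. It assumes that lower COS scores indicate higher risk and that a child screens positive when scoring at or below the cutoff; the function names and scores are hypothetical.

```python
def auc(case_scores, noncase_scores):
    """Probability that a random case scores lower than a random non-case.

    Ties count one half. This rank-based quantity equals the area under
    the ROC curve when lower scores indicate higher risk.
    """
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c < n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def sensitivity_specificity(case_scores, noncase_scores, cutoff):
    """Screen positive when score <= cutoff (an assumed convention)."""
    sens = sum(s <= cutoff for s in case_scores) / len(case_scores)
    spec = sum(s > cutoff for s in noncase_scores) / len(noncase_scores)
    return sens, spec


def cutoff_for_sensitivity(case_scores, noncase_scores, target):
    """Lowest observed score whose sensitivity reaches the target level."""
    for cutoff in sorted(set(case_scores) | set(noncase_scores)):
        sens, spec = sensitivity_specificity(case_scores, noncase_scores, cutoff)
        if sens >= target:
            return cutoff, sens, spec
    return None


# Hypothetical COS totals (illustration only, not study data).
cases = [18, 20, 22, 25, 30]
noncases = [24, 28, 31, 33, 35, 36, 37, 38]

area = auc(cases, noncases)
cutoff, sens, spec = cutoff_for_sensitivity(cases, noncases, target=0.8)
print(f"AUC = {area:.3f}; cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.3f}")
```

Fixing sensitivity first and reading off the specificity, as done here, matches the reporting approach described above: the cutoff is chosen so that few cases are missed, and the cost is expressed as false positives.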
Results
Item reduction
Drawn from existing tools, many items had been developed using clinical samples, resulting in items with low variance in scores in community samples (<5% of children getting “1 = occurring rarely/most of the time”). Consequently, 62 of the 84 items were removed due to low variance in the community sample. Spearman’s correlation coefficients were computed for the remaining 22 items, and 9 of them were further excluded due to high collinearity with other items (Spearman’s
Internal reliability
Cronbach’s alpha for COS for the phase 1 sample was 0.91, and internal reliabilities for COS-Researcher and COS-Teacher in phase 2 were 0.88 and 0.89, respectively.
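For readers unfamiliar with the statistic, Cronbach's alpha can be computed from item-level ratings as in the sketch below. The ratings are hypothetical, and only three items are used to keep the example short, whereas the COS has 13.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """rows: one list of item ratings per child, all of the same length.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: five children by three items (illustration only).
rows = [
    [3, 3, 2],
    [2, 2, 2],
    [1, 2, 1],
    [3, 3, 3],
    [2, 1, 1],
]
alpha = cronbach_alpha(rows)
print(f"alpha = {alpha:.3f}")
```

Population variances are used throughout; sample variances would give the same alpha because the correction factor cancels in the ratio.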
Interrater reliability
ICC estimates were calculated based on a mean-rating (
Cross-informant agreement between teachers’ and researchers’ ratings was assessed by calculating the Pearson correlation coefficient between COS-Teacher and COS-Researcher (Gresham et al., 2010). Results indicated significant correlation between the teachers’ and researchers’ ratings on COS (
Test–retest reliability
Test–retest reliability of COS in phase 1 was calculated based on a random selection of 49 children from the total sample of 304 children, observed again 14–32 days later. The ICC estimate between the observations was 0.73, based on a mean-rating (
Content validity
Content validity refers to the extent to which an instrument measures the targeted construct (Anastasi, 1988). Content validity of the COS was high because the final 13 items were distilled from the preliminary 84 items drawn from prior research and modified by input from experienced clinicians and autism experts. The COS spans stereotypical behaviors during structured learning times and less structured social play times, making it appropriate for the preschool setting and population.
Convergent validity
Convergent validity refers to the extent to which measures of theoretically related constructs are correlated. We assessed convergent validity based on the correlations of the COS with ADOS-2 in phase 1 and with SRS-2 in phase 2 of this study.
Correlation with ADOS-2 in phase 1
The Pearson correlation between the COS total scores and ADOS-2 raw scores showed significant association between the two measures (
Correlation with SRS-2 in phase 2
Correlations between the 13-item COS and 65-item SRS-2 were significant (COS-Teacher: Pearson
Predictive validity
In phase 2, 82 preschoolers (age 4;3 to 5;7,
Contrasting non-ASD and ASD on COS
Table 1 shows the mean scores and standard deviations on the screening scales (COS-Teacher and COS-Researcher) for the clinically assessed children classified into non-ASD and ASD. Univariate ANOVAs revealed group differences on COS-Teacher scores (
Table 1. Mean scores (standard deviations) on the COS-Teacher and COS-Researcher for the non-ASD and ASD groups among the 82 clinically assessed children in phase 2 of the study.
COS: Classroom Observation Scale; ASD: autism spectrum disorder.
Predicting ASD near the end of year 2 in preschool
Pearson’s chi-square tests (Table 2) showed that the categorization based on the two screening approaches (Figure 2) significantly predicted the classification of ASD versus non-ASD cases (COS-Teacher:
Table 2. Predictive validity of the two screening approaches in identifying preschoolers with ASD in phase 2 of the study.
COS: Classroom Observation Scale; ASD: autism spectrum disorder; CI: confidence interval; OR: odds ratio; LR+: positive likelihood ratio; LR−: negative likelihood ratio.
a: Below the 15th percentile on the COS-Teacher and below the median on the COS-Researcher. b: Below the 15th percentile on the COS-Researcher and below the median on the COS-Teacher.
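The indexes abbreviated in the table notes (OR, CI, LR+, LR−) all follow from a 2 × 2 table of screening result by diagnosis. The sketch below shows one way to derive them. It uses an uncorrected Pearson chi-square and a Wald interval on the log odds ratio, both of which are assumptions, and the counts are hypothetical, not the study's.

```python
import math


def two_by_two_metrics(tp, fp, fn, tn):
    """Screening indexes from a 2 x 2 table; all four cells must be non-zero."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fp * fn)
    # Wald 95% confidence interval computed on the log odds ratio.
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    chi_square = n * (tp * tn - fp * fn) ** 2 / (
        (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "odds_ratio": odds_ratio,
        "or_ci_95": ci,
        "chi_square": chi_square,
    }


# Hypothetical counts (illustration only): screen-positive children with and
# without ASD, then screen-negative children with and without ASD.
m = two_by_two_metrics(tp=8, fp=12, fn=2, tn=38)
for name, value in m.items():
    print(name, value)
```

With small cell counts the interval around the odds ratio becomes very wide, which is the kind of uncertainty the limitations section flags for the two screening approaches.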
ROC analyses were further conducted separately on the COS-Teacher and COS-Researcher to examine the predictive validity of these scales in discriminating ASD cases in our sample (Table 3; Figure 3). The AUC represents a single-value index of discriminative ability across the full range of cutoffs. Note that an AUC within the range of 0.7–0.9 denotes moderate accuracy, while an AUC above 0.9 indicates high test accuracy.
Table 3. Receiver operating characteristic (ROC) analyses with areas under the curve (AUC), validity indexes, and cutoff scores for the COS-Teacher and COS-Researcher in predicting ASD cases in phase 2 of the study.
COS: Classroom Observation Scale; ASD: autism spectrum disorder; CI: confidence interval; OR: odds ratio; LR+: positive likelihood ratio; LR−: negative likelihood ratio.

Figure 3. Receiver operating characteristic (ROC) curves for COS-Teacher (left) and COS-Researcher (right) in predicting the diagnosis of ASD based on ADOS-2. Screening accuracy was measured by the area under the ROC curve (AUC).
Both COS-Teacher and COS-Researcher showed moderate accuracy in differentiating ASD from non-ASD cases with AUCs of 0.76 and 0.80, respectively (
These results indicated that (1) COS proved useful for identifying preschool children under age 4.5 years more likely than their peers to have ASD diagnosable about 1.5 years down the road and (2) COS proved to be useful across different types of potential users with little or no clinical training.
Discussion
Our new screening tool for identifying—during the first semester of preschool—children more likely than their peers to have ASD is based on a very simple idea. While severe cases of ASD may be noticed by parents and preschool teachers and readily diagnosed by clinicians early on, milder cases often go undiagnosed until the first or second grade. We use peer interaction without adults hovering around as a naturally occurring “stress test” to identify children who have difficulty navigating the social world they share with their peers—difficulty that may foretell long-term social impairments.
This new screening tool works well for young children: all the children in the validation phase (
The COS developed in our study was easy to use for observers with little or no clinical training. The teachers in preschools were able to use the COS with reliable and valid results to help identify preschoolers under age 4.5 years more likely than their peers to have ASD, after receiving a 30- to 45-min group briefing at their preschools by a member of our research team. The eight research assistants who acted as observers in this study had taken university-level psychology courses and had received only a few hours of training from a clinical psychologist. Yet, they could use the COS to help identify children more likely than their peers to have ASD without knowing the children beforehand. Moreover, the COS has good psychometric properties in terms of reliability (internal consistency, interrater reliability, and test–retest reliability) and validity (convergent and crucially—for screening purposes—predictive validity for meeting ASD diagnosis prospectively) based on data collected from both types of informants, making it a potentially robust screening tool.
The results provided support for the ecological validity of the COS for use by preschool teachers, as well as by assistant therapists who majored in psychology at the undergraduate level but have not received extensive clinical training on ASD. Importantly, both data collection methods are feasible in real-life preschool settings. In cases for which ASD is suspected, preschool teachers can rate the child on his or her peer interaction based on the COS, while at the same time, an independent observer who may not be familiar with the child can rate the child’s social behaviors on the COS prior to a formal clinical assessment by a diagnostician. As seen from our results, although the cross-informant agreement was statistically significant, the medium-level correlation between the two kinds of observers suggested that the teachers and independent observers might pick up different aspects of children’s behaviors in their observations. Moreover, the observation by independent observers for each child was conducted during just 2 school days. This method is thus affordable in terms of manpower, time, and cost.
Compared to the 12-item DSM-ASD Scale from the CBCL/1½–5 and C-TRF, which consists of 7 items on SCI and 5 items on RRB (Rescorla, Ghassabian, et al., 2019), the 13-item COS derived from data-driven item reduction consists of only 2 items on RRB, while the majority of the items are on social interaction. In their international comparisons of the DSM-ASD Scale scores on the C-TRF, Rescorla, Given, et al. (2019) noted greater societal differences for the RRB than the SCI subscale, revealing less consistency in teachers’ ratings on RRB behaviors across societies. Rescorla, Given, et al. (2019) further speculated that the societal differences might be due to the varying level of sensitivity of teachers to RRB problems, and the degree to which group settings allow preschoolers to engage in these behaviors. In this study, 8 items on RRB were originally included in the 84-item preliminary checklist. Nonetheless, except for the two items eventually retained in the COS, the rest of the RRB items were removed due to low variance in scores in the community sample. This suggests that RRB behaviors may not be readily picked up by observers in preschool settings in Hong Kong.
Furthermore, it is noteworthy that there is minimal overlap of items between the COS and the DSM-ASD Scale of the C-TRF. In particular, the SCI items of the C-TRF DSM-ASD Scale focus more on social responding behaviors, that is, whether the child responds to others’ initiation of interaction (e.g. “Doesn’t answer when people talk to him or her,” “Seems unresponsive to affection,” “Avoids looking others in the eye”). By contrast, the COS items refer more to the social initiation behaviors of preschoolers (e.g. “Initiates to point out things in the environment to other children or adults,” “Initiates conversation with other children,” “Shows empathy for the feelings of peers and tries to make them feel better,” “Initiates the sharing of toys or food with other children”). Perhaps social initiation behaviors, in contrast to more fleeting social responding behaviors, might be more easily spotted by observers who are not familiar with the child (such as the research assistants in this study), and likewise by teachers who may not have a lot of time to interact with and observe the child. As such, the COS may prove versatile in its utility as a screening tool for ASD in preschool populations, along with other existing ASD screening instruments (e.g. CBCL/1½–5 and C-TRF).
Limitations and implications for future research
We are mindful that the COS cutoff scores (i.e. bottom 15% of our full sample) can only be used by preschool teachers and clinicians and their assistants as references for the time being. Moreover, the wide CIs obtained for the ORs of the two screening approaches in predicting ASD diagnoses indicated uncertainty in the estimates, and thus, the results here should be interpreted with caution. To further validate the results and to establish normative cutoffs for this new screening tool, future studies will need to use larger random and representative norming samples.
To address the ethical concerns of giving clinical assessments (e.g. ADOS-2) to too many children without clinical referrals (and being pragmatic about financial constraints), we decided to give ASD assessments to all the screen-positive children (
In our study, we found that at least 14 children in our sample of 322 (4.3%) were diagnosed with ASD. This rate was higher than the CDC (2018) estimate of 1.7% or published preschool estimates around 0.8% (Soke et al., 2017; Wang et al., 2011), perhaps due to (1) random sampling error of our modest sample size, (2) differential parental consent to the study (i.e. parents who were concerned about their child’s development might have been more likely to give consent), and/or (3) under-estimation of ASD cases in the medical record–based surveillance system studies for large populations that contributed to the published prevalence estimates (for a similar point, see CDC, 2012).
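As a rough arithmetic check on the rates quoted above, the sketch below converts the counts into proportions and attaches a Wilson 95% interval to 14 of 322. The interval is an illustrative addition, not an analysis reported in the study. It treats the 322 children as a simple random sample, which the differential-consent explanation above calls into question, and 14 is a lower bound because not every child was clinically assessed.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


observed = 14 / 322
low, high = wilson_interval(14, 322)

# Published estimates quoted in the text, expressed as proportions.
benchmarks = {"1 in 59": 1 / 59, "1 in 125": 1 / 125, "1 in 132": 1 / 132}

print(f"observed: {observed:.1%} (Wilson 95% interval {low:.1%} to {high:.1%})")
for label, rate in benchmarks.items():
    print(f"{label}: {rate:.1%}")
```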
The current research design, while pragmatic, could not tell how many of the screen-negative children who were not clinically assessed might turn out to meet diagnosis for ASD. Fortunately, our screening worked quite well: (1) the screen-positive children were much more likely to meet ASD diagnostic criteria subsequently than would be expected by chance and (2) by contrast, only one child in the control group (i.e. randomly selected screen-negative children) turned out to meet ASD diagnostic criteria. To be more confident that the screen-negative children in general are truly without ASD, future studies should consider involving larger random samples of screen-negative children in the clinical assessments.
We waited about 1.5 years in the validation phase to do clinical assessment for ASD because we expected that ASD cases found in community settings such as mainstream preschools would tend to be less severe and hence be more difficult to diagnose in the first year of preschool. Given the positive findings reported here, future evaluation of the COS as a preschool screening tool can consider screening near preschool onset and waiting 1 year or even less to clinically assess the screen-positive children.
More research is also needed to find out how the COS can be more easily and more effectively used. For example, a good user manual with helpful answers for frequently asked questions may suffice to replace in-person briefings for preschool teachers; evidence-based guidelines can inform teachers on how to observe the screen-positive children more closely and effectively, perhaps by using the COS more than once to track the children for a few months to help inform whether clinical assessment is called for. Such continuing surveillance can help reduce false positives based on only one-off screening when the children are quite young (e.g. near the onset of preschool). Also, it may be helpful to use the COS again on children initially screen-negative should concerns become evident. Further psychometric evaluation of the COS with teachers as the raters (e.g. test–retest reliability, interrater reliability, predictive validity for ASD diagnosis) will also provide valuable information on whether teachers as raters without being supplemented by researchers’ ratings will suffice.
Conclusion
This study aimed to develop a convenient COS that can help preschool teachers and observers with little or no formal clinical training to identify in mainstream preschools, with reliable and valid results, children under age 4.5 years more likely than their peers to have ASD. We are mindful that there are good alternate approaches for developing simple ASD screening tools for preschoolers under age 4 years. For example, the M-CHAT-R/F and the RITA-T (Choueiri & Wagner, 2015) could have their target age ranges extended upward from toddlerhood to early childhood. Nonetheless, the present study constitutes a first step in developing an easy-to-use, reliable, and valid tool to help teachers and healthcare workers capitalize on peer interaction as a naturally occurring stress test to identify, during the first semester of preschool, children more likely than their peers to have ASD. With this COS joining forces with existing screening tools (e.g. CBCL/1½–5 and C-TRF), young children in community settings such as mainstream preschools should stand a better chance of early identification and getting effective intervention for ASD, launching them on a better lifelong trajectory in understanding the social world and developing healthy social bonds.
