Introduction
Promoting academic expectations that are inappropriately aligned with a child’s development creates unrealistic levels of achievement for young children. As the problematic No Child Left Behind legislation saw many children indeed left “behind,” American educators and policy makers advocated for more consistent and rigorous standardized learning benchmarks for younger children, resulting in the Common Core State Standards for Kindergarten to Twelfth Grade in 2010. While these standards represent a widespread initiative to try to close the achievement gap and provide accountability through test scores, many early childhood advocates question their appropriateness and whether or not they are achievable.
Many policy and decision makers appear to be obsessed with the educational idea that sooner is better when it comes to learning core knowledge. Jean Piaget (1896-1980), an influential Swiss child psychologist, referred to the American fixation that children should learn “sooner and faster” as “the American disease” (Guddemi & Zigler, 2011). Teaching academic tasks to children at earlier ages will
A tool that addresses this need is the Gesell Developmental Observation–Revised (GDO-R), an instrument that evaluates a child’s performance on a series of developmental and academic tasks in relation to the sequential ages and stages of typical child development in the cognitive, motor, language, and social/emotional/adaptive domains. The results of the GDO-R reveal a child’s overall Developmental Age and Performance Level Ratings in each of the four domains at a point in time. Developmental Age is an age in years and half years that best describes a child’s collective behavior and performance on a developmental scale. A Developmental Age may be lower than, higher than, or the same as the child’s chronological age. Knowing each child’s Developmental Age enables educators to customize developmentally appropriate academic experiences and expectations to best meet the learning needs of every child.
Arnold Gesell, PhD and MD, developed an assessment of human development, identifying the ages and stages of child development based on his maturationist theory (Gesell, 1925). He published the original Gesell assessment, known today as the GDO-R. It was updated in 1940 and 1965. In 1979, Ames, Gillespie, Haines, and Ilg published Gesell Institute’s
This article presents the psychometric results for each of the tasks on the GDO-R, indicating the typical ages at which specific developmental and academic tasks, needed for realistic, developmentally appropriate success in kindergarten, are mastered. Reliability and validity evidence are also reported to support the continued use of the GDO-R as a developmental assessment for children aged 3 to 6 years. Three tasks are discussed to illustrate the need to establish effective and appropriate academic goals based on a child’s developmental assessment results.
Literature Review
Experts have stressed the significance of the kindergarten year as it relates to the child’s development and the child’s ability to succeed within the school environment. Kindergarten sets the tone for learning and future school success (Black, 2008; Guddemi & Zigler, 2011). Embarking on new learning creates numerous opportunities for the development of the child not only in areas of cognitive, social, emotional, and physical growth, but also as an individual within a community. Life-long, vital skills are acquired through the learning opportunities presented within the kindergarten environment.
With an increased emphasis placed on rigorous new standards and accountability, educators and parents are faced with new challenges relating to school readiness and the kindergarten curriculum. Schools play an important role in readiness; however, different schools have different expectations regarding readiness. A child may be considered prepared for one school environment and not prepared for another based on that particular school’s expectations for readiness (Maxwell & Clifford, 2004). Parents and educators are concerned about the increasing pressures and demands within early learning environments. The academic expectations of today’s kindergarten are similar to the achievement levels expected of first grade 20 years ago (Almon & Miller, 2011; Miller & Almon, 2009).
Although there is much discussion related to the readiness of incoming kindergarten children, it is a school’s responsibility to educate children who are legally of age to attend school. Most states require children to attend school by a certain age regardless of their readiness or skill level. On the flip-side, there is also a need for schools to be
Despite a national focus on early childhood education, current research suggests that educational gaps continue to exist and that achievement gaps occur prior to the beginning of elementary school (Langham, 2009). It has been suggested that high-quality early education in combination with high-quality kindergarten through third-grade programs plays a critical role in attempting to close educational gaps and potentially contributes to enhancing the child’s development, school readiness, and future school success. Long-term effects of a quality pre-kindergarten experience can affect grade retention, placement, special education, and school dropout rates (Barnett, 1993; Campbell, Ramey, Pungello, Sparling, & Miller-Johnson, 2002; Mead, 2008). Unfortunately, not all children have the opportunity to participate in a
A consistent characteristic of high-quality pre-kindergarten and kindergarten programs is the reliance on developmentally appropriate practices (DAPs) for each child based on each child’s needs. DAP is defined by the National Association for the Education of Young Children (NAEYC) as knowing where a child is developmentally, providing unique experiences based on his or her stage of development that are both challenging and achievable, and possessing knowledge about how young children learn. The organization supports educators in “promoting young children’s optimal learning and development” (Copple & Bredekamp, 2009, p. 16). NAEYC’s DAP encourages educators to provide learning opportunities that will enhance all areas of a child’s development and to understand that a child’s development follows a well-documented, sequential order. DAP is based on being aware that each child develops at his or her own unique rate and that learning opportunities need to be challenging, but within the child’s ability. Understanding a child’s development is key to setting expectations that are appropriate and to planning curriculum that meets the child’s needs and abilities. Early educators need to adhere to methods and practices of teaching that foster a child’s development with learning being concentrated in all areas of development—cognitive, social, emotional, language, and physical (Kagan & Reid, 2009).
To plan DAP, educators must assess where a child is on the path of development to determine what experiences the child is ready for. Readiness assessments should never be used to exclude children from learning opportunities, but rather to help determine how and what educational and learning experiences should be developed and/or modified to meet the child’s developmental level (Gullo, 2005). Through documentation and assessment, an educator is better able to understand the child. These instruments offer insight into a child’s development and his or her ability to learn, making learning visible to the educator (Seitz, 2008). As learning is multidimensional, it is important for an assessment to be used as a tool to help educators better understand children, their development, and how they learn (Tomlinson, 2008).
About the GDO-R
The GDO-R is a standardized, performance-based, criterion-referenced developmental assessment tool. It is designed for children from 2½ to 9 years of age and is used to inform educators and parents about a child’s progress on developmental continuums. This information helps to set appropriate expectations for performance as well as instruction for children based on their developmental stage or level. When combined with the Parent/Guardian Questionnaire (PQ) and the Teacher Questionnaire (TQ), the GDO-R functions as a comprehensive assessment system. The GDO-R can also help determine whether or not a child may need further diagnostic evaluations to suggest appropriate planning or remediation in specific areas of development.
The purpose of the GDO Study was to provide updated technical data and reliability evidence for 17 of the 19 original tasks on the ©2007 GDO (see Table 1); two tasks that were intended for assessing older children, Right and Left and Visual III, were omitted. Another purpose of the GDO Study was to define Overt Behavior (Task 20) and to strengthen the social/emotional/adaptive domain (Task 21). The criterion that shaped this study, and subsequent outcomes, was based on three sources of information:
Table 1. GDO-R Tasks.
Method
The GDO Study consisted of several sub-studies designed to collect both quantitative and qualitative data. Quantitative data were collected on children in seven age bands (spanning ages 3-6 years) and are reported here. Examiners administered a total of 167 items in one-on-one sessions with children. Data were also collected for each child from the teacher’s observation of the child in the classroom (45 items) and from the parent’s observation of the child at home (78 items).
Qualitative data were collected on two developmental tasks on the GDO, the Copy Forms and Incomplete Man, as part of the Gesell Institute National Lecture Staff (NLS) Review Study. The purpose of the NLS Review Study was threefold: to collect data on the
GDO Study Timeline
The GDO Study was completed over the course of 3 years following American Educational Research Association (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) standards. A timeline is presented in Figure 1 to provide context and sequence for the data activities and analyses of the GDO Study. The steps in the timeline are as follows.

Figure 1. Overview of GDO study timeline.
Review of Child Development Literature
A comprehensive review of child development literature on observation, assessment methodology, and assessment instruments for children ages 2½ to 9 was undertaken. The GDO instrument was evaluated as a whole based on the appropriateness of its developmental tasks to early educational goals for young children, culturally sensitive measurement approaches, and consistency with the developmental capabilities of children between the ages of 2½ and 9.
Content Validity
After a careful review of the literature and with the results of a Bias Review, the research team felt the items on the GDO instrument remained relevant with the exception of two items. The GDO Study was designed to collect updated technical and baseline data using the ©2007 GDO as the basis of the investigation to renew the reliability and validity evidence for the instrument. In addition, the PQ and the TQ were developed to address the social, emotional, and adaptive development of the child.
Procedures for Reducing Bias
The five experts who reviewed the GDO for bias also evaluated content and age appropriateness for each task. The team was selected from the fields of early childhood education, special education, physical movement, and test development. Each reviewer was asked to respond to a set of specific questions, to identify any biases inherent in the content or methodology of the GDO. The bias review questions can be found in the full Technical Report (Gesell Institute of Child Development, 2012) at www.gesellinstitute.org.
Several GDO tasks raised issues due to a current lack of cultural relevancy. One item within the Interview task asked the child about his or her most recent birthday celebration and assessed the child’s ability to recall the presents he or she received. This question was omitted because it failed to allow for the breadth and complexity of birthday celebrations in different cultures and by families with lower socioeconomic circumstances. Another item in the Interview task prompts the child to name animals, presuming all children have had the experience of visiting a farm or zoo. This question was revised to “
Online User Survey
Prior to the commencement of the study, an online survey of GDO users was conducted to collect information on how customers use the GDO. A sample of
Focus Group
A focus group held at a Massachusetts preschool was moderated by a member of the Mid-Continent Research for Education and Learning (McREL) research team. The purpose of the focus group was to gather information on qualitative improvements to the GDO that would be most meaningful to examiners, in addition to the updated technical data, which were collected to renew the validity of the instrument.
TQ and PQ
It was recommended by all reviewers that the GDO-R include a measure of emotional regulation and social behavior by surveying the child’s teacher and parent or guardian. After reviewing the literature on emotional, social, and adaptive behaviors and examining multiple existing parent and teacher questionnaires, the TQ and PQ were developed for the GDO Study.
Procedures for Recruiting Examiners and Distributing Materials
Each site that enrolled in the study signed an agreement and secured parental consent forms for each participating child (Gesell Institute of Child Development, 2012).
Trained examiners administered all designated items on the GDO to children within the study age band (see Table 2) in individual assessment sessions. A standardized script guided the examiner in the administration of each task. Examiners in the study did not score or determine a Developmental Age. Data were collected from three sources at each school: the child (GDO tasks-GDO), the parent or guardian (PQ), and the child’s teacher (TQ).
Table 2. Chronological Ages of Children Included in Each Age Band.
All GDO assessment forms and study materials were provided to each school free of charge, including return shipping and handling. A Training DVD was given to each site to standardize the training for all examiners in the study. A conference call was held between each school’s examining team and the Gesell research team to review study protocol, answer questions, and offer support. Assistance was also supplied by phone, e-mail, and fax throughout the entire course of the schools’ participation in the study. Each site returned hardcopies of all data forms to Gesell Institute for review, validation, cleaning, and data entry.
Research Sample
The research sample was primarily a sample of convenience drawn from a national population of typically developing children attending schools that administered the GDO. A subset of schools from New Haven, CT, also participated in the study. Examiners for these schools were trained at Gesell Institute, as the schools did not currently utilize the GDO but wanted to participate in the study.
The final sample for analysis for the GDO Study included 1,287 children from 53 geographically diverse sites in 23 states. While the assessment is designed for ages 2½ to 9 years, the researchers chose to limit hands-on data collection to the group of children that comprised the largest number of users of the GDO. The chronological age threshold for each age band is described in Table 2. The age bands targeted for data collection were 3, 3½, 4, 4½, 5, 5½, and 6.
Site Sample
The sample included a diverse group of sites in terms of the type of school, region of the country, size, and population served (i.e., ethnicity and percentage eligible for free and reduced lunch). Refer to Figure 2 for sample distribution by state and Tables 3 to 5 for demographics of participating sites.

Figure 2. Sample distribution by state.
Table 3. Descriptive Statistics for Participating Sites: School Type.
Table 4. Descriptive Statistics for Participating Sites: U.S. Region.
Table 5. Descriptive Statistics for Participating Sites: Ethnicity and Socioeconomic Status.
Examiner Sample
One hundred and one trained GDO examiners, with a mean of 12 years of teaching experience, collected GDO data from children at preschools and elementary schools across the nation. The group of examiners, with a mean of 7 years of GDO experience, received GDO training in one of two ways: (1) in the last 5 years through a 3-day workshop or (2) through a 1-day training session at the Institute. Of the examiner sample, 75% were currently teaching and 25% reported that they were retired, no longer teaching in a classroom, or were volunteer examiners. Approximately 88% of the GDO Study examiners had a bachelor’s, master’s, or doctoral degree. Refer to Table 6 for the examiners’ level of education.
Table 6. Examiner Demographics: Level of Education.
Data Validation and Entry Procedures
Data were systematically reviewed for completion, accuracy, and any possible serious administration errors prior to being entered into an electronic SurveyGizmo file. Unusable data were put aside. Ten percent of the data entered in each of the data sets (GDO, TQ, and PQ) was checked by a team of interns, and any keystroke or scoring errors were corrected. A subsample of data collected in New Haven, CT, by a team of subcontractors was also checked for accuracy and reliability by members of the research team.
A total of 1,363 GDO assessments were submitted to Gesell Institute for evaluation. After careful review for accuracy, proper administration, and age requirements, a final sample of 1,287 GDO assessments was used in the final analysis. Thus, 76 assessments (about 5.6% of those submitted) could not be used and were removed from the original sample. Over the course of 2 years, communication with new sites about data collection/submission procedures improved significantly, which further reduced the number of invalid assessments (e.g., the child was too young or too old for the study) submitted by each site.
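The share of unusable assessments follows from the two counts given above. The following is only an arithmetic check; the variable names are illustrative and not part of the study.

```python
# Counts reported in the text above.
submitted = 1363  # GDO assessments submitted for evaluation
retained = 1287   # assessments kept for the final analysis

# Assessments set aside during review, and their share of all submissions.
excluded = submitted - retained
exclusion_rate = excluded / submitted

print(f"excluded assessments: {excluded}")
print(f"exclusion rate: {exclusion_rate:.1%}")
```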
Results
Sample Descriptive Statistics
The final sample used for analysis was a proportional mix of boys and girls in each age band (3.0-6.0 years) from ethnically diverse backgrounds. The number of children in each age band varied (see Table 7).
Table 7. Overview of Gesell Developmental Observation Study Sample Child Descriptive Statistics by Age Band.
Task Descriptive Statistics by Age Band
The
Descriptive Statistics for Writing Numbers Task by Age Band.
Number of Animals Named by Age Band.
Descriptive Statistics for Counting Task.
Descriptive Statistics for Pellets Task.
Frequency Distribution for Distinguishing Features: Pencil Stroke by Age Band (%).
Frequency Distribution for Distinguishing Features: Pencil Grasp by Age Band (%).
Some tasks/items have been organized into separate tables because they require different statistical operations to best demonstrate the findings. These are as follows:
Mean Scores of Social/Emotional/Adaptive Items.
Item p Values by Task and Age Band
Tables 8 through 29 provide results for the GDO-R items by task and age band. The tasks are described in order of test administration. Some item responses were missing when children did not provide an answer to an item or were not administered an item because they gave incorrect responses to the number of previous items that met the stop rule. Missing item responses were treated as incorrect for these analyses.
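The handling of missing responses described above can be illustrated with a minimal sketch. It assumes an item p value is the proportion of children in an age band who answered the item correctly, with missing responses kept in the denominator and counted as incorrect. The function name and the response data are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def item_p_value(responses: Sequence[Optional[bool]]) -> float:
    """Proportion of children who answered an item correctly.

    True = correct, False = incorrect, None = missing (no answer given,
    or item not administered because the stop rule was met).
    Missing responses stay in the denominator, so they count as incorrect.
    """
    if not responses:
        raise ValueError("at least one child is required")
    correct = sum(1 for r in responses if r is True)
    return correct / len(responses)


# Hypothetical responses from eight children in one age band.
responses = [True, True, False, None, True, None, False, True]
print(item_p_value(responses))  # 4 correct out of 8 children -> 0.5
```

Dropping the two missing responses instead would give 4 out of 6, so the treatment of missing data makes the reported p values more conservative.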
The
Solid Expectation (SE)—over 70% of the children could complete the task (dark gray shading)
Qualified Expectation (QE)—50% to 69.9% could complete the task (light gray shading)
Not Yet Expected—under 50% of the children could complete the task (no shading)
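The three bands above can be expressed as a simple threshold rule. The sketch below is illustrative only. The source describes the top band as "over 70%" and the middle band as "50% to 69.9%", which leaves a value of exactly 70% unspecified; the code assumes that 70% or more counts as a Solid Expectation. The function name and the percentages are hypothetical.

```python
def performance_level(percent_completing: float) -> str:
    """Map the percentage of children completing a task to a band label.

    Assumption (not stated in the source): exactly 70% is treated as
    a Solid Expectation, and exactly 50% as a Qualified Expectation.
    """
    if not 0.0 <= percent_completing <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if percent_completing >= 70.0:
        return "Solid Expectation"
    if percent_completing >= 50.0:
        return "Qualified Expectation"
    return "Not Yet Expected"


# Hypothetical percentages for one task across three age bands.
for age_band, pct in [("4", 38.0), ("4.5", 61.5), ("5", 82.0)]:
    print(age_band, pct, performance_level(pct))
```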
The Performance Level Expectations for each task reflect responses of a large group of children of the same age in the sample that were able to complete the task independently,
Content-Related Validity
Content-related validity is evidenced by uniformity between task content and the developmental milestones widely accepted to precede instructional content in each area. To ensure such correspondence for the GDO-R, Gesell Institute conducted a comprehensive review of current child development theory and met with education experts to determine common educational goals and the knowledge and skills emphasized in today’s early childhood curricula. The graphic design of the assessment and its manipulative materials reflect the types of activities found in early childhood classrooms and in children’s everyday lives. An online user survey provided additional information regarding overall assessment effectiveness (addressing such topics as the appropriateness of the criteria for developmental age, ease of administration, and appropriateness for each age). These validation efforts resulted in an assessment that reflects the needs of classroom teachers, children, and parents.
Inter-Rater Reliability
Four NLS members participated in the Qualitative Review Study. Three hold a master’s degree in Early Childhood and/or Child Development, and one a bachelor’s in Child Development. Collectively, the Qualitative Review Study team held over one hundred years of experience administering the GDO and conducting Gesell workshops on topics such as school readiness, parent involvement, and child development.
Inter-rater reliability of the GDO-Revised provides evidence regarding the degree to which Developmental Age can be reliably assigned. The inter-rater reliability study included a subsample of children’s performance on the Incomplete Man and Copy Forms tasks. Table 31 describes the sample used in the inter-rater reliability study. The sample for Incomplete Man was smaller than the sample for Copy Forms, because some children were rated as unable to score by one or both raters.
Table 31. Inter-Rater Reliability Study Sample.
Inter-rater reliability was calculated by comparing the developmental ages assigned by Rater A and Rater B within each team. During Phase 1, Rater A and Rater B of Team 1 rated Copy Forms, while Rater A and Rater B of Team 2 rated Incomplete Man. Inter-rater agreement for assigning overall Developmental Age was calculated for the Copy Forms and Incomplete Man samples. Inter-rater agreement was also calculated for each individual Copy Forms item in Phase 2.
During Phase 1, for both Incomplete Man and Copy Forms, neither team had access to the child’s chronological age; the raters used only the actual work samples and process sheets of the children in the sample. Inter-rater agreement on developmental age, as measured by the Pearson product moment correlation, was high for both Incomplete Man and Copy Forms (see Table 32). These high correlations provide evidence that developmental age can be reliably assigned by trained raters using the GDO-R.
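For readers unfamiliar with the statistic, the Pearson product moment correlation between two raters' Developmental Age assignments can be computed as sketched below. The ratings are hypothetical values in years and half years, not data from the study, and the function is a plain textbook implementation rather than the authors' analysis code.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Developmental Ages assigned to the same eight work samples.
rater_a = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 4.0]
rater_b = [3.0, 3.5, 4.5, 4.5, 5.0, 5.0, 6.0, 4.0]
print(round(pearson_r(rater_a, rater_b), 2))
```

A high correlation shows that the two raters order the children similarly. It does not by itself show that they assign identical ages.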
Table 32. Inter-Rater Agreement Evidence for Developmental Age.
In addition, each rater was asked to rank order
Finally, to examine the degree to which the Developmental Age assigned by raters corresponded to the children’s actual age (i.e., chronological age), the Pearson product moment correlations between Developmental Age and chronological age were calculated. Correlations were calculated separately for Rater A and Rater B. These correlations were high (range = .78-.82), and in the expected range, providing evidence that the assigned Developmental Ages corresponded closely, but not exactly, to children’s chronological age (see Table 32). Perfect correlations are not expected because of the variation in development between children.
Table 33 presents the Phase 2 results of inter-rater reliability for Copy Forms items. Raters used the same children’s work samples as were used in the examination of the reliability of Developmental Age (Phase 1). However, the team of raters that conducted the inter-rater reliability for Incomplete Man during Phase 1 conducted the inter-rater reliability for Copy Forms items in Phase 2, and vice versa. For item inter-rater reliability of individual Copy Forms items, raters also had access to children’s chronological age, because this is the standard scoring practice. Sample sizes varied by item, because some children were rated as unable to score by one or both raters.
Table 33. Inter-Rater Reliability for Copy Forms Items.
The results in Table 33 indicate strong correlation between raters for each Copy Form item. In addition, the means and standard deviations for Rater 1 and Rater 2’s scores are very similar. It is important to note that for Cube Face-on and Cube Point-on items, a proportionally large number of children were rated as unable to score by both raters. All children who could be scored received a score of 0, resulting in an inter-rater reliability of 1.00. Cube Face-on and Cube Point-on are some of the most difficult items in the entire GDO-Revised. Very few 6-year-old children in the entire study sample received a correct score on these items.
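The source does not state how the value of 1.00 was obtained for the two Cube items. One possible reading, offered here only as an assumption, is that it reflects exact agreement between raters, because a Pearson correlation is mathematically undefined when neither rater's scores vary. The sketch below uses hypothetical scores to show the difference; the function names are illustrative.

```python
import math
from typing import Optional, Sequence


def exact_agreement(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Proportion of children given the same score by both raters."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty lists of equal length")
    return sum(1 for a, b in zip(r1, r2) if a == b) / len(r1)


def pearson_or_none(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None when either list has zero variance."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined: division by zero in the formula
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical item scores: every scorable child received 0 from both raters.
rater_1 = [0, 0, 0, 0, 0]
rater_2 = [0, 0, 0, 0, 0]
print(exact_agreement(rater_1, rater_2))  # 1.0
print(pearson_or_none(rater_1, rater_2))  # None
```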
Limitations
The GDO Study contributes a comprehensive sample of child development data to the educational field at large. As with any study of its size and scope, it has limitations. The distribution of child ethnicity across the total sample more closely resembled the U.S. Census than did the distribution of child ethnicity in each age band. Thus, change across age bands could be attributable to a shift in the sample rather than to a definitive age shift. In the case of the PQ, the percentage of missing data for child’s ethnicity was strongly mitigated by efforts on the part of the school and research team to gather this information from other school records (as reported by parents). However, since the native language of the child was also derived from the PQ, efforts to collect accurate information on a child’s native language from the school were less fruitful and resulted in higher percentages of missing data across age bands. In some sites, it was not possible to administer the PQ due to the nature of testing at the site. For example, where GDO-R tests were part of the admission protocol in private schools, the schools did not administer the GDO Study PQ because it contained questions that could be perceived to affect a child’s eligibility for school acceptance (special evaluations, services, level of education of parent). In other sites, bussing of children in urban communities meant that parents did not physically come to the school to return a PQ, or parents may have been reluctant to share such information with the school administration.
While the examiners were trained carefully on the GDO-R task administration, they did not receive recording and coding rubrics to score the following observations of the child during all tasks in the assessment: Paper Position, Head Shift, Body Posture, Non-Dominant Hand Posture, and Eye Movement. This may explain why observational data on these items contain missing cases (examiners did not complete the section of the form). Thus, a shortcoming of this study is that these items cannot clearly be interpreted. However, the Qualitative Review Study and Inter-rater Reliability Study strongly confirm the developmental characteristics of each age band as related to the Copy Forms and Incomplete Man tasks. This is very important because it provides recent validity evidence for these specific developmental tasks and allows for continual improvements to the training of examiners.
Implications
The most valuable implication of this research is that the GDO-R has renewed reliability and validity evidence to support its continued use as a developmental instrument to evaluate the growth and development of children aged 3 to 6 years and to inform instruction for developmentally appropriate activities. The results from this study also support the findings for developmental tasks as originally published by Arnold Gesell (Gesell, 1925). Children are developing and reaching the major developmental milestones at about the same time as they did when Dr. Gesell first started collecting data over a century ago.
A few of the important implications of the research for educators nationwide include the following:
Perceiving oblique lines is a prerequisite to letter formation and writing, two essential expectations in the kindergarten curriculum of today. Building the Gate (Task 1: Cubes) and copying the Triangle (Task 4: Copy Forms) require that the child not only perceive the oblique angle of the cube or the form, but also be able to reproduce the structure in 3-D or on paper. The GDO Study documents that this developmental capacity is solid only by age 5 (Task 1: Cubes–Gate) and 5.5 (Task 4: Copy Forms–Triangle). Educators must be alert to variations in both chronological age and developmental level to properly balance the pace and sequence of daily learning activities for each child.
Children correctly identify letters in the alphabet in a graduated process that is affected by age, experience, and exposure to the printed word. As such, the average 4.5-year-old can successfully identify approximately 12 letters of the alphabet, while a year later, at 5.5, the average child can identify 21 to 22 letters. Educators who attempt to teach writing letters before the age of 5.5 (when most children can perceive and execute the oblique lines of letters) are doing their young students a disservice, which may result in a child internalizing failed attempts at writing before his or her developmental capacity for the task exists. Taking the time to understand how developmental level can be leveraged for teaching will benefit both children and teachers.
Educators who are able to recognize when a child is beginning to conserve 10 or more items will likely find that the child can also begin to succeed at simple calculations which have final answers less than 5 (beginning around 5.5 years and solid expectation by 6). Until a child can conserve item sets of 13 to 20, his or her success at calculations will likely remain the product of memorization or chance, as opposed to concepts of true numeracy.
Conclusion
The results of this study, based on a culturally and socioeconomically diverse sample of children 3 to 6 years of age in seven age bands, provide evidence that children’s performance on developmental and academic tasks, as measured by the GDO-R, follows a sequential progression of mastery that increases with age. In addition, the results provide evidence that not all children of the same chronological age arrive at each developmental level for the same tasks at the same time. Thus, there exists variation in performance on developmental and academic tasks between children of the same age. Future research should include a more intensive analysis of the data by weighting variables such as child ethnicity, geography, and socioeconomic level to pursue stability in the findings.
It is essential that educators, policy makers, and parents understand the significance of developmental level when setting standards for all children. Because children in kindergarten are at various chronological ages and develop at varying rates, having the same set of standards and expectations for
Utilizing standardized, performance-based instruments to understand a child’s developmental level, cultural and social influences, and individual interests allows for appropriate expectations, relevant goals for learning, and proper accountability in the educational system. Educators can utilize each child’s unique developmental profile to plan curriculum that respects the developmental level and potential of the child by using robust observational methods coupled with comprehensive developmental assessment tools.
The results of the GDO Study presented here strongly support the GDO-R as a reliable and valid developmental measurement tool, confirm the essential role that a child’s developmental level plays in his or her success for learning today, and suggest that having the same expectation for all children at the same time is inappropriate if not impossible.
