Abstract

Complex systems of inflectional morphology provide a useful testing ground for input-based language acquisition theories. Two analyses were performed on a high-density (12%) naturalistic sample of two Polish-English children’s (2;0 and 3;11) and their parents’ use of Polish noun inflection: first, each child’s use of inflectional affixes and their lexical restrictedness was compared with their father’s equalised sample. Second, the children’s spontaneous case-marking errors were analysed in context and measured against type and token frequencies in both parents’ data and the child-directed speech (CDS) corpus. Findings in both analyses accord with constructivist theory: near adult-like knowledge of Polish inflections hiding a range of use that is more lexically restricted than in their caregivers’ speech; low error rates hiding much higher ‘pockets of ignorance’ for specific inflectional contexts; and patterns of error that correspond closely to token/type frequencies in the CDS, though with the older sibling making some errors that were not frequency-based. Potential effects of syncretism, case ambiguity and semantics are also discussed.

Keywords

Polish, inflection, noun, productivity, error, input, constructivist, morphology, frequency, usage

Introduction

The constructivist view of language acquisition claims that children learn language through general cognitive mechanisms, rather than domain-specific generative rules. This involves the gradual accumulation of linguistic constructions via rote memory storage and formation of analogies based on similarity – semantic and phonological – to previously stored constructions. Precisely which constructions are learnt, when and how, is determined by frequencies and patterns in the input as experienced by the child in socially meaningful communication (Bybee, 2010; Goldberg, 1995; Langacker, 2000; Tomasello, 2009).

One sub-domain that has attracted much debate is inflectional morphology. This is because inflectional systems are particularly amenable to adjudicating between theories that are more generativist (i.e. formal abstract rules can explain and predict development) and constructivist (i.e. general learning by storage and analogy can explain and predict development) in principle (Ambridge & Rowland, 2013). As summarised by Ambridge and Lieven (2011), inflectional phenomena represent ‘an ideal test case for the wider debate over whether language – and its acquisition – is best characterised in terms of formal rules that act on variables or analogy across stored exemplars’ (p. 169). Perhaps the best-known example of such phenomena is the English past tense (e.g. Pinker & Ullman, 2002 vs McClelland & Patterson, 2002); however, the relative simplicity of this system, including the dichotomy of regular versus irregular verb forms (Herce, 2019), raises issues relating to representativeness of languages more widely. Morphology in other systems such as Finnish verbs (Räsänen et al., 2016), Jordanian Arabic nouns (Albirini, 2015), Lithuanian nouns (Savičiūtė et al., 2018) and Dinka numerals (Ladd et al., 2009) can be considerably more complex, idiosyncratic and unpredictable than English inflection – yet children come to acquire them. A relatively recent trend in the inflectional acquisition literature is the testing of constructivist-based hypotheses for highly inflected language systems. This trend is summarised by Engelmann et al. (2019) as follows:

Children generally make few errors in inflectional morphology, but error rates are significantly higher for rarer grammatical contexts (‘pockets of ignorance’).

When comparing two different inflected forms of the same lemma, forms that are low-frequency are more likely to attract error than those of higher frequency.

Many, and often most, errors are a result of (1) using a higher-frequency form; (2) a ‘near-miss’ inflectional form retaining one categorical property but not another, for example, a verb inflected with the correct number but incorrect person; and/or (3) producing an erroneous form according to a different inflectional class, for example, mouse → *mouses.

Lemmas which inflect according to a pattern shared by a high number of other lemmas, that is, of a high phonological neighbourhood density (PND) or type frequency, are less likely to attract error than those with fewer phonological neighbours from which to analogise.

The above PND effect may be weaker for inflected forms that have a higher token frequency; conversely, PND may have a stronger effect on accuracy for low-frequency inflected forms.

The effects of PND and token frequency on accuracy may decrease with age.

These patterns have been observed in various highly inflected languages, lending weight to the fundamental constructivist claims introduced above; once children have learnt a particular construction, it is not immediately applied consistently in all utterances, but is item-based, restricted in its application, and subject to erroneous use or omission.

Polish noun inflectional morphology

The present study investigates Polish nouns, which have only featured partially within the above literature (Dąbrowska & Szczerbiński, 2006; Dąbrowska & Tomasello, 2008; Granlund et al., 2019; Krajewski et al., 2012). The properties of Polish which make it an appropriate language for examining input-based theories are that it is a synthetic language with a particularly complex system of inflectional morphology: nouns inflect according to inherent properties of gender (masculine, feminine, neuter) and declension class (from 12 to 17 depending on theoretical distinction), and context-dependent properties of number (singular, plural) and case (nominative, accusative, genitive, dative, instrumental, locative, vocative – see Table 1) (Swan, 2002).

Table 1.

Summary of Polish nominal case functions (adapted from Dąbrowska & Szczerbiński, 2006, p. 562).

Case	Main use/function
Nominative(nom)	citation form (‘unmarked’ dictionary head-words); subject; addressing someone
Accusative(acc)	direct object of most verbs; after certain prepositions
Genitive(gen)	adnominal modifier (e.g. ‘of X’); object of negated verb; object of certain verbs; after certain prepositions
Dative(dat)	indirect object (e.g. recipient); experiencer; object of certain verbs; after certain prepositions
Instrumental(inst)	instrument / means of performing an action; object of certain verbs; after certain prepositions
Locative(loc)	after certain prepositions (usually of location)
Vocative(voc)	addressing someone (with emphasis)

Because there are two numbers and seven cases, any given noun lemma can have up to fourteen inflectional forms, which are themselves determined by a noun’s gender and declension class (with some irregularities). See Figure 1 for an example paradigm of the masculine (Class III) noun kubek (‘cup’).

Furthermore, Polish noun inflection comprises a high level of syncretism and case ambiguity; the former can be seen in Figure 1 in which three different case-number combinations share the same exponent kubki (nom:pl, acc:pl, voc:pl). The latter occurs where certain word forms can share inflectional patterns of another declension class, leading to false analogies and, therefore, error. See Figure 2 for an example of such case ambiguity, shown across three phonologically similar paradigms (kubek, ‘cup’; półka, ‘shelf’; łóżko, ‘bed’) which are actually of different genders and declension classes.

Figure 1.

Paradigm of Noun Kubek ‘Cup’ (All Case-Number Combinations).

Figure 2.

Phonological Commonalities Among Three Different Declension Classes (Superscript Number Indicates Each Commonality).

Rationale for the present study

Syncretism and case ambiguity thus add a layer of potential confusion to an already complex system for the child learner. Since the constructivist approach is concerned with elucidating the relationship between language usage and language learning, Polish is particularly suitable for examining this relationship; frequency distributions of inflection – including erroneous uses – can be observed in Polish children’s speech to test predictions derived from frequency distributions and systemic properties observed in wider Polish usage. Reference grammars and child-directed speech (CDS) corpora are typically used to represent Polish usage, though an issue can be that these are only a proxy for the child participant’s actual linguistic experience (e.g. Tatsumi et al., 2018), and, therefore, non-trivial differences between theoretical input and actual input may exist. Although some studies investigating the acquisition of morphology have used children’s caregiver speech as a more representative indicator of actual input (e.g. Kirjavainen et al., 2009; Tatsumi et al., 2021), the present study is one of a relative few which have carried out an equalised like-for-like analysis of child-versus-caregiver inflectional productivity (Aguado-Orea & Pine, 2015; Krajewski et al., 2012).

The aim of this study is to contribute to the body of evidence for input-based learning of inflectional morphology with a direct analysis of two siblings’ speech in relation to that of their parents. Two analyses were performed on the data collected: (1) observation of the children’s use of inflections and their relative lexical restrictedness as an estimate of productivity and (2) analysis of spontaneous errors and assessment of how well they can be accounted for by type/token frequencies in CDS or intrinsic properties of the Polish inflectional system.

The data sample

The children are two Polish-English simultaneous bilingual siblings (2;0 and 3;11 at the time of recording) living in the United Kingdom with their two parents. The father (Investigator and author) is an advanced L2 user of Polish, at level C1 on the Common European Framework of Reference for Languages (CEFR) scale; see Council of Europe (2022). The mother is a native Polish speaker (equivalent to C2+). The Investigator’s speech can be considered native-like in terms of his use of inflection (see Figure 3 for comparison with the mother) and near-native in his accuracy (21 out of 2733 noun tokens [0.8%] were used in the incorrect form). Both parents speak to the children in Polish at home as the dominant language, though the children are exposed to English via media, family and outside-of-home activity such as playgroups. The younger sibling, ‘Amelia’, attends part-time childcare with a Polish-speaking family, and the older sibling, ‘Seba’, attends English-medium preschool for approximately 18 hours per week. Therefore, the siblings receive approximately similar levels of Polish language input, though with a recent increase in the proportion of English input for Seba.

Figure 3.

Lexical Restrictedness in Use of Inflections in Mother’s (L1 Polish) and Father’s (Investigator, L2 Polish) Equalised Samples.

A high-density sample (12.3%) of the spontaneous speech of two children and their parents was recorded over 8 days (total recording time 9 h 50 min 28 s), calculated on the basis that the children were active for 10 hours a day during this period. This density is relatively high in comparison to the 7% sample in a similar Polish study, which uses the same 10-hour assumption in its calculation (Krajewski et al., 2012, p. 14), though their overall sampling period was longer, at 6 weeks. Recordings were made on an opportunistic basis by the author, that is, when the children happened to be speaking more as opposed to periods of relative quiet. Environments were primarily in the home (living room, bedrooms and bathroom), with some recordings made while playing in the garden or in the car. The predominant source of CDS was the author’s, with the mother’s featuring less – the Investigator (author) produced over seven times more noun tokens than the mother. However, the children’s general input outside of the sample is more evenly distributed across both parents. Transcription was carried out using the CLAN software and the CHAT transcription format (MacWhinney, 2000a, 2000b). During transcription, any errors involving Polish noun or verb inflection, or code mixing, were logged in an Excel spreadsheet within the complete utterance and with contextual information to enable error analysis. Assistance in transcription was provided by two adult native Polish speakers with degree-level education, and all noted errors were confirmed by another such individual. All Polish nouns and their disambiguated morphological form were extracted from the corpus.
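The sampling density quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name `sampling_density` is hypothetical, and the 10 active hours per day is the assumption stated in the text.

```python
from datetime import timedelta


def sampling_density(recorded, days, active_hours_per_day=10):
    """Return recorded time as a percentage of the children's assumed active time."""
    available = timedelta(hours=days * active_hours_per_day)
    # Dividing one timedelta by another yields a plain float ratio.
    return 100 * (recorded / available)


# Figures from the text: 9 h 50 min 28 s recorded over 8 days.
recorded = timedelta(hours=9, minutes=50, seconds=28)
density = sampling_density(recorded, days=8)
print(f"{density:.1f}%")
```

Changing `active_hours_per_day` shows how sensitive the density figure is to the assumed length of the children's active day.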

Amelia (2;0) produced 115 noun lemmas in 585 inflected forms, and Seba (3;11) produced 197 noun lemmas in 729 inflected forms. All CDS data for Analysis 1 were taken from the same corpus (author/father = 367 lemmas, 2733 inflected forms; mother = 110 lemmas, 382 inflected forms). For Analysis 2, type frequencies were also derived from this corpus, and token frequency data from the Polish CDS corpus available on CHILDES (Haman et al., n.d.).

Methodology

Analysis 1: productivity and lexical restrictedness

This analysis examines to what extent the children demonstrate productive knowledge of Polish noun inflections. The term ‘productivity’ has a range of definitions (see Barðdal, 2008, pp. 9–19); here, the productivity of a child’s knowledge of an inflectional schema is defined as the degree to which ‘a new lexical item [noun] can be accommodated [. . .] even if the child has not heard a given inflectional form of that noun before’ (Krajewski et al., 2012, p. 10). Once a given inflection has been learnt, constructivist models predict a more gradual process of development than generativist claims of rule-based, consistent, error-free and systematic inflection (see Räsänen et al., 2016, pp. 1706–1707 for a review). The child’s knowledge of a particular inflectional construction is piecemeal, guided by whichever stem-suffix combinations have been encountered in the input, and their relative frequencies. Children learning highly inflected languages such as Polish are likely to demonstrate a more gradual and more item-based developmental profile in their use of inflectional morphology than those learning language systems that have less extensive and/or more regular morphology (Krajewski et al., 2011, p. 834). Therefore, the hypothesis of Analysis 1 is as follows:

1. Each child’s range of inflections is more lexically restricted than that of an older child or adult, that is, the range of noun types (lemmas) to which the child applies inflections is more restricted than that of an adult control (same sample size, same lemmas).

We would expect to see a distribution of inflection such that more lemmas are used in one or two forms only, and fewer lemmas are used in three, four (or more) forms. Furthermore, owing to the age difference, a greater degree of lexical restriction would be expected in the younger child compared with the older child.

Counterevidence for this hypothesis would be to observe the same or similar distribution of inflectional forms, with the equalised adult speech sample showing a comparable degree of lexical restriction to the children’s speech samples.

Analysis 2: error patterns in relation to CDS

This analysis aims to identify the children’s errors and observe how well they can be explained by properties of the children’s input. For any given inflectional construction, constructivist approaches must take a number of key features into account, including frequency of use, semantics and formal systemic properties (Ellis et al., 2016). This error analysis is chiefly concerned with input frequency, though potentially extraneous variables such as semantic, pragmatic and grammatical context are considered where frequency alone cannot account for an error. The two measures of input frequency used here are token frequency, that is, the number of times a whole stem-suffix form occurs in CDS, and type frequency, that is, the number of different lexical items (types) to which a particular inflectional affix is applied (Bybee, 2013; Saxton, 2017, p. 225). As introduced earlier, constructivist theories claim that constructions that appear at a higher token frequency are stored more strongly in memory (more entrenched) and are more easily and readily retrieved than other constructions with lower token frequency (Ambridge et al., 2015; Bybee, 2013). As regards type frequency, constructivists claim that for items of higher type frequency, for example, those expressed with the same inflectional affix, a learner is more likely to notice such a pattern and make an analogy with it when she or he needs to use an inflectional form that cannot be retrieved from memory. The analogy is typically considered to be based on form (phonological and grammatical) as well as semantics (Bybee, 2013; Tomasello, 2009). Note: this analysis measures type frequency with respect to a given inflectional schema, that is, an affix with higher type frequency is one with which a greater number of lemmas can occupy its schematic slot, and is therefore a more likely candidate morpheme. (In contrast, PND is a related measure taken with respect to a given stem, that is, a stem with a higher PND value is one which has a greater number of other stems sharing formal features and therefore is more likely to follow the same pattern of inflection [Engelmann et al., 2019].) Implications of this decision in regard to interpretation of the results are raised in the Discussion.
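To make the two frequency measures concrete, the following minimal sketch counts them over a tiny invented sample. The data, the affix labels and the function names are all hypothetical and are not drawn from the study's corpus; the sketch only illustrates the definitions given above.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL toy sample of child-directed speech: (lemma, affix, word form).
cds = [
    ("woda", "-a", "woda"),
    ("woda", "-ę", "wodę"),
    ("woda", "-a", "woda"),
    ("mama", "-a", "mama"),
    ("mama", "-ę", "mamę"),
    ("półka", "-a", "półka"),
]


def token_frequency(corpus):
    """Number of times each whole stem-suffix form occurs."""
    return Counter(form for _, _, form in corpus)


def type_frequency(corpus):
    """Number of different lemmas to which each affix is applied."""
    lemmas_by_affix = defaultdict(set)
    for lemma, affix, _ in corpus:
        lemmas_by_affix[affix].add(lemma)
    return {affix: len(lemmas) for affix, lemmas in lemmas_by_affix.items()}


tokens = token_frequency(cds)
types = type_frequency(cds)
print(tokens["woda"], types["-a"], types["-ę"])
```

In this toy sample the form woda has a token frequency of 2, while the affix -a has a type frequency of 3 because it attaches to three different lemmas.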

Therefore, for Analysis 2 we can state three further hypotheses:

2. Children generally make few* errors in inflectional morphology, but for certain case-number combinations, error rates are significantly higher** (‘pockets of ignorance’).

*Taken by generativists as < 5% by their ‘most stringent acquisition standards’ (Hoekstra & Hyams, 1998, p. 84)

**10% or above, according to a review by Räsänen et al. (2016, p. 1718)

3. When comparing two different inflected forms of the same lemma, forms that are of low token frequency are more likely to attract error than those of higher token frequency.

4. Stem-affix constructions which inflect according to a pattern shared by a higher number of lemmas – that is, of a high type frequency – are less likely to attract error than inflectional schemas shared by fewer lemmas.

Due to the lack of control inherent in naturalistic sampling, there may be some confounding effects observed for some errors captured, such as a target noun form having a low token frequency (Hypothesis 3) and type frequency (Hypothesis 4). Any such instances will be discussed in the analysis.

Counterevidence for the above hypotheses would be (2) observation of low error rates across all case-number combinations, and the few errors captured would (3) appear ‘haphazard’ and show little or no relationship with token frequency, and (4) bear little or no relation to the type frequency of the inflectional form in which they occur.

Results

Analysis 1: Hypothesis 1 (productivity of inflectional forms)

To assess productivity, each child’s sample had to be equalised with the Investigator’s to remove any biasing effects of sample size, knowledge of lemmas, or inflections attested. This is because the Investigator’s original sample contained more tokens and types than the children’s, and therefore, he had more opportunities to demonstrate inflectional productivity. For each child, a filtered Investigator’s sample was produced by undertaking the following three controls: first, to ensure only known lemmas were considered, any not appearing in the child data were removed from the Investigator’s sample. Second, any remaining items with an inflectional affix not attested in the child’s sample were removed from the Investigator’s sample, and those lemmas for which only one token occurred were removed from the child sample, since any such token could have been rote-learnt without productive knowledge of its affix. Finally, to remove any potential biases resulting from Zipfian frequencies within any given paradigm, the two samples were equalised by taking each shared lemma, in turn, and removing items from the Investigator’s sample so that each lemma (regardless of inflectional form) appeared in both samples the same number of times. The resultant sample of Investigator’s nouns could then be directly compared with those of the child in a robust, like-for-like manner comparable to that used in related studies (Aguado-Orea & Pine, 2015; Krajewski et al., 2012).
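The three controls can be sketched in code as follows. This is one possible reading rather than the procedure actually used: the text does not say which Investigator tokens were removed in the final step, so the sketch assumes the earliest tokens are kept and that both samples are capped at the smaller count. The function name `equalise` and the toy data are hypothetical.

```python
from collections import Counter


def equalise(child, adult):
    """Apply the three controls to two lists of (lemma, affix) tokens."""
    child_lemmas = {lem for lem, _ in child}
    child_affixes = {aff for _, aff in child}

    # Control 1: keep only adult tokens of lemmas the child also produced.
    adult = [(lem, aff) for lem, aff in adult if lem in child_lemmas]

    # Control 2: drop adult tokens whose affix is unattested in the child
    # sample, and drop child lemmas that occur as a single token only.
    adult = [(lem, aff) for lem, aff in adult if aff in child_affixes]
    counts = Counter(lem for lem, _ in child)
    child = [(lem, aff) for lem, aff in child if counts[lem] > 1]

    # Control 3: give each shared lemma the same token count in both samples.
    # ASSUMPTION: keep the earliest tokens, capped at the smaller count.
    c_counts = Counter(lem for lem, _ in child)
    a_counts = Counter(lem for lem, _ in adult)
    quota = {lem: min(n, a_counts[lem]) for lem, n in c_counts.items() if lem in a_counts}

    def take(sample):
        seen, kept = Counter(), []
        for lem, aff in sample:
            if seen[lem] < quota.get(lem, 0):
                kept.append((lem, aff))
                seen[lem] += 1
        return kept

    return take(child), take(adult)


# HYPOTHETICAL toy samples, not data from the study.
child = [("woda", "-a"), ("woda", "-ę"), ("mama", "-a"),
         ("mama", "-a"), ("mama", "-ę"), ("kubek", "-0")]
adult = [("woda", "-a"), ("woda", "-y"), ("woda", "-ę"), ("woda", "-a"),
         ("mama", "-ę"), ("lis", "-a"), ("kubek", "-0")]
eq_child, eq_adult = equalise(child, adult)
print(eq_child)
print(eq_adult)
```

A random rather than earliest-first selection in the third control would be an equally defensible choice; the point of the sketch is only that both samples end up with identical per-lemma token counts.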

As noted earlier, the Mother’s sample was far smaller than that of the Investigator (Father); type:token ratio was 110:382 versus 367:2733. However, because the Investigator is an L2 Polish user, his sample was equalised with the Mother’s according to the above controls to check that there is no significant difference in his distribution and use of inflections. Figure 3 confirms that this is indeed the case, with a highly correlated profile, χ² = 2.60, p > 0.75.

The following results show each child’s set of specific inflectional affixes (distinguishing syncretism) and the type frequency of each, that is, the number of different lemmas with which she or he used each affix, compared with that of the Investigator. The mean number of affixes per lemma is also provided for each person. Then, each child’s distribution of paradigm size is represented by stacked column charts showing the proportion of lemmas which are attested in one inflected form, two inflected forms and so on. This provides an approximate profile of each child’s lexical restrictedness of inflection relative to the adult Investigator. For example, if one child’s sample contained the words woda (‘water’ nom:sg), wodę (acc:sg) and wody (gen:sg/nom:pl/acc:pl), then for the lemma woda the inflection score is three, that is, three inflectional forms have been attested. (The child may well know more forms, but this is the range which was shown in the controlled sample.)
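The inflection score and the paradigm-size distribution described above can be sketched as follows. The function names are hypothetical, and the token list simply reuses the woda example from the text plus one invented single-form lemma.

```python
from collections import Counter, defaultdict


def inflection_scores(tokens):
    """Map each lemma to the number of distinct inflected forms attested."""
    forms = defaultdict(set)
    for lemma, form in tokens:
        forms[lemma].add(form)
    return {lemma: len(attested) for lemma, attested in forms.items()}


def paradigm_size_distribution(scores):
    """Percentage of lemmas attested in one form, two forms, and so on."""
    sizes = Counter(scores.values())
    total = len(scores)
    return {size: 100 * n / total for size, n in sorted(sizes.items())}


# The woda example from the text, plus a HYPOTHETICAL one-form lemma.
sample = [("woda", "woda"), ("woda", "wodę"), ("woda", "wody"),
          ("woda", "woda"), ("mama", "mama")]
scores = inflection_scores(sample)
distribution = paradigm_size_distribution(scores)
print(scores, distribution)
```

Repeated tokens of the same form do not raise the score; only distinct forms count, which is why woda scores three despite four tokens.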

Amelia versus Investigator

Figure 4 shows that the two speakers’ distributions of inflections were strongly correlated: Spearman’s r = 0.80. Neither speaker used the full range of possible affixes, and both speakers used inflectional affixes not attested in the other’s sample (prior to necessary removal from the Investigator’s sample).

Figure 4.

Distribution of Type Frequencies (Lemmas) for Inflectional Affixes in Amelia’s Sample and Investigator’s Equalised Sample.

Figure 5 shows that, for the equalised set of 29 lemmas, Amelia used 76% of them in only one inflectional form (mostly nom:sg), compared with the Investigator’s 48%. The Investigator used more of the lemmas in two forms (28%), three forms (21%) and four forms (3%) than Amelia did (14%, 7% and 0%, respectively). The only exception to this trend is Amelia’s use of five forms for – somewhat unsurprisingly due to its high frequency in her input – the lemma mama (‘mother’), for which the Investigator used three forms. Although Amelia’s knowledge and distribution of inflections did correlate strongly with the Investigator’s (see Figure 4), Figure 5 shows that her usage of those inflections is significantly more lexically restricted than the Investigator’s: χ² = 10.24, p < 0.05.
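The text does not specify which chi-square variant was used. As a hedged illustration, the sketch below assumes a goodness-of-fit test that treats the Investigator's distribution as the expected counts, reconstructs lemma counts from the reported percentages of 29 lemmas, and leaves out Amelia's single five-form lemma because the Investigator's expected count for that category would be zero. These are assumptions for illustration, not a description of the original computation.

```python
def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


N_LEMMAS = 29
# Reported percentages of lemmas used in 1, 2, 3 and 4 forms.
amelia_pct = [76, 14, 7, 0]
investigator_pct = [48, 28, 21, 3]

# ASSUMPTION: counts recovered by rounding percentage x 29 lemmas.
amelia = [round(p * N_LEMMAS / 100) for p in amelia_pct]
investigator = [round(p * N_LEMMAS / 100) for p in investigator_pct]

statistic = chi_square_gof(amelia, investigator)
CRITICAL_DF3_P05 = 7.815  # standard chi-square critical value, df = 3, alpha = 0.05
print(amelia, investigator, round(statistic, 2), statistic > CRITICAL_DF3_P05)
```

Under these assumptions the statistic comes out at about 10.24, matching the reported value, and exceeds the critical value for three degrees of freedom.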

Figure 5.

Lexical Restrictedness in Use of Inflections in Amelia’s and Father’s Equalised Samples.

Seba versus Investigator

In Figure 6, Seba’s distribution of inflections was even more strongly correlated with the Investigator’s than Amelia’s distribution was: Spearman’s r = 0.93. As seen for Amelia, neither speaker used the full range of possible noun inflections, and both speakers used inflectional affixes not attested in the other’s sample, again prior to necessary removal from the Investigator’s sample.

Figure 6.

Distribution of Type Frequencies (Lemmas) for Inflectional Affixes in Seba’s Sample and Investigator’s Equalised Sample.

Figure 7 indicates a greater lexical restrictedness of inflections in Seba’s speech than the Investigator’s, using 64% of lemmas in one form only, compared with the Investigator’s 39%. Similarly to Amelia’s profile, his use of two, three and four inflections per lemma was less widespread than the Investigator’s use (25% < 39%, 6% < 11% and 6% < 11%, respectively). This observed difference between Seba and the Investigator, as with Amelia, is significant: χ² = 9.57, p < 0.05.

Figure 7.

Lexical Restrictedness in Use of Inflections in Seba’s and Father’s Equalised Samples.

First, the data provide strong evidence that the level of productivity of Polish noun inflections for both children is partial in comparison to their adult caregiver. Second, Amelia’s use of inflections is even more restricted in comparison with the Investigator than Seba’s, whose productivity appears marginally more adult-like, yet still significantly lexically restricted. Furthermore, the fact that the children were assumed to know syncretic forms (which were disambiguated and counted as separate items) means that their productivity is almost certainly less than that described in this analysis.

In terms of the particular inflectional affixes captured in the data, both children used most of the range available in Polish, which accords well with 2-year-old monolingual Marysia in Krajewski et al. (2012, pp. 16–17), and with type frequency distributions that correlate well with the Investigator. However, it is the restrictedness in the lexical range to which these inflections are applied, which lends particular support to the constructivist claim of gradual development from piecemeal, item-based inflection (involving retrieval of unanalysed whole forms) to broader adult-like production of inflectional forms by way of analogy (in addition to use of retrieval where appropriate). Where Amelia and Seba may be following these two processes in their use of inflection is investigated in the subsequent analysis.

Analysis 2: Hypothesis 2 (error rates)

This analysis is a post hoc examination of the children’s captured errors in inflectional noun forms in relation to type and token frequencies in CDS. For the type frequency analyses (see Hypothesis 4), CDS frequencies were derived from the parents’ combined speech samples, thereby providing an authentic representation of the actual type frequency distribution in the children’s input. However, for the token frequency analyses (see Hypothesis 3), the Polish CDS corpus (Haman et al., n.d.) was required as a proxy due to its greater size and therefore increased reliability for less common tokens.

An overview of the error rates is presented in Table 2. The error rate for each case-number context is calculated using the following formula, derived from a standard error-rate assessment method (Rowland et al., 2008, p. 8):

$\text{Error rate of context } C = \frac{\text{frequency of erroneous tokens used in context } C}{\text{frequency of all tokens used in context } C \text{ (target and error)}} \times 100$

where ‘Context C’ refers to any semantic or syntactic context that requires a given case-number combination, and ‘tokens’ in this study are specifically Polish nouns.
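As a worked instance of this calculation, take Seba’s gen:pl context as reported in Table 2 (17 tokens, of which 9 were errors):

$\text{Error rate of gen:pl} = \frac{9}{17} \times 100 \approx 52.9$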

Table 2.

Nominal inflectional error rates for Amelia and Seba.

Context of use	Amelia: no. tokens	Amelia: of which errors	Amelia: error rate (%)	Seba: no. tokens	Seba: of which errors	Seba: error rate (%)
nom:sg	382	3	0.8	362	1	0.3
acc:sg	68	2	2.9	106	10	9.4
gen:sg	43	4	9.3	76	11	14.5
nom:pl	30	1	3.3	52	2	3.8
acc:pl	25	–	0	43	2	4.7
loc:sg	14	1	7.1	36	5	13.9
inst:sg	11	–	0	22	7	31.8
gen:pl	2	2	100	17	9	52.9
dat:sg	5	–	0	6	1	16.7
inst:pl	2	–	0	6	–	0
voc:sg	3	–	0	1	–	0
loc:pl	0	–	N/A	2	2	100
dat:pl	0	–	N/A	0	–	N/A
Total	585	13	2.2	729	50	6.9

The children’s overall error rates are low (Amelia = 2.2%; Seba = 6.9%), according to the standards referred to in the introduction. However, both children showed overall error rates higher than that of the 2-year-old monolingual child (0.64%) in Krajewski et al. (2012), suggesting a possible effect of bilingualism on accuracy. Comparing the children with one another, the reason why Seba’s error rate is over three times that of Amelia may be his age-related larger range of lexical items and of phrasal and sentential constructions with which to attempt inflection and potentially err. In spite of low overall error rates, as predicted, there are pockets of significantly higher error rate for certain case-number combinations such as gen:pl (Seba 52.9%), inst:sg (Seba 31.8%), gen:sg (Amelia 9.3%) and loc:sg (Seba 13.9%) – though this tendency was more notable for Seba than Amelia. Some case-number combinations were used only on rare occasions (i.e. fewer than ten tokens – italicised in Table 2) and so are less revealing when taken in isolation. These findings of ‘pockets of ignorance’ provide support for constructivist theories that the children’s speech is not as error-free as generativist accounts would predict.
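The search for ‘pockets of ignorance’ in Table 2 can be sketched as a simple filter. The token and error counts below are copied from Table 2; the function names and the two cut-offs are assumptions for illustration (a 10% error rate, following the figure cited from Räsänen et al., and a minimum of ten tokens, following the caution about rarely used contexts).

```python
# (tokens, errors) per case-number context, copied from Table 2.
TABLE_2 = {
    "Amelia": {"nom:sg": (382, 3), "acc:sg": (68, 2), "gen:sg": (43, 4),
               "nom:pl": (30, 1), "acc:pl": (25, 0), "loc:sg": (14, 1),
               "inst:sg": (11, 0), "gen:pl": (2, 2), "dat:sg": (5, 0),
               "inst:pl": (2, 0), "voc:sg": (3, 0), "loc:pl": (0, 0),
               "dat:pl": (0, 0)},
    "Seba": {"nom:sg": (362, 1), "acc:sg": (106, 10), "gen:sg": (76, 11),
             "nom:pl": (52, 2), "acc:pl": (43, 2), "loc:sg": (36, 5),
             "inst:sg": (22, 7), "gen:pl": (17, 9), "dat:sg": (6, 1),
             "inst:pl": (6, 0), "voc:sg": (1, 0), "loc:pl": (2, 2),
             "dat:pl": (0, 0)},
}


def error_rate(tokens, errors):
    """Percentage of erroneous tokens; None when the context is unattested."""
    return 100 * errors / tokens if tokens else None


def overall_rate(contexts):
    """Error rate pooled over all case-number contexts."""
    tokens = sum(t for t, _ in contexts.values())
    errors = sum(e for _, e in contexts.values())
    return error_rate(tokens, errors)


def pockets(contexts, min_rate=10.0, min_tokens=10):
    """Contexts with enough tokens whose error rate reaches the threshold."""
    return [ctx for ctx, (tokens, errors) in contexts.items()
            if tokens >= min_tokens and error_rate(tokens, errors) >= min_rate]


for child, contexts in TABLE_2.items():
    print(child, round(overall_rate(contexts), 1), pockets(contexts))
```

With these cut-offs, four of Seba's contexts qualify, whereas Amelia's gen:sg rate of 9.3% falls just below the 10% line, so the outcome for Amelia depends on where the threshold is set.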

Analysis 2: Hypothesis 3 (Error analysis against input token frequency)

This part of the analysis examines two separate constructivist claims about token frequency. The first one considers each child’s errors in comparison with their accurate use of other forms of the same lemma. The relevant claim, as summarised by Engelmann et al. (2019), is that ‘[e]rror rates are more common for individual inflected forms with low token frequency’ (p. 31). To test this, a comparison has to be made between the input token frequency of a form each child failed to produce when required and that of other forms of the same lemma which she or he did produce correctly at other times.

Input frequencies of the different forms of lemmas captured in Amelia’s speech and Seba’s speech are presented in Figures 8 and 9, respectively. The black bars represent each error (the ‘missed’ target form) and the white bars represent other forms Amelia / Seba used accurately. Input token frequency of each form (not distinguishing syncretisms) is derived from the CDS corpus frequencies (Haman et al., n.d.). An asterisk indicates that the missed target form was accurately produced at other times (denominator = number of accurate uses).

Figure 8.

Input Token Frequency of Amelia’s Missed Targets (Black) and Correctly Used Forms (White).

Figure 9.

Input Token Frequency of Seba’s Missed Targets (Black) and Correctly Used Forms (White).

There is a clear pattern in both children’s profiles that conforms to the claim by Engelmann et al. (2019, p. 31): the forms that were used accurately are encountered more often in the input than the target forms which the children failed to produce. This supports the claim that a child is more likely to make an error for specific inflected forms that have low token frequency (e.g. indyka, ‘turkey’ acc/gen:sg) than those forms of the same noun which have higher frequencies (e.g. indyk, nom:sg).

The two anomalies in Amelia’s profile have context-based explanations: the first, malinkami (‘raspberries’, inst:pl), is probably a result of the difference in values between the CDS corpus and the sample of the children’s own parents (father/Investigator and mother). In the latter sample, Amelia heard the precise form malinkami five times, whereas it was never captured in the entire CDS corpus; in Amelia’s home environment, raspberries frequently feature, and are therefore often referred to. Consequently, the form malinkami is probably more entrenched in her mind than the CDS proxy suggests, especially in comparison to the missed target malinek (gen:pl). The second anomaly concerns the partial error łyżka (‘spoon’, nom:sg), which, having been produced accurately on four other occasions, is treated with greater caution as a possible performance-based slip.

Of the four anomalies in Seba’s profile, two have context-based explanations in which the target form was articulated unclearly: książki (‘books’, nom:pl) was articulated somewhere between książki (target) and książkę (acc:sg), and ręce (‘arms’, nom:pl) was articulated as *ręca (non-adult) with correct ręce produced on four other occasions. More significantly, Seba’s two other anomalies are suggestive of his move away from lexically restricted inflection, as seen in Analysis 1: pingwina (‘penguin’, gen:sg) and koloru (‘colour’, gen:sg) – he produced *pingwinu and *kolora – are archetypal examples of the gen:sg -a versus gen:sg -u error made by children aged 10 years and above (Dąbrowska, 2001). Seba appears to be still learning the distinction, yet has not defaulted to the more frequent forms pingwin and kolor. This is evidence that Seba is not always choosing forms based purely on token frequency, unlike Amelia.

The second part of this analysis addresses the constructivist claim that ‘a large proportion of errors of commission involve the replacement of low-frequency target forms with a higher-frequency form of the same verb [or noun]’ (Engelmann et al., 2019, p. 32). In Figures 10 (Amelia) and 11 (Seba), the input frequency of each missed target form (black) – what she or he should have produced – is compared with that of the erroneous form she or he actually produced (grey). In this case, the frequency of each form is given as a percentage relative to the total frequency of all paradigmatic forms of that lemma. The default form (taken to mean the most frequent paradigmatic form) is indicated with a ‘D’. A spotted bar is added to show the frequency of other inflectional forms where needed for comparison, for example, when neither the erroneous use nor the missed target are the default form.
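The relative-frequency measure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function names `relative_frequencies` and `default_form` are hypothetical, and the counts are invented placeholders, not values from the study’s corpus.

```python
def relative_frequencies(form_counts):
    """Return each form's share (in %) of the lemma's total token count."""
    total = sum(form_counts.values())
    if total == 0:
        raise ValueError("lemma has no attested forms")
    return {form: 100.0 * n / total for form, n in form_counts.items()}


def default_form(form_counts):
    """The 'default' is taken to be the most frequent paradigmatic form."""
    return max(form_counts, key=form_counts.get)


# Invented placeholder counts for the lemma 'indyk' (NOT data from the study).
indyk = {"indyk": 12, "indyka": 3, "indyki": 5}

shares = relative_frequencies(indyk)
print(default_form(indyk))         # indyk
print(round(shares["indyka"], 1))  # 15.0
```

Under these assumed counts, an error that replaces indyka with indyk would substitute a form holding 15% of the lemma’s tokens with the default form, which is the pattern the constructivist claim predicts.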

Figure 10.

Input Token Frequency of Amelia’s Erroneous Uses (Grey) and Missed Target Forms (Black).

Figure 11.

Input Token Frequency of Seba’s Erroneous Uses (Grey) and Missed Target Forms (Black).

Both children’s error profiles follow the same general trend, whereby lower token-frequency forms are replaced with competing higher token-frequency forms of the same lemma, as predicted. Furthermore, in many cases (5/7 for Amelia; 6/9 for Seba), the erroneous choice was also the default form of the lemma. Of those errors that appear to go against the constructivist claim, that is, the specific form used is less frequent than the target form and/or is not the default form, both of Amelia’s (łyżkę, ‘spoon’, acc:sg, and woda, ‘water’, nom:sg) are partial errors (1/4 and 1/5 errors per use, respectively), as seen in Figure 8; it is therefore reasonable to again treat these same anomalies as potential performance slips.

In contrast, Seba’s three anomalies, to the left of Figure 11, are not partial. For these errors, Seba has not opted for the default form, even when its token frequency is considerably greater than that of his erroneous choice. The errors łóżek (‘beds’, gen:pl), lisów (‘foxes’, gen:pl) and mleku (‘milk’, loc:sg) appear to have been driven by one or more influences stronger than the token frequency of a competing form. The error łóżek may be a result of class ambiguity, in that the two highest-frequency forms łóżka (gen:sg, nom:pl and acc:pl) and łóżku (dat:sg and loc:sg) may have misled Seba into assuming that łóżek is the uninflected form required by the nominative context in which he used it (a logical conclusion based on the premise of łóżka and łóżku). This is evidence, albeit limited, that the acquisition process may be influenced by formal ambiguities between different declension classes in addition to frequency-based effects, though further experimental studies are necessary to fully examine the strength of this claim.

Analysis 2: Hypothesis 4 (error analysis against input type frequency)

This analysis observes the extent to which the children’s errors correspond with type frequency of inflectional suffixes across all paradigms. Engelmann et al. (2019) summarise this phenomenon thus: ‘Errors are less common for verbs [or nouns] that score high on [. . .] type frequency’ (p. 32). Input type frequency, as defined in the introduction, is derived from the CDS of both Investigator and mother in this study’s corpus; the CHILDES corpus was not used because its data do not disambiguate syncretic forms. In Figures 12 (Amelia) and 13 (Seba), the input type frequency of each erroneously used inflectional suffix (grey) and that of its corresponding target inflection (black) is compared. Non-adult overgeneralisations are indicated with an asterisk.
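As a rough illustration of the type-frequency measure, the sketch below counts how many distinct lemmas are attested with each suffix and compares a target suffix with an erroneous one. It assumes that type frequency can be approximated as the number of distinct lemmas per suffix; the study’s own definition is given in its introduction and may differ in detail. The names `type_frequencies` and `type_ratio` are hypothetical, and the token list is a toy sample, not corpus data.

```python
from collections import defaultdict


def type_frequencies(tokens):
    """Count distinct lemmas (types) attested with each inflectional suffix.

    `tokens` is an iterable of (lemma, suffix) pairs, one pair per noun token.
    """
    lemmas_by_suffix = defaultdict(set)
    for lemma, suffix in tokens:
        lemmas_by_suffix[suffix].add(lemma)
    return {suffix: len(lemmas) for suffix, lemmas in lemmas_by_suffix.items()}


def type_ratio(freqs, target_suffix, error_suffix):
    """Type frequency of the erroneous suffix relative to the target suffix."""
    return freqs[error_suffix] / freqs[target_suffix]


# Toy gen:sg tokens (assumed for illustration only, NOT data from the study).
tokens = [
    ("pingwin", "-a"), ("kot", "-a"), ("kot", "-a"), ("pies", "-a"),
    ("kolor", "-u"), ("dom", "-u"), ("dom", "-u"),
]

freqs = type_frequencies(tokens)
print(freqs)                          # {'-a': 3, '-u': 2}
print(type_ratio(freqs, "-u", "-a"))  # 1.5
```

Repeated tokens of the same lemma do not raise the count, which is what separates type frequency from the token frequency examined in the previous analysis.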

Figure 12.

Input Type Frequency of Inflectional Affix: Amelia’s Erroneous Uses (Grey) and Missed Target Inflection (Black).

Figure 13.

Input Type Frequency of Inflectional Affix: Seba’s Erroneous Uses (Grey) and Missed Target Inflection (Black): Errors 1–9.

The above-mentioned findings are best viewed in conjunction with the previous analysis of token frequency effects, as both variables confound in certain cases, that is, where both frequency measures may predict production of the same erroneous form. For Amelia, the type frequency of six out of eight erroneously used inflections was higher than that of the missed target inflection (by ratios of approximately 1:6–1:9 for four of them, though only narrowly over 1:1 for the other two). This accords well with the constructivist prediction. However, five of the six errors could also be attributed to input token frequency (see Figure 12). Of the two anomalies, one is the same partial error (łyżkę) noted earlier as potentially accidental, and the other (słońce, ‘sun’, nom/acc:sg) is a notable exception, discussed at the end of this section.

For Seba, the same pattern was observed. The type frequency of 12 out of his 18 inflectional errors was higher than that of the missed target inflection (by ratios of 1:2–1:23 for eight of them, though only narrowly over 1:1 for the other four) (Figures 13 and 14). Unlike Amelia’s, most of Seba’s errors (10 out of the 12) cannot be attributed to input token frequency; eight of them were non-adult overgeneralisations (with token frequency of zero), and the other two (łóżek, ‘beds’, gen:pl, and lisów, ‘foxes’, gen:pl) were adult forms that went against the token frequency trend in Figure 11. Of the remaining six errors that do not correspond with higher type frequency, only two correlate with high relative token frequency; the other four anomalies may be explained by local grammatical and contextual reasons, though this remains speculative.

Figure 14.

Input Type Frequency of Inflectional Affix: Seba’s Erroneous Uses (Grey) and Missed Target Inflection (Black): Errors 10–18.

For both children, the errors for which the predictors of token and type frequency do not confound are the most telling about differences between the children’s use of inflection and its correspondence to the input. For Amelia, there are two such errors: her use of słońce (referred to above) corresponds to token frequency and not type frequency, yet elsewhere the non-adult *jabłki (‘apple’, nom/acc:pl) corresponds to type frequency. Therefore, there is some evidence that Amelia has begun isolating inflectional morphemes (at least -ki attested in jabłki). For Seba, there is greater evidence for productivity in his relevant errors: of the 15 such instances, 12 are strongly indicative of type frequency-driven inflection (10 non-adult forms with zero token frequency; two errors łóżek and lisów against the trend noted in Figure 11). Only two out of Seba’s relevant errors (spodnie, ‘shorts’ nom/acc:pl, and kubek, ‘cup’ nom/acc:sg) correspond with token and not type frequency, for which the type frequency of the missed target form was 7.0 and 6.4 times higher than that of the erroneous use, respectively, and therefore may counter the conflicting effects of type frequency.
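The distinction drawn here, between errors where token and type frequency predict the same form and errors where only one of them does, can be made explicit with a small classifier. This is a sketch under stated assumptions: the function `classify_error` and its labels are hypothetical, and the numbers in the example are invented rather than taken from the figures.

```python
def classify_error(token_target, token_error, type_target, type_error):
    """Label which frequency measure(s) favour the erroneous form over the target."""
    by_token = token_error > token_target  # erroneous form heard more often
    by_type = type_error > type_target     # erroneous suffix used with more lemmas
    if by_token and by_type:
        return "confounded"
    if by_token:
        return "token only"
    if by_type:
        return "type only"
    return "neither"


# Hypothetical values: a non-adult form never heard in the input (token
# frequency 0), built with a suffix of higher type frequency than the target's.
label = classify_error(token_target=4, token_error=0, type_target=5, type_error=40)
print(label)  # type only
```

Only errors labelled "token only" or "type only" bear on the contrast between the two children; "confounded" errors are compatible with either explanation.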

Discussion

The findings of this study support all four constructivist hypotheses stated in the introduction. First, Analysis 1 demonstrates that while both children used most Polish noun inflections similarly to Polish monolingual child speech data (Krajewski et al., 2012), their range of use of those inflections was more lexically restricted than that of their caregiver, even with equalised lemmas and sample size. Those lemmas for which the children exhibited a greater size of paradigm were those which were more common in the input, thus supporting the theory of a gradual item-based nature by which children learn and use inflections earlier in development, as opposed to rapid learning of an inflection used with adult-like productivity.

Second, Analysis 2 demonstrates the existence of ‘pockets of ignorance’ among otherwise low rates of inflectional error (Rubino & Pine, 1998), a finding which complements the lexical restrictedness seen in Analysis 1. Third, all constructivist-based predictions about inflectional errors (summarised in Engelmann et al., 2019, pp. 31–33) were observed in the children’s data: all errors either (1) involved a target of lower token frequency; (2) involved replacement by a more frequent cell from the same paradigm; (3) were ‘near-miss’, deviating from the target form by one property (number, case or declension class, for this study) and/or (4) represented overgeneralisation of an inflectional construction to produce a non-adult form. Fourth, where predicted effects of token and type frequency do not confound, that is, they predict different erroneous inflected forms, Amelia (2;0) produced forms with higher token frequency, and Seba (3;11) produced forms with higher type frequency, indicating an age-dependent difference in children’s use of inflection whereby younger children reproduce whole stem-suffix inflected forms and older children analyse and use inflectional suffixes in a productive manner when a particular stem-suffix combination has not been retrieved from memory. This is further evidenced by Seba’s larger number of non-adult overgeneralisations than Amelia’s (relative to individual sample size).

Although the above conclusions are supportive of broad constructivist and input-based theories, the spontaneous nature of naturalistic data necessitates a degree of caution. This study has focussed on frequency as the primary determinant, with ad hoc contextual explanations of some errors resulting (arguably) from properties of the Polish inflectional system. Further factors such as semantics of the stem and function of the case-affix could not be controlled or entirely accounted for in this study, though for the most part any such effects appear to have been masked by token and type frequency for the spontaneous errors captured.

Another limitation is that PND will have been an additional factor in the children’s use of inflection, especially Seba’s. Analysis 2 considered the effects of type frequency – understood as the number of candidate nouns that can occupy a schematic inflectional construction (e.g. masculine loc:sg construction ____-ku when inflecting kubek ‘cup’) – and not PND, as the children’s means of analogising inflectional forms. Of course, children can and do analogise inflectional forms based on the properties of the noun stem they wish to use and other similar known inflectional forms (e.g. the number of known items phonologically similar to kubek that inflect in the same way, such as czubek ‘tip’ nom:sg → czubku loc:sg). As demonstrated in Figure 2, a related factor that has been relatively underexamined is that of phonological enemies that can lead the child astray, such as the following:

kub-ku loc:sg → kub-ek nom:sg

but łóż-ku ‘bed’ loc:sg → łóż-ko nom:sg and not łóż-ek

Indeed, Seba made this precise error in his speech sample. As discussed earlier, the Polish inflectional system exhibits a high degree of syncretism and class ambiguity, and errors similar to the above can be frequent at certain ages. Experimental investigation using carefully selected test items is required to examine the effects of both type frequency (as evidenced in this study) and PND in a controlled manner.

In conclusion, this study provides strong support for the constructivist theory that a complex morphological system such as Polish noun inflection is acquired by children through general learning mechanisms of rote storage, and analogy based on phonological form, in a manner which is gradual, susceptible to principled error, and intimately linked to patterns and properties of the specific linguistic input as experienced by the child.

Supplemental Material

Supplemental material, sj-docx-1-fla-10.1177_01427237221123695 for Acquiring Polish noun inflection: Two children’s productivity and error patterns in relation to parental input by David Price-Williams and Matt Davies in First Language

Footnotes

Ethical approval

Ethical approval was granted by the University of Chester prior to the research.

ORCID iD

David Price-Williams

Supplemental material

Supplemental material for this article is available online.

References

Aguado-Orea, & Pine (2015). Comparing different models of the development of verb inflection in early child Spanish. PLOS ONE, 10(3), Article e0119613. https://doi.org/10.1371/journal.pone.0119613

Albirini (2015). Factors affecting the acquisition of plural morphology in Jordanian Arabic. Journal of Child Language, 42, 734–762. https://doi.org/10.1017/S0305000914000270

Ambridge, Kidd, Rowland, & Theakston (2015). The ubiquity of frequency effects in first language acquisition. Journal of Child Language, 42(2), 239–273. https://doi.org/10.1017/S030500091400049X

Ambridge, & Lieven, E. V. M. (2011). Child language acquisition: Contrasting theoretical approaches. Cambridge University Press.

Ambridge, & Rowland, C. F. (2013). Experimental methods in studying child language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 4(2), 149–168. https://doi.org/10.1002/wcs.1215

Barðdal (2008). Productivity: Evidence from case and argument structure in Icelandic. https://ebookcentral.proquest.com/lib/uocuk/home.action

Bybee, J. L. (2010). Language, usage and cognition. Cambridge University Press.

Bybee, J. L. (2013). Usage-based theory and exemplar representations of constructions. In Hoffmann, & Trousdale (Eds.), The Oxford handbook of construction grammar (pp. 49–69). Oxford University Press.

Council of Europe. (2022). Global scale – Table 1 (CEFR 3.3): Common reference levels. https://www.coe.int/en/web/common-european-framework-reference-languages/table-1-cefr-3.3-common-reference-levels-global-scale

Dąbrowska (2001). Learning a morphological system without a default: The Polish genitive. Journal of Child Language, 28, 545–574. https://doi.org/10.1017/S0305000901004767

Dąbrowska, & Szczerbiński (2006). Polish children’s productivity with case marking: The role of regularity, type frequency, and phonological diversity. Journal of Child Language, 33, 559–597. https://doi.org/10.1017/S0305000906007471

Dąbrowska, & Tomasello (2008). Rapid learning of an abstract language-specific category: Polish children’s acquisition of the instrumental construction. Journal of Child Language, 35, 533–558. https://doi.org/10.1017/S0305000908008660

Ellis, N. C., Römer, & O’Donnell, M. B. (2016). Usage-based approaches to language acquisition and processing: Cognitive and corpus investigations of construction grammar. Wiley.

Engelmann, Granlund, Kolak, Szreder, Ambridge, Pine, . . . Lieven (2019). How the input shapes the acquisition of verb morphology: Elicited production and computational modelling in two highly inflected languages. Cognitive Psychology, 110, 30–69. https://doi.org/10.1016/j.cogpsych.2019.02.001

Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. University of Chicago Press.

Granlund, Kolak, Vihman, Engelmann, Lieven, E. V. M., Pine, J. M., . . . Ambridge (2019). Language-general and language-specific phenomena in the acquisition of inflectional noun morphology: A cross-linguistic elicited-production study of Polish, Finnish and Estonian. Journal of Memory and Language, 107, 169–194. https://doi.org/10.1016/j.jml.2019.04.004

Haman, Etenowski, & Łuniewska (n.d.). The Polish frequency list of child-directed speech. Child Language Data Exchange System. https://childes.talkbank.org/access/Slavic/Polish/Polish-CDS.html

Herce (2019). Deconstructing (ir)regularity. Studies in Language, 43(1), 44–91. https://doi.org/10.1075/sl.17042.her

Hoekstra, & Hyams (1998). Aspects of root infinitives. Lingua, 106(1–4), 81–112. https://doi.org/10.1016/S0024-3841(98)00030-8

Kirjavainen, Theakston, & Lieven (2009). Can input explain children’s me-for-I errors? Journal of Child Language, 36(5), 1091–1114. https://doi.org/10.1017/S0305000909009350

Krajewski, Lieven, E. V. M., & Theakston, A. L. (2012). Productivity of a Polish child’s inflectional noun morphology: A naturalistic study. Morphology, 22, 9–34. https://doi.org/10.1007/s11525-011-9199-0

Krajewski, Lieven, E. V. M., Theakston, A. L., & Tomasello (2011). How Polish children switch from one case to another when using novel nouns: Challenges for models of inflectional morphology. Language and Cognitive Processes, 26(4–6), 830–861. https://doi.org/10.1080/01690965.2010.506062

Ladd, D. R., Remijsen, & Manyang, C. A. (2009). On the distinction between regular and irregular morphology: Evidence from Dinka. Language, 85(3), 659–670. https://doi.org/10.1353/lan.0.0136

Langacker, R. W. (2000). Grammar and conceptualisation. Mouton de Gruyter.

MacWhinney (2000a). Tools for analyzing talk: The CHAT transcription format. https://doi.org/10.21415/3mhn-0z89

MacWhinney (2000b). Tools for analyzing talk: The CLAN program. https://doi.org/10.21415/T5G10R

McClelland, J. L., & Patterson (2002). ‘Words or rules’ cannot exploit the regularity in exceptions: Reply to Pinker and Ullman. Trends in Cognitive Sciences, 6(11), 464–465. https://doi.org/10.1016/S1364-6613(02)02012-0

Pinker, & Ullman, M. T. (2002). The past-tense debate: The past and future of the past tense. Trends in Cognitive Sciences, 6(11), 456–463. https://doi.org/10.1016/S1364-6613(02)01990-3

Räsänen, S. H. M., Ambridge, & Pine, J. M. (2016). An elicited-production study of inflectional verb morphology in child Finnish. Cognitive Science, 40, 1704–1738. https://doi.org/10.1111/cogs.12305

Rowland, C. F., Fletcher, S. L., & Freudenthal (2008). How big is enough? Assessing the reliability of data from naturalistic samples. In Behrens (Ed.), Corpora in language acquisition research: History, methods, perspectives (pp. 1–24). John Benjamins.

Rubino, R. B., & Pine, J. M. (1998). Subject–verb agreement in Brazilian Portuguese: What low error rates hide. Journal of Child Language, 25(1), 35–59. https://doi.org/10.1017/S0305000997003310

Savičiūtė, Ambridge, & Pine, J. M. (2018). The roles of word-form frequency and phonological neighbourhood density in the acquisition of Lithuanian noun morphology. Journal of Child Language, 45, 641–672. https://doi.org/10.1017/S030500091700037X

Saxton (2017). Child language: Acquisition and development (2nd ed.). SAGE Publications Ltd.

Swan, O. E. (2002). A grammar of contemporary Polish. https://www.researchgate.net/profile/Oscar_Swan/publication/41495047_A_Grammar_of_contemporary_Polish/links/5447abf70cf2f14fb8120f35.pdf

Tatsumi, Ambridge, & Pine, J. M. (2018). Testing an input-based account of children’s errors with inflectional morphology: An elicited production study of Japanese. Journal of Child Language, 45, 1144–1173. https://doi.org/10.1017/S0305000918000107

Tatsumi, Chang, & Pine, J. M. (2021). Exploring the acquisition of verb inflections in Japanese: A probabilistic analysis of seven adult–child corpora. First Language, 41(1), 41–66. https://doi.org/10.1177/0142723720926320

Tomasello (2009). The usage-based theory of language acquisition. In Bavin, E. L. (Ed.), The Cambridge handbook of child language (pp. 69–87). Cambridge University Press.
