Abstract
After more than 100 years of psychological research, sex/gender
differences in cognitive abilities are still heavily debated (for reviews, see Halpern, 2012; Hyde, 2014). Spatial and mathematical abilities, in which men are commonly believed to excel, are very well researched. For instance, a male advantage in mental rotation, the ability to rotate complex figures in one’s mind, has been reported in several meta-analyses with effect sizes around Cohen’s
The two verbal abilities, however, that textbooks and review articles typically refer to when claiming the existence of a female advantage are verbal fluency (sometimes also called “word fluency”) and verbal memory (Andreano & Cahill, 2009; Halpern, 2012; Hamson et al., 2016; Hyde, 2014; Kimura, 2000; Miller & Halpern, 2014). Verbal-fluency and verbal-memory tests correlate with general cognitive abilities (Alexander & Smales, 1997; Kraan et al., 2013) and are frequently used in psychological assessments of developmental impairments in children (Gaillard et al., 2003; Pennington & Ozonoff, 1996), impairments and rehabilitation after stroke (Baldo et al., 2006; Barker-Collo & Feigin, 2006), and cognitive decline in dementia (Collie & Maruff, 2000; Zhao et al., 2013).
Verbal Fluency
Verbal fluency refers to the ability to generate (orally or in writing) as many words as possible that fulfill a certain criterion, normally under time restrictions. The criterion is typically either semantic, also called “categorical fluency” (e.g., naming animals, fruits, etc.) or phonemic (e.g., naming words that begin with a specific letter), also called “lexical/letter fluency.” Virtually all articles that claim women’s/girls’ superiority in verbal fluency refer to a landmark meta-analysis by Hyde and Linn (1988), who examined sex/gender differences in a few verbal abilities. The authors concluded that “speech production” or “verbal production” favored women by
Phonemic Versus Semantic Fluency, Age, Cohort Effects, and Gender of First/Last Author
Heister (1982) found a female advantage when participants were asked to generate words beginning with the letters “S” and “M” (phonemic fluency), whereas no sex/gender differences emerged for naming things that are red or round (semantic fluency). Other studies reported a female advantage in semantic fluency (Acevedo et al., 2000) or did not find a sex/gender difference in either phonemic or semantic fluency (Kavé, 2005). Overall, it is unclear whether a female advantage exists in both semantic and phonemic fluency.
Furthermore, it is unclear at what age the putative female advantage arises and whether it changes across the life span. Some studies suggest a steeper decline in older men compared with women (Maylor et al., 2007; Rodriguez-Aranda & Martinussen, 2006), whereas de Frias et al. (2006) found that the female advantage in semantic fluency was stable between 35 and 80 years. On the basis of semantic fluency data from more than 30,000 individuals (ages 50–84) in 14 European countries, Weber et al. (2014, 2017) showed that women from younger cohorts performed better than women from older cohorts. Sex/gender differences also varied across European countries. Both findings were interpreted to show the impact of better access of women to resources and education (Weber et al., 2014, 2017). So far, it is unclear whether sex/gender differences in verbal fluency change with age or across cohorts.
Finally, Hyde and Linn (1988) found that female first authors reported a stronger female advantage (
Verbal-Episodic Memory
As with verbal ability, there is no unitary definition of verbal memory. Nevertheless, there is a multitude of empirical data on what researchers considered verbal memory. Several studies found better performance in women (Catani et al., 2007; de Frias et al., 2006; Herlitz et al., 1997; P. A. Lowe et al., 2003), and a narrative review concluded that “females show an advantage at verbal memory” (Andreano & Cahill, 2009, p. 260). However, other studies found no sex/gender differences in verbal memory (Munnelly, 2016; Parsons et al., 2005). Meta-analyses on this issue were lacking until recently. Voyer et al. (2021) focused specifically on verbal working memory and found an overall significant female advantage that, however, was practically zero (Hedges’
Another meta-analysis (Asperholm et al., 2019) investigated sex/gender differences in long-term memory, specifically episodic memory. Long-term memory is typically divided into declarative (explicit) and nondeclarative (implicit) memory; declarative memory comprises episodic memory (i.e., the ability to remember specific events or situations at a particular place at a particular time) and semantic memory (i.e., the ability to remember concepts and facts). Asperholm et al. (2019) investigated sex/gender differences in episodic memory for different stimuli, including images, movies, faces, routes, locations, and verbal content such as words/sentences. Verbal content showed a small female advantage (
Like Asperholm et al. (2019), in the present study, we were interested in episodic long-term memory and thus discarded studies/tasks that primarily assess working memory. In contrast to Asperholm et al., we had a narrower focus on verbal-episodic memory, which we investigated with a broader literature search. That is, we examined exclusively verbal-episodic memory (not memory for routes and locations) and included only studies with neutral stimuli (vs. emotional stimuli) in which participants learned material intentionally (vs. incidentally). The intentional learning of neutral stimuli is a key feature of frequently used neuropsychological tests on verbal long-term memory, such as the California Verbal Learning Test (CVLT; Delis et al., 2000), the Rey Auditory Verbal Learning Test (RAVLT; Schmidt, 1996), or the Wechsler Memory Scale (WMS; Wechsler, 2009). Further in contrast to Asperholm et al., the literature search of the current study also included “gray” literature, such as PhD/master’s theses, to investigate whether sex/gender differences are subject to publication effects. Moreover, the current study examined, for the first time, possible effects of first/last authors’ gender on sex/gender differences in verbal-episodic memory. Finally, we performed these analyses separately for recognition (i.e., when cues are provided for the material that had to be memorized) and recall (i.e., absence or lack of cues) because the female advantage appeared to be consistently larger for recall than for recognition (Asperholm et al., 2019; Voyer et al., 2021). The fact that only 14 and 18 of our 168 included studies overlapped with Voyer et al. (2021) and Asperholm et al., respectively, demonstrates that different aspects of verbal memory were investigated in the current study. Henceforth, we thus use the term “verbal-episodic memory” to refer to the data that were analyzed in the present study and “verbal memory” to refer to verbal memory in general.
Aims and Hypotheses
A female advantage is frequently assumed in verbal fluency and verbal memory. For verbal fluency, this assumption is based on an early meta-analysis by Hyde and Linn (1988) that required an update. For verbal memory, a meta-analysis focusing specifically on verbal-episodic memory was missing, one that would complement two recent meta-analyses about verbal working memory (Voyer et al., 2021) and episodic memory in general (Asperholm et al., 2019). In the present study, we thus aimed to reveal the magnitude of the putative female advantage in verbal fluency and verbal-episodic memory. For both, we additionally examined the impact of potentially modulating factors such as publication year, type of publication (articles vs. PhD/master’s theses), participants’ age, semantic fluency versus phonemic fluency, recall versus recognition, and gender of first/last author. We hypothesized a female advantage (a) in both verbal fluency and verbal-episodic memory of intentionally learned neutral stimuli (Andreano & Cahill, 2009; Halpern, 2012; Miller & Halpern, 2014), (b) that has increased over the past 50 to 60 years because of better access to education for women (Weber et al., 2014, 2017), (c) that emerges across all age groups but becomes larger in older adults (Maylor et al., 2007; Rodriguez-Aranda & Martinussen, 2006), and (d) that is affected by the gender of the first (Hyde & Linn, 1988) and last authors.
Method
The meta-analysis, including literature search, study selection, data analysis, and presentation of results, was performed following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Moher et al., 2009) and the recommendations for meta-analyses described by Borenstein et al. (2009). Data analysis was carried out with Comprehensive Meta-Analysis (Version 3.3.070; Borenstein et al., 2014).
Literature search and study selection
Search terms and databases
Between October 22 and 29, 2016, the databases PsycINFO, ISI Web of Knowledge, and PubMed were searched for relevant literature. Between September 13 and 19, 2019, we additionally searched the ProQuest Dissertation & Theses database to identify unpublished PhD and master’s theses. For the search terms and number of identified references, see Table S1 in the Supplemental Material available online. An additional 16 studies were identified through other sources, such as comprehensive literature reviews and references used in previously identified publications. After removing 38,322 duplicates, the remaining 28,305 hits were screened for suitability. Screening comprised reading both title and full abstract. In isolated cases, references were excluded based solely on title, for example, when the title indicated that the reference was a review or meta-analysis without original data or that the topic of the reference was outside the scope of the present meta-analysis (e.g., “Persephone in the Underworld: The Motherless Hero in Novels by Burney, Radcliffe, Austen, Bronte, Eliot, and Woolf”). Older PhD and master’s theses often did not have abstracts, in which case the whole thesis was screened. Details about the exclusion criteria and procedure during screening are provided in the Supplemental Material.
Study selection: final inclusion criteria
Of the 2,984 references that were included after screening of abstract/title, 72 full texts could not be obtained. The remaining 2,912 references then underwent a full-text search for eligibility. Inclusion criteria were:
Use of phonemic/semantic-fluency and/or verbal-episodic-memory (recognition/recall) tests that comply with the aforementioned definitions of verbal fluency and verbal-episodic memory. Examples for verbal fluency are the Controlled Oral Word Association Test (COWAT; Benton, 1967) or the F-A-S Test (Spreen & Benton, 1977), the Thurstone Word Fluency Test (Thurstone & Thurstone, 1962), or any test in which participants had to generate as many words as possible starting/ending with or containing certain letters and to provide as many examples as possible for a specific category. Not included were data from tests such as finding synonyms or essay writing (which were considered too peripheral for verbal fluency). Anagram tasks were excluded on the grounds that they draw on numerical and spatial abilities (Wilson et al., 1954).
For verbal-episodic memory, we excluded tasks that measured exclusively or predominantly working memory such as digit span forward or backward from the Wechsler Adult Intelligence Scales (Wechsler, 2008). Examples for included verbal-episodic memory tests are the Visual Verbal Learning Test (Brand & Jolles, 1985), the RAVLT, and the CVLT. Logical Memory II and Logical Memory Recognition (remembering a story) from the WMS were included, but not Logical Memory I because this subtest is more related to verbal working memory. If multiple verbal-episodic-memory parameters were provided (e.g., delayed recall, total recall, recall), we retained the total score; otherwise, the provided scores were kept. Learning in all verbal-episodic-memory measures had to be intentional (i.e., incidental learning measures were not included).
For both verbal fluency and episodic memory, we excluded tasks that employed emotional stimuli because they could be confounded with sex/gender differences in emotional processing (Kret & De Gelder, 2012; Stevens & Hamann, 2012). For example, affective semantic-fluency categories such as “pleasant/unpleasant” or “joy/fear” (e.g., Gawda & Szepietowska, 2013a, 2013b) were not included.
Verbal-fluency/episodic-memory stimuli were not presented laterally, that is, to one specific hemisphere. For example, tasks that employed laterality paradigms were not considered because of sex/gender differences in hemispheric asymmetry (Hirnstein et al., 2019).
Verbal-fluency/episodic-memory tasks were not performed simultaneously with other tasks because multitasking abilities might vary across men and women (Hirnstein et al., 2018).
The publication contained quantitative, empirical data (i.e., no reviews, study protocols, meta-analyses), which allowed computation of the effect size and the exact number (or percentages) of male and female participants. Only “pure” verbal-fluency and verbal-episodic-memory measures were included. That is, if covariates such as intelligence had been factored in, the data were excluded. If only aggregate scores were provided from test batteries that included both eligible and not eligible tasks, data were excluded. Finally, when studies reported multiple verbal-fluency/episodic-memory tasks but provided only statistical parameters to compute effect sizes for tests that found significant sex/gender differences—and insufficient statistical parameters for tests that did not find sex/gender differences—the whole study was discarded to avoid introducing a bias toward significant results.
There were at least 10 male and 10 female participants in the sample to mitigate the effect of spurious findings with very small sample sizes.
Participants were healthy individuals without a mental or other condition that could affect verbal-fluency/episodic-memory performance (e.g., depression, Alzheimer’s disease, learning disability) and were not under the influence of any kind of substance, medicine, or other factors that might influence cognitive performance (e.g., sleep deprivation, noise exposure). Data from control groups could be included unless control subjects were selected for specific features (e.g., intelligence, age, socioeconomic status) to match clinical groups.
Participants were not preselected for a specific feature that could potentially be related to verbal-fluency/episodic-memory performance (e.g., participants with a certain gene combination, participants who performed better than average on a creativity test, samples with homosexual participants only).
The publication was written in English, German, or any Scandinavian language.
Cohen’s
For cases in which inclusion criteria were met but the study lacked important quantitative information (e.g., number of men/women/boys/girls, means, or
In total, 496 effect sizes from 168 references were included for quantitative analysis, comprising data from 355,173 participants (men/boys = 178,409, women/girls = 176,764). For a more detailed overview of the study-selection process, including reasons that led to exclusion, see Figure 1. For a complete list of all included references and effect sizes, see Table S2 in the Supplemental Material.

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram showing the study-selection process.
Statistical analysis
For each relevant measure from the included references above, standardized differences in means (Cohen’s
Several studies reported multiple outcomes for each sample/subsample. For example, a study could provide data from two different tests that both measure recall. It is likely that those tests were correlated with each other and that the magnitude of that correlation affects the variance and, thus, the likelihood of finding statistically significant results (Borenstein et al., 2009). Because these correlations were rarely reported, we ran each analysis twice: once with
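The two dependence assumptions referred to in the Results (perfect independence vs. perfect correlation between multiple outcomes from the same sample) can be illustrated with a minimal Python sketch based on the standard textbook formulas for a standardized mean difference and for the variance of an averaged composite. This is not the output or code of the software used in the study; the function names and the sign convention (positive values = higher female scores) are assumptions made for illustration.

```python
import math


def cohens_d(mean_f, sd_f, n_f, mean_m, sd_m, n_m):
    """Standardized mean difference and its sampling variance.

    Assumed sign convention (not taken from the source):
    positive d means higher scores for women/girls.
    """
    # Pooled within-group standard deviation
    pooled_sd = math.sqrt(
        ((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / (n_f + n_m - 2)
    )
    d = (mean_f - mean_m) / pooled_sd
    # Large-sample approximation of the variance of d
    var_d = (n_f + n_m) / (n_f * n_m) + d ** 2 / (2 * (n_f + n_m))
    return d, var_d


def combine_outcomes(effects, variances, r):
    """Average several outcomes from one sample under a common correlation r.

    r = 0 corresponds to perfect independence, r = 1 to perfect correlation.
    Returns the mean effect and the variance of that mean.
    """
    m = len(effects)
    mean_d = sum(effects) / m
    total = sum(variances)
    # Add the covariance terms between every pair of outcomes
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean_d, total / m ** 2
```

With two outcomes of equal variance, the composite variance under r = 1 is twice that under r = 0, which is why the choice of assumption can change whether a pooled effect reaches significance.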
Overall sex/gender effects
First, we computed the overall sex/gender effect separately for verbal fluency and verbal-episodic memory. Then, we computed the overall sex/gender effect for each of the following four verbal-ability measures: phonemic and semantic fluency as measures of verbal fluency and recognition and recall as measures of verbal-episodic memory. One study had aggregated phonemic- and semantic-fluency scores into a combined verbal-fluency score (DeWan, 2006), whereas another had aggregated recognition and recall scores into combined verbal-episodic-memory scores (Rouch et al., 2005). Effect sizes from these studies were thus kept in the overall verbal-fluency/episodic-memory analysis but excluded from the recognition/recall/phonemic/semantic-fluency analysis.
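As a rough illustration of how an overall effect can be pooled under a random-effects model with a method-of-moments estimate of between-studies variance (the model type named in this Method section), the following Python sketch implements the DerSimonian-Laird estimator. It is a generic textbook version, not the routine of the software used in the study; the function name and the returned dictionary keys are assumptions.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes with a method-of-moments random-effects model.

    effects: list of effect sizes (e.g., Cohen's d), one per study
    variances: list of their sampling variances
    """
    # Fixed-effect (inverse-variance) weights and mean
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum_w

    # Q statistic: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of between-studies variance, floored at 0
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau2 to each study's variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "df": df,
        "tau2": tau2,
    }
```

A Q value well above its degrees of freedom indicates that the effect sizes vary more than sampling error alone would predict, which is the situation in which a random-effects model is preferred over a fixed-effect model.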
For all these analyses, we provide
Effects of publication year, publication type, age, and gender of first/last authors
To investigate whether sex/gender differences change with publication year (as an indicator for changes over time), vary across publication type (articles vs. PhD/master’s theses), age, and the gender of the first/last authors, we ran a set of metaregressions. Metaregressions have the advantage that they allow investigating the effect of one factor while controlling for a set of other factors (Borenstein et al., 2009). Here again, we assumed that the true effect size varied across studies and thus applied a random-effects model (method of moments). All tests were two-sided and based on
Six covariates were created for the metaregressions: (a) The continuous covariate “publication year” simply coded the year when a reference was published. (b) “Publication type” was a categorical covariate that could either be “published article” or “PhD/master’s thesis.” (c) Age was analyzed with two covariates: “mean age” as a continuous variable, which was either obtained directly from the corresponding reference or, in case that information was missing, computed on the basis of the age range (e.g., an age range of 40–60 would lead to a mean age of 50). If age ranges were provided separately for men/boys and women/girls, we took the youngest and oldest age from either sex/gender. If mean ages were provided separately for women/girls and men/boys, we calculated a weighted overall mean. Using mean age alone, however, has two shortcomings. First, several studies provided only age information such as “>70 years,” which made it impossible to calculate a mean. Second, many studies have enormous age ranges. For example, approximately 20% of studies had age ranges of 40 years and more, which rendered mean age a rather coarse indicator. (d) For this reason, we created a second covariate to examine age effects: “age groups.” This was a categorical covariate, theoretically grounded in the Medical Subject Headings (MeSH), the standardized vocabulary used in the Medline database for indexing, developed by the National Library of Medicine. According to this classification, the following age categories were formed: “child/child preschool” (2–12), “adolescent” (13–18), “adult” (19–44), “middle aged” (45–64), and “aged” (65+). Effect sizes were grouped into those categories using the reported age range of the corresponding study. For example, an effect size based on a sample with an age range of 20 to 27 was classified as adult. An effect size based on an age range of 17 to 40 was coded blank and excluded from the age-groups analysis.
As a consequence, the number of effect sizes was substantially higher for mean age (92%, 455/497) than for age groups (51%, 253/497). Although both age measures have their respective shortcomings, we combined both because this allows a reasonable estimate of age effects (see also Voyer et al., 2021). Finally, (e) and (f) were the categorical covariates “first author gender” and “last author gender,” respectively, each of which was coded as either male or female. In single-author studies, the author was coded as first author, and the study was not included in the analysis of last-author effects.
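The age-coding rules described above can be sketched in a few lines of Python. The category bounds and the two worked examples (a range of 40–60 giving a mean age of 50; a range of 17 to 40 being left uncoded) come from the text. The function names are hypothetical, and two readings are assumptions: that a category is assigned only when the whole reported range falls inside it, and that an open-ended range such as “>70 years” is passed with no upper bound.

```python
# Age categories as listed in the text: (label, youngest, oldest); None = open-ended
AGE_GROUPS = [
    ("child/child preschool", 2, 12),
    ("adolescent", 13, 18),
    ("adult", 19, 44),
    ("middle aged", 45, 64),
    ("aged", 65, None),
]


def mean_age_from_range(youngest, oldest):
    """Midpoint of the reported age range, used when no mean age is reported."""
    return (youngest + oldest) / 2


def weighted_mean_age(mean_f, n_f, mean_m, n_m):
    """Overall mean age weighted by the number of female and male participants."""
    return (mean_f * n_f + mean_m * n_m) / (n_f + n_m)


def age_group(youngest, oldest):
    """Return the category that contains the whole age range, else None.

    Assumption: a range that straddles two categories is left uncoded (None).
    Assumption: oldest=None stands for an open-ended range such as ">70 years".
    """
    for label, low, high in AGE_GROUPS:
        fits_upper = high is None or (oldest is not None and oldest <= high)
        if youngest >= low and fits_upper:
            return label
    return None
```

For example, `age_group(20, 27)` yields the adult category, whereas `age_group(17, 40)` yields no category because the range spans the adolescent and adult bounds.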
The categorical covariates described above were dummy-coded in order to be entered into the metaregression. This was done such that published articles, males, and adult served as reference groups for publication type, first/last author gender, and age groups, respectively. We did not include language as a covariate because there were too few non-English reports of data. For comparison, 263 out of 496 effect sizes (53%) were reported in English, whereas the second most frequent language, Dutch, comprised only 40 effect sizes (8%).
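Dummy coding with the reference groups named above can be illustrated as follows. This is a minimal Python sketch; the dictionary keys and the function name are hypothetical, and only the reference categories (published articles, male, adult) are taken from the text.

```python
# Reference categories as stated in the text; the key names are illustrative
REFERENCE = {
    "publication_type": "published article",
    "author_gender": "male",
    "age_group": "adult",
}


def dummy_code(value, levels, reference):
    """Return one 0/1 indicator for every level except the reference level.

    An observation in the reference category gets 0 on all indicators, so each
    regression coefficient expresses the difference from the reference group.
    """
    return {level: int(value == level) for level in levels if level != reference}


publication_levels = ["published article", "PhD/master's thesis"]
thesis_coding = dummy_code(
    "PhD/master's thesis", publication_levels, REFERENCE["publication_type"]
)
```

A covariate with k categories thus enters the metaregression as k - 1 indicator variables; the five age groups, for instance, produce four indicators.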
We ran a sequence of metaregressions for each verbal ability (i.e., recall, recognition, phonemic/semantic fluency) separately. The first metaregression always included the covariates publication year, mean age, publication type, and first-author gender. This was done to maximize the number of available effect sizes. Age groups was not entered into the first metaregression because of multicollinearity with mean age and because only half of the effect sizes could be assigned to a specific age group (see above). We thus ran a second metaregression that included age group and all significant covariates from the first metaregression as a control (except for mean age because of multicollinearity). Last-author gender was also not entered into the first metaregression because of multicollinearity with publication type: None of the PhD/master’s theses have a last author. Therefore, we ran a third metaregression for published articles that included only last-author gender and all significant covariates from the first metaregression as a control (except for publication type because of multicollinearity).
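To make the logic of a random-effects metaregression concrete, the following Python sketch fits a weighted least-squares model in which each effect size is weighted by the inverse of its sampling variance plus the between-studies variance. It is a simplified illustration, not the procedure of the software used in the study: the between-studies variance `tau2` is supplied by the caller rather than estimated, no standard errors are computed, and all names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def meta_regression(effects, variances, covariates, tau2):
    """Weighted least squares with weights 1 / (v_i + tau2).

    covariates: one row of covariate values per effect size (no intercept);
    an intercept column is added here. Returns [intercept, slope_1, ...].
    """
    rows = [[1.0] + list(row) for row in covariates]
    w = [1.0 / (v + tau2) for v in variances]
    p = len(rows[0])
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [
        [sum(wi * r[a] * r[b] for wi, r in zip(w, rows)) for b in range(p)]
        for a in range(p)
    ]
    xtwy = [
        sum(wi * r[a] * d for wi, r, d in zip(w, rows, effects)) for a in range(p)
    ]
    return solve_linear_system(xtwx, xtwy)
```

With a single dummy-coded covariate (0 = published article, 1 = thesis), the intercept estimates the effect in published articles and the slope estimates how much the effect differs in theses, holding any further covariates in the model constant.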
Results
Overall sex/gender differences
Effect sizes of the most frequent verbal-fluency and verbal-episodic-memory measures are presented in Table 1.
Descriptive Overview of Sex/Gender Differences in Verbal-Fluency and Verbal-Episodic-Memory Measures
Note: Values in brackets represent 95% confidence intervals;
Verbal fluency
Assuming perfect independence between multiple outcomes in the same study, we found that the overall effect size was
Assuming perfect correlation between multiple outcomes in the same study, we found that all effects remained significant/nonsignificant:
Verbal-episodic memory
Assuming perfect independence, we found a significant female advantage,
Phonemic fluency
There was a significant female advantage,
Semantic fluency
There was no significant sex/gender difference in semantic fluency,
Recall
There was a significant female advantage,
Recognition
There was a significant female advantage,
Metaregressions for moderator variables
The first set of metaregressions contained the predictors publication year, publication type, first-author gender, and mean age. Assuming perfect independence, we found that all four models explained a significant proportion of between-studies variance: phonemic fluency,
Published articles versus PhD/master’s theses
Published articles consistently reported significantly higher female performance than PhD/master’s theses: phonemic fluency,

Effect of publication type. The asterisk denotes significant difference between published articles and PhD/master’s theses. Central lines represent means of the respective category, and upper and lower lines are confidence intervals. Figures are based on assuming perfect independence between multiple measures from the same sample or subsample.
Gender of first author
Female first authors reported significantly stronger female advantages in phonemic fluency (

Gender of first-author effect. The asterisk denotes significant difference between female and male first authors. Central lines represent means of the respective category, and upper and lower lines are confidence intervals. Figures are based on assuming perfect independence between multiple measures from the same sample or subsample.
Publication year
The female advantage significantly decreased in phonemic fluency (
Mean age
In phonemic fluency, the female advantage became significantly smaller with increasing mean age (
Age groups
A new set of metaregressions was computed that contained age groups and all significant covariates from the first set of metaregressions described above. Mean age was never retained because of multicollinearity with age groups.
The results are presented in Table 2. Age groups as a whole (i.e., with all age categories combined) varied significantly only in semantic fluency,
Descriptive Overview of Age-Group Effects
Note: Values in parentheses represent 95% confidence intervals;
Assuming perfect correlation, we found that all age-groups effects in phonemic fluency (63 effect sizes) and semantic fluency (74 effect sizes) remained significant/nonsignificant. In recall, age groups as a whole remained nonsignificant, but now only the aged subsample had a significantly smaller female advantage than adult (
Gender of last author
A third set of metaregressions was computed for only published articles that contained last-author gender and all significant covariates from the respective first set of metaregressions. Publication type was not included because of multicollinearity. Last-author gender became significant only in semantic fluency (
Discussion
Using a meta-analytical approach, we investigated whether women/girls perform better than men/boys in verbal fluency and in verbal-episodic memory for neutral stimuli that were memorized intentionally, and which factors moderated the female advantage.
Small but robust female advantage in phonemic but not semantic fluency
Women/girls performed significantly better in phonemic fluency than men/boys (
The overall effect size for phonemic fluency (
The large number of studies and effect sizes in the present meta-analysis allowed testing whether the observed sex/gender difference in semantic fluency depended on the specific category participants were tasked with. The results revealed that men/boys generally named more animals (
Small but robust female advantage in verbal-episodic memory
We found a significant female advantage for verbal-episodic memory, in general, with effect sizes between
The strongest female advantage arose for the CVLT (
Whereas the present meta-analysis together with Asperholm et al. (2019) suggests a small but robust female advantage for verbal-episodic memory, Voyer et al. (2021) demonstrated that the female advantage in verbal working memory is practically zero. The largest female advantage reported by the authors was
The female advantage is small but relevant
By comparison, the female advantage in verbal-episodic memory and phonemic fluency is smaller than in other verbal abilities, such as reading achievement (
Verbal-episodic-memory and phonemic-fluency tasks are frequently used for assessing psychological impairments (Barker-Collo & Feigin, 2006; Collie & Maruff, 2000; Pennington & Ozonoff, 1996). The present study corroborates previous findings that standard tests, such as the CVLT (Kramer et al., 2003), RAVLT (Bleecker et al., 1988), and COWAT (Halari et al., 2005), reliably show a female advantage. This implies that sex/gender should be taken into account when phonemic fluency and verbal-episodic memory are used in the clinical/diagnostic context.
Stronger female advantage in published articles than PhD/master’s theses
We found support for the notion that the female advantage in verbal fluency and verbal-episodic memory is subject to publication bias. First, Egger’s regression and the funnel plots (see Fig. S1 in the Supplemental Material) suggest a “small study effect” for verbal-episodic memory, in general, as well as recall and recognition. That is, especially small studies with significant results favoring women/girls were more likely to be included in our meta-analysis than small studies favoring men/boys. Egger’s regression, however, was not significant for overall verbal fluency, phonemic fluency, or semantic fluency, which suggests the small-study effect is generally stronger in verbal-episodic memory.
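For readers unfamiliar with Egger’s regression, the following Python sketch shows the classic form of the test: each effect size divided by its standard error is regressed on its precision (1/standard error), and an intercept that deviates from zero indicates funnel-plot asymmetry. This is a generic illustration with a hypothetical function name, not the implementation used for the analyses reported here; it returns the intercept, its standard error, and the degrees of freedom, and leaves the p value to a t-distribution table or a statistics library.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger's regression: standardized effect regressed on precision.

    Returns (intercept, standard error of the intercept, degrees of freedom).
    Requires at least three studies with differing standard errors.
    """
    y = [d / se for d, se in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / se for se in std_errors]                 # precision
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance of the ordinary least-squares fit
    resid_var = sum(
        (yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)
    ) / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, n - 2
```

If effect sizes do not depend on study precision, the intercept is close to zero; if imprecise (small) studies systematically report larger effects, the intercept moves away from zero, and the ratio of the intercept to its standard error is compared against a t distribution with n - 2 degrees of freedom.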
In addition, we found that the female advantage in all four reported verbal abilities was higher in published articles than in PhD/master’s theses. The difference ranged between
First-authors’ gender affects sex/gender difference
The metaregression further revealed that the first-author’s gender affects the magnitude of the sex/gender difference in phonemic fluency, semantic fluency, and recognition, but not recall. Both male and female first authors consistently reported stronger performance for members of their own gender. The effect was in the range of
We also found a last-author effect in semantic fluency in which male last authors reported a significantly stronger female advantage than female last authors. This result is difficult to interpret because the sex/gender effect in semantic fluency is category-dependent, as described above. None of the other three measures (i.e., phonemic fluency, recall, and recognition) yielded significant last-author effects, and thus we refrain from speculations regarding last-author effects in the present study.
No clear cohort or age effects
The female advantage decreased significantly with publication year for recall (when perfect independence between multiple outcomes was assumed), but the effect was small (
Age effects were neither in line with the previously reported stronger deterioration in older men compared with older women (Graves et al., 2017; Kramer et al., 2003; Rodriguez-Aranda & Martinussen, 2006) nor with an inverted U-shaped curve with smaller sex/gender differences in earlier and later life (Asperholm et al., 2019). When the analysis was based on mean age, a significant coefficient (
Semantic fluency was the only verbal domain that showed a significant overall age-group effect: Middle-aged participants (45–64,
Limitations
First, the statistical indicators showed considerable variance. The null hypothesis, according to which there is only one true underlying effect size, was rejected in all analyses. Including data from very heterogeneous samples can be considered an asset because it increases the generalizability of our findings. However, although we investigated several moderator variables, there are other potentially relevant factors that we did not examine, such as (a) specific categories for semantic fluency, (b) test language, (c) monolingual versus bilingual participants, and (d) participants’ country/region of origin. The fact that most studies were carried out in the United States and United Kingdom and used native English-speaking participants might hamper generalizability. For example, a recent study did not find that the female advantage in phonemic fluency varied across countries, but only the UK, Italy, and Norway were investigated (Moè et al., 2021). However, the female advantage in reading comprehension has been demonstrated to vary across countries (Reilly, 2012; Stoet & Geary, 2013).
Second, we analyzed age effects with two approaches (age means and age groups) that each have their advantages and disadvantages. Age means allowed including more effect sizes at the expense of precision because a single mean age becomes uninformative in samples with large age ranges. Age groups allowed examining sex/gender differences in clearly defined developmental periods but at the expense of losing effect sizes that do not fall in an age category. As a result, some of the age groups have very few effect sizes (e.g., two or three), and we thus refrained from reading too much into significant differences between specific age groups. Conducting those analyses seemed nevertheless justified, and the lack of clear age effects may in part be due to the complex nature of sex/gender differences across age.
Third, we contacted authors whose work we had already identified as suitable for our meta-analysis and where only key statistical parameters were missing for calculating effect sizes. We did not reach out to authors who simply used tests/tasks that we considered adequate, and we also did not contact forums or researchers in the field of verbal fluency/memory. We further reached out only to authors who provided contact details in published articles, which were unavailable for authors of PhD/master’s theses. Moreover, we did not include data from Google Scholar because the massive number of references (> 200,000) was simply infeasible to process. Thus, although the present meta-analysis compiled a large body of data, we might have missed several primary studies.
Conclusion and Future Avenues
Analyzing data from 168 studies, 496 effect sizes, and 355,173 participants, the present meta-analysis suggests that a small but robust female advantage in verbal fluency and verbal-episodic memory exists. With respect to verbal fluency, the female advantage emerged only in phonemic fluency, whereas sex/gender differences in semantic fluency appeared strongly category-dependent. The female advantage, especially in phonemic fluency, is smaller than previously shown (Hyde & Linn, 1988). However, phonemic fluency and verbal-episodic memory measures are frequently used in psychological/diagnostic settings, which highlights the need for taking sex/gender effects into account. A discussion of how the female advantage arises and what the underlying brain mechanisms are is beyond the scope of the present meta-analysis, but as argued for other cognitive sex/gender differences, we propose that the female advantage emerges from an intricate interaction of biological, psychological, and sociocultural factors (Halpern, 2012; Halpern & Tan, 2001; Hausmann, 2017; Jäncke, 2018).
The female advantage is affected by publication bias in two forms: Published articles reported larger female advantages than unpublished research, and both male and female first authors reported better performance for participants of their own gender. Although we found evidence for the existence of publication bias, it did not fully account for the female advantage reported here.
In general, meta-analyses focusing on cognitive abilities favoring women/girls are rare (for notable exceptions, see Asperholm et al., 2019; Voyer et al., 2007, 2021; Voyer & Voyer, 2014). Apart from including the additional factors listed above, future studies should investigate publication bias and first-author/last-author effects in cognitive abilities in which men/boys typically excel (e.g., mental rotation). These effects have been largely ignored so far. Finally, more studies should adopt a biopsychosocial approach and more routinely include sex/gender-related, nonbinary factors (e.g., sex hormones, self-efficacy, gender stereotypes) and their interactions, which might explain individual differences in verbal abilities and other cognitive domains better than sex/gender.
Supplemental Material
Supplemental material, sj-docx-1-pps-10.1177_17456916221082116 for Sex/Gender Differences in Verbal Fluency and Verbal-Episodic Memory: A Meta-Analysis by Marco Hirnstein, Josephine Stuebs, Angelica Moè and Markus Hausmann in Perspectives on Psychological Science
