Abstract
Keywords
Introduction
In one common conception, cultural differences consist largely of differences in behavior. Across societies, differences in communication, cooperation, rituals, mating, eating, work, and leisure are readily observable. Even when we talk about differences of mentality or attitude between cultural groups, we often infer such psychological attributes through the lens of behavior. But there are many aspects of human psychology that are not expressed as overt behavior, being opaque to casual observation and even to controlled experimental study (Lupyan et al., 2023). Inferring the internal cognitive machinery that underpins overt behavior is a challenging “inverse problem”, and one that is intimately tied to our understanding of internal representations, defined here as the informational structures of the mind that model and track the structure of the world (Edelman, 2008). In order to examine true psychological variation, we must understand the structure of variation in internal representations.
The study of
In parallel with the study of individual differences in cognitive style within populations, there has also been a separate but substantial body of work on cognitive style across cultures. This research is nested within a larger body of work on cross-cultural differences in perception and cognition (Henrich et al., 2023), which has uncovered significant variation in domains ranging from memory (Wang, 2021) and spatial cognition (Majid et al., 2004) to moral judgments (Barrett et al., 2016), affect (Wei et al., 2025), motivation (Yanaoka et al., 2024; Zhang et al., 2024), and economic decision-making (Henrich et al., 2005; House et al., 2020). Cross-cultural psychological variation had been explored insufficiently for decades due to the WEIRD (Western, educated, industrialized, rich, democratic) people problem in the psychological and behavioral sciences (Apicella et al., 2020; Barrett, 2020; Henrich et al., 2010). A foundational discovery within this field was the finding that Western people tend to adopt an “analytic” cognitive style, which includes features such as attention to focal objects, ascription of causality to agents, and use of abstract categories, whereas East Asians tend to adopt a “holistic” cognitive style, with features such as attention to relationships among elements in the perceptual field, ascription of causality to situations, and use of relational categories (Choi et al., 1999; Kitayama et al., 2003; Masuda & Nisbett, 2001).
The divergence in cognitive styles occurs not only between East and West, but also among Eastern countries and among Western countries (Kitayama et al., 2009), as well as among regions within countries (Kitayama et al., 2006; Talhelm et al., 2014). These studies have linked the divergence in cognitive styles to historical factors such as subsistence method or exploration of geographical frontiers, each of which is thought to influence social structure (e.g., degree of interdependence) and is linked to socio-psychological variables such as the strength of social norms (Talhelm & English, 2020) or the perceived relationship between self and society (Markus & Kitayama, 1991). Some studies suggest that the key explanatory variable may instead be kinship intensity (i.e., how central kinship is to the formation of personal identity and social relationships) (Schulz et al., 2019). The causal processes linking these societal factors to cognition and perception (how we think about and see the world) are not well understood (Kitayama et al., 2009). At a proximal level, there is evidence of caretaker-to-child transmission of culturally typical cognitive styles via joint attention and shared discourse (Senzaki et al., 2016), but one missing piece in this overall picture is the population-level dynamics that induce systematic divergence across cultures. Much of the work on cultural differences in internal representation has relied on this contrast between analytic and holistic processing, often at the expense of other dimensions of psychological variation; for instance, Bruder and Zehra (2025) find cross-cultural variation in the reported intensity of sensory imagery.
Analytic–holistic cognitive style has been repeatedly deployed as an explanation in the context of cultural variation, but importantly, it does not appear to be able to explain variation among individuals within a culture (Kitayama et al., 2009; Na et al., 2010). Analytic–holistic cognitive style is thus likely to be a group-level trait generated by forces acting upon the cultural group itself as a unit. However, these results conflict with the sizeable literature on individual differences in cognitive style mentioned above (Kozhevnikov, 2007; Witkin & Moore, 1977), which finds substantial variation within samples of individuals with the same cultural background, often for dimensions of variation that are highly similar to analytic–holistic processing (Allinson & Hayes, 1996). This discrepancy motivates a reassessment of cognitive style and indeed the structure of internal representations in general.
In the present study, we employ the Internal Representations Questionnaire (IRQ) (Roebuck & Lupyan, 2020) to investigate cross-cultural differences in the structure of internal representations. The IRQ is an instrument designed to probe individual differences in modalities of thought. In the original study, conducted with a US sample, a factor analysis revealed a 4-modality structure: visual imagery, internal verbalization, orthographic imagery, and representational manipulation.
Orthographic imagery refers to mental imagery specifically of text or written language. Representational manipulation refers to the ability to dynamically manipulate mental representations regardless of modality, as captured by items such as “I can easily imagine the sound of a trumpet getting louder”. Visual imagery and internal verbalization have been widely studied, but orthographic imagery and representational manipulation constitute novel modalities that are not typically discussed in the literature. Importantly, responses to the IRQ also predict performance on a cue-target matching task in a modality-selective manner, confirming predictive validity with respect to behavioral consequences.
Although the IRQ reveals notable findings about the population structure of internal representations, it is unclear how much of this structure can be attributed to the effect of genes, culture, or idiosyncratic experience. For example, if there turned out to be large variation in internal representations between cultures while controlling for genotype, and less variation between ancestry groups or genotypes while controlling for cultural upbringing, this would suggest a larger effect of culture than of genes (although the two will often be correlated). It would also offer hints about the plasticity of internal representations, as cultural evolution occurs more rapidly than genetic evolution (Richerson et al., 2010). Such an understanding of the sources of variation in internal representations can help us assess, among other things, the extent to which internal representations are amenable to interventions in settings such as education, professional training, or psychotherapy. A cross-cultural analysis is thus an important first step in understanding the status of internal representations.
In particular, we collected data from Japan and China, two populations that differ from the original US sample across various social and cultural dimensions. Our hypothesis pertains to differences in writing systems (Handel, 2019). Whereas English is written in a phonetic alphabet, Chinese writing is logographic and thus attributes both a semantic meaning and a sound to each character. For example, the English word “river” is formed by concatenating 5 graphemes, each with its own phonetic representation, whereas the Simplified Chinese (Hanzi) character for river, 河, is a unitary grapheme comprising elements that cue meaning and sound. The vast majority of Chinese characters represent some meaning (often more than one meaning) on their own, rather than representing meaning only when joined with other characters. Japanese writing makes heavy use of Chinese-derived logograms. But whereas Chinese logograms are organized in a one-to-one mapping between character and sound, Japanese logograms are commonly associated with multiple sounds. Moreover, Japanese writing complements its logographic system with two additional phonetic syllabaries, yielding a hybrid script that shares features of both the English and Chinese systems.
We test the hypothesis that differences in writing systems explain differences in the structure of mental representations, an idea contemplated by thinkers ranging from Leibniz (1697) to McLuhan (1962). The IRQ identifies orthographic imagery as a modality of thought, and this modality is quite plausibly impacted by orthographic input from the cultural environment. Neuroimaging studies find differences in the profile of neural activation between Chinese and English reading (Perfetti et al., 2013; Wu et al., 2012), and there are several behavioral and cognitive differences that emerge during acquisition of these writing systems as well (McBride, 2016). Because reading reorganizes the brain not only in areas that directly subserve reading but also in areas devoted to other functions through downstream effects, for example face perception (Dehaene et al., 2015), the effect of exposure to a given writing system may extend beyond orthographic imagery into other modalities of internal representation as well. Due to its relative recency in human history, literacy is generally acknowledged to be a cultural adaptation rather than a genetic one. Therefore, an understanding of how writing systems shape internal representations can help us understand more generally how culture shapes the human mind, while also moving beyond the sometimes simplistic identification of East and West with holistic and analytic thinking, respectively.
Materials & Methods
Preregistration
We preregistered a number of details about the study and its analysis following the format of AsPredicted (https://aspredicted.org). Preregistrations were submitted twice: once during collection of the Japanese data (before anyone was able to see the data), and once more prior to collection of the Chinese data. Both submissions are publicly available on the Open Science Framework at https://osf.io/nxmg2/registrations.
The Japanese preregistration specified details such as exclusion criteria for participants, expected sample size, the procedure of questionnaire administration, and the translations of the items. With respect to the analysis, we stated that we would follow the same analysis as Roebuck and Lupyan (2020), and we do so in the analysis below. After verifying the predicted statistical differences in observed scores across IRQ factors between the new Japanese sample and the previous US data collected by Roebuck and Lupyan, we decided to extend the analysis by collecting additional data from a group that makes even more extensive use of logographic writing, namely Chinese Hanzi users in the People’s Republic of China. The Chinese preregistration was thus written with knowledge of the findings from the Japanese sample, as indicated in the preregistration. In the Chinese preregistration, we included several predictions for how demographic variables would be associated with the IRQ factor scores.
In addition to the analysis following Roebuck and Lupyan, we also conducted a confirmatory factor analysis for both the Japanese and Chinese data, and an exploratory factor analysis for the Chinese data, each of these post-hoc. There was a mixed effects model analysis that we had declared in the Chinese preregistration but that we chose to omit here, as the analysis assumed the validity of the US factor structure for the Chinese and Japanese samples. The post-hoc confirmatory factor analyses suggested that the factor structure extracted by Roebuck and Lupyan for their US sample was compatible with neither the Japanese nor Chinese data. Similarly, although we had made predictions about how demographic variables of the Chinese participants would be associated with scores for the US-derived factors, we omitted this analysis for the same reason. Instead, we performed the same analysis using the Chinese-derived factor structure.
Participants
We recruited participants in China and Japan through survey management companies in each country. The pre-exclusion Japanese sample consisted of 122 consenting participants, but 22 (18%) met the preregistered exclusion criteria by either failing one of the two attention check questions or by giving identical Likert responses to 90% or more of the main questionnaire items. When preregistering exclusion criteria for the Japanese sample, we had originally proposed to exclude participants who gave the same response on all items, but due to the discovery of a small number of participants who gave the same response on nearly all items, we subsequently shifted the criterion to 90% and applied it to analysis of the Japanese sample and preregistered it for the subsequent Chinese sample. This change in criterion did not impact the results in any meaningful way. The pre-exclusion Chinese sample consisted of 470 consenting participants, and only 1 participant failed any of the exclusion criteria, suggesting that the Chinese data might be of better quality than the Japanese data. There were an additional 2 participants who were excluded due to uninterpretable results, resulting in a final Chinese sample size of 467, all of whom are from regions where the standard dialect is Mandarin Chinese and the standard writing system is Simplified Hanzi. Sample sizes were determined by research budget.
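The two exclusion rules described above can be made concrete with a short sketch. This is an illustration rather than the script used in the study: the function names, the coding of responses as integers 1 to 5, and the reading of the 90% rule as "the single most frequent response accounts for at least 90% of the main items" are all assumptions.

```python
from collections import Counter

def max_identical_share(responses):
    """Share of items that received the participant's most frequent response."""
    counts = Counter(responses)
    return max(counts.values()) / len(responses)

def should_exclude(responses, attention_passed, threshold=0.90):
    """Apply both exclusion rules to one participant.

    responses: Likert answers (assumed coded 1-5) to the main items.
    attention_passed: one boolean per attention check question.
    threshold: assumed interpretation of the 90% identical-response rule.
    """
    failed_attention = not all(attention_passed)
    near_uniform = max_identical_share(responses) >= threshold
    return failed_attention or near_uniform

# Hypothetical participants answering a 36-item questionnaire.
varied = [1, 2, 3, 4, 5, 4] * 6        # no single answer dominates
near_uniform = [3] * 33 + [4, 2, 5]    # 33 of 36 answers are identical

keep_varied = not should_exclude(varied, [True, True])
drop_uniform = should_exclude(near_uniform, [True, True])
drop_inattentive = should_exclude(varied, [True, False])
```

Setting `threshold=1.0` would correspond to the originally proposed rule of excluding only participants who gave the same response on all items.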
For the Japanese sample, 57 (57%) participants reported their gender as male while 43 (43%) reported female, and the mean age was 53.7 with a range of 21 to 72, with 81% of the sample aged 45 or above (Figure 1(A)). For the Chinese sample, 234 (50%) participants reported their gender as male while 233 (50%) reported female, and the mean age was 31.5 years, resulting in a considerably younger group than the Japanese sample: the age range was 20 to 70 but 87% of the sample were in their 20s or 30s (Figure 1(B)). For the Chinese sample, we also obtained responses for several demographic and background variables beyond age and gender: years of education, hours per week spent on dense reading (e.g., books and newspapers but not social media), frequency of thinking in English, and frequency of using English in daily life (Figure 1(C)–(F)).
Self-reported demographic characteristics of the Japanese (A) and Chinese (B–F) samples.
Ethical approval for the study was granted by the London School of Economics Research Ethics Committee (REC), which has US Department of Health and Human Services IORG (IRB organization) status. The REC case number for the present study is 19583. All aspects of the study were conducted in accordance with relevant guidelines and regulations endorsed by the REC. Informed consent was obtained from all respondents in their native languages, through an initial consent statement at the start of the questionnaire that described the purpose of the study, the possibility of their anonymized and aggregated data being published in academic venues, and the right to withdraw from the study at any point in time. Participants were transferred to the main questionnaire only if they had read through these terms and chose to accept them by selecting the acceptance option within the local survey management company’s user interface.
Instrument
We administered the Internal Representations Questionnaire (IRQ) (Roebuck & Lupyan, 2020) after it was translated into Simplified Chinese (Hanzi) by a native Chinese speaker and into Japanese by a native Japanese speaker, each with professional-level competence in both their mother tongue and English. The IRQ consists of 36 items that probe the use of different forms of mental representation in everyday life. The questionnaire was originally constructed to investigate the role of internal verbalization in shaping perceptual and cognitive processing, but the researchers found 4 factors that each represent a different modality of internal representation: visual imagery, internal verbalization, orthographic imagery, and representational manipulation. The items in the IRQ were selected by an exploratory factor analysis on US samples consisting of university students and Amazon Mechanical Turk workers. The items of the IRQ and their factor assignment in the original study are listed in Table 9.
For both the Chinese and Japanese samples, the IRQ was administered through the smartphone interfaces employed by each survey management company. For each questionnaire item, participants were required to select a response from a 5-point Likert scale that consisted of the options “strongly disagree”, “disagree”, “neither agree nor disagree”, “agree”, and “strongly agree”. There were two reverse-coded items (items 13 and 33), whose responses were reverse-scored before data analysis. The main questionnaire items were preceded by a consent question that allowed participants to opt out of the study. The order of presentation of the main items was randomized, and participants could complete the study only by providing responses to all questions. Two attention check questions were presented at randomized positions in the questionnaire.
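Handling reverse-coded items on a 5-point scale amounts to reflecting each response around the scale midpoint. The following is a minimal sketch, assuming responses are coded 1 (“strongly disagree”) to 5 (“strongly agree”); the numeric coding and the helper names are assumptions and are not taken from the study materials.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed numeric coding of the 5-point Likert scale

def reverse_score(response):
    """Reflect a response around the scale midpoint (1<->5, 2<->4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - response

def recode(responses, reversed_items):
    """Return a copy of {item_number: response} with reversed items re-scored."""
    return {
        item: reverse_score(value) if item in reversed_items else value
        for item, value in responses.items()
    }

# Hypothetical responses from one participant to three items.
raw = {12: 4, 13: 5, 33: 2}
recoded = recode(raw, reversed_items={13, 33})
```

Passing the set of reversed items as an argument keeps the item numbers out of the scoring logic, so the same helper works for any item list.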
Overview of Analysis
We first conducted simple comparisons of observed scores across the IRQ factors. The Chinese and Japanese scores were compared to the scores of the US sample in Roebuck and Lupyan (2020; data published in online repository: https://osf.io/8rdzh/). Comparisons were made using both raw scores and within-culture standardized scores, the latter being a strategy to control for cross-cultural differences in response style (Fischer, 2004). As this simple comparison of observed scores was conducted without verification of the IRQ factor structure in the Chinese and Japanese samples, we performed a confirmatory factor analysis to evaluate the fit of the IRQ factors to the non-US samples and to test measurement invariance. The results were mixed, but taken in total suggested inadequate fit. To identify the difference in factor structure between the US and non-US data, we conducted an exploratory factor analysis only for the Chinese data, as the sample size of the Japanese data was insufficient. Finally, we obtained factor scores of the Chinese participants for the newly extracted factors, and used them in a regression analysis as outcome variables to be predicted by demographic variables.
Results
Cross-Cultural Comparison of Observed Scores
Comparison of Raw Scores
Results of Welch’s t-Tests for Simple Pairwise Comparison of Mean Scores Between US, Japanese (JP), and Chinese (CN) Samples, for the 4 Factors Extracted From the US Data in Roebuck and Lupyan (2020). The US Values are Computed From Data Published by Roebuck and Lupyan

Comparison of raw means and within-culture standardized means of item responses grouped according to the factor structure extracted in Roebuck and Lupyan (2020). The US values are computed from data published by Roebuck and Lupyan (2020). Error bars are standard errors, and statistical significance levels derived from pairwise Welch’s t-tests are indicated by asterisks (*:
Japanese responses on average were lower than both the US and Chinese responses for visual imagery, internal verbalization, and representational manipulation. As the Japanese scores were closer to the mid-point of the 5-point Likert scale, this pattern may reflect either a middle response bias as previously reported in this population (Chen et al., 1995; Tasaki & Shin, 2017), or a negative response bias relative to the US and Chinese samples. Despite their overall lower scores, Japanese participants were at roughly the same level as the US participants for items that load onto the orthographic imagery factor, although still lower than Chinese participants.
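The pairwise comparisons in this section rely on Welch's t-test, which does not assume equal variances or equal sample sizes across groups. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function name and the toy data are hypothetical, and obtaining a p-value would additionally require the t distribution from a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two group means (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical mean factor scores for two small groups of participants.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12]
t_stat, dof = welch_t(group_a, group_b)
```

Because the degrees of freedom are estimated from the data, they are generally non-integer and fall between the smaller group's n − 1 and the pooled n₁ + n₂ − 2.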
Comparison of Standardized Scores
Within-culture standardized responses revealed cultural differences that are more readily interpretable than the raw score comparisons (Table 1 and Figure 2, bottom panel). Within each population, the mean score was set to 0 and the standard deviation was scaled to 1. All 3 groups yielded the highest scores on items that load onto the visual imagery factor, followed by items that load onto the internal verbalization factor. Scores for both of these factors were higher than scores for representational manipulation across all groups.
The greatest cross-cultural variation was observed in the orthographic imagery factor: US scores for these items were about half a standard score lower than the Chinese scores and about a third of a standard score lower than the Japanese scores (US,
There is no clear answer to what standardization procedure is most adequate in an analysis like this one, and the within-culture standardization approach that we adopt here is among the common methods employed in analysis of cross-cultural questionnaires (Fischer, 2004). We also tried an alternative method in which scores are standardized within individuals, yielding what are known as ipsative scores (Baron, 1996), but the change in mean scores for the factors was on the order of 0.002 to 0.02 standard scores, negligible for practical purposes.
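The difference between the two standardization schemes can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, responses are assumed to be coded 1 to 5, and the within-culture version is assumed to pool all participants and items of a sample when computing the mean and standard deviation.

```python
from statistics import mean, pstdev

def within_culture_z(sample):
    """Standardize every response against the sample's own grand mean and
    standard deviation, pooled over participants and items (an assumption)."""
    pooled = [v for person in sample for v in person]
    m, s = mean(pooled), pstdev(pooled)
    return [[(v - m) / s for v in person] for person in sample]

def ipsative_z(sample):
    """Standardize each participant against their own mean and standard
    deviation. A participant who gave the same answer to every item has zero
    spread and must be excluded beforehand to avoid division by zero."""
    result = []
    for person in sample:
        m, s = mean(person), pstdev(person)
        result.append([(v - m) / s for v in person])
    return result

# Hypothetical sample: 3 participants x 4 items.
sample = [[1, 2, 2, 3], [3, 4, 4, 5], [2, 3, 5, 4]]
z_culture = within_culture_z(sample)
z_person = ipsative_z(sample)
```

Within-culture scores remove a sample-wide shift or compression in scale use while preserving differences between participants, whereas ipsative scores also remove each participant's overall level and spread.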
Internal Reliability of IRQ Factors
Cronbach’s Alpha Measure of Internal Consistency for Each of the 4 Factors Extracted From a US Sample in the Original Study. Internal Consistency for US Data is Computed From the Raw Data Published Online by Roebuck and Lupyan (2020)
The analysis revealed some items whose removal increased internal reliability. Such increases were largely on the order of Δα = +0.01, with the exception of one item linked to the visual imagery factor (item 10) in the Chinese sample, whose removal resulted in a large increase of 0.06. This item corresponded to the statement “If I imagine my memories visually they are more often static than moving”, suggesting a particularly poor fit of this item with respect to the other visual imagery items for Chinese participants.
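The item-removal check described above can be illustrated with a small sketch of Cronbach's alpha and the corresponding "alpha if item deleted" values, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function names and the toy data are hypothetical and are not drawn from the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per participant, items in the same order."""
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def alpha_if_deleted(rows):
    """Alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [cronbach_alpha([r[:j] + r[j + 1:] for r in rows]) for j in range(k)]

# Hypothetical data: 5 participants x 4 items. The first three items move
# together; the fourth is unrelated to them.
rows = [
    [1, 2, 1, 5],
    [2, 3, 2, 1],
    [3, 3, 4, 4],
    [4, 5, 4, 2],
    [5, 5, 5, 3],
]
alpha_all = cronbach_alpha(rows)
alpha_drop = alpha_if_deleted(rows)
# Change in alpha when each item is removed; positive values flag poorly fitting items.
delta = [a - alpha_all for a in alpha_drop]
```

In this toy example only the removal of the fourth item raises alpha, which is the pattern that would single out an item as fitting poorly with the rest of its factor.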
Confirmatory Factor Analysis
Goodness-of-Fit Indices for Confirmatory Factor Analyses With the Factor Structure Extracted by Roebuck and Lupyan (2020). US Values are Computed From Raw Data Published Online by Roebuck and Lupyan for a Preliminary Data Set. The Recommended Criteria for Each Index (Hu & Bentler, 1999) are Displayed on the Bottom Row
Tests of Measurement Invariance for Three Combinations of Samples: {US, China, Japan}, {US, China}, and {US, Japan}, With Conventional Goodness-of-Fit Indices
The same series of tests was also conducted in pairwise fashion for the US and Chinese samples and for the US and Japanese samples. As in the aggregate analysis with all 3 samples, metric but not scalar invariance was established for each of these groupings. The US–Chinese pair demonstrated slightly better fit measures for configural invariance compared to the US-only sample, and the US–Japanese pair demonstrated slightly worse fit. In sum, the fit of the factor loadings extracted in the original study does not noticeably decrease in the Chinese and Japanese samples, although the fit of the item intercepts does. This suggests that the IRQ measures the same constructs across the three sampled cultures, but is limited in the degree to which actual item responses can be directly compared across cultures.
Intercorrelations Among the IRQ Factors for the Three Samples. The US Intercorrelations are Taken From Table 1 of Roebuck and Lupyan (2020)
In summary, we uncovered mixed evidence about the extent to which the IRQ factors fit the Chinese and Japanese data. Several measures showed inadequate fit, but goodness-of-fit was not particularly worse than for the original US sample. Metric invariance was obtained for the three samples in aggregate as well as in a pairwise manner, but scalar invariance was not. The high factor intercorrelations for the Chinese and Japanese samples suggest that the IRQ factors are not nearly as well-separated for the Asian samples as they are in the US sample.
Exploratory Factor Analysis
Analytical Specifications
Due to ambiguous fit of the IRQ factor model with respect to the Chinese and Japanese samples, it is desirable to conduct an exploratory factor analysis to find an alternative factor structure that better captures the pattern of the data from the two East Asian societies. A better model may point us toward meaningful cross-cultural differences in the structure of internal representations. However, the Japanese sample (
To first select the number of factors to be retained in an exploratory factor analysis of the Chinese data, we employed 3 selection methods, namely “optimal coordinates” (Raîche et al., 2013), “parallel analysis” (Horn, 1965), and “comparison data” (Ruscio & Roche, 2012), which were the 3 best-performing methods in Ruscio and Roche’s (2012) comparative analysis of methods for selecting the number of factors. All of these methods indicated that retention of 3 factors was optimal. A multivariate Shapiro-Wilk test for normality indicated that the Chinese data were not normally distributed (
Factor Loadings of the 3-Factor Exploratory Factor Analysis With the Chinese Data. The Column Labeled “R&L Factor” Indicates the Corresponding Factor for That Item in the Original Study by Roebuck and Lupyan (2020). h2 Indicates Communality, and u2 Indicates Uniqueness, the Complement of Communality. “Drop” Indicates Whether the Item was Dropped From the Final Factor Structure on the Basis of the Criteria Noted in the Text
Factor Loadings for the Factor Structure Given by the Exploratory Factor Analysis, After Items Have Been Dropped. Factor Loadings Below the Cutoff of 0.3 Have Been Removed for Ease of Interpretation. The IRQ Factor Column Indicates the Corresponding Factor in the Original Study by Roebuck and Lupyan (2020). Factor 1 is Dubbed the “Ortho-verbal” Factor; Factor 2 is Dubbed the “Visuo-Verbal” Factor; Factor 3 is Dubbed the “Spatial Manipulation” Factor, See Text
Extracted Factor Structure
Factor 1 was loaded on by many of the items that were coded as internal verbalization in the original IRQ study, but it also included all of the orthographic imagery items, suggesting that these two modalities are not statistically separable in the Chinese population.
Factor 2 was the only factor that was loaded on by visual imagery items, and thus appears to primarily be a visual factor, although there were a number of items coded as internal verbalization that also loaded on this factor. Although it requires further study, the splitting of internal verbalization items between Factors 1 and 2 may be occurring along the lines of discursive vs. non-discursive items (Alderson-Day et al., 2018; McCarthy-Jones & Fernyhough, 2011), where items with a discursive or reasoning-like quality load onto Factor 1 (e.g., item 15, “I tend to think things through verbally when I am relaxing”) whereas items that lack an explicit reasoning-like component (e.g., item 14, “My inner speech helps my imagination”) load onto Factor 2.
Factor 3 comprised only 3 items but with high loading. These items were all from the representational manipulation factor, and they were a subset that specifically concerned spatial manipulation of geometric constructs. The other items in the original representational manipulation factor pertained to other, non-spatial modalities—in particular verbal, gustatory, and auditory representations, so this factor appears to be strictly selective for spatial manipulation.
Factor Intercorrelations and Internal Reliability. Cronbach’s Alpha are for Values After Items Were Dropped According to the Criterion Noted in the Text
IRQ Items With Factor Labels From Both the Original US Study and the Exploratory Factor Analysis in the Present Study. Items 19 and 33 Were Reverse-Coded. Blank Cells in the Chinese Factor Column are Items That Were Dropped Based on the Procedure Described in the Text. IRQ Items are Redrawn, With Permission, from Roebuck & Lupyan (2020)
Association of Factor Scores With Participant Characteristics
Regression Analyses Using Demographic and Background Variables of the Chinese Participants to Predict Their Factor Scores Across the 3 Factors Extracted Above From the Exploratory Factor Analysis. All Variables Other Than Gender are Standardized, and Age and Dense Reading are Log-Transformed. Gender is Dichotomous, and Coded as 1 = Male, 2 = Female
We found a gender effect for the spatial manipulation factor: male factor scores were on average 0.24 standard deviation units higher than female factor scores. Although self-report is prone to biases in self-evaluation, this outcome is consistent with the widely replicated finding that male participants have an advantage over female participants in spatial cognition tasks such as mental rotation (Levine et al., 2016). For factor scores across all 3 factors, there was a positive effect of the reported (log-transformed) hours per week spent on intensive reading (on books and newspapers rather than, e.g., social media). Moreover, the magnitude of association was roughly equal across the 3 factors (ortho-verbal,
In this regression we included two variables designed to index the participant’s immersion in the English language. One is English thinking, which encodes Likert responses to a Chinese statement that corresponds to “I frequently think in English.” The other is English usage, which similarly encodes Likert responses but to a statement that corresponds to “I frequently use English in daily life (such as reading English texts, watching English films, engaging in English conversations, etc.).” We included these variables as proxies for familiarity with alphabetic writing systems, despite their likely confounding with other variables such as socio-economic status. English thinking and English usage are highly correlated (Pearson’s
For the ortho-verbal (Factor 1) and visuo-verbal (Factor 2) factors, factor scores were predicted by English thinking (ortho-verbal,
In sum, the analysis of factor scores revealed an effect of gender for the spatial manipulation factor, and what plausibly appear to be general effects of reading and English usage across all three factors, despite the masking of English usage by English thinking in the ortho-verbal and visuo-verbal factors. The impact of English immersion (thinking and usage) on factor scores is not yet clear.
Discussion
Cultural psychology has revealed substantial cross-cultural variation in perceptual processing, especially for vision, and has demonstrated the correspondence of such perceptual differences with other cultural variables such as social interdependence–independence (Kitayama et al., 2009). This body of research has supplied compelling evidence that the organization of the human mind is permeable to cultural influence, but has often focused on analytic–holistic cognitive style at the expense of other possible dimensions of variation. To extend the scope of cross-cultural psychological inquiry, we employed the Internal Representations Questionnaire (IRQ) (Roebuck & Lupyan, 2020), an instrument designed to probe individual differences in qualitative modalities of thinking. Although there is a large body of research on modalities of thinking such as the visualizer–verbalizer continuum (Kirby et al., 1988; Mayer & Massa, 2003), the IRQ is a unique, bottom-up approach to the investigation of the structure of internal representations.
By administering the questionnaire to new cultural populations, we investigated both cross-cultural and within-culture individual differences in internal representation. In particular, we studied people in Japan and the People’s Republic of China, under the hypothesis that variation in writing systems (Handel, 2019) may account for meaningful variation in internal representations across cultures.
Summary of Outcomes
A simple comparison of raw and standardized scores using the factor structure extracted from the US sample in the original study (Roebuck & Lupyan, 2020) revealed substantive differences between cultures (Table 1; Figure 2). After using within-culture standardization to reduce the effect of culture-specific response styles (Fischer, 2004), Chinese and Japanese scores were considerably higher than US scores on the orthographic imagery factor, and US scores were higher than Chinese and Japanese scores on the internal verbalization factor. There were other cross-cultural differences as well, including between the Chinese and Japanese samples, but the magnitudes of these differences were smaller.
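As a rough illustration of the standardization step, the following Python sketch centers and scales each response by the pooled mean and standard deviation of all responses within the same culture, then averages the resulting values per factor. This is one common reading of within-culture standardization and is an assumption here, not the exact procedure of the study. The function name, the culture labels, and the toy scores are hypothetical and are not data from the samples.

```python
from statistics import mean, pstdev


def standardize_within_culture(responses):
    """Return mean standardized scores per factor for each culture.

    responses: {culture: {factor: [raw item scores]}}
    Each score is centered and scaled by the mean and SD of ALL
    responses pooled within its own culture (assumed procedure).
    """
    result = {}
    for culture, by_factor in responses.items():
        # Pool every response given in this culture, across all factors.
        pooled = [s for scores in by_factor.values() for s in scores]
        mu, sigma = mean(pooled), pstdev(pooled)
        result[culture] = {
            factor: mean([(s - mu) / sigma for s in scores])
            for factor, scores in by_factor.items()
        }
    return result


# Hypothetical toy responses on a 1-5 scale (not study data).
toy = {
    "culture_a": {"orthographic": [4, 5, 4, 5], "verbal": [3, 2, 3, 2]},
    "culture_b": {"orthographic": [2, 3, 2, 3], "verbal": [3, 4, 3, 4]},
}
standardized = standardize_within_culture(toy)
for culture, scores in standardized.items():
    print(culture, {f: round(z, 2) for f, z in scores.items()})
```

Because each culture is scaled against its own overall response level, a group that tends to give uniformly high or low ratings is not mistaken for a group that scores high or low on a particular factor.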
In our preregistration prior to collecting the Chinese data, we had predicted that the Chinese scores for orthographic imagery would be similar to or higher than the Japanese scores for the same IRQ factor, and that the Chinese scores for internal verbalization would be similar to or lower than the Japanese scores for the same factor. Prior to preregistering, we had observed that the Japanese participants reported higher orthographic imagery and lower internal verbalization than the US participants. We reasoned that Chinese participants should exhibit the same contrast in a more pronounced manner, because written Chinese is a purer logographic system, whereas written Japanese combines logographic and phonetic scripts and can thus be considered intermediate between written English and written Chinese. In the standardized comparison (Figure 2, bottom), our prediction about Chinese orthographic imagery scores turned out to be accurate. The results for internal verbalization were less notable, as mean internal verbalization scores were only 0.02 standard-score units lower in the Chinese sample than in the Japanese sample, but they were nonetheless consistent with the preregistered prediction. However, the mean age of the Japanese sample was 22 years older than the mean age of the Chinese sample, and a more balanced comparison of these two groups would require age-matched samples, especially given reports about age-related differences in mental imagery (Floridou et al., 2022; Kemps & Newson, 2005).
We observed mixed results about whether the original factor structure of Roebuck and Lupyan, derived from their US sample, was a good fit for the Chinese and Japanese data. A confirmatory factor analysis yielded ambiguous goodness-of-fit measures, as well as factor intercorrelations in the Chinese and Japanese data that were considerably higher than in the US data. We therefore conducted an exploratory factor analysis with the Chinese data but not the Japanese data, due to a limitation in sample size for the latter. The analysis revealed a 3-factor structure: (1) an “ortho-verbal” factor that comprises orthographic imagery as well as some internal verbalization items that may have in common a discursive character, (2) a “visuo-verbal” factor that comprises visual imagery as well as some internal verbalization items that may have in common a non-discursive character, and (3) a “spatial manipulation” factor that is a subset of the representational manipulation factor of the original IRQ study, containing items related to the manipulation of geometric objects but excluding other modalities of representational manipulation.
Using the extracted 3-factor structure to further analyze the Chinese data, a comparison of factor scores with demographic variables revealed a number of findings. Among individuals who reported more time spent reading or immersed in the English language (i.e., English thinking or English usage), higher scores were observed across all 3 factors. This may indicate that engagement with linguistic material—whether in the form of immersion in a foreign language or in reading—is associated with more vivid internal representations overall. Relatedly, Roebuck and Lupyan (2020) had found that mean responses are correlated across factors rather than exhibiting a tradeoff between the different factors, despite research on cognitive styles (e.g., visualizer–verbalizer) often assuming a tradeoff (Mayer & Massa, 2003).
Male participants had higher factor scores on the spatial manipulation factor than female participants, as predicted based on a large body of past research (Levine et al., 2016). A gender effect was present only for this one factor.
Interpretation of the Chinese Factors
The factor structure extracted from the Chinese sample differed from the factor structure reported by Roebuck and Lupyan (2020) for their US sample. The structure revealed here may serve to point us toward qualitative differences in the organization of internal representations between Chinese and US individuals.
Ortho-Verbal Conjunction
The joining of orthographic imagery and internal verbalization mirrors the notion that alphabetic reading involves extensive conversion of visuo-orthographic input into phonological representations, while Chinese reading involves more sustained activation of both orthographic and phonological representations (Perfetti et al., 2013; Xu et al., 1999). This difference is proposed to be due to a structural property of Chinese characters, namely how they primarily encode semantic meaning and only subordinately phonemic information, in contrast to a more direct phonological encoding in alphabetic symbols.
In Chinese orthography, tens of thousands of units of meaning (on the order of thousands for everyday use) are each represented with a dedicated character, and this large array is in turn mapped onto a much narrower set of just several hundred toned syllables. This is unlike in English, where isolated graphemes usually do not represent meaning in themselves, but only sounds. This mapping of a large set of logograms onto a smaller set of sounds results in a high density of homophony, where any given phonemic (syllabic) representation is likely to map onto multiple characters and hence multiple meanings (Figure 3(A)). This ambiguity incentivizes the development of a direct route of cognitive access from orthography to meaning that is unmediated by phonology (Figure 3(B)), as orthography carries considerably more information than phonology in such a writing system (Perfetti et al., 2013; Tan et al., 2005; Wu et al., 2012).
Figure 3. (A) Scripts with dense homophony (e.g., Chinese), unlike scripts with sparse homophony (e.g., written English), entail a many-to-few mapping from graphemes to phonological form, thus yielding ambiguity if semantic meaning is decoded solely from phonological representations of written language. (B) This structural difference between writing systems plausibly explains existing neuroimaging evidence for stronger parallel encoding of phonological and orthographic representations during reading in Chinese than in English readers (see text for details). Illustrated here are orthographic, phonetic, and semantic representations of “sheep” in English and Chinese. (C) Parallel encoding of semantic meaning may explain our finding of the statistical inseparability of orthographic and verbal imagery (i.e., “ortho-verbal conjunction”) in Chinese but not English readers.
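The many-to-few mapping from written forms to sounds can be made concrete with a small Python sketch that counts how many meanings share each pronunciation. The two toy lexicons, the function names, and the choice of five entries are illustrative assumptions and are not materials from the study; the Chinese entries are five common characters that are all pronounced yáng.

```python
from collections import defaultdict

# Toy lexicons (hypothetical, for illustration only):
# written form -> (pronunciation, gloss)
CHINESE = {
    "羊": ("yáng", "sheep"),
    "阳": ("yáng", "sun"),
    "洋": ("yáng", "ocean"),
    "杨": ("yáng", "poplar"),
    "扬": ("yáng", "raise"),
}
ENGLISH = {
    "sheep": ("ʃiːp", "sheep"),
    "sun": ("sʌn", "sun"),
    "ocean": ("ˈoʊʃən", "ocean"),
    "poplar": ("ˈpɑːplər", "poplar"),
    "raise": ("reɪz", "raise"),
}


def meanings_per_sound(lexicon):
    """Count how many distinct glosses map onto each pronunciation."""
    by_sound = defaultdict(set)
    for _written, (sound, gloss) in lexicon.items():
        by_sound[sound].add(gloss)
    return {sound: len(glosses) for sound, glosses in by_sound.items()}


def mean_ambiguity(lexicon):
    """Average number of meanings a listener must choose among per sound."""
    counts = meanings_per_sound(lexicon)
    return sum(counts.values()) / len(counts)


print("Chinese toy lexicon:", mean_ambiguity(CHINESE))
print("English toy lexicon:", mean_ambiguity(ENGLISH))
```

In the toy Chinese lexicon, the sound alone leaves five candidate meanings while each written character identifies exactly one, which is the sense in which orthography carries more information than phonology in a script with dense homophony.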
Neuroimaging studies reveal a developmental divergence in cortical responses to orthographic input when comparing Chinese- and English-reading children. These data suggest that Chinese readers, compared to English readers, exhibit more sustained activation of visuo-orthographic representations in parallel with phonological representations, and that this sustained activation is subserved by cortical regions such as the superior parietal lobule, the inferior temporal gyrus, and the middle occipital gyrus, all of which are areas involved in visuo-orthographic analysis (Cao et al., 2009, 2010, 2014). This cross-cultural neurocognitive divergence is thus best explained as a difference in the processing demands of the two writing systems, resulting in different learning trajectories. A genetic explanation for this divergence is far less plausible, due to the historical recency of literacy.
These structural differences between the two writing systems may also explain the discrepancy between the orthographic imagery factor that was extracted from the US sample by Roebuck and Lupyan and the composite ortho-verbal factor that was extracted from the Chinese sample in the present study. For the Chinese sample, there was no statistical separation between orthographic imagery and at least some components of internal verbalization, suggesting a stronger coupling between these two modalities of representation (“ortho-verbal conjunction”) compared to the US sample (Figure 3(C)). On the present explanatory account, this coupling stems from the parallel encoding of phonological and orthographic representations during reading, which arises as a learned neurocognitive adaptation to a literacy environment with dense homophony. More broadly, the account predicts that cultural variation in information environments influences cultural variation in the structure of internal representations (Kroupin et al., 2025).
Discursive Versus Non-Discursive Verbalization
It is not self-evident why the internal verbalization-related items subsumed by the ortho-verbal factor appear to share a discursive character. However, previous research using the Varieties of Inner Speech Questionnaire (VISQ and VISQ-R) found that self-reports about inner speech can be decomposed into multiple factors, one of which is a factor for “dialogic” inner speech (Alderson-Day et al., 2018; McCarthy-Jones & Fernyhough, 2011) that overlaps considerably with the discursive items that loaded onto the ortho-verbal factor in the Chinese sample. This proposed sub-division of internal verbalization is consistent with the factor structure extracted in the present study. It also means that it may be possible in a future study to use items from the VISQ to distinguish between dialogic and non-dialogic items, and explore the robustness of the apparent separation of these items in the extracted factor structure.
If internal representations of orthographic symbols are directly linked to atomic units of semantic meaning in Chinese readers (through the “direct” pathway discussed above), then it is plausible that in the same population, internal representations of higher-order orthographic structures such as sentences or paragraphs are directly linked to higher-order units of meaning such as discourse and narrative. In this scenario, Chinese readers would be able to comprehend discursive meaning with relatively less reliance on internal verbalization, whereas in English readers, discursive meaning would be obligatorily tied to internal verbalization. The discursive items of the questionnaire may occupy the same factor as orthographic items in the Chinese factor structure as another downstream consequence of this direct processing pathway.
The visuo-verbal factor also subsumed a number of items that were, in the original US study, associated with internal verbalization. These items appeared to have in common a lack of the discursive component. Speculatively, they may pertain more to the immediate sensation or action of vocalization, rather than discourse, but the number of these items was too small to allow any measured judgment of their collective properties. One of these items, item 14 (“My inner speech helps my imagination”), carries an implicit connection to the visual modality insofar as imagination is commonly construed visually, but the other two do not do so in any obvious manner. It is not clear why visual imagery would be merged with internal verbalization, whether discursive or not, nor whether this is a robust finding in the first place. The organization of this factor will require further study.
Spatial Manipulation
The representational manipulation factor of Roebuck and Lupyan (2020) was reduced to a subset pertaining specifically to spatial manipulation. The coherence of this subset was strong, with the items loading on this spatial manipulation factor having the highest factor loadings among all the questionnaire items. Although the cause of this pattern is unclear, it is plausible that some component of the Chinese cultural environment, such as the educational curriculum, tends to decouple spatial manipulation from other modalities of representational manipulation when compared to the US population.
Broader Outlook and Future Directions
The present study compares the structure of internal representations across samples in the United States, Japan, and the People’s Republic of China, under the hypothesis of a causal role played by writing systems. Although we find preliminary evidence supporting our orthographic hypothesis, additional studies are required for more robust validation and qualification. For example, investigation of Asian populations that employ alphabetic scripts—like Vietnamese, Malaysian Malay, most Indonesian, and Mongolian populations—can help resolve the role of cultural differences other than writing systems as potential confounds. The direction of causality itself also requires validation, as it remains possible that cross-cultural differences in internal representations—driven by some factor other than orthography, for instance genetics or social organization—may account for the form of writing systems, rather than the converse. Investigation of non-literate or minimally literate sub-populations would partly help resolve this ambiguity, as well as confer valuable insights regarding the gradations of the orthographic effect on imagery.
Beyond differences in internal representation induced by logographic vs. alphabetic writing systems, we may observe subtle differences within each system. For example, the “deeper” orthographies of English and French may exhibit signs of ortho-verbal conjunction to a greater extent than “shallower” orthographies like Finnish and Italian, due to denser homophony in the former (Seymour et al., 2003).
Finally, our framework suggests a structural coupling between writing systems and internal representations, thus offering a conceptual inroad toward the inference of population-level changes in mental imagery driven by historical changes in orthography (Han et al., 2022; Kelly et al., 2021; Morin & Koshevoy, 2024). In addition to this possibility of historical inference, the framework also supports the prediction of ongoing and future changes. For example, the widespread adoption of digital interfaces for reading and writing has reportedly precipitated a “character amnesia” among users of Chinese script in recent years (Huang et al., 2021), with plausible consequences for the structure of their internal representations. Artificial intelligence is likely to instigate further, possibly dramatic, changes in literacy practices, thus potentially driving systematic changes in imagery and internal representations (Clark, 2025; Oakley et al., 2025). Although only a first step, our study sketches out this functional relationship between cultural technologies of literacy and our internal mental lives.
Conclusion
Administering the Internal Representations Questionnaire (IRQ) to Chinese and Japanese samples, we obtained evidence about cross-cultural differences in the structure of internal representations. These populations were appropriate for testing the hypothesis that variation in writing systems induces variation in internal representations. A naive comparison of item responses using the factor structure extracted from the original US study demonstrated that respondents from the two East Asian cultures had higher scores for orthographic imagery and lower scores for internal verbalization compared to respondents from the US sample, a finding that is aligned with basic features of their respective writing systems. A confirmatory factor analysis raised doubt about whether the US factors were appropriate for the two new samples, so we performed an exploratory factor analysis on the Chinese data and extracted a 3-factor structure that exhibited notable differences from the US factor structure. The extracted factor structure indicated differences in the organization of internal representations between Chinese and US participants, revealing findings that are consistent with data from cross-cultural studies on the psycholinguistics and functional neuroimaging of reading. In particular, some components of internal verbalization were statistically inseparable from orthographic imagery, suggesting that the two are closely tied together in Chinese but not US participants. This may be a downstream consequence of differences in learned neurocognitive adaptations to their respective writing systems, a process that would reveal the potency of cultural transmission in shaping basic aspects of human psychology that are not readily observed in behavior.
