Sage Journals: Discover world-class research

Abstract

A fundamental question in the social sciences is whether inequality facilitates or hinders economic growth. Before answering it, it is necessary to establish which type of inequality indicator matters most, while controlling for the heterogeneity of countries. This research proposes several novel approaches that use recent advances in machine learning to determine which inequality measure is the key predictor of growth for each group of countries. A panel dataset covering 150 countries from 1980 to 2020 is employed. To account for heterogeneity and to assess feature importance, the K-Means and XGBoost methods are used, respectively. The results show that for a majority of developed and developing countries, wealth inequality is the most influential factor, whereas for a group of pre-communist and underdeveloped countries, income inequality indicators are more strongly associated with growth. Nevertheless, wealth inequality is found to be significant across all groups of countries worldwide.

JEL Classification: E01, O15, O47.

Keywords

inequality, economic growth, machine learning, K-means, XGBoost, heterogeneity

Introduction

Rising income inequality in both developed and developing economies has received significant attention in the past few years. One crucial area of study in economic development is the relationship between inequality and economic growth. Because inequality indicators have recently increased in most countries, this subject has again become a heavily discussed topic in the literature (Neves et al., 2016; Topuz, 2022). Piketty (2014) points out that increases in the income shares of the top 1% or 0.1% of the population have been the main cause of rising inequality in many advanced economies since 1980. Interest in understanding the causes and effects of income inequality has grown recently in both industrialized and developing economies (Benhabib et al., 2017; Brueckner & Lederman, 2018; Kennedy et al., 2017; S.-C. Lin et al., 2009; Y.-C. Lin et al., 2014; Madsen et al., 2018; Piketty, 2014). Over the past three decades, the income gap between rich and poor has widened in the majority of OECD countries (Cohen & Ladaique, 2018). According to Cingano (2014), in the OECD region the ratio of the income of the richest 10% of the population to that of the poorest 10% grew from 7:1 in the 1980s to roughly 10:1 in 2014. Figure 1 maps the change in the wealth Gini coefficient from 1980 to 2020. As shown, wealth inequality has increased significantly in most nations, encompassing about 70% of the global population (Motahar & Mamipour, 2025a; WID, 2021).

Figure 1.

Wealth Gini coefficient (changes from 1980 to 2020).

Since fostering economic growth is the primary aim of economic policy (Mankiw, 2020), it is crucial to examine how inequality indicators affect growth. Atkinson and Bourguignon (2000) pointed out that in the latter half of the 20th century, many economists regarded distributional differences as less important than the overall growth of economic output (Motahar & Mamipour, 2025b). This perspective was largely shaped by the widely accepted notion of an inherent trade-off between growth and equality (Okun, 2015). The prevailing view was that redistributing wealth from the wealthy to the less affluent could impede growth, because the higher taxes and subsidies involved might distort economic incentives and ultimately leave everyone worse off. This effect needs to be investigated empirically and comprehensively, using the latest methodologies and improved datasets.

Several theoretical studies have investigated the relationship between inequality indicators and economic growth through different channels, but their results are inconclusive (Dudzevičiūtė & Prakapienė, 2018). The empirical literature also offers contradictory results, reinforcing the ambiguity of the theoretical models' conclusions (Cartone et al., 2021; Ferreira et al., 2022; Halter et al., 2014; Neves et al., 2016; Neves & Silva, 2014; Pierdzioch et al., 2022). Several factors may explain these contradictory empirical findings, including the lack of comparable data, the neglect of the non-linear nature of the relationship, and the neglect of cross-country heterogeneity. This paper tries to address these deficiencies. Specifically, it seeks to answer the following question: which inequality indicator best predicts economic growth once the heterogeneity of countries and the nonlinearity of the relationship between the variables are taken into account?

To address this question, this paper draws on recent machine learning (ML) techniques, particularly because the relationship between inequality and growth is neither universally agreed upon nor clearly defined (Shen & Zhao, 2023), and the connection may be complex (Mo, 2000; Motahar & Mamipour, 2025a). Data-driven ML approaches are advantageous because they do not require a pre-established model; instead, they automate the process of uncovering insights from data. Figure 2 shows the research steps schematically. First, the countries are clustered into groups using an unsupervised ML algorithm, the K-Means method. Clustering them into more homogeneous groups accounts for their heterogeneity and leads to more accurate results. Second, a supervised ML technique, XGBoost, is used to determine which inequality measure has the highest feature importance in predicting GDP per capita for each country cluster. Finally, the Granger causality test is conducted, both to determine whether the inequality variables identified in the previous step have a statistically significant effect on growth and as a robustness check.

Figure 2.

Research design.

This research adds to the body of literature in several ways:

Traditional research has often pooled all countries' data together, potentially producing inaccurate estimates because of heterogeneity. Following Motahar and Mamipour (2025a), this study addresses this issue by clustering nations into four distinct groups according to inequality metrics and macroeconomic characteristics. This approach yields more homogeneous country groups, improving the precision of the estimates. It also helps identify subgroups with similar patterns, reduces the variability of estimates, and provides economic insights into the similarities and disparities among countries. This method, a first in the field, improves on fixed-effect models, which absorb time-invariant country characteristics only as intercept shifts and assume the same slope coefficients for all countries.

ML methods are increasingly part of economists' statistical toolkit. In this exercise, ML techniques offer several advantages over classical statistical regression methods. The ML approach is particularly well suited when a large number of covariates (p) are highly correlated, as is the case here. In addition, data-driven ML approaches do not require a predefined model; instead, they seek to automate discovery from data. This is particularly advantageous here because there is no universally accepted model of the connection between inequality and growth, which, as many researchers have pointed out, can be complex (Motahar & Mamipour, 2025a).

Income inequality has predominantly been used as the sole index of economic inequality, and, moreover, the net (disposable income) Gini coefficient has been used almost exclusively as the sole indicator of income inequality. However, at least two other major dimensions of economic inequality must also be examined, namely poverty and wealth inequality. Interestingly, some recent papers have found these two largely ignored indices to have an even greater effect on economic growth than income inequality (e.g., Bagchi & Svejnar, 2015; Marrero & Servén, 2022). Moreover, each of these three dimensions has various measures. For instance, the Gini coefficient is not the only measure of income inequality; others include the Atkinson index and the Palma ratio. This paper examines 33 measures of income inequality, poverty, and wealth inequality to select the most relevant variable for each group. To the best of our knowledge, this is the first study on this topic to examine inequality measures this comprehensively.

Our study uses extensive datasets from various sources for 150 countries (1980–2020). It includes 33 inequality measures and 12 control variables, forming the most comprehensive panel in this field. Scholars largely agree that wealth, rather than income, more accurately reflects inequality's impact on growth. Despite this, empirical research has predominantly used income measures because of data limitations; occasionally, wealth distribution has been approximated by land distribution. By contrast, this paper uses the World Inequality Database for direct, comparable wealth inequality data, removing the need for proxies. Table 1 summarizes the gaps in the literature and how this paper attempts to address them.

Table 1.

The Research Gaps and Main Contributions of the Current Study.

#	Identified gaps in previous studies	Contributions of the current study
I	All countries' data pooled in a single panel dataset	Clustering countries into categories according to inequality indicators and key macroeconomic characteristics: more homogeneous groups of countries in each cluster; economic insights into the similarities and disparities of the countries.
II	Classical statistical regression methods	Machine learning methods: ML techniques are well suited to an extensive collection of highly correlated variables (the inequality measures); data-driven ML approaches do not require a predefined model but instead seek to automate discovery from data, deriving insights and determining intricate functions that connect inputs to outputs in order to make predictions on new data (Molina & Garip, 2019).
III	Examining only income inequality and proxying it solely by the Gini coefficient	Examining 33 economic inequality measures from three subgroups: a more comprehensive examination of the relationship (Bagchi & Svejnar, 2015; Marrero & Servén, 2022).
IV	Applied dataset	Extensive and up-to-date datasets from various sources for 150 countries, covering 1980 to 2020: increased statistical power and generalizability; use of the WID, which provides up-to-date and comparable data on wealth inequality without requiring a proxy for wealth, as opposed to the previous literature.

Source. Current research.

Literature Review

Theoretical Review

Theoretical studies have identified various pathways through which income inequality influences economic growth (Motahar & Mamipour, 2025a). Mdingi and Ho (2021) point to a growth-promoting effect of inequality. Kaldor (1955) and Kalecki (1971) contend that inequality leads to greater capital accumulation, since the wealthy save a larger share of their income, which ultimately fuels economic growth. From the perspective of incentives, inequality may also positively affect growth (Bradbury & Triest, 2016; Katz, 1986). In other words, income inequality motivates people to work harder, take financial risks, or further their education in order to reap greater rewards later on. It also encourages them to move to more productive industries, which boosts economic expansion (Cingano, 2014).

Alternatively, researchers have identified several growth-dampening transmission channels for inequality (Luo, 2023). Human capital accumulation and occupational choice are the two most important mediators of the inequality-growth relationship in contemporary economies (Galor, 2011; Galor & Zeira, 1993; Neves & Silva, 2014). Income inequality may hinder the development of human capital and, with it, economic progress. Researchers have shown that when education investments involve fixed costs and credit markets are imperfect, so that poorer individuals cannot borrow to finance their education, individuals' occupational choices depend significantly on the distribution of income. Table 2 summarizes the growth-promoting and growth-dampening channels of inequality. Note that this study evaluates the overall effect of inequality on growth.

Table 2.

Overview of Channels That Enhance or Hinder Growth Due to Inequality.

Channels	Mechanism	Source
Dampening	Underinvestment in human capital; less productive occupational choices; socio-political unrest; undermining democracy; redistributive policies; rent-seeking; demand/consumption deficit	Galor and Zeira (1993), Alesina and Perotti (1996), Stiglitz (2013), Alesina and Rodrik (1994), Banerjee and Duflo (2003), Bertola et al. (2005)
Promoting	Saving/investment; incentivization	Kaldor (1955), Katz (1986)

Source. Current research.

From a socio-political point of view, some scholars argue that the socio-political turmoil brought on by extreme economic inequality could impede growth (Alesina & Perotti, 1996; Mdingi & Ho, 2021; Venieris & Gupta, 1986). Income inequality can result in strikes, criminal activity, and other growth-dampening outcomes. Inequality could also undermine democracy (Stiglitz, 2013): Stiglitz asserts that a greater concentration of wealth corresponds to a greater concentration of power, which might skew policies toward benefiting the wealthy. Inequality may also lower growth by creating a deficit in consumption and demand (Bertola et al., 2005). Political economy models also show that redistributive measures generally sacrifice efficiency for the sake of equity, consequently leading to slower growth (Banerjee & Duflo, 2003). Overall, the theoretical literature highlights the ambiguity of the inequality-growth relationship and the need for more research in this area.

Empirical Review

The conflicting findings in the empirical literature echo the theoretical literature's inconclusive assessment of inequality's impact on growth (Baselgia & Foellmi, 2022; Halter et al., 2014). Empirical findings can be categorized as indicating a negative relationship, a positive relationship, or no or a non-monotonic relationship. Due to space limitations, these findings are summarized in Table A1 in Appendix 1.

Numerous eminent researchers conclude that growth and inequality are positively correlated. H. Li and Zou (1998) extend the Alesina and Rodrik (1994) approach by dividing public spending into consumption and productive services. Using both fixed- and random-effects estimators, they find that income inequality has a positive effect on growth. Forbes (2000) examined the reduced-form inequality-growth relationship in 45 countries from 1966 to 1995 and found a positive relationship between income inequality and growth. Scholl and Klasen (2019) applied Forbes's methodology to a dataset of 122 countries from 1961 to 2012 (Motahar & Mamipour, 2025b). Using estimation techniques such as fixed effects, the Generalized Method of Moments, and instrumental variables, they found a positive inequality-growth relationship. However, once the heterogeneity of countries is considered, this overall conclusion may be inaccurate, as the positive coefficient in the regression was mainly driven by the pre-communist countries.

On the other hand, Alesina and Rodrik (1994) found a negative effect of inequality on growth. Other researchers found no relationship or a non-linear one between the two variables. Using 2SLS and GMM approaches and US data from 1929 to 2013, Benos and Karagiannis (2018) concluded that growth is unaffected by changes in inequality. Furthermore, Castelló-Climent (2010) observed positive and negative effects of inequality on growth in developed and less developed countries, respectively. As Table A1 in Appendix 1 shows, the inequality-growth relationship is intensely debated in the literature. The conflicting empirical findings on this relationship can be explained by various factors, including heterogeneity, non-linearity, the absence of comparable data, and the reliance on proxies (Motahar & Mamipour, 2025a).

Heterogeneity, Non-linearity, Data Limitation, and Proxies

Despite the widespread use of large cross-country datasets, much of the literature ignores cross-country heterogeneity. Most research to date has relied on fixed-effect estimators to eliminate unobserved time-invariant components and on GMM estimators to handle endogeneity. Although such estimators may address endogeneity, they often assume identical parameters across countries, even though parameter heterogeneity is to be expected given socioeconomic and institutional differences among countries, such as economic policies and technological advancement (Hailemariam & Dzhumashev, 2019). The substantial diversity observed in how inequality affects growth across nations suggests that ignoring this heterogeneity and drawing conclusions from average relationships could lead to erroneous policy implications for specific countries (Voitchovsky, 2005).

Another reason for this fragmented picture of the inequality-growth link is the neglect of the complexity and non-linearity of the relationship. Benhabib et al. (2017), Brueckner and Lederman (2018), B.-L. Chen (2003), S.-C. Lin et al. (2009), and Y.-C. Lin et al. (2014) highlighted a non-linear relationship between the two. In particular, inverted U-shaped relationships have been found by B.-L. Chen (2003) and Banerjee and Duflo (2003). Their research indicates that the common empirical practice of assuming a linear relationship between inequality and growth contradicts theories that suggest a non-linear connection. Despite this theoretical support for a non-linear link, the issue has been largely overlooked in many empirical investigations. In contrast, following Motahar and Mamipour (2025a), this study uses ML techniques capable of detecting intricate patterns in the data.

The absence of consensus in the literature on the relationship between income inequality and economic growth can largely be attributed to limitations in the available data, as earlier studies were constrained by the lack of inequality data that are comparable across time and place (Atkinson & Brandolini, 2006; Neves et al., 2016; Neves & Silva, 2014). As Atkinson and Brandolini (2015) point out, many studies rely on inequality time series with discontinuities, which can significantly affect empirical findings. These limitations have led to potential measurement errors and to difficulties in controlling for time-invariant variables, particularly because the time dimension was too short for panel data estimation. This article, by contrast, uses a comprehensive and current dataset from various sources, covering 150 countries from 1980 to 2020.

Another explanation for the lack of consensus in the literature is that while theoretical reasoning is typically based on the distribution of wealth, almost all empirical studies use the distribution of income instead, because wealth data are not available for all countries. Aghion et al. (1999) argue that the use of proxies is imperative in empirical studies because data on wealth distribution are inadequate for numerous countries. The prevalent approach is to use income inequality data as a substitute for wealth inequality.

There are exceptions, such as the studies by Alesina and Rodrik (1994) and Deininger and Olinto (1999), which use land holding as a proxy for wealth inequality. However, as Alesina and Rodrik (1994) observe, land is just one aspect of wealth and doesn’t fully align with their model’s definition of capital. Following Motahar and Mamipour (2025b), this article overcomes these limitations by utilizing the World Inequality Database, which offers direct and comparable information on wealth inequality, removing the necessity for proxies.

Moreover, even when studying the impact of income inequality on growth, empirical studies often use only the disposable income Gini coefficient. However, public discussions frequently focus on the income shares of top earners. The Gini coefficient is most sensitive to changes in the middle of the income distribution and may therefore underrepresent changes in the tails (Atkinson et al., 2011; Juuti, 2020). Measures such as top income shares and share ratios (e.g., the Palma ratio, the income share of the top 10% divided by that of the bottom 40%) are therefore vital for a broader understanding of income inequality. This article aims to bridge these gaps by analyzing 33 indicators of income inequality, poverty, and wealth inequality.
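To make these measures concrete, the following is a minimal Python sketch (NumPy only) of the Gini coefficient, the top 10% income share, the Palma ratio, and the Generalized Entropy index GE(α) that appears in Table 4. It assumes an unweighted sample of individual incomes (strictly positive for GE); the function names are illustrative, and survey data with sampling weights would require weighted versions of each formula.

```python
import numpy as np

def _as_sorted(income):
    """Sort incomes ascending; assumes non-negative values with a positive total."""
    x = np.sort(np.asarray(income, dtype=float))
    if x.size == 0 or x.min() < 0 or x.sum() <= 0:
        raise ValueError("expects non-negative incomes with a positive total")
    return x

def gini(income):
    """Gini coefficient via the rank formula: 2*sum(i*x_(i)) / (n*sum(x)) - (n+1)/n."""
    x = _as_sorted(income)
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)

def top_share(income, top=0.10):
    """Share of total income held by the richest `top` fraction of observations."""
    x = _as_sorted(income)
    k = max(1, int(round(top * x.size)))
    return float(x[-k:].sum() / x.sum())

def bottom_share(income, bottom=0.40):
    """Share of total income held by the poorest `bottom` fraction of observations."""
    x = _as_sorted(income)
    k = max(1, int(round(bottom * x.size)))
    return float(x[:k].sum() / x.sum())

def palma(income):
    """Palma ratio: top 10% share divided by bottom 40% share."""
    return top_share(income, 0.10) / bottom_share(income, 0.40)

def generalized_entropy(income, alpha):
    """GE(alpha) = mean((x/mu)**alpha - 1) / (alpha*(alpha-1)); limits at alpha = 0 and 1."""
    x = np.asarray(income, dtype=float)
    if x.size == 0 or np.any(x <= 0):
        raise ValueError("GE(alpha) here requires strictly positive incomes")
    r = x / x.mean()
    if alpha == 0:
        return float(np.mean(-np.log(r)))        # mean log deviation (Theil L)
    if alpha == 1:
        return float(np.mean(r * np.log(r)))     # Theil T
    return float(np.mean(r ** alpha - 1) / (alpha * (alpha - 1)))

if __name__ == "__main__":
    incomes = list(range(1, 11))                 # hypothetical incomes 1, 2, ..., 10
    print(f"Gini  = {gini(incomes):.2f}")        # → Gini  = 0.30
    print(f"Palma = {palma(incomes):.2f}")       # → Palma = 1.00
    print(f"GE(-1) = {generalized_entropy(incomes, -1):.3f}")
```

For incomes 1, 2, …, 10, the Palma ratio equals 1 because the top decile (10) and the bottom four deciles (1 + 2 + 3 + 4) hold the same total income, while the Gini coefficient summarizes the whole distribution in a single number. Lower (more negative) values of α make GE(α) more sensitive to the bottom of the distribution.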

Research Methodology and Data

The methodology of this study uses an unsupervised machine learning algorithm followed by a supervised one to investigate the relationship between inequality and economic growth. First, the K-means algorithm (Davidson & Ravi, 2005; Likas et al., 2003; Y. Li & Wu, 2012) is used to cluster countries into more homogeneous groups. Second, the XGBoost algorithm (T. Chen & Guestrin, 2016) is used to identify the most important inequality measures for each cluster of countries. Finally, following Baltagi (2008), Hood et al. (2008), and Hurlin (2004), a panel Granger causality test is applied to assess the statistical significance of the relationship and as a robustness check.

K-Means

Following Gordon (1999), the K-means technique, the best-known clustering method, is used rather than hierarchical methods in order to obtain more homogeneous groups. It partitions the observations into a predetermined number of clusters (K) so as to minimize the sum of squared Euclidean distances between each observation and the center of its cluster (Equation 1). The algorithm's popularity stems from its ease of interpretation, simplicity of implementation, speed of convergence, and adaptability to sparse data (Dhillon & Modha, 2001). This unsupervised learning process groups countries by common attributes to reveal internal relationships within the unstructured dataset, as follows in Equation 1:

$\sum_{c = 1}^{K} \sum_{m \in c} \sum_{i = 1}^{N} (X_{im} - \bar{X}_{ic})^{2}$ (1)

Where:

$X_{im}$ = variable $i$ ($i = 1, \dots, N$) for observation $m$ ($m = 1, \dots, M$)

$\bar{X}_{ic}$ = the center of cluster $c$ to which observation $m$ is assigned, that is, the average of $X_{i}$ over all observations in cluster $c$. The $X$s represent the countries in a 45-dimensional feature space (the 33 inequality measures plus the 12 control variables).
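The objective in Equation 1 is usually minimized with Lloyd's algorithm, which alternates between assigning each observation to its nearest center and moving each center to the mean of its assigned observations. The NumPy sketch below illustrates this; the z-score standardization helper, the random initialization, and the restarts (`n_init`) are assumptions chosen for illustration, not details reported in this study.

```python
import numpy as np

def standardize(X):
    """Z-score each feature so variables in different units weigh equally (an assumption)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                              # leave constant columns unscaled
    return (X - X.mean(axis=0)) / sd

def kmeans_objective(X, labels, centers):
    """Equation (1): total squared Euclidean distance of observations to their cluster centers."""
    return float(((X - centers[labels]) ** 2).sum())

def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with several random starts; returns the lowest-objective solution."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assignment step: nearest center for every observation m.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: each center moves to the mean of its members
            # (an empty cluster keeps its previous center).
            new_centers = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Final assignment to the final centers, then evaluate Equation (1).
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        obj = kmeans_objective(X, labels, centers)
        if best is None or obj < best[2]:
            best = (labels, centers, obj)
    return best
```

In this study's setting, the rows of `X` would hold the 45 features described above and `k = 4`, matching the four clusters reported; how each country's time series is summarized into rows is not shown here. Because Lloyd's algorithm only finds a local minimum of Equation 1, keeping the best of several random starts is a common safeguard.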

XGBoost

In the next step, to find the most influential of the 33 inequality indicators, eXtreme Gradient Boosting (XGBoost; T. Chen & Guestrin, 2016) is employed as a model-free ML predictive method in a panel framework spanning 1980 to 2020. XGBoost is an efficient implementation of the gradient boosting framework: a scalable, comprehensive tree-boosting system that is widely used and recognized for its top-tier performance in regression applications (Zhang & Zhan, 2017; Zheng et al., 2017). The method handles overfitting, non-linear relationships, and feature interactions. XGBoost is an ensemble of regression trees (CART) and combines the principles of gradient boosting and decision trees. It differs from the traditional gradient boosting framework introduced by Friedman (2001) by adding a regularization term to its objective function.

Decision Tree: A decision tree builds a model that predicts the label by asking a sequence of if-then-else (true/false) questions about the features, aiming to reach an accurate prediction with as few questions as possible. Decision trees can be used for classification, where they predict a category, or for regression, where they estimate a continuous numerical value. In the simple example in Figure 3, a decision tree predicts a house's price (the label) from its size and number of bedrooms (the features) (NVIDIA, 2025).

Figure 3.

A decision tree regression example.
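The if-then-else splitting behind a regression tree can be sketched in a few lines of NumPy: at each node, every feature and every midpoint between observed values is tried, and the split whose two child nodes have the smallest total squared error is kept; each leaf predicts the mean label of its samples. The house data below are invented for illustration and do not reproduce Figure 3, and the dictionary tree format is a hypothetical representation.

```python
import numpy as np

def best_split(X, y):
    """Try every feature j and every midpoint t between observed values; keep the
    (j, t) whose two child nodes have the smallest total squared error."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if sse < best[2]:
                best = (j, float(t), float(sse))
    return best

def build_tree(X, y, max_depth=2, min_samples=2):
    """Recursive binary tree; each leaf predicts the mean label of its samples."""
    if max_depth == 0 or len(y) < min_samples:
        return {"value": float(y.mean())}
    j, t, _ = best_split(X, y)
    if j is None:                                  # no feature varies within this node
        return {"value": float(y.mean())}
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": build_tree(X[left], y[left], max_depth - 1, min_samples),
            "right": build_tree(X[~left], y[~left], max_depth - 1, min_samples)}

def predict_one(tree, x):
    """Answer the if-then-else questions from the root down to a leaf."""
    while "value" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]

if __name__ == "__main__":
    # Hypothetical houses: [size in m², bedrooms] -> price in thousands.
    X = np.array([[50, 1], [60, 1], [80, 2], [100, 3], [120, 3], [150, 4]], dtype=float)
    y = np.array([150, 160, 220, 300, 330, 420], dtype=float)
    tree = build_tree(X, y, max_depth=2)
    print(tree)
    print(predict_one(tree, np.array([90.0, 2.0])))
```

Limiting `max_depth` keeps each tree shallow, which is the kind of weak learner that gradient boosting combines.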

Gradient Boosting: Gradient boosting builds on the idea of improving a single weak model, such as a shallow decision tree, by combining it with many other weak models to create a strong overall model. It is an advanced form of boosting in which the sequential construction of weak models is structured as a gradient descent algorithm applied to an objective function. Each new model is given targets that reduce the remaining error; these targets are determined by the gradient of the error with respect to the prediction, hence the name gradient boosting (T. Chen & Guestrin, 2016).

Gradient-boosted decision trees (GBDTs) work by sequentially training a series of shallow decision trees, each new tree being trained to correct the errors of the preceding ones. The final prediction is a weighted sum of the individual trees' predictions. Whereas “bagging” is used to reduce variance and prevent overfitting, GBDT's “boosting” approach focuses on reducing bias and avoiding underfitting (NVIDIA, 2025). The primary goal is to minimize a specified loss function by sequentially adding models that correct the errors of the preceding models. XGBoost regression involves the following steps:

1. Initialization: Start with an initial prediction, usually the mean of the target variable for regression tasks, and define the learning rate (η), which controls the contribution of each new tree.

2. Building Trees: Add decision trees sequentially, each attempting to correct the residual errors of the previous trees. Each tree is built using the following steps:

2.1. Compute Residuals: Calculate the residuals (errors) for each data point based on the current model’s predictions.

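2.2. Fit a Tree to Residuals: Train a new decision tree to predict these residuals. For the squared-error loss ½(y − ŷ)², the negative gradient of the loss with respect to the prediction equals the residual y − ŷ, so fitting the residuals amounts to fitting the negative gradient. The gradient (the first derivative of the loss function) gives the direction in which the loss increases most rapidly, while the Hessian (the second derivative of the loss function) provides curvature information about the loss surface, helping to make more precise and faster updates.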

2.3. Update Predictions: Adjust the predictions by adding the new tree’s predictions, scaled by a learning rate parameter.

3. Objective Function: In line with Motahar and Mamipour (2025a), we define an objective function that includes both a loss function (measuring model fit) and a regularization term (penalizing model complexity) to prevent overfitting. The objective function of XGBoost can be expressed as:

$L (θ) = \sum_{i = 1}^{n} ℓ (y_{i}, \hat{y}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})$ (2)

where ℓ is the loss function (e.g., squared error for regression), $y_{i}$ is the observed target value, $\hat{y}_{i}$ is the predicted value, $f_{k}$ is the kth of the K trees, and Ω is the regularization term.
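Steps 1 to 2.3 can be illustrated with plain gradient boosting under squared-error loss, where the negative gradient is simply the residual. The NumPy sketch below is a deliberate simplification, not XGBoost itself: it assumes a single feature, uses depth-1 trees (stumps), has no regularization term Ω, and omits the second-order (Hessian) information that XGBoost uses. The function names are hypothetical.

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 regression tree on a single feature x, fitted to residuals r by least squares."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, thr, left_val, right_val = np.inf, np.inf, r.mean(), r.mean()
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                               # no threshold between equal values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, thr = sse, (xs[i - 1] + xs[i]) / 2
            left_val, right_val = left.mean(), right.mean()
    # With no valid split, thr stays +inf and the stump predicts the mean residual everywhere.
    return lambda z: np.where(z <= thr, left_val, right_val)

def gradient_boost(x, y, n_trees=100, learning_rate=0.1):
    """Steps 1-2.3: start from the mean, then repeatedly fit a stump to the residuals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    f0 = y.mean()                                  # Step 1: initial prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                       # Step 2.1 (= negative gradient of 0.5*(y - yhat)**2)
        tree = fit_stump(x, residuals)             # Step 2.2
        pred = pred + learning_rate * tree(x)      # Step 2.3: shrunken additive update
        trees.append(tree)

    def predict(z):
        z = np.asarray(z, dtype=float)
        out = np.full(len(z), f0)
        for tree in trees:
            out = out + learning_rate * tree(z)
        return out

    return predict
```

Because each stump is a least-squares fit to the current residuals, an iteration with 0 < η < 2 cannot increase the training error; a smaller η needs more trees but typically generalizes better.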

Further equations and their descriptions are given in Table 3.

Table 3.

XGBoost Equations and Descriptions.

Equation	Symbol	Description	No.
$L^{m} = \sum_{i = 1}^{n} l (y_{i}, \hat{y}_{i}^{m}) + \sum_{j = 1}^{m} Ω (f_{j})$	$n$	Number of samples	(3)
	$l (\cdot)$	Loss function (differentiable)
	$\hat{y}_{i}^{m}$	Prediction for sample i at iteration m
	$y_{i}$	Target
$Ω (f) = γ T + \frac{1}{2} λ \sum_{k = 1}^{T} w_{k}^{2}$	$Ω (\cdot)$	Regularization term	(4)
	$T$	Number of leaves
	$w_{k}$	Weight of leaf k
	$γ$, $λ$	Control the degree of regularization
$\hat{y}_{i}^{m} = \hat{y}_{i}^{m - 1} + f_{m} (x_{i})$	$\hat{y}_{i}^{m}$	Prediction at the mth iteration (additive update)	(5)
$L^{m} = \sum_{i = 1}^{n} l (y_{i}, \hat{y}_{i}^{m - 1} + f_{m} (x_{i})) + Ω (f_{m}) + const$	$const$	$\sum_{j = 1}^{m - 1} Ω (f_{j})$, fixed by the trees built up to iteration m − 1	(6)
$\mathcal{L}^{m} = \sum_{i = 1}^{n} [l (y_{i}, \hat{y}_{i}^{m - 1}) + g_{i} f_{m} (x_{i}) + \frac{1}{2} h_{i} f_{m}^{2} (x_{i})] + Ω (f_{m})$		Second-order Taylor approximation of (6), constant omitted	(7)
$g_{i} = \partial_{\hat{y}_{i}^{m - 1}} l (y_{i}, \hat{y}_{i}^{m - 1})$		First-order derivative of the loss function	(8)
$h_{i} = \partial_{\hat{y}_{i}^{m - 1}}^{2} l (y_{i}, \hat{y}_{i}^{m - 1})$		Second-order derivative of the loss function	(9)
$IMP_{F} = \sum_{m = 1}^{M} \sum_{l = 1}^{L - 1} I (F_{m}^{l}, F)$, with $I (F_{m}^{l}, F) = 1$ if $F_{m}^{l} = F$ and $0$ otherwise	$M$	Number of trees (iterations)	(10)
	$F_{m}^{l}$	Feature used for the split at non-leaf node l of tree m
	$L - 1$	Number of non-leaf nodes of the tree
	$L$	Number of leaf nodes of the mth tree
	$I (\cdot)$	Indicator function

Source. Current research.
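To show how Equations 4 and 7 to 9 are used in practice, the worked derivation below, which follows the standard derivation in T. Chen and Guestrin (2016), solves for the optimal leaf weights of a fixed tree structure and the resulting split gain. The notation $q(x_{i})$ (the leaf to which sample i is assigned), $I_{k}$, $G_{k}$, and $H_{k}$ is introduced here for the sketch and does not appear in Table 3.

$f_{m}(x_{i}) = w_{q(x_{i})}, \qquad I_{k} = \{\, i : q(x_{i}) = k \,\}, \qquad G_{k} = \sum_{i \in I_{k}} g_{i}, \qquad H_{k} = \sum_{i \in I_{k}} h_{i}$

Substituting Equation 4 into Equation 7 and dropping the constant term $\sum_{i} l (y_{i}, \hat{y}_{i}^{m - 1})$ groups the objective by leaf:

$\tilde{\mathcal{L}}^{m} = \sum_{k = 1}^{T} \left[ G_{k} w_{k} + \frac{1}{2} (H_{k} + λ) w_{k}^{2} \right] + γ T$

Setting the derivative with respect to each $w_{k}$ to zero gives the optimal leaf weight and the minimized objective:

$w_{k}^{*} = - \frac{G_{k}}{H_{k} + λ}, \qquad \tilde{\mathcal{L}}^{m *} = - \frac{1}{2} \sum_{k = 1}^{T} \frac{G_{k}^{2}}{H_{k} + λ} + γ T$

Splitting one leaf into a left and a right child therefore changes the objective by

$\text{Gain} = \frac{1}{2} \left[ \frac{G_{\text{left}}^{2}}{H_{\text{left}} + λ} + \frac{G_{\text{right}}^{2}}{H_{\text{right}} + λ} - \frac{(G_{\text{left}} + G_{\text{right}})^{2}}{H_{\text{left}} + H_{\text{right}} + λ} \right] - γ$

For the squared-error loss $l = \frac{1}{2} (y_{i} - \hat{y}_{i})^{2}$, $g_{i} = \hat{y}_{i}^{m - 1} - y_{i}$ and $h_{i} = 1$, so

$w_{k}^{*} = \frac{\sum_{i \in I_{k}} (y_{i} - \hat{y}_{i}^{m - 1})}{|I_{k}| + λ}$

With λ = 0 this is the mean residual in the leaf, exactly the tree fit of steps 2.1 and 2.2; λ > 0 shrinks the leaf weight toward zero, and γ is the minimum gain a split must achieve to be kept.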

Given the methodology above, XGBoost regressions handle nonlinearity well because they are built from decision trees. Each tree partitions the data space into regions, yielding flexible piecewise-constant approximations of the relationships between variables without the need to specify a functional form. This tree-based approach allows XGBoost to adapt flexibly to the data structure, capturing intricate patterns and interactions. Unlike traditional econometric methods, which often require predefined functional forms and extensive feature engineering to handle nonlinearity, XGBoost builds and combines trees that iteratively correct earlier errors, effectively modeling complex nonlinear relationships and improving predictive performance.

Finally, in line with the approach of Ma et al. (2020), XGBoost feature importance is used to quantify the contribution of each variable to the model's predictive accuracy. Feature importance in XGBoost measures how strongly each individual variable influences the model's predictions, indicating how useful that variable is within the current model. In the split-count form given in Equation 10, a feature's importance is the number of non-leaf nodes, across all trees, that split on that feature.
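Equation 10 can be computed directly by walking every tree and counting the non-leaf nodes that split on each feature. The pure-Python sketch below assumes a hypothetical nested-dictionary tree format and hand-written example trees with illustrative feature names; it is not output from this study. In the xgboost Python package, `Booster.get_score(importance_type="weight")` reports this same split count, while other importance types such as "gain" weight each split differently.

```python
from collections import Counter

def split_count_importance(trees):
    """Equation (10): for each feature F, count the non-leaf nodes (across all M trees) that split on F."""
    counts = Counter()

    def visit(node):
        if "feature" not in node:          # leaf node: not counted
            return
        counts[node["feature"]] += 1       # indicator I(F_m^l, F) = 1 for this node's feature
        visit(node["left"])
        visit(node["right"])

    for tree in trees:                     # sum over trees m = 1, ..., M
        visit(tree)
    return counts

def rank_features(trees, feature_names):
    """Sort features by split count, highest first; features never used for a split get 0."""
    counts = split_count_importance(trees)
    return sorted(((name, counts[name]) for name in feature_names), key=lambda kv: -kv[1])

if __name__ == "__main__":
    # Two hypothetical trees; feature names are illustrative only.
    trees = [
        {"feature": "wealth_gini", "threshold": 0.7,
         "left": {"value": 0.1},
         "right": {"feature": "gini_disp", "threshold": 0.4,
                   "left": {"value": -0.2}, "right": {"value": 0.3}}},
        {"feature": "wealth_gini", "threshold": 0.8,
         "left": {"value": 0.05}, "right": {"value": -0.1}},
    ]
    print(rank_features(trees, ["gini_disp", "palma", "wealth_gini"]))
    # → [('wealth_gini', 2), ('gini_disp', 1), ('palma', 0)]
```

A split count treats every split equally; gain-based importance instead weights each split by its improvement in the objective, so the two rankings need not coincide.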

Granger Causality

The XGBoost method is employed to capture the complexity of the relationship between inequality and growth because it can model intricate relationships and interactions between variables. However, to infer a statistically significant relationship between the identified inequality measure and economic growth, the Granger causality test is used. The rationale for this choice is twofold. First, ML methods such as XGBoost are primarily predictive; in this research, they identify and rank the features that contribute to accurate predictions. To make inferences about the effect of one variable on another, econometric panel methods such as Granger causality tests are needed to verify whether the variables highlighted by XGBoost have a statistically significant effect (Varian, 2014). Second, XGBoost represents the exploratory phase of this research, in which complex patterns and relationships are uncovered without imposing a specific functional form, whereas the Granger causality test represents the confirmatory phase, in which the key findings are tested within a more constrained, traditional econometric framework. The Granger causality test thus serves as a complementary validation tool, and this dual approach reinforces the robustness and reliability of the findings across methodological paradigms.

The Granger causality test compares two models: the first (restricted) model predicts $y_{t}$ from its own past values, together with a trend and lagged control variables, while the second (unrestricted) model also includes past values of $x_{t}$. If adding the lags of $x_{t}$ significantly improves the fit, $x_{t}$ is said to Granger-cause $y_{t}$. In these equations, $C$ denotes the control variables.

$\text{Model}_{1} : y_{t} = α_{0} + α t + \sum_{i = 1}^{p} α_{i} y_{t - i} + \sum_{i = 1}^{p} γ_{i} C_{t - i} + ϵ_{t}$ (11)

$\text{Model}_{2} : y_{t} = α_{0} + α t + \sum_{i = 1}^{p} α_{i} y_{t - i} + \sum_{i = 1}^{p} γ_{i} C_{t - i} + \sum_{i = 1}^{p} β_{i} x_{t - i} + ϵ_{t}$ (12)

Next, the residual sums of squares of Model 1 ($RSS_{1}$) and Model 2 ($RSS_{2}$) are compared using an F-test, where n is the number of observations:

$F = \frac{(RSS_{1} - RSS_{2}) / p}{RSS_{2} / (n - 2p - 1)}$ (13)

The null hypothesis of Granger non-causality is tested against the alternative:

$H_{0} : β_{1} = β_{2} = \dots = β_{p} = 0$

$H_{1} : β_{i} \neq 0 \text{ for at least one } i \in \{1, \dots, p\}$
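The restricted versus unrestricted comparison in Equations (11) to (13) can be sketched for a single time series as follows. This is an illustration under stated assumptions: ordinary least squares, a linear trend, the same lag order p for every lagged block, and hypothetical function names. It is not the panel test used in this study. The denominator degrees of freedom are computed from the number of estimated parameters, which gives n − 2p − 1 when the trend and controls are dropped.

```python
import numpy as np


def lagged(series, p):
    """Columns hold series lagged 1..p, aligned with observations t = p, ..., T-1."""
    T = len(series)
    return np.column_stack([series[p - i : T - i] for i in range(1, p + 1)])


def granger_f_test(y, x, p=1, controls=None, trend=True):
    """F-test of H0: beta_1 = ... = beta_p = 0 (Model 2 against Model 1).

    Returns (F, df_num, df_den).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y)
    n = T - p  # observations left after lagging
    blocks = [np.ones((n, 1))]  # intercept alpha_0
    if trend:
        blocks.append(np.arange(p, T, dtype=float).reshape(-1, 1))  # trend term alpha * t
    blocks.append(lagged(y, p))  # own lags y_{t-1}, ..., y_{t-p}
    if controls is not None:
        C = np.asarray(controls, dtype=float).reshape(T, -1)
        blocks += [lagged(C[:, j], p) for j in range(C.shape[1])]  # lagged controls
    X1 = np.hstack(blocks)  # Model 1 (restricted)
    X2 = np.hstack([X1, lagged(x, p)])  # Model 2 adds x_{t-1}, ..., x_{t-p}
    target = y[p:]

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        return float(resid @ resid)

    rss1, rss2 = rss(X1), rss(X2)
    df_den = n - X2.shape[1]  # n - 2p - 1 when there is no trend and no controls
    F = ((rss1 - rss2) / p) / (rss2 / df_den)
    return F, p, df_den
```

The p-value is the upper tail of the F(df_num, df_den) distribution. It can be obtained, for example, with `scipy.stats.f.sf(F, df_num, df_den)`.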

This study uses the recent panel Granger non-causality test developed by Juodis et al. (2021), which has several advantages over earlier versions. Most importantly, it is designed for panel data rather than only for single time series. It also accommodates cross-sectional heteroscedasticity and allows multivariate Granger tests without reducing the degrees of freedom. In addition, it allows for cross-sectional dependence, which makes it a more comprehensive and robust Granger causality test.
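The core idea of a half-panel jackknife (HPJ) Wald test can be sketched as follows. This is a simplified illustration and not a reimplementation of Juodis et al. (2021). It assumes:

- a balanced panel;
- common slope coefficients estimated by pooled fixed-effects (within) regression;
- a split of the time dimension into two halves at T // 2;
- a unit-clustered sandwich covariance as a stand-in for the variance estimator in the original paper.

It does not model cross-sectional dependence, and all function names are hypothetical.

```python
import numpy as np


def _within_design(y, x, p):
    """Stack demeaned lags of y and x (p each) for a balanced N x T panel."""
    N, T = y.shape
    Z_rows, targets, groups = [], [], []
    for i in range(N):
        lags = [y[i, p - k : T - k] for k in range(1, p + 1)]
        lags += [x[i, p - k : T - k] for k in range(1, p + 1)]
        Zi = np.column_stack(lags)
        yi = y[i, p:]
        Z_rows.append(Zi - Zi.mean(axis=0))  # within transformation removes unit effects
        targets.append(yi - yi.mean())
        groups.append(np.full(T - p, i))
    return np.vstack(Z_rows), np.concatenate(targets), np.concatenate(groups)


def _within_estimate(y, x, p):
    Z, target, groups = _within_design(y, x, p)
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return coef, Z, target - Z @ coef, groups


def hpj_granger_wald(y, x, p=1):
    """Simplified HPJ Wald test of H0: beta_1 = ... = beta_p = 0.

    Returns (W, dof); under H0, W is approximately chi-square with dof = p.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    half = y.shape[1] // 2
    full, Z, resid, groups = _within_estimate(y, x, p)
    first, *_ = _within_estimate(y[:, :half], x[:, :half], p)
    second, *_ = _within_estimate(y[:, half:], x[:, half:], p)
    theta = 2.0 * full - 0.5 * (first + second)  # half-panel jackknife bias correction

    # Unit-clustered sandwich covariance of the full-sample within estimator.
    bread = np.linalg.inv(Z.T @ Z)
    meat = np.zeros((Z.shape[1], Z.shape[1]))
    for i in np.unique(groups):
        score = Z[groups == i].T @ resid[groups == i]
        meat += np.outer(score, score)
    cov = bread @ meat @ bread

    beta = theta[p:]  # coefficients on the lags of x
    W = float(beta @ np.linalg.solve(cov[p:, p:], beta))
    return W, p
```

The jackknife combination 2θ̂ − (θ̂₁ + θ̂₂)/2 removes the leading O(1/T) bias that the within estimator has in dynamic panels. Because of that bias, the uncorrected pooled estimator is a poor basis for the Wald test when T is moderate.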

Data

The inequality data comprise three groups of economic inequality indicators (income inequality, poverty, and wealth inequality) for 150 countries from 1980 to 2020. The World Income Inequality Database (Atkinson & Brandolini, 2001; Deininger & Squire, 1996; WIID, 2021) is the main source of income inequality data for this study. Table 4 lists the inequality indicators, their descriptions, and their sources.

Table 4.

Inequality Variables.

Index	Indicator	Description	Source
Income Inequality	gini_mkt	Market income Gini coefficient (before taxes and transfers)	SWIID (Solt, 2020)
	gini_disp	Disposable income Gini coefficient (after taxes and transfers)	SWIID (Solt, 2020)
	gem1	GE(−1): Generalized Entropy index with parameter −1; very sensitive to inequality at the bottom of the income distribution	(WIID)
	ge0	GE(0), L-Theil: Generalized Entropy index with parameter 0; sensitive to inequality at the bottom of the income distribution
	ge1	GE(1), H-Theil: Generalized Entropy index with parameter 1; more sensitive to inequality at the top of the income distribution than GE(0)
	ge2	GE(2), 0.5CV²: Generalized Entropy index with parameter 2 (half the squared coefficient of variation); very sensitive to inequality at the top of the income distribution
	a025	Atkinson(0.25): very slightly more sensitive to inequality at the bottom of the income distribution
	a050	Atkinson(0.50): slightly more sensitive to inequality at the bottom of the distribution
	a075	Atkinson(0.75): more sensitive to inequality at the bottom of the distribution
	a1	Atkinson(1): highly sensitive to inequality at the bottom of the distribution
	a2	Atkinson(2): very highly sensitive to inequality at the bottom of the distribution
	Palma	Palma ratio: income share of the top 10% / income share of the bottom 40%
	s80s20	Income share of the top 20% / income share of the bottom 20%
	bottom5	Income share of the bottom 5%
	bottom20	Income share of the bottom 20%
	bottom40	Income share of the bottom 40%
	top5	Income share of the top 5%
	top10	Income share of the top 10%
	top20	Income share of the top 20%
	middle50	Income share of the middle 50%
	Ginia	Absolute Gini: disposable Gini × GDP per capita / 1,000
	Sd	Standard deviation of the absolute Gini
Poverty	Poverty gap at $1.90	Mean shortfall in income from the $1.90-a-day poverty line, as a percentage of the poverty line (2011 PPP) (%)	WB (2021)
	Poverty gap at $3.20	Mean shortfall in income from the $3.20-a-day poverty line, as a percentage of the poverty line (2011 PPP) (%)
	Poverty gap at $5.50	Mean shortfall in income from the $5.50-a-day poverty line, as a percentage of the poverty line (2011 PPP) (%)
	Poverty headcount at $1.90	Percentage of the population living on less than $1.90 a day (2011 PPP) (%)
	Poverty headcount at $3.20	Percentage of the population living on less than $3.20 a day (2011 PPP) (%)
	Poverty headcount at $5.50	Percentage of the population living on less than $5.50 a day (2011 PPP) (%)
Wealth	Wealth Gini	Wealth Gini coefficient	(WID)
	Wealth top 1%	Wealth share of the top 1%
	Wealth top 10%	Wealth share of the top 10%
	Wealth middle 40%	Wealth share of the middle 40%
	Wealth bottom 50%	Wealth share of the bottom 50%

Source. Current research.

Note. The reference year for all data sources is 2021.
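The functions below give the textbook definitions behind several indicators in Table 4. The sketch assumes an unweighted list of individual incomes. The WIID and SWIID figures are built from survey data with weights and other adjustments, so these functions illustrate the formulas rather than reproduce the published series. The function names are illustrative.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def gini(incomes):
    """Gini coefficient from the sorted-rank formula (equal weights)."""
    xs = sorted(incomes)
    n = len(xs)
    ranked = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked / (n * sum(xs)) - (n + 1) / n


def ge(incomes, alpha):
    """Generalized Entropy GE(alpha); incomes must be positive when alpha <= 1."""
    mu = _mean(incomes)
    if alpha == 0:  # mean log deviation (L-Theil)
        return _mean([math.log(mu / x) for x in incomes])
    if alpha == 1:  # Theil T index
        return _mean([(x / mu) * math.log(x / mu) for x in incomes])
    return _mean([(x / mu) ** alpha - 1 for x in incomes]) / (alpha * (alpha - 1))


def atkinson(incomes, eps):
    """Atkinson index A(eps) = 1 - equally distributed equivalent income / mean."""
    mu = _mean(incomes)
    if eps == 1:
        ede = math.exp(_mean([math.log(x) for x in incomes]))  # geometric mean
    else:
        ede = _mean([x ** (1 - eps) for x in incomes]) ** (1 / (1 - eps))
    return 1 - ede / mu


def bottom_share(incomes, q):
    """Share of total income received by the poorest fraction q of people."""
    xs = sorted(incomes)
    k = round(q * len(xs))
    return sum(xs[:k]) / sum(xs)


def top_share(incomes, q):
    """Share of total income received by the richest fraction q of people."""
    xs = sorted(incomes)
    k = round(q * len(xs))
    return sum(xs[len(xs) - k:]) / sum(xs)


def palma(incomes):
    return top_share(incomes, 0.10) / bottom_share(incomes, 0.40)


def s80s20(incomes):
    return top_share(incomes, 0.20) / bottom_share(incomes, 0.20)
```

For example, `gini([1, 2, 3, 4])` is 0.25, and the Palma ratio of the incomes 1 to 10 is exactly 1 because the top person's income equals the combined income of the bottom four.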

As shown in Table 4, the disposable (net) and market income Gini coefficients are taken from the Standardized World Income Inequality Database (SWIID), a new and expanded dataset developed by Solt (2016, 2020). This dataset contains several crucial elements that are missing from the data sources used in earlier research. In particular, the most recent version of the SWIID incorporates the recommendations of Atkinson and Brandolini (2006). These greatly improve the dataset and make it the most comparable source available for researchers studying income inequality across a large number of countries (see Solt, 2016). The six poverty indicators used in this paper are the poverty gap and poverty headcount ratios at thresholds of $1.90, $3.20, and $5.50 per day (2011 PPP). All six are drawn from the World Bank (2021) database, the only reliable poverty dataset with worldwide coverage. Wealth inequality data are taken from the WID.
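Both poverty indicators belong to the Foster-Greer-Thorbecke family. The minimal sketch below assumes an unweighted list of daily per-capita incomes already expressed in 2011 PPP dollars; the World Bank series use survey weights. The function names are illustrative.

```python
def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke index P_alpha for poverty line z.

    Each person below the line contributes ((z - y) / z) ** alpha; others contribute 0.
    alpha = 0 gives the headcount ratio, alpha = 1 the poverty gap index.
    """
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / len(incomes)


def poverty_headcount(incomes, z):
    """Percentage of people living on less than z a day."""
    return 100 * fgt(incomes, z, 0)


def poverty_gap(incomes, z):
    """Mean shortfall from z (non-poor count as zero) as a percentage of z."""
    return 100 * fgt(incomes, z, 1)
```

For daily incomes of 1.0, 1.5, 2.0, 4.0, and 6.0, the headcount at $1.90 is 40% and at $5.50 it is 80%. The $1.90 poverty gap averages the shortfalls 0.9/1.9 and 0.4/1.9 over all five people, giving about 13.7%.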

Note that the WID measures income and wealth distributions primarily from tax statistics. Economists often view such data with skepticism because individuals have incentives to minimize their tax liabilities, so studies that rely on tax data to estimate wealth distributions may inherit the biases in those data. The WID addresses this problem by building its datasets with extensive data on capital and non-capital assets. This approach mitigates some well-known problems of tax data. It also allows the WID compilers to incorporate capital income into their analyses of income and wealth distributions. Following Motahar and Mamipour (2025a), this paper also employs the 12 control variables listed in Table 5.

Table 5.

Control Variables.

Index	Description or proxy	Data source
Education	Average years of education	Barro-Lee
Corruption	Average of multiple corruption measures: absence of corruption, access to justice, executive bribery and corrupt exchanges, executive embezzlement and theft, fair trial, public sector corrupt exchanges, and public sector theft	World Bank
FDI	Foreign direct investment, net inflows (% of GDP)	World Bank
Financial development	Private credit by deposit money banks and other financial institutions (% of GDP)	IMF (2023)
Working age population (%)	Population ages 15–64 (% of total population)	World Bank
Price level of investment	Price level of capital formation	Penn World Table
Inflation	Inflation, consumer prices (annual %)	World Bank
Infrastructure	Average of two measures: fixed telephone subscriptions (per 100 people) and mobile cellular subscriptions (per 100 people)	Penn World Table
Investment (%GDP)	Total value of the gross fixed capital formation and changes in inventories and acquisitions less disposals of valuables (% of GDP)	World Bank
Trade (%GDP)	Sum of exports and imports of goods and services (% of GDP)	World Bank
Initial income	GDP per capita	World Bank
Technology	Patent applications stock by residents	World Bank

Source. Author’s elaboration based on Motahar and Mamipour (2025a).

Results and Discussion

Clustering

The K-means algorithm takes a matrix as input. Each row is a data point to be assigned to a cluster, and each column holds the values of one feature. This study employs 45 features in total: 33 inequality features and 12 primary macroeconomic features, as presented in Tables 4 and 5. Each country, however, has 41 annual observations (1980 to 2020) for each feature. These must be consolidated into a single value per feature for use in the K-means model, so the time-series average of each feature is computed for each country. The other input of the K-means model is K, the number of clusters, and there is no conclusive rule for choosing it. One suggestion is to run the algorithm for a range of values of K and compare the resulting deviations within the clusters (Jain, 2010). The elbow criterion is a related method that evaluates the proportion of variance explained as a function of the number of clusters. Under this criterion, K should be chosen where adding another cluster no longer substantially improves the variance explained. On a plot of the proportion of variance explained against the number of clusters, this point appears as an angle (an "elbow") where the marginal benefit drops. Figure 4 shows the optimal number of clusters. Note that the averaging of the 41 observations applies only to the clustering step. The main methodology, which uses XGBoost to find the most significant feature for each cluster of countries, retains the time dimension of the data.
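The two steps described above, averaging each country's annual observations and tracing the elbow curve, can be sketched in NumPy. The sketch implements Lloyd's K-means algorithm with k-means++ seeding from scratch as a stand-in for a library implementation. The array layout (countries × years × features) and all function names are assumptions.

```python
import numpy as np


def country_means(panel):
    """Collapse a (countries, years, features) panel to one row per country.

    Missing years (NaN) are ignored when averaging.
    """
    return np.nanmean(panel, axis=1)


def _sq_dists(X, centroids):
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++: each new seed is drawn with probability proportional to squared distance
        centroids = X[[rng.integers(len(X))]]
        while len(centroids) < k:
            d2 = _sq_dists(X, centroids).min(axis=1)
            probs = d2 / d2.sum() if d2.sum() > 0 else np.full(len(X), 1 / len(X))
            centroids = np.vstack([centroids, X[rng.choice(len(X), p=probs)]])
        for _ in range(max_iter):
            labels = _sq_dists(X, centroids).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        labels = _sq_dists(X, centroids).argmin(axis=1)  # labels for the final centroids
        inertia = float(_sq_dists(X, centroids)[np.arange(len(X)), labels].sum())
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def elbow_curve(X, k_values):
    """Share of variance explained, 1 - within-cluster SS / total SS, for each K."""
    total = float(((X - X.mean(axis=0)) ** 2).sum())
    return {k: 1 - kmeans(X, k)[2] / total for k in k_values}
```

Plotting `elbow_curve(X, range(1, 9))` against K gives the kind of curve on which the elbow criterion is read. When the curve bends gradually, as the text reports, the criterion alone does not settle K.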

Figure 4.

Optimal number of clusters.

As Figure 4 shows, no number of clusters clearly satisfies the elbow criterion. Preliminary calculations using the agglomerative method to evaluate candidate values of K suggested K = 4. Further calculations for K = 3, 4, and 5, aimed at minimizing the variance within clusters and maximizing it between them, confirmed that K = 4 is appropriate. For comparability, each variable is standardized using the z-score below, where μ and σ are the mean and standard deviation of the variable across countries.

$z = \frac{x - μ}{σ}$

Table 6 reports the K-means clustering results under the specifications above. Each cluster has a centroid, a notional point representing the center of that cluster in the 45-dimensional feature space. DC is each country's distance from the centroid of its cluster.
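The sketch below covers the standardization step and a centroid-distance column like DC in Table 6. It assumes that the distances are Euclidean distances between standardized feature vectors, which the text does not state explicitly. The cluster labels are taken as given from any K-means run, and the function names are illustrative.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores, z = (x - mu) / sigma, with the population std (ddof=0)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant columns become 0 instead of dividing by zero
    return (X - mu) / sigma


def distances_to_centroids(Z, labels):
    """Euclidean distance of each row of Z from the mean (centroid) of its own cluster."""
    labels = np.asarray(labels)
    centroids = {c: Z[labels == c].mean(axis=0) for c in np.unique(labels)}
    return np.array([np.linalg.norm(z - centroids[c]) for z, c in zip(Z, labels)])
```

Standardizing first keeps features measured on large scales, such as GDP per capita, from dominating the Euclidean distances that K-means minimizes.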

Table 6.

Countries Within Each Cluster Along With Their Distances From the Centroids (DC).

Cluster I	DC I	Cluster II	DC II	Cluster III	DC III	Cluster IV	DC IV
Australia	3.69	Albania	3.73	Afghanistan	6.36	Azerbaijan	4.49
Austria	3.76	Argentina	3.32	Angola	7.69	Bulgaria	3.73
Belgium	3.58	Armenia	3.94	Burundi	4.87	Bosnia	3.63
Canada	3.46	Bangladesh	4.56	Benin	2.77	Belarus	5.70
Switzerland	4.01	Bahrain	5.34	Burkina Faso	10.21	China	9.35
Cyprus	7.11	Belize	5.81	Bolivia	5.12	Croatia	4.22
Czechia	5.85	Brazil	6.72	Bhutan	5.48	Kazakhstan	8.25
Germany	3.63	Chile	6.83	Botswana	7.30	Lithuania	5.71
Denmark	3.42	Costa Rica	3.65	Central African Rep.	11.71	Latvia	4.41
Spain	5.30	Djibouti	7.65	Cote d’Ivoire	3.41	Moldova	4.06
Estonia	5.68	Dominican	3.53	Cameroon	2.44	Montenegro	4.13
Finland	4.23	Algeria	3.13	Congo Dem.	8.33	Poland	4.52
France	3.97	Ecuador	3.71	Congo	4.81	Romania	4.43
UK	3.84	Egypt	3.22	Colombia	5.78	Russia	3.56
Greece	5.50	Gabon	5.89	Ethiopia	3.54	Serbia	4.38
Hungary	5.45	Georgia	4.74	Ghana	3.94	Slovakia	7.52
Ireland	5.87	Guyana	4.31	Guinea	3.68	Ukraine	6.10
Iceland	3.95	Indonesia	3.88	Gambia	5.10
Israel	5.29	Iran	3.18	Guinea-Bissau	6.71
Italy	6.58	Iraq	4.66	Guatemala	3.80
Japan	6.23	Jamaica	4.34	Honduras	3.51
South Korea	5.87	Jordan	2.55	Haiti	5.14
Luxembourg	6.25	Kyrgyzstan	4.81	India	5.56
Malta	9.24	Cambodia	4.16	Kenya	2.69
Netherlands	3.06	Kuwait	5.67	Liberia	6.29
Norway	3.87	Laos	5.18	Madagascar	3.15
Portugal	5.36	Lebanon	4.15	Mali	4.51
Qatar	7.97	Sri Lanka	3.14	Mozambique	5.45
Slovenia	5.23	Morocco	2.57	Mauritania	4.24
Sweden	3.10	Maldives	6.47	Malawi	4.46
United States	10.27	Mexico	5.42	Namibia	9.59
		No. Macedonia	3.84	Niger	3.58
		Myanmar	5.50	Nigeria	5.46
		Mongolia	3.46	Nicaragua	6.27
		Mauritius	2.87	Nepal	4.75
		Malaysia	3.66	Peru	6.52
		Oman	5.16	Rwanda	4.35
		Pakistan	5.66	Sudan	4.28
		Panama	5.92	Senegal	2.92
		Philippines	3.20	Sierra Leone	3.60
		Papua New Guinea	6.76	Chad	3.04
		Paraguay	5.50	Togo	3.83
		Saudi Arabia	5.48	Tanzania	4.25
		El Salvador	4.19	Uganda	2.32
		Syria	4.12	Zambia	6.52
		Thailand	4.36	Zimbabwe	9.13
		Tajikistan	5.21
		Turkmenistan	5.12
		Trinidad	3.50
		Tunisia	2.24
		Turkey	4.12
		Uruguay	3.45
		Uzbekistan	13.03
		Venezuela	10.38
		Vietnam	3.50
		Yemen	4.41

Source. Author’s calculations.

The first cluster mainly comprises developed countries, based on the International Monetary Fund (IMF) list of advanced economies, although the two lists differ in some respects. Qatar, for instance, belongs to the first cluster even though the IMF does not classify it as an advanced economy. The second cluster can be described as the developing world, as it mainly includes emerging economies, although the IMF considers Myanmar, Bangladesh, and Cambodia low-income countries. The third cluster can be characterized as underdeveloped countries. The most notable mismatch here is India, which the IMF classifies as a developing country. Remarkably, the K-means method produced a fourth cluster consisting solely of pre-communist (that is, former communist) countries. Some former Eastern Bloc countries, such as Tajikistan and Armenia, nevertheless fall into other clusters. China is the largest economy in the fourth cluster.

The clustering analysis reveals an important pattern: countries within each cluster are similar not only in their development features but also in their inequality measures. This implies that clustering based solely on macroeconomic features, or solely on inequality measures, would yield very similar results. To verify this, we varied the weight of the control variables in steps of 0.1 from 0.1 to 1, and then in steps of 1 from 1 to 10. The K-means clustering outcome remained broadly consistent with the unweighted results reported above. Some discrepancies did appear, as in the case of the US, which, despite being a developed country, has a much higher level of inequality than the average developed country. In short, these findings indicate that developed countries, with some exceptions, are also similar to each other in terms of economic inequality. The same holds for the developing, underdeveloped, and pre-communist countries.
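The weight-sensitivity check can be sketched as follows. The standardized control-variable columns are scaled by each weight, which multiplies their contribution to squared Euclidean distance by the squared weight. The data are then re-clustered, and each new partition is compared with the baseline partition. The comparison uses the adjusted Rand index, which is our assumption because the text does not name a comparison measure. `cluster_fn` is a hypothetical callback, for example a K-means run with K = 4.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings (1 = identical partitions)."""
    n = len(a)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def weight_grid():
    """0.1, 0.2, ..., 1.0 followed by 2, 3, ..., 10 (19 weights)."""
    return [round(0.1 * i, 1) for i in range(1, 11)] + [float(w) for w in range(2, 11)]


def weight_sensitivity(rows, control_cols, cluster_fn, baseline_labels):
    """ARI between the baseline partition and the partition at each control weight.

    rows: standardized feature vectors (one per country);
    control_cols: indices of the control-variable columns to re-weight;
    cluster_fn: hypothetical callback mapping rows to cluster labels.
    """
    results = {}
    for w in weight_grid():
        weighted = [[v * w if j in control_cols else v for j, v in enumerate(r)] for r in rows]
        results[w] = adjusted_rand_index(baseline_labels, cluster_fn(weighted))
    return results
```

The index ignores how clusters are numbered, so a re-run that returns the same groups under different cluster numbers still scores 1.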

Feature Importance

To benchmark the accuracy of different ML methods, Table 7 reports their out-of-sample performance in predicting GDP per capita growth. As the first column indicates, the error metrics are the out-of-sample mean squared error (MSE), the out-of-sample mean absolute error (MAE), and R². Lower MSE and MAE and higher R² indicate better performance. C denotes cluster.

Table 7.

Statistical Error Measures of the Employed Models.

Model	XGBoost				Random Forest				Lasso (alpha = .2)
Metric	CI	CII	CIII	CIV	CI	CII	CIII	CIV	CI	CII	CIII	CIV
MSE	10.53	6.55	11.53	10.76	24.89	11.68	20.29	26.41	40.47	13.37	37.56	44.12
MAE	2.003	1.272	2.104	2.012	3.121	2.371	3.138	2.145	3.971	2.340	3.726	2.156
R²	.455	.731	.450	.421	.212	.481	.096	.564	.082	.164	.039	.112

Source. Author’s calculations.

Note. MSE = mean squared out-of-sample error; MAE = mean absolute out-of-sample error; C = cluster.
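The three error metrics in Table 7 can be computed as follows. The hold-out split by year is only one possible out-of-sample design and is an assumption, since the text does not specify how the test sample was drawn. The function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error over the hold-out observations."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error over the hold-out observations."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken around the hold-out mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


def time_split(years, last_train_year):
    """Indices of training (year <= cutoff) and hold-out (year > cutoff) observations."""
    train = [i for i, y in enumerate(years) if y <= last_train_year]
    test = [i for i, y in enumerate(years) if y > last_train_year]
    return train, test
```

For actual growth rates of 1, 2, 3, and 4 and predictions of 1.5, 2, 2.5, and 4, these give an MSE of 0.125, an MAE of 0.25, and an R² of 0.9.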

XGBoost achieves the lowest out-of-sample MSE and MAE in every cluster, so it is selected for the feature importance evaluation in this research. This research also deliberately avoids principal component analysis (PCA). PCA captures only linear structure, since each component is a linear combination of the original variables, and its components do not preserve the interpretability of the original features.

XGBoost does not require a predefined functional form. Using it, GDP per capita growth is regressed on each inequality measure in turn, together with the control variables, within a panel data framework, to find the feature that best forecasts growth in each cluster. The inequality measures and control variables are listed in Tables 4 and 5, respectively. An XGBoost model has model-related parameters (hyperparameters), such as tree depth and the number of trees, which determine the structure and complexity of the model. Finding optimal hyperparameter values (hyperparameter tuning) is vital to an ML model's overall performance, because an untuned model can produce inaccurate results and higher prediction errors. Following Motahar and Mamipour (2025a), this research uses grid search to determine the best hyperparameter configuration for the XGBoost model, employing the GridSearchCV tool from the Scikit-learn library in Python. Grid search systematically examines all combinations of the predefined hyperparameter values and selects the configuration with the lowest cross-validated root mean squared error. To account for each country's institutional particularities, a country-code variable is included and converted into binary indicator columns by one-hot encoding. Each country has its own column, which equals 1 for that country's observations and 0 otherwise. After tuning, each feature's split weight (the number of splits that use it) and average gain (the average loss reduction of those splits) are obtained. These are then normalized to give the weight-based and gain-based relative importance scores, respectively.
The scores measure the relative contribution of each feature to the accuracy of the XGBoost predictive model, with higher scores indicating greater relative importance.
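The one-hot encoding of country codes and the exhaustive grid search described above can be written out in plain Python. The paper uses scikit-learn's `GridSearchCV`, which performs the same search with cross-validation. Here `validation_rmse` is a hypothetical callback that fits the model with a given hyperparameter combination and returns its validation root mean squared error.

```python
from itertools import product


def one_hot(codes):
    """Map each observation's country code to a 0/1 indicator row (one column per country)."""
    categories = sorted(set(codes))
    index = {c: j for j, c in enumerate(categories)}
    rows = []
    for c in codes:
        row = [0] * len(categories)
        row[index[c]] = 1
        rows.append(row)
    return categories, rows


def grid_search(param_grid, validation_rmse):
    """Evaluate every combination in param_grid and return the one with the lowest RMSE.

    validation_rmse(params) is a hypothetical callback that fits the model with
    these hyperparameters and returns its (cross-)validated RMSE.
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = validation_rmse(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For example, `grid_search({"max_depth": [3, 5, 7], "n_estimators": [100, 300], "learning_rate": [0.05, 0.1]}, validation_rmse)` evaluates all 12 combinations. The keys are standard `XGBRegressor` hyperparameter names, but the grid values are illustrative. With `GridSearchCV`, the corresponding choice is `scoring="neg_root_mean_squared_error"`.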

Figure 5 presents the feature importance of all the inequality measures for each cluster. The inequality features are divided into three groups (income inequality, poverty, and wealth inequality measures), shown in orange, red, and crimson, respectively. The feature with the highest importance is marked with a green tick. In clusters I and II, almost all the wealth inequality measures are among the most influential factors in the XGBoost analysis. The wealth Gini (cluster I) and the wealth share of the top 1% (cluster II) are the most important. These results align with earlier work arguing, on theoretical grounds, that wealth inequality is a more relevant measure than income inequality for assessing the influence of inequality on growth (Aghion et al., 1999).

Figure 5.

Feature importance of inequality measures in each cluster.

Empirical work by Alesina and Rodrik (1994), Birdsall and Londoño (1997), and Deininger and Olinto (1999) shows that when both income and wealth (asset) inequality are taken into account, the coefficient on the income Gini frequently loses its significance. Bagchi and Svejnar (2015) examined all three categories of inequality simultaneously. Their results suggest that when proxies for wealth inequality, income inequality, and poverty are included in the same panel regression, wealth inequality remains significant while the other two measures become insignificant. Their panel mainly consisted of developed and developing countries.

In the third cluster, by contrast, the disposable income Gini is the most important predictor of economic growth. This cluster consists mainly of underdeveloped African economies, where income inequality is extremely high (WID, 2021), so it is not surprising that income inequality plays a major role here. In the fourth cluster, XGBoost ranks several income inequality measures, such as ge1, the Palma ratio, and the income share of the top 20%, above the wealth inequality measures. This finding could be explained by the low level of wealth inequality in the fourth cluster (WID, 2021). One possible reason is that state-owned rental housing was quickly sold off after the end of socialism, leading to very high homeownership rates in the former socialist countries (Ronald, 2008). Further investigation is required to interpret the results for the third and fourth clusters.

To assess the robustness of these results, the random forest algorithm was also applied, and it yielded results comparable to those of XGBoost. The most important feature remained the same in all clusters, even when the time lags of the variables were varied (Motahar & Mamipour, 2025a). We also ran separate analyses that included all 33 inequality measures in the model simultaneously, as well as each measure individually alongside the control variables. Neither variation substantially altered our findings, because the predictive accuracy of XGBoost is generally robust to multicollinearity. These consistent findings indicate that the identified feature is a robust predictor of the target variable. Its importance is not unduly influenced by the choice of algorithm, the time lag, or the inclusion of additional inequality measures. Because the inequality measures are highly correlated and all reflect the level of economic inequality across countries, it is not surprising that their feature importances do not differ greatly.

Finally, poverty measures have low feature importance in the first and fourth clusters, in contrast to the second and third clusters. This could be attributed to the very low poverty levels in the developed and pre-communist countries. Beyond this, the main finding of this research is that two wealth inequality measures, the wealth Gini and the wealth share of the top 10%, are the only inequality indicators with a relative feature importance above 5% in all four clusters. This result is in line with the literature discussed earlier, such as Bagchi and Svejnar (2015), which finds wealth inequality to be the most significant economic inequality index affecting economic growth.

Granger Causality

Because XGBoost is a purely predictive approach, it cannot by itself establish a significant relationship between the variables identified in the previous step and the target variable. For the Granger causality results to be valid, the series must be stationary. Stationarity testing indicates that the wealth Gini and the top 1% wealth share are stationary in their respective clusters, whereas the disposable Gini and the Palma ratio are nonstationary in levels. The latter two variables are therefore used in first differences. The Granger method of Juodis et al. (2021) used here allows for cross-sectional dependence and accommodates heteroscedasticity. The Granger non-causality test is performed four times, once per cluster. Each test examines whether the selected measure Granger-causes GDP per capita growth in its cluster: the wealth Gini (cluster I), the top 1% wealth share (cluster II), the disposable Gini (cluster III), and the Palma ratio (cluster IV). The results are reported in Table 8.

Table 8.

Granger Causality Tests Results.

Cluster	Null hypothesis	HPJ Wald test	p-Value	Result
I	Wealth Gini does not Granger-cause GDP growth	3.8745	.0490	Rejected
II	Top 1% share of wealth does not Granger-cause GDP growth	4.9020	.0268	Rejected
III	Disposable Gini does not Granger-cause GDP growth	3.2854	.0699	Rejected
IV	Palma ratio does not Granger-cause GDP growth	10.4437	.0012	Rejected

Source. Author’s calculations.

The null hypothesis of Granger non-causality is rejected in all four clusters at the 10% significance level, and at the 5% level for clusters I, II, and IV. In each cluster, therefore, the selected inequality measure Granger-causes GDP per capita growth, and a statistically significant predictive relationship between the variables can also be inferred.
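As an arithmetic check, the p-values in Table 8 can be recovered from the HPJ Wald statistics by referring each statistic to a χ² distribution with one degree of freedom, for which P(X > w) = erfc(√(w/2)). The one-degree-of-freedom assumption is our inference from the reported numbers; the text does not state it.

```python
from math import erfc, sqrt


def chi2_1_pvalue(w):
    """Upper-tail p-value of a chi-square(1) statistic: P(X > w) = erfc(sqrt(w / 2))."""
    return erfc(sqrt(w / 2.0))


# HPJ Wald statistics from Table 8, keyed by cluster.
table8 = {"I": 3.8745, "II": 4.9020, "III": 3.2854, "IV": 10.4437}
for cluster, w in table8.items():
    p = chi2_1_pvalue(w)
    verdict = "reject at 5%" if p < 0.05 else "reject at 10%" if p < 0.10 else "do not reject"
    print(cluster, round(p, 4), verdict)
```

The check shows that cluster III (p ≈ .0699) is the only case rejected at the 10% level but not at the 5% level.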

Conclusion and Policy Implications

This study employed machine learning techniques to investigate how inequality forecasts growth in a sample of 150 countries from 1980 to 2020. The empirical approach considers several inequality measures, the nonlinearity and complexity of the relationships among the variables, and the heterogeneity of the countries in the panel. In this respect, it departs significantly from the conventional panel data models commonly used in the literature. A major finding is that countries within each cluster share not only similar development characteristics but also remarkably similar levels of inequality.

The paper's most notable finding is that wealth inequality is a comparatively important inequality index for forecasting GDP per capita growth in all the country clusters. Previous empirical investigations covering both developing and developed countries have consistently found a detrimental association between wealth inequality and economic growth. It can therefore be concluded that policy debates on the impact of inequality on growth should prioritize wealth redistribution rather than income redistribution and poverty reduction. Wealth inequality appears to be more strongly associated with growth than income inequality and poverty, especially in the developing and developed world.

In terms of social policy, the results on which inequality indicators best forecast economic growth, obtained while allowing for heterogeneity and nonlinearity, have a clear implication. Where wealth inequality is the most significant inequality index, the debate on economic growth should focus more on wealth distribution than on income distribution. The much higher level of wealth inequality across countries underscores this urgency: the global average income Gini coefficient is approximately 0.39, whereas the corresponding average wealth Gini coefficient is 0.76. In other words, wealth is considerably more concentrated at the top than income is. Further research in this area is needed, especially to estimate the impact of the significant inequality measures identified in this paper on growth in each country cluster.

Footnotes

The authors would like to express their sincere appreciation to the Editor-in-Chief and anonymous referees for their helpful comments and suggestions which tremendously improved the quality of the paper.

ORCID iD

Masoud Yahoo

Ethical Considerations

Ethics Committee Approval and Informed Consent Approval were not needed for this study because this research does not involve interaction with or observation of people or the use of people's data, and does not include human subjects. Accordingly, this study does not include any clinical study and does not involve experimentation on animals.

Author Contributions

Seyed Armin Motahar: conceptualization, model analysis, data curation, writing the initial draft; Masoud Yahoo: model analysis, research and investigation, writing, review, editing, and presentation of the published work.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the MPOB-UKM endowment chair, Universiti Kebangsaan Malaysia (grant code: MPOB-UKM-2023-008).

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data and materials will be made available on request.

Generative AI and AI-Assisted Technologies in the Writing Process

The authors declare that they have not used generative artificial intelligence (AI) or AI-assisted technologies in the writing process.

References

Acheampong

A. O.

Adebayo

T. S.

Dzator

Koomson

(2023). Income inequality and economic growth in BRICS: Insights from non-parametric techniques. Journal of Economic Inequality, 21(3), 619–640.

Aghion

Caroli

Garcia-Penalosa

(1999). Inequality and economic growth: the perspective of the new growth theories. Journal of Economic Literature, 37(4), 1615–1660.

Albig

Clemens

Fichtner

Gebauer

Junker

Kholodilin

(2017). How rising income inequality influenced economic growth in Germany. DIW Economic Bulletin, 7(10), 113–121.

Alesina

Perotti

(1996). Income distribution, political instability, and investment. European Economic Review, 40(6), 1203–1228.

Alesina

Rodrik

(1994). Distributive politics and economic growth. Quarterly Journal of Economics, 109(2), 465–490.

Atkinson

A. B.

Bourguignon

(2000). Introduction: Income distribution and economics. In Atkinson

A. B.

Bourguignon

(Eds.), Handbook of income distribution (Vol. 1, pp. 1–58). Elsevier B. V.

Atkinson

A. B.

Brandolini

(2001). Promise and pitfalls in the use of “secondary” data-sets: Income inequality in OECD countries as a case study. Journal of Economic Literature, 39(3), 771–799.

Atkinson

A. B.

Brandolini

(2006). From earnings dispersion to income inequality. In Farina

Savaglio

(Eds.), Inequality and economic integration (pp. 35–62). Routledge.

Atkinson

A. B.

Brandolini

(2015). Unveiling the ethics behind inequality measurement: Dalton's contribution to economics. Econometrics Journal, 125(583), 209–234.

10.

Atkinson

A. B.

Piketty

Saez

(2011). Top incomes in the long run of history. Journal of Economic Literature, 49(1), 3–71.

11.

Babu

M. S.

Bhaskaran

Venkatesh

(2016). Does inequality hamper long run growth? Evidence from emerging economies. Economic Analysis and Policy, 52, 99–113.

12.

Bagchi

Svejnar

(2015). Does wealth inequality matter for growth? The effect of billionaire wealth, income distribution, and poverty. Journal of Comparative Economics, 43(3), 505–530.

13.

Baltagi

B. H.

(2008). Econometric analysis of panel data (Vol. 4). Springer.

14.

Banerjee

A. V.

Duflo

(2003). Inequality and growth: What can the data say? Journal of Economic Growth, 8, 267–299.

15.

Barro

R. J.

(2000). Inequality and growth in a panel of countries. Journal of Economic Growth, 5, 5–32.

16.

Baselgia

Foellmi

(2022). Inequality and growth: A review on a great open debate in economics. WIDER Working Paper.

17.

Benhabib

Bisin

Luo

(2017). Earnings inequality and other determinants of wealth inequality. American Economic Review, 107(5), 593–597.

18.

Benos

Karagiannis

(2018). Inequality and growth in the united states: Why physical and human capital matter. Economic Inquiry, 56(1), 572–619.

19.

Berg

Ostry

J. D.

Tsangarides

C. G.

Yakhshilikov

(2018). Redistribution, inequality, and growth: New evidence. Journal of Economic Growth, 23, 259–305.

20.

Bertola

Foellmi

Zweimüller

(2005). Income distribution in macroeconomic models. Princeton University Press.

21.

Bhorat

van der Westhuizen

(2008). Economic growth, poverty and inequality in South Africa: The first decade of democracy. In Development Policy Research Unit Conference (Vol. 5, No. 1).

22.

Birdsall

Londoño

J. L.

(1997). Asset inequality matters: An assessment of the World Bank’s approach to poverty reduction. American Economic Review, 87(2), 32–37.

23.

Bradbury

Triest

R. K.

(2016). Inequality of opportunity and aggregate economic performance. RSF: The Russell Sage Foundation Journal of the Social Sciences, 2(2), 178–201.

24.

Braun

Parro

Valenzuela

(2019). Does finance alter the relation between inequality and growth? Economic Inquiry, 57(1), 410–428.

25.

Brueckner

Lederman

(2018). Inequality and economic growth: The role of initial income. Journal of Economic Growth, 23, 341–366.

26.

Cartone

Postiglione

Hewings

G. J. D.

(2021). Does economic convergence hold? A spatial quantile analysis on European regions. Economic Modelling, 95, 408–417.

27.

Castelló-Climent

(2010). Inequality and growth in advanced economies: An empirical investigation. Journal of Economic Inequality, 8, 293–321.

28.

Chambers & Krause (2010). Is the relationship between inequality and growth affected by physical and human capital accumulation? Journal of Economic Inequality, 8, 153–172.

29.

Charles-Coll, J. A. (2013). The debate over the relationship between income inequality and economic growth: Does inequality matter for growth? Research in Applied Economics, 5(2), 1.

30.

Chen, B.-L. (2003). An inverted-U relationship between inequality and long-run growth. Economics Letters, 78(2), 205–212.

31.

Chen & Guestrin (2016). XGBoost: A scalable tree boosting system [Conference session]. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

32.

Cingano (2014). Trends in income inequality and its impact on economic growth. OECD Social, Employment and Migration Working Papers, No. 163. OECD Publishing, Paris.

33.

Clarke, G. R. G. (1995). More evidence on income distribution and growth. Journal of Development Economics, 47(2), 403–427.

34.

Cohen & Ladaique (2018). Drivers of growing income inequalities in OECD and European countries. In Carmo, Rio, & Medgyesi (Eds.), Reducing inequalities (pp. 31–43). Palgrave Macmillan.

35.

Davidson & Ravi (2005). Clustering with constraints: Feasibility issues and the k-means algorithm [Conference session]. Proceedings of the 2005 SIAM International Conference on Data Mining.

36.

Deininger & Olinto (1999). Asset distribution, inequality, and growth. World Bank.

37.

Deininger & Squire (1996). A new data set measuring income inequality. World Bank Economic Review, 10(3), 565–591.

38.

Dhillon, I. S., & Modha, D. S. (2001). Concept decompositions for large sparse text data using clustering. Machine Learning, 42, 143–175.

39.

Dudzevičiūtė & Prakapienė (2018). Investigation of the economic growth, poverty and inequality inter-linkages in the European Union countries. Journal of Security and Sustainability Issues, 7(4), 839–854.

40.

Dutt & Tsetlin (2021). Income distribution and economic development: Insights from machine learning. Economics and Politics, 33(1), 1–36.

41.

El-Shagi & Shao (2019). The impact of inequality and redistribution on growth. Review of Income and Wealth, 65(2), 239–263.

42.

Ferreira, I. A., Gisselquist, R. M., & Tarp (2022). On the impact of inequality on growth, human development, and governance. International Studies Review, 24(1), viab058.

43.

Forbes, K. J. (2000). A reassessment of the relationship between inequality and growth. American Economic Review, 90(4), 869–887.

44.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451

45.

Galor (2011). Inequality, human capital formation, and the process of development. In Hanushek, E. A., Machin, & Woessmann (Eds.), Handbook of the economics of education (Vol. 4, pp. 441–493). Elsevier.

46.

Galor & Zeira (1993). Income distribution and macroeconomics. Review of Economic Studies, 60(1), 35–52.

47.

Gordon, A. D. (1999). Classification. CRC Press.

48.

Gründler & Scheuermeyer (2015). Income inequality, economic growth, and the effect of redistribution. Würzburg Economic Papers No. 95. University of Würzburg, Department of Economics.

49.

Hailemariam & Dzhumashev (2019). Income inequality and economic growth: Heterogeneity and nonlinearity. Studies in Nonlinear Dynamics and Econometrics, 24(3), 20180084.

50.

Halter, Oechslin, & Zweimüller (2014). Inequality and growth: The neglected time dimension. Journal of Economic Growth, 19, 81–104.

51.

Herzer & Vollmer (2012). Inequality and growth: Evidence from panel cointegration. Journal of Economic Inequality, 10, 489–503.

52.

Hood, M. V., Kidd, & Morris, I. L. (2008). Two sides of the same coin? Employing Granger causality tests in a time series cross-section framework. Political Analysis, 16(3), 324–344.

53.

Hou (2020). Three essays on income inequality and economic growth. The University of Texas at Dallas.

54.

Hurlin (2004). Testing Granger causality in heterogeneous panel data models with fixed coefficients. Document de recherche LEO, 5, 1–31.

55.

IMF. (2023). Financial development index database. IMF.

56.

Iradian (2005). Inequality, poverty, and growth: Cross-country evidence. IMF Working Paper, WP/06/229.

57.

Iyke, B. N., & S. Y. (2017). Income inequality and growth: New insights from Italy. Economia Internazionale/International Economics, 70(4), 419–442.

58.

Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666.

59.

Juodis, Karavias, & Sarafidis (2021). A homogeneous approach to testing for Granger non-causality in heterogeneous panels. Empirical Economics, 60(1), 93–112.

60.

Juuti (2020). Inequality and economic growth: A method-dependent relationship driven by the measure of income inequality? SSRN 3575624.

61.

Kaldor (1955). Alternative theories of distribution. Review of Economic Studies, 23(2), 83–100.

62.

Kalecki (1971). Selected essays on the dynamics of the capitalist economy 1933–1970. CUP Archive.

63.

Katz, L. F. (1986). Efficiency wage theories: A partial evaluation. NBER Macroeconomics Annual, 1, 235–276.

64.

Kennedy, Smyth, Valadkhani, & Chen (2017). Does income inequality hinder economic growth? New evidence using Australian taxation statistics. Economic Modelling, 65, 119–128.

65.

Knowles (2005). Inequality and economic growth: The empirical relationship reconsidered in the light of comparable data. Journal of Development Studies, 41(1), 135–159.

66.

Zou (1998). Income inequality is not harmful for growth: Theory and evidence. Review of Development Economics, 2(3), 318–334.

67.

Likas, Vlassis, N. J., & Verbeek (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451–461.

68.

Lin, S.-C., Huang, H.-C., Kim, D.-H., & Yeh, C.-C. (2009). Nonlinearity between inequality and growth. Studies in Nonlinear Dynamics and Econometrics, 13(2), 1–18.

69.

Lin, Y.-C., Huang, H.-C., & Yeh, C.-C. (2014). Inequality-growth nexus along the development process. Studies in Nonlinear Dynamics and Econometrics, 18(3), 237–252.

70.

(2012). A clustering method based on K-means algorithm. Physics Procedia, 25, 1104–1109.

71.

Luo (2023). Inequality and economic growth: A literature review. In Luo (Ed.), Inequality, demography and fiscal policy (pp. 123–131). Springer.

72.

Madsen, J. B., Islam, M. R., & Doucouliagos (2018). Inequality, financial development and economic growth in the OECD, 1870–2011. European Economic Review, 101, 605–624.

73.

Cheng, J. C. P., Chen, Lin, & Jiang (2020). Identification of the most influential areas for air pollution control using XGBoost and grid importance rank. Journal of Cleaner Production, 274, 122835.

74.

Majeed, M. T. (2016). Economic growth and income inequality nexus: An empirical analysis for Pakistan. Kashmir Economic Review, 25(1).

75.

Malinen (2013). Inequality and growth: Another look with a new measure and method. Journal of International Development, 25(1), 122–138.

76.

Mankiw, N. G. (2020). Principles of macroeconomics. Cengage Learning.

77.

Marrero, G. A., & Servén (2022). Growth, inequality and poverty: A robust relationship? Empirical Economics, 63(2), 725–791.

78.

Mdingi, & S.-Y. (2021). Literature review on income inequality and economic growth. MethodsX, 8, 101402.

79.

Molina & Garip (2019). Machine learning for sociology. Annual Review of Sociology, 45, 27–45.

80.

P. H. (2000). Income inequality and economic growth. Kyklos, 53(3), 293–315.

81.

Motahar, S. A., & Mamipour (2025a). The impact of wealth inequality on economic growth: A machine learning approach. Computational Economics. https://doi.org/10.1007/s10614-025-10902-7

82.

Motahar, S. A., & Mamipour (2025b). Growth effects of wealth inequality through transmission channels. Economies, 13(2), 41.

83.

Naguib (2017). The relationship between inequality and growth: Evidence from new data. Swiss Journal of Economics and Statistics, 153, 183–225.

84.

Neves, P. C., Afonso, Ó., & Silva, S. T. (2016). A meta-analytic reassessment of the effects of inequality on growth. World Development, 78, 386–400.

85.

Neves, P. C., & Silva, S. M. T. (2014). Inequality and growth: Uncovering the main conclusions from the empirics. Journal of Development Studies, 50(1), 1–21.

86.

Niyimbanira (2017). Analysis of the impact of economic growth on income inequality and poverty in South Africa: The case of Mpumalanga Province. International Journal of Economics and Financial Issues, 7(4), 254–261.

87.

NVIDIA. (2025). NVIDIA Corporation. https://www.nvidia.com/en-us/glossary/xgboost/

88.

Okun, A. M. (2015). Equality and efficiency: The big tradeoff. Brookings Institution Press.

89.

Ostry, J. D., Berg, A., & Tsangarides, C. G. (2014). Redistribution, inequality, and growth. International Monetary Fund.

90.

Panizza, U. G. (2002). Income inequality and economic growth: Evidence from American data. Journal of Economic Growth, 7, 25–41.

91.

Partridge, M. D. (1997). Is inequality harmful for growth? Comment. American Economic Review, 87(5), 1019–1032.

92.

Perotti (1996). Growth, income distribution, and democracy: What the data say. Journal of Economic Growth, 1, 149–187.

93.

Persson & Tabellini (1994). Does centralization increase the size of government? European Economic Review, 38(3–4), 765–773.

94.

Pierdzioch, Gupta, Hassani, & Silva, E. S. (2022). Forecasting changes of economic inequality: A boosting approach. Social Science Journal, 59(2), 252–268.

95.

Piketty (2014). Capital in the twenty-first century. Harvard University Press.

96.

Rangel, L. A., Andrade, & Divino, J. A. (2002). Economic growth and income inequality in Brazil: Analyzing the comparable minimum areas. Working Paper No. 1312, Brazilian Institute for Applied Economic Research and University of Brasilia, Brazil.

97.

Ronald (2008). The ideology of home ownership: Homeowner societies and the role of housing. Springer.

98.

Royuela, Veneri, & Ramos (2019). The short-run relationship between inequality and growth: Evidence from OECD regions during the Great Recession. Regional Studies, 53(4), 574–586.

99.

Scholl & Klasen (2019). Re-estimating the relationship between inequality and growth. Oxford Economic Papers, 71(4), 824–847.

100.

Shahbaz (2010). Income inequality-economic growth and non-linearity: A case of Pakistan. International Journal of Social Economics, 37(8), 613–636.

101.

Shen & Zhao (2023). How does income inequality affects economic growth at different income levels? Economic Research-Ekonomska Istraživanja, 36(1), 864–884.

102.

Solt (2016). The standardized world income inequality database. Social Science Quarterly, 97(5), 1267–1281.

103.

Solt (2020). Measuring income inequality across countries and over time: The standardized world income inequality database. Social Science Quarterly, 101(3), 1183–1199.

104.

Stiglitz, J. E. (2013). Inequality is a choice. The New York Times, 13.

105.

Topuz, S. G. (2022). The relationship between income inequality and economic growth: Are transmission channels effective? Social Indicators Research, 162(3), 1177–1231.

106.

Varian, H. R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives, 28(2), 3–28.

107.

Venieris, Y. P., & Gupta, D. K. (1986). Income distribution and sociopolitical instability as determinants of savings: A cross-sectional model. Journal of Political Economy, 94(4), 873–883.

108.

Voitchovsky (2005). Does the profile of income inequality matter for economic growth? Distinguishing between the effects of inequality in different parts of the income distribution. Journal of Economic Growth, 10, 273–296.

109.

WB. (2021). Poverty & inequality database. The World Bank Economic Data.

110.

WID. (2021). Inequality database. World Inequality Database.

111.

WIID. (2021). Income inequality database. The World Income Inequality Database.

112.

Zhang & Zhan (2017, April 17–20). Machine learning in rock facies classification: An application of XGBoost [Conference session]. International Geophysical Conference, Qingdao, China.

113.

Zheng, Yuan, & Chen (2017). Short-term load forecasting using EMD-LSTM neural networks with a XGBoost algorithm for feature importance evaluation. Energies, 10(8), 1168.