Abstract
Keywords
Introduction
Stock price forecasting (SPF) has long been one of the most widely studied forecasting problems. It analyzes historical prices, price movements, or trends to forecast future prices. 1 Numerous models for predicting stock prices have been proposed. 2 Because stock prices move in a random walk, 3 researchers claim that the financial information of the company will be systematically reflected in the current price. According to the Efficient Market Hypothesis (EMH), an efficient market is one where prices always reflect all available information, 4 and market efficiency is categorized into three forms: weak form, semi-strong form, and strong form. In practice, investors and financial practitioners have commonly employed technical analysis and fundamental analysis for SPF or trading decision-making. 5 According to Nti et al., 6 fundamental analysis is the study of factors influencing supply and demand. Important data used for fundamental analysis include company data such as financial reports, annual company reports, and balance sheets.
A widely used method is time series analysis, which involves techniques for analyzing time series data to extract meaningful statistical attributes and characteristics of the data. The initial approach is decomposing the series, and commonly used methods are the Holt-Winters method 7 or the Census II X-11 method. 8 The autoregressive integrated moving average (ARIMA) approach is a widely used statistical method for analyzing and forecasting time series data.9,10 While ARIMA has demonstrated its utility in capturing short- to medium-term price trends, it has difficulty handling the complex dynamics and non-linear patterns often observed in stock markets. To address the shortcomings of conventional SPF systems based on ARIMA approaches, learning-based approaches using machine learning (ML)11–13 and deep learning (DL) techniques were introduced.14,15 The ML approaches have shown significant promise in understanding the complexities of financial markets, characterized by dynamic interactions among various elements that influence stock prices. During the 2000s, in comparison to conventional probabilistic or ML approaches, Zuo and Kita 16 employed a Bayesian network (BN) to predict the price-earnings ratio (P/E ratio). A study by Adebiyi et al. investigated the performance of both ARIMA and artificial neural network (ANN) models using stock data from the New York Stock Exchange. 17
Recently, DL approaches have been applied to predicting stock prices. DL models can capture complex temporal dependencies and non-linear patterns that are prevalent in stock price movements. Modern models like the convolutional neural network (CNN), the long short-term memory (LSTM) network, and the bidirectional LSTM (BiLSTM) network can approximate continuous functions and fit data with fewer assumptions, thereby achieving higher accuracy and efficiency in solving nonlinear issues. Furthermore, a hybrid model for SPF typically integrates multiple predictive modeling techniques to enhance the accuracy and reliability of stock price predictions.18–20
Performing a literature review on SPF is a necessary preliminary step before conducting the study or making decisions within this field. This review investigates the development of SPF techniques, ranging from traditional ARIMA methods to advanced DL methods. When starting with a literature review about SPF, we first take advantage of the capabilities of well-known platforms such as Scopus and Google Scholar. Our initial step involves entering targeted keywords into the Scopus database and combining various search terms. These include “stock price forecasting,” “stock price prediction,” “stock price forecasting using ARIMA,” “machine learning in stock prediction,” “deep learning model in stock forecasting,” “CNN, LSTM in stock price forecasting,” “GAN in stock price forecasting,” “transformer for stock price forecasting,” “graph-CNN in stock price forecasting,” and “hybrid models in stock price forecasting.” Through the application of specific filters, the search is fine-tuned to align with preferred publication dates, reputable journals, and subject areas. Essential keywords are used to ensure that the reviewed papers are relevant to the topic. Employing a parallel approach with Google Scholar, the same terms are entered, with particular emphasis on the “cited by” feature. This process leads to subsequent research papers that cite foundational works. The focus of our review was on specific forecasting models such as ARIMA, traditional ML, DL, and hybrid models for historical data or a particular stock market. We give priority to selecting papers for review that are primarily from peer-reviewed, reputable journals and conferences. At the same time, we exclude articles published in workshop publications or technical reports. Additionally, the keywords we select for the subject area are mainly related to computer science, engineering, economics, econometrics and finance, business, management and accounting, and decision sciences. 
Consequently, a total of 110 studies (only English-language papers) were identified, and those published in conferences or book chapters were removed by subject area, or they would be published in articles (
During the process of accessing these resources, summaries, key findings, methodologies, and significant conclusions are systematically extracted. Our literature review is based on the use of these carefully chosen findings, ensuring it is both comprehensive and based on the most recent developments in SPF. Furthermore, it is important to mention that we exclusively assessed papers introducing new models for time series data forecasting, with a specific focus on forecasting methods for traditional financial data, such as historical data and indicators. We also exclude methods applied to non-traditional sources, such as social media trends, news updates, or news sentiment. The concept of sentiment analysis and the implementation of existing models were not considered in this review. After reading the abstract (and, as needed, other sections), the implementation of inclusion and exclusion criteria yielded 73 papers. Figure 1 shows the search strategy in our study.

Search strategy for the selection method of the relevant studies.
Related works
In this section, we present a literature review on several common approaches that have been applied for SPF. Figure 2 shows a flowchart illustrating the common approaches to stock price forecasting outlined in the study.

A flowchart covering the various approaches outlined in the study.
ARIMA approaches
The well-known traditional statistical time series forecasting methods, such as ARIMA and its variants,17,21–29 remain widely used because of their efficiency. Table 1 presents a summary of ARIMA-based approaches for SPF. For the articles reviewed, we summarize the methods used, comparison methods, datasets, target outputs, input features, and evaluation metrics, and briefly discuss performance results. Low and Sakk 22 examined the performance of two forecasting models, ARIMA and LSTM, for predicting stock prices. ARIMA is compared with LSTM to determine which is superior in terms of forecasting accuracy. The models were applied to data from ten different stock tickers, specifically exchange-traded funds from various market sectors. The results suggest that ARIMA achieves accuracy comparable to that of LSTM in long-term prediction. Wahyudi 23 employed the ARIMA model to predict the volatility of Indonesian stock prices. The best ARIMA model is determined using the Akaike information criterion (AIC). The results indicate that the ARIMA model can compete well with existing techniques for stock price prediction, especially in the short term. Pulungan et al. 24 applied a combination of autoregressive (AR) and moving average (MA) methods. The data needs to be stationary for ARIMA to be applied efficiently. The most fitting ARIMA model for this data was identified as ARIMA (31,1).
Summary of the existing ARIMA-based stock price forecasting approaches.
ARIMA: autoregressive integrated moving average; MAPE: mean absolute percent error; MSE: mean squared error; MAE: mean absolute error; LSTM: long short-term memory; CSPI: composite stock price index.
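The stationarity requirement and the AIC-based model choice described in this subsection can be illustrated with a minimal sketch. It is not the procedure of any reviewed paper: the price list is made up, only a constant model and an AR(1) model are compared on the first-differenced series, both are fitted by ordinary least squares, and the AIC is taken in its Gaussian form n·ln(RSS/n) + 2k (up to an additive constant). All function names are hypothetical.

```python
import math

def difference(series):
    """First-order differencing (the d = 1 step in ARIMA(p, d, q))."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi, rss)."""
    x, target = y[:-1], y[1:]
    n = len(target)
    mx, mt = sum(x) / n, sum(target) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - mt) for a, b in zip(x, target))
    phi = sxy / sxx
    c = mt - phi * mx
    rss = sum((b - (c + phi * a)) ** 2 for a, b in zip(x, target))
    return c, phi, rss

def fit_constant(y):
    """Baseline y[t] = c, scored on the same targets y[1:] as the AR(1) fit."""
    target = y[1:]
    c = sum(target) / len(target)
    rss = sum((b - c) ** 2 for b in target)
    return c, rss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k = number of fitted parameters."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical closing prices, for illustration only.
prices = [100.0, 101.5, 101.0, 102.8, 102.1, 104.0, 103.2, 105.1, 104.5, 106.3]
changes = difference(prices)          # differenced series
n = len(changes) - 1                  # number of fitted targets
c0, rss0 = fit_constant(changes)
c1, phi, rss1 = fit_ar1(changes)
scores = {"constant": aic(rss0, n, 1), "AR(1)": aic(rss1, n, 2)}
best = min(scores, key=scores.get)    # lower AIC is preferred

# One-step-ahead forecast: predict the next change, then undo the differencing.
next_change = c1 + phi * changes[-1] if best == "AR(1)" else c0
forecast = prices[-1] + next_change
```

A full ARIMA fit would also search over MA terms and higher orders, typically with a statistics library; the sketch only shows why differencing comes first and how an information criterion trades fit against parameter count.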
Machine learning approaches
ML techniques can capture nonlinear information in time series data without relying on stochastic data or economic knowledge. Thus, ML approaches can be used to build high-performance SPF systems without expert knowledge. The traditional ML algorithms, such as ANNs,11,30,31 k-nearest neighbors (KNN),32,33 support vector machine (SVM),34–40 ensemble models,41–47 and BN,48,49 have been successfully and widely used in SPF systems. Table 2 presents articles on SPF based on ML approaches.
Summary of the existing ML-based stock price forecasting approaches.
ANN: artificial neural network; KNN: k-nearest neighbor; SVM: support vector machine; SSA: singular spectrum analysis; RF: random forest; NB: Naïve Bayes; LDA: linear discriminant analysis; MAPE: mean absolute percent error; RMSE: root mean squared error; MSE: mean squared error; MAE: mean absolute error; AUC: area under the receiver operating characteristic curve; SPSS: statistical product and services solutions.
Sigo 30 explored the nonlinear movement patterns of three leading stocks on the Bombay Stock Exchange (BSE) in India. ANN is employed to analyze data spanning from 2008 to 2017. The results of the study aim to guide investors in making informed investment decisions and maximizing their returns by focusing on the most valuable stocks. Selvamuthu et al. 31 addressed the challenge of predicting stock prices in the Indian stock market. Recognizing that stock price data is inherently difficult to predict due to its dynamic nature, the authors explored the efficiency of ANN.
When applied to SPF, KNN is used to predict a stock's future price based on its past values. Yunneng 32 presented an enhanced version of the KNN algorithm for stock price predictions. This improvement aims to provide more accurate predictions of stock prices. Lin et al. 33 presented a novel method for improving the accuracy of stock time series forecasting using a multidimensional KNN algorithm. The results showed that the proposed method outperformed the other models in predicting stock prices, proving to be a more reliable and effective forecasting system.
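As a rough illustration of how KNN can forecast from past values, the sketch below matches the most recent window of prices against earlier windows and averages what followed the closest matches. It assumes Euclidean distance and an unweighted mean; the reviewed papers use their own enhancements, and the names `make_windows` and `knn_forecast` are hypothetical.

```python
import math

def make_windows(series, width):
    """Turn a series into (window, next_value) training pairs."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]

def knn_forecast(series, width=3, k=2):
    """Predict the next value as the mean outcome of the k most similar past windows."""
    pairs = make_windows(series, width)
    query = series[-width:]           # the latest window has no known outcome yet
    ranked = sorted(pairs, key=lambda pair: math.dist(pair[0], query))
    neighbours = ranked[:k]
    return sum(nxt for _, nxt in neighbours) / len(neighbours)

# Toy periodic series, for illustration only.
series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
prediction = knn_forecast(series, width=2, k=3)
```

In practice the window width and k would be chosen by validation on held-out data rather than fixed in advance.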
SVMs were primarily developed for classification problems, but their application has been extended to regression problems, where the method is known as support vector regression (SVR). SVR can be applied to SPF. Xiao et al. 34 introduced a novel methodology for stock price analysis and forecasting, combining singular spectrum analysis (SSA) and SVM. Ismail et al. 35 aimed to predict the direction of stock price movement. The study introduced a hybrid method that combines various ML techniques, namely logistic regression (LR), ANN, SVM, and random forest (RF), to enhance prediction accuracy.
Developing an ensemble model for SPF involves aggregating the predictions from multiple models to improve the accuracy and robustness of the predictions. Syukur and Istiawan 42 investigated the prediction of the LQ45 index on the Indonesia Stock Exchange (ISX) using various ML techniques. RF was found to obtain the best performance in predicting the LQ45 index compared to C4.5, SVM, LR, Naïve Bayes (NB), and linear discriminant analysis (LDA).
In the context of SPF, Bayesian neural networks (BNNs) enable the prediction of the likelihood of various stock prices, based on given evidence or observed variables. Chandra and He 48 explored the utilization of BNNs for forecasting stock prices. Malagrino et al. 49 explored the potential of BNNs to understand the influence of global stock market indices on iBOVESPA, the primary index of the São Paulo Stock Exchange in Brazil. The objective is to forecast the closing direction of iBOVESPA the next day. The BNN models were able to achieve a mean accuracy of around 71%, with a peak accuracy of nearly 78%, in predicting the daily closing direction of iBOVESPA.
Deep learning approaches
A DL model can effectively outperform traditional SPF systems in terms of accuracy. CNNs,50–54 recurrent neural networks (RNNs) such as LSTM or gated recurrent unit (GRU),55–60 and BiLSTM61–64 are extensively employed for SPF systems. Table 3 presents a summary of DL-based approaches for SPF.
Summary of the existing DL-based stock price forecasting approaches.
ARIMA: autoregressive integrated moving average; BiLSTM: bidirectional long short-term memory; LR: logistic regression; GRU: gated recurrent unit; PCA: principal component analysis; AE: auto-encoder; ECA: efficient channel attention; MAPE: mean absolute percent error; MSE: mean squared error; RMSE: root mean squared error; MAE: mean absolute error; DL: deep learning.
CNNs are commonly used for image and video processing; however, their proficiency in identifying hierarchical patterns can extend to time series forecasting as well. Gunduz et al. 50 applied a CNN model that utilized specially ordered features derived from various indicators, prices, and temporal information appropriate to stocks in the Borsa Istanbul 100. Wen et al. 51 presented an approach to forecasting stock market trends utilizing financial time series data, exemplified by the S&P 500. A CNN was explored to distinguish the spatial structure inherent in the time series.
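To make the idea of convolution over a time series concrete, the following minimal sketch slides a small kernel along a price series. It is an illustration under simplifying assumptions (a single hand-picked kernel, no learned weights, no padding), not the architecture of any reviewed model; `conv1d` and `relu` are hypothetical helper names.

```python
def conv1d(series, kernel):
    """Valid-mode sliding dot product, the core operation of a 1D CNN layer."""
    width = len(kernel)
    return [sum(w * x for w, x in zip(kernel, series[i:i + width]))
            for i in range(len(series) - width + 1)]

def relu(values):
    """Rectified linear activation: keep positive responses, zero out the rest."""
    return [max(0.0, v) for v in values]

# Toy prices, for illustration only. The kernel [-1, 1] responds to day-over-day rises.
prices = [10, 11, 13, 12, 12, 15]
rise_map = relu(conv1d(prices, [-1, 1]))
```

In a trained CNN, many such kernels are learned from data and stacked in layers, so that later layers respond to combinations of the local patterns detected earlier.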
RNNs have been proposed for SPF systems. Results have demonstrated that methodologies based on RNN can outperform classic ML techniques. RNNs can handle sequences of variable length, offering flexibility in managing time series data of diverse lengths. To overcome the vanishing gradient problem intrinsic to RNNs, LSTM networks, GRU, and their variants were developed. The utilization of LSTMs has been substantiated as effective in accurately forecasting stock prices. Ghosh et al. 55 demonstrated the efficacy of employing both LSTM networks and RFs to forecast directional movements of stock prices from the S&P 500 index for intraday trading. Another study 56 proposed an optimized approach for predicting stock prices using advanced DL techniques, such as LSTM and GRU models. The authors employed DL LASSO and principal component analysis (PCA) for dimensionality reduction, focusing on various factors influencing stock prices.
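The gating idea behind LSTM can be sketched with a single scalar cell. This is a simplified illustration, assuming one input feature and one hidden unit with made-up parameters; real models use vectors, weight matrices learned by backpropagation, and a deep learning library. The function name `lstm_step` and the parameter layout are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each gate name to a tuple (w_x, w_h, bias).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the new candidate to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value for the cell state
    c = f * c_prev + i * g            # additive cell-state update
    h = o * math.tanh(c)
    return h, c

# Made-up parameters and inputs, for illustration only.
params = {name: (0.5, 0.1, 0.0)
          for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:            # e.g. a short sequence of scaled returns
    h, c = lstm_step(x, h, c, params)
```

Because the cell state is updated additively and scaled by the forget gate, gradients can pass through many time steps when that gate stays close to 1, which is how the design mitigates the vanishing gradient problem.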
BiLSTM is often used for sequence-to-sequence learning tasks, like SPF. The BiLSTM allows the model to capture both past and future information around a specific time step, potentially enhancing the model's ability to understand the underlying patterns in the sequence. Xu et al. 62 focused on utilizing a stacked DL structure for stock market predictions, specifically aiming to predict the stock price of the subsequent day. This model employs historical stock price data sourced from Yahoo Finance and integrates several methodologies, including the wavelet transform technique, stacked autoencoder, RNN, and BiLSTM. Liu et al. 64 employed an auto-encoder (AE) technique to extract stock price series data, recognizing its proficiency in managing the non-smooth and non-linear characteristics inherent in the data. The core structure of the AE incorporates a BiLSTM module, which allows the model to efficiently extract substantial historical and prospective information from stock price series data.
Hybrid approaches
In SPF, hybrid models refer to combinations of different models aiming to leverage the strengths and reduce the drawbacks of individual methods. Hybrid models can achieve higher predictive accuracy than single models. However, they also come with challenges, such as increased model complexity, potential difficulties in model interpretation, and the requirement for extensive tuning and validation. These hybrid models can generally be categorized into two main types: hybrid traditional approaches and hybrid DL approaches. Table 4 presents a summary of hybrid-based approaches for SPF.
Summary of the existing hybrid-based stock price forecasting approaches.
CT-GCNN: conceptual-temporal graph convolutional neural network; BiLSTM: bidirectional long short-term memory; KNN: k-nearest neighbor; SVM: support vector machine; SVR: support vector regression; RF: random forest; RNN: recurrent neural network; KPCA: kernel principal component analysis; OFS: orthogonal forward selection; TLBO: teaching-learning-based optimization; FF: firefly algorithm; PSO: particle swarm optimization; RS: random search; ECA: efficient channel attention; DCGAN: deep convolutional generative adversarial network; GAN: generative adversarial network; MAPE: mean absolute percent error; MSE: mean squared error; RMSE: root mean square error; MAE: mean absolute error; AUC: area under the receiver operating characteristic curve.
Hybrid traditional approaches65–70 typically combine traditional statistical methods with ML techniques, or they combine various ML approaches with each other. Nayak et al. 65 introduced a hybrid model that integrates both the SVM and KNN techniques for predicting Indian stock market indices. The model's performance was evaluated using the mean squared error (MSE), and it was found that the SVM-KNN model outperformed several baseline models. Siddique and Panda 66 compared various hybrid ML models for prediction. These models utilized dimension reduction techniques such as orthogonal forward selection (OFS) and kernel PCA (KPCA). They were combined with SVR and teaching-learning-based optimization (TLBO). The study concluded that the model incorporating KPCA (KPCA-SVR-TLBO) outperformed and was more feasible than the model employing OFS (OFS-SVR-TLBO).
Hybrid DL approaches frequently combine DL techniques with traditional methods71–75 or DL architectures with each other, such as CNN-LSTM, LSTM or BiLSTM with attention mechanisms (AMs), transformer models, and graph convolutional neural network (GraphCNN).76–90 These hybrid DL models prove to be efficient in identifying complex patterns and relationships in data due to the high capacity and adaptability of DL architectures, especially in applications like SPF. Chandar 71 proposed a new method for stock trading by combining technical indicators and CNNs, termed TI-CNN. The model uses ten technical indicators derived from historical stock data, converts them into an image using the Gramian angular field, and then inputs this into the CNN. Korade and Zuber 72 explored the usage of CNN for SPF and aimed to optimize the CNN hyperparameters using different optimization techniques. The authors employed the firefly algorithm (FF), particle swarm optimization, and random search for optimizing the hyperparameters, comparing their performance based on different evaluation metrics applied to training and testing datasets.
The study by Lu et al. 76 proposed a method for forecasting stock prices utilizing a hybrid CNN-LSTM model. This model utilizes CNN for efficient feature extraction from historical data and LSTM to analyze relationships in time-series data, subsequently predicting stock prices. Wang et al. 78 aimed to predict the closing price of stocks using a composite model called CNN-BiSLSTM. Here, the BiSLSTM represents bidirectional special LSTM.
The integration of AMs with LSTMs in SPF models presents the possibility of improved prediction accuracy and reliability. An attention-LSTM model can analyze historical stock prices and potentially other relevant information to predict future stock prices. Lu et al. 81 discussed a combined approach using CNN, BiLSTM, and an attention mechanism for predicting stock prices. The results showed that the CNN-BiLSTM-AM method outperforms seven other methods in accuracy. Chen et al. 82 introduced a novel model for predicting stock prices, utilizing a CNN, a BiLSTM, and an efficient channel attention (ECA) module.
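How an attention mechanism assigns different importance to different time steps can be sketched as follows. The example assumes plain dot-product scoring against a query vector and made-up hidden states; the reviewed models may score and combine states differently, and `attend` is a hypothetical name.

```python
import math

def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, hidden_states):
    """Dot-product attention over a sequence of hidden-state vectors."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: weighted average of the hidden states.
    context = [sum(w * state[d] for w, state in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Made-up hidden states for three time steps, e.g. outputs of an LSTM layer.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attend(query, hidden_states)
```

Here the third time step receives the largest weight because it aligns best with the query, so it contributes most to the context vector passed on to the prediction layer.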
Transformers were developed to reduce the limitations inherent to AMs and recurrent models like RNNs. Specifically, they address the challenges brought about by the inherent sequential processing of RNNs and the high computational demands of AMs, allowing for more efficient and scalable modeling of sequential data. When employed for SPF, transformer models are good at identifying complex patterns within time series data and understanding the long-term dependencies existing between various time steps. Wang 85 introduced a novel method named BiLSTM-MTRAN-TCN for predicting stock prices. This method used BiLSTM, an improved transformer model (MTRAN-TCN), and TCN (temporal convolutional network), aiming to explore the individual benefits of each model. Li and Qian 86 introduced a novel hybrid neural network—the FDG-transformer—specifically developed for predicting stock prices.
Recently, generative adversarial networks (GANs) and GraphCNN have been applied for SPF, often achieving high accuracy. GANs can be used to generate synthetic time-series data that mimics real stock price movements. This synthetic data can help augment the training data, allowing models to generalize better to unseen data and potentially leading to more accurate forecasts. Wu et al. 87 introduced a novel framework that combines GAN with piecewise linear representation for predicting stock market trading actions such as buying, selling, and holding. Staffini 88 proposed a novel approach to predicting stock prices using a deep convolutional GAN (DCGAN). The generator model of the GAN learns to generate data that resemble real stock prices. The discriminator model learns to distinguish between real and generated stock prices. The results show that the proposed DCGAN model outperformed standard, widely used tools for forecasting stock prices.
GraphCNNs extend convolutional operations from regular grids to irregular graphs. This advancement allows models to effectively capture the relational structures and dependencies between different entities. GraphCNNs are particularly applicable to SPF, where they can model the relationships between different stocks or between different features of a single stock. The work of Wang et al. 89 proposed a new model for stock price prediction, integrating a knowledge graph, GraphCNN, and community detection. This model aims to overcome the limitations of existing models, which often neglect deeper influencing factors and rely on small-scale stock datasets. Fuping 90 concentrated on predicting stock price movements using a novel method called the conceptual-temporal graph CNN (CT-GCNN) model. This model explores stock price movements in both time and concept dimensions, accounting for the linkage effect of price movement among stocks within the same conceptual segment.
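The way a graph convolution propagates information between related stocks can be sketched with one simplified layer. This sketch assumes mean aggregation over each node's neighbours (self-loop included) followed by a linear map and ReLU; published GraphCNN variants use other normalizations and learned weights. The graph, the features, and the name `graph_conv` are made up for illustration.

```python
def graph_conv(adjacency, features, weight):
    """One simplified graph-convolution step.

    adjacency: n x n matrix of 0/1 links between stocks
    features:  n x dim matrix, one feature vector per stock
    weight:    dim x out_dim matrix of the linear map
    """
    n, dim, out_dim = len(adjacency), len(features[0]), len(weight[0])
    output = []
    for i in range(n):
        # Neighbourhood of node i, always including the node itself.
        neighbours = [j for j in range(n) if adjacency[i][j] or j == i]
        mean = [sum(features[j][d] for j in neighbours) / len(neighbours)
                for d in range(dim)]
        z = [sum(mean[d] * weight[d][k] for d in range(dim))
             for k in range(out_dim)]
        output.append([max(0.0, v) for v in z])   # ReLU activation
    return output

# Three hypothetical stocks: 0 and 1 are linked (e.g. same sector), 2 is isolated.
adjacency = [[0, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
features = [[0.02], [-0.01], [0.03]]   # e.g. previous-day returns
weight = [[1.0]]
smoothed = graph_conv(adjacency, features, weight)
```

After one step the two linked stocks share the same averaged representation while the isolated stock keeps its own value, which is the linkage effect that graph-based models aim to exploit.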
Dataset and metric evaluation
Statistics of selected papers
This subsection presents a statistical analysis of the 73 papers selected for this review on SPF, using the analysis tools of the Scopus platform. Figure 3 illustrates the annual publication count, revealing a conspicuous upward trend in papers on SPF from 2014 to 2023. The number of papers increased from a single publication in 2014 to 18 in 2023.

Number of published papers by year.
Figure 4 illustrates the distribution of published papers, categorized by source (compare the document counts for the top 10 sources). Research on SPF has been published in various journals and conferences. Particularly notable is the number of studies in the ‘Expert Systems with Applications’ journal. The variety of journals, spanning specialized and multidisciplinary fields such as computing, economics, and finance, underscores the interdisciplinary essence of the research.

An example of the number of published papers by source.
Research on SPF is primarily distributed through articles (61), indicating a robust presence in academic journals, followed by conference papers (11), which suggest active discussions and explorations in conference settings, and is minimally represented in book chapters (1), as shown in Figure 5. The dominance of articles points towards depth and precision in the field, while conference papers highlight ongoing, dynamic discussions and potential collaborations among researchers.

Number of published papers by type.
Figure 6 illustrates the number of published papers by authors from various countries or territories (compare the document counts for the top 15 countries or territories). This shows a geographical diversity in the research within this field. China, with 24 publications, and India, with 17, lead in stock market research, reflecting their strong economic interests and growth. Other countries also show global interest in SPF. This figure shows that this research topic has widespread relevance and interest across different economic contexts.

An example of the number of published papers by authors, categorized by country or territory.
Figure 7 illustrates the number of published papers by authors, categorized by affiliation (compare the document counts for the top 15 affiliations), with leading contributions from Capital University of Economics and Business and Hebei University of Science and Technology at three publications each. Several other universities from around the world have also made notable contributions. This figure implies a robust and diverse global effort in the study of SPF, reflecting both academic and industry interests in the area.

An example of the number of published papers by authors, categorized by affiliation.
Dataset
Many studies have used stock market data for their analyses, as shown in Tables 1–4. This research examines a wide range of datasets sourced from various stock indices and companies. These datasets have been used in multiple studies that focus on SPF. They include notable indices such as the FTSE MIB Index and A-share market stock prices in China, as well as specific companies like Amazon, Apple, Google, Tesla, IBM, and Oracle. The data for these studies is often sourced from platforms like Yahoo Finance.
Earlier studies have utilized a diverse set of data types. These range from closing prices, opening prices, highs, lows, and trading volumes to other technical indicators. Some even incorporate historical and tick data. Data from major global stock exchanges, including the Shanghai Stock Exchange, Kuala Lumpur Stock Exchange, and BSE Limited, plays an important role in these studies. Some studies specifically aim to predict the direction or price of the next day's close. Table 5 presents some stock datasets and their source links.
Some stock datasets and their source links.
Metric evaluation
Tables 1–4 in the “Related works” section show that various metrics are utilized to evaluate the performance of different models in SPF. These include mean absolute percent error (MAPE), mean squared error (MSE), root mean square error (RMSE), mean absolute error (MAE), accuracy, R2 (coefficient of determination), mean bias error (MBE), precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC-AUC), among others.
Metrics for evaluating prediction accuracy: MAPE, MSE, RMSE, and MAE are widely employed across various models and approaches to assess the accuracy and performance of stock price predictions.
Metrics for classification models: Accuracy, precision, recall, F1-score, and ROC-AUC are typically used to evaluate models where classification, such as predicting upward or downward trends, is involved.
Understanding model predictions and biases: R2 is utilized to comprehend how well the model's predictions align with actual outcomes, providing insights into the explanatory power of the model by understanding the proportion of variability in the dependent variable that is predictable from the independent variable(s).
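The metrics named above can be written out in a few lines. The sketch below uses their standard textbook definitions; the function names and the sample numbers are made up for illustration and do not come from any reviewed study.

```python
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """In percent; undefined when any actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def direction_scores(actual_up, predicted_up):
    """Accuracy, precision, recall and F1 for up (1) / down (0) labels.

    Assumes at least one actual and one predicted upward move.
    """
    pairs = list(zip(actual_up, predicted_up))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Made-up closing prices and forecasts.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 102.0, 103.0, 108.0]
regression_report = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r2(actual, predicted),
}
```

MAE, MSE, and RMSE are expressed in price units (or squared price units), whereas MAPE and R2 are scale-free, which is one reason studies on different markets often report several of them together.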
Discussion and future directions
Method-based discussion
ARIMA approaches
A review of ARIMA models reveals that both ARIMA and its enhanced variants21,22 have been widely utilized in SPF. ARIMA often demonstrates a high degree of accuracy in short-term forecasts and is particularly efficient for linear and stationary series, making it apt for numerous financial time series data. The models facilitate the use of evaluative parameters, such as the AIC23,25,27 and the Bayesian information criterion (BIC),17,27 to select the most fitting models, thereby ensuring optimal performance. However, ARIMA models are unsuitable for non-linear and non-stationary data, restricting their application since financial time series often display non-linear behaviors. 17 They may not effectively capture evolving trends and patterns in the long term. Moreover, identifying the correct order of differencing and the appropriate number of AR and MA terms (p, d, q parameters)27,75 can be challenging and necessitates expertise, making it less accessible to non-experts.
Machine learning approaches
ML can discern and model nonlinear relationships in stock prices, which poses challenges for ARIMA methods.30,34 Many ML techniques, including SVM, KNN, ANN, and RF, provide a variety of strategies for learning. Each model has its own strengths in predicting stock prices.31,33,34,41 However, it is important to note that due to the complexity of some ML models, they may fit the training data too closely and fail to generalize well to new, unseen data.69,70 ML models can be highly sensitive to noise in the data, leading to inaccurate predictions. Training and optimizing ML models can be computationally intensive, demanding substantial resources and time, especially for large datasets.45,49
DL approaches
DL models have demonstrated their ability to predict stock prices with high accuracy and often outperform traditional models.50,55,59,63 These models can deal with the inherent non-linear and non-smooth features of stock price data.61,64 Their versatility enables them to handle various data types and structures, utilizing diverse variable sets from different markets. 19 Moreover, they provide flexibility in exploring time series data of varying lengths, which is particularly beneficial for stocks with inconsistent trading histories.53,63,89
However, DL models require significant computational power and resources for training, which may not be accessible to all.51,61 The efficacy of these models heavily depends on the quality and quantity of the data; insufficient data can reduce their performance. The complexity of DL models can result in overfitting, particularly when the model is very complex relative to the simplicity of the task or the volume of available data. Some inherent issues, like the vanishing gradient problem in basic RNNs, necessitate the use of more advanced variants,62,64 increasing the complexity of model development and implementation.
Hybrid approaches
Hybrid models frequently outperform single models by combining the strengths of multiple approaches. By utilizing a variety of models, hybrid models can generalize more effectively to unseen data, thereby reducing the risk of overfitting.68,76,78 In particular, hybrid models that incorporate ML and DL can explore non-linear relationships in data, which are common in financial markets. DL hybrids, such as CNN-LSTM, can extract hierarchical features and comprehend sequential dependencies, making them suitable for time-series forecasting tasks such as stock price prediction.76,77 AMs81,82 and transformers85,86 have demonstrated promise in investigating sequential data by allocating varying levels of importance to different time steps, which is pivotal in financial time-series data.
However, integrating multiple models can lead to increased complexity, making them computationally expensive and more challenging to manage.69,79,88 These models often necessitate tuning and validation, which can be resource intensive. DL hybrids might require substantial amounts of data for effective training, which might pose a limitation in certain scenarios.
Research limitations
The main focus of our review was on ARIMA, traditional ML, DL, and hybrid models applied to historical data for time series analysis in SPF. Alternative approaches such as the EMH, fundamental analysis, a combination of technical and fundamental analysis, and sentiment and social data analysis were not given prominence in our investigation. Our review specifically evaluates the effectiveness of methods utilizing traditional financial data, such as historical data and indicators. The exclusion of non-traditional sources, such as social media trends, news updates, sentiment analysis, or real-time market indicators, is a limitation because it overlooks important factors influencing investor behavior and market trends. In addition, the study does not cover advanced computational approaches such as quantum algorithms and reinforcement learning techniques. Finally, the study faced several limitations in selecting papers and excluding irrelevant documents from the large number of results returned by Scopus. We primarily focused on journal papers, excluding other sources such as conference papers, workshop publications, book chapters, and technical reports, so relevant articles may have been missed because of this narrow scope of sources. We also recognize that the search strategy itself may have led to the omission of relevant articles.
Future directions
Forecasting stock prices is a complex task because many factors influence market dynamics. One significant challenge is managing market noise, 19 as prices are influenced by both relevant information and irrelevant or random data, which makes it difficult to separate impactful from non-impactful information. The accuracy of predictions also relies on the quality and completeness of the data, with the handling of missing data, outliers, and incorrect data posing persistent challenges.
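As a rough illustration of the data quality issues above, the following sketch fills missing prices and flags suspicious values. Forward-filling and the median absolute deviation (MAD) rule are assumptions chosen for simplicity, not methods prescribed by the reviewed studies; the function names, the threshold of 3.5, and the sample series are hypothetical.

```python
import statistics


def forward_fill(prices):
    """Replace None with the last observed price; leading gaps stay None."""
    filled = []
    last = None
    for p in prices:
        if p is not None:
            last = p
        filled.append(last)
    return filled


def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical series with two gaps and one likely data-entry error (1020.0)
raw = [100.0, 101.0, None, 102.0, 1020.0, 103.0, None, 104.0]
filled = forward_fill(raw)
flags = flag_outliers(filled)
print(filled)
print(flags)
```

A median-based rule is used here instead of the mean and standard deviation because a single extreme value would otherwise inflate the very statistics used to detect it.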
The challenges within stock price prediction have catalyzed the exploration of new methodologies and approaches, all aimed at enhancing predictive accuracy and reliability. Here are some future directions for SPF:
+ In addition to employing time series data for forecasting, it is important to explore and use alternative data sources for stock price prediction. Specifically, exploiting sentiments extracted from various platforms, such as social media and news outlets,91–94 can significantly enhance our ability to comprehend and accurately predict market movements.
+ Combining traditional financial theories with AI algorithms, including transformers,85,95 graphCNN,77,96 reinforcement learning,97,98 and meta-learning,99,100 can lead to enhanced performance in SPF.
+ Real-time analysis for SPF provides a range of advantages.101,102 It enables timely decision-making by offering up-to-the-minute data, allowing traders to react quickly to market changes and news events.
+ As quantum computing technology continues to advance, researchers can explore the application of quantum algorithms to optimize trading strategies and enhance the efficiency of SPF models.103–106
Conclusion
SPF remains a topic of interest among investors, analysts, and researchers. This field is dedicated to predicting the future price of a stock, leveraging historical data and various influential factors. This study presents a literature review and bibliometric analysis of papers on SPF, analyzing various methods, datasets, and evaluation metrics, and summarizing results. The published papers were collected from the Scopus database, covering a range of methods, from ARIMA to DL approaches. We briefly described the concepts and applicability of these approaches in SPF. The number of papers increased from 2014 to 2023, with a significant rise in the use of DL models and hybrid DL models from 2020 to 2023. This trend indicates that the research topic is attracting more attention from researchers.
Statistical models such as ARIMA and its variations are effective under linear conditions but often struggle with the nonlinear complexity of financial markets. The application of ML and DL in this field has led to the introduction of models that can represent complex patterns and nonlinear relationships in stock data. Based on this analysis, we find that DL models such as LSTMs, convolutional LSTMs, transformers, and GANs are much more effective than traditional approaches, demonstrating promising performance. In recent years, the combination of various DL approaches has shown promising potential to enhance the performance of SPF. However, these approaches may face challenges such as the need for large training datasets, high computational costs, and the risk of overfitting.
Future research in this domain should investigate various data types to improve prediction accuracy. This includes not only traditional financial data such as historical stock prices and volume but also technical indicators and non-traditional data like financial news, news sentiments, and social media sentiments. Moreover, exploring and optimizing novel hybrid models is necessary for enhancing the performance of forecasting systems.
