Abstract
Keywords
Introduction
A call centre could serve as the company’s online persona, and as technology and digitization have advanced, it has become an essential service for communicating with both current and potential clients.1–3 These days, one of the most crucial channels of communication between businesses and their clients is the call centre. Inbound call centres, in which clients place calls and call centre representatives answer them, are the most prevalent type of call centres.4,5 It is evident that more and more businesses are setting up their own call centres to give their clients the finest available information and services. Customers save a great deal of time when call centres are used well, and businesses operate much more efficiently.6,7
As the world becomes increasingly reliant on data, it is ever more important to use collected data to improve a company’s operations, and call centers are no exception. One essential use for this data is predicting future activity so that the company can maximize efficiency in resource planning and increase customer satisfaction. In fact, being able to accurately forecast call center activity is one of the most important steps in improving a call center’s operations.7,8 By projecting the number of incoming calls, a business can meet service requirements, improve customer satisfaction, and plan adequate staffing and scheduling levels.7,9 A telephone call centre’s staffing algorithm depends on an accurate estimate of client demand. The available pool of agents must be scheduled and rescheduled according to revised projections, which are usually generated weeks or months in advance.8–10
As a result, a comprehensive body of knowledge has accumulated on the forecasting methods used in call centers1,11–13 and for forecasting time series data in general.14–19 The benefits of an appropriate call center forecast are numerous. In addition to improved resource allocation, staffing, and customer experience,20,21 a volume forecast also provides the company and call center management with tools for service level optimization and possibly better cost management, as they can avoid the over- and understaffing that often lead to reactionary decisions and therefore to extra costs.20,22
Given the importance of call center forecasting and the gaps in identifying suitable forecasting methods, this study aims to develop a forecasting model for the call center of a financial services company using predictive analytics. Using the proposed model, the organization would be able to transition from a reactive approach of operational resource allocation to a more proactive one in handling inbound calls. The case company is an international financial services provider. The company has more than five thousand employees and over five billion euros in revenue with a strong presence in the Nordics. It provides a comprehensive range of financial services, including retail banking, corporate banking, investment banking, asset financing, and wealth management. The organization serves various customer categories, ranging from individuals and small businesses to large corporations and institutional clients. The study focuses on the call center of one specific product area inside the company. The product area in question, asset finance, deals with leasing and hire purchase products for both corporate and private customers in Finland.
In the current situation, the company collects data on call center activities. However, the data is utilized only in visualizing historical and current workloads, with no method of forecasting into the future. Due to the availability of data on both the calls received as well as some factors that the company hypothesizes, it is possible to create a model to forecast the call center volume. These factors are the number of financed agreements, and number of inbound credit applications, as both are indicative of new business and therefore potentially new customer contacts. This results in a fillable research gap where necessary data exists and is collected but is not used for predictive purposes due to lack of a process. This study aims to formulate a prediction process and allow the organization to move from a reactive operations resource management strategy to a proactive one. The main objective therefore is to be able to present company management with a decision support tool for operational resource planning.
The model is built using predictive analytics, which is a broad category of data analysis techniques ranging from time series analysis to machine learning, artificial intelligence, and pattern recognition, in which the model itself determines the key factors, such as variable weights and coefficients, as opposed to the model user’s assumptions and inputs. 23
With the capability of predicting the call center volume, the company can plan ahead and allocate its resources effectively. With the main objective of presenting company management with a decision support tool for operational resource planning, this research intends to address the following research questions (RQs). RQ1: What are the suitable models that should be tested in an organization with no existing forecasting capabilities in their call center operations? RQ2: What is the most suitable model for building a call center volume forecast, specifically in the case company? RQ3: Do new financed agreement volumes or inbound credit application volumes predict call center volumes sufficiently to be used in the forecast?
To address the above research questions, this study builds upon the predictive analytics techniques described above, in which the model itself, rather than the model user, determines key factors such as variable weights and coefficients. 23
The paper is organized as follows: Section 2 gives a brief overview of the studied case company and its needs. Section 3 presents a literature review covering call center operations, predictive analytics and forecasting methods (including time series analysis, neural networks, linear regression, and ensemble methods), and call center demand forecasting. Section 4 outlines the study methodology, covering data selection, data preparation and analysis tools, model selection, and data collection and analysis. The study results are presented in Section 5, while Section 6 discusses the study outcomes. Both the theoretical and practical implications as well as study limitations and future study directions are also outlined in Section 6. Finally, the study is concluded in Section 7.
Description of the case company
The case company is an international financial services provider. The company has 5000+ employees and over five billion euros in revenue with a strong presence in the Nordics. It provides a comprehensive range of financial services, including retail banking, corporate banking, investment banking, asset financing, and wealth management. The organization serves various customer categories, ranging from individuals and small businesses to large corporations and institutional clients.
This study focuses on the call center of one specific product area inside the company. The product area in question, asset finance, deals with leasing and hire purchase products for both corporate and private customers in Finland. Due to the nature of the financial services industry, it has been requested by the case company that specific call, agreement, and application volume figures are not disclosed in this study, but instead their values normalized on a 0-1 scale where such figures are presented, 1 being the highest value of the data and 0 the lowest.
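The 0-1 scaling described above corresponds to ordinary min-max normalization. The following is a minimal sketch of that rescaling; the function name and the sample figures are hypothetical and are not taken from the case company's data.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values so that the lowest maps to 0 and the highest to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All observations are equal, so the scale is undefined; return zeros.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical weekly call counts, invented purely for illustration.
weekly_calls = [120, 200, 160, 80]
normalized_calls = min_max_normalize(weekly_calls)
print(normalized_calls)
```

Because the transformation is linear, it preserves the shape of the series (trend, seasonality, relative peaks) while hiding the absolute volumes.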
Literature review
The past literature on call centers highlights in particular that the problems call centers face are in part still the same as in the past,10,22,24 namely, how to predict volumes for efficient resource management and planning. Different types of call center inbound volumes and the factors that influence them are also presented, as well as their importance to companies’ value creation. In terms of predictive analytics and time series analysis, different methods and best practices are reviewed1,17,23,25 and various practical examples are presented where these methods have been used to make forecasts in the call center context12,13,26–28 as well as in other fields.29–34
Several models can be tested in organizations without existing forecasting capabilities for call center operations. These include traditional time series models like Holt-Winters, ARIMA, and SARIMA, as well as AI-based approaches such as Neural Prophet and Random Forest.1,35 Parametric forecasts using Gaussian quadrature can be employed in stochastic programming models to optimize workforce scheduling. 36 For intra-daily call arrivals forecasting, machine learning methods, particularly Random Forest, have shown superior prediction performance. 35 Additionally, queueing models like Erlang C (M/M/N) and Erlang A (M/M/N+M) can be evaluated for staff optimization, with Erlang A providing more accurate predictions when call abandonment is considered (Nag and Helal, 2017). 37 These models can help organizations enhance staffing accuracy based on call arrival patterns, handling times, and seasonality.
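To make the Erlang C (M/M/N) model mentioned above concrete, the following minimal sketch computes the probability that an arriving call has to wait, and the smallest number of agents that keeps this probability under a target. The function names, the offered load, and the target are illustrative assumptions and do not come from the cited studies; call abandonment (Erlang A) is not modeled.

```python
def erlang_c_wait_probability(offered_load: float, agents: int) -> float:
    """Probability that an arriving call must wait in an M/M/N queue.

    offered_load is the arrival rate multiplied by the mean handling
    time, expressed in Erlangs.
    """
    if agents <= offered_load:
        # The queue is unstable, so in the long run every call waits.
        return 1.0
    # Erlang B blocking probability via the standard recursion.
    blocking = 1.0
    for n in range(1, agents + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    # Convert the Erlang B value into the Erlang C waiting probability.
    return agents * blocking / (agents - offered_load * (1.0 - blocking))


def min_agents(offered_load: float, max_wait_probability: float) -> int:
    """Smallest agent count whose waiting probability meets the target."""
    agents = int(offered_load) + 1
    while erlang_c_wait_probability(offered_load, agents) > max_wait_probability:
        agents += 1
    return agents


# Hypothetical example: 2 Erlangs of load, at most 20% of calls may wait.
print(erlang_c_wait_probability(2.0, 3))
print(min_agents(2.0, 0.2))
```

In a planning workflow, the offered load fed into such a function would itself come from a volume forecast, which is why forecast accuracy directly affects staffing accuracy.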
Accurate forecasting of call center volumes is a cornerstone of efficient workforce management and resource allocation. The ability to predict call volumes ensures that organizations can maintain service levels, reduce operational costs, and improve customer satisfaction. 38 A variety of forecasting models have been studied and applied to address this challenge, ranging from classical statistical methods to advanced machine learning approaches, each suited to different contexts and requirements. Autoregressive (AR) and Autoregressive Moving Average (ARMA) models have been widely explored for their ability to handle dynamic forecasting scenarios. These models excel in capturing the temporal dependencies inherent in call volume data, making them effective for short-term forecasts. 38 However, their performance can be limited when dealing with complex seasonal patterns or irregular trends.
For out-of-hours call centers, regression models that integrate calendar effects have proven particularly useful. These models account for variations in call volumes driven by factors such as holidays, weekends, and special events. By incorporating these calendar-driven variables, they provide more precise predictions of daily call volumes and arrival patterns, addressing the unique challenges of non-standard operating hours. 39 In multi-skill call centers, where agents possess diverse skill sets to handle different types of calls, forecasting becomes even more complex. ARIMA models have been proposed as robust tools for mid- and long-term forecasting in such environments. 40 When combined with Erlang models, ARIMA-based forecasts can inform strategic workforce planning, ensuring the right number of agents are available to meet service demand across multiple skill groups. 40 These integrations help balance operational efficiency with customer satisfaction.
Electricity utility call centers often experience strong seasonal fluctuations due to weather-related demand variations. Studies have shown that both ARIMA and Seasonal ARIMA (SARIMA) models are effective in this domain, with SARIMA models offering a clear advantage in handling seasonal patterns. By incorporating seasonal components, SARIMA models enhance the accuracy of long-term forecasts, making them a preferred choice for call centers that operate in industries with pronounced seasonality.6,41 These findings underscore the importance of tailoring forecasting approaches to the specific characteristics of the call center environment. Key factors such as seasonality, calendar effects, and service-specific patterns must be carefully considered to select and implement the most appropriate forecasting model. By aligning forecasting strategies with operational goals, organizations can optimize resource allocation, reduce costs, and deliver consistent service quality.
Call center volume forecasting is a vital aspect of efficient resource allocation and customer service optimization across various industries.2,7 Accurate forecasts enable organizations to balance workforce management, reduce operational costs, and ensure seamless customer experience. Among the many forecasting approaches, time series analysis techniques have been particularly prominent. Seasonal AutoRegressive Integrated Moving Average (SARIMA) models, for instance, have demonstrated exceptional performance in predicting inbound call volumes for electricity utility call centers, effectively accounting for seasonal fluctuations and long-term trends. 41 This makes SARIMA a preferred choice for industries with pronounced seasonal demand patterns, such as utilities.
In scenarios where real-world data is scarce or incomplete, simulation techniques have emerged as valuable tools. These methods generate synthetic data that can be used to evaluate and refine forecasting algorithms. For example, Steinmann and Freitas Filho 42 demonstrated the utility of simulations in testing and improving the accuracy of forecasting models, particularly in environments where historical data is either unavailable or insufficient for robust model development.
Machine learning techniques have also gained traction in call center operations, particularly in optimizing costs and enhancing decision-making.38,43 Classification algorithms like AdaBoost, Gradient Boosting, and Random Forest have been successfully applied in banking call centers to predict customer behavior. Specifically, these methods have been used to identify customers who are more likely to accept credit offers, enabling targeted outreach. By focusing efforts on customers with a higher probability of agreement, organizations can significantly reduce expenses associated with blanket marketing while simultaneously increasing profitability. 44
Despite the growing application of advanced forecasting methods, there remains untapped potential in integrating broader predictive variables, such as new financed agreement volumes or inbound credit application volumes, into call center forecasting models. Although the reviewed studies do not directly address these variables, they underscore the importance of leveraging data-driven approaches to enhance forecasting accuracy and optimize operations.
By incorporating these additional predictors, future research could unlock new opportunities for improving call center efficiency, customer targeting, and resource allocation. These advancements highlight the critical role of sophisticated forecasting models and data analytics in modern call center management. The integration of time series analysis, simulation techniques, and machine learning not only enhances predictive accuracy but also provides actionable insights for strategic decision-making. This multifaceted approach ensures that call centers remain agile and effective in meeting the dynamic demands of their respective industries.
Call center operations
Customer service is an essential part of every company’s operation and even more so for companies providing services in a more complex and digital environment. A dedicated call center is therefore a necessity for most companies serving many customers and companies working in the financial services industry are definitely not an exception. According to past literature, call centers are still struggling with the same problems now as when they were established in the beginning of the 20th century. 22 Namely, the problem of resourcing, or how to know when demand for call center personnel is at its highest to be able to meet demand without over-resourcing when demand is lower. While technology and self-service channels have improved to a point where easy to solve issues can be handled in some cases by the customer themselves, there is no less need for efficient call center processes, as the issues handled by agents are now often more complicated and the impact on a company’s reputation and customer satisfaction is as high as ever.
A smooth and efficient call center can even bring strategic value to an organization by increasing loyalty through customer satisfaction. 26 This is especially important in the financial services industry, where products are often close to identical due to regulations, so brand image and customer experience play a significant part in attracting and retaining customers. The International Customer Management Institute even defines call center management as “the art of having the right number of properly skilled people and supporting resources in place at the right times to handle an accurately forecasted workload, at service level and with quality.” 22 It follows that forecasting call center workloads is one of the most essential functions in customer center operations. While in previous decades getting access to historical call center data was a sought-after way of maximizing customer service efficiency and finding patterns in caller behavior, 24 with organizations’ current data capabilities forecasting is the logical next step.
From a forecasting perspective the most significant is the first one, workload arrival. Cleveland 22 categorizes arrivals into three types: random, smooth, and peaked. Effectively, these categories try to explain the trend in customer contacts. Random means essentially that one customer contact is not correlated with the likelihood of the next one. Smooth arrival, on the other hand, exhibits a clear trend over time but, according to Cleveland, is usually associated with outbound rather than inbound calls. Finally, peaked arrival describes a situation where the contacts surge around a specific peak time, whether that be a certain weekday, an hour inside a day, or another short period of time. When it comes to forecasting call center contacts with quantitative methods, Cleveland identifies two different routes: explanatory forecasting, for example using regression or multivariate methods, and time-series analysis, which is found sufficiently accurate for shorter term forecasts of up to 3 months forward. 45 Explanatory methods are favored when one is looking for correlations to external factors that might influence call center volumes, such as a price increase.
Predictive analytics and forecasting
Predictive analytics is characterized by two things separating it from other types of analytics. It is based on data, in other words the user is relying on the characteristics of the data itself instead of their own assumptions in creating the model, and it is forward looking, as can be inferred from the name. While models differ greatly in how they work internally, they all more or less follow an assumption that what happened in the past will continue to happen in the future. 23 However, this does not mean that forecasts cannot be used in a changing environment, and a correction could be made that the forecaster assumes that the environment will keep changing the same way. 17 The process of building and maintaining a predictive model also often follows the same timeline, pictured below, where the model is calibrated continuously after new data is available for use within the analysis later.
According to Hyndman and Athanasopoulos 17 in the introduction to their book Forecasting: Principles and Practice, humankind has sought to predict the future in one form or another since ancient Babylon. They define forecasting as “predicting the future as accurately as possible, given all of the information available, including historical data and knowledge of any future events that might impact the forecasts”. 17 When deciding to create a quantitative forecast, numerical data of the past must be collected and available, and there must be an assumption that patterns found in the data will continue in the future. If these conditions are satisfied, a preliminary data analysis should be conducted in which the data is graphed, since visible trends or features can help in choosing the models to test. The next step is deciding which models to try in finding the best possible forecast.
One of the most severe challenges in building a forecasting model is avoiding overfitting the data by building a model that is too sophisticated right out of the gate, since the error of overfitting is most likely only noticed after the model has already been deployed and in use. 23 Overfitting means the model includes more complexity than is needed, either by including too many terms compared to the need or too much flexibility, for example by using a neural net to model a linear relationship that would be better served by a regression model. These issues can lead to inaccurate predictions, as irrelevant predictors create unneeded variation and make the model less verifiable by others due to the complexity. 46
Call center demand forecasting
Very little academic literature seems to exist on financial services operations forecasting, 47 most likely due to the differences in characteristics of various processes. The field of call center forecasting can be considered quite new, since until the 2000s the amount of research was very limited. One possible reason is that while call centers in theory generate several data points for each call, such as the caller info and call length, these have not always been recorded to a database due to the cost of storage and system capabilities. 36 Forecasting has been identified as one of the most important operational issues in call center management, 20 with a role still growing in importance in the future.
Adopting a rigorous process for creating demand forecasts can be a major first step in call center capacity and planning improvements, especially in an operation without any previous forecasting process in place. Saccani 28 divides call center capacity planning into shorter term forecasts that are concerned with capacity planning in the next few hours, or even shorter time frames, and he calls these queuing methods. The implementation of the model in the organization can also be more crucial than the sophistication of the model itself, since it has been found that simple forecasting techniques can perform as well as those that are more complex, especially for longer forecasting periods. 48 This is supported by some of the actual results of call center demand forecasting studies, where rather common time series forecasting methods such as Holt-Winters1,26 and ARIMA11,26 models were the best performing out of those chosen by the authors, especially in the short term. Some research has been done into more complex models such as stochastic programming with recourse 36 or models that are call center specific such as Erlang C. 12 However, the majority seem to favor the more common suite of time series forecasting methods. 13
Based on a literature survey on call center analysis methods used to create forecasts, the ARIMA and exponential smoothing methods were the most favored by researchers. The “time-series” column in the table refers to Autoregressive Integrated Moving Average (ARIMA) and exponential smoothing methods. Another popular forecasting method, developed by multiple authors in succession, is the linear fixed effects method, which aims at modeling the arrival rates of calls based on a Poisson distribution. The method is especially popular for interday and intraday forecasts, while for longer time periods time-series methods were favored. 48 In general, attention towards more complex models in call center forecasting has been reserved for daily and intra-daily forecasts, instead of more aggregated weekly or monthly forecasts. 28 Artificial neural networks are another popular forecasting method, which has shown mixed results according to the literature reviewed by Barrow 26 and is also focused primarily on intraday forecasts.
Research methodology
The methodology section is divided into three parts and includes an introduction to the data used in the analysis, the tools used for data manipulation and forecasting, as well as the decision on what forecasting models are being compared.
Data selection
The case company collects data from their contact center services platform into a relational SQL (structured query language) database. This call center data, a normalized sample of which can be found in the appendix, is used as the basis of the study. Additionally, the call center volume data is tested alongside two different external variables that the author hypothesizes could influence the changing volumes. These are incoming credit applications and started financing agreements, as it is worth investigating whether new agreement volumes also drive customer service activity. The data for the above variables can be found in another SQL database connected to the company’s Enterprise Resource Planning system which records agreement and credit application data.
Due to organizational data retention rules, the dataset history is limited to 166 weeks of data, of which the full scope is used to train and test the models. The data used starts on 18.1.2021 and ends with the week starting 18.3.2024. With special permission, it would be possible to attain a longer period of data, but only for a point-in-time analysis, instead of a continuous process where the forecast is updated weekly. For that reason, the data used will only cover what is within data retention rules in the company’s structured data mart. To be able to have enough observations, the forecasted time period should be either weeks or days, as months would provide less than 50 observations. The needs of the forecast model from the organization are focused on allocating resources for the next weeks as well as allowing for a ramp-up of personnel ahead of time, so daily or hourly forecasts are not crucial. For these reasons, the data for model development constitutes just over 3 years (166 weeks) of historical data and observation frequency is aggregated to a weekly level to be able to provide weekly forecasts.
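The week count and the weekly aggregation described above can be checked with a short sketch. The two boundary dates come from the text; the helper name and the sample call dates are hypothetical, and weeks are assumed to start on Monday.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week that contains the given day."""
    return day - timedelta(days=day.weekday())


first_week = date(2021, 1, 18)  # first week start reported in the text
last_week = date(2024, 3, 18)   # last week start reported in the text

# Number of weekly observations, counting both boundary weeks.
n_weeks = (last_week - first_week).days // 7 + 1
print(n_weeks)  # 166

# Hypothetical call dates, aggregated into weekly volumes.
call_dates = [date(2021, 1, 18), date(2021, 1, 20),
              date(2021, 1, 24), date(2021, 1, 25)]
weekly_volume = Counter(week_start(d) for d in call_dates)
print(weekly_volume[date(2021, 1, 18)])  # 3
```

The same period covers only about 38 calendar months, which is consistent with the statement that a monthly aggregation would yield fewer than 50 observations.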
Data preparation and analysis tools
The analysis tool used both for creating the forecast and for data wrangling is Alteryx Designer, an analysis software application for Windows that serves as the main analysis tool in the case company. There are two main reasons this tool was selected. First, as the forecast creation process is not designed to be a one-off but a continuous weekly forecast, the tool to run it should be one that has a wide user base in the company and that is approved by the company’s IT department for use in business processes. Second, the tool has an intuitive user interface and can handle both the data preparation and the running of different predictive models with very little coding knowledge needed, which makes the forecast easier to maintain in the future. The Alteryx Designer time series tools are built on top of the programming language R, a language specifically designed for statistical computing 15 that is often used in time series and other statistical analyses.
The same logic applies for the choice of forecast presentation, which is done with the Tableau business intelligence and analytics software. As Tableau is the primary business intelligence tool in the case company, the forecast report published to the contact center personnel is built with Tableau. The call center data warehouse includes data from all group companies and different departments. Therefore, the first step is to filter the data so that only the relevant data is left by identifying specific customer service “queues” that are part of the department under study. This data is then enriched by joining credit application volumes as well as financed agreement volumes and finally with temporal dimensions, most importantly the year and the week, to be able to have a variable for aggregating to weekly sums. The data is then split into a training and testing set. The most recent 33 (20%) observations of the data act as the testing set and the first 133 (80%) as the training set. In date terms this means that data after the week starting 31 July 2023 is included in the testing set while everything up to and including the week starting 31 July 2023 comprises the training set.
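The chronological split described above can be reproduced as follows. This is a minimal sketch that only uses the dates and set sizes stated in the text; the variable names are illustrative, and the actual split in the study is performed inside Alteryx Designer.

```python
from datetime import date, timedelta

N_WEEKS = 166    # total weekly observations stated in the text
TEST_SIZE = 33   # most recent observations held out for testing

# Week start dates, beginning with the first week in the dataset.
week_starts = [date(2021, 1, 18) + timedelta(weeks=i) for i in range(N_WEEKS)]

# A time series split must keep the order: no shuffling, the newest
# observations form the test set.
train_weeks = week_starts[:-TEST_SIZE]
test_weeks = week_starts[-TEST_SIZE:]

print(len(train_weeks), train_weeks[-1])  # 133 2023-07-31
print(len(test_weeks), test_weeks[0])     # 33 2023-08-07
print(round(100 * TEST_SIZE / N_WEEKS))   # 20
```

Holding out the most recent weeks, rather than a random sample, mirrors how the forecast is used in practice, where only past observations are available when predicting the coming weeks.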
Model selection: ARIMA model
In this study, time-series analysis methods, more specifically ARIMA and exponential smoothing models, were chosen to be compared for model selection. This is due to several factors. First, the needed forecast is on a weekly level and, according to numerous studies, a limited dataset, as opposed to hundreds or thousands of datapoints in an intraday sample, does not often see benefits from more complex models such as neural nets or ensemble models.14,19,28 Second, the forecasting process inside the company needs to be simple enough to be maintained in the future. A less complex forecast has been shown to be a good starting point since small prospective gains in model performance might not be worth risking the interpretability and ease of maintenance of a forecast model, since the most important thing is that the model is used and implemented in the first place.22,28
The ARIMA model is a popular time-series forecasting method used in various fields, including finance, economics, and engineering. 49 It represents a time series as a combination of its own past values and random error terms. The model consists of three components: autoregression (AR), differencing (I), and moving average (MA). The AR component models the dependence between the current observation and its past values, the I component is used to make the time series stationary, and the MA component models the dependence on past error terms. The order of the ARIMA model is specified as (p, d, q), where p is the order of the AR component, d is the degree of differencing, and q is the order of the MA component. It is also possible to include covariates beyond the past values of the forecast variable, in what is often called ARIMAX (ARIMA with exogenous variables).
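The role of the differencing order d can be illustrated with a small sketch. The function name and the sample series are hypothetical; the series is chosen so that its second differences are constant.

```python
from typing import List


def difference(series: List[float], d: int = 1) -> List[float]:
    """Apply first-order differencing d times (the I component of ARIMA)."""
    for _ in range(d):
        # Each pass replaces the series with its successive changes.
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


# Hypothetical series with an accelerating upward trend.
trend = [10.0, 12.0, 15.0, 19.0, 24.0]
print(difference(trend, d=1))  # [2.0, 3.0, 4.0, 5.0]
print(difference(trend, d=2))  # [1.0, 1.0, 1.0]
```

Each round of differencing shortens the series by one observation, which is one reason to keep d as small as stationarity allows when the dataset is limited.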
The model is defined as follows:
$$Y_t = c + \varphi_1 Y_{t-1} + \dots + \varphi_p Y_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + \beta X_t + e_t$$
where
Y_t is the value of the time series at time t.
c is a constant term.
φ_1, …, φ_p are the autoregressive coefficients.
Y_{t-1}, …, Y_{t-p} are the lagged values of the time series.
θ_1, …, θ_q are the moving average coefficients.
e_{t-1}, …, e_{t-q} are the lagged errors.
X_t represents the exogenous variables at time t.
β represents the coefficients of the exogenous variables.
e_t is the error term.
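The terms defined above combine additively in the standard ARIMAX formulation. The following minimal sketch computes a one-step-ahead point forecast from them, assuming the series has already been differenced (d = 0 in the code) and that the future error e_t is replaced by its expected value of zero. The function name and all coefficient values are hypothetical; in practice the coefficients are estimated by the forecasting software.

```python
from typing import Sequence


def arimax_one_step(
    c: float,
    phi: Sequence[float],          # AR coefficients phi_1..phi_p
    theta: Sequence[float],        # MA coefficients theta_1..theta_q
    beta: Sequence[float],         # one coefficient per exogenous variable
    past_values: Sequence[float],  # Y_{t-1}, Y_{t-2}, ... (most recent first)
    past_errors: Sequence[float],  # e_{t-1}, e_{t-2}, ... (most recent first)
    exog: Sequence[float],         # exogenous values X_t at time t
) -> float:
    """One-step-ahead ARIMAX point forecast with E[e_t] = 0."""
    if len(past_values) < len(phi) or len(past_errors) < len(theta):
        raise ValueError("not enough lagged values or errors for the model order")
    if len(exog) != len(beta):
        raise ValueError("exog and beta must have the same length")
    ar_part = sum(p * y for p, y in zip(phi, past_values))
    ma_part = sum(q * e for q, e in zip(theta, past_errors))
    exog_part = sum(b * x for b, x in zip(beta, exog))
    return c + ar_part + ma_part + exog_part


# Hypothetical ARIMAX(2, 0, 1) with one exogenous variable.
forecast = arimax_one_step(
    c=0.1, phi=[0.5, 0.2], theta=[0.3], beta=[0.4],
    past_values=[0.6, 0.5], past_errors=[0.1], exog=[0.25],
)
print(round(forecast, 4))  # 0.63
```

Setting beta and exog to empty sequences reduces the calculation to a plain ARMA forecast, which reflects how the ARIMA and ARIMAX variants differ only in the exogenous term.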
In conclusion, ARIMA and ARIMAX models are popular and effective time-series forecasting techniques that have been widely used in various fields. ARIMAX models are particularly useful when external factors need to be considered in the forecasting process. In this study, ARIMA and exponential smoothing models were chosen to be compared for model selection in the case company. This is due to several reasons. First, the needed forecast is on a weekly level and, according to numerous studies, a limited dataset, as opposed to hundreds or thousands of datapoints in an intraday sample, does not often see benefits from more complex models such as neural nets or ensemble models.14,19 Moreover, the forecasting process inside the company needs to be simple enough to be maintained in the future. A less complex forecast has been shown to be a good starting point since small prospective gains in model performance might not be worth risking the interpretability and ease of maintenance of a forecast model. 22 The ARIMA model is tested both without and with the external variables, agreement volume and application volume. Therefore, four different models are compared to find the best fit for the case company: an exponential smoothing model (ETS), an ARIMA model, and two ARIMAX models, one using the number of financed agreements as the explanatory variable and the other using the number of credit applications.
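As a point of comparison with ARIMA, the simplest member of the exponential smoothing family can be sketched in a few lines. This is simple exponential smoothing with a fixed smoothing parameter, offered only as an illustration; it is an assumption that the ETS model selected in the study may include trend or seasonal components that this sketch omits, and the parameter value and data are hypothetical.

```python
from typing import Sequence


def simple_exponential_smoothing(series: Sequence[float], alpha: float) -> float:
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The level is initialized with the first observation and then updated
    as a weighted average of each new observation and the previous level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = series[0]
    for observation in series[1:]:
        level = alpha * observation + (1.0 - alpha) * level
    # With no trend or seasonality, the forecast equals the final level.
    return level


# Hypothetical normalized weekly volumes.
volumes = [0.4, 0.6, 0.5]
print(simple_exponential_smoothing(volumes, alpha=0.5))  # 0.5
```

A higher alpha makes the forecast react faster to recent weeks, while a lower alpha averages over a longer history.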
Data collection and analysis
The case company collects data from its contact center services platform into a relational Structured Query Language (SQL) database. The call center data, a normalized sample of which can be found in the appendix, is used as the basis of the study. Additionally, the call center volume data is tested alongside two external variables that the author hypothesizes could influence the changing volumes. These are incoming credit applications and started financing agreements, as it is worth investigating whether new agreement volumes also drive customer service activity. The data for these variables can be found in another SQL database, connected to the company’s Enterprise Resource Planning system, which records agreement and credit application data.
The call center data warehouse includes data from all groups of companies and different departments. Therefore, the first step is to filter the data down to the relevant records by identifying the specific customer service “queues” that belong to the department under study. This data is then enriched by joining credit application volumes and financed agreement volumes, and finally temporal dimensions. Due to organizational data retention rules, the dataset history is limited to 166 weeks (just over 3 years), all of which are used to train and test the models. The data starts with the week of 18.1.2021 and ends with the week starting 18.3.2024. The organization needs the forecast for allocating resources over the coming weeks and for ramping up personnel ahead of time, so daily or hourly forecasts are not crucial. For these reasons, the observations are aggregated to a weekly level to provide weekly forecasts. The data is then split into a training and a testing set: the most recent 33 observations (20%) act as the testing set and the first 133 (80%) as the training set.
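The chronological split described above can be sketched in a few lines of Python. This is a minimal illustration using only the dates and set sizes stated in the text. The variable names are assumptions, and the sketch does not reproduce the Alteryx workflow used in the study.

```python
from datetime import date, timedelta

# First and last week-start dates as reported in the text (day.month.year).
first_week = date(2021, 1, 18)
last_week = date(2024, 3, 18)

# One observation per week, both endpoints included.
n_weeks = (last_week - first_week).days // 7 + 1
week_starts = [first_week + timedelta(weeks=i) for i in range(n_weeks)]

# Chronological split: the most recent 33 weeks are held out for testing.
test_size = 33
train_weeks = week_starts[:-test_size]
test_weeks = week_starts[-test_size:]

print(n_weeks, len(train_weeks), len(test_weeks))
```

The split is taken from the end of the series rather than at random, so that the models are always evaluated on weeks that come after the weeks they were fitted on.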
The analysis tool used both for creating forecasts as well as data wrangling is Alteryx Designer, an analysis software application for Windows. This tool has an intuitive user interface and can handle both the data preparation and running of different predictive models with very little coding knowledge needed. The Alteryx Designer time series tools are built on top of the programming language R.15
The ARIMA and exponential smoothing models have numerous parameters that influence the accuracy of the forecast. The ordering parameters of the autoregressive models were chosen with R programming tools.
Results analysis
Exploratory data analysis
First, an exploratory data analysis should be conducted. It covers the key summary statistics of the data under study, such as the number of data points, their variability, and the number of variables, as well as visualizations that make it easier to interpret the type of data being analyzed.23 Understanding the data at hand and its characteristics is an important part of deciding which models to use for the analysis.51 Important characteristics to determine include ordering, distances, and continuity, as well as aggregated statistics.52
As the study collected temporal data, it can be seen from Figure 1 that the data at hand is an ordered set (ordered by date), with a distance of 7 days between observations, and discrete. A normalized sample of the data can also be found in Appendix 1. Table 1 shows aggregated statistics, namely the average, median, maximum, and minimum values as well as the standard deviation, for each of the three variables under study over all 166 observations. The statistics have been normalized for easier comparison, so that the maximum value of each variable is equal to 1 and the others are presented as a fraction of that maximum (Figure 1).
Figure 1. Source data time-series line graph including call, credit application, and financed agreement volumes.
Table 1. Source data descriptive statistics, normalized values.
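The normalization and the aggregated statistics described above can be sketched as follows. The weekly values are hypothetical placeholders, since the raw volumes are not published. The use of the sample standard deviation is also an assumption, as the text does not state which variant was reported.

```python
import statistics


def normalize_by_max(values):
    """Scale a series so that its maximum equals 1."""
    peak = max(values)
    return [v / peak for v in values]


def describe(values):
    """Aggregated statistics of the kind listed for Table 1."""
    return {
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "maximum": max(values),
        "minimum": min(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (assumption)
    }


# Hypothetical weekly call volumes, for illustration only.
weekly_calls = [820, 760, 905, 870, 790, 845]

normalized_calls = normalize_by_max(weekly_calls)
summary = describe(normalized_calls)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because each variable is divided by its own maximum, the resulting statistics are comparable across variables but no longer reveal absolute volumes.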
From the aggregated statistics we can see that the variables are all relatively stable. The average and the median do not differ strongly, signifying that the data most likely does not contain significant outliers in either direction. This is further supported by the minimum values, which lie at roughly the same distance from the average as the maximum values.
Time series forecasting model comparison
The autocorrelation function (ACF) and the partial ACF (PACF) for the aggregated calls data, the basis for the ARIMA model, are plotted in Figure 2. The ACF and PACF can be used in determining the orders of the AR and MA components, as they show the correlation between current and past values. The specific parameters of all compared models were selected in R using the Hyndman-Khandakar algorithm. As the plot is sinusoidal, we can detect some level of seasonality. The ACF also decays towards zero, so we can determine that there is at least some level of autocorrelation. Despite this, the best fitting ARIMA model without external variables was an ARIMA (0,1,1), which is very close to an exponential smoothing model. This would suggest that the autocorrelation is not strong enough to add value to the model. The best fit suggests that the data shows a trend over time and that observations are not heavily correlated with past values, while the model residuals of the previous observations influence the current period.
Figure 2. ACF and PACF plots of aggregated calls data.
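To make the ACF discussed above concrete, the sketch below computes the sample autocorrelation of a series in plain Python. It is an illustrative implementation of the standard estimator, not the R routine used in the study. The PACF is omitted for brevity, and the input series is a hypothetical periodic example.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    denominator = sum(d * d for d in deviations)
    correlations = []
    for lag in range(max_lag + 1):
        numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
        correlations.append(numerator / denominator)
    return correlations


# Hypothetical series with a period of 4 observations, for illustration only.
example_series = [1, 0, -1, 0] * 5
correlations = acf(example_series, max_lag=4)

# Approximate 95% bounds under the assumption of white noise.
bound = 1.96 / math.sqrt(len(example_series))
for lag, value in enumerate(correlations):
    flag = "outside bounds" if abs(value) > bound else "within bounds"
    print(lag, round(value, 2), flag)
```

A periodic input such as this one produces autocorrelations that alternate in sign across lags, which is the sinusoidal pattern that the text interprets as a sign of seasonality.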
Similarly, for the ARIMAX model with agreement volumes, the best fitting parameters are again ARIMA (0,1,1), suggesting characteristics similar to those of the ARIMA model without external variables: the data exhibits some trend over time, but observations are not strongly correlated. However, for application volume the best fitting ARIMAX model was ARIMA (1,0,0). This model has one autoregressive term but no differencing or moving average component (i.e., it is a first-order autoregressive model). The observations are therefore best predicted from their relatively strong correlation with the previous value. The selected exponential smoothing model was an ETS (A, N, N) model, chosen by comparing the AICc values of the different options. Figure 3 presents the decomposition components of the ETS model. The (A, N, N) specification means that the model is a simple exponential smoothing model with additive errors, here with an alpha of 0.279.
Figure 3. ETS (A, N, N) decomposition plot of aggregated calls data.
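The behaviour of a simple exponential smoothing model with additive errors can be illustrated with the following minimal sketch. It uses the reported alpha of 0.279, but the input values are hypothetical. Initializing the level with the first observation is a simplifying assumption, whereas ETS software typically estimates the initial level.

```python
def simple_exponential_smoothing(series, alpha, initial_level=None):
    """Return the one-step-ahead fitted values and the final level."""
    level = series[0] if initial_level is None else initial_level
    fitted = []
    for observation in series:
        fitted.append(level)  # forecast made before the observation is seen
        level = level + alpha * (observation - level)  # error-correction update
    return fitted, level


# Hypothetical normalized weekly call volumes, for illustration only.
calls = [0.80, 0.74, 0.78, 0.70, 0.72, 0.66]
fitted_values, final_level = simple_exponential_smoothing(calls, alpha=0.279)

# With no trend or seasonal component, every future week gets the same forecast.
horizon = 4
forecast = [final_level] * horizon
print([round(value, 3) for value in forecast])
```

This also illustrates why the selected ARIMA (0,1,1) model behaves so similarly: simple exponential smoothing is known to produce the same point forecasts as an ARIMA (0,1,1) model whose moving average coefficient equals alpha minus 1, under the sign convention in which the moving average term is added.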
Table 2. Comparison of models’ error terms.
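The text does not list which error terms were compared, so the sketch below assumes a set of measures commonly reported for forecast comparisons: mean error (ME), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The function name and the numbers are hypothetical.

```python
import math


def forecast_errors(actual, forecast):
    """Common accuracy measures for a forecast evaluated on a test set."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined if any actual value is zero.
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical test-set values and forecasts, for illustration only.
actual_calls = [100, 200]
model_forecast = [110, 190]
metrics = forecast_errors(actual_calls, model_forecast)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Computing the same measures for each candidate model on the same 33-week test set allows the models to be ranked on a like-for-like basis.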
Figure 4 displays the plotted values of the compared forecasting models. The winning ARIMA model is almost equal to the chosen ETS configuration. The longer-term downward trend seems not to be accounted for by the ARIMAX models tested, as there is no corresponding long-term trend for the external variables, as can be seen from Figure 4.
Figure 4. Plotted forecasts on top of training and testing data.
The identified model can then be used to generate a forecast for the upcoming weeks, giving call centre management a preliminary indication of whether call centre volumes are trending up or down during the forecasted period. Management can then use this indicative forecast to create a resourcing plan and make more informed staffing decisions. Another valuable finding is that new financing and application volumes in this financial services call centre do not have a significant impact on call centre volumes, at least in the 133-week period that made up the training data set.
Discussions
This study addresses the need for forecasting models in the financial sector to predict call centre volumes, an essential aspect for optimizing service levels and resource allocation. With the rising significance of service level optimization, this study aimed to identify suitable forecasting models in a case company focusing on the effect of incorporating new financing or credit application volumes into the forecasting process.
Theoretical implications
Based on the existing literature, it seems clear that being able to produce even a rudimentary forecast for customer service operations is one of the most important steps in improving efficiency,22 and has been identified as such for a long time.24 As the case company already has data tables available that record call center activities, it seems like an obvious progression to utilize the data not only for historical tracking, but for looking forward as well.
While workload forecasting in general, and call center volume forecasting specifically, has a wide array of predictive analytics methods to choose from, existing studies suggest that favoring simplicity in the initial forecast brings nearly the same benefits as a more complex model, particularly when the data is aggregated to a weekly forecast.28,48 For this reason, and because of the limited number of data points, common time series analysis was chosen as the preferred method in this study. This choice is supported by the results of several similar studies of call center volumes in other industries,1,11,26 and by findings that such models are often superior to more complex ones when forecasting over a horizon longer than daily or intraday.
While ensemble methods and neural networks have also seen encouraging results in time series forecasting,14,19,48 a simpler starting model is preferred for the aforementioned reasons. A simpler model also ensures that the forecast can be maintained and adjusted relatively easily in the organization, since forecasting is often, as in this case, an ongoing process rather than a single event, and it avoids overfitting caused by adding complexity for its own sake.
Practical implications
This study sets out to answer three questions, the first one concerning what types of models should be tested and compared when building an initial time series forecasting model for an organization which does not yet have a process in place. According to the literature review, even a simple model is an important tool in call center management,22,24,28 and favoring simplicity does not necessarily mean less accurate forecasts.28 Due to these findings, simpler models commonly used in time series analysis, such as ARIMA and ETS, are favored.1,11,26
The goal of the second research question was to find out which of the chosen models is best at forecasting the call center data of the case company in particular. An error term comparison was done between three types of ARIMA models and an exponential smoothing model. The data was cleaned and aggregated to a weekly level, containing 166 weeks of data, and divided into a training and a testing set. The parameters of the time series models were chosen using R programming tools to arrive at the most accurate ones. In the end, the simpler models were the most accurate: an ARIMA model with no external variables had the smallest error terms across the board, followed by a simple exponential smoothing model with additive errors. The ARIMA model with no external variables is therefore the one used in building the forecast process.
The answer to the second question in essence also answers the third research question. The goal was to investigate whether the external variables identified in the organization, new agreement volumes and credit application volumes, drive call center volumes to a significant extent. This seems not to be the case, at least not to the extent that they would provide more accurate forecasts than the historical call center volumes on their own. The explanation might be that, while new agreements and applications obviously do generate customer contacts, most of the call volume is driven by the existing financed portfolio rather than by new agreements.
Study limitations
From a data analysis perspective, the biggest limitations of the study are the sample size of 166 observations and, in a way, data availability. The limited number of observations, which are restricted by the case company’s database data retention rules, and variables, which are limited by the data collection capabilities, essentially rule out some more sophisticated forecasting models, such as the use of neural networks.
As found in the literature review, one can achieve good results even without complex models. However, factors other than the ones included in this study might have a larger impact on call center volumes than the historical values do. Testing these would require more extensive data collection in the organization. One clear limitation is that the subject of the calls is not recorded, so it is currently not possible to create accurate forecasts for specific customer issues, some of which take more time to handle than others.
Another limitation is that there are some external factors that influence call center volumes that are hard to quantify in a statistical forecast. One such factor is the development of the company’s self-service channels, which can quickly reduce the need for phone-based customer queries. This is something that must be considered by the call center managers when interpreting the forecast dashboard.
Future research
In future studies, if data collection around call center activities at the case company becomes more extensive, it could be worthwhile to investigate what factors, if any, have a significant impact on volume changes. Data exists on the lengths of individual calls, and if this could be combined with data points describing the subject of the call at any level of specificity, it would be possible to create more granular forecasts down to the daily and hourly level. A daily or hourly forecast would also increase the number of data points available within the constraints of the company’s data retention rules. This would potentially also allow for more methodological flexibility.
Furthermore, when it comes to operative forecasting there is another unexplored area within customer service with plenty of data points and the possibility for many types of analysis enabled by emerging pattern recognition capabilities. This area is emails, which are also tracked within a customer engagement software platform. With a view to how much time certain tasks take on average, there is much potential in having an even clearer picture of the company’s future operative resource allocation.
Moreover, customer happiness, loyalty, and brand perception are all directly impacted by customer experience, which is a critical factor in a company’s success. Businesses are using technology more and more to improve customer service in today’s highly competitive business world. In this field, artificial intelligence (AI) has opened up new ways to satisfy clients’ ever-evolving demands.53 AI-powered customer service is radically changing how companies interact with their customers by providing prompt, proactive, and individualized assistance. Future studies could also explore the application of AI to customer satisfaction, for the benefit of both companies and their customers.
Conclusions and future work
The research questions guided the investigation, emphasizing the exploration of appropriate forecasting models, determination of the best-performing model for the company, and assessment of the impact of new financing or credit application volumes on forecast accuracy. A quantitative methodology was adopted, utilizing data collected from the case company’s SQL database tables. Based on the rigorous literature review, four time series analysis models were chosen for comparison, tailored to the characteristics of the collected data and the requirement for weekly-level forecasting.
The results of the model comparison revealed that the inclusion of financing or credit application volumes did not significantly affect forecasting accuracy for the case company. This study set out to answer the question concerning the types of models to be tested and compared when building an initial time series forecasting model for an organization which does not yet have a process in place. According to the literature review, even a simple model is an important tool in call center management, and favoring simplicity does not necessarily mean less accurate forecasts. Due to these findings, simpler models commonly used in time series analysis, such as ARIMA and ETS, are favored.1,11
The next goal of the research was to find out which of the chosen models is best at forecasting the call center data of the case company in particular. An error term comparison was done between three types of ARIMA models and an exponential smoothing model. The data was cleaned and aggregated to a weekly level, containing 166 weeks of data, and divided into a training and a testing set. The results show that the simpler models were the most accurate: an ARIMA model with no external variables had the smallest error terms across the board, followed by a simple exponential smoothing model with additive errors.
