Forecasting Patient Visits to Hospitals using a WD & ANN-based Decomposition and Ensemble Model

Forecasting the number of patient visits to hospitals has attracted increasing interest from both theoretical and practical perspectives. To improve the accuracy of hospital visit forecasts, this paper proposes a hybrid approach that couples wavelet decomposition (WD) with an artificial neural network (ANN) under the framework of “decomposition and ensemble”. In this model, WD is first employed to decompose the original monthly data of the number of patient visits to hospitals into several components and one residual term. Then the ANN, a powerful prediction tool, is fitted to each decomposed component to generate individual prediction results. Finally, all individual prediction values are fused into the final prediction output by simple addition. For illustration and verification, four monthly series of the number of patient visits to hospitals are used as sample data, and the results show that the proposed model obtains significantly more accurate forecasts than all the popular forecasting techniques considered.


INTRODUCTION
Efficient hospital management depends to some degree on the appropriate allocation of material resources and on proper physician and nurse staffing, because these resources are limited and hospital budgets are under pressure. Forecasting the number of patient visits to hospitals can help in allocating the limited human and material resources of hospitals (Hadavandi et al., 2012). For instance, forecasting the short-term hospital census may improve inpatient bed allocation and reduce the incidence of overstaffing and understaffing (Littig et al., 2007). Because of its value for hospital resource allocation, forecasting the number of patient visits has received growing attention and now plays a significant role in hospital management (Safar & Alkhezzi, 2016). More accurate predictions of hospital visits can therefore contribute to more efficient hospital management.
In the existing literature, many studies have been carried out in this field, and traditional forecasting techniques, especially autoregressive integrated moving average (ARIMA) modeling, are quite popular. For instance, Milner (1997) modeled attendances at accident and emergency departments with a one-off ARIMA model. Abdel-Aal and Mangoud (1998) applied an ARIMA model to forecast the monthly patient volume at a health care clinic. Diaz et al. (2001) used ARIMA to forecast emergency hospital admissions. Zibners (2006) used Box-Jenkins ARIMA to predict patient visits to an academic pediatric emergency department (ED). Friede et al. (2009) utilized ARIMA to predict daily trauma admissions, considering the effect of weather, weekday and other variables. Abraham et al. (2009) applied a seasonal ARIMA to forecast emergency occupancy. Sun et al. (2009) established ARIMA models to predict daily attendance at an ED. Kam (2010) used seasonal ARIMA to predict the daily number of patient visits to an ED. Chen et al. (2011) constructed an ARIMA model to predict ED visitor volume. Kadri et al. (2014) used ARIMA to forecast daily patient attendances. Exponential smoothing, a common time series forecasting tool, was used by Bergs et al. (2014) to model monthly ED visits. Kovats et al. (2004) applied an autoregressive Poisson model to daily emergency hospital admissions. Koestler et al. (2013) introduced a seasonality-adjusted Poisson autoregressive (PAR) model for hospital census forecasting.
Other traditional methods have also been applied to forecasting hospital visits. Dan and Qualls (1994) compared five types of prediction models, including raw observations, moving averages, mean values with moving averages, seasonal indicators with moving averages, and ARIMA, in predicting ED volume, length of stay, and acuity, and found that the simpler models performed best. Rotstein et al. (1997) formulated a General Linear Model (GLM) to describe ED patient volume, which indicated the model's usefulness in short-range forecasting of patient volume. Batal et al. (2001) developed prediction equations for daily patient volume via stepwise linear regression analysis. Reis and Mandl (2003) constructed a trimmed mean seasonal model of the expected number of daily patient visits to an ED. Brillman et al. (2005) established a first-order cyclical regression model with fixed-width sine and cosine harmonics as the seasonal component, and a hierarchical model with a scalable Gaussian function as the seasonal component, for ED daily respiratory chief complaints. Flottemesch et al. (2007) established a mathematical model of ED census that could be used as a forecasting method. In the study of Boyle et al. (2008), a general linear regression model with 11 dummy variables performed best in predicting monthly patient admissions. Jones (2009) utilized vector autoregression (VAR) to model and forecast demand in the emergency department. Au-Yeung et al. (2009) predicted patient arrivals at an accident and emergency department via a structural time series model.
Among these traditional approaches, prediction efficacy differs. Morzuch and Allen (2006) compared an unobserved components model (UCM) and double exponential smoothing in predicting hourly hospital ED arrivals, and the latter outperformed the former. Schweigler et al. (2009) applied three forecasting models to ED bed occupancy: an hourly historical average, seasonal ARIMA, and a sinusoidal model with an autoregression-structured error term. Marcilio et al. (2013) used time-series methods including generalized linear models, generalized estimating equations and seasonal ARIMA to forecast daily ED visits. Kim et al. (2014) applied exponential smoothing, ARIMA, seasonal ARIMA, and generalized autoregressive conditional heteroscedasticity (GARCH) methods in patient volume forecasting. Although these traditional techniques are popular in forecasting hospital visits, they are not good at dealing with the complexity of hospital visits data.
Another approach to hospital visits forecasting is artificial intelligence (AI), such as the artificial neural network (ANN) (Pan, 2017). Although ANN is not as prevalent as the traditional methods in this field, it is widely used in other prediction fields as a powerful forecasting tool, and a few studies of hospital visits have involved AI. For example, Jones et al. (2008) compared seasonal ARIMA, exponential smoothing, time series regression and ANN with linear regression in daily ED patient volume prediction; the former four methods did not bring consistently good out-of-sample forecasts, although they all showed a better in-sample fit. Aladag and Aladag (2012) explored modeling the number of outpatient visits by ANN with different activation functions. Xu et al. (2013) modeled daily patient arrivals at an ED via ANN. However, although superior to traditional methods, AI methods have their own shortcomings. For example, local minima and overfitting sometimes occur when using ANN.
Since both single traditional and AI methods seem to have reached a bottleneck in improving the accuracy of predicting patient visits to hospitals, some hybrid models have been formulated to explore this field further. Hybrid models are advantageous in dealing with data complexity, which can lead to better forecasting results. Cheng et al. (2008) coupled a new fuzzy time series method based on a weighted-transitional matrix with an expectation method and a grade-selection method to predict the number of outpatient visits. Garg et al. (2012) proposed a model based on fuzzy time series to forecast the number of outpatient visits. Hadavandi et al. (2012) developed a hybrid AI model using a genetic algorithm for outpatient visits forecasting. These hybrid models provided a new platform for hospital visits prediction.
To address the shortcomings of single traditional and AI models, this study proposes a hybrid decomposition-and-ensemble learning paradigm incorporating wavelet decomposition (WD), artificial neural network (ANN) and simple addition (ADD) to predict the number of patient visits to Chinese hospitals, based on the principle of "decomposition and ensemble" (Yu et al., 2008; Wang et al., 2011; Wang, Tang, Yu, 2011; Tang et al., 2012; Yu et al., 2015; Tang et al., 2015). In this model, WD is first employed to decompose the original monthly data of the number of patient visits to hospitals into several details and one approximation. Then ANN, as a powerful prediction tool, is implemented to model each decomposed component and generate individual prediction results. Finally, all individual prediction values are fused into the final prediction output by simple addition (ADD).

Contribution of this paper to the literature
• The WD-FNN-ADD learning paradigm can effectively improve forecasting precision, in terms of the RMSE and MAPE criteria, since the WD-FNN-ADD model outperforms all the benchmark models. The proposed WD-FNN-ADD model is a very promising tool for forecasting complex time series data of hospital visits.
• The proposed WD-FNN-ADD approach can also be utilized to address other difficult forecasting tasks, especially for complex, volatile data with multiple data characteristics.
• Selecting the most appropriate wavelet for wavelet decomposition is crucial, since it influences the decomposition results and accordingly the forecasting results.

Generally speaking, because of the high irregularity and volatility of the number of patient visits to hospitals, the principle of "decomposition and ensemble" is introduced to cope with the difficulties in modeling hospital visits. Consequently, a hybrid decomposition-and-ensemble learning paradigm integrating WD, ANN and ADD is proposed. In this paradigm, WD, as a decomposition tool, is first performed to divide the nonlinear, complicated hospital visits data into several relatively simple components, which are easier to process further and thus bring better forecasting performance.
The main motivation of this study is to propose a hybrid ensemble learning paradigm integrating WD, ANN and ADD to predict the number of patient visits to Chinese hospitals and to compare its prediction performance with some existing popular forecasting techniques. The rest of the paper is organized as follows. Section 2 describes the proposed hybrid ensemble learning paradigm in detail. In Section 3, the forecasting results and the effectiveness of the proposed methodology are discussed. Finally, conclusions are drawn in Section 4.

METHOD
Due to the volatility of hospital visits, a novel WD & ANN-based decomposition-and-ensemble approach is introduced to forecast the number of patient visits to hospitals. The general framework of the proposed hybrid learning paradigm is illustrated in Figure 1. Generally speaking, there are three main steps involved. In this section, WD is first presented in Section 2.1 and ANN is then described in Section 2.2. Finally, the WD&ANN-based decomposition-and-ensemble algorithm is formulated in Section 2.3.

Wavelet Decomposition (WD)
Wavelet decomposition was proposed to overcome the drawbacks of the Fourier transform. The Fourier transform requires a signal to be periodic and assumes that its frequencies do not evolve in time. To relax this stationarity assumption, the short-time Fourier transform applies a single fixed window, so that more cycles are captured in a window as frequency increases. Unlike the short-time Fourier transform, the striking feature of the wavelet transform lies in its adjustable windows: a long window is used for low frequencies and a short window for high frequencies (Benhmad, 2011). In general, the wavelet transform is superior in representing local details of a signal and capturing transient changes in a signal, which explains its popularity in signal analysis, image processing, medical imaging and diagnosis, etc. For example, Geva and Kerem (1998) decomposed the electroencephalogram (EEG) signal via the fast wavelet transform (FWT) to forecast generalized epileptic seizures. In the study of Sierra et al. (1997), an electrocardiogram signal was processed by wavelet multi-scale decomposition for further risk prediction.
The original time series $x_t$ ($t = 1, 2, \ldots, T$) can be decomposed into several components by the discrete wavelet transform at a given scale $J$, which can be expressed as

$$x_t = \sum_{k} s_{J,k}\,\phi_{J,k}(t) + \sum_{j=1}^{J} \sum_{k} d_{j,k}\,\psi_{j,k}(t), \qquad (1)$$

where the former term is a low-frequency approximation containing scaling functions $\phi_{J,k}(t)$ ($k$ is determined by the data length and $j$) with coefficients $s_{J,k}$. The other term in Eq. (1), consisting of mother wavelet functions $\psi_{j,k}(t)$ with coefficients $d_{j,k}$, is composed of a series of high-frequency details.
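As a minimal sketch of the additive structure described above, the following Python code splits a series into one low-frequency approximation and several high-frequency details. It uses the Haar wavelet purely for simplicity, which is an assumption of this illustration rather than the wavelet used in the experiments, and the function name `haar_components` and the toy data are hypothetical. The point it illustrates is that the approximation and the details sum back to the original series.

```python
def haar_components(series, levels):
    """Split `series` into one approximation and `levels` details (Haar wavelet).

    Every component has the same length as the input, and
    approximation + sum of details reproduces the input.
    The length must be a multiple of 2**levels (a restriction of this sketch).
    """
    n = len(series)
    if levels < 1 or n == 0 or n % (2 ** levels) != 0:
        raise ValueError("len(series) must be a positive multiple of 2**levels")

    approx = [float(v) for v in series]   # level-0 approximation is the data itself
    details = []                          # details[0] is the finest scale
    for level in range(1, levels + 1):
        block = 2 ** level
        coarser = []
        for start in range(0, n, block):
            # The Haar approximation at this level is the mean of each dyadic block.
            mean = sum(approx[start:start + block]) / block
            coarser.extend([mean] * block)
        # The detail is what is lost when moving to the coarser approximation.
        details.append([a - c for a, c in zip(approx, coarser)])
        approx = coarser
    return approx, details


visits = [4, 6, 10, 12, 8, 6, 5, 5]       # toy data, not from the paper
approx, details = haar_components(visits, levels=2)
rebuilt = [approx[t] + sum(d[t] for d in details) for t in range(len(visits))]
print(approx)
print(details)
print(rebuilt)
```

Because the components are additive, forecasts made separately for each component can later be combined by a plain sum.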

Artificial Neural Network (ANN)
With flexible function designs and powerful ability of self-learning, artificial intelligence (AI) is superior to traditional forecasting methods.In addition, artificial neural network (ANN) may be one of the most effective AI tools and as a typical learning paradigm, it is widely employed in forecasting of complex data.In this study, a standard three-layer feed-forward neural network (FNN) (Hornik et al., 1989;White, 1990), based on error backpropagation algorithm, is applied for modeling the number of visits in hospitals.
Usually, an FNN-based forecasting model is trained on the in-sample dataset and then applied to the out-of-sample dataset for prediction. The model parameters (connection weights and node biases) are adjusted iteratively by minimizing the forecasting error function. Basically, the final output of the FNN-based forecasting model can be represented as

$$f(x) = a_0 + \sum_{j=1}^{q} w_j\,\varphi\Big(a_j + \sum_{i=1}^{p} w_{ij}\,x_i\Big), \qquad (2)$$

where $x_i$ ($i = 1, 2, \ldots, p$) represents the input patterns, $f(x)$ is the output, $a_j$ ($j = 0, 1, 2, \ldots, q$) is a bias on the $j$th unit, $w_{ij}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$) and $w_j$ are the connection weights between layers, $\varphi(\cdot)$ is the transfer function of the hidden layer, $p$ is the number of input nodes, and $q$ is the number of hidden nodes. In effect, the FNN model in Eq. (2) performs a nonlinear functional mapping from past observations $(x_{t-1}, x_{t-2}, \ldots, x_{t-p})$ to the future value $x_{t+h}$, i.e.

$$x_{t+h} = g(x_{t-1}, x_{t-2}, \ldots, x_{t-p}, w), \qquad (3)$$

where $h$ is the horizon, $w$ is the vector of all parameters, and $g(\cdot)$ is the function trained by FNN.
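To make the forward computation of the three-layer network concrete, the sketch below evaluates the output for one input pattern: each hidden node applies the transfer function to its bias plus a weighted sum of the inputs, and the output node adds its own bias to a weighted sum of the hidden activations. The function name `fnn_output`, the argument layout, the choice of `tanh` as transfer function, and the numeric weights are all assumptions made for illustration; training by backpropagation is not shown.

```python
import math


def fnn_output(inputs, hidden_weights, hidden_biases, output_weights, output_bias,
               transfer=math.tanh):
    """Forward pass of a three-layer feed-forward network with one output node.

    hidden_weights[j][i] is the weight from input i to hidden node j.
    """
    hidden = []
    for weights_j, bias_j in zip(hidden_weights, hidden_biases):
        # Net input of hidden node j: bias plus weighted sum of the inputs.
        net = bias_j + sum(w * x for w, x in zip(weights_j, inputs))
        hidden.append(transfer(net))
    # Output node: bias plus weighted sum of the hidden activations.
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Three lagged inputs and two hidden nodes; all numbers are illustrative only.
lagged_values = [0.2, -0.1, 0.4]
prediction = fnn_output(
    lagged_values,
    hidden_weights=[[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(prediction)
```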

WD & ANN-based Decomposition-and-Ensemble Paradigm
Based on the techniques above, a hybrid WD&ANN-based decomposition-and-ensemble paradigm for patient volume is formulated as shown in Figure 1.
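With a given time series $x_t$ ($t = 1, 2, \ldots, T$), a multi-step-ahead prediction is performed in the following form:

$$\hat{x}_{t+h} = f(x_t, x_{t-1}, \ldots, x_{t-(l-1)}), \qquad (4)$$

where $\hat{x}_{t+h}$ is the predicted value in period $t+h$, $x_t$ is the actual value, $l$ represents the lag order and $h$ is the horizon.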
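The multi-step-ahead setup above can be sketched as the construction of input/target pairs from a series. The helper name `make_patterns` is hypothetical, and the sketch assumes a direct strategy in which the inputs are the most recent observations up to the current period and the target lies a fixed number of periods (the horizon) ahead.

```python
def make_patterns(series, lag, horizon):
    """Return (inputs, targets) for direct multi-step-ahead forecasting.

    Each input holds the `lag` most recent values up to period t (newest first),
    and the matching target is the value at period t + horizon.
    """
    inputs, targets = [], []
    for t in range(lag - 1, len(series) - horizon):
        inputs.append([series[t - k] for k in range(lag)])
        targets.append(series[t + horizon])
    return inputs, targets


toy_series = [1, 2, 3, 4, 5, 6]            # toy data, not from the paper
inputs, targets = make_patterns(toy_series, lag=3, horizon=2)
for pattern, target in zip(inputs, targets):
    print(pattern, "->", target)
```

A longer horizon leaves fewer usable pairs, which matters for short monthly series.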
There are three main steps in the proposed paradigm, as illustrated in Figure 1.
Step 1: The original time series $x_t$ ($t = 1, 2, \ldots, T$) is decomposed into one approximation and several details via wavelet decomposition (WD).
Step 2: The components obtained in Step 1 are each modeled using FNN, a powerful forecasting tool, to produce the corresponding predictions.
Step 3: The predictions for all wavelet decomposition components are fused into the final prediction for the original time series $x_t$ via simple addition (ADD). Since WD decomposes the original data $x_t$ into several additive components, the individual predictions can be summed to produce the final result.
To summarize, the proposed WD&ANN-based decomposition-and-ensemble learning paradigm can be abbreviated as a "WD (decomposition)-FNN (prediction)-ADD (ensemble)" hybrid learning approach.
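The decomposition, prediction and ensemble chain summarized above can be sketched in a few lines. This is only an illustration under stated assumptions: the decomposition is a one-level Haar split (not the wavelet or level used in the experiments), the per-component forecaster `mean_of_recent` is a deliberately trivial placeholder standing in for a trained FNN, and all function names and data are hypothetical.

```python
def haar_level1(series):
    """One-level Haar split: returns [approximation, detail], each as long as the input."""
    if len(series) % 2 != 0:
        raise ValueError("this sketch needs an even number of observations")
    approx, detail = [], []
    for i in range(0, len(series), 2):
        mean = (series[i] + series[i + 1]) / 2
        approx += [mean, mean]
        detail += [series[i] - mean, series[i + 1] - mean]
    return [approx, detail]


def mean_of_recent(component, lag):
    """Placeholder forecaster (NOT an FNN): average of the last `lag` values."""
    recent = component[-lag:]
    return sum(recent) / len(recent)


def decompose_predict_add(series, lag, decompose=haar_level1, forecaster=mean_of_recent):
    components = decompose(series)                            # decomposition
    predictions = [forecaster(c, lag) for c in components]    # one forecast per component
    return sum(predictions)                                   # ensemble by simple addition


visits = [4, 6, 10, 12, 8, 6, 5, 5]        # toy data, not from the paper
print(decompose_predict_add(visits, lag=2))
```

Because the placeholder forecaster is linear, the summed forecast here equals what it would give on the raw series; a nonlinear model fitted separately to each component does not have this property, which is where the approach can differ from forecasting the raw series directly.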
In order to verify the effectiveness of the proposed WD&ANN-based decomposition-and-ensemble learning paradigm, four datasets of visits in hospitals in China are targeted for testing in the next section.

EXPERIMENTAL STUDY
To verify the effectiveness of the proposed hybrid paradigm, four datasets of Chinese hospital visits are used as sample data, and several benchmark models are compared with the proposed approach. This section comprises five parts: data description, evaluation criteria, benchmark models, parameter settings and results analyses.

Data Description
In this study, four monthly series of the number of patient visits to Chinese hospitals are chosen: aggregates over all hospitals (referred to as the whole hospitals), tertiary hospitals, secondary hospitals and primary hospitals in China. The data are obtained from Wind Info. Each dataset ranges from January 2010 to February 2015 and contains 57 observations, since December is excluded in every year, as shown in Figure 2. The data from January 2010 to January 2013 are used as the training set (44 observations, about 80% of the sample), and the remaining data are used as the testing set (13 observations). To test the forecasting capability of different models, one-, two- and three-step-ahead predictions are performed.

Evaluation Criteria
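To evaluate forecasting accuracy, two criteria are applied: the root mean squared error (RMSE) and the mean absolute percent error (MAPE),

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(x_t - \hat{x}_t\right)^2}, \qquad \mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{x_t - \hat{x}_t}{x_t}\right|, \qquad (5)$$

where $x_t$ is the actual value, $\hat{x}_t$ is the predicted value and $N$ is the number of observations in the testing dataset.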
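The two criteria can be computed as in the following sketch. The function names are illustrative, and two conventions are assumed here rather than taken from the paper: MAPE is returned as a fraction (multiply by 100 for a percentage), and all actual values are taken to be nonzero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percent error as a fraction; actual values must be nonzero."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


actual_visits = [100, 200]      # toy numbers, not from the paper
forecast_visits = [110, 180]
print(rmse(actual_visits, forecast_visits))
print(mape(actual_visits, forecast_visits))
```

RMSE is expressed in the units of the series and penalizes large errors more heavily, whereas MAPE is scale-free, so it is easier to compare across the four hospital categories.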

Benchmark Models
In order to test the effectiveness of the proposed WD&ANN-based decomposition-and-ensemble forecasting model, two types of traditional forecasting models and an artificial intelligence (AI) method are formulated as benchmarks.
Among traditional forecasting models, the auto-regressive integrated moving average (ARIMA) is quite popular (Box et al., 2011) and has been widely utilized in predicting hospital visits. Another traditional forecasting model, exponential smoothing (ES), is an important forecasting technique that has also been used in hospital visits forecasting (Bergs et al., 2014; Morzuch & Allen, 2006). These two methods are therefore implemented.
For artificial intelligence (AI), the most typical AI technique, i.e., feed-forward neural network (FNN), is embraced in this study as a benchmark model.
In summary, the proposed WD-FNN-ADD decomposition-and-ensemble model is compared with two traditional methods (i.e., ARIMA and exponential smoothing) and one artificial intelligence (AI) method (i.e., FNN).

Parameter Settings
For the traditional methods, the ARIMA benchmark models are determined based on autocorrelation and partial autocorrelation analysis. The parameters of the exponential smoothing models are set by minimizing the standard error. For the AI technique, each FNN built in this study uses seven nodes in the hidden layer, one output neuron and I input neurons, where I is the lag order determined by autocorrelation analysis and finally set to 11. Each FNN is trained for 10,000 iterations on the training subset.
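As an illustration of choosing an exponential smoothing parameter by error minimization, the sketch below fits simple (single-parameter) exponential smoothing by a grid search over the smoothing constant, minimizing the sum of squared one-step-ahead in-sample errors, which for a fixed sample size is equivalent to minimizing the standard error. The single-parameter form, the grid search, the initialization with the first observation, and the function names are assumptions of this sketch, not details reported in the paper.

```python
def ses_sse(series, alpha):
    """Sum of squared one-step-ahead errors of simple exponential smoothing."""
    level = series[0]                      # initialize with the first observation
    sse = 0.0
    for value in series[1:]:
        sse += (value - level) ** 2        # `level` is the forecast for `value`
        level = alpha * value + (1 - alpha) * level
    return sse


def fit_ses(series, grid_size=100):
    """Pick alpha on a grid in (0, 1] and return (alpha, next-period forecast)."""
    candidates = [k / grid_size for k in range(1, grid_size + 1)]
    best_alpha = min(candidates, key=lambda a: ses_sse(series, a))
    level = series[0]
    for value in series[1:]:
        level = best_alpha * value + (1 - best_alpha) * level
    return best_alpha, level


toy_series = [1, 2, 3, 4, 5]               # toy data, not from the paper
alpha, forecast = fit_ses(toy_series)
print(alpha, forecast)
```

On a steadily trending toy series the search pushes the smoothing constant to its upper bound, which hints at why a single smoothing parameter can struggle with volatile or trending data.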
For the WD&ANN-based hybrid model, the first step is to decompose the original data into an approximation and a series of details via wavelet decomposition (WD) (Daubechies, 1992; Mallat, 1989), where the Daubechies wavelet db8 is used. In accordance with the length of each dataset, the decomposition level is set to 5, which means that WD decomposes the data into one approximation and five details, as shown in Figures 3-4. The second step is to forecast the approximation and all the details by FNN, where for consistency the FNNs are built in the same way as those used as benchmark models. In the third step, the ADD ensemble method simply sums the individual prediction results obtained in the second step to produce the final results.

Results Analyses
The prediction results of the four models (i.e., the two traditional models ARIMA and ES, the AI technique FNN, and the proposed hybrid model WD-FNN-ADD) for the four sets of hospital visits data are illustrated in Figures 5-12. Figures 5-6, 7-8, 9-10 and 11-12 show the forecasting performance comparison for the whole, tertiary, secondary and primary hospital visits, respectively, in terms of the RMSE and MAPE criteria. The results show that the proposed WD-FNN-ADD model outperforms all three benchmark models in all cases in terms of both RMSE and MAPE, while the exponential smoothing (ES) method always shows the worst prediction accuracy. As for ARIMA and ANN, it is hard to say which performs better, because the ranking changes from one dataset to another; more details are given below. The fact that the hybrid WD-FNN-ADD model surpasses all three benchmarks in multi-step-ahead prediction accuracy, in terms of RMSE and MAPE, demonstrates the value of the "decomposition and ensemble" strategy and confirms the effectiveness of this paradigm in improving forecasting accuracy. In light of the hybrid method's performance, it can be concluded that wavelet decomposition (WD) plays an important role in the whole experimental process. As a common signal processing tool, wavelet decomposition breaks the original complex data into several components that are easier to process further, which explains its importance in the hybrid WD-FNN-ADD model.
Both the RMSE and the MAPE of the exponential smoothing (ES) predictions on the four sets of hospital visits data are much higher than those of the other models (i.e., ARIMA, ANN and WD-FNN-ADD). Because of the volatility of the monthly number of patient visits to hospitals and the inconsistency of the data structure, it is hard to model each dataset accurately via an exponential smoothing method with a single parameter.
In this study, it cannot be readily determined whether ARIMA or ANN performs better in modeling the number of patient visits to hospitals. ANN slightly outperforms ARIMA, in terms of the RMSE and MAPE criteria, in forecasting the number of visits to the whole Chinese hospitals, as shown in Figures 5-6. Figures 7-8 and Figures 11-12 show that ARIMA gives better results than ANN for both the Chinese tertiary and primary hospital visits. The comparison for the Chinese secondary hospital visits is shown in Figures 9-10.
Generally, four main conclusions can be drawn from the analyses above. (1) The proposed WD-based FNN hybrid decomposition-and-ensemble model, i.e., WD-FNN-ADD, is significantly superior to all the benchmark models, i.e., ARIMA, exponential smoothing and ANN, in terms of the RMSE and MAPE of one-, two- and three-step-ahead forecasting. (2) The proposed hybrid approach outperforms all the other models in this study in all cases of the whole, tertiary, secondary and primary hospital visits prediction, which reveals that the principle of "decomposition and ensemble" can effectively improve forecasting precision for hospital visits and underlines the importance of wavelet decomposition within the hybrid model. (3) The exponential smoothing method performs the worst among all the methods in this study, which indicates that it loses efficacy on volatile and irregular data. (4) The proposed WD-based FNN decomposition-and-ensemble forecasting approach, with the framework of "decomposition and ensemble" and the powerful WD decomposition technique, is an effective and promising model for forecasting the number of visits to hospitals.

CONCLUSIONS
In order to address the difficulties of complex hospital visits data, a hybrid decomposition-and-ensemble learning paradigm is proposed by integrating wavelet decomposition (WD), feed-forward neural network (FNN) and simple addition (ADD), i.e., WD-FNN-ADD. The main contribution of the proposed hybrid approach can be summarized in two aspects. First, unlike traditional models, a much more powerful method that can handle data complexity, i.e., WD-FNN-ADD, is employed in hospital visits forecasting. Second, to the best of our knowledge, although the principle of "decomposition and ensemble" has been widely applied in time series forecasting, e.g., crude oil price (Garg et al., Tang et al., 2012; Yu et al., 2014; Yu et al., 2015), hydropower consumption (Yu et al., 2011; Wang et al., 2011) and nuclear consumption forecasting (Wang et al., 2011), such a hybrid decomposition-and-ensemble model, i.e., WD-FNN-ADD, has not been introduced to hospital visits analysis or forecasting research. This study fills that gap by applying WD-FNN-ADD to the field of hospital visits forecasting.
With the four datasets of the monthly number of patient visits to the whole, tertiary, secondary and primary hospitals in China as sample data, the experimental results indicate that the proposed hybrid WD-FNN-ADD learning paradigm can effectively improve forecasting precision, in terms of the RMSE and MAPE criteria, since the WD-FNN-ADD model outperforms all the benchmark models. The proposed WD-FNN-ADD model is a very promising tool for forecasting complex time series data of hospital visits.
Besides predicting the number of patient visits to hospitals, the proposed WD-FNN-ADD approach can also be utilized to address other difficult forecasting tasks, especially for complex, volatile data with multiple data characteristics. Moreover, selecting the most appropriate wavelet for wavelet decomposition is crucial, since it influences the decomposition results and accordingly the forecasting results. We will look into these issues in the near future.

Figure 1. The overall process of the WD&ANN-based decomposition-and-ensemble learning paradigm

Figure 2. The original monthly number of visits to the (a) whole hospitals (b) tertiary hospitals (c) secondary hospitals (d) primary hospitals in China

Figure 3. Decomposition result for the monthly number of patient visits to the (a) whole hospitals (b) tertiary hospitals via WD

Figure 4. Decomposition result for the monthly number of patient visits to the (a) secondary hospitals (b) primary hospitals via WD

Figure 6. MAPE comparison of different models for the number of patient visits to the whole hospitals in China

From Figures 9-10, almost all the RMSE and MAPE values of the ARIMA predictions are lower than those of ANN, the only exception being the RMSE of the two-step-ahead Chinese secondary hospital visits prediction, as shown in Figure 9. The underlying reason may lie in the data characteristics and should be investigated in future work.

Figure 9. RMSE comparison of different models for the number of patient visits to the secondary hospitals in China

Figure 11. RMSE comparison of different models for the number of patient visits to the primary hospitals in China