Predicting emergency department visits in a large teaching hospital

Background Emergency department (ED) visits are highly volatile over time; therefore, EDs are likely to be crowded at peak-volume moments. ED crowding is a widely reported problem with negative consequences for patients as well as staff. Previous studies on the predictive value of weather variables for ED visits show conflicting results, and no such studies have been performed in the Netherlands. Therefore, we evaluated prediction models for the number of ED visits in our large teaching hospital in the Netherlands, with calendar and weather variables as potential predictors. Methods Data on all ED visits from June 2016 until December 31, 2019, were extracted. The 2016–2018 data were used as the training set and the 2019 data as the test set. Weather data were extracted from three publicly available datasets from the Royal Netherlands Meteorological Institute. Weather observations from stations near the hospital were used to predict the weather in the hospital's catchment area by applying the inverse distance weighting interpolation method. The predictability of daily ED visits was examined by creating linear prediction models using stepwise selection; the mean absolute percentage error (MAPE) was used as the measure of fit. Results The number of daily ED visits showed a positive time trend and a large impact of calendar events: higher on Mondays and Fridays; lower on Saturdays and Sundays; higher during special periods such as carnival; and lower on holidays falling on Monday through Saturday and during summer vacation. The weather itself was a better predictor than weather volatility but had only a small effect. The calendar-only prediction model had coefficients very similar to those of the calendar+weather model for the days of the week, time trend, and special time periods (both MAPEs were 8.7%).
Conclusions Because of this similar performance, and the inaccuracy inherent in weather forecasts, we decided that the calendar-only model would be most useful in our hospital; it can probably be transferred for use in EDs of the same size and in a similar region. However, the variability in ED visits is considerable. Therefore, one should always anticipate unforeseen spikes and dips in ED visits that the model does not capture. Supplementary Information The online version contains supplementary material available at 10.1186/s12245-021-00357-6.


Background
Large numbers of patients present at emergency departments (EDs); in the Netherlands, 2.3 million patients were seen in EDs in 2017 [1]. ED visit volumes are highly volatile over time [2][3][4]. Historically, many EDs have been staffed based on average patient volumes [5], which makes them more likely to be crowded at peak-volume moments.
ED crowding is defined as a situation in which the demand for emergency services exceeds the ability of the department to provide quality care within acceptable time frames [6]. ED crowding is a widely reported problem with negative consequences for patients as well as staff. Two large systematic reviews report various negative consequences for patients, including treatment delay and increased mortality [7,8], increased frequency of exposure to error, increased risk of readmission, and reduced patient satisfaction [8]. Reported consequences for staff include higher stress levels, increased violence towards staff, and inability to adhere to guideline-recommended treatment [8].
Both systematic reviews included articles that identified insufficient staffing as a possible cause of ED crowding and additional staffing as a possible solution [7,8]. To ensure adequate staffing only when needed, a flexible, volume-based staffing plan could be considered [5,9]. This can be achieved by analyzing available patient arrival patterns and subsequently developing a predictive model [5,10]. Possibly relevant predictors of patient arrival patterns are calendar data [2] and weather variables [11][12][13].
The predictive value of weather variables for ED visits is not yet clear, as previous studies show conflicting results. Climatic differences between countries may be a possible explanation for this. The population's adaptations to local climatic circumstances might also play a role [14,15]. Furthermore, holidays may differ between countries and cultures, as they often have a national or cultural character. As far as we know, no prediction models using weather and calendar data have previously been developed for hospitals in the Netherlands. Therefore, the aim of this study was to create a prediction model for the number of ED visits in the Jeroen Bosch Hospital, as a representative example, with calendar data and weather variables as potential predictors.

Emergency department visits
Data on all ED visits in the Jeroen Bosch Hospital, a large teaching hospital in the Netherlands, from June 2016 until December 31, 2019, were extracted and anonymized. The data set contained the birth year and gender of the visitor and the date and time of admission to the ED. The 2016-2018 data were used as training set, the 2019 data as test set.
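The models predict daily counts, so the visit records first have to be aggregated per calendar day and split by year. Below is a minimal sketch of that step using only the Python standard library. The function names are hypothetical, and it assumes each record is a `datetime` of ED admission. Days without any visit would be absent from the `Counter`, which is unlikely to matter for an ED of this size.

```python
from collections import Counter
from datetime import datetime, date

def daily_visit_counts(admission_times):
    """Count ED visits per calendar day from admission datetimes."""
    return Counter(ts.date() for ts in admission_times)

def split_by_year(daily_counts, test_year=2019):
    """Training set: all days before test_year; test set: days within test_year."""
    train = {d: n for d, n in daily_counts.items() if d.year < test_year}
    test = {d: n for d, n in daily_counts.items() if d.year == test_year}
    return train, test

# Illustrative timestamps only, not study data
stamps = [datetime(2018, 12, 31, 8, 0), datetime(2018, 12, 31, 23, 50),
          datetime(2019, 1, 1, 0, 10)]
train, test = split_by_year(daily_visit_counts(stamps))
```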

Calendar variables
Descriptive analysis of the ED visit data showed a positive time trend (Fig. 1) and an influence of the day of the week (Fig. 2), month (Fig. 3), summer vacation, and holidays on the number of daily ED visits (details in additional file 1). All of these were added to the models as potential predictors.
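Below is a minimal sketch of how such calendar predictors could be encoded for a single day. It is a hypothetical encoding, not the study's exact variable set, which is described in additional file 1. Both reference levels (Wednesday for the day of the week, January for the month) are arbitrary choices made here. The holiday set and vacation ranges must be supplied by the user.

```python
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday(): Mon = 0

def calendar_features(day, start, holidays, vacation_periods):
    """Hypothetical calendar predictors for one day.

    day, start       -- datetime.date objects; start anchors the time trend
    holidays         -- set of holiday dates
    vacation_periods -- list of (first_day, last_day) summer-vacation ranges
    """
    feats = {"trend_years": (day - start).days / 365.25}
    # Day-of-week indicators, Wednesday as the (arbitrary) reference level
    for i, name in enumerate(WEEKDAYS):
        if name != "Wed":
            feats[f"dow_{name}"] = int(day.weekday() == i)
    # Month indicators, January as the reference level
    for m in range(2, 13):
        feats[f"month_{m}"] = int(day.month == m)
    # Holidays falling on Monday through Saturday (weekday() < 6 excludes Sunday)
    feats["holiday_mon_sat"] = int(day in holidays and day.weekday() < 6)
    feats["summer_vacation"] = int(any(lo <= day <= hi for lo, hi in vacation_periods))
    return feats

# Illustrative inputs; the vacation range is a placeholder, not official dates
f = calendar_features(date(2019, 12, 25), date(2016, 6, 1),
                      {date(2019, 12, 25)}, [(date(2019, 7, 1), date(2019, 8, 31))])
```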

Weather data
Weather data were extracted from three publicly available datasets from the Royal Netherlands Meteorological Institute (KNMI): daily readings from the automatic weather stations [16], hourly readings from the automatic weather stations [17], and daily readings from the precipitation stations [18]. The daily readings from the automatic weather stations provide an extensive characterization of the observed wind speed, temperature, radiation, pressure, visibility, cloudiness, humidity, and precipitation in the Netherlands. The daily readings from the precipitation stations and the hourly readings from the automatic weather stations further describe the precipitation that occurred, as well as special weather conditions such as fog, glazed frost, and storms (details in additional file 2). As shown in Fig. 4, there are no weather stations in close proximity to the Jeroen Bosch Hospital, and only a few are located within a moderate distance. The weather observations from the KNMI weather stations near the Jeroen Bosch Hospital (green dots in Fig. 4) were used to predict the weather in the Jeroen Bosch Hospital's catchment area by applying the inverse distance weighting interpolation method [19]. The parameter of this interpolation method was estimated using leave-one-out cross-validation, in which only the KNMI weather stations within the catchment area of the Jeroen Bosch Hospital were left out, one at a time (details in additional file 3). Two sets of weather-related predictors were created with this interpolation method, based on two alternative hypotheses. The first set contained the weather predicted at the Jeroen Bosch Hospital, as a means to describe the weather experienced by ED visitors. The second set contained the weather predicted for two towns in the periphery of the Jeroen Bosch Hospital's catchment area, Kaathoven and Drunen. Weather predictions at these two locations could potentially describe the weather in the full catchment area better than predictions at a single central location (the Jeroen Bosch Hospital).
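Inverse distance weighting estimates the value at a location as a weighted average of station observations. The weights 1/d^p shrink with distance d. The text does not say which parameter was estimated, so the sketch below assumes it is the power p, as in Shepard's classic formulation. It also assumes planar station coordinates, for example kilometres in a projected system.

The sketch picks p by leave-one-out cross-validation, leaving out only a given subset of stations (those inside the catchment area). It scores squared error on one set of observations; in practice the error would be summed over many days. All names, coordinates, and the candidate grid are hypothetical.

```python
import math

def idw(target, stations, power):
    """Inverse distance weighting estimate at target = (x, y).

    stations -- list of (x, y, value) tuples in planar coordinates (assumed)
    power    -- exponent p in the weights w = 1 / d**p (assumed fitted parameter)
    """
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station: use its observation
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def loo_cv_power(stations, held_out, candidate_powers):
    """Return the power with the smallest leave-one-out squared error.

    Only stations whose indices are in held_out are left out in turn;
    all other stations always act as predictors.
    """
    best_power, best_error = None, math.inf
    for p in candidate_powers:
        error = 0.0
        for i in held_out:
            x, y, observed = stations[i]
            others = stations[:i] + stations[i + 1:]
            error += (idw((x, y), others, p) - observed) ** 2
        if error < best_error:  # ties keep the earlier candidate
            best_power, best_error = p, error
    return best_power

# Illustrative coordinates (km) and values, not KNMI data
stations = [(0.0, 0.0, 0.0), (1.0, 0.0, 10.0), (3.0, 0.0, 10.0)]
print(loo_cv_power(stations, held_out=[1], candidate_powers=[0.5, 1.0, 2.0]))  # prints 0.5
```

A smaller power spreads weight more evenly over distant stations; a larger one makes the estimate follow the nearest station. Cross-validation lets the data choose between the two.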

Prediction modeling
The predictability of daily ED visits was examined by creating linear prediction models. As many of the weather predictors were unlikely to have a substantial effect on the number of ED visits, a variable selection method was used to refine and merge the available sets of calendar- and weather-based predictors. This was done using stepwise selection. Starting from an empty model, this method adds or drops predictors one at a time so as to maximize the improvement in a measure of fit. The Akaike information criterion (AIC) was used as the measure of fit. It estimates the out-of-sample error using only the sum of squared residuals of the model, the number of predictors in the model, and the number of observations in the training set. This variable selection method was applied to two sets: the calendar variables plus the first set of weather-related predictors, and the calendar variables plus the second set of weather-related predictors. The results were used to evaluate the quality of the predictors and to refine and merge the sets of predictors accordingly. The final linear prediction model for daily ED visits based on calendar plus weather-based predictors was created by reapplying stepwise selection to this refined set of predictors.
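The article does not name the software used. Below is a minimal NumPy sketch of bidirectional stepwise selection by AIC. It uses the Gaussian least-squares form AIC = n ln(RSS/n) + 2k, which holds up to an additive constant and depends only on the quantities named above: the residual sum of squares RSS, the number of parameters k (including the intercept), and the number of observations n. The function names are hypothetical, and the sketch does not handle interactions or grouped categorical variables.

```python
import numpy as np

def aic_ols(X, y, cols):
    """AIC of an OLS fit with intercept on predictor columns `cols`:
    n * ln(RSS / n) + 2k, where k counts the intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def stepwise_aic(X, y):
    """Bidirectional stepwise selection starting from the empty model."""
    selected = []
    current = aic_ols(X, y, selected)
    while True:
        candidates = []
        for j in range(X.shape[1]):      # try adding each unused predictor
            if j not in selected:
                candidates.append(selected + [j])
        for j in selected:               # try dropping each used predictor
            candidates.append([c for c in selected if c != j])
        if not candidates:
            break
        scored = [(aic_ols(X, y, c), c) for c in candidates]
        best_aic, best_set = min(scored, key=lambda s: s[0])
        if best_aic >= current:
            break                        # no single step lowers the AIC
        selected, current = best_set, best_aic
    return selected, current

# Synthetic example: only columns 0 and 2 carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 100 + 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=300)
selected, score = stepwise_aic(X, y)
```

Every accepted step strictly lowers the AIC, and the number of subsets is finite, so the loop terminates.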
The effect of predicted weather on the number of ED visits was evaluated by comparing the accuracy of two sets of test set predictions. The first came from the calendar- and weather-based model. The second came from a linear prediction model without the weather variables of that final model. The mean absolute percentage error (MAPE) was used as the measure of fit for the predictions. It can be interpreted as the average percentage by which the predictions deviate from the true number of daily ED visits, and it allows comparison with ED visit models of other hospitals. The difference in MAPE between the two models was used to determine whether, and by how much, the weather affects the number of daily ED visits at the Jeroen Bosch Hospital.
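Following the definition above, MAPE = (100/n) Σ |y_t - ŷ_t| / y_t over the n test days, where y_t is the observed and ŷ_t the predicted number of visits on day t. Below is a minimal pure-Python sketch. The comparison helper is hypothetical and simply takes the difference of the two MAPEs.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Averages |actual - predicted| / actual over all days; every actual value
    must be nonzero (in practice always true for daily ED visit counts).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def mape_difference(actual, pred_without_weather, pred_with_weather):
    """Hypothetical helper: positive values mean the weather model has the lower MAPE."""
    return mape(actual, pred_without_weather) - mape(actual, pred_with_weather)

# Illustrative numbers only: two days with 100 and 80 visits
print(mape([100, 80], [92, 84]))  # approximately 6.5
```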

Variable selection
Automated stepwise variable selection on dataset A (calendar plus Jeroen Bosch Hospital weather-based predictors; Table 1) showed that most variables in dataset A were relevant for predicting daily ED visits; most of the calendar-based variables that were not selected were month indicators. A small set of interaction variables was selected by stepwise selection as well. Automated stepwise variable selection on dataset B (calendar plus Kaathoven/Drunen weather-based predictors; Table 2) showed similar results. In general, one of these towns had a positive coefficient and the other an equally sized negative coefficient, illustrating that the volatility of the weather could also be a meaningful predictor of daily ED visits.
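The link between opposite coefficients and volatility can be seen from a short identity. The notation is introduced here for illustration only: let w_t^K and w_t^D be a weather phenomenon predicted on day t at Kaathoven and at Drunen. A pair of equal and opposite coefficients, β and -β, then collapses into a single term in the difference between the two towns:

```latex
\beta \, w_t^{\mathrm{K}} + (-\beta) \, w_t^{\mathrm{D}} = \beta \left( w_t^{\mathrm{K}} - w_t^{\mathrm{D}} \right)
```

The sign of this difference depends on which town is listed first. The refined set of predictors instead uses its absolute value as a volatility measure, which does not.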

Merging and refining the sets of predictors
Dataset C (the refined set of predictors) was created by combining the best predictors found in datasets A and B using stepwise selection. This new set contains most calendar-based predictors from the previous sets. Seasonal indicators replace the monthly indicators, which had a small effect on ED visits and were often omitted by stepwise selection. A small selection of important interactions seen in the results of the stepwise selections was included as well. To describe the weather, 24 predictors were included. Twelve variables describe 12 weather phenomena using the weather predicted at the Jeroen Bosch Hospital. The other 12 describe the volatility of these phenomena using the absolute difference in predicted weather between Kaathoven and Drunen. A more extensive description of the predictors in the new set can be found in additional file 4.
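Below is a sketch of how the level and volatility weather predictors could be assembled for one day. It assumes the interpolated values are available as dictionaries keyed by weather phenomenon. The phenomenon names and the `_hospital` / `_abs_diff` suffixes are hypothetical; the study's actual 12 phenomena are listed in additional file 4.

```python
def weather_predictors(at_hospital, at_kaathoven, at_drunen):
    """Level and volatility predictors for one day.

    Each argument maps a weather phenomenon name to its interpolated value.
    Level: value predicted at the hospital.
    Volatility: |Kaathoven - Drunen| for the same phenomenon.
    """
    feats = {}
    for name, value in at_hospital.items():
        feats[f"{name}_hospital"] = value
        feats[f"{name}_abs_diff"] = abs(at_kaathoven[name] - at_drunen[name])
    return feats

# Illustrative values for two phenomena (12 are used in the study)
f = weather_predictors({"temp": 10.0, "rain": 2.0},
                       {"temp": 9.5, "rain": 1.0},
                       {"temp": 11.0, "rain": 3.0})
```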

Weather- and calendar-based prediction model
The final calendar- and weather-based linear prediction model for daily ED visits was created by reapplying stepwise selection to dataset C (Table 3). It shows higher average daily ED visits on Mondays and Fridays and lower average daily ED visits on Saturdays and Sundays. The number of daily ED visits is also affected by a positive time trend and by special time periods such as carnival, holidays falling on Monday through Saturday, and summer vacation. A selection of weather predictions at the Jeroen Bosch Hospital was included by stepwise selection as well, illustrating that the weather itself made for better predictors of daily ED visits than the volatility of the weather.
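Schematically, and leaving out the interaction terms, a model of this kind has the additive form below. The notation is introduced here for illustration only; the actual predictors and coefficients are those in Table 3.

- ŷ_t is the predicted number of ED visits on day t.
- t is the time trend.
- D_{d,t} are day-of-week indicators.
- S_{k,t} are indicators for special periods and seasons, such as carnival, holidays on Monday through Saturday, and summer vacation.
- W_{j,t} are the selected weather predictions at the hospital.

```latex
\hat{y}_t = \beta_0 + \beta_1 t + \sum_{d} \gamma_d \, D_{d,t} + \sum_{k} \delta_k \, S_{k,t} + \sum_{j} \theta_j \, W_{j,t}
```

A calendar-only model corresponds to dropping the last sum and re-estimating the remaining coefficients.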

Calendar-only model
The calendar-only dataset D, used to assess the importance of weather for predicting ED visits, consists of the calendar variables selected from dataset C by stepwise selection plus the seasonal indicators omitted from that model. The calendar-only prediction model (Table 4), created using all predictors from this dataset, has coefficients for the days of the week, time trend, and special time periods very similar to those of the model created by applying stepwise selection to dataset C (Table 3). The effects of the calendar variables on the predicted number of ED visits are shown in Table 5. The prediction performance of these two models turned out to be very similar (Table 6): on average, the predictions of both models deviate by 8.7% from the true number of daily ED visits.
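Below is an end-to-end sketch of this comparison in NumPy, run on synthetic data rather than the study's data. It simulates daily visits driven by a trend, a weekday pattern, and a deliberately small temperature effect. It fits both models on the first three years and compares their MAPE on the fourth. All numbers, and the assumption that day 0 is a Monday, are illustrative.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def mape(actual, predicted):
    return 100.0 * np.mean(np.abs(actual - predicted) / actual)

rng = np.random.default_rng(42)
n_days = 4 * 365
t = np.arange(n_days) / 365.0                 # time trend in years
dow = np.arange(n_days) % 7                   # 0 = Monday (synthetic assumption)
dow_dummies = np.eye(7)[dow][:, 1:]           # Monday as the reference level
temp = 10 + 8 * np.sin(2 * np.pi * t) + rng.normal(0, 3, n_days)

# Synthetic "truth": trend + weekday pattern + small weather effect + noise
weekday_effect = np.array([8.0, 0.0, 0.0, 0.0, 5.0, -6.0, -7.0])
visits = 85 + 3 * t + weekday_effect[dow] + 0.1 * temp + rng.normal(0, 8, n_days)

calendar = np.column_stack([t, dow_dummies])
calendar_weather = np.column_stack([calendar, temp])
train, test = t < 3, t >= 3                   # first 3 years train, last year test

results = {}
for name, X in [("calendar-only", calendar), ("calendar+weather", calendar_weather)]:
    beta = fit_ols(X[train], visits[train])
    results[name] = mape(visits[test], predict(beta, X[test]))
print(results)
```

When the weather effect is small relative to day-to-day noise, the two test MAPEs end up close together. This mirrors the pattern reported here, although the sketch says nothing about the size of the real effects.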

Discussion
Our study shows that adding weather variables did not substantially improve the performance of a linear model based on calendar variables in predicting the daily number of ED visits in a retrospective setting (MAPE 8.718% vs. 8.684%). Of course, the intended use of such a model is to predict future daily numbers of ED visits, which implies that weather forecasts would have to be used instead of retrospective weather data. The uncertainty inherent in weather forecasts would increase the error of the model that includes weather variables. Therefore, we conclude that a model based on calendar variables would be most suited for hospital EDs comparable to ours. Previous research confirming that calendar variables have greater predictive value for the number of ED visits than weather variables can be found in the systematic review by Wargon et al. and in a study by Marcilio et al. [2,11].
A constant, moderate number of predicted ED visits was found for regular Tuesdays, Wednesdays, and Thursdays. Predicted ED visits were higher on regular Mondays and Fridays and lower on regular Saturdays and Sundays. A previous study in a tertiary hospital in Brazil and a study in four academic hospitals in France also showed a decrease in the number of ED visits at weekends [2,20], and the latter also reported an increase in the number of ED visits on Mondays [20]. This pattern has previously been described as the weekly cycle [11]. In our study, holidays falling on Monday through Saturday and summer vacation showed a decrease in predicted ED visits. A decrease in the number of ED visits in August was also reported in the French study [20]; however, this pattern was not found in Brazil [2]. The number of ED visits in the Jeroen Bosch Hospital showed an upward trend, with an increase of 3.2% every year, from a mean of approximately 85 ED visits per day in 2016 to approximately 103 predicted ED visits per day in 2022.
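As a rough consistency check of these figures, assume the 3.2% yearly increase compounds over the six years from 2016 to 2022. The text does not state whether the trend is additive or multiplicative.

```latex
85 \times 1.032^{6} \approx 85 \times 1.2080 \approx 102.7 \approx 103
```

This agrees with the reported value of approximately 103 predicted visits per day.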
This yearly increase in the number of ED visits is similar to the findings in the French hospitals [20].
Our study has several strengths and limitations. A strength of this study is the use of a linear model. More complex methods, such as the (seasonal) autoregressive integrated moving average ((S)ARIMA) time series model, are sometimes favored; however, we used a linear model because such models are the easiest for nonstatisticians to understand [11,20]. Marcilio et al. compared linear models with a time series model and found that the linear models were slightly superior [2]. Wargon et al. also described a linear model as the superior method [20]. However, Whitt and Zhang concluded that a time series model outperformed a linear model [21]. Another strength of this study is the use of the MAPE as the measure of fit. Like the more commonly used mean squared error (MSE), the MAPE summarizes the prediction error. However, the MAPE expresses the error as a percentage, which is more intuitive to interpret and allows comparison with models from similar studies, so it is used more often in this context. The MAPE of approximately 8.7% indicates that our linear model is accurate. It is also comparable to several other studies, which report MAPEs ranging from 4.2 to 14.4% for linear models as well as time series models [2,11,12,21].
A limitation of our study is that we did not include our most recent data because of the disruption caused by the COVID-19 pandemic. We presume this situation will stabilize in the future, but recalibration with data covering the post-COVID-19 situation will probably be useful. A second limitation is the small geographic area studied, which made it impossible to investigate cultural differences or the population's climatic adaptations. What the days with a decreased number of ED visits have in common is that they are all work-free days, often characterized by family visits.
As the Netherlands has a temperate climate, our weather-related findings are potentially valid for other geographic areas with a temperate climate. A third limitation of our study is the need to interpolate the weather data, as interpolation yields only an estimate. However, we believe our estimate is quite robust. We used data from all weather stations in the greater Jeroen Bosch Hospital area. We also compared two models: one with weather predictions at the Jeroen Bosch Hospital, and one with weather predictions at the periphery of its catchment area (Kaathoven and Drunen).

Conclusion
As far as we know, this is the first model predicting ED visits using calendar and weather variables in the Netherlands. Its performance is similar to that of the prediction models described in the literature so far. In conclusion, our calendar-based linear model is useful for predicting the number of ED visits in EDs of the same size and in a similar region as the Jeroen Bosch Hospital. However, as shown in Fig. 1, the variability in ED visits is considerable. Therefore, when using this model in practice, one should always anticipate the possibility of unforeseen spikes and dips in ED visits that the model does not capture.