If you're interested in time series analysis and forecasting, this is the right place to be. The Time Series Lab (TSL) software platform makes time series analysis available to anyone with a basic knowledge of statistics. Future versions will remove even that requirement by providing fully automated forecasting systems. The platform is designed and developed so that results can be obtained quickly and verified easily. At the same time, many advanced time series and forecasting operations are available for experts. In our case studies, we often present screenshots of the program so that you can easily replicate results.
Did you know you can take a screenshot of a TSL program window? Press Ctrl + p to open a window that lets you save a screenshot of the program. The TSL window should be located on your main monitor.
Click on the buttons below to go to our case studies. At the beginning of each case study, the required TSL package is mentioned. Our first case study, about the Nile data, is meant to illustrate the basic workings of the program and we advise you to start with that one.
Date: July 05, 2022
Software: Time Series Lab - Home Edition
Topics: score-driven models, long memory, fat tails
We continue the case studies with a record of the lowest annual water levels of the Nile river for the years 622-1467, measured at the island of Roda, near Cairo, Egypt. The series extends to 1918, but the later part contains long stretches of missing values, which are not the topic of this case study. For missing value analysis, see for example the (first) Nile case study. The Nile Minimum dataset is part of every TSL installer file and can be found in the data folder located in the install folder of TSL. When we inspect the autocorrelation function (ACF) of the time series on the Database page, we find that it displays a classic long memory pattern: even after increasing the number of lags to 50, we still find significant autocorrelations, see the following figure:
Autocorrelation function Nile Minimum
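To make the idea of "significant lags" concrete, here is a minimal sketch of how a sample ACF and the usual approximate 95% white noise band can be computed. It is not how TSL computes its plot; the function names and the toy input series are assumptions made for illustration, and the band of ±1.96/√T is the standard large-sample approximation.

```python
import math


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0 .. r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(num / denom)
    return acf


def white_noise_bound(n, z=1.96):
    """Approximate 95% band for the ACF of white noise: +/- z / sqrt(n)."""
    return z / math.sqrt(n)


# Illustrative input only: a slowly drifting toy series, NOT the Nile data.
toy = [math.sin(t / 40.0) + 0.1 * math.cos(t) for t in range(846)]
acf = sample_acf(toy, 50)
bound = white_noise_bound(len(toy))
significant = [lag for lag in range(1, 51) if abs(acf[lag]) > bound]
print(f"bound = {bound:.4f}, significant lags up to 50: {len(significant)}")
```

With T = 846 observations the band is roughly ±0.067, so a long memory series is one whose autocorrelations stay outside this band even at high lags.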
In this case study we will demonstrate several ways of modelling this dataset. We begin with the score-driven models which show interesting results, especially if we deviate from the Normal distribution.
The power of score-driven models lies in their ability to deviate from the Normal distribution for the irregular component of the model. A model based on the Student t distribution, for example, is much less susceptible to outliers in the data. Furthermore, score-driven models allow us to choose an arbitrary number of AR orders (p) and score lags (q). But how should we choose p and q? It turns out that the algorithm of Hyndman and Khandakar (2008) for finding optimum values of p and q works for score-driven models as well. This does not come as a surprise since ARMA models are a special case of score-driven models.
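The robustness to outliers can be illustrated with a small sketch. It assumes the standard Student t location score, in which the prediction error e is shrunk by the factor 1 / (1 + e² / (ν·scale²)), and a first-order filter with hypothetical parameter values (`omega`, `phi`, `kappa`). This is an illustration of the mechanism, not TSL's implementation.

```python
def gaussian_score(error):
    """With Normal errors the score is proportional to the prediction error itself."""
    return error


def student_t_score(error, nu, scale2):
    """Student t location score: the error shrunk by 1 / (1 + e^2 / (nu * scale2))."""
    return error / (1.0 + error * error / (nu * scale2))


def filter_location(y, score_fn, omega=0.0, phi=0.9, kappa=0.5):
    """First-order score-driven location filter (hypothetical parameter values)."""
    mu = omega
    path = []
    for obs in y:
        path.append(mu)                 # prediction made before seeing obs
        u = score_fn(obs - mu)          # score of the prediction error
        mu = omega + phi * (mu - omega) + kappa * u
    return path


# Toy data with a single outlier at index 3 (illustrative, not the Nile series).
y = [0.1, -0.2, 0.0, 8.0, 0.1, -0.1]
path_g = filter_location(y, gaussian_score)
path_t = filter_location(y, lambda e: student_t_score(e, nu=5.0, scale2=1.0))
print("location after outlier, Gaussian: ", round(path_g[4], 3))
print("location after outlier, Student t:", round(path_t[4], 3))
```

Under the Gaussian score the outlier pulls the location estimate strongly towards it, while the Student t score discounts large errors and the estimate barely moves.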
We navigate to the Pre-built models page of TSL and select only the model DCS-g in the score-driven column. We then tick the Auto detect optimum p, q box and select a 100%/0% ratio for the Training and Validation samples. Press the Process dashboard button in the bottom right corner. TSL starts working and arrives at an optimum of $p = 2, q = 2$ with a constant included.
———————————————————————————— PARAMETER OPTIMIZATION ————————————————————————————
Model TSL019: DCS-g
The dependent variable Y is: Minimum
The selection sample is: 0622-01-01 - 1467-01-01 (N = 1, T = 846 with 0 missings)

Lower AIC found with value 2041.4941
Model specs: p = 0, q = 1, constant included
Lower AIC found with value 1876.851
Model specs: p = 1, q = 2, constant included
Lower AIC found with value 1876.6624
Model specs: p = 2, q = 2, constant included

—————————————————————————————————— MODEL FIT ———————————————————————————————————
Model: TSL001 DCS-g(2,2)
variable: Minimum

                                          TSL001
Log likelihood                         -932.3312
Akaike Information Criterion (AIC)     1876.6624
Bias corrected AIC (AICc)              1876.7626
Bayesian Information Criterion (BIC)   1905.1056
in-sample MSE                             0.5313
... RMSE                                  0.7289
... MAE                                   0.5403
... MAPE                                  4.6854
Sample size                                  846
Effective sample size                        844
* based on one-step-ahead forecast errors
Continuing the modelling process, we select only the DCS-t model in the score-driven column. We then tick the Auto detect optimum p, q box and press the Process dashboard button in the bottom right corner. After TSL has found the optimum p and q, we obtain the following results:
———————————————————————————— PARAMETER OPTIMIZATION ————————————————————————————
Model TSL020: DCS-t
The dependent variable Y is: Minimum
The selection sample is: 0622-01-01 - 1467-01-01 (N = 1, T = 846 with 0 missings)

Lower AIC found with value 2005.2239
Model specs: p = 0, q = 1, constant included
Lower AIC found with value 1801.8766
Model specs: p = 4, q = 2, constant included
Lower AIC found with value 1799.8932
Model specs: p = 3, q = 2, constant included
Lower AIC found with value 1798.3775
Model specs: p = 2, q = 2, constant included

—————————————————————————————————— MODEL FIT ———————————————————————————————————
Model: TSL002 DCS-t(2,2)
variable: Minimum

                                          TSL002
Log likelihood                         -892.1888
Akaike Information Criterion (AIC)     1798.3775
Bias corrected AIC (AICc)              1798.5112
Bayesian Information Criterion (BIC)   1831.5612
in-sample MSE                             0.5242
... RMSE                                  0.7240
... MAE                                   0.5315
... MAPE                                  4.5938
Sample size                                  846
Effective sample size                        844
* based on one-step-ahead forecast errors
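The information criteria in the two MODEL FIT blocks above can be reproduced from the log likelihoods with the textbook formulas AIC = 2k − 2 log L, AICc = AIC + 2k(k+1)/(n − k − 1), and BIC = k ln n − 2 log L. The sketch below assumes parameter counts of k = 6 for DCS-g(2,2) and k = 7 for DCS-t(2,2); these counts are not printed in the output and are backed out from the reported AIC values.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) for log likelihood `loglik`, k parameters, n observations."""
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * loglik
    return aic, aicc, bic


n_obs = 846  # sample size from the output
# Parameter counts are an assumption, backed out from the reported AIC values.
models = {
    "DCS-g(2,2)": (-932.3312, 6),
    "DCS-t(2,2)": (-892.1888, 7),
}
for name, (loglik, k) in models.items():
    aic, aicc, bic = information_criteria(loglik, k, n_obs)
    print(f"{name}: AIC={aic:.4f} AICc={aicc:.4f} BIC={bic:.4f}")

loglik_gain = models["DCS-t(2,2)"][0] - models["DCS-g(2,2)"][0]
print(f"log likelihood gain of DCS-t over DCS-g: {loglik_gain:.4f}")
```

Up to rounding in the last digit, the computed values agree with the reported ones, which suggests the criteria are based on the full sample size of 846.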
The improvement in model fit is large. The log likelihood improves by about 40 points and the AIC of the DCS-t(2,2) model is about 78 points lower (better). The in-sample measures MSE, RMSE, MAE, and MAPE all improve as well, albeit less dramatically. The Student t model has one extra parameter that needs to be estimated: the degrees of freedom. It is estimated at 4.8061, which shows that the tails of the distribution are much thicker than those of the Normal distribution, meaning that extreme events are more probable. Note that for degrees of freedom going to infinity, the DCS-t(p, q) model reverts to the DCS-g(p, q) model. In practice, the degrees of freedom do not need to go all the way to infinity: a model with estimated degrees of freedom above 100 already closely resembles the DCS-g(p, q) model. The effect of thicker tails can be seen in the figure below.
Extracted signal of the DCS-g(2, 2) and DCS-t(2, 2) model
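How much thicker the tails are at 4.8061 degrees of freedom can be quantified with a short calculation. The sketch below compares two-sided tail probabilities of a unit-scale Student t distribution with those of a standard Normal. It uses only the standard library and integrates the t density numerically with Simpson's rule; the comparison is at equal scale rather than equal variance, which is a simplifying assumption.

```python
import math


def student_t_pdf(x, nu):
    """Density of the unit-scale Student t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def student_t_tail(c, nu, intervals=2000):
    """P(|T| > c), via Simpson's rule on [-c, c]; `intervals` must be even."""
    h = 2 * c / intervals
    total = student_t_pdf(-c, nu) + student_t_pdf(c, nu)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight * student_t_pdf(-c + i * h, nu)
    return 1.0 - total * h / 3


def normal_tail(c):
    """P(|Z| > c) for a standard Normal variable."""
    return math.erfc(c / math.sqrt(2))


nu_hat = 4.8061  # degrees of freedom estimate reported in the text
for c in (2, 3, 4):
    print(f"|x| > {c}: Student t {student_t_tail(c, nu_hat):.5f}, "
          f"Normal {normal_tail(c):.5f}")

# With many degrees of freedom the two tail probabilities nearly coincide.
print(f"nu = 200, |x| > 3: {student_t_tail(3, 200.0):.5f}")
```

The last line illustrates the remark that a large number of degrees of freedom already brings the Student t model close to the Gaussian one.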
Two component model
Another way to model a long memory process is by using two components, one persistent component and one (less persistent) stationary component. We go to the Build your own model page of TSL and select a time-varying level (Random Walk). Do not select a slope component but do select an ARMA(1,0). Go to the Estimation page and click the Estimate button. The result is a model with a log likelihood value roughly similar to the DCS-g(2, 2) model and an extracted signal that shows a comparable pattern as well, see the following figure.
Extracted signal DCS and two component models
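To see why two components can mimic long memory, it helps to simulate such a process. The sketch below generates y_t as the sum of a random walk level, a stationary AR(1) component, and an irregular term. All parameter values are hypothetical and chosen only for illustration; they are not the estimates TSL reports for the Nile Minimum series.

```python
import random


def simulate_two_component(n, phi, sd_level, sd_ar, sd_irregular,
                           level0=0.0, seed=0):
    """Simulate y_t = level_t + ar_t + irregular_t.

    level_t is a random walk and ar_t a stationary AR(1) with coefficient phi.
    """
    rng = random.Random(seed)
    level, ar = level0, 0.0
    y = []
    for _ in range(n):
        y.append(level + ar + rng.gauss(0.0, sd_irregular))
        level += rng.gauss(0.0, sd_level)       # persistent component
        ar = phi * ar + rng.gauss(0.0, sd_ar)   # less persistent component
    return y


# Hypothetical parameter values, chosen only for illustration.
y = simulate_two_component(n=846, phi=0.7, sd_level=0.05, sd_ar=0.5,
                           sd_irregular=0.3, level0=11.0)
print(len(y), round(min(y), 2), round(max(y), 2))
```

The slowly moving random walk keeps autocorrelations high at long lags, while the AR(1) term accounts for the short-run dependence.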
We can also specifically deal with outliers and possible structural breaks in the data. Go to the Build your own model page and add Automatically find Intervention variables to the model. Go to the Estimation page and click the Estimate button. The result is an even better model fit than the one from the DCS-t(2, 2) model. We have:
—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————
Intervention coefficients:

Beta                       Value   Std.Err   t-stat         Prob
β_outlier_0627-01-01       2.244    0.5456    4.113   4.3053e-05
β_outlier_0646-01-01       2.772    0.5447    5.089   4.4532e-07
β_outlier_0656-01-01       1.694    0.5447    3.110       0.0019
β_outlier_0660-01-01       1.681    0.5447    3.086       0.0021
β_outlier_0691-01-01      -2.202    0.5447   -4.043   5.7797e-05
β_outlier_0713-01-01      -1.738    0.5447   -3.191       0.0015
β_outlier_0719-01-01       1.750    0.5447    3.213       0.0014
β_outlier_0809-01-01       3.502    0.5447    6.429   2.1781e-10
β_outlier_0878-01-01       2.747    0.5447    5.044   5.6139e-07
β_outlier_0962-01-01       1.889    0.5447    3.468   5.5236e-04
β_outlier_0981-01-01      -1.650    0.5447   -3.030       0.0025
β_outlier_1060-01-01       1.781    0.5447    3.269       0.0011
β_outlier_1067-01-01       1.782    0.5447    3.271       0.0011
β_outlier_1100-01-01       1.922    0.5447    3.529   4.4111e-04
β_outlier_1292-01-01      -1.947    0.5447   -3.574   3.7188e-04
β_outlier_1357-01-01       3.548    0.5447    6.514   1.2730e-10
β_outlier_1409-01-01       1.812    0.5470    3.312   9.6739e-04
β_outlier_1433-01-01       2.707    0.5448    4.968   8.2148e-07
β_outlier_1439-01-01       1.897    0.5447    3.482   5.2299e-04
β_outlier_1444-01-01       2.809    0.5448    5.157   3.1512e-07
β_break_1397-01-01        -1.815    0.4063   -4.466   9.0772e-06
β_break_1411-01-01         1.608    0.4073    3.948   8.5591e-05

—————————————————————————————————— MODEL FIT ———————————————————————————————————
Model: TSL004
variable: Minimum

                                          TSL004
Log likelihood                         -779.5080
Akaike Information Criterion (AIC)     1613.0160
Bias corrected AIC (AICc)              1614.8644
Bayesian Information Criterion (BIC)   1741.0100
in-sample MSE                             0.5107
... RMSE                                  0.7146
... MAE                                   0.5235
... MAPE                                  4.5169
Sample size                                  846
Effective sample size                        823
* based on one-step-ahead forecast errors
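The intervention coefficients in the output above multiply dummy regressors. The sketch below assumes the usual convention: an outlier is a pulse that equals 1 in a single year, and a structural break is a step that equals 1 from the break year onwards. The helper names are hypothetical and TSL's internal construction may differ in detail.

```python
def outlier_dummy(n, t0):
    """Pulse regressor: 1 at position t0 only, 0 elsewhere."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]


def break_dummy(n, t0):
    """Step regressor: 0 before position t0, 1 from t0 onwards."""
    return [1.0 if t >= t0 else 0.0 for t in range(n)]


FIRST_YEAR, N_OBS = 622, 846  # sample 622-1467 from the output


def position(year):
    """Index of a year in the annual series (one observation per year)."""
    return year - FIRST_YEAR


pulse_809 = outlier_dummy(N_OBS, position(809))    # β_outlier_0809 = 3.502
step_1397 = break_dummy(N_OBS, position(1397))     # β_break_1397 = -1.815
step_1411 = break_dummy(N_OBS, position(1411))     # β_break_1411 = 1.608

# Combined contribution of the two breaks to the level, year by year.
shift = [-1.815 * a + 1.608 * b for a, b in zip(step_1397, step_1411)]
for year in (1396, 1400, 1420):
    print(year, round(shift[position(year)], 3))
```

Read this way, the two break coefficients describe a temporary drop in the level between 1397 and 1411 that is largely reversed afterwards, leaving a net shift of 1.608 − 1.815 = −0.207.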
TSL finds 20 outliers and 2 structural breaks, roughly one intervention every 38 years. The extracted signal of the two component model with interventions is roughly comparable to that of the DCS-t(2, 2) model, as can be seen in the following figure.
Hyndman, R. J. and Y. Khandakar (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software 27, 1–22.