Case Studies

If you're interested in time series analysis and forecasting, this is the right place to be. The Time Series Lab (TSL) software platform makes time series analysis available to anyone with a basic knowledge of statistics. Future versions will remove the need for this basic knowledge altogether by providing fully automated forecasting systems. The platform is designed and developed so that results can be obtained quickly and verified easily. At the same time, many advanced time series and forecasting operations are available to experts. In our case studies, we often present screenshots of the program so that you can easily replicate results.

Did you know you can take a screenshot of a TSL program window? Press Ctrl + p to open a window that allows you to save a screenshot of the program. The TSL window should be located on your main monitor.

Click on the buttons below to go to our case studies. At the beginning of each case study, the required TSL package is mentioned. Our first case study, about the Nile data, is meant to illustrate the basic workings of the program and we advise you to start with that one.

Long memory

Author: Rutger Lit
Date: July 05, 2022
Software: Time Series Lab - Home Edition
Topics: score-driven models, long memory, fat tails


We continue the case studies with a record of the lowest annual water levels of the Nile river during 622-1467, measured at the island of Roda, near Cairo, Egypt. The series is also available up to 1918, but that extension contains long stretches of missing values, which are not the topic of this case study; for missing value analysis see, for example, the (first) Nile case study. The Nile Minimum dataset is part of every TSL installer and can be found in the data folder located in the install folder of TSL. When we inspect the autocorrelation function of the time series on the Database page, we find that the ACF displays a classic long memory pattern: even after increasing the number of lags to 50, we still find significant lags, see the following figure:

Figure: Autocorrelation function of the Nile Minimum series (Data inspection and preparation page)
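The slowly decaying ACF can also be checked numerically. The sketch below uses plain NumPy on a synthetic, highly persistent series standing in for the Nile minima (the actual dataset ships with the TSL installer); it computes sample autocorrelations and the usual ±1.96/√T significance bound that the ACF plot draws.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function, lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

# Synthetic stand-in: a near-unit-root AR(1) also shows very slow ACF decay
rng = np.random.default_rng(0)
T = 846  # same length as the Nile Minimum series
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.97 * y[t - 1] + rng.standard_normal()

rho = acf(y, 50)
bound = 1.96 / np.sqrt(T)  # approximate 95% band under white noise
print(f"rho(1) = {rho[1]:.3f}, rho(50) = {rho[50]:.3f}, bound = {bound:.3f}")
```

Autocorrelations far beyond this bound at lag 50, as in the Nile series, are the signature of long memory.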

In this case study we will demonstrate several ways of modelling this dataset. We begin with the score-driven models which show interesting results, especially if we deviate from the Normal distribution.

Score-driven models

The power of score-driven models lies in their ability to deviate from the Normal distribution for the irregular component of the model. The Student t distribution, for example, is much less susceptible to outliers in the data. Furthermore, score-driven models allow us to choose an arbitrary number of AR lags (p) and score lags (q). But how do we choose p and q? It turns out that the algorithm of Hyndman and Khandakar (2008), developed to find optimal values of p and q for ARMA models, works for score-driven models as well. This is no surprise, since ARMA models are a subset of score-driven models.
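TSL runs this order search internally. As a simplified illustration of the idea only (not TSL's actual algorithm, which follows the Hyndman–Khandakar stepwise logic), the sketch below selects an AR order p by minimizing the AIC of conditional least-squares fits on a synthetic AR(2) series:

```python
import numpy as np

def ar_aic(y, p):
    """AIC of a conditional least-squares AR(p) fit with a constant."""
    T = len(y)
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j]
                                            for j in range(1, p + 1)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    n = len(resid)
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (p + 2) - 2 * loglik  # params: constant, p AR coeffs, sigma^2

# Synthetic AR(2) data, so the search should favour p = 2
rng = np.random.default_rng(3)
T = 800
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

aics = {p: ar_aic(y, p) for p in range(5)}
best_p = min(aics, key=aics.get)
print("AIC by order:", {p: round(a, 1) for p, a in aics.items()},
      "best p =", best_p)
```

The same principle, comparing AIC values across candidate (p, q) pairs, drives the output shown in the optimization logs below.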
We navigate to the Pre-built models page of TSL and select only the model DCS-g in the score-driven column. We then tick the Auto detect optimum p, q box and select a 100%/0% ratio for the Training and Validation sample. Press the Process dashboard button in the bottom right corner. TSL starts working and comes up with an optimum of $p = 2, q = 2$ with a constant included.


———————————————————————————— PARAMETER OPTIMIZATION ————————————————————————————

Model TSL019: DCS-g
The dependent variable Y is: Minimum
The selection sample is: 0622-01-01 - 1467-01-01 (N = 1, T = 846 with 0 missings)

Lower AIC found with value 2041.4941
Model specs: p = 0, q = 1, constant included

Lower AIC found with value 1876.851
Model specs: p = 1, q = 2, constant included

Lower AIC found with value 1876.6624
Model specs: p = 2, q = 2, constant included


—————————————————————————————————— MODEL FIT ———————————————————————————————————

Model: TSL001 DCS-g(2,2)
variable: Minimum

                                               TSL001
Log likelihood                              -932.3312   
Akaike Information Criterion (AIC)          1876.6624   
Bias corrected AIC (AICc)                   1876.7626   
Bayesian Information Criterion (BIC)        1905.1056   
in-sample MSE                                  0.5313   
... RMSE                                       0.7289   
... MAE                                        0.5403   
... MAPE                                       4.6854   
Sample size                                       846   
Effective sample size                             844   
* based on one-step-ahead forecast errors 
                            

Continuing the modelling process, we select only the DCS-t model in the score-driven column. We then tick the Auto detect optimum p, q box and press the Process dashboard button in the bottom right corner. After TSL has found the optimal p and q, we have the following results:


———————————————————————————— PARAMETER OPTIMIZATION ————————————————————————————

Model TSL020: DCS-t
The dependent variable Y is: Minimum
The selection sample is: 0622-01-01 - 1467-01-01 (N = 1, T = 846 with 0 missings)

Lower AIC found with value 2005.2239
Model specs: p = 0, q = 1, constant included

Lower AIC found with value 1801.8766
Model specs: p = 4, q = 2, constant included

Lower AIC found with value 1799.8932
Model specs: p = 3, q = 2, constant included

Lower AIC found with value 1798.3775
Model specs: p = 2, q = 2, constant included


—————————————————————————————————— MODEL FIT ———————————————————————————————————

Model: TSL002 DCS-t(2,2)
variable: Minimum

                                               TSL002
Log likelihood                              -892.1888   
Akaike Information Criterion (AIC)          1798.3775   
Bias corrected AIC (AICc)                   1798.5112   
Bayesian Information Criterion (BIC)        1831.5612   
in-sample MSE                                  0.5242   
... RMSE                                       0.7240   
... MAE                                        0.5315   
... MAPE                                       4.5938   
Sample size                                       846   
Effective sample size                             844   
* based on one-step-ahead forecast errors
                            

The improvement in model fit is large: the log likelihood improves by 40 points and the AIC of the DCS-t(2,2) model is almost 100 points lower (better). The in-sample measures MSE, RMSE, MAE, and MAPE all improve as well, albeit less dramatically. The Student t model has one extra parameter to estimate: the degrees of freedom. It is estimated at 4.8061, which shows that the tails of the distribution are much thicker than those of the Normal distribution, meaning the probability of extreme events is larger. Note that as the degrees of freedom go to infinity, the DCS-t(p, q) model reverts to the DCS-g(p, q) model. In practice, the degrees of freedom do not need to go all the way to infinity: an estimate above 100 already closely resembles the DCS-g(p, q) model. The effect of thicker tails can be seen in the figure below.

Figure: Extracted signals of the DCS-g(2, 2) and DCS-t(2, 2) models (Data inspection and preparation page)

The figure shows the extracted signals of the DCS-g(2, 2) and DCS-t(2, 2) models. We see that the DCS-t(2, 2) signal reacts less strongly to the outliers in the data.
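A quick Monte Carlo check with NumPy (using the standard forms of both distributions, not the fitted model itself) illustrates how much more probable extreme events are under a Student t with 4.8 degrees of freedom than under a Normal:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
nu = 4.8  # degrees of freedom reported by TSL for the DCS-t model

# Probability of an observation more than 3 units from the center
p_norm = np.mean(np.abs(rng.standard_normal(n)) > 3)
p_t = np.mean(np.abs(rng.standard_t(nu, size=n)) > 3)
print(f"P(|X| > 3): Normal = {p_norm:.4f}, Student t(4.8) = {p_t:.4f}")
```

The Student t assigns roughly an order of magnitude more probability to such tail events, which is exactly why the DCS-t filter discounts outlying observations instead of chasing them.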

Two component model

Another way to model a long memory process is by using two components: one persistent component and one (less persistent) stationary component. We go to the Build your own model page of TSL and select a time-varying level (Random Walk). Do not select a slope component, but do select an ARMA(1,0). Go to the Estimation page and click the Estimate button. The result is a model with a log likelihood value roughly similar to that of the DCS-g(2, 2) model and an extracted signal that shows a comparable pattern as well, see the following figure.

Figure: Extracted signals of the DCS and two-component models (Data inspection and preparation page)
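A model like this is a linear Gaussian state space model, and the signal is extracted with the Kalman filter. The self-contained sketch below (with illustrative parameter values, not TSL's maximum likelihood estimates) simulates a random-walk level plus an AR(1) component and filters the combined signal out of the noisy observations:

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 300, 0.8
s_eta, s_kap, s_eps = 0.01, 0.2, 0.3  # illustrative variances only

# Simulate: random-walk level mu_t + stationary AR(1) component c_t + noise
mu = np.cumsum(rng.normal(0, np.sqrt(s_eta), n))
c = np.zeros(n)
for t in range(1, n):
    c[t] = phi * c[t - 1] + rng.normal(0, np.sqrt(s_kap))
y = mu + c + rng.normal(0, np.sqrt(s_eps), n)

# Kalman filter for the state alpha_t = (mu_t, c_t)
T_mat = np.array([[1.0, 0.0], [0.0, phi]])  # state transition
Z = np.array([1.0, 1.0])                    # observation: y_t = mu_t + c_t + eps_t
Q = np.diag([s_eta, s_kap])
a, P = np.zeros(2), np.eye(2) * 1e6         # (nearly) diffuse initialization
signal = np.empty(n)
for t in range(n):
    F = Z @ P @ Z + s_eps        # prediction-error variance
    K = P @ Z / F                # Kalman gain
    a = a + K * (y[t] - Z @ a)   # filtered state
    P = P - np.outer(K, Z @ P)
    signal[t] = Z @ a            # filtered estimate of mu_t + c_t
    a = T_mat @ a                # predict next state
    P = T_mat @ P @ T_mat.T + Q

mse_raw = np.mean((y - (mu + c)) ** 2)
mse_filt = np.mean((signal - (mu + c)) ** 2)
print(f"MSE raw: {mse_raw:.3f}, MSE filtered: {mse_filt:.3f}")
```

The filtered signal tracks the true level-plus-cycle with a lower mean squared error than the raw observations, which is the noise reduction visible in the extracted-signal figure.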

We can also specifically deal with outliers and possible structural breaks in the data. Go to the Build your own model page and add Automatically find Intervention variables to the model. Go to the Estimation page and click the Estimate button. The result is an even better model fit than the one from the DCS-t(2, 2) model. We have:


—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————

Intervention coefficients:

Beta                               Value        Std.Err         t-stat           Prob
β_outlier_0627-01-01               2.244         0.5456          4.113     4.3053e-05   
β_outlier_0646-01-01               2.772         0.5447          5.089     4.4532e-07   
β_outlier_0656-01-01               1.694         0.5447          3.110         0.0019   
β_outlier_0660-01-01               1.681         0.5447          3.086         0.0021   
β_outlier_0691-01-01              -2.202         0.5447         -4.043     5.7797e-05   
β_outlier_0713-01-01              -1.738         0.5447         -3.191         0.0015   
β_outlier_0719-01-01               1.750         0.5447          3.213         0.0014   
β_outlier_0809-01-01               3.502         0.5447          6.429     2.1781e-10   
β_outlier_0878-01-01               2.747         0.5447          5.044     5.6139e-07   
β_outlier_0962-01-01               1.889         0.5447          3.468     5.5236e-04   
β_outlier_0981-01-01              -1.650         0.5447         -3.030         0.0025   
β_outlier_1060-01-01               1.781         0.5447          3.269         0.0011   
β_outlier_1067-01-01               1.782         0.5447          3.271         0.0011   
β_outlier_1100-01-01               1.922         0.5447          3.529     4.4111e-04   
β_outlier_1292-01-01              -1.947         0.5447         -3.574     3.7188e-04   
β_outlier_1357-01-01               3.548         0.5447          6.514     1.2730e-10   
β_outlier_1409-01-01               1.812         0.5470          3.312     9.6739e-04   
β_outlier_1433-01-01               2.707         0.5448          4.968     8.2148e-07   
β_outlier_1439-01-01               1.897         0.5447          3.482     5.2299e-04   
β_outlier_1444-01-01               2.809         0.5448          5.157     3.1512e-07   
β_break_1397-01-01                -1.815         0.4063         -4.466     9.0772e-06   
β_break_1411-01-01                 1.608         0.4073          3.948     8.5591e-05   


—————————————————————————————————— MODEL FIT ———————————————————————————————————

Model: TSL004
variable: Minimum

                                               TSL004
Log likelihood                              -779.5080   
Akaike Information Criterion (AIC)          1613.0160   
Bias corrected AIC (AICc)                   1614.8644   
Bayesian Information Criterion (BIC)        1741.0100   
in-sample MSE                                  0.5107   
... RMSE                                       0.7146   
... MAE                                        0.5235   
... MAPE                                       4.5169   
Sample size                                       846   
Effective sample size                             823   
* based on one-step-ahead forecast errors
                            

TSL finds 20 outliers and 2 structural breaks, roughly one intervention every 38 years. The extracted signal of the two-component model with interventions is roughly comparable to that of the DCS-t(2, 2) model, as can be seen in the following figure.

Figure: Structural break (Data inspection and preparation page)

Figure: Autocorrelation functions (Data inspection and preparation page)
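The intervention variables TSL adds are ordinary regressors: a pulse dummy for an outlier and a step dummy for a structural break. A minimal OLS sketch on synthetic data (with illustrative positions and sizes, not the Nile estimates) shows how such coefficients recover the effects, in the same spirit as the β_outlier and β_break entries in the parameter summary above:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
y = rng.standard_normal(n)
y[50] += 5.0    # pulse outlier at t = 50
y[120:] += 2.0  # level break from t = 120 onward

pulse = np.zeros(n)
pulse[50] = 1.0   # outlier dummy: 1 at a single time point
step = np.zeros(n)
step[120:] = 1.0  # break dummy: 1 from the break date onward

X = np.column_stack([np.ones(n), pulse, step])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"constant: {beta[0]:.2f}, outlier: {beta[1]:.2f}, break: {beta[2]:.2f}")
```

The estimated coefficients recover the injected effects up to noise, which is how the two-component model with interventions absorbs outliers and breaks instead of distorting the extracted signal.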

References

Hyndman, R. J. and Y. Khandakar (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software 27(3), 1–22.