The Time Series Lab (TSL) software packages make time series analysis available to anyone with a basic knowledge of statistics. The program is written so that results can be obtained quickly, while many advanced options remain available for the time series experts among us. The modelling process in TSL consists of a five-step procedure: Database, Model setup, Estimation, Graphics & diagnostics, and Forecasting. In our case studies, we often present screenshots of the program so that you can easily replicate results.
Did you know you can make a screenshot of a TSL program window? Press Ctrl + p to open a window which allows you to save a screenshot of the program. The TSL window should be located on your main monitor.
May 28, 2021
TSL module: State Space Edition - Univariate Basic
Topics: fractional seasonal components and comparison of forecasting ability
The data for this case study is weekly data on US finished motor gasoline products supplied (in thousands of barrels per day) from February 1991 to May 2005.
It is part of the R package fpp2 and available from the EIA website.
The dataset is used in the TBATS paper by De Livera, A.M., R.J. Hyndman, and R.D. Snyder (2011).
Furthermore, the dataset is analysed by R.J. Hyndman on his blog.
We quote from this blog post:
The TBATS model is preferable when the seasonality changes over time. The ARIMA approach is preferable if there are covariates that are useful predictors as these can be added as additional regressors.
This gasoline case study illustrates that you don't need to choose between the two methods when you work within our TSL software platform. TSL offers a modelling framework with fractional seasonals that can evolve stochastically over time AND, at the same time, allows the inclusion of covariates (explanatory variables). We show that TSL can produce more accurate forecasts than the TBATS package. We deliberately compare with TBATS since this package forecasts very accurately when complex seasonal patterns are present in the data.
Step 1 is to inspect the time series data. In the figure below, the gasoline dataset is loaded into TSL and plotted. The upward trend and seasonality patterns are clearly visible in the data. The Data characteristics area shows T = 745 observations. At a later stage (Estimation step) we will split the time series into an estimation sample and a test period.
You can copy the contents of the blue Data characteristics pane to the clipboard by right-clicking the area and selecting Copy contents. The characteristics can also appear in the graph itself: right-click the graph and select Add characteristics to plot.
Step 2 of the modelling process is the selection of components. In the case of the gasoline data we start our analysis with the Local Linear Trend model. To specify this model in TSL, select a stochastic level and a stochastic slope.
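In TSL the Local Linear Trend model is specified with two checkboxes, but it helps to see the equations behind them. The sketch below is a minimal numpy simulation of the model, not TSL code; the function name and noise standard deviations are our own illustrative choices:

```python
import numpy as np

def simulate_llt(n, sigma_eps, sigma_xi, sigma_zeta, seed=0):
    """Simulate a Local Linear Trend model:
       y_t  = mu_t + eps_t                  (observation equation)
       mu_t = mu_{t-1} + nu_{t-1} + xi_t    (stochastic level)
       nu_t = nu_{t-1} + zeta_t             (stochastic slope)
    """
    rng = np.random.default_rng(seed)
    mu, nu = 0.0, 0.0
    y = np.empty(n)
    for t in range(n):
        y[t] = mu + rng.normal(0.0, sigma_eps)   # observe level plus noise
        nu += rng.normal(0.0, sigma_zeta)        # slope follows a random walk
        mu += nu + rng.normal(0.0, sigma_xi)     # level drifts with the slope
    return y

y = simulate_llt(200, sigma_eps=1.0, sigma_xi=0.1, sigma_zeta=0.01)
```

Setting the slope variance to zero reduces the model to a Local Level model with drift; TSL estimates all these variances for you from the data.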
Step 3 of the modelling process is the estimation of the model. Our dataset consists of 745 observations in total (February 1991 to May 2005). For this case study we select the first 484 observations as the estimation sample, leaving 261 observations as the test sample. With this split of the time series in two parts, we can compare our results with the results in De Livera, A.M., R.J. Hyndman, and R.D. Snyder (2011).
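The estimation/test split is done inside TSL, but as a quick sanity check the numbers can be sketched as follows (the array here is only a stand-in for the gasoline series):

```python
import numpy as np

T = 745                      # weekly observations, Feb 1991 - May 2005
n_est, n_test = 484, 261     # split used in De Livera et al. (2011)
assert n_est + n_test == T   # the two samples cover the full series

y = np.arange(T, dtype=float)          # stand-in for the gasoline series
y_est, y_test = y[:n_est], y[n_est:]   # first 484 for estimation, rest for testing
```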
Estimate the model, go to step 5, and click the Model Comparison button. Click the start loss calculation button, the round green button in the bottom left of the window. Under User defined models, a new checkbox should appear; tick it. Your TSL screen should now look like the figure shown below.
The reason TSL does not automatically add the latest estimated model to the User defined models list is that, for long time series and large complex models, the loss calculation can take some time. Since these calculations are not needed by every user or for every model, the user controls when the loss calculation starts.
On the Model comparison page, you can see the model specification by hovering with the mouse over the model number (TSL001 in our example). If you click the model number, TSL takes you to the Text output page for more model details.
The pyramid-shaped loss line is explained by the fact that a forecast from the Local Linear Trend model is a straight, upward-sloping line for our dataset. The forecasts do not take the seasonal pattern of the data into account, so the loss is largest when the data is at the highest or lowest point of the seasonal cycle.
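This effect is easy to reproduce on a toy series. The sketch below (our own illustration, not TSL output) extrapolates a straight line from a trend-plus-seasonal series and shows that the absolute forecast error oscillates with the seasonal cycle:

```python
import numpy as np

period = 365.25 / 7                              # weekly seasonal period
t = np.arange(745)
y = 0.02 * t + np.sin(2 * np.pi * t / period)    # toy trend + seasonal series

n_est = 484
slope = (y[n_est - 1] - y[0]) / (n_est - 1)      # crude straight-line trend
h = np.arange(1, len(y) - n_est + 1)             # forecast horizons 1..261
yhat = y[n_est - 1] + slope * h                  # straight-line forecast

loss = np.abs(y[n_est:] - yhat)                  # oscillates with the seasonal
```

Plotting `loss` against `h` gives the same peaks-and-troughs shape as the TSL loss line for the Local Linear Trend model.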
It is time to introduce a seasonal component. We go back to step 2 and add a stochastic seasonal to the model with a Seasonal period length of 365.25/7 ≈ 52.179 (weekly data taking leap years into account) and a Number of Fourier terms equal to 26.
Seasonal period length is the number of time points after which the seasonal repeats. This can be a fractional number. For example, with daily data, specify a period of 365.25 for a seasonal that repeats each year, taking leap years into account.
Number of Fourier terms specifies the seasonal flexibility. Note that a higher number is not always better and parsimonious models often perform better in forecasting. The extended module determines the optimal set of Fourier terms based on their statistical relevance.
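To make the two settings concrete, here is a minimal numpy sketch of a trigonometric seasonal built from Fourier terms with a fractional period; the function and coefficient values are our own illustration (TSL additionally lets the coefficients evolve stochastically over time):

```python
import numpy as np

def fourier_seasonal(t, period, harmonics, a, b):
    """Deterministic trigonometric seasonal with a (possibly fractional) period."""
    s = np.zeros(len(t))
    for j in range(1, harmonics + 1):
        lam = 2.0 * np.pi * j / period          # frequency of the j-th Fourier term
        s += a[j - 1] * np.cos(lam * t) + b[j - 1] * np.sin(lam * t)
    return s

period = 365.25 / 7          # ~52.179: weekly data, leap years taken into account
t = np.arange(2 * 52)        # two years of weekly time points
rng = np.random.default_rng(1)
s = fourier_seasonal(t, period, 7, rng.normal(size=7), rng.normal(size=7))
```

Each Fourier term adds one cosine/sine pair, so the Number of Fourier terms directly controls how wiggly the seasonal pattern can be within one cycle.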
Step 3: Estimate
Step 5: Model comparison ► start loss calculation
We learn from the figure above that the loss with the seasonal included is much lower.
Next, untick the loss line of the first model so that only the loss line of the best model remains.
We can now compare this loss with the loss presented in Figure 2 of De Livera, A.M., R.J. Hyndman, and R.D. Snyder (2011). We cannot reproduce that figure here for copyright reasons, so you have to take my word for it that it is roughly comparable to the loss in our latest figure, albeit our loss is slightly higher. But not for long!
Next, go back to step 2 and lower the number of Fourier terms of the seasonal component. You can find a good number by checking the forecasting performance of models with fewer Fourier terms; as noted above, parsimonious models often forecast better. The extended module determines the optimal set of Fourier terms based on their statistical relevance and can make combinations of Fourier terms that cannot be constructed with the basic module. A well-performing number of Fourier terms turns out to be 7. A model comparison between 26 and 7 Fourier terms is given in the figure below. The loss corresponding to the model with 7 Fourier terms is lower than the TBATS loss presented in Figure 2 of De Livera, A.M., R.J. Hyndman, and R.D. Snyder (2011).
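The trade-off between flexibility and parsimony can be mimicked outside TSL with a simple least-squares regression on Fourier terms. This toy comparison (our own sketch; TSL's stochastic seasonal is richer than a fixed-coefficient regression) fits k = 7 versus k = 26 harmonics on the estimation sample and evaluates the out-of-sample MAE:

```python
import numpy as np

def fourier_design(t, period, harmonics):
    """Regression matrix with a constant and cos/sin pairs for each harmonic."""
    cols = [np.ones(len(t))]
    for j in range(1, harmonics + 1):
        lam = 2.0 * np.pi * j / period
        cols += [np.cos(lam * t), np.sin(lam * t)]
    return np.column_stack(cols)

period = 365.25 / 7
t = np.arange(745, dtype=float)
rng = np.random.default_rng(2)
y = (np.sin(2 * np.pi * t / period)              # toy series: two harmonics
     + 0.3 * np.sin(4 * np.pi * t / period)
     + rng.normal(0.0, 0.2, t.size))             # plus noise

n_est = 484
mae = {}
for k in (7, 26):
    X = fourier_design(t, period, k)
    beta, *_ = np.linalg.lstsq(X[:n_est], y[:n_est], rcond=None)
    mae[k] = np.abs(X[n_est:] @ beta - y[n_est:]).mean()
```

On series like this, the extra harmonics of k = 26 mostly fit noise in the estimation sample, which is why a smaller number of Fourier terms can forecast better.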
A figure with the extracted trend and seasonal pattern is obtained from the Graphics and Diagnostics page.
- Forecasts can be improved further by adding explanatory variables. In TSL you can do this with a couple of button clicks on the Model setup page. Let us know which variables you have used to boost the forecast precision for the gasoline dataset!
- Estimate the model with a Level, Slope, and Seasonal with frequency 52. Verify that ignoring leap years (a period of 52 instead of 52.179) makes the forecasts worse.
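A short back-of-the-envelope calculation shows why the second exercise works out the way it does: rounding the period down to 52 makes the modelled seasonal drift out of phase with the true annual cycle.

```python
true_period = 365.25 / 7       # ~52.179 weeks per seasonal cycle
approx_period = 52.0           # leap years ignored
drift_per_year = true_period - approx_period   # ~0.179 weeks per cycle

# Over the roughly five-year test sample the seasonal pattern shifts by
# almost a full week, which degrades multi-step forecasts.
print(round(5 * drift_per_year, 2))            # prints 0.89
```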
De Livera, A.M., R.J. Hyndman, and R.D. Snyder (2011). Forecasting Time Series With Complex Seasonal Patterns Using Exponential Smoothing. Journal of the American Statistical Association 106:496, 1513-1527.