This exercise teaches you how to build ARIMA/SARIMA models for time series forecasting. The autoregressive integrated moving average (ARIMA) model is a generalization of the autoregressive moving average (ARMA) model. Both models are fitted to time series data either to better understand the data or to predict future points in the series. ARIMA models are applied when the data show evidence of non-stationarity; an initial differencing step (the "integrated" part of the model) can be applied one or more times to eliminate the non-stationarity.
This exercise uses Python packages to explore and analyze temperature anomaly data and to set up predictive models for long-term forecasts.
You will first learn to visualize the time series data and decompose it into trend, seasonality, and noise. Next, you will build ARIMA and SARIMA models under different assumptions and evaluate their forecasting performance. The exercise is implemented in Python 3.5.
Your learning objectives are:
You will need to:
Make sure you have installed Python 3.5, Jupyter Notebook, and the following packages. If you need guidance, check the links below:
The sample solution code for this exercise is included in temperature_analysis.ipynb.
A basic technique for analyzing time series data is to decompose it into components: trend, seasonality, and noise (residual). The decomposition makes the long-run trend and the seasonality easier to see.
Check https://en.wikipedia.org/wiki/Decomposition_of_time_series for more information.
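The idea can be illustrated with a minimal sketch of a classical additive decomposition. The helper names (`centered_moving_average`, `additive_decompose`) and the synthetic series are assumptions made for illustration; they are not taken from the exercise notebook, which may use a library routine instead.

```python
import numpy as np


def centered_moving_average(values, period):
    """Centered moving average; for an even period the two end points get half weight."""
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(len(values), np.nan)  # edges stay NaN: no full window there
    trend[half:len(values) - half] = np.convolve(values, weights, mode="valid")
    return trend


def additive_decompose(values, period=12):
    """Split a series into trend + seasonal + residual (additive model)."""
    values = np.asarray(values, dtype=float)
    trend = centered_moving_average(values, period)
    detrended = values - trend
    # Average the detrended values at each position in the cycle (e.g. each calendar month).
    seasonal_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # seasonal effects sum to zero over a cycle
    seasonal = np.tile(seasonal_means, len(values) // period + 1)[: len(values)]
    residual = values - trend - seasonal
    return trend, seasonal, residual


# Synthetic monthly data (an assumption for illustration, not the exercise data).
t = np.arange(240)
true_seasonal = 0.3 * np.sin(2 * np.pi * t / 12)
series = 0.002 * t + true_seasonal
trend, seasonal, residual = additive_decompose(series, period=12)
```

Because the synthetic series has no noise, the residual is essentially zero away from the edges; on real data the residual holds whatever the trend and seasonal components do not explain.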
The ARMA part of an ARIMA model assumes a stationary series. Hence, before building ARIMA models, we must make sure that the original time series can be transformed into a stationary one by de-trending or differencing.
Check the link below for the definition of stationarity and de-trending/differencing methods.
https://people.duke.edu/~rnau/411diff.htm
Cross-validation for time series data is a bit different from cross-validation for ordinary data sets: we can't select the data points randomly, because that would break the date/time order.
To keep the training and validation data contiguous in time, we should use a strategy similar to the following:
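One strategy of this kind is an expanding-window split, sketched below. This is an assumption about what such a scheme might look like, not necessarily the one intended by the exercise; the function name `expanding_window_splits` and its parameters are hypothetical.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs that preserve time order.

    Each fold trains on everything before its validation block, so the
    training window grows while the validation block moves forward in time.
    """
    first_train_end = n_samples - n_splits * test_size
    if first_train_end <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        train_end = first_train_end + k * test_size
        yield range(0, train_end), range(train_end, train_end + test_size)


# Example: 10 observations, 3 folds, 2 validation points per fold.
splits = [
    (list(train), list(test))
    for train, test in expanding_window_splits(n_samples=10, n_splits=3, test_size=2)
]
for train, test in splits:
    print("train:", train, "validate:", test)
```

Every validation block lies strictly after its training block, so no future information leaks into the fit.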
In this problem, we are dealing with the temperature anomalies data from 1880-01 to 2010-08.
This is a monthly time series. Our aim is to build ARIMA/SARIMA models that fit the in-sample data and produce out-of-sample forecasts. In this exercise you will see the limitations of linear models in long-term prediction. You could also try modifying the sample code to do short-term prediction.
After loading the necessary packages and the data, we can start to explore the temperature anomalies data set.
First, we can plot the original time series. As one can see, the temperature anomalies oscillate around 0 before 1980; after 1980, they increase with a clearly positive slope.
Since the original data is monthly, let's try smoothing it with an annual moving average. Namely, plot each point using the average of the 12 nearest data points instead of the actual value.
The previous plot is not smooth enough, so let's use a larger moving average window, say one decade (120 months).
In the plot below you can see a much clearer trend in the data. In fact, there was already a gradually increasing trend before 1980; after 1980 the slope becomes much steeper.
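The two smoothing steps can be sketched with pandas rolling windows. The synthetic series below is a stand-in (an assumption), since the actual data loading code lives in the notebook; `window=12` and `window=120` correspond to the annual and decadal averages described above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monthly anomalies (assumption: not the real data).
rng = np.random.default_rng(42)
index = pd.date_range("1880-01-01", periods=600, freq="MS")
monthly = pd.Series(
    0.001 * np.arange(600) + rng.normal(scale=0.2, size=600), index=index
)

# Centered windows keep the smoothed curve aligned with the original dates.
annual = monthly.rolling(window=12, center=True).mean()
decade = monthly.rolling(window=120, center=True).mean()

# Points without a full window are NaN: window - 1 of them in total.
print(annual.isna().sum(), decade.isna().sum())
```

Calling `.plot()` on `monthly`, `annual`, and `decade` would show the progressively smoother curves; the wider the window, the more short-term variation is averaged out and the more data is lost at the edges.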
Now we can also decompose the data into different components: trend, seasonality, and noise.
From the decomposition below, we can see a gradually increasing trend and an annual seasonality.
An ARIMA model combines an AR (autoregressive) model and an MA (moving average) model, applied to the differenced series. Check this link for more information about ARIMA models.
https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average
We use the ACF and PACF plots to determine the order of ARIMA/SARIMA models.
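As a rough guide, an ACF that cuts off after lag q suggests an MA(q) term, while a PACF that cuts off after lag p suggests an AR(p) term. The sketch below computes only the sample ACF by hand, on a simulated AR(1) series (an assumption for illustration), to show what the plotted values are; the function name `sample_acf` is hypothetical, and the PACF is omitted for brevity.

```python
import numpy as np


def sample_acf(values, n_lags):
    """Sample autocorrelation at lags 0..n_lags."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(n_lags + 1)]
    )


# Simulate an AR(1) process x[t] = phi * x[t-1] + noise (illustrative data).
rng = np.random.default_rng(0)
n, phi = 5000, 0.7
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for i in range(1, n):
    ar1[i] = phi * ar1[i - 1] + noise[i]

acf = sample_acf(ar1, n_lags=24)

# Approximate 95% band for "no autocorrelation"; lags outside it are significant.
band = 1.96 / np.sqrt(n)
significant_lags = [k for k in range(1, 25) if abs(acf[k]) > band]
```

For an AR(1) process the theoretical ACF decays geometrically as phi to the power of the lag, so `acf[1]` should be close to 0.7 and `acf[2]` close to 0.49 here.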
AR and MA models require the time series to be stationary. Hence we need to use detrending or differencing to transform the raw data into a stationary series.
Let's first check the stationarity of the original data using the Dickey-Fuller test.
So we need to use detrending/differencing methods for stationarity transformation.
If differencing is used to remove the trend, then we need an ARIMA model with a constant trend component and d equal to the order of differencing.
Let's take the first difference and check stationarity using the Dickey-Fuller test.
We can also determine the orders for the ARIMA/SARIMA models based on the ACF/PACF plots.
Hence, for simplicity we set p = 1, q = 2, and d = 1 (because a single first difference is enough to achieve stationarity) for our ARIMA model.
If the data is detrended using a linear model, then we need to build an ARIMA model with a linear trend component, meaning that the original data can be transformed into a stationary series by removing the effect of a time trend.
Let's build a linear regression model with y = temperature anomalies and x = time, and check the stationarity of the residuals.
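The detrending step can be sketched with an ordinary least-squares line fitted by `numpy.polyfit`. The data below is synthetic (an assumption), with time measured in months since the start of the series.

```python
import numpy as np

# Synthetic series: linear trend plus noise (assumption: illustrative data only).
rng = np.random.default_rng(5)
n = 1000
time = np.arange(n)
true_slope = 0.001
y = true_slope * time - 0.3 + rng.normal(scale=0.2, size=n)

# Fit y = slope * time + intercept by least squares.
slope, intercept = np.polyfit(time, y, deg=1)
fitted = slope * time + intercept

# The residuals are the detrended series; these are what gets tested for stationarity.
residuals = y - fitted
print(f"estimated slope per month: {slope:.5f}")
```

The residuals can then be passed to a stationarity test and to the ACF/PACF plots in place of the raw series.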
From the ACF and PACF plots:
(This model uses differencing for the stationarity transformation.)
Use all data as training data to fit the model and produce a long-term forecast:
Evaluate out-of-sample forecast performance using the last 200 points as test data:
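The hold-out evaluation can be sketched as follows. The helper names `split_tail` and `rmse` are hypothetical, the series is synthetic, and a naive "repeat the last training value" forecast stands in for the model forecast so that the sketch stays self-contained; a fitted model's 200-step forecast would be scored in the same way.

```python
import numpy as np


def split_tail(values, n_test):
    """Hold out the last n_test points as test data, keeping time order."""
    values = np.asarray(values, dtype=float)
    if not 0 < n_test < len(values):
        raise ValueError("n_test must be between 1 and len(values) - 1")
    return values[:-n_test], values[-n_test:]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    error = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(error ** 2)))


# 1568 monthly points, matching 1880-01 to 2010-08 (values are synthetic).
rng = np.random.default_rng(7)
series = np.cumsum(rng.normal(scale=0.1, size=1568))

train, test = split_tail(series, n_test=200)

# Naive baseline: every future value equals the last observed training value.
naive_forecast = np.full(len(test), train[-1])
baseline_rmse = rmse(test, naive_forecast)
print(f"naive baseline RMSE over {len(test)} steps: {baseline_rmse:.3f}")
```

Comparing a model's RMSE against such a baseline gives a quick sense of whether the model adds anything over a trivial forecast.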
As shown above, the ARIMA model does not perform well either in-sample or out-of-sample.
(This model uses differencing for the stationarity transformation.)
Use all data as training data to fit the model and produce a long-term forecast:
Evaluate out-of-sample forecast performance using the last 200 points as test data:
When we take seasonality into account and use SARIMA, we get a better in-sample fit.
(This model uses detrending for the stationarity transformation.)
Use all data as training data to fit the model and produce a long-term forecast:
Evaluate out-of-sample forecast performance using the last 200 points as test data:
(This model uses detrending for the stationarity transformation.)
Use all data as training data to fit the model and produce a long-term forecast:
Evaluate out-of-sample forecast performance using the last 200 points as test data:
In this notebook, we have built ARIMA and SARIMA models with constant and non-constant trends under different stationarity assumptions. One can see that: