Last Updated: 06/03/2020

What is Bootstrapping?

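Bootstrapping is any test or metric that uses random sampling with replacement. In time series, block bootstrapping is commonly used because resampling contiguous blocks of observations preserves the dependence between neighbouring values. Three block bootstrapping methods are introduced: Moving Block, Circular Block, and Stationary.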

Use Case

We use Bootstrapping for Univariate time series forecasting.

What will you build?

What will you learn?

Getting Data

We can fetch the data in 2 possible ways:

Each record in the data has a value (price, index level, etc.) and a corresponding timestamp.

Restructuring data for ingestion

Bootstrapping works with stationary data, and price values are not stationary, so it is necessary to convert prices into returns.

The data is automatically converted from its original values into returns.

Apart from this, this section also provides functionality for clipping the data between start_date and end_date.
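As a rough illustration of these two preparation steps, the sketch below clips a series to a date range and converts prices into returns. It assumes simple returns (the text does not say whether simple or log returns are used), and the function names and sample values are hypothetical, not part of the tool.

```python
from datetime import date

def clip_by_date(series, start_date, end_date):
    """Keep (timestamp, value) pairs with start_date <= timestamp <= end_date."""
    return [(t, v) for t, v in series if start_date <= t <= end_date]

def to_simple_returns(prices):
    """Assumed definition: simple return r_t = p_t / p_(t-1) - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]

# Hypothetical sample data: (timestamp, price) pairs.
series = [
    (date(2020, 1, 1), 100.0),
    (date(2020, 1, 2), 102.0),
    (date(2020, 1, 3), 99.96),
    (date(2020, 1, 6), 101.9592),
]

clipped = clip_by_date(series, date(2020, 1, 2), date(2020, 1, 6))
returns = to_simple_returns([price for _, price in clipped])
```

Note that converting n prices yields n - 1 returns, because the first price has no predecessor.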

Users can also choose one of the three bootstrapping methods introduced above.

Training

The Start and End dates can be used to clip the dataset as needed.

Forecasting Horizon

The forecast period specifies how far into the future you want to forecast. It is determined by the Start and End dates.

Model Parameters[3]

  • Moving: Moving Block Bootstrap
  • Circular: Circular Block Bootstrap
  • Stationary: Stationary Bootstrap
  • Optimal Length: The average block size if the block size is random (Stationary), or the fixed block size otherwise.
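The three methods differ mainly in how blocks are drawn. The following is a minimal standard-library sketch of the textbook versions, not the implementation used by the tool: the function names, the sample returns, and the block size of 3 are all assumptions made for illustration.

```python
import random

def moving_block(data, block_size, rng):
    """Moving block: fixed-size blocks that never run past the end of the data."""
    n = len(data)  # requires block_size <= n
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_size + 1)
        out.extend(data[start:start + block_size])
    return out[:n]

def circular_block(data, block_size, rng):
    """Circular block: fixed-size blocks that wrap around to the start."""
    n = len(data)
    out = []
    while len(out) < n:
        start = rng.randrange(n)
        out.extend(data[(start + i) % n] for i in range(block_size))
    return out[:n]

def stationary(data, avg_block_size, rng):
    """Stationary: wrapping blocks whose lengths are geometric with the given mean."""
    n = len(data)
    p = 1.0 / avg_block_size  # chance of starting a new block at each step
    idx = rng.randrange(n)
    out = []
    for _ in range(n):
        out.append(data[idx])
        idx = rng.randrange(n) if rng.random() < p else (idx + 1) % n
    return out

rng = random.Random(0)  # fixed seed so the sketch is repeatable
returns = [0.01, -0.02, 0.015, 0.003, -0.007, 0.012, -0.001, 0.004]  # hypothetical
samples = {
    "Moving": moving_block(returns, 3, rng),
    "Circular": circular_block(returns, 3, rng),
    "Stationary": stationary(returns, 3, rng),
}
```

In this sketch the block-size argument plays the role of Optimal Length: a fixed size for Moving and Circular, and an average size for Stationary.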

About model

The model comprises three components:

Model Result

The price graph, which is converted back from returns, shows that most simulations capture the upward trend of the stock. This is because the historical data shows an upward trend, and a bootstrapping method that resamples historical data reproduces that trend.
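Converting a simulated return path back into prices amounts to compounding from the last realized price. The sketch below assumes simple returns. The function name and the numbers are hypothetical.

```python
def returns_to_prices(last_price, simulated_returns):
    """Compound simple returns forward, starting from the last realized price."""
    prices = []
    price = last_price
    for r in simulated_returns:
        price *= 1.0 + r
        prices.append(price)
    return prices

# Hypothetical example: start at 100, then +10% followed by -50%.
path = returns_to_prices(100.0, [0.1, -0.5])  # approximately [110.0, 55.0]
```

Because every simulated return is drawn from history, a history with a positive average return compounds into price paths that drift upward on average.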

This section analyses the simulated data and is divided into the following subsections:

The Simulations setting specifies whether the following analysis is applied to returns or to prices.

Simulation Samples

Take a look at some randomly selected simulations. The simulated results are quite different from each other, which implies that the simulations may capture many different scenarios. The disadvantage is that, with such a large number of simulated sequences, it is hard to draw generalized insights.

Histogram/Distribution of observations at a single timestep

It is recommended to plot the histogram of the first time point. If it is distributed around the latest realized value (prices/index), then at least we can say that simulated data is not unrealistic.
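A quick way to run this sanity check without plotting is to bin the values of all simulations at the first time step and compare their centre with the latest realized value. This is only a sketch. The function name, bin width, and sample paths are hypothetical.

```python
from collections import Counter
from statistics import median

def first_step_histogram(simulated_paths, bin_width, step=0):
    """Count values at one time step per bin; also return their median."""
    values = [path[step] for path in simulated_paths]
    # Label each bin by its lower edge.
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items())), median(values)

# Hypothetical simulated price paths; the latest realized price is 100.0.
simulated_paths = [[99.5, 101.0], [100.2, 99.0], [100.7, 102.5], [101.1, 100.3]]
histogram, centre = first_step_histogram(simulated_paths, bin_width=1.0)
```

If the median lands close to the latest realized value, as it does with these made-up paths, the simulations pass this first plausibility check.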

Distributions of entire forecasting periods

The middle line is the mean value of all simulations, while the filled area is determined by the standard deviation of the simulations at each time step. Because bootstrapping is by nature a random copy of historical data, the expected value and variance of the simulations are consistent over time.
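The band described here can be computed step by step across simulations. The sketch below assumes the filled area spans one standard deviation on each side of the mean. The text does not state the exact width, so treat that as an assumption. The function name is also hypothetical.

```python
from statistics import mean, pstdev

def mean_and_band(simulations):
    """Return (lower, mean, upper) per time step for equal-length paths."""
    band = []
    for values in zip(*simulations):  # all simulations at one time step
        m = mean(values)
        s = pstdev(values)            # assumed band width: one standard deviation
        band.append((m - s, m, m + s))
    return band

# Hypothetical example with two short paths.
band = mean_and_band([[1.0, 2.0], [3.0, 2.0]])
```

For this input the first step has mean 2.0 with a band from 1.0 to 3.0, and the second step collapses to 2.0 because both paths agree there.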

K-Means Clustering of Scenarios

By default, we choose 5 clusters using K-Means clustering with L2 distance to extract different scenarios. Other than cluster 3, the clusters are almost the same in terms of variance and mean value, and even cluster 3 is only a bit more volatile. The reason is that all 5 clusters follow the same distribution and show similar behaviour.
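To make the clustering step concrete, here is a small standard-library sketch of K-Means (Lloyd's algorithm) applied to whole simulation paths with squared L2 distance. It is not the tool's implementation. The function names, the seed, and the toy paths are assumptions, and the demo uses 2 clusters rather than the default 5 only to keep the data small.

```python
import random

def squared_l2(a, b):
    """Squared Euclidean (L2) distance between two equal-length paths."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(paths, k=5, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids). Requires k <= len(paths)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(paths, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each path joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_l2(path, centroids[c]))
            for path in paths
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean path of its members.
        for c in range(k):
            members = [p for p, lab in zip(paths, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy paths: two low scenarios and two high scenarios.
paths = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [5.0, 5.1, 5.0], [5.1, 5.0, 5.1]]
labels, centroids = kmeans(paths, k=2)
```

Each centroid is itself a path, so it can be plotted as the representative scenario of its cluster.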

Hierarchical Clustering of scenarios

By default, we choose 5 clusters using Hierarchical Clustering (KL Divergence Affinity) to extract different scenarios. The same conclusion from K-Means also applies here. The only difference is that Hierarchical Clustering tends to isolate a single outlier instead of a group of outliers.
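The text does not say how the KL divergence affinity is computed. One possible reading, shown purely as an assumption, is to fit a Gaussian to each simulated return path and use the symmetrized KL divergence between the fits as the pairwise distance. The function names and sample paths below are hypothetical.

```python
import math
from statistics import mean, pstdev

def gaussian_kl(p, q):
    """KL(P || Q) between Gaussians fitted to return samples p and q."""
    mu_p, mu_q = mean(p), mean(q)
    sd_p, sd_q = pstdev(p), pstdev(q)  # both must be non-zero
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)

def symmetric_kl(p, q):
    """Symmetrized divergence, usable as a pairwise affinity (an assumption)."""
    return gaussian_kl(p, q) + gaussian_kl(q, p)

# Hypothetical return paths: two calm scenarios and one volatile scenario.
calm_a = [0.01, -0.01, 0.01, -0.01]
calm_b = [0.012, -0.008, 0.012, -0.008]
wild = [0.05, -0.05, 0.05, -0.05]

near = symmetric_kl(calm_a, calm_b)  # similar volatility: small divergence
far = symmetric_kl(calm_a, wild)     # different volatility: large divergence
```

Under this reading, the matrix of pairwise divergences would then be passed to an agglomerative clustering routine. A distance of this kind is driven largely by differences in volatility.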


Clustering Comparison

Clustering is a method that lets us focus on the major patterns reflected in the simulations. K-Means seems able only to identify outliers, that is, almost impossible situations, whereas KLD Hierarchical Clustering is able to separate scenarios based on volatility. Its clusters imply that a stable future, or a slightly less stable one, is most likely to happen.