Last Updated: 06/04/2020

What is a Generative Adversarial Network (GAN)?

A generative adversarial network (GAN) is a class of machine learning frameworks invented by Ian Goodfellow and his colleagues in 2014. Two neural networks, a generator and a discriminator, contest with each other in a zero-sum (minimax) game. Given a training set, this technique learns to generate new data with the same statistics as the training set.
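
For reference, the zero-sum game is usually written as the minimax objective below. The notation comes from the general GAN literature rather than from this page: G is the generator, D is the discriminator, p_data is the distribution of the training data, and p_z is the noise prior fed to the generator.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize V by telling real samples from generated ones, while the generator tries to minimize it by producing samples the discriminator accepts as real.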

If such a model is trained on, for example, time-series data from the Vix dataset, it should be able to generate realistic new series that resemble the Vix data.

Use Case

We use a GAN for univariate time series forecasting.

What will you build?

What will you learn?

Example Request:

  1. Get simulated data for a given dataset
POST /gan/simulation?model_name=demo_trail&size=1000&length=15&version=1.0&access_token=xxx
Host: https://gan.quantuniversity.com
Authorization: xxxxx
Content-Type: application/json
connection: keep-alive
content-length: 322241
content-type: application/json
date: Mon,19 Jul 2021 18:28:05 GMT
server: nginx/1.14.0 (Ubuntu)
Accept-Charset:utf-8

The request is defined in the same way as the regular POST call above, except for four additional fields:

Fields | Type | Description
model_name | string | Selects which uploaded dataset to use. The current service only supports the Vix model
size | int | The number of simulations you want to generate
length | int | The forecast period, that is, how far into the future you want to forecast
version | float | The version you are using
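
As a rough illustration, the example request can be assembled in Python with the standard library. This is a minimal sketch, not an official client: the helper name `build_simulation_request` is hypothetical, the host and parameter values are copied from the example request, and the request is only built, not sent. The example also shows an Authorization header, but its exact format is not documented here, so the sketch leaves it out.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Host taken from the example request above.
BASE_URL = "https://gan.quantuniversity.com"


def build_simulation_request(model_name, size, length, version, access_token):
    """Build (but do not send) the POST request for /gan/simulation.

    Hypothetical helper: all fields are passed as query parameters,
    mirroring the example request.
    """
    query = urlencode({
        "model_name": model_name,
        "size": size,
        "length": length,
        "version": version,
        "access_token": access_token,
    })
    return Request(
        f"{BASE_URL}/gan/simulation?{query}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values from the example; "xxx" stands in for a real token.
req = build_simulation_request("demo_trail", 1000, 15, "1.0", "xxx")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a valid access_token.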

There are additional rules around publishing that each request to this API must respect:

  1. You must obtain an access_token from QuUniversity and use that token to query every API it provides. Please see the link: https://academy.qusandbox.com

Possible errors:

Error code | Description
400 Bad Request | Required fields were invalid or not specified
401 Unauthorized | The access_token is invalid or has been revoked
422 Validation Error | The given parameter is invalid, please check the spelling
500 Internal Server Error | Something went wrong on the model side
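
A client can translate these status codes into readable messages before deciding what to do next. The sketch below is illustrative only: the function names are hypothetical, the descriptions are taken from the error table, and the retry rule (retry only on server-side errors) is an assumption rather than documented API behavior.

```python
# Descriptions follow the error table; the mapping itself is illustrative.
ERROR_DESCRIPTIONS = {
    400: "Required fields were invalid or not specified",
    401: "The access_token is invalid or has been revoked",
    422: "The given parameter is invalid, please check the spelling",
    500: "Something went wrong on the model side",
}


def describe_status(status_code):
    """Return a human-readable message for an HTTP status code."""
    if 200 <= status_code < 300:
        return "OK"
    return ERROR_DESCRIPTIONS.get(status_code, f"Unexpected status {status_code}")


def should_retry(status_code):
    """Assumption: only server-side failures (5xx) are worth retrying."""
    return status_code >= 500


for code in (200, 401, 422, 500):
    print(code, describe_status(code), should_retry(code))
```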

  • You can select which model to run based on the name tag. The trained model currently supports two datasets (Vix data and daily price data).
  • Number of sequences specifies the number of simulations you want to generate.
  • Sequence length defines the forecast period, that is, how far into the future you want to forecast. (The maximum value is the window length that you specified in the training module.)

This section includes an analysis of simulation data, and is divided into the following subsections:

Let's look at some of the plots. The plots are generated using the Vix dataset.

Simulation Samples

Simulated results are quite different from each other. This implies that the simulation results may capture many different scenarios. The disadvantage is that, with a large number of simulated sequences, it is hard to draw general insights.

Histogram/Distribution of observations at a single timestep

It is recommended to plot the histogram of the first time point. If it is distributed around the latest realized value (price or index level), then we can at least say that the simulated data is not unrealistic.
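
The check described above can be sketched with the standard library alone. Everything here is an assumption for illustration: the paths are synthetic random walks standing in for real simulation output, `last_value` is a made-up latest realized value, and `first_step_histogram` is a hypothetical helper. A plotting library would normally draw the histogram; here only the bin counts are computed.

```python
import random
import statistics


def first_step_histogram(simulations, bins=10):
    """Count the first-time-point values into equal-width bins."""
    firsts = [path[0] for path in simulations]
    lo, hi = min(firsts), max(firsts)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for value in firsts:
        # Clamp so the maximum value lands in the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Synthetic stand-in for simulated paths (not real Vix output).
random.seed(0)
last_value = 20.0  # hypothetical latest realized value
simulations = []
for _ in range(1000):
    level = last_value
    path = []
    for _ in range(15):
        level += random.gauss(0, 1)
        path.append(level)
    simulations.append(path)

edges, counts = first_step_histogram(simulations)
# How far the centre of the first-step distribution sits from the last value.
offset = statistics.median(p[0] for p in simulations) - last_value
print(counts, round(offset, 3))
```

A small offset relative to the spread of the histogram suggests the simulations start near the latest realized value.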

K-Means Clustering of Scenarios

By default, we choose 5 clusters using K-Means clustering with L2 distance to extract different scenarios.

All clusters behave differently, and they all fall into comparable ranges. The simulations do show different scenarios, but it is still unclear how to interpret these results.
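
To make the idea concrete, here is a minimal K-Means sketch (Lloyd's algorithm with squared L2 distance) written with the standard library only. It is not the service's implementation: the function names are hypothetical, the input paths are synthetic, and in practice a library such as scikit-learn would typically be used instead.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length paths."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(paths, k=5, iterations=50, seed=0):
    """Cluster paths with Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(paths, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each path goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_l2(p, centroids[c]))
            for p in paths
        ]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(paths, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic random-walk paths (illustrative only, not real simulation output).
random.seed(1)
paths = []
for _ in range(200):
    level = 20.0
    path = []
    for _ in range(15):
        level += random.gauss(0, 1)
        path.append(level)
    paths.append(path)

labels, centroids = kmeans(paths, k=5)
print([labels.count(c) for c in range(5)])  # cluster sizes
```

Each centroid is itself a path of the same length, so it can be read as the "typical" trajectory of its scenario.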

Hierarchical Clustering of scenarios

By default, we choose 5 clusters using Hierarchical Clustering (KL Divergence Affinity) to extract different scenarios. Clusters 0 and 1 present relatively low volatility, and clusters 2 and 3 show medium volatility. Cluster 4 is the most volatile of all the clusters. Therefore, the Hierarchical Clustering method groups observations based on volatility.
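
The page does not say how the KL divergence affinity is computed, so the sketch below makes its own assumptions: each path is summarized by a smoothed histogram of its step-to-step changes, the affinity is the symmetric KL divergence between two such histograms, and clusters are merged with average linkage. All function names and bin edges are hypothetical. It is meant to show why such an affinity tends to group paths by volatility.

```python
import math

INF = float("inf")
# Hypothetical bin edges for step-to-step changes.
EDGES = [-INF, -2.0, -0.5, 0.0, 0.5, 2.0, INF]


def change_histogram(path, edges=EDGES, smoothing=1e-3):
    """Normalized histogram of step-to-step changes; smoothing avoids log(0)."""
    counts = [smoothing] * (len(edges) - 1)
    for prev, cur in zip(path, path[1:]):
        change = cur - prev
        for i in range(len(edges) - 1):
            if edges[i] <= change < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p), which simplifies to sum((p - q) * log(p / q))."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hand-made paths: two calm ones and two volatile ones.
demo_paths = [
    [20.0, 20.1, 20.0, 20.1, 20.0],
    [20.0, 19.9, 20.0, 19.9, 20.0],
    [20.0, 23.0, 20.0, 23.0, 20.0],
    [20.0, 17.0, 20.0, 17.0, 20.0],
]
hists = [change_histogram(p) for p in demo_paths]
dist = [[symmetric_kl(h1, h2) for h2 in hists] for h1 in hists]
demo_labels = agglomerative(dist, n_clusters=2)
print(demo_labels)  # → [0, 0, 1, 1]
```

Because the histograms describe the size of the moves rather than the level of the series, the two calm paths end up together and the two volatile paths end up together.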

Clustering Comparison

It is neither practical nor meaningful to analyze each sequence of simulated prices individually. Instead, we should focus on the major patterns reflected by the simulated data. To capture these patterns from the simulation results, K-Means (L2 distance) and Hierarchical Clustering (KL Divergence Affinity) are applied, and a line area chart is drawn for each cluster.

Each cluster could be regarded as a potential scenario for Price. If viewed this way, further scenario analysis could be done for different clusters, which may lead to robust decision-making.
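
One way to prepare the data for a per-cluster line area chart is to compute, at every time step, the mean of the cluster together with its minimum and maximum. This is a sketch under the assumption that the chart shows a mean line with a shaded min/max band; the helper name `cluster_bands` and the tiny input are hypothetical.

```python
from statistics import mean


def cluster_bands(paths, labels):
    """Per-cluster mean, low, and high values at each time step."""
    bands = {}
    for label in sorted(set(labels)):
        members = [p for p, lab in zip(paths, labels) if lab == label]
        columns = list(zip(*members))  # one tuple of values per time step
        bands[label] = {
            "mean": [mean(col) for col in columns],
            "low": [min(col) for col in columns],
            "high": [max(col) for col in columns],
        }
    return bands


# Tiny illustrative input: three paths of length 2 in two clusters.
example_paths = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
example_labels = [0, 0, 1]
bands = cluster_bands(example_paths, example_labels)
print(bands[0])  # → {'mean': [2.0, 3.0], 'low': [1.0, 2.0], 'high': [3.0, 4.0]}
```

The "mean" series can be drawn as the line and the region between "low" and "high" as the shaded area, giving one compact scenario view per cluster.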