Last Updated: 06/04/2020
A generative adversarial network (GAN) is a class of machine learning frameworks introduced by Ian Goodfellow and his colleagues in 2014. Two neural networks, a generator and a discriminator, compete with each other in a zero-sum (min-max) game. Given a training set, this technique learns to generate new data with the same statistics as the training set.
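For reference, the min-max game from the original GAN formulation (Goodfellow et al., 2014) can be written as below. Here $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution and $p_z$ the noise prior. This is the standard textbook objective; the exact training objective used by this service is not documented here.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```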
For example, a GAN trained on time-series data from the Vix dataset should be able to generate realistic synthetic Vix time series.
We use a GAN for univariate time-series forecasting.
Example request:
POST /gan/simulation?model_name=demo_trail&size=1000&length=15&version=1.0&access_token=xxx
Host: gan.quantuniversity.com
Authorization: xxxxx
Content-Type: application/json
Accept-Charset: utf-8

Example response headers:
connection: keep-alive
content-length: 322241
content-type: application/json
date: Mon, 19 Jul 2021 18:28:05 GMT
server: nginx/1.14.0 (Ubuntu)
The request is a regular POST call, as shown above. In addition to access_token, it takes the following four query parameters:
| Field | Type | Description |
|---|---|---|
| model_name | string | Name of the model/dataset you have uploaded. The current service supports only the Vix model. |
| size | int | Number of simulated sequences to generate. |
| length | int | Forecast horizon: how many time steps into the future each simulation extends. |
| version | float | Version to use (for example, 1.0). |
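A minimal client sketch for the request above, using only the Python standard library. It assumes the base URL `https://gan.quantuniversity.com/gan/simulation` taken from the example request, that all parameters travel in the query string with an empty body, and that `size` and `length` must be positive. `build_simulation_url` and `request_simulations` are hypothetical helper names, and because the response schema is not documented here, the body is only decoded with `json.loads`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: base URL taken from the example request in this document.
BASE_URL = "https://gan.quantuniversity.com/gan/simulation"


def build_simulation_url(model_name: str, size: int, length: int,
                         version: float, access_token: str) -> str:
    """Validate parameters client-side (assumed rules) and return the request URL."""
    if not model_name:
        raise ValueError("model_name must be a non-empty string")
    if not isinstance(size, int) or size <= 0:
        raise ValueError("size must be a positive integer")
    if not isinstance(length, int) or length <= 0:
        raise ValueError("length must be a positive integer")
    query = urlencode({
        "model_name": model_name,
        "size": size,
        "length": length,
        "version": version,
        "access_token": access_token,
    })
    return f"{BASE_URL}?{query}"


def request_simulations(url: str, authorization: str, timeout: float = 60.0):
    """POST to the endpoint and decode the JSON response (schema undocumented)."""
    # Assumption: everything is in the query string, so the body is empty.
    req = Request(url, data=b"", method="POST", headers={
        "Authorization": authorization,
        "Content-Type": "application/json",
    })
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For instance, `build_simulation_url("demo_trail", 1000, 15, 1.0, "xxx")` reproduces the URL of the example request; validating locally avoids a round trip that would end in a 400 or 422.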
Each request must respect the parameter rules above; otherwise the API returns one of the following errors:
| Error code | Description |
|---|---|
| 400 Bad Request | Required fields are invalid or missing. |
| 401 Unauthorized | The access_token is invalid or has been revoked. |
| 422 Validation Error | A given parameter is invalid; please check its spelling. |
| 500 Internal Server Error | Something went wrong on the model side. |
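A hedged sketch of client-side handling for the status codes listed above. `SimulationAPIError`, `API_ERRORS` and `check_status` are hypothetical names, and the messages mirror the table rather than any documented error body.

```python
# Hypothetical mapping of the documented status codes to readable messages.
API_ERRORS = {
    400: "Bad Request: required fields are invalid or missing",
    401: "Unauthorized: the access_token is invalid or has been revoked",
    422: "Validation Error: a parameter is invalid, check its spelling",
    500: "Internal Server Error: something went wrong on the model side",
}


class SimulationAPIError(Exception):
    """Raised when the simulation endpoint returns a non-2xx status code."""

    def __init__(self, status: int, body: str = ""):
        self.status = status
        self.body = body
        message = API_ERRORS.get(status, f"Unexpected HTTP status {status}")
        super().__init__(f"{status} {message}")


def check_status(status: int, body: str = "") -> None:
    """Do nothing for 2xx responses; raise SimulationAPIError otherwise."""
    if not 200 <= status < 300:
        raise SimulationAPIError(status, body)
```

With `urllib`, a non-2xx response raises `urllib.error.HTTPError`, whose `code` attribute can be passed to `check_status` to get one of the messages above.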
This section analyzes the simulated data.
Let's look at some of the plots, which were generated using the Vix dataset.
The simulated sequences differ considerably from one another, which suggests that the simulations capture many different scenarios. The drawback is that, with such a large number of simulated sequences, it is hard to draw general insights.
We recommend plotting a histogram of the simulated values at the first time point. If these values are distributed around the latest realized value (price or index level), we can at least say that the simulated data are not unrealistic.
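The first-time-point check above can be sketched in plain Python. It assumes the simulations arrive as a list of equal-length numeric sequences and, as a hypothetical criterion for "distributed around", tests whether the latest realized value lies between the 5th and 95th percentiles of the first simulated values. The helper names are illustrative; for the actual plot, `matplotlib.pyplot.hist` on the same values would do.

```python
import statistics


def first_step_values(simulations):
    """Collect the first simulated value of every sequence."""
    return [seq[0] for seq in simulations]


def histogram(values, bins=10):
    """Equal-width histogram: returns (bin_edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def first_step_check(simulations, last_value, lower_pct=5, upper_pct=95):
    """Is the latest realized value inside the central percentile band of the
    first simulated time point? Requires at least two simulations."""
    first = first_step_values(simulations)
    cuts = statistics.quantiles(first, n=100)  # 99 cut points: 1st..99th percentile
    lower, upper = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return lower <= last_value <= upper
```

The percentile band is a design choice: a stricter or looser band changes how far the realized value may sit from the bulk of the simulations before the check fails.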
By default, we choose 5 clusters using K-Means clustering with L2 (Euclidean) distance to extract different scenarios.
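A minimal pure-Python sketch of K-Means with L2 distance on whole simulated sequences (Lloyd's algorithm). The service's own implementation is not documented here; `kmeans_l2` is a hypothetical name, and seeding the initial centroids with randomly chosen sequences is an assumption. For 1000 simulations, `sklearn.cluster.KMeans(n_clusters=5)` would be the usual library choice.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean (L2) distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_l2(sequences, k=5, n_iter=100, seed=0):
    """Lloyd's K-Means on whole sequences. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initialize centroids with k randomly chosen sequences.
    centroids = [list(s) for s in rng.sample(sequences, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sequence joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_l2(seq, centroids[c]))
            for seq in sequences
        ]
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
        # Update step: each centroid becomes the pointwise mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(sequences, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because L2 distance compares sequences point by point, this groups simulations by level and shape of the path rather than by volatility.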
All clusters behave differently, yet they fall within comparable value ranges. The simulations do show different scenarios, but it is still unclear how to interpret these results.
By default, we choose 5 clusters using Hierarchical Clustering (KL divergence affinity) to extract different scenarios. Clusters 0 and 1 show relatively low volatility, clusters 2 and 3 show medium volatility, and cluster 4 is the most volatile. The hierarchical clustering method therefore appears to group sequences by volatility.
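A sketch of hierarchical clustering with a KL-divergence affinity, under stated assumptions. The document does not say how a sequence becomes a probability distribution, so here each sequence is summarized (hypothetically) as a smoothed histogram of its one-step changes over shared bins; this ties the grouping to volatility, consistent with the observation above. Since KL divergence is asymmetric, the symmetric sum KL(p||q) + KL(q||p) is used, with average linkage. The function names are illustrative; at scale, `scipy.cluster.hierarchy.linkage(..., method="average")` on a condensed distance matrix is the library route.

```python
import math


def change_distribution(seq, lo, width, nbins, eps=1e-6):
    """Assumed featurization: smoothed, normalized histogram of one-step changes."""
    counts = [eps] * nbins  # eps keeps every bin > 0 so the KL terms stay finite
    for a, b in zip(seq, seq[1:]):
        idx = min(max(int((b - a - lo) / width), 0), nbins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p), which simplifies to sum((p - q) * log(p / q))."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def hierarchical_kl(sequences, k=5, nbins=10):
    """Average-linkage agglomerative clustering with symmetric-KL distances."""
    changes = [b - a for s in sequences for a, b in zip(s, s[1:])]
    lo, hi = min(changes), max(changes)
    width = (hi - lo) / nbins or 1.0  # shared bins make distributions comparable
    dists = [change_distribution(s, lo, width, nbins) for s in sequences]
    n = len(sequences)
    d = [[symmetric_kl(dists[i], dists[j]) for j in range(n)] for i in range(n)]

    clusters = [[i] for i in range(n)]  # start with every sequence on its own
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between two clusters.
                link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

This naive loop is cubic in the number of sequences, so it only illustrates the idea; it is not meant for the full set of simulations.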
Analyzing each simulated price sequence individually is impractical and not very informative. Instead, we should focus on the 'major' patterns reflected in the simulated data. To capture these patterns, we apply K-Means (L2 distance) and Hierarchical Clustering (KL divergence affinity), and draw a line-area chart for each cluster.
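The per-cluster line-area charts mentioned above need, for every time step, a central line and a shaded band. A minimal sketch, assuming the band is the 5th to 95th percentile and the line is the cluster mean (both hypothetical choices, since the document does not specify them); `cluster_bands` and `quantile` are illustrative names. The output could be drawn with matplotlib's `Axes.plot` for the mean and `Axes.fill_between` for the band.

```python
import math
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def cluster_bands(sequences, labels, lower_q=0.05, upper_q=0.95):
    """For each cluster, return a list of (lower, mean, upper) per time step.

    The mean is drawn as the line and [lower, upper] as the shaded area.
    """
    bands = {}
    for label in sorted(set(labels)):
        members = [s for s, lab in zip(sequences, labels) if lab == label]
        band = []
        for column in zip(*members):  # values of all members at one time step
            values = sorted(column)
            band.append((quantile(values, lower_q),
                         statistics.fmean(values),
                         quantile(values, upper_q)))
        bands[label] = band
    return bands
```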
Each cluster can be regarded as a potential price scenario. Viewed this way, further scenario analysis can be carried out for each cluster, which may support more robust decision-making.
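One possible starting point for such per-cluster scenario analysis is sketched below. It assumes, hypothetically, that each cluster's share of the simulations can serve as a naive scenario weight, and summarizes each scenario by its mean terminal value and its change from the latest realized value. Neither the weighting nor `scenario_summary` comes from the service.

```python
import statistics
from collections import Counter


def scenario_summary(sequences, labels, last_value):
    """Summarize each cluster as a scenario (hypothetical weighting scheme).

    weight: share of simulations in the cluster (a naive scenario probability).
    mean_end: average terminal value of the cluster's sequences.
    mean_change: mean_end minus the latest realized value.
    """
    counts = Counter(labels)
    n = len(labels)
    summary = {}
    for label in sorted(counts):
        ends = [s[-1] for s, lab in zip(sequences, labels) if lab == label]
        mean_end = statistics.fmean(ends)
        summary[label] = {
            "weight": counts[label] / n,
            "mean_end": mean_end,
            "mean_change": mean_end - last_value,
        }
    return summary
```

Equal weighting per simulation is the simplest choice; whether GAN samples are calibrated well enough for cluster shares to be read as probabilities is an open question the document does not address.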