Merge pull request #174 from zhenwendai/update_notebooks
Updated the VAE and GP regression notebooks.
zhenwendai authored May 29, 2019
2 parents bf19138 + d311f86 commit 94f09f9
Showing 3 changed files with 255 additions and 76 deletions.
195 changes: 178 additions & 17 deletions examples/notebooks/gp_regression.ipynb

Large diffs are not rendered by default.

133 changes: 74 additions & 59 deletions examples/notebooks/variational_auto_encoder.ipynb
@@ -6,7 +6,25 @@
"source": [
"# Variational Auto-Encoder (VAE)\n",
"\n",
"### Zhenwen Dai (2019-04-23)"
"### Zhenwen Dai (2019-05-29)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Variational auto-encoder (VAE) is a latent variable model that uses a latent variable to generate data represented in vector form. Consider a latent variable $x$ and an observed variable $y$. The plain VAE is defined as\n",
"\\begin{align}\n",
"p(x) =& \\mathcal{N}(0, I) \\\\\n",
"p(y|x) =& \\mathcal{N}(f(x), \\sigma^2I)\n",
"\\end{align}\n",
"where $f$ is the deep neural network (DNN), often referred to as the decoder network.\n",
"\n",
"The variational posterior of VAE is defined as \n",
"\\begin{align}\n",
"q(x) = \\mathcal{N}\\left(g_{\\mu}(y), \\sigma^2_x I)\\right)\n",
"\\end{align}\n",
"where $g_{\\mu}$ is the encoder networks that generate the mean of the variational posterior of $x$. For simplicity, we assume that all the data points share the same variance in the variational posteior. This can be extended by generating the variance also from the encoder network."
]
},
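To make the generative model concrete, here is a minimal sketch of sampling from $p(x)\,p(y|x)$ with MXNet; the two-layer `decoder`, the dimensions, and the noise variance `sigma2` are illustrative assumptions, not the notebook's actual settings.

```python
# Hedged sketch of the VAE generative process; the decoder, dimensions and
# noise variance are illustrative assumptions, not the notebook's values.
import mxnet as mx
from mxnet.gluon import nn

N, Q, D = 10, 2, 5           # number of points, latent dim, observed dim
sigma2 = 0.01                # assumed observation noise variance

decoder = nn.HybridSequential()
decoder.add(nn.Dense(50, activation='tanh'), nn.Dense(D))
decoder.initialize(mx.init.Xavier())

x = mx.nd.random.normal(shape=(N, Q))                                 # x ~ N(0, I)
y = decoder(x) + (sigma2 ** 0.5) * mx.nd.random.normal(shape=(N, D))  # y ~ N(f(x), sigma^2 I)
```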
{
@@ -59,7 +77,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Defintion"
"## Model Defintion\n",
"\n",
"We first define that the encoder and decoder DNN with MXNet Gluon blocks. Both DNNs have two hidden layers with tanh non-linearity."
]
},
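The definition cells are collapsed in this diff. For reference, a two-hidden-layer tanh encoder/decoder pair in Gluon might look like the sketch below; the hidden width `H` and the dimensions `Q` and `D` are assumptions, while the `encoder_`/`decoder_` prefixes match the variable names visible in the printed model further down.

```python
# Sketch of the collapsed encoder/decoder definitions; layer widths and
# dimensions are assumptions, not necessarily the notebook's exact values.
import mxnet as mx
from mxnet.gluon import nn

H, Q, D = 50, 2, 5  # assumed hidden width, latent dim, observed dim

encoder = nn.HybridSequential(prefix='encoder_')
with encoder.name_scope():
    encoder.add(nn.Dense(H, activation='tanh'))
    encoder.add(nn.Dense(H, activation='tanh'))
    encoder.add(nn.Dense(Q))

decoder = nn.HybridSequential(prefix='decoder_')
with decoder.name_scope():
    decoder.add(nn.Dense(H, activation='tanh'))
    decoder.add(nn.Dense(H, activation='tanh'))
    decoder.add(nn.Dense(D))

encoder.initialize(mx.init.Xavier())
decoder.initialize(mx.init.Xavier())
```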
{
@@ -102,16 +122,10 @@
]
},
{
"cell_type": "code",
"execution_count": 7,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.components.variables.var_trans import PositiveTransformation\n",
"from mxfusion import Variable, Model, Posterior\n",
"from mxfusion.components.functions import MXFusionGluonFunction\n",
"from mxfusion.components.distributions import Normal\n",
"from mxfusion.components.functions.operators import broadcast_to"
"Then, we define the model of VAE in MXFusion. Note that for simplicity in implementation, we use scalar normal distributions defined for individual entries of a Matrix instead of multivariate normal distributions with diagonal covariance matrices."
]
},
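Concretely, with $Y \in \mathbb{R}^{N \times D}$ and latent rows $x_n$, this means the likelihood factorizes over the matrix entries as

\begin{align}
p(Y|X) = \prod_{n=1}^{N} \prod_{d=1}^{D} \mathcal{N}\left(y_{nd} \,\middle|\, f(x_n)_d, \sigma^2\right).
\end{align}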
{
@@ -123,19 +137,25 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Model (95c68)\n",
"Variable (b8df3) = BroadcastToOperator(data=Variable noise_var (873b0))\n",
"Variable (36f25) = BroadcastToOperator(data=Variable (399cc))\n",
"Variable (6a234) = BroadcastToOperator(data=Variable (2fe44))\n",
"Variable x (1696c) ~ Normal(mean=Variable (6a234), variance=Variable (36f25))\n",
"Variable f (0d26d) = GluonFunctionEvaluation(decoder_input_0=Variable x (1696c), decoder_dense0_weight=Variable (89315), decoder_dense0_bias=Variable (41eac), decoder_dense1_weight=Variable (b69fe), decoder_dense1_bias=Variable (8e4e6), decoder_dense2_weight=Variable (a99ff), decoder_dense2_bias=Variable (f0361))\n",
"Variable y (5f5c3) ~ Normal(mean=Variable f (0d26d), variance=Variable (b8df3))\n"
"Model (37a04)\n",
"Variable (b92c2) = BroadcastToOperator(data=Variable noise_var (a50d4))\n",
"Variable (39c2c) = BroadcastToOperator(data=Variable (e1aad))\n",
"Variable (b7150) = BroadcastToOperator(data=Variable (a57d4))\n",
"Variable x (53056) ~ Normal(mean=Variable (b7150), variance=Variable (39c2c))\n",
"Variable f (ad606) = GluonFunctionEvaluation(decoder_input_0=Variable x (53056), decoder_dense0_weight=Variable (b9b70), decoder_dense0_bias=Variable (d95aa), decoder_dense1_weight=Variable (73dc2), decoder_dense1_bias=Variable (b85dd), decoder_dense2_weight=Variable (7a61c), decoder_dense2_bias=Variable (eba91))\n",
"Variable y (23bca) ~ Normal(mean=Variable f (ad606), variance=Variable (b92c2))\n"
]
}
],
"source": [
"m = mf.models.Model()\n",
"m.N = mf.components.Variable()\n",
"from mxfusion.components.variables.var_trans import PositiveTransformation\n",
"from mxfusion import Variable, Model, Posterior\n",
"from mxfusion.components.functions import MXFusionGluonFunction\n",
"from mxfusion.components.distributions import Normal\n",
"from mxfusion.components.functions.operators import broadcast_to\n",
"\n",
"m = Model()\n",
"m.N = Variable()\n",
"m.decoder = MXFusionGluonFunction(decoder, num_outputs=1,broadcastable=True)\n",
"m.x = Normal.define_variable(mean=broadcast_to(mx.nd.array([0]), (m.N, Q)),\n",
" variance=broadcast_to(mx.nd.array([1]), (m.N, Q)), shape=(m.N, Q))\n",
@@ -146,6 +166,13 @@
"print(m)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also define the variational posterior following the equation above."
]
},
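The source of the posterior cell is collapsed in this diff; the sketch below is our reconstruction from the printed posterior underneath, so the exact variable names and the initial value of `x_var` are assumptions.

```python
# Reconstruction of the collapsed posterior cell from the printout below;
# names and the x_var initial value are assumptions.
q = Posterior(m)
q.x_var = Variable(shape=(1,), transformation=PositiveTransformation(),
                   initial_value=mx.nd.array([0.01]))
q.encoder = MXFusionGluonFunction(encoder, num_outputs=1, broadcastable=True)
q.x_mean = q.encoder(q.y)
q.x.set_prior(Normal(mean=q.x_mean, variance=broadcast_to(q.x_var, q.x.shape)))
print(q)
```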
{
"cell_type": "code",
"execution_count": 9,
@@ -155,10 +182,10 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Posterior (63197)\n",
"Variable x_mean (09eba) = GluonFunctionEvaluation(encoder_input_0=Variable y (5f5c3), encoder_dense0_weight=Variable (81ec2), encoder_dense0_bias=Variable (aa736), encoder_dense1_weight=Variable (3c4ae), encoder_dense1_bias=Variable (1bab5), encoder_dense2_weight=Variable (7b531), encoder_dense2_bias=Variable (84731))\n",
"Variable (f88b7) = BroadcastToOperator(data=Variable x_var (fc12e))\n",
"Variable x (1696c) ~ Normal(mean=Variable x_mean (09eba), variance=Variable (f88b7))\n"
"Posterior (4ec05)\n",
"Variable x_mean (86d22) = GluonFunctionEvaluation(encoder_input_0=Variable y (23bca), encoder_dense0_weight=Variable (51b3d), encoder_dense0_bias=Variable (c0092), encoder_dense1_weight=Variable (ad9ef), encoder_dense1_bias=Variable (83db0), encoder_dense2_weight=Variable (78b82), encoder_dense2_bias=Variable (b856d))\n",
"Variable (6dc84) = BroadcastToOperator(data=Variable x_var (19d07))\n",
"Variable x (53056) ~ Normal(mean=Variable x_mean (86d22), variance=Variable (6dc84))\n"
]
}
],
@@ -175,16 +202,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Variational Inference"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import BatchInferenceLoop, StochasticVariationalInference, GradBasedInference"
"## Variational Inference\n",
"\n",
"Variational inference is done via creating an inference object and passing in the stochastic variational inference algorithm."
]
},
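For reference, the SVI algorithm used below maximizes a Monte Carlo estimate of the variational lower bound (ELBO),

\begin{align}
\mathcal{L} = \mathbb{E}_{q(x)}\left[\log p(y|x)\right] - \mathrm{KL}\left(q(x)\,\|\,p(x)\right),
\end{align}

where the expectation is estimated with samples from $q(x)$ (three per gradient step in the cell below, via `num_samples=3`).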
{
@@ -193,18 +213,18 @@
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import BatchInferenceLoop, StochasticVariationalInference, GradBasedInference\n",
"\n",
"observed = [m.y]\n",
"alg = StochasticVariationalInference(num_samples=3, model=m, posterior=q, observed=observed)\n",
"infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())"
]
},
{
"cell_type": "code",
"execution_count": 12,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"infr.initialize(y=mx.nd.array(Y))"
"SVI is a gradient-based algorithm. We can run the algorithm by providing the data and specifying the parameters for the gradient optimizer (the default gradient optimizer is Adam)."
]
},
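The training cell itself is collapsed in this diff. A call consistent with the 2000-iteration output below might look like the following sketch; the learning rate is an assumption.

```python
# Sketch of the collapsed training call; max_iter matches the printed output,
# while the learning rate is an assumption.
infr.run(max_iter=2000, learning_rate=1e-2, y=mx.nd.array(Y), verbose=True)
```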
{
@@ -218,16 +238,16 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 201 loss: 1715.0395507812555\n",
"Iteration 401 loss: 599.86877441406255\n",
"Iteration 601 loss: 177.60995483398438\n",
"Iteration 801 loss: -75.347778320312555\n",
"Iteration 1001 loss: -213.82623291015625\n",
"Iteration 1201 loss: -332.34564208984375\n",
"Iteration 1401 loss: -305.57965087890625\n",
"Iteration 1601 loss: -577.47900390625585\n",
"Iteration 1801 loss: -669.97760009765625\n",
"Iteration 2000 loss: -753.83203125234385"
"Iteration 200 loss: 1720.556396484375\t\t\t\t\t\n",
"Iteration 400 loss: 601.11962890625\t\t\t\t\t\t\t\n",
"Iteration 600 loss: 168.620849609375\t\t\t\t\t\t\n",
"Iteration 800 loss: -48.67474365234375\t\t\t\t\t\n",
"Iteration 1000 loss: -207.34835815429688\t\t\t\t\n",
"Iteration 1200 loss: -354.17742919921875\t\t\t\t\n",
"Iteration 1400 loss: -356.26409912109375\t\t\t\t\n",
"Iteration 1600 loss: -561.263427734375\t\t\t\t\t\t\n",
"Iteration 1800 loss: -697.8665161132812\t\t\t\t\t\n",
"Iteration 2000 loss: -753.83203125\t\t\t\t8\t\t\t\t\t\n"
]
}
],
@@ -239,30 +259,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot the training data in the latent space"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import TransferInference"
"## Plot the training data in the latent space\n",
"\n",
"Finally, we may be interested in visualizing the latent space of our dataset. We can do that by calling encoder network."
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import TransferInference\n",
"\n",
"q_x_mean = q.encoder.gluon_block(mx.nd.array(Y)).asnumpy()"
]
},
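The plotting cell is collapsed in this diff; a minimal scatter plot of the two-dimensional latent means could look like this sketch (plain matplotlib, with no class colouring assumed).

```python
# Minimal sketch for visualizing the latent means; purely illustrative.
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
plt.scatter(q_x_mean[:, 0], q_x_mean[:, 1], s=5)
plt.xlabel('latent dimension 1')
plt.ylabel('latent dimension 2')
plt.title('Training data in the latent space')
plt.show()
```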
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 18,
"metadata": {},
"outputs": [
{
3 changes: 3 additions & 0 deletions mxfusion/modules/gp_modules/svgp_regression.py
@@ -224,6 +224,9 @@ def compute(self, F, variables):
kern = self.model.kernel
kern_params = kern.fetch_parameters(variables)

X, Z, noise_var, mu, S_W, S_diag, kern_params = arrays_as_samples(
F, [X, Z, noise_var, mu, S_W, S_diag, kern_params])

S = F.linalg.syrk(S_W) + make_diagonal(F, S_diag)

Kuu = kern.K(F, Z, **kern_params)
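The fix above broadcasts every input to a common leading sample dimension before the downstream linear algebra; that is our reading of `arrays_as_samples`. A toy illustration of the idea, not MXFusion's actual implementation:

```python
# Toy illustration of sample-dimension broadcasting; an assumption about what
# arrays_as_samples achieves, not MXFusion's actual implementation.
import mxnet as mx

def as_samples(arrays):
    """Broadcast each (s, ...) array so that all share the same sample count."""
    num_samples = max(a.shape[0] for a in arrays)
    return [a if a.shape[0] == num_samples
            else mx.nd.broadcast_axis(a, axis=0, size=num_samples)
            for a in arrays]

a = mx.nd.zeros((1, 3, 3))   # a deterministic value carried as one "sample"
b = mx.nd.zeros((10, 3, 3))  # ten posterior samples
a, b = as_samples([a, b])    # both now have shape (10, 3, 3)
```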
