Merge pull request #174 from zhenwendai/update_notebooks
Updated the VAE and GP regression notebooks.
zhenwendai authored May 29, 2019
2 parents bf19138 + d311f86 commit 94f09f9
Showing 3 changed files with 255 additions and 76 deletions.
195 changes: 178 additions & 17 deletions examples/notebooks/gp_regression.ipynb

Large diffs are not rendered by default.

133 changes: 74 additions & 59 deletions examples/notebooks/variational_auto_encoder.ipynb
@@ -6,7 +6,25 @@
"source": [
"# Variational Auto-Encoder (VAE)\n",
"\n",
"### Zhenwen Dai (2019-04-23)"
"### Zhenwen Dai (2019-05-29)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Variational auto-encoder (VAE) is a latent variable model that uses a latent variable to generate data represented in vector form. Consider a latent variable $x$ and an observed variable $y$. The plain VAE is defined as\n",
"\\begin{align}\n",
"p(x) =& \\mathcal{N}(0, I) \\\\\n",
"p(y|x) =& \\mathcal{N}(f(x), \\sigma^2I)\n",
"\\end{align}\n",
"where $f$ is the deep neural network (DNN), often referred to as the decoder network.\n",
"\n",
"The variational posterior of VAE is defined as \n",
"\\begin{align}\n",
"q(x) = \\mathcal{N}\\left(g_{\\mu}(y), \\sigma^2_x I)\\right)\n",
"\\end{align}\n",
"where $g_{\\mu}$ is the encoder networks that generate the mean of the variational posterior of $x$. For simplicity, we assume that all the data points share the same variance in the variational posteior. This can be extended by generating the variance also from the encoder network."
]
},
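To make the generative model concrete, here is a minimal sketch of sampling from $p(x)\,p(y|x)$ with MXNet; the two-layer `decoder`, the dimensions, and the noise variance `sigma2` are illustrative assumptions, not the notebook's actual settings.

```python
# Hedged sketch of the VAE generative process; the decoder, dimensions and
# noise variance are illustrative assumptions, not the notebook's values.
import mxnet as mx
from mxnet.gluon import nn

N, Q, D = 10, 2, 5           # number of points, latent dim, observed dim
sigma2 = 0.01                # assumed observation noise variance

decoder = nn.HybridSequential()
decoder.add(nn.Dense(50, activation='tanh'), nn.Dense(D))
decoder.initialize(mx.init.Xavier())

x = mx.nd.random.normal(shape=(N, Q))                                 # x ~ N(0, I)
y = decoder(x) + (sigma2 ** 0.5) * mx.nd.random.normal(shape=(N, D))  # y ~ N(f(x), sigma^2 I)
```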
{
@@ -59,7 +77,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Defintion"
"## Model Defintion\n",
"\n",
"We first define that the encoder and decoder DNN with MXNet Gluon blocks. Both DNNs have two hidden layers with tanh non-linearity."
]
},
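The definition cells are collapsed in this diff. For reference, a two-hidden-layer tanh encoder/decoder pair in Gluon might look like the sketch below; the hidden width `H` and the dimensions `Q` and `D` are assumptions, while the `encoder_`/`decoder_` prefixes match the variable names visible in the printed model further down.

```python
# Sketch of the collapsed encoder/decoder definitions; layer widths and
# dimensions are assumptions, not necessarily the notebook's exact values.
import mxnet as mx
from mxnet.gluon import nn

H, Q, D = 50, 2, 5  # assumed hidden width, latent dim, observed dim

encoder = nn.HybridSequential(prefix='encoder_')
with encoder.name_scope():
    encoder.add(nn.Dense(H, activation='tanh'))
    encoder.add(nn.Dense(H, activation='tanh'))
    encoder.add(nn.Dense(Q))

decoder = nn.HybridSequential(prefix='decoder_')
with decoder.name_scope():
    decoder.add(nn.Dense(H, activation='tanh'))
    decoder.add(nn.Dense(H, activation='tanh'))
    decoder.add(nn.Dense(D))

encoder.initialize(mx.init.Xavier())
decoder.initialize(mx.init.Xavier())
```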
{
@@ -102,16 +122,10 @@
]
},
{
"cell_type": "code",
"execution_count": 7,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.components.variables.var_trans import PositiveTransformation\n",
"from mxfusion import Variable, Model, Posterior\n",
"from mxfusion.components.functions import MXFusionGluonFunction\n",
"from mxfusion.components.distributions import Normal\n",
"from mxfusion.components.functions.operators import broadcast_to"
"Then, we define the model of VAE in MXFusion. Note that for simplicity in implementation, we use scalar normal distributions defined for individual entries of a Matrix instead of multivariate normal distributions with diagonal covariance matrices."
]
},
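Concretely, with $Y \in \mathbb{R}^{N \times D}$ and latent rows $x_n$, this means the likelihood factorizes over the matrix entries as

\begin{align}
p(Y|X) = \prod_{n=1}^{N} \prod_{d=1}^{D} \mathcal{N}\left(y_{nd} \,\middle|\, f(x_n)_d, \sigma^2\right).
\end{align}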
{
@@ -123,19 +137,25 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Model (95c68)\n",
"Variable (b8df3) = BroadcastToOperator(data=Variable noise_var (873b0))\n",
"Variable (36f25) = BroadcastToOperator(data=Variable (399cc))\n",
"Variable (6a234) = BroadcastToOperator(data=Variable (2fe44))\n",
"Variable x (1696c) ~ Normal(mean=Variable (6a234), variance=Variable (36f25))\n",
"Variable f (0d26d) = GluonFunctionEvaluation(decoder_input_0=Variable x (1696c), decoder_dense0_weight=Variable (89315), decoder_dense0_bias=Variable (41eac), decoder_dense1_weight=Variable (b69fe), decoder_dense1_bias=Variable (8e4e6), decoder_dense2_weight=Variable (a99ff), decoder_dense2_bias=Variable (f0361))\n",
"Variable y (5f5c3) ~ Normal(mean=Variable f (0d26d), variance=Variable (b8df3))\n"
"Model (37a04)\n",
"Variable (b92c2) = BroadcastToOperator(data=Variable noise_var (a50d4))\n",
"Variable (39c2c) = BroadcastToOperator(data=Variable (e1aad))\n",
"Variable (b7150) = BroadcastToOperator(data=Variable (a57d4))\n",
"Variable x (53056) ~ Normal(mean=Variable (b7150), variance=Variable (39c2c))\n",
"Variable f (ad606) = GluonFunctionEvaluation(decoder_input_0=Variable x (53056), decoder_dense0_weight=Variable (b9b70), decoder_dense0_bias=Variable (d95aa), decoder_dense1_weight=Variable (73dc2), decoder_dense1_bias=Variable (b85dd), decoder_dense2_weight=Variable (7a61c), decoder_dense2_bias=Variable (eba91))\n",
"Variable y (23bca) ~ Normal(mean=Variable f (ad606), variance=Variable (b92c2))\n"
]
}
],
"source": [
"m = mf.models.Model()\n",
"m.N = mf.components.Variable()\n",
"from mxfusion.components.variables.var_trans import PositiveTransformation\n",
"from mxfusion import Variable, Model, Posterior\n",
"from mxfusion.components.functions import MXFusionGluonFunction\n",
"from mxfusion.components.distributions import Normal\n",
"from mxfusion.components.functions.operators import broadcast_to\n",
"\n",
"m = Model()\n",
"m.N = Variable()\n",
"m.decoder = MXFusionGluonFunction(decoder, num_outputs=1,broadcastable=True)\n",
"m.x = Normal.define_variable(mean=broadcast_to(mx.nd.array([0]), (m.N, Q)),\n",
" variance=broadcast_to(mx.nd.array([1]), (m.N, Q)), shape=(m.N, Q))\n",
@@ -146,6 +166,13 @@
"print(m)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also define the variational posterior following the equation above."
]
},
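The source of the posterior cell is collapsed in this diff; the sketch below is our reconstruction from the printed posterior underneath, so the exact variable names and the initial value of `x_var` are assumptions.

```python
# Reconstruction of the collapsed posterior cell from the printout below;
# names and the x_var initial value are assumptions.
q = Posterior(m)
q.x_var = Variable(shape=(1,), transformation=PositiveTransformation(),
                   initial_value=mx.nd.array([0.01]))
q.encoder = MXFusionGluonFunction(encoder, num_outputs=1, broadcastable=True)
q.x_mean = q.encoder(q.y)
q.x.set_prior(Normal(mean=q.x_mean, variance=broadcast_to(q.x_var, q.x.shape)))
print(q)
```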
{
"cell_type": "code",
"execution_count": 9,
@@ -155,10 +182,10 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Posterior (63197)\n",
"Variable x_mean (09eba) = GluonFunctionEvaluation(encoder_input_0=Variable y (5f5c3), encoder_dense0_weight=Variable (81ec2), encoder_dense0_bias=Variable (aa736), encoder_dense1_weight=Variable (3c4ae), encoder_dense1_bias=Variable (1bab5), encoder_dense2_weight=Variable (7b531), encoder_dense2_bias=Variable (84731))\n",
"Variable (f88b7) = BroadcastToOperator(data=Variable x_var (fc12e))\n",
"Variable x (1696c) ~ Normal(mean=Variable x_mean (09eba), variance=Variable (f88b7))\n"
"Posterior (4ec05)\n",
"Variable x_mean (86d22) = GluonFunctionEvaluation(encoder_input_0=Variable y (23bca), encoder_dense0_weight=Variable (51b3d), encoder_dense0_bias=Variable (c0092), encoder_dense1_weight=Variable (ad9ef), encoder_dense1_bias=Variable (83db0), encoder_dense2_weight=Variable (78b82), encoder_dense2_bias=Variable (b856d))\n",
"Variable (6dc84) = BroadcastToOperator(data=Variable x_var (19d07))\n",
"Variable x (53056) ~ Normal(mean=Variable x_mean (86d22), variance=Variable (6dc84))\n"
]
}
],
@@ -175,16 +202,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Variational Inference"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import BatchInferenceLoop, StochasticVariationalInference, GradBasedInference"
"## Variational Inference\n",
"\n",
"Variational inference is done via creating an inference object and passing in the stochastic variational inference algorithm."
]
},
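For reference, the SVI algorithm used below maximizes a Monte Carlo estimate of the variational lower bound (ELBO),

\begin{align}
\mathcal{L} = \mathbb{E}_{q(x)}\left[\log p(y|x)\right] - \mathrm{KL}\left(q(x)\,\|\,p(x)\right),
\end{align}

where the expectation is estimated with samples from $q(x)$ (three per gradient step in the cell below, via `num_samples=3`).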
{
@@ -193,18 +213,18 @@
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import BatchInferenceLoop, StochasticVariationalInference, GradBasedInference\n",
"\n",
"observed = [m.y]\n",
"alg = StochasticVariationalInference(num_samples=3, model=m, posterior=q, observed=observed)\n",
"infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())"
]
},
{
"cell_type": "code",
"execution_count": 12,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"infr.initialize(y=mx.nd.array(Y))"
"SVI is a gradient-based algorithm. We can run the algorithm by providing the data and specifying the parameters for the gradient optimizer (the default gradient optimizer is Adam)."
]
},
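The training cell itself is collapsed in this diff. A call consistent with the 2000-iteration output below might look like the following sketch; the learning rate is an assumption.

```python
# Sketch of the collapsed training call; max_iter matches the printed output,
# while the learning rate is an assumption.
infr.run(max_iter=2000, learning_rate=1e-2, y=mx.nd.array(Y), verbose=True)
```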
{
@@ -218,16 +238,16 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 201 loss: 1715.0395507812555\n",
"Iteration 401 loss: 599.86877441406255\n",
"Iteration 601 loss: 177.60995483398438\n",
"Iteration 801 loss: -75.347778320312555\n",
"Iteration 1001 loss: -213.82623291015625\n",
"Iteration 1201 loss: -332.34564208984375\n",
"Iteration 1401 loss: -305.57965087890625\n",
"Iteration 1601 loss: -577.47900390625585\n",
"Iteration 1801 loss: -669.97760009765625\n",
"Iteration 2000 loss: -753.83203125234385"
"Iteration 200 loss: 1720.556396484375\t\t\t\t\t\n",
"Iteration 400 loss: 601.11962890625\t\t\t\t\t\t\t\n",
"Iteration 600 loss: 168.620849609375\t\t\t\t\t\t\n",
"Iteration 800 loss: -48.67474365234375\t\t\t\t\t\n",
"Iteration 1000 loss: -207.34835815429688\t\t\t\t\n",
"Iteration 1200 loss: -354.17742919921875\t\t\t\t\n",
"Iteration 1400 loss: -356.26409912109375\t\t\t\t\n",
"Iteration 1600 loss: -561.263427734375\t\t\t\t\t\t\n",
"Iteration 1800 loss: -697.8665161132812\t\t\t\t\t\n",
"Iteration 2000 loss: -753.83203125\t\t\t\t8\t\t\t\t\t\n"
]
}
],
@@ -239,30 +259,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot the training data in the latent space"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import TransferInference"
"## Plot the training data in the latent space\n",
"\n",
"Finally, we may be interested in visualizing the latent space of our dataset. We can do that by calling encoder network."
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from mxfusion.inference import TransferInference\n",
"\n",
"q_x_mean = q.encoder.gluon_block(mx.nd.array(Y)).asnumpy()"
]
},
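The plotting cell is collapsed in this diff; a minimal scatter plot of the two-dimensional latent means could look like this sketch (plain matplotlib, with no class colouring assumed).

```python
# Minimal sketch for visualizing the latent means; purely illustrative.
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
plt.scatter(q_x_mean[:, 0], q_x_mean[:, 1], s=5)
plt.xlabel('latent dimension 1')
plt.ylabel('latent dimension 2')
plt.title('Training data in the latent space')
plt.show()
```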
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 18,
"metadata": {},
"outputs": [
{
3 changes: 3 additions & 0 deletions mxfusion/modules/gp_modules/svgp_regression.py
@@ -224,6 +224,9 @@ def compute(self, F, variables):
kern = self.model.kernel
kern_params = kern.fetch_parameters(variables)

X, Z, noise_var, mu, S_W, S_diag, kern_params = arrays_as_samples(
F, [X, Z, noise_var, mu, S_W, S_diag, kern_params])

S = F.linalg.syrk(S_W) + make_diagonal(F, S_diag)

Kuu = kern.K(F, Z, **kern_params)
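The fix above broadcasts every input to a common leading sample dimension before the downstream linear algebra; that is our reading of `arrays_as_samples`. A toy illustration of the idea, not MXFusion's actual implementation:

```python
# Toy illustration of sample-dimension broadcasting; an assumption about what
# arrays_as_samples achieves, not MXFusion's actual implementation.
import mxnet as mx

def as_samples(arrays):
    """Broadcast each (s, ...) array so that all share the same sample count."""
    num_samples = max(a.shape[0] for a in arrays)
    return [a if a.shape[0] == num_samples
            else mx.nd.broadcast_axis(a, axis=0, size=num_samples)
            for a in arrays]

a = mx.nd.zeros((1, 3, 3))   # a deterministic value carried as one "sample"
b = mx.nd.zeros((10, 3, 3))  # ten posterior samples
a, b = as_samples([a, b])    # both now have shape (10, 3, 3)
```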
