Can JAX handle the derivatives of expectation in statistics? If Yes, how does it work? #4800
-
For example, I have a function F(x), where x follows some distribution p(x, theta) with parameters theta (e.g. the mean or covariance of a normal distribution). I compute the expectation of F(x) under p(x, theta) by drawing samples with a random function and taking their mean, which yields an expression whose only unknown parameters are theta. Can I just use grad() to obtain the derivative with respect to theta? Does that make sense?
-
In general, no: you will need REINFORCE, a.k.a. the score-function estimator; see e.g. http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/. For location-scale distributions (which include the normal distribution) you can use a pretty straightforward reparameterization: sample from a standard normal, then scale and shift by the desired standard deviation and mean. See http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/.
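To make the two options concrete, here is a minimal sketch of both estimators for a normal distribution. The function F, the choice of F(x) = x^2, and the parameter values are illustrative assumptions, not part of the original thread; the analytic gradient of E[x^2] = mean^2 + std^2 gives a reference to sanity-check against.

```python
import jax
import jax.numpy as jnp

def F(x):
    return x ** 2  # example integrand (an assumption for illustration)

def log_prob(params, x):
    """Log-density of N(mean, std^2)."""
    mean, std = params
    return -0.5 * ((x - mean) / std) ** 2 - jnp.log(std) - 0.5 * jnp.log(2.0 * jnp.pi)

# Reparameterization trick: write x = mean + std * eps with eps ~ N(0, 1),
# so the Monte Carlo estimate of E[F(x)] is an ordinary differentiable
# function of the parameters, and jax.grad works directly.
def expected_F_reparam(params, key, n_samples=10_000):
    mean, std = params
    eps = jax.random.normal(key, (n_samples,))
    return jnp.mean(F(mean + std * eps))

# Score-function (REINFORCE) estimator: grad E[F(x)] = E[F(x) * grad log p(x; theta)].
# The samples are treated as fixed data; only log_prob is differentiated.
def reinforce_grad(params, key, n_samples=200_000):
    mean, std = params
    x = mean + std * jax.random.normal(key, (n_samples,))
    scores = jax.vmap(jax.grad(log_prob), in_axes=(None, 0))(params, x)  # shape (n, 2)
    return jnp.mean(F(x)[:, None] * scores, axis=0)

params = jnp.array([0.5, 1.0])  # (mean, std)
key = jax.random.PRNGKey(0)
g_reparam = jax.grad(expected_F_reparam)(params, key)
g_reinforce = reinforce_grad(params, key)
# Both approximate the analytic gradient (2*mean, 2*std) = (1.0, 2.0),
# up to Monte Carlo noise; REINFORCE is typically much noisier.
```

Note the trade-off: the reparameterized estimator usually has far lower variance, but it only applies when sampling can be written as a differentiable transform of parameter-free noise; REINFORCE needs only the log-density and so works for discrete or otherwise non-reparameterizable distributions.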