
Make subsample faster in CPU #865

Merged
merged 6 commits into from
Jan 17, 2021
Conversation

fehiepsi
Member

@fehiepsi fehiepsi commented Jan 6, 2021

This PR is motivated by a forum topic from @insperatum and Jake's answer there on the performance differences between the PyTorch (CPU) and JAX implementations.

Benchmarks

import numpyro
from jax import random, jit

size = 100000
subsample_size = 1000

@jit
def subsample_fn(rng_key):
    return numpyro.primitives._subsample_fn(size, subsample_size, rng_key)

key0, key1 = random.PRNGKey(0), random.PRNGKey(1)
x = subsample_fn(key0).copy()
%time x = subsample_fn(key1).copy()

returns

CPU times: user 262 µs, sys: 52 µs, total: 314 µs
Wall time: 253 µs

while in PyTorch, %time y = torch.randperm(size)[:subsample_size] took

CPU times: user 26.5 ms, sys: 1.37 ms, total: 27.9 ms
Wall time: 2.51 ms

and the previous implementation took

CPU times: user 60.2 ms, sys: 269 µs, total: 60.4 ms
Wall time: 54.1 ms

The reason for the high performance compared to PyTorch is that PyTorch takes a permutation of the full size first and then collects a subset. Here, we only take a partial permutation of size subsample_size. This was observed by @fritzo at this discussion.
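The idea behind the speedup can be sketched with a partial Fisher-Yates shuffle in plain NumPy. This is an illustration of the technique only, not the actual numpyro/JAX implementation, and `partial_shuffle_subsample` is a hypothetical name:

```python
import numpy as np

def partial_shuffle_subsample(size, subsample_size, rng):
    # Perform only the first `subsample_size` swap steps of a full
    # Fisher-Yates shuffle, so the cost is O(subsample_size) rather
    # than the O(size) cost of permuting everything and slicing.
    idx = np.arange(size)
    for i in range(subsample_size):
        j = rng.integers(i, size)  # random index in [i, size)
        idx[i], idx[j] = idx[j], idx[i]
    return idx[:subsample_size]

rng = np.random.default_rng(0)
sample = partial_shuffle_subsample(100_000, 1_000, rng)
```

After `subsample_size` swaps, the prefix `idx[:subsample_size]` is distributed exactly like the first `subsample_size` entries of a full random permutation, which is why slicing it is equivalent to `randperm(size)[:subsample_size]`.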

fritzo
fritzo previously approved these changes Jan 7, 2021
Member

@fritzo fritzo left a comment


Nice work! We'll have to port this to Pyro if anyone complains about slow subsampling speed there.

@fritzo
Member

fritzo commented Jan 7, 2021

Do you need to ensure double precision so that random.uniform can sample from sets larger than 2**24 ~ 16 million?
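The concern can be demonstrated directly: in float32, consecutive integers above 2**24 are no longer distinct, so a float32 uniform sample mapped onto a larger index set cannot reach every index. An illustrative NumPy check (not code from the PR):

```python
import numpy as np

n = 2 ** 24  # ~16 million, the last power of two with exact float32 integers

# Above 2**24, float32 has fewer mantissa bits than needed, and
# consecutive integers round to the same value.
assert np.float32(n) == np.float32(n + 1)   # consecutive ints collide
assert np.float32(n - 1) != np.float32(n)   # below 2**24 they stay distinct
```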

@fehiepsi
Copy link
Member Author

fehiepsi commented Jan 7, 2021

Thanks for reviewing and pointing that issue out, @fritzo! I have switched to randint instead of using uniform. It is a bit slower (451 µs ± 2.06 µs per loop, mean ± std. dev. of 7 runs, 1000 loops each) but is still pretty fast and does not suffer from the precision issue. In JAX, precision is decided only once, at the beginning of a program, so we can't switch between the two modes inside the implementation.
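The trade-off can be sketched in NumPy (illustrative comparison only, not the PR's code): drawing an index by scaling a float32 uniform sample loses exactness once the range exceeds 2**24, while an integer draw stays exact over the full range:

```python
import numpy as np

n = 100_000_000  # larger than 2**24, where float32 loses integer precision
rng = np.random.default_rng(0)

# uniform-based: a float32 sample has limited mantissa resolution, so
# scaling it to [0, n) cannot produce every index when n > 2**24
u = rng.random(dtype=np.float32)
idx_uniform = int(u * n)

# randint-based: exact over the full range, at a small speed cost
idx_randint = int(rng.integers(0, n))
```

Both draws land in [0, n), but only the integer-based draw can, in principle, return every index with the correct probability.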

@fehiepsi fehiepsi modified the milestones: 0.5.1, 0.5 Jan 16, 2021
Member

@fritzo fritzo left a comment


Sorry I lost track of this!

@fritzo fritzo merged commit 4f0f499 into pyro-ppl:master Jan 17, 2021