Implement pyro.ops.streaming module #2856

fritzo · 2021-06-02T18:31:09Z

Addresses #2843

This implements a new module pyro.ops.streaming to track various statistics in a streaming fashion. The first intended use case is the planned StreamingMCMC class, which will track statistics rather than store samples. There are other potential uses in high-dimensional inference, e.g. recording statistics of gradients during SVI and computing sample moments from predictive when the samples don't fit in memory.

Design choices

The two basic operations are .update() and .get(). The third operation .merge() will be useful for multiple-chain MCMC and computing things like rhat.
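To make the update/get/merge contract concrete, here is a minimal sketch on plain Python floats. The class name RunningMoments and its fields are hypothetical and are not the classes added in this PR; it assumes merge() returns a new object and uses Welford's one-pass update with the pairwise combination rule of Chan et al.

```python
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Hypothetical stand-in for a streaming statistic; not Pyro's class."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Welford's one-pass update: no samples are stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningMoments") -> "RunningMoments":
        # Pairwise combination of two partial summaries (Chan et al.).
        total = self.count + other.count
        if total == 0:
            return RunningMoments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return RunningMoments(total, mean, m2)

    def get(self) -> dict:
        # Unbiased sample variance; undefined for fewer than two samples.
        variance = self.m2 / (self.count - 1) if self.count > 1 else float("nan")
        return {"count": self.count, "mean": self.mean, "variance": variance}


# Two "chains" summarized separately, then merged.
chain_a, chain_b = RunningMoments(), RunningMoments()
for x in [1.0, 2.0, 3.0]:
    chain_a.update(x)
for x in [4.0, 5.0]:
    chain_b.update(x)
merged = chain_a.merge(chain_b).get()
```

The merged summary should match what a single pass over all five values would give, which is the property that makes merge() usable for multi-chain diagnostics.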

I have restricted the data type to dictionaries of tensors, which is the basic datatype in pyro.infer.mcmc and in much of NumPyro. We could easily generalize this to pytrees by adding classes StatsOfList and StatsOfTuple.

Tested

tested commutativity of update-get
tested commutativity and associativity of update-merge-get
ran mypy locally
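The commutativity and associativity checks listed above could look roughly like the following sketch. It is not the PR's test suite: CountMean and summarize are hypothetical helpers, and exact Fraction arithmetic is an assumption made here so that equality can be checked strictly rather than up to floating-point tolerance.

```python
import itertools
from fractions import Fraction


class CountMean:
    """Hypothetical toy statistic; exact arithmetic keeps equality checks strict."""

    def __init__(self, count=0, total=Fraction(0)):
        self.count = count
        self.total = total

    def update(self, x):
        self.count += 1
        self.total += Fraction(x)

    def merge(self, other):
        assert isinstance(other, type(self))
        return CountMean(self.count + other.count, self.total + other.total)

    def get(self):
        mean = self.total / self.count if self.count else None
        return {"count": self.count, "mean": mean}


def summarize(samples):
    """Feed samples one at a time into a fresh statistic."""
    stats = CountMean()
    for x in samples:
        stats.update(x)
    return stats


samples = [1, 4, 9, 16]

# update-get: every ordering of the same samples gives the same result.
expected = summarize(samples).get()
update_commutes = all(
    summarize(p).get() == expected for p in itertools.permutations(samples)
)

# update-merge-get: split into chunks, then merge in different orders and groupings.
a, b, c = summarize(samples[:1]), summarize(samples[1:3]), summarize(samples[3:])
merge_commutes = a.merge(b).get() == b.merge(a).get()
merge_associates = a.merge(b).merge(c).get() == a.merge(b.merge(c)).get()
merge_matches_single_pass = a.merge(b).merge(c).get() == expected
```

With tensor-valued statistics the same checks would presumably need a tolerance-based comparison instead of ==.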

fritzo · 2021-06-02T21:22:22Z

@eb8680 I've added you as a reviewer because these streaming classes create a new semigroup abstraction and you're the resident algebra expert.

fritzo · 2021-06-02T21:49:41Z

@mtsokol I believe you can use something like the following statistics in #2843:

from pyro.ops.streaming import CountMeanVariance, StatsOfDict

...
stats = StatsOfDict(default=CountMeanVariance)
for mcmc_sample in ...:  # learning loop
    stats.update({
        name: transformed_sample for name, transformed_sample in mcmc_sample.items()
    })
result = stats.get()

Let me know if it looks like you'll need any changes to this PR.

eb8680

Neat API! I'm a little confused by some of the types. Did you try running mypy locally?

eb8680 · 2021-06-03T04:13:15Z

pyro/ops/streaming.py

+        self.count += 1
+
+    def merge(self, other: "CountStats"):
+        assert isinstance(other, type(self))


nit: these assertions should no longer be necessary with type hints

Hmm, good point more generally. However I'd like to argue that we should include both type hints and assertions until all common tools can leverage type hints. My reasoning is that I'd really like to catch errors as early as possible, e.g. when users (like me) are working in a jupyter notebook. I think until Jupyter dynamically checks types while editing we'll want extra guard rails especially for tricky interfaces like this.
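A small sketch of the trade-off discussed here: the annotation alone does nothing at runtime, while the assertion fails immediately in an interactive session. The CountStats below is a hypothetical reduction of the snippet under review, not the merged code, and it assumes merge() returns a new object.

```python
class CountStats:
    """Hypothetical counter, loosely modeled on the snippet under review."""

    def __init__(self) -> None:
        self.count = 0

    def update(self, sample: object) -> None:
        self.count += 1

    def merge(self, other: "CountStats") -> "CountStats":
        # Static checkers use the annotation; the assert guards interactive use.
        assert isinstance(other, type(self))
        result = CountStats()
        result.count = self.count + other.count
        return result


stats = CountStats()
stats.update("any sample")

try:
    # Wrong type on purpose: a checker such as mypy would flag this line,
    # but without the assert the mistake would only surface later, if at all.
    stats.merge({"count": 3})  # type: ignore[arg-type]
    caught = False
except AssertionError:
    caught = True
```

Note that assert statements are skipped when Python runs with the -O flag, so they act as development-time guard rails rather than validation.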

eb8680 · 2021-06-03T04:16:22Z

pyro/ops/streaming.py

+    """
+    def __init__(
+        self,
+        types: Dict[object, Type[StreamingStats]] = {},


Can this be strengthened to Dict[str, Type[StreamingStats]]?

I strengthened to Hashable, but I think we do want to support e.g. integer keys among chains.
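As an illustration of why Hashable may be a better fit than str, the following sketch mixes string, integer, and tuple keys in one container. DictOfStats and its constructor arguments are hypothetical and only loosely follow the signature in the diff; the None default is a choice made in this sketch, not something taken from the PR.

```python
from typing import Callable, Dict, Hashable, Optional


class CountStats:
    """Hypothetical per-key statistic that only counts updates."""

    def __init__(self) -> None:
        self.count = 0

    def update(self, sample: object) -> None:
        self.count += 1

    def get(self) -> Dict[str, int]:
        return {"count": self.count}


class DictOfStats:
    """Hypothetical wrapper: one statistic per key; keys only need to hash."""

    def __init__(
        self,
        types: Optional[Dict[Hashable, Callable[[], CountStats]]] = None,
        default: Callable[[], CountStats] = CountStats,
    ) -> None:
        self.types = dict(types or {})
        self.default = default
        self.stats: Dict[Hashable, CountStats] = {}

    def update(self, sample: Dict[Hashable, object]) -> None:
        for key, value in sample.items():
            if key not in self.stats:
                # Fall back to the default statistic for unlisted keys.
                factory = self.types.get(key, self.default)
                self.stats[key] = factory()
            self.stats[key].update(value)

    def get(self) -> Dict[Hashable, Dict[str, int]]:
        return {key: stat.get() for key, stat in self.stats.items()}


stats = DictOfStats()
stats.update({"loc": 0.1, 0: 1.0, ("chain", 1): 2.0})  # str, int, and tuple keys
stats.update({"loc": 0.2})
result = stats.get()
```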

fritzo · 2021-06-03T13:30:40Z

@eb8680 thanks for reviewing!

I'm a little confused by some of the types. Did you try running mypy locally?

Sorry, I didn't run mypy locally, and some of the types are stale after refactoring. I'll fix... UPDATE ...fixed and ran mypy locally.

mtsokol · 2021-06-03T15:21:21Z

The current #2857 draft isn't chain-aware, and I'm wondering how to handle that. It could either be handled by pyro.ops.streaming, e.g.

class CountMeanStats(StreamingStats):
    def __init__(self, num_chains=1):
        self.counts = [0] * num_chains
        ...

    def update(self, sample, chain_index=0):
        ...

    def get(self, group_by_chain=True):
        # we can sum across chains
        ...

so the update in StreamingMCMC would be easy:

self._statistics.update({
    name: transformed_sample for name, transformed_sample in z_acc.items()
}, chain_index)

Otherwise it could be handled by StreamingMCMC, e.g. with a separate CountMeanStats for each chain; these could be returned as-is, or merged into one somewhere in summary if group_by_chain=False.
WDYT?

fritzo · 2021-06-03T15:44:09Z

@mtsokol I think it's best to keep chain logic in the StreamingMCMC class so as to keep StreamingStats subclasses as simple as possible (and hence easy to extend by creating new subclasses). However I think with the latest couple commits you can easily separate by chain by making either a nested StatsOfDict or using keys of the form (chain_id, site["name"]). Let me know if you have any ideas about changing the StreamingStats interface to make this easier.
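The (chain_id, name) keying suggested above might be used along these lines. This is only a sketch on plain floats: CountMean, the draws list, and the pooling loop are hypothetical, and the real StatsOfDict interface may differ.

```python
from collections import defaultdict


class CountMean:
    """Hypothetical per-site statistic (plain floats instead of tensors)."""

    def __init__(self, count=0, total=0.0):
        self.count = count
        self.total = total

    def update(self, x):
        self.count += 1
        self.total += x

    def merge(self, other):
        return CountMean(self.count + other.count, self.total + other.total)

    def get(self):
        return {"count": self.count, "mean": self.total / self.count}


# One statistic per (chain_id, site_name) key; chain logic stays with the caller.
stats = defaultdict(CountMean)

draws = [
    (0, {"loc": 1.0, "scale": 2.0}),
    (1, {"loc": 3.0, "scale": 4.0}),
    (0, {"loc": 5.0, "scale": 6.0}),
]
for chain_id, sample in draws:
    for name, value in sample.items():
        stats[chain_id, name].update(value)

# Grouped by chain: read the statistics out as they are.
per_chain = {key: stat.get() for key, stat in stats.items()}

# Pooled across chains: merge the statistics that share a site name.
pooled_stats = {}
for (chain_id, name), stat in stats.items():
    pooled_stats[name] = pooled_stats[name].merge(stat) if name in pooled_stats else stat
pooled = {name: stat.get() for name, stat in pooled_stats.items()}
```

Keeping the statistic classes chain-agnostic means the same merge() serves both the per-chain and the pooled summaries.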

eb8680

LGTM after merge conflicts are resolved

fritzo · 2021-06-04T22:45:08Z

Thanks for reviewing @eb8680! Looks like I'll be using this right away in my mutation models 😄

Implement first version of pyro.ops.streaming

dff7517

fritzo added enhancement awaiting review labels Jun 2, 2021

fritzo requested a review from eb8680 June 2, 2021 18:31

fritzo added WIP and removed awaiting review labels Jun 2, 2021

fritzo added 2 commits June 2, 2021 15:16

Refactor to simplify vector stat implementation

27ea8c7

Fix typo

fe7ffb8

fritzo added awaiting review and removed WIP labels Jun 2, 2021

fritzo mentioned this pull request Jun 2, 2021

FR Streaming MCMC interface for big models #2843

Open

7 tasks

eb8680 reviewed Jun 3, 2021

fritzo added 3 commits June 3, 2021 08:40

Fix types; add StackStats

212a9e8

Fix doctests

93d748e

Refine type hints

b0f1096

fritzo added 3 commits June 3, 2021 09:26

Relax StatsOfDict key type

9cd53c8

Fix tests

9072002

Relax type hints to StatsOfDict

7a4b568

fritzo added 2 commits June 3, 2021 09:59

Merge branch 'dev' into ops-streaming

c684f4d

Enable type checking for pyro.ops.streaming

7e53700

fritzo mentioned this pull request Jun 3, 2021

Type Hints for Optim module and Setup MyPy as part of CI #2853

Merged

eb8680 previously approved these changes Jun 4, 2021

Merge branch 'dev' into ops-streaming

171e762

fritzo dismissed eb8680’s stale review via 171e762 June 4, 2021 19:56

eb8680 approved these changes Jun 7, 2021

eb8680 merged commit 9bcaa38 into dev Jun 7, 2021

eb8680 deleted the ops-streaming branch June 7, 2021 21:12
