Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example for Hilbert space approximation of Gaussian processes #1097

Merged
merged 10 commits into from
Jul 18, 2021

Conversation

omarfsosa
Copy link
Contributor

This example replicates a few of the models in the excellent case
study by Aki Vehtari [1] (originally written using R and Stan).
The case study uses approximate Gaussian processes [2] to model the
relative number of births per day in the US from 1969 to 1988.
The Hilbert space approximation is way faster than the exact Gaussian
processes because it circumvents the need for inverting the
covariance matrix.

The original case study presented by Aki also emphasizes the iterative
process of building a Bayesian model, which is excellent as a pedagogical
resource. Here, however, I replicate only 4 out of all the models available in [1].
There are a few minor differences in the mathematical details of my model GP4,
which I had to make in order for the chains to mix properly. I have clearly
commented on the places where our models are different.

References:
1. Gelman, Vehtari, Simpson, et al (2020), "Bayesian workflow book - Birthdays" <https://avehtari.github.io/casestudies/Birthdays/birthdays.html>.
2. Riutort-Mayol G, Bürkner PC, Andersen MR, et al (2020),
"Practical Hilbert space approximate bayesian gaussian processes for probabilistic programming"

examples/hsgp.py Outdated

if __name__ == "__main__":
args = parse_arguments()
jax.config.update("jax_enable_x64", args.x64)
Copy link
Member

@fehiepsi fehiepsi Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also use a wrapper numpyro.enable_x64()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for this tip. I've updated the PR.

fehiepsi
fehiepsi previously approved these changes Jul 16, 2021
Copy link
Member

@fehiepsi fehiepsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is superb, @omarfsosa! I really like your code structure and I also think that you've incorporated the best practices of using (Num)Pyro for modeling. Please let me know if you need me to take a closer look at something.

Could you do a few steps:

  • run make license
  • if you prefer using I in the docstring, do you want to add your name too? You can put "Author" line at any place in the docstring.
  • add your script to rtd index

@omarfsosa
Copy link
Contributor Author

This is superb, @omarfsosa! I really like your code structure and I also think that you've incorporated the best practices of using (Num)Pyro for modeling. Please let me know if you need me to take a closer look at something.

Could you do a few steps:

  • run make license
  • if you prefer using I in the docstring, do you want to add your name too? You can put "Author" line at any place in the docstring.
  • add your script to rtd index

Thanks for the feedback and suggestions @fehiepsi ! I've made some changes according to your recommendations. If you have a better idea for how to get load the data I'd love to hear it. At the moment I'm pulling the csv directly from the repo of the original case study, which feels a bit fragile maybe?

@fehiepsi
Copy link
Member

I guess it is fine to use the link in your script. Currently, we have some load_data utility in datasets that you can mimic.

feels a bit fragile

How about adding a test? :)

Btw, you'll need to run make format to pass CI.

@fehiepsi
Copy link
Member

@omarfsosa One of the test is failing

Traceback (most recent call last):
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 717, in <module>
    main(args)
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 686, in main
    data = model.get_data()
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 596, in get_data
    **self._get_floating_days(data),
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 541, in _get_floating_days
    & data["date"].dt.day.ge(25),
AttributeError: 'DatetimeProperties' object has no attribute 'day_of_week'
test/test_examples.py::test_cpu[hsgp.py --model tywd --num-samples 10 --num-warmup 10 --num-chains 2] Running:
python examples/hsgp.py --model tywd --num-samples 10 --num-warmup 10 --num-chains 2
FAILED

…of 'day_of_week' for compatibility with earlier versions of pandas
@omarfsosa
Copy link
Contributor Author

@omarfsosa One of the test is failing

Traceback (most recent call last):
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 717, in <module>
    main(args)
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 686, in main
    data = model.get_data()
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 596, in get_data
    **self._get_floating_days(data),
  File "/home/runner/work/numpyro/numpyro/examples/hsgp.py", line 541, in _get_floating_days
    & data["date"].dt.day.ge(25),
AttributeError: 'DatetimeProperties' object has no attribute 'day_of_week'
test/test_examples.py::test_cpu[hsgp.py --model tywd --num-samples 10 --num-warmup 10 --num-chains 2] Running:
python examples/hsgp.py --model tywd --num-samples 10 --num-warmup 10 --num-chains 2
FAILED

Thanks @fehiepsi . My tests were all passing locally but, for some reason, they ran using a later version of pandas (1.3 instead of 1.1.5). I've changed the offending lines to use weekday instead of day_of_week and this fixed it for me. I've updated the PR.

@fehiepsi
Copy link
Member

Thanks, @omarfsosa!

@fehiepsi fehiepsi merged commit 339e0c0 into pyro-ppl:master Jul 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants