Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new sampler: drghmc #39

Merged
merged 29 commits into from
Oct 30, 2023
Merged

new sampler: drghmc #39

merged 29 commits into from
Oct 30, 2023

Conversation

gil2rok
Copy link
Collaborator

@gil2rok gil2rok commented Jul 11, 2023

Implemented sampler for (Probabilistic) Delayed Rejection Generalized HMC with basic test.

This pull request is a cleaner version of my previous pull request #38 , which contained a messy commit history.

@codecov-commenter
Copy link

codecov-commenter commented Jul 11, 2023

Codecov Report

Merging #39 (403c8ca) into main (d32dcb5) will increase coverage by 0.57%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #39      +/-   ##
==========================================
+ Coverage   98.01%   98.59%   +0.57%     
==========================================
  Files          11       12       +1     
  Lines         302      426     +124     
==========================================
+ Hits          296      420     +124     
  Misses          6        6              
Files Coverage Δ
bayes_kit/__init__.py 100.00% <100.00%> (ø)
bayes_kit/drghmc.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@gil2rok gil2rok added the enhancement New feature or request label Jul 11, 2023
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@bob-carpenter bob-carpenter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to start a review, but I'd be much more comfortable if @WardBrian or @jsoules could chime in on the Python and hopefully @modichirag will chime in on the algrorithm.

Sorry there are so many comments, but given this is your first PR, a lot of this stuff's coming up for the first time. Most of my comments are on doc and doc style. Usually you want the doc to be thorough enough to suggest tests.

You are going to need to add a bunch more unit tests to cover all the code. You really want to get this up to 100% test coverage including testing all the exceptional edge cases are throwing the right type of exception.

bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved

detailed_balance = (
(logp_prop - logp_cur)
+ (hastings_prop - hastings_cur)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is the right way around? Usually the density ratio and Hastings corrections are reversed in numerator and denominator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is the right away around.

These lines of code implement the equation below, taken from your paper with Chirag. The key thing to note is that $x$ represents the current draw and $y$ represents the proposed draw.

Thus I match my code to the paper's equation:

  • logp_prop = $\pi(y)$
  • logp_cur = $\pi(x)$
  • hastings_prop = $\prod_i [1 - \widetilde{\alpha}_i(y) ]$
  • hastings_cur = $\prod_i [1 - \widetilde{\alpha}_i(x) ]$
Screenshot 2023-08-07 at 11 19 19 AM

detailed_balance = (
(logp_prop - logp_cur)
+ (hastings_prop - hastings_cur)
+ (pdr_prop - pdr_cur)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question]
Why is the probabilistic delayed rejection a difference?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The equation below is the acceptance probability for probabilistic delayed rejection, taken from your paper with Chirag. In it, $x$ is the current draw, $y$ is the proposed draw, $\pi$ is the joint log density, $\alpha$ is the acceptance probability, and $p$ is the probability of attempting, or retrying, another proposal upon rejection.

Afte rejection, another proposal is made with probability $p$. To we maintain detailed balance, the acceptance probability is as below.

This acceptance probability includes a term in the numerator and denominator, $p$, which becomes a difference on the log scale.

Screenshot 2023-08-07 at 12 08 47 PM

bayes_kit/drghmc.py Outdated Show resolved Hide resolved
test/test_drghmc.py Show resolved Hide resolved
@gil2rok
Copy link
Collaborator Author

gil2rok commented Jul 12, 2023

Thanks for the (super detailed!) feedback.

I plan to:

  1. change the documentation as per @bob-carpenter 's comments
  2. change my use of | as a Union operator as per @WardBrian's comment
  3. add support for gradient cacheing as per @modichirag 's in-person discussion.

FYI: over the next few weeks I will be prioritizing my actual research project so progress on these changes will be a bit slower. However, I am still excited to continue working on this.

Copy link
Collaborator

@jsoules jsoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to beat up on you further, but here's some additional notes. (I started this last night, apologies for anything repetitive/inconsistent.)

Feel free to let me know if you'd like clarifications or suggestions for anything!

bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
@gil2rok
Copy link
Collaborator Author

gil2rok commented Aug 8, 2023

I have addressed every comment by (1) resolving it after incorporating the feedback or (2) responding with a question or clarifying what I meant. Feel free to resolve the comment if, after my response, there is nothing left to discuss.

A few more questions:

  1. I want to clarify notation:

    • theta represents model parameters and rho represents auxiliary momentum
    • a draw is a tuple of model parameters theta and its log density
    • a sample is a sequence of draws (as @bob-carpenter commented above)

    Then what term should I use to refer to (theta, rho) in the extended phase space when writing the DRGHMC docstrings? I want to be able to refer to the current (theta, rho), proposed (theta, rho), and ghost (theta, rho).

  2. What tests do you recommend I include? I am already testing for:

    • The number of density + gradient evaluations
    • Dummy examples with standard normal and binomial
    • Various errors are thrown correctly
  3. I want to confirm the correct implementation of the leapfrog integrator in the format @bob-carpenter wrote in an earlier comment here.

    Bob wrote that his implementation should save a gradient evaluation and "avoids having to save that grad value out of the loop". However, I am not sure if my implementation does this.

    # pos = position
    # mo = momentum
    # grad = gradient
    
    def leapfrog(pos, mo):
    	grad = get_grad(pos)
    	mo_mid = mo + (0.5 * stepsize) * grad
    	pos += stepsize * mo_mid
    	
    	for i in range(stepcount - 1):
    		grad = get_grad(pos)
    		mo_mid += stepsize * grad
    		pos += stepsize * mo_mid
    
    	grad = get_grad(pos)
    	mo = mo_mid + (0.5 * stepsize) * grad
    	return (pos, mo)

@bob-carpenter
Copy link
Collaborator

1a. theta and rho are fine with me
1b. A draw consists of both a theta and a rho.
1c. Yes, given the correction to 1b.

The point is that if we have draws of multiple variables, we can throw some of the variables away to get marginal draws from the remaining variable. Usually we only care about the theta draws for downstream Bayesian inference.

(theta, rho) is just the phase space (no "extended") and it's conventional to just use the pair and not try to reduce to a single variable.

  1. You want to test all the boundary conditions of all your algorithms. And otherwise provide a range of tests that test all the behavior for "regular" inputs. I wouldn't call normal and binomial examples "dummy".

  2. Yes, that looks good. You can reason through the algorithm and verify inductively that there's only one gradient eval for each position instead of an extra one as the original implementation had.

@gil2rok
Copy link
Collaborator Author

gil2rok commented Aug 13, 2023

All comments are addressed, code coverage is 100%, everything is properly formatted and typed (thanks to @WardBrian for help on this) and all tests are passing (except for specific functions in test_iat or test_drghmc that sometimes fails depending on the rng seed).

I would love some more feedback: what else is needed for this PR to be merged?

Copy link
Collaborator

@bob-carpenter bob-carpenter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Gilad:

Given this is your first big PR, I went at it pretty hard in terms of how to formulate documentation. This is a lot of little changes, but none of them really touch code.

I'm just listing this as a Comment so that you don't need my approval to merge, but I'm happy to look at again if you'd like more feedback.

For external reinforcement of the ideas I'm trying to lay out, you can see Google's doc guide for Python: https://google.github.io/styleguide/pyguide.html#s3.8-comments-and-docstrings

bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
bayes_kit/drghmc.py Outdated Show resolved Hide resolved
test/test_drghmc.py Outdated Show resolved Hide resolved
@gil2rok
Copy link
Collaborator Author

gil2rok commented Oct 23, 2023

A couple general questions/notes:

  1. What is the recommended line length? The black code formatter seems to suggest 88 characters, while Google's style guide recommends 80 characters in docstrings (link is found here ). My code currently adheres to the 88 character line limit.
  2. The Google style guide recommends every function docstring starts with a one line summary. Some of @bob-carpenter's comments on my one line docstring summaries are much clearer than what I wrote, but are much longer than this 88 character limit. Wanted to briefly explain why I will not just be accepting these suggestions as is, and instead will edit them down.
  3. I want to ensure I am using the correct terminology for draws. As per, new sampler: drghmc #39 (comment), @bob-carpenter wrote that a draw is a tuple of a position variable and momentum variable, denoted by (theta, rho). If this is so, why does the DrawAndLogP type define a draw as a single VectorType variable theta instead of the tuple (theta, rho)? This seems inconsistent.

@WardBrian
Copy link
Collaborator

My two cents:

  1. recommended line length

I think 88 characters is good for code and comments

2. one line summary.

I tend to focus more on it being one (preferably simple) sentence rather than exactly one line. I think the spirit of the guideline is just that there should be a summary to start

…ce(), rename stepsizes to step_sizes, rename stepcounts to step_counts, change list type to sequence type.
@gil2rok
Copy link
Collaborator Author

gil2rok commented Oct 26, 2023

Yesterday I met with @WardBrian (huge thank you!) to finalize some changes, summarized below, that address the remaining comments from @bob-carpenter :

  1. Bayesian Analysis journal name is now italisized.
  2. leapfrog_stepsizes is renamed to leapfrog_step_sizes. Also, leapfrog_stepcounts is renamed to leapfrog_step_counts.
  3. Validation logic is reworked such that my validation functions no longer return their input unnecessarily.
  4. leapfrog_step_sizes can now be a list containingint or float values. This is because I changed its container data type from a list to sequence.

I am now formatting the code + updating my tests. Afterwards, can @bob-carpenter review it one last time to confirm it is ready to merge?

@bob-carpenter
Copy link
Collaborator

Yes, happy to review. Just let me know when it's ready. This is something where it's probably good for both me and @WardBrian to review for the sampler and Python style respectively.

@gil2rok
Copy link
Collaborator Author

gil2rok commented Oct 26, 2023

Quick question: for typing reasons, leapfrog_step_sizes and leapfrog_step_counts are of type Sequence[float] and Sequence[int] respectively.

In the documentation should I refer to these as lists or sequences? The term list is simple to understand, but the term sequence is more technically precise.

For example, should the error message read "expected leapfrog_step_counts to be a list" or should the error message read "expected leapfrog_step_counts to be a sequence"?

@bob-carpenter
Copy link
Collaborator

I would use "sequence" in the written doc (not capitalized---common nouns are never capitalized in English). The type will show up in the doc as Sequence to reinforce that. List is just one of many types of Sequence. But tuples and numpy arrays are also of type Sequence.

Copy link
Collaborator

@WardBrian WardBrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When @gil2rok and I sat down to discuss the remaining issues we already discussed the Python style (e.g. #42).

The few comments I think I would leave at this point end up going against some of the things @bob-carpenter suggested earlier, so I am happy to accept this PR if he is and have a separate discussion on certain conventions later, since they're more about the entire repo than this PR

@WardBrian
Copy link
Collaborator

(@gil2rok : Your tests are raising a type error currently, which should probably be fixed before actually merging 😉 )

@gil2rok
Copy link
Collaborator Author

gil2rok commented Oct 27, 2023

@WardBrian I am aware of the type issues!

I am fixing them on my local machine and also updating the tests. I should push these changes to the repo later today!

@bob-carpenter
Copy link
Collaborator

@WardBrian says:

The few comments I think I would leave at this point end up going against some of the things @bob-carpenter suggested earlier

@WardBrian is better than me at Python, so if he has conflicting advice w.r.t. Python, you should go with his suggestions.

@gil2rok
Copy link
Collaborator Author

gil2rok commented Oct 27, 2023

Tests are updated and passing (except for stochasticity that makes them occasionally fail). All files are properly formatted, code coverage is 100%, and all comments are addressed.

@WardBrian: are there any Python style changes you would recommend, as per Bob's comment above? If not, is there anything else to do before merging this pull request?

@WardBrian
Copy link
Collaborator

@gil2rok @bob-carpenter

I am happy with the PR as it stands now. The things I have in mind are more about conventions around e.g. input checking or error handling that would apply to both this but also the existing code in the repo.

I'd like to discuss it (at some point), but a follow on PR would be necessary in either case (either to align the rest of the repo to what was done here, or to adjust this code to whatever we decide)

@bob-carpenter bob-carpenter merged commit deb13a6 into main Oct 30, 2023
@gil2rok
Copy link
Collaborator Author

gil2rok commented Oct 30, 2023

Very excited the pull request was merged!

@bob-carpenter We should also change the README to include DRGHMC. Is it all right if I make that change?

@bob-carpenter
Copy link
Collaborator

Yes, please do change the README. You can just push that w/o review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants