Hierarchical Bayesian Modeling Workbooks for the Large Synoptic Survey Telescope Data Science Fellowship Program Session 4.
The notebook uploads have inline plots that are rather large. You may need to click reload once or twice to view the notebook in the browser.
Author: Megan I. Shabram, PhD, NASA Postdoctoral Program Fellow, [email protected]
In this Hierarchical Bayesian Model (HBM) workbook, we begin with Stan, a probabilistic programming platform whose sampler is based on Hamiltonian Monte Carlo. Here, PyStan is used to obtain a sample from the full posterior distribution of a truncated Gaussian population model, where the measurement uncertainties of the population constituents are Normally distributed. Because of the truncation, this HBM has no analytical solution and requires a Monte Carlo numerical approximation technique. After investigating Stan, we explore the same problem using JAGS, a Gibbs sampling method (that reverts to Metropolis-Hastings when necessary). The simulated data sets generated below are modeled after the projected eccentricity obtained for exoplanet systems that both transit and occult their host star (see Shabram et al. 2016).
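As a rough illustration of the kind of simulated data described above, the sketch below draws "true" values from a truncated Gaussian population and then adds Normally distributed measurement noise. The population parameters, the truncation bounds of [-1, 1], the noise level, and all variable names are assumptions chosen for illustration; they are not taken from the notebooks. It also assumes a recent NumPy (for `default_rng`) rather than the Python 2.7 setup noted elsewhere in this README.

```python
import numpy as np


def simulate_truncated_normal(rng, mu, sigma, low, high, size):
    """Draw `size` values from Normal(mu, sigma) truncated to [low, high].

    Uses simple rejection sampling: draw from the untruncated Normal and
    keep only the draws that fall inside the bounds.
    """
    out = np.empty(size)
    filled = 0
    while filled < size:
        draws = rng.normal(mu, sigma, size=size)
        keep = draws[(draws >= low) & (draws <= high)]
        n = min(keep.size, size - filled)
        out[filled:filled + n] = keep[:n]
        filled += n
    return out


rng = np.random.default_rng(42)

n_planets = 50  # hypothetical number of systems

# Hypothetical population parameters and truncation bounds.
true_e = simulate_truncated_normal(
    rng, mu=0.0, sigma=0.3, low=-1.0, high=1.0, size=n_planets
)

# Hypothetical (homoscedastic) measurement uncertainty for every system.
sigma_obs = np.full(n_planets, 0.04)

# Observed values: true values plus Normal measurement noise.
obs_e = rng.normal(true_e, sigma_obs)
```

Note that the observed values can scatter slightly outside the truncation bounds even though the true values cannot; only the population model is truncated, not the measurement model.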
- Learn how to use a computing infrastructure that allows you to carry out future HBM research projects.
- Gain a sense for the workflow involved in setting up an HBM, in particular, using key diagnostics and simulated data.
- Practice data wrangling with Python Pandas data frames and Python dictionaries of NumPy arrays.
- Understand the relationship between data quantity, data quality, the number of iterations needed when performing MCMC, and analysis model complexity (the analysis model is also referred to as a generative model, a population model, or sometimes a shape function).
- Learn how to run HBM on datasets with missing values using JAGS.
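To make the data-wrangling and missing-value goals above more concrete, here is a minimal sketch of packing observations into a dictionary of NumPy arrays, with missing measurements marked using a masked array. The keys `e_obs`, `sigma_obs`, and `N` are hypothetical names and must match whatever variable names your model code uses. My understanding is that PyJAGS treats masked entries as missing data, but please check this against the PyJAGS documentation for your installed version.

```python
import numpy as np

# Hypothetical observations; NaN marks a missing measurement.
obs = np.array([0.12, np.nan, -0.05, 0.30, np.nan])

# Mask the invalid (NaN) entries so they can be treated as missing data.
obs_masked = np.ma.masked_invalid(obs)

# Hypothetical measurement uncertainty for each observation.
sigma = np.full(obs.size, 0.04)

# Dictionary of arrays, keyed by the (assumed) variable names in the model.
data = {"e_obs": obs_masked, "sigma_obs": sigma, "N": obs.size}

# Number of missing measurements.
n_missing = int(np.ma.count_masked(obs_masked))
print("missing values:", n_missing)
```

A dictionary like `data` would then be passed to the sampler as its data argument; the unmasked entries act as observations, while the masked ones are left for the sampler to infer.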
Later:
- Notebook 2: model misspecification and regularization (e.g., running Nm2 simulated data through an Nm1 analysis model used in the HBM).
- Notebook 3: break-point model on eclipsing binary data.
The additional software you will need to install:
- PyJAGS
- PyStan
I have also included helper codes in this folder:
- triangle_linear.py
- credible_interval.py
- kdestats.py
and a real eclipsing binary data set for use in Notebook 3:
- EBs_for_jags.txt
Note: These notebooks were created using Python 2.7 on a MacBook Air with 8GB of RAM, and installing PyJAGS required Xcode's gcc compiler. Installation help has been added at the end of this README. The MCMC simulations may be very hard on your laptop, causing it to breathe heavily. Check your laptop specs accordingly.
- Use the PyJAGS model and analysis code below to explore a two-component truncated Gaussian mixture model
- Using the simulated data code cells below, evaluate the simulated data from the one-component truncated Gaussian generative model with the two-component Gaussian mixture HBM. What do you notice about the posteriors for the mixture fractions? (Be sure to rename your output files.)
- Repeat this exercise, but now evaluating simulated data from a one-component generative model with a two-component HBM. (Be sure to rename your output files here as well.)
- Notebook 3: Using the JAGS model code block for a break-point HBM, set up and evaluate the break-point HBM on the real Kepler eclipsing binary data set provided (some code is provided for analysis, but you may also want to copy code from previous notebooks, and be sure to adjust the number of traceplots and 2d marginals for latent variables).
Joint Orbital Period Breakpoint and Eccentricity Distribution Hierarchical Bayesian Model for Eclipsing Binaries with PyJAGS
Simulating a joint eccentricity and period distribution of Eclipsing Binaries from the Kepler Mission
Using the JAGS model code block for a break-point HBM provided below, set up and evaluate the break-point HBM on a real Kepler eclipsing binary data set. Some code and the data file are provided for analysis, but you may also want to copy code from previous notebooks. Be sure to adjust the number of traceplots and 2d marginals for latent variables (e.g., don't plot all ~700 latent variable marginal posteriors). There is some wait time involved during the MCMC computation.
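One way to avoid plotting every latent variable is to pick a small random subset and only plot or summarize those. The sketch below uses randomly generated numbers as a stand-in for sampler output, and assumes the samples are stored as an array of shape (number of latent variables, iterations, chains), which is my understanding of how PyJAGS returns vector-valued variables. The array name, the subset size of 9, and the 16th/84th percentile summary are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real sampler output; assumed shape is
# (n_latent, n_iterations, n_chains).
n_latent, n_iter, n_chains = 700, 200, 3
latent_samples = rng.normal(size=(n_latent, n_iter, n_chains))

# Choose a small random subset of latent variables to inspect.
n_show = 9
show_idx = np.sort(rng.choice(n_latent, size=n_show, replace=False))
subset = latent_samples[show_idx]  # shape (n_show, n_iter, n_chains)

# Pool iterations and chains for each selected latent variable.
pooled = subset.reshape(n_show, -1)

# Simple marginal summaries: median and a central 68% interval.
medians = np.median(pooled, axis=1)
lower, upper = np.percentile(pooled, [16, 84], axis=1)

for i, med, lo, hi in zip(show_idx, medians, lower, upper):
    print("latent[%d]: median=%.3f, 68%% interval=(%.3f, %.3f)" % (i, med, lo, hi))
```

The same `show_idx` can be reused to select which traceplots and 2d marginals to draw, so that the plots and the summaries refer to the same latent variables.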
Installation Help:
JAGS:
SourceForge Download of JAGS: https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Mac%20OS%20X/JAGS-4.3.0.dmg/download?use_mirror=newcontinuum&r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fmcmc-jags%2Ffiles%2F&use_mirror=newcontinuum
JAGS for Ubuntu: https://launchpad.net/ubuntu/+source/jags
JAGS for Linux: https://sourceforge.net/projects/mcmc-jags/files/latest/download?source=directory
Mac Installation help for PyJAGS: Updated Oct. 11 2017
curl https://pkg-config.freedesktop.org/releases/pkg-config-0.28.tar.gz -o pkgconfig.tgz
tar -zxf pkgconfig.tgz && cd pkg-config-0.28
./configure --with-internal-glib && make install
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/opt/lib/pkgconfig/:$PKG_CONFIG_PATH
export MACOSX_DEPLOYMENT_TARGET=10.9
pip install pyjags