Bayesian data selection

This repo provides example code implementing Bayesian data selection with the "Stein volume criterion" (SVC), as introduced in the paper:

Bayesian data selection, Eli N. Weinstein and Jeffrey W. Miller, 2023, https://www.jmlr.org/papers/v24/21-1067.html

Installation

Download this repo, create a new Python 3.9 virtual environment (e.g., using conda), and run

pip install .

To test your installation, navigate to the svc subfolder and run

pytest

pPCA

To perform data selection on a probabilistic PCA model, using the fast linear approximation described in the paper, navigate to the svc subfolder and run

python pPCA.py example_pPCA.cfg

A detailed description of the model's options and how to input your own data can be found in the config file example_pPCA.cfg.
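
As a rough illustration of inspecting a config file before a run, the sketch below assumes the .cfg files use the INI-style syntax understood by Python's standard configparser module. That is an assumption, and the section and option names used here (general, out_folder, model, latent_dim) are hypothetical placeholders, not the actual options in example_pPCA.cfg.

```python
import configparser

# Hypothetical config text; the real option names live in example_pPCA.cfg.
EXAMPLE_CFG = """
[general]
out_folder = results

[model]
latent_dim = 2
"""


def summarize_config(text):
    """Parse INI-style text and return {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


summary = summarize_config(EXAMPLE_CFG)
for section, options in summary.items():
    print(section, options)
```

To look at a real file instead, the same parser could be pointed at it with parser.read("example_pPCA.cfg"), provided the file really is INI-style.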

Results will (by default) be put in a time-stamped subfolder within the results folder. Below is an example output plot based on simulated data showing the SVC difference (with $m_{\mathcal{B}_j} = m_{\mathcal{F}_0} - m_{\mathcal{F}_j}$) for each data dimension. Dimensions marked "out" are those over which the model is misspecified; the SVC difference is, appropriately, larger over those dimensions.
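
Because each run writes to its own subfolder, it can be handy to locate the most recent one programmatically. The following is a minimal sketch, not part of this repo: it assumes the output folder is named results and picks the newest subfolder by modification time, since the exact time-stamp format of the folder names is not specified here. The helper name latest_run_folder is hypothetical.

```python
from pathlib import Path


def latest_run_folder(results_dir):
    """Return the most recently modified subfolder of results_dir, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    subfolders = [p for p in root.iterdir() if p.is_dir()]
    if not subfolders:
        return None
    # Use modification time rather than parsing the folder name,
    # because the time-stamp format is an unknown here.
    return max(subfolders, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_run_folder("results"))
```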

Glass

To perform data selection on the glass model of gene expression data described in the paper, using a variational approximation to the SVC and the LOORF estimator, navigate to the svc subfolder and run

python RNAGlass.py example_RNAGlass.cfg

A detailed description of the model's options and how to input your own data can be found in the config file example_RNAGlass.cfg.

Results will (by default) be put in a time-stamped subfolder within the results folder. Below is an example output plot based on simulated data showing which data dimensions are included (selection probability close to 1) and which are excluded (selection probability close to 0) by the stochastic data selection procedure. The dimension marked "out" has more severe misspecification than the others, and it is, appropriately, deselected. At lower values of D, all five dimensions will be deselected.
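
To make the reading of such a plot concrete, the sketch below labels each dimension from its selection probability. It is an illustration only: the cutoffs 0.9 and 0.1, the function name classify_dimensions, and the example probabilities are all made up for this sketch and are not taken from the paper or from the script's actual output format.

```python
def classify_dimensions(selection_probs, hi=0.9, lo=0.1):
    """Label each dimension by its selection probability.

    hi and lo are illustrative cutoffs, not values from the paper.
    """
    labels = []
    for p in selection_probs:
        if p >= hi:
            labels.append("included")
        elif p <= lo:
            labels.append("excluded")
        else:
            labels.append("uncertain")
    return labels


# Made-up probabilities for five dimensions, purely for illustration.
probs = [0.98, 0.95, 0.03, 0.99, 0.97]
labels = classify_dimensions(probs)
print(labels)
```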
