Can a Confident Prior Replace a Cold Posterior?

Official repository for the paper Can a Confident Prior Replace a Cold Posterior?

Key ideas

Representing aleatoric uncertainty. We introduce the DirClip prior to control the aleatoric (data) uncertainty of a Bayesian neural network. Consider a toy classification problem: should we prefer a smooth or a complex decision boundary? Either choice is valid, depending on our beliefs about the quality of the data labels. The DirClip prior lets us represent these beliefs.
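One common way to quantify aleatoric uncertainty for a single prediction is the entropy of the predicted class distribution. The helper below is an illustrative sketch (not code from this repository), assuming the model outputs softmax probabilities:

```python
import numpy as np

def predictive_entropy(probs):
    """Aleatoric (data) uncertainty of one prediction, measured as the
    entropy of the predicted class distribution, in nats."""
    probs = np.asarray(probs, dtype=float)
    # Clip to avoid log(0) for fully confident predictions.
    return -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)))
```

A confident prediction such as `[0.99, 0.01]` has much lower entropy than a uniform one such as `[0.5, 0.5]`, which is exactly the quantity the DirClip prior is designed to push down.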

Results. Using the DirClip prior, we can force a BNN to have low aleatoric uncertainty, nearly matching the accuracy of a cold posterior without any tempering.

Training stability. Why does the DirClip prior stop working when $\alpha<0.8$? When the prior dominates the likelihood, posterior gradients may point toward the wrong class, leading to unstable training. For a more detailed discussion, please see the full paper.
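To make the clipping idea concrete, here is a minimal sketch of what a clipped Dirichlet log-prior on the softmax probabilities might look like. The parameterization (a Dirichlet log-density up to a constant, with each log-probability clipped from below; the `-10` mirroring the `dirclip-10` name used in `run.py`) is our illustrative guess, not the exact implementation from this repository — see the paper and `core` directory for the real definition:

```python
import numpy as np

def dirclip_log_prior(probs, alpha=0.9, clip_min=-10.0):
    """Sketch of a clipped symmetric Dirichlet log-density (up to a
    constant) over softmax class probabilities.

    Clipping log-probabilities from below bounds the prior gradient:
    without it, log(p) -> -inf as p -> 0, and for alpha < 1 the prior
    term can dominate the likelihood and destabilize training.
    """
    log_p = np.clip(np.log(np.asarray(probs, dtype=float)), clip_min, 0.0)
    return (alpha - 1.0) * np.sum(log_p)
```

Note that for `alpha < 1` the factor `(alpha - 1)` is negative, so the prior assigns higher density to confident predictions (very negative `log_p`) than to uniform ones — a "confident prior".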

Training a model

The core directory contains all the code required for model training. We recommend calling it from Python, although a command-line interface is also available thanks to Fire.

# Python
from run import run
run(model_name='cnn', ds_name='mnist', distribution='dirclip-10', distribution_param=0.9)
# Bash
python run.py --model_name='cnn' --ds_name='mnist' --distribution='dirclip-10' --distribution_param=0.9

The experiments directory contains three Python scripts for reproducing all of our training runs. However, they are meant to serve mostly as pseudocode: the scripts are very readable but you might find it necessary to add some experiment-management code to run multiple jobs in parallel, monitor them, etc. Since reproducing all of our experiments would take ~700 TPU-core-days, we also provide download links for model weights (32 GB) and data to reproduce loss landscape plots and Normal prior confidence (31 MB).
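For running several configurations in parallel, a minimal launcher along the following lines may be enough. The flag names match the `run.py` example above; the grid values and the `max_parallel` throttling are hypothetical placeholders for your own experiment-management setup:

```python
import subprocess

def build_cmd(distribution_param):
    """Assemble the run.py invocation for one configuration.

    Only distribution_param is varied here; the other flags are fixed
    to the example values shown above.
    """
    return ['python', 'run.py', '--model_name=cnn', '--ds_name=mnist',
            '--distribution=dirclip-10',
            f'--distribution_param={distribution_param}']

def launch_grid(params, max_parallel=4):
    """Launch one training job per parameter value, keeping at most
    max_parallel subprocesses alive at a time."""
    running = []
    for p in params:
        running.append(subprocess.Popen(build_cmd(p)))
        if len(running) >= max_parallel:
            running.pop(0).wait()  # simple FIFO throttle
    for proc in running:
        proc.wait()
```

This deliberately omits logging, retries, and accelerator assignment, which you will likely want for longer sweeps.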

Notebooks

All figures in the paper were generated using the provided Jupyter notebooks.

Citation

@misc{dirclip,
  title={Can a Confident Prior Replace a Cold Posterior?}, 
  author={Martin Marek and Brooks Paige and Pavel Izmailov},
  year={2024},
  eprint={2403.01272},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
