This repository contains the code for the paper "Dimension reduction, exact recovery, and error estimates for sparse reconstruction in phase space" by M. Holler, A. Schlüter and B. Wirth (arXiv preprint; published in Applied and Computational Harmonic Analysis). If you use this code, please cite the paper.
An important theme in modern inverse problems is the reconstruction of time-dependent data from only finitely many measurements. To obtain satisfactory reconstruction results in this setting it is essential to strongly exploit temporal consistency between the different measurement times. The strongest consistency can be achieved by reconstructing data directly in phase space, the space of positions and velocities. However, this space is usually too high-dimensional for feasible computations. We introduce a novel dimension reduction technique, based on projections of phase space onto lower-dimensional subspaces, which provably circumvents this curse of dimensionality: Indeed, in the exemplary framework of superresolution we prove that known exact reconstruction results stay true after dimension reduction, and we additionally prove new error estimates of reconstructions from noisy data in optimal transport metrics which are of the same quality as one would obtain in the non-dimension-reduced case.
First, clone the repository and its submodule:
git clone --recurse-submodules https://github.com/alexschlueter/supermops.git
cd supermops
To install the supermops package, run `pip install .` in the root directory of this repository. If you want to be able to change the code in the supermops directory and see the changes reflected in the installed version immediately, use `pip install -e .` instead to install the package in editable mode.
For additional packages needed to run the provided scripts, run `pip install -r requirements.txt`. If incompatibilities with some packages arise, you can also try the known working versions in `requirements_versions.txt`, which have been tested to work with Python 3.11.2.
To solve the resulting optimization problems, we call the commercial MOSEK solver through the CVXPY library. For this, you need to install a MOSEK license in the appropriate path; academic licenses are available from MOSEK.
If you want to avoid the commercial solver, change the lines where the solver is called to `solver.solve(use_mosek=False)`. You may also need to uninstall the mosek package again with `pip uninstall mosek` to prevent CVXPY from calling it. Note, however, that this is untested and the results might be inconsistent with the data from the article.
If you are only interested in trying our method, the above installation is all that is needed. However, for comparison, we also ran experiments with the ADCG algorithm applied to the full-dimensional dynamic formulation of the superresolution problem. For this, we used the Julia implementation of Alberti et al. [2], which is contained in the git submodule `dynamic_spike_super_resolution` in this repository (with some modifications to work with newer Julia versions and to allow resuming reconstructions). If you want to run these experiments as well, you need to install Julia (the scripts were tested with Julia v1.9.2) and prepare the package using

`julia --project=./dynamic_spike_super_resolution -e "using Pkg; Pkg.instantiate();"`
For a simple, commented example showcasing the main functions, check out the file `example.py` and run it using `python example.py`.
Let
Applying this to the phase space measure
We assume that, at each time step
First, the static problem at a time instant
Second, the full-dimensional dynamic problem proposed by Alberti et al. in [2], which is set in an appropriate subset
Third, our new proposed dimension-reduced version of the above problem. To define the dimension reduction, for a unit vector
The new dimension-reduced variable
Taking an appropriate domain
The first constraint is sometimes referred to as projected / reduced time consistency in the code, while the second constraint ensures consistency with the measured data.
There are some differences between the article and the code, some due to historical differences in variable names and some to allow for a simpler implementation.
First, the position-velocity projection variable mu
) in this code.
Second, in the implementation of our optimization problem, the snapshots nu
) is used. The relationship between the variables is the following: For some
Now,
so
for some subset of directions mu_dirs
, some subset nu_dirs
, and parameters paper_alpha
) and redcons_bound
). Note that the 2-norm in the constraint only makes sense after discretization. Also, the reparametrized Move operation in one dimension, which occurs in the constraint, is the same as the Radon transform in 2D, so the same discretized implementation can be used for both.
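As a toy illustration of such a discretized projection (not the repository's implementation; the grid, bin count, and function name here are made up for the example), the mass of a 2D density can be binned along a unit direction via each cell center's dot product with that direction:

```python
import math

def project_grid(density, direction, n_bins):
    """Bin the mass of a 2D density on the unit square into 1D bins
    along a unit direction, using each cell center's dot product with
    the direction (a crude discretized Radon transform)."""
    nx, ny = len(density), len(density[0])
    dx, dy = direction
    lo = min(0, dx) + min(0, dy)  # range of the dot product
    hi = max(0, dx) + max(0, dy)  # over the unit square
    proj = [0.0] * n_bins
    for i in range(nx):
        for j in range(ny):
            s = dx * (i + 0.5) / nx + dy * (j + 0.5) / ny
            proj[min(int((s - lo) / (hi - lo) * n_bins), n_bins - 1)] += density[i][j]
    return proj

# project a constant density along the 45-degree direction;
# the total mass is preserved by the projection
direction = (math.cos(math.pi / 4), math.sin(math.pi / 4))
density = [[1.0] * 8 for _ in range(8)]
proj = project_grid(density, direction, 10)
```

Mass preservation under each projection is the property that makes such operators usable as the linear maps in the reduced consistency constraints.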
See `supermops/solvers/cvxpy_solver.py` for the actual implementation of the optimization problem.
Although many parts of the code are generalized for arbitrary space dimensions and measurement times, the only well-tested code paths are for `main_bbox = BoundingBox([[0, 1], [0, 1]])` and times of the form `np.linspace(-1, 1, 2*K+1)`. In particular, the stripe projection only works for `main_bbox` being the unit cube.
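For concreteness, with a hypothetical choice K = 2 this tested time grid contains the following values (written with the standard library only, matching `np.linspace(-1, 1, 2*K+1)`):

```python
K = 2  # measurements at 2*K + 1 equispaced times in [-1, 1]
times = [-1 + k / K for k in range(2 * K + 1)]
print(times)  # → [-1.0, -0.5, 0.0, 0.5, 1.0]
```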
The pipeline to run experiments consists of the following steps, using files contained in the `scripts/` directory:
- Generate a dataset of ground truth particle configurations
- Define an experiment by writing a `params.py`
- Generate a `params.json` from the python file
- Run `reconstruct_v7.py` for our model or `reconstruct_adcg_v2.jl` for ADCG
- Run `evaluate_v6.py` to evaluate each reconstruction result
- Run `collect_eval_v2.py` to collect all evaluation results into a single `eval_all.h5` file
- Load `eval_all.h5` using the `supermops.eval.PDEval` class, analyse and plot the results contained in the pandas DataFrames
If the dataset in step 1 is supposed to have a uniform distribution of dynamic separation values (as generated by `gen_ground_truth.py`), we first need to generate a distribution of separation values to initialize the rejection-sampling algorithm (this is heavily inspired by the implementation of [2]). This is done by running `gen_separation_distr.py` and afterwards `collect_separations.py`.
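The flattening idea behind this two-stage procedure can be sketched in a toy 1D version (function names and the plain-pair setup are invented for the example; the repository's scripts work on dynamic separations of full particle configurations):

```python
import random

def separation_histogram(n_samples, n_bins, rng):
    """Pre-pass: estimate how the separation |a - b| of two uniform
    points in [0, 1] is distributed, as a plain histogram."""
    hist = [0] * n_bins
    for _ in range(n_samples):
        sep = abs(rng.random() - rng.random())
        hist[min(int(sep * n_bins), n_bins - 1)] += 1
    return hist

def sample_flat_separation(hist, rng):
    """Rejection sampling: accept a candidate pair with probability
    inversely proportional to how frequent its separation bin is,
    which flattens the distribution of separations."""
    n_bins = len(hist)
    min_count = min(c for c in hist if c > 0)
    while True:
        a, b = rng.random(), rng.random()
        count = hist[min(int(abs(a - b) * n_bins), n_bins - 1)]
        if count > 0 and rng.random() < min_count / count:
            return a, b

rng = random.Random(0)
hist = separation_histogram(20000, 10, rng)
pairs = [sample_flat_separation(hist, rng) for _ in range(100)]
```

The pre-computed histogram plays the role of the separation distribution generated by the first two scripts; without it, the acceptance probabilities could not be normalized.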
Each experiment consists of many sets of parameters called "pjobs". These are written to the `params.json` and processed during the next steps of the pipeline.
The reconstruction and evaluation steps were run for many pjobs in parallel on a compute cluster and will probably take too long to run on a single machine. The scripts have some common parameters to control the splitting of pjobs into groups for easier parallelization; see for example the output of `python reconstruct_v7.py --help`. The two most important parameters are:

```
--reduction N  # split the pjobs contained in the experiment into groups of N
--jobid j      # in the current invocation, process all pjobs from the j-th group of the split
```
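The grouping behaves like the following sketch (illustrative only; the function names are made up, and whether `--jobid` counts from 0 or 1 should be checked against the scripts' `--help` output):

```python
def split_pjobs(pjobs, reduction):
    """Split the list of pjobs into consecutive groups of size
    `reduction` (the last group may be smaller)."""
    return [pjobs[i:i + reduction] for i in range(0, len(pjobs), reduction)]

def pjobs_for_invocation(pjobs, reduction, jobid):
    """pjobs handled by the invocation with the given jobid (0-based here)."""
    return split_pjobs(pjobs, reduction)[jobid]

groups = split_pjobs(list(range(10)), reduction=4)
# groups == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Launching one invocation per group (e.g. as separate cluster jobs) then covers every pjob exactly once.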
All evaluation results (i.e. the `eval_all.h5` files from step 6 for each experiment) as well as the datasets of ground truth particle configurations can be downloaded to the folder `article/data` by running `python scripts/download_article_data.py` (or manually downloaded from http://doi.org/10.17879/09978425554).
If you simply want to explore the raw evaluation data, open `article/notebooks/explore_article_eval.ipynb` with Jupyter.
To recreate the plots in the article from the provided data, enter the subdirectories in `article/plots` and run `make` (you need a LaTeX installation for this).
For each step in the pipeline, we now describe how to reproduce the results from the paper. We also provide a `scripts/reproduce_paper.py` file, which should in principle run these steps automatically. Note, however, that it is only provided for demonstration purposes: the actual data for the paper was produced on an HPC cluster, and reconstructing all examples on a single machine would take far too long.
- Download the ground truth datasets as described above. Set the `REDLIN_GT_ROOT` environment variable in your current shell to the absolute path of the `article/data/ground_truth` folder.
- The `params.py` files for all experiments are provided in the `article/experiments` folder.
For each experiment (i.e. for each separate `params.py`), run the commands for the following steps in the experiment root folder, which is the folder containing the `params.py`. Also, let `$git` be the path to this git repo.
- Run `python params.py --write-json`, resulting in a `params.json` file
- If the experiment is an ADCG experiment (check if the method field in the json is "ADCG"), run `julia --project=$git/dynamic_spike_super_resolution $git/scripts/reconstruct_adcg_v2.jl --param-json params.json` (with appropriate job parameters, see the previous section). If the experiment has "PyRedLin" in the method fields, run `python $git/scripts/reconstruct_v7.py --param-json params.json` instead.
- Run `python $git/scripts/evaluate_v6.py --param-json params.json` (with appropriate job parameters)
- Run `python $git/scripts/collect_eval_v2.py --param-json params.json`

You should now have one `eval_all.h5` per experiment containing the collected evaluation results.
[1] M. Holler, A. Schlüter, B. Wirth. Dimension reduction, exact recovery, and error estimates for sparse reconstruction in phase space. arXiv preprint (2021).
[2] G. S. Alberti, H. Ammari, F. Romero, T. Wintz. Dynamic Spike Superresolution and Applications to Ultrafast Ultrasound Imaging (2019).
[3] M. Holler, A. Schlüter, B. Wirth. Dimension reduction, exact recovery, and error estimates for sparse reconstruction in phase space. Applied and Computational Harmonic Analysis (2024).