
Cnn influence example #195

Merged: 31 commits into develop from cnn_influence_example, Dec 29, 2022

Conversation

@Xuzzo (Collaborator) commented Nov 23, 2022

Description

This PR introduces an example where influence functions are used for CNNs.

Changes

Checklist

  • Wrote Unit tests (if necessary)
  • Updated Documentation (if necessary)
  • Updated Changelog
  • If notebooks were added/changed, boilerplate cells are tagged with "nbsphinx":"hidden"

@mdbenito added the labels documentation, enhancement (Nov 24, 2022)
@Xuzzo self-assigned this (Dec 6, 2022)
Review thread on requirements.txt (outdated, resolved)
@Xuzzo marked this pull request as ready for review (Dec 9, 2022, 13:51)
@Xuzzo (Collaborator, Author) commented Dec 19, 2022

  • Removed caching of results; kept the caching of the model
  • Added the difference between the corrupted and the original scores to the last table
  • Fixed the tqdm progress bars for model training and influence calculation

@Xuzzo requested a review from AnesBenmerzoug (Dec 19, 2022, 13:01)
@AnesBenmerzoug (Collaborator) commented:

@Xuzzo Thanks for your work! It looks better now.

The main thing I noticed is that in the imagenet notebook, loading the dataset prints information that is useless to the notebook reader:

[screenshot: verbose dataset-loading output]

Perhaps we should call datasets.utils.logging.set_verbosity_error() in the load_preprocess_imagenet function.
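
A minimal sketch of that suggestion, assuming load_preprocess_imagenet wraps datasets.load_dataset (the dataset_id parameter and the elided preprocessing are placeholders, not the notebook's actual code):

```python
from datasets import load_dataset
from datasets.utils import logging as datasets_logging


def load_preprocess_imagenet(dataset_id: str, split: str = "train"):
    """Load an image dataset from the Hugging Face hub without verbose output."""
    # Only surface errors from the datasets library, so cache/download
    # notices don't clutter the rendered notebook.
    datasets_logging.set_verbosity_error()
    dataset = load_dataset(dataset_id, split=split)
    # ... preprocessing (resizing, normalisation, splitting) would follow here
    return dataset
```

Calling set_verbosity_error() inside the helper keeps the cells clean without asking the reader to configure logging themselves.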

@mdbenito (Collaborator) left a review:

I've looked a bit at the supporting code but couldn't check the notebooks yet. Will do asap

Review threads (resolved): notebooks/notebook_support.py (9 threads), src/pydvl/influence/conjugate_gradient.py (1 thread)
@Xuzzo (Collaborator, Author) commented Dec 20, 2022

The PR review comments should now be addressed.

@AnesBenmerzoug (Collaborator) left a review:

I had another quick look at it. It looks good to me and can be merged!

@mdbenito (Collaborator) commented:

Hi hi,

the supporting code is better now, thanks!

I haven't had time to check it in detail yet, but the notebook:

  • contains several typos; please run it through a spell checker. Some of them are recurring and I have fixed them many times in the past, like "data-points". 🙏🏽
  • does not hide boilerplate cells as instructed in the PR template and CONTRIBUTING.md (why?); see the tagging sketch after this list
  • points to a whole book as reference for the main computation; at the very least, the relevant section or theorem should be mentioned.
  • has a (useful) theory section which ignores the theory section in the review document draft, and is based on a paper which diverges from the usual theory of influence functions.
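
On the hidden-cells point, here is a minimal sketch of tagging boilerplate cells programmatically with nbformat; the notebook path and the marker used to identify boilerplate cells are illustrative assumptions, not the project's convention:

```python
import nbformat

NOTEBOOK = "notebooks/influence_imagenet.ipynb"  # hypothetical path

nb = nbformat.read(NOTEBOOK, as_version=4)
for cell in nb.cells:
    # Assumption: boilerplate cells start with a "# boilerplate" comment.
    if cell.cell_type == "code" and cell.source.lstrip().startswith("# boilerplate"):
        # nbsphinx hides cells whose metadata contains "nbsphinx": "hidden".
        cell.metadata["nbsphinx"] = "hidden"
nbformat.write(nb, NOTEBOOK)
```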

I understand that it makes sense to be consistent with the paper that is implemented, but a major selling point of the review and of a library like pydvl is that we provide a common framework and notation to understand any paper on influence functions. In particular, it is important to unify notation and be general enough. Koh2017 is a rather sloppy source because it fails to properly define the object that is being approximated and also diverges from the usual definitions. Custom definitions are ok, but we should strive to make the link to what is standard in the literature, I believe.
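
For reference, the object usually meant in the influence-function literature (as stated, e.g., in Koh & Liang, 2017) is the effect of upweighting a training point $z$ on the loss at a test point $z_{\text{test}}$, evaluated at the empirical risk minimizer $\hat\theta$:

$$
\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}})
= -\,\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^{\top}
H_{\hat\theta}^{-1}\,
\nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta).
$$

However the notebook phrases its theory section, tying its notation back to this standard form would make the link to the rest of the literature explicit.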

Now, the stub I wrote over a year ago is itself worse than sloppy; I'm not claiming it's any better (actually I just read it and it's pretty confusing, to put it mildly 😅). But it'd be nice to try to link both. It is OK to remain superficial in the notebook, as long as it is consistent with the review. Maybe you can start by mentioning the latter, and in the future work to improve that document (we should coordinate, though).

@Xuzzo (Collaborator, Author) commented Dec 28, 2022

To do, from discussion in #235:

  • add notebook requirements
  • check model saving
  • check dataset RAM usage: add a comment on downsampling the dataset in the notebook
  • change InternalDataset to TensorDataset (see the sketch after this list)
  • fix pipeline
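
A minimal sketch of the InternalDataset to TensorDataset change, assuming the training data is already available as tensors (tensor names and shapes below are illustrative only):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative tensors standing in for the notebook's preprocessed data.
x_train = torch.randn(256, 3, 32, 32)
y_train = torch.randint(0, 10, (256,))

# TensorDataset replaces the custom InternalDataset: it simply indexes the
# wrapped tensors, so no bespoke Dataset subclass is needed.
train_data = TensorDataset(x_train, y_train)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
```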

@mdbenito merged commit fd442ec into develop (Dec 29, 2022)
@mdbenito deleted the cnn_influence_example branch (Dec 29, 2022, 10:33)
Labels: documentation, enhancement
Projects: none yet
3 participants