Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders
This repository contains code for the paper, Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders by Zahra Atashgahi, Ghada Sokar, Tim van der Lee, Elena Mocanu, Decebal Constantin Mocanu, Raymond Veldhuis, and Mykola Pechenizkiy. This work is published in the Machine Learning journal (ECML-PKDD 2022 journal track). For more information please read the paper at https://arxiv.org/abs/2012.00560 or https://link.springer.com/article/10.1007/s10994-021-06063-x.
This code runs on Python 3. The following Python packages must be installed before running the project code:
- numpy
- scipy
- scikit-learn (imported as sklearn)
- Cython (optional, only needed for the fast implementation)
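As a quick sanity check before running anything, the following sketch reports which of the packages listed above cannot be imported in the current environment. The helper name `missing_packages` is hypothetical and not part of this repository; the check only looks at import names and does not verify versions.

```python
import importlib.util

# Import names of the packages listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "scipy", "sklearn"]
OPTIONAL = ["Cython"]  # only needed for the fast implementation


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path.

    Hypothetical helper for illustration; not part of the repository.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print a short summary of missing required and optional packages."""
    print("Missing required packages:", missing_packages(REQUIRED) or "none")
    print("Missing optional packages:", missing_packages(OPTIONAL) or "none")
```

Calling `report()` prints the two lists; anything reported as missing can then be installed with the package manager of your choice.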
To run the code you can use the following lines:
- Select the dataset and the number of training epochs:
dataset="madelon" epoch=100
- Train sparse-DAE:
python3 ./QuickSelection/train_sparse_DAE.py --dataset_name $dataset --epoch $epoch
- Use the trained model weights to select features:
python3 ./QuickSelection/QuickSelection.py --dataset_name $dataset
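If you prefer to drive both steps from Python rather than the shell, a minimal sketch could look like the following. It only wraps the two commands shown above; the function names `build_commands` and `run_pipeline` are hypothetical, and the sketch assumes it is launched from the repository root so that the relative script paths resolve.

```python
import subprocess
import sys


def build_commands(dataset, epoch):
    """Build the two command lines from this README as argument lists.

    Hypothetical helper; it uses the current interpreter instead of `python3`.
    """
    train = [
        sys.executable, "./QuickSelection/train_sparse_DAE.py",
        "--dataset_name", dataset, "--epoch", str(epoch),
    ]
    select = [
        sys.executable, "./QuickSelection/QuickSelection.py",
        "--dataset_name", dataset,
    ]
    return [train, select]


def run_pipeline(dataset, epoch):
    """Train the sparse-DAE, then run feature selection; stop on the first failure."""
    for cmd in build_commands(dataset, epoch):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("madelon", 100)` would run training followed by feature selection, matching the shell commands above.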
There are two implementations of back-propagation in Sparse_DAE.py.
If you are running this code on Linux and want to use the fast implementation, you can run it with Cython. You first need to build sparseoperations. Use the following line to install it in your environment:
cythonize -a -i ./QuickSelection/sparseoperations.pyx
If you are on Windows, please change the back-propagation method in the Sparse_DAE.py file. Please note that the running time will be much higher. More details can be found in that file.
On the MNIST dataset, we first train the sparse denoising autoencoder (sparse-DAE). Then, we select the 50 most important features using the strength of the input neurons of the trained sparse-DAE. We visualize the features selected for each class separately. In the figure below, each picture at a given epoch shows the average of the 50 selected features over all samples of a class, alongside the average of the actual samples of that class. As training progresses, the selected features become more similar to the digit pattern of each class. This suggests that QuickSelection is able to find the most relevant features for all classes.
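The selection step can be sketched as follows. This is an illustration under stated assumptions, not the code in QuickSelection.py: it assumes the strength of an input neuron is the sum of the absolute weights of its outgoing connections, and that the input-layer weights are stored as a matrix of shape (n_features, n_hidden) with one row per input neuron. The names `neuron_strength` and `select_features` are hypothetical.

```python
import numpy as np
from scipy import sparse


def neuron_strength(w_in):
    """Strength of each input neuron.

    Assumption: strength = sum of absolute weights over the neuron's
    outgoing connections, i.e. a row-wise sum of |w_in|.
    """
    w = sparse.csr_matrix(w_in)
    return np.asarray(abs(w).sum(axis=1)).ravel()


def select_features(w_in, k):
    """Return the indices of the k input neurons with the highest strength."""
    strength = neuron_strength(w_in)
    order = np.argsort(-strength, kind="stable")  # descending by strength
    return order[:k]


# Toy example: 4 input features, 3 hidden neurons, sparse connectivity.
toy_weights = np.array([
    [0.5, 0.0, -1.0],   # strength 1.5
    [0.0, 0.0, 0.1],    # strength 0.1
    [2.0, -0.5, 0.0],   # strength 2.5
    [0.0, 0.0, 0.0],    # strength 0.0 (no connections)
])
top2 = select_features(toy_weights, 2)  # features 2 and 0
```

In the MNIST setting described above, `k` would be 50 and each selected index would correspond to a pixel position.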
If you use this code, please consider citing the following paper:
@article{atashgahi2021quick,
title={Quick and robust feature selection: the strength of energy-efficient sparse training for autoencoders},
author={Atashgahi, Zahra and Sokar, Ghada and van der Lee, Tim and Mocanu, Elena and Mocanu, Decebal Constantin and Veldhuis, Raymond and Pechenizkiy, Mykola},
journal={Machine Learning},
pages={1--38},
year={2021},
publisher={Springer}
}
The starting point of this code is "sparse-evolutionary-artificial-neural-networks", which is available at: https://github.com/dcmocanu/sparse-evolutionary-artificial-neural-networks
@article{Mocanu2018SET,
author = {Mocanu, Decebal Constantin and Mocanu, Elena and Stone, Peter and Nguyen, Phuong H. and Gibescu, Madeleine and Liotta, Antonio},
journal = {Nature Communications},
title = {Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science},
year = {2018},
doi = {10.1038/s41467-018-04316-3},
url = {https://www.nature.com/articles/s41467-018-04316-3}
}
email: [email protected]