A coupled vertical federated learning framework that boosts the model performance with record similarities (NeurIPS 2022)

Xtra-Computing/FedSim

FedSim

FedSim is a coupled vertical federated learning framework that boosts the training with record similarities.

Requirements

  1. Install conda 4.14 following https://www.anaconda.com/products/distribution
  2. Clone this repo and enter it:
git clone https://github.com/JerryLife/FedSim.git
cd FedSim
  3. Create the environment (named fedsim) and install the required basic modules:
conda env create -f environment.yml
conda activate fedsim
  4. Install torch and torchvision with pip according to your CUDA version. For an RTX 3090, we installed torch==1.8.2 and torchvision==0.9.2 as follows:
pip3 install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
  5. Ensure all the required folders exist (they should already be present after git clone):
mkdir -p runs ckp log cache
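If you prefer to do the last step from Python (for example, at the top of a custom driver script), the `mkdir -p` call can be mirrored as below. This is a minimal sketch, not part of the repo; the helper name `ensure_dirs` is hypothetical, and only the folder names come from the command above.

```python
from pathlib import Path

# Output folders expected by the training scripts (from `mkdir -p runs ckp log cache`).
REQUIRED_DIRS = ("runs", "ckp", "log", "cache")


def ensure_dirs(root="."):
    """Create any missing output folders under `root`, like `mkdir -p`.

    Returns the names of the folders that had to be created.
    """
    created = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created
```

Calling `ensure_dirs()` from the repository root is idempotent: a second call creates nothing and returns an empty list.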

Datasets

Due to the size limit, this repo includes only two datasets, house and hdb, in the data/ folder.

data
├── beijing                (house)
│   ├── airbnb_clean.csv   (Secondary)
│   └── house_clean.csv    (Primary)
└── hdb                    (hdb)
    ├── hdb_clean.csv      (Primary)
    └── school_clean.csv   (Secondary)
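The tree above can be captured as a small lookup table when writing your own loading code. This is a hedged sketch: the helper `dataset_paths` and the `DATASETS` mapping are hypothetical and simply restate the folder layout and the primary/secondary roles shown in the tree; the repo's scripts may resolve paths differently.

```python
from pathlib import Path

# Folder name -> (primary file, secondary file), as in the data/ tree.
# "beijing" holds the house dataset; "hdb" holds the hdb dataset.
DATASETS = {
    "beijing": ("house_clean.csv", "airbnb_clean.csv"),
    "hdb": ("hdb_clean.csv", "school_clean.csv"),
}


def dataset_paths(name, data_root="data"):
    """Return (primary_path, secondary_path) for a bundled dataset folder."""
    if name not in DATASETS:
        raise KeyError(f"unknown dataset {name!r}; expected one of {sorted(DATASETS)}")
    primary, secondary = DATASETS[name]
    folder = Path(data_root) / name
    return folder / primary, folder / secondary
```

For example, `dataset_paths("hdb")` returns the paths to `data/hdb/hdb_clean.csv` and `data/hdb/school_clean.csv`.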

Linkage and Training

The linkage and training of each dataset are combined in a single script.
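To illustrate the idea behind similarity-based linkage, the sketch below pairs each primary record with its K most similar secondary records instead of a single exact match. It is a simplified illustration, not FedSim's linkage code (which also handles privacy via bloom filters and the `knn_k` candidate pool). The similarity function, negative Euclidean distance on numeric identifier vectors, and the helper `link_top_k` are assumptions chosen for clarity.

```python
import heapq
import math


def similarity(a, b):
    """Assumed similarity: negative Euclidean distance (higher = more similar)."""
    return -math.dist(a, b)


def link_top_k(primary, secondary, k):
    """For every primary record, keep the k most similar secondary records.

    Each result is a list of (secondary_index, similarity) pairs,
    sorted from most to least similar.
    """
    if k > len(secondary):
        raise ValueError("k cannot exceed the number of secondary records")
    links = []
    for p in primary:
        scored = ((j, similarity(p, s)) for j, s in enumerate(secondary))
        links.append(heapq.nlargest(k, scored, key=lambda pair: pair[1]))
    return links


primary = [(0.0, 0.0)]
secondary = [(0.0, 1.0), (3.0, 4.0), (0.0, 0.5)]
print(link_top_k(primary, secondary, k=2))
```

Here the printed list holds the pairs `(2, -0.5)` and `(0, -1.0)`: record 2 is the closest match and record 0 the second closest, while the distant record 1 is dropped.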

FedSim without adding noise

The scripts that add no noise are located under src/ and follow the naming format src/train_<dataset>_<algorithm>.py. Run each script with

python src/train_<dataset>_<algorithm>.py [-g gpu_index] [-p perturbed_noise_on_similarity] [-k number_of_neighbors] [--mlp-merge] [-ds] [-dw]

  • -g/--gpu: GPU index to run this script. If the GPU with this index is not available, the CPU is used instead.
  • -k/--top-k: Number of neighbors to extract from possible matches, which should be less than the value of "knn_k". ($K$ in the paper)
  • -p/--leak-p: The probability of leakage of bloom filters. ($\tau$ in the paper)
  • --mlp-merge: Replace the CNN merge model with an MLP merge model.
  • -ds/--disable-sort: Disable the sort gate.
  • -dw/--disable-weight: Disable the weight gate.
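The options above map naturally onto an `argparse` parser. The sketch below is a hypothetical reconstruction for readers writing their own wrapper, not the parser used in the repo's scripts; in particular, the default values are placeholders. `choose_device` (also hypothetical) mimics the documented GPU-to-CPU fallback using a plain GPU count, so it runs without PyTorch.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (defaults are placeholders)."""
    parser = argparse.ArgumentParser(description="Train FedSim on one dataset")
    parser.add_argument("-g", "--gpu", type=int, default=0,
                        help="GPU index; the CPU is used if it is unavailable")
    parser.add_argument("-p", "--leak-p", type=float, default=1.0,
                        help="probability of leakage of bloom filters (tau)")
    parser.add_argument("-k", "--top-k", type=int, default=None,
                        help="number of neighbors K (should be less than knn_k)")
    parser.add_argument("--mlp-merge", action="store_true",
                        help="replace the CNN merge model with an MLP merge model")
    parser.add_argument("-ds", "--disable-sort", action="store_true",
                        help="disable the sort gate")
    parser.add_argument("-dw", "--disable-weight", action="store_true",
                        help="disable the weight gate")
    return parser


def choose_device(gpu_index, num_gpus):
    """Return 'cuda:<i>' if GPU i exists, otherwise fall back to 'cpu'."""
    if 0 <= gpu_index < num_gpus:
        return f"cuda:{gpu_index}"
    return "cpu"


args = build_parser().parse_args(["-g", "1", "-p", "1e0", "-k", "5", "-ds"])
print(args.top_k, args.disable_sort, choose_device(args.gpu, num_gpus=1))
```

With a single GPU (index 0), requesting `-g 1` falls back to the CPU, so the last line prints `5 True cpu`. In a real script you would pass `torch.cuda.device_count()` as `num_gpus`.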

Taking the house dataset as an example,

python src/train_beijing_fedsim.py -g 1 -p 1e0 -k 5 -ds

runs FedSim on the house dataset with $\tau=1$ (no added noise) and $K=5$, merging with the CNN model, disabling the sort gate, and enabling the weight gate.
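To repeat the example run for several values of $K$, a small driver can assemble the command lines and pass them to `subprocess`. This is a sketch under stated assumptions: the helpers `build_commands` and `run_all` are hypothetical, the $K$ values are arbitrary (each should stay below the script's `knn_k`), and the script path follows the `src/train_<dataset>_<algorithm>.py` naming format.

```python
import subprocess
import sys


def build_commands(dataset="beijing", algorithm="fedsim", gpu=1, leak_p="1e0",
                   ks=(1, 3, 5), extra_flags=("-ds",)):
    """Build one command line per K, following src/train_<dataset>_<algorithm>.py."""
    script = f"src/train_{dataset}_{algorithm}.py"
    return [
        [sys.executable, script, "-g", str(gpu), "-p", leak_p, "-k", str(k), *extra_flags]
        for k in ks
    ]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

`build_commands()` only builds the argument lists; call `run_all(build_commands())` from the repository root to actually launch the runs. Using `sys.executable` makes sure the scripts run with the interpreter of the active `fedsim` environment.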

FedSim with noise added

The scripts that add noise are located in src/priv_scripts and follow the same naming format as the scripts without noise; they differ only in some hyperparameter settings. You can run them with similar commands. For example,

python src/train_beijing_fedsim.py -g 1 -p 1e-2 -k 5 -ds

runs FedSim on the house dataset with noise added such that $\tau=0.01$, and $K=5$, merging with the CNN model, disabling the sort gate, and enabling the weight gate.
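The README states that the noise is perturbation on the similarities and that $\tau=1$ means no added noise, but it does not spell out how $\tau$ is turned into a noise level; that calibration is described in the paper. As a hedged illustration only, the sketch below perturbs similarity scores with independent zero-mean Gaussian noise of standard deviation `sigma`. Both `perturb_similarities` and `sigma` are hypothetical stand-ins, not the repo's API or its actual noise calibration.

```python
import random


def perturb_similarities(similarities, sigma, seed=None):
    """Add independent Gaussian noise N(0, sigma^2) to each similarity score.

    sigma is a hypothetical noise scale; FedSim derives its noise level from tau.
    A fixed seed makes the perturbation reproducible.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in similarities]


clean = [0.9, 0.4, 0.1]
noisy = perturb_similarities(clean, sigma=0.05, seed=0)
```

With `sigma=0` the scores come back unchanged, which matches the no-noise setting; larger values of `sigma` blur the similarity ranking that the linkage relies on.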

Citation

@inproceedings{NEURIPS2022_84b74416,
 author = {Wu, Zhaomin and Li, Qinbin and He, Bingsheng},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
 pages = {21087--21100},
 publisher = {Curran Associates, Inc.},
 title = {A Coupled Design of Exploiting Record Similarity for Practical Vertical Federated Learning},
 url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/84b744165a0597360caad96b06e69313-Paper-Conference.pdf},
 volume = {35},
 year = {2022}
}
