
feat(datasets) Add semantic partitioner #3663

Open
wants to merge 39 commits into main
Conversation

KarhouTam
Contributor

@KarhouTam KarhouTam commented Jun 21, 2024

New Feature

Implement the semantic partitioning scheme (named SemanticPartitioner).

The semantic partitioning scheme was proposed in What Do We Mean by Generalization in Federated Learning? (accepted at ICLR 2022).

(Cited from the paper)
The semantic partitioner's goal is to reverse-engineer the federated dataset-generating process so that each client possesses semantically similar data.
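At a high level, the scheme embeds each image with a pretrained network and then groups samples whose embeddings are close. A minimal sketch of that grouping step, using a plain NumPy k-means in place of the paper's Gaussian-mixture fitting and cross-label cluster matching (`semantic_partition` is an illustrative name, not the PR's actual API):

```python
import numpy as np

def semantic_partition(embeddings: np.ndarray, num_partitions: int,
                       num_iters: int = 10, seed: int = 0) -> list:
    """Group sample indices into partitions by clustering their embeddings."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen samples.
    init = rng.choice(len(embeddings), num_partitions, replace=False)
    centroids = embeddings[init].astype(float)
    for _ in range(num_iters):
        # Assign each sample to its nearest centroid.
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=-1
        )
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        for k in range(num_partitions):
            members = embeddings[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return [np.where(labels == k)[0] for k in range(num_partitions)]
```

Each returned array holds the sample indices of one partition, so clients end up with embedding-wise (i.e. semantically) similar data.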

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

Any other comments?

I should mention that this partitioner has a strong dependency on the PyTorch library, so I'm not sure whether that is acceptable.

This is my first time contributing a new feature to Flower.

Happy to see contributors' comments and suggestions ❤.

@KarhouTam KarhouTam force-pushed the semantic-partitioner branch 7 times, most recently from 35ce9fc to 2edf546 Compare June 21, 2024 09:20
@KarhouTam KarhouTam marked this pull request as ready for review June 21, 2024 09:41
@KarhouTam KarhouTam force-pushed the semantic-partitioner branch from c5758de to b4cfd2f Compare July 1, 2024 14:40
@KarhouTam
Contributor Author

Some checks failed due to the strong dependencies on PyTorch, scikit-learn, and SciPy.
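One common way to soften such hard dependencies is a lazy, guarded import that fails with a clear message only when the feature is actually used. This is a general pattern, not necessarily Flower's convention; `optional_import` is an illustrative helper name:

```python
import importlib

def optional_import(module_name: str, feature: str):
    """Import a module lazily; raise a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with, e.g., `pip install {module_name}`."
        ) from err
```

A partitioner could then call `optional_import("torch", "SemanticPartitioner")` inside `__init__` instead of importing PyTorch at module level, so the rest of the package stays importable without it.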

Comment on lines 265 to 271
```python
with torch.no_grad():
    for i in range(0, images.shape[0], self._batch_size):
        idxs = list(range(i, min(i + self._batch_size, images.shape[0])))
        batch = torch.tensor(images[idxs], dtype=torch.float, device=device)
        # Repeat grayscale (1-channel) images to 3 channels for the pretrained net.
        if batch.shape[1] == 1:
            batch = batch.broadcast_to((batch.shape[0], 3, *batch.shape[2:]))
        embedding_list.append(efficient_net(batch).cpu().numpy())
```
Contributor

I'm curious, what is the reason that you handle batches "by hand" instead of using DataLoader?

Contributor Author

I think using DataLoader requires converting the images to a torch.utils.data.Dataset first, and it is better suited to loading the data over multiple rounds. Here I just want to traverse the images once. Anyway, it's just my habit.
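The manual batching in question can be sketched framework-free. This toy version (assuming a NumPy array and a caller-supplied `embed` function, both hypothetical stand-ins) traverses the images exactly once, mirroring the loop in the PR:

```python
import numpy as np

def embed_in_batches(images: np.ndarray, batch_size: int, embed) -> np.ndarray:
    """Apply `embed` to successive slices of `images` in a single pass."""
    outputs = []
    for start in range(0, images.shape[0], batch_size):
        batch = images[start:start + batch_size]  # last batch may be smaller
        # Repeat grayscale (1-channel) inputs to 3 channels,
        # mirroring the broadcast in the reviewed code.
        if batch.shape[1] == 1:
            batch = np.broadcast_to(batch, (batch.shape[0], 3, *batch.shape[2:]))
        outputs.append(embed(batch))
    return np.concatenate(outputs, axis=0)
```

For a one-shot traversal like this, slicing by hand avoids wrapping the array in a Dataset; DataLoader mainly pays off when you need shuffling, workers, or repeated epochs.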

@adam-narozniak
Contributor

@KarhouTam I left a few more comments and questions.

@adam-narozniak
Contributor

I wanted to merge the code, but checking the results visually on the MNIST dataset doesn't seem to produce results where the same partition has images in the same style (D.3, Visualization of Semantically Partitioned MNIST Dataset, from the paper). I'm going to investigate further, but we should be able to produce something similar.

@KarhouTam
Contributor Author

Maybe some hyperparameter settings are not the same. I remember that I reproduced this partitioner from the official repo: https://github.com/google-research/federated/tree/master/generalization.

@adam-narozniak
Contributor

Hi, I haven't found the problem; however, until it is resolved, we won't be able to merge this PR. We need to ensure that we can reproduce these results.

@KarhouTam
Contributor Author

Hi, @adam-narozniak.
I've followed the source code and fixed some parts of the partitioner that processed data differently.

I also evaluated the partitioner with the same settings as in the paper; here is the graph:
[image: partition visualization plot]

@KarhouTam
Contributor Author

Hi, @adam-narozniak. I noticed that this PR has not been updated since my commits and comments on the partitioner's performance. Are there still any unresolved issues with this PR?

@adam-narozniak
Contributor

What parameters did you use? In the paper, the only directly stated setting I found was the number of clients (300), and it gives me this plot (which is not convincing):
[image: partition visualization plot]

@KarhouTam
Contributor Author

KarhouTam commented Nov 27, 2024

Hi, @adam-narozniak, I reviewed the source code from Google and noticed some parameter discrepancies between the paper and the code. I've updated them based on the official source code.

Could you please help test the latest version with 300 clients on MNIST? My laptop doesn't have enough memory to run the EfficientNet-B7 model with the 300-partition setting. Most parameters don't require any changes.
