feat(datasets) Add semantic partitioner #3663
base: main
Conversation
Force-pushed from 35ce9fc to 2edf546.
Force-pushed from c5758de to b4cfd2f.
Some checks failed due to the strong dependency on PyTorch.
datasets/flwr_datasets/partitioner/image_semantic_partitioner.py (outdated review comments, resolved)
with torch.no_grad():
    for i in range(0, images.shape[0], self._batch_size):
        idxs = list(range(i, min(i + self._batch_size, images.shape[0])))
        batch = torch.tensor(images[idxs], dtype=torch.float, device=device)
        if batch.shape[1] == 1:
            batch = batch.broadcast_to((batch.shape[0], 3, *batch.shape[2:]))
        embedding_list.append(efficient_net(batch).cpu().numpy())
I'm curious, what is the reason you handle batches "by hand" instead of using DataLoader?
I think using DataLoader requires converting images to a torch.utils.data.Dataset first, and it is better suited to loading data over multiple rounds. Here I just want to traverse images once. Anyway, it's just my habit.
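For comparison, here is a minimal sketch of the DataLoader-based alternative discussed above. It is an illustration, not the PR's code: images is assumed to be a NumPy array of shape (N, C, H, W), batch_size stands in for self._batch_size, and device and efficient_net are reused from the snippet under review.

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Wrap the raw array in a TensorDataset so DataLoader handles the batching.
dataset = TensorDataset(torch.tensor(images, dtype=torch.float))
loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

embedding_list = []
with torch.no_grad():
    for (batch,) in loader:  # a TensorDataset yields 1-tuples
        batch = batch.to(device)
        if batch.shape[1] == 1:
            # Grayscale input: broadcast the single channel to 3 channels for EfficientNet.
            batch = batch.broadcast_to((batch.shape[0], 3, *batch.shape[2:]))
        embedding_list.append(efficient_net(batch).cpu().numpy())
embeddings = np.concatenate(embedding_list, axis=0)

Both versions produce the same embeddings; the DataLoader route adds a bit of boilerplate but would make it easier to plug in num_workers or shuffling later.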
@KarhouTam I left a few more comments and questions.
add comments
change its type hint to Optional[int]
move seed args to the end
I wanted to merge the code, but checking the results visually on the MNIST dataset doesn't seem to produce results where the same partition has images in the same style (see D.3, Visualization of Semantically Partitioned MNIST Dataset, in the paper). I'm going to investigate it further, but we should be able to produce something similar.
Maybe some hyperparameter settings are not the same. I remember that I reproduced this partitioner from the official repo: https://github.com/google-research/federated/tree/master/generalization.
Hi, I haven't found the problem yet; however, until that is resolved, we won't be able to merge this PR. We need to ensure that we can reproduce these results.
Hi, @adam-narozniak. I also evaluated the partitioner with the same settings as in the paper, and here is the resulting graph:
Hi, @adam-narozniak. I noticed that this PR has not been updated since my commits and comments on the partitioner's performance. Are there still any unresolved issues with this PR?
Hi, @adam-narozniak, I reviewed the source code from Google and noticed some parameter discrepancies between the paper and the code. I've updated them based on the official source code. Could you please help test the latest version with 300 clients on MNIST? My laptop doesn't have enough memory to handle the EfficientNet-B7 model with the 300-partition setting. Most parameters don't require any changes.
New Feature
Implement the semantic partitioning scheme (named SemanticPartitioner). The semantic partitioning scheme is proposed in "What Do We Mean by Generalization in Federated Learning?" (accepted at ICLR 2022).
(Cited from the paper)
Semantic partitioner's goal is to reverse-engineer the federated dataset-generating process so that each client possesses semantically similar data.
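To make the intended usage concrete, here is a hedged sketch of how the partitioner would plug into flwr-datasets. The constructor arguments shown (num_partitions, partition_by, seed) are illustrative assumptions and may differ from the final signature in this PR.

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import SemanticPartitioner

# Argument names below are assumptions for illustration, not the final API.
partitioner = SemanticPartitioner(
    num_partitions=300,   # e.g. the 300-client MNIST setting discussed above
    partition_by="label",
    seed=42,
)
fds = FederatedDataset(dataset="mnist", partitioners={"train": partitioner})
partition = fds.load_partition(0)  # samples in a partition should be semantically similar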
Checklist
(#contributions)
Any other comments?
I have to mention that this partitioner has a strong dependency on the PyTorch library, so I don't know if this is allowed or not.
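One common way to soften such a hard dependency, sketched here only as a suggestion (not what this PR currently does), is to import torch lazily and fail with a clear message when it is missing:

def _check_torch_available() -> None:
    """Raise an informative error if the optional PyTorch dependency is missing."""
    try:
        import torch  # noqa: F401  # heavy, optional dependency
    except ImportError as err:
        raise ImportError(
            "SemanticPartitioner requires PyTorch. "
            "Install it with `pip install torch` (and torchvision for EfficientNet)."
        ) from err

The partitioner could call such a helper at construction time so that the rest of flwr-datasets stays importable without PyTorch.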
It's my first time contributing a new feature to Flower.
Happy to see contributors' comments and suggestions ❤.