Only extracting part of the intermediate feature with DataParallel #1
Comments
I found some discussion of this hooks-with-DataParallel issue: https://discuss.pytorch.org/t/aggregating-the-results-of-forward-backward-hook-on-nn-dataparallel-multi-gpu/28981

I made a quick fix by changing the line that stores the output inside the forward hook to

```python
feature_maps[str(input[0].device)][module_name] = output
```

and torchextractor/torchextractor/extractor.py line 122 in b48ea75 to

```python
# nested dictionary
from collections import defaultdict
self.feature_maps = defaultdict(lambda: defaultdict(dict))
```

Now the test example will output the features from each device:

```python
print(features_gpu['cuda:0']["module.layer1"].shape)
print(features_gpu['cuda:1']["module.layer1"].shape)
# torch.Size([4, 64, 56, 56])
# torch.Size([4, 64, 56, 56])
```

Can you please address this issue in torchextractor? Thanks.
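In other words, the idea is to key the captured outputs by the device that produced them. A standalone sketch of that per-device capture (not torchextractor's actual internals; the hook and variable names here are illustrative):

```python
from collections import defaultdict

# Each DataParallel replica runs the hook on its own device, so keying the
# captured outputs by the device of the hook's input keeps the replicas from
# overwriting each other's entries.
feature_maps = defaultdict(dict)  # {"cuda:0": {module_name: tensor}, ...}

def make_hook(module_name):
    def hook(module, input, output):
        device = str(input[0].device)              # e.g. "cuda:0" or "cuda:1"
        feature_maps[device][module_name] = output
    return hook

# Usage (assuming `model` is wrapped in nn.DataParallel):
# model.module.layer1.register_forward_hook(make_hook("module.layer1"))
```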
Hi @wydwww!
@antoinebrl Thanks. Please see my updated reply. I missed a line in my previous fix.
Gentle ping @antoinebrl.
Wondering whether torchextractor can also work with DistributedDataParallel. Thanks.
Hi @antoinebrl,

I am using torch.nn.DataParallel on a 2-GPU machine with a batch size of N. Data-parallel training splits the input batch into 2 chunks and sends one chunk to each GPU. When using torchextractor to obtain an intermediate feature, the input size and the output size are both N as expected, but the feature size becomes N/2. Does this mean we only extract the features from one GPU? I'm not sure, since I couldn't find an exact match.

Can you please explain why this happens? Maybe the intended behavior is to return features from all GPUs, or from a specified one?

A minimal example to reproduce: