Calling .to(device) on classification loss functions, or calling .to(device) on the parameters inside forward() #139

Closed
fleawood opened this issue Jul 14, 2020 · 2 comments
fleawood commented Jul 14, 2020
Hello, I have been using this library recently, and I found a bug in ProxyAnchorLoss:

class ProxyAnchorLoss(WeightRegularizerMixin, BaseMetricLossFunction):
    def __init__(self, num_classes, embedding_size, margin=0.1, alpha=32, **kwargs):
        super().__init__(**kwargs)
        self.proxies = torch.nn.Parameter(torch.randn(num_classes, embedding_size))
        torch.nn.init.kaiming_normal_(self.proxies, mode='fan_out')
        self.num_classes = num_classes
        self.margin = margin
        self.alpha = alpha

    def compute_loss(self, embeddings, labels, indices_tuple):
        miner_weights = lmu.convert_to_weights(indices_tuple, labels).unsqueeze(1)
        prox = torch.nn.functional.normalize(self.proxies, p=2, dim=1) if self.normalize_embeddings else self.proxies
        cos = lmu.sim_mat(embeddings, prox)

self.proxies is defined on line 14, but it is never explicitly moved to the same device as embeddings. An error is raised when I use CUDA:

Traceback (most recent call last):
  File "base.py", line 130, in <module>
    main()
  File "base.py", line 126, in main
    trainer.train(num_epochs=num_epochs)
  File "/usr/local/lib/python3.7/site-packages/pytorch_metric_learning/trainers/base_trainer.py", line 85, in train
    self.forward_and_backward()
  File "/usr/local/lib/python3.7/site-packages/pytorch_metric_learning/trainers/base_trainer.py", line 112, in forward_and_backward
    self.calculate_loss(self.get_batch())
  File "/usr/local/lib/python3.7/site-packages/pytorch_metric_learning/trainers/metric_loss_only.py", line 12, in calculate_loss
    self.losses["metric_loss"] = self.maybe_get_metric_loss(embeddings, labels, indices_tuple)
  File "/usr/local/lib/python3.7/site-packages/pytorch_metric_learning/trainers/metric_loss_only.py", line 16, in maybe_get_metric_loss
    return self.loss_funcs["metric_loss"](embeddings, labels, indices_tuple)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pytorch_metric_learning/losses/base_metric_loss_function.py", line 37, in forward
    loss_dict = self.compute_loss(embeddings, labels, indices_tuple)
  File "/usr/local/lib/python3.7/site-packages/pytorch_metric_learning/losses/proxy_anchor_loss.py", line 23, in compute_loss
    cos = lmu.sim_mat(embeddings, prox)
  File "/usr/local/lib/python3.7/site-packages/pytorch_metric_learning/utils/loss_and_miner_utils.py", line 27, in sim_mat
    return torch.matmul(x, y.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat2' in call to _th_mm

My solution is to add this one line of code before sim_mat is called:

prox = prox.to(embeddings.device)
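
For context, here is the same compute_loss fragment with that cast in place; apart from the added line, it matches the snippet above:

    def compute_loss(self, embeddings, labels, indices_tuple):
        miner_weights = lmu.convert_to_weights(indices_tuple, labels).unsqueeze(1)
        prox = torch.nn.functional.normalize(self.proxies, p=2, dim=1) if self.normalize_embeddings else self.proxies
        prox = prox.to(embeddings.device)  # proposed fix: match the embeddings' device
        cos = lmu.sim_mat(embeddings, prox)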
KevinMusgrave changed the title from "A bug in ProxyAnchorLoss" to "Calling .to(device) on classification loss functions, or calling .to(device) on the parameters inside forward()" on Jul 14, 2020
KevinMusgrave (Owner) commented Jul 14, 2020

Normally I move the loss function to the GPU:

loss_func = ProxyAnchorLoss(num_classes=num_classes, embedding_size=embedding_size)
loss_func = loss_func.to(torch.device('cuda'))

However, I can also do what you've suggested, to make it more convenient. If I make this change, it'll apply to all loss functions with a weight matrix (ArcFace, NormalizedSoftmaxLoss, etc.).
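
The same two-step pattern would carry over to those losses. A minimal sketch with NormalizedSoftmaxLoss, assuming it accepts num_classes and embedding_size keyword arguments like ProxyAnchorLoss above (the sizes are placeholders):

import torch
from pytorch_metric_learning.losses import NormalizedSoftmaxLoss

# Hypothetical sizes, for illustration only.
loss_func = NormalizedSoftmaxLoss(num_classes=100, embedding_size=128)
loss_func = loss_func.to(torch.device('cuda'))  # the weight matrix now lives on the GPU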

KevinMusgrave added the enhancement and question labels and removed the question label on Jul 14, 2020
KevinMusgrave (Owner) commented Jul 31, 2020

This is fixed in v0.9.90.dev0:

pip install pytorch-metric-learning==0.9.90.dev0

In my opinion, it's still better to move the loss function to the device like in my previous comment. This is because it should be on the device before you create the loss function's optimizer.
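
A minimal sketch of that ordering, with placeholder sizes and learning rate:

import torch
from pytorch_metric_learning.losses import ProxyAnchorLoss

device = torch.device('cuda')

# Move the loss function (and its proxies) to the device first...
loss_func = ProxyAnchorLoss(num_classes=100, embedding_size=128).to(device)
# ...then create its optimizer, so the optimizer tracks the on-device parameters.
loss_optimizer = torch.optim.SGD(loss_func.parameters(), lr=0.01)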
