spspmm raises error in cuda but works well in cpu #3097
Thanks for reporting. Interestingly, it works for me. Can you show me the outputs when modified as follows?

```python
for i in range(1, depth + 1):
    print(edge_index.shape)
    print(edge_index.min(), edge_index.max(), x.size(0))
    edge_index, edge_weight = self.augment_adj(edge_index, edge_weight,
                                               x.size(0))
    print(edge_index.shape)
    print(edge_index.min(), edge_index.max(), x.size(0))
    print('----------')
    ...
```

---
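The prints above check that every node index in `edge_index` stays within `[0, x.size(0))` before and after `augment_adj`. For illustration only, here is a minimal, framework-free sketch of that invariant check; `check_edge_index` is a hypothetical helper, and plain lists stand in for the `2 x E` tensor used by PyTorch Geometric:

```python
def check_edge_index(edge_index, num_nodes):
    """Verify that all node indices in a COO edge list are in range.

    edge_index: a pair of equal-length lists (rows, cols), mimicking the
    2 x E tensor used by PyTorch Geometric. num_nodes: node count N.
    """
    rows, cols = edge_index
    assert len(rows) == len(cols), "edge_index must be 2 x E"
    for idx in rows + cols:
        # Every edge endpoint must reference an existing node.
        assert 0 <= idx < num_nodes, f"index {idx} out of range [0, {num_nodes})"
    return True

# A tiny 3-node graph with edges 0->1, 1->2, 2->0: all indices in range.
print(check_edge_index(([0, 1, 2], [1, 2, 0]), num_nodes=3))  # True
```

If any index falls outside the valid range after `augment_adj`, downstream CUDA kernels can read out of bounds, which would explain a GPU-only crash.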
Test the failed case as follows:

```python
device = torch.device('cuda')
model = MyNet(3, 64, 4, 0.5, 3).to(device)
data = torch.load('failed.pt').to(device)
y = model(data.x, data.edge_index)
```

The outputs of the failed case using the CPU are:

The code raises an error when using CUDA:

---
This is weird, as your outputs do not violate the

---
The bug occurs in the function `spspmm_sum`:

```python
def spspmm_sum(src: SparseTensor, other: SparseTensor) -> SparseTensor:
    assert src.sparse_size(1) == other.sparse_size(0)
    rowptrA, colA, valueA = src.csr()
    rowptrB, colB, valueB = other.csr()
    value = valueA
    if valueA is not None and valueA.dtype == torch.half:
        valueA = valueA.to(torch.float)
    if valueB is not None and valueB.dtype == torch.half:
        valueB = valueB.to(torch.float)
    M, K = src.sparse_size(0), other.sparse_size(1)
    rowptrC, colC, valueC = torch.ops.torch_sparse.spspmm_sum(
        rowptrA, colA, valueA, rowptrB, colB, valueB, K)
    print('--------------')
    print('A', rowptrA.shape, rowptrA.max().item(), colA.max().item(), valueA.max().item())
    print('B', rowptrB.shape, rowptrB.max().item(), colB.max().item(), valueB.max().item())
    print('C', rowptrC.shape, rowptrC.max().item(), colC.max().item(), valueC.max().item())
    print('--------------')
    if valueC is not None and value is not None:
        valueC = valueC.to(value.dtype)
    return SparseTensor(row=None, rowptr=rowptrC, col=colC, value=valueC,
                        sparse_sizes=(M, K), is_sorted=True)
```
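For reference, the CSR sparse-sparse product `C = A @ B` that the kernel computes can be sketched in pure Python. This is an illustrative reference implementation, not the actual `torch_sparse` code; the assertions at the end state the invariants any correct result must satisfy (and which a buggy CUDA output would violate):

```python
def spspmm_sum_csr(rowptrA, colA, valA, rowptrB, colB, valB, M, K):
    """Reference CSR sparse-sparse matmul: C = A @ B.

    A is M x N, B is N x K, both given as CSR triples
    (row pointers, column indices, values). Returns C as a CSR triple.
    """
    rowptrC, colC, valC = [0], [], []
    for i in range(M):
        acc = {}  # column index -> accumulated value for row i of C
        for jA in range(rowptrA[i], rowptrA[i + 1]):
            j, vA = colA[jA], valA[jA]
            # Row j of B contributes vA * B[j, k] to C[i, k].
            for jB in range(rowptrB[j], rowptrB[j + 1]):
                k = colB[jB]
                acc[k] = acc.get(k, 0.0) + vA * valB[jB]
        for k in sorted(acc):
            colC.append(k)
            valC.append(acc[k])
        rowptrC.append(len(colC))
    # Invariants a correct result must satisfy:
    assert len(rowptrC) == M + 1
    assert all(rowptrC[i] <= rowptrC[i + 1] for i in range(M))
    assert all(0 <= k < K for k in colC)
    return rowptrC, colC, valC

# Sanity check: the 2x2 identity times itself is the identity.
print(spspmm_sum_csr([0, 1, 2], [0, 1], [1.0, 1.0],
                     [0, 1, 2], [0, 1], [1.0, 1.0], 2, 2))
```

Comparing the debug prints above against these invariants (e.g. `colC.max() < K`, `rowptrC` nondecreasing) is one way to tell whether the CUDA kernel produced a structurally invalid result.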
The CPU outputs are:

The CUDA outputs are:

The CUDA

It can be found that

---
Thank you very much! Really helpful. Can you let me know how you tried to install

---
I installed torch-sparse via pip as described in the PyG documentation, and I also tried to install it from source. The problem occurs in both ways.

---
Hi, I have exactly the same issue with CUDA 10.2, PyTorch 1.9.0, and Python 3.7. On what combination of CUDA/PyTorch/Python does this bug not occur?

---
As far as I know, this may be dependent on the GPU (see rusty1s/pytorch_sparse#174), but I'm not entirely sure :(

---

Thanks!

---
I also encounter this exact same bug using the provided snippet, and also using the vanilla Graph U-Net from the repository. Reproduced with an RTX 3080, CUDA 11.3, PyTorch 1.10.1, Python 3.8.

---
If you have data + a code snippet to reproduce, please let me know :)

---
I have the exact same issue when I use a Titan RTX and an RTX 3090. Is there any way to solve it?

---
@wrccrwx @kimkyusik It's really a bummer that I cannot reproduce this issue. I'm really sorry. I basically followed the instructions from the

Any chance you can debug where our routine crashes by installing

---
🐛 Bug
To Reproduce
The net is similar to Graph U-Net, but has only downsampling blocks. The code is:
Test MyNet as follows. The test data can be downloaded from Google Drive.
Expected behavior
The error log is:

When I set `model.eval()` or `device='cpu'`, the code works well.

Environment
Additional context