
Problem of implementation of InstanceNorm1d in PyTorch Geometric #705

Closed
WMF1997 opened this issue Sep 27, 2019 · 4 comments

Comments

@WMF1997

WMF1997 commented Sep 27, 2019

❓ Questions & Help

Hello @rusty1s @Magnetar99,

I managed to implement the InstanceNorm1d shown in #687 and wrote a small demo program. Here is my code.

My implementation and an error

My code

# layers_1.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_geometric
import torch_geometric.nn as gnn


import torch_scatter

# class GraphBatch(nn.Module):
#     def __init__(self):

#     def forward(self, x, edge_index):


if __name__ == '__main__':
    torch.manual_seed(0)
    x = torch.randn(30, 40)  # 30 nodes, each with 40 features
    batch = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
    # 30 nodes assigned to 3 different graphs (torch.repeat_interleave would make this cleaner)
    edge_index_0 = torch.randint(0, 12, [2, 40])
    edge_index_1 = torch.randint(12, 18, [2, 20])
    edge_index_2 = torch.randint(18, 30, [2, 50])
    edge_index = torch.cat((edge_index_0, edge_index_1, edge_index_2), dim=1)
    p = torch.randperm(110)
    edge_index_ = edge_index[:, p]  # shuffle the edge_index columns; note that the features are not shuffled (just a simple test)
    mu = torch_scatter.scatter_mean(x, batch, dim=0)
    sigma = torch_scatter.scatter_std(x, batch, dim=0)
    x_mu, x_sigma = mu[batch], sigma[batch]
    x_norm = (x - x_mu.unsqueeze(1)) / x_sigma
    x0 = x_norm[:12]
    x1 = x_norm[12:18]
    x2 = x_norm[18]

This is my code, and I think it follows the code provided in #687.

The error

When I checked what was happening, i.e. what the mean and std actually are, I found a problem. 👇 Screenshot:
[screenshot of the computed mean and std]

What it should be

As we all know, after normalization the output's mean should be 0 and the output's standard deviation should be 1. Why does this kind of thing happen? (I know this can be regarded as a small problem, since there is only a slight difference from what it should be, but I just wonder why.)

Are there any papers, or ideas, about normalization on graph data?

I know that in Thomas Kipf's GCN paper, A+I is called the "renormalization trick", but perhaps that trick is not the normalization in this issue. So I just want to ask whether there are papers about normalization of a graph's "features" (not normalization of the "structure", i.e. methods like the renormalization trick).

Appendix: mean and std in NumPy and PyTorch

While looking into the reasons, I tried mean and std in NumPy and PyTorch.
Take the "simple" 1D array/tensor [1., 2., 3., 4.] as an example.

NumPy

import numpy as np
x_np = np.array([1., 2., 3., 4.])
print (x_np.mean())
print (x_np.std())
print ()
x_mean_ = x_np.sum() / np.prod(x_np.shape)
print (x_mean_)
x_std_ = (np.sum((x_np - x_mean_)**2) / np.prod(x_np.shape)) ** 0.5  # NumPy's std divides by N
print (x_std_)

Screenshot 👇
[screenshot of the output]

PyTorch

import torch
x_t = torch.tensor([1., 2., 3., 4.])
print (x_t.mean())
print (x_t.std())
x_mean_ = x_t.sum() / 4
x_std_ = (torch.sum((x_t - x_mean_)**2) / (4 - 1)) ** 0.5  # NOTE(WMF): -1!!!
print (x_mean_)
print (x_std_)

Screenshot 👇
[screenshot of the output]
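For what it's worth, the gap between NumPy and PyTorch here is just the divisor: NumPy's std divides by N by default, while PyTorch's std divides by N - 1 (the unbiased sample estimate). A quick cross-check (my own snippet, using the ddof / unbiased keyword arguments):

import numpy as np
import torch

x_np = np.array([1., 2., 3., 4.])
x_t = torch.tensor([1., 2., 3., 4.])

# both divide by N (population std), roughly 1.1180
print (x_np.std(ddof=0), x_t.std(unbiased=False).item())
# both divide by N - 1 (sample std), roughly 1.2910
print (x_np.std(ddof=1), x_t.std(unbiased=True).item())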

pytorch_scatter

import torch
import torch_scatter
x = torch.tensor([1., 2., 3., 4.])
x_ = x.unsqueeze(0)
xT = x_.transpose(1, 0)
batch_0 = torch.tensor([0, ])  # NOTE(WMF): I know that there is ONLY ONE node in the graph, with 4 features. 
batch_1 = torch.tensor([0,0,0,0])  # NOTE(WMF): In another view

# the following 2 are wrong examples... DO NOT USE LIKE THAT... 
x_mean_0 = torch_scatter.scatter_mean(x_, batch_0, dim=0)
x_std_0 = torch_scatter.scatter_std(x_, batch_0, dim=0)

print (x_mean_0)
print (x_std_0)
print ()

# this time can be all right
x_mean_1 = torch_scatter.scatter_mean(xT, batch_1, dim=0)
x_std_1 = torch_scatter.scatter_std(xT, batch_1, dim=0)

print (x_mean_1)
print (x_std_1)
print ()


# or it can be written as:
x_mean_2 = torch_scatter.scatter_mean(x_, batch_1, dim=1)
x_std_2 = torch_scatter.scatter_std(x_, batch_1, dim=1)

print (x_mean_2)
print (x_std_2)
print ()

Screenshot 👇 (sorry, this is too long, and I did not capture the code part)
[screenshot of the output]

What I want to express here is:
In image normalization (Batch Normalization or Instance Normalization), we sum over all pixel values and then compute the statistics. But here, when I call scatter_std, the feature dimension is not reduced (only the dimension given by dim is reduced)? See the sketch below.

And I think that dividing by 4 versus (4 - 1) is not the key point, since that is a question of statistics or numerical analysis (as n → ∞ the two estimates converge to the same value).
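To make the difference concrete, here is a small sketch of my own (not from #687; the E[x^2] - E[x]^2 route is just one possible way to do it) that contrasts the per-graph, per-feature statistics scatter_mean/scatter_std return with per-graph statistics that also reduce the feature dimension, as in image-style instance normalization:

import torch
import torch_scatter

torch.manual_seed(0)
x = torch.randn(5, 3)                    # 5 nodes, 3 features
batch = torch.tensor([0, 0, 0, 1, 1])    # two graphs with 3 and 2 nodes

# per-graph, per-feature statistics: the feature dimension survives
mu_pf = torch_scatter.scatter_mean(x, batch, dim=0)      # shape [2, 3]

# per-graph statistics over nodes *and* features, via sigma^2 = E[x^2] - E[x]^2
# (this is the biased, divide-by-N estimate)
mu = mu_pf.mean(dim=1, keepdim=True)                                      # shape [2, 1]
ex2 = torch_scatter.scatter_mean(x * x, batch, dim=0).mean(dim=1, keepdim=True)
sigma = (ex2 - mu * mu).clamp(min=0).sqrt()

x_norm = (x - mu[batch]) / sigma[batch]  # each graph now has mean ~0 and std ~1 over all its entries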

Final note:

#684 may be meaningless, and some of the thoughts there may have no use (perhaps some of them are too far ahead of their time, i.e. few people will pay attention to them), so I closed that issue. Many apologies.

Yours sincerely,
@WMF1997

@rusty1s
Member

rusty1s commented Sep 27, 2019

Hi,
There is a weird unsqueeze(1) in your computation which yields incorrect results. Computing x_norm via

x_norm = (x - x_mu) / x_sigma

yields the correct results. Regarding normalization, it can be applied by normalizing the aggregation (as in GCN) or applied afterwards on the node features (like in GIN). For normalization on node features, I have yet to see anything other than BatchNorm.
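For reference, the whole per-graph normalization could be wrapped up like this (a sketch, not PyG's InstanceNorm; the helper name instance_norm and the eps term are my own additions):

import torch
import torch_scatter

def instance_norm(x, batch, eps=1e-5):
    # x: [num_nodes, num_features], batch: [num_nodes] graph assignment
    mu = torch_scatter.scatter_mean(x, batch, dim=0)     # [num_graphs, num_features]
    sigma = torch_scatter.scatter_std(x, batch, dim=0)   # [num_graphs, num_features]
    return (x - mu[batch]) / (sigma[batch] + eps)        # broadcasting, no unsqueeze needed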

@WMF1997
Author

WMF1997 commented Sep 27, 2019

Hello @rusty1s,

An apology

First, I apologize for that unsqueeze(1); I tried some other methods but forgot to change it back. Now it is fixed. Is the result okay?

In [3]: x_norm.mean()
Out[3]: tensor(-1.0928e-09)

In [4]: x_norm.std()
Out[4]: tensor(0.9491)

In [6]: x0.mean()
Out[6]: tensor(0.0016)

In [7]: x0.std()
Out[7]: tensor(0.9515)

In [8]: x1.mean()
Out[8]: tensor(0.0005)

In [9]: x1.std()
Out[9]: tensor(0.9334)

In [10]: x2.mean()
Out[10]: tensor(-0.0018)

In [11]: x2.std()
Out[11]: tensor(0.9564)

But... the mean is only close to 0.0, and the std is only close to 1.0. That is what I wonder about. I guess it is computational error? Or something else?

@rusty1s
Member

rusty1s commented Sep 27, 2019

Well, first of all, you index the wrong features IMO:

x0 = x_norm[:12]
x1 = x_norm[12:18]
x2 = x_norm[18]

In addition, for the std to be exactly 1 you need many more samples. You get the same numerical instabilities when using regular mean() and std() calls.
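For example, the sample std of standard-normal data only approaches 1 as the sample size grows (a quick illustration):

import torch

torch.manual_seed(0)
for n in (10, 100, 10_000, 1_000_000):
    print(n, torch.randn(n).std().item())  # drifts towards 1.0 as n grows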

@WMF1997
Author

WMF1997 commented Sep 27, 2019

Oops! I got it!
I should have used

batch = torch.repeat_interleave(torch.tensor([12, 6, 12]))

and after I fixed it, all the means are all right.

So... this is all right now. I should be more careful next time.
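For the record, a quick check (my own snippet) of what that batch vector looks like:

import torch

batch = torch.repeat_interleave(torch.tensor([12, 6, 12]))
print(batch)  # twelve 0s, six 1s, twelve 2s (graph sizes 12, 6 and 12)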
