S4 Listops have nan loss #138

Open
lthilnklover opened this issue Feb 28, 2024 · 5 comments

@lthilnklover

lthilnklover commented Feb 28, 2024

First of all, thank you for the comprehensive code base for all variants of S4 models.

However, when I run the ListOps experiment with S4 (HYYT version), the train, validation, and test losses all become NaN after one epoch.

I ran the following script:

python -m train experiment=lra/s4-listops wandb=null

The final accuracy is also way below the reported accuracy (train=0.17).

Have I done something wrong?

@radarFudan

I ran into the same problem, and decreasing the learning rate by a factor of 10 did not fix it.

@bngcode

bngcode commented Jun 13, 2024

Same problem here. I am using a completely different dataset, for audio processing. I extracted the S4ND and S4 layers into a different neural network architecture, and I also got NaN after one epoch because self.log_dt in SSKernelNPLR is NaN. I believe this must have happened during backpropagation, because it is not updated otherwise.
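One way to check whether a NaN enters through the backward pass is to scan every parameter and its gradient right after `backward()` and before the optimizer step. The following is a minimal sketch, not code from this repository: `nonfinite_report` and the `Toy` module are hypothetical names, and `Toy` only imitates a layer with a `log_dt` parameter so that a NaN gradient can be produced on purpose.

```python
import torch
from torch import nn


def nonfinite_report(model: nn.Module) -> list:
    """Return names of parameters whose values or gradients contain NaN/Inf."""
    bad = []
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            bad.append(f"{name} (value)")
        if p.grad is not None and not torch.isfinite(p.grad).all():
            bad.append(f"{name} (grad)")
    return bad


class Toy(nn.Module):
    """Hypothetical stand-in for a layer with a log step size (not SSKernelNPLR)."""

    def __init__(self):
        super().__init__()
        self.log_dt = nn.Parameter(torch.tensor([0.0]))

    def forward(self, x):
        dt = torch.exp(self.log_dt)
        # sqrt has an infinite derivative at 0, so for x == 0 the chain rule
        # multiplies inf by 0 and the gradient reaching log_dt is NaN.
        return torch.sqrt(x * dt)


model = Toy()
loss = model(torch.tensor([0.0])).sum()
loss.backward()

# The forward value is finite; only the gradient is flagged.
report = nonfinite_report(model)
print(report)
```

Calling a check like this after each `loss.backward()` shows whether the gradient is already non-finite before the parameter itself turns into NaN. PyTorch's `torch.autograd.set_detect_anomaly(True)` can additionally point at the backward operation that first produced the NaN, at the cost of slower training.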

@albertfgu
Contributor

Sorry for not responding to this. I don't know why this is happening. I haven't revisited these experiments in a long time, but I'm quite confident that they were reproducible in the past. Perhaps something has changed in the libraries, or perhaps there are numerical issues on certain hardware.

@icannotnamemyself

icannotnamemyself commented Dec 2, 2024

Same problem. Here is a workaround that changes the SSKernelNPLR class:

with torch.no_grad():
    # Increase the internal length if needed
    while rate * L > self.L:
        self.double_length()
    dt = torch.exp(self.log_dt) * rate
    B = _r2c(self.B)
    C = _r2c(self.C)
    P = _r2c(self.P)
    Q = P.conj() if self.Q is None else _r2c(self.Q)
    w = self._w()

I don't know whether this hurts performance, but at least no NaN has been reported since.
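For context on what the workaround above changes: `torch.no_grad()` stops autograd from recording the operations inside the block, so tensors created there carry no gradient history. The sketch below shows this on two toy parameters; `log_dt` and `scale` are illustrative names only and this is not the repository's code.

```python
import torch
from torch import nn

log_dt = nn.Parameter(torch.tensor([-3.0]))
scale = nn.Parameter(torch.tensor([2.0]))

with torch.no_grad():
    # No graph is recorded here, so dt is a plain tensor detached from log_dt.
    dt = torch.exp(log_dt)

# scale is used outside the block, so it still participates in autograd.
out = (scale * dt).sum()
out.backward()

print(dt.requires_grad)  # False
print(log_dt.grad)       # None: no gradient ever reaches log_dt
print(scale.grad)        # equals dt, since d(scale * dt)/d(scale) = dt
```

Assuming the kernel uses log_dt, B, C, P and w only through the tensors created inside that block (not verified against the repository here), those parameters would receive no gradient and keep their initial values. That would explain why the NaN disappears, and it would also mean the SSM parameters are no longer being trained, which may be where any performance cost comes from.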

@chreissel

I have the same problem. @icannotnamemyself, could you comment a bit more on your solution? I am not sure I understand exactly the reasoning or where to make the modifications.
