S4 Listops have nan loss #138
Comments
I came across the same problem, and decreasing the learning rate by a factor of 10 does not fix it.
Same problem here. I am using a completely different dataset, for audio processing. I extracted the S4ND and S4 layers into a different neural network architecture, and I also got NaN after one epoch, because `self.log_dt` in `SSKernelNPLR` becomes NaN. This must happen during backpropagation, since it is not updated otherwise (I believe).
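To confirm where the NaN first appears, one option is PyTorch's anomaly detection plus a per-step check of parameters and gradients. This is a hypothetical debugging sketch, not code from the S4 repo; `nonfinite_params` is a helper name introduced here:

```python
import torch

# Raise an error at the first backward op that produces NaN (slow; debugging only)
torch.autograd.set_detect_anomaly(True)

def nonfinite_params(model):
    """Return names of parameters whose values or gradients are non-finite."""
    bad = []
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            bad.append(name)
        elif p.grad is not None and not torch.isfinite(p.grad).all():
            bad.append(name)
    return bad

# Hypothetical usage: call after loss.backward() on each training step
model = torch.nn.Linear(4, 1)
model(torch.randn(2, 4)).sum().backward()
print(nonfinite_params(model))  # [] while training is healthy
```

Calling this right after `backward()` (before `optimizer.step()`) distinguishes a NaN gradient flowing into `log_dt` from a `log_dt` that was already corrupted on the previous step.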
Sorry for not responding to this. I don't know why this is happening. I haven't revisited these experiments in a long time, but I'm quite confident that they were reproducible in the past. Perhaps something has changed in the libraries, or perhaps there are numerical issues on certain hardware.
Same problem. Here is a workaround that circumvents it by changing the `SSKernelNPLR` class:

```python
with torch.no_grad():
    # Increase the internal length if needed
    while rate * L > self.L:
        self.double_length()

    dt = torch.exp(self.log_dt) * rate
    B = _r2c(self.B)
    C = _r2c(self.C)
    P = _r2c(self.P)
    Q = P.conj() if self.Q is None else _r2c(self.Q)
    w = self._w()
```

I don't know whether this is detrimental to performance, but at least no NaN has been reported since.
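Since wrapping the kernel construction in `torch.no_grad()` detaches those parameters from the gradient, a less invasive alternative (hypothetical, not from the S4 repo; `safe_step` is a name introduced here) is to clip gradients and skip any optimizer step whose gradients are non-finite, so a single bad batch cannot corrupt `log_dt`:

```python
import torch

def safe_step(model, optimizer, loss, max_norm=1.0):
    """Backprop, clip gradients, and skip the update if any gradient is non-finite."""
    optimizer.zero_grad()
    loss.backward()
    finite = all(
        torch.isfinite(p.grad).all()
        for p in model.parameters() if p.grad is not None
    )
    if finite:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
    return finite  # False means the step was skipped
```

This keeps gradients flowing to `log_dt` on healthy steps, at the cost of occasionally dropping an update.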
I have the same problem. @icannotnamemyself, could you comment a bit more on your solution? I am not sure I understand the reasoning, or where exactly to make the modifications.
First of all, thank you for the comprehensive code base for all variants of S4 models.
However, when I run the ListOps experiment with S4 (HYYT version), the train, test, and val losses all become NaN after one epoch.
I ran the following script:

```shell
python -m train experiment=lra/s4-listops wandb=null
```
The final accuracy is also far below the reported accuracy (train = 0.17).
Is there something I have done wrong?