Variable input sequence length #236

Open

ruoyxue opened this issue Mar 12, 2024 · 11 comments
ruoyxue commented Mar 12, 2024

Dear authors,
This is amazing work!
I'm working with variable sequence lengths of video data. In one batch there can be several videos with different frame counts, and they are padded to the same length. With a Transformer I use attention masks to handle the variable input lengths, but I do not see a similar mask in the Mamba forward function. Is there any solution for dealing with variable lengths within a batch when using Mamba? Thanks!

ruoyxue commented Mar 13, 2024

What's more, I'm using bidirectional Mamba; is there any solution for processing data with variable lengths in a batch?

tridao (Collaborator) commented Mar 13, 2024

Variable length is not currently implemented but will be in the future. For now you can pad your sequences.

EricPaul03 commented

I want to know what steps I should take on my sequence to do the padding correctly. For example, if my sequence length is 128 and I want to pad to a length of 200, can I directly cut the output back to the first 128 positions before Mamba's skip connection? The outputs at the last 72 positions will not interfere with my results, right?

tridao (Collaborator) commented Mar 13, 2024

Yes, padding tokens should be on the right.
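
For illustration, a minimal sketch of this right-padding approach in PyTorch; the run_right_padded helper and the mamba_block argument are made-up names for the example, not part of the library:

    import torch
    from torch.nn.utils.rnn import pad_sequence

    # seqs: list of tensors of shape (L_i, d_model) with different lengths L_i
    def run_right_padded(seqs, mamba_block):
        lengths = torch.tensor([s.shape[0] for s in seqs])
        x = pad_sequence(seqs, batch_first=True)   # right-pads with zeros -> (B, L_max, d_model)
        y = mamba_block(x)                          # (B, L_max, d_model)
        # Only the first lengths[i] outputs of row i are meaningful; e.g. a sequence of
        # length 128 padded to 200 keeps y[i, :128] and ignores the last 72 positions.
        valid = torch.arange(x.shape[1])[None, :] < lengths[:, None]   # (B, L_max) bool mask
        return y, valid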

season0528 (Contributor) commented Mar 14, 2024

Hi @tridao @ruoyxue @EricPaul03

For training on variable-length sequences, right-padding to the maximum length with zeros works as an alternative solution. But the padded tokens can hurt hardware utilization, because compute is wasted on meaningless padded positions.

We have added support for variable-length sequences for the Mamba block. #244

Hope it helps!

ZJEast commented Mar 17, 2024

Any progress? I want to use variable-length sequences too, and to test the performance of Mamba on some reinforcement learning tasks.

ZJEast commented Mar 17, 2024

I think using "gather" might be an alternative solution to this problem, for example:

    from torch import Tensor

    def hidden_state(self, input_ids: Tensor, input_len: Tensor):
        # Run the right-padded batch through the Mamba backbone.
        hs: Tensor = self.backbone(input_ids, inference_params=None)   # (B, L, D)
        B, L, D = hs.shape
        # Index of the last valid (non-padded) token of each sequence.
        last = (input_len - 1).view(B, 1, 1)
        # Keep only the hidden state at that position; padded outputs are discarded.
        hs = hs.gather(1, last.expand(B, 1, D))   # (B, 1, D)
        return hs.view(B, D)

Here "input_len" is the length of the sequence.

EricPaul03 commented

Hello, thank you very much for your answer. I don't quite understand your code yet. Can you tell me where to put it and how it works? Thank you again!

ZJEast commented Mar 19, 2024

Because Mamba is an RNN-style architecture, the outputs at the padding positions are dirty; what you need to do is simply drop those dirty outputs and keep them out of your downstream computation. This can be done in many ways in PyTorch. The padding should be on the right.

It really works for me.
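
As a concrete illustration of dropping the dirty outputs, a minimal sketch in PyTorch; the mean-pooling choice is just an assumption for the example:

    import torch

    # y: (B, L_max, D) outputs from a right-padded batch; lengths: (B,) true sequence lengths
    def masked_mean_pool(y, lengths):
        B, L, D = y.shape
        mask = torch.arange(L, device=y.device)[None, :] < lengths[:, None]   # True at real tokens
        y = y * mask.unsqueeze(-1)                          # zero out outputs at padded positions
        return y.sum(dim=1) / lengths.unsqueeze(-1).to(y.dtype)   # average over real tokens only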

EricPaul03 commented

But for performance reasons, I would like to use bidirectional Mamba, which scans in two directions, left to right and right to left. Is it no longer possible to simply pad on the right side?

ZJEast commented Mar 19, 2024

[1, 2, 3] -> [1, 2, 3, 3, 2, 1]?
Maybe you can try this preprocessing.
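
A minimal sketch of that mirrored preprocessing for 1-D token-id tensors (only an illustration of the suggestion above, not a library feature):

    import torch

    def mirror(ids: torch.Tensor) -> torch.Tensor:
        # Append the reversed sequence: [1, 2, 3] -> [1, 2, 3, 3, 2, 1].
        return torch.cat([ids, ids.flip(0)], dim=0)

    mirror(torch.tensor([1, 2, 3]))   # tensor([1, 2, 3, 3, 2, 1])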
