Hi! Could you help me understand the function of `PaddingMask.trim` in the context of auto-regressive sequence processing tasks (e.g. instruction fine-tuning)?

Suppose we work with the example given in the docs for `Collator`:
```python
{
    "is_ragged": True,                      # True, since padding was needed
    "seqs": [[1, 4, 5, 0], [1, 2, 3, 4]],   # (Tensor) concatenated and padded input tensors
    "seq_lens": [3, 4],                     # (Tensor) original length of each input tensor
}
```
so our batch size is 2 and we've right-padded the first tensor (see fairseq2/native/src/fairseq2n/data/collater.cc, lines 299 to 300 at dba8c52).

Now, when we prepare the input for training we slice the batch: `seqs[:, :-1]` for the inputs and the corresponding shift for the labels.
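For concreteness, here is a minimal sketch of that slicing in plain PyTorch; it assumes nothing about the actual fairseq2 `SequenceBatch` API, just the standard shift-by-one preparation on the example above:

```python
import torch

# Collator output for the example above (pad index 0 assumed).
seqs = torch.tensor([[1, 4, 5, 0], [1, 2, 3, 4]])
seq_lens = torch.tensor([3, 4])

# Standard auto-regressive shift: the model consumes everything except the
# last position and is trained to predict everything except the first one.
input_seqs = seqs[:, :-1]    # [[1, 4, 5], [1, 2, 3]]
target_seqs = seqs[:, 1:]    # [[4, 5, 0], [2, 3, 4]]
```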
I'm trying to understand what information PaddingMask encapsulates after this point, since we're using the same PaddingMask instance for both the input_batch and target_batch.
We track the number of elements in the batch here:
```python
def num_elements(self) -> int:
    """Return the number of elements in the batch."""
    if self.padding_mask is None:
        return self.seqs.numel()

    return int(self.padding_mask.seq_lens.sum())
```
but our seq_lens were originally [3, 4] and are now [2, 3], even though the seqs fed to the model should be [[1, 4, 5], [1, 2, 3]], i.e. 6 non-padding tokens, while `padding_mask.seq_lens.sum()` would return 5. Is `trim` supposed to indicate that we've trimmed on the left, or to reflect the slicing of the labels? (And why?)
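Just to make the counting explicit, here is the discrepancy in plain PyTorch (my own sketch, assuming that `trim(1)` simply subtracts 1 from every seq_len, which is what the [2, 3] above suggests):

```python
import torch

seqs = torch.tensor([[1, 4, 5, 0], [1, 2, 3, 4]])
seq_lens = torch.tensor([3, 4])
pad_idx = 0

input_seqs = seqs[:, :-1]            # [[1, 4, 5], [1, 2, 3]]
trimmed_lens = seq_lens - 1          # [2, 3] -- what trim(1) appears to produce

print(int(trimmed_lens.sum()))               # 5 <- what num_elements() would report
print(int((input_seqs != pad_idx).sum()))    # 6 <- actual non-padding tokens in the inputs
```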
In this context, does the materialized padding mask actually reflect the padding positions (otherwise, why would we include it in the attention op)? I.e. if we materialize it we should get [[True, True, False], [True, True, True]], but since we sliced the inputs with `seqs[:, :-1]` there is actually no padding left (in this example).

Thanks!
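This is the materialization I have in mind (again a hand-rolled sketch from the trimmed seq_lens, not the fairseq2 implementation):

```python
import torch

trimmed_lens = torch.tensor([2, 3])
batch_seq_len = 3  # width of seqs[:, :-1]

# Positions below seq_len are real tokens, the rest are treated as padding.
mask = torch.arange(batch_seq_len).unsqueeze(0) < trimmed_lens.unsqueeze(1)
print(mask.tolist())  # [[True, True, False], [True, True, True]]

# ...yet the sliced inputs [[1, 4, 5], [1, 2, 3]] contain no pad token at all,
# so under this reading the False in the first row would mask a real token (the 5).
```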