Hi! Could you help me understand the function of `PaddingMask.trim` in the context of auto-regressive sequence processing tasks (e.g. instruction fine-tuning)?

Suppose we work with the example given in the docs for `Collator`:
```python
{
    "is_ragged": True,                      # True, since padding was needed
    "seqs": [[1, 4, 5, 0], [1, 2, 3, 4]],   # (Tensor) concatenated and padded input tensors
    "seq_lens": [3, 4],                     # (Tensor) original length of each input tensor
}
```
so our batch size is 2 and we've right-padded the first tensor (see fairseq2/native/src/fairseq2n/data/collater.cc, lines 299 to 300 at dba8c52).

Now, when we prepare the input for training we slice the batch: `seqs[:, :-1]` for the inputs and the corresponding shift for the labels.
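For concreteness, here is a minimal sketch of that slicing in plain PyTorch; it assumes nothing about the actual fairseq2 `SequenceBatch` API, just the standard shift-by-one preparation on the example above:

```python
import torch

# Collator output for the example above (pad index 0 assumed).
seqs = torch.tensor([[1, 4, 5, 0], [1, 2, 3, 4]])
seq_lens = torch.tensor([3, 4])

# Standard auto-regressive shift: the model consumes everything except the
# last position and is trained to predict everything except the first one.
input_seqs = seqs[:, :-1]    # [[1, 4, 5], [1, 2, 3]]
target_seqs = seqs[:, 1:]    # [[4, 5, 0], [2, 3, 4]]
```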
I'm trying to understand what information PaddingMask encapsulates after this point, since we're using the same PaddingMask instance for both the input_batch and target_batch.
We track the number of elements in the batch here:
```python
def num_elements(self) -> int:
    """Return the number of elements in the batch."""
    if self.padding_mask is None:
        return self.seqs.numel()

    return int(self.padding_mask.seq_lens.sum())
```
but our seq_lens were originally [3, 4] and are now [2, 3], even though the seqs fed to the model should be [[1, 4, 5], [1, 2, 3]], i.e. 6 non-padding tokens, while `padding_mask.seq_lens.sum()` would return 5. Is `trim` supposed to indicate that we've trimmed on the left, or to reflect the slicing of the labels? (And why?)
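Just to make the counting explicit, here is the discrepancy in plain PyTorch (my own sketch, assuming that `trim(1)` simply subtracts 1 from every seq_len, which is what the [2, 3] above suggests):

```python
import torch

seqs = torch.tensor([[1, 4, 5, 0], [1, 2, 3, 4]])
seq_lens = torch.tensor([3, 4])
pad_idx = 0

input_seqs = seqs[:, :-1]            # [[1, 4, 5], [1, 2, 3]]
trimmed_lens = seq_lens - 1          # [2, 3] -- what trim(1) appears to produce

print(int(trimmed_lens.sum()))               # 5 <- what num_elements() would report
print(int((input_seqs != pad_idx).sum()))    # 6 <- actual non-padding tokens in the inputs
```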
In this context, does the materialized padding mask actually reflect the padding positions (otherwise, why would we include it in the attention op)? I.e. if we materialize it we should get [[True, True, False], [True, True, True]], but since we sliced the inputs with `seqs[:, :-1]` there is actually no padding left (in this example).

Thanks!
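This is the materialization I have in mind (again a hand-rolled sketch from the trimmed seq_lens, not the fairseq2 implementation):

```python
import torch

trimmed_lens = torch.tensor([2, 3])
batch_seq_len = 3  # width of seqs[:, :-1]

# Positions below seq_len are real tokens, the rest are treated as padding.
mask = torch.arange(batch_seq_len).unsqueeze(0) < trimmed_lens.unsqueeze(1)
print(mask.tolist())  # [[True, True, False], [True, True, True]]

# ...yet the sliced inputs [[1, 4, 5], [1, 2, 3]] contain no pad token at all,
# so under this reading the False in the first row would mask a real token (the 5).
```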