I have read the v3 training overview, but something is unclear to me. I would be glad if you could explain what it is that I am missing.
I understand this new version accepts Dataset objects as the training dataset.
If we have a single negative for each example (e.g. each example is in the form of (anchor, pos, neg)), we can easily convert it to a Dataset object with three columns, right?
But what if we have multiple negatives? What is the proper input format in that case? Especially when the number of negatives varies between examples, e.g. one example has two negatives: (query, pos, neg1, neg2) and another has three: (query, pos, neg1, neg2, neg3).
Thanks in advance :)
Good question! To use multiple negatives, you have to add more columns to the dataset, one per negative. In Sentence Transformers, the order of the columns is what matters most. So, to use a loss that supports multiple negatives (see the Loss Overview), put the anchor column first, then the positive, then the negative columns.
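A minimal sketch of such a dataset, assuming the (anchor, positive, negative_1, ..., negative_n) column order. The toy rows, the column names, and the `to_columns` helper are made up for illustration; 20 negatives per row matches the count discussed in the follow-up. The `Dataset.from_dict` step is skipped if the `datasets` library is not installed.

```python
NUM_NEGATIVES = 20  # every row must have exactly this many negatives

# Hypothetical toy rows: (anchor, positive, list of negatives)
rows = [
    (
        f"query {i}",
        f"relevant passage for query {i}",
        [f"irrelevant passage {j} for query {i}" for j in range(NUM_NEGATIVES)],
    )
    for i in range(3)
]


def to_columns(rows, num_negatives):
    """Build ordered columns: anchor, positive, negative_1 ... negative_n."""
    # dicts keep insertion order, so this fixes the column order
    columns = {"anchor": [], "positive": []}
    for j in range(1, num_negatives + 1):
        columns[f"negative_{j}"] = []

    for anchor, positive, negatives in rows:
        if len(negatives) != num_negatives:
            raise ValueError(
                f"expected {num_negatives} negatives, got {len(negatives)}"
            )
        columns["anchor"].append(anchor)
        columns["positive"].append(positive)
        for j, negative in enumerate(negatives, start=1):
            columns[f"negative_{j}"].append(negative)
    return columns


columns = to_columns(rows, NUM_NEGATIVES)

try:
    from datasets import Dataset

    # One column per input, in the order built above
    train_dataset = Dataset.from_dict(columns)
except ImportError:
    train_dataset = None  # `datasets` not installed; `columns` is still usable
```

The resulting `train_dataset` has 22 columns (1 anchor, 1 positive, 20 negatives) and can then be passed as the training dataset together with a loss that accepts this format.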
In the provided example we have 1 anchor, 1 positive and 20 negatives for each row, right? The other thing I was trying to ask is what to do if we have a different number of negatives for each row.
For example, one row with 1 anchor, 1 positive and 10 negatives, and another row with 1 anchor, 1 positive and 5 negatives. What would be your suggested approach for such a dataset?
In the provided example we have 1 anchor, 1 positive and 20 negatives for each row, right?
Indeed, exactly 20 for each row.
I'm afraid you have to use a fixed number of negatives for each row. The reason is that we fit all embeddings into large tensors behind the scenes, which is not possible if some rows have a different number of values.
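One possible workaround, not proposed in this thread and shown only as a sketch: bring every row to the same number of negatives before building the dataset, by truncating rows that have too many and repeating negatives in rows that have too few. The helper name `fix_num_negatives` and the choice of `k` are hypothetical. Repeated negatives add no new information, so dropping short rows or mining more negatives may be preferable.

```python
from itertools import cycle, islice


def fix_num_negatives(negatives, k):
    """Return exactly k negatives: truncate if too many, repeat if too few."""
    if not negatives:
        raise ValueError("need at least one negative to pad from")
    # cycle() loops over the list endlessly; islice() takes the first k items
    return list(islice(cycle(negatives), k))


# Hypothetical rows with a varying number of negatives
rows = [
    ("query a", "pos a", ["neg a1", "neg a2"]),
    ("query b", "pos b", ["neg b1", "neg b2", "neg b3", "neg b4"]),
]

K = 3  # assumed target number of negatives per row
fixed_rows = [
    (anchor, positive, fix_num_negatives(negatives, K))
    for anchor, positive, negatives in rows
]
```

After this step every row has exactly `K` negatives, so the rows can be spread across a fixed set of negative columns.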