
Multiple Mask support in Pipeline #10158

Open
naveenjafer opened this issue Feb 12, 2021 · 2 comments
Labels: Feature request


@naveenjafer

🚀 Feature request

The fill-mask feature of the pipeline currently supports only a single mask per input. It could be expanded to predict and return results for multiple masks in the same sentence as well.
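
For reference, the existing single-mask usage looks roughly like this (model name and example sentence chosen purely for illustration):

from transformers import pipeline

# The current pipeline expects exactly one mask token per input.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
predictions = fill_mask("The capital of France is [MASK].")

for prediction in predictions:
    # Each candidate is a dict with sequence, score, token and token_str.
    print(prediction["sequence"], prediction["score"], prediction["token_str"])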

Motivation

There are use cases where one would ideally need model predictions for more than a single mask, for example smarter template filling in outputs returned to users. It could also be used to better study the implicit knowledge that BERT models accumulate during pre-training.

Your contribution

I should be able to raise a PR for this. The output JSON schema would have to be modified slightly, but I can go ahead and complete it unless there is an obvious reason, which has slipped my mind, why only a single [MASK] token should be supported.

@LysandreJik added the Feature request label on Feb 13, 2021
@naveenjafer (Author)

@LysandreJik
The current implementation for a single mask returns the data as a list of entries of the form:

{
   "sequence": "the final sequence with the mask filled in",
   "score": "the softmax score",
   "token": "the token ID used in filling the MASK",
   "token_str": "the token string used in filling the MASK"
}

When returning results for sentences with multiple masks, it is not possible to maintain the same JSON return format. I propose adding a separate pipeline call for this, 'fill-mask-multiple' or something along those lines. The return format I have proceeded with is:

{
   "sequence": "the final sequence with all the masks filled by the model",
   "scores": ["the softmax score of mask 1", "the softmax score of mask 2", ...],
   "tokens": ["the token ID used in filling mask 1", "the token ID used in filling mask 2", ...],
   "token_strs": ["the token string used in filling mask 1", "the token string used in filling mask 2", ...]
}
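
To make the proposal concrete, here is a minimal sketch of how such an output could be assembled directly with a masked LM (this is only an illustration, not the PR code; bert-base-uncased and independent per-position argmax are assumptions):

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The [MASK] sat on the [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate every [MASK] position instead of assuming a single one.
mask_positions = torch.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0]

result = {"scores": [], "tokens": [], "token_strs": []}
filled_ids = inputs["input_ids"][0].clone()
for pos in mask_positions:
    probs = logits[0, pos].softmax(dim=-1)
    score, token_id = probs.max(dim=-1)
    token_id = int(token_id)
    result["scores"].append(float(score))
    result["tokens"].append(token_id)
    result["token_strs"].append(tokenizer.decode([token_id]))
    filled_ids[pos] = token_id

# One decoded sequence with all masks filled, matching the proposed schema.
result["sequence"] = tokenizer.decode(filled_ids, skip_special_tokens=True)
print(result)

Each mask is predicted independently here (simple argmax per position); whether the pipeline should do something smarter is part of what the PR could discuss.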

Some minor changes will also be made to the input param "targets" so that optional targets can be supplied for each mask.

If having two separate pipelines does not seem like a good idea, we could instead combine both into a single pipeline call, irrespective of whether the input has a single mask or multiple masks. The return JSON type would change in that case, and I am not sure about the impact or how feasible it would be to introduce that in a minor version update.

I would really benefit from some expert advice, since I am somewhat new here.

PS: I have currently implemented the functionality for the PyTorch framework and am working on getting the same done in TensorFlow too.

@LysandreJik (Member)

This change seems okay to me. Since you already have some functionality for PyTorch, do you mind opening a PR (even a draft PR) so that we can play around with it and discuss potential improvements? Thanks! Pinging @Narsil too.
