Add an `attention_bias` argument to the transformer block and transformer layer modules, addressing a change in MCore
#10004
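
As a rough illustration of what threading such an argument through a block and its layers can look like, here is a minimal pure-Python sketch. The names `ToyTransformerLayer`, `ToyTransformerBlock`, and `attend` are hypothetical and are not the MCore or NeMo API. The sketch also assumes that `attention_bias` is an additive term applied to the attention logits before the softmax, which this PR description does not itself state.

```python
import math


def attend(hidden, attention_bias=None):
    """Toy single-head self-attention with identity projections (assumption)."""
    dim = len(hidden[0])
    out = []
    for i, query in enumerate(hidden):
        # Scaled dot-product logits of token i against every token.
        logits = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in hidden]
        if attention_bias is not None:
            # Assumed semantics: the bias is added to the logits pre-softmax.
            logits = [l + attention_bias[i][j] for j, l in enumerate(logits)]
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        out.append([sum(weights[j] / total * hidden[j][c]
                        for j in range(len(hidden)))
                    for c in range(dim)])
    return out


class ToyTransformerLayer:
    """Hypothetical layer: accepts attention_bias and hands it to attention."""

    def forward(self, hidden, attention_bias=None):
        return attend(hidden, attention_bias=attention_bias)


class ToyTransformerBlock:
    """Hypothetical block: forwards the same attention_bias to every layer."""

    def __init__(self, num_layers):
        self.layers = [ToyTransformerLayer() for _ in range(num_layers)]

    def forward(self, hidden, attention_bias=None):
        for layer in self.layers:
            hidden = layer.forward(hidden, attention_bias=attention_bias)
        return hidden
```

Defaulting the new keyword to `None` at both levels keeps existing call sites working unchanged, which is one common way to absorb an upstream signature change.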