Just try the custom-vit-models-load-and-convert-weights-from-timm-torch-model usage:

```py
from keras_cv_attention_models import beit

mm = beit.BeitBasePatch16(pretrained=None, classifier_activation=None, num_classes=21841)
beit.keras_model_load_weights_from_pytorch_model(mm, 'beit_base_patch16_224_pt22k_ft22k.pth')
# >>>> Save model to: beit_base_patch16_224.h5
# >>>> Trying to load index file: /home/leondgarse/.keras/datasets/imagenet21k_class_index.json
# >>>> Keras model prediction: [('n02121808', 'domestic_cat, house_cat, Felis_domesticus, Felis_catus', 10.603789), ('n01317541', 'domestic_animal, domesticated_animal', 10.452324), ('n02123159', 'tiger_cat', 9.973137), ('n00015388', 'animal, animate_being, beast, brute, creature, fauna', 9.916692), ('n01318894', 'pet', 9.427024)]
```
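A quick sanity check can be run on the converted model right after the snippet above. This is a minimal sketch, assuming the repo's usual `test_images.cat()` / `preprocess_input` / `decode_predictions` helpers are available:

```py
from keras_cv_attention_models import test_images

# `mm` is the converted model from the snippet above; run it on the bundled test image
preds = mm(mm.preprocess_input(test_images.cat()))
print(mm.decode_predictions(preds))
```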
That one has a different architecture. I've updated the code to add a `use_shared_pos_emb_for_attn` parameter for beit, supporting the raw pre-trained model without any fine-tuning. Try:

```py
from keras_cv_attention_models import beit

mm = beit.BeitBasePatch16(pretrained=None, classifier_activation=None, num_classes=8192, use_shared_pos_emb_for_attn=True)
beit.keras_model_load_weights_from_pytorch_model(mm, 'beit_base_patch16_224_pt22k.pth')
```
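To confirm the raw pre-trained weights loaded into this configuration, a shape check on a dummy input is enough. A minimal sketch, assuming NumPy and the default 224x224 input; note the 8192 outputs correspond to the BEiT pre-training token vocabulary rather than ImageNet classes, so `decode_predictions` does not apply here:

```py
import numpy as np

# Random input just to confirm the head shape; the raw pt22k head predicts 8192 visual tokens
dummy = np.random.uniform(size=(1, 224, 224, 3)).astype("float32")
out = mm(dummy)
print(out.shape)  # expected: (1, 8192)
```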
Hi Leondgarse,
Could you please explain the math behind MultiHeadRelativePositionalEmbedding()? Where can I find a source/paper/article that explains it?
Thank you.
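The name suggests the relative position bias used in Swin Transformer and BEiT, which traces back to Shaw et al. 2018, "Self-Attention with Relative Position Representations": a learnable table is indexed by the relative (dy, dx) offset between every pair of patches, and the gathered values are added to the attention logits before softmax. Below is a minimal NumPy sketch of that indexing idea only; it leaves out the extra class-token rows/columns BEiT adds and is not necessarily the exact keras_cv_attention_models implementation:

```py
import numpy as np

def relative_position_bias(height, width, num_heads):
    # Learnable table (random here for illustration): one bias value per head for each
    # possible relative offset; offsets along each axis range over [-(h-1), h-1] and
    # [-(w-1), w-1], giving (2h-1) * (2w-1) distinct (dy, dx) pairs.
    rng = np.random.default_rng(0)
    table = rng.normal(size=((2 * height - 1) * (2 * width - 1), num_heads))

    # Absolute (y, x) coordinates of every patch, flattened to length h*w
    coords = np.stack(np.meshgrid(np.arange(height), np.arange(width), indexing="ij"))
    coords = coords.reshape(2, -1)  # (2, h*w)

    # Pairwise relative offsets between every query patch and every key patch
    rel = (coords[:, :, None] - coords[:, None, :]).transpose(1, 2, 0)  # (h*w, h*w, 2)

    # Shift offsets to be non-negative, then flatten (dy, dx) into a single table index
    rel[:, :, 0] += height - 1
    rel[:, :, 1] += width - 1
    index = rel[:, :, 0] * (2 * width - 1) + rel[:, :, 1]  # (h*w, h*w)

    # Gather: bias[head, i, j] is added to the attention logit q_i . k_j before softmax
    return table[index].transpose(2, 0, 1)  # (num_heads, h*w, h*w)

print(relative_position_bias(14, 14, num_heads=12).shape)  # (12, 196, 196)
```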