_convert_id_to_tokens for XLNet not working #945

chris-boson · 2019-08-01T19:13:25Z

text = self.tokenizer.convert_ids_to_tokens(token_list)
  File "/home/lambda/repos/pytorch-transformers/pytorch_transformers/tokenization_utils.py", line 444, in convert_ids_to_tokens
    tokens.append(self._convert_id_to_token(index))
  File "/home/lambda/repos/pytorch-transformers/pytorch_transformers/tokenization_xlnet.py", line 170, in _convert_id_to_token
    token = self.sp_model.IdToPiece(index)
  File "/home/lambda/python-envs/research/lib/python3.6/site-packages/sentencepiece.py", line 187, in IdToPiece
    return _sentencepiece.SentencePieceProcessor_IdToPiece(self, id)
TypeError: in method 'SentencePieceProcessor_IdToPiece', argument 2 of type 'int'

I find that if I explicitly convert the ids to integers, it works fine. In tokenization_xlnet.py:

def _convert_id_to_token(self, index, return_unicode=True):
    """Converts an index (integer) in a token (string/unicode) using the vocab."""
    token = self.sp_model.IdToPiece(int(index))
    if six.PY2 and return_unicode and isinstance(token, str):
        token = token.decode('utf-8')
    return token
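
A stricter variant of the same idea, as a minimal sketch: the traceback suggests the SentencePiece binding only accepts a built-in `int`, so the id has to be coerced before the call. `operator.index()` does that for any integer-like object and, unlike `int()`, refuses floats instead of truncating them. The names `coerce_id` and `IntLike` are hypothetical and not part of pytorch-transformers; `IntLike` only stands in for a NumPy integer scalar so the snippet runs without NumPy.

```python
import operator


def coerce_id(index):
    """Return index as a built-in int, or raise TypeError if it is not integer-like."""
    # operator.index() accepts any object implementing __index__ (built-in ints,
    # NumPy integer scalars) and rejects floats and strings, whereas int()
    # would silently truncate 3.7 to 3.
    return operator.index(index)


class IntLike:
    """Hypothetical stand-in for a NumPy integer scalar (assumption, for the demo only)."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


plain = coerce_id(IntLike(2356))
print(type(plain).__name__, plain)
```

Whether rejecting floats is desirable depends on the caller; `int(index)` as in the patch above is the more permissive choice.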


thomwolf · 2019-08-05T15:17:43Z

Which command can we use to reproduce the behavior?

chris-boson · 2019-08-05T16:49:24Z

Upon further testing, it looks like this tokenizer doesn't accept numpy arrays; the other ones seem to be fine:

import numpy as np
from pytorch_transformers import XLNetTokenizer, TransfoXLTokenizer, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print(tokenizer.convert_ids_to_tokens(np.array([3, 4, 6, 2356])))
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
print(tokenizer.convert_ids_to_tokens(np.array([3, 4, 6, 2356])))
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
print(tokenizer.convert_ids_to_tokens(np.array([3, 4, 6, 2356]).tolist()))  # works: plain Python ints
print(tokenizer.convert_ids_to_tokens(np.array([3, 4, 6, 2356])))  # raises the TypeError above
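
Until the tokenizer itself is patched, the `.tolist()` workaround from the snippet above can be wrapped on the caller side. This is only a sketch: `as_plain_id_list` is a hypothetical helper, not a library function, and it assumes a flat (1-D) sequence of ids.

```python
def as_plain_id_list(ids):
    """Convert a flat sequence or array of token ids into a list of built-in ints."""
    if hasattr(ids, "tolist"):
        # NumPy arrays expose tolist(), which returns Python scalars.
        ids = ids.tolist()
    # int() also covers plain lists that still hold NumPy integer scalars.
    return [int(i) for i in ids]


# Intended use (assumes `tokenizer` is an XLNetTokenizer as in the snippet above):
#   tokens = tokenizer.convert_ids_to_tokens(as_plain_id_list(id_array))
print(as_plain_id_list((3, 4, 6, 2356)))
```

A multi-dimensional array would produce nested lists from `tolist()` and make `int()` fail, so such input should be flattened first.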

stale · 2019-10-04T16:52:30Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Oct 4, 2019

stale bot closed this as completed Oct 11, 2019

LysandreJik mentioned this issue Dec 10, 2019

[ALBERT] : ValueError: Layer #1 (named "predictions") expects 11 weight(s), but the saved weights have 10 element(s). #2024

