Is there a plan to implement GPT2TokenizerFast? #16

Open
zjcDM opened this issue Mar 8, 2023 · 4 comments

Comments

zjcDM commented Mar 8, 2023

No description provided.

zjcDM closed this as completed Mar 8, 2023
Author

zjcDM commented Mar 8, 2023

Already implemented.

zjcDM reopened this Mar 8, 2023
zjcDM changed the title from "Is there a plan to implement GPT2Tokenizer?" to "Is there a plan to implement GPT2TokenizerFast?" Mar 8, 2023
@mymagicpower
Owner

Use this approach:

1. pom configuration

    <dependency>
        <groupId>ai.djl.huggingface</groupId>
        <artifactId>tokenizers</artifactId>
        <version>0.19.0</version>
    </dependency>

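2. Example code

# Imports
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;
import java.io.IOException;
import java.util.List;

# Declaration
// manager (an NDManager) and MAX_LENGTH (an int) are assumed to be defined elsewhere in the class.
private static final HuggingFaceTokenizer tokenizer;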
static {
    try {
        tokenizer =
                HuggingFaceTokenizer.builder()
                        .optManager(manager)
                        .optPadding(true)
                        .optPadToMaxLength()
                        .optMaxLength(MAX_LENGTH)
                        .optTruncation(true)
                        .optTokenizerName("openai/clip-vit-large-patch14")
                        .build();
        // Candidate tokenizer names and related links:
        //   sentence-transformers/msmarco-distilbert-dot-v5
        //   openai/clip-vit-large-patch14
        //   https://huggingface.co/sentence-transformers/msmarco-distilbert-dot-v5
        //   https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/tokenizer/tokenizer_config.json
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

# Usage
List<String> tokens = tokenizer.tokenize(prompt);
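
For anyone unsure what the padding, truncation and max-length options in the builder above amount to, here is a rough plain-Java sketch of the idea. It is a conceptual illustration only, not DJL's implementation: the class name `PadTruncateSketch`, the method `fitToLength` and the `<pad>` token are all hypothetical, and special tokens are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: not DJL's implementation, and it ignores special tokens.
class PadTruncateSketch {

    // Returns a copy of `tokens` with exactly `maxLength` entries:
    // longer inputs are cut off, shorter ones are right-padded with `padToken`.
    static List<String> fitToLength(List<String> tokens, int maxLength, String padToken) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be non-negative");
        }
        int keep = Math.min(tokens.size(), maxLength);
        List<String> out = new ArrayList<>(tokens.subList(0, keep));
        while (out.size() < maxLength) {
            out.add(padToken);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "photo", "of", "a", "cat");
        // Truncation: prints [a, photo, of]
        System.out.println(fitToLength(tokens, 3, "<pad>"));
        // Padding: prints [a, photo, of, a, cat, <pad>, <pad>]
        System.out.println(fitToLength(tokens, 7, "<pad>"));
    }
}
```

With both options on, every prompt comes out at the same fixed length, which is the shape a fixed-size text encoder input usually expects.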

@mymagicpower
Owner

Author

zjcDM commented Apr 3, 2023

https://github.com/deepjavalibrary/djl/blob/master/extensions/tokenizers/README.md

Hi, it looks like this doesn't allow a custom vocabulary?
