Python Glove Loader

A simple loader for GloVe embeddings that supports a memory-saving embedding store and preprocessing for Twitter data.

The following example shows how it works:

from glove import load_embedding, tokenize, HashTagMode

# Tokenize tweets (thanks to @tokestermw and @ppope for their base implementation; see method docs)
tokens = tokenize(
    "Hello #World, this is a sample util from @dhartung to " +
    "illustrate the power of https://github.com/dhartung/python-glove-loader :)",
    hashtag_mode=HashTagMode.REPLACE
)
print(tokens)

# Load whole embedding in memory
glove = load_embedding("./glove.twitter.27B.25d.txt", keep_in_memory=True)
# OR: don't load the full embeddings into memory (saves a lot of RAM with large embeddings)
glove = load_embedding("./glove.twitter.27B.25d.txt", keep_in_memory=False)

# Read embeddings
print(glove.get_embedding("potato"))
print(glove.get_embeddings(["potato", "house", "mouse"]))

# Parse tweets
print(
    glove.get_tweet_embeddings(
        "Hello #World, this is a sample util from @dhartung to " +
        "illustrate the power of https://github.com/dhartung/python-glove-loader :)"
    )
)
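
This README does not describe how `keep_in_memory=False` works internally. One common way to get this kind of memory saving is to keep only each word's byte offset in RAM and read its vector from disk on demand. The following is a minimal sketch of that idea, not this library's implementation: the class `LazyEmbeddingStore`, its `offsets` attribute, and the tiny demo file are all hypothetical, and the file is assumed to use the usual GloVe text layout of one word per line followed by space-separated floats.

```python
import os
import tempfile


class LazyEmbeddingStore:
    """Hypothetical sketch: index byte offsets once, read vectors lazily."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # word -> byte offset of its line in the file
        with open(path, "rb") as f:
            while True:
                pos = f.tell()
                line = f.readline()
                if not line:
                    break  # end of file
                if not line.strip():
                    continue  # skip blank lines
                word = line.split(b" ", 1)[0].decode("utf-8")
                self.offsets[word] = pos

    def get_embedding(self, word):
        """Return the vector as a list of floats, or None if the word is unknown."""
        pos = self.offsets.get(word)
        if pos is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(pos)
            parts = f.readline().decode("utf-8").split()
        return [float(x) for x in parts[1:]]


# Demo on a tiny made-up file (values are illustrative, not real GloVe vectors)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tiny.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("potato 0.1 0.2 0.3\nhouse 0.4 0.5 0.6\n")
    store = LazyEmbeddingStore(demo_path)
    potato_vec = store.get_embedding("potato")
    missing_vec = store.get_embedding("mouse")
    print(potato_vec)
    print(missing_vec)
```

The trade-off in this sketch is that every lookup costs a disk seek and a line parse, while RAM use grows only with the vocabulary size rather than with vocabulary size times vector dimension.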