Word embeddings in word2vec format fail to load correctly #1656

Closed
wants to merge 2 commits into from

Conversation

lambdaofgod
Contributor

Loading embeddings in word2vec text format fails if the .txt file starts with a header line that gives the number of words and the dimensionality.

For example, gensim exports .txt files whose first line looks like this:

10000 50

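In the current version, the header line is read as an ordinary entry: the token `10000` becomes a word and `50` becomes a one-element vector. Loading therefore sets the dimensionality to 1 and then fails to load the actual vectors.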
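To make the failure concrete, here is a simplified, hypothetical loader. It is not the actual Sentence Transformers code. It infers the dimensionality from the first line and skips any line whose length does not match. The function name `naive_load` is illustrative only.

```python
from typing import Iterable, List, Tuple


def naive_load(lines: Iterable[str]) -> Tuple[int, List[str]]:
    """Hypothetical loader: first token is the word, the rest are floats.

    The dimensionality is taken from whatever the first line happens to be,
    so a word2vec header such as "10000 50" yields a dimensionality of 1.
    """
    vocab: List[str] = []
    dim = None
    for line in lines:
        parts = line.rstrip().split()
        word, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)  # header "10000 50" -> dim = 1
        if len(values) != dim:
            continue  # every real 50-value vector is skipped as "malformed"
        vocab.append(word)
    return dim, vocab


sample = ["10000 50\n", "the " + " ".join(["0.1"] * 50) + "\n"]
print(naive_load(sample))  # (1, ['10000'])
```

Only the bogus entry `10000` survives, and every real vector is discarded.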

I fixed this by checking whether the first line contains exactly two tokens and, if so, treating the second one as the dimensionality.
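Below is a minimal sketch of the header check described above, written as a standalone parser. It is not the code from this PR or from #1875. The function name `load_word2vec_text` and the choice to raise on mismatched lines, rather than skip them, are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def load_word2vec_text(lines: Iterable[str]) -> Tuple[int, Dict[str, List[float]]]:
    """Parse word2vec text format, tolerating an optional '<count> <dim>' header."""
    embeddings: Dict[str, List[float]] = {}
    dim = None
    for i, line in enumerate(lines):
        parts = line.rstrip().split()
        if not parts:
            continue  # ignore blank lines
        if i == 0 and len(parts) == 2:
            # Header line: exactly two tokens; the second is the dimensionality.
            dim = int(parts[1])
            continue
        word, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)  # no header: infer from the first vector
        if len(values) != dim:
            raise ValueError(f"line {i + 1}: expected {dim} values, got {len(values)}")
        embeddings[word] = [float(v) for v in values]
    if dim is None:
        raise ValueError("empty embeddings file")
    return dim, embeddings


with_header = ["2 3\n", "cat 0.1 0.2 0.3\n", "dog 0.4 0.5 0.6\n"]
print(load_word2vec_text(with_header)[0])      # 3
print(load_word2vec_text(with_header[1:])[0])  # 3 (headerless file also works)
```

The same rule accepts files with and without the header. One trade-off comes with it: a headerless file of 1-dimensional vectors would have its first entry mistaken for a header. That seems acceptable for real word embeddings.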

@tomaarsen
Collaborator

Hello!

Thanks for your PR pointing me to this issue! I've resolved it by merging #1875, which also had a clean solution. As a result, I will be closing this, and the next version of Sentence Transformers should include word2vec support.

  • Tom Aarsen

tomaarsen closed this on Dec 18, 2023