Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Compute embedding cosine distance #35

Merged
merged 1 commit into from
Jul 15, 2024
Merged

Conversation

adenyes
Copy link
Contributor

@adenyes adenyes commented Jul 14, 2024

Distance between embeddings is typically cosine distance. That is the complement of the dot product over the product of vector norm/magnitude/length. 1 - (a dot b) / (||a|| * ||b|)

It looks like the terminology on the denominator might have been ambiguous so it was implemented as product of the length of the embedding arrays, which is always the same and produced incorrect results.

This is a more typical implementation and produces verifiably correct results. The tests have been updated to check these.

Test plan:

  • Computed cosine distance using another calculator and compared with result for test
  • Used existing tests, new tests are green
  • Patched local app to compare results from this crate vs. a javascript implementation of cosine distance. Results are identical within 6 sig digits.

Copy link
Owner

@rellfy rellfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for your contribution!

@rellfy rellfy merged commit 0dcd0e0 into rellfy:master Jul 15, 2024
1 check passed
@rellfy
Copy link
Owner

rellfy commented Jul 15, 2024

Released as 1.0.0-alpha.15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants