Skip to content
This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

Fix bruteforce cosine #1021

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Fix bruteforce cosine #1021

wants to merge 4 commits into from

Conversation

hhy3
Copy link
Member

@hhy3 hhy3 commented Aug 4, 2023

No description provided.

@sre-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hhy3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify
Copy link

mergify bot commented Aug 4, 2023

@hhy3 Please associate the related issue to the body of your Pull Request. (eg. “issue: #”)

std::unique_ptr<float[]> norms = nullptr;
if (is_cosine) {
norms = std::make_unique<float[]>(nb);
faiss::fvec_norms_L2(norms.get(), (const float*)xb, dim, nb);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to calculate this from origin data directly? the xb could be much large and concurrent search requests will copy many data to run out of memory

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding, we search growing chunk by chunk, which means the nb cannot be bigger than 1024. So 4KB memory is needed for each search.

The reason why put it here is to accelerate the calculation when nq is big

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liliu-z the sealed segment without index will also go brute force, and the sealed segment is with only 1 chunk, which with size of all rows

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liliu-z the sealed segment without index will also go brute force, and the sealed segment is with only 1 chunk, which with size of all rows

Good catch!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hhy3 Let's put it in

@mergify
Copy link

mergify bot commented Aug 8, 2023

⚠️ The sha of the head commit of this PR conflicts with #1025. Mergify cannot evaluate rules on this PR. ⚠️

@hhy3 hhy3 force-pushed the bf_cosine branch 2 times, most recently from a661b44 to 70634fc Compare August 8, 2023 09:30
@hhy3
Copy link
Member Author

hhy3 commented Aug 8, 2023

/kind improvement

@mergify mergify bot added needs-dco DCO is missing in this pull request. and removed dco-passed DCO check passed. ci-passed labels Aug 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved kind/improvement needs-dco DCO is missing in this pull request. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants