feat(NODE-6537): add support for binary vectors #730
Merged
nbbeeken force-pushed the NODE-6537-vector-api branch from adbf3e5 to 28faa17 (November 14, 2024 15:08)
nbbeeken force-pushed the NODE-6534-spec branch from d3fe6e0 to 6f4acc4 (November 14, 2024 22:17)
nbbeeken force-pushed the NODE-6537-vector-api branch 4 times, most recently from e89e295 to 7c08ddf (November 15, 2024 04:37)
nbbeeken force-pushed the NODE-6534-spec branch from 6f4acc4 to a5ed30d (November 15, 2024 15:04)
nbbeeken force-pushed the NODE-6537-vector-api branch 3 times, most recently from 1b9304d to 0ff7122 (November 15, 2024 15:56)
nbbeeken commented Nov 15, 2024
nbbeeken force-pushed the NODE-6537-vector-api branch 4 times, most recently from 8e70664 to e0572bc (November 15, 2024 19:56)
nbbeeken force-pushed the NODE-6537-vector-api branch from e0572bc to af3f9cd (November 15, 2024 20:52)
addaleax reviewed Nov 18, 2024
baileympearson requested changes Nov 18, 2024
nbbeeken added the Primary Review (In Review with primary reviewer, not yet ready for team's eyes) label Nov 18, 2024
addaleax reviewed Nov 18, 2024
addaleax previously approved these changes Nov 18, 2024
No blocking concerns from me
baileympearson approved these changes Nov 18, 2024
Description
What is changing?
Add vector helper APIs to Binary.
Is there new documentation needed for these changes?
No
What is the motivation for this change?
Support for the new Binary Vector sub_type and ease of interoperability with native types.
Release Highlight
BSON Binary Vector Support!
The Binary class has new helpers to assist with using the newly minted Vector sub_type of Binary (sub_type == 9) 🎉! For more on how these types can be used with MongoDB take a look at How to Ingest Quantized Vectors!
Here's a summary of the API:
Relatively self-explanatory: each one supports converting to and constructing from a native JavaScript data type that corresponds to one of the three vector types: Int8, Float32, PackedBit.
Vector Bytes Format
When a Binary is sub_type 9 the first two bytes are set to important metadata about the vector.
binary.buffer[0] - The datatype that indicates what the following bytes are.
binary.buffer[1] - The padding amount, a value 0-7 that indicates how many bits to ignore in a PackedBit vector.
Packed Bits 📦
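Before turning to packed bits, the two metadata bytes described under Vector Bytes Format can be pictured with a minimal sketch. The helper name readVectorHeader is hypothetical and is not part of the Binary class; the datatype byte values (0x03, 0x27, 0x10) are an assumption based on a reading of the BSON Binary Vector specification and should be checked against it. This is for illustration only; application code is better served by the helpers on Binary.

```typescript
// Names used for the three vector types in the description above.
type VectorDatatype = 'Int8' | 'Float32' | 'PackedBit';

// Hypothetical helper (not a Binary method): decode the two header bytes.
function readVectorHeader(buffer: Uint8Array): { datatype: VectorDatatype; padding: number } {
  if (buffer.length < 2) {
    throw new RangeError('a vector needs at least the two header bytes');
  }

  // buffer[1] is the padding amount, a value from 0 to 7.
  const padding = buffer[1];
  if (padding > 7) {
    throw new RangeError('padding must be between 0 and 7');
  }

  // buffer[0] is the datatype; byte values assumed from the BSON Vector specification.
  let datatype: VectorDatatype;
  switch (buffer[0]) {
    case 0x03:
      datatype = 'Int8';
      break;
    case 0x27:
      datatype = 'Float32';
      break;
    case 0x10:
      datatype = 'PackedBit';
      break;
    default:
      throw new TypeError('unknown vector datatype 0x' + buffer[0].toString(16));
  }

  return { datatype, padding };
}

const header = readVectorHeader(new Uint8Array([0x10, 4, 0xff, 0xf0]));
console.log(header.datatype, header.padding); // PackedBit 4
```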
static fromPackedBits(array: Uint8Array, padding: number = 0)
When handling packed bits, the last byte may not be entirely used. For example, a PackedBit vector = [0xFF, 0xF0] with padding = 4 ignores those last four 0s, making the bit vector logically equal to 12 ones.
Important
When using the fromPackedBits method, be sure to set your padding amount to avoid inadvertently extending your bit vector.
Unpacking Bits 🧳
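To make the padding rule from the Packed Bits example concrete, here is a minimal sketch of unpacking and repacking bits. The helpers unpackBits and packBits are hypothetical stand-ins written for this illustration, not the Binary class implementation, and the most-significant-bit-first ordering is an assumption drawn from the [0xFF, 0xF0] example, where the ignored bits are the trailing zeros.

```typescript
// Hypothetical helper: expand packed bytes into one 0/1 entry per bit,
// most significant bit first, dropping the last `padding` bits.
function unpackBits(bytes: Uint8Array, padding = 0): Int8Array {
  if (padding < 0 || padding > 7 || padding % 1 !== 0) {
    throw new RangeError('padding must be an integer from 0 to 7');
  }
  if (bytes.length === 0 && padding !== 0) {
    throw new RangeError('an empty vector cannot have padding');
  }
  const bitCount = bytes.length * 8 - padding;
  const bits = new Int8Array(bitCount);
  for (let i = 0; i < bitCount; i++) {
    const shift = 7 - (i & 7); // position of bit i inside its byte
    bits[i] = (bytes[i >> 3] >> shift) & 1;
  }
  return bits;
}

// Hypothetical helper: pack 0/1 entries into bytes and derive the padding
// from the array length, so no padding argument is needed.
function packBits(bits: ArrayLike<number>): { bytes: Uint8Array; padding: number } {
  const bytes = new Uint8Array(Math.ceil(bits.length / 8));
  for (let i = 0; i < bits.length; i++) {
    if (bits[i] !== 0 && bits[i] !== 1) {
      throw new RangeError('bits must be 0 or 1');
    }
    bytes[i >> 3] |= bits[i] << (7 - (i & 7));
  }
  const padding = (8 - (bits.length % 8)) % 8;
  return { bytes, padding };
}

const bits = unpackBits(new Uint8Array([0xff, 0xf0]), 4);
console.log(bits.length); // 12
const repacked = packBits(bits);
console.log(repacked.padding); // 4
```

Passing padding = 0 for the same two bytes would yield 16 bits, the last four of them zeros, which is the inadvertent extension the note above warns about.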
Packed bits get special treatment with two styles of conversion methods to suit your vector-y needs.
toBits will return individually addressable bits shifted apart into an array. fromBits takes the same format in reverse and packs the bits into bytes.
Notice there is no argument to set the padding. That is because it can be determined by the array's length. Recall those 12 ones from the previous example: the padding has to be 4 to reach a multiple of 8.
Caution
We highly encourage using ONLY these methods to interact with vector data and avoid operating directly on the byte format. Other Binary class methods (put(), write(), read(), and value()) and direct access of data in a Binary's buffer beyond the 1st index should only be used in exceptional circumstances and with extreme caution after closely consulting the BSON Vector specification.
Details to keep in mind
read() always returns unsigned bytes.
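Why the unsigned detail matters for an Int8 vector can be seen in a small sketch using only standard typed arrays. The sample bytes and the helper name asSigned are made up for illustration; nothing here calls the Binary class.

```typescript
// Hypothetical helper: view the same memory as signed 8-bit integers.
// No bytes are copied; only the interpretation changes.
function asSigned(bytes: Uint8Array): Int8Array {
  return new Int8Array(bytes.buffer, bytes.byteOffset, bytes.byteLength);
}

// Bytes as an unsigned reader would hand them back.
const unsignedBytes = new Uint8Array([0x7f, 0x80, 0xff]);
const signedView = asSigned(unsignedBytes);

console.log(unsignedBytes[1], signedView[1]); // 128 -128
console.log(unsignedBytes[2], signedView[2]); // 255 -1
```

Values of 128 and above read back as negative numbers once reinterpreted, so comparing unsigned output against signed vector elements without this step would give misleading results.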
Double check the following
npm run check:lint script
type(NODE-xxxx)[!]: description
feat(NODE-1234)!: rewriting everything in coffeescript