feat(NODE-6537): add support for binary vectors #730

Merged
13 commits merged into main from NODE-6537-vector-api
Nov 18, 2024

@nbbeeken nbbeeken commented Nov 14, 2024

Description

What is changing?

Add vector helper APIs to Binary.

Is there new documentation needed for these changes?

No

What is the motivation for this change?

Ease interoperability between the new Binary Vector sub_type and native types.

Release Highlight

BSON Binary Vector Support!

The Binary class has new helpers to assist with using the newly minted Vector sub_type of Binary (sub_type == 9) 🎉! For more on how these types can be used with MongoDB, take a look at How to Ingest Quantized Vectors!

Here's a summary of the API:

class Binary {
  toInt8Array(): Int8Array;
  toFloat32Array(): Float32Array;
  toPackedBits(): Uint8Array;

  static fromInt8Array(array: Int8Array): Binary;
  static fromFloat32Array(array: Float32Array): Binary;
  static fromPackedBits(array: Uint8Array, padding: number = 0): Binary;
}

Relatively self-explanatory: each method converts to, or constructs from, a native JavaScript data type that corresponds to one of the three vector types: Int8, Float32, and PackedBit.

Vector Bytes Format

When a Binary is sub_type 9, the first two bytes hold important metadata about the vector.

  • binary.buffer[0] - The datatype that indicates what the following bytes are.
  • binary.buffer[1] - The padding amount, a value 0-7 that indicates how many bits to ignore in a PackedBit vector.
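The two metadata bytes can be illustrated with a minimal sketch that operates on a raw Uint8Array. The helper parseVectorHeader and the VectorHeader type are hypothetical names used only for illustration and are not part of the bson package; the datatype codes are taken from the BSON Binary Vector specification as an assumption and should be checked against it.

```typescript
// Datatype codes as listed in the BSON Binary Vector specification
// (an assumption here; verify against the spec before relying on them).
const VECTOR_DTYPE = { Int8: 0x03, Float32: 0x27, PackedBit: 0x10 } as const;

// Hypothetical shape describing a decoded vector header.
interface VectorHeader {
  dtype: number; // bytes[0]: what the following bytes are
  padding: number; // bytes[1]: bits to ignore at the end of a PackedBit vector
  data: Uint8Array; // everything after the two metadata bytes
}

// Hypothetical helper, not a bson API: splits the metadata from the payload.
function parseVectorHeader(bytes: Uint8Array): VectorHeader {
  if (bytes.length < 2) {
    throw new RangeError('a vector needs at least two metadata bytes');
  }
  const dtype = bytes[0];
  const padding = bytes[1];
  if (padding > 7) {
    throw new RangeError('padding must be a value 0-7');
  }
  // subarray creates a view, so no bytes are copied.
  return { dtype, padding, data: bytes.subarray(2) };
}

const header = parseVectorHeader(new Uint8Array([0x10, 4, 0xff, 0xf0]));
console.log(header.dtype === VECTOR_DTYPE.PackedBit, header.padding, header.data.length);
```

This only reads the layout described above; in application code the Binary helpers remain the recommended way to work with vectors.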

Packed Bits 📦

static fromPackedBits(array: Uint8Array, padding: number = 0)

When handling packed bits, the last byte may not be entirely used. For example, a PackedBit vector = [0xFF, 0xF0] with padding = 4 ignores the last four 0s, making the bit vector logically equal to 12 ones.

    F    F    F    0
[1111 1111 1111]   // ignored: the four 0s are padding

Important

When using the fromPackedBits method, be sure to set your padding amount to avoid inadvertently extending your bit vector.
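To see why the padding matters, here is a small sketch that computes how many bits a packed payload logically contains. The function logicalBitLength is a hypothetical helper for illustration only, not a bson API, and the rule that an empty vector cannot carry padding is an assumption.

```typescript
// Hypothetical helper: number of meaningful bits in a PackedBit payload
// (the bytes after the two metadata bytes).
function logicalBitLength(packed: Uint8Array, padding = 0): number {
  if (!Number.isInteger(padding) || padding < 0 || padding > 7) {
    throw new RangeError('padding must be an integer 0-7');
  }
  // Assumption: with no bytes there is nothing to ignore.
  if (packed.length === 0 && padding !== 0) {
    throw new RangeError('an empty vector cannot have padding');
  }
  return packed.length * 8 - padding;
}

const packed = new Uint8Array([0xff, 0xf0]);
console.log(logicalBitLength(packed, 4)); // 12: the trailing four 0s are ignored
console.log(logicalBitLength(packed)); // 16: the default padding of 0 keeps the four 0s
```

With the default padding of 0, the same two bytes describe a 16-bit vector whose last four bits are zero, which is the inadvertent extension the note above warns about.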

Unpacking Bits 🧳

Packed bits get special treatment with two styles of conversion methods to suit your vector-y needs. toBits will return individually addressable bits shifted apart into an array. fromBits takes the same format in reverse and packs the bits into bytes.

Notice that fromBits has no argument to set the padding. That is because the padding can be determined from the array's length. Recall the 12 ones from the previous example: the padding has to be 4 so that the total (12 + 4 = 16) reaches a multiple of 8.

class Binary {
  toBits(): Int8Array;
  static fromBits(bits: ArrayLike<number>): Binary;
}
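The relationship between the two formats can be sketched without the library. The functions unpackBits and packBits below are hypothetical stand-ins that mirror what toBits and fromBits are described as doing; they are not the bson implementation, and the most-significant-bit-first ordering is inferred from the [0xFF, 0xF0] example.

```typescript
// Hypothetical sketch: expand packed bytes into one array element per bit.
function unpackBits(packed: Uint8Array, padding: number): Int8Array {
  const bitCount = packed.length * 8 - padding;
  const bits = new Int8Array(bitCount);
  for (let i = 0; i < bitCount; i++) {
    const byte = packed[i >> 3]; // which byte holds bit i
    const shift = 7 - (i & 7); // most significant bit first
    bits[i] = (byte >> shift) & 1;
  }
  return bits;
}

// Hypothetical sketch: pack 0/1 values into bytes and derive the padding.
function packBits(bits: ArrayLike<number>): { packed: Uint8Array; padding: number } {
  const byteLength = Math.ceil(bits.length / 8);
  const packed = new Uint8Array(byteLength);
  for (let i = 0; i < bits.length; i++) {
    const bit = bits[i];
    if (bit !== 0 && bit !== 1) {
      throw new RangeError('bits must be 0 or 1');
    }
    packed[i >> 3] |= bit << (7 - (i & 7));
  }
  // The padding is whatever is needed to reach a multiple of 8.
  const padding = byteLength * 8 - bits.length;
  return { packed, padding };
}

const twelveOnes = new Array<number>(12).fill(1);
const result = packBits(twelveOnes);
console.log(Array.from(result.packed), result.padding);
console.log(unpackBits(result.packed, result.padding).length);
```

Packing 12 ones yields two bytes and a derived padding of 4, matching the earlier example, and unpacking those bytes with that padding returns the original 12 bits.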

Caution

We highly encourage using ONLY these methods to interact with vector data and avoiding direct operations on the byte format. Other Binary class methods (put(), write(), read(), and value()) and direct access to data in a Binary's buffer beyond the 1st index should only be used in exceptional circumstances and with extreme caution, after closely consulting the BSON Vector specification.

Details to keep in mind

  • A JavaScript engine's endianness is platform dependent, whereas BSON is always little-endian, so when viewing bytes as Float32s take care to re-order bytes as needed.
  • Int8 vectors are signed bytes but read() always returns unsigned bytes.
  • The vector data begins at offset 2.
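For the exceptional cases where the bytes must be read directly, the points above can be sketched as follows. The functions readFloat32Vector and readInt8Vector are hypothetical helpers for illustration, not bson APIs; they take the raw vector bytes, including the two metadata bytes.

```typescript
// The vector data begins after the two metadata bytes.
const VECTOR_DATA_OFFSET = 2;

// Hypothetical helper: decode Float32 values as little-endian on any platform.
function readFloat32Vector(bytes: Uint8Array): Float32Array {
  const payloadLength = bytes.length - VECTOR_DATA_OFFSET;
  if (payloadLength < 0 || payloadLength % 4 !== 0) {
    throw new RangeError('Float32 payload must be a multiple of 4 bytes');
  }
  // DataView lets us state the byte order explicitly instead of relying
  // on the platform's native endianness.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(payloadLength / 4);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(VECTOR_DATA_OFFSET + i * 4, true); // true = little-endian
  }
  return out;
}

// Hypothetical helper: reinterpret unsigned bytes as signed Int8 values.
function readInt8Vector(bytes: Uint8Array): Int8Array {
  if (bytes.length < VECTOR_DATA_OFFSET) {
    throw new RangeError('a vector needs at least two metadata bytes');
  }
  // Int8Array.from copies the data and wraps each value into -128..127.
  return Int8Array.from(bytes.subarray(VECTOR_DATA_OFFSET));
}

// 1.0 and -2.0 as little-endian Float32s, preceded by two metadata bytes.
const floatBytes = new Uint8Array([0x27, 0, 0x00, 0x00, 0x80, 0x3f, 0x00, 0x00, 0x00, 0xc0]);
console.log(Array.from(readFloat32Vector(floatBytes))); // [1, -2]

const int8Bytes = new Uint8Array([0x03, 0, 0xff, 0x80, 0x7f]);
console.log(Array.from(readInt8Vector(int8Bytes))); // [-1, -128, 127]
```

Using DataView with an explicit little-endian flag avoids constructing a Float32Array directly over the buffer, which would interpret the bytes in the platform's native order.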

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken force-pushed the NODE-6537-vector-api branch from adbf3e5 to 28faa17 Compare November 14, 2024 15:08
@nbbeeken nbbeeken force-pushed the NODE-6537-vector-api branch 4 times, most recently from e89e295 to 7c08ddf Compare November 15, 2024 04:37
@nbbeeken nbbeeken force-pushed the NODE-6537-vector-api branch 3 times, most recently from 1b9304d to 0ff7122 Compare November 15, 2024 15:56
@nbbeeken nbbeeken marked this pull request as ready for review November 15, 2024 16:47
@nbbeeken nbbeeken force-pushed the NODE-6537-vector-api branch 4 times, most recently from 8e70664 to e0572bc Compare November 15, 2024 19:56
Base automatically changed from NODE-6534-spec to main November 15, 2024 20:50
@nbbeeken nbbeeken force-pushed the NODE-6537-vector-api branch from e0572bc to af3f9cd Compare November 15, 2024 20:52
@nbbeeken nbbeeken added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Nov 18, 2024
@nbbeeken nbbeeken requested a review from addaleax November 18, 2024 19:54
addaleax
addaleax previously approved these changes Nov 18, 2024
@addaleax addaleax left a comment

No blocking concerns from me

@baileympearson baileympearson merged commit d7bdcec into main Nov 18, 2024
8 checks passed
@baileympearson baileympearson deleted the NODE-6537-vector-api branch November 18, 2024 23:05
3 participants