Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unaligned bit arrays on the JavaScript target #3946

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

richard-viney
Copy link
Contributor

@richard-viney richard-viney commented Dec 3, 2024

Summary 📘

This PR adds support for unaligned bit arrays on the JavaScript target.

In expressions:

  • Arbitrary sized integer segments:
    <<1:4>>
    <<12:9-little>>
    <<12:29-big>>
    <<1234:100-little>>
  • Arbitrary sized bits segments:
    <<<<0xABCD:15>>:bits-10>>

In patterns:

  • Arbitrary sized int segments:
    let assert <<_:7, i:19-little-signed>> = <<0xABCDEF12:26>>
  • Sized and unsized bits segments:
    let assert <<_:7, a:bits-3, b:size(14)-bits, c:bits>> = <<0xABCDEF:24, 0x1234:16>>

There is a compiler warning if the above features are used when gleam.toml specifies a version < v1.9.0.


Implementation Details 🛠️

  • The BitArray class in the prelude now has bitSize, byteSize, bitOffset, and rawBuffer properties.
    • The value of any unused high bits in the buffer's first byte, and any unused low bits in the buffer's final byte, are undefined.
    • Public API for use by FFI code is now: bitSize, byteSize, bitOffset, rawBuffer, byteAt(index).
    • Deprecated APIs still used by existing FFI code: get buffer(), get length(). Using these emits a deprecation warning at runtime, and throws an exception if the bit array is unaligned (i.e. bitOffset isn't zero or bitSize isn't a multiple of 8).
  • JSDoc annotations have been added to allow type-checking by adding // @ts-check to the top of the file.

Implications for @external JavaScript code 🌍

  • All existing JavaScript FFI code that operates on bit arrays needs to be updated to support unaligned bit arrays.
  • Code that isn't updated will:
    • Continue to work, provided it is only used with aligned bit arrays.
    • Emit deprecation warnings at runtime on use of BitArray.length and BitArray.buffer.
    • Throw a runtime exception if these deprecated APIs are called on a unaligned bit array.
  • No existing code breaks because unaligned bit arrays on JavaScript weren't previously possible.

Implications for gleam/stdlib 🤝

  • I have the updates for gleam/stdlib ready, mostly affecting gleam/bit_array. It can only be merged once this PR goes in as its tests won't run on stable Gleam. It may be necessary to run the new stdlib tests on nightly for a short period, with them segregated into their own file so they can be included/excluded depending on the active Gleam version. I'll sort that out once this PR makes it through review.
  • Future stdlib versions that support unaligned bit arrays on JavaScript work fine on earlier Gleam versions, there are no compatibility concerns there.

Testing 🧪

There's certainly some complexity and tricky bitwise operations here, mostly in the JavaScript prelude. The following has been done to ensure correctness:

  • Many new tests added to language_tests.gleam, and test/javascript_prelude.
  • Every path and branch through the code that performs slicing, concatenation, and conversion to ints/floats is covered by at least one test.
  • Extensive fuzzing has been performed on these operations which validated millions of combinations of bit array contents, segment sizes, offsets, endianness, signedness, etc. on JavaScript against the result on the Erlang target.
  • Issues found during fuzz testing were fixed and added to the language tests and prelude tests.

@richard-viney richard-viney marked this pull request as ready for review December 3, 2024 12:22
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from 2496b81 to bb69dd9 Compare December 5, 2024 01:45
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from b347126 to 9aa5446 Compare December 13, 2024 09:32
Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, what a fantastic bit of work! Thank you!

I've not digested these changes properly yet, but I have some initial questions from my first review.

There's a lot of public functions in the API of the bit array class now, but generally the aim is to have none. What is the motivation for having these?

Similar for the deprecated functions, we can remove them.

There's quite a lot of code in the class. Could we move them to free-standing functions and have the generated code only use them if absolutely necessary? That would help with JavaScript bundlers performing dead code elimination.

Some existing tests have been removed or changed, why is this? New features shouldn't alter existing tests, it makes it harder to review and to prevent regressions.

Thanks again!

compiler-core/src/javascript/expression.rs Outdated Show resolved Hide resolved
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 6 times, most recently from 544c327 to 0f6fbdb Compare December 23, 2024 00:38
@richard-viney
Copy link
Contributor Author

richard-viney commented Dec 23, 2024

Thanks for taking a look!

There's a lot of public functions in the API of the bit array class now, but generally the aim is to have none. What is the motivation for having these?

These have been reduced down to the following: bitSize, byteSize, and byteAt().

Similar for the deprecated functions, we can remove them.

I've removed all except the BitArray.length and BitArray.buffer` accessors, because removing these would break most/all JS FFI code that operates on bit arrays.

There's quite a lot of code in the class. Could we move them to free-standing functions and have the generated code only use them if absolutely necessary? That would help with JavaScript bundlers performing dead code elimination.

Done.

Some existing tests have been removed or changed, why is this? New features shouldn't alter existing tests, it makes it harder to review and to prevent regressions.

The diff on the tests looked a bit more complex in places than it actually was so I've moved some things around to make it easier to digest. Some existing tests did need to be tweaked or removed, e.g. those that were testing for compilation errors if unaligned bit arrays were used on the JS target.

@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 4 times, most recently from 9c7c0dc to 5561409 Compare December 29, 2024 22:30
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from da179df to 0847df6 Compare December 31, 2024 23:06
@richard-viney richard-viney marked this pull request as draft January 1, 2025 09:27
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 6 times, most recently from 40047c3 to 945ed06 Compare January 3, 2025 21:19
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch from 945ed06 to 7651919 Compare January 4, 2025 02:50
@richard-viney
Copy link
Contributor Author

richard-viney commented Jan 4, 2025

This PR has been updated to make slicing bit arrays an O(1) operation under all circumstances, matching Erlang's performance characteristics. The BitArray.bitOffset member was added to enable this.

The main write-up has been updated for this change.

@richard-viney richard-viney marked this pull request as ready for review January 4, 2025 02:53
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 6 times, most recently from 4988b97 to c23f5a4 Compare January 10, 2025 12:22
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 5 times, most recently from 9928d33 to 30be6f1 Compare January 23, 2025 23:39
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 2 times, most recently from 74a82c0 to 49e0009 Compare February 8, 2025 03:32
Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incredible work, thank you so much! Sorry for taking so long to get back to you here, I'm still playing catch-up after some family matters.

This all looks great but I have left one discussion point below RE JavaScript dead code elimination.

}

/** @internal */
sliceToInt(start, end, isBigEndian, isSigned) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods mean that the large functions will never be removed by a JS bundler, isn't that right? Given their large size we want it to be that when unused there's no references to these functions so they get removed during bundling and minification so Gleam users don't need to pay a price for a feature they are not using.

Copy link
Contributor Author

@richard-viney richard-viney Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ad8f1c0 restructures things such that the three slicing functions are removable by esbuild. It's an unrelated change to the rest of this PR as the previous slicing functions also weren't able to be removed, but yes they've gotten larger so it matters a bit more now.

There's still an increase in JS code size due to toBitArray() now being more complicated, the need for a BitArray.equals() function, the new deprecation warnings on access of BitArray.buffer and BitArray.length, etc. This code:

pub fn main() {
  <<>>
}

Now bundles and minifies to 2965 bytes (1095 bytes gzip'ed) whereas previously it was 1560 bytes (676 bytes gzip'ed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're the best 💜

@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from c89b876 to e60bc99 Compare February 10, 2025 00:44
Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! Thank you so much!!! I've one question inline RE unused JS functions

@@ -11,10 +11,10 @@ fn go(x) {


----- COMPILED JAVASCRIPT
import { makeError } from "../gleam.mjs";
import { makeError, bitArraySliceToInt } from "../gleam.mjs";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this gets imported? I can't see it used here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. It was happening because of the discarded integer segment.

@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch from b011d2d to 2516b0b Compare February 11, 2025 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants