Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge async APIs and generic reader/writer support #11

Merged
merged 9 commits into from
May 20, 2024
Merged

Conversation

rossnichols
Copy link
Owner

@rossnichols rossnichols commented May 12, 2024

This PR continues the work from @oratory in #4. The API surface is unchanged, but the following updates were made:

  • Revised the documentation for the new APIs
  • Fixed the new sample code and added to test suite
  • Made async an option and renamed yieldCheckFn to yieldCheck
  • Fixed handling of trailing nils for serialization input
  • Fixed test code interfering with NaN tests
  • Optimized creation of closures and temporary tables
  • Async serialization will now raise errors in the same way as synchronous serialization
  • Library are now compatible with Lua 5.4

This PR also merges in the generic reader/writer support from @Meorawr in #12. The API surface is again unchanged, but I revised the implementation a bit as I was combining it with the above work.

@rossnichols
Copy link
Owner Author

@Stanzilla @Nnoggie FYI - want to take a look before I submit?

This was referenced May 12, 2024
@Nnoggie
Copy link

Nnoggie commented May 15, 2024

Looked over it and tested it against our use case again.
Very happy with how it turned out also really liking the updated documentation.
LGTM!

@rossnichols
Copy link
Owner Author

Thanks @Nnoggie! Before merging it, I'm waiting for @Meorawr to prep his "generic reader/writer" support (#8, #10) so I can evaluate using that model for where to add the coroutine suspension points. I'll create a separate PR for that one when it's ready, and if you can perform your local validation against it as well, I'd appreciate it - it'd be good to know if there are any performance differences.

oratory and others added 2 commits May 16, 2024 11:47
* Add chunks mode for coroutine processing

* Requested updates

* Split sync/async functions, others

* Further updates
* Implement generic support for custom readers

This permits a new options table to be supplied to the Deserialize
and DeserializeValue functions on the library which may contain two
keys that control how the library reads bytes from the stream for
deserialization, and how it detects the end-of-stream state.

As-is this implementation is fully backwards compatible and allows
users of the library to throw processing into a coroutine for large
inputs if required, meeting some use cases that have prompted similar
PRs whilst also being more flexible and much simpler.

The new signatures for Deserialize and DeserializeValue are as follows:

  function LibSerialize:DeserializeValue(input[, options])
  function LibSerialize:Deserialize(input[, options])

The options parameter, if supplied, is expected to be a table that
may contain the following optional keys:

  - readBytes
  - atEnd

The "readBytes" key if present is expected to point to a function
of the signature "readBytes(input, i, j)" and must return a string
whose length is equal to (j - i) + 1). The "input" parameter is the
same as-supplied to Deserialize/DeserializeValue verbatim. The "i"
and "j" parameter represent an inclusive range of indices ([i, j])
typically as byte offsets within the input value. This convention is
identical to that provided by many Lua functions, such as string.sub,
table.concat, and unpack.

The "atEnd" key if present is expected to point to a function of the
signature "atEnd(input, i)". As with "readBytes", the "input" parameter
is whatever value was supplied to Deserialize/DeserializeValue, and the
"i" parameter represents the offset of the stream.

By default if neither of these keys are supplied, or no options table
is given, then the implementation will just default to using string.sub
for "readBytes" and for "atEnd" will test if the offset exceeds the
length of the input as-provided by the length operator.

Examples are provided in the test cases, where we test the following
scenarios:

  - A case where "input" may have additional data after the payload that
    we don't want deserializing.
  - A case where the "input" is a table of character bytes.
  - A case where we yield from within "readBytes" and feed data from the
    main thread, simulating receiving data with a potentially unknown
    bound.

* Implement support for custom writers

This mirrors recent changes submitted for the deserialization API and
provides a generic system for custom writer support to control how and
where the byte stream generated during serialization is written. The
change as-is is fully backwards compatible and passes the existing and
expanded test cases.

Primary use cases would align with existing async serialization attempts
but don't put the complexity of thread management or throttling logic
into the library, instead leaving the minutia of how those aspects are
handled to callers. It also enables some other advanced use cases such
as streaming the data to a destination without buffering it fully
beforehand, which could enable efficiencies elsewhere.
@rossnichols rossnichols marked this pull request as draft May 16, 2024 18:50
@rossnichols
Copy link
Owner Author

Ok I've combined #4 and #12 into this PR, but I still need to look at the proper way to combine them - this current iteration just keeps both implementations.

@rossnichols
Copy link
Owner Author

Update - I've moved the built-in support for async into the reader/writer layer, rather than the serialize/deserialize layer. It's still written so that someone using a custom reader/writer can take advantage of it. I still need to do a pass over the documentation and tests.

@rossnichols
Copy link
Owner Author

@Meorawr some of the updates I made here were inspired by your comment in your implementation:

    -- To minimize changes elsewhere with this initial implementation, this
    -- function must return new closures that bind the 'object' to the first
    -- parameter of the above functions. This could be optimized to remove the
    -- indirection, but requires modifying all call sites of these functions.

Do my changes align with how you envisioned the optimization? I still have a layer of indirection because the reader/writer calls are flowing through a specific function (Reader_X / Writer_X) before calling the "true" implementation, since that's where I put my hook that implements the "inbox" async coroutine yield. I tried to avoid usage of closures where possible, though.

@rossnichols rossnichols changed the title Add support for asynchronous serialization and deserialization Merge async APIs and generic reader/writer support May 16, 2024
@rossnichols rossnichols marked this pull request as ready for review May 17, 2024 05:04
@rossnichols
Copy link
Owner Author

Alright, the PR is ready for review! I've completed my merging of the two PRs and updated the tests and readme. @Nnoggie @Stanzilla mind validating one more time locally? @Meorawr any comments on my merging of your PR?

@rossnichols
Copy link
Owner Author

I think the PR's in a good enough shape to submit. Will keep an eye on any issues being opened.

@rossnichols rossnichols merged commit e121103 into main May 20, 2024
@Stanzilla
Copy link

Thank you for landing this, we'll report back if there are any issues!

@rossnichols rossnichols deleted the async-staging branch May 21, 2024 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants