Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async support #450

Merged
merged 6 commits into from
Aug 14, 2022
Merged

Async support #450

merged 6 commits into from
Aug 14, 2022

Conversation

Mingun
Copy link
Collaborator

@Mingun Mingun commented Jul 31, 2022

First of all, I would like to thank @999eagle for her work. It was very useful for quickly obtaining a working result and its iterative improvement.

After week of experimenting and prepare work I think, that the result suits me. I tried to focus on those points:

  • maximum reuse the code. For that all key methods rewritten using macros, so the sync and async code is the same
  • try to minimize intermediate wrappers, because it is painful to pass long named types to functions, like in that example:

    quick-xml/src/reader/mod.rs

    Lines 324 to 326 in 9ccc686

    ///
    /// fn into_line_and_column(reader: Reader<Cursor<&[u8]>>) -> (usize, usize) {
    /// let end_pos = reader.buffer_position();

I have considered several options

  • original suggestion with three intermediate structs: SliceReader, BufferedReader and AsyncReader:
    impl                  Reader<SliceReader> { ... }// for &[u8]
    impl<R: BufRead>      Reader<BufferedReader<R>> { ... }
    impl<R: AsyncBufRead> Reader<AsyncReader<R>> { ... }
    I have realized that we have enough only two of them therefore...
  • two intermediate newtype structs, SyncReader and AsyncReader
    impl<R>               Reader<SyncReader<R>> { ... }// for &[u8] and BufRead
    impl<R: AsyncBufRead> Reader<AsyncReader<R>> { ... }
    After further experiments, it became clear that we can do without SyncReader, so...
  • (this one) sync code stays the same. Async code is implemented using an adapter struct, which I've name TokioAdapter (former AsyncReader)
    impl                  Reader<&[u8]> { ... }
    impl<R: BufRead>      Reader<R> { ... }
    impl<R: AsyncBufRead> Reader<AsyncReader<R>> { ... }
  • do not introduce intermediate structs and just implement everything for Reader<R>
    impl                  Reader<&[u8]> { ... }
    impl<R: BufRead>      Reader<R> { ... }
    impl<R: AsyncBufRead> Reader<R> { ... }
    In this option the conflicts of names for internal methods would appear. In principle, there is nothing unsolvable, but if we decide to realize support for other asynchronous library, it will be perhaps not really convenient. However, my attempts to implement support for futures were unsuccessful -- API differs a little. Besides, there is nothing difficult in principle, but so far it seems that it is possible to do without it as as there are adapters

As it seemed, all realization differs slightly. I especially like the 4th option, as it eliminates the need for a long type if you need to pass Reader to the function. I think, I'll make also another PR with that variant.

I decided to not implement from_file for async reader, because it is required additional tokio feature and it seems that it can be easely done outside the crate.

Also, I did not implement read_text_into_async API, because it available not on all readers and I think, it should be reworked to return all intermediate notes as text as required for #257 and also was discussed in #154.

I'm not sure, whether we have to Send for inner type, as was originally implemented by @999eagle in #417, currently I do not require that. It seems, that compiler should automatically guarantee that when it can, but how to check that?

@999eagle, could you also review that?

Closes #417
Closes #425

@Mingun Mingun requested a review from dralley July 31, 2022 20:16
@codecov-commenter
Copy link

Codecov Report

Merging #450 (c00bc06) into master (9ccc686) will increase coverage by 0.16%.
The diff coverage is 78.40%.

@@            Coverage Diff             @@
##           master     #450      +/-   ##
==========================================
+ Coverage   51.33%   51.49%   +0.16%     
==========================================
  Files          28       29       +1     
  Lines       13312    13435     +123     
==========================================
+ Hits         6834     6919      +85     
- Misses       6478     6516      +38     
Flag Coverage Δ
unittests 51.49% <78.40%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
benches/macrobenches.rs 0.00% <ø> (ø)
benches/microbenches.rs 0.00% <ø> (ø)
examples/custom_entities.rs 0.00% <ø> (ø)
examples/nested_readers.rs 0.00% <ø> (ø)
examples/read_buffered.rs 0.00% <ø> (ø)
examples/read_texts.rs 0.00% <0.00%> (ø)
src/de/mod.rs 74.59% <ø> (+0.47%) ⬆️
src/events/mod.rs 68.60% <ø> (ø)
src/lib.rs 12.66% <0.00%> (+0.38%) ⬆️
src/se/mod.rs 93.81% <ø> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9ccc686...c00bc06. Read the comment docs.

benches/macrobenches.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
src/reader/ns_reader.rs Outdated Show resolved Hide resolved
@dralley
Copy link
Collaborator

dralley commented Aug 1, 2022

Will need a few days to fully review.. just leaving a few comments for now.

Copy link
Contributor

@999eagle 999eagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of this PR looks good, thank you very much for adapting my work!

I haven't yet compared the old and new code exactly to make 100% sure they match, but they seem to match. There are a few changes I'd like to see, most notably the macros being moved to their own module to improve readability and get the async implementation out of buffered_reader.rs.

src/reader/async_tokio.rs Outdated Show resolved Hide resolved
@@ -12,6 +12,216 @@ use crate::events::Event;
use crate::name::QName;
use crate::reader::{is_whitespace, BangType, ReadElementState, Reader, XmlSource};

macro_rules! impl_buffered_source {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro might be better in a separate module as it's used for both sync and async implementations of buffered IO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know... Having macro in this place first of all indicates that it implements a buffering reader. Because "buffering" trait of this macro seems to me more important, than "async" trait, it is defined here. Moving it into another module will break the explicit link with that usage place, as I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's your choice in the end. Keeping it here would work too.

src/reader/async_tokio.rs Outdated Show resolved Hide resolved
src/reader/async_tokio.rs Outdated Show resolved Hide resolved
src/reader/async_tokio.rs Outdated Show resolved Hide resolved
@@ -127,11 +127,152 @@ macro_rules! configure_methods {
};
}

macro_rules! read_event_impl {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these macros should be moved into a separate module (see also my comment in buffered_reader.rs).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of their definition there is that here is one usage place and those macros is very tied to functions where they used. I think it will be better to leave them here.

@@ -23,6 +24,8 @@ pretty_assertions = "1.2"
regex = "1"
serde = { version = "1.0", features = ["derive"] }
serde-value = "0.7"
tokio = { version = "1.20", default-features = false, features = ["macros", "rt"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't want to have support for from_file functions for the async reader, but I think we should still have at least one test which actually reads from a file, which would require the fs feature here. At least in dev-dependencies I think it's okay to have more features.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I do not see why we need to test actual reading from a file. I doubt that this test could catch something that is not catched by the test that read from memory. For example, we also does not have tests for sync reader that reads from a file (we have only two examples for that).

Changelog.md Outdated Show resolved Hide resolved
Changelog.md Outdated Show resolved Hide resolved
@Mingun Mingun mentioned this pull request Aug 3, 2022
18 tasks
@dralley
Copy link
Collaborator

dralley commented Aug 5, 2022

I really wish we could try out the internal buffering idea (no more read_event_into(), &str returned in all cases) before introducing all these macros, as I think there may be an opportunity there for macro-less deduplication. The macros do make it a lot harder to actually read and understand the code. However I'm also tied up with work again for the next week and a half though so I don't have much time to spend prototyping.

So I would say I'm -0, I won't block it but I'm not thrilled about that particular change to be honest.

@Mingun
Copy link
Collaborator Author

Mingun commented Aug 14, 2022

I really wish we could try out the internal buffering idea

I think that the idea to allow the user to manage the buffer with the event content as he wish has they own benefits. Actually, that approach very similar to how Read trait in the standard library is defined: you provide a buffer that will filled with some content.

So we could investigate that idea, and maybe even provide a special buffering reader for users that do not want to manage the buffer himself, but we should keep the current API. That API allows to apply the reader as an operator to a buffer and create events over it, which does not tied to the reader itself, so it can be even destroyed if it is not required anymore.


Since there are no fundamental objections, I think I will merge it in a few hours.

@Mingun Mingun merged commit 2bf2d2d into tafia:master Aug 14, 2022
@Mingun Mingun deleted the async branch August 14, 2022 15:19
@dralley
Copy link
Collaborator

dralley commented Sep 5, 2022

@999eagle Do you feel like this feature is living up to your expectations? Any issues with it in practice?

@999eagle
Copy link
Contributor

999eagle commented Sep 9, 2022

Hi @dralley, I just now got around to doing so, but I just published my crate using this feature to implement async xml deserialization with derive-macros.

The actual crate I'm working on with this is also coming along fine, so (at least for now) this is fulfilling my expectations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants