Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AsyncExtracter TODOs #170

Closed
4 tasks done
NobodyXu opened this issue Jun 9, 2022 · 3 comments · Fixed by #180
Closed
4 tasks done

AsyncExtracter TODOs #170

NobodyXu opened this issue Jun 9, 2022 · 3 comments · Fixed by #180
Labels
Report: enhancement Request for improvements to existing features or code

Comments

@NobodyXu
Copy link
Member

NobodyXu commented Jun 9, 2022

A summary of what could be/need to be improved:

  • Adjust the size of the mpsc::channel to keep the extracter busy while avoiding reading in too much (OOM) and gives feedback to the server. Fix could be dynamically decide the buffer based on fmt: PkgFmt since some extracters (PkgFmg::Bin and PkgFmt::Zip) do less work on the critical path and need bigger buffer size. Completely remove the mpsc channel, runs the downloader and extracter on the same thread. Refactor and optimize fetch_crate_cratesio #180
  • Exploring other channels, like single-producer single-consumer thread-safe fixed-size channel, since it is likely to be much faster than mpsc which requires synchronization between producers. Most likely to be premature optimization.
  • The aforementioned of improvement "the ZipArchiver can read bytes out of the file while the bytes are being appended into the file." We should request for the feature in upstream instead of coming up with homebrew solution Single-pass ZIP generation zip-rs/zip-old#16 Decompress from stream zip-rs/zip-old#314
  • Split API extract_archive_stream: Create a separate API for PkgFmt::Bin. Refactor and optimize fetch_crate_cratesio #180
@NobodyXu NobodyXu changed the title Improve AsyncExtracter TODO: Improve AsyncExtracter Jun 9, 2022
@NobodyXu NobodyXu changed the title TODO: Improve AsyncExtracter AsyncExtracter TODOs Jun 9, 2022
@NobodyXu
Copy link
Member Author

NobodyXu commented Jun 10, 2022

For thread-safe spsc, I found:

Both provides a sync interface but should be trivial to adapt them in async by using tokio::sync::Notify.

Edit:

Maybe this will be proved to be a premature optimization, so perhaps we should try the 1st TODO to adjust size of mpsc::channel before trying SPSC.

@passcod passcod added the Report: enhancement Request for improvements to existing features or code label Jun 10, 2022
@NobodyXu
Copy link
Member Author

I am thinking about running the downloader in the same blocking thread as the extracter using tokio::runtime::Handle.

This would have the following advantage:

  • Remove the mpsc channel, which:
    • Remove synchronization required for mpsc.
    • Remove the internal buffering of the mpsc channel, which avoid potentially OOM situation.
  • Improve data locality since it no longer needs to be sent over thread.

The disadvantages would be that the downloader can no longer be run in parallel to the extracter.

If the bottleneck is the decompressor, then the downloader should also pause and wait for the decompressor to consume the data.

But if the bottleneck is the network, then that might be an issue.

@passcod What's your thought on this?

@NobodyXu
Copy link
Member Author

NobodyXu commented Jun 17, 2022

Alternatively, we can do:

  • Set the size of the mpsc channel to a very small number, e.g. 4.
  • tokio::spawn the downloader so that it can be run in parallel with the extracter.
  • Use tokio::task::block_in_place to run the extracter on the core thread so that we don't have to spawn another thread.
  • Use tokio::runtime::Handle to poll tokio::sync::mpsc::Receiver::recv so that the core thread can still run async tasks if the data is not yet ready.

Compared to the previous approach I proposed, this preserve the parallelism between the downloader and the extracter, while reduce the internal buffer of mpsc and make it possible for the downloader and extracter to be run on the same thread (though not guaranteed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Report: enhancement Request for improvements to existing features or code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants