Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IAsyncEnumerable support to TransformManyBlock #49264

Merged
merged 2 commits into from
Jun 24, 2022

Conversation

manandre
Copy link
Contributor

@manandre manandre commented Mar 6, 2021

Contributes to #30863

cc @onionhammer @stephentoub

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Mar 6, 2021

Tagging subscribers to this area:
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #30863

cc @onionhammer @stephentoub

Author: manandre
Assignees: -
Labels:

area-System.Threading.Tasks, new-api-needs-documentation

Milestone: -

@jeffhandley
Copy link
Member

Tagging @eiriktsarpalis as he might be interested in reviewing this.

@jeffhandley
Copy link
Member

Tagging @adamsitnik, @carlossanlop, and @jozkee for reviewing this.

@jeffhandley
Copy link
Member

Pinging @adamsitnik, @carlossanlop, and @jozkee for a review on this.

@manandre manandre force-pushed the iasyncenumerable-full branch from 47bb2d9 to a9f0f80 Compare June 16, 2021 22:14
@terrajobst terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021
@jeffhandley
Copy link
Member

@adamsitnik This PR is assigned to you for follow-up/decision before the RC1 snap.

{
// Validate arguments.
if (transform == null) throw new ArgumentNullException(nameof(transform));
Initialize(messageId => ProcessMessageWithTask(transform, messageId), dataflowBlockOptions, ref _source!, ref _target!, ref _reorderingBuffer, async: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the benefits of changing private ctor to a method? You need to pass few more extra args, take references and use ! to pretend that fields which are nulls are not nulls. I don't like this part of the refactor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an attempt to solve an issue reported by @stephentoub during a previous review in #37876 (comment)
To duplicate code or not to duplicate code... that is the question!

@adamsitnik adamsitnik requested a review from stephentoub July 28, 2021 13:26
@adamsitnik
Copy link
Member

@stephentoub could you please review this PR as well? I don't feel confident signing off DataFlow PRs yet.

@adamsitnik adamsitnik added this to the 6.0.0 milestone Jul 28, 2021
@carlossanlop
Copy link
Member

@manandre The PR description says "contributes to" instead of "fixes". Does that mean there will still be some pending work to address after you merge this, or is this the last PR to fully address the issue?

@stephentoub
Copy link
Member

could you please review this PR as well?

Yes. I need to set aside some time for it... I'm still a bit hesitant around all the code duplication.

@manandre
Copy link
Contributor Author

@manandre The PR description says "contributes to" instead of "fixes". Does that mean there will still be some pending work to address after you merge this, or is this the last PR to fully address the issue?

It should be the last PR to complete the IAsyncEnumerable support in Dataflow.

@adamsitnik adamsitnik modified the milestones: 6.0.0, 7.0.0 Aug 17, 2021
@adamsitnik
Copy link
Member

FWIW I've tried to wrap my head around the additional duplication and the only idea I had was to use map Task<IEnumerable<TOutput>> to IAsyncEnumerable<TOutput> and use IAsyncEnumerable everywhere.

Pseudocode:

private async IAsyncEnumerable<TOutput> Map(Task<IEnumerable<TOutput>> oldWay)
{
    foreach (TOutput item in await oldWay.ConfigureAwait(false))
    {
        yield return item;
    }
}

The obvious disadvantage would be the perf hit, but the real blocker is that DataFlow targets monikers like net461 where IAsyncEnumerable does not exist.

Based on #30863 (comment) I've moved it to 7.0.

@jozkee jozkee self-assigned this Sep 23, 2021
@jozkee
Copy link
Member

jozkee commented Sep 27, 2021

@manandre any thoughts on the code duplication issue pointed out by @adamsitnik and @stephentoub?

@manandre manandre force-pushed the iasyncenumerable-full branch from 29606e9 to 08d0312 Compare September 27, 2021 21:21
Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manandre Let us know if you'd be able to try the approach @stephentoub described. Thanks!

@manandre
Copy link
Contributor Author

manandre commented Oct 21, 2021

If we want to implement the TransformManyBlock over its "IAsyncEnumerable-based transform" implementation, we will have to address multiple issues:

  • netstandard2.1 (minimum) is required, then support of net461 and netstandard2.0 TFMs must be dropped from the System.Threading.Tasks.Dataflow library. Is it possible?
  • the internal TargetCore class cannot be used without doing sync-over-async as its "callAction" callback is synchronous, and writing a new AsyncTargetCore would also lead to a lot of duplicated code :/
  • impacting the existing IEnumerable<T> and Task<IEnumerable<T>> implementations could introduce some significative performance regressions, what does not seem recommended on such a perf-critical framework like Dataflow...

I am still ready to contribute for the full support of IAsyncEnumerable in Dataflow, but, I would need some help to address these issues before starting any new implementation.

@buyaa-n
Copy link
Contributor

buyaa-n commented Feb 2, 2022

  • netstandard2.1 (minimum) is required, then support of net461 and netstandard2.0 TFMs must be dropped from the System.Threading.Tasks.Dataflow library. Is it possible?

Normally we would if-def the target platforms, but i guess that would cause again a code dup which we want to avoid, @joperezr mentioned we could not drop those TFMs, what we should do here @stephentoub?

  • the internal TargetCore class cannot be used without doing sync-over-async as its "callAction" callback is synchronous, and writing a new AsyncTargetCore would also lead to a lot of duplicated code :/

From the SslStream implementation that abstracting sync over async read and write that @stephentoub metnioned above doesn't require everything to be async for a real, for example:

public ValueTask<int> ReadAsync(Memory<byte> buffer) =>
new ValueTask<int>(_stream.Read(buffer.Span));
public ValueTask WriteAsync(byte[] buffer, int offset, int count)
{
_stream.Write(buffer, offset, count);
return default;
}

  • impacting the existing IEnumerable<T> and Task<IEnumerable<T>> implementations could introduce some significative performance regressions, what does not seem recommended on such a perf-critical framework like Dataflow...

For me that really depend on implementation and not have to inroduce perf regression, @manandre have you tried it and got some perf results?

@buyaa-n buyaa-n added the needs-author-action An issue or pull request that requires more info or actions from the author. label Feb 2, 2022
@ghost ghost added the no-recent-activity label Feb 16, 2022
@ghost
Copy link

ghost commented Feb 16, 2022

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@stephentoub
Copy link
Member

This is still on me to spend some time on and try to come up with a better approach.

@ghost ghost removed the no-recent-activity label Feb 16, 2022
Copy link
Contributor

@deeprobin deeprobin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I just added a few small code style notes and I would like to see some comments on the test flow so that the tests can be understood when roughly reading over them.

@ghost ghost added the no-recent-activity label Mar 11, 2022
@ghost
Copy link

ghost commented Mar 11, 2022

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@stephentoub stephentoub force-pushed the iasyncenumerable-full branch from 08d0312 to 8da7858 Compare March 24, 2022 02:20
@ghost ghost removed no-recent-activity needs-author-action An issue or pull request that requires more info or actions from the author. labels Mar 24, 2022
@deeprobin
Copy link
Contributor

@stephentoub Whats the current status of this?

@stephentoub
Copy link
Member

@deeprobin, why are you asking about status on this PR?

@deeprobin
Copy link
Contributor

Because I have planned to look at the DataFlow APIs in a private project and would like to consume this API.

@stephentoub
Copy link
Member

My plan is for it to be in .NET 7.

@stephentoub stephentoub force-pushed the iasyncenumerable-full branch from 8da7858 to 18aa3de Compare June 24, 2022 01:29
@stephentoub
Copy link
Member

Over the last few weeks I experimented with various approaches, but there's enough variation between the three IEnumerable<T>, Task<IEnumerable<T>>, and IAsyncEnumerable<T> code paths that it's difficult to share meaningfully. I fixed a few issues in the PR, e.g. we shouldn't be synchronously blocking in the IAsyncEnumerable<T> code path, we should be ensuring work is run on the right scheduler, etc. I also added some more tests.

@manandre, thank you very much for working on this, and thank you for your patience.

@stephentoub stephentoub merged commit 2a01ceb into dotnet:main Jun 24, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Threading.Tasks community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.