Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fixes and logging improvements #619

Closed
wants to merge 8 commits into from

Conversation

anushprabhu
Copy link

@anushprabhu anushprabhu commented Sep 13, 2018

Checklist

  • I have read the contribution guidelines.
  • I added or modified the existing tests to cover the change (we do not allow our test coverage to go down).
  • This pull-request is submitted against the master branch.

Description of the changes

  • Fixing bugs and improving logging in AMQP logs.

Reference/Link to the issue solved with this PR (if any)

@anushprabhu
Copy link
Author

@CIPop is added to the review. #Closed

@anushprabhu
Copy link
Author

@alextolp is added to the review. #Closed

@anushprabhu
Copy link
Author

@rajeevmv is added to the review. #Closed

@@ -9,6 +9,7 @@
<OutputPath>.\bin\$(Configuration)</OutputPath>
<DocumentationFile>$(OutputPath)\$(TargetFramework)\$(AssemblyName).xml</DocumentationFile>
<RootDir>$(MSBuildProjectDirectory)\..\..\..\..</RootDir>
<NoWarn>CA2007</NoWarn>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please! Thank you @mounicaArroju for pointing it out. (This is why the build was probably not failing anymore with the current code.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot have ConfigureAwait with DotNetty. See other comment.


In reply to: 217796093 [](ancestors = 217796093)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above: there is always an implicit configuration of the awaited operation. (Default is true.)
Default FXCop rules are making the ConfigureAwait usage mandatory: your semantic is identical to ConfigureAwait(true). I don't agree with that either since the control has already been returned to the dotNetty thread when the first await is encountered.

catch (Exception)
{
Abort();
catch (Exception)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never catch Exception without rethrowing. This needs to be somehow captured and marshalled to the caller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't rethrow b/c of the Task.Delay. Please let us know if there's a better way to do this


In reply to: 217763304 [](ancestors = 217763304)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - I don't understand the reason but this is a bad practice in general.

@@ -140,7 +146,7 @@ protected override async void DoBeginRead()
do
{
byteBuffer = allocHandle.Allocate(allocator);
allocHandle.LastBytesRead = await DoReadBytes(byteBuffer).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the build - ConfigureAwait is mandatory on all await statements.
Why are you changing the behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were seeing hangs in mqtt runners and the recommendation from Max was that we could never use ConfigureAwait with dot netty. After removing this, we never saw hangs again. We should avoid adding ConfigureAwaits in paths that use their own thread pool


In reply to: 217763495 [](ancestors = 217763495)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could never use ConfigureAwait with dot netty.

@alextolp ConfigureAwait is mandatory by FXCop rules. Not setting it will break the build. You can either set it to true (return on this context) or false (any context).

@nayato I don't understand the reasoning of not using any thread to continue the execution. Up to the first await statement the void callback will execute on the original thread. After that, the thread returns to dotNetty. Why spend the cycles to ensure that we're returning on that random thread at a future time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CIPop the whole point is that it is not a random thread. DotNetty overrides current task scheduler so if you don't opt-out with ConfigureAwait continuation will be posted to executor thread. Primary benefit of doing that is to avoid synchronization on data access and random issues associated with data races when we fail to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nayato - didn't know that! (Really cool!)
In that case the right code here would be .ConfigureAwait(true).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to have explicit .ConfigureAwait(true) as it is the default?


In reply to: 218609670 [](ancestors = 218609670)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, otherwise build will break due to the FXCop rule. Our code should be explicit instead of implicit. DotNetty is a special case as it has its own TaskScheduler. Most library code should always do .ConfigureAwait(false) to allow completions on the current thread and avoid a thread switch.


using (var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
Task.Run(async () =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never start a task that is not awaited by something. The result is not captured in this case and so the application will have no chance to get exceptions, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Christian - we observed hangs in mqtt client and the recommendation from Max was to add a 5s delay before closing. Could you recommend another way where the client can close but we don't block it for 5s?


In reply to: 217779370 [](ancestors = 217779370)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a connection - this handler is a void async which means it's not awaited by the caller anyway. We may need to lock to avoid calling CloseAsync from multiple threads.
Are you by any chance calling non-static methods on ProvisioningClient from multiple threads?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CIPop to give some context here: I don't know about 5 sec but the dispatch there is to avoid blocking executor thread. WS client tends to deadlock on simultaneous closure. Locking CloseAsync while running on executor thread will not buy us anything. That said, I don't see why we couldn't await on that task except that if CloseAsync deadlocks we would still want to make progress wherever we await it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alextolp, are you using .NET Framework or .NET Core when you're seeing the deadlock?

@nayato, according to MSDN Task.Run uses the default task scheduler (I read that as the current task scheduler and might be wrong). Wouldn't that if written like this, it would still run in the dotNetty executor loop?

WebSocketCloseStatus.NormalClosure,
string.Empty,
cancellationTokenSource.Token).ConfigureAwait(false);
using (var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagate the cancellation token passed by the public API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two cancellation tokens have different roles. The one passed by the public API is about the caller timeout.

This one is about timeout on ws.Close()

If caller timed out, we should not pass a cancelled token to Close() and let it happen anyway


In reply to: 217779575 [](ancestors = 217779575)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the operation token expired we should not attempt to do graceful WS termination. Instead we should reset the TCP connection.


if (Logging.IsEnabled) Logging.Info(this, $"{nameof(AmqpClientConnection)}.{nameof(OpenAsync)}: Sent Protocol Header: {_sentHeader.ToString()} writeAsyncResult: {writeAsyncResult} completedSynchronously: {args.CompletedSynchronously}");

if (args.CompletedSynchronously)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this fixes #552. Please remove all TODO and Ignore statements from the e2e tests.
/cc @abhipsaMisra @prmathur-microsoft we need to check this for IoT Hub client as well.

@@ -184,6 +193,8 @@ async Task<ClientWebSocket> CreateClientWebSocketAsync(Uri websocketUri, TimeSpa

private void OnWriteHeaderComplete(TransportAsyncCallbackArgs args)
{
if (Logging.IsEnabled) Logging.Enter(this, $"{nameof(AmqpClientConnection)}.{nameof(OnWriteHeaderComplete)}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enter is missing Exit. Please ensure that all Enter statements have an equivalent Exit.

@@ -9,6 +9,7 @@
<OutputPath>.\bin\$(Configuration)</OutputPath>
<DocumentationFile>$(OutputPath)\$(TargetFramework)\$(AssemblyName).xml</DocumentationFile>
<RootDir>$(MSBuildProjectDirectory)\..\..\..\..</RootDir>
<NoWarn>CA2007</NoWarn>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please! Thank you @mounicaArroju for pointing it out. (This is why the build was probably not failing anymore with the current code.)

try
{
if (message.Security is SecurityProviderX509)
{
security = (SecurityProviderX509)message.Security;
SecurityProviderX509 x509Security = (SecurityProviderX509)message.Security;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be incorrect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, doesn't look like it's needed


In reply to: 217852616 [](ancestors = 217852616)

@CIPop
Copy link
Member

CIPop commented Sep 14, 2018

Thanks @anushprabhu ! I left a few comments.
Please squash into a single commit as well.

Copy link
Member

@CIPop CIPop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alextolp.
There are still a few issues to address.


using (var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
Task.Run(async () =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a connection - this handler is a void async which means it's not awaited by the caller anyway. We may need to lock to avoid calling CloseAsync from multiple threads.
Are you by any chance calling non-static methods on ProvisioningClient from multiple threads?

WebSocketCloseStatus.NormalClosure,
string.Empty,
cancellationTokenSource.Token).ConfigureAwait(false);
using (var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the operation token expired we should not attempt to do graceful WS termination. Instead we should reset the TCP connection.

catch (Exception)
{
Abort();
catch (Exception)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - I don't understand the reason but this is a bad practice in general.

@@ -140,7 +146,7 @@ protected override async void DoBeginRead()
do
{
byteBuffer = allocHandle.Allocate(allocator);
allocHandle.LastBytesRead = await DoReadBytes(byteBuffer).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could never use ConfigureAwait with dot netty.

@alextolp ConfigureAwait is mandatory by FXCop rules. Not setting it will break the build. You can either set it to true (return on this context) or false (any context).

@nayato I don't understand the reasoning of not using any thread to continue the execution. Up to the first await statement the void callback will execute on the original thread. After that, the thread returns to dotNetty. Why spend the cycles to ensure that we're returning on that random thread at a future time?

@@ -9,6 +9,7 @@
<OutputPath>.\bin\$(Configuration)</OutputPath>
<DocumentationFile>$(OutputPath)\$(TargetFramework)\$(AssemblyName).xml</DocumentationFile>
<RootDir>$(MSBuildProjectDirectory)\..\..\..\..</RootDir>
<NoWarn>CA2007</NoWarn>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above: there is always an implicit configuration of the awaited operation. (Default is true.)
Default FXCop rules are making the ConfigureAwait usage mandatory: your semantic is identical to ConfigureAwait(true). I don't agree with that either since the control has already been returned to the dotNetty thread when the first await is encountered.

@CIPop CIPop self-assigned this Sep 18, 2018
@CIPop CIPop added the do-not-merge PR is not ready to be merged in. label Sep 18, 2018
@CIPop
Copy link
Member

CIPop commented Sep 18, 2018

Blocking merge: @prmathur-microsoft @tameraw @yzhong94 - we will not accept new PRs until we can enable all E2E tests.

  • For DPS changes: AMQP Provisioning tests hanging #552 appears to be addressed by this PR - please remove all associated [Ignore] statements (search code base for #552).
  • There is a new feature added for IoT Hub Service client that requires a new E2E test.

@CIPop CIPop mentioned this pull request Sep 19, 2018
2 tasks
@CIPop
Copy link
Member

CIPop commented Oct 5, 2018

@anushprabhu @alextolp I will cherry pick some of the fixes as well as do the ConfigureAwait into a separate PR.
Sorry about the wait - we're switching to a new engineering process which took a bit longer than expected.

@CIPop
Copy link
Member

CIPop commented Oct 11, 2018

@anushprabhu @alextolp - I cherry picked some of the items from here. Please rebase and submit a new PR for the IoT enhancement (we cannot accept that w/o a test).

@CIPop CIPop closed this Oct 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge PR is not ready to be merged in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants