Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeviceClient SetMethodDefaultHandlerAsync not invoked #1409

Closed
raza707mirza opened this issue Jun 18, 2020 · 15 comments
Closed

DeviceClient SetMethodDefaultHandlerAsync not invoked #1409

raza707mirza opened this issue Jun 18, 2020 · 15 comments
Assignees
Labels
bug Something isn't working. fix-checked-in Fix checked into main or preview, but not yet released.

Comments

@raza707mirza
Copy link

Describe the bug

DeviceClient SetMethodDefaultHandlerAsync handler is not triggered on internet disconnection instantly. It triggers after 15 to 20 minutes. Below are the logs

IoT Hub connection status Changed Status: Connected Reason: Connection_Ok Time: 3:09:15 PM +02
IoT Hub connection status Changed Status: Disconnected_Retrying Reason: Communication_Error Time: 3:26:29 PM +02

I disconnected the internet at 3:10:00 PM +02 and communication error was thrown after 16 minutes. I have created a sample code which reproduces the issue

Steps to reproduce

using Microsoft.Azure.Amqp.Transport;
using Microsoft.Azure.Devices.Client;
using Microsoft.Azure.Devices.Client.Transport.Mqtt;
using Microsoft.Azure.Devices.Shared;
using System;
using System.Threading;
using System.Threading.Tasks;

namespace IOTClientTest
{
    class Program
    {
        private DeviceClient _client;
        static async Task Main(string[] args)
        {
            
            await new Program().RunAsync(collection =>
            {
                return null;
            }, (new CancellationTokenSource()).Token);
            Console.Read();
        }
        
        public async Task RunAsync(Func<TwinCollection, Task<TwinCollection>> twinUpdateHandler, CancellationToken cancellationToken)
        {
            var settings = new ITransportSettings[]
            {
                new MqttTransportSettings(TransportType.Mqtt_Tcp_Only)
                {
                    KeepAliveInSeconds = 10,
                }
            };

            _client = DeviceClient.CreateFromConnectionString(
                "HostName=XXXX;SharedAccessKey=XXXXX",
                "XXXX", settings);
            
            
            var retryPolicy = new ExponentialBackoff(5,
                TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(120),
                TimeSpan.FromSeconds(5));
            _client.SetRetryPolicy(retryPolicy);

            
            _client.SetConnectionStatusChangesHandler((status, reason) =>
            {
                Console.WriteLine($"IoT Hub connection status Changed Status: {status} Reason: {reason} Time: {DateTime.Now.ToString("h:mm:ss tt zz")}");
            });
            
             await _client.OpenAsync();
            

            await _client.SetMethodHandlerAsync("executeShell", async (req, context) =>
            {
                await Task.Delay(0);
                return new MethodResponse(500);
            },null);

            await _client.SetMethodDefaultHandlerAsync(MethodHandler, null);
            
            await _client.SetDesiredPropertyUpdateCallbackAsync(async (collection, context) =>
                {
                    var updated = await twinUpdateHandler(collection);
                    await _client.UpdateReportedPropertiesAsync(updated);
                }
                , null);

            await ReceiveMessagesAsync(cancellationToken);
        }

        private async Task<MethodResponse> MethodHandler(MethodRequest methodRequest, object parameter)
        {
            await Task.Delay(0);
            return new MethodResponse(500);
        }

        private async Task ReceiveMessagesAsync(CancellationToken cancellationToken)
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                var message = await _client.ReceiveAsync(TimeSpan.FromSeconds(10));
                if (message != null)
                {
                    //Do something with received message...
                }
            }
        }
        
    }
}

Expected behavior

On internet disconnection, device client should change its status in not more than 10 seconds

Actual behavior

Device client throws status change after 16 minutes

Versions used

Add following information:

  • dotnet --info
    .NET Core SDK (reflecting any global.json):
    Version: 3.1.201
    Commit: b1768b4ae7

Runtime Environment:
OS Name: zorin
OS Version: 15
OS Platform: Linux
RID: linux-x64
Base Path: /usr/share/dotnet/sdk/3.1.201/

Host (useful for support):
Version: 3.1.3
Commit: 4a9f85e9f8

.NET Core SDKs installed:
2.2.402 [/usr/share/dotnet/sdk]
3.1.201 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
https://aka.ms/dotnet-download

Repository is available here: https://github.com/raza707mirza/iottest

@asergaz
Copy link
Contributor

asergaz commented Jun 23, 2020

@timtay-microsoft
Copy link
Member

Just to clarify this issue a bit, do you mean that the ConnectionStatusChangesHandler isn't calling back in a timely manner once you sever the connection? I'm not sure I follow why you would expect the MethodDefaultHandler to call back once connection to your hub is lost

@raza707mirza
Copy link
Author

The problem is not related to MethodDefaultHandler. The problem is that when my internet gets disconnected, handler associated with SetConnectionStatusChangesHandler gets triggered after 16 minutes. I should receive "disconnect" status change instantly or with in a short span of time.

  1. Device client connects with iot hub
  2. I disconnected internet cable.
  3. SetConnectionStatusChangesHandler gets triggered after 16 minutes with disconnect status

The issue is why 16 minutes? why not couple of seconds?

@asergaz
Copy link
Contributor

asergaz commented Jun 24, 2020

@raza707mirza
Copy link
Author

@timtay-microsoft see also some questions already asked in SO thread: https://stackoverflow.com/questions/62509807/deviceclient-setmethoddefaulthandlerasync-not-invoked-timely?noredirect=1#comment110634312_62509807

I posted this issue on stackoverflow

@timtay-microsoft
Copy link
Member

Do you see similar behavior from AMQP?

@raza707mirza
Copy link
Author

Yes I face same issue with AMQP

@aDisplayName
Copy link

aDisplayName commented Jul 6, 2020

We are facing the same problem when using Amqp TCP transportation. But with difference behavior between under Linux Host OS and Windows Host OS.

To investigate the problem shown in our production code, we created a test program, and let it run natively under both Windows and Ubuntu. The behavior matches what we observed in our production code.

The code is attached at the bottom, and is a modification from sample.
Dotnet Core SDK 3.1.301

Running under Windows 10 2004

Network (Wi-Fi or Ethernet with actual cable) is disabled after 07/06/2020 10:11:06, and recovered around 07/06/2020 10:11:37. Connection status callback has been correctly called at 07/06/2020 10:11:26, and 07/06/2020 10:11:39

Observation

The log shows the IoT API call behaves as expected when OS outgoing network is disconnected and reconnectd.

logs

info: iot_hub_console.Program[0]
      07/06/2020 10:11:02 Connection Changed to Connected, reason: Connection_Ok
info: iot_hub_console.Program[0]
      07/06/2020 10:11:02 > Sending message: {"messageId":0,"temperature":34.581885607718434,"humidity":66.23193476639312}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:04 > Sending message: {"messageId":1,"temperature":26.791552278116136,"humidity":70.45665901641206}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:05 > Sending message: {"messageId":2,"temperature":28.257923602712303,"humidity":75.41429636786427}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:06 > Sending message: {"messageId":3,"temperature":27.46112215214461,"humidity":76.32177038878285}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:26 Connection Changed to Disconnected, reason: Retry_Expired
fail: iot_hub_console.Program[0]
      07/06/2020 10:11:26 Sending failed with exception: Amqp resource is disconnected.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:27 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:28 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:29 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:30 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:32 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:33 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:34 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:35 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:36 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:37 IoT hub is not in connected status. Retry in 1 sec.
info: iot_hub_console.Program[0]
      07/06/2020 10:11:39 Connection Changed to Connected, reason: Connection_Ok
info: iot_hub_console.Program[0]
      07/06/2020 10:11:40 > Sending message: {"messageId":15,"temperature":34.620029092589405,"humidity":70.5172166649798}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:41 > Sending message: {"messageId":16,"temperature":23.161959963926094,"humidity":62.38569488860001}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:42 > Sending message: {"messageId":17,"temperature":20.15808975564227,"humidity":62.054706170249126}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:43 > Sending message: {"messageId":18,"temperature":33.06548762045125,"humidity":65.81292976895949}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:45 > Sending message: {"messageId":19,"temperature":20.34460033073304,"humidity":64.15884891718572}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:46 > Sending message: {"messageId":20,"temperature":34.995594464706066,"humidity":79.53893590697038}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:47 > Sending message: {"messageId":21,"temperature":26.345460105848247,"humidity":60.53315801570805}
info: iot_hub_console.Program[0]
      07/06/2020 10:11:48 > Sending message: {"messageId":22,"temperature":21.02706348571324,"humidity":76.4740033338191}

Running under Ubuntu 18.04

We simulated the network outage by enable/disable network interface. The same behavior has been observed in earlier by unplugging / plugging the network cable.

Observation:

  • The connection status was not detected and the status change call back was not called when network is disabled.
  • Each SendEventAsync call timed-out with exception thrown.
  • Each SendEventAsync call still timed-out with exception thrown, even after the network is re-enabled. (see log after 10:00:42)
  • The status change call back will be called around the 16~18 mins mark even the connection has already recovered. Later call to SendEventAsyc becomes normal.

Test

The following commands are executed to disable and enable network connection

usr:~$ date &&  sudo ifconfig eth01 down
Mon Jul  6 09:58:38 CST 2020
usr:~$ date && sudo ifconfig eth01 up
Mon Jul  6 10:00:42 CST 2020

Here is the log output

info: iot_hub_console.Program[0]
      07/06/2020 09:57:23 Connection Changed to Connected, reason: Connection_Ok
=======(... log removed .....)
info: iot_hub_console.Program[0]
      07/06/2020 09:58:22 > Sending message: {"messageId":48,"temperature":20.48416517930299,"humidity":65.84768619660646}
info: iot_hub_console.Program[0]
      07/06/2020 09:58:23 > Sending message: {"messageId":49,"temperature":23.276906108146957,"humidity":78.22200227446015}
info: iot_hub_console.Program[0]
      07/06/2020 09:58:24 > Sending message: {"messageId":50,"temperature":21.664010238211606,"humidity":77.38302766223579}
info: iot_hub_console.Program[0]
      07/06/2020 09:58:25 > Sending message: {"messageId":51,"temperature":22.193653319121175,"humidity":73.95748929770546}
=======(# network is disabled at 09:58:38)
fail: iot_hub_console.Program[0]
      07/06/2020 09:59:26 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:00:27 Sending failed with exception: The operation timed out.
=======(# network is re-enabled at 10:00:42)
fail: iot_hub_console.Program[0]
      07/06/2020 10:01:28 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:02:29 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:03:30 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:04:31 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:05:32 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:06:33 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:07:34 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:08:35 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:09:36 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:10:37 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:11:39 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:12:40 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:13:41 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:14:42 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:15:43 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:16:44 Sending failed with exception: The operation timed out.
fail: iot_hub_console.Program[0]
      07/06/2020 10:16:46 Sending failed with exception: Amqp resource is disconnected.
info: iot_hub_console.Program[0]
      07/06/2020 10:16:46 Connection Changed to Disconnected, reason: Retry_Expired
info: iot_hub_console.Program[0]
      07/06/2020 10:16:49 Connection Changed to Connected, reason: Connection_Ok
info: iot_hub_console.Program[0]
      07/06/2020 10:16:49 > Sending message: {"messageId":71,"temperature":26.16616177426938,"humidity":69.26580996684069}
info: iot_hub_console.Program[0]
      07/06/2020 10:16:50 > Sending message: {"messageId":72,"temperature":23.35750292677316,"humidity":77.706788274323}
info: iot_hub_console.Program[0]
      07/06/2020 10:16:51 > Sending message: {"messageId":73,"temperature":20.53248327483073,"humidity":74.95956457916627}
info: iot_hub_console.Program[0]
      07/06/2020 10:16:52 > Sending message: {"messageId":74,"temperature":33.129898879271884,"humidity":77.1627867394885}
info: iot_hub_console.Program[0]
      07/06/2020 10:16:53 > Sending message: {"messageId":75,"temperature":21.88209333311864,"humidity":65.81818129206923}


Testing code

The code is attached at the bottom, and is a modification from sample.

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client; // 1.26.0
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

namespace iot_hub_console
{
    class Program
    {
    // Comandline: --DeviceConnectionString "<Actual Connection String"
        static async Task Main(string[] args)
        {
            var accessLock = new object();
            var iotStatus = ConnectionStatus.Disconnected;
            var webConfig = new ConfigurationBuilder()
                .AddEnvironmentVariables()
                .AddCommandLine(args)
                .Build();
            using var loggerFactory = LoggerFactory.Create(builder => { builder.AddConsole(); });
            var logger = loggerFactory.CreateLogger<Program>();
            var messageId = 0;

            var amqpTransportSettings = new AmqpTransportSettings(TransportType.Amqp_Tcp_Only)
            {
                OpenTimeout = TimeSpan.FromSeconds(60), 
                OperationTimeout =
                    TimeSpan.FromSeconds(60), 
                AmqpConnectionPoolSettings = new AmqpConnectionPoolSettings
                {
                    MaxPoolSize = 3,
                    Pooling = true
                }
            };
            var deviceClient = DeviceClient.CreateFromConnectionString(webConfig["DeviceConnectionString"],
                new ITransportSettings[] {amqpTransportSettings});

            void StatusChangedHandler(
                ConnectionStatus status,
                ConnectionStatusChangeReason reason)
            {
                lock (accessLock)
                {
                    iotStatus = status;
                }
                logger.LogInformation("{0} Connection Changed to {1}, reason: {2}"
                    , DateTime.Now
                    , status
                    , reason);
            }

            deviceClient.SetConnectionStatusChangesHandler(StatusChangedHandler);
            deviceClient.SetRetryPolicy(new NoRetry());
            var rand = new Random();
            try
            {
                while (true)
                {
                    ConnectionStatus currentStatus;
                    lock (accessLock)
                    {
                        currentStatus = iotStatus;
                    }

                    try
                    {
                        var currentTemperature = 20 + rand.NextDouble() * 15;
                        var currentHumidity = 60 + rand.NextDouble() * 20;

                        var telemetryDataPoint = new
                        {
                            messageId = messageId++,
                            temperature = currentTemperature,
                            humidity = currentHumidity
                        };
                        var messageString = JsonConvert.SerializeObject(telemetryDataPoint);
                        var message = new Message(Encoding.ASCII.GetBytes(messageString));
                        message.Properties.Add("temperatureAlert", (currentTemperature > 30) ? "true" : "false");

                        await deviceClient.SendEventAsync(message);
                        logger.LogInformation("{0} > Sending message: {1}", DateTime.Now, messageString);
                    }
                    catch (Exception e)
                    {
                        if (currentStatus != ConnectionStatus.Connected)
                        {
                            logger.LogInformation("{0} IoT hub is not in connected status. Retry in 1 sec.",
                                DateTime.Now);
                        }
                        else
                        {
                            logger.LogError("{0} Sending failed with exception: {1}", DateTime.Now, e.Message);
                        }
                    }

                    await Task.Delay(1000);
                }

            }
            catch (Exception e)
            {
                logger.LogError(e, "{0} Unhandled Exception", DateTime.Now);
            }
        }
    }
}

@aDisplayName
Copy link

tag @xinchen10, since it might be related to azure-amqp

@abhipsaMisra abhipsaMisra self-assigned this Jul 17, 2020
@abhipsaMisra
Copy link
Member

To update a couple of things: the SDK is relying on the OS TCP stack to inform that the disconnect has happened, and the OS can take a couple of retries before relaying this information. This might be what is causing the connection status change handler to get invoked with a 15min delay on Linux.

For Mqtt, the client does send a ping request every 75 seconds, but does not seem to be monitoring the ping response being received from the broker. I am analyzing if we should be adding this additional check for monitoring the connection status change.

@abhipsaMisra
Copy link
Member

I've added a fix for the mqtt layer, where the sdk now monitors for a ping response and disconnects if the delay between sending a ping request and receiving a response is >30secs (this value is currently not configurable).
@raza707mirza Could you give it a try, and see if this resolves your issue.

For the amqp implementation, the amqp library encapsulates this ping request-response logic from us; all that the device sdk does is set the IdleTimeout. If it doesn't work with that setting, @xinchen10 might be able to provide some pointers on that.

@raza707mirza
Copy link
Author

@abhipsaMisra, I tested it with mqtt and it is working as expected and it resolves my issue.

@abhipsaMisra
Copy link
Member

This fix has now been released: https://github.com/Azure/azure-iot-sdk-csharp/releases/tag/lts_2020-8-19

@az-iot-builder-01
Copy link
Contributor

@timtay-microsoft, @raza707mirza, @aDisplayName, @abhipsaMisra, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey

@asergaz
Copy link
Contributor

asergaz commented Aug 31, 2020

@raza707mirza please mark the answer on Stack Overflow as well. Thank you so much! https://stackoverflow.com/questions/62509807/deviceclient-setmethoddefaulthandlerasync-not-invoked-timely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working. fix-checked-in Fix checked into main or preview, but not yet released.
Projects
None yet
Development

No branches or pull requests

6 participants