Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timed out waiting for device to connect #98

Closed
mitkodev opened this issue Jul 30, 2018 · 21 comments
Closed

Timed out waiting for device to connect #98

mitkodev opened this issue Jul 30, 2018 · 21 comments

Comments

@mitkodev
Copy link

I have an iot edge device that has been running since Thursday, but at some point during the weekend it lost connection to the Iot Hub.
Currently when I try to invoke a direct method on the device, I receive:

Device {"Message":"{\"errorCode\":404103,\"trackingId\":\"6605cb0c73b448b8b9031181e3655112-G:17-TimeStamp:07/30/2018 15:01:49\",\"message\":\"Timed out waiting for device to connect.\",\"info\":{\"timeout\":\"00:00:00\"},\"timestampUtc\":\"2018-07-30T15:01:49.2598589Z\"}","ExceptionMessage":""} not registered

In the azure portal, the device information states that:

Warning: Your device is disconnected from this IoT Hub. The information displayed below is not up to date. You can go to the device twin to see the last update time. If you have not set up your IoT Edge device, please follow IoT Edge QuickStart instructions. If you have already set up your device, please check device connectivity

the edge runtime response is N/A and the last twin updated properties are from Thursday.

The iot edge runtime was started in debug mode, just to check this issue but at the moment it is impossible to get the log from the edgeAgent, because it is endless.

How do I debug this issue? Is there any timeouts or closing connections after a certain amount of time?
If i restart the edge runtime on the device, everything will be fine and it will work again, but I want to get to the cause of this.

Any information will be appreciated.

@varunpuranik
Copy link
Contributor

Are you trying to invoke a method on a module on the device, or on the device itself? Because, if you have IoT Edge runtime running on an edge device, nothing actually connects to IoTHub as the device itself. It is just the modules deployed on the device that connect to IoT Hub.
So if you try to invoke a method on the device, then the request is expected to time out.
On the other hand, if you have a module that accepts method requests and responds to it, you should be able to invoke a method on it. But you will have to do it programmatically, as I believe the portal doesn't support that yet.

But you mention it worked fine initially, so was curious what is receiving those method invocations.

As for the logs, if there are too many, you can use the --tail option to get only the last few logs, like so -
$>iotedge logs --tail 1000

The connection should not close after a period of time. But we have seen issues where modules using MQTT lose connections to EdgeHub after a while. So if you are using MQTT in your modules, can you try switching to AMQP? That should make it more reliable.

@mitkodev
Copy link
Author

I am using the c# sdk to invoke a method on a device module, and everything is working as expected, but after some time it looks like the device module loses its connection to the iot hub and i get the above mentioned exception.

I've just switched to AMQP and will check what's happening.

            var amqpTransportSettings = new AmqpTransportSettings(TransportType.Amqp_Tcp_Only);
            ITransportSettings[] settings = { amqpTransportSettings };
            var ioTHubModuleClient = await ModuleClient.CreateFromEnvironmentAsync(settings);
            await ioTHubModuleClient.OpenAsync();
            logger.Information("IoT Hub module client initialized.");

@varunpuranik
Copy link
Contributor

I see, in that case it might be related to this SDK issue - Azure/azure-iot-sdk-csharp#558

Switching to AMQP should help.

@mitkodev
Copy link
Author

mitkodev commented Aug 2, 2018

Unfortunately it didn't solve the issue,

I've switched to AMQP yesterday and just left the module idle for 24 hours. I've just tried now to invoke a direct method on the device module and got the same exception:

DeviceNotFoundException: Device {"Message":"{\"errorCode\":404103,\"trackingId\":\"446c79c120e948c6ae19daf905b4ee84-G:1-TimeStamp:08/02/2018 07:38:11\",\"message\":\"Timed out waiting for device to connect.\",\"timestampUtc\":\"2018-08-02T07:38:11.6963858Z\"}","ExceptionMessage":""} not registered

I've seen similar issues just opened, so I'm going to open a ticket directly to Azure team through my subscription, because this is not acceptable.

@jason-e-gross
Copy link

jason-e-gross commented Aug 2, 2018 via email

@pavele
Copy link

pavele commented Aug 7, 2018

This is a very serious problem and still no progress on it. How long it will take to have a look at the most critical part of the Iot? This is definitely similar to this one:
Azure/azure-iot-sdk-csharp#558 (comment)

@varunpuranik
Copy link
Contributor

@ddimitrov90 One question - Is your module only receiving methods, or also sending/receiving telemetry?

@jason-e-gross - The AMQP disconnect issue - we haven't seen it so far. Can you describe your scenario a bit? Again, is the module just opening the connection, or is it also doing anything (sending telemetry, etc?). Do you have a consistent repro?

@mitkodev
Copy link
Author

mitkodev commented Aug 9, 2018

@varunpuranik the edge device is constantly talking with the iot hub by sending telemetry and also constantly receiving information from the cloud by method invocations.

here is a graph from the last 2 days. Every minute there is a method invocation and if the invocation is successful, the edge device will report back to the cloud that it is alive. You can see the graph going to 0 in early Wednesday morning - for no reason, we've started receiving DeviceNotFound exception, when trying to invoke the method. The resolution to the problem was to restart the iot edge runtime in the morning, when I got to work.

obustats

@jason-e-gross
Copy link

jason-e-gross commented Aug 9, 2018 via email

@varunpuranik
Copy link
Contributor

@ddimitrov90 - I am aware of the behavior of the Edge runtime and I am also aware of the methods link being dropped issue - we are actively looking into it
The reason I asked was to help with the investigation, since the EdgeAgent only receives methods and twins (sends no telemetry). So wanted to know if your module was doing something similar, or was it losing the methods link in spite of continuously sending telemetry.

@jason-e-gross I see, so basically you are getting the ConnectionStatusChangesHandler callback after 4 mins of inactivity, a status and reason of disabled/client closed. We have seen odd behavior from the ConnectionStatusChangesHandler before (where it would report incorrect status), which is why we don't use it in the Edge runtime code. In any case, I will follow up on this and get back to you.

@jason-e-gross
Copy link

jason-e-gross commented Aug 9, 2018

Here's what I see:

08/09/2018 17:41:56 | INIT: Creating ModuleClient from EnvironmentAsync
08/09/2018 17:41:57 | INIT: OpenAsync Started
{"Id":1,"Exception":null,"Status":5,"IsCanceled":false,"IsCompleted":true,"IsCompletedSuccessfully":true,"CreationOptions":0,"AsyncState":null,"IsFaulted":false}
08/09/2018 17:42:07 | INIT: OpenAsync Completed
08/09/2018 17:42:11 | INIT: DirectMethod handler enabled.
08/09/2018 17:42:11 | INIT: Desired Propertes handler enabled.
08/09/2018 17:42:11 | INIT: EventMessage Input Handler created on all incoming (catch-all)
08/09/2018 17:42:12 | INIT: Reported Property Set (Status:Ready)
08/09/2018 17:42:12 | -------------------------------------------------------------------------
08/09/2018 17:42:12 | MAIN: **BOOTUP COMPLETE** -- READY AND WAITING
08/09/2018 17:42:12 | --------------------------------------------------------------------------
08/09/2018 17:45:57 | WARNING: DETECTED CONNECTION CHANGE [status:Disabled, reason:Client_Close]
08/09/2018 17:45:57 | ATTEMPT TO RECONNECT SUCCEEDED - BACK IN BUSINESS

The two to notice are:
08/09/2018 17:42:07 | INIT: OpenAsync Completed
and
08/09/2018 17:45:57 | WARNING: DETECTED CONNECTION CHANGE [status:Disabled, reason:Client_Close]

The thing is -- if I don't reconnect, it's right and proper disconnected - it doesn't receive any events or direct-methods. With every module, it's right on the 4 minute money mark (or near enough - like above, 3:57). With each time - it's always Disabled/Client-Close. These modules spin up, do a bit of work - add event delegates, then sit quiet waiting for events 99.9% of the time.

There's nothing in docker logs -f edgeAgent/edgeHub at the same time that would show anything regarding the disconnect. Doing the reconnect on catching it -- solves the issue - and it doesn't happen again - so ... eh, no biggy. I'll trade this easy way to handle it vs MQTT's never-connect. =)

@mitkodev
Copy link
Author

mitkodev commented Aug 23, 2018

EDIT: i'm just providing more info, so may be we can find a workaround, i'm not expecting that it is already fixed :)

I've updated the runtime to 1.0.1 and the system modules to 1.0.1 and still got the same problem. I'm using MQTT. You can check the logs that I have.
At 12:18 I've stopped the internet of the device.
At 12:30 I've turned it back on.

The device messages going to the iot hub are working, however the device method invocation is still falling and it won't fix until i restart the runtime :(

DeviceNotFoundException: Device {"Message":"{\"errorCode\":404103,\"trackingId\":\"01ef177c1ee743f69bb764df597fd993-G:19-TimeStamp:08/23/2018 12:46:37\",\"message\":\"Timed out waiting for device to connect.\",\"timestampUtc\":\"2018-08-23T12:46:37.1307506Z\"}","ExceptionMessage":""} not registered

Here is the log, the device module that holds the callback for the device method is in localstorage

2018-08-23 12:18:31.040 +00:00 [INF] - Exiting connected state
2018-08-23 12:18:31.045 +00:00 [INF] - Entering unreachable state
2018-08-23 12:19:31.064 +00:00 [INF] - Entering disconnected state
2018-08-23 12:22:37.001 +00:00 [INF] - Existing token not found for testobu/obucommunication. Getting new token from the client...
2018-08-23 12:22:37.001 +00:00 [WRN] - Closing connection for device: testobu/obucommunication, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device testobu/obucommunication., 
2018-08-23 12:22:37.001 +00:00 [INF] - Disposing MessagingServiceClient for device Id testobu/obucommunication because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device testobu/obucommunication.
2018-08-23 12:22:37.001 +00:00 [INF] - Closing device proxy for device Id testobu/obucommunication
2018-08-23 12:22:37.001 +00:00 [INF] - Device connection removed for device testobu/obucommunication
2018-08-23 12:22:38.038 +00:00 [INF] - Obtained new token for client testobu/obucommunication that expires in 00:59:59.9613605
2018-08-23 12:22:38.038 +00:00 [INF] - New cloud connection created for device testobu/obucommunication
2018-08-23 12:22:38.038 +00:00 [INF] - Exiting disconnected state
2018-08-23 12:22:38.039 +00:00 [INF] - Entering connected state
2018-08-23 12:22:38.040 +00:00 [INF] - Successfully generated identity for clientId testobu/obucommunication and username ubuntumitko/testobu/obucommunication/?api-version=2018-06-30&DeviceClientType=Microsoft.Azure.Devices.Client%2F1.18.0%20%28.NET%20Core%204.6.26725.06%3B%20Linux%204.15.0-29-generic%20%2331~16.04.1-Ubuntu%20SMP%20Wed%20Jul%2018%2008%3A54%3A04%20UTC%202018%3B%20X64%29
2018-08-23 12:22:38.040 +00:00 [INF] - ClientAuthenticated, testobu/obucommunication, 45b8ee17
2018-08-23 12:22:38.040 +00:00 [INF] - New device connection for device testobu/obucommunication
2018-08-23 12:22:38.040 +00:00 [INF] - Bind device proxy for device testobu/obucommunication
2018-08-23 12:22:38.040 +00:00 [INF] - Binding message channel for device Id testobu/obucommunication
2018-08-23 12:22:38.041 +00:00 [INF] - Set subscriptions from session state for testobu/obucommunication
2018-08-23 12:22:46.201 +00:00 [INF] - Exiting connected state
2018-08-23 12:22:46.201 +00:00 [INF] - Entering unreachable state
2018-08-23 12:23:37.018 +00:00 [INF] - Entering disconnected state
2018-08-23 12:23:37.022 +00:00 [INF] - Updating reported properties for testobu/$edgeHub in cloud failed with error System.TimeoutException Operation timeout expired.
2018-08-23 12:24:37.043 +00:00 [WRN] - Error in connection established callback for client testobu/$edgeHub - System.TimeoutException Operation timeout expired.
2018-08-23 12:25:27.291 +00:00 [INF] - Starting compaction of stores
2018-08-23 12:25:27.293 +00:00 [INF] - Starting compaction of store messages
2018-08-23 12:25:27.328 +00:00 [INF] - Starting compaction of store iothub
2018-08-23 12:25:27.340 +00:00 [INF] - Starting compaction of store checkpoints
2018-08-23 12:25:27.369 +00:00 [INF] - Starting compaction of store default
2018-08-23 12:25:27.369 +00:00 [INF] - Starting compaction of store testobu/obucommunication/activate
2018-08-23 12:25:27.369 +00:00 [INF] - Starting compaction of store sessions
2018-08-23 12:25:27.392 +00:00 [INF] - Starting compaction of store testobu/devicesetup/validatorsRegistration
2018-08-23 12:25:27.418 +00:00 [INF] - Starting compaction of store twins
2018-08-23 12:25:37.043 +00:00 [WRN] - Error in connection established callback for client testobu/obucommunication - System.TimeoutException Operation timeout expired.
2018-08-23 12:26:37.051 +00:00 [WRN] - Error in connection established callback for client testobu/devicesetup - System.TimeoutException Operation timeout expired.
2018-08-23 12:27:37.058 +00:00 [WRN] - Error in connection established callback for client testobu/localstorage - System.TimeoutException Operation timeout expired.
2018-08-23 12:28:37.060 +00:00 [WRN] - Error in connection established callback for client testobu/tracking - System.TimeoutException Operation timeout expired.
2018-08-23 12:29:57.444 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
2018-08-23 12:29:57.457 +00:00 [INF] - Cleaned up 38 messages from queue for endpoint iothub and 38 messages from message store.
2018-08-23 12:30:27.462 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint testobu/obucommunication/activate
2018-08-23 12:30:27.462 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint testobu/obucommunication/activate and 0 messages from message store.
2018-08-23 12:30:55.360 +00:00 [INF] - Exiting disconnected state
2018-08-23 12:30:55.360 +00:00 [INF] - Entering connected state
2018-08-23 12:30:55.694 +00:00 [INF] - Synced cloud's reported properties at version 0 with edge for testobu/$edgeHub
2018-08-23 12:30:57.465 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint testobu/devicesetup/validatorsRegistration
2018-08-23 12:30:57.466 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint testobu/devicesetup/validatorsRegistration and 0 messages from message store.
2018-08-23 12:34:54.000 +00:00 [INF] - Existing token not found for testobu/devicesetup. Getting new token from the client...
2018-08-23 12:34:54.000 +00:00 [WRN] - Closing connection for device: testobu/devicesetup, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device testobu/devicesetup., 
2018-08-23 12:34:54.000 +00:00 [INF] - Disposing MessagingServiceClient for device Id testobu/devicesetup because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device testobu/devicesetup.
2018-08-23 12:34:54.000 +00:00 [INF] - Closing device proxy for device Id testobu/devicesetup
2018-08-23 12:34:54.000 +00:00 [INF] - Device connection removed for device testobu/devicesetup
2018-08-23 12:34:54.076 +00:00 [INF] - Obtained new token for client testobu/devicesetup that expires in 00:59:59.9236266
2018-08-23 12:34:54.076 +00:00 [INF] - New cloud connection created for device testobu/devicesetup
2018-08-23 12:34:54.076 +00:00 [INF] - Successfully generated identity for clientId testobu/devicesetup and username ubuntumitko/testobu/devicesetup/?api-version=2018-06-30&DeviceClientType=Microsoft.Azure.Devices.Client%2F1.18.0%20%28.NET%20Core%204.6.26725.06%3B%20Linux%204.15.0-29-generic%20%2331~16.04.1-Ubuntu%20SMP%20Wed%20Jul%2018%2008%3A54%3A04%20UTC%202018%3B%20X64%29
2018-08-23 12:34:54.076 +00:00 [INF] - ClientAuthenticated, testobu/devicesetup, 3f6b23ed
2018-08-23 12:34:54.077 +00:00 [INF] - New device connection for device testobu/devicesetup
2018-08-23 12:34:54.078 +00:00 [INF] - Bind device proxy for device testobu/devicesetup
2018-08-23 12:34:54.078 +00:00 [INF] - Binding message channel for device Id testobu/devicesetup
2018-08-23 12:34:54.081 +00:00 [INF] - Set subscriptions from session state for testobu/devicesetup
2018-08-23 12:34:58.002 +00:00 [INF] - Existing token not found for testobu/localstorage. Getting new token from the client...
2018-08-23 12:34:58.002 +00:00 [WRN] - Closing connection for device: testobu/localstorage, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device testobu/localstorage., 
2018-08-23 12:34:58.002 +00:00 [INF] - Disposing MessagingServiceClient for device Id testobu/localstorage because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device testobu/localstorage.
2018-08-23 12:34:58.004 +00:00 [INF] - Closing device proxy for device Id testobu/localstorage
2018-08-23 12:34:58.004 +00:00 [INF] - Device connection removed for device testobu/localstorage
2018-08-23 12:34:59.022 +00:00 [INF] - Obtained new token for client testobu/localstorage that expires in 00:59:59.9771352
2018-08-23 12:34:59.022 +00:00 [INF] - New cloud connection created for device testobu/localstorage
2018-08-23 12:34:59.023 +00:00 [INF] - Successfully generated identity for clientId testobu/localstorage and username ubuntumitko/testobu/localstorage/?api-version=2018-06-30&DeviceClientType=Microsoft.Azure.Devices.Client%2F1.18.0%20%28.NET%20Core%204.6.26725.06%3B%20Linux%204.15.0-29-generic%20%2331~16.04.1-Ubuntu%20SMP%20Wed%20Jul%2018%2008%3A54%3A04%20UTC%202018%3B%20X64%29
2018-08-23 12:34:59.023 +00:00 [INF] - ClientAuthenticated, testobu/localstorage, 1edc3653
2018-08-23 12:34:59.023 +00:00 [INF] - New device connection for device testobu/localstorage
2018-08-23 12:34:59.024 +00:00 [INF] - Bind device proxy for device testobu/localstorage
2018-08-23 12:34:59.024 +00:00 [INF] - Binding message channel for device Id testobu/localstorage
2018-08-23 12:34:59.025 +00:00 [INF] - Set subscriptions from session state for testobu/localstorage

Is it possible to attach a handler for such disconnect events and register again the method callback?

is this ever called - turn off/on the network and nothing was logged in the console.

ioTHubModuleClient.SetConnectionStatusChangesHandler((status, reason) =>
            {
                Console.WriteLine(status);
                Console.WriteLine(reason);
            });

@varunpuranik
Copy link
Contributor

@ddimitrov90 - Thanks for the info. The logic to handle registering subscriptions when connectivity is restored already exists, so if it is not working for you, then it might be a bug. I will investigate.

Looking at your logs, what is the device/module that is trying to receive method callbacks? Was it obucommunication/devicesetup/localstorage? Was it connected and able to get method callbacks before the device was disconnected?

As for the callback in your module not getting called - since your module is connected to EdgeHub, it never really "sees" that the network has gone down (the EdgeHub shields it from this).

@mitkodev
Copy link
Author

@varunpuranik before I disconnect the network, everything is working properly. All messages from the device are sent to the Iot Hub and the device method invocations are successful. Both obucommunication and localstorage modules have registered for method callbacks.

After I restore the network, the messages from the device to the Iot Hub are working properly again, however the device method invocations are failing with the above mentioned exception.

@varunpuranik
Copy link
Contributor

varunpuranik commented Aug 23, 2018

The reason I ask is because, from the logs, it seems like the modules localstorage/devicesetup connected only after the network was restored, so method callbacks on those two at least, should have nothing to do with the previous network going down event.

@toolboc
Copy link
Member

toolboc commented Aug 24, 2018

I am having a similar issue, using the iot-edge node sdk with [email protected] with Mqtt transport.

All attempts to call client.invokeDeviceMethod return a 404 with the following response body:

"{"Message":"{\"errorCode\":404103,\"trackingId\":\"2ce04a2d881c4010aaf5f6cfacd2ee95-G:0- 
TimeStamp:08/24/2018 17:49:33\",\"message\":\"Timed out waiting for device to connect.\",\"info\": 
{\"timeout\":\"00:00:00\"},\"timestampUtc\":\"2018-08- 
24T17:49:33.201157Z\"}","ExceptionMessage":""}"

In addition, the response headers include:

 err.response.headers["iothub-errorcode"] ="DeviceNotOnline"

The same response is also given when attempting to invoke a direct method from the Azure Portal:

image

This is very odd because the device reports as online in the portal and the associated edge-runtime and modules are running on the device without issue.

image

If I had to guess, this seems like it could be an issue with the IoT Hub side improperly obtaining the actual device status.

Here is a snippet of the code in question (Assuming a valid connectionString, methodName,deviceId, and moduleId there is really no reason why this shouldn't work):


var Client = require('azure-iothub').Client;

var connectionString = '<ConnectionString>';
var methodName = '<MethodName>';
var deviceId = '<DeviceId>';
var moduleId = '<ModuleId>';

const write = {
    peripheral: {
      name: "alarm",
      type: "Led",
    },
    state: {
      method: "on"
    }
  };

var client = Client.fromConnectionString(connectionString);

var methodParams = {
    methodName: methodName,
    payload: write,
    timeoutInSeconds: 30
};

client.invokeDeviceMethod(deviceId, moduleId, methodParams, function (err, result) {
    if (err) {
        console.error('Failed to invoke method \'' + methodName + '\': ' + err);
    } else {
        console.log(methodName + ' on ' + deviceId + ':');
        console.log(JSON.stringify(result, null, 2));
    }
});

@myagley
Copy link
Contributor

myagley commented Sep 11, 2018

For tracking purposes, we have a fix for this issue currently running in a long haul test. The fix is in the C# SDK: Azure/azure-iot-sdk-csharp#611

@myagley
Copy link
Contributor

myagley commented Nov 5, 2018

The combination of the 1.0.4 release of the Edge Hub and the new extended offline features of IoT Edge should resolve this issue. Please note that extended offline is currently not supported in East US or West Europe while it is in preview.

@ancaantochi
Copy link
Contributor

Closing, please reopen if the issue can still be reproduced

@pospospos2007
Copy link

I got same issue when I trying to invike the direct method .

Response:
{'Message': '{"errorCode":404103,"trackingId":"0d5a62f7f00047f9b1841ce486963a57-TimeStamp:09/07/2019 22:38:41","message":"Timed out waiting for device to connect.","info":{"timeout":"00:01:00"},"timestampUtc":"2019-09-07T22:38:41.872842Z"}', 'ExceptionMessage': ''}

I was following this document
https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-direct-methods

I generated the sas-token and run curl command.But nothing works!

our company is a client of Microsoft. The Microsoft guy told me they are investgating.
So depressed!

@alextran1502
Copy link

I am also having this issue invoking direct method in the device level. If I restart the device, I can get it to work for a few hours and after that it would show 'disconnect' state and I can no longer invoke the direct method. I am using Node SDK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants