Timed out waiting for device to connect #98
Comments
Are you trying to invoke a method on a module on the device, or on the device itself? If the IoT Edge runtime is running on an edge device, nothing actually connects to IoT Hub as the device itself; only the modules deployed on the device connect to IoT Hub. But you mention it worked fine initially, so I was curious what is receiving those method invocations.
As for the logs, if there are too many, you can use the --tail option to get only the last few lines.
The connection should not close after a period of time. But we have seen issues where modules using MQTT lose their connection to EdgeHub after a while. So if you are using MQTT in your modules, can you try switching to AMQP? That should make it more reliable. |
I am using the C# SDK to invoke a method on a device module, and everything is working as expected, but after some time it looks like the device module loses its connection to the IoT Hub and I get the above-mentioned exception. I've just switched to AMQP and will check what's happening.
var amqpTransportSettings = new AmqpTransportSettings(TransportType.Amqp_Tcp_Only);
ITransportSettings[] settings = { amqpTransportSettings };
var ioTHubModuleClient = await ModuleClient.CreateFromEnvironmentAsync(settings);
await ioTHubModuleClient.OpenAsync();
logger.Information("IoT Hub module client initialized."); |
I see, in that case it might be related to this SDK issue: Azure/azure-iot-sdk-csharp#558. Switching to AMQP should help. |
Unfortunately it didn't solve the issue. I switched to AMQP yesterday and left the module idle for 24 hours. I've just tried to invoke a direct method on the device module and got the same DeviceNotFoundException ("Timed out waiting for device to connect").
I've seen similar issues just opened, so I'm going to open a ticket directly with the Azure team through my subscription, because this is not acceptable. |
AMQP will also disconnect almost exactly 4 minutes after opening the connection, with a status and reason of disabled/client closed. Using a connection state handler you can capture the disconnect and reopen the connection, and it doesn't disconnect again.
In reply to Dimitar Dimitrov's message of Thursday, August 2, 2018, which included this exception:
DeviceNotFoundException: Device {"Message":"{\"errorCode\":404103,\"trackingId\":\"446c79c120e948c6ae19daf905b4ee84-G:1-TimeStamp:08/02/2018 07:38:11\",\"message\":\"Timed out waiting for device to connect.\",\"timestampUtc\":\"2018-08-02T07:38:11.6963858Z\"}","ExceptionMessage":""} not registered
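The reopen-on-disconnect workaround described above might look roughly like the following. This is only a sketch under assumptions, not the code actually used in this thread: the method name "ping" and the class and helper names are hypothetical, and whether OpenAsync can be called again on the same client instance after a Disabled/Client_Close notification may depend on the SDK version.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;

public static class ReconnectingModule
{
    // Hypothetical direct-method name, used only for illustration.
    private const string MethodName = "ping";

    public static async Task<ModuleClient> StartAsync()
    {
        ITransportSettings[] settings = { new AmqpTransportSettings(TransportType.Amqp_Tcp_Only) };
        ModuleClient client = await ModuleClient.CreateFromEnvironmentAsync(settings);

        // Log every status change; on Disabled/Client_Close, try to reopen.
        client.SetConnectionStatusChangesHandler((status, reason) =>
        {
            Console.WriteLine($"Connection status: {status}, reason: {reason}");
            if (status == ConnectionStatus.Disabled &&
                reason == ConnectionStatusChangeReason.Client_Close)
            {
                // The handler is synchronous, so the reopen runs in the background.
                _ = ReopenAsync(client);
            }
        });

        await client.OpenAsync();
        await client.SetMethodHandlerAsync(MethodName, OnPing, null);
        return client;
    }

    // Assumption: reopening the same client and re-registering the method
    // handler is enough to receive direct methods again.
    private static async Task ReopenAsync(ModuleClient client)
    {
        try
        {
            await client.OpenAsync();
            await client.SetMethodHandlerAsync(MethodName, OnPing, null);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Reopen failed: {ex.Message}");
        }
    }

    private static Task<MethodResponse> OnPing(MethodRequest request, object userContext)
    {
        // 200 signals success to the caller of the direct method.
        return Task.FromResult(new MethodResponse(200));
    }
}
```

Logging the status and reason on every change, even when no reopen is attempted, also gives a timestamped record of when each module dropped.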
|
This is a very serious problem and there is still no progress on it. How long will it take to have a look at the most critical part of IoT? This is definitely similar to this one: |
@ddimitrov90 One question - Is your module only receiving methods, or also sending/receiving telemetry? @jason-e-gross - The AMQP disconnect issue - we haven't seen it so far. Can you describe your scenario a bit? Again, is the module just opening the connection, or is it also doing anything (sending telemetry, etc?). Do you have a consistent repro? |
@varunpuranik the edge device is constantly talking to the IoT Hub by sending telemetry, and is also constantly receiving information from the cloud through method invocations. Here is a graph from the last 2 days. Every minute there is a method invocation, and if the invocation is successful, the edge device reports back to the cloud that it is alive. You can see the graph going to 0 early Wednesday morning: for no apparent reason, we started receiving a DeviceNotFound exception when trying to invoke the method. The resolution was to restart the IoT Edge runtime in the morning, when I got to work. |
Yes, I can repro it with all the modules. The modules do work, then sit idle waiting for events from other modules or direct methods from the hub. When I get into the office, I'll dump some logs and reply. I use the connection state handler to try and catch a few.
|
@ddimitrov90 - I am aware of the behavior of the Edge runtime, and I am also aware of the issue with the methods link being dropped. We are actively looking into it.
@jason-e-gross - I see, so basically you are getting the ConnectionStatusChangesHandler callback after 4 mins of inactivity, with a status and reason of disabled/client closed. We have seen odd behavior from the ConnectionStatusChangesHandler before (where it would report incorrect status), which is why we don't use it in the Edge runtime code. In any case, I will follow up on this and get back to you. |
Here's what I see:
The two to notice are: The thing is -- if I don't reconnect, it's right and proper disconnected - it doesn't receive any events or direct-methods. With every module, it's right on the 4 minute money mark (or near enough - like above, 3:57). With each time - it's always Disabled/Client-Close. These modules spin up, do a bit of work - add event delegates, then sit quiet waiting for events 99.9% of the time. There's nothing in |
EDIT: I'm just providing more info so that maybe we can find a workaround; I'm not expecting that it is already fixed :) I've updated the runtime to 1.0.1 and the system modules to 1.0.1 and still got the same problem. I'm using MQTT. You can check the logs that I have. The device messages going to the IoT Hub are working, however the device method invocation is still failing and it won't recover until I restart the runtime :(
Here is the log. The device module that holds the callback for the device method is in localstorage.
Is it possible to attach a handler for such disconnect events and register the method callback again? Is this ever called? I turned the network off and on and nothing was logged in the console.
|
@ddimitrov90 - Thanks for the info. The logic to handle registering subscriptions when connectivity is restored already exists, so if it is not working for you, then it might be a bug. I will investigate. Looking at your logs, what is the device/module that is trying to receive method callbacks? Was it obucommunication/devicesetup/localstorage? Was it connected and able to get method callbacks before the device was disconnected? As for the callback in your module not getting called - since your module is connected to EdgeHub, it never really "sees" that the network has gone down (the EdgeHub shields it from this). |
@varunpuranik before I disconnect the network, everything is working properly. All messages from the device are sent to the IoT Hub and the device method invocations are successful. Both the obucommunication and localstorage modules have registered for method callbacks. After I restore the network, the messages from the device to the IoT Hub work properly again, however the device method invocations fail with the above-mentioned exception. |
The reason I ask is because, from the logs, it seems like the modules localstorage/devicesetup connected only after the network was restored, so method callbacks on those two at least, should have nothing to do with the previous network going down event. |
I am having a similar issue, using the iot-edge node sdk with [email protected] with Mqtt transport. All attempts to call
In addition, the response headers include:
The same response is also given when attempting to invoke a direct method from the Azure Portal: This is very odd because the device reports as online in the portal and the associated edge-runtime and modules are running on the device without issue. If I had to guess, this seems like it could be an issue with the IoT Hub side improperly obtaining the actual device status. Here is a snippet of the code in question (Assuming a valid
|
For tracking purposes, we have a fix for this issue currently running in a long haul test. The fix is in the C# SDK: Azure/azure-iot-sdk-csharp#611 |
The combination of the 1.0.4 release of the Edge Hub and the new extended offline features of IoT Edge should resolve this issue. Please note that extended offline is currently not supported in East US or West Europe while it is in preview. |
Closing, please reopen if the issue can still be reproduced |
I got the same issue when trying to invoke the direct method. Response: I was following this document: I generated the SAS token and ran the curl command, but nothing works! Our company is a client of Microsoft. The Microsoft guy told me they are investigating. |
I am also having this issue when invoking a direct method at the device level. If I restart the device, I can get it to work for a few hours, and after that it shows a 'disconnect' state and I can no longer invoke the direct method. I am using the Node SDK. |
I have an IoT Edge device that has been running since Thursday, but at some point during the weekend it lost its connection to the IoT Hub.
Currently when I try to invoke a direct method on the device, I receive:
In the Azure portal, the device information states that the edge runtime response is N/A, and the last twin updated properties are from Thursday.
The IoT Edge runtime was started in debug mode just to check this issue, but at the moment it is impossible to get the log from the edgeAgent, because it is endless.
How do I debug this issue? Are there any timeouts, or connections that get closed after a certain amount of time?
If I restart the edge runtime on the device, everything will be fine and it will work again, but I want to get to the cause of this.
Any information will be appreciated.
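For anyone trying to narrow this down from the service side, one option is to invoke the method with explicit timeouts and log the failure. The following is a minimal sketch using the C# service SDK, not a confirmed diagnosis: the connection string, device ID, module ID, the "ping" method name and the 30-second values are all placeholders. As far as I understand it, the connection timeout is how long IoT Hub waits for the target to be connected before giving up, which appears to be where "Timed out waiting for device to connect" comes from.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;
using Microsoft.Azure.Devices.Common.Exceptions;

public static class MethodInvoker
{
    // All arguments and the "ping" method name are placeholders for
    // illustration; none of them come from the setup described here.
    public static async Task<bool> TryPingAsync(
        string connectionString, string deviceId, string moduleId)
    {
        ServiceClient serviceClient = ServiceClient.CreateFromConnectionString(connectionString);
        try
        {
            // responseTimeout: how long to wait for the module's reply.
            // connectionTimeout: how long to wait for the module to be connected.
            var method = new CloudToDeviceMethod(
                "ping", TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(30));

            CloudToDeviceMethodResult result =
                await serviceClient.InvokeDeviceMethodAsync(deviceId, moduleId, method);
            Console.WriteLine($"Method returned status {result.Status}");
            return true;
        }
        catch (DeviceNotFoundException ex)
        {
            // The exception message carries the errorCode and trackingId,
            // which are worth keeping for a support ticket.
            Console.WriteLine($"{DateTime.UtcNow:o} invocation failed: {ex.Message}");
            return false;
        }
        finally
        {
            await serviceClient.CloseAsync();
        }
    }
}
```

Calling this on a schedule and recording the first failing timestamp gives a point to line up against the edgeHub and module logs on the device.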