Edge Hub removes module connection #4678
Comments
Hi @fbarresi, can you send us the edge daemon logs from the same time period? edgeHub made a call there that was not answered, and I would like to see the daemon side. You can get the daemon logs as described at https://docs.microsoft.com/en-us/azure/iot-edge/troubleshoot?view=iotedge-2018-06. If you use the support bundle, I will need the content of iotedged.txt.
Hi @vipeller, thank you for your reply. Here is the log file. There is a previous exception, but at the time the problem starts (about 14:00) there is nothing. edge-Agent logs
@fbarresi sorry, but those are the logs of edgeAgent. What I need is the edge daemon log. I know this is confusing with so many components. Running the command sudo iotedge support-bundle creates a compressed file. Inside that archive there is a file named iotedged.txt. I will need the content of that file for the time period when the problem happened.
Hi @fbarresi, I am reading the latest update from Soumitra, and just to be sure, let me restate the problem.

Some context: the modules use SAS authentication. When connecting, they generate a token that is valid for an hour. Before the hour expires, clients need to provide a new token. When that happens, edgeHub stores the new token, and to store it, it calls the edge daemon to encrypt the data. From the logs it seems that from time to time edgeHub cannot do the encryption through the edge daemon, because those calls fail with a timeout. When this happens, edgeHub disconnects that client (as it no longer has a valid token).

The expected behavior in this case is that clients detect that they got disconnected, connect back, and provide a new authentication token with the new connection. edgeHub then either can or cannot encrypt it (given the timeout error mentioned above). If it fails again, the disconnect/reconnect loop just described starts again; otherwise everything is supposed to start working.

So far I have focused on why the timeout exception happened (that is why I wanted the daemon logs here and from Soumitra), but according to the latest feedback, you are less worried about the timeout error (because other modules seem to recover after it), and the question is why the apiconnector module cannot do the same. In the log excerpt you provided above, there is a repeating "Module GP731/apiconnector is not connected", which is logged when edgeHub wants to route a message to a module that is not connected. If you are using an SDK like the C# SDK or the C SDK in your modules, it is supposed to reconnect automatically when the module gets disconnected. My first question would be: what do you use in apiconnector? (I could not see it from its log.)
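To make the loop described above easier to follow, here is a minimal Python sketch of it. It is only a model of the described behavior, not edgeHub or SDK code: `EncryptionTimeout`, `hub_accepts_token`, `client_reconnect_loop`, and `flaky_encrypt` are hypothetical names introduced for illustration, and the retry limit is an assumption.

```python
from typing import Callable, Optional


class EncryptionTimeout(Exception):
    """Hypothetical stand-in for the timeout edgeHub sees when the edge daemon does not answer."""


def hub_accepts_token(token: str, encrypt: Callable[[str], str]) -> bool:
    """Model of edgeHub storing a renewed token: the client stays connected
    only if the encryption call succeeds."""
    try:
        encrypt(token)
        return True
    except EncryptionTimeout:
        # Without a stored valid token, the hub drops the client.
        return False


def client_reconnect_loop(
    new_token: Callable[[], str],
    encrypt: Callable[[str], str],
    max_attempts: int = 5,
) -> Optional[int]:
    """Model of a client that reconnects with a fresh token after each disconnect.

    Returns the attempt number on which the connection succeeded,
    or None if every attempt ended in a disconnect."""
    for attempt in range(1, max_attempts + 1):
        if hub_accepts_token(new_token(), encrypt):
            return attempt
    return None


def flaky_encrypt(failures: int) -> Callable[[str], str]:
    """Build an encrypt function that times out `failures` times, then works."""
    remaining = {"n": failures}

    def encrypt(token: str) -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise EncryptionTimeout("daemon did not answer")
        return "encrypted:" + token

    return encrypt
```

In this model, a module that keeps retrying recovers as soon as the timeouts stop, while a module that never reconnects stays in the "not connected" state even though its process is still running.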
Dear @vipeller, thank you for your reply, and thank you also for exchanging information with Soumitra. As I told Soumitra, the edge daemon logs were empty for this time span, so we cannot get more help there. Of course I'm interested in a solution for the timeout errors, but these errors are a transient state in which module communication does not stop working. And you are right: at the moment I'm more worried about keeping the system resilient and stable with an auto-reconnect or a module restart, because the data-recovery procedure in such a case is very expensive and has to be done as quickly as possible, before other systems in production miss the data. We use the C# SDK, and all our modules share the same code base, including the apiconnector.
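As a stopgap for the "module restart" idea mentioned above, one option is a small watchdog that scans edgeHub log lines for the repeating "is not connected" warning and reports which modules look stuck. The Python sketch below is only an illustration under assumptions: it relies solely on the message text quoted in this thread, and `modules_to_restart` and the threshold of three warnings are hypothetical choices.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches the warning quoted in this thread, e.g.
# "Module GP731/apiconnector is not connected".
NOT_CONNECTED = re.compile(r"Module (\S+) is not connected")


def modules_to_restart(log_lines: Iterable[str], threshold: int = 3) -> List[str]:
    """Return module names that were reported as not connected
    at least `threshold` times in the given log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = NOT_CONNECTED.search(line)
        if match:
            # "GP731/apiconnector" is device/module; keep only the module name.
            module_name = match.group(1).split("/")[-1]
            counts[module_name] += 1
    return sorted(name for name, seen in counts.items() if seen >= threshold)
```

The returned names could then be passed to whatever restart mechanism is acceptable on the device, for example `sudo iotedge restart apiconnector`. This treats the symptom only and does not replace a fix in edgeHub.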
I am also having the same issue in iotedge 1.0.10.2. It is affecting a production deployment. Is there a solution coming for this?
Here are the edge daemon logs for my case. The edgeHub log file is called "EdgeHub-logs-ModuleConnectionLoss.txt"; you can use it to correlate the time of the exception with the daemon logs.
Sorry, I forgot to attach the edgeHub log file.
Hi @jsucco-growlink, thank you for joining this issue. We had a meeting with @vipeller regarding it. He wanted to set up a simulation on his own in order to reproduce and better understand the error.
@jsucco-growlink @fbarresi we were able to repro the issue in our environment, and we have a better understanding of where the problem may be. We have not found the exact cause yet, though. We are working on it.
FYI: for a better analysis you can activate debug messages for edgeAgent and edgeHub
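One way to switch on debug messages is to set the `RuntimeLogLevel` environment variable to `debug` for both system modules in the deployment manifest. The Python sketch below patches a manifest that has already been loaded as a dictionary. It assumes the usual manifest layout (`modulesContent` → `$edgeAgent` → `properties.desired` → `systemModules`); `enable_debug_logs` is a hypothetical helper name, so check the result against your own manifest before deploying.

```python
import copy


def enable_debug_logs(manifest: dict) -> dict:
    """Return a copy of the deployment manifest with RuntimeLogLevel=debug
    set for edgeAgent and edgeHub. The input manifest is left unchanged."""
    patched = copy.deepcopy(manifest)
    system_modules = (
        patched["modulesContent"]["$edgeAgent"]["properties.desired"]["systemModules"]
    )
    for name in ("edgeAgent", "edgeHub"):
        # Create the "env" section if the module does not have one yet.
        env = system_modules[name].setdefault("env", {})
        env["RuntimeLogLevel"] = {"value": "debug"}
    return patched
```

Remember to revert the level after collecting the logs, since debug output can grow quickly on a production device.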
This issue is being marked as stale because it has been open for 30 days with no activity.
A fix is going to be released with 1.1.3.
Preamble
I have many Edge devices working in a production environment.
Every device has many modules that send data upstream and one module (named apiconnector) that sends the upstream data to another API.
The Problem
Since an upgrade to version 1.0.10, I sporadically get errors regarding the apiconnector module.
The module is running, but its input stream gets cut off by the edgeHub.
I saw in the edgeHub log file that the problem may be related to an error while updating the identity token.
However, there is no retry procedure for such an error, even though the edgeHub writes a warning every minute that a module is disconnected.
Expected Behavior
Running Edge modules receive the input stream in a reliable way. A transient update error for identity tokens doesn't affect the behavior of the modules.
After an error occurs, the edgeHub should try to restore a healthy state or report the error (for example to the edgeAgent).
Current Behavior
An Edge module gets disconnected from its input stream but keeps running.
Steps to Reproduce
Unable to reproduce... 😢
Context (Environment)
(Sorry, I cannot fill in all this information, but I will give an update.)
Device Information
Runtime Versions
iotedge version:
docker version: ?
Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead
Logs
edge-hub logs
Additional Information
I never had such a situation with version 1.0.8.4.