ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution') #473
Comments
@MiguelCHR - can you tell us a little more about your environment? Are you running inside a Docker container? If so, what is the base image? If not, what OS are you using? Are you trying to connect to IotHub or IotEdge? If you have a connection string, what happens when you try to ping or tracert the HostName from your connection string? |
Hi: If so, what is the base image? Are you trying to connect to IotHub or IotEdge? If you have a connection string, what happens when you try to ping or tracert the HostName from your connection string? Thanks in advance |
Can you be more specific about your base image? I see images in there built on alpine, debian, fedora, and ubuntu. I'm asking because I've seen intermittent failures from the DNS resolver that alpine uses, and I've never been impressed by it. More specifically, there are issues with the nslookup from busybox 1.28, which is used in Alpine 3.9 (see moby/libnetwork#2371). I can't say for sure that this is your issue, but it could be. |
We are currently using Debian, thanks |
Hi, any update on this error? We currently have a lot of devices in the field with this issue. Thanks in advance |
Our library should only consider this a fatal error if this is the first connection to IoTHub for the current run of the executable. It assumes that a failed connection is a configuration issue until it can connect once. Once the library connects successfully, it assumes that the configuration is valid and that the failure is transient. If the error is fatal (connecting for the first time), it fails immediately. If the error is transient (because it has previously connected), it retries until a connection can be established. We are working to improve this behavior. Just out of curiosity, is this causing an API to raise an exception, or are you observing this by some other means (e.g. API timeout or observation of logging output)? |
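The policy described in this comment can be sketched roughly as follows. This is not the library's implementation: `ConnectionPolicy`, `connect_fn`, `retry_delay`, and `sleep` are hypothetical names used only to illustrate "fatal until the first success, transient afterwards".

```python
import time


class ConnectionPolicy:
    """Treat failures as fatal until the first successful connection, then retry."""

    def __init__(self, connect_fn, retry_delay=1.0, sleep=time.sleep):
        self._connect_fn = connect_fn    # hypothetical callable; raises OSError on failure
        self._retry_delay = retry_delay  # seconds to wait between retries
        self._sleep = sleep              # injectable so the delay can be skipped in tests
        self._has_connected = False

    def connect(self):
        while True:
            try:
                self._connect_fn()
            except OSError:
                if not self._has_connected:
                    # Never connected in this run: assume a configuration problem.
                    raise
                # Connected before: assume the failure is transient and retry.
                self._sleep(self._retry_delay)
            else:
                self._has_connected = True
                return
```

`socket.gaierror` is a subclass of `OSError`, so a name-resolution failure raised by `connect_fn` would follow the same two paths.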
We fix this error by restarting the device. The error never happens on the first connection; it happens when the device has been online and reporting for some time, at least in our case. Yes, the exception is raised by an API. Thanks! |
This may be fixed in 2.1.1, but I'm going to review this specific issue a little further before I pronounce it fixed. I've made some fixes, but there are more extreme fixes I could make. Please let me know if this is resolved. |
Thanks Bert, I will be updating all our devices to 2.1.1. I will let you guys know if anything else comes up. |
Hi, we are still getting this error on 2.1.1
It seems to be the same as what was reported before: the devices are offline in the front end and online in the back end. We still need to update to 2.1.2; might that fix this? Thanks |
Also, not sure if this is the place to ask but we haven't found any documentation regarding the removal of logs, so how can we remove all the logs from the azure.iot.device library? Thanks in advance |
@BertKleewein my customer has the same issue. Repeating the same test scenario (changing from network O2 to Telekom) gives: May 27 13:16:13: ERROR:transport.connect raised error. Do you have a fix for this issue? |
@MiguelCHR - you can remove almost all of the logging by calling |
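The call referenced in this comment did not survive in the thread. As a rough sketch using only the standard `logging` module, and assuming (not confirmed here) that the library's loggers all live under the `azure.iot.device` namespace, output from the package could be suppressed like this:

```python
import logging

# Assumption: the library names its loggers after its modules, so they all sit
# under "azure.iot.device". Raising the level to CRITICAL hides ERROR records too.
logging.getLogger("azure.iot.device").setLevel(logging.CRITICAL)
```

Child loggers with no level of their own inherit this level, so a single call on the parent logger covers the whole package under that assumption.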
@MiguelCHR - azure-iot-device 2.1.3 has been released to pypi. 2.1.1, 2.1.2, and 2.1.3 all contain various stability and reliability fixes. Any of them could fix problems that have this symptom. Can you please try to reproduce this with the latest version? If you can make it happen, which device client API failed? |
Thanks for the heads up Bert, I will keep an eye on the devices at 2.1.3 version to see if we can catch any errors, I will let you know ASAP. |
Hi @BertKleewein , I'm facing these errors now. I made some changes to my Python script in the modules, and it's giving
What is causing these connection errors? I made all the changes to my Python script while I was inside the container, and I made sure it was working fine before I deployed to my device. I'm using azure-iot-device=2.1.1 and I also tried 2.1.4, but I still get the error. |
@nishad1092, can you tell me what client API is failing please? |
Hi @BertKleewein , |
Every time I deploy it through VS Code to my edge device, it fails with this issue. This never happened before; it started after I updated my Python script, but I haven't changed or added any libraries. |
But what is "it"? What is failing? I'm asking because having errors show up via logger.error() is not the same as APIs failing. Our code currently logs error messages even when we handle the error and I'm trying to discriminate between "an error was logged but was successfully handled" and "an error caused a client API to fail". |
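One way to tell the two cases apart is to record both what was logged and what was actually raised around a single call. The sketch below is illustrative only: `ErrorCounter` and `call_and_classify` are hypothetical helpers, and the default logger name `azure.iot.device` is an assumption about where the library logs.

```python
import logging


class ErrorCounter(logging.Handler):
    """Count ERROR-or-worse records so they can be compared with raised exceptions."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.count = 0

    def emit(self, record):
        self.count += 1


def call_and_classify(api_call, logger_name="azure.iot.device"):
    """Return (raised_exception_or_None, number_of_errors_logged) for one API call."""
    counter = ErrorCounter()
    logger = logging.getLogger(logger_name)
    logger.addHandler(counter)
    try:
        api_call()
        raised = None
    except Exception as exc:  # the API call itself failed
        raised = exc
    finally:
        logger.removeHandler(counter)
    return raised, counter.count
```

A result of `(None, 1)` would mean an error was logged but handled, while a non-`None` first element means the client API itself failed.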
Sure, Sir. I really couldn't figure out what went wrong. Please tell me what you want to know. I have two modules: one gets a message from a cloud invoke, and the other gets that message as input from the first module. Basically, it is inter-module communication. So far it has worked fine. Whenever I want to make changes, I go into the container, make the changes, and then commit it. Recently I made changes, but nothing major to any library, only very small changes to my Python script. My Python script also uses MQTT and a topic. Just now I ran the working version of my module, went inside the container, and made all the changes I need, and it works fine inside the container as an edge module. But when I push those changes through my deployment manifest file, it fails continuously. |
Both modules give: Connected with result code 0. The above exception was the direct cause of the following exception: Traceback (most recent call last): |
@BertKleewein basically, when I look at edgeHub, I have three modules, each module being a client, and these two clients are not getting connected. |
Please tell me if I understand this correctly: You're calling IoTHubModuleClient.connect() and IoTHubModuleClient.connect() is raising an exception that you are able to catch from your client app. But, your app only catches this exception if you use a deployment manifest to deploy. If you change your code by editing a live container based on an older build, but still on the same machine with the same install of IoTEdge, then IoTHubModuleClient.connect() does not fail. Is this correct? |
@BertKleewein you are correct. This is exactly what I have been facing for two days. |
@BertKleewein , any idea what might be causing these issues? |
I'm at a loss. The error you're reporting (gaierror(-2, 'Name or service not known')) is an error from the underlying network stack saying that it can't get the address of the machine it's trying to connect to. Nothing that we changed should affect this. Since you're connecting to edgeHub, it means it can't find the address of the edgeHub machine. It might be worth trying to ping your edgeHub machine from inside the container to see if it can resolve.
Since you're manually calling IoTHubModuleClient.connect and it's failing, another option is to sleep for a few seconds and try calling connect again. I don't necessarily like this option, but this might be easier than fixing your network configuration. |
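The "sleep and try again" suggestion might look like the sketch below for the async client. `connect_with_retry`, `attempts`, and `delay` are hypothetical names, and the broad `except Exception` is a simplification; a real app would likely catch only the connection errors it expects.

```python
import asyncio


async def connect_with_retry(connect, attempts=5, delay=5.0):
    """Await connect() up to `attempts` times, sleeping `delay` seconds between failures."""
    for attempt in range(1, attempts + 1):
        try:
            await connect()
            return attempt  # number of the attempt that succeeded
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            await asyncio.sleep(delay)
```

With the async module client this would be called as `await connect_with_retry(module_client.connect)`. It only helps if the failure is transient; it cannot work around a hostname that never resolves.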
@BertKleewein Sure, let me check and I'll get back to you. But since my module has failed, I can't get inside the container. I'll try to get inside the container with my previous successful build. |
Hi @BertKleewein , I'm not able to ping the Docker hostname from inside the container, and I get the same error when I ping the hostname from outside the container: Name or service not known. |
I added a 30-second sleep before await module_client.connect(), but the same error showed up after that. |
Hi @BertKleewein , I had these particular parameters in my manifest file, which were conflicting and causing these errors:
I saw this config in another Azure post, which is why I had it in the first place. Now all modules are running fine. Thank you so much for your support. |
@nishad1092 - that explains it. If the OS can't resolve the hostname to an IP address, then we won't be able to connect. It looks like you broke your network configuration, maybe with that change to your manifest, and it looks like this problem isn't related to the azure-iot-device python library at all. |
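A quick way to check the same thing from Python, inside the container, is to ask the OS resolver directly. This is a minimal sketch: `can_resolve` is a hypothetical helper, and the default port 8883 (MQTT over TLS) is only an assumption about how the client connects.

```python
import socket


def can_resolve(hostname, port=8883):
    """Return True if the OS resolver can turn hostname into at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, port))
    except socket.gaierror:
        # Same error family as "Name or service not known" and
        # "Temporary failure in name resolution".
        return False
```

If this returns `False` for the gateway or hub hostname, the problem lies in the container's or host's DNS configuration rather than in the device client.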
Hi @BertKleewein , sorry for the late reply. Yes, I thought it was related to the azure-iot-device Python library, hence I posted here. But it was working before with this manifest config; I don't know what suddenly went wrong. Anyway, all good now. Thank you, Bert. |
@BertKleewein, @MiguelCHR, @mikechari, @dschenzer, @nishad1092, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey |
Hi, I am having this problem. This is for azure-iot-device 2.1.0
Any ideas? Thanks in advance
AB#7366699