IoT Edge Modules are getting recreated if iotedge service restarts #4866

Open
niravart7383 opened this issue Apr 19, 2021 · 10 comments

@niravart7383

Expected Behavior

Module containers should remain unchanged after the iotedge service restarts.

Current Behavior

Module containers are recreated whenever the iotedge service restarts.

Steps to Reproduce


  1. Provision an IoT Edge device using DPS (Device Provisioning Service) with TPM authentication
  2. Deploy custom modules
  3. Check each containerId using the iotedge list command
  4. Restart the iotedge service using Restart-Service iotedge
  5. All the containers are recreated with a different containerId
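
The comparison in steps 3 to 5 can be sketched in Python. This is only an illustration, not part of the original report: it assumes the container IDs are read from the container engine with `docker ps --format`, it reuses the `npipe:////./pipe/iotedge_moby_engine` endpoint from the issue template's note, and the helper names are hypothetical.

```python
import subprocess

# Assumption: the Moby engine pipe named in the issue template's note (Windows containers).
DOCKER_HOST = "npipe:////./pipe/iotedge_moby_engine"


def parse_ps(output):
    """Parse `docker ps --format "{{.Names}} {{.ID}}"` output into {name: id}."""
    snapshot = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, container_id = parts
            snapshot[name] = container_id
    return snapshot


def recreated(before, after):
    """Return names present in both snapshots whose container ID changed."""
    return sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )


def snapshot_containers():
    """Query the container engine for running containers (needs docker on the host)."""
    result = subprocess.run(
        ["docker", "-H", DOCKER_HOST, "ps", "--format", "{{.Names}} {{.ID}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Calling `snapshot_containers()` before and after `Restart-Service iotedge` and passing both results to `recreated` would list the modules whose containers were replaced.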

Context (Environment)

OS: Windows IoT 1809 (LTSC)

Output of iotedge check


√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ Windows host version is supported - OK
√ config.yaml has correct hostname - OK
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
‼ latest security daemon - Warning
    Installed IoT Edge daemon has version 1.0.10.4 but 1.1.1 is the latest stable version available.
    Please see https://aka.ms/iotedge-update-runtime for update instructions.
√ host time is close to real time - OK
√ container time is close to host time - OK
√ DNS server - OK
√ production readiness: certificates - OK
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its C:\Windows\Temp\edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
√ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK

Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with DPS endpoint - OK
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK

19 check(s) succeeded.

Device Information

  • OS: Windows IoT 1809 (LTSC)
  • Architecture : amd64
  • Container OS [Windows containers]:

Runtime Versions

  • aziot-edged [run iotedge version]:
  • Edge Agent [1.0.10.4]:
  • Edge Hub [1.0.10.4]:

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

edge-agent logs

2021-04-19 17:13:55.011 +00:00 Edge Agent Main()
<6> 2021-04-19 10:13:55.411 -07:00 [INF] - Initializing Edge Agent.
<6> 2021-04-19 10:13:55.726 -07:00 [INF] - Version - 1.0.10.4.37804714 (57772714c81c8b823a5ef05bf11bf343b923fb6a)
<6> 2021-04-19 10:13:55.727 -07:00 [INF] -
        █████╗ ███████╗██╗   ██╗██████╗ ███████╗
       ██╔══██╗╚══███╔╝██║   ██║██╔══██╗██╔════╝
       ███████║  ███╔╝ ██║   ██║██████╔╝█████╗
       ██╔══██║ ███╔╝  ██║   ██║██╔══██╗██╔══╝
       ██║  ██║███████╗╚██████╔╝██║  ██║███████╗
       ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚══════╝

 ██╗ ██████╗ ████████╗    ███████╗██████╗  ██████╗ ███████╗
 ██║██╔═══██╗╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ██║██║   ██║   ██║       █████╗  ██║  ██║██║  ███╗█████╗
 ██║██║   ██║   ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝
 ██║╚██████╔╝   ██║       ███████╗██████╔╝╚██████╔╝███████╗
 ╚═╝ ╚═════╝    ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

<6> 2021-04-19 10:13:55.793 -07:00 [INF] - Experimental features configuration: {"Enabled":false,"DisableCloudSubscriptions":false}
<6> 2021-04-19 10:13:56.014 -07:00 [INF] - Installing certificates [CN=Azure IoT CA TestOnly Root CA:3/24/2026 6:05:47 AM] to CertificateAuthority
<6> 2021-04-19 10:13:56.234 -07:00 [INF] - Starting metrics listener on Host: *, Port: 9600, Suffix: /metrics
<6> 2021-04-19 10:13:56.490 -07:00 [INF] - Updating performance metrics every 05m:00s
<6> 2021-04-19 10:13:56.496 -07:00 [INF] - Started operation Get system resources
<6> 2021-04-19 10:13:56.498 -07:00 [INF] - Collecting metadata metrics
<6> 2021-04-19 10:13:56.576 -07:00 [INF] - Set metadata metrics: 1.0.10.4.37804714 (57772714c81c8b823a5ef05bf11bf343b923fb6a), {"Enabled":false,"DisableCloudSubscriptions":false}, {"OperatingSystemType":"windows","Architecture":"x86_64","Version":"1.0.10.4 (57772714c81c8b823a5ef05bf11bf343b923fb6a)","Provisioning":{"Type":"dps.tpm","DynamicReprovisioning":false},"ServerVersion":"19.03.12+azure","KernelVersion":"10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)","OperatingSystem":"Windows 10 Enterprise LTSC 2019 Version 1809 (OS Build 17763.1879)","NumCpus":2,"Virtualized":"unknown"}, True
<6> 2021-04-19 10:13:56.611 -07:00 [INF] - Started operation Checkpoint Availability
<6> 2021-04-19 10:13:56.620 -07:00 [INF] - Started operation refresh twin config
<6> 2021-04-19 10:13:56.645 -07:00 [INF] - Edge agent attempting to connect to IoT Hub via Amqp_Tcp_Only...
<6> 2021-04-19 10:13:57.057 -07:00 [INF] - Created persistent store at C:\Windows\TEMP\edgeAgent
<6> 2021-04-19 10:13:57.118 -07:00 [INF] - Started operation Metrics Scrape
<6> 2021-04-19 10:13:57.118 -07:00 [INF] - Started operation Metrics Upload
Scraping frequency: 01:00:00
Upload Frequency: 1.00:00:00
<6> 2021-04-19 10:13:57.485 -07:00 [INF] - Registering request handler UploadModuleLogs
<6> 2021-04-19 10:13:57.486 -07:00 [INF] - Registering request handler GetModuleLogs
<6> 2021-04-19 10:13:57.486 -07:00 [INF] - Registering request handler UploadSupportBundle
<6> 2021-04-19 10:13:57.486 -07:00 [INF] - Registering request handler RestartModule
<6> 2021-04-19 10:13:59.548 -07:00 [INF] - Edge agent connected to IoT Hub via Amqp_Tcp_Only.
<6> 2021-04-19 10:14:00.261 -07:00 [INF] - Initialized new module client with subscriptions enabled
<6> 2021-04-19 10:14:00.572 -07:00 [INF] - Obtained Edge agent twin from IoTHub with desired properties version 16 and reported properties version 29.
<6> 2021-04-19 10:14:02.647 -07:00 [INF] - Plan execution started for deployment 16
<6> 2021-04-19 10:14:02.680 -07:00 [INF] - Executing command: "Command Group: (\n  [Create module edgeHub]\n  [Start module edgeHub]\n)"
<6> 2021-04-19 10:14:02.686 -07:00 [INF] - Executing command: "Create module edgeHub"
<6> 2021-04-19 10:14:03.586 -07:00 [INF] - Executing command: "Start module edgeHub"
<6> 2021-04-19 10:14:04.706 -07:00 [INF] - Executing command: "Command Group: (\n  [Create module ddiotedgeremoteaccessmodule]\n  [Start module ddiotedgeremoteaccessmodule]\n)"
<6> 2021-04-19 10:14:04.706 -07:00 [INF] - Executing command: "Create module ddiotedgeremoteaccessmodule"
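
The "Create module" entries above are what show the agent recreating containers after the restart. A minimal sketch for pulling those module names out of a saved edgeAgent log, assuming the line format shown in this excerpt (the function name is hypothetical):

```python
import re

# Matches single commands such as: Executing command: "Create module edgeHub"
# "Command Group" lines list the steps in brackets, so they are not matched.
CREATE_RE = re.compile(r'Executing command: "Create module (\S+)"')


def created_modules(log_text):
    """Return the module names the agent created, in log order."""
    return CREATE_RE.findall(log_text)
```

An empty result after a service restart would suggest the existing containers were kept.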


Additional Information

This happens only with DPS using TPM authentication; it works fine with SAS token authentication.

@lfitchett
Contributor

@niravart7383 Can you confirm that your iotedged version matches your edgeAgent version? Run iotedge version to check.

@lfitchett lfitchett self-assigned this Apr 19, 2021
@niravart7383
Author

> @niravart7383 Can you confirm that your iotedged version matches your edgeAgent version? Run iotedge version to check.

Yes, it matches exactly.

@lfitchett
Contributor

Hey @niravart7383, sorry for the delay. I can confirm that this is expected behavior. Restarting triggers the reprovisioning flow, which for DPS with TPM results in a new identity.

If you want to avoid this, you can set always_reprovision_on_startup to false:

always_reprovision_on_startup: true

If you update to the LTS 1.1.x, the field is now AutoReprovisioningMode, and it can be set to Dynamic, AlwaysOnStartup, or OnErrorOnly.

@niravart7383
Author

niravart7383 commented Apr 29, 2021

Hi,

I will check this and let you know.
If the flag is set to false, will that also stop reprovisioning during startup? If it really does, there may be a security risk, since the device would no longer contact DPS at startup.

Is that correct?

@niravart7383
Author

niravart7383 commented Apr 29, 2021

> Hey @niravart7383, sorry for the delay. I can confirm that this is expected behavior. Restarting triggers the reprovisioning flow, which for DPS with TPM results in a new identity.
>
> If you want to avoid this, you can set always_reprovision_on_startup to false:
>
> always_reprovision_on_startup: true
>
> If you update to the LTS 1.1.x, the field is now AutoReprovisioningMode, and it can be set to Dynamic, AlwaysOnStartup, or OnErrorOnly.

I tested this with always_reprovision_on_startup set to true.
I checked multiple times, and every time the containers are recreated, so we lose our data and files.

I have attached two screenshots showing that the containerIds are different.

image
image

@lfitchett
Contributor

Hey @niravart7383, always_reprovision_on_startup needs to be set to false. It is the reprovisioning that causes the containers to be reset.

Sorry if the config file I linked above caused confusion, I simply linked to an example config in our repo that had the field.
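
Given the true/false mix-up, it may help to confirm which value the daemon's config.yaml actually holds before restarting the service. The following is a small sketch, not an official tool. It assumes the flag appears as a plain `key: value` line somewhere in the file (its exact nesting is not shown in this thread, so indentation is ignored), and the function name is hypothetical.

```python
import re

# Assumption: the flag is written as `always_reprovision_on_startup: <bool>` on its own
# line; commented-out lines (starting with '#') are deliberately not matched.
FLAG_RE = re.compile(
    r"^\s*always_reprovision_on_startup\s*:\s*(\w+)\s*(?:#.*)?$",
    re.MULTILINE,
)


def reprovision_on_startup(config_text):
    """Return True/False for the flag, or None when it is absent or not a boolean."""
    match = FLAG_RE.search(config_text)
    if match is None:
        return None
    value = match.group(1).lower()
    if value in ("true", "false"):
        return value == "true"
    return None
```

Reading the file's text and passing it to `reprovision_on_startup` should return `False` once the change described above is in place; `None` means the field is missing or commented out.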

@lfitchett
Contributor

In addition, if you are worried about losing data in a container, you can use volume mounting to store permanent files on the host filesystem: https://docs.docker.com/storage/volumes/
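
As a rough illustration of that suggestion, a module's createOptions can carry a Docker `HostConfig.Binds` entry that maps a host directory into the container. The sketch below builds such a string with a bind mount, which is one of several storage options; the paths and the function name are hypothetical and not taken from this thread.

```python
import json


def create_options_with_bind(host_path, container_path):
    """Build a createOptions string that bind-mounts a host directory into a module."""
    options = {"HostConfig": {"Binds": [f"{host_path}:{container_path}"]}}
    # Deployment manifests carry createOptions as a JSON string, not a nested object.
    return json.dumps(options)


# Hypothetical paths, for illustration only.
example = create_options_with_bind("C:\\iotedge\\data", "C:\\app\\data")
```

Files the module writes under the container path would then live on the host and survive the container being recreated.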

@niravart7383
Author

> Hey @niravart7383, always_reprovision_on_startup needs to be set to false. It is the reprovisioning that causes the containers to be reset.
>
> Sorry if the config file I linked above caused confusion, I simply linked to an example config in our repo that had the field.

I have checked with both true and false; the behavior is the same.

@niravart7383
Author

> In addition, if you are worried about losing data in a container, you can use volume mounting to store permanent files on the host filesystem: https://docs.docker.com/storage/volumes/

I agree!

But why is the behavior different? The same thing does not happen if I am not using TPM.

@github-actions

This issue is being marked as stale because it has been open for 30 days with no activity.
