VM extension TeamServicesAgentLinux fails with ModuleNotFoundError #3156

Closed
jblafage opened this issue Nov 9, 2020 · 28 comments

Comments

@jblafage

jblafage commented Nov 9, 2020

Context

I have generated a Linux VM image (Ubuntu 20.04) using Packer, based on the scripts used by Microsoft-hosted agents. The goal is to have a similar machine for running self-hosted agents and to instantiate it through VM scale set pools on Azure DevOps.

The VM image is generated and works fine. It is based on the latest commits from the main branch of the actions/virtual-environments repository.

What's not working?

After creating a VM scale set on Azure based on the generated VM image, and using this VM scale set with a new VM scale set pool on Azure DevOps, an agent is instantiated using a VM extension: Microsoft.VisualStudio.Services.TeamServicesAgentLinux

The problem is that we get an error during execution of this VM extension:

[ExtensionOperationError] Non-zero exit code: 1, /var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/AzureRM.py
[stdout]
[stderr]
Running scope as unit: Microsoft.VisualStudio.Services.TeamServicesAgentLinux_1.21.0.0_c99d6fee-757e-488d-9f10-158b724a33bc.scope
Traceback (most recent call last):
  File "/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/AzureRM.py", line 9, in <module>
    import Utils.HandlerUtil as Util
  File "/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/Utils/HandlerUtil.py", line 62, in <module>
    import RMExtensionStatus
ModuleNotFoundError: No module named 'RMExtensionStatus'

It seems the extension's Python scripts cannot run under Python 3.

TeamServicesAgentLinux extension version used: 1.21.0.0
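
One plausible reading of the traceback, stated as an assumption because the extension source is not shown in this thread: if RMExtensionStatus.py sits next to HandlerUtil.py inside the Utils package, the bare `import RMExtensionStatus` relies on implicit relative imports, which Python 2 supported and Python 3 removed. The sketch below reproduces that failure with a throwaway package. Only the names Utils, HandlerUtil and RMExtensionStatus come from the error message; the file contents and HandlerUtilFixed are invented for illustration.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create a tiny package whose names mirror the traceback.

    The file contents are hypothetical; they are not the extension's code.
    """
    pkg = os.path.join(root, "Utils")
    os.makedirs(pkg)
    files = {
        "__init__.py": "",
        "RMExtensionStatus.py": "STATUS = 'ok'\n",
        # Bare import of a sibling module (implicit relative import).
        "HandlerUtil.py": "import RMExtensionStatus\n",
        # Explicit relative import, which Python 3 accepts.
        "HandlerUtilFixed.py": "from . import RMExtensionStatus\n",
    }
    for name, body in files.items():
        with open(os.path.join(pkg, name), "w") as handle:
            handle.write(body)


def try_import(module_name):
    """Return None on success, or the exception class name on failure."""
    try:
        importlib.import_module(module_name)
        return None
    except ImportError as error:  # ModuleNotFoundError is a subclass
        return type(error).__name__


demo_root = tempfile.mkdtemp()
build_demo_package(demo_root)
sys.path.insert(0, demo_root)

broken = try_import("Utils.HandlerUtil")
fixed = try_import("Utils.HandlerUtilFixed")
print("bare import:", broken)
print("explicit relative import:", fixed)
```

Run under Python 3, the bare import fails with ModuleNotFoundError while the explicit relative import succeeds. Whether this matches the real layout of the extension would need to be checked on an affected VM.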

How to solve the issue?

  • Is there a more recent version of the TeamServicesAgentLinux extension that is compatible with Python 3?
  • If yes, could it be used by Azure DevOps for the VM scale set pools?
  • Is the source code of this extension available somewhere?
@mjroghelia
Contributor

@bishal-pdMSFT is the Microsoft.VisualStudio.Services.TeamServicesAgentLinux extension in your wheelhouse?

@sdobrodeev sdobrodeev added the bug label Dec 4, 2020
@lbergeron01

We are having the same issue with the Ubuntu 20 image since Python 3 became the default in October.

There has been no update on this issue for 28 days, but I see in the announcement that the image has been rolling out since Nov. 30th, so I guess a fix exists somewhere but is not available to the public?
actions/runner-images#1816

@bishal-pdMSFT

Adding @tejasd1990

@tejasd1990

ack. will investigate and update

@chandlerkent

I am also running into the same error, but using CentOS 8. This works fine in our environment using CentOS 7.

Error message:

Traceback (most recent call last):
  File "./AzureRM.py", line 9, in <module>
    import Utils.HandlerUtil as Util
  File "/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/Utils/HandlerUtil.py", line 62, in <module>
    import RMExtensionStatus
ModuleNotFoundError: No module named 'RMExtensionStatus'

If I try to run with python2 using alternatives --set python python2 (after yum install python2), I get a different error:

Traceback (most recent call last):
  File "./AzureRM.py", line 9, in <module>
    import Utils.HandlerUtil as Util
  File "/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/Utils/HandlerUtil.py", line 69, in <module>
    from WAAgentUtil import waagent
  File "/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/Utils/WAAgentUtil.py", line 42, in <module>
    waagent = imp.load_source('waagent', agentPath)
  File "/usr/sbin/waagent", line 46, in <module>
    raise ImportError("Can't load waagent")
ImportError: Can't load waagent

I have also opened a case using our Azure support account:

120121724004184

If it is helpful, here's the ARM template for the extension:

{
	"name": "AzureDevOps",
	"properties": {
		"publisher": "Microsoft.VisualStudio.Services",
		"type": "TeamServicesAgentLinux",
		"typeHandlerVersion": "1.0",
		"autoUpgradeMinorVersion": true,
		"settings": {
			"VSTSAccountName": "[parameters('azureDevOpsAccountName')]",
			"TeamProject": "[parameters('azureDevOpsDeploymentGroupTeamProject')]",
			"DeploymentGroup": "[parameters('azureDevOpsDeploymentGroupName')]",
			"Tags": "[parameters('azureDevOpsDeploymentGroupTags')]"
		},
		"protectedSettings": {
			"PATToken": "[parameters('azureDevOpsDeploymentGroupPersonalAccessToken')]"
		},
		"provisionAfterExtensions": [
			"logAnalytics"
		]
	}
}

Full JSON for extension status (/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/HandlerStatus):

{
  "name": "Microsoft.VisualStudio.Services.TeamServicesAgentLinux",
  "version": "1.21.0.0",
  "status": "NotReady",
  "code": 1007,
  "message": "[ExtensionOperationError] Non-zero exit code: 1, /var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/AzureRM.py\n[stdout]\n\n\n[stderr]\nTraceback (most recent call last):\n  File \"/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/AzureRM.py\", line 9, in <module>\n    import Utils.HandlerUtil as Util\n  File \"/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/Utils/HandlerUtil.py\", line 69, in <module>\n    from WAAgentUtil import waagent\n  File \"/var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.0/Utils/WAAgentUtil.py\", line 42, in <module>\n    waagent = imp.load_source('waagent', agentPath)\n  File \"/usr/sbin/waagent\", line 46, in <module>\n    raise ImportError(\"Can't load waagent\")\nImportError: Can't load waagent\n",
  "extensions": []
}
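
The escaped message in that status is easier to read once the JSON is decoded. A minimal sketch, assuming the HandlerStatus file holds a single JSON object shaped like the one above; the function name and the shortened sample string are made up for illustration.

```python
import json


def readable_handler_message(raw_json):
    """Decode a HandlerStatus JSON document and return its message text.

    Assumes a top-level object with a "message" key. Decoding turns the
    escaped newline sequences into real line breaks.
    """
    status = json.loads(raw_json)
    return status.get("message", "")


# Shortened, hypothetical sample in the same shape as the status above.
sample = (
    '{"name": "Microsoft.VisualStudio.Services.TeamServicesAgentLinux", '
    '"version": "1.21.0.0", "status": "NotReady", "code": 1007, '
    '"message": "[stderr]\\nImportError: Can\'t load waagent\\n", '
    '"extensions": []}'
)

message = readable_handler_message(sample)
print(message)
```

On a VM, the same function could be fed the contents of the HandlerStatus file at the path given above.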

waagent -version:

WALinuxAgent-2.2.46 running on centos 8.3.2011
Python: 3.6.8
Goal state agent: 2.2.52

@snnn
Member

snnn commented Dec 18, 2020

As a workaround, you may add the following two lines to the bottom of "/images/linux/scripts/base/repos.sh":

apt-get remove -y python-is-python3
apt-get install -y python-is-python2

and remove the line

"python-is-python3"

from images/linux/toolsets/toolset-2004.json.

(This is an unofficial suggestion; I'm not part of the team.)

jamesrcounts added a commit to jamesrcounts/terraform-packer that referenced this issue Jan 4, 2021
@brandongodby78

@tejasd1990 Any update on this? I am following up on our Azure support case 120121724004184, and they keep pointing us here for updates. I am a colleague of the poster above, @chandlerkent.

@tejasd1990

Hi all, we are working on a fix for this and will roll it out soon. We will update on the rollout progress.

@litan1106

@tejasd1990 thanks. We would love the Python 3 fix.

@brandongodby78

@tejasd1990 Did you release something here? Our releases using this extension broke yesterday. Comparing them with the working releases, we see that version 1.21.0.1 is now being deployed instead of 1.21.0.0.

Version 1.21.0.1 is causing our VMSS deployments to fail.

@bishal-pdMSFT

@brandongodby78 what is the failure you are seeing? Please share the OS details and the settings for the extension.

@bishal-pdMSFT

@brandongodby78 please share the failure logs so that we can take a look.

@ramakrpr

Hi,

We also see the same issue now when we deploy the VM extension:

2021-02-23T12:13:16.2827677Z ##[error]ERROR --- An error has occurred: ERROR --- 12:13:16 PM - The deployment 'deploymentagent' failed with error(s). Showing 1 out of 1 error(s).
Status Message: The handler for VM extension type 'Microsoft.VisualStudio.Services.TeamServicesAgentLinux' has reported terminal failure for VM extension 'TeamServicesAgentLinux' with error message: '[ExtensionOperationError] Non-zero exit code: 126, /var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.1/AzureRM.py
[stdout]

[stderr]
/bin/sh: /var/lib/waagent/Microsoft.VisualStudio.Services.TeamServicesAgentLinux-1.21.0.1/AzureRM.py: /usr/bin/python3: bad interpreter: No such file or directory
'.

'Install handler failed for the extension. More information on troubleshooting is available at https://aka.ms/vmextensionlinuxtroubleshoot' (Code:VMExtensionHandlerNonTransientError)

@jnovick

jnovick commented Feb 24, 2021

I am now running into this issue as well. My error message is the same as the one posted above. I believe I am using RHEL 7.

@brandongodby78

@bishal-pdMSFT Sorry, for some reason I am not getting emails when I am tagged in a comment. We are receiving the same error that @ramakrpr posted while trying to deploy a VMSS via ARM on CentOS 7.7.

We were able to get past this error by upgrading to CentOS 8.3, which is, of course, running Python 3. This did, however, cause new unrelated issues, which I will mention in case others run into them.

We are using the customData feature of the ARM template to deploy base64 data onto our machines. According to the documentation, this data is written to a file called CustomData. On CentOS 7.7 this was indeed the case, so we have bootstrap scripts that decode the data in that file and do some processing of it. When deploying CentOS 8.3 with ARM, however, we do not see that file being created; the ovf-env.xml file is created instead. So we rewrote our bootstrap to decode the data from the XML file rather than the customData file (which was JSON). Here is the documentation of that feature: https://docs.microsoft.com/en-us/azure/virtual-machines/custom-data#linux
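
For readers hitting the same thing, here is a minimal sketch of reading custom data out of ovf-env.xml. It assumes the file carries the payload as base64 text in an element whose local name is CustomData, which should be checked against a real file. The function name and the sample XML, including its namespace URI, are illustrative only and are not taken from this thread.

```python
import base64
import xml.etree.ElementTree as ET


def custom_data_from_ovf(xml_text):
    """Return decoded custom data from ovf-env.xml content, or None.

    Matches on the local tag name so the sketch does not depend on the
    exact namespace the real file uses.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "CustomData" and element.text:
            return base64.b64decode(element.text.strip())
    return None


# Hypothetical, heavily trimmed stand-in for a real ovf-env.xml.
payload = base64.b64encode(b'{"role": "agent"}').decode("ascii")
sample_xml = (
    '<Environment xmlns:wa="http://example.invalid/wa">'
    "<wa:ProvisioningSection>"
    "<wa:CustomData>" + payload + "</wa:CustomData>"
    "</wa:ProvisioningSection>"
    "</Environment>"
)

decoded = custom_data_from_ovf(sample_xml)
print(decoded)
```

A bootstrap script could pass the decoded bytes to its existing JSON handling, so that only the lookup step changes between the two CentOS versions.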

Ultimately we have a workaround now, but having to upgrade our clusters to CentOS 8 under emergency duress is not what we want to do. Getting Python 3 to work on CentOS 7 seemed too challenging after about 6 hours of troubleshooting.

I am happy to share any more information or logs you need.

@tejasd1990

@ramakrpr @jnovick @brandongodby78, are you able to get past this error using newer OS images that include Python 3? That would be CentOS 8+ and RHEL 8+. What issues are you facing while using those?
We are working on a fix to handle back-compat with Python 2 and will roll it out shortly. I will update once that is done.

@ramakrpr

ramakrpr commented Feb 25, 2021

Hi @tejasd1990, unfortunately we don't have RHEL 8 as an approved image for use in our enterprise, and getting it approved will take time. When do you expect the fix to be rolled out?

@bishal-pdMSFT

@ramakrpr the fix is at least 2-3 weeks away, as unfortunately we are dealing with another urgent issue.
You could try installing Python 3 on RHEL 7 as a workaround.

@brandongodby78

@tejasd1990 @bishal-pdMSFT Yes, I do have a workaround, which is to deploy CentOS 8. However, upgrading to CentOS 8 under duress is not something we would normally do. We would want to roll out a large upgrade like that very slowly.

Also, I do have a ticket open (121021824006622) regarding an issue we are experiencing after upgrading to CentOS 8: the custom data feature of ARM templates is not working as expected on CentOS 8. It is not deploying the base64 data to the /var/lib/waagent/customdata file as it is supposed to, as documented here: https://docs.microsoft.com/en-us/azure/virtual-machines/custom-data#linux

@brandongodby78

I will also say that installing Python 3 on CentOS 7 did not solve the problem, unless you create a symlink so that python > python3, but if you do that a lot of other things break. Too many things on CentOS 7 don't work with Python 3 as the default python.

@jnovick

jnovick commented Feb 25, 2021

I was able to just install python3 and have everything work fine. I did not need to create a symlink for python > python3. As a separate issue, I had to install git 2.x, which was a hassle, but this problem was not too bad for me to work around. I put a custom script extension on my VMSS to run sudo yum -y install python3. Unfortunately, RHEL 8 is not yet approved for our use, but RHEL 7 stops having this error once python3 is installed.

@tejasd1990

tejasd1990 commented Feb 26, 2021

@brandongodby78

unless you create a symlink so that python > python3

This should not be required. The new extension version expects an interpreter named python3, not python.

Also, I do have a ticket open (121021824006622) regarding an issue we are experiencing after upgrading to CentOS 8: the custom data feature of ARM templates is not working as expected on CentOS 8. It is not deploying the base64 data to the /var/lib/waagent/customdata file as it is supposed to, as documented here: https://docs.microsoft.com/en-us/azure/virtual-machines/custom-data#linux

OK, so the issue with CentOS 8 that you mentioned earlier is not with the TeamServicesAgent extension but with the ARM template deployment, correct?

@brandongodby78

@tejasd1990 We have confirmed that if you don't create the symlink, the agent works as expected, so we are able to remain on CentOS 7 for the short term. Thank you for the quick response. Also, yes, the problem with CentOS 8 is with the ARM template, not the agent.

@adhodgson1

Sorry but can I confirm that since the agent upgrade (we use VMSS agents so can't control the version) we don't need to install Python2 on our images anymore and can keep /usr/bin/python as Python3?

@tejasd1990

@adhodgson1, the extension expects /usr/bin/python3 to be present on the VM, which is the case on Azure VMs except for very old ones that don't have python3 installed by default.
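
The exit code 126 with "bad interpreter" reported earlier in the thread is what shows up when a script's shebang line names a program that is not on disk. A small preflight sketch, assuming the script starts with a `#!` line as that error implies; the helper names are invented here and are not part of the extension.

```python
import os
import tempfile


def shebang_interpreter(script_path):
    """Return the interpreter path from a script's #! line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def interpreter_is_available(script_path):
    """True when the shebang target exists and is executable."""
    interpreter = shebang_interpreter(script_path)
    return bool(interpreter) and os.access(interpreter, os.X_OK)


# Demo on a throwaway file; the interpreter path is deliberately bogus.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as demo:
    demo.write("#!/nonexistent/bin/python3\nprint('hello')\n")

found = shebang_interpreter(demo.name)
available = interpreter_is_available(demo.name)
print(found, available)
os.remove(demo.name)
```

On an affected VM, the same check could be pointed at the AzureRM.py path from the error message to confirm whether a missing /usr/bin/python3 is the cause.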

@ramakrpr

ramakrpr commented Mar 3, 2021

For me it works fine now with RHEL 7 and Python 3. I've used the custom script extension to deploy Python 3. It now works like a charm.
I still need to test with CentOS 7, though.

@tejasd1990

Hi all,
We have rolled out the Python 2 back-compat. Let us know whether it is working for you or if you hit any issues.

@github-actions

github-actions bot commented Sep 8, 2021

This issue has had no activity in 180 days. Please comment if it is not actually stale
