-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgraded Xen-tools from 4.15 to 4.19.0 #4186
Conversation
roja-zededa
commented
Aug 29, 2024
- Able to add ninja through apk in xen-tools/Docker and xen/Docker
- Changed the seabios and Xen version to 1.16.3 and 4.19.0 respectively
- Added 12-remove-vanillaqemu4.19-cpupinning.patch which removes new qemu_thread_set_affinity implementation (QEMU 8.0.4), also retained Nikolay CPU Pinning PatchNo.15
- Warning treated as error flag is available by default so removed 12-disable-Werror-to-build-under-gcc-11.2.patch
- NetdevTapOptions doesn't have has_br member so changed 10-bridge-helper-support.patch
- Bydefault vhost-vsock and vhost-scsi is enabled so removing the corresponding enable flags from xen-tools/Dockerfile
- Removed 11-char-socket-revert.patch as it's unnecessary
- Removed [realtime] option from kvm.go and replaced it with [overcommit], hypervisor.go unit test need to be changed to reflect [overcommit]
- 08-Revert__Revert__vfio_pci-quirks_c__Disable_stolen_memory_for_igd_VFIO__.patch looked super messy, so cleaned it.
- Replaced [realtime] with [overcommit] in kvm_test.go for the unit test case to pass
6d45ae7
to
7280a02
Compare
@OhmSpectator @rene @rouming Fixed unit test case, looks good to me. Please merge this when you can. P.S. Had to move #4133 here because of the Request Code Owners Review / auto_request_review error. |
There are still unanswered question/change request in the original PR. |
Do you mean this one? #4133 (comment) |
|
@roja-zededa we usually don't merge all the patches in one commit: difficult to review, difficult to revert, difficult to maintain. Each logical change goes to its own commit. For example test fixes is ideal candidate for a separate commit with its own good description why the test needs a fix; or removal of old patches, or docker changes - those are good candidates for a separate commits. We try to follow Linux kernel best practices and those are well described here: https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#separate-your-changes Update: your PR description consists of 10 sentences, which perfectly illustrate the "logical change" approach. Of course there should not be exact 10 commits, but some of them definitely "want" to be separated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still some comments here and from #4133 to respond to, but kicking off tests.
It makes sense to rebase on the latest master before kicking the tests. |
@shjala I have only added libgcc so I didn't know why there were sudo, bash, yajl. But, I can successfully build an image without these utils. @rene Can I remove sudo, bash, yajl from Dockerfile? |
1b25649
to
7280a02
Compare
7280a02
to
80611a1
Compare
Tested with the latest master, and no issues were found. |
df6005f
to
6d9e7fa
Compare
@roja-zededa This is the source of the error: https://github.com/qemu/qemu/blob/01dc65a3bc262ab1bec8fe89775e9bbfa627becb/ui/vnc.c#L4069 The problem should be reproduced by using vnc and password. Try to follow build recommendations from here: https://bugs.gentoo.org/832494 |
Is that "Cipher backend does not support DES algorithm" check new in the new version of qemu? |
Apparently something was changed, for example qemu/qemu@83bee4b |
@Roja-Eswaran You can run locally these commands to reproduce:
|
038f0df
to
949b192
Compare
This issue has been resolved by adding gnutls and gnutls-dev in Xen-tools/Dockerfile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kick off tests
@Roja-Eswaran please rebase on master (to pick up #4290) so we can get the build to work again. |
Upgraded xen/Dockerfile and xen-tools/Dockerfile to point new version (1) Removed the following unused patches: 11-char-socket-revert.patch, 12-disable-Werror-to-build-under-gcc-11.2.patch, 15-qemu-Set-the-affinity-of-QEMU-threads-according-to-t.patch, 16-imammedo_x86_acpi_use_offset_instead_of_pointer_when_using_build_header.patch, 0003-arch-arm-small-hack-for-rpi4-usb.patch Signed-off-by: Roja Eswaran <[email protected]>
Patch 12-remove-vanillaqemu4.19-cpupinning.patch which removes new qemu_thread_set_affinity implementation (QEMU 8.0.4) Patch 20-return_zero_close_if_special_file.patch which removes a CVE fix Signed-off-by: Roja Eswaran <[email protected]>
Replacing [realtime] with [overcommit] as it's deprecated in QEMU 8.0.4 provided by xen-tools 4.19.0 Signed-off-by: Roja Eswaran <[email protected]>
Adding gnutls and gnutls-dev in xen-tools/Dockerfile for VNC error Signed-off-by: Roja Eswaran <[email protected]>
58630ff
to
9647550
Compare
Done! Please kickoff the tests. |
I am unable to launch native containers. The eden test failed again! I was able to reproduce it locally and here is the log:
FWIW: AMD64/KVM, Sometimes the zedcontroller throws me this error while launching native container:
|
Only these logs are not enough to know the root cause of the crash.
This error message happens a lot while deploying native containers when running EVE device on QEMU (I think it takes too long to setup the bridge device in this case). However, the virt. ethernet hook script retries a couple of times, so usually it succeeds after the second retry. If it manages to run the container, you don't need to worry about it.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kicking off Eden Tests...
Eden tests are green. Should we merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-running the same qemu device as before
make -e HV=xen ACCEL=y QEMU_MEMORY=16384 MEDIA_SIZE=65536 SSH_PORT=2323 pkg/pillar pkg/watchdog rootfs live run
where the controller is deploying two VMs and two containers, the alpine container fails after 5 minutes, This is not happening when I run the same thing on master, nor when I run it with KVM on this PR. So that failure needs to be investigated.
@eriknordmark Could you please share the error trace? |
@eriknordmark I don't face any issue while launching three VMs and three native containers at the same time on this PR. https://zedcontrol.alpha.zededa.net/edge-nodes/223c376d-242c-4da2-8487-253c6d599c07/details/status. Could you please share the error log messages so that I can reproduce the issue locally? |
I sent you a pointer to the Kibana logs when running xen-amd64. I kicked off a run with the kvm image from your PR and today I see some new qmp errors (didn't see those a week ago) Do you want a kibana saved search for those as well? The kvm error is for a device called sc-supermicro-zc2 in the alpha cluster. |
@eriknordmark I don't think so my PR is responsible for sc-supermicro-zc2 failure as I have seen a similar error in PR:4259 as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does look like some container (the alpine-cont we use in ztest) can have issues with a 9p mount when run on the xen hypervisor. But let's debug that in parallel with merging this (assuming no test issues discovered overnight for the kvm image).
I've seen a panic in the guest VM kernel (in p9_xen_create) which looks like an old guest kernel (missing a check for addr == NULL which is in the 6.1 kernel). So not an issue with this PR.