-
-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backup Job Fails to Abort When libvirtd Daemon Restarts or Disconnects #205
Comments
What happens to the backup job anyway if the libvirtd daemon is restarted? It seems the Qemu process still keeping the NDB handle open, otherwise data transfer would fail. Is the backup job still listed in domjobs output after a libvrtd restart? Really a corner case here, i dont know if want to invest time to fix this.. Problem to fix this is that the libvirt connection is initialized at application startup, there is probably no easy way to re-connect without re-designing a large part of the code base. One way to solve this would probably be a second thread which checks for the connection state and re-initializes it if its broken, but then this is still prone to race conditions. Another one would be to slap an decorator on each function within the virt/client.py which re-initializes a connection if it is broken, or redesign the module at all that for each operation a new connection is opened (which isnt that great,too). But then you would need to handle this gracefully too, what if libvirtd doesnt come up for several minutes or even hours?) I dont see any options to the libvirt flags upon connection opening that would introduce an automatic reconnect. |
|
registerCloseCallback looks interesting. |
|
hi, can you check branch issue_205 with its commits.
|
Ok, thank you very much, all I can do is try it in a test installation and let you know. |
this one made it through: https://github.com/abbbi/virtnbdbackup/actions/runs/10487766874/artifacts/1836691428 |
It runs fine for me!! I have created a pull to include retries #206 |
Version used
2.11
Describe the bug
When performing a backup, if the libvirtd daemon restarts or disconnects unexpectedly, the backup job fails to complete and throws an error. The process is unable to stop the backup task due to a broken connection, resulting in a "Broken pipe" error. The application does not handle this scenario correctly, leading to an unhandled exception and terminating the process.
Steps to Reproduce:
Expected behavior
The backup process should be able to detect the disconnection of the libvirtd daemon and handle it gracefully by reconnecting and attempting to abort the job. Ideally, the application should attempt to restore the connection to libvirtd and properly abort the backup task, instead of throwing an unhandled exception.
Hypervisor information:
Logfiles:
[2024-08-20 01:33:55] INFO lib common - printVersion [main]: Version: 2.11 Arguments: /usr/lib/python-exec/python3.11/virtnbdbackup -d svrdynamics --noprogress -q --compress -l auto -o /srv/backup/svrdynamics/2024-33
[2024-08-20 01:33:55] INFO root virtnbdbackup - main [main]: Backup level: [auto]
[2024-08-20 01:33:55] INFO root virtnbdbackup - main [main]: Compression enabled, level [2]
[2024-08-20 01:33:56] INFO root virtnbdbackup - main [main]: Libvirt library version: [10003000]
[2024-08-20 01:33:56] INFO root virtnbdbackup - main [main]: NBD library version: [1.18.1]
[2024-08-20 01:33:56] INFO root check - targetDir [main]: Backup mode auto, target folder is empty: executing full backup.
[2024-08-20 01:33:56] INFO root disktype - Optical [main]: Skipping attached [cdrom] device: [sdb].
2024-08-20 01:33:56] INFO root virtnbdbackup - main [main]: Backup will save [4] attached disks.
2024-08-20 01:33:56] INFO root virtnbdbackup - main [main]: Concurrent backup processes: [4]
2024-08-20 01:33:56] INFO root checkpoint - create [main]: Loading checkpoints from: [/srv/backup/svrdynamics/2024-33/svrdynamics.cpt]
2024-08-20 01:33:56] INFO root checkpoint - redefine [main]: Loading checkpoint list from: [/srv/backup/svrdynamics/2024-33/checkpoints]
2024-08-20 01:33:56] INFO root checkpoint - create [main]: Checkpoint handling.
2024-08-20 01:34:02] INFO root checkpoint - create [main]: Using checkpoint name: [virtnbdbackup.0].
2024-08-20 01:34:02] INFO root virtnbdbackup - main [main]: Local NBD Endpoint socket: [/var/tmp/virtnbdbackup.2045415]
2024-08-20 01:34:02] INFO root virtnbdbackup - main [main]: Temporary scratch file target directory: [/var/tmp]
2024-08-20 01:34:02] INFO root job - start [main]: Starting backup job.
2024-08-20 01:34:33] INFO fs fs - freeze [main]: Freezed [5] filesystems.
2024-08-20 01:34:34] INFO fs fs - thaw [main]: Thawed [5] filesystems.
2024-08-20 01:34:34] INFO root virtnbdbackup - main [main]: Started backup job with checkpoint, saving information.
2024-08-20 01:34:34] INFO root checkpoint - backup [main]: Saving checkpoint config to: [/srv/backup/svrdynamics/2024-33/checkpoints/virtnbdbackup.0.xml]
2024-08-20 01:34:34] INFO nbd client - connect [vda]: Waiting until NBD server at [nbd+unix:///vda?socket=/var/tmp/virtnbdbackup.2045415] is up.
2024-08-20 01:34:34] INFO nbd client - connect [vdb]: Waiting until NBD server at [nbd+unix:///vdb?socket=/var/tmp/virtnbdbackup.2045415] is up.
2024-08-20 01:34:34] INFO nbd client - connect [vdc]: Waiting until NBD server at [nbd+unix:///vdc?socket=/var/tmp/virtnbdbackup.2045415] is up.
2024-08-20 01:34:34] INFO nbd client - connect [sda]: Waiting until NBD server at [nbd+unix:///sda?socket=/var/tmp/virtnbdbackup.2045415] is up.
2024-08-20 01:34:35] INFO nbd client - connect [vda]: Connection to NBD backend succeeded.
2024-08-20 01:34:35] INFO nbd client - connect [vdb]: Connection to NBD backend succeeded.
2024-08-20 01:34:35] INFO root disk - _getExtentHandler [vda]: Using qemu tools to query extents
2024-08-20 01:34:35] INFO root disk - _getExtentHandler [vdb]: Using qemu tools to query extents
2024-08-20 01:34:35] INFO nbd client - connect [vdc]: Connection to NBD backend succeeded.
2024-08-20 01:34:35] INFO nbd client - connect [sda]: Connection to NBD backend succeeded.
2024-08-20 01:34:35] INFO root disk - _getExtentHandler [vdc]: Using qemu tools to query extents
2024-08-20 01:34:35] INFO root disk - _getExtentHandler [sda]: Using qemu tools to query extents
2024-08-20 01:34:35] INFO root disk - backup [vda]: Got 31 extents to backup.
2024-08-20 01:34:35] INFO root disk - backup [vda]: 53687091200 bytes disk size
2024-08-20 01:34:35] INFO root disk - backup [vda]: 823189504 bytes of data extents to backup
2024-08-20 01:34:35] INFO root target - get [vda]: Write data to target file: [/srv/backup/svrdynamics/2024-33/vda.full.data.partial].
2024-08-20 01:34:35] INFO root disk - backup [vda]: Creating thin provisioned stream backup image
2024-08-20 01:34:36] INFO root disk - backup [sda]: Got 357 extents to backup.
2024-08-20 01:34:36] INFO root disk - backup [sda]: 128849018880 bytes disk size
2024-08-20 01:34:36] INFO root disk - backup [sda]: 128540803072 bytes of data extents to backup
2024-08-20 01:34:36] INFO root target - get [sda]: Write data to target file: [/srv/backup/svrdynamics/2024-33/sda.full.data.partial].
2024-08-20 01:34:36] INFO root disk - backup [sda]: Creating thin provisioned stream backup image
2024-08-20 01:34:36] INFO root disk - backup [vdc]: Got 123 extents to backup.
2024-08-20 01:34:36] INFO root disk - backup [vdc]: 107374182400 bytes disk size
2024-08-20 01:34:36] INFO root disk - backup [vdc]: 65519288320 bytes of data extents to backup
2024-08-20 01:34:36] INFO root target - get [vdc]: Write data to target file: [/srv/backup/svrdynamics/2024-33/vdc.full.data.partial].
2024-08-20 01:34:36] INFO root disk - backup [vdc]: Creating thin provisioned stream backup image
2024-08-20 01:34:37] INFO root disk - backup [vdb]: Got 17 extents to backup.
2024-08-20 01:34:37] INFO root disk - backup [vdb]: 214748364800 bytes disk size
2024-08-20 01:34:37] INFO root disk - backup [vdb]: 214353313792 bytes of data extents to backup
2024-08-20 01:34:37] INFO root target - get [vdb]: Write data to target file: [/srv/backup/svrdynamics/2024-33/vdb.full.data.partial].
2024-08-20 01:34:37] INFO root disk - backup [vdb]: Creating thin provisioned stream backup image
2024-08-20 01:34:42] INFO root disk - backup [vda]: Backup of disk [vda] finished, file: [/srv/backup/svrdynamics/2024-33/vda.full.data]
2024-08-20 01:34:42] INFO root metadata - backupChecksum [vda]: Checksum for file: [/srv/backup/svrdynamics/2024-33/vda.full.data]:[948353423]
2024-08-20 01:34:42] INFO root metadata - backupChecksum [vda]: Saving checksum to: [/srv/backup/svrdynamics/2024-33/vda.full.data.chksum]
2024-08-20 01:49:32] INFO root disk - backup [vdc]: Backup of disk [vdc] finished, file: [/srv/backup/svrdynamics/2024-33/vdc.full.data]
2024-08-20 01:49:32] INFO root metadata - backupChecksum [vdc]: Checksum for file: [/srv/backup/svrdynamics/2024-33/vdc.full.data]:[1331928550]
2024-08-20 01:49:32] INFO root metadata - backupChecksum [vdc]: Saving checksum to: [/srv/backup/svrdynamics/2024-33/vdc.full.data.chksum]
2024-08-20 02:02:42] INFO root disk - backup [sda]: Backup of disk [sda] finished, file: [/srv/backup/svrdynamics/2024-33/sda.full.data]
2024-08-20 02:02:42] INFO root metadata - backupChecksum [sda]: Checksum for file: [/srv/backup/svrdynamics/2024-33/sda.full.data]:[496616561]
2024-08-20 02:02:46] INFO root metadata - backupChecksum [sda]: Saving checksum to: [/srv/backup/svrdynamics/2024-33/sda.full.data.chksum]
2024-08-20 02:14:24] INFO root disk - backup [vdb]: Backup of disk [vdb] finished, file: [/srv/backup/svrdynamics/2024-33/vdb.full.data]
2024-08-20 02:14:24] INFO root metadata - backupChecksum [vdb]: Checksum for file: [/srv/backup/svrdynamics/2024-33/vdb.full.data]:[3927655927]
2024-08-20 02:14:24] INFO root metadata - backupChecksum [vdb]: Saving checksum to: [/srv/backup/svrdynamics/2024-33/vdb.full.data.chksum]
2024-08-20 02:14:24] INFO root virtnbdbackup - main [main]: Backup jobs finished, stopping backup task.
2024-08-20 02:14:24] WARNING virt client - stopBackup [main]: Failed to stop backup job: [Cannot write data: Broken pipe]
2024-08-20 02:14:24] INFO root metadata - backupConfig [main]: Saving VM config to: [/srv/backup/svrdynamics/2024-33/vmconfig.virtnbdbackup.0.xml]
2024-08-20 02:14:24] INFO root metadata - backupBootConfig [main]: Save additional boot config [loader] to: [/srv/backup/svrdynamics/2024-33/edk2-x86_64-code.fd.virtnbdbackup.0]
2024-08-20 02:14:24] INFO root metadata - backupBootConfig [main]: Save additional boot config [nvram] to: [/srv/backup/svrdynamics/2024-33/svrdynamics_VARS.fd.virtnbdbackup.0]
2024-08-20 02:14:24] INFO root metadata - backupDiskInfo [main]: Saved qcow image config to: [/srv/backup/svrdynamics/2024-33/vda.virtnbdbackup.0.qcow.json]
2024-08-20 02:14:24] INFO root metadata - backupDiskInfo [main]: Saved qcow image config to: [/srv/backup/svrdynamics/2024-33/vdb.virtnbdbackup.0.qcow.json]
2024-08-20 02:14:24] INFO root metadata - backupDiskInfo [main]: Saved qcow image config to: [/srv/backup/svrdynamics/2024-33/vdc.virtnbdbackup.0.qcow.json]
2024-08-20 02:14:24] INFO root metadata - backupDiskInfo [main]: Saved qcow image config to: [/srv/backup/svrdynamics/2024-33/sda.virtnbdbackup.0.qcow.json]
Traceback (most recent call last):
File "/usr/lib/python-exec/python3.11/virtnbdbackup", line 493, in
main()
File "/usr/lib/python-exec/python3.11/virtnbdbackup", line 471, in main
if domObj.autostart() == 1:
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/libvirt.py", line 741, in autostart
raise libvirtError('virDomainGetAutostart() failed')
libvirt.libvirtError: internal error: client socket is closed
The text was updated successfully, but these errors were encountered: