
VM disk corruption with Apple Silicon #1957

Closed · EdwardMoyse opened this issue Oct 27, 2023 · 50 comments

@EdwardMoyse

EdwardMoyse commented Oct 27, 2023

Tip

EDIT by @AkihiroSuda

For --vm-type=vz, this issue seems to have been solved in Lima v0.19 (#2026)


Description

Lima version: 0.18.0
macOS: 14.0 (23A344)
VM: Almalinux9

I was trying to do a big compile, using a VM with the attached configuration (vz)

NAME           STATUS     SSH                VMTYPE    ARCH       CPUS    MEMORY    DISK      DIR
myalma9        Running    127.0.0.1:49434    vz        aarch64    4       16GiB     100GiB    ~/.lima/myalma9

The build aborted with:

from /Volumes/Lima/build/build/AthenaExternals/src/Geant4/source/processes/hadronic/models/lend/src/xDataTOM_LegendreSeries.cc:7:
/usr/include/bits/types.h:142:10: fatal error: /usr/include/bits/time64.h: Input/output error

And afterwards, even in a different terminal, I see:

[emoyse@lima-myalma9 emoyse]$ ls
bash: /usr/bin/ls: Input/output error

I was also logged into a display, and there I saw e.g.

(screenshot of the VM display window, 2023-10-26 17:44)

If I try to log in again with:

limactl shell myalma9

each time I see something like the following appear in the display window:

[56247.642703] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed

Edit: there has been a lot of discussion below; the corruption can happen with both vz and qemu, and on disks both external and internal to the VM. Some permutations seem more likely to provoke corruption than others. I have summarised my experiments in the table in a comment below.

@EdwardMoyse
Author

In case it is relevant, I was compiling in a separate APFS (Case-sensitive) Volume as described here. This volume seems absolutely fine, so the corruption seems limited to the VM itself. I can't see how this could have happened with 100 GB, but I wonder if it's possible that the VM ran out of space? I could try increasing the disk size, but the whole point of using an external volume was that this would not be necessary.

@AkihiroSuda
Member

  • Is this specific to AlmaLinux 9?
  • Is this specific to --vm-type=vz?
  • Do you see some meaningful message in dmesg?

@EdwardMoyse
Author

Hmm. I just tried again, but compiling in /tmp rather than the case-sensitive volume, and this worked fine. A colleague has confirmed a similar experience - problems with /Volumes/Lima, but it works fine in /tmp. So my best guess right now is that it is some interaction between an APFS Volume and Lima (which might also explain the following "stuck VM" discussion: #1666)

Answering your other questions:

  • I'm not sure if it is specific to AlmaLinux 9, since my use-case requires that particular OS. But I will try doing a big compile on a different OS, in a Volume, to see if I can replicate it.
  • We have not replicated it with qemu, but qemu is so slow that this is very hard to do. I will try.
  • I will also try again with dmesg running.

@afbjorklund

This comment was marked as off-topic.

@afbjorklund

This comment was marked as off-topic.

@EdwardMoyse
Author

EdwardMoyse commented Oct 30, 2023

EDIT: One potential feature could be to be able to create disk images on an attached disk, instead of under LIMA_HOME.
You can probably use symlinks from _disks as a workaround, but would be better with some optional flag support...

This would, I think, really help us.

Our use-case is this: we want to be able to edit files from within macOS, but then compile inside AlmaLinux 9. The codebase we are compiling is relatively large (>4 million lines of C++) and can take up to 400 GB of temporary compilation space. I was reluctant to make separate VMs with this much local storage, especially since a lot of us will be working on laptops. Ideally we would have a large build area (possibly on an external drive), accessible from several VMs, and with very fast disk I/O from the VM (since otherwise the build time becomes unusably slow). We do NOT, in general, need to be able to access this build area from the host (at least, not with fast I/O - it would mainly be to examine compilation failures).

(I will get back to the other tests shortly - but I'm currently travelling with limited work time, and it seems very likely that the issue is related to compiling outside the VM)

@AkihiroSuda
Member

(I will get back to the other tests shortly - but I'm currently travelling with limited work time, and it seems very likely that the issue is related to compiling outside the VM)

I'm not sure how virtiofs affects the XFS disk, but maybe this issue should be reported to Apple?

@afbjorklund
Member

afbjorklund commented Oct 30, 2023

I was under the impression that the problem was with the /Volumes/Lima mount, but the logs say vda2...

  - location: /Volumes/Lima
    writable: true

So the remote filesystem is a separate topic* from this ARM64 disk corruption. Sorry for the added noise.

Though I don't see how switching from remote /Volumes/Lima to local /tmp could have helped, then...


* should continue in a different discussion

Note that disk images cannot be shared...
(they can be unplugged and remounted)

@AkihiroSuda
Member

Is this relevant?

(UTM uses vz too)

Looks like people began to hit this issue in September, so I wonder if Apple introduced a regression around that time?

I still can't repro the issue locally though.
(macOS 14.1 on Intel MacBookPro 2020, macOS 13.5.2 on EC2 mac2-m2pro)

@AkihiroSuda
Member

Can anybody confirm this rumor?

utmapp/UTM#4840 (comment)

Is it me or deactivating ballooning solves the problem?
I've deactivated it two weeks ago, and no problem since on my side.

Removing these lines will disable ballooning:

lima/pkg/vz/vm_darwin.go

Lines 598 to 604 in 7cb2b2e

configuration, err := vz.NewVirtioTraditionalMemoryBalloonDeviceConfiguration()
if err != nil {
    return err
}
vmConfig.SetMemoryBalloonDevicesVirtualMachineConfiguration([]vz.MemoryBalloonDeviceConfiguration{
    configuration,
})
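For anyone wanting to try this, here is a minimal sketch of what "removing these lines" amounts to as a local patch; the enableBalloon guard is a hypothetical local flag, not an existing Lima option:

if enableBalloon { // hypothetical flag; when false, no balloon device is attached at all
    configuration, err := vz.NewVirtioTraditionalMemoryBalloonDeviceConfiguration()
    if err != nil {
        return err
    }
    vmConfig.SetMemoryBalloonDevicesVirtualMachineConfiguration([]vz.MemoryBalloonDeviceConfiguration{
        configuration,
    })
}

Leaving the VM without a balloon device only removes one variable; as the comments below show, it did not stop the corruption.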

@wdormann

wdormann commented Oct 30, 2023

For what it's worth, I believe I've narrowed down the problem that I've noticed in utmapp/UTM#4840 to having used an external SSD drive. I've not reproduced the corruption if the VM lives on my Mac's internal storage.

@EdwardMoyse Your separate APFS volume... is it on the same storage device that your Mac runs on, or is it a separate external device?

@AkihiroSuda I've not seen disabling the Balloon device help with preventing corruption. At least, if I'm working with a QEMU-based VM that lives on my external SSD storage, it has the Balloon Device un-checked by default, and the VM's filesystem will still eventually corrupt under heavy disk load. So I believe this is a red herring.
(screenshot of the VM settings with "Balloon Device" un-checked, 2023-10-30)

@AkihiroSuda
Member

AkihiroSuda commented Oct 30, 2023

I'm working with a QEMU-based VM

Probably you are hitting a different issue with a similar symptom?

@EdwardMoyse
Author

@wdormann my APFS Volume is on the same device (SSD) as macOS. It's not an external device in my case.

@wdormann

Thanks for the input. I've been testing the disk itself, and it has yet to report errors.
Given your successful test in /tmp, these both seem to point to a problem using a non-OS volume for the underlying VM OS storage?

@AkihiroSuda
Member

AkihiroSuda commented Oct 30, 2023

I think I reproduced the issue with the default Ubuntu template:

[  299.527200] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3793: comm apport: iget: checksum invalid
[  299.527255] Aborting journal on device vda1-8.
[  299.527293] EXT4-fs error (device vda1): ext4_journal_check_start:83: comm cp: Detected aborted journal
[  299.528985] EXT4-fs error (device vda1): ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted journal
[  299.530464] EXT4-fs (vda1): Remounting filesystem read-only
[  299.530515] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3794: comm apport: iget: checksum invalid
[  299.535137] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3795: comm apport: iget: checksum invalid
[  299.538878] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3796: comm apport: iget: checksum invalid
[  299.543827] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3797: comm apport: iget: checksum invalid
[  299.550614] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3798: comm apport: iget: checksum invalid
[  299.551947] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3799: comm apport: iget: checksum invalid
[  299.553651] EXT4-fs error (device vda1): ext4_lookup:1851: inode #3800: comm apport: iget: checksum invalid
[  299.821872] audit: type=1131 audit(1698675832.913:271): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  299.821967] BUG: Bad rss-counter state mm:0000000013fa5858 type:MM_FILEPAGES val:43
[  299.821980] BUG: Bad rss-counter state mm:0000000013fa5858 type:MM_ANONPAGES val:3
[  299.821982] BUG: non-zero pgtables_bytes on freeing mm: 4096
[  299.822551] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
[  299.822566] Mem abort info:
[  299.822566]   ESR = 0x0000000096000004
[  299.822568]   EC = 0x25: DABT (current EL), IL = 32 bits
[  299.822569]   SET = 0, FnV = 0
[  299.822570]   EA = 0, S1PTW = 0
[  299.822570]   FSC = 0x04: level 0 translation fault
[  299.822571] Data abort info:
[  299.822572]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  299.822573]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  299.822574]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  299.822575] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100970000
[  299.822576] [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
[  299.822604] Internal error: Oops: 0000000096000004 [#1] SMP
[  299.822615] Modules linked in: tls nft_chain_nat overlay xt_tcpudp xt_nat xt_multiport xt_mark xt_conntrack xt_comment xt_addrtype xt_MASQUERADE nf_tables nfnetlink ip6table_filter iptable_filter ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables veth bridge stp llc tap isofs binfmt_misc nls_iso8859_1 vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock virtiofs joydev input_leds drm 
[  299.822800] Unable to handle kernel paging request at virtual address fffffffffffffff8
[  299.822805] Mem abort info:
[  299.822805]   ESR = 0x0000000096000004
[  299.822806]   EC = 0x25: DABT (current EL), IL = 32 bits
[  299.822807]   SET = 0, FnV = 0
[  299.822808]   EA = 0, S1PTW = 0
[  299.822809]   FSC = 0x04: level 0 translation fault
[  299.822810] Data abort info:
[  299.822810]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  299.822811]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  299.822812]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  299.822813] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000864e50000
[  299.822814] [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
[  361.102020] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  361.102094] rcu:     1-...0: (1 GPs behind) idle=e0b4/1/0x4000000000000000 softirq=23608/23609 fqs=6997
[  361.102102] rcu:              hardirqs   softirqs   csw/system
[  361.102103] rcu:      number:        0          0            0
[  361.102104] rcu:     cputime:        0          0            0   ==> 30000(ms)
[  361.102105] rcu:     (detected by 3, t=15002 jiffies, g=38213, q=860 ncpus=4)
[  361.102107] Task dump for CPU 1:
[  361.102108] task:systemd         state:S stack:0     pid:1     ppid:0      flags:0x00000002
[  361.102111] Call trace:
[  361.102118]  __switch_to+0xc0/0x108
[  361.102180]  seccomp_filter_release+0x40/0x78
[  361.102203]  release_task+0xf0/0x238
[  361.102216]  wait_task_zombie+0x124/0x5c8
[  361.102218]  wait_consider_task+0x244/0x3c0
[  361.102220]  do_wait+0x178/0x338
[  361.102222]  kernel_waitid+0x100/0x1e8
[  361.102224]  __do_sys_waitid+0x2bc/0x378
[  361.102226]  __arm64_sys_waitid+0x34/0x60
[  361.102228]  invoke_syscall+0x7c/0x128
[  361.102230]  el0_svc_common.constprop.0+0x5c/0x168
[  361.102231]  do_el0_svc+0x38/0x68
[  361.102232]  el0_svc+0x30/0xe0
[  361.102234]  el0t_64_sync_handler+0x148/0x158
[  361.102236]  el0t_64_sync+0x1b0/0x1b8
[  541.118359] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  541.118368] rcu:     1-...0: (1 GPs behind) idle=e0b4/1/0x4000000000000000 softirq=23608/23609 fqs=27191
[  541.118371] rcu:              hardirqs   softirqs   csw/system
[  541.118372] rcu:      number:        0          0            0
[  541.118373] rcu:     cputime:        0          0            0   ==> 210020(ms)
[  541.118375] rcu:     (detected by 3, t=60007 jiffies, g=38213, q=1790 ncpus=4)
[  541.118377] Task dump for CPU 1:
[  541.118379] task:systemd         state:S stack:0     pid:1     ppid:0      flags:0x00000002
[  541.118382] Call trace:
[  541.118383]  __switch_to+0xc0/0x108
[  541.118390]  seccomp_filter_release+0x40/0x78
[  541.118393]  release_task+0xf0/0x238
[  541.118396]  wait_task_zombie+0x124/0x5c8
[  541.118399]  wait_consider_task+0x244/0x3c0
[  541.118401]  do_wait+0x178/0x338
[  541.118403]  kernel_waitid+0x100/0x1e8
[  541.118405]  __do_sys_waitid+0x2bc/0x378
[  541.118407]  __arm64_sys_waitid+0x34/0x60
[  541.118409]  invoke_syscall+0x7c/0x128
[  541.118411]  el0_svc_common.constprop.0+0x5c/0x168
[  541.118412]  do_el0_svc+0x38/0x68
[  541.118413]  el0_svc+0x30/0xe0
[  541.118415]  el0t_64_sync_handler+0x148/0x158
[  541.118417]  el0t_64_sync+0x1b0/0x1b8

(Non-minimal, non-deterministic) repro steps:

  • Create a mac2-m2pro (32GB RAM) instance on EC2, with macOS 13.5.2 AMI, and a gp2 EBS volume
  • Install Lima v0.18.0
  • Run limactl start --vm-type=vz --cpus=4 --memory=32 --disk=100 --name=vm1
  • Run limactl start --vm-type=vz --cpus=4 --memory=32 --disk=100 --name=vm2
  • For each of the VMs, run cp -a /Users/ec2-user/some-large-directory ~.
    Some of them may fail with cp: ...: Read-only filesystem

Filesystems:

% mount
/dev/disk5s2s1 on / (apfs, sealed, local, read-only, journaled)
devfs on /dev (devfs, local, nobrowse)
/dev/disk5s5 on /System/Volumes/VM (apfs, local, noexec, journaled, noatime, nobrowse)
/dev/disk5s3 on /System/Volumes/Preboot (apfs, local, journaled, nobrowse)
/dev/disk1s2 on /System/Volumes/xarts (apfs, local, noexec, journaled, noatime, nobrowse)
/dev/disk1s1 on /System/Volumes/iSCPreboot (apfs, local, journaled, nobrowse)
/dev/disk1s3 on /System/Volumes/Hardware (apfs, local, journaled, nobrowse)
/dev/disk5s1 on /System/Volumes/Data (apfs, local, journaled, nobrowse)
map auto_home on /System/Volumes/Data/home (autofs, automounted, nobrowse)
/dev/disk3s4 on /private/tmp/tmp-mount-mDoJ7V (apfs, local, journaled, nobrowse)

% stat -f %Sd / 
disk5s1

% stat -f %Sd /Users/ec2-user/.lima         
disk5s1

The VM disk is located in the default path ~/.lima.

@AkihiroSuda
Member

Tried to remove the balloon, but the filesystem still seems to break intermittently

[ 1674.027587] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35601: comm apport: iget: checksum invalid
[ 1674.030317] Aborting journal on device vda1-8.
[ 1674.031818] EXT4-fs error (device vda1): ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted journal
[ 1674.031896] EXT4-fs error (device vda1): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[ 1674.033116] EXT4-fs (vda1): Remounting filesystem read-only
[ 1674.033147] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35602: comm apport: iget: checksum invalid
[ 1674.036501] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35603: comm apport: iget: checksum invalid
[ 1674.037738] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35604: comm apport: iget: checksum invalid
[ 1674.038828] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35605: comm apport: iget: checksum invalid
[ 1674.040034] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35606: comm apport: iget: checksum invalid
[ 1674.041091] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35606: comm apport: iget: checksum invalid
[ 1674.042199] EXT4-fs error (device vda1): ext4_lookup:1851: inode #35606: comm apport: iget: checksum invalid

@AkihiroSuda changed the title from "Apparent disk corruption with almalinux9" to "Apparent disk corruption with vz" on Oct 30, 2023
@AkihiroSuda added the "help wanted" (extra attention is needed) label on Oct 30, 2023
@EdwardMoyse
Author

Thanks for the input. I've been testing the disk itself, and it has yet to report errors. Given your successful test in /tmp, these both seem to point to a problem using a non-OS volume for the underlying VM OS storage?

I might be misunderstanding you, but I don't think I am using a "non-OS volume for the underlying VM OS storage".

For clarity, here is my setup:

  • I create a VM using the standard limactl start almalinux9.yaml --name=alma9, and the VM exists on the main macOS volume.
  • I create a separate APFS (Case-sensitive) Volume, and make it mountable from within the VM:
    - location: /Volumes/Lima
      writable: true
  • If I compile our software in /Volumes/Lima I get disk corruption; if I use /tmp for the same operation, it works fine.

So I would characterise this as rather a problem with using a non-OS volume for intensive disk operations from within the VM.

@wdormann

I'll admit I'm not familiar with Lima.
When you say "make it mountable from within the VM", what does that mean?

  • You have a virtual hard disk file that lives on that separate APFS volume, and your VM is configured to have that as a second disk drive?
  • You boot the VM, and somehow from Linux user/kernel land mount your /Volumes/Lima directory? (How?)
  • Something else?

Perhaps Lima does this all for you under the hood, but I suppose that I'd need to know exactly what it's doing to have any hope of understanding what's going on.

@EdwardMoyse
Author

I'll admit I'm not familiar with Lima. When you say "make it mountable from within the VM", what does that mean?

  • You have a virtual hard disk file that lives on that separate APFS volume, and your VM is configured to have that as a second disk drive?
  • You boot the VM, and somehow from Linux user/kernel land mount your /Volumes/Lima directory? (How?)

It's the latter (but I cannot tell you the technicalities of how it works). From within both the host and the VM I can access /Volumes/Lima. See https://lima-vm.io/docs/config/mount/

@wdormann

Do you specify a mount type in your limactl command line and/or config file?
Or, from the VM, what does the mount command report for the filesystem in question?

@EdwardMoyse
Author

EdwardMoyse commented Nov 15, 2023

My apologies for the delay in replying, but I have been looking into this. The workflow is the same - compile https://gitlab.cern.ch/atlas/atlasexternals using the attached template with various configurations of host, qemu/vz, cores and memory.

TL;DR: updating to kernel 6.5.10-1 was more stable on the M2 (even on the 'shared' volume /tmp/lima), but apparently worse on the M1 Pro (though the M1 Pro has more cores and we pushed it a lot harder). Updating to 6.6.1 was better on the M1 Pro (I have not tested the M2 yet), but I got XFS corruption at the very end.

With 6.6.1 I also disabled sleeping on the guest:

sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

(from hint here)

| VM Type | Kernel   | Cores | RAM (GB) | Where              | Attempt 1     | Attempt 2   | Attempt 3   | Host Processor |
|---------|----------|-------|----------|--------------------|---------------|-------------|-------------|----------------|
| qemu    | 5.14     | 6     | 24       | /tmp               | Crash + xfs   | Crash + xfs | Crash + xfs | M1 Pro         |
| vz      | 5.14     | 6     | 24       | /Volumes/Lima      | Crash + xfs   |             |             | M1 Pro         |
| vz      | 5.14     | 6     | 24       | /tmp               | OK            |             |             | M1 Pro         |
| qemu    | 5.6.10.1 | 6     | 24       | /tmp               | OK (but slow) |             |             | M1 Pro         |
| vz      | 5.6.10.1 | 6     | 24       | /Volumes/Lima      | Crash + xfs   |             |             | M1 Pro         |
| vz      | 5.6.10.1 | 6     | 24       | /tmp               | Crash a       | Crash b     |             | M1 Pro         |
| vz      | 6.6.1    | 6     | 24       | /tmp               | xfs           |             |             | M1 Pro         |
| vz      | 6.6.2-1  | 4     | 12       | /home/emoyse.linux | xfs           |             |             | M1 Pro         |

Notes:

  • xfs means xfs corruption was reported.
  • Once xfs corruption has occurred, I trash the VM and restart
  • Often a crash is preceded in dmesg by e.g. "hrtimer: interrupt took 32332585ns"
  • crash a: in /var/log/messages I see:
[  978.306216] BUG: Bad rss-counter state mm:0000000076c5940f type:MM_FILEPAGES val:402
[  978.306776] BUG: Bad rss-counter state mm:0000000076c5940f type:MM_ANONPAGES val:206
[  978.307142] BUG: non-zero pgtables_bytes on freeing mm: 69632
[  +0.011695] BUG: Bad rss-counter state mm:0000000076c5940f type:MM_FILEPAGES val:402
  • crash b I see:
Nov 7 16:44:19 lima-myalma92 kernel: BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 2164s!
Nov 7 16:44:19 lima-myalma92 kernel: Showing busy workqueues and worker pools:
Nov 7 16:44:19 lima-myalma92 kernel: workqueue events: flags=0x0
Nov 7 16:44:19 lima-myalma92 kernel: pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Nov 7 16:44:19 lima-myalma92 kernel: pending: drm_fb_helper_damage_work [drm_kms_helper]
Nov 7 16:44:19 lima-myalma92 kernel: workqueue mm_percpu_wq: flags=0x8
Nov 7 16:44:19 lima-myalma92 kernel: pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Nov 7 16:44:19 lima-myalma92 kernel: pending: vmstat_update
  • for the last run with 6.6.1 it all completed fine and looked great, but then I got:
[emoyse@lima-alma9661c6 tmp]$ ls
bash: /usr/bin/ls: Input/output error

And in the display I see:
(screenshot of the VM display attached)

@wdormann

FWIW, I've added some test results and comments here: utmapp/UTM#4840 (comment)

I've not ruled out that there is some issue with the macOS filesystem/hypervisor layer, but I've only seen corruption with a Linux VM, and not with macOS or Windows doing the exact same thing from the exact same VM disk backing. What is interesting to me is that if I take the exact same disk and reformat it as APFS instead of ExFAT, Linux 6.5.6 or 6.4.15 will not experience disk corruption. My theory is that given an unfortunate combination of speed/latency/something-else for the disk backing, a Linux VM might experience disk corruption.

@AkihiroSuda
Member

My theory is that given an unfortunate combination of speed/latency/something-else for disk backing, a Linux VM might experience disk corruption.

Could you submit your insight to Apple? Probably via https://www.apple.com/feedback/macos.html

@wdormann

wdormann commented Nov 17, 2023

I have, just to hedge my bets.
However, if Windows, macOS, and (I just recently tested) FreeBSD all work flawlessly under the exact same workload, using the same host disk backing, and only Linux has a problem, I'd say that this is a Linux problem, not an Apple one.
(screenshot, 2023-11-18 08:00)

@afbjorklund
Member

afbjorklund commented Nov 19, 2023

I can trigger filesystem corruption if my external disk is formatted with ExFAT

Oh, so that might be why it is mostly affecting external disks? Did people forget to (re-)format them before using them?

EDIT: no, not so simple

"I create a separate APFS (Case-sensitive) Volume,"

@EdwardMoyse
Author

I can trigger filesystem corruption if my external disk is formatted with ExFAT

Oh, so that might be why it is mostly affecting external disks? Did people forget to (re-)format them before using them?

EDIT: no, not so simple

"I create a separate APFS (Case-sensitive) Volume,"

And for me, I'm not using disks external to the VM any more - if you look at the table I posted here, you will see that in the "Where" column I'm mostly working in /tmp, i.e. completely inside the VM. Using an external disk might provoke the corruption earlier, but it's certainly not the only route to it (though later kernels seem quite a bit more stable).

@hasan4791
Contributor

hasan4791 commented Nov 20, 2023

In my case it occurs with the internal disk, and very frequently with Fedora images. Just create a Fedora VM and do dnf update; corruption happens immediately.
btrfs scrub start /

EDIT: vz in my case

@wdormann

Using an external disk might provoke the corruption earlier, but it's certainly not the only route to it (though later kernels seem quite a bit more stable).

I don't recall if I mentioned it here, but through eliminating variables I was able to pinpoint a configuration that is likely to corrupt older Linux kernels: having the VM hosted on an ExFAT-formatted partition (which just happens to be on an external disk for me). Based on how macOS/APFS works, I don't think it's even possible for me to test how ExFAT might perform on my internal disk - at least not without major reconfiguration of my system drive.

If others are able to reproduce the disk corruption without relying on ExFAT at the host level, that at least helps eliminate the ExFAT-layer possibility of where the problem lies. At least for me, I've been able to avoid the problem by reformatting my external disk to APFS, as that seems to tweak at least one of the required variables to see this bug happen. At least if the Linux kernel version is new enough.

At a conceptual level, it is indeed possible that Linux is doing nothing wrong at all. In other words, Linux may just happen to be unlucky enough to express the disk usage patterns that trigger a bug which presents as a corrupted (BTRFS in my case) file system. But I suspect that being able to positively distinguish between a rarely-seen Linux data-corruption bug and a bug at the macOS hypervisor/storage level is probably beyond my skill set.

@wdormann

OK, just to throw a wrench into the works: I did notice my FreeBSD VM eventually experiencing disk corruption, but only after about a day or so of running the stress test, as opposed to the minute or two that it takes for Linux to corrupt itself.
(screenshot of the FreeBSD VM reporting disk errors, 2023-11-20)

The same VM clone but running from an APFS filesystem seems fine:
(screenshot of the same VM clone running from APFS with no errors, 2023-11-20)

@mbentley

mbentley commented Nov 21, 2023

So it seems like there are a lot of references to issues related to external disks and non-APFS filesystems. I am using the internal disk on my M2 mini with the default APFS filesystem, and I've experienced disk corruption once. I haven't specifically been able to reproduce it (to be honest, I haven't tried very hard), but I did want to point out that external disks and other filesystems may not be the specific cause - they may just make the corruption easier to trigger than internal APFS does.

I run Debian Bookworm, and after repairing the filesystem with fsck I also upgraded my kernel from linux-image-cloud-arm64 6.1.55-1 to 6.5.3-1~bpo12+1 from backports.

@afbjorklund
Member

The above table also lists corruption when running with qemu/hvf, so it might not even be unique to vz...

@EdwardMoyse
Author

It is not unique to vz, and it is not unique to external disks.

With Almalinux 9.2 + kernel 6.6.2-1 I just got corruption with sudo yum update -y

:-(

@EdwardMoyse changed the title from "Apparent disk corruption with vz" to "VM disk corruption" on Nov 22, 2023
@EdwardMoyse
Author

Okay, I updated the title and the original comment to hopefully clarify that this is a problem with every conceivable permutation of lima.

Unfortunately for me Lima is completely unusable at the moment, so for now I'm giving up.

@wpiekutowski

I can reproduce this with two methods: stress-ng --iomix 4 (for filesystems with data checksums), and parallel cp of big files followed by sha256sum *. Details: utmapp/UTM#4840 (comment)

Are you able to reproduce this as well?
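For reference, here is a rough, self-contained Go sketch of the second method (this is not @wpiekutowski's actual script; the file count and sizes are arbitrary). It writes several large files in parallel with reproducible pseudo-random content, then re-reads them and verifies their SHA-256 sums, which is roughly what parallel cp plus sha256sum * exercises:

package main

import (
    "crypto/sha256"
    "fmt"
    "io"
    "math/rand"
    "os"
    "sync"
)

const (
    numFiles = 8       // number of files written in parallel (arbitrary)
    fileSize = 1 << 30 // 1 GiB each; adjust to the free space in the guest
    bufSize  = 1 << 20 // write in 1 MiB chunks
)

// writeFile fills path with deterministic pseudo-random data and returns its SHA-256.
func writeFile(path string, seed int64) (string, error) {
    f, err := os.Create(path)
    if err != nil {
        return "", err
    }
    defer f.Close()
    h := sha256.New()
    r := rand.New(rand.NewSource(seed))
    buf := make([]byte, bufSize)
    for written := 0; written < fileSize; written += bufSize {
        r.Read(buf) // always fills buf
        h.Write(buf)
        if _, err := f.Write(buf); err != nil {
            return "", err
        }
    }
    if err := f.Sync(); err != nil {
        return "", err
    }
    return fmt.Sprintf("%x", h.Sum(nil)), nil
}

// hashFile re-reads path from disk and returns its SHA-256.
func hashFile(path string) (string, error) {
    f, err := os.Open(path)
    if err != nil {
        return "", err
    }
    defer f.Close()
    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        return "", err
    }
    return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < numFiles; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            path := fmt.Sprintf("big-%d.bin", i)
            want, err := writeFile(path, int64(i))
            if err != nil {
                fmt.Println(path, "write error:", err) // I/O errors here match the reported symptom
                return
            }
            got, err := hashFile(path)
            switch {
            case err != nil:
                fmt.Println(path, "read error:", err)
            case got != want:
                fmt.Println(path, "CHECKSUM MISMATCH")
            default:
                fmt.Println(path, "OK")
            }
        }(i)
    }
    wg.Wait()
}

Note that the immediate re-read is likely to be served from the guest page cache, so re-running the read pass after dropping caches (or after a reboot) exercises the on-disk data more directly.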

@afbjorklund
Member

Okay, I updated the title and the original comment to hopefully clarify that this is a problem with every conceivable permutation of lima.

It still seems to be unique to one operating system and one hardware architecture, though? Maybe even Apple's issue.

@EdwardMoyse
Author

Okay, I updated the title and the original comment to hopefully clarify that this is a problem with every conceivable permutation of lima.

It still seems to be unique to one operating system and one hardware architecture, though? Maybe even Apple's issue.

Sorry, yes. I was being very single-minded in my statement above! I will rephrase the title.

@EdwardMoyse changed the title from "VM disk corruption" to "VM disk corruption with Apple Silicon" on Nov 22, 2023
@AkihiroSuda
Member

AkihiroSuda commented Nov 22, 2023

The above table also lists corruption when running with qemu/hvf, so it might not even be unique to vz...

This issue might be worth reporting to https://gitlab.com/qemu-project/qemu/-/issues too, if the issue is reproducible with bare QEMU (without using Lima)

@wdormann

At the risk of further fragmenting the discussion of this issue, but with the potential benefit of getting the right eyeballs on it, I've filed: https://gitlab.com/qemu-project/qemu/-/issues/1997

(i.e., yes this can be reproduced with QEMU, as opposed to the Apple Hypervisor Framework)

@AkihiroSuda
Member

This may fix the issue for vz:

(Thanks to @wpiekutowski utmapp/UTM#4840 (comment) and @wdormann utmapp/UTM#4840 (comment))
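For context, the direction suggested in those comments (and, as far as I understand, what the later v0.19 change for --vm-type=vz does) is to attach the disk image with an explicit caching/synchronization mode instead of the defaults. Below is a rough sketch of that idea against the Code-Hex/vz Go bindings; the constructor name and constants are my recollection rather than a verbatim copy of the actual patch, and attachRootDisk/diskPath are hypothetical names:

// Sketch only; assumes: import vz "github.com/Code-Hex/vz/v3"
// attachRootDisk builds a virtio-blk device for diskPath with fsync-level
// synchronization and installs it on vmConfig, so that guest flushes reach the host disk.
func attachRootDisk(vmConfig *vz.VirtualMachineConfiguration, diskPath string) error {
    attachment, err := vz.NewDiskImageStorageDeviceAttachmentWithCacheAndSync(
        diskPath,
        false, // not read-only
        vz.DiskImageCachingModeAutomatic,
        vz.DiskImageSynchronizationModeFsync, // flush guest writes with fsync instead of leaving them cached
    )
    if err != nil {
        return err
    }
    blockDev, err := vz.NewVirtioBlockDeviceConfiguration(attachment)
    if err != nil {
        return err
    }
    vmConfig.SetStorageDevicesVirtualMachineConfiguration([]vz.StorageDeviceConfiguration{blockDev})
    return nil
}

Trading some write throughput for fsync-level durability would also be consistent with the observation below that this is more of a workaround than a complete fix.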

@EdwardMoyse
Author

Oh wow - I've run my test twice with the patched version of lima and no corruption or crashes! From reading the ticket, it's more a workaround than a complete fix, but I'll happily take it! Thanks @AkihiroSuda
