-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Add workarounds to prevent filesystem corruption on Apple Virtualization VMs. #5919
Conversation
This successfully workarounds the file corruption problem. I didn't notice any new issues so far with NVMe. |
Updated to include setting the caching mode to .cached for macOS 12 hosts. Now, there is uncertainty about whether to continue using nvme as the default for creating new VMs, as |
I'd say for Linux, we use Edit: ⬆️ was about doing all these behind the scenes or having good defaults for new users. Otherwise if NVMe isn't enabled by default for macOS >= 14 and caching for older macOS, then people will still experience corruption and will have to live with it (and blame Linux) or research what to do. In both cases they will waste their lives on dealing with this. My reasoning is that it seems that BTW It might be good idea to let user decide which mode is the best for their use case, but that would have to be an advanced setting with some explanations. The same applies to synchronization modes. But I think that should be first discussed with @osy as UI changes are needed. Looking at QEMU support, where one can adjust almost everything on a very low level, I'd say this is something that would be consistent with the general look and feel of UTM. |
I see that NVMe defaults to |
Defaulting to NVMe isn't great because it might render some VMs unbootable. It might be a good idea to figure out how to add more info about NVMe and why it was introduced and pros/cons. |
This PR does not change the options of existing disks. Instead, when a user creates a new VM, the new disk created with this VM will be set as an NVMe drive on macOS 14+ hosts by the following lines in the wizard: UTM/Platform/Shared/VMWizardState.swift Lines 256 to 259 in 6c02a1a
This is what I meant by "default". Not magically using NVMe for every disk, but to decide what to use for newly created VMs. |
Thanks, this makes sense. |
I did some more testing and for some workloads, like unit tests, I see about 15-20% slowdown with NVMe drive which is getting significant for my use case. I'm going to run my VMs in virtio mode with host caching. |
Now I'm thinking it again on using nvme as default for new VMs. @wpiekutowski did you experience any issues with forced cached mode on virtio yet? If the virtio is good enough then probably we can default to that for better performance. |
I haven't used UTM much last week, so can't say one way or another. |
When I tested with Fedora, it was using the upstream Fedora kernel (the 6.5 branch kernel) and did not exhibit this issue. It now appears that macOS 14.2 might have modified the NVMe device. Experiencing a straightforward NVMe I/O error is something we haven't encountered before. I'm re-testing in the same environment using stress-ng to see if I can reproduce it. It's possible that the NVMe disk also requires the caching mode to be set to |
Strange, my environment used with pre-14.2 with a fast 10-minute stress-ng cannot reproduce this. @wangqiang1588 does this always occur or does it happens randomly? |
I Install debian 12.4.0 twice , it happens twice.
always fails on "repo sync" command then i change it to caching + viritio ,same steps , success same steps ubuntu 22.04.3 linux kernel 5.15 nvme disk never failure happens (same aosp zip file) looks like some nvme driver changed after 5.15 kernel make this failure |
On NixOS unstable channel + Linux kernel 6.6.7 using uncached NVMe all my tests are passing. No disk corruption during |
maybe the number of files make this bug, "repo sync" make a lot of files at very short time (.h, .c ,.cpp,.java ...) |
Fedora 39 Test ,not found this bug, looks like only happen on linux kernel 6.1 , let me try to upgrade debian's kernel |
today upgrade macOS Sonoma 14.2 to 14.2.1 every thing works again . linux kernel 6.1 linux kernel 6.5 both works fine on debian 12.4 maybe this is a bug of macOS Sonoma 14.2 . |
Can confirm tart works perfectly for me (see reference to cirruslabs/tart above). No disk corruption after two days. Tart uses NVME disks. Host: Sonoma 14.2.1. Guest: Ubuntu 24.04 (Noble Numbat). Kernel 6.6.7. |
@gnattu Is this ready to be merged? Can you rebase from the main branch? |
This also helps prevent filesystem corruption on Apple Virtualization and works for pre-Sonoma hosts.
@osy I just rebased from main. One change I've made is that I reverted the nvme as default and prefers virtio with cached mode because there are new reports that the nvme mode is still causing errors in some cases where the virtio + cached mode works just fine. |
🎉 can’t wait. |
Any news on when this might get merged and released? |
An early beta release with this fix is much appreciated as some other projects did. Otherwise it is practically not possible to use a stable linux vm with apple virtualization, no? |
For reference this seems to be solved here some time ago lima-vm/lima#2026 And there seems to be no issues about corruption after that. |
@osy I apologize for pinging, do you think this could be merged? UTM experience is just so buggy without this. |
This will be included in the next release. If you need this now, you can follow the build instructions along with the prebuilt dependencies. Having multiple people ask about a release isn't going to speed it up. If you want to help maintain this project please let me know. |
I'm already using a version built by myself with the patch, but I need to keep it up-to-date and it isn't signed. As I said, I apologize for pinging, I just wanted to make sure that this didn't accidentaly slip your mind. I understand the effort an open source maintainer needs to put into his projects, so, thank you! No claim or demand on my part. If I can help with testing this (and reporting my test findings) I'll do it. But I don't think I have the bandwidth to become a maintainer for this project. |
I guess this is a fairly helpfull issue. People put info, solution and test. I dont think there is a room for apologize since this is a corruption issue, there is a fix and there are sincere people trying to be helpful. Since this is 4 month old fix and it is we people tend to forget among other things people are just trying to land a kindly reminder. |
This is a workaround for #4840. Apple introduced a new
VZNVMExpressControllerDeviceConfiguration
in macOS 14 Sonoma's Virtualization Framework, emulating an NVMe disk for Linux guests. This emulated disk is slower than the virtio disk from my initial testing, but it does not have the filesystem corruption issue present on the default virtio interface in most Linux kernels. macOS guests cannot use this device configuration, but they do not experience the filesystem corruption issue in the first place. On macOS 12+ hosts that cannot useVZNVMExpressControllerDeviceConfiguration
, the cache mode of virtio drives in Linux VMs are changed to.cached
because it is verified to also fix the filesystem corruption: #4840 (comment)This PR:
.cached