Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAS-130126 / 24.10 / Wipe unused boot-pool disks #89

Merged
merged 1 commit into from
Jul 23, 2024

Conversation

themylogin
Copy link
Contributor

No description provided.

@themylogin themylogin requested a review from a team July 19, 2024 13:20
@bugclerk
Copy link

@bugclerk bugclerk changed the title Wipe unused boot-pool disks NAS-130126 / 24.10 / Wipe unused boot-pool disks Jul 19, 2024
@themylogin
Copy link
Contributor Author

time 1:00

@bugclerk
Copy link

Copy link
Contributor

@yocalebo yocalebo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been working on this for a few days now. This won't fix the issue. The only way to reliably remove the zpool labels is to run zpool labelclear -f /dev/disk<p1/2/3>

@@ -42,14 +46,18 @@ async def install(disks, set_pmbr, authentication, post_install, sql, callback):
raise InstallError(f"Command {' '.join(e.cmd)} failed:\n{e.stderr.rstrip()}")


async def format_disk(device, set_pmbr, callback):
async def wipe_disk(device, callback):
if (result := await run(["wipefs", "-a", device], check=False)).returncode != 0:
callback(0, f"Warning: unable to wipe partition table for {device}: {result.stderr.rstrip()}")

# Erase both typical metadata area.
await run(["sgdisk", "-Z", device], check=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, the double call to sgdisk -Z is completely unnecessary since wipefs -a does everything we need it to.

@themylogin
Copy link
Contributor Author

themylogin commented Jul 19, 2024

@yocalebo IIRC we agreed on this general idea some time ago. They only thing that needs to be fixed is the wipe_disk function definition.

@yocalebo
Copy link
Contributor

@yocalebo IIRC we agreed on this general idea some time ago. They only thing that needs to be fixed is the wipe_disk function definition.

Sorry, so what are we trying to do here? You want to wipe any disk that has a boot-pool label on them so truenas boots up properly after install, right?

@themylogin
Copy link
Contributor Author

@yocalebo I am offering a UI to do it. I worked under assumption that the way we wipe the disk is correct. If it is not, I can either make changes here (wipe each partition individually?) or rebase on top of your code that fixes the issue.

@yocalebo
Copy link
Contributor

I see, so it just so happens that you're working on same area where there is a bug. The function we have right now wipes partition tables but it doesn't touch any filesystem labels. The issue that was found be QE recently is kind of annoying. They fresh install 24.04.2 and choose to install using no SWAP. They then fresh install using swap. Because of where zfs puts the zpool label information, the swap partition now has a boot-pool label inside the swap partition. Only way to clean that is to do zpool labelclear -f /dev/disk<swap-partition>.

admin@qe-realmini[~]$ sudo parted /dev/sde unit s p
Model: ATA Samsung SSD 850 (scsi)
Disk /dev/sde: 234441648s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start      End         Size        File system  Name  Flags
 1      4096s      6143s       2048s                          bios_grub, legacy_boot
 2      6144s      1054719s    1048576s    fat32              boot, esp
 4      1054720s   34609151s   33554432s   zfs                swap
 3      34609152s  234441614s  199832463s  zfs

Notice the zfs name inside the swap partition. Even if you do wipefs -a and sgdisk -Z it won't touch the zpool label offsets inside that partition table. Only way I've found to reliably wipe the disk from those labels is to run zpool labelclear -f /dev/disk<blah. You can actually run zpool labelclear -f /dev/disk on the whole disk, but I started to test that a bit more thoroughly and then I got pulled off onto another issue.

My branch doesn't have many changes. It simply removes the sgdisk -Z calls (literally no reason to do this). And then i added a zpool labelclear -f /dev/disk call. This is easily tested on a live system btw. To test the zpool labels are cleared you can run zdb -l /dev/disk

@yocalebo
Copy link
Contributor

Btw, I did some historical investigation and found a commit you made here: https://ixsystems.atlassian.net/browse/NAS-108809 but this has never worked. wipefs -a -t zfs_member doesn't work 😄

@themylogin themylogin force-pushed the installer-boot-pool-erase branch 3 times, most recently from 7e359ad to 1a1f02b Compare July 19, 2024 16:20
@themylogin
Copy link
Contributor Author

@yocalebo I tested it and it works

@themylogin
Copy link
Contributor Author

time 1:00

with installation_lock:
try:
if not os.path.exists("/etc/hostid"):
await run(["zgenhostid"])

for disk in disks:
for disk in destination_disks:
Copy link
Contributor

@yocalebo yocalebo Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've changed it so that you're iterating destination_disks but you're passing the disks list object into format_disks. This is sloppy. The same goes for for disk in wipe_disks. If you kept those there for logging purposes, then we should moving the logging into format_disk and wipe_disk. Also, because we're now dealing with multiple disks, we should rename the functions to wipe_disks and format_disks accordingly.

@themylogin themylogin force-pushed the installer-boot-pool-erase branch 2 times, most recently from cac2ccb to 27753cf Compare July 22, 2024 13:55
@themylogin themylogin requested a review from yocalebo July 22, 2024 17:20
@themylogin themylogin force-pushed the installer-boot-pool-erase branch from 27753cf to 587bc0b Compare July 23, 2024 07:39
@themylogin themylogin merged commit 1301af8 into master Jul 23, 2024
2 checks passed
@themylogin themylogin deleted the installer-boot-pool-erase branch July 23, 2024 07:40
@bugclerk
Copy link

This PR has been merged and conversations have been locked.
If you would like to discuss more about this issue please use our forums or raise a Jira ticket.

@truenas truenas locked as resolved and limited conversation to collaborators Jul 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants