Hello everyone,

In this issue, I would like to share my experience writing a volume plugin for the ZFS filesystem for SMAPIv3. In this implementation, the plugin represents Storage Repositories (SRs) as ZFS pools and Volumes as ZFS volumes. A ZFS volume is a dataset that represents a block device and is created in the context of a filesystem. Once created, a volume can be accessed as a raw block device, e.g., `/dev/zvol/pepe`. ZFS supports operations over volumes like snapshot(), clone() or promote(). This simplifies the driver, but in some cases the actual implementation becomes a bit tricky because XAPI performs some operations in an order that cannot be reproduced in a ZFS filesystem. I thought this PoC could be interesting to better understand why those tasks are tricky. This is a summary and there are still many things that I am not sure about.
First of all, the `xe sr-create` command ends up invoking `zpool create [name] [devices]` to create a pool. The only required parameter is the list of block devices on which the ZFS filesystem will be installed. The `xe vdi-create` command ends up invoking `zfs create -V [size] [pool]/[name]`. This creates a new volume in the pool, which can be accessed as a raw block device at `/dev/zvol/[pool]/[name]`.
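As a minimal sketch, the underlying commands would look like this (the pool name `tank`, the device `/dev/sdb`, and the volume name `vol1` are placeholders, not names used by the PoC):

```sh
# Create the pool that backs the SR on a block device
zpool create tank /dev/sdb

# Create a 10 GiB volume (zvol) in the pool for the new VDI
zfs create -V 10G tank/vol1

# The volume is now accessible as a raw block device
ls -l /dev/zvol/tank/vol1
```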
In ZFS, snapshots are taken with the `zfs snapshot [volume]@[name]` command. When a new snapshot is created, the new volume.id is used as the name for the snapshot. Snapshots are read-only volumes. To access a snapshot, we have to either mount it or create a clone from it; otherwise, the snapshot is not accessible. The current PoC therefore always creates a clone when a snapshot is taken. The clone has the same name as the snapshot but lives in the pool where it is created. You can see this in the following output: when the snapshot `@2` is created, the cloned volume `2` is created too.
```
$ zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
hola       10,3G  8,58G  48,5K  /hola
hola/1     10,3G  18,9G    12K  -
hola/1@2      0B      -    12K  -
hola/2        0B      -    12K  -
```
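A hedged sketch of what the snapshot path does, using the `hola` pool from the output above (the exact invocation in the PoC may differ):

```sh
# Take a read-only snapshot of the volume, named after the new volume.id
zfs snapshot hola/1@2

# Immediately clone the snapshot so its content is reachable
# as a block device at /dev/zvol/hola/2
zfs clone hola/1@2 hola/2
```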
This cloned volume is important when issuing `xe vm-copy` (i.e., create-vm-from-snapshot), since the command requires accessing the volume to copy its content into the new VM's VDI. This is not possible if the volume is a snapshot.
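To illustrate the difference (hypothetical paths, reusing the `hola` example; with ZFS's default `snapdev=hidden` setting, snapshots do not get a device node):

```sh
# There is no device node for the snapshot itself:
#   /dev/zvol/hola/1@2   (does not exist by default)

# The clone, however, can be read like any other zvol:
dd if=/dev/zvol/hola/2 of=/tmp/vdi-copy.img bs=1M
```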
Another example is the command `xe snapshot-revert`, which reverts the state of a VM from a snapshot. Its first step tries to destroy the current VDI. However, this is not possible because the current VDI has children, e.g., the snapshot. The correct way to do it is to clone directly from the snapshot, promote the new volume, and only then destroy the main VDI. The current PoC only accepts snapshots in the clone method, and it works around the ordering problem by destroying the parent VDI just after the new volume is promoted. The current implementation of `xe snapshot-revert` is as follows (see the sketch after the list):
1. Destroy the parent VDI, but fail to remove it from the db and from zfs
2. Clone from the snapshot
3. A volume is created from the snapshot
4. The volume is promoted
5. The parent volume is destroyed
6. The children of the main volume are promoted to the clone
Note that this works only when reverting from the latest snapshot. Otherwise, the main volume cannot be destroyed because there are still newer snapshots that cannot be promoted to the new clone.
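A hedged sketch of the ZFS side of this sequence (pool and volume names are placeholders; `hola/1` is the VDI being reverted and `hola/1@2` the snapshot):

```sh
# Clone the snapshot into a new volume
zfs clone hola/1@2 hola/3

# Promote the clone: the snapshot now belongs to hola/3,
# so hola/1 no longer has dependents that block its destruction
zfs promote hola/3

# The old parent volume can now be destroyed
zfs destroy hola/1
```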
The current implementation of volume destroy relies on the `zfs destroy` command. The method checks whether the volume is a snapshot or a regular volume. If it is a snapshot, the method first builds the correct path to the snapshot and then destroys it. The method also checks if there is a clone with the same name and destroys it; this removes the clone that the snapshot operation creates. Note that when trying to destroy a volume with children, `zfs destroy` fails but `xe vdi-destroy` succeeds.
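A sketch of the destroy path for a snapshot, reusing the `hola` names (the PoC's exact path-building logic may differ; note that in plain ZFS the clone must be destroyed before the snapshot it depends on, unless `zfs destroy -R` is used):

```sh
# The clone depends on the snapshot, so it goes first
zfs destroy hola/2

# Then the snapshot itself can be destroyed
zfs destroy hola/1@2
```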
This is an overall summary of the current PoC. I may be missing some chunks. I may release a design document soon that explains all the implementation details and decisions.
The code here was merged into xapi-project:xen-api
We can set up a session this week, probably with @edwintorok, to speak about SMAPIv3 and future development.