Libvirtd - Linux virtualisation

September 7, 2022 15:32

VIRTUAL MACHINE

VM - Bind to a VLAN-tagged interface

  1. Create the tagged interface(s) (here 42 and 44 on eno1 as an example) in the host's network configuration /etc/network/interfaces - in this example we'll only dhcp on the untagged/parent interface (you could dhcp on them all, but this will confuse your dhcp server, as the same mac is used everywhere):
    allow-hotplug eno1
    iface eno1 inet dhcp
    
    auto eno1.42
    iface eno1.42 inet manual
    
    auto eno1.44
    iface eno1.44 inet manual
    
  2. Reconnect your virt-manager and the new virtual interface should be available to bridge into.
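Depending on how you attach it in virt-manager, the result in the domain XML is often a macvtap attachment directly onto the tagged interface - a sketch (device and model are just examples):

<interface type='direct'>
<source dev='eno1.42' mode='bridge'/>
<model type='virtio'/>
</interface>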

VM - Allow multicast packets

Multicast packets are generated e.g. by the avahi daemon or minidlna and are necessary to use the avahi zeroconf service (needed for media streaming etc.).

  1. Show all running vms: sudo virsh list
  2. Edit the xml file of the machine that should be allowed to send out these packets: sudo virsh edit X
  3. Go down to the network interface that should be allowed to do so (e.g. NOT the [LOCALSTORAGENET]) and change the following code
    <devices>
    ...
    <interface type='XXX'>
    ...
    </interface>
    ...
    </devices>
    
    to
    <devices>
    ...
    <interface type='XXX' trustGuestRxFilters='yes'>
    ...
    </interface>
    ...
    </devices>
    
  4. Cold-boot the vm.

VM - Enable TRIM to save image space

This frees the unused space inside a vm on disk as well. It may accelerate fragmentation, but on zfs it will take quite some time until that happens. Steps 1-2 are only needed for vms created before qemu v4.0 (indicated by the supported machine types (qemu-system-x86_64 -machine help), which should list pc-q35-4.1)! With newer qemu versions you can use the virtio storage directly, as it then also supports discarding (reference).

  1. Change disk type to “SCSI” and set discard="unmap" (see the XML sketch after this list)
  2. Change the controller type to “VirtIO SCSI” (virtio-scsi)
  3. Enable the trim service inside the vm (on older versions of Debian first run sudo cp /usr/share/doc/util-linux/examples/fstrim.{service,timer} /etc/systemd/system):
    sudo systemctl enable fstrim.timer
    sudo systemctl start fstrim.timer
    sudo fstrim -av
    
    The last command just trims the ssd/disks for the first time.
  4. Cold-boot the vm.
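For orientation, the relevant parts of the domain XML after steps 1-2 look roughly like this (a sketch - the target device and image path are just examples):

<controller type='scsi' model='virtio-scsi'/>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' discard='unmap'/>
<source file='[DISK_IMAGE]'/>
<target dev='sda' bus='scsi'/>
</disk>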

When the fstrim command exits too fast, check with sudo lsblk -o MOUNTPOINT,DISC-MAX,FSTYPE for any 0B entries - in that case the disk does not support TRIM -> you have done something wrong. You may also want to check the actually used space of the images with du -h [DISK_IMAGE] (they should shrink during the first fstrim)…

Watchdog & Panic Notifier

Libvirt has a watchdog feature, which can e.g. reboot a vm on crash - unlike the “panic notifier” device, which just powers the vm down. How to set up the watchdog:

  1. Add the Watchdog device into the vm (see the XML sketch after this list)
  2. Inside the vm:
    sudo apt install watchdog
    sudo systemctl enable watchdog
    
  3. Enable the device inside the service config /etc/watchdog.conf:
    watchdog-device = /dev/watchdog
    realtime        = yes
    priority        = 1
    
  4. Cold-Boot the vm. If you ever wish to test the watchdog, you may crash the kernel with sync; echo c > /proc/sysrq-trigger as root!
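For reference, the watchdog device from step 1 ends up in the domain XML as something like this (a sketch - i6300esb with a reset action is a common choice):

<watchdog model='i6300esb' action='reset'/>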

VM - Install windows support here

SERVER

Setup

  1. sudo apt install libvirt-daemon-system libvirt-clients qemu-kvm qemu-utils ovmf
  2. sudo apt install dnsmasq

On Debian 11, libvirtd no longer requires firewalld and ebtables (or an explicit ovmf install).

Allow a user to control the kvm

  1. sudo adduser [USER] kvm
  2. sudo adduser [USER] libvirt
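After a re-login the user should be able to talk to the system instance without sudo, e.g.:

virsh -c qemu:///system list --all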

Enable automatic freezing of guests at host reboot

Install the required packages:

sudo apt-get install -y python3 python3-libvirt

These are the needed files & scripts:

  • Service file /etc/systemd/system/libvirtd-guard.service
    [Unit]
    Description=Save some libvirtd guests to disk or destroy them
    Wants=libvirtd.service
    Requires=virt-guest-shutdown.target
    # libvirt-guests.service is in "After", because during shutdown this order is reversed!
    After=network.target
    After=time-sync.target
    After=libvirtd.service
    After=virt-guest-shutdown.target
    After=libvirt-guests.service
    
    [Service]
    Type=oneshot
    TimeoutStopSec=0
    RemainAfterExit=yes
    StandardOutput=journal+console
    ExecStart=/usr/bin/python3 -u /root/libvirtd-guard.py --plymouth start
    ExecStop=/usr/bin/python3 -u /root/libvirtd-guard.py --plymouth stop
    
    [Install]
    WantedBy=multi-user.target
    
  • Main script to /root/libvirtd-guard.py
    #!/usr/bin/python3
    import sys, time, logging, datetime, argparse, subprocess
    import libvirt # apt install python3-libvirt
    import xml.etree.ElementTree as XML_ET
    logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s', level=logging.INFO)
    
    parser = argparse.ArgumentParser()
    parser.add_argument('action', help='Perform the tasks for the "start" or "stop" action')
    parser.add_argument('--dry-run', help='Do not really shutdown or save anyone', action='store_true')
    parser.add_argument('--plymouth', help='Enable plymouth splash screen integration (shows progress there during the action)', action='store_true')
    parser.add_argument('--timeout', help='How long should we wait for machines to shutdown on their own?', type=int, default=300)
    parser.add_argument('--reminder-interval', help='How long until the shutdown signal is repeated?', type=int, default=5)
    args = parser.parse_args()
    
    needToHideMessages = False
    showPlymouthWarning = True
    def log(msg):
        global showPlymouthWarning, needToHideMessages
        logging.info(msg)
        if args.plymouth:
            msg = msg.replace('"', '\\"')
            try:
                subprocess.run(['plymouth', 'display-message', '--text', msg], timeout=2).check_returncode()
                needToHideMessages = True
            except subprocess.CalledProcessError as e:
                if showPlymouthWarning:
                    logging.warning(f'Failed to display message on plymouth ({e.returncode})!')
                    showPlymouthWarning = False
            except subprocess.TimeoutExpired:
                logging.warning('Timeout while displaying message on plymouth!')
    
    try:
        if args.action == 'start':
            logging.info('I am just here for the after-party! Doing nothing.')
        elif args.action == 'stop':
            end = datetime.datetime.now() + datetime.timedelta(seconds=args.timeout)
            log(f'Every running virtual machine has now {args.timeout} seconds (until {end}) to shut itself down.')
    
            conn = libvirt.open('qemu:///system')
            if conn is None:
                raise RuntimeError('Failed to connect to libvirt!')
    
            domainIDs = conn.listDomainsID()
            if domainIDs is None:
                raise RuntimeError('Failed to retrieve list of domains!')
    
            # Determine if some VM has the LIBVIRTD_GUARD_SAVEME flag. If yes, this vm will be saved and not shut down!
            vms = []
            vmsToSave = []
            vmsToShutdown = []
            for domainID in domainIDs:
                domainHandle = conn.lookupByID(domainID)
                xmlRoot = XML_ET.fromstring(domainHandle.XMLDesc())
                xmlDesc = xmlRoot.find('description')
                save = xmlDesc is not None and xmlDesc.text is not None and 'LIBVIRTD_GUARD_SAVEME' in xmlDesc.text
                vms.append(domainHandle)
                if save:
                    vmsToSave.append(domainHandle)
                else:
                    vmsToShutdown.append(domainHandle)
    
            log(f'Following virtual machines are currently running: {", ".join([d.name() for d in vms if d.isActive()])}')
    
            for domain in vmsToShutdown:
                log(f'Sending shutdown signal to {domain.name()}...')
                try:
                    if not args.dry_run:
                        domain.shutdown()
                except Exception as e:
                    # In case the domain is already shutdown...
                    logging.exception(f'Failed to shutdown {domain.name()}!')
    
            for domain in vmsToSave:
                log(f'Creating a managed save state of {domain.name()}...')
                try:
                    if not args.dry_run:
                        domain.managedSave()
                except Exception as e:
                    # In case the domain is already... Saved?
                    logging.exception(f'Failed to save {domain.name()}!')
                    if not args.dry_run:
                        domain.shutdown()
    
            while datetime.datetime.now() < end:
                stillRunning = [d for d in vmsToShutdown if d.isActive()]
                if len(stillRunning) == 0:
                    break
                log(f'Following virtual machines are still running ({(end - datetime.datetime.now()).seconds + 1}s): {", ".join([d.name() for d in stillRunning])}')
                for domain in stillRunning:
                    if domain.isActive():
                        try:
                            if not args.dry_run:
                                domain.shutdown()
                        except Exception as e:
                            # In case the domain is already shutdown since our initial check...
                            logging.exception(f'Failed to shutdown {domain.name()}!')
                time.sleep(args.reminder_interval)
    
            for domain in vms:
                if domain.isActive():
                    log(f'Destroying {domain.name()}, because it failed to shutdown in time!')
                    try:
                        if not args.dry_run:
                            domain.destroy()
                    except Exception as e:
                        # In case the domain is shutdown on its own (close one)
                        logging.exception(f'Failed to destroy {domain.name()}!')
    
            if conn is not None:
                conn.close()
    
            log('Okay - all virtual machines destroyed.')
        else:
            logging.critical('Huh? Unknown action operator!')
            sys.exit(1)
    
    except Exception as e:
        logging.exception(f'Whoops, something went very wrong: {e}')
        sys.exit(2)
    except:
        logging.exception('Whoops, something went very wrong?!')
        sys.exit(3)
    
    if needToHideMessages:
        try:
            # Not using hide-message, as some screens just ignore that?!
            subprocess.run(['plymouth', 'display-message', '--text', ''], timeout=2)
        except subprocess.CalledProcessError as e:
            logging.warning(f'Error while hiding message on plymouth ({e.returncode})!')
        except subprocess.TimeoutExpired:
            logging.warning('Timeout while hiding message on plymouth!')
    

Install the libvirtd-guard service

  1. Add the two files from above
  2. Set permissions for the script: sudo chmod 500 /root/libvirtd-guard.py
  3. Enable the new service with sudo systemctl enable libvirtd-guard
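Before relying on it, you can exercise the stop logic by hand via the script's --dry-run flag (it only logs what it would do):

sudo python3 /root/libvirtd-guard.py --dry-run stop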

Use it!

Normally every virtual machine will simply be shut down with the system. But every virtual machine with the text LIBVIRTD_GUARD_SAVEME in its description will be saved (and restored during boot, if auto-start is enabled) instead!
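To set the marker you can e.g. use virsh desc (a sketch, assuming a vm named vm1; --live --config updates both the running and the persistent definition - alternatively just put the text into the <description> element via sudo virsh edit vm1):

sudo virsh desc vm1 --live --config --new-desc "LIBVIRTD_GUARD_SAVEME"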

Set static IPs for the VMs

…inside any isolated network, hosted by the host itself - just modify the respective network config with sudo virsh net-edit [LOCALSTORAGENET_NAME] and add:

<network>
...
<dhcp>
...
<host mac='52:54:00:6c:3c:01' name='vm1' ip='192.168.122.11'/>
<host mac='52:54:00:6c:3c:02' name='vm2' ip='192.168.122.12'/>
...
</dhcp>
...
</network>
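Alternatively, entries can be added without hand-editing the XML by using net-update (a sketch reusing the first host entry from above; --live --config applies it to the running and the persistent network):

sudo virsh net-update [LOCALSTORAGENET_NAME] add ip-dhcp-host "<host mac='52:54:00:6c:3c:01' name='vm1' ip='192.168.122.11'/>" --live --config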

Thanks Serverfault!

Shared folders

Some performance notes from the real world:

  • NFS is fast, but more complex to manage, as it authenticates by IP (unless you somehow get Kerberos working)
  • Samba works, but setting up any new user is complicated, as a corresponding system user is always needed… Also special characters break things…
  • 9p just works, but is always slow (small packet size -> good directory listing, but bandwidth limited by cpu performance; big packet size -> painfully slow directory listing, but good bandwidth) - also the caching is funny. Also special characters break things…

NFS with ZFS

Make sure (on both client and server) that NFS is installed already:

sudo apt install nfs-kernel-server

Also make sure to use an IP for [SERVER_IP] that does not go through a bridged interface between guest and host. This won't work!

Server: Share creation

Enable pool export:

sudo zfs set sharenfs=on [POOL_NAME]

Instead of a simple on you can also pass any NFS options to it - here are some examples (see the command sketch after this list):

  • rw/ro: Set the write mode - append e.g. rw=@[CLIENT_IP] or rw=@[NETWORK]/24 to restrict it to specific clients/networks
  • root_squash*/no_root_squash: Should the root uid be remapped to the anonymous user? no_root_squash is needed if chown should work…
  • all_squash/no_all_squash*: Should every client uid be remapped to ↓?
  • anonuid & anongid: Set the remapping target uid/gid (defaults to the user nobody)

* -> Default!
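Putting the options together, a restricted export could e.g. be set like this (a sketch - the 192.168.122.0/24 network is only an example, use your actual client network):

sudo zfs set sharenfs='rw=@192.168.122.0/24,no_root_squash' [POOL_NAME]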

Client: Mount it!

After exporting the dataset on the server, query the exact name on the clients by using:

showmount --exports [SERVER_IP]

Then you can mount it for testing purposes directly with:

sudo mount -t nfs [SERVER_IP]:[ZFS_EXPORT_PATH] [TARGET_PATH]

You may add it into the /etc/fstab:

[SERVER_IP]:[ZFS_EXPORT_PATH] [TARGET_PATH]  nfs      defaults,_netdev,x-systemd.automount,x-systemd.requires=network-online.target    0       0

You may also append x-systemd.idle-timeout=5m to only mount the share when needed (it gets unmounted after the specified time without access). One more interesting NFS option:

  • soft/hard: With hard, NFS will retry forever when the connection fails (freezing the application that triggered the request; NFS defaults to hard!)
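Combined with the on-demand mount, such an fstab entry could look like this (a sketch based on the line above):

[SERVER_IP]:[ZFS_EXPORT_PATH] [TARGET_PATH]  nfs      defaults,_netdev,soft,x-systemd.automount,x-systemd.idle-timeout=5m,x-systemd.requires=network-online.target    0       0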

KVM (9p)

Just add a new mapped shared folder with a new [TARGET_PATH]. To mount it, insert the following line into the guest's /etc/fstab:

[TARGET_PATH]    [LOCAL_PATH]       9p      trans=virtio,version=9p2000.L,msize=262144,_netdev    0       0

If you get emergency-mode failures during boot, insert the following into /etc/initramfs-tools/modules:

9p
9pnet
9pnet_virtio

…and update the initramfs with sudo update-initramfs -u!

If listing many files is too slow, try enabling the cache (copied from here):

cache=mode	specifies a caching policy.  By default, no caches are used.
        none = default no cache policy, metadata and data
                alike are synchronous.
        loose = no attempts are made at consistency,
                intended for exclusive, read-only mounts
        fscache = use FS-Cache for a persistent, read-only
                cache backend.
        mmap = minimal cache that is only used for read-write
                mmap.  Nothing else is cached, like cache=none
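For a read-mostly share the fstab line from above could then be extended like this (a sketch - pick the cache mode that matches your consistency needs):

[TARGET_PATH]    [LOCAL_PATH]       9p      trans=virtio,version=9p2000.L,msize=262144,cache=loose,_netdev    0       0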

Samba (CIFS)

Install server…

sudo apt install samba

Add a virtual isolated network for loopback communication with the host and vm

  • Make sure to enable DHCP, so the host will listen to the clients (instead of being REALLY isolated).
  • Add this interface (e.g. virbr1) to the firewall (the trusted zone is okay - the VMs should have a second interface anyway which is in the same network as the host)…
  • Note that the host can contact the VMs ONLY via the IPs of this network!
  • Because the host always answers faster than the other network interfaces, you REALLY SHOULD apply the following fix:
    1. Use the command sudo virsh net-edit [LOCALSTORAGENET_NAME] to open the xml-configuration-file of the virtual network.
    2. Add the following code there (if you add any entry other than the domain=… one, the host will resolve the requests for the clients - so don't be confused if /etc/resolv.conf then specifies the host as dns provider)…
      <network>
      ...
      <dns>
      <forwarder domain='router.domain'/>
      <forwarder addr='1.1.1.1'/>
      </dns>
      ...
      </network>
      
      …to forward any request to either the real network dns provider or e.g. Cloudflare!
    3. Save it, restart the network and reboot any vms to apply the fix!
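Restarting the virtual network for step 3 boils down to the following (this briefly takes the network down, so best do it while the vms are off):

sudo virsh net-destroy [LOCALSTORAGENET_NAME]
sudo virsh net-start [LOCALSTORAGENET_NAME]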

Set up the smb.conf to…

#THIS ALL REQUIRES samba
#This is located at /etc/samba/smb.conf

[global]
#Network stuff
workgroup = WORKGROUP
server string = %h
#Following: Set it to the servers local IP (the one from virbr1 / localhost)
#hosts allow = localhost 127.0.0.1 192.168.0.0/24
#hosts deny = 0.0.0.0/0
dns proxy = no
disable netbios = yes
name resolve order = bcast host

#Permissions USE sudo smbpasswd -a USER to add user, USE sudo smbpasswd -x USER to remove user
guest account = nobody
security = user
encrypt passwords = true
invalid users = root
guest ok = no

#Stuff
unix extensions = yes
unix password sync = no
usershare owner only = yes
#Log size in KB
max log size = 50

#Server role inside the network
server role = standalone server

#Fix the permissions to allow group access!
#force user = [USER (Only if necessary)]
force group = [FSgroup]
#Following seems to be useless with the following fixes...
#create mask = 770
#FIX permission: File: UPPER bound for the bits
create mode = 770
#FIX permission: File: LOWER bound for the bits
force create mode = 770
#FIX permission: Directory: UPPER bound for the bits
directory mode = 770
#FIX permission: Directory: LOWER bound for the bits
force directory mode = 770

#
#NOTE:
#browseable = no -> Hidden share
#

[Share1]
    path = [PATH]
    available = yes
    #Following to hide it anyways!
    browseable = no
    guest ok = no
    #Following to make read only if no user is in the write list!
    writeable = no
    valid users = [VirtUsers]
    write list = [VirtUsers]
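After editing, you can sanity-check the configuration with samba's own testparm and then restart the daemon:

testparm
sudo systemctl restart smbd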

VM - Allow a vm access to a specific share…

Nice to know: Use sudo pdbedit -L to get the current user list…

  1. Add an account on the host (nologin, nohome) with sudo adduser --no-create-home --shell /usr/sbin/nologin --disabled-login [USER]
  2. Add this account to the FSgroup sudo adduser [USER] [FSgroup]
  3. Allow samba to map to this account (a good password is necessary now): sudo smbpasswd -a [USER]
  4. Add the account to the shares at the smb.conf
  5. Add the share to the vm and save the credentials there (next paragraph)

Setup a vm to access and mount a specific share

Add this to the guest's /etc/fstab (it will mount on first access - this is necessary, because some (…) systemd instances ignore the _netdev option):

//[HOST_LOCALSTORAGENET_IP]/[SHARE_NAME] [TARGET_PATH] cifs noauto,x-systemd.automount,x-systemd.idle-timeout=5m,_netdev,nouser,mapchars,cache=strict,noacl,credentials=[CREDENTIAL_FILE (e.g. /root/creds)],domain=workgroup,uid=root,gid=[VM_SHARED_FOLDER_GROUP],file_mode=0770,dir_mode=0770 0 0

On cd failures with error -13 you messed up the password or username! Use cache=strict to fix ghosting folders (if they still appear, use 'none' - BUT THIS WILL IMPACT PERFORMANCE). When there are no ghosting folders or files, you can try 'loose' to further improve performance.

Prepare a vm to use shares (needed only ONCE)…

  1. Install cifs sudo apt install cifs-utils
  2. Add the host localstorage interface to /etc/network/interfaces: iface [INTERFACE_NAME] inet dhcp
  3. Add a group for the shares sudo addgroup [VM_SHARED_FOLDER_GROUP]
  4. Add a user to this group sudo addgroup [USER (e.g. www-data)] [VM_SHARED_FOLDER_GROUP]
  5. Create the authentication file (e.g. /root/creds):
    username=[USERNAME]
    password=[PASSWORD]
    
  6. Set permissions for the credential file: sudo chmod 500 [CREDENTIAL_FILE (e.g. /root/creds)]

CLIENT(S)

Setup management-client

sudo apt install virt-manager spice-client-gtk gir1.2-spiceclientgtk-3.0

Setup viewonly-client

sudo apt install virt-viewer

MORE INFO