VT-d, vfio and GPU passthrough, Virtualization in a nutshell (RHEL8)

Building a machine just for gaming is really a bad idea, so when an unexpected piece of hardware landed in my hands, a free RTX 2060 SUPER with no machine to power it, I decided to build that machine virtually: a Windows virtual machine with the GPU attached to it.

You can imagine it as another kind of "service" provided by the server, a sort of private Google Stadia: whenever gaming or a Windows-based job comes up, start the VM and it is ready to be reached through remote control software such as Parsec.

Pros: Easier management as a VM, and easier migration.
Better support for running Hackintosh.
No effect on other services running on the same hardware.

Cons: Some performance loss compared to bare metal.
Harder to configure, and it takes quite a bit more spare time than watching a USB installer.
Gaming is resource intensive; it requires a lot of CPU power.

This tutorial is written for Red Hat Enterprise Linux 8 only; it possibly works on CentOS 8 as well.

It assumes a CPU and motherboard with VT-d (Intel Virtualization Technology for Directed I/O) or AMD-Vi enabled. Pretty much all hardware you would consider a viable virtualization option today has it, including some Sandy Bridge processors, most Ivy Bridge processors, most LGA2011 Xeon or Sandy Bridge-E processors, P67 and X79/C602 motherboards, and so on.

For the operating system, Linux is my choice: it is familiar to me and works well enough. A regular install is a bit heavier than ESXi, which also supports PCI device passthrough and works with GPUs.

The first step is to turn on the VT-d option in the motherboard firmware. In some BIOSes it is not on by default; on an Asus X99 motherboard, for example, the options look like:

Virtualization Technology and VT-d (Intel Virtualization Technology for Directed I/O)

Also make sure you have a second graphics card (such as the onboard ASPEED graphics) besides the GPU you are going to pass through, and make Xorg or Wayland light up on that card instead.

I do this by adding the following file to my /etc/X11/xorg.conf.d folder:

Section "Device"
    Identifier  "Device0"
    Driver      "ast"
    VendorName  "ASPEED Technology"
    BusID       "PCI:7:0:0"
EndSection

10-aspeed.conf
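
One detail that is easy to trip over: lspci prints bus, device and function numbers in hexadecimal, while the Xorg BusID option expects them in decimal. For a card at 07:00.0 it makes no difference, but a card at 0a:00.0 has to be written as PCI:10:0:0. A small helper for the conversion could look like this (pci_to_busid is a name made up for this sketch, not an existing tool):

```bash
# Convert an lspci address (hex, e.g. "0a:00.0" or "0000:0a:00.0")
# into the decimal "PCI:bus:device:function" form used by Xorg's BusID.
# pci_to_busid is a hypothetical helper name, not part of any package.
pci_to_busid() {
    local addr=$1 bus dev fn
    # Drop an optional leading PCI domain such as "0000:".
    case $addr in
        *:*:*) addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}        # text before the colon
    dev=${addr#*:}         # text after the colon...
    dev=${dev%%.*}         # ...up to the dot
    fn=${addr##*.}         # text after the dot
    # printf reads "0x.." arguments as hexadecimal and prints them in decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

pci_to_busid 07:00.0   # prints PCI:7:0:0
```

Feed it the address that lspci shows for the card Xorg should use, and paste the result into the BusID line.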

Then set up GRUB by editing /etc/default/grub and appending the following to GRUB_CMDLINE_LINUX:

GRUB_CMDLINE_LINUX="...[This is where your previous parameter goes]... intel_iommu=on"

Run grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg to update the EFI boot configuration, then reboot.
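
After the reboot, a quick way to confirm that the new parameter actually reached the kernel is to look at /proc/cmdline. A minimal sketch (has_kernel_param is a name made up for this example; the optional second argument exists only so the function can be pointed at a different file):

```bash
# Return success if the given parameter appears as a whole word on the
# kernel command line. has_kernel_param is a hypothetical helper name.
has_kernel_param() {
    local param=$1 file=${2:-/proc/cmdline}
    # Put one parameter per line, then look for an exact, literal match.
    tr -s '[:space:]' '\n' < "$file" | grep -qxF -- "$param"
}

if has_kernel_param intel_iommu=on; then
    echo "intel_iommu=on is active"
else
    echo "intel_iommu=on is missing, check /etc/default/grub and grub.cfg"
fi
```

Matching whole words matters here: a plain substring search for iommu=on would also be satisfied by other parameters that merely end in the same text.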

After the host boots, check dmesg | grep IOMMU to confirm VT-d is correctly enabled; the output should contain a line like [ 0.000000] DMAR: IOMMU enabled. Then check the IOMMU groups of the devices you want to pass through:

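#!/bin/bash
# List every IOMMU group and the PCI devices it contains.
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        # ${d##*/} is the PCI address; lspci -nns prints its name and IDs.
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done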

Source: Arch Linux wiki

The /sys/kernel/iommu_groups/ folder contains the groups of devices that can be isolated from each other. You should pass an entire group to a VM at once; splitting the devices of one group between host and guest can cause problems.
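
To see what else shares a group with one particular device without reading the full listing, you can follow the device's iommu_group link in sysfs. A minimal sketch (iommu_group_members is a made-up name, and the optional second argument exists only so the function can be tried against a copy of the tree):

```bash
# Print every PCI address that shares an IOMMU group with the given device.
# iommu_group_members is a hypothetical helper; pass a full address such as
# 0000:01:00.0 (lspci -D prints addresses in this form).
iommu_group_members() {
    local addr=$1 sysfs=${2:-/sys} d
    local group=$sysfs/bus/pci/devices/$addr/iommu_group
    if [ ! -d "$group/devices" ]; then
        echo "no IOMMU group for $addr (is the IOMMU enabled?)" >&2
        return 1
    fi
    for d in "$group"/devices/*; do
        echo "${d##*/}"
    done
}

# Example: everything that has to move to the VM together with the GPU.
iommu_group_members 0000:01:00.0 || true
```

If the list contains devices you want to keep on the host, that group is not a good passthrough candidate as it stands.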

Now we have to modify the dracut configuration so that vfio-pci and the other kernel modules are included in the initramfs and loaded early; the effect is similar to modifying mkinitcpio on Arch Linux.

Add a file to /etc/dracut.conf.d/, then generate a new initramfs image like this, and reboot afterwards:

echo 'add_drivers+=" vfio vfio_iommu_type1 vfio_pci vfio_virqfd "' > /etc/dracut.conf.d/local.conf
dracut -f --kver "$(uname -r)"
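
Before rebooting it is worth checking that the modules really ended up in the new image. lsinitrd (shipped with dracut) lists the files inside an initramfs, so its output can be searched for the four module names. A minimal sketch (missing_vfio_modules is a made-up name; it reads a file listing on standard input, so it works with any listing and not only with lsinitrd):

```bash
# Read an initramfs file listing on stdin and print every vfio module
# that is NOT in it. missing_vfio_modules is a hypothetical helper name.
missing_vfio_modules() {
    local listing m
    # Module file names may use "-" where the module name uses "_"
    # (vfio-pci.ko vs vfio_pci), so normalise before matching.
    listing=$(tr -- '-' '_')
    for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
        printf '%s\n' "$listing" | grep -q "/$m\.ko" || echo "$m"
    done
}

# Example: check the image of the running kernel (needs root and dracut).
if command -v lsinitrd >/dev/null 2>&1; then
    lsinitrd 2>/dev/null | missing_vfio_modules
fi
```

No output means all four modules were found; any name that is printed is missing from the image.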

Next we tell vfio-pci to claim the devices during the boot sequence. To find the IDs of the devices you want to pass through, use lspci -nn, which shows device names together with their vendor:device IDs:

......
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106 [GeForce RTX 2060 SUPER] [10de:1f06] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation TU106 High Definition Audio Controller [10de:10f9] (rev a1)
01:00.2 USB controller [0c03]: NVIDIA Corporation TU106 USB 3.1 Host Controller [10de:1ada] (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C UCSI Controller [10de:1adb] (rev a1)
02:00.0 Non-Volatile memory controller [0108]: Intel Corporation Optane SSD 900P Series [8086:2700]
03:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:5009] (rev 01)
......
08:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142]
......
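
The vendor:device pair in the last set of square brackets on each line is what vfio-pci wants. Copying the pairs by hand works fine; as a sketch, they can also be pulled out of the lspci -nn output with grep (ids_for_slot is a made-up helper name). It prints every function of the slot, so drop the IDs you do not want to claim:

```bash
# Read `lspci -nn` output on stdin and print the [vendor:device] IDs of all
# functions in one slot as a comma-separated list.
# ids_for_slot is a hypothetical helper name.
ids_for_slot() {
    local slot=$1
    grep -E "^${slot}\." \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | paste -sd, -
}

# Example with the GPU in slot 01:00; skipped when lspci is not installed.
if command -v lspci >/dev/null 2>&1; then
    lspci -nn | ids_for_slot 01:00
fi
```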

Here I want to pass through the NVIDIA TU106, the Sandisk NVMe drive and the ASMedia USB host controller. I only have to claim the VGA, Audio and Serial bus controller functions, like this:

GRUB_CMDLINE_LINUX="...[This is where your previous parameter goes]... kvm.ignore_msrs=1 vfio-pci.ids=10de:1f06,10de:10f9,10de:1adb video=efifb:off"

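kvm.ignore_msrs=1 makes KVM ignore guest accesses to model-specific registers it does not handle, which avoids guest crashes on CPUs that are not fully supported; you can leave it out if you never see such crashes in the guest. vfio-pci.ids lists the IDs, found above, of the devices you want to pass through, and video=efifb:off turns off the EFI framebuffer. Regenerate grub.cfg with grub2-mkconfig as before and reboot.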
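
Once the host is back up with the new command line, lspci -nnk shows which driver owns each device; the claimed functions should report vfio-pci under "Kernel driver in use". A sketch that condenses that output to one line per device (drivers_in_use is a made-up name):

```bash
# Read `lspci -nnk` output on stdin and print "address driver" per device,
# or "(none)" when no driver is bound.
# drivers_in_use is a hypothetical helper name.
drivers_in_use() {
    awk '
        # A device header starts at the first column with the PCI address.
        /^[0-9a-f:]+\.[0-9a-f] / {
            if (dev != "") print dev, drv
            dev = $1; drv = "(none)"
        }
        # Indented detail line naming the bound driver.
        /Kernel driver in use:/ { drv = $NF }
        END { if (dev != "") print dev, drv }
    '
}

# Example: only the GPU slot; skipped when lspci is not installed.
if command -v lspci >/dev/null 2>&1; then
    lspci -nnk -s 01:00 | drivers_in_use
fi
```

If a claimed function still shows another driver, check that its ID is spelled correctly in vfio-pci.ids and that the vfio modules are in the initramfs.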

Now you can set up the virtual machine freely. UEFI firmware is what I tested, so make sure edk2-ovmf is installed and select it as the firmware when creating the virtual machine.

Now you can add the PCI devices through Add Hardware and start installing the virtual machine.

To hide the KVM status from the NVIDIA driver, modify the virtual machine XML like this:

<domain type="kvm" xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">
...
  <features>
    <kvm>
      <hidden state="on"/>
    </kvm>
  </features>
...
  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,hv_time,kvm=off,hv_vendor_id=null'/>
  </qemu:commandline>
</domain>

This can also work without the qemu:commandline part.
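
To confirm the setting survived editing, the definition libvirt actually stores can be dumped with virsh dumpxml and searched for the hidden element. A sketch (kvm_hidden_on is a made-up name, and win10 is only a placeholder for your own domain name):

```bash
# Read a libvirt domain XML on stdin; succeed if <hidden state="on"/> is set.
# kvm_hidden_on is a hypothetical helper name.
kvm_hidden_on() {
    # libvirt may emit single or double quotes around the attribute value.
    grep -Eq "<hidden[[:space:]]+state=['\"]on['\"]"
}

# Example: "win10" is a placeholder domain name, replace it with yours.
if command -v virsh >/dev/null 2>&1; then
    if virsh dumpxml win10 2>/dev/null | kvm_hidden_on; then
        echo "KVM signature is hidden from the guest"
    else
        echo "hidden state is not set (or the domain does not exist)"
    fi
fi
```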

A small note: remove Display Spice and Video QXL (or whatever virtual display devices you have) so that graphics are output only to the GPU you passed through.
Not every GPU is supported by the OVMF firmware; in that case you can install and set up the machine first and delete the virtual graphics afterwards.