
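Preparing the Environment for Physical GPU Passthrough

Physical GPU passthrough assigns a physical graphics processing unit (GPU) on the host directly to a virtual machine in a virtualized environment. The virtual machine accesses and uses the physical GPU directly, so it obtains graphics performance equivalent to running on a physical machine and avoids the performance bottleneck introduced by a virtual graphics adapter.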

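Constraints and Limitations

Physical GPU passthrough requires kubevirt-gpu-device-plugin, which currently provides no image for the ARM64 CPU architecture. The feature is therefore unavailable on operating systems running on ARM64 CPUs.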

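Prerequisites

Chart and Image Preparation

Obtain the following chart and images and upload them to your image registry. This document uses the registry address build-harbor.example.cn as an example. Contact the relevant personnel for how to obtain the chart and images.

Chart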

build-harbor.example.cn/example/chart-gpu-operator:v23.9.1

Images

build-harbor.example.cn/3rdparty/nvidia/gpu-operator:v23.9.0
build-harbor.example.cn/3rdparty/nvidia/cloud-native/gpu-operator-validator:v23.9.0
build-harbor.example.cn/3rdparty/nvidia/cuda:12.3.1-base-ubi8
build-harbor.example.cn/3rdparty/nvidia/kubevirt-gpu-device-plugin:v1.2.4
build-harbor.example.cn/3rdparty/nvidia/nfd/node-feature-discovery:v0.14.2

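Enabling IOMMU

The procedure for enabling IOMMU differs between operating systems; refer to the documentation for your operating system. This document uses CentOS as an example, and all commands are run in a terminal.

Edit the /etc/default/grub file and append intel_iommu=on iommu=pt to the GRUB_CMDLINE_LINUX setting. The intel_iommu parameter applies to Intel CPUs.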
```
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet intel_iommu=on iommu=pt"
```
Run the following command to generate the grub.cfg file.
```
grub2-mkconfig -o /boot/grub2/grub.cfg
```
Restart the server.
Run the following command to confirm that IOMMU is enabled. If the output contains IOMMU enabled, IOMMU was enabled successfully.
```
dmesg | grep -i iommu
```
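In addition to reading dmesg, it can help to confirm that the running kernel actually booted with the two parameters added above. The following is a minimal sketch: has_iommu_params is a hypothetical helper name, and the sample command line is illustrative rather than taken from a real host.

```shell
# Return 0 if the given kernel command line contains both
# intel_iommu=on and iommu=pt as separate, whole parameters.
has_iommu_params() {
  cmdline=" $1 "   # pad with spaces so each parameter can be matched as a whole word
  case "$cmdline" in
    *" intel_iommu=on "*) ;;
    *) return 1 ;;
  esac
  case "$cmdline" in
    *" iommu=pt "*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a real host: has_iommu_params "$(cat /proc/cmdline)"
# Illustrative sample command line (an assumption, not real output):
sample="BOOT_IMAGE=/vmlinuz root=/dev/mapper/centos-root ro crashkernel=auto rhgb quiet intel_iommu=on iommu=pt"
if has_iommu_params "$sample"; then
  echo "IOMMU parameters present"
else
  echo "IOMMU parameters missing"
fi
```

If the parameters are missing after a reboot, the generated grub.cfg is probably not the one the server boots from.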

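Procedure

Note: Unless stated otherwise, run all of the following commands with the CLI tool on a Master node of the target cluster.

Creating the Namespace

Run the following command to create a namespace named gpu-system. The output namespace/gpu-system created indicates that the namespace was created.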

kubectl create ns gpu-system

Deploying gpu-operator

Run the following commands to deploy gpu-operator.

export REGISTRY=<registry> # Replace <registry> with the address of the registry that hosts the gpu-operator images, for example: export REGISTRY=build-harbor.example.cn

cat <<EOF | kubectl create -f -
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: gpu-operator
  namespace: gpu-system
spec:
  destination:
    cluster: ""
    namespace: "gpu-operator"
  source:
    charts:
    - name: <chartName> # Replace <chartName> with the actual chart path, for example: name: example/chart-gpu-operator
      releaseName: gpu-operator
      targetRevision: v23.9.1
    repoURL: $REGISTRY
  timeout: 120
  values:
    global:
      registry:
        address: $REGISTRY
    nfd:
      enabled: true
    sandboxWorkloads:
      enabled: true
      defaultWorkload: "vm-passthrough"
EOF

Run the following command to check whether gpu-operator has synced. A SYNC value of Synced indicates that it has.

kubectl -n gpu-system get apprelease gpu-operator

Output:

NAME           SYNC           HEALTH        MESSAGE        UPDATE   AGE
gpu-operator   Synced         Ready         chart synced   28s      32s
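When this check runs in a script, the SYNC column can be read from the table instead of by eye. The sketch below assumes the column layout shown in the output above; apprelease_sync is a hypothetical helper name, and the sample text is copied from that output.

```shell
# Print the SYNC column (second field) of the first data row read from stdin.
apprelease_sync() {
  awk 'NR == 2 { print $2 }'
}

# Real usage (requires cluster access):
#   kubectl -n gpu-system get apprelease gpu-operator | apprelease_sync
sample='NAME           SYNC           HEALTH        MESSAGE        UPDATE   AGE
gpu-operator   Synced         Ready         chart synced   28s      32s'

status=$(printf '%s\n' "$sample" | apprelease_sync)
if [ "$status" = "Synced" ]; then
  echo "gpu-operator is synced"
else
  echo "gpu-operator is not synced yet (SYNC=$status)"
fi
```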

Run the following command to list all nodes and identify the GPU node.
```
kubectl get nodes -o wide
```

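Run the following command to check whether the GPU node exposes GPUs available for passthrough. If the output contains a GPU entry similar to nvidia.com/GK210GL_TESLA_K80, passthrough-capable GPUs are available.

kubectl get node <gpu-node-name> -o jsonpath='{.status.allocatable}' # Replace <gpu-node-name> with the GPU node name obtained in step 3

Output: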

{"cpu":"39","devices.kubevirt.io/kvm":"1k","devices.kubevirt.io/tun":"1k","devices.kubevirt.io/vhost-net":"1k","ephemeral-storage":"426562784165","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"122915848Ki","nvidia.com/GK210GL_TESLA_K80":"8","pods":"256"}

gpu-operator is now deployed.

Configuring KubeVirt

Run the following command to enable the DisableMDEVConfiguration feature gate. Output similar to hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched indicates that the feature gate was enabled.
```
kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='[{"op": "add", "path": "/spec/featureGates/disableMDevConfiguration", "value": true }]'
```

In a terminal on the GPU node, run the following command to obtain the pciDeviceSelector. In the output, 10de:102d is the pciDeviceSelector value.

lspci -nn | grep -i nvidia

Output:

04:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
05:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
08:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
09:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
85:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
86:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
89:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
8a:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
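On a node with many cards, the lspci output repeats the same vendor:device ID once per card. The sketch below reduces lspci -nn output to the unique IDs; pci_ids is a hypothetical helper name, and the sample lines are copied from the output above.

```shell
# Print the unique [vendor:device] IDs of NVIDIA devices from `lspci -nn` output on stdin.
pci_ids() {
  grep -i nvidia \
    | grep -oE '\[[0-9a-fA-F]{4}:[0-9a-fA-F]{4}\]' \
    | sed 's/^\[//; s/]$//' \
    | sort -u
}

# Real usage on the GPU node: lspci -nn | pci_ids
sample='04:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
05:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)'

printf '%s\n' "$sample" | pci_ids
```

For the sample above this prints a single line, 10de:102d. More than one line of output means the node hosts several GPU models, each of which needs its own pciDeviceSelector entry.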

Run the following command to list all nodes and identify the GPU node.
```
kubectl get nodes -o wide
```

Run the following command to obtain the resourceName. In the output, nvidia.com/GK210GL_TESLA_K80 is the resourceName value.

kubectl get node <gpu-node-name> -o jsonpath='{.status.allocatable}' # Replace <gpu-node-name> with the GPU node name obtained in step 3

Output:

{"cpu":"39","devices.kubevirt.io/kvm":"1k","devices.kubevirt.io/tun":"1k","devices.kubevirt.io/vhost-net":"1k","ephemeral-storage":"426562784165","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"122915848Ki","nvidia.com/GK210GL_TESLA_K80":"8","pods":"256"}
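The allocatable map is printed as a single JSON line, which makes the GPU entry easy to miss. The sketch below extracts only the nvidia.com/* keys and their counts using grep and sed; gpu_resources is a hypothetical helper name, and the sample is a shortened copy of the output above.

```shell
# Print one "<resource> <count>" line per nvidia.com/* key found in the JSON on stdin.
gpu_resources() {
  grep -o '"nvidia\.com/[^"]*":"[0-9]*"' | sed 's/"//g; s/:\([0-9]*\)$/ \1/'
}

# Real usage (requires cluster access):
#   kubectl get node <gpu-node-name> -o jsonpath='{.status.allocatable}' | gpu_resources
sample='{"cpu":"39","memory":"122915848Ki","nvidia.com/GK210GL_TESLA_K80":"8","pods":"256"}'
printf '%s\n' "$sample" | gpu_resources
```

For the sample this prints nvidia.com/GK210GL_TESLA_K80 8. The first field is the resourceName and the second is the number of allocatable cards.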

Run the following commands to add the passthrough GPUs.

Note: When you replace <pci-devices-id> in the commands below with the pciDeviceSelector obtained in step 2, convert all letters in it to uppercase. For example, if the pciDeviceSelector is 10de:102d, use export DEVICE=10DE:102D.
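The uppercase conversion can be done in the shell rather than by hand. This is a minimal sketch; to_selector is a hypothetical helper name, and only the hexadecimal letters a to f are converted because a PCI ID contains no other letters.

```shell
# Convert the hexadecimal letters of a vendor:device ID to uppercase.
to_selector() {
  printf '%s\n' "$1" | tr 'abcdef' 'ABCDEF'
}

DEVICE=$(to_selector "10de:102d")
export DEVICE
echo "$DEVICE"   # prints 10DE:102D
```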

Adding a Single GPU

export DEVICE=<pci-devices-id> # Replace <pci-devices-id> with the pciDeviceSelector obtained in step 2, for example: export DEVICE=10DE:102D
export RESOURCE=<resource-name> # Replace <resource-name> with the resourceName obtained in step 4, for example: export RESOURCE=nvidia.com/GK210GL_TESLA_K80

kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='
[
  {
    "op": "add",
    "path": "/spec/permittedHostDevices",
    "value": {
      "pciHostDevices": [
        {
          "externalResourceProvider": true,
          "pciDeviceSelector": "'"$DEVICE"'",
          "resourceName": "'"$RESOURCE"'"
        }
      ]
    }
  }
]'

Adding Multiple GPUs

Note: When adding multiple GPUs, each pciDeviceSelector value used in place of <pci-devices-id> must be different.

export DEVICE1=<pci-devices-id1> # Replace <pci-devices-id1> with the first pciDeviceSelector obtained in step 2
export RESOURCE1=<resource-name1> # Replace <resource-name1> with the matching resourceName obtained in step 4
export DEVICE2=<pci-devices-id2> # Replace <pci-devices-id2> with the second pciDeviceSelector obtained in step 2
export RESOURCE2=<resource-name2> # Replace <resource-name2> with the matching resourceName obtained in step 4

kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='
[
  {
    "op": "add",
    "path": "/spec/permittedHostDevices",
    "value": {
      "pciHostDevices": [
        {
          "externalResourceProvider": true,
          "pciDeviceSelector": "'"$DEVICE1"'",
          "resourceName": "'"$RESOURCE1"'"
        },
        {
          "externalResourceProvider": true,
          "pciDeviceSelector": "'"$DEVICE2"'",
          "resourceName": "'"$RESOURCE2"'"
        }
      ]
    }
  }
]'
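For more than two GPU models, writing the JSON by hand becomes error-prone. The sketch below builds the same patch document from any number of selector=resource pairs so it can be inspected before it is applied. build_gpu_patch is a hypothetical helper name, and the second pair in the example (10DE:FFFF, nvidia.com/EXAMPLE_GPU) is a made-up placeholder, not a real device.

```shell
# Build a JSON Patch that sets /spec/permittedHostDevices from
# "<pciDeviceSelector>=<resourceName>" arguments, one per GPU model.
build_gpu_patch() {
  entries=""
  for pair in "$@"; do
    device=${pair%%=*}     # text before the first "="
    resource=${pair#*=}    # text after the first "="
    entry="{\"externalResourceProvider\":true,\"pciDeviceSelector\":\"$device\",\"resourceName\":\"$resource\"}"
    if [ -z "$entries" ]; then
      entries=$entry
    else
      entries="$entries,$entry"
    fi
  done
  printf '[{"op":"add","path":"/spec/permittedHostDevices","value":{"pciHostDevices":[%s]}}]\n' "$entries"
}

# The second pair is a hypothetical placeholder for illustration only.
patch=$(build_gpu_patch "10DE:102D=nvidia.com/GK210GL_TESLA_K80" "10DE:FFFF=nvidia.com/EXAMPLE_GPU")
echo "$patch"

# To apply (requires cluster access):
#   kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p="$patch"
```

Printing the patch first makes it easy to confirm that every selector is uppercase and that no selector appears twice.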

Adding a GPU When GPUs Have Already Been Added

export DEVICE=<pci-devices-id> # Replace <pci-devices-id> with the pciDeviceSelector obtained in step 2
export RESOURCE=<resource-name> # Replace <resource-name> with the resourceName obtained in step 4
export INDEX=<index> # The index is the zero-based array position; replace <index> with it. For example, if one GPU has already been added and you are adding another, the index is 1: export INDEX=1

kubectl patch hco kubevirt-hyperconverged -n kubevirt --type='json' -p='
[
  {
    "op": "add",
    "path": "/spec/permittedHostDevices/pciHostDevices/'"${INDEX}"'",
    "value": {
      "externalResourceProvider": true,
      "pciDeviceSelector": "'"$DEVICE"'",
      "resourceName": "'"$RESOURCE"'"
    }
  }
]'
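The index for a new entry equals the number of entries already in pciHostDevices. The sketch below derives it by counting resource names; it assumes that kubectl prints the jsonpath [*] results separated by whitespace, and next_index is a hypothetical helper name. JSON Patch (RFC 6902) also defines - as the position after the last array element, though this document does not confirm that the platform accepts it.

```shell
# Count whitespace-separated resource names on stdin; the count is the next free array index.
next_index() {
  wc -w | tr -d ' '
}

# Real usage (requires cluster access):
#   INDEX=$(kubectl get hco kubevirt-hyperconverged -n kubevirt \
#     -o jsonpath='{.spec.permittedHostDevices.pciHostDevices[*].resourceName}' | next_index)
# Illustrative input: one GPU model already configured.
INDEX=$(printf '%s\n' "nvidia.com/GK210GL_TESLA_K80" | next_index)
export INDEX
echo "$INDEX"   # prints 1
```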

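Verifying the Result

After completing the steps above, if the corresponding physical GPU can be selected when creating a virtual machine, the physical GPU passthrough environment is ready.

Note: To configure physical GPU passthrough, enable the related feature in advance.

Go to Container Platform.
In the left navigation bar, click Virtualization > Virtual Machines.
Click Create Virtual Machine.
Configure the Physical GPU (Alpha) parameter of the virtual machine.

Parameter description

Physical GPU (Alpha): Select the model of the configured physical GPU. Only one physical GPU can be assigned to each virtual machine.
The physical GPU passthrough environment is now ready.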
