Building a Self-Hosted AI Server: GPU Passthrough to a PVE Virtual Machine, Installing the Driver, CUDA, and cuDNN, and Running LLMs

February 20, 2025, 21:16:27

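With open-source large models and the many open-source tools built around them, LLMs are now within everyone's reach, and plenty of people already run them on a laptop. A laptop is underpowered, though, and it has other work to do, so it cannot serve a model around the clock. I am used to PVE (Proxmox VE), so this post records one of my deployments as a guide for the next time I have to repeat it.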

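Environment

Motherboard: ASUS Z97-K R2.0
CPU: Intel Core i7-4790 @ 3.6GHz
Memory: 32 GB (4 x 8 GB) DDR3 1600MHz
GPU: NVIDIA GeForce RTX 2080 Ti 22G

This hardware runs the 4-bit quantized Qwen2.5 32B (Q4_K_M) model fairly smoothly. The whole model sits on the GPU and in its VRAM, so the CPU and system RAM see almost no load. A larger model such as a 72B has to be partly offloaded to the CPU and system RAM, and then it becomes very slow: it runs, but it is not pleasant to use.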

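A second machine, not covered in this post

Motherboard: X99-F8D PLUS
CPU: 2 x Intel Xeon E5-2660 v3
Memory: 512 GB (8 x 64 GB) 4DRX4 2400T
GPU: 2 x NVIDIA GeForce RTX 2080 Ti 22G

This machine can run the full DeepSeek-R1 671B model at 4-bit quantization (Q4_K_M). The model does not fit in VRAM, so part of it runs on the CPU and in system RAM, which makes it very slow: it works, but at one character every two to three seconds it is not really usable.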
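A rough size estimate shows why the 32B model fits on one 22 GB card while the 671B model does not fit on two. The sketch below assumes Q4_K_M averages about 4.85 bits per weight (an approximation I am supplying, not a figure from this post) and ignores the KV cache and runtime overhead. estimate_model_gb is a hypothetical helper name.

```shell
# Rough weight-only size of a quantized model, in GB (integer, rounded down).
# $1 = parameter count in billions
# $2 = average bits per weight times 100 (485 = 4.85 bpw, an assumed figure for Q4_K_M)
estimate_model_gb() {
    echo $(( $1 * $2 / 800 ))
}

estimate_model_gb 32 485    # Qwen2.5 32B
estimate_model_gb 671 485   # DeepSeek-R1 671B
```

This prints 19 and 406. About 19 GB of weights leaves a little headroom on a 22 GB card, while about 406 GB is far beyond the 44 GB of VRAM on two cards, so most of the 671B model ends up in system RAM.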

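PVE initial setup

Installing PVE itself is not covered here; I assume you already have it installed.

Switch the package sources to the Tsinghua mirror

cp /etc/apt/sources.list /etc/apt/sources.list.bak
nano /etc/apt/sources.list

Change the contents to: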

# Source mirrors are commented out by default to speed up apt update; uncomment them if you need them
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
# Security updates below come from the official source; switch to the mirror by editing these lines if you prefer
deb https://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
# deb-src https://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware

Then edit /etc/apt/sources.list.d/pve-no-subscription.list:

nano /etc/apt/sources.list.d/pve-no-subscription.list

and put this line in it:

deb https://mirrors.tuna.tsinghua.edu.cn/proxmox/debian/pve bookworm pve-no-subscription

Remove the enterprise repositories (-f keeps rm quiet if a file is already gone)

rm -f /etc/apt/sources.list.d/ceph.list
rm -f /etc/apt/sources.list.d/pve-enterprise.list

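Refresh the package index and upgrade. On a PVE host use full-upgrade rather than plain upgrade, because Proxmox updates regularly add or replace packages, which plain upgrade refuses to do:

apt update
apt full-upgrade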

Install the tools you are used to

I prefer the vim editor and use it for all edits from here on, so I install it now:

apt install vim

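GPU passthrough

BIOS changes

Enable VT-d / AMD-Vi
Make sure the motherboard and CPU support IOMMU (Intel VT-d or AMD-Vi) and enable it in the BIOS
Also enable UEFI boot and disable CSM

Enable IOMMU

Edit /etc/default/grub and add the IOMMU parameters to GRUB_CMDLINE_LINUX_DEFAULT:

Intel systems:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

AMD systems (the kernel enables the AMD IOMMU by default, so iommu=pt is the part that matters here):

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

Update the GRUB configuration:

update-grub

After rebooting, check that the IOMMU is enabled: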

dmesg | grep -e DMAR -e IOMMU
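Passthrough works per IOMMU group, so beyond the dmesg check it helps to see which group the GPU landed in. A minimal sketch, assuming the usual sysfs layout under /sys/kernel/iommu_groups; list_iommu_groups is a hypothetical helper, and its optional argument exists only so it can be pointed at another directory.

```shell
# Print each IOMMU group and the PCI addresses it contains.
# $1 = sysfs root to scan (defaults to the real location)
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group in "$root"/*; do
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            echo "group ${group##*/}: ${dev##*/}"
        done
    done
}
```

Running list_iommu_groups on the host prints nothing when the IOMMU is off. Ideally all functions of the GPU (01:00.x on my machine) share one group that contains no unrelated devices.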

Configure PCI passthrough

Identify the GPU's PCI device IDs with the following command (here 01:00.0 is the GPU and 01:00.1 is its audio device):

lspci -nn | grep -i nvidia
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
01:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)

Isolate the devices with the VFIO driver by editing /etc/modprobe.d/vfio.conf:

options vfio-pci ids=10de:1e07,10de:10f7

Replace 10de:1e07 and 10de:10f7 with the actual GPU and audio device IDs on your machine.
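Copying the IDs by hand is easy to get wrong, so here is a small sketch that pulls every vendor:device pair for one PCI slot out of lspci -nn output. vfio_ids_for_slot is a hypothetical helper. It returns all functions on the slot, which on my card means four IDs rather than the two I list above; whether you want the USB and UCSI functions bound as well is your call.

```shell
# Build a comma-separated vfio-pci ID list from `lspci -nn` output on stdin.
# $1 = PCI slot without the function number, e.g. 01:00
vfio_ids_for_slot() {
    grep "^$1\." \
        | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}]' \
        | tr -d '[]' \
        | paste -sd, -
}
```

Usage on the host would be: lspci -nn | vfio_ids_for_slot 01:00, and the result goes after ids= in the options line.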

Blacklist the default drivers: edit /etc/modprobe.d/blacklist.conf and add the following:

blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist rivafb

Then regenerate the initramfs:

update-initramfs -u

Reboot the system.

Verify device isolation: after the reboot, check that the devices are bound to the vfio-pci driver:

lspci -nnv | grep -i vfio
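The grep above only tells you that the string vfio appears somewhere. To check one specific device, the driver symlink in sysfs can be read directly. A minimal sketch; pci_driver_of is a hypothetical helper, and the second argument is only there so it can be tried against a different directory.

```shell
# Print the kernel driver currently bound to a PCI device, or "none".
# $1 = full PCI address, e.g. 0000:01:00.0
# $2 = sysfs device root (defaults to the real location)
pci_driver_of() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

On the host, pci_driver_of 0000:01:00.0 should print vfio-pci once isolation works. The result none means no driver has claimed the device yet.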

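Configure the virtual machine

Create the virtual machine. It must be a KVM virtual machine, not a CT container (LXC works differently). Points to watch:

Machine type must be q35
CPU type must be host
Do not choose UEFI mode: Secure Boot has to verify kernel signatures, which is a hassle
Under Hardware, add a PCI device, choose Raw Device, select the GPU, and tick All Functions and PCI-Express.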
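The same settings can probably be applied from the host shell with qm instead of the web UI. The sketch below only prints the command so it can be reviewed first. VM ID 100 is a made-up example, passthrough_cmd is a hypothetical helper, and the option spelling reflects my understanding of the qm CLI (a slot given without a function number passes all functions).

```shell
# Print (not run) a qm command that applies the settings listed above.
# $1 = VM ID (100 below is only an example)
# $2 = PCI slot without the function number, e.g. 01:00
passthrough_cmd() {
    echo "qm set $1 --machine q35 --cpu host --hostpci0 $2,pcie=1"
}

passthrough_cmd 100 01:00
```

This prints: qm set 100 --machine q35 --cpu host --hostpci0 01:00,pcie=1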

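Start the virtual machine and log in

Boot the virtual machine and connect to it over SSH.

I installed Ubuntu 24.04.1. Installing it and switching its package sources to the Tsinghua mirror are not covered here. All commands below are run as root. First update the system and install the packages we need: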

apt update
apt upgrade
apt install gcc make dkms qemu-guest-agent

Disable the open-source nouveau driver: edit /etc/modprobe.d/blacklist-nouveau.conf and add the following:

blacklist nouveau
options nouveau modeset=0

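Regenerate the initramfs: update-initramfs -u

Reboot: reboot

Connect to the virtual machine over SSH again.

Download the driver: https://www.nvidia.cn/drivers/lookup/

Download CUDA: https://developer.nvidia.com/cuda-downloads

Download cuDNN: https://developer.nvidia.com/cudnn-downloads

The driver and CUDA versions cannot be combined arbitrarily: the installed driver must be at least as new as the one the CUDA toolkit requires (the CUDA 12.6 installer asks for 560.00 or later). The versions I downloaded are: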

NVIDIA-Linux-x86_64-565.77.run
cuda_12.6.3_560.35.05_linux.run
cudnn-local-repo-ubuntu2404-9.6.0_1.0-1_amd64.deb
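The version requirement can be checked mechanically. The CUDA runfile name embeds the driver it ships with (560.35.05 here), and my standalone driver is 565.77. A minimal sketch using GNU sort's version ordering; version_ge is a hypothetical helper name.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if version_ge 565.77 560.35.05; then
    echo "driver is new enough"
else
    echo "driver is too old"
fi
```

Once the driver is installed, the first argument can come from nvidia-smi --query-gpu=driver_version --format=csv,noheader instead of being typed by hand.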

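Install the driver

Install the driver first:

sh ./NVIDIA-Linux-x86_64-565.77.run

At the prompts, choose the NVIDIA option, then Continue, then accept. If you do not choose Continue, the installer aborts.

When it finishes, run nvidia-smi. Output like the following means the driver is installed: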

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   50C    P0             33W /  260W |       1MiB /  22528MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

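Install CUDA

When installing CUDA, pass --no-opengl-libs so that the OpenGL libraries are not installed. This avoids a conflict with Ubuntu's own OpenGL, which would otherwise leave you stuck in a login-screen loop.

sh ./cuda_12.6.3_560.35.05_linux.run --no-opengl-libs

Accept the license by typing accept and pressing Enter.

Use the up and down keys to move to - [X] Driver and press Space to deselect it, so that it reads - [ ] Driver.

Then move to Install and press Enter to start the installation. When it finished, my run printed the following summary. The warning looks alarming, but it is what the installer reports whenever Driver is deselected: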

===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-12.6/
Please make sure that
- PATH includes /usr/local/cuda-12.6/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.6/lib64, or, add /usr/local/cuda-12.6/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.6/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 560.00 is required for CUDA 12.6 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log

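The driver not being installed is expected: we installed it ourselves above, so CUDA does not need to install it again. The Toolkit installed successfully, but the installer does not set any environment variables. /usr/local/cuda-12.6/bin has to be added to PATH by hand, and the library directory has to be registered through LD_LIBRARY_PATH or /etc/ld.so.conf (I do both).

So if you see the summary above, set the environment variables and /etc/ld.so.conf by hand as I do here. If you do not see it, skip ahead to the verification step.

To set the environment variables, edit ~/.bashrc and append these two lines: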

export PATH=/usr/local/cuda-12.6/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

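Then run source ~/.bashrc so the change takes effect immediately.

Edit /etc/ld.so.conf:

Add the CUDA library path to the file:

/usr/local/cuda-12.6/lib64

Then run ldconfig.

Finally, run nvcc -V. If it prints version information like this, the toolkit is installed: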

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
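Since I will repeat this deployment, the manual edits above are worth making repeatable. A minimal sketch of a helper that adds a line to a file only once, so running it twice does no harm; append_line_once is a hypothetical name, and the drop-in file name in the usage note is my own choice.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = file path, $2 = line to add
append_line_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

For example, append_line_once /etc/ld.so.conf.d/cuda-12-6.conf /usr/local/cuda-12.6/lib64 followed by ldconfig should have the same effect as editing /etc/ld.so.conf directly, assuming the stock Ubuntu ld.so.conf that includes /etc/ld.so.conf.d/*.conf. The same helper works for the two export lines in ~/.bashrc if they are passed in single quotes.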

Install cuDNN

My package is cudnn-local-repo-ubuntu2404-9.6.0_1.0-1_amd64.deb; substitute your own file name and run:

dpkg -i cudnn-local-repo-ubuntu2404-9.6.0_1.0-1_amd64.deb
cp /var/cudnn-local-repo-ubuntu2404-9.6.0/cudnn-*-keyring.gpg /usr/share/keyrings/
apt update
apt install cudnn

This step almost never fails, so I did not verify it. A proper check means running a neural-network program, and I did not bother.
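A lighter check than running a network is to read the version from the installed header. A minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL the way cuDNN's version header normally does; cudnn_version_from_header is a hypothetical helper, and the header's location is not something I have confirmed for this package.

```shell
# Print MAJOR.MINOR.PATCH as declared in a cuDNN version header.
# $1 = path to the header file
cudnn_version_from_header() {
    awk '/^#define CUDNN_MAJOR /      { major = $3 }
         /^#define CUDNN_MINOR /      { minor = $3 }
         /^#define CUDNN_PATCHLEVEL / { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}
```

The header can be located with find /usr/include -name 'cudnn_version*.h', and for the package above the function should report 9.6.0.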

Install Ollama

I use Ollama to run LLMs, so the next steps install Ollama.

One-line install script:

curl -fsSL https://ollama.com/install.sh | sh

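The script downloads files from GitHub, so it may fail if your network access is restricted. The manual installation guide is here: https://github.com/ollama/ollama/blob/main/docs/linux.md

My network goes through a proxy and is not restricted, so the automatic install completed directly. If you install manually and need technical help, you can also contact me for paid support.

Run ollama -v. If it prints a version number, the installation succeeded.

I also edit /etc/systemd/system/ollama.service so that ollama listens on 0.0.0.0, adding these environment variables to the [Service] section: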

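# Address and port to listen on
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Keep models loaded instead of unloading them automatically
Environment="OLLAMA_KEEP_ALIVE=-1"
# Number of parallel requests
Environment="OLLAMA_NUM_PARALLEL=4"
# Model load timeout; models of several hundred GB load slowly, so set this high
Environment="OLLAMA_LOAD_TIMEOUT=90m"
# VRAM to keep free on the GPU, in bytes (536870912 = 512 MiB), so inference still has room to work
Environment="OLLAMA_GPU_OVERHEAD=536870912"
# Enable CUDA unified memory
Environment="GGML_CUDA_ENABLE_UNIFIED_MEMORY=1"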

Then reload systemd and restart ollama:

systemctl daemon-reload
systemctl restart ollama
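An alternative to editing ollama.service in place is a systemd drop-in, which I would expect to survive a reinstall that rewrites the unit file. A minimal sketch that writes a few of the settings above into a drop-in directory and derives the overhead value from MiB; both function names are hypothetical, and only a subset of my variables is shown.

```shell
# Convert MiB to bytes, e.g. for OLLAMA_GPU_OVERHEAD.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Write a systemd drop-in with a subset of the settings above.
# $1 = drop-in directory (for the real service: /etc/systemd/system/ollama.service.d)
write_ollama_override() {
    mkdir -p "$1"
    cat > "$1/override.conf" <<EOF
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=-1"
Environment="OLLAMA_GPU_OVERHEAD=$(mib_to_bytes 512)"
EOF
}
```

After write_ollama_override /etc/systemd/system/ollama.service.d, the usual systemctl daemon-reload and systemctl restart ollama apply it. systemctl edit ollama opens the same kind of file interactively.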

Pull a model, then start chatting with ollama run:

ollama pull deepseek-r1:32b
ollama run deepseek-r1:32b

Install Docker

Dify, which I plan to install later, depends on Docker, so I install Docker here as well:

apt install ca-certificates curl
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

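In the next post, if I find the time, shall we talk about Dify?

For commercial use, please contact the author for authorization.
Copyright notice: this is an original article by the blogger 「任霏」, published under the CC BY-NC-SA 4.0 license. When reposting, include a link to the original article and this notice.
Original link: https://blog.renfei.net/posts/1626402130325676127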