Porting a YOLO Inference Service to the Nanos Unikernel
2024-11-27
This post continues from the previous article.
Environment preparation
- First, building on the previous article, prepare the required klib gpu_nvidia and the driver files:
.
├── klibs
│   └── gpu_nvidia
└── nvidia
    ├── 535.113.01
    │   ├── gsp_ga10x.bin
    │   └── gsp_tu10x.bin
    └── LICENSE
- A Python 3.10 interpreter (3.10.6 is best, since it matches the eyberg/python:3.10.6 package used below)
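The minor version matters because compiled extension modules such as those in NumPy and PyTorch are built for one CPython minor version, and the virtual environment stores them under lib/python3.10. A quick check, as a sketch only: `REQUIRED` and `same_minor` are hypothetical names, and (3, 10) is taken from the eyberg/python:3.10.6 package used later in this article.

```python
import sys

REQUIRED = (3, 10)  # assumption: major.minor of the eyberg/python:3.10.6 package


def same_minor(version_info, required=REQUIRED) -> bool:
    """True when the interpreter's major.minor matches the package's."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    status = "ok" if same_minor(sys.version_info) else "mismatch"
    print(sys.version.split()[0], status)
```

Running this with the interpreter that will create the virtual environment shows whether its site-packages will line up with the interpreter inside the image.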
Creating the Python environment
- First create a Python virtual environment. It will later be mapped into Nanos and serve as the runtime environment for YOLO.
python -m venv .local --prompt yolo
source .local/bin/activate
- Install PyTorch and Ultralytics
pip install torch==2.3.0+cu121 torchvision==0.18.0+cu121 torchaudio==2.3.0+cu121 -f https://mirror.sjtu.edu.cn/pytorch-wheels/torch_stable.html
pip install ultralytics
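- Write a minimal working Python program
main.py
from ultralytics import YOLO

# Load the pretrained YOLO11n weights (downloaded on first use)
model = YOLO("yolo11n.pt")
# Run inference on a sample image fetched from the given URL
results = model("https://ultralytics.com/images/bus.jpg")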
- Write the configuration file
config.json
{
  "KlibDir": "./klibs",
  "Klibs": [
    "gpu_nvidia"
  ],
  "RunConfig": {
    "GPUs": 1
  },
  "Dirs": [
    "nvidia",
    ".local"
  ],
  "Args": [
    "main.py"
  ]
}
- Try running the program
ops pkg load eyberg/python:3.10.6 -c config.json -n
Unsurprisingly, something still went wrong:
running local instance
booting /root/.ops/images/python3.10 ...
[0.257582] en1: assigned 10.0.2.15
NVRM _sysCreateOs: RM Access Sys Cap creation failed: 0x56
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 535.113.01 Release Build (circleci@690d85c32591) Wed Nov 27 02:11:20 AM UTC 2024
Loaded the UVM driver, major device number 0.
Traceback (most recent call last):
File "/.local/lib/python3.10/site-packages/numpy/_core/__init__.py", line 23, in <module>
from . import multiarray
File "/.local/lib/python3.10/site-packages/numpy/_core/multiarray.py", line 10, in <module>
from . import overrides
File "/.local/lib/python3.10/site-packages/numpy/_core/overrides.py", line 8, in <module>
from numpy._core._multiarray_umath import (
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/.local/lib/python3.10/site-packages/numpy/__init__.py", line 114, in <module>
from numpy.__config__ import show as show_config
File "/.local/lib/python3.10/site-packages/numpy/__config__.py", line 4, in <module>
from numpy._core._multiarray_umath import (
File "/.local/lib/python3.10/site-packages/numpy/_core/__init__.py", line 49, in <module>
raise ImportError(msg)
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.10 from ""
* The NumPy version is: "2.1.3"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: libstdc++.so.6: cannot open shared object file: No such file or directory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "//main.py", line 1, in <module>
from ultralytics import YOLO
File "/.local/lib/python3.10/site-packages/ultralytics/__init__.py", line 11, in <module>
from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld
File "/.local/lib/python3.10/site-packages/ultralytics/models/__init__.py", line 3, in <module>
from .fastsam import FastSAM
File "/.local/lib/python3.10/site-packages/ultralytics/models/fastsam/__init__.py", line 3, in <module>
from .model import FastSAM
File "/.local/lib/python3.10/site-packages/ultralytics/models/fastsam/model.py", line 5, in <module>
from ultralytics.engine.model import Model
File "/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 7, in <module>
import numpy as np
File "/.local/lib/python3.10/site-packages/numpy/__init__.py", line 119, in <module>
raise ImportError(msg) from e
ImportError: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there.
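Fixing the errors
Missing shared libraries
A close reading of the output above shows the cause: the shared library libstdc++.so.6 is missing, and supplying it fixes the error. Any later error of the form "cannot open shared object file: No such file or directory" likewise means a shared library is missing, and each one has to be supplied in turn.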
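The loop of reading the error, finding the library, and copying it can be partly scripted. The following is a minimal sketch, not part of ops or Nanos: the helper names `missing_shared_libs` and `copy_shared_libs` are hypothetical, and it assumes the boot output has been captured as text. Because Python stops at the first failed import, a single run usually reveals only one missing library, so several rounds may be needed.

```python
import re
import shutil
from pathlib import Path

# Matches the loader message, e.g.
# "ImportError: libstdc++.so.6: cannot open shared object file: ..."
_MISSING_LIB = re.compile(r"(\S+\.so(?:\.\d+)*): cannot open shared object file")


def missing_shared_libs(log_text: str) -> list[str]:
    """Return the distinct library names reported missing, in first-seen order."""
    seen: list[str] = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def copy_shared_libs(names, src_dir, dst_dir) -> list[str]:
    """Copy each named library from src_dir into dst_dir.

    Returns the names that were not found in src_dir.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    not_found = []
    for name in names:
        src = src_dir / name
        if src.is_file():
            # shutil.copy follows symlinks, so the destination is a regular
            # file under the requested name rather than a dangling link.
            shutil.copy(src, dst_dir / name)
        else:
            not_found.append(name)
    return not_found
```

For example, `copy_shared_libs(missing_shared_libs(log), "/usr/lib/x86_64-linux-gnu", "usr/lib")` would copy whatever the captured `log` reports as missing and return the names it could not find in that directory, which then have to be located by hand.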
- Create the directory
mkdir -p usr/lib
- Supply the dependencies: copy the shared libraries listed below into the usr/lib directory just created. Most of them can be found in the system's /usr/lib/x86_64-linux-gnu/ directory.
usr/
└── lib
    ├── libbsd.so.0
    ├── libbz2.so.1.0
    ├── libcuda.so.1
    ├── libexpat.so.1
    ├── libffi.so.7
    ├── libfribidi.so.0
    ├── libgcc_s.so.1
    ├── libGLdispatch.so.0
    ├── libglib-2.0.so.0
    ├── libGL.so.1
    ├── libGLX.so.0
    ├── libgthread-2.0.so.0
    ├── liblzma.so.5
    ├── libmd.so.0
    ├── libnvidia-ml.so.1
    ├── libnvJitLink.so.12
    ├── libpcre2-8.so.0
    ├── libstdc++.so.6
    ├── libutil.so.1
    ├── libuuid.so.1
    ├── libX11.so.6
    ├── libXau.so.6
    ├── libxcb.so.1
    └── libXdmcp.so.6
- Modify the configuration file to map the new directory
"Dirs": [ "nvidia", ".local", "usr" ]
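Insufficient disk space
A "No space left on device" error means the Nanos disk has run out of space. Nanos allocates a fairly small disk by default, so a larger one has to be requested in the configuration file.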
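To choose a size for a different project, it helps to measure the directories that will be mapped into the image. A minimal sketch, assuming the directory names used in this article: `tree_size_bytes` and `mapped_dirs_gib` are hypothetical helpers, and how much headroom to add on top (for downloaded weights and other files written at runtime) is a judgment call.

```python
import os


def tree_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def mapped_dirs_gib(dirs) -> float:
    """Total size in GiB of the listed directories; missing ones count as zero."""
    return sum(tree_size_bytes(d) for d in dirs if os.path.isdir(d)) / 2**30


if __name__ == "__main__":
    print(f"{mapped_dirs_gib(['nvidia', '.local', 'usr']):.2f} GiB")
```

The virtual environment with PyTorch and its CUDA wheels usually accounts for most of the total, so the printed figure gives a lower bound for the volume size.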
Modify the configuration file and add:
"BaseVolumeSz": "6g"
Unsupported operating system
Traceback (most recent call last):
File "//main.py", line 1, in <module>
from ultralytics import YOLO
File "/.local/lib/python3.10/site-packages/ultralytics/__init__.py", line 11, in <module>
from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld
File "/.local/lib/python3.10/site-packages/ultralytics/models/__init__.py", line 3, in <module>
from .fastsam import FastSAM
File "/.local/lib/python3.10/site-packages/ultralytics/models/fastsam/__init__.py", line 3, in <module>
from .model import FastSAM
File "/.local/lib/python3.10/site-packages/ultralytics/models/fastsam/model.py", line 5, in <module>
from ultralytics.engine.model import Model
File "/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 11, in <module>
from ultralytics.cfg import TASK2DATA, get_cfg, get_save_dir
File "/.local/lib/python3.10/site-packages/ultralytics/cfg/__init__.py", line 12, in <module>
from ultralytics.utils import (
File "/.local/lib/python3.10/site-packages/ultralytics/utils/__init__.py", line 817, in <module>
USER_CONFIG_DIR = Path(os.getenv("YOLO_CONFIG_DIR") or get_user_config_dir()) # Ultralytics settings dir
File "/.local/lib/python3.10/site-packages/ultralytics/utils/__init__.py", line 789, in get_user_config_dir
raise ValueError(f"Unsupported operating system: {platform.system()}")
ValueError: Unsupported operating system: Nanos
The traceback shows that the failure occurs in the call to get_user_config_dir. Here is part of that function's source:
if WINDOWS:
    path = Path.home() / "AppData" / "Roaming" / sub_dir
elif MACOS:  # macOS
    path = Path.home() / "Library" / "Application Support" / sub_dir
elif LINUX:
    path = Path.home() / ".config" / sub_dir
else:
    raise ValueError(f"Unsupported operating system: {platform.system()}")
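The core of the function is choosing the settings directory according to the operating system type. In the Nanos unikernel, the uname system call reports the system type as Nanos by default, which triggers the error. There are two ways to fix it.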
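- Modify the configuration file to set the system name returned by the uname system call. Nanos is not Linux, but it runs Linux binaries through a Linux-compatible system call interface, so reporting Linux here is a reasonable way to satisfy the check.
"ManifestPassthrough": {
  "uname": {
    "sysname": "Linux"
  }
}
- Alternatively, the traceback shows that before calling get_user_config_dir, the code first tries to read the settings directory from the YOLO_CONFIG_DIR environment variable. Setting that variable in the configuration file therefore also avoids the error.
"Env": {
  "YOLO_CONFIG_DIR": "/.config"
}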
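How the two fixes interact can be seen in a simplified model of the lookup. This is a sketch of the logic quoted above, not the actual Ultralytics code: `resolve_config_dir` and its parameters are hypothetical, and the "Ultralytics" sub-directory name is an assumption.

```python
from pathlib import Path


def resolve_config_dir(system: str, env: dict, home: Path, sub_dir: str = "Ultralytics") -> Path:
    """Simplified model of Path(os.getenv("YOLO_CONFIG_DIR") or get_user_config_dir())."""
    override = env.get("YOLO_CONFIG_DIR")
    if override:  # `or` short-circuits, so the OS check below never runs
        return Path(override)
    if system == "Windows":
        return home / "AppData" / "Roaming" / sub_dir
    if system == "Darwin":  # macOS
        return home / "Library" / "Application Support" / sub_dir
    if system == "Linux":
        return home / ".config" / sub_dir
    raise ValueError(f"Unsupported operating system: {system}")


if __name__ == "__main__":
    home = Path("/root")
    # Fix 1: uname reports Linux, so the Linux branch is taken.
    print(resolve_config_dir("Linux", {}, home))
    # Fix 2: the environment variable wins regardless of the reported system.
    print(resolve_config_dir("Nanos", {"YOLO_CONFIG_DIR": "/.config"}, home))
    # Neither fix applied: the original error.
    try:
        resolve_config_dir("Nanos", {}, home)
    except ValueError as exc:
        print(exc)
```

Either fix alone is enough. The environment variable has the narrower effect, since it changes only where Ultralytics stores its settings, whereas overriding uname affects every program that inspects the system name.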
Failed to parse processor information from /proc/cpuinfo
Error in cpuinfo: failed to parse processor information from /proc/cpuinfo
Traceback (most recent call last):
File "//main.py", line 3, in <module>
model = YOLO("yolo11n.pt")
File "/.local/lib/python3.10/site-packages/ultralytics/models/yolo/model.py", line 23, in __init__
super().__init__(model=model, task=task, verbose=verbose)
File "/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 145, in __init__
self._load(model, task=task)
File "/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 285, in _load
self.model, self.ckpt = attempt_load_one_weight(weights)
File "/.local/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 912, in attempt_load_one_weight
model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in float
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
File "/.local/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 258, in _apply
self = super()._apply(fn)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
param_applied = fn(param)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in <lambda>
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: Failed to initialize cpuinfo!
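This error is caused by the missing /proc/cpuinfo file. Before running, PyTorch uses the cpuinfo library to query basic CPU information, which is read from /proc/cpuinfo. A unikernel by its nature does not need the proc filesystem, so the file does not exist. We can supply it manually.
- Create the directory and copy the file
mkdir proc
cp /proc/cpuinfo ./proc
- Modify the configuration file to add the mapping
"Dirs": [ "nvidia", ".local", "usr", "proc" ]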
Final result
With the errors above resolved, the project directory looks like this:
.
├── .local
├── config.json
├── klibs
│   └── gpu_nvidia
├── main.py
├── nvidia
│   ├── 535.113.01
│   │   ├── gsp_ga10x.bin
│   │   └── gsp_tu10x.bin
│   └── LICENSE
├── proc
│   └── cpuinfo
└── usr
    └── lib
        ├── libbsd.so.0
        ├── libbz2.so.1.0
        ├── libcuda.so.1
        ├── libexpat.so.1
        ├── libffi.so.7
        ├── ......
The configuration file is as follows:
{
  "KlibDir": "./klibs",
  "Klibs": [
    "gpu_nvidia"
  ],
  "RunConfig": {
    "GPUs": 1
  },
  "Dirs": [
    "nvidia",
    ".local",
    "usr",
    "proc"
  ],
  "Args": [
    "main.py"
  ],
  "BaseVolumeSz": "6g",
  "Env": {
    "YOLO_CONFIG_DIR": "/.config"
  }
}
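Since each boot takes a while, it can save time to check the project layout against the configuration before running. A minimal sketch, not part of ops: `preflight` and `count_processors` are hypothetical helpers, only the keys used in this article are inspected, and the cpuinfo check is a much simpler test than the parsing the real cpuinfo library performs.

```python
import json
from pathlib import Path


def count_processors(cpuinfo_text: str) -> int:
    """Count "processor : N" entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


def preflight(config: dict, root: Path) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    dirs = config.get("Dirs", [])
    for d in dirs:
        if not (root / d).is_dir():
            problems.append(f"mapped directory missing: {d}")
    klib_dir = config.get("KlibDir")
    if klib_dir:  # only check klibs when a local klib directory is configured
        for name in config.get("Klibs", []):
            if not (root / klib_dir / name).exists():
                problems.append(f"klib missing: {name}")
    if "proc" in dirs:
        # The copied cpuinfo is a static snapshot of the host; check it is usable.
        cpuinfo = root / "proc" / "cpuinfo"
        if not cpuinfo.is_file():
            problems.append("proc/cpuinfo missing")
        elif count_processors(cpuinfo.read_text()) == 0:
            problems.append("proc/cpuinfo has no processor entries")
    return problems


if __name__ == "__main__":
    cfg = Path("config.json")
    if cfg.is_file():
        for problem in preflight(json.loads(cfg.read_text()), cfg.parent):
            print(problem)
```

Run from the project root, the script prints nothing when every mapped directory, the klib, and the cpuinfo snapshot are in place.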
Run
ops pkg load eyberg/python:3.10.6 -c config.json -n
running local instance
booting /root/.ops/images/python3.10 ...
[0.263430] en1: assigned 10.0.2.15
NVRM _sysCreateOs: RM Access Sys Cap creation failed: 0x56
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 535.113.01 Release Build (circleci@690d85c32591) Wed Nov 27 02:11:20 AM UTC 2024
Loaded the UVM driver, major device number 0.
[2.106690] en1: assigned FE80::B49A:8BFF:FE75:215C
Creating new Ultralytics Settings v0.0.6 file ✅
View Ultralytics Settings with 'yolo settings' or at '/.config/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt to 'yolo11n.pt'...
100%|██████████| 5.35M/5.35M [00:00<00:00, 9.39MB/s]
Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
100%|██████████| 134k/134k [00:00<00:00, 835kB/s]
image 1/1 /bus.jpg: 640x480 4 persons, 1 bus, 72.3ms
Speed: 1.8ms preprocess, 72.3ms inference, 235.7ms postprocess per image at shape (1, 3, 640, 480)
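The per-stage timings in the last line add up to the end-to-end latency per image. A small sketch of that arithmetic, assuming the "Speed:" line keeps the format shown above; `parse_speed_line` is a hypothetical helper, not an Ultralytics API.

```python
import re


def parse_speed_line(line: str) -> dict[str, float]:
    """Extract the per-stage times in milliseconds from a "Speed:" log line."""
    return {stage: float(ms) for ms, stage in re.findall(r"([\d.]+)ms (\w+)", line)}


if __name__ == "__main__":
    line = (
        "Speed: 1.8ms preprocess, 72.3ms inference, 235.7ms postprocess "
        "per image at shape (1, 3, 640, 480)"
    )
    stages = parse_speed_line(line)
    print(stages, f"total={sum(stages.values()):.1f}ms")
```

For the run above this gives 1.8 + 72.3 + 235.7 = 309.8 ms per image, most of it spent in postprocessing.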