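Notes on building rtp-llm from source
Build and verification environment
The pip wheel and Docker image offered on the official site (https://rtp-llm.ai/build/zh_CN/start/install.html) have some problems: the image registry URL is unreachable, the wheel has to be renamed before it will install, and, possibly because of my environment, some odd problems appeared after installation.
So I built from source instead, which also makes the code easier to read.
I used an A100 machine with the vLLM image vllm-openai:0.8.0 as the base image for building and running rtp-llm. The build process is recorded below.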
gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.4 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-PCIE-40GB On | 00000000:3D:00.0 Off | 0 |
| N/A 31C P0 58W / 250W | 39324MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-PCIE-40GB On | 00000000:3E:00.0 Off | 0 |
| N/A 28C P0 34W / 250W | 3MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-PCIE-40GB On | 00000000:40:00.0 Off | 0 |
| N/A 27C P0 31W / 250W | 3MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-PCIE-40GB On | 00000000:41:00.0 Off | 0 |
| N/A 27C P0 32W / 250W | 3MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-PCIE-40GB On | 00000000:B1:00.0 Off | 0 |
| N/A 28C P0 35W / 250W | 5544MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-PCIE-40GB On | 00000000:B2:00.0 Off | 0 |
| N/A 29C P0 31W / 250W | 27MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-PCIE-40GB On | 00000000:B4:00.0 Off | 0 |
| N/A 29C P0 32W / 250W | 3MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-PCIE-40GB On | 00000000:B5:00.0 Off | 0 |
| N/A 33C P0 55W / 250W | 7MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
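The environment dump above can also be checked programmatically. The following is a minimal sketch, not part of rtp-llm: the helper names parse_gcc_version and parse_cuda_version are hypothetical, and the sample strings are copied from the output recorded above.

```python
from __future__ import annotations

import re


def parse_gcc_version(output: str) -> tuple[int, ...]:
    """Return (major, minor, patch) from the first line of `gcc --version`."""
    first_line = output.splitlines()[0]
    # On the line recorded above, the version is the last token.
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", first_line.split()[-1])
    if match is None:
        raise ValueError(f"unrecognised gcc version line: {first_line!r}")
    return tuple(int(part) for part in match.groups())


def parse_cuda_version(nvidia_smi_output: str) -> str | None:
    """Return the 'CUDA Version' field of the nvidia-smi banner, if present."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", nvidia_smi_output)
    return match.group(1) if match else None


# Sample text taken from the notes above.
SAMPLE_GCC = (
    "gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0\n"
    "Copyright (C) 2021 Free Software Foundation, Inc."
)
SAMPLE_SMI = "| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.4 |"

if __name__ == "__main__":
    print(parse_gcc_version(SAMPLE_GCC))
    print(parse_cuda_version(SAMPLE_SMI))
```

To use it on a live machine, the same functions could be fed the captured output of subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout.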
Build
rtp-llm# bazelisk build //rtp_llm:rtp_llm --verbose_failures --config=cuda12 --test_output=errors --test_env="LOG_LEVEL=INFO" --jobs=64
Install conda Python 3.10
https://repo.anaconda.com/miniconda/
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_25.11.1-1-Linux-x86_64.sh
Install it to /opt/conda310.
ln -s /opt/conda310/bin/python /usr/bin/python
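Alternatively, to avoid conflicts between the system and conda shared libraries (.so), skip conda and point /opt/conda310 at the system Python: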
mkdir -p /opt/conda310/bin
ln -s /usr/bin/python3 /opt/conda310/bin/python
ln -s /usr/include/ /opt/conda310/include
ln -s /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/ /opt/conda310/lib
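The symlink layout above can be expressed as an idempotent helper. This is a sketch under the assumption that the build only needs bin/python, include and lib under the prefix; link_system_python is a hypothetical name, not something rtp-llm provides.

```python
from pathlib import Path


def link_system_python(prefix: Path, python: Path, include: Path, libdir: Path) -> None:
    """Create prefix/bin/python, prefix/include and prefix/lib as symlinks."""
    (prefix / "bin").mkdir(parents=True, exist_ok=True)
    links = {
        prefix / "bin" / "python": python,
        prefix / "include": include,
        prefix / "lib": libdir,
    }
    for link, target in links.items():
        # Leave anything that already exists untouched so reruns are safe.
        if link.is_symlink() or link.exists():
            continue
        link.symlink_to(target)
```

With the paths from these notes the call would be link_system_python(Path("/opt/conda310"), Path("/usr/bin/python3"), Path("/usr/include"), Path("/usr/lib/python3.10/config-3.10-x86_64-linux-gnu")).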
Install cuDNN
https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local
wget https://developer.download.nvidia.com/compute/cudnn/9.19.1/local_installers/cudnn-local-repo-ubuntu2204-9.19.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.19.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-9.19.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn9-cuda-12
It is installed at:
/usr/include/x86_64-linux-gnu/cudnn.h
ls /usr/lib/x86_64-linux-gnu/libcu
cp /usr/include/x86_64-linux-gnu/cudnn* /usr/local/cuda/include/
cp /usr/lib/x86_64-linux-gnu/libcudnn* /usr/local/cuda/lib64/
NCCL dependency
find /usr -name "nccl.h"
/usr/include/nccl.h
/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/cuda/nccl.h
/usr/local/lib/python3.10/dist-packages/nvidia/nccl/include/nccl.h
apt install libnccl2 libnccl-dev
cp /usr/include/nccl.h /usr/local/cuda/include/
cp /usr/lib/x86_64-linux-gnu/libnccl* /usr/local/cuda/lib64/
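Both the cuDNN and NCCL steps copy headers and libraries from the distro locations into the CUDA toolkit directory. A small sketch of that copy step follows; copy_matching is a hypothetical helper, and like plain cp it follows symlinks and skips directories.

```python
from __future__ import annotations

import glob
import shutil
from pathlib import Path


def copy_matching(pattern: str, dest_dir: Path) -> list[Path]:
    """Copy every regular file matching the glob pattern into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(pattern)):
        if Path(src).is_file():
            # shutil.copy2 returns the path of the newly written file.
            copied.append(Path(shutil.copy2(src, dest_dir)))
    return copied
```

For example, copy_matching("/usr/include/x86_64-linux-gnu/cudnn*", Path("/usr/local/cuda/include")) mirrors the first cuDNN cp command above.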
libtinfo.so.6
/usr/bin/bash: /opt/conda310/lib/libtinfo.so.6: no version information available (required by /usr/bin/bash)
Fix:
cp -rf /lib/x86_64-linux-gnu/libtinfo.* /opt/conda310/lib/
Bazel cannot reach the dependency download mirror
https://alibaba-aios-dev.oss-cn-hangzhou.aliyuncs.com/build/download_rewrite/
Empty this file:
workspace/rtp-llm/bazel/bazel_downloader.cfg
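Emptying the downloader config discards the original rewrite rules, so it may be worth keeping a copy first. A minimal sketch, assuming a .bak file next to the original is acceptable; empty_with_backup is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def empty_with_backup(cfg: Path) -> Path:
    """Save cfg as cfg.bak, then truncate cfg to zero bytes."""
    backup = cfg.with_name(cfg.name + ".bak")
    shutil.copy2(cfg, backup)
    cfg.write_text("")
    return backup
```

It would be called as empty_with_backup(Path("bazel/bazel_downloader.cfg")) from the repository root.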
error
bazelisk build //rtp_llm:rtp_llm --verbose_failures --config=cuda --test_output=errors --test_env="LOG_LEVEL=INFO" --jobs=64
ERROR: /workspace/ep/rtp-llm/rtp_llm/BUILD:64:12: no such package '@pip_cpu_torch//deep_gemm': BUILD file not found in directory 'deep_gemm' of external repository @pip_cpu_torch. Add a BUILD file to a directory to mark it as a package. and referenced by '//rtp_llm:deep_gemm'
ERROR: Analysis of target '//rtp_llm:rtp_llm' failed; build aborted:
INFO: Elapsed time: 0.118s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 29 targets configured)
Fetching ...itory @cutlass; Cloning 80243e0b8c644f281e2beb0c20fe78cf7b267061 of https://github.com/NVIDIA/cutlass.git
Fetching repository @pip_cpu_torch_lru_dict; Restarting.
Fetching repository @pip_cpu_torch_torch; Restarting.
Fetching repository @pip_cpu_torch_psutil; Restarting.
Fetching repository @torch_2.1_py310_cpu; starting
Fetching repository @cutlass_h_moe; starting
Fetching repository @pip_cpu_torch_concurrent_log_handler; Restarting.
Fetching repository @pip_cpu_torch_cpm_kernels; Restarting. ... (11 fetches)
Change the build command to:
bazelisk build //rtp_llm:rtp_llm --verbose_failures --config=cuda12 --test_output=errors --test_env="LOG_LEVEL=INFO" --jobs=64
With --config=cuda12 instead of --config=cuda, the build downloads the CUDA torch rather than the CPU torch.
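Since the only difference between the failing and working invocations is the --config value, it can help to build the command from one place. This is a sketch only; bazelisk_build_command is a hypothetical wrapper, and the flags are exactly those used in these notes.

```python
from __future__ import annotations

import shlex


def bazelisk_build_command(config: str = "cuda12", jobs: int = 64) -> list[str]:
    """Return the argument list for the rtp-llm build used in these notes."""
    return [
        "bazelisk", "build", "//rtp_llm:rtp_llm",
        "--verbose_failures",
        f"--config={config}",  # "cuda" pulled the CPU torch; "cuda12" did not
        "--test_output=errors",
        "--test_env=LOG_LEVEL=INFO",
        f"--jobs={jobs}",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command line.
    print(shlex.join(bazelisk_build_command()))
```

The list form can be passed directly to subprocess.run(cmd, check=True) from the repository root.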
gcc: warning: ‘-mcpu=’ is deprecated; use ‘-mtune=’ or ‘-march=’ instead
ERROR: /root/.cache/bazel/_bazel_root/035fa6624319d6400171ad427c1090b6/external/grpc/BUILD:1386:16: Compiling src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @grpc//:grpc_lb_policy_pick_first)
(cd /root/.cache/bazel/_bazel_root/035fa6624319d6400171ad427c1090b6/execroot/rtp_llm && \
exec env - \
CUDA_TOOLKIT_PATH=/usr/local/cuda/ \
CUDNN_INSTALL_PATH=/usr/local/cuda/ \
LD_LIBRARY_PATH='/lib64:/opt/conda310/lib/:/usr/local/cuda/compat/:/usr/local/nvidia/lib64:/usr/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs/:/usr/local/cuda/extras/CUPTI/lib64/:$LD_LIBRARY_PATH' \
LIBRARY_PATH=/lib64:/opt/conda310/lib/ \
NCCL_HDR_PATH=/usr/local/cuda/include \
NCCL_INSTALL_PATH=/usr/local/cuda/ \
PATH=/root/.cache/bazelisk/downloads/sha256/79e4f370efa6e31717b486af5d9efd95864d0ef13da138582224ac9b2a1bad86/bin:/root/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
PWD=/proc/self/cwd \
PYTHON_BIN_PATH=/opt/conda310/bin/python3 \
TF_CUDA_CLANG=0 \
TF_CUDA_COMPUTE_CAPABILITIES=7.0,7.5,8.0,8.6,8.9,9.0 \
TF_CUDA_PATHS=/usr/local/cuda/ \
TF_CUDA_VERSION=12.4 \
TF_NCCL_VERSION=2 \
TF_NEED_CUDA=1 \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/external/grpc/_objs/grpc_lb_policy_pick_first/pick_first.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/grpc/_objs/grpc_lb_policy_pick_first/pick_first.pic.o' '-DGRPC_ARES=0' '-DPB_FIELD_32BIT=1' '-DBAZEL_CURRENT_REPOSITORY="grpc"' -iquote external/grpc -iquote bazel-out/k8-opt/bin/external/grpc -iquote external/zlib_archive -iquote bazel-out/k8-opt/bin/external/zlib_archive -iquote external/com_github_nanopb_nanopb -iquote bazel-out/k8-opt/bin/external/com_github_nanopb_nanopb -isystem external/grpc/include -isystem bazel-out/k8-opt/bin/external/grpc/include -isystem external/zlib_archive -isystem bazel-out/k8-opt/bin/external/zlib_archive -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections '-DGTEST_USE_OWN_TR1_TUPLE=0' '-DEIGEN_MAX_CPP_VER=11' -O2 -g -Wall -Werror -Wno-unknown-pragmas -Wno-sign-compare -Wno-attributes -Wno-stringop-truncation -Wno-stringop-overflow -Wno-maybe-uninitialized -Wno-format-overflow -Wno-deprecated-declarations -DOPENSSL_IS_BORINGSSL '-DENABLE_BF16=1' '-DBUILD_CUTLASS_MIXED_GEMM=ON' -DC10_CUDA_NO_CMAKE_CONFIGURE_FILE '-DUSING_CUDA=1' '-DUSING_CUDA12=1' '-DUSE_OLD_TRT_FMHA=1' '-DFMHA_SUPPORT_SPLIT=1' '-DENABLE_FP8=1' '-std=c++17' -Wno-class-memaccess -c external/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc -o bazel-out/k8-opt/bin/external/grpc/_objs/grpc_lb_policy_pick_first/pick_first.pic.o)
# Configuration: 66a6ce2341b6b86f2282b848f79e99ce287c8dc74868cde0308708452dedcb65
# Execution platform: @local_config_platform//:host
In file included from external/grpc/src/core/ext/filters/client_channel/lb_policy_registry.h:24,
from external/grpc/src/core/ext/filters/client_channel/lb_policy/subchannel_list.h:28,
from external/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:25:
external/grpc/src/core/ext/filters/client_channel/lb_policy_factory.h: In member function ‘virtual grpc_core::OrphanablePtr<grpc_core::LoadBalancingPolicy> grpc_core::LoadBalancingPolicyFactory::CreateLoadBalancingPolicy(grpc_core::LoadBalancingPolicy::Args) const’:
external/grpc/src/core/ext/filters/client_channel/lb_policy_factory.h:35:14: error: ignoring return value of ‘constexpr typename std::remove_reference<_Tp>::type&& std::move(_Tp&&) [with _Tp = grpc_core::LoadBalancingPolicy::Args&; typename std::remove_reference<_Tp>::type = grpc_core::LoadBalancingPolicy::Args]’, declared with attribute ‘nodiscard’ [-Werror=unused-result]
35 | std::move(args); // Suppress clang-tidy complaint.
| ~~~~~~~~~^~~~~~
In file included from /usr/include/c++/11/bits/stl_pair.h:59,
from /usr/include/c++/11/bits/stl_algobase.h:64,
from /usr/include/c++/11/memory:63,
from external/grpc/src/core/lib/gprpp/memory.h:27,
from external/grpc/src/core/lib/gprpp/inlined_vector.h:27,
from external/grpc/src/core/lib/iomgr/call_combiner.h:29,
from external/grpc/src/core/lib/channel/channel_stack.h:46,
from external/grpc/src/core/ext/filters/client_channel/client_channel_channelz.h:25,
from external/grpc/src/core/ext/filters/client_channel/lb_policy.h:24,
from external/grpc/src/core/ext/filters/client_channel/lb_policy_factory.h:24,
from external/grpc/src/core/ext/filters/client_channel/lb_policy_registry.h:24,
from external/grpc/src/core/ext/filters/client_channel/lb_policy/subchannel_list.h:28,
from external/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:25:
/usr/include/c++/11/bits/move.h:104:5: note: declared here
104 | move(_Tp&& __t) noexcept
| ^~~~
cc1plus: all warnings being treated as errors
Target //rtp_llm:rtp_llm failed to build
INFO: Elapsed time: 8.088s, Critical Path: 7.22s
INFO: 5247 processes: 4995 internal, 252 local.
FAILED: Build did NOT complete successfully
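One option is to edit the .bazelrc file in the project root and relax the warnings per external repository:
# ... existing configurations ...
# For all files in the grpc repository, do not treat unused-result as an error
build --per_file_copt=external/grpc/.*@-Wno-error=unused-result
# For all files in the grpc repository, do not treat tautological-compare as an error (newly added)
build --per_file_copt=external/grpc/.*@-Wno-error=tautological-compare
# For all files in the boringssl repository, do not treat array-parameter as an error
build --per_file_copt=external/boringssl/.*@-Wno-error=array-parameter
# For all files in the boringssl repository, do not treat any warning as an error
build --per_file_copt=external/boringssl/.*@-Wno-error
That is too many changes. To get the build passing first, delete the line build --copt -Werror instead.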
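Removing the -Werror line can be scripted so the change is easy to repeat on a fresh checkout. A minimal sketch, assuming the option appears on its own line as either "build --copt -Werror" or "build --copt=-Werror"; drop_werror_copt is a hypothetical helper, and commented-out lines are left alone.

```python
def drop_werror_copt(bazelrc_text: str) -> str:
    """Return bazelrc_text without lines that are exactly `build --copt -Werror`."""
    kept = []
    for line in bazelrc_text.splitlines():
        tokens = line.split()
        is_werror = tokens[:1] == ["build"] and tokens[1:] in (
            ["--copt", "-Werror"],
            ["--copt=-Werror"],
        )
        if not is_werror:
            kept.append(line)
    result = "\n".join(kept)
    # Preserve the trailing newline if the input had one.
    if bazelrc_text.endswith("\n") and result:
        result += "\n"
    return result
```

Applied to the real file this would be Path(".bazelrc").write_text(drop_werror_copt(Path(".bazelrc").read_text())), ideally after making a backup.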
Handling build artifacts after a successful build
root@e7dcd907c6fe:/workspace/ep/rtp-llm# ls bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py
bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py
root@e7dcd907c6fe:/workspace/ep/rtp-llm# ls bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2_grpc.py
bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2_grpc.py
ln -sf `pwd`/bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2_grpc.py `pwd`/rtp_llm/cpp/model_rpc/proto/
ln -sf `pwd`/bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py `pwd`/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py
python3 -m rtp_llm.start_server -h
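The two ln -sf commands above make the generated gRPC stubs importable from the source tree. The same step as a rerunnable sketch follows; link_generated_protos is a hypothetical helper, and the file names and directories are the ones shown above.

```python
from __future__ import annotations

from pathlib import Path

GENERATED = ("model_rpc_service_pb2.py", "model_rpc_service_pb2_grpc.py")
PROTO_DIR = Path("rtp_llm/cpp/model_rpc/proto")


def link_generated_protos(repo_root: Path, bazel_bin: Path) -> list[Path]:
    """Symlink the generated *_pb2 files from bazel_bin into the source tree."""
    links = []
    for name in GENERATED:
        target = (bazel_bin / PROTO_DIR / name).resolve()
        link = repo_root / PROTO_DIR / name
        # Replace an existing link or file, like `ln -sf`.
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(target)
        links.append(link)
    return links
```

From the repository root the call would be link_generated_protos(Path.cwd(), Path.cwd() / "bazel-out/k8-opt/bin").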
Download the models:
pip install huggingface_hub
huggingface-cli download Qwen/Qwen1.5-0.5B-Chat --local-dir ./qwen
huggingface-cli download Qwen/Qwen1.5-MoE-A2.7B --local-dir ./Qwen1.5-MoE-A2.7B
python3 -m rtp_llm.start_server --checkpoint_path=../models/qwen --model_type=qwen_2 --start_port=30000
File "/workspace/ep/rtp-llm/rtp_llm/ops/compute_ops.py", line 1, in <module>
from librtp_compute_ops import *
ImportError: /workspace/ep/rtp-llm/rtp_llm/../bazel-bin/librtp_compute_ops.so: undefined symbol: _ZStlsIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_NSt6thread2idE
One attempted fix was to add -lpthread to the linkopts of this target:
cc_library(
name = "deep_gemm_utils",
hdrs = ["utils.h", "JIT.h"],
srcs = ["JIT.cc"],
copts = copts(),
linkopts = ["-lpthread"], # add this line
deps = [
"//rtp_llm/cpp/utils:core_utils",
"@boringssl//:ssl",
],
visibility = ["//visibility:public"],
)
The problem was still there:
find / -name "libstdc++.so*"
/opt/nvidia/nsight-compute/2024.1.0/host/linux-desktop-glibc_2_11_3-x64/libstdc++.so.6
/opt/conda310/lib/libstdc++.so.6.0.34
/opt/conda310/lib/libstdc++.so
/opt/conda310/lib/libstdc++.so.6
/opt/conda310/pkgs/libstdcxx-15.2.0-h39759b7_7/lib/libstdc++.so.6.0.34
/opt/conda310/pkgs/libstdcxx-15.2.0-h39759b7_7/lib/libstdc++.so
/opt/conda310/pkgs/libstdcxx-15.2.0-h39759b7_7/lib/libstdc++.so.6
/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30-gdb.py
/usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
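ldd shows that the library is linked against /usr/lib/x86_64-linux-gnu/libstdc++.so.6, whereas the build probably used the libstdc++ in /opt/conda310/lib. One option is to put the conda directory first on the library search path: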
export LD_LIBRARY_PATH=/opt/conda310/lib:$LD_LIBRARY_PATH
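To see which of the libstdc++ copies found above even mention the missing symbol, a quick byte search is enough for a first pass. This is a heuristic sketch, not an authoritative check: it cannot tell a defined symbol from an undefined reference (nm -D --defined-only on each file is the reliable way), and libraries_mentioning is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path

# Mangled name copied from the ImportError above.
MISSING_SYMBOL = (
    b"_ZStlsIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_NSt6thread2idE"
)


def libraries_mentioning(symbol: bytes, candidates: list[Path]) -> list[Path]:
    """Return the candidate files whose raw bytes contain the symbol name."""
    return [p for p in candidates if p.is_file() and symbol in p.read_bytes()]
```

A typical call would pass the two real candidates, Path("/opt/conda310/lib/libstdc++.so.6") and Path("/usr/lib/x86_64-linux-gnu/libstdc++.so.6").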
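Alternatively, avoid the conflict between system and conda shared libraries altogether by removing conda and pointing /opt/conda310 at the system Python: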
rm -rf /opt/conda310
# Recreate /opt/conda310 from system paths, then rebuild
mkdir -p /opt/conda310/bin
ln -s /usr/bin/python3 /opt/conda310/bin/python
ln -s /usr/include/ /opt/conda310/include
ln -s /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/ /opt/conda310/lib
Verification
Start the server:
python3 -m rtp_llm.start_server --checkpoint_path=../models/qwen --model_type=qwen_2 --start_port=30000
cat client.py
import requests

port = 30000
url = f"http://localhost:{port}/v1/chat/completions"
json_data = {
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of China?"
        }
    ]
}
response = requests.post(url, json=json_data)
print(f"Output 0: {response.json()}")
Server-side output:
initLogger log_file_path: /workspace/ep/rtp-llm/rtp_llm/config/alog.conf
/workspace/ep/rtp-llm/rtp_llm/utils/grpc_util.py:35: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_new.cpp:1561.)
return torch.frombuffer(t.int32_data, dtype=torch.int32).reshape(list(t.shape))
The client prints the returned response; the answer is: The capital of China is Beijing.
Output 0: {'id': 'chat-', 'object': 'chat.completion', 'created': 1772370776, 'model': '', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'The capital of China is Beijing.', 'partial': False}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 26, 'total_tokens': 34, 'completion_tokens': 8}, 'aux_info': {'cost_time': 23.463, 'iter_count': 8, 'prefix_len': 0, 'input_len': 26, 'output_len': 8, 'step_output_len': 8, 'first_token_cost_time': 3.398, 'wait_time': 0.077, 'pd_sep': False, 'cum_log_probs': [], 'beam_responses': [], 'softmax_probs': [], 'reuse_len': 0, 'local_reuse_len': 0, 'remote_reuse_len': 0, 'memory_reuse_len': 0, 'prefill_total_reuse_len': 0, 'prefill_local_reuse_len': 0, 'prefill_remote_reuse_len': 0, 'prefill_memory_reuse_len': 0, 'decode_total_reuse_len': 0, 'decode_local_reuse_len': 0, 'decode_remote_reuse_len': 0, 'decode_memory_reuse_len': 0, 'role_addrs': [], 'aux_string': ''}}
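The raw response is long, so for a quick check it is convenient to pull out only the answer and the token counts. A minimal sketch, assuming the response keeps the shape printed above; summarize_response is a hypothetical helper, and SAMPLE is a trimmed copy of that output.

```python
def summarize_response(body: dict) -> dict:
    """Extract the first choice's text and the token usage from a response."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Trimmed copy of the response recorded above.
SAMPLE = {
    "id": "chat-",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of China is Beijing.",
                "partial": False,
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 26, "total_tokens": 34, "completion_tokens": 8},
}

if __name__ == "__main__":
    print(summarize_response(SAMPLE))
```

In client.py this would be applied as summarize_response(response.json()).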