Notes on building rtp-llm from source

Build and verification environment

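The pip wheel and Docker image offered on the official site (https://rtp-llm.ai/build/zh_CN/start/install.html) both gave me trouble. The image registry URL was unreachable, the wheel had to be renamed before it would install, and, possibly because of my environment, the installed package still showed some odd problems.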

So I built from source instead, which also makes it easier to read the code.

The machine has A100 GPUs. I used a vLLM image, vllm-openai:0.8.0, as the base image for building and running rtp-llm. The build process is recorded below.

gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.4 LTS"

nvidia-smi     
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  | 00000000:3D:00.0 Off |                    0 |
| N/A   31C    P0              58W / 250W |  39324MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          On  | 00000000:3E:00.0 Off |                    0 |
| N/A   28C    P0              34W / 250W |      3MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          On  | 00000000:40:00.0 Off |                    0 |
| N/A   27C    P0              31W / 250W |      3MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          On  | 00000000:41:00.0 Off |                    0 |
| N/A   27C    P0              32W / 250W |      3MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          On  | 00000000:B1:00.0 Off |                    0 |
| N/A   28C    P0              35W / 250W |   5544MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          On  | 00000000:B2:00.0 Off |                    0 |
| N/A   29C    P0              31W / 250W |     27MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          On  | 00000000:B4:00.0 Off |                    0 |
| N/A   29C    P0              32W / 250W |      3MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          On  | 00000000:B5:00.0 Off |                    0 |
| N/A   33C    P0              55W / 250W |      7MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Build

rtp-llm# bazelisk build //rtp_llm:rtp_llm --verbose_failures --config=cuda12 --test_output=errors --test_env="LOG_LEVEL=INFO"  --jobs=64

Installing conda310 Python

https://repo.anaconda.com/miniconda/

wget https://repo.anaconda.com/miniconda/Miniconda3-py310_25.11.1-1-Linux-x86_64.sh
Install it under /opt/conda310.

ln -s /opt/conda310/bin/python /usr/bin/python

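Alternatively, to avoid conflicts between the system's shared libraries and conda's, skip the conda install and point /opt/conda310 at the system Python: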

mkdir -p /opt/conda310/bin
ln -s /usr/bin/python3 /opt/conda310/bin/python
ln -s /usr/include/ /opt/conda310/include
ln -s /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/ /opt/conda310/lib

Installing cuDNN

https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

wget https://developer.download.nvidia.com/compute/cudnn/9.19.1/local_installers/cudnn-local-repo-ubuntu2204-9.19.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.19.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-9.19.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn9-cuda-12

The packages install to:

/usr/include/x86_64-linux-gnu/cudnn.h
ls /usr/lib/x86_64-linux-gnu/libcu

 cp /usr/include/x86_64-linux-gnu/cudnn* /usr/local/cuda/include/
 cp /usr/lib/x86_64-linux-gnu/libcudnn* /usr/local/cuda/lib64/

NCCL dependency

find /usr -name "nccl.h"
/usr/include/nccl.h
/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/cuda/nccl.h
/usr/local/lib/python3.10/dist-packages/nvidia/nccl/include/nccl.h

apt install libnccl2 libnccl-dev

cp /usr/include/nccl.h /usr/local/cuda/include/
cp /usr/lib/x86_64-linux-gnu/libnccl* /usr/local/cuda/lib64/
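
Before starting a long build it may help to confirm that the copied cuDNN and NCCL files are where the build expects them. The following is a minimal sketch, assuming the build only looks under /usr/local/cuda (as the CUDNN_INSTALL_PATH and NCCL_INSTALL_PATH values in the build log in these notes suggest). The pattern list and the function name missing_build_inputs are my own illustration, not part of rtp-llm.

```python
from pathlib import Path

# Glob patterns relative to the CUDA root. The exact set the build
# needs is an assumption based on the copy commands in these notes.
REQUIRED_PATTERNS = [
    "include/cudnn*.h",
    "lib64/libcudnn*",
    "include/nccl.h",
    "lib64/libnccl*",
]


def missing_build_inputs(cuda_root, patterns=REQUIRED_PATTERNS):
    """Return the patterns that match no file under cuda_root."""
    root = Path(cuda_root)
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    missing = missing_build_inputs("/usr/local/cuda")
    print("missing:", missing if missing else "nothing")
```

An empty list means every pattern matched at least one file. It does not prove the versions are compatible.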

libtinfo.so.6

/usr/bin/bash: /opt/conda310/lib/libtinfo.so.6: no version information available (required by /usr/bin/bash)

Fix:

cp -rf /lib/x86_64-linux-gnu/libtinfo.* /opt/conda310/lib/

Bazel cannot download dependencies

https://alibaba-aios-dev.oss-cn-hangzhou.aliyuncs.com/build/download_rewrite/

Empty this file:

workspace/rtp-llm/bazel/bazel_downloader.cfg
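
Emptying the file can be done by hand, but a small script makes the change easy to undo. This is a minimal sketch. Keeping a .bak copy is my own addition and not something the notes did, and the function name empty_with_backup is hypothetical.

```python
import shutil
from pathlib import Path


def empty_with_backup(cfg_path):
    """Truncate cfg_path to zero bytes, keeping a copy next to it as <name>.bak."""
    cfg = Path(cfg_path)
    backup = cfg.with_name(cfg.name + ".bak")
    shutil.copy2(cfg, backup)  # preserve the original rewrite rules
    cfg.write_text("")         # an empty file leaves no rewrite rules
    return backup


if __name__ == "__main__":
    # Path as given in these notes, relative to the current directory.
    print(empty_with_backup("workspace/rtp-llm/bazel/bazel_downloader.cfg"))
```

To restore the original behaviour, copy the .bak file back over the config.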

Build errors

bazelisk build //rtp_llm:rtp_llm --verbose_failures --config=cuda --test_output=errors --test_env="LOG_LEVEL=INFO"  --jobs=64
ERROR: /workspace/ep/rtp-llm/rtp_llm/BUILD:64:12: no such package '@pip_cpu_torch//deep_gemm': BUILD file not found in directory 'deep_gemm' of external repository @pip_cpu_torch. Add a BUILD file to a directory to mark it as a package. and referenced by '//rtp_llm:deep_gemm'
ERROR: Analysis of target '//rtp_llm:rtp_llm' failed; build aborted: 
INFO: Elapsed time: 0.118s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 29 targets configured)
    Fetching ...itory @cutlass; Cloning 80243e0b8c644f281e2beb0c20fe78cf7b267061 of https://github.com/NVIDIA/cutlass.git
    Fetching repository @pip_cpu_torch_lru_dict; Restarting.
    Fetching repository @pip_cpu_torch_torch; Restarting.
    Fetching repository @pip_cpu_torch_psutil; Restarting.
    Fetching repository @torch_2.1_py310_cpu; starting
    Fetching repository @cutlass_h_moe; starting
    Fetching repository @pip_cpu_torch_concurrent_log_handler; Restarting.
    Fetching repository @pip_cpu_torch_cpm_kernels; Restarting. ... (11 fetches)

Change the build command to use --config=cuda12:

bazelisk build //rtp_llm:rtp_llm --verbose_failures --config=cuda12 --test_output=errors --test_env="LOG_LEVEL=INFO"  --jobs=64

With this, Bazel downloads the CUDA build of torch instead of the CPU build.

gcc: warning: ‘-mcpu=’ is deprecated; use ‘-mtune=’ or ‘-march=’ instead
ERROR: /root/.cache/bazel/_bazel_root/035fa6624319d6400171ad427c1090b6/external/grpc/BUILD:1386:16: Compiling src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @grpc//:grpc_lb_policy_pick_first) 
  (cd /root/.cache/bazel/_bazel_root/035fa6624319d6400171ad427c1090b6/execroot/rtp_llm && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda/ \
    CUDNN_INSTALL_PATH=/usr/local/cuda/ \
    LD_LIBRARY_PATH='/lib64:/opt/conda310/lib/:/usr/local/cuda/compat/:/usr/local/nvidia/lib64:/usr/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs/:/usr/local/cuda/extras/CUPTI/lib64/:$LD_LIBRARY_PATH' \
    LIBRARY_PATH=/lib64:/opt/conda310/lib/ \
    NCCL_HDR_PATH=/usr/local/cuda/include \
    NCCL_INSTALL_PATH=/usr/local/cuda/ \
    PATH=/root/.cache/bazelisk/downloads/sha256/79e4f370efa6e31717b486af5d9efd95864d0ef13da138582224ac9b2a1bad86/bin:/root/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/opt/conda310/bin/python3 \
    TF_CUDA_CLANG=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=7.0,7.5,8.0,8.6,8.9,9.0 \
    TF_CUDA_PATHS=/usr/local/cuda/ \
    TF_CUDA_VERSION=12.4 \
    TF_NCCL_VERSION=2 \
    TF_NEED_CUDA=1 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/external/grpc/_objs/grpc_lb_policy_pick_first/pick_first.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/grpc/_objs/grpc_lb_policy_pick_first/pick_first.pic.o' '-DGRPC_ARES=0' '-DPB_FIELD_32BIT=1' '-DBAZEL_CURRENT_REPOSITORY="grpc"' -iquote external/grpc -iquote bazel-out/k8-opt/bin/external/grpc -iquote external/zlib_archive -iquote bazel-out/k8-opt/bin/external/zlib_archive -iquote external/com_github_nanopb_nanopb -iquote bazel-out/k8-opt/bin/external/com_github_nanopb_nanopb -isystem external/grpc/include -isystem bazel-out/k8-opt/bin/external/grpc/include -isystem external/zlib_archive -isystem bazel-out/k8-opt/bin/external/zlib_archive -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections '-DGTEST_USE_OWN_TR1_TUPLE=0' '-DEIGEN_MAX_CPP_VER=11' -O2 -g -Wall -Werror -Wno-unknown-pragmas -Wno-sign-compare -Wno-attributes -Wno-stringop-truncation -Wno-stringop-overflow -Wno-maybe-uninitialized -Wno-format-overflow -Wno-deprecated-declarations -DOPENSSL_IS_BORINGSSL '-DENABLE_BF16=1' '-DBUILD_CUTLASS_MIXED_GEMM=ON' -DC10_CUDA_NO_CMAKE_CONFIGURE_FILE '-DUSING_CUDA=1' '-DUSING_CUDA12=1' '-DUSE_OLD_TRT_FMHA=1' '-DFMHA_SUPPORT_SPLIT=1' '-DENABLE_FP8=1' '-std=c++17' -Wno-class-memaccess -c external/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc -o bazel-out/k8-opt/bin/external/grpc/_objs/grpc_lb_policy_pick_first/pick_first.pic.o)
# Configuration: 66a6ce2341b6b86f2282b848f79e99ce287c8dc74868cde0308708452dedcb65
# Execution platform: @local_config_platform//:host
In file included from external/grpc/src/core/ext/filters/client_channel/lb_policy_registry.h:24,
                 from external/grpc/src/core/ext/filters/client_channel/lb_policy/subchannel_list.h:28,
                 from external/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:25:
external/grpc/src/core/ext/filters/client_channel/lb_policy_factory.h: In member function ‘virtual grpc_core::OrphanablePtr<grpc_core::LoadBalancingPolicy> grpc_core::LoadBalancingPolicyFactory::CreateLoadBalancingPolicy(grpc_core::LoadBalancingPolicy::Args) const’:
external/grpc/src/core/ext/filters/client_channel/lb_policy_factory.h:35:14: error: ignoring return value of ‘constexpr typename std::remove_reference<_Tp>::type&& std::move(_Tp&&) [with _Tp = grpc_core::LoadBalancingPolicy::Args&; typename std::remove_reference<_Tp>::type = grpc_core::LoadBalancingPolicy::Args]’, declared with attribute ‘nodiscard’ [-Werror=unused-result]
   35 |     std::move(args);  // Suppress clang-tidy complaint.
      |     ~~~~~~~~~^~~~~~
In file included from /usr/include/c++/11/bits/stl_pair.h:59,
                 from /usr/include/c++/11/bits/stl_algobase.h:64,
                 from /usr/include/c++/11/memory:63,
                 from external/grpc/src/core/lib/gprpp/memory.h:27,
                 from external/grpc/src/core/lib/gprpp/inlined_vector.h:27,
                 from external/grpc/src/core/lib/iomgr/call_combiner.h:29,
                 from external/grpc/src/core/lib/channel/channel_stack.h:46,
                 from external/grpc/src/core/ext/filters/client_channel/client_channel_channelz.h:25,
                 from external/grpc/src/core/ext/filters/client_channel/lb_policy.h:24,
                 from external/grpc/src/core/ext/filters/client_channel/lb_policy_factory.h:24,
                 from external/grpc/src/core/ext/filters/client_channel/lb_policy_registry.h:24,
                 from external/grpc/src/core/ext/filters/client_channel/lb_policy/subchannel_list.h:28,
                 from external/grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc:25:
/usr/include/c++/11/bits/move.h:104:5: note: declared here
  104 |     move(_Tp&& __t) noexcept
      |     ^~~~
cc1plus: all warnings being treated as errors
Target //rtp_llm:rtp_llm failed to build
INFO: Elapsed time: 8.088s, Critical Path: 7.22s
INFO: 5247 processes: 4995 internal, 252 local.
FAILED: Build did NOT complete successfully

One fix is to edit the .bazelrc file in the project root:

# ... existing configurations ...

# For all files in the grpc repository, do not treat unused-result as an error
build --per_file_copt=external/grpc/.*@-Wno-error=unused-result
# For all files in the grpc repository, do not treat tautological-compare as an error (newly added)
build --per_file_copt=external/grpc/.*@-Wno-error=tautological-compare

# For all files in the boringssl repository, do not treat array-parameter as an error
build --per_file_copt=external/boringssl/.*@-Wno-error=array-parameter

# For all files in the boringssl repository, do not treat any warning as an error
build --per_file_copt=external/boringssl/.*@-Wno-error

That is too many changes. To get the build passing first, I simply deleted the build --copt -Werror line.
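
Deleting the -Werror option can also be scripted. The following is a minimal sketch, assuming the option sits on its own line as build --copt -Werror or build --copt=-Werror. The function name drop_werror_lines is hypothetical. Lines that only relax specific warnings, such as the -Wno-error per-file options above, are left alone.

```python
import shlex


def drop_werror_lines(bazelrc_text):
    """Return bazelrc_text without the lines that pass a bare -Werror."""
    kept = []
    for line in bazelrc_text.splitlines():
        try:
            # comments=True makes shlex ignore everything after '#'.
            tokens = shlex.split(line, comments=True)
        except ValueError:
            tokens = []  # unparseable line: keep it untouched
        if any(t == "-Werror" or t.endswith("=-Werror") for t in tokens):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    with open(".bazelrc") as f:
        print(drop_werror_lines(f.read()), end="")
```

The script prints the filtered file instead of overwriting it, so the result can be reviewed before replacing .bazelrc.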

Handling build artifacts after a successful build

root@e7dcd907c6fe:/workspace/ep/rtp-llm# ls bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py 
bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py
root@e7dcd907c6fe:/workspace/ep/rtp-llm# ls bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2_grpc.py 
bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2_grpc.py

ln  -sf `pwd`/bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2_grpc.py  `pwd`/rtp_llm/cpp/model_rpc/proto/
ln  -sf `pwd`/bazel-out/k8-opt/bin/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py  `pwd`/rtp_llm/cpp/model_rpc/proto/model_rpc_service_pb2.py
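
The two ln -sf commands can be expressed as a repeatable script. This is a minimal sketch that mirrors them: it links the generated protobuf stubs from bazel-out/k8-opt/bin into the source tree and replaces any existing link. The function name link_generated_stubs is my own, and the k8-opt output directory is taken from the paths shown above.

```python
from pathlib import Path

PROTO_DIR = Path("rtp_llm/cpp/model_rpc/proto")
GENERATED = ["model_rpc_service_pb2.py", "model_rpc_service_pb2_grpc.py"]


def link_generated_stubs(repo_root, bazel_bin="bazel-out/k8-opt/bin"):
    """Symlink the generated gRPC stubs into the source tree (like ln -sf)."""
    root = Path(repo_root).resolve()
    links = []
    for name in GENERATED:
        target = root / bazel_bin / PROTO_DIR / name
        if not target.is_file():
            raise FileNotFoundError(target)
        link = root / PROTO_DIR / name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link or file, as -f does
        link.symlink_to(target)
        links.append(link)
    return links


if __name__ == "__main__":
    for created in link_generated_stubs("."):
        print(created)
```

Run it from the repository root after each build that regenerates the stubs.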

python3 -m rtp_llm.start_server -h

Download models:

pip install huggingface_hub
huggingface-cli download Qwen/Qwen1.5-0.5B-Chat --local-dir ./qwen

huggingface-cli download Qwen/Qwen1.5-MoE-A2.7B --local-dir ./Qwen1.5-MoE-A2.7B

python3 -m rtp_llm.start_server --checkpoint_path=../models/qwen --model_type=qwen_2 --start_port=30000

File "/workspace/ep/rtp-llm/rtp_llm/ops/compute_ops.py", line 1, in <module>
    from librtp_compute_ops import *
ImportError: /workspace/ep/rtp-llm/rtp_llm/../bazel-bin/librtp_compute_ops.so: undefined symbol: _ZStlsIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_NSt6thread2idE

One option is to add -lpthread to the linkopts of this target:

cc_library(
    name = "deep_gemm_utils",
    hdrs = ["utils.h", "JIT.h"],
    srcs = ["JIT.cc"],
    copts = copts(),
    linkopts = ["-lpthread"],  # add this line
    deps = [
        "//rtp_llm/cpp/utils:core_utils",
        "@boringssl//:ssl",
    ],
    visibility = ["//visibility:public"],
)

The problem persisted, so I looked for every copy of libstdc++ on the machine:

find / -name "libstdc++.so*"
/opt/nvidia/nsight-compute/2024.1.0/host/linux-desktop-glibc_2_11_3-x64/libstdc++.so.6
/opt/conda310/lib/libstdc++.so.6.0.34
/opt/conda310/lib/libstdc++.so
/opt/conda310/lib/libstdc++.so.6
/opt/conda310/pkgs/libstdcxx-15.2.0-h39759b7_7/lib/libstdc++.so.6.0.34
/opt/conda310/pkgs/libstdcxx-15.2.0-h39759b7_7/lib/libstdc++.so
/opt/conda310/pkgs/libstdcxx-15.2.0-h39759b7_7/lib/libstdc++.so.6
/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30-gdb.py
/usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
/usr/lib/x86_64-linux-gnu/libstdc++.so.6

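ldd shows that the library resolves to /usr/lib/x86_64-linux-gnu/libstdc++.so.6 at run time, while the build probably used the copy in /opt/conda310/lib. One fix is to put the conda directory first on the library path: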

export LD_LIBRARY_PATH=/opt/conda310/lib:$LD_LIBRARY_PATH
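
To check whether the two libstdc++ copies really differ, one can compare the GLIBCXX version tags embedded in each file. The following is a minimal sketch, assuming the undefined symbol comes from a version mismatch between the system copy (6.0.30) and the conda copy (6.0.34). That cause is my guess from the file names, not something the notes confirm. The function names are hypothetical.

```python
import re
from pathlib import Path

# libstdc++ embeds its symbol-version tags as plain strings, e.g. GLIBCXX_3.4.30.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated GLIBCXX versions found in blob."""
    found = set()
    for match in _TAG.finditer(blob):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(p) for p in parts))
    return sorted(found)


def newest_glibcxx(path):
    """Return the highest GLIBCXX version tuple in the file, or None."""
    versions = glibcxx_versions(Path(path).read_bytes())
    return versions[-1] if versions else None


if __name__ == "__main__":
    for lib in ("/usr/lib/x86_64-linux-gnu/libstdc++.so.6",
                "/opt/conda310/lib/libstdc++.so.6"):
        print(lib, newest_glibcxx(lib))
```

If the conda copy reports a higher version than the system copy, a library built against the conda copy may need symbols that the system copy does not provide.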

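Alternatively, avoid the conflict between the system and conda shared libraries altogether by removing conda and pointing /opt/conda310 at the system Python: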

rm -rf /opt/conda310
# recreate the directory from the system Python, then rebuild
mkdir -p /opt/conda310/bin
ln -s /usr/bin/python3 /opt/conda310/bin/python
ln -s /usr/include/ /opt/conda310/include
ln -s /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/ /opt/conda310/lib

Verification

Start the server:

python3 -m rtp_llm.start_server --checkpoint_path=../models/qwen --model_type=qwen_2 --start_port=30000

cat client.py
import requests
port=30000
url = f"http://localhost:{port}/v1/chat/completions"
json_data = {
     "messages": [
          {
               "role": "user",
               "content": "What is the capital of China?"
          }
     ]
}

response = requests.post(url, json=json_data)
print(f"Output 0: {response.json()}")

Server-side output:

initLogger log_file_path: /workspace/ep/rtp-llm/rtp_llm/config/alog.conf
/workspace/ep/rtp-llm/rtp_llm/utils/grpc_util.py:35: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_new.cpp:1561.)
  return torch.frombuffer(t.int32_data, dtype=torch.int32).reshape(list(t.shape))

The client prints the response, and the answer it contains is: The capital of China is Beijing.

Output 0: {'id': 'chat-', 'object': 'chat.completion', 'created': 1772370776, 'model': '', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'The capital of China is Beijing.', 'partial': False}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 26, 'total_tokens': 34, 'completion_tokens': 8}, 'aux_info': {'cost_time': 23.463, 'iter_count': 8, 'prefix_len': 0, 'input_len': 26, 'output_len': 8, 'step_output_len': 8, 'first_token_cost_time': 3.398, 'wait_time': 0.077, 'pd_sep': False, 'cum_log_probs': [], 'beam_responses': [], 'softmax_probs': [], 'reuse_len': 0, 'local_reuse_len': 0, 'remote_reuse_len': 0, 'memory_reuse_len': 0, 'prefill_total_reuse_len': 0, 'prefill_local_reuse_len': 0, 'prefill_remote_reuse_len': 0, 'prefill_memory_reuse_len': 0, 'decode_total_reuse_len': 0, 'decode_local_reuse_len': 0, 'decode_remote_reuse_len': 0, 'decode_memory_reuse_len': 0, 'role_addrs': [], 'aux_string': ''}}
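
The raw dictionary is hard to read at a glance. The following is a minimal sketch of pulling out just the answer text and the completion token count, assuming the response always has the shape shown above. The function name extract_answer is hypothetical and not part of rtp-llm.

```python
def extract_answer(response_json):
    """Return (answer_text, completion_tokens) from a chat completion response."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    message = choices[0].get("message", {})
    usage = response_json.get("usage", {})
    return message.get("content", ""), usage.get("completion_tokens")


if __name__ == "__main__":
    # Trimmed copy of the response printed above.
    sample = {
        "choices": [{
            "index": 0,
            "message": {"role": "assistant",
                        "content": "The capital of China is Beijing."},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 26, "total_tokens": 34,
                  "completion_tokens": 8},
    }
    print(extract_answer(sample))
```

In client.py, this would be called as extract_answer(response.json()).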
