MEMO: Environment Setup
Linux

apt-get update
apt-get install sudo

sudo apt update
sudo apt upgrade
sudo apt install curl

ssh user@IP -p port
vim /root/.ssh/authorized_keys

sudo apt update
sudo apt install openssh-client -y
ssh-keygen -t ed25519 -C "467638484@qq.com"
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub

sudo apt-get update
sudo apt-get install openssh-server -y
sudo service sshd status
sudo service sshd start
ps -e | grep sshd

sudo vim /etc/ssh/sshd_config
    Port 22
    PermitRootLogin yes
sudo /etc/init.d/ssh restart

cat /etc/hostname
sudo vim /etc/hosts
Vim:
vim ~/.vimrc
    set number
:%y+

Other:
ls -lhS

tmux ls
tmux new -s download
tmux new-session -s session_name
tmux attach -t session_name
tmux list-sessions
tmux kill-session -t session_name

scp example.txt user@remote_host:/home/user/
scp 1.png root@139.9.155.20:/home/sss/images/
scp user@remote_host:/home/user/example.txt .

export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/include/c++/13:/usr/include/c++/13/x86_64-openEuler-linux

export https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 all_proxy=socks5://127.0.0.1:7890

echo "text to append" >> filename

telnet [ip] [port]
nc -zv 1.95.9.213 2222

sudo find / -name "libdrvdsmi_host.so"
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
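The `telnet [ip] [port]` and `nc -zv` reachability checks can also be scripted when several hosts need testing. Below is a minimal sketch using only the Python standard library. The function name `port_is_open` and the 3-second default timeout are illustrative choices, not part of the original notes.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `port_is_open("1.95.9.213", 2222)` mirrors the `nc -zv 1.95.9.213 2222` check. Like `nc -z`, it only confirms that the TCP port accepts connections, not that the service behind it is healthy.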
Docker

cd /home/sss/docker/
docker-compose -p sss up -d
docker exec -it sss /bin/bash

docker exec -it <container name or ID> /bin/bash
docker stop <container name or ID>
docker restart <container name or ID>
docker commit <container name or ID> <image name>
docker rm <container name or ID>
docker rename <old name> <new name>
Git

git config --global user.email "467638484@qq.com"
git config --global user.name "shen-shanshan"

vim ~/.gitconfig
    pr = "!f() { git fetch -fu ${2:-$(git remote |grep ^upstream || echo origin)} refs/pull/$1/head:pr/$1 && git checkout pr/$1; }; f"
    sync = "!f() { git fetch upstream && git rebase upstream/main; }; f"
    nb = "!f() { git fetch upstream && git checkout -b $1 upstream/main && git branch; }; f"
    db = "!f() { git branch -D $1 && git branch; }; f"

git checkout <commit>
git checkout -b <commit>

git stash
git stash push -u
git stash pop
git stash drop [stash_id]
git stash clear

git clone -b <branch name> <repository URL>

git cherry-pick <commitHash>

git checkout -b <branch_name> <tag_name>
git fetch --tags
git tag -l
git checkout -b release-2.0.1 v2.0.1
git checkout v2.0.1

git revert <commit-hash>..HEAD --no-edit
git revert <commit-hash>
git revert <commitA>..<commitB>

$ git commit -m "Refactor usability tests.
>
> Co-authored-by: NAME <NAME@EXAMPLE.COM>
> Co-authored-by: linfeng-yuan <1102311262@qq.com>"

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
pre-commit:
pip install pre-commit
pre-commit install

sudo apt update
sudo apt install golang-go -y
go version
go env -w GOPROXY=https://goproxy.cn,direct

--no-verify

git config --global http.version HTTP/1.1

pre-commit clean
rm -rf ~/.cache/pre-commit
pre-commit install
pre-commit run --all-files

Pip

pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install --upgrade transformers
Conda:
conda install python=3.11
CANN

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

source /home/sss/Ascend/ascend-toolkit/set_env.sh
source /home/sss/Ascend/nnal/atb/set_env.sh

cat /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info
cat /home/sss/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib

find . -name libascend_hal.so
env | grep LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/sss/Ascend/ascend-toolkit/latest/aarch64-linux/devlib:$LD_LIBRARY_PATH

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
HCCL

export HCCL_BUFFSIZE=2048
export HCCL_IF_BASE_PORT=50000

sysctl -w net.ipv4.ip_local_reserved_ports=50000-50015
sysctl -w net.ipv4.ip_local_reserved_ports=60000-60015
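The notes pair `HCCL_IF_BASE_PORT=50000` with a reserved range of `50000-50015`, which is 16 consecutive ports. A small sketch for deriving the matching `sysctl` argument from a base port follows. The helper names and the default count of 16 are assumptions inferred from the ranges written above, not taken from HCCL documentation.

```python
def reserved_port_range(base_port: int, count: int = 16) -> str:
    """Format `count` consecutive ports starting at base_port as 'first-last'."""
    if count < 1:
        raise ValueError("count must be at least 1")
    last_port = base_port + count - 1
    if base_port < 1 or last_port > 65535:
        raise ValueError("port range must stay within 1-65535")
    return f"{base_port}-{last_port}"


def sysctl_command(base_port: int, count: int = 16) -> str:
    """Build the sysctl invocation that reserves the computed port range."""
    port_range = reserved_port_range(base_port, count)
    return f"sysctl -w net.ipv4.ip_local_reserved_ports={port_range}"
```

Calling `sysctl_command(50000)` reproduces the first `sysctl` line above. Adjust `count` if the deployment needs a different number of ports.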
Model

tmux ls
tmux new -s download
tmux new-session -s session_name
tmux attach -t session_name
tmux list-sessions
tmux kill-session -t session_name
import os
from modelscope import snapshot_download

# Alternative cache locations; only the last assignment takes effect.
os.environ["MODELSCOPE_CACHE"] = "/home/sss/.cache/modelscope/hub"
os.environ["MODELSCOPE_CACHE"] = "/root/.cache/modelscope/hub"

model_dir = snapshot_download("Qwen/Qwen3-Omni-30B-A3B-Thinking")

from huggingface_hub import snapshot_download

MODEL = "rednote-hilab/dots.ocr"

model_dir = snapshot_download(
    repo_id=MODEL,
    revision="main",
    local_dir="/home/sss/.cache/huggingface/hub/models/model_name",
    local_dir_use_symlinks=False,
)
print(f"Download {MODEL} to {model_dir} finished.")
/home/sss/.cache/modelscope/hub/Qwen/Qwen2.5-0.5B-Instruct
/home/sss/cache/modelscope/models/Qwen/Qwen2.5-7B-Instruct
/home/sss/cache/modelscope/models/deepseek-ai/DeepSeek-V2-Lite-Chat
/home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct

/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-0.5B-Instruct
/root/.cache/modelscope/hub/models/Qwen/Qwen3-8B
/root/.cache/modelscope/hub/models/deepseek-ai/DeepSeek-V2-Lite

/home/sss/.cache/modelscope/hub/models/Qwen/Qwen2.5-1.5B-Instruct
/shared/cache/modelscope/hub/models/Qwen/Qwen2.5-7B-Instruct
/shared/cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct
/shared/cache/modelscope/hub/models/Qwen/Qwen2-Audio-7B-Instruct
/shared/cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B
/shared/cache/modelscope/hub/models/ZhipuAI/glm-4-9b

/home/sss/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct
/shared/cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
/home/sss/models/models/models/vllm-ascend/EAGLE3-LLaMA3.1-Instruct-8B
/home/sss/models/models/models/vllm-ascend/DeepSeek-R1-W8A8

/root/.cache/modelscope/hub/models/Qwen/Qwen2___5-0___5B-Instruct
/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-7B-Instruct
/root/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B
/root/.cache/modelscope/hub/models/ZhipuAI/glm-4-9b
/root/.cache/modelscope/hub/models/ZhipuAI/GLM-4___5
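Some of the paths above spell a dot in the model name as `___` (for example `Qwen2___5-0___5B-Instruct` and `GLM-4___5`), while others keep the dot. When looking for a model on disk it can help to try both spellings. The sketch below is a hypothetical helper, and the `.` to `___` rule is inferred only from the paths listed here, so treat it as an assumption rather than documented ModelScope behavior.

```python
from pathlib import PurePosixPath


def cache_path_candidates(model_id: str, hub_root: str) -> list[str]:
    """Return the plain and the '___'-escaped cache path for a model ID."""
    models_dir = PurePosixPath(hub_root) / "models"
    plain = models_dir / model_id
    # Assumed escaping rule: every '.' in the model ID becomes '___'.
    escaped = models_dir / model_id.replace(".", "___")
    # Keep the order, but drop the duplicate when the ID contains no dots.
    return list(dict.fromkeys([str(plain), str(escaped)]))
```

Passing each candidate to `os.path.isdir` then shows which spelling actually exists on a given machine.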
Ascend 01

docker pull quay.nju.edu.cn/ascend/vllm-ascend:main

export IMAGE=quay.io/ascend/vllm-ascend:main
export IMAGE=quay.nju.edu.cn/ascend/vllm-ascend:main
docker run \
    --name sss \
    -e ASCEND_VISIBLE_DEVICES=0,1 \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /home/sss:/home/sss \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /data/disk2/cache:/root/.cache \
    -p 8002:8002 \
    -p 8333:22 \
    -it $IMAGE /bin/bash

cd /home/sss/
docker-compose -p sss up -d
docker-compose -p sss-cann-test up -d
docker-compose.yaml:
services:
  sss:
    image: quay.nju.edu.cn/ascend/vllm-ascend:main
    container_name: sss
    volumes:
      - /usr/local/dcmi:/usr/local/dcmi
      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
      - /usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64
      - /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info
      - /etc/ascend_install.info:/etc/ascend_install.info
      - /home/sss:/home/sss
      - /data/disk2/cache:/root/.cache
    ports:
      - 8009:8009
    restart: unless-stopped
    hostname: ascend-01
    tty: true
    devices:
      - /dev/davinci4
      - /dev/davinci5
      - /dev/davinci_manager
      - /dev/devmm_svm
      - /dev/hisi_hdc
    cap_add:
      - SYS_PTRACE
    shm_size: 20gb
A3 Cluster

Basic information:
%cQlTuPZOdE+/T4TnIPGUNw+

mkdir -p /mnt/sfs_turbo
mount -t nfs -o vers=3,nolock,proto=tcp,noresvport 23021270-7ebf-43b2-925c-b1686da4868a.sfsturbo.internal:/ /mnt/sfs_turbo

1.95.9.213 (172.22.0.218)
172.22.0.155
172.22.0.188
172.22.0.212

ssh root@172.22.0.218
ssh root@172.22.0.155
ssh root@172.22.0.188
ssh root@172.22.0.212
exit

npu-list
npu-smi info

Start the container:
export IMAGE=quay.io/ascend/vllm-ascend:main
export IMAGE=quay.io/ascend/vllm-ascend:main-a3
export IMAGE=quay.nju.edu.cn/ascend/vllm-ascend:main-a3
docker run \
    --privileged=true \
    --name sss \
    --net=host \
    -e ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci4 \
    --device /dev/davinci5 \
    --device /dev/davinci6 \
    --device /dev/davinci7 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /home/sss:/home/sss \
    -v /mnt/sfs_turbo/ascend-ci-share-nv-action-vllm-benchmarks:/root/.cache \
    -p 8333:22 \
    -e VLLM_USE_MODELSCOPE=True \
    -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
    -it $IMAGE /bin/bash

    --device /dev/davinci8 \
    --device /dev/davinci9 \
    --device /dev/davinci10 \
    --device /dev/davinci11 \
    --device /dev/davinci12 \
    --device /dev/davinci13 \
    --device /dev/davinci14 \
    --device /dev/davinci15 \

docker exec -it sss /bin/bash
docker start sss
docker stop sss
docker rm sss
exit
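The `docker run` command repeats one `--device /dev/davinciN` flag per NPU, plus the three management devices. Rather than editing the list by hand when switching between 8 and 16 cards, the flags can be generated. This is a minimal sketch. `device_flags` is a hypothetical helper name, and the device paths are copied from the command above.

```python
def device_flags(npu_ids, extra_devices=("davinci_manager", "devmm_svm", "hisi_hdc")):
    """Build the --device arguments for `docker run` for the given NPU indices."""
    flags = []
    for npu_id in npu_ids:
        flags += ["--device", f"/dev/davinci{npu_id}"]
    # Management devices that the command above passes alongside the NPUs.
    for name in extra_devices:
        flags += ["--device", f"/dev/{name}"]
    return flags
```

`" ".join(device_flags(range(16)))` yields the flag string for all sixteen devices. The list form can be passed directly to `subprocess.run` together with the remaining arguments.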
VSCode

Fold all: Ctrl/Cmd + K + 0
Unfold all: Ctrl/Cmd + K + J

*.md,*.yaml,*.h,*.hpp,*.cu,*.cuh,test*.py,*.cmake,examples/*,tests/*,*.sh,*.env,*.yml,.gitignore
*.md,*.yaml,*.h,*.hpp,*.cu,*.cuh,*.cmake,examples/*,*.sh,*.env,*.yml,.gitignore
Debug

if torch.distributed.get_rank() == 0:
    print("xxx")

with open(f"./debug_{torch.distributed.get_rank()}.log", "w") as f:
    print(f"scheduler_output: {scheduler_output}", file=f, flush=True)

print(..., flush=True)
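The per-rank log snippet opens the file in `"w"` mode, so each call overwrites the previous contents. When several values need to be dumped over the course of a run, an append-mode helper may be more convenient. This is a sketch only. `debug_log` is a hypothetical name, and the rank is passed in as a plain integer so the helper does not depend on `torch`.

```python
from pathlib import Path


def debug_log(rank: int, message: str, directory: str = ".") -> Path:
    """Append message to debug_<rank>.log in directory and return the file path."""
    log_path = Path(directory) / f"debug_{rank}.log"
    # Append mode keeps earlier entries instead of truncating the file.
    with open(log_path, "a", encoding="utf-8") as f:
        print(message, file=f, flush=True)
    return log_path
```

In a distributed run this would be called as `debug_log(torch.distributed.get_rank(), f"scheduler_output: {scheduler_output}")`, giving one log file per rank.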
VSCode debug configuration:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "python": "/root/miniconda3/envs/vllm/bin/python",
            "subProcess": true,
            "justMyCode": false
        }
    ]
}
Open Source

Common phrases for open source communities:

CI has finally passed, so this PR can be merged.
I have rebased on the latest main and nothing changed.
I haven't had a chance to take a look yet, but let me add the ready label.
Sorry for the delay, I've been busy with work these days. I'll take a look at this PR today.
Sorry for the delay, I've been busy with work these days. I left some comments and questions.
I left some comments and questions. Otherwise LGTM.
Done with my pass. Also CC @
Could someone with write access approve if it looks good to you?

Common symbols:
🎯
[!NOTE]
logs