xhj's Blog

Posted on 2025-08-11 | LLM

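Let's take a detailed look at the Placement Group concept in the Ray framework and what it is for.

Core concept: what is a Placement Group?

A Placement Group is Ray's abstraction for fine-grained control over how the resources for tasks or actors are laid out. It lets you declare a "group" of resources ahead of time and specify a "placement strategy" for that group across the cluster; you can then schedule tasks or actors into the "slots" of that group.

You can think of it this way:

Traditional scheduling: you walk into a restaurant and tell the waiter "I need two seats", and the waiter finds you any two free seats.

Using a Placement Group: you reserve in advance a booth with specific seats (say, one sofa seat and two regular seats) and require that the booth be by the window. Your friends then go straight to the reserved booth.

Why is a Placement Group needed, and what does it do?

Without Placement Groups, Ray's default scheduler allocates resources efficiently, but it mainly cares about the amount of resources (for example, "2 CPUs are needed") and much less about where those resources are located. In many advanced scenarios this becomes a bottleneck. The main benefits of a Placement Group are: 1. Enabling tight...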

The Actor Concept in Ray and What It Is For

Posted on 2025-08-10 | LLM

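Let's walk through the Actor concept in the Ray framework in detail.

1. What is an Actor?

In Ray, an Actor is a stateful "worker process". You can think of it as a "live" object.

Plain Python class vs. Ray Actor

When a plain Python class MyClass is instantiated (obj = MyClass()), obj is just an object living in the memory of the current process.

A Ray Actor is created through ray.remote(MyClass). When you instantiate it (actor_handle = MyClass.remote()), Ray starts a separate process on some node in the cluster (possibly on a remote machine) and creates an instance of MyClass inside that process. The actor_handle you get back is not the object itself but a reference, or handle, to the remote object (the instance living in that process).

Core property: stateful

What makes an Actor distinctive is that it can maintain and modify internal state (its instance variables, such as self.value) throughout its lifetime. All method calls on the Actor run sequentially in the same process, so this shared state can be read and modified safely....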

Qwen3-VL-8B Inference Returns Empty Results

Posted on 2025-08-09 | LLM

python -m vllm.entrypoints.openai.api_server \
  --model /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 8192 \
  --max_num_batched_tokens 8192 \
  --host 0.0.0.0 \
  --port 8000 \
  --compilation-config '{"cudagraph_capture_sizes": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]}'

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: app...

Qwen3-VL-8B on 910B4: Error When Inference Is Called with an Image Exceeding the Context Length

Posted on 2025-08-08 | LLM

# local
scp -P 8333 images.zip root@139.9.155.20:/media/
# remote
sudo apt update
sudo apt install -y unzip
cd /media
unzip images.zip

Prepare the image:

#!/usr/bin/env bash
image_base64=$(base64 -w 0 /media/b0.jpg)
cat > /media/image_request.json <<EOF
{
  "model": "/root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct",
  "messages": [
    {"role": "system", "content": "You are a helpfu...

Qwen3-Omni Has No vllm_config Attribute

Posted on 2025-08-07 | LLM

class Qwen3OmniMoeThinkerForConditionalGeneration:
    def __init__(...):
        self.vllm_config = vllm_config

Qwen2.5-VL RoPE Computation Flow Explained

Posted on 2025-08-06 | LLM

How RoPE cos/sin is computed in the Qwen2.5 VisionTransformer

The whole flow starts at forward(x, grid_thw) and proceeds in the following stages:

Step 1: Precompute the cos/sin cache at initialization (qwen2_5_vl.py:608-612)

self.rotary_pos_emb = get_rope(
    head_size=head_dim,
    max_position=8192,
    is_neox_style=True,
    rope_parameters={"partial_rotary_factor": 0.5},
)

get_rope ultimately creates a RotaryEmbedding object. Key parameter:

rotary_dim = head_dim * 0.5 → only half of the head dimension is used for rotation

In _compute_cos_sin_cache (base.py:83-92):

# inv_freq shape: [rotary_dim // 2]
inv_fr...
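To make the shapes concrete, here is a pure-Python sketch of a standard RoPE cos/sin cache. It is not vLLM's implementation: the function name `compute_cos_sin_cache`, the base of 10000, the `head_dim` of 80, and the small `max_position` are all assumptions chosen for illustration. Only `rotary_dim = head_dim * 0.5` and the `[rotary_dim // 2]` shape of `inv_freq` come from the text above.

```python
import math


def compute_cos_sin_cache(rotary_dim, max_position, base=10000.0):
    """Return (cos, sin), each a max_position x (rotary_dim // 2) nested list.

    Assumption: the usual RoPE frequency schedule base^(-2i / rotary_dim).
    """
    # inv_freq shape: [rotary_dim // 2]
    inv_freq = [1.0 / (base ** (i / rotary_dim)) for i in range(0, rotary_dim, 2)]

    cos, sin = [], []
    for pos in range(max_position):
        # The angle for position `pos` and frequency j is pos * inv_freq[j].
        angles = [pos * f for f in inv_freq]
        cos.append([math.cos(a) for a in angles])
        sin.append([math.sin(a) for a in angles])
    return cos, sin


head_dim = 80                      # illustrative value, not taken from the post
rotary_dim = int(head_dim * 0.5)   # partial_rotary_factor = 0.5
cos, sin = compute_cos_sin_cache(rotary_dim, max_position=16)

print(len(cos), len(cos[0]))  # 16 20
```

Position 0 always maps to an angle of zero, so the first cache row is all ones for cos and all zeros for sin, which is a quick sanity check on any implementation.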

Qwen2.5-VL_1

Posted on 2025-08-05 | LLM

Qwen2.5-VL: comparison of differences

Qwen2_5_VLForConditionalGeneration

__init__()

# vllm
self.use_data_parallel = multimodal_config.mm_encoder_tp_mode == "data"
if multimodal_config.get_limit_per_prompt(
    "image") or multimodal_config.get_limit_per_prompt("video"):
    attn_backend_override = (
        multimodal_config.mm_encoder_attn_backend
        if multimodal_config is not None
        else None
    )
    self.visual = Qwen2_5_VisionTr...

Qwen2.5-VL

Posted on 2025-08-04 | LLM

Qwen2.5-VL

Layer:
- Qwen method = vLLM operator

Layers

Qwen2_5_VisionTransformer:
  patch_embed = Qwen2_5_VisionPatchEmbed
  rotary_pos_emb = Qwen2_5_VisionRotaryEmbedding
  blocks = Qwen2_5_VisionBlock * layer_num
  merger = Qwen2_5_VisionPatchMerger

Qwen2_5_VisionPatchEmbed:
  proj = nn.Conv3d

Qwen2_5_VisionRotaryEmbedding:

Qwen2_5_VisionBlock:
  norm1 = RMSNorm
  attn = Qwen2_5_VisionAttention
  norm2 = RMSNorm
  mlp = Qwen2_5_VisionMLP

Qwen2_5_VisionAttention: ...

Qwen2-VL Accuracy Issue

Posted on 2025-08-03 | LLM

Qwen2-VL accuracy issue

------------------------------ Captured log call -------------------------------
WARNING transformers.models.auto.image_processing_auto:logging.py:328 The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that t...

Qwen2-VL Error

Posted on 2025-08-02 | LLM

apply_token_matches
apply_text_matches
TypeError: can't convert npu:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.