Hi, I encountered a very similar issue with RayChannelTimeoutError and vLLM crashing when running distributed inference with a large model and batch size. Here are the details:
Environment:
- vLLM version: 0.9.0 (V1 API)
- Model: InternVL3-78B
- GPU: 8 x 95GB (NVIDIA)
- CUDA: 12.x
- Python: 3.11
- OS: Linux 5.4.119-19.0009.56 (Docker container)
- Ray backend: enabled (--distributed-executor-backend ray)
- vLLM launch command:
vllm serve /mnt/afs/share_models/git_models/OpenGVLab/InternVL3-78B/ \
  --port 6699 \
  --host 0.0.0.0 \
  --tensor-parallel-size 8 \
  --distributed-executor-backend ray \
  --disable-custom-all-reduce \
  --dtype half \
  --max-model-len 32768 \
  --trust-remote-code \
  --max-num-batched-tokens 61400 \
  --max-num-seqs 128 \
  --gpu-memory-utilization 0.95 \
  --swap-space 0 \
  --cpu-offload-gb 0 \
  --block-size 32 \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --max-seq-len-to-capture 8192 \
  --limit-mm-per-prompt '{"image": 1}' \
  --disable-log-requests \
  --disable-log-stats \
  --load-format auto
Client:
Batch size: 16
Using vLLM OpenAI API (/v1/chat/completions)
What happens:
- The first batch of requests (16 samples) completes successfully.
- Starting from the second batch, all requests fail with HTTP 500 errors.
- Client logs show:
Connection pool is full, discarding connection
Failed to establish a new connection: [Errno 111] Connection refused
- Server logs show:
ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read.
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue.
- Ray workers are killed, SIGTERM is received, and the server process exits.
Sample server log:
ERROR [core.py:502] ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read. ObjectID: ...
ERROR [core.py:502] The above exception was the direct cause of the following exception:
ERROR [core.py:502] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
...
INFO: ... "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
...
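A side note on the client-side "Connection pool is full, discarding connection ... Connection pool size: 10" warnings: that number is urllib3's default per-host pool size inside a shared requests.Session, so with 16 concurrent threads the pool overflows even when the server is healthy. It is separate from the server crash, but it can be silenced by mounting a larger adapter. A minimal sketch (the pool size of 64 is illustrative):

```python
import requests
from requests.adapters import HTTPAdapter

# Shared session whose per-host connection pool matches the client
# concurrency, so urllib3 stops discarding connections at its default of 10.
session = requests.Session()
adapter = HTTPAdapter(pool_connections=64, pool_maxsize=64)
session.mount("http://", adapter)
session.mount("https://", adapter)
```

All worker threads should then reuse this one session instead of creating ad-hoc connections.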
Additional details:
- Each input sample consists of one image (about 256 tokens after encoding) and 500 text tokens; the average output is about 200 tokens.
- On the client side, I use a thread pool to send requests concurrently. The thread pool size is set to min(len(batch_requests), 8), i.e., no more than 8 threads per batch, to avoid spawning too many concurrent threads.
- Currently, if I set both the batch size and max_workers to 8, the system runs stably. However, increasing either to 16 immediately triggers the RayChannelTimeoutError and server crash.
- With similar parameters, I have previously run this pipeline at nearly 10x the current throughput, which makes the current slowdown and instability hard to explain.
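To make the client pattern above concrete, here is a minimal sketch of the batching loop (the URL is taken from this report; `pool_size`, `send_one`, and `run_batch` are hypothetical helper names, and the payload shape is whatever /v1/chat/completions expects):

```python
import concurrent.futures

import requests

API_URL = "http://172.21.0.67:6699/v1/chat/completions"

def pool_size(n_requests, cap=8):
    # No more than `cap` threads per batch, matching the stable configuration.
    return min(n_requests, cap)

def send_one(session, payload, timeout=300):
    # One /v1/chat/completions call; raises on HTTP errors such as the 500s above.
    resp = session.post(API_URL, json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()

def run_batch(batch_requests):
    # One shared session per batch; up to 8 in-flight requests at a time.
    session = requests.Session()
    workers = pool_size(len(batch_requests))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send_one, session, p) for p in batch_requests]
        return [f.result() for f in futures]
```

With batch size 16 this still only keeps 8 requests in flight; the crash therefore does not seem to be a simple client-concurrency overload.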
What I have tried:
- Reducing batch size to 8 or 4 (still crashes, but sometimes after more batches)
- Lowering max-num-seqs and max-num-batched-tokens (no fundamental improvement)
- Monitoring GPU/CPU/memory usage (no OOM, but Ray workers die after a few batches)
- Disabling --enable-chunked-prefill and --enable-prefix-caching (no effect)
- Single-node, single-GPU mode works fine (no crash)
- All model weights are complete and compatible
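One stopgap I am considering until the root cause is fixed: client-side retries with exponential backoff, so transient 500s become recoverable instead of failing the whole batch. A sketch, assuming the shared-session setup above (`backoff_delays` and `post_with_retry` are hypothetical helpers; delays are illustrative):

```python
import time

import requests

def backoff_delays(retries, base=1.0):
    # 1 s, 2 s, 4 s, ... one delay per retry attempt.
    return [base * (2 ** i) for i in range(retries)]

def post_with_retry(session, url, payload, retries=3, timeout=300):
    last_exc = None
    for delay in backoff_delays(retries):
        try:
            resp = session.post(url, json=payload, timeout=timeout)
            if resp.status_code == 200:
                return resp.json()
            last_exc = RuntimeError(f"HTTP {resp.status_code}")
        except requests.RequestException as exc:
            # e.g. [Errno 111] Connection refused while the server is down
            last_exc = exc
        time.sleep(delay)
    raise last_exc
```

Of course this only helps if the engine recovers; once EngineDeadError kills the server process, every retry hits Connection refused.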
Summary:
- The issue only occurs with Ray distributed backend and large model/batch.
- The first batch always succeeds; subsequent batches trigger RayChannelTimeoutError and a server crash.
- This seems to be the same root cause as described in this issue.
- Any advice or workaround would be greatly appreciated!
- Is there a recommended way to stabilize vLLM distributed inference for large models?
- Are there any Ray/vLLM config tweaks or patches that help?
- Is this a known bug with a fix in a newer version?
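In the meantime, gating each batch on the server's GET /health endpoint (which the vLLM OpenAI-compatible server exposes) at least detects a dead engine before a full batch of requests is burned. A sketch of the wait loop (`wait_until_healthy` is a hypothetical helper; timeouts are illustrative):

```python
import time

import requests

def wait_until_healthy(base_url, timeout_s=600, poll_s=5):
    # Poll GET /health until the vLLM server answers 200, or give up.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass  # connection refused while the server is down or restarting
        time.sleep(poll_s)
    return False
```

Paired with an external supervisor that restarts the server process, this would let the pipeline resume from its checkpoints instead of failing whole batches.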
Thank you!
client log details:
root@5a80c4dad3b0:/mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage3_valid# python stage3_vadlid.py --input /mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage1_cls/output_stage1/llavaOV_sensenova_inf_30w_35w_48199/llavaOV_sensenova_inf_30w_35w_48199_cls.json --api-url http://172.21.0.67:6699 --batch-size 16
[Run mode] CPU-only mode: CPU mode
[Run mode] CPU-only mode: CPU mode
2025-07-29 14:08:00,701 [INFO] [DIAG] Starting environment diagnostics (vLLM batch-inference mode)...
2025-07-29 14:08:00,703 [INFO] [DIAG] System: Linux 5.4.119-19.0009.56
2025-07-29 14:08:00,703 [INFO] [DIAG] Hostname: 5a80c4dad3b0
2025-07-29 14:08:00,703 [INFO] [DIAG] CPU cores: 128
2025-07-29 14:08:00,703 [INFO] [DIAG] Container environment: yes
2025-07-29 14:08:00,703 [INFO] [DIAG] vLLM API address: http://172.21.0.67:6699
2025-07-29 14:08:00,707 [INFO] [DIAG] ✅ vLLM API service reachable
2025-07-29 14:08:00,707 [INFO] [DIAG] 💡 Advantages of the vLLM batch-inference version:
2025-07-29 14:08:00,707 [INFO] [DIAG] - Smart batching: batch size auto-adjusted to the vLLM service configuration
2025-07-29 14:08:00,707 [INFO] [DIAG] - Concurrent requests: a thread pool maximizes the vLLM service's concurrent throughput
2025-07-29 14:08:00,707 [INFO] [DIAG] - Efficient processing: batching improves inference efficiency
2025-07-29 14:08:00,707 [INFO] [DIAG] - Resource-friendly: CPU-only, no local GPU usage
2025-07-29 14:08:00,707 [INFO] [DIAG] - Lenient criterion: partially correct counts as correct, raising the pass rate
2025-07-29 14:08:00,707 [INFO] [DIAG] - Remote vLLM: uses the remote vLLM service at http://172.21.0.67:6699
2025-07-29 14:08:00,707 [INFO] [DIAG] 🔧 Debugging tips: if you run into problems, try:
2025-07-29 14:08:00,707 [INFO] [DIAG] - adding the --test-mode --batch-size 4 --debug flags
2025-07-29 14:08:00,707 [INFO] [DIAG] - checking that the remote vLLM service is running
2025-07-29 14:08:00,707 [INFO] [DIAG] - confirming network connectivity to http://172.21.0.67:6699
2025-07-29 14:08:00,707 [INFO] [DIAG] 📋 Required-parameter reminder:
2025-07-29 14:08:00,707 [INFO] [DIAG] - --api-url or --host + --port must be specified
2025-07-29 14:08:00,707 [INFO] [DIAG] - example: --api-url http://172.21.0.36:6666
2025-07-29 14:08:00,707 [INFO] [DIAG] - or: --host 172.21.0.36 --port 6666
2025-07-29 14:08:00,707 [INFO] [DIAG] 🎯 vLLM batch-inference notes:
2025-07-29 14:08:00,707 [INFO] [DIAG] - Smart batching: auto-adjusted to vLLM's max_num_seqs setting
2025-07-29 14:08:00,707 [INFO] [DIAG] - Concurrency tuning: at most 8 concurrent threads per batch, to avoid overloading the service
2025-07-29 14:08:00,707 [INFO] [DIAG] - Batch size: 4-16 recommended (automatically split into smaller concurrent batches)
2025-07-29 14:08:00,707 [INFO] [DIAG] - Service config: detected your vLLM configured with max_num_seqs=32
2025-07-29 14:08:00,707 [INFO] ============================================================
2025-07-29 14:08:00,707 [INFO] Stage-3 Valid InternVL-78B one-step validation script (vLLM batch-inference version)
2025-07-29 14:08:00,707 [INFO] ============================================================
2025-07-29 14:08:00,707 [INFO] 🖥️ Run mode: CPU mode
2025-07-29 14:08:00,707 [INFO] 📁 Input file: /mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage1_cls/output_stage1/llavaOV_sensenova_inf_30w_35w_48199/llavaOV_sensenova_inf_30w_35w_48199_cls.json
2025-07-29 14:08:00,707 [INFO] 📂 Output root directory: /mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage3_valid/output_stage3
2025-07-29 14:08:00,707 [INFO] 📋 Actual output directory: /mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage3_valid/output_stage3/llavaOV_sensenova_inf_30w_35w_48199_cls/
2025-07-29 14:08:00,707 [INFO] ├── llavaOV_sensenova_inf_30w_35w_48199_cls_valid.json # data that passed validation
2025-07-29 14:08:00,707 [INFO] ├── llavaOV_sensenova_inf_30w_35w_48199_cls_invalid.json # data that failed validation
2025-07-29 14:08:00,708 [INFO] ├── llavaOV_sensenova_inf_30w_35w_48199_cls_report.txt # detailed report
2025-07-29 14:08:00,708 [INFO] └── checkpoints/ # checkpoint files
2025-07-29 14:08:00,708 [INFO] ⚙️ vLLM batch size: 16
2025-07-29 14:08:00,708 [INFO] 🌐 vLLM API address: http://172.21.0.67:6699
2025-07-29 14:08:00,708 [INFO] 🔁 Request retries: 3
2025-07-29 14:08:00,708 [INFO] 💾 Checkpoint interval: every 100 records
2025-07-29 14:08:00,708 [INFO] ✨ Validation mode: one-step lenient validation (partially correct counts as correct)
2025-07-29 14:08:00,708 [INFO] 🚀 Inference method: vLLM native batch-inference format
2025-07-29 14:08:00,708 [INFO] ⚡ Resource optimization: CPU-only, using a remote vLLM service
2025-07-29 14:08:00,708 [INFO] 🎯 Model path: /mnt/afs/share_models/git_models/OpenGVLab/InternVL3-78B
2025-07-29 14:08:00,708 [INFO] ============================================================
2025-07-29 14:08:00,708 [INFO] Testing vLLM API connection: http://172.21.0.67:6699
2025-07-29 14:08:00,711 [INFO] ✅ vLLM API health check passed
2025-07-29 14:08:00,714 [INFO] Available models: {'object': 'list', 'data': [{'id': '/mnt/afs/share_models/git_models/OpenGVLab/InternVL3-78B/', 'object': 'model', 'created': 1753769280, 'owned_by': 'vllm', 'root': '/mnt/afs/share_models/git_models/OpenGVLab/InternVL3-78B/', 'parent': None, 'max_model_len': 32768, 'permission': [{'id': 'modelperm-4af0ba68d6ae4301a7c40d5c0345c0e4', 'object': 'model_permission', 'created': 1753769280, 'allow_create_engine': False, 'allow_sampling': True, 'allow_logprobs': True, 'allow_search_indices': False, 'allow_view': True, 'allow_fine_tuning': False, 'organization': '*', 'group': None, 'is_blocking': False}]}]}
2025-07-29 14:08:00,714 [INFO] ✅ vLLM API connection configured: http://172.21.0.67:6699
2025-07-29 14:08:00,714 [INFO] 🔄 Checkpoint recovery: enabled
2025-07-29 14:08:09,656 [INFO] 🔍 Checkpoint status:
2025-07-29 14:08:09,657 [INFO] validation_progress: 32200 samples processed
2025-07-29 14:08:09,657 [INFO] validation_batch: 32160 samples processed
2025-07-29 14:08:09,657 [INFO] 📊 Will resume from the progress checkpoint: 32200 samples
2025-07-29 14:08:09,657 [INFO] Starting the Stage-3 Valid one-step validation flow (vLLM batch-inference version)...
2025-07-29 14:08:09,657 [INFO] All outputs will be saved to: /mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage3_valid/output_stage3/llavaOV_sensenova_inf_30w_35w_48199_cls
2025-07-29 14:08:09,657 [INFO] Loading dataset: /mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage1_cls/output_stage1/llavaOV_sensenova_inf_30w_35w_48199/llavaOV_sensenova_inf_30w_35w_48199_cls.json
2025-07-29 14:08:11,695 [INFO] Successfully loaded and normalized 48199 / 48199 records
2025-07-29 14:08:19,044 [INFO] Checkpoints will be saved to: /mnt/afs/smartbrain/zhukangkang/code/common_data_pipeline/stage3_valid/output_stage3/llavaOV_sensenova_inf_30w_35w_48199_cls/checkpoints
2025-07-29 14:08:22,866 [INFO] [CHECKPOINT] Found 3 checkpoints, trying to load the latest...
2025-07-29 14:08:22,866 [INFO] [CHECKPOINT] Trying to load checkpoint: 32200 processed samples
2025-07-29 14:08:24,097 [INFO] [CHECKPOINT] ✅ Checkpoint loaded: correct=26005, wrong=6195, total=32200
2025-07-29 14:08:24,097 [INFO] [RESUME] 🔄 Resuming the validation flow from the checkpoint
2025-07-29 14:08:24,097 [INFO] [RESUME] 📊 Processed: 32200 / 48199 samples
2025-07-29 14:08:24,097 [INFO] [RESUME] 📈 Checkpoint data: correct=26005, wrong=6195
2025-07-29 14:08:24,097 [INFO] [RESUME] 📋 Remaining: 15999 samples
2025-07-29 14:08:24,097 [INFO] >>> Continuing one-step validation (resumed from checkpoint)...
2025-07-29 14:08:24,097 [INFO] Continuing with the remaining 15999 samples...
2025-07-29 14:08:24,098 [INFO] [VALIDATION] Resumed validation: 15999 samples in 1000 batches
2025-07-29 14:08:24,098 [INFO] [VALIDATION] 🔄 Resume mode: starting from sample 32201
2025-07-29 14:08:24,098 [INFO] [VALIDATION] 📊 Existing data: correct=26005, wrong=6195
2025-07-29 14:08:24,098 [INFO] [VALIDATION] Batch size: 16, estimated total time: ~8000 s
Resumed validation: 0%| | 0/1000 [00:00<?, ?batch/s]2025-07-29 14:08:24,106 [INFO] [VALIDATION] ==================== Batch 1/1000 ====================
2025-07-29 14:08:24,106 [INFO] [VALIDATION] Global sample range: 32201-32216
2025-07-29 14:08:24,106 [INFO] [VALIDATION] Total processed: 32200 samples
2025-07-29 14:08:24,106 [INFO] [vLLM-BATCH] Starting batch, size=16, first sample ID=id_333620
2025-07-29 14:08:24,106 [INFO] [vLLM-BATCH] Preparing vLLM batch-inference data...
2025-07-29 14:08:24,788 [INFO] [vLLM-BATCH] Data preparation done: valid=16, invalid=0
2025-07-29 14:08:24,788 [INFO] [vLLM-INFERENCE] Starting vLLM batch inference, valid samples=16
2025-07-29 14:08:24,788 [INFO] [vLLM-INFERENCE] Preparing to send vLLM batch-inference requests, count=16
2025-07-29 14:08:24,788 [INFO] [vLLM-API] Processing 16 requests with vLLM batch inference
2025-07-29 14:08:24,788 [INFO] [vLLM-API] Processing batch 1, size: 16
2025-07-29 14:08:33,974 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:34,180 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:34,200 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:34,218 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:34,309 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:36,435 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:36,437 [INFO] [vLLM-API] Batch done, succeeded: 16/16
2025-07-29 14:08:36,437 [INFO] [vLLM-PARSE] Parsing vLLM batch-inference results, responses=16
2025-07-29 14:08:36,437 [INFO] [vLLM-PARSE] Result parsing done: total results=16
2025-07-29 14:08:36,437 [INFO] [vLLM-PARSE] Batch stats: success=13/16
2025-07-29 14:08:36,438 [INFO] [VALIDATION] Batch 1 done, elapsed: 12.33 s
2025-07-29 14:08:36,438 [INFO] [VALIDATION] Batch 1 results: correct=13, wrong=3
2025-07-29 14:08:36,438 [INFO] [VALIDATION] Cumulative stats: processed=32216, success rate=80.8%
Resumed validation: 0%| | 1/1000 [00:12<3:25:19, 12.33s/batch, processed=32216, success_rate=80.8%]2025-07-29 14:08:36,438 [INFO] [VALIDATION] ==================== Batch 2/1000 ====================
2025-07-29 14:08:36,438 [INFO] [VALIDATION] Global sample range: 32217-32232
2025-07-29 14:08:36,438 [INFO] [VALIDATION] Total processed: 32216 samples
2025-07-29 14:08:36,438 [INFO] [vLLM-BATCH] Starting batch, size=16, first sample ID=id_333634
2025-07-29 14:08:36,438 [INFO] [vLLM-BATCH] Preparing vLLM batch-inference data...
2025-07-29 14:08:36,968 [INFO] [vLLM-BATCH] Data preparation done: valid=16, invalid=0
2025-07-29 14:08:36,968 [INFO] [vLLM-INFERENCE] Starting vLLM batch inference, valid samples=16
2025-07-29 14:08:36,969 [INFO] [vLLM-INFERENCE] Preparing to send vLLM batch-inference requests, count=16
2025-07-29 14:08:36,969 [INFO] [vLLM-API] Processing 16 requests with vLLM batch inference
2025-07-29 14:08:36,969 [INFO] [vLLM-API] Processing batch 1, size: 16
2025-07-29 14:08:47,841 [WARNING] [vLLM-API] Request 2 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,841 [WARNING] [vLLM-API] Request 4 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,842 [WARNING] [vLLM-API] Request 8 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,842 [WARNING] [vLLM-API] Request 12 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,842 [WARNING] [vLLM-API] Request 10 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,842 [WARNING] [vLLM-API] Request 11 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,843 [WARNING] [vLLM-API] Request 6 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,843 [WARNING] [vLLM-API] Request 14 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,843 [WARNING] [vLLM-API] Request 7 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,843 [WARNING] [vLLM-API] Request 1 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,845 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:47,845 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:47,845 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:47,845 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:47,845 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:47,845 [WARNING] Connection pool is full, discarding connection: 172.21.0.67. Connection pool size: 10
2025-07-29 14:08:47,846 [WARNING] [vLLM-API] Request 13 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,846 [WARNING] [vLLM-API] Request 5 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,846 [WARNING] [vLLM-API] Request 3 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,846 [WARNING] [vLLM-API] Request 9 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,847 [WARNING] [vLLM-API] Request 16 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,847 [WARNING] [vLLM-API] Request 15 failed: vLLM API request failed, status code: 500
2025-07-29 14:08:47,848 [INFO] [vLLM-API] Batch done, succeeded: 0/16
2025-07-29 14:08:47,848 [INFO] [vLLM-PARSE] Parsing vLLM batch-inference results, responses=16
2025-07-29 14:08:47,848 [INFO] [vLLM-PARSE] Result parsing done: total results=16
2025-07-29 14:08:47,848 [INFO] [vLLM-PARSE] Batch stats: success=0/16
2025-07-29 14:08:47,849 [INFO] [VALIDATION] Batch 2 done, elapsed: 11.41 s
2025-07-29 14:08:47,849 [INFO] [VALIDATION] Batch 2 results: correct=0, wrong=16
2025-07-29 14:08:47,849 [INFO] [VALIDATION] Cumulative stats: processed=32232, success rate=80.7%
Resumed validation: 0%| | 1/1000 [00:23<3:25:19, 12.33s/batch, processed=32232, success_rate=80.7%]2025-07-29 14:08:47,849 [INFO] [VALIDATION] Estimated remaining time: 189.8 minutes
Resumed validation: 0%| | 2/1000 [00:23<3:16:06, 11.79s/batch, processed=32232, success_rate=80.7%]2025-07-29 14:08:47,849 [INFO] [VALIDATION] ==================== Batch 3/1000 ====================
2025-07-29 14:08:47,849 [INFO] [VALIDATION] Global sample range: 32233-32248
2025-07-29 14:08:47,849 [INFO] [VALIDATION] Total processed: 32232 samples
2025-07-29 14:08:47,849 [INFO] [vLLM-BATCH] Starting batch, size=16, first sample ID=id_333659
2025-07-29 14:08:47,849 [INFO] [vLLM-BATCH] Preparing vLLM batch-inference data...
2025-07-29 14:08:48,025 [INFO] [vLLM-BATCH] Data preparation done: valid=16, invalid=0
2025-07-29 14:08:48,025 [INFO] [vLLM-INFERENCE] Starting vLLM batch inference, valid samples=16
2025-07-29 14:08:48,025 [INFO] [vLLM-INFERENCE] Preparing to send vLLM batch-inference requests, count=16
2025-07-29 14:08:48,025 [INFO] [vLLM-API] Processing 16 requests with vLLM batch inference
2025-07-29 14:08:48,025 [INFO] [vLLM-API] Processing batch 1, size: 16
2025-07-29 14:08:48,034 [ERROR] [vLLM-API] Request 5 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c1f210>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,034 [ERROR] [vLLM-API] Request 4 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f364273d390>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,035 [ERROR] [vLLM-API] Request 1 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c120d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,037 [ERROR] [vLLM-API] Request 6 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c18850>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,039 [ERROR] [vLLM-API] Request 9 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637bcf5d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,039 [ERROR] [vLLM-API] Request 8 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637bced90>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,059 [ERROR] [vLLM-API] Request 13 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c050d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,059 [ERROR] [vLLM-API] Request 11 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637bfdbd0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,059 [ERROR] [vLLM-API] Request 12 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c06290>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,060 [ERROR] [vLLM-API] Request 14 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637bcfad0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,062 [ERROR] [vLLM-API] Request 16 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c1ac50>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,062 [ERROR] [vLLM-API] Request 10 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c05dd0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,076 [ERROR] [vLLM-API] Request 3 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c24810>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,078 [ERROR] [vLLM-API] Request 15 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c1a950>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,078 [ERROR] [vLLM-API] Request 2 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c07450>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-07-29 14:08:48,079 [ERROR] [vLLM-API] Request 7 error: request exception: HTTPConnectionPool(host='172.21.0.67', port=6699): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3637c1d410>: Failed to establish a new connection: [Errno 111] Connection refused'))