Ray cluster DeepSeek-R1-Distill-Qwen-32B-AWQ

RunLLM · August 25, 2025, 7:23am

Your error "'NoneType' object has no attribute 'invoke'" occurs because the prompt variable is not passed, or not initialized, at the moment debug_llm is called. You are also specifying the wrong endpoint: Qwen2.5-VL-7B-Instruct needs /chat/completions, not /completions.

What to do:

Make sure that prompt is created via get_qa_prompt() and passed to both debug_llm and create_qa_chain.

Fix the launch command:

python retrieval_augmented_generation_with_langchain.py --file-path result.txt --milvus-host localhost --milvus-port 19530 --vllm-api-base http://192.168.13.140:8000/v1 --chat-model /mnt/nfs/models/Qwen2.5-VL-7B-Instruct/

(no quotes around the parameter values and no /completions at the end of vllm-api-base)

In the debug_llm function, use the correct endpoint:

response = requests.post(
    f"{llm.openai_api_base}/chat/completions",
    ...
)

Always pass prompt into the chain:

def create_qa_chain(retriever, llm, prompt):
    def debug_llm_wrapper(input_dict):
        return debug_llm(input_dict, prompt, llm)
    ...

If you make all of these changes, the error should go away.
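
The steps above can be sketched roughly as follows. This is a minimal sketch, not the script's actual code: normalize_api_base, make_debug_llm_wrapper and the injected post callable are hypothetical names, and a plain str.format template stands in for the LangChain prompt object so the example runs without extra dependencies.

```python
from typing import Any, Callable, Dict, Optional


def normalize_api_base(api_base: str) -> str:
    """Strip a trailing endpoint so the base ends at /v1 (assumed layout)."""
    base = api_base.rstrip("/")
    for suffix in ("/chat/completions", "/completions"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
    return base


def make_debug_llm_wrapper(
    prompt: Optional[str],
    model: str,
    api_base: str,
    post: Callable[[str, Dict[str, Any]], Any],
) -> Callable[[Dict[str, str]], Any]:
    """Bind prompt, model and endpoint once, so the chain step never sees a None prompt."""
    if prompt is None:
        # Fail early with a clear message instead of an AttributeError later.
        raise ValueError("prompt is None; create it before building the chain")
    url = normalize_api_base(api_base) + "/chat/completions"

    def debug_llm_wrapper(input_dict: Dict[str, str]) -> Any:
        # Hypothetical stand-in for prompt.invoke(...): fill a plain template.
        text = prompt.format(**input_dict)
        payload = {
            "model": model,
            "messages": [{"role": "user", "content": text}],
            "max_tokens": 100,
            "temperature": 0,
        }
        return post(url, payload)

    return debug_llm_wrapper
```

With the requests library, post could be something like `lambda url, payload: requests.post(url, json=payload, timeout=60).json()`. Injecting it keeps the wrapper testable without a running server.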

Sources:

retrieval_augmented_generation_with_langchain.py

Vladimir · August 25, 2025, 11:27am

curl -X POST "http://192.168.13.140:8000/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/mnt/nfs/models/Qwen2.5-VL-7B-Instruct/",
    "prompt": "What is the capital of France?",
    "max_tokens": 100,
    "temperature": 0
  }'
{"object":"error","message":"The model does not support Completions API","type":"BadRequestError","param":null,"code":400}
(vllm_env)

RunLLM · August 25, 2025, 11:27am

The Qwen2.5-VL-7B-Instruct model does not support the /v1/completions endpoint, only /v1/chat/completions (documentation). For a correct request, use:

curl -X POST "http://192.168.13.140:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/mnt/nfs/models/Qwen2.5-VL-7B-Instruct/",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 100,
    "temperature": 0
  }'

Would you like a Python example?

Sources:

OpenAI Chat Completions API with vLLM
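
For reference, the curl request above could be sent from Python along these lines. This is a minimal sketch using only the standard library; build_chat_request and ask are hypothetical helper names, and the payload simply mirrors the curl body.

```python
import json
import urllib.request


def build_chat_request(api_base: str, model: str, question: str) -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint under api_base."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 100,
        "temperature": 0,
    }
    return urllib.request.Request(
        api_base.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(api_base: str, model: str, question: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply.

    urlopen raises urllib.error.HTTPError on a 4xx/5xx status, so a
    400 reply from the server surfaces as an exception here.
    """
    request = build_chat_request(api_base, model, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `ask("http://192.168.13.140:8000/v1", "/mnt/nfs/models/Qwen2.5-VL-7B-Instruct/", "What is the capital of France?")`, assuming the server is reachable at that address.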

Vladimir · August 25, 2025, 11:38am

curl -X POST "http://192.168.13.140:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/mnt/nfs/models/Qwen2.5-VL-7B-Instruct/",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 100,
    "temperature": 0
  }'
{"object":"error","message":"The model does not support Chat Completions API","type":"BadRequestError","param":null,"code":400}
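
The two error bodies posted in this thread can be compared side by side with a small helper. This is only a sketch: error_message and summarize are hypothetical names, and the error shape (an "object" field equal to "error" plus a "message" field) is taken from the responses shown here rather than from any documented schema.

```python
import json
from typing import Dict, Optional


def error_message(body: str) -> Optional[str]:
    """Return the message of an error body shaped like the ones above, else None."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and data.get("object") == "error":
        return str(data.get("message", "unknown error"))
    return None


def summarize(responses: Dict[str, str]) -> Dict[str, str]:
    """Map each endpoint path to 'ok' or to the error message it returned."""
    return {path: (error_message(body) or "ok") for path, body in responses.items()}
```

If both /v1/completions and /v1/chat/completions come back with an error for well-formed requests, as they do here, the request body is probably not the cause. That reading is an assumption and is not confirmed anywhere in the thread.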
