Is there a way to make it work when the input context is longer than the maximum context length? It always produces this error:
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] Error in preprocessing prompt inputs
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] Traceback (most recent call last):
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 235, in create_chat_completion
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] ) = await self._preprocess_chat(
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 1133, in _preprocess_chat
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] prompt_inputs = await self._tokenize_prompt_input_async(
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 990, in _tokenize_prompt_input_async
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] async for result in self._tokenize_prompt_inputs_async(
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 1011, in _tokenize_prompt_inputs_async
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] yield await self._normalize_prompt_text_to_input(
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 881, in _normalize_prompt_text_to_input
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] return self._validate_input(request, input_ids, input_text)
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 970, in _validate_input
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] raise ValueError(
(APIServer pid=1) ERROR 12-03 00:16:55 [serving_chat.py:257] ValueError: 'max_tokens' or 'max_completion_tokens' is too large: 21333. This model's maximum context length is 196608 tokens and your request has 175808 input tokens (21333 > 196608 - 175808).
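Note that the error here is purely arithmetic: the prompt (175808 tokens) plus the requested completion budget (21333 tokens) exceeds the model's 196608-token context window, so validation fails before generation starts. One client-side workaround is to clamp `max_tokens` to whatever room is left. A minimal sketch of that arithmetic (the function name is mine for illustration; in practice the prompt token count would come from the server's tokenizer or a tokenize endpoint, not be hard-coded):

```python
def clamp_max_tokens(requested: int, input_tokens: int, context_len: int) -> int:
    """Clamp the completion budget so prompt + completion fits in the context window."""
    remaining = context_len - input_tokens
    if remaining <= 0:
        # The prompt alone overflows the window; no max_tokens value can help,
        # so the prompt itself must be shortened (or the served max-model-len raised).
        raise ValueError(
            f"Prompt uses {input_tokens} tokens but the context window is {context_len}."
        )
    return min(requested, remaining)

# Numbers from the traceback above:
print(clamp_max_tokens(21333, 175808, 196608))  # -> 20800
```

This only prevents the validation error; it cannot make an over-long prompt fit. If the input itself is longer than the window, the prompt has to be truncated or summarized client-side (or the server started with a larger `--max-model-len`, within what the model actually supports).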