How to see which model and parameters are currently running?
It's running in a vLLM Docker container.
It seems likely that vLLM does not currently provide a direct API endpoint to view all runtime configuration parameters (such as max_model_len) of a running service. However, you can see which model is being served by querying the /v1/models endpoint, for example with `curl http://localhost:8000/v1/models`. This returns the model IDs and some metadata, including the model name and, if you are using LoRA adapters, their lineage. It works for both base models and LoRA adapters in OpenAI-compatible mode, even when running in Docker, as described in the docs and examples (docs.vllm.ai - LoRA, docs.vllm.ai - Quickstart).
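Since /v1/models follows the OpenAI-compatible schema, the served model IDs can be extracted with a few lines of Python. A minimal sketch, assuming the default port 8000; the sample payload and the model name in it are illustrative, not taken from your deployment:

```python
import json
import urllib.request

def list_served_models(base_url="http://localhost:8000"):
    """Query the OpenAI-compatible /v1/models endpoint and return model IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]

# Offline illustration of the response shape (hypothetical model name):
sample = {"object": "list",
          "data": [{"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model"}]}
ids = [m["id"] for m in sample["data"]]
print(ids)  # the base model, plus any LoRA adapters, appear as separate entries
```

With a live server you would call `list_served_models()` directly instead of using the sample payload.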
For detailed runtime parameters (such as max_model_len or gpu_memory_utilization), my understanding is that these are not exposed via any API endpoint in current vLLM releases; they must be checked from the original docker run command or from the server's startup logs. There is an open feature request for exposing this configuration, but it has not yet been implemented (GitHub Issue #12001). Would you like more detail on workarounds or on how to check the logs for this information?
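As a workaround, the startup log line (retrievable with `docker logs <container>`) typically prints the engine configuration as key=value pairs, which can be scraped. A rough sketch; the log line below is hypothetical, and the exact format varies across vLLM versions, so treat the regex as a starting point:

```python
import re

def extract_engine_args(log_text):
    """Pull key=value pairs (e.g. max_model_len=4096) out of a vLLM startup log.

    Assumes the engine config is printed as comma-separated key=value pairs;
    adjust the pattern for your vLLM version's actual log format.
    """
    pairs = re.findall(r"(\w+)=([^,)\s]+)", log_text)
    return dict(pairs)

# Hypothetical startup line; capture the real one with `docker logs <container>`
log = ("INFO engine.py: Initializing an LLM engine with config: "
       "model='mymodel', max_model_len=4096, gpu_memory_utilization=0.9")
args = extract_engine_args(log)
print(args["max_model_len"])
```

Note the extracted values are strings; cast them (e.g. `int(args["max_model_len"])`) before comparing numerically.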