How to see which model and parameters are currently running?
It's running in a vLLM Docker container.
It seems likely that vLLM does not currently provide a direct API endpoint to view all runtime configuration parameters (such as max_model_len) of a running service. However, you can see which model is being served by querying the /v1/models endpoint, for example with `curl http://localhost:8000/v1/models`. This returns the model IDs and some metadata, including the model name and, if you are using LoRA adapters, their lineage. It works for both base models and LoRA adapters in OpenAI-compatible mode, even when running in Docker, as described in the docs and examples (docs.vllm.ai - LoRA, docs.vllm.ai - Quickstart).
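Since /v1/models follows the OpenAI-compatible schema, the served model IDs can be extracted with a few lines of Python. A minimal sketch, assuming the default port 8000; the sample payload and the model name in it are illustrative, not taken from your deployment:

```python
import json
import urllib.request

def list_served_models(base_url="http://localhost:8000"):
    """Query the OpenAI-compatible /v1/models endpoint and return model IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]

# Offline illustration of the response shape (hypothetical model name):
sample = {"object": "list",
          "data": [{"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model"}]}
ids = [m["id"] for m in sample["data"]]
print(ids)  # the base model, plus any LoRA adapters, appear as separate entries
```

With a live server you would call `list_served_models()` directly instead of using the sample payload.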
For detailed runtime parameters (such as max_model_len or gpu_memory_utilization), my understanding is that these are not exposed via any API endpoint in current vLLM releases; they must be checked from the original docker run command or from the server's startup logs. There is an open feature request for exposing this configuration, but it has not yet been implemented (GitHub Issue #12001). Would you like more detail on workarounds or on how to check the logs for this information?
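As a workaround, the startup log line (retrievable with `docker logs <container>`) typically prints the engine configuration as key=value pairs, which can be scraped. A rough sketch; the log line below is hypothetical, and the exact format varies across vLLM versions, so treat the regex as a starting point:

```python
import re

def extract_engine_args(log_text):
    """Pull key=value pairs (e.g. max_model_len=4096) out of a vLLM startup log.

    Assumes the engine config is printed as comma-separated key=value pairs;
    adjust the pattern for your vLLM version's actual log format.
    """
    pairs = re.findall(r"(\w+)=([^,)\s]+)", log_text)
    return dict(pairs)

# Hypothetical startup line; capture the real one with `docker logs <container>`
log = ("INFO engine.py: Initializing an LLM engine with config: "
       "model='mymodel', max_model_len=4096, gpu_memory_utilization=0.9")
args = extract_engine_args(log)
print(args["max_model_len"])
```

Note the extracted values are strings; cast them (e.g. `int(args["max_model_len"])`) before comparing numerically.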