Seeking guidance to start Learning Inference optimization

Hi there!

I recently graduated, and I've chosen LLM inference as my career path. To get hands-on experience, I decided to build an LLM inference gateway project. I chose vLLM as my reference and am trying to create a micro version of the vLLM framework to understand the concepts behind inference. I would appreciate any thoughts on how to approach this learning path smartly.

Thanks all!

Great initiative! To understand LLM inference and vLLM, start by reading the Quickstart Guide and User Guide to get hands-on with basic inference and serving. Then, study the architecture diagrams and class hierarchy to grasp how vLLM manages requests, batching, and model execution.

For a micro-framework, focus on implementing: (1) input processing, (2) request batching, (3) model execution (using PyTorch), and (4) output post-processing, as these are vLLM’s core components. Review the LLM class API and example scripts for practical code patterns. Would you like a step-by-step breakdown or code examples?
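The four stages above can be wired together in a few dozen lines. This is a minimal sketch, not vLLM's actual code: the `Request` class, function names, and the dummy "model" (which just reverses the prompt string) are all placeholders I've invented for illustration; a real version would tokenize, run a PyTorch forward pass, and sample tokens in stage (3).

```python
from dataclasses import dataclass

@dataclass
class Request:
    request_id: int
    prompt: str
    output: str = ""

def process_input(prompt: str, request_id: int) -> Request:
    # (1) Input processing: wrap the raw prompt into a request object.
    return Request(request_id=request_id, prompt=prompt)

def batch_requests(requests, max_batch_size=4):
    # (2) Request batching: group pending requests into fixed-size batches.
    for i in range(0, len(requests), max_batch_size):
        yield requests[i:i + max_batch_size]

def execute_model(batch):
    # (3) Model execution: placeholder for the forward pass.
    # A real engine would tokenize, run the model, and sample tokens.
    for req in batch:
        req.output = req.prompt[::-1]  # dummy "generation"
    return batch

def postprocess(batch):
    # (4) Output post-processing: detokenize / format results.
    return {req.request_id: req.output for req in batch}

def serve(prompts):
    # One serving step: all four stages end to end.
    requests = [process_input(p, i) for i, p in enumerate(prompts)]
    results = {}
    for batch in batch_requests(requests):
        results.update(postprocess(execute_model(batch)))
    return results
```

Once this skeleton works, each stage can be replaced incrementally with the real thing (a tokenizer, a continuous-batching scheduler, a PyTorch model), which mirrors how vLLM's engine loop is organized.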

Hint: Mention @RunLLM in the post for followups.

@RunLLM I would like a step-by-step breakdown.