Explanation for wait_for_save

During a wait_for_save call, will all the data already be in blocks, or will it still be in flight? Does wait_for_save just mean that the data has been generated and is in flight to the blocks?

During a wait_for_save call, the method blocks until all asynchronous save operations (i.e., saving KV cache layers from vLLM’s paged buffer to the connector) are complete. Once wait_for_save returns, all relevant data has been written to the destination (e.g., an external KV cache or offloading backend) and nothing is still “in flight”. While wait_for_save is still running, some data may still be in flight (actively being saved). Blocking at this point keeps the paged KV buffer from being overwritten before the save is done.
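The blocking behavior described above can be illustrated with a minimal sketch. This is not vLLM or LMCache code: `ToyConnector`, `external_store`, and the simplified `save_kv_layer(layer_name, kv_layer)` signature are hypothetical, and a thread pool with a short sleep stands in for the real asynchronous transfer.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor, wait


class ToyConnector:
    """Hypothetical stand-in for a KV connector; not the vLLM implementation."""

    def __init__(self, copy_delay_s: float = 0.05) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending: list[Future] = []
        self._copy_delay_s = copy_delay_s
        self._lock = threading.Lock()
        # Stand-in for the external KV cache or offloading backend.
        self.external_store: dict[str, bytes] = {}

    def _copy(self, layer_name: str, data: bytes) -> None:
        time.sleep(self._copy_delay_s)  # simulate transfer latency
        with self._lock:
            self.external_store[layer_name] = data

    def save_kv_layer(self, layer_name: str, kv_layer: bytes) -> None:
        # Starts the save and returns immediately; the data is now "in flight".
        self._pending.append(self._pool.submit(self._copy, layer_name, kv_layer))

    def wait_for_save(self) -> None:
        # Blocks until every save started so far has finished.
        wait(self._pending)
        for future in self._pending:
            future.result()  # re-raise any error from a background copy
        self._pending.clear()


connector = ToyConnector()
for i in range(4):
    connector.save_kv_layer(f"layer_{i}", bytes([i]) * 8)

# Right after the saves are started, some layers have not reached the store yet.
in_flight_before = 4 - len(connector.external_store)

connector.wait_for_save()

# After wait_for_save returns, every layer is in the store.
saved_after = len(connector.external_store)
```

In this sketch, calling `wait_for_save` is the only point at which the caller can assume the source buffer is safe to reuse.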

This behavior is explicitly described in the API documentation and source code for connectors such as LMCache and the base KVConnector, which state that wait_for_save “blocks until all the save operations is done” and is called “as the forward context exits to ensure that the async saving from save_kv_layer is complete before finishing the forward” (source, source). Would you like more detail on the internal mechanics or specific connector implementations?

Hint: Mention @RunLLM in the post for followups.

@RunLLM when wait_for_save is called, is all the data already present in the paged_buffer? From my observation, it seems to take around 2-3 seconds for the data to reach the paged_buffer.

No. When wait_for_save is called, it blocks until all data has been saved from the paged buffer to the external destination; it does not guarantee that all data is present in the paged buffer at the moment of the call. Some data may still be in flight (being copied out of the paged buffer), and wait_for_save returns only after these async save operations complete (source, source).

The observed delay (~2-3 seconds) likely reflects the time taken for the asynchronous save operations to complete, including copying data from the paged buffer to the external storage. The method only returns after all such transfers are done, not when all data is initially present in the paged buffer. Would you like a step-by-step breakdown of the save process?
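One way to check where such a delay is spent is to time the enqueue step and the blocking wait separately. The snippet below is a rough sketch using only the Python standard library; `slow_copy` is a hypothetical placeholder for an asynchronous save, and the 0.1-second sleep is an arbitrary stand-in for transfer time.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_copy(seconds: float) -> None:
    # Hypothetical placeholder for one asynchronous save operation.
    time.sleep(seconds)


with ThreadPoolExecutor(max_workers=1) as pool:
    # Phase 1: start the saves. This returns almost immediately.
    t0 = time.perf_counter()
    pending = [pool.submit(slow_copy, 0.1) for _ in range(3)]
    enqueue_s = time.perf_counter() - t0

    # Phase 2: block until all saves finish. The delay shows up here.
    t1 = time.perf_counter()
    wait(pending)
    blocked_s = time.perf_counter() - t1

print(f"enqueue: {enqueue_s:.4f}s, blocked in wait: {blocked_s:.4f}s")
```

If most of the time appears in the second phase, the delay comes from the transfers completing rather than from the call that starts them.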
