By default, the `forward` method of `Qwen2ForCausalLM` (and other decoder-only LLMs) takes several pre-defined parameters as inputs:
```python
def forward(
    self,
    input_ids: torch.Tensor,
    positions: torch.Tensor,
    kv_caches: List[torch.Tensor],
    attn_metadata: AttentionMetadata,
    intermediate_tensors: Optional[IntermediateTensors] = None,
    inputs_embeds: Optional[torch.Tensor] = None,
) -> Union[torch.Tensor, IntermediateTensors]:
```
I want to introduce some custom parameters carried over from past generation turns that affect the behavior of the current forward pass (e.g. an external embedding that gets injected into the model's forward computation). But I can't find an appropriate way to implement this. Can anyone help?
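For context, here is a minimal, framework-agnostic sketch of the pattern I have in mind (the classes and method names below are toy placeholders I made up, not vLLM API): the model keeps a slot for an externally supplied embedding, the slot is set between generation turns, and the stored tensor is added to the input embeddings during the next forward pass.

```python
import numpy as np

class ToyDecoder:
    """Toy stand-in for a decoder-only LM; illustrates the injection
    pattern only and is NOT based on any real vLLM class."""

    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size
        self.external_embedding = None  # populated between turns

    def set_external_embedding(self, emb: np.ndarray) -> None:
        # Hypothetical hook: called after one generation turn finishes,
        # before the next forward pass starts.
        self.external_embedding = emb

    def forward(self, inputs_embeds: np.ndarray) -> np.ndarray:
        # Inject the embedding stored from the previous turn, if any.
        if self.external_embedding is not None:
            inputs_embeds = inputs_embeds + self.external_embedding
        # A real model would now run the transformer layers; here we
        # just return the (possibly modified) embeddings.
        return inputs_embeds

model = ToyDecoder(hidden_size=4)
x = np.ones((1, 4))
out1 = model.forward(x)                              # no injection yet
model.set_external_embedding(np.full((1, 4), 0.5))   # state from past turn
out2 = model.forward(x)                              # embedding injected
```

The open question is where such a `set_external_embedding`-style hook could live in vLLM's actual execution flow, given that the engine owns the model instance and the forward signature is fixed.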