[LLM:Bugfix] Fix incorrect kv_seq_len in gen_attention_mask() for autoregressive decoding #3403

HirokenOvo · 2025-04-28T15:18:52Z

During the decoding phase (autoregressive generation), Llm::generate() calls forward(input_ids, is_prefill) with input_ids.size() == 1. The current logic in Llm::gen_attention_mask() sets kv_seq_len = seq_len (where seq_len = 1 during decoding), which is incorrect: it ignores the accumulated history length (mContext->all_seq_len). The resulting attention mask has the wrong shape, which affects the attention scores and therefore the model's inference results.
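To illustrate the shape issue, here is a minimal standalone sketch, not the actual MNN code. The `Mask` struct and `make_causal_mask` helper are hypothetical names, and the sketch assumes the key/value length should be the history length plus the current step's length, with an additive float mask (0 for "attend", lowest float for "blocked").

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container for illustration only (not an MNN type):
// a row-major additive mask of shape [seq_len, kv_seq_len].
struct Mask {
    std::size_t seq_len;
    std::size_t kv_seq_len;
    std::vector<float> data;  // 0.0f = may attend, lowest float = blocked
};

// Hypothetical helper: builds a causal mask that accounts for cached history.
// Assumption: kv_seq_len = history_len + seq_len, where history_len plays the
// role of the accumulated length (mContext->all_seq_len in the description).
Mask make_causal_mask(std::size_t history_len, std::size_t seq_len) {
    const std::size_t kv_seq_len = history_len + seq_len;
    const float blocked = std::numeric_limits<float>::lowest();
    Mask m{seq_len, kv_seq_len,
           std::vector<float>(seq_len * kv_seq_len, 0.0f)};
    for (std::size_t i = 0; i < seq_len; ++i) {
        // Query i sits at absolute position history_len + i, so it may
        // attend to keys 0..history_len + i; later keys are blocked.
        for (std::size_t j = history_len + i + 1; j < kv_seq_len; ++j) {
            m.data[i * kv_seq_len + j] = blocked;
        }
    }
    return m;
}
```

Under these assumptions, `make_causal_mask(5, 1)` yields a 1 x 6 mask in which the single new token can attend to all five cached positions and itself. Setting kv_seq_len = seq_len would instead produce a 1 x 1 mask that no longer matches the cached keys.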


CLAassistant · 2025-04-28T15:18:58Z

All committers have signed the CLA.

Commit 1909274: [LLM:Bugfix] Fix incorrect kv_seq_len in gen_attention_mask() for autoregressive decoding
