[Spec Decode] Why does the acceptance rate look close to the paper, but the throughput is still not high?

Baseline (original, without speculative decoding):
bs 1 tp=1: [05:55<00:00, 4.45s/it, est. speed input: 17.02 toks/s, output: 46.43 toks/s

With EAGLE:
K=4, AL: 2.06
bs 1 tp=1 k=4: [04:07<00:00, 3.09s/it, est. speed input: 24.49 toks/s, output: 66.86 toks/s

Speedup: 44% (46.43 -> 66.86 output toks/s)
The original paper reports roughly a 3x speedup.

You are comparing the same model with and without speculative decoding (EAGLE, K=4) at bs 1 and tp=1, and measuring a 44% gain in output throughput, whereas the original paper reports roughly 3x. This discrepancy could be due to differences in hardware, model configuration, or other experimental conditions, so the two numbers are only comparable once those match.
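To make the gap concrete, the numbers from the question can be checked with a short calculation. The sketch below recomputes the speedup from both the throughput and the s/it figures, then applies a simplified cost model that is an assumption, not something stated in the question: AL is taken to be the mean number of tokens produced per target-model step, and each of the K draft steps is assumed to cost a fixed fraction of one target step. The function names are hypothetical.

```python
# Figures copied from the two log lines in the question.
baseline_tok_s = 46.43   # output toks/s without speculative decoding
eagle_tok_s = 66.86      # output toks/s with EAGLE, K=4
baseline_s_it = 4.45     # seconds per request, baseline
eagle_s_it = 3.09        # seconds per request, EAGLE
accept_len = 2.06        # "AL" reported in the question
k = 4                    # number of draft tokens per step

# Measured speedup, computed two independent ways.
speedup_tok = eagle_tok_s / baseline_tok_s
speedup_lat = baseline_s_it / eagle_s_it


def modeled_speedup(accept_len: float, k: int, draft_cost_ratio: float) -> float:
    """ASSUMED model: one target step plus k draft steps yields accept_len tokens.

    draft_cost_ratio is the cost of one draft step relative to one target step.
    """
    return accept_len / (1.0 + k * draft_cost_ratio)


def implied_draft_cost_ratio(accept_len: float, k: int, observed: float) -> float:
    """Invert the assumed model to get the per-draft-step overhead it implies."""
    return (accept_len / observed - 1.0) / k


ceiling = modeled_speedup(accept_len, k, 0.0)  # draft model costs nothing
overhead = implied_draft_cost_ratio(accept_len, k, speedup_tok)

print(f"measured speedup (toks/s): {speedup_tok:.3f}x")
print(f"measured speedup (s/it):   {speedup_lat:.3f}x")
print(f"ceiling at AL={accept_len}: {ceiling:.2f}x")
print(f"implied draft cost per step: {overhead:.3f} of a target step")
```

Under these assumptions, an AL of 2.06 caps the speedup near 2.06x even with a free draft model, so reaching 3x would need a higher acceptance length and not only lower draft overhead.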

Would you like more detail on how to potentially achieve the 3x speedup reported in the original paper?
