vLLM benchmark-accuracy #10328
YnRen22852
started this conversation in
General
Hi everyone,
I am trying to modify vLLM's code to improve throughput by changing how token similarity is handled. Specifically, I aim to reuse a precomputed token-similarity result instead of recalculating it every time, which should improve throughput.
However, I would like to assess the impact of this change on the accuracy of the generated tokens. Currently, the benchmark scripts do not include code to evaluate the accuracy of the generated tokens.
I am wondering how to modify the code to measure the token accuracy before and after making the optimization. The goal is to compare the token accuracy of the original vLLM generation method with the modified version that uses token similarity replacement for improved throughput.
Any guidance or suggestions on how to approach this would be greatly appreciated. Thank you for your help! Cheers.
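One way to approach this: run the same prompts through the stock build and the modified build with greedy decoding (so the baseline is deterministic), record the generated token ids from each run, and score the modified output position-by-position against the baseline. Below is a minimal sketch of such a comparison; the helper `token_match_accuracy` and the choice of `facebook/opt-125m` are my own illustrative assumptions, not part of vLLM's benchmark suite.

```python
from typing import Sequence

def token_match_accuracy(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Fraction of positions where the candidate token id matches the reference.

    Positions are compared pairwise; any length difference between the two
    sequences is counted as mismatches against the longer one.
    """
    if not reference and not candidate:
        return 1.0
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / max(len(reference), len(candidate))

# Hypothetical usage with vLLM's offline API (run once with the stock build,
# save the ids, then rerun with the patched build):
#
#   from vllm import LLM, SamplingParams
#   llm = LLM(model="facebook/opt-125m")
#   params = SamplingParams(temperature=0.0, max_tokens=128)  # greedy decoding
#   outputs = llm.generate(prompts, params)
#   baseline_ids = [list(o.outputs[0].token_ids) for o in outputs]
#   ...
#   scores = [token_match_accuracy(b, m)
#             for b, m in zip(baseline_ids, modified_ids)]
```

Averaging `token_match_accuracy` over a prompt set gives a simple before/after number; for a task-level view of accuracy (rather than exact token agreement), an external evaluation harness run against both builds would complement this.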