vLLM benchmark-accuracy #10328
YnRen22852
started this conversation in
General
Hi everyone,
I am trying to modify vLLM's code to improve throughput by changing how token similarity is handled. Specifically, I aim to reuse a precomputed token-similarity result instead of recalculating it every time, which should improve throughput.
However, I would like to assess the impact of this change on the accuracy of the generated tokens. Currently, the benchmark scripts do not include code to evaluate the accuracy of the generated tokens.
I am wondering how to modify the code to measure the token accuracy before and after making the optimization. The goal is to compare the token accuracy of the original vLLM generation method with the modified version that uses token similarity replacement for improved throughput.
Any guidance or suggestions on how to approach this would be greatly appreciated. Thank you for your help! Cheers.
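One way to approach this: run the same prompts through the stock build and the modified build with greedy decoding (so the baseline is deterministic), record the generated token ids from each run, and score the modified output position-by-position against the baseline. Below is a minimal sketch of such a comparison; the helper `token_match_accuracy` and the choice of `facebook/opt-125m` are my own illustrative assumptions, not part of vLLM's benchmark suite.

```python
from typing import Sequence

def token_match_accuracy(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Fraction of positions where the candidate token id matches the reference.

    Positions are compared pairwise; any length difference between the two
    sequences is counted as mismatches against the longer one.
    """
    if not reference and not candidate:
        return 1.0
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / max(len(reference), len(candidate))

# Hypothetical usage with vLLM's offline API (run once with the stock build,
# save the ids, then rerun with the patched build):
#
#   from vllm import LLM, SamplingParams
#   llm = LLM(model="facebook/opt-125m")
#   params = SamplingParams(temperature=0.0, max_tokens=128)  # greedy decoding
#   outputs = llm.generate(prompts, params)
#   baseline_ids = [list(o.outputs[0].token_ids) for o in outputs]
#   ...
#   scores = [token_match_accuracy(b, m)
#             for b, m in zip(baseline_ids, modified_ids)]
```

Averaging `token_match_accuracy` over a prompt set gives a simple before/after number; for a task-level view of accuracy (rather than exact token agreement), an external evaluation harness run against both builds would complement this.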