[Bug] Why does max_new_tokens affect token usage? #962
Replies: 3 comments 10 replies
-
It's not a bug. If you have similar questions in the future, please ask in Discussions.
1 reply
-
If you admit too many requests at once, they will run out of memory later, and you will have to pause or evict some of them. That can be inefficient.
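A toy sketch of that trade-off, assuming a fixed token pool and a made-up `reserve_ratio` knob (1.0 reserves the full `max_new_tokens` per request, 0.0 reserves only the prompt). The names `Request`, `admit`, `reserve_ratio`, and `worst_case_usage` are hypothetical and this is not the project's actual scheduler; it only illustrates why admission has to look at `max_new_tokens` even when memory itself is paged on demand.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_len: int
    max_new_tokens: int
    actual_new_tokens: int  # not known to the scheduler at admission time


def admit(requests, capacity, reserve_ratio):
    """Admit requests in order while the estimated footprint fits.

    reserve_ratio is a hypothetical knob, not a real server flag.
    """
    admitted, reserved = [], 0
    for r in requests:
        need = r.prompt_len + int(reserve_ratio * r.max_new_tokens)
        if reserved + need > capacity:
            break
        admitted.append(r)
        reserved += need
    return admitted


def worst_case_usage(admitted):
    """Upper bound on tokens held if no admitted request finishes early."""
    return sum(r.prompt_len + r.actual_new_tokens for r in admitted)


capacity = 1000
# Ten requests that all really do generate their full max_new_tokens.
reqs = [Request(prompt_len=50, max_new_tokens=200, actual_new_tokens=200)
        for _ in range(10)]

conservative = admit(reqs, capacity, reserve_ratio=1.0)
optimistic = admit(reqs, capacity, reserve_ratio=0.0)

# Conservative admission stays within the pool; optimistic admission
# overshoots, so some requests would have to be paused or evicted.
print(len(conservative), worst_case_usage(conservative) <= capacity)
print(len(optimistic), worst_case_usage(optimistic) <= capacity)
```

In this toy setup the optimistic policy admits every request up front, but their combined footprint later exceeds the pool, which is the pause-or-evict situation described above.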
9 replies
-
See also #1278.
0 replies
-
Checklist
Describe the bug
So this is not paged attention? Shouldn't memory for max_new_tokens be allocated on demand? Isn't memory wasted when max_new_tokens is large?
Reproduction
“The case of serving being too conservative can happen when users send many requests with a large max_new_tokens but the requests stop very early due to EOS or stop strings.”
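A back-of-the-envelope sketch of that case, assuming the scheduler reserves the full `prompt_len + max_new_tokens` per request at admission time. The function name `utilization` and all the numbers are illustrative assumptions, not measurements from a real server.

```python
def utilization(capacity, prompt_len, max_new_tokens, actual_new_tokens):
    """Return (admitted requests, fraction of the pool actually used)
    when each request reserves its full max_new_tokens up front."""
    per_request_reserved = prompt_len + max_new_tokens
    admitted = capacity // per_request_reserved
    used = admitted * (prompt_len + actual_new_tokens)
    return admitted, used / capacity


# Large max_new_tokens, but every request stops early (EOS or a stop string).
admitted, frac = utilization(
    capacity=100_000,      # hypothetical token pool size
    prompt_len=100,
    max_new_tokens=4_000,
    actual_new_tokens=50,
)
print(admitted, f"{frac:.1%}")
```

With these made-up numbers only 24 requests are admitted and under 4% of the pool is ever used, which is the "too conservative" behavior the quote describes.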
Environment