[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server #10635
Commits on Nov 25, 2024
- don't block GIL in tokenization (preprocess) in OpenAI compatible server by using threadpool for tokenization
  Signed-off-by: Tomer Asida <[email protected]>
  Commit 6af8e61
- Commit 821665b
- remove commit_id that was mistakenly added
  Signed-off-by: Tomer Asida <[email protected]>
  Commit 5f2164a
- simpler - just assign methods in init
  Signed-off-by: Tomer Asida <[email protected]>
  Commit dd01b53
- Commit 4a6efcb
- async tokenization also in serving_score.py
  Signed-off-by: Tomer Asida <[email protected]>
  Commit f89eaa0
- Commit 980fff8
- no need to make self._tokenize_prompt_inputs async as it's used only in self._tokenize_prompt_input
  Signed-off-by: Tomer Asida <[email protected]>
  Commit da646c1
- make self._tokenize_prompt_input_or_inputs return a list so make_async will actually make execution run in thread and not just generator creation
  Signed-off-by: Tomer Asida <[email protected]>
  Commit b61a04f
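
The distinction drawn in commit b61a04f (running the work in a thread versus only creating a generator there) can be illustrated with a minimal sketch. It does not use vLLM's code: `slow_tokenize`, `tokenize_lazy`, and `tokenize_eager` are hypothetical stand-ins, and the standard `loop.run_in_executor` call is assumed to behave like the `make_async` helper named in the commit message.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def slow_tokenize(text):
    """Hypothetical stand-in for a tokenizer call; records the thread it ran on."""
    return text.split(), threading.current_thread().name


def tokenize_lazy(prompts):
    # Returns a generator: no tokenization happens until the result is iterated.
    return (slow_tokenize(p) for p in prompts)


def tokenize_eager(prompts):
    # Returns a list: all tokenization happens inside this call.
    return [slow_tokenize(p) for p in prompts]


async def main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.current_thread().name
    prompts = ["hello world", "foo bar baz"]
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="tok") as pool:
        lazy = await loop.run_in_executor(pool, tokenize_lazy, prompts)
        eager = await loop.run_in_executor(pool, tokenize_eager, prompts)
    # The generator is consumed here, so its work runs on the event-loop thread.
    lazy_threads = {name for _, name in lazy}
    # The list was fully built inside the worker thread.
    eager_threads = {name for _, name in eager}
    return loop_thread, lazy_threads, eager_threads


loop_thread, lazy_threads, eager_threads = asyncio.run(main())
print("generator work ran on:", lazy_threads)
print("list work ran on:", eager_threads)
```

In this sketch only the eager variant keeps the tokenization off the event-loop thread, which appears to be the reason the commit switches the return type to a list.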
Commits on Nov 26, 2024
- introduce threadsafe tokenizer and use in MQLLMEngineClient
  Signed-off-by: Tomer Asida <[email protected]>
  Commit e4cb992
- Commit e59cc81
- Use ThreadPoolExecutor with max_workers=1 to make tokenization async. No need for threadsafe tokenizer anymore since all tokenization happens on the same thread
  Signed-off-by: Tomer Asida <[email protected]>
  Commit f0c0a2f
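
The reasoning in commit f0c0a2f (a single-worker pool serializes all tokenization onto one thread, so the tokenizer itself need not be thread-safe) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `make_async_sketch` and `FakeTokenizer` are hypothetical names, and only standard-library calls are used.

```python
import asyncio
import functools
import threading
from concurrent.futures import ThreadPoolExecutor


def make_async_sketch(func, executor):
    """Hypothetical helper: wrap a blocking function so it runs on `executor`."""
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(executor, call)
    return wrapper


class FakeTokenizer:
    """Stand-in for a tokenizer whose internal state is not safe to share across threads."""

    def __init__(self):
        self.seen_threads = set()

    def encode(self, text):
        # Record which thread performed the work.
        self.seen_threads.add(threading.get_ident())
        return [len(word) for word in text.split()]


async def main():
    tokenizer = FakeTokenizer()
    # One worker: every submitted call is queued and runs on the same thread.
    executor = ThreadPoolExecutor(max_workers=1)
    encode_async = make_async_sketch(tokenizer.encode, executor)
    results = await asyncio.gather(
        *(encode_async(f"request number {i}") for i in range(8))
    )
    executor.shutdown()
    return results, tokenizer.seen_threads


results, seen_threads = asyncio.run(main())
print(len(results), "requests tokenized on", len(seen_threads), "worker thread(s)")
```

The trade-off in this design is that concurrent requests queue behind one another for tokenization, in exchange for keeping the event loop free and avoiding locks around the tokenizer.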
Commits on Nov 27, 2024
- Add tests to validate that (1) truncated and non-truncated requests can be sent concurrently and (2) that /health response time is short under high tokenization load
  Signed-off-by: Tomer Asida <[email protected]>
  Commit b35a063
- Commit ff1d6a9
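
The second property tested in commit b35a063 (short /health response time under tokenization load) can be approximated without a server by measuring how long the event loop stalls. The sketch below is an assumption-laden simulation, not the PR's test: `blocking_tokenize` uses `time.sleep` as a stand-in for slow tokenization (which releases the GIL, as native tokenizers may or may not do), and `health_probe` is a hypothetical substitute for polling a /health endpoint.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_tokenize(seconds):
    # Stand-in for slow tokenization; time.sleep releases the GIL.
    time.sleep(seconds)


async def health_probe(stop):
    """Return the worst observed delay of a 10 ms sleep on the event loop."""
    worst = 0.0
    while not stop.is_set():
        start = time.perf_counter()
        await asyncio.sleep(0.01)
        worst = max(worst, time.perf_counter() - start)
    return worst


async def measure(offload):
    stop = asyncio.Event()
    probe = asyncio.create_task(health_probe(stop))
    await asyncio.sleep(0.05)  # let the probe start ticking
    if offload:
        loop = asyncio.get_running_loop()
        with ThreadPoolExecutor(max_workers=1) as pool:
            # The event loop keeps running while the worker thread is busy.
            await loop.run_in_executor(pool, blocking_tokenize, 0.3)
    else:
        # Runs directly on the event-loop thread and stalls every other task.
        blocking_tokenize(0.3)
    stop.set()
    return await probe


blocked_latency = asyncio.run(measure(offload=False))
offloaded_latency = asyncio.run(measure(offload=True))
print(f"inline: {blocked_latency:.3f}s, offloaded: {offloaded_latency:.3f}s")
```

With the work run inline, the probe's worst delay is roughly the full blocking time; with the work offloaded, it should stay near the probe interval, which is the behavior a /health latency test would check for.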