[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server #10635

Merged: 14 commits, Nov 27, 2024

Commits on Nov 25, 2024

  1. don't block GIL in tokenization (preprocess) in OpenAI compatible server by using threadpool for tokenization
    
    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    6af8e61
  2. format

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    821665b
  3. remove commit_id that was mistakenly added

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    5f2164a
  4. simpler - just assign methods in init

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    dd01b53
  5. format

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    4a6efcb
  6. async tokenization also in serving_score.py

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    f89eaa0
  7. format

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    980fff8
  8. no need to make self._tokenize_prompt_inputs async as it's used only in self._tokenize_prompt_input
    
    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    da646c1
  9. make self._tokenize_prompt_input_or_inputs return a list so make_async will actually make execution run in thread and not just generator creation
    
    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 25, 2024
    b61a04f
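
The last commit above turns on a subtle point: when a generator function is handed to an executor, only the creation of the generator object happens on the worker thread, and the actual work runs later on whichever thread iterates it. The sketch below illustrates this with plain `asyncio` and `loop.run_in_executor` rather than the project's own `make_async` helper. `fake_tokenize`, `tokenize_lazy`, and `tokenize_eager` are hypothetical names invented for the illustration and are not taken from the PR.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_tokenize(text):
    """Hypothetical stand-in for a tokenizer; records the thread it ran on."""
    return threading.current_thread().name, [ord(ch) for ch in text]


def tokenize_lazy(prompts):
    # Generator function: calling it only builds a generator object.
    for prompt in prompts:
        yield fake_tokenize(prompt)


def tokenize_eager(prompts):
    # Returns a list, so all of the work happens inside this call.
    return [fake_tokenize(prompt) for prompt in prompts]


async def main():
    loop = asyncio.get_running_loop()
    prompts = ["hello", "world"]
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="tok") as pool:
        lazy = await loop.run_in_executor(pool, tokenize_lazy, prompts)
        eager = await loop.run_in_executor(pool, tokenize_eager, prompts)
    # Iterating the generator here runs fake_tokenize on the event loop thread.
    lazy_threads = {name for name, _ in lazy}
    eager_threads = {name for name, _ in eager}
    return lazy_threads, eager_threads


lazy_threads, eager_threads = asyncio.run(main())
print("lazy ran on:", lazy_threads)
print("eager ran on:", eager_threads)
```

In this sketch the eager variant reports a worker thread whose name starts with `tok`, while the lazy variant reports the thread that consumed the generator, which is the one running the event loop.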

Commits on Nov 26, 2024

  1. introduce threadsafe tokenizer and use in MQLLMEngineClient

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 26, 2024
    e4cb992
  2. format

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 26, 2024
    e59cc81
  3. Use ThreadPoolExecutor with max_workers=1 to make tokenization async. No need for threadsafe tokenizer anymore since all tokenization happens on the same thread
    
    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 26, 2024
    f0c0a2f
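
The reasoning in the last commit above can be sketched as follows: with `max_workers=1`, every tokenization call is queued onto the same worker thread, so calls never overlap and the tokenizer itself does not need to be thread-safe. This is a minimal illustration under stated assumptions, not the PR's code. `CountingTokenizer` is a hypothetical class that only records how many calls were active at once and which threads it ran on.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CountingTokenizer:
    """Hypothetical non-thread-safe tokenizer that tracks concurrent entry."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self.threads = set()

    def encode(self, text):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        self.threads.add(threading.get_ident())
        time.sleep(0.01)  # stand-in for real tokenization work
        self.active -= 1
        return text.split()


async def tokenize_all(prompts):
    tokenizer = CountingTokenizer()
    loop = asyncio.get_running_loop()
    # A single worker serializes all calls onto one thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, tokenizer.encode, p) for p in prompts)
        )
    return results, tokenizer


results, tokenizer = asyncio.run(tokenize_all(["a b", "c d e", "f"]))
print(results, tokenizer.max_active, len(tokenizer.threads))
```

The trade-off of this design is that tokenization requests are processed one at a time, in exchange for not having to guard the tokenizer with locks.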

Commits on Nov 27, 2024

  1. Add tests to validate that (1) truncated and non-truncated requests can be sent concurrently and (2) that /health response time is short under high tokenization load
    
    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 27, 2024
    b35a063
  2. add comment

    Signed-off-by: Tomer Asida <[email protected]>
    tomeras91 committed Nov 27, 2024
    ff1d6a9
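
The idea behind the `/health` latency test mentioned in the Nov 27 commits can be sketched without a server: start an expensive tokenization job, then time how long a trivial coroutine takes to complete. This is only an illustration of the measurement, not the test added in the PR. `slow_tokenize`, `health`, `probe_latency`, and `TOKENIZE_SECONDS` are hypothetical names, and `time.sleep` stands in for real tokenization work.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

TOKENIZE_SECONDS = 0.3  # hypothetical cost of one heavy tokenization call


def slow_tokenize(text):
    time.sleep(TOKENIZE_SECONDS)  # stand-in for expensive tokenization
    return text.split()


async def health():
    # Stand-in for a /health handler: trivial work on the event loop.
    return "ok"


async def probe_latency(offload):
    """Return seconds until health() completes once a tokenization job starts."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        if offload:
            # Tokenization runs on the worker thread; the loop stays free.
            job = loop.run_in_executor(pool, slow_tokenize, "a b c")
        else:
            async def inline():
                return slow_tokenize("a b c")  # blocks the event loop

            job = asyncio.create_task(inline())
        start = time.perf_counter()
        await asyncio.sleep(0)  # yield so the scheduled job can start
        await health()
        latency = time.perf_counter() - start
        await job
    return latency


blocked = asyncio.run(probe_latency(offload=False))
offloaded = asyncio.run(probe_latency(offload=True))
print(f"inline: {blocked:.3f}s, offloaded: {offloaded:.3f}s")
```

When tokenization runs inline, the probe has to wait for the whole call to finish. When it is offloaded to the executor, the probe returns almost immediately.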