[V1] get_computed_blocks avoids recompute #10695
base: main
Conversation
Signed-off-by: Abatom <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
I don't think this is correct. In this way you only share the blocks within a request.
@comaniac Thank you, I will think about it again.
@comaniac I have drawn a diagram to illustrate my modifications. I believe that the cache is not fully shared within a request.
Signed-off-by: Abatom <[email protected]>
What I'm saying is that your change makes blocks unable to be shared "across" requests.
@comaniac The green blocks have already been computed, so there's no need to compute them again; this is where the optimization comes in.
How do you hit cached blocks for a new request, which hasn't been added to req_to_blocks[request.request_id]?
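
For context, here is a minimal, hypothetical sketch of hash-based prefix caching in the style of vLLM V1 (the names `hash_block_tokens`, `cached_block_hash_to_block`, and `get_computed_block_ids` are illustrative, not the actual vLLM API). Each block's hash chains the parent block's hash with the block's token IDs, so two different requests that share a prefix produce identical hash chains and hit the same cached blocks; this cross-request sharing is what the comment above is about.

```python
import hashlib
from typing import Optional

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; vLLM's is configurable)

def hash_block_tokens(parent_hash: Optional[int], token_ids: tuple) -> int:
    # Chain the parent block's hash with this block's token IDs, so any two
    # requests that share a prefix produce the same hash chain.
    digest = hashlib.sha256(repr((parent_hash, token_ids)).encode()).hexdigest()
    return int(digest[:16], 16)

# Shared across ALL requests: block hash -> id of the cached physical block.
cached_block_hash_to_block: dict[int, int] = {}

def get_computed_block_ids(token_ids: list[int]) -> list[int]:
    """Walk the prompt block by block; stop at the first cache miss."""
    computed: list[int] = []
    parent_hash = None
    for start in range(0, len(token_ids) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = tuple(token_ids[start:start + BLOCK_SIZE])
        parent_hash = hash_block_tokens(parent_hash, block)
        block_id = cached_block_hash_to_block.get(parent_hash)
        if block_id is None:
            break
        computed.append(block_id)
    return computed

# Cross-request sharing: a brand-new request hits blocks another request cached.
cached_block_hash_to_block[hash_block_tokens(None, (1, 2, 3, 4))] = 0
assert get_computed_block_ids([1, 2, 3, 4, 9, 9, 9, 9]) == [0]
```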
OK, I see what you meant. Yes, this saves hash computation, but I'm afraid it makes the code more complicated. Do you have benchmark numbers showing that the overhead is non-negligible, so that we have to introduce this optimization?
Alright, I'll think about how to conduct benchmark testing.
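
One way to do that would be a microbenchmark that isolates the per-prompt hash computation. This is only a hedged sketch (`bench_hash_overhead` is a made-up helper reusing the illustrative hash above, not vLLM's benchmark suite); its numbers would bound the cost this PR can save per scheduling step.

```python
import hashlib
import random
import time
from typing import Optional

def hash_block_tokens(parent_hash: Optional[int], token_ids: tuple) -> int:
    # Same illustrative chained hash as in the sketch above.
    digest = hashlib.sha256(repr((parent_hash, token_ids)).encode()).hexdigest()
    return int(digest[:16], 16)

def bench_hash_overhead(num_tokens: int = 32_768, block_size: int = 16,
                        iters: int = 100) -> float:
    """Return the mean time in ms to hash every block of one synthetic prompt."""
    token_ids = [random.randrange(32_000) for _ in range(num_tokens)]
    start = time.perf_counter()
    for _ in range(iters):
        parent_hash = None
        for s in range(0, num_tokens - block_size + 1, block_size):
            parent_hash = hash_block_tokens(
                parent_hash, tuple(token_ids[s:s + block_size]))
    return (time.perf_counter() - start) / iters * 1e3

if __name__ == "__main__":
    print(f"per-request hashing overhead: {bench_hash_overhead():.3f} ms")
```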
Use self.req_to_blocks[request.request_id] instead of recomputing.
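
As a sketch of the change the PR title describes (`KVCacheManagerSketch` and `_lookup_by_block_hashes` are hypothetical; only `req_to_blocks` and `get_computed_blocks` appear in the discussion, and the real vLLM V1 implementation differs): a request already tracked in `req_to_blocks` returns its existing block list directly, while a new request, per the question raised above, still goes through the hash-based lookup so blocks remain shareable across requests.

```python
class KVCacheManagerSketch:
    """Illustrative skeleton only; the real vLLM V1 KVCacheManager differs."""

    def __init__(self) -> None:
        # request_id -> blocks already allocated to that request.
        self.req_to_blocks: dict[str, list[int]] = {}

    def get_computed_blocks(self, request) -> list[int]:
        # Proposed fast path: a request we are already tracking keeps its
        # block list, so its prefix hashes need not be recomputed.
        blocks = self.req_to_blocks.get(request.request_id)
        if blocks is not None:
            return blocks
        # New request: fall back to the hash-chain lookup (sketched above),
        # which is what allows blocks to be shared ACROSS requests.
        return self._lookup_by_block_hashes(request)

    def _lookup_by_block_hashes(self, request) -> list[int]:
        # Placeholder for the cross-request prefix-cache lookup.
        return []
```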