-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Perf: construct & reuse clusters in output thread #3438
Conversation
And another reason is that I can't get one thing working. The rendering seems fine. But when I started typing something, the last line gets printed twice: I'm extremely confused about this. How does rendering somehow affect what's got printed by I could really use some advice here. @miniksa @DHowett-MSFT |
To show how good this is and get everybody's attention, I'm gonna attach some evidence. Using code on master at 94213ad Duration: 5.1s Using this PR: Duration: 1.2s Both result are with Release build and on the same machine. |
private: | ||
// This is the UTF-16 string of characters that form a particular drawing cluster | ||
const std::wstring_view _text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the const
here but I want to reuse existing _text
and update the columns. So I had to make it this way.
|
||
// Ask the helper to paint through this specific line. | ||
_PaintBufferOutputHelper(pEngine, it, screenLine.Origin()); | ||
const auto& rowData = buffer.GetRowByOffset(row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method _PaintBufferOutput
is supposed to handle all kinds of dirty rect refreshing. I kind of made it only working for rows, which is basically all that's needed right now. Should make it better, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably what ended up causing the duplicate conpty output - by just re-rendering the entire row at the time, that's causing the text to get duplicated on the Terminal side
|
||
size_t count = 0; | ||
std::vector<Cluster> clusters; | ||
clusters.reserve(runLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reserve
here is intended to reduce vector reallocation and it works great. Without reserve
, the performance is heavily penalized.
@@ -199,6 +217,19 @@ OutputCellIterator ROW::WriteCells(OutputCellIterator it, const size_t index, co | |||
{ | |||
_charRow.DbcsAttrAt(currentIndex) = it->DbcsAttr(); | |||
_charRow.GlyphAt(currentIndex) = it->Chars(); | |||
|
|||
if (it->DbcsAttr().IsDbcs()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind tricky but I need it to handle dbcs
since I'm in CJK environment myself. A cluster with column set to zero will be ignored for rendering.
|
||
size_t runStart = 0; | ||
// This outer loop will continue until we reach the end of the text we are trying to draw. | ||
while (runIndex < runCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where magic happens. We have already processed the text buffer and got the clusters we need. All that left is to walk through the row and render each run.
There's a longer explanation I can give you, but the tl;dr is that conpty works by effectively rendering the contents of the buffer to a stream of characters. This stream of characters includes VT sequences, which let the terminal on the other side re-construct the state of the console buffer. I'd take a cursory look at |
I like the idea and appreciate focus on this area as I know it's definitely a performance sink. However, for this particular TODO, I was hoping we could get away without having a Clusters vector at all. It looks like you saved us by going from generating-it-every-time to generating-on-change and caching it. I was hoping that we could implement a solution that perhaps provided an iterator (like the TextBufferCellIterator) as the first parameter on PaintBufferLine (instead of a span) and the internal loop would iterate until the iterator was false instead of from begin to end of the span. Then we wouldn't have to "store" an array of clusters at all. We could perhaps wrap a That should mean that we don't have to pre-pass anything and we don't have to allocate any additional memory (besides a little bit for the iterator overhead and logic). Then we'd in theory eliminate the CPU time from the vector overhead while not blowing up our memory storage. Probably not exactly as good on CPU time as the extra memory, but a better balance. |
@zadjii-msft Thanks for the explanation. @miniksa I think I know what you mean. I'll give it a try and left this PR open here. |
Another observation that surprised me. The vector thing is affecting my Ryzen 2400G desktop PC not my i7-8750H gaming laptop, which means this PR doesn't really make a difference(less than 1 second saved) on my laptop. Seems Win-tel thing is still there somehow. Update: never mind. I guess it's about the VtEngine @zadjii-msft mentioned. On my laptap I'm using plain |
Closed in favor of #3458 |
Summary of the Pull Request
By constructing clusters for rendering early in the output thread, the rendering thread will be relieved, making overall performance better when under heavy load.
References
#3075
PR Checklist
Detailed Description of the Pull Request / Additional comments
This is an attempt to tackle the TODO listed here:
terminal/src/renderer/base/renderer.cpp
Line 601 in 94fc40e
I may be doing a lots of thing wrong but the performance boost is so good that I gotta share it with you guys. With this PR, catting my
/etc/services
takes only about 1.5 second, comparing to current release which takes about 5 seconds to do the same thing.Validation Steps Performed