Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threaded prof.write(), fixes #6 #13

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Alloyed
Copy link

@Alloyed Alloyed commented Dec 20, 2018

Hey here's a thing. I wrote it after probably over-instrumenting my code: it takes a save of around 500k events from, like, minutes, to around 2 seconds on my cheapo i5-3320M laptop, so it's a huge improvement.

The benefit on smaller captures, like for example 50k records, is way smaller, but it doesn't seem to hurt performance. I'm leaving this as a opt-in feature because it requires some manual intervention to start the threads, but I don't see much reason to ever not use threaded writes.

Code reviews, opinions etc welcome~


This is an opt-in feature: to enable, call prof.enableThreadedWrite() at
the start of your program. Then instead of saving each event on the main
thread, and doing all the serialization work there, each event will be
assigned to a pool of worker threads which will serialize each event in
chunks at the end of the program.

Potential improvements to this model could include:

  • Writing serialized data to a byte buffer instead of a string. This
    would save the cost of copying the chunk string between VMs, with the
    added complexity of handling ownership of the buffer and potentially
    having data that grows beyond the size of the buffer.
  • Incremental serialization. Right now the worker threads wait until the
    end of the program to start processing each event, but there's no reason
    they can't do that work ahead of time in the background if it doesn't
    affect the runtime of the main program (it might?)
  • Handling file I/O on a background thread, or even possibly in the
    worker threads themselves. Haven't thought about this one too much,
    because prof.write() is typically called at the end of the program where
    there's not much else going on.

Fixes pfirsich#6.

This is an opt-in feature: to enable, call prof.enableThreadedWrite() at
the start of your program. Then instead of saving each event on the main
thread, and doing all the serialization work there, each event will be
assigned to a pool of worker threads which will serialize each event in
chunks at the end of the program.

Potential improvements to this model could include:
* Writing serialized data to a byte buffer instead of a string. This
would save the cost of copying the chunk string between VMs, with the
added complexity of handling ownership of the buffer and potentially
having data that grows beyond the size of the buffer.
* Incremental serialization. Right now the worker threads wait until the
end of the program to start processing each event, but there's no reason
they can't do that work ahead of time in the background if it doesn't
affect the runtime of the main program (it might?)
* Handling file I/O on a background thread, or even possibly in the
worker threads themselves. Haven't thought about this one too much,
because prof.write() is typically called at the end of the program where
there's not much else going on
Copy link
Owner

@pfirsich pfirsich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your contribution!

Since, as you said, there should not be any significant slow-down and a speed-up sometimes, shouldn't this be default behaviour?

What do you think about giving each new event to another thread and having them msgpack the event right away (and not when the capture is done). Then we save even more time in the end. We just need to make sure that we have enough threads (and cores) that we don't need to wait too long in addEvent. This might not be realistic/possible because we need to produce less events/second than we can process in all threads to not significantly affect framerate. Maybe it's also an option to push all the events to some threads, which just takes care of the messaging and accumulation and the actual msgpack-calling worker thread is a different one.

Also could you convert indentation to spaces, so it matches the rest of the file, please?

And a very minor point: why is n a parameter for msgpackListIntoFile now, but list isn't anymore? Imho keep both as parameters or use profData/profDataNumEvents directly for both.

EDIT: I opened this tab days ago and just commented on the source without seeing your thoughts on it. You suggested the same thing and and some more. Sorry!
The first idea is good, but I would personally not want to go throught he hassle for an improvement which I think is not too large. Do you think it will do much?
For the third idea I agree that mostly not much else is going on. Might not be worth the trouble either.

@iamevn
Copy link

iamevn commented Apr 5, 2021

Also need to pass _prefix through to the threads so they can find MessagePack to require it.

iamevn@30eb587

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants