Benchmarking portals vs worker threads #709

Nikratio · 2018-10-02T09:45:26Z

I am trying to determine whether it's better to call worker threads from trio, or to call into trio from worker threads. But I am running into some issues.

First, time needed to enter the trio loop:

In [8]: async def do_nothing():
   ...:     global trio_token
   ...:     trio_token = trio.hazmat.current_trio_token()
   ...:     await trio.sleep(60)
In [9]: t = threading.Thread(target=trio.run, args=(do_nothing,))
In [10]: t.start()
In [12]: p = trio.BlockingTrioPortal(trio_token)
In [13]: %timeit p.run_sync(lambda: None)
The slowest run took 9.76 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 113 µs per loop

This factor of 10 happens every time. Why is p.run_sync() sometimes so slow?

Secondly:

In [34]: import timeit
In [35]: async def helper():
    ...:     timeit.timeit('await trio.run_sync_in_worker_thread(lambda : None)')
In [36]: trio.run(helper)
  File "<timeit-src>", line 2
    await trio.run_sync_in_worker_thread(lambda : None)
             ^
SyntaxError: invalid syntax

It looks like there is some issue with async functions and timeit... is there another/better way to get the time taken by an async fn?
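For context on the SyntaxError: as far as I understand, `timeit` compiles the statement string into the body of an ordinary (non-async) function, so an `await` expression cannot appear in it. A minimal sketch that reproduces this without Trio; `timeit_accepts` and `some_coro` are hypothetical names used only for illustration:

```python
import timeit


def timeit_accepts(stmt: str) -> bool:
    """Return True if timeit can compile stmt, False on SyntaxError."""
    try:
        # Timer() only compiles the statement here; it does not run it,
        # so the undefined name some_coro below is never looked up.
        timeit.Timer(stmt)
    except SyntaxError:
        return False
    return True


print(timeit_accepts("len('abc')"))         # True
print(timeit_accepts("await some_coro()"))  # False
```

The same limitation applies whether the string is passed to `timeit.timeit()` or to a `timeit.Timer` directly.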

njsmith · 2018-10-02T10:01:22Z

It would be nice to get async support into timeit, but in the meantime a simple timing loop generally works OK:

import time
import trio

LOOPS = 100000

async def main():
    start = time.perf_counter()
    for _ in range(LOOPS):
        await trio.run_sync_in_worker_thread(lambda: None)
    end = time.perf_counter()
    # 1e6 converts seconds per call to microseconds per call
    print(f"{(end - start) / LOOPS * 1e6:.2f} us/call")

trio.run(main)

Btw, you can also use run_sync_in_worker_thread to start your thread for the portal timing – that way you don't have to deal with the raw threading API, and the sleep(60) hack becomes unnecessary.

njsmith · 2018-10-02T10:04:44Z

(I made a guess at the LOOPS value there, but in general you should tweak it up or down so that the test takes a few seconds. Generally longer gives more accurate results, but if it's too long then it's annoying to wait for.)
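Tuning LOOPS by hand can also be automated. The following is a rough sketch, loosely modeled on what `timeit.Timer.autorange` does for synchronous code: it grows the loop count by a factor of ten until one timed batch lasts long enough. The name `autorange`, the `min_seconds` default, and the factor-of-ten step are assumptions chosen for illustration. The helper imports no event loop itself, so it should work with any runner.

```python
import time


async def autorange(async_fn, min_seconds=0.2):
    """Grow the loop count until one timed batch lasts at least min_seconds.

    Returns (loops, seconds_per_call) for the final batch.
    """
    loops = 1
    while True:
        start = time.perf_counter()
        for _ in range(loops):
            await async_fn()
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return loops, elapsed / loops
        loops *= 10
```

Under Trio this could be driven with something like `trio.run(autorange, fn)`, where `fn` is an async function wrapping the call being measured. Raising `min_seconds` to a few seconds matches the advice above.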

njsmith · 2018-10-02T10:09:42Z

Oh, I misread what your trick for using timeit was there. Never mind :-). If you're using the ipython magic of course you can't do that from inside a trio worker thread.

I don't know why the portal call sometimes takes longer. I'd be curious what the CPU usage is here... Since you have to switch back and forth between threads, it could be something like the OS scheduler being lazy about scheduling the other thread. And in general it's hard to guess how all of this will generalize to a real program with more stuff going on.
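One way to look at the thread-switching guess in isolation is to time individual hand-offs between two plain threads and compare the minimum, median, and maximum instead of a single average. This sketch uses only the standard library; the `queue.Queue` pair is a stand-in and is not how the Trio portal is implemented, so the numbers only hint at how much scheduling jitter the OS adds. The function name `measure_round_trips` is made up for this example.

```python
import queue
import statistics
import threading
import time


def measure_round_trips(n=2000):
    """Time n request/reply hand-offs between the caller and a helper thread."""
    requests = queue.Queue()
    replies = queue.Queue()

    def echo():
        # Reply to every request until the None sentinel arrives.
        while requests.get() is not None:
            replies.put(True)

    worker = threading.Thread(target=echo, daemon=True)
    worker.start()
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.put(True)
        replies.get()  # blocks until the helper thread has answered
        samples.append(time.perf_counter() - start)
    requests.put(None)
    worker.join()
    return samples


samples = measure_round_trips()
print(f"min    {min(samples) * 1e6:8.1f} us")
print(f"median {statistics.median(samples) * 1e6:8.1f} us")
print(f"max    {max(samples) * 1e6:8.1f} us")
```

If the maximum here is also many times the minimum, that would be consistent with the spread coming from thread scheduling in general, though it would not rule out other causes.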

belm0 · 2018-10-05T14:18:58Z

#677 has some sample code which emulates timeit

oremanj · 2019-05-01T07:06:11Z

We have #677 for "there should be an async timeit" and #604 for "there should be some benchmarks" so I don't think there's an additional action item here.

oremanj mentioned this issue May 1, 2019

Introduce performance measurements #604

Open

oremanj closed this as completed May 1, 2019
