CBL-2791: Enable actor stack trace mechanism #1382

borrrden · 2022-02-16T23:46:34Z

This feature will keep track of two pieces of information:

1. For any given execution, the path of enqueue and execution calls that led to the execution
2. For any given actor, the linear history of enqueue and execution calls (regardless of source)

If an exception occurs, this information is dumped to the logs.
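As a rough illustration of the first item, the path can be modelled as a list of events that each enqueue extends and that is dumped if the execution throws. This is only a sketch under assumptions: `Event`, `ExecutionPath`, and `executeTraced` are hypothetical names that do not come from this PR, and the log is a plain string rather than the real logging system.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of these are taken from the PR.
enum class EventKind { Enqueue, Execute };

struct Event {
    EventKind   kind;
    std::string actor;   // actor whose mailbox was involved
    std::string method;  // name of the enqueued or executed call
};

// The chain of enqueue/execute calls that led to one execution.
class ExecutionPath {
public:
    // Builds a child path: a copy of the parent's events plus the new event.
    static std::shared_ptr<ExecutionPath> extend(
            const std::shared_ptr<ExecutionPath>& parent, Event e) {
        auto child = std::make_shared<ExecutionPath>();
        if (parent) child->_events = parent->_events;
        child->_events.push_back(std::move(e));
        return child;
    }

    void add(Event e) { _events.push_back(std::move(e)); }

    std::size_t size() const { return _events.size(); }

    // One line per event, oldest first.
    std::string dump() const {
        std::ostringstream out;
        for (const Event& e : _events)
            out << (e.kind == EventKind::Enqueue ? "enqueue " : "execute ")
                << e.actor << "::" << e.method << '\n';
        return out.str();
    }

private:
    std::vector<Event> _events;
};

// Runs `body`; if it throws, appends the path to `log` and rethrows.
template <class Fn>
void executeTraced(const ExecutionPath& path, Fn&& body, std::string& log) {
    try {
        body();
    } catch (...) {
        log += path.dump();
        throw;
    }
}
```

In this sketch a child path copies its parent's events, so a failure deep in a chain of actor calls can still report the enqueue that started it.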


borrrden · 2022-02-17T00:19:53Z

Mark as draft until resolution of iOS simulator issue

thread_local is not supported in the iOS simulator, and since the underlying unit is a queue rather than a thread, it makes more logical sense to use dispatch_queue_set_specific.

borrrden · 2022-02-17T02:02:10Z

@snej The GCD implementation of this was a bit awkward. Let me know if you have any ideas about how to improve it. It's complicated by the fact that the manifest used as the "queue manifest" needs to live long enough to be used by however many recursive calls happen in a given execution. For the threaded mailbox this just meant using a thread-local static shared_ptr and copying it to each context that uses it. However, thread_local is not allowed in the iOS simulator and doesn't really fit well with the queue-based logic, so I tried to make use of the set_specific API, which needs a pointer. So when retrieving, a pointer to the shared_ptr is retrieved and then dereferenced (i.e. copied).
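The two lifetime strategies described above can be sketched in portable C++. This is an illustration under assumptions, not the PR's code: `Manifest`, `currentThreadManifest`, `makeContext`, `manifestFromContext`, and `destroyContext` are hypothetical names. On Apple platforms the `void*` returned by `makeContext` would presumably be handed to `dispatch_queue_set_specific` with `destroyContext` as the destructor, and read back with `dispatch_get_specific`; the sketch leaves those calls out so it compiles anywhere.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the "queue manifest" discussed above.
struct Manifest {
    std::vector<std::string> entries;
};

// Threaded-mailbox style: one slot per thread. Each context that needs the
// manifest copies the shared_ptr, which keeps it alive across nested calls.
inline std::shared_ptr<Manifest>& currentThreadManifest() {
    thread_local std::shared_ptr<Manifest> sManifest;
    return sManifest;
}

// Queue style: the context slot can only hold a void*, so a heap-allocated
// shared_ptr is stored there. The slot owns one reference to the manifest.
inline void* makeContext(const std::shared_ptr<Manifest>& m) {
    return new std::shared_ptr<Manifest>(m);
}

// Retrieval dereferences the stored pointer, which copies the shared_ptr
// and bumps the reference count for the caller.
inline std::shared_ptr<Manifest> manifestFromContext(void* ctx) {
    if (!ctx) return nullptr;
    return *static_cast<std::shared_ptr<Manifest>*>(ctx);
}

// Destructor for the slot; has the void(void*) shape GCD expects.
inline void destroyContext(void* ctx) {
    delete static_cast<std::shared_ptr<Manifest>*>(ctx);
}
```

The point of the extra indirection is that the slot itself holds a reference, so the manifest stays valid for as long as the slot exists, and every caller that reads the slot gets its own reference for the duration of its work.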

snej · 2022-02-19T00:05:04Z

Do we need this for Apple platforms? It sounds like the same info that Xcode's debugger already shows. (At least item 1.)

borrrden · 2022-02-19T00:40:53Z

That's great if you are running in a debugger, but I'm thinking about this information being put into our logs so that we can have it even from the field.

snej · 2022-02-19T00:45:23Z

That is a lot of overhead to add to production builds! I think I'd need to be convinced that this is necessary. I can see that it would be useful in some occasions, but it would be slowing everything down and adding to memory bloat.

Is this something that can be disabled except by a runtime flag?

borrrden · 2022-02-19T00:52:07Z

I'm trying to balance things out here. If it is enabled or disabled at runtime, then I guarantee we get into a case where it's off and we end up back where we started: with a bunch of intertwined logs that make it difficult to navigate through the flow of an issue like "replicator getting stuck" without the context of the calls that led to that point. I disagree that it adds an excessive amount of memory pressure, since it's going to prune out entries as it receives new ones (I'm certainly open to decreasing the number of entries that are saved, though). I'm going to start proposing a lot of changes like this because our logs in general often leave us puzzled as to what is going on. I want a way to rectify this situation by collecting some data about the state of the program that can be accessed on demand, or at exception time in this case. Whether or not having thread/queue-local stuff adds too much of a performance penalty is something I could debate.

In short, we need something here to help us navigate an actor-based world in which the most common form of bug is a race condition or hang. Simply logging things is not enough. What I am after is an answer to the question "in what order did things happen in order to get here?"

EDIT: I also thought of adding this information to logs instead, but I figured that collecting it in memory would be overall better for performance than logging it all in real time.
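The pruning argument above amounts to a bounded in-memory history: new entries push out the oldest, so memory use is capped no matter how long the actor runs. A minimal sketch, assuming a hypothetical `ActorHistory` class and a caller-chosen cap (neither the name nor any particular limit comes from this PR):

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class; the cap and the names are illustrative only.
class ActorHistory {
public:
    explicit ActorHistory(std::size_t maxEntries) : _max(maxEntries) {}

    // Records one enqueue/execute entry, pruning the oldest beyond the cap.
    void record(std::string entry) {
        std::lock_guard<std::mutex> lock(_mutex);
        _entries.push_back(std::move(entry));
        while (_entries.size() > _max) _entries.pop_front();
    }

    // Copy of the retained entries, oldest first, e.g. for dumping to logs.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return std::vector<std::string>(_entries.begin(), _entries.end());
    }

private:
    const std::size_t       _max;
    mutable std::mutex      _mutex;    // enqueues may arrive from any thread
    std::deque<std::string> _entries;
};
```

With a cap of 3, recording five entries leaves only the last three, so the cost per actor is a fixed number of short strings plus one lock per recorded call. Whether that lock is cheap enough is exactly the performance question raised in this thread.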

snej · 2022-02-19T01:23:15Z

We really need performance testing in CI so we can see whether a new feature like this affects performance...

borrrden · 2022-02-19T01:47:39Z

That's something that could be arranged I think. ~~The only problem is that there is no clear pass/fail metric to use in CI that I can think of.~~

borrrden marked this pull request as draft February 17, 2022 00:19

Refactor GCD mailbox to not use thread_local

96489ca

It is not supported in iOS simulator, and since the underlying unit is a queue instead of a thread it makes more logical sense to make use of dispatch_queue_set_specific

borrrden marked this pull request as ready for review February 17, 2022 01:33

borrrden requested review from snej and jianminzhao February 17, 2022 01:59

borrrden removed request for snej and jianminzhao February 21, 2022 22:32

borrrden marked this pull request as draft February 21, 2022 22:32
