Akka.Cluster.Sharding: make entity passivization aware of _all_ messages being sent to entities, not just messages sent via the ShardRegion
#7395
Labels: akka-cluster-sharding, discussion, DX (developer experience issues: papercuts, footguns, and other non-bug problems)
Is your feature request related to a problem? Please describe.
This is the entity passivation code inside the `Shard` class. It passivates actors based on how long ago they processed their last message. We do this in order to free up memory from unused entity actors:
akka.net/src/contrib/cluster/Akka.Cluster.Sharding/Shard.cs
Lines 1788 to 1802 in 4ae4792
The passivation window is configurable and this feature can be disabled altogether, but that's not really what this is about. The problem is that the `Shard` actor only uses data from the Cluster.Sharding system itself to keep entity actors alive:
akka.net/src/contrib/cluster/Akka.Cluster.Sharding/Shard.cs
Lines 1932 to 1946 in 4ae4792
akka.net/src/contrib/cluster/Akka.Cluster.Sharding/Shard.cs
Lines 1780 to 1786 in 4ae4792
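As a rough illustration of the accounting described above, here is a toy model with no Akka.NET dependency. It is not the code from `Shard.cs`: the names `ShardIdleTracker`, `RecordRoutedMessage`, and `FindIdle` are hypothetical, and the only assumption it encodes is that the tracker sees a timestamp solely for messages that pass through it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, not the real Shard implementation):
// activity is recorded only for messages routed through the shard.
public sealed class ShardIdleTracker
{
    private readonly Dictionary<string, DateTime> _lastRouted = new Dictionary<string, DateTime>();
    private readonly TimeSpan _idleTimeout;

    public ShardIdleTracker(TimeSpan idleTimeout) => _idleTimeout = idleTimeout;

    // Called when a message reaches an entity via the shard.
    // Pub-sub or direct IActorRef traffic never calls this.
    public void RecordRoutedMessage(string entityId, DateTime now) => _lastRouted[entityId] = now;

    // Entities whose last *routed* message is at least _idleTimeout old.
    public IReadOnlyList<string> FindIdle(DateTime now) =>
        _lastRouted.Where(kv => now - kv.Value >= _idleTimeout)
                   .Select(kv => kv.Key)
                   .OrderBy(id => id)
                   .ToList();
}
```

With a two-minute window, an entity whose last routed message arrived three minutes ago is reported idle by `FindIdle`, no matter how much traffic it handled through other channels in the meantime, because that traffic never reaches `RecordRoutedMessage`.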
This can result in weird scenarios where "busy" entity actors can still be killed, such as:
- Entities receiving messages via pub-sub (`DistributedPubSub`)
- Entities receiving messages directly (via their `IActorRef`, rather than the `ShardRegion`) can die.

I think we can probably broaden the definition of "passivate" to include all sources of message traffic that are not recurring messages (i.e. `IWithTimers` or `Context.System.Scheduler`) and enforce that inside Akka.Cluster.Sharding. This should make the behavior of automatic entity passivation more closely aligned to what users expect without having to make distinctions between which messages count and which ones don't.
Describe the solution you'd like
Two ideas for this:
1. Something similar to how `ReceiveTimeout` works, but in this case there are two parties: passivator (A) and passivatee (B). A basically tells B to set a `ReceiveTimeout` for some duration and then B does all of the accounting. The rest of the normal `INotInfluenceReceiveTimeout` and timeout window code that's already in the `ActorCell` applies, and A gets a notification when B hits its receive timeout. The only real thing we'd need to add here is a private message type, handled automatically via `ActorCell`, that allows someone else to set the `ReceiveTimeout`, and then a notification message type back.
2. Something on `ActorBase` that allows user-defined actors to customize what happens when they receive a `PoisonPill`. That way we can avoid / address issues like "Akka.Cluster.Sharding / Akka.Persistence: `PoisonPill` message gets processed, kills actor before `Persist()` callback executed" (#6321), so if a passivating actor needs to do some work prior to shutting down, maybe it can be given some time to do that. The only problem here is that there's no guarantee you get the time you need when shutting down, so this might add some extra complexity the framework doesn't need. Did occur to me though.
Describe alternatives you've considered
I also considered the alternative of having the entity actors ping the `Shard` every time they receive a message, but that would get incredibly noisy and would harm the throughput of the sharding system significantly. Better to push the decision making and state tracking to the edges where it belongs.
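To make the "accounting at the edges" idea concrete, here is a minimal sketch of the first proposal as a toy model with no Akka.NET dependency. Everything in it is hypothetical: `IDoesNotResetIdleTimer` only stands in for the role `INotInfluenceReceiveTimeout` plays, and `EntityIdleAccounting` stands in for the bookkeeping the passivatee (B) would do on behalf of the passivator (A). It is not a proposed API.

```csharp
using System;

// Hypothetical marker for recurring messages (timers, scheduler ticks)
// that should not keep an entity alive.
public interface IDoesNotResetIdleTimer { }

public sealed class TimerTick : IDoesNotResetIdleTimer { }

// Toy model of the passivatee (B): it counts every message it processes,
// regardless of which channel delivered it.
public sealed class EntityIdleAccounting
{
    private readonly TimeSpan _timeout;
    private DateTime _lastActivity;

    public EntityIdleAccounting(TimeSpan timeout, DateTime startedAt)
    {
        _timeout = timeout;
        _lastActivity = startedAt;
    }

    // Called for each processed message; recurring messages are ignored.
    public void OnMessage(object message, DateTime now)
    {
        if (message is IDoesNotResetIdleTimer) return;
        _lastActivity = now;
    }

    // When this becomes true, B would send a single notification to A.
    public bool IsIdle(DateTime now) => now - _lastActivity >= _timeout;
}
```

In this model the passivator hears from the entity only once, when the timeout elapses, instead of once per message as in the ping alternative. A direct or pub-sub message resets the timer just like a routed one, while a `TimerTick` does not.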