Fix OurViewChange small race (#5356)
Always queue OurViewChange event before we send view changes to our
peers, because otherwise we risk the peers sending us a message that can
be processed by our subsystems before OurViewChange.

Normally, this is not a problem because the latency of the ViewChange
we send to our peers is much higher than the time our subsystems need
to process OurViewChange. However, on testnets like versi, where CPU is
sometimes overcommitted, this race is triggered occasionally, so let's
fix it by sending the messages in the right order.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
alexggh authored Aug 14, 2024
1 parent 00946b1 commit 05a8ba6
Showing 2 changed files with 33 additions and 15 deletions.
30 changes: 15 additions & 15 deletions polkadot/node/network/bridge/src/rx/mod.rs
@@ -962,6 +962,21 @@ fn update_our_view<Context>(
         )
     };

+    let our_view = OurView::new(
+        live_heads.iter().take(MAX_VIEW_HEADS).cloned().map(|a| (a.hash, a.span)),
+        finalized_number,
+    );
+
+    dispatch_validation_event_to_all_unbounded(
+        NetworkBridgeEvent::OurViewChange(our_view.clone()),
+        ctx.sender(),
+    );
+
+    dispatch_collation_event_to_all_unbounded(
+        NetworkBridgeEvent::OurViewChange(our_view),
+        ctx.sender(),
+    );
+
     let v1_validation_peers =
         filter_by_peer_version(&validation_peers, ValidationVersion::V1.into());
     let v1_collation_peers = filter_by_peer_version(&collation_peers, CollationVersion::V1.into());
@@ -1007,21 +1022,6 @@ fn update_our_view<Context>(
         metrics,
         notification_sinks,
     );
-
-    let our_view = OurView::new(
-        live_heads.iter().take(MAX_VIEW_HEADS).cloned().map(|a| (a.hash, a.span)),
-        finalized_number,
-    );
-
-    dispatch_validation_event_to_all_unbounded(
-        NetworkBridgeEvent::OurViewChange(our_view.clone()),
-        ctx.sender(),
-    );
-
-    dispatch_collation_event_to_all_unbounded(
-        NetworkBridgeEvent::OurViewChange(our_view),
-        ctx.sender(),
-    );
 }

 // Handle messages on a specific v1 peer-set. The peer is expected to be connected on that
18 changes: 18 additions & 0 deletions prdoc/pr_5356.prdoc
@@ -0,0 +1,18 @@
+# Schema: Polkadot SDK PRDoc Schema (prdoc) v1.0.0
+# See doc at https://raw.githubusercontent.com/paritytech/polkadot-sdk/master/prdoc/schema_user.json
+
+title: Fix OurViewChange small race
+
+doc:
+  - audience: Node Dev
+    description: |
+      Always queue OurViewChange event before we send view changes to our peers, because otherwise we risk
+      the peers sending us a message that can be processed by our subsystems before OurViewChange.
+      Normally, this is not really a problem because the latency of the ViewChange we send to our peers
+      is way higher than that of our subsystem processing OurViewChange, however on testnets like versi
+      where CPUs are sometimes overcommitted this race gets triggered occasionally, so let's fix it by
+      sending the messages in the right order.
+
+crates:
+  - name: polkadot-network-bridge
+    bump: minor
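
The ordering issue described in the commit message can be illustrated with a small standalone sketch. This is not the network bridge code: `Event`, `send_view_to_peer`, `update_view_peers_first`, `update_view_local_first` and `drain` are hypothetical names, and the peer is modelled as replying with zero latency, which is the worst case the commit message attributes to overcommitted CPUs. A single `std::sync::mpsc` channel stands in for a subsystem's inbox.

```rust
use std::sync::mpsc::{channel, Sender};

// Hypothetical events a subsystem might receive; not the real NetworkBridgeEvent.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    OurViewChange(u32),
    PeerMessage(u32),
}

// Assumption: the peer reacts to our view update instantly, so its reply
// lands in the subsystem inbox as soon as we "send" the view.
fn send_view_to_peer(view: u32, subsystem: &Sender<Event>) {
    subsystem.send(Event::PeerMessage(view)).unwrap();
}

// Old order: notify peers first, then queue the local event.
fn update_view_peers_first(view: u32, subsystem: &Sender<Event>) {
    send_view_to_peer(view, subsystem);
    subsystem.send(Event::OurViewChange(view)).unwrap();
}

// New order: queue the local event first, then notify peers.
fn update_view_local_first(view: u32, subsystem: &Sender<Event>) {
    subsystem.send(Event::OurViewChange(view)).unwrap();
    send_view_to_peer(view, subsystem);
}

// Run one ordering and collect the events in the order the subsystem sees them.
fn drain(order: fn(u32, &Sender<Event>)) -> Vec<Event> {
    let (tx, rx) = channel();
    order(7, &tx);
    drop(tx); // close the channel so the iterator below terminates
    rx.iter().collect()
}

fn main() {
    println!("peers first: {:?}", drain(update_view_peers_first));
    println!("local first: {:?}", drain(update_view_local_first));
}
```

With the peers-first order the subsystem sees the peer message before it has learned about its own view change; with the local-first order the view change is always ahead in the queue, regardless of how fast the peer replies.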
