
Keep data in failure cases in sync service #2361

Open: wants to merge 30 commits into base: master
Conversation

Contributor

@AurelienFT AurelienFT commented Oct 15, 2024

Linked Issues/PRs

Closes #2357

Description

This pull request introduces a caching mechanism to the sync service to avoid redundant data fetching from the network. The most important changes include adding a cache module, modifying the Import struct to include a cache, and updating related methods to utilize this cache.

Caching Mechanism:

  • crates/services/sync/src/import.rs: Added a new cache module and integrated it into the Import struct. Updated methods to use the cache when fetching and storing headers and blocks.
  • The cache mechanism allows us to retrieve a stream of batches, each being either cached headers, cached full blocks, or a range of data still to fetch.
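A minimal sketch of that chunking idea, using bare heights in place of the real header/block types. CachedData, CachedDataBatch, and get_chunks approximate the shapes in crates/services/sync/src/import/cache.rs, but the exact signatures here are assumptions:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
enum CachedData {
    Header, // stand-in for a cached SealedBlockHeader
    Block,  // stand-in for a cached SealedBlock
}

#[derive(Debug, PartialEq)]
enum CachedDataBatch {
    Headers(Vec<u32>),
    Blocks(Vec<u32>),
    None(Range<u32>), // sub-range that still has to be fetched from the network
}

/// Split `range` into contiguous batches of cached headers, cached full
/// blocks, or not-yet-fetched sub-ranges, capping each batch at `max` items.
fn get_chunks(
    range: Range<u32>,
    cache: &BTreeMap<u32, CachedData>,
    max: usize,
) -> Vec<CachedDataBatch> {
    let mut chunks = Vec::new();
    let mut current: Option<CachedDataBatch> = None;

    for height in range {
        let kind = cache.get(&height).copied();
        current = match (current.take(), kind) {
            // Start a new batch.
            (None, Some(CachedData::Header)) => Some(CachedDataBatch::Headers(vec![height])),
            (None, Some(CachedData::Block)) => Some(CachedDataBatch::Blocks(vec![height])),
            (None, None) => Some(CachedDataBatch::None(height..height + 1)),
            // Extend the current batch while the kind matches and it has room.
            (Some(CachedDataBatch::Headers(mut v)), Some(CachedData::Header)) if v.len() < max => {
                v.push(height);
                Some(CachedDataBatch::Headers(v))
            }
            (Some(CachedDataBatch::Blocks(mut v)), Some(CachedData::Block)) if v.len() < max => {
                v.push(height);
                Some(CachedDataBatch::Blocks(v))
            }
            (Some(CachedDataBatch::None(r)), None) if (r.end - r.start) < max as u32 => {
                Some(CachedDataBatch::None(r.start..height + 1))
            }
            // Kind changed or the batch is full: flush it and start a new one.
            (Some(done), kind) => {
                chunks.push(done);
                match kind {
                    Some(CachedData::Header) => Some(CachedDataBatch::Headers(vec![height])),
                    Some(CachedData::Block) => Some(CachedDataBatch::Blocks(vec![height])),
                    None => Some(CachedDataBatch::None(height..height + 1)),
                }
            }
        };
    }
    if let Some(done) = current {
        chunks.push(done);
    }
    chunks
}
```

With heights 4 and 5 cached as headers and a max batch size of 3, asking for 4..11 yields one Headers batch followed by two ranges to fetch.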

Test Updates:

  • Updated the P2P port in mocks to be async, to support the more complex tests needed for this feature.

About 50% of the changes in this PR are test updates, plus new tests for the cache.

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

@AurelienFT AurelienFT marked this pull request as ready for review October 16, 2024 16:44
@AurelienFT AurelienFT requested a review from a team October 16, 2024 16:44
Contributor

@netrome netrome left a comment


I don't understand the import task well enough to approve right now. I need clarification on the following points:

  1. How do we ensure this cache doesn't grow forever? Is the Import task short-lived? While the import task launches short-lived streams, it seems like a long-living task to me.
  2. How can we be sure we'll query exactly the same ranges as we have cached? Where is that invariant maintained?

Let me know if you want to jump on a call to chat about this, or just write if I'm missing something obvious here.

crates/services/sync/src/import.rs (outdated, resolved)
header_stream
let ranges = range_chunks(range, params.header_batch_size);
futures::stream::iter(ranges)
.map({
Contributor

While the pattern was established before this PR, I think it would be nice to use then instead of map here, and skip the .awaits. We'd be able to return just a Stream<Item = SealedBlockBatch> instead of having the nested futures in the returned stream.

Contributor Author

I agree, and there are a lot more things to improve in this service. I don't want to make this PR even bigger, so I created an issue for that: #2370

Collaborator

then resolves the future, while map allows us to create a stream that can be parallelized later.

crates/services/sync/src/import.rs (outdated, resolved)
@AurelienFT
Contributor Author

AurelienFT commented Oct 16, 2024

@netrome Thanks for taking the time to review this. Regarding your questions:
1 - Yes, I expect it to live a long time, but all requested data should eventually arrive and then be cleared, so the cache should never hold more than batch_size elements. I'm not completely sure about this, though, which is why I raised it under "Interrogations" in the PR description. Maybe we need pruning management.
2 - I was assuming we re-request exactly the same ranges because batch_size doesn't change, but the starting point can move to the last synced block, so the ranges can change. I think you are right that the ranges can change; I will ask @xgreenx a few questions.

@AurelienFT AurelienFT changed the base branch from release/v0.40.0 to master October 16, 2024 21:21
Contributor

@rafal-ch rafal-ch left a comment


So far looks good, I need to have a deeper look at the tests though.

CHANGELOG.md (outdated, resolved)
crates/services/sync/src/import.rs (outdated, resolved)
crates/services/sync/src/import/back_pressure_tests.rs (outdated, resolved)
@AurelienFT AurelienFT marked this pull request as draft October 17, 2024 09:17
@AurelienFT
Contributor Author

Converted to draft because of a big refactor.

@AurelienFT AurelienFT marked this pull request as ready for review October 18, 2024 10:33
@AurelienFT AurelienFT marked this pull request as draft October 21, 2024 11:40
@netrome
Contributor

netrome commented Oct 21, 2024

Now everything is cached one by one, but there is an issue I'm having a hard time solving. When we successfully fetched the header but never got the transactions, we need the peer_id to ask for the transactions again. However, if I cache the peer_id we used to get the header and that peer failed to give us the transactions, we will ask it again, and I don't think we want to re-ask a peer that returned a failure. But I don't have any way to find another peer that I know has the transactions.

On top of that, the range that I build from cached data could have been fetched from multiple peers. The only solution I see that simplifies everything, but caches less, is to cache only full blocks. Any ideas @netrome @xgreenx @rafal-ch?

Had a chat about this. @xgreenx proposed we change the p2p interface to not require any peer ID when requesting transactions, but instead leave it up to the p2p implementation to decide which peer to request them from and return that peer ID in the response.
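The proposed interface change can be sketched as follows. The names and the synchronous signature are simplified assumptions for illustration; the real port is async and lives in crates/services/sync/src/ports.rs:

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
pub struct PeerId(pub String);

/// Data tagged with the peer that actually served it, so the caller can
/// still report that peer if the data turns out to be bad.
pub struct SourcePeer<T> {
    pub peer_id: PeerId,
    pub data: T,
}

pub trait PeerToPeerPort {
    /// The caller names no peer: the p2p implementation picks one (e.g. by
    /// reputation) and returns its id alongside the transactions.
    fn get_transactions(&self, block_height_range: Range<u32>) -> Option<SourcePeer<Vec<String>>>;
}

/// A toy implementation that always answers from one fixed peer.
pub struct StaticPeer;

impl PeerToPeerPort for StaticPeer {
    fn get_transactions(&self, block_height_range: Range<u32>) -> Option<SourcePeer<Vec<String>>> {
        let txs = block_height_range
            .map(|h| format!("txs-for-block-{h}"))
            .collect();
        Some(SourcePeer {
            peer_id: PeerId("peer-1".into()),
            data: txs,
        })
    }
}
```

Because the answering peer is returned with the data, the sync service can still penalize it via report_peer if the transactions are missing or malformed.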

@AurelienFT AurelienFT changed the base branch from master to add_p2p_fetch_txs_no_peer_specified October 21, 2024 12:44
@AurelienFT AurelienFT marked this pull request as ready for review October 21, 2024 13:22
@AurelienFT AurelienFT self-assigned this Oct 24, 2024
AurelienFT added a commit that referenced this pull request Oct 31, 2024
## Linked Issues/PRs
This is a requirement for
#2361

## Description

This PR adds a way to fetch transactions over p2p without specifying a particular peer, letting p2p choose the peer it prefers.
This will be used in #2361

## Checklist
- [x] Breaking changes are clearly marked as such in the PR description
and changelog
- [x] New behavior is reflected in tests
- [x] [The specification](https://github.com/FuelLabs/fuel-specs/)
matches the implemented behavior (link update PR if changes are needed)

### Before requesting review
- [x] I have reviewed the code myself
- [x] I have created follow-up issues caused by this PR and linked them
here

---------

Co-authored-by: Green Baneling <[email protected]>
Base automatically changed from add_p2p_fetch_txs_no_peer_specified to master October 31, 2024 08:47
@rymnc rymnc requested a review from Copilot November 21, 2024 10:18


Copilot reviewed 6 out of 10 changed files in this pull request and generated no suggestions.

Files not reviewed (4)
  • crates/services/sync/src/import/test_helpers/pressure_peer_to_peer.rs: Evaluated as low risk
  • crates/services/sync/src/import/tests.rs: Evaluated as low risk
  • crates/services/sync/src/ports.rs: Evaluated as low risk
  • CHANGELOG.md: Evaluated as low risk
Collaborator

@xgreenx xgreenx left a comment


The change looks really good=)

}
}
BlockHeaderData::Cached(CachedDataBatch::None(_)) => {
unreachable!()
Collaborator

While it is true, let's return an error and log that this place shouldn't be reachable.

Contributor Author

I have added a log, and I return a malformed batch, which is treated as an error throughout this process. I don't want to change the whole architecture of the module for this error. (The other solution is to panic, as is done here:

.expect("We checked headers are not empty above"),
)
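The approach described in the reply can be sketched with assumed, simplified types. The real Batch carries sealed headers, and the real code would log via a proper logging macro rather than eprintln!:

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
struct Batch {
    range: Range<u32>,
    results: Vec<u32>, // stand-in for sealed headers
}

impl Batch {
    /// A batch whose results don't cover its range; downstream code already
    /// treats such a batch as an error.
    fn malformed(range: Range<u32>) -> Self {
        Batch {
            range,
            results: Vec::new(),
        }
    }

    fn is_err(&self) -> bool {
        self.results.len() != (self.range.end - self.range.start) as usize
    }
}

enum BlockHeaderData {
    /// Some(headers) for cached data; None is the "impossible" empty case.
    Cached(Option<Vec<u32>>),
}

fn resolve(data: BlockHeaderData, range: Range<u32>) -> Batch {
    match data {
        BlockHeaderData::Cached(Some(headers)) => Batch {
            range,
            results: headers,
        },
        BlockHeaderData::Cached(None) => {
            // Instead of unreachable!(): log the anomaly and surface a
            // malformed batch, letting the existing error path handle it.
            eprintln!("cache returned an empty batch for {range:?}; this should be unreachable");
            Batch::malformed(range)
        }
    }
}
```

This keeps the failure local to one batch instead of panicking the whole sync service.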

crates/services/sync/src/import.rs (resolved)
Comment on lines +574 to +589
Some(peer_id) => {
let source_peer = peer_id.clone().bind(range.clone());
let Ok(Some(txs)) = p2p
.get_transactions_from_peer(source_peer)
.await
.trace_err("Failed to get transactions")
else {
report_peer(
p2p,
Some(peer_id.clone()),
PeerReportReason::MissingTransactions,
);
return None;
};
Some(SourcePeer { peer_id, data: txs })
}
Collaborator

Do we even need to support this case?=)

Contributor Author

I think so, because if we are in the case where we don't use the cache, we have already fetched the header from a particular peer, and we have its peer_id, it's more efficient to ask that peer directly for the transactions instead of running a computation to find a peer that has them (and probably ending up with the same one anyway).

crates/services/sync/src/import/cache.rs (outdated, resolved)
crates/services/sync/src/import/cache.rs (outdated, resolved)
CachedDataBatch::Headers(batch) => {
if batch.results.len() >= max_chunk_size {
chunks.push(CachedDataBatch::Headers(batch));
CachedDataBatch::None(current_height..current_height)
Collaborator

I don't see why we want to return None instead of a new Headers batch with the remaining elements.

I see that it was extracted from the loop, and there it makes sense, because None is the default value to start the next iteration of the loop. But here, it looks strange.

I think if we had a function named truncate_chunk and did something like current_chunk = truncate_chunk(current_chunk, &mut chunks), it would be simpler to understand =)

Contributor Author

I refactored this to split only when the chunk is inserted into the accumulator, and it really simplifies the whole code. I also added some comments.

crates/services/sync/src/import/cache.rs (outdated, resolved)
Comment on lines 273 to 305
p2p.expect_get_sealed_block_headers()
.times(1)
.in_sequence(&mut seq)
.returning(|_| {
Box::pin(async move {
tokio::time::sleep(Duration::from_millis(300)).await;
Err(anyhow::anyhow!("Some network error"))
})
});
p2p.expect_get_sealed_block_headers()
.times(2)
.in_sequence(&mut seq)
.returning(|range| {
Box::pin(async move {
let peer = random_peer();
let headers = Some(range.map(empty_header).collect());
let headers = peer.bind(headers);
Ok(headers)
})
});
// Then
// Reask only for block 4
p2p.expect_get_sealed_block_headers()
.times(1)
.in_sequence(&mut seq)
.returning(|range| {
Box::pin(async move {
let peer = random_peer();
let headers = Some(range.map(empty_header).collect());
let headers = peer.bind(headers);
Ok(headers)
})
});
Collaborator

Why is the sequence [fail, success, success]? Based on the comments I would expect either [fail, success] or [success (for the first 3 blocks), fail, success]. The same question applies to getting transactions.

Contributor Author

I have made the comments more explicit. The expected sequence is indeed [fail (4), success (5), success (6)] and then [success (4)] only, for both tests. Tell me if that's clearer :)

@@ -14,7 +14,11 @@ use crate::{
},
Collaborator

It would be nice to see a test where execution fails and we verify that we do not call p2p, because all the data has already been fetched.

Contributor Author

I made a test locally that fails, because this is not the behavior we decided on together. I think we said that if execution fails we should remove the data from the cache, because it would probably fail again. The line that clears the cache:

cache.remove_element(&height);
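The eviction rule being described can be sketched like this. Apart from remove_element, the types and names here are assumptions for illustration:

```rust
use std::collections::BTreeMap;

struct Cache(BTreeMap<u32, String>);

impl Cache {
    fn new() -> Self {
        Cache(BTreeMap::new())
    }

    fn insert_block(&mut self, height: u32, block: String) {
        self.0.insert(height, block);
    }

    fn remove_element(&mut self, height: &u32) {
        self.0.remove(height);
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

/// Whether execution succeeds or fails, the height is evicted: on success
/// the data has been committed and is no longer needed; on failure the same
/// cached data would likely fail again, so it must be re-fetched from the
/// network instead of replayed from the cache.
fn execute_and_commit(cache: &mut Cache, height: u32, execution_ok: bool) -> Result<(), String> {
    cache.remove_element(&height);
    if execution_ok {
        Ok(())
    } else {
        Err(format!("execution failed at height {height}"))
    }
}
```

Under this rule the cache only ever holds in-flight heights, which also bounds its size to roughly one batch.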

@AurelienFT
Contributor Author

@xgreenx Thanks for the kind comment. I have addressed all of your concerns; some may still need answers :)

Contributor

@netrome netrome left a comment


Nice stuff! Some minor questions and comments from me, but overall looks good.

}

pub fn insert_blocks(&mut self, batch: Batch<SealedBlock>) {
let mut lock = self.0.lock();
Contributor

Oh, I didn't know the parking_lot Mutex was infallible. So no poisoned mutexes to worry about, nice!

))
}
(CachedDataBatch::Headers(mut batch), CachedData::Header(data)) => {
debug_assert_eq!(batch.range.end, height);
Contributor

Should we perhaps log a warning if this isn't correct in production?
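One way that could look (a sketch; eprintln! stands in for whatever warning macro the codebase uses, e.g. tracing::warn!):

```rust
use std::ops::Range;

/// Checks that a cached batch is contiguous with the next height. In debug
/// builds a violation panics via debug_assert_eq!; in release builds, where
/// debug assertions are compiled out, it is reported as a warning instead.
fn check_batch_contiguous(batch_range: &Range<u32>, height: u32) -> bool {
    debug_assert_eq!(batch_range.end, height);
    if batch_range.end != height {
        // Stand-in for tracing::warn! to keep the sketch dependency-free.
        eprintln!(
            "non-contiguous cache batch: expected next height {}, got {}",
            batch_range.end, height
        );
        return false;
    }
    true
}
```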

CachedDataBatch::None(4..7),
CachedDataBatch::None(7..10),
CachedDataBatch::None(10..11),
]; "one header and empty ranges")]
Contributor

Nit: the range exists, but there are no blocks. I don't see any test cases with an empty range or a max batch size of 0; those would be interesting to add. Otherwise, love the test suite!

Comment on lines 37 to +38
#[async_trait::async_trait]
#[cfg_attr(any(test, feature = "benchmarking"), mockall::automock)]
Contributor

I assume the order here matters (i.e. the previous order didn't add the async_trait sugar to the mocks, right?) and that this is a broken-window fix, or how does this change relate to the current PR? It seems very orthogonal.

Development

Successfully merging this pull request may close these issues.

fuel-core-sync should cache the result of responses instead of throwing them away
4 participants