
Keep data in failure cases in sync service #2361

Open: wants to merge 30 commits into base: master
Conversation

Contributor

@AurelienFT AurelienFT commented Oct 15, 2024

Linked Issues/PRs

Closes #2357

Description

This pull request introduces a caching mechanism to the sync service to avoid redundant data fetching from the network. The most important changes include adding a cache module, modifying the Import struct to include a cache, and updating related methods to utilize this cache.

Caching Mechanism:

  • crates/services/sync/src/import.rs: Added a new cache module and integrated it into the Import struct. Updated methods to use the cache when fetching and storing headers and blocks.
  • The cache mechanism allows us to retrieve a stream of batches, each being either cached headers, cached full blocks, or a range of data still to fetch.
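A minimal sketch of that chunking idea, using bare heights in place of the real header/block types. CachedData, CachedDataBatch, and get_chunks approximate the shapes in crates/services/sync/src/import/cache.rs, but the exact signatures here are assumptions:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
enum CachedData {
    Header, // stand-in for a cached SealedBlockHeader
    Block,  // stand-in for a cached SealedBlock
}

#[derive(Debug, PartialEq)]
enum CachedDataBatch {
    Headers(Vec<u32>),
    Blocks(Vec<u32>),
    None(Range<u32>), // sub-range that still has to be fetched from the network
}

/// Split `range` into contiguous batches of cached headers, cached full
/// blocks, or not-yet-fetched sub-ranges, capping each batch at `max` items.
fn get_chunks(
    range: Range<u32>,
    cache: &BTreeMap<u32, CachedData>,
    max: usize,
) -> Vec<CachedDataBatch> {
    let mut chunks = Vec::new();
    let mut current: Option<CachedDataBatch> = None;

    for height in range {
        let kind = cache.get(&height).copied();
        current = match (current.take(), kind) {
            // Start a new batch.
            (None, Some(CachedData::Header)) => Some(CachedDataBatch::Headers(vec![height])),
            (None, Some(CachedData::Block)) => Some(CachedDataBatch::Blocks(vec![height])),
            (None, None) => Some(CachedDataBatch::None(height..height + 1)),
            // Extend the current batch while the kind matches and it has room.
            (Some(CachedDataBatch::Headers(mut v)), Some(CachedData::Header)) if v.len() < max => {
                v.push(height);
                Some(CachedDataBatch::Headers(v))
            }
            (Some(CachedDataBatch::Blocks(mut v)), Some(CachedData::Block)) if v.len() < max => {
                v.push(height);
                Some(CachedDataBatch::Blocks(v))
            }
            (Some(CachedDataBatch::None(r)), None) if (r.end - r.start) < max as u32 => {
                Some(CachedDataBatch::None(r.start..height + 1))
            }
            // Kind changed or the batch is full: flush it and start a new one.
            (Some(done), kind) => {
                chunks.push(done);
                match kind {
                    Some(CachedData::Header) => Some(CachedDataBatch::Headers(vec![height])),
                    Some(CachedData::Block) => Some(CachedDataBatch::Blocks(vec![height])),
                    None => Some(CachedDataBatch::None(height..height + 1)),
                }
            }
        };
    }
    if let Some(done) = current {
        chunks.push(done);
    }
    chunks
}
```

With heights 4 and 5 cached as headers and a max batch size of 3, asking for 4..11 yields one Headers batch followed by two ranges to fetch.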

Test Updates:

  • Updated the P2P port in mocks to be async, to support the more complex tests needed for this feature.

About 50% of the changes in this PR are test updates, plus new tests for the cache.

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

@AurelienFT AurelienFT marked this pull request as ready for review October 16, 2024 16:44
@AurelienFT AurelienFT requested a review from a team October 16, 2024 16:44
Contributor

@netrome netrome left a comment


I don't understand the import task well enough to approve right now. I need clarification on the following points:

  1. How do we ensure this cache doesn't grow forever? Is the Import task short-lived? While the import task launches short-lived streams, it seems like a long-living task to me.
  2. How can we be sure we'll query exactly the same ranges as we have cached? Where is that invariant maintained?

Let me know if you want to jump on a call to chat about this, or just write if I'm missing something obvious here.

crates/services/sync/src/import.rs (outdated, resolved)
header_stream
let ranges = range_chunks(range, params.header_batch_size);
futures::stream::iter(ranges)
.map({
Contributor

While the pattern was established before this PR, I think it would be nice to use then instead of map here, and skip the .awaits. We'd be able to return just a Stream<Item = SealedBlockBatch> instead of having the nested futures in the returned stream.

Contributor Author

I agree, and there are a lot more things to improve in this service. I don't want to make this PR even bigger, so I created an issue for that: #2370

Collaborator

then resolves the future, while map allows us to create a stream that can be parallelized later.

crates/services/sync/src/import.rs (outdated, resolved)
@AurelienFT
Contributor Author

AurelienFT commented Oct 16, 2024

@netrome Thanks for taking the time to review this. Regarding your questions:
1 - Yes, I expect it to live a long time, but all requested data should eventually arrive and then be cleared, so the cache should never hold more than batch_size elements. I'm not completely sure about this, though, which is why I raised it under "Interrogations" in the PR description. Maybe we need pruning management.
2 - I was assuming we re-request exactly the same ranges because batch_size doesn't change, but the starting point can move to the last synced block, so the ranges can change. I think you are right that the ranges can change; I will ask @xgreenx a few questions.

@AurelienFT AurelienFT changed the base branch from release/v0.40.0 to master October 16, 2024 21:21
Contributor

@rafal-ch rafal-ch left a comment


So far looks good, I need to have a deeper look at the tests though.

CHANGELOG.md (outdated, resolved)
crates/services/sync/src/import.rs (outdated, resolved)
crates/services/sync/src/import/back_pressure_tests.rs (outdated, resolved)
@AurelienFT AurelienFT marked this pull request as draft October 17, 2024 09:17
@AurelienFT
Contributor Author

Converted to draft because of a big refactor.

@AurelienFT AurelienFT marked this pull request as ready for review October 18, 2024 10:33
@AurelienFT AurelienFT marked this pull request as draft October 21, 2024 11:40
@netrome
Contributor

netrome commented Oct 21, 2024

Now everything is cached one by one, but there is an issue I'm having a hard time solving. When we successfully fetched the header but never got the transactions, we need the peer_id to ask for the transactions again. However, if I cache the peer_id we used to get the header and that peer failed to give us the transactions, we will ask it again, and I don't think we want to re-ask a peer that returned a failure. But I don't have any way to find another peer that I know has the transactions.

On top of that, the range that I build from cached data could have been fetched from multiple peers. The only solution I see that simplifies everything, but caches less, is to cache only full blocks. Any ideas @netrome @xgreenx @rafal-ch?

Had a chat about this. @xgreenx proposed we change the p2p interface to not require any peer ID when requesting transactions, but instead leave it up to the p2p implementation to decide which peer to request them from and return that peer ID in the response.
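The proposed interface change can be sketched as follows. The names and the synchronous signature are simplified assumptions for illustration; the real port is async and lives in crates/services/sync/src/ports.rs:

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
pub struct PeerId(pub String);

/// Data tagged with the peer that actually served it, so the caller can
/// still report that peer if the data turns out to be bad.
pub struct SourcePeer<T> {
    pub peer_id: PeerId,
    pub data: T,
}

pub trait PeerToPeerPort {
    /// The caller names no peer: the p2p implementation picks one (e.g. by
    /// reputation) and returns its id alongside the transactions.
    fn get_transactions(&self, block_height_range: Range<u32>) -> Option<SourcePeer<Vec<String>>>;
}

/// A toy implementation that always answers from one fixed peer.
pub struct StaticPeer;

impl PeerToPeerPort for StaticPeer {
    fn get_transactions(&self, block_height_range: Range<u32>) -> Option<SourcePeer<Vec<String>>> {
        let txs = block_height_range
            .map(|h| format!("txs-for-block-{h}"))
            .collect();
        Some(SourcePeer {
            peer_id: PeerId("peer-1".into()),
            data: txs,
        })
    }
}
```

Because the answering peer is returned with the data, the sync service can still penalize it via report_peer if the transactions are missing or malformed.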

@AurelienFT AurelienFT changed the base branch from master to add_p2p_fetch_txs_no_peer_specified October 21, 2024 12:44
@AurelienFT AurelienFT marked this pull request as ready for review October 21, 2024 13:22
@AurelienFT AurelienFT self-assigned this Oct 24, 2024
AurelienFT added a commit that referenced this pull request Oct 31, 2024
## Linked Issues/PRs
This is a requirement for
#2361

## Description

This PR adds a way to fetch transactions over p2p without specifying a particular peer, letting p2p choose the peer it prefers.
This will be used in #2361

## Checklist
- [x] Breaking changes are clearly marked as such in the PR description
and changelog
- [x] New behavior is reflected in tests
- [x] [The specification](https://github.com/FuelLabs/fuel-specs/)
matches the implemented behavior (link update PR if changes are needed)

### Before requesting review
- [x] I have reviewed the code myself
- [x] I have created follow-up issues caused by this PR and linked them
here

---------

Co-authored-by: Green Baneling <[email protected]>
Base automatically changed from add_p2p_fetch_txs_no_peer_specified to master October 31, 2024 08:47
@rymnc rymnc requested a review from Copilot November 21, 2024 10:18


Copilot reviewed 6 out of 10 changed files in this pull request and generated no suggestions.

Files not reviewed (4)
  • crates/services/sync/src/import/test_helpers/pressure_peer_to_peer.rs: Evaluated as low risk
  • crates/services/sync/src/import/tests.rs: Evaluated as low risk
  • crates/services/sync/src/ports.rs: Evaluated as low risk
  • CHANGELOG.md: Evaluated as low risk
Collaborator

@xgreenx xgreenx left a comment


The change looks really good=)

}
}
BlockHeaderData::Cached(CachedDataBatch::None(_)) => {
unreachable!()
Collaborator

While it is true, let's return an error and log that this place shouldn't be reachable.

Contributor Author

I have added a log, and I return a malformed batch, which is treated as an error throughout this process. I don't want to change the whole architecture of the module for this error. (The other solution is to panic, as is done here:

.expect("We checked headers are not empty above"),
)
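The approach described in the reply can be sketched with assumed, simplified types. The real Batch carries sealed headers, and the real code would log via a proper logging macro rather than eprintln!:

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
struct Batch {
    range: Range<u32>,
    results: Vec<u32>, // stand-in for sealed headers
}

impl Batch {
    /// A batch whose results don't cover its range; downstream code already
    /// treats such a batch as an error.
    fn malformed(range: Range<u32>) -> Self {
        Batch {
            range,
            results: Vec::new(),
        }
    }

    fn is_err(&self) -> bool {
        self.results.len() != (self.range.end - self.range.start) as usize
    }
}

enum BlockHeaderData {
    /// Some(headers) for cached data; None is the "impossible" empty case.
    Cached(Option<Vec<u32>>),
}

fn resolve(data: BlockHeaderData, range: Range<u32>) -> Batch {
    match data {
        BlockHeaderData::Cached(Some(headers)) => Batch {
            range,
            results: headers,
        },
        BlockHeaderData::Cached(None) => {
            // Instead of unreachable!(): log the anomaly and surface a
            // malformed batch, letting the existing error path handle it.
            eprintln!("cache returned an empty batch for {range:?}; this should be unreachable");
            Batch::malformed(range)
        }
    }
}
```

This keeps the failure local to one batch instead of panicking the whole sync service.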

crates/services/sync/src/import.rs (resolved)
Comment on lines +574 to +589
Some(peer_id) => {
let source_peer = peer_id.clone().bind(range.clone());
let Ok(Some(txs)) = p2p
.get_transactions_from_peer(source_peer)
.await
.trace_err("Failed to get transactions")
else {
report_peer(
p2p,
Some(peer_id.clone()),
PeerReportReason::MissingTransactions,
);
return None;
};
Some(SourcePeer { peer_id, data: txs })
}
Collaborator

Do we even need to support this case?=)

Contributor Author

I think so, because if we are in the case where we don't use the cache, we have already fetched the header from a particular peer, and we have its peer_id, it's more efficient to ask that peer directly for the transactions instead of running a computation to find a peer that has them (and probably ending up with the same one anyway).

crates/services/sync/src/import/cache.rs (outdated, resolved)
crates/services/sync/src/import/cache.rs (outdated, resolved)
CachedDataBatch::Headers(batch) => {
if batch.results.len() >= max_chunk_size {
chunks.push(CachedDataBatch::Headers(batch));
CachedDataBatch::None(current_height..current_height)
Collaborator

I don't see why we want to return None instead of a new Headers batch with the remaining elements.

I see that it was extracted from the loop, and there it makes sense, because None is the default value to start the next iteration of the loop. But here, it looks strange.

I think if we had a function named truncate_chunk and did something like current_chunk = truncate_chunk(current_chunk, &mut chunks), it would be simpler to understand =)

Contributor Author

I refactored this to split only when the chunk is inserted into the accumulator, and it really simplifies the whole code. I also added some comments.

crates/services/sync/src/import/cache.rs (outdated, resolved)
Comment on lines 273 to 305
p2p.expect_get_sealed_block_headers()
.times(1)
.in_sequence(&mut seq)
.returning(|_| {
Box::pin(async move {
tokio::time::sleep(Duration::from_millis(300)).await;
Err(anyhow::anyhow!("Some network error"))
})
});
p2p.expect_get_sealed_block_headers()
.times(2)
.in_sequence(&mut seq)
.returning(|range| {
Box::pin(async move {
let peer = random_peer();
let headers = Some(range.map(empty_header).collect());
let headers = peer.bind(headers);
Ok(headers)
})
});
// Then
// Reask only for block 4
p2p.expect_get_sealed_block_headers()
.times(1)
.in_sequence(&mut seq)
.returning(|range| {
Box::pin(async move {
let peer = random_peer();
let headers = Some(range.map(empty_header).collect());
let headers = peer.bind(headers);
Ok(headers)
})
});
Collaborator

Why is the sequence [fail, success, success]? Based on the comments I would expect either [fail, success] or [success (for the first 3 blocks), fail, success]. The same question applies to getting transactions.

Contributor Author

I have made the comments more explicit. The expected sequence is indeed [fail (4), success (5), success (6)] and then [success (4)] only, for both tests. Tell me if that's clearer :)

@@ -14,7 +14,11 @@ use crate::{
},
Collaborator

It would be nice to see a test where execution fails and we verify that we do not call p2p, because all the data has already been fetched.

Contributor Author

I made a test locally that fails, because this is not the behavior we decided on together. I think we said that if execution fails we should remove the data from the cache, because it would probably fail again. The line that clears the cache:

cache.remove_element(&height);
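The eviction rule being described can be sketched like this. Apart from remove_element, the types and names here are assumptions for illustration:

```rust
use std::collections::BTreeMap;

struct Cache(BTreeMap<u32, String>);

impl Cache {
    fn new() -> Self {
        Cache(BTreeMap::new())
    }

    fn insert_block(&mut self, height: u32, block: String) {
        self.0.insert(height, block);
    }

    fn remove_element(&mut self, height: &u32) {
        self.0.remove(height);
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

/// Whether execution succeeds or fails, the height is evicted: on success
/// the data has been committed and is no longer needed; on failure the same
/// cached data would likely fail again, so it must be re-fetched from the
/// network instead of replayed from the cache.
fn execute_and_commit(cache: &mut Cache, height: u32, execution_ok: bool) -> Result<(), String> {
    cache.remove_element(&height);
    if execution_ok {
        Ok(())
    } else {
        Err(format!("execution failed at height {height}"))
    }
}
```

Under this rule the cache only ever holds in-flight heights, which also bounds its size to roughly one batch.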

@AurelienFT
Contributor Author

@xgreenx Thanks for the kind comment. I have addressed all of your concerns; some may still need answers :)

Contributor

@netrome netrome left a comment


Nice stuff! Some minor questions and comments from me, but overall looks good.

}

pub fn insert_blocks(&mut self, batch: Batch<SealedBlock>) {
let mut lock = self.0.lock();
Contributor

Oh, I didn't know the parking_lot Mutex was infallible. So no poisoned mutexes to worry about, nice!

))
}
(CachedDataBatch::Headers(mut batch), CachedData::Header(data)) => {
debug_assert_eq!(batch.range.end, height);
Contributor

Should we perhaps log a warning if this isn't correct in production?
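One way that could look (a sketch; eprintln! stands in for whatever warning macro the codebase uses, e.g. tracing::warn!):

```rust
use std::ops::Range;

/// Checks that a cached batch is contiguous with the next height. In debug
/// builds a violation panics via debug_assert_eq!; in release builds, where
/// debug assertions are compiled out, it is reported as a warning instead.
fn check_batch_contiguous(batch_range: &Range<u32>, height: u32) -> bool {
    debug_assert_eq!(batch_range.end, height);
    if batch_range.end != height {
        // Stand-in for tracing::warn! to keep the sketch dependency-free.
        eprintln!(
            "non-contiguous cache batch: expected next height {}, got {}",
            batch_range.end, height
        );
        return false;
    }
    true
}
```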

CachedDataBatch::None(4..7),
CachedDataBatch::None(7..10),
CachedDataBatch::None(10..11),
]; "one header and empty ranges")]
Contributor

Nit: the range exists, but there are no blocks. I don't see any test cases with an empty range or a max batch size of 0; those would be interesting to add. Otherwise, love the test suite!

Comment on lines 37 to +38
#[async_trait::async_trait]
#[cfg_attr(any(test, feature = "benchmarking"), mockall::automock)]
Contributor

I assume the order here matters (i.e. the previous order didn't add the async_trait sugar to the mocks, right?) and that this is a broken-window fix, or how does this change relate to the current PR? It seems very orthogonal.

Development

Successfully merging this pull request may close these issues.

fuel-core-sync should cache the result of responses instead of throwing them away
4 participants