Invalid MerkleTreeStoresRead after restarting a 0.45 to 0.46 upgraded validator #4112

Open
Rigorously opened this issue Nov 29, 2024 · 4 comments · May be fixed by #4135
Labels
bug Something isn't working ledger storage

Comments

@Rigorously

Rigorously commented Nov 29, 2024

After successfully upgrading a validator from v0.45 to v0.46 by following the dry-run upgrade instructions (and without making the mistake of passing an HTML file instead of JSON), restarting namadan crashes with the error Invalid MerkleTreeStoresRead.

2024-11-29T01:01:31.068320Z  INFO namada_node: Done loading MASP verifying keys.
2024-11-29T01:01:31.068898Z  INFO namada_node::storage::rocksdb: Using 2 compactions threads for RocksDB.
2024-11-29T01:01:31.086428Z  INFO namada_node::broadcaster: Starting broadcaster.
The application panicked (crashed).
Message:  Merkle tree should be restored: Custom(CustomError(MerkleTree("Invalid MerkleTreeStoresRead")))
Location: /home/runner/work/namada/namada/crates/state/src/wl_state.rs:586

For context, this validator ran v0.45 until block 182000, was then upgraded to v0.46 successfully, and ran until it timed out at block 182002, where there was not enough voting power online to reach consensus. The validator was manually stopped and then restarted to see whether the error also happens after a successful upgrade and whether it is related to issue #4108.

Rigorously added the bug label Nov 29, 2024
@grarco
Collaborator

grarco commented Nov 29, 2024

I believe this is an issue I've seen on some of my localnets when playing around with protocol versions (e.g. starting with v46 and then downgrading to v45, which does not have the updated pruning logic). In this case it seems to happen in the opposite order, so we might need some help from @yito88.

@yito88
Member

yito88 commented Nov 29, 2024

Ah, rebuilding the Merkle tree failed because some subtrees were rebuilt by the migration with the new diffs, even though the base tree (and some of the subtrees) was still based on the state before the migration. That is what caused the inconsistency between the base tree and these subtrees.

It means that we can't restore the state unless a new epoch starts after the migration.
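
To illustrate the kind of inconsistency described above, here is a minimal toy sketch. It is not Namada's actual code: `StoredTrees`, `subtree_root`, `commit`, and `restore` are hypothetical names, the hashing uses the standard library's `DefaultHasher` rather than a real Merkle tree, and treating "a new epoch starts" as "base tree and subtrees are stored together again" is an assumption made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a subtree root: a hash over the subtree's
/// key/value entries (a real Merkle tree would hash leaves pairwise).
fn subtree_root(entries: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    entries.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical stand-in for the trees persisted to disk: the base tree
/// records one root per subtree name.
struct StoredTrees {
    base: BTreeMap<String, u64>,
    subtrees: BTreeMap<String, BTreeMap<String, String>>,
}

impl StoredTrees {
    /// Store the base tree and the subtrees together, so they agree.
    /// In this toy model, this is what happens when a new epoch starts.
    fn commit(subtrees: BTreeMap<String, BTreeMap<String, String>>) -> Self {
        let base = subtrees
            .iter()
            .map(|(name, entries)| (name.clone(), subtree_root(entries)))
            .collect();
        StoredTrees { base, subtrees }
    }

    /// Restoring succeeds only if every subtree's recomputed root matches
    /// the root recorded in the base tree.
    fn restore(&self) -> Result<(), String> {
        for (name, entries) in &self.subtrees {
            if self.base.get(name) != Some(&subtree_root(entries)) {
                return Err(format!("root mismatch for subtree {name}"));
            }
        }
        Ok(())
    }
}
```

In this model, a migration that rewrites the entries of one subtree without also updating `base` makes `restore` fail, and calling `commit` again (standing in for the next epoch) makes it succeed.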

@Rigorously
Author

Is there something the validators can do right now, besides syncing and not restarting, or should we await a fix?

@Rigorously
Author

The chain finally moved to the next epoch and now a restart with v0.46 is possible.

@tzemanovic tzemanovic linked a pull request Dec 2, 2024 that will close this issue