Genesis File Corruption in Namada Dry Run Node #4073

Open
andreidavid opened this issue Nov 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@andreidavid
Contributor

Bug Report: Genesis File Corruption in Namada Dry Run Node

Description:
The genesis file at /nvme/namada_dryrun/namada-dryrun.abaaeaf7b78cb3ac/cometbft/config/genesis.json is truncated mid-line, so the node crashes on startup with a deserialization error.

Symptoms:

  • Node fails to start with error: Couldn't deserialize the genesis file: Error("EOF while parsing a string", line: 30
  • File ends abruptly mid-value: "value": "AAECAwQ
  • Recurring task panics: ERROR namada_node::abortable: Abortable spawner error: task 64 panicked
  • Task 64 panic occurs approximately every 30 seconds
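
A quick way to confirm this kind of truncation independently of the node is to try parsing the file as JSON. The following is a minimal sketch, assuming Python 3 is available on the host; `check_json_file` is a hypothetical helper name and not part of Namada or CometBFT.

```python
import json


def check_json_file(path):
    """Return (True, None) if `path` parses as JSON, else (False, error text).

    Hypothetical helper for illustration only. A file cut off mid-string,
    like the genesis.json described above, fails with a JSONDecodeError.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        # OSError covers a missing or unreadable file;
        # JSONDecodeError covers truncated or malformed content.
        return False, str(exc)
    return True, None
```

Calling `check_json_file` on the genesis.json path from the description should return `False` together with the parser's message if the file is still truncated.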

Context:

  • File was working correctly the previous day
  • No system crashes or power losses reported
  • Running Namada dry run network
  • Error logs show persistent task panics for ~1 hour before detection

System Info:

  • Node Type: Full Node (not validator)
  • Chain ID: namada-dryrun.abaaeaf7b78cb3ac
  • Service: namada_dryrun.service

Impact:
Node cannot start, preventing network participation and balance queries.

namada-debug-20241121_140726.zip
Attached logs collected with the Debug Collector

@andreidavid added the bug label on Nov 21, 2024
@andreidavid
Contributor Author

This actually seems to be due to running out of disk space, although I still have 57G free on that partition:

Nov 20 06:02:35 volaris namada[2450]: E[2024-11-20|06:02:35.249] CONSENSUS FAILURE!!! module=consensus err="write /nvme/namada/tududes-fragile.ba8b841cd08325/cometbft/data/write-file-atomic-05538696762220113713: no space left on device" stack="goroutine 31924 [running]:\nruntime/debug.Stack()\n\t/opt/hostedtoolcache/go/1.21.13/x64/src/runtime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:737 +0x46\npanic({0xf15a60?, 0xc00214b680?})\n\t/opt/hostedtoolcache/go/1.21.13/x64/src/runtime/panic.go:914 +0x21f\ngithub.com/cometbft/cometbft/privval.(*FilePVLastSignState).Save(0x90?)\n\t/home/runner/work/cometbft/cometbft/privval/file.go:138 +0x86\ngithub.com/cometbft/cometbft/privval.(*FilePV).saveSigned(0xb36cb3a4d9f44b1f?, 0xd9ee60c8e32a07c6?, 0x25d7328?, 0xc0?, {0xc001c6e120?, 0xb41ec6e4cd932401?, 0xb36cb3a4d9f44b1f?}, {0xc0012ba440, 0x40, 0x40})\n\t/home/runner/work/cometbft/cometbft/privval/file.go:394 +0x8e\ngithub.com/cometbft/cometbft/privval.(*FilePV).signVote(0xc0004f4dc0, {0xc0003ca700, 0x1e}, 0xc00107c460)\n\t/home/runner/work/cometbft/cometbft/privval/file.go:338 +0x2fb\ngithub.com/cometbft/cometbft/privval.(*FilePV).SignVote(0xc0000e4380?, {0xc0003ca700?, 0x3d93d47925c7df84?}, 0xf7bc4f44333115d8?)\n\t/home/runner/work/cometbft/cometbft/privval/file.go:255 +0x18\ngithub.com/cometbft/cometbft/consensus.(*State).signVote(0xc0000e4380, 0x1, {0xc000b59ea0, 0x20, 0x20}, {0x0?, {0xc000b59760?, 0x9c8af9?, 0xf00160?}})\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:2282 +0x52c\ngithub.com/cometbft/cometbft/consensus.(*State).signAddVote(0xc0000e4380, 0x1065756?, {0xc000b59ea0, 0x20, 0x20}, {0x1e?, {0xc000b59760?, 0x7a104?, 0xc001e9ac00?}})\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:2328 +0x205\ngithub.com/cometbft/cometbft/consensus.(*State).defaultDoPrevote(0xc0000e4380, 0xea5ae0?, 
0xffffff01?)\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:1339 +0x3d8\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrevote(0xc0000e4380, 0x7a105, 0x0)\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:1276 +0x511\ngithub.com/cometbft/cometbft/consensus.(*State).handleCompleteProposal(0xc0000e4380, 0xc001a08400?)\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:2010 +0x38f\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc0000e4380, {{0x1295a20, 0xc000010408}, {0xc0005125a0, 0x28}})\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:847 +0x1a5\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc0000e4380, 0x0)\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:773 +0x3d1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 111\n\t/home/runner/work/cometbft/cometbft/consensus/state.go:384 +0x10c\n"

@Fraccaman
Member

It's probably due to RocksDB compaction, which requires double the amount of storage currently used. So if Namada is using 100 GB for its DB, it needs another 100 GB free.

@andreidavid
Contributor Author

Looks like this could be it. The db directory alone is 96G, well above the 57G I have free:
96G /nvme/namada_dryrun/namada-dryrun.abaaeaf7b78cb3ac/db/
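
The check discussed here can be sketched as a small script that compares the size of the DB directory with the free space on its filesystem. This is only a rough sketch: the "free space must be at least the DB size" rule is the heuristic suggested earlier in this thread, not a documented RocksDB guarantee, and the function names are hypothetical.

```python
import os
import shutil

GIB = 1024 ** 3


def dir_size_bytes(path):
    """Sum the apparent sizes of all files under `path`.

    Note: this uses st_size (apparent size), so it can differ slightly
    from what `du` reports, which counts allocated disk blocks.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is unreadable (e.g. removed mid-walk).
                pass
    return total


def has_compaction_headroom(db_bytes, free_bytes):
    """Heuristic from this thread: free space should be >= current DB size."""
    return free_bytes >= db_bytes


def check_db_dir(db_path):
    """Return (db_bytes, free_bytes, ok) for the filesystem holding db_path."""
    db_bytes = dir_size_bytes(db_path)
    free_bytes = shutil.disk_usage(db_path).free
    return db_bytes, free_bytes, has_compaction_headroom(db_bytes, free_bytes)


# Figures reported in this thread: 96G DB, 57G free.
print(has_compaction_headroom(96 * GIB, 57 * GIB))  # prints False
```

With the numbers from this issue the heuristic fails, which is consistent with the "no space left on device" error in the log above. Running `check_db_dir` periodically against the db directory would flag the condition before writes start failing.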
