Memory Leak Tracking and Remote Debugging #598
Comments
I can take memory dumps and do diffs between them, but there's too much going on in the code to know for sure where the leaks are coming from. That's what I've done so far.
I think moving forward it's better to isolate parts of the code and reduce the amount of noise in the snapshots. Any true leaks will survive a garbage collection. We can't really know when a GC happens on its own, but we can trigger one from code. So moving forward I can approach the problem a few ways.
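To make that workflow concrete, here is a minimal sketch using only Node built-ins (the labels and file name are just illustrative): forcing a GC before each snapshot means anything still present in the snapshot is genuinely reachable, not garbage that simply hasn't been collected yet.

```js
// Run with `node --expose-gc snapshot.js` so that global.gc() exists.
const v8 = require('v8');

function snapshot(label) {
  if (global.gc) global.gc(); // force a full collection first
  const file = v8.writeHeapSnapshot(); // writes a .heapsnapshot file loadable in Chrome DevTools
  console.log(`${label}: wrote ${file}`);
}

snapshot('baseline');
// ... exercise the suspected code here, then snapshot again and diff in DevTools ...
snapshot('after-workload');
```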
Moving forward I'm going to modify the source code slightly so that background tasks happen with a much shorter delay. I'll also run 4 nodes locally to simulate a network. Hopefully, with a lot more happening in the background, the memory leak will be more noticeable.
Hmm, there are new stability issues with the nodes now. One is coming from …
And one I need to fix in …
There are too many moving parts in …
The last snapshot I took showed a lot of retained memory from the … My next step is to take a closer look at …
Current status update @tegefaulkes? ETA on fix?
It's hard to give a fixed ETA on this. There's too much unknown to give a number. The problem scope is that somewhere in the code we're doing something that holds on to memory for too long. Right now I have an idea what it is: the handlers in … Reading up on memory issues in JavaScript points out these common suspects: https://amplication.com/blog/understanding-and-preventing-memory-leaks-in-nodejs
So moving forward I'm going to focus on these. But like I said, the problem scope is kind of unbounded, so I can't estimate an ETA. That's why I've been suggesting focusing on the other issues first, since their scopes are well known and right now things still function despite the memory issue.
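As one concrete illustration of those common suspects (generic Node.js, not Polykey code): an event listener that closes over a large object and is never removed keeps that object reachable for the lifetime of the emitter.

```js
const { EventEmitter } = require('events');

const emitter = new EventEmitter(); // long-lived emitter

function attach() {
  // `big` is captured by the listener's closure, so it stays reachable for as long
  // as the listener remains registered, even after attach() returns.
  const big = Buffer.alloc(10 * 1024 * 1024);
  emitter.on('tick', () => console.log('still holding', big.length, 'bytes'));
}

for (let i = 0; i < 10; i++) attach(); // ~100MB retained, and growing with each call
// Fix: keep a reference to the listener and call emitter.removeListener('tick', listener)
// when it is no longer needed, or use emitter.once() for one-shot handlers.
```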
I have two tests doing the same thing.
The code looks like this:

```js
// Run with `node --expose-gc --inspect main1.js` so that global.gc() is available and
// heap snapshots can be taken from the inspector at each console.log marker.
// `createTimer` is assumed to come from the project's timer implementation (not shown here).
function sleep(delay) {
  return new Promise((r) => setTimeout(r, delay));
}

async function main1() {
  console.log('Take Snapshot1 BEFORE');
  await sleep(10000);
  // example of taking up a lot of memory
  let timers = [];
  for (let i = 0; i < 10000; i++) {
    timers.push(createTimer());
  }
  console.log('Take Snapshot2 ACTIVE TIMERS');
  await Promise.all(timers);
  console.log('Take Snapshot3 DONE TIMERS');
  await sleep(10000);
  timers = null; // the first run leaves this out so the array keeps its references
  if (global.gc) global.gc();
  console.log('Take Snapshot4 GC');
  await sleep(10000);
}

main1();
```

The first run leaves the array alone to keep the reference to all created timers. When the test is finished we can see that the snapshot shows 44MB used. I can see that 20MB are allocated when the timers are created. A further 9MB is allocated when the timers settle.
Since we're holding a reference to all timers, this is about what we'd expect. The 2nd test does the same but sets `timers` to `null` and forces a GC before the final snapshot.
We can see that most of the memory has been freed, but not all of it. So right now we can conclude a few things.
Here we have an example of live memory. We can see what things are still live by the end of the tests. Hmm, some things are still live, which seems odd. The …
Ok, so that's not really a leak: with 10x more timers created, the remaining memory stays the same. So the only real problem with …
If you need to do an explicit deletion use …
I pushed a patch to free the handler when the timer is done. I'm going to move on to checking out …
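Roughly the shape of that fix, as a minimal sketch (this is not the actual Timer code; the class and names here are illustrative): once the timer settles, the reference to its handler is dropped so anything the callback closed over can be collected, even if the settled timer object itself is still referenced somewhere.

```js
// Hypothetical sketch: a thenable timer that frees its handler once it settles.
class SketchTimer {
  constructor(handler, delay) {
    this.handler = handler;
    this.p = new Promise((resolve) => {
      setTimeout(() => {
        const result = this.handler ? this.handler() : undefined;
        this.handler = null; // free the handler once the timer is done
        resolve(result);
      }, delay);
    });
  }
  then(onFulfilled, onRejected) {
    return this.p.then(onFulfilled, onRejected);
  }
}

// const result = await new SketchTimer(() => 'done', 100);
```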
I found a leak in the Logger: child loggers created with `getChild` are still being retained even after we drop our reference to them. I created a test script to demonstrate this:

```js
"use strict";
const { default: Logger } = require('./dist/Logger');

function sleep(delay) {
  return new Promise((r) => setTimeout(r, delay));
}

async function main() {
  console.log('starting');
  await sleep(5000);
  const root = new Logger();
  let head = root;
  // Build a long chain of child loggers hanging off the root.
  for (let i = 0; i < 10000; i++) {
    head = head.getChild(`Logger ${i}`);
  }
  console.log('created!');
  await sleep(1000);
  // Drop our only direct reference to the end of the chain.
  head = null;
  console.log('deleted!');
  await sleep(10000);
  console.log('done!');
}

main();
```

Running this script with the inspector shows the following.
So the Logger is holding on to these child loggers even after `head` is dropped. If I modify it so that children are no longer retained, that memory should be freed.
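To illustrate the general idea (a hypothetical sketch, not the actual Logger internals; it assumes the parent keeps some registry of its children, which is what would pin them): holding children weakly means a child can still be collected once the caller drops its own reference to it.

```js
// Hypothetical sketch only: a parent that tracks children via WeakRef does not keep
// them alive on its own, so dropping the external reference frees the chain.
class SketchLogger {
  constructor(key, parent = undefined) {
    this.key = key;
    this.parent = parent;
    this.children = new Set(); // Set<WeakRef<SketchLogger>>
  }
  getChild(key) {
    const child = new SketchLogger(key, this);
    this.children.add(new WeakRef(child));
    return child;
  }
}
```

Whether the real fix looks like this or simply avoids tracking children at all depends on what the Logger actually needs the back-references for.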
I'll push a patch to the logger in a moment.
Looking over
I'm going to look over contexts now.
Looks like a … Some notes about event handlers and signals:
So it's not the end of the world to not remove event handlers. But since … With that said, if you pass in a signal to the …
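To make that concrete (generic AbortSignal usage, not a specific Polykey API; `doWork` and `ctx` are made-up names): a listener added to a long-lived signal keeps its closure reachable until the signal fires or the listener is removed, so cleaning it up when the operation finishes avoids accumulating handlers on that signal.

```js
async function doWork(ctx) {
  // ctx.signal may be a long-lived AbortSignal shared across many operations.
  const onAbort = () => { /* cancel whatever this operation is doing */ };
  ctx.signal.addEventListener('abort', onAbort, { once: true });
  try {
    await new Promise((r) => setTimeout(r, 100)); // stand-in for the real work
  } finally {
    // Without this, the closure (and anything it captures) stays reachable for the
    // signal's entire lifetime, which adds up when operations are frequent.
    ctx.signal.removeEventListener('abort', onAbort);
  }
}
```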
@amydevs can you deploy a new one based on the current PK now?
@tegefaulkes go to AWS and have a look.
And post another picture of the memory leak now that it's been 24 hrs.
The PK CLI auto-deploys to the testnet on every commit to staging.
We can see that before we had 3% growth in 8 hrs; now it is 1% growth in 8 hrs. If this stabilises over the weekend, then at the very least the memory leak is solved while idling, but we don't know whether it could still occur during other interactions.
Describe the bug
Testnet 6 has been deployed and we are getting some interesting results from the nodes: they no longer crash, but they are accumulating memory.
This is evidence of a memory leak. Once memory usage gets near 100% the node will crash.
I recently saw this, and you can see it occurred during an automatic deployment, so it was not a crash.
But you can see the memory usage gets reset and then starts growing again.
Now we could try to find the problem and solve it directly, but long-term this problem is likely to recur. We need a more robust way of detecting memory leaks in production over a long period of time.
I think to achieve this we can add:
- `nodes status` command should include connection and graph stats (Polykey-CLI#36) - but basically we need to expose more of the internal state to be observed over the client service, so we can see what kind of state is being accumulated (a rough sketch of the sort of periodic memory sampling this could include follows below).
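A rough sketch of that kind of periodic sampling (not the actual client service API; purely illustrative): sampling `process.memoryUsage()` on an interval makes long-term growth visible without having to take heap snapshots by hand.

```js
// Log process memory every minute; growth that never flattens out points to a leak.
const sampler = setInterval(() => {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1);
  console.log(`rss=${mb(rss)}MB heapUsed=${mb(heapUsed)}MB heapTotal=${mb(heapTotal)}MB external=${mb(external)}MB`);
}, 60 * 1000);
sampler.unref(); // don't keep the process alive just for the sampler
```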
To Reproduce
Expected behavior
It should just flatten out when idling.
Screenshots
See above.
Additional context
Notify maintainers