Only re-prepare MatrixRTC delayed disconnection event on 404 #4575

Open

toger5 wants to merge 3 commits into develop from toger5/dont-resend-call-membership-after-one-rate-limit-error-response
Conversation

@toger5 toger5 (Contributor) commented Dec 9, 2024

n.b. The original scope of this PR has changed (reduced). It is not feasible to test this at the moment. We are planning a refactor of this class to make it easier to test.

With the retry count (`numRetriesAllowed`) set to one, the following issue could occur:

```
// If sending state cancels your own delayed state, prepare another delayed state
// TODO: Remove this once MSC4140 is stable & doesn't cancel own delayed state
if (this.disconnectDelayId !== undefined) {
    try {
        const knownDisconnectDelayId = this.disconnectDelayId;
        await resendIfRateLimited(
            () =>
                this.client._unstable_updateDelayedEvent(
                    knownDisconnectDelayId,
                    UpdateDelayedEventAction.Restart,
                ),
            1000,
        );
    } catch (e) {
        logger.warn("Failed to update delayed disconnection event, prepare it again:", e);
        this.disconnectDelayId = undefined;
        await prepareDelayedDisconnection();
    }
}
```

At first glance it looks like the `catch (e)` could never be triggered by a 429 (rate limit), because those would be absorbed by `await resendIfRateLimited`. EXCEPT that the retry only happens once by default: `resendIfRateLimited<T>(func: () => Promise<T>, numRetriesAllowed: number = 1)`. So as soon as the server sends two rate limits in a row, the following happens (see the sketch after this list):

  • we get into the `catch (e)` because of the rate limit
  • we forget about the old delayed event (`this.disconnectDelayId = undefined`)
  • we start a new delayed event with `await prepareDelayedDisconnection();`
  • we no longer update the old delayed event, which is still running!
  • the old delayed event fires and disconnects us from the call (the call member state becomes `{}`)
  • we get into our other error-catching mechanism that resends the new state event
  • this cancels the newly created delayed leave event (the one from `await prepareDelayedDisconnection();`)
  • and creates another delayed leave event
  • but if we are still rate limited (chances are really high due to the reconnect), this loop will REPEAT
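
To make the "only happens once" point concrete, here is a rough sketch of a single-retry helper with the signature quoted above. This is illustrative only; it is not the SDK's actual `resendIfRateLimited` implementation, and `isRateLimitError` is a hypothetical stand-in for however the SDK detects a 429:

```
// Illustrative stand-in for however a 429 / M_LIMIT_EXCEEDED response is detected.
const isRateLimitError = (e: unknown): boolean => (e as { httpStatus?: number })?.httpStatus === 429;

// Hypothetical single-retry helper matching the quoted signature.
async function resendIfRateLimited<T>(func: () => Promise<T>, numRetriesAllowed: number = 1): Promise<T> {
    while (true) {
        try {
            return await func();
        } catch (e) {
            if (numRetriesAllowed > 0 && isRateLimitError(e)) {
                // First 429: wait briefly, then retry.
                numRetriesAllowed--;
                await new Promise((resolve) => setTimeout(resolve, 5000));
            } else {
                // Second consecutive 429: rethrown, so it lands in the caller's catch (e).
                throw e;
            }
        }
    }
}
```

With `numRetriesAllowed` left at its default of 1, the second consecutive 429 is rethrown, which is exactly what starts the loop above.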

Checklist

  • Tests written for new code (and old code if feasible).
  • New or updated public/exported symbols have accurate TSDoc documentation.
  • Linter and other CI checks pass.
  • Sign-off given on the changes (see CONTRIBUTING.md).
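
For reference, here is a minimal sketch of the direction the updated title suggests: keep retrying the update on rate limits, and only re-prepare the delayed disconnection event when the update fails with a 404, meaning the old delayed event no longer exists on the server. The status check is illustrative and the actual change in this PR may differ:

```
// Sketch only: how the catch in the snippet above might change so that a new
// delayed leave event is prepared only when the old one is really gone (404).
if (this.disconnectDelayId !== undefined) {
    try {
        const knownDisconnectDelayId = this.disconnectDelayId;
        await resendIfRateLimited(
            () =>
                this.client._unstable_updateDelayedEvent(
                    knownDisconnectDelayId,
                    UpdateDelayedEventAction.Restart,
                ),
            1000,
        );
    } catch (e) {
        // Illustrative check: the SDK may expose the HTTP status differently.
        if ((e as { httpStatus?: number })?.httpStatus === 404) {
            logger.warn("Delayed disconnection event not found, preparing it again:", e);
            this.disconnectDelayId = undefined;
            await prepareDelayedDisconnection();
        } else {
            logger.warn("Failed to update delayed disconnection event:", e);
        }
    }
}
```

With this shape, any non-404 failure is only logged, so the already-running delayed event is not abandoned, avoiding the reconnect loop described above.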

@hughns hughns (Member) left a comment

I would like to see a test case for this, please.

@hughns hughns (Member) left a comment

Other questions:

  • It doesn't make sense that we would retry 1000 times. Surely we want to stop when the call disconnects?
  • What prevents there being more than one `prepareDelayedDisconnection()` running at a time?

@toger5 toger5 force-pushed the toger5/dont-resend-call-membership-after-one-rate-limit-error-response branch from ba4c95d to 81cd4f8 on December 11, 2024 at 15:56
@hughns hughns changed the title from "Set retry counts of event updating to 1000 (from 1)" to "Only re-prepare MatrixRTC delayed disconnection event on 404" on Dec 11, 2024
@hughns hughns (Member) left a comment

Having discussed this with @toger5, it is not going to be feasible to add a new test case at this time.

Instead, we will work on refactoring the class to make it more testable and also address the questions about having multiple timers running at once.

@hughns hughns added this pull request to the merge queue Dec 11, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2024