Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Foundation of new syn2mas tool #3636

Open
wants to merge 10 commits into
base: rei/target_syn2mas
Choose a base branch
from
Open

Conversation

reivilibre
Copy link
Contributor

This is the start of a new version of the syn2mas tool.

Given it's missing a lot of stuff still, I'm intending to merge this to an intermediary branch where I can then build upon this with the missing features.

Notable inclusions

  • parallel insert across multiple writer transactions
  • syn2mas temporarily drops indices and constraints during migration, for much faster insertion performance. (rough numbers: 2 min → 15 s)
  • safety feature: MAS tables are renamed during migration, to make sure a MAS instance doesn't accidentally run and access the database in its vulnerable state
  • safety feature: syn2mas holds an advisory lock preventing other concurrent instances of syn2mas from running
  • syn2mas can restart even if it died or was interrupted half-way through

Notable missing features (for later PRs)

  • migrating any data other than user or user password records
  • migrating records for guest users (it's uncertain whether we want to do this or not)
  • running sanity checks against the Synapse DB before you start migration
  • configuring the server name
  • configuring the source Synapse database location
  • tests
  • documentation page

Performance

This currently runs a synthetic matrix.org-size migration under 15 seconds. (Bear in mind this is only one part of the migration, but it gives a rough idea of where we're going with this.)
The synthetic database was set up with the same number of real users as matrix.org, but guest and appservice users were not populated in the database so in reality this may run slower because the Synapse Postgres instance will need to read and skip rows corresponding to guests and appservice users.

@reivilibre reivilibre requested a review from sandhose December 6, 2024 15:44
Copy link
Member

@sandhose sandhose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very promising! I've left a bunch of feedback after a first read; I still want to look at the constraint save/restoration in more details later

Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Show resolved Hide resolved
/// If you want to use this tool anyway, please pass this argument.
///
/// If you want to migrate from Synapse to MAS today, please use the Node.js-based tool in the MAS repository.
#[clap(long = "i-swear-i-am-just-testing-in-a-staging-environment")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫡

crates/syn2mas/src/checks.rs Outdated Show resolved Hide resolved
crates/syn2mas/src/lib.rs Outdated Show resolved Hide resolved
crates/syn2mas/src/synapse_reader.rs Outdated Show resolved Hide resolved
crates/syn2mas/src/mas_writer.rs Outdated Show resolved Hide resolved
crates/syn2mas/src/mas_writer.rs Outdated Show resolved Hide resolved
crates/syn2mas/src/mas_writer.rs Outdated Show resolved Hide resolved
const NUM_WRITER_CONNECTIONS: usize = 8;

impl Options {
pub async fn run(self, figment: &Figment) -> anyhow::Result<ExitCode> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I would appreciate tracing spans on most operations, which would help a lot diagnose what takes time, but that can be introduced later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I still need to figure out tracing (in the sense of spans) to be honest. Can probably just add #[instrument]s without too much hassle though...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can always add more infos or better names on them later, but this is fine 👍

Copy link

cloudflare-workers-and-pages bot commented Dec 10, 2024

Deploying matrix-authentication-service-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: e7172d6
Status: ✅  Deploy successful!
Preview URL: https://765d99cc.matrix-authentication-service-docs.pages.dev
Branch Preview URL: https://rei-syn2mas-pr1.matrix-authentication-service-docs.pages.dev

View logs

crates/syn2mas/src/synapse_reader.rs Outdated Show resolved Hide resolved
crates/syn2mas/src/migration.rs Show resolved Hide resolved
const NUM_WRITER_CONNECTIONS: usize = 8;

impl Options {
pub async fn run(self, figment: &Figment) -> anyhow::Result<ExitCode> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can always add more infos or better names on them later, but this is fine 👍

Comment on lines 72 to 76
connection_rx: Receiver<Result<PgConnection, Error>>,

/// A sender handle to return a writer connection to the pool
/// The connection should still be mid-transaction!
connection_tx: Sender<Result<PgConnection, Error>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The transaction either olds a ref to a connection or a connection from a pool. Weird that there is no good way to have an owned connection in there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks less scary after looking at it again

@reivilibre reivilibre requested a review from sandhose December 11, 2024 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants