Why are we implementing it? (sales eng)
This documentation is currently missing. Properly documenting migration methods for different scenarios helps users trust Citus as a large-scale data solution.
What are the typical use cases?
Migrate a self-hosted Citus cluster to a new setup (perhaps more powerful hardware, a different geographical location, or a new networking architecture), hopefully with minimal downtime.
How does this work? (devs)
At the idea level:
1. Deploy a new Citus cluster with the same number of workers (1..N) as the original cluster.
2. Copy the data over and keep it in sync while clients keep using the old cluster.
3. Switch the clients over to the new cluster.
I presume step 2 could be achieved using publications and subscriptions on the non-Citus-metadata tables (the "workload" tables) between each pair of old and new workers 1..N. The clusters would eventually be in sync with respect to workload data, but the new cluster would have its own metadata (mainly pg_dist_node).
Using pubsub would also allow installing the new cluster with a newer PG version, so the PG version upgrade would be a "free" byproduct.
We have very successfully done a few pubsub migrations of this kind to newer PG versions with regular Patroni clusters, with single-digit seconds of downtime, so it would be nice if the same could be done with Citus.
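To make the pubsub idea for step 2 more concrete, here is a minimal Python sketch that only generates the PostgreSQL statements for each old/new worker pair. It does not connect to anything. The host names, database name, table names, and the `migrate_pub_N` / `migrate_sub_N` naming are all hypothetical, and whether this approach is actually safe with Citus (for example with respect to shard table names and metadata on the new cluster) is exactly the open question of this issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WorkerPair:
    """Old worker N and the new worker N that should receive its data.

    All values are illustrative placeholders, not real cluster settings.
    """
    index: int
    old_host: str
    new_host: str
    dbname: str


def publication_sql(pair: WorkerPair, tables: List[str]) -> str:
    """Statement to run on the OLD worker (it needs wal_level = logical)."""
    # Assumes simple table names that need no quoting.
    table_list = ", ".join(tables)
    return f"CREATE PUBLICATION migrate_pub_{pair.index} FOR TABLE {table_list};"


def subscription_sql(pair: WorkerPair) -> str:
    """Statement to run on the NEW worker.

    The tables must already exist on the new worker with the same names,
    because logical replication does not copy table definitions.
    """
    conninfo = f"host={pair.old_host} dbname={pair.dbname}"
    return (
        f"CREATE SUBSCRIPTION migrate_sub_{pair.index} "
        f"CONNECTION '{conninfo}' "
        f"PUBLICATION migrate_pub_{pair.index};"
    )


def migration_plan(
    pairs: List[WorkerPair], tables_by_worker: Dict[int, List[str]]
) -> List[Tuple[str, str]]:
    """Return (host, statement) tuples in the order they would be executed."""
    plan = []
    for pair in pairs:
        plan.append((pair.old_host, publication_sql(pair, tables_by_worker[pair.index])))
        plan.append((pair.new_host, subscription_sql(pair)))
    return plan


if __name__ == "__main__":
    demo_pairs = [
        WorkerPair(1, "old-worker-1", "new-worker-1", "app"),
        WorkerPair(2, "old-worker-2", "new-worker-2", "app"),
    ]
    demo_tables = {1: ["events", "users"], 2: ["events", "users"]}
    for host, statement in migration_plan(demo_pairs, demo_tables):
        print(f"[{host}] {statement}")
```

Generating the statements per pair keeps the plan reviewable before anything is run against either cluster. In a real migration the table lists would have to come from each old worker, and the connection string would also need credentials.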
Corner cases, gotchas
Are there relevant blog posts or outside documentation about the concept/feature?
I created an issue in the main Citus repo before knowing that this documentation repo exists as well.
More details about my specific use case are there, and in a Citus Slack message I wrote earlier.