#Slony-I Failover / Switchover

Perl script to assist with performing switchover and failover of replication sets in PostgreSQL databases replicated using Slony-I.

The script can be run in interactive mode to suggest switchover or failover and will create a slonik script to perform the suggested action.

It's hard to put together a script for all situations as different Slony configurations can have different complexities (hence the existence of slonik), but this script is intended to be used for building and running slonik scripts to move all sets from one node to another.

There is also an autofailover mode which will sit and poll each node and perform a failover of failed nodes. This mode should be assumed as experimental, as there can be quite a few decisions to be made when failing over different setups.

##Example usage

Launch interactive mode with command line parameters:

$ ./slony_failover.pl -h localhost -db TEST -cl test_replication

Launch with configuration file, a minimal configuration file would be:

slony_database_host = localhost
slony_database_name = TEST
slony_cluster_name = test_replication

$ ./slony_failover.pl -f slony_failover.conf

Run as a daemon in debian:

$ sudo cp init.debian /etc/init.d/slony_failover
$ cp slony_failover.conf /var/slony/slony_failover/slony_failover.conf
$ sudo chmod +x /etc/init.d/slony_failover
$ sudo update-rc.d slony_failover start 99 2 3 4 5 . stop 24 0 1 6 
$ sudo invoke-rc.d slony_failover start

##Command line parameters

$ ./slony_failover.pl [options]

Switch	Description
-f	Read all configuration from config file
-h	Host running PostgreSQL instance to read state of Slony-I cluster from
-p	Port of above PostgreSQL database instance
-db	Name of above PostgreSQL database instance
-cl	Name of Slony-I cluster
-u	User to connect to above PostgreSQL database instance
-P	Password for above user (Use .pgpass instead where possible)
-i	Print information about slony cluster and exit

##Configuration file parameters

Section	Parameter	Type	Default	Comment
General	lang	en/fr	'en'	The language to print messages in, currently only english and french
General	prefix_directory	/full/path/to/directory	'/tmp/slony_failovers'	Working directory for script to generate slonik scripts and log files
General	separate_working_directory	boolean	'true'	Append a separate working directory to the prefix_directory for each run
General	slonik_path	/full/path/to/bin/directory	null	Slonik binary if not in current path
General	pid_filename	/path/to/pidfile	'/var/run/slony_failover.pid'	Pid file to use when running in autofailover mode
General	enable_try_blocks	boolean	false	Write slonik script with try blocks where possible to aid error handling
General	lockset_method	single/multiple	'multiple'	Write slonik script that locks all sets
General	pull_aliases_from_comments	boolean	false	If true, script will pull text from comment fields and use to generate
				possibly meaningful aliases for nodes and sets.
				For sl_set this uses the entire comment, and sl_node text in parentheses.
General	log_line_prefix	text	null	Prefix to add to log lines, special values:
				%p = process ID
				%t = timestamp without milliseconds
				%m = timestamp with milliseconds
General	failover_offline_subscriber_only	boolean	false	If set to true any subscriber only nodes that are unavailable at the time
				of failover will also be failed over. If false any such nodes will be
				excluded from the preamble and not failed over, however this may be problematic
				especially in the case where the most up to date node is the unavailable one.
General	drop_failed_nodes	boolean	false	After failover automatically drop the failed nodes.
Slon Config	slony_database_host	IP Address/Hostname	null	PostgreSQL Hostname of database to read Slony configuration from
Slon Config	slony_database_port	integer	5432	PostgreSQL Port of database to read Slony configuration from
Slon Config	slony_database_name	name	null	PostgreSQL database name to read Slony configuration from
Slon Config	slony_database_user	username	'slony'	Username to use to connect when reading Slony configuration
Slon Config	slony_database_password	password	''	Recommended to leave blank and use .pgpass file
Slon Config	slony_cluster_name	name	null	Name of Slony-I cluster to read configuration for
Logging	enable_debugging	boolean	false	Enable printing of debug messages to stdout
Logging	log_filename	base file name	'failover.log'	File name to use for script process logging, special values as per strftime spec
Logging	log_to_postgresql	boolean	false	Store details of failover script runs in a postgresql database
Logging	log_database_host	IP Address/Hostname	null	PostgreSQL Hostname of logging database
Logging	log_database_port	integer	null	PostgreSQL Port of logging database
Logging	log_database_name	name	null	PostgreSQL database name of logging database
Logging	log_database_user	username	null	Username to use to connect when logging to database
Logging	log_database_password	password	''	Recommended to leave blank and use .pgpass file
Autofailover	enable_autofailover	boolean	'false'	Rather than interactive mode sit and watch the cluster state for failed
				origin/forwarding nodes; upon detection trigger automated failover.

Autofailover	autofailover_forwarding_providers	boolean	'false'	If true a failure of a pure forwarding provider will also trigger failover
Autofailover	autofailover_config_any_node	boolean	'true'	After reading the initial cluster configuration, subsequent reads of the configuration
				will use conninfo read from sl_subscribe to read from any node.
Autofailover	autofailover_poll_interval	integer	500	How often to check for failure of nodes (milliseconds)
Autofailover	autofailover_node_retry	integer	2	When failure is detected, retry this many times before initiating failover
Autofailover	autofailover_sleep_time	integer	1000	Interval between retries (milliseconds)
Autofailover	autofailover_perspective_sleep_time	integer	20000	Interval between lag reads for failed nodes from surviving nodes. If greater than zero any observation that nodes have failed is checked from surviving nodes perspective by checking if lag times are extending. This does not guarantee 100% the nodes are down but if set to a large enough interval (at least sync_interval_timeout) can back up our observation.
Autofailover	autofailover_majority_only	boolean	false	Only fail over if the quantity of surviving nodes is greater than the quantity of failed nodes. Intended to be used to prevent a split-brain scenario in conjunction with some other logic to monitor and fence off the old origin if it is in the minority.
Autofailover	autofailover_is_quorum	boolean	false	If this script is running on a separate host set to true to treat it as a quorum server. Effectively increments sum of surviving nodes when calculating the majority above.

Changes

08/04/2012 - Hash together some ideas for interactive failover perl script
04/11/2012 - Experiment with different use of try blocks (currently can't use multiple lock sets indide try)
13/04/2014 - Update to work differently for Slony 2.2+
05/05/2014 - Experiment with autofailover ideas
10/09/2014 - Add some logic to autofailover for doing extra checks from perspective of other nodes. Still a naive autofailover implementaition imho.

Licence

See the LICENCE file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Changes

Licence

Files

README.md

Latest commit

History

README.md

File metadata and controls

Changes

Licence