barman backup <server> --wait not working with Postgres 17 #1041

Open
cameronmurdoch opened this issue Dec 12, 2024 · 1 comment

@cameronmurdoch

Hi

I set up two otherwise identical postgres servers with barman, one running pg16.6 and the other pg17.2, using the following config (XX is the postgres major version):

[servername]
description =  "backup via streaming only"
conninfo = <connstring>
streaming_conninfo = <streaming connstring>
backup_method = postgres
streaming_archiver = on
slot_name = barman
create_slot = auto
path_prefix = "/usr/pgsql-XX/bin"

Both postgres servers are idle. If I run barman backup --wait for the postgres 16 server, I get the expected output:

[[email protected] ~]$ barman backup dbpg02 --wait
Starting backup using postgres method for server dbpg02 in /dbfiles/backups/barman/dbpg02/base/20241212T122615
Backup start at LSN: 0/60000C8 (000000010000000000000006, 000000C8)
Starting backup copy via pg_basebackup for 20241212T122615
Copy done (time: 1 second)
Finalising the backup.
Backup size: 5.4 MiB
Backup end at LSN: 0/8000000 (000000010000000000000007, 00000000)
Backup completed (start time: 2024-12-12 12:26:15.062829, elapsed time: 1 second)
Waiting for the WAL file 000000010000000000000007 from server 'dbpg02'
Processing xlog segments from streaming for dbpg02
	000000010000000000000006
Processing xlog segments from streaming for dbpg02
	000000010000000000000007
[[email protected] ~]$

The backup started with walfile 000000010000000000000006 and ended with 000000010000000000000007, and barman receives and archives these as expected almost immediately.

If I do this for the pg17 server, I get this:

[[email protected] ~]$ barman backup dbpg01 --wait
Starting backup using postgres method for server dbpg01 in /dbfiles/backups/barman/dbpg01/base/20241212T122822
Backup start at LSN: 0/5000060 (000000010000000000000005, 00000060)
Starting backup copy via pg_basebackup for 20241212T122822
Copy done (time: 1 second)
Finalising the backup.
Backup size: 6.8 MiB
Backup end at LSN: 0/7000000 (000000010000000000000007, 00000000)
Backup completed (start time: 2024-12-12 12:28:22.841537, elapsed time: 1 second)
Waiting for the WAL file 000000010000000000000007 from server 'dbpg01'
Processing xlog segments from streaming for dbpg01
	000000010000000000000005
	000000010000000000000006

This hangs waiting for walfile 000000010000000000000007, and barman.wal_archiver INFO: No xlog segments found from streaming for dbpg01. is logged constantly in the logfile.

If I manually run switch-wal then the backup completes.

I think this problem is caused by a breaking change in pg17 described in the release notes as:

Change file boundary handling of two WAL file name functions (Kyotaro Horiguchi, Andres Freund, Bruce Momjian) §

The functions pg_walfile_name() and pg_walfile_name_offset() used to report the previous LSN segment number when the LSN was on a file segment boundary; it now returns the current LSN segment.

Here is the commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=344afc776

I see that pg_walfile_name_offset() is used here:

"({pg_walfile_name_offset}(location)).*, "

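The Backup end at LSN: 0/7000000 (000000010000000000000007, 00000000) is exactly at a segment boundary, so for pg <= 16 pg_walfile_name_offset() would have given 000000010000000000000006 as the filename, but for pg17 we get 000000010000000000000007. Barman therefore waits for a segment that postgres has not finished yet, which explains why a manual switch-wal lets the backup complete.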
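To illustrate the boundary behaviour, here is a minimal Python sketch of how the file name is derived from an LSN under the old and new semantics. It is not barman or postgres code: the helper names and the `pg17_semantics` flag are hypothetical, and it assumes the default 16 MiB WAL segment size and timeline 1, as in the logs above.

```python
# Sketch only: mimics the segment arithmetic behind pg_walfile_name().
# Assumption: default WAL segment size of 16 MiB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/7000000' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def walfile_name(lsn: str, timeline: int = 1, pg17_semantics: bool = True,
                 seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the WAL file name for an LSN (hypothetical helper).

    pg17_semantics=True  -> segment that contains the LSN (pg >= 17)
    pg17_semantics=False -> segment that contains the byte just before
                            the LSN (pg <= 16); differs only on a boundary
    """
    pos = parse_lsn(lsn)
    if pos <= 0:
        raise ValueError("LSN must be greater than 0/0 for this sketch")
    segno = pos // seg_size if pg17_semantics else (pos - 1) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


# Backup end LSN from the pg17 run, exactly on a segment boundary:
old_name = walfile_name("0/7000000", pg17_semantics=False)
new_name = walfile_name("0/7000000", pg17_semantics=True)
print(old_name, new_name)
```

Away from a boundary (for example the start LSN 0/5000060) both variants give the same name, so only backups that end exactly on a segment boundary are affected.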

Thanks!
C

@cameronmurdoch
Author

Hi,

I've opened a draft PR with a quick proof-of-concept fix that makes barman backup --wait work again on pg17 for me:

#1044

This most likely requires more work :-) and I am unsure if it is even the best approach.

I also see that pg_walfile_name() is used elsewhere, e.g. in switch_wal():

def switch_wal(self):

Whilst I guess it is much less likely to hit a segment boundary here, if it is possible then this function will also need fixing.

Thanks!
