An utility script to extract data from Postgres (.sql) dumps.
We have dumps from different postgres/postgis servers, different versions of postgres are involved and it's not simple to just restore them given the necessary data(/relationships/ownerships) integrity that a full restore requires.
Moreover, in this case the dumps contain just read-only data, so the only requirement is to have access to those tables.
The tool iterates through a sql file, that contains statements generated by pg_dump. When a CREATE TABLE
command is found the statement is converted in the DuckDB
dialect and a table with the same data schema is created.
The script then looks for the COPY
statement,
- reads each line after that as TSV
- perform some transformation to match the table schema
- convert this block of rows to an arrow table
- efficiently copies the arrow table into duckdb
- export the table as parquet file
pip install .
usage: pg_dedump [-h] [-v] [-r] [-c CHUNKS] [-t TOTAL] [-d DB] [-p PREFIX] [-o OUTPUT] [--output-type OUTPUT_TYPE] [FILE ...]
extract tables from postgres dumps
positional arguments:
FILE files to read, if empty, stdin is used
options:
-h, --help show this help message and exit
-v, --verbose Print debug output
-r, --drop-db Delete the database if present
-c CHUNKS, --chunks CHUNKS
Chunk insert size - default: 10000
-t TOTAL, --total TOTAL
Total lines - enables the progress bar
-d DB, --db DB Name of the dump database - by default uses dump.ddb
-p PREFIX, --prefix PREFIX
Prefix to add to each table exported
-o OUTPUT, --output OUTPUT
Output path
--output-type OUTPUT_TYPE
Format of the tables output - default: parquet
pg_dump < dump.sql
To create an sql dump from a binary dump use:
pg_restore -f output.sql binary_dump_file
It's advised to extract just the tables you need using option -t
.