Export all tables in a PostgreSQL schema to Parquet files.
Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `schema` | `str` | Name of the PostgreSQL database schema. | required |
| `user` | `str` | PostgreSQL user role. If not provided, defaults to the value of the `PGUSER` environment variable, or (if unset) the current system user. | `None` |
| `host` | `str` | Host name for the PostgreSQL server. If not provided, defaults to the value of the `PGHOST` environment variable, or `"localhost"` if unset. | `None` |
| `dbname` | `str` | Name of the PostgreSQL database. If not provided, defaults to the value of the `PGDATABASE` environment variable, or (if unset) the resolved user. | `None` |
| `port` | `int` | Port for the PostgreSQL server. If not provided, defaults to the value of the `PGPORT` environment variable, or 5432 if unset. | `None` |
| `data_dir` | `str` | Root directory of the Parquet data repository. If not provided, defaults to the value of the `DATA_DIR` environment variable, or the current working directory. | `None` |
| `row_group_size` | `int` | Maximum number of rows in each written Parquet row group. Must be positive. Default is `1024 * 1024`. | `1024 * 1024` |
| `batched` | `bool` | Whether data are extracted in batches using `to_pyarrow_batches()` instead of a single call to `to_pyarrow()`. Using batches reduces memory usage for large tables at the cost of slightly lower performance. | `True` |
| `threads` | `int` | Number of threads DuckDB is allowed to use. If provided, must be positive. | `None` |
| `engine` | `"duckdb"` or `"adbc"` | Query execution engine used to read PostgreSQL data before writing Parquet. | `"duckdb"` |
| `numeric_mode` | `"text"`, `"float64"`, or `"decimal"` | Handling for PostgreSQL `NUMERIC` columns. `None` keeps the engine-specific default: native decimals on DuckDB, text-backed numerics on ADBC. | `"text"` |
| `archive` | `bool` | Whether an existing Parquet file should be archived before being replaced. | `False` |
| `archive_dir` | `str` | Name of the directory (relative to `data_dir/schema`) where archived Parquet files will be stored. | `None` |
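
The fallback order described for `user`, `host`, `dbname`, and `port` can be sketched in a few lines of Python. This is a minimal illustration, not the library's own code: the helper name `resolve_pg_connection` and its `env` argument are hypothetical, and treating an empty value the same as an unset one is an assumption.

```python
import getpass
import os


def resolve_pg_connection(user=None, host=None, dbname=None, port=None, env=None):
    """Hypothetical helper mirroring the documented fallback order."""
    # `env` lets callers pass a plain dict instead of the real environment.
    env = os.environ if env is None else env

    # Explicit argument, then PGUSER, then the current system user.
    # Assumption: empty strings are treated the same as unset values.
    user = user or env.get("PGUSER") or getpass.getuser()
    # Explicit argument, then PGHOST, then "localhost".
    host = host or env.get("PGHOST") or "localhost"
    # Explicit argument, then PGDATABASE, then the user resolved above.
    dbname = dbname or env.get("PGDATABASE") or user
    # Explicit argument, then PGPORT, then 5432.
    port = int(port or env.get("PGPORT") or 5432)

    return {"user": user, "host": host, "dbname": dbname, "port": port}


if __name__ == "__main__":
    print(resolve_pg_connection(env={"PGUSER": "alice"}))
```

Note that `dbname` falls back to the resolved user rather than to the raw `user` argument, so setting only `PGUSER` also determines the database name.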
Returns

| Name | Type | Description |
| --- | --- | --- |
| `results` | `list[str]` | List of Parquet file paths returned by `db_to_pq()`, one for each table in the schema. |
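
How `data_dir`, `schema`, and `archive_dir` combine into file locations can be sketched as follows. Only two things here come from the parameter descriptions: the `data_dir` fallback (argument, then `DATA_DIR`, then the current working directory) and the fact that `archive_dir` is relative to `data_dir/schema`. The helper name `resolve_paths`, the `<table>.parquet` file name, and the schema and table names in the usage lines are assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_paths(schema, table, data_dir=None, archive_dir=None, env=None):
    """Hypothetical helper: where a table's Parquet file and archive might live."""
    env = os.environ if env is None else env

    # data_dir: explicit argument, then DATA_DIR, then the current working directory.
    root = Path(data_dir or env.get("DATA_DIR") or Path.cwd())
    schema_dir = root / schema

    # Assumption: one file per table, named <table>.parquet, under data_dir/schema.
    target = schema_dir / f"{table}.parquet"

    # archive_dir is interpreted relative to data_dir/schema.
    # This sketch returns None when no archive_dir is given.
    archive = schema_dir / archive_dir if archive_dir is not None else None
    return target, archive


if __name__ == "__main__":
    target, archive = resolve_paths("sales", "orders", data_dir="/data", archive_dir="archive")
    print(target)
    print(archive)
```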