core.db_schema_to_pq

core.db_schema_to_pq(
    schema,
    *,
    user=None,
    host=None,
    dbname=None,
    port=None,
    data_dir=None,
    row_group_size=1024 * 1024,
    batched=True,
    threads=None,
    engine=None,
    numeric_mode=None,
    archive=False,
    archive_dir=None,
)

Export all tables in a PostgreSQL schema to Parquet files.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| schema | str | Name of the PostgreSQL database schema. | required |
| user | str | PostgreSQL user role. If not provided, defaults to the value of the PGUSER environment variable, or (if unset) the current system user. | None |
| host | str | Host name for the PostgreSQL server. If not provided, defaults to the value of the PGHOST environment variable, or "localhost" if unset. | None |
| dbname | str | Name of the PostgreSQL database. If not provided, defaults to the value of the PGDATABASE environment variable, or (if unset) the resolved user. | None |
| port | int | Port for the PostgreSQL server. If not provided, defaults to the value of the PGPORT environment variable, or 5432 if unset. | None |
| data_dir | str | Root directory of the Parquet data repository. If not provided, defaults to the value of the DATA_DIR environment variable, or (if unset) the current working directory. | None |
| row_group_size | int | Maximum number of rows in each written Parquet row group. Must be positive. | 1024 * 1024 |
| batched | bool | Whether data are extracted in batches using to_pyarrow_batches() instead of a single call to to_pyarrow(). Batching reduces memory usage for large tables at the cost of slightly lower performance. | True |
| threads | int | Number of threads DuckDB is allowed to use. If provided, must be positive. | None |
| engine | (duckdb, adbc) | Query execution engine used to read PostgreSQL data before writing Parquet. | "duckdb" |
| numeric_mode | (text, float64, decimal) | Handling for PostgreSQL NUMERIC columns. None keeps the engine-specific default: native decimals on DuckDB, text-backed numerics on ADBC. | None |
| archive | bool | Whether an existing Parquet file should be archived before being replaced. | False |
| archive_dir | str | Name of the directory (relative to data_dir/schema) where archived Parquet files will be stored. | None |
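
The fallback order described for user, host, dbname, port, and data_dir can be sketched as follows. This is only an illustration of the documented precedence (explicit argument, then environment variable, then built-in fallback), not the library's actual code; the helper name resolve_pg_defaults is hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import getpass
import os


def resolve_pg_defaults(user=None, host=None, dbname=None, port=None, data_dir=None):
    """Hypothetical helper mirroring the documented fallback order."""
    env = os.environ
    # Explicit argument, then PGUSER, then the current system user.
    # Assumption: an empty environment variable counts as unset.
    user = user or env.get("PGUSER") or getpass.getuser()
    # Explicit argument, then PGHOST, then "localhost".
    host = host or env.get("PGHOST") or "localhost"
    # dbname falls back to the *resolved* user, so it is computed after user.
    dbname = dbname or env.get("PGDATABASE") or user
    # Explicit argument, then PGPORT, then 5432.
    port = int(port) if port is not None else int(env.get("PGPORT") or 5432)
    # Explicit argument, then DATA_DIR, then the current working directory.
    data_dir = data_dir or env.get("DATA_DIR") or os.getcwd()
    return {
        "user": user,
        "host": host,
        "dbname": dbname,
        "port": port,
        "data_dir": data_dir,
    }
```

Because dbname defaults to the resolved user, setting only PGUSER is enough to determine both values under this reading of the table.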

Returns

| Name | Type | Description |
| --- | --- | --- |
| results | list[str] | List of Parquet file paths returned by db_to_pq(), one for each table in the schema. |
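
Since the return value is a flat list of paths, it can be convenient to index it by table name. The sketch below assumes each file is named after its table (for example, dsf.parquet for a table called dsf); that naming, the helper index_by_table, and the sample paths are illustrative assumptions, not documented behavior.

```python
from pathlib import Path


def index_by_table(results):
    """Map table name -> Parquet path, assuming files are named <table>.parquet."""
    return {Path(p).stem: Path(p) for p in results}


# Hypothetical return value of db_schema_to_pq("crsp"); real paths depend on data_dir.
results = ["/data/crsp/dsf.parquet", "/data/crsp/msf.parquet"]
by_table = index_by_table(results)
print(sorted(by_table))
```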

Examples

>>> db_schema_to_pq("crsp")
>>> db_schema_to_pq("audit", archive=True)