core.db_schema_to_pq

core.db_schema_to_pq(
    schema,
    *,
    user=None,
    host=None,
    dbname=None,
    port=None,
    data_dir=None,
    row_group_size=1024 * 1024,
    batched=True,
    threads=None,
    engine=None,
    numeric_mode=None,
    archive=False,
    archive_dir=None,
)

Export all tables in a PostgreSQL schema to Parquet files.

Parameters

Name	Type	Description	Default
schema	str	Name of the PostgreSQL database schema.	required
user	str	PostgreSQL user role. If not provided, defaults to the value of the `PGUSER` environment variable, or (if unset) the current system user.	`None`
host	str	Host name for the PostgreSQL server. If not provided, defaults to the value of the `PGHOST` environment variable, or `"localhost"` if unset.	`None`
dbname	str	Name of the PostgreSQL database. If not provided, defaults to the value of the `PGDATABASE` environment variable, or (if unset) the resolved `user`.	`None`
port	int	Port for the PostgreSQL server. If not provided, defaults to the value of the `PGPORT` environment variable, or `5432` if unset.	`None`
data_dir	str	Root directory of the Parquet data repository. If not provided, defaults to the value of the `DATA_DIR` environment variable, or the current working directory.	`None`
row_group_size	int	Maximum number of rows in each written Parquet row group. Must be positive.	`1024 * 1024`
batched	bool	Whether data are extracted in batches using `to_pyarrow_batches()` instead of a single call to `to_pyarrow()`. Using batches reduces memory usage for large tables at the cost of slightly lower performance.	`True`
threads	int	Number of threads DuckDB is allowed to use. If provided, must be positive.	`None`
engine	(duckdb, adbc)	Query execution engine used to read PostgreSQL data before writing Parquet.	`None`
numeric_mode	(text, float64, decimal)	Handling for PostgreSQL `NUMERIC` columns. `None` keeps the engine-specific default: native decimals on DuckDB, text-backed numerics on ADBC.	`None`
archive	bool	Whether an existing Parquet file should be archived before being replaced.	`False`
archive_dir	str	Name of the directory (relative to `data_dir/schema`) where archived Parquet files will be stored.	`None`
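
The fallback order described for `user`, `host`, `dbname`, and `port` can be sketched as follows. This is only an illustration of the documented behavior, not the library's implementation: `resolve_pg_defaults` and its `env` argument are hypothetical names, and treating an empty environment variable as unset is an assumption.

```python
import getpass
import os


def resolve_pg_defaults(user=None, host=None, dbname=None, port=None, env=None):
    """Hypothetical helper mirroring the documented fallback order."""
    # `env` is injectable for testing; by default the real environment is used.
    env = os.environ if env is None else env
    # Explicit argument first, then PGUSER, then the current system user.
    user = user or env.get("PGUSER") or getpass.getuser()
    # Explicit argument first, then PGHOST, then "localhost".
    host = host or env.get("PGHOST") or "localhost"
    # dbname falls back to the *resolved* user, as the table describes.
    dbname = dbname or env.get("PGDATABASE") or user
    # Environment variables are strings, so the port is converted to int.
    port = int(port or env.get("PGPORT") or 5432)
    return {"user": user, "host": host, "dbname": dbname, "port": port}


cfg = resolve_pg_defaults(env={"PGUSER": "alice", "PGPORT": "6543"})
print(cfg)
```

With only `PGUSER` and `PGPORT` set, `dbname` resolves to the same value as `user` and `host` resolves to `"localhost"`.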

Returns

Name	Type	Description
results	list[str]	List of Parquet file paths returned by `db_to_pq()`, one for each table in the schema.
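
Because the return value is a plain list of path strings, it can be indexed by table name for later lookups. The sketch below assumes each file is named after its table (`<table>.parquet`), which this page does not state; `index_by_table` and the example paths are hypothetical.

```python
from pathlib import Path


def index_by_table(paths):
    """Map each file stem to its path, assuming one `<table>.parquet` per table."""
    return {Path(p).stem: p for p in paths}


# Hypothetical return value; real paths depend on `data_dir` and the schema.
results = ["/data/crsp/table_a.parquet", "/data/crsp/table_b.parquet"]
by_table = index_by_table(results)
print(sorted(by_table))
```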

Examples

>>> db_schema_to_pq("crsp")
>>> db_schema_to_pq("audit", archive=True)
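
To export several schemas with the same options, one possible pattern is a small wrapper that forwards keyword arguments to the export function. In this sketch `export_schemas` is a hypothetical helper and `fake_export` is a stand-in so the code runs without a database; in practice `db_schema_to_pq` would be passed instead.

```python
def export_schemas(schemas, export, **kwargs):
    """Call `export(schema, **kwargs)` per schema; return {schema: paths}."""
    return {schema: export(schema, **kwargs) for schema in schemas}


def fake_export(schema, *, archive=False, **_):
    # Stand-in for db_schema_to_pq so the sketch runs without PostgreSQL.
    return [f"{schema}/example.parquet"]


out = export_schemas(["crsp", "audit"], fake_export, archive=True)
print(out)
```

Passing the export function as an argument keeps the wrapper independent of any database connection, which also makes it straightforward to test.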