core.db_to_pq

core.db_to_pq(
    table_name,
    schema,
    *,
    user=None,
    host=None,
    database=None,
    port=None,
    data_dir=None,
    col_types=None,
    row_group_size=1048576,
    obs=None,
    modified=None,
    alt_table_name=None,
    keep=None,
    drop=None,
    rename=None,
    where=None,
    batched=True,
    threads=None,
    tz='UTC',
    engine=None,
    numeric_mode=None,
    adbc_batch_size_hint_bytes=None,
    adbc_use_copy=None,
    archive=False,
    archive_dir=None,
)

Export a PostgreSQL table to a Parquet file.

Parameters

Name	Type	Description	Default
table_name	str	Name of the source PostgreSQL table.	required
schema	str	Name of the source PostgreSQL schema.	required
user	str	Source PostgreSQL user role.	`None`
host	str	Source PostgreSQL host name.	`None`
database	str	Source PostgreSQL database name.	`None`
port	int	Source PostgreSQL port.	`None`
data_dir	str	Root directory of the Parquet data repository. If omitted, use `DATA_DIR` or the current working directory.	`None`
col_types	dict	Explicit output column types. Only a subset of columns needs to be supplied. Types should describe the exported output columns after any renaming.	`None`
row_group_size	int	Maximum number of rows per written Parquet row group. Default is `1024 * 1024`.	`1048576`
obs	int	Maximum number of rows to export. Implemented with SQL `LIMIT`.	`None`
modified	str	Last-modified string to embed in the Parquet metadata. If omitted, use the source PostgreSQL table comment when available.	`None`
alt_table_name	str	Output Parquet basename. If omitted, defaults to `table_name`.	`None`
keep	str or iterable	Regex pattern(s) matching the source columns to keep. If both `keep` and `drop` are supplied, `drop` is applied first.	`None`
drop	str or iterable	Regex pattern(s) matching the source columns to drop. If both `keep` and `drop` are supplied, `drop` is applied first.	`None`
rename	dict	Mapping from source column names to output column names. Keys are the original PostgreSQL column names and values are the exported names. When `rename` is used, `col_types` should refer to the output names after renaming.	`None`
where	str	SQL `WHERE` condition used to filter source rows before export.	`None`
batched	bool	If `True`, stream Arrow batches instead of materializing the full result at once. This typically reduces memory use for large tables.	`True`
threads	int	Maximum number of DuckDB worker threads to use on the DuckDB path.	`None`
tz	str	Time zone assumed for `timestamp without time zone` source columns before normalizing output timestamps.	`'UTC'`
engine	(duckdb, adbc)	Query execution engine used to read PostgreSQL data before writing Parquet. If `None`, the default DuckDB-backed path is used.	`None`
numeric_mode	(text, float64, decimal)	Handling for PostgreSQL `NUMERIC` columns. `None` keeps the engine-specific default behavior. Explicit `col_types` entries take precedence.	`None`
adbc_batch_size_hint_bytes	int	On the ADBC path, hint the PostgreSQL driver about the desired Arrow batch size in bytes.	`None`
adbc_use_copy	bool	On the ADBC path, explicitly enable or disable the PostgreSQL driver’s `COPY` optimization.	`None`
archive	bool	Whether an existing Parquet file should be archived before replacement.	`False`
archive_dir	str	Name of the archive directory relative to `data_dir/schema`.	`None`
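
The interaction between `keep`, `drop`, `rename`, and `col_types` can be easier to follow with a small sketch. The helper below is hypothetical and not part of the library: `select_columns` is an illustrative name, and the use of `re.search` (rather than a full match) is an assumption. It only mirrors the documented order, in which `drop` is applied before `keep`, both match source column names, and renaming produces the output names that `col_types` should use.

```python
import re


def select_columns(columns, keep=None, drop=None, rename=None):
    """Hypothetical helper mirroring the documented keep/drop/rename order."""

    def as_patterns(value):
        # Accept either a single pattern string or an iterable of patterns.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    selected = list(columns)
    drop_patterns = as_patterns(drop)
    keep_patterns = as_patterns(keep)

    # `drop` is applied first, as the parameter table states.
    if drop_patterns:
        selected = [
            c for c in selected
            if not any(re.search(p, c) for p in drop_patterns)
        ]
    if keep_patterns:
        selected = [
            c for c in selected
            if any(re.search(p, c) for p in keep_patterns)
        ]

    # Renaming comes last, so `col_types` keys should use these output names.
    rename = rename or {}
    return [rename.get(c, c) for c in selected]


# Illustrative source column names (assumed, not taken from a real table).
source_columns = ["gvkey", "conm", "conml", "add1", "add2"]
output_columns = select_columns(
    source_columns,
    keep=["^gvkey$", "^con"],
    drop="^conml$",
    rename={"conm": "company_name"},
)
print(output_columns)
```

Here `conml` is removed by `drop` even though it would also match the `^con` keep pattern, and any `col_types` entry for the renamed column would be keyed by `company_name`.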

Returns

Name	Type	Description
	str | None	Path to the written Parquet file, or `None` if the query returns no rows.
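
Because the return value may be `None`, callers will usually want to check it before treating it as a path. The following is a minimal sketch of that check; `summarize_export` is a hypothetical helper, and the path shown is a stand-in value rather than real output.

```python
from pathlib import Path
from typing import Optional


def summarize_export(result: Optional[str]) -> str:
    """Hypothetical helper: describe a db_to_pq-style return value."""
    if result is None:
        # Documented case: the query returned no rows.
        return "no rows exported"
    return f"wrote {Path(result).name}"


# Stand-in values; a real call would be `result = db_to_pq(...)`.
message_written = summarize_export("/data/crsp/dsi.parquet")
message_empty = summarize_export(None)
print(message_written)
print(message_empty)
```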

Examples

Export a table using the default DuckDB-backed path:

>>> from db2pq import db_to_pq
>>> db_to_pq("dsi", "crsp")

Rename a column and apply an output type override:

>>> db_to_pq(
...     "company",
...     "public",
...     rename={"conm": "company_name"},
...     col_types={"company_name": "string"},
... )
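
The `where` and `obs` parameters both restrict which source rows are exported: `where` is a SQL `WHERE` condition and `obs` is implemented with SQL `LIMIT`. The sketch below shows the general shape of query these two parameters imply. `build_select` is a hypothetical helper, not the library's actual query builder, and the `date` column in the example condition is assumed for illustration.

```python
def build_select(table_name, schema, columns=None, where=None, obs=None):
    """Hypothetical sketch of the query implied by `where` and `obs`."""
    # Quote identifiers; select every column when none are listed.
    column_sql = ", ".join(f'"{c}"' for c in columns) if columns else "*"
    sql = f'SELECT {column_sql} FROM "{schema}"."{table_name}"'
    if where is not None:
        # `where` is passed through as a raw SQL condition.
        sql += f" WHERE {where}"
    if obs is not None:
        # `obs` caps the number of exported rows via LIMIT.
        sql += f" LIMIT {int(obs)}"
    return sql


query = build_select("dsi", "crsp", where="date >= '2020-01-01'", obs=1000)
print(query)
```

Under these assumptions, the corresponding call would pass `where="date >= '2020-01-01'"` and `obs=1000` to `db_to_pq`, which is convenient for testing an export on a small slice of a large table.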