core.db_to_pq
core.db_to_pq(
table_name,
schema,
*,
user=None,
host=None,
database=None,
port=None,
data_dir=None,
col_types=None,
row_group_size=1048576,
obs=None,
modified=None,
alt_table_name=None,
keep=None,
drop=None,
rename=None,
where=None,
batched=True,
threads=None,
tz='UTC',
engine=None,
numeric_mode=None,
adbc_batch_size_hint_bytes=None,
adbc_use_copy=None,
archive=False,
archive_dir=None,
)
Export a PostgreSQL table to a Parquet file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Name of the source PostgreSQL table. | required |
| schema | str | Name of the source PostgreSQL schema. | required |
| user | str | Source PostgreSQL user role. | None |
| host | str | Source PostgreSQL host name. | None |
| database | str | Source PostgreSQL database name. | None |
| port | int | Source PostgreSQL port. | None |
| data_dir | str | Root directory of the Parquet data repository. If omitted, use DATA_DIR or the current working directory. | None |
| col_types | dict | Explicit output column types. Only a subset of columns needs to be supplied. Types should describe the exported output columns after any renaming. | None |
| row_group_size | int | Maximum number of rows per written Parquet row group. Default is 1024 * 1024. | 1048576 |
| obs | int | Maximum number of rows to export. Implemented with SQL LIMIT. | None |
| modified | str | Last-modified string to embed in the Parquet metadata. If omitted, use the source PostgreSQL table comment when available. | None |
| alt_table_name | str | Output Parquet basename. If omitted, defaults to table_name. | None |
| keep | str or iterable | Regex pattern(s) describing source columns to keep. If both keep and drop are supplied, drop is applied first. | None |
| drop | str or iterable | Regex pattern(s) describing source columns to drop. If both keep and drop are supplied, drop is applied first. | None |
| rename | dict | Mapping from source column names to output column names. Keys are the original PostgreSQL column names and values are the exported names. When rename is used, col_types should refer to the output names after renaming. | None |
| where | str | SQL WHERE condition used to filter source rows before export. | None |
| batched | bool | If True, stream Arrow batches instead of materializing the full result at once. This typically reduces memory use for large tables. | True |
| threads | int | Maximum number of DuckDB worker threads to use on the DuckDB path. | None |
| tz | str | Time zone assumed for timestamp without time zone source columns before normalizing output timestamps. | 'UTC' |
| engine | (duckdb, adbc) | Query execution engine used to read PostgreSQL data before writing Parquet. | "duckdb" |
| numeric_mode | (text, float64, decimal) | Handling for PostgreSQL NUMERIC columns. None keeps the engine-specific default behavior. Explicit col_types entries take precedence. | "text" |
| adbc_batch_size_hint_bytes | int | On the ADBC path, hint the PostgreSQL driver about the desired Arrow batch size in bytes. | None |
| adbc_use_copy | bool | On the ADBC path, explicitly enable or disable the PostgreSQL driver’s COPY optimization. | None |
| archive | bool | Whether an existing Parquet file should be archived before replacement. | False |
| archive_dir | str | Name of the archive directory relative to data_dir/schema. | None |
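
The ordering rule for keep and drop can be illustrated with a small standalone sketch. The helper `select_columns` below is hypothetical and is not part of db2pq; it assumes patterns are matched with `re.search`, which may differ from the library's actual matching rule. It only shows what "drop is applied first" means for the resulting column list.

```python
import re


def select_columns(columns, keep=None, drop=None):
    """Hypothetical helper: apply drop patterns first, then keep patterns."""

    def as_patterns(value):
        # Accept None, a single pattern string, or an iterable of patterns.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    drop_patterns = as_patterns(drop)
    keep_patterns = as_patterns(keep)

    # Step 1: remove every column that matches any drop pattern.
    remaining = [
        c for c in columns
        if not any(re.search(p, c) for p in drop_patterns)
    ]
    # Step 2: if keep patterns were given, retain only columns matching one.
    if keep_patterns:
        remaining = [
            c for c in remaining
            if any(re.search(p, c) for p in keep_patterns)
        ]
    return remaining


cols = ["permno", "date", "ret", "retx", "prc"]
print(select_columns(cols, keep="^ret", drop="x$"))  # ['ret']
```

Here `retx` matches the keep pattern but is removed anyway, because the drop pattern is applied before keep is considered.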
Returns
| Name | Type | Description |
|---|---|---|
| | str or None | Path to the written Parquet file, or None if the query returns no rows. |
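
Because the return value may be None, callers may want to check it before using the path. The wrapper below is a minimal sketch: `export_or_skip` is a hypothetical name, and the export function is passed in as an argument so the snippet runs without a database. In practice one would pass `db_to_pq` itself.

```python
from pathlib import Path
from typing import Callable, Optional


def export_or_skip(
    export: Callable[..., Optional[str]],
    table_name: str,
    schema: str,
    **kwargs,
) -> Optional[Path]:
    """Call an export function and handle the documented None return."""
    path = export(table_name, schema, **kwargs)
    if path is None:
        # Documented behavior: None means the query returned no rows.
        print(f"{schema}.{table_name}: no rows returned; nothing written")
        return None
    return Path(path)


# Stand-in for db_to_pq that simulates an empty query result.
def fake_export(table_name, schema, **kwargs):
    return None


result = export_or_skip(fake_export, "dsi", "crsp", where="1 = 0")
```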
Examples
Export a table using the default DuckDB-backed path:
>>> from db2pq import db_to_pq
>>> db_to_pq("dsi", "crsp")
Rename a column and apply an output type override:
>>> db_to_pq(
... "company",
... "public",
... rename={"conm": "company_name"},
... col_types={"company_name": "string"},
... )
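
Row filtering, row limits, column selection, and archiving can be combined in one call. The sketch below only uses parameters from the table above; the table name, schema, column patterns, and filter are hypothetical placeholders, and it assumes where takes the condition without the WHERE keyword. The call is wrapped in a function so that defining it does not require a database connection.

```python
# Keyword arguments drawn from the documented parameters; values are
# illustrative placeholders, not recommendations.
export_kwargs = dict(
    where="date >= '2020-01-01'",          # assumed: condition only, no "WHERE"
    obs=1000,                              # export at most 1000 rows (SQL LIMIT)
    keep=["^permno$", "^date$", "^ret$"],  # regex patterns for columns to keep
    archive=True,                          # archive any existing Parquet file first
)


def run_export():
    # Requires db2pq to be installed and a reachable PostgreSQL server.
    from db2pq import db_to_pq

    return db_to_pq("dsf", "crsp", **export_kwargs)
```

Calling `run_export()` returns the path to the written file, or None if the filter matches no rows.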