from db2pq import db_to_pq, pg_update_pqPostgreSQL to Parquet
Use this path when the source is a PostgreSQL database that is not WRDS, and the destination is a Parquet file or directory tree.
Main helpers
db_to_pq(table_name, schema, ...)exports one PostgreSQL table to Parquet.pg_update_pq(table_name, schema, ...)exports one PostgreSQL table to Parquet only when the source table is newer.db_schema_to_pq(schema, ...)exports all tables in a PostgreSQL schema.ibis_to_pq(table, out_file, ...)writes a PostgreSQL-backed Ibis expression to Parquet.
Typical example
PGHOST=localhost
PGDATABASE=analytics
PGUSER=postgres
db_to_pq(table_name="msf_v2", schema="crsp")db_to_pq(table_name="msf_v2", schema="crsp",
engine="adbc")pg_update_pq(table_name="msf_v2", schema="crsp")crsp.msf_v2 already up to date.
This example uses the default DuckDB-backed path. Reach for engine="adbc" only when you specifically want to try the ADBC export path.
By default, numeric handling follows the selected engine’s least-interfering behaviour: DuckDB preserves PostgreSQL NUMERIC columns as decimals, while the ADBC path defaults to text-backed numerics. Set numeric_mode="text", "float64", or "decimal" if you want to deviate from the default behaviour for each engine.
When to use each helper
- Use
db_to_pq()for one source table. - Use
pg_update_pq()when you want the same update-aware pattern aswrds_update_pq(), but for a local PostgreSQL source table. - Use
db_schema_to_pq()when you want a whole schema exported. - Use
ibis_to_pq()when you already have an Ibis expression and want that query result written directly to Parquet. - Prefer the default DuckDB engine unless you are intentionally testing or benchmarking the ADBC path.
Selecting, renaming, and recasting columns
For the PostgreSQL-backed table helpers, keep= and drop= filter source column names, while rename= maps source names to output names.
from db2pq import db_to_pq
db_to_pq(
"company",
"public",
keep=["gvkey", "conm", "sic"],
rename={"conm": "company_name"},
)If you combine rename= with col_types=, the keys in col_types= should refer to the output names after renaming.
from db2pq import db_to_pq
db_to_pq(
"example",
"public",
rename={"permno": "perm_id"},
col_types={"perm_id": "int32"},
)Ibis example
Here is a more realistic ibis_to_pq() pattern adapted from your WRDS/Ibis workflow note.
from ibis import _
from db2pq import ibis_to_pq
import os
import os
import ibis
db = ibis.postgres.connect(
host="wrds-pgdata.wharton.upenn.edu",
port=9737,
database="wrds",
user=os.environ["WRDS_ID"],
)
factors_daily = db.table("factors_daily", database="ff")
expr = (
factors_daily
.filter(_.date >= "2010-01-01")
.select("date", "mktrf", "smb", "hml", "rf")
.rename({"risk_free": "rf"})
.mutate(mkt = _.mktrf + _.risk_free)
)
ibis_to_pq(expr, "factors_ff3_daily.parquet", compression="zstd")'factors_ff3_daily.parquet'
This is a good fit when you already have a reusable Ibis query pipeline and want to write the result directly to Parquet without materializing the whole result in memory first.