PostgreSQL to Parquet

Use this path when the source is a PostgreSQL database that is not WRDS, and the destination is a Parquet file or directory tree.

Main helpers

db_to_pq(table_name, schema, ...) exports one PostgreSQL table to Parquet.
pg_update_pq(table_name, schema, ...) exports one PostgreSQL table to Parquet only when the source table is newer.
db_schema_to_pq(schema, ...) exports all tables in a PostgreSQL schema.
ibis_to_pq(table, out_file, ...) writes a PostgreSQL-backed Ibis expression to Parquet.

Typical example

PGHOST=localhost
PGDATABASE=analytics
PGUSER=postgres

from db2pq import db_to_pq, pg_update_pq

db_to_pq(table_name="msf_v2", schema="crsp")

db_to_pq(table_name="msf_v2", schema="crsp",
         engine="adbc")

pg_update_pq(table_name="msf_v2", schema="crsp")

crsp.msf_v2 already up to date.

This example uses the default DuckDB-backed path. Reach for engine="adbc" only when you specifically want to try the ADBC export path.

By default, numeric handling follows the selected engine’s least-interfering behaviour: DuckDB preserves PostgreSQL NUMERIC columns as decimals, while the ADBC path defaults to text-backed numerics. Set numeric_mode="text", "float64", or "decimal" if you want to deviate from the default behaviour for each engine.

When to use each helper

Use db_to_pq() for one source table.
Use pg_update_pq() when you want the same update-aware pattern as wrds_update_pq(), but for a local PostgreSQL source table.
Use db_schema_to_pq() when you want a whole schema exported.
Use ibis_to_pq() when you already have an Ibis expression and want that query result written directly to Parquet.
Prefer the default DuckDB engine unless you are intentionally testing or benchmarking the ADBC path.

Selecting, renaming, and recasting columns

For the PostgreSQL-backed table helpers, keep= and drop= filter source column names, while rename= maps source names to output names.

from db2pq import db_to_pq

db_to_pq(
    "company",
    "public",
    keep=["gvkey", "conm", "sic"],
    rename={"conm": "company_name"},
)

If you combine rename= with col_types=, the keys in col_types= should refer to the output names after renaming.

from db2pq import db_to_pq

db_to_pq(
    "example",
    "public",
    rename={"permno": "perm_id"},
    col_types={"perm_id": "int32"},
)

Ibis example

Here is a more realistic ibis_to_pq() pattern adapted from your WRDS/Ibis workflow note.

from ibis import _
from db2pq import ibis_to_pq
import os

import os
import ibis

db = ibis.postgres.connect(
    host="wrds-pgdata.wharton.upenn.edu",
    port=9737,
    database="wrds",
    user=os.environ["WRDS_ID"],
)

factors_daily = db.table("factors_daily", database="ff")

expr = (
    factors_daily
    .filter(_.date >= "2010-01-01")
    .select("date", "mktrf", "smb", "hml", "rf")
    .rename({"risk_free": "rf"})
    .mutate(mkt = _.mktrf + _.risk_free)
)

ibis_to_pq(expr, "factors_ff3_daily.parquet", compression="zstd")

'factors_ff3_daily.parquet'

This is a good fit when you already have a reusable Ibis query pipeline and want to write the result directly to Parquet without materializing the whole result in memory first.

Main helpers

Typical example

When to use each helper

Selecting, renaming, and recasting columns

Ibis example

Related reference pages