PostgreSQL to Parquet

Use this path when the source is a PostgreSQL database that is not WRDS, and the destination is a Parquet file or directory tree.

Main helpers

  • db_to_pq(table_name, schema, ...) exports one PostgreSQL table to Parquet.
  • pg_update_pq(table_name, schema, ...) exports one PostgreSQL table to Parquet only when the source table is newer.
  • db_schema_to_pq(schema, ...) exports all tables in a PostgreSQL schema.
  • ibis_to_pq(table, out_file, ...) writes a PostgreSQL-backed Ibis expression to Parquet.

Typical example

PGHOST=localhost
PGDATABASE=analytics
PGUSER=postgres
from db2pq import db_to_pq, pg_update_pq
db_to_pq(table_name="msf_v2", schema="crsp")
db_to_pq(table_name="msf_v2", schema="crsp",
         engine="adbc")
pg_update_pq(table_name="msf_v2", schema="crsp")
crsp.msf_v2 already up to date.

This example uses the default DuckDB-backed path. Reach for engine="adbc" only when you specifically want to try the ADBC export path.

By default, numeric handling follows the selected engine’s least-interfering behaviour: DuckDB preserves PostgreSQL NUMERIC columns as decimals, while the ADBC path defaults to text-backed numerics. Set numeric_mode="text", "float64", or "decimal" if you want to deviate from the default behaviour for each engine.

When to use each helper

  • Use db_to_pq() for one source table.
  • Use pg_update_pq() when you want the same update-aware pattern as wrds_update_pq(), but for a local PostgreSQL source table.
  • Use db_schema_to_pq() when you want a whole schema exported.
  • Use ibis_to_pq() when you already have an Ibis expression and want that query result written directly to Parquet.
  • Prefer the default DuckDB engine unless you are intentionally testing or benchmarking the ADBC path.

Selecting, renaming, and recasting columns

For the PostgreSQL-backed table helpers, keep= and drop= filter source column names, while rename= maps source names to output names.

from db2pq import db_to_pq

db_to_pq(
    "company",
    "public",
    keep=["gvkey", "conm", "sic"],
    rename={"conm": "company_name"},
)

If you combine rename= with col_types=, the keys in col_types= should refer to the output names after renaming.

from db2pq import db_to_pq

db_to_pq(
    "example",
    "public",
    rename={"permno": "perm_id"},
    col_types={"perm_id": "int32"},
)

Ibis example

Here is a more realistic ibis_to_pq() pattern adapted from your WRDS/Ibis workflow note.

from ibis import _
from db2pq import ibis_to_pq
import os

import os
import ibis

db = ibis.postgres.connect(
    host="wrds-pgdata.wharton.upenn.edu",
    port=9737,
    database="wrds",
    user=os.environ["WRDS_ID"],
)

factors_daily = db.table("factors_daily", database="ff")

expr = (
    factors_daily
    .filter(_.date >= "2010-01-01")
    .select("date", "mktrf", "smb", "hml", "rf")
    .rename({"risk_free": "rf"})
    .mutate(mkt = _.mktrf + _.risk_free)
)

ibis_to_pq(expr, "factors_ff3_daily.parquet", compression="zstd")
'factors_ff3_daily.parquet'

This is a good fit when you already have a reusable Ibis query pipeline and want to write the result directly to Parquet without materializing the whole result in memory first.