db2pq
db2pq is a Python package built primarily to support Parquet-centered workflows using data sourced from PostgreSQL databases. The original focus was on data from the PostgreSQL database offered by Wharton Research Data Services (WRDS).
db2pq is designed for practical research workflows: move data out of PostgreSQL data sources (including WRDS), keep local Parquet copies current, and work with those files using tools like Polars, pandas, or R.
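To make "keep local Parquet copies current" concrete, here is a minimal sketch of one way such a check could work. It is not db2pq's implementation or API: the function names `parquet_path` and `needs_refresh`, the `<data_dir>/<schema>/<table>.parquet` layout, and the use of file modification times are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def parquet_path(data_dir: Path, schema: str, table: str) -> Path:
    """Build the local file path for a table.

    Assumed (hypothetical) layout: <data_dir>/<schema>/<table>.parquet
    """
    return data_dir / schema / f"{table}.parquet"


def needs_refresh(local_file: Path, source_updated: datetime) -> bool:
    """Return True if the local copy is missing or older than the source.

    `source_updated` is a timezone-aware timestamp for the source table's
    last update; how it is obtained from the database is left out here.
    """
    if not local_file.exists():
        return True
    # Compare the file's modification time (in UTC) with the source timestamp.
    local_mtime = datetime.fromtimestamp(
        local_file.stat().st_mtime, tz=timezone.utc
    )
    return local_mtime < source_updated
```

With these helpers, a refresh loop would export a table again only when `needs_refresh` returns `True`, so unchanged tables are not downloaded a second time.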
Working with WRDS data
While db2pq supports any PostgreSQL database as a source for creating Parquet files, much of its functionality is adapted to working with data from the WRDS PostgreSQL database. The functions here support two primary workflows: moving WRDS data sets into a local Parquet repository, or moving them into a local PostgreSQL database. Once data are in Parquet, you can work with them directly using tools such as Polars or DuckDB in both Python and R.
Working with PostgreSQL data
db2pq also supports workflows where the source is a more general PostgreSQL database rather than WRDS. These workflows let you export PostgreSQL data to a Parquet repository or copy PostgreSQL tables into another PostgreSQL database.