Export a table from the WRDS PostgreSQL database to a Parquet file.
Parameters
table_name : required
    Name of the table in the WRDS PostgreSQL database.

schema : required
    Name of the database schema.

wrds_id : str, default None
    WRDS user ID used to access WRDS services. Required: if it is not passed explicitly, it must be provided via the WRDS_ID environment variable.

data_dir : str, default None
    Root directory of the Parquet data repository. If not provided, the value of the DATA_DIR environment variable is used; if DATA_DIR is not set, the current working directory is used.

force : bool, default False
    If True, proceed with the update regardless of the result of the date comparison.

col_types : dict, default None
    Mapping from column names to PostgreSQL data types, used when importing data to PostgreSQL or writing Parquet files. For Parquet files, DuckDB handles the conversion from PostgreSQL to PyArrow types. Types need only be supplied for a subset of columns. Supplied types must be compatible with the data PostgreSQL emits (i.e., this argument cannot “fix” arbitrary type issues). For example, col_types = {'permno': 'int32', 'permco': 'int32'}.

row_group_size : int, default 1048576
    Maximum number of rows in each written row group. The default, 1048576, equals 1024 * 1024.

obs : int, default None
    Number of observations to import from the database table, implemented using SQL LIMIT. Setting a modest value (e.g., obs=1000) can be useful when testing wrds_update_pq() with large tables.

alt_table_name : str, default None
    Basename of the Parquet file, for use when the file name should differ from table_name.

keep : str or iterable, default None
    Regex pattern(s) indicating columns to keep.

drop : str or iterable, default None
    Regex pattern(s) indicating columns to drop. If both drop and keep are provided, drop is applied first.

rename : dict, default None
    Mapping from source WRDS PostgreSQL column names to output column names. Entries in col_types should refer to the output names after renaming.

batched : bool, default True
    Whether to extract data in batches using to_pyarrow_batches() instead of a single call to to_pyarrow(). Batching degrades performance slightly but dramatically reduces memory requirements for large tables.

threads : int, default 3
    Number of threads DuckDB is allowed to use. Setting this may be necessary because of limits the PostgreSQL database server imposes on the user.

engine : {"duckdb", "adbc"}, default "duckdb"
    Query execution engine used to read PostgreSQL data before writing Parquet.

numeric_mode : {"text", "float64", "decimal"}, default "text"
    Handling of PostgreSQL NUMERIC columns. Passing None keeps the engine-specific default: native decimals on DuckDB, text-backed numerics on ADBC. Explicit col_types entries take precedence.

adbc_batch_size_hint_bytes : int, default None
    ADBC engine only: desired Arrow batch size in bytes, passed as a hint to the PostgreSQL ADBC driver.

adbc_use_copy : bool, default None
    ADBC engine only: explicitly enable or disable the PostgreSQL driver’s COPY optimization.

use_sas : bool
    Whether the update should take table comments from the SAS data file. If False, the updated string comes from the WRDS PostgreSQL table comment.
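The fallback rules for wrds_id and data_dir can be sketched in a few lines of Python. The helper names resolve_wrds_id and resolve_data_dir are hypothetical, and raising ValueError when no user ID is available is an assumption about how a missing ID might be reported; the library's own code may differ.

```python
import os


def resolve_wrds_id(wrds_id=None):
    # Hypothetical helper: an explicit argument wins, then the WRDS_ID
    # environment variable. Raising on a missing/empty value is an assumption.
    wrds_id = wrds_id or os.environ.get("WRDS_ID")
    if not wrds_id:
        raise ValueError("wrds_id must be passed explicitly or set via WRDS_ID")
    return wrds_id


def resolve_data_dir(data_dir=None):
    # Hypothetical helper: explicit argument, then DATA_DIR, then the
    # current working directory.
    return data_dir or os.environ.get("DATA_DIR") or os.getcwd()
```

For example, resolve_wrds_id("jdoe") returns "jdoe" whether or not WRDS_ID is set, because an explicit argument takes priority.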
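The effect of force can be illustrated with a hypothetical should_update() helper. For illustration only, it assumes that the date comparison checks whether the WRDS table was modified more recently than the existing Parquet file, and that a missing file always triggers an export.

```python
from datetime import datetime
from typing import Optional


def should_update(wrds_modified: datetime,
                  local_modified: Optional[datetime],
                  force: bool = False) -> bool:
    # force=True bypasses the date comparison entirely.
    if force:
        return True
    # Assumption: with no existing Parquet file there is nothing to compare.
    if local_modified is None:
        return True
    # Assumption: update only when the WRDS table is newer than the local file.
    return wrds_modified > local_modified
```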
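Because obs is implemented with SQL LIMIT, the database sees a query of roughly the following shape. The build_query() and quote_ident() helpers are hypothetical and only show how obs maps onto a LIMIT clause; the SQL the library actually generates may differ.

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    # Double-quote a PostgreSQL identifier, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def build_query(schema: str, table_name: str,
                columns: Optional[List[str]] = None,
                obs: Optional[int] = None) -> str:
    # Hypothetical helper: select all columns unless a list is given.
    cols = "*" if not columns else ", ".join(quote_ident(c) for c in columns)
    sql = f"SELECT {cols} FROM {quote_ident(schema)}.{quote_ident(table_name)}"
    if obs is not None:
        # obs becomes a LIMIT clause; int() ensures only an integer is interpolated.
        sql += f" LIMIT {int(obs)}"
    return sql
```

For example, build_query("crsp", "dsf", obs=1000) produces SELECT * FROM "crsp"."dsf" LIMIT 1000.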
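A minimal sketch of the keep and drop semantics follows. It assumes each pattern is matched anywhere in the column name using re.search, since the documentation does not say whether matching is anchored. select_columns() is a hypothetical helper that applies drop before keep, as documented.

```python
import re
from typing import Iterable, List, Optional, Union

Patterns = Optional[Union[str, Iterable[str]]]


def _as_patterns(patterns: Patterns) -> List[str]:
    # Accept None, a single regex string, or an iterable of regex strings.
    if patterns is None:
        return []
    if isinstance(patterns, str):
        return [patterns]
    return list(patterns)


def select_columns(columns: List[str], keep: Patterns = None,
                   drop: Patterns = None) -> List[str]:
    drop_pats = _as_patterns(drop)
    keep_pats = _as_patterns(keep)
    # Step 1: remove every column that matches any drop pattern.
    result = [c for c in columns if not any(re.search(p, c) for p in drop_pats)]
    # Step 2: if keep is given, retain only columns matching a keep pattern.
    if keep_pats:
        result = [c for c in result if any(re.search(p, c) for p in keep_pats)]
    return result
```

With columns ["permno", "permco", "date", "ret", "retx"], keep="^ret" and drop="x$" leave only ["ret"].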
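Because col_types is keyed by output names, renaming conceptually happens before types are assigned. The hypothetical output_schema() helper below shows that ordering. Its choice to reject col_types keys that match no output column is an illustrative assumption, not documented behaviour.

```python
from typing import Dict, List, Optional


def output_schema(columns: List[str],
                  rename: Optional[Dict[str, str]] = None,
                  col_types: Optional[Dict[str, str]] = None) -> Dict[str, Optional[str]]:
    rename = rename or {}
    col_types = col_types or {}
    # Renaming comes first: unmapped columns keep their source name.
    out_names = [rename.get(c, c) for c in columns]
    # col_types is keyed by output names; unknown keys are rejected here
    # (an illustrative choice in this sketch, not documented behaviour).
    unknown = sorted(set(col_types) - set(out_names))
    if unknown:
        raise KeyError(f"col_types refers to unknown output columns: {unknown}")
    # None means "no override": keep whatever type PostgreSQL emits.
    return {name: col_types.get(name) for name in out_names}
```

With rename={"prc": "price"}, a type override for that column must be given as col_types={"price": ...}, not {"prc": ...}.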
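The precedence rules for NUMERIC columns can be summarized as a small decision function. numeric_target() is hypothetical. Its return values are plain labels, not the types that either engine actually produces.

```python
from typing import Dict, Optional


def numeric_target(engine: str, numeric_mode: Optional[str], column: str,
                   col_types: Optional[Dict[str, str]] = None) -> str:
    # Explicit col_types entries take precedence over numeric_mode.
    if col_types and column in col_types:
        return col_types[column]
    if engine not in ("duckdb", "adbc"):
        raise ValueError(f"unknown engine: {engine!r}")
    if numeric_mode is None:
        # Engine-specific default: native decimal on DuckDB, text on ADBC.
        return "decimal" if engine == "duckdb" else "text"
    if numeric_mode not in ("text", "float64", "decimal"):
        raise ValueError(f"unknown numeric_mode: {numeric_mode!r}")
    return numeric_mode
```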
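On the ADBC path, the two adbc_* parameters correspond to statement options of the PostgreSQL ADBC driver. The sketch below assumes the option keys adbc.postgresql.batch_size_hint_bytes and adbc.postgresql.use_copy, with string-encoded values. The helper adbc_statement_options() is hypothetical, so verify the keys against the adbc_driver_postgresql documentation before relying on them.

```python
from typing import Dict, Optional

# Assumed option keys of the PostgreSQL ADBC driver.
BATCH_SIZE_HINT_BYTES = "adbc.postgresql.batch_size_hint_bytes"
USE_COPY = "adbc.postgresql.use_copy"


def adbc_statement_options(batch_size_hint_bytes: Optional[int] = None,
                           use_copy: Optional[bool] = None) -> Dict[str, str]:
    # Only options the caller actually set are included; None means
    # "leave the driver's own behaviour unchanged".
    opts: Dict[str, str] = {}
    if batch_size_hint_bytes is not None:
        opts[BATCH_SIZE_HINT_BYTES] = str(int(batch_size_hint_bytes))
    if use_copy is not None:
        opts[USE_COPY] = "true" if use_copy else "false"
    return opts
```

For example, adbc_statement_options(use_copy=False) yields a single entry that turns the COPY optimization off while leaving the batch-size hint unset.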