Exports a table from PostgreSQL to a Parquet file, with optional row filtering, column filtering, column renaming, Arrow type overrides, and timestamp normalization.
Usage
db_to_pq(
table_name,
schema,
host = Sys.getenv("PGHOST", "localhost"),
database = Sys.getenv("PGDATABASE", unset = Sys.getenv("PGUSER")),
user = Sys.getenv("PGUSER", Sys.info()[["user"]]),
password = Sys.getenv("PGPASSWORD", ""),
port = as.integer(Sys.getenv("PGPORT", 5432)),
data_dir = NULL,
out_file = NULL,
where = NULL,
obs = NULL,
keep = NULL,
drop = NULL,
rename = NULL,
alt_table_name = NULL,
chunk_size = 100000L,
transfer_method = c("dbi", "adbc"),
numeric_mode = c("decimal", "float64", "text", "raw"),
con = NULL,
metadata = NULL,
use_comment = TRUE,
col_types = NULL,
tz = NULL
)Arguments
- table_name
Name of the source PostgreSQL table.
- schema
Name of the source PostgreSQL schema.
- host, database, user, password, port
PostgreSQL connection parameters used when
conis not supplied.- data_dir
Root directory of the local Parquet data repository. Defaults to the
DATA_DIRenvironment variable, with interactive setup when needed.- out_file
Optional full output path. Overrides
data_dir,schema, andtable_name.- where
Optional SQL
WHEREclause without theWHEREkeyword.- obs
Optional integer row limit.
- keep, drop
Optional character vectors of regex patterns used to select source columns.
dropis applied beforekeep.- rename
Optional named character vector or list mapping source column names to output column names, e.g.
c(conm = "company_name").- alt_table_name
Optional output basename when
out_fileis omitted.- chunk_size
Number of rows fetched and written per chunk.
- transfer_method
Transfer backend:
"dbi"for the stable DBI path or"adbc"for the optional Arrow/ADBC path.- numeric_mode
Numeric handling mode for PostgreSQL
numericcolumns. The default"decimal"transfers them as text and writes boundednumeric(p, s)columns as Arrow decimal Parquet columns."float64"casts them toDOUBLE PRECISIONbefore transfer."text"transfers them as text without recreating decimal types."raw"keeps the backend default.- con
Optional existing DBI connection.
- metadata
Optional named list of Parquet schema metadata.
- use_comment
If
TRUE(the default), the PostgreSQL table comment is fetched and stored aslast_modifiedin the Parquet schema metadata. Has no effect when the table has no comment. Alast_modifiedkey already present inmetadatatakes precedence.- col_types
Optional named list of Arrow output type overrides. Names refer to output column names after
rename.- tz
Time zone used to interpret
TIMESTAMP WITHOUT TIME ZONEcolumns.