Renders a lazy dbplyr query to SQL, derives an ADBC PostgreSQL connection from the backing RPostgres connection, and streams Arrow batches directly to a Parquet file.

Usage

tbl_to_pq(
  tbl,
  out_file,
  chunk_size = 100000L,
  metadata = NULL,
  col_types = NULL
)

Arguments

tbl

A lazy dbplyr table whose underlying connection is an RPostgres connection.

out_file

Full path to the output Parquet file.

chunk_size

Number of rows written per Parquet row group. Default is 100000.

metadata

Optional named list of schema metadata to embed in the Parquet file.

col_types

Optional named list specifying Arrow type overrides. Values may be string type names (for example "int32" or "date") or Arrow DataType objects.
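
To illustrate the two accepted forms of col_types, here is a minimal sketch. The column names patient_id and visit_date are hypothetical, and pairing the string "date" with arrow::date32() is an assumption about how that name is resolved, not something this page confirms.

```r
# Hypothetical column names; only the named-list shape comes from the docs above.
col_types_chr <- list(patient_id = "int32", visit_date = "date")

# The same overrides as Arrow DataType objects, if the arrow package is installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  col_types_arrow <- list(
    patient_id = arrow::int32(),
    visit_date = arrow::date32()  # assumed counterpart of "date"
  )
}
```

Either list would be passed as the col_types argument. Columns not named in the list presumably keep the types inferred from the query result.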

Value

Invisibly returns out_file.

Details

This is an experimental development path kept separate from lazy_tbl_to_pq while we evaluate the ADBC transfer route. The ADBC path opens a second PostgreSQL connection for the transfer, so the database must allow an additional concurrent connection beyond the one already backing tbl.
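
Examples

The following is a sketch of how a call might look, not a tested recipe. The connection con, the "visits" table, its site_id column, and the helper export_visits() are all hypothetical. The argument values are chosen only to show each documented parameter in use.

```r
# Sketch only: `con`, the "visits" table, and its columns are hypothetical.
export_visits <- function(con,
                          out_file = file.path(tempdir(), "visits.parquet")) {
  # Build a lazy query; no rows are fetched into R at this point.
  visits <- dplyr::tbl(con, "visits")
  site_one <- dplyr::filter(visits, site_id == 1L)

  # Stream the query result to Parquet using the documented arguments.
  tbl_to_pq(
    site_one,
    out_file,
    chunk_size = 50000L,
    metadata = list(source = "visits"),
    col_types = list(site_id = "int32")
  )

  # Report how many row groups the resulting file contains.
  arrow::ParquetFileReader$create(out_file)$num_row_groups
}

# Usage, assuming a reachable PostgreSQL database (hypothetical name):
# con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "mydb")
# export_visits(con)
# DBI::dbDisconnect(con)
```

Because the transfer opens a second PostgreSQL connection, a call like this can fail if the server's connection limit is already reached, even though con itself is still valid.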