Retrieves the last_modified metadata embedded in Parquet files by
wrds_update_pq.
Usage
pq_last_modified(
table_name = NULL,
schema = NULL,
data_dir = NULL,
archive = FALSE,
archive_dir = "archive"
)
Arguments
- table_name
Optional. Name of a specific table.
- schema
Optional. Name of the schema (subdirectory under data_dir).
- data_dir
Root directory of the Parquet data repository. Defaults to the DATA_DIR environment variable, with interactive setup when needed.
- archive
If TRUE, look in the archive subdirectory instead of the main schema directory.
- archive_dir
Name of the archive subdirectory. Defaults to "archive".
Value
When table_name is provided and archive = FALSE, a
single string. Otherwise a tibble with columns
file_name, table, schema, last_mod (a
POSIXct in UTC), and last_mod_str.
Details
Behaviour depends on the arguments supplied:
- table_name provided, archive = FALSE (default): Returns the raw last_modified string embedded in <data_dir>/<schema>/<table_name>.parquet, or "" if the file has no such metadata.
- table_name provided, archive = TRUE: Returns a data frame of all archived files matching table_name in <data_dir>/<schema>/<archive_dir>/.
- Only schema provided (no table_name): Returns a data frame summarising all Parquet files in the schema directory (or its archive subdirectory if archive = TRUE).
- Neither table_name nor schema provided: Returns a data frame summarising all Parquet files across every schema subdirectory of data_dir.
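Because the return type varies between a single string and a data frame, calling code may want to branch on it. The following is a minimal sketch in base R, not part of the package: summarise_last_modified() is a hypothetical helper, and it assumes only the return shapes described above (a single string that may be "", or a data frame with columns schema, table, and last_mod).

```r
# Hypothetical helper (not part of the package): normalise whatever
# pq_last_modified() returned, based on the documented return types.
summarise_last_modified <- function(res) {
  if (is.character(res)) {
    # Single-table case: a raw string; "" means no metadata was embedded,
    # which is mapped to NA here.
    if (length(res) == 1 && nzchar(res)) res else NA_character_
  } else if (is.data.frame(res)) {
    # Summary case: sort newest first, then keep the first row seen for
    # each schema/table pair, i.e. the most recently modified file.
    res <- res[order(res$last_mod, decreasing = TRUE), ]
    res[!duplicated(res[c("schema", "table")]), ]
  } else {
    stop("Unexpected return type from pq_last_modified()")
  }
}
```

It could be applied to any of the calls described above, for example summarise_last_modified(pq_last_modified(schema = "crsp")). Mapping "" to NA is a design choice of this sketch, so that missing metadata is easy to detect with is.na().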
Examples
if (FALSE) { # \dontrun{
# Raw metadata string for a single table
pq_last_modified("dsi", "crsp")
# Summary of archived versions of a table
pq_last_modified("company", "comp", archive = TRUE)
# Summary of all tables in a schema
pq_last_modified(schema = "crsp")
# Summary of all tables across all schemas
pq_last_modified(data_dir = "data")
} # }