files.parquet.pq_last_modified

files.parquet.pq_last_modified(
    table_name=None,
    schema=None,
    data_dir=None,
    file_name=None,
    archive=False,
    archive_dir='archive',
)

Get last-updated metadata for parquet data files.

For workflow-oriented examples, see Parquet Utilities Examples and Data management ideas.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Basename of the parquet file to inspect when resolving the source file from schema and data_dir. | None |
| schema | str | Name of the parquet schema directory to inspect. | None |
| data_dir | str | Root directory of the parquet data repository. If omitted, defaults to DATA_DIR or the current working directory. | None |
| file_name | str or path-like | Exact parquet file path to inspect. If supplied, table_name and schema are ignored. | None |
| archive | bool | If True, inspect archived parquet files under archive_dir. | False |
| archive_dir | str | Name of the archive directory under the schema directory. | 'archive' |
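
The parameters above imply a directory layout. As a rough sketch only, assuming files are stored as <data_dir>/<schema>/<table_name>.parquet, that archived files sit in <data_dir>/<schema>/<archive_dir>/, and that DATA_DIR is read from an environment variable, the lookup target might be resolved as follows. The helper resolve_parquet_target is hypothetical and is not part of the library.

```python
import os
from pathlib import Path


def resolve_parquet_target(table_name=None, schema=None, data_dir=None,
                           file_name=None, archive=False,
                           archive_dir="archive"):
    """Hypothetical helper: guess which file or directory would be inspected."""
    # An explicit file_name wins; table_name and schema are ignored.
    if file_name is not None:
        return Path(file_name)
    if schema is None:
        raise ValueError("schema is required when file_name is not given")
    # Assumption: DATA_DIR is an environment variable; fall back to the cwd.
    root = Path(data_dir or os.environ.get("DATA_DIR") or Path.cwd())
    folder = root / schema
    if archive:
        # This page does not say how archived files are named,
        # so only the archive directory is returned here.
        return folder / archive_dir
    if table_name is not None:
        # Assumption: the file is the table's basename plus ".parquet".
        return folder / f"{table_name}.parquet"
    return folder


target = resolve_parquet_target(table_name="company", schema="comp",
                                data_dir="/data")
print(target.as_posix())  # /data/comp/company.parquet
```

The actual function may resolve paths differently, for example by expanding "~" or by matching archived files on a name pattern.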

Returns

| Name | Type | Description |
|---|---|---|
| | str or pandas.DataFrame | If file_name is supplied, or if table_name is supplied with archive=False, return the embedded last_modified string for that parquet file. Otherwise, return a DataFrame summary of matching parquet files, including file_name, table, schema, last_mod, last_mod_str, and storage columns. |
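
Because the return type depends on the combination of arguments, it can help to spell the rule out. The following is a minimal sketch of the rule as worded in the description above, not the library's own code; expected_return_kind is a hypothetical helper introduced only for illustration.

```python
def expected_return_kind(table_name=None, file_name=None, archive=False):
    """Hypothetical helper: name the return type the documented rule implies."""
    # A single, fully identified file yields the embedded string.
    if file_name is not None:
        return "str"
    if table_name is not None and not archive:
        return "str"
    # Anything else is a summary over matching parquet files.
    return "DataFrame"


print(expected_return_kind(table_name="company"))                # str
print(expected_return_kind(table_name="company", archive=True))  # DataFrame
print(expected_return_kind(archive=True))                        # DataFrame
```

Callers that accept arbitrary arguments may prefer an isinstance check on the result over relying on this rule.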

Examples

>>> pq_last_modified(table_name="company", schema="comp")
>>> pq_last_modified(table_name="company", schema="comp", archive=True)
>>> pq_last_modified(schema="comp", archive=True)