files.parquet.pq_last_modified
files.parquet.pq_last_modified(
table_name=None,
schema=None,
data_dir=None,
file_name=None,
archive=False,
archive_dir='archive',
)

Get last-updated metadata for parquet data files.
For workflow-oriented examples, see Parquet Utilities Examples and Data management ideas.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Basename of the parquet file to inspect when resolving the source file from schema and data_dir. | None |
| schema | str | Name of the parquet schema directory to inspect. | None |
| data_dir | str | Root directory of the parquet data repository. If omitted, defaults to DATA_DIR or the current working directory. | None |
| file_name | str or path-like | Exact parquet file path to inspect. If supplied, table_name and schema are ignored. | None |
| archive | bool | If True, inspect archived parquet files under archive_dir. | False |
| archive_dir | str | Name of the archive directory under the schema directory. Defaults to "archive". | 'archive' |
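
The precedence among these parameters can be sketched as follows. This is only an illustration, not the library's implementation: the helper name `resolve_parquet_target`, the `.parquet` file extension, and the `data_dir/schema/table_name` layout are assumptions made for the example, and DATA_DIR is assumed here to be an environment variable.

```python
import os
from pathlib import Path


def resolve_parquet_target(table_name=None, schema=None, data_dir=None,
                           file_name=None, archive=False,
                           archive_dir="archive"):
    """Hypothetical helper that mirrors the documented argument precedence."""
    # An explicit file path wins; table_name and schema are ignored.
    if file_name is not None:
        return Path(file_name)
    if schema is None:
        raise ValueError("schema is required when file_name is not given")
    # Assumed fallback order: data_dir argument, DATA_DIR variable, then cwd.
    root = Path(data_dir or os.environ.get("DATA_DIR") or Path.cwd())
    schema_dir = root / schema
    if archive:
        # Archived files live under the archive directory of the schema.
        return schema_dir / archive_dir
    if table_name is None:
        # No table given: the whole schema directory is the target.
        return schema_dir
    # Assumed naming convention: <table_name>.parquet
    return schema_dir / f"{table_name}.parquet"
```

For instance, under these assumptions `resolve_parquet_target(table_name="company", schema="comp", data_dir="/data")` points at `/data/comp/company.parquet`, while adding `archive=True` points at the `/data/comp/archive` directory instead.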
Returns
| Name | Type | Description |
|---|---|---|
| | str or pandas.DataFrame | If file_name is supplied, or if table_name is supplied with archive=False, return the embedded last_modified string for that parquet file. Otherwise, return a DataFrame summary of matching parquet files, including file_name, table, schema, last_mod, last_mod_str, and storage columns. |
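
The rule for which return type to expect can be written as a small predicate. This is a sketch of the rule as described in the table above; `returns_single_string` is a hypothetical name and is not part of the package.

```python
def returns_single_string(table_name=None, file_name=None, archive=False):
    """Hypothetical predicate: True when a single last_modified string is
    expected, False when a DataFrame summary is expected."""
    # An exact file path always yields the string for that file.
    if file_name is not None:
        return True
    # A named table yields a string only when archives are not inspected.
    return table_name is not None and not archive
```

Applied to the examples on this page, the first call (table_name and schema only) would yield a string, and the two calls with `archive=True` would yield a DataFrame.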
Examples
>>> pq_last_modified(table_name="company", schema="comp")
>>> pq_last_modified(table_name="company", schema="comp", archive=True)
>>> pq_last_modified(schema="comp", archive=True)