Examples demonstrating Parquet utility functions

This page groups together examples for the Parquet utility helpers: pq_list_files(), pq_last_modified(), pq_archive(), pq_restore(), and pq_remove().

This page has a different goal from the API reference. The API pages document arguments and return values; this page shows how the utilities fit together in the practical workflow of maintaining a local Parquet repository.

When to use this page

Use these helpers when you already have a Parquet repository and want to:

  • inspect what is in it
  • check which vintage of a table you currently have
  • archive the current active file before replacing it
  • restore an older archived vintage
  • remove active or archived files explicitly

Each of these functions also has its own API reference page.

Setup

The executable examples below assume access to a local Parquet repository with at least one known schema and table. Adjust these values to match your local setup before rendering if needed.
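The paths printed later on this page (for example, /Users/igow/Dropbox/pq_data/crsp/dsi.parquet and the comp/archive/ directory) suggest a layout with one directory per schema under the repository root and an archive/ subdirectory for replaced vintages. The sketch below assumes that layout. The helpers active_path() and archive_dir() are hypothetical and not part of db2pq. They simply build those paths with pathlib so you can check that the files the examples rely on exist. Substitute your own repository root for ~/Dropbox/pq_data.

```python
from pathlib import Path


def active_path(data_dir: str, schema: str, table: str) -> Path:
    """Hypothetical helper: <data_dir>/<schema>/<table>.parquet (the active file)."""
    return Path(data_dir).expanduser() / schema / f"{table}.parquet"


def archive_dir(data_dir: str, schema: str) -> Path:
    """Hypothetical helper: <data_dir>/<schema>/archive (assumed home of replaced vintages)."""
    return Path(data_dir).expanduser() / schema / "archive"


# Report whether the paths used by the examples on this page exist locally.
for p in (active_path("~/Dropbox/pq_data", "crsp", "dsf"),
          archive_dir("~/Dropbox/pq_data", "comp")):
    print(p, "exists" if p.exists() else "missing")
```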

Inspect a schema directory

Start by listing the Parquet tables available in a schema.

from db2pq import pq_list_files

pq_list_files("comp")
['g_chars',
 'names',
 'g_idx_mth',
 'io_qbuysell',
 'r_giccd',
 'adsprate',
 'g_names_ix_cst',
 'spind_mth',
 'funda_adbc_decimal_33554432',
 'g_company',
 'co_adesind',
 'funda_adbc_decimal_16777216',
 'co_ifndq',
 'secm',
 'r_auditors',
 'security',
 'aco_pnfnda',
 'g_funda',
 'co_afnd2',
 'company.parquet.bak2',
 'fundq_fncd',
 'g_idx_index',
 'secd',
 'sec_divid',
 'funda_adbc_decimal_1048576',
 'names_ix',
 'funda_adbc_float64_1048576',
 'seg_annfund',
 'fundq',
 'wrds_seg_customer',
 'co_filedate',
 'g_security',
 'wrds_segmerged',
 'funda_adbc_float64_16777216',
 'funda_adbc_float64_33554432',
 'g_secnamesd',
 'g_names_ix',
 'g_idxcst_his',
 'g_exrt_dly',
 'funda_adbc_decimal_4194304',
 'funda_adbc_float64_4194304',
 'r_datacode',
 'r_fndfntcd',
 'idx_daily',
 'idxcst_his',
 'g_secm',
 'co_hgic',
 'r_ex_codes',
 'g_secd',
 'g_names',
 'funda_fncd',
 'g_namesq',
 'company',
 'seg_customer',
 'funda_adbc_float64_8388608',
 'funda_adbc_decimal_8388608',
 'sec_history',
 'g_sec_divid',
 'idx_ann',
 'idx_index']
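The listing above is not in alphabetical order, so pq_list_files() does not appear to sort its output. Wrap the result in sorted() if you want to scan it alphabetically. To make the idea of "listing a schema directory" concrete, here is a hypothetical stdlib stand-in (list_parquet_tables() is not a db2pq function) that returns the sorted stems of *.parquet files. It is not db2pq's implementation. For example, the real output above includes 'company.parquet.bak2', which a *.parquet glob like this one would skip.

```python
import tempfile
from pathlib import Path


def list_parquet_tables(schema_dir: Path) -> list:
    """Hypothetical stand-in: sorted table names for *.parquet files in schema_dir."""
    return sorted(p.stem for p in Path(schema_dir).glob("*.parquet") if p.is_file())


# Demonstrate on a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["names.parquet", "company.parquet", "company.parquet.bak2"]:
        (Path(tmp) / name).touch()
    (Path(tmp) / "archive").mkdir()  # subdirectories are ignored
    print(list_parquet_tables(Path(tmp)))
```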

If you want to inspect the archive directory instead:

pq_list_files("comp", archive=True)
['fundq_20260330T060000Z',
 'funda_fncd_20260330T060000Z',
 'g_secd_20260322T060000Z',
 'idx_daily_20260330T060000Z',
 'r_auditors_20260330T060000Z',
 'company_20260331T060000Z',
 'company_20260315T060000Z',
 'funda_20260330T060000Z',
 'g_secd_20260217T070000Z',
 'company_20260303T070000Z',
 'funda_20250315T064012Z',
 'company_20260402T060000Z',
 'company_20260105T070000Z',
 'g_secd_20260216T070000Z',
 'funda_20240614T064046Z',
 'company_20260226T070000Z',
 'g_secd_20250907T161453Z',
 'funda_20260116T070000Z',
 'company_20260323T060000Z',
 'funda_20260407T060000Z',
 'company_20260224T070000Z',
 'funda_20260107T070000Z',
 'g_secd_20260120T070000Z',
 'company_20260225T000000Z',
 'company_20260225T070000Z',
 'company_20260218T070000Z',
 'funda_20251109T064119Z',
 'aco_pnfnda_20260330T060000Z',
 'company_20260209T070000Z',
 'company_20260322T060000Z',
 'company_20260107T070000Z',
 'funda_20250907T064233Z']
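Each archived name above has the form <table>_<YYYYMMDD>T<HHMMSS>Z, and the trailing Z suggests an ISO 8601 timestamp in UTC. Because table names can contain underscores (funda_fncd, r_auditors), the timestamp has to be matched from the end of the name. The sketch below assumes that pattern. parse_archive_name() is a hypothetical helper, not part of db2pq.

```python
import re
from datetime import datetime, timezone

# Assumed pattern: <table>_<YYYYMMDD>T<HHMMSS>Z, optionally followed by .parquet.
ARCHIVE_RE = re.compile(r"^(?P<table>.+)_(?P<stamp>\d{8}T\d{6}Z)(?:\.parquet)?$")


def parse_archive_name(name: str) -> tuple:
    """Hypothetical helper: split an archived basename into (table, UTC datetime)."""
    m = ARCHIVE_RE.match(name)
    if m is None:
        raise ValueError(f"not an archive name: {name!r}")
    stamp = datetime.strptime(m["stamp"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return m["table"], stamp


print(parse_archive_name("funda_fncd_20260330T060000Z"))
```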

If your project uses a repository outside the default DATA_DIR, pass data_dir explicitly.

pq_list_files("ff", data_dir="~/Dropbox/pq_data")
['factors_daily', 'industry48', 'factors_monthly']
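The call above passes a path beginning with ~, so the home directory is evidently expanded. A minimal sketch of how such a default-or-override rule could work follows. It assumes the default repository location is read from a DATA_DIR environment variable; check the db2pq documentation for the actual rule. resolve_data_dir() is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_data_dir(data_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: use data_dir if given, else the (assumed) DATA_DIR env var."""
    if data_dir is None:
        data_dir = os.environ.get("DATA_DIR")
        if data_dir is None:
            raise RuntimeError("set DATA_DIR or pass data_dir explicitly")
    # expanduser() turns "~/Dropbox/pq_data" into a path under the home directory.
    return Path(data_dir).expanduser()


print(resolve_data_dir("~/Dropbox/pq_data") / "ff")
```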

Check the current active vintage

Use pq_last_modified() to inspect the embedded metadata on the active file.

from db2pq import pq_last_modified

pq_last_modified(table_name="dsf", schema="crsp")
'Daily Stock - Securities (Updated 2025-02-08)'

This is often the fastest way to confirm what vintage a local Parquet file represents before starting analysis.

You can also inspect a specific file path directly:

from pathlib import Path

dsf_v2 = Path.home() / "Dropbox/pq_data/crsp/dsf_v2.parquet"

pq_last_modified(file_name=dsf_v2)
'Daily Stock File created by WRDS (Updated 2026-02-06)'
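Both descriptions above end in an "(Updated YYYY-MM-DD)" tag. To compare vintages programmatically, a small hypothetical helper can pull that date out of the string. It relies only on the string format seen in these outputs. Some older archived files shown below use a "Last modified: ..." string instead, and for those the helper returns None.

```python
import re
from datetime import date
from typing import Optional

UPDATED_RE = re.compile(r"\(Updated (\d{4}-\d{2}-\d{2})\)")


def updated_date(description: str) -> Optional[date]:
    """Hypothetical helper: the '(Updated YYYY-MM-DD)' date, or None if absent."""
    m = UPDATED_RE.search(description)
    return date.fromisoformat(m.group(1)) if m else None


active = updated_date("Daily Stock - Securities (Updated 2025-02-08)")
v2 = updated_date("Daily Stock File created by WRDS (Updated 2026-02-06)")
print(active, v2, (v2 - active).days)  # days between the two vintages
```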

Inspect archived vintages

If you archive replaced files, you can ask for the archived versions of a table:

pq_last_modified(table_name="funda", schema="comp", archive=True)
   file_name               table  schema  last_mod                   last_mod_str                                       storage
0  funda_20240614T064046Z  funda  comp    2024-06-14 02:40:46-04:00  Last modified: 06/14/2024 02:40:46                 local
1  funda_20250315T064012Z  funda  comp    2025-03-15 02:40:12-04:00  Last modified: 03/15/2025 02:40:12                 local
2  funda_20250907T064233Z  funda  comp    2025-09-07 02:42:33-04:00  Last modified: 09/07/2025 02:42:33                 local
3  funda_20251109T064119Z  funda  comp    2025-11-09 01:41:19-05:00  Last modified: 11/09/2025 01:41:19                 local
4  funda_20260107T070000Z  funda  comp    2026-01-07 02:00:00-05:00  Merged Fundamental Annual File (Updated 2026-0...  local
5  funda_20260116T070000Z  funda  comp    2026-01-16 02:00:00-05:00  Merged Fundamental Annual File (Updated 2026-0...  local
6  funda_20260330T060000Z  funda  comp    2026-03-30 02:00:00-04:00  Merged Fundamental Annual File (Updated 2026-0...  local
7  funda_20260407T060000Z  funda  comp    2026-04-07 02:00:00-04:00  Merged Fundamental Annual File (Updated 2026-0...  local

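This returns a table-like summary with one row per archived vintage of the requested table: the archived file name, the table and schema, the modification time (last_mod), the embedded description or timestamp string (last_mod_str), and a storage column (local here).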

To inspect all archived files for a schema:

pq_last_modified(schema="comp", archive=True)
    file_name                    table       schema  last_mod                   last_mod_str                                       storage
 0  aco_pnfnda_20260330T060000Z  aco_pnfnda  comp    2026-03-30 02:00:00-04:00  Pension Annual Item (Updated 2026-03-30)           local
 1  company_20260105T070000Z     company     comp    2026-01-05 02:00:00-05:00  Company (Updated 2026-01-05)                       local
 2  company_20260107T070000Z     company     comp    2026-01-07 02:00:00-05:00  Company (Updated 2026-01-07)                       local
 3  company_20260209T070000Z     company     comp    2026-02-09 02:00:00-05:00  Company (Updated 2026-02-09)                       local
 4  company_20260218T070000Z     company     comp    2026-02-18 02:00:00-05:00  Company (Updated 2026-02-18)                       local
 5  company_20260224T070000Z     company     comp    2026-02-24 02:00:00-05:00  Company (Updated 2026-02-24)                       local
 6  company_20260225T000000Z     company     comp    2026-02-24 02:00:00-05:00  Company (Updated 2026-02-24)                       local
 7  company_20260225T070000Z     company     comp    2026-02-25 02:00:00-05:00  Company (Updated 2026-02-25)                       local
 8  company_20260226T070000Z     company     comp    2026-02-26 02:00:00-05:00  Company (Updated 2026-02-26)                       local
 9  company_20260303T070000Z     company     comp    2026-03-03 02:00:00-05:00  Company (Updated 2026-03-03)                       local
10  company_20260315T060000Z     company     comp    2026-03-15 02:00:00-04:00  Company (Updated 2026-03-15)                       local
11  company_20260322T060000Z     company     comp    2026-03-22 02:00:00-04:00  Company (Updated 2026-03-22)                       local
12  company_20260323T060000Z     company     comp    2026-03-23 02:00:00-04:00  Company (Updated 2026-03-23)                       local
13  company_20260331T060000Z     company     comp    2026-03-31 02:00:00-04:00  Company (Updated 2026-03-31)                       local
14  company_20260402T060000Z     company     comp    2026-04-02 02:00:00-04:00  Company (Updated 2026-04-02)                       local
15  funda_20240614T064046Z       funda       comp    2024-06-14 02:40:46-04:00  Last modified: 06/14/2024 02:40:46                 local
16  funda_20250315T064012Z       funda       comp    2025-03-15 02:40:12-04:00  Last modified: 03/15/2025 02:40:12                 local
17  funda_20250907T064233Z       funda       comp    2025-09-07 02:42:33-04:00  Last modified: 09/07/2025 02:42:33                 local
18  funda_20251109T064119Z       funda       comp    2025-11-09 01:41:19-05:00  Last modified: 11/09/2025 01:41:19                 local
19  funda_20260107T070000Z       funda       comp    2026-01-07 02:00:00-05:00  Merged Fundamental Annual File (Updated 2026-0...  local
20  funda_20260116T070000Z       funda       comp    2026-01-16 02:00:00-05:00  Merged Fundamental Annual File (Updated 2026-0...  local
21  funda_20260330T060000Z       funda       comp    2026-03-30 02:00:00-04:00  Merged Fundamental Annual File (Updated 2026-0...  local
22  funda_20260407T060000Z       funda       comp    2026-04-07 02:00:00-04:00  Merged Fundamental Annual File (Updated 2026-0...  local
23  funda_fncd_20260330T060000Z  funda_fncd  comp    2026-03-30 02:00:00-04:00  Fundamental Annual Footnote and Data Code File...  local
24  fundq_20260330T060000Z       fundq       comp    2026-03-30 02:00:00-04:00  Merged Fundamental Quarterly File (Updated 202...  local
25  g_secd_20250907T161453Z      g_secd      comp    2025-09-07 12:14:53-04:00  Last modified: 09/07/2025 12:14:53                 local
26  g_secd_20260120T070000Z      g_secd      comp    2026-01-20 02:00:00-05:00  Merged Global Security Daily File (Updated 202...  local
27  g_secd_20260216T070000Z      g_secd      comp    2026-02-16 02:00:00-05:00  Merged Global Security Daily File (Updated 202...  local
28  g_secd_20260217T070000Z      g_secd      comp    2026-02-17 02:00:00-05:00  Merged Global Security Daily File (Updated 202...  local
29  g_secd_20260322T060000Z      g_secd      comp    2026-03-22 02:00:00-04:00  Merged Global Security Daily File (Updated 202...  local
30  idx_daily_20260330T060000Z   idx_daily   comp    2026-03-30 02:00:00-04:00  Index Daily (Updated 2026-03-30)                   local
31  r_auditors_20260330T060000Z  r_auditors  comp    2026-03-30 02:00:00-04:00  Auditors Reference Data (Updated 2026-03-30)       local
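A schema's archive can accumulate many vintages. In the output above, company alone has 14 archived files. To see where the archive is growing, you could count vintages per table from the names that pq_list_files(..., archive=True) returns. The following stdlib sketch assumes the <table>_<timestamp> naming pattern seen in the listing. vintages_per_table() is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed naming pattern for archived vintages: <table>_<YYYYMMDD>T<HHMMSS>Z
ARCHIVE_RE = re.compile(r"^(?P<table>.+)_\d{8}T\d{6}Z(?:\.parquet)?$")


def vintages_per_table(names):
    """Hypothetical helper: count archived vintages per table."""
    counts = Counter()
    for name in names:
        m = ARCHIVE_RE.match(name)
        if m:  # skip names that don't follow the <table>_<timestamp> pattern
            counts[m["table"]] += 1
    return counts


archived = ["company_20260105T070000Z", "company_20260107T070000Z",
            "funda_20260330T060000Z", "funda_fncd_20260330T060000Z"]
print(vintages_per_table(archived).most_common())
```

With db2pq installed, you would pass the real listing instead, for example vintages_per_table(pq_list_files("comp", archive=True)).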

Archive the currently active file

You can archive a file manually even outside an update workflow.

from db2pq import pq_archive

pq_archive(table_name="funda", schema="comp")

Or archive an exact file path:

company = Path.home() / "Dropbox/pq_data/comp/company.parquet"

company_archive = pq_archive(file_name=company)
company_archive
'/Users/igow/Dropbox/pq_data/comp/archive/company_20260407T060000Z.parquet'

This is useful when you want to preserve the current active vintage before running an experimental refresh or downstream transformation.
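The archived name above follows the same <table>_<YYYYMMDD>T<HHMMSS>Z pattern as the rest of the archive. Judging from the earlier pq_last_modified() output (for example, funda_20260330T060000Z has last_mod 2026-03-30 02:00:00-04:00), the suffix usually appears to be the vintage's modification time converted to UTC. That is an inference from the output, not documented behavior. The hypothetical helper below only illustrates that conversion and formatting.

```python
from datetime import datetime, timedelta, timezone


def archive_basename(table: str, modified: datetime) -> str:
    """Hypothetical helper: '<table>_<YYYYMMDD>T<HHMMSS>Z.parquet' from an aware datetime."""
    if modified.tzinfo is None:
        raise ValueError("modified must be timezone-aware")
    stamp = modified.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{table}_{stamp}.parquet"


# 02:00 at UTC-04:00 is 06:00 UTC.
edt = timezone(timedelta(hours=-4))
print(archive_basename("company", datetime(2026, 4, 7, 2, 0, tzinfo=edt)))
```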

Restore an archived vintage

To promote an archived file back into the active schema directory:

from db2pq import pq_restore

archive_files = pq_list_files("comp", archive=True)

if archive_files:
    pq_restore(archive_files[0], "comp")

The archived basename may include or omit the .parquet suffix.

If an active destination file already exists, pq_restore() archives that file first by default before restoring the archived vintage.
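Because pq_list_files() does not appear to sort its output, archive_files[0] above is whichever name happens to come first. In the archive listing shown earlier, that was fundq_20260330T060000Z, which is not necessarily the vintage you want. To restore the newest archived vintage of a particular table, you could select it by its timestamp suffix instead. The sketch below assumes the <table>_<YYYYMMDD>T<HHMMSS>Z naming pattern. latest_vintage() is a hypothetical helper.

```python
import re

# Assumed naming pattern for archived vintages: <table>_<YYYYMMDD>T<HHMMSS>Z
ARCHIVE_RE = re.compile(r"^(?P<table>.+)_(?P<stamp>\d{8}T\d{6}Z)(?:\.parquet)?$")


def latest_vintage(names, table):
    """Hypothetical helper: newest archived basename for `table`, or None."""
    best = None
    for name in names:
        m = ARCHIVE_RE.match(name)
        if m and m["table"] == table:
            # Fixed-width stamps sort chronologically as plain strings.
            if best is None or m["stamp"] > best[0]:
                best = (m["stamp"], name)
    return best[1] if best else None


archived = ["fundq_20260330T060000Z", "funda_20240614T064046Z",
            "funda_fncd_20260330T060000Z", "funda_20260407T060000Z",
            "funda_20260330T060000Z"]
print(latest_vintage(archived, "funda"))
```

With db2pq, you could then pass the result to pq_restore(name, "comp"), as in the example above, after checking that name is not None.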

Remove a file explicitly

Use pq_remove() when you want to delete an active or archived file rather than archive it.

from db2pq import pq_remove

pq_remove(table_name="dsi", schema="crsp")
'/Users/igow/Dropbox/pq_data/crsp/dsi.parquet'

Of course, I probably want that file, so let me use wrds_update_pq() to download it again from WRDS!

from db2pq import wrds_update_pq

wrds_update_pq(table_name="dsi", schema="crsp")
Updated crsp.dsi is available.
Beginning file download at 2026-04-07 21:26:29 UTC.
Completed file download at 2026-04-07 21:26:32 UTC.
'/Users/igow/Dropbox/pq_data/crsp/dsi.parquet'

To remove an archived file:

pq_remove(
    table_name="funda_20260331T060000Z",
    schema="comp",
    archive=True,
)

Or remove a file by exact path:

company_archive
'/Users/igow/Dropbox/pq_data/comp/archive/company_20260407T060000Z.parquet'
pq_remove(file_name=company_archive)
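If you archive on every update, you may eventually want to prune old vintages. One way to decide what to pass to pq_remove(..., archive=True) is to keep the newest few vintages of each table and review the rest. The helper below is hypothetical and only computes a list; it deletes nothing. It again assumes the <table>_<YYYYMMDD>T<HHMMSS>Z naming pattern from the archive listing.

```python
import re
from collections import defaultdict

# Assumed naming pattern for archived vintages: <table>_<YYYYMMDD>T<HHMMSS>Z
ARCHIVE_RE = re.compile(r"^(?P<table>.+)_(?P<stamp>\d{8}T\d{6}Z)(?:\.parquet)?$")


def removal_candidates(names, keep=3):
    """Hypothetical helper: archived basenames beyond the newest `keep` per table."""
    by_table = defaultdict(list)
    for name in names:
        m = ARCHIVE_RE.match(name)
        if m:
            by_table[m["table"]].append((m["stamp"], name))
    candidates = []
    for entries in by_table.values():
        entries.sort(reverse=True)  # newest first; fixed-width stamps sort chronologically
        candidates.extend(name for _, name in entries[keep:])
    return sorted(candidates)


archived = ["company_20260105T070000Z", "company_20260107T070000Z",
            "company_20260209T070000Z", "funda_20260330T060000Z"]
print(removal_candidates(archived, keep=2))
```

After reviewing the list, you could pass each name to pq_remove(table_name=name, schema="comp", archive=True), mirroring the archived-file call above.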