Converting lazy data frames into Parquet files (Python version)

Python
Parquet
db2pq
WRDS
Tidy Finance
Author

Ian D. Gow

Published

23 March 2026

Keywords

Python, Polars, Arrow, db2pq, WRDS

In a note I published yesterday (Converting lazy data frames into Parquet files), I showed how my db2pq R package can be used to turn “lazy data frames” produced by the R package dbplyr into Parquet files on disk without needing to load the data into memory. Today, I wondered if I could do the same with my db2pq Python package. In this note, I turn “lazy data frames” produced by Ibis into Parquet files on disk.

Note

The source for this note is available in a folder on GitHub that contains two files:

To run this note locally with uv, put the two files above in a directory on your computer; I will refer to this directory as the project directory in this note.

For the convenience of Tidy Finance readers, I follow the approach laid out here closely. The Tidy Finance instructions start with installation of “uv, a modern Python package and project manager.” You should install uv using the instructions provided by Tidy Finance; these are taken from the homepage for uv.

From the project directory, run the following command:

uv sync

Next, you should follow the instructions provided by Tidy Finance for setting things up to connect to WRDS:

Create a .env file in your project directory. For the purpose of this book, we create and save the following variables (where user and password are our private login credentials):

WRDS_USER=user
WRDS_PASSWORD=password

Once you have done the above, you should be able to run the code in this note. For example, you could run uv run jupyter lab and copy-paste the commands shown here (in order). Alternatively, if you are comfortable with Quarto, you could open ibis_to_pq.qmd and treat it like a notebook.

1 Motivating example

According to Tidy Finance with Python:

The daily CRSP data file is substantially larger than monthly data and can exceed 20 GB. This has two important implications: you cannot hold all the daily return data in your memory (hence it is not possible to save the entire dataset to your local folder), and in our experience, the download usually crashes (or never stops) because it is too much data for the WRDS cloud to prepare and send to your R session.

There is a solution to this challenge. As with many big data problems, you can split up the big task into several smaller tasks that are easier to handle. That is, instead of downloading data about all stocks at once, download the data in small batches of stocks consecutively.

Below I show how one can simplify the code dramatically (no batches!) and download the data faster (for me, it takes a few minutes), with no significant burden on either the CPU or RAM.

An issue with translating the lazy_tbl_to_pq() function from the R version of my db2pq package is that there is no equivalent of dbplyr for Python. While Python Polars has its LazyFrame, the Tidy Finance task involves extracting data from a PostgreSQL database and Polars has no way of creating lazy data frames from database connections. The closest analogue is surely Ibis. So I added a function ibis_to_pq() to my Python package db2pq and in this note I take it for a spin.

I start by creating a connection to the WRDS PostgreSQL database. Here I follow the Tidy Finance approach closely. But so long as wrds is a psycopg-backed SQLAlchemy engine for the WRDS PostgreSQL database, you should be able to use whatever approach you are used to.

from sqlalchemy import create_engine
import os
from dotenv import load_dotenv

load_dotenv()

connection_string = (
    "postgresql+psycopg://"
    f"{os.getenv('WRDS_USER')}:{os.getenv('WRDS_PASSWORD')}"
    "@wrds-pgdata.wharton.upenn.edu:9737/wrds"
)

wrds = create_engine(connection_string, pool_pre_ping=True)

Having established wrds, I load in the packages I will be using.

import polars as pl
import ibis
from ibis import _
from db2pq import ibis_to_pq

Then I turn wrds into an Ibis instance with WRDS PostgreSQL as its backend. You can learn more about Ibis here.

db = ibis.postgres.from_connection(
    wrds.connect().connection.driver_connection
)

I then set up Ibis lazy tables for the three tables I will use here.1

dsf = db.table("dsf_v2", database="crsp")
stksecurityinfohist = db.table("stksecurityinfohist", database="crsp")
factors_daily = db.table("factors_daily", database="ff")

Ibis offers an “interactive” option that makes it easier to work with lazy tables.

ibis.options.interactive = True

Let’s look at factors_daily:

factors_daily
date         mktrf       smb         hml         rf       umd
1926-07-01   0.000900   -0.002500   -0.002700   0.00010  NULL
1926-07-02   0.004500   -0.003300   -0.000600   0.00010  NULL
1926-07-06   0.001700    0.003000   -0.003900   0.00010  NULL
1926-07-07   0.000900   -0.005800    0.000200   0.00010  NULL
1926-07-08   0.002200   -0.003800    0.001900   0.00010  NULL
1926-07-09  -0.007100    0.004300    0.005700   0.00010  NULL
1926-07-10   0.006100   -0.005300   -0.001000   0.00010  NULL
1926-07-12   0.000400   -0.000300    0.006400   0.00010  NULL
1926-07-13   0.004800   -0.002800   -0.002000   0.00010  NULL
1926-07-14   0.000400    0.000700   -0.004300   0.00010  NULL
…

(Column types: date is date; mktrf, smb, hml, and umd are decimal(8, 6); rf is decimal(7, 5).)

And then dsf:

dsf
[Output omitted: dsf_v2 is a wide table with 50 columns — identifiers such as permno, permco, cusip, and ticker; security descriptors such as sharetype, securitytype, securitysubtype, usincflg, issuertype, primaryexch, conditionaltype, and tradingstatusflg; and daily fields such as dlycaldt, dlyprc, dlycap, dlyret, dlyretx, dlyvol, dlyclose, dlylow, dlyhigh, dlybid, dlyask, dlyopen, and shrout. The first rows shown are daily observations for permno 10000 (ticker OMFGA) beginning 1986-01-07.]
Following Tidy Finance, I restrict the sample to the period from 1960 through 2024:

start_date = "1960-01-01"
end_date = "2024-12-31"

For clarity, in translating the original Tidy Finance query, I break out the security sub-query from the main query.

security = (
    stksecurityinfohist
    .filter(
        _.sharetype == "NS",
        _.securitytype == "EQTY",
        _.securitysubtype == "COM",
        _.usincflg == "Y",
        _.issuertype.isin(["ACOR", "CORP"]),
        _.primaryexch.isin(["N", "A", "Q"]),
        _.conditionaltype.isin(["RW", "NW"]),
        _.tradingstatusflg == "A",
    ).select(
        "permno",
        "secinfostartdt",
        "secinfoenddt",
    )
)

I next construct the main crsp_daily query.

crsp_daily = (
    dsf
    .filter(_.dlycaldt.between(start_date, end_date))
    .inner_join(security,
                dsf.permno == security.permno)
    .filter(_.dlycaldt.between(_.secinfostartdt, _.secinfoenddt))
    .select(
        permno=_.permno,
        date=_.dlycaldt,
        ret=_.dlyret,
    )
    .drop_null(["permno", "date", "ret"])
    .left_join(
        factors_daily
        .select("date", "rf")
        .rename(risk_free="rf"),
        _.date == factors_daily.date,
    )
    .mutate(
        ret_excess=ibis.greatest(_.ret - _.risk_free, -1)
    )
    .select(
        "permno", "date", 
        ret=_.ret.cast("float64"),
        ret_excess=_.ret_excess.cast("float64"),
    )
)

Let’s take a peek at crsp_daily, which is still a lazy table:

crsp_daily
permno   date          ret         ret_excess
int32    date          float64     float64
10000    1986-01-08   -0.024390   -0.024690
10000    1986-01-09    0.000000   -0.000300
10000    1986-01-10    0.000000   -0.000300
10000    1986-01-13    0.050000    0.049700
10000    1986-01-14    0.047619    0.047319
10000    1986-01-15    0.045455    0.045155
10000    1986-01-16    0.043478    0.043178
10000    1986-01-17    0.000000   -0.000300
10000    1986-01-20    0.000000   -0.000300
10000    1986-01-21    0.000000   -0.000300
…

The next step is to create a Parquet file from crsp_daily using ibis_to_pq().

%%time
ibis_to_pq(crsp_daily, "crsp_daily.parquet")
CPU times: user 45.1 s, sys: 14.5 s, total: 59.6 s
Wall time: 4min 17s
'crsp_daily.parquet'

For some reason, ibis_to_pq() does not perform as well as its sibling lazy_tbl_to_pq() in the R version of db2pq. This performance is a bit disappointing given that I had ibis_to_pq() use the ADBC driver rather than the default psycopg driver used by Ibis. I was surprised that Ibis did not support the Arrow driver.

I do the same for the Fama-French daily factors:

%%time
ibis_to_pq(factors_daily, "factors_ff3_daily.parquet")
CPU times: user 10.1 s, sys: 580 ms, total: 10.7 s
Wall time: 25.9 s
'factors_ff3_daily.parquet'

1.1 Using the data

I figured that it would be a little dull to do no more than simply create the Parquet files. But looking through Tidy Finance with Python, the only use of crsp_daily seemed to be in the section Estimating Beta Using Daily Returns.

There Tidy Finance say:

We then create a connection to the daily CRSP data, but we don’t load the whole table into our memory. We only extract all distinct permno because we loop the beta estimation over batches of stocks with size 500. To estimate the CAPM over a consistent lookback window while accommodating different return frequencies, we adjust the minimum required number of observations accordingly. Specifically, we require at least 1,000 daily returns over a five‑year period for a valid estimation. This threshold is consistent with the monthly requirement of 48 observations out of 60 months, given that there are roughly 252 trading days in a year.

The Tidy Finance code uses pandas, but I will use Python Polars, which does much better with larger data sets. I start by creating LazyFrame instances of the two tables that are used in calculating betas.

crsp_daily = pl.scan_parquet("crsp_daily.parquet")
factors_ff3_daily = pl.scan_parquet("factors_ff3_daily.parquet")

The next step is to implement the approach described above. Python Polars can handle the details of creating the windows and so on, but we don’t really want to have to hand over all the data to Statsmodels to do the regressions.2 But if you’re reading Tidy Finance with Python, it seems reasonable to assume that you know how to calculate a univariate regression coefficient, so let’s just use Polars expressions and do it by hand.

Unfortunately, it turns out that we will be calculating about 2.5 million betas over windows of up to about 1,250 days each, and that is a lot of data even for Polars!3

So I ended up using batches of 500, just as Tidy Finance do. I write a function that calculates beta for 500 permno values at a time. The key element here is group_by_dynamic(): for each permno, this creates a sequence of monthly time windows over date, then aggregates within each window. More specifically:

  • group_by="permno" means each stock is handled separately.
  • every="1mo" means start a new window every 1 month.
  • period="60mo" means each window spans 60 months.
  • offset="-60mo" shifts each window backward by 60 months relative to its label/start schedule, so each monthly output row looks back over the previous 60 months rather than forward.
  • closed="left" means the left endpoint is included and the right endpoint is excluded.
  • label="right" means the output date attached to each aggregated window is the right edge of the window.

So in plain English: For each stock, every month, compute summary statistics using up to 60 months of prior data, and label the result with the date at the right edge of that lookback window.

Tidy Finance do not provide any indication of how long their code takes to run, but let’s see how long this approach takes:

%%time
min_obs = 1000
batch_size = 500

def get_betas_batch(permno_batch: pl.DataFrame) -> pl.LazyFrame:
    return (
        crsp_daily
        .join(permno_batch.lazy(), on="permno", how="semi")
        .select("permno", "date", "ret_excess")
        .join(
            factors_ff3_daily.select(
                "date",
                pl.col("mktrf").cast(pl.Float64),
                pl.col("rf").cast(pl.Float64),
            ),
            on="date",
            how="inner",
        )
        .select(
            "permno",
            "date",
            "ret_excess",
            mkt_excess=pl.col("mktrf") - pl.col("rf"),
        )
        .sort("permno", "date")
        .group_by_dynamic(
            "date",
            every="1mo",
            period="60mo",
            offset="-60mo",
            group_by="permno",
            closed="left",
            label="right",
        )
        .agg(
            beta=pl.cov("mkt_excess", "ret_excess") / pl.col("mkt_excess").var(),
            mean_ret=pl.col("ret_excess").mean(),
            mean_mkt=pl.col("mkt_excess").mean(),
            n=pl.len(),
        )
        .filter(pl.col("n") >= min_obs)
        .with_columns(
            alpha=pl.col("mean_ret") - pl.col("beta") * pl.col("mean_mkt")
        )
        .drop("mean_ret", "mean_mkt")
    )

permnos = (
    crsp_daily
    .select("permno")
    .unique()
    .collect()
)

betas = pl.concat(
    (
        get_betas_batch(permnos.slice(i, batch_size)).collect()
        for i in range(0, permnos.height, batch_size)
    ),
    rechunk=True,
)
CPU times: user 27.8 s, sys: 11.6 s, total: 39.4 s
Wall time: 18.7 s

So under 20 seconds for the whole lot.

And let’s take a peek at the data:

betas
shape: (2_566_212, 5)
permno   date         beta       n      alpha
i32      date         f64        u32    f64
10102    1964-01-01   0.890627   1003   -0.000383
10102    1964-02-01   0.890677   1025   -0.000366
10102    1964-03-01   0.889289   1044   -0.000399
10102    1964-04-01   0.890453   1065   -0.000339
10102    1964-05-01   0.888674   1087   -0.000376
…
93305    2019-03-01   1.430157   1104   -0.000478
93305    2019-04-01   1.429      1083   -0.000386
93305    2019-05-01   1.385605   1062   -0.000257
93305    2019-06-01   1.380545   1041   -0.00016
93305    2019-07-01   1.373495   1020   -0.000235

Footnotes

  1. The R version of the note did not use ff.factors_daily from WRDS, but I do so here because I don’t have the same copy_inline() functionality I had when using R in yesterday’s note.↩︎

  2. The original pandas code uses smf.ols() from Statsmodels.↩︎

  3. There will be a lot of overlap in these windows. I had thought that Polars would take the data and process the windows at the start of each month as it passed through for each permno, but it seems it literally stacks copies of the data for each window, and I quickly ran out of RAM when I tried this.↩︎