Notes

This site publishes a curated set of notes. The notes below are grouped loosely by topic to make the collection easier to scan.

Data curation, databases, and software tools

Note | Topics
ACNC Registry data: Arrow version | Australia, Arrow
Data curation and the data science workflow | Data curation, Australia, ASX, SIRCA
Data curation: The case of Call Reports | Data curation, Polars, DuckDB
Data collection (with spreadsheets) | Data curation, Spreadsheets
Getting SEC EDGAR XBRL data | SEC, XBRL, EDGAR
Responsive open-source software: Two examples from dbplyr | Data curation, dbplyr, SQL, DuckDB
Responsive open-source software: Two examples from dbplyr | dbplyr, SQL
Shared code | Research, web data
Why aren’t more data people talking about ibis? | Python, Ibis, DuckDB, SQL
Writing better SQL without writing SQL | SQL, dbplyr
SIRCA ASX End of Day (EOD) collection | Australia, SIRCA, ASX, CSV, Parquet
SIRCA Mergers and Acquisitions collection | Australia, M&A, ASX, SIRCA, Parquet
Converting lazy data frames into Parquet files | R, Parquet, db2pq, WRDS, Tidy Finance
Converting lazy data frames into Parquet files (Python version) | Python, Parquet, db2pq
Benchmarking local PostgreSQL-to-Parquet export paths | PostgreSQL, Parquet, benchmarking
Improving performance of SQLite data | Tidy Finance, SQLite

Python, finance, and data workflows

Note | Topics
Ball and Brown (1968): Replication using Python Polars | Accounting research, Replication, Python
Data management ideas for researchers | Python, Parquet, db2pq, WRDS
Data management ideas for researchers (R version) | R, Parquet, db2pq, WRDS
Some benchmarks with comp.g_secd | SAS, WRDS, CRSP, Parquet, Python
The best of both worlds: Using modern data frame libraries to create pandas data | WRDS, Polars, Ibis, pandas
Using SAS to create pandas data | SAS, pandas, wrds2pg
Calculating betas using DuckDB | Tidy Finance, DuckDB, WRDS, Finance
Trading days per year (crsp.dsf) | CRSP, WRDS
Stock returns on Yahoo Finance | Yahoo, finance
Adding delisting returns to monthly data | SAS, Stock returns, CRSP

Research methods, replication, and commentary

Note | Topics
Should Bao et al. (2020) be retracted? | Research methods, Machine learning
Missing Form APs? | Research methods, PCAOB, XBRL, SEC
The elephant in the room: p-hacking and accounting research | Research methods, p-hacking
The Gino-Colada Affair | Reproducibility, Research methods
Reproducible data collection | Reproducibility, Research methods
Does @Beardsley_2021 show anything? | Research methods
Analysis of IPOs on the ASX | Australia, IPOs

R, graphics, and miscellaneous data notes

Note | Topics
A quick look at City of Melbourne bike data | Melbourne, transport
Data visualization challenge | Data visualization, ggplot2
Making plotnine more ‘Pythonic’ | Python, plotnine, graphics
Retail sales | Retail, ABS, R
Defining winter and summer in Boston | Weather, Boston
Defining winter and summer in Melbourne | Weather, Australia, Melbourne
Defining winter and summer in Sydney | Weather, Australia, Sydney
Defining winter and summer in Oxford | Weather, Oxford, Python
Sunrise and sunset times | Datetimes, Weather, Australia, Melbourne, Boston