11 Capital markets research in accounting
This chapter opens the part of the book focused on capital markets research, which is arguably the area in which academic research has most contributed to our understanding of real-world accounting phenomena. One goal of this part is to provide a solid introduction to classical ideas and papers related to capital markets research in accounting.
According to Kothari (2001, p. 113), “Ball and Brown (1968) and Beaver (1968) heralded empirical capital markets research as it is now known.” Prior to that period, accounting research was a largely theoretical discipline focused on normative research, that is, research concerned with the “right” or “best” way to account for various events and transactions. In addition to being normative, accounting theory was largely deductive, meaning that detailed theories were derived from general principles.
Beaver (1998) identifies one approach as asking, say, “what properties should the ‘ideal’ net income have?” One answer to this question is that accounting income for a period should reflect the change in the net present value of cash flows (plus cash distributions) to shareholders during the period. But other answers existed. Accounting researchers would start with a set of desired properties and use these to derive the “best” approach to accounting for depreciation of long-lived assets, inventory, or lease assets. Kothari (2001) points out that there was “little emphasis on the empirical validity” of theory.
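To make the "ideal" income notion above concrete, the following is a minimal sketch of the arithmetic under stated assumptions. The discount rate `r`, the cash-flow stream `cash_flows`, and the helper `pv_at()` are all hypothetical and chosen purely for illustration; they do not come from Beaver (1998).

```r
# Hypothetical inputs (assumptions for illustration only)
r <- 0.10                       # discount rate
cash_flows <- c(100, 100, 100)  # distributions to shareholders at end of years 1-3

# Present value, as of the end of year t, of the cash flows arriving after year t
pv_at <- function(t, cash_flows, r) {
  remaining <- cash_flows[seq_along(cash_flows) > t]
  if (length(remaining) == 0) return(0)
  sum(remaining / (1 + r)^seq_along(remaining))
}

pv_start <- pv_at(0, cash_flows, r)  # value at the start of year 1
pv_end <- pv_at(1, cash_flows, r)    # value at the end of year 1
distribution <- cash_flows[1]        # cash paid out during year 1

# Income for year 1: change in present value plus cash distributions
economic_income <- (pv_end - pv_start) + distribution
economic_income
```

In this stylized setting with no surprises, income for the year works out to the discount rate times the opening present value (about 24.87 here), which is one way to see what this definition of income implies.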
Similar ideas still permeate the thinking of standard-setters, who purport to derive detailed accounting standards from their “conceptual frameworks”, which outline broad definitions of things such as assets and liabilities that standard-setters can supposedly use to derive the correct accounting approach in any given setting.
However, in the period since Ball and Brown (1968), these approaches have been largely discarded in academic research. A largely normative, theoretical emphasis has been replaced by a positive, empirical one.
library(dplyr, warn.conflicts = FALSE)
library(DBI)
library(ggplot2)
library(lubridate) # For floor_date function
11.1 The CRSP database
According to its website, “the Center for Research in Security Prices, LLC (CRSP) maintains the most comprehensive collection of security price, return, and volume data for the NYSE, AMEX and NASDAQ stock markets. Additional CRSP files provide stock indices, beta-based and cap-based portfolios, treasury bond and risk-free rates, mutual funds, and real estate data.” We discussed the CRSP/COMPUSTAT Merged Database in Chapter 9.
CRSP provides PERMNO, its own “permanent identifier” for each security in its database. Additionally, it provides a company-level identifier, PERMCO, for each company. CRSP’s goal in creating these identifiers is to allow “for clean and accurate backtesting, time-series and event studies, measurement of performance, accurate benchmarking, and securities analysis.”
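Because PERMNO identifies a security and PERMCO identifies a company, one company can be associated with more than one PERMNO (for example, when it has several share classes). The sketch below uses a small in-memory table with made-up identifiers (the values and the `share_class` column are hypothetical, not actual CRSP data) to show how one might count securities per company.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical identifiers for illustration; these are not actual CRSP values
securities <- tibble(
  permno = c(90001L, 90002L, 90003L),
  permco = c(5001L, 5001L, 5002L),
  share_class = c("A", "B", "A")
)

# Count the number of distinct securities (PERMNOs) for each company (PERMCO)
by_company <-
  securities %>%
  group_by(permco) %>%
  summarize(n_permnos = n_distinct(permno), .groups = "drop")

by_company
```

In this toy data, the first company has two securities and the second has one, so analyses at the company level would need to decide how to combine or select among PERMNOs.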
“CRSP contains end-of-day and month-end prices on all listed NYSE, Amex, and NASDAQ common stocks along with basic market indices, and includes the most comprehensive distribution information available, with the most accurate total return calculations.”
End-of-day prices are found on crsp.dsf and month-end prices are on crsp.msf.
Let’s take a look at these two tables.
pg <- dbConnect(RPostgres::Postgres())
dsf <- tbl(pg, sql("SELECT * FROM crsp.dsf"))
msf <- tbl(pg, sql("SELECT * FROM crsp.msf"))
dsf %>% collect(n = 5)
## # A tibble: 5 × 20
## cusip permno permco issuno hexcd hsiccd date bidlo askhi prc vol
## <chr> <int> <int> <int64> <int> <int6> <date> <dbl> <dbl> <dbl> <int>
## 1 36720410 10001 7953 10398 2 4925 1987-07-27 5.75 6.25 -6 0
## 2 36720410 10001 7953 10398 2 4925 1987-07-28 5.75 5.75 5.75 1000
## 3 36720410 10001 7953 10398 2 4925 1987-07-29 5.88 5.88 5.88 2200
## 4 36720410 10001 7953 10398 2 4925 1987-07-30 6 6.25 6 2100
## 5 36720410 10001 7953 10398 2 4925 1987-07-31 6 6 6 1000
## # … with 9 more variables: ret <dbl>, bid <dbl>, ask <dbl>, shrout <dbl>,
## # cfacpr <dbl>, cfacshr <dbl>, openprc <dbl>, numtrd <int64>, retx <dbl>
msf %>% collect(n = 5)
## # A tibble: 5 × 21
## cusip permno permco issuno hexcd hsiccd date bidlo askhi prc vol
## <chr> <int> <int> <int64> <int> <int6> <date> <dbl> <dbl> <dbl> <int>
## 1 44299090 12335 22348 0 1 1310 1955-07-29 126. 137 132 335
## 2 44299090 12335 22348 0 1 1310 1955-08-31 126 131 128 119
## 3 44299090 12335 22348 0 1 1310 1955-09-30 120. 129 124 233
## 4 44299090 12335 22348 0 1 1310 1955-10-31 118 140. 136. 594
## 5 44299090 12335 22348 0 1 1310 1955-11-30 134 141. 140. 313
## # … with 10 more variables: ret <dbl>, bid <dbl>, ask <dbl>, shrout <dbl>,
## # cfacpr <dbl>, cfacshr <dbl>, altprc <dbl>, spread <dbl>, altprcdt <date>,
## # retx <dbl>
The CRSP Indices database contains a number of CRSP indices.
Here we focus on two index tables, crsp.dsi and crsp.msi, which can be viewed as complementing crsp.dsf and crsp.msf, respectively.
dsi <- tbl(pg, sql("SELECT * FROM crsp.dsi"))
msi <- tbl(pg, sql("SELECT * FROM crsp.msi"))
dsi %>% collect(n = 5)
## # A tibble: 5 × 11
## date vwretd vwretx ewretd ewretx sprtrn spindx totval totcnt
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int6>
## 1 1925-12-31 NA NA NA NA NA NA 2.75e7 503
## 2 1926-01-02 0.00569 0.00569 0.00952 0.00952 NA NA 2.76e7 497
## 3 1926-01-04 0.000706 0.000706 0.00578 0.00578 NA NA 2.76e7 502
## 4 1926-01-05 -0.00482 -0.00487 -0.00193 -0.00203 NA NA 2.75e7 501
## 5 1926-01-06 -0.000423 -0.000427 0.00118 0.00116 NA NA 2.76e7 505
## # … with 2 more variables: usdval <dbl>, usdcnt <int64>
msi %>% collect(n = 5)
## # A tibble: 5 × 11
## date vwretd vwretx ewretd ewretx sprtrn spindx totval totcnt
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int64>
## 1 1925-12-31 NA NA NA NA NA 12.5 27487487. 503
## 2 1926-01-30 0.000561 -0.00140 0.0232 0.0214 0.0225 12.7 27624241. 506
## 3 1926-02-27 -0.0330 -0.0366 -0.0535 -0.0555 -0.0440 12.2 26752064. 514
## 4 1926-03-31 -0.0640 -0.0700 -0.0968 -0.101 -0.0591 11.5 25083173. 519
## 5 1926-04-30 0.0370 0.0340 0.0330 0.0302 0.0227 11.7 25886744. 521
## # … with 2 more variables: usdval <dbl>, usdcnt <int64>
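One way in which the index tables complement the stock tables is that index returns can be joined to security returns by date. The sketch below illustrates this with small in-memory tables whose values are made up; the table names `stock_rets` and `index_rets` and the derived column `ret_mkt` are hypothetical. The same pattern would presumably apply to the remote tables, but that is an assumption rather than something shown here.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for a stock table such as crsp.dsf (values are made up)
stock_rets <- tibble(
  permno = c(10001L, 10001L, 10002L, 10002L),
  date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03")),
  ret = c(0.010, -0.020, 0.005, 0.015)
)

# Toy stand-in for an index table such as crsp.dsi (values are made up)
index_rets <- tibble(
  date = as.Date(c("2020-01-02", "2020-01-03")),
  vwretd = c(0.008, -0.007)
)

# Attach the value-weighted index return to each security-day,
# then compute a simple market-adjusted return
mkt_adj <-
  stock_rets %>%
  inner_join(index_rets, by = "date") %>%
  mutate(ret_mkt = ret - vwretd)

mkt_adj
```

Subtracting the index return is only one simple adjustment; other benchmarks could be substituted in the `mutate()` step.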
11.1.1 Exercises
If you look at the stock tables (crsp.dsf and crsp.msf), you will see that prc can be negative on either table. Do negative stock prices make sense economically speaking? What do negative stock prices on CRSP mean? (See the CRSP documentation.) What would be some alternative approaches to encode this information? (Write code to recast the data using one of these approaches.) Why do you think that CRSP chose the approach used?
How do ret and retx differ? Which variable are you more likely to use in research?
Looking at the date variable on crsp.msf, is it always the last day of the month? If not, why not?
Suggest the “natural” primary key for these tables. Check that this is a primary key for crsp.msf.
What is being depicted in each of the two plots below? What are the sources of variation in the first plot? Looking at the plots, what appears to be the main driver of variation in the first plot? Create an additional plot to visualize the source of variation in the first plot that is not depicted below. In the code below, we are using collect() followed by mutate(month = floor_date(date, "month")) to calculate month. What changes in terms of where the processing occurs if we replace these two lines with mutate(month = as.Date(date_trunc("month", date)))? Do we get different results?
plot_data <-
  dsf %>%
  select(date) %>%
  filter(date >= "2018-01-01") %>%
  collect() %>%
  mutate(month = floor_date(date, "month"))

freqs <-
  plot_data %>%
  count(month)

freqs %>%
  ggplot(aes(x = month, y = n)) +
  geom_bar(stat = "identity") +
  scale_x_date(breaks = "1 month") +
  theme(axis.text.x = element_text(angle = 90))

freqs_alt <-
  plot_data %>%
  distinct() %>%
  count(month)

freqs_alt %>%
  ggplot(aes(x = month, y = n)) +
  geom_bar(stat = "identity") +
  scale_x_date(breaks = "1 month") +
  theme(axis.text.x = element_text(angle = 90))

What is the primary key for crsp.dsi and crsp.msi? Verify that it is a valid key for both tables.
Using the dplyr verb anti_join, determine if there are any dates on crsp.dsf that do not appear on crsp.dsi or vice versa. Do the same for crsp.msi and crsp.msf.
11.2 Efficient capital markets
One of the core ideas in capital market research is the efficient markets hypothesis (EMH). Fama (1991) defines the EMH as “the simple statement that security prices fully reflect all available information.” The EMH is perhaps the most empirically tested proposition in all of the social sciences.
In Fama’s formulation, the terms fully reflect and all available information are doing a lot of work. One widely understood implication of the notion that security prices fully reflect a piece of information is that there are no opportunities to generate risk-adjusted profits by trading on that information.
The EMH is particularly important for accounting research and practice for at least two reasons. First, accounting information is often a component of “all available information” against which the EMH is tested. In particular, Beaver (1998, p. 136) points out that accounting earnings “are widely analyzed by the investment community. No other firm-specific variable receives more attention by the analysts and other capital market participants than earnings.” Second, whether the EMH holds or not has significant implications for preparers, users, and regulators of accounting information.
Richard Thaler identifies two notions of the EMH, which he labels the “price is right” and “no free lunch” principles. The “price is right” principle says asset prices will “fully reflect” available information and thus “provide accurate signals for resource allocation”. The “no free lunch” principle holds that market prices are impossible to predict, making it very hard for an investor to beat the market after taking account of risk. The “no free lunch” principle is the more plausible, and the more empirically testable, variant of the EMH.
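To give a rough sense of what an unpredictability test might look like, the sketch below simulates returns that are unpredictable by construction and checks whether yesterday's return helps forecast today's. This is only an illustration on simulated data: the sample size, mean, and standard deviation are arbitrary assumptions, no adjustment for risk is made, and nothing here is evidence about actual markets.

```r
set.seed(2024)

# Simulated daily returns that are independent over time by construction
# (the sample size, mean, and standard deviation are arbitrary assumptions)
n <- 10000
ret <- rnorm(n, mean = 0.0005, sd = 0.01)

# Lagged return: the candidate predictor available to an investor
ret_lag <- c(NA, ret[-n])

# First-order autocorrelation of returns
autocorr <- cor(ret, ret_lag, use = "complete.obs")

# Regression of today's return on yesterday's return
fit <- lm(ret ~ ret_lag)
slope <- coef(fit)[["ret_lag"]]

c(autocorr = autocorr, slope = slope)
```

With returns generated this way, both the autocorrelation and the slope should be close to zero, which is the pattern one would look for in real data when asking whether past returns predict future returns.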
Most empirical studies of the EMH test the “no free lunch” principle. However, Shiller (1984) laments the existence of “claims that because real returns are nearly unforecastable, the real price of stocks is close to the intrinsic value” and suggests that “this argument for the efficient markets hypothesis represents one of the most remarkable errors in the history of economic thought” (1984, p. 459). In other words, a common fallacy is to conflate the two variants, so that evidence for the “no free lunch” variant is adduced as supporting the “price is right” theory.
The “price is right” theory often underlies papers that use event studies to evaluate the merits of corporate policies or regulation. We will discuss some issues with conflation of these two principles in Chapter 15.