11 Capital markets research in accounting

This chapter opens the part of the book focused on capital markets research, which is arguably the area in which academic research has most contributed to our understanding of real-world accounting phenomena. One goal of this part is to provide a solid introduction to classical ideas and papers related to capital markets research in accounting.

According to Kothari (2001, p. 113), “Ball and Brown (1968) and Beaver (1968) heralded empirical capital markets research as it is now known.” Prior to that period, accounting research was a largely theoretical discipline focused on normative research, that is, research concerned with the “right” or “best” way to account for various events and transactions. In addition to being normative, accounting theory was largely deductive, meaning that detailed theories were derived from general principles.

Beaver (1998) identifies one approach as asking, say, “what properties should the ‘ideal’ net income have?” One answer to this question is that accounting income for a period should reflect the change in the net present value of cash flows (plus cash distributions) to shareholders during the period. But other answers existed. Accounting researchers would start with a set of desired properties and use these to derive the “best” approach to accounting for depreciation of long-lived assets, inventory, or lease assets. Kothari (2001) points out that there was “little emphasis on the empirical validity” of theory.

Similar ideas still permeate the thinking of standard-setters, who purport to derive detailed accounting standards from their “conceptual frameworks”, which outline broad definitions of things such as assets and liabilities that standard-setters can supposedly use to derive the correct accounting approach in any given setting.

However, in the period since Ball and Brown (1968), these approaches have been largely discarded in academic research. A largely normative, theoretical emphasis has been replaced by a positive, empirical one.

library(dplyr, warn.conflicts = FALSE)
library(DBI)
library(ggplot2)
library(lubridate) # For floor_date function

11.1 The CRSP database

According to its website, “the Center for Research in Security Prices, LLC (CRSP) maintains the most comprehensive collection of security price, return, and volume data for the NYSE, AMEX and NASDAQ stock markets. Additional CRSP files provide stock indices, beta-based and cap-based portfolios, treasury bond and risk-free rates, mutual funds, and real estate data.” We discussed the CRSP/COMPUSTAT Merged Database in Chapter 9.

CRSP provides PERMNO, its own “permanent identifier” for each security in its database. Additionally, it provides a company-level identifier, PERMCO, for each company. CRSP’s goal in creating these identifiers is to allow “for clean and accurate backtesting, time-series and event studies, measurement of performance, accurate benchmarking, and securities analysis.”
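Because PERMNO identifies a security and PERMCO identifies a company, one company can be associated with more than one PERMNO (for example, when it has several share classes). The following minimal sketch illustrates the relationship using a small, made-up table; the identifier values and the `share_class` column are hypothetical and are not taken from CRSP.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical identifiers: made-up values, not real CRSP data.
securities <- tibble(
  permno = c(90001L, 90002L, 90003L),
  permco = c(50001L, 50001L, 50002L),
  share_class = c("Class A", "Class B", "Common")
)

# Count the number of securities (PERMNOs) attached to each company (PERMCO).
securities_per_company <-
  securities %>%
  group_by(permco) %>%
  summarize(n_securities = n_distinct(permno), .groups = "drop") %>%
  arrange(permco)

securities_per_company
```

In this toy table, the first company has two securities and the second has one, so analyses at the company level would need to aggregate across PERMNOs within a PERMCO.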

“CRSP contains end-of-day and month-end prices on all listed NYSE, Amex, and NASDAQ common stocks along with basic market indices, and includes the most comprehensive distribution information available, with the most accurate total return calculations.” End-of-day prices are found on crsp.dsf and month-end prices are on crsp.msf. Let’s take a look at these two tables.

pg <- dbConnect(RPostgres::Postgres())

dsf <- tbl(pg, sql("SELECT * FROM crsp.dsf")) 
msf <- tbl(pg, sql("SELECT * FROM crsp.msf"))
dsf %>% collect(n = 5)
## # A tibble: 5 × 20
##   cusip    permno permco  issuno hexcd hsiccd date       bidlo askhi   prc   vol
##   <chr>     <int>  <int> <int64> <int> <int6> <date>     <dbl> <dbl> <dbl> <int>
## 1 36720410  10001   7953   10398     2   4925 1987-07-27  5.75  6.25 -6        0
## 2 36720410  10001   7953   10398     2   4925 1987-07-28  5.75  5.75  5.75  1000
## 3 36720410  10001   7953   10398     2   4925 1987-07-29  5.88  5.88  5.88  2200
## 4 36720410  10001   7953   10398     2   4925 1987-07-30  6     6.25  6     2100
## 5 36720410  10001   7953   10398     2   4925 1987-07-31  6     6     6     1000
## # … with 9 more variables: ret <dbl>, bid <dbl>, ask <dbl>, shrout <dbl>,
## #   cfacpr <dbl>, cfacshr <dbl>, openprc <dbl>, numtrd <int64>, retx <dbl>
msf %>% collect(n = 5)
## # A tibble: 5 × 21
##   cusip    permno permco  issuno hexcd hsiccd date       bidlo askhi   prc   vol
##   <chr>     <int>  <int> <int64> <int> <int6> <date>     <dbl> <dbl> <dbl> <int>
## 1 44299090  12335  22348       0     1   1310 1955-07-29  126.  137   132    335
## 2 44299090  12335  22348       0     1   1310 1955-08-31  126   131   128    119
## 3 44299090  12335  22348       0     1   1310 1955-09-30  120.  129   124    233
## 4 44299090  12335  22348       0     1   1310 1955-10-31  118   140.  136.   594
## 5 44299090  12335  22348       0     1   1310 1955-11-30  134   141.  140.   313
## # … with 10 more variables: ret <dbl>, bid <dbl>, ask <dbl>, shrout <dbl>,
## #   cfacpr <dbl>, cfacshr <dbl>, altprc <dbl>, spread <dbl>, altprcdt <date>,
## #   retx <dbl>
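Note that `tbl()` does not download a table. It creates a remote ("lazy") table, and dplyr verbs applied to it are translated to SQL and run by the database until `collect()` brings the results into R. The following minimal sketch shows this translation using the simulated PostgreSQL backend from the dbplyr package, so no database connection is needed. The object `dsf_sim` and its single row are hypothetical stand-ins for `crsp.dsf`, not CRSP data.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for crsp.dsf: a lazy table with no real database behind it.
dsf_sim <- lazy_frame(
  permno = 10001L,
  date = as.Date("1987-07-27"),
  prc = -6,
  con = simulate_postgres()
)

# Build a query with dplyr verbs; nothing is executed at this point.
query <-
  dsf_sim %>%
  filter(date >= "2018-01-01") %>%
  select(permno, date)

# Render the SQL that would be sent to the database.
sql_text <- as.character(sql_render(query))
cat(sql_text)
```

The rendered SQL contains a `WHERE` clause, which indicates that the filter would be applied by the database rather than by R.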

The CRSP Indices database contains a number of CRSP indices. Here we focus on two index tables, crsp.dsi and crsp.msi, which can be viewed as complementing crsp.dsf and crsp.msf, respectively.

dsi <- tbl(pg, sql("SELECT * FROM crsp.dsi")) 
msi <- tbl(pg, sql("SELECT * FROM crsp.msi"))
dsi %>% collect(n = 5)
## # A tibble: 5 × 11
##   date          vwretd    vwretx   ewretd   ewretx sprtrn spindx   totval totcnt
##   <date>         <dbl>     <dbl>    <dbl>    <dbl>  <dbl>  <dbl>    <dbl> <int6>
## 1 1925-12-31 NA        NA        NA       NA           NA     NA   2.75e7    503
## 2 1926-01-02  0.00569   0.00569   0.00952  0.00952     NA     NA   2.76e7    497
## 3 1926-01-04  0.000706  0.000706  0.00578  0.00578     NA     NA   2.76e7    502
## 4 1926-01-05 -0.00482  -0.00487  -0.00193 -0.00203     NA     NA   2.75e7    501
## 5 1926-01-06 -0.000423 -0.000427  0.00118  0.00116     NA     NA   2.76e7    505
## # … with 2 more variables: usdval <dbl>, usdcnt <int64>
msi %>% collect(n = 5)
## # A tibble: 5 × 11
##   date          vwretd   vwretx  ewretd  ewretx  sprtrn spindx    totval  totcnt
##   <date>         <dbl>    <dbl>   <dbl>   <dbl>   <dbl>  <dbl>     <dbl> <int64>
## 1 1925-12-31 NA        NA       NA      NA      NA        12.5 27487487.     503
## 2 1926-01-30  0.000561 -0.00140  0.0232  0.0214  0.0225   12.7 27624241.     506
## 3 1926-02-27 -0.0330   -0.0366  -0.0535 -0.0555 -0.0440   12.2 26752064.     514
## 4 1926-03-31 -0.0640   -0.0700  -0.0968 -0.101  -0.0591   11.5 25083173.     519
## 5 1926-04-30  0.0370    0.0340   0.0330  0.0302  0.0227   11.7 25886744.     521
## # … with 2 more variables: usdval <dbl>, usdcnt <int64>
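One way in which the index tables complement the stock tables is that both share a `date` column, so index returns can be merged with security returns. The sketch below illustrates this with made-up data that mimic the relevant columns (`permno`, `date`, `ret`, and `vwretd`); the values are hypothetical, and the "market-adjusted return" computed here (return minus the value-weighted index return) is just one simple illustration of how the merged data might be used.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical security returns: made-up values, not real CRSP data.
stock_rets <- tibble(
  permno = c(90001L, 90001L, 90002L, 90002L),
  date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03")),
  ret = c(0.010, -0.020, 0.005, 0.015)
)

# Hypothetical index returns, one row per date.
index_rets <- tibble(
  date = as.Date(c("2020-01-02", "2020-01-03")),
  vwretd = c(0.004, -0.006)
)

# Merge on date and subtract the index return from each security's return.
mkt_adj <-
  stock_rets %>%
  inner_join(index_rets, by = "date") %>%
  mutate(ret_mkt_adj = ret - vwretd)

mkt_adj
```

With actual CRSP data, the same join could in principle be applied to the remote tables before calling `collect()`, so that the merge is performed by the database.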

11.1.1 Exercises

  1. If you look at the stock tables (crsp.dsf and crsp.msf), you will see that prc can be negative on either table. Do negative stock prices make sense economically speaking? What do negative stock prices on CRSP mean? (Consult the CRSP documentation.) What would be some alternative approaches to encode this information? (Write code to recast the data using one of these approaches.) Why do you think that CRSP chose the approach used?

  2. How do ret and retx differ? Which variable are you more likely to use in research?

  3. Looking at the date variable on crsp.msf, is it always the last day of the month? If not, why not?

  4. Suggest the “natural” primary key for these tables. Check that this is a primary key for crsp.msf.

  5. What is being depicted in each of the two plots below? What are the sources of variation in the first plot? Looking at the plots, what appears to be the main driver of variation in the first plot? Create an additional plot to visualize the source of variation in the first plot that is not depicted below. In the code below, we are using collect() followed by mutate(month = floor_date(date, "month")) to calculate month. What changes in terms of where the processing occurs if we replace these two lines with mutate(month = as.Date(date_trunc("month", date)))? Do we get different results?

plot_data <-
  dsf %>%
  select(date) %>%
  filter(date >= "2018-01-01") %>%
  collect() %>%
  mutate(month = floor_date(date, "month")) 

freqs <-
  plot_data %>%
  count(month)

freqs %>%
  ggplot(aes(x = month, y = n)) +
  geom_bar(stat = "identity") +
  scale_x_date(breaks = "1 month") +
  theme(axis.text.x = element_text(angle = 90))

freqs_alt <-
  plot_data %>%
  distinct() %>%
  count(month)

freqs_alt %>%
  ggplot(aes(x = month, y = n)) +
  geom_bar(stat = "identity") +
  scale_x_date(breaks = "1 month") +
  theme(axis.text.x = element_text(angle = 90))

  6. What is the primary key for crsp.dsi and crsp.msi? Verify that it is a valid key for both tables.

  7. Using the dplyr verb anti_join, determine if there are any dates on crsp.dsf that do not appear on crsp.dsi or vice versa. Do the same for crsp.msi and crsp.msf.

11.2 Efficient capital markets

One of the core ideas in capital markets research is the efficient markets hypothesis (EMH). Fama (1991) defines the EMH as “the simple statement that security prices fully reflect all available information.” The EMH is perhaps the most empirically tested proposition in all of the social sciences.

In Fama’s formulation, the terms fully reflect and all available information are doing a lot of work. One widely understood implication of the notion that security prices fully reflect a piece of information is that there are no opportunities to generate risk-adjusted profits by trading on that information.

The EMH is particularly important for accounting research and practice for at least two reasons. First, accounting information is often a component of “all available information” against which the EMH is tested. In particular, Beaver (1998, p. 136) points out that accounting earnings “are widely analyzed by the investment community. No other firm-specific variable receives more attention by the analysts and other capital market participants than earnings.” Second, whether the EMH holds or not has significant implications for preparers, users, and regulators of accounting information.

Richard Thaler identifies two notions of the EMH, which he labels the “price is right” and “no free lunch” principles. The “price is right” principle says asset prices will “fully reflect” available information and thus “provide accurate signals for resource allocation”. The “no free lunch” principle holds that market prices are impossible to predict, making it very hard for an investor to beat the market after taking account of risk. The “no free lunch” principle is the more plausible, and the more empirically testable, variant of the EMH.
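To make the “no free lunch” idea a little more concrete, the following minimal sketch simulates returns that are unpredictable by construction and checks whether yesterday's return helps to forecast today's. Everything here is an assumption made for illustration: the returns are independent draws from a normal distribution with arbitrary parameters, the test is a simple regression on the lagged return, and there is no adjustment for risk. It is not evidence about actual markets.

```r
# Simulated daily returns: independent draws, so unpredictable by construction.
# The mean and standard deviation are arbitrary illustrative values.
set.seed(2024)
n <- 5000
ret <- rnorm(n, mean = 0.0005, sd = 0.01)

# Lagged return: the value from the previous period (NA for the first one).
ret_lag <- c(NA, ret[-n])

# Regress returns on lagged returns; lm() drops the row with the missing lag.
fit <- lm(ret ~ ret_lag)
slope <- unname(coef(fit)["ret_lag"])

slope
```

In this simulation the estimated slope should be close to zero, which is what a lack of predictability from past returns looks like in a regression of this kind.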

Most empirical studies of the EMH test the “no free lunch” principle. However, Shiller (1984) laments the existence of “claims that because real returns are nearly unforecastable, the real price of stocks is close to the intrinsic value” and suggests that “this argument for the efficient markets hypothesis represents one of the most remarkable errors in the history of economic thought” (1984, p. 459). In other words, a common fallacy is to conflate the two variants, so that evidence for the “no free lunch” variant is adduced as supporting the “price is right” theory.

The “price is right” theory often underlies papers that use event studies to evaluate the merits of corporate policies or regulation. We will discuss some issues with conflation of these two principles in Chapter 15.