18 Earnings management

A significant body of accounting research focuses on earnings management. One definition of earnings management might be “intervention in the accounting process with a view to achieving financial reporting outcomes that benefit managers.”

A classic form of earnings management is channel stuffing, which is a way for a company to report higher sales (and perhaps higher profits) in a period by pushing more products through a distribution channel than is needed to meet underlying demand. A classic case of channel stuffing involved the Contact Lens Division (“CLD”) of Bausch and Lomb (“B&L”). According to the SEC, “B&L materially overstated its net income for 1993 by improperly recognizing revenue from the sale of contact lenses. These overstatements of revenue … arose from sales of significant amounts of contact lenses to the CLD’s distributors less than two weeks before B&L’s 1993 fiscal year-end in connection with a marketing program that effectively resulted in consignment sales.” In the case, the sales were not appropriately recognized as revenue during fiscal 1993 because “certain employees of the CLD granted unauthorized rights of return to certain distributors and shipped contact lenses after the fiscal year-end.”

While B&L’s channel stuffing was clearly earnings management (and in violation of generally accepted accounting principles, or GAAP), firms may engage in less extreme practices that are motivated by a desire to deliver higher sales in the current period. Such practices need not involve any violation of GAAP or direct manipulation of the accounting process, yet would generally be regarded as earnings management. In such cases, earnings management is achieved by so-called real activities (i.e., those affecting business realities such as when products are delivered), and this form of earnings management is called real earnings management. Thus not all forms of earnings management involve direct manipulation of the accounting process.

But once we allow for real earnings management, it can be difficult to distinguish, even in principle, actions taken to increase firm value that happen to benefit managers because of their financial reporting effects from actions that might fit more conventional notions of earnings management.

Another difficulty relates to two alternative views of earnings management discussed by Beaver (1998). One view relates to actions by managers to “manipulate the financial reporting system in ways that enhance management’s well-being to the [detriment] of capital suppliers and others” (Beaver, 1998, p. 84). The alternative view is that earnings management gives management a way “to reveal some of its private information to investors.”

Finally, note that Beaver (1998, p. 83) views earnings management as just one form of “discretionary behaviour”, which also includes voluntary disclosures, such as earnings forecasts.

18.1 Measuring earnings management

A challenge for researchers seeking to understand earnings management (its prevalence, the mechanisms through which it is achieved, and the effects that it has) is detecting and measuring it. In this section, we use two early papers (Jones, 1991; McNichols and Wilson, 1988) to illustrate some key issues and the approaches that researchers have used to address them.

18.1.1 Discussion questions

  1. Jones (1991) focuses on a small number of firms. Why does Jones (1991) have such a small sample? What are the disadvantages of a small sample? Are there advantages of a smaller sample or narrower focus?
  2. What are the primary conclusions of Jones (1991)? Which table presents the main results of Jones (1991)? Describe the empirical test used in that table. Can you suggest an alternative approach? What do you see as the primary challenges to the conclusions of Jones (1991)?
  3. Can you think of refinements to the broad research question? What tests might you use to examine these?
  4. McNichols and Wilson (1988) state at the outset that their paper “examines whether managers manipulate earnings.” Is this a good statement of the main research question of McNichols and Wilson (1988)? If not, suggest an alternative summary of the research questions of McNichols and Wilson (1988).
  5. What do McNichols and Wilson (1988) mean by “nondiscretionary accruals”? How “operationalizable” is this concept?
  6. McNichols and Wilson (1988) say “if \(\mathit{DA}\) were observable, accrual-based tests of earnings management would be expressed in terms of the following regression: \[ \mathit{DA} = \alpha + \beta \textit{PART} + \epsilon \] where \(\textit{PART}\) is a dummy variable that partitions the data into two groups for which earnings management predictions are specified”. Healy (1985) points out that bonus plans can give managers incentives to increase earnings or decrease earnings depending on the situation. How is this problematic for the formulation of McNichols and Wilson (1988) above? How might a researcher address this?
  7. What are the benefits and costs of focusing on a single item (bad debt expense) in a study of earnings management?
  8. The main results of McNichols and Wilson (1988) are in Tables 6 and 7. How persuasive do you find the evidence of earnings management found in the “residual provision” columns of those tables?
  9. How well does the \(\mathit{PART}\) framework apply to Jones (1991)? Does the framework require modification for this paper? In which periods would \(\mathit{PART}\) be set to one in Jones (1991)?

18.2 Evaluating measures of earnings management

A natural question that arises is how well measures of earnings management such as that used in Jones (1991) perform. An ideal measure would detect earnings management when it is present, but not detect earnings management when it is absent. This leads to two questions. First, how well does a given measure detect earnings management when it is present? Second, how does a given measure perform when earnings management is not present?

Dechow et al. (1995) evaluate five earnings management measures from prior research on these terms. Each of these measures uses an estimation period to create a model of non-discretionary accruals which is then applied to measure discretionary accruals for a period as the difference between total accruals and estimated non-discretionary accruals.
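In symbols, every one of these measures has the same form, and the models differ only in how non-discretionary accruals are estimated. The hat is notation introduced here (not taken from the paper) to mark an estimate: \[ \textit{DA}_{\tau} = \textit{TA}_{\tau} - \widehat{\textit{NDA}}_{\tau} \]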

  • The Healy Model (Healy, 1985) measures non-discretionary accruals as mean total accruals during the estimation period \[ \textit{NDA}_{\tau} = \frac{\sum_t \textit{TA}_t}{T} \] where \(\textit{TA}_t\) is (here and below) total accruals scaled by lagged total assets.
  • The DeAngelo Model (DeAngelo, 1986) uses last period’s total accruals (scaled by lagged total assets) as the measure of non-discretionary accruals: \[ \textit{NDA}_{\tau} = \textit{TA}_{\tau - 1} \]
  • The Jones Model (Jones, 1991) “attempts to control for the effect of changes in a firm’s economic circumstances on nondiscretionary accruals” \[ \textit{NDA}_{\tau} = \alpha_1 (1/\textit{AT}_{\tau-1}) + \alpha_2 \Delta \textit{REV}_{\tau} + \alpha_3 \textit{PPE}_{\tau} \] where \(\textit{AT}_{\tau-1}\) is total assets at \(\tau - 1\), \(\Delta \textit{REV}_{\tau}\) is revenues in year \(\tau\) less revenues in year \(\tau-1\) scaled by \(\textit{AT}_{\tau-1}\), and \(\textit{PPE}_{\tau}\) is gross property plant and equipment in year \(\tau\) scaled by \(\textit{AT}_{\tau-1}\).
  • Modified Jones Model. Dechow et al. (1995) consider a modified version of the Jones Model “designed to eliminate the conjectured tendency of the Jones Model to measure discretionary accruals with error when discretion is exercised over revenues” (1995, p. 199). In this model, non-discretionary accruals are estimated during the event period as: \[ \textit{NDA}_{\tau} = \alpha_1 (1/\textit{AT}_{\tau-1}) + \alpha_2 (\Delta \textit{REV}_{\tau} - \Delta \textit{REC}_{\tau}) + \alpha_3 \textit{PPE}_{\tau} \] where \(\Delta \textit{REC}_{\tau}\) is net receivables in year \(\tau\) less net receivables in year \(\tau-1\) scaled by \(\textit{AT}_{\tau-1}\).
  • The Industry Model “relaxes the assumption that non-discretionary accruals are constant over time. The Industry Model assumes that variation in the determinants of non-discretionary accruals are common across firms in the same industry” (1995, p. 199). In this model, non-discretionary accruals are estimated during the event period as: \[ \textit{NDA}_{\tau} = \gamma_1 + \gamma_2 \mathrm{median}(\textit{TA}_\tau)\] where the median is taken across firms in the same industry.

In the regression-based models above, the parameters (i.e., \((\alpha_1, \alpha_2, \alpha_3)\) or \((\gamma_1, \gamma_2)\)) are estimated on a firm-specific basis.
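To make the definitions concrete, the following is a minimal base-R sketch that applies the Healy, DeAngelo and Jones Models to a single made-up firm. All values are simulated, the coefficients used to generate accruals are arbitrary assumptions, and the column names simply mirror those used in the code later in this section.

```r
# Toy firm: 10 estimation years followed by 1 test year (all values simulated).
set.seed(42)
n <- 11
firm <- data.frame(
  fyear    = 1980 + seq_len(n),
  part     = c(rep(FALSE, n - 1), TRUE),   # TRUE marks the test year
  one_at   = runif(n, 0.001, 0.01),        # 1 / lagged total assets
  d_rev_at = rnorm(n, 0.05, 0.10),         # scaled change in revenue
  ppe_at   = runif(n, 0.30, 0.60)          # scaled gross PP&E
)
# Arbitrary (assumed) data-generating process for scaled total accruals
firm$acc_at <- 0.1 * firm$d_rev_at - 0.08 * firm$ppe_at + rnorm(n, sd = 0.02)

est <- firm[!firm$part, ]

# Healy Model: mean total accruals over the estimation period
nda_healy <- mean(est$acc_at)

# DeAngelo Model: total accruals in the year before the test year
nda_deangelo <- firm$acc_at[which(firm$part) - 1]

# Jones Model: firm-specific regression with no separate intercept
fm <- lm(acc_at ~ one_at + d_rev_at + ppe_at - 1, data = est)
nda_jones <- unname(predict(fm, newdata = firm[firm$part, ]))

# Discretionary accruals in the test year under each model
ta_test <- firm$acc_at[firm$part]
da <- c(healy = ta_test - nda_healy,
        deangelo = ta_test - nda_deangelo,
        jones = ta_test - nda_jones)
print(round(da, 3))
```

Because no manipulation is built into the simulated test year, the three estimates differ only through noise and through how each model forms its benchmark.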

Dechow et al. (1995) conduct analyses on four distinct samples, with each designed to test a different question. Here we omit the fourth sample (based on SEC enforcement actions) and consider the remaining three samples. Drawing on the framework from McNichols and Wilson (1988), an indicator variable \(\mathit{PART}\) is set to one for a subset of firm-years in each sample:

  1. Randomly selected samples of 1000 firm-years.
  2. Samples of 1000 firm-years randomly selected from firm-years experiencing extreme financial performance.
  3. Samples of 1000 firm-years randomly selected to which a fixed and known amount of accrual manipulation is introduced.

Data for our analysis come from two tables on Compustat. As in Chapter 17, we supplement data on comp.funda with SIC codes from comp.company.

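library(dplyr)
library(DBI)
library(tidyr)     # unnest_wider(), pivot_longer()
library(purrr)     # map(), map_dfr()
library(broom)     # tidy()
library(farr)      # form_deciles()

# Connection settings not given here fall back to libpq defaults
# (e.g., the PGHOST and PGUSER environment variables)
pg <- dbConnect(RPostgres::Postgres(), bigint = "integer", 
                check_interrupts = TRUE)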

funda <- tbl(pg, sql("SELECT * FROM comp.funda"))
company <- tbl(pg, sql("SELECT * FROM comp.company"))

sics <- 
  company %>%
  select(gvkey, sic)

funda_mod <-
  funda %>%
  filter(indfmt == "INDL", datafmt == "STD",
         consol == "C", popsrc == "D") %>%
  left_join(sics, by = "gvkey") %>%
  mutate(sic = coalesce(sich, as.integer(sic)))

Sloan (1996, p. 293) suggests that the data needed to calculate accruals are not available for “banks, life insurance or property and casualty companies”, so we exclude these firms (those with SIC codes starting with 6). Following Dechow et al. (1995), we restrict the sample to the years 1950 to 1991.

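acc_data_raw <-
  funda_mod %>% 
  # Keep 12-month fiscal periods and drop financial firms (SIC 6000-6999)
  filter(pddur == 12,
         !between(sic, 6000, 6999)) %>%
  # Treat missing values of these items as zero
  mutate(across(c(che, dlc, sale, rect), ~ coalesce(., 0))) %>%
  select(gvkey, datadate, fyear, at, ib, dp, rect, ppegt, ni, sale,
         act, che, lct, dlc, sic) %>%
  filter(between(fyear, 1950, 1991)) %>%
  # Bring the data into R; later steps (e.g., rnorm(), lm()) need a local data frame
  collect()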

Like Sloan (1996) and Jones (1991), Dechow et al. (1995) measure accruals using a balance-sheet approach. The following function takes a data frame with the necessary Compustat variables, and calculates accruals for each firm-year, returning the resulting data set.

calc_accruals <- function(df) {
  df %>% 
    group_by(gvkey) %>%
    arrange(datadate) %>%
    # Year-on-year changes are calculated within each firm
    mutate(lag_at = lag(at),
           d_ca = act - lag(act),
           d_cash = che - lag(che),
           d_cl = lct - lag(lct),
           d_std = dlc - lag(dlc),
           d_rev = sale - lag(sale),
           d_rec = rect - lag(rect)) %>%
    ungroup() %>%
    # Unscaled accruals; scaling by lag_at is done where the measure is used
    mutate(acc_raw =  (d_ca - d_cash - d_cl + d_std) - dp)
}
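Written out, the calculation of acc_raw in calc_accruals corresponds to the expression below. The symbols are shorthand introduced here for the changes in current assets (act), cash and short-term investments (che), current liabilities (lct) and debt in current liabilities (dlc), and for depreciation and amortization (dp): \[ \textit{ACC}_t = (\Delta \textit{CA}_t - \Delta \textit{Cash}_t - \Delta \textit{CL}_t + \Delta \textit{STD}_t) - \textit{Dep}_t \] Dividing \(\textit{ACC}_t\) by lagged total assets (lag_at) gives the scaled measure \(\textit{TA}_t\) that enters the models above.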

Like Jones (1991), Dechow et al. (1995) split firm-level data into an estimation period and a test firm-year and estimate earnings management models on a firm-specific basis. Dechow et al. (1995) require at least 10 years in the estimation period and each sample firm will have one test firm-year by construction. To give effect to this, we construct a sample of candidate firm-years comprising firms with at least 11 years with required data.

test_sample <-
  acc_data_raw %>%
  calc_accruals() %>%
  filter(lag_at > 0, sale > 0, ppegt > 0, !is.na(acc_raw), 
         !is.na(d_rev), !is.na(d_rec), !is.na(ppegt)) %>% 
  group_by(gvkey) %>%
  filter(n() >= 11) %>%
  ungroup() %>%
  select(gvkey, fyear)

Most of our analysis will focus on a single random sample of 1,000 firms. For each of these 1,000 firms we select a single fiscal year for which we set part to TRUE. Because we use the lagged value of accruals for the DeAngelo Model, we constrain the random choice to be any year but the first year.

We combine the data on part for our sample firm-years with the Compustat data in acc_data_raw to form merged_sample_1.


sample_1_firm_years <- 
  test_sample %>%
  mutate(rand = rnorm(n = nrow(.))) %>%
  group_by(gvkey) %>%
  filter(rand == min(rand), fyear > min(fyear)) %>%
  ungroup() %>%
  top_n(1000, wt = rand) %>%
  select(gvkey, fyear) %>%
  mutate(part = TRUE)

sample_1 <-
  test_sample %>%
  semi_join(sample_1_firm_years, by = "gvkey") %>%
  left_join(sample_1_firm_years, by = c("gvkey", "fyear")) %>%
  mutate(part = coalesce(part, FALSE))

merged_sample_1 <-
  sample_1 %>%
  inner_join(acc_data_raw, by = c("gvkey", "fyear"))
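The selection rule itself can be illustrated on a made-up panel. The sketch below is a base-R illustration only: the firm identifiers, the years and the helper pick_test_year are hypothetical. It also differs slightly from the code above, which appears to retain a firm only when its lowest random draw does not fall on its first year; here the draw is made directly from the non-first years, so every firm is retained.

```r
# Toy panel: three hypothetical firms, each with fiscal years 1980 to 1984.
set.seed(2024)
panel <- expand.grid(fyear = 1980:1984,
                     gvkey = c("A", "B", "C"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: pick one year at random, excluding the firm's first year
pick_test_year <- function(years) {
  candidates <- sort(years)[-1]
  candidates[sample.int(length(candidates), 1)]
}

# ave() applies the helper within each firm and recycles the chosen year
panel$test_year <- ave(panel$fyear, panel$gvkey, FUN = pick_test_year)

# part is TRUE only for the chosen test year of each firm
panel$part <- panel$fyear == panel$test_year
print(panel[panel$part, c("gvkey", "fyear", "part")])
```

Setting a seed, as done here, makes the random choice of test years reproducible across runs.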

If we were conducting a simple study of observed earnings management, it would be natural to calculate our measures of earnings management and then proceed to our analyses. However, in our analysis here we will, like Dechow et al. (1995), be manipulating accounting measures ourselves and doing so will require us to recalculate earnings management measures and inputs to these, such as measures of total accruals. To facilitate this process, we embed the calculations for all five earnings management measures in the following function.

get_nda <- function(df) {
  # Scaled inputs shared by all five models
  df_mod <- 
    df %>%
    calc_accruals() %>%
    mutate(sic2 = substr(as.character(sic), 1, 2),
           acc_at = acc_raw/lag_at,
           one_at = 1/lag_at,
           d_rev_at = d_rev/lag_at,
           d_rev_alt_at = (d_rev - d_rec)/lag_at,
           ppe_at = ppegt/lag_at) %>%
    group_by(sic2) %>%
    mutate(acc_ind = median(if_else(part, NA_real_, acc_at), na.rm = TRUE)) %>%
    ungroup()

  # Healy and DeAngelo Models require no regression
  da_healy <-
    df_mod %>%
    group_by(gvkey) %>%
    arrange(fyear) %>%
    mutate(nda_healy = mean(if_else(part, NA_real_, acc_at), na.rm = TRUE),
           da_healy = acc_at - nda_healy,
           nda_deangelo = lag(acc_at),
           da_deangelo = acc_at - nda_deangelo) %>%
    ungroup() %>%
    select(gvkey, fyear, part, nda_healy, da_healy, nda_deangelo,
           da_deangelo)

  # Jones Model: firm-specific regression on the estimation period (!part)
  fit_jones <- function(df) {
    fm <- lm(acc_at ~ one_at + d_rev_at + ppe_at - 1, 
             data = df, subset = !part)
    df %>% 
      mutate(nda_jones = predict(fm, newdata = df),
             da_jones = acc_at - nda_jones) %>%
      select(fyear, nda_jones, da_jones)
  }

  # Apply the fit to each firm (reframe() is the equivalent in dplyr 1.1.0 and later)
  df_jones <-
    df_mod %>%
    nest_by(gvkey) %>%
    summarize(fit_jones(data), .groups = "drop")

  # Modified Jones Model: change in revenue net of the change in receivables
  fit_mod_jones <- function(df) {
    fm <- lm(acc_at ~ one_at + d_rev_alt_at + ppe_at - 1, 
             data = df, subset = !part)
    df %>% 
      mutate(nda_mod_jones = predict(fm, newdata = df),
             da_mod_jones = acc_at - nda_mod_jones) %>%
      select(fyear, nda_mod_jones, da_mod_jones)
  }

  df_mod_jones <-
    df_mod %>%
    nest_by(gvkey) %>%
    summarize(fit_mod_jones(data), .groups = "drop")

  # Industry Model: regression on the industry median of accruals
  fit_industry <- function(df) {
    fm <- lm(acc_at ~ acc_ind, data = df, subset = !part)
    df %>% 
      mutate(nda_industry = suppressWarnings(predict(fm, newdata = df)),
             da_industry = acc_at - nda_industry) %>%
      select(fyear, nda_industry, da_industry)
  }

  df_industry <-
    df_mod %>%
    nest_by(gvkey) %>%
    summarize(fit_industry(data), .groups = "drop")

  # Combine all five measures into one data frame
  da_healy %>%
    left_join(df_jones, by = c("gvkey", "fyear")) %>%
    left_join(df_mod_jones, by = c("gvkey", "fyear")) %>%
    left_join(df_industry, by = c("gvkey", "fyear"))
}

Applying this get_nda function to our main sample (merged_sample_1) to create reg_data for further analysis requires just one line:

reg_data <- get_nda(merged_sample_1)

18.2.1 Table 1 of Dechow et al. (1995)

Table 1 of Dechow et al. (1995) presents results from regressions of discretionary accruals on \(\textit{PART}\) for each of the five models. For each model, three rows are provided. The first row provides summary statistics for the estimated coefficients on \(\textit{PART}\) from firm-specific regressions for the 1,000 firms in the sample. The second and third rows provide summary statistics on the estimated standard errors of the coefficients on \(\textit{PART}\) and on the \(t\)-statistics testing the null hypothesis that the coefficients on \(\textit{PART}\) are equal to zero.

To facilitate creating a similar table, we make two functions. The first (fit_model) takes a data frame and, for each firm, regresses the measure of discretionary accruals corresponding to type on the part variable, returning the fitted models. The second function (multi_fit) runs regressions for all five models, returning the resulting data frame.

fit_model <- function(df, type = "healy") {
  df %>%
    nest_by(gvkey) %>% 
    # One regression of da_<type> on part per firm, stored in a list column
    summarize(model = list(lm(as.formula(paste0("da_", type, " ~ part")),
                      data = data)), .groups = "drop") %>%
    mutate(type =  !!type)
}

multi_fit <- function(df) {
  models <- c("healy", "deangelo", "jones", "mod_jones", "industry")
  # Stack the firm-level fits for all five measures
  bind_rows(lapply(models, function(x) fit_model(df, x)))
}
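The one non-obvious step in fit_model is building the regression formula from a character string. A minimal sketch of that step in isolation follows; the helper name make_formula is hypothetical and is not part of the code above.

```r
# Hypothetical helper: build the formula for a given measure, as fit_model does
make_formula <- function(type) {
  as.formula(paste0("da_", type, " ~ part"))
}

models <- c("healy", "deangelo", "jones", "mod_jones", "industry")
formulas <- lapply(models, make_formula)

# Show each formula as text
print(sapply(formulas, deparse))
```

Constructing the formula this way lets one function serve all five measures, at the cost of relying on the da_ naming convention for the columns.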

With these functions in hand, estimating firm-specific regressions for the five models requires a single line of code.

results <- multi_fit(reg_data)

The returned results comprise three columns: gvkey, model, and type. Each value of model is the fitted regression for the firm identified by gvkey and the earnings management measure identified by type. Note that model is a list column and contains the value returned by the lm function. We can interrogate the values stored in model to extract whatever details about the regression we need.

## # A tibble: 6 × 3
##   gvkey  model  type 
##   <chr>  <list> <chr>
## 1 001012 <lm>   healy
## 2 001017 <lm>   healy
## 3 001018 <lm>   healy
## 4 001021 <lm>   healy
## 5 001048 <lm>   healy
## 6 001064 <lm>   healy
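As an illustration of what can be extracted from a fitted model, the sketch below fits one such regression on simulated data (the values are made up) and pulls out the row for the coefficient on part using only base R.

```r
# Simulated data for one firm: 11 years, the last of which is the test year.
set.seed(1)
toy <- data.frame(part = c(rep(FALSE, 10), TRUE),
                  da_healy = rnorm(11, sd = 0.05))

fm <- lm(da_healy ~ part, data = toy)

# coef(summary(fm)) is a matrix with one row per term
coefs <- coef(summary(fm))
part_row <- coefs["partTRUE", c("Estimate", "Std. Error", "t value")]
print(round(part_row, 3))

# With a single test year, the coefficient on part equals the test-year value
# less the mean over the other years
check <- all.equal(unname(part_row["Estimate"]),
                   toy$da_healy[11] - mean(toy$da_healy[1:10]))
print(check)
```

The final check shows why the regression on part is a convenient framing: the coefficient is simply the difference between the test-year value and the estimation-period mean.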

We will use the tidy function to extract the coefficients, standard error, t-statistics, and p-values in each fitted model as a data frame. For our version of Table 1 of Dechow et al. (1995), we are only interested in the coefficient on part (i.e., the one labelled partTRUE) and thus can discard the other row (this will be the constant of each regression) and the column term in the function get_stats that will be applied to each model. The function table_1_stats calculates the statistics presented in the columns of Table 1 of Dechow et al. (1995). To produce our version of Table 1 of Dechow et al. (1995), we use map from the purrr library to apply get_stats to each model, then unnest_wider and pivot_longer (both from the tidyr package) to arrange the statistics in a way that can be summarized to create a table.

get_stats <- function(fm) {
  fm %>%
    tidy() %>%
    # Keep only the coefficient on part and drop the term column
    filter(term == "partTRUE") %>% 
    select(-term)
}

table_1_stats <- function(x) {
  tibble(mean = mean(x, na.rm = TRUE),
         sd = sd(x, na.rm = TRUE),
         q1 = quantile(x, p = 0.25, na.rm = TRUE),
         median = median(x, na.rm = TRUE),
         q3 = quantile(x, p = 0.75, na.rm = TRUE))
}

results %>%
  mutate(stats = map(model, get_stats)) %>% 
  unnest_wider(stats) %>% 
  pivot_longer(estimate:statistic, names_to = "stat") %>%
  group_by(type, stat) %>%
  summarize(table_1_stats(value), .groups = "drop") %>%
  knitr::kable(digits = 3)
type       stat         mean     sd      q1  median     q3
deangelo   estimate    0.011  0.434  -0.075  -0.003  0.071
deangelo   statistic  -0.016  1.168  -0.614  -0.028  0.583
deangelo   std.error   0.206  0.603   0.079   0.128  0.205
healy      estimate    0.010  0.386  -0.052  -0.002  0.045
healy      statistic   0.125  1.980  -0.544  -0.018  0.569
healy      std.error   0.147  0.517   0.056   0.092  0.148
industry   estimate    0.004  0.462  -0.052  -0.001  0.046
industry   statistic   0.121  2.009  -0.553  -0.010  0.583
industry   std.error   0.144  0.479   0.056   0.091  0.147
jones      estimate    0.017  0.325  -0.046   0.001  0.049
jones      statistic   0.132  2.243  -0.699   0.014  0.779
jones      std.error   0.090  0.080   0.046   0.069  0.111
mod_jones  estimate    0.018  0.354  -0.050   0.000  0.048
mod_jones  statistic   0.149  2.317  -0.703  -0.005  0.732
mod_jones  std.error   0.094  0.084   0.047   0.073  0.116

Discussion questions

  1. What interpretation do Dechow et al. (1995) provide for their Table 1 results?

  2. Compare the results above with those in Table 1 of Dechow et al. (1995). What differences appear to be significant?

  3. Compare the values in the standard deviation column of Table 1 of Dechow et al. (1995) with other statistics. Do these differences make sense? Or do they suggest anomalies in the underlying data?

  4. Compare the values in the standard deviation column of the “earnings management” rows of Table 1 of Dechow et al. (1995) with the values in the mean column of the \(t\)-statistic rows. What is the relationship between these values? What would you expect the relationship between these values to be? Do you observe similar relations in the table we created above?

18.2.2 Table 2 of Dechow et al. (1995)

Table 2 of Dechow et al. (1995) presents rejection rates for the null hypothesis of no earnings management in the \(\mathit{PART}\) year for the five measures. Given that Sample 1 comprises 1,000 firms selected at random with the \(\mathit{PART}\) year in each case also being selected at random, we expect the rejection rates to equal the size of the test being used (i.e., either 5% or 1%).
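The claim that a correctly constructed test rejects a true null at a rate equal to its size can be checked by simulation. The sketch below is an illustration under stated assumptions: discretionary accruals are drawn as independent standard normal values (so there is no earnings management by construction), each simulated firm has 10 estimation years and one test year, and the helper one_p is hypothetical.

```r
set.seed(123)
n_firms <- 2000   # number of simulated firms
n_years <- 11     # 10 estimation years plus 1 test year
part <- c(rep(FALSE, n_years - 1), TRUE)

# One-sided p-value for the alternative that the coefficient on part is positive
one_p <- function() {
  da <- rnorm(n_years)                  # no earnings management anywhere
  fm <- lm(da ~ part)
  t_stat <- coef(summary(fm))["partTRUE", "t value"]
  pt(t_stat, fm$df.residual, lower.tail = FALSE)
}

p_vals <- replicate(n_firms, one_p())

# Share of simulated firms for which the true null is rejected at the 5% level
rej_rate <- mean(p_vals < 0.05)
print(rej_rate)
```

Under these assumptions the rejection rate should be close to 5%, with deviations attributable to sampling variation across the simulated firms.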

h_test <- function(fm) {
  coefs <- coef(summary(fm))
  # Two rows means both the intercept and the coefficient on part were estimated
  if (dim(coefs)[1]==2) { 
    t_stat <- coefs[2 ,3]
    df <- fm$df.residual
    # One-sided tests in each direction at the 1% and 5% levels
    tibble(neg_p01 = pt(t_stat, df, lower.tail = TRUE) < 0.01,
           neg_p05 = pt(t_stat, df, lower.tail = TRUE) < 0.05,
           pos_p01 = pt(t_stat, df, lower.tail = FALSE) < 0.01,
           pos_p05 = pt(t_stat, df, lower.tail = FALSE) < 0.05)
  } else {
    tibble(neg_p01 = NA, neg_p05 = NA, pos_p01 = NA, pos_p05 = NA)
  }
}

test_results <-
  results %>% 
  mutate(map_dfr(model, h_test)) 

test_results %>%
  group_by(type) %>%
  summarize(across(matches("p0"), ~ mean(., na.rm = TRUE))) %>%
  knitr::kable(digits = 3)
type       neg_p01  neg_p05  pos_p01  pos_p05
deangelo     0.020    0.061    0.017    0.058
healy        0.010    0.040    0.027    0.061
industry     0.011    0.042    0.027    0.062
jones        0.035    0.084    0.049    0.105
mod_jones    0.033    0.082    0.050    0.103

Dechow et al. (1995) indicate cases where the Type I error rate is statistically significantly different from the size of the test using a “two-tailed binomial test.” This may be confusing at an initial reading, as the statistics presented in Table 2 are based on one-sided tests. But note that whether we are conducting one-sided tests or two-sided tests of the null hypothesis, we should expect rejection rates to equal the size of the test (e.g., 5% or 1%) if we have constructed the tests correctly. For example, if we run 1000 tests with a true null and set the size of the test at 5%, then rejecting the null hypothesis 10 times (1%) or 90 times (9%) will lead to rejection of the null (meta-)hypothesis that our test of our null hypothesis is properly sized, as the following \(p\)-values confirm. Absent a reason to expect over-rejection or under-rejection a priori, it makes sense to consider two-sided test statistics against a null hypothesis that the rejection rate equals the size of the test.

binom.test(x= 10, n = 1000, p = 0.05)$p.value
## [1] 6.476681e-12
binom.test(x= 90, n = 1000, p = 0.05)$p.value
## [1] 1.322284e-07

Here we embed binom.test in a small function and apply it to the test results above, adjusting the p argument based on the size of the test used in each case.

binom_test <- function(x, p) {
  # Drop firms for which no test could be conducted
  x <- x[!is.na(x)]
  binom.test(sum(x), length(x), p = p)$p.value
}

test_results %>%
  group_by(type) %>%
  summarize(neg_p01 = binom_test(neg_p01, p = 0.01),
            neg_p05 = binom_test(neg_p05, p = 0.05),
            pos_p01 = binom_test(pos_p01, p = 0.01),
            pos_p05 = binom_test(pos_p05, p = 0.05)) %>%
  knitr::kable(digits = 3)
type       neg_p01  neg_p05  pos_p01  pos_p05
deangelo     0.004    0.133    0.045    0.261
healy        1.000    0.167    0.000    0.110
industry     0.749    0.276    0.000    0.094
jones        0.000    0.000    0.000    0.000
mod_jones    0.000    0.000    0.000    0.000

Discussion questions and exercises

  1. Focusing on the Healy Model, DeAngelo Model and the Industry Model, compare the rejection rates produced above with those presented in Table 2 of Dechow et al. (1995). What might explain any differences? Could these be attributed to differences between our results and those reported in Table 1 of Dechow et al. (1995)? Or do you expect that these differences have another cause?
  2. How do you interpret the results of our binom_test reported in the second table above? Does it make sense to interpret each of the columns independently of the others?

Turning to the Jones Model and the Modified Jones Model, it is quite clear that we are over-rejecting the (true) null hypothesis. One possible explanation for this over-rejection is provided by footnote 11 of Dechow et al. (1995, p. 204):

The computation of the standard error of \(\hat{b}_j\) requires special attention because the measures of discretionary accruals in the event period (estimation period) are prediction errors (fitted residuals) from a first-pass estimation process. An adjustment must therefore be made to reflect the fact that the standard errors of the prediction errors are greater than the standard errors of the fitted residuals. Likewise, the degrees of freedom in the \(t\)-test must reflect the degrees of freedom used up in the first-pass estimation. This can be accomplished by … estimating a single-stage regression that includes both \(\textit{PART}\) and the determinants of nondiscretionary accruals.

The invocation of a single-stage regression might remind some readers of the Frisch-Waugh-Lovell theorem, which we discuss in section 4.3. But an important element of the single-stage regression approach suggested by the Frisch-Waugh-Lovell theorem is that the first- and second-stage regressions that are shown by the theorem to be equivalent have the same observations in both stages. In contrast, the first stage of the Jones Model and Modified Jones Model approaches used by Dechow et al. (1995) comprises only the estimation sample (i.e., it excludes the test firm-year of primary interest). Fortunately, the single-stage regression invoked by Dechow et al. (1995) is a different one, namely that attributed to Salkever (1976).

Salkever (1976) demonstrates that the estimated value of discretionary accruals in the test year can be obtained by running a single regression including both the estimation and test periods and a dummy variable for the test year. The prediction error for the test observation (i.e., the estimated discretionary accruals for the test firm-year) will be equal to the coefficient on the \(\textit{PART}\) variable and the correct standard error for this prediction will be the standard error of that coefficient.
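The equivalence can be seen on simulated data. The following is a minimal sketch, assuming a generic regression with two made-up explanatory variables (x1 and x2) rather than the Jones Model inputs; the last observation plays the role of the test year.

```r
set.seed(99)
n <- 12
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  part = c(rep(FALSE, n - 1), TRUE))
toy$y <- 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(n, sd = 0.2)

# Two-step approach: estimate on the estimation period, then predict the test year
fm_est <- lm(y ~ x1 + x2, data = toy, subset = !part)
pred <- predict(fm_est, newdata = toy[toy$part, ], se.fit = TRUE)
pred_error <- toy$y[toy$part] - unname(pred$fit)
# Standard error of a prediction error: parameter uncertainty plus residual noise
pred_se <- unname(sqrt(pred$se.fit^2 + pred$residual.scale^2))

# Single-stage approach: all observations plus the test-year dummy
fm_one <- lm(y ~ x1 + x2 + part, data = toy)
one_stage <- coef(summary(fm_one))["partTRUE", c("Estimate", "Std. Error")]

print(rbind(two_step = c(pred_error, pred_se), one_stage = one_stage))
```

The two rows agree: the dummy absorbs the test observation entirely, so the remaining coefficients are determined by the estimation period alone, and the residual degrees of freedom are the same in both regressions.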

Because the Salkever (1976) approach seems so infrequently used in accounting research notwithstanding Dechow et al. (1995), yet is seemingly quite relevant in a number of settings, we spend some time exploring it in these exercise questions. (Note that in the following, to keep things manageable, we pull a single GVKEY value at random from our sample. You may need to modify this code to ensure that you are drawing the GVKEY of a firm in your sample, which may differ from ours.)

To keep things simple, we pull one firm from our sample.

df_test <- 
  merged_sample_1 %>%
  filter(gvkey == "001304")

We then create the variables needed to run the Jones Model and then fit a (differently) modified Jones Model on the estimation sample, which we store in fm1a. We can then calculate non-discretionary accruals for the full sample using the predict function. Finally, we can estimate the regression of discretionary accruals on the \(\textit{PART}\) variable and store the results in fm2a. To implement the approach suggested by Salkever (1976)—and used by Dechow et al. (1995)—we run a single regression on the entire sample with the addition of the \(\textit{PART}\) indicator.

df_mod <- 
    df_test %>%
    calc_accruals() %>%
    mutate(acc_at = acc_raw/lag_at,
           one_at = 1/lag_at,
           d_rev_at = d_rev/lag_at,
           d_rev_alt_at = (d_rev - d_rec)/lag_at,
           ppe_at = ppegt/lag_at)

fm1a <- lm(acc_at ~ one_at + d_rev_at + ppe_at, 
          data = df_mod, subset = !part)

res1 <-
  df_mod %>% 
  mutate(nda_jones = predict(fm1a, newdata = .)) %>%
  select(fyear, part, acc_at, nda_jones) %>%
  mutate(da_jones = acc_at - nda_jones)

fm2a <- lm(da_jones ~ part, data = res1)

fm2 <- lm(acc_at ~ one_at + d_rev_at + ppe_at + part, 
          data = df_mod)
  1. Confirm that the coefficient on \(\textit{PART}\) from the regression in fm2a can be recovered from the regression in fm2. How do the standard errors differ across the two regressions?

  2. Modify the code above to check that the same is true for the Modified Jones Model.

  3. We described the Jones Model above as “a (differently) modified Jones Model”. In what way is the model different from the Jones Model estimated in the fit_jones function above? Does the Salkever (1976) equivalence hold if we use the Jones Model from the fit_jones function? If so, why? If not, how might this affect how you would use the Jones Model and the Salkever (1976) approach? (For example, do we expect the “(differently) modified Jones Model” to produce materially different results from the Jones Model?)

  4. Do the issues related to a first and second stage apply to either the Healy Model or the DeAngelo Model or both? If so, could we apply the Salkever (1976) approach to address these issues? If not, are there “one-stage” equivalents to the Healy Model and DeAngelo Model approaches as implemented above?

18.2.3 Table 3 of Dechow et al. (1995)

Table 3 of Dechow et al. (1995) presents results from regressions using the second set of samples (“samples of 1000 firm-years randomly selected from firm-years experiencing extreme financial performance”). Table 3 is analogous to Table 2, for which we provided parallel results above.

We leave reproduction of a parallel to Table 3 as an exercise for the reader and merely provide code producing a sample that can be used for that purpose.

The following code proceeds in four steps. First, we create earn_deciles, which covers all firm-years meeting the sample criteria (i.e., those in test_sample) and includes a variable earn_dec that sorts firm-years into earnings deciles. To calculate deciles, we use the form_deciles function from the farr package (this uses essentially the same code as the function we saw in Chapter 17).

Second, we create sample_2_firm_years, which selects firm-years from the top earnings decile (subject to the constraint that the year is not the first year for the firm, as a prior year is required for the DeAngelo Model). When a firm has more than one firm-year in the top earnings decile, one of those firm-years is selected at random.

Third, we create sample_2 by pulling firm-years from test_sample for firms found in sample_2_firm_years and then pulling in the firm-years where part is TRUE based on the value of part from sample_2_firm_years and then setting the value of part to FALSE when it is missing (i.e., not found on sample_2_firm_years).

Finally, we create the analogue of merged_sample_1, which we call merged_sample_2 (creativity with variable names not being our strong point) by merging sample_2 with the underlying accounting data in acc_data_raw.

earn_deciles <- 
  acc_data_raw %>% 
  semi_join(test_sample, by = c("gvkey", "fyear")) %>%
  group_by(gvkey) %>%
  arrange(fyear) %>%
  mutate(earn = ib/lag(at)) %>% 
  ungroup() %>% 
  mutate(earn_dec = form_deciles(earn)) %>%
  select(gvkey, fyear, earn_dec)

sample_2_firm_years <- 
  earn_deciles %>%
  filter(earn_dec == 10) %>%
  select(gvkey, fyear) %>%
  mutate(rand = rnorm(n = nrow(.))) %>%
  group_by(gvkey) %>%
  filter(rand == min(rand), fyear > min(fyear)) %>%
  ungroup() %>%
  top_n(1000, wt = rand) %>%
  select(gvkey, fyear) %>%
  mutate(part = TRUE)

sample_2 <-
  test_sample %>%
  semi_join(sample_2_firm_years, by = "gvkey") %>%
  left_join(sample_2_firm_years, by = c("gvkey", "fyear")) %>%
  mutate(part = coalesce(part, FALSE))

merged_sample_2 <-
  sample_2 %>%
  inner_join(acc_data_raw, by = c("gvkey", "fyear"))
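For readers without the farr package to hand, the decile step can be approximated in base R. The sketch below is an illustration of the idea only and is not the implementation of form_deciles; the helper to_deciles is hypothetical and the earnings values are simulated.

```r
# Hypothetical helper: assign each value to a decile (1 = lowest, 10 = highest)
to_deciles <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, by = 0.1), na.rm = TRUE)
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

set.seed(7)
earn <- rnorm(1000, mean = 0.05, sd = 0.10)   # simulated earnings / lagged assets
earn_dec <- to_deciles(earn)

# Number of observations in each decile
print(table(earn_dec))
```

Firm-years with earn_dec equal to 10 correspond to the top-decile filter used for sample_2_firm_years above.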

Table 3 actually involves two samples. One sample is similar to the above and a second sample would be based on the above, but with filter(earn_dec == 1) being used in the creation of sample_2_firm_years.

Table 4 is similar, but is based on deciles of cash flow from operations, where cash flow from operations is calculated using earnings and accruals, as cash flow statements were not required for most of the sample period in Dechow et al. (1995).

Exercises

  1. Produce the equivalent of Table 3 from Dechow et al. (1995) by adapting the code used above to create merged_sample_2 and the version of Table 2 above. (Challenge version: Implement the approach of Salkever (1976) in doing so.)

  2. Produce the equivalent of Table 4 from Dechow et al. (1995) by adapting the code used above to create merged_sample_2 and the version of Table 2 above.

18.2.4 Figure 4 of Dechow et al. (1995)

The final analysis of Dechow et al. (1995) that we consider here relates to the third set of samples considered by Dechow et al. (1995), namely “samples of 1000 firm-years randomly selected to which a fixed and known amount of accrual manipulation is introduced”.

Figure 4 of Dechow et al. (1995) presents power functions for three different forms of earnings management, the five measures of earnings management, and levels of induced earnings management from zero to 100% of total assets.
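
To make the notion of a power function concrete, the following is a minimal simulation sketch, not a replication of Dechow et al. (1995). It assumes a toy setting in which half of the observations in each simulated sample have an indicator set to one and an effect of known size is added to their outcome. All names (n_obs, n_sims, effects, rejects_null) and the data-generating process are illustrative assumptions. The power at each effect size is estimated as the proportion of simulated samples in which a one-sided test rejects the null hypothesis at the 5% level.

```r
# Toy power function: share of simulated samples in which a one-sided
# test rejects H0 (coefficient on part <= 0) at the 5% level.
set.seed(2024)

n_obs <- 100                    # observations per simulated sample (assumed)
n_sims <- 500                   # simulated samples per effect size (assumed)
effects <- c(0, 0.25, 0.5, 1)   # induced effect, in units of the error s.d.

rejects_null <- function(effect) {
  part <- rep(c(0, 1), each = n_obs / 2)
  y <- effect * part + rnorm(n_obs)
  fm <- lm(y ~ part)
  t_stat <- coef(summary(fm))[2, "t value"]
  pt(t_stat, fm$df.residual, lower.tail = FALSE) < 0.05
}

# Estimated rejection rate for each effect size
power <- sapply(effects, function(e) mean(replicate(n_sims, rejects_null(e))))
data.frame(effect = effects, power = power)
```

With no induced effect, the rejection rate should be close to the 5% size of the test, and it should rise towards one as the effect grows. A power function plots this rejection rate against the size of the induced effect.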

To implement the introduction of a “fixed and known amount of accrual manipulation” we use the function manipulate, which takes a data set with the required variables from Compustat (e.g., gvkey, fyear, sale, at), an argument specifying the level of earnings management as a percentage of lagged total assets, and an argument specifying the type of earnings management, which can be expense, revenue or margin, as described in Dechow et al. (1995).

manipulate <- function(df, level = 0, type) {
  # Calculate manipulation amounts firm by firm
  df <-
    df %>%
    group_by(gvkey) %>%
    arrange(datadate) %>%
    mutate(ni_ratio = median(if_else(part, NA_real_, ni/sale), na.rm = TRUE),
           lag_at = lag(at),
           manip_amt = lag_at * level,
           manip_amt_gross = manip_amt/ni_ratio)
  if (type == "expense") {
    # Expense manipulation: reduce current liabilities
    df %>% 
      mutate(lct = if_else(part, lct - manip_amt, lct)) %>%
      ungroup()
  } else if (type == "revenue") {
    # Revenue manipulation: increase sales and receivables; reverse sales next year
    df %>% 
      mutate(sale = case_when(part ~ sale + manip_amt, 
                              lag(part) ~ sale - manip_amt,
                              TRUE ~ sale),
             rect = if_else(part, rect + manip_amt, rect),
             act = if_else(part, act + manip_amt, act)) %>%
      ungroup()
  } else if (type == "margin") {
    # Margin manipulation: gross up revenue so that net income rises by manip_amt
    df %>% 
      mutate(sale = case_when(part & ni_ratio > 0 ~ sale + manip_amt_gross,
                              lag(part) & ni_ratio > 0 ~ sale - manip_amt_gross,
                              TRUE ~ sale),
             rect = if_else(part & ni_ratio > 0, rect + manip_amt_gross, rect),
             act = if_else(part & ni_ratio > 0, act + manip_amt_gross, act),
             lct = if_else(part & ni_ratio > 0, 
                           lct + manip_amt_gross - manip_amt, lct)) %>%
      ungroup()
  } else {
    # Unrecognized type: return the data without manipulation
    df %>%
      ungroup()
  }
}
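
To see the arithmetic implied by the three types, consider a single manipulated firm-year. The sketch below uses hypothetical inputs (lagged total assets of 1,000, a manipulation level of 5%, and a net income ratio of 10%) and simply tabulates the change in each variable under each type, following the code of manipulate as written. It is not taken from Dechow et al. (1995).

```r
# Hypothetical inputs for a single manipulated firm-year (part is TRUE)
lag_at   <- 1000   # lagged total assets
level    <- 0.05   # manipulation equal to 5% of lagged total assets
ni_ratio <- 0.10   # median ni/sale over the firm's other years

manip_amt       <- lag_at * level        # intended effect on earnings
manip_amt_gross <- manip_amt / ni_ratio  # revenue needed to yield manip_amt of income

# Change in each variable under the three types
changes <- data.frame(
  type = c("expense", "revenue", "margin"),
  sale = c(0, manip_amt, manip_amt_gross),
  rect = c(0, manip_amt, manip_amt_gross),
  act  = c(0, manip_amt, manip_amt_gross),
  lct  = c(-manip_amt, 0, manip_amt_gross - manip_amt)
)
changes

# Net change in current assets minus current liabilities for each type
changes$act - changes$lct
```

In this example, the change in act less the change in lct equals manip_amt under all three types, even though the margin type changes sale, rect, and act by the much larger manip_amt_gross.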

We use the manipulate function above and apply it to levels of earnings management from 0 to 100% of lagged total assets for each of the three types. The result of this step is fed to the get_nda function from above, whose output is in turn fed to the multi_fit function to calculate the results of regressing non-discretionary accruals on the \(\textit{PART}\) variable. The result of these steps is stored in the data frame named manip_df. (Note that creating manip_df takes some time: a bit over five minutes on an M1 Mac using the code below.) In addition to processing time, the code creating manip_df is quite memory-intensive, requiring more than 10GB of RAM depending on the precise code used. So if you have less than about 16GB of RAM, this code might require modification to run smoothly on your machine.

manip_df <-
  expand_grid(level = seq(from = 0, to = 1, by = 0.1),
              manip_type = c("expense", "revenue", "margin")) %>%
  mutate(data = map2(level, manip_type, 
                     ~ manipulate(merged_sample_1, .x, .y))) %>%
  mutate(accruals = map(data, get_nda)) %>%
  mutate(results = map(accruals, ~ multi_fit(.x))) %>%
  select(-data, -accruals)  

Note that we could combine three of the steps above into one with the following function:

manip_fit <- function(df, level, manip_type) {
  multi_fit(get_nda(manipulate(df, level, manip_type)))
}
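
One possible use of a combined function of this kind is to avoid storing the manipulated data and accruals for every combination of level and manip_type at the same time, which may reduce memory use. The sketch below illustrates the pattern only. Here toy_data is simulated and toy_fit is a hypothetical stand-in for manip_fit that ignores manip_type and fits a single regression, so the results have no empirical meaning.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(2024)

# Simulated stand-in for merged_sample_1 (hypothetical data)
toy_data <- tibble(part = rep(c(FALSE, TRUE), each = 50),
                   acc = rnorm(100))

# Hypothetical stand-in for manip_fit(): manipulate and fit in one call,
# so that only the fitted model is returned to the caller.
# (manip_type is accepted but not used in this toy version.)
toy_fit <- function(df, level, manip_type) {
  df <- df %>% mutate(acc = if_else(part, acc + level, acc))
  lm(acc ~ part, data = df)
}

# One row per (level, manip_type) pair, with the fitted model in results
toy_results <-
  expand_grid(level = seq(from = 0, to = 1, by = 0.5),
              manip_type = c("expense", "revenue", "margin")) %>%
  mutate(results = map2(level, manip_type,
                        ~ toy_fit(toy_data, .x, .y)))

toy_results
```

Because the intermediate data frames exist only inside the function call, they can be garbage-collected after each combination is processed rather than being held in list columns until the final select.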

With results from regressions for various values of level, the three values of manip_type, and the five models (type) stored in manip_df, we can create plots like those presented in Figure 4 of Dechow et al. (1995) using the following code. We first create a function (h_test_5) that takes a fitted model and returns a logical value indicating whether the null hypothesis is rejected at the 5% level. The code below applies this function to each row of manip_df. The result of this step is used to calculate the proportion of firms for which the null is rejected for each value of (level, manip_type, type), which is then easily plotted using the ggplot function.

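h_test_5 <- function(fm) {
  coefs <- coef(summary(fm))
  # Test only if both an intercept and a slope were estimated
  if (dim(coefs)[1] == 2) { 
    t_stat <- coefs[2, 3]
    df <- fm$df.residual
    # One-sided test of a positive coefficient at the 5% level
    pt(t_stat, df, lower.tail = FALSE) < 0.05
  } else {
    NA
  }
}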
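
The following self-contained sketch illustrates the kind of one-sided test described above on simulated data. The helper rejects_at_5 is hypothetical and only mirrors that description. The data, the effect size of 2, and the second model (in which part does not vary, so no slope can be estimated) are assumptions made for illustration.

```r
set.seed(2024)

# Hypothetical helper: TRUE if a one-sided test rejects H0 at the 5% level,
# NA if the regression did not produce exactly two coefficients
rejects_at_5 <- function(fm) {
  coefs <- coef(summary(fm))
  if (nrow(coefs) == 2) {
    pt(coefs[2, "t value"], fm$df.residual, lower.tail = FALSE) < 0.05
  } else {
    NA
  }
}

# Simulated data with a large positive effect for part == 1
part <- rep(c(0, 1), each = 50)
y <- 2 * part + rnorm(100)

fm_ok <- lm(y ~ part)                               # part varies
fm_degenerate <- lm(y ~ part, subset = part == 0)   # part is constant

rejects_at_5(fm_ok)
rejects_at_5(fm_degenerate)
```

In the second model, the coefficient on part is not estimable, so the coefficient table from summary has a single row and the helper returns NA. Such cases are then dropped when averaging with na.rm = TRUE.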

manip_df %>% 
  unnest(results) %>% 
  group_by(level, manip_type, type) %>% 
  mutate(reject_null = map_lgl(model, h_test_5)) %>%
  summarize(prop_reject = mean(reject_null, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = level, y = prop_reject)) +
  geom_line() +
  facet_grid(type ~ manip_type)

Discussion questions

  1. How do the results of the figure above compare with those in Figure 4 of Dechow et al. (1995)?

  2. According to the SEC’s filing referenced above related to B&L, “B&L recognized, in contravention of GAAP and the Company’s own revenue recognition policies, $42.1 million of revenue, resulting in at least a $17.6 million, or 11%, overstatement of the net income originally reported for its 1993 fiscal year.” According to a subsequent SEC filing, B&L’s total assets for 1994 were $2,457,731,000 (it seems reasonable to assume that the 1993 value was not radically different from this). Based on this information (plus any information in the SEC’s filing), which of Dechow et al. (1995)’s three categories did B&L’s earnings management fall into? What is the approximate magnitude relative to the \(x\)-axes of the plots in Figure 4 of Dechow et al. (1995) (or the equivalent above)? Based on these data points, what is the approximate estimated probability of the various models detecting earnings management of this magnitude?

  3. What do you view as the implications of the power analysis conducted above for research on earnings management? Are these implications consistent with the extensive literature on earnings management subsequent to Dechow et al. (1995)? If so, explain why. If not, how would you reconcile the inconsistencies?

  4. Does each of the three forms of earnings management implemented in the manipulate function above agree precisely with the corresponding description in Dechow et al. (1995, pp. 201–202)? If not, does one approach seem more correct than the other? (Note that one issue arises with a negative or zero net income ratio. How are such cases handled by Dechow et al. (1995) and by the manipulate function?)