# 24 Extreme values and sensitivity analysis

In this chapter, we look at approaches used in handling the kinds of **extreme values** encountered in real-world data sets. We start the chapter with Leone et al. (2019), which provides guidance on how to address extreme values. We then examine Call et al. (2018), which explores a setting in which extreme values are present. We also use this chapter to explore the idea of **sensitivity analysis** using the approach of Frank (2000) seen in Call et al. (2018).

The code in this chapter uses the following packages. We load `tidyverse`

because we use several packages from the Tidyverse. For instructions on how to set up your computer to use the code found in this book, see Section 1.2. Quarto templates for the exercises below are available at https://github.com/iangow/far_templates.

## 24.1 Leone et al. (2019)

Leone et al. (2019) show that many papers in accounting research modify or eliminate observations with extreme values, as these can have an outsized influence on regression results. **Winsorization** defines extreme values as those above or below certain quantiles and replaces these with values *at* those quantiles (we saw winsorization in Chapter 25). Another approach is **truncation**, which is like winsorization, except that extreme observations are eliminated from subsequent analysis.

Leone et al. (2019) also cover two alternative approaches to handling extreme values. The first alternative approach is the use of diagnostic statistics to identify influential observations where “such observations are excluded from second-pass (or later) regressions estimated to draw a researcher’s ‘final’ inferences” (p. 343).

The second alternative approach is **robust regression**. While Leone et al. (2019) allude to an extensive literature on robust regression, we follow Leone et al. (2019) in focusing on so-called **MM-estimators**, which are made available via the `robustbase`

package.

Leone et al. (2019) provide three kinds of evidence regarding approaches to extreme values and influential observations. First, they survey the literature to gather data describing what researchers do to address extreme observations. The results of this effort are found in Table 1 of Leone et al. (2019).

Second, Leone et al. (2019) replicate three studies and examine how inferences vary by approach. Below we study one of these replications.

Third, Leone et al. (2019) provide simulation evidence on the effectiveness of the various approaches. We replicate this simulation evidence in Section 24.4.

### 24.1.1 Replication analysis

To better understand the approaches discussed in Leone et al. (2019), we will follow their replication of Chen et al. (2018), which is also the focus of material later in this chapter. While Chen et al. (2018) focus on fiscal years from 1996 through 2015, we get data for fiscal-year 1995, so that lagged values are available for 1996. Note that after collecting the data, we convert `fyear`

into a factor, which facilitates estimation of fixed effects and interaction terms.^{1}

```
db <- dbConnect(duckdb::duckdb())
company <- load_parquet(db, schema = "comp", table = "company")
funda <- load_parquet(db, schema = "comp", table = "funda")
```

```
comp <-
funda |>
filter(indfmt == "INDL", datafmt == "STD",
consol == "C", popsrc == "D") |>
filter(!between(sich, 6000, 6999) | is.na(sich)) |>
filter(between(fyear, 1995, 2015)) |>
group_by(gvkey) |>
window_order(fyear) |>
mutate(big_n = as.integer(au) %in% 1:8L,
lag_at = lag(at),
inv_at = if_else(lag_at > 0, 1 / lag_at, NA),
ta = if_else(lag_at > 0, (ib - oancf) / lag_at, NA),
roa = if_else(lag_at > 0, ib / lag_at, NA),
cfo = if_else(lag_at > 0, oancf / lag_at, NA),
mkt_cap = prcc_f * csho,
lag_mkt_cap = lag(mkt_cap),
size = if_else(lag_mkt_cap > 0, log(lag_mkt_cap), NA),
debt = coalesce(dltt, 0) + coalesce(dlc, 0),
lev = if_else(lag_at > 0, debt / lag_at, NA),
mtb = if_else(lag(ceq) > 0, lag_mkt_cap / lag(ceq), NA),
assets = lag_at,
d_sale = if_else(lag_at > 0, (revt - lag(revt)) / lag_at, NA),
d_ar = if_else(lag_at > 0, (rect - lag(rect)) / lag_at, NA),
ppe = if_else(lag_at > 0, ppent / lag_at, NA)) |>
ungroup() |>
select(gvkey, datadate, fyear, big_n, ta, big_n, roa, cfo, size, lev,
mtb, inv_at, d_sale, d_ar, ppe) |>
collect() |>
mutate(fyear = as.factor(fyear))
```

We create two functions that are quite similar in structure. The `winsorize()`

function offers more than we use here, as we stick to the default argument of `prob = 0.01`

, which is then mapped into `p_low = 0.01`

and `p_high = 0.99`

, which implements “the 1 percent and 99 percent winsorization rule common to accounting studies” (and also used in Chen et al. (2018)). The first line of the function uses `quantile()`

to identify the 1st and 99th percentiles of the distribution of `x`

and stores these two values in the vector `cuts`

. The `type`

argument specifies how ties are resolved and how averages at breakpoints are calculated. The `type`

argument should be “an integer between 1 and 9 selecting one of the nine quantile algorithms … to be used.”^{2}. By selecting `type = 2`

, we get the SAS default, which means we get the same values as we would using the SAS macro commonly used by other researchers. The third line of the `winsorize()`

function sets all values below `cuts[1]`

, which is the `p_low`

quantile, equal to `cuts[1]`

. The fourth line sets all values above `cuts[2]`

(the `p_high`

quantile) equal to `cuts[2]`

. The last line returns the modified value of `x`

.

The `truncate()`

function is almost identical, except that values outside the bounds in `cuts`

are set to `NA`

, which effectively removes them from later analysis.

To apply these functions we use the `mutate()`

function `across()`

all variables between `ta`

and `ppe`

inclusive (i.e., we do not apply them to `gvkey`

, `datadate`

, `fyear`

, or `big_n`

) and store the results in `comp_win`

and `comp_trunc`

.

We next run regressions. The first model we consider is essentially that found in Chen et al. (2018). As Leone et al. (2019) point out, `roa`

and `cfo`

can only be included in a model with `ta`

as the dependent variable when winsorization breaks the identity `ta = roa - cfo`

.^{3} In the second model, we use raw data, but drop `roa`

to make the model meaningful.

```
fms <- list(
"Wins, w/ROA" = feols(ta ~ big_n + roa + cfo + size + lev + mtb +
fyear * (inv_at + I(d_sale - d_ar) + ppe),
~ gvkey + fyear,
data = comp_win, na.action = na.exclude),
"Raw" = feols(ta ~ big_n + cfo + size + lev + mtb +
fyear * (inv_at + I(d_sale - d_ar) + ppe),
~ gvkey + fyear,
data = comp, na.action = na.exclude))
```

The third and fourth models tweak the second model (`"Raw"`

) by using winsorized and truncated data respectively. We use the `update()`

function to make the relationships between the models clearer and to avoid errors caused by unintended differences between models.

Given that we don’t use `comp_trunc`

other than to estimate `fms[["Trunc"]]`

, we could have used the following alternative to avoid creating `comp_trunc`

as a separate variable.

The fifth model we estimate uses Cook’s D to identify extreme observations for exclusion. As the `fixest`

objects returned by `feols()`

do not support the `cooks.distance()`

function, we estimate an OLS model and use that to identify extreme observations for exclusion.

The sixth and final model uses robust regression. Leone et al. (2019) use the `robreg`

module made available for Stata, which defaults to an asymptotic efficiency of 85% for the bisquare estimator used in the second step of the MM estimation procedure and we mimic this choice here. This efficiency level is obtained by specifying the `tuning.psi`

parameter as `3.4437`

.^{4}

An alternative would be to follow Koller and Stahel (2017) by selecting `control = lmrob.control("KS2014")`

. Koller and Stahel (2017) suggest that this approach “allows for … estimation in designs with factors and interactions between factors and continuous regressors”, which describes our setting here due to the interaction of `fyear`

(a factor) with `inv_at + I(d_sale - d_ar) + ppe`

(continuous variables).

Following the recommendation in Leone et al. (2019), we recover the weights from the fitted `lmrob`

object and feed those to an update of the `fixest`

model in `fms[["Raw"]]`

, as doing so allows us to use cluster-robust standard errors. Note that many values returned by `weights()`

will be `NA`

, so we use `coalesce(0)`

to convert those to zeros.

Results of estimating each of these models are shown in Table 24.1.

```
modelsummary(fms,
estimate = "{estimate}{stars}",
statistic = "({statistic})",
coef_omit = "(fyear|ppe|inv_at|d_sale)",
gof_map = c("nobs", "r.squared"),
stars = c('*' = .1, '**' = 0.05, '***' = .01))
```

Wins, w/ROA | Raw | Wins | Trunc | Cook's D | Robust | |
---|---|---|---|---|---|---|

(Intercept) | −0.031*** | 0.262 | −0.085*** | −0.119*** | −0.107*** | −0.020*** |

(−8.918) | (1.207) | (−7.305) | (−14.800) | (−10.322) | (−17.870) | |

big_nTRUE | 0.000 | −0.073 | 0.006 | 0.009** | 0.020*** | −0.010*** |

(0.254) | (−1.052) | (1.292) | (2.614) | (3.950) | (−18.968) | |

roa | 0.815*** | |||||

(63.992) | ||||||

cfo | −0.786*** | 0.799* | 0.257*** | 0.124*** | 0.194*** | −0.386*** |

(−37.418) | (1.763) | (8.918) | (5.573) | (5.211) | (−159.147) | |

size | 0.003*** | −0.066* | 0.004** | 0.007*** | 0.001 | 0.003*** |

(5.223) | (−1.857) | (2.660) | (5.680) | (0.686) | (18.439) | |

lev | −0.035** | −0.072 | −0.111*** | −0.031*** | −0.074*** | −0.019*** |

(−2.639) | (−0.344) | (−5.101) | (−4.478) | (−4.174) | (−22.238) | |

mtb | −0.001*** | −0.003 | −0.007*** | −0.006*** | −0.001*** | 0.000*** |

(−4.683) | (−0.931) | (−9.127) | (−11.030) | (−3.797) | (−19.849) | |

Num.Obs. | 113991 | 113991 | 113991 | 106581 | 113563 | 83529 |

R2 | 0.911 | 0.561 | 0.278 | 0.095 | 0.129 | 0.928 |

### 24.1.2 Discussion questions

- Table 5 of Leone et al. (2019) presents results from a simulation analysis. Which panel of that table likely best reflects the kind of data researchers encounter in practice?
- Consider the left half of Table 5 of Leone et al. (2019) (i.e., \(\beta_1 = 0\)). Using the panel you identified in question 1 above, interpret the implications of the simulation results for researchers in practice. Note that the mean estimates of \(\beta_1\) using Cook’s D or robust regression are very similar. Does Table 5 of Leone et al. (2019) provide strong support for rejecting Cook’s D in favour of robust regression?
- In the search for new research questions, accounting researchers increasingly need to study variables whose most plausible effect sizes are consistent with the right half of Table 5 of Leone et al. (2019) (i.e., \(\beta_1 = 0\)). Using the panel you identified in question 1 above, interpret the implication of the simulation results for researchers in practice.
- Table 2 of Leone et al. (2019) reports results similar to those given in columns (4), (5), and (6) of Table 24.1. They say, “[when we] perform estimation using RR, the coefficient on \(BIG\_N\) is \(-0.011\) (significant at the 1 percent level), opposite of what Chen et al. (2018) find, and instead consistent with the negative coefficient documented in prior studies using the two-step procedure.” In light of the simulation evidence, how persuasive do you find the evidence above in support of a negative coefficient on
`big_n`

? How probative is the evidence of prior studies if it is based on approaches (e.g., winsorization and two-step procedures) that Leone et al. (2019) and Chen et al. (2018) demonstrate lead to unreliable results?

## 24.2 Call et al. (2018)

Call et al. (2018, p. 123) examines “whether there is a link between whistleblower involvement and the outcomes of enforcement action” by US regulators. Specifically, they show that “whistleblower involvement is associated with larger monetary penalties for the targeted firms and longer prison sentences for targeted employees” (2018, p. 126).

Kuvvet (2019) points out that a relatively small number of observations account for the bulk of the monetary penalties imposed on firms, employees and others. The `cmsw_2018`

data frame provided with the `farr`

package includes data used in Call et al. (2018), which allows us to reproduce Table 2 of Kuvvet (2019) as Table 24.2.^{5}

```
cmsw_2018 |>
filter(tousesox == 1) |>
pivot_longer(cols = matches("(penalty|mos)"),
values_to = "amount", names_to = "target") |>
group_by(target) |>
arrange(desc(amount)) |>
mutate(rank = row_number()) |>
summarize(total = sum(amount),
total_top = sum(if_else(rank <= 6, amount, 0)),
.groups = "drop") |>
mutate(top_perc = round(total_top / total * 100, 1),
across(total:total_top, \(x) round(x, 0)))
```

target | total | total_top | top_perc |
---|---|---|---|

emppenalty | 21,176 | 15,877 | 75.0 |

empprisonmos | 17,181 | 3,658 | 21.3 |

firmpenalty | 13,578 | 7,396 | 54.5 |

otherpenalty | 11,107 | 9,684 | 87.2 |

### 24.2.1 Econometric claims in accounting research

Call et al. (2018, p. 134) concede the existence of “two challenges” for empirical analysis in their setting: “the large number of zero-valued observations (i.e., enforcement actions without any resultant penalties or criminal prison sentences) and the severe positive skewness in the dependent variable (i.e., some extremely large penalties).”

Call et al. (2018, p. 134) assert that “whereas other regression techniques … suffer from potentially severe bias … prior research shows that the Poisson pseudo-maximum likelihood (PPML) estimator … is a particularly effective modeling technique for data distributions characterized by a disproportionate number of zeros and severe skewness” and cite seven sources for this claim.

This approach to econometrics is fairly standard in accounting research. While it is rare to see econometric analysis of estimators in accounting research papers, it is common for claims to be made and for readers to be directed to research in economics, finance, or econometrics for support for those claims. Given the prevalence of this approach, we believe it is useful to demonstrate how a reader can evaluate such claims.

We consider each the seven cited sources as a basis for the claims made in Call et al. (2018). The first of these (Irarrazabal et al., 2013) can be eliminated, as it does not use this estimator, let alone provide econometric support for these claims.^{6} Similarly, Cameron and Trivedi (2009) and Wooldridge (2010) are standard econometrics references that say precisely nothing about the ability of Poisson regression to handle “a disproportionate number of zeros and severe skewness” in any setting, let alone in settings that do not use the count dependent variables that motivate the use of Poisson regression.

The two papers by Santos Silva and Tenreyo seem more relevant.^{7} Santos Silva and Tenreyro (2006, p. 642) note that “the pioneering work of Tinbergen (1962) initiated a vast theoretical and empirical literature on the gravity equation for trade. … In its simplest form, the gravity equation for trade states that the trade flow from country \(i\) to country \(j\), denoted by \(T_{ij}\), is proportional to the product of the two countries’ GDPs, denoted by \(Y_i\) and \(Y_j\), and inversely proportional to their distance, \(D_{ij}\)”.

This implies the following equation (in stochastic form):

\[ T_{ij} = \alpha_0Y_i^{\alpha_1} Y_j^{\alpha_2} D_{ij}^{\alpha_3} \eta_{ij}\] where \(\alpha_0\), \(\alpha_1\), \(\alpha_2\), and \(\alpha_3\) are unknown parameters and \(\eta_{ij}\) is an error factor assumed the independent of regressors. Santos Silva and Tenreyro (2006) note that “there is a long tradition in the trade literature of log-linearizing” the equation above and estimating using OLS:

\[ \ln T_{ij} = \alpha_0 + \alpha_1 \ln Y_i + \alpha_2 \ln Y_j + \alpha_3 \ln D_{ij} + \ln \eta_{ij} \]

Notwithstanding the popularity of this approach, Santos Silva and Tenreyro (2006) find that “the present of heteroskedasticity can generate strikingly difference estimates when the gravity equation is log-linearized, rather than estimated in levels [and] … that inferences drawn on log-linearized regressions can produce misleading conclusions.”

Fortunately, Santos Silva and Tenreyro (2011) summarize Santos Silva and Tenreyro (2006) as providing “simulation evidence that the PPML is well behaved in a wide range of situations and is resilient to the presence of a specific type of measurement error of the dependent variable.” However, Santos Silva and Tenreyro (2006) provides little evidence on the performance of PPML when the dependent variable is frequently zero and Santos Silva and Tenreyro (2011) seeks to address this by simulating a simple structural model in which the number and size of exporters is stochastic in a model that incorporate ideas from the gravity model, but yields zero trade levels for most of their simulated samples. Santos Silva and Tenreyro (2011) find that PPML performs very well in this simulated setting.

Before discussing the implications of these results, it is important to note that that PPML as used by Santos Silva and Tenreyro (2006) and Santos Silva and Tenreyro (2011) is nothing more or less than Poisson regression. An important property of the Poisson model is that the mean and variance of a Poisson-distributed random variable are equal. While statisticians have developed generalizations of the basic Poisson model to allow for (say) variance greater than the mean (this is called **overdispersion** and is common in real-world data), Santos Silva and Tenreyro (2006, p. 645) argue that such models “might give excessive weight to the observations that are more prone to measurement errors” in the context of trade data. As such, “the Poisson regression emerges as a reasonable compromise, giving less weight to the observations with larger variance than the standard NLS estimator, without giving too much weight to observations more prone to contamination by measurement error” (Santos Silva and Tenreyro, 2006, p. 646).

Now that we better understand the results of prior research (Santos Silva and Tenreyro, 2011, 2006), we can return to the claims of Call et al. (2018) provided above. Unfortunately, Call et al. (2018) provides no reason to believe that the results of Santos Silva and Tenreyro (2011) carry over to whatever model seems applicable to the setting of Call et al. (2018). There is no “vast theoretical and empirical literature” to suggest any model of the factors driving firm penalties, let alone that either the gravity model or the Santos Silva and Tenreyro (2011) model is applicable to this new setting. As such, the results of prior research are of little value in the Call et al. (2018) setting.

The reasoning of Call et al. (2018) seems analogous to saying that “prior research shows that when \(y\) is a function of \(X\) the best estimator is OLS with OLS standard errors” (in the sense used in Chapter 5) without noting that such results are applicable to the classical linear model with spherical errors and without justifying that model in the current setting.

One concern highlighted by the above discussion is the presence of fundamental asymmetries in the making and checking of claims like those in Call et al. (2018). *Making* such as claim is close to costless; a simple internet search is likely to yield papers supporting the use of a particular estimator in a context that shares features with the authors’ setting.

Additionally, the econometric knowledge needed to make such claims is close to zero. This is illustrated by footnote 13 of Call et al. (2018, p. 134), where it is claimed that “the primary difference between PPML and conventional Poisson regression is that PPML does not impose the assumption of equality in the first and second moments of the distribution.” Yet if the authors looked at their own Stata code, they would see that, as in Santos Silva and Tenreyro (2006), PPML *is* Poisson regression.^{8}

In contrast, *checking* such claims is far from costless. One needs to carefully understand what is going on in those papers and how they relate to the current setting and this requires some knowledge of econometrics. Adding to the asymmetry is the matter of incentives, authors have strong incentives to claim the appropriateness and robustness of those methods that yield results, whereas reviewers have very weak incentives of any kind.

Loose econometric analysis in accounting research is not new. Many researchers claimed that FM-NW and Z2 estimators of standard errors had properties that they did not have. These claims were either unsubstantiated or justified by citing papers that did not address them in any way. Once these claims were evaluated rigorously (Gow et al., 2010), they were found to have no merit.

In light of the discussion above, the response of Call et al. (2019) to the concern of Kuvvet (2019) about influential observations that Call et al. (2018) “addresses the role of extreme observations in enforcement actions with an estimator designed specifically to handle skewed data (Poisson pseudo-maximum likelihood)” seems unconvincing.

### 24.2.2 Replicating Call et al. (2018)

To better understand the estimator used by Call et al. (2018), we replicate their Table 4 below. First, we prepare the data. Following Call et al. (2018), we winsorize a subset of the control variables at the 1st and 99th percentiles. Second, we follow Call et al. (2018) in calculating a log-transformation of the dependent variables considered (we will use these transformed variables below). We use `set_names()`

so that the result of applying `map()`

to `yvars`

retains the values of `yvars`

in the names of the elements. We also convert `ff12`

to a factor variable and any logical variables to integers.^{9}

```
yvars <- set_names(c("firmpenalty", "emppenalty", "empprisonmos"))
cmsw <-
cmsw_2018 |>
mutate(across(c(blckownpct, initabret, pctinddir, mkt2bk, lev),
winsorize),
across(any_of(yvars), \(x) log(1 + x), .names = "ln_{.col}"),
ff12 = as.factor(ff12),
across(where(is.logical), as.integer)) |>
filter(tousesox == 1)
```

Next, we create a function `get_poisson_fit()`

to estimate Poisson regressions using `glm()`

. Because we refer to the same set of controls in other contexts below, we specify `controls`

outside `get_poisson_fit()`

.

```
x <- "wbflag"
controls <- c("selfdealflag", "blckownpct", "initabret", "lnvioperiod",
"bribeflag", "mobflag", "deter", "lnempcleveln",
"lnuscodecnt", "viofraudflag", "misledflag", "audit8flag",
"exectermflag", "coopflag", "impedeflag", "pctinddir",
"recidivist", "lnmktcap", "mkt2bk", "lev", "lndistance",
"ff12")
get_poisson_fit <- function(y, df = cmsw) {
form <- as.formula(str_c(y, " ~ ",
str_c(c(x, controls), collapse = " + ")))
fm <- glm(form, family = "poisson", data = df,
control = glm.control(maxit = 100))
}
```

We then use `get_poisson_fit()`

to get regression results.

`fms <- map(yvars, get_poisson_fit)`

We use `coeftest()`

to return “robust” standard errors that match those used in Call et al. (2018). In addition to `type = "HC1"`

, `vcovHC()`

offers `"HC2"`

and `"HC3"`

. The differences between these options relate to degrees-of-freedom corrections that have the greatest impact with small samples. Chapter 8 of Angrist and Pischke (2008) provides more discussion and explanation of HC1, HC2, and HC3.

Finally, we tabulate the results in Table 24.3. To save space, we omit estimated coefficients for the control variables.^{10}

```
modelsummary(map(fms, get_coefs),
estimate = "{estimate}{stars}",
statistic = "{statistic}",
coef_map = "wbflag",
gof_map = c("nobs", "r.squared"),
stars = c('*' = .1, '**' = 0.05, '***' = .01))
```

firmpenalty | emppenalty | empprisonmos | |
---|---|---|---|

wbflag | 1.250* | 1.111*** | 0.621** |

1.949 | 2.637 | 2.372 | |

Num.Obs. | 658 | 658 | 658 |

Call et al. (2019) flag concerns raised by Kuvvet (2019) about the “tipster” variable as an example of “claims offered by Kuvvet [that] reflect a misunderstanding of both the enforcement action setting and the whistleblower designations in CMSW”. Call et al. (2019) argue that “CMSW’s ‘tipster’ and ‘non-tipster’ whistleblower designations are defined specifically in relation to the enforcement process and are not an attempt to identify the individual who first uncovered the misconduct.” However, Call et al. (2018) say “some whistleblowers sound the first alarm about potential violations, bringing to the attention of regulators possible violations that could be investigated. We refer to these whistleblowers as *tipsters*.”

### 24.2.3 Discussion questions

Suppose you were a regulator interested in understanding the effects of whistleblowers on enforcement outcomes. How might you design an experiment to examine these effects? What challenges would you expect to face in implementing your experiment? How would the experiment different from the setting of Call et al. (2018)?

As an additional example of “misunderstanding” by Kuvvet (2019), Call et al. (2019) claim that “Kuvvet argues that CMSW’s findings speak to correlation rather than causation. The published version of CMSW makes this point clearly throughout the paper.” What claim in Kuvvet (2019) are Call et al. (2019) addressing here? Do you agree that “CMSW makes this point clearly throughout the paper”?

“The published version of CMSW empirically addresses the role of extreme observations in enforcement actions with an estimator designed specifically to handle skewed data (Poisson pseudo-maximum likelihood) and with additional robustness tests, including one focused on the incidence rather than the magnitude of penalties.” As we see in Table 24.3, the estimator used in Call et al. (2018) is standard Poisson regression, which is called in R using the

`glm()`

function with`family = "poisson"`

. How might we use data sets covered in Section 24.1 or Section 24.4 to evaluate the claim that Poisson regression estimator is “designed specifically to handle skewed data”?Could we use approaches covered in Section 24.1 or Section 24.4 to address extreme observations in the setting of Call et al. (2018)?

Call et al. (2019) argue that “unlike many other settings in accounting, finance, and economics where the focus is often on the average firm, the enforcement action setting is inherently extreme” and claim that this is another example of misunderstanding” in Kuvvet (2019). Do you agree that “the enforcement action setting is inherently extreme”? Does this inherent extremeness undermine the arguments of Kuvvet (2019)?

What claim by Kuvvet (2019) are Call et al. (2019) trying to refute regarding the “tipster” variable? Do you find this response convincing?

## 24.3 Sensitivity analysis

In general, **sensitivity analysis** is used to assess how outputs differ as inputs or assumptions are varied. If we could assume that assignment of whistleblower status to an enforcement action is (conditional on control variables) unconfounded in the setting of Call et al. (2018), then we can interpret regression coefficients as estimates of causal effects.^{11} For this reason, we focus in this section on the analysis of the sensitivity of inferences to posited violations of the unconfoundedness of treatment assignment.

In practice, whistleblowers are not just randomly assigned to enforcement actions. Rather there is the possibility that the factors driving assignment of whistleblowers also affect the enforcement outcomes, making them confounders in the sense studied in Section 4.2. If these factors are not observed by the researcher, then they cannot be controlled for in empirical analysis.

Even in this narrow sense of the term we use here, there are various approaches to sensitivity analysis that can be employed. In this section, we apply the approach proposed by Frank (2000), which involves the calculation of the **impact threshold for a confounding variable** or **ITCV**.

To proceed with the calculation of ITCV, we begin by estimating OLS regressions, as the Frank (2000) approach is predicated on OLS regressions.

Following Call et al. (2018), we use the log-transformed versions of the confounding variable and robust standard errors.

Regression results analogous to those in columns (7) through (9) of Panel A of Table 8 of Call et al. (2018) are reported in Table 24.4.

```
modelsummary(map(fms, get_coefs),
estimate = "{estimate}{stars}",
statistic = "{statistic}",
coef_map = "wbflag",
gof_map = c("nobs", "r.squared"),
stars = c('*' = .1, '**' = 0.05, '***' = .01))
```

firmpenalty | emppenalty | empprisonmos | |
---|---|---|---|

wbflag | 0.484*** | 0.072 | 0.181 |

3.316 | 0.627 | 1.118 | |

Num.Obs. | 658 | 658 | 658 |

The impact of a confounding variable in this context is the effect it would have on the estimated coefficient of the variable of interest if it were included in the regression analysis. In the context of OLS regression in which we are regressing \(y\) on \(x\) and controls \(Z\), we know from Section 3.3 that the regression coefficient \(\beta\) on \(x\) can be calculated from a regression of the residuals of a regression of \(y\) on \(Z\) against the residuals of a regression of \(x\) on \(Z\). If we consider a confounding variable \(cv\), then its impact comes from (i) its effect on the residuals of a regression of \(y\) including both \(Z\) and \(cv\) and (ii) its effect on the residuals of a regression of \(x\) including both \(Z\) and \(cv\).

It turns out that these two quantities relate to **partial correlations**. A partial correlation between \(x\) and \(y\) measures the degree of association between \(x\) and \(y\), with the effect of a set of control variables (say \(Z\)) removed. The partial correlations of interest will be that between \(y\) and the hypothetical confounding variable \(cv\) (\(r_{y \cdot cv}\)) and that between \(x\) and the hypothetical confounding variable \(cv\) (\(r_{x \cdot cv}\)).

The ITCV can be expressed in terms of the product \(r_{y \cdot cv} \times r_{x \cdot cv}\). In effect, the ITCV represents the minimum value of this product that would turn a positive and statistically significant coefficient into a positive and *just* statistically significant coefficient, with impact values greater that the ITCV leading to statistically insignificant coefficients.

In Table 24.4, we have a statistically significant coefficient in just one regression (the one with firm penalties as the dependent variable), so like Call et al. (2018) we focus on this regression in calculating ITCV.^{12}

The `get_itcvs()`

function below calculates the ITCV for \(\alpha \in \{0.01, 0.05, 0.10\}\). While we refer the reader to Frank (2000) for details on the mathematics, it seems noteworthy that we extract just two quantities from the fitted model: the \(t\)-statistic for the coefficient on the variable of interest (`var`

) and the degrees of freedom for the model. Lower \(\alpha\) values imply higher critical values for the \(t\)-statistic, hence lower threshold values.

```
get_itcvs <- function(fm, var) {
coefs <- get_coefs(fm)
tstat <- coefs[var, "t value"]
df <- df.residual(fm)
numer <- tstat^2 / df
r_yx <- sqrt(numer / (1 + numer))
tibble(alpha = c(0.01, 0.05, 0.1)) |>
mutate(crit_val = qt(1 - alpha / 2, df),
rhash = crit_val / sqrt(df + crit_val^2),
itcv = (r_yx - rhash) / (1 - abs(rhash))) |>
select(-rhash)
}
```

We can now apply the `get_itcvs()`

function to the first model and the variable of interest (`wbflag`

). Results are reported in Table 24.5.^{13}

`get_itcvs(fms[[1]], "wbflag")`

alpha | crit_val | itcv |
---|---|---|

0.01 | 2.584 | 0.032 |

0.05 | 1.964 | 0.058 |

0.10 | 1.647 | 0.070 |

Evaluating ITCVs requires some sense of plausible magnitudes for \(r_{y \cdot cv}\) and \(r_{x \cdot cv}\). Absent some prior beliefs about confounding variables, a set of natural benchmarks comes from the control variables included in the regression (\(Z\)). Inclusion of these control variables is (or should be) based on these variables being confounding variables themselves.

As such we can calculate \(r_{y \cdot z}\) and \(r_{x \cdot z}\) for each \(z \in Z\). To do, we first create a function `pcor()`

to calculate partial correlations. This function is adapted from code for the the `pcor()`

function in the `ppcor`

package: it is trimmed down to focus on our limited needs here, which are limited to the vector of partial correlations between the first variable in `x`

and the remaining variables.

The `pcor()`

function requires data in matrix form, which in turn requires that each factor variable be encoded as a set of indicator variables. The `get_factors()`

flags the factor variables in a data frame.

The `make_dummies()`

function converts `var`

in the data frame `df`

to a set of dummies.

Finally `convert_factors()`

calls `get_factors()`

to identify factor variables then `make_dummies()`

to convert each of these.

```
convert_factors <- function(df) {
vars <- get_factors(df)
for (var in vars) {
df <- make_dummies(df, var)
}
df
}
```

Finally, we create the function `get_impacts()`

to calculate the elements of impact for the control variables.

```
get_impacts <- function(df, y, x, controls) {
corr_data <-
df |>
select(any_of(c(y, x, controls))) |>
convert_factors() |>
na.omit()
r_yz <-
corr_data |>
select(-all_of(x)) |>
pcor()
r_xz <-
corr_data |>
select(-all_of(y)) |>
pcor()
var <- names(corr_data |> select(-any_of(c(x, y))))
tibble(var, r_yz, r_xz) |>
mutate(impact = r_yz * r_xz) |>
arrange(desc(impact))
}
```

In Table 24.6 the impact of the 10 most impactful control variables are shown. Only one of these control variables (`lnmktcap`

) has an impact exceeding the ITCV for \(\alpha = 0.05\), suggesting that the conjectured control variable would have to rank second on this list of controls to have enough impact to drive the \(p\)-value above \(0.05\). As such, the OLS results reported in Table 24.4 for firm penalties might be interpreted as somewhat robust.

```
get_impacts(cmsw,
y = "ln_firmpenalty",
x = "wbflag",
controls = controls) |>
top_n(n = 10, wt = impact)
```

var | r_yz | r_xz | impact |
---|---|---|---|

lnmktcap | 0.332 | 0.274 | 0.091 |

bribeflag | 0.289 | 0.035 | 0.010 |

lnvioperiod | 0.140 | 0.053 | 0.007 |

lev | 0.070 | 0.100 | 0.007 |

blckownpct | -0.092 | -0.075 | 0.007 |

lnuscodecnt | 0.158 | 0.036 | 0.006 |

mkt2bk | -0.060 | -0.079 | 0.005 |

misledflag | -0.051 | -0.079 | 0.004 |

initabret | 0.098 | 0.040 | 0.004 |

recidivist | 0.034 | 0.080 | 0.003 |

However, a concern with this analysis is that it may be unrealistic to imagine that a single confounding variable has been omitted from the analysis. This concern seems accentuated in a setting where our understanding of the factors driving assignment to treatment is limited. It seems plausible that the factors driving the presence of whistleblowers are complex and poorly captured by the set of controls used in Call et al. (2018).

Another concern is that there is some arbitrariness to the measurement of controls. For example, \(z_1\), \(z_2\), and \(z_3\) might all be noisy proxies for an underlying confounder \(z\). Including all three variables might make sense to control for \(z\), but the impact of each variable is likely to be low because of multicollinearity, thus understating the impact of \(z\).

One approach to addressing this is to consider the controls as a single variable. The fitted value \(\hat{z}\) from a regression of \(x\) on \(Z = \{z_i \}\) can be used a single control in a regression where \(y\) is the dependent variable without affecting the coefficient on \(x\).^{14} As such, we can examine the impact of \(\hat{z}\) as a composite control.

We embed this logic in the function `get_combined_impact()`

.

```
get_combined_impact <- function(df, y, x, controls) {
form <- as.formula(str_c(x, " ~ ",
str_flatten(controls, collapse = " + ")))
fm <- lm(form, data = df)
df |>
mutate(z = fitted(fm)) |>
get_impacts(y = y,
x = x,
controls = "z")
}
```

Results from `get_combined_impact()`

are provided in Table 24.7. There it can be seen that the impact of the included controls as a set (0.273) is much higher than the ITCV for \(\alpha = 0.05\) (0.0578), suggesting that the impact of omitted confounders only has to be a fraction of the impact of included variables for the results to be statistically insignificant.

```
get_combined_impact(cmsw,
y = "ln_firmpenalty",
x = "wbflag",
controls = controls)
```

var | r_yz | r_xz | impact |
---|---|---|---|

z | 0.553 | 0.494 | 0.273 |

## 24.4 Appendix: Simulation study from Leone et al. (2019)

Here we present code to run simulations like those underlying Table 5 of Leone et al. (2019). One reason we present this here is to help the reader better understand the results in Table 5.

An additional reason for presenting this particular simulation is to illustrate a modular approach to simulations that is fostered by use of R functions. This approach potentially offers a template for readers looking to develop simulations.

A big-picture view of the simulation is that the `run_sim()`

function uses `get_data()`

to generate the data, then feeds these data to `fit_models()`

, then sends the returned fitted models to `compile_stats()`

. The `get_results()`

function feeds parameter values from `params`

to the `run_sim()`

function.

The first step is to produce the data. If you wanted to change the data-generating process, this could be achieved simply by editing this function. The purpose of this function is to replicate the data-generating process described in Leone et al. (2019).^{15} Rather than create separate code for each of the three panels of Table 5, we create one function and use argument values to select the data-generating process used to create the data. For example, the default values (`b_true = 0.8`

, `clean = FALSE`

, and `random = TRUE`

) produce data matching the left half of Panel C of Table 5.

```
get_data <- function(b_true = 0.8, clean = FALSE,
random = TRUE, n = 10000) {
prop_cont <- if_else(random, 0.02, 0.25)
df <- tibble(x1 = rnorm(n = n),
x2 = rnorm(n = n),
x3 = rnorm(n = n),
v = rnorm(n = n),
y = b_true * x1 + 0.4 * x2 + 0.2 * x3 + v,
z = rnorm(n = n, mean = 3, sd = 1)) |>
mutate(id = row_number(),
contaminated =
case_when(!clean & !random ~ id < prop_cont * n & x1 < -1.5,
!clean & random ~ id < prop_cont * n,
.default = FALSE),
cy = if_else(contaminated, y + z, y),
cyw = winsorize(cy), x1w = winsorize(x1),
x2w = winsorize(x2), x3w = winsorize(x3),
cyt = truncate(cy), x1t = truncate(x1),
x2t = truncate(x2), x3t = truncate(x3))
df
}
```

Another benefit of putting `get_data()`

in a separate function that returns a data frame is that we can examine the pieces one at a time.

`get_data()`

```
# A tibble: 10,000 × 17
x1 x2 x3 v y z id contaminated cy cyw
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <lgl> <dbl> <dbl>
1 -0.890 1.19 0.873 0.285 0.225 2.77 1 TRUE 2.99 2.99
2 -1.08 -1.52 1.08 0.303 -0.950 3.32 2 TRUE 2.37 2.37
3 0.113 -1.39 1.76 -0.201 -0.314 2.28 3 TRUE 1.96 1.96
4 -1.37 0.333 -1.38 1.38 0.150 2.64 4 TRUE 2.79 2.79
5 -0.560 -0.203 1.20 -0.105 -0.394 3.48 5 TRUE 3.09 3.09
6 0.481 0.943 0.325 0.427 1.25 2.16 6 TRUE 3.42 3.42
7 0.458 1.55 -1.26 0.923 1.66 2.23 7 TRUE 3.89 3.63
8 -1.56 0.207 0.180 0.0936 -1.03 4.03 8 TRUE 3.00 3.00
9 0.738 -0.673 1.26 -0.843 -0.269 1.32 9 TRUE 1.05 1.05
10 0.106 -0.928 1.96 0.906 1.01 3.20 10 TRUE 4.21 3.63
# ℹ 9,990 more rows
# ℹ 7 more variables: x1w <dbl>, x2w <dbl>, x3w <dbl>, cyt <dbl>,
# x1t <dbl>, x2t <dbl>, x3t <dbl>
```

The next function fits the models of interest. Following Leone et al. (2019), we fit seven models and store each model as an element of a list, as this makes manipulation of the models (e.g., extracting statistics) easier.

```
fit_models <- function(df) {
fms <- list(lm(cy ~ x1, data = df))
fms[[2]] <- update(fms[[1]], cyw ~ x1w + x2w + x3w)
fms[[3]] <- update(fms[[1]], cy ~ x1w + x2w + x3w)
fms[[4]] <- update(fms[[1]], cyt ~ x1t + x2t + x3t)
fms[[5]] <- update(fms[[1]], cy ~ x1t + x2t + x3t)
fms[[6]] <-
df |>
mutate(cooksd = cooks.distance(fms[[1]]),
extreme = cooksd > 4 / nobs(fms[[1]])) |>
filter(!extreme) |>
update(fms[[1]], data = _)
fmrob <- lmrob(formula(fms[[1]]),
data = df,
method = "MM",
control = lmrob.control(tuning.psi = 3.4437))
fms[[7]] <- fmrob
fms
}
```

For the simulation output, we want to retain estimated coefficients and results of tests of the null hypothesis at sizes of 5% and 1%. We create the `extract_stats()`

function to do this, as this makes for leaner, easier-to-debug code.

We next create `compile_stats()`

, which applies `extract_stats()`

to a list of models (such as those returned by `fit_models()`

) and puts the results in a data frame with human-friendly labels for each of the models.

```
compile_stats <- function(fms, b_null) {
map(fms, extract_stats, b_null = b_null) |>
list_rbind() |>
mutate(model_num = row_number(),
model = c("Do nothing",
"Winsorize all variables", "Winsorize only X",
"Truncate all variables", "Truncate only X",
"Cook's Distance", "Robust regression"))
}
```

Now we create `run_sim()`

, which runs one iteration of the simulation and returns a data frame with parameter values and statistics from the simulation.

```
run_sim <- function(sim_num = 1, b_true = 0.8, b_null = b_true,
clean = FALSE, random = FALSE) {
get_data(b_true = b_true, clean = clean, random = random) |>
fit_models() |>
compile_stats(b_null = b_null) |>
mutate(sim_num = sim_num, b_true = b_true, b_null = b_null,
clean = clean, random = random)
}
```

We now compile a data frame with the parameters we will run in the simulation. We will consider different two different values of each of `b_true`

, `clean`

, and `random`

. Note that when `clean`

is `TRUE`

, the value of `random`

has no significance, so we delete the case where `clean`

is `TRUE`

and `random`

is also `TRUE`

to avoid running redundant parameter values.

```
params <- expand_grid(b_true = c(0, 0.8),
clean = c(TRUE, FALSE),
random = c(FALSE, TRUE)) |>
filter(!(clean & random))
```

Now we finally have almost all the pieces to run the simulation. We will do this using `get_results()`

, which uses `future_map()`

from the `furrr`

package to run in multiple threads for substantially shorter run times.

```
num_sims <- 1000L
get_results <- function(b_true = 0, clean = FALSE, random = FALSE, ...) {
future_map(1:num_sims, run_sim, b_true = b_true,
clean = clean, random = random,
.options = furrr_options(seed = 2021)) |>
list_rbind()
}
```

The following step, which actually runs `get_results()`

, takes more than a minute to run and produces data for all panels of Table 5 of Leone et al. (2019).

```
plan(multisession)
all_results <-
pmap(params, get_results) |>
list_rbind() |>
system_time()
```

```
user system elapsed
5.548 0.205 96.247
```

Table 24.8 provides a version of the left half of Panel B of Table 5 of Leone et al. (2019).

```
all_results |>
filter(b_true == 0.8, random == FALSE, clean == FALSE) |>
group_by(b_true, clean, random, model_num, model) |>
summarize(across(est:p_1, mean), .groups = "drop") |>
select(model, est, p_5, p_1)
```

model | est | p_5 | p_1 |
---|---|---|---|

Do nothing | 0.703 | 1.000 | 1.000 |

Winsorize all variables | 0.705 | 1.000 | 1.000 |

Winsorize only X | 0.715 | 1.000 | 1.000 |

Truncate all variables | 0.682 | 1.000 | 1.000 |

Truncate only X | 0.710 | 1.000 | 1.000 |

Cook’s Distance | 0.760 | 0.929 | 0.828 |

Robust regression | 0.774 | 0.523 | 0.281 |

While much of the following code resembles that used in replicating Lawrence et al. (2011) in Chapter 25, there are minor differences.↩︎

See help by typing

`? quantile`

at the R console; R Core Team (2021)↩︎Actually this is the model estimated in column (1) of Table 2 of Leone et al. (2019), as a model estimated with “raw data” would not work. Additionally, Andy Leone confirms that the coefficients on

`roa`

and`cfo`

have been switched in Table 2 of Leone et al. (2019).↩︎See p.31 of Maronna et al. (2019). The Stata routine

`robreg`

uses the function`mm_biweight_k`

from the`moremata`

routine to convert efficiency to the equivalent of the`tuning.psi`

parameter.↩︎There is a slight discrepancy apparent here for employee penalties, for which Kuvvet (2019) appears to have summed the top 7 values instead of the top 6.↩︎

Footnote 13 of Irarrazabal et al. (2013, p. 80) says “the Poisson estimator … delivers qualitatively similar results (available on request)”.↩︎

Tenreyro (2007) involves an extension of Santos Silva and Tenreyro (2006) involving instrumental variables and directs readers to Santos Silva and Tenreyro (2006) regarding its efficiency and robustness. The seventh paper (Karolyi and Taboada, 2015) has a more careful linking of prior research to its setting, but does not provide independent evidence of the effectiveness of PPML.↩︎

Even if the authors were using the

`ppml`

Stata routine provided by the authors of Santos Silva and Tenreyro (2006), they would see that it’s just Poisson regression by default.↩︎The latter transformation does not affect the substance of the regressions, but does simplify their presentation.↩︎

The coefficients on the control variables are reported in Table 4 of Call et al. (2018).↩︎

Of course, even in such a case we need to be mindful of functional-form issues of the kind explored in Chapter 25.↩︎

The ITCV is not meaningfully defined when the coefficient of interest is not statistically significant even without the inclusion of a hypothetical confounding variable.↩︎

He we see a slight difference from the

`0.059`

reported in Call et al. (2018), perhaps due to their using`1.96`

as the critical value in place of the more exact value returned by`qt()`

in our code here.↩︎Consistency might suggest \(\hat{x}\) as the better notation, but \(\hat{z}\) seems to better capture what we are aiming for here.↩︎

We thank Miguel Minutti-Meza for sharing his Stata code with us, but note that the description of the simulation in the Leone et al. (2019) is clear and complete and sufficient to allow a reader to reproduce it without this code.↩︎