5 Causal inference

The importance of causal inference in accounting research is clear from the research questions that accounting researchers seek to answer. Most long-standing questions in accounting research are causal:

  • Does conservatism affect the terms of loan contracts?
  • Do higher quality earnings reports lead to lower information asymmetry?
  • Did International Financial Reporting Standards cause an increase in liquidity in the jurisdictions that adopted them?
  • Do managerial incentives lead to managerial misstatements in financial reports?

Gow et al. (2016) provide a taxonomy for empirical research papers that comprises four categories:

  • descriptive research papers
  • papers focused on prediction
  • papers that focus on measurement of some construct
  • papers that seek—whether explicitly or not—to draw causal inferences

Gow et al. (2016) find that most original research papers in the top three accounting research journals use observational data and that about 90% of these papers fall into the last category because they seek to draw causal inferences.

That accounting researchers focus on causal inference is consistent with the view that “the most interesting research in social science is about questions of cause and effect” (Angrist and Pischke, 2008, p. 3). While you may hear people talk about “interesting associations” at times, the reality is that associations (or correlations) are only interesting if there’s an interesting possible causal explanation for them. Associations that are mere coincidence, such as those found here, are not at all interesting.

At times, authors appear to disclaim any intention to draw causal inferences. Bertrand and Schoar (2003) is fairly typical: “There is no such thing as a random allocation of top executives to firms. Therefore, we are not hoping in this section to estimate the causal effect of managers on firm practices. Instead, our objective is more modest. We want to assess whether there is any evidence that firm policies systematically change with the identity of the top managers in these firms.”

There are at least two issues with this claim. First, why would anyone be interested in “evidence that firm policies systematically change with the identity of the top managers” if such changes come with no understanding as to why they change? Second, this claim is a bit of a pretence. The title of the paper is, after all, “Managing with style: The effect of managers on firm policies” and the first sentence of the abstract is “This paper investigates whether and how individual managers affect corporate behavior and performance” (emphasis added).

As another example, suppose a researcher argues that a paper claiming that “theory predicts \(X\) is associated with \(Y\) and, consistent with that theory, we show \(X\) is associated with \(Y\)” is merely a descriptive paper that does not make causal inferences. However, theories are invariably causal in that they posit how exogenous variation in certain variables leads to changes in other variables. Further, by stating that “consistent with … theory, \(X\) is associated with \(Y\)”, the clear purpose is to argue that the evidence tilts the scale, however slightly, in the direction of believing the theory is a valid description of the real world: in other words, a causal inference is drawn. A paper that argues that \(Z\) is a common cause of \(X\) and \(Y\) and claims to find evidence of this is still making causal inferences (i.e., that \(Z\) causes \(X\) and \(Z\) causes \(Y\)).

Making causal inferences requires strong assumptions about the causal relations among variables. For example, as discussed below, estimating the causal effect of \(X\) on \(Y\) requires that the researcher has controlled for variables that could confound estimates of such effects.

Recently, some social scientists have argued that better research designs and statistical methods can increase the credibility of causal inferences. For example, Angrist and Pischke (2010) suggest that “empirical microeconomics has experienced a credibility revolution, with a consequent increase in policy relevance and scientific impact.” Angrist and Pischke (2010, p. 26) argue that such “improvement has come mostly from better research designs, either by virtue of outright experimentation or through the well-founded and careful implementation of quasi-experimental methods.”

The code in this chapter uses the following packages. For instructions on how to set up your computer to use the code found in this book, see Chapter 1.2.1.

library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(stargazer)

We use the stargazer package to produce neat output from regressions. For the HTML version of this book, we set sg_format to "html", but "text" would be a better option if looking at the results interactively, and you would probably use "latex" if compiling a PDF.

sg_format <- "html"

5.1 Econometrics

Empirical financial accounting research can be viewed as fundamentally a highly specialized area of applied microeconomics. Financial accounting researchers typically take classes (at some level) in microeconomics, statistics, and econometrics, either before or in parallel with more specialized classes in accounting research. This model has two significant gaps that budding researchers need to address.

The first gap is the translation of ideas from econometrics and microeconomics into what researchers do from day to day. As we will see, sometimes things get lost in this translation. One goal of this book is to reduce the gap on the empirical methods side.

But it’s the second gap that we want to address here and this gap affects not only financial accounting researchers, but applied economic researchers more generally. This gap is between what accounting researchers want to do, which is causal inference, and what econometrics textbooks talk about, such as consistency, unbiasedness, and asymptotic variance.

5.1.1 Fictional history of econometrics textbooks

Someone somewhere wrote the first econometrics textbook. Let’s call that someone Orinoco of Wimbledon. Now, Orinoco was a young professor who had been following work such as Wright (1921) and Haavelmo (1944) and who wanted to write a textbook to bring the new ideas to budding economists around the world. Early in Orinoco’s textbook, he stated:

Suppose we have the following structural model of an economic phenomenon \(y\):

\[ y = X \beta + \epsilon \] where \(y\) and \(\epsilon\) are \(N\)-element vectors, \(X\) is an \(N \times K\) matrix, and \(\beta\) is a \(K\)-element vector of coefficients. If we knew the coefficients in \(\beta\), then we could understand how changes in \(X\) cause changes in \(y\). For example, if \(x_{ik}\) goes from \(0\) to \(1\), then we expect \(y_i\) to increase by \(\beta_k\). In other words, \(\beta_k\) can be viewed as the (causal) effect of \(x_{k}\) on \(y\). Of course, in reality, we don’t know \(\beta\); we need to estimate it using data. The rest of this textbook is devoted to explaining how (and when) we can estimate \(\beta\) accurately and efficiently.

The precise details of what happened next are lost to (fictional) history. But it seems that Orinoco was heading home one day when he was accosted by the faculty toughs from his department. These hard-nosed economists ridiculed Orinoco mercilessly. Tomsk, the larger of the group, said: “That’s not a real structural model. A real structural model has agents optimizing given preferences and technologies. A real structural model is certainly not linear.” Wellington chimed in: “And all this nonsense about ‘causation’ … what does this even mean? Just focus on statistical properties of your estimators.” Orinoco was shaken by the incident in the car park and even feared for his physical safety. To prevent it from happening again, Orinoco rushed out a second edition of his textbook with the following passage in place of the one above:

Suppose we have the following population model of \(y\):

\[ y = X \beta + \epsilon \] where \(y\) and \(\epsilon\) are \(N\)-element vectors, \(X\) is an \(N \times K\) matrix, and \(\beta\) is a \(K\)-element vector of coefficients. The rest of this textbook is devoted to explaining how (and when) we can estimate \(\beta\) accurately and efficiently.

Subsequently, standard textbooks seem to have followed the lead of Orinoco’s second edition. If you look at the indexes of some standard textbooks from just a few years ago, you will see no entries for causal, causation, or similar terms. But we would argue that econometrics properly conceived is the social science concerned with estimation of parameters of (structural) economic models. We would further argue that any time we believe we can read off an estimate of a causal effect from an econometric analysis, we are estimating a structural model, even if the model we are using would not impress Tomsk. For example, if we randomly assign observations to treatment and control, then most would agree that we can (under certain assumptions) read causal effects off a regression of the outcome of interest on the treatment indicator. So this is a “structural” model, albeit a very simple one.
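The random-assignment case can be illustrated with a short simulation. This is a minimal sketch rather than an analysis from the text: the variable names (ability, treated, outcome) and the true treatment effect of 2 are assumptions chosen purely for illustration.

```r
set.seed(2021)

n <- 100000

# Hypothetical data: 'ability' affects the outcome but is never observed
# by the researcher; treatment is assigned by a fair coin flip.
ability <- rnorm(n)
treated <- rbinom(n, 1, 0.5)
outcome <- 2 * treated + 3 * ability + rnorm(n)

# Regress the outcome on the treatment indicator alone.
fm_rct <- lm(outcome ~ treated)
coef(fm_rct)["treated"]
```

Because assignment is random, treated is independent of ability, so the coefficient on treated should be close to the assumed effect of 2 even though ability is omitted from the regression.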

Of course, Tomsk is correct in some ways. Sometimes the reasoning behind the use of particular models is poor, and models implied by what researchers do can be poorer than they need to be. But models are always simplifications of reality and thus always in some sense “wrong”. And, even if we accept a model in a given setting, it is often the case that we “know” that our estimators are unlikely to provide unbiased estimates of the model’s true parameter values. Yet that does not change what researchers are trying to do when they conduct empirical analysis using econometric techniques.

The definition of econometrics that we provide here is not vacuous; not everything that accounting researchers do with econometric techniques could be described as econometric analysis (even if we’re charitable). At times, researchers run regressions without giving any thought as to the model they are trying to estimate. As we will see later in this book, there are settings where accounting researchers have drawn (causal) inferences from estimated coefficients about economic phenomena whose connection to the empirical models used is very far from clear. In other settings, researchers will implicitly assume that the null hypothesis implies a zero coefficient on some variable without doing any modelling of this. One of the goals of this book is to encourage researchers to keep in mind what we are trying to do when we conduct econometric analysis.

Recently, some econometric textbooks have been more explicit about causal inference as the goal of almost all empirical research in economics. One example is Mostly Harmless Econometrics (Angrist and Pischke, 2008) and another is Causal Inference: The Mixtape (Cunningham, 2021). It may be that the authors of these books felt they could distract their Tomsks and Wellingtons by including humorous references to a series of British novels from the 1970s or to technology popular in the 1980s.

5.1.2 Econometrics: A brief illustration

While this is a textbook on accounting research, let’s consider a stylized example from labour economics. Suppose that we posit the following structural model:

\[ \begin{aligned} y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i \\ x_{i1} &= \alpha_0 + \alpha_2 x_{i2} + \alpha_3 x_{i3} + \eta_i \end{aligned} \] where the \(i\) subscripts denote individual \(i\). In the first equation, \(y_i\) is income at age 30, \(x_{i1}\) is years of education, \(x_{i2}\) is a measure of industriousness, \(x_{i3}\) is a measure of intelligence, and \(\epsilon_i\) can be interpreted as random factors that affect \(y_i\) independent of \(X_i\). In the second equation, we add coefficients, \(\alpha := \left(\alpha_0, \alpha_2, \alpha_3 \right)\), and \(\eta_i\), which can be interpreted as random factors that affect \(x_{i1}\) independent of \(x_{i2}\) and \(x_{i3}\).

As researchers, we can postulate a model of the form above and we might obtain data on \((y, X)\) that we could use to estimate \(\beta := \left(\beta_0, \beta_1, \beta_2, \beta_3 \right)\). But we don’t know that the model is correct and even if we did, we don’t know the values in \(\beta\).

As we saw in Chapter 4, the OLS regression estimator can be written

\[ \hat{\beta} = \left(X^{\mathsf{T}}X \right)^{-1} (X^{\mathsf{T}}y) \] and it can be shown mathematically that OLS has good properties—such as unbiasedness and efficiency—under certain conditions. But in a world of cheap computing, we don’t need to break out our pencils to do mathematics. Instead, we can “play God” in some sense and fix parameter values, simulate the data, then examine how well a researcher would do in estimating the parameter values that we set.

set.seed(2021)

n <- 100000

df <- tibble(
  industry = rnorm(n),
  intelligence = rnorm(n),
  education = 3 * intelligence + 4 * industry + rnorm(n),
  income = 10 + 5 * education + 6 * intelligence + 7 * industry + rnorm(n))

Now we can estimate three different models. It will turn out to be convenient to store the fitted models in a list (fms), which we initiate by creating an empty list with list().

fms <- list()
fms[[1]] <- lm(income ~ education, data = df)
fms[[2]] <- lm(income ~ education + intelligence + industry, data = df)
fms[[3]] <- lm(income ~ intelligence + industry, data = df)

The results from these models are presented in the table below.

stargazer(fms, type = sg_format, 
          header = FALSE, omit.stat = c("ser", "f"))
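=====================================================
                     Dependent variable: income
               --------------------------------------
                   (1)          (2)          (3)
-----------------------------------------------------
education       6.767***     4.999***
                (0.001)      (0.003)
intelligence                 6.003***    21.008***
                             (0.010)      (0.016)
industry                     7.000***    27.007***
                             (0.013)      (0.016)
Constant        9.997***    10.003***    10.019***
                (0.007)      (0.003)      (0.016)
-----------------------------------------------------
Observations    100,000      100,000      100,000
R2               0.996        0.999        0.978
Adjusted R2      0.996        0.999        0.978
=====================================================
Note: *p<0.1; **p<0.05; ***p<0.01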
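The OLS formula \(\hat{\beta} = \left(X^{\mathsf{T}}X \right)^{-1} (X^{\mathsf{T}}y)\) can also be applied directly and compared with what lm() returns. The following is a minimal sketch using base R only; the smaller sample size and the object names X, beta_hat, and fm_check are assumptions made for illustration, and the data-generating process simply mirrors the simulation above.

```r
set.seed(2021)

n <- 1000

# Simulate data using the same parameter values as the simulation above.
industry <- rnorm(n)
intelligence <- rnorm(n)
education <- 3 * intelligence + 4 * industry + rnorm(n)
income <- 10 + 5 * education + 6 * intelligence + 7 * industry + rnorm(n)

# Build the design matrix, including a column of ones for the intercept.
X <- cbind(1, education, intelligence, industry)

# Solve the normal equations (X'X) b = X'y rather than inverting X'X.
beta_hat <- solve(t(X) %*% X, t(X) %*% income)

# Fit the same model with lm() and compare the two sets of estimates.
fm_check <- lm(income ~ education + intelligence + industry)
all.equal(as.vector(beta_hat), unname(coef(fm_check)))
```

In practice lm() uses a QR decomposition rather than the normal equations, so the two sets of estimates agree only up to numerical precision, which is why all.equal() is used instead of an exact comparison.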

5.1.3 Exercises

  1. Looking at the simulation code, what are the true values of \(\beta := \left(\beta_0, \beta_1, \beta_2, \beta_3 \right)\) and \(\alpha := \left(\alpha_0, \alpha_2, \alpha_3 \right)\)?
  2. Does any one of the three regressions provide good estimates of \(\beta\)?
  3. Consider model (3). With regard to the first of the two equations, are there any issues with regard to estimating \(\beta\)? What (if any) OLS assumption is violated?
  4. What happens if you substitute the second equation (for \(x_{i1}\)) into the first equation (for \(y_i\))? Does this equation satisfy OLS assumptions in some way?
  5. Using the equations, what happens if we arbitrarily increase the value of industry (\(x_{i2}\)) by one unit? What happens to education (\(x_{i1}\))? What happens to income (\(y_i\))?
  6. Can you read the effect sizes from the previous question off any of the regression results? Which one(s)?

5.2 Basic causal relations

This section provides a brief introduction to causal diagrams. Fuller coverage of this topic can be found in Morgan and Winship (2014) and Huntington-Klein (2021); Pearl (2009a) offers a more advanced treatment.

Figures 5.1, 5.2, and 5.3 illustrate the basic ideas of causal diagrams and how they can be used to facilitate thinking about causal inference. Each figure depicts potential relationships among three observable variables. In each case, we are interested in understanding how the presence of a variable \(Z\) impacts the estimation of the causal effect of \(X\) on \(Y\). The only difference between the three graphs is the direction of the arrows linking either \(X\) and \(Z\), or \(Y\) and \(Z\). The boxes (or nodes) represent random variables and the arrows (or edges) connecting boxes represent hypothesized causal relations, with each arrow pointing from a cause to a variable assumed to be affected by it.

Pearl (2009a) shows that, if we are interested in assessing the causal effect of \(X\) on \(Y\), we may be able to do so by conditioning on a set of variables, \(Z\), that satisfies certain criteria. These criteria imply that very different conditioning strategies are needed for each of the causal diagrams (see Gow et al. (2016) for a more formal discussion).

While conditioning on variables is much like the standard notion of “controlling for” such variables in a regression, there are critical differences. First, conditioning means estimating effects for each distinct level of the set of variables in \(Z\). This concept of nonparametric conditioning on \(Z\) is more demanding than simply including \(Z\) as another regressor in a linear regression model. Second, the inclusion of a variable in \(Z\) may not be an appropriate conditioning strategy. Indeed, it can be that the inclusion of \(Z\) results in biased estimates of causal effects.


Figure 5.1: \(Z\) is a confounder

Figure 5.1 is straightforward. In this case, we need to condition on \(Z\) in order to estimate the causal effect of \(X\) on \(Y\). Note that the notion of “condition on” is again more general than just including \(Z\) in a parametric (linear) model. The need to condition on \(Z\) arises because \(Z\) is what is known as a confounder.


Figure 5.2: \(Z\) is a mediator

Figure 5.2 is a bit different. Here \(Z\) is a mediator of the effect of \(X\) on \(Y\). No conditioning is required in this setting to estimate the total effect of \(X\) on \(Y\). If we condition on \(Z\), then we obtain a different estimate, one that excludes the indirect effect of \(X\) on \(Y\) via \(Z\).
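To see how conditioning on a mediator changes what is being estimated, it may help to consider a linear version of Figure 5.2. The linear functional form and the symbols \(\beta\), \(\gamma\), \(\delta\), and \(\nu\) are assumptions introduced here purely for illustration.

\[ \begin{aligned} Z &= \delta X + \nu \\ Y &= \beta X + \gamma Z + \epsilon \\ &= (\beta + \gamma \delta) X + \gamma \nu + \epsilon \end{aligned} \]

Under these assumptions, and with \(\nu\) and \(\epsilon\) independent of \(X\) and of each other, a regression of \(Y\) on \(X\) alone recovers the total effect \(\beta + \gamma \delta\), while a regression of \(Y\) on both \(X\) and \(Z\) recovers only the direct effect \(\beta\).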


Figure 5.3: \(Z\) is a collider

Finally, in Figure 5.3, we have \(Z\) acting as what is referred to as a “collider” variable (Pearl, 2009b). Here, not only do we not need to condition on \(Z\), but we should not condition on \(Z\) if we want an estimate of the total effect of \(X\) on \(Y\). While in epidemiology the issue of “collider bias can be just as severe as confounding” (Glymour and Greenland, 2008, p. 186), collider bias appears to receive less attention in accounting research than confounding. Many intuitive examples of collider bias involve selection or stratification. Admission to university could be a function of combined test scores (\(T\)) and interview performance (\(I\)) exceeding a threshold, i.e., \(T + I \geq C\). Even if \(T\) and \(I\) are unrelated unconditionally, a regression of \(T\) on \(I\) conditioned on admission to university is likely to show a negative relation between these two variables.

set.seed(2021)

n <- 100000

admissions <- tibble(
  test = rnorm(n),
  interview = rnorm(n),
  score = test + interview,
  cutoff = quantile(score, .90),
  admitted = score >= cutoff)

admissions %>% 
  ggplot(aes(x = test, y = interview, color = admitted)) +
  geom_point()

In the following code, we fit four models to the data we have generated and store the estimated models in a list fms (for “fitted models”). Models (2) and (3) use the subsets of the data in which applicants are not admitted and are admitted, respectively. Models (1) and (4) use all observations, but differ in terms of the regression terms included.

fms <- list()
fms[[1]] <- lm(interview ~ test, data = admissions)
fms[[2]] <- lm(interview ~ test, data = admissions, subset = !admitted)
fms[[3]] <- lm(interview ~ test, data = admissions, subset = admitted)
fms[[4]] <- lm(interview ~ test * admitted, data = admissions)

The results from these models are presented in the following table.

stargazer(fms, type = sg_format, 
          header = FALSE, omit.stat = c("ser", "f"))
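==================================================================
                          Dependent variable: interview
                --------------------------------------------------
                    (1)          (2)          (3)          (4)
------------------------------------------------------------------
test             -0.008***    -0.174***    -0.719***    -0.174***
                  (0.003)      (0.003)      (0.007)      (0.003)
admitted                                                 2.298***
                                                         (0.017)
test:admitted                                           -0.545***
                                                         (0.012)
Constant           0.001      -0.161***     2.137***    -0.161***
                  (0.003)      (0.003)      (0.010)      (0.003)
------------------------------------------------------------------
Observations      100,000      90,000       10,000       100,000
R2                0.0001        0.031        0.515        0.227
Adjusted R2       0.0001        0.031        0.515        0.227
==================================================================
Note: *p<0.1; **p<0.05; ***p<0.01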

5.2.1 Exercises

  1. Imagine that, while you understand the basic idea that both tests and interviews affect admissions, you only have access to the regression results reported above. Which of the four models seems to best “describe the data”? Which of the four models seems to do worst?
  2. How are the coefficients in model (4) related to those in models (2) and (3)? Is this coincidence? Or would we always expect these relations to hold?
  3. As economists, we should be alert to the “endogeneity” of institutions. For example, if universities admitted students based on test and interview performance, they probably have good reasons for doing so. Can you think of reasons why a university might add test and interview performance into a single score? Does your story have implications for the relationship between test and interview performance?
  4. Using mutate, create a variable test_x_admitted that is the product of test and admitted. Then run a version of model (4) above that uses this variable; that is, regress interview on test, admitted, and test_x_admitted. Do you get the same results as are shown above for model (4)? Why or why not?

5.3 Causal diagrams: Formalities

Here we provide a brief formal treatment of some of the ideas on causal diagrams discussed above. See Pearl (2009a) for more detailed coverage.

5.3.1 Definitions and a result

We first introduce some basic definitions and a key result. While this material may seem forbiddingly formal, the ideas are actually fairly straightforward and we will cover a few examples to demonstrate this. We recommend that in your first pass through the material, you spend just a little time on the two definitions and the theorem and then go back over them after reading the guide we provide just after the theorem.

Definition 5.1 A path \(p\) is said to be \(d\)-separated (or blocked) by a set of nodes \(Z\) if and only if

  1. \(p\) contains a chain \(i \rightarrow m \rightarrow j\) or a fork \(i \leftarrow m \rightarrow j\) such that the middle node \(m\) is in \(Z\), or
  2. \(p\) contains an inverted fork (or collider) \(i \rightarrow m \leftarrow j\) such that the middle node \(m\) is not in \(Z\) and such that no descendant of \(m\) is in \(Z\).

Definition 5.2 A set of variables \(Z\) satisfies the back-door criterion relative to an ordered pair of variables \((X, Y)\) in a directed acyclic graph (DAG) \(G\) if:

  1. no node in \(Z\) is a descendant of \(X\); and
  2. \(Z\) blocks every path between \(X\) and \(Y\) that contains an arrow into \(X\).

Given this criterion, Pearl (2009a, p. 79) proves the following result.

Theorem 5.1 If a set of variables \(Z\) satisfies the back-door criterion relative to \((X, Y)\), then the causal effect of \(X\) on \(Y\) is identifiable and is given by the formula \[ P(y | \hat{x}) = \sum_{z} P(y | x, z) P(z), \] where \(P(y | \hat{x})\) stands for the probability that \(Y = y\), given that \(X\) is set to level \(X=x\) by external intervention.

In plainer language, the theorem tells us that we can estimate the causal effect of \(X\) on \(Y\) if we have a set of variables \(Z\) that satisfies the back-door criterion. Then the second definition tells us that \(Z\) needs to block all back-door paths while not containing any descendant of \(X\). Finally, the first definition tells us what it means to block a path.
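The adjustment formula in the theorem can be made concrete with a small simulation. The following is a minimal sketch under assumed conditions: a single binary confounder z, a binary treatment x, and a true causal effect of x on y equal to 1. All of these choices, along with the helper function adjusted_mean(), are hypothetical and used only for illustration.

```r
set.seed(2021)

n <- 100000

# Hypothetical DAG: z -> x, z -> y, and x -> y (so z is a confounder).
z <- rbinom(n, 1, 0.5)
x <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))
y <- 1 * x + 2 * z + rnorm(n)

# Naive comparison of means, ignoring z.
naive <- mean(y[x == 1]) - mean(y[x == 0])

# Back-door adjustment: average E[y | x, z] over the marginal
# distribution of z, i.e., sum over z of E[y | x, z] * P(z).
adjusted_mean <- function(x_level) {
  sum(sapply(c(0, 1), function(z_level) {
    mean(y[x == x_level & z == z_level]) * mean(z == z_level)
  }))
}

adjusted <- adjusted_mean(1) - adjusted_mean(0)

c(naive = naive, adjusted = adjusted)
```

In this setup, the naive difference overstates the effect because x and z are positively related and z raises y, while the adjusted difference should be close to the assumed effect of 1. Note that the adjustment here conditions on each level of z separately rather than relying on a linear functional form.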

5.3.2 Application of back-door criterion to basic diagrams

Applying the back-door criterion to Figure 5.1 is straightforward and intuitive. The set of variables \(\{Z\}\) or simply \(Z\) satisfies the criterion, as \(Z\) is not a descendant of \(X\) and \(Z\) blocks the back-door path \(X \leftarrow Z \rightarrow Y\). So by conditioning on \(Z\), we can estimate the causal effect of \(X\) on \(Y\). This situation is a generalization of a linear model in which \(Y = X \beta + Z \gamma + \epsilon_Y\) and \(\epsilon_Y\) is independent of \(X\) and \(Z\), but \(X\) and \(Z\) are correlated. In this case, it is well known that omission of \(Z\) would result in a biased estimate of \(\beta\), the causal effect of \(X\) on \(Y\), but by including \(Z\) in the regression, we get an unbiased estimate of \(\beta\). In this situation, \(Z\) is a confounder.

Turning to Figure 5.2, we see that \(Z\), which is a mediator of the effect of \(X\) on \(Y\), does not satisfy the back-door criterion, because \(Z\) is a descendant of \(X\). However, \(\emptyset\) (i.e., the empty set) does satisfy the back-door criterion. Clearly, \(\emptyset\) contains no descendant of \(X\). Furthermore, the only path other than \(X \rightarrow Y\) that exists is \(X \rightarrow Z \rightarrow Y\), which does not have a back-door into \(X\). Note that the back-door criterion not only implies that we need not condition on \(Z\) to obtain an unbiased estimate of the causal effect of \(X\) on \(Y\), but that we should not condition on \(Z\) to get such an estimate.

Finally, in Figure 5.3, we have \(Z\) acting as what Pearl (2009b, p. 17) refers to as a “collider” variable. Again, we see that \(Z\) does not satisfy the back-door criterion, because \(Z\) is a descendant of \(X\). However, \(\emptyset\) again satisfies the back-door criterion. First, it contains no descendant of \(X\). Second, the only path other than \(X \rightarrow Y\) that exists is \(X \rightarrow Z \leftarrow Y\), which does not have a back-door into \(X\). Again, the back-door criterion not only implies that we need not condition on \(Z\), but that we should not condition on \(Z\) to get an unbiased estimate of the causal effect of \(X\) on \(Y\).

5.3.3 Exercises

  1. Draw the DAG for the structural model above relating intelligence, industriousness, education, and income. For each \(x\) variable, identify the sets of conditioning variables that satisfy the back-door criterion with respect to estimating a causal effect of the variable on \(y\).
  2. For any set of conditioning variables not considered in the regression table above, run regressions to confirm that these indeed deliver good estimates of causal effects.

5.4 Discrimination and bias

Let’s examine a real-world example related to possible gender discrimination in labour markets. When critics claimed that Google systematically underpaid its female employees, Google responded that when you take “location, tenure, job role, level and performance” into consideration, women’s pay was basically identical to that of men. In other words, controlling for characteristics of the job, women received the same pay.

But what if stereotyping means men are given roles that are paid better? In this case, naive comparisons of wages by gender “controlling for” occupation would understate the presence of discrimination. Let’s illustrate this with a DAG based on a simple occupational sorting model with unobserved heterogeneity.


Figure 5.4: Discrimination

Note that there is in fact no effect of being female (\(F\)) on earnings (\(Y\)) except through discrimination (\(D\)). Thus, if we could control for discrimination, we’d get a coefficient of zero on \(F\). In this example, we aren’t interested in estimating the effect of being female on earnings; we are interested in estimating the effect of discrimination itself. Note also that discrimination is not directly observed, but given our DAG, we can use \(F\) as a proxy for \(D\), as we have assumed that there is no relation between \(F\) and \(Y\) (or between \(F\) and \(O\), which denotes occupational assignment) except through \(D\).

In this DAG, there are two paths between \(D\) and \(Y\):

\[ \begin{aligned} D &\rightarrow Y \\ D &\rightarrow O \rightarrow Y \end{aligned} \] Neither path is a back-door path between \(D\) and \(Y\) because neither has an arrow pointing into \(D\). Conditioning on \(O\) causes the estimated effect of discrimination on income to be biased because \(O\) is a descendant of \(D\), and a set of conditioning variables \(Z\) satisfies the back-door criterion only if no node in \(Z\) is a descendant of \(D\).

n <- 100000

df <- tibble(
  female = runif(n) >= 0.5,
  discrimination = female,
  occupation = 1 + 0 * female - 2 * discrimination + rnorm(n),
  salary = 1 - 1 * discrimination + 2 * occupation + rnorm(n) 
)

lm_1 <- lm(salary ~ female, data = df)
lm_2 <- lm(salary ~ female + occupation, data = df)

stargazer(lm_1, lm_2, type = sg_format, 
          header = FALSE, omit.stat = c("ser", "f"))
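========================================
                Dependent variable: salary
               -------------------------
                   (1)          (2)
----------------------------------------
female          -4.996***    -1.006***
                 (0.014)      (0.009)
occupation                    1.997***
                              (0.003)
Constant         3.004***     1.006***
                 (0.010)      (0.005)
----------------------------------------
Observations     100,000      100,000
R2                0.556        0.911
Adjusted R2       0.556        0.911
========================================
Note: *p<0.1; **p<0.05; ***p<0.01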

5.4.1 Exercises

  1. Given the equations for occupation and salary in the simulation above, what is the direct effect of discrimination on salary? What is the indirect effect (i.e., the effect via occupation) of discrimination on salary? What is the total effect of discrimination on salary? How does each of these effects show up in the reported regression results? (Note: Because of sampling variation, the relationships will not be exact.)
  2. Consider the possibility of an additional unobserved variable, ability (\(A\)), that affects role assignment (\(O\)) and also affects income (\(Y\)) directly.

Figure 5.5: Discrimination with ability

What would be the correct conditioning strategy in this case? Would it now make sense to condition on \(O\) if the goal is to estimate the total effect of discrimination on \(Y\)? (Hint: In answering this question, it may help to adapt the simulation above to generate an ability variable and to incorporate that in the model as follows.)

ability = rnorm(n),
occupation = 1 + 0 * female - 2 * discrimination + 2 * ability + rnorm(n),
salary = 1 - 1 * discrimination + 2 * occupation + 0.3 * ability + rnorm(n)
  3. Consider the additional possibility of different occupational preferences between males and females, as depicted below.

Figure 5.6: Discrimination with ability and job preferences

Given this DAG, is it possible to identify a set of conditioning variables that allow you to estimate the total effect of discrimination on salary? What roles does \(O\) have in this DAG? (Hint: Replace the 0 coefficient in the occupation equation used in the last question with a value of either -1 or +1. Does the sign of this coefficient affect the sign of bias, if any?) Is it possible to estimate the direct effect of discrimination on salary? If so, how? If not, why?

5.5 Causal diagrams: Applications in accounting

A typical paper in accounting research will include many variables to “control for” the potential confounding of causal effects. While many of these variables should be considered confounders, less attention is given to explaining why it is reasonable to assume that they are not mediators or colliders. Such a discussion is important because the inclusion of “controls” that are mediators or colliders will generally lead to bias.

One paper that does discuss this distinction is Larcker et al. (2007), who use a multiple regression (or logistic) model of the form:

\[\begin{equation} Y = \alpha + \sum_{r=1}^R \gamma _r Z_r + \sum_{s=1}^S \beta_s X_s + \epsilon \tag{5.1} \end{equation}\]

Larcker et al. (2007) suggest that

“One important feature in the structure of Equation (5.1) is that the governance factors [\(X\)] are assumed to have no impact on the controls (and thus no indirect impact on the dependent variable). As a result, this structure may result in conservative estimates for the impact of governance on the dependent variable. Another approach is to only include governance factors as independent variables, or:

\[\begin{equation} Y = \alpha + \sum_{s =1}^S \beta_s X_s + \epsilon \tag{5.2} \end{equation}\]

The structure in Equation (5.2) would be appropriate if governance impacts the control variables and both the governance and control variables impact the dependent variable (i.e., the estimated regression coefficients for the governance variables will capture the total effect or the sum of the direct effect and the indirect effect through the controls).”

But there are some subtle issues here. If some elements of \(Z_r\) are mediators and others are confounders, then both equations will be subject to bias. Equation (5.2) will be biased due to omission of confounders, while Equation (5.1) will be biased due to inclusion of mediating variables. Additionally, the claim that the estimates are “conservative” is only correct if the indirect effect via mediators is of the same sign as the direct (i.e., unmediated) effect. If this is not the case, then the relation between the magnitude (and even the sign) of the direct effect and the indirect effect is unclear.

Additionally, this discussion does not allow for the possibility of colliders. For example, governance plausibly affects leverage choices, while performance is also likely to affect leverage. If so, “controlling for” leverage might induce associations between governance and performance even absent a true relation between these variables. While the with-and-without-controls approach used by Larcker et al. (2007) has intuitive appeal, a more robust approach requires careful thinking about the plausible causal relations between the treatment variables, the outcomes of interest, and the candidate control variables.
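The leverage example can be illustrated with a stylized simulation. This is a sketch under strong assumptions and not an analysis of the data in Larcker et al. (2007): the variables governance, performance, and leverage are simulated, all coefficients are hypothetical, and governance is constructed to have no effect on performance.

```r
set.seed(2021)

n <- 100000

# Hypothetical data: governance and performance are independent by
# construction, but both affect leverage (so leverage is a collider).
governance <- rnorm(n)
performance <- rnorm(n)
leverage <- governance + performance + rnorm(n)

# Without the collider, the governance coefficient should be near zero.
fm_no_control <- lm(performance ~ governance)

# "Controlling for" leverage induces a spurious negative association.
fm_collider <- lm(performance ~ governance + leverage)

c(no_control = unname(coef(fm_no_control)["governance"]),
  with_leverage = unname(coef(fm_collider)["governance"]))
```

With these assumed parameter values, the population coefficient on governance is zero when leverage is omitted and \(-1/2\) when leverage is included, even though governance has no effect on performance in the simulated data.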