1 Introduction

1.1 Structure of the book

The book is organized into four parts.

Part I: Foundations covers a variety of topics, including research computing, statistics, causal inference, and some details of data sets commonly used in accounting research. This part of the book covers material often not included in the formal coursework of a PhD in accounting. For example, material related to statistics and causal inference is often assumed to be covered in coursework in statistics and econometrics rather than in the accounting-specific courses. Material on research computing and detailed investigation of data sets is generally not covered in PhD coursework at all; students are typically expected to pick up these skills and knowledge informally.

Part I: Foundations assumes very little prior knowledge and begins with core concepts and skills in data analysis, statistics, and causal inference.

Chapter 1 provides an introduction to the book, including a reading guide and instructions for setting up your computer.
Given the centrality of data skills to getting the full value out of this book, we provide a fast-paced tutorial-style introduction to Python and Python Polars in Chapter 2.
As we assume very little knowledge of statistics and regression analysis, we provide an introduction to the basics of regression analysis in Chapter 3.
Chapter 4 builds on Chapter 3 to provide an introduction to elements of causal inference.
Chapter 5 provides an introduction to statistical inference, which is a core part of empirical accounting research.

Part I: Foundations also introduces key data sets frequently used in empirical accounting research.

Chapters 6 and 8 provide an introduction to Compustat and accessing data through WRDS.
Chapter 7 discusses the linking of data sets from different providers with a focus on linking financial statement data from Compustat with stock return data from CRSP.
We wrap up Part I with Chapter 9, which provides additional data skills useful for both later chapters and (we hope) readers’ own research efforts.

Part I provides the foundations for the remaining parts of the book. Depending on the preferences of readers and instructors, one could either continue with Part II: Capital Markets Research or skip ahead to Part III: Causal Inference. While some chapters in Part III draw on skills and concepts covered in Part II, we flag each such instance.

The material of Part I could be covered in a number of ways. One approach would be to cover this material in a standalone introductory course or “boot camp”. If supplemented by materials that go deeper into data science skills, Part I offers plenty of material for a full-fledged course focused on such skills.

Another approach might be to assign Part I: Foundations to students on a self-study basis, perhaps with select portions being covered when they are most relevant for later portions of the book. For example, in a course based on Part II: Capital Markets Research, Chapter 7 could be assigned as background work as and when it becomes relevant to material from Part II. That chapter covers the important topic of correctly linking databases, which is not often encountered in PhD courses.

Part II: Capital Markets Research provides the basis for a PhD-level course focused on capital markets research. This part alone provides enough material for about eight weeks of coursework. For a ten- or twelve-week course, an instructor could draw on other parts of the book or supplement with other materials. Part II deliberately focuses on more “classical” material and thus could complement material on more contemporary work in financial accounting research. Part II starts with research from the 1960s, such as Fama et al. (1969), Ball and Brown (1968), and Beaver (1968), and covers some of the most important studies of subsequent decades, including Bernard and Thomas (1989), Sloan (1996), and key earnings management papers of the 1980s and 1990s.

Part III: Causal Inference provides the basis for a PhD-level course focused on causal inference in empirical accounting research. Part III has a more contemporary orientation and is not focused on capital markets research.

Depending on the needs of students in a given program, Part III could be taught as a standalone course with elements of Part I being drawn upon as needed. Chapter 19, for example, draws on materials in Part I, with extensive discussion of causal diagrams (Chapter 4), standard errors (Chapter 5), linking databases (Chapter 7), regular expressions (Chapter 9), and two-step regressions (which build on Chapter 3).

While there are connections between Part II and Part III, we do not consider Part II to be a prerequisite for Part III. For example, Chapter 19 focuses on earnings management, which is the topic of an entire chapter in Part II (Chapter 16), and covers measures of accruals and earnings management that are also covered in Chapters 15 and 16. While the material of Part III might typically be covered later in the coursework of an accounting PhD program, we have endeavoured to present this material in a way that is fairly self-contained and therefore accessible to students earlier in their PhD studies (perhaps using materials from Part I to fill in gaps). There may even be merit in covering most of Part III before Part II, as doing so would allow students to read Part II materials (mostly older papers) through a more contemporary lens.

Part IV: Additional Topics provides chapters on topics such as matching, handling extreme values, selection models, and statistical (machine) learning. While these are important topics, we believe the chapters of Part IV are less closely related to one another than are those of Parts II and III. Instructors could easily incorporate chapters from Part IV in courses based on Part II or Part III of this book, or use them as standalone material for courses not based on this book.

1.2 Setting up your computer

Instructions for setting up your computer to run the code in this book are available on the book's support page.

The support page includes the current Quarto templates together with instructions for installing the required software, creating a project environment, setting up a .env file, and downloading data from WRDS.
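
To give a sense of what a .env file does, the following is a minimal sketch (not the setup described on the support page) of how a Python script might read one using only the standard library. The function names `read_env_file` and `load_env_file` and the variable names `WRDS_ID` and `DATA_DIR` are hypothetical and chosen purely for illustration; the support page remains the authoritative source for the variables the book's code actually expects.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an "=" sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path):
    """Copy parsed values into os.environ, keeping any values already set."""
    values = read_env_file(path)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


# Demonstration using a temporary file; the variable names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# hypothetical settings\n"
        "WRDS_ID=your_wrds_id\n"
        'DATA_DIR="data"\n'
    )
    settings = load_env_file(env_path)
    print(settings)
```

Keeping settings such as user names and data locations in a .env file separates them from the analysis code, so the same script can run unchanged on different computers. Packages such as python-dotenv offer similar functionality with more complete parsing.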