Empirical Research in Accounting: Tools and Methods

Course book for accounting research

Ian D. Gow

University of Melbourne

Tony Ding


01 May 2024



This is the online version of the work-in-progress edition of Empirical Research in Accounting: Tools and Methods by Ian D. Gow and Tony Ding. We detail recent changes to the book in Recent changes and ongoing work in Materials to come.

This book provides a course on financial accounting research that begins at an upper-undergraduate (“honours”) or introductory PhD level. One goal of the course, like most PhD courses, is to prepare PhD students to take further research courses and to go on to do their own research. Another goal of the course is to provide students with a set of skills that is useful in other domains, such as consulting or finance. This second goal stems from the origins of parts of this course as a joint honours-PhD course at the University of Melbourne, where the honours students are undergraduates completing an additional year of study with a focus on research. While some honours students progress to PhD studies, most elect to take jobs in industry, such as consulting, auditing, or public service.

Features of this book

This book has several features that distinguish a course based on it from a more traditional PhD-level course; we discuss these here.

Pedagogically driven selection of papers

Many syllabuses for PhD courses in accounting focus on recent papers with a view to giving students a sense of the current themes and trends in research to help students spot gaps in the literature that they can fill with their own research. We view such courses as complementary to this course, but take a different approach.

Aiming to provide a more fundamental understanding of accounting research, the selection of papers in this course is driven more by pedagogical goals than an attempt to represent the current state of play in accounting research. In some cases, this means covering older papers (e.g., Ball and Brown, 1968), but in other cases we use a recent paper that features a core idea or approach.

Incorporation of data analysis skills

A second feature that distinguishes this course from most PhD courses in accounting is an emphasis on data analysis skills, which are deliberately woven into the course throughout. While it might have been possible to make this a course focused exclusively on such skills, it is our view that these skills are best learned by applying them to real research questions. In turn, we believe that being able to pull data, run simulations, and get more involved in critical elements of the research process engenders a better understanding of research.

We build data analysis and computing skills into the course at each step in a systematic way. In practice, research computing skills are the bread and butter of a researcher’s toolkit but are generally neglected in PhD programs’ formal curricula. The prevailing ethos seems to be that research computing skills are acquired informally from other students, through research assistantships and collaboration with faculty members, and so on.

In some doctoral programs, this approach may work to a degree. But in many doctoral programs, such informal learning fails to prepare students adequately. For example, if students' collaboration with faculty members is informal and consists of work on papers where the students do the data analysis with limited or no hands-on guidance from the faculty member, then the opportunity for clear and comprehensive instruction is limited.

Greater emphasis on research design and methods

Accounting research is overwhelmingly an empirical discipline seeking to draw causal inferences and, as such, significant research training should be focused on research design issues. Part III of the book examines causal inference in depth, including natural experiments, regression discontinuity designs, instrumental variables, and fixed effects. However, there we will see that the common thinking about these techniques is flawed: it is generally the settings in which they can be deployed, rather than the techniques themselves, that offer hope for warranted causal inference. Throughout the course, we will offer a broader set of tools for making inferences about real-world phenomena.

In this course, we strive to equip students with systematic data analysis and computing skills that are needed to conduct the analyses that we cover. We believe that understanding of statistical and econometric techniques by accounting researchers is more likely to be enhanced by hands-on simulation analysis than by analysis of consistency and asymptotic variance of estimators. By building in the data analysis skills needed to perform such simulations, we hope that this course provides a platform for accounting researchers to think more carefully about the properties of their estimators. A core commitment we make in this book is that every analysis we present can be conducted by the reader by copying and pasting the code found herein.
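To give a flavour of what we mean by hands-on simulation analysis, here is a small sketch (a hypothetical illustration of ours, not an analysis from the book) that uses Monte Carlo simulation to examine the sampling behaviour of the OLS slope estimator:

```r
# Simulate many data sets and estimate the slope in each one to
# study the sampling distribution of the OLS estimator.
set.seed(2024)

slope_est <- function(n = 100, beta = 1) {
  x <- rnorm(n)
  y <- beta * x + rnorm(n)
  coef(lm(y ~ x))[["x"]]
}

estimates <- replicate(1000, slope_est())
mean(estimates)  # should be close to the true beta of 1
sd(estimates)    # approximates the standard error of the estimator
```

With a few more lines, the same approach can be adapted to study, say, how an estimator behaves under measurement error or omitted variables.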


Prerequisites

We presume both prior knowledge of some topics and access to certain computing resources. We have endeavoured to keep these requirements to a minimum.

  1. Knowledge of accounting and business. In terms of accounting, we assume a solid understanding of the content of an introductory financial accounting course and enough understanding of business to make sense of accounting.

  2. Prior exposure to statistics and econometrics. Some familiarity with the elements of statistical inference and ordinary least-squares (OLS) regression will be helpful. We provide some introductory material on these elements in 3  Regression fundamentals, 4  Causal inference, and 5  Statistical inference, but this coverage is selective, and it may help to use a textbook to brush up on these topics as you work through the book.

  3. Access to academic journals. The course makes extensive use of papers in academic journals. If you are a faculty member, researcher, or student at an academic institution, then you should be able to access the papers we use through your library. Some universities provide a service (perhaps for a fee) for alumni access to academic journals. Unfortunately, if you cannot access the papers, it will be difficult to make full use of this course book beyond Part I.

  4. Access to a computer and the internet. To accommodate a broader audience and minimize set-up costs, we assume nothing of the reader other than access to a computer and the internet and basic proficiency in using these.

  5. Ability to install R and RStudio. In this course, we focus on R, a popular open-source programming language for statistics and data science, but we do not assume any knowledge of R. In 2  Describing data, we provide an introductory tutorial on R and RStudio that we hope gives you the platform required for later chapters, whether you are a complete novice or coming to R from another statistical software system. Throughout the book, we direct the reader to additional resources for learning more detail than we can fit in this book. We make particular reference to R for Data Science, which is available as a book or for free on the internet. With the occasional detour to resources like R for Data Science, working through this book should provide the reader with a strong set of skills in data science. Our hope is that these skills will prove useful whether you continue down the path of academic research or pursue a position in practice.

  6. Access to WRDS data. Because it is difficult to go very far in accounting research without WRDS data, this book is targeted at the reader who has a WRDS account. If you do not have a WRDS account, but are eligible for one (e.g., you are a graduate student, researcher, or faculty member at a WRDS-subscribing institution), then you should apply for such an account.1
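For readers wondering what WRDS access involves in practice, the sketch below shows one common way to connect to the WRDS PostgreSQL server from R. This is an illustrative fragment, not code from the book: it assumes the DBI and RPostgres packages are installed and that your WRDS credentials are held in environment variables; the host, port, and SSL settings are those published in the WRDS documentation.

```r
# Connect to the WRDS PostgreSQL server.
# Assumes WRDS_USER and WRDS_PASSWORD are set in the environment.
library(DBI)

wrds <- dbConnect(
  RPostgres::Postgres(),
  host = "wrds-pgdata.wharton.upenn.edu",
  port = 9737,
  dbname = "wrds",
  user = Sys.getenv("WRDS_USER"),
  password = Sys.getenv("WRDS_PASSWORD"),
  sslmode = "require"
)
```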

A guide for readers

The book is written so as to be fairly accessible to a novice reading independently (subject to the prerequisites outlined above). We recommend that such readers work through the first few chapters in order, including running the code, completing the exercises, and thinking about the discussion questions. That said, some elements of the exercises and discussion questions are subtle and having an instructor or someone to discuss these with will help you to get the full value from this material.

But we hope that this book will be useful to a variety of readers, learners, and instructors beyond the novices. Below we discuss possible approaches for some hypothetical readers.

  • I am interested in learning more about issues related to research design and causal inference. You might find you can dive into 2  Describing data and 4  Causal inference, then move to 17  Natural experiments and subsequent chapters.
  • I am interested in learning more about issues related to research design and causal inference, but I don’t really want to learn R. The plan for the hypothetical reader in the previous bullet point likely works. Even if you aren’t interested in learning R, we think that running the code helps solidify understanding, and what the code is doing is sufficiently clear that copying and pasting the code on your own computer should be enough to get the gist of what is going on.
  • I have heard about R and would like to learn more about it. 2  Describing data and 3  Regression fundamentals cover some of the basics. But if you’re already proficient in something like SAS or Stata, you may find it easy to skip those chapters (after meeting the prerequisites above), go to a chapter that aligns with your research interests, and see if you can figure out what the code is doing as you work through it. We have deliberately written the chapters so that you can work on each chapter independently of the others.


Acknowledgements

While this book draws on materials we have been using for many years, writing this book began in earnest in early 2021. Since then we have received help from many others, ranging from supplying code and data to suggestions on content, feedback on drafts, and simply encouragement to persist with the project. We would like to recognize the help of Ulrich Atz, Andrew Baker, Ray Ball, Jeremy Bertomeu, Stu Black, Mark Bradshaw, Philip Brown, Patty Dechow, Jenny Zha Giedt, Lucy Gow, Amy Hutton, James Kavourakis, David Larcker, Andy Leone, Ying Liang, Christian Leuz, Miguel Minutti-Meza, Matt Pinnuck, Steve O’Byrne, Shiva Rajgopal, Mario Schabus, Stefan Schantl, Richard Sloan, Dan Taylor, Jake Thomas, Jake Thornock, Stephen Walker, Charlie Wang, Yihong Wang, Eddie Watts, and Anastasia Zakolyukina.

We also thank the many students who suffered through earlier versions of the materials here, including students at Deakin, Harvard, Melbourne, Michigan, and Wharton. Quinn Swanquist taught a PhD class at the University of Alabama using an earlier version of the book. We thank Quinn and his students—Susan Rykowski, Amy Mathews, and Jack Archer—for providing detailed feedback.

Some notes on style

We follow British (hence Australian) conventions for the most part. Reflecting the enduring influence of a Pocket Oxford Dictionary one of us received at age seven, we tend to use “-ize” spellings instead of “-ise” spellings (in any case, these are more familiar to American readers). Also we likely use the Oxford comma more often than not. One benefit of our choice is that we do not have to follow the prescription of American English that commas and full stops (periods) always go inside quotes and can instead put them where they naturally belong (i.e., where speakers of languages other than American English put them) even if this produces sentences that may look odd to some American readers. (It’s hard to disagree with Hadley Wickham—the lead author of the Tidyverse—on this point: “That is literally the stupidest rule in American English and I refuse to follow it.”)

For code, we largely follow the Tidyverse style guide for R code, except that we often put the first item after the assignment operator (<-) on a new line.
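As a small illustration of these conventions (a made-up snippet, not code from the book), note the native pipe and the line break after the assignment operator:

```r
# Filter the built-in mtcars data to four-cylinder cars and
# compute their mean fuel economy, using the native pipe (|>).
mean_mpg <-
  mtcars |>
  subset(cyl == 4) |>
  with(mean(mpg))

mean_mpg
```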

Recent and future updates

Recent changes

  • March 2024
    • Finalized chapter on matching
    • Replaced discussion of history of econometrics in 4  Causal inference
    • Extensive proofreading
    • More discussion of code
    • Added “alt text” for most figures
  • February 2024
    • Many changes to increase consistency of code
    • Switch to use of stringr functions throughout
    • Improved adherence to R style guidelines
    • Added many index entries (PDF version)
  • November 2023
    • Added chapter on GLMs
    • Added appendix on parquet data
  • October 2023
    • Added material on extreme values and matching
    • Prepared templates for Part III
  • July 2023
    • Converted source code from bookdown to Quarto. One benefit is a much better search engine for this site
    • Switched from the magrittr pipe (%>%) to the native pipe (|>).
    • Updated references to “R for Data Science” given recent release of the second edition of that book
    • Switched to native form of anonymous functions (\(x))
    • Migrated from stargazer to modelsummary
    • Migrated from lfe to fixest
  • April 2023
    • Prepared templates for most of Part I and Part II
    • Polished material on the efficient markets hypothesis
    • Polished chapter on event studies
  • September 2022
    • Added material on Zhang (2007) to 13  Event studies
    • Refined chapter on matching
  • August 2022
    • Added material on Beaver (1968).
  • July 2022
    • Added more material to SQL primer (appendix)
  • April 2022
    • Filled out chapter on accrual anomaly (Sloan, 1996)
    • Added chapter on earnings management mostly focused on DSS (1995)
  • January 2022
    • Added separate chapter on FFJR (1969)
    • Added separate chapter on Ball and Brown (1968)
  • November 2021
    • Added chapter on RDD
    • Added simulation from Leone et al. (2019)
  • October 2021
    • Added chapter on natural experiments
  • July 2021
    • Added chapter on panel data
    • Extensive revisions to material on IV

Materials to come

Below is our current work plan. Most significant changes are likely to be limited to online supplements as we prepare the text for publication.

Methods (online 2024)

  • Complete chapter on selection models

More accruals (online 2024)

  • Build up simulations using (somewhat) realistic models of actual business and accounting processes
  • Coverage of accrual quality models, including Dechow and Dichev (2002)

Structural models (online 2024)

  • Include material on structural models from Gow, Larcker and Reiss (2016)
  • Explicitly discuss weaknesses implicit in Gow, Larcker and Reiss (2016) material
  • Add an application, perhaps the one from Bertomeu, Beyer, and Taylor (“BBT”)
  • Examine a modification of BBT’s model

If you have suggestions for the book or requests, please feel free to contact either Ian or Tony. Alternatively, you may create a new issue describing your suggestion in the repository for the companion package for this course.

  1. Go to wrds-www.wharton.upenn.edu/register to do so.