Empirical Research in Accounting: Tools and Methods

Course book for accounting research

Authors

Affiliations

Ian D. Gow

Unaffiliated

Tongqing Ding

University of Melbourne

Published

10 Jul 2026

Preface


Note

This is the published R edition of the book. A Python version of the book is available at era_pl_book.

Welcome

This is the on-line version of Empirical Research in Accounting: Tools and Methods by Ian D. Gow and Tony Ding, which was published by CRC Press in December 2024.

This book provides a course on financial accounting research that begins at an upper-undergraduate (“honours”) or introductory PhD level. As with most PhD courses, one goal of this course is to prepare PhD students to take further research courses and to go on to do their own research. Another goal is to provide students with a set of skills that is useful in other domains, such as consulting or finance. This second goal stems from the origins of parts of this course in a joint honours-PhD course at the University of Melbourne, where the honours students are undergraduates completing an additional year of study with a focus on research. While some honours students progress to PhD studies, most elect to take jobs in industry, such as consulting, auditing, or public service.

Features of this book

Because of some features of this book, a course based on it will differ from a more traditional PhD-level course in several respects, which we discuss here.

Pedagogically driven selection of papers

Many syllabuses for PhD courses in accounting focus on recent papers with a view to giving students a sense of the current themes and trends in research to help students spot gaps in the literature that they can fill with their own research. We view such courses as complementary to this course, but take a different approach.

Because we aim to provide a more fundamental understanding of accounting research, our selection of papers is driven more by pedagogical goals than by an attempt to represent the current state of play in accounting research. In some cases, this means covering older papers (e.g., Ball and Brown, 1968), but in other cases we use a recent paper that features a core idea or approach.

Incorporation of data analysis skills

A second feature that distinguishes this course from most PhD courses in accounting is an emphasis on data analysis skills, which are woven into the course throughout. While it might have been possible to make this a course focused exclusively on such skills, it is our view that these skills are best learned through applying them to real research questions. Conversely, we believe that being able to pull data, run simulations, and get more involved with critical elements of the research process engenders a better understanding of research.

We build data analysis and computing skills into the course at each step in a systematic way. In practice, research computing skills are the bread and butter of a researcher’s toolkit but are generally neglected in PhD programs’ formal curricula. The prevailing ethos seems to be that research computing skills are acquired informally from other students, through research assistantships and collaboration with faculty members, and so on.

In some doctoral programs, this approach may work to a degree. But in many doctoral programs, such informal learning fails to prepare students adequately. For example, if students collaborate with faculty members on papers for which the students do the data analysis with little or no hands-on guidance from the faculty member, then opportunities for clear and comprehensive guidance are limited.

Greater emphasis on research design and methods

Accounting research is overwhelmingly an empirical discipline seeking to draw causal inferences and, as such, much of research training should focus on research design issues. Part III of the book examines causal inference in depth, including natural experiments, regression discontinuity designs, instrumental variables, and fixed effects. However, we will see that a common belief is flawed: namely, that it is these techniques, rather than the settings in which they can be deployed, that offer hope for warranted causal inference. Throughout the course, we will offer a broader set of tools for making inferences about real-world phenomena.

In this course, we strive to give students the skills needed to conduct the analyses that we cover. We believe that accounting researchers’ understanding of statistical and econometric techniques is more likely to be enhanced by hands-on simulation analysis than by analysis of the consistency and asymptotic variance of estimators. By building in the data analysis skills needed to perform such simulations, we hope that this course provides a platform for accounting researchers to think more carefully about the properties of their estimators. A core commitment we make in this book is that every analysis we present can be conducted by the reader using the code found herein.
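To give a flavour of this kind of simulation analysis, here is a minimal sketch in base R. It is not code from a later chapter: it assumes a hypothetical data-generating process, y = 1 + 2x + e with standard normal x and e, and repeatedly estimates the OLS slope to trace out the estimator’s sampling distribution. The names sim_slope() and slopes are illustrative only.

```r
set.seed(2024)  # make the simulation reproducible

# Hypothetical data-generating process: y = 1 + 2 * x + e,
# with x and e drawn independently from standard normal distributions
sim_slope <- function(n = 100) {
  x <- rnorm(n)
  e <- rnorm(n)
  y <- 1 + 2 * x + e
  coef(lm(y ~ x))[["x"]]  # OLS estimate of the slope on x
}

# Repeat the experiment many times to trace out the sampling distribution
slopes <- replicate(1000, sim_slope())

mean(slopes)  # should be close to the true slope of 2
sd(slopes)    # simulated standard deviation of the OLS slope estimator
```

Under this set-up, the average estimate should sit very close to the true slope of 2 and the standard deviation of the estimates should be roughly 0.1; changing n or the distribution of e in sim_slope() shows how these properties respond.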

Prerequisites

We presume prior knowledge of some topics and access to certain computing resources. We have endeavoured to keep these requirements to a minimum.

Knowledge of accounting and business. In terms of accounting, we assume a solid understanding of the content of an introductory financial accounting course and enough understanding of business to make sense of accounting.
Prior exposure to statistics and econometrics. Some familiarity with the elements of statistical inference and ordinary least-squares (OLS) regression will be helpful. While we do provide some introductory material on these topics in Chapters 3 (Regression fundamentals) to 5 (Statistical inference), this material is selective and it may help to use a textbook to brush up on these topics as you work through the book.
Access to academic journals. The course makes extensive use of papers in academic journals. If you are a faculty member, researcher, or student at an academic institution, then you should be able to access the papers we use through your library. Some universities provide a service (perhaps for a fee) for alumni access to academic journals. Unfortunately, if you cannot access the papers, it will be difficult to make full use of this course book beyond Part I.
Access to a computer and the internet. To accommodate a broader audience and minimize set-up costs, we assume nothing of the reader other than access to a computer and the internet and basic proficiency in using these.
Ability to install R and RStudio. While this course focuses on R, a popular open-source programming language for statistics and data science, we do not assume any prior knowledge of R. Chapter 2 (Describing data) provides an introductory tutorial on R and RStudio. We hope that this provides the required platform for later chapters whether you are a complete novice or coming to R from another statistical software system. Throughout the book, we direct the reader to additional resources that provide more detail than we can fit in this book. With the occasional detour to resources like R for Data Science, working through this book should provide the reader with a strong set of data science skills.¹ Our hope is that these skills will prove useful whether you continue down the path of academic research or pursue a position in practice.
Access to WRDS data. Because it is difficult to go very far in accounting research without WRDS data, this book is targeted at the reader who has a WRDS account. If you do not have a WRDS account but are eligible for one (e.g., you are a graduate student, researcher, or faculty member at a WRDS-subscribing institution), then you should apply for such an account.²

A guide for readers

The book is written so as to be fairly accessible to a novice reading independently (subject to the prerequisites outlined above). We recommend that such readers work through the first few chapters in order, including running the code, completing the exercises, and thinking about the discussion questions. That said, some elements of the exercises and discussion questions are subtle, and having an instructor or someone to discuss them with will help you to get the full value from this material.

We hope that this book will be useful to a variety of readers, learners, and instructors beyond the novices. Below we discuss possible approaches for some hypothetical readers.

I am interested in learning more about issues related to research design and causal inference. You might find you can dive into Chapters 2 (Describing data) and 4 (Causal inference), then move to Chapter 17 (Natural experiments) and subsequent chapters.
I am interested in learning more about issues related to research design and causal inference, but I don’t really want to learn R. The plan for the hypothetical reader in the previous bullet point likely works. Even if you aren’t interested in learning R, we think that running the code helps solidify understanding, and that what the code is doing is sufficiently clear that copying and pasting the code into R on your own computer should be enough to get the gist of what is going on.
I have heard about R and would like to learn more about it. Chapters 2 (Describing data) and 3 (Regression fundamentals) cover some of the basics. But if you’re already proficient in something like SAS or Stata, you may find it easy to skip those chapters (after meeting the prerequisites above), go to a chapter that aligns with your research interests, and see if you can figure out what the code is doing as you work through it. We have deliberately written the book so that, apart from the initial set-up described in “Setting up your computer”, code in each chapter is independent of that in the others.

Acknowledgements

While this book draws on materials we have been using for many years, work on the book began in earnest in early 2021. Since then we have received help from many others, including code and data, suggestions on content, feedback on drafts, and simple encouragement to persist with the project. We would like to recognize the help of Ulrich Atz, Andrew Baker, Ray Ball, Jeremy Bertomeu, Stu Black, Mark Bradshaw, Philip Brown, Jade Shizhe Chen, Patty Dechow, Jenny Zha Giedt, Lucy Gow, Amy Hutton, Rohit Kattamuri, James Kavourakis, David Larcker, Changju Lee, Andy Leone, Ying Liang, Christian Leuz, Miguel Minutti-Meza, Casey Mulligan, Matt Pinnuck, Steve O’Byrne, Shiva Rajgopal, Mario Schabus, Stefan Schantl, Richard Sloan, Dan Taylor, Jake Thomas, Jake Thornock, Stephen Walker, Charlie Wang, Yihong Wang, Eddie Watts, and Anastasia Zakolyukina.

We also thank the many students who suffered through earlier versions of the materials here, including students at Deakin, Harvard, Melbourne, Michigan, and Wharton. Quinn Swanquist taught a PhD class at the University of Alabama using an earlier version of the book. We thank Quinn and his students—Susan Rykowski, Amy Mathews, and Jack Archer—for providing detailed feedback.

Some notes on style

We follow British (hence Australian) conventions for the most part. Reflecting the enduring influence of a Pocket Oxford Dictionary one of us received at age seven, we tend to use “-ize” spellings instead of “-ise” spellings (in any case, these are more familiar to American readers). Also we likely use the Oxford comma more often than not. One benefit of our choice is that we do not have to follow the prescription of American English that commas and full stops (periods) always go inside quotes and can instead put them where they naturally belong (i.e., where speakers of languages other than American English put them) even if this produces sentences that may look odd to some American readers. (It’s hard to disagree with Hadley Wickham—the lead author of the Tidyverse—on this point: “That is literally the stupidest rule in American English and I refuse to follow it.”)

For code, we largely follow the Tidyverse style guide for R code, except that we often put the first item after the assignment operator (<-) on a new line.
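As an illustration of this convention, here is a minimal sketch with a toy data frame. The values are hypothetical and the column names simply mimic Compustat’s ib (income before extraordinary items) and ceq (common equity). We use base R’s transform() and subset() here only so that the sketch has no package dependencies; the book itself generally uses dplyr verbs such as mutate(). The native pipe (|>) requires R 4.1 or later.

```r
# Toy data with hypothetical values
firm_data <- data.frame(
  gvkey = c("001", "002"),
  ib = c(10, 20),
  ceq = c(100, 160)
)

# The first item after the assignment operator goes on a new line,
# and each step of the pipeline is indented by two spaces
roe_data <-
  firm_data |>
  transform(roe = ib / ceq) |>
  subset(select = c(gvkey, roe))

roe_data
```

Putting firm_data on its own line makes the starting point of the pipeline line up with the steps that follow, which makes longer pipelines easier to scan.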

Timeline of updates

January 2025
- Added errata section to this page
October 2024
- Many corrections in advance of publication
March 2024
- Finalized chapter on matching
- Replaced discussion of history of econometrics in 4 Causal inference
- Extensive proofreading
- More discussion of code
- Added “alt text” for most figures
February 2024
- Many changes to increase consistency of code
- Switch to use of stringr functions throughout
- Improved adherence to R style guidelines
- Added many index entries (PDF version)
January 2024
- Refined chapters in Part I
- Replaced material in 4 Causal inference
- Completed remaining portion of 5 Statistical inference
- Completed 23 Beyond OLS and 24 Extreme values and sensitivity analysis
- Removed incomplete chapter on selection models
- Extensive proofreading
November 2023
- Added chapter on GLMs
- Added appendix on parquet data
October 2023
- Added material on extreme values and matching
- Prepared templates for Part III
July 2023
- Converted source code from bookdown to Quarto. One benefit is a much better search engine for this site
- Switched from the magrittr pipe (%>%) to the native pipe (|>).
- Updated references to R for Data Science given recent release of the second edition of that book
- Switched to native form of anonymous functions (\(x))
- Migrated from stargazer to modelsummary
- Migrated from lfe to fixest
April 2023
- Prepared templates for most of Part I and Part II
- Polished material on the efficient markets hypothesis
- Polished chapter on event studies
February 2023
- Added repository of Quarto templates for exercises.
- Many edits to Part I of the book.
January 2023
- Added “data science bootcamp” chapter (2 Describing data)
- Added chapter on prediction (26 Prediction)
September 2022
- Added material on Zhang (2007) to 13 Event studies
- Refined chapter on matching
August 2022
- Added material on Beaver (1968).
July 2022
- Added more material to SQL primer (appendix)
June 2022
- Organized book into parts. See Structure of the book for more on how the book is structured.
- Initial draft of second chapter on natural experiments
- Added material on evaluating natural experiments and the parallel trends assumption to 19 Natural experiments revisited
April 2022
- Filled out chapter on accrual anomaly (Sloan, 1996)
- Added chapter on earnings management mostly focused on DSS (1995)
January 2022
- Added separate chapter on FFJR (1969)
- Added separate chapter on Ball and Brown (1968)
November 2021
- Added chapter on RDD
- Added simulation from Leone et al. (2019)
October 2021
- Added chapter on natural experiments
July 2021
- Added chapter on panel data
- Extensive revisions to material on IV

If you have comments or requests, please feel free to contact either Ian or Tony. Alternatively, you may create a new issue describing your suggestion in the repository for this course’s companion package.

Errata

A number of minor errors have been detected since the first printing. We have classified the errata into three categories:

typesetting: Nothing is wrong with the text per se, but there is a minor gremlin in typesetting (e.g., extra space). We also include minor code-style issues here.
typo: Incorrect words or minor punctuation issues.
fix: Corrections to code (e.g., due to changes in data).

The corrections are below. The page numbers refer to the place in the hardback version of the book. All of these errata have been fixed in the online version.

Typos and typesetting issues

multiple pages (typesetting): Made spelling of “cutoff” consistent (no longer a mix of “cutoffs” and “cut-offs”).
p. 27 (typesetting): Add space to mutate(roe = ib / ceq).
p. 47 (typo): In “if you highlight the text from test_scores |>”, reference to test_scores should be to camp_scores.
p. 54 (typo): In the first full paragraph, the two references to size should be to cfo to match the code below.
p. 55 (typo): Replace size with cfo in “there is no visually discernible relation between size…”.
p. 59 (typesetting): Extra space after paragraph that begins with “Making causal inferences …”.
p. 60 (typo): Add apostrophe to get “researchers’ focus”.
p. 60 (typo): Remove comma from $\mathbb{E}[X, \epsilon] \neq 0$.
p. 61 (typo): Change “of” to “off” to get “read off and estimate”.
p. 63 (typesetting): Remove space from “text book” for consistency.
p. 63 (typo): Change $\alpha := \left(\alpha_0, \alpha_1, \alpha_2 \right)$ to $\alpha := \left(\alpha_0, \alpha_2, \alpha_3 \right)$.
p. 64 (typo): Change “The section provides” to “This section provides”.
p. 64 (typo): Replace “either $X$ and $Z$, or $Y$ and $Z$” with “$Z$ with $X$ or $Y$”.
p. 66 (typo): Replace “but that we” with “but we”.
p. 68 (typesetting): Add comma after “That is” before “regress”.
p. 69 (typo): Change “generalization of linear model” to “generalization of a linear model”.
p. 71 (typesetting): Paragraph at bottom of page should be indented to make it clearer that it’s part of Q2.
p. 72 (typesetting): Paragraph below Figure 4.7 should go before it to make it clearer that it’s part of Q3.
p. 74 (typo): Word should be “causal” not “casual”.
p. 79 (typo): get_hgt_sample should be get_hgt_sample() to be consistent with other code in the book (i.e., function references always include ()).
p. 95 (typo): Remove word “is” before “addresses concerns”.
p. 95 (typo): Replace words “cross-section than” with “cross-sectional dependence than”.
p. 95 (typo): Replace word “warrant” with “justify”.
p. 96 (typo): Replace “use” in “We first estimate use nest() …” with word “using”.
p. 101 (typo): Replace “cross-section than” with “cross-sectional dependence than”.
p. 103 (typo): Replace “paper” with “papers”.
p. 105 (typo): References to pg should be to db.
p. 105 (typo): Replace “onto” with “on”.
p. 120 (typo): In Q5, floor_date("month", date) should be floor_date(date, "month") to match code below.
p. 122 (typo): Add the words “is that” after “The basic idea of crsp.ccmxpf_lnkhist”.
p. 125 (typo): Remove the unnecessary comma after lead(linkdt) in code counting the number of overlapping rows.
p. 125 (typo): In footnote, replace “actually already use” with “actually already used”.
p. 127 (typo): Add by = before join_by() in code creating rdq_permnos. The change has no effect but maintains consistency with other nearby queries.
p. 131 (typo): Reference to gkvey should be to gvkey.
p. 138 (typo): Remove (unhelpful) hint from Q6.
p. 148: Replace http in URLs with https. Use mode = "wb" with download.file().
p. 152: Use mode = "wb" with download.file().
p. 160 (typesetting): In Q2 amount/amount_left_on_table should be amount / amount_left_on_table to be consistent with other code in the book.
p. 161: Use mode = "wb" with download.file().
p. 177 (typo): In footnote, “backed” should be “backend”.
p. 182 (typo): Add the word “for” after “accounting income” to get “accounting income for a period”.
p. 184 (typo): Sentence beginning with “Does this also mean …” should end with “?” not “.”.
p. 185 (typo): Q5 should end with “?” not “.”.
p. 192: Add full stop after crsp.msf (just before assignment to me_values).
p. 195 (typo): In Q6, “the function pivot_wider” should be “pivot_wider()” to be consistent with other code in the book (i.e., function references always include ()).
p. 197 (typo): Delete the word “the” (fourth word on page).
p. 197 (typo): Add word “you” after “Why do” in Q6.
p. 213 (typo): Replace “Inspired by , see 12” with “Inspired by Beaver (1968) (see Chapter 12)”.
p. 225 (typo): Replace year and month with year() and month() respectively to be consistent with the rest of the book.
p. 245 (typo): Changed footnote to: “This is one definition that can be tightened and that varies by context.”
p. 246 (typo): Formula for accruals comes from p. 107 of Hribar and Collins (2002), not p. 10.
p. 246 (typo): Change $\Delta \mathit{DEP}$ to $\mathit{DEP}$.
p. 247 (typo): Change “cash flow from operations” to “cash flows from operations” in Q3.
p. 253 (typo): Replaced the last sentence of the discussion of the finance indicator variable. Note that the way this variable is used in the code is less than ideal (the filter should be applied earlier), and this could be a discussion point in class. I retained the existing code to avoid creating an inconsistency between the print and online books.
p. 255 (typesetting): Fixed indentation for code creating ccm_link.
p. 255 (typo): Replaced “permnos” with “permno values”.
p. 257 (typo): Sentence starting “Some degree of …” should have “… similarity of the values seen in Table 15.2 …”.
p. 263 (typo): Change “cash flow from operating” to “cash flows from operating” in Q5.
p. 268 (typesetting): Added page reference for Green et al. (2011) quotation in Q2. Also added citations in the quote to references list for the book.
p. 268 (typo): Reference to Q1 in Q3 should be to Q2.
p. 268 (typesetting): Word “Hint” is not italicized to match other chapters.
p. 269 (typo): Replace ‘just one form a wide class of “discretionary behaviour”’ with ‘just one form in a wide class of “discretionary behaviours”’.
p. 277 (typo): References to the variable type in results should be to the variable measure.
p. 278 (typo): Replace “earning management” with “earnings management” in table caption.
p. 285 (typo): Replace “earning management” with “earnings management”.
p. 286 (typo): Output from system.time() was mistakenly omitted.
p. 286 (typesetting): Fixed spacing in function h_test_5().
p. 295 (typo): Replace “their seriousness” with “the seriousness”.
p. 304 (typo): Move comma from after “analysis” to after “above”.
p. 311 (typo): Add the word “the” before “MHSA website”.
p. 315 (typo): Q3 should start with “According to the …” not “According the …”.
p. 349 (typesetting): The filter in Q8 should be filter() to be consistent with other code in the book.
p. 352 (typesetting): Added explanatory sentence: “An alternative approach uses future_map() from furrr to run a variant of this code in parallel.”
p. 373 (typo): Replace “namely means” with “i.e.,”.
p. 380 (typo): Replace “perhaps help us” with “perhaps helps us”.
p. 383 (typesetting): Code creating disclosure was indented by four spaces. Should be two spaces.
p. 386 (clarification): Add words “the first two columns of” to “The results reported in”. Also add clarifying footnote.
p. 386 (typo): Replace “year related to adoption” with “year relative to the year of adoption”.
pp. 388–389 (typesetting): Footnote on p. 389 about align argument belongs on p. 388.
p. 389 (typo): First sentence of Q1 should end with “.” not with “?”.
p. 394 (typo): “$\gamma_2 D_i x_i$” should be “$\gamma_1 D_i x_i$”.
p. 396 (typo): “In response to such concerns.” should be “In response to such concerns,”.
p. 397 (typo): Add word “not” to get “did not require auditor attestation”.
p. 398 (typo): “Use the firm …” should be “Using the firm …”.
p. 403 (typo): Reference to auopin should be to auopic.
p. 403 (typo): Sentence ending with “each” should end with “each side of the cutoff”.
p. 404 (typo): In caption for Figure 22.2, add word “indicator” to say “SOX 404 report indicator” to make caption consistent with that for Figure 22.1.
p. 409 (typo): “inherently estimated” should be “inherently estimates”.
p. 411 (typesetting): Removed extra space in fyear < 11.
p. 419 (typo): Denominator in expression at the bottom of the page should be $x_j$, not $x_i$.
p. 420 (typo): Should be “… density function and the coefficient $\beta_j$” (i.e., not $\beta_i$).
p. 427 (typo): Remove word “below” from “following line of code below”.
p. 427 (typo): Replace “you many not need” with “you may not need”.
p. 431 (typo): Remove word “named” from sentence ending “above named”.
p. 431 (typo): Add word “function” after load_parquet() in footnote 13. Delete word “function” after read_parquet() in the same footnote.
p. 433 (typo): “increase from 1994 to 1995” should be “a significant increase from 1994 to 1995”.
p. 444 (typo): Replace “filings” with “filing” in “… around 10-K filings dates”.
p. 445 (typo): Remove word “use” from “can use collect”.
p. 448 (typo): “we saw winsorization in Chapter 25” should refer to Chapters 19 and 22.
p. 449: Remove all references to company, which was not used in the code.
p. 449 (typo): Remove words “fixed effects and” (code does not include fixed effects).
p. 452 (typo): “left half of Table 5 … (i.e., $\beta_1 = 0$)” should be “left half of Table 5 … (i.e., $\beta_1 = 0.8$)”.
p. 455 (typo): Change “assumed the independent of regressors” to “assumed to be independent of the regressors”.
p. 455 (typo): “strikingly difference estimates” should be “strikingly different estimates”.
p. 458 (typo): Add word “the” to “claim that Poisson regression”.
p. 459 (typo): Add missing left-quote around “misunderstanding” in Q5.
p. 459 (typo): “versions of the confounding variable” should be “versions of the dependent variables”.
p. 463 (typo): “the impact of” should be “the impacts of”.
p. 466 (typo): Delete “different” from “consider different two”.
p. 474 (typo): Replace “use matchit() function the” with “use the matchit() function from the”.
p. 476 (typo): Delete second of two commas in “as, , respectively”.
p. 482 (typo): Delete word “of” after “Equation (1)” in Q2.
p. 482: For clarity’s sake, replace “results presented above” in Q1 with “results presented in Table 25.7”.
p. 504 (typesetting): Change auc in Q4 to auc() to be consistent with other code in the book.
p. 559 (typesetting): Replace the two-line comment beginning with # The segment data with # compseg. The WRDS database has changed since that comment was written, so the comment is obsolete.

Fixes

While the issues above are basically cosmetic, some of the following changes are needed to get the code to run. Other changes are marked “cosmetic”, “deprecated” (i.e., previous code used a deprecated approach), “performance” (i.e., previous code worked, but was slow), or “enhancement”.

p. 154: File name for Ritter IPO data is now money-left-on-the-table-in-IPOs.pdf. URLs have been updated online.
p. 188 (performance): Code producing rets_all was too slow on WRDS PostgreSQL server. See online book for replacement code. Result of code is unchanged.
p. 190: Changes in data on Ken French’s website broke the code here in two places. Code in the online version has been updated. In the first code block on this page, the word Average needed to be added to the search strings. In the second block of code, the "^Dec " regular expression needed to become "-Dec" in the two places it was used.
p. 191: There should be a code block following the paragraph beginning “The second set of data …”, but this was omitted in the printed edition. This can now be found in the online version.
p. 191: Changes in data on Ken French’s website broke the code in the last code block in two places. First, skip = 1 needs to be skip = 2. Second, for the argument n_max, the - 3 needs to be replaced by - 4. The corrected code can now be found in the online version.
p. 194 (deprecated): Changed code for bs_coefs() to avoid superseded map_df(). Result of code is unchanged and previous code still works.
p. 239 (performance): Switched from list_rbind() to bind_rows() in calculation of results. Result of code is unchanged, but it seems that bind_rows() runs more reliably with plan(multisession).
p. 240: Updated the decile breakpoints used to create results_deciles so that values below the previous quarter’s minimum earnings surprise are assigned to Decile 1 and values above the previous quarter’s maximum earnings surprise are assigned to Decile 10. The online version now sets the lower bound to -Inf, the upper bound to Inf, and uses include.lowest = TRUE in cut().
p. 249: The expression in the last line of the simulation should be df$se[i] <- df$se[i-1] + df$ni[i] - df$div[i], not df$se[i] <- df$se[i-1] + df$ni[i] - df$ni[i]. Because the dividend payout ratio is 100%, nothing in the book is affected by this issue.
p. 255 (performance): Added collect() |> copy_inline(db, df = _) at end of pipeline creating crsp_link to address abysmal performance with previous code.
p. 257: Added floor_date(x, "month") to calculations of start_month and end_month. Previous code would result in loss of months in calculation of returns. Only output affected is Table 15.11 and differences are small.
p. 268 (cosmetic): Split code calculating hedge_ret over two lines so that code does not go outside the box.
p. 279 (cosmetic): Replaced h_test() function with clearer code, though code in printed book works fine.
p. 286 (cosmetic): Replaced h_test_5() function with clearer code, though code in printed book works fine.
p. 286 (performance): Code producing power_plot_data was too slow. See online book for replacement code. Result of code is unchanged.
p. 330 (performance): Switched from list_rbind() to bind_rows() in calculation of results. Result of code is unchanged, but it seems that bind_rows() runs more reliably with plan(multisession).
p. 342 (cosmetic): The use of .groups = "drop" with mutate() has no effect and can be eliminated.
p. 342 (deprecated): Replaced case_when() with if()/else if()/else.
p. 343 (cosmetic): Remove unused das argument from get_das().
p. 343 (deprecated): Replaced group_by() |> do() with nest_by() |> summarize().
p. 346 (cosmetic): Wrapped feols() in the reg_year_fe() function inside suppressWarnings() to suppress warnings emitted by cov().
p. 349 (cosmetic): Replace winsorize, prob = 0.01 with \(x) winsorize(x, prob = 0.01).
p. 352 (performance): Switched from list_rbind() to bind_rows() in calculation of rand_results. Result of code is unchanged, but it seems that bind_rows() runs more reliably with plan(multisession).
p. 382: Code creating biggest_customers was incorrect resulting in duplicates. Corrected code is found in online chapter. Change only affects Figure 21.1.
p. 387 (enhancement): Changed factor_t() and year_diff() functions to add a ref argument. These changes have no effect on any analysis in the book, but facilitate answering questions in exercises.
p. 398 (cosmetic): The use of .groups = "drop" with mutate() has no effect and can be eliminated.
p. 398 (deprecated): As top_n() is deprecated, replace with slice_max().
p. 420 (deprecated): Replaced case_when() with if()/else if()/else.
p. 421 (deprecated): Replaced case_when() with if()/else if()/else.
p. 430: Changes to numbers: (63, 74) becomes (63, 77); (75, 86) becomes (78, 89); (87, 98) becomes (90, 101); (99, 150) becomes (102, NA).
p. 432: Changes to numbers: (63, 74) becomes (63, 77); (75, 86) becomes (78, 89); (87, 98) becomes (90, 101); (99, 150) becomes (102, NA).
p. 450 (cosmetic): In first code chunk, big_n appears twice in select(). Delete second appearance.
p. 465 (deprecated): Replaced case_when() with if()/else if()/else.
pp. 552–560 (deprecated): Updated to incorporate new db2pq R package.
p. 555 (enhancement): Altered code to handle missing values in a way that matches code in Chapter 21 (Panel data).
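The p. 240 fix above (open-ended decile breakpoints) can be illustrated with a minimal sketch using hypothetical earnings-surprise values. The variable names prev_surprise, curr_surprise, inner_breaks, and old_breaks are illustrative only, and the online book’s code for results_deciles may compute its breakpoints differently. The sketch contrasts a hypothetical version whose outer bounds are the previous quarter’s minimum and maximum with the approach described in the fix: -Inf and Inf as outer bounds plus include.lowest = TRUE in cut().

```r
# Hypothetical previous-quarter earnings surprises, used only to set breakpoints
prev_surprise <- c(-0.05, -0.02, 0, 0.01, 0.03, 0.04, 0.06, 0.08, 0.10, 0.12)

# Hypothetical current-quarter surprises; two lie outside the previous range
curr_surprise <- c(-0.20, 0.05, 0.50)

# Interior breakpoints (10th to 90th percentiles) from the previous quarter
inner_breaks <- quantile(prev_surprise, probs = seq(0.1, 0.9, by = 0.1),
                         names = FALSE)

# Hypothetical alternative: outer bounds at the previous min and max, so
# out-of-range current values get NA rather than a decile
old_breaks <- c(min(prev_surprise), inner_breaks, max(prev_surprise))
old_deciles <- cut(curr_surprise, breaks = old_breaks, labels = FALSE,
                   include.lowest = TRUE)

# Approach described in the fix: open-ended outer bounds, so values below
# the previous minimum fall in Decile 1 and values above the maximum in Decile 10
breaks <- c(-Inf, inner_breaks, Inf)
deciles <- cut(curr_surprise, breaks = breaks, labels = FALSE,
               include.lowest = TRUE)

old_deciles
deciles
```

Note that cut() stops with an error if the breakpoints are not unique, which can happen with heavily tied data, so code applied to real data may need to guard against that case.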

¹ R for Data Science is available at https://r4ds.hadley.nz/
² Go to https://wrds-www.wharton.upenn.edu/register to do so.