7 Financial statements: A first look
In this chapter, we provide an introduction to data on financial statements, which are the primary focus of financial reporting. While a lot of contemporary accounting research has only a tangential connection to financial reporting, it’s not clear there would be a field of accounting research without financial reporting, and most accounting research papers use financial reporting data to some degree (e.g., as controls). In this chapter, we will focus on annual financial statement data of North American firms, as provided by Compustat.
7.1 Setting up WRDS
Academic researchers generally get Compustat data through Wharton Research Data Services, more commonly referred to as WRDS (pronounced like “words”). In the following, we assume that you have set up your computer (as described in section 1.2.1), that you have a WRDS account, and that you are connected to the internet.
In this chapter, we will use the R libraries specified in the following code. We have seen dplyr and ggplot2 in earlier chapters, but the DBI library is new. The DBI library provides a standardized interface to relational database systems, including PostgreSQL (the system used by WRDS), MySQL, and SQLite.
First, let’s load the required packages.
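At a minimum, this means attaching the three packages discussed above:

library(DBI)
library(dplyr)
library(ggplot2)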
Second, let’s connect to the WRDS database. To actually use the code below, you should replace your_WRDS_ID with your actual WRDS ID. You may also need to add a line Sys.setenv(PGPASSWORD="your_password") as the third line of code.53
Sys.setenv(PGHOST = "wrds-pgdata.wharton.upenn.edu",
           PGPORT = 9737L,
           PGUSER = "your_WRDS_ID",
           PGDATABASE = "wrds")
pg <- dbConnect(RPostgres::Postgres(), bigint = "integer")
The first line of code above sets up environment variables containing information that R can use as defaults when connecting to a PostgreSQL database.
The second line of code creates the actual connection to the database and assigns it to the variable pg. An alternative to the above code would be to pass connection information to the dbConnect() function directly, as shown in the code below.54
pg <- dbConnect(RPostgres::Postgres(),
                host = "wrds-pgdata.wharton.upenn.edu",
                port = 9737L,
                # user = "your_WRDS_ID",
                # password = "a_really_good_password",
                dbname = "wrds",
                bigint = "integer")
However, for reasons we discuss below, we recommend using the environment variable-based approach above.55
Now that we have established a connection to the WRDS database, we can get some data. Data sources on WRDS are organized into libraries according to their source (in PostgreSQL, these libraries are called schemas, but we will stick with the WRDS terminology for now).
In this chapter we will focus on Compustat data, which is found in the comp library. While there are other sources of financial statement data, in accounting research the pre-eminent data source is Compustat, which is typically obtained via WRDS. We first look at the table comp.company, which contains information about the companies in Compustat’s North American database (the equivalent table in Compustat’s “Global” database is comp.g_company).
By running the following code, we create R objects that point to the data on the WRDS database and behave a lot like the data frames you have seen in earlier chapters.
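One way to do this is to call the tbl() function with an SQL query for each table of interest; here is a sketch that creates objects for comp.company (used below) and comp.funda (used in the next section). The same calls reappear in the exercises at the end of the chapter.

company <- tbl(pg, sql("SELECT * FROM comp.company"))
funda <- tbl(pg, sql("SELECT * FROM comp.funda"))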
What have we done here? In the first line, we have created an object company that “points to” the data returned by running the SQL query SELECT * FROM comp.company on the data source pg (which is the WRDS database).
From the help for the tbl function, we learn that in tbl(src, ...), src refers to the data source. Here pg is an object representing the connection to the WRDS PostgreSQL database. The ... allows us to pass other arguments to the function. In this case, we provide sql("SELECT * FROM comp.company"), which tells the tbl function that we’re passing the SQL query SELECT * FROM comp.company. This code selects all variables (*) and all rows from the table comp.company.
But what exactly is company? One way to get information about company is to use the class function:
class(company)
## [1] "tbl_PqConnection" "tbl_dbi" "tbl_sql" "tbl_lazy"
## [5] "tbl"
The output here is a little complicated for a new user, but the basic idea is that company is a tbl_PqConnection object, which is a kind of tbl_dbi, which is a kind of tbl_sql, which is a kind of tbl_lazy, which is a kind of tbl, which means it is a tibble of some kind.
The critical idea here is that company is a tbl_sql object, which means it behaves like a tibble, but is effectively a reference to a table or query in a relational database.
For want of a better term, company is a remote data frame.
While there are some technical details here that are unimportant for our discussion, the important points are that, as a tbl_sql, company has three important properties:
- It’s not a data frame in memory on our computer. This means that we don’t need to download all of the data and load it into memory.
- It’s not a data frame in memory on the remote server (in this case, the WRDS PostgreSQL server). This means that we have not loaded the full data set into memory even on the remote computer.
- Notwithstanding the previous two points, company behaves in many important ways just like a data frame.
We can use filter to get only rows meeting certain conditions and we can use select to indicate which columns we want to retrieve:
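For example, the following query (a sketch of the code behind the output below; the same pipeline appears again with show_query() shortly) keeps companies in GICS sub-industry 45202030 whose names begin with “A”:

company %>%
  filter(gsubind == "45202030",
         substr(conm, 1, 1) == "A") %>%
  select(gvkey, conm, state)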
## # Source: SQL [3 x 3]
## # Database: postgres [iangow@/tmp:5432/iangow]
## gvkey conm state
## <chr> <chr> <chr>
## 1 001690 APPLE INC CA
## 2 001820 ASTRONOVA INC RI
## 3 027925 AVID TECHNOLOGY INC MA
When we are dealing with a remote data frame, the dplyr code that we write is automatically translated into SQL, a specialized language for manipulating tabular data “used by pretty much every database in existence” (Appendix C has more on SQL and dplyr).
We can inspect the translated SQL using the show_query() function:
company %>%
  filter(gsubind == "45202030",
         substr(conm, 1, 1) == "A") %>%
  select(gvkey, conm, state) %>%
  show_query()
## <SQL>
## SELECT "gvkey", "conm", "state"
## FROM (SELECT * FROM comp.company) "q01"
## WHERE ("gsubind" = '45202030') AND (SUBSTR("conm", 1, 1) = 'A')
Here we see that the filter verb is translated into equivalent SQL using WHERE.
While remote data frames behave a lot like local data frames, they are not quite the same thing and, for many purposes, we need to use local data frames.
For example, the lm() function for fitting regressions assumes that the object provided as its data argument is a data frame, and it will not work if we give it a remote data frame instead. Converting a table from a remote data frame to a local one is achieved using the collect verb.
df_company <-
  company %>%
  filter(gsubind == "45202030",
         substr(conm, 1, 1) == "A") %>%
  select(gvkey, conm, state) %>%
  collect()
class(df_company)
## [1] "tbl_df" "tbl" "data.frame"
df_company
## # A tibble: 3 × 3
## gvkey conm state
## <chr> <chr> <chr>
## 1 001690 APPLE INC CA
## 2 001820 ASTRONOVA INC RI
## 3 027925 AVID TECHNOLOGY INC MA
We use collect and store the result in df_company. Looking at class(df_company), we see that it is not only a “tibble” (tbl_df), but also a data.frame, meaning that we can pass it to functions (such as lm) that expect data.frame objects.
7.2 Financial statement data
The two core tables for North American data are comp.funda, which contains annual financial statement data, and comp.fundq, which contains quarterly data.
We can examine the funda object created above by typing its name in the R console:
funda
## # Source: SQL [?? x 948]
## # Database: postgres [iangow@/tmp:5432/iangow]
## gvkey datadate fyear indfmt consol popsrc datafmt tic cusip conm acctchg
## <chr> <date> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 0012… 2019-12-31 2019 INDL C D STD ALX 0147… ALEX… ASU16-…
## 2 0012… 2019-12-31 2019 INDL C D SUMM_S… ALX 0147… ALEX… <NA>
## 3 0012… 2020-12-31 2020 FS C D STD ALX 0147… ALEX… <NA>
## 4 0012… 1983-05-31 1982 INDL C D STD CGYNQ 1391… CAPC… <NA>
## 5 0017… 1959-06-30 1959 INDL C D STD WM1 9393… WASH… <NA>
## 6 0012… 2020-12-31 2020 INDL C D STD ALX 0147… ALEX… <NA>
## 7 0012… 2020-12-31 2020 INDL C D SUMM_S… ALX 0147… ALEX… <NA>
## 8 0012… 2021-12-31 2021 FS C D STD ALX 0147… ALEX… <NA>
## 9 0012… 1984-05-31 1983 INDL C D STD CGYNQ 1391… CAPC… <NA>
## 10 0012… 2021-12-31 2021 INDL C D STD ALX 0147… ALEX… <NA>
## # … with more rows, and 937 more variables: acctstd <chr>, acqmeth <chr>,
## # adrr <dbl>, ajex <dbl>, ajp <dbl>, bspr <chr>, compst <chr>, curcd <chr>,
## # curncd <chr>, currtr <dbl>, curuscn <dbl>, final <chr>, fyr <int>,
## # ismod <int>, ltcm <chr>, ogm <chr>, pddur <int>, scf <int>, src <int>,
## # stalt <chr>, udpl <chr>, upd <int>, apdedate <date>, fdate <date>,
## # pdate <date>, acchg <dbl>, acco <dbl>, accrt <dbl>, acdo <dbl>, aco <dbl>,
## # acodo <dbl>, acominc <dbl>, acox <dbl>, acoxar <dbl>, acqao <dbl>, …
From this code we can see that comp.funda is a very wide table, with 948 columns. With that many columns, it’s a moderately large table (at the time of writing, 1212 MB in PostgreSQL), but by focusing on certain columns, the size can be dramatically reduced.
Let’s learn more about the funda table. First, how many rows does it have?
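One way to answer this uses count() and pull(), which are discussed just below:

funda %>%
  count() %>%
  pull()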
## [1] 869854
From this snippet we can see that comp.funda has 869854 rows. As covered in Chapter 5 of R4DS, funda %>% count() is equivalent to count(funda). When given an ungrouped data frame as the first argument, the count function simply counts the number of rows in the data frame.56
In this case funda %>% count() returns a server-side data frame (tbl_sql) with a single column (n). The command pull extracts that single column as a vector in R.57
While not evident from casual inspection of the table, the primary key of comp.funda is (gvkey, datadate, indfmt, consol, popsrc, datafmt). One requirement for a valid primary key is that each value of the primary key is associated with only one row of the data set.58 That this is a valid key can be seen from the following code, which counts the number of rows associated with each set of values for (gvkey, datadate, indfmt, consol, popsrc, datafmt), stores that number in num_rows, and then displays the various values of num_rows.
funda %>%
  group_by(gvkey, datadate, indfmt, consol, popsrc, datafmt) %>%
  summarize(num_rows = n(), .groups = "drop") %>%
  count(num_rows)
## # Source: SQL [1 x 2]
## # Database: postgres [iangow@/tmp:5432/iangow]
## num_rows n
## <int> <int>
## 1 1 869854
Because num_rows is equal to 1 in 869854 cases and this is the number of rows in the dataset, we have (gvkey, datadate, indfmt, consol, popsrc, datafmt) as a valid key.
Another requirement of a primary key is that none of the columns contains null values (in SQL, NULL; in R, NA). The code below checks this.59
funda %>%
  # A row with a NULL in any of the key columns would be counted here
  filter(is.na(gvkey) | is.na(datadate) | is.na(indfmt) |
         is.na(consol) | is.na(popsrc) | is.na(datafmt)) %>%
  count() %>%
  pull()
## [1] 0
Teasing apart the primary key, we have a firm identifier (gvkey) and a financial period identifier (datadate), along with four variables that are more technical in nature: indfmt, consol, popsrc, and datafmt.
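We can tabulate the combinations of these four variables with a query along the following lines (a sketch of the code behind the output below, which counts each combination and sorts in descending order of frequency):

funda %>%
  count(indfmt, consol, popsrc, datafmt) %>%
  arrange(desc(n))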
## # Source: SQL [12 x 5]
## # Database: postgres [iangow@/tmp:5432/iangow]
## # Ordered by: desc(n)
## indfmt consol popsrc datafmt n
## <chr> <chr> <chr> <chr> <int>
## 1 INDL C D STD 554619
## 2 INDL C D SUMM_STD 267949
## 3 FS C D STD 45183
## 4 INDL P D STD 1167
## 5 INDL R D STD 480
## 6 INDL P D SUMM_STD 298
## 7 FS R D STD 51
## 8 INDL R D SUMM_STD 28
## 9 INDL D D SUMM_STD 25
## 10 INDL D D STD 25
## 11 INDL C D PRE_AMENDS 19
## 12 INDL C D PRE_AMENDSS 10
As discussed on the CRSP website, the first set of values, which covers 554619 observations, represents the standard (STD) secondary keyset and is what is used by most researchers when using Compustat.
We can create a version of funda that limits data to this STD secondary keyset as follows:
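A sketch of that code (the same filter conditions appear in the exercises at the end of this chapter):

funda_mod <-
  funda %>%
  filter(indfmt == "INDL", datafmt == "STD",
         consol == "C", popsrc == "D")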
For funda_mod, (gvkey, datadate) should represent a primary key. Let’s check this using code like that used above:
funda_mod %>%
  group_by(gvkey, datadate) %>%
  summarize(num_rows = n(), .groups = "drop") %>%
  count(num_rows)
## # Source: SQL [1 x 2]
## # Database: postgres [iangow@/tmp:5432/iangow]
## num_rows n
## <int> <int>
## 1 1 554619
Now, let’s look at the table comp.company, which the following code confirms has 39 columns and for which gvkey is a primary key.
company
## # Source: SQL [?? x 39]
## # Database: postgres [iangow@/tmp:5432/iangow]
## conm gvkey add1 add2 add3 add4 addzip busdesc cik city conml costat
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 A & E … 0010… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> A & … I
## 2 A & M … 0010… 1924… <NA> <NA> <NA> 94104 <NA> 0000… Tulsa A & … I
## 3 AAI CO… 0010… 124 … <NA> <NA> <NA> 21030… Textro… 0001… Hunt… AAI … I
## 4 A.A. I… 0010… 7700… <NA> <NA> <NA> 63125 A.A. I… 0000… St. … A.A.… I
## 5 AAR CO… 0010… One … <NA> <NA> <NA> 60191 AAR Co… 0000… Wood… AAR … A
## 6 A.B.A.… 0010… 1026… <NA> <NA> <NA> 33782 A.B.A.… <NA> Pine… A.B.… I
## 7 ABC IN… 0010… 301 … <NA> <NA> <NA> 46590 ABC In… <NA> Wino… ABC … I
## 8 ABKCO … 0010… 1700… <NA> <NA> <NA> 10019 ABKCO … 0000… New … ABKC… I
## 9 ABM CO… 0010… 3 Wh… <NA> <NA> <NA> 92714 <NA> <NA> Irvi… ABM … I
## 10 ABS IN… 0010… Inte… <NA> <NA> <NA> 44904 Makes … 0000… Will… ABS … I
## # … with more rows, and 27 more variables: county <chr>, dlrsn <chr>,
## # ein <chr>, fax <chr>, fic <chr>, fyrc <int>, ggroup <chr>, gind <chr>,
## # gsector <chr>, gsubind <chr>, idbflag <chr>, incorp <chr>, loc <chr>,
## # naics <chr>, phone <chr>, prican <chr>, prirow <chr>, priusa <chr>,
## # sic <chr>, spcindcd <int>, spcseccd <int>, spcsrc <chr>, state <chr>,
## # stko <int>, weburl <chr>, dldte <date>, ipodate <date>
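The primary-key check mirrors the one we used for funda; a sketch of the query producing the output below:

company %>%
  group_by(gvkey) %>%
  summarize(num_rows = n(), .groups = "drop") %>%
  count(num_rows)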
## # Source: SQL [1 x 2]
## # Database: postgres [iangow@/tmp:5432/iangow]
## num_rows n
## <int> <int>
## 1 1 52448
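Similarly, a sketch of the check that gvkey contains no null values:

company %>%
  filter(is.na(gvkey)) %>%
  count()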
## # Source: SQL [1 x 1]
## # Database: postgres [iangow@/tmp:5432/iangow]
## n
## <int>
## 1 0
7.2.1 Illustration: Microsoft versus Apple
Suppose that we were interested in comparing the profitability of Apple and Microsoft over time.
To measure performance, we will calculate a measure of return on assets, here measured as the value of “Income Before Extraordinary Items” scaled by “Total Assets”, for Microsoft (GVKEY: 012141) and Apple (GVKEY: 001690) over time.60
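A sketch of the code that creates sample by filtering company on these two GVKEYs (the SQL translation shown a little further below confirms the filter and the selected columns):

sample <-
  company %>%
  filter(gvkey %in% c("001690", "012141")) %>%
  select(conm, gvkey)

sample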
## # Source: SQL [2 x 2]
## # Database: postgres [iangow@/tmp:5432/iangow]
## conm gvkey
## <chr> <chr>
## 1 APPLE INC 001690
## 2 MICROSOFT CORP 012141
Here we can see that sample is a server-side data frame with the company name (conm) and identifier (gvkey) for Apple and Microsoft. This might be a useful juncture at which to explain what the server-side data frame represents. To this end, we can use the show_query function:
sample %>%
  show_query()
## <SQL>
## SELECT "conm", "gvkey"
## FROM (SELECT * FROM comp.company) "q01"
## WHERE ("gvkey" IN ('001690', '012141'))
What this shows is that sample is actually a shorthand for an SQL query that is equivalent to the following SQL.61
SELECT conm, gvkey
FROM comp.company
WHERE "gvkey" IN ('001690', '012141');
Here filter is translated into a WHERE condition, %in% is translated to IN, and c("001690", "012141") is translated to ('001690', '012141').
It turns out that “Income Before Extraordinary Items” is represented by ib and “Total Assets” by at.
We can restrict funda_mod further so that it contains just the key variables (gvkey, datadate) and the values for ib and at as follows. (In the following code, we will gradually refine the data set assigned to ib_at.)
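A sketch of this first version of ib_at (which produces the output below):

ib_at <-
  funda_mod %>%
  select(gvkey, datadate, ib, at)

ib_at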
## # Source: SQL [?? x 4]
## # Database: postgres [iangow@/tmp:5432/iangow]
## gvkey datadate ib at
## <chr> <date> <dbl> <dbl>
## 1 001257 2019-12-31 60.1 1266.
## 2 001258 1983-05-31 -0.365 1.57
## 3 001738 1959-06-30 4.85 147.
## 4 001257 2020-12-31 41.9 1404.
## 5 001258 1984-05-31 -0.693 0.908
## 6 001257 2021-12-31 131. 1392.
## 7 001257 2022-12-31 57.6 1398.
## 8 001258 1985-05-31 -0.345 0.589
## 9 001258 1986-05-31 -0.294 0.291
## 10 001258 1987-05-31 -0.086 0.171
## # … with more rows
and then restrict this by joining it with sample as follows:
ib_at <-
  funda_mod %>%
  select(gvkey, datadate, ib, at) %>%
  inner_join(sample)
## Joining with `by = join_by(gvkey)`
ib_at
## # Source: SQL [?? x 5]
## # Database: postgres [iangow@/tmp:5432/iangow]
## gvkey datadate ib at conm
## <chr> <date> <dbl> <dbl> <chr>
## 1 001690 1981-09-30 39.4 255. APPLE INC
## 2 001690 1982-09-30 61.3 358. APPLE INC
## 3 001690 1983-09-30 76.7 557. APPLE INC
## 4 001690 1980-09-30 11.7 65.4 APPLE INC
## 5 001690 1984-09-30 64.1 789. APPLE INC
## 6 001690 1985-09-30 61.2 936. APPLE INC
## 7 001690 1986-09-30 154. 1160. APPLE INC
## 8 001690 1987-09-30 217. 1478. APPLE INC
## 9 001690 1988-09-30 400. 2082. APPLE INC
## 10 001690 1989-09-30 454. 2744. APPLE INC
## # … with more rows
In the code above, we used a natural join, which joins by variables found in both data sets (in this case, gvkey). In some cases, we want to be explicit about the join variables, which we can do using the by argument to the join function:
ib_at <-
  funda_mod %>%
  select(gvkey, datadate, ib, at) %>%
  inner_join(sample, by = "gvkey")
ib_at
## # Source: SQL [?? x 5]
## # Database: postgres [iangow@/tmp:5432/iangow]
## gvkey datadate ib at conm
## <chr> <date> <dbl> <dbl> <chr>
## 1 001690 1981-09-30 39.4 255. APPLE INC
## 2 001690 1982-09-30 61.3 358. APPLE INC
## 3 001690 1983-09-30 76.7 557. APPLE INC
## 4 001690 1980-09-30 11.7 65.4 APPLE INC
## 5 001690 1984-09-30 64.1 789. APPLE INC
## 6 001690 1985-09-30 61.2 936. APPLE INC
## 7 001690 1986-09-30 154. 1160. APPLE INC
## 8 001690 1987-09-30 217. 1478. APPLE INC
## 9 001690 1988-09-30 400. 2082. APPLE INC
## 10 001690 1989-09-30 454. 2744. APPLE INC
## # … with more rows
We next calculate a value for return on assets (using income before extraordinary items and the ending balance of total assets for simplicity).
ib_at <-
  funda_mod %>%
  select(gvkey, datadate, ib, at) %>%
  inner_join(sample, by = "gvkey") %>%
  mutate(roa = ib/at)
The final step is to bring these data into R. At this stage, ib_at is still a server-side data frame (tbl_sql), which we can see using the class function.62
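One way to check this (a sketch; the exact expression behind the output below may differ):

# inherits(ib_at, "tbl_sql") would be another way to perform this check
"tbl_sql" %in% class(ib_at)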
## [1] TRUE
To bring the data into R, we can use the collect function, as follows:
ib_at <-
  funda_mod %>%
  select(gvkey, datadate, ib, at) %>%
  inner_join(sample, by = "gvkey") %>%
  mutate(roa = ib/at) %>%
  collect()
Now we have a local data frame that is no longer an instance of the tbl_sql class:
class(ib_at)
## [1] "tbl_df" "tbl" "data.frame"
Note that, because we have used select to get to just five fields and filter to focus on two firms, the tibble ib_at is just 5.8 kB in size.
Note that the placement of collect at the end of this pipeline means that the amount of data we need to retrieve from the database and load into R is quite small. If we had placed collect() immediately after funda_mod, we would have been retrieving data on hundreds of thousands of observations and 948 variables. If we had placed collect() immediately after the select statement, we would have been retrieving data on hundreds of thousands of observations, but just 4 variables. But by placing collect() after the inner_join, we are only retrieving data for firm-years related to Microsoft and Apple.
Retrieving data into our local R instance requires both reading it off disk and, if our PostgreSQL server is remote, transferring it over the internet.
Each of these operations is slow and thus should be avoided where possible.
Having collected our data, we can make a quick plot.63
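A minimal sketch of such a plot, assuming we simply plot roa against datadate with one line per company:

ib_at %>%
  # the aesthetics here are illustrative; the original figure may differ in details
  ggplot(aes(x = datadate, y = roa, color = conm)) +
  geom_line()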

7.3 Exercises
- Suppose we didn’t have access to Compustat (or an equivalent database) for the analysis above. Describe a process you would use to get the data required to make the plot above comparing the performance of Microsoft and Apple.
- In the following code, how do funda_mod and funda_mod_alt differ? (For example, where are the data for each table?) What does the statement collect(n = 100) at the end of this code do?
funda <- tbl(pg, sql("SELECT * FROM comp.funda"))

funda_mod <-
  funda %>%
  filter(indfmt == "INDL", datafmt == "STD",
         consol == "C", popsrc == "D") %>%
  filter(fyear >= 1980)

funda_mod_alt <-
  funda_mod %>%
  collect(n = 100)
- The table comp.company has data on SIC (Standard Industrial Classification) codes in the field sic. In words, what is the case_when function doing in the following code? Why do we end up with just two rows?
company <- tbl(pg, sql("SELECT * FROM comp.company"))

sample <-
  company %>%
  select(gvkey, sic) %>%
  mutate(co_name = case_when(gvkey == "001690" ~ "Apple",
                             gvkey == "012141" ~ "Microsoft",
                             TRUE ~ NA_character_)) %>%
  filter(!is.na(co_name))
- What does the data frame another_sample represent? What happens if we change the inner_join statement below to simply inner_join(sample)? What happens if we change it to inner_join(sample, by = "sic") (i.e., omit the suffix = c("", "_other") portion)? Why do you think we want filter(gvkey != gvkey_other)?
another_sample <-
  company %>%
  select(gvkey, sic) %>%
  inner_join(sample, by = "sic", suffix = c("", "_other")) %>%
  filter(gvkey != gvkey_other) %>%
  mutate(group = paste(co_name, "peer")) %>%
  select(gvkey, group)
- What is the following code doing?
total_sample <-
  sample %>%
  rename(group = co_name) %>%
  select(gvkey, group) %>%
  union_all(another_sample)
- Suppose that we are interested in firms’ level of R&D activity. One measure of R&D activity is R&D Intensity, which can be defined as “R&D expenses” (Compustat item xrd) scaled by “Total Assets” (Compustat item at). In xrd_at, what’s the difference between rd_intensity and rd_intensity_alt? Does filter(at > 0) seem like a reasonable filter? What happens if we omit it?
xrd_at <-
  funda_mod %>%
  select(gvkey, datadate, fyear, conm, xrd, at) %>%
  filter(at > 0) %>%
  mutate(rd_intensity = xrd/at,
         xrd_alt = coalesce(xrd, 0),
         rd_intensity_alt = xrd_alt/at) %>%
  inner_join(total_sample, by = "gvkey")
- Looking at a sample of rows from xrd_at_sum, it appears that the three R&D intensity measures are always identical for Apple and Microsoft, but generally different for their peer groups. What explains these differences? Can you say that one measure is “correct”? Or would you say “it depends”?
xrd_at_sum <-
  xrd_at %>%
  group_by(group, fyear) %>%
  summarize(total_at = sum(at, na.rm = TRUE),
            total_rd = sum(xrd, na.rm = TRUE),
            rd_intensity1 = mean(xrd/at, na.rm = TRUE),
            rd_intensity2 = mean(xrd_alt/at, na.rm = TRUE),
            .groups = "drop") %>%
  mutate(rd_intensity3 = if_else(total_at > 0, total_rd/total_at, NA_real_))

xrd_at_sum %>%
  select(-total_at, -total_rd) %>%
  arrange(desc(fyear), group) %>%
  collect(n = 8)
## # A tibble: 8 × 5
## group fyear rd_intensity1 rd_intensity2 rd_intensity3
## <chr> <int> <dbl> <dbl> <dbl>
## 1 Apple 2022 0.0744 0.0744 0.0744
## 2 Apple peer 2022 0.138 0.138 0.111
## 3 Microsoft 2022 0.0672 0.0672 0.0672
## 4 Microsoft peer 2022 0.135 0.116 0.0835
## 5 Apple 2021 0.0624 0.0624 0.0624
## 6 Apple peer 2021 0.127 0.122 0.0947
## 7 Microsoft 2021 0.0621 0.0621 0.0621
## 8 Microsoft peer 2021 0.235 0.194 0.0709
- Write code to produce the following plot. Also produce plots that use rd_intensity1 and rd_intensity2 as measures of R&D intensity. Do the plots help you think about which of the three measures makes most sense?
