
Counting the number of rows for different combinations of factors

Posted on 2020-03-31 23:00:28

Considering a dataset such as the classic mtcars, I want to know the number of observations (rows) for the different levels of several factors, counting each factor separately as well as their combinations.
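To make the three kinds of counts concrete, base R's table() and addmargins() show them as a cross-tabulation: each cell holds the joint cyl × gear count, and the added Sum row and column hold the separate counts per cyl and per gear. This is only an illustration of the quantities involved, in wide rather than the long (one row per combination) format wanted here:

```r
# Joint counts of cyl x gear, with marginal totals appended as a "Sum" row and column
tab <- addmargins(table(cyl = mtcars$cyl, gear = mtcars$gear))

tab
```

For example, tab["8", "3"] is the number of cars with 8 cylinders and 3 gears, while tab["6", "Sum"] is the number of 6-cylinder cars regardless of gear.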

For example, the following code generates a column N with the number of observations for each combination of cyl and gear levels, but not the number of observations for cyl and gear separately.

library(dplyr)
mtcars %>% dplyr::group_by(cyl, gear) %>% dplyr::summarise(N = n())
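As a side note, dplyr's count() is documented as roughly equivalent to the group_by() + summarise(n = n()) pattern above, so the same joint counts can be sketched more compactly (the count column is then named n rather than N):

```r
library(dplyr)

# count(cyl, gear) tallies the rows for each observed cyl/gear combination,
# roughly like group_by(cyl, gear) %>% summarise(n = n())
by_both <- mtcars %>% count(cyl, gear)

by_both
```

This only shortens the joint count; the separate per-cyl and per-gear counts still have to come from somewhere else.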

I know the separate counts for cyl and gear can be obtained in a similar way, by creating separate data frames and joining them together. The following produces the expected output:

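# counts per cyl/gear combination, per gear, and per cyl
df <- mtcars %>% dplyr::group_by(cyl, gear) %>% dplyr::summarise(N = n())
df_gear <- mtcars %>% dplyr::group_by(gear) %>% dplyr::summarise(Ngear = n())
df_cyl <- mtcars %>% dplyr::group_by(cyl) %>% dplyr::summarise(Ncyl = n())
# attach the separate counts to the joint counts, joining on explicit keys
df %>% dplyr::left_join(df_cyl, by = "cyl") %>% dplyr::left_join(df_gear, by = "gear")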

But I am wondering whether there is a cleaner way to build this data set, ideally without creating intermediate data frames.

Questioner: elcortegano

Answer (H 1, 2020-01-31 20:44)

Here is one way that you might approach this, relying on mutate() and ave() instead of group_by() and summarise() for compactness:

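library(dplyr)

mtcars %>%
  # ave(x, <grouping vars>, FUN = length) gives each row the size of its group
  mutate(n = ave(cyl, cyl, gear, FUN = length),
         n_cyl = ave(cyl, cyl, FUN = length),
         n_gear = ave(gear, gear, FUN = length)) %>%
  # drop the other columns, then keep one row per cyl/gear combination
  select(gear, cyl, n, n_cyl, n_gear) %>%
  distinct()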

  gear cyl  n n_cyl n_gear
1    4   6  4     7     12
2    4   4  8    11     12
3    3   6  2     7     15
4    3   8 12    14     15
5    3   4  1    11     15
6    5   4  2    11      5
7    5   8  2    14      5
8    5   6  1     7      5
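The key ingredient in this answer is ave(x, ..., FUN) from R's stats package: it splits x by the grouping variables passed in ..., applies FUN within each group, and returns a vector as long as x, so with FUN = length every row receives the size of its group. That is why it can be used inside mutate(). A dplyr-only variant of the same idea might use add_count(), which appends a per-group count column without collapsing rows. This is a sketch assuming a dplyr version whose add_count() accepts a name argument, not part of the original answer:

```r
library(dplyr)

# Each add_count() call adds one count column, like one ave(..., FUN = length) call;
# rows are not collapsed, so every car keeps its joint and separate group sizes.
counts <- mtcars %>%
  add_count(cyl, gear, name = "n") %>%
  add_count(cyl, name = "n_cyl") %>%
  add_count(gear, name = "n_gear") %>%
  # keep only the key and count columns, one row per cyl/gear combination
  distinct(gear, cyl, n, n_cyl, n_gear)

counts
```

As in the answer, distinct() does the final collapsing, so the result should contain the same eight combinations and counts as the output shown above.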