Numbering rows within groups in a data frame

2017-03-15 06:06

Use base R's ave, plyr's ddply, dplyr, or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)
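A small self-contained check of the ave() line. The data frame and its cat/val values are made up for illustration. ave() applies seq_along to each cat subset and writes the results back in the original row order, so the counter restarts in each group even when the groups are interleaved.

```r
# Hypothetical example data: groups "a" and "b" are deliberately interleaved
df <- data.frame(
  cat = c("a", "b", "a", "b", "a"),
  val = c(10, 20, 30, 40, 50)
)

# ave() splits df$val by df$cat, applies seq_along to each piece,
# and puts the results back in the original row order
df$num <- ave(df$val, df$cat, FUN = seq_along)

print(df)
# df$num is c(1, 1, 2, 2, 3)
```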

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or, with data.table (the most memory-efficient option, because := adds the column to DT by reference instead of copying it):

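library(data.table)
DT <- data.table(df)  # a data.table copy of df

# Either line adds the within-group counter id to DT by reference:
DT[, id := seq_len(.N), by = cat]  # .N = number of rows in the current group
DT[, id := rowid(cat)]             # running count of each cat value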
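A sketch assuming data.table is installed, with made-up data. setDT() converts the existing data frame to a data.table in place rather than copying it as data.table(df) does, and the by = cat form and rowid(cat) should produce the same integer counter.

```r
library(data.table)

# Hypothetical example data with interleaved groups
df <- data.frame(
  cat = c("a", "b", "a", "b", "a"),
  val = c(10, 20, 30, 40, 50)
)

# Convert df to a data.table by reference (no copy)
setDT(df)

# Counter via grouping: seq_len(.N) numbers the rows of each cat group
df[, id1 := seq_len(.N), by = cat]

# Counter via rowid(): running count of each cat value, no explicit by
df[, id2 := rowid(cat)]

print(df)
# id1 and id2 are both 1 1 2 2 3
```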

Frank 2017-03-15 06:07:28

It might be worth mentioning that ave gives a double (numeric) instead of an integer here, because the result takes the type of its first argument. Alternatively, you could pass seq_len(nrow(df)) instead of df$val. I just ran into this over here: stackoverflow.com/questions/42796857/…
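A quick base-R illustration of the type issue, using a hypothetical three-row data frame. ave() writes the per-group results back into a copy of its first argument, so the counter inherits that argument's type.

```r
# Hypothetical example data
df <- data.frame(cat = c("a", "b", "a"), val = c(2.5, 7.1, 3.3))

# Passing df$val: the counter comes back as double
num_dbl <- ave(df$val, df$cat, FUN = seq_along)
typeof(num_dbl)  # "double"

# Passing an integer row index instead keeps the counter integer
num_int <- ave(seq_len(nrow(df)), df$cat, FUN = seq_along)
typeof(num_int)  # "integer"

# With a character first argument, the counter even comes back as character
typeof(ave(c("x", "y", "z"), df$cat, FUN = seq_along))  # "character"
```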

hannes101 2017-07-28 20:23:01

Interestingly, this data.table solution seems to be quicker than using frank:

library(microbenchmark)
microbenchmark(
  a = DT[, .(val, num = frank(val)), by = list(cat)],
  b = DT[, .(val, id = seq_len(.N)), by = list(cat)],
  times = 1000L
)

EcologyTom 2018-04-10 22:16:39

Thanks! The dplyr solution is good. But if, like me, you keep getting weird errors with this approach, make sure plyr and dplyr are not conflicting (for example, loading plyr after dplyr makes plyr::mutate mask dplyr::mutate), as explained in this post. The conflict can be avoided by explicitly calling dplyr::mutate(...).

chinsoon12 2018-05-23 08:14:07

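Another data.table method is setDT(df)[, id := rleid(val), by = .(cat)]. Note that rleid(val) numbers runs of equal consecutive val values, so it only works as a row counter when no two consecutive rows in the same cat share a val.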
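A small sketch of that caveat, assuming data.table is installed; the data are made up so that group "a" contains two consecutive equal val values. rleid(val) gives those two rows the same number, while rowid(cat) numbers every row.

```r
library(data.table)

# Hypothetical data: group "a" has a repeated consecutive val (5, 5)
DT <- data.table(cat = c("a", "a", "a", "b"), val = c(5, 5, 9, 1))

# rleid(val) within cat numbers runs of equal values ...
DT[, run_id := rleid(val), by = cat]

# ... while rowid(cat) numbers every row within each cat
DT[, row_id := rowid(cat)]

print(DT)
# run_id: 1 1 2 1    row_id: 1 2 3 1
```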

Przemyslaw Remin 2018-07-24 17:31:59

How can the plyr and dplyr answers be modified to number the rows by the val column in descending order?
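One possible dplyr answer, sketched under the assumptions that dplyr is installed and val is numeric; the data are made up. row_number(x) ranks x with ties broken by row order, and desc() reverses the ordering, so the largest val in each cat gets id 1.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  cat = c("a", "a", "a", "b", "b"),
  val = c(10, 30, 20, 5, 8)
)

# Rank val in descending order within each cat group
out <- df %>%
  group_by(cat) %>%
  mutate(id = row_number(desc(val))) %>%
  ungroup()

print(out)
# id: 3 1 2 2 1
```

For the plyr answer, replacing seq_along(val) with rank(-val, ties.method = "first") inside the ddply() call should give the same numbering, again assuming a numeric val.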
