Note: this page is translated from stackoverflow.com. Original question: r - Numbering rows within groups in a data frame

dataframe r r-faq

r - Numbering rows within groups in a data frame

Posted on 2020-03-27 10:48:23

Working with a data frame similar to this:

set.seed(100)  
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))             
df <- df[order(df$cat, df$val), ]  
df  

   cat        val  
1  aaa 0.05638315  
2  aaa 0.25767250  
3  aaa 0.30776611  
4  aaa 0.46854928  
5  aaa 0.55232243  
6  bbb 0.17026205  
7  bbb 0.37032054  
8  bbb 0.48377074  
9  bbb 0.54655860  
10 bbb 0.81240262  
11 ccc 0.28035384  
12 ccc 0.39848790  
13 ccc 0.62499648  
14 ccc 0.76255108  
15 ccc 0.88216552

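I am trying to add a column that numbers the rows within each group. Doing it this way obviously isn't making use of R's strengths: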

 df$num <- 1  
 for (i in 2:(length(df[,1]))) {  
   if (df[i,"cat"]==df[(i-1),"cat"]) {  
     df[i,"num"]<-df[i-1,"num"]+1  
     }  
 }  
 df  

   cat        val num  
1  aaa 0.05638315   1  
2  aaa 0.25767250   2  
3  aaa 0.30776611   3  
4  aaa 0.46854928   4  
5  aaa 0.55232243   5  
6  bbb 0.17026205   1  
7  bbb 0.37032054   2  
8  bbb 0.48377074   3  
9  bbb 0.54655860   4  
10 bbb 0.81240262   5  
11 ccc 0.28035384   1  
12 ccc 0.39848790   2  
13 ccc 0.62499648   3  
14 ccc 0.76255108   4  
15 ccc 0.88216552   5

What would be a good way to do this?

Asked by

eli-k


61.4k 2017-03-15 06:06

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory-efficient option, since it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
DT[, id := rowid(cat)]

Frank 2017-03-15 06:07:28

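It might be worth mentioning that ave gives a floating-point number here instead of an integer. Alternatively, you could change df$val to seq_len(nrow(df)). I just ran into this over here: stackoverflow.com/questions/42796857/…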
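
The type difference described in this comment can be checked with a small base R sketch. It rebuilds the question's df; the names num_dbl and num_int are made up for illustration. ave() writes the per-group results back into a copy of its first argument, so the storage type of that argument carries over to the result.

```r
set.seed(100)
df <- data.frame(cat = rep(c("aaa", "bbb", "ccc"), each = 5), val = runif(15))
df <- df[order(df$cat, df$val), ]

# First argument is the double column val, so the result is stored as double.
num_dbl <- ave(df$val, df$cat, FUN = seq_along)

# First argument is an integer vector, so the result stays integer.
num_int <- ave(seq_len(nrow(df)), df$cat, FUN = seq_along)

typeof(num_dbl)
typeof(num_int)
```

Both vectors hold the same numbering; only the storage type differs.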

hannes101 2017-07-28 20:23:01

Interesting that this data.table solution seems to be faster than using frank: library(microbenchmark); microbenchmark(a = DT[, .(val ,num = frank(val)), by = list(cat)] ,b =DT[, .(val , id = seq_len(.N)), by = list(cat)] , times = 1000L)

EcologyTom 2018-04-10 22:16:39

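Thanks! The dplyr solution is good. But if, like me, you kept getting weird errors when trying this approach, make sure that you are not getting conflicts between plyr and dplyr, as explained in this post. It can be avoided by explicitly calling dplyr::mutate(...)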
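
A minimal sketch of the explicit-namespace approach from this comment, assuming dplyr is installed. The small df and the names grouped and out are made up for illustration. Because every call is prefixed with dplyr::, nothing needs to be attached, so a loaded plyr cannot mask mutate.

```r
df <- data.frame(
  cat = rep(c("aaa", "bbb"), each = 3),
  val = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Fully qualified calls always resolve to dplyr's functions,
# regardless of which packages are attached or in what order.
grouped <- dplyr::group_by(df, cat)
out <- dplyr::mutate(grouped, id = dplyr::row_number())

out$id
```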

chinsoon12 2018-05-23 08:14:07

Another data.table approach is setDT(df)[, id:=rleid(val), by=.(cat)]
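
One caveat, shown as a sketch assuming data.table is installed: rleid() numbers runs of equal consecutive values, so it matches a plain row counter only when neighbouring val entries within a group differ. The toy table dt and the column names by_rleid and by_rowid are made up for illustration.

```r
library(data.table)

# Toy data with repeated val entries inside each group.
dt <- data.table(cat = c("a", "a", "a", "b", "b"), val = c(1, 1, 2, 3, 3))

# rleid() increments only when val changes, so tied neighbours share an id.
dt[, by_rleid := rleid(val), by = cat]

# rowid() counts rows within each cat, whatever val contains.
dt[, by_rowid := rowid(cat)]

dt
```

With the question's data, where every val is distinct, both columns would agree.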

Przemyslaw Remin 2018-07-24 17:31:59

How can the library(plyr) and library(dplyr) answers be modified so that the ranking is based on the val column in descending order?
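
One possible way to number rows by descending val, sketched here in base R rather than plyr or dplyr, so it is a swapped-in technique and not a modification of the answers above. The column name num_desc is made up for illustration. With dplyr, using row_number(desc(val)) inside the grouped mutate() should give the same numbering.

```r
set.seed(100)
df <- data.frame(cat = rep(c("aaa", "bbb", "ccc"), each = 5), val = runif(15))
df <- df[order(df$cat, df$val), ]

# Rank the negated values within each group, so the largest val gets 1.
# ties.method = "first" keeps the numbers distinct when values tie.
# as.integer() is needed because ave() returns doubles here.
df$num_desc <- as.integer(
  ave(-df$val, df$cat, FUN = function(v) rank(v, ties.method = "first"))
)

df
```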
