Warm tip: This article is reproduced from serverfault.com, please click

dataframe r r-faq aggregate

How to sum a variable by group

发布于 2009-11-02 09:01:28

I have a data frame with two columns. First column contains categories such as "First", "Second", "Third", and the second column has numbers that represent the number of times I saw the specific groups from "Category".

For example:

Category     Frequency
First        10
First        15
First        5
Second       2
Third        14
Third        20
Second       3

I want to sort the data by Category and sum all the Frequencies:

Category     Frequency
First        30
Second       5
Third        34

How would I do this in R?

Questioner

user5243421

Viewed

0

192 2018-10-29 04:40:39

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

(embedding @thelatemail comment), aggregate has a formula interface too

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third 
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")), 
                    Frequency=c(10,15,5,2,14,20,3))

r2evans 2016-12-19 04:35:12

@AndrewMcKinlay, R uses the tilde to define symbolic formulae, for statistics and other functions. It can be interpreted as "model Frequency by Category" or "Frequency depending on Category". Not all languages use a special operator to define a symbolic function, as done in R here. Perhaps with that "natural-language interpretation" of the tilde operator, it becomes more meaningful (and even intuitive). I personally find this symbolic formula representation better than some of the more verbose alternatives.

Dodecaphone 2018-10-28 10:42:06

Being new to R (and asking the same sorts of questions as the OP), I would benefit from some more detail of the syntax behind each alternative. For instance, if I have a larger source table and want to subselect just two dimensions plus summed metrics, can I adapt any of these methods? Hard to tell.

QAsena 2020-06-24 20:32:03

Is there anyway of maintaining an ID column? Say the categories are ordered and the ID column is 1:nrow(df), is it possible to keep the starting position of each category after aggregating? So the ID column would end up as, for example, 1, 3, 4, 7 after collapsing with aggregate. In my case I like aggregate because it works over many columns automatically.

热门帖子

1

Macbook M1 升级 M3 MAX

2

电信疯狂 QoS 香港的下行？

3

🇭🇰关于香港开户和旅游简单但有用的分享（附白拿 HKD 300 方法）

4

家庭 or 个人用的 NAS 有什么可以推荐的吗？

5

远程兼职 Web3 工程师

6

请假理由填写问题

7

[烟台大樱桃] 人生不只有上班一条路，被裁后决定专心转行做水果

8

[记录] 2024-05-04 清晨

9

这大约是独立开发的顶流了吧， v 友们怎么看

10

6 年软开求职国外务工求职比如澳新美等地

热门github

1

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

2

A Windows and Office activator using HWID / Ohook / KMS38 / Online KMS activation methods, with a focus on open-source code and fewer antivirus detections.

3

Get up and running with Llama 2, Mistral, Gemma, and other large language models.

4

该项目可以让你通过订阅的方式使用Cloudflare WARP+，自动获取流量。This project enables you to use Cloudflare WARP+ through subscription, automatically acquiring traffic.

5

Multi functional app to find duplicates, empty folders, similar images etc.

6

Xray panel supporting multi-protocol multi-user expire day & traffic & ip limit (Vmess & Vless & Trojan & ShadowSocks & Wireguard)

7

The Free Software Media System

8

lightweight, standalone C++ inference engine for Google's Gemma models.

9

📚 Freely available programming books

10

A collective list of free APIs

11

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

12

🎓 Path to a free self-taught education in Computer Science!

13

Curso para aprender el lenguaje de programación Python desde cero y para principiantes. 75 clases, 37 horas en vídeo, código, proyectos y grupo de chat. Fundamentos, frontend, backend, testing, IA...

14

This repository contains System Design resources which are useful while preparing for interviews and learning Distributed Systems

15

Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.