Note: this post is translated from stackoverflow.com. Original question: How to group by consecutive rows in a R dataframe?
r

How to group by consecutive rows in an R data frame?

Posted on 2020-03-27 12:05:25

I have a data frame of time series data with timestamp, type, and value columns. The type is either peak or valley. I want to:

- Group all data by consecutive types.
- For groups of "peak" type, select the highest value.
- For groups of "valley" type, select the lowest value.
- Filter the data frame by these highest/lowest values.

Expectation: I would have a data frame that alternates each row between the highest peak and the lowest valley.

The only way I know how to do this is by using a for loop and then adding consecutive values into a vector and then getting the max, then shoving this in a new dataframe and so on.

For those who know Python, this is what I did there (I need to port the code to R, though):

# Label each run of consecutive identical pv_type values with its own id
run_id = segmentation.pv_type.ne(segmentation.pv_type.shift()).cumsum()
segmentation['min_v'] = segmentation.groupby(run_id).price.transform('min')
segmentation['max_p'] = segmentation.groupby(run_id).price.transform('max')

EDIT

Sample data set:

types <- c('peak', 'peak', 'valley', 'peak', 'valley', 'valley', 'valley')
values <- c(1.01,   1.00,    0.4,     1.2,     0.3,      0.1,      0.2)
segmentation <- data.frame(types, values)
segmentation

expectedTypes <- c('peak', 'valley', 'peak', 'valley')
expectedValues <- c(1.00, 0.4, 1.2, 0.1 )
expectedResult <- data.frame(expectedTypes, expectedValues)
expectedResult

I don't know a better way to generate the data.


Asked by: Fred Johnson
Views: 18

Answer by akrun, 2019-07-04 03:18

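In R, one option with dplyr is to use the cumulative sum of the logical comparison between 'pv_type' and the lag of 'pv_type' as the grouping column, and then compute the min and max of 'price' as two new columns.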

library(dplyr)
segmentation %>%
       # a new group starts whenever pv_type differs from the previous row
       group_by(pv_type_group = cumsum(pv_type != lag(pv_type,
                 default = first(pv_type)))) %>%
       mutate(min_v = min(price), max_p = max(price))
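
To make the grouping key concrete, here is a minimal base R sketch of the same idea applied to the `types` vector from the question's sample data. The names `changed` and `run_id` are illustrative and not part of the original answer, and base R indexing is used here in place of dplyr's `lag`.

```r
# Sample data from the question
types <- c('peak', 'peak', 'valley', 'peak', 'valley', 'valley', 'valley')

# TRUE wherever a row's type differs from the previous row's type;
# the first row has no predecessor, so it is marked FALSE
changed <- c(FALSE, types[-1] != types[-length(types)])

# The cumulative sum turns the change flags into one id per run
run_id <- cumsum(changed)
run_id
# [1] 0 0 1 2 3 3 3
```

Rows that share a `run_id` form one group of consecutive identical types, which is what `group_by` receives in the dplyr version.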

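Update

In the OP's example, the expected output is summarised, so we use summarise instead of mutate. Also, rleid (from data.table) is used in place of the cumulative sum of the logical comparison.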

library(data.table)
segmentation %>%
    # rleid() assigns one id to each run of identical consecutive values
    group_by(grp = rleid(types)) %>%
    summarise(types = first(types), expectedvalues = min(values)) %>%
    ungroup() %>%
    select(-grp)
# A tibble: 4 x 2
#  types  expectedvalues
# <fct>           <dbl>
#1 peak              1  
#2 valley            0.4
#3 peak              1.2
#4 valley            0.1
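
The code above takes the minimum of every run, which reproduces the expected result given in the question. The question's text, however, asks for the highest value in peak runs and the lowest in valley runs. A possible base R sketch of that rule follows; it is an assumption about the intent rather than part of the original answer, and the names `run_id`, `runs`, and `result` are illustrative. Under this rule the first peak run yields 1.01 rather than the 1.00 listed in the question's expected result.

```r
# Sample data from the question
types  <- c('peak', 'peak', 'valley', 'peak', 'valley', 'valley', 'valley')
values <- c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
segmentation <- data.frame(types, values, stringsAsFactors = FALSE)

# One id per run of consecutive identical types
n <- nrow(segmentation)
run_id <- cumsum(c(TRUE, segmentation$types[-1] != segmentation$types[-n]))

# Split the rows by run, then reduce each run to a single row:
# max for peak runs, min for valley runs
runs <- split(segmentation, run_id)
result <- do.call(rbind, lapply(runs, function(run) {
    is_peak <- run$types[1] == 'peak'
    data.frame(types = run$types[1],
               values = if (is_peak) max(run$values) else min(run$values),
               stringsAsFactors = FALSE)
}))
rownames(result) <- NULL
result
```

The result has one row per run, alternating between peak and valley.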