I have a dataframe of time series data with timestamp, type, and value columns. The type indicates whether the row is a peak or a valley. I would like to:
1. Group all data by consecutive types
2. For groups of "peak" type, select the highest value
3. For groups of "valley" type, select the lowest value
4. Filter the dataframe by these highest/lowest values
Expectation: I would have a dataframe that alternates each row between the highest peak and the lowest valley.
The only way I know how to do this is by using a for loop and then adding consecutive values into a vector and then getting the max, then shoving this in a new dataframe and so on.
For those who know Python, this is what I did there (I need to port my code to R, though):
# Label each run of consecutive identical pv_type values with its own id
run_id = segmentation.pv_type.ne(segmentation.pv_type.shift()).cumsum()
segmentation['min_v'] = segmentation.groupby(run_id).price.transform('min')
segmentation['max_p'] = segmentation.groupby(run_id).price.transform('max')
EDIT
Sample data set:
types <- c('peak', 'peak', 'valley', 'peak', 'valley', 'valley', 'valley')
values <- c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
segmentation <- data.frame(types, values)
segmentation
expectedTypes <- c('peak', 'valley', 'peak', 'valley')
expectedValues <- c(1.00, 0.4, 1.2, 0.1 )
expectedResult <- data.frame(expectedTypes, expectedValues)
expectedResult
I don't know a better way to generate the data.
In R, one option with dplyr is to group by the cumulative sum of a logical comparison between 'pv_type' and its lag, then add the min and max of 'price' as two new columns:
library(dplyr)
segmentation %>%
  # a new group starts whenever pv_type differs from the previous row
  group_by(pv_type_group = cumsum(pv_type != lag(pv_type,
                                  default = first(pv_type)))) %>%
  mutate(min_v = min(price), max_p = max(price))
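To make the grouping trick concrete, here is a minimal base R sketch of how the cumulative sum of "changed from the previous row" flags turns into run ids. It uses the `types` vector from the question's sample data rather than the `pv_type` column, and the names `changed`, `run_id`, and `run_id_rle` are only illustrative.

```r
# Sample 'types' column from the question
types <- c("peak", "peak", "valley", "peak", "valley", "valley", "valley")

# TRUE wherever the type differs from the previous row
# (the first row has no predecessor, so it is never a change)
changed <- c(FALSE, types[-1] != types[-length(types)])

# The cumulative sum of the change flags gives one id per run of identical types
run_id <- cumsum(changed)
print(run_id)      # 0 0 1 2 3 3 3

# The same runs, numbered from 1, built from base R's run-length encoding
r <- rle(types)
run_id_rle <- rep(seq_along(r$lengths), times = r$lengths)
print(run_id_rle)  # 1 1 2 3 4 4 4
```

Either vector works as a grouping key; only the fact that the id changes when the type changes matters, not the starting number.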
In the OP's example the expected output is summarised (one row per run), so we use summarise instead of mutate. Also, rleid (from data.table) can replace the logical cumulative sum:
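library(data.table)  # for rleid()
segmentation %>%
  # rleid() gives each run of consecutive identical types its own id
  group_by(grp = rleid(types)) %>%
  # min() is applied to every run, peaks included
  summarise(types = first(types), expectedvalues = min(values)) %>%
  ungroup %>%
  select(-grp)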
# A tibble: 4 x 2
# types expectedvalues
# <fct> <dbl>
#1 peak 1
#2 valley 0.4
#3 peak 1.2
#4 valley 0.1
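The output above takes the minimum of every run, which reproduces the expected values in the question's sample. If the goal is instead what the question text describes (highest value for peak runs, lowest for valley runs), a possible variant is sketched below. It assumes the sample data from the question; the output column names `type` and `value` are made up for illustration. Note that it returns 1.01 for the first peak run, not the 1.00 listed in the sample's expected values.

```r
library(dplyr)

# Sample data from the question
segmentation <- data.frame(
  types  = c("peak", "peak", "valley", "peak", "valley", "valley", "valley"),
  values = c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
)

result <- segmentation %>%
  # data.table::rleid() is called with its namespace so that attaching
  # data.table does not mask dplyr functions such as first()
  group_by(grp = data.table::rleid(types)) %>%
  summarise(
    type  = first(types),
    # choose the aggregation per run: max for peaks, min for valleys
    value = if (first(types) == "peak") max(values) else min(values)
  ) %>%
  ungroup() %>%
  select(-grp)

print(result)
# type:  peak, valley, peak, valley
# value: 1.01, 0.4,    1.2,  0.1
```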
What does %>% do/mean?
@user2330270 It is the chain (pipe) operator; it passes the output of the lhs on for further processing
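As a small illustration of the pipe, the sketch below compares a nested call with the piped equivalent. The vector `x` and the names `nested` and `piped` are invented for the example.

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

x <- c(1, 4, 9)

# Nested call: read inside-out
nested <- sum(sqrt(x))

# Piped call: read left to right; x becomes the first argument of sqrt(),
# and that result becomes the first argument of sum()
piped <- x %>% sqrt() %>% sum()

print(nested)  # 6
print(piped)   # 6
```

In other words, `lhs %>% f(args)` is evaluated as `f(lhs, args)`, which is why the dplyr steps can be read top to bottom.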
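Many people are not fluent in both languages. Downvoting devalues the question and discourages people from answering code-conversion questions. True, the OP did not provide a reproducible example, but a code conversion does not really need one
OK, thanks, testing your answer now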
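@user2330270 The second one is the summarised output. In Python, you are using transform to create a new column. Also, the column names in the example are different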
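The difference between the two outputs can be seen side by side in this sketch, which assumes the question's sample data. mutate plays the role of pandas' transform (same number of rows, group result repeated on each row), while summarise collapses each run to one row. The names `by_run`, `with_min`, and `per_run` are illustrative only.

```r
library(dplyr)

# Sample data from the question
segmentation <- data.frame(
  types  = c("peak", "peak", "valley", "peak", "valley", "valley", "valley"),
  values = c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
)

# Group by runs of consecutive identical types
by_run <- segmentation %>%
  group_by(grp = data.table::rleid(types))

# mutate(): keeps all 7 rows and repeats each run's minimum on every row
with_min <- by_run %>%
  mutate(min_v = min(values)) %>%
  ungroup()

# summarise(): returns one row per run
per_run <- by_run %>%
  summarise(types = first(types), min_v = min(values))

print(nrow(with_min))  # 7
print(nrow(per_run))   # 4
```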