I have a time-series data frame with columns TimeStamp, Type and Value. Type indicates whether a row is a peak or a valley. I want to:
1. Group the data by runs of consecutive identical types.
2. For each group of "peak" type, select the highest value.
3. For each group of "valley" type, select the lowest value.
4. Filter the data frame down to these highest/lowest rows.

Expectation: I would end up with a data frame whose rows alternate between the highest peak and the lowest valley.
The only way I know how to do this is with a for loop: add consecutive values to a vector, take the max, push the result into a new data frame, and so on.
For those who know Python, this is what I did there (I need to port the code to R, though):
# run id: increments each time pv_type differs from the previous row
grp = segmentation.pv_type.ne(segmentation.pv_type.shift()).cumsum()
segmentation['min_v'] = segmentation.groupby(grp).price.transform('min')
segmentation['max_p'] = segmentation.groupby(grp).price.transform('max')
EDIT
Sample data set:
types <- c('peak', 'peak', 'valley', 'peak', 'valley', 'valley', 'valley')
values <- c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
segmentation <- data.frame(types, values)
segmentation
expectedTypes <- c('peak', 'valley', 'peak', 'valley')
expectedValues <- c(1.00, 0.4, 1.2, 0.1 )
expectedResult <- data.frame(expectedTypes, expectedValues)
expectedResult
I don't know a better way to generate the data.
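A common way to share sample data in an R question is dput(), which prints R code that recreates an object exactly. A minimal sketch using the sample data above; the exact structure() text it prints can differ between R versions (for example, character vs factor columns before R 4.0):

```r
# Recreate the sample data from the question
segmentation <- data.frame(
  types  = c("peak", "peak", "valley", "peak", "valley", "valley", "valley"),
  values = c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
)

# dput() prints code that rebuilds the object; paste its output into the question
dput(segmentation)

# The same text can be captured with deparse() and evaluated again
code <- deparse(segmentation)
rebuilt <- eval(parse(text = code))
```

Readers can then copy the printed structure(...) call straight into their session.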
In R, an implementation using dplyr would be to take the cumulative sum of the logical comparison between 'pv_type' and its lag as a grouping column, and then get the min and max of 'price' as two new columns:
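library(dplyr)

segmentation %>%
  # new group id whenever pv_type differs from the previous row
  group_by(pv_type_group = cumsum(pv_type != lag(pv_type, default = first(pv_type)))) %>%
  # mutate keeps every row, like pandas' transform
  mutate(min_v = min(price), max_p = max(price))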
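For comparison, a base R sketch of the same grouping idea, using the question's sample columns (types, values) rather than pv_type/price. This is an alternative illustration, not part of the dplyr answer: the comparison vector starts with TRUE, so group ids start at 1 instead of 0, and ave() plays the role of pandas' transform() by returning one value per row.

```r
segmentation <- data.frame(
  types  = c("peak", "peak", "valley", "peak", "valley", "valley", "valley"),
  values = c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
)

# A new group starts wherever the type differs from the previous row;
# the first row always starts group 1.
n <- nrow(segmentation)
changed <- c(TRUE, segmentation$types[-1] != segmentation$types[-n])
segmentation$grp <- cumsum(changed)

# ave() applies FUN within each group and recycles the result to every row
segmentation$min_v <- ave(segmentation$values, segmentation$grp, FUN = min)
segmentation$max_p <- ave(segmentation$values, segmentation$grp, FUN = max)

print(segmentation)
```

For the sample data the group ids come out as 1 1 2 3 4 4 4, one per run of identical types.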
With the OP's example, the expected output is summarised (one row per group), so we use summarise instead of mutate. We also use rleid (from data.table) instead of the logical cumulative sum:
library(dplyr)
library(data.table)  # for rleid()

segmentation %>%
  group_by(grp = rleid(types)) %>%  # one id per run of identical types
  summarise(types = first(types), expectedvalues = min(values)) %>%
  ungroup() %>%
  select(-grp)
# A tibble: 4 x 2
# types expectedvalues
# <fct> <dbl>
#1 peak 1
#2 valley 0.4
#3 peak 1.2
#4 valley 0.1
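Note that summarise(... min(values)) takes the minimum of every run, which matches the expected table but not the stated rule of selecting the highest peak. The two disagree on the first run: its peaks are 1.01 and 1.00, and the expected table lists 1.00. Assuming the stated rule is what was intended, a base R sketch that takes the max for peak runs and the min for valley runs could look like the following. It keeps whole rows, so other columns such as TimeStamp would survive the filter:

```r
segmentation <- data.frame(
  types  = c("peak", "peak", "valley", "peak", "valley", "valley", "valley"),
  values = c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)
)

# rle() needs an atomic vector, so convert in case types is a factor
runs <- rle(as.character(segmentation$types))
grp <- rep(seq_along(runs$lengths), runs$lengths)

# For each run, keep the row index of the highest peak or the lowest valley
keep <- vapply(split(seq_len(nrow(segmentation)), grp), function(idx) {
  v <- segmentation$values[idx]
  if (segmentation$types[idx[1]] == "peak") idx[which.max(v)] else idx[which.min(v)]
}, integer(1))

result <- segmentation[keep, ]
rownames(result) <- NULL
print(result)
```

With the sample data this keeps rows 1, 3, 4 and 6 (values 1.01, 0.4, 1.2, 0.1), alternating peak and valley. Swapping which.max for which.min in the peak branch reproduces the expected table instead.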
What does %>% do/mean?
@user2330270 It is the chain (pipe) operator from magrittr, re-exported by dplyr: it passes the output of the left-hand side as the first argument of the call on the right-hand side, so x %>% f(y) is equivalent to f(x, y).
Many people don't have a grasp of both languages at the same time. Downvoting discourages people from answering code-conversion questions and thereby reduces their value. It is true that the OP didn't provide a reproducible example, but code conversion doesn't really require one.
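A small illustration of the idea using base R's native pipe |>, which requires R 4.1 or later and behaves like %>% for simple calls such as these (the %>% version needs magrittr or dplyr loaded):

```r
x <- c(1, 4, 9)

# Nested calls read inside-out...
nested <- round(sum(sqrt(x)), 1)

# ...while a pipe reads left to right: each result becomes the first
# argument of the next call, so x |> sqrt() is the same as sqrt(x)
piped <- x |> sqrt() |> sum() |> round(1)

print(piped)
```

Both forms evaluate round(sum(sqrt(x)), 1); the pipe only changes how the expression is written.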
ok thanks, having a test of your answer now
@user2330270 The second one gives summarised output. In the Python code, you were using transform and creating a new column. Also, the column names were different in the example.