How to group by consecutive rows in an R dataframe?

akrun 2019-07-04 03:18

In R, one way to do this with dplyr is to build a grouping column from the cumulative sum of the logical comparison between 'pv_type' and its lag, so that the counter increases each time the value changes, and then add the min and max of 'price' within each group as two new columns:

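library(dplyr)
segmentation %>%
    # a new group starts whenever 'pv_type' differs from the previous row
    group_by(pv_type_group = cumsum(pv_type != lag(pv_type,
                 default = first(pv_type)))) %>%
    # min and max of 'price' within each run, repeated on every row
    mutate(min_v = min(price), max_p = max(price))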
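To see what the grouping column looks like, here is a minimal base R sketch on a toy vector. The 'pv_type' and 'price' values below are made up for illustration and are not the OP's data; the shifted vector 'prev' plays the role of lag() with default = first().

```r
# Toy data standing in for the 'pv_type' and 'price' columns (made-up values)
pv_type <- c("peak", "peak", "valley", "valley", "valley", "peak", "valley")
price   <- c(1, 1.5, 0.4, 0.6, 0.5, 1.2, 0.1)

# Previous element; the first element is compared against itself
prev <- c(pv_type[1], head(pv_type, -1))

# TRUE marks the first row of a new run; cumsum turns the marks into a run id
run_id <- cumsum(pv_type != prev)
print(run_id)
# [1] 0 0 1 1 1 2 3

# One minimum per run of consecutive equal values: 1.0, 0.4, 1.2, 0.1
run_min <- tapply(price, run_id, min)
```

The two "peak" runs get different ids (0 and 2), which is the difference from a plain group_by(pv_type).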

Update

In the OP's example the expected output is summarised (one row per run), so we use summarise instead of mutate. This version also uses rleid (from data.table) instead of the logical cumulative sum to build the run id.

library(dplyr)       # %>%, group_by, summarise, select
library(data.table)  # rleid
segmentation %>%
    group_by(grp = rleid(types)) %>%
    summarise(types = first(types), expectedvalues = min(values)) %>%
    ungroup %>%
    select(-grp)
# A tibble: 4 x 2
#  types  expectedvalues
# <fct>           <dbl>
#1 peak              1  
#2 valley            0.4
#3 peak              1.2
#4 valley            0.1
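As a rough check that the two ways of building the run id agree, the sketch below compares rleid with the cumulative-sum approach on a made-up vector (not the OP's data). The only difference is the starting value: rleid counts from 1, the cumulative sum from 0.

```r
library(data.table)

# Made-up vector for illustration only
types <- c("peak", "peak", "valley", "peak", "peak", "valley")

# Cumulative-sum run id: counts the changes seen so far, so it starts at 0
prev <- c(types[1], head(types, -1))
id_cumsum <- cumsum(types != prev)

# data.table run-length id: numbers the runs, so it starts at 1
id_rleid <- rleid(types)

print(id_cumsum)
# [1] 0 0 1 2 2 3
print(id_rleid)
# [1] 1 1 2 3 3 4
```

Either column works as a grouping key, since group_by only needs the ids to differ between runs.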

Fred Johnson 2019-07-04 00:19:12

What does %>% do/mean?

akrun 2019-07-04 00:20:28

@user2330270 It is the chain (pipe) operator, which passes the output of the lhs on to the next function for further processing
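A small sketch of the idea, assuming dplyr is loaded (it re-exports %>% from magrittr); the vector 'x' is a made-up example. The piped form and the nested form give the same result.

```r
library(dplyr)  # makes %>% available

x <- c(4, 9, 16)

# The lhs becomes the first argument of the function on the rhs
piped  <- x %>% sqrt() %>% sum()

# The same computation written as nested calls
nested <- sum(sqrt(x))

print(identical(piped, nested))
# [1] TRUE
```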

akrun 2019-07-04 01:15:47

Many people don't have a grasp of both languages at the same time. Downvoting discourages people from answering code conversion questions and thereby reduces their value. It is true that the OP didn't provide a reproducible example, but code conversion doesn't really require one

Fred Johnson 2019-07-04 02:08:27

ok thanks, having a test of your answer now

akrun 2019-07-04 05:10:54

@user2330270 The second one gives summarised output. In the Python code, you were transforming and creating a new column. Also, the column names were different in the example
